Fisher transformation

In statistics, the Fisher transformation (or Fisher ''z''-transformation) of a Pearson correlation coefficient is its inverse hyperbolic tangent (artanh). When the sample correlation coefficient ''r'' is near 1 or −1, its distribution is highly skewed, which makes it difficult to estimate confidence intervals and apply tests of significance for the population correlation coefficient ρ. The Fisher transformation solves this problem by yielding a variable whose distribution is approximately normally distributed, with a variance that is stable over different values of ''r''.


Definition

Given a set of ''N'' bivariate sample pairs (X_i, Y_i), i = 1, …, N, the sample correlation coefficient ''r'' is given by

:r = \frac{\operatorname{cov}(X,Y)}{\sigma_X \sigma_Y} = \frac{\sum_{i=1}^N (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^N (X_i - \bar{X})^2}\,\sqrt{\sum_{i=1}^N (Y_i - \bar{Y})^2}}.

Here \operatorname{cov}(X,Y) stands for the covariance between the variables X and Y and \sigma stands for the standard deviation of the respective variable. Fisher's ''z''-transformation of ''r'' is defined as

:z = \frac{1}{2}\ln\left(\frac{1+r}{1-r}\right) = \operatorname{artanh}(r),

where "ln" is the natural logarithm function and "artanh" is the inverse hyperbolic tangent function.

If (''X'', ''Y'') has a bivariate normal distribution with correlation ρ and the pairs (X_i, Y_i) are independent and identically distributed, then ''z'' is approximately normally distributed with mean

:\frac{1}{2}\ln\left(\frac{1+\rho}{1-\rho}\right)

and standard error

:\frac{1}{\sqrt{N-3}},

where ''N'' is the sample size and ρ is the true correlation coefficient.

This transformation, and its inverse

:r = \frac{\exp(2z) - 1}{\exp(2z) + 1} = \operatorname{tanh}(z),

can be used to construct a large-sample confidence interval for ''r'' using standard normal theory and derivations. See also the application to partial correlation.
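The interval construction above is straightforward to implement. The following is a minimal Python sketch (the function name and defaults are illustrative, not from any standard library) that transforms ''r'' to ''z'', builds a normal-theory interval on the ''z'' scale, and maps the endpoints back with tanh:

import numpy as np
from scipy import stats

def fisher_confidence_interval(r, n, confidence=0.95):
    """Large-sample confidence interval for a correlation via Fisher's z.

    r: sample Pearson correlation; n: number of bivariate pairs (n > 3).
    """
    z = np.arctanh(r)                        # Fisher z-transform, artanh(r)
    se = 1.0 / np.sqrt(n - 3)                # approximate standard error of z
    z_crit = stats.norm.ppf(0.5 + confidence / 2)
    return np.tanh(z - z_crit * se), np.tanh(z + z_crit * se)

# Example: r = 0.6 observed from N = 50 pairs, 95% confidence
print(fisher_confidence_interval(0.6, 50))   # approximately (0.39, 0.75)

Because the interval is computed on the ''z'' scale and mapped back, it is asymmetric about ''r'', reflecting the skewness of the sampling distribution of ''r''.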


Derivation

Hotelling gives a concise derivation of the Fisher transformation. To derive the Fisher transformation, one starts by considering an arbitrary increasing function of ''r'', say G(r). Finding the first term in the large-''N'' expansion of the corresponding skewness \kappa_3 results in

:\kappa_3 = \frac{-6\rho + 3(1 - \rho^2)\,G''(\rho)/G'(\rho)}{\sqrt{N}} + O(N^{-3/2}).

Setting \kappa_3 = 0 and solving the corresponding differential equation for G yields the inverse hyperbolic tangent function, G(\rho) = \operatorname{artanh}(\rho). Similarly expanding the mean and variance of \operatorname{artanh}(r), one gets

:\operatorname{artanh}(\rho) + \frac{\rho}{2N} + O(N^{-2})

and

:\frac{1}{N} + \frac{6 - \rho^2}{2N^2} + O(N^{-3})

respectively. The extra terms are not part of the usual Fisher transformation. For large values of \rho and small values of N they represent a large improvement of accuracy at minimal cost, although they greatly complicate the computation of the inverse – a closed-form expression is not available. The near-constant variance of the transformation is the result of removing its skewness – the actual improvement is achieved by the latter, not by the extra terms. Including the extra terms yields:

:\frac{z - \operatorname{artanh}(\rho) - \frac{\rho}{2N}}{\sqrt{\dfrac{1}{N} + \dfrac{6 - \rho^2}{2N^2}}},

which has, to an excellent approximation, a standard normal distribution.
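The skewness-removal argument can be checked by simulation. Below is a small Python sketch (the parameter choices ρ = 0.8, N = 20 and the seed are arbitrary) comparing the empirical skewness of ''r'' with that of z = artanh(r) for bivariate normal samples:

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
rho, n, reps = 0.8, 20, 20000
cov = [[1.0, rho], [rho, 1.0]]

rs = np.empty(reps)
for i in range(reps):
    sample = rng.multivariate_normal([0.0, 0.0], cov, size=n)
    rs[i] = np.corrcoef(sample[:, 0], sample[:, 1])[0, 1]

print(stats.skew(rs))              # clearly negative: r has a long left tail
print(stats.skew(np.arctanh(rs)))  # near zero: the transformation removes it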


Application

The application of Fisher's transformation can be enhanced using a software calculator. Assuming that the r-squared value found is 0.80, that there are 30 data pairs, and accepting a 90% confidence interval, the r-squared value in another random sample from the same population may range from 0.588 to 0.921. When r-squared is outside this range, the population is considered to be different. However, if a certain data set is analysed with two different regression models, and the first model yields r-squared = 0.80 while the second yields r-squared = 0.49, one may conclude that the two models differ significantly, since 0.49 lies below the lower bound of 0.588.
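For concreteness, here is a hedged Python sketch of the comparison just described, built from the standard Fisher interval of the Definition section (the helper name is ours; a particular software calculator may use somewhat different conventions and therefore report somewhat different endpoints):

import numpy as np
from scipy import stats

def r2_interval(r2, n, confidence=0.90):
    # Work on the correlation scale: r = sqrt(r2), then apply Fisher's z.
    z = np.arctanh(np.sqrt(r2))
    half = stats.norm.ppf(0.5 + confidence / 2) / np.sqrt(n - 3)
    lo, hi = np.tanh(z - half), np.tanh(z + half)
    return lo**2, hi**2          # map the endpoints back to the r-squared scale

lo, hi = r2_interval(0.80, 30)   # plausible r-squared range for a new sample
print(lo, hi)
print(0.49 < lo)                 # True: the second model falls below the range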


Discussion

The Fisher transformation is an approximate variance-stabilizing transformation for ''r'' when ''X'' and ''Y'' follow a bivariate normal distribution. This means that the variance of ''z'' is approximately constant for all values of the population correlation coefficient ρ. Without the Fisher transformation, the variance of ''r'' grows smaller as |ρ| gets closer to 1. Since the Fisher transformation is approximately the identity function when |''r''| < 1/2, it is sometimes useful to remember that the variance of ''r'' is well approximated by 1/''N'' as long as |ρ| is not too large and ''N'' is not too small. This is related to the fact that the asymptotic variance of ''r'' is 1 for bivariate normal data when ρ = 0.

The behavior of this transform has been extensively studied since Fisher introduced it in 1915. Fisher himself found the exact distribution of ''z'' for data from a bivariate normal distribution in 1921; Gayen in 1951 determined the exact distribution of ''z'' for data from a bivariate Type A Edgeworth distribution. Hotelling in 1953 calculated the Taylor series expressions for the moments of ''z'' and several related statistics, and Hawkins in 1989 discovered the asymptotic distribution of ''z'' for data from a distribution with bounded fourth moments.

An alternative to the Fisher transformation is to use the exact confidence distribution density for ρ, given by

:\pi(\rho \mid r) = \frac{\nu(\nu - 1)\,\Gamma(\nu - 1)}{\sqrt{2\pi}\,\Gamma\!\left(\nu + \frac{1}{2}\right)} (1 - r^2)^{\frac{\nu - 1}{2}} \cdot (1 - \rho^2)^{\frac{\nu - 2}{2}} \cdot (1 - r\rho)^{\frac{1 - 2\nu}{2}} F\!\left(\frac{3}{2}, -\frac{1}{2};\, \nu + \frac{1}{2};\, \frac{1 + r\rho}{2}\right),

where F is the Gaussian hypergeometric function and \nu = N - 1 > 1.
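The exact confidence density can be evaluated directly with SciPy's Gaussian hypergeometric function. The sketch below (function name ours) computes \pi(\rho \mid r) and, for comparison, the density on the ρ scale implied by the Fisher normal approximation:

import numpy as np
from scipy.special import gamma, hyp2f1
from scipy.stats import norm

def exact_confidence_density(rho, r, n):
    """Exact confidence density pi(rho | r), with nu = n - 1 > 1."""
    nu = n - 1
    c = nu * (nu - 1) * gamma(nu - 1) / (np.sqrt(2 * np.pi) * gamma(nu + 0.5))
    return (c * (1 - r**2) ** ((nu - 1) / 2)
              * (1 - rho**2) ** ((nu - 2) / 2)
              * (1 - r * rho) ** ((1 - 2 * nu) / 2)
              * hyp2f1(1.5, -0.5, nu + 0.5, (1 + r * rho) / 2))

r, n = 0.5, 10
rho = np.linspace(-0.5, 0.9, 5)
# Fisher approximation: artanh(rho) ~ Normal(artanh(r), 1/(n - 3)); the
# 1/(1 - rho^2) factor is the Jacobian of the change of variables.
fisher = norm.pdf(np.arctanh(rho), np.arctanh(r), 1 / np.sqrt(n - 3)) / (1 - rho**2)
print(exact_confidence_density(rho, r, n))
print(fisher)   # close to, but not identical with, the exact density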


Other uses

While the Fisher transformation is mainly associated with the Pearson product-moment correlation coefficient for bivariate normal observations, it can also be applied to Spearman's rank correlation coefficient in more general cases. A similar result for the asymptotic distribution applies, but with a minor adjustment factor: see the latter article for details.
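As a concrete illustration, one commonly cited form of the adjustment (due to Fieller, Hartley and Pearson) replaces the variance 1/(''N'' − 3) of ''z'' with 1.06/(''N'' − 3) when the transformation is applied to Spearman's coefficient. A minimal Python sketch under that assumption:

import numpy as np
from scipy import stats

def spearman_confidence_interval(rs, n, confidence=0.95):
    # Fisher transform of Spearman's rho; 1.06/(n - 3) is the
    # Fieller-Hartley-Pearson approximation to the variance of z.
    z = np.arctanh(rs)
    se = np.sqrt(1.06 / (n - 3))
    z_crit = stats.norm.ppf(0.5 + confidence / 2)
    return np.tanh(z - z_crit * se), np.tanh(z + z_crit * se)

print(spearman_confidence_interval(0.6, 50))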


See also

* Data transformation (statistics)
* Meta-analysis (this transformation is used in meta-analysis for stabilizing the variance)
* Partial correlation

External links

* R implementation