Variance-stabilizing transformation

In applied statistics, a variance-stabilizing transformation is a data transformation that is specifically chosen either to simplify considerations in graphical exploratory data analysis or to allow the application of simple regression-based or analysis of variance techniques.


Overview

The aim behind the choice of a variance-stabilizing transformation is to find a simple function ''ƒ'' to apply to values ''x'' in a data set to create new values ''y'' = ''ƒ''(''x'') such that the variability of the values ''y'' is not related to their mean value. For example, suppose that the values ''x'' are realizations from different Poisson distributions: i.e. the distributions each have different mean values ''μ''. Then, because for the Poisson distribution the variance is identical to the mean, the variance varies with the mean. However, if the simple variance-stabilizing transformation

:y = \sqrt{x} \,

is applied, the sampling variance associated with an observation will be nearly constant: see the Anscombe transform for details and some alternative transformations.

While variance-stabilizing transformations are well known for certain parametric families of distributions, such as the Poisson and the binomial distribution, some types of data analysis proceed more empirically: for example, by searching among power transformations to find a suitable fixed transformation. Alternatively, if data analysis suggests a functional form for the relation between variance and mean, this can be used to deduce a variance-stabilizing transformation. Thus if, for a mean ''μ'',

:\operatorname{var}(X) = h(\mu), \,

a suitable basis for a variance-stabilizing transformation would be

:y \propto \int^x \frac{d\mu}{\sqrt{h(\mu)}},

where the arbitrary constant of integration and an arbitrary scaling factor can be chosen for convenience.
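
As a quick numerical illustration of the Poisson case above, the following minimal Python sketch (assuming only NumPy is available; the sample sizes and mean levels are arbitrary choices) draws Poisson samples at several means and compares the variance of the raw values with the variance of their square roots:

 import numpy as np
 
 rng = np.random.default_rng(0)
 
 # For Poisson data the raw variance equals the mean and so grows with mu,
 # while the variance of sqrt(x) settles near the constant 1/4.
 for mu in [5, 20, 100, 500]:
     x = rng.poisson(mu, size=100_000)
     print(f"mu={mu:4d}  var(x)={x.var():7.1f}  var(sqrt(x))={np.sqrt(x).var():.3f}")

The delta method discussed later in this article explains the value 1/4: with h(\mu) = \mu and g(x) = \sqrt{x}, the approximate variance is h(\mu)g'(\mu)^2 = \mu \cdot (1/(2\sqrt{\mu}))^2 = 1/4.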


Example: relative variance

If ''X'' is a positive random variable and the variance is given as h(\mu) = s^2\mu^2, then the standard deviation is proportional to the mean, which is called fixed relative error. In this case, the variance-stabilizing transformation is

:y = \int^x \frac{d\mu}{\sqrt{s^2\mu^2}} = \frac{1}{s} \ln(x) \propto \log(x)\,.

That is, the variance-stabilizing transformation is the logarithmic transformation.
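
To see the logarithmic transformation at work, here is a small Python sketch; the choice of the gamma family is an illustrative assumption, picked because gamma variables with a fixed shape parameter have a standard deviation proportional to their mean:

 import numpy as np
 
 rng = np.random.default_rng(0)
 
 # Gamma variables with fixed shape k have sd/mean = 1/sqrt(k), a fixed
 # relative error; after log(x) the variance no longer depends on the mean.
 k = 4.0
 for scale in [1.0, 10.0, 100.0]:
     x = rng.gamma(k, scale, size=100_000)
     print(f"mean={x.mean():8.1f}  sd/mean={x.std() / x.mean():.3f}  "
           f"var(log x)={np.log(x).var():.3f}")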


Example: absolute plus relative variance

If the variance is given as h(\mu) = \sigma_0^2 + s^2\mu^2, then the variance is dominated by the fixed variance \sigma_0^2 when |\mu| is small enough and is dominated by the relative variance s^2\mu^2 when |\mu| is large enough. In this case, the variance-stabilizing transformation is

:y = \int^x \frac{d\mu}{\sqrt{\sigma_0^2 + s^2\mu^2}} = \frac{1}{s} \operatorname{asinh}\frac{x s}{\sigma_0} \propto \operatorname{asinh}\frac{x}{\lambda}\,.

That is, the variance-stabilizing transformation is the inverse hyperbolic sine of the scaled value x/\lambda for \lambda = \sigma_0/s.
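
The following Python sketch checks this numerically; the noise model used below (normal errors with variance \sigma_0^2 + s^2\mu^2, and the particular values of \sigma_0 and s) is an illustrative assumption, not part of the derivation:

 import numpy as np
 
 rng = np.random.default_rng(0)
 
 # Illustrative model: X ~ Normal(mu, sqrt(sigma0^2 + s^2 mu^2)).
 # y = asinh(x*s/sigma0)/s should then have variance close to 1
 # at every mean level, small or large.
 sigma0, s = 2.0, 0.1
 for mu in [0, 5, 50, 500]:
     x = rng.normal(mu, np.sqrt(sigma0**2 + s**2 * mu**2), size=100_000)
     y = np.arcsinh(x * s / sigma0) / s
     print(f"mu={mu:4d}  var(x)={x.var():8.1f}  var(y)={y.var():.3f}")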


Relationship to the delta method

Here, the delta method is presented in a rough way, but it is enough to see the relation with variance-stabilizing transformations; see the delta method article for a more formal approach.

Let X be a random variable with \operatorname{E}[X] = \mu and \operatorname{var}(X) = \sigma^2. Define Y = g(X), where g is a regular function. A first-order Taylor approximation for Y = g(X) is:

:Y = g(X) \approx g(\mu) + g'(\mu)(X - \mu)

From the equation above, we obtain:

:\operatorname{E}[Y] = g(\mu) \quad \text{and} \quad \operatorname{var}[Y] = \sigma^2 g'(\mu)^2

This approximation method is called the delta method.

Consider now a random variable X such that \operatorname{E}[X] = \mu and \operatorname{var}[X] = h(\mu). Notice the relation between the variance and the mean, which implies, for example, heteroscedasticity in a linear model. Therefore, the goal is to find a function g such that Y = g(X) has a variance independent (at least approximately) of its expectation.

Imposing the condition \operatorname{var}[Y] \approx h(\mu) g'(\mu)^2 = \text{constant}, this equality implies the differential equation:

:\frac{dg}{d\mu} = \frac{C}{\sqrt{h(\mu)}}

This ordinary differential equation has, by separation of variables, the following solution:

:g(\mu) = \int \frac{C \, d\mu}{\sqrt{h(\mu)}}

This last expression appeared for the first time in a 1947 paper by M. S. Bartlett.{{cite journal |last=Bartlett |first=M. S. |year=1947 |title=The Use of Transformations |journal=Biometrics |volume=3 |pages=39–52 |doi=10.2307/3001536}}
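
Bartlett's integral can be checked symbolically. The following minimal sketch (assuming SymPy is available) applies it to the three variance functions h(\mu) discussed in this article and recovers the square-root, logarithmic, and inverse-hyperbolic-sine transformations in turn:

 import sympy as sp
 
 mu, C = sp.symbols("mu C", positive=True)
 
 # g(mu) = integral of C / sqrt(h(mu)) dmu for the three cases above:
 # h = mu (Poisson), h = mu^2 (fixed relative error), h = 1 + mu^2 (mixed).
 for h in (mu, mu**2, 1 + mu**2):
     g = sp.integrate(C / sp.sqrt(h), mu)
     print(f"h(mu) = {h}:  g(mu) = {g}")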


References
