Box–Cox Transformation

In statistics, a power transform is a family of functions applied to create a monotonic transformation of data using power functions. It is a data transformation technique used to stabilize variance, make the data more normal distribution-like, improve the validity of measures of association (such as the Pearson correlation between variables), and support other data stabilization procedures. Power transforms are used in multiple fields, including multi-resolution and wavelet analysis, statistical data analysis, medical research, modeling of physical processes, geochemical data analysis, epidemiology and many other clinical, environmental and social research areas.


Definition

The power transformation is defined as a continuously varying function of the power parameter ''λ'', in a piecewise form that makes it continuous at the point of singularity (''λ'' = 0). For data vectors (y_1, \ldots, y_n) in which each y_i > 0, the power transform is

: y_i^{(\lambda)} = \begin{cases} \dfrac{y_i^\lambda - 1}{\lambda (\operatorname{GM}(y))^{\lambda - 1}}, & \text{if } \lambda \neq 0 \\[12pt] \operatorname{GM}(y) \ln y_i, & \text{if } \lambda = 0 \end{cases}

where

: \operatorname{GM}(y) = \left(\prod_{i=1}^n y_i\right)^{\frac{1}{n}} = \sqrt[n]{y_1 y_2 \cdots y_n}

is the geometric mean of the observations y_1, \ldots, y_n.

The case for \lambda = 0 is the limit as \lambda approaches 0. To see this, note that y_i^\lambda = \exp(\lambda \ln(y_i)) = 1 + \lambda \ln(y_i) + O((\lambda \ln(y_i))^2). Then \dfrac{y_i^\lambda - 1}{\lambda} = \ln(y_i) + O(\lambda), and everything but \ln(y_i) becomes negligible for \lambda sufficiently small.

The inclusion of the (''λ'' − 1)th power of the geometric mean in the denominator simplifies the scientific interpretation of any equation involving y_i^{(\lambda)}, because the units of measurement do not change as ''λ'' changes.
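
The piecewise definition above can be sketched in a few lines of Python. This is an illustrative sketch only: the function names `geometric_mean` and `power_transform` and the sample data are assumptions made for the example, not part of the source. It evaluates the geometric-mean-normalized transform and compares the ''λ'' = 0 branch with the general branch at a very small ''λ''.

```python
import math

def geometric_mean(ys):
    """Geometric mean of strictly positive values, computed via logs."""
    return math.exp(sum(math.log(y) for y in ys) / len(ys))

def power_transform(ys, lam):
    """Geometric-mean-normalized power transform of positive observations."""
    if any(y <= 0 for y in ys):
        raise ValueError("all observations must be strictly positive")
    gm = geometric_mean(ys)
    if lam == 0:
        # Limiting case: GM(y) * ln(y_i)
        return [gm * math.log(y) for y in ys]
    scale = lam * gm ** (lam - 1)
    return [(y ** lam - 1) / scale for y in ys]

ys = [0.5, 1.0, 2.0, 4.0]  # illustrative data, not from the source
at_zero = power_transform(ys, 0.0)
near_zero = power_transform(ys, 1e-6)

# Largest gap between the lambda = 0 branch and the general branch near 0.
max_gap = max(abs(a - b) for a, b in zip(at_zero, near_zero))
print(max_gap)
```

The gap shrinks roughly in proportion to ''λ'', which matches the O(\lambda) remainder in the limit argument.
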
Box and Cox (1964) introduced the geometric mean into this transformation by first including the Jacobian of the rescaled power transformation

: \frac{y^\lambda - 1}{\lambda}

with the likelihood. This Jacobian is as follows:

: J(\lambda; y_1, \ldots, y_n) = \prod_{i=1}^n \left| d y_i^{(\lambda)} / dy \right| = \prod_{i=1}^n y_i^{\lambda - 1} = \operatorname{GM}(y)^{n(\lambda - 1)}

This allows the normal log likelihood at its maximum to be written as follows:

: \begin{align} \log(\mathcal{L}(\hat\mu, \hat\sigma)) & = (-n/2)(\log(2\pi\hat\sigma^2) + 1) + n(\lambda - 1)\log(\operatorname{GM}(y)) \\[5pt] & = (-n/2)(\log(2\pi\hat\sigma^2 / \operatorname{GM}(y)^{2(\lambda - 1)}) + 1). \end{align}

From here, absorbing \operatorname{GM}(y)^{2(\lambda - 1)} into the expression for \hat\sigma^2 produces an expression that establishes that minimizing the sum of squares of residuals from y_i^{(\lambda)} is equivalent to maximizing the sum of the normal log likelihood of deviations from (y^\lambda - 1)/\lambda and the log of the Jacobian of the transformation.

For the rescaled form (y^\lambda - 1)/\lambda, the value at ''Y'' = 1 for any ''λ'' is 0, and the derivative with respect to ''Y'' there is 1 for any ''λ''. Sometimes ''Y'' is a version of some other variable scaled to give ''Y'' = 1 at some sort of average value.

The transformation is a power transformation, but done in such a way as to make it continuous with the parameter ''λ'' at ''λ'' = 0. It has proved popular in regression analysis, including econometrics.

Box and Cox also proposed a more general form of the transformation that incorporates a shift parameter:

: \tau(y_i; \lambda, \alpha) = \begin{cases} \dfrac{(y_i + \alpha)^\lambda - 1}{\lambda (\operatorname{GM}(y + \alpha))^{\lambda - 1}} & \text{if } \lambda \neq 0, \\ \\ \operatorname{GM}(y + \alpha) \ln(y_i + \alpha) & \text{if } \lambda = 0, \end{cases}

which holds if y_i + \alpha > 0 for all ''i''. If τ(''Y'', λ, α) follows a truncated normal distribution, then ''Y'' is said to follow a Box–Cox distribution.

Bickel and Doksum eliminated the need to use a truncated distribution by extending the range of the transformation to all ''y'', as follows:

: \tau(y_i; \lambda, \alpha) = \begin{cases} \dfrac{\operatorname{sgn}(y_i + \alpha) |y_i + \alpha|^\lambda - 1}{\lambda (\operatorname{GM}(y + \alpha))^{\lambda - 1}} & \text{if } \lambda \neq 0, \\ \\ \operatorname{GM}(y + \alpha) \operatorname{sgn}(y + \alpha) \ln(y_i + \alpha) & \text{if } \lambda = 0, \end{cases}

where sgn(.) is the sign function. This change in definition has little practical import as long as -\alpha is less than \min(y_i), so that every y_i + \alpha is positive, which is usually the case.

Bickel and Doksum also proved that the parameter estimates are consistent and asymptotically normal under appropriate regularity conditions, though the standard Cramér–Rao lower bound can substantially underestimate the variance when parameter values are small relative to the noise variance. However, this underestimation of the variance may not be a substantive problem in many applications.
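
The role of the Jacobian in this section can be checked numerically. The sketch below is a hedged illustration, assuming an intercept-only normal model (a constant mean and no regressors, which the source does not specify); the function names and the sample data are hypothetical. It computes the maximized log likelihood in two ways: from the rescaled transform (y^\lambda - 1)/\lambda plus the log of the Jacobian, and from the geometric-mean-normalized transform with no Jacobian term.

```python
import math

def rescaled(ys, lam):
    """Rescaled power transform (y^lam - 1)/lam, with ln(y) at lam = 0."""
    return [math.log(y) if lam == 0 else (y ** lam - 1) / lam for y in ys]

def max_normal_log_lik(values):
    """Normal log likelihood at the MLE of a constant mean and variance."""
    n = len(values)
    mean = sum(values) / n
    sigma2 = sum((v - mean) ** 2 for v in values) / n
    return -n / 2 * (math.log(2 * math.pi * sigma2) + 1)

def log_lik_with_jacobian(ys, lam):
    """Log likelihood of the rescaled transform plus log Jacobian."""
    log_jacobian = (lam - 1) * sum(math.log(y) for y in ys)
    return max_normal_log_lik(rescaled(ys, lam)) + log_jacobian

def log_lik_gm_normalized(ys, lam):
    """Log likelihood of the GM-normalized transform; no Jacobian term needed."""
    gm = math.exp(sum(math.log(y) for y in ys) / len(ys))
    normalized = [z / gm ** (lam - 1) for z in rescaled(ys, lam)]
    return max_normal_log_lik(normalized)

ys = [1.2, 0.7, 3.4, 2.1, 5.6]  # illustrative data, not from the source
for lam in (-1.0, 0.0, 0.5, 2.0):
    print(lam, log_lik_with_jacobian(ys, lam), log_lik_gm_normalized(ys, lam))
```

The two columns agree up to floating-point error, so comparing residual sums of squares of the normalized transform across values of ''λ'' is equivalent to comparing likelihoods.
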


Box–Cox transformation

The one-parameter Box–Cox transformations are defined as

: y_i^{(\lambda)} = \begin{cases} \dfrac{y_i^\lambda - 1}{\lambda} & \text{if } \lambda \neq 0, \\ \ln y_i & \text{if } \lambda = 0, \end{cases}

and the two-parameter Box–Cox transformations as

: y_i^{(\boldsymbol{\lambda})} = \begin{cases} \dfrac{(y_i + \lambda_2)^{\lambda_1} - 1}{\lambda_1} & \text{if } \lambda_1 \neq 0, \\ \ln(y_i + \lambda_2) & \text{if } \lambda_1 = 0, \end{cases}

as described in the original article. The first transformation holds for y_i > 0, and the second for y_i > -\lambda_2.

The parameter \lambda is estimated using the profile likelihood function and using goodness-of-fit tests.
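
As a rough illustration of profile-likelihood estimation, the following Python sketch implements both forms of the transform and picks \lambda by a grid search. Several things here are assumptions made for the example rather than anything stated in the source: the intercept-only normal model, the grid from -2 to 2 in steps of 0.01, the function names, and the toy data (chosen so that their logarithms are symmetric about zero).

```python
import math

def box_cox(y, lam1, lam2=0.0):
    """Box-Cox transform of one value; lam2 is the shift of the two-parameter form."""
    shifted = y + lam2
    if shifted <= 0:
        raise ValueError("requires y + lam2 > 0")
    if lam1 == 0:
        return math.log(shifted)
    return (shifted ** lam1 - 1) / lam1

def profile_log_lik(ys, lam):
    """Profile log likelihood of lam (one-parameter form, additive constants dropped)."""
    n = len(ys)
    z = [box_cox(y, lam) for y in ys]
    mean_z = sum(z) / n
    sigma2 = sum((v - mean_z) ** 2 for v in z) / n
    # Second term is the log Jacobian of the transformation.
    return -n / 2 * math.log(sigma2) + (lam - 1) * sum(math.log(y) for y in ys)

def estimate_lambda(ys, lo=-2.0, hi=2.0, steps=400):
    """Grid-search maximizer of the profile log likelihood."""
    grid = [lo + (hi - lo) * k / steps for k in range(steps + 1)]
    return max(grid, key=lambda lam: profile_log_lik(ys, lam))

ys = [math.exp(x) for x in (-2, -1, 0, 1, 2)]  # toy data, symmetric on the log scale
lam_hat = estimate_lambda(ys)
print(lam_hat)
```

A grid search is used only to keep the sketch dependency-free; a numerical optimizer would normally replace it.
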


Confidence interval

A confidence interval for the Box–Cox parameter can be constructed asymptotically using Wilks's theorem on the profile likelihood function, by finding all values of \lambda that satisfy the following restriction:

: \ln\big(L(\lambda)\big) \ge \ln\big(L(\hat\lambda)\big) - \frac{1}{2} \chi^2_{1, 1 - \alpha},

where \chi^2_{1, 1 - \alpha} is the 1 − α quantile of the chi-squared distribution with one degree of freedom.
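
The restriction above can be applied on a grid of candidate values. The Python sketch below is illustrative only and rests on several assumptions not found in the source: an intercept-only normal model, a grid from -2 to 2, hypothetical function names, toy data, and a hard-coded 0.95 quantile of the chi-squared distribution with one degree of freedom (about 3.8415) to avoid a library dependency.

```python
import math

# 0.95 quantile of chi-squared with 1 degree of freedom (hard-coded assumption).
CHI2_1_Q95 = 3.841458820694124

def profile_log_lik(ys, lam):
    """Profile log likelihood of lam for an intercept-only normal model."""
    n = len(ys)
    z = [math.log(y) if lam == 0 else (y ** lam - 1) / lam for y in ys]
    mean_z = sum(z) / n
    sigma2 = sum((v - mean_z) ** 2 for v in z) / n
    return -n / 2 * math.log(sigma2) + (lam - 1) * sum(math.log(y) for y in ys)

def wilks_interval(ys, lo=-2.0, hi=2.0, steps=400):
    """Return (lam_hat, lower, upper) for an approximate 95% interval."""
    grid = [lo + (hi - lo) * k / steps for k in range(steps + 1)]
    log_liks = [profile_log_lik(ys, lam) for lam in grid]
    best = max(log_liks)
    lam_hat = grid[log_liks.index(best)]
    # Keep every grid value whose log likelihood is within chi2/2 of the maximum.
    accepted = [lam for lam, ll in zip(grid, log_liks) if ll >= best - CHI2_1_Q95 / 2]
    return lam_hat, min(accepted), max(accepted)

ys = [math.exp(x / 4) for x in range(-8, 9)]  # toy data, not from the source
lam_hat, lower, upper = wilks_interval(ys)
print(lam_hat, lower, upper)
```

Reporting only the smallest and largest accepted values assumes the accepted set is a single interval, which holds when the profile log likelihood has one peak on the grid. If an endpoint coincides with the grid boundary, the grid should be widened.
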


Example

The BUPA liver data set contains data on liver enzymes ALT and γGT. Suppose we are interested in using log(γGT) to predict ALT. A plot of the data appears in panel (a) of the figure. There appears to be non-constant variance, and a Box–Cox transformation might help.

(Figure: BUPA BoxCox.JPG)

The log-likelihood of the power parameter appears in panel (b). The horizontal reference line is at a distance of \chi_1^2/2 from the maximum (half the 0.95 quantile of the chi-squared distribution with one degree of freedom, about 1.92) and can be used to read off an approximate 95% confidence interval for λ. It appears as though a value close to zero would be good, so we take logs.

Possibly, the transformation could be improved by adding a shift parameter to the log transformation. Panel (c) of the figure shows the log-likelihood. In this case, the maximum of the likelihood is close to zero, suggesting that a shift parameter is not needed. The final panel shows the transformed data with a superimposed regression line.

Note that although Box–Cox transformations can make big improvements in model fit, there are some issues that the transformation cannot help with. In the current example, the data are rather heavy-tailed, so the assumption of normality is not realistic and a robust regression approach leads to a more precise model.


Econometric application

Economists often characterize production relationships by some variant of the Box–Cox transformation. Consider a common representation of production ''Q'' as dependent on services provided by a capital stock ''K'' and by labor hours ''N'':

: \tau(Q) = \alpha \tau(K) + (1 - \alpha) \tau(N).

Solving for ''Q'' by inverting the Box–Cox transformation we find

: Q = \big(\alpha K^\lambda + (1 - \alpha) N^\lambda\big)^{1/\lambda},

which is known as the ''constant elasticity of substitution (CES)'' production function.

The CES production function is a homogeneous function of degree one. When ''λ'' = 1, this produces the linear production function:

: Q = \alpha K + (1 - \alpha) N.

When ''λ'' → 0 this produces the famous Cobb–Douglas production function:

: Q = K^\alpha N^{1 - \alpha}.
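
The inversion step can be traced numerically. In this Python sketch, `tau` is the one-parameter Box–Cox transformation and `tau_inverse` its inverse; these names, along with `output_via_box_cox`, `ces_closed_form` and the input values, are hypothetical choices for the illustration. It solves the additive relationship for ''Q'' and compares the result with the closed-form CES expression, including the linear and Cobb–Douglas special cases.

```python
import math

def tau(x, lam):
    """One-parameter Box-Cox transformation of a positive value."""
    return math.log(x) if lam == 0 else (x ** lam - 1) / lam

def tau_inverse(t, lam):
    """Inverse of tau for the same lam."""
    return math.exp(t) if lam == 0 else (lam * t + 1) ** (1 / lam)

def output_via_box_cox(k, n, alpha, lam):
    """Solve tau(Q) = alpha*tau(K) + (1 - alpha)*tau(N) for Q."""
    return tau_inverse(alpha * tau(k, lam) + (1 - alpha) * tau(n, lam), lam)

def ces_closed_form(k, n, alpha, lam):
    """CES production function; lam = 0 is taken as the Cobb-Douglas limit."""
    if lam == 0:
        return k ** alpha * n ** (1 - alpha)
    return (alpha * k ** lam + (1 - alpha) * n ** lam) ** (1 / lam)

k, n, alpha = 4.0, 9.0, 0.3  # illustrative inputs, not from the source
for lam in (-1.0, 0.0, 0.5, 1.0):
    print(lam, output_via_box_cox(k, n, alpha, lam), ces_closed_form(k, n, alpha, lam))
```

Doubling both inputs doubles the output for any ''λ'', which is the degree-one homogeneity noted above.
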


Activities and demonstrations

The SOCR resource pages contain a number of hands-on interactive activities (Power Transform Family Graphs, SOCR webpages) demonstrating the Box–Cox (power) transformation using Java applets and charts. These directly illustrate the effects of this transform on Q–Q plots, X–Y scatterplots, time-series plots and histograms.


Yeo–Johnson transformation

The Yeo–Johnson transformation also allows zero and negative values of y. The parameter \lambda can be any real number, where \lambda = 1 produces the identity transformation. The transformation law reads:

: y_i^{(\lambda)} = \begin{cases} ((y_i + 1)^\lambda - 1)/\lambda & \text{if } \lambda \neq 0, y \geq 0 \\[4pt] \ln(y_i + 1) & \text{if } \lambda = 0, y \geq 0 \\[4pt] -((-y_i + 1)^{(2 - \lambda)} - 1) / (2 - \lambda) & \text{if } \lambda \neq 2, y < 0 \\[4pt] -\ln(-y_i + 1) & \text{if } \lambda = 2, y < 0 \end{cases}
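
The four cases translate directly into code. The following is a minimal Python sketch of the transformation for a single value; the function name `yeo_johnson` and the sample inputs are assumptions made for the illustration.

```python
import math

def yeo_johnson(y, lam):
    """Yeo-Johnson transformation of a single real value y."""
    if y >= 0:
        if lam == 0:
            return math.log1p(y)                      # ln(y + 1)
        return ((y + 1) ** lam - 1) / lam
    if lam == 2:
        return -math.log1p(-y)                        # -ln(-y + 1)
    return -((1 - y) ** (2 - lam) - 1) / (2 - lam)

values = [-3.0, -0.5, 0.0, 0.5, 3.0]  # illustrative inputs, not from the source
identity = [yeo_johnson(v, 1.0) for v in values]      # lam = 1 leaves values unchanged
log_like = [yeo_johnson(v, 0.0) for v in values]
print(identity)
print(log_like)
```

For non-negative values the transform is the Box–Cox transform of y + 1, and for negative values it is the negated Box–Cox transform of 1 − y with power 2 − \lambda, which is why the special cases sit at \lambda = 0 and \lambda = 2 respectively.
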


External links

* Nishii, R., "Box–Cox transformation", Encyclopedia of Mathematics (Springer EOM), id B/b110790
* Sanford Weisberg, Yeo-Johnson Power Transformations