Box-Cox Transform

In statistics, a power transform is a family of functions applied to create a monotonic transformation of data using power functions. It is a data transformation technique used to stabilize variance, make the data follow a normal distribution more closely, improve the validity of measures of association (such as the Pearson correlation between variables), and for other data stabilization procedures. Power transforms are used in multiple fields, including multi-resolution and wavelet analysis, statistical data analysis, medical research, modeling of physical processes, geochemical data analysis, epidemiology and many other clinical, environmental and social research areas.


Definition

The power transformation is defined as a continuous function of the power parameter ''λ'', typically given in a piecewise form that makes it continuous at the point of singularity (''λ'' = 0). For data vectors (y_1, \ldots, y_n) in which each y_i > 0, the power transform is
: y_i^{(\lambda)} = \begin{cases} \dfrac{y_i^\lambda - 1}{\lambda (\operatorname{GM}(y))^{\lambda - 1}}, & \text{if } \lambda \neq 0 \\[12pt] \operatorname{GM}(y) \ln y_i, & \text{if } \lambda = 0 \end{cases}
where
: \operatorname{GM}(y) = \left(\prod_{i=1}^n y_i\right)^{\frac{1}{n}} = \sqrt[n]{y_1 y_2 \cdots y_n}
is the geometric mean of the observations y_1, \ldots, y_n.

The case for \lambda = 0 is the limit as \lambda approaches 0. To see this, note that y_i^\lambda = \exp(\lambda \ln(y_i)) = 1 + \lambda \ln(y_i) + O((\lambda \ln(y_i))^2), using the Taylor series of the exponential function. Then \dfrac{y_i^\lambda - 1}{\lambda} = \ln(y_i) + O(\lambda), and everything but \ln(y_i) becomes negligible for \lambda sufficiently small.

The inclusion of the (''λ'' − 1)th power of the geometric mean in the denominator simplifies the scientific interpretation of any equation involving y_i^{(\lambda)}, because the units of measurement do not change as ''λ'' changes.
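The piecewise definition and its limit at \lambda = 0 can be checked numerically. The following is a minimal sketch using only the Python standard library; the function names `geometric_mean` and `power_transform` and the sample data are illustrative choices, not part of the original definition.

```python
import math

def geometric_mean(ys):
    """Geometric mean of strictly positive values, computed through logs."""
    return math.exp(sum(math.log(y) for y in ys) / len(ys))

def power_transform(ys, lam):
    """Geometric-mean-normalized power transform of positive data."""
    if any(y <= 0 for y in ys):
        raise ValueError("all observations must be positive")
    gm = geometric_mean(ys)
    if lam == 0:
        # Limiting case: GM(y) * ln(y_i)
        return [gm * math.log(y) for y in ys]
    scale = lam * gm ** (lam - 1)
    # expm1(lam * ln y) equals y**lam - 1 but stays accurate for tiny lam
    return [math.expm1(lam * math.log(y)) / scale for y in ys]

data = [0.5, 1.0, 2.0, 4.0]      # hypothetical sample
at_zero = power_transform(data, 0.0)
near_zero = power_transform(data, 1e-8)
identity_like = power_transform(data, 1.0)   # reduces to y_i - 1
```

Comparing `near_zero` with `at_zero` shows the two branches agreeing as \lambda shrinks, which is the continuity property the piecewise form is designed to provide.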
Box and Cox (1964) introduced the geometric mean into this transformation by first including the Jacobian of the rescaled power transformation
: \frac{y^\lambda - 1}{\lambda}
with the likelihood. This Jacobian is as follows:
: J(\lambda; y_1, \ldots, y_n) = \prod_{i=1}^n |d y_i^{(\lambda)} / dy| = \prod_{i=1}^n y_i^{\lambda - 1} = \operatorname{GM}(y)^{n(\lambda - 1)}
This allows the normal log likelihood at its maximum to be written as follows:
: \begin{align} \log(\mathcal{L}(\hat\mu, \hat\sigma)) & = (-n/2)(\log(2\pi\hat\sigma^2) + 1) + n(\lambda - 1)\log(\operatorname{GM}(y)) \\[5pt] & = (-n/2)(\log(2\pi\hat\sigma^2 / \operatorname{GM}(y)^{2(\lambda - 1)}) + 1). \end{align}
From here, absorbing \operatorname{GM}(y)^{2(\lambda - 1)} into the expression for \hat\sigma^2 produces an expression that establishes that minimizing the sum of squares of residuals from y_i^{(\lambda)} is equivalent to maximizing the sum of the normal log likelihood of deviations from (y^\lambda - 1)/\lambda and the log of the Jacobian of the transformation.

The value of (Y^\lambda - 1)/\lambda at ''Y'' = 1 is 0 for any ''λ'', and its derivative with respect to ''Y'' there is 1 for any ''λ''. Sometimes ''Y'' is a version of some other variable scaled to give ''Y'' = 1 at some sort of average value. The transformation is a power transformation, but done in such a way as to make it continuous with the parameter ''λ'' at ''λ'' = 0. It has proved popular in regression analysis, including econometrics.

Box and Cox also proposed a more general form of the transformation that incorporates a shift parameter:
: \tau(y_i; \lambda, \alpha) = \begin{cases} \dfrac{(y_i + \alpha)^\lambda - 1}{\lambda (\operatorname{GM}(y + \alpha))^{\lambda - 1}} & \text{if } \lambda \neq 0, \\[12pt] \operatorname{GM}(y + \alpha) \ln(y_i + \alpha) & \text{if } \lambda = 0, \end{cases}
which holds if y_i + \alpha > 0 for all ''i''. If τ(''Y'', λ, α) follows a truncated normal distribution, then ''Y'' is said to follow a Box–Cox distribution.

Bickel and Doksum eliminated the need to use a truncated distribution by extending the range of the transformation to all ''y'', as follows:
: \tau(y_i; \lambda, \alpha) = \begin{cases} \dfrac{\operatorname{sgn}(y_i + \alpha) |y_i + \alpha|^\lambda - 1}{\lambda (\operatorname{GM}(y + \alpha))^{\lambda - 1}} & \text{if } \lambda \neq 0, \\[12pt] \operatorname{GM}(y + \alpha) \operatorname{sgn}(y + \alpha) \ln(y_i + \alpha) & \text{if } \lambda = 0, \end{cases}
where sgn(.) is the sign function. This change in definition has little practical import as long as -\alpha is less than \operatorname{min}(y_i) (so that every y_i + \alpha is positive), which it usually is.

Bickel and Doksum also proved that the parameter estimates are consistent and asymptotically normal under appropriate regularity conditions, though the standard Cramér–Rao lower bound can substantially underestimate the variance when parameter values are small relative to the noise variance. However, this problem of underestimating the variance may not be a substantive problem in many applications.
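To illustrate why the sign-preserving extension rarely matters in practice, the sketch below compares a shifted Box–Cox transform with a Bickel–Doksum style transform. It rests on stated simplifications: the geometric-mean normalization is omitted, the Bickel–Doksum form is restricted to \lambda > 0, and the function names are hypothetical.

```python
import math

def shifted_box_cox(y, lam, alpha):
    """Shifted Box-Cox transform (no geometric-mean scaling); needs y + alpha > 0."""
    z = y + alpha
    if z <= 0:
        raise ValueError("requires y + alpha > 0")
    return math.log(z) if lam == 0 else (z ** lam - 1) / lam

def bickel_doksum(y, lam, alpha):
    """Sign-preserving power transform, defined for any real y when lam > 0."""
    if lam <= 0:
        raise ValueError("this sketch assumes lam > 0")
    z = y + alpha
    sign = (z > 0) - (z < 0)          # sgn(z) as -1, 0 or 1
    return (sign * abs(z) ** lam - 1) / lam

# The two agree whenever y + alpha is positive ...
same_a = shifted_box_cox(3.0, 0.5, 1.0)
same_b = bickel_doksum(3.0, 0.5, 1.0)
# ... and only the sign-preserving form accepts y + alpha < 0.
negative_case = bickel_doksum(-5.0, 0.5, 1.0)
```

When every shifted observation is positive, the two functions return identical values, so the choice between them only affects data sets that contain values at or below -\alpha.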


Box–Cox transformation

The one-parameter Box–Cox transformations are defined as
: y_i^{(\lambda)} = \begin{cases} \dfrac{y_i^\lambda - 1}{\lambda} & \text{if } \lambda \neq 0, \\[8pt] \ln y_i & \text{if } \lambda = 0, \end{cases}
and the two-parameter Box–Cox transformations as
: y_i^{(\boldsymbol{\lambda})} = \begin{cases} \dfrac{(y_i + \lambda_2)^{\lambda_1} - 1}{\lambda_1} & \text{if } \lambda_1 \neq 0, \\[8pt] \ln(y_i + \lambda_2) & \text{if } \lambda_1 = 0, \end{cases}
as described in the original article. Moreover, the first transformations hold for y_i > 0, and the second for y_i > -\lambda_2. The parameter \lambda is estimated using the profile likelihood function and using goodness-of-fit tests.
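Estimating \lambda from the profile likelihood can be sketched as follows. This is a simplified illustration that assumes the transformed data are independent normal observations with a constant mean (no regressors) and that replaces a numerical optimizer with a grid search; the names `profile_loglik` and `estimate_lambda` and the sample data are hypothetical.

```python
import math

def box_cox(y, lam):
    """One-parameter Box-Cox transform of a single positive value."""
    return math.log(y) if lam == 0 else math.expm1(lam * math.log(y)) / lam

def profile_loglik(ys, lam):
    """Normal log-likelihood maximized over mean and variance, plus log-Jacobian."""
    n = len(ys)
    z = [box_cox(y, lam) for y in ys]
    mean = sum(z) / n
    sigma2 = sum((v - mean) ** 2 for v in z) / n     # ML variance estimate
    log_jacobian = (lam - 1) * sum(math.log(y) for y in ys)
    return -0.5 * n * (math.log(2 * math.pi * sigma2) + 1) + log_jacobian

def estimate_lambda(ys, grid):
    """Grid value of lambda with the largest profile log-likelihood."""
    return max(grid, key=lambda lam: profile_loglik(ys, lam))

# Hypothetical data whose logarithms are symmetric about zero
ys = [math.exp(v) for v in (-1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5)]
grid = [i / 10 for i in range(-20, 21)]
lam_hat = estimate_lambda(ys, grid)
```

Because the logarithms of the sample are symmetric, the profile log-likelihood is symmetric in \lambda and peaks at the log transformation; real data would generally call for a finer grid or a one-dimensional optimizer.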


Confidence interval

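A confidence interval for the Box–Cox transformation parameter can be constructed asymptotically using Wilks's theorem on the profile likelihood function, by finding all values of \lambda that satisfy the following restriction:
: \ln\big(L(\lambda)\big) \ge \ln\big(L(\hat\lambda)\big) - \frac{1}{2}\chi^2_{1,1-\alpha},
where \hat\lambda is the maximum likelihood estimate and \chi^2_{1,1-\alpha} is the 1 − α quantile of the chi-squared distribution with one degree of freedom.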
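The restriction above can be applied on a grid of candidate values. The sketch below is one way to do so under the same simplifying assumption of a constant-mean normal model; `wilks_interval` and `chi2_1_quantile` are hypothetical helper names, and the chi-squared quantile with one degree of freedom is obtained as the square of a standard normal quantile.

```python
import math
from statistics import NormalDist

def box_cox(y, lam):
    """One-parameter Box-Cox transform of a single positive value."""
    return math.log(y) if lam == 0 else math.expm1(lam * math.log(y)) / lam

def profile_loglik(ys, lam):
    """Profile log-likelihood of lambda for a constant-mean normal model."""
    n = len(ys)
    z = [box_cox(y, lam) for y in ys]
    mean = sum(z) / n
    sigma2 = sum((v - mean) ** 2 for v in z) / n
    log_jacobian = (lam - 1) * sum(math.log(y) for y in ys)
    return -0.5 * n * (math.log(2 * math.pi * sigma2) + 1) + log_jacobian

def chi2_1_quantile(p):
    """Quantile of chi-squared with 1 df: the squared normal quantile at (1 + p) / 2."""
    return NormalDist().inv_cdf((1 + p) / 2) ** 2

def wilks_interval(ys, grid, level=0.95):
    """Smallest and largest grid values within chi2/2 of the maximum log-likelihood."""
    cutoff = 0.5 * chi2_1_quantile(level)
    loglik = [profile_loglik(ys, lam) for lam in grid]
    best = max(loglik)
    kept = [lam for lam, ll in zip(grid, loglik) if ll >= best - cutoff]
    return min(kept), max(kept)

ys = [math.exp(v) for v in (-1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5)]  # hypothetical data
grid = [i / 10 for i in range(-30, 31)]
low, high = wilks_interval(ys, grid)
```

The returned pair reports only the extremes of the accepted grid points, so the resolution of the interval is limited by the grid spacing.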


Example

The BUPA liver data set contains data on the liver enzymes ALT and γGT. Suppose we are interested in using log(γGT) to predict ALT. A plot of the data appears in panel (a) of the figure. There appears to be non-constant variance, and a Box–Cox transformation might help.

(Figure: BUPA BoxCox.JPG)

The log-likelihood of the power parameter appears in panel (b). The horizontal reference line is at a distance of \chi^2_{1,0.95}/2 (about 1.92) from the maximum and can be used to read off an approximate 95% confidence interval for λ. It appears as though a value close to zero would be good, so we take logs.

Possibly, the transformation could be improved by adding a shift parameter to the log transformation. Panel (c) of the figure shows the log-likelihood as a function of the shift parameter. In this case, the maximum of the likelihood is close to zero, suggesting that a shift parameter is not needed. The final panel shows the transformed data with a superimposed regression line.

Note that although Box–Cox transformations can make big improvements in model fit, there are some issues that the transformation cannot help with. In the current example, the data are rather heavy-tailed, so the assumption of normality is not realistic and a robust regression approach leads to a more precise model.


Econometric application

Economists often characterize production relationships by some variant of the Box–Cox transformation. Consider a common representation of production ''Q'' as dependent on services provided by a capital stock ''K'' and by labor hours ''N'':
: \tau(Q) = \alpha \tau(K) + (1 - \alpha)\tau(N).
Solving for ''Q'' by inverting the Box–Cox transformation we find
: Q = \big(\alpha K^\lambda + (1 - \alpha) N^\lambda\big)^{1/\lambda},
which is known as the ''constant elasticity of substitution (CES)'' production function. The CES production function is a homogeneous function of degree one.

When ''λ'' = 1, this produces the linear production function:
: Q = \alpha K + (1 - \alpha)N.
When ''λ'' → 0 this produces the famous Cobb–Douglas production function:
: Q = K^\alpha N^{1 - \alpha}.
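The inversion that leads to the CES form, and its two special cases, can be verified numerically. The following sketch assumes the plain one-parameter Box–Cox transform for \tau; the function names and the input values (K = 4, N = 9, \alpha = 0.5) are illustrative only.

```python
import math

def box_cox(x, lam):
    """One-parameter Box-Cox transform of a positive value."""
    return math.log(x) if lam == 0 else (x ** lam - 1) / lam

def inverse_box_cox(t, lam):
    """Inverse of box_cox: recovers x from its transformed value t."""
    return math.exp(t) if lam == 0 else (lam * t + 1) ** (1 / lam)

def ces_output(k, n, alpha, lam):
    """Solve tau(Q) = alpha * tau(K) + (1 - alpha) * tau(N) for Q."""
    t = alpha * box_cox(k, lam) + (1 - alpha) * box_cox(n, lam)
    return inverse_box_cox(t, lam)

linear = ces_output(4.0, 9.0, 0.5, 1.0)          # alpha*K + (1 - alpha)*N
cobb_douglas = ces_output(4.0, 9.0, 0.5, 0.0)    # K**alpha * N**(1 - alpha)
near_limit = ces_output(4.0, 9.0, 0.5, 1e-6)     # approaches Cobb-Douglas
half_power = ces_output(4.0, 9.0, 0.5, 0.5)      # (0.5*2 + 0.5*3)**2
```

Evaluating `near_limit` against `cobb_douglas` shows the CES output converging to the Cobb–Douglas value as \lambda approaches zero.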


Activities and demonstrations

The SOCR resource pages contain a number of hands-on interactive activities (Power Transform Family Graphs, SOCR webpages) demonstrating the Box–Cox (power) transformation using Java applets and charts. These directly illustrate the effects of this transform on Q–Q plots, X–Y scatterplots, time-series plots and histograms.


Yeo–Johnson transformation

The Yeo–Johnson transformation also allows for zero and negative values of y. \lambda can be any real number, where \lambda = 1 produces the identity transformation. The transformation law reads:
: y_i^{(\lambda)} = \begin{cases} ((y_i + 1)^\lambda - 1)/\lambda & \text{if } \lambda \neq 0, y \geq 0 \\[4pt] \ln(y_i + 1) & \text{if } \lambda = 0, y \geq 0 \\[4pt] -((-y_i + 1)^{(2 - \lambda)} - 1)/(2 - \lambda) & \text{if } \lambda \neq 2, y < 0 \\[4pt] -\ln(-y_i + 1) & \text{if } \lambda = 2, y < 0 \end{cases}
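The four cases of the transformation law translate directly into a small function. This is a minimal sketch for a single value, with `yeo_johnson` as an illustrative name; it uses exact comparisons with 0 and 2 to select the logarithmic branches.

```python
import math

def yeo_johnson(y, lam):
    """Yeo-Johnson transform of a single real value y with parameter lam."""
    if y >= 0:
        if lam == 0:
            return math.log1p(y)                      # ln(y + 1)
        return ((y + 1) ** lam - 1) / lam
    if lam == 2:
        return -math.log1p(-y)                        # -ln(-y + 1)
    return -((1 - y) ** (2 - lam) - 1) / (2 - lam)

sample = [-3.0, -0.5, 0.0, 0.5, 3.0]                  # hypothetical values
identity = [yeo_johnson(y, 1.0) for y in sample]      # lam = 1 leaves y unchanged
log_branch_pos = yeo_johnson(3.0, 0.0)                # ln(4)
log_branch_neg = yeo_johnson(-3.0, 2.0)               # -ln(4)
```

With \lambda = 1 both the non-negative and the negative branch reduce to y itself, which matches the identity property stated above.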


Box-Tidwell transformation

The Box-Tidwell transformation is a statistical technique used to assess and correct non-linearity between predictor variables and the logit in a generalized linear model, particularly in logistic regression. This transformation is useful when the relationship between the independent variables and the outcome is non-linear and cannot be adequately captured by the standard model.


Overview

The Box-Tidwell transformation was developed by George E. P. Box and Paul W. Tidwell in 1962 as an extension of Box-Cox transformations, which are applied to the dependent variable. Unlike the Box-Cox transformation, however, the Box-Tidwell transformation is applied to the independent variables in regression models. It is often used when the assumption of linearity between the predictors and the outcome is violated.


Method

The general idea behind the Box-Tidwell transformation is to apply a power transformation to each independent variable X_i in the regression model:
: X_i' = X_i^{\lambda}
where \lambda is the parameter estimated from the data. If the estimated \lambda is significantly different from 1, this indicates a non-linear relationship between X_i and the logit, and the transformation improves the model fit.

The Box-Tidwell test is typically performed by augmenting the regression model with terms like X_i \log(X_i) and testing the significance of the coefficients. If significant, this suggests that a transformation should be applied to achieve a linear relationship between the predictor and the logit.
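The role of the X_i \log(X_i) term can be illustrated with a single predictor. The sketch below makes two assumptions that go beyond the text: it uses ordinary least squares rather than logistic regression to keep the example short, and it applies the commonly described first-iteration update \lambda ≈ 1 + \gamma/\beta, where \beta is the slope from the plain fit and \gamma is the coefficient of the added term. All function names and data are hypothetical.

```python
import math

def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [u - f * v for u, v in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def ols(columns, y):
    """Least-squares coefficients [intercept, b1, b2, ...] via the normal equations."""
    cols = [[1.0] * len(y)] + columns
    xtx = [[sum(p * q for p, q in zip(ci, cj)) for cj in cols] for ci in cols]
    xty = [sum(p * q for p, q in zip(ci, y)) for ci in cols]
    return solve(xtx, xty)

def box_tidwell_step(x, y):
    """One-step estimate of the power lambda for a positive predictor x."""
    beta = ols([x], y)[1]                         # slope of y on x
    xlogx = [v * math.log(v) for v in x]
    gamma = ols([x, xlogx], y)[2]                 # coefficient of x*log(x)
    return 1.0 + gamma / beta

x = [float(i) for i in range(1, 21)]
y_linear = [1.0 + 2.0 * v for v in x]             # truly linear in x
y_curved = [2.0 + 3.0 * math.sqrt(v) for v in x]  # linear in x**0.5
lam_linear = box_tidwell_step(x, y_linear)
lam_curved = box_tidwell_step(x, y_curved)
```

For the linear response the added term has a coefficient of essentially zero, so the estimate stays at 1, while the curved response pulls the estimate below 1. In practice the step would be iterated and, for a logit model, the fits would be logistic regressions.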


Applications


Stabilizing Continuous Predictors

The transformation is beneficial in logistic regression or proportional hazards models, where non-linearity in continuous predictors can distort the relationship with the dependent variable. It is a flexible tool that allows the researcher to fit a more appropriate model to the data without guessing the relationship's functional form in advance.


Verifying Linearity in Logistic Regression

In logistic regression, a key assumption is that continuous independent variables exhibit a linear relationship with the logit of the dependent variable. Violations of this assumption can lead to biased estimates and reduced model performance. The Box-Tidwell transformation is a method used to assess and correct such violations by determining whether a continuous predictor requires transformation to achieve linearity with the logit.


Method for Verifying Linearity

The Box-Tidwell transformation introduces an interaction term between each continuous variable X_i and its natural logarithm \log(X_i):
: X_i \log(X_i)
This term is included in the logistic regression model to test whether the relationship between X_i and the logit is non-linear. A statistically significant coefficient for this interaction term indicates a violation of the linearity assumption, suggesting the need for a transformation of the predictor. In that case, the Box-Tidwell transformation provides an appropriate power transformation to linearize the relationship, thereby improving model accuracy and validity. Conversely, non-significant results support the assumption of linearity.


Limitations

One limitation of the Box-Tidwell transformation is that it only works for positive values of the independent variables, because both the power X_i^{\lambda} and the term X_i \log(X_i) require X_i > 0. If the data contain zero or negative values, the transformation cannot be applied directly without modifying the variables (e.g., adding a constant).


References

* Box, G.E.P. and Tidwell, P.W. (1962) Transformation of Independent Variables. Technometrics, 4, 531-550. https://doi.org/10.1080/00401706.1962.10490038 (a.k.a. Box-Tidwell transformation)


External links

* Nishii, R., "Box–Cox transformation", Encyclopedia of Mathematics (SpringerEOM id B/b110790)
* Sanford Weisberg, Yeo-Johnson Power Transformations