A whitening transformation or sphering transformation is a
linear transformation
that transforms a vector of
random variables with a known
covariance matrix into a set of new variables whose covariance is the
identity matrix, meaning that they are
uncorrelated
and each have
variance
1. The transformation is called "whitening" because it changes the input vector into a
white noise
vector.
Several other transformations are closely related to whitening:
# the decorrelation transform removes only the correlations but leaves variances intact,
# the standardization transform sets variances to 1 but leaves correlations intact,
# a coloring transformation transforms a vector of white random variables into a random vector with a specified covariance matrix.
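As a rough illustration of two of these related transformations, the following sketch applies standardization and coloring to a small covariance matrix. The matrix `sigma` and all variable names are assumptions chosen for the example, and the Cholesky factor is only one of many possible coloring matrices.

```python
import numpy as np

# Example covariance matrix (illustrative values, not from the article)
sigma = np.array([[4.0, 1.2],
                  [1.2, 1.0]])

# Standardization: rescale each variable by 1 / sqrt(variance).
# The resulting covariance is the correlation matrix: unit variances,
# correlations unchanged.
S = np.diag(1.0 / np.sqrt(np.diag(sigma)))
cov_standardized = S @ sigma @ S.T

# Coloring: map a white vector (identity covariance) to covariance sigma.
# With C the lower Cholesky factor (C C^T = sigma), x = C z has Cov(x) = sigma.
C = np.linalg.cholesky(sigma)
cov_colored = C @ np.eye(2) @ C.T

print(cov_standardized)
print(cov_colored)
```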
Definition
Suppose $X$ is a random (column) vector with non-singular covariance matrix $\Sigma$ and mean $0$. Then the transformation $Y = W X$ with a whitening matrix $W$ satisfying the condition $W^{\mathrm{T}} W = \Sigma^{-1}$ yields the whitened random vector $Y$ with unit diagonal covariance, that is, $\operatorname{Cov}(Y) = I$.
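The claim can be checked in a few lines. The derivation below is a sketch that writes $X$ for the original random vector, $\Sigma$ for its covariance, $W$ for a square whitening matrix with $W^{\mathrm{T}} W = \Sigma^{-1}$, and $Y = WX$ for the whitened vector. It relies on $W$ being invertible, which follows because $W^{\mathrm{T}} W$ is non-singular.

```latex
\begin{aligned}
\operatorname{Cov}(Y) &= \operatorname{Cov}(WX) = W \Sigma W^{\mathrm{T}}, \\
W^{\mathrm{T}} W = \Sigma^{-1}
  &\;\Longrightarrow\; \Sigma = (W^{\mathrm{T}} W)^{-1} = W^{-1} W^{-\mathrm{T}}, \\
\operatorname{Cov}(Y) &= W \, W^{-1} W^{-\mathrm{T}} \, W^{\mathrm{T}} = I .
\end{aligned}
```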
There are infinitely many possible whitening matrices $W$ that all satisfy the above condition. Commonly used choices are $W = \Sigma^{-1/2}$ (Mahalanobis or ZCA whitening), $W = L^{\mathrm{T}}$ where $L$ is the Cholesky decomposition of $\Sigma^{-1}$ (Cholesky whitening), or the eigen-system of $\Sigma$ (PCA whitening).
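The three choices can be compared numerically. The sketch below is not taken from any particular library: the function name `whitening_matrices` and the example matrix `sigma` are hypothetical, and the PCA variant uses the common form in which the eigenvector matrix is transposed and rescaled by the inverse square roots of the eigenvalues.

```python
import numpy as np

def whitening_matrices(sigma):
    """Return ZCA, Cholesky and PCA whitening matrices for a covariance matrix."""
    # Eigendecomposition of the symmetric matrix: sigma = U diag(eigvals) U^T
    eigvals, U = np.linalg.eigh(sigma)
    inv_sqrt = np.diag(1.0 / np.sqrt(eigvals))

    w_zca = U @ inv_sqrt @ U.T   # symmetric inverse square root of sigma
    w_pca = inv_sqrt @ U.T       # rotate onto eigenvectors, then rescale

    # Lower-triangular L with L L^T = sigma^{-1}; the whitening matrix is L^T
    L = np.linalg.cholesky(np.linalg.inv(sigma))
    w_chol = L.T

    return {"zca": w_zca, "cholesky": w_chol, "pca": w_pca}

# Example covariance matrix (illustrative values)
sigma = np.array([[4.0, 1.2],
                  [1.2, 1.0]])

for name, W in whitening_matrices(sigma).items():
    # Each W should map sigma to the identity: W sigma W^T = I
    print(name, np.allclose(W @ sigma @ W.T, np.eye(2)))
```

All three matrices satisfy the same condition but differ by an orthogonal rotation, which is why the whitened vectors they produce are not identical.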
Optimal whitening transforms can be singled out by investigating the cross-covariance and cross-correlation of $X$ and $Y$. For example, the unique optimal whitening transformation achieving maximal component-wise correlation between the original $X$ and the whitened $Y$ is produced by the whitening matrix $W = P^{-1/2} V^{-1/2}$, where $P$ is the correlation matrix and $V$ the diagonal variance matrix.
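A minimal sketch of this correlation-based whitening matrix, assuming the form $W = P^{-1/2} V^{-1/2}$ with $P$ the correlation matrix and $V$ the diagonal matrix of variances, could look as follows. The function name `max_correlation_whitening` and the example values are hypothetical.

```python
import numpy as np

def max_correlation_whitening(sigma):
    """Whitening matrix W = P^{-1/2} V^{-1/2} built from a covariance matrix."""
    # V^{-1/2}: diagonal matrix of inverse standard deviations
    v_inv_sqrt = np.diag(1.0 / np.sqrt(np.diag(sigma)))

    # Correlation matrix P = V^{-1/2} sigma V^{-1/2}
    P = v_inv_sqrt @ sigma @ v_inv_sqrt

    # Symmetric inverse square root of P via its eigendecomposition
    eigvals, G = np.linalg.eigh(P)
    p_inv_sqrt = G @ np.diag(1.0 / np.sqrt(eigvals)) @ G.T

    return p_inv_sqrt @ v_inv_sqrt

# Example covariance matrix (illustrative values)
sigma = np.array([[4.0, 1.2],
                  [1.2, 1.0]])
W_cor = max_correlation_whitening(sigma)
print(np.allclose(W_cor @ sigma @ W_cor.T, np.eye(2)))
```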
Whitening a data matrix
Whitening a data matrix follows the same transformation as for random variables. An empirical whitening transform is obtained by estimating the covariance (e.g. by maximum likelihood
) and subsequently constructing a corresponding estimated whitening matrix (e.g. by Cholesky decomposition
).
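The empirical procedure can be sketched as follows, assuming observations are stored as rows of a NumPy array. The synthetic data, the mixing matrix `A` and all variable names are made up for illustration. The covariance estimate divides by the number of observations, which is the maximum likelihood estimate under a Gaussian model, and the data are centered first because the definition assumes zero mean.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic correlated data: n observations (rows) of 2 variables
n = 1000
A = np.array([[2.0, 0.0],
              [0.6, 0.8]])
X = rng.standard_normal((n, 2)) @ A.T + np.array([1.0, -3.0])

# Center the data, then estimate the covariance (divide by n)
mean = X.mean(axis=0)
Xc = X - mean
sigma_hat = Xc.T @ Xc / n

# Cholesky whitening: L L^T = sigma_hat^{-1}, whitening matrix W = L^T
L = np.linalg.cholesky(np.linalg.inv(sigma_hat))
W = L.T

# Apply W to every centered observation (rows), i.e. z_i = W (x_i - mean)
Z = Xc @ W.T

# The empirical covariance of the whitened data is the identity
cov_Z = Z.T @ Z / n
print(np.round(cov_Z, 6))
```

Because the whitening matrix is built from the same estimate that is used to check the result, the whitened sample covariance equals the identity up to floating-point error; on new data drawn from the same distribution it would only be approximately the identity.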
R implementation
An implementation of several whitening procedures in R, including ZCA-whitening and PCA whitening but also CCA whitening, is available in the "whitening" R package published on CRAN.
See also
* Decorrelation
* Principal component analysis
* Weighted least squares
* Canonical correlation
* Mahalanobis distance (becomes the Euclidean distance after a whitening transformation)
External links
* http://courses.media.mit.edu/2010fall/mas622j/whiten.pdf
* The ZCA whitening transformation. Appendix A of ''Learning Multiple Layers of Features from Tiny Images'' by A. Krizhevsky.