In mathematics and multivariate statistics, the centering matrix [John I. Marden, ''Analyzing and Modeling Rank Data'', Chapman & Hall, 1995, page 59.] is a symmetric and idempotent matrix, which when multiplied with a vector has the same effect as subtracting the mean of the components of the vector from every component of that vector.
Definition
The centering matrix of size ''n'' is defined as the ''n''-by-''n'' matrix
:<math>C_n = I_n - \tfrac{1}{n} J_n</math>
where <math>I_n</math> is the identity matrix of size ''n'' and <math>J_n</math> is an ''n''-by-''n'' matrix of all 1's.
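For example
:<math>C_1 = \begin{bmatrix} 0 \end{bmatrix}</math>,
:<math>C_2 = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} - \frac{1}{2} \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix} = \begin{bmatrix} \frac{1}{2} & -\frac{1}{2} \\ -\frac{1}{2} & \frac{1}{2} \end{bmatrix}</math>,
:<math>C_3 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} - \frac{1}{3} \begin{bmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix} = \begin{bmatrix} \frac{2}{3} & -\frac{1}{3} & -\frac{1}{3} \\ -\frac{1}{3} & \frac{2}{3} & -\frac{1}{3} \\ -\frac{1}{3} & -\frac{1}{3} & \frac{2}{3} \end{bmatrix}</math>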
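The definition can be checked with a short sketch. The helper name `centering_matrix` is an assumption made for illustration, and exact fractions from the Python standard library are used so that entries such as 2/3 are not rounded.

```python
from fractions import Fraction


def centering_matrix(n):
    """Return C_n = I_n - (1/n) J_n as a list of rows of exact fractions."""
    return [
        [Fraction(int(i == j)) - Fraction(1, n) for j in range(n)]
        for i in range(n)
    ]


C1 = centering_matrix(1)
C2 = centering_matrix(2)
C3 = centering_matrix(3)

# Print each matrix row by row, with entries shown as readable fractions.
for C in (C1, C2, C3):
    for row in C:
        print([str(x) for x in row])
    print()
```

Every diagonal entry is 1 − 1/''n'' and every off-diagonal entry is −1/''n'', so each row and each column sums to zero.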
Properties
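Given a column vector <math>\mathbf{v}</math> of size ''n'', the centering property of <math>C_n</math> can be expressed as
:<math>C_n \mathbf{v} = \mathbf{v} - \left(\tfrac{1}{n} J_{n,1}^\textrm{T} \mathbf{v}\right) J_{n,1}</math>
where <math>J_{n,1}</math> is a column vector of ones and <math>\tfrac{1}{n} J_{n,1}^\textrm{T} \mathbf{v}</math> is the mean of the components of <math>\mathbf{v}</math>.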
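As an illustration of the centering property, the following sketch compares multiplication by the centering matrix with direct subtraction of the mean. The helper names `centering_matrix` and `matvec` and the sample vector are assumptions chosen for the example.

```python
from fractions import Fraction


def centering_matrix(n):
    """Return C_n = I_n - (1/n) J_n as a list of rows of exact fractions."""
    return [
        [Fraction(int(i == j)) - Fraction(1, n) for j in range(n)]
        for i in range(n)
    ]


def matvec(A, v):
    """Multiply the matrix A (list of rows) by the vector v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


v = [Fraction(x) for x in (2, 4, 9)]
n = len(v)
mean = sum(v) / n  # mean of the components of v, here 5

via_matrix = matvec(centering_matrix(n), v)  # C_n v
via_subtraction = [x - mean for x in v]      # v minus the mean in every component

print([str(x) for x in via_matrix])       # ['-3', '-1', '4']
print([str(x) for x in via_subtraction])  # ['-3', '-1', '4']
```

Both routes give the same centered vector, whose components sum to zero.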
<math>C_n</math> is symmetric positive semi-definite.
<math>C_n</math> is idempotent, so that <math>C_n^k = C_n</math>, for <math>k = 1, 2, \ldots</math>. Once the mean has been removed, it is zero and removing it again has no effect.
<math>C_n</math> is singular. The effects of applying the transformation <math>C_n \mathbf{v}</math> cannot be reversed.
<math>C_n</math> has the eigenvalue 1 of multiplicity ''n'' − 1 and eigenvalue 0 of multiplicity 1.
<math>C_n</math> has a nullspace of dimension 1, along the vector <math>J_{n,1}</math>.
<math>C_n</math> is an orthogonal projection matrix. That is, <math>C_n \mathbf{v}</math> is a projection of <math>\mathbf{v}</math> onto the (''n'' − 1)-dimensional subspace that is orthogonal to the nullspace <math>J_{n,1}</math>. (This is the subspace of all ''n''-vectors whose components sum to zero.)
The trace of <math>C_n</math> is <math>n(n-1)/n = n-1</math>.
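Several of the properties above can be spot-checked numerically for one size. The sketch below is a check for ''n'' = 4 only, not a proof. The helper names and the zero-sum test vector are assumptions, and eigenvalues are not computed directly. Instead it confirms that the all-ones vector is mapped to zero and that a vector whose components sum to zero is left unchanged.

```python
from fractions import Fraction


def centering_matrix(n):
    """Return C_n = I_n - (1/n) J_n as a list of rows of exact fractions."""
    return [
        [Fraction(int(i == j)) - Fraction(1, n) for j in range(n)]
        for i in range(n)
    ]


def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


n = 4
C = centering_matrix(n)
ones = [[Fraction(1)] for _ in range(n)]                  # the all-ones column vector
z = [[Fraction(x)] for x in (1, -1, 2, -2)]               # a column whose entries sum to zero

is_symmetric = C == [list(col) for col in zip(*C)]        # C equals its transpose
is_idempotent = matmul(C, C) == C                         # C C = C
kills_ones = matmul(C, ones) == [[0] for _ in range(n)]   # all-ones vector lies in the nullspace
fixes_zero_sum = matmul(C, z) == z                        # zero-sum vectors are unchanged
trace = sum(C[i][i] for i in range(n))                    # n * (1 - 1/n) = n - 1

print(is_symmetric, is_idempotent, kills_ones, fixes_zero_sum, trace)
```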
Application
Although multiplication by the centering matrix is not a computationally efficient way of removing the mean from a vector, it is a convenient analytical tool. It can be used not only to remove the mean of a single vector, but also of multiple vectors stored in the rows or columns of an ''m''-by-''n'' matrix <math>X</math>.
The left multiplication by <math>C_m</math> subtracts a corresponding mean value from each of the ''n'' columns, so that each column of the product <math>C_m X</math> has a zero mean. Similarly, the multiplication by <math>C_n</math> on the right subtracts a corresponding mean value from each of the ''m'' rows, and each row of the product <math>X C_n</math> has a zero mean.
The multiplication on both sides creates a doubly centered matrix <math>C_m X C_n</math>, whose row and column means are equal to zero.
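The three products can be illustrated on a small matrix. In this sketch the 2-by-3 matrix `X` and the helper names are assumptions made for the example. The matrix products are formed explicitly only to mirror the formulas, since subtracting the means directly would need far less arithmetic.

```python
from fractions import Fraction


def centering_matrix(n):
    """Return C_n = I_n - (1/n) J_n as a list of rows of exact fractions."""
    return [
        [Fraction(int(i == j)) - Fraction(1, n) for j in range(n)]
        for i in range(n)
    ]


def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def row_sums(A):
    return [sum(row) for row in A]


def col_sums(A):
    return [sum(col) for col in zip(*A)]


X = [[Fraction(x) for x in row] for row in ([1, 2, 3], [4, 6, 11])]
m, n = len(X), len(X[0])

col_centered = matmul(centering_matrix(m), X)              # C_m X: each column has zero mean
row_centered = matmul(X, centering_matrix(n))              # X C_n: each row has zero mean
doubly_centered = matmul(col_centered, centering_matrix(n))  # C_m X C_n

print(col_sums(col_centered), row_sums(row_centered))
print(row_sums(doubly_centered), col_sums(doubly_centered))
```

A zero sum along a row or column is equivalent to a zero mean, which is why the sketch prints sums rather than means.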
The centering matrix provides in particular a succinct way to express the scatter matrix, <math>S = (X - \mu J_{n,1}^\textrm{T})(X - \mu J_{n,1}^\textrm{T})^\textrm{T}</math> of a data sample <math>X</math>, where <math>\mu = \tfrac{1}{n} X J_{n,1}</math> is the sample mean. The centering matrix allows us to express the scatter matrix more compactly as
:<math>S = X C_n (X C_n)^\textrm{T} = X C_n C_n X^\textrm{T} = X C_n X^\textrm{T}.</math>
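The agreement between the two expressions for the scatter matrix can be sketched as follows. The sample matrix `X` (taken here as ''m''-by-''n'' with one observation per column) and the helper names are assumptions for illustration.

```python
from fractions import Fraction


def centering_matrix(n):
    """Return C_n = I_n - (1/n) J_n as a list of rows of exact fractions."""
    return [
        [Fraction(int(i == j)) - Fraction(1, n) for j in range(n)]
        for i in range(n)
    ]


def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


# Two variables (rows) observed three times (columns).
X = [[Fraction(x) for x in row] for row in ([1, 2, 3], [4, 6, 11])]
n = len(X[0])

# Direct form: subtract the sample mean of each variable, then multiply by the transpose.
mu = [sum(row) / n for row in X]
D = [[x - mu_i for x in row] for row, mu_i in zip(X, mu)]
S_direct = matmul(D, transpose(D))

# Compact form: S = X C_n X^T.
S_compact = matmul(matmul(X, centering_matrix(n)), transpose(X))

print([[str(x) for x in row] for row in S_direct])   # [['2', '7'], ['7', '26']]
print([[str(x) for x in row] for row in S_compact])  # [['2', '7'], ['7', '26']]
```

The compact form relies on the symmetry and idempotence of the centering matrix, which collapse the two centering factors into one.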
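<math>C_n</math> is the covariance matrix of the multinomial distribution, in the special case where the parameters of that distribution are <math>k = n</math> (that is, the number of categories equals the number of trials ''n''), and <math>p_1 = p_2 = \cdots = p_n = \tfrac{1}{n}</math>.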
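This connection can be checked with the standard covariance formula for a multinomial distribution, in which entry (''i'', ''j'') equals the number of trials times (''p<sub>i</sub>'' if ''i'' = ''j'', otherwise 0) minus the number of trials times ''p<sub>i</sub>p<sub>j</sub>''. The sketch below assumes ''n'' trials, ''n'' categories, and equal probabilities 1/''n''. The function name `multinomial_covariance` and its parameter names are hypothetical.

```python
from fractions import Fraction


def centering_matrix(n):
    """Return C_n = I_n - (1/n) J_n as a list of rows of exact fractions."""
    return [
        [Fraction(int(i == j)) - Fraction(1, n) for j in range(n)]
        for i in range(n)
    ]


def multinomial_covariance(trials, p):
    """Covariance matrix trials * (diag(p) - p p^T) of a multinomial distribution."""
    k = len(p)
    return [
        [trials * (int(i == j) * p[i] - p[i] * p[j]) for j in range(k)]
        for i in range(k)
    ]


n = 5
p = [Fraction(1, n)] * n             # n equally likely categories
cov = multinomial_covariance(n, p)   # number of trials also set to n

print(cov == centering_matrix(n))    # True
```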