In mathematics and multivariate statistics, the centering matrix (John I. Marden, ''Analyzing and Modeling Rank Data'', Chapman & Hall, 1995, page 59) is a symmetric and idempotent matrix which, when multiplied with a vector, has the same effect as subtracting the mean of the components of the vector from every component of that vector.

Definition

The centering matrix of size ''n'' is defined as the ''n''-by-''n'' matrix:

C_n = I_n - \tfrac{1}{n}J_n

where

I_n\,

is the

identity matrix

of size ''n'' and

J_n

is an ''n''-by-''n'' matrix of all 1's. For example:

C_1 = \begin{bmatrix} 0 \end{bmatrix}

C_2 = \begin{bmatrix}
1 & 0 \\
0 & 1
\end{bmatrix} - \frac{1}{2}\begin{bmatrix}
1 & 1 \\
1 & 1
\end{bmatrix} = \begin{bmatrix}
\frac{1}{2} & -\frac{1}{2} \\
-\frac{1}{2} & \frac{1}{2}
\end{bmatrix}

C_3 = \begin{bmatrix}
1 & 0 & 0 \\
0 & 1 & 0 \\
0 & 0 & 1
\end{bmatrix} - \frac{1}{3}\begin{bmatrix}
1 & 1 & 1 \\
1 & 1 & 1 \\
1 & 1 & 1
\end{bmatrix} = \begin{bmatrix}
\frac{2}{3} & -\frac{1}{3} & -\frac{1}{3} \\
-\frac{1}{3} & \frac{2}{3} & -\frac{1}{3} \\
-\frac{1}{3} & -\frac{1}{3} & \frac{2}{3}
\end{bmatrix}
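
The definition can be illustrated with a minimal Python sketch. The function name `centering_matrix` is an assumption made for this illustration, and exact fractions from the standard library are used so that the entries can be compared directly with the examples for small ''n''.

```python
from fractions import Fraction


def centering_matrix(n):
    """Return C_n = I_n - (1/n) J_n as a list of rows with exact Fraction entries.

    The name and the list-of-rows representation are illustrative assumptions.
    """
    return [
        [Fraction(int(i == j)) - Fraction(1, n) for j in range(n)]
        for i in range(n)
    ]


C2 = centering_matrix(2)
C3 = centering_matrix(3)

# Print C_3 row by row as readable fractions.
for row in C3:
    print([str(entry) for entry in row])
```

Each diagonal entry is 1 − 1/''n'' and each off-diagonal entry is −1/''n'', so no explicit matrix subtraction is needed in the sketch.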

Properties

Given a column-vector,

\mathbf{v}\,

of size ''n'', the centering property of

C_n\,

can be expressed as:

C_n\,\mathbf{v} = \mathbf{v} - (\tfrac{1}{n}J_{n,1}^{\textrm{T}}\mathbf{v})J_{n,1}

where

J_{n,1}

is a column vector of ''n'' ones and

\tfrac{1}{n}J_{n,1}^{\textrm{T}}\mathbf{v}

is the mean of the components of

\mathbf{v}\,

.

C_n\,

is symmetric positive semi-definite.

C_n\,

is idempotent

, so that

C_n^k=C_n

, for

k=1,2,\ldots

. Once the mean has been removed, it is zero and removing it again has no effect.

C_n\,

is singular. The effects of applying the transformation

C_n\,\mathbf{v}

cannot be reversed.

C_n\,

has the

eigenvalue

1 of multiplicity ''n'' − 1 and eigenvalue 0 of multiplicity 1.

C_n\,

has a nullspace of dimension 1, along the vector

J_{n,1}

.

C_n\,

is an orthogonal projection matrix. That is,

C_n\mathbf{v}

is a projection of

\mathbf{v}\,

onto the (''n'' − 1)-dimensional subspace that is orthogonal to the nullspace spanned by

J_{n,1}

. (This is the subspace of all ''n''-vectors whose components sum to zero.) The trace of

C_n

is

n(n-1)/n = n-1

.
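
Several of the properties listed in this section can be checked numerically for a small case. The following is a sketch only: the helper names (`centering_matrix`, `matvec`, `matmul`) and the sample vector are assumptions chosen for illustration, and the check covers a single value of ''n'' rather than proving the general statements.

```python
from fractions import Fraction


def centering_matrix(n):
    """C_n = I_n - (1/n) J_n with exact Fraction entries (illustrative helper)."""
    return [
        [Fraction(int(i == j)) - Fraction(1, n) for j in range(n)]
        for i in range(n)
    ]


def matvec(A, v):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def matmul(A, B):
    """Matrix-matrix product for list-of-rows matrices."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


n = 4
C = centering_matrix(n)
v = [Fraction(x) for x in (3, 1, 4, 2)]  # hypothetical sample vector
mean = sum(v) / n

# Centering property: C_n v equals v with its mean subtracted from each component.
centered = matvec(C, v)
explicit = [x - mean for x in v]

# Symmetry, idempotence, trace, and the nullspace direction J_{n,1}.
is_symmetric = all(C[i][j] == C[j][i] for i in range(n) for j in range(n))
is_idempotent = matmul(C, C) == C
trace = sum(C[i][i] for i in range(n))
ones_image = matvec(C, [Fraction(1)] * n)

print(centered == explicit, is_symmetric, is_idempotent, trace, ones_image)
```

In this sketch the all-ones vector is mapped to the zero vector, which is consistent with the matrix being singular with a one-dimensional nullspace.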

Application

Although multiplication by the centering matrix is not a computationally efficient way of removing the mean from a vector, it is a convenient analytical tool. It can be used not only to remove the mean of a single vector, but also of multiple vectors stored in the rows or columns of an ''m''-by-''n'' matrix

X

. The left multiplication by

C_m

subtracts a corresponding mean value from each of the ''n'' columns, so that each column of the product

C_m\,X

has a zero mean. Similarly, the multiplication by

C_n

on the right subtracts a corresponding mean value from each of the ''m'' rows, and each row of the product

X\,C_n

has a zero mean. The multiplication on both sides creates a doubly centred matrix

C_m\,X\,C_n

, whose row and column means are equal to zero. The centering matrix provides in particular a succinct way to express the scatter matrix,

S=(X-\mu J_{n,1}^{\textrm{T}})(X-\mu J_{n,1}^{\textrm{T}})^{\textrm{T}}

of a data sample

X\,

, where

\mu=\tfrac{1}{n}X J_{n,1}

is the

sample mean

. The centering matrix allows us to express the scatter matrix more compactly as:

S=X\,C_n(X\,C_n)^{\textrm{T}}=X\,C_n\,C_n\,X\,^{\textrm{T}}=X\,C_n\,X\,^{\textrm{T}}.
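
The row, column and double centering described above, and the two expressions for the scatter matrix, can be compared on a small example. This is a hedged sketch: the 2-by-3 data matrix and all helper names are assumptions made for illustration, with one sample per column so that the sample mean is taken across each row.

```python
from fractions import Fraction


def centering_matrix(n):
    """C_n = I_n - (1/n) J_n with exact Fraction entries (illustrative helper)."""
    return [
        [Fraction(int(i == j)) - Fraction(1, n) for j in range(n)]
        for i in range(n)
    ]


def matmul(A, B):
    """Matrix-matrix product for list-of-rows matrices."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


# Hypothetical m-by-n data matrix with m = 2 rows and n = 3 columns.
X = [[Fraction(x) for x in row] for row in ((1, 2, 6), (4, 0, 2))]
m, n = len(X), len(X[0])
Cm, Cn = centering_matrix(m), centering_matrix(n)

col_centered = matmul(Cm, X)                  # C_m X: every column has zero mean
row_centered = matmul(X, Cn)                  # X C_n: every row has zero mean
doubly_centered = matmul(matmul(Cm, X), Cn)   # C_m X C_n

# Scatter matrix from the definition: (X - mu J^T)(X - mu J^T)^T.
mu = [sum(row) / n for row in X]              # sample mean, (1/n) X J_{n,1}
deviations = [[x - mu[i] for x in row] for i, row in enumerate(X)]
S_direct = matmul(deviations, transpose(deviations))

# Compact form: X C_n X^T.
S_compact = matmul(matmul(X, Cn), transpose(X))

print(S_direct == S_compact)
```

As the article notes, forming the centering matrix explicitly is not computationally efficient; subtracting the means directly, as in `deviations`, is the cheaper route in practice.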

C_n

is the covariance matrix of the multinomial distribution, in the special case where the parameters of that distribution are

k=n

, and

p_1=p_2=\cdots=p_n=\frac{1}{n}

.
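
This special case can be checked with a short sketch. It assumes the standard covariance of a multinomial distribution with ''k'' trials and probabilities ''p''<sub>''i''</sub>, namely variances ''k'' ''p''<sub>''i''</sub>(1 − ''p''<sub>''i''</sub>) and covariances −''k'' ''p''<sub>''i''</sub> ''p''<sub>''j''</sub>; that formula is not stated in this article, and the function names are illustrative.

```python
from fractions import Fraction


def centering_matrix(n):
    """C_n = I_n - (1/n) J_n with exact Fraction entries (illustrative helper)."""
    return [
        [Fraction(int(i == j)) - Fraction(1, n) for j in range(n)]
        for i in range(n)
    ]


def multinomial_covariance(k, p):
    """Covariance matrix of a multinomial with k trials and probabilities p.

    Uses the standard formula (an assumption, not taken from the article):
    diagonal k*p_i*(1 - p_i), off-diagonal -k*p_i*p_j.
    """
    size = len(p)
    return [
        [k * p[i] * (1 - p[i]) if i == j else -k * p[i] * p[j] for j in range(size)]
        for i in range(size)
    ]


n = 5
uniform = [Fraction(1, n)] * n            # p_1 = ... = p_n = 1/n
cov = multinomial_covariance(n, uniform)  # k = n trials
matches = cov == centering_matrix(n)
print(matches)
```

With ''k'' = ''n'' and equal probabilities, the diagonal entries reduce to 1 − 1/''n'' and the off-diagonal entries to −1/''n'', which are the entries of the centering matrix.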
