Centering Matrix

In mathematics and multivariate statistics, the centering matrix (John I. Marden, ''Analyzing and Modeling Rank Data'', Chapman & Hall, 1995, page 59) is a symmetric and idempotent matrix which, when multiplied by a vector, has the same effect as subtracting the mean of the components of the vector from every component of that vector.


Definition

The centering matrix of size ''n'' is defined as the ''n''-by-''n'' matrix
:C_n = I_n - \tfrac{1}{n}J_n
where I_n is the identity matrix of size ''n'' and J_n is an ''n''-by-''n'' matrix of all ones. For example
:C_1 = \begin{bmatrix} 0 \end{bmatrix},
:C_2 = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} - \frac{1}{2}\begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix} = \begin{bmatrix} \frac{1}{2} & -\frac{1}{2} \\ -\frac{1}{2} & \frac{1}{2} \end{bmatrix},
:C_3 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} - \frac{1}{3}\begin{bmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix} = \begin{bmatrix} \frac{2}{3} & -\frac{1}{3} & -\frac{1}{3} \\ -\frac{1}{3} & \frac{2}{3} & -\frac{1}{3} \\ -\frac{1}{3} & -\frac{1}{3} & \frac{2}{3} \end{bmatrix}.
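The definition translates directly into array code. The following is a minimal sketch, assuming Python with NumPy; the function name `centering_matrix` is hypothetical (not taken from any particular library). It builds C_n as a dense matrix and checks it against the three examples given in the definition.

```python
import numpy as np


def centering_matrix(n: int) -> np.ndarray:
    """Return the n-by-n centering matrix C_n = I_n - (1/n) J_n.

    Hypothetical helper for illustration; builds a dense matrix.
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    J = np.ones((n, n))        # J_n: the n-by-n matrix of all ones
    return np.eye(n) - J / n   # I_n - (1/n) J_n


# The three worked examples from the definition.
C1 = centering_matrix(1)
C2 = centering_matrix(2)
C3 = centering_matrix(3)

assert np.allclose(C1, [[0.0]])
assert np.allclose(C2, [[1/2, -1/2],
                        [-1/2, 1/2]])
assert np.allclose(C3, [[2/3, -1/3, -1/3],
                        [-1/3, 2/3, -1/3],
                        [-1/3, -1/3, 2/3]])
```

Floating-point entries such as 2/3 are not exact, so the comparisons use `np.allclose` rather than `==`.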


Properties

* Given a column vector \mathbf{v} of size ''n'', the centering property of C_n can be expressed as C_n\mathbf{v} = \mathbf{v} - (\tfrac{1}{n}J_{n,1}^\textrm{T}\mathbf{v})J_{n,1}, where J_{n,1} is a column vector of ones and \tfrac{1}{n}J_{n,1}^\textrm{T}\mathbf{v} is the mean of the components of \mathbf{v}.
* C_n is symmetric positive semi-definite.
* C_n is idempotent, so that C_n^k = C_n for k = 1, 2, \ldots. Once the mean has been removed, it is zero, and removing it again has no effect.
* C_n is singular, so the transformation \mathbf{v} \mapsto C_n\mathbf{v} cannot be reversed: the original mean of \mathbf{v} is lost.
* C_n has the eigenvalue 1 with multiplicity ''n'' − 1 and the eigenvalue 0 with multiplicity 1.
* C_n has a one-dimensional nullspace, spanned by the vector J_{n,1}.
* C_n is an orthogonal projection matrix. That is, C_n\mathbf{v} is the projection of \mathbf{v} onto the (''n'' − 1)-dimensional subspace orthogonal to the nullspace J_{n,1}. (This is the subspace of all ''n''-vectors whose components sum to zero.)
* The trace of C_n is n(n-1)/n = n-1, since each of its ''n'' diagonal entries equals (n-1)/n.
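The properties stated above are easy to spot-check numerically. This is a minimal sketch, assuming Python with NumPy; `centering_matrix` is a hypothetical helper (redefined here so the block stands alone), and the size n = 5 and the vector `v` are arbitrary illustrative choices. Because the comparisons use floating-point tolerances, this checks the identities for one case rather than proving them.

```python
import numpy as np


def centering_matrix(n: int) -> np.ndarray:
    """Return C_n = I_n - (1/n) J_n (hypothetical helper, dense)."""
    return np.eye(n) - np.ones((n, n)) / n


n = 5
C = centering_matrix(n)
ones = np.ones(n)                          # the column vector J_{n,1}
v = np.array([3.0, -1.0, 4.0, 1.0, 5.0])   # arbitrary example vector
Cv = C @ v

# Centering property: C_n v = v - mean(v) * J_{n,1}
assert np.allclose(Cv, v - v.mean() * ones)

# Symmetric and idempotent, so C_n^k = C_n for every k >= 1
assert np.allclose(C, C.T)
assert np.allclose(C @ C, C)
assert np.allclose(np.linalg.matrix_power(C, 4), C)

# eigvalsh returns eigenvalues of a symmetric matrix in ascending order:
# expect 0 once and 1 with multiplicity n - 1 (hence positive semi-definite)
eigenvalues = np.linalg.eigvalsh(C)
assert np.allclose(eigenvalues, [0.0] + [1.0] * (n - 1))
assert np.all(eigenvalues >= -1e-12)

# Singular, with the nullspace spanned by J_{n,1}
assert np.isclose(np.linalg.det(C), 0.0)
assert np.allclose(C @ ones, 0.0)

# Orthogonal projection: the result sums to zero, and the removed part
# (v - C_n v, a multiple of J_{n,1}) is orthogonal to the result
assert np.isclose(Cv.sum(), 0.0)
assert np.isclose((v - Cv) @ Cv, 0.0)

# Trace equals n - 1
assert np.isclose(np.trace(C), n - 1)
```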


Application

Although multiplication by the centering matrix is not a computationally efficient way of removing the mean from a vector (a dense product with C_n costs O(n^2) operations, whereas subtracting the mean directly costs O(n)), it is a convenient analytical tool. It can be used to remove the mean not only of a single vector but also of multiple vectors stored in the rows or columns of an ''m''-by-''n'' matrix X. Left multiplication by C_m subtracts the corresponding mean value from each of the ''n'' columns, so that each column of the product C_m X has zero mean. Similarly, right multiplication by C_n subtracts the corresponding mean value from each of the ''m'' rows, so that each row of the product X C_n has zero mean. Multiplication on both sides yields the doubly centered matrix C_m X C_n, whose row and column means are all zero.

In particular, the centering matrix provides a succinct way to express the scatter matrix
:S = (X - \mu J_{n,1}^\textrm{T})(X - \mu J_{n,1}^\textrm{T})^\textrm{T}
of a data sample X whose ''n'' columns are the observations, where \mu = \tfrac{1}{n}X J_{n,1} is the sample mean. Since X - \mu J_{n,1}^\textrm{T} = X C_n, and C_n is symmetric and idempotent, the scatter matrix can be written more compactly as
:S = X C_n (X C_n)^\textrm{T} = X C_n C_n X^\textrm{T} = X C_n X^\textrm{T}.

C_n is also the covariance matrix of the multinomial distribution in the special case of ''n'' trials, k = n categories, and equal probabilities p_1 = p_2 = \cdots = p_n = \tfrac{1}{n}.
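The applications described above can be checked on a small random data matrix. This is a minimal sketch, assuming Python with NumPy; `centering_matrix` is a hypothetical helper, and the seed and the sizes m = 3, n = 6 are arbitrary. The data matrix stores the ''n'' observations as columns, matching the sample-mean formula. For comparison it uses `np.cov`, which by default treats rows as variables and divides by n − 1. The multinomial check assumes the standard parameterization with n trials and k categories, whose covariance is n(diag(p) − pp^T).

```python
import numpy as np


def centering_matrix(n: int) -> np.ndarray:
    """Return C_n = I_n - (1/n) J_n (hypothetical helper, dense)."""
    return np.eye(n) - np.ones((n, n)) / n


rng = np.random.default_rng(0)
m, n = 3, 6
X = rng.normal(size=(m, n))      # m-by-n data: n observations stored as columns
C_m, C_n = centering_matrix(m), centering_matrix(n)

# Left multiplication centers each column; right multiplication centers each row.
assert np.allclose((C_m @ X).mean(axis=0), 0.0)
assert np.allclose((X @ C_n).mean(axis=1), 0.0)

# Double centering: both row and column means vanish.
D = C_m @ X @ C_n
assert np.allclose(D.mean(axis=0), 0.0) and np.allclose(D.mean(axis=1), 0.0)

# The efficient equivalent of X C_n: subtract each row's mean directly.
assert np.allclose(X @ C_n, X - X.mean(axis=1, keepdims=True))

# Scatter matrix: (X - mu 1^T)(X - mu 1^T)^T equals X C_n X^T.
ones = np.ones((n, 1))           # the column vector J_{n,1}
mu = X @ ones / n                # m-by-1 sample mean
S_direct = (X - mu @ ones.T) @ (X - mu @ ones.T).T
S_compact = X @ C_n @ X.T
assert np.allclose(S_direct, S_compact)
assert np.allclose(S_compact / (n - 1), np.cov(X))

# Multinomial with n trials, k = n categories, p_i = 1/n:
# covariance n * (diag(p) - p p^T) reduces to C_n.
p = np.full(n, 1.0 / n)
multinomial_cov = n * (np.diag(p) - np.outer(p, p))
assert np.allclose(multinomial_cov, C_n)
```

In practice the direct subtraction line is the one to use on real data; the matrix form is mainly useful for deriving identities such as S = X C_n X^T.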

