Centering Matrix

In mathematics and multivariate statistics, the centering matrix (John I. Marden, ''Analyzing and Modeling Rank Data'', Chapman & Hall, 1995, page 59) is a symmetric and idempotent matrix which, when multiplied with a vector, has the same effect as subtracting the mean of the components of the vector from every component of that vector.


Definition

The centering matrix of size ''n'' is defined as the ''n''-by-''n'' matrix
:C_n = I_n - \tfrac{1}{n}J_n
where I_n is the identity matrix of size ''n'' and J_n is an ''n''-by-''n'' matrix of all 1's. For example,
:C_1 = \begin{bmatrix} 0 \end{bmatrix},
:C_2 = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} - \frac{1}{2}\begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix} = \begin{bmatrix} \frac{1}{2} & -\frac{1}{2} \\ -\frac{1}{2} & \frac{1}{2} \end{bmatrix},
:C_3 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} - \frac{1}{3}\begin{bmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix} = \begin{bmatrix} \frac{2}{3} & -\frac{1}{3} & -\frac{1}{3} \\ -\frac{1}{3} & \frac{2}{3} & -\frac{1}{3} \\ -\frac{1}{3} & -\frac{1}{3} & \frac{2}{3} \end{bmatrix}.
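
As a minimal sketch of the definition above, the following Python snippet builds C_n entry by entry and applies it to a small vector. The helper names `centering_matrix` and `mat_vec` are illustrative choices, not part of any library, and exact fractions are used only so that the entries match the examples without rounding.

```python
from fractions import Fraction


def centering_matrix(n):
    """Return C_n = I_n - (1/n) J_n as a list of rows of exact fractions."""
    return [[Fraction(int(i == j)) - Fraction(1, n) for j in range(n)]
            for i in range(n)]


def mat_vec(A, v):
    """Multiply the matrix A (a list of rows) by the vector v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


C3 = centering_matrix(3)

# Example vector (hypothetical data); its mean is 3.
v = [1, 2, 6]
mean = Fraction(sum(v), len(v))

# Multiplying by C_3 should give the same result as subtracting the mean.
centered = mat_vec(C3, v)
expected = [x - mean for x in v]
```

Here `centered` and `expected` hold the same values, which illustrates the centering effect for this one vector; it is a numerical check, not a proof.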


Properties

Given a column vector \mathbf{v} of size ''n'', the centering property of C_n can be expressed as
:C_n\,\mathbf{v} = \mathbf{v} - (\tfrac{1}{n}J_{n,1}^\textrm{T}\mathbf{v})J_{n,1}
where J_{n,1} is a column vector of ones and \tfrac{1}{n}J_{n,1}^\textrm{T}\mathbf{v} is the mean of the components of \mathbf{v}.

* C_n is symmetric positive semi-definite.
* C_n is idempotent, so that C_n^k = C_n for k = 1, 2, \ldots. Once the mean has been removed, it is zero, and removing it again has no effect.
* C_n is singular: the effect of applying the transformation C_n\,\mathbf{v} cannot be reversed.
* C_n has the eigenvalue 1 of multiplicity ''n'' − 1 and the eigenvalue 0 of multiplicity 1.
* C_n has a nullspace of dimension 1, along the vector J_{n,1}.
* C_n is an orthogonal projection matrix. That is, C_n\mathbf{v} is the projection of \mathbf{v} onto the (''n'' − 1)-dimensional subspace that is orthogonal to the nullspace spanned by J_{n,1}. (This is the subspace of all ''n''-vectors whose components sum to zero.)
* The trace of C_n is n \cdot \tfrac{n-1}{n} = n-1, since each of its ''n'' diagonal entries equals \tfrac{n-1}{n}.
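
Several of the properties above can be checked numerically for a particular size. The sketch below does this for ''n'' = 4 using only the Python standard library. The helper names (`centering_matrix`, `mat_mul`, `transpose`) and the choice of ''n'' are assumptions made for illustration, and a check at one size does not replace the general argument.

```python
from fractions import Fraction


def centering_matrix(n):
    """Return C_n = I_n - (1/n) J_n as a list of rows of exact fractions."""
    return [[Fraction(int(i == j)) - Fraction(1, n) for j in range(n)]
            for i in range(n)]


def mat_mul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]


def transpose(A):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*A)]


n = 4  # illustrative size
C = centering_matrix(n)
ones = [[Fraction(1)] for _ in range(n)]  # the column vector of ones

is_symmetric = C == transpose(C)               # C_n equals its transpose
is_idempotent = mat_mul(C, C) == C             # C_n C_n equals C_n
trace = sum(C[i][i] for i in range(n))         # should equal n - 1
# C_n maps the all-ones vector to zero, so that vector lies in the
# nullspace and C_n cannot be invertible.
kills_ones = mat_mul(C, ones) == [[0] for _ in range(n)]
```

Exact fractions are used so that the equality tests are not affected by floating-point rounding.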


Application

Although multiplication by the centering matrix is not a computationally efficient way of removing the mean from a vector, it is a convenient analytical tool. It can be used to remove the mean not only of a single vector, but also of multiple vectors stored in the rows or columns of an ''m''-by-''n'' matrix X.

Left multiplication by C_m subtracts the corresponding mean value from each of the ''n'' columns, so that each column of the product C_m\,X has zero mean. Similarly, right multiplication by C_n subtracts the corresponding mean value from each of the ''m'' rows, so that each row of the product X\,C_n has zero mean. Multiplication on both sides creates a doubly centered matrix C_m\,X\,C_n, whose row and column means are all equal to zero.

The centering matrix provides in particular a succinct way to express the scatter matrix
:S = (X-\mu J_{n,1}^\textrm{T})(X-\mu J_{n,1}^\textrm{T})^\textrm{T}
of a data sample X, where \mu = \tfrac{1}{n}X J_{n,1} is the sample mean. Because X - \mu J_{n,1}^\textrm{T} = X - \tfrac{1}{n}X J_{n,1}J_{n,1}^\textrm{T} = X\,C_n, and C_n is symmetric and idempotent, the scatter matrix can be written more compactly as
:S = X\,C_n(X\,C_n)^\textrm{T} = X\,C_n\,C_n\,X^\textrm{T} = X\,C_n\,X^\textrm{T}.

C_n is the covariance matrix of the multinomial distribution in the special case where the parameters of that distribution are k = n and p_1 = p_2 = \cdots = p_n = \frac{1}{n}.
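
The column, row and double centering described above, together with the compact form of the scatter matrix, can be illustrated on a small example. The following sketch assumes a hypothetical 2-by-3 data matrix `X` and illustrative helper names. It uses explicit matrix products only to mirror the formulas, not because that is an efficient way to center data.

```python
from fractions import Fraction


def centering_matrix(n):
    """Return C_n = I_n - (1/n) J_n as a list of rows of exact fractions."""
    return [[Fraction(int(i == j)) - Fraction(1, n) for j in range(n)]
            for i in range(n)]


def mat_mul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]


def transpose(A):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*A)]


# Hypothetical m-by-n data matrix (m = 2 rows, n = 3 columns).
X = [[1, 2, 6],
     [4, 0, 2]]
m, n = len(X), len(X[0])
Cm, Cn = centering_matrix(m), centering_matrix(n)

col_centered = mat_mul(Cm, X)                    # C_m X: each column has zero mean
row_centered = mat_mul(X, Cn)                    # X C_n: each row has zero mean
doubly_centered = mat_mul(mat_mul(Cm, X), Cn)    # C_m X C_n

# Scatter matrix computed directly: subtract each row's mean, then
# multiply the deviations by their transpose.
mu = [Fraction(sum(row), n) for row in X]
deviations = [[x - mu_i for x in row] for row, mu_i in zip(X, mu)]
S_direct = mat_mul(deviations, transpose(deviations))

# Scatter matrix in the compact form S = X C_n X^T.
S_compact = mat_mul(mat_mul(X, Cn), transpose(X))
```

For this data both `S_direct` and `S_compact` come out as the same 2-by-2 matrix, in line with the identity S = X C_n X^T.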

