In linear algebra and the theory of matrices, the Schur complement of a block matrix is defined as follows.

Suppose ''p'', ''q'' are nonnegative integers, and suppose ''A'', ''B'', ''C'', ''D'' are respectively ''p'' × ''p'', ''p'' × ''q'', ''q'' × ''p'', and ''q'' × ''q'' matrices of complex numbers. Let

:M = \begin{bmatrix} A & B \\ C & D \end{bmatrix}

so that ''M'' is a (''p'' + ''q'') × (''p'' + ''q'') matrix.

If ''D'' is invertible, then the Schur complement of the block ''D'' of the matrix ''M'' is the ''p'' × ''p'' matrix defined by

:M/D := A - BD^{-1}C.

If ''A'' is invertible, the Schur complement of the block ''A'' of the matrix ''M'' is the ''q'' × ''q'' matrix defined by

:M/A := D - CA^{-1}B.

In the case that ''A'' or ''D'' is singular, substituting a generalized inverse for the inverses in ''M/A'' and ''M/D'' yields the generalized Schur complement.

The Schur complement is named after Issai Schur, who used it to prove Schur's lemma, although it had been used previously. Emilie Virginia Haynsworth was the first to call it the ''Schur complement''. The Schur complement is a key tool in the fields of numerical analysis, statistics, and matrix analysis.
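As a concrete illustration, the following short NumPy sketch computes both Schur complements directly from the definition; the block entries are arbitrary assumptions, chosen only so that ''A'' and ''D'' are invertible.

<syntaxhighlight lang="python">
import numpy as np

# Arbitrary illustrative blocks with p = q = 2, chosen so that A and D are invertible.
A = np.array([[4.0, 1.0], [1.0, 3.0]])
B = np.array([[1.0, 0.0], [2.0, 1.0]])
C = np.array([[0.0, 1.0], [1.0, 0.0]])
D = np.array([[5.0, 2.0], [2.0, 4.0]])

M = np.block([[A, B], [C, D]])          # the (p + q) x (p + q) block matrix

# Schur complement of D in M:  M/D = A - B D^{-1} C
M_over_D = A - B @ np.linalg.solve(D, C)

# Schur complement of A in M:  M/A = D - C A^{-1} B
M_over_A = D - C @ np.linalg.solve(A, B)

print(M_over_D)
print(M_over_A)
</syntaxhighlight>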


Background

The Schur complement arises when performing a block Gaussian elimination on the matrix ''M''. In order to eliminate the elements below the block diagonal, one multiplies the matrix ''M'' by a ''block lower triangular'' matrix on the right as follows:

:M = \begin{bmatrix} A & B \\ C & D \end{bmatrix} \quad \to \quad \begin{bmatrix} A & B \\ C & D \end{bmatrix} \begin{bmatrix} I_p & 0 \\ -D^{-1}C & I_q \end{bmatrix} = \begin{bmatrix} A - BD^{-1}C & B \\ 0 & D \end{bmatrix},

where I_p denotes a ''p''×''p'' identity matrix. As a result, the Schur complement M/D = A - BD^{-1}C appears in the upper-left ''p''×''p'' block.

Continuing the elimination process beyond this point (i.e., performing a block Gauss–Jordan elimination),

:\begin{bmatrix} A - BD^{-1}C & B \\ 0 & D \end{bmatrix} \quad \to \quad \begin{bmatrix} I_p & -BD^{-1} \\ 0 & I_q \end{bmatrix} \begin{bmatrix} A - BD^{-1}C & B \\ 0 & D \end{bmatrix} = \begin{bmatrix} A - BD^{-1}C & 0 \\ 0 & D \end{bmatrix},

leads to an LDU decomposition of ''M'', which reads

:M = \begin{bmatrix} A & B \\ C & D \end{bmatrix} = \begin{bmatrix} I_p & BD^{-1} \\ 0 & I_q \end{bmatrix} \begin{bmatrix} A - BD^{-1}C & 0 \\ 0 & D \end{bmatrix} \begin{bmatrix} I_p & 0 \\ D^{-1}C & I_q \end{bmatrix}.

Thus, the inverse of ''M'' may be expressed in terms of D^{-1} and the inverse of the Schur complement, assuming it exists, as

:\begin{align}
M^{-1} = \begin{bmatrix} A & B \\ C & D \end{bmatrix}^{-1}
&= \left( \begin{bmatrix} I_p & BD^{-1} \\ 0 & I_q \end{bmatrix} \begin{bmatrix} A - BD^{-1}C & 0 \\ 0 & D \end{bmatrix} \begin{bmatrix} I_p & 0 \\ D^{-1}C & I_q \end{bmatrix} \right)^{-1} \\
&= \begin{bmatrix} I_p & 0 \\ -D^{-1}C & I_q \end{bmatrix} \begin{bmatrix} \left(A - BD^{-1}C\right)^{-1} & 0 \\ 0 & D^{-1} \end{bmatrix} \begin{bmatrix} I_p & -BD^{-1} \\ 0 & I_q \end{bmatrix} \\
&= \begin{bmatrix} \left(A - BD^{-1}C\right)^{-1} & -\left(A - BD^{-1}C\right)^{-1} BD^{-1} \\ -D^{-1}C\left(A - BD^{-1}C\right)^{-1} & D^{-1} + D^{-1}C\left(A - BD^{-1}C\right)^{-1} BD^{-1} \end{bmatrix} \\
&= \begin{bmatrix} \left(M/D\right)^{-1} & -\left(M/D\right)^{-1} B D^{-1} \\ -D^{-1}C\left(M/D\right)^{-1} & D^{-1} + D^{-1}C\left(M/D\right)^{-1} B D^{-1} \end{bmatrix}.
\end{align}

The above relationship comes from the elimination operations that involve D^{-1} and ''M/D''. An equivalent derivation can be done with the roles of ''A'' and ''D'' interchanged. By equating the expressions for M^{-1} obtained in these two different ways, one can establish the matrix inversion lemma (the Woodbury matrix identity), which relates the two Schur complements of ''M'': ''M/D'' and ''M/A'' (see ''"Derivation from LDU decomposition"'' in the article on the Woodbury matrix identity).
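A minimal NumPy sketch of this derivation is given below; the matrix ''M'' is a randomly generated, diagonally shifted example (an assumption made only so that ''D'' and ''M/D'' are safely invertible). It reconstructs ''M'' from its block LDU factors and checks the resulting block formula for M^{-1}.

<syntaxhighlight lang="python">
import numpy as np

p, q = 2, 3
rng = np.random.default_rng(0)
# Random but well-conditioned example matrix; the diagonal shift keeps D and M/D invertible.
M = rng.standard_normal((p + q, p + q)) + (p + q) * np.eye(p + q)
A, B = M[:p, :p], M[:p, p:]
C, D = M[p:, :p], M[p:, p:]

Dinv = np.linalg.inv(D)
S = A - B @ Dinv @ C                      # Schur complement M/D

# Block LDU factorization: M = U * diag(M/D, D) * L
U = np.block([[np.eye(p), B @ Dinv], [np.zeros((q, p)), np.eye(q)]])
Dblk = np.block([[S, np.zeros((p, q))], [np.zeros((q, p)), D]])
L = np.block([[np.eye(p), np.zeros((p, q))], [Dinv @ C, np.eye(q)]])
print(np.allclose(U @ Dblk @ L, M))       # True: the factorization reproduces M

# Block formula for the inverse in terms of (M/D)^{-1} and D^{-1}
Sinv = np.linalg.inv(S)
Minv = np.block([[Sinv, -Sinv @ B @ Dinv],
                 [-Dinv @ C @ Sinv, Dinv + Dinv @ C @ Sinv @ B @ Dinv]])
print(np.allclose(Minv, np.linalg.inv(M)))  # True: matches the direct inverse
</syntaxhighlight>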


Properties

* If ''p'' and ''q'' are both 1 (i.e., ''A'', ''B'', ''C'' and ''D'' are all scalars), we get the familiar formula for the inverse of a 2-by-2 matrix,
: M^{-1} = \frac{1}{AD - BC} \begin{bmatrix} D & -B \\ -C & A \end{bmatrix},
: provided that ''AD'' − ''BC'' is non-zero.
* In general, if ''A'' is invertible, then
: \begin{align} M &= \begin{bmatrix} A & B \\ C & D \end{bmatrix} = \begin{bmatrix} I_p & 0 \\ CA^{-1} & I_q \end{bmatrix} \begin{bmatrix} A & 0 \\ 0 & D - CA^{-1}B \end{bmatrix} \begin{bmatrix} I_p & A^{-1}B \\ 0 & I_q \end{bmatrix}, \\ M^{-1} &= \begin{bmatrix} A^{-1} + A^{-1} B (M/A)^{-1} C A^{-1} & -A^{-1} B (M/A)^{-1} \\ -(M/A)^{-1} C A^{-1} & (M/A)^{-1} \end{bmatrix}, \end{align}
: whenever this inverse exists.
* (Schur's formula) When ''A'', respectively ''D'', is invertible, the determinant of ''M'' is given by
: \det(M) = \det(A) \det\left(D - CA^{-1} B\right), respectively
: \det(M) = \det(D) \det\left(A - BD^{-1} C\right),
: which generalizes the determinant formula for 2 × 2 matrices.
* (Guttman rank additivity formula) If ''D'' is invertible, then the rank of ''M'' is given by
: \operatorname{rank}(M) = \operatorname{rank}(D) + \operatorname{rank}\left(A - BD^{-1} C\right).
* (Haynsworth inertia additivity formula) If ''A'' is invertible, then the ''inertia'' of the block matrix ''M'' is equal to the inertia of ''A'' plus the inertia of ''M''/''A''.
* (Quotient identity) A/B = ((A/C)/(B/C)).
* The Schur complement of a Laplacian matrix is also a Laplacian matrix.
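The determinant and rank identities from the list above can be illustrated numerically. The sketch below uses arbitrary, diagonally shifted random blocks (an assumption made so that ''A'' and ''D'' are invertible).

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(1)
p, q = 2, 3
A = rng.standard_normal((p, p)) + 3 * np.eye(p)   # assumed invertible by construction
B = rng.standard_normal((p, q))
C = rng.standard_normal((q, p))
D = rng.standard_normal((q, q)) + 3 * np.eye(q)
M = np.block([[A, B], [C, D]])

# Schur's determinant formula: det(M) = det(A) det(D - C A^{-1} B)
M_over_A = D - C @ np.linalg.solve(A, B)
print(np.isclose(np.linalg.det(M), np.linalg.det(A) * np.linalg.det(M_over_A)))

# Guttman rank additivity: rank(M) = rank(D) + rank(A - B D^{-1} C)
M_over_D = A - B @ np.linalg.solve(D, C)
print(np.linalg.matrix_rank(M) ==
      np.linalg.matrix_rank(D) + np.linalg.matrix_rank(M_over_D))
</syntaxhighlight>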


Application to solving linear equations

The Schur complement arises naturally in solving a system of linear equations such as

:\begin{bmatrix} A & B \\ C & D \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} u \\ v \end{bmatrix}.

Assuming that the submatrix A is invertible, we can eliminate x from the equations as follows:

:x = A^{-1}(u - By).

Substituting this expression into the second equation yields

:\left(D - CA^{-1}B\right)y = v - CA^{-1}u.

We refer to this as the ''reduced equation'' obtained by eliminating x from the original equation. The matrix appearing in the reduced equation is called the Schur complement of the first block A in M:

:S \ \overset{\mathrm{def}}{=}\ D - CA^{-1}B.

Solving the reduced equation, we obtain

:y = S^{-1}\left(v - CA^{-1}u\right).

Substituting this into the first equation yields

:x = \left(A^{-1} + A^{-1} B S^{-1} C A^{-1}\right) u - A^{-1} B S^{-1} v.

We can express the above two equations as

:\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} A^{-1} + A^{-1} B S^{-1} C A^{-1} & -A^{-1} B S^{-1} \\ -S^{-1} C A^{-1} & S^{-1} \end{bmatrix} \begin{bmatrix} u \\ v \end{bmatrix}.

Therefore, a formulation for the inverse of a block matrix is

:\begin{bmatrix} A & B \\ C & D \end{bmatrix}^{-1} = \begin{bmatrix} A^{-1} + A^{-1} B S^{-1} C A^{-1} & -A^{-1} B S^{-1} \\ -S^{-1} C A^{-1} & S^{-1} \end{bmatrix} = \begin{bmatrix} I_p & -A^{-1}B \\ 0 & I_q \end{bmatrix} \begin{bmatrix} A^{-1} & 0 \\ 0 & S^{-1} \end{bmatrix} \begin{bmatrix} I_p & 0 \\ -CA^{-1} & I_q \end{bmatrix}.

In particular, we see that the Schur complement is the inverse of the (2,2) block entry of the inverse of M.

In practice, one needs A to be well-conditioned in order for this algorithm to be numerically accurate. In electrical engineering this is often referred to as node elimination or Kron reduction.
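The elimination procedure above translates directly into code. The following sketch (block sizes, entries, and the right-hand side are arbitrary assumptions, with ''A'' shifted so that it is well-conditioned) solves the block system by forming the reduced equation for y, back-substituting for x, and comparing the result with a direct solve of the full system.

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(2)
p, q = 3, 2
A = rng.standard_normal((p, p)) + 4 * np.eye(p)   # assumed well-conditioned
B = rng.standard_normal((p, q))
C = rng.standard_normal((q, p))
D = rng.standard_normal((q, q)) + 4 * np.eye(q)
u = rng.standard_normal(p)
v = rng.standard_normal(q)

# Reduced equation: (D - C A^{-1} B) y = v - C A^{-1} u
S = D - C @ np.linalg.solve(A, B)            # Schur complement of A in M
y = np.linalg.solve(S, v - C @ np.linalg.solve(A, u))
x = np.linalg.solve(A, u - B @ y)            # back-substitution into the first equation

# Check against a direct solve of the full block system
M = np.block([[A, B], [C, D]])
print(np.allclose(np.concatenate([x, y]), np.linalg.solve(M, np.concatenate([u, v]))))
</syntaxhighlight>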


Applications to probability theory and statistics

Suppose the random column vectors ''X'', ''Y'' live in \mathbb{R}^n and \mathbb{R}^m respectively, and the vector (''X'', ''Y'') in \mathbb{R}^{n+m} has a multivariate normal distribution whose covariance is the symmetric positive-definite matrix

:\Sigma = \begin{bmatrix} A & B \\ B^\mathsf{T} & C \end{bmatrix},

where A \in \mathbb{R}^{n \times n} is the covariance matrix of ''X'', C \in \mathbb{R}^{m \times m} is the covariance matrix of ''Y'' and B \in \mathbb{R}^{n \times m} is the covariance matrix between ''X'' and ''Y''.

Then the conditional covariance of ''X'' given ''Y'' is the Schur complement of ''C'' in \Sigma, and the conditional mean is given by the same blocks:

:\begin{align}
\operatorname{Cov}(X \mid Y) &= A - BC^{-1}B^\mathsf{T}, \\
\operatorname{E}(X \mid Y) &= \operatorname{E}(X) + BC^{-1}(Y - \operatorname{E}(Y)).
\end{align}

If we take the matrix \Sigma above to be, not a covariance of a random vector, but a ''sample'' covariance, then it may have a Wishart distribution. In that case, the Schur complement of ''C'' in \Sigma also has a Wishart distribution.
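These conditioning formulas can be evaluated numerically. In the sketch below the covariance \Sigma is an arbitrary assumed example (built as a Gram matrix plus a diagonal shift so that it is symmetric positive definite), and the observed value of ''Y'' is likewise made up for illustration.

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(3)
n, m = 2, 3
G = rng.standard_normal((n + m, n + m))
Sigma = G @ G.T + (n + m) * np.eye(n + m)   # symmetric positive definite by construction
A = Sigma[:n, :n]        # Cov(X)
B = Sigma[:n, n:]        # Cov(X, Y)
C = Sigma[n:, n:]        # Cov(Y)

mu_X = np.zeros(n)
mu_Y = np.zeros(m)
y_obs = rng.standard_normal(m)              # hypothetical observed value of Y

# Conditional covariance: the Schur complement of C in Sigma
cov_X_given_Y = A - B @ np.linalg.solve(C, B.T)
# Conditional mean
mean_X_given_Y = mu_X + B @ np.linalg.solve(C, y_obs - mu_Y)

print(cov_X_given_Y)
print(mean_X_given_Y)
</syntaxhighlight>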


Conditions for positive definiteness and semi-definiteness

Let ''X'' be a symmetric matrix of real numbers given by

:X = \begin{bmatrix} A & B \\ B^\mathsf{T} & C \end{bmatrix}.

Then:
* If ''A'' is invertible, then ''X'' is positive definite if and only if ''A'' and its complement ''X/A'' are both positive definite:
: X \succ 0 \Leftrightarrow A \succ 0,\ X/A = C - B^\mathsf{T} A^{-1} B \succ 0.
* If ''C'' is invertible, then ''X'' is positive definite if and only if ''C'' and its complement ''X/C'' are both positive definite:
: X \succ 0 \Leftrightarrow C \succ 0,\ X/C = A - B C^{-1} B^\mathsf{T} \succ 0.
* If ''A'' is positive definite, then ''X'' is positive semi-definite if and only if the complement ''X/A'' is positive semi-definite:
: \text{if } A \succ 0, \text{ then } X \succeq 0 \Leftrightarrow X/A = C - B^\mathsf{T} A^{-1} B \succeq 0.
* If ''C'' is positive definite, then ''X'' is positive semi-definite if and only if the complement ''X/C'' is positive semi-definite:
: \text{if } C \succ 0, \text{ then } X \succeq 0 \Leftrightarrow X/C = A - B C^{-1} B^\mathsf{T} \succeq 0.

The first and third statements can be derived (Boyd & Vandenberghe 2004, Appendix A.5.5) by considering the minimizer of the quantity

:u^\mathsf{T} A u + 2 v^\mathsf{T} B^\mathsf{T} u + v^\mathsf{T} C v,

as a function of ''v'' (for fixed ''u''). Furthermore, since

:\begin{bmatrix} A & B \\ B^\mathsf{T} & C \end{bmatrix} \succ 0 \Longleftrightarrow \begin{bmatrix} C & B^\mathsf{T} \\ B & A \end{bmatrix} \succ 0

and similarly for positive semi-definite matrices, the second (respectively fourth) statement is immediate from the first (respectively third) statement.

There is also a necessary and sufficient condition for the positive semi-definiteness of ''X'' in terms of a generalized Schur complement. Precisely,
* X \succeq 0 \Leftrightarrow A \succeq 0,\ C - B^\mathsf{T} A^g B \succeq 0,\ \left(I - A A^g\right)B = 0, and
* X \succeq 0 \Leftrightarrow C \succeq 0,\ A - B C^g B^\mathsf{T} \succeq 0,\ \left(I - C C^g\right)B^\mathsf{T} = 0,
where A^g denotes the generalized inverse of A.
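As a numerical illustration of the first criterion, the sketch below uses an arbitrary assumed matrix ''X'' constructed to be symmetric positive definite, and checks that positive definiteness of ''X'' coincides with positive definiteness of ''A'' together with the Schur complement ''X/A''.

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(4)
n, m = 2, 3
G = rng.standard_normal((n + m, n + m))
X = G @ G.T + 0.5 * np.eye(n + m)    # symmetric positive definite by construction
A = X[:n, :n]
B = X[:n, n:]
C = X[n:, n:]

X_over_A = C - B.T @ np.linalg.solve(A, B)   # Schur complement X/A

def is_pd(S):
    # A symmetric matrix is positive definite iff all its eigenvalues are > 0.
    return bool(np.all(np.linalg.eigvalsh(S) > 0))

print(is_pd(X), is_pd(A) and is_pd(X_over_A))   # both should be True
</syntaxhighlight>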


See also

* Woodbury matrix identity
* Quasi-Newton method
* Haynsworth inertia additivity formula
* Gaussian process
* Total least squares


References

* Boyd, S. and Vandenberghe, L. (2004). ''Convex Optimization''. Cambridge University Press. (Appendix A.5.5)