Derivation of the conjugate gradient method
In numerical linear algebra, the conjugate gradient method is an iterative method for numerically solving the linear system

:\boldsymbol{Ax}=\boldsymbol{b}

where \boldsymbol{A} is symmetric positive-definite. The conjugate gradient method can be derived from several different perspectives, including specialization of the conjugate direction method[1] for optimization, and variation of the Arnoldi/Lanczos iteration for eigenvalue problems. The intent of this article is to document the important steps in these derivations.


Derivation from the conjugate direction method

The conjugate gradient method can be seen as a special case of the conjugate direction method applied to minimization of the quadratic function

:f(\boldsymbol{x})=\boldsymbol{x}^\mathrm{T}\boldsymbol{A}\boldsymbol{x}-2\boldsymbol{b}^\mathrm{T}\boldsymbol{x}\text{.}

Since \nabla f(\boldsymbol{x})=2(\boldsymbol{Ax}-\boldsymbol{b}) and \boldsymbol{A} is symmetric positive-definite, the unique minimizer of f is exactly the solution of \boldsymbol{Ax}=\boldsymbol{b}.


The conjugate direction method

In the conjugate direction method for minimizing

:f(\boldsymbol{x})=\boldsymbol{x}^\mathrm{T}\boldsymbol{A}\boldsymbol{x}-2\boldsymbol{b}^\mathrm{T}\boldsymbol{x}\text{,}

one starts with an initial guess \boldsymbol{x}_0 and the corresponding residual \boldsymbol{r}_0=\boldsymbol{b}-\boldsymbol{Ax}_0, and computes the iterate and residual by the formulae

:\begin{align}
\alpha_i&=\frac{\boldsymbol{p}_i^\mathrm{T}\boldsymbol{r}_i}{\boldsymbol{p}_i^\mathrm{T}\boldsymbol{Ap}_i}\text{,}\\
\boldsymbol{x}_{i+1}&=\boldsymbol{x}_i+\alpha_i\boldsymbol{p}_i\text{,}\\
\boldsymbol{r}_{i+1}&=\boldsymbol{r}_i-\alpha_i\boldsymbol{Ap}_i\text{,}
\end{align}

where \boldsymbol{p}_0,\boldsymbol{p}_1,\boldsymbol{p}_2,\ldots are a series of mutually conjugate directions, i.e.,

:\boldsymbol{p}_i^\mathrm{T}\boldsymbol{Ap}_j=0

for any i\neq j.

The conjugate direction method is imprecise in the sense that no formulae are given for selection of the directions \boldsymbol{p}_0,\boldsymbol{p}_1,\boldsymbol{p}_2,\ldots. Specific choices lead to various methods including the conjugate gradient method and Gaussian elimination.
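The following is a minimal NumPy sketch of the conjugate direction update above; the function name and its arguments are illustrative, and the caller is assumed to supply directions \boldsymbol{p}_0,\boldsymbol{p}_1,\ldots that are already mutually conjugate with respect to \boldsymbol{A}.

import numpy as np

def conjugate_direction_solve(A, b, x0, directions):
    # Conjugate direction iteration: the supplied directions p_0, p_1, ...
    # are assumed mutually A-conjugate (p_i^T A p_j = 0 for i != j).
    x = np.array(x0, dtype=float)
    r = b - A @ x                      # r_0 = b - A x_0
    for p in directions:
        Ap = A @ p
        alpha = (p @ r) / (p @ Ap)     # alpha_i = p_i^T r_i / (p_i^T A p_i)
        x = x + alpha * p              # x_{i+1} = x_i + alpha_i p_i
        r = r - alpha * Ap             # r_{i+1} = r_i - alpha_i A p_i
    return x

For example, with \boldsymbol{A} symmetric positive-definite and the eigenvectors of \boldsymbol{A} used as the directions (they are trivially conjugate), the loop reaches the exact solution in at most n steps in exact arithmetic.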


Derivation from the Arnoldi/Lanczos iteration

The conjugate gradient method can also be seen as a variant of the Arnoldi/Lanczos iteration applied to solving linear systems.


The general Arnoldi method

In the Arnoldi iteration, one starts with a vector \boldsymbol{r}_0 and gradually builds an orthonormal basis \{\boldsymbol{v}_1,\boldsymbol{v}_2,\boldsymbol{v}_3,\ldots\} of the Krylov subspace

:\mathcal{K}(\boldsymbol{A},\boldsymbol{r}_0)=\mathrm{span}\{\boldsymbol{r}_0,\boldsymbol{Ar}_0,\boldsymbol{A}^2\boldsymbol{r}_0,\ldots\}

by defining \boldsymbol{v}_i=\boldsymbol{w}_i/\lVert\boldsymbol{w}_i\rVert_2 where

:\boldsymbol{w}_i=\begin{cases}
\boldsymbol{r}_0 & \text{if }i=1\text{,}\\
\boldsymbol{Av}_{i-1}-\sum_{j=1}^{i-1}(\boldsymbol{v}_j^\mathrm{T}\boldsymbol{Av}_{i-1})\boldsymbol{v}_j & \text{if }i>1\text{.}
\end{cases}

In other words, for i>1, \boldsymbol{v}_i is found by Gram-Schmidt orthogonalizing \boldsymbol{Av}_{i-1} against \{\boldsymbol{v}_1,\boldsymbol{v}_2,\ldots,\boldsymbol{v}_{i-1}\} followed by normalization.

Put in matrix form, the iteration is captured by the equation

:\boldsymbol{AV}_i=\boldsymbol{V}_{i+1}\boldsymbol{\tilde{H}}_i

where

:\begin{align}
\boldsymbol{V}_i&=\begin{bmatrix}
\boldsymbol{v}_1 & \boldsymbol{v}_2 & \cdots & \boldsymbol{v}_i
\end{bmatrix}\text{,}\\
\boldsymbol{\tilde{H}}_i&=\begin{bmatrix}
h_{11} & h_{12} & h_{13} & \cdots & h_{1,i}\\
h_{21} & h_{22} & h_{23} & \cdots & h_{2,i}\\
& h_{32} & h_{33} & \cdots & h_{3,i}\\
& & \ddots & \ddots & \vdots\\
& & & h_{i,i-1} & h_{i,i}\\
& & & & h_{i+1,i}
\end{bmatrix}=\begin{bmatrix}
\boldsymbol{H}_i\\
h_{i+1,i}\boldsymbol{e}_i^\mathrm{T}
\end{bmatrix}
\end{align}

with

:h_{ji}=\begin{cases}
\boldsymbol{v}_j^\mathrm{T}\boldsymbol{Av}_i & \text{if }j\leq i\text{,}\\
\lVert\boldsymbol{w}_{i+1}\rVert_2 & \text{if }j=i+1\text{,}\\
0 & \text{if }j>i+1\text{.}
\end{cases}

When applying the Arnoldi iteration to solving linear systems, one starts with \boldsymbol{r}_0=\boldsymbol{b}-\boldsymbol{Ax}_0, the residual corresponding to an initial guess \boldsymbol{x}_0. After each step of iteration, one computes \boldsymbol{y}_i=\boldsymbol{H}_i^{-1}(\lVert\boldsymbol{r}_0\rVert_2\boldsymbol{e}_1) and the new iterate \boldsymbol{x}_i=\boldsymbol{x}_0+\boldsymbol{V}_i\boldsymbol{y}_i.
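A short NumPy sketch of the Arnoldi iteration described above; the function name and arguments are illustrative. It returns \boldsymbol{V}_{m+1} and the (m+1)\times m matrix \boldsymbol{\tilde{H}}_m; for the linear-system use noted above one would then solve \boldsymbol{H}_m\boldsymbol{y}_m=\lVert\boldsymbol{r}_0\rVert_2\boldsymbol{e}_1 and set \boldsymbol{x}_m=\boldsymbol{x}_0+\boldsymbol{V}_m\boldsymbol{y}_m.

import numpy as np

def arnoldi(A, r0, m):
    # Builds an orthonormal basis V = [v_1 ... v_{m+1}] of the Krylov
    # subspace K(A, r0) and the (m+1) x m upper Hessenberg matrix H-tilde
    # satisfying A V_m = V_{m+1} H-tilde.
    n = len(r0)
    V = np.zeros((n, m + 1))
    H = np.zeros((m + 1, m))
    V[:, 0] = r0 / np.linalg.norm(r0)            # v_1 = r_0 / ||r_0||_2
    for i in range(m):
        w = A @ V[:, i]                          # w = A v_i
        for j in range(i + 1):
            H[j, i] = V[:, j] @ w                # h_{j,i} = v_j^T A v_i
        w = w - V[:, :i + 1] @ H[:i + 1, i]      # Gram-Schmidt against v_1..v_i
        H[i + 1, i] = np.linalg.norm(w)          # h_{i+1,i} = ||w_{i+1}||_2
        if H[i + 1, i] == 0.0:                   # invariant subspace reached
            return V[:, :i + 1], H[:i + 1, :i + 1]
        V[:, i + 1] = w / H[i + 1, i]            # v_{i+1} = w_{i+1} / ||w_{i+1}||_2
    return V, H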


The direct Lanczos method

For the rest of discussion, we assume that \boldsymbol{A} is symmetric positive-definite. With symmetry of \boldsymbol{A}, the upper Hessenberg matrix \boldsymbol{H}_i=\boldsymbol{V}_i^\mathrm{T}\boldsymbol{AV}_i becomes symmetric and thus tridiagonal. It then can be more clearly denoted by

:\boldsymbol{H}_i=\begin{bmatrix}
a_1 & b_2\\
b_2 & a_2 & b_3\\
& \ddots & \ddots & \ddots\\
& & b_{i-1} & a_{i-1} & b_i\\
& & & b_i & a_i
\end{bmatrix}\text{.}

This enables a short three-term recurrence for \boldsymbol{v}_i in the iteration, and the Arnoldi iteration is reduced to the Lanczos iteration.

Since \boldsymbol{A} is symmetric positive-definite, so is \boldsymbol{H}_i. Hence, \boldsymbol{H}_i can be LU factorized without partial pivoting into

:\boldsymbol{H}_i=\boldsymbol{L}_i\boldsymbol{U}_i=\begin{bmatrix}
1\\
c_2 & 1\\
& \ddots & \ddots\\
& & c_{i-1} & 1\\
& & & c_i & 1
\end{bmatrix}\begin{bmatrix}
d_1 & b_2\\
& d_2 & b_3\\
& & \ddots & \ddots\\
& & & d_{i-1} & b_i\\
& & & & d_i
\end{bmatrix}

with convenient recurrences for c_i and d_i:

:\begin{align}
c_i&=b_i/d_{i-1}\text{,}\\
d_i&=\begin{cases}
a_1 & \text{if }i=1\text{,}\\
a_i-c_ib_i & \text{if }i>1\text{.}
\end{cases}
\end{align}

Rewrite \boldsymbol{x}_i=\boldsymbol{x}_0+\boldsymbol{V}_i\boldsymbol{y}_i as

:\begin{align}
\boldsymbol{x}_i&=\boldsymbol{x}_0+\boldsymbol{V}_i\boldsymbol{H}_i^{-1}(\lVert\boldsymbol{r}_0\rVert_2\boldsymbol{e}_1)\\
&=\boldsymbol{x}_0+\boldsymbol{V}_i\boldsymbol{U}_i^{-1}\boldsymbol{L}_i^{-1}(\lVert\boldsymbol{r}_0\rVert_2\boldsymbol{e}_1)\\
&=\boldsymbol{x}_0+\boldsymbol{P}_i\boldsymbol{z}_i
\end{align}

with

:\begin{align}
\boldsymbol{P}_i&=\boldsymbol{V}_i\boldsymbol{U}_i^{-1}\text{,}\\
\boldsymbol{z}_i&=\boldsymbol{L}_i^{-1}(\lVert\boldsymbol{r}_0\rVert_2\boldsymbol{e}_1)\text{.}
\end{align}

It is now important to observe that

:\begin{align}
\boldsymbol{P}_i&=\begin{bmatrix}
\boldsymbol{P}_{i-1} & \boldsymbol{p}_i
\end{bmatrix}\text{,}\\
\boldsymbol{z}_i&=\begin{bmatrix}
\boldsymbol{z}_{i-1}\\
\zeta_i
\end{bmatrix}\text{.}
\end{align}

In fact, there are short recurrences for \boldsymbol{p}_i and \zeta_i as well:

:\begin{align}
\boldsymbol{p}_i&=\frac{1}{d_i}(\boldsymbol{v}_i-b_i\boldsymbol{p}_{i-1})\text{,}\\
\zeta_i&=-c_i\zeta_{i-1}\text{.}
\end{align}

With this formulation, we arrive at a simple recurrence for \boldsymbol{x}_i:

:\begin{align}
\boldsymbol{x}_i&=\boldsymbol{x}_0+\boldsymbol{P}_i\boldsymbol{z}_i\\
&=\boldsymbol{x}_0+\boldsymbol{P}_{i-1}\boldsymbol{z}_{i-1}+\zeta_i\boldsymbol{p}_i\\
&=\boldsymbol{x}_{i-1}+\zeta_i\boldsymbol{p}_i\text{.}
\end{align}

The relations above straightforwardly lead to the direct Lanczos method, which turns out to be slightly more complex than the conjugate gradient method derived below.
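Below is a NumPy sketch of the direct Lanczos method assembled from the recurrences above; the function name, its arguments, and the fixed iteration count m are illustrative. The Lanczos three-term recurrence produces \boldsymbol{v}_i, a_i and b_{i+1} one step at a time, the quantities c_i, d_i, \boldsymbol{p}_i and \zeta_i are updated on the fly, and the iterate is advanced by \boldsymbol{x}_i=\boldsymbol{x}_{i-1}+\zeta_i\boldsymbol{p}_i without storing the full Krylov basis.

import numpy as np

def direct_lanczos(A, b, x0, m):
    # Direct Lanczos method: Lanczos three-term recurrence plus the LU
    # recurrences for c_i, d_i, p_i, zeta_i derived above.
    x = np.array(x0, dtype=float)
    r0 = b - A @ x
    beta0 = np.linalg.norm(r0)
    v = r0 / beta0                       # v_1 = r_0 / ||r_0||_2
    v_prev = np.zeros_like(v)
    p_prev = np.zeros_like(v)
    b_i = 0.0                            # no b_1 term (there is no v_0)
    d_prev = zeta = None
    for i in range(1, m + 1):
        w = A @ v
        a_i = v @ w                      # a_i = v_i^T A v_i
        w = w - a_i * v - b_i * v_prev   # three-term recurrence
        if i == 1:
            d_i = a_i                    # d_1 = a_1
            zeta = beta0                 # zeta_1 = ||r_0||_2
        else:
            c_i = b_i / d_prev           # c_i = b_i / d_{i-1}
            d_i = a_i - c_i * b_i        # d_i = a_i - c_i b_i
            zeta = -c_i * zeta           # zeta_i = -c_i zeta_{i-1}
        p = (v - b_i * p_prev) / d_i     # p_i = (v_i - b_i p_{i-1}) / d_i
        x = x + zeta * p                 # x_i = x_{i-1} + zeta_i p_i
        b_next = np.linalg.norm(w)       # b_{i+1} = ||w_{i+1}||_2
        if b_next == 0.0:                # invariant subspace: exact solution reached
            break
        v_prev, v = v, w / b_next        # v_{i+1} = w_{i+1} / b_{i+1}
        b_i, d_prev, p_prev = b_next, d_i, p
    return x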


The conjugate gradient method from imposing orthogonality and conjugacy

If we allow \boldsymbol{p}_i to scale and compensate for the scaling in the constant factor, we potentially can have simpler recurrences of the form:

:\begin{align}
\boldsymbol{x}_i&=\boldsymbol{x}_{i-1}+\alpha_{i-1}\boldsymbol{p}_{i-1}\text{,}\\
\boldsymbol{r}_i&=\boldsymbol{r}_{i-1}-\alpha_{i-1}\boldsymbol{Ap}_{i-1}\text{,}\\
\boldsymbol{p}_i&=\boldsymbol{r}_i+\beta_{i-1}\boldsymbol{p}_{i-1}\text{.}
\end{align}

As premises for the simplification, we now derive the orthogonality of \boldsymbol{r}_i and conjugacy of \boldsymbol{p}_i, i.e., for i\neq j,

:\begin{align}
\boldsymbol{r}_i^\mathrm{T}\boldsymbol{r}_j&=0\text{,}\\
\boldsymbol{p}_i^\mathrm{T}\boldsymbol{Ap}_j&=0\text{.}
\end{align}

The residuals are mutually orthogonal because \boldsymbol{r}_i is essentially a multiple of \boldsymbol{v}_{i+1}: for i=0, \boldsymbol{r}_0=\lVert\boldsymbol{r}_0\rVert_2\boldsymbol{v}_1, and for i>0,

:\begin{align}
\boldsymbol{r}_i&=\boldsymbol{b}-\boldsymbol{Ax}_i\\
&=\boldsymbol{b}-\boldsymbol{A}(\boldsymbol{x}_0+\boldsymbol{V}_i\boldsymbol{y}_i)\\
&=\boldsymbol{r}_0-\boldsymbol{AV}_i\boldsymbol{y}_i\\
&=\boldsymbol{r}_0-\boldsymbol{V}_{i+1}\boldsymbol{\tilde{H}}_i\boldsymbol{y}_i\\
&=\boldsymbol{r}_0-\boldsymbol{V}_i\boldsymbol{H}_i\boldsymbol{y}_i-h_{i+1,i}(\boldsymbol{e}_i^\mathrm{T}\boldsymbol{y}_i)\boldsymbol{v}_{i+1}\\
&=\lVert\boldsymbol{r}_0\rVert_2\boldsymbol{v}_1-\boldsymbol{V}_i(\lVert\boldsymbol{r}_0\rVert_2\boldsymbol{e}_1)-h_{i+1,i}(\boldsymbol{e}_i^\mathrm{T}\boldsymbol{y}_i)\boldsymbol{v}_{i+1}\\
&=-h_{i+1,i}(\boldsymbol{e}_i^\mathrm{T}\boldsymbol{y}_i)\boldsymbol{v}_{i+1}\text{.}
\end{align}

To see the conjugacy of \boldsymbol{p}_i, it suffices to show that \boldsymbol{P}_i^\mathrm{T}\boldsymbol{AP}_i is diagonal:

:\begin{align}
\boldsymbol{P}_i^\mathrm{T}\boldsymbol{AP}_i&=\boldsymbol{U}_i^{-\mathrm{T}}\boldsymbol{V}_i^\mathrm{T}\boldsymbol{AV}_i\boldsymbol{U}_i^{-1}\\
&=\boldsymbol{U}_i^{-\mathrm{T}}\boldsymbol{H}_i\boldsymbol{U}_i^{-1}\\
&=\boldsymbol{U}_i^{-\mathrm{T}}\boldsymbol{L}_i\boldsymbol{U}_i\boldsymbol{U}_i^{-1}\\
&=\boldsymbol{U}_i^{-\mathrm{T}}\boldsymbol{L}_i
\end{align}

is symmetric and lower triangular simultaneously and thus must be diagonal.

Now we can derive the constant factors \alpha_i and \beta_i with respect to the scaled \boldsymbol{p}_i by solely imposing the orthogonality of \boldsymbol{r}_i and conjugacy of \boldsymbol{p}_i.

Due to the orthogonality of \boldsymbol{r}_i, it is necessary that \boldsymbol{r}_{i+1}^\mathrm{T}\boldsymbol{r}_i=(\boldsymbol{r}_i-\alpha_i\boldsymbol{Ap}_i)^\mathrm{T}\boldsymbol{r}_i=0. As a result,

:\begin{align}
\alpha_i&=\frac{\boldsymbol{r}_i^\mathrm{T}\boldsymbol{r}_i}{\boldsymbol{r}_i^\mathrm{T}\boldsymbol{Ap}_i}\\
&=\frac{\boldsymbol{r}_i^\mathrm{T}\boldsymbol{r}_i}{(\boldsymbol{p}_i-\beta_{i-1}\boldsymbol{p}_{i-1})^\mathrm{T}\boldsymbol{Ap}_i}\\
&=\frac{\boldsymbol{r}_i^\mathrm{T}\boldsymbol{r}_i}{\boldsymbol{p}_i^\mathrm{T}\boldsymbol{Ap}_i}\text{.}
\end{align}

Similarly, due to the conjugacy of \boldsymbol{p}_i, it is necessary that \boldsymbol{p}_{i+1}^\mathrm{T}\boldsymbol{Ap}_i=(\boldsymbol{r}_{i+1}+\beta_i\boldsymbol{p}_i)^\mathrm{T}\boldsymbol{Ap}_i=0. As a result,

:\begin{align}
\beta_i&=-\frac{\boldsymbol{r}_{i+1}^\mathrm{T}\boldsymbol{Ap}_i}{\boldsymbol{p}_i^\mathrm{T}\boldsymbol{Ap}_i}\\
&=-\frac{\boldsymbol{r}_{i+1}^\mathrm{T}(\boldsymbol{r}_i-\boldsymbol{r}_{i+1})}{\alpha_i\boldsymbol{p}_i^\mathrm{T}\boldsymbol{Ap}_i}\\
&=\frac{\boldsymbol{r}_{i+1}^\mathrm{T}\boldsymbol{r}_{i+1}}{\boldsymbol{r}_i^\mathrm{T}\boldsymbol{r}_i}\text{.}
\end{align}

This completes the derivation.
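Collecting the recurrences and the expressions for \alpha_i and \beta_i gives the familiar conjugate gradient iteration. The following NumPy sketch is one way to write it; the residual-norm stopping rule and the parameter names are illustrative and not part of the derivation.

import numpy as np

def conjugate_gradient(A, b, x0, tol=1e-10, max_iter=None):
    # Conjugate gradient method for symmetric positive-definite A, using the
    # recurrences and the factors alpha_i, beta_i derived above.
    x = np.array(x0, dtype=float)
    r = b - A @ x                        # r_0 = b - A x_0
    p = r.copy()                         # p_0 = r_0
    rs = r @ r
    steps = len(b) if max_iter is None else max_iter
    for _ in range(steps):
        if np.sqrt(rs) < tol:            # stop once the residual is small
            break
        Ap = A @ p
        alpha = rs / (p @ Ap)            # alpha_i = r_i^T r_i / (p_i^T A p_i)
        x = x + alpha * p                # x_{i+1} = x_i + alpha_i p_i
        r = r - alpha * Ap               # r_{i+1} = r_i - alpha_i A p_i
        rs_new = r @ r
        beta = rs_new / rs               # beta_i = r_{i+1}^T r_{i+1} / (r_i^T r_i)
        p = r + beta * p                 # p_{i+1} = r_{i+1} + beta_i p_i
        rs = rs_new
    return x

In exact arithmetic the residuals stay mutually orthogonal, so the iteration terminates in at most n steps; in floating point it is simply run until the residual norm falls below a tolerance, as in the sketch.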


References

# Conjugate Direction Methods, http://user.it.uu.se/~matsh/opt/f8/node5.html