Derivation of the conjugate gradient method
In numerical linear algebra, the conjugate gradient method is an iterative method for numerically solving the linear system

:\boldsymbol{Ax}=\boldsymbol{b}

where \boldsymbol{A} is symmetric positive-definite. The conjugate gradient method can be derived from several different perspectives, including specialization of the conjugate direction method[1] for optimization, and variation of the Arnoldi/Lanczos iteration for eigenvalue problems. The intent of this article is to document the important steps in these derivations.


Derivation from the conjugate direction method

The conjugate gradient method can be seen as a special case of the conjugate direction method applied to minimization of the quadratic function

:f(\boldsymbol{x})=\boldsymbol{x}^\mathrm{T}\boldsymbol{A}\boldsymbol{x}-2\boldsymbol{b}^\mathrm{T}\boldsymbol{x}\text{.}

Since \nabla f(\boldsymbol{x})=2(\boldsymbol{Ax}-\boldsymbol{b}) and \boldsymbol{A} is symmetric positive-definite, the unique minimizer of f is exactly the solution of \boldsymbol{Ax}=\boldsymbol{b}.


The conjugate direction method

In the conjugate direction method for minimizing

:f(\boldsymbol{x})=\boldsymbol{x}^\mathrm{T}\boldsymbol{A}\boldsymbol{x}-2\boldsymbol{b}^\mathrm{T}\boldsymbol{x}\text{,}

one starts with an initial guess \boldsymbol{x}_0 and the corresponding residual \boldsymbol{r}_0=\boldsymbol{b}-\boldsymbol{Ax}_0, and computes the iterate and residual by the formulae

:\begin{align}
\alpha_i&=\frac{\boldsymbol{p}_i^\mathrm{T}\boldsymbol{r}_i}{\boldsymbol{p}_i^\mathrm{T}\boldsymbol{Ap}_i}\text{,}\\
\boldsymbol{x}_{i+1}&=\boldsymbol{x}_i+\alpha_i\boldsymbol{p}_i\text{,}\\
\boldsymbol{r}_{i+1}&=\boldsymbol{r}_i-\alpha_i\boldsymbol{Ap}_i\text{,}
\end{align}

where \boldsymbol{p}_0,\boldsymbol{p}_1,\boldsymbol{p}_2,\ldots are a series of mutually conjugate directions, i.e.,

:\boldsymbol{p}_i^\mathrm{T}\boldsymbol{Ap}_j=0

for any i\neq j.

The conjugate direction method is imprecise in the sense that no formulae are given for selection of the directions \boldsymbol{p}_0,\boldsymbol{p}_1,\boldsymbol{p}_2,\ldots. Specific choices lead to various methods including the conjugate gradient method and Gaussian elimination.
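The following is a minimal NumPy sketch of the conjugate direction update above; the function name and its arguments are illustrative, and the caller is assumed to supply directions \boldsymbol{p}_0,\boldsymbol{p}_1,\ldots that are already mutually conjugate with respect to \boldsymbol{A}.

import numpy as np

def conjugate_direction_solve(A, b, x0, directions):
    # Conjugate direction iteration: the supplied directions p_0, p_1, ...
    # are assumed mutually A-conjugate (p_i^T A p_j = 0 for i != j).
    x = np.array(x0, dtype=float)
    r = b - A @ x                      # r_0 = b - A x_0
    for p in directions:
        Ap = A @ p
        alpha = (p @ r) / (p @ Ap)     # alpha_i = p_i^T r_i / (p_i^T A p_i)
        x = x + alpha * p              # x_{i+1} = x_i + alpha_i p_i
        r = r - alpha * Ap             # r_{i+1} = r_i - alpha_i A p_i
    return x

For example, with \boldsymbol{A} symmetric positive-definite and the eigenvectors of \boldsymbol{A} used as the directions (they are trivially conjugate), the loop reaches the exact solution in at most n steps in exact arithmetic.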


Derivation from the Arnoldi/Lanczos iteration

The conjugate gradient method can also be seen as a variant of the Arnoldi/Lanczos iteration applied to solving linear systems.


The general Arnoldi method

In the Arnoldi iteration, one starts with a vector \boldsymbol{r}_0 and gradually builds an orthonormal basis \{\boldsymbol{v}_1,\boldsymbol{v}_2,\boldsymbol{v}_3,\ldots\} of the Krylov subspace

:\mathcal{K}(\boldsymbol{A},\boldsymbol{r}_0)=\mathrm{span}\{\boldsymbol{r}_0,\boldsymbol{Ar}_0,\boldsymbol{A}^2\boldsymbol{r}_0,\ldots\}

by defining \boldsymbol{v}_i=\boldsymbol{w}_i/\lVert\boldsymbol{w}_i\rVert_2 where

:\boldsymbol{w}_i=\begin{cases}
\boldsymbol{r}_0 & \text{if }i=1\text{,}\\
\boldsymbol{Av}_{i-1}-\sum_{j=1}^{i-1}(\boldsymbol{v}_j^\mathrm{T}\boldsymbol{Av}_{i-1})\boldsymbol{v}_j & \text{if }i>1\text{.}
\end{cases}

In other words, for i>1, \boldsymbol{v}_i is found by Gram-Schmidt orthogonalizing \boldsymbol{Av}_{i-1} against \{\boldsymbol{v}_1,\boldsymbol{v}_2,\ldots,\boldsymbol{v}_{i-1}\} followed by normalization.

Put in matrix form, the iteration is captured by the equation

:\boldsymbol{AV}_i=\boldsymbol{V}_{i+1}\boldsymbol{\tilde{H}}_i

where

:\begin{align}
\boldsymbol{V}_i&=\begin{bmatrix}
\boldsymbol{v}_1 & \boldsymbol{v}_2 & \cdots & \boldsymbol{v}_i
\end{bmatrix}\text{,}\\
\boldsymbol{\tilde{H}}_i&=\begin{bmatrix}
h_{11} & h_{12} & h_{13} & \cdots & h_{1,i}\\
h_{21} & h_{22} & h_{23} & \cdots & h_{2,i}\\
& h_{32} & h_{33} & \cdots & h_{3,i}\\
& & \ddots & \ddots & \vdots\\
& & & h_{i,i-1} & h_{i,i}\\
& & & & h_{i+1,i}
\end{bmatrix}=\begin{bmatrix}
\boldsymbol{H}_i\\
h_{i+1,i}\boldsymbol{e}_i^\mathrm{T}
\end{bmatrix}
\end{align}

with

:h_{ji}=\begin{cases}
\boldsymbol{v}_j^\mathrm{T}\boldsymbol{Av}_i & \text{if }j\leq i\text{,}\\
\lVert\boldsymbol{w}_{i+1}\rVert_2 & \text{if }j=i+1\text{,}\\
0 & \text{if }j>i+1\text{.}
\end{cases}

When applying the Arnoldi iteration to solving linear systems, one starts with \boldsymbol{r}_0=\boldsymbol{b}-\boldsymbol{Ax}_0, the residual corresponding to an initial guess \boldsymbol{x}_0. After each step of iteration, one computes \boldsymbol{y}_i=\boldsymbol{H}_i^{-1}(\lVert\boldsymbol{r}_0\rVert_2\boldsymbol{e}_1) and the new iterate \boldsymbol{x}_i=\boldsymbol{x}_0+\boldsymbol{V}_i\boldsymbol{y}_i.
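A short NumPy sketch of the Arnoldi iteration described above; the function name and arguments are illustrative. It returns \boldsymbol{V}_{m+1} and the (m+1)\times m matrix \boldsymbol{\tilde{H}}_m; for the linear-system use noted above one would then solve \boldsymbol{H}_m\boldsymbol{y}_m=\lVert\boldsymbol{r}_0\rVert_2\boldsymbol{e}_1 and set \boldsymbol{x}_m=\boldsymbol{x}_0+\boldsymbol{V}_m\boldsymbol{y}_m.

import numpy as np

def arnoldi(A, r0, m):
    # Builds an orthonormal basis V = [v_1 ... v_{m+1}] of the Krylov
    # subspace K(A, r0) and the (m+1) x m upper Hessenberg matrix H-tilde
    # satisfying A V_m = V_{m+1} H-tilde.
    n = len(r0)
    V = np.zeros((n, m + 1))
    H = np.zeros((m + 1, m))
    V[:, 0] = r0 / np.linalg.norm(r0)            # v_1 = r_0 / ||r_0||_2
    for i in range(m):
        w = A @ V[:, i]                          # w = A v_i
        for j in range(i + 1):
            H[j, i] = V[:, j] @ w                # h_{j,i} = v_j^T A v_i
        w = w - V[:, :i + 1] @ H[:i + 1, i]      # Gram-Schmidt against v_1..v_i
        H[i + 1, i] = np.linalg.norm(w)          # h_{i+1,i} = ||w_{i+1}||_2
        if H[i + 1, i] == 0.0:                   # invariant subspace reached
            return V[:, :i + 1], H[:i + 1, :i + 1]
        V[:, i + 1] = w / H[i + 1, i]            # v_{i+1} = w_{i+1} / ||w_{i+1}||_2
    return V, H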


The direct Lanczos method

For the rest of discussion, we assume that \boldsymbol{A} is symmetric positive-definite. With symmetry of \boldsymbol{A}, the upper Hessenberg matrix \boldsymbol{H}_i=\boldsymbol{V}_i^\mathrm{T}\boldsymbol{AV}_i becomes symmetric and thus tridiagonal. It then can be more clearly denoted by

:\boldsymbol{H}_i=\begin{bmatrix}
a_1 & b_2\\
b_2 & a_2 & b_3\\
& \ddots & \ddots & \ddots\\
& & b_{i-1} & a_{i-1} & b_i\\
& & & b_i & a_i
\end{bmatrix}\text{.}

This enables a short three-term recurrence for \boldsymbol{v}_i in the iteration, and the Arnoldi iteration is reduced to the Lanczos iteration.

Since \boldsymbol{A} is symmetric positive-definite, so is \boldsymbol{H}_i. Hence, \boldsymbol{H}_i can be LU factorized without partial pivoting into

:\boldsymbol{H}_i=\boldsymbol{L}_i\boldsymbol{U}_i=\begin{bmatrix}
1\\
c_2 & 1\\
& \ddots & \ddots\\
& & c_{i-1} & 1\\
& & & c_i & 1
\end{bmatrix}\begin{bmatrix}
d_1 & b_2\\
& d_2 & b_3\\
& & \ddots & \ddots\\
& & & d_{i-1} & b_i\\
& & & & d_i
\end{bmatrix}

with convenient recurrences for c_i and d_i:

:\begin{align}
c_i&=b_i/d_{i-1}\text{,}\\
d_i&=\begin{cases}
a_1 & \text{if }i=1\text{,}\\
a_i-c_ib_i & \text{if }i>1\text{.}
\end{cases}
\end{align}

Rewrite \boldsymbol{x}_i=\boldsymbol{x}_0+\boldsymbol{V}_i\boldsymbol{y}_i as

:\begin{align}
\boldsymbol{x}_i&=\boldsymbol{x}_0+\boldsymbol{V}_i\boldsymbol{H}_i^{-1}(\lVert\boldsymbol{r}_0\rVert_2\boldsymbol{e}_1)\\
&=\boldsymbol{x}_0+\boldsymbol{V}_i\boldsymbol{U}_i^{-1}\boldsymbol{L}_i^{-1}(\lVert\boldsymbol{r}_0\rVert_2\boldsymbol{e}_1)\\
&=\boldsymbol{x}_0+\boldsymbol{P}_i\boldsymbol{z}_i
\end{align}

with

:\begin{align}
\boldsymbol{P}_i&=\boldsymbol{V}_i\boldsymbol{U}_i^{-1}\text{,}\\
\boldsymbol{z}_i&=\boldsymbol{L}_i^{-1}(\lVert\boldsymbol{r}_0\rVert_2\boldsymbol{e}_1)\text{.}
\end{align}

It is now important to observe that

:\begin{align}
\boldsymbol{P}_i&=\begin{bmatrix}
\boldsymbol{P}_{i-1} & \boldsymbol{p}_i
\end{bmatrix}\text{,}\\
\boldsymbol{z}_i&=\begin{bmatrix}
\boldsymbol{z}_{i-1}\\
\zeta_i
\end{bmatrix}\text{.}
\end{align}

In fact, there are short recurrences for \boldsymbol{p}_i and \zeta_i as well:

:\begin{align}
\boldsymbol{p}_i&=\frac{1}{d_i}(\boldsymbol{v}_i-b_i\boldsymbol{p}_{i-1})\text{,}\\
\zeta_i&=-c_i\zeta_{i-1}\text{.}
\end{align}

With this formulation, we arrive at a simple recurrence for \boldsymbol{x}_i:

:\begin{align}
\boldsymbol{x}_i&=\boldsymbol{x}_0+\boldsymbol{P}_i\boldsymbol{z}_i\\
&=\boldsymbol{x}_0+\boldsymbol{P}_{i-1}\boldsymbol{z}_{i-1}+\zeta_i\boldsymbol{p}_i\\
&=\boldsymbol{x}_{i-1}+\zeta_i\boldsymbol{p}_i\text{.}
\end{align}

The relations above straightforwardly lead to the direct Lanczos method, which turns out to be slightly more complex than the conjugate gradient method derived below.
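Below is a NumPy sketch of the direct Lanczos method assembled from the recurrences above; the function name, its arguments, and the fixed iteration count m are illustrative. The Lanczos three-term recurrence produces \boldsymbol{v}_i, a_i and b_{i+1} one step at a time, the quantities c_i, d_i, \boldsymbol{p}_i and \zeta_i are updated on the fly, and the iterate is advanced by \boldsymbol{x}_i=\boldsymbol{x}_{i-1}+\zeta_i\boldsymbol{p}_i without storing the full Krylov basis.

import numpy as np

def direct_lanczos(A, b, x0, m):
    # Direct Lanczos method: Lanczos three-term recurrence plus the LU
    # recurrences for c_i, d_i, p_i, zeta_i derived above.
    x = np.array(x0, dtype=float)
    r0 = b - A @ x
    beta0 = np.linalg.norm(r0)
    v = r0 / beta0                       # v_1 = r_0 / ||r_0||_2
    v_prev = np.zeros_like(v)
    p_prev = np.zeros_like(v)
    b_i = 0.0                            # no b_1 term (there is no v_0)
    d_prev = zeta = None
    for i in range(1, m + 1):
        w = A @ v
        a_i = v @ w                      # a_i = v_i^T A v_i
        w = w - a_i * v - b_i * v_prev   # three-term recurrence
        if i == 1:
            d_i = a_i                    # d_1 = a_1
            zeta = beta0                 # zeta_1 = ||r_0||_2
        else:
            c_i = b_i / d_prev           # c_i = b_i / d_{i-1}
            d_i = a_i - c_i * b_i        # d_i = a_i - c_i b_i
            zeta = -c_i * zeta           # zeta_i = -c_i zeta_{i-1}
        p = (v - b_i * p_prev) / d_i     # p_i = (v_i - b_i p_{i-1}) / d_i
        x = x + zeta * p                 # x_i = x_{i-1} + zeta_i p_i
        b_next = np.linalg.norm(w)       # b_{i+1} = ||w_{i+1}||_2
        if b_next == 0.0:                # invariant subspace: exact solution reached
            break
        v_prev, v = v, w / b_next        # v_{i+1} = w_{i+1} / b_{i+1}
        b_i, d_prev, p_prev = b_next, d_i, p
    return x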


The conjugate gradient method from imposing orthogonality and conjugacy

If we allow \boldsymbol{p}_i to scale and compensate for the scaling in the constant factor, we potentially can have simpler recurrences of the form:

:\begin{align}
\boldsymbol{x}_i&=\boldsymbol{x}_{i-1}+\alpha_{i-1}\boldsymbol{p}_{i-1}\text{,}\\
\boldsymbol{r}_i&=\boldsymbol{r}_{i-1}-\alpha_{i-1}\boldsymbol{Ap}_{i-1}\text{,}\\
\boldsymbol{p}_i&=\boldsymbol{r}_i+\beta_{i-1}\boldsymbol{p}_{i-1}\text{.}
\end{align}

As premises for the simplification, we now derive the orthogonality of \boldsymbol{r}_i and conjugacy of \boldsymbol{p}_i, i.e., for i\neq j,

:\begin{align}
\boldsymbol{r}_i^\mathrm{T}\boldsymbol{r}_j&=0\text{,}\\
\boldsymbol{p}_i^\mathrm{T}\boldsymbol{Ap}_j&=0\text{.}
\end{align}

The residuals are mutually orthogonal because \boldsymbol{r}_i is essentially a multiple of \boldsymbol{v}_{i+1}: for i=0, \boldsymbol{r}_0=\lVert\boldsymbol{r}_0\rVert_2\boldsymbol{v}_1, and for i>0,

:\begin{align}
\boldsymbol{r}_i&=\boldsymbol{b}-\boldsymbol{Ax}_i\\
&=\boldsymbol{b}-\boldsymbol{A}(\boldsymbol{x}_0+\boldsymbol{V}_i\boldsymbol{y}_i)\\
&=\boldsymbol{r}_0-\boldsymbol{AV}_i\boldsymbol{y}_i\\
&=\boldsymbol{r}_0-\boldsymbol{V}_{i+1}\boldsymbol{\tilde{H}}_i\boldsymbol{y}_i\\
&=\boldsymbol{r}_0-\boldsymbol{V}_i\boldsymbol{H}_i\boldsymbol{y}_i-h_{i+1,i}(\boldsymbol{e}_i^\mathrm{T}\boldsymbol{y}_i)\boldsymbol{v}_{i+1}\\
&=\lVert\boldsymbol{r}_0\rVert_2\boldsymbol{v}_1-\boldsymbol{V}_i(\lVert\boldsymbol{r}_0\rVert_2\boldsymbol{e}_1)-h_{i+1,i}(\boldsymbol{e}_i^\mathrm{T}\boldsymbol{y}_i)\boldsymbol{v}_{i+1}\\
&=-h_{i+1,i}(\boldsymbol{e}_i^\mathrm{T}\boldsymbol{y}_i)\boldsymbol{v}_{i+1}\text{.}
\end{align}

To see the conjugacy of \boldsymbol{p}_i, it suffices to show that \boldsymbol{P}_i^\mathrm{T}\boldsymbol{AP}_i is diagonal:

:\begin{align}
\boldsymbol{P}_i^\mathrm{T}\boldsymbol{AP}_i&=\boldsymbol{U}_i^{-\mathrm{T}}\boldsymbol{V}_i^\mathrm{T}\boldsymbol{AV}_i\boldsymbol{U}_i^{-1}\\
&=\boldsymbol{U}_i^{-\mathrm{T}}\boldsymbol{H}_i\boldsymbol{U}_i^{-1}\\
&=\boldsymbol{U}_i^{-\mathrm{T}}\boldsymbol{L}_i\boldsymbol{U}_i\boldsymbol{U}_i^{-1}\\
&=\boldsymbol{U}_i^{-\mathrm{T}}\boldsymbol{L}_i
\end{align}

is symmetric and lower triangular simultaneously and thus must be diagonal.

Now we can derive the constant factors \alpha_i and \beta_i with respect to the scaled \boldsymbol{p}_i by solely imposing the orthogonality of \boldsymbol{r}_i and conjugacy of \boldsymbol{p}_i.

Due to the orthogonality of \boldsymbol{r}_i, it is necessary that \boldsymbol{r}_{i+1}^\mathrm{T}\boldsymbol{r}_i=(\boldsymbol{r}_i-\alpha_i\boldsymbol{Ap}_i)^\mathrm{T}\boldsymbol{r}_i=0. As a result,

:\begin{align}
\alpha_i&=\frac{\boldsymbol{r}_i^\mathrm{T}\boldsymbol{r}_i}{\boldsymbol{r}_i^\mathrm{T}\boldsymbol{Ap}_i}\\
&=\frac{\boldsymbol{r}_i^\mathrm{T}\boldsymbol{r}_i}{(\boldsymbol{p}_i-\beta_{i-1}\boldsymbol{p}_{i-1})^\mathrm{T}\boldsymbol{Ap}_i}\\
&=\frac{\boldsymbol{r}_i^\mathrm{T}\boldsymbol{r}_i}{\boldsymbol{p}_i^\mathrm{T}\boldsymbol{Ap}_i}\text{.}
\end{align}

Similarly, due to the conjugacy of \boldsymbol{p}_i, it is necessary that \boldsymbol{p}_{i+1}^\mathrm{T}\boldsymbol{Ap}_i=(\boldsymbol{r}_{i+1}+\beta_i\boldsymbol{p}_i)^\mathrm{T}\boldsymbol{Ap}_i=0. As a result,

:\begin{align}
\beta_i&=-\frac{\boldsymbol{r}_{i+1}^\mathrm{T}\boldsymbol{Ap}_i}{\boldsymbol{p}_i^\mathrm{T}\boldsymbol{Ap}_i}\\
&=-\frac{\boldsymbol{r}_{i+1}^\mathrm{T}(\boldsymbol{r}_i-\boldsymbol{r}_{i+1})}{\alpha_i\boldsymbol{p}_i^\mathrm{T}\boldsymbol{Ap}_i}\\
&=\frac{\boldsymbol{r}_{i+1}^\mathrm{T}\boldsymbol{r}_{i+1}}{\boldsymbol{r}_i^\mathrm{T}\boldsymbol{r}_i}\text{.}
\end{align}

This completes the derivation.
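Collecting the recurrences and the expressions for \alpha_i and \beta_i gives the familiar conjugate gradient iteration. The following NumPy sketch is one way to write it; the residual-norm stopping rule and the parameter names are illustrative and not part of the derivation.

import numpy as np

def conjugate_gradient(A, b, x0, tol=1e-10, max_iter=None):
    # Conjugate gradient method for symmetric positive-definite A, using the
    # recurrences and the factors alpha_i, beta_i derived above.
    x = np.array(x0, dtype=float)
    r = b - A @ x                        # r_0 = b - A x_0
    p = r.copy()                         # p_0 = r_0
    rs = r @ r
    steps = len(b) if max_iter is None else max_iter
    for _ in range(steps):
        if np.sqrt(rs) < tol:            # stop once the residual is small
            break
        Ap = A @ p
        alpha = rs / (p @ Ap)            # alpha_i = r_i^T r_i / (p_i^T A p_i)
        x = x + alpha * p                # x_{i+1} = x_i + alpha_i p_i
        r = r - alpha * Ap               # r_{i+1} = r_i - alpha_i A p_i
        rs_new = r @ r
        beta = rs_new / rs               # beta_i = r_{i+1}^T r_{i+1} / (r_i^T r_i)
        p = r + beta * p                 # p_{i+1} = r_{i+1} + beta_i p_i
        rs = rs_new
    return x

In exact arithmetic the residuals stay mutually orthogonal, so the iteration terminates in at most n steps; in floating point it is simply run until the residual norm falls below a tolerance, as in the sketch.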


References

# Conjugate Direction Methods, http://user.it.uu.se/~matsh/opt/f8/node5.html