In numerical linear algebra, the conjugate gradient method is an iterative method for numerically solving the linear system
:<math>Ax = b</math>
where <math>A</math> is symmetric positive-definite. The conjugate gradient method can be derived from several different perspectives, including specialization of the conjugate direction method [Conjugate Direction Methods, http://user.it.uu.se/~matsh/opt/f8/node5.html] for optimization, and variation of the Arnoldi/Lanczos iteration for eigenvalue problems.
The intent of this article is to document the important steps in these derivations.
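Before the derivations, the following short sketch (not part of the article's material) only fixes ideas by solving one symmetric positive-definite system with SciPy's conjugate gradient routine; the random test matrix and the use of SciPy/NumPy are assumptions made purely for illustration.
<syntaxhighlight lang="python">
# A minimal usage sketch: solve an SPD system with SciPy's conjugate gradient.
import numpy as np
from scipy.sparse.linalg import cg

rng = np.random.default_rng(0)
M = rng.standard_normal((50, 50))
A = M @ M.T + 50 * np.eye(50)        # symmetric positive-definite by construction
b = rng.standard_normal(50)

x, info = cg(A, b)                   # info == 0 signals convergence
print(info, np.linalg.norm(A @ x - b))
</syntaxhighlight>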
Derivation from the conjugate direction method
The conjugate gradient method can be seen as a special case of the conjugate direction method applied to minimization of the quadratic function
:<math>f(x) = x^\mathrm{T} A x - 2 b^\mathrm{T} x.</math>
The conjugate direction method
In the conjugate direction method for minimizing
:<math>f(x) = x^\mathrm{T} A x - 2 b^\mathrm{T} x,</math>
one starts with an initial guess <math>x_0</math> and the corresponding residual <math>r_0 = b - Ax_0</math>, and computes the iterate and residual by the formulae
:<math>\begin{align}
\alpha_i &= \frac{p_i^\mathrm{T} r_i}{p_i^\mathrm{T} A p_i},\\
x_{i+1} &= x_i + \alpha_i p_i,\\
r_{i+1} &= r_i - \alpha_i A p_i,
\end{align}</math>
where <math>p_0, p_1, p_2, \ldots</math> are a series of mutually conjugate directions, i.e.,
:<math>p_i^\mathrm{T} A p_j = 0</math>
for any <math>i \neq j</math>.
The conjugate direction method is imprecise in the sense that no formulae are given for selection of the directions <math>p_0, p_1, p_2, \ldots</math>. Specific choices lead to various methods including the conjugate gradient method and Gaussian elimination.
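The conjugate direction iteration above can be illustrated with a short NumPy sketch. It assumes a full set of mutually conjugate directions is already available; here they are produced by Gram-Schmidt orthogonalization of the standard basis in the <math>A</math>-inner product (one specific choice among those mentioned above). The function names and the random test matrix are illustrative assumptions, not part of the original method description.
<syntaxhighlight lang="python">
# A sketch of the conjugate direction iteration.  The directions are obtained by
# A-orthogonalizing the standard basis (an illustrative choice); the updates for
# alpha_i, x_{i+1} and r_{i+1} follow the formulae in the text.
import numpy as np

def conjugate_directions(A):
    """Gram-Schmidt in the A-inner product applied to the standard basis."""
    n = A.shape[0]
    dirs = []
    for k in range(n):
        p = np.eye(n)[:, k]
        for q in dirs:
            p = p - (q @ A @ p) / (q @ A @ q) * q   # remove the A-component along q
        dirs.append(p)
    return dirs

def conjugate_direction_solve(A, b, x0, dirs):
    x, r = x0.copy(), b - A @ x0
    for p in dirs:
        alpha = (p @ r) / (p @ A @ p)   # alpha_i = p_i^T r_i / p_i^T A p_i
        x = x + alpha * p               # x_{i+1} = x_i + alpha_i p_i
        r = r - alpha * (A @ p)         # r_{i+1} = r_i - alpha_i A p_i
    return x

rng = np.random.default_rng(1)
M = rng.standard_normal((5, 5))
A = M @ M.T + 5 * np.eye(5)             # arbitrary symmetric positive-definite example
b = rng.standard_normal(5)
x = conjugate_direction_solve(A, b, np.zeros(5), conjugate_directions(A))
print(np.linalg.norm(A @ x - b))        # near machine precision after n steps
</syntaxhighlight>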
Derivation from the Arnoldi/Lanczos iteration
The conjugate gradient method can also be seen as a variant of the Arnoldi/Lanczos iteration applied to solving linear systems.
The general Arnoldi method
In the Arnoldi iteration, one starts with a vector <math>r_0</math> and gradually builds an orthonormal basis <math>\{v_1, v_2, v_3, \ldots\}</math> of the Krylov subspace
:<math>\mathcal{K}(A, r_0) = \operatorname{span}\{r_0, Ar_0, A^2 r_0, \ldots\}</math>
by defining <math>v_i = w_i/\lVert w_i\rVert_2</math> where
:<math>w_i = \begin{cases}
r_0 & \text{if } i = 1,\\
A v_{i-1} - \sum_{j=1}^{i-1} (v_j^\mathrm{T} A v_{i-1}) v_j & \text{if } i > 1.
\end{cases}</math>
In other words, for <math>i > 1</math>, <math>v_i</math> is found by Gram-Schmidt orthogonalizing <math>A v_{i-1}</math> against <math>\{v_1, v_2, \ldots, v_{i-1}\}</math> followed by normalization.
Put in matrix form, the iteration is captured by the equation
:<math>A V_i = V_{i+1} \tilde{H}_i</math>
where
:<math>V_i = \begin{bmatrix} v_1 & v_2 & \cdots & v_i \end{bmatrix},\qquad
\tilde{H}_i = \begin{bmatrix}
h_{11} & h_{12} & h_{13} & \cdots & h_{1,i}\\
h_{21} & h_{22} & h_{23} & \cdots & h_{2,i}\\
 & h_{32} & h_{33} & \cdots & h_{3,i}\\
 & & \ddots & \ddots & \vdots\\
 & & & h_{i,i-1} & h_{i,i}\\
 & & & & h_{i+1,i}
\end{bmatrix} = \begin{bmatrix} H_i \\ h_{i+1,i} e_i^\mathrm{T} \end{bmatrix}</math>
with
:<math>h_{ji} = \begin{cases}
v_j^\mathrm{T} A v_i & \text{if } j \leq i,\\
\lVert w_{i+1}\rVert_2 & \text{if } j = i + 1,\\
0 & \text{otherwise.}
\end{cases}</math>
When applying the Arnoldi iteration to solving linear systems, one starts with <math>r_0 = b - Ax_0</math>, the residual corresponding to an initial guess <math>x_0</math>. After each step of iteration, one computes <math>y_i = H_i^{-1}(\lVert r_0\rVert_2 e_1)</math> and the new iterate <math>x_i = x_0 + V_i y_i</math>.
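The following NumPy sketch implements the iteration just described: it builds the basis and the extended Hessenberg matrix, then forms the iterate <math>x_i = x_0 + V_i H_i^{-1}(\lVert r_0\rVert_2 e_1)</math>. It is an illustration only, with no handling of breakdown; the function name and the test problem are assumptions made for the example.
<syntaxhighlight lang="python">
# A sketch of the Arnoldi iteration applied to a linear system, as described
# above.  V holds the orthonormal basis vectors; H is the extended Hessenberg matrix.
import numpy as np

def arnoldi_solve(A, b, x0, m):
    n = A.shape[0]
    r0 = b - A @ x0
    beta = np.linalg.norm(r0)
    V = np.zeros((n, m + 1))
    H = np.zeros((m + 1, m))
    V[:, 0] = r0 / beta                  # v_1 = r_0 / ||r_0||_2
    for i in range(m):
        w = A @ V[:, i]
        for j in range(i + 1):           # Gram-Schmidt against the previous basis vectors
            H[j, i] = V[:, j] @ w        # h_{ji} = v_j^T A v_i
            w = w - H[j, i] * V[:, j]
        H[i + 1, i] = np.linalg.norm(w)  # h_{i+1,i} = ||w_{i+1}||_2
        if i + 1 < m:                    # the next basis vector is only needed for another step
            V[:, i + 1] = w / H[i + 1, i]
    e1 = np.eye(m)[:, 0]
    y = np.linalg.solve(H[:m, :m], beta * e1)    # H_m y = ||r_0||_2 e_1
    return x0 + V[:, :m] @ y                     # x_m = x_0 + V_m y

rng = np.random.default_rng(2)
M = rng.standard_normal((8, 8))
A = M @ M.T + 8 * np.eye(8)              # symmetric here, though Arnoldi does not require it
b = rng.standard_normal(8)
print(np.linalg.norm(A @ arnoldi_solve(A, b, np.zeros(8), 8) - b))
</syntaxhighlight>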
The direct Lanczos method
For the rest of discussion, we assume that <math>A</math> is symmetric positive-definite. With symmetry of <math>A</math>, the upper Hessenberg matrix <math>H_i = V_i^\mathrm{T} A V_i</math> becomes symmetric and thus tridiagonal. It then can be more clearly denoted by
:<math>H_i = \begin{bmatrix}
a_1 & b_2\\
b_2 & a_2 & b_3\\
 & \ddots & \ddots & \ddots\\
 & & b_{i-1} & a_{i-1} & b_i\\
 & & & b_i & a_i
\end{bmatrix}.</math>
This enables a short three-term recurrence for <math>v_i</math> in the iteration, and the Arnoldi iteration is reduced to the Lanczos iteration.
Since <math>A</math> is symmetric positive-definite, so is <math>H_i</math>. Hence, <math>H_i</math> can be LU factorized without partial pivoting into
:<math>H_i = L_i U_i = \begin{bmatrix}
1\\
c_2 & 1\\
 & \ddots & \ddots\\
 & & c_i & 1
\end{bmatrix}\begin{bmatrix}
d_1 & b_2\\
 & d_2 & b_3\\
 & & \ddots & \ddots\\
 & & & d_i
\end{bmatrix}</math>
with convenient recurrences for <math>c_i</math> and <math>d_i</math>:
:<math>\begin{align}
c_i &= b_i/d_{i-1},\\
d_i &= \begin{cases}
a_1 & \text{if } i = 1,\\
a_i - c_i b_i & \text{if } i > 1.
\end{cases}
\end{align}</math>
Rewrite <math>x_i = x_0 + V_i y_i</math> as
:<math>\begin{align}
x_i &= x_0 + V_i H_i^{-1}(\lVert r_0\rVert_2 e_1)\\
&= x_0 + V_i U_i^{-1} L_i^{-1}(\lVert r_0\rVert_2 e_1)\\
&= x_0 + P_i z_i
\end{align}</math>
with
:<math>\begin{align}
P_i &= V_i U_i^{-1},\\
z_i &= L_i^{-1}(\lVert r_0\rVert_2 e_1).
\end{align}</math>
It is now important to observe that
:<math>\begin{align}
P_i &= \begin{bmatrix} P_{i-1} & p_i \end{bmatrix},\\
z_i &= \begin{bmatrix} z_{i-1} \\ \zeta_i \end{bmatrix}.
\end{align}</math>
In fact, there are short recurrences for <math>p_i</math> and <math>\zeta_i</math> as well:
:<math>\begin{align}
p_i &= \frac{1}{d_i}(v_i - b_i p_{i-1}),\\
\zeta_i &= -c_i \zeta_{i-1}.
\end{align}</math>
With this formulation, we arrive at a simple recurrence for <math>x_i</math>:
:<math>x_i = x_0 + P_i z_i = x_0 + P_{i-1} z_{i-1} + \zeta_i p_i = x_{i-1} + \zeta_i p_i.</math>
The relations above straightforwardly lead to the direct Lanczos method, which turns out to be slightly more complex than the conjugate gradient method.
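A NumPy sketch of the resulting direct Lanczos method is given below. It assembles the recurrences above (the three-term Lanczos recurrence, the LU recurrences for <math>c_i</math> and <math>d_i</math>, and the short recurrences for <math>p_i</math>, <math>\zeta_i</math> and <math>x_i</math>) with no breakdown handling or convergence test; the function name and test problem are illustrative assumptions.
<syntaxhighlight lang="python">
# A sketch of the direct Lanczos method assembled from the recurrences above.
import numpy as np

def direct_lanczos(A, b, x0, m):
    x = x0.copy()
    r0 = b - A @ x0
    zeta = np.linalg.norm(r0)                  # zeta_1 = ||r_0||_2, since L_i is unit lower triangular
    v, v_prev = r0 / zeta, np.zeros_like(r0)   # v_1
    p_prev = np.zeros_like(r0)
    b_i, d_prev = 0.0, 1.0                     # there is no sub-diagonal entry b_1
    for i in range(1, m + 1):
        w = A @ v - b_i * v_prev               # three-term Lanczos recurrence
        a_i = v @ w                            # a_i = v_i^T A v_i
        if i == 1:
            d_i = a_i                          # d_1 = a_1
        else:
            c_i = b_i / d_prev                 # c_i = b_i / d_{i-1}
            d_i = a_i - c_i * b_i              # d_i = a_i - c_i b_i
            zeta = -c_i * zeta                 # zeta_i = -c_i zeta_{i-1}
        p = (v - b_i * p_prev) / d_i           # p_i = (v_i - b_i p_{i-1}) / d_i
        x = x + zeta * p                       # x_i = x_{i-1} + zeta_i p_i
        if i < m:                              # v_{i+1} is only needed for another step
            w = w - a_i * v
            b_i = np.linalg.norm(w)            # b_{i+1} = ||w_{i+1}||_2
            v_prev, v = v, w / b_i
        p_prev, d_prev = p, d_i
    return x

rng = np.random.default_rng(3)
M = rng.standard_normal((10, 10))
A = M @ M.T + 10 * np.eye(10)                  # arbitrary symmetric positive-definite example
b = rng.standard_normal(10)
print(np.linalg.norm(A @ direct_lanczos(A, b, np.zeros(10), 10) - b))
</syntaxhighlight>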
The conjugate gradient method from imposing orthogonality and conjugacy
If we allow <math>p_i</math> to scale and compensate for the scaling in the constant factor, we potentially can have simpler recurrences of the form:
:<math>\begin{align}
x_i &= x_{i-1} + \alpha_{i-1} p_{i-1},\\
r_i &= r_{i-1} - \alpha_{i-1} A p_{i-1},\\
p_i &= r_i + \beta_{i-1} p_{i-1}.
\end{align}</math>
As premises for the simplification, we now derive the orthogonality of <math>r_i</math> and conjugacy of <math>p_i</math>, i.e., for <math>i \neq j</math>,
:<math>\begin{align}
r_i^\mathrm{T} r_j &= 0,\\
p_i^\mathrm{T} A p_j &= 0.
\end{align}</math>
The residuals are mutually orthogonal because <math>r_i</math> is essentially a multiple of <math>v_{i+1}</math>, since for <math>i = 0</math>, <math>r_0 = \lVert r_0\rVert_2 v_1</math>, and for <math>i > 0</math>,
:<math>\begin{align}
r_i &= b - A x_i = b - A(x_0 + V_i y_i)\\
&= r_0 - A V_i y_i = r_0 - V_{i+1}\tilde{H}_i y_i\\
&= r_0 - V_i H_i y_i - h_{i+1,i}(e_i^\mathrm{T} y_i) v_{i+1}\\
&= \lVert r_0\rVert_2 v_1 - V_i(\lVert r_0\rVert_2 e_1) - h_{i+1,i}(e_i^\mathrm{T} y_i) v_{i+1}\\
&= -h_{i+1,i}(e_i^\mathrm{T} y_i) v_{i+1}.
\end{align}</math>
To see the conjugacy of <math>p_i</math>, it suffices to show that <math>P_i^\mathrm{T} A P_i</math> is diagonal:
:<math>P_i^\mathrm{T} A P_i = U_i^{-\mathrm{T}} V_i^\mathrm{T} A V_i U_i^{-1} = U_i^{-\mathrm{T}} H_i U_i^{-1} = U_i^{-\mathrm{T}} L_i U_i U_i^{-1} = U_i^{-\mathrm{T}} L_i</math>
is symmetric and lower triangular simultaneously and thus must be diagonal.
Now we can derive the constant factors <math>\alpha_i</math> and <math>\beta_i</math> with respect to the scaled <math>p_i</math> by solely imposing the orthogonality of <math>r_i</math> and conjugacy of <math>p_i</math>.
Due to the orthogonality of <math>r_i</math>, it is necessary that <math>r_{i+1}^\mathrm{T} r_i = (r_i - \alpha_i A p_i)^\mathrm{T} r_i = 0</math>. As a result,
:<math>\alpha_i = \frac{r_i^\mathrm{T} r_i}{r_i^\mathrm{T} A p_i} = \frac{r_i^\mathrm{T} r_i}{(p_i - \beta_{i-1} p_{i-1})^\mathrm{T} A p_i} = \frac{r_i^\mathrm{T} r_i}{p_i^\mathrm{T} A p_i}.</math>
Similarly, due to the conjugacy of <math>p_i</math>, it is necessary that <math>p_{i+1}^\mathrm{T} A p_i = (r_{i+1} + \beta_i p_i)^\mathrm{T} A p_i = 0</math>. As a result,
:<math>\beta_i = -\frac{r_{i+1}^\mathrm{T} A p_i}{p_i^\mathrm{T} A p_i} = -\frac{r_{i+1}^\mathrm{T}(r_i - r_{i+1})}{\alpha_i p_i^\mathrm{T} A p_i} = \frac{r_{i+1}^\mathrm{T} r_{i+1}}{\alpha_i p_i^\mathrm{T} A p_i} = \frac{r_{i+1}^\mathrm{T} r_{i+1}}{r_i^\mathrm{T} r_i}.</math>
This completes the derivation.
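The derived recurrences assemble into the familiar conjugate gradient iteration, illustrated by the NumPy sketch below. The residual and direction histories are kept only to check the orthogonality and conjugacy relations numerically; the function name and the random test matrix are illustrative assumptions.
<syntaxhighlight lang="python">
# A sketch of the conjugate gradient method using the derived factors
# alpha_i = r_i^T r_i / p_i^T A p_i and beta_i = r_{i+1}^T r_{i+1} / r_i^T r_i.
import numpy as np

def conjugate_gradient(A, b, x0, m):
    x = x0.copy()
    r = b - A @ x
    p = r.copy()                             # p_0 = r_0
    residuals, directions = [r.copy()], [p.copy()]
    for _ in range(m):
        Ap = A @ p
        alpha = (r @ r) / (p @ Ap)           # alpha_i = r_i^T r_i / p_i^T A p_i
        x = x + alpha * p                    # x_{i+1} = x_i + alpha_i p_i
        r_next = r - alpha * Ap              # r_{i+1} = r_i - alpha_i A p_i
        beta = (r_next @ r_next) / (r @ r)   # beta_i = r_{i+1}^T r_{i+1} / r_i^T r_i
        p = r_next + beta * p                # p_{i+1} = r_{i+1} + beta_i p_i
        r = r_next
        residuals.append(r.copy())
        directions.append(p.copy())
    return x, residuals, directions

rng = np.random.default_rng(4)
M = rng.standard_normal((10, 10))
A = M @ M.T + 10 * np.eye(10)                # arbitrary symmetric positive-definite example
b = rng.standard_normal(10)
x, rs, ps = conjugate_gradient(A, b, np.zeros(10), 10)
print(np.linalg.norm(A @ x - b))                     # near zero after n steps
print(abs(rs[0] @ rs[3]), abs(ps[0] @ A @ ps[3]))    # orthogonality and conjugacy checks
</syntaxhighlight>
In practice the loop is terminated once the residual norm falls below a prescribed tolerance rather than after a fixed number of steps.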