The Lanczos algorithm is an iterative method devised by Cornelius Lanczos that is an adaptation of power methods to find the $m$ "most useful" (tending towards extreme highest/lowest) eigenvalues and eigenvectors of an $n \times n$ Hermitian matrix, where $m$ is often but not necessarily much smaller than $n$. Although computationally efficient in principle, the method as initially formulated was not useful, due to its numerical instability.
In 1970, Ojalvo and Newman showed how to make the method numerically stable and applied it to the solution of very large engineering structures subjected to dynamic loading.
This was achieved using a method for purifying the Lanczos vectors to any degree of accuracy (i.e. by repeatedly reorthogonalizing each newly generated vector with all previously generated ones); when this purification was not performed, the iteration produced a series of vectors that were highly contaminated by those associated with the lowest natural frequencies.
In their original work, these authors also suggested how to select a starting vector (i.e. use a random-number generator to select each element of the starting vector) and proposed an empirically determined method for choosing $m$, the reduced number of vectors (i.e. it should be selected to be approximately 1.5 times the number of accurate eigenvalues desired). Soon thereafter their work was followed by Paige, who also provided an error analysis. In 1988, Ojalvo produced a more detailed history of this algorithm and an efficient eigenvalue error test.
The algorithm
:Input a Hermitian matrix $A$ of size $n \times n$, and optionally a number of iterations $m$ (as default, let $m = n$).
:* Strictly speaking, the algorithm does not need access to the explicit matrix, but only a function $v \mapsto A v$ that computes the product of the matrix by an arbitrary vector. This function is called at most $m$ times.
:Output an $n \times m$ matrix $V$ with orthonormal columns and a tridiagonal real symmetric matrix $T = V^* A V$ of size $m \times m$. If $m = n$, then $V$ is unitary, and $A = V T V^*$.
:Warning The Lanczos iteration is prone to numerical instability. When executed in non-exact arithmetic, additional measures (as outlined in later sections) should be taken to ensure validity of the results.
:# Let $v_1$ be an arbitrary vector with Euclidean norm $1$.
:# Abbreviated initial iteration step:
:## Let $w_1' = A v_1$.
:## Let $\alpha_1 = w_1'^* v_1$.
:## Let $w_1 = w_1' - \alpha_1 v_1$.
:# For $j = 2, \dots, m$ do:
:## Let $\beta_j = \| w_{j-1} \|$ (also Euclidean norm).
:## If $\beta_j \neq 0$, then let $v_j = w_{j-1} / \beta_j$,
:##: else pick as $v_j$ an arbitrary vector with Euclidean norm $1$ that is orthogonal to all of $v_1, \dots, v_{j-1}$.
:## Let $w_j' = A v_j$.
:## Let $\alpha_j = w_j'^* v_j$.
:## Let $w_j = w_j' - \alpha_j v_j - \beta_j v_{j-1}$.
:# Let $V$ be the matrix with columns $v_1, \dots, v_m$. Let $T = \begin{pmatrix} \alpha_1 & \beta_2 & & & \\ \beta_2 & \alpha_2 & \beta_3 & & \\ & \beta_3 & \alpha_3 & \ddots & \\ & & \ddots & \ddots & \beta_m \\ & & & \beta_m & \alpha_m \end{pmatrix}$.
:Note $A v_j = w_j' = \beta_j v_{j-1} + \alpha_j v_j + \beta_{j+1} v_{j+1}$ for $1 < j < m$.
There are in principle four ways to write the iteration procedure. Paige and other works show that the above order of operations is the most numerically stable.
In practice the initial vector $v_1$ may be taken as another argument of the procedure, with $\beta_j = 0$ and indicators of numerical imprecision being included as additional loop termination conditions.
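The following is a minimal NumPy sketch of the iteration above (function and variable names such as <code>lanczos</code>, <code>A</code>, <code>v1</code> are illustrative, not from the original text); it performs no reorthogonalisation, so the stability caveats in the warning above apply.
<syntaxhighlight lang="python">
import numpy as np

def lanczos(A, v1, m):
    """Plain Lanczos iteration as specified above: returns V (n x m, with
    orthonormal columns in exact arithmetic) and the real symmetric
    tridiagonal matrix T built from the alpha and beta coefficients."""
    n = A.shape[0]
    V = np.zeros((n, m), dtype=A.dtype)
    alpha = np.zeros(m)
    beta = np.zeros(m)                    # beta[j] couples v_{j-1} and v_j; beta[0] is unused
    V[:, 0] = v1 / np.linalg.norm(v1)
    w = A @ V[:, 0]
    alpha[0] = np.real(np.vdot(V[:, 0], w))
    w = w - alpha[0] * V[:, 0]
    for j in range(1, m):
        beta[j] = np.linalg.norm(w)
        if beta[j] == 0:                  # invariant subspace found; a fresh vector orthogonal to V[:, :j] would be needed
            raise ValueError("breakdown at step %d" % j)
        V[:, j] = w / beta[j]
        w = A @ V[:, j]
        alpha[j] = np.real(np.vdot(V[:, j], w))
        w = w - alpha[j] * V[:, j] - beta[j] * V[:, j - 1]
    T = np.diag(alpha) + np.diag(beta[1:], 1) + np.diag(beta[1:], -1)
    return V, T
</syntaxhighlight>
With $m = n$ and short runs, <code>V.conj().T @ A @ V</code> reproduces <code>T</code> up to rounding; for larger $m$ the loss of orthogonality discussed in later sections becomes visible instead.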
Not counting the matrix–vector multiplication, each iteration does $O(n)$ arithmetical operations. The matrix–vector multiplication can be done in $O(dn)$ arithmetical operations where $d$ is the average number of nonzero elements in a row. The total complexity is thus $O(dmn)$, or $O(mn^2)$ if $d = n$; the Lanczos algorithm can be very fast for sparse matrices. Schemes for improving numerical stability are typically judged against this high performance.
The vectors $v_j$ are called ''Lanczos vectors''. The vector $w_j'$ is not used after $w_j$ is computed, and the vector $w_j$ is not used after $v_{j+1}$ is computed. Hence one may use the same storage for all three. Likewise, if only the tridiagonal matrix $T$ is sought, then the raw iteration does not need $v_{j-1}$ after having computed $w_j$, although some schemes for improving the numerical stability would need it later on. Sometimes the subsequent Lanczos vectors are recomputed from $v_1$ when needed.
Application to the eigenproblem
The Lanczos algorithm is most often brought up in the context of finding the eigenvalues and eigenvectors of a matrix, but whereas an ordinary diagonalization of a matrix would make eigenvectors and eigenvalues apparent from inspection, the same is not true for the tridiagonalization performed by the Lanczos algorithm; nontrivial additional steps are needed to compute even a single eigenvalue or eigenvector. Nonetheless, applying the Lanczos algorithm is often a significant step forward in computing the eigendecomposition. If $\lambda$ is an eigenvalue of $A$, and if $T x = \lambda x$ ($x$ is an eigenvector of $T$) then $y = V x$ is the corresponding eigenvector of $A$ (since $A y = A V x = V T V^* V x = V T x = V (\lambda x) = \lambda V x = \lambda y$). Thus the Lanczos algorithm transforms the eigendecomposition problem for $A$ into the eigendecomposition problem for $T$.
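As a hedged illustration of this transformation (reusing the <code>lanczos</code> sketch from the algorithm section; all other names are illustrative), the eigenpairs of the small tridiagonal matrix $T$ are obtained with a dense symmetric eigensolver and mapped back through $V$:
<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)
n = 8
B = rng.standard_normal((n, n))
A = (B + B.T) / 2                      # small real symmetric test matrix

# m = n here, so V is (numerically) unitary and A = V T V^T; lanczos() is the sketch above.
V, T = lanczos(A, rng.standard_normal(n), n)

theta, Y = np.linalg.eigh(T)           # eigenpairs of the tridiagonal matrix T
X = V @ Y                              # columns are the corresponding eigenvectors of A

print(np.allclose(A @ X, X * theta))   # each column x satisfies A x = theta x
</syntaxhighlight>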
# For tridiagonal matrices, there exist a number of specialised algorithms, often with better computational complexity than general-purpose algorithms. For example, if $T$ is an $m \times m$ tridiagonal symmetric matrix then (a usage sketch with a library tridiagonal eigensolver follows this list):
#* The continuant recursion allows computing the characteristic polynomial in $O(m^2)$ operations, and evaluating it at a point in $O(m)$ operations.
#* The divide-and-conquer eigenvalue algorithm can be used to compute the entire eigendecomposition of $T$ in $O(m^2)$ operations.
#* The Fast Multipole Method can compute all eigenvalues in just $O(m \log m)$ operations.
# Some general eigendecomposition algorithms, notably the QR algorithm, are known to converge faster for tridiagonal matrices than for general matrices. Asymptotic complexity of tridiagonal QR is $O(m^2)$, just as for the divide-and-conquer algorithm (though the constant factor may be different); since the eigenvectors together have $m^2$ elements, this is asymptotically optimal.
# Even algorithms whose convergence rates are unaffected by unitary transformations, such as the power method and inverse iteration, may enjoy low-level performance benefits from being applied to the tridiagonal matrix $T$ rather than the original matrix $A$. Since $T$ is very sparse with all nonzero elements in highly predictable positions, it permits compact storage with excellent performance vis-à-vis caching. Likewise, $T$ is a real matrix with all eigenvectors and eigenvalues real, whereas $A$ in general may have complex elements and eigenvectors, so real arithmetic is sufficient for finding the eigenvectors and eigenvalues of $T$.
# If $n$ is very large, then reducing $m$ so that $T$ is of a manageable size will still allow finding the more extreme eigenvalues and eigenvectors of $A$; in the $m \ll n$ region, the Lanczos algorithm can be viewed as a lossy compression scheme for Hermitian matrices, that emphasises preserving the extreme eigenvalues.
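As an illustration of the first point above, SciPy's <code>eigh_tridiagonal</code> works directly from the diagonal and off-diagonal coefficient arrays (here filled with arbitrary values standing in for the $\alpha_j$ and $\beta_j$ of a Lanczos run), without ever forming $T$ as a dense matrix; this is a hedged usage sketch, not a prescription.
<syntaxhighlight lang="python">
import numpy as np
from scipy.linalg import eigh_tridiagonal

m = 100
rng = np.random.default_rng(0)
alpha = rng.standard_normal(m)        # main diagonal of T (alpha_1, ..., alpha_m)
beta = rng.standard_normal(m - 1)     # sub-/superdiagonal of T (beta_2, ..., beta_m)

# Specialised symmetric-tridiagonal eigensolver; T never has to be stored densely.
theta, Y = eigh_tridiagonal(alpha, beta)
print(theta[0], theta[-1])            # smallest and largest eigenvalue of T
</syntaxhighlight>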
The combination of good performance for sparse matrices and the ability to compute several (without computing all) eigenvalues is the main reason for choosing to use the Lanczos algorithm.
Application to tridiagonalization
Though the eigenproblem is often the motivation for applying the Lanczos algorithm, the operation the algorithm primarily performs is tridiagonalization of a matrix, for which numerically stable
Householder transformations have been favoured since the 1950s. During the 1960s the Lanczos algorithm was disregarded. Interest in it was rejuvenated by the Kaniel–Paige convergence theory and the development of methods to prevent numerical instability, but the Lanczos algorithm remains the alternative algorithm that one tries only if Householder is not satisfactory.
Aspects in which the two algorithms differ include:
* Lanczos takes advantage of $A$ being a sparse matrix, whereas Householder does not, and will generate fill-in.
* Lanczos works throughout with the original matrix $A$ (and has no problem with it being known only implicitly), whereas raw Householder wants to modify the matrix during the computation (although that can be avoided).
* Each iteration of the Lanczos algorithm produces another column of the final transformation matrix $V$, whereas an iteration of Householder produces another factor in a unitary factorisation $Q_1 Q_2 \cdots Q_n$ of $V$. Each factor is however determined by a single vector, so the storage requirements are the same for both algorithms, and $V = Q_1 Q_2 \cdots Q_n$ can be computed in $O(n^3)$ time.
* Householder is numerically stable, whereas raw Lanczos is not.
* Lanczos is highly parallel, with the computations of $\alpha_j$ and $\beta_j$ as the only points of synchronisation. Householder is less parallel, having a sequence of scalar quantities computed that each depend on the previous quantity in the sequence.
Derivation of the algorithm
There are several lines of reasoning which lead to the Lanczos algorithm.
A more provident power method
The power method for finding the eigenvalue of largest magnitude and a corresponding eigenvector of a matrix $A$ is roughly
:# Pick a random vector $u_1 \neq 0$.
:# For $j \geqslant 1$ (until the direction of $u_j$ has converged) do:
:## Let $u_{j+1}' = A u_j$.
:## Let $u_{j+1} = u_{j+1}' / \| u_{j+1}' \|$.
:* In the large $j$ limit, $u_j$ approaches the normed eigenvector corresponding to the largest magnitude eigenvalue.
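A minimal sketch of this method for a real matrix (all names illustrative; the convergence test is simplified to a fixed iteration count):
<syntaxhighlight lang="python">
import numpy as np

def power_method(A, num_iters=1000, seed=0):
    """Approximate the largest-magnitude eigenvalue of A and a corresponding eigenvector."""
    rng = np.random.default_rng(seed)
    u = rng.standard_normal(A.shape[0])
    u /= np.linalg.norm(u)
    for _ in range(num_iters):        # in practice: loop until the direction of u stabilises
        u = A @ u
        u /= np.linalg.norm(u)
    return u @ (A @ u), u             # Rayleigh quotient of the final iterate, and the iterate
</syntaxhighlight>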
A critique that can be raised against this method is that it is wasteful: it spends a lot of work (the matrix–vector products in step 2.1) extracting information from the matrix $A$, but pays attention only to the very last result; implementations typically use the same variable for all the vectors $u_j$, having each new iteration overwrite the results from the previous one. What if we instead kept all the intermediate results and organised their data?
One piece of information that trivially is available from the vectors $u_j$ is a chain of Krylov subspaces. One way of stating that without introducing sets into the algorithm is to claim that it computes
:a subset $\{ v_1, \dots, v_m \}$ of a basis of $\mathbb{C}^n$ such that $A x \in \operatorname{span}(v_1, \dots, v_{j+1})$ for every $x \in \operatorname{span}(v_1, \dots, v_j)$ and all $1 \leqslant j < m$;
this is trivially satisfied by $v_j = u_j$ as long as $u_j$ is linearly independent of $u_1, \dots, u_{j-1}$ (and in the case that there is such a dependence then one may continue the sequence by picking as $v_j$ an arbitrary vector linearly independent of $u_1, \dots, u_{j-1}$). A basis containing the $u_j$ vectors is however likely to be numerically ill-conditioned, since this sequence of vectors is by design meant to converge to an eigenvector of $A$. To avoid that, one can combine the power iteration with a Gram–Schmidt process, to instead produce an orthonormal basis of these Krylov subspaces.
# Pick a random vector $u_1$ of Euclidean norm $1$. Let $v_1 = u_1$.
# For $j = 1, \dots, m-1$ do:
## Let $u_{j+1}' = A u_j$.
## For all $k = 1, \dots, j$ let $g_{k,j} = v_k^* u_{j+1}'$. (These are the coordinates of $u_{j+1}' = A u_j$ with respect to the basis vectors $v_1, \dots, v_j$.)
## Let $w_{j+1} = u_{j+1}' - \sum_{k=1}^j g_{k,j} v_k$. (Cancel the component of $u_{j+1}'$ that is in $\operatorname{span}(v_1, \dots, v_j)$.)
## If $w_{j+1} \neq 0$ then let $u_{j+1} = u_{j+1}' / \| u_{j+1}' \|$ and $v_{j+1} = w_{j+1} / \| w_{j+1} \|$,
##: otherwise pick as $u_{j+1} = v_{j+1}$ an arbitrary vector of Euclidean norm $1$ that is orthogonal to all of $v_1, \dots, v_j$.
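A hedged NumPy sketch of this combined procedure for a real matrix (the array names <code>U</code> and <code>V</code> are illustrative; breakdown handling is reduced to a comment):
<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)
n, m = 50, 8
B = rng.standard_normal((n, n))
A = (B + B.T) / 2                               # symmetric test matrix (any square matrix works here)

U = np.zeros((n, m))                            # power iteration vectors u_1, ..., u_m
V = np.zeros((n, m))                            # orthonormal basis v_1, ..., v_m of the Krylov subspaces
u = rng.standard_normal(n)
U[:, 0] = V[:, 0] = u / np.linalg.norm(u)
for j in range(m - 1):
    up = A @ U[:, j]                            # u'_{j+1} = A u_j
    g = V[:, : j + 1].T @ up                    # coordinates g_{k,j} w.r.t. v_1, ..., v_j
    w = up - V[:, : j + 1] @ g                  # cancel the component inside span(v_1, ..., v_j)
    U[:, j + 1] = up / np.linalg.norm(up)       # next power iteration vector
    V[:, j + 1] = w / np.linalg.norm(w)         # next orthonormal basis vector
    # (a breakdown w == 0 would call for an arbitrary new vector orthogonal to V[:, :j+1])

# The columns of V are orthonormal and span the same Krylov subspaces as the power iterates in U.
print(np.linalg.norm(V.T @ V - np.eye(m)))
</syntaxhighlight>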
The relation between the power iteration vectors $u_j$ and the orthogonal vectors $v_j$ is that
: $\operatorname{span}(u_1, \dots, u_j) = \operatorname{span}(v_1, \dots, v_j)$ for all $j$.
Here it may be observed that we do not actually need the $u_j$ vectors to compute these $v_j$, because $u_j - c_j v_j \in \operatorname{span}(v_1, \dots, v_{j-1})$ for some nonzero scalar $c_j$, and therefore the difference between $A u_j$ and $c_j A v_j$ is in $\operatorname{span}(v_1, \dots, v_j)$, which is cancelled out by the orthogonalisation process. Thus the same basis for the chain of Krylov subspaces is computed by
# Pick a random vector $v_1$ of Euclidean norm $1$.
# For $j = 1, \dots, m-1$ do:
## Let $w_{j+1}' = A v_j$.
## For all $k = 1, \dots, j$ let $h_{k,j} = v_k^* w_{j+1}'$.
## Let $w_{j+1} = w_{j+1}' - \sum_{k=1}^j h_{k,j} v_k$.
## Let $h_{j+1,j} = \| w_{j+1} \|$.
## If $h_{j+1,j} \neq 0$ then let $v_{j+1} = w_{j+1} / h_{j+1,j}$,
##: otherwise pick as $v_{j+1}$ an arbitrary vector of Euclidean norm $1$ that is orthogonal to all of $v_1, \dots, v_j$.
A priori the coefficients $h_{k,j}$ satisfy
: $h_{k,j} = v_k^* A v_j$ for all $1 \leqslant k \leqslant j$;
the definition $h_{j+1,j} = \| w_{j+1} \|$ may seem a bit odd, but fits the general pattern $h_{k,j} = v_k^* A v_j$ since
: $v_{j+1}^* A v_j = v_{j+1}^* w_{j+1}' = v_{j+1}^* w_{j+1} + \sum_{k=1}^{j} h_{k,j} v_{j+1}^* v_k = v_{j+1}^* w_{j+1} = h_{j+1,j} v_{j+1}^* v_{j+1} = h_{j+1,j}.$
Because the power iteration vectors $u_j$ that were eliminated from this recursion satisfy $u_j \in \operatorname{span}(v_1, \dots, v_j),$ the vectors $v_1, \dots, v_m$ and coefficients $h_{k,j}$ contain enough information from $A$ that all of $u_1, \dots, u_m$ can be computed, so nothing was lost by switching vectors. (Indeed, it turns out that the data collected here give significantly better approximations of the largest eigenvalue than one gets from an equal number of iterations in the power method, although that is not necessarily obvious at this point.)
This last procedure is the Arnoldi iteration. The Lanczos algorithm then arises as the simplification one gets from eliminating calculation steps that turn out to be trivial when $A$ is Hermitian—in particular most of the $h_{k,j}$ coefficients turn out to be zero.
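A self-contained numerical check of this claim (illustrative names, random Hermitian test matrix): running the orthogonalising recursion above and then forming $V^* A V$ shows that every entry beyond the first sub- and superdiagonal is zero up to rounding.
<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(1)
n, m = 30, 10
B = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
A = (B + B.conj().T) / 2                     # Hermitian test matrix

V = np.zeros((n, m), dtype=complex)
v = rng.standard_normal(n) + 1j * rng.standard_normal(n)
V[:, 0] = v / np.linalg.norm(v)
for j in range(m - 1):
    w = A @ V[:, j]
    h = V[:, : j + 1].conj().T @ w           # h_{k,j} = v_k^* A v_j
    w = w - V[:, : j + 1] @ h                # orthogonalise against all previous v_k
    V[:, j + 1] = w / np.linalg.norm(w)      # h_{j+1,j} = ||w_{j+1}||

H = V.conj().T @ A @ V                       # matrix of the h_{k,j} coefficients
off = H.copy()
for k in (-1, 0, 1):                         # remove the tridiagonal part ...
    off -= np.diag(np.diag(H, k), k)
print(np.max(np.abs(off)))                   # ... what remains is ~1e-15: H is tridiagonal
</syntaxhighlight>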
Elementarily, if $A$ is Hermitian then
: $h_{k,j} = v_k^* A v_j = \overline{ v_j^* A v_k } = \overline{ h_{j,k} }.$
For $k < j-1$ we know that $A v_k \in \operatorname{span}(v_1, \dots, v_{k+1})$, and since $v_j$ by construction is orthogonal to this subspace, this inner product must be zero. (This is essentially also the reason why sequences of orthogonal polynomials can always be given a three-term recurrence relation.) For $k = j-1$ one gets
: $h_{j-1,j} = \overline{ h_{j,j-1} } = h_{j,j-1}$
since the latter is real on account of being the norm of a vector. For $k = j$ one gets
: $h_{j,j} = \overline{ h_{j,j} },$
meaning this is real too.
More abstractly, if $V$ is the matrix with columns $v_1, \dots, v_m$ then the numbers $h_{k,j}$ can be identified as elements of the matrix $H = V^* A V$, and $h_{k,j} = 0$ for $k > j+1$; the matrix $H$ is upper Hessenberg. Since
: $H^* = \left( V^* A V \right)^* = V^* A^* V = V^* A V = H,$
the matrix $H$ is Hermitian. This implies that $H$ is also lower Hessenberg, so it must in fact be tridiagonal. Being Hermitian, its main diagonal is real, and since its first subdiagonal is real by construction, the same is true for its first superdiagonal. Therefore, $H$ is a real, symmetric matrix—the matrix $T$ of the Lanczos algorithm specification.
Simultaneous approximation of extreme eigenvalues
One way of characterising the eigenvectors of a Hermitian matrix $A$ is as stationary points of the Rayleigh quotient
: $r(x) = \frac{x^* A x}{x^* x}, \qquad x \in \mathbb{C}^n.$
In particular, the largest eigenvalue $\lambda_{\max}$ is the global maximum of $r$ and the smallest eigenvalue $\lambda_{\min}$ is the global minimum of $r$.
Within a low-dimensional subspace $\mathcal{L}$ of $\mathbb{C}^n$ it can be feasible to locate the maximum $x$ and minimum $y$ of $r$. Repeating that for an increasing chain $\mathcal{L}_1 \subset \mathcal{L}_2 \subset \cdots$ produces two sequences of vectors: $x_1, x_2, \dots$ and $y_1, y_2, \dots$ such that $x_j, y_j \in \mathcal{L}_j$ and
: $r(x_1) \leqslant r(x_2) \leqslant \cdots \leqslant \lambda_{\max} \qquad \text{and} \qquad r(y_1) \geqslant r(y_2) \geqslant \cdots \geqslant \lambda_{\min}.$
The question then arises how to choose the subspaces so that these sequences converge at optimal rate.
From $x_j$, the optimal direction in which to seek larger values of $r$ is that of the gradient $\nabla r(x_j)$, and likewise from $y_j$ the optimal direction in which to seek smaller values of $r$ is that of the negative gradient $-\nabla r(y_j)$. In general
: $\nabla r(x) = \frac{2}{x^* x} \left( A x - r(x) x \right),$
so the directions of interest are easy enough to compute in matrix arithmetic, but if one wishes to improve on both $x_j$ and $y_j$ then there are two new directions to take into account: $A x_j$ and $A y_j$; since $x_j$ and $y_j$ can be linearly independent vectors (indeed, are close to orthogonal), one cannot in general expect $A x_j$ and $A y_j$ to be parallel. Is it therefore necessary to increase the dimension of $\mathcal{L}_j$ by $2$ on every step? Not if $\{ \mathcal{L}_j \}_{j=1}^m$ are taken to be Krylov subspaces, because then $A z \in \mathcal{L}_{j+1}$ for all $z \in \mathcal{L}_j,$ thus in particular for both $z = x_j$ and $z = y_j$.
In other words, we can start with some arbitrary initial vector $x_1 = y_1,$ construct the vector spaces
: $\mathcal{L}_j = \operatorname{span}\left( x_1, A x_1, \dots, A^{j-1} x_1 \right)$
and then seek $x_j, y_j \in \mathcal{L}_j$ such that
: $r(x_j) = \max_{z \in \mathcal{L}_j} r(z) \qquad \text{and} \qquad r(y_j) = \min_{z \in \mathcal{L}_j} r(z).$
Since the $j$th power method iterate $u_j$ belongs to $\mathcal{L}_j,$ it follows that an iteration to produce the $x_j$ and $y_j$ cannot converge slower than that of the power method, and will achieve more by approximating both eigenvalue extremes. For the subproblem of optimising $r$ on some $\mathcal{L}_j$, it is convenient to have an orthonormal basis $\{ v_1, \dots, v_j \}$ for this vector space. Thus we are again led to the problem of iteratively computing such a basis for the sequence of Krylov subspaces.
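A hedged numerical illustration of this comparison (all names illustrative, real symmetric test matrix): the extreme Rayleigh-quotient values over a Krylov subspace bracket both ends of the spectrum, while the plain power iterate from the same sequence only estimates the top.
<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(2)
n, k = 300, 15
B = rng.standard_normal((n, n))
A = (B + B.T) / 2                                # real symmetric test matrix

# Build normalised power iterates u_1, ..., u_k; together they span the Krylov subspace L_k.
K = np.empty((n, k))
K[:, 0] = rng.standard_normal(n)
K[:, 0] /= np.linalg.norm(K[:, 0])
for j in range(1, k):
    w = A @ K[:, j - 1]
    K[:, j] = w / np.linalg.norm(w)

Q, _ = np.linalg.qr(K)                           # orthonormal basis of L_k
ritz = np.linalg.eigvalsh(Q.T @ A @ Q)           # extreme Rayleigh-quotient values over L_k
power = K[:, -1] @ (A @ K[:, -1])                # Rayleigh quotient of the last power iterate

exact = np.linalg.eigvalsh(A)
print("largest:", exact[-1], ">=", ritz[-1], ">=", power)   # Krylov estimate is at least the power estimate
print("smallest:", exact[0], "<=", ritz[0])
</syntaxhighlight>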
Convergence and other dynamics
When analysing the dynamics of the algorithm, it is convenient to take the eigenvalues and eigenvectors of $A$ as given, even though they are not explicitly known to the user. To fix notation, let $\lambda_1 \geqslant \lambda_2 \geqslant \cdots \geqslant \lambda_n$ be the eigenvalues (these are known to all be real, and thus possible to order) and let $z_1, \dots, z_n$ be an orthonormal set of eigenvectors such that $A z_k = \lambda_k z_k$ for all $k = 1, \dots, n$.
It is also convenient to fix a notation for the coefficients of the initial Lanczos vector $v_1$ with respect to this eigenbasis; let $d_k = z_k^* v_1$ for all $k = 1, \dots, n$, so that $v_1 = \sum_{k=1}^n d_k z_k$. A starting vector $v_1$ depleted of some eigencomponent will delay convergence to the corresponding eigenvalue, and even though this just comes out as a constant factor in the error bounds, depletion remains undesirable. One common technique for avoiding being consistently hit by it is to pick $v_1$ by first drawing the elements randomly according to the same normal distribution with mean $0$ and then rescale the vector to norm $1$. Prior to the rescaling, this causes the coefficients $d_k$ to also be independent normally distributed stochastic variables from the same normal distribution (since the change of coordinates is unitary), and after rescaling the vector $v_1$ will have a uniform distribution on the unit sphere in $\mathbb{C}^n$. This makes it possible to bound the probability that for example $| d_1 | < \varepsilon$.
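A minimal sketch of this choice of starting vector (names illustrative):
<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)
n = 1000
v1 = rng.standard_normal(n)       # independent N(0, 1) entries (mean 0)
v1 /= np.linalg.norm(v1)          # rescale to Euclidean norm 1: uniformly distributed on the unit sphere
# In any orthonormal eigenbasis, the coefficients of v1 are then also i.i.d. normal before the rescaling.
</syntaxhighlight>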
The fact that the Lanczos algorithm is coordinate-agnostic – operations only look at inner products of vectors, never at individual elements of vectors – makes it easy to construct examples with known eigenstructure to run the algorithm on: make $A$ a diagonal matrix with the desired eigenvalues on the diagonal; as long as the starting vector $v_1$ has enough nonzero elements, the algorithm will output a general tridiagonal symmetric matrix as $T$.
Kaniel–Paige convergence theory
After $m$ iteration steps of the Lanczos algorithm, $T$ is an $m \times m$ real symmetric matrix, that similarly to the above has $m$ eigenvalues $\theta_1 \geqslant \theta_2 \geqslant \cdots \geqslant \theta_m.$ By convergence is primarily understood the convergence of $\theta_1$ to $\lambda_1$ (and the symmetrical convergence of $\theta_m$ to $\lambda_n$) as $m$ grows, and secondarily the convergence of some range $\theta_1, \dots, \theta_k$ of eigenvalues of $T$ to their counterparts $\lambda_1, \dots, \lambda_k$ of $A$. The convergence for the Lanczos algorithm is often orders of magnitude faster than that for the power iteration algorithm.
The bounds for $\theta_1$ come from the above interpretation of eigenvalues as extreme values of the Rayleigh quotient $r(x)$. Since $\lambda_1$ is a priori the maximum of $r$ on the whole of $\mathbb{C}^n,$ whereas $\theta_1$ is merely the maximum on an $m$-dimensional Krylov subspace, we trivially get $\lambda_1 \geqslant \theta_1$. Conversely, any point $x$ in that Krylov subspace provides a lower bound $r(x)$ for $\theta_1$, so if a point can be exhibited for which $\lambda_1 - r(x)$ is small then this provides a tight bound on $\theta_1$.
The dimension $m$ Krylov subspace is
: $\operatorname{span}\left( v_1, A v_1, A^2 v_1, \dots, A^{m-1} v_1 \right),$
so any element of it can be expressed as $p(A) v_1$ for some polynomial $p$ of degree at most $m-1$; the coefficients of that polynomial are simply the coefficients in the linear combination of the vectors $v_1, A v_1, A^2 v_1, \dots, A^{m-1} v_1$. The polynomial we want will turn out to have real coefficients, but for the moment we should allow also for complex coefficients, and we will write $p^*$ for the polynomial obtained by complex conjugating all coefficients of $p$. In this parametrisation of the Krylov subspace, we have
: $r\left( p(A) v_1 \right) = \frac{ v_1^* p(A)^* A \, p(A) v_1 }{ v_1^* p(A)^* p(A) v_1 }.$
Using now the expression for $v_1$ as a linear combination of eigenvectors, we get
: $A v_1 = A \sum_{k=1}^n d_k z_k = \sum_{k=1}^n d_k \lambda_k z_k$
and more generally
: $q(A) v_1 = \sum_{k=1}^n d_k q(\lambda_k) z_k$
for any polynomial $q$.
Thus
: $\lambda_1 - r\left( p(A) v_1 \right) = \lambda_1 - \frac{ \sum_{k=1}^n |d_k|^2 |p(\lambda_k)|^2 \lambda_k }{ \sum_{k=1}^n |d_k|^2 |p(\lambda_k)|^2 } = \frac{ \sum_{k=1}^n |d_k|^2 |p(\lambda_k)|^2 (\lambda_1 - \lambda_k) }{ \sum_{k=1}^n |d_k|^2 |p(\lambda_k)|^2 }.$
A key difference between numerator and denominator here is that the $k = 1$ term vanishes in the numerator, but not in the denominator. Thus if one can pick $p$ to be large at $\lambda_1$ but small at all other eigenvalues, one will get a tight bound on the error $\lambda_1 - \theta_1$.
Since $A$ has many more eigenvalues than $p$ has coefficients, this may seem a tall order, but one way to meet it is to use Chebyshev polynomials. Writing $c_k$ for the degree $k$ Chebyshev polynomial of the first kind (that which satisfies $c_k(\cos x) = \cos(kx)$ for all $x$), we have a polynomial which stays in the range