Lanczos algorithm

The Lanczos algorithm is an iterative method devised by Cornelius Lanczos that is an adaptation of power methods to find the m "most useful" (tending towards extreme highest/lowest) eigenvalues and eigenvectors of an n \times n Hermitian matrix, where m is often but not necessarily much smaller than n. Although computationally efficient in principle, the method as initially formulated was not useful, due to its numerical instability.

In 1970, Ojalvo and Newman showed how to make the method numerically stable and applied it to the solution of very large engineering structures subjected to dynamic loading. This was achieved by purifying the Lanczos vectors (i.e. by repeatedly reorthogonalizing each newly generated vector with all previously generated ones) to any degree of accuracy; when this purification was not performed, the sequence of vectors produced was highly contaminated by those associated with the lowest natural frequencies. In their original work, these authors also suggested how to select a starting vector (i.e. use a random-number generator to select each element of the starting vector) and suggested an empirically determined method for determining m, the reduced number of vectors (i.e. it should be selected to be approximately 1.5 times the number of accurate eigenvalues desired). Soon thereafter their work was followed by Paige, who also provided an error analysis. In 1988, Ojalvo produced a more detailed history of this algorithm and an efficient eigenvalue error test.


The algorithm

Input: a Hermitian matrix A of size n \times n, and optionally a number of iterations m (as default, let m = n).
* Strictly speaking, the algorithm does not need access to the explicit matrix, but only a function v \mapsto A v that computes the product of the matrix by an arbitrary vector. This function is called at most m times.

Output: an n \times m matrix V with orthonormal columns and a tridiagonal real symmetric matrix T = V^* A V of size m \times m. If m = n, then V is unitary and A = V T V^*.

Warning: The Lanczos iteration is prone to numerical instability. When executed in non-exact arithmetic, additional measures (as outlined in later sections) should be taken to ensure validity of the results.

1. Let v_1 \in \Complex^n be an arbitrary vector with Euclidean norm 1.
2. Abbreviated initial iteration step:
   1. Let w_1' = A v_1 .
   2. Let \alpha_1 = w_1'^* v_1 .
   3. Let w_1 = w_1' - \alpha_1 v_1 .
3. For j = 2, \dots, m do:
   1. Let \beta_j = \| w_{j-1} \| (also Euclidean norm).
   2. If \beta_j \neq 0 , then let v_j = w_{j-1} / \beta_j ,
      else pick as v_j an arbitrary vector with Euclidean norm 1 that is orthogonal to all of v_1, \dots, v_{j-1} .
   3. Let w_j' = A v_j .
   4. Let \alpha_j = w_j'^* v_j .
   5. Let w_j = w_j' - \alpha_j v_j - \beta_j v_{j-1} .
4. Let V be the matrix with columns v_1, \dots, v_m . Let
   T = \begin{pmatrix}
   \alpha_1 & \beta_2  &          &             &              & 0       \\
   \beta_2  & \alpha_2 & \beta_3  &             &              &         \\
            & \beta_3  & \alpha_3 & \ddots      &              &         \\
            &          & \ddots   & \ddots      & \beta_{m-1}  &         \\
            &          &          & \beta_{m-1} & \alpha_{m-1} & \beta_m \\
   0        &          &          &             & \beta_m      & \alpha_m
   \end{pmatrix}.

Note: A v_j = w_j' = \beta_j v_{j-1} + \alpha_j v_j + \beta_{j+1} v_{j+1} for 1 < j < m .

There are in principle four ways to write the iteration procedure. Paige and other works show that the above order of operations is the most numerically stable. In practice the initial vector v_1 may be taken as another argument of the procedure, with \beta_j = 0 and indicators of numerical imprecision being included as additional loop termination conditions.

Not counting the matrix–vector multiplication, each iteration does O(n) arithmetical operations. The matrix–vector multiplication can be done in O(dn) arithmetical operations where d is the average number of nonzero elements in a row. The total complexity is thus O(dmn), or O(dn^2) if m = n; the Lanczos algorithm can be very fast for sparse matrices. Schemes for improving numerical stability are typically judged against this high performance.

The vectors v_j are called ''Lanczos vectors''. The vector w_j' is not used after w_j is computed, and the vector w_j is not used after v_{j+1} is computed. Hence one may use the same storage for all three. Likewise, if only the tridiagonal matrix T is sought, then the raw iteration does not need v_{j-1} after having computed w_j , although some schemes for improving the numerical stability would need it later on. Sometimes the subsequent Lanczos vectors are recomputed from v_1 when needed.
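The following is a compact sketch of the iteration above in Python with NumPy. It is illustrative rather than authoritative: the name lanczos, the matvec argument, and the tol cutoff are choices made here, and the step of replacing a vanishing w_{j-1} by a fresh orthogonal vector is simplified to an early stop.

    import numpy as np

    def lanczos(matvec, n, m, v1=None, tol=1e-12):
        """Plain Lanczos iteration (no reorthogonalization).

        matvec computes v -> A v for a Hermitian n x n matrix A.
        Returns V (n x k, orthonormal columns in exact arithmetic) and
        T (k x k real symmetric tridiagonal), with k <= m."""
        rng = np.random.default_rng()
        v = v1 if v1 is not None else rng.standard_normal(n)
        v = v / np.linalg.norm(v)                 # v_1, Euclidean norm 1

        V = [v]
        alphas, betas = [], []

        w = matvec(v)                             # w_1' = A v_1
        alpha = np.vdot(w, v).real                # alpha_1 = w_1'^* v_1 (real for Hermitian A)
        w = w - alpha * v                         # w_1 = w_1' - alpha_1 v_1
        alphas.append(alpha)

        for _ in range(1, m):
            beta = np.linalg.norm(w)              # beta_j = ||w_{j-1}||
            if beta < tol:                        # invariant subspace found; stop early
                break
            v_new = w / beta                      # v_j = w_{j-1} / beta_j
            w = matvec(v_new)                     # w_j' = A v_j
            alpha = np.vdot(w, v_new).real        # alpha_j = w_j'^* v_j
            w = w - alpha * v_new - beta * V[-1]  # w_j = w_j' - alpha_j v_j - beta_j v_{j-1}
            V.append(v_new)
            alphas.append(alpha)
            betas.append(beta)

        V = np.column_stack(V)
        T = np.diag(alphas) + np.diag(betas, 1) + np.diag(betas, -1)
        return V, T

The later sketches below reuse this hypothetical lanczos() helper.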


Application to the eigenproblem

The Lanczos algorithm is most often brought up in the context of finding the eigenvalues and eigenvectors of a matrix, but whereas an ordinary diagonalization of a matrix would make eigenvectors and eigenvalues apparent from inspection, the same is not true for the tridiagonalization performed by the Lanczos algorithm; nontrivial additional steps are needed to compute even a single eigenvalue or eigenvector. Nonetheless, applying the Lanczos algorithm is often a significant step forward in computing the eigendecomposition.

If \lambda is an eigenvalue of A, and if T x = \lambda x (x is an eigenvector of T), then y = V x is the corresponding eigenvector of A (since A y = A V x = V T V^* V x = V T I x = V T x = V (\lambda x) = \lambda V x = \lambda y). Thus the Lanczos algorithm transforms the eigendecomposition problem for A into the eigendecomposition problem for T.

1. For tridiagonal matrices, there exist a number of specialised algorithms, often with better computational complexity than general-purpose algorithms. For example, if T is an m \times m tridiagonal symmetric matrix then:
   * The continuant recursion allows computing the characteristic polynomial in O(m^2) operations, and evaluating it at a point in O(m) operations.
   * The divide-and-conquer eigenvalue algorithm can be used to compute the entire eigendecomposition of T in O(m^2) operations.
   * The Fast Multipole Method can compute all eigenvalues in just O(m \log m) operations.
2. Some general eigendecomposition algorithms, notably the QR algorithm, are known to converge faster for tridiagonal matrices than for general matrices. Asymptotic complexity of tridiagonal QR is O(m^2) just as for the divide-and-conquer algorithm (though the constant factor may be different); since the eigenvectors together have m^2 elements, this is asymptotically optimal.
3. Even algorithms whose convergence rates are unaffected by unitary transformations, such as the power method and inverse iteration, may enjoy low-level performance benefits from being applied to the tridiagonal matrix T rather than the original matrix A. Since T is very sparse with all nonzero elements in highly predictable positions, it permits compact storage with excellent performance vis-à-vis caching. Likewise, T is a real matrix with all eigenvectors and eigenvalues real, whereas A in general may have complex elements and eigenvectors, so real arithmetic is sufficient for finding the eigenvectors and eigenvalues of T.
4. If n is very large, then reducing m so that T is of a manageable size will still allow finding the more extreme eigenvalues and eigenvectors of A; in the m \ll n region, the Lanczos algorithm can be viewed as a lossy compression scheme for Hermitian matrices that emphasises preserving the extreme eigenvalues.

The combination of good performance for sparse matrices and the ability to compute several (without computing all) eigenvalues are the main reasons for choosing to use the Lanczos algorithm.
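As a concrete sketch (assuming NumPy and the hypothetical lanczos() helper defined earlier), the Ritz values and Ritz vectors of A are obtained from the small matrix T and mapped back through V:

    import numpy as np

    A = np.diag(np.arange(1.0, 201.0))            # toy Hermitian matrix with eigenvalues 1..200
    V, T = lanczos(lambda x: A @ x, n=200, m=30)  # hypothetical helper from the sketch above

    theta, S = np.linalg.eigh(T)                  # eigenpairs of the m x m tridiagonal T
    Y = V @ S                                     # Ritz vectors y = V x approximate eigenvectors of A

    # The residual norm indicates how accurate the extreme Ritz pair already is.
    y, th = Y[:, -1], theta[-1]
    print(th, np.linalg.norm(A @ y - th * y))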


Application to tridiagonalization

Though the eigenproblem is often the motivation for applying the Lanczos algorithm, the operation the algorithm primarily performs is tridiagonalization of a matrix, for which numerically stable Householder transformations have been favoured since the 1950s. During the 1960s the Lanczos algorithm was disregarded. Interest in it was rejuvenated by the Kaniel–Paige convergence theory and the development of methods to prevent numerical instability, but the Lanczos algorithm remains the alternative algorithm that one tries only if Householder is not satisfactory. Aspects in which the two algorithms differ include:
* Lanczos takes advantage of A being a sparse matrix, whereas Householder does not, and will generate fill-in.
* Lanczos works throughout with the original matrix A (and has no problem with it being known only implicitly), whereas raw Householder wants to modify the matrix during the computation (although that can be avoided).
* Each iteration of the Lanczos algorithm produces another column of the final transformation matrix V, whereas an iteration of Householder produces another factor in a unitary factorisation Q_1 Q_2 \dots Q_n of V. Each factor is however determined by a single vector, so the storage requirements are the same for both algorithms, and V = Q_1 Q_2 \dots Q_n can be computed in O(n^3) time.
* Householder is numerically stable, whereas raw Lanczos is not.
* Lanczos is highly parallel, with only O(n) points of synchronisation (the computations of \alpha_j and \beta_j). Householder is less parallel, having a sequence of O(n^2) scalar quantities computed that each depend on the previous quantity in the sequence.
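For comparison with the Householder route, SciPy's scipy.linalg.hessenberg reduces a matrix to Hessenberg form by Householder reflections; for a symmetric input the result is tridiagonal. This is only an illustrative sketch (the test matrix is arbitrary), showing the key contrast: the reduction works on an explicitly stored copy of A rather than through matrix–vector products.

    import numpy as np
    from scipy.linalg import hessenberg, eigvalsh

    rng = np.random.default_rng(3)
    B = rng.standard_normal((8, 8))
    A = B + B.T                                   # small real symmetric test matrix

    H, Q = hessenberg(A, calc_q=True)             # A = Q H Q^T via Householder reflections
    print(np.allclose(np.triu(H, 2), 0.0))        # True: H is tridiagonal for symmetric A
    print(np.allclose(eigvalsh(H), eigvalsh(A)))  # True: same spectrum as A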


Derivation of the algorithm

There are several lines of reasoning which lead to the Lanczos algorithm.


A more provident power method

The power method for finding the eigenvalue of largest magnitude and a corresponding eigenvector of a matrix A is roughly:
1. Pick a random vector u_1 \neq 0 .
2. For j \geqslant 1 (until the direction of u_j has converged) do:
   1. Let u_{j+1}' = A u_j .
   2. Let u_{j+1} = u_{j+1}' / \| u_{j+1}' \| .
* In the large j limit, u_j approaches the normed eigenvector corresponding to the largest magnitude eigenvalue.

A critique that can be raised against this method is that it is wasteful: it spends a lot of work (the matrix–vector products in step 2.1) extracting information from the matrix A, but pays attention only to the very last result; implementations typically use the same variable for all the vectors u_j, having each new iteration overwrite the results from the previous one. What if we instead kept all the intermediate results and organised their data?

One piece of information that trivially is available from the vectors u_j is a chain of Krylov subspaces. One way of stating that without introducing sets into the algorithm is to claim that it computes

a subset \{ v_j \}_{j=1}^m of a basis of \Complex^n such that A x \in \operatorname{span}(v_1, \dotsc, v_{j+1}) for every x \in \operatorname{span}(v_1, \dotsc, v_j) and all 1 \leqslant j < m ;

this is trivially satisfied by v_j = u_j as long as u_j is linearly independent of u_1, \dotsc, u_{j-1} (and in the case that there is such a dependence then one may continue the sequence by picking as v_j an arbitrary vector linearly independent of u_1, \dotsc, u_{j-1} ). A basis containing the u_j vectors is however likely to be numerically ill-conditioned, since this sequence of vectors is by design meant to converge to an eigenvector of A. To avoid that, one can combine the power iteration with a Gram–Schmidt process, to instead produce an orthonormal basis of these Krylov subspaces.

1. Pick a random vector u_1 of Euclidean norm 1. Let v_1 = u_1 .
2. For j = 1, \dotsc, m-1 do:
   1. Let u_{j+1}' = A u_j .
   2. For all k = 1, \dotsc, j let g_{k,j} = v_k^* u_{j+1}' . (These are the coordinates of A u_j = u_{j+1}' with respect to the basis vectors v_1, \dotsc, v_j .)
   3. Let w_{j+1} = u_{j+1}' - \sum_{k=1}^j g_{k,j} v_k . (Cancel the component of u_{j+1}' that is in \operatorname{span}(v_1, \dotsc, v_j) .)
   4. If w_{j+1} \neq 0 then let u_{j+1} = u_{j+1}' / \| u_{j+1}' \| and v_{j+1} = w_{j+1} / \| w_{j+1} \| ,
      otherwise pick as u_{j+1} = v_{j+1} an arbitrary vector of Euclidean norm 1 that is orthogonal to all of v_1, \dotsc, v_j .

The relation between the power iteration vectors u_j and the orthogonal vectors v_j is that

A u_j = \| u_{j+1}' \| u_{j+1} = u_{j+1}' = w_{j+1} + \sum_{k=1}^j g_{k,j} v_k = \| w_{j+1} \| v_{j+1} + \sum_{k=1}^j g_{k,j} v_k .

Here it may be observed that we do not actually need the u_j vectors to compute these v_j, because u_j - v_j \in \operatorname{span}(v_1, \dotsc, v_{j-1}) and therefore the difference between u_{j+1}' = A u_j and w_{j+1}' = A v_j is in \operatorname{span}(v_1, \dotsc, v_j), which is cancelled out by the orthogonalisation process. Thus the same basis for the chain of Krylov subspaces is computed by

1. Pick a random vector v_1 of Euclidean norm 1.
2. For j = 1, \dotsc, m-1 do:
   1. Let w_{j+1}' = A v_j .
   2. For all k = 1, \dotsc, j let h_{k,j} = v_k^* w_{j+1}' .
   3. Let w_{j+1} = w_{j+1}' - \sum_{k=1}^j h_{k,j} v_k .
   4. Let h_{j+1,j} = \| w_{j+1} \| .
   5. If h_{j+1,j} \neq 0 then let v_{j+1} = w_{j+1} / h_{j+1,j} ,
      otherwise pick as v_{j+1} an arbitrary vector of Euclidean norm 1 that is orthogonal to all of v_1, \dotsc, v_j .

A priori the coefficients h_{k,j} satisfy

A v_j = \sum_{k=1}^{j+1} h_{k,j} v_k for all j < m ;

the definition h_{j+1,j} = \| w_{j+1} \| may seem a bit odd, but fits the general pattern h_{k,j} = v_k^* w_{j+1}' since

v_{j+1}^* w_{j+1}' = v_{j+1}^* w_{j+1} = \| w_{j+1} \| v_{j+1}^* v_{j+1} = \| w_{j+1} \| .

Because the power iteration vectors u_j that were eliminated from this recursion satisfy u_j \in \operatorname{span}(v_1, \ldots, v_j), the vectors \{ v_j \}_{j=1}^m and coefficients h_{k,j} contain enough information from A that all of u_1, \ldots, u_m can be computed, so nothing was lost by switching vectors. (Indeed, it turns out that the data collected here give significantly better approximations of the largest eigenvalue than one gets from an equal number of iterations in the power method, although that is not necessarily obvious at this point.)

This last procedure is the Arnoldi iteration. The Lanczos algorithm then arises as the simplification one gets from eliminating calculation steps that turn out to be trivial when A is Hermitian; in particular most of the h_{k,j} coefficients turn out to be zero. Elementarily, if A is Hermitian then

h_{k,j} = v_k^* w_{j+1}' = v_k^* A v_j = v_k^* A^* v_j = (A v_k)^* v_j .

For k < j-1 we know that A v_k \in \operatorname{span}(v_1, \ldots, v_{k+1}) , and since v_j by construction is orthogonal to this subspace, this inner product must be zero. (This is essentially also the reason why sequences of orthogonal polynomials can always be given a three-term recurrence relation.) For k = j-1 one gets

h_{j-1,j} = (A v_{j-1})^* v_j = \overline{v_j^* A v_{j-1}} = \overline{h_{j,j-1}} = h_{j,j-1}

since the latter is real on account of being the norm of a vector. For k = j one gets

h_{j,j} = (A v_j)^* v_j = \overline{v_j^* A v_j} = \overline{h_{j,j}} ,

meaning this is real too. More abstractly, if V is the matrix with columns v_1, \ldots, v_m then the numbers h_{k,j} can be identified as elements of the matrix H = V^* A V , and h_{k,j} = 0 for k > j+1 ; the matrix H is upper Hessenberg. Since

H^* = \left( V^* A V \right)^* = V^* A^* V = V^* A V = H

the matrix H is Hermitian. This implies that H is also lower Hessenberg, so it must in fact be tridiagonal.
Being Hermitian, its main diagonal is real, and since its first subdiagonal is real by construction, the same is true for its first superdiagonal. Therefore, H is a real, symmetric matrix: the matrix T of the Lanczos algorithm specification.
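Below is a minimal Python sketch of the Arnoldi iteration just described (assuming NumPy; the function name arnoldi and its signature are choices made here). Running it on a Hermitian matrix illustrates the simplification: the computed H = V^* A V is, up to roundoff, real symmetric tridiagonal.

    import numpy as np

    def arnoldi(matvec, v1, m):
        """Arnoldi iteration: orthonormal basis V of the Krylov subspace and
        the upper Hessenberg matrix H = V^* A V (sketch, no safeguards)."""
        n = v1.shape[0]
        V = np.zeros((n, m), dtype=complex)
        H = np.zeros((m, m), dtype=complex)
        V[:, 0] = v1 / np.linalg.norm(v1)
        for j in range(m - 1):
            w = matvec(V[:, j])                   # w_{j+1}' = A v_j
            for k in range(j + 1):                # Gram-Schmidt against v_1..v_j
                H[k, j] = np.vdot(V[:, k], w)     # h_{k,j} = v_k^* w_{j+1}'
                w = w - H[k, j] * V[:, k]
            H[j + 1, j] = np.linalg.norm(w)       # h_{j+1,j} = ||w_{j+1}||
            if H[j + 1, j] == 0:
                break                             # invariant subspace reached
            V[:, j + 1] = w / H[j + 1, j]
        w = matvec(V[:, m - 1])                   # fill in the last column of H
        for k in range(m):
            H[k, m - 1] = np.vdot(V[:, k], w)
        return V, H

    rng = np.random.default_rng(0)
    B = rng.standard_normal((6, 6))
    A = B + B.T                                   # real symmetric, hence Hermitian
    V, H = arnoldi(lambda x: A @ x, rng.standard_normal(6).astype(complex), 6)
    print(np.max(np.abs(np.triu(H, 2))))          # ~0: entries above the first superdiagonal vanish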


Simultaneous approximation of extreme eigenvalues

One way of characterising the eigenvectors of a Hermitian matrix A is as stationary points of the Rayleigh quotient

r(x) = \frac{x^* A x}{x^* x} , \qquad x \in \Complex^n .

In particular, the largest eigenvalue \lambda_\max is the global maximum of r and the smallest eigenvalue \lambda_\min is the global minimum of r.

Within a low-dimensional subspace \mathcal{L} of \Complex^n it can be feasible to locate the maximum x and minimum y of r. Repeating that for an increasing chain \mathcal{L}_1 \subset \mathcal{L}_2 \subset \cdots produces two sequences of vectors: x_1, x_2, \ldots and y_1, y_2, \dotsc such that x_j, y_j \in \mathcal{L}_j and

\begin{align} r(x_1) &\leqslant r(x_2) \leqslant \cdots \leqslant \lambda_\max \\ r(y_1) &\geqslant r(y_2) \geqslant \cdots \geqslant \lambda_\min \end{align}

The question then arises how to choose the subspaces so that these sequences converge at optimal rate.

From x_j, the optimal direction in which to seek larger values of r is that of the gradient \nabla r(x_j), and likewise from y_j the optimal direction in which to seek smaller values of r is that of the negative gradient -\nabla r(y_j). In general

\nabla r(x) = \frac{2}{x^* x} ( A x - r(x) x ) ,

so the directions of interest are easy enough to compute in matrix arithmetic, but if one wishes to improve on both x_j and y_j then there are two new directions to take into account: A x_j and A y_j ; since x_j and y_j can be linearly independent vectors (indeed, are close to orthogonal), one cannot in general expect A x_j and A y_j to be parallel. Is it therefore necessary to increase the dimension of \mathcal{L}_j by 2 on every step? Not if \{ \mathcal{L}_j \}_{j=1}^m are taken to be Krylov subspaces, because then A z \in \mathcal{L}_{j+1} for all z \in \mathcal{L}_j , thus in particular for both z = x_j and z = y_j .

In other words, we can start with some arbitrary initial vector x_1 = y_1 , construct the vector spaces

\mathcal{L}_j = \operatorname{span}( x_1, A x_1, \ldots, A^{j-1} x_1 )

and then seek x_j, y_j \in \mathcal{L}_j such that

r(x_j) = \max_{z \in \mathcal{L}_j} r(z) \qquad \text{and} \qquad r(y_j) = \min_{z \in \mathcal{L}_j} r(z) .

Since the jth power method iterate u_j belongs to \mathcal{L}_j , it follows that an iteration to produce the x_j and y_j cannot converge slower than that of the power method, and will achieve more by approximating both eigenvalue extremes. For the subproblem of optimising r on some \mathcal{L}_j , it is convenient to have an orthonormal basis \{ v_1, \ldots, v_j \} for this vector space. Thus we are again led to the problem of iteratively computing such a basis for the sequence of Krylov subspaces.
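A short numerical illustration of this simultaneous approximation, reusing the hypothetical lanczos() helper sketched earlier (the test matrix here is arbitrary):

    import numpy as np

    A = np.diag(np.linspace(-1.0, 1.0, 500))      # toy Hermitian matrix with spectrum in [-1, 1]
    V, T = lanczos(lambda x: A @ x, n=500, m=25)  # hypothetical helper from the earlier sketch

    theta = np.linalg.eigvalsh(T)                 # ascending; theta[-1] is the max of r on the Krylov subspace
    print(theta[0], theta[-1])                    # theta[0] the min; they move toward -1 and +1 together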


Convergence and other dynamics

When analysing the dynamics of the algorithm, it is convenient to take the eigenvalues and eigenvectors of A as given, even though they are not explicitly known to the user. To fix notation, let \lambda_1 \geqslant \lambda_2 \geqslant \dotsb \geqslant \lambda_n be the eigenvalues (these are known to all be real, and thus possible to order) and let z_1, \dotsc, z_n be an orthonormal set of eigenvectors such that A z_k = \lambda_k z_k for all k = 1, \dotsc, n .

It is also convenient to fix a notation for the coefficients of the initial Lanczos vector v_1 with respect to this eigenbasis; let d_k = z_k^* v_1 for all k = 1, \dotsc, n , so that \textstyle v_1 = \sum_{k=1}^n d_k z_k . A starting vector v_1 depleted of some eigencomponent will delay convergence to the corresponding eigenvalue, and even though this just comes out as a constant factor in the error bounds, depletion remains undesirable. One common technique for avoiding being consistently hit by it is to pick v_1 by first drawing the elements randomly according to the same normal distribution with mean 0 and then rescale the vector to norm 1. Prior to the rescaling, this causes the coefficients d_k to also be independent normally distributed stochastic variables from the same normal distribution (since the change of coordinates is unitary), and after rescaling the vector (d_1, \dotsc, d_n) will have a uniform distribution on the unit sphere in \Complex^n . This makes it possible to bound the probability that for example |d_1| < \varepsilon .

The fact that the Lanczos algorithm is coordinate-agnostic – operations only look at inner products of vectors, never at individual elements of vectors – makes it easy to construct examples with known eigenstructure to run the algorithm on: make A a diagonal matrix with the desired eigenvalues on the diagonal; as long as the starting vector v_1 has enough nonzero elements, the algorithm will output a general tridiagonal symmetric matrix as T.
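Both points lend themselves to a small sketch (assuming NumPy and the hypothetical lanczos() helper from earlier): draw a random unit-norm starting vector and run the iteration on a diagonal matrix with a known spectrum.

    import numpy as np

    rng = np.random.default_rng()
    n = 500

    # Starting vector: i.i.d. normal entries, rescaled to norm 1, so its
    # direction is uniformly distributed on the unit sphere.
    v1 = rng.standard_normal(n)
    v1 /= np.linalg.norm(v1)

    # Known eigenstructure: eigenvalues on the diagonal, eigenvectors the
    # standard basis vectors, hence d_k = v1[k].
    eigenvalues = np.sort(rng.uniform(0.0, 10.0, n))[::-1]
    A = np.diag(eigenvalues)

    V, T = lanczos(lambda x: A @ x, n=n, m=25, v1=v1)   # hypothetical helper
    print(np.linalg.eigvalsh(T)[-1], eigenvalues[0])    # largest Ritz value vs. lambda_1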


Kaniel–Paige convergence theory

After m iteration steps of the Lanczos algorithm, T is an m \times m real symmetric matrix, that similarly to the above has m eigenvalues \theta_1 \geqslant \theta_2 \geqslant \dots \geqslant \theta_m . By convergence is primarily understood the convergence of \theta_1 to \lambda_1 (and the symmetrical convergence of \theta_m to \lambda_n ) as m grows, and secondarily the convergence of some range \theta_1, \ldots, \theta_k of eigenvalues of T to their counterparts \lambda_1, \ldots, \lambda_k of A. The convergence for the Lanczos algorithm is often orders of magnitude faster than that for the power iteration algorithm.

The bounds for \theta_1 come from the above interpretation of eigenvalues as extreme values of the Rayleigh quotient r(x). Since \lambda_1 is a priori the maximum of r on the whole of \Complex^n , whereas \theta_1 is merely the maximum on an m-dimensional Krylov subspace, we trivially get \lambda_1 \geqslant \theta_1 . Conversely, any point x in that Krylov subspace provides a lower bound r(x) for \theta_1 , so if a point can be exhibited for which \lambda_1 - r(x) is small then this provides a tight bound on \theta_1 .

The dimension-m Krylov subspace is

\operatorname{span}\left\{ v_1, A v_1, A^2 v_1, \ldots, A^{m-1} v_1 \right\},

so any element of it can be expressed as p(A) v_1 for some polynomial p of degree at most m-1 ; the coefficients of that polynomial are simply the coefficients in the linear combination of the vectors v_1, A v_1, A^2 v_1, \ldots, A^{m-1} v_1 . The polynomial we want will turn out to have real coefficients, but for the moment we should allow also for complex coefficients, and we will write p^* for the polynomial obtained by complex conjugating all coefficients of p. In this parametrisation of the Krylov subspace, we have

r(p(A) v_1) = \frac{ (p(A) v_1)^* A \, p(A) v_1 }{ (p(A) v_1)^* \, p(A) v_1 } = \frac{ v_1^* p(A)^* A \, p(A) v_1 }{ v_1^* p(A)^* p(A) v_1 } = \frac{ v_1^* p^*(A^*) A \, p(A) v_1 }{ v_1^* p^*(A^*) p(A) v_1 } = \frac{ v_1^* p^*(A) A \, p(A) v_1 }{ v_1^* p^*(A) p(A) v_1 }

Using now the expression for v_1 as a linear combination of eigenvectors, we get

A v_1 = A \sum_{k=1}^n d_k z_k = \sum_{k=1}^n d_k \lambda_k z_k

and more generally

q(A) v_1 = \sum_{k=1}^n d_k q(\lambda_k) z_k

for any polynomial q. Thus

\lambda_1 - r(p(A) v_1) = \lambda_1 - \frac{ v_1^* p^*(A) A \, p(A) v_1 }{ v_1^* p^*(A) p(A) v_1 } = \lambda_1 - \frac{ \sum_{k=1}^n |d_k|^2 \lambda_k |p(\lambda_k)|^2 }{ \sum_{k=1}^n |d_k|^2 |p(\lambda_k)|^2 } = \frac{ \sum_{k=2}^n |d_k|^2 (\lambda_1 - \lambda_k) |p(\lambda_k)|^2 }{ \sum_{k=1}^n |d_k|^2 |p(\lambda_k)|^2 }.

A key difference between numerator and denominator here is that the k=1 term vanishes in the numerator, but not in the denominator. Thus if one can pick p to be large at \lambda_1 but small at all other eigenvalues, one will get a tight bound on the error \lambda_1 - \theta_1 .

Since A has many more eigenvalues than p has coefficients, this may seem a tall order, but one way to meet it is to use Chebyshev polynomials. Writing c_k for the degree k Chebyshev polynomial of the first kind (that which satisfies c_k(\cos x) = \cos(kx) for all x), we have a polynomial which stays in the range [-1,1] on the known interval [-1,1] but grows rapidly outside it. With some scaling of the argument, we can have it map all eigenvalues except \lambda_1 into [-1,1] . Let

p(x) = c_{m-1}\left( \frac{2x - \lambda_2 - \lambda_n}{\lambda_2 - \lambda_n} \right)

(in case \lambda_2 = \lambda_1 , use instead the largest eigenvalue strictly less than \lambda_1 ), then the maximal value of |p(\lambda_k)|^2 for k \geqslant 2 is 1 and the minimal value is 0, so

\lambda_1 - \theta_1 \leqslant \lambda_1 - r(p(A) v_1) = \frac{ \sum_{k=2}^n |d_k|^2 (\lambda_1 - \lambda_k) |p(\lambda_k)|^2 }{ \sum_{k=1}^n |d_k|^2 |p(\lambda_k)|^2 } \leqslant \frac{ \sum_{k=2}^n |d_k|^2 (\lambda_1 - \lambda_k) }{ |d_1|^2 |p(\lambda_1)|^2 } \leqslant \frac{ \left(1 - |d_1|^2\right) (\lambda_1 - \lambda_n) }{ |d_1|^2 |p(\lambda_1)|^2 }.

Furthermore

p(\lambda_1) = c_{m-1}\left( \frac{2\lambda_1 - \lambda_2 - \lambda_n}{\lambda_2 - \lambda_n} \right) = c_{m-1}\left( 2\frac{\lambda_1 - \lambda_2}{\lambda_2 - \lambda_n} + 1 \right);

the quantity

\rho = \frac{\lambda_1 - \lambda_2}{\lambda_2 - \lambda_n}

(i.e., the ratio of the first eigengap to the diameter of the rest of the spectrum) is thus of key importance for the convergence rate here. Also writing

R = e^{\operatorname{arcosh}(1 + 2\rho)} = 1 + 2\rho + 2\sqrt{\rho^2 + \rho},

we may conclude that

\begin{align} \lambda_1 - \theta_1 &\leqslant \frac{ \left(1 - |d_1|^2\right) (\lambda_1 - \lambda_n) }{ |d_1|^2 \, c_{m-1}(2\rho + 1)^2 } \\ &= \frac{1 - |d_1|^2}{|d_1|^2} (\lambda_1 - \lambda_n) \frac{1}{c_{m-1}(2\rho + 1)^2} \\ &= \frac{1 - |d_1|^2}{|d_1|^2} (\lambda_1 - \lambda_n) \frac{4}{\left( R^{m-1} + R^{-(m-1)} \right)^2} \\ &\leqslant 4 \frac{1 - |d_1|^2}{|d_1|^2} (\lambda_1 - \lambda_n) R^{-2(m-1)} \end{align}

The convergence rate is thus controlled chiefly by R, since this bound shrinks by a factor R^{-2} for each extra iteration.

For comparison, one may consider how the convergence rate of the power method depends on \rho , but since the power method primarily is sensitive to the quotient between absolute values of the eigenvalues, we need |\lambda_n| \leqslant |\lambda_2| for the eigengap between \lambda_1 and \lambda_2 to be the dominant one. Under that constraint, the case that most favours the power method is that \lambda_n = -\lambda_2 , so consider that. Late in the power method, the iteration vector is

u = (1 - t^2)^{1/2} z_1 + t z_2 \approx z_1 + t z_2,

where each new iteration effectively multiplies the z_2-amplitude t by

\frac{\lambda_2}{\lambda_1} = \frac{\lambda_2}{\lambda_2 + (\lambda_1 - \lambda_2)} = \frac{1}{1 + \frac{\lambda_1 - \lambda_2}{\lambda_2}} = \frac{1}{1 + 2\rho}.

The estimate of the largest eigenvalue is then

u^* A u = (1 - t^2) \lambda_1 + t^2 \lambda_2,

so the above bound for the Lanczos algorithm convergence rate should be compared to

\lambda_1 - u^* A u = (\lambda_1 - \lambda_2) t^2,

which shrinks by a factor of (1 + 2\rho)^{-2} for each iteration. The difference thus boils down to that between 1 + 2\rho and R = 1 + 2\rho + 2\sqrt{\rho^2 + \rho} . In the \rho \gg 1 region, the latter is more like 1 + 4\rho , and performs like the power method would with an eigengap twice as large; a notable improvement. The more challenging case is however that of \rho \ll 1 , in which R \approx 1 + 2\sqrt{\rho} is an even larger improvement on the eigengap; the \rho \gg 1 region is where the Lanczos algorithm convergence-wise makes the ''smallest'' improvement on the power method.
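A small hedged experiment comparing the two rates, again reusing the hypothetical lanczos() helper sketched earlier (the spectrum is constructed so that \lambda_n = -\lambda_2 , the case most favourable to the power method, giving \rho = 0.5 here):

    import numpy as np

    rng = np.random.default_rng(1)
    lam = np.concatenate(([2.0], np.linspace(-1.0, 1.0, 499)))   # lambda_1 = 2, rest in [-1, 1]
    A = np.diag(lam)
    n = lam.size

    v1 = rng.standard_normal(n)
    v1 /= np.linalg.norm(v1)

    for m in (5, 10, 20):
        V, T = lanczos(lambda x: A @ x, n=n, m=m, v1=v1)         # uses m matrix-vector products
        theta1 = np.linalg.eigvalsh(T)[-1]

        u = v1.copy()
        for _ in range(m):                                       # same matvec budget for the power method
            u = A @ u
            u /= np.linalg.norm(u)

        print(m, lam[0] - theta1, lam[0] - u @ A @ u)            # Lanczos error vs. power-method error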


Numerical stability

Stability means how much the algorithm will be affected (i.e. whether it will produce a result close to the exact one) if small numerical errors are introduced and accumulated. Numerical stability is the central criterion for judging the usefulness of implementing an algorithm on a computer with roundoff.

For the Lanczos algorithm, it can be proved that with ''exact arithmetic'', the set of vectors v_1, v_2, \cdots, v_m constructs an ''orthonormal'' basis, and the eigenvalues/vectors solved are good approximations to those of the original matrix. However, in practice (as the calculations are performed in floating point arithmetic where inaccuracy is inevitable), the orthogonality is quickly lost and in some cases the new vector could even be linearly dependent on the set that is already constructed. As a result, some of the eigenvalues of the resultant tridiagonal matrix may not be approximations to those of the original matrix. Therefore, the Lanczos algorithm is not very stable.

Users of this algorithm must be able to find and remove those "spurious" eigenvalues. Practical implementations of the Lanczos algorithm go in three directions to fight this stability issue:
1. Prevent the loss of orthogonality,
2. Recover the orthogonality after the basis is generated,
3. After the good and "spurious" eigenvalues are all identified, remove the spurious ones.
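The loss of orthogonality is easy to observe numerically. The sketch below (assuming NumPy and the hypothetical lanczos() helper from earlier, which does no reorthogonalization) measures \| V^* V - I \| for a matrix whose largest eigenvalue converges quickly; in exact arithmetic this quantity would be zero, while in floating point it typically becomes large. The helper reorthogonalize() illustrates the idea behind the first remedy: re-project each new w against all previously computed Lanczos vectors before normalizing.

    import numpy as np

    lam = np.concatenate(([1.0e6], np.linspace(1.0, 100.0, 299)))  # one dominant, well-separated eigenvalue
    A = np.diag(lam)

    V, T = lanczos(lambda x: A @ x, n=300, m=50)                   # hypothetical helper, no reorthogonalization
    print(np.linalg.norm(V.conj().T @ V - np.eye(V.shape[1])))     # typically far from 0: orthogonality is lost

    def reorthogonalize(w, V):
        """Full reorthogonalization step: remove components of w along the
        columns of V (in practice often applied twice per iteration)."""
        return w - V @ (V.conj().T @ w)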


Variations

Variations on the Lanczos algorithm exist where the vectors involved are tall, narrow matrices instead of vectors and the normalizing constants are small square matrices. These are called "block" Lanczos algorithms and can be much faster on computers with large numbers of registers and long memory-fetch times.

Many implementations of the Lanczos algorithm restart after a certain number of iterations. One of the most influential restarted variations is the implicitly restarted Lanczos method, which is implemented in ARPACK. This has led to a number of other restarted variations such as restarted Lanczos bidiagonalization. Another successful restarted variation is the Thick-Restart Lanczos method, which has been implemented in a software package called TRLan.


Nullspace over a finite field

In 1995, Peter Montgomery published an algorithm, based on the Lanczos algorithm, for finding elements of the nullspace of a large sparse matrix over GF(2); since the set of people interested in large sparse matrices over finite fields and the set of people interested in large eigenvalue problems scarcely overlap, this is often also called the ''block Lanczos algorithm'' without causing unreasonable confusion.


Applications

Lanczos algorithms are very attractive because the multiplication by A is the only large-scale linear operation. Since weighted-term text retrieval engines implement just this operation, the Lanczos algorithm can be applied efficiently to text documents (see latent semantic indexing). Eigenvectors are also important for large-scale ranking methods such as the HITS algorithm developed by Jon Kleinberg, or the PageRank algorithm used by Google.

Lanczos algorithms are also used in condensed matter physics as a method for solving Hamiltonians of strongly correlated electron systems, as well as in shell model codes in nuclear physics.


Implementations

The NAG Library contains several routines for the solution of large scale linear systems and eigenproblems which use the Lanczos algorithm.

MATLAB and GNU Octave come with ARPACK built-in. Both stored and implicit matrices can be analyzed through the ''eigs()'' function. Similarly, in Python, the SciPy package has scipy.sparse.linalg.eigsh, which is a wrapper for the SSEUPD and DSEUPD functions from ARPACK that use the Implicitly Restarted Lanczos Method.

A Matlab implementation of the Lanczos algorithm (note precision issues) is available as a part of the Gaussian Belief Propagation Matlab Package. The GraphLab collaborative filtering library incorporates a large scale parallel implementation of the Lanczos algorithm (in C++) for multicore. The PRIMME library also implements a Lanczos-like algorithm.
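A minimal usage sketch of the SciPy routine mentioned above (the sparse test matrix and the parameter values are arbitrary examples):

    import numpy as np
    from scipy.sparse import diags
    from scipy.sparse.linalg import eigsh

    # 1-D Laplacian-like sparse symmetric matrix as a test case.
    A = diags([1.0, -2.0, 1.0], offsets=[-1, 0, 1], shape=(1000, 1000), format='csr')

    vals, vecs = eigsh(A, k=5, which='SA')   # five smallest (algebraic) eigenvalues via implicitly restarted Lanczos
    print(vals)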

