Locally Optimal Block Preconditioned Conjugate Gradient (LOBPCG) is a matrix-free method for finding the largest (or smallest) eigenvalues and the corresponding eigenvectors of a symmetric generalized eigenvalue problem

:A x = \lambda B x,

for a given pair (A, B) of complex Hermitian or real symmetric matrices, where the matrix B is also assumed positive-definite.
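A reference implementation is available in SciPy as scipy.sparse.linalg.lobpcg. The following minimal sketch, with an illustrative tridiagonal stiffness matrix A and diagonal mass matrix B chosen purely as an example, computes the three smallest eigenpairs of A x = \lambda B x:

```python
import numpy as np
from scipy.sparse import spdiags
from scipy.sparse.linalg import lobpcg

n = 1000
# A: symmetric positive-definite tridiagonal matrix (1-D Laplacian stencil)
data = np.array([-np.ones(n), 2.0 * np.ones(n), -np.ones(n)])
A = spdiags(data, [-1, 0, 1], n, n).tocsr()
# B: symmetric positive-definite diagonal "mass" matrix
B = spdiags([np.full(n, 2.0)], [0], n, n).tocsr()

rng = np.random.default_rng(0)
X = rng.standard_normal((n, 3))  # random initial block of 3 approximate eigenvectors

# largest=False requests the smallest eigenvalues of A x = lambda B x
eigenvalues, eigenvectors = lobpcg(A, X, B=B, largest=False, tol=1e-8, maxiter=200)
print(eigenvalues)
```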
Background
Kantorovich in 1948 proposed calculating the smallest eigenvalue \lambda_1 of a symmetric matrix A by steepest descent using a direction r = A x - \lambda(x) x of a scaled gradient of the Rayleigh quotient \lambda(x) = (x, Ax)/(x, x) in a scalar product (x, y) = x^T y, with the step size computed by minimizing the Rayleigh quotient in the linear span of the vectors x and r, i.e. in a locally optimal manner. Samokish proposed applying a preconditioner T to the residual vector r to generate the preconditioned direction w = T r and derived asymptotic, as x approaches the eigenvector, convergence rate bounds. D'yakonov suggested spectrally equivalent preconditioning
and derived non-asymptotic convergence rate bounds. Block locally optimal multi-step steepest descent for eigenvalue problems was described later, as was local minimization of the Rayleigh quotient on the subspace spanned by the current approximation, the current residual, and the previous approximation, together with its block version; the preconditioned version was analyzed subsequently.
Main features
* Matrix-free, i.e. does not require storing the coefficient matrix explicitly, but can access the matrix by evaluating matrix-vector products (illustrated in the sketch after this list).
* Factorization-free, i.e. does not require any matrix decomposition even for a generalized eigenvalue problem.
* The costs per iteration and the memory use are competitive with those of the Lanczos method for computing a single extreme eigenpair of a symmetric matrix.
* Linear convergence is theoretically guaranteed and practically observed.
* Accelerated convergence due to direct preconditioning, in contrast to the Lanczos method, including variable and non-symmetric as well as fixed and positive definite preconditioning.
* Allows trivial incorporation of efficient domain decomposition and multigrid techniques via preconditioning.
* Warm starts and computes an approximation to the eigenvector on every iteration.
* More numerically stable than the Lanczos method, and can operate in low-precision computer arithmetic.
* Easy to implement, with many versions already available.
* Blocking allows the use of highly efficient matrix-matrix operations, e.g., BLAS 3.
* The block size can be tuned to balance convergence speed against the computational cost of orthogonalizations and the Rayleigh–Ritz method on every iteration.
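To illustrate the matrix-free access and the preconditioning hooks, the sketch below passes SciPy LinearOperator callbacks instead of explicit matrices; the diagonal operator and its exact inverse preconditioner are artificial choices made only so the example stays self-contained:

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, lobpcg

n = 10000
d = np.arange(1, n + 1, dtype=float)  # eigenvalues of the operator: 1, 2, ..., n

# Matrix-free access: A is available only through matrix-vector products.
A = LinearOperator((n, n), matvec=lambda x: d * x.ravel(), dtype=float)

# Preconditioner approximating the inverse of A (exact here, since A is diagonal).
M = LinearOperator((n, n), matvec=lambda x: x.ravel() / d, dtype=float)

rng = np.random.default_rng(1)
X = rng.standard_normal((n, 4))  # block of 4 starting vectors; any warm start works

eigenvalues, _ = lobpcg(A, X, M=M, largest=False, tol=1e-6, maxiter=100)
print(eigenvalues)  # approximately [1, 2, 3, 4]
```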
Algorithm
Single-vector version
Preliminaries: Gradient descent for eigenvalue problems
The method performs an iterative maximization (or minimization) of the generalized Rayleigh quotient

:\rho(x) := \rho(A, B; x) := \frac{x^T A x}{x^T B x},

which results in finding the largest (or smallest) eigenpairs of A x = \lambda B x.

The direction of the steepest ascent, which is the gradient, of the generalized Rayleigh quotient is positively proportional to the vector

:r := A x - \rho(x) B x,

called the eigenvector residual. If a preconditioner T is available, it is applied to the residual and gives the vector

:w := T r,

called the preconditioned residual. Without preconditioning, we set T := I and so w := r. An iterative method

:x^{i+1} := x^i + \alpha^i T (A x^i - \rho(x^i) B x^i),

or, in short,

:x^{i+1} := x^i + \alpha^i w^i,
:w^i := T r^i,
:r^i := A x^i - \rho(x^i) B x^i,

is known as preconditioned steepest ascent (or descent), where the scalar \alpha^i is called the step size. The optimal step size can be determined by maximizing the Rayleigh quotient, i.e.,

:x^{i+1} := \arg\max_{y \in \operatorname{span}\{x^i, w^i\}} \rho(y)

(or \arg\min in the case of minimizing), in which case the method is called locally optimal.
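In code, one locally optimal step amounts to forming the basis [x, w] and solving the projected two-by-two eigenvalue problem. The NumPy sketch below is one way to write it, assuming dense matrices A, B and a preconditioner T, and using scipy.linalg.eigh for the Rayleigh–Ritz step; it minimizes the Rayleigh quotient (take the last Ritz pair instead of the first to maximize):

```python
import numpy as np
from scipy.linalg import eigh

def locally_optimal_step(A, B, T, x):
    """One locally optimal preconditioned steepest descent step
    toward the smallest eigenpair of A x = lambda B x."""
    rho = (x @ A @ x) / (x @ B @ x)  # Rayleigh quotient rho(x)
    r = A @ x - rho * (B @ x)        # eigenvector residual
    w = T @ r                        # preconditioned residual
    S = np.column_stack([x, w])      # two-dimensional trial subspace span{x, w}
    # Rayleigh-Ritz: smallest Ritz pair of the projected pencil (S^T A S, S^T B S)
    theta, C = eigh(S.T @ A @ S, S.T @ B @ S)
    x_new = S @ C[:, 0]
    return x_new / np.sqrt(x_new @ B @ x_new), theta[0]  # B-normalized iterate
```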
Three-term recurrence
To dramatically accelerate the convergence of the locally optimal preconditioned steepest ascent (or descent), one extra vector can be added to the two-term recurrence relation to make it three-term:

:x^{i+1} := \arg\max_{y \in \operatorname{span}\{x^i, w^i, x^{i-1}\}} \rho(y)

(use \arg\min in the case of minimizing). The maximization/minimization of the Rayleigh quotient in a three-dimensional subspace can be performed numerically by the Rayleigh–Ritz method. Adding more vectors, as in, e.g., Richardson extrapolation, does not result in significant acceleration but increases computation costs, so it is not generally recommended.
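Under the same illustrative assumptions as above, the three-term variant only enlarges the trial subspace by the previous iterate:

```python
import numpy as np
from scipy.linalg import eigh

def three_term_step(A, B, T, x, x_prev):
    """One three-term step: Rayleigh-Ritz on span{x, w, x_prev},
    minimizing the Rayleigh quotient of the pencil (A, B)."""
    rho = (x @ A @ x) / (x @ B @ x)
    r = A @ x - rho * (B @ x)
    w = T @ r
    S = np.column_stack([x, w, x_prev])  # three-dimensional trial subspace
    theta, C = eigh(S.T @ A @ S, S.T @ B @ S)
    x_new = S @ C[:, 0]
    return x_new / np.sqrt(x_new @ B @ x_new), theta[0]
```

As written, the basis S becomes ill-conditioned near convergence, because x and x_prev almost coincide; this motivates the substitution described next.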
Numerical stability improvements
As the iterations converge, the vectors x^i and x^{i-1} become nearly linearly dependent, resulting in a precision loss and making the Rayleigh–Ritz method numerically unstable in the presence of round-off errors. The loss of precision may be avoided by substituting the vector x^{i-1} with a vector p^i, which may be further away from x^i, in the basis of the three-dimensional subspace \operatorname{span}\{x^i, w^i, x^{i-1}\}, while keeping the subspace unchanged and avoiding orthogonalization or any other extra operations. Furthermore, orthogonalizing the basis of the three-dimensional subspace may be needed for ill-conditioned eigenvalue problems to improve stability and attainable accuracy.
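One common way to realize this substitution (a sketch of one formulation, not the only one used in practice) keeps, alongside the new iterate x^{i+1} = a x^i + b w^i + c p^i obtained from the Rayleigh–Ritz coefficients, the direction p^{i+1} := b w^i + c p^i. Since x^i is a linear combination of x^{i+1} and p^{i+1} whenever a \ne 0, the three-dimensional subspace is unchanged, while p^{i+1} stays well separated from x^{i+1}:

```python
import numpy as np
from scipy.linalg import eigh

def lobpcg_single_step(A, B, T, x, p):
    """Single-vector LOBPCG step with the stabilized direction p
    replacing the previous iterate x_prev in the trial basis."""
    rho = (x @ A @ x) / (x @ B @ x)
    r = A @ x - rho * (B @ x)
    w = T @ r
    # On the first step there is no direction yet; use the 2-D subspace.
    S = np.column_stack([x, w]) if p is None else np.column_stack([x, w, p])
    theta, C = eigh(S.T @ A @ S, S.T @ B @ S)
    c = C[:, 0]               # coefficients of the smallest Ritz vector
    x_new = S @ c
    p_new = S[:, 1:] @ c[1:]  # drop the x-component: same subspace, better separation
    nrm = np.sqrt(x_new @ B @ x_new)
    return x_new / nrm, p_new / nrm, theta[0]
```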
Krylov subspace analogs
This is a single-vector version of the LOBPCG method, one possible generalization of the preconditioned conjugate gradient linear solvers to the case of symmetric eigenvalue problems. Even in the trivial case T = I and B = I, the resulting approximation with i > 3 will be different from that obtained by the Lanczos algorithm, although both approximations will belong to the same Krylov subspace.
Practical use scenarios
The extreme simplicity and high efficiency of the single-vector version of LOBPCG make it attractive for eigenvalue-related applications under severe hardware limitations, ranging from spectral clustering-based real-time anomaly detection via graph partitioning on embedded ASICs or FPGAs to modelling physical phenomena of record computing complexity on exascale TOP500 supercomputers.
Block version
Summary
Subsequent eigenpairs can be computed one-by-one via single-vector LOBPCG supplemented with an orthogonal deflation or simultaneously as a block. In the former approach, imprecisions in already computed approximate eigenvectors additively affect the accuracy of the subsequently computed eigenvectors, thus increasing the error with every new computation. Iterating several approximate
eigenvectors together in a block in a locally optimal fashion, as in the block version of LOBPCG, allows fast, accurate, and robust computation of eigenvectors, including those corresponding to nearly-multiple eigenvalues, where the single-vector LOBPCG suffers from slow convergence. The block size can be tuned to balance numerical stability vs. convergence speed vs. computer costs of orthogonalizations and the Rayleigh–Ritz method on every iteration.
Core design
The block approach in LOBPCG replaces the single vectors x^i, w^i, and p^i with block vectors, i.e. matrices X^i, W^i, and P^i, where, e.g., every column of X^i approximates one of the eigenvectors. All columns are iterated simultaneously, and the next matrix of approximate eigenvectors X^{i+1} is determined by the Rayleigh–Ritz method on the subspace spanned by all columns of the matrices X^i, W^i, and P^i. Each column of W^i is computed simply as the preconditioned residual for the corresponding column of X^i. The matrix P^i is determined such that the subspaces spanned by the columns of [X^i, P^i] and of [X^i, X^{i-1}] coincide, in direct analogy with the single-vector substitution of x^{i-1} by p^i.
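A compact dense sketch of one block iteration along these lines is given below; it is illustrative only, as production implementations add basis orthogonalization, locking of converged columns, and the other safeguards discussed above:

```python
import numpy as np
from scipy.linalg import eigh

def lobpcg_block_step(A, B, T, X, P):
    """One block LOBPCG iteration toward the k smallest eigenpairs
    of A X = B X Lambda, with X of shape (n, k)."""
    k = X.shape[1]
    # Per-column Rayleigh quotients, stored as a diagonal matrix Lambda
    Lam = np.diag(np.diag(X.T @ A @ X) / np.diag(X.T @ B @ X))
    R = A @ X - B @ X @ Lam          # block of eigenvector residuals
    W = T @ R                        # block of preconditioned residuals
    S = np.hstack([X, W]) if P is None else np.hstack([X, W, P])
    theta, C = eigh(S.T @ A @ S, S.T @ B @ S)
    C = C[:, :k]                     # k smallest Ritz vectors of the projected pencil
    X_new = S @ C
    P_new = S[:, k:] @ C[k:, :]      # exclude the X-block, as in the single-vector case
    return X_new, P_new, theta[:k]
```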