In mathematics, preconditioning is the application of a transformation, called the preconditioner, that conditions a given problem into a form that is more suitable for numerical solving methods. Preconditioning is typically related to reducing a

condition number In numerical analysis, the condition number of a function measures how much the output value of the function can change for a small change in the input argument. This is used to measure how sensitive a function is to changes or errors in the inpu ...

of the problem. The preconditioned problem is then usually solved by an

iterative method In computational mathematics, an iterative method is a mathematical procedure that uses an initial value to generate a sequence of improving approximate solutions for a class of problems, in which the ''n''-th approximation is derived from the pre ...

Preconditioning for linear systems

linear algebra Linear algebra is the branch of mathematics concerning linear equations such as: :a_1x_1+\cdots +a_nx_n=b, linear maps such as: :(x_1, \ldots, x_n) \mapsto a_1x_1+\cdots +a_nx_n, and their representations in vector spaces and through matrices. ...

and

numerical analysis Numerical analysis is the study of algorithms that use numerical approximation (as opposed to symbolic manipulations) for the problems of mathematical analysis (as distinguished from discrete mathematics). It is the study of numerical methods t ...

, a preconditioner

P

of a matrix

A

is a matrix such that

P^A

has a smaller

than

A

. It is also common to call

T=P^

the preconditioner, rather than

P

, since

P

itself is rarely explicitly available. In modern preconditioning, the application of

T=P^

, i.e., multiplication of a column vector, or a block of column vectors, by

T=P^

, is commonly performed in a matrix-free fashion, i.e., where neither

P

, nor

T=P^

(and often not even

A

) are explicitly available in a matrix form. Preconditioners are useful in

iterative methods In computational mathematics, an iterative method is a mathematical procedure that uses an initial value to generate a sequence of improving approximate solutions for a class of problems, in which the ''n''-th approximation is derived from the pre ...

to solve a linear system

Ax=b

for

x

since the

rate of convergence In numerical analysis, the order of convergence and the rate of convergence of a convergent sequence are quantities that represent how quickly the sequence approaches its limit. A sequence (x_n) that converges to x^* is said to have ''order of co ...

for most iterative linear solvers increases because the

of a matrix decreases as a result of preconditioning. Preconditioned iterative solvers typically outperform direct solvers, e.g.,

Gaussian elimination In mathematics, Gaussian elimination, also known as row reduction, is an algorithm for solving systems of linear equations. It consists of a sequence of operations performed on the corresponding matrix of coefficients. This method can also be used ...

, for large, especially for sparse, matrices. Iterative solvers can be used as

matrix-free methods In computational mathematics, a matrix-free method is an algorithm for solving a linear system of equations or an eigenvalue problem that does not store the coefficient matrix explicitly, but accesses the matrix by evaluating matrix-vector products. ...

, i.e. become the only choice if the coefficient matrix

A

is not stored explicitly, but is accessed by evaluating matrix-vector products.

Description

Instead of solving the original linear system above, one may consider the right preconditioned system :

AP^(Px) = b

and solve :

AP^y=b

for

y

and :

Px=y

for

x

. Alternatively, one may solve the left preconditioned system :

P^(Ax-b)=0 .

Both systems give the same solution as the original system as long as the preconditioner matrix

P

nonsingular In linear algebra, an -by- square matrix is called invertible (also nonsingular or nondegenerate), if there exists an -by- square matrix such that :\mathbf = \mathbf = \mathbf_n \ where denotes the -by- identity matrix and the multiplicati ...

. The left preconditioning is more traditional. The two-sided preconditioned system :

QAP^(Px) = Qb

may be beneficial, e.g., to preserve the matrix symmetry: if the original matrix

A

is real symmetric and real preconditioners

Q

and

P

satisfy

Q^=P^

then the preconditioned matrix

QAP^

is also symmetric. The two-sided preconditioning is common for diagonal scaling where the preconditioners

Q

and

P

are diagonal and scaling is applied both to columns and rows of the original matrix

A

, e.g., in order to decrease the dynamic range of entries of the matrix. The goal of preconditioning is reducing the

, e.g., of the left or right preconditioned system matrix

P^A

AP^

. Small condition numbers benefit fast convergence of iterative solvers and improve stability of the solution with respect to perturbations in the system matrix and the right-hand side, e.g., allowing for more aggressive quantization of the matrix entries using lower computer precision. The preconditioned matrix

P^A

AP^

is rarely explicitly formed. Only the action of applying the preconditioner solve operation

P^

to a given vector may need to be computed. Typically there is a trade-off in the choice of

P

. Since the operator

P^

must be applied at each step of the iterative linear solver, it should have a small cost (computing time) of applying the

P^

operation. The cheapest preconditioner would therefore be

P=I

since then

P^=I.

Clearly, this results in the original linear system and the preconditioner does nothing. At the other extreme, the choice

P=A

gives

P^A = AP^ = I,

which has optimal

of 1, requiring a single iteration for convergence; however in this case

P^=A^,

and applying the preconditioner is as difficult as solving the original system. One therefore chooses

P

as somewhere between these two extremes, in an attempt to achieve a minimal number of linear iterations while keeping the operator

P^

as simple as possible. Some examples of typical preconditioning approaches are detailed below.

Preconditioned iterative methods

Preconditioned iterative methods for

Ax-b=0

are, in most cases, mathematically equivalent to standard iterative methods applied to the preconditioned system

P^(Ax-b)=0.

For example, the standard

Richardson iteration Modified Richardson iteration is an iterative method for solving a system of linear equations. Richardson iteration was proposed by Lewis Fry Richardson in his work dated 1910. It is similar to the Jacobi and Gauss–Seidel method. We seek the so ...

for solving

Ax-b=0

is :

\mathbf_=\mathbf_n-\gamma_n (A\mathbf_n-\mathbf),\ n \ge 0.

Applied to the preconditioned system

P^(Ax-b)=0,

it turns into a preconditioned method :

\mathbf_=\mathbf_n-\gamma_n P^(A\mathbf_n-\mathbf),\ n \ge 0.

Examples of popular preconditioned

for linear systems include the

preconditioned conjugate gradient method In mathematics, the conjugate gradient method is an algorithm for the numerical solution of particular systems of linear equations, namely those whose matrix is positive-definite. The conjugate gradient method is often implemented as an iterati ...

, the

biconjugate gradient method In mathematics, more specifically in numerical linear algebra, the biconjugate gradient method is an algorithm to solve systems of linear equations :A x= b.\, Unlike the conjugate gradient method, this algorithm does not require the matrix A to ...

, and

generalized minimal residual method In mathematics, the generalized minimal residual method (GMRES) is an iterative method for the numerical solution of an indefinite nonsymmetric system of linear equations. The method approximates the solution by the vector in a Krylov subspace with ...

. Iterative methods, which use scalar products to compute the iterative parameters, require corresponding changes in the scalar product together with substituting

P^(Ax-b)=0

for

Ax-b=0.

Matrix splitting

A stationary iterative method is determined by the matrix splitting

A=M-N

and the iteration matrix

C=I-M^A

. Assuming that * the system matrix

A

symmetric Symmetry (from grc, συμμετρία "agreement in dimensions, due proportion, arrangement") in everyday language refers to a sense of harmonious and beautiful proportion and balance. In mathematics, "symmetry" has a more precise definit ...

positive-definite In mathematics, positive definiteness is a property of any object to which a bilinear form or a sesquilinear form may be naturally associated, which is positive-definite. See, in particular: * Positive-definite bilinear form * Positive-definite fu ...

, * the splitting matrix

M

, * the stationary iterative method is convergent, as determined by

\rho(C)<1

, the

\kappa(M^A)

is bounded above by :

\kappa(M^A) \leq \frac \,.

Geometric interpretation

For a

positive definite In mathematics, positive definiteness is a property of any object to which a bilinear form or a sesquilinear form may be naturally associated, which is positive-definite. See, in particular: * Positive-definite bilinear form * Positive-definite fun ...

matrix

A

the preconditioner

P

is typically chosen to be symmetric positive definite as well. The preconditioned operator

P^A

is then also symmetric positive definite, but with respect to the

P

-based

scalar product In mathematics, the dot product or scalar productThe term ''scalar product'' means literally "product with a scalar as a result". It is also used sometimes for other symmetric bilinear forms, for example in a pseudo-Euclidean space. is an algeb ...

. In this case, the desired effect in applying a preconditioner is to make the

quadratic form In mathematics, a quadratic form is a polynomial with terms all of degree two (" form" is another name for a homogeneous polynomial). For example, :4x^2 + 2xy - 3y^2 is a quadratic form in the variables and . The coefficients usually belong t ...

of the preconditioned operator

P^A

with respect to the

P

-based

to be nearly spherical.

Variable and non-linear preconditioning

Denoting

T=P^

, we highlight that preconditioning is practically implemented as multiplying some vector

r

T

, i.e., computing the product

Tr.

In many applications,

T

is not given as a matrix, but rather as an operator

T(r)

acting on the vector

r

. Some popular preconditioners, however, change with

r

and the dependence on

r

may not be linear. Typical examples involve using non-linear

, e.g., the

conjugate gradient method In mathematics, the conjugate gradient method is an algorithm for the numerical solution of particular systems of linear equations, namely those whose matrix is positive-definite. The conjugate gradient method is often implemented as an iterativ ...

, as a part of the preconditioner construction. Such preconditioners may be practically very efficient, however, their behavior is hard to predict theoretically.

Random preconditioning

One interesting particular case of variable preconditioning is random preconditioning, e.g.,

multigrid In numerical analysis, a multigrid method (MG method) is an algorithm for solving differential equations using a hierarchy of discretizations. They are an example of a class of techniques called multiresolution methods, very useful in problems exhi ...

preconditioning on random course grids. If used in

gradient descent In mathematics, gradient descent (also often called steepest descent) is a first-order iterative optimization algorithm for finding a local minimum of a differentiable function. The idea is to take repeated steps in the opposite direction of th ...

methods, random preconditioning can be viewed as an implementation of

stochastic gradient descent Stochastic gradient descent (often abbreviated SGD) is an iterative method for optimizing an objective function with suitable smoothness properties (e.g. differentiable or subdifferentiable). It can be regarded as a stochastic approximation of gr ...

and can lead to faster convergence, compared to fixed preconditioning, since it breaks the asymptotic "zig-zag" pattern of the

Spectrally equivalent preconditioning

The most common use of preconditioning is for iterative solution of linear systems resulting from approximations of

partial differential equations In mathematics, a partial differential equation (PDE) is an equation which imposes relations between the various partial derivatives of a multivariable function. The function is often thought of as an "unknown" to be solved for, similarly to ...

. The better the approximation quality, the larger the matrix size is. In such a case, the goal of optimal preconditioning is, on the one side, to make the spectral condition number of

P^A

to be bounded from above by a constant independent of the matrix size, which is called ''spectrally equivalent'' preconditioning by D'yakonov. On the other hand, the cost of application of the

P^

should ideally be proportional (also independent of the matrix size) to the cost of multiplication of

A

by a vector.

Examples

Jacobi (or diagonal) preconditioner

The Jacobi preconditioner is one of the simplest forms of preconditioning, in which the preconditioner is chosen to be the diagonal of the matrix

P = \mathrm(A).

Assuming

A_ \neq 0, \forall i

, we get

P^_ = \frac.

It is efficient for diagonally dominant matrices

A

. It is used in analysis softwares for beam problems or 1-D problems (EX:- STAAD PRO)

SPAI

The Sparse Approximate Inverse preconditioner minimises

\, AT-I\, _F,

where

\, \cdot\, _F

is the

Frobenius norm In mathematics, a matrix norm is a vector norm in a vector space whose elements (vectors) are matrices (of given dimensions). Preliminaries Given a field K of either real or complex numbers, let K^ be the -vector space of matrices with m row ...

and

T = P^

is from some suitably constrained set of

sparse matrices In numerical analysis and scientific computing, a sparse matrix or sparse array is a matrix in which most of the elements are zero. There is no strict definition regarding the proportion of zero-value elements for a matrix to qualify as sparse b ...

. Under the Frobenius norm, this reduces to solving numerous independent least-squares problems (one for every column). The entries in

T

must be restricted to some sparsity pattern or the problem remains as difficult and time-consuming as finding the exact inverse of

A

. The method was introduced by M.J. Grote and T. Huckle together with an approach to selecting sparsity patterns.

Other preconditioners

* Incomplete Cholesky factorization * Incomplete LU factorization *

Successive over-relaxation In numerical linear algebra, the method of successive over-relaxation (SOR) is a variant of the Gauss–Seidel method for solving a linear system of equations, resulting in faster convergence. A similar method can be used for any slowly converging ...

Symmetric successive over-relaxation In applied mathematics, symmetric successive over-relaxation (SSOR), is a preconditioner. If the original matrix can be split into diagonal, lower and upper triangular as A=D+L+L^\mathsf then the SSOR preconditioner matrix is defined as M=(D+L) D^ ...

* Multigrid preconditioning

External links

Preconditioned Conjugate Gradient
– math-linux.com

Preconditioning for eigenvalue problems

Eigenvalue problems can be framed in several alternative ways, each leading to its own preconditioning. The traditional preconditioning is based on the so-called ''spectral transformations.'' Knowing (approximately) the targeted eigenvalue, one can compute the corresponding eigenvector by solving the related homogeneous linear system, thus allowing to use preconditioning for linear system. Finally, formulating the eigenvalue problem as optimization of the

Rayleigh quotient In mathematics, the Rayleigh quotient () for a given complex Hermitian matrix ''M'' and nonzero vector ''x'' is defined as: R(M,x) = . For real matrices and vectors, the condition of being Hermitian reduces to that of being symmetric, and the con ...

brings preconditioned optimization techniques to the scene.

Spectral transformations

By analogy with linear systems, for an

eigenvalue In linear algebra, an eigenvector () or characteristic vector of a linear transformation is a nonzero vector that changes at most by a scalar factor when that linear transformation is applied to it. The corresponding eigenvalue, often denoted ...

problem

Ax = \lambda x

one may be tempted to replace the matrix

A

with the matrix

P^A

using a preconditioner

P

. However, this makes sense only if the seeking

eigenvectors In linear algebra, an eigenvector () or characteristic vector of a linear transformation is a nonzero vector that changes at most by a scalar factor when that linear transformation is applied to it. The corresponding eigenvalue, often denoted ...

A

and

P^A

are the same. This is the case for spectral transformations. The most popular spectral transformation is the so-called ''shift-and-invert'' transformation, where for a given scalar

\alpha

, called the ''shift'', the original eigenvalue problem

Ax = \lambda x

is replaced with the shift-and-invert problem

(A-\alpha I)^x = \mu x

. The eigenvectors are preserved, and one can solve the shift-and-invert problem by an iterative solver, e.g., the

power iteration In mathematics, power iteration (also known as the power method) is an eigenvalue algorithm: given a diagonalizable matrix A, the algorithm will produce a number \lambda, which is the greatest (in absolute value) eigenvalue of A, and a nonzero vec ...

. This gives the Inverse iteration, which normally converges to the eigenvector, corresponding to the eigenvalue closest to the shift

\alpha

. The Rayleigh quotient iteration is a shift-and-invert method with a variable shift. Spectral transformations are specific for eigenvalue problems and have no analogs for linear systems. They require accurate numerical calculation of the transformation involved, which becomes the main bottleneck for large problems.

General preconditioning

To make a close connection to linear systems, let us suppose that the targeted eigenvalue

\lambda_\star

is known (approximately). Then one can compute the corresponding eigenvector from the homogeneous linear system

(A-\lambda_\star I)x=0

. Using the concept of left preconditioning for linear systems, we obtain

T(A-\lambda_\star I)x=0

, where

T

is the preconditioner, which we can try to solve using the

\mathbf_=\mathbf_n-\gamma_n T(A-\lambda_\star I)\mathbf_n,\ n \ge 0.

The ''ideal'' preconditioning

The Moore–Penrose pseudoinverse

T=(A-\lambda_\star I)^+

is the preconditioner, which makes the

above converge in one step with

\gamma_n=1

, since

I-(A-\lambda_\star I)^+(A-\lambda_\star I)

, denoted by

P_\star

, is the orthogonal projector on the eigenspace, corresponding to

\lambda_\star

. The choice

T=(A-\lambda_\star I)^+

is impractical for three independent reasons. First,

\lambda_\star

is actually not known, although it can be replaced with its approximation

\tilde\lambda_\star

. Second, the exact Moore–Penrose pseudoinverse requires the knowledge of the eigenvector, which we are trying to find. This can be somewhat circumvented by the use of the Jacobi–Davidson preconditioner

T=(I-\tilde P_\star)(A-\tilde\lambda_\star I)^(I-\tilde P_\star)

, where

\tilde P_\star

approximates

P_\star

. Last, but not least, this approach requires accurate numerical solution of linear system with the system matrix

(A-\tilde\lambda_\star I)

, which becomes as expensive for large problems as the shift-and-invert method above. If the solution is not accurate enough, step two may be redundant.

Practical preconditioning

Let us first replace the theoretical value

\lambda_\star

in the

above with its current approximation

\lambda_n

to obtain a practical algorithm :

\mathbf_=\mathbf_n-\gamma_n T(A-\lambda_n I)\mathbf_n,\ n \ge 0.

A popular choice is

\lambda_n=\rho(x_n)

using the

function

\rho(\cdot)

. Practical preconditioning may be as trivial as just using

T=(diag(A))^

T=(diag(A-\lambda_n I))^.

For some classes of eigenvalue problems the efficiency of

T\approx A^

has been demonstrated, both numerically and theoretically. The choice

T\approx A^

allows one to easily utilize for eigenvalue problems the vast variety of preconditioners developed for linear systems. Due to the changing value

\lambda_n

, a comprehensive theoretical convergence analysis is much more difficult, compared to the linear systems case, even for the simplest methods, such as the

External links

Templates for the Solution of Algebraic Eigenvalue Problems: a Practical Guide

Preconditioning in optimization

optimization Mathematical optimization (alternatively spelled ''optimisation'') or mathematical programming is the selection of a best element, with regard to some criterion, from some set of available alternatives. It is generally divided into two subfi ...

, preconditioning is typically used to accelerate first-order

algorithms In mathematics and computer science, an algorithm () is a finite sequence of rigorous instructions, typically used to solve a class of specific problems or to perform a computation. Algorithms are used as specifications for performing c ...

Description

For example, to find a

local minimum In mathematical analysis, the maxima and minima (the respective plurals of maximum and minimum) of a function, known collectively as extrema (the plural of extremum), are the largest and smallest value of the function, either within a given ran ...

of a real-valued function

F(\mathbf)

using

, one takes steps proportional to the ''negative'' of the

gradient In vector calculus, the gradient of a scalar-valued differentiable function of several variables is the vector field (or vector-valued function) \nabla f whose value at a point p is the "direction and rate of fastest increase". If the gradi ...

-\nabla F(\mathbf)

(or of the approximate gradient) of the function at the current point: :

\mathbf_=\mathbf_n-\gamma_n \nabla F(\mathbf_n),\ n \ge 0.

The preconditioner is applied to the gradient: :

\mathbf_=\mathbf_n-\gamma_n P^ \nabla F(\mathbf_n),\ n \ge 0.

Preconditioning here can be viewed as changing the geometry of the vector space with the goal to make the level sets look like circles. In this case the preconditioned gradient aims closer to the point of the extrema as on the figure, which speeds up the convergence.

Connection to linear systems

The minimum of a quadratic function :

F(\mathbf)= \frac\mathbf^TA\mathbf-\mathbf^T\mathbf

, where

\mathbf

and

\mathbf

are real column-vectors and

A

is a real

positive-definite matrix In mathematics, a symmetric matrix M with real entries is positive-definite if the real number z^\textsfMz is positive for every nonzero real column vector z, where z^\textsf is the transpose of More generally, a Hermitian matrix (that is, a ...

, is exactly the solution of the linear equation

A\mathbf=\mathbf

. Since

\nabla F(\mathbf)=A\mathbf-\mathbf

, the preconditioned

method of minimizing

F(\mathbf)

is :

\mathbf_=\mathbf_n-\gamma_n P^(A\mathbf_n-\mathbf),\ n \ge 0.

This is the preconditioned

for solving a

system of linear equations In mathematics, a system of linear equations (or linear system) is a collection of one or more linear equations involving the same variables. For example, :\begin 3x+2y-z=1\\ 2x-2y+4z=-2\\ -x+\fracy-z=0 \end is a system of three equations in t ...

Connection to eigenvalue problems

The minimum of the

\rho(\mathbf)= \frac,

where

\mathbf

is a real non-zero column-vector and

A

is a real

, is the smallest

A

, while the minimizer is the corresponding

eigenvector In linear algebra, an eigenvector () or characteristic vector of a linear transformation is a nonzero vector that changes at most by a scalar factor when that linear transformation is applied to it. The corresponding eigenvalue, often denoted b ...

. Since

\nabla \rho(\mathbf)

is proportional to

A\mathbf-\rho(\mathbf)\mathbf

, the preconditioned

method of minimizing

\rho(\mathbf)

is :

\mathbf_=\mathbf_n-\gamma_n P^(A\mathbf_n-\rho(\mathbf)\mathbf),\ n \ge 0.

This is an analog of preconditioned

for solving eigenvalue problems.

Variable preconditioning

In many cases, it may be beneficial to change the preconditioner at some or even every step of an

iterative algorithm In computational mathematics, an iterative method is a mathematical procedure that uses an initial value to generate a sequence of improving approximate solutions for a class of problems, in which the ''n''-th approximation is derived from the pre ...

in order to accommodate for a changing shape of the level sets, as in :

\mathbf_=\mathbf_n-\gamma_n P_n^ \nabla F(\mathbf_n),\ n \ge 0.

One should have in mind, however, that constructing an efficient preconditioner is very often computationally expensive. The increased cost of updating the preconditioner can easily override the positive effect of faster convergence.

References

Sources

* * * * * Ke Chen: "Matrix Preconditioning Techniques and Applications", Cambridge University Press, (2005). {{Numerical linear algebra Numerical linear algebra