In mathematics, the generalized minimal residual method (GMRES) is an iterative method for the numerical solution of an indefinite nonsymmetric system of linear equations. The method approximates the solution by the vector in a Krylov subspace with minimal residual. The Arnoldi iteration is used to find this vector.

The GMRES method was developed by Yousef Saad and Martin H. Schultz in 1986. It is a generalization and improvement of the MINRES method due to Paige and Saunders in 1975. The MINRES method requires that the matrix be symmetric, but has the advantage that it only requires handling of three vectors. GMRES is a special case of the DIIS method developed by Peter Pulay in 1980. DIIS is applicable to non-linear systems.


The method

Denote the Euclidean norm of any vector v by \|v\|. Denote the (square) system of linear equations to be solved by

: Ax = b.

The matrix ''A'' is assumed to be invertible of size ''m''-by-''m''. Furthermore, it is assumed that b is normalized, i.e., that \|b\| = 1.

The ''n''-th Krylov subspace for this problem is

: K_n = K_n(A, r_0) = \operatorname{span}\{r_0, Ar_0, A^2 r_0, \ldots, A^{n-1} r_0\},

where r_0 = b - Ax_0 is the initial residual given an initial guess x_0 \ne 0. Clearly r_0 = b if x_0 = 0.

GMRES approximates the exact solution of Ax = b by the vector x_n \in x_0 + K_n that minimizes the Euclidean norm of the residual r_n = b - Ax_n.

The vectors r_0, Ar_0, \ldots, A^{n-1} r_0 might be close to linearly dependent, so instead of this basis, the Arnoldi iteration is used to find orthonormal vectors q_1, q_2, \ldots, q_n which form a basis for K_n. In particular, q_1 = \|r_0\|_2^{-1} r_0. Therefore, the vector x_n can be written as x_n = x_0 + Q_n y_n with y_n \in \mathbb{R}^n, where Q_n is the ''m''-by-''n'' matrix formed by q_1, \ldots, q_n.

The Arnoldi process also produces an (n+1)-by-n upper Hessenberg matrix \tilde{H}_n with

: AQ_n = Q_{n+1} \tilde{H}_n.

For symmetric matrices, a symmetric tridiagonal matrix is actually achieved, resulting in the MINRES method.

Because the columns of Q_{n+1} are orthonormal, we have

: \|r_n\| = \|b - Ax_n\| = \|b - A(x_0 + Q_n y_n)\| = \|r_0 - AQ_n y_n\| = \|\beta q_1 - AQ_n y_n\| = \|\beta q_1 - Q_{n+1}\tilde{H}_n y_n\| = \|Q_{n+1}(\beta e_1 - \tilde{H}_n y_n)\| = \|\beta e_1 - \tilde{H}_n y_n\|,

where

: e_1 = (1, 0, 0, \ldots, 0)^T

is the first vector in the standard basis of \mathbb{R}^{n+1}, and

: \beta = \|r_0\|,

x_0 being the first trial vector (usually zero). Hence, x_n can be found by minimizing the Euclidean norm of the residual

: r_n = \tilde{H}_n y_n - \beta e_1.

This is a linear least squares problem of size ''n''.

This yields the GMRES method. On the ''n''-th iteration:
# calculate q_n with the Arnoldi method;
# find the y_n which minimizes \|r_n\|;
# compute x_n = x_0 + Q_n y_n;
# repeat if the residual is not yet small enough.

At every iteration, a matrix-vector product Aq_n must be computed. This costs about 2m^2 floating-point operations for general dense matrices of size m, but the cost can decrease to O(m) for sparse matrices. In addition to the matrix-vector product, O(nm) floating-point operations must be computed at the ''n''-th iteration.
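
The following sketch (GNU Octave / MATLAB) illustrates the Arnoldi relation AQ_n = Q_{n+1} \tilde{H}_n numerically; the test matrix, right-hand side, and sizes are arbitrary choices, not part of the method itself.

m = 50; n = 10;
A = rand(m) + m * eye(m);            % arbitrary well-conditioned test matrix
b = rand(m, 1);
Q = zeros(m, n + 1);                 % orthonormal Krylov basis q_1, ..., q_{n+1}
H = zeros(n + 1, n);                 % upper Hessenberg matrix
Q(:, 1) = b / norm(b);
for k = 1:n
  v = A * Q(:, k);                   % new Krylov direction
  for i = 1:k                        % modified Gram-Schmidt orthogonalization
    H(i, k) = Q(:, i)' * v;
    v = v - H(i, k) * Q(:, i);
  end
  H(k + 1, k) = norm(v);
  Q(:, k + 1) = v / H(k + 1, k);
end
disp(norm(A * Q(:, 1:n) - Q * H))    % should be near machine precision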


Convergence

The ''n''-th iterate minimizes the residual in the Krylov subspace K_n. Since every subspace is contained in the next subspace, the residual does not increase. After ''m'' iterations, where ''m'' is the size of the matrix ''A'', the Krylov space K_m is the whole of \mathbb{R}^m and hence the GMRES method arrives at the exact solution. However, the idea is that after a small number of iterations (relative to ''m''), the vector x_n is already a good approximation to the exact solution.

This does not happen in general. Indeed, a theorem of Greenbaum, Pták and Strakoš states that for every nonincreasing sequence a_1, \ldots, a_{m-1}, a_m = 0, one can find a matrix ''A'' such that \|r_n\| = a_n for all ''n'', where r_n is the residual defined above. In particular, it is possible to find a matrix for which the residual stays constant for ''m'' − 1 iterations, and only drops to zero at the last iteration.

In practice, though, GMRES often performs well. This can be proven in specific situations. If the symmetric part of ''A'', that is (A^T + A)/2, is positive definite, then

: \|r_n\| \leq \left( 1 - \frac{\lambda_{\min}^2\big(\tfrac{1}{2}(A^T + A)\big)}{\lambda_{\max}(A^T A)} \right)^{n/2} \|r_0\|,

where \lambda_{\min}(M) and \lambda_{\max}(M) denote the smallest and largest eigenvalue of the matrix M, respectively.

If ''A'' is symmetric and positive definite, then we even have

: \|r_n\| \leq \left( \frac{\kappa_2(A)^2 - 1}{\kappa_2(A)^2} \right)^{n/2} \|r_0\|,

where \kappa_2(A) denotes the condition number of ''A'' in the Euclidean norm.

In the general case, where ''A'' is not positive definite, we have

: \frac{\|r_n\|}{\|r_0\|} \le \inf_{p \in P_n} \|p(A)\| \le \kappa_2(V) \inf_{p \in P_n} \max_{\lambda \in \sigma(A)} |p(\lambda)|,

where P_n denotes the set of polynomials of degree at most ''n'' with p(0) = 1, ''V'' is the matrix appearing in the spectral decomposition of ''A'', and \sigma(A) is the spectrum of ''A''. Roughly speaking, this says that fast convergence occurs when the eigenvalues of ''A'' are clustered away from the origin and ''A'' is not too far from normality.

All these inequalities bound only the residuals instead of the actual error, that is, the distance between the current iterate x_n and the exact solution.
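
As a rough numerical check of the symmetric positive definite bound above, the following sketch (GNU Octave / MATLAB) compares observed GMRES residuals with that bound. The 2-D Poisson test matrix and the tolerances are arbitrary choices, and gallery('poisson', ...) is assumed available (it ships with MATLAB and recent Octave versions).

A = gallery('poisson', 10);                 % SPD 100-by-100 discrete Laplacian
b = ones(100, 1);
[x, flag, relres, iter, resvec] = gmres(A, b, [], 1e-10, 100);
kappa = cond(full(A));                      % 2-norm condition number
n = (0:numel(resvec) - 1)';
bound = ((kappa^2 - 1) / kappa^2) .^ (n / 2);
semilogy(n, resvec / resvec(1), 'o-', n, bound, '--');
legend('observed ||r_n|| / ||r_0||', 'theoretical SPD bound');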


Extensions of the method

Like other iterative methods, GMRES is usually combined with a preconditioning method in order to speed up convergence.

The cost of the iterations grows as O(n^2), where ''n'' is the iteration number. Therefore, the method is sometimes restarted after a number, say ''k'', of iterations, with x_k as initial guess. The resulting method is called GMRES(''k'') or Restarted GMRES. For non-positive definite matrices, this method may suffer from stagnation in convergence as the restarted subspace is often close to the earlier subspace.

The shortcomings of GMRES and restarted GMRES are addressed by the recycling of Krylov subspaces in the GCRO-type methods such as GCROT and GCRODR. Recycling of Krylov subspaces in GMRES can also speed up convergence when sequences of linear systems need to be solved.
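
A minimal sketch of preconditioned, restarted GMRES using the built-in gmres function of GNU Octave / MATLAB; the test matrix, restart length k = 20, tolerance, and the incomplete-LU preconditioner are arbitrary illustrative choices.

A = sprandn(500, 500, 0.01) + 10 * speye(500);   % sparse, diagonally dominant
b = randn(500, 1);
k = 20;                                          % restart length: GMRES(20)
[L, U] = ilu(A);                                 % incomplete-LU preconditioner
[x, flag, relres, iter] = gmres(A, b, k, 1e-8, 50, L, U);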


Comparison with other solvers

The Arnoldi iteration reduces to the Lanczos iteration for symmetric matrices. The corresponding Krylov subspace method is the minimal residual method (MINRES) of Paige and Saunders. Unlike the unsymmetric case, the MINRES method is given by a three-term recurrence relation. It can be shown that there is no Krylov subspace method for general matrices which is given by a short recurrence relation and yet minimizes the norms of the residuals, as GMRES does.

Another class of methods builds on the unsymmetric Lanczos iteration, in particular the BiCG method. These use a three-term recurrence relation, but they do not attain the minimum residual, and hence the residual does not decrease monotonically for these methods. Convergence is not even guaranteed.

The third class is formed by methods like CGS and BiCGSTAB. These also work with a three-term recurrence relation (hence, without optimality) and they can even terminate prematurely without achieving convergence. The idea behind these methods is to choose the generating polynomials of the iteration sequence suitably.

None of these three classes is the best for all matrices; there are always examples in which one class outperforms another. Therefore, multiple solvers are tried in practice to see which one is the best for a given problem.
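
In that spirit, a minimal sketch (GNU Octave / MATLAB) that tries several built-in Krylov solvers on the same system; the matrix and tolerances are arbitrary, and minres (which requires a symmetric matrix, applied here to the symmetric part of A purely for illustration) is assumed available in the installed version.

A = sprandn(300, 300, 0.02) + 5 * speye(300);
b = randn(300, 1);
tol = 1e-8; maxit = 300;
[x1, flag1] = gmres(A, b, [], tol, maxit);       % full (unrestarted) GMRES
[x2, flag2] = bicgstab(A, b, tol, maxit);        % BiCGSTAB
S = (A + A') / 2;                                % symmetric part, for minres
[x3, flag3] = minres(S, b, tol, maxit);          % MINRES on the symmetric part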


Solving the least squares problem

One part of the GMRES method is to find the vector y_n which minimizes

: \|\tilde{H}_n y_n - \beta e_1\|.

Note that \tilde{H}_n is an (''n'' + 1)-by-''n'' matrix, hence it gives an over-constrained linear system of ''n'' + 1 equations for ''n'' unknowns.

The minimum can be computed using a QR decomposition: find an (''n'' + 1)-by-(''n'' + 1) orthogonal matrix \Omega_n and an (''n'' + 1)-by-''n'' upper triangular matrix \tilde{R}_n such that

: \Omega_n \tilde{H}_n = \tilde{R}_n.

The triangular matrix has one more row than it has columns, so its bottom row consists of zeros. Hence, it can be decomposed as

: \tilde{R}_n = \begin{bmatrix} R_n \\ 0 \end{bmatrix},

where R_n is an ''n''-by-''n'' (thus square) triangular matrix.

The QR decomposition can be updated cheaply from one iteration to the next, because the Hessenberg matrices differ only by a row of zeros and a column:

: \tilde{H}_{n+1} = \begin{bmatrix} \tilde{H}_n & h_{n+1} \\ 0 & h_{n+2,n+1} \end{bmatrix},

where h_{n+1} = (h_{1,n+1}, \ldots, h_{n+1,n+1})^T. This implies that premultiplying the Hessenberg matrix with \Omega_n, augmented with zeros and a row with multiplicative identity, yields almost a triangular matrix:

: \begin{bmatrix} \Omega_n & 0 \\ 0 & 1 \end{bmatrix} \tilde{H}_{n+1} = \begin{bmatrix} R_n & r_{n+1} \\ 0 & \rho \\ 0 & \sigma \end{bmatrix}.

This would be triangular if \sigma is zero. To remedy this, one needs the Givens rotation

: G_n = \begin{bmatrix} I_n & 0 & 0 \\ 0 & c_n & s_n \\ 0 & -s_n & c_n \end{bmatrix},

where

: c_n = \frac{\rho}{\sqrt{\rho^2 + \sigma^2}} \quad\text{and}\quad s_n = \frac{\sigma}{\sqrt{\rho^2 + \sigma^2}}.

With this Givens rotation, we form

: \Omega_{n+1} = G_n \begin{bmatrix} \Omega_n & 0 \\ 0 & 1 \end{bmatrix}.

Indeed,

: \Omega_{n+1} \tilde{H}_{n+1} = \begin{bmatrix} R_n & r_{n+1} \\ 0 & r_{n+1,n+1} \\ 0 & 0 \end{bmatrix} \quad\text{with}\quad r_{n+1,n+1} = \sqrt{\rho^2 + \sigma^2}

is a triangular matrix.

Given the QR decomposition, the minimization problem is easily solved by noting that

: \|\tilde{H}_n y_n - \beta e_1\| = \|\Omega_n (\tilde{H}_n y_n - \beta e_1)\| = \|\tilde{R}_n y_n - \beta \Omega_n e_1\|.

Denoting the vector \beta \Omega_n e_1 by

: \tilde{g}_n = \begin{bmatrix} g_n \\ \gamma_n \end{bmatrix}

with g_n \in \mathbb{R}^n and \gamma_n \in \mathbb{R}, this is

: \|\tilde{H}_n y_n - \beta e_1\| = \|\tilde{R}_n y_n - \beta \Omega_n e_1\| = \left\| \begin{bmatrix} R_n \\ 0 \end{bmatrix} y_n - \begin{bmatrix} g_n \\ \gamma_n \end{bmatrix} \right\|.

The vector ''y'' that minimizes this expression is given by

: y_n = R_n^{-1} g_n.

Again, the vectors g_n are easy to update. (Stoer and Bulirsch, §8.7.2)
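
A minimal sketch (GNU Octave / MATLAB) of this least squares solve on a small arbitrary example, using a full QR factorization in place of the incremental Givens updates:

H = [4 1 0; 2 3 1; 0 1 2; 0 0 1];   % arbitrary 4-by-3 upper Hessenberg matrix
beta = 2;
e1 = [1; 0; 0; 0];
[Qfac, Rfac] = qr(H);               % H = Qfac * Rfac
Omega = Qfac';                      % so Omega * H is triangular, as in the text
g = Omega * (beta * e1);            % transformed right-hand side
y = Rfac(1:3, 1:3) \ g(1:3);        % back substitution on the square part R_n
min_residual = abs(g(4));           % the minimum value |gamma_n|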


Example code


Regular GMRES (MATLAB / GNU Octave)

function [x, e] = gmres(A, b, x, max_iterations, threshold)
  n = length(A);
  m = max_iterations;

  % use x as the initial vector
  r = b - A * x;

  b_norm = norm(b);
  error = norm(r) / b_norm;

  % initialize the 1D vectors
  sn = zeros(m, 1);
  cs = zeros(m, 1);
  %e1 = zeros(n, 1);
  e1 = zeros(m + 1, 1);
  e1(1) = 1;
  e = [error];
  r_norm = norm(r);
  Q(:, 1) = r / r_norm;
  % Note: this is not the beta scalar in section "The method" above
  % but the beta scalar multiplied by e1
  beta = r_norm * e1;

  for k = 1:m
    % run arnoldi
    [H(1:k+1, k), Q(:, k+1)] = arnoldi(A, Q, k);

    % eliminate the last element in H ith row and update the rotation matrix
    [H(1:k+1, k), cs(k), sn(k)] = apply_givens_rotation(H(1:k+1, k), cs, sn, k);

    % update the residual vector
    beta(k + 1) = -sn(k) * beta(k);
    beta(k)     = cs(k) * beta(k);
    error       = abs(beta(k + 1)) / b_norm;

    % save the error
    e = [e; error];

    if (error <= threshold)
      break;
    end
  end
  % if threshold is not reached, k = m at this point (and not m+1)

  % calculate the result
  y = H(1:k, 1:k) \ beta(1:k);
  x = x + Q(:, 1:k) * y;
end

%----------------------------------------------------%
%                  Arnoldi Function                   %
%----------------------------------------------------%
function [h, q] = arnoldi(A, Q, k)
  q = A * Q(:, k);  % Krylov Vector
  for i = 1:k       % Modified Gram-Schmidt, keeping the Hessenberg matrix
    h(i) = q' * Q(:, i);
    q = q - h(i) * Q(:, i);
  end
  h(k + 1) = norm(q);
  q = q / h(k + 1);
end

%---------------------------------------------------------------------%
%                  Applying Givens Rotation to H col                   %
%---------------------------------------------------------------------%
function [h, cs_k, sn_k] = apply_givens_rotation(h, cs, sn, k)
  % apply for ith column
  for i = 1:k-1
    temp   =  cs(i) * h(i) + sn(i) * h(i + 1);
    h(i+1) = -sn(i) * h(i) + cs(i) * h(i + 1);
    h(i)   = temp;
  end

  % update the next sin cos values for rotation
  [cs_k, sn_k] = givens_rotation(h(k), h(k + 1));

  % eliminate H(i + 1, i)
  h(k)     = cs_k * h(k) + sn_k * h(k + 1);
  h(k + 1) = 0.0;
end

%%----Calculate the Givens rotation matrix----%%
function [cs, sn] = givens_rotation(v1, v2)
%  if (v1 == 0)
%    cs = 0;
%    sn = 1;
%  else
    t = sqrt(v1^2 + v2^2);
%    cs = abs(v1) / t;
%    sn = cs * v2 / v1;
    cs = v1 / t;  % see http://www.netlib.org/eispack/comqr.f
    sn = v2 / t;
%  end
end
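
A brief usage example follows, assuming the function above is saved as gmres.m on the search path (where it shadows the built-in gmres); the test matrix and tolerance are arbitrary choices.

A = diag(1:50) + 0.1 * randn(50);    % arbitrary well-conditioned test matrix
b = ones(50, 1);
x0 = zeros(50, 1);
[x, e] = gmres(A, b, x0, 50, 1e-10);
disp(norm(b - A * x) / norm(b))      % final relative residual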


See also

* Biconjugate gradient method


References


Notes

* A. Meister, ''Numerik linearer Gleichungssysteme'', 2nd edition, Vieweg, 2005.
* Y. Saad, ''Iterative Methods for Sparse Linear Systems'', 2nd edition, Society for Industrial and Applied Mathematics, 2003.
* Y. Saad and M.H. Schultz, "GMRES: A generalized minimal residual algorithm for solving nonsymmetric linear systems", ''SIAM J. Sci. Stat. Comput.'', 7:856–869, 1986.
* S. C. Eisenstat, H.C. Elman and M.H. Schultz, "Variational iterative methods for nonsymmetric systems of linear equations", ''SIAM Journal on Numerical Analysis'', 20(2), 345–357, 1983.
* J. Stoer and R. Bulirsch, ''Introduction to Numerical Analysis'', 3rd edition, Springer, New York, 2002.
* Lloyd N. Trefethen and David Bau, III, ''Numerical Linear Algebra'', Society for Industrial and Applied Mathematics, 1997.
* J. Dongarra et al., ''Templates for the Solution of Linear Systems: Building Blocks for Iterative Methods'', 2nd edition, SIAM, Philadelphia, 1994.
* Amritkar, Amit; de Sturler, Eric; Świrydowicz, Katarzyna; Tafti, Danesh; Ahuja, Kapil (2015). "Recycling Krylov subspaces for CFD applications and a new hybrid recycling solver". ''Journal of Computational Physics'' 303: 222. doi:10.1016/j.jcp.2015.09.040
* Imankulov T., Lebedev D., Matkerim B., Daribayev B., Kassymbek N., "Numerical Simulation of Multiphase Multicomponent Flow in Porous Media: Efficiency Analysis of Newton-Based Method", ''Fluids'', 6(10):355, 2021. https://doi.org/10.3390/fluids6100355