The Kaczmarz method or Kaczmarz's algorithm is an iterative algorithm for solving linear equation systems. It was first discovered by the Polish mathematician Stefan Kaczmarz, and was rediscovered in the field of image reconstruction from projections by Richard Gordon, Robert Bender, and Gabor Herman in 1970, where it is called the Algebraic Reconstruction Technique (ART). ART includes the positivity constraint, making it nonlinear.
The Kaczmarz method is applicable to any linear system of equations, but its computational advantage relative to other methods depends on the system being sparse. It has been demonstrated to be superior, in some biomedical imaging applications, to other methods such as the filtered backprojection method.
It has many applications ranging from computed tomography (CT) to signal processing. It can also be obtained by applying to the hyperplanes, described by the linear system, the method of successive projections onto convex sets (POCS).
Algorithm 1: Kaczmarz algorithm
Let $Ax = b$ be a system of linear equations, let $m$ be the number of rows of $A$, $a_i$ be the $i$-th row of the complex-valued matrix $A$, and let $x^0$ be an arbitrary complex-valued initial approximation to the solution of $Ax = b$. For $k = 0, 1, \ldots$ compute:

:$x^{k+1} = x^k + \frac{b_i - \langle a_i, x^k \rangle}{\|a_i\|^2} \, \overline{a_i},$

where $i = (k \bmod m) + 1$, and $\overline{a_i}$ denotes the complex conjugate of $a_i$.
If the system is consistent, $x^k$ converges to the minimum-norm solution, provided that the iterations start with the zero vector.
A more general algorithm can be defined using a relaxation parameter $\lambda^k$:

:$x^{k+1} = x^k + \lambda^k \, \frac{b_i - \langle a_i, x^k \rangle}{\|a_i\|^2} \, \overline{a_i}.$
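For concreteness, the following NumPy sketch implements the cyclic iteration above with an optional relaxation parameter; the function name and interface are illustrative, not taken from any library:

 import numpy as np

 def kaczmarz(A, b, x0=None, sweeps=100, lam=1.0):
     # Cyclic Kaczmarz iteration for A x = b; lam is the relaxation
     # parameter (lam = 1 recovers the basic method).
     m, n = A.shape
     x = np.zeros(n, dtype=A.dtype) if x0 is None else x0.astype(A.dtype)
     row_norms_sq = np.sum(np.abs(A) ** 2, axis=1)   # ||a_i||^2
     for _ in range(sweeps):
         for i in range(m):                          # i = (k mod m) + 1, zero-based here
             residual = b[i] - np.dot(A[i], x)       # b_i - <a_i, x>
             x = x + lam * (residual / row_norms_sq[i]) * np.conj(A[i])
     return x

Starting from the zero vector on a consistent system, the iterates approach the minimum-norm solution, as stated above.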
There are versions of the method that converge to a regularized weighted least squares solution when applied to a system of inconsistent equations and, at least as far as initial behavior is concerned, at a lesser cost than other iterative methods, such as the
conjugate gradient method.
Algorithm 2: Randomized Kaczmarz algorithm
In 2009, a randomized version of the Kaczmarz method for overdetermined linear systems was introduced by Thomas Strohmer and Roman Vershynin, in which the ''i''-th equation is selected randomly with probability proportional to $\|a_i\|^2$.
This method can be seen as a particular case of stochastic gradient descent.
Under such circumstances $x_k$ converges exponentially fast to the solution of $Ax = b$, and the rate of convergence depends only on the scaled condition number $\kappa(A)$.
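A minimal NumPy sketch of this randomized variant, for real-valued systems (names and interface are illustrative):

 import numpy as np

 def randomized_kaczmarz(A, b, iters=10_000, rng=None):
     # Strohmer-Vershynin randomized Kaczmarz: at each step, pick row i with
     # probability ||a_i||^2 / ||A||_F^2 and project onto its hyperplane.
     rng = np.random.default_rng() if rng is None else rng
     m, n = A.shape
     x = np.zeros(n)
     row_norms_sq = np.einsum('ij,ij->i', A, A)      # ||a_i||^2
     probs = row_norms_sq / row_norms_sq.sum()       # ||a_i||^2 / ||A||_F^2
     for _ in range(iters):
         i = rng.choice(m, p=probs)
         x += (b[i] - A[i] @ x) / row_norms_sq[i] * A[i]
     return x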
:Theorem. Let $x$ be the solution of $Ax = b$. Then Algorithm 2 converges to $x$ in expectation, with the average error:

::$\mathbb{E}\,\|x_k - x\|^2 \le \left( 1 - \kappa(A)^{-2} \right)^{k} \cdot \|x_0 - x\|^2.$
Proof
We have

:$\|Az\|^2 \ge \frac{\|A\|^2 \, \|z\|^2}{\kappa(A)^2} \quad \text{for all } z \in \mathbb{C}^n. \qquad (1)$

Using the fact that $\|A\|^2 = \sum_{j=1}^{m} \|a_j\|^2$ (here $\|A\|$ denotes the Frobenius norm), we can write (1) as

:$\sum_{j=1}^{m} \frac{\|a_j\|^2}{\|A\|^2} \left| \left\langle z, \frac{a_j}{\|a_j\|} \right\rangle \right|^2 \ge \frac{\|z\|^2}{\kappa(A)^2} \quad \text{for all } z \in \mathbb{C}^n. \qquad (2)$

The main point of the proof is to view the left hand side in (2) as an expectation of some random variable. Namely, recall that the solution space of the $j$-th equation of $Ax = b$ is the hyperplane $\{ y : \langle y, a_j \rangle = b_j \}$, whose normal is $\frac{a_j}{\|a_j\|}$. Define a random vector $Z$ whose values are the normals to all the equations of $Ax = b$, with probabilities as in our algorithm:

:$Z = \frac{a_j}{\|a_j\|} \quad \text{with probability} \quad \frac{\|a_j\|^2}{\|A\|^2}, \qquad j = 1, \ldots, m.$

Then (2) says that

:$\mathbb{E}\,|\langle z, Z \rangle|^2 \ge \frac{\|z\|^2}{\kappa(A)^2} \quad \text{for all } z \in \mathbb{C}^n. \qquad (3)$
The orthogonal projection $P$ onto the solution space of a random equation of $Ax = b$ is given by

:$Pz = z - \langle z - x, Z \rangle Z.$
Now we are ready to analyze our algorithm. We want to show that the error $\|x_k - x\|^2$ reduces at each step in average (conditioned on the previous steps) by at least the factor of $1 - \kappa(A)^{-2}$. The next approximation $x_k$ is computed from $x_{k-1}$ as

:$x_k = P_k x_{k-1},$

where $P_1, P_2, \ldots$ are independent realizations of the random projection $P$. The vector $x_{k-1} - x_k$ is in the kernel of $P_k$. It is orthogonal to the solution space of the equation onto which $P_k$ projects, which contains the vector $x_k - x$ (recall that $x$ is the solution to all equations). The orthogonality of these two vectors then yields

:$\|x_k - x\|^2 = \|x_{k-1} - x\|^2 - \|x_{k-1} - x_k\|^2.$
To complete the proof, we have to bound $\|x_{k-1} - x_k\|^2$ from below. By the definition of $x_k$, we have

:$\|x_{k-1} - x_k\| = |\langle x_{k-1} - x, Z_k \rangle|,$

where $Z_1, Z_2, \ldots$ are independent realizations of the random vector $Z$. Thus

:$\|x_k - x\|^2 \le \left( 1 - \left| \left\langle \frac{x_{k-1} - x}{\|x_{k-1} - x\|}, Z_k \right\rangle \right|^2 \right) \|x_{k-1} - x\|^2.$
Now we take the expectation of both sides conditional upon the choice of the random vectors $Z_1, \ldots, Z_{k-1}$ (hence we fix the choice of the random projections $P_1, \ldots, P_{k-1}$, and thus the random vectors $x_1, \ldots, x_{k-1}$, and we average over the random vector $Z_k$). Then

:$\mathbb{E}_{Z_k} \|x_k - x\|^2 \le \left( 1 - \mathbb{E}_{Z_k} \left| \left\langle \frac{x_{k-1} - x}{\|x_{k-1} - x\|}, Z_k \right\rangle \right|^2 \right) \|x_{k-1} - x\|^2.$

By (3) and the independence,

:$\mathbb{E}_{Z_k} \|x_k - x\|^2 \le \left( 1 - \kappa(A)^{-2} \right) \|x_{k-1} - x\|^2.$

Taking the full expectation of both sides, we conclude that

:$\mathbb{E}\,\|x_k - x\|^2 \le \left( 1 - \kappa(A)^{-2} \right) \mathbb{E}\,\|x_{k-1} - x\|^2,$

and the theorem follows by induction. $\blacksquare$
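As an illustrative sanity check (not part of the proof), one can compare the empirical squared error of randomized Kaczmarz against the bound of the theorem, taking $\kappa(A)^2 = \|A\|_F^2 / \sigma_{\min}(A)^2$; the script below is a sketch, with all names hypothetical:

 import numpy as np

 # Empirically compare E||x_k - x||^2 with (1 - kappa^{-2})^k ||x_0 - x||^2
 # on a small random consistent system, starting from x_0 = 0.
 rng = np.random.default_rng(0)
 m, n = 200, 20
 A = rng.standard_normal((m, n))
 x_true = rng.standard_normal(n)
 b = A @ x_true

 row_sq = np.einsum('ij,ij->i', A, A)
 kappa_sq = row_sq.sum() / np.linalg.svd(A, compute_uv=False)[-1] ** 2
 rate, steps, trials = 1 - 1 / kappa_sq, 500, 50

 errors = []
 for _ in range(trials):
     x = np.zeros(n)
     for _ in range(steps):
         i = rng.choice(m, p=row_sq / row_sq.sum())
         x += (b[i] - A[i] @ x) / row_sq[i] * A[i]
     errors.append(np.sum((x - x_true) ** 2))

 print("empirical mean squared error:", np.mean(errors))
 print("theoretical bound           :", rate ** steps * np.sum(x_true ** 2))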
The superiority of this selection was illustrated with the reconstruction of a bandlimited function from its nonuniformly spaced sampling values. However, it has been pointed out that the reported success by Strohmer and Vershynin depends on the specific choices that were made there in translating the underlying problem, whose geometrical nature is to ''find a common point of a set of hyperplanes'', into a system of algebraic equations. There will always be legitimate algebraic representations of the underlying problem for which this selection method will perform in an inferior manner.
The Kaczmarz iteration has a purely geometric interpretation: the algorithm successively projects the current iterate onto the hyperplane defined by the next equation. Hence, any scaling of the equations is irrelevant; it can also be seen from the update formula that any (nonzero) scaling of the equations cancels out. Thus, in RK, one can use any weights that may be relevant. Specifically, in the above-mentioned reconstruction example, the equations were chosen with probability proportional to the average distance of each sample point from its two nearest neighbors, a concept introduced by Feichtinger and Gröchenig. For additional progress on this topic, see the subsequent literature and the references therein.
Algorithm 3: Gower-Richtarik algorithm
In 2015, Robert M. Gower and Peter Richtarik developed a versatile randomized iterative method for solving a consistent system of linear equations $Ax = b$ which includes the randomized Kaczmarz algorithm as a special case. Other special cases include randomized coordinate descent, randomized Gaussian descent and the randomized Newton method. Block versions and versions with importance sampling of all these methods also arise as special cases. The method is shown to enjoy exponential rate decay (in expectation), also known as linear convergence, under very mild conditions on the way randomness enters the algorithm. The Gower-Richtarik method is the first algorithm uncovering a "sibling" relationship between these methods, some of which were independently proposed before, while many of which were new.
Insights about Randomized Kaczmarz
Interesting new insights about the randomized Kaczmarz method that can be gained from the analysis of the method include:
* The general rate of the Gower-Richtarik algorithm precisely recovers the rate of the randomized Kaczmarz method in the special case when it reduces to it.
* The choice of probabilities for which the randomized Kaczmarz algorithm was originally formulated and analyzed (probabilities proportional to the squares of the row norms) is not optimal. Optimal probabilities are the solution of a certain semidefinite program. The theoretical complexity of randomized Kaczmarz with the optimal probabilities can be arbitrarily better than the complexity for the standard probabilities. However, the amount by which it is better depends on the matrix $A$; there are problems for which the standard probabilities are optimal.
* When applied to a system with a positive definite matrix $A$, the randomized Kaczmarz method is equivalent to the Stochastic Gradient Descent (SGD) method (with a very special stepsize) for minimizing the strongly convex quadratic function $f(x) = \tfrac{1}{2} x^\top A x - b^\top x$. Note that since $f$ is convex, the minimizers of $f$ must satisfy $\nabla f(x) = 0$, which is equivalent to $Ax = b$. The "special stepsize" is the stepsize which leads to a point which, in the one-dimensional line spanned by the stochastic gradient, minimizes the Euclidean distance from the unknown(!) minimizer of $f$, namely, from $x_* = A^{-1} b$. This insight is gained from a dual view of the iterative process (below described as "Optimization Viewpoint: Constrain and Approximate").
Six Equivalent Formulations
The Gower-Richtarik method enjoys six seemingly different but equivalent formulations, shedding additional light on how to interpret it (and, as a consequence, how to interpret its many variants, including randomized Kaczmarz):
* 1. Sketching viewpoint: Sketch & Project
* 2. Optimization viewpoint: Constrain and Approximate
* 3. Geometric viewpoint: Random Intersect
* 4. Algebraic viewpoint 1: Random Linear Solve
* 5. Algebraic viewpoint 2: Random Update
* 6. Analytic viewpoint: Random Fixed Point
We now describe some of these viewpoints. The method depends on two parameters:
* a positive definite matrix $B$ giving rise to a weighted Euclidean inner product $\langle x, y \rangle_B = x^\top B y$ and the induced norm
::$\|x\|_B = \left( x^\top B x \right)^{1/2},$
* and a random matrix $S$ with as many rows as $A$ (and a possibly random number of columns).
1. Sketch and Project
Given a previous iterate $x^k$, the new point $x^{k+1}$ is computed by drawing a random matrix $S$ (in an iid fashion from some fixed distribution), and setting

:$x^{k+1} = \underset{x}{\arg\min} \; \|x - x^k\|_B \quad \text{subject to} \quad S^\top A x = S^\top b.$

That is, $x^{k+1}$ is obtained as the projection of $x^k$ onto the randomly sketched system $S^\top A x = S^\top b$. The idea behind this method is to pick $S$ in such a way that a projection onto the sketched system is substantially simpler than the solution of the original system $Ax = b$. The randomized Kaczmarz method is obtained by picking $B$ to be the identity matrix, and $S$ to be the $i$-th unit coordinate vector with probability $p_i = \|a_i\|^2 / \|A\|_F^2$, where $a_i$ is the $i$-th row of $A$. Different choices of $B$ and $S$ lead to different variants of the method.
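The following NumPy sketch illustrates one way to realize the sketch-and-project update; the function names and the generic pseudo-inverse-based projection are illustrative assumptions, not a reference implementation:

 import numpy as np

 def sketch_and_project(A, b, draw_S, B=None, iters=1000):
     # Iterate x_{k+1} = argmin ||x - x_k||_B subject to S^T A x = S^T b,
     # via the closed form
     #   x <- x - B^{-1} A^T S (S^T A B^{-1} A^T S)^+ S^T (A x - b).
     m, n = A.shape
     B_inv = np.eye(n) if B is None else np.linalg.inv(B)
     x = np.zeros(n)
     for _ in range(iters):
         S = draw_S()                              # random sketch, shape (m, q)
         M = S.T @ A
         x = x - B_inv @ M.T @ np.linalg.pinv(M @ B_inv @ M.T) @ (S.T @ (A @ x - b))
     return x

 # Randomized Kaczmarz as the special case B = I, S = i-th unit coordinate
 # vector drawn with probability ||a_i||^2 / ||A||_F^2:
 rng = np.random.default_rng(0)
 A = rng.standard_normal((50, 10))
 b = A @ rng.standard_normal(10)
 p = np.einsum('ij,ij->i', A, A)
 p /= p.sum()
 draw_S = lambda: np.eye(A.shape[0])[:, [rng.choice(A.shape[0], p=p)]]
 x = sketch_and_project(A, b, draw_S, iters=2000)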
2. Constrain and Approximate
A seemingly different but entirely equivalent formulation of the method (obtained via Lagrangian duality) is

:$x^{k+1} = \underset{x}{\arg\min} \; \|x - x_*\|_B \quad \text{subject to} \quad x = x^k + B^{-1} A^\top S y,$

where $y$ is also allowed to vary, and where $x_*$ is any solution of the system $Ax = b$. Hence, $x^{k+1}$ is obtained by first constraining the update to the linear subspace spanned by the columns of the random matrix $B^{-1} A^\top S$, i.e., to

:$\left\{ x^k + B^{-1} A^\top S y \;:\; y \text{ arbitrary} \right\},$

and then choosing the point $x$ from this subspace which best approximates $x_*$. This formulation may look surprising, as it seems impossible to perform the approximation step due to the fact that $x_*$ is not known (after all, this is what we are trying to compute!). However, it is still possible to do this, simply because $x^{k+1}$ computed this way is the same as $x^{k+1}$ computed via the sketch and project formulation, and $x_*$ does not appear there.
5. Random Update
The update can also be written explicitly as

:$x^{k+1} = x^k - B^{-1} A^\top S \left( S^\top A B^{-1} A^\top S \right)^{\dagger} S^\top \left( A x^k - b \right),$

where $M^{\dagger}$ denotes the Moore-Penrose pseudo-inverse of the matrix $M$. Hence, the method can be written in the form $x^{k+1} = x^k + h^k$, where $h^k$ is a random update vector.
Letting $M = S^\top A B^{-1} A^\top S$, it can be shown that the system $M y = S^\top (A x^k - b)$ always has a solution $y^k$, and that for all such solutions the vector $x^{k+1} = x^k - B^{-1} A^\top S y^k$ is the same. Hence, it does not matter which of these solutions is chosen, and the method can also be written as $x^{k+1} = x^k - B^{-1} A^\top S y^k$. The pseudo-inverse leads just to one particular solution. The role of the pseudo-inverse is twofold:
* It allows the method to be written in the explicit "random update" form as above,
* It makes the analysis simple through the final, sixth, formulation.
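A small numerical check of this invariance, with $B = I$ and a deliberately rank-deficient sketch (a sketch with hypothetical names, not part of the original method):

 import numpy as np

 # When M = S^T A A^T S is singular, M y = S^T (A x - b) has many solutions y,
 # but the update x - A^T S y is the same for all of them.
 rng = np.random.default_rng(1)
 A = rng.standard_normal((8, 5))
 b = A @ rng.standard_normal(5)
 x = rng.standard_normal(5)
 S0 = rng.standard_normal((8, 2))
 S = np.hstack([S0, S0[:, [0]]])      # repeated column makes M rank-deficient

 M = S.T @ A @ A.T @ S
 r = S.T @ (A @ x - b)
 y1 = np.linalg.pinv(M) @ r           # Moore-Penrose (particular) solution
 y2 = y1 + np.linalg.svd(M)[2][-1]    # add a null-space vector: another solution
 print(np.allclose(M @ y2, r))                           # True
 print(np.allclose(x - A.T @ S @ y1, x - A.T @ S @ y2))  # True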
6. Random Fixed Point
If we subtract $x_*$ from both sides of the random update formula, denote

:$Z = A^\top S \left( S^\top A B^{-1} A^\top S \right)^{\dagger} S^\top A,$

and use the fact that $A x_* = b$, we arrive at the last formulation:

:$x^{k+1} - x_* = \left( I - B^{-1} Z \right) \left( x^k - x_* \right),$

where $I$ is the identity matrix. The iteration matrix, $I - B^{-1} Z$, is random, whence the name of this formulation.
Convergence
By taking conditional expectations in the 6th formulation (conditional on $x^k$), we obtain

:$\mathbb{E}\left[ x^{k+1} - x_* \mid x^k \right] = \left( I - B^{-1} \mathbb{E}[Z] \right) \left( x^k - x_* \right).$

By taking expectation again, and using the tower property of expectations, we obtain

:$\mathbb{E}\left[ x^{k+1} - x_* \right] = \left( I - B^{-1} \mathbb{E}[Z] \right) \mathbb{E}\left[ x^k - x_* \right].$

Gower and Richtarik show that

:$\rho := \left\| I - B^{-1} \mathbb{E}[Z] \right\|_B = \lambda_{\max}\left( I - B^{-1/2} \, \mathbb{E}[Z] \, B^{-1/2} \right) = 1 - \lambda_{\min}\left( B^{-1/2} \, \mathbb{E}[Z] \, B^{-1/2} \right),$

where the matrix norm is defined by

:$\|M\|_B := \max_{x \neq 0} \frac{\|Mx\|_B}{\|x\|_B}.$

Moreover, without any assumptions on $S$ one has $0 \le \rho \le 1$. By taking norms and unrolling the recurrence, we obtain
Theorem (Gower & Richtarik 2015)

:$\left\| \mathbb{E}\left[ x^k - x_* \right] \right\|_B \le \rho^{k} \, \|x^0 - x_*\|_B.$
''Remark''. A sufficient condition for the expected residuals to converge to 0 is $\rho < 1$. This can be achieved if $A$ has full column rank and under very mild conditions on $S$. Convergence of the method can be established also without the full column rank assumption, in a different way.
It is also possible to show a stronger result:
Theorem (Gower & Richtarik 2015)

The expected squared norms (rather than norms of expectations) converge at the same rate:

:$\mathbb{E}\left[ \|x^k - x_*\|_B^2 \right] \le \rho^{k} \, \|x^0 - x_*\|_B^2.$
''Remark''. This second type of convergence is stronger due to the following identity, which holds for any random vector $x$ and any fixed vector $x_*$:

:$\left\| \mathbb{E}\left[ x - x_* \right] \right\|^2 = \mathbb{E}\left[ \| x - x_* \|^2 \right] - \mathbb{E}\left[ \| x - \mathbb{E}[x] \|^2 \right].$
Convergence of Randomized Kaczmarz
We have seen that the randomized Kaczmarz method appears as a special case of the Gower-Richtarik method for $B = I$ and $S$ being the $i$-th unit coordinate vector with probability $p_i = \|a_i\|^2 / \|A\|_F^2$, where $a_i$ is the $i$-th row of $A$. It can be checked by direct calculation that

:$\mathbb{E}[Z] = \frac{A^\top A}{\|A\|_F^2} \qquad \text{and hence} \qquad \rho = 1 - \frac{\lambda_{\min}(A^\top A)}{\|A\|_F^2}.$
Further Special Cases
Algorithm 4: PLSS-Kaczmarz
Since the convergence of the (randomized) Kaczmarz method depends on a rate of convergence, the method may make slow progress on some practical problems. To ensure finite termination of the method, Johannes Brust and Michael Saunders have developed a process that generalizes the (randomized) Kaczmarz iteration and terminates in at most $m$ iterations to a solution for the consistent system $Ax = b$. The process is based on dimensionality reduction, or projections onto lower dimensional spaces, which is how it derives its name PLSS (Projected Linear Systems Solver). An iteration of PLSS-Kaczmarz can be regarded as the generalization

:$x^{k+1} = x^k + A_{1:k,:}^\top \left( A_{1:k,:} A_{1:k,:}^\top \right)^{\dagger} \left( b_{1:k} - A_{1:k,:} x^k \right),$

where $A_{1:k,:}$ is the selection of rows 1 to $k$ and all columns of $A$. A randomized version of the method uses $k$ non-repeated row indices at each iteration, where each index is in $\{1, \ldots, m\}$. The iteration converges to a solution when $k = m$. In particular, since $A_{1:m,:} = A$, it holds that

:$A x^{m+1} = A x^m + A A^\top \left( A A^\top \right)^{\dagger} \left( b - A x^m \right) = b,$

and therefore $x^{m+1}$ is a solution to the linear system. The computation of iterates in PLSS-Kaczmarz can be simplified and organized effectively.
The resulting algorithm only requires matrix-vector products and has a direct form
algorithm PLSS-Kaczmarz is
    input: matrix ''A'', right-hand side ''b''
    output: solution ''x'' such that ''Ax = b''
    ''x'' := 0, ''P'' := [ ]
    for ''k'' in ''1, 2, ..., m'' do
        a := A(i_k, :)'                   // select an index i_k in 1, ..., m without resampling
        d := P' * a
        c_1 := norm(a)
        c_2 := norm(d)
        c_3 := (b_{i_k} - x' * a) / ((c_1 - c_2) * (c_1 + c_2))
        p := c_3 * (a - P * (P' * a))
        P := [ P, p / norm(p) ]           // append a normalized update
        x := x + p
    return ''x''
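A direct NumPy transcription of the pseudocode above (illustrative; it assumes $A$ has full row rank, so that $c_1 > c_2$ at every step):

 import numpy as np

 def plss_kaczmarz(A, b, rng=None):
     # Visits each row exactly once in random order; for a consistent system
     # with full row rank this terminates at a solution after m steps.
     rng = np.random.default_rng() if rng is None else rng
     m, n = A.shape
     x = np.zeros(n)
     P = np.zeros((n, 0))                 # orthonormal basis of previous updates
     for i in rng.permutation(m):         # index i_k without resampling
         a = A[i]
         d = P.T @ a
         c1 = np.linalg.norm(a)
         c2 = np.linalg.norm(d)
         c3 = (b[i] - x @ a) / ((c1 - c2) * (c1 + c2))
         p = c3 * (a - P @ d)             # step orthogonal to all previous steps
         P = np.column_stack([P, p / np.linalg.norm(p)])
         x = x + p
     return x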
External links
* A randomized Kaczmarz algorithm with exponential convergence
* Comments on the randomized Kaczmarz method