In mathematics and computing, the Levenberg–Marquardt algorithm (LMA or just LM), also known as the damped least-squares (DLS) method, is used to solve non-linear least squares problems. These minimization problems arise especially in least squares curve fitting. The LMA interpolates between the Gauss–Newton algorithm (GNA) and the method of gradient descent. The LMA is more robust than the GNA, which means that in many cases it finds a solution even if it starts very far off the final minimum. For well-behaved functions and reasonable starting parameters, the LMA tends to be slower than the GNA. The LMA can also be viewed as Gauss–Newton using a trust region approach.
The algorithm was first published in 1944 by Kenneth Levenberg, while working at the Frankford Army Arsenal. It was rediscovered in 1963 by Donald Marquardt, who worked as a statistician at DuPont, and independently by Girard, Wynne and Morrison.
The LMA is used in many software applications for solving generic curve-fitting problems. By using the Gauss–Newton algorithm it often converges faster than first-order methods. However, like other iterative optimization algorithms, the LMA finds only a local minimum, which is not necessarily the global minimum.
The problem
The primary application of the Levenberg–Marquardt algorithm is in the least-squares curve fitting problem: given a set of $m$ empirical pairs $(x_i, y_i)$ of independent and dependent variables, find the parameters $\boldsymbol\beta$ of the model curve $f(x, \boldsymbol\beta)$ so that the sum of the squares of the deviations $S(\boldsymbol\beta)$ is minimized:

: $\hat{\boldsymbol\beta} \in \operatorname{argmin}_{\boldsymbol\beta} S(\boldsymbol\beta) \equiv \operatorname{argmin}_{\boldsymbol\beta} \sum_{i=1}^m \left[y_i - f(x_i, \boldsymbol\beta)\right]^2,$

which is assumed to be non-empty.
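To make the objective concrete, the sketch below evaluates $S(\boldsymbol\beta)$ for a hypothetical exponential-decay model $f(x, \boldsymbol\beta) = \beta_0 e^{-\beta_1 x}$; the model, data, and parameter values are illustrative assumptions, not part of the algorithm itself.

```python
import numpy as np

def model(x, beta):
    # Hypothetical model curve f(x, beta) = beta0 * exp(-beta1 * x)
    return beta[0] * np.exp(-beta[1] * x)

def sum_of_squares(beta, x, y):
    # S(beta) = sum_i [y_i - f(x_i, beta)]^2
    residuals = y - model(x, beta)
    return np.sum(residuals ** 2)

# Illustrative data: m empirical pairs (x_i, y_i) with a little noise
x = np.linspace(0.0, 5.0, 50)
rng = np.random.default_rng(0)
y = model(x, np.array([2.0, 0.8])) + 0.05 * rng.standard_normal(x.size)

print(sum_of_squares(np.array([1.0, 1.0]), x, y))
```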
The solution
Like other numeric minimization algorithms, the Levenberg–Marquardt algorithm is an iterative procedure. To start a minimization, the user has to provide an initial guess for the parameter vector $\boldsymbol\beta$. In cases with only one minimum, an uninformed standard guess like $\boldsymbol\beta^{\mathrm{T}} = (1,\ 1,\ \dots,\ 1)$ will work fine; in cases with multiple minima, the algorithm converges to the global minimum only if the initial guess is already somewhat close to the final solution.
In each iteration step, the parameter vector $\boldsymbol\beta$ is replaced by a new estimate $\boldsymbol\beta + \boldsymbol\delta$. To determine $\boldsymbol\delta$, the function $f(x_i, \boldsymbol\beta + \boldsymbol\delta)$ is approximated by its linearization:

: $f(x_i, \boldsymbol\beta + \boldsymbol\delta) \approx f(x_i, \boldsymbol\beta) + \mathbf{J}_i \boldsymbol\delta,$

where

: $\mathbf{J}_i = \frac{\partial f(x_i, \boldsymbol\beta)}{\partial \boldsymbol\beta}$

is the gradient (row-vector in this case) of $f$ with respect to $\boldsymbol\beta$.
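The rows $\mathbf{J}_i$ can be written down analytically or approximated numerically. A minimal forward-difference sketch, assuming the same hypothetical `model` function as above:

```python
import numpy as np

def model(x, beta):
    # Hypothetical model curve f(x, beta) = beta0 * exp(-beta1 * x)
    return beta[0] * np.exp(-beta[1] * x)

def jacobian(x, beta, eps=1e-8):
    # J[i, j] ~= d f(x_i, beta) / d beta_j via forward differences,
    # so each row J[i, :] is the gradient of f at x_i (a row vector).
    f0 = model(x, beta)
    J = np.empty((x.size, beta.size))
    for j in range(beta.size):
        step = np.zeros_like(beta, dtype=float)
        step[j] = eps
        J[:, j] = (model(x, beta + step) - f0) / eps
    return J
```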
The sum $S(\boldsymbol\beta)$ of square deviations has its minimum at a zero gradient with respect to $\boldsymbol\beta$. The above first-order approximation of $f(x_i, \boldsymbol\beta + \boldsymbol\delta)$ gives

: $S(\boldsymbol\beta + \boldsymbol\delta) \approx \sum_{i=1}^m \left[y_i - f(x_i, \boldsymbol\beta) - \mathbf{J}_i \boldsymbol\delta\right]^2,$

or in vector notation,

: $S(\boldsymbol\beta + \boldsymbol\delta) \approx \left\|\mathbf{y} - \mathbf{f}(\boldsymbol\beta) - \mathbf{J} \boldsymbol\delta\right\|^2.$
Taking the derivative of $S(\boldsymbol\beta + \boldsymbol\delta)$ with respect to $\boldsymbol\delta$ and setting the result to zero gives

: $\left(\mathbf{J}^{\mathrm{T}} \mathbf{J}\right) \boldsymbol\delta = \mathbf{J}^{\mathrm{T}} \left[\mathbf{y} - \mathbf{f}(\boldsymbol\beta)\right],$

where $\mathbf{J}$ is the Jacobian matrix, whose $i$-th row equals $\mathbf{J}_i$, and where $\mathbf{f}(\boldsymbol\beta)$ and $\mathbf{y}$ are vectors with $i$-th component $f(x_i, \boldsymbol\beta)$ and $y_i$ respectively. The above expression obtained for $\boldsymbol\delta$ is the Gauss–Newton step. The Jacobian matrix as defined above is not (in general) a square matrix, but a rectangular matrix of size $m \times n$, where $n$ is the number of parameters (size of the vector $\boldsymbol\beta$). The matrix multiplication $\mathbf{J}^{\mathrm{T}} \mathbf{J}$ yields the required $n \times n$ square matrix, and the matrix-vector product on the right-hand side yields a vector of size $n$. The result is a set of $n$ linear equations, which can be solved for $\boldsymbol\delta$.
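Assuming the hypothetical `model` and `jacobian` helpers sketched above, one Gauss–Newton step amounts to solving these $n$ normal equations for $\boldsymbol\delta$:

```python
import numpy as np

def gauss_newton_step(x, y, beta, model, jacobian):
    # Solve (J^T J) delta = J^T [y - f(beta)] for the increment delta.
    J = jacobian(x, beta)
    r = y - model(x, beta)          # residual vector, size m
    A = J.T @ J                     # n x n square matrix
    g = J.T @ r                     # right-hand side, size n
    return np.linalg.solve(A, g)    # the n linear equations solved for delta
```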
Levenberg's contribution is to replace this equation by a "damped version":
: $\left(\mathbf{J}^{\mathrm{T}} \mathbf{J} + \lambda \mathbf{I}\right) \boldsymbol\delta = \mathbf{J}^{\mathrm{T}} \left[\mathbf{y} - \mathbf{f}(\boldsymbol\beta)\right],$

where $\mathbf{I}$ is the identity matrix, giving as the increment $\boldsymbol\delta$ to the estimated parameter vector $\boldsymbol\beta$.
The (non-negative) damping factor $\lambda$ is adjusted at each iteration. If reduction of $S$ is rapid, a smaller value can be used, bringing the algorithm closer to the Gauss–Newton algorithm, whereas if an iteration gives insufficient reduction in the residual, $\lambda$ can be increased, giving a step closer to the gradient-descent direction. Note that the gradient of $S$ with respect to $\boldsymbol\beta$ equals $-2 \left(\mathbf{J}^{\mathrm{T}} \left[\mathbf{y} - \mathbf{f}(\boldsymbol\beta)\right]\right)^{\mathrm{T}}$. Therefore, for large values of $\lambda$, the step will be taken approximately in the direction opposite to the gradient. If either the length of the calculated step $\boldsymbol\delta$ or the reduction of the sum of squares from the latest parameter vector $\boldsymbol\beta + \boldsymbol\delta$ falls below predefined limits, iteration stops, and the last parameter vector $\boldsymbol\beta$ is considered to be the solution.
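A minimal sketch of the damped iteration might look as follows; the accept/reject rule, the factor of 10 used to rescale $\lambda$, and the stopping thresholds are illustrative choices rather than part of the original formulation.

```python
import numpy as np

def levenberg_marquardt(x, y, beta0, model, jacobian,
                        lam0=1e-3, max_iter=100, tol=1e-10):
    beta = np.asarray(beta0, dtype=float).copy()
    lam = lam0
    S = np.sum((y - model(x, beta)) ** 2)
    for _ in range(max_iter):
        J = jacobian(x, beta)
        r = y - model(x, beta)
        # Damped normal equations: (J^T J + lam * I) delta = J^T [y - f(beta)]
        A = J.T @ J + lam * np.eye(beta.size)
        delta = np.linalg.solve(A, J.T @ r)
        S_new = np.sum((y - model(x, beta + delta)) ** 2)
        if S_new < S:
            # Sufficient reduction: accept the step, move toward Gauss-Newton
            beta += delta
            lam /= 10.0
            if np.linalg.norm(delta) < tol or S - S_new < tol:
                break
            S = S_new
        else:
            # Insufficient reduction: reject the step, move toward gradient descent
            lam *= 10.0
    return beta
```

The loop accepts a step only when it reduces $S$; otherwise the same linear system is re-solved with a larger $\lambda$, which shortens the step and rotates it toward the negative gradient.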
When the damping factor $\lambda$ is large relative to $\left\|\mathbf{J}^{\mathrm{T}} \mathbf{J}\right\|$, inverting $\mathbf{J}^{\mathrm{T}} \mathbf{J} + \lambda \mathbf{I}$ is not necessary, as the update is well-approximated by the small gradient step $\lambda^{-1} \mathbf{J}^{\mathrm{T}} \left[\mathbf{y} - \mathbf{f}(\boldsymbol\beta)\right]$.
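In that regime the increment can be formed without solving a linear system at all; a sketch under the same assumptions as above:

```python
import numpy as np

def gradient_step(x, y, beta, lam, model, jacobian):
    # For large lam, delta ~= (1/lam) * J^T [y - f(beta)],
    # i.e. a short step along the negative gradient of S (up to a factor of 2).
    J = jacobian(x, beta)
    r = y - model(x, beta)
    return (J.T @ r) / lam
```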