In mathematics and computing, the Levenberg–Marquardt algorithm (LMA or just LM), also known as the damped least-squares (DLS) method, is used to solve non-linear least squares problems. These minimization problems arise especially in least squares curve fitting. The LMA interpolates between the Gauss–Newton algorithm (GNA) and the method of gradient descent. The LMA is more robust than the GNA, which means that in many cases it finds a solution even if it starts very far off the final minimum. For well-behaved functions and reasonable starting parameters, the LMA tends to be slower than the GNA. LMA can also be viewed as Gauss–Newton using a trust region approach.
The algorithm was first published in 1944 by Kenneth Levenberg, while working at the Frankford Army Arsenal. It was rediscovered in 1963 by Donald Marquardt, who worked as a statistician at DuPont, and independently by Girard, Wynne and Morrison.
The LMA is used in many software applications for solving generic curve-fitting problems. By using the Gauss–Newton algorithm it often converges faster than first-order methods. However, like other iterative optimization algorithms, the LMA finds only a local minimum, which is not necessarily the global minimum.
The problem
The primary application of the Levenberg–Marquardt algorithm is in the least-squares curve fitting problem: given a set of $m$ empirical pairs $(x_i, y_i)$ of independent and dependent variables, find the parameters $\boldsymbol\beta$ of the model curve $f(x, \boldsymbol\beta)$ so that the sum of the squares of the deviations $S(\boldsymbol\beta)$ is minimized:
: $\hat{\boldsymbol\beta} \in \operatorname{argmin}_{\boldsymbol\beta} S(\boldsymbol\beta) \equiv \operatorname{argmin}_{\boldsymbol\beta} \sum_{i=1}^{m} \left[y_i - f(x_i, \boldsymbol\beta)\right]^2,$
which is assumed to be non-empty.
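As a concrete illustration, the sketch below shows how this objective can be evaluated numerically. The exponential model, the synthetic data, and all function names are assumptions made for the example, not part of the algorithm itself.
<syntaxhighlight lang="python">
import numpy as np

def model(x, beta):
    """Hypothetical model curve f(x, beta) = beta[0] * exp(-beta[1] * x)."""
    return beta[0] * np.exp(-beta[1] * x)

def sum_of_squares(beta, x, y):
    """S(beta): sum of squared deviations between the data y and the model predictions."""
    residuals = y - model(x, beta)
    return np.sum(residuals ** 2)

# Made-up data pairs (x_i, y_i): noisy samples of the model at beta = (2.0, 1.3)
x = np.linspace(0.0, 4.0, 20)
y = 2.0 * np.exp(-1.3 * x) + 0.05 * np.random.default_rng(0).standard_normal(x.size)
print(sum_of_squares(np.array([2.0, 1.3]), x, y))
</syntaxhighlight>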
The solution
Like other numeric minimization algorithms, the Levenberg–Marquardt algorithm is an iterative procedure. To start a minimization, the user has to provide an initial guess for the parameter vector $\boldsymbol\beta$. In cases with only one minimum, an uninformed standard guess like $\boldsymbol\beta^\text{T} = (1, 1, \dots, 1)$ will work fine; in cases with multiple minima, the algorithm converges to the global minimum only if the initial guess is already somewhat close to the final solution.
In each iteration step, the parameter vector $\boldsymbol\beta$ is replaced by a new estimate $\boldsymbol\beta + \boldsymbol\delta$. To determine $\boldsymbol\delta$, the function $f(x_i, \boldsymbol\beta + \boldsymbol\delta)$ is approximated by its linearization:
: $f(x_i, \boldsymbol\beta + \boldsymbol\delta) \approx f(x_i, \boldsymbol\beta) + \mathbf{J}_i \boldsymbol\delta,$
where
: $\mathbf{J}_i = \frac{\partial f(x_i, \boldsymbol\beta)}{\partial \boldsymbol\beta}$
is the gradient (row vector in this case) of $f$ with respect to $\boldsymbol\beta$.
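In generic curve-fitting software, the rows $\mathbf{J}_i$ are often approximated by finite differences rather than derived analytically. The following is a minimal sketch under that assumption; the forward-difference scheme, the step size, and the function names are illustrative choices.
<syntaxhighlight lang="python">
import numpy as np

def numerical_jacobian(model, x, beta, eps=1e-7):
    """Approximate the Jacobian J whose i-th row is the gradient of f(x_i, beta) w.r.t. beta."""
    f0 = model(x, beta)
    J = np.empty((f0.size, beta.size))
    for j in range(beta.size):
        step = np.zeros_like(beta)
        step[j] = eps
        J[:, j] = (model(x, beta + step) - f0) / eps  # forward difference in parameter j
    return J

# Example with a hypothetical exponential model f(x, beta) = beta[0] * exp(-beta[1] * x)
f = lambda x, b: b[0] * np.exp(-b[1] * x)
print(numerical_jacobian(f, np.linspace(0.0, 4.0, 5), np.array([2.0, 1.3])))
</syntaxhighlight>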
The sum $S(\boldsymbol\beta + \boldsymbol\delta)$ of square deviations has its minimum at a zero gradient with respect to $\boldsymbol\delta$. The above first-order approximation of $f(x_i, \boldsymbol\beta + \boldsymbol\delta)$ gives
: $S(\boldsymbol\beta + \boldsymbol\delta) \approx \sum_{i=1}^{m} \left[y_i - f(x_i, \boldsymbol\beta) - \mathbf{J}_i \boldsymbol\delta\right]^2,$
or in vector notation,
: $S(\boldsymbol\beta + \boldsymbol\delta) \approx \left\|\mathbf{y} - \mathbf{f}(\boldsymbol\beta) - \mathbf{J}\boldsymbol\delta\right\|^2.$
Taking the derivative of $S(\boldsymbol\beta + \boldsymbol\delta)$ with respect to $\boldsymbol\delta$ and setting the result to zero gives
: $\left(\mathbf{J}^\text{T}\mathbf{J}\right)\boldsymbol\delta = \mathbf{J}^\text{T}\left[\mathbf{y} - \mathbf{f}(\boldsymbol\beta)\right],$
where $\mathbf{J}$ is the Jacobian matrix, whose $i$-th row equals $\mathbf{J}_i$, and where $\mathbf{f}(\boldsymbol\beta)$ and $\mathbf{y}$ are vectors with $i$-th component $f(x_i, \boldsymbol\beta)$ and $y_i$ respectively. The above expression obtained for $\boldsymbol\delta$ is the step used in the Gauss–Newton method. The Jacobian matrix as defined above is not (in general) a square matrix, but a rectangular matrix of size $m \times n$, where $n$ is the number of parameters (the size of the vector $\boldsymbol\beta$). The matrix multiplication $\mathbf{J}^\text{T}\mathbf{J}$ yields the required $n \times n$ square matrix, and the matrix–vector product on the right-hand side yields a vector of size $n$. The result is a set of $n$ linear equations, which can be solved for $\boldsymbol\delta$.
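Assuming the Jacobian $\mathbf{J}$ and the residual vector $\mathbf{y} - \mathbf{f}(\boldsymbol\beta)$ have already been computed, this $n \times n$ system can be solved directly; a minimal sketch:
<syntaxhighlight lang="python">
import numpy as np

def gauss_newton_step(J, residual):
    """Solve the normal equations (J^T J) delta = J^T residual for the increment delta."""
    JTJ = J.T @ J             # n x n matrix
    JTr = J.T @ residual      # right-hand side of length n
    return np.linalg.solve(JTJ, JTr)
</syntaxhighlight>
In practice, solving the linear least-squares problem $\mathbf{J}\boldsymbol\delta \approx \mathbf{y} - \mathbf{f}(\boldsymbol\beta)$ directly (for example with a QR factorization or numpy.linalg.lstsq) avoids forming $\mathbf{J}^\text{T}\mathbf{J}$ explicitly and is better conditioned.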
Levenberg's contribution is to replace this equation by a "damped version":
: $\left(\mathbf{J}^\text{T}\mathbf{J} + \lambda\mathbf{I}\right)\boldsymbol\delta = \mathbf{J}^\text{T}\left[\mathbf{y} - \mathbf{f}(\boldsymbol\beta)\right],$
where $\mathbf{I}$ is the identity matrix and $\boldsymbol\delta$ is the increment to the estimated parameter vector $\boldsymbol\beta$.
The (non-negative) damping factor $\lambda$ is adjusted at each iteration. If the reduction of $S$ is rapid, a smaller value can be used, bringing the algorithm closer to the Gauss–Newton algorithm, whereas if an iteration gives insufficient reduction in the residual, $\lambda$ can be increased, giving a step closer to the gradient-descent direction. Note that the gradient of $S$ with respect to $\boldsymbol\beta$ equals $-2\left(\mathbf{J}^\text{T}\left[\mathbf{y} - \mathbf{f}(\boldsymbol\beta)\right]\right)^\text{T}$. Therefore, for large values of $\lambda$, the step will be taken approximately in the direction opposite to the gradient. If either the length of the calculated step $\boldsymbol\delta$ or the reduction of the sum of squares from the latest parameter vector $\boldsymbol\beta + \boldsymbol\delta$ falls below predefined limits, iteration stops, and the last parameter vector $\boldsymbol\beta$ is considered to be the solution.
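Putting these pieces together, the following is a deliberately simplified sketch of the damped iteration. The update schedule for $\lambda$ (multiply or divide by a fixed factor), the tolerances, and the function names are illustrative assumptions; practical implementations differ in how they adapt the damping factor and test for convergence.
<syntaxhighlight lang="python">
import numpy as np

def levenberg_marquardt(model, jacobian, x, y, beta0,
                        lam=1e-2, lam_factor=10.0,
                        max_iter=100, step_tol=1e-8, reduction_tol=1e-12):
    """Minimal damped least-squares iteration; an illustrative sketch, not production code."""
    beta = np.asarray(beta0, dtype=float)
    S = np.sum((y - model(x, beta)) ** 2)
    for _ in range(max_iter):
        J = jacobian(x, beta)
        r = y - model(x, beta)
        # Damped normal equations: (J^T J + lam * I) delta = J^T r
        A = J.T @ J + lam * np.eye(beta.size)
        delta = np.linalg.solve(A, J.T @ r)
        S_new = np.sum((y - model(x, beta + delta)) ** 2)
        if S_new < S:       # step reduces S: accept it and move closer to Gauss-Newton
            reduction = S - S_new
            beta, S = beta + delta, S_new
            lam /= lam_factor
            if np.linalg.norm(delta) < step_tol or reduction < reduction_tol:
                break       # step length or improvement fell below the predefined limits
        else:               # insufficient reduction: reject the step, move toward gradient descent
            lam *= lam_factor
    return beta
</syntaxhighlight>
With the hypothetical model and numerical_jacobian sketched earlier, a call might look like levenberg_marquardt(model, lambda x, b: numerical_jacobian(model, x, b), x, y, np.array([1.0, 1.0])).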
When the damping factor $\lambda$ is large relative to $\left\|\mathbf{J}^\text{T}\mathbf{J}\right\|$, inverting $\mathbf{J}^\text{T}\mathbf{J} + \lambda\mathbf{I}$ is not necessary, as the update is well-approximated by the small gradient step $\lambda^{-1}\mathbf{J}^\text{T}\left[\mathbf{y} - \mathbf{f}(\boldsymbol\beta)\right]$.
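Under that large-$\lambda$ assumption, the explicit solve can be replaced by a scaled gradient step; a short sketch using the same hypothetical names as above:
<syntaxhighlight lang="python">
def large_lambda_step(J, residual, lam):
    """For lam much larger than ||J^T J||, (J^T J + lam I)^(-1) J^T r is roughly (1/lam) J^T r."""
    return (J.T @ residual) / lam
</syntaxhighlight>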