The Gauss–Newton algorithm is used to solve non-linear least squares problems, which is equivalent to minimizing a sum of squared function values. It is an extension of Newton's method for finding a minimum of a non-linear function. Since a sum of squares must be nonnegative, the algorithm can be viewed as using Newton's method to iteratively approximate zeroes of the components of the sum, and thus minimizing the sum. It has the advantage that second derivatives, which can be challenging to compute, are not required.

Non-linear least squares problems arise, for instance, in non-linear regression, where parameters in a model are sought such that the model is in good agreement with available observations.

The method is named after the mathematicians Carl Friedrich Gauss and Isaac Newton, and first appeared in Gauss' 1809 work ''Theoria motus corporum coelestium in sectionibus conicis solem ambientum''.
Description
Given $m$ functions $\mathbf{r} = (r_1, \ldots, r_m)$ (often called residuals) of $n$ variables $\boldsymbol\beta = (\beta_1, \ldots, \beta_n)$, with $m \ge n$, the Gauss–Newton algorithm iteratively finds the value of the variables that minimize the sum of squares [Björck (1996)]

$$S(\boldsymbol\beta) = \sum_{i=1}^m r_i(\boldsymbol\beta)^2.$$
Starting with an initial guess $\boldsymbol\beta^{(0)}$ for the minimum, the method proceeds by the iterations

$$\boldsymbol\beta^{(s+1)} = \boldsymbol\beta^{(s)} - \left(\mathbf{J_r}^\mathsf{T} \mathbf{J_r}\right)^{-1} \mathbf{J_r}^\mathsf{T} \mathbf{r}\left(\boldsymbol\beta^{(s)}\right),$$

where, if $\mathbf{r}$ and $\boldsymbol\beta$ are column vectors, the entries of the Jacobian matrix are

$$\left(\mathbf{J_r}\right)_{ij} = \frac{\partial r_i\left(\boldsymbol\beta^{(s)}\right)}{\partial \beta_j},$$

and the symbol ${}^\mathsf{T}$ denotes the matrix transpose.
At each iteration, the update $\Delta = \boldsymbol\beta^{(s+1)} - \boldsymbol\beta^{(s)}$ can be found by rearranging the previous equation in the following two steps:

* $\Delta = -\left(\mathbf{J_r}^\mathsf{T} \mathbf{J_r}\right)^{-1} \mathbf{J_r}^\mathsf{T} \mathbf{r}\left(\boldsymbol\beta^{(s)}\right)$
* $\mathbf{J_r}^\mathsf{T} \mathbf{J_r} \Delta = -\mathbf{J_r}^\mathsf{T} \mathbf{r}\left(\boldsymbol\beta^{(s)}\right)$

With substitutions $A = \mathbf{J_r}^\mathsf{T} \mathbf{J_r}$, $\mathbf{b} = -\mathbf{J_r}^\mathsf{T} \mathbf{r}\left(\boldsymbol\beta^{(s)}\right)$, and $\mathbf{x} = \Delta$, this turns into the conventional matrix equation of form $A\mathbf{x} = \mathbf{b}$, which can then be solved by a variety of methods (see Notes).
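As a concrete illustration, the iteration with the $A\mathbf{x} = \mathbf{b}$ solve can be sketched in a few lines of Python with NumPy. This is a minimal sketch rather than a robust implementation: the callables `residuals` and `jacobian` and the fixed iteration count are assumptions for the example, and no convergence test or safeguarding is included.

```python
import numpy as np

def gauss_newton(residuals, jacobian, beta0, n_iter=10):
    """Basic Gauss-Newton: repeat beta <- beta + Delta, where
    (J^T J) Delta = -J^T r, i.e. the conventional A x = b."""
    beta = np.asarray(beta0, dtype=float)
    for _ in range(n_iter):
        r = residuals(beta)      # residual vector r(beta), shape (m,)
        J = jacobian(beta)       # Jacobian J_r, shape (m, n)
        A = J.T @ J              # A = J^T J
        b = -J.T @ r             # b = -J^T r
        beta = beta + np.linalg.solve(A, b)
    return beta
```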
If $m = n$, the iteration simplifies to

$$\boldsymbol\beta^{(s+1)} = \boldsymbol\beta^{(s)} - \left(\mathbf{J_r}\right)^{-1} \mathbf{r}\left(\boldsymbol\beta^{(s)}\right),$$

which is a direct generalization of Newton's method in one dimension.
In data fitting, where the goal is to find the parameters $\boldsymbol\beta$ such that a given model function $f(x, \boldsymbol\beta)$ best fits some data points $(x_i, y_i)$, the functions $r_i$ are the residuals:

$$r_i(\boldsymbol\beta) = y_i - f(x_i, \boldsymbol\beta).$$

Then, the Gauss–Newton method can be expressed in terms of the Jacobian $\mathbf{J_f}$ of the function $f$ as

$$\boldsymbol\beta^{(s+1)} = \boldsymbol\beta^{(s)} + \left(\mathbf{J_f}^\mathsf{T} \mathbf{J_f}\right)^{-1} \mathbf{J_f}^\mathsf{T} \mathbf{r}\left(\boldsymbol\beta^{(s)}\right).$$

Note that $\left(\mathbf{J_f}^\mathsf{T} \mathbf{J_f}\right)^{-1} \mathbf{J_f}^\mathsf{T}$ is the left pseudoinverse of $\mathbf{J_f}$.
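In code, this data-fitting update is a one-liner once the Jacobian is available. A sketch, assuming NumPy; the function name and argument names are hypothetical, with `J` the Jacobian of $f$ at the current iterate and `r` the residual vector:

```python
import numpy as np

def gn_step_fit(beta, J, r):
    """One Gauss-Newton step in data-fitting form:
    beta <- beta + pinv(J) @ r, since pinv(J) = (J^T J)^{-1} J^T
    when J has full column rank."""
    return beta + np.linalg.pinv(J) @ r
```

In practice, `np.linalg.lstsq(J, r, rcond=None)` computes the same step without forming the pseudoinverse explicitly.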
Notes
The assumption $m \ge n$ in the algorithm statement is necessary, as otherwise the matrix $\mathbf{J_r}^\mathsf{T} \mathbf{J_r}$ is not invertible and the normal equations cannot be solved (at least uniquely).
The Gauss–Newton algorithm can be derived by linearly approximating the vector of functions $r_i$. Using Taylor's theorem, we can write at every iteration:

$$\mathbf{r}(\boldsymbol\beta) \approx \mathbf{r}\left(\boldsymbol\beta^{(s)}\right) + \mathbf{J_r}\left(\boldsymbol\beta^{(s)}\right)\Delta,$$

with $\Delta = \boldsymbol\beta - \boldsymbol\beta^{(s)}$. The task of finding $\Delta$ minimizing the sum of squares of the right-hand side, i.e.,

$$\min_{\Delta} \left\| \mathbf{r}\left(\boldsymbol\beta^{(s)}\right) + \mathbf{J_r}\left(\boldsymbol\beta^{(s)}\right)\Delta \right\|_2^2,$$

is a linear least-squares problem, which can be solved explicitly, yielding the normal equations in the algorithm.
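Because each step is itself a linear least-squares problem, the increment can also be computed by handing the linearized residual directly to a least-squares solver, avoiding the explicit normal equations. A sketch, assuming NumPy; the function name is hypothetical:

```python
import numpy as np

def gn_step_lstsq(beta, J, r):
    """One Gauss-Newton step computed as the linear least-squares
    solution of min_Delta || r + J @ Delta ||_2; lstsq uses an
    orthogonal factorization, which is numerically safer than
    forming J^T J."""
    delta, *_ = np.linalg.lstsq(J, -r, rcond=None)
    return beta + delta
```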
The normal equations are $n$ simultaneous linear equations in the unknown increments $\Delta$. They may be solved in one step, using Cholesky decomposition, or, better, the QR factorization of $\mathbf{J_r}$. For large systems, an iterative method, such as the conjugate gradient method, may be more efficient. If there is a linear dependence between columns of $\mathbf{J_r}$, the iterations will fail, as $\mathbf{J_r}^\mathsf{T} \mathbf{J_r}$ becomes singular.
When $\mathbf{r}$ is complex, $\mathbf{r}: \mathbb{C}^n \to \mathbb{C}$, the conjugate form should be used: $\left(\overline{\mathbf{J_r}}^\mathsf{T} \mathbf{J_r}\right)^{-1} \overline{\mathbf{J_r}}^\mathsf{T}$.
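For illustration, the two direct solution strategies might look as follows. This is a sketch, assuming NumPy and SciPy; the function names are hypothetical, and the Cholesky route additionally assumes $\mathbf{J_r}^\mathsf{T} \mathbf{J_r}$ is positive definite (full column rank):

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve, solve_triangular

def gn_step_cholesky(J, r):
    """Solve the normal equations (J^T J) Delta = -J^T r by
    Cholesky factorization of J^T J."""
    c, low = cho_factor(J.T @ J)
    return cho_solve((c, low), -J.T @ r)

def gn_step_qr(J, r):
    """Compute the same step from a reduced QR factorization
    J = Q R, solving R Delta = -Q^T r by back-substitution;
    this avoids squaring the condition number."""
    Q, R = np.linalg.qr(J)
    return solve_triangular(R, -Q.T @ r)
```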
Example
In this example, the Gauss–Newton algorithm will be used to fit a model to some data by minimizing the sum of squares of errors between the data and the model's predictions.
In a biology experiment studying the relation between substrate concentration $[S]$ and reaction rate in an enzyme-mediated reaction, the data in the following table were obtained.

  $i$     1      2      3      4       5       6       7
  $[S]$   0.038  0.194  0.425  0.626   1.253   2.500   3.740
  rate    0.050  0.127  0.094  0.2122  0.2729  0.2665  0.3317

It is desired to find a curve (model function) of the form

$$\text{rate} = \frac{V_\text{max} \cdot [S]}{K_M + [S]}$$

that fits best the data in the least-squares sense, with the parameters $V_\text{max}$ and $K_M$ to be determined.
Denote by $x_i$ and $y_i$ the values of $[S]$ and rate respectively, with $i = 1, \ldots, 7$. Let $\beta_1 = V_\text{max}$ and $\beta_2 = K_M$. We will find $\beta_1$ and $\beta_2$ such that the sum of squares of the residuals

$$r_i = y_i - \frac{\beta_1 x_i}{\beta_2 + x_i}, \quad (i = 1, \ldots, 7)$$

is minimized.
The Jacobian $\mathbf{J_r}$ of the vector of residuals $r_i$ with respect to the unknowns $\beta_j$ is a $7 \times 2$ matrix with the $i$-th row having the entries

$$\frac{\partial r_i}{\partial \beta_1} = -\frac{x_i}{\beta_2 + x_i}; \quad \frac{\partial r_i}{\partial \beta_2} = \frac{\beta_1 x_i}{\left(\beta_2 + x_i\right)^2}.$$
Starting with the initial estimates of $\beta_1 = 0.9$ and $\beta_2 = 0.2$, after five iterations of the Gauss–Newton algorithm, the optimal values $\hat\beta_1 = 0.362$ and $\hat\beta_2 = 0.556$ are obtained. The sum of squares of residuals decreased from the initial value of 1.445 to 0.00784 after the fifth iteration. The plot in the figure shows the curve determined by the model for the optimal parameters with the observed data.
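A short script along these lines reproduces the fit. This is a sketch, assuming NumPy, using the data and starting values stated above; the printed results should match the quoted figures to about the precision shown:

```python
import numpy as np

x = np.array([0.038, 0.194, 0.425, 0.626, 1.253, 2.500, 3.740])     # [S]
y = np.array([0.050, 0.127, 0.094, 0.2122, 0.2729, 0.2665, 0.3317])  # rate

beta = np.array([0.9, 0.2])  # initial estimates (beta_1, beta_2)
for _ in range(5):
    r = y - beta[0] * x / (beta[1] + x)                     # residuals
    J = np.column_stack((-x / (beta[1] + x),                # dr_i/dbeta_1
                         beta[0] * x / (beta[1] + x)**2))   # dr_i/dbeta_2
    beta = beta + np.linalg.solve(J.T @ J, -J.T @ r)        # GN step

print(beta)                                          # ~ [0.362, 0.556]
print(np.sum((y - beta[0] * x / (beta[1] + x))**2))  # ~ 0.00784
```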
Convergence properties
The Gauss–Newton iteration is guaranteed to converge toward a local minimum point $\hat\beta$ under four conditions: the functions $r_1, \ldots, r_m$ are twice continuously differentiable in an open convex set $D \ni \hat\beta$, the Jacobian $\mathbf{J_r}(\hat\beta)$ is of full column rank, the initial iterate $\boldsymbol\beta^{(0)}$ is near $\hat\beta$, and the local minimum value $|S(\hat\beta)|$ is small. The convergence is quadratic if $|S(\hat\beta)| = 0$.
It can be shown that the increment $\Delta$ is a descent direction for $S$, and, if the algorithm converges, then the limit is a stationary point of $S$. For a large minimum value $|S(\hat\beta)|$, however, convergence is not guaranteed, not even local convergence as in Newton's method, or convergence under the usual Wolfe conditions.
The rate of convergence of the Gauss–Newton algorithm can approach quadratic. The algorithm may converge slowly or not at all if the initial guess is far from the minimum or the matrix $\mathbf{J_r}^\mathsf{T} \mathbf{J_r}$ is ill-conditioned. For example, consider the problem with $m = 2$ equations and $n = 1$ variable, given by

$$\begin{align} r_1(\beta) &= \beta + 1, \\ r_2(\beta) &= \lambda\beta^2 + \beta - 1. \end{align}$$

The optimum is at $\beta = 0$. (Actually the optimum is at $\beta = -1$ for $\lambda = 2$, because $S(0) = 1^2 + (-1)^2 = 2$, but $S(-1) = 0$.) If $\lambda = 0$, then the problem is in fact linear and the method finds the optimum in one iteration. If $|\lambda| < 1$, then the method converges linearly and the error decreases asymptotically with a factor $|\lambda|$ at every iteration. However, if $|\lambda| > 1$, then the method does not even converge locally.
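This behavior can be checked numerically. The following sketch (assuming NumPy; the function name and starting point are chosen for illustration) iterates the one-variable problem for several values of $\lambda$:

```python
import numpy as np

def gn_scalar(lam, beta=0.5, n_iter=100):
    """Gauss-Newton on r1 = beta + 1, r2 = lam*beta^2 + beta - 1."""
    for _ in range(n_iter):
        r = np.array([beta + 1.0, lam * beta**2 + beta - 1.0])
        J = np.array([[1.0], [2.0 * lam * beta + 1.0]])  # dr_i/dbeta
        beta += np.linalg.solve(J.T @ J, -J.T @ r).item()
    return beta

for lam in (0.0, 0.5, 0.9, 1.5):
    # For |lam| < 1 the final iterate is ~0; for lam = 1.5 the
    # iterates keep wandering and never settle at the optimum.
    print(lam, gn_scalar(lam))
```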
Solving overdetermined systems of equations
The Gauss–Newton iteration

$$\mathbf{x}^{(k+1)} = \mathbf{x}^{(k)} - J\left(\mathbf{x}^{(k)}\right)^\dagger \mathbf{f}\left(\mathbf{x}^{(k)}\right), \quad k = 0, 1, \ldots$$

is an effective method for solving overdetermined systems of equations in the form of $\mathbf{f}(\mathbf{x}) = \mathbf{0}$ with $m > n$, where $J\left(\mathbf{x}\right)^\dagger$ denotes the Moore–Penrose pseudoinverse of the Jacobian matrix of $\mathbf{f}$.
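A sketch of this iteration in NumPy; the callables `f` and `jac`, returning the residual vector and its Jacobian, are hypothetical names supplied by the user:

```python
import numpy as np

def gn_solve(f, jac, x0, n_iter=20):
    """Iterate x <- x - pinv(J(x)) @ f(x) to solve the
    overdetermined system f(x) = 0 in the least-squares sense."""
    x = np.asarray(x0, dtype=float)
    for _ in range(n_iter):
        # lstsq applies the pseudoinverse action without forming it.
        step, *_ = np.linalg.lstsq(jac(x), -f(x), rcond=None)
        x += step
    return x
```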