Numerical methods for linear least squares entail the numerical analysis of linear least squares problems.


Introduction

A general approach to the least squares problem \min_{\boldsymbol \beta} \, \big\| \mathbf y - X \boldsymbol \beta \big\|^2 can be described as follows. Suppose that we can find an ''n'' by ''m'' matrix ''S'' such that ''XS'' is an orthogonal projection onto the image of ''X''. Then a solution to our minimization problem is given by
:\boldsymbol \beta = S \mathbf y,
simply because
:X \boldsymbol \beta = X ( S \mathbf y) = (X S) \mathbf y
is exactly the sought orthogonal projection of \mathbf y onto the image of ''X'' (as explained in the next section, the image of ''X'' is just the subspace spanned by the column vectors of ''X''). A few popular ways to find such a matrix ''S'' are described below.
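As a concrete illustration of this framework, the following sketch (assuming NumPy; the matrix ''X'' and data ''y'' are randomly generated placeholders) takes ''S'' to be the Moore–Penrose pseudoinverse ''X''<sup>+</sup>, one valid choice discussed later in this article, and checks numerically that ''XS'' is an orthogonal projection and that \boldsymbol\beta = S\mathbf y solves the minimization problem.

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((7, 3))        # m = 7 observations, n = 3 parameters
y = rng.standard_normal(7)

S = np.linalg.pinv(X)                  # one choice of S: the Moore-Penrose pseudoinverse
P = X @ S                              # candidate orthogonal projector onto im(X)

assert np.allclose(P @ P, P)           # idempotent: P^2 = P
assert np.allclose(P.T, P)             # symmetric, hence an orthogonal projection

beta = S @ y                           # minimizer of ||y - X beta||^2
assert np.allclose(X.T @ (y - X @ beta), 0.0)  # residual is orthogonal to im(X)
</syntaxhighlight>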


Inverting the matrix of the normal equations

The equation (\mathbf X^{\mathrm T} \mathbf X )\hat{\boldsymbol \beta} = \mathbf X^{\mathrm T} \mathbf y is known as the normal equation. The algebraic solution of the normal equations with a full-rank matrix ''X''<sup>T</sup>''X'' can be written as
:\hat{\boldsymbol \beta} = (\mathbf X^{\mathrm T} \mathbf X )^{-1} \mathbf X^{\mathrm T} \mathbf y = \mathbf X^+ \mathbf y,
where ''X''<sup>+</sup> is the Moore–Penrose pseudoinverse of ''X''. Although this equation is correct and can work in many applications, it is not computationally efficient to invert the normal-equations matrix (the Gramian matrix). An exception occurs in numerical smoothing and differentiation where an analytical expression is required.

If the matrix ''X''<sup>T</sup>''X'' is well-conditioned and positive definite, implying that it has full rank, the normal equations can be solved directly by using the Cholesky decomposition ''R''<sup>T</sup>''R'', where ''R'' is an upper triangular matrix, giving:
:R^{\mathrm T} R \hat{\boldsymbol \beta} = \mathbf X^{\mathrm T} \mathbf y.
The solution is obtained in two stages, a forward substitution step, solving for \mathbf z:
:R^{\mathrm T} \mathbf z = \mathbf X^{\mathrm T} \mathbf y,
followed by a backward substitution, solving for \hat{\boldsymbol \beta}:
:R \hat{\boldsymbol \beta} = \mathbf z.
Both substitutions are facilitated by the triangular nature of ''R''.
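A minimal sketch of the two-stage Cholesky solve, assuming NumPy and SciPy; the polynomial design matrix and observations below are made up purely for illustration.

<syntaxhighlight lang="python">
import numpy as np
from scipy.linalg import cholesky, solve_triangular

t = np.linspace(0.0, 1.0, 50)
X = np.vander(t, 4)                      # 50 x 4 design matrix, full column rank
y = X @ np.array([1.0, -2.0, 0.5, 3.0])  # synthetic observations

R = cholesky(X.T @ X, lower=False)       # X^T X = R^T R, R upper triangular

z = solve_triangular(R, X.T @ y, trans='T', lower=False)  # forward:  R^T z = X^T y
beta = solve_triangular(R, z, lower=False)                # backward: R beta = z
</syntaxhighlight>

Both triangular solves cost only O(''n''<sup>2</sup>) operations each; the dominant cost is forming the Gramian ''X''<sup>T</sup>''X''.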


Orthogonal decomposition methods

Orthogonal decomposition methods of solving the least squares problem are slower than the normal equations method but are more numerically stable because they avoid forming the product ''X''<sup>T</sup>''X''. The residuals are written in matrix notation as
:\mathbf r = \mathbf y - X \hat{\boldsymbol \beta}.
The matrix ''X'' is subjected to an orthogonal decomposition, e.g., the QR decomposition, as follows:
:X = Q \begin{bmatrix} R \\ 0 \end{bmatrix},
where ''Q'' is an ''m''×''m'' orthogonal matrix (''Q''<sup>T</sup>''Q'' = ''I'') and ''R'' is an ''n''×''n'' upper triangular matrix with r_{ii} > 0. The residual vector is left-multiplied by ''Q''<sup>T</sup>:
:Q^{\mathrm T} \mathbf r = Q^{\mathrm T} \mathbf y - \left( Q^{\mathrm T} Q \right) \begin{bmatrix} R \\ 0 \end{bmatrix} \hat{\boldsymbol \beta} = \begin{bmatrix} \left(Q^{\mathrm T} \mathbf y \right)_n - R \hat{\boldsymbol \beta} \\ \left(Q^{\mathrm T} \mathbf y \right)_{m-n} \end{bmatrix} = \begin{bmatrix} \mathbf u \\ \mathbf v \end{bmatrix}.
Because ''Q'' is orthogonal, the sum of squares of the residuals, ''s'', may be written as:
:s = \| \mathbf r \|^2 = \mathbf r^{\mathrm T} \mathbf r = \mathbf r^{\mathrm T} Q Q^{\mathrm T} \mathbf r = \mathbf u^{\mathrm T} \mathbf u + \mathbf v^{\mathrm T} \mathbf v.
Since \mathbf v does not depend on ''β'', the minimum value of ''s'' is attained when the upper block, \mathbf u, is zero. Therefore, the parameters are found by solving
:R \hat{\boldsymbol \beta} = \left(Q^{\mathrm T} \mathbf y \right)_n.
These equations are easily solved as ''R'' is upper triangular.
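The same hypothetical polynomial-fitting problem solved via a full QR decomposition; this is a sketch assuming SciPy (whose default mode returns the ''m''×''m'' matrix ''Q'' described above), not the only way to organize the computation.

<syntaxhighlight lang="python">
import numpy as np
from scipy.linalg import qr, solve_triangular

t = np.linspace(0.0, 1.0, 50)
X = np.vander(t, 4)                      # m = 50, n = 4
y = X @ np.array([1.0, -2.0, 0.5, 3.0])

Q, Rfull = qr(X)                         # full QR: Q is 50 x 50, Rfull is 50 x 4
n = X.shape[1]
u = (Q.T @ y)[:n]                        # upper block of Q^T y
beta = solve_triangular(Rfull[:n, :], u, lower=False)  # solve R beta = u
</syntaxhighlight>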
An alternative decomposition of ''X'' is the singular value decomposition (SVD)
:X = U \Sigma V^{\mathrm T},
where ''U'' is an ''m'' by ''m'' orthogonal matrix, ''V'' is an ''n'' by ''n'' orthogonal matrix and \Sigma is an ''m'' by ''n'' matrix with all its elements outside of the main diagonal equal to 0. The pseudoinverse of \Sigma is easily obtained by inverting its non-zero diagonal elements and transposing. Hence,
:\mathbf X \mathbf X^+ = U \Sigma V^{\mathrm T} V \Sigma^+ U^{\mathrm T} = U P U^{\mathrm T},
where ''P'' is obtained from \Sigma by replacing its non-zero diagonal elements with ones. Since (\mathbf X \mathbf X^+)^* = \mathbf X \mathbf X^+ (a property of the pseudoinverse), the matrix U P U^{\mathrm T} is an orthogonal projection onto the image (column space) of ''X''. In accordance with the general approach described in the introduction above (find ''XS'' which is an orthogonal projection),
:S = \mathbf X^+,
and thus
:\boldsymbol \beta = V \Sigma^+ U^{\mathrm T} \mathbf y
is a solution of the least squares problem. This method is the most computationally intensive, but is particularly useful if the normal-equations matrix, ''X''<sup>T</sup>''X'', is very ill-conditioned (i.e. if its condition number multiplied by the machine's relative round-off error is appreciably large). In that case, including the smallest singular values in the inversion merely adds numerical noise to the solution. This can be cured with the truncated SVD approach, giving a more stable and exact answer, by explicitly setting to zero all singular values below a certain threshold and so ignoring them, a process closely related to factor analysis.
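A sketch of the truncated-SVD solve on the same hypothetical data, assuming NumPy; the tolerance below is one common heuristic choice for the truncation threshold, not a universal rule.

<syntaxhighlight lang="python">
import numpy as np

t = np.linspace(0.0, 1.0, 50)
X = np.vander(t, 4)
y = X @ np.array([1.0, -2.0, 0.5, 3.0])

U, s, Vt = np.linalg.svd(X, full_matrices=False)   # X = U diag(s) V^T
tol = max(X.shape) * np.finfo(X.dtype).eps * s[0]  # heuristic truncation threshold
s_pinv = np.where(s > tol, 1.0 / s, 0.0)           # invert kept singular values, zero the rest
beta = Vt.T @ (s_pinv * (U.T @ y))                 # beta = V Sigma^+ U^T y
</syntaxhighlight>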


Discussion

The numerical methods for linear least squares are important because linear regression models are among the most important types of model, both as formal statistical models and for exploration of data-sets. The majority of statistical computer packages contain facilities for regression analysis that make use of linear least squares computations. Hence it is appropriate that considerable effort has been devoted to the task of ensuring that these computations are undertaken efficiently and with due regard to round-off error.

Individual statistical analyses are seldom undertaken in isolation, but rather are part of a sequence of investigatory steps. Some of the topics involved in considering numerical methods for linear least squares relate to this point. Thus important topics can be:
*Computations where a number of similar, and often nested, models are considered for the same data-set. That is, where models with the same dependent variable but different sets of independent variables are to be considered, for essentially the same set of data-points.
*Computations for analyses that occur in a sequence, as the number of data-points increases.
*Special considerations for very extensive data-sets.

Fitting of linear models by least squares often, but not always, arises in the context of statistical analysis. It can therefore be important that considerations of computational efficiency for such problems extend to all of the auxiliary quantities required for such analyses, and are not restricted to the formal solution of the linear least squares problem.

Matrix calculations, like any other, are affected by rounding errors. An early summary of these effects, regarding the choice of computation methods for matrix inversion, was provided by Wilkinson (1963).


See also

* Numerical linear algebra
* Numerical methods for non-linear least squares


References

*Wilkinson, J. H. (1963), "Chapter 3: Matrix Computations", ''Rounding Errors in Algebraic Processes'', Notes in Applied Science No. 32, London: Her Majesty's Stationery Office (National Physical Laboratory).


Further reading

*Åke Björck, ''Numerical Methods for Least Squares Problems'', SIAM, 1996.
*R. W. Farebrother, ''Linear Least Squares Computations'', CRC Press, 1988.
*National Physical Laboratory (1961), "Chapter 2: Linear Equations and Matrices: Direct Methods on Automatic Computers", ''Modern Computing Methods'' (2nd ed.), Notes on Applied Science, Vol. 16, Her Majesty's Stationery Office.