
Linear least squares (LLS) is the least squares approximation of linear functions to data. It is a set of formulations for solving statistical problems involved in linear regression, including variants for ordinary (unweighted), weighted, and generalized (correlated) residuals. Numerical methods for linear least squares include inverting the matrix of the normal equations and orthogonal decomposition methods.


Main formulations

The three main linear least squares formulations are:
* Ordinary least squares (OLS) is the most common estimator. OLS estimates are commonly used to analyze both experimental and observational data. The OLS method minimizes the sum of squared residuals, and leads to a closed-form expression for the estimated value of the unknown parameter vector ''β'': \hat{\boldsymbol\beta} = (\mathbf{X}^\mathsf{T}\mathbf{X})^{-1}\mathbf{X}^\mathsf{T}\mathbf{y}, where \mathbf{y} is a vector whose ''i''th element is the ''i''th observation of the dependent variable, and \mathbf{X} is a matrix whose ''ij'' element is the ''i''th observation of the ''j''th independent variable. The estimator is unbiased and consistent if the errors have finite variance and are uncorrelated with the regressors: \operatorname{E}[\,\mathbf{x}_i\varepsilon_i\,] = 0, where \mathbf{x}_i is the transpose of row ''i'' of the matrix \mathbf{X}. It is also efficient under the assumption that the errors have finite variance and are homoscedastic, meaning that \operatorname{E}[\,\varepsilon_i^2 \mid \mathbf{x}_i\,] does not depend on ''i''. The condition that the errors are uncorrelated with the regressors will generally be satisfied in an experiment, but in the case of observational data, it is difficult to exclude the possibility of an omitted covariate ''z'' that is related to both the observed covariates and the response variable. The existence of such a covariate will generally lead to a correlation between the regressors and the error term, and hence to an inconsistent estimator of ''β''. The condition of homoscedasticity can fail with either experimental or observational data. If the goal is either inference or predictive modeling, the performance of OLS estimates can be poor if multicollinearity is present, unless the sample size is large.
* Weighted least squares (WLS) is used when heteroscedasticity is present in the error terms of the model.
* Generalized least squares (GLS) is an extension of the OLS method that allows efficient estimation of ''β'' when either heteroscedasticity, or correlations, or both are present among the error terms of the model, as long as the form of heteroscedasticity and correlation is known independently of the data. To handle heteroscedasticity when the error terms are uncorrelated with each other, GLS minimizes a weighted analogue to the sum of squared residuals from OLS regression, where the weight for the ''i''th case is inversely proportional to \operatorname{var}(\varepsilon_i). This special case of GLS is called "weighted least squares". The GLS solution to an estimation problem is \hat{\boldsymbol\beta} = (\mathbf{X}^\mathsf{T}\boldsymbol\Omega^{-1}\mathbf{X})^{-1}\mathbf{X}^\mathsf{T}\boldsymbol\Omega^{-1}\mathbf{y}, where \boldsymbol\Omega is the covariance matrix of the errors. GLS can be viewed as applying a linear transformation to the data so that the assumptions of OLS are met for the transformed data. For GLS to be applied, the covariance structure of the errors must be known up to a multiplicative constant.
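The GLS formula above can be checked numerically. The following is a minimal sketch in plain Python, assuming small dense matrices and exact rational arithmetic via the standard `fractions` module; the helper names (`matmul`, `transpose`, `inverse`, `diag`, `gls`) are hypothetical, chosen only for this illustration. It reuses the four data points from the worked example later in the article and shows that \boldsymbol\Omega = \mathbf{I} reproduces the OLS estimate, that a diagonal \boldsymbol\Omega gives weighted least squares, and that rescaling \boldsymbol\Omega by a constant leaves the estimate unchanged.

```python
from fractions import Fraction


def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def inverse(A):
    """Inverse of a small nonsingular matrix by Gauss-Jordan elimination (exact)."""
    n = len(A)
    # Augment A with the identity, then reduce the left half to the identity.
    M = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(A)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                f = M[r][col]
                M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    return [row[n:] for row in M]


def diag(values):
    """Diagonal matrix with the given entries."""
    n = len(values)
    return [[Fraction(values[i]) if i == j else Fraction(0) for j in range(n)]
            for i in range(n)]


def gls(X, y, Omega):
    """GLS estimate (X^T Omega^-1 X)^-1 X^T Omega^-1 y."""
    Xt_Oinv = matmul(transpose(X), inverse(Omega))
    A = matmul(Xt_Oinv, X)
    rhs = matmul(Xt_Oinv, [[v] for v in y])
    return [row[0] for row in matmul(inverse(A), rhs)]


X = [[1, 1], [1, 2], [1, 3], [1, 4]]  # intercept column and x
y = [6, 5, 7, 10]

beta_ols = gls(X, y, diag([1, 1, 1, 1]))     # Omega = I: ordinary least squares
beta_wls = gls(X, y, diag([1, 1, 1, 4]))     # 4th point assumed to have 4x the variance
beta_scaled = gls(X, y, diag([2, 2, 2, 8]))  # same Omega up to a constant factor
print(beta_ols)  # [Fraction(7, 2), Fraction(7, 5)]
print(beta_wls)  # [Fraction(80, 19), Fraction(37, 38)]
```

Down-weighting the (assumed) noisier fourth point pulls the fitted line away from it, so the slope drops from 1.4 to 37/38.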


Alternative formulations

Other formulations include:
* Iteratively reweighted least squares (IRLS) is used when heteroscedasticity, or correlations, or both are present among the error terms of the model, but where little is known about the covariance structure of the errors independently of the data. In the first iteration, OLS, or GLS with a provisional covariance structure, is carried out, and the residuals are obtained from the fit. Based on the residuals, an improved estimate of the covariance structure of the errors can usually be obtained. A subsequent GLS iteration is then performed using this estimate of the error structure to define the weights. The process can be iterated to convergence, but in many cases, only one iteration is sufficient to achieve an efficient estimate of ''β''.
* Instrumental variables regression (IV) can be performed when the regressors are correlated with the errors. In this case, we need the existence of some auxiliary ''instrumental variables'' \mathbf{z}_i such that \operatorname{E}[\,\mathbf{z}_i\varepsilon_i\,] = 0. If \mathbf{Z} is the matrix of instruments, then the estimator can be given in closed form as \hat{\boldsymbol\beta} = (\mathbf{X}^\mathsf{T}\mathbf{Z}(\mathbf{Z}^\mathsf{T}\mathbf{Z})^{-1}\mathbf{Z}^\mathsf{T}\mathbf{X})^{-1}\mathbf{X}^\mathsf{T}\mathbf{Z}(\mathbf{Z}^\mathsf{T}\mathbf{Z})^{-1}\mathbf{Z}^\mathsf{T}\mathbf{y}. Optimal instruments regression is an extension of classical IV regression to the situation where .
* Total least squares (TLS) is an approach to least squares estimation of the linear regression model that treats the covariates and response variable in a more geometrically symmetric manner than OLS. It is one approach to handling the "errors in variables" problem, and is also sometimes used even when the covariates are assumed to be error-free.
* Percentage least squares focuses on reducing percentage errors, which is useful in the field of forecasting or time series analysis. It is also useful in situations where the dependent variable has a wide range without constant variance, as here the larger residuals at the upper end of the range would dominate if OLS were used. When the percentage or relative error is normally distributed, least squares percentage regression provides maximum likelihood estimates. Percentage regression is linked to a multiplicative error model, whereas OLS is linked to models containing an additive error term.
* Constrained least squares is a linear least squares problem with additional constraints on the solution.
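Percentage least squares can be posed as a weighted least squares problem: minimizing \sum_i ((y_i - \hat y_i)/y_i)^2 is the same as minimizing \sum_i w_i r_i^2 with weights w_i = 1/y_i^2, which requires every y_i to be nonzero. The sketch below fits a straight line this way in plain Python with exact `fractions` arithmetic; the function names and the reuse of the four data points from the example section are illustrative assumptions, not a reference implementation.

```python
from fractions import Fraction


def weighted_line_fit(xs, ys, ws):
    """Minimize sum_i w_i * (y_i - b1 - b2*x_i)**2 via the 2x2 weighted normal equations."""
    sw = sum(ws)
    swx = sum(w * x for w, x in zip(ws, xs))
    swxx = sum(w * x * x for w, x in zip(ws, xs))
    swy = sum(w * y for w, y in zip(ws, ys))
    swxy = sum(w * x * y for w, x, y in zip(ws, xs, ys))
    det = sw * swxx - swx * swx  # nonzero when at least two distinct x values carry weight
    b1 = (swy * swxx - swx * swxy) / det  # Cramer's rule
    b2 = (sw * swxy - swx * swy) / det
    return b1, b2


def percentage_fit(xs, ys):
    """Percentage least squares for a line: WLS with weights 1 / y_i**2 (all y_i nonzero)."""
    return weighted_line_fit(xs, ys, [Fraction(1, y * y) for y in ys])


def percentage_sse(b, xs, ys):
    """Sum of squared relative residuals (y_i - fitted_i) / y_i."""
    b1, b2 = b
    return sum(((y - b1 - b2 * x) / y) ** 2 for x, y in zip(xs, ys))


xs = [1, 2, 3, 4]
ys = [6, 5, 7, 10]
ols = weighted_line_fit(xs, ys, [Fraction(1)] * len(xs))  # unit weights: ordinary least squares
pct = percentage_fit(xs, ys)
print(ols, pct)
print(percentage_sse(pct, xs, ys) <= percentage_sse(ols, xs, ys))  # True
```

Because the percentage fit minimizes the relative-error objective directly, its sum of squared relative residuals can never exceed that of the OLS line on the same data.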


Objective function

In OLS (i.e., assuming unweighted observations), the optimal value of the objective function is found by substituting the optimal expression for the coefficient vector: S = \mathbf{y}^\mathsf{T}(\mathbf{I} - \mathbf{H})^\mathsf{T}(\mathbf{I} - \mathbf{H})\mathbf{y} = \mathbf{y}^\mathsf{T}(\mathbf{I} - \mathbf{H})\mathbf{y}, where \mathbf{H} = \mathbf{X}(\mathbf{X}^\mathsf{T}\mathbf{X})^{-1}\mathbf{X}^\mathsf{T}, the latter equality holding since (\mathbf{I} - \mathbf{H}) is symmetric and idempotent. It can be shown from this that, with ''m'' observations and ''n'' parameters, under an appropriate assignment of weights the expected value of ''S'' is ''m'' − ''n''. If instead unit weights are assumed, the expected value of ''S'' is (m - n)\sigma^2, where \sigma^2 is the variance of each observation. If it is assumed that the residuals belong to a normal distribution, the objective function, being a sum of weighted squared residuals, will belong to a chi-squared distribution with ''m'' − ''n'' degrees of freedom. Some illustrative percentile values of \chi^2 are given in the following table. These values can be used for a statistical criterion as to the goodness of fit. When unit weights are used, the value of ''S'' should be divided by the variance of an observation before it is compared with these values. For WLS, the ordinary objective function above is replaced by a weighted sum of squared residuals.
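The identities above can be checked on the four-point data set from the worked example later in the article. This plain-Python sketch (exact `fractions` arithmetic; the helper names are hypothetical) builds \mathbf{I} - \mathbf{H}, confirms that it is idempotent, evaluates S = \mathbf{y}^\mathsf{T}(\mathbf{I} - \mathbf{H})\mathbf{y}, and computes its trace, which equals m − n; that trace is where the factor (m − n) in the expected value of ''S'' comes from when the errors are uncorrelated with common variance.

```python
from fractions import Fraction


def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def inverse(A):
    """Inverse of a small nonsingular matrix by Gauss-Jordan elimination (exact)."""
    n = len(A)
    M = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(A)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                f = M[r][col]
                M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    return [row[n:] for row in M]


def residual_maker(X):
    """Return I - H, where H = X (X^T X)^{-1} X^T is the hat (projection) matrix."""
    Xt = transpose(X)
    H = matmul(matmul(X, inverse(matmul(Xt, X))), Xt)
    m = len(X)
    return [[Fraction(int(i == j)) - H[i][j] for j in range(m)] for i in range(m)]


X = [[1, 1], [1, 2], [1, 3], [1, 4]]  # m = 4 observations, n = 2 parameters
y = [6, 5, 7, 10]
M = residual_maker(X)
S = sum(y[i] * M[i][j] * y[j] for i in range(len(y)) for j in range(len(y)))
trace = sum(M[i][i] for i in range(len(y)))
print(S, trace)           # 21/5 2
print(matmul(M, M) == M)  # True: I - H is idempotent
```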


Discussion

In statistics and mathematics, linear least squares is an approach to fitting a mathematical or statistical model to data in cases where the idealized value provided by the model for any data point is expressed linearly in terms of the unknown parameters of the model. The resulting fitted model can be used to summarize the data, to predict unobserved values from the same system, and to understand the mechanisms that may underlie the system.

Mathematically, linear least squares is the problem of approximately solving an overdetermined system of linear equations A x = b, where b is not an element of the column space of the matrix A. The approximate solution is realized as an exact solution to A x = b', where b' is the projection of b onto the column space of A. The best approximation is then that which minimizes the sum of squared differences between the data values and their corresponding modeled values. The approach is called ''linear'' least squares since the assumed function is linear in the parameters to be estimated. Linear least squares problems are convex and have a closed-form solution that is unique, provided that the number of data points used for fitting equals or exceeds the number of unknown parameters, except in special degenerate situations. In contrast, non-linear least squares problems generally must be solved by an iterative procedure, and the problems can be non-convex with multiple optima for the objective function. If prior distributions are available, then even an underdetermined system can be solved using the Bayesian MMSE estimator.

In statistics, linear least squares problems correspond to a particularly important type of statistical model called linear regression, which arises as a particular form of regression analysis. One basic form of such a model is an ordinary least squares model. The present article concentrates on the mathematical aspects of linear least squares problems; the formulation and interpretation of statistical regression models, and the statistical inferences related to them, are dealt with in the articles just mentioned. See outline of regression analysis for an outline of the topic.
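The projection view can be illustrated on a tiny inconsistent system. In this sketch (plain Python with exact `fractions`; the matrix A and vector b are made-up illustrative values, and the function names are hypothetical), \hat x solves the normal equations A^\mathsf{T}A x = A^\mathsf{T}b, b' = A\hat x is the projection of b onto the column space of A, and the residual b − b' comes out orthogonal to every column of A.

```python
from fractions import Fraction


def solve(A, b):
    """Solve the square system A x = b by Gauss-Jordan elimination (exact)."""
    n = len(A)
    M = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col and M[r][col] != 0:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def least_squares(A, b):
    """Least-squares solution of A x = b via the normal equations A^T A x = A^T b."""
    n = len(A[0])
    AtA = [[sum(row[j] * row[k] for row in A) for k in range(n)] for j in range(n)]
    Atb = [sum(row[j] * bi for row, bi in zip(A, b)) for j in range(n)]
    return solve(AtA, Atb)


A = [[1, 0], [0, 1], [1, 1]]  # 3 equations, 2 unknowns
b = [1, 1, 0]                 # not in the column space of A
x_hat = least_squares(A, b)
b_proj = [sum(a * x for a, x in zip(row, x_hat)) for row in A]  # b' = A x_hat
residual = [bi - pi for bi, pi in zip(b, b_proj)]
print(x_hat)  # [Fraction(1, 3), Fraction(1, 3)]
# The residual b - b' is orthogonal to each column of A.
print(all(sum(row[j] * r for row, r in zip(A, residual)) == 0 for j in range(2)))  # True
```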


Properties

If the experimental errors, \varepsilon, are uncorrelated, have a mean of zero and a constant variance, \sigma^2, the Gauss–Markov theorem states that the least-squares estimator, \hat{\boldsymbol\beta}, has the minimum variance of all unbiased estimators that are linear combinations of the observations. In this sense it is the best, or optimal, estimator of the parameters. Note particularly that this property is independent of the statistical distribution function of the errors. In other words, ''the distribution function of the errors need not be a normal distribution''. However, for some probability distributions, there is no guarantee that the least-squares solution is even possible given the observations; still, in such cases it is the best estimator that is both linear and unbiased. For example, it is easy to show that the arithmetic mean of a set of measurements of a quantity is the least-squares estimator of the value of that quantity. If the conditions of the Gauss–Markov theorem apply, the arithmetic mean is optimal, whatever the distribution of errors of the measurements might be. However, in the case that the experimental errors do belong to a normal distribution, the least-squares estimator is also a maximum likelihood estimator. These properties underpin the use of the method of least squares for all types of data fitting, even when the assumptions are not strictly valid.
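The claim about the arithmetic mean follows from the normal equations with a design matrix that is a single column of ones: then \mathbf{X}^\mathsf{T}\mathbf{X} = m and \mathbf{X}^\mathsf{T}\mathbf{y} = \sum_i y_i, so \hat\beta = \frac{1}{m}\sum_i y_i. A minimal check in plain Python, using made-up measurement values and hypothetical function names, is:

```python
from fractions import Fraction


def sum_of_squares(ys, c):
    """S(c) = sum_i (y_i - c)^2 for the constant model y = c."""
    return sum((y - c) ** 2 for y in ys)


def least_squares_constant(ys):
    """Solve the single normal equation m * c = sum(y) (design matrix = column of ones)."""
    return Fraction(sum(ys), len(ys))


# Hypothetical repeated measurements of one quantity.
ys = [Fraction(v) for v in ("9.8", "10.1", "10.0", "9.9", "10.2")]
c_hat = least_squares_constant(ys)
print(c_hat)  # 10

# S(c) = S(c_hat) + m * (c - c_hat)**2, so any other value of c does worse.
for c in (c_hat - Fraction(1, 10), c_hat + Fraction(1, 20)):
    assert sum_of_squares(ys, c) == sum_of_squares(ys, c_hat) + len(ys) * (c - c_hat) ** 2
```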


Limitations

An assumption underlying the treatment given above is that the independent variable, ''x'', is free of error. In practice, the errors on the measurements of the independent variable are usually much smaller than the errors on the dependent variable and can therefore be ignored. When this is not the case, total least squares or more generally errors-in-variables models, or ''rigorous least squares'', should be used. This can be done by adjusting the weighting scheme to take into account errors on both the dependent and independent variables and then following the standard procedure.

In some cases the (weighted) normal equations matrix \mathbf{X}^\mathsf{T}\mathbf{X} is ill-conditioned. When fitting polynomials, the design matrix \mathbf{X} is a Vandermonde matrix, and Vandermonde matrices become increasingly ill-conditioned as the order of the matrix increases. In these cases, the least squares estimate amplifies the measurement noise and may be grossly inaccurate. Various regularization techniques can be applied in such cases, the most common of which is called ridge regression. If further information about the parameters is known, for example, a range of possible values of \boldsymbol\beta, then various techniques can be used to increase the stability of the solution. For example, see constrained least squares.

Another drawback of the least squares estimator is the fact that the norm of the residuals, \|\mathbf{y} - \mathbf{X}\hat{\boldsymbol\beta}\|, is minimized, whereas in some cases one is truly interested in obtaining small error in the parameter \boldsymbol\beta, e.g., a small value of \|\boldsymbol\beta - \hat{\boldsymbol\beta}\|. However, since the true parameter is necessarily unknown, this quantity cannot be directly minimized. If a prior probability on \boldsymbol\beta is known, then a Bayes estimator can be used to minimize the mean squared error, \operatorname{E}\left\{\|\boldsymbol\beta - \hat{\boldsymbol\beta}\|^2\right\}. The least squares method is often applied when no prior is known. Surprisingly, when several parameters are being estimated jointly, better estimators can be constructed, an effect known as Stein's phenomenon. For example, if the measurement error is Gaussian, several estimators are known which dominate, or outperform, the least squares technique; the best known of these is the James–Stein estimator. This is an example of more general shrinkage estimators that have been applied to regression problems.
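Ridge regression replaces the normal equations \mathbf{X}^\mathsf{T}\mathbf{X}\boldsymbol\beta = \mathbf{X}^\mathsf{T}\mathbf{y} with (\mathbf{X}^\mathsf{T}\mathbf{X} + \lambda\mathbf{I})\boldsymbol\beta = \mathbf{X}^\mathsf{T}\mathbf{y} for a penalty \lambda \ge 0, which improves the conditioning of the system at the cost of shrinking (biasing) the estimate. The sketch below is plain Python with exact `fractions`, and the function names are hypothetical; for simplicity it assumes every coefficient, including the intercept, is penalized, whereas practical implementations often center the data or leave the intercept unpenalized. The data are the four points from the worked example.

```python
from fractions import Fraction


def solve(A, b):
    """Solve the square system A x = b by Gauss-Jordan elimination (exact)."""
    n = len(A)
    M = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col and M[r][col] != 0:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def ridge(X, y, lam):
    """Solve (X^T X + lam * I) beta = X^T y; lam = 0 gives ordinary least squares.

    Simplifying assumption: every coefficient, including the intercept, is penalized.
    """
    n = len(X[0])
    A = [[sum(row[j] * row[k] for row in X) + (lam if j == k else 0) for k in range(n)]
         for j in range(n)]
    rhs = [sum(row[j] * yi for row, yi in zip(X, y)) for j in range(n)]
    return solve(A, rhs)


X = [[1, 1], [1, 2], [1, 3], [1, 4]]
y = [6, 5, 7, 10]
for lam in (0, 1, 10):
    beta = ridge(X, y, lam)
    # The squared norm of the estimate shrinks as the penalty grows.
    print(lam, beta, float(sum(b * b for b in beta)))
```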


Applications

* Polynomial fitting: models are polynomials in an independent variable, ''x'':
** Straight line: f(x, \boldsymbol\beta) = \beta_1 + \beta_2 x.
** Quadratic: f(x, \boldsymbol\beta) = \beta_1 + \beta_2 x + \beta_3 x^2.
** Cubic, quartic and higher polynomials. For regression with high-order polynomials, the use of orthogonal polynomials is recommended.
* Numerical smoothing and differentiation: this is an application of polynomial fitting.
* Multinomials in more than one independent variable, including surface fitting
* Curve fitting with B-splines
* Chemometrics, calibration curve, standard addition, Gran plot, analysis of mixtures


Uses in data fitting

The primary application of linear least squares is in data fitting. Given a set of ''m'' data points y_1, y_2, \dots, y_m, consisting of experimentally measured values taken at ''m'' values x_1, x_2, \dots, x_m of an independent variable (x_i may be scalar or vector quantities), and given a model function y = f(x, \boldsymbol\beta), with \boldsymbol\beta = (\beta_1, \beta_2, \dots, \beta_n), it is desired to find the parameters \beta_j such that the model function "best" fits the data. In linear least squares, linearity is meant to be with respect to the parameters \beta_j, so f(x, \boldsymbol\beta) = \sum_{j=1}^{n} \beta_j \varphi_j(x). Here, the functions \varphi_j may be nonlinear with respect to the variable x. Ideally, the model function fits the data exactly, so y_i = f(x_i, \boldsymbol\beta) for all i = 1, 2, \dots, m. This is usually not possible in practice, as there are more data points than there are parameters to be determined. The approach chosen then is to find the minimal possible value of the sum of squares of the residuals r_i(\boldsymbol\beta) = y_i - f(x_i, \boldsymbol\beta), (i = 1, 2, \dots, m), so as to minimize the function S(\boldsymbol\beta) = \sum_{i=1}^{m} r_i^2(\boldsymbol\beta). After substituting for r_i and then for f, this minimization problem becomes the quadratic minimization problem above with X_{ij} = \varphi_j(x_i), and the best fit can be found by solving the normal equations.
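The construction X_{ij} = \varphi_j(x_i) translates directly into code. This sketch (plain Python, exact `fractions`, hypothetical function names) builds the design matrix from a list of basis functions and solves the normal equations; the basis functions may be nonlinear in x, but the fit stays linear in the \beta_j. Forming the normal equations explicitly is acceptable for tiny exact examples like this one; for floating-point work, orthogonal decomposition methods (such as QR) are generally preferred for numerical stability.

```python
from fractions import Fraction


def solve(A, b):
    """Solve the square system A x = b by Gauss-Jordan elimination (exact)."""
    n = len(A)
    M = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col and M[r][col] != 0:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def linear_lsq(xs, ys, basis):
    """Fit f(x) = sum_j beta_j * phi_j(x) by least squares.

    Builds X[i][j] = phi_j(x_i) and solves the normal equations X^T X beta = X^T y.
    """
    X = [[phi(x) for phi in basis] for x in xs]
    n = len(basis)
    XtX = [[sum(row[j] * row[k] for row in X) for k in range(n)] for j in range(n)]
    Xty = [sum(row[j] * y for row, y in zip(X, ys)) for j in range(n)]
    return solve(XtX, Xty)


# Quadratic basis: phi_1 = 1, phi_2 = x, phi_3 = x^2 (nonlinear in x, linear in beta).
quadratic = [lambda x: 1, lambda x: x, lambda x: x * x]
# Hypothetical noise-free data from y = 1 + 2x + 3x^2, so the fit recovers it exactly.
print(linear_lsq([0, 1, 2, 3], [1, 6, 17, 34], quadratic))
# [Fraction(1, 1), Fraction(2, 1), Fraction(3, 1)]
```

Passing the basis `[lambda x: 1, lambda x: x]` with the four points of the example below reproduces the straight-line fit \beta_1 = 3.5, \beta_2 = 1.4.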


Example

As a result of an experiment, four (x, y) data points were obtained, (1, 6), (2, 5), (3, 7), and (4, 10) (shown in red in the diagram on the right). We hope to find a line y = \beta_1 + \beta_2 x that best fits these four points. In other words, we would like to find the numbers \beta_1 and \beta_2 that approximately solve the overdetermined linear system of four equations in two unknowns
\begin{aligned} \beta_1 + 1\beta_2 + r_1 &= 6 \\ \beta_1 + 2\beta_2 + r_2 &= 5 \\ \beta_1 + 3\beta_2 + r_3 &= 7 \\ \beta_1 + 4\beta_2 + r_4 &= 10 \end{aligned}
in some "best" sense. Here r_i represents the residual at each point between the curve fit and the data:
\begin{aligned} r_1 &= 6 - (\beta_1 + 1\beta_2) \\ r_2 &= 5 - (\beta_1 + 2\beta_2) \\ r_3 &= 7 - (\beta_1 + 3\beta_2) \\ r_4 &= 10 - (\beta_1 + 4\beta_2) \end{aligned}
The least squares approach to solving this problem is to try to make the sum of the squares of these residuals as small as possible; that is, to find the minimum of the function
\begin{aligned} S(\beta_1, \beta_2) &= r_1^2 + r_2^2 + r_3^2 + r_4^2 \\[6pt] &= [6 - (\beta_1 + 1\beta_2)]^2 + [5 - (\beta_1 + 2\beta_2)]^2 + [7 - (\beta_1 + 3\beta_2)]^2 + [10 - (\beta_1 + 4\beta_2)]^2 \\[6pt] &= 4\beta_1^2 + 30\beta_2^2 + 20\beta_1\beta_2 - 56\beta_1 - 154\beta_2 + 210. \end{aligned}
The minimum is determined by calculating the partial derivatives of S(\beta_1, \beta_2) with respect to \beta_1 and \beta_2 and setting them to zero:
\frac{\partial S}{\partial \beta_1} = 0 = 8\beta_1 + 20\beta_2 - 56
\frac{\partial S}{\partial \beta_2} = 0 = 20\beta_1 + 60\beta_2 - 154.
This results in a system of two equations in two unknowns, called the normal equations, which when solved give \beta_1 = 3.5 and \beta_2 = 1.4, and the equation y = 3.5 + 1.4x is the line of best fit. The residuals, that is, the differences between the observed y values and the y values predicted by the line of best fit, are then found to be 1.1, -1.3, -0.7, and 0.9 (see the diagram on the right). The minimum value of the sum of squares of the residuals is S(3.5, 1.4) = 1.1^2 + (-1.3)^2 + (-0.7)^2 + 0.9^2 = 4.2. More generally, one can have n regressors x_j, and a linear model y = \beta_0 + \sum_{j=1}^{n} \beta_j x_j.
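The arithmetic of this example can be reproduced exactly. The sketch below (plain Python with exact `fractions`; the function names are illustrative) solves the two normal equations by Cramer's rule, recomputes the residuals and the minimum of S, and keeps the expanded quadratic form of S from the text so the two expressions can be compared.

```python
from fractions import Fraction

xs = [1, 2, 3, 4]
ys = [6, 5, 7, 10]


def S(b1, b2):
    """Sum of squared residuals r_i = y_i - (b1 + b2*x_i) for the line y = b1 + b2*x."""
    return sum((y - (b1 + b2 * x)) ** 2 for x, y in zip(xs, ys))


def S_expanded(b1, b2):
    """The expanded quadratic form of S given in the text."""
    return 4 * b1**2 + 30 * b2**2 + 20 * b1 * b2 - 56 * b1 - 154 * b2 + 210


# Setting dS/db1 = dS/db2 = 0 and dividing by 2 gives the normal equations
#   m*b1      + sum(x)*b2   = sum(y)    ->   4*b1 + 10*b2 = 28
#   sum(x)*b1 + sum(x^2)*b2 = sum(x*y)  ->  10*b1 + 30*b2 = 77
m, sx, sxx = len(xs), sum(xs), sum(x * x for x in xs)
sy, sxy = sum(ys), sum(x * y for x, y in zip(xs, ys))
det = m * sxx - sx * sx                  # 4*30 - 10*10 = 20
b1 = Fraction(sy * sxx - sx * sxy, det)  # Cramer's rule
b2 = Fraction(m * sxy - sx * sy, det)
residuals = [y - (b1 + b2 * x) for x, y in zip(xs, ys)]

print(b1, b2)                          # 7/2 7/5
print([float(r) for r in residuals])   # [1.1, -1.3, -0.7, 0.9]
print(S(b1, b2))                       # 21/5
```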


Using a quadratic model

Importantly, in "linear least squares", we are not restricted to using a line as the model as in the above example. For instance, we could have chosen the restricted quadratic model y = \beta_1 x^2. This model is still linear in the \beta_1 parameter, so we can still perform the same analysis, constructing a system of equations from the data points:
\begin{aligned} 6 &= \beta_1 (1)^2 + r_1 \\ 5 &= \beta_1 (2)^2 + r_2 \\ 7 &= \beta_1 (3)^2 + r_3 \\ 10 &= \beta_1 (4)^2 + r_4 \end{aligned}
The partial derivatives with respect to the parameters (this time there is only one) are again computed and set to 0:
\frac{\partial S}{\partial \beta_1} = 0 = 708\beta_1 - 498
and solving gives \beta_1 = 0.703, leading to the resulting best fit model y = 0.703 x^2.
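The coefficients 708 and 498 come from S(\beta_1) = \sum_i (y_i - \beta_1 x_i^2)^2, whose derivative is 2\beta_1\sum_i x_i^4 - 2\sum_i x_i^2 y_i. A short check in plain Python with exact `fractions` (variable names are illustrative) is:

```python
from fractions import Fraction

xs = [1, 2, 3, 4]
ys = [6, 5, 7, 10]

# S(b) = sum (y_i - b*x_i^2)^2, so dS/db = 2*b*sum(x_i^4) - 2*sum(x_i^2 * y_i).
sum_x4 = sum(x ** 4 for x in xs)                  # 1 + 16 + 81 + 256 = 354
sum_x2y = sum(x * x * y for x, y in zip(xs, ys))  # 6 + 20 + 63 + 160 = 249
print(2 * sum_x4, 2 * sum_x2y)                    # 708 498

beta1 = Fraction(sum_x2y, sum_x4)                 # root of dS/db1 = 0
print(beta1, round(float(beta1), 3))              # 83/118 0.703
```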


See also

* Line-line intersection (nearest point to non-intersecting lines), an application
* Line fitting
* Nonlinear least squares
* Regularized least squares
* Simple linear regression
* Partial least squares regression
* Linear function




External links


Least Squares Fitting – From MathWorld