Least Absolute Deviations
Least absolute deviations (LAD), also known as least absolute errors (LAE), least absolute residuals (LAR), or least absolute values (LAV), is a statistical optimality criterion and a statistical optimization technique based on minimizing the ''sum of absolute deviations'' (sum of absolute residuals or sum of absolute errors), or the ''L''1 norm of such values. It is analogous to the least squares technique, except that it is based on ''absolute values'' instead of squared values. It attempts to find a function which closely approximates a set of data by minimizing the residuals between points generated by the function and the corresponding data points. The LAD estimate also arises as the maximum likelihood estimate if the errors have a Laplace distribution. It was introduced in 1757 by Roger Joseph Boscovich.


Formulation

Suppose that the data set consists of the points (''x''''i'', ''y''''i'') with ''i'' = 1, 2, ..., ''n''. We want to find a function ''f'' such that f(x_i) \approx y_i.

To attain this goal, we suppose that the function ''f'' is of a particular form containing some parameters that need to be determined. For instance, the simplest form would be linear: ''f''(''x'') = ''bx'' + ''c'', where ''b'' and ''c'' are parameters whose values are not known but which we would like to estimate. Less simply, suppose that ''f''(''x'') is quadratic, meaning that ''f''(''x'') = ''ax''^2 + ''bx'' + ''c'', where ''a'', ''b'' and ''c'' are not yet known. (More generally, there could be not just one explanator ''x'', but rather multiple explanators, all appearing as arguments of the function ''f''.)

We now seek estimated values of the unknown parameters that minimize the sum of the absolute values of the residuals:

: S = \sum_{i=1}^n |y_i - f(x_i)|.
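To make the objective concrete, here is a minimal sketch (the helper name sum_abs_dev is ours, not from the source) that evaluates S for a candidate linear model f(x) = bx + c:

    def sum_abs_dev(points, b, c):
        """Sum of absolute residuals S for the linear model f(x) = b*x + c."""
        return sum(abs(y - (b * x + c)) for x, y in points)

    data = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2)]
    print(sum_abs_dev(data, 2.0, 0.0))  # 0.1 + 0.1 + 0.2, i.e. approx 0.4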


Solution

Though the idea of least absolute deviations regression is just as straightforward as that of least squares regression, the least absolute deviations line is not as simple to compute efficiently. Unlike least squares regression, least absolute deviations regression does not have an analytical solving method, so an iterative approach is required. The following is an enumeration of some least absolute deviations solving methods:

* Simplex-based methods (such as the Barrodale-Roberts algorithm). Because the problem is a linear program, any of the many linear programming techniques (including the simplex method as well as others) can be applied.
* Iteratively re-weighted least squares (IRLS)
* Wesolowsky's direct descent method
* Li-Arce's maximum likelihood approach
* Recursive reduction of dimensionality approach
* Checking all combinations of point-to-point lines for the minimum sum of errors (see the sketch below)

Simplex-based methods are the "preferred" way to solve the least absolute deviations problem (Pfeil 2006). A simplex method is a method for solving a problem in linear programming. The most popular algorithm is the Barrodale-Roberts modified simplex algorithm. The algorithms for IRLS, Wesolowsky's method, and Li's method can be found in Appendix A of Pfeil (2006), among other sources. Checking all combinations of lines traversing any two (x,y) data points is another method of finding the least absolute deviations line. Since it is known that at least one least absolute deviations line traverses at least two data points, this method finds a line by comparing the sum of absolute errors (SAE) over the data points for each candidate line and choosing the line with the smallest SAE. If multiple lines have the same smallest SAE, then these lines outline the region of multiple solutions. Though simple, this final method is inefficient for large sets of data.
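As an illustration of this last method, here is a minimal brute-force sketch (the helper name brute_force_lad is ours) that enumerates all lines through pairs of data points and keeps one with the smallest SAE:

    from itertools import combinations

    def brute_force_lad(points):
        """Find an LAD line y = b*x + c by checking all point-to-point lines.

        Relies on the fact that at least one LAD line passes through two of
        the data points. Cost is O(n^3): n(n-1)/2 candidate lines, each
        scored against all n points.
        """
        best = None
        for (x1, y1), (x2, y2) in combinations(points, 2):
            if x1 == x2:               # vertical candidate: no slope-intercept form
                continue
            b = (y2 - y1) / (x2 - x1)  # slope of the line through the pair
            c = y1 - b * x1            # its intercept
            sae = sum(abs(y - (b * x + c)) for x, y in points)
            if best is None or sae < best[0]:
                best = (sae, b, c)
        return best                    # (smallest SAE, slope, intercept)

    data = [(0, 0), (1, 1), (2, 2), (3, 3), (4, 10)]  # last point is an outlier
    print(brute_force_lad(data))  # (6.0, 1.0, 0.0): the line ignores the outlier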


Solution using linear programming

The problem can be solved using any linear programming technique on the following problem specification. We wish to

: \text{minimize} \sum_{i=1}^n |y_i - a_0 - a_1 x_{i1} - a_2 x_{i2} - \cdots - a_k x_{ik}|

with respect to the choice of the values of the parameters a_0, \ldots, a_k, where ''y''''i'' is the value of the ''i''th observation of the dependent variable, and ''x''''ij'' is the value of the ''i''th observation of the ''j''th independent variable (''j'' = 1, ..., ''k''). We rewrite this problem in terms of artificial variables ''u''''i'' as

: \text{minimize} \sum_{i=1}^n u_i
: with respect to a_0, \ldots, a_k and u_1, \ldots, u_n
: subject to
: u_i \ge y_i - a_0 - a_1 x_{i1} - a_2 x_{i2} - \cdots - a_k x_{ik} \quad \text{for } i = 1, \ldots, n
: u_i \ge -\left(y_i - a_0 - a_1 x_{i1} - a_2 x_{i2} - \cdots - a_k x_{ik}\right) \quad \text{for } i = 1, \ldots, n.

These constraints have the effect of forcing each u_i to equal |y_i - a_0 - a_1 x_{i1} - a_2 x_{i2} - \cdots - a_k x_{ik}| upon being minimized, so the objective function is equivalent to the original objective function. Since this version of the problem statement does not contain the absolute value operator, it is in a format that can be solved with any linear programming package.
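A minimal sketch of this formulation, assuming SciPy's linprog is available (the helper name lad_fit is ours, and the data are illustrative only):

    import numpy as np
    from scipy.optimize import linprog

    def lad_fit(X, y):
        """Fit y ~ a0 + X @ a by least absolute deviations via the LP above.

        Decision vector: [a_0, a_1, ..., a_k, u_1, ..., u_n], minimizing
        the sum of the artificial variables u_i.
        """
        n, k = X.shape
        A = np.hstack([np.ones((n, 1)), X])                # intercept column + data
        c = np.concatenate([np.zeros(k + 1), np.ones(n)])  # objective: sum of u_i
        # u_i >= +(y_i - A_i a)  <=>  -A_i a - u_i <= -y_i
        # u_i >= -(y_i - A_i a)  <=>  +A_i a - u_i <= +y_i
        A_ub = np.vstack([np.hstack([-A, -np.eye(n)]),
                          np.hstack([A, -np.eye(n)])])
        b_ub = np.concatenate([-y, y])
        bounds = [(None, None)] * (k + 1) + [(0, None)] * n  # a free, u_i >= 0
        res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
        return res.x[:k + 1]                               # estimated a_0, ..., a_k

    X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0]])
    y = np.array([0.0, 1.0, 2.0, 3.0, 10.0])               # last point is an outlier
    print(lad_fit(X, y))                                   # approx [0. 1.]: outlier ignored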


Properties

There are other unique properties of the least absolute deviations line. In the case of a set of (''x'',''y'') data, the least absolute deviations line will always pass through at least two of the data points, unless there are multiple solutions. If multiple solutions exist, then the region of valid least absolute deviations solutions will be bounded by at least two lines, each of which passes through at least two data points. More generally, if there are ''k'' regressors (including the constant), then at least one optimal regression surface will pass through ''k'' of the data points.

This "latching" of the line to the data points can help to understand the "instability" property: if the line always latches to at least two points, then the line will jump between different sets of points as the data points are altered. The "latching" also helps to understand the "robustness" property: if there exists an outlier, and a least absolute deviations line must latch onto two data points, the outlier will most likely not be one of those two points because that would not minimize the sum of absolute deviations in most cases.

One known case in which multiple solutions exist is a set of points symmetric about a horizontal line, as shown in Figure A below. To understand why there are multiple solutions in the case shown in Figure A, consider the pink line in the green region. Its sum of absolute errors is some value S. If one were to tilt the line upward slightly, while still keeping it within the green region, the sum of errors would remain S: the distance from the line to each point on one side grows, while the distance to each point on the opposite side diminishes by exactly the same amount, so the sum of absolute errors is unchanged. Also, since one can tilt the line in infinitely small increments, this shows that if there is more than one solution, there are infinitely many solutions.
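A small numerical check of this argument (the points and candidate lines are our own illustration): four points placed symmetrically about a horizontal line yield the same sum of absolute errors for every line that stays between the upper and lower points.

    import numpy as np

    # Two points above and two below the horizontal line y = 0, symmetrically.
    pts = np.array([[0.0, 1.0], [0.0, -1.0], [1.0, 1.0], [1.0, -1.0]])

    def sae(b, c):
        """Sum of absolute vertical deviations from the line y = b*x + c."""
        x, y = pts[:, 0], pts[:, 1]
        return np.abs(y - (b * x + c)).sum()

    # Tilting the line, while keeping it between the points, leaves SAE unchanged:
    for b, c in [(0.0, 0.0), (0.5, -0.25), (-0.8, 0.1)]:
        print(sae(b, c))   # prints 4.0 every time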


Advantages and disadvantages

The following is a table contrasting some properties of the method of least absolute deviations with those of the method of least squares (for non-singular problems).

  Property      Least squares regression    Least absolute deviations regression
  Robustness    Not robust                  Robust
  Stability     Stable solution             Unstable solution
  Uniqueness    Always one solution*        Possibly multiple solutions

*Provided that the number of data points is greater than or equal to the number of features.

The method of least absolute deviations finds applications in many areas, due to its robustness compared to the least squares method. Least absolute deviations is robust in that it is resistant to outliers in the data. LAD gives equal emphasis to all observations, in contrast to ordinary least squares (OLS) which, by squaring the residuals, gives more weight to large residuals, that is, outliers for which predicted values are far from actual observations. This may be helpful in studies where outliers do not need to be given greater weight than other observations. If it is important to give greater weight to outliers, the method of least squares is a better choice.
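A tiny numerical illustration of this weighting difference (the residual values are our own example): squaring lets a single large residual dominate the objective, while the absolute value does not.

    residuals = [0.5, -1.0, 8.0]           # the last residual is an outlier

    l1 = sum(abs(r) for r in residuals)    # 9.5: the outlier contributes ~84%
    l2 = sum(r ** 2 for r in residuals)    # 65.25: the outlier contributes ~98%
    print(l1, l2)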


Variations, extensions, specializations

If in the sum of the absolute values of the residuals one generalises the absolute value function to a tilted absolute value function, which on the left half-line has slope \tau - 1 and on the right half-line has slope \tau, where 0 < \tau < 1, one obtains quantile regression. The case of \tau = 1/2 gives the standard regression by least absolute deviations and is also known as ''median regression''.

The least absolute deviation problem may be extended to include multiple explanators, constraints and regularization, e.g., a linear model with linear constraints:

: minimize S(\boldsymbol\beta, b) = \sum_i |\mathbf{x}_i'\boldsymbol\beta + b - y_i|
: subject to, e.g., \mathbf{x}_1'\boldsymbol\beta + b - y_1 \leq k

where \boldsymbol\beta is a column vector of coefficients to be estimated, ''b'' is an intercept to be estimated, \mathbf{x}_i is a column vector of the ''i''th observations on the various explanators, ''y''''i'' is the ''i''th observation on the dependent variable, and ''k'' is a known constant.

Regularization with LASSO (least absolute shrinkage and selection operator) may also be combined with LAD.
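A short sketch of the tilted absolute value function (the name tilted_abs is ours); summing this loss over the residuals and minimizing yields quantile regression for the chosen \tau:

    import numpy as np

    def tilted_abs(u, tau):
        """Tilted absolute value: slope tau on the right half-line (u >= 0),
        slope tau - 1 on the left half-line (u < 0), for 0 < tau < 1."""
        return np.where(u >= 0, tau * u, (tau - 1.0) * u)

    u = np.array([-2.0, -0.5, 0.5, 2.0])
    print(tilted_abs(u, 0.5))  # [1.   0.25 0.25 1.  ] == 0.5*|u|: median/LAD case
    print(tilted_abs(u, 0.9))  # [0.2  0.05 0.45 1.8 ]: positive residuals cost more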


See also

* Quantile regression
* Regression analysis
* Linear regression model
* Absolute deviation
* Average absolute deviation
* Median absolute deviation
* Ordinary least squares
* Robust regression


References

Pfeil, William A. (2006). ''Statistical Teaching Aids''. Bachelor of Science thesis, Worcester Polytechnic Institute.