Perpendicular regression
   HOME

TheInfoList



OR:

In statistics, Deming regression, named after
W. Edwards Deming William Edwards Deming (October 14, 1900 – December 20, 1993) was an American engineer, statistician, professor, author, lecturer, and management consultant. Educated initially as an electrical engineer and later specializing in mathematical ...
, is an
errors-in-variables model In statistics, errors-in-variables models or measurement error models are regression models that account for measurement errors in the independent variables. In contrast, standard regression models assume that those regressors have been measure ...
which tries to find the
line of best fit Line fitting is the process of constructing a straight line that has the best fit to a series of data points. Several methods exist, considering: *Vertical distance: Simple linear regression **Resistance to outliers: Robust simple linear regre ...
for a two-dimensional dataset. It differs from the simple linear regression in that it accounts for errors in observations on both the ''x''- and the ''y''- axis. It is a special case of
total least squares In applied statistics, total least squares is a type of errors-in-variables regression, a least squares data modeling technique in which observational errors on both dependent and independent variables are taken into account. It is a generaliza ...
, which allows for any number of predictors and a more complicated error structure. Deming regression is equivalent to the
maximum likelihood In statistics, maximum likelihood estimation (MLE) is a method of estimating the parameters of an assumed probability distribution, given some observed data. This is achieved by maximizing a likelihood function so that, under the assumed stat ...
estimation of an
errors-in-variables model In statistics, errors-in-variables models or measurement error models are regression models that account for measurement errors in the independent variables. In contrast, standard regression models assume that those regressors have been measure ...
in which the errors for the two variables are assumed to be independent and normally distributed, and the ratio of their variances, denoted ''δ'', is known. In practice, this ratio might be estimated from related data-sources; however the regression procedure takes no account for possible errors in estimating this ratio. The Deming regression is only slightly more difficult to compute than the simple linear regression. Most statistical software packages used in clinical chemistry offer Deming regression. The model was originally introduced by who considered the case ''δ'' = 1, and then more generally by with arbitrary ''δ''. However their ideas remained largely unnoticed for more than 50 years, until they were revived by and later propagated even more by . The latter book became so popular in
clinical chemistry Clinical chemistry (also known as chemical pathology, clinical biochemistry or medical biochemistry) is the area of chemistry that is generally concerned with analysis of bodily fluids for diagnostic and therapeutic purposes. It is an applied ...
and related fields that the method was even dubbed ''Deming regression'' in those fields.


Specification

Assume that the available data (''yi'', ''xi'') are measured observations of the "true" values (''yi*'', ''xi*''), which lie on the regression line: : \begin y_i &= y^*_i + \varepsilon_i, \\ x_i &= x^*_i + \eta_i, \end where errors ''ε'' and ''η'' are independent and the ratio of their variances is assumed to be known: : \delta = \frac. In practice, the variances of the x and y parameters are often unknown, which complicates the estimate of \delta . Note that when the measurement method for x and y is the same, these variances are likely to be equal, so \delta = 1 for this case. We seek to find the line of "best fit" : y^* = \beta_0 + \beta_1 x^*, such that the weighted sum of squared residuals of the model is minimized: : SSR = \sum_^n\bigg(\frac + \frac\bigg) = \frac \sum_^n\Big((y_i-\beta_0-\beta_1x^*_i)^2 + \delta(x_i-x^*_i)^2\Big) \ \to\ \min_ SSR See for a full derivation.


Solution

The solution can be expressed in terms of the second-degree sample moments. That is, we first calculate the following quantities (all sums go from ''i'' = 1 to ''n''): : \begin & \overline = \frac\sum x_i, \quad \overline = \frac\sum y_i, \\ & s_ = \tfrac\sum (x_i-\overline)^2, \\ & s_ = \tfrac\sum (x_i-\overline)(y_i-\overline), \\ & s_ = \tfrac\sum (y_i-\overline)^2. \end Finally, the least-squares estimates of model's parameters will be : \begin & \hat\beta_1 = \frac, \\ & \hat\beta_0 = \overline - \hat\beta_1\overline, \\ & \hat_i^* = x_i + \frac(y_i-\hat\beta_0-\hat\beta_1x_i). \end


Orthogonal regression

For the case of equal error variances, i.e., when \delta=1, Deming regression becomes ''orthogonal regression'': it minimizes the sum of squared perpendicular distances from the data points to the regression line. In this case, denote each observation as a point ''z''''j'' in the complex plane (i.e., the point (''x''''j'', ''y''''j'') is written as ''z''''j'' = ''x''''j'' + ''iy''''j'' where ''i'' is the
imaginary unit The imaginary unit or unit imaginary number () is a solution to the quadratic equation x^2+1=0. Although there is no real number with this property, can be used to extend the real numbers to what are called complex numbers, using addition an ...
). Denote as ''Z'' the sum of the squared differences of the data points from the
centroid In mathematics and physics, the centroid, also known as geometric center or center of figure, of a plane figure or solid figure is the arithmetic mean position of all the points in the surface of the figure. The same definition extends to any ...
(also denoted in complex coordinates), which is the point whose horizontal and vertical locations are the averages of those of the data points. Then: *If ''Z'' = 0, then every line through the centroid is a line of best orthogonal fit. *If ''Z'' ≠ 0, the orthogonal regression line goes through the centroid and is parallel to the vector from the origin to \sqrt. A
trigonometric Trigonometry () is a branch of mathematics that studies relationships between side lengths and angles of triangles. The field emerged in the Hellenistic world during the 3rd century BC from applications of geometry to astronomical studies. ...
representation of the orthogonal regression line was given by Coolidge in 1913.


Application

In the case of three non-collinear points in the plane, the
triangle A triangle is a polygon with three edges and three vertices. It is one of the basic shapes in geometry. A triangle with vertices ''A'', ''B'', and ''C'' is denoted \triangle ABC. In Euclidean geometry, any three points, when non- colline ...
with these points as its vertices has a unique
Steiner inellipse In geometry, the Steiner inellipse,Weisstein, E. "Steiner Inellipse" — From MathWorld, A Wolfram Web Resource, http://mathworld.wolfram.com/SteinerInellipse.html. midpoint inellipse, or midpoint ellipse of a triangle is the unique ellipse i ...
that is tangent to the triangle's sides at their midpoints. The major axis of this ellipse falls on the orthogonal regression line for the three vertices. The quantification of a biological cell's intrinsic
cellular noise Cellular noise is random variability in quantities arising in cellular biology. For example, cells which are genetically identical, even within the same tissue, are often observed to have different expression levels of proteins, different sizes and ...
can be quantified upon applying Deming regression to the observed behavior of a two reporter synthetic biological circuit.


See also

*
Line fitting Line fitting is the process of constructing a straight line that has the best fit to a series of data points. Several methods exist, considering: *Vertical distance: Simple linear regression **Resistance to outliers: Robust simple linear regres ...


References

;Notes ;Bibliography * * * * * * * * * * * * {{DEFAULTSORT:Deming Regression Curve fitting Regression analysis