In
statistics
Statistics (from German language, German: ''wikt:Statistik#German, Statistik'', "description of a State (polity), state, a country") is the discipline that concerns the collection, organization, analysis, interpretation, and presentation of ...
, Deming regression, named after
W. Edwards Deming, is an
errors-in-variables model which tries to find the
line of best fit
Line fitting is the process of constructing a straight line that has the best fit to a series of data points.
Several methods exist, considering:
*Vertical distance: Simple linear regression
In statistics, simple linear regression is a lin ...
for a two-dimensional dataset. It differs from the
simple linear regression
In statistics, simple linear regression is a linear regression model with a single explanatory variable. That is, it concerns two-dimensional sample points with one independent variable and one dependent variable (conventionally, the ''x'' and ...
in that it accounts for
errors in observations on both the ''x''- and the ''y''- axis. It is a special case of
total least squares
In applied statistics, total least squares is a type of errors-in-variables regression, a least squares data modeling technique in which observational errors on both dependent and independent variables are taken into account. It is a generalizat ...
, which allows for any number of predictors and a more complicated error structure.
Deming regression is equivalent to the
maximum likelihood estimation of an
errors-in-variables model in which the errors for the two variables are assumed to be independent and
normally distributed, and the ratio of their variances, denoted ''δ'', is known. In practice, this ratio might be estimated from related data-sources; however the regression procedure takes no account for possible errors in estimating this ratio.
The Deming regression is only slightly more difficult to compute than the
simple linear regression
In statistics, simple linear regression is a linear regression model with a single explanatory variable. That is, it concerns two-dimensional sample points with one independent variable and one dependent variable (conventionally, the ''x'' and ...
. Most statistical software packages used in clinical chemistry offer Deming regression.
The model was originally introduced by who considered the case ''δ'' = 1, and then more generally by with arbitrary ''δ''. However their ideas remained largely unnoticed for more than 50 years, until they were revived by and later propagated even more by . The latter book became so popular in
clinical chemistry and related fields that the method was even dubbed ''Deming regression'' in those fields.
Specification
Assume that the available data (''y
i'', ''x
i'') are measured observations of the "true" values (''y
i*'', ''x
i*''), which lie on the regression line:
:
where errors ''ε'' and ''η'' are independent and the ratio of their variances is assumed to be known:
:
In practice, the variances of the
and
parameters are often unknown, which complicates the estimate of
. Note that when the measurement method for
and
is the same, these variances are likely to be equal, so
for this case.
We seek to find the line of "best fit"
:
such that the weighted sum of squared residuals of the model is minimized:
:
See for a full derivation.
Solution
The solution can be expressed in terms of the second-degree sample moments. That is, we first calculate the following quantities (all sums go from ''i'' = 1 to ''n''):
:
Finally, the least-squares estimates of model's parameters will be
:
Orthogonal regression
For the case of equal error variances, i.e., when
, Deming regression becomes ''orthogonal regression'': it minimizes the sum of squared
perpendicular distances from the data points to the regression line. In this case, denote each observation as a point ''z''
''j'' in the complex plane (i.e., the point (''x''
''j'', ''y''
''j'') is written as ''z''
''j'' = ''x''
''j'' + ''iy''
''j'' where ''i'' is the
imaginary unit). Denote as ''Z'' the sum of the squared differences of the data points from the
centroid (also denoted in complex coordinates), which is the point whose horizontal and vertical locations are the averages of those of the data points. Then:
*If ''Z'' = 0, then every line through the centroid is a line of best orthogonal fit.
*If ''Z'' ≠ 0, the orthogonal regression line goes through the centroid and is parallel to the vector from the origin to
.
A
trigonometric representation of the orthogonal regression line was given by Coolidge in 1913.
Application
In the case of three
non-collinear points in the plane, the
triangle with these points as its
vertices has a unique
Steiner inellipse that is tangent to the triangle's sides at their midpoints. The
major axis of this ellipse falls on the orthogonal regression line for the three vertices. The quantification of a biological cell's intrinsic
cellular noise can be quantified upon applying Deming regression to the observed behavior of a two reporter
synthetic biological circuit
Synthetic biological circuits are an application of synthetic biology where biological parts inside a cell are designed to perform logical functions mimicking those observed in electronic circuits. The applications range from simply inducing prod ...
.
See also
*
Line fitting
References
;Notes
;Bibliography
*
*
*
*
*
*
*
*
*
*
*
*
{{DEFAULTSORT:Deming Regression
Curve fitting
Regression analysis