The partition of sums of squares is a concept that permeates much of inferential statistics and descriptive statistics. More properly, it is the partitioning of sums of squared deviations or errors. Mathematically, the sum of squared deviations is an unscaled, or unadjusted, measure of dispersion (also called variability). When scaled for the number of degrees of freedom, it estimates the variance, or spread of the observations about their mean value. Partitioning of the sum of squared deviations into various components allows the overall variability in a dataset to be ascribed to different types or sources of variability, with the relative importance of each being quantified by the size of each component of the overall sum of squares.


Background

The distance from any point in a collection of data to the mean of the data is the deviation. This can be written as y_i - \overline{y}, where y_i is the ith data point and \overline{y} is the estimate of the mean. If all such deviations are squared, then summed, as in \sum_{i=1}^n\left(y_i-\overline{y}\,\right)^2, this gives the "sum of squares" for these data.

When more data are added to the collection, the sum of squares will increase, except in unlikely cases such as the new data being equal to the mean. So usually the sum of squares will grow with the size of the data collection. That is a manifestation of the fact that it is unscaled.

In many cases, the number of degrees of freedom is simply the number of data points in the collection, minus one. We write this as ''n'' − 1, where ''n'' is the number of data points.

Scaling (also known as normalizing) means adjusting the sum of squares so that it does not grow as the size of the data collection grows. This is important when we want to compare samples of different sizes, such as a sample of 100 people compared to a sample of 20 people. If the sum of squares were not normalized, its value would always be larger for the sample of 100 people than for the sample of 20 people. To scale the sum of squares, we divide it by the degrees of freedom, i.e., calculate the sum of squares per degree of freedom, or variance. Standard deviation, in turn, is the square root of the variance.

The above describes how the sum of squares is used in descriptive statistics; see the article on total sum of squares for an application of this broad principle to inferential statistics.
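To make the scaling concrete, here is a minimal Python sketch (the sample values are invented for illustration, not taken from the article) that computes the sum of squared deviations, divides by the ''n'' − 1 degrees of freedom to obtain the sample variance, and takes its square root to obtain the standard deviation.

```python
import numpy as np

y = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])  # illustrative data

n = len(y)
y_bar = y.mean()

# Unscaled measure of dispersion: the sum of squared deviations from the mean.
sum_of_squares = np.sum((y - y_bar) ** 2)

# Scaling by the degrees of freedom (n - 1) gives the sample variance.
variance = sum_of_squares / (n - 1)

# The standard deviation is the square root of the variance.
std_dev = np.sqrt(variance)

print(sum_of_squares, variance, std_dev)
```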


Partitioning the sum of squares in linear regression

Theorem. Given a linear regression model y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip} + \varepsilon_i ''including a constant'' \beta_0, based on a sample (y_i, x_{i1}, \ldots, x_{ip}), \, i = 1, \ldots, n containing ''n'' observations, the total sum of squares \mathrm{TSS} = \sum_{i=1}^n (y_i - \bar{y})^2 can be partitioned as follows into the explained sum of squares (ESS) and the residual sum of squares (RSS):

:\mathrm{TSS} = \mathrm{ESS} + \mathrm{RSS},

where this equation is equivalent to each of the following forms:

:\begin{align}
\left\| y - \bar{y} \mathbf{1} \right\|^2 &= \left\| \hat{y} - \bar{y} \mathbf{1} \right\|^2 + \left\| \hat{\varepsilon} \right\|^2, \quad \mathbf{1} = (1, 1, \ldots, 1)^T, \\
\sum_{i=1}^n (y_i - \bar{y})^2 &= \sum_{i=1}^n (\hat{y}_i - \bar{y})^2 + \sum_{i=1}^n (y_i - \hat{y}_i)^2, \\
\sum_{i=1}^n (y_i - \bar{y})^2 &= \sum_{i=1}^n (\hat{y}_i - \bar{y})^2 + \sum_{i=1}^n \hat{\varepsilon}_i^2,
\end{align}

where \hat{y}_i is the value estimated by the regression line having \hat{\beta}_0, \hat{\beta}_1, \ldots, \hat{\beta}_p as the estimated coefficients.
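As a numerical sanity check of the theorem (not part of the original derivation), the following Python sketch fits a small synthetic regression whose design matrix includes a column of ones and verifies that TSS equals ESS + RSS up to floating-point rounding; the sample size, coefficients, and noise level are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])  # constant column included
beta_true = np.array([1.0, 2.0, -0.5])
y = X @ beta_true + rng.normal(scale=0.3, size=n)

# Ordinary least squares fit and fitted values.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ beta_hat

tss = np.sum((y - y.mean()) ** 2)      # total sum of squares
ess = np.sum((y_hat - y.mean()) ** 2)  # explained sum of squares
rss = np.sum((y - y_hat) ** 2)         # residual sum of squares

print(tss, ess + rss)                  # the two values agree up to rounding
assert np.isclose(tss, ess + rss)
```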


Proof

:\begin{align}
\sum_{i=1}^n (y_i - \overline{y})^2 &= \sum_{i=1}^n (y_i - \overline{y} + \hat{y}_i - \hat{y}_i)^2 = \sum_{i=1}^n \left((\hat{y}_i - \bar{y}) + \underbrace{(y_i - \hat{y}_i)}_{\hat{\varepsilon}_i}\right)^2 \\
&= \sum_{i=1}^n \left((\hat{y}_i - \bar{y})^2 + 2 \hat{\varepsilon}_i (\hat{y}_i - \bar{y}) + \hat{\varepsilon}_i^2\right) \\
&= \sum_{i=1}^n (\hat{y}_i - \bar{y})^2 + \sum_{i=1}^n \hat{\varepsilon}_i^2 + 2 \sum_{i=1}^n \hat{\varepsilon}_i (\hat{y}_i - \bar{y}) \\
&= \sum_{i=1}^n (\hat{y}_i - \bar{y})^2 + \sum_{i=1}^n \hat{\varepsilon}_i^2 + 2 \sum_{i=1}^n \hat{\varepsilon}_i (\hat{\beta}_0 + \hat{\beta}_1 x_{i1} + \cdots + \hat{\beta}_p x_{ip} - \overline{y}) \\
&= \sum_{i=1}^n (\hat{y}_i - \bar{y})^2 + \sum_{i=1}^n \hat{\varepsilon}_i^2 + 2 (\hat{\beta}_0 - \overline{y}) \underbrace{\sum_{i=1}^n \hat{\varepsilon}_i}_{0} + 2 \hat{\beta}_1 \underbrace{\sum_{i=1}^n \hat{\varepsilon}_i x_{i1}}_{0} + \cdots + 2 \hat{\beta}_p \underbrace{\sum_{i=1}^n \hat{\varepsilon}_i x_{ip}}_{0} \\
&= \sum_{i=1}^n (\hat{y}_i - \bar{y})^2 + \sum_{i=1}^n \hat{\varepsilon}_i^2 = \mathrm{ESS} + \mathrm{RSS}
\end{align}

The requirement that the model include a constant, or equivalently that the design matrix contain a column of ones, ensures that \sum_{i=1}^n \hat{\varepsilon}_i = 0, i.e. \hat{\varepsilon}^T\mathbf{1} = 0.

The proof can also be expressed in vector form, as follows:

:\begin{align}
SS_\text{total} = \Vert \mathbf{y} - \bar{y}\mathbf{1} \Vert^2 &= \Vert \mathbf{y} - \bar{y}\mathbf{1} + \hat{\mathbf{y}} - \hat{\mathbf{y}} \Vert^2, \\
&= \Vert \left( \hat{\mathbf{y}} - \bar{y}\mathbf{1} \right) + \left( \mathbf{y} - \hat{\mathbf{y}} \right) \Vert^2, \\
&= \Vert \hat{\mathbf{y}} - \bar{y}\mathbf{1} \Vert^2 + \Vert \hat{\varepsilon} \Vert^2 + 2 \hat{\varepsilon}^T \left( \hat{\mathbf{y}} - \bar{y}\mathbf{1} \right), \\
&= SS_\text{regression} + SS_\text{error} + 2 \hat{\varepsilon}^T \left( X\hat{\beta} - \bar{y}\mathbf{1} \right), \\
&= SS_\text{regression} + SS_\text{error} + 2 \left( \hat{\varepsilon}^T X \right)\hat{\beta} - 2\bar{y}\underbrace{\hat{\varepsilon}^T \mathbf{1}}_{0}, \\
&= SS_\text{regression} + SS_\text{error}.
\end{align}

The elimination of terms in the last line used the fact that

: \hat{\varepsilon}^T X = \left( \mathbf{y} - \hat{\mathbf{y}} \right)^T X = \mathbf{y}^T (I - X(X^T X)^{-1} X^T)^T X = \mathbf{y}^T (X^T - X^T)^T = \mathbf{0}.
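The step that makes the cross terms vanish, namely that the least-squares residual vector is orthogonal to every column of the design matrix (and in particular \hat{\varepsilon}^T\mathbf{1} = 0 when a constant is included), can also be checked numerically. A rough, self-contained Python sketch with invented data:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 40
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
y = X @ np.array([0.5, 1.0, -2.0]) + rng.normal(scale=0.2, size=n)

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
residuals = y - X @ beta_hat

# Residuals are orthogonal to each column of X: eps_hat^T X = 0.
print(residuals @ X)        # ~ [0, 0, 0] up to rounding
# With a constant column this includes eps_hat^T 1 = 0, so the residuals sum to zero.
print(residuals.sum())      # ~ 0
assert np.allclose(residuals @ X, 0.0, atol=1e-8)
```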


Further partitioning

Note that the residual sum of squares can be further partitioned as the lack-of-fit sum of squares plus the sum of squares due to pure error.
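Where the data contain replicate observations at the same predictor values, this further split can be computed directly: the pure-error part is the variation of the replicates about their group means, and the lack-of-fit part is the remainder of the residual sum of squares. A minimal sketch, assuming a straight-line fit and made-up replicated data:

```python
import numpy as np

# Replicated observations at a few x values (illustrative numbers).
x = np.array([1.0, 1.0, 2.0, 2.0, 3.0, 3.0, 4.0, 4.0])
y = np.array([1.1, 1.3, 2.0, 2.4, 2.8, 3.5, 3.9, 4.4])

X = np.column_stack([np.ones_like(x), x])      # straight-line model with intercept
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
rss = np.sum((y - X @ beta_hat) ** 2)          # residual sum of squares

# Pure-error SS: squared deviations of replicates about their own group mean.
pure_error = sum(np.sum((y[x == level] - y[x == level].mean()) ** 2)
                 for level in np.unique(x))

lack_of_fit = rss - pure_error                 # remainder attributed to lack of fit
print(rss, pure_error, lack_of_fit)
```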


See also

* Inner-product space
** Hilbert space
*** Euclidean space
* Expected mean squares
** Orthogonality
** Orthonormal basis
*** Orthogonal complement, the closed subspace orthogonal to a set (especially a subspace)
*** Orthomodular lattice of the subspaces of an inner-product space
*** Orthogonal projection
** Pythagorean theorem that the sum of the squared norms of orthogonal summands equals the squared norm of the sum
* Least squares
* Mean squared error
* Squared deviations

