Degrees of freedom (statistics)

In statistics, the number of degrees of freedom is the number of values in the final calculation of a statistic that are free to vary.

Estimates of statistical parameters can be based upon different amounts of information or data. The number of independent pieces of information that go into the estimate of a parameter is called the degrees of freedom. In general, the degrees of freedom of an estimate of a parameter are equal to the number of independent scores that go into the estimate minus the number of parameters used as intermediate steps in the estimation of the parameter itself. For example, if the variance is to be estimated from a random sample of ''N'' independent scores, then the degrees of freedom is equal to the number of independent scores (''N'') minus the number of parameters estimated as intermediate steps (one, namely, the sample mean) and is therefore equal to ''N'' − 1.

Mathematically, degrees of freedom is the number of dimensions of the domain of a random vector, or essentially the number of "free" components (how many components need to be known before the vector is fully determined).

The term is most often used in the context of linear models (linear regression, analysis of variance), where certain random vectors are constrained to lie in linear subspaces, and the number of degrees of freedom is the dimension of the subspace. The degrees of freedom are also commonly associated with the squared lengths (or "sum of squares" of the coordinates) of such vectors, and the parameters of chi-squared and other distributions that arise in associated statistical testing problems. While introductory textbooks may introduce degrees of freedom as distribution parameters or through hypothesis testing, it is the underlying geometry that defines degrees of freedom, and is critical to a proper understanding of the concept.
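
The ''N'' − 1 bookkeeping in the variance example can be checked numerically. The following sketch (not part of the original article; the data are made up for illustration) uses NumPy to show the single constraint that removes one degree of freedom:

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=10.0, scale=2.0, size=5)   # N = 5 independent scores

# Estimating the sample mean uses up one piece of information ...
xbar = x.mean()

# ... so the residuals satisfy one linear constraint and carry only N - 1 free values.
residuals = x - xbar
print(residuals.sum())                        # ~0: the constraint
print(np.var(x, ddof=1))                      # sample variance divides by N - 1 (ddof=1)
print(residuals @ residuals / (len(x) - 1))   # same value, computed by hand
</syntaxhighlight>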


History

Although the basic concept of degrees of freedom was recognized as early as 1821 in the work of German astronomer and mathematician Carl Friedrich Gauss, its modern definition and usage were first elaborated by English statistician William Sealy Gosset in his 1908 ''Biometrika'' article "The Probable Error of a Mean", published under the pen name "Student". While Gosset did not actually use the term 'degrees of freedom', he explained the concept in the course of developing what became known as Student's t-distribution. The term itself was popularized by English statistician and biologist Ronald Fisher, beginning with his 1922 work on chi-squared.


Notation

In equations, the typical symbol for degrees of freedom is ''ν'' (lowercase Greek letter nu). In text and tables, the abbreviation "d.f." is commonly used. R. A. Fisher used ''n'' to symbolize degrees of freedom, but modern usage typically reserves ''n'' for sample size.


Of random vectors

Geometrically, the degrees of freedom can be interpreted as the dimension of certain vector subspaces. As a starting point, suppose that we have a sample of independent normally distributed observations,

:X_1,\dots,X_n.\,

This can be represented as an ''n''-dimensional random vector:

:\begin{pmatrix} X_1\\ \vdots \\ X_n \end{pmatrix}.

Since this random vector can lie anywhere in ''n''-dimensional space, it has ''n'' degrees of freedom.

Now, let \bar X be the sample mean. The random vector can be decomposed as the sum of the sample mean plus a vector of residuals:

:\begin{pmatrix} X_1\\ \vdots \\ X_n \end{pmatrix} = \bar X \begin{pmatrix} 1 \\ \vdots \\ 1 \end{pmatrix} + \begin{pmatrix} X_1-\bar X \\ \vdots \\ X_n-\bar X \end{pmatrix}.

The first vector on the right-hand side is constrained to be a multiple of the vector of 1's, and the only free quantity is \bar X. It therefore has 1 degree of freedom. The second vector is constrained by the relation \sum_{i=1}^n (X_i-\bar X)=0. The first ''n'' − 1 components of this vector can be anything. However, once you know the first ''n'' − 1 components, the constraint tells you the value of the ''n''th component. Therefore, this vector has ''n'' − 1 degrees of freedom.

Mathematically, the first vector is the orthogonal (least-squares) projection of the data vector onto the subspace spanned by the vector of 1's. The 1 degree of freedom is the dimension of this subspace. The second residual vector is the least-squares projection onto the (''n'' − 1)-dimensional orthogonal complement of this subspace, and has ''n'' − 1 degrees of freedom.

In statistical testing applications, often one is not directly interested in the component vectors, but rather in their squared lengths. In the example above, the residual sum of squares is

:\sum_{i=1}^n (X_i - \bar X)^2 = \left\| \begin{pmatrix} X_1-\bar X \\ \vdots \\ X_n-\bar X \end{pmatrix} \right\|^2.

If the data points X_i are normally distributed with mean 0 and variance \sigma^2, then the residual sum of squares has a scaled chi-squared distribution (scaled by the factor \sigma^2), with ''n'' − 1 degrees of freedom. The degrees of freedom, here a parameter of the distribution, can still be interpreted as the dimension of an underlying vector subspace.

Likewise, the one-sample ''t''-test statistic,

:\frac{\bar X - \mu_0}{s/\sqrt{n}}, \qquad s^2 = \frac{1}{n-1}\sum_{i=1}^n (X_i - \bar X)^2,

follows a Student's ''t''-distribution with ''n'' − 1 degrees of freedom when the hypothesized mean \mu_0 is correct. Again, the degrees of freedom arises from the residual vector in the denominator.
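
The decomposition and the two degree-of-freedom counts can be illustrated with a short simulation (a sketch with made-up data, not part of the original article):

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(1)
n, sigma = 8, 3.0
x = rng.normal(0.0, sigma, size=n)

mean_part = np.full(n, x.mean())      # lies in the 1-dimensional span of (1, ..., 1)
residual = x - mean_part              # lies in the (n - 1)-dimensional orthogonal complement

print(np.allclose(x, mean_part + residual))    # the decomposition is exact
print(np.isclose(mean_part @ residual, 0.0))   # the two components are orthogonal

# The residual sum of squares, scaled by sigma^2, behaves like chi-squared with n - 1 df,
# so its average over many replications is close to n - 1:
samples = rng.normal(0.0, sigma, size=(20000, n))
rss = ((samples - samples.mean(axis=1, keepdims=True)) ** 2).sum(axis=1)
print((rss / sigma**2).mean())                 # ≈ n - 1 = 7
</syntaxhighlight>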


In structural equation models

When the results of structural equation models (SEM) are presented, they generally include one or more indices of overall model fit, the most common of which is a ''χ''2 statistic. This forms the basis for other indices that are commonly reported. Although it is these other statistics that are most commonly interpreted, the ''degrees of freedom'' of the ''χ''2 are essential to understanding model fit as well as the nature of the model itself.

Degrees of freedom in SEM are computed as a difference between the number of unique pieces of information that are used as input into the analysis, sometimes called knowns, and the number of parameters that are uniquely estimated, sometimes called unknowns. For example, in a one-factor confirmatory factor analysis with 4 items, there are 10 knowns (the six unique covariances among the four items and the four item variances) and 8 unknowns (4 factor loadings and 4 error variances), for 2 degrees of freedom.

Degrees of freedom are important to the understanding of model fit if for no other reason than that, all else being equal, the fewer degrees of freedom, the better indices such as ''χ''2 will be. It has been shown that degrees of freedom can be used by readers of papers that contain SEMs to determine if the authors of those papers are in fact reporting the correct model fit statistics. In the organizational sciences, for example, nearly half of papers published in top journals report degrees of freedom that are inconsistent with the models described in those papers, leaving the reader to wonder which models were actually tested.
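
The knowns-minus-unknowns count can be written as a small helper (a sketch; the function name is hypothetical, and it assumes the usual count of ''p''(''p'' + 1)/2 unique variances and covariances among ''p'' observed variables, with no mean structure):

<syntaxhighlight lang="python">
def sem_degrees_of_freedom(n_observed, n_free_parameters):
    """Degrees of freedom of a SEM: unique pieces of input information minus free parameters."""
    knowns = n_observed * (n_observed + 1) // 2   # unique variances and covariances
    return knowns - n_free_parameters

# One-factor CFA with 4 items: 4 factor loadings + 4 error variances = 8 unknowns,
# 10 knowns, giving 2 degrees of freedom (as in the example above).
print(sem_degrees_of_freedom(4, 8))   # 2
</syntaxhighlight>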


Of residuals

A common way to think of degrees of freedom is as the number of independent pieces of information available to estimate another piece of information. More concretely, the number of degrees of freedom is the number of independent observations in a sample of data that are available to estimate a parameter of the population from which that sample is drawn. For example, if we have two observations, when calculating the mean we have two independent observations; however, when calculating the variance, we have only one independent observation, since the two observations are equally distant from the sample mean. In fitting statistical models to data, the vectors of residuals are constrained to lie in a space of smaller dimension than the number of components in the vector. That smaller dimension is the number of ''degrees of freedom for error'', also called ''residual degrees of freedom''.


Example

Perhaps the simplest example is this. Suppose

:X_1,\dots,X_n

are random variables each with expected value ''μ'', and let

:\overline{X}_n = \frac{X_1+\cdots+X_n}{n}

be the "sample mean." Then the quantities

:X_i-\overline{X}_n

are residuals that may be considered estimates of the errors ''X''''i'' − ''μ''. The sum of the residuals (unlike the sum of the errors) is necessarily 0. If one knows the values of any ''n'' − 1 of the residuals, one can thus find the last one. That means they are constrained to lie in a space of dimension ''n'' − 1. One says that there are ''n'' − 1 degrees of freedom for errors.

An example which is only slightly less simple is that of least squares estimation of ''a'' and ''b'' in the model

:Y_i=a+bx_i+e_i\ \text{ for } i=1,\dots,n

where ''x''''i'' is given, but ''e''''i'' and hence ''Y''''i'' are random. Let \widehat{a} and \widehat{b} be the least-squares estimates of ''a'' and ''b''. Then the residuals

:\widehat{e}_i=y_i-(\widehat{a}+\widehat{b}x_i)

are constrained to lie within the space defined by the two equations

:\widehat{e}_1 + \cdots + \widehat{e}_n=0,

:x_1 \widehat{e}_1 + \cdots + x_n \widehat{e}_n=0.

One says that there are ''n'' − 2 degrees of freedom for error. Notationally, the capital letter ''Y'' is used in specifying the model, while lower-case ''y'' appears in the definition of the residuals; that is because the former are hypothesized random variables and the latter are actual data.

We can generalise this to multiple regression involving ''p'' parameters and covariates (e.g. ''p'' − 1 predictors and one mean, i.e. the intercept in the regression), in which case the cost in ''degrees of freedom of the fit'' is ''p'', leaving ''n'' − ''p'' degrees of freedom for errors.
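
The two residual constraints of the straight-line fit, which leave ''n'' − 2 degrees of freedom, can be verified directly (a sketch with made-up data; not part of the original article):

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(2)
n = 10
x = np.linspace(0.0, 1.0, n)
y = 1.0 + 2.0 * x + rng.normal(0.0, 0.1, size=n)

# Ordinary least squares for a and b via the design matrix [1, x]
X = np.column_stack([np.ones(n), x])
a_hat, b_hat = np.linalg.lstsq(X, y, rcond=None)[0]
e_hat = y - (a_hat + b_hat * x)

# The normal equations impose two constraints on the residuals:
print(np.isclose(e_hat.sum(), 0.0))         # sum of residuals is 0
print(np.isclose((x * e_hat).sum(), 0.0))   # residuals are orthogonal to x
</syntaxhighlight>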


In linear models

The demonstration of the ''t'' and chi-squared distributions for one-sample problems above is the simplest example where degrees of freedom arise. However, similar geometry and vector decompositions underlie much of the theory of linear models, including linear regression and analysis of variance. An explicit example based on comparison of three means is presented here; the geometry of linear models is discussed in more complete detail by Christensen (2002).

Suppose independent observations are made for three populations, X_1,\ldots,X_n, Y_1,\ldots,Y_n and Z_1,\ldots,Z_n. The restriction to three groups and equal sample sizes simplifies notation, but the ideas are easily generalized. The observations can be decomposed as

:\begin{align} X_i &= \bar{M} + (\bar{X}-\bar{M}) + (X_i-\bar{X})\\ Y_i &= \bar{M} + (\bar{Y}-\bar{M}) + (Y_i-\bar{Y})\\ Z_i &= \bar{M} + (\bar{Z}-\bar{M}) + (Z_i-\bar{Z}) \end{align}

where \bar{X}, \bar{Y}, \bar{Z} are the means of the individual samples, and \bar{M}=(\bar{X}+\bar{Y}+\bar{Z})/3 is the mean of all 3''n'' observations. In vector notation this decomposition can be written as

: \begin{pmatrix} X_1 \\ \vdots \\ X_n \\ Y_1 \\ \vdots \\ Y_n \\ Z_1 \\ \vdots \\ Z_n \end{pmatrix} = \bar{M} \begin{pmatrix}1 \\ \vdots \\ 1 \\ 1 \\ \vdots \\ 1 \\ 1 \\ \vdots \\ 1 \end{pmatrix} + \begin{pmatrix}\bar{X}-\bar{M}\\ \vdots \\ \bar{X}-\bar{M} \\ \bar{Y}-\bar{M}\\ \vdots \\ \bar{Y}-\bar{M} \\ \bar{Z}-\bar{M}\\ \vdots \\ \bar{Z}-\bar{M} \end{pmatrix} + \begin{pmatrix} X_1-\bar{X} \\ \vdots \\ X_n-\bar{X} \\ Y_1-\bar{Y} \\ \vdots \\ Y_n-\bar{Y} \\ Z_1-\bar{Z} \\ \vdots \\ Z_n-\bar{Z} \end{pmatrix}.

The observation vector, on the left-hand side, has 3''n'' degrees of freedom. On the right-hand side, the first vector has one degree of freedom (or dimension) for the overall mean. The second vector depends on three random variables, \bar{X}-\bar{M}, \bar{Y}-\bar{M} and \bar{Z}-\bar{M}. However, these must sum to 0 and so are constrained; the vector therefore must lie in a 2-dimensional subspace, and has 2 degrees of freedom. The remaining 3''n'' − 3 degrees of freedom are in the residual vector (made up of ''n'' − 1 degrees of freedom within each of the populations).
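
The 3''n'' = 1 + 2 + 3(''n'' − 1) accounting can be checked numerically; the sketch below (made-up data, not part of the original article) verifies that the three component vectors reconstruct the observations and are mutually orthogonal:

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(3)
n = 4
x, y, z = rng.normal(0.0, 1.0, (3, n))
obs = np.concatenate([x, y, z])                     # 3n observations

grand = np.full(3 * n, obs.mean())                  # 1 degree of freedom
group = np.concatenate([np.full(n, g.mean()) for g in (x, y, z)]) - grand   # 2 degrees of freedom
resid = obs - grand - group                         # 3(n - 1) degrees of freedom

print(np.allclose(obs, grand + group + resid))      # the decomposition is exact
print(np.isclose(grand @ group, 0.0),
      np.isclose(grand @ resid, 0.0),
      np.isclose(group @ resid, 0.0))               # the components are mutually orthogonal
</syntaxhighlight>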


In analysis of variance (ANOVA)

In statistical testing problems, one usually is not interested in the component vectors themselves, but rather in their squared lengths, or sums of squares. The degrees of freedom associated with a sum of squares is the degrees of freedom of the corresponding component vector.

The three-population example above is an example of one-way analysis of variance. The model, or treatment, sum of squares is the squared length of the second vector,

:\text{SSTr} = n(\bar{X}-\bar{M})^2 + n(\bar{Y}-\bar{M})^2 + n(\bar{Z}-\bar{M})^2

with 2 degrees of freedom. The residual, or error, sum of squares is

:\text{SSE} = \sum_{i=1}^n (X_i-\bar{X})^2 + \sum_{i=1}^n (Y_i-\bar{Y})^2 + \sum_{i=1}^n (Z_i-\bar{Z})^2

with 3(''n'' − 1) degrees of freedom. Of course, introductory books on ANOVA usually state formulae without showing the vectors, but it is this underlying geometry that gives rise to SS formulae, and shows how to unambiguously determine the degrees of freedom in any given situation.

Under the null hypothesis of no difference between population means (and assuming that standard ANOVA regularity assumptions are satisfied) the sums of squares have scaled chi-squared distributions, with the corresponding degrees of freedom. The F-test statistic is the ratio, after scaling by the degrees of freedom,

:F = \frac{\text{SSTr}/2}{\text{SSE}/(3n-3)}.

If there is no difference between population means this ratio follows an ''F''-distribution with 2 and 3''n'' − 3 degrees of freedom.

In some complicated settings, such as unbalanced split-plot designs, the sums of squares no longer have scaled chi-squared distributions. Comparison of sums of squares with degrees of freedom is no longer meaningful, and software may report certain fractional 'degrees of freedom' in these cases. Such numbers have no genuine degrees-of-freedom interpretation, but are simply providing an ''approximate'' chi-squared distribution for the corresponding sum of squares. The details of such approximations are beyond the scope of this page.
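
The sums of squares, their degrees of freedom, and the resulting ''F'' ratio can be computed by hand and compared against a library routine (a sketch with made-up data; scipy.stats.f_oneway is used only as an independent check):

<syntaxhighlight lang="python">
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
n = 6
x, y, z = rng.normal(5.0, 1.0, (3, n))
grand_mean = np.concatenate([x, y, z]).mean()

ss_treat = n * sum((g.mean() - grand_mean) ** 2 for g in (x, y, z))   # 2 degrees of freedom
ss_error = sum(((g - g.mean()) ** 2).sum() for g in (x, y, z))        # 3(n - 1) degrees of freedom
f_stat = (ss_treat / 2) / (ss_error / (3 * (n - 1)))

print(f_stat)
print(stats.f_oneway(x, y, z).statistic)   # agrees with the hand computation
</syntaxhighlight>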


In probability distributions

Several commonly encountered statistical distributions (Student's ''t'', chi-squared, ''F'') have parameters that are commonly referred to as ''degrees of freedom''. This terminology simply reflects that in many applications where these distributions occur, the parameter corresponds to the degrees of freedom of an underlying random vector, as in the preceding ANOVA example. Another simple example is: if X_i; i=1,\ldots,n are independent normal (\mu,\sigma^2) random variables, the statistic

:\frac{\sum_{i=1}^n (X_i - \bar{X})^2}{\sigma^2}

follows a chi-squared distribution with ''n'' − 1 degrees of freedom. Here, the degrees of freedom arises from the residual sum of squares in the numerator, and in turn the ''n'' − 1 degrees of freedom of the underlying residual vector \{X_i - \bar{X}\}.

In the application of these distributions to linear models, the degrees of freedom parameters can take only integer values. The underlying families of distributions allow fractional values for the degrees-of-freedom parameters, which can arise in more sophisticated uses. One set of examples is problems where chi-squared approximations based on effective degrees of freedom are used. In other applications, such as modelling heavy-tailed data, a ''t''- or ''F''-distribution may be used as an empirical model. In these cases, there is no particular ''degrees of freedom'' interpretation to the distribution parameters, even though the terminology may continue to be used.
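
The distribution families themselves are defined for non-integer degrees of freedom, which is what makes the effective-degrees-of-freedom approximations mentioned above possible. A small sketch using SciPy's standard distributions:

<syntaxhighlight lang="python">
from scipy import stats

# Integer df, as in the linear-model applications above:
print(stats.chi2(df=9).mean())        # the mean of a chi-squared_9 variable is 9

# The same families accept fractional df, e.g. for effective-degrees-of-freedom approximations:
print(stats.chi2(df=7.3).mean())      # 7.3
print(stats.t(df=2.5).ppf(0.975))     # upper 2.5% critical value of a t with 2.5 df
</syntaxhighlight>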


In non-standard regression

Many non-standard regression methods, including regularized least squares (e.g., ridge regression), linear smoothers, smoothing splines, and semiparametric regression, are not based on ordinary least squares projections, but rather on regularized (generalized and/or penalized) least squares, and so degrees of freedom defined in terms of dimensionality are generally not useful for these procedures. However, these procedures are still linear in the observations, and the fitted values of the regression can be expressed in the form

:\hat{y} = Hy,

where \hat{y} is the vector of fitted values at each of the original covariate values from the fitted model, ''y'' is the original vector of responses, and ''H'' is the hat matrix or, more generally, smoother matrix.

For statistical inference, sums of squares can still be formed: the model sum of squares is \|Hy\|^2; the residual sum of squares is \|y-Hy\|^2. However, because ''H'' does not correspond to an ordinary least-squares fit (i.e. is not an orthogonal projection), these sums of squares no longer have (scaled, non-central) chi-squared distributions, and dimensionally defined degrees of freedom are not useful.

The ''effective degrees of freedom'' of the fit can be defined in various ways to implement goodness-of-fit tests, cross-validation, and other statistical inference procedures. Here one can distinguish between ''regression effective degrees of freedom'' and ''residual effective degrees of freedom''.
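
A minimal sketch of the \hat{y} = Hy form, using ridge regression as the linear smoother (the data and the penalty λ are made up for illustration):

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(5)
n, p, lam = 30, 4, 2.0
X = rng.normal(size=(n, p))
y = X @ np.array([1.0, -0.5, 0.0, 2.0]) + rng.normal(size=n)

# Ridge ("regularized least squares") smoother matrix: y_hat = H y
H = X @ np.linalg.solve(X.T @ X + lam * np.eye(p), X.T)
y_hat = H @ y

print(np.trace(H))               # effective degrees of freedom of the fit (< p because of shrinkage)
print(np.sum((y - y_hat) ** 2))  # residual sum of squares ||y - Hy||^2
</syntaxhighlight>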


Regression effective degrees of freedom

For the regression effective degrees of freedom, appropriate definitions can include the trace of the hat matrix, tr(''H''), the trace of the quadratic form of the hat matrix, tr(''H''′''H''), the form tr(2''H'' − ''HH''′), or the Satterthwaite approximation. In the case of linear regression, the hat matrix ''H'' is ''X''(''X''′''X'')−1''X''′, and all these definitions reduce to the usual degrees of freedom. Notice that

:\operatorname{tr}(H) = \sum_i h_{ii} = \sum_i \frac{\partial\hat{y}_i}{\partial y_i},

i.e. the regression (not residual) degrees of freedom in linear models is "the sum of the sensitivities of the fitted values with respect to the observed response values", that is, the sum of leverage scores.

One way to help to conceptualize this is to consider a simple smoothing matrix like a Gaussian blur, used to mitigate data noise. In contrast to a simple linear or polynomial fit, computing the effective degrees of freedom of the smoothing function is not straightforward. In these cases, it is important to estimate the degrees of freedom permitted by the ''H'' matrix so that the residual degrees of freedom can then be used to estimate statistical tests such as \chi^2.
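
The trace-based definitions can be computed directly from a smoother matrix; in the ordinary least-squares case they all reduce to the number of fitted parameters. A sketch (the helper name is illustrative):

<syntaxhighlight lang="python">
import numpy as np

def effective_dfs(H):
    """Common regression effective-degrees-of-freedom summaries of a smoother matrix H."""
    return {
        "tr(H)":        np.trace(H),
        "tr(H'H)":      np.trace(H.T @ H),
        "tr(2H - HH')": np.trace(2 * H - H @ H.T),
    }

# For an ordinary least-squares hat matrix, all three coincide with the number of parameters:
X = np.column_stack([np.ones(20), np.arange(20.0)])
H_ols = X @ np.linalg.solve(X.T @ X, X.T)
print(effective_dfs(H_ols))   # all three ≈ 2
</syntaxhighlight>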


Residual effective degrees of freedom

There are corresponding definitions of residual effective degrees of freedom (redf), with ''H'' replaced by ''I'' − ''H''. For example, if the goal is to estimate error variance, the redf would be defined as tr((''I'' − ''H'')′(''I'' − ''H'')), and the unbiased estimate is (with \hat{r}=y-Hy)

:\hat\sigma^2 = \frac{\|\hat{r}\|^2}{\operatorname{tr}\left((I-H)'(I-H)\right)},

or, equivalently (Trevor Hastie and Robert Tibshirani (1990), ''Generalized Additive Models'', CRC Press, p. 54 and eq. (B.1), p. 305; Simon N. Wood (2006), ''Generalized Additive Models: An Introduction with R'', CRC Press, eq. (4.14), p. 172),

:\hat\sigma^2 = \frac{\|\hat{r}\|^2}{n-\operatorname{tr}(2H - HH')} = \frac{\|\hat{r}\|^2}{n - 2\operatorname{tr}(H) + \operatorname{tr}(HH')},

:\hat\sigma^2 \approx \frac{\|\hat{r}\|^2}{n-1.25\operatorname{tr}(H) + 0.5}.

The last approximation above reduces the computational cost from ''O''(''n''2) to only ''O''(''n''). In general the numerator would be the objective function being minimized; e.g., if the hat matrix includes an observation covariance matrix, Σ, then \|\hat{r}\|^2 becomes \hat{r}'\Sigma^{-1}\hat{r}.
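
The sketch below (a simple Gaussian-kernel smoother with made-up data; the bandwidth is illustrative) estimates the error variance by dividing the residual sum of squares by the residual effective degrees of freedom. The estimate is only approximately unbiased, since the smoother also carries some bias:

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(6)
n, sigma = 200, 0.5
x = np.linspace(0.0, 1.0, n)
y = np.sin(2 * np.pi * x) + rng.normal(0.0, sigma, size=n)

# A Gaussian-kernel linear smoother (rows normalized so each fitted value is a weighted average)
W = np.exp(-0.5 * ((x[:, None] - x[None, :]) / 0.05) ** 2)
H = W / W.sum(axis=1, keepdims=True)

r = y - H @ y
redf = np.trace((np.eye(n) - H).T @ (np.eye(n) - H))   # residual effective degrees of freedom
print(r @ r / redf)   # ≈ sigma**2 = 0.25 when the smoothing bias is small
</syntaxhighlight>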


General

Note that, unlike in the original case, non-integer degrees of freedom are allowed, though the value must usually still be constrained between 0 and ''n''. Consider, as an example, the ''k''-nearest neighbour smoother, which is the average of the ''k'' nearest measured values to the given point. Then, at each of the ''n'' measured points, the weight of the original value on the linear combination that makes up the predicted value is just 1/''k''. Thus, the trace of the hat matrix is ''n''/''k'', and the smooth costs ''n''/''k'' effective degrees of freedom.

As another example, consider the existence of nearly duplicated observations. Naive application of the classical formula, ''n'' − ''p'', would lead to over-estimation of the residual degrees of freedom, as if each observation were independent. More realistically, though, the hat matrix would involve an observation covariance matrix Σ indicating the non-zero correlation among observations. The more general formulation of effective degrees of freedom would result in a more realistic estimate for, e.g., the error variance σ2, which in turn scales the unknown parameters' ''a posteriori'' standard deviation; the degrees of freedom will also affect the expansion factor necessary to produce an error ellipse for a given confidence level.
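
The ''n''/''k'' count for the ''k''-nearest-neighbour smoother follows from its hat matrix having 1/''k'' on the diagonal (each point is always among its own ''k'' nearest neighbours). A sketch:

<syntaxhighlight lang="python">
import numpy as np

n, k = 12, 3
x = np.arange(n, dtype=float)

# Hat matrix of a k-nearest-neighbour smoother: each fitted value averages the k nearest points
H = np.zeros((n, n))
for i in range(n):
    neighbours = np.argsort(np.abs(x - x[i]))[:k]   # the point itself is among its own k nearest
    H[i, neighbours] = 1.0 / k

print(np.trace(H), n / k)   # each diagonal entry is 1/k, so tr(H) = n/k = 4.0
</syntaxhighlight>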


Other formulations

Similar concepts are the ''equivalent degrees of freedom'' in non-parametric regression, the ''degree of freedom of signal'' in atmospheric studies, and the ''non-integer degree of freedom'' in geodesy (H. Theil (1963), "On the Use of Incomplete Prior Information in Regression Analysis", ''Journal of the American Statistical Association'', 58 (302), 401–414, eq. (5.19)–(5.20)).

The residual sum of squares \|y-Hy\|^2 has a generalized chi-squared distribution, and the theory associated with this distribution (Jones, D.A. (1983), "Statistical analysis of empirical models fitted by optimisation", ''Biometrika'', 70 (1), 67–88) provides an alternative route to the answers provided above.


See also

* Bessel's correction
* Chi-squared per degree of freedom
* Pooled degrees of freedom
* Replication (statistics)
* Sample size
* Statistical model
* Variance


References


Further reading

* "Student" (1908), "The Probable Error of a Mean", ''Biometrika'' — transcription by C Olsen with errata

