In statistics, Bessel's correction is the use of ''n'' − 1 instead of ''n'' in the formula for the sample variance and sample standard deviation, where ''n'' is the number of observations in a sample. This method corrects the bias in the estimation of the population variance. It also partially corrects the bias in the estimation of the population standard deviation. However, the correction often increases the mean squared error in these estimations. This technique is named after Friedrich Bessel.
Formulation
In estimating the population variance from a sample when the population mean is unknown, the uncorrected sample variance is the ''mean'' of the squares of deviations of sample values from the sample mean (i.e., using a multiplicative factor 1/''n''). In this case, the sample variance is a biased estimator of the population variance.
Multiplying the uncorrected sample variance by the factor
: \frac{n}{n-1}
gives an ''unbiased'' estimator of the population variance. In some literature, the above factor is called Bessel's correction.
One can understand Bessel's correction as the degrees of freedom in the residuals vector (residuals, not errors, because the population mean is unknown):
: \left(x_1 - \overline{x},\, \ldots,\, x_n - \overline{x}\right),
where \overline{x} is the sample mean. While there are ''n'' independent observations in the sample, there are only ''n'' − 1 independent residuals, as they sum to 0. For a more intuitive explanation of the need for Bessel's correction, see the section on the source of bias below.
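The degrees-of-freedom argument can be illustrated in a few lines of Python (illustrative numbers, standard library only): the residuals always sum to zero, so the last one is fully determined by the others.

```python
# Residuals (deviations from the sample mean) always sum to zero,
# so a sample of n observations yields only n - 1 independent residuals.
sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
n = len(sample)
mean = sum(sample) / n
residuals = [x - mean for x in sample]

# The sum of the residuals is zero (up to floating-point rounding) ...
assert abs(sum(residuals)) < 1e-9

# ... so the last residual is determined by the first n - 1 of them.
assert abs(residuals[-1] - (-sum(residuals[:-1]))) < 1e-9
```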
Generally, Bessel's correction is an approach to reduce the bias due to finite sample size. Such finite-sample bias correction is also needed for other estimates, such as skewness and kurtosis, but in these the inaccuracies are often significantly larger. To fully remove such bias, a more complex multi-parameter estimation is necessary. For instance, a correct correction for the standard deviation depends on the kurtosis (the normalized central fourth moment), but this again has a finite-sample bias, and it depends on the standard deviation; that is, the two estimations have to be merged.
Caveats
There are three caveats to consider regarding Bessel's correction:
# It does not yield an unbiased estimator of standard ''deviation''.
# The corrected estimator often has a higher mean squared error (MSE) than the uncorrected estimator. Furthermore, there is no population distribution for which it has the minimum MSE, because a different scale factor can always be chosen to minimize MSE.
# It is only necessary when the population mean is unknown (and estimated as the sample mean). In practice, this generally happens.
Firstly, while the sample variance (using Bessel's correction) is an unbiased estimator of the population variance, its square root, the sample standard deviation, is a ''biased'' estimate of the population standard deviation; because the square root is a concave function, the bias is downward, by Jensen's inequality. There is no general formula for an unbiased estimator of the population standard deviation, though there are correction factors for particular distributions, such as the normal; see unbiased estimation of standard deviation for details. An approximation for the exact correction factor for the normal distribution is given by using ''n'' − 1.5 in the formula: the bias decays quadratically (rather than linearly, as in the uncorrected form and Bessel's corrected form).
Secondly, the unbiased estimator does not minimize mean squared error (MSE), and generally has worse MSE than the uncorrected estimator (this varies with excess kurtosis). MSE can be minimized by using a different factor. The optimal value depends on excess kurtosis, as discussed in the article on mean squared error; for the normal distribution this is optimized by dividing by ''n'' + 1 (instead of ''n'' − 1 or ''n'').
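This trade-off can be checked numerically. The following Python sketch (a simulation with hypothetical parameters, standard library only) estimates the MSE of the variance estimators with divisors ''n'' − 1, ''n'', and ''n'' + 1 on normally distributed data; for the normal case, ''n'' + 1 comes out smallest:

```python
import random

random.seed(42)
mu, sigma2, n, trials = 0.0, 1.0, 10, 200_000  # hypothetical parameters

# Accumulated squared error of the variance estimate for each divisor.
sq_err = {n - 1: 0.0, n: 0.0, n + 1: 0.0}
for _ in range(trials):
    xs = [random.gauss(mu, sigma2 ** 0.5) for _ in range(n)]
    mean = sum(xs) / n
    ss = sum((x - mean) ** 2 for x in xs)  # sum of squared deviations
    for d in sq_err:
        sq_err[d] += (ss / d - sigma2) ** 2

mse = {d: total / trials for d, total in sq_err.items()}
# For normal data, the divisor n + 1 gives the smallest MSE:
assert mse[n + 1] < mse[n] < mse[n - 1]
```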
Thirdly, Bessel's correction is only necessary when the population mean is unknown, and one is estimating ''both'' population mean ''and'' population variance from a given sample, using the sample mean to estimate the population mean. In that case there are ''n'' degrees of freedom in a sample of ''n'' points, and simultaneous estimation of mean and variance means one degree of freedom goes to the sample mean and the remaining ''n'' − 1 degrees of freedom (the ''residuals'') go to the sample variance. However, if the population mean is known, then the deviations of the observations from the population mean have ''n'' degrees of freedom (because the mean is not being estimated – the deviations are not residuals but ''errors'') and Bessel's correction is not applicable.
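Conversely, when the population mean is known, no correction is needed: the deviations from it are errors, not residuals, and dividing their squares by ''n'' is already unbiased. A small simulation sketch (hypothetical parameters, standard library only):

```python
import random

random.seed(1)
mu, sigma2, n, trials = 0.0, 1.0, 5, 200_000  # hypothetical parameters

total = 0.0
for _ in range(trials):
    xs = [random.gauss(mu, sigma2 ** 0.5) for _ in range(n)]
    # Deviations from the *known* population mean are errors, not residuals,
    # so dividing by n (no Bessel's correction) is unbiased.
    total += sum((x - mu) ** 2 for x in xs) / n

average = total / trials
assert abs(average - sigma2) < 0.02  # close to the true variance
```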
Source of bias
Most simply, to understand the bias that needs correcting, think of an extreme case. Suppose the population is (0, 0, 0, 1, 2, 9), which has a population mean of 2 and a population variance of
: \frac{(0-2)^2 + (0-2)^2 + (0-2)^2 + (1-2)^2 + (2-2)^2 + (9-2)^2}{6} = \frac{62}{6} \approx 10.33.
A sample of ''n'' = 1 is drawn, and it turns out to be x_1 = 0. The best estimate of the population mean is \overline{x} = x_1 = 0. But what if we use the formula
: \frac{1}{n} \sum_{i=1}^n \left(x_i - \overline{x}\right)^2
to estimate the variance? The estimate of the variance would be zero – and the estimate would be zero for any population and any sample of ''n'' = 1. The problem is that in estimating the sample mean, the process has already made our estimate of the mean close to the value we sampled (identical, for ''n'' = 1). In the case of ''n'' = 1, the variance just cannot be estimated, because there is no variability in the sample.
But consider ''n'' = 2. Suppose the sample were (0, 2). Then \overline{x} = 1 and
: \frac{(0-1)^2 + (2-1)^2}{2} = 1,
but with Bessel's correction,
: \frac{(0-1)^2 + (2-1)^2}{2-1} = 2,
which is an unbiased estimate (if all possible samples of ''n'' = 2 drawn without replacement are taken and this method is used, the average estimate will be 12.4, the same as the variance of the population computed with Bessel's correction).
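This average can be verified exhaustively in Python (standard library only), enumerating every sample of ''n'' = 2 drawn without replacement from the population (0, 0, 0, 1, 2, 9):

```python
from itertools import combinations

population = [0, 0, 0, 1, 2, 9]

# Bessel-corrected variance of every possible n = 2 sample
# drawn without replacement from the population (15 samples in all).
estimates = []
for x, y in combinations(population, 2):
    mean = (x + y) / 2
    s2 = ((x - mean) ** 2 + (y - mean) ** 2) / (2 - 1)  # divide by n - 1
    estimates.append(s2)

average = sum(estimates) / len(estimates)
print(average)  # 12.4
```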
To see this in more detail, consider the following example. Suppose the mean of the whole population is 2050, but the statistician does not know that, and must estimate it based on this small sample chosen randomly from the population:
: 2051,\quad 2053,\quad 2055,\quad 2050,\quad 2051
One may compute the sample average:
: \frac{2051 + 2053 + 2055 + 2050 + 2051}{5} = 2052.
This may serve as an observable estimate of the unobservable population average, which is 2050. Now we face the problem of estimating the population variance. That is the average of the squares of the deviations from 2050. If we knew that the population average is 2050, we could proceed as follows:
: \frac{(2051-2050)^2 + (2053-2050)^2 + (2055-2050)^2 + (2050-2050)^2 + (2051-2050)^2}{5} = \frac{36}{5} = 7.2.
But our estimate of the population average is the sample average, 2052. The actual average, 2050, is unknown. So the sample average, 2052, must be used:
: \frac{(2051-2052)^2 + (2053-2052)^2 + (2055-2052)^2 + (2050-2052)^2 + (2051-2052)^2}{5} = \frac{16}{5} = 3.2.
The variance is now smaller, and it (almost) always is. The only exception occurs when the sample average and the population average are the same. To understand why, consider that variance ''measures distance from a point'', and within a given sample, the average is precisely that point which minimises the distances. A variance calculation using ''any'' other average value must produce a larger result.
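The minimizing property of the sample mean can be checked directly; the following Python sketch (with illustrative numbers) confirms that squared deviations about any other point are strictly larger:

```python
sample = [0, 2, 5, 7]  # illustrative sample
mean = sum(sample) / len(sample)  # 3.5

def sum_sq(center):
    """Sum of squared deviations of the sample about `center`."""
    return sum((x - center) ** 2 for x in sample)

# The sample mean minimizes the sum of squared deviations:
# every other center gives a strictly larger value.
for other in [0.0, 2.0, 3.0, 4.0, 10.0]:
    assert sum_sq(other) > sum_sq(mean)
```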
To see this algebraically, we use a simple identity:
: (x - \mu)^2 = \big((x - \overline{x}) + (\overline{x} - \mu)\big)^2 = \underbrace{(x - \overline{x})^2}_{a^2} + \underbrace{2(x - \overline{x})(\overline{x} - \mu)}_{2ab} + \underbrace{(\overline{x} - \mu)^2}_{b^2},
with a = x - \overline{x} representing the deviation of an individual sample from the sample mean, and b = \overline{x} - \mu representing the deviation of the sample mean from the population mean. Note that we have simply decomposed the actual deviation of an individual sample from the (unknown) population mean into two components: the deviation of the single sample from the sample mean, which we can compute, and the additional deviation of the sample mean from the population mean, which we cannot. Now, we apply this identity to the squares of deviations from the population mean:
: (x_i - \mu)^2 = a_i^2 + 2 a_i b + b^2, \qquad a_i = x_i - \overline{x},\quad b = \overline{x} - \mu.
Now apply this to all five observations and observe certain patterns:
: \begin{array}{l|c|c|c}
\text{Observation} & a_i^2 & 2 a_i b & b^2 \\
\hline
(2051 - 2050)^2 = 1 & (-1)^2 = 1 & 2(-1)(2) = -4 & 2^2 = 4 \\
(2053 - 2050)^2 = 9 & 1^2 = 1 & 2(1)(2) = 4 & 2^2 = 4 \\
(2055 - 2050)^2 = 25 & 3^2 = 9 & 2(3)(2) = 12 & 2^2 = 4 \\
(2050 - 2050)^2 = 0 & (-2)^2 = 4 & 2(-2)(2) = -8 & 2^2 = 4 \\
(2051 - 2050)^2 = 1 & (-1)^2 = 1 & 2(-1)(2) = -4 & 2^2 = 4 \\
\hline
\text{Sum: } 36 & 16 & 0 & 20
\end{array}
The sum of the entries in the middle column (2''ab'') must be zero, because the deviations ''a'' sum to zero across the five rows: the five sample values add up to exactly five times their own mean, 2052, so subtracting the mean from each value and adding the differences gives zero. The factor 2 and the term ''b'' are the same in every row of the middle column, so they cannot change this. The following statements explain the meaning of the remaining columns:
* The sum of the entries in the first column (''a''²) is the sum of the squares of the distances from the samples to the sample mean;
* The sum of the entries in the last column (''b''²) is the sum of squared distances between the measured sample mean and the correct population mean;
* Every single row now consists of a pair of ''a''² (biased, because the sample mean is used) and ''b''² (the correction of the bias, because it takes the difference between the "real" population mean and the inaccurate sample mean into account). Therefore, the sum of all entries of the first and last columns now represents the correct variance, meaning that the sum of squared distances between the samples and the population mean is now used;
* The sum of the ''a''²-column and the ''b''²-column must be bigger than the sum of the entries of the ''a''²-column alone, since all the entries of the ''b''²-column are positive (except when the population mean is the same as the sample mean, in which case all of the numbers in the last column will be 0).
Therefore:
* The sum of squares of the distance from samples to the ''population'' mean will always be bigger than the sum of squares of the distance to the ''sample'' mean, except when the sample mean happens to be the same as the population mean, in which case the two are equal.
That is why the average of the squares of the deviations from the ''sample'' mean is too small to give an unbiased estimate of the population variance. The smaller the sample size, the larger the difference between the sample variance and the population variance.
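The decomposition above amounts to an exact identity: the sum of squared deviations about the population mean equals the sum about the sample mean plus ''n'' times the squared gap between the two means. A quick Python check with illustrative numbers:

```python
sample = [2051, 2053, 2055, 2050, 2051]  # illustrative sample values
mu = 2050                                # population mean, assumed known here
n = len(sample)
x_bar = sum(sample) / n                  # sample mean: 2052.0

ss_pop = sum((x - mu) ** 2 for x in sample)      # squared deviations from mu
ss_samp = sum((x - x_bar) ** 2 for x in sample)  # squared deviations from x_bar
b = x_bar - mu                                   # gap between the two means

# Sum about the population mean = sum about the sample mean + n * b^2,
# so the sample-mean version is smaller whenever b != 0.
assert ss_pop == ss_samp + n * b ** 2
```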
Terminology
This correction is so common that the terms "sample variance" and "sample standard deviation" are frequently used to mean the corrected estimators (unbiased sample variance, less biased sample standard deviation), using ''n'' − 1. However, caution is needed: some calculators and software packages may provide for both, or only the more unusual formulation. This article uses the following symbols and definitions:
* \mu is the population mean
* \overline{x} is the sample mean
* \sigma^2 is the population variance
* s_n^2 is the biased sample variance (i.e., without Bessel's correction)
* s^2 is the unbiased sample variance (i.e., with Bessel's correction)
The standard deviations will then be the square roots of the respective variances. Since the square root introduces bias, the terminology "uncorrected" and "corrected" is preferred for the standard deviation estimators:
* s_n is the uncorrected sample standard deviation (i.e., without Bessel's correction)
* s is the corrected sample standard deviation (i.e., with Bessel's correction), which is less biased, but still biased
Formula
The sample mean is given by
: \overline{x} = \frac{1}{n} \sum_{i=1}^{n} x_i.
The biased sample variance is then written:
: s_n^2 = \frac{1}{n} \sum_{i=1}^{n} \left(x_i - \overline{x}\right)^2
and the unbiased sample variance is written:
: s^2 = \frac{1}{n-1} \sum_{i=1}^{n} \left(x_i - \overline{x}\right)^2 = \frac{n}{n-1}\, s_n^2.
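These formulas translate directly into Python (standard library only); note that the two variance estimates differ exactly by the factor n/(n − 1):

```python
def sample_mean(xs):
    """Arithmetic mean of the observations."""
    return sum(xs) / len(xs)

def biased_variance(xs):
    """s_n^2: divides by n (no Bessel's correction)."""
    m = sample_mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def unbiased_variance(xs):
    """s^2: divides by n - 1 (Bessel's correction)."""
    m = sample_mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

data = [0.0, 2.0, 5.0, 7.0]  # illustrative data
n = len(data)
# s^2 = (n / (n - 1)) * s_n^2
assert abs(unbiased_variance(data) - biased_variance(data) * n / (n - 1)) < 1e-12
```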
Proof
Suppose thus that X_1, \ldots, X_n are independent and identically distributed random variables with expectation \mu and variance \sigma^2.
Knowing the values of the X_i at an outcome \omega of the underlying sample space, we would like to get a good estimate for the variance \sigma^2, which is unknown. To this end, we construct a mathematical formula containing the X_i such that the expectation of this formula is precisely \sigma^2. This means that on average, this formula should produce the right answer.
The educated, but naive way of guessing the variance formula would be
: \frac{1}{n} \sum_{i=1}^n \left(X_i - \overline{X}\right)^2,
where \overline{X} = \frac{1}{n} \sum_{i=1}^n X_i. This would be the variance if we had a discrete random variable on the discrete probability space \{1, \ldots, n\} (with uniform weights) that had value X_i(\omega) at i. But let us calculate the expected value of this expression:
: \begin{align}
E\left[\frac{1}{n} \sum_{i=1}^n \left(X_i - \overline{X}\right)^2\right]
&= E\left[\frac{1}{n} \sum_{i=1}^n X_i^2 - \overline{X}^2\right]
= \frac{1}{n} \sum_{i=1}^n E\left[X_i^2\right] - E\left[\overline{X}^2\right] \\
&= \left(\sigma^2 + \mu^2\right) - \frac{1}{n^2} \sum_{i=1}^n \sum_{j=1}^n E\left[X_i X_j\right] \\
&= \left(\sigma^2 + \mu^2\right) - \frac{1}{n^2} \Bigl( n \left(\sigma^2 + \mu^2\right) + n(n-1)\mu^2 \Bigr) \\
&= \frac{n-1}{n}\, \sigma^2.
\end{align}
Therefore, our initial guess was off by a factor of \frac{n-1}{n}; multiplying it by \frac{n}{n-1} restores unbiasedness. This is precisely Bessel's correction.
The last step used that the double sum \sum_{i,j} E[X_i X_j] splits into terms with equal and with unequal indices. For independent and identically distributed variables, this results in n multiples of E[X_i^2] = \sigma^2 + \mu^2 (equal indices) and n(n-1) multiples of E[X_i]\,E[X_j] = \mu^2 (unequal indices).
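The computed expectation can be corroborated by simulation; this Python sketch (hypothetical parameters, standard library only) shows the naive divide-by-n estimator averaging to roughly (n − 1)/n times the true variance, not the true variance itself:

```python
import random

random.seed(0)
mu, sigma2, n, trials = 5.0, 4.0, 5, 200_000  # hypothetical parameters

total = 0.0
for _ in range(trials):
    xs = [random.gauss(mu, sigma2 ** 0.5) for _ in range(n)]
    m = sum(xs) / n  # sample mean, re-estimated from each sample
    total += sum((x - m) ** 2 for x in xs) / n  # naive divide-by-n estimate

average = total / trials
expected = (n - 1) / n * sigma2  # 0.8 * 4.0 = 3.2
# The empirical mean is close to (n-1)/n * sigma^2, not sigma^2.
assert abs(average - expected) < 0.05
```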