Unbiased estimation of standard deviation

In statistics, and in particular statistical theory, unbiased estimation of a standard deviation is the calculation from a statistical sample of an estimated value of the standard deviation (a measure of statistical dispersion) of a population of values, in such a way that the expected value of the calculation equals the true value. Except in some important situations, outlined later, the task has little relevance to applications of statistics since its need is avoided by standard procedures, such as the use of significance tests and confidence intervals, or by using Bayesian analysis.

However, for statistical theory, it provides an exemplar problem in the context of estimation theory which is both simple to state and for which results cannot be obtained in closed form. It also provides an example where imposing the requirement for unbiased estimation might be seen as just adding inconvenience, with no real benefit.


Motivation

In statistics, the standard deviation of a population of numbers is often estimated from a random sample drawn from the population. This is the sample standard deviation, which is defined by

:s = \sqrt{\frac{1}{n}\sum_{i=1}^n \left(x_i - \overline{x}\right)^2},

where \{x_1, x_2, \ldots, x_n\} is the sample (formally, realizations from a random variable ''X'') and \overline{x} is the sample mean.

One way of seeing that this is a biased estimator of the standard deviation of the population (if that exists and the samples are drawn independently with replacement) is that ''s''2 is already an estimator of the population variance ''σ''2 that is biased low, that is, its expected value falls below ''σ''2. Because only linear functions commute with taking expectations, and the square root is a strictly concave function, it follows from Jensen's inequality that the square root of the sample variance is also biased low. The use of ''n'' − 1 instead of ''n'' in the formula for the sample variance is known as Bessel's correction, and it gives

:s = \sqrt{\frac{1}{n-1}\sum_{i=1}^n \left(x_i - \overline{x}\right)^2}.

It corrects the bias in the estimation of the population ''variance'', and some, but not all, of the bias in the estimation of the population ''standard deviation''. It is not possible to find an estimate of the standard deviation which is unbiased for all population distributions, as the bias depends on the particular distribution. Much of the following relates to estimation assuming a normal distribution.
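As an illustration of the remaining bias in ''s'', a minimal simulation sketch (not part of the original article; the sample size, seed and population parameters are arbitrary) draws many normal samples and compares the average of the Bessel-corrected ''s'' with the true ''σ'':

 import numpy as np
 
 rng = np.random.default_rng(0)
 n, sigma, reps = 5, 2.0, 200_000
 
 samples = rng.normal(loc=0.0, scale=sigma, size=(reps, n))
 s2 = samples.var(axis=1, ddof=1)   # Bessel-corrected sample variance
 s = samples.std(axis=1, ddof=1)    # its square root, the sample standard deviation
 
 print(s2.mean() / sigma**2)        # close to 1: s^2 is unbiased for the variance
 print(s.mean() / sigma)            # about 0.94 for n = 5: s still underestimates sigma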


Bias correction


Results for the normal distribution

When the random variable is normally distributed, a minor correction exists to eliminate the bias. To derive the correction, note that for normally distributed ''X'', Cochran's theorem implies that (n-1)s^2/\sigma^2 has a chi square distribution with ''n'' − 1 degrees of freedom and thus its square root, \sqrt{n-1}\,s/\sigma, has a chi distribution with ''n'' − 1 degrees of freedom. Consequently, calculating the expectation of this last expression and rearranging constants,

:\operatorname{E}[s] = c_4(n)\,\sigma,

where the correction factor c_4(n) is the scaled mean of the chi distribution with ''n'' − 1 degrees of freedom, \mu_1/\sqrt{n-1}. This depends on the sample size ''n'', and is given as follows:

:c_4(n) = \sqrt{\frac{2}{n-1}}\,\frac{\Gamma\left(\frac{n}{2}\right)}{\Gamma\left(\frac{n-1}{2}\right)} = 1 - \frac{1}{4n} - \frac{7}{32n^2} - \frac{19}{128n^3} + O(n^{-4}),

where Γ(·) is the gamma function. An unbiased estimator of ''σ'' can be obtained by dividing ''s'' by c_4(n). As ''n'' grows large, c_4(n) approaches 1, and even for smaller values the correction is minor. The figure shows a plot of c_4(n) versus sample size; numerical values of c_4(n) and algebraic expressions for some values of ''n'' may be found in most textbooks on statistical quality control.

It is important to keep in mind that this correction only produces an unbiased estimator for normally and independently distributed ''X''. When this condition is satisfied, another result about ''s'' involving c_4(n) is that the standard error of ''s'' is \sigma\sqrt{1-c_4^2}, while the standard error of the unbiased estimator is \sigma\sqrt{c_4^{-2}-1}.
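A small numerical sketch of this correction (illustrative only; the function names are ours) evaluates c_4(n) through the log-gamma function for numerical stability and divides ''s'' by it:

 import numpy as np
 from scipy.special import gammaln
 
 def c4(n):
     # c_4(n) = sqrt(2/(n-1)) * Gamma(n/2) / Gamma((n-1)/2), evaluated in log space
     return np.sqrt(2.0 / (n - 1)) * np.exp(gammaln(n / 2) - gammaln((n - 1) / 2))
 
 def unbiased_std_normal(x):
     # Unbiased estimate of sigma for normally and independently distributed data
     x = np.asarray(x, dtype=float)
     return np.std(x, ddof=1) / c4(x.size)
 
 print(c4(5))    # about 0.9400
 print(c4(100))  # about 0.9975, approaching 1 as n grows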


Rule of thumb for the normal distribution

If calculation of the function ''c''4(''n'') appears too difficult, there is a simple rule of thumb: take the estimator

:\hat\sigma = \sqrt{\frac{1}{n-1.5}\sum_{i=1}^n \left(x_i - \overline{x}\right)^2}.

The formula differs from the familiar expression for ''s''2 only by having ''n'' − 1.5 instead of ''n'' − 1 in the denominator. This expression is only approximate; in fact,

:\operatorname{E}\left[\hat\sigma\right] = \sigma\cdot\left(1 + \frac{1}{16n^2} + \frac{3}{16n^3} + O(n^{-4})\right).

The bias is relatively small: for example, for ''n'' = 3 it is equal to 1.3%, and for ''n'' = 9 the bias is already 0.1%.
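In code the rule of thumb is a one-line change to the usual formula (a sketch; the function name is ours):

 import numpy as np
 
 def std_rule_of_thumb(x):
     # Replace n - 1 by n - 1.5 in the denominator of the sample variance
     x = np.asarray(x, dtype=float)
     return np.sqrt(np.sum((x - x.mean()) ** 2) / (x.size - 1.5))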


Other distributions

In cases where statistically independent data are modelled by a parametric family of distributions other than the normal distribution, the population standard deviation will, if it exists, be a function of the parameters of the model. One general approach to estimation would be maximum likelihood. Alternatively, it may be possible to use the Rao–Blackwell theorem as a route to finding a good estimate of the standard deviation. In neither case would the estimates obtained usually be unbiased. Notionally, theoretical adjustments might be obtainable to lead to unbiased estimates but, unlike those for the normal distribution, these would typically depend on the estimated parameters.

If the requirement is simply to reduce the bias of an estimated standard deviation, rather than to eliminate it entirely, then two practical approaches are available, both within the context of resampling. These are jackknifing and bootstrapping. Both can be applied either to parametrically based estimates of the standard deviation or to the sample standard deviation.

For non-normal distributions an approximate (up to ''O''(''n''−1) terms) formula for the unbiased estimator of the standard deviation is

:\hat\sigma = \sqrt{\frac{1}{n - 1.5 - \tfrac{1}{4}\gamma_2}\sum_{i=1}^n \left(x_i - \overline{x}\right)^2},

where ''γ''2 denotes the population excess kurtosis. The excess kurtosis may be either known beforehand for certain distributions, or estimated from the data.
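A sketch of this approximate estimator (the function name is ours; ''γ''2 is taken as known here, though it could instead be estimated from the data, for example with scipy.stats.kurtosis):

 import numpy as np
 
 def std_excess_kurtosis(x, gamma2):
     # Approximate reduced-bias estimator: denominator n - 1.5 - gamma2/4
     x = np.asarray(x, dtype=float)
     return np.sqrt(np.sum((x - x.mean()) ** 2) / (x.size - 1.5 - 0.25 * gamma2))
 
 # Example: for a Laplace (double exponential) population the excess kurtosis is 3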


Effect of autocorrelation (serial correlation)

The material above, to stress the point again, applies only to independent data. However, real-world data often do not meet this requirement; they are autocorrelated (also known as serial correlation). As one example, the successive readings of a measurement instrument that incorporates some form of "smoothing" (more correctly, low-pass filtering) process will be autocorrelated, since any particular value is calculated from some combination of the earlier and later readings.

Estimates of the variance, and standard deviation, of autocorrelated data will be biased. The expected value of the sample variance is

:\operatorname{E}\left[s^2\right] = \sigma^2\left[1 - \frac{2}{n-1}\sum_{k=1}^{n-1}\left(1 - \frac{k}{n}\right)\rho_k\right],

where ''n'' is the sample size (number of measurements) and \rho_k is the autocorrelation function (ACF) of the data. (Note that the expression in the brackets is simply one minus the average expected autocorrelation for the readings.) If the ACF consists of positive values then the estimate of the variance (and its square root, the standard deviation) will be biased low. That is, the actual variability of the data will be greater than that indicated by an uncorrected variance or standard deviation calculation.

It is essential to recognize that, if this expression is to be used to correct for the bias, by dividing the estimate s^2 by the quantity in brackets above, then the ACF must be known analytically, not via estimation from the data. This is because the estimated ACF will itself be biased.
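The bracketed factor is straightforward to evaluate when the ACF is known analytically; a sketch (function name ours) is:

 import numpy as np
 
 def variance_bias_factor(n, rho):
     # 1 - (2/(n-1)) * sum_{k=1}^{n-1} (1 - k/n) * rho_k : the factor multiplying sigma^2 in E[s^2]
     k = np.arange(1, n)
     return 1.0 - (2.0 / (n - 1)) * np.sum((1.0 - k / n) * rho(k))
 
 # Example with a geometrically decaying ACF rho_k = 0.5**k (hypothetical filter):
 print(variance_bias_factor(10, lambda k: 0.5 ** k))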


Example of bias in standard deviation

To illustrate the magnitude of the bias in the standard deviation, consider a dataset that consists of sequential readings from an instrument that uses a specific digital filter whose ACF is known to be given by

:\rho_k = (1 - \alpha)^k,

where ''α'' is the parameter of the filter, and it takes values from zero to unity. Thus the ACF is positive and geometrically decreasing. The figure shows the ratio of the estimated standard deviation to its known value (which can be calculated analytically for this digital filter), for several settings of ''α'' as a function of sample size ''n''. Changing ''α'' alters the variance reduction ratio of the filter, which is known to be

:\mathrm{VRR} = \frac{\alpha}{2-\alpha},

so that smaller values of ''α'' result in more variance reduction, or "smoothing". The bias is indicated by values on the vertical axis different from unity; that is, if there were no bias, the ratio of the estimated to known standard deviation would be unity. Clearly, for modest sample sizes there can be significant bias (a factor of two, or more).
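A simulation sketch of this example (the filter parameter, sample size and replication count are arbitrary choices, not values from the article) passes unit-variance white noise through the first-order filter y_i = \alpha x_i + (1-\alpha) y_{i-1}, whose ACF is (1-\alpha)^k, and compares the sample standard deviation with the filter's known output ''σ'':

 import numpy as np
 
 rng = np.random.default_rng(1)
 alpha, n, reps = 0.2, 50, 2_000
 true_sigma = np.sqrt(alpha / (2.0 - alpha))     # output sigma for unit-variance input, from the VRR
 
 ratios = []
 for _ in range(reps):
     x = rng.normal(size=n + 500)                # extra samples let the filter reach steady state
     y = np.empty_like(x)
     y[0] = alpha * x[0]
     for i in range(1, x.size):
         y[i] = alpha * x[i] + (1.0 - alpha) * y[i - 1]
     ratios.append(np.std(y[-n:], ddof=1) / true_sigma)
 
 print(np.mean(ratios))                          # noticeably below 1: the estimate is biased low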


Variance of the mean

It is often of interest to estimate the variance or standard deviation of an estimated mean rather than the variance of a population. When the data are autocorrelated, this has a direct effect on the theoretical variance of the sample mean, which is

:\operatorname{Var}\left[\overline{x}\right] = \frac{\sigma^2}{n}\left[1 + 2\sum_{k=1}^{n-1}\left(1 - \frac{k}{n}\right)\rho_k\right].

The variance of the sample mean can then be estimated by substituting an estimate of ''σ''2. One such estimate can be obtained from the equation for E[''s''2] given above. First define the following constants, assuming, again, a known ACF:

:\gamma_1 \equiv 1 - \frac{2}{n-1}\sum_{k=1}^{n-1}\left(1 - \frac{k}{n}\right)\rho_k
:\gamma_2 \equiv 1 + 2\sum_{k=1}^{n-1}\left(1 - \frac{k}{n}\right)\rho_k

so that

:\operatorname{E}\left[s^2\right] = \sigma^2\gamma_1 \quad\Rightarrow\quad \operatorname{E}\left[\frac{s^2}{\gamma_1}\right] = \sigma^2.

This says that the expected value of the quantity obtained by dividing the observed sample variance by the correction factor \gamma_1 gives an unbiased estimate of the variance. Similarly, re-writing the expression above for the variance of the mean,

:\operatorname{Var}\left[\overline{x}\right] = \frac{\sigma^2}{n}\gamma_2

and substituting the estimate for \sigma^2 gives (Law and Kelton, p. 285)

:\widehat{\operatorname{Var}}\left[\overline{x}\right] = \operatorname{E}\left[\frac{s^2}{\gamma_1}\left(\frac{\gamma_2}{n}\right)\right] = \operatorname{E}\left[\frac{s^2}{n}\left\{\frac{n-1}{\frac{n}{\gamma_2}-1}\right\}\right],

which is an unbiased estimator of the variance of the mean in terms of the observed sample variance and known quantities. If the autocorrelations \rho_k are identically zero, this expression reduces to the well-known result for the variance of the mean for independent data. The effect of the expectation operator in these expressions is that the equality holds in the mean (i.e., on average).
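A sketch of these estimates in code (function names ours; the ACF \rho_k is again assumed known analytically):

 import numpy as np
 
 def gamma_factors(n, rho):
     # gamma_1 and gamma_2 for a known ACF rho_k
     k = np.arange(1, n)
     w = np.sum((1.0 - k / n) * rho(k))
     return 1.0 - (2.0 / (n - 1)) * w, 1.0 + 2.0 * w
 
 def var_of_mean(s2, n, rho):
     # Unbiased estimate of Var[x-bar]: (s^2 / gamma_1) * (gamma_2 / n)
     g1, g2 = gamma_factors(n, rho)
     return (s2 / g1) * (g2 / n)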


Estimating the standard deviation of the population

Having the expressions above involving the variance of the population, and of an estimate of the mean of that population, it would seem logical to simply take the square root of these expressions to obtain unbiased estimates of the respective standard deviations. However it is the case that, since expectations are integrals,

:\operatorname{E}[s] \ne \sqrt{\operatorname{E}\left[s^2\right]} = \sigma\sqrt{\gamma_1}.

Instead, assume a function ''θ'' exists such that an unbiased estimator of the standard deviation can be written

:\operatorname{E}[s] = \sigma\theta\sqrt{\gamma_1} \quad\Rightarrow\quad \hat\sigma = \frac{s}{\theta\sqrt{\gamma_1}},

and ''θ'' depends on the sample size ''n'' and the ACF. In the case of NID (normally and independently distributed) data, the radicand is unity and ''θ'' is just the ''c''4 function given in the first section above. As with ''c''4, ''θ'' approaches unity as the sample size increases (as does ''γ''1).

It can be demonstrated via simulation modeling that ignoring ''θ'' (that is, taking it to be unity) and using

:\operatorname{E}[s] \approx \sigma\sqrt{\gamma_1} \quad\Rightarrow\quad \hat\sigma \approx \frac{s}{\sqrt{\gamma_1}}

removes all but a few percent of the bias caused by autocorrelation, making this a ''reduced''-bias estimator, rather than an ''un''biased estimator. In practical measurement situations, this reduction in bias can be significant, and useful, even if some relatively small bias remains. The figure above, showing an example of the bias in the standard deviation vs. sample size, is based on this approximation; the actual bias would be somewhat larger than indicated in those graphs since the transformation bias ''θ'' is not included there.
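A sketch of this reduced-bias estimator, with ''θ'' taken to be unity (function name ours):

 import numpy as np
 
 def sigma_reduced_bias(x, rho):
     # s / sqrt(gamma_1); ignoring theta leaves a small residual bias
     x = np.asarray(x, dtype=float)
     n = x.size
     k = np.arange(1, n)
     g1 = 1.0 - (2.0 / (n - 1)) * np.sum((1.0 - k / n) * rho(k))
     return np.std(x, ddof=1) / np.sqrt(g1)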


Estimating the standard deviation of the sample mean

The unbiased variance of the mean in terms of the population variance and the ACF is given by

:\operatorname{Var}\left[\overline{x}\right] = \frac{\sigma^2}{n}\gamma_2,

and since there are no expected values here, in this case the square root can be taken, so that

:\sigma_{\overline{x}} = \frac{\sigma}{\sqrt{n}}\sqrt{\gamma_2}.

Using the unbiased estimate expression above for ''σ'', an estimate of the standard deviation of the mean will then be

:\hat\sigma_{\overline{x}} = \frac{s}{\theta\sqrt{n}}\,\frac{\sqrt{\gamma_2}}{\sqrt{\gamma_1}}.

If the data are NID, so that the ACF vanishes, this reduces to

:\hat\sigma_{\overline{x}} = \frac{s}{c_4(n)\sqrt{n}}.

In the presence of a nonzero ACF, ignoring the function ''θ'' as before leads to the ''reduced''-bias estimator

:\hat\sigma_{\overline{x}} \approx \frac{s}{\sqrt{n}}\,\frac{\sqrt{\gamma_2}}{\sqrt{\gamma_1}} = \frac{s}{\sqrt{n}}\sqrt{\frac{n-1}{\frac{n}{\gamma_2}-1}},

which again can be demonstrated to remove a useful majority of the bias.
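And a corresponding sketch for the reduced-bias standard deviation of the sample mean, again with ''θ'' set to unity (function name ours):

 import numpy as np
 
 def sigma_of_mean_reduced_bias(x, rho):
     # (s / sqrt(n)) * sqrt(gamma_2 / gamma_1)
     x = np.asarray(x, dtype=float)
     n = x.size
     k = np.arange(1, n)
     w = np.sum((1.0 - k / n) * rho(k))
     g1 = 1.0 - (2.0 / (n - 1)) * w
     g2 = 1.0 + 2.0 * w
     return np.std(x, ddof=1) / np.sqrt(n) * np.sqrt(g2 / g1)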


See also

* Bessel's correction
* Estimation of covariance matrices
* Sample mean and sample covariance


References

* Douglas C. Montgomery and George C. Runger, ''Applied Statistics and Probability for Engineers'', 3rd edition, John Wiley & Sons, 2003 (see Sections 7–2.2 and 16–5).


External links

* Java interactive graphic showing the Helmert PDF from which the bias correction factors are derived.
* Monte-Carlo simulation demo for unbiased estimation of standard deviation.
* What are Variables Control Charts? http://www.itl.nist.gov/div898/handbook/pmc/section3/pmc32.htm