In statistics, and in particular statistical theory, unbiased estimation of a standard deviation is the calculation, from a statistical sample, of an estimated value of the standard deviation (a measure of statistical dispersion) of a population of values, in such a way that the expected value of the calculation equals the true value. Except in some important situations, outlined later, the task has little relevance to applications of statistics, since its need is avoided by standard procedures such as the use of significance tests and confidence intervals, or by using Bayesian analysis.
However, for statistical theory, it provides an exemplar problem in the context of
estimation theory
which is both simple to state and for which results cannot be obtained in closed form. It also provides an example where imposing the requirement for
unbiased estimation might be seen as just adding inconvenience, with no real benefit.
Motivation
In statistics, the standard deviation of a population of numbers is often estimated from a random sample drawn from the population. This is the sample standard deviation, which is defined by
: s = \sqrt{\frac{1}{n-1} \sum_{i=1}^n \left(x_i - \overline{x}\right)^2},
where \{x_1, x_2, \ldots, x_n\} is the sample (formally, realizations from a random variable ''X'') and \overline{x} is the sample mean.
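This definition can be sketched directly in Python using only the standard library (the sample values below are arbitrary illustrative numbers):

```python
import math

def sample_std(xs):
    """Sample standard deviation with the n - 1 (Bessel-corrected) denominator."""
    n = len(xs)
    xbar = sum(xs) / n                      # sample mean
    ss = sum((x - xbar) ** 2 for x in xs)   # sum of squared deviations
    return math.sqrt(ss / (n - 1))

xs = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
s = sample_std(xs)
# Agrees with statistics.stdev, which uses the same n - 1 denominator.
```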
One way of seeing that this is a biased estimator of the standard deviation of the population (if that exists and the samples are drawn independently with replacement) is to start from the result that ''s''² is an unbiased estimator of the population variance σ². The square root is a nonlinear function, and only linear functions commute with taking expectations; since the square root is strictly concave, it follows from Jensen's inequality that the square root of the sample variance is an underestimate, i.e. E[''s''] < σ.
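A quick Monte Carlo sketch makes this underestimation visible (the sample size and trial count here are arbitrary illustrative choices):

```python
import math
import random

random.seed(0)

def sample_std(xs):
    n = len(xs)
    xbar = sum(xs) / n
    return math.sqrt(sum((x - xbar) ** 2 for x in xs) / (n - 1))

# Draw many samples of size n = 5 from N(0, 1), whose true sigma is 1,
# and average the sample standard deviations.  Although E[s^2] = 1,
# Jensen's inequality predicts the average of s sits below 1.
n, trials = 5, 100_000
mean_s = sum(sample_std([random.gauss(0.0, 1.0) for _ in range(n)])
             for _ in range(trials)) / trials
# mean_s comes out near 0.94, visibly below sigma = 1.
```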
The use of ''n'' − 1 instead of ''n'' in the formula for the sample variance is known as
Bessel's correction
, and it gives
: \operatorname{E}\left[s^2\right] = \sigma^2.
It corrects the bias in the estimation of the population ''variance,'' and some, but not all of the bias in the estimation of the population ''standard deviation.''
It is not possible to find an estimate of the standard deviation which is unbiased for all population distributions, as the bias depends on the particular distribution. Much of the following relates to estimation assuming a
normal distribution.
Bias correction
Results for the normal distribution
When the random variable is
normally distributed, a minor correction exists to eliminate the bias. To derive the correction, note that for normally distributed ''X'',
Cochran's theorem In statistics, Cochran's theorem, devised by William G. Cochran, is a theorem used to justify results relating to the probability distributions of statistics that are used in the analysis of variance.
Statement
Let ''U''1, ..., ''U'N'' be i.i. ...
implies that
has a
chi square distribution with
degrees of freedom
Degrees of freedom (often abbreviated df or DOF) refers to the number of independent variables or parameters of a thermodynamic system. In various scientific fields, the word "freedom" is used to describe the limits to which physical movement or ...
and thus its square root,
has a
chi distribution
In probability theory and statistics, the chi distribution is a continuous probability distribution. It is the distribution of the positive square root of the sum of squares of a set of independent random variables each following a standard norm ...
with
degrees of freedom. Consequently, calculating the expectation of this last expression and rearranging constants,
: \operatorname{E}[s] = c_4(n)\,\sigma,
where the correction factor c₄(n) is the scaled mean of the chi distribution with n − 1 degrees of freedom, \mu_1/\sqrt{n-1}. This depends on the sample size ''n,'' and is given as follows:

: c_4(n) = \sqrt{\frac{2}{n-1}}\,\frac{\Gamma\left(\frac{n}{2}\right)}{\Gamma\left(\frac{n-1}{2}\right)},
where Γ(·) is the gamma function. An unbiased estimator of ''σ'' can be obtained by dividing ''s'' by c₄(n). As n grows large, c₄(n) approaches 1, and even for smaller values the correction is minor. The figure shows a plot of c₄(n) versus sample size. The table below gives numerical values of c₄(n) and algebraic expressions for some values of ''n''; more complete tables may be found in most textbooks on statistical quality control.
It is important to keep in mind that this correction only produces an unbiased estimator for normally and independently distributed ''X''. When this condition is satisfied, another result about ''s'' involving c₄(n) is that the standard error of ''s'' is \sigma\sqrt{1-c_4^2}, while the standard error of the unbiased estimator s/c₄(n) is \sigma\sqrt{c_4^{-2}-1}.
Rule of thumb for the normal distribution
If calculation of the function c₄(n) appears too difficult, there is a simple rule of thumb to take the estimator

: \hat\sigma = \sqrt{\frac{1}{n-1.5} \sum_{i=1}^n \left(x_i - \overline{x}\right)^2}.

The formula differs from the familiar expression for ''s''² only by having n − 1.5 instead of n − 1 in the denominator. This expression is only approximate; in fact,

: \operatorname{E}\left[\hat\sigma\right] = \sigma\cdot\left(1 + \frac{1}{16n^2} + \frac{3}{16n^3} + O\left(n^{-4}\right)\right).
The bias is relatively small: say, for n = 3 it is equal to 1.3%, and for n = 9 the bias is already less than 0.1%.
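Since E[s] = c₄(n)σ under normality, the exact relative bias of this rule of thumb can be computed in closed form rather than simulated; a short Python sketch (the function name is illustrative):

```python
import math

def c4(n):
    return math.sqrt(2.0 / (n - 1)) * math.gamma(n / 2) / math.gamma((n - 1) / 2)

def rule_of_thumb_bias(n):
    """Exact relative bias of sqrt(SS / (n - 1.5)) for i.i.d. normal data.

    The estimator equals s * sqrt((n - 1) / (n - 1.5)), and E[s] = c4(n) * sigma,
    so E[estimator] / sigma - 1 = c4(n) * sqrt((n - 1) / (n - 1.5)) - 1.
    """
    return c4(n) * math.sqrt((n - 1) / (n - 1.5)) - 1.0

# The bias is positive and shrinks rapidly as the sample size grows.
```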
Other distributions
In cases where
statistically independent
Independence is a fundamental notion in probability theory, as in statistics and the theory of stochastic processes. Two events are independent, statistically independent, or stochastically independent if, informally speaking, the occurrence of o ...
data are modelled by a parametric family of distributions other than the
normal distribution
In statistics, a normal distribution or Gaussian distribution is a type of continuous probability distribution for a real-valued random variable. The general form of its probability density function is
:
f(x) = \frac e^
The parameter \mu ...
, the population standard deviation will, if it exists, be a function of the parameters of the model. One general approach to estimation would be
maximum likelihood
. Alternatively, it may be possible to use the
Rao–Blackwell theorem
as a route to finding a good estimate of the standard deviation. In neither case would the estimates obtained usually be unbiased. Notionally, theoretical adjustments might be obtainable to lead to unbiased estimates but, unlike those for the normal distribution, these would typically depend on the estimated parameters.
If the requirement is simply to reduce the bias of an estimated standard deviation, rather than to eliminate it entirely, then two practical approaches are available, both within the context of
resampling. These are
jackknifing and bootstrapping
. Both can be applied either to parametrically based estimates of the standard deviation or to the sample standard deviation.
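As an illustration of the resampling route, here is a minimal Python sketch of the standard leave-one-out jackknife bias correction applied to the sample standard deviation (a sketch assuming i.i.d. data; it reduces, but need not eliminate, the bias):

```python
import math
import random

def sample_std(xs):
    n = len(xs)
    xbar = sum(xs) / n
    return math.sqrt(sum((x - xbar) ** 2 for x in xs) / (n - 1))

def jackknife_std(xs):
    """Jackknife bias correction: n * theta_hat - (n - 1) * mean of the
    leave-one-out estimates.  Removes the O(1/n) term of the bias."""
    n = len(xs)
    loo = [sample_std(xs[:i] + xs[i + 1:]) for i in range(n)]
    return n * sample_std(xs) - (n - 1) * sum(loo) / n

# Monte Carlo check against N(0, 1) samples of size 5 (true sigma = 1):
random.seed(1)
trials = 20_000
draws = [[random.gauss(0.0, 1.0) for _ in range(5)] for _ in range(trials)]
mean_s = sum(sample_std(d) for d in draws) / trials        # biased low, ~0.94
mean_jack = sum(jackknife_std(d) for d in draws) / trials  # much closer to 1
```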
For non-normal distributions an approximate (up to ''O''(''n''⁻¹) terms) formula for the unbiased estimator of the standard deviation is

: \hat\sigma = \sqrt{\frac{1}{n - 1.5 - \tfrac{1}{4}\gamma_2} \sum_{i=1}^n \left(x_i - \overline{x}\right)^2},

where γ₂ denotes the population excess kurtosis. The excess kurtosis may be either known beforehand for certain distributions, or estimated from the data.
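A minimal sketch of this kurtosis-adjusted denominator in Python (the function name is illustrative; γ₂ is supplied by the caller):

```python
import math

def kurtosis_corrected_std(xs, gamma2):
    """Approximately unbiased estimate of sigma using the population
    excess kurtosis gamma2 in the denominator n - 1.5 - gamma2 / 4."""
    n = len(xs)
    xbar = sum(xs) / n
    ss = sum((x - xbar) ** 2 for x in xs)
    return math.sqrt(ss / (n - 1.5 - 0.25 * gamma2))

# For a normal population gamma2 = 0 and this reduces to the n - 1.5
# rule of thumb; heavier tails (gamma2 > 0) shrink the denominator
# and inflate the estimate accordingly.
```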
Effect of autocorrelation (serial correlation)
The material above, to stress the point again, applies only to independent data. However, real-world data often does not meet this requirement; it is
autocorrelated (also known as serial correlation). As one example, the successive readings of a measurement instrument that incorporates some form of “smoothing” (more correctly, low-pass filtering) process will be autocorrelated, since any particular value is calculated from some combination of the earlier and later readings.
Estimates of the variance, and standard deviation, of autocorrelated data will be biased. The expected value of the sample variance is
: \operatorname{E}\left[s^2\right] = \sigma^2\left[1 - \frac{2}{n-1}\sum_{k=1}^{n-1}\left(1 - \frac{k}{n}\right)\rho_k\right],

where ρ_k is the autocorrelation of the data at lag ''k''.
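The bracketed factor is easy to evaluate for a hypothesized correlation structure; the sketch below uses an AR(1)-style decay ρ_k = φᵏ as an illustrative assumption (the function name and example values are not from the text):

```python
def variance_bias_factor(n, rho):
    """Multiplier relating E[s^2] to sigma^2 for autocorrelated data:
    1 - (2 / (n - 1)) * sum_{k=1}^{n-1} (1 - k / n) * rho(k)."""
    return 1.0 - (2.0 / (n - 1)) * sum((1.0 - k / n) * rho(k)
                                       for k in range(1, n))

# Independent data (all rho_k = 0) give a factor of exactly 1, so s^2 is
# unbiased; positive autocorrelation, e.g. rho_k = 0.5 ** k, pushes the
# factor below 1, so the usual s^2 underestimates sigma^2.
factor_iid = variance_bias_factor(50, lambda k: 0.0)
factor_ar1 = variance_bias_factor(50, lambda k: 0.5 ** k)
```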