
In statistics, the 68–95–99.7 rule, also known as the empirical rule, and sometimes abbreviated 3sr or 3σ, is a shorthand used to remember the percentage of values that lie within an interval estimate in a normal distribution: approximately 68%, 95%, and 99.7% of the values lie within one, two, and three standard deviations of the mean, respectively.
In mathematical notation, these facts can be expressed as follows, where Pr is the probability function, X is an observation from a normally distributed random variable, μ (mu) is the mean of the distribution, and σ (sigma) is its standard deviation:

$$\Pr(\mu - 1\sigma \le X \le \mu + 1\sigma) \approx 68.27\%$$
$$\Pr(\mu - 2\sigma \le X \le \mu + 2\sigma) \approx 95.45\%$$
$$\Pr(\mu - 3\sigma \le X \le \mu + 3\sigma) \approx 99.73\%$$
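These three percentages follow directly from the error function, since for a normal distribution Pr(|X − μ| ≤ nσ) = erf(n/√2); a minimal sketch using only the Python standard library confirms them:

```python
from math import erf, sqrt

# P(|X - mu| <= n*sigma) for a normal distribution equals erf(n / sqrt(2))
for n in (1, 2, 3):
    p = erf(n / sqrt(2))
    print(f"within {n} sigma: {p:.4%}")
# within 1 sigma: 68.2689%
# within 2 sigma: 95.4500%
# within 3 sigma: 99.7300%
```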
The usefulness of this heuristic especially depends on the question under consideration.
In the empirical sciences, the so-called three-sigma rule of thumb (or 3σ rule) expresses a conventional heuristic that nearly all values are taken to lie within three standard deviations of the mean, and thus it is empirically useful to treat 99.7% probability as near certainty.
In the social sciences, a result may be considered statistically significant if its confidence level is of the order of a two-sigma effect (95%), while in particle physics, there is a convention of requiring statistical significance of a five-sigma effect (99.99994% confidence) to qualify as a discovery.
A weaker three-sigma rule can be derived from Chebyshev's inequality, stating that even for non-normally distributed variables, at least 88.8% of cases should fall within properly calculated three-sigma intervals. For unimodal distributions, the probability of being within the interval is at least 95% by the Vysochanskij–Petunin inequality. There may be certain assumptions for a distribution that force this probability to be at least 98%.
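The two bounds can be checked numerically. Chebyshev's inequality gives a lower bound of 1 − 1/k² for any finite-variance distribution, while the Vysochanskij–Petunin inequality gives 1 − 4/(9k²) for unimodal distributions (valid for k > √(8/3)):

```python
# Lower bounds on P(|X - mu| <= k*sigma) for k = 3
k = 3
chebyshev = 1 - 1 / k**2      # any distribution with finite variance
vp = 1 - 4 / (9 * k**2)       # unimodal distributions, k > sqrt(8/3)
print(f"Chebyshev: {chebyshev:.4f}")             # 0.8889
print(f"Vysochanskij-Petunin: {vp:.4f}")         # 0.9506
```

Both bounds fall well short of the 99.73% that holds for an exactly normal distribution, which is why the full 68–95–99.7 rule requires normality.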
Proof
We have that

$$\Pr(\mu - n\sigma \le X \le \mu + n\sigma) = \int_{\mu - n\sigma}^{\mu + n\sigma} \frac{1}{\sqrt{2\pi}\,\sigma} e^{-\frac{(x-\mu)^2}{2\sigma^2}} \, dx;$$

doing the change of variable $z = \frac{x - \mu}{\sigma}$ in terms of the standard score, we have

$$\Pr(\mu - n\sigma \le X \le \mu + n\sigma) = \frac{1}{\sqrt{2\pi}} \int_{-n}^{n} e^{-\frac{z^2}{2}} \, dz = \operatorname{erf}\!\left(\frac{n}{\sqrt{2}}\right),$$

and this integral is independent of $\mu$ and $\sigma$. We only need to calculate each integral for the cases $n = 1, 2, 3$.
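The reduction to a standard-normal integral can be verified numerically; the following sketch approximates the integral with the trapezoidal rule (the function name is illustrative) and compares it to erf(n/√2):

```python
from math import erf, exp, pi, sqrt

def std_normal_mass(n, steps=100_000):
    """Approximate (1/sqrt(2*pi)) * integral of exp(-z^2/2) over [-n, n]."""
    h = 2 * n / steps
    zs = [-n + i * h for i in range(steps + 1)]
    ys = [exp(-z * z / 2) for z in zs]
    area = h * (sum(ys) - (ys[0] + ys[-1]) / 2)  # trapezoidal rule
    return area / sqrt(2 * pi)

for n in (1, 2, 3):
    print(n, std_normal_mass(n), erf(n / sqrt(2)))  # the two columns agree
```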
Cumulative distribution function
These numerical values "68%, 95%, 99.7%" come from the cumulative distribution function of the normal distribution.
The prediction interval for any standard score ''z'' corresponds numerically to $1 - 2(1 - \Phi(z))$.
For example, $\Phi(2) \approx 0.9772$, or $\Pr(X \le \mu + 2\sigma) \approx 0.9772$, corresponding to a prediction interval of 95.45%. This is not a symmetrical interval – this is merely the probability that an observation is less than $\mu + 2\sigma$. To compute the probability that an observation is within two standard deviations of the mean (small differences due to rounding):

$$\Pr(\mu - 2\sigma \le X \le \mu + 2\sigma) = \Phi(2) - \Phi(-2) \approx 0.9772 - (1 - 0.9772) \approx 0.9545.$$

This is related to confidence interval as used in statistics: $\bar{X} \pm 2\frac{\sigma}{\sqrt{n}}$ is approximately a 95% confidence interval when $\bar{X}$ is the average of a sample of size $n$.
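These quantities are easy to reproduce, since the standard normal CDF can be written as Φ(z) = (1 + erf(z/√2))/2; a short sketch:

```python
from math import erf, sqrt

def phi(z):
    """Standard normal cumulative distribution function."""
    return (1 + erf(z / sqrt(2))) / 2

one_sided = phi(2)              # P(X <= mu + 2*sigma), about 0.9772
two_sided = phi(2) - phi(-2)    # P(|X - mu| <= 2*sigma), about 0.9545
print(f"{one_sided:.4f} {two_sided:.4f}")
```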
Normality tests
The "68–95–99.7 rule" is often used to quickly get a rough probability estimate of something, given its standard deviation, if the population is assumed to be normal. It is also used as a simple test for outliers if the population is assumed normal, and as a normality test if the population is potentially not normal.
To pass from a sample to a number of standard deviations, one first computes the deviation, either the error or residual depending on whether one knows the population mean or only estimates it. The next step is standardizing (dividing by the population standard deviation), if the population parameters are known, or studentizing (dividing by an estimate of the standard deviation), if the parameters are unknown and only estimated.
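The distinction can be sketched in Python using only the standard library (the function names and sample data here are illustrative, not from any particular package):

```python
from statistics import mean, stdev

def standardize(x, mu, sigma):
    """z-score when the population mean and standard deviation are known."""
    return (x - mu) / sigma

def studentize(x, sample):
    """Approximate score when the parameters must be estimated from the sample."""
    return (x - mean(sample)) / stdev(sample)  # stdev uses the n-1 estimator

sample = [2.1, 1.9, 2.0, 2.2, 1.8, 2.0, 5.0]
print(studentize(5.0, sample))  # the extreme point stands apart from the rest
```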
To use as a test for outliers or a normality test, one computes the size of deviations in terms of standard deviations, and compares this to expected frequency. Given a sample set, one can compute the studentized residuals and compare these to the expected frequency: points that fall more than 3 standard deviations from the norm are likely outliers (unless the sample size is significantly large, by which point one expects a sample this extreme), and if there are many points more than 3 standard deviations from the norm, one likely has reason to question the assumed normality of the distribution. This holds ever more strongly for moves of 4 or more standard deviations.
One can compute more precisely, approximating the number of extreme moves of a given magnitude or greater by a Poisson distribution, but simply, if one has multiple 4 standard deviation moves in a sample of size 1,000, one has strong reason to consider these outliers or question the assumed normality of the distribution.
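The expected count is easy to compute under the normality assumption: the two-sided tail probability beyond 4σ is 1 − erf(4/√2), so in 1,000 draws one expects far fewer than one such move:

```python
from math import erf, sqrt

n = 1000
p_beyond_4sigma = 1 - erf(4 / sqrt(2))   # two-sided tail probability, ~6.3e-5
expected = n * p_beyond_4sigma           # expected number of 4-sigma moves
print(f"expected 4-sigma moves in {n} draws: {expected:.3f}")  # about 0.063
```

Seeing several such moves would therefore be an excess of two orders of magnitude over what normality predicts.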
For example, a 6''σ'' event corresponds to a chance of about two parts per billion. For illustration, if events are taken to occur daily, this would correspond to an event expected every 1.4 million years. This gives a simple normality test: if one witnesses a 6''σ'' in daily data and significantly fewer than 1 million years have passed, then a normal distribution most likely does not provide a good model for the magnitude or frequency of large deviations in this respect.
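The 1.4-million-year figure follows from the same tail computation:

```python
from math import erf, sqrt

p = 1 - erf(6 / sqrt(2))   # two-sided probability of a 6-sigma event, ~2 per billion
days = 1 / p               # expected number of daily events before one such occurrence
years = days / 365.25
print(f"p = {p:.2e}, i.e. about once every {years / 1e6:.1f} million years")
```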
In ''The Black Swan'', Nassim Nicholas Taleb gives the example of risk models according to which the Black Monday crash would correspond to a 36-''σ'' event: the occurrence of such an event should instantly suggest that the model is flawed, i.e. that the process under consideration is not satisfactorily modeled by a normal distribution. Refined models should then be considered, e.g. by the introduction of stochastic volatility. In such discussions it is important to be aware of the problem of the gambler's fallacy, which states that a single observation of a rare event does not contradict that the event is in fact rare. It is the observation of a plurality of purportedly rare events that increasingly undermines the hypothesis that they are rare, i.e. the validity of the assumed model. A proper modelling of this process of gradual loss of confidence in a hypothesis would involve the designation of prior probability not just to the hypothesis itself but to all possible alternative hypotheses. For this reason, statistical hypothesis testing works not so much by confirming a hypothesis considered to be likely, but by refuting hypotheses considered unlikely.
Table of numerical values
Because of the exponentially decreasing tails of the normal distribution, odds of higher deviations decrease very quickly. For normally distributed data, the fraction lying outside μ ± ''n''σ is 1 − erf(''n''/√2): about 1 in 3 for ''n'' = 1, 1 in 22 for ''n'' = 2, 1 in 370 for ''n'' = 3, and roughly 1 in 507 million for ''n'' = 6 (for a daily event, an expected occurrence once every 1.4 million years).
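The full set of values can be regenerated with a short loop, again using the identity for the fraction inside μ ± nσ:

```python
from math import erf, sqrt

# Fraction of a normal population inside mu +/- n*sigma, and the
# corresponding "1 in N" odds of a value falling outside that range.
for n in range(1, 7):
    inside = erf(n / sqrt(2))
    odds = 1 / (1 - inside)   # expect one excursion per `odds` draws
    print(f"{n} sigma: inside {inside:.9f}, outside about 1 in {odds:,.0f}")
```

The printed odds recover the familiar ladder: roughly 1 in 3, 1 in 22, 1 in 370, 1 in 15,787, 1 in 1.7 million, and 1 in 507 million.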
See also
* ''p''-value
* Standard score
* ''t''-statistic
External links
* Calculate percentage proportion within ''x'' sigmas at WolframAlpha