68–95–99.7 rule

In statistics, the 68–95–99.7 rule, also known as the empirical rule, is a shorthand used to remember the percentage of values that lie within an interval estimate in a normal distribution: 68%, 95%, and 99.7% of the values lie within one, two, and three standard deviations of the mean, respectively.

In mathematical notation, these facts can be expressed as follows, where Pr() is the probability function, ''X'' is an observation from a normally distributed random variable, ''μ'' (mu) is the mean of the distribution, and ''σ'' (sigma) is its standard deviation:

:\begin{align}
\Pr(\mu - 1\sigma \le X \le \mu + 1\sigma) &\approx 68.27\% \\
\Pr(\mu - 2\sigma \le X \le \mu + 2\sigma) &\approx 95.45\% \\
\Pr(\mu - 3\sigma \le X \le \mu + 3\sigma) &\approx 99.73\%
\end{align}

The usefulness of this heuristic especially depends on the question under consideration. In the empirical sciences, the so-called three-sigma rule of thumb (or 3''σ'' rule) expresses a conventional heuristic that nearly all values are taken to lie within three standard deviations of the mean, and thus it is empirically useful to treat 99.7% probability as near certainty. In the social sciences, a result may be considered "significant" if its confidence level is of the order of a two-sigma effect (95%), while in particle physics there is a convention of a five-sigma effect (99.99994% confidence) being required to qualify as a discovery.

A weaker three-sigma rule can be derived from Chebyshev's inequality, which states that even for non-normally distributed variables, at least 88.8% of cases should fall within properly calculated three-sigma intervals. For unimodal distributions, the probability of being within the interval is at least 95% by the Vysochanskij–Petunin inequality. There may be certain assumptions for a distribution that force this probability to be at least 98%.
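
These percentages can be verified numerically. A minimal sketch in Python, using only the standard library and the identity Pr(|''X'' − ''μ''| ≤ ''kσ'') = erf(''k''/√2):

```python
import math

def within_k_sigma(k: float) -> float:
    """Probability that a normal observation lies within k standard
    deviations of its mean: Pr(|X - mu| <= k*sigma) = erf(k / sqrt(2))."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"{k} sigma: {within_k_sigma(k):.4%}")
# 1 sigma: 68.2689%
# 2 sigma: 95.4500%
# 3 sigma: 99.7300%
```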


Cumulative distribution function

These numerical values "68%, 95%, 99.7%" come from the cumulative distribution function of the normal distribution. The prediction interval for any standard score ''z'' corresponds numerically to (1 − (1 − Φ(''z''))·2), where Φ is the cumulative distribution function of the standard normal distribution. For example, Φ(2) ≈ 0.9772, or Pr(''X'' ≤ ''μ'' + 2''σ'') ≈ 0.9772, corresponding to a prediction interval of (1 − (1 − 0.97725)·2) = 0.9545 = 95.45%. This is not a symmetrical interval – this is merely the probability that an observation is less than ''μ'' + 2''σ''. To compute the probability that an observation is within two standard deviations of the mean (small differences due to rounding):

:\Pr(\mu - 2\sigma \le X \le \mu + 2\sigma) = \Phi(2) - \Phi(-2) \approx 0.9772 - (1 - 0.9772) \approx 0.9545

This is related to the confidence interval as used in statistics: \bar{x} \pm 2\frac{\sigma}{\sqrt{n}} is approximately a 95% confidence interval when \bar{x} is the average of a sample of size ''n''.
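
A minimal sketch of these computations, assuming Φ is evaluated through the error function as Φ(''z'') = (1 + erf(''z''/√2))/2:

```python
import math

def phi(z: float) -> float:
    """Standard normal CDF: Phi(z) = (1 + erf(z / sqrt(2))) / 2."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2)))

# One-sided: probability that an observation is below mu + 2*sigma.
print(f"{phi(2):.5f}")            # 0.97725

# Two-sided: probability of falling within two standard deviations.
print(f"{phi(2) - phi(-2):.5f}")  # 0.95450

# Approximate 95% confidence interval for the mean of a sample,
# assuming the population standard deviation sigma is known
# (the numbers here are illustrative).
mean, sigma, n = 100.0, 15.0, 36
half_width = 2 * sigma / math.sqrt(n)
print(mean - half_width, mean + half_width)  # 95.0 105.0
```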


Normality tests

The "68–95–99.7 rule" is often used to quickly get a rough probability estimate of something, given its standard deviation, if the population is assumed to be normal. It is also used as a simple test for
outliers In statistics, an outlier is a data point that differs significantly from other observations. An outlier may be due to a variability in the measurement, an indication of novel data, or it may be the result of experimental error; the latter are ...
if the population is assumed normal, and as a
normality test In statistics, normality tests are used to determine if a data set is well-modeled by a normal distribution and to compute how likely it is for a random variable underlying the data set to be normally distributed. More precisely, the tests are a fo ...
if the population is potentially not normal. To pass from a sample to a number of standard deviations, one first computes the deviation, either the error or residual depending on whether one knows the population mean or only estimates it. The next step is standardizing (dividing by the population standard deviation), if the population parameters are known, or
studentizing In statistics, Studentization, named after William Sealy Gosset, who wrote under the pseudonym ''Student'', is the adjustment consisting of division of a first-degree statistic derived from a sample, by a sample-based estimate of a population stan ...
(dividing by an estimate of the standard deviation), if the parameters are unknown and only estimated. To use as a test for outliers or a normality test, one computes the size of deviations in terms of standard deviations, and compares this to expected frequency. Given a sample set, one can compute the
studentized residual In statistics, a studentized residual is the quotient resulting from the division of a residual by an estimate of its standard deviation. It is a form of a Student's ''t''-statistic, with the estimate of error varying between points. This is ...
s and compare these to the expected frequency: points that fall more than 3 standard deviations from the norm are likely outliers (unless the
sample size Sample size determination is the act of choosing the number of observations or replicates to include in a statistical sample. The sample size is an important feature of any empirical study in which the goal is to make inferences about a populatio ...
is significantly large, by which point one expects a sample this extreme), and if there are many points more than 3 standard deviations from the norm, one likely has reason to question the assumed normality of the distribution. This holds ever more strongly for moves of 4 or more standard deviations. One can compute more precisely, approximating the number of extreme moves of a given magnitude or greater by a
Poisson distribution In probability theory and statistics, the Poisson distribution is a discrete probability distribution that expresses the probability of a given number of events occurring in a fixed interval of time or space if these events occur with a known co ...
, but simply, if one has multiple 4 standard deviation moves in a sample of size 1,000, one has strong reason to consider these outliers or question the assumed normality of the distribution. For example, a 6''σ'' event corresponds to a chance of about two
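
A minimal sketch of this procedure, studentizing a sample, flagging points beyond a 3''σ'' cutoff, and approximating the count of 4''σ'' moves in a sample of 1,000 by a Poisson distribution (the toy data and cutoffs are illustrative assumptions):

```python
import math

def studentized(data: list[float]) -> list[float]:
    """Deviations from the sample mean, divided by the sample
    standard deviation (a simple form of studentizing)."""
    n = len(data)
    mean = sum(data) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))
    return [(x - mean) / sd for x in data]

def flag_outliers(data: list[float], cutoff: float = 3.0) -> list[float]:
    """Points lying more than `cutoff` estimated standard deviations
    from the sample mean."""
    return [x for x, z in zip(data, studentized(data)) if abs(z) > cutoff]

sample = [0.0] * 29 + [10.0]  # toy data: one extreme point among 30
print(flag_outliers(sample))   # [10.0]

# Expected number of |z| >= 4 points among 1,000 normal draws, and the
# Poisson-approximated chance of seeing two or more such points.
p_tail = math.erfc(4 / math.sqrt(2))            # Pr(|Z| >= 4) ~ 6.3e-5
lam = 1000 * p_tail                             # expected count, ~0.063
p_two_or_more = 1 - math.exp(-lam) * (1 + lam)  # Pr(K >= 2) ~ 0.002
print(f"expected: {lam:.3f}, Pr(>= 2): {p_two_or_more:.4f}")
```

Under normality, two or more 4''σ'' points in a sample of 1,000 would occur with probability of only about 0.2%, which is why observing several is taken as evidence against the assumed model.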
For example, a 6''σ'' event corresponds to a chance of about two parts per billion. For illustration, if events are taken to occur daily, this would correspond to an event expected every 1.4 million years. This gives a simple normality test: if one witnesses a 6''σ'' event in daily data and significantly fewer than 1 million years have passed, then a normal distribution most likely does not provide a good model for the magnitude or frequency of large deviations in this respect.

In ''The Black Swan'', Nassim Nicholas Taleb gives the example of risk models according to which the Black Monday crash would correspond to a 36-''σ'' event: the occurrence of such an event should instantly suggest that the model is flawed, i.e. that the process under consideration is not satisfactorily modeled by a normal distribution. Refined models should then be considered, e.g. by the introduction of stochastic volatility.

In such discussions it is important to be aware of the problem of the gambler's fallacy, which states that a single observation of a rare event does not contradict that the event is in fact rare. It is the observation of a plurality of purportedly rare events that increasingly undermines the hypothesis that they are rare, i.e. the validity of the assumed model. A proper modelling of this process of gradual loss of confidence in a hypothesis would involve the designation of prior probabilities not just for the hypothesis itself but for all possible alternative hypotheses. For this reason, statistical hypothesis testing works not so much by confirming a hypothesis considered to be likely, but by refuting hypotheses considered unlikely.


Table of numerical values

Because of the exponentially decreasing tails of the normal distribution, the odds of higher deviations decrease very quickly. From the rules for normally distributed data, for a daily event:

Range     Fraction inside   Fraction outside (approx.)      Approx. frequency for daily event
μ ± 1σ    68.27%            31.73%     (1 in 3)             Twice a week
μ ± 2σ    95.45%            4.55%      (1 in 22)            Every three weeks
μ ± 3σ    99.73%            0.27%      (1 in 370)           Yearly
μ ± 4σ    99.9937%          0.0063%    (1 in 15,787)        Every 43 years
μ ± 5σ    99.99994%         0.00006%   (1 in 1,744,278)     Every 4,800 years
μ ± 6σ    99.9999998%       0.0000002% (1 in 506,797,346)   Every 1.4 million years
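
These values can be regenerated with a short script; a minimal sketch using the standard library, with erfc used for numerical accuracy in the far tails:

```python
import math

DAYS_PER_YEAR = 365.25

for k in range(1, 7):
    # Two-sided tail probability: Pr(|X - mu| > k*sigma) = erfc(k / sqrt(2)).
    outside = math.erfc(k / math.sqrt(2))
    days_between = 1 / outside  # expected days between daily exceedances
    print(f"mu +/- {k} sigma: outside 1 in {days_between:,.0f} days "
          f"(about {days_between / DAYS_PER_YEAR:,.1f} years)")
```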


See also

* ''p''-value
* Six Sigma#Sigma levels
* Standard score
* ''t''-statistic



External links

* The Normal Distribution by Balasubramanian Narasimhan
* Calculate percentage proportion within ''x'' sigmas at WolframAlpha