Chebyshev Inequality

In probability theory, Chebyshev's inequality (also called the Bienaymé–Chebyshev inequality) guarantees that, for a wide class of probability distributions, no more than a certain fraction of values can be more than a certain distance from the mean. Specifically, no more than 1/''k''2 of the distribution's values can be ''k'' or more standard deviations away from the mean (or equivalently, at least 1 − 1/''k''2 of the distribution's values are less than ''k'' standard deviations away from the mean). In statistics, the rule is often called Chebyshev's theorem, about the range of standard deviations around the mean. The inequality has great utility because it can be applied to any probability distribution in which the mean and variance are defined. For example, it can be used to prove the weak law of large numbers. Its practical usage is similar to the 68–95–99.7 rule, which applies only to normal distributions. Chebyshev's inequality is more general: for a broad range of probability distributions, at least 75% of values must lie within two standard deviations of the mean and at least 88.89% within three standard deviations. The term ''Chebyshev's inequality'' may also refer to Markov's inequality, especially in the context of analysis. The two are closely related, and some authors refer to Markov's inequality as "Chebyshev's first inequality" and to the inequality discussed on this page as "Chebyshev's second inequality".


History

The theorem is named after Russian mathematician Pafnuty Chebyshev, although it was first formulated by his friend and colleague Irénée-Jules Bienaymé. The theorem was first stated without proof by Bienaymé in 1853 and later proved by Chebyshev in 1867. His student Andrey Markov provided another proof in his 1884 Ph.D. thesis (Markov A. (1884) On certain applications of algebraic continued fractions, Ph.D. thesis, St. Petersburg).


Statement

Chebyshev's inequality is usually stated for random variables, but can be generalized to a statement about measure spaces.


Probabilistic statement

Let ''X'' be an integrable random variable with finite expected value ''μ'' and finite non-zero variance ''σ''2. Then for any real number ''k'' > 0,
: \Pr(|X-\mu| \geq k\sigma) \leq \frac{1}{k^2}.
Only the case ''k'' > 1 is useful: when ''k'' ≤ 1 the right-hand side \frac{1}{k^2} \geq 1, so the inequality is trivial, as all probabilities are at most 1. As an example, using k = \sqrt{2} shows that the probability that values lie outside the interval (\mu - \sqrt{2}\sigma, \mu + \sqrt{2}\sigma) does not exceed \frac{1}{2}. Because it can be applied to completely arbitrary distributions provided they have a known finite mean and variance, the inequality generally gives a poor bound compared to what can be deduced when more is known about the distribution involved.
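The bound can be checked numerically. The following is a minimal Monte Carlo sketch; NumPy and an exponential test distribution are assumptions made purely for illustration, since the inequality holds for any distribution with finite mean and variance.

```python
# Empirical check of Pr(|X - mu| >= k*sigma) <= 1/k^2 for an arbitrary distribution.
import numpy as np

rng = np.random.default_rng(0)
samples = rng.exponential(scale=1.0, size=1_000_000)   # mean 1, variance 1
mu, sigma = samples.mean(), samples.std()

for k in (1.5, 2.0, 3.0):
    empirical = np.mean(np.abs(samples - mu) >= k * sigma)
    print(f"k={k}: empirical tail {empirical:.4f} <= bound {1 / k**2:.4f}")
```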


Measure-theoretic statement

Let (''X'', Σ, μ) be a measure space, and let ''f'' be an extended real-valued measurable function defined on ''X''. Then for any real number ''t'' > 0 and 0 < ''p'' < ∞,
:\mu(\{x \in X : |f(x)| \geq t\}) \leq \frac{1}{t^p} \int_{|f| \geq t} |f|^p \, d\mu.
More generally, if ''g'' is an extended real-valued measurable function, nonnegative and nondecreasing, with g(t) \neq 0, then
:\mu(\{x \in X : f(x) \geq t\}) \leq \frac{1}{g(t)} \int_X g\circ f\, d\mu.
The previous statement then follows by applying this to |''f''| with g(x) defined as x^p if x \ge t and 0 otherwise.
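For instance, the probabilistic statement above is recovered by taking the probability space as the measure space, ''f'' = |''X'' − ''μ''|, ''p'' = 2 and ''t'' = ''kσ'':
: \Pr(|X - \mu| \ge k\sigma) \le \frac{1}{(k\sigma)^2} \int_{|X-\mu| \ge k\sigma} |X - \mu|^2 \, d\Pr \le \frac{\sigma^2}{k^2\sigma^2} = \frac{1}{k^2}.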


Example

Suppose we randomly select a journal article from a source with an average of 1000 words per article, with a standard deviation of 200 words. We can then infer that the probability that it has between 600 and 1400 words (i.e. within ''k'' = 2 standard deviations of the mean) must be at least 75%, because by Chebyshev's inequality there is no more than a 1/''k''2 = 1/4 chance of being outside that range. But if we additionally know that the distribution is normal, we can say there is a 75% chance the word count is between 770 and 1230 (which is an even tighter bound).
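The numbers in this example can be reproduced with a short calculation; the sketch below uses only the Python standard library, and the normal-distribution figures are under the added assumption of normality.

```python
# Worked numbers for the word-count example: mean 1000, sd 200, k = 2.
from statistics import NormalDist

mu, sd, k = 1000, 200, 2
print(mu - k * sd, mu + k * sd)     # 600 1400
print(1 - 1 / k**2)                 # 0.75 -- Chebyshev: at least 75% inside, any distribution

# Under the additional assumption of normality, the central 75% lies within
# about 1.15 standard deviations of the mean:
z = NormalDist().inv_cdf(0.875)
print(round(mu - z * sd), round(mu + z * sd))   # about 770 and 1230
```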


Sharpness of bounds

As shown in the example above, the theorem typically provides rather loose bounds. However, these bounds cannot in general (remaining true for arbitrary distributions) be improved upon. The bounds are sharp for the following example: for any ''k'' ≥ 1,
: X = \begin{cases} -1, & \text{with probability } \frac{1}{2k^2} \\ 0, & \text{with probability } 1 - \frac{1}{k^2} \\ 1, & \text{with probability } \frac{1}{2k^2} \end{cases}
For this distribution, the mean ''μ'' = 0 and the standard deviation ''σ'' = 1/''k'', so
: \Pr(|X-\mu| \ge k\sigma) = \Pr(|X| \ge 1) = \frac{1}{k^2}.
Chebyshev's inequality is an equality for precisely those distributions that are a linear transformation of this example.
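A direct numerical check of this extremal distribution (a small sketch, using only the standard library) confirms that the bound is attained exactly:

```python
import math

def check(k):
    # values and probabilities of the three-point extremal distribution
    vals = [-1.0, 0.0, 1.0]
    probs = [1 / (2 * k**2), 1 - 1 / k**2, 1 / (2 * k**2)]
    mu = sum(p * v for p, v in zip(probs, vals))                          # 0
    sigma = math.sqrt(sum(p * (v - mu) ** 2 for p, v in zip(probs, vals)))  # 1/k
    # tail probability Pr(|X - mu| >= k*sigma); the tiny tolerance guards
    # against floating-point rounding in k*sigma
    tail = sum(p for p, v in zip(probs, vals) if abs(v - mu) >= k * sigma - 1e-12)
    return tail, 1 / k**2

for k in (1.5, 2, 4):
    print(check(k))   # the two numbers agree: the bound is attained
```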


Proof

Markov's inequality states that for any real-valued random variable ''Y'' and any positive number ''a'', we have Pr(|''Y''| ≥ ''a'') ≤ E(|''Y''|)/''a''. One way to prove Chebyshev's inequality is to apply Markov's inequality to the random variable ''Y'' = (''X'' − ''μ'')2 with ''a'' = (''kσ'')2:
: \Pr(|X - \mu| \geq k\sigma) = \Pr((X - \mu)^2 \geq k^2\sigma^2) \leq \frac{\operatorname{E}[(X - \mu)^2]}{k^2\sigma^2} = \frac{\sigma^2}{k^2\sigma^2} = \frac{1}{k^2}.
It can also be proved directly using conditional expectation:
:\begin{align} \sigma^2 &= \operatorname{E}[(X-\mu)^2] \\ &= \operatorname{E}[(X-\mu)^2 \mid |X-\mu| \geq k\sigma]\,\Pr(|X-\mu| \geq k\sigma) + \operatorname{E}[(X-\mu)^2 \mid |X-\mu| < k\sigma]\,\Pr(|X-\mu| < k\sigma) \\ &\geq (k\sigma)^2\Pr(|X-\mu| \geq k\sigma) + 0\cdot\Pr(|X-\mu| < k\sigma) \\ &= k^2\sigma^2\Pr(|X-\mu| \geq k\sigma). \end{align}
Chebyshev's inequality then follows by dividing by ''k''2''σ''2. This proof also shows why the bounds are quite loose in typical cases: the conditional expectation on the event where |''X'' − ''μ''| < ''kσ'' is thrown away, and the lower bound of ''k''2''σ''2 on the event |''X'' − ''μ''| ≥ ''kσ'' can be quite poor.


Extensions

Several extensions of Chebyshev's inequality have been developed.


Selberg's inequality

Selberg derived a generalization of Chebyshev's inequality to arbitrary intervals. Suppose ''X'' is a random variable with mean ''μ'' and variance ''σ''2. Selberg's inequality states that for ''α'', ''β'' ≥ 0,
: \Pr( X \in [\mu - \alpha, \mu + \beta]) \ge \begin{cases} \dfrac{\alpha^2}{\alpha^2 + \sigma^2} & \text{if } \alpha(\beta-\alpha) \geq 2\sigma^2 \\[6pt] \dfrac{4\alpha\beta - 4\sigma^2}{(\alpha + \beta)^2} & \text{if } 2\alpha\beta \geq \sigma^2 \geq \alpha(\beta - \alpha) \\[6pt] 0 & \text{if } \sigma^2 \geq \alpha\beta \end{cases}
When \alpha = \beta, this reduces to Chebyshev's inequality. These are known to be the best possible bounds.


Finite-dimensional vector

Chebyshev's inequality naturally extends to the multivariate setting, where one has ''n'' random variables ''X''i with mean ''μ''i and variance ''σ''i2. Then the following inequality holds:
:\Pr\left(\sum_{i=1}^n (X_i - \mu_i)^2 \ge k^2 \sum_{i=1}^n \sigma_i^2 \right) \le \frac{1}{k^2}.
This is known as the Birnbaum–Raymond–Zuckerman inequality after the authors who proved it for two dimensions. This result can be rewritten in terms of vectors ''X'' = (''X''1, ''X''2, ...) with mean ''μ'' = (''μ''1, ''μ''2, ...) and standard deviation ''σ'' = (''σ''1, ''σ''2, ...), in the Euclidean norm ||·||:
: \Pr(\| X - \mu \| \ge k \| \sigma \| ) \le \frac{1}{k^2} .
One can also get a similar infinite-dimensional Chebyshev's inequality. A second related inequality has also been derived by Chen. Let ''n'' be the dimension of the stochastic vector ''X'' and let E(''X'') be the mean of ''X''. Let ''S'' be the covariance matrix and ''k'' > 0. Then
: \Pr \left( ( X - \operatorname{E}(X) )^T S^{-1} (X - \operatorname{E}(X)) < k \right) \ge 1 - \frac{n}{k}
where ''Y''T is the transpose of ''Y''. The inequality can be written in terms of the Mahalanobis distance as
: \Pr \left( d^2_S(X,\operatorname{E}(X)) < k \right) \ge 1 - \frac{n}{k}
where the Mahalanobis distance based on ''S'' is defined by
: d_S(x,y) = \sqrt{(x-y)^T S^{-1} (x-y)} .
Navarro proved that these bounds are sharp; that is, they are the best possible bounds for those regions when we only know the mean and the covariance matrix of ''X''. Stellato et al. showed that this multivariate version of the Chebyshev inequality can be derived analytically as a special case of Vandenberghe et al., where the bound is computed by solving a semidefinite program (SDP).
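The Mahalanobis-distance form can be checked by simulation. The sketch below is a Monte Carlo illustration only; the correlated Gaussian test vector, the particular covariance matrix and NumPy itself are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 3
A = rng.normal(size=(n, n))
S = A @ A.T + n * np.eye(n)                  # a positive-definite covariance matrix
mean = np.array([1.0, -2.0, 0.5])
X = rng.multivariate_normal(mean, S, size=200_000)

# squared Mahalanobis distance of each sample from the mean
d2 = np.einsum('ij,jk,ik->i', X - mean, np.linalg.inv(S), X - mean)
for k in (5.0, 10.0, 20.0):
    print(np.mean(d2 < k), ">=", 1 - n / k)
```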


Known correlation

If the variables are independent this inequality can be sharpened:
:\Pr\left (\bigcap_{i=1}^n \frac{|X_i - \mu_i|}{\sigma_i} \le k_i \right ) \ge \prod_{i=1}^n \left (1 - \frac{1}{k_i^2} \right).
Berge derived an inequality for two correlated variables ''X''1, ''X''2. Let ''ρ'' be the correlation coefficient between ''X''1 and ''X''2 and let ''σ''i2 be the variance of ''X''i. Then
: \Pr\left( \bigcap_{i=1}^2 \left[ \frac{|X_i - \mu_i|}{\sigma_i} < k \right]\right) \ge 1 - \frac{1 + \sqrt{1 - \rho^2}}{k^2} .
This result can be sharpened to having different bounds for the two random variables (Lal D. N. (1955) A note on a form of Tchebycheff's inequality for two or more variables. Sankhya 15(3):317–320) and to having asymmetric bounds, as in Selberg's inequality (Isii K. (1959) On a method for generalizations of Tchebycheff's inequality. Ann Inst Stat Math 10: 65–88). Olkin and Pratt derived an inequality for ''n'' correlated variables:
: \Pr\left(\bigcap_{i=1}^n \frac{|X_i - \mu_i|}{\sigma_i} < k_i \right) \ge 1 - \frac{1}{n^2}\left(\sqrt{u} + \sqrt{n-1}\, \sqrt{n \sum_{i=1}^n \frac{1}{k_i^2} - u} \right)^2
where the sum is taken over the ''n'' variables and
: u = \sum_{i=1}^n \frac{1}{k_i^2} + 2\sum_{i=1}^n \sum_{j<i} \frac{\rho_{ij}}{k_i k_j}
where ''ρ''ij is the correlation between ''X''i and ''X''j. Olkin and Pratt's inequality was subsequently generalised by Godwin (Godwin H. J. (1964) Inequalities on distribution functions. New York, Hafner Pub. Co.).


Higher moments

Mitzenmacher and Upfal note that by applying Markov's inequality to the nonnegative variable | X - \operatorname{E}(X) |^n, one can get a family of tail bounds
: \Pr\left(| X - \operatorname{E}(X) | \ge k \operatorname{E}\left(| X - \operatorname{E}(X) |^n \right)^{1/n}\right) \le \frac{1}{k^n} , \qquad k >0, n \geq 2.
For ''n'' = 2 we obtain Chebyshev's inequality. For ''k'' ≥ 1, ''n'' > 4 and assuming that the ''n''th moment exists, this bound is tighter than Chebyshev's inequality. This strategy, called the method of moments, is often used to prove tail bounds.
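The gain from higher moments can be seen numerically. The sketch below is illustrative only; the Student ''t'' test distribution (chosen so that the required moments exist) and NumPy are assumptions of the example.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.standard_t(df=10, size=1_000_000)   # Student t(10): moments up to order 9 exist
centered = x - x.mean()
k = 3.0
for n in (2, 4, 6):
    # scale = E(|X - E X|^n)^(1/n); the bound is 1/k^n
    scale = np.mean(np.abs(centered) ** n) ** (1 / n)
    empirical = np.mean(np.abs(centered) >= k * scale)
    print(f"n={n}: empirical tail {empirical:.2e} <= bound {k ** -n:.2e}")
```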


Exponential moment

A related inequality sometimes known as the exponential Chebyshev's inequality is the inequality
: \Pr(X \ge \varepsilon) \le e^{-t\varepsilon}\operatorname{E}\left (e^{tX} \right), \qquad t > 0.
Let ''K''(''t'') be the cumulant generating function,
: K( t ) = \log \left(\operatorname{E}\left( e^{tX} \right) \right).
Taking the Legendre–Fenchel transformation of ''K''(''t'') and using the exponential Chebyshev's inequality, we have
: -\log( \Pr (X \ge \varepsilon )) \ge \sup_t( t \varepsilon - K( t ) ).
This inequality may be used to obtain exponential inequalities for unbounded variables.
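As a concrete instance, for a standard normal ''X'' the cumulant generating function is ''K''(''t'') = ''t''2/2, so the supremum equals ''ε''2/2 and the bound is e−''ε''2/2. A minimal check using only the standard library:

```python
from statistics import NormalDist
import math

for eps in (1.0, 2.0, 3.0):
    chernoff = math.exp(-eps**2 / 2)          # exp(-sup_t(t*eps - K(t))) for K(t) = t^2/2
    exact = 1 - NormalDist().cdf(eps)
    print(f"eps={eps}: Pr(X >= eps) = {exact:.4g} <= {chernoff:.4g}")
```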


Bounded variables

If P(''x'') has finite support based on the interval [''a'', ''b''], let ''M'' = max(|''a''|, |''b''|), where |''x''| is the absolute value of ''x''. If the mean of P(''x'') is zero then for all ''k'' > 0 and ''r'' > 0 (Dufour (2003) Properties of moments of random variables),
: \frac{\operatorname{E}(|X|^r) - k^r}{M^r} \le \Pr( | X | \ge k ) \le \frac{\operatorname{E}(|X|^r)}{k^r}.
The second of these inequalities, with ''r'' = 2, is the Chebyshev bound. The first provides a lower bound for the value of P(''x'').


Finite samples


Univariate case

Saw ''et al'' extended Chebyshev's inequality to cases where the population mean and variance are not known and may not exist, but the sample mean and sample standard deviation from ''N'' samples are to be employed to bound the expected value of a new drawing from the same distribution. The following simpler version of this inequality is given by Kabán:
: P( | X - m | \ge ks ) \le \frac{1}{N+1} \left\lfloor \frac{N+1}{N} \left(\frac{N-1}{k^2} + 1 \right) \right\rfloor
where ''X'' is a random variable which we have sampled ''N'' times, ''m'' is the sample mean, ''k'' is a constant and ''s'' is the sample standard deviation. This inequality holds even when the population moments do not exist, and when the sample is only weakly exchangeably distributed; this criterion is met for randomised sampling. A table of values for the Saw–Yang–Mo inequality for finite sample sizes (''N'' < 100) has been determined by Konijn. The table allows the calculation of various confidence intervals for the mean, based on multiples, ''C'', of the standard error of the mean as calculated from the sample. For example, Konijn shows that for ''N'' = 59, the 95 percent confidence interval for the mean is about 2.28 times wider than the one found on the assumption of normality, showing the loss of precision resulting from ignorance of the precise nature of the distribution. If the standard deviation is a multiple of the mean then a further inequality can be derived. For fixed ''N'' and large ''m'' the Saw–Yang–Mo inequality is approximately
: P( | X - m | \ge ks ) \le \frac{1}{N+1} .
Beasley ''et al'' have suggested a modification of this inequality
: P( | X - m | \ge ks ) \le \frac{1}{k^2(N+1)} .
In empirical testing this modification is conservative but appears to have low statistical power. Its theoretical basis currently remains unexplored.


Dependence on sample size

The bounds these inequalities give on a finite sample are less tight than those the Chebyshev inequality gives for a distribution. To illustrate this let the sample size ''N'' = 100 and let ''k'' = 3. Chebyshev's inequality states that at most approximately 11.11% of the distribution will lie at least three standard deviations away from the mean. Kabán's version of the inequality for a finite sample states that at most approximately 12.05% of the sample lies outside these limits. The dependence of the confidence intervals on sample size is further illustrated below. For ''N'' = 10, the 95% confidence interval is approximately ±13.5789 standard deviations. For ''N'' = 100 the 95% confidence interval is approximately ±4.9595 standard deviations; the 99% confidence interval is approximately ±140.0 standard deviations. For ''N'' = 500 the 95% confidence interval is approximately ±4.5574 standard deviations; the 99% confidence interval is approximately ±11.1620 standard deviations. For ''N'' = 1000 the 95% and 99% confidence intervals are approximately ±4.5141 and approximately ±10.5330 standard deviations respectively. The Chebyshev inequality for the distribution gives 95% and 99% confidence intervals of approximately ±4.472 standard deviations and ±10 standard deviations respectively.


Samuelson's inequality

Although Chebyshev's inequality is the best possible bound for an arbitrary distribution, this is not necessarily true for finite samples.
Samuelson's inequality states that all values of a sample will lie within √(''N'' − 1) sample standard deviations of the mean (with probability one). By comparison, Chebyshev's inequality states that all but a 1/''N'' fraction of the sample will lie within √''N'' standard deviations of the mean. Since there are ''N'' samples, this means that no samples will lie outside √''N'' standard deviations of the mean, which is worse than Samuelson's inequality. However, the benefit of Chebyshev's inequality is that it can be applied more generally to get confidence bounds for ranges of standard deviations that do not depend on the number of samples.
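Samuelson's bound is easy to verify on a concrete sample. The sketch below uses NumPy and an arbitrary log-normal sample purely as an illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.lognormal(size=50)           # an arbitrary skewed sample
N, m = len(x), x.mean()
s = x.std(ddof=0)                    # Samuelson's bound uses the biased (1/N) standard deviation
# the largest standardised deviation never exceeds sqrt(N - 1)
print(np.max(np.abs(x - m)) / s, "<=", np.sqrt(N - 1))
```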


Semivariances

An alternative method of obtaining sharper bounds is through the use of semivariances (partial variances). The upper (''σ''+2) and lower (''σ''−2) semivariances are defined as
: \sigma_+^2 = \frac{\sum_{x > m} (x - m)^2}{n - 1} ,
: \sigma_-^2 = \frac{\sum_{x < m} (m - x)^2}{n - 1} ,
where ''m'' is the arithmetic mean of the sample and ''n'' is the number of elements in the sample. The variance of the sample is the sum of the two semivariances:
: \sigma^2 = \sigma_+^2 + \sigma_-^2.
In terms of the lower semivariance Chebyshev's inequality can be written
: \Pr(x \le m - a \sigma_-) \le \frac{1}{a^2} .
Putting
: a = \frac{k \sigma}{\sigma_-} ,
Chebyshev's inequality can now be written
: \Pr(x \le m - k \sigma) \le \frac{1}{k^2} \frac{\sigma_-^2}{\sigma^2} .
A similar result can also be derived for the upper semivariance. If we put
: \sigma_u^2 = \max(\sigma_-^2, \sigma_+^2) ,
Chebyshev's inequality can be written
: \Pr(x \le m - k \sigma) \le \frac{1}{k^2} \frac{\sigma_u^2}{\sigma^2}
(and likewise for the upper tail). Because ''σ''u2 ≤ ''σ''2, use of the semivariance sharpens the original inequality. If the distribution is known to be symmetric, then
: \sigma_+^2 = \sigma_-^2 = \frac{1}{2} \sigma^2
and
: \Pr(x \le m - k \sigma) \le \frac{1}{2k^2} .
This result agrees with that derived using standardised variables.
;Note: The inequality with the lower semivariance has been found to be of use in estimating downside risk in finance and agriculture.
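A small numerical sketch of these definitions, following the formulas above; the gamma-distributed sample and NumPy are assumptions made only for illustration.

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.gamma(shape=2.0, size=10_000)          # a right-skewed sample (arbitrary choice)
m, n = x.mean(), len(x)
var = x.var(ddof=1)
lower = np.sum((m - x[x < m]) ** 2) / (n - 1)  # sigma_-^2
upper = np.sum((x[x > m] - m) ** 2) / (n - 1)  # sigma_+^2
print(np.isclose(var, lower + upper))          # True: the semivariances sum to the variance

k = 2.0
# one-sided lower-tail bound from the semivariance versus the plain 1/k^2
print(lower / (k**2 * var), "<=", 1 / k**2)
```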


Multivariate case

Stellato et al. simplified the notation and extended the empirical Chebyshev inequality from Saw et al. to the multivariate case. Let \xi \in \mathbb{R}^{n_\xi} be a random variable and let N \in \mathbb{Z}_{\geq 1}. We draw N+1 iid samples of \xi denoted as \xi^{(1)},\dots,\xi^{(N)},\xi^{(N+1)} \in \mathbb{R}^{n_\xi}. Based on the first N samples, we define the empirical mean as \mu_N = \frac{1}{N} \sum_{i=1}^N \xi^{(i)} and the unbiased empirical covariance as \Sigma_N = \frac{1}{N-1} \sum_{i=1}^N (\xi^{(i)} - \mu_N)(\xi^{(i)} - \mu_N)^\top. If \Sigma_N is nonsingular, then for all \lambda \in \mathbb{R}_{> 0},
: P^{N+1} \left((\xi^{(N+1)} - \mu_N)^\top \Sigma_N^{-1}(\xi^{(N+1)} - \mu_N) \geq \lambda^2\right) \leq \min\left\{1, \frac{1}{N+1}\left\lfloor \frac{n_\xi (N+1)(N^2-1+N\lambda^2)}{N^2\lambda^2} \right\rfloor\right\}.


Remarks

In the univariate case, i.e. n_\xi = 1, this inequality corresponds to the one from Saw et al. Moreover, the right-hand side can be simplified by upper bounding the floor function by its argument:
: P^{N+1}\left((\xi^{(N+1)} - \mu_N)^\top \Sigma_N^{-1}(\xi^{(N+1)} - \mu_N) \geq \lambda^2\right) \leq \min\left\{1, \frac{n_\xi(N^2-1+N\lambda^2)}{N^2\lambda^2}\right\}.
As N \to \infty, the right-hand side tends to \min \left\{1, \frac{n_\xi}{\lambda^2}\right\}, which corresponds to the multivariate Chebyshev inequality over ellipsoids shaped according to \Sigma and centered in \mu.


Sharpened bounds

Chebyshev's inequality is important because of its applicability to any distribution. As a result of its generality it may not (and usually does not) provide as sharp a bound as alternative methods that can be used if the distribution of the random variable is known. To improve the sharpness of the bounds provided by Chebyshev's inequality a number of methods have been developed; several of them are reviewed below.


Cantelli's inequality

Cantelli's inequality (Cantelli F. (1910) Intorno ad un teorema fondamentale della teoria del rischio. Bolletino dell Associazione degli Attuari Italiani), due to Francesco Paolo Cantelli, states that for a real random variable ''X'' with mean ''μ'' and variance ''σ''2,
: P(X - \mu \ge a) \le \frac{\sigma^2}{\sigma^2 + a^2}
where ''a'' ≥ 0. This inequality can be used to prove a one-tailed variant of Chebyshev's inequality with ''k'' > 0 (Grimmett and Stirzaker, problem 7.11.9; several proofs of this result can be found in ''Chebyshev's Inequalities'' by A. G. McDowell):
: \Pr(X - \mu \geq k \sigma) \leq \frac{1}{1+k^2}.
The bound on the one-tailed variant is known to be sharp. To see this, consider the random variable ''X'' that takes the values
: X = 1 with probability \frac{\sigma^2}{1 + \sigma^2} ,
: X = - \sigma^2 with probability \frac{1}{1 + \sigma^2} .
Then E(''X'') = 0 and E(''X''2) = ''σ''2, and P(''X'' < 1) = 1 / (1 + ''σ''2).
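A quick arithmetic check of this two-point distribution (a minimal sketch with an arbitrary choice of variance) shows that Cantelli's bound is attained at ''a'' = 1:

```python
sigma2 = 0.5                                  # any positive variance works
p_hi = sigma2 / (1 + sigma2)                  # P(X = 1)
p_lo = 1 / (1 + sigma2)                       # P(X = -sigma^2)
mean = 1 * p_hi + (-sigma2) * p_lo            # 0
var = 1**2 * p_hi + sigma2**2 * p_lo - mean**2
print(mean, var)                              # 0.0 and sigma2
print("P(X - mu >= 1) =", p_hi, "  Cantelli bound:", sigma2 / (sigma2 + 1**2))
```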


An application: distance between the mean and the median

The one-sided variant can be used to prove the proposition that for probability distributions having an expected value and a median, the mean and the median can never differ from each other by more than one standard deviation. To express this in symbols let ''μ'', ''ν'', and ''σ'' be respectively the mean, the median, and the standard deviation. Then
: \left | \mu - \nu \right | \leq \sigma.
There is no need to assume that the variance is finite because this inequality is trivially true if the variance is infinite. The proof is as follows. Setting ''k'' = 1 in the statement for the one-sided inequality gives:
:\Pr(X - \mu \geq \sigma) \leq \frac{1}{2} \implies \Pr(X \geq \mu + \sigma) \leq \frac{1}{2}.
Changing the sign of ''X'' and of ''μ'', we get
:\Pr(X \leq \mu - \sigma) \leq \frac{1}{2}.
As the median is by definition any real number ''m'' that satisfies the inequalities
:\operatorname{P}(X\leq m) \geq \frac{1}{2}\text{ and }\operatorname{P}(X\geq m) \geq \frac{1}{2},
this implies that the median lies within one standard deviation of the mean. A proof using Jensen's inequality also exists.
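A concrete illustration: for the exponential distribution with rate 1, the mean is 1, the median is ln 2 and the standard deviation is 1, so |''μ'' − ''ν''| ≈ 0.31 ≤ 1.

```python
import math

mu, nu, sigma = 1.0, math.log(2), 1.0   # exponential(1): mean, median, standard deviation
print(abs(mu - nu), "<=", sigma)        # about 0.307 <= 1.0
```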


Bhattacharyya's inequality

Bhattacharyya extended Cantelli's inequality using the third and fourth moments of the distribution. Let ''μ'' = 0 and ''σ''2 be the variance. Let ''γ'' = E(''X''3)/''σ''3 and ''κ'' = E(''X''4)/''σ''4. If ''k''2 − ''kγ'' − 1 > 0 then
: P(X > k\sigma) \le \frac{\kappa - \gamma^2 - 1}{(\kappa - \gamma^2 - 1)(1 + k^2) + (k^2 - k\gamma - 1)^2}.
The necessity of ''k''2 − ''kγ'' − 1 > 0 requires that ''k'' be reasonably large. In the case E(''X''3) = 0 this simplifies to
:P(X > k\sigma) \le \frac{\kappa - 1}{(\kappa - 1)(1 + k^2) + (k^2 - 1)^2} \quad \text{for } k > 1.
For ''k'' close to 1 this improves only slightly on Cantelli's bound \frac{1}{1+k^2}; both tend to \frac{1}{2} as ''k'' → 1, which is a factor of 2 better than the trivial Chebyshev bound.


Gauss's inequality

In 1823 Gauss showed that for a distribution with a unique mode at zero (Gauss C. F. Theoria Combinationis Observationum Erroribus Minimis Obnoxiae. Pars Prior. Pars Posterior. Supplementum. Theory of the Combination of Observations Least Subject to Errors. Translated by G. W. Stewart. Classics in Applied Mathematics Series, Society for Industrial and Applied Mathematics, Philadelphia, 1995),
: P( | X | \ge k ) \le \frac{4 \operatorname{E}(X^2)}{9k^2} \quad\text{if} \quad k^2 \ge \frac{4}{3} \operatorname{E} (X^2) ,
: P( | X | \ge k ) \le 1 - \frac{k}{\sqrt{3 \operatorname{E}(X^2)}} \quad \text{if} \quad k^2 \le \frac{4}{3} \operatorname{E}( X^2 ).


Vysochanskij–Petunin inequality

The Vysochanskij–Petunin inequality generalizes Gauss's inequality, which only holds for deviation from the mode of a unimodal distribution, to deviation from the mean, or more generally, any center. If ''X'' is a unimodal distribution with mean ''μ'' and variance ''σ''2, then the inequality states that
: P( | X - \mu | \ge k \sigma ) \le \frac{4}{9k^2} \quad \text{for} \quad k \ge \sqrt{8/3} \approx 1.633 ,
: P( | X - \mu | \ge k \sigma ) \le \frac{4}{3k^2} - \frac13 \quad \text{for} \quad k \le \sqrt{8/3}.
For symmetrical unimodal distributions, the median and the mode are equal, so both the Vysochanskij–Petunin inequality and Gauss's inequality apply to the same center. Further, for symmetrical distributions, one-sided bounds can be obtained by noticing that
: P( X - \mu \ge k \sigma ) = P( X - \mu \le -k \sigma ) = \frac{1}{2} P( | X - \mu| \ge k \sigma ).
The additional factor of 4/9 present in these tail bounds leads to better confidence intervals than Chebyshev's inequality. For example, for any symmetrical unimodal distribution, the Vysochanskij–Petunin inequality states that at most 4/(9 × 32) = 4/81 ≈ 4.9% of the distribution lies outside 3 standard deviations of the mode.
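The gain at ''k'' = 3 is simple arithmetic (valid here since 3 ≥ √(8/3)):

```python
k = 3
print("Chebyshev bound:", 1 / k**2)                   # about 0.1111
print("Vysochanskij-Petunin bound:", 4 / (9 * k**2))  # about 0.0494
```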


Bounds for specific distributions

DasGupta has shown that if the distribution is known to be normal
: P( | X - \mu | \ge k \sigma ) \le \frac{1}{3k^2} .
From DasGupta's inequality it follows that for a normal distribution at least 95% lies within approximately 2.582 standard deviations of the mean. This is less sharp than the true figure (approximately 1.96 standard deviations of the mean).
*DasGupta has determined a set of best possible bounds for a normal distribution for this inequality.
*Steliga and Szynal have extended these bounds to the Pareto distribution.
*Grechuk et al. developed a general method for deriving the best possible bounds in Chebyshev's inequality for any family of distributions, and any deviation risk measure in place of standard deviation. In particular, they derived Chebyshev's inequality for distributions with log-concave densities. (Grechuk, B., Molyboha, A., Zabarankin, M. (2010) Chebyshev Inequalities with Law Invariant Deviation Measures, Probability in the Engineering and Informational Sciences, 24(1), 145–170.)


Related inequalities

Several other related inequalities are also known.


Paley–Zygmund inequality

The Paley–Zygmund inequality gives a lower bound on tail probabilities, as opposed to Chebyshev's inequality which gives an upper bound (Godwin H. J. (1964) Inequalities on distribution functions. (Chapter 3) New York, Hafner Pub. Co.). Applying it to the square of a random variable, we get, for 0 ≤ ''θ'' ≤ 1,
: \Pr( | Z | > \theta \sqrt{\operatorname{E}[Z^2]} ) \ge \frac{(1-\theta^2)^2 \operatorname{E}[Z^2]^2}{\operatorname{E}[Z^4]}.
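A Monte Carlo sketch of this lower bound; the uniform test variable and NumPy are assumptions made only for illustration.

```python
import numpy as np

rng = np.random.default_rng(5)
z = rng.uniform(-1, 1, size=1_000_000)         # an arbitrary test variable
ez2, ez4 = np.mean(z**2), np.mean(z**4)
theta = 0.5
lhs = np.mean(np.abs(z) > theta * np.sqrt(ez2))
rhs = (1 - theta**2) ** 2 * ez2**2 / ez4
print(lhs, ">=", rhs)                          # about 0.71 >= about 0.31
```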


Haldane's transformation

One use of Chebyshev's inequality in applications is to create confidence intervals for variates with an unknown distribution. Haldane noted, using an equation derived by Kendall (Kendall M. G. (1943) The Advanced Theory of Statistics, 1. London), that if a variate (''x'') has a zero mean, unit variance and both finite skewness (''γ'') and kurtosis (''κ'') then the variate can be converted to a normally distributed standard score (''z''):
: z = x - \frac{\gamma}{6} (x^2 - 1) + \frac{x}{72} \left[ 2 \gamma^2 (4 x^2 - 7) - 3 \kappa (x^2 - 3) \right] + \cdots
This transformation may be useful as an alternative to Chebyshev's inequality or as an adjunct to it for deriving confidence intervals for variates with unknown distributions. While this transformation may be useful for moderately skewed and/or kurtotic distributions, it performs poorly when the distribution is markedly skewed and/or kurtotic.


He, Zhang and Zhang's inequality

For any collection of ''n'' non-negative independent random variables ''X''i with expectation 1,
: \Pr\left ( \frac{\sum_{i=1}^n X_i}{n} - 1 \ge \frac{1}{n} \right) \le \frac{7}{8}.


Integral Chebyshev inequality

There is a second (less well known) inequality also named after Chebyshev. If ''f'', ''g'' : [''a'', ''b''] → R are two monotonic functions of the same monotonicity, then
: \frac{1}{b-a} \int_a^b \! f(x) g(x) \,dx \ge \left[ \frac{1}{b-a} \int_a^b \! f(x) \,dx \right] \left[ \frac{1}{b-a} \int_a^b \! g(x) \,dx \right].
If ''f'' and ''g'' are of opposite monotonicity, then the above inequality works in the reverse way. This inequality is related to Jensen's inequality, Kantorovich's inequality, the Hermite–Hadamard inequality and Walter's conjecture.


Other inequalities

There are also a number of other inequalities associated with Chebyshev:
* Chebyshev's sum inequality
* Chebyshev–Markov–Stieltjes inequalities


Notes

The Environmental Protection Agency has suggested best practices for the use of Chebyshev's inequality for estimating confidence intervals.


See also

* Multidimensional Chebyshev's inequality
* Concentration inequality – a summary of tail bounds on random variables
* Cornish–Fisher expansion
* Eaton's inequality
* Kolmogorov's inequality
* Proof of the weak law of large numbers using Chebyshev's inequality
* Le Cam's theorem
* Paley–Zygmund inequality
* Vysochanskiï–Petunin inequality – a stronger result applicable to unimodal probability distributions
* Lenglart's inequality
* Burkholder–Davis–Gundy inequality


References


Further reading

* A. Papoulis (1991), ''Probability, Random Variables, and Stochastic Processes'', 3rd ed. McGraw–Hill. pp. 113–114.
* G. Grimmett and D. Stirzaker (2001), ''Probability and Random Processes'', 3rd ed. Oxford. Section 7.3.


External links

* {{springer|title=Chebyshev inequality in probability theory|id=p/c021890}}
* Formal proof in the Mizar system.