In probability theory, Chebyshev's inequality (also called the Bienaymé–Chebyshev inequality) provides an upper bound on the probability of deviation of a random variable (with finite variance) from its mean. More specifically, the probability that a random variable deviates from its mean by more than kσ is at most 1/k², where k is any positive constant and σ is the standard deviation (the square root of the variance). In statistics, the rule is often called Chebyshev's theorem, about the range of standard deviations around the mean. The inequality has great utility because it can be applied to any probability distribution in which the mean and variance are defined. For example, it can be used to prove the weak law of large numbers.

Its practical usage is similar to the 68–95–99.7 rule, which applies only to normal distributions. Chebyshev's inequality is more general, stating that a minimum of just 75% of values must lie within two standard deviations of the mean and about 88.89% (8/9) within three standard deviations for a broad range of different probability distributions.

The term "Chebyshev's inequality" may also refer to Markov's inequality, especially in the context of analysis. They are closely related, and some authors refer to Markov's inequality as "Chebyshev's First Inequality," and the similar one referred to on this page as "Chebyshev's Second Inequality." Chebyshev's inequality is tight in the sense that for each chosen constant k ≥ 1, there exists a random variable for which the inequality is in fact an equality.
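To make the comparison with the 68–95–99.7 rule concrete, here is a minimal Python sketch (the function names are illustrative, not taken from any library) that sets the Chebyshev lower bound 1 − 1/k² on the probability of lying within k standard deviations next to the exact coverage of a normal distribution, computed with the standard library's `statistics.NormalDist`.

```python
from statistics import NormalDist


def chebyshev_min_coverage(k: float) -> float:
    """Lower bound on P(|X - mu| < k*sigma) implied by Chebyshev: 1 - 1/k**2."""
    # For k <= 1 the bound is non-positive, i.e. it says nothing useful.
    return max(0.0, 1.0 - 1.0 / k**2)


def normal_coverage(k: float) -> float:
    """Exact P(|X - mu| < k*sigma) when X is normally distributed."""
    z = NormalDist()  # standard normal: mu = 0, sigma = 1
    return z.cdf(k) - z.cdf(-k)


for k in (1, 2, 3):
    print(f"k={k}: Chebyshev >= {chebyshev_min_coverage(k):.4f}, "
          f"normal = {normal_coverage(k):.4f}")
```

For k = 2 and k = 3 the Chebyshev bounds are 0.75 and about 0.8889, while a normal distribution actually covers about 0.9545 and 0.9973; the gap is the price of assuming nothing beyond a finite variance.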

History

The theorem is named after Russian mathematician Pafnuty Chebyshev, although it was first formulated by his friend and colleague Irénée-Jules Bienaymé. The theorem was first proved by Bienaymé in 1853 and more generally proved by Chebyshev in 1867. His student Andrey Markov provided another proof in his 1884 Ph.D. thesis [Markov A. (1884) On certain applications of algebraic continued fractions, Ph.D. thesis, St. Petersburg].

Statement

Chebyshev's inequality is usually stated for random variables, but can be generalized to a statement about measure spaces.

Probabilistic statement

Let X (integrable) be a random variable with finite non-zero variance σ² (and thus finite expected value μ). Then for any real number k > 0,

\Pr(|X-\mu| \geq k\sigma) \leq \frac{1}{k^2}.

Only the case k > 1 is useful. When k ≤ 1, the right-hand side 1/k² ≥ 1, and the inequality is trivial because all probabilities are ≤ 1. As an example, using k = √2 shows that the probability that values lie outside the interval (μ − √2σ, μ + √2σ) does not exceed 1/2. Equivalently, the probability that values lie within the interval (i.e. its "coverage") is at least 1/2. Because it can be applied to completely arbitrary distributions provided they have a known finite mean and variance, the inequality generally gives a poor bound compared to what might be deduced if more aspects are known about the distribution involved.
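A quick numerical illustration, assuming nothing beyond the Python standard library: the sketch below draws a skewed (exponential) sample and compares the fraction of points at least kσ from the mean with 1/k². It uses the sample's own mean and population standard deviation, so the inequality is being applied to the empirical distribution, for which it must hold exactly rather than approximately. The helper name `chebyshev_check` is hypothetical.

```python
import random
from statistics import fmean, pstdev


def chebyshev_check(sample: list[float], k: float) -> tuple[float, float]:
    """Return (observed tail fraction, Chebyshev bound 1/k**2) for `sample`.

    mu and sigma are the mean and *population* standard deviation of the
    sample, i.e. the exact moments of its empirical distribution.
    """
    mu = fmean(sample)
    sigma = pstdev(sample, mu)
    if sigma == 0:
        raise ValueError("Chebyshev's inequality needs non-zero variance")
    tail = sum(1 for x in sample if abs(x - mu) >= k * sigma) / len(sample)
    return tail, 1.0 / k**2


rng = random.Random(0)  # fixed seed for reproducibility
data = [rng.expovariate(1.0) for _ in range(20_000)]  # skewed, clearly non-normal
for k in (2 ** 0.5, 2.0, 3.0):
    tail, bound = chebyshev_check(data, k)
    print(f"k={k:.3f}: observed tail {tail:.4f} <= bound {bound:.4f}")
```

The observed tails sit far below the bound for this distribution, which illustrates how loose the inequality usually is.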

Measure-theoretic statement

Let (X, Σ, μ) be a measure space, and let f be an extended real-valued measurable function defined on X. Then for any real number t > 0 and 0 < p < ∞,

\mu(\{x \in X : |f(x)| \geq t\}) \leq \frac{1}{t^p} \int_{\{|f| \geq t\}} |f|^p \, d\mu.

More generally, if g is an extended real-valued measurable function, nonnegative and nondecreasing, with g(t) ≠ 0, then:

\mu(\{x \in X : f(x) \geq t\}) \leq \frac{1}{g(t)} \int_X g \circ f \, d\mu.

This statement follows from the Markov inequality,

\mu(\{x \in X : |F(x)| \geq \varepsilon\}) \leq \frac{1}{\varepsilon} \int_X |F| \, d\mu,

with F = g∘f and ε = g(t) > 0, since in this case, because g is nondecreasing,

\mu(\{x \in X : g(f(x)) \geq g(t)\}) \geq \mu(\{x \in X : f(x) \geq t\}).

The previous statement then follows by applying this to |f| in place of f, with g(x) = |x|^p if x ≥ t and g(x) = 0 otherwise.
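The measure-theoretic form can be checked by hand on a finite measure space. In this sketch the four points, their masses, and the values of f are made-up illustrative data; the masses deliberately do not sum to 1, since the statement does not require a probability measure.

```python
# Hypothetical finite measure space: four points carrying point masses.
# Total mass is 6.5, so mu is a measure but not a probability measure.
weights = {"a": 2.0, "b": 0.5, "c": 3.0, "d": 1.0}
f = {"a": -3.0, "b": 0.2, "c": 1.5, "d": 4.0}  # a measurable function on the points


def tail_measure(t: float) -> float:
    """mu({x : |f(x)| >= t})."""
    return sum(w for x, w in weights.items() if abs(f[x]) >= t)


def chebyshev_rhs(t: float, p: float) -> float:
    """(1 / t**p) times the integral of |f|**p d(mu) over the set {|f| >= t}."""
    return sum(w * abs(f[x]) ** p for x, w in weights.items() if abs(f[x]) >= t) / t**p


for t in (0.1, 1.0, 1.5, 3.0, 5.0):
    for p in (0.5, 1.0, 2.0):
        lhs, rhs = tail_measure(t), chebyshev_rhs(t, p)
        assert lhs <= rhs, (t, p)
        print(f"t={t}, p={p}: {lhs:.3f} <= {rhs:.3f}")
```

Each point in {|f| ≥ t} contributes its mass times (|f|/t)^p ≥ 1 to the right-hand side, which is exactly why the inequality holds term by term.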

Example

Suppose we randomly select a journal article from a source with an average of 1000 words per article, with a standard deviation of 200 words. We can then infer that the probability that it has between 600 and 1400 words (i.e. within k = 2 standard deviations of the mean) must be at least 75%, because by Chebyshev's inequality there is no more than a 1/k² = 1/4 chance of being outside that range. But if we additionally know that the distribution is normal, we can say there is a 75% chance the word count is between 770 and 1230 (which is an even tighter bound).
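The arithmetic of the example can be reproduced as follows. This is a sketch assuming Python's `statistics.NormalDist` for the normal case, where the central 75% interval is the mean plus or minus z standard deviations, with z the 87.5th percentile of the standard normal (about 1.15).

```python
from statistics import NormalDist

mean, sd = 1000, 200  # words per article, from the example

# Chebyshev: at least 1 - 1/k**2 of the mass lies within k standard deviations.
k = 2
lo, hi = mean - k * sd, mean + k * sd
print(f"Chebyshev: P({lo} < words < {hi}) >= {1 - 1 / k**2:.0%}")

# Extra assumption: the word count is normal. Then the central 75% interval is
# mean +/- z*sd, where z is the 87.5th percentile of the standard normal.
z = NormalDist().inv_cdf(0.875)
print(f"Normal: P({mean - z * sd:.0f} < words < {mean + z * sd:.0f}) = 75%")
```

The normal interval comes out at roughly 770 to 1230 words, matching the figures in the text.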

Sharpness of bounds

As shown in the example above, the theorem typically provides rather loose bounds. However, these bounds cannot in general be improved upon while remaining true for arbitrary distributions. The bounds are sharp for the following example: for any k ≥ 1,

X = \begin{cases}
  -1, & \text{with probability } \frac{1}{2k^2} \\
  \phantom{-}0, & \text{with probability } 1 - \frac{1}{k^2} \\
  +1, & \text{with probability } \frac{1}{2k^2}
\end{cases}

For this distribution, the mean μ = 0 and the variance σ² = 1/(2k²) + 0 + 1/(2k²) = 1/k², so the standard deviation σ = 1/k and

\Pr(|X-\mu| \ge k\sigma) = \Pr(|X| \ge 1) = \frac{1}{k^2}.

Chebyshev's inequality is an equality for precisely those distributions which are affine transformations of this example.
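Because every probability in the extremal example is rational, the claim can be verified exactly with `fractions.Fraction`. The function names below are illustrative; the sketch builds the three-point distribution, computes its mean, variance, and tail probability, and confirms that the tail equals the Chebyshev bound 1/k². It uses σ = 1/k directly because that is the exact square root of the variance, which the code asserts.

```python
from fractions import Fraction


def extremal_distribution(k) -> dict[int, Fraction]:
    """P(X=-1) = P(X=+1) = 1/(2k^2), P(X=0) = 1 - 1/k^2, as in the example."""
    k = Fraction(k)
    if k < 1:
        raise ValueError("need k >= 1 so that P(X = 0) is non-negative")
    edge = 1 / (2 * k * k)
    return {-1: edge, 0: 1 - 2 * edge, 1: edge}


def check_sharpness(k):
    """Return (mean, variance, P(|X - mu| >= k*sigma), 1/k^2) as exact fractions."""
    k = Fraction(k)
    dist = extremal_distribution(k)
    mean = sum(x * p for x, p in dist.items())
    var = sum((x - mean) ** 2 * p for x, p in dist.items())
    sigma = 1 / k
    assert sigma * sigma == var  # sigma really is the standard deviation
    tail = sum(p for x, p in dist.items() if abs(x - mean) >= k * sigma)
    return mean, var, tail, 1 / (k * k)


for k in (1, Fraction(3, 2), 2, 10):
    mean, var, tail, bound = check_sharpness(k)
    print(f"k={k}: mean={mean}, variance={var}, tail={tail}, 1/k^2={bound}")
```

Exact rational arithmetic avoids any doubt that a floating-point rounding error is responsible for the equality.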

Proof

Markov's inequality states that for any non-negative real-valued random variable Y and any positive number a, we have

\Pr(|Y| \geq a) \leq \frac{\mathbb{E}(|Y|)}{a}.
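The proof is usually completed with one more step, sketched here as a standard derivation rather than text recovered from the original: apply Markov's inequality to the non-negative random variable Y = (X − μ)² with a = k²σ², which is positive because k > 0 and σ² > 0. The first equality uses the fact that |X − μ| ≥ kσ holds exactly when (X − μ)² ≥ k²σ², since both sides are non-negative.

```latex
\Pr(|X-\mu| \geq k\sigma)
  = \Pr\big((X-\mu)^2 \geq k^2\sigma^2\big)
  \leq \frac{\mathbb{E}\big[(X-\mu)^2\big]}{k^2\sigma^2}
  = \frac{\sigma^2}{k^2\sigma^2}
  = \frac{1}{k^2}.
```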
