In probability theory, Markov's inequality gives an upper bound on the probability that a non-negative random variable is greater than or equal to some positive constant. Markov's inequality is tight in the sense that for each chosen positive constant, there exists a random variable such that the inequality is in fact an equality. It is named after the Russian mathematician Andrey Markov, although it appeared earlier in the work of Pafnuty Chebyshev (Markov's teacher), and many sources, especially in analysis, refer to it as Chebyshev's inequality (sometimes calling it the first Chebyshev inequality, while referring to Chebyshev's inequality as the second Chebyshev inequality) or Bienaymé's inequality. Markov's inequality (and other similar inequalities) relates probabilities to expectations and provides (frequently loose but still useful) bounds for the cumulative distribution function of a random variable. Markov's inequality can also be used to upper bound the expectation of a non-negative random variable in terms of its distribution function.


Statement

If X is a nonnegative random variable and a > 0, then the probability that X is at least a is at most the expectation of X divided by a:

:\operatorname{P}(X \geq a) \leq \frac{\operatorname{E}(X)}{a}.

When \operatorname{E}(X) > 0, we can take a = \tilde{a} \cdot \operatorname{E}(X) for \tilde{a} > 0 to rewrite the previous inequality as

:\operatorname{P}(X \geq \tilde{a} \cdot \operatorname{E}(X)) \leq \frac{1}{\tilde{a}}.

In the language of measure theory, Markov's inequality states that if (X, \Sigma, \mu) is a measure space, f is a measurable extended real-valued function, and \varepsilon > 0, then

: \mu(\{x \in X : |f(x)| \geq \varepsilon\}) \leq \frac{1}{\varepsilon} \int_X |f| \, d\mu.

This measure-theoretic definition is sometimes referred to as Chebyshev's inequality.
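The bound is easy to check numerically. The following Python sketch compares the empirical tail probability of a sample with the Markov bound \operatorname{E}(X)/a; the exponential distribution, sample size, and threshold are illustrative assumptions, not part of the theorem.

```python
import random

# Monte Carlo sanity check of Markov's inequality: P(X >= a) <= E(X)/a.
# The exponential distribution, sample size, and threshold are
# illustrative assumptions, not part of the theorem.
random.seed(0)
samples = [random.expovariate(1.0) for _ in range(100_000)]  # E(X) = 1

a = 3.0
empirical_tail = sum(x >= a for x in samples) / len(samples)
markov_bound = (sum(samples) / len(samples)) / a  # estimated E(X)/a

print(f"P(X >= {a}) ~= {empirical_tail:.4f}")  # roughly exp(-3) ~ 0.0498
print(f"Markov bound:   {markov_bound:.4f}")   # roughly 1/3
```

As the output suggests, the bound holds but can be quite loose; its strength is that it requires nothing beyond non-negativity and a finite mean.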


Extended version for nondecreasing functions

If \varphi is a nondecreasing nonnegative function, X is a (not necessarily nonnegative) random variable, and \varphi(a) > 0, then

:\operatorname{P}(X \ge a) \le \frac{\operatorname{E}(\varphi(X))}{\varphi(a)}.

An immediate corollary, using higher moments of X supported on values larger than 0, is

:\operatorname{P}(|X| \ge a) \le \frac{\operatorname{E}(|X|^n)}{a^n}.
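As a concrete instance, taking \varphi(x) = x^2 applied to |X| gives the second-moment bound \operatorname{P}(|X| \geq a) \leq \operatorname{E}(X^2)/a^2. A minimal Python sketch, with the standard normal distribution chosen purely for illustration:

```python
import random

# Sketch of the corollary with phi(x) = x^2 applied to |X|:
# P(|X| >= a) <= E(X^2)/a^2. The standard normal distribution
# below is an illustrative assumption.
random.seed(0)
samples = [random.gauss(0.0, 1.0) for _ in range(100_000)]  # E(X^2) = 1

a = 2.0
empirical = sum(abs(x) >= a for x in samples) / len(samples)
bound = (sum(x * x for x in samples) / len(samples)) / a**2

print(f"P(|X| >= {a}) ~= {empirical:.4f}")  # roughly 0.0455
print(f"Moment bound:    {bound:.4f}")      # roughly 1/4
```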


The uniformly randomized Markov's inequality

If X is a nonnegative random variable and a > 0, and U is a uniformly distributed random variable on [0, 1] that is independent of X, then

:\operatorname{P}(X \geq Ua) \leq \frac{\operatorname{E}(X)}{a}.

Since U is almost surely smaller than one, this bound is strictly stronger than Markov's inequality. Remarkably, U cannot be replaced by any constant smaller than one, meaning that deterministic improvements to Markov's inequality cannot exist in general. While Markov's inequality holds with equality for distributions supported on \{0, a\}, the above randomized variant holds with equality for any distribution that is bounded on [0, a].
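A small simulation illustrates both claims: the randomized event \{X \geq Ua\} is more probable than \{X \geq a\}, yet it still obeys the same \operatorname{E}(X)/a bound. The exponential distribution of X below is an illustrative assumption:

```python
import random

# Sketch of the randomized bound P(X >= U*a) <= E(X)/a with
# U ~ Uniform(0, 1) independent of X. The exponential distribution
# of X is an illustrative assumption.
random.seed(0)
n = 100_000
xs = [random.expovariate(1.0) for _ in range(n)]  # E(X) = 1
us = [random.random() for _ in range(n)]          # independent U

a = 3.0
randomized = sum(x >= u * a for x, u in zip(xs, us)) / n
plain = sum(x >= a for x in xs) / n
bound = (sum(xs) / n) / a

# The randomized event is more likely than {X >= a}, yet it still
# obeys the same E(X)/a bound.
print(f"P(X >= U*a) ~= {randomized:.4f} (bound: {bound:.4f})")
print(f"P(X >= a)   ~= {plain:.4f}")
```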


Proofs

We separate the case in which the measure space is a probability space from the more general case because the probability case is more accessible to the general reader.


Intuition

:\operatorname{E}(X) = \operatorname{P}(X < a) \cdot \operatorname{E}(X \mid X < a) + \operatorname{P}(X \geq a) \cdot \operatorname{E}(X \mid X \geq a),

where \operatorname{E}(X \mid X < a) is larger than or equal to 0 because the random variable X is non-negative, and \operatorname{E}(X \mid X \geq a) is larger than or equal to a because the conditional expectation only takes into account values larger than or equal to a, which the r.v. X can take.

Property 1: \operatorname{P}(X < a) \cdot \operatorname{E}(X \mid X < a) \geq 0

Given a non-negative random variable X, the conditional expectation \operatorname{E}(X \mid X < a) \geq 0 because X \geq 0. Also, probabilities are always non-negative, i.e., \operatorname{P}(X < a) \geq 0. Thus the product

:\operatorname{P}(X < a) \cdot \operatorname{E}(X \mid X < a) \geq 0.

This is intuitive: conditioning on X < a still yields non-negative values, so the product remains non-negative.

Property 2: \operatorname{P}(X \geq a) \cdot \operatorname{E}(X \mid X \geq a) \geq a \cdot \operatorname{P}(X \geq a)

On the event X \geq a, the expected value given X \geq a is at least a: \operatorname{E}(X \mid X \geq a) \geq a. Multiplying both sides by \operatorname{P}(X \geq a), we get

:\operatorname{P}(X \geq a) \cdot \operatorname{E}(X \mid X \geq a) \geq a \cdot \operatorname{P}(X \geq a).

This is intuitive since all values considered are at least a, making their average also at least a.

Hence, intuitively,

:\operatorname{E}(X) \geq \operatorname{P}(X \geq a) \cdot \operatorname{E}(X \mid X \geq a) \geq a \cdot \operatorname{P}(X \geq a),

which directly leads to

:\operatorname{P}(X \geq a) \leq \frac{\operatorname{E}(X)}{a}.
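A quick numerical check of this decomposition, estimated from a single sample (the exponential distribution is an illustrative assumption):

```python
import random

# Numeric check of the decomposition
# E(X) = P(X < a)*E(X | X < a) + P(X >= a)*E(X | X >= a),
# estimated from one sample. The exponential distribution is an
# illustrative assumption.
random.seed(0)
xs = [random.expovariate(1.0) for _ in range(100_000)]
a = 2.0

below = [x for x in xs if x < a]
above = [x for x in xs if x >= a]
lhs = sum(xs) / len(xs)
rhs = ((len(below) / len(xs)) * (sum(below) / len(below))
       + (len(above) / len(xs)) * (sum(above) / len(above)))

print(f"E(X)          ~= {lhs:.4f}")
print(f"decomposition ~= {rhs:.4f}")  # agrees with the line above
```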


Probability-theoretic proof

Method 1: From the definition of expectation (assuming X has a density f):

:\operatorname{E}(X) = \int_{-\infty}^{\infty} x f(x) \, dx.

However, X is a non-negative random variable, thus

:\operatorname{E}(X) = \int_{-\infty}^{\infty} x f(x) \, dx = \int_0^{\infty} x f(x) \, dx.

From this we can derive

:\operatorname{E}(X) = \int_0^a x f(x) \, dx + \int_a^{\infty} x f(x) \, dx \ge \int_a^{\infty} x f(x) \, dx \ge \int_a^{\infty} a f(x) \, dx = a \int_a^{\infty} f(x) \, dx = a \operatorname{P}(X \ge a).

From here, dividing through by a allows us to see that

:\operatorname{P}(X \ge a) \le \operatorname{E}(X)/a.

Method 2: For any event E, let I_E be the indicator random variable of E, that is, I_E = 1 if E occurs and I_E = 0 otherwise. Using this notation, we have I_{X \geq a} = 1 if the event X \geq a occurs, and I_{X \geq a} = 0 if X < a. Then, given a > 0,

:a I_{X \geq a} \leq X,

which is clear if we consider the two possible values of I_{X \geq a}. If X < a, then I_{X \geq a} = 0, and so a I_{X \geq a} = 0 \leq X. Otherwise, we have X \geq a, for which I_{X \geq a} = 1 and so a I_{X \geq a} = a \leq X.

Since \operatorname{E} is a monotonically increasing function, taking the expectation of both sides of an inequality cannot reverse it. Therefore,

:\operatorname{E}(a I_{X \geq a}) \leq \operatorname{E}(X).

Now, using linearity of expectation, the left side of this inequality is the same as

:a \operatorname{E}(I_{X \geq a}) = a \left(1 \cdot \operatorname{P}(X \geq a) + 0 \cdot \operatorname{P}(X < a)\right) = a \operatorname{P}(X \geq a).

Thus we have

:a \operatorname{P}(X \geq a) \leq \operatorname{E}(X),

and since a > 0, we can divide both sides by a.
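The key pointwise step of Method 2, a I_{X \geq a} \leq X, can be checked mechanically; a minimal sketch, assuming an exponential distribution purely for illustration:

```python
import random

# Pointwise check of the key step in Method 2: a*I(X >= a) <= X for
# every realization of a nonnegative X, so taking expectations
# preserves the inequality. The distribution is illustrative.
random.seed(0)
a = 2.0
for _ in range(10_000):
    x = random.expovariate(1.0)
    indicator = 1.0 if x >= a else 0.0
    assert a * indicator <= x  # 0 <= x when X < a; a <= x when X >= a
print("a * I(X >= a) <= X held for all sampled values")
```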


Measure-theoretic proof

We may assume that the function f is non-negative, since only its absolute value enters the inequality. Now, consider the real-valued function s on X given by

: s(x) = \begin{cases} \varepsilon, & \text{if } f(x) \geq \varepsilon \\ 0, & \text{if } f(x) < \varepsilon \end{cases}

Then 0 \leq s(x) \leq f(x). By the definition of the Lebesgue integral,

: \int_X f(x) \, d\mu \geq \int_X s(x) \, d\mu = \varepsilon \mu(\{x \in X : f(x) \geq \varepsilon\}),

and since \varepsilon > 0, both sides can be divided by \varepsilon, obtaining

:\mu(\{x \in X : f(x) \geq \varepsilon\}) \leq \frac{1}{\varepsilon} \int_X f \, d\mu.


Discrete case

We now provide a proof for the special case when X is a discrete random variable which only takes on non-negative integer values. Let a be a positive integer. By definition,

:a\operatorname{P}(X > a) = a\operatorname{P}(X = a+1) + a\operatorname{P}(X = a+2) + a\operatorname{P}(X = a+3) + \cdots

:\leq a\operatorname{P}(X = a) + (a+1)\operatorname{P}(X = a+1) + (a+2)\operatorname{P}(X = a+2) + \cdots

:\leq \operatorname{P}(X = 1) + 2\operatorname{P}(X = 2) + 3\operatorname{P}(X = 3) + \cdots + a\operatorname{P}(X = a) + (a+1)\operatorname{P}(X = a+1) + (a+2)\operatorname{P}(X = a+2) + \cdots

:= \operatorname{E}(X).

Dividing by a yields the desired result.
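A small check of this chain of inequalities for a concrete integer-valued distribution (the probability mass function below is an illustrative assumption):

```python
# Check of the discrete-case chain for a small integer-valued
# distribution (the probability mass function is an illustrative
# assumption; it sums to 1).
pmf = {0: 0.3, 1: 0.25, 2: 0.2, 3: 0.15, 4: 0.1}

a = 3
expectation = sum(k * p for k, p in pmf.items())  # E(X)
tail = sum(p for k, p in pmf.items() if k > a)    # P(X > a)

# a * P(X > a) is dominated by E(X), exactly as in the chain above.
assert a * tail <= expectation
print(f"a*P(X > {a}) = {a * tail:.3f} <= E(X) = {expectation:.3f}")
```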


Corollaries


Chebyshev's inequality

Chebyshev's inequality uses the variance to bound the probability that a random variable deviates far from its mean. Specifically,

:\operatorname{P}(|X - \operatorname{E}(X)| \geq a) \leq \frac{\operatorname{Var}(X)}{a^2},

for any a > 0. Here \operatorname{Var}(X) is the variance of X, defined as

: \operatorname{Var}(X) = \operatorname{E}\left[(X - \operatorname{E}(X))^2\right].

Chebyshev's inequality follows from Markov's inequality by considering the random variable (X - \operatorname{E}(X))^2 and the constant a^2, for which Markov's inequality reads

: \operatorname{P}\left((X - \operatorname{E}(X))^2 \ge a^2\right) \le \frac{\operatorname{E}\left[(X - \operatorname{E}(X))^2\right]}{a^2}.

This argument can be summarized (where "MI" indicates use of Markov's inequality):

:\operatorname{P}(|X - \operatorname{E}(X)| \geq a) = \operatorname{P}\left((X - \operatorname{E}(X))^2 \geq a^2\right) \,\overset{\text{MI}}{\leq}\, \frac{\operatorname{E}\left[(X - \operatorname{E}(X))^2\right]}{a^2} = \frac{\operatorname{Var}(X)}{a^2}.
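A short sketch of this derivation in practice, comparing the empirical deviation probability of a sample with the Chebyshev bound \operatorname{Var}(X)/a^2 (the normal distribution with mean 5 and standard deviation 2 is an illustrative assumption):

```python
import random

# Sketch of Chebyshev's bound as Markov applied to (X - E(X))^2.
# The normal distribution with mean 5 and standard deviation 2 is
# an illustrative assumption.
random.seed(0)
xs = [random.gauss(5.0, 2.0) for _ in range(100_000)]
n = len(xs)

mean = sum(xs) / n
var = sum((x - mean) ** 2 for x in xs) / n

a = 4.0
empirical = sum(abs(x - mean) >= a for x in xs) / n
chebyshev = var / a**2  # Var(X)/a^2

print(f"P(|X - E(X)| >= {a}) ~= {empirical:.4f}")  # roughly 0.0455
print(f"Chebyshev bound:        {chebyshev:.4f}")  # roughly 0.25
```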


Other corollaries

#The "monotonic" result can be demonstrated by: #:\operatorname P (, X, \ge a) = \operatorname P \big(\varphi(, X, ) \ge \varphi(a)\big) \,\overset\, \frac #: #The result that, for a nonnegative random variable , the
quantile function In probability and statistics, the quantile function is a function Q: ,1\mapsto \mathbb which maps some probability x \in ,1/math> of a random variable v to the value of the variable y such that P(v\leq y) = x according to its probability distr ...
of satisfies: #:Q_X(1-p) \leq \frac , #:the proof using #:p \leq \operatorname P(X \geq Q_X(1-p)) \,\overset\, \frac . #: #Let M \succeq 0 be a self-adjoint matrix-valued random variable and A \succ 0 . Then #: \operatorname(M \npreceq A) \leq \operatorname(\operatorname E (X) A^) #:which can be proved similarly.
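A numerical sketch of the quantile-function corollary, Q_X(1-p) \leq \operatorname{E}(X)/p, using an empirical quantile of an exponential sample (an illustrative assumption):

```python
import random

# Numerical sketch of the quantile-function corollary
# Q_X(1 - p) <= E(X)/p, using an empirical quantile of an
# exponential sample (an illustrative assumption).
random.seed(0)
xs = sorted(random.expovariate(1.0) for _ in range(100_000))
mean = sum(xs) / len(xs)

p = 0.05
quantile = xs[int((1 - p) * len(xs))]  # empirical Q_X(1 - p)
bound = mean / p                       # E(X)/p

print(f"Q_X(0.95) ~= {quantile:.3f}")  # roughly ln(20) ~ 3.0
print(f"E(X)/p     = {bound:.3f}")     # 20
assert quantile <= bound
```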


Examples

Assuming no income is negative, Markov's inequality shows that no more than 10% (1/10) of the population can have more than 10 times the average income.

Another simple example: Andrew makes 4 mistakes on average on his Statistics course tests. The best upper bound on the probability that Andrew will make at least 10 mistakes is 0.4, since

:\operatorname{P}(X \geq 10) \leq \frac{\operatorname{E}(X)}{10} = \frac{4}{10}.

Note that Andrew might make exactly 10 mistakes with probability 0.4 and no mistakes with probability 0.6; the expectation is then exactly 4 mistakes.
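The arithmetic of the second example, spelled out as a short check:

```python
# The mistakes example as arithmetic: the Markov bound and a
# two-point distribution that attains it.
mean_mistakes = 4
a = 10
bound = mean_mistakes / a  # P(X >= 10) <= 4/10
print(f"Markov bound: {bound}")  # 0.4

# Equality case: 10 mistakes with probability 0.4, 0 mistakes with
# probability 0.6, giving E(X) = 10*0.4 + 0*0.6 = 4.
expectation = 10 * 0.4 + 0 * 0.6
assert expectation == mean_mistakes
```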


See also

* Paley–Zygmund inequality – a corresponding lower bound
* Concentration inequality – a summary of tail bounds on random variables


References


External links


The formal proof of Markov's inequality in the Mizar system.