In probability theory, Hoeffding's inequality provides an upper bound on the probability that the sum of bounded independent random variables deviates from its expected value by more than a certain amount. Hoeffding's inequality was proven by Wassily Hoeffding in 1963.
Hoeffding's inequality is a special case of the Azuma–Hoeffding inequality and McDiarmid's inequality. It is similar to the Chernoff bound, but tends to be less sharp, in particular when the variance of the random variables is small. It is similar to, but incomparable with, one of Bernstein's inequalities.
Statement
Let $X_1, \dots, X_n$ be independent random variables such that $a_i \le X_i \le b_i$ almost surely. Consider the sum of these random variables,
: $S_n = X_1 + \cdots + X_n.$
Then Hoeffding's theorem states that, for all $t > 0$,
: $\operatorname{P}\left(S_n - \operatorname{E}[S_n] \ge t\right) \le \exp\left(-\frac{2t^2}{\sum_{i=1}^n (b_i - a_i)^2}\right)$
: $\operatorname{P}\left(\left|S_n - \operatorname{E}[S_n]\right| \ge t\right) \le 2\exp\left(-\frac{2t^2}{\sum_{i=1}^n (b_i - a_i)^2}\right)$
Here $\operatorname{E}[S_n]$ is the expected value of $S_n$.
Note that the inequalities also hold when the $X_i$ have been obtained using sampling without replacement; in this case the random variables are no longer independent. A proof of this statement can be found in Hoeffding's paper. Slightly sharper bounds are known for the case of sampling without replacement.
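As a rough illustration of the statement, the following Python sketch evaluates the bound $\exp\left(-2t^2/\sum_i (b_i - a_i)^2\right)$ and compares it with a simulated tail frequency. The function names `hoeffding_bound` and `empirical_tail`, the choice of uniformly distributed $X_i$, and the parameter values are assumptions made for this example only.

```python
import math
import random


def hoeffding_bound(t, ranges, two_sided=False):
    """Hoeffding bound for independent X_i with a_i <= X_i <= b_i.

    `ranges` is a list of (a_i, b_i) pairs. Returns the bound on
    P(S_n - E[S_n] >= t), or on P(|S_n - E[S_n]| >= t) if two_sided.
    """
    if t <= 0:
        raise ValueError("t must be positive")
    denom = sum((b - a) ** 2 for a, b in ranges)
    bound = math.exp(-2.0 * t * t / denom)
    # A probability never exceeds 1, so capping keeps the bound valid.
    return min(1.0, 2.0 * bound) if two_sided else bound


def empirical_tail(t, ranges, trials=5000, seed=0):
    """Simulated frequency of S_n - E[S_n] >= t for uniform X_i (illustrative)."""
    rng = random.Random(seed)
    mean = sum((a + b) / 2.0 for a, b in ranges)  # E[S_n] for uniform X_i
    hits = 0
    for _ in range(trials):
        s = sum(rng.uniform(a, b) for a, b in ranges)
        if s - mean >= t:
            hits += 1
    return hits / trials


ranges = [(0.0, 1.0)] * 50  # fifty variables, each bounded in [0, 1]
bound = hoeffding_bound(5.0, ranges)
freq = empirical_tail(5.0, ranges)
print(f"Hoeffding bound: {bound:.4f}, simulated frequency: {freq:.4f}")
```

The simulated frequency is typically far below the bound, which is expected: the inequality uses only the ranges $b_i - a_i$ and ignores the variance of the $X_i$.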
Generalization
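Let $Y_1, \dots, Y_n$ be independent observations such that $\operatorname{E}(Y_i) = 0$ and $a_i \le Y_i \le b_i$. Let $\varepsilon > 0$. Then, for any $t > 0$,
: $\operatorname{P}\left(\sum_{i=1}^n Y_i \ge \varepsilon\right) \le \exp\left(-t\varepsilon + \sum_{i=1}^n \frac{t^2 (b_i - a_i)^2}{8}\right)$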
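The link to the statement above can be made explicit with a short worked derivation. It assumes the generalized bound has the Chernoff form $\operatorname{P}\left(\sum_i Y_i \ge \varepsilon\right) \le \exp\left(-t\varepsilon + t^2 \sum_i (b_i - a_i)^2 / 8\right)$ for every $t > 0$; the shorthand $D$ and the function $g$ are introduced here for illustration only. Minimizing the exponent over the free parameter $t$ recovers the usual one-sided bound:

```latex
\begin{align*}
D &:= \sum_{i=1}^n (b_i - a_i)^2, \qquad
g(t) := -t\varepsilon + \frac{t^2 D}{8}, \\
g'(t) &= -\varepsilon + \frac{tD}{4} = 0
  \quad\Longrightarrow\quad t^\ast = \frac{4\varepsilon}{D}, \\
g(t^\ast) &= -\frac{4\varepsilon^2}{D} + \frac{16\varepsilon^2}{D^2}\cdot\frac{D}{8}
  = -\frac{2\varepsilon^2}{D}, \\
\operatorname{P}\Bigl(\sum_{i=1}^n Y_i \ge \varepsilon\Bigr)
  &\le \exp\Bigl(-\frac{2\varepsilon^2}{D}\Bigr).
\end{align*}
```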
Special Case: Bernoulli RVs
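Suppose $a_i = 0$ and $b_i = 1$ for all $i$. This can occur when the $X_i$ are independent Bernoulli random variables, though they need not be identically distributed. Then we get the inequality
: $\operatorname{P}\left(S_n - \operatorname{E}[S_n] \ge t\right) \le \exp\left(-\frac{2t^2}{n}\right)$
: $\operatorname{P}\left(\left|S_n - \operatorname{E}[S_n]\right| \ge t\right) \le 2\exp\left(-\frac{2t^2}{n}\right)$
or equivalently,
: $\operatorname{P}\left(\frac{S_n - \operatorname{E}[S_n]}{n} \ge t\right) \le \exp\left(-2nt^2\right)$
: $\operatorname{P}\left(\left|\frac{S_n - \operatorname{E}[S_n]}{n}\right| \ge t\right) \le 2\exp\left(-2nt^2\right)$
for all $t \ge 0$. This is a version of the additive Chernoff bound that is more general, since it allows for random variables that take values between zero and one, but also weaker, since the Chernoff bound gives a better tail bound when the random variables have small variance.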
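To illustrate that the Bernoulli variables need not be identically distributed, the sketch below computes the exact upper tail of a sum of independent Bernoulli($p_i$) variables by dynamic programming and compares it with the bound $\exp(-2t^2/n)$. The helper names and the particular success probabilities are hypothetical choices made for this example.

```python
import math


def poisson_binomial_pmf(probs):
    """Exact pmf of S_n = sum of independent Bernoulli(p_i) variables."""
    pmf = [1.0]  # distribution of the empty sum
    for p in probs:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1.0 - p)   # this trial fails
            nxt[k + 1] += mass * p       # this trial succeeds
        pmf = nxt
    return pmf


def exact_upper_tail(probs, t):
    """Exact P(S_n - E[S_n] >= t)."""
    mean = sum(probs)
    pmf = poisson_binomial_pmf(probs)
    # The small tolerance guards against floating-point error in `mean`.
    return sum(mass for k, mass in enumerate(pmf) if k - mean >= t - 1e-12)


def hoeffding_bernoulli(n, t):
    """Hoeffding bound exp(-2 t^2 / n) for variables with values in [0, 1]."""
    return math.exp(-2.0 * t * t / n)


# Forty coins with success probabilities spread between 0.1 and 0.9.
probs = [0.1 + 0.8 * i / 39 for i in range(40)]
for t in (2.0, 4.0, 6.0):
    exact = exact_upper_tail(probs, t)
    bound = hoeffding_bernoulli(len(probs), t)
    print(f"t={t}: exact={exact:.5f}, bound={bound:.5f}")
```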
General case of random variables bounded from above
Hoeffding's inequality can be extended to random variables that are bounded only from above.
Let $X_1, \dots, X_n$ be independent random variables such that $\operatorname{E}X_i = 0$ and $X_i \le b_i$ almost surely.
Denote by
: $C_i^2 = \begin{cases} \operatorname{E}X_i^2, & \text{if } \operatorname{E}X_i^2 \ge b_i^2, \\ \frac{1}{4}\left(b_i + \frac{\operatorname{E}X_i^2}{b_i}\right)^2, & \text{otherwise.} \end{cases}$
Hoeffding's inequality for random variables bounded from above states that, for all $t \ge 0$,
: $\operatorname{P}\left(\sum_{i=1}^n X_i \ge t\right) \le \exp\left(-\frac{t^2}{2\sum_{i=1}^n C_i^2}\right).$
In particular, if $\operatorname{E}X_i^2 \ge b_i^2$ for all $i$, then for all $t \ge 0$,
: $\operatorname{P}\left(\sum_{i=1}^n X_i \ge t\right) \le \exp\left(-\frac{t^2}{2\sum_{i=1}^n \operatorname{E}X_i^2}\right).$
General case of sub-Gaussian random variables
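The proof of Hoeffding's inequality can be generalized to any sub-Gaussian distribution. Recall that a random variable $X$ is called sub-Gaussian if, for all $t \ge 0$,
: $\operatorname{P}(|X| \ge t) \le 2e^{-ct^2}$
for some $c > 0$. For any bounded variable $X$, $\operatorname{P}(|X| \ge t) = 0 \le 2e^{-ct^2}$ for $t > T$, for some sufficiently large $T$. Moreover, $2e^{-cT^2} \le 2e^{-ct^2}$ for all $0 \le t \le T$, so taking $c = \log(2)/T^2$ yields
: $\operatorname{P}(|X| \ge t) \le 1 = 2e^{-cT^2} \le 2e^{-ct^2}$
for $0 \le t \le T$. So every bounded variable is sub-Gaussian.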
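The argument that bounded variables are sub-Gaussian can be checked numerically. The sketch below assumes the sub-Gaussian tail condition $\operatorname{P}(|X| \ge t) \le 2e^{-ct^2}$ with the constant $c = \log(2)/T^2$, and uses a variable uniform on $[-T, T]$ as an illustrative bounded variable; the function names are hypothetical.

```python
import math


def subgaussian_constant(T):
    """Constant c = log(2) / T^2, chosen so that 2 * exp(-c * T^2) = 1."""
    return math.log(2.0) / (T * T)


def uniform_tail(t, T):
    """Exact P(|X| >= t) for X uniform on [-T, T] (illustrative choice)."""
    if t <= 0:
        return 1.0
    return max(0.0, 1.0 - t / T)


T = 3.0
c = subgaussian_constant(T)
grid = [T * k / 100 for k in range(0, 201)]  # t from 0 to 2T
holds = all(uniform_tail(t, T) <= 2.0 * math.exp(-c * t * t) + 1e-12 for t in grid)
print(f"c = {c:.5f}, tail condition holds on grid: {holds}")
```

The constant obtained this way depends only on the bound $T$, so it is usually far from the best possible $c$ for a given distribution.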
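For a random variable $X$, the following norm is finite if and only if $X$ is sub-Gaussian:
: $\|X\|_{\psi_2} := \inf\left\{c > 0 : \operatorname{E}\left(e^{X^2/c^2}\right) \le 2\right\}.$
Let $X_1, \dots, X_n$ be zero-mean independent sub-Gaussian random variables. The general version of Hoeffding's inequality then states that
: $\operatorname{P}\left(\left|\sum_{i=1}^n X_i\right| \ge t\right) \le 2\exp\left(-\frac{ct^2}{\sum_{i=1}^n \|X_i\|_{\psi_2}^2}\right),$
where $c > 0$ is an absolute constant.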
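Equivalently, sub-Gaussian distributions can be defined through a variance proxy. If there exists some $s^2$ such that $\operatorname{E}\left[e^{(X - \operatorname{E}[X])\lambda}\right] \le e^{\lambda^2 s^2/2}$ for all $\lambda \in \mathbb{R}$, then $s^2$ is called a ''variance proxy'', and the smallest such $s^2$ is called the ''optimal variance proxy'', denoted by $\|X\|_{\mathrm{vp}}^2$. In this form, Hoeffding's inequality states
: $\operatorname{P}\left(S_n - \operatorname{E}[S_n] \ge t\right) \le \exp\left(-\frac{t^2}{2\sum_{i=1}^n \|X_i\|_{\mathrm{vp}}^2}\right)$
: $\operatorname{P}\left(\left|S_n - \operatorname{E}[S_n]\right| \ge t\right) \le 2\exp\left(-\frac{t^2}{2\sum_{i=1}^n \|X_i\|_{\mathrm{vp}}^2}\right)$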
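A simple case where the variance-proxy form can be checked exactly is a sum of independent centered Gaussian variables: the moment generating function of $N(0, \sigma^2)$ equals $e^{\lambda^2\sigma^2/2}$, so its optimal variance proxy is $\sigma^2$. The sketch below, with hypothetical function names and variances chosen only for illustration, compares the bound $\exp\left(-t^2 / (2\sum_i \sigma_i^2)\right)$ with the exact Gaussian tail.

```python
import math


def hoeffding_vp_bound(t, variance_proxies):
    """One-sided bound exp(-t^2 / (2 * sum of variance proxies))."""
    total = sum(variance_proxies)
    return math.exp(-t * t / (2.0 * total))


def gaussian_sum_tail(t, variances):
    """Exact P(S_n >= t) when S_n is a sum of independent N(0, sigma_i^2)."""
    total = sum(variances)  # the sum is N(0, total)
    return 0.5 * math.erfc(t / math.sqrt(2.0 * total))


variances = [0.5, 1.0, 2.5]  # for Gaussians, variance proxy = variance
for t in (1.0, 2.0, 4.0):
    exact = gaussian_sum_tail(t, variances)
    bound = hoeffding_vp_bound(t, variances)
    print(f"t={t}: exact={exact:.5f}, bound={bound:.5f}")
```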
Proof
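The proof relies on Hoeffding's lemma: if $X$ is a real random variable such that $X \in [a, b]$ almost surely, then for all $s \in \mathbb{R}$,
: $\operatorname{E}\left[e^{s(X - \operatorname{E}[X])}\right] \le \exp\left(\frac{s^2 (b - a)^2}{8}\right).$
The proof of Hoeffding's inequality then follows similarly to concentration inequalities like Chernoff bounds.
The same argument generalizes to sub-Gaussian distributions with a variance proxy.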
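The Chernoff-style argument can be sketched as follows. It assumes Hoeffding's lemma in the form $\operatorname{E}\left[e^{s(X - \operatorname{E}[X])}\right] \le \exp\left(s^2 (b - a)^2 / 8\right)$ for $X \in [a, b]$; the free parameter $s > 0$ and the shorthand $D$ are introduced here for illustration.

```latex
\begin{align*}
\operatorname{P}\bigl(S_n - \operatorname{E}[S_n] \ge t\bigr)
  &= \operatorname{P}\bigl(e^{s(S_n - \operatorname{E}[S_n])} \ge e^{st}\bigr) \\
  &\le e^{-st}\, \operatorname{E}\bigl[e^{s(S_n - \operatorname{E}[S_n])}\bigr]
     && \text{(Markov's inequality)} \\
  &= e^{-st} \prod_{i=1}^n \operatorname{E}\bigl[e^{s(X_i - \operatorname{E}[X_i])}\bigr]
     && \text{(independence)} \\
  &\le \exp\Bigl(-st + \frac{s^2 D}{8}\Bigr),
     \quad D := \sum_{i=1}^n (b_i - a_i)^2
     && \text{(Hoeffding's lemma)}.
\end{align*}
% The exponent is minimized at s = 4t / D, which gives exp(-2 t^2 / D).
```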
Usage
Confidence intervals
Hoeffding's inequality can be used to derive confidence intervals. Consider a coin that shows heads with probability $p$ and tails with probability $1 - p$. We toss the coin $n$ times, generating $n$ samples $X_1, \dots, X_n$ (which are i.i.d. Bernoulli random variables). The expected number of times the coin comes up heads is $pn$. Furthermore, the probability that the coin comes up heads at least $k$ times can be exactly quantified by the following expression:
: $\operatorname{P}(H(n) \ge k) = \sum_{i=k}^n \binom{n}{i} p^i (1 - p)^{n - i},$
where $H(n)$ is the number of heads in $n$ coin tosses.
When $k = (p + \varepsilon)n$ for some $\varepsilon > 0$, Hoeffding's inequality bounds this probability by a term that is exponentially small in $\varepsilon^2 n$:
: $\operatorname{P}(H(n) - pn \ge \varepsilon n) \le \exp\left(-2\varepsilon^2 n\right).$
Since this bound holds on both sides of the mean, Hoeffding's inequality implies that the number of heads that we see is concentrated around its mean, with exponentially small tail:
: $\operatorname{P}(|H(n) - pn| \ge \varepsilon n) \le 2\exp\left(-2\varepsilon^2 n\right).$
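For a concrete comparison, the sketch below evaluates the exact binomial tail $\operatorname{P}(H(n) \ge k)$ and the Hoeffding bound $\exp(-2\varepsilon^2 n)$ with $\varepsilon = k/n - p$. The function names and the values $n = 100$, $p = 0.5$, $k = 60$ are assumptions chosen for illustration; `math.comb` requires Python 3.8 or later.

```python
import math


def binomial_upper_tail(n, p, k):
    """Exact P(H(n) >= k) for n independent tosses with heads probability p."""
    return sum(math.comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k, n + 1))


def hoeffding_coin_bound(n, eps):
    """One-sided Hoeffding bound exp(-2 * eps^2 * n)."""
    return math.exp(-2.0 * eps * eps * n)


n, p, k = 100, 0.5, 60      # illustrative values
eps = k / n - p             # so that k = (p + eps) * n
exact = binomial_upper_tail(n, p, k)
bound = hoeffding_coin_bound(n, eps)
print(f"exact tail: {exact:.5f}, Hoeffding bound: {bound:.5f}")
```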
Thinking of $\overline{X} = \frac{1}{n}H(n)$ as the "observed" mean, this probability can be interpreted as the level of significance $\alpha$ (probability of making an error) for a confidence interval around $p$ of size $2\varepsilon$:
: $\alpha = \operatorname{P}\left(\overline{X} \notin [p - \varepsilon, p + \varepsilon]\right) \le 2e^{-2\varepsilon^2 n}.$
Requiring the right-hand side to be at most $\alpha$ and solving for $n$ gives
: $n \ge \frac{\log(2/\alpha)}{2\varepsilon^2}.$
Therefore, by this bound, $\log(2/\alpha)/(2\varepsilon^2)$ samples suffice to obtain a $(1 - \alpha)$-confidence interval $p \pm \varepsilon$.
Hence, the number of samples needed grows only logarithmically in $1/\alpha$ and quadratically in $1/\varepsilon$. Note that there are more efficient methods of estimating a confidence interval.
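The sample-size rule $n \ge \log(2/\alpha) / (2\varepsilon^2)$ is easy to evaluate directly. The following is a minimal sketch; the function name `hoeffding_sample_size` and the example values of $\varepsilon$ and $\alpha$ are chosen here for illustration.

```python
import math


def hoeffding_sample_size(eps, alpha):
    """Smallest integer n with 2 * exp(-2 * eps^2 * n) <= alpha."""
    if eps <= 0 or not (0 < alpha < 1):
        raise ValueError("need eps > 0 and 0 < alpha < 1")
    return math.ceil(math.log(2.0 / alpha) / (2.0 * eps * eps))


for eps, alpha in [(0.1, 0.05), (0.05, 0.05), (0.1, 0.005)]:
    n = hoeffding_sample_size(eps, alpha)
    print(f"eps={eps}, alpha={alpha}: n >= {n}")
```

Halving $\varepsilon$ roughly quadruples the required $n$, whereas making $\alpha$ ten times smaller adds only about $\log(10)/(2\varepsilon^2)$ further samples.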
See also
* Concentration inequality – a summary of tail-bounds on random variables.
* Hoeffding's lemma
* Bernstein inequalities (probability theory)