In probability theory, Hoeffding's inequality provides an upper bound on the probability that the sum of bounded independent random variables deviates from its expected value by more than a certain amount. Hoeffding's inequality was proven by Wassily Hoeffding in 1963.

Hoeffding's inequality is a special case of the Azuma–Hoeffding inequality and McDiarmid's inequality. It is similar to the Chernoff bound, but tends to be less sharp, in particular when the variance of the random variables is small. It is similar to, but incomparable with, one of Bernstein's inequalities.


Statement

Let X_1, \ldots, X_n be n independent random variables such that a_i \leq X_i \leq b_i almost surely. Consider the sum of these random variables,

:S_n = X_1 + \cdots + X_n.

Then Hoeffding's theorem states that, for all t > 0,

:\begin{align}
\operatorname{P}\left(S_n - \mathrm{E}\left[S_n\right] \geq t \right) &\leq \exp\left(-\frac{2t^2}{\sum_{i=1}^n (b_i - a_i)^2}\right) \\
\operatorname{P}\left(\left| S_n - \mathrm{E}\left[S_n\right] \right| \geq t \right) &\leq 2\exp\left(-\frac{2t^2}{\sum_{i=1}^n (b_i - a_i)^2}\right)
\end{align}

Here \mathrm{E}\left[S_n\right] is the expected value of S_n. Note that the inequalities also hold when the X_i have been obtained using sampling without replacement; in this case the random variables are not independent anymore. A proof of this statement can be found in Hoeffding's paper. For slightly better bounds in the case of sampling without replacement, see for instance the paper by Serfling (1974).
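The two-sided bound above can be illustrated numerically. The following Monte Carlo sketch (not part of the original article; the uniform distribution of the X_i and all parameter values are assumptions chosen purely for illustration) compares the empirical one-sided tail of S_n with the corresponding Hoeffding bound:

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)
n, trials, t = 50, 200_000, 10.0

# Bounded independent variables: here X_i ~ Uniform[a_i, b_i], an arbitrary illustrative choice.
a = np.zeros(n)
b = rng.uniform(1.0, 3.0, size=n)

X = rng.uniform(a, b, size=(trials, n))           # each row is one draw of (X_1, ..., X_n)
S = X.sum(axis=1)
ES = ((a + b) / 2).sum()                          # exact E[S_n] for uniform X_i

empirical = (S - ES >= t).mean()                  # Monte Carlo estimate of P(S_n - E[S_n] >= t)
bound = np.exp(-2 * t**2 / ((b - a) ** 2).sum())  # Hoeffding's one-sided bound

print(f"empirical tail ~ {empirical:.4f}, Hoeffding bound = {bound:.4f}")
</syntaxhighlight>

As expected, the bound holds, though for any fixed distribution it is typically far from tight.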


Example

Suppose a_i = 0 and b_i = 1 for all i. This can occur when the X_i are independent Bernoulli random variables, though they need not be identically distributed. Then we get the inequality

:\operatorname{P}\left(S_n - \mathrm{E}\left[S_n\right] \geq t\right) \leq \exp\left(-2t^2/n\right)

for all t \geq 0. This is a version of the additive Chernoff bound which is more general, since it allows for random variables that take values between zero and one, but also weaker, since the Chernoff bound gives a better tail bound when the random variables have small variance.
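For a concrete sense of how tight this is, the following sketch (illustrative values of n, p, and t, not taken from the article) compares the exact binomial tail with the bound when the X_i are i.i.d. Bernoulli(p):

<syntaxhighlight lang="python">
import numpy as np
from scipy.stats import binom

# Illustrative parameters (assumptions, not from the article).
n, p, t = 100, 0.5, 10

# P(S_n - np >= t) = P(S_n >= np + t); sf(k) = P(S_n > k), so use k = np + t - 1 (an integer here).
exact = binom.sf(n * p + t - 1, n, p)
bound = np.exp(-2 * t**2 / n)

print(f"exact tail = {exact:.4f}, Hoeffding bound = {bound:.4f}")  # roughly 0.028 vs 0.135
</syntaxhighlight>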


General case of sub-Gaussian random variables

The proof of Hoeffding's inequality can be generalized to any sub-Gaussian distribution. In fact, the main lemma used in the proof, Hoeffding's lemma, implies that bounded random variables are sub-Gaussian.

A random variable X is called sub-Gaussian if

:\operatorname{P}(|X| \geq t) \leq 2e^{-ct^2}

for some c > 0. For a random variable X, the following norm is finite if and only if X is sub-Gaussian:

:\Vert X \Vert_{\psi_2} := \inf\left\{ c \geq 0 : \mathrm{E}\left[ e^{X^2/c^2} \right] \leq 2 \right\}.

Then let X_1, \ldots, X_n be zero-mean independent sub-Gaussian random variables; the general version of Hoeffding's inequality states that:

:\operatorname{P}\left( \left| \sum_{i=1}^n X_i \right| \geq t \right) \leq 2\exp\left( -\frac{ct^2}{\sum_{i=1}^n \Vert X_i \Vert_{\psi_2}^2} \right),

where c > 0 is an absolute constant.
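As a rough illustration of the norm appearing in the denominator, the following sketch (an assumption-laden example, not from the article; the choice of distribution and the bracketing interval are arbitrary) estimates \Vert X \Vert_{\psi_2} for a bounded variable by Monte Carlo and root-finding on the defining condition \mathrm{E}[e^{X^2/c^2}] \leq 2:

<syntaxhighlight lang="python">
import numpy as np
from scipy.optimize import brentq

# Estimate ||X||_{psi_2} = inf{ c >= 0 : E[exp(X^2 / c^2)] <= 2 } for X ~ Uniform[-1, 1].
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=200_000)

def excess(c):
    # Positive while c is too small, negative once E[exp(X^2/c^2)] drops below 2.
    return np.mean(np.exp(x**2 / c**2)) - 2.0

c_hat = brentq(excess, 0.3, 5.0)   # numerical root of the Monte Carlo estimate
print(c_hat)                        # finite (and below 1 here), as Hoeffding's lemma guarantees for bounded X
</syntaxhighlight>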


Proof

The proof of Hoeffding's inequality follows similarly to concentration inequalities like Chernoff bounds. The main difference is the use of Hoeffding's lemma:

::Suppose X is a real random variable such that X \in \left[a, b\right] almost surely. Then
::::\mathrm{E}\left[ e^{s(X - \mathrm{E}[X])} \right] \leq \exp\left(\tfrac{1}{8} s^2 (b-a)^2\right).

Using this lemma, we can prove Hoeffding's inequality. As in the theorem statement, suppose X_1, \ldots, X_n are n independent random variables such that X_i \in [a_i, b_i] almost surely for all i, and let S_n = X_1 + \cdots + X_n.

Then for s, t > 0, Markov's inequality and the independence of the X_i imply:

:\begin{align}
\operatorname{P}\left(S_n - \mathrm{E}\left[S_n\right] \geq t \right)
&= \operatorname{P}\left(\exp\left(s\left(S_n - \mathrm{E}\left[S_n\right]\right)\right) \geq \exp(st) \right) \\
&\leq \exp(-st)\, \mathrm{E}\left[ \exp\left(s\left(S_n - \mathrm{E}\left[S_n\right]\right)\right) \right] \\
&= \exp(-st) \prod_{i=1}^n \mathrm{E}\left[ \exp\left(s\left(X_i - \mathrm{E}\left[X_i\right]\right)\right) \right] \\
&\leq \exp(-st) \prod_{i=1}^n \exp\left(\frac{s^2 (b_i - a_i)^2}{8}\right) \\
&= \exp\left(-st + \tfrac{1}{8} s^2 \sum_{i=1}^n (b_i - a_i)^2\right)
\end{align}

This upper bound is the best for the value of s minimizing the value inside the exponential. This can be done easily by optimizing a quadratic, giving

:s = \frac{4t}{\sum_{i=1}^n (b_i - a_i)^2}.

Writing the above bound for this value of s, we get the desired bound:

:\operatorname{P}\left(S_n - \mathrm{E}\left[S_n\right] \geq t \right) \leq \exp\left(-\frac{2t^2}{\sum_{i=1}^n (b_i - a_i)^2}\right).
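The optimization step at the end can be double-checked symbolically. A small sketch (illustrative, with D standing in for \sum_{i=1}^n (b_i - a_i)^2):

<syntaxhighlight lang="python">
import sympy as sp

# Minimize the exponent f(s) = -s*t + s**2 * D / 8 over s, where D = sum_i (b_i - a_i)^2.
s, t, D = sp.symbols('s t D', positive=True)
f = -s * t + s**2 * D / 8

s_star = sp.solve(sp.diff(f, s), s)[0]   # stationary point of the quadratic
print(s_star)                            # 4*t/D
print(sp.simplify(f.subs(s, s_star)))    # -2*t**2/D, i.e. the exponent in exp(-2 t^2 / D)
</syntaxhighlight>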


Usage


Confidence intervals

Hoeffding's inequality can be used to derive confidence intervals. We consider a coin that shows heads with probability p and tails with probability 1 - p. We toss the coin n times, generating samples X_1, \ldots, X_n (which are i.i.d. Bernoulli random variables). The expected number of times the coin comes up heads is pn. Furthermore, the probability that the coin comes up heads at least k times can be exactly quantified by the following expression:

:\operatorname{P}(H(n) \geq k) = \sum_{i=k}^n \binom{n}{i} p^i (1-p)^{n-i},

where H(n) is the number of heads in n coin tosses.

When k = (p + \varepsilon)n for some \varepsilon > 0, Hoeffding's inequality bounds this probability by a term that is exponentially small in \varepsilon^2 n:

:\operatorname{P}(H(n) - pn > \varepsilon n) \leq \exp\left(-2\varepsilon^2 n\right).

Since this bound holds on both sides of the mean, Hoeffding's inequality implies that the number of heads that we see is concentrated around its mean, with exponentially small tail.

:\operatorname{P}\left(|H(n) - pn| > \varepsilon n\right) \leq 2\exp\left(-2\varepsilon^2 n\right).

Thinking of \overline{X} = \frac{1}{n} H(n) as the "observed" mean, this probability can be interpreted as the level of significance \alpha (probability of making an error) for a confidence interval around p of size 2\varepsilon:

:\alpha = \operatorname{P}\left( \overline{X} \notin [p - \varepsilon, p + \varepsilon] \right) \leq 2e^{-2\varepsilon^2 n}.

Solving the above for n gives us the following:

:n \geq \frac{\log(2/\alpha)}{2\varepsilon^2}.

Therefore, we require at least \textstyle \frac{\log(2/\alpha)}{2\varepsilon^2} samples to acquire a \textstyle (1-\alpha)-confidence interval \textstyle p \pm \varepsilon. Hence, the cost of acquiring the confidence interval is sublinear in terms of confidence level and quadratic in terms of precision. Note that there are more efficient methods of estimating a confidence interval.
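Putting numbers to the sample-size formula, the sketch below (illustrative values of \alpha, \varepsilon, and p, not part of the original text) computes the required n and empirically checks that the observed mean lands within \varepsilon of p at least a (1-\alpha) fraction of the time:

<syntaxhighlight lang="python">
import numpy as np

# Illustrative parameters (assumptions): 95% confidence, half-width 0.05, true heads probability 0.3.
alpha, eps, p = 0.05, 0.05, 0.3
n = int(np.ceil(np.log(2 / alpha) / (2 * eps**2)))   # n >= log(2/alpha) / (2 eps^2)
print(n)                                             # 738 for these values

rng = np.random.default_rng(0)
flips = rng.random((20_000, n)) < p                  # 20,000 simulated coin-tossing experiments
covered = np.abs(flips.mean(axis=1) - p) <= eps      # observed mean within eps of p
print(covered.mean())                                # empirical coverage, at least 1 - alpha = 0.95
</syntaxhighlight>

The empirical coverage is typically well above 95%, reflecting the looseness of the Hoeffding-based sample size.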


See also

* Concentration inequality – a summary of tail-bounds on random variables.
* Hoeffding's lemma
* Bernstein inequalities (probability theory)

