Hoeffding's Inequality





Hoeffding's Inequality
In probability theory, Hoeffding's inequality provides an upper bound on the probability that the sum of bounded independent random variables deviates from its expected value by more than a certain amount. Hoeffding's inequality was proven by Wassily Hoeffding in 1963. It is a special case of the Azuma–Hoeffding inequality and McDiarmid's inequality. It is similar to the Chernoff bound, but tends to be less sharp, in particular when the variance of the random variables is small. It is similar to, but incomparable with, one of Bernstein's inequalities.

Statement
Let X_1, \ldots, X_n be independent random variables such that a_i \leq X_i \leq b_i almost surely. Consider the sum of these random variables,
:S_n = X_1 + \cdots + X_n.
Then Hoeffding's theorem states that, for all t > 0,
:\begin{align}
\operatorname{P}\left(S_n - \mathrm{E}\left[S_n\right] \geq t\right) &\leq \exp\left(-\frac{2t^2}{\sum_{i=1}^n (b_i - a_i)^2}\right) \\
\operatorname{P}\left(\left|S_n - \mathrm{E}\left[S_n\right]\right| \geq t\right) &\leq 2\exp\left(-\frac{2t^2}{\sum_{i=1}^n (b_i - a_i)^2}\right)
\end{align} ...
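As an illustration (not part of the original article), the following minimal Python sketch compares the empirical deviation frequency of a sum of bounded random variables against the one-sided Hoeffding bound above. The choice of Uniform[0,1] summands and the constants n, t and the number of trials are assumptions made for this example.

  import math, random

  # Hypothetical example: n i.i.d. Uniform[0,1] summands, so a_i = 0 and b_i = 1.
  n, t, trials = 100, 10.0, 20_000
  random.seed(0)

  # Hoeffding bound for P(S_n - E[S_n] >= t): exp(-2 t^2 / sum (b_i - a_i)^2).
  bound = math.exp(-2 * t**2 / (n * 1.0**2))

  # Empirical frequency of the deviation event.
  mean_sn = n * 0.5
  hits = sum(sum(random.random() for _ in range(n)) - mean_sn >= t for _ in range(trials))
  print(f"empirical: {hits / trials:.5f}   Hoeffding bound: {bound:.5f}")

The empirical frequency should come out well below the bound, which is typical: Hoeffding ignores the variance of the summands and is therefore often loose.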



Probability Theory
Probability theory is the branch of mathematics concerned with probability. Although there are several different probability interpretations, probability theory treats the concept in a rigorous mathematical manner by expressing it through a set of axioms. Typically these axioms formalise probability in terms of a probability space, which assigns a measure taking values between 0 and 1, termed the probability measure, to a set of outcomes called the sample space. Any specified subset of the sample space is called an event. Central subjects in probability theory include discrete and continuous random variables, probability distributions, and stochastic processes (which provide mathematical abstractions of non-deterministic or uncertain processes or measured quantities that may either be single occurrences or evolve over time in a random fashion). Although it is not possible to perfectly predict random events, much can be said about their behavior. Two major results in probability ...


Almost Surely
In probability theory, an event is said to happen almost surely (sometimes abbreviated as a.s.) if it happens with probability 1 (or Lebesgue measure 1). In other words, the set of possible exceptions may be non-empty, but it has probability 0. The concept is analogous to the concept of "almost everywhere" in measure theory. In probability experiments on a finite sample space, there is no difference between ''almost surely'' and ''surely'' (since having a probability of 1 often entails including all the sample points). However, this distinction becomes important when the sample space is an infinite set, because an infinite set can have non-empty subsets of probability 0. Some examples of the use of this concept include the strong and uniform versions of the law of large numbers, and the continuity of the paths of Brownian motion. The terms almost certainly (a.c.) and almost always (a.a.) are also used. Almost never describes the opposite of ''almost surely'': an event that h ...
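As a toy illustration (my own example, not from the article), a draw from a continuous Uniform[0,1] distribution almost surely misses any single prescribed point, even though that outcome lies in the sample space. The target value 0.5 is an arbitrary choice for the example.

  import random

  random.seed(1)
  target = 0.5          # a single point in [0, 1] has Lebesgue measure 0
  draws = 1_000_000
  # Note: random.random() returns discrete floating-point values, so this is only
  # an approximation of the continuous model; the event is possible but "almost never" occurs.
  hits = sum(1 for _ in range(draws) if random.random() == target)
  print(f"exact hits of {target} in {draws} draws: {hits}")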


Concentration Inequality
In probability theory, concentration inequalities provide bounds on how a random variable deviates from some value (typically, its expected value). The law of large numbers of classical probability theory states that sums of independent random variables are, under very mild conditions, close to their expectation with a large probability. Such sums are the most basic examples of random variables concentrated around their mean. Recent results show that such behavior is shared by other functions of independent random variables. Concentration inequalities can be sorted according to how much information about the random variable is needed in order to use them.

Markov's inequality
Let X be a random variable that is non-negative (almost surely). Then, for every constant a > 0,
:\Pr(X \geq a) \leq \frac{\mathrm{E}(X)}{a}.
Note the following extension to Markov's inequality: if \Phi is a strictly increasing and non-negative function, then
:\Pr(X \geq a) = \Pr(\Phi(X) \geq \Phi(a)) \leq \frac{\mathrm{E}(\Phi(X))}{\Phi(a)}.
Cheb ...
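A minimal sketch (my own example) checking Markov's inequality stated above and its \Phi-extension with \Phi(x) = x^2 on an Exp(1) random variable; the distribution, threshold and sample size are assumptions chosen for the example.

  import random

  random.seed(2)
  a, trials = 3.0, 200_000
  samples = [random.expovariate(1.0) for _ in range(trials)]   # Exp(1): E[X] = 1, E[X^2] = 2

  empirical = sum(x >= a for x in samples) / trials
  markov = 1.0 / a                       # E[X] / a
  phi_ext = 2.0 / a**2                   # E[Phi(X)] / Phi(a) with Phi(x) = x^2
  print(f"P(X >= {a}) ~ {empirical:.4f}, Markov bound {markov:.4f}, x^2-extension bound {phi_ext:.4f}")

Both bounds hold, and the x^2-extension is tighter here, illustrating how using more information about the variable (a second moment) sharpens the concentration bound.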


Independent And Identically Distributed Random Variables
In probability theory and statistics, a collection of random variables is independent and identically distributed if each random variable has the same probability distribution as the others and all are mutually independent. This property is usually abbreviated as ''i.i.d.'', ''iid'', or ''IID''. IID was first defined in statistics and finds application in different fields such as data mining and signal processing. Introduction In statistics, we commonly deal with random samples. A random sample can be thought of as a set of objects that are chosen randomly. Or, more formally, it’s “a sequence of independent, identically distributed (IID) random variables”. In other words, the terms ''random sample'' and ''IID'' are basically one and the same. In statistics, we usually say “random sample,” but in probability it’s more common to say “IID.” * Identically Distributed means that there are no overall trends–the distribution doesn’t fluctuate and all items in t ...
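A small sketch (illustrative only, not from the article) contrasting an i.i.d. sample with a sequence whose terms are generated by identical steps but are not independent; the normal increments and the random walk used as the counterexample are assumptions for the example.

  import random

  random.seed(3)
  n = 5

  # i.i.d.: each draw comes from the same N(0, 1) distribution, independently of the others.
  iid_sample = [random.gauss(0, 1) for _ in range(n)]

  # Not independent: each term depends on the previous one (a random walk),
  # so the joint distribution does not factor into identical, independent marginals.
  walk, x = [], 0.0
  for _ in range(n):
      x += random.gauss(0, 1)
      walk.append(x)

  print("i.i.d. sample:", [round(v, 2) for v in iid_sample])
  print("random walk  :", [round(v, 2) for v in walk])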



Confidence Interval
In frequentist statistics, a confidence interval (CI) is a range of estimates for an unknown parameter. A confidence interval is computed at a designated ''confidence level''; the 95% confidence level is most common, but other levels, such as 90% or 99%, are sometimes used. The confidence level represents the long-run proportion of corresponding CIs that contain the true value of the parameter. For example, out of all intervals computed at the 95% level, 95% of them should contain the parameter's true value. Factors affecting the width of the CI include the sample size, the variability in the sample, and the confidence level. All else being the same, a larger sample produces a narrower confidence interval, greater variability in the sample produces a wider confidence interval, and a higher confidence level produces a wider confidence interval. Definition Let X be a random sample from a probability distribution with statistical parameter \theta, which is a quantity to be estimate ...
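The long-run coverage interpretation above can be illustrated with a short simulation (my own sketch; the normal data with known variance and the sample size are assumptions chosen for simplicity): repeatedly build 95% intervals for a known mean and count how often they contain it.

  import random, math

  random.seed(4)
  true_mu, sigma, n, reps = 10.0, 2.0, 30, 5_000
  z = 1.959964                     # 97.5% quantile of the standard normal
  covered = 0
  for _ in range(reps):
      sample = [random.gauss(true_mu, sigma) for _ in range(n)]
      xbar = sum(sample) / n
      half = z * sigma / math.sqrt(n)      # known-sigma z-interval
      covered += (xbar - half) <= true_mu <= (xbar + half)
  print(f"empirical coverage of 95% CIs: {covered / reps:.3f}")

The printed coverage should be close to 0.95, matching the stated long-run proportion.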



Markov's Inequality
In probability theory, Markov's inequality gives an upper bound for the probability that a non-negative function of a random variable is greater than or equal to some positive constant. It is named after the Russian mathematician Andrey Markov, although it appeared earlier in the work of Pafnuty Chebyshev (Markov's teacher), and many sources, especially in analysis, refer to it as Chebyshev's inequality (sometimes calling it the first Chebyshev inequality, while referring to Chebyshev's inequality as the second Chebyshev inequality) or Bienaymé's inequality. Markov's inequality (and other similar inequalities) relate probabilities to expectations, and provide (frequently loose but still useful) bounds for the cumulative distribution function of a random variable. Statement If X is a nonnegative random variable and a > 0, then the probability that X is at least a is at most the ...
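A worked numeric check (the discrete distribution below is hypothetical, chosen just for the example): for a small nonnegative random variable, compare the exact tail probability with E[X]/a at a few thresholds.

  # Hypothetical discrete distribution: values and their probabilities.
  dist = {0: 0.5, 1: 0.3, 4: 0.15, 20: 0.05}

  expectation = sum(x * p for x, p in dist.items())          # E[X] = 1.9
  for a in (2, 5, 10):
      exact = sum(p for x, p in dist.items() if x >= a)      # P(X >= a)
      print(f"a={a:2d}  P(X>=a)={exact:.3f}  Markov bound={expectation / a:.3f}")

Every exact tail probability sits below the corresponding bound, often by a wide margin, which reflects how loose Markov's inequality can be.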




Hoeffding's Lemma
In probability theory, Hoeffding's lemma is an inequality that bounds the moment-generating function of any bounded random variable. It is named after the Finnish–American mathematical statistician Wassily Hoeffding. The proof of Hoeffding's lemma uses Taylor's theorem and Jensen's inequality. Hoeffding's lemma is itself used in the proof of McDiarmid's inequality.

Statement of the lemma
Let ''X'' be any real-valued random variable such that a \leq X \leq b almost surely, i.e. with probability one. Then, for all \lambda \in \mathbb{R},
:\mathbb{E}\left[e^{\lambda X}\right] \leq \exp\Big(\lambda\mathbb{E}[X] + \frac{\lambda^2 (b-a)^2}{8}\Big),
or equivalently,
:\mathbb{E}\left[e^{\lambda (X - \mathbb{E}[X])}\right] \leq \exp\Big(\frac{\lambda^2 (b-a)^2}{8}\Big).

Proof
Without loss of generality, by replacing X by X - \mathbb{E}[X], we can assume \mathbb{E}[X] = 0, so that a \leq 0 \leq b. Since e^{\lambda x} is a convex function of x, we have that for all x \in [a,b],
:e^{\lambda x} \leq \frac{b-x}{b-a}e^{\lambda a} + \frac{x-a}{b-a}e^{\lambda b}.
So,
:\begin{align}
\mathbb{E}\left[e^{\lambda X}\right] &\leq \frac{b-\mathbb{E}[X]}{b-a}e^{\lambda a} + \frac{\mathbb{E}[X]-a}{b-a}e^{\lambda b} \\
&= \frac{b}{b-a}e^{\lambda a} + \frac{-a}{b-a}e^{\lambda b} ...
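A quick numerical check of the lemma (with a hypothetical two-point distribution on [a, b] = [0, 1] and a few arbitrary values of lambda chosen for the example): the exact moment-generating function of the centered variable stays below exp(lambda^2 (b-a)^2 / 8).

  import math

  # Hypothetical bounded variable: X = 1 with prob p, X = 0 with prob 1 - p, so a = 0, b = 1.
  p, a, b = 0.3, 0.0, 1.0
  mean = p
  for lam in (-4.0, -1.0, 0.5, 2.0, 5.0):
      # Exact MGF of the centered variable X - E[X].
      mgf_centered = (1 - p) * math.exp(lam * (0 - mean)) + p * math.exp(lam * (1 - mean))
      bound = math.exp(lam**2 * (b - a)**2 / 8)
      print(f"lambda={lam:5.1f}  E[e^(lam(X-EX))]={mgf_centered:8.3f}  bound={bound:8.3f}")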


Sub-Gaussian Distribution
In probability theory, a sub-Gaussian distribution is a probability distribution with strong tail decay. Informally, the tails of a sub-Gaussian distribution are dominated by (i.e. decay at least as fast as) the tails of a Gaussian. This property gives sub-Gaussian distributions their name. Formally, the probability distribution of a random variable ''X'' is called sub-Gaussian if there is a positive constant ''C'' such that for every t \geq 0,
:\operatorname{P}(|X| \geq t) \leq 2\exp\left(-t^2/C^2\right).

Sub-Gaussian properties
Let ''X'' be a random variable. The following conditions are equivalent:
# \operatorname{P}(|X| \geq t) \leq 2\exp\left(-t^2/K_1^2\right) for all t \geq 0, where K_1 is a positive constant;
# \operatorname{E}\left[\exp\left(X^2/K_2^2\right)\right] \leq 2, where K_2 is a positive constant;
# \operatorname{E}|X|^p \leq 2K_3^p\,\Gamma\left(\frac{p}{2}+1\right) for all p \geq 1, where K_3 is a positive constant.
''Proof.'' (1) \implies (3): By the layer cake representation,
:\begin{align}
\operatorname{E}|X|^p &= \int_0^\infty \operatorname{P}(|X| ...
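A deterministic check (my own example) of conditions (1) and (2) above for a Rademacher variable, which is bounded and hence sub-Gaussian; the constants K1 = K2 = 1.3 are assumptions chosen to satisfy the conditions.

  import math

  # Hypothetical example: X is Rademacher (+1 or -1 with prob 1/2).
  K2 = 1.3
  # Condition (2): E[exp(X^2 / K2^2)] = exp(1 / K2^2), since X^2 = 1 always.
  print("E[exp(X^2/K2^2)] =", round(math.exp(1 / K2**2), 3), "<= 2")

  # Condition (1): P(|X| >= t) is 1 for t <= 1 and 0 for t > 1, so it is dominated by
  # 2*exp(-t^2/K1^2) whenever 2*exp(-1/K1^2) >= 1, e.g. for K1 = 1.3.
  K1 = 1.3
  for t in (0.5, 1.0, 1.5, 3.0):
      tail = 1.0 if t <= 1 else 0.0
      print(f"t={t}: P(|X|>=t)={tail}  Gaussian-type bound={2 * math.exp(-t**2 / K1**2):.3f}")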



Bernoulli Trial
In the theory of probability and statistics, a Bernoulli trial (or binomial trial) is a random experiment with exactly two possible outcomes, "success" and "failure", in which the probability of success is the same every time the experiment is conducted. It is named after Jacob Bernoulli, a 17th-century Swiss mathematician, who analyzed them in his ''Ars Conjectandi'' (1713). The mathematical formalisation of the Bernoulli trial is known as the Bernoulli process. This article offers an elementary introduction to the concept, whereas the article on the Bernoulli process offers a more advanced treatment. Since a Bernoulli trial has only two possible outcomes, it can be framed as some "yes or no" question. For example: *Is the top card of a shuffled deck an ace? *Was the newborn child a girl? (See human sex ratio.) Therefore, success and failure are merely labels for the two outcomes, and should not be construed literally. The term "success" in this sense consists in the result ...
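A tiny illustrative sketch (the success probability p = 4/52, matching the "top card is an ace" question, is my own choice for the example): each trial is an independent yes/no experiment with the same p.

  import random

  random.seed(6)
  p = 4 / 52                     # chance the top card of a shuffled deck is an ace
  trials = 10_000
  successes = sum(random.random() < p for _ in range(trials))
  print(f"success frequency: {successes / trials:.4f}  (true p = {p:.4f})")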


Bernstein Inequalities In Probability Theory
In probability theory, Bernstein inequalities give bounds on the probability that the sum of random variables deviates from its mean. In the simplest case, let ''X''1, ..., ''X''''n'' be independent Bernoulli random variables taking values +1 and −1 with probability 1/2 (this distribution is also known as the Rademacher distribution); then for every positive \varepsilon,
:\mathbb{P}\left(\left|\frac{1}{n}\sum_{i=1}^n X_i\right| > \varepsilon\right) \leq 2\exp\left(-\frac{n\varepsilon^2}{2(1+\frac{\varepsilon}{3})}\right).
Bernstein inequalities were proved and published by Sergei Bernstein in the 1920s and 1930s (J. V. Uspensky, "Introduction to Mathematical Probability", McGraw-Hill Book Company, 1937). Later, these inequalities were rediscovered several times in various forms. Thus, special cases of the Bernstein inequalities are also known as the Chernoff bound, Hoeffding's inequality and Azuma's inequality.

Some of the inequalities
1. Let X_1, \ldots, X_n be independent zero-mean random variables. Sup ...
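A short simulation (my own sketch) comparing the empirical deviation frequency of a normalized Rademacher sum with the bound displayed above; the values of n, epsilon and the number of trials are assumptions chosen for the example.

  import math, random

  random.seed(7)
  n, eps, trials = 200, 0.2, 20_000
  # Bernstein-type bound for the Rademacher case, as stated above.
  bound = 2 * math.exp(-n * eps**2 / (2 * (1 + eps / 3)))

  hits = 0
  for _ in range(trials):
      s = sum(1 if random.random() < 0.5 else -1 for _ in range(n))
      hits += abs(s / n) > eps
  print(f"empirical: {hits / trials:.4f}   Bernstein bound: {bound:.4f}")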



Upper Bound
In mathematics, particularly in order theory, an upper bound or majorant of a subset ''S'' of some preordered set ''(K, \leq)'' is an element of ''K'' that is greater than or equal to every element of ''S''. Dually, a lower bound or minorant of ''S'' is defined to be an element of ''K'' that is less than or equal to every element of ''S''. A set with an upper (respectively, lower) bound is said to be bounded from above or majorized (respectively bounded from below or minorized) by that bound. The terms bounded above (bounded below) are also used in the mathematical literature for sets that have upper (respectively lower) bounds.

Examples
For example, is a lower bound for the set (as a subset of the integers or of the real numbers, etc.), and so is . On the other hand, is not a lower bound for since it is not smaller than every element in . The set has as both an upper bound and a lower bound; all other numbers are either an upper bound or a lower bound for that . Every subset of the natural numbers has a lowe ...
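A minimal sketch (my own, using the real numbers with their usual order as the preordered set) of checking whether a candidate element is an upper or lower bound of a finite subset.

  def is_upper_bound(candidate, subset):
      # candidate majorizes subset if it is >= every element of subset
      return all(candidate >= x for x in subset)

  def is_lower_bound(candidate, subset):
      # candidate minorizes subset if it is <= every element of subset
      return all(candidate <= x for x in subset)

  S = {3, 7, 11}
  print(is_upper_bound(11, S), is_upper_bound(10, S))   # True  False
  print(is_lower_bound(3, S), is_lower_bound(4, S))     # True  False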


Chernoff Bound
In probability theory, the Chernoff bound gives exponentially decreasing bounds on tail distributions of sums of independent random variables. Despite being named after Herman Chernoff, the author of the paper it first appeared in, the result is due to Herman Rubin. It is a sharper bound than the first- or second-moment-based tail bounds such as Markov's inequality or Chebyshev's inequality, which only yield power-law bounds on tail decay. However, the Chernoff bound requires the variates to be independent, a condition that is not required by either Markov's inequality or Chebyshev's inequality (although Chebyshev's inequality does require the variates to be pairwise independent). The Chernoff bound is related to the Bernstein inequalities, which were developed earlier, and to Hoeffding's inequality.

The generic bound
The generic Chernoff bound for a random variable ''X'' is attained by applying Markov's inequality to e^{tX}. This gives a bound in terms of the moment-generating function ...
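A small sketch (my own, using a Binomial(n, p) example and a crude grid search over t, both of which are assumptions for the illustration) of the generic recipe: bound P(X >= a) by the infimum over t > 0 of E[e^{tX}] / e^{ta}.

  import math

  # Hypothetical example: X ~ Binomial(n, p); its MGF is (1 - p + p*e^t)^n.
  n, p, a = 100, 0.5, 65

  def chernoff_bound(a, ts):
      # generic bound: P(X >= a) <= min over t > 0 of E[e^{tX}] * e^{-t a}
      return min((1 - p + p * math.exp(t))**n * math.exp(-t * a) for t in ts)

  ts = [k / 1000 for k in range(1, 2001)]        # grid over t in (0, 2]
  exact_tail = sum(math.comb(n, k) * p**k * (1 - p)**(n - k) for k in range(a, n + 1))
  print(f"exact P(X>=65) = {exact_tail:.5f}   Chernoff bound = {chernoff_bound(a, ts):.5f}")

The bound lies above the exact tail probability but decays exponentially in n, unlike the power-law bounds from Markov's or Chebyshev's inequality.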