Bernstein inequalities (probability theory)

In probability theory, Bernstein inequalities give bounds on the probability that the sum of random variables deviates from its mean. In the simplest case, let ''X''1, ..., ''X''''n'' be independent Bernoulli random variables taking values +1 and −1 with probability 1/2 (this distribution is also known as the Rademacher distribution). Then for every positive \varepsilon,

: \mathbb{P}\left(\left|\frac{1}{n}\sum_{i=1}^n X_i\right| > \varepsilon\right) \leq 2\exp\left(-\frac{n\varepsilon^2}{2(1+\varepsilon/3)}\right).

Bernstein inequalities were proven and published by Sergei Bernstein in the 1920s and 1930s (J. V. Uspensky, Introduction to Mathematical Probability, McGraw-Hill Book Company, 1937). Later, these inequalities were rediscovered several times in various forms. Thus, special cases of the Bernstein inequalities are also known as the Chernoff bound, Hoeffding's inequality and Azuma's inequality. The martingale case of the Bernstein inequality is known as Freedman's inequality, and its refinement is known as Hoeffding's inequality.
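This bound can be checked numerically. The following is a minimal Monte Carlo sketch in Python with NumPy; the sample size n, the deviation eps and the number of trials are arbitrary illustration choices, not part of the statement.

    # Compare the empirical tail probability of the Rademacher mean
    # with the Bernstein bound 2*exp(-n*eps^2 / (2*(1 + eps/3))).
    import numpy as np

    rng = np.random.default_rng(0)
    n, eps, trials = 200, 0.2, 100_000

    # X_i takes the values +1 and -1 with probability 1/2 each.
    samples = rng.choice([-1.0, 1.0], size=(trials, n))
    means = samples.mean(axis=1)

    empirical = np.mean(np.abs(means) > eps)
    bound = 2 * np.exp(-n * eps**2 / (2 * (1 + eps / 3)))

    print(f"empirical tail probability: {empirical:.5f}")
    print(f"Bernstein bound:            {bound:.5f}")

With these settings the empirical tail probability should come out well below the bound, as the inequality guarantees.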


Some of the inequalities

1. Let X_1, \ldots, X_n be independent zero-mean random variables. Suppose that |X_i| \leq M almost surely, for all i. Then, for all positive t,

: \mathbb{P}\left(\sum_{i=1}^n X_i \geq t\right) \leq \exp\left(-\frac{\tfrac{1}{2}t^2}{\sum_{i=1}^n \mathbb{E}\left[X_i^2\right] + \tfrac{1}{3}Mt}\right).

A numerical illustration of this inequality is sketched below, after the list.

2. Let X_1, \ldots, X_n be independent zero-mean random variables. Suppose that for some positive real L and every integer k \geq 2,

: \mathbb{E}\left[\left|X_i^k\right|\right] \leq \tfrac{1}{2}\, \mathbb{E}\left[X_i^2\right] L^{k-2} k!

Then

: \mathbb{P}\left(\sum_{i=1}^n X_i \geq 2t\sqrt{\sum_{i=1}^n \mathbb{E}\left[X_i^2\right]}\right) < \exp(-t^2), \qquad \text{for}\quad 0 \leq t \leq \frac{1}{2L}\sqrt{\sum_{i=1}^n \mathbb{E}\left[X_i^2\right]}.

3. Let X_1, \ldots, X_n be independent zero-mean random variables. Suppose that

: \mathbb{E}\left[\left|X_i^k\right|\right] \leq \frac{k!}{4!}\left(\frac{L}{5}\right)^{k-4}

for all integer k \geq 4. Denote

: A_k = \sum_{i=1}^n \mathbb{E}\left[X_i^k\right].

Then

: \mathbb{P}\left(\left|\sum_{j=1}^n X_j - \frac{A_3 t^2}{3A_2}\right| \geq \sqrt{2A_2}\, t\left[1 + \frac{A_4 t^2}{6A_2^2}\right]\right) < 2\exp(-t^2), \qquad \text{for}\quad 0 < t \leq \frac{5\sqrt{2A_2}}{4L}.

4. Bernstein also proved generalizations of the inequalities above to weakly dependent random variables. For example, inequality (2) can be extended as follows. Let X_1, \ldots, X_n be possibly non-independent random variables. Suppose that for all integers i > 0,

: \begin{align}
\mathbb{E}\left[X_i \mid X_1, \ldots, X_{i-1}\right] &= 0, \\
\mathbb{E}\left[X_i^2 \mid X_1, \ldots, X_{i-1}\right] &\leq R_i\, \mathbb{E}\left[X_i^2\right], \\
\mathbb{E}\left[X_i^k \mid X_1, \ldots, X_{i-1}\right] &\leq \tfrac{1}{2}\, \mathbb{E}\left[X_i^2 \mid X_1, \ldots, X_{i-1}\right] L^{k-2} k!
\end{align}

Then

: \mathbb{P}\left(\sum_{i=1}^n X_i \geq 2t\sqrt{\sum_{i=1}^n R_i \mathbb{E}\left[X_i^2\right]}\right) < \exp(-t^2), \qquad \text{for}\quad 0 < t \leq \frac{1}{2L}\sqrt{\sum_{i=1}^n R_i \mathbb{E}\left[X_i^2\right]}.

More general results for martingales can be found in Fan et al. (2015).
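As a concrete illustration of inequality (1), the following minimal Python sketch takes the X_i uniform on [−1, 1], so that M = 1 and E[X_i^2] = 1/3; the values of n, t and the trial count are arbitrary illustration choices, not part of Bernstein's statement.

    # Compare the empirical tail probability of a sum of bounded,
    # zero-mean variables with the bound in inequality (1).
    import numpy as np

    rng = np.random.default_rng(0)
    n, t, trials = 100, 10.0, 50_000

    samples = rng.uniform(-1.0, 1.0, size=(trials, n))
    sums = samples.sum(axis=1)

    M = 1.0                      # |X_i| <= 1 almost surely
    variance_sum = n / 3.0       # sum of E[X_i^2] for Uniform(-1, 1)
    empirical = np.mean(sums >= t)
    bound = np.exp(-0.5 * t**2 / (variance_sum + M * t / 3.0))

    print(f"empirical tail probability: {empirical:.5f}")
    print(f"Bernstein bound:            {bound:.5f}")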


Proofs

The proofs are based on an application of Markov's inequality to the random variable

: \exp\left(\lambda \sum_{j=1}^n X_j\right),

for a suitable choice of the parameter \lambda > 0.
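For instance, in the setting of inequality (1), Markov's inequality applied to this variable gives, for every \lambda > 0,

: \mathbb{P}\left(\sum_{j=1}^n X_j \geq t\right) = \mathbb{P}\left(e^{\lambda \sum_{j=1}^n X_j} \geq e^{\lambda t}\right) \leq e^{-\lambda t} \prod_{j=1}^n \mathbb{E}\left[e^{\lambda X_j}\right],

where the last step uses the independence of the X_j. Bounding each factor \mathbb{E}\left[e^{\lambda X_j}\right] by means of the moment assumptions and then minimizing the right-hand side over \lambda yields the stated exponential bounds.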


Generalizations

The Bernstein inequality can be generalized to Gaussian random matrices. Let G = g^H A g + 2\operatorname{Re}(g^H a) be a scalar, where A is a complex Hermitian matrix and a is a complex vector of size N. The vector g \sim \mathcal{N}(0, I) is a Gaussian vector of size N. Then for any \sigma \geq 0, we have

: \mathbb{P}\left(G \leq \operatorname{tr}(A) - \sqrt{2\sigma}\sqrt{\left\|\operatorname{vec}(A)\right\|^2 + 2\left\|a\right\|^2} - \sigma s^-(A)\right) < \exp(-\sigma),

where \operatorname{vec} is the vectorization operation and s^-(A) = \max(\lambda_{\max}(-A), 0), where \lambda_{\max}(A) is the largest eigenvalue of A. Another similar inequality is formulated as

: \mathbb{P}\left(G \geq \operatorname{tr}(A) + \sqrt{2\sigma}\sqrt{\left\|\operatorname{vec}(A)\right\|^2 + 2\left\|a\right\|^2} + \sigma s^+(A)\right) < \exp(-\sigma),

where s^+(A) = \max(\lambda_{\max}(A), 0).
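A minimal Monte Carlo sketch of the first bound, in Python, is given below. The particular matrix A, vector a, value of \sigma and the circularly-symmetric normalization E[g g^H] = I are assumptions made for illustration only.

    # Check the lower-tail bound for G = g^H A g + 2 Re(g^H a)
    # against exp(-sigma) for a random Hermitian A.
    import numpy as np

    rng = np.random.default_rng(0)
    N, sigma, trials = 8, 2.0, 100_000

    B = rng.normal(size=(N, N)) + 1j * rng.normal(size=(N, N))
    A = (B + B.conj().T) / 2                      # Hermitian matrix
    a = (rng.normal(size=N) + 1j * rng.normal(size=N)) / np.sqrt(2)

    # Circularly-symmetric complex Gaussian vectors with E[g g^H] = I.
    g = (rng.normal(size=(trials, N))
         + 1j * rng.normal(size=(trials, N))) / np.sqrt(2)

    # G_t = g_t^H A g_t + 2 Re(g_t^H a) for each trial t.
    G = np.einsum('ti,ij,tj->t', g.conj(), A, g).real + 2 * (g.conj() @ a).real

    lam = np.linalg.eigvalsh(A)                   # ascending eigenvalues
    s_minus = max(-lam[0], 0.0)                   # s^-(A) = max(lambda_max(-A), 0)
    fro2 = np.linalg.norm(A, 'fro')**2            # ||vec(A)||^2
    threshold = (np.trace(A).real
                 - np.sqrt(2 * sigma) * np.sqrt(fro2 + 2 * np.linalg.norm(a)**2)
                 - sigma * s_minus)

    print(f"empirical tail probability: {np.mean(G <= threshold):.5f}")
    print(f"bound exp(-sigma):          {np.exp(-sigma):.5f}")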


See also

* Concentration inequality - a summary of tail bounds on random variables.
* Hoeffding's inequality


References

* S. N. Bernstein, Collected Works, Nauka, 1964.
* J. V. Uspensky, Introduction to Mathematical Probability, McGraw-Hill Book Company, 1937.
* Fan, X., Grama, I., and Liu, Q., "Exponential inequalities for martingales with applications", Electronic Journal of Probability, 20 (2015).