Sanov's Theorem
In mathematics and information theory, Sanov's theorem gives a bound on the probability of observing an atypical sequence of samples from a given probability distribution. In the language of large deviations theory, Sanov's theorem identifies the rate function for large deviations of the empirical measure of a sequence of i.i.d. random variables.

Let ''A'' be a set of probability distributions over an alphabet ''X'', and let ''q'' be an arbitrary distribution over ''X'' (where ''q'' may or may not be in ''A''). Suppose we draw ''n'' i.i.d. samples from ''q'', represented by the vector x^n = x_1, x_2, \ldots, x_n. Then, we have the following bound on the probability that the empirical measure \hat{p}_{x^n} of the samples falls within the set ''A'':

:q^n(\hat{p}_{x^n} \in A) \le (n+1)^{|X|} 2^{-n D_{\mathrm{KL}}(p^* \| q)},

where
* q^n is the joint probability distribution on X^n, and
* p^* is the information projection of ''q'' onto ''A'', i.e. the distribution in ''A'' that minimizes the KL divergence to ''q''.

In words, the probability of drawing an atypical distribution is bounded by a function of the KL divergence from the true distribution to the atypical one; in the case that we consider a set of possible atypical distributions, there is a dominant atypical distribution, given by the information projection. Furthermore, if ''A'' is a closed set, then

:\lim_{n \to \infty} \frac{1}{n} \log q^n(\hat{p}_{x^n} \in A) = -D_{\mathrm{KL}}(p^* \| q).
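The upper bound can be checked numerically. In the sketch below (the two-letter alphabet, the fair-coin distribution ''q'', and the set ''A'' are assumptions chosen for illustration, not taken from the article), ''A'' is the set of distributions that put mass at least 0.7 on heads; its information projection onto ''A'' is the boundary point p* = (0.3, 0.7), and the exact tail probability of the binomial stays below the Sanov bound:

```python
import math

# Assumed example setup (not from the article): alphabet X = {0, 1},
# true distribution q = fair coin, atypical set A = {p : p(1) >= 0.7}.
# The information projection of q onto A is p* = (0.3, 0.7), since
# D(p || q) is minimized over A at the boundary p(1) = 0.7.

def kl(p, q):
    """KL divergence in bits between two finite distributions."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def exact_prob(n, thresh=0.7, q1=0.5):
    """P(empirical frequency of 1s >= thresh) for n i.i.d. Bernoulli(q1) draws."""
    k_min = math.ceil(thresh * n)
    return sum(math.comb(n, k) * q1**k * (1 - q1)**(n - k)
               for k in range(k_min, n + 1))

n = 200
p_star = (0.3, 0.7)
q = (0.5, 0.5)
# Sanov's bound: (n+1)^{|X|} * 2^{-n D_KL(p* || q)} with |X| = 2
bound = (n + 1) ** 2 * 2 ** (-n * kl(p_star, q))
prob = exact_prob(n)
assert 0 < prob <= bound  # the exact probability respects the bound
```

The polynomial prefactor (n+1)^{|X|} makes the bound loose for small ''n''; the exponential term 2^{-n D} dominates as ''n'' grows.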


Technical statement

Define:
* \Sigma is a finite set with |\Sigma| \geq 2, understood as the "alphabet".
* \Delta(\Sigma) is the simplex spanned by the alphabet; it is a subset of \R^\Sigma.
* L_n is a random variable taking values in \Delta(\Sigma): take n i.i.d. samples from the distribution \mu, then L_n is the empirical frequency vector of the sample.
* \mathcal L_n is the set of values that L_n can take, i.e. the probability vectors in \Delta(\Sigma) whose entries are integer multiples of 1/n: \mathcal L_n = \Delta(\Sigma) \cap \tfrac{1}{n}\Z^\Sigma.

Then, Sanov's theorem states:
* For every measurable subset S \subseteq \Delta(\Sigma),
:-\inf_{\nu \in \operatorname{int}(S)} D(\nu \| \mu) \leq \liminf_n \frac{1}{n} \ln P_\mu(L_n \in S) \leq \limsup_n \frac{1}{n} \ln P_\mu(L_n \in S) \leq -\inf_{\nu \in \operatorname{cl}(S)} D(\nu \| \mu)
* For every open subset U \subseteq \Delta(\Sigma),
:-\lim_n \inf_{\nu \in U \cap \mathcal L_n} D(\nu \| \mu) = \lim_n \frac{1}{n} \ln P_\mu(L_n \in U) = -\inf_{\nu \in U} D(\nu \| \mu)

Here, \operatorname{int}(S) denotes the interior of S, and \operatorname{cl}(S) its closure.
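The open-set statement can also be illustrated numerically. In the sketch below (the two-letter alphabet, \mu = fair coin, and the set U are assumptions for the example, not from the article), U = {\nu : \nu(1) > 0.6}, so the rate \inf_{\nu \in U} D(\nu \| \mu) equals D((0.4, 0.6) \| \mu) by continuity, and \frac{1}{n} \ln P_\mu(L_n \in U) converges to minus that rate:

```python
import math

# Assumed example setup (not from the article): Sigma = {0, 1}, mu = fair coin,
# open set U = {nu : nu(1) > 0.6}.  By continuity of D(. || mu), the infimum of
# D(nu || mu) over U equals its value at the boundary point nu = (0.4, 0.6).

def kl_nats(p, q):
    """KL divergence in nats between two finite distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def log_prob_tail(n, thresh=0.6):
    """ln P_mu(L_n in U) = ln P(Binomial(n, 1/2) > thresh * n), exact big-int arithmetic."""
    k_min = math.floor(thresh * n) + 1          # strict inequality: k/n > thresh
    total = sum(math.comb(n, k) for k in range(k_min, n + 1))
    return math.log(total) - n * math.log(2)    # math.log handles arbitrarily large ints

rate = kl_nats((0.4, 0.6), (0.5, 0.5))          # about 0.0201 nats
for n in (500, 2000, 8000):
    # (1/n) ln P approaches -rate from below as n grows
    print(n, log_prob_tail(n) / n, -rate)
```

The exact computation avoids floating-point underflow by summing binomial coefficients as Python integers before taking the logarithm; the convergence is slow (the gap shrinks like (log n)/n, reflecting the polynomial prefactor in the finite-n bound).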


References

* Sanov, I. N. (1957). "On the probability of large deviations of random variables". ''Mat. Sbornik'' 42(84), No. 1, 11–44. (Russian original: Санов, И. Н., "О вероятности больших отклонений случайных величин", ''Математический сборник'' 42(84), No. 1, 11–44.)