Large Deviation Theory

In probability theory, the theory of large deviations concerns the asymptotic behaviour of remote tails of sequences of probability distributions. While some basic ideas of the theory can be traced to Laplace, the formalization started with insurance mathematics, namely ruin theory with Cramér and Lundberg. A unified formalization of large deviation theory was developed in 1966, in a paper by Varadhan. Large deviations theory formalizes the heuristic ideas of ''concentration of measures'' and widely generalizes the notion of convergence of probability measures. Roughly speaking, large deviations theory concerns itself with the exponential decline of the probability measures of certain kinds of extreme or ''tail'' events.


Introductory examples


An elementary example

Consider a sequence of independent tosses of a fair coin. The possible outcomes could be heads or tails. Let us denote the outcome of the i-th trial by X_i, where we encode head as 1 and tail as 0. Now let M_N denote the mean value after N trials, namely

: M_N = \frac{1}{N} \sum_{i=1}^{N} X_i.

Then M_N lies between 0 and 1. From the law of large numbers it follows that as N grows, the distribution of M_N converges to 0.5 = \operatorname{E}[X] (the expected value of a single coin toss).

Moreover, by the central limit theorem, it follows that M_N is approximately normally distributed for large N. The central limit theorem can provide more detailed information about the behavior of M_N than the law of large numbers. For example, we can approximately find the tail probability P(M_N > x) that M_N is greater than x, for a fixed value of N. However, the approximation by the central limit theorem may not be accurate if x is far from \operatorname{E}[X_i] unless N is sufficiently large. Also, it does not provide information about the convergence of the tail probabilities as N \to \infty. The large deviation theory can provide answers for such problems.

Let us make this statement more precise. For a given value 0.5 < x < 1, let us compute the tail probability P(M_N > x). Define

: I(x) = x \ln x + (1-x) \ln(1-x) + \ln 2.

Note that the function I(x) is a convex, nonnegative function that is zero at x = \tfrac{1}{2} and increases as x approaches 1. It is the negative of the Bernoulli entropy with p = \tfrac{1}{2}; that it is appropriate for coin tosses follows from the asymptotic equipartition property applied to a Bernoulli trial. Then by Chernoff's inequality, it can be shown that

: P(M_N > x) \le \exp(-N I(x)).

This bound is rather sharp, in the sense that I(x) cannot be replaced with a larger number which would yield a strict inequality for all positive N. (However, the exponential bound can still be reduced by a subexponential factor on the order of \tfrac{1}{\sqrt{N}}; this follows from the Stirling approximation applied to the binomial coefficient appearing in the Bernoulli distribution.) Hence, we obtain the following result:

: \lim_{N \to \infty} \frac{1}{N} \ln P(M_N > x) = -I(x).

That is, the probability P(M_N > x) decays exponentially as N \to \infty at a rate depending on x. This formula approximates any tail probability of the sample mean of i.i.d. variables and gives its convergence as the number of samples increases.
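As a numerical illustration (a sketch, not part of the original derivation; the function names are invented for the example), one can compare the exact binomial tail probability with the exponential estimate:

```python
import math

def rate(x):
    # Rate function for fair-coin sample means:
    # I(x) = x ln x + (1-x) ln(1-x) + ln 2
    return x * math.log(x) + (1 - x) * math.log(1 - x) + math.log(2)

def tail_prob(N, x):
    # Exact P(M_N > x) = P(S_N > N x), with S_N ~ Binomial(N, 1/2)
    k_min = math.floor(N * x) + 1
    return sum(math.comb(N, k) for k in range(k_min, N + 1)) / 2 ** N

x = 0.6
for N in (100, 500, 2000):
    p = tail_prob(N, x)
    print(N, -math.log(p) / N)  # approaches I(0.6) ≈ 0.0201 as N grows
```

The Chernoff bound holds exactly for every N, while -\tfrac{1}{N}\ln P(M_N > x) converges to I(x) only up to a logarithmic-in-N correction, which is the subexponential factor mentioned above.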


Large deviations for sums of independent random variables

In the above example of coin-tossing we explicitly assumed that each toss is an independent trial, and the probability of getting head or tail is always the same. Let X, X_1, X_2, \ldots be independent and identically distributed (i.i.d.) random variables whose common distribution satisfies a certain growth condition. Then the following limit exists:

: \lim_{N \to \infty} \frac{1}{N} \ln P(M_N > x) = -I(x).

Here

: M_N = \frac{1}{N} \sum_{i=1}^{N} X_i,

as before. The function I(\cdot) is called the "rate function" or "Cramér function" or sometimes the "entropy function". The above-mentioned limit means that for large N,

: P(M_N > x) \approx \exp(-N I(x)),

which is the basic result of large deviations theory. If we know the probability distribution of X, an explicit expression for the rate function can be obtained. This is given by a Legendre–Fenchel transformation,

: I(x) = \sup_{\theta > 0} [\theta x - \lambda(\theta)],

where

: \lambda(\theta) = \ln \operatorname{E}[\exp(\theta X)]

is called the cumulant generating function (CGF) and \operatorname{E} denotes the mathematical expectation. If X follows a normal distribution, the rate function becomes a parabola with its apex at the mean of the normal distribution. If \{X_i\} is a Markov chain, a variant of the basic large deviations result stated above may hold.
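The Legendre–Fenchel transformation can be checked numerically in a simple case. The sketch below (function names invented for the example) computes I(x) for a fair coin by maximizing \theta x - \lambda(\theta) over a grid of \theta values and compares the result with the closed form from the introductory example:

```python
import math

def cgf(theta):
    # Cumulant generating function of a fair-coin toss X in {0, 1}:
    # lambda(theta) = ln E[exp(theta * X)] = ln((1 + e^theta) / 2)
    return math.log((1 + math.exp(theta)) / 2)

def rate_numeric(x, lo=-20.0, hi=20.0, steps=40001):
    # Legendre-Fenchel transform I(x) = sup_theta [theta * x - cgf(theta)],
    # approximated by brute-force search over an equally spaced grid
    grid = (lo + (hi - lo) * i / (steps - 1) for i in range(steps))
    return max(t * x - cgf(t) for t in grid)

def rate_exact(x):
    # Closed form for the fair coin: I(x) = x ln x + (1-x) ln(1-x) + ln 2
    return x * math.log(x) + (1 - x) * math.log(1 - x) + math.log(2)

print(rate_numeric(0.7), rate_exact(0.7))  # the two values agree closely
```

A grid search is used only to keep the sketch dependency-free; in practice the supremum is found by solving \lambda'(\theta) = x.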


Formal definition

Given a Polish space \mathcal{X}, let \{\mu_N\} be a sequence of Borel probability measures on \mathcal{X}, let \{a_N\} be a sequence of positive real numbers such that \lim_{N\to\infty} a_N = +\infty, and finally let I : \mathcal{X} \to [0, +\infty] be a lower semicontinuous functional on \mathcal{X}. The sequence \{\mu_N\} is said to satisfy a large deviation principle with ''speed'' \{a_N\} and ''rate'' I if, and only if, for each Borel measurable set E \subset \mathcal{X},

: -\inf_{x \in E^\circ} I(x) \le \varliminf_{N\to\infty} \frac{1}{a_N} \ln \mu_N(E) \le \varlimsup_{N\to\infty} \frac{1}{a_N} \ln \mu_N(E) \le -\inf_{x \in \overline{E}} I(x),

where \overline{E} and E^\circ denote respectively the closure and interior of E.
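As a concrete check of the definition (a sketch with invented names, using speed a_N = N), the sample mean of N i.i.d. standard normal variables satisfies a large deviation principle with rate I(x) = x^2/2; for the set E = (x, \infty) this can be compared against the exact Gaussian tail:

```python
import math

def log_tail(N, x):
    # ln P(M_N > x), where M_N is the mean of N i.i.d. N(0,1) variables,
    # so M_N ~ N(0, 1/N) and P(M_N > x) = erfc(x * sqrt(N/2)) / 2
    return math.log(math.erfc(x * math.sqrt(N / 2)) / 2)

x = 1.0
for N in (10, 100, 1000):
    print(N, -log_tail(N, x) / N)  # approaches inf over E of I = 0.5 as N grows
```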


Brief history

The first rigorous results concerning large deviations are due to the Swedish mathematician Harald Cramér, who applied them to model the insurance business. From the point of view of an insurance company, the earning is at a constant rate per month (the monthly premium) but the claims come randomly. For the company to be successful over a certain period of time (preferably many months), the total earning should exceed the total claim. Thus to estimate the premium you have to ask the following question: "What should we choose as the premium q such that over N months the total claim C = \Sigma X_i should be less than Nq?" This is clearly the same question asked by large deviations theory. Cramér gave a solution to this question for i.i.d. random variables, where the rate function is expressed as a power series. A very incomplete list of mathematicians who have made important advances would include Petrov (Petrov V.V. (1954) Generalization of Cramér's limit theorem. Uspehi Matem. Nauk, v. 9, No 4(62), 195–202, in Russian), Sanov (Sanov I.N. (1957) On the probability of large deviations of random magnitudes. Matem. Sbornik, v. 42 (84), 11–44), S.R.S. Varadhan (who has won the Abel Prize for his contribution to the theory), D. Ruelle, O.E. Lanford, Amir Dembo, and Ofer Zeitouni.


Applications

Principles of large deviations may be effectively applied to gather information out of a probabilistic model. Thus, the theory of large deviations finds its applications in information theory and risk management. In physics, the best known application of large deviations theory arises in thermodynamics and statistical mechanics (in connection with relating entropy with the rate function).


Large deviations and entropy

The rate function is related to the entropy in statistical mechanics. This can be heuristically seen in the following way. In statistical mechanics the entropy of a particular macro-state is related to the number of micro-states which correspond to this macro-state. In our coin-tossing example the mean value M_N could designate a particular macro-state, and the particular sequence of heads and tails which gives rise to a particular value of M_N constitutes a particular micro-state. Loosely speaking, a macro-state having a higher number of micro-states giving rise to it has higher entropy, and a state with higher entropy has a higher chance of being realised in actual experiments. The macro-state with mean value 1/2 (as many heads as tails) has the highest number of micro-states giving rise to it, and it is indeed the state with the highest entropy; in most practical situations we shall indeed obtain this macro-state for large numbers of trials. The "rate function", on the other hand, measures the probability of appearance of a particular macro-state: the smaller the rate function, the higher the chance of a macro-state appearing. In our coin-tossing example the value of the "rate function" for mean value equal to 1/2 is zero. In this way one can see the "rate function" as the negative of the "entropy". There is a relation between the "rate function" in large deviations theory and the Kullback–Leibler divergence; the connection is established by Sanov's theorem (see Sanov (1957) and Novak S.Y. (2011) Extreme value methods with applications to finance, Chapman & Hall/CRC Press, ch. 14.5). In a special case, large deviations are closely related to the concept of Gromov–Hausdorff limits (Kotani M., Sunada T., ''Large deviation and the tangent cone at infinity of a crystal lattice'', Math. Z. 254 (2006), 837–870).
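The Sanov-type connection can be illustrated in the coin-tossing example: the rate function I(x) from the introduction equals the Kullback–Leibler divergence between a Bernoulli(x) and a Bernoulli(1/2) distribution. A minimal sketch (function names invented for the example):

```python
import math

def kl_bernoulli(p, q):
    # Kullback-Leibler divergence D(Ber(p) || Ber(q))
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def rate(x):
    # Rate function from the coin-tossing example:
    # I(x) = x ln x + (1-x) ln(1-x) + ln 2
    return x * math.log(x) + (1 - x) * math.log(1 - x) + math.log(2)

# For a fair coin the two quantities coincide for every x in (0, 1)
for x in (0.3, 0.5, 0.8):
    print(x, kl_bernoulli(x, 0.5), rate(x))
```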


See also

* Large deviation principle
* Cramér's large deviation theorem
* Chernoff's inequality
* Sanov's theorem
* Contraction principle (large deviations theory), a result on how large deviations principles "push forward"
* Freidlin–Wentzell theorem, a large deviations principle for Itō diffusions
* Legendre transformation; ensemble equivalence is based on this transformation
* Laplace principle, a large deviations principle in R''d''
* Laplace's method
* Schilder's theorem, a large deviations principle for Brownian motion
* Varadhan's lemma
* Extreme value theory
* Large deviations of Gaussian random functions



Bibliography


* "Special invited paper: Large deviations" by S. R. S. Varadhan, The Annals of Probability, 2008, Vol. 36, No. 2, 397–419.
* "A basic introduction to large deviations: Theory, applications, simulations" by Hugo Touchette, arXiv:1106.4146.
* ''Entropy, Large Deviations and Statistical Mechanics'' by R. S. Ellis, Springer.
* ''Large Deviations for Performance Analysis'' by Alan Weiss and Adam Shwartz, Chapman and Hall.
* ''Large Deviations Techniques and Applications'' by Amir Dembo and Ofer Zeitouni, Springer.
* ''Random Perturbations of Dynamical Systems'' by M. I. Freidlin and A. D. Wentzell, Springer.
* "Large Deviations for Two Dimensional Navier-Stokes Equation with Multiplicative Noise", S. S. Sritharan and P. Sundar, Stochastic Processes and Their Applications, Vol. 116 (2006), 1636–165
* "Large Deviations for the Stochastic Shell Model of Turbulence", U. Manna, S. S. Sritharan and P. Sundar, NoDEA Nonlinear Differential Equations Appl. 16 (2009), no. 4, 493–52
