In probability theory, Donsker's theorem (also known as Donsker's invariance principle, or the functional central limit theorem), named after Monroe D. Donsker, is a functional extension of the central limit theorem for empirical distribution functions. Specifically, the theorem states that an appropriately centered and scaled version of the empirical distribution function converges to a Gaussian process.

Let X_1, X_2, X_3, \ldots be a sequence of independent and identically distributed (i.i.d.) random variables with mean 0 and variance 1. Let S_n := \sum_{i=1}^n X_i. The stochastic process S := (S_n)_{n \in \mathbb{N}} is known as a random walk. Define the diffusively rescaled random walk (partial-sum process) by

: W^{(n)}(t) := \frac{S_{\lfloor nt \rfloor}}{\sqrt{n}}, \qquad t \in [0,1].

The central limit theorem asserts that W^{(n)}(1) converges in distribution to a standard Gaussian random variable W(1) as n \to \infty. Donsker's invariance principle extends this convergence to the whole function W^{(n)} := (W^{(n)}(t))_{t \in [0,1]}. More precisely, in its modern form, Donsker's invariance principle states that: as random variables taking values in the Skorokhod space \mathcal{D}[0,1], the random function W^{(n)} converges in distribution to a standard Brownian motion W := (W(t))_{t \in [0,1]} as n \to \infty.
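The convergence can be visualized by simulation. The following Python sketch is purely illustrative (the function name, the choice of ±1 steps, and the plotting details are not part of the theorem): it builds the rescaled partial-sum process W^{(n)} for i.i.d. steps with mean 0 and variance 1 and overlays several independent paths, which for large n look statistically like samples of a standard Brownian motion.

```python
import numpy as np
import matplotlib.pyplot as plt

def rescaled_random_walk(n, rng):
    """Return a time grid and W^(n)(t) = S_floor(nt) / sqrt(n) on [0, 1]."""
    steps = rng.choice([-1.0, 1.0], size=n)          # i.i.d. steps with mean 0, variance 1
    partial_sums = np.concatenate(([0.0], np.cumsum(steps)))
    t = np.arange(n + 1) / n                         # at t = k/n, floor(nt) = k
    return t, partial_sums / np.sqrt(n)

rng = np.random.default_rng(0)
for _ in range(5):                                   # overlay a few independent paths
    t, w = rescaled_random_walk(n=10_000, rng=rng)
    plt.step(t, w, where="post", lw=0.8)
plt.xlabel("t")
plt.ylabel("$W^{(n)}(t)$")
plt.title("Rescaled random walks, n = 10000")
plt.show()
```

Replacing the ±1 steps by any other distribution with mean 0 and variance 1 leaves the limiting process unchanged, which is the sense in which the result is an "invariance principle".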


Formal statement

Let F_n be the empirical distribution function of the sequence of i.i.d. random variables X_1, X_2, X_3, \ldots with distribution function F. Define the centered and scaled version of F_n by

: G_n(x) = \sqrt{n} \, ( F_n(x) - F(x) ),

indexed by x ∈ R. By the classical central limit theorem, for fixed x, the random variable G_n(x) converges in distribution to a Gaussian (normal) random variable G(x) with zero mean and variance F(x)(1 − F(x)) as the sample size n grows.

Theorem (Donsker, Skorokhod, Kolmogorov). The sequence G_n, as random elements of the Skorokhod space \mathcal{D}(-\infty,\infty), converges in distribution to a Gaussian process G with zero mean and covariance given by

: \operatorname{cov}[G(s), G(t)] = E[G(s)\,G(t)] = \min\{F(s), F(t)\} - F(s)F(t).

The process G(x) can be written as B(F(x)), where B is a standard Brownian bridge on the unit interval.
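The limiting covariance can be checked numerically. The sketch below is only an illustration (the sample size, number of replications, and grid points are arbitrary choices): it simulates many independent copies of G_n for Uniform[0,1] data, where F(x) = x, and compares the sample covariance on a small grid with \min\{F(s),F(t)\} - F(s)F(t).

```python
import numpy as np

rng = np.random.default_rng(1)
n, replications = 500, 20_000
grid = np.array([0.2, 0.5, 0.8])                 # evaluation points; F(x) = x for Uniform[0,1]

# G_n(x) = sqrt(n) * (F_n(x) - F(x)) for each replication
samples = rng.uniform(size=(replications, n))
F_n = (samples[:, :, None] <= grid).mean(axis=1) # empirical CDF at each grid point
G_n = np.sqrt(n) * (F_n - grid)

empirical_cov = np.cov(G_n, rowvar=False)
theoretical_cov = np.minimum.outer(grid, grid) - np.outer(grid, grid)

print(np.round(empirical_cov, 3))
print(np.round(theoretical_cov, 3))              # min(s,t) - s*t: Brownian-bridge covariance
```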


Proof sketch

For continuous probability distributions, the general case reduces to the case where the distribution is uniform on [0,1] by the inverse (quantile) transform. Given any finite sequence of times 0 < t_1 < t_2 < \dots < t_n < 1, the random variable N F_N(t_1) has a binomial distribution with mean N t_1 and variance N t_1 (1 - t_1). Similarly, the joint distribution of F_N(t_1), F_N(t_2), \dots, F_N(t_n) is a multinomial distribution. The central limit approximation for multinomial distributions then shows that, as N \to \infty, the vector \left(\sqrt{N}\,(F_N(t_i) - t_i)\right)_{i=1}^n converges in distribution to a Gaussian vector whose covariance matrix has entries \min(t_i, t_j) - t_i t_j, which is precisely the covariance matrix of the Brownian bridge at those times. This establishes convergence of the finite-dimensional distributions; a complete proof also requires a tightness argument to upgrade this to convergence in the Skorokhod space.
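The covariance entries follow directly from the moments of the indicator variables; the short computation below makes this step explicit for the uniform case (the variables U_1, \dots, U_N, i.i.d. Uniform[0,1], are introduced here only for notation).

\begin{align*}
F_N(t) &= \frac{1}{N}\sum_{k=1}^{N} \mathbf{1}\{U_k \le t\}, \qquad \mathbb{E}\,\mathbf{1}\{U_k \le t\} = t,\\
\operatorname{Cov}\bigl(\sqrt{N}\,(F_N(s)-s),\ \sqrt{N}\,(F_N(t)-t)\bigr)
  &= \frac{1}{N}\sum_{k=1}^{N}\operatorname{Cov}\bigl(\mathbf{1}\{U_k \le s\},\,\mathbf{1}\{U_k \le t\}\bigr)\\
  &= \mathbb{E}\bigl[\mathbf{1}\{U_1 \le s\}\,\mathbf{1}\{U_1 \le t\}\bigr] - st
   \;=\; \min(s,t) - st .
\end{align*}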


History and related results

Kolmogorov (1933) showed that when F is continuous, the supremum \sup_t G_n(t) and the supremum of the absolute value, \sup_t |G_n(t)|, converge in distribution to the laws of the same functionals of the Brownian bridge B(t); see the Kolmogorov–Smirnov test. In 1949 Doob asked whether the convergence in distribution held for more general functionals, thus formulating a problem of weak convergence of random functions in a suitable function space. In 1952 Donsker stated and proved (not quite correctly) a general extension of the Doob–Kolmogorov heuristic approach. In the original paper, Donsker proved that the convergence in law of G_n to the Brownian bridge holds for Uniform[0,1] distributions with respect to uniform convergence in t over the interval [0,1]. However, Donsker's formulation was not quite correct because of the problem of measurability of functionals of discontinuous processes. In 1956 Skorokhod and Kolmogorov defined a separable metric d, called the Skorokhod metric, on the space of càdlàg functions on [0,1], such that convergence for d to a continuous function is equivalent to convergence for the sup norm, and showed that G_n converges in law in \mathcal{D}[0,1] to the Brownian bridge. Later, Dudley reformulated Donsker's result to avoid the problem of measurability and the need for the Skorokhod metric. One can prove that there exist X_i, i.i.d. uniform on [0,1], and a sequence of sample-continuous Brownian bridges B_n, such that \|G_n - B_n\|_\infty is measurable and converges in probability to 0. An improved version of this result, providing more detail on the rate of convergence, is the Komlós–Major–Tusnády approximation.


See also

* Glivenko–Cantelli theorem
* Kolmogorov–Smirnov test

