In probability theory, an empirical process is a stochastic process that characterizes the deviation of the empirical distribution function from its expectation. In mean field theory, limit theorems are considered as the number of objects becomes large; they generalise the central limit theorem for empirical measures. Applications of the theory of empirical processes arise in non-parametric statistics.

Definition

For ''X''₁, ''X''₂, ..., ''X''_''n'' independent and identically distributed random variables in \mathbb{R} with common cumulative distribution function ''F''(''x''), the empirical distribution function is defined by

: F_n(x)=\frac{1}{n}\sum_{i=1}^n I_{(-\infty,x]}(X_i),

where ''I''_''C'' is the indicator function of the set ''C''.

For every (fixed) ''x'', ''F''_''n''(''x'') is a sequence of random variables which converges to ''F''(''x'') almost surely by the strong law of large numbers. That is, ''F''_''n'' converges to ''F'' pointwise. Glivenko and Cantelli strengthened this result by proving uniform convergence of ''F''_''n'' to ''F'' (the Glivenko–Cantelli theorem).

A centered and scaled version of the empirical measure is the signed measure

: G_n(A)=\sqrt{n}(P_n(A)-P(A)),

where ''P''_''n'' denotes the empirical measure of the sample and ''P'' the common distribution of the ''X''_''i''. It induces a map on measurable functions ''f'' given by

: f\mapsto G_n f=\sqrt{n}(P_n-P)f=\sqrt{n}\left(\frac{1}{n}\sum_{i=1}^n f(X_i)-\mathbb{E}f\right)

By the central limit theorem, G_n(A) converges in distribution to a normal random variable ''N''(0, ''P''(''A'')(1 − ''P''(''A''))) for a fixed measurable set ''A''. Similarly, for a fixed function ''f'', G_n f converges in distribution to a normal random variable N(0,\mathbb{E}(f-\mathbb{E}f)^2), provided that \mathbb{E}f and \mathbb{E}f^2 exist.

Definition

: \bigl(G_n(c)\bigr)_{c\in\mathcal{C}} is called an ''empirical process'' indexed by \mathcal{C}, a collection of measurable subsets of ''S''.
: \bigl(G_nf\bigr)_{f\in\mathcal{F}} is called an ''empirical process'' indexed by \mathcal{F}, a collection of measurable functions from ''S'' to \mathbb{R}.

A significant result in the area of empirical processes is Donsker's theorem. It has led to the study of Donsker classes: sets of functions with the useful property that empirical processes indexed by these classes converge weakly to a certain Gaussian process. While it can be shown that Donsker classes are Glivenko–Cantelli classes, the converse is not true in general.
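The quantities defined in this section can be explored numerically. The following is a minimal sketch in standard-library Python, assuming samples from the uniform distribution on [0, 1] (so that ''F''(''x'') = ''x'') and the fixed set ''A'' = (0.2, 0.5]. The function names `ecdf`, `sup_distance` and `G_n`, the seed, and the sample sizes are illustrative choices and not part of the theory.

```python
import bisect
import math
import random


def ecdf(sample):
    """Return F_n as a function: F_n(x) = (1/n) * #{i : X_i <= x}."""
    xs = sorted(sample)
    n = len(xs)

    def F_n(x):
        # bisect_right counts the sample points that are <= x
        return bisect.bisect_right(xs, x) / n

    return F_n


def sup_distance(sample, F):
    """Compute sup_x |F_n(x) - F(x)|, assuming F is continuous (no ties)."""
    xs = sorted(sample)
    n = len(xs)
    # F_n jumps from i/n to (i+1)/n at the i-th order statistic, so the
    # supremum is approached at one side of some jump.
    return max(
        max((i + 1) / n - F(x), F(x) - i / n)
        for i, x in enumerate(xs)
    )


def G_n(sample, a, b, p):
    """G_n(A) = sqrt(n) * (P_n(A) - P(A)) for A = (a, b] with P(A) = p."""
    n = len(sample)
    p_n = sum(1 for x in sample if a < x <= b) / n
    return math.sqrt(n) * (p_n - p)


def uniform_cdf(x):
    """CDF of the uniform distribution on [0, 1] (assumed example law)."""
    return min(max(x, 0.0), 1.0)


rng = random.Random(0)  # arbitrary fixed seed for reproducibility

# Glivenko-Cantelli: the sup distance tends to shrink as n grows.
for n in (100, 10_000):
    sample = [rng.random() for _ in range(n)]
    print(n, ecdf(sample)(0.5), sup_distance(sample, uniform_cdf))

# CLT for the fixed set A = (0.2, 0.5], so P(A) = 0.3.
p = 0.3
draws = [
    G_n([rng.random() for _ in range(500)], 0.2, 0.5, p)
    for _ in range(2000)
]
mean = sum(draws) / len(draws)
var = sum((g - mean) ** 2 for g in draws) / len(draws)
print(var, p * (1 - p))  # sample variance next to P(A)(1 - P(A))
```

The last line prints the sample variance of the simulated values of G_n(A) next to ''P''(''A'')(1 − ''P''(''A'')); the two should be close, in line with the normal limit stated above.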

Example

As an example, consider empirical distribution functions. For real-valued iid random variables ''X''₁, ''X''₂, ..., ''X''_''n'' they are given by

: F_n(x)=P_n((-\infty,x])=P_nI_{(-\infty,x]}.

In this case, empirical processes are indexed by the class \mathcal{C}=\{(-\infty,x]:x\in\mathbb{R}\}. It has been shown that \mathcal{C} is a Donsker class; in particular,

: \sqrt{n}(F_n(x)-F(x))

converges weakly in \ell^\infty(\mathbb{R}) to a Brownian bridge ''B''(''F''(''x'')).
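One way to make the Brownian bridge limit concrete is through covariances: a standard Brownian bridge ''B'' on [0, 1] satisfies Cov(''B''(''s''), ''B''(''t'')) = min(''s'', ''t'') − ''st''. The sketch below, in standard-library Python, estimates the covariance of \sqrt{n}(F_n(x)-F(x)) at two points by simulation. It assumes uniform samples on [0, 1], so that ''F''(''x'') = ''x''; the helper name `empirical_process_at`, the points 0.3 and 0.7, the seed, and the sample sizes are all illustrative assumptions.

```python
import math
import random


def empirical_process_at(sample, points, F):
    """Evaluate sqrt(n) * (F_n(x) - F(x)) at each x in points."""
    n = len(sample)
    return [
        math.sqrt(n) * (sum(1 for u in sample if u <= x) / n - F(x))
        for x in points
    ]


def uniform_cdf(x):
    """CDF of the uniform distribution on [0, 1] (assumed example law)."""
    return min(max(x, 0.0), 1.0)


rng = random.Random(1)  # arbitrary fixed seed for reproducibility
s, t = 0.3, 0.7
n, reps = 200, 4000

# Each repetition draws a fresh sample and records the process at s and t.
pairs = [
    empirical_process_at([rng.random() for _ in range(n)], (s, t), uniform_cdf)
    for _ in range(reps)
]

mean_s = sum(a for a, _ in pairs) / reps
mean_t = sum(b for _, b in pairs) / reps
cov = sum((a - mean_s) * (b - mean_t) for a, b in pairs) / reps

# Simulated covariance next to the Brownian bridge value min(s, t) - s * t
print(cov, min(s, t) - s * t)
```

Matching covariances at pairs of points is only a finite-dimensional check; weak convergence in \ell^\infty(\mathbb{R}) is a stronger statement that a simulation of this kind does not establish.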


External links

''Empirical Processes: Theory and Applications'', by David Pollard, a textbook available online.
''Introduction to Empirical Processes and Semiparametric Inference'', by Michael Kosorok, another textbook available online.
