In probability theory, an empirical process is a stochastic process that describes the proportion of objects in a system in a given state.
For a process in a discrete state space, a population continuous-time Markov chain or Markov population model is a process which counts the number of objects in a given state (without rescaling).
In mean field theory, limit theorems (as the number of objects becomes large) are considered and generalise the central limit theorem for empirical measures. Applications of the theory of empirical processes arise in non-parametric statistics.
Definition
For ''X''<sub>1</sub>, ''X''<sub>2</sub>, ..., ''X''<sub>''n''</sub> independent and identically distributed random variables in '''R''' with common cumulative distribution function ''F''(''x''), the empirical distribution function is defined by
:<math>F_n(x)=\frac{1}{n}\sum_{i=1}^n I_{(-\infty,x]}(X_i),</math>
where I<sub>''C''</sub> is the indicator function of the set ''C''.
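The definition can be read as "''F''<sub>''n''</sub>(''x'') is the proportion of observations less than or equal to ''x''". The following is a minimal sketch of that reading; the helper name `empirical_cdf` and the sample values are hypothetical and chosen only for illustration.

```python
from bisect import bisect_right


def empirical_cdf(sample):
    """Return F_n, the empirical distribution function of `sample`."""
    ordered = sorted(sample)
    n = len(ordered)
    if n == 0:
        raise ValueError("sample must contain at least one observation")

    def F_n(x):
        # bisect_right counts the observations X_i with X_i <= x,
        # i.e. the sum of the indicators I_{(-inf, x]}(X_i).
        return bisect_right(ordered, x) / n

    return F_n


# Hypothetical sample, used only to illustrate the step-function shape.
F_n = empirical_cdf([0.2, 0.5, 0.5, 0.9])
values = [F_n(0.1), F_n(0.5), F_n(0.9), F_n(2.0)]
```

Sorting once and using a binary search keeps each evaluation at logarithmic cost; a direct sum over indicators would give the same values.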
For every (fixed) ''x'', ''F''<sub>''n''</sub>(''x'') is a sequence of random variables which converges to ''F''(''x'') almost surely by the strong law of large numbers. That is, ''F''<sub>''n''</sub> converges to ''F'' pointwise. Glivenko and Cantelli strengthened this result by proving uniform convergence of ''F''<sub>''n''</sub> to ''F''; this is the Glivenko–Cantelli theorem.
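Uniform convergence can be observed numerically. The sketch below assumes, purely for illustration, that the observations are Uniform(0, 1), so that ''F''(''x'') = ''x'' on [0, 1], and computes the supremum distance between ''F''<sub>''n''</sub> and ''F'' for two sample sizes. The function name `sup_distance_uniform`, the seed and the sample sizes are arbitrary choices, not part of the theory.

```python
import random


def sup_distance_uniform(sample):
    """sup_x |F_n(x) - x| for a sample assumed to come from Uniform(0, 1)."""
    ordered = sorted(sample)
    n = len(ordered)
    # F_n is a step function that jumps only at the order statistics,
    # so the supremum is approached just before, or attained at, one of them.
    return max(
        max((i + 1) / n - x, x - i / n)
        for i, x in enumerate(ordered)
    )


rng = random.Random(0)  # fixed seed so the run is reproducible
distances = {
    n: sup_distance_uniform([rng.random() for _ in range(n)])
    for n in (100, 20000)
}
```

For large ''n'' the computed distance should be small, in line with the theorem; a single simulated run illustrates the statement but does not prove it.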
A centered and scaled version of the empirical measure is the signed measure
:<math>G_n(A)=\sqrt{n}\,(P_n(A)-P(A)),</math>
where <math>P_n(A)=\tfrac{1}{n}\sum_{i=1}^n I_A(X_i)</math> is the empirical measure of the sample and ''P'' is the common distribution of the ''X''<sub>''i''</sub>.
It induces a map on measurable functions ''f'' given by
:<math>f\mapsto G_n f=\sqrt{n}\,(P_n-P)f=\sqrt{n}\left(\frac{1}{n}\sum_{i=1}^n f(X_i)-\mathbb{E}f\right).</math>
By the central limit theorem, <math>G_n(A)</math> converges in distribution to a normal random variable ''N''(0, ''P''(''A'')(1 − ''P''(''A''))) for a fixed measurable set ''A''. Similarly, for a fixed function ''f'', <math>G_n f</math> converges in distribution to a normal random variable <math>N(0,\mathbb{E}(f-\mathbb{E}f)^2)</math>, provided that <math>\mathbb{E}f</math> and <math>\mathbb{E}f^2</math> exist.
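The limiting variance ''P''(''A'')(1 − ''P''(''A'')) for a fixed set ''A'' can be checked by simulation. The sketch below assumes Uniform(0, 1) observations and the set ''A'' = [0, 0.3], so that ''P''(''A'') = 0.3; it takes the centered and scaled quantity to be √''n'' times the difference between the observed proportion of points in ''A'' and ''P''(''A''). The helper `scaled_deviation`, the seed, and the numbers of observations and replications are illustrative assumptions.

```python
import math
import random
import statistics


def scaled_deviation(sample, in_A, p_A):
    """sqrt(n) * (P_n(A) - P(A)) for one sample."""
    n = len(sample)
    p_n = sum(1 for x in sample if in_A(x)) / n  # proportion of points in A
    return math.sqrt(n) * (p_n - p_A)


rng = random.Random(1)  # fixed seed so the run is reproducible
p_A = 0.3               # P(A) for A = [0, 0.3] under Uniform(0, 1)

# 2000 independent replications, each based on n = 500 observations.
draws = [
    scaled_deviation([rng.random() for _ in range(500)], lambda x: x <= 0.3, p_A)
    for _ in range(2000)
]
sample_mean = statistics.mean(draws)
sample_var = statistics.pvariance(draws)
target_var = p_A * (1 - p_A)  # variance of the limiting normal law
```

Across replications the mean should be near 0 and the variance near 0.21; a histogram of `draws` would give a visual check of approximate normality.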
Definition
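:<math>\bigl(G_n(c)\bigr)_{c\in\mathcal{C}}</math>
is called an ''empirical process'' indexed by <math>\mathcal{C}</math>, a collection of measurable subsets of ''S''. Here <math>G_n</math> is the centered and scaled empirical measure described above.
:<math>\bigl(G_n f\bigr)_{f\in\mathcal{F}}</math>
is called an ''empirical process'' indexed by <math>\mathcal{F}</math>, a collection of measurable functions from ''S'' to <math>\mathbb{R}</math>.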
A significant result in the area of empirical processes is Donsker's theorem. It has led to a study of Donsker classes: sets of functions with the useful property that empirical processes indexed by these classes converge weakly to a certain Gaussian process. While it can be shown that Donsker classes are Glivenko–Cantelli classes, the converse is not true in general.
Example
As an example, consider empirical distribution functions. For real-valued iid random variables ''X''<sub>1</sub>, ''X''<sub>2</sub>, ..., ''X''<sub>''n''</sub> they are given by
:<math>F_n(x)=\frac{1}{n}\sum_{i=1}^n I_{(-\infty,x]}(X_i).</math>
In this case, empirical processes are indexed by the class <math>\mathcal{C}=\{(-\infty,x]:x\in\mathbb{R}\}.</math> It has been shown that <math>\mathcal{C}</math> is a Donsker class; in particular,
:<math>\sqrt{n}\,(F_n(x)-F(x))</math>
converges weakly in <math>\ell^\infty(\mathbb{R})</math> to a Brownian bridge ''B''(''F''(''x'')).
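One consequence of the Brownian bridge limit is that the scaled difference between the empirical and true distribution functions, evaluated at two points, has covariance min(''s'', ''t'') − ''st'' with ''s'' = ''F''(''x'') and ''t'' = ''F''(''y''), which is the covariance function of a standard Brownian bridge. The sketch below checks this by simulation, assuming Uniform(0, 1) observations so that ''F''(''x'') = ''x''. The function name `edf_process_at`, the evaluation points, the seed, and the numbers of observations and replications are all illustrative assumptions.

```python
import math
import random


def edf_process_at(sample, points):
    """sqrt(n) * (F_n(x) - x) at each x in `points`, for Uniform(0, 1) data."""
    n = len(sample)
    return [
        math.sqrt(n) * (sum(1 for v in sample if v <= x) / n - x)
        for x in points
    ]


rng = random.Random(2)  # fixed seed so the run is reproducible
s, t = 0.3, 0.7         # two evaluation points in (0, 1)

# 2000 independent replications, each based on n = 300 observations.
pairs = [
    edf_process_at([rng.random() for _ in range(300)], (s, t))
    for _ in range(2000)
]
mean_s = sum(a for a, _ in pairs) / len(pairs)
mean_t = sum(b for _, b in pairs) / len(pairs)
sample_cov = sum((a - mean_s) * (b - mean_t) for a, b in pairs) / len(pairs)
bridge_cov = min(s, t) - s * t  # Brownian bridge covariance at (s, t)
```

The simulated covariance should be close to 0.09. This checks only a two-point marginal of the process, not weak convergence in the space of bounded functions.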
See also
* Khmaladze transformation
* Weak convergence of measures
* Glivenko–Cantelli theorem
External links
* ''Empirical Processes: Theory and Applications'', by David Pollard, a textbook available online.
* ''Introduction to Empirical Processes and Semiparametric Inference'', by Michael Kosorok, another textbook available online.