In probability theory, an empirical process is a stochastic process that characterizes the deviation of the empirical distribution function from its expectation.
In mean field theory, limit theorems (as the number of objects becomes large) are considered and generalise the central limit theorem for empirical measures. Applications of the theory of empirical processes arise in non-parametric statistics.
Definition
For ''X''<sub>1</sub>, ''X''<sub>2</sub>, ..., ''X''<sub>''n''</sub> independent and identically distributed random variables in <math>\mathbb{R}</math> with common cumulative distribution function ''F''(''x''), the empirical distribution function is defined by
:<math>F_n(x) = \frac{1}{n}\sum_{i=1}^n I_{(-\infty,x]}(X_i),</math>
where I<sub>''C''</sub> is the indicator function of the set ''C''.
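As a minimal sketch of this definition, the empirical distribution function can be evaluated by counting the observations that do not exceed ''x''. The helper name <code>empirical_cdf</code> and the Uniform(0, 1) sample are assumptions made for illustration only.

```python
import bisect
import random


def empirical_cdf(sample):
    """Return F_n, the empirical distribution function of `sample`."""
    ordered = sorted(sample)
    n = len(ordered)

    def F_n(x):
        # bisect_right counts the observations X_i with X_i <= x,
        # i.e. the sum of the indicators I_{(-inf, x]}(X_i).
        return bisect.bisect_right(ordered, x) / n

    return F_n


random.seed(0)
# Illustrative data: iid Uniform(0, 1), so F(x) = x on [0, 1].
sample = [random.random() for _ in range(1000)]
F_n = empirical_cdf(sample)
print(F_n(0.5))  # should be close to F(0.5) = 0.5 for large n
```

Sorting once and using a binary search keeps each evaluation of ''F''<sub>''n''</sub> at logarithmic cost in ''n''.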
For every (fixed) ''x'', ''F''<sub>''n''</sub>(''x'') is a sequence of random variables which converges to ''F''(''x'') almost surely by the strong law of large numbers. That is, ''F''<sub>''n''</sub> converges to ''F'' pointwise. Glivenko and Cantelli strengthened this result by proving uniform convergence of ''F''<sub>''n''</sub> to ''F''; this is the Glivenko–Cantelli theorem.
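The uniform convergence can be observed numerically. The sketch below assumes Uniform(0, 1) data, so that ''F''(''x'') = ''x'' on [0, 1], and computes the supremum distance between ''F''<sub>''n''</sub> and ''F'' for growing ''n''. The function name <code>sup_distance_uniform</code> is hypothetical.

```python
import random


def sup_distance_uniform(sample):
    """sup_x |F_n(x) - F(x)| for a sample in [0, 1], compared against F(x) = x."""
    ordered = sorted(sample)
    n = len(ordered)
    # F_n is a step function, so the supremum is attained at (or just before)
    # an order statistic; checking both sides of every jump is enough.
    return max(
        max((i + 1) / n - x, x - i / n)
        for i, x in enumerate(ordered)
    )


random.seed(1)
for n in (100, 1_000, 10_000, 100_000):
    sample = [random.random() for _ in range(n)]
    # The printed distances should shrink as n grows.
    print(n, sup_distance_uniform(sample))
```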
A centered and scaled version of the empirical measure is the signed measure
:<math>G_n(A) = \sqrt{n}\,(P_n(A) - P(A)),</math>
where <math>P_n</math> denotes the empirical measure of the sample and ''P'' the common distribution of the ''X''<sub>''i''</sub>. It induces a map on measurable functions ''f'' given by
:<math>f \mapsto G_n f = \sqrt{n}\,(P_n - P)f = \sqrt{n}\left(\frac{1}{n}\sum_{i=1}^n f(X_i) - \mathbb{E}f\right).</math>
By the central limit theorem, <math>G_n(A)</math> converges in distribution to a normal random variable ''N''(0, ''P''(''A'')(1 − ''P''(''A''))) for fixed measurable set ''A''. Similarly, for a fixed function ''f'', <math>G_n f</math> converges in distribution to a normal random variable <math>N(0, \mathbb{E}(f - \mathbb{E}f)^2)</math>, provided that <math>\mathbb{E}f</math> and <math>\mathbb{E}f^2</math> exist.
Definition
:<math>\bigl(G_n(c)\bigr)_{c\in\mathcal{C}}</math> is called an ''empirical process'' indexed by <math>\mathcal{C}</math>, a collection of measurable subsets of ''S''.
:<math>\bigl(G_n f\bigr)_{f\in\mathcal{F}}</math> is called an ''empirical process'' indexed by <math>\mathcal{F}</math>, a collection of measurable functions from ''S'' to <math>\mathbb{R}</math>.
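The normal limit for a fixed function can be illustrated by simulation. The following sketch assumes Uniform(0, 1) data and the test function ''f''(''x'') = ''x''<sup>2</sup>, both chosen only for illustration. It evaluates √''n'' times the difference between the sample average of ''f'' and its expectation over many independent samples, then compares the empirical variance with E(''f'' − E''f'')<sup>2</sup> = 4/45.

```python
import math
import random
import statistics


def G_n(f, sample, mean_f):
    """sqrt(n) * (P_n f - P f), where P f = E f(X) is supplied as `mean_f`."""
    n = len(sample)
    P_n_f = sum(f(x) for x in sample) / n
    return math.sqrt(n) * (P_n_f - mean_f)


def f(x):
    # Hypothetical test function; for X ~ Uniform(0, 1), E f = 1/3.
    return x * x


random.seed(2)
n, replications = 500, 2000
draws = [
    G_n(f, [random.random() for _ in range(n)], 1 / 3)
    for _ in range(replications)
]

# Limiting variance: E(f - E f)^2 = E X^4 - (E X^2)^2 = 1/5 - 1/9 = 4/45.
print(statistics.fmean(draws), statistics.pvariance(draws), 4 / 45)
```

Taking ''f'' to be the indicator function of a set ''A'' recovers the set-indexed quantity, whose limiting variance is ''P''(''A'')(1 − ''P''(''A'')).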
A significant result in the area of empirical processes is Donsker's theorem. It has led to a study of Donsker classes: sets of functions with the useful property that empirical processes indexed by these classes converge weakly to a certain Gaussian process. While it can be shown that Donsker classes are Glivenko–Cantelli classes, the converse is not true in general.
Example
As an example, consider empirical distribution functions. For real-valued iid random variables ''X''<sub>1</sub>, ''X''<sub>2</sub>, ..., ''X''<sub>''n''</sub> they are given by
:<math>F_n(x) = P_n((-\infty,x]) = \frac{1}{n}\sum_{i=1}^n I_{(-\infty,x]}(X_i).</math>
In this case, empirical processes are indexed by a class <math>\mathcal{C} = \{(-\infty,x] : x \in \mathbb{R}\}.</math> It has been shown that <math>\mathcal{C}</math> is a Donsker class, in particular,
:<math>\sqrt{n}\,(F_n(x) - F(x))</math>
converges weakly in <math>\ell^\infty(\mathbb{R})</math> to a Brownian bridge ''B''(''F''(''x'')).
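A rough numerical check of this limit, assuming Uniform(0, 1) data so that ''F''(''x'') = ''x'': a Brownian bridge ''B'' has Var ''B''(''t'') = ''t''(1 − ''t'') and Cov(''B''(''s''), ''B''(''t'')) = ''s''(1 − ''t'') for ''s'' ≤ ''t''. The sketch below estimates the same moments for √''n''(''F''<sub>''n''</sub> − ''F'') at two fixed points. The function name <code>empirical_process</code> and the points 0.3 and 0.7 are illustrative choices.

```python
import math
import random
import statistics


def empirical_process(sample, grid):
    """Values of sqrt(n) * (F_n(x) - x) on `grid`, for F(x) = x (Uniform(0, 1))."""
    n = len(sample)
    return [
        math.sqrt(n) * (sum(1 for v in sample if v <= x) / n - x)
        for x in grid
    ]


random.seed(3)
n, replications = 400, 2000
s, t = 0.3, 0.7
paths = [
    empirical_process([random.random() for _ in range(n)], (s, t))
    for _ in range(replications)
]
at_s = [p[0] for p in paths]
at_t = [p[1] for p in paths]

# Sample covariance between the process values at s and at t.
mean_s, mean_t = statistics.fmean(at_s), statistics.fmean(at_t)
cov_st = sum((a - mean_s) * (b - mean_t) for a, b in zip(at_s, at_t)) / replications

print(statistics.pvariance(at_s), s * (1 - s))  # compare with Var B(s)
print(cov_st, s * (1 - t))                      # compare with Cov(B(s), B(t))
```

Matching second moments at two points is only a sanity check. Weak convergence of the whole process is a much stronger statement than this simulation can confirm.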
See also
* Khmaladze transformation
* Weak convergence of measures
* Glivenko–Cantelli theorem
External links
Empirical Processes: Theory and Applications by David Pollard, a textbook available online.
Introduction to Empirical Processes and Semiparametric Inference by Michael Kosorok, another textbook available online.