
In probability theory, an empirical process is a stochastic process that describes the proportion of objects in a system in a given state. For a process in a discrete state space, a population continuous-time Markov chain (or Markov population model) is a process which counts the number of objects in a given state, without rescaling. In mean-field theory, limit theorems (as the number of objects becomes large) are considered; these generalise the central limit theorem for empirical measures. Applications of the theory of empirical processes arise in non-parametric statistics.


Definition

Let X_1, X_2, ..., X_n be independent and identically distributed random variables in \mathbb{R} with common cumulative distribution function F(x). The empirical distribution function is defined by
:F_n(x)=\frac{1}{n}\sum_{i=1}^n I_{(-\infty,x]}(X_i),
where I_C is the indicator function of the set C. For every fixed x, F_n(x) is a sequence of random variables which converges to F(x) almost surely by the strong law of large numbers. That is, F_n converges to F pointwise. Glivenko and Cantelli strengthened this result by proving that the convergence is uniform, \sup_x|F_n(x)-F(x)|\to 0 almost surely; this is the Glivenko–Cantelli theorem.

More generally, let the X_i take values in a measurable space S with common distribution P, and let P_n=\frac{1}{n}\sum_{i=1}^n \delta_{X_i} be the empirical measure, so that P_n(A) is the fraction of observations falling in A. A centered and scaled version of the empirical measure is the signed measure
:G_n(A)=\sqrt{n}\,(P_n(A)-P(A)).
It induces a map on measurable functions f given by
:f\mapsto G_n f=\sqrt{n}\,(P_n-P)f=\sqrt{n}\left(\frac{1}{n}\sum_{i=1}^n f(X_i)-\mathbb{E}f\right).
By the central limit theorem, G_n(A) converges in distribution to a normal random variable N(0, P(A)(1 - P(A))) for a fixed measurable set A. Similarly, for a fixed function f, G_n f converges in distribution to a normal random variable N(0,\mathbb{E}(f-\mathbb{E}f)^2), provided that \mathbb{E}f and \mathbb{E}f^2 exist.

Definition. Let \mathcal{C} be a collection of measurable subsets of S and \mathcal{F} a collection of measurable functions from S to \mathbb{R}. Then
:\bigl(G_n(c)\bigr)_{c\in\mathcal{C}} is called an empirical process indexed by \mathcal{C}, and
:\bigl(G_nf\bigr)_{f\in\mathcal{F}} is called an empirical process indexed by \mathcal{F}.

A significant result in the area of empirical processes is Donsker's theorem. It has led to the study of Donsker classes: sets of functions with the useful property that empirical processes indexed by these classes converge weakly to a certain Gaussian process (a P-Brownian bridge). It can be shown that Donsker classes are also Glivenko–Cantelli classes, that is, classes on which P_n f converges to \mathbb{E}f uniformly in f. The converse is not true in general.
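The objects defined above can be computed directly. Below is a minimal Python sketch using only the standard library. It assumes Uniform(0, 1) data, so that F(x) = x on [0, 1]. The helper names `ecdf`, `sup_distance`, `g_n`, `uniform_cdf` and `demo` are illustrative and do not come from any library. `sup_distance` computes the Glivenko–Cantelli distance \sup_x|F_n(x)-F(x)|. `demo` checks by simulation that this distance shrinks as n grows, and that the variance of G_n(A) for A = [0, 0.3] sits near P(A)(1 - P(A)) = 0.21.

```python
import math
import random
from bisect import bisect_right


def ecdf(sample):
    """Return the empirical distribution function F_n(x) = (1/n) #{i : X_i <= x}."""
    xs = sorted(sample)
    n = len(xs)
    # bisect_right counts how many sorted observations are <= x
    return lambda x: bisect_right(xs, x) / n


def sup_distance(sample, cdf):
    """sup_x |F_n(x) - F(x)| for a continuous CDF F.

    F_n is constant between order statistics and F is non-decreasing, so the
    supremum is reached at, or just to the left of, an observation."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        fx = cdf(x)
        # F_n jumps from (i - 1)/n to i/n at the i-th order statistic
        d = max(d, i / n - fx, fx - (i - 1) / n)
    return d


def g_n(sample, in_a, p_a):
    """G_n(A) = sqrt(n) * (P_n(A) - P(A)); the set A is given by its indicator in_a."""
    n = len(sample)
    p_n = sum(1 for x in sample if in_a(x)) / n
    return math.sqrt(n) * (p_n - p_a)


def uniform_cdf(x):
    # Assumption for the demo: data are Uniform(0, 1), so F(x) = x clipped to [0, 1]
    return min(max(x, 0.0), 1.0)


def demo(seed=0):
    rng = random.Random(seed)
    # Glivenko-Cantelli: the uniform distance should shrink as n grows
    dists = {n: sup_distance([rng.random() for _ in range(n)], uniform_cdf)
             for n in (100, 10_000)}
    # CLT for the fixed set A = [0, 0.3]: Var G_n(A) = P(A)(1 - P(A)) = 0.21
    reps, n, p_a = 2000, 200, 0.3
    draws = [g_n([rng.random() for _ in range(n)], lambda x: x <= p_a, p_a)
             for _ in range(reps)]
    mean = sum(draws) / reps
    var = sum((g - mean) ** 2 for g in draws) / (reps - 1)
    return dists, var


if __name__ == "__main__":
    dists, var = demo()
    print("sup|F_n - F| by n:", dists)
    print("sample variance of G_n([0, 0.3]):", var, "(target 0.21)")
```

For a set A the count n P_n(A) is binomial, so the variance P(A)(1 - P(A)) is exact for every n. The central limit theorem adds that the distribution of G_n(A) is approximately normal.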


Example

As an example, consider empirical distribution functions. For real-valued iid random variables X_1, X_2, ..., X_n they are given by
:F_n(x)=P_n((-\infty,x])=P_nI_{(-\infty,x]}.
In this case, empirical processes are indexed by the class \mathcal{F}=\{I_{(-\infty,x]}:x\in\mathbb{R}\}. It has been shown that \mathcal{F} is a Donsker class; in particular,
:\sqrt{n}\,(F_n(x)-F(x))
converges weakly in \ell^\infty(\mathbb{R}) to a Brownian bridge B(F(x)).
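The Brownian bridge limit can be checked numerically through one functional of it. For continuous F, the continuous mapping theorem gives \sup_x|\sqrt{n}(F_n(x)-F(x))|\to\sup_{0\le t\le 1}|B(t)| in distribution. This statistic is the Kolmogorov–Smirnov statistic, and its limit has the Kolmogorov distribution.

The Python sketch below compares a Monte Carlo estimate with the series for the Kolmogorov CDF. It again assumes Uniform(0, 1) data. The helper names `kolmogorov_cdf`, `ks_statistic` and `coverage` are illustrative, and the sample size and repetition count are arbitrary choices. Under the limit law, P(\sup|B| \le 1.36) is about 0.95.

```python
import math
import random


def kolmogorov_cdf(x, terms=50):
    """P(sup_t |B(t)| <= x) for a standard Brownian bridge B on [0, 1].

    Uses the alternating series 1 - 2 sum (-1)^(k-1) exp(-2 k^2 x^2) for x >= 1,
    and the equivalent form sqrt(2 pi)/x sum exp(-(2k-1)^2 pi^2 / (8 x^2))
    for 0 < x < 1, where it converges faster."""
    if x <= 0:
        return 0.0
    if x >= 1:
        return 1 - 2 * sum((-1) ** (k - 1) * math.exp(-2 * k * k * x * x)
                           for k in range(1, terms + 1))
    c = math.pi ** 2 / (8 * x * x)
    return math.sqrt(2 * math.pi) / x * sum(math.exp(-(2 * k - 1) ** 2 * c)
                                            for k in range(1, terms + 1))


def ks_statistic(sample, cdf):
    """sqrt(n) * sup_x |F_n(x) - F(x)| for a continuous CDF F."""
    xs = sorted(sample)
    n = len(xs)
    # F_n jumps from (i - 1)/n to i/n at the i-th order statistic
    d = max(max(i / n - cdf(x), cdf(x) - (i - 1) / n)
            for i, x in enumerate(xs, start=1))
    return math.sqrt(n) * d


def coverage(level_x=1.36, n=200, reps=2000, seed=1):
    """Monte Carlo estimate of P(sqrt(n) sup|F_n - F| <= level_x).

    Assumption: data are Uniform(0, 1), so F(u) = u on [0, 1]."""
    rng = random.Random(seed)
    hits = sum(ks_statistic([rng.random() for _ in range(n)], lambda u: u) <= level_x
               for _ in range(reps))
    return hits / reps


if __name__ == "__main__":
    print("limit P(sup|B| <= 1.36):", kolmogorov_cdf(1.36))
    print("simulated, n = 200:     ", coverage())
```

For moderate n the simulated probability is typically a little above the limiting value, because \sqrt{n}\sup_x|F_n(x)-F(x)| is slightly smaller in distribution than its limit.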


See also

* Khmaladze transformation
* Weak convergence of measures
* Glivenko–Cantelli theorem




External links


* Empirical Processes: Theory and Applications, by David Pollard, a textbook available online.
* Introduction to Empirical Processes and Semiparametric Inference, by Michael Kosorok, another textbook available online.