Glivenko–Cantelli Theorem

In probability theory, the Glivenko–Cantelli theorem (sometimes referred to as the Fundamental Theorem of Statistics), named after Valery Ivanovich Glivenko and Francesco Paolo Cantelli, describes the asymptotic behaviour of the empirical distribution function as the number of independent and identically distributed observations grows. The uniform convergence of more general empirical measures becomes an important property of the Glivenko–Cantelli classes of functions or sets. Glivenko–Cantelli classes arise in Vapnik–Chervonenkis theory, with applications to machine learning. Applications can also be found in econometrics making use of M-estimators.


Statement

Assume that X_1, X_2, \dots are independent and identically distributed random variables in \mathbb{R} with common cumulative distribution function F(x). The ''empirical distribution function'' for X_1, \dots, X_n is defined by

F_n(x) = \frac{1}{n} \sum_{i=1}^n I_{(-\infty, x]}(X_i) = \frac{1}{n} \bigl| \{ 1 \le i \le n : X_i \le x \} \bigr|,

where I_C is the indicator function of the set C. For every (fixed) x, F_n(x) is a sequence of random variables which converges to F(x) almost surely by the strong law of large numbers. Glivenko and Cantelli strengthened this result by proving uniform convergence of F_n to F.

Theorem

\| F_n - F \|_\infty = \sup_{x \in \mathbb{R}} | F_n(x) - F(x) | \longrightarrow 0 almost surely.

This theorem originates with Valery Glivenko and Francesco Cantelli, who published it in 1933.

Remarks

* If X_n is a stationary ergodic process, then F_n(x) converges almost surely to F(x) = \operatorname{E}(1_{X_1 \le x}). The Glivenko–Cantelli theorem gives a stronger mode of convergence than this in the iid case.
* An even stronger uniform convergence result for the empirical distribution function is available in the form of an extended type of law of the iterated logarithm. See asymptotic properties of the empirical distribution function for this and related results.
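As a quick numerical illustration of the theorem (a minimal sketch, not part of the original statement; the Uniform(0, 1) distribution and grid resolution are arbitrary choices), the sup-norm distance between the empirical CDF and the true CDF can be approximated on a fine grid with NumPy, and it shrinks as the sample grows:

```python
import numpy as np

rng = np.random.default_rng(0)

def ecdf_sup_distance(sample, cdf, grid):
    """Approximate sup_x |F_n(x) - F(x)| by evaluating both on a fine grid."""
    n = len(sample)
    # F_n(x) = (1/n) * #{i : X_i <= x}, computed via a sorted sample
    fn = np.searchsorted(np.sort(sample), grid, side="right") / n
    return np.max(np.abs(fn - cdf(grid)))

# Uniform(0, 1): F(x) = x, so the true CDF is trivial to evaluate
grid = np.linspace(0.0, 1.0, 10_001)
d_small = ecdf_sup_distance(rng.uniform(size=100), lambda t: t, grid)
d_large = ecdf_sup_distance(rng.uniform(size=100_000), lambda t: t, grid)
print(f"n=100:     sup|F_n - F| ~ {d_small:.4f}")
print(f"n=100000:  sup|F_n - F| ~ {d_large:.4f}")
```

By the Dvoretzky–Kiefer–Wolfowitz inequality mentioned below, the deviation for n = 100{,}000 is below 0.01 except with negligible probability.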


Proof

For simplicity, consider the case of a continuous random variable X. Fix -\infty = x_0 < x_1 < \cdots < x_{m-1} < x_m = \infty such that F(x_j) - F(x_{j-1}) = \frac{1}{m} for j = 1, \dots, m. Now for all x \in \mathbb{R} there exists j \in \{1, \dots, m\} such that x \in (x_{j-1}, x_j]. Note that

F_n(x) - F(x) \le F_n(x_j) - F(x_{j-1}) = F_n(x_j) - F(x_j) + \frac{1}{m},
F_n(x) - F(x) \ge F_n(x_{j-1}) - F(x_j) = F_n(x_{j-1}) - F(x_{j-1}) - \frac{1}{m}.

Therefore,

\| F_n - F \|_\infty = \sup_{x \in \mathbb{R}} | F_n(x) - F(x) | \le \max_{j \in \{1, \dots, m\}} | F_n(x_j) - F(x_j) | + \frac{1}{m}.

Since \max_{j \in \{1, \dots, m\}} | F_n(x_j) - F(x_j) | \to 0 almost surely by the strong law of large numbers, we can guarantee that for any positive \varepsilon and any integer m such that 1/m < \varepsilon, we can find N such that for all n \ge N we have \max_{j \in \{1, \dots, m\}} | F_n(x_j) - F(x_j) | \le \varepsilon - 1/m almost surely. Combined with the bound above, this implies \| F_n - F \|_\infty \le \varepsilon almost surely, which is the definition of almost sure convergence.
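The chaining bound at the heart of this proof can be checked numerically. The sketch below assumes the Uniform(0, 1) distribution, for which the quantile grid is simply x_j = j/m; the sample size and m are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 5_000, 20
sample = np.sort(rng.uniform(size=n))   # Uniform(0, 1), so F(t) = t
xs = np.arange(1, m + 1) / m            # grid with F(x_j) - F(x_{j-1}) = 1/m

def F_n(t):
    """Empirical CDF evaluated at the points t."""
    return np.searchsorted(sample, t, side="right") / n

# Right-hand side of the proof's bound: max_j |F_n(x_j) - F(x_j)| + 1/m
rhs = np.max(np.abs(F_n(xs) - xs)) + 1.0 / m

# Left-hand side approximated on a fine grid (a grid value can only
# undershoot the exact supremum, so the inequality must still hold)
grid = np.linspace(0.0, 1.0, 100_001)
lhs = np.max(np.abs(F_n(grid) - grid))

print(f"sup |F_n - F| ~ {lhs:.4f} <= bound {rhs:.4f}")
```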


Empirical measures

One can generalize the ''empirical distribution function'' by replacing the set (-\infty, x] by an arbitrary set C from a class of sets \mathcal{C} to obtain an empirical measure indexed by sets C \in \mathcal{C}:

P_n(C) = \frac{1}{n} \sum_{i=1}^n I_C(X_i), \quad C \in \mathcal{C},

where I_C(x) is the indicator function of the set C. A further generalization is the map induced by P_n on measurable real-valued functions f, given by

f \mapsto P_n f = \int_S f \, dP_n = \frac{1}{n} \sum_{i=1}^n f(X_i), \quad f \in \mathcal{F}.

It then becomes an important property of these classes whether the strong law of large numbers holds uniformly on \mathcal{F} or \mathcal{C}.
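In code, both the set-indexed empirical measure and its extension to functions are plain sample averages. A hedged sketch (the standard normal sample and the particular set and function below are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=10_000)

def empirical_measure(sample, indicator):
    """P_n(C) = (1/n) * #{i : X_i in C}, with C given by its indicator."""
    return np.mean(indicator(sample))

def empirical_map(sample, f):
    """P_n f = (1/n) * sum_i f(X_i)."""
    return np.mean(f(sample))

# P_n([0, 1]) approximates P([0, 1]) = Phi(1) - Phi(0) ~ 0.3413
p_interval = empirical_measure(x, lambda t: (0.0 <= t) & (t <= 1.0))
# P_n f for f(t) = t^2 approximates E[X^2] = 1 for a standard normal
mean_sq = empirical_map(x, lambda t: t**2)
print(p_interval, mean_sq)
```

Uniform convergence over a class \mathcal{C} or \mathcal{F} asks that such averages be close to their expectations simultaneously for all members of the class, not just one at a time.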


Glivenko–Cantelli class

Consider a set \mathcal{S} with a sigma algebra of Borel subsets A and a probability measure P. For a class of subsets

\mathcal{C} \subset \{ C : C \text{ is a measurable subset of } \mathcal{S} \}

and a class of functions

\mathcal{F} \subset \{ f : \mathcal{S} \to \mathbb{R},\ f \text{ measurable} \},

define random variables

\| P_n - P \|_{\mathcal{C}} = \sup_{C \in \mathcal{C}} | P_n(C) - P(C) |,
\| P_n - P \|_{\mathcal{F}} = \sup_{f \in \mathcal{F}} | P_n f - P f |,

where P_n(C) is the empirical measure, P_n f is the corresponding map, and

\operatorname{E} f = \int_{\mathcal{S}} f \, dP = P f,

assuming that it exists.

Definitions

* A class \mathcal{C} is called a ''Glivenko–Cantelli class'' (or ''GC class'') with respect to a probability measure P if any of the following equivalent statements is true.
 1. \| P_n - P \|_{\mathcal{C}} \to 0 almost surely as n \to \infty.
 2. \| P_n - P \|_{\mathcal{C}} \to 0 in probability as n \to \infty.
 3. \operatorname{E} \| P_n - P \|_{\mathcal{C}} \to 0 as n \to \infty (convergence in mean).
 The Glivenko–Cantelli classes of functions are defined similarly.
* A class is called a ''universal Glivenko–Cantelli class'' if it is a GC class with respect to any probability measure P on (S, A).
* A class is called ''uniformly Glivenko–Cantelli'' if the convergence occurs uniformly over all probability measures P on (S, A):

\sup_P \operatorname{E} \| P_n - P \|_{\mathcal{C}} \to 0;
\sup_P \operatorname{E} \| P_n - P \|_{\mathcal{F}} \to 0.

Theorem (Vapnik and Chervonenkis, 1968)

''A class of sets \mathcal{C} is uniformly GC if and only if it is a Vapnik–Chervonenkis class.''


Examples

* Let S = \mathbb{R} and \mathcal{C} = \{ (-\infty, t] : t \in \mathbb{R} \}. The classical Glivenko–Cantelli theorem implies that this class is a universal GC class. Furthermore, by Kolmogorov's theorem,

\sup_P \| P_n - P \|_{\mathcal{C}} \sim n^{-1/2},

that is, \mathcal{C} is a uniformly Glivenko–Cantelli class.
* Let P be a nonatomic probability measure on S and \mathcal{C} be the class of all finite subsets of S. Because A_n = \{ X_1, \dots, X_n \} \in \mathcal{C}, P(A_n) = 0 and P_n(A_n) = 1, we have \| P_n - P \|_{\mathcal{C}} = 1, so \mathcal{C} is ''not'' a GC class with respect to P.
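The second example is easy to make concrete. In this sketch a Uniform(0, 1) sample stands in for draws from a nonatomic P (every finite set has P-measure zero), and the deviation equals 1 for every sample size:

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.uniform(size=1_000)      # nonatomic P: P({point}) = 0 for every point

# Choose C = A_n = {X_1, ..., X_n}, a finite set, so P(A_n) = 0 ...
P_An = 0.0
# ... while the empirical measure puts all of its mass on the sample:
Pn_An = np.mean(np.isin(x, x))   # fraction of sample points lying in A_n

print(Pn_An - P_An)              # deviation 1.0: no GC property over this class
```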


See also

* Donsker's theorem
* Dvoretzky–Kiefer–Wolfowitz inequality – strengthens the Glivenko–Cantelli theorem by quantifying the rate of convergence.

