HOME

TheInfoList



OR:

In the
theory of probability Probability theory or probability calculus is the branch of mathematics concerned with probability. Although there are several different probability interpretations, probability theory treats the concept in a rigorous mathematical manner by expre ...
, the Glivenko–Cantelli theorem (sometimes referred to as the fundamental theorem of statistics), named after Valery Ivanovich Glivenko and
Francesco Paolo Cantelli Francesco Paolo Cantelli (20 December 187521 July 1966) was an Italian mathematician. He made contributions to celestial mechanics, probability theory, and actuarial science. Biography Cantelli was born in Palermo. He received his doctorate in ...
, describes the asymptotic behaviour of the
empirical distribution function In statistics, an empirical distribution function ( an empirical cumulative distribution function, eCDF) is the Cumulative distribution function, distribution function associated with the empirical measure of a Sampling (statistics), sample. Th ...
as the number of
independent and identically distributed Independent or Independents may refer to: Arts, entertainment, and media Artist groups * Independents (artist group), a group of modernist painters based in Pennsylvania, United States * Independentes (English: Independents), a Portuguese artist ...
observations grows. Specifically, the empirical distribution function
converges uniformly In the mathematical field of analysis, uniform convergence is a mode of convergence of functions stronger than pointwise convergence. A sequence of functions (f_n) converges uniformly to a limiting function f on a set E as the function domain i ...
to the true distribution function
almost surely In probability theory, an event is said to happen almost surely (sometimes abbreviated as a.s.) if it happens with probability 1 (with respect to the probability measure). In other words, the set of outcomes on which the event does not occur ha ...
. The uniform convergence of more general
empirical measure In probability theory, an empirical measure is a random measure arising from a particular realization of a (usually finite) sequence of random variables. The precise definition is found below. Empirical measures are relevant to mathematical sta ...
s becomes an important property of the Glivenko–Cantelli classes of functions or sets. The Glivenko–Cantelli classes arise in
Vapnik–Chervonenkis theory Vapnik–Chervonenkis theory (also known as VC theory) was developed during 1960–1990 by Vladimir Vapnik and Alexey Chervonenkis. The theory is a form of computational learning theory, which attempts to explain the learning process from a stat ...
, with applications to
machine learning Machine learning (ML) is a field of study in artificial intelligence concerned with the development and study of Computational statistics, statistical algorithms that can learn from data and generalise to unseen data, and thus perform Task ( ...
. Applications can be found in
econometrics Econometrics is an application of statistical methods to economic data in order to give empirical content to economic relationships. M. Hashem Pesaran (1987). "Econometrics", '' The New Palgrave: A Dictionary of Economics'', v. 2, p. 8 p. 8 ...
making use of
M-estimator In statistics, M-estimators are a broad class of extremum estimators for which the objective function is a sample average. Both non-linear least squares and maximum likelihood estimation are special cases of M-estimators. The definition of M-estim ...
s.


Statement

Assume that X_1,X_2,\dots are
independent and identically distributed random variables Independent or Independents may refer to: Arts, entertainment, and media Artist groups * Independents (artist group), a group of modernist painters based in Pennsylvania, United States * Independentes (English: Independents), a Portuguese artis ...
in \mathbb with common
cumulative distribution function In probability theory and statistics, the cumulative distribution function (CDF) of a real-valued random variable X, or just distribution function of X, evaluated at x, is the probability that X will take a value less than or equal to x. Ever ...
F(x). The ''
empirical distribution function In statistics, an empirical distribution function ( an empirical cumulative distribution function, eCDF) is the Cumulative distribution function, distribution function associated with the empirical measure of a Sampling (statistics), sample. Th ...
'' for X_1,\dots,X_n is defined by F_n(x) = \frac \sum_^n I_(X_i) = \frac \bigl, \left\\bigr, , where I_C is the
indicator function In mathematics, an indicator function or a characteristic function of a subset of a set is a function that maps elements of the subset to one, and all other elements to zero. That is, if is a subset of some set , then the indicator functio ...
of the set C. For every (fixed) x, F_n(x) is a sequence of random variables which converge to F(x)
almost surely In probability theory, an event is said to happen almost surely (sometimes abbreviated as a.s.) if it happens with probability 1 (with respect to the probability measure). In other words, the set of outcomes on which the event does not occur ha ...
by the strong
law of large numbers In probability theory, the law of large numbers is a mathematical law that states that the average of the results obtained from a large number of independent random samples converges to the true value, if it exists. More formally, the law o ...
. Glivenko and Cantelli strengthened this result by proving
uniform convergence In the mathematical field of analysis, uniform convergence is a mode of convergence of functions stronger than pointwise convergence. A sequence of functions (f_n) converges uniformly to a limiting function f on a set E as the function domain i ...
of F_n to F.


Theorem

: \, F_n - F\, _\infty = \sup_ \bigl, F_n(x) - F(x)\bigr, \longrightarrow 0 almost surely. This theorem originates with Valery Glivenko and Francesco Cantelli, in 1933.


Remarks

* If X_n is a stationary
ergodic process In physics, statistics, econometrics and signal processing, a stochastic process is said to be in an ergodic regime if an observable's ensemble average equals the time average. In this regime, any collection of random samples from a process must ...
, then F_n(x) converges almost surely to F(x) = \operatorname\mathbb \bigl 1_ \bigr The Glivenko–Cantelli theorem gives a stronger mode of convergence than this in the iid case. * An even stronger uniform convergence result for the empirical distribution function is available in the form of an extended type of
law of the iterated logarithm In probability theory, the law of the iterated logarithm describes the magnitude of the fluctuations of a random walk. The original statement of the law of the iterated logarithm is due to Aleksandr Khinchin, A. Ya. Khinchin (1924). Another state ...
. See asymptotic properties of the empirical distribution function for this and related results.


Proof

For simplicity, consider a case of continuous random variable X. Fix -\infty =x_0 such that F(x_j)-F(x_)=\frac for j=1,\dots,m. Now for all x \in \mathbb there exists j \in \ such that x \in _,x_j/math>. : \begin F_n(x)-F(x) &\leq F_n(x_j)-F(x_) = F_n(x_j)-F(x_j)+\frac1m,\\ F_n(x)-F(x) &\geq F_n(x_)-F(x_j) = F_n(x_)-F(x_)-\frac1m. \end Therefore, : \, F_n-F\, _\infty = \sup_, F_n(x)-F(x), \leq \max_ , F_n(x_j)-F(x_j), + \frac1m. Since \max_ , F_n(x_j)-F(x_j), \to 0 \text by strong law of large numbers, we can guarantee that for any positive \varepsilon and any integer m such that 1/m<\varepsilon, we can find N such that for all n \geq N, we have \max_ , F_n(x_j)-F(x_j), \leq \varepsilon-1/m \text. Combined with the above result, this further implies that \, F_n-F\, _\infty \leq \varepsilon \text, which is the definition of almost sure convergence.


Empirical measures

One can generalize the ''empirical distribution function'' by replacing the set (-\infty,x] by an arbitrary set ''C'' from a class of sets \mathcal to obtain an
empirical measure In probability theory, an empirical measure is a random measure arising from a particular realization of a (usually finite) sequence of random variables. The precise definition is found below. Empirical measures are relevant to mathematical sta ...
indexed by sets C \in \mathcal. :P_n(C)=\frac \sum_^n I_C(X_i), C\in\mathcal Where I_C(x) is the
indicator function In mathematics, an indicator function or a characteristic function of a subset of a set is a function that maps elements of the subset to one, and all other elements to zero. That is, if is a subset of some set , then the indicator functio ...
of each set C. Further generalization is the map induced by P_n on measurable real-valued functions ''f'', which is given by :f\mapsto P_nf=\int_Sf \, dP_n = \frac 1 n \sum_^n f(X_i), f\in\mathcal. Then it becomes an important property of these classes whether the strong
law of large numbers In probability theory, the law of large numbers is a mathematical law that states that the average of the results obtained from a large number of independent random samples converges to the true value, if it exists. More formally, the law o ...
holds uniformly on \mathcal or \mathcal.


Glivenko–Cantelli class

Consider a set \ \mathcal\ with a sigma algebra of Borel set, Borel subsets and a
probability measure In mathematics, a probability measure is a real-valued function defined on a set of events in a σ-algebra that satisfies Measure (mathematics), measure properties such as ''countable additivity''. The difference between a probability measure an ...
\ \mathbb ~. For a class of subsets, : \mathcal \subset \bigl\ and a class of functions : \mathcal \subset \bigl\ define random variables : \bigl\, \mathbb_n - \mathbb \bigr\, _ = \sup_ \bigl, \mathbb_n(C) - \mathbb(C) \bigr, : \bigl\, \mathbb_n - \mathbb \bigr\, _ = \sup_ \bigl, \mathbb_n f - \mathbb f \bigr, where \ \mathbb_n(C)\ is the empirical measure, \ \mathbb_n f\ is the corresponding map, and :\ \mathbb f = \int_\mathcal f \ \mathrm\mathbb\ , assuming that it exists. Definitions * A class \ \mathcal C\ is called a ''Glivenko–Cantelli class'' (or ''GC class'', or sometimes ''strong GC class'') with respect to a probability measure if ::\ \bigl\, \mathbb_n - \mathbb \bigr\, _\mathcal \to 0\ almost surely as \ n \to \infty ~. * A class \ \mathcal C\ is a ''weak Glivenko-Cantelli class'' with respect to if it instead satisfies the weaker condition ::\ \bigl\, \mathbb_n - \mathbb \bigr\, _\mathcal \to 0\ in probability as \ n \to \infty ~. *A class is called a ''universal Glivenko–Cantelli class'' if it is a GC class with respect to any probability measure \mathbb on (\mathcal, A). *A class is a ''weak uniform Glivenko–Cantelli class'' if the convergence occurs uniformly over all probability measures \mathbb on (\mathcal, A): For every \varepsilon > 0, ::\ \sup_ \Pr\left(\bigl\, \mathbb_n - \mathbb \bigr\, _\mathcal > \varepsilon\right) \to 0\ as \ n \to \infty ~. *A class is a (strong) ''uniform Glivenko-Cantelli class'' if it satisfies the stronger condition that for every \varepsilon > 0, ::\ \sup_ \Pr\left(\sup_ \bigl\, \mathbb_m - \mathbb \bigr\, _\mathcal > \varepsilon\right) \to 0\ as \ n \to \infty ~. Glivenko–Cantelli classes of functions (as well as their uniform and universal forms) are defined similarly, replacing all instances of \mathcal with \mathcal. The weak and strong versions of the various Glivenko-Cantelli properties often coincide under certain regularity conditions. The following definition commonly appears in such regularity conditions: * A class of functions \mathcal is ''image-admissible Suslin'' if there exists a Suslin space \Omega and a surjection T:\Omega \rightarrow \mathcal such that the map (x, y) \mapsto (y)x) is measurable \mathcal\times\Omega. * A class of measurable sets \mathcal is image-admissible Suslin if the class of functions \ is image-admissible Suslin, where \mathbf_C denotes the
indicator function In mathematics, an indicator function or a characteristic function of a subset of a set is a function that maps elements of the subset to one, and all other elements to zero. That is, if is a subset of some set , then the indicator functio ...
for the set C. Theorems The following two theorems give sufficient conditions for the weak and strong versions of the Glivenko-Cantelli property to be equivalent. Theorem ( Talagrand, 1987) : ''Let \mathcal be a class of functions that is integrable \mathbb, and define \mathcal_0 = \. Then the following are equivalent:'' * ''\mathcal is a weak Glivenko-Cantelli class and \mathcal_0 is dominated by an integrable function'' * ''\mathcal is a Glivenko-Cantelli class'' Theorem (
Dudley Dudley ( , ) is a market town in the West Midlands, England, southeast of Wolverhampton and northwest of Birmingham. Historically part of Worcestershire, the town is the administrative centre of the Metropolitan Borough of Dudley. In the ...
, Giné, and Zinn, 1991) : ''Suppose that a function class \mathcal is bounded. Also suppose that the set \mathcal_0 = \ is image-admissible Suslin. Then \mathcal is a weak uniform Glivenko-Cantelli class if and only if it is a strong uniform Glivenko-Cantelli class.'' The following theorem is central to statistical learning of binary classification tasks. Theorem (
Vapnik Vladimir Naumovich Vapnik (; born 6 December 1936) is a statistician, researcher, and academic. He is one of the main developers of the Vapnik–Chervonenkis theory of statistical learning and the co-inventor of the support-vector machine method a ...
and Chervonenkis, 1968) : ''Under certain consistency conditions, a universally measurable class of sets \ \mathcal\ is a uniform Glivenko-Cantelli class if and only if it is a Vapnik–Chervonenkis class.'' There exist a variety of consistency conditions for the equivalence of uniform Glivenko-Cantelli and Vapnik-Chervonenkis classes. In particular, either of the following conditions for a class \mathcal suffice: * \mathcal is image-admissible Suslin. * \mathcal is universally separable: There exists a countable subset \mathcal of \mathcal such that each set C\in\mathcal can be written as the pointwise limit of sets in \mathcal_0.


Examples

* Let S=\mathbb R and =\. The classical Glivenko–Cantelli theorem implies that this class is a universal GC class. Furthermore, by Kolmogorov's theorem, :\sup_\, P_n-P\, _ \sim n^, that is \mathcal is uniformly Glivenko–Cantelli class. * Let ''P'' be a nonatomic probability measure on ''S'' and \mathcal be a class of all finite subsets in ''S''. Because A_n=\\in \mathcal, P(A_n)=0, P_n(A_n)=1, we have that \, P_n-P\, _=1 and so \mathcal is ''not'' a GC class with respect to ''P''.


See also

*
Donsker's theorem In probability theory, Donsker's theorem (also known as Donsker's invariance principle, or the functional central limit theorem), named after Monroe D. Donsker, is a functional extension of the central limit theorem for empirical distribution fun ...
* Dvoretzky–Kiefer–Wolfowitz inequality – strengthens the Glivenko–Cantelli theorem by quantifying the rate of convergence.


References


Further reading

* * * * * * {{DEFAULTSORT:Glivenko-Cantelli Theorem Empirical process Asymptotic theory (statistics) Theorems in probability theory Theorems in statistics