In statistics, an empirical distribution function (also commonly called an empirical cumulative distribution function, eCDF) is the distribution function associated with the empirical measure of a sample. This cumulative distribution function is a step function that jumps up by $1/n$ at each of the $n$ data points. Its value at any specified value of the measured variable is the fraction of observations of the measured variable that are less than or equal to the specified value.
The empirical distribution function is an estimate of the cumulative distribution function that generated the points in the sample. It converges with probability 1 to that underlying distribution, according to the Glivenko–Cantelli theorem. A number of results exist to quantify the rate of convergence of the empirical distribution function to the underlying cumulative distribution function.
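The convergence described above can be observed numerically. The following is a minimal sketch, assuming samples drawn from the Uniform(0, 1) distribution, whose cumulative distribution function is $F(t) = t$ on $[0, 1]$. The helper name `sup_distance_uniform`, the seed, and the sample sizes are illustrative choices, not part of the article.

```python
import numpy as np

def sup_distance_uniform(sample):
    """Largest vertical gap between the sample's ECDF and the Uniform(0, 1) CDF F(t) = t."""
    x = np.sort(np.asarray(sample, dtype=float))
    n = x.size
    i = np.arange(1, n + 1)
    # The ECDF jumps from (i - 1)/n to i/n at the i-th smallest observation,
    # so the supremum is attained at a jump, approached from the right or the left.
    gap_right = np.max(i / n - x)
    gap_left = np.max(x - (i - 1) / n)
    return max(gap_right, gap_left)

rng = np.random.default_rng(0)  # fixed seed so the run is reproducible
distances = {n: sup_distance_uniform(rng.uniform(size=n)) for n in (100, 10_000)}
```

In such a run the distance for the larger sample is typically much smaller than for the smaller one, in line with the almost-sure uniform convergence stated by the theorem.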
Definition
Let $(X_1, \ldots, X_n)$ be independent, identically distributed real random variables with the common cumulative distribution function $F(t)$. Then the empirical distribution function is defined as

$$\hat{F}_n(t) = \frac{\text{number of elements in the sample} \le t}{n} = \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}_{X_i \le t},$$

where $\mathbf{1}_A$ is the indicator of the event $A$. For a fixed $t$, the indicator $\mathbf{1}_{X_i \le t}$ is a Bernoulli random variable with parameter $p = F(t)$; hence $n\hat{F}_n(t)$ is a binomial random variable with mean $nF(t)$ and variance $nF(t)(1 - F(t))$. This implies that $\hat{F}_n(t)$ is an unbiased estimator for $F(t)$.
However, in some textbooks, the definition is given as

$$\hat{F}_n(t) = \frac{1}{n+1} \sum_{i=1}^{n} \mathbf{1}_{X_i \le t}$$

[Madsen, H.O., Krenk, S., Lind, S.C. (2006) Methods of Structural Safety. Dover Publications. pp. 148-149.]
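The standard definition, with divisor $n$, translates directly into code: the value of the empirical distribution function at $t$ is the count of observations less than or equal to $t$, divided by the sample size. A minimal sketch follows; the function name `ecdf` and the toy sample are illustrative assumptions.

```python
import numpy as np

def ecdf(sample):
    """Return a function t -> fraction of observations in `sample` that are <= t."""
    x = np.sort(np.asarray(sample, dtype=float))
    n = x.size

    def F_hat(t):
        # On a sorted array, searchsorted with side="right" returns the number
        # of entries that are <= t, which is the sum of the indicators 1{X_i <= t}.
        return np.searchsorted(x, t, side="right") / n

    return F_hat

F_hat = ecdf([3.0, 1.0, 2.0, 2.0])
values = F_hat(np.array([0.5, 1.0, 2.0, 2.5, 3.0]))
# values is [0.0, 0.25, 0.75, 0.75, 1.0]: a jump of 1/4 at 1 and at 3,
# and a jump of 2/4 at the repeated observation 2.
```

Sorting once and using binary search keeps each evaluation at logarithmic cost in the sample size, which matters when the function is evaluated at many points.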
Mean
The mean of the empirical distribution is an unbiased estimator of the mean of the population distribution:

$$E_n(X) = \frac{1}{n} \sum_{i=1}^{n} x_i,$$

which is more commonly denoted $\bar{x}$.
Variance
The variance of the empirical distribution times $\tfrac{n}{n-1}$ is an unbiased estimator of the variance of the population distribution, for any distribution of $X$ that has a finite variance. The variance of the empirical distribution is

$$\operatorname{Var}_n(X) = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2,$$

where $\bar{x}$ is the mean of the empirical distribution.
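The relationship between the variance of the empirical distribution (divisor $n$) and the unbiased sample variance (divisor $n - 1$) can be checked on a small data set. This is a sketch with made-up numbers; the variable names are illustrative.

```python
import numpy as np

x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])  # toy sample, n = 8
n = x.size

# Mean of the empirical distribution: each observation carries weight 1/n.
mean_emp = x.sum() / n

# Variance of the empirical distribution: mean squared deviation, divisor n.
var_emp = np.mean((x - mean_emp) ** 2)

# Rescaling by n / (n - 1) gives the unbiased sample variance.
var_unbiased = var_emp * n / (n - 1)
```

Here `mean_emp` is 5.0 and `var_emp` is 4.0, and `var_unbiased` agrees with `np.var(x, ddof=1)`, NumPy's variance with divisor $n - 1$.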
Mean squared error
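The mean squared error for the empirical distribution is as follows.

$$\operatorname{MSE}(\hat{\theta}) = \operatorname{E}\left[(\hat{\theta} - \theta)^2\right] = \operatorname{Var}(\hat{\theta}) + \operatorname{Bias}(\hat{\theta}, \theta)^2,$$

where $\hat{\theta}$ is an estimator and $\theta$ an unknown parameter.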
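As a worked instance, take the estimator to be the empirical distribution function at a fixed point $t$, written $\hat{F}_n(t)$, and the unknown parameter to be the true value $F(t)$. Using the facts from the Definition section (the count $n\hat{F}_n(t)$ is binomial with variance $nF(t)(1 - F(t))$, and $\hat{F}_n(t)$ is unbiased), the decomposition into variance plus squared bias can be sketched as:

```latex
\begin{align}
\operatorname{MSE}\bigl(\hat{F}_n(t)\bigr)
  &= \operatorname{Var}\bigl(\hat{F}_n(t)\bigr)
     + \operatorname{Bias}\bigl(\hat{F}_n(t), F(t)\bigr)^2 \\
  &= \frac{n F(t)\,(1 - F(t))}{n^2} + 0 \\
  &= \frac{F(t)\,(1 - F(t))}{n} \;\le\; \frac{1}{4n}.
\end{align}
```

The final bound holds because $p(1 - p) \le 1/4$ for every $p$ in $[0, 1]$, so the pointwise mean squared error decreases at rate $1/n$ whatever the underlying distribution.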
Quantiles
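For any real number $a$, the notation $\lceil a \rceil$ (read "ceiling of a") denotes the least integer greater than or equal to $a$. For any real number $a$, the notation $\lfloor a \rfloor$ (read "floor of a") denotes the greatest integer less than or equal to $a$.

Let $x_{(1)} \le x_{(2)} \le \cdots \le x_{(n)}$ denote the sample values sorted in increasing order.

If $nq$ is not an integer, then the $q$-th quantile is unique and is equal to $x_{(\lceil nq \rceil)}$.

If $nq$ is an integer, then the $q$-th quantile is not unique and is any real number $x$ such that $x_{(nq)} \le x \le x_{(nq+1)}$.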
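The two cases of the quantile rule can be sketched in code, writing $x_{(k)}$ for the $k$-th smallest observation. The function below returns a pair `(low, high)`: the two entries coincide when the quantile is unique, and otherwise they are the order statistics $x_{(nq)}$ and $x_{(nq+1)}$ that bound the admissible values (whether the endpoints themselves count depends on the convention in use). The name `empirical_quantile` is illustrative, and $q$ is assumed to be passed as an exact fraction so that the test "is $nq$ an integer" is not affected by floating-point rounding.

```python
from fractions import Fraction
import math

def empirical_quantile(sample, q):
    """Return (low, high) bounding the q-th quantile of the sample, for 0 < q < 1.

    q should be exact, e.g. Fraction(1, 3) or the string "1/3".
    """
    q = Fraction(q)
    if not 0 < q < 1:
        raise ValueError("q must lie strictly between 0 and 1")
    x = sorted(sample)          # x[k - 1] is the k-th order statistic
    n = len(x)
    nq = n * q                  # exact rational arithmetic
    if nq.denominator != 1:
        # nq is not an integer: the quantile is the ceil(nq)-th order statistic.
        k = math.ceil(nq)
        return (x[k - 1], x[k - 1])
    # nq is an integer: the quantile is not unique.
    k = int(nq)
    return (x[k - 1], x[k])

data = [40, 10, 30, 20]
median_bounds = empirical_quantile(data, Fraction(1, 2))   # nq = 2, an integer
third_bounds = empirical_quantile(data, Fraction(1, 3))    # nq = 4/3, ceiling is 2
```

For this four-point sample, `median_bounds` is `(20, 30)` and `third_bounds` is `(20, 20)`.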