In statistics, an empirical distribution function (also commonly called an empirical cumulative distribution function, eCDF) is the distribution function associated with the empirical measure of a sample. This cumulative distribution function is a step function that jumps up by $1/n$ at each of the $n$ data points. Its value at any specified value of the measured variable is the fraction of observations of the measured variable that are less than or equal to the specified value.
The empirical distribution function is an estimate of the cumulative distribution function that generated the points in the sample. It converges with probability 1 to that underlying distribution, according to the Glivenko–Cantelli theorem. A number of results exist to quantify the rate of convergence of the empirical distribution function to the underlying cumulative distribution function.
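The convergence described above can be observed numerically. The following is a minimal sketch, assuming a sample drawn from the uniform distribution on (0, 1), whose true distribution function is F(t) = t; the helper name `sup_distance_uniform` and the seed are illustrative choices, not part of any library.

```python
import random


def sup_distance_uniform(sample):
    """Largest gap |F_hat_n(t) - t| over all t, for a Uniform(0, 1) sample."""
    xs = sorted(sample)
    n = len(xs)
    # The empirical distribution function is a step function, so the largest
    # gap occurs at a data point or just before it. At the i-th smallest
    # value x the function jumps from (i - 1) / n to i / n, while F(x) = x.
    return max(max(i / n - x, x - (i - 1) / n) for i, x in enumerate(xs, start=1))


rng = random.Random(0)  # fixed seed so the run is reproducible
distances = {
    n: sup_distance_uniform([rng.random() for _ in range(n)])
    for n in (100, 10_000)
}
for n, d in distances.items():
    print(f"n = {n:>6}: sup distance = {d:.4f}")
```

The gap for the larger sample should be markedly smaller, in line with the almost-sure convergence stated by the theorem.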
Definition
Let $(X_1, \ldots, X_n)$ be independent, identically distributed real random variables with the common cumulative distribution function $F(t)$. Then the empirical distribution function is defined as
$$\hat F_n(t) = \frac{\text{number of elements in the sample} \le t}{n} = \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}_{X_i \le t},$$
where $\mathbf{1}_A$ is the indicator of event $A$. For a fixed $t$, the indicator $\mathbf{1}_{X_i \le t}$ is a Bernoulli random variable with parameter $p = F(t)$; hence $n \hat F_n(t)$ is a binomial random variable with mean $n F(t)$ and variance $n F(t)(1 - F(t))$. This implies that $\hat F_n(t)$ is an unbiased estimator for $F(t)$.
However, in some textbooks, the definition is given as
$$\hat F_n(t) = \frac{1}{n+1} \sum_{i=1}^{n} \mathbf{1}_{X_i \le t}.$$
[Madsen, H.O., Krenk, S., Lind, S.C. (2006) ''Methods of Structural Safety''. Dover Publications. pp. 148-149.]
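The standard definition, the fraction of sample values less than or equal to t, can be sketched in a few lines. This is an illustrative implementation using only the Python standard library; the function name `ecdf` is a hypothetical choice.

```python
from bisect import bisect_right


def ecdf(sample):
    """Return a function t -> fraction of sample values less than or equal to t."""
    xs = sorted(sample)
    n = len(xs)
    if n == 0:
        raise ValueError("sample must be non-empty")

    def f_hat(t):
        # bisect_right returns the number of elements of xs that are <= t.
        return bisect_right(xs, t) / n

    return f_hat


f_hat = ecdf([3.0, 1.0, 2.0, 2.0])
print([f_hat(t) for t in (0.5, 1.0, 2.0, 2.5, 3.0)])
# prints [0.0, 0.25, 0.75, 0.75, 1.0]
```

Sorting once and using binary search keeps each evaluation at O(log n). Note the jump of 2/4 at the repeated value 2.0: tied observations stack their individual jumps of 1/n.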
Mean
The mean of the empirical distribution is an unbiased estimator of the mean of the population distribution:
$$E_n(X) = \frac{1}{n} \sum_{i=1}^{n} x_i,$$
which is more commonly denoted $\bar{x}$.
Variance
The variance of the empirical distribution times $\tfrac{n}{n-1}$ is an unbiased estimator of the variance of the population distribution, for any distribution of $X$ that has a finite variance. The variance of the empirical distribution itself is
$$\operatorname{Var}_n(X) = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2.$$
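As a small worked example, the mean and variance of the empirical distribution can be computed directly and compared with the standard library's `statistics` module, where `pvariance` divides by n and `variance` divides by n - 1. The data values below are made up for illustration.

```python
import statistics

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # illustrative sample
n = len(data)

# Mean of the empirical distribution: each point carries weight 1/n.
emp_mean = sum(data) / n

# Variance of the empirical distribution: average squared deviation (divisor n).
emp_var = sum((x - emp_mean) ** 2 for x in data) / n

# Multiplying by n / (n - 1) gives the unbiased estimator (divisor n - 1).
unbiased_var = emp_var * n / (n - 1)

print(emp_mean, emp_var, unbiased_var)
print(statistics.pvariance(data), statistics.variance(data))
```

The first two variance figures agree with `pvariance` and `variance` respectively, up to floating-point rounding.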
Mean squared error
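The mean squared error for the empirical distribution is as follows.
$$\operatorname{MSE}(\hat\theta) = \operatorname{E}\big[(\hat\theta - \theta)^2\big] = \operatorname{Var}(\hat\theta) + \operatorname{Bias}(\hat\theta, \theta)^2,$$
where $\hat\theta$ is an estimator and $\theta$ an unknown parameter. For the empirical distribution function at a fixed $t$, the estimator $\hat\theta = \hat F_n(t)$ is unbiased for $\theta = F(t)$, so its mean squared error equals its variance, $F(t)(1 - F(t))/n$.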
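The mean squared error of the empirical distribution function at a fixed point can be checked exactly, without simulation. The sketch below assumes, as in the Definition section, that n times the empirical distribution function at t is binomial with parameters n and p = F(t); it sums the squared error over that distribution and compares the result with p(1 - p)/n. The function name `ecdf_mse` and the values of n and p are illustrative.

```python
from math import comb


def ecdf_mse(n, p):
    """Exact E[(K/n - p)^2] where K ~ Binomial(n, p), i.e. K/n is F_hat_n(t)."""
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k) * (k / n - p) ** 2
        for k in range(n + 1)
    )


n, p = 20, 0.3
mse = ecdf_mse(n, p)
closed_form = p * (1 - p) / n  # variance of an unbiased estimator equals its MSE

print(mse, closed_form)
```

The two numbers agree up to floating-point rounding, since the bias term in the decomposition is zero.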
Quantiles
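For any real number $a$, the notation $\lceil a \rceil$ (read "ceiling of a") denotes the least integer greater than or equal to $a$, and the notation $\lfloor a \rfloor$ (read "floor of a") denotes the greatest integer less than or equal to $a$. Let $x_{(1)} \le \dots \le x_{(n)}$ denote the sample values sorted in increasing order.
If $nq$ is not an integer, then the $q$-th quantile is unique and is equal to $x_{(\lceil nq \rceil)}$.
If $nq$ is an integer, then the $q$-th quantile is not unique and is any real number $x$ such that
$$x_{(nq)} < x < x_{(nq+1)}.$$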
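The two cases can be sketched as follows, assuming the usual reading in which the unique quantile is the ⌈nq⌉-th smallest sample value and, when nq is an integer, the admissible values lie between the nq-th and (nq+1)-th smallest values. The function name `empirical_quantile` is hypothetical; `Fraction` is used so that the integer test on nq is not affected by floating-point rounding.

```python
from fractions import Fraction
from math import ceil


def empirical_quantile(sample, q):
    """Return the q-th quantile of the empirical distribution, for 0 < q < 1.

    If n*q is not an integer, return the single value x_(ceil(n*q)).
    If n*q is an integer, return the pair (x_(nq), x_(nq+1)) that bounds
    the set of admissible quantile values.
    """
    xs = sorted(sample)
    n = len(xs)
    q = Fraction(q)
    if not 0 < q < 1:
        raise ValueError("q must lie strictly between 0 and 1")
    nq = n * q
    if nq.denominator != 1:
        return xs[ceil(nq) - 1]  # order statistics are 1-based, lists 0-based
    k = int(nq)
    return (xs[k - 1], xs[k])


sample = [50, 10, 40, 20, 30]
print(empirical_quantile(sample, Fraction(1, 2)))  # n*q = 5/2, not an integer
print(empirical_quantile(sample, Fraction(2, 5)))  # n*q = 2, an integer
```

For this sample the first call returns the third smallest value, 30, and the second returns the bounding pair (20, 30).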