In statistics, an empirical distribution function (commonly also called an empirical cumulative distribution function, eCDF) is the distribution function associated with the empirical measure of a sample. This cumulative distribution function is a step function that jumps up by \(1/n\) at each of the \(n\) data points. Its value at any specified value of the measured variable is the fraction of observations of the measured variable that are less than or equal to the specified value.
The empirical distribution function is an estimate of the cumulative distribution function that generated the points in the sample. It converges with probability 1 to that underlying distribution, according to the
Glivenko–Cantelli theorem. A number of results exist to quantify the rate of convergence of the empirical distribution function to the underlying cumulative distribution function.
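This convergence can be observed numerically. For a Uniform(0,1) sample the true CDF is \(F(x) = x\), and the sup-distance between the eCDF and \(F\) (the Kolmogorov–Smirnov statistic) is attained at the jump points, so it can be computed exactly. A minimal sketch, assuming uniform draws; the function name and seed are illustrative:

```python
import random

def ks_distance_uniform(n, seed=0):
    """Sup-norm distance between the eCDF of n Uniform(0,1) draws
    and the true CDF F(x) = x (the Kolmogorov-Smirnov statistic)."""
    rng = random.Random(seed)
    xs = sorted(rng.random() for _ in range(n))
    # The supremum is attained at a data point, just before or after
    # the 1/n jump, so checking both sides at each point suffices.
    return max(max(i / n - x, x - (i - 1) / n) for i, x in enumerate(xs, 1))

for n in (100, 10_000, 1_000_000):
    print(n, ks_distance_uniform(n))
```

The printed distances shrink as \(n\) grows, illustrating the almost-sure uniform convergence that the Glivenko–Cantelli theorem guarantees.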
Definition
Let \((X_1, \ldots, X_n)\) be independent, identically distributed real random variables with the common cumulative distribution function \(F(t)\). Then the empirical distribution function is defined as

\[
\widehat{F}_n(t) = \frac{\text{number of elements in the sample} \le t}{n} = \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}_{X_i \le t},
\]

where \(\mathbf{1}_{A}\) is the indicator of event \(A\). For a fixed \(t\), the indicator \(\mathbf{1}_{X_i \le t}\) is a Bernoulli random variable with parameter \(p = F(t)\); hence \(n \widehat{F}_n(t)\) is a binomial random variable with mean \(n F(t)\) and variance \(n F(t)(1 - F(t))\). This implies that \(\widehat{F}_n(t)\) is an unbiased estimator for \(F(t)\).
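A direct way to see the definition at work is to evaluate \(\widehat{F}_n\) on a small sample. In this sketch (the helper name `ecdf` is ours, not a standard API) a binary search counts how many sorted observations fall at or below \(t\):

```python
from bisect import bisect_right

def ecdf(sample):
    """Return the empirical distribution function of `sample`.

    The returned function F_n(t) is the fraction of observations
    less than or equal to t -- a step function that jumps by 1/n
    at each (sorted) data point.
    """
    xs = sorted(sample)
    n = len(xs)
    def F_n(t):
        # bisect_right counts how many sorted values are <= t
        return bisect_right(xs, t) / n
    return F_n

F = ecdf([3.1, 1.2, 2.7, 5.0])
print(F(0))    # 0.0  (below all data)
print(F(2.7))  # 0.5  (two of the four points are <= 2.7)
print(F(9))    # 1.0  (above all data)
```

Sorting once up front makes each evaluation \(O(\log n)\) rather than the \(O(n)\) of a naive count.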
However, in some textbooks, the definition is given as

\[
\widehat{F}_n(t) = \frac{1}{n+1} \sum_{i=1}^{n} \mathbf{1}_{X_i \le t}.
\]

[Madsen, H.O., Krenk, S., Lind, S.C. (2006) ''Methods of Structural Safety''. Dover Publications. pp. 148–149.]
Mean
The mean of the empirical distribution is an unbiased estimator of the mean of the population distribution:

\[
\operatorname{E}_n(X) = \frac{1}{n} \sum_{i=1}^{n} x_i,
\]

which is more commonly denoted \(\bar{x}\).
Variance
The variance of the empirical distribution times \(\tfrac{n}{n-1}\) is an unbiased estimator of the variance of the population distribution, for any distribution of \(X\) that has a finite variance.
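As a quick check of the \(\tfrac{n}{n-1}\) correction, Python's `statistics` module exposes both conventions: `pvariance` divides by \(n\) (the variance of the empirical distribution), while `variance` divides by \(n-1\):

```python
import statistics

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
n = len(data)

# Variance of the empirical distribution (divides by n).
pop_var = statistics.pvariance(data)
# Multiplying by n/(n-1) recovers the unbiased sample variance.
unbiased = pop_var * n / (n - 1)

print(pop_var)                    # 4.0
print(unbiased)                   # matches statistics.variance(data)
print(statistics.variance(data))
```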
Mean squared error
The mean squared error for the empirical distribution is as follows:

\[
\operatorname{MSE}(\hat{\theta}) = \operatorname{E}\!\left[(\hat{\theta} - \theta)^2\right] = \operatorname{Var}_{\theta}(\hat{\theta}) + \operatorname{Bias}(\hat{\theta}, \theta)^2,
\]

where \(\hat{\theta}\) is an estimator and \(\theta\) an unknown parameter.
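The decomposition of MSE into variance plus squared bias holds exactly for any finite set of repeated estimates, which makes it easy to verify numerically (the estimate values below are made up for illustration):

```python
import statistics

# Repeated estimates (hypothetical) of an unknown parameter theta.
theta = 10.0
estimates = [9.5, 10.2, 10.4, 9.9, 10.5]

mse = statistics.fmean((e - theta) ** 2 for e in estimates)
var = statistics.pvariance(estimates)
bias = statistics.fmean(estimates) - theta

# MSE decomposes exactly into variance plus squared bias.
print(abs(mse - (var + bias ** 2)) < 1e-12)  # True
```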
Quantiles
For any real number \(a\), the notation \(\lceil a \rceil\) (read "ceiling of \(a\)") denotes the least integer greater than or equal to \(a\), and the notation \(\lfloor a \rfloor\) (read "floor of \(a\)") denotes the greatest integer less than or equal to \(a\).

If \(nq\) is not an integer, then the \(q\)-th quantile is unique and is equal to \(x_{(\lceil nq \rceil)}\).

If \(nq\) is an integer, then the \(q\)-th quantile is not unique and is any real number \(x\) such that

\[
x_{(nq)} \le x \le x_{(nq+1)}.
\]
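The two cases can be sketched directly from order statistics; the function below is our own illustration, and in the non-unique case it simply returns the lower endpoint of the admissible interval:

```python
import math

def empirical_quantile(sample, q):
    """q-th quantile of the empirical distribution (0 < q < 1).

    If n*q is not an integer, the quantile is the ceil(n*q)-th order
    statistic.  If n*q is an integer, any value between the (n*q)-th
    and (n*q + 1)-th order statistics qualifies; here we return the
    lower endpoint of that interval.
    """
    xs = sorted(sample)
    n = len(xs)
    k = n * q
    if k != int(k):
        return xs[math.ceil(k) - 1]   # unique quantile
    return xs[int(k) - 1]             # lower endpoint of the interval

data = [7, 1, 5, 3]                    # sorted: 1, 3, 5, 7
print(empirical_quantile(data, 0.3))   # n*q = 1.2 -> 2nd order statistic: 3
print(empirical_quantile(data, 0.5))   # n*q = 2, not unique; returns 3
```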