
In probability theory and statistics, the generalized extreme value (GEV) distribution is a family of continuous probability distributions developed within extreme value theory to combine the Gumbel, Fréchet and Weibull families, also known as type I, II and III extreme value distributions. By the extreme value theorem (the Fisher–Tippett–Gnedenko theorem), the GEV distribution is the only possible limit distribution of properly normalized maxima of a sequence of independent and identically distributed random variables. Note that a limit distribution needs to exist, which requires regularity conditions on the tail of the distribution. Despite this, the GEV distribution is often used as an approximation to model the maxima of long (finite) sequences of random variables. In some fields of application the generalized extreme value distribution is known as the Fisher–Tippett distribution, named after Ronald Fisher and L. H. C. Tippett, who recognised the three different forms outlined below. However, usage of this name is sometimes restricted to mean the special case of the Gumbel distribution. The origin of the common functional form for all three distributions dates back to at least Jenkinson, A. F. (1955), though allegedly it could also have been given by von Mises, R. (1936).


Specification

Using the standardized variable s = (x - \mu)/\sigma, where \mu, the location parameter, can be any real number, and \sigma > 0 is the scale parameter, the cumulative distribution function of the GEV distribution is

:F(s; \xi) = \begin{cases} \exp\bigl(-\exp(-s)\bigr) & \text{if } \xi = 0, \\ \exp\bigl(-(1+\xi s)^{-1/\xi}\bigr) & \text{if } \xi \neq 0 \text{ and } \xi s > -1, \\ 0 & \text{if } \xi > 0 \text{ and } \xi s \le -1, \\ 1 & \text{if } \xi < 0 \text{ and } \xi s \le -1, \end{cases}

where \xi, the shape parameter, can be any real number. Thus, for \xi > 0 the expression is valid for s > -1/\xi, while for \xi < 0 it is valid for s < -1/\xi. In the first case, -1/\xi is the negative, lower end-point, where F is 0; in the second case, -1/\xi is the positive, upper end-point, where F is 1. For \xi = 0 the second expression is formally undefined and is replaced with the first expression, which is the result of taking the limit of the second as \xi \to 0, in which case s can be any real number. In the special case x = \mu we have s = 0 and F(0; \xi) = \exp(-1) \approx 0.368, whatever values \xi and \sigma might have.

The probability density function of the standardized distribution is

:f(s; \xi) = \begin{cases} \exp(-s)\exp\bigl(-\exp(-s)\bigr) & \text{if } \xi = 0, \\ (1+\xi s)^{-(1+1/\xi)}\exp\bigl(-(1+\xi s)^{-1/\xi}\bigr) & \text{if } \xi \neq 0 \text{ and } \xi s > -1, \\ 0 & \text{otherwise,} \end{cases}

again valid for s > -1/\xi in the case \xi > 0, and for s < -1/\xi in the case \xi < 0. The density is zero outside of the relevant range. In the case \xi = 0 the density is positive on the whole real line. The density of the unstandardized variable x is f(s; \xi)/\sigma.
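The piecewise definitions of the distribution function and density can be sketched in Python (standard library only). The helper names `_reduced_variate`, `gev_cdf` and `gev_pdf` are hypothetical. The sketch rewrites (1 + \xi s)^{-1/\xi} as \exp(-y) with y = \log(1+\xi s)/\xi, so that both cases share the Gumbel form \exp(-\exp(-y)); the overflow guard at y < -700 is an implementation choice, not part of the definition.

```python
import math

def _reduced_variate(s, xi):
    """Return y with F(s; xi) = exp(-exp(-y)), or None outside the support.

    For xi != 0, (1 + xi*s)**(-1/xi) == exp(-y) with y = log1p(xi*s) / xi.
    """
    if xi == 0.0:
        return s
    if 1.0 + xi * s <= 0.0:
        return None
    return math.log1p(xi * s) / xi


def gev_cdf(x, mu=0.0, sigma=1.0, xi=0.0):
    """Piecewise GEV distribution function F((x - mu)/sigma; xi)."""
    if sigma <= 0.0:
        raise ValueError("sigma must be positive")
    s = (x - mu) / sigma                  # standardized variable
    y = _reduced_variate(s, xi)
    if y is None:
        # Below the lower end-point (xi > 0) or above the upper end-point (xi < 0).
        return 0.0 if xi > 0.0 else 1.0
    if y < -700.0:                        # exp(-y) would overflow; F is 0 in doubles
        return 0.0
    return math.exp(-math.exp(-y))


def gev_pdf(x, mu=0.0, sigma=1.0, xi=0.0):
    """Density of x: the standardized density f(s; xi) divided by sigma."""
    if sigma <= 0.0:
        raise ValueError("sigma must be positive")
    s = (x - mu) / sigma
    y = _reduced_variate(s, xi)
    if y is None or y < -700.0:
        return 0.0
    dy_ds = 1.0 / (1.0 + xi * s)          # equals 1 when xi == 0
    # exp(-y) * dy/ds reproduces (1 + xi*s)**(-1 - 1/xi) when xi != 0
    return math.exp(-y) * math.exp(-math.exp(-y)) * dy_ds / sigma
```

For instance, `gev_cdf(0.0, xi=0.25)` evaluates to exp(-1), matching the special case x = \mu, and arguments beyond an end-point return exactly 0 or 1.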
Since the cumulative distribution function is invertible, the quantile function for the GEV distribution has an explicit expression, namely

:Q(p; \mu, \sigma, \xi) = \begin{cases} \mu - \sigma\log\bigl(-\log p\bigr) & \text{if } \xi = 0 \text{ and } p \in (0,1), \\ \mu + \dfrac{\sigma}{\xi}\Bigl(\bigl(-\log p\bigr)^{-\xi} - 1\Bigr) & \text{if } \xi > 0 \text{ and } p \in [0,1), \text{ or if } \xi < 0 \text{ and } p \in (0,1], \end{cases}

and therefore the quantile density function q \equiv dQ/dp is

:q(p; \sigma, \xi) = \frac{\sigma}{\bigl(-\log p\bigr)^{\xi+1}\, p} \quad \text{for } p \in (0,1),

valid for \sigma > 0 and for any real \xi.
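Because Q is explicit, it gives inverse-transform sampling directly: if U is uniform on (0, 1), then Q(U) follows the GEV distribution. The sketch below assumes Python's standard library; `gev_quantile`, `gev_quantile_density` and `gev_sample` are hypothetical names. The end-point cases p = 0 (for \xi > 0) and p = 1 (for \xi < 0) return the finite end-point \mu - \sigma/\xi.

```python
import math
import random

def gev_quantile(p, mu=0.0, sigma=1.0, xi=0.0):
    """Explicit GEV quantile function Q(p; mu, sigma, xi)."""
    if sigma <= 0.0:
        raise ValueError("sigma must be positive")
    if xi == 0.0:
        if not 0.0 < p < 1.0:
            raise ValueError("p must lie in (0, 1) when xi == 0")
        return mu - sigma * math.log(-math.log(p))
    if xi > 0.0 and not 0.0 <= p < 1.0:
        raise ValueError("p must lie in [0, 1) when xi > 0")
    if xi < 0.0 and not 0.0 < p <= 1.0:
        raise ValueError("p must lie in (0, 1] when xi < 0")
    if p == 0.0 or p == 1.0:
        # Finite end-point mu - sigma/xi (lower if xi > 0, upper if xi < 0).
        return mu - sigma / xi
    return mu + sigma / xi * ((-math.log(p)) ** (-xi) - 1.0)


def gev_quantile_density(p, sigma=1.0, xi=0.0):
    """q(p) = dQ/dp = sigma / ((-log p)**(xi + 1) * p) for p in (0, 1)."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie in (0, 1)")
    return sigma / ((-math.log(p)) ** (xi + 1.0) * p)


def gev_sample(n, mu=0.0, sigma=1.0, xi=0.0, seed=None):
    """Inverse-transform sampling: X = Q(U) with U uniform on (0, 1)."""
    rng = random.Random(seed)
    draws = []
    while len(draws) < n:
        u = rng.random()                  # uniform on [0, 1)
        if u > 0.0:                       # drop u == 0, where Q is -inf for xi <= 0
            draws.append(gev_quantile(u, mu, sigma, xi))
    return draws
```

Note that Q(e^{-1}) = \mu for every \xi, the quantile counterpart of F(0; \xi) = e^{-1}.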


Summary statistics

Some simple statistics of the distribution are:

:\operatorname{E}(X) = \mu + \frac{\sigma}{\xi}\left(g_1 - 1\right) \quad \text{for } \xi < 1,
:\operatorname{Var}(X) = \frac{\sigma^2}{\xi^2}\left(g_2 - g_1^2\right) \quad \text{for } \xi < 1/2,
:\operatorname{Mode}(X) = \mu + \frac{\sigma}{\xi}\left((1+\xi)^{-\xi} - 1\right).

These expressions are for \xi \neq 0; in the limit \xi \to 0 they become \mu + \gamma\sigma, \pi^2\sigma^2/6 and \mu, where \gamma is the Euler–Mascheroni constant.

The skewness is, for \xi > 0,

:\operatorname{Skew}(X) = \frac{g_3 - 3g_2 g_1 + 2g_1^3}{\left(g_2 - g_1^2\right)^{3/2}}.

For \xi < 0, the sign of the numerator is reversed. The skewness is finite only for \xi < 1/3.

The excess kurtosis, finite for \xi < 1/4, is

:\operatorname{Kurt}(X) = \frac{g_4 - 4g_3 g_1 + 6g_2 g_1^2 - 3g_1^4}{\left(g_2 - g_1^2\right)^2} - 3,

where g_k = \Gamma(1 - k\xi), k = 1, 2, 3, 4, and \Gamma(t) is the gamma function.
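These moment formulas translate directly into code through `math.gamma`. In the sketch below (standard library only; all function names are hypothetical), the \xi = 0 branches use the standard Gumbel values \mu + \gamma\sigma, \pi^2\sigma^2/6, skewness 12\sqrt{6}\,\zeta(3)/\pi^3 and excess kurtosis 12/5, which are the \xi \to 0 limits of the general formulas. The mode function is restricted to \xi > -1, an assumption of this sketch, since that is where the density has an interior maximum.

```python
import math

EULER_GAMMA = 0.5772156649015329   # Euler–Mascheroni constant
ZETA3 = 1.2020569031595942         # Apery's constant zeta(3), for the Gumbel skewness


def _g(k, xi):
    """g_k = Gamma(1 - k*xi)."""
    return math.gamma(1.0 - k * xi)


def gev_mean(mu, sigma, xi):
    if xi == 0.0:
        return mu + sigma * EULER_GAMMA          # Gumbel limit
    if xi >= 1.0:
        return math.inf                          # the mean is infinite for xi >= 1
    return mu + sigma * (_g(1, xi) - 1.0) / xi


def gev_variance(sigma, xi):
    if xi == 0.0:
        return (sigma * math.pi) ** 2 / 6.0      # Gumbel limit
    if xi >= 0.5:
        return math.inf                          # not finite for xi >= 1/2
    g1, g2 = _g(1, xi), _g(2, xi)
    return sigma ** 2 * (g2 - g1 ** 2) / xi ** 2


def gev_mode(mu, sigma, xi):
    if xi <= -1.0:
        raise ValueError("this sketch only covers xi > -1")
    if xi == 0.0:
        return mu
    return mu + sigma * ((1.0 + xi) ** (-xi) - 1.0) / xi


def gev_skewness(xi):
    if xi == 0.0:
        return 12.0 * math.sqrt(6.0) * ZETA3 / math.pi ** 3   # Gumbel value
    if xi >= 1.0 / 3.0:
        raise ValueError("skewness requires xi < 1/3")
    g1, g2, g3 = _g(1, xi), _g(2, xi), _g(3, xi)
    num = g3 - 3.0 * g2 * g1 + 2.0 * g1 ** 3
    # The sign of the numerator is reversed for xi < 0.
    return math.copysign(1.0, xi) * num / (g2 - g1 ** 2) ** 1.5


def gev_excess_kurtosis(xi):
    if xi == 0.0:
        return 12.0 / 5.0                                      # Gumbel value
    if xi >= 0.25:
        raise ValueError("kurtosis requires xi < 1/4")
    g1, g2, g3, g4 = (_g(k, xi) for k in (1, 2, 3, 4))
    num = g4 - 4.0 * g3 * g1 + 6.0 * g2 * g1 ** 2 - 3.0 * g1 ** 4
    return num / (g2 - g1 ** 2) ** 2 - 3.0
```

A convenient cross-check is to integrate moments of the quantile function over (0, 1) numerically, since E[h(X)] = \int_0^1 h(Q(p))\,dp.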


Link to Fréchet, Weibull and Gumbel families

The shape parameter \xi governs the tail behavior of the distribution. The sub-families defined by \xi = 0, \xi > 0 and \xi < 0 correspond, respectively, to the Gumbel, Fréchet and Weibull families, whose cumulative distribution functions are displayed below.
* Gumbel or type I extreme value distribution (\xi = 0):
:F(x; \mu, \sigma, 0) = e^{-e^{-(x-\mu)/\sigma}} \quad \text{for } x \in \mathbb{R}.
* Fréchet or type II extreme value distribution, if \xi = \alpha^{-1} > 0 and y = 1 + \xi(x-\mu)/\sigma:
:F(x; \mu, \sigma, \xi) = \begin{cases} e^{-y^{-\alpha}} & y > 0, \\ 0 & y \le 0. \end{cases}
* Reversed Weibull or type III extreme value distribution, if \xi = -\alpha^{-1} < 0 and y = -\left(1 + \xi(x-\mu)/\sigma\right):
:F(x; \mu, \sigma, \xi) = \begin{cases} e^{-(-y)^{\alpha}} & y < 0, \\ 1 & y \ge 0. \end{cases}
The subsections below remark on properties of these distributions.
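As a numerical sanity check, the three family-specific distribution functions can be compared with the unified GEV form on a grid of points. This is a sketch assuming Python's standard library; the function names and the helper `_exp_neg_exp` (which avoids overflow far in the tail) are hypothetical.

```python
import math

def _exp_neg_exp(z):
    """exp(-exp(z)), returning 0.0 instead of overflowing for large z."""
    return 0.0 if z > 700.0 else math.exp(-math.exp(z))

def gev_cdf(x, mu, sigma, xi):
    """Unified GEV distribution function."""
    s = (x - mu) / sigma
    if xi == 0.0:
        return _exp_neg_exp(-s)
    t = 1.0 + xi * s
    if t <= 0.0:
        return 0.0 if xi > 0.0 else 1.0
    return _exp_neg_exp(-math.log(t) / xi)        # (1 + xi*s)**(-1/xi) = exp(-log(t)/xi)

def gumbel_cdf(x, mu, sigma):
    """Type I: exp(-exp(-(x - mu)/sigma)) on the whole real line."""
    return _exp_neg_exp(-(x - mu) / sigma)

def frechet_cdf(x, mu, sigma, alpha):
    """Type II with xi = 1/alpha > 0 and y = 1 + xi*(x - mu)/sigma."""
    y = 1.0 + (x - mu) / (alpha * sigma)
    return _exp_neg_exp(-alpha * math.log(y)) if y > 0.0 else 0.0   # exp(-y**(-alpha))

def reversed_weibull_cdf(x, mu, sigma, alpha):
    """Type III with xi = -1/alpha < 0 and y = -(1 + xi*(x - mu)/sigma)."""
    y = -(1.0 - (x - mu) / (alpha * sigma))
    return _exp_neg_exp(alpha * math.log(-y)) if y < 0.0 else 1.0   # exp(-(-y)**alpha)
```

Checking equality on a grid that crosses both end-points confirms that the sub-family formulas are the GEV formula with \xi = \pm 1/\alpha substituted.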


Modification for minima rather than maxima

The theory here relates to data maxima, and the distribution being discussed is an extreme value distribution for maxima. A generalized extreme value distribution for data minima can be obtained, for example, by substituting −x for x in the distribution function and subtracting from one, giving 1 - F(-x; \mu, \sigma, \xi); this yields a separate family of distributions.
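The substitution rests on \min_i X_i = -\max_i(-X_i): if Y follows the GEV law for maxima, then -Y has distribution function P(-Y \le x) = 1 - F(-x). The sketch below (standard library only; `gev_cdf`, `gev_quantile` and `gev_min_cdf` are hypothetical names) compares that formula with the empirical distribution of simulated values of -Y.

```python
import math
import random

def gev_cdf(x, mu=0.0, sigma=1.0, xi=0.0):
    s = (x - mu) / sigma
    if xi == 0.0:
        return math.exp(-math.exp(-s)) if s > -700.0 else 0.0
    t = 1.0 + xi * s
    if t <= 0.0:
        return 0.0 if xi > 0.0 else 1.0
    z = -math.log(t) / xi                 # (1 + xi*s)**(-1/xi) = exp(z)
    return math.exp(-math.exp(z)) if z < 700.0 else 0.0

def gev_quantile(p, mu=0.0, sigma=1.0, xi=0.0):
    """Quantile function for p strictly inside (0, 1)."""
    if xi == 0.0:
        return mu - sigma * math.log(-math.log(p))
    return mu + sigma / xi * ((-math.log(p)) ** (-xi) - 1.0)

def gev_min_cdf(x, mu=0.0, sigma=1.0, xi=0.0):
    """Distribution for minima: substitute -x for x and subtract from one."""
    return 1.0 - gev_cdf(-x, mu, sigma, xi)

rng = random.Random(3)
mu, sigma, xi = 1.0, 2.0, 0.2
# Negated GEV draws; u == 0 is skipped because Q(0) is not finite for xi = 0.
neg = [-gev_quantile(u, mu, sigma, xi) for u in (rng.random() for _ in range(20000)) if u > 0.0]
```

With \xi > 0 the maxima family has a lower end-point, so the minima family obtained this way has an upper end-point instead.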


Alternative convention for the Weibull distribution

The ordinary Weibull distribution arises in reliability applications and is obtained from the distribution here by using the variable t = \mu - x, with \mu taken as the upper end-point, which gives a strictly positive support, in contrast to the use in extreme value theory here. This arises because the ordinary Weibull distribution is used in cases that deal with data minima rather than data maxima. The distribution here has an additional parameter compared to the usual form of the Weibull distribution and, in addition, is reversed so that the distribution has an upper bound rather than a lower bound. Importantly, in applications of the GEV, the upper bound is unknown and so must be estimated, while when applying the ordinary Weibull distribution in reliability applications the lower bound is usually known to be zero.


Ranges of the distributions

Note the differences in the ranges of interest for the three extreme value distributions: Gumbel is unlimited, Fréchet has a lower limit, while the reversed Weibull has an upper limit. More precisely, univariate extreme value theory describes which of the three is the limiting law according to the initial distribution of X, and in particular depending on its tail.
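In the (\mu, \sigma, \xi) parametrization the limits follow from the condition 1 + \xi(x-\mu)/\sigma > 0, so the single finite end-point is \mu - \sigma/\xi. A minimal sketch, assuming Python's standard library; `gev_support` is a hypothetical name.

```python
import math

def gev_support(mu, sigma, xi):
    """Return (lower, upper) end-points of the GEV support."""
    if sigma <= 0.0:
        raise ValueError("sigma must be positive")
    if xi == 0.0:                          # Gumbel: unlimited
        return (-math.inf, math.inf)
    endpoint = mu - sigma / xi
    if xi > 0.0:                           # Fréchet: lower limit
        return (endpoint, math.inf)
    return (-math.inf, endpoint)           # reversed Weibull: upper limit
```

For example, \mu = 1, \sigma = 2 and \xi = 0.5 put the lower limit at 1 - 2/0.5 = -3.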


Distribution of log variables

One can link the type I to types II and III in the following way: if the cumulative distribution function of some random variable X is of type II with the positive numbers as support, i.e. F(x; 0, \sigma, \alpha) = \exp\bigl(-(x/\sigma)^{-\alpha}\bigr) for x > 0, then the cumulative distribution function of \ln X is of type I, namely F(x; \ln \sigma, 1/\alpha, 0). Similarly, if the cumulative distribution function of X is of type III with the negative numbers as support, i.e. F(x; 0, \sigma, -\alpha) = \exp\bigl(-(-x/\sigma)^{\alpha}\bigr) for x < 0, then the cumulative distribution function of -\ln(-X) is of type I, namely F(x; -\ln \sigma, 1/\alpha, 0).
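Both links can be checked exactly, because P(\ln X \le t) = F_X(e^t) and P(-\ln(-X) \le t) = F_X(-e^{-t}). The sketch below (standard library only; function names hypothetical) writes the type II and type III laws with scale \sigma and shape \alpha, as in the explicit formulas \exp(-(x/\sigma)^{-\alpha}) and \exp(-(-x/\sigma)^{\alpha}); the type III case is checked for -\ln(-X), the variable whose distribution function is F(x; -\ln\sigma, 1/\alpha, 0).

```python
import math

def gumbel_cdf(x, loc, scale):
    return math.exp(-math.exp(-(x - loc) / scale))

def frechet_cdf(x, scale, alpha):
    """Type II with lower end-point 0: exp(-(x/scale)**(-alpha)) for x > 0."""
    return math.exp(-(x / scale) ** (-alpha)) if x > 0.0 else 0.0

def reversed_weibull_cdf(x, scale, alpha):
    """Type III with upper end-point 0: exp(-(-x/scale)**alpha) for x < 0."""
    return math.exp(-(-x / scale) ** alpha) if x < 0.0 else 1.0

def cdf_of_log_frechet(t, scale, alpha):
    # P(ln X <= t) = P(X <= e^t)
    return frechet_cdf(math.exp(t), scale, alpha)

def cdf_of_neg_log_neg_weibull(t, scale, alpha):
    # P(-ln(-X) <= t) = P(X <= -e^(-t))
    return reversed_weibull_cdf(-math.exp(-t), scale, alpha)
```

Both transformed distribution functions agree with a Gumbel law of scale 1/\alpha, located at \ln\sigma and -\ln\sigma respectively.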


Link to logit models (logistic regression)

Multinomial logit models, and certain other types of logistic regression, can be phrased as latent variable models with error variables distributed as Gumbel distributions (type I generalized extreme value distributions). This phrasing is common in the theory of discrete choice models, which include logit models, probit models, and various extensions of them. It derives from the fact that the difference of two independent type-I GEV-distributed variables with the same scale parameter follows a logistic distribution, of which the logit function is the quantile function. The type-I GEV distribution thus plays the same role in these logit models as the normal distribution does in the corresponding probit models.
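The latent-variable reading can be simulated: give each alternative j a utility v_j + \varepsilon_j with independent standard Gumbel errors, choose the largest, and the choice shares approach the multinomial logit probabilities \exp(v_j)/\sum_k \exp(v_k). This is a Monte Carlo sketch using Python's standard library; the function names and the utilities in the example are hypothetical.

```python
import math
import random

def standard_gumbel(rng):
    """One standard Gumbel draw by inverse transform: -log(-log U)."""
    u = rng.random()
    while u == 0.0:                       # log(0) is undefined; redraw
        u = rng.random()
    return -math.log(-math.log(u))

def simulated_choice_shares(v, n=60_000, seed=0):
    """Share of draws in which each alternative has the largest latent utility."""
    rng = random.Random(seed)
    counts = [0] * len(v)
    for _ in range(n):
        utilities = [vj + standard_gumbel(rng) for vj in v]
        counts[max(range(len(v)), key=utilities.__getitem__)] += 1
    return [c / n for c in counts]

def logit_shares(v):
    """Multinomial logit probabilities exp(v_j) / sum_k exp(v_k)."""
    m = max(v)                            # subtract the max for numerical stability
    w = [math.exp(vj - m) for vj in v]
    total = sum(w)
    return [wj / total for wj in w]
```

With two alternatives, `logit_shares([v1, v2])[0]` reduces to the logistic function of v_1 - v_2, which is the binary logit model.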


Properties

The cumulative distribution function of the generalized extreme value distribution solves the stability postulate equation: for every n there are constants a_n > 0 and b_n such that F(a_n x + b_n)^n = F(x). The generalized extreme value distribution is a special case of a max-stable distribution, and is a transformation of a min-stable distribution.
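For the GEV family the stability can be made explicit: raising F(x; \mu, \sigma, \xi) to the power n gives F(x; \mu_n, \sigma_n, \xi) with \sigma_n = \sigma n^{\xi} and \mu_n = \mu + \sigma(n^{\xi} - 1)/\xi (or \mu_n = \mu + \sigma\ln n when \xi = 0). These closed forms are derived here from the distribution function rather than quoted from the text, so the sketch (standard library only; `gev_cdf` and `max_params` are hypothetical names) checks them numerically.

```python
import math

def gev_cdf(x, mu, sigma, xi):
    s = (x - mu) / sigma
    if xi == 0.0:
        return math.exp(-math.exp(-s)) if s > -700.0 else 0.0
    t = 1.0 + xi * s
    if t <= 0.0:
        return 0.0 if xi > 0.0 else 1.0
    z = -math.log(t) / xi                 # (1 + xi*s)**(-1/xi) = exp(z)
    return math.exp(-math.exp(z)) if z < 700.0 else 0.0

def max_params(mu, sigma, xi, n):
    """GEV parameters of the maximum of n i.i.d. GEV(mu, sigma, xi) variables."""
    if xi == 0.0:
        return mu + sigma * math.log(n), sigma, 0.0
    return mu + sigma * (n ** xi - 1.0) / xi, sigma * n ** xi, xi
```

The shape \xi is unchanged, and for \xi \neq 0 the finite end-point \mu_n - \sigma_n/\xi = \mu - \sigma/\xi is the same as before, as max-stability requires.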


Applications

* The GEV distribution is widely used in the treatment of "tail risks" in fields ranging from insurance to finance. In the latter case, it has been considered as a means of assessing various financial risks via metrics such as value at risk.
* However, the resulting shape parameters have been found to lie in the range leading to undefined means and variances, which underlines the fact that reliable data analysis is often impossible. (Kjersti Aas, lecture, NTNU, Trondheim, 23 Jan 2008)
* In hydrology the GEV distribution is applied to extreme events such as annual maximum one-day rainfalls and river discharges. The blue picture, made with CumFreq, illustrates an example of fitting the GEV distribution to ranked annual maximum one-day rainfalls, also showing the 90% confidence belt based on the binomial distribution. The rainfall data are represented by plotting positions as part of the cumulative frequency analysis.
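Once a GEV has been fitted to annual maxima, a common hydrological summary (a standard convention, not taken from the text above) is the T-year return level: the value exceeded by an annual maximum with probability 1/T, i.e. the quantile Q(1 - 1/T). The sketch assumes Python's standard library; `return_level` is a hypothetical name and the parameters in the check are made up, not fitted to any data.

```python
import math

def gev_quantile(p, mu, sigma, xi):
    """GEV quantile function for p strictly inside (0, 1)."""
    if xi == 0.0:
        return mu - sigma * math.log(-math.log(p))
    return mu + sigma / xi * ((-math.log(p)) ** (-xi) - 1.0)

def gev_cdf(x, mu, sigma, xi):
    s = (x - mu) / sigma
    if xi == 0.0:
        return math.exp(-math.exp(-s)) if s > -700.0 else 0.0
    t = 1.0 + xi * s
    if t <= 0.0:
        return 0.0 if xi > 0.0 else 1.0
    z = -math.log(t) / xi
    return math.exp(-math.exp(z)) if z < 700.0 else 0.0

def return_level(T, mu, sigma, xi):
    """Level exceeded by an annual maximum with probability 1/T, i.e. Q(1 - 1/T)."""
    if T <= 1.0:
        raise ValueError("return period T must exceed one year")
    return gev_quantile(1.0 - 1.0 / T, mu, sigma, xi)
```

For instance, with hypothetical parameters \mu = 40 mm, \sigma = 10 mm and \xi = 0.1, `return_level(100, 40.0, 10.0, 0.1)` gives the level whose annual exceedance probability is 1%.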


Example for normally distributed variables

Let X_1, \dots, X_n be i.i.d. normally distributed random variables with mean 0 and variance 1. The Fisher–Tippett–Gnedenko theorem tells us that, for large n, approximately \max_{1 \le i \le n} X_i \sim \textrm{GEV}(\mu_n, \sigma_n, 0), where

:\begin{align} \mu_n &= \Phi^{-1}\left(1 - \frac{1}{n}\right), \\ \sigma_n &= \Phi^{-1}\left(1 - \frac{1}{n} \cdot \mathrm{e}^{-1}\right) - \Phi^{-1}\left(1 - \frac{1}{n}\right), \end{align}

and \Phi^{-1} is the standard normal quantile function. This allows us to estimate, e.g., the mean of \max_{1 \le i \le n} X_i from the mean of the GEV distribution:

:\begin{align} \operatorname{E}\left[\max_{1 \le i \le n} X_i\right] &\approx \mu_n + \gamma\sigma_n \\ &= (1-\gamma)\,\Phi^{-1}(1 - 1/n) + \gamma\,\Phi^{-1}\bigl(1 - 1/(\mathrm{e}n)\bigr) \\ &= \sqrt{\log\left(\frac{n^2}{2\pi\log\left(\frac{n^2}{2\pi}\right)}\right)} \cdot \left(1 + \frac{\gamma}{2\log n} + o\left(\frac{1}{\log n}\right)\right), \end{align}

where \gamma is the Euler–Mascheroni constant.
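The approximation \mu_n + \gamma\sigma_n can be compared against a direct simulation of the maximum of n standard normal draws. This sketch assumes Python 3.8 or later for `statistics.NormalDist`, whose `inv_cdf` method plays the role of \Phi^{-1}; the function names, the sample size n = 1000 and the number of replications are illustrative choices.

```python
import math
import random
from statistics import NormalDist

EULER_GAMMA = 0.5772156649015329

def gev_params_for_normal_max(n):
    """(mu_n, sigma_n) of the Gumbel-type GEV approximating the max of n N(0, 1) draws."""
    inv = NormalDist().inv_cdf
    mu_n = inv(1.0 - 1.0 / n)
    sigma_n = inv(1.0 - 1.0 / (math.e * n)) - mu_n
    return mu_n, sigma_n

def approx_mean_of_max(n):
    """Mean of the approximating GEV(mu_n, sigma_n, 0): mu_n + gamma * sigma_n."""
    mu_n, sigma_n = gev_params_for_normal_max(n)
    return mu_n + EULER_GAMMA * sigma_n

def simulated_mean_of_max(n, reps=400, seed=1):
    """Monte Carlo estimate of E[max of n standard normal draws]."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        total += max(rng.gauss(0.0, 1.0) for _ in range(n))
    return total / reps
```

The two estimates should differ only by simulation noise plus the error of the asymptotic GEV approximation, which shrinks slowly as n grows.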


Related distributions

# If X \sim \textrm{GEV}(\mu, \sigma, \xi) then mX + b \sim \textrm{GEV}(m\mu + b, m\sigma, \xi) for any m > 0 and real b.
# If X \sim \textrm{Gumbel}(\mu, \sigma) (Gumbel distribution) then X \sim \textrm{GEV}(\mu, \sigma, 0).
# If X \sim \textrm{Weibull}(\sigma, \mu) (Weibull distribution with scale \sigma and shape \mu) then \mu\left(1 - \sigma\log\frac{X}{\sigma}\right) \sim \textrm{GEV}(\mu, \sigma, 0).
# If X \sim \textrm{GEV}(\mu, \sigma, 0) then \sigma\exp\left(-\tfrac{X-\mu}{\mu\sigma}\right) \sim \textrm{Weibull}(\sigma, \mu) (Weibull distribution).
# If X \sim \textrm{Exponential}(1) (exponential distribution) then \mu - \sigma\log X \sim \textrm{GEV}(\mu, \sigma, 0).
# If X \sim \textrm{Gumbel}(\alpha_X, \beta) and Y \sim \textrm{Gumbel}(\alpha_Y, \beta) are independent, then X - Y \sim \textrm{Logistic}(\alpha_X - \alpha_Y, \beta) (see logistic distribution).
# If X and Y \sim \textrm{Gumbel}(\alpha, \beta) are independent, then X + Y \nsim \textrm{Logistic}(2\alpha, \beta) (the sum is not a logistic distribution). Note that \operatorname{E}(X+Y) = 2\alpha + 2\beta\gamma \neq 2\alpha = \operatorname{E}\left(\textrm{Logistic}(2\alpha, \beta)\right).
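Several items of this list can be spot-checked by Monte Carlo: transform simulated draws and compare their empirical distribution function with the claimed target on a grid. The sketch assumes Python's standard library; it reads Weibull(\sigma, \mu) as scale \sigma and shape \mu, matching `random.weibullvariate(alpha, beta)`, whose arguments are scale and shape. The helper names, parameter values and sample size are hypothetical choices.

```python
import bisect
import math
import random

EULER_GAMMA = 0.5772156649015329

def gumbel_cdf(x, loc, scale):
    z = -(x - loc) / scale
    return 0.0 if z > 700.0 else math.exp(-math.exp(z))

def logistic_cdf(x, loc, scale):
    z = -(x - loc) / scale
    return 0.0 if z > 700.0 else 1.0 / (1.0 + math.exp(z))

def gumbel_draw(rng, loc, scale):
    u = rng.random()
    while u == 0.0:                       # log(0) is undefined; redraw
        u = rng.random()
    return loc - scale * math.log(-math.log(u))

def max_ecdf_gap(samples, cdf, grid):
    """Largest |empirical CDF - cdf| over the grid points."""
    xs = sorted(samples)
    n = len(xs)
    return max(abs(bisect.bisect_right(xs, g) / n - cdf(g)) for g in grid)

def related_distribution_checks(n=20_000, seed=42):
    rng = random.Random(seed)
    grid = [k / 4.0 for k in range(-40, 61)]          # -10 .. 15
    mu, sigma = 2.0, 1.5       # Weibull shape mu and scale sigma for the first check
    # Weibull(scale sigma, shape mu) -> GEV(mu, sigma, 0)
    weib = [mu * (1.0 - sigma * math.log(rng.weibullvariate(sigma, mu) / sigma))
            for _ in range(n)]
    # Exponential(1) -> GEV(mu, sigma, 0)
    expo = [mu - sigma * math.log(rng.expovariate(1.0)) for _ in range(n)]
    # Difference of independent Gumbels with a common scale -> logistic
    ax, ay, beta = 1.0, 0.5, 1.5
    diff = [gumbel_draw(rng, ax, beta) - gumbel_draw(rng, ay, beta) for _ in range(n)]
    # Sum of two i.i.d. Gumbel(alpha, beta): mean 2*alpha + 2*beta*gamma
    alpha = 1.0
    sums = [gumbel_draw(rng, alpha, beta) + gumbel_draw(rng, alpha, beta) for _ in range(n)]
    return {
        "weibull": max_ecdf_gap(weib, lambda x: gumbel_cdf(x, mu, sigma), grid),
        "exponential": max_ecdf_gap(expo, lambda x: gumbel_cdf(x, mu, sigma), grid),
        "difference": max_ecdf_gap(diff, lambda x: logistic_cdf(x, ax - ay, beta), grid),
        "sum_mean_error": abs(sum(sums) / n - (2 * alpha + 2 * beta * EULER_GAMMA)),
    }
```

Small gaps are evidence, not proof; the exact arguments are given in the proofs that follow.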


Proofs

3. Let X \sim \textrm{Weibull}(\sigma, \mu), with scale \sigma and shape \mu, so that P(X > t) = \exp\bigl(-(t/\sigma)^{\mu}\bigr) for t \ge 0. Since \mu > 0 and \sigma > 0, solving the inequality for \log(X/\sigma) reverses its direction, and the cumulative distribution function of g(X) = \mu\left(1 - \sigma\log\frac{X}{\sigma}\right) is

:\begin{align} P\left(\mu\left(1 - \sigma\log\frac{X}{\sigma}\right) < x\right) &= P\left(\log\frac{X}{\sigma} > \frac{1 - x/\mu}{\sigma}\right) \\ &= P\left(X > \sigma\exp\left[\frac{1 - x/\mu}{\sigma}\right]\right) \\ &= \exp\left(-\left(\exp\left[\frac{1 - x/\mu}{\sigma}\right]\right)^{\mu}\right) \\ &= \exp\left(-\exp\left[\frac{\mu - x}{\sigma}\right]\right) \\ &= \exp\bigl(-\exp(-s)\bigr), \quad s = \frac{x - \mu}{\sigma}, \end{align}

which is the cumulative distribution function of \textrm{GEV}(\mu, \sigma, 0).

5. Let X \sim \textrm{Exponential}(1), so that P(X > t) = \exp(-t) for t \ge 0. Since \sigma > 0, the cumulative distribution function of g(X) = \mu - \sigma\log X is

:\begin{align} P(\mu - \sigma\log X < x) &= P\left(\log X > \frac{\mu - x}{\sigma}\right) \\ &= P\left(X > \exp\left(\frac{\mu - x}{\sigma}\right)\right) \\ &= \exp\left(-\exp\left(\frac{\mu - x}{\sigma}\right)\right) \\ &= \exp\bigl(-\exp(-s)\bigr), \quad s = \frac{x - \mu}{\sigma}, \end{align}

which is the cumulative distribution function of \textrm{GEV}(\mu, \sigma, 0).


See also

* Extreme value theory (univariate theory)
* Fisher–Tippett–Gnedenko theorem
* Generalized Pareto distribution
* German tank problem, the opposite question of population maximum given sample maximum
* Pickands–Balkema–De Haan theorem

