Extreme value theory or extreme value analysis (EVA) is a branch of statistics dealing with the extreme deviations from the median of probability distributions. It seeks to assess, from a given ordered sample of a given random variable, the probability of events that are more extreme than any previously observed. Extreme value analysis is widely used in many disciplines, such as structural engineering, finance, earth sciences, traffic prediction, and geological engineering. For example, EVA might be used in the field of hydrology to estimate the probability of an unusually large flooding event, such as the 100-year flood. Similarly, for the design of a breakwater, a coastal engineer would seek to estimate the 50-year wave and design the structure accordingly.
Data analysis
Two main approaches exist for practical extreme value analysis.
The first method relies on deriving block maxima (minima) series as a preliminary step. In many situations it is customary and convenient to extract the annual maxima (minima), generating an "Annual Maxima Series" (AMS).
The second method relies on extracting, from a continuous record, the peak values reached during any period in which values exceed a certain threshold (or fall below a certain threshold). This method is generally referred to as the "Peak Over Threshold" method (POT).
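The two extraction steps can be sketched in a few lines of Python. This is a minimal illustration on synthetic data, not a reference implementation: the function names `block_maxima` and `peaks_over_threshold`, the block length of 365 values, and the threshold of 4.0 are all assumptions made for the example.

```python
import random


def block_maxima(values, block_size):
    """Return the maximum of each complete block of `block_size` consecutive values."""
    n_blocks = len(values) // block_size
    return [max(values[i * block_size:(i + 1) * block_size]) for i in range(n_blocks)]


def peaks_over_threshold(values, threshold):
    """Return one peak value per run of consecutive values above `threshold`."""
    peaks = []
    current_peak = None
    for v in values:
        if v > threshold:
            # Inside an exceedance period: track the largest value seen so far.
            current_peak = v if current_peak is None else max(current_peak, v)
        elif current_peak is not None:
            # The exceedance period has just ended: record its peak.
            peaks.append(current_peak)
            current_peak = None
    if current_peak is not None:
        peaks.append(current_peak)
    return peaks


# Synthetic "daily" record covering 30 blocks of 365 values (hypothetical data).
rng = random.Random(0)
daily = [rng.expovariate(1.0) for _ in range(365 * 30)]

ams = block_maxima(daily, 365)                    # one maximum per block
pot = peaks_over_threshold(daily, threshold=4.0)  # one peak per exceedance period
```

For minima, the same functions can be applied to the negated series. Note that the block approach keeps exactly one value per block, whereas the threshold approach keeps a variable number of peaks that depends on the threshold chosen.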
For AMS data, the analysis may partly rely on the results of the Fisher–Tippett–Gnedenko theorem, leading to the generalized extreme value distribution being selected for fitting. However, in practice, various procedures are applied to select between a wider range of distributions. The theorem here relates to the limiting distributions for the minimum or the maximum of a very large collection of independent random variables from the same distribution. Given that the number of relevant random events within a year may be rather limited, it is unsurprising that analyses of observed AMS data often lead to distributions other than the generalized extreme value distribution (GEVD) being selected.
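As a rough illustration of fitting a distribution to an annual maxima series, the sketch below uses the Gumbel distribution, which is the special case of the generalized extreme value distribution with zero shape parameter, and estimates its parameters by the method of moments. This is a deliberate simplification: a full GEV fit would normally estimate the shape parameter as well, typically by maximum likelihood. The function names, the synthetic data, and the chosen parameters (location 10, scale 2) are assumptions for the example.

```python
import math
import random
import statistics

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def fit_gumbel_moments(maxima):
    """Method-of-moments estimates (location, scale) for a Gumbel distribution."""
    # Gumbel variance is (pi^2 / 6) * scale^2 and its mean is location + gamma * scale.
    scale = statistics.stdev(maxima) * math.sqrt(6) / math.pi
    location = statistics.mean(maxima) - EULER_GAMMA * scale
    return location, scale


def gumbel_return_level(location, scale, return_period):
    """Level exceeded with probability 1 / return_period in any single block."""
    return location - scale * math.log(-math.log(1.0 - 1.0 / return_period))


# Synthetic maxima from a Gumbel(location=10, scale=2) distribution. If E is
# Exp(1) distributed, then -log(E) is standard Gumbel. The sample is far larger
# than a real annual series so that the estimates are stable.
rng = random.Random(42)
ams = [10.0 - 2.0 * math.log(rng.expovariate(1.0)) for _ in range(20000)]

location, scale = fit_gumbel_moments(ams)
level_100 = gumbel_return_level(location, scale, 100)  # "100-year" level
```

With real records of a few dozen years, the estimates are much less certain than in this synthetic case, which is one reason the choice of distribution is often contested in practice.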
For POT data, the analysis may involve fitting two distributions: one for the number of events in a time period considered and a second for the size of the exceedances.
A common assumption for the first is the Poisson distribution, with the generalized Pareto distribution being used for the exceedances. A tail-fitting can be based on the Pickands–Balkema–de Haan theorem.
Novak reserves the term “POT method” for the case where the threshold is non-random, and distinguishes it from the case where one deals with exceedances of a random threshold.
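The two-part POT model (an event count plus an exceedance size) can be sketched as follows. To stay within the standard library, the example assumes exponentially distributed exceedances, which is the generalized Pareto distribution with zero shape parameter; a general fit would also estimate the shape. The function names, the threshold of 5.0, and the synthetic record are assumptions for the example.

```python
import math
import random


def fit_pot_exponential(peaks, threshold, n_years):
    """Estimate the yearly Poisson rate and the mean excess over the threshold."""
    excesses = [p - threshold for p in peaks if p > threshold]
    rate = len(excesses) / n_years          # Poisson rate: events per year
    scale = sum(excesses) / len(excesses)   # exponential scale: mean excess
    return rate, scale


def pot_return_level(threshold, rate, scale, return_period_years):
    """Level exceeded on average once per `return_period_years` (exponential tail)."""
    # Solves rate * T * exp(-(x - threshold) / scale) = 1 for x.
    return threshold + scale * math.log(rate * return_period_years)


# Synthetic record: events arrive as a Poisson process (exponential waiting
# times between events) and each peak exceeds the threshold by an Exp amount.
rng = random.Random(7)
threshold, true_rate, true_scale, n_years = 5.0, 3.0, 1.5, 5000
peaks = []
t = rng.expovariate(true_rate)
while t < n_years:
    peaks.append(threshold + rng.expovariate(1.0 / true_scale))
    t += rng.expovariate(true_rate)

rate, scale = fit_pot_exponential(peaks, threshold, n_years)
level_100 = pot_return_level(threshold, rate, scale, 100)
```

The record length here is unrealistically long so that the estimates settle near the values used to generate the data; with short records the threshold choice trades bias (threshold too low) against variance (too few exceedances).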
Applications
Applications of extreme value theory include predicting the probability distribution of:
* Extreme floods; the size of freak waves
* Tornado outbreaks
* Maximum sizes of ecological populations
* Side effects of drugs (e.g., ximelagatran)
* The magnitudes of large insurance losses
* Equity risks; day-to-day market risk
* Mutational events during evolution
* Large wildfires
* Environmental loads on structures
* Fastest time humans are capable of running the 100 metres sprint and performances in other athletic disciplines
* Pipeline failures due to pitting corrosion
* Anomalous IT network traffic, to help prevent attackers from reaching important data
* Road safety analysis
* Wireless communications
* Epidemics
* Neurobiology
History
The field of extreme value theory was pioneered by Leonard Tippett (1902–1985). Tippett was employed by the British Cotton Industry Research Association, where he worked to make cotton thread stronger. In his studies, he realized that the strength of a thread was controlled by the strength of its weakest fibres. With the help of R. A. Fisher, Tippett obtained three asymptotic limits describing the distributions of extremes assuming independent variables. Emil Julius Gumbel codified this theory in his 1958 book ''Statistics of Extremes'', including the Gumbel distributions that bear his name. These results can be extended to allow for slight correlations between variables, but the classical theory does not extend to strong correlations of the order of the variance. One universality class of particular interest is that of log-correlated fields, where the correlations decay logarithmically with the distance.
Univariate theory
Let $X_1, \dots, X_n$ be a sequence of independent and identically distributed random variables with cumulative distribution function ''F'' and let $M_n = \max(X_1, \dots, X_n)$ denote the maximum.
In theory, the exact distribution of the maximum can be derived:
: $\Pr(M_n \leq z) = \Pr(X_1 \leq z, \dots, X_n \leq z) = \Pr(X_1 \leq z) \cdots \Pr(X_n \leq z) = (F(z))^n.$
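The standard identity that the maximum of $n$ independent, identically distributed variables has distribution function $(F(z))^n$ can be checked numerically. The sketch below does so for uniform variables on [0, 1], where $F(z) = z$; the function name `empirical_max_cdf`, the sample size, and the values of `n` and `z` are assumptions for the example.

```python
import random


def empirical_max_cdf(n, z, trials, rng):
    """Fraction of simulated samples of size n whose maximum is at most z."""
    hits = 0
    for _ in range(trials):
        if max(rng.random() for _ in range(n)) <= z:
            hits += 1
    return hits / trials


n, z = 5, 0.8
exact = z ** n  # F(z) = z for the uniform distribution on [0, 1]
simulated = empirical_max_cdf(n, z, trials=100_000, rng=random.Random(1))
```

The simulated fraction should agree with `exact` up to Monte Carlo error, which shrinks roughly as the inverse square root of the number of trials.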
The associated indicator function $I_n = I(M_n > z)$ is a Bernoulli process with a success probability $p(z) = 1 - (F(z))^n$ that depends on the magnitude $z$ of the extreme event. The number of extreme events within $n$ trials thus follows a binomial distribution and the number of trials until an event occurs follows a geometric distribution with expected value and standard deviation of the same order $O(1/p(z))$.
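A short numerical example may help make the binomial and geometric statements concrete. It assumes a fixed success probability of 0.01 per trial, with one trial read as one year, so that the event corresponds to a "100-year" event; the function names are hypothetical.

```python
import math


def geometric_waiting_time(p):
    """Mean and standard deviation of the number of trials until the first event."""
    mean = 1.0 / p
    std = math.sqrt(1.0 - p) / p
    return mean, std


def prob_at_least_one(p, n_trials):
    """Probability that a binomial count over n_trials is at least one."""
    return 1.0 - (1.0 - p) ** n_trials


p = 0.01  # assumed per-year exceedance probability
mean_wait, std_wait = geometric_waiting_time(p)  # both close to 1 / p
p_century = prob_at_least_one(p, 100)            # chance of an event within 100 years
```

Under these assumptions the mean waiting time and its standard deviation are both close to 100 years, and the chance of seeing at least one such event within 100 years is about 63%, not 100%.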
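In practice, we might not have the distribution function $F$ but the Fisher–Tippett–Gnedenko theorem provides an asymptotic result. If there exist sequences of constants $a_n > 0$ and $b_n \in \mathbb{R}$ such that
: $\Pr\{(M_n - b_n)/a_n \leq z\} \rightarrow G(z)$
as $n \rightarrow \infty$ then
: $G(z) \propto \exp\left[-(1 + \zeta z)^{-1/\zeta}\right]$
where $\zeta$ depends on the tail shape of the distribution.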
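One case where the normalizing constants are known in closed form is the unit exponential distribution: with scaling constant 1 and centring constant $\ln n$, the distribution of the normalized maximum is $(1 - e^{-z}/n)^n$, which converges to the Gumbel distribution function $\exp(-e^{-z})$. The sketch below evaluates both expressions exactly (no simulation) to show how quickly they approach each other; the function names and the grid of `z` values are assumptions for the example.

```python
import math


def normalized_max_cdf(z, n):
    """Exact P(M_n - log(n) <= z) for the maximum of n i.i.d. Exp(1) variables."""
    x = z + math.log(n)
    if x <= 0:
        return 0.0  # an Exp(1) variable is never negative
    return (1.0 - math.exp(-x)) ** n


def gumbel_cdf(z):
    """Standard Gumbel distribution function, the limit in this example."""
    return math.exp(-math.exp(-z))


z_values = [-1.0, 0.0, 1.0, 2.0]
# Largest absolute gap between the exact and limiting distribution on the grid.
errors = {
    n: max(abs(normalized_max_cdf(z, n) - gumbel_cdf(z)) for z in z_values)
    for n in (10, 100, 10_000)
}
```

The gap shrinks roughly in proportion to $1/n$ here; for other parent distributions, such as the normal, convergence to the limit is known to be considerably slower.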
When normalized, ''G'' belongs to one of the following non-degenerate distribution families:
Weibull law: