In probability theory, heavy-tailed distributions are probability distributions whose tails are not exponentially bounded: that is, they have heavier tails than the exponential distribution. In many applications it is the right tail of the distribution that is of interest, but a distribution may have a heavy left tail, or both tails may be heavy.

There are three important subclasses of heavy-tailed distributions: the fat-tailed distributions, the long-tailed distributions and the subexponential distributions. In practice, all commonly used heavy-tailed distributions belong to the subexponential class.

The term heavy-tailed is not used consistently, and two other definitions are in use. Some authors use it for distributions that do not have all their power moments finite, and others for distributions that do not have a finite variance. The definition given in this article is the most general in use: it includes all distributions encompassed by the alternative definitions, as well as distributions such as the log-normal that possess all their power moments yet are generally considered to be heavy-tailed. (Occasionally, heavy-tailed is used for any distribution that has heavier tails than the normal distribution.)

Definitions

Definition of heavy-tailed distribution

The distribution of a random variable ''X'' with distribution function ''F'' is said to have a heavy (right) tail if the moment generating function of ''X'', ''M_X''(''t''), is infinite for all ''t'' > 0 (Rolski, Schmidli, Schmidt, Teugels, ''Stochastic Processes for Insurance and Finance'', 1999). That means

$$\int_{-\infty}^\infty e^{tx} \,dF(x) = \infty \quad \text{for all } t>0.$$

This is also written in terms of the tail distribution function

$$\overline{F}(x) \equiv \Pr[X>x]$$

as

$$\lim_{x \to \infty} e^{tx}\overline{F}(x) = \infty \quad \text{for all } t >0.$$
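The tail criterion can be checked numerically. The following is a minimal sketch, not a definitive implementation: the function names, the Pareto tail $\Pr[X>x] = x^{-\alpha}$ on $[1,\infty)$ with $\alpha = 2$, and the exponential rate of 1 are all illustrative assumptions. It evaluates $\ln\left(e^{tx}\overline{F}(x)\right)$ in log space to avoid overflow.

```python
import math

def log_scaled_tail(log_tail, t, x):
    """Return ln(e^{t x} * tail(x)), computed as t*x + ln tail(x) to avoid overflow."""
    return t * x + log_tail(x)

def pareto_log_tail(x, alpha=2.0):
    """ln Pr[X > x] for an illustrative Pareto law with Pr[X > x] = x^{-alpha}, x >= 1."""
    return -alpha * math.log(x)

def exponential_log_tail(x, rate=1.0):
    """ln Pr[X > x] for an exponential law with the given rate."""
    return -rate * x

t = 0.1  # a fixed t > 0 chosen below the exponential rate
for x in (10.0, 100.0, 1000.0):
    # Pareto column grows without bound; exponential column decreases without bound.
    print(x, log_scaled_tail(pareto_log_tail, t, x), log_scaled_tail(exponential_log_tail, t, x))
```

For the Pareto tail the quantity grows without bound for every $t > 0$, whereas for the exponential tail it tends to minus infinity whenever $t$ is below the rate, so the exponential distribution fails the criterion.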

Definition of long-tailed distribution

The distribution of a random variable ''X'' with distribution function ''F'' is said to have a long right tail if for all ''t'' > 0,

$$\lim_{x \to \infty} \Pr[X>x+t\mid X>x] =1,$$

or equivalently

$$\overline{F}(x+t) \sim \overline{F}(x) \quad \text{as } x \to \infty.$$

This has the intuitive interpretation that if a long-tailed quantity exceeds some high level, the probability approaches 1 that it will also exceed any other higher level. All long-tailed distributions are heavy-tailed, but the converse is false: it is possible to construct heavy-tailed distributions that are not long-tailed.
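The conditional probability in the long-tail definition is easy to evaluate for closed-form tails. The sketch below is illustrative only: the function names and the two example tails (Pareto with exponent 2, exponential with rate 1) are assumptions, not part of the definition.

```python
import math

def conditional_exceedance(tail, x, t):
    """Pr[X > x + t | X > x] = tail(x + t) / tail(x)."""
    return tail(x + t) / tail(x)

def pareto_tail(x, alpha=2.0):
    """Illustrative Pareto tail Pr[X > x] = x^{-alpha} for x >= 1."""
    return x ** (-alpha)

def exponential_tail(x, rate=1.0):
    """Exponential tail Pr[X > x] = e^{-rate * x}."""
    return math.exp(-rate * x)

t = 5.0
for x in (10.0, 100.0, 500.0):
    print(x, conditional_exceedance(pareto_tail, x, t), conditional_exceedance(exponential_tail, x, t))
```

For the Pareto tail the conditional probability tends to 1 as ''x'' grows. For the exponential tail it stays at $e^{-\text{rate}\cdot t}$ whatever the level ''x'' (the memoryless property), so the exponential distribution is not long-tailed.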

Subexponential distributions

Subexponentiality is defined in terms of convolutions of probability distributions. For two independent, identically distributed random variables $X_1, X_2$ with a common distribution function $F$, the convolution of $F$ with itself, written $F^{*2}$ and called the convolution square, is defined using Lebesgue–Stieltjes integration by

$$\Pr[X_1+X_2 \leq x] = F^{*2}(x) = \int_0^x F(x-y)\,dF(y),$$

and the ''n''-fold convolution $F^{*n}$ is defined inductively by the rule

$$F^{*n}(x) = \int_0^x F(x-y)\,dF^{*(n-1)}(y).$$

The tail distribution function $\overline{F}$ is defined as $\overline{F}(x) = 1-F(x)$. A distribution $F$ on the positive half-line is subexponential if

$$\overline{F^{*2}}(x) \sim 2\overline{F}(x) \quad \text{as } x \to \infty.$$

This implies that, for any $n \geq 1$,

$$\overline{F^{*n}}(x) \sim n\overline{F}(x) \quad \text{as } x \to \infty.$$

The probabilistic interpretation of this is that, for a sum of $n$ independent random variables $X_1,\ldots,X_n$ with common distribution $F$,

$$\Pr[X_1+ \cdots +X_n>x] \sim \Pr[\max(X_1, \ldots,X_n)>x] \quad \text{as } x \to \infty.$$

This is often known as the principle of the single big jump or catastrophe principle.

A distribution $F$ on the whole real line is subexponential if the distribution $F I([0,\infty))$ is. Here $I([0,\infty))$ is the indicator function of the positive half-line. Alternatively, a random variable $X$ supported on the real line is subexponential if and only if $X^+ = \max(0,X)$ is subexponential.

All subexponential distributions are long-tailed, but examples can be constructed of long-tailed distributions that are not subexponential.
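The defining relation between the tail of the convolution square and twice the tail of $F$ can be illustrated numerically. The following is a sketch under stated assumptions: it uses a Pareto distribution on $[1,\infty)$ with exponent 1.5 (an arbitrary choice), computes $\Pr[X_1+X_2>x]$ by conditioning on one summand and applying a midpoint rule, and all function names are hypothetical.

```python
ALPHA = 1.5  # illustrative Pareto tail exponent (an assumption of this sketch)

def tail(x):
    """Pr[X > x] for a Pareto variable on [1, inf): x^{-ALPHA}, and 1 below the support."""
    return 1.0 if x < 1.0 else x ** (-ALPHA)

def density(y):
    """Pareto density ALPHA * y^{-ALPHA - 1} on [1, inf)."""
    return ALPHA * y ** (-ALPHA - 1.0)

def sum_tail(x, steps=100_000):
    """Pr[X1 + X2 > x] for x >= 2, conditioning on X2 = y and using a midpoint rule."""
    a, b = 1.0, x - 1.0
    h = (b - a) / steps
    total = 0.0
    for i in range(steps):
        y = a + (i + 0.5) * h
        total += tail(x - y) * density(y)
    # For y > x - 1 the other summand exceeds x - y with certainty, contributing tail(x - 1).
    return total * h + tail(x - 1.0)

def convolution_ratio(x):
    """Ratio of the convolution-square tail to 2 * tail(x); tends to 1 if F is subexponential."""
    return sum_tail(x) / (2.0 * tail(x))

for x in (10.0, 100.0, 1000.0):
    print(x, convolution_ratio(x))
```

The printed ratio should move towards 1 as ''x'' increases, which is the numerical counterpart of $\overline{F^{*2}}(x) \sim 2\overline{F}(x)$.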

Common heavy-tailed distributions

All commonly used heavy-tailed distributions are subexponential. Those that are one-tailed include:
*the Pareto distribution;
*the log-normal distribution;
*the Lévy distribution;
*the Weibull distribution with shape parameter greater than 0 but less than 1;
*the Burr distribution;
*the log-logistic distribution;
*the log-gamma distribution;
*the Fréchet distribution;
*the q-Gaussian distribution;
*the log-Cauchy distribution, sometimes described as having a "super-heavy tail" because it exhibits logarithmic decay producing a heavier tail than the Pareto distribution.

Those that are two-tailed include:
*the Cauchy distribution, itself a special case of both the stable distribution and the t-distribution;
*the family of stable distributions, excepting the special case of the normal distribution within that family. Some stable distributions are one-sided (or supported by a half-line), see e.g. the Lévy distribution. See also ''financial models with long-tailed distributions and volatility clustering'';
*the t-distribution;
*the skew lognormal cascade distribution.

Relationship to fat-tailed distributions

A fat-tailed distribution is a distribution for which the probability density function, for large ''x'', goes to zero as a power $x^{-a}$. Since such a power is always bounded below by the probability density function of an exponential distribution, fat-tailed distributions are always heavy-tailed. Some distributions, however, have a tail which goes to zero slower than an exponential function (meaning they are heavy-tailed), but faster than a power (meaning they are not fat-tailed). An example is the log-normal distribution. Many other heavy-tailed distributions such as the log-logistic and Pareto distribution are, however, also fat-tailed.
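The intermediate position of the log-normal distribution can be seen by comparing its density with an exponential density and with a power on a log scale. This is a minimal sketch: the standard log-normal parameters, the exponential rate of 1, the power exponent 3 and the function names are all illustrative assumptions.

```python
import math

def lognormal_log_density(x, mu=0.0, sigma=1.0):
    """Natural log of the log-normal density at x > 0."""
    z = (math.log(x) - mu) / sigma
    return -0.5 * z * z - math.log(x * sigma * math.sqrt(2.0 * math.pi))

def log_ratio_to_exponential(x, rate=1.0):
    """ln( f_lognormal(x) / (rate * e^{-rate * x}) ); positive means slower decay than exponential."""
    return lognormal_log_density(x) - (math.log(rate) - rate * x)

def log_ratio_to_power(x, a=3.0):
    """ln( f_lognormal(x) / x^{-a} ); negative means faster decay than the power x^{-a}."""
    return lognormal_log_density(x) + a * math.log(x)

for x in (1e2, 1e4, 1e6):
    print(x, log_ratio_to_exponential(x), log_ratio_to_power(x))
```

For large ''x'' the first log-ratio is large and positive while the second is negative and decreasing, in line with a tail that is heavier than exponential but lighter than a power.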

Estimating the tail-index

There are parametric and non-parametric approaches to the problem of tail-index estimation. To estimate the tail-index using the parametric approach, some authors employ the GEV distribution or the Pareto distribution; they may apply the maximum-likelihood estimator (MLE).
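As an illustration of the parametric route, the sketch below fits a Pareto model with a known scale by maximum likelihood. It assumes the tail $\Pr[X>x] = (x/x_{\min})^{-\alpha}$ for $x \ge x_{\min}$, for which the MLE of $\alpha$ is the number of observations divided by the sum of $\ln(x_i/x_{\min})$. The function name, the seed and the simulated sample are hypothetical choices made only for demonstration.

```python
import math
import random

def pareto_tail_index_mle(sample, x_min):
    """MLE of alpha for Pr[X > x] = (x / x_min)^{-alpha}, x >= x_min, with x_min known."""
    logs = [math.log(x / x_min) for x in sample if x >= x_min]
    total = sum(logs)
    if not logs or total <= 0.0:
        raise ValueError("need observations strictly above x_min")
    return len(logs) / total

# Simulated data: random.paretovariate(alpha) draws from a Pareto law on [1, inf).
rng = random.Random(0)
sample = [rng.paretovariate(2.5) for _ in range(10_000)]
print(pareto_tail_index_mle(sample, x_min=1.0))
```

With real data the scale $x_{\min}$ is usually unknown and must be chosen or estimated, which is one reason the non-parametric estimators discussed in this section are often preferred.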

Pickands' tail-index estimator

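Let $(X_n , n \geq 1)$ be a sequence of independent random variables with common distribution function $F \in D(H(\xi))$, the maximum domain of attraction of the generalized extreme value distribution $H$, where $\xi \in \mathbb{R}$. If $\lim_{n\to\infty} k(n) = \infty$ and $\lim_{n\to\infty} \frac{k(n)}{n}= 0$, then the ''Pickands'' tail-index estimator is

$$\xi^\text{Pickands}_{(k(n),n)} =\frac{1}{\ln 2} \ln \left( \frac{X_{(n-k(n)+1,n)} - X_{(n-2k(n)+1,n)}}{X_{(n-2k(n)+1,n)} - X_{(n-4k(n)+1,n)}}\right),$$

where $X_{(i,n)}$ denotes the $i$-th order statistic of $X_1,\ldots,X_n$ in ascending order, so that $X_{(n,n)}=\max \left(X_1,\ldots ,X_n\right)$. This estimator converges in probability to $\xi$.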
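A direct transcription of the Pickands estimator into code might look as follows. This is a sketch under assumptions: it uses the standard form with ascending order statistics at ranks $n-k+1$, $n-2k+1$ and $n-4k+1$, the function name is hypothetical, and ties among these three order statistics are not handled.

```python
import math

def pickands_estimator(sample, k):
    """Pickands estimate of xi from order statistics at ranks n-k+1, n-2k+1 and n-4k+1."""
    n = len(sample)
    if k < 1 or 4 * k > n:
        raise ValueError("need 1 <= k and 4k <= n")
    xs = sorted(sample)   # ascending: xs[i - 1] is the i-th order statistic
    top = xs[n - k]       # X_(n-k+1, n)
    mid = xs[n - 2 * k]   # X_(n-2k+1, n)
    low = xs[n - 4 * k]   # X_(n-4k+1, n)
    return math.log((top - mid) / (mid - low)) / math.log(2.0)

# Illustrative input: exact quantiles of a Pareto law with Pr[X > x] = x^{-2}, so xi = 0.5.
n = 10_000
quantile_sample = [((i + 0.5) / n) ** (-0.5) for i in range(n)]
print(pickands_estimator(quantile_sample, k=100))
```

The example feeds in a deterministic grid of quantiles rather than random draws, so the result reflects only the bias from using a finite $k$; with random data the estimate also fluctuates noticeably with the choice of $k$.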

Hill's tail-index estimator

Let $(X_t , t \geq 1)$ be a sequence of independent and identically distributed random variables with distribution function $F \in D(H(\xi))$, the maximum domain of attraction of the generalized extreme value distribution $H$, where $\xi \in \mathbb{R}$. The sample path is $\{X_t : 1 \leq t \leq n\}$, where $n$ is the sample size. If $\{k(n)\}$ is an intermediate order sequence, i.e. $k(n) \in \{1,\ldots,n-1\}$, $k(n) \to \infty$ and $k(n)/n \to 0$, then the Hill tail-index estimator is

$$\xi^\text{Hill}_{(k(n),n)} = \left(\frac 1 {k(n)} \sum_{i=n-k(n)+1}^n \left(\ln(X_{(i,n)}) - \ln (X_{(n-k(n)+1,n)})\right)\right)^{-1},$$

where $X_{(i,n)}$ is the $i$-th order statistic of $X_1, \dots, X_n$. This estimator converges in probability to $\xi$, and is asymptotically normal provided $k(n) \to \infty$ is restricted based on a higher order regular variation property. Consistency and asymptotic normality extend to a large class of dependent and heterogeneous sequences, irrespective of whether $X_t$ is observed, or a computed residual or filtered data from a large class of models and estimators, including mis-specified models and models with errors that are dependent.

Note that both Pickands' and Hill's tail-index estimators make use of the logarithm of the order statistics.
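The Hill estimator in the reciprocal form given above can be sketched in a few lines. The function name is hypothetical, the data are assumed to be strictly positive, and $k \ge 2$ is required here because with $k = 1$ the only log-excess is zero.

```python
import math

def hill_estimator(sample, k):
    """Reciprocal of the mean log-excess of the k largest observations over X_(n-k+1, n)."""
    n = len(sample)
    if not 2 <= k < n:
        raise ValueError("need 2 <= k < n")
    xs = sorted(sample)                 # ascending order statistics
    threshold = math.log(xs[n - k])     # ln X_(n-k+1, n)
    mean_log_excess = sum(math.log(x) - threshold for x in xs[n - k:]) / k
    return 1.0 / mean_log_excess

# Illustrative input: exact quantiles of a Pareto law with Pr[X > x] = x^{-2}.
n = 10_000
quantile_sample = [((i + 0.5) / n) ** (-0.5) for i in range(n)]
print(hill_estimator(quantile_sample, k=200))
```

For a tail of the form $\Pr[X>x] = x^{-\alpha}$ the log-excesses over a high threshold are exponentially distributed with rate $\alpha$, so the mean log-excess is close to $1/\alpha$ and the value returned by this sketch is close to $\alpha$ (here 2).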

Ratio estimator of the tail-index

The ratio estimator (RE-estimator) of the tail-index was introduced by Goldie and Smith. It is constructed similarly to Hill's estimator but uses a non-random "tuning parameter". A comparison of Hill-type and RE-type estimators can be found in Novak.
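The text above does not state the formula, so the following sketch assumes one commonly cited form of the ratio estimator: the average of $\ln(X_i/x_n)$ over the observations exceeding a fixed, non-random threshold $x_n > 0$, which estimates the reciprocal of the tail index. The function name and the example threshold are hypothetical.

```python
import math

def ratio_estimator(sample, threshold):
    """Mean of ln(X_i / threshold) over observations above a fixed, non-random threshold."""
    logs = [math.log(x / threshold) for x in sample if x > threshold]
    if not logs:
        raise ValueError("no observations exceed the threshold")
    return sum(logs) / len(logs)

# Illustrative input: exact quantiles of a Pareto law with Pr[X > x] = x^{-2}.
n = 10_000
quantile_sample = [((i + 0.5) / n) ** (-0.5) for i in range(n)]
print(ratio_estimator(quantile_sample, threshold=5.0))
```

The contrast with Hill's estimator is that the threshold is chosen in advance, whereas Hill's estimator uses a random order statistic as its threshold.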

Software

aest, a C tool for estimating the heavy-tail index.

Estimation of heavy-tailed density

Nonparametric approaches to estimating heavy- and superheavy-tailed probability density functions were given in Markovich. These include approaches based on variable-bandwidth and long-tailed kernel estimators; approaches based on a preliminary transform of the data to a new random variable on a finite or infinite interval, which is more convenient for estimation, followed by an inverse transform of the resulting density estimate; and the "piecing-together approach", which combines a parametric model for the tail of the density with a non-parametric model for the mode of the density.

Nonparametric estimators require an appropriate selection of tuning (smoothing) parameters, such as the bandwidth of kernel estimators and the bin width of a histogram. Well-known data-driven methods for this selection are cross-validation and its modifications, and methods based on minimizing the mean squared error (MSE), its asymptotic form, or their upper bounds. A discrepancy method has also been described that uses well-known nonparametric statistics such as the Kolmogorov–Smirnov, von Mises and Anderson–Darling statistics as a metric in the space of distribution functions (dfs), and quantiles of the latter statistics as a known uncertainty or discrepancy value. The bootstrap is another tool for finding smoothing parameters, using approximations of the unknown MSE under different schemes of re-sample selection; see e.g. Hall, P. (1992), ''The Bootstrap and Edgeworth Expansion'', Springer, ISBN 9780387945088.
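The transform-based idea can be illustrated with a simple choice of transform. The sketch below is an assumption-laden example rather than the method of the cited work: it takes the logarithm as the transform, applies a fixed-bandwidth Gaussian kernel estimator to the transformed data, and maps the estimate back with the change-of-variables factor $1/x$. The function name and the bandwidth value are hypothetical.

```python
import math

def log_transform_kde(sample, bandwidth):
    """Density estimate for positive data: Gaussian KDE on ln X, mapped back to the X scale."""
    logs = [math.log(x) for x in sample]
    norm = 1.0 / (len(logs) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        if x <= 0.0:
            return 0.0
        y = math.log(x)
        kde_y = norm * sum(math.exp(-0.5 * ((y - v) / bandwidth) ** 2) for v in logs)
        return kde_y / x  # change of variables: f_X(x) = f_Y(ln x) / x

    return density

estimate = log_transform_kde([1.2, 1.5, 2.0, 3.5, 8.0, 40.0], bandwidth=0.5)
print(estimate(2.0), estimate(50.0))
```

Working on the log scale compresses the long right tail, so a single bandwidth behaves more reasonably than it would on the original scale; the bandwidth itself still has to be selected, for example by one of the data-driven methods mentioned above.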
