In probability and statistics, an exponential family is a parametric set of probability distributions of a certain form, specified below. This special form is chosen for mathematical convenience, since useful algebraic properties allow expectations and covariances to be calculated by differentiation, and for generality, as exponential families are in a sense very natural sets of distributions to consider. The term exponential class is sometimes used in place of "exponential family", as is the older term Koopman–Darmois family.
Sometimes loosely referred to as ''the'' exponential family, this class of distributions is distinctive because its members all possess a variety of desirable properties, most importantly the existence of a sufficient statistic.
The concept of exponential families is credited to E. J. G. Pitman, G. Darmois, and B. O. Koopman in 1935–1936. Exponential families of distributions provide a general framework for selecting a possible alternative parameterisation of a parametric family of distributions, in terms of natural parameters, and for defining useful sample statistics, called the natural sufficient statistics of the family.
Nomenclature difficulty
The terms "distribution" and "family" are often used loosely. Strictly, ''an'' exponential family is a ''set'' of distributions, where the specific distribution varies with the parameter; however, a parametric ''family'' of distributions is often referred to as "''a'' distribution" (like "the normal distribution", meaning "the family of normal distributions"), and the set of all exponential families is sometimes loosely referred to as "the" exponential family.
Definition
Most of the commonly used distributions form an exponential family or a subset of an exponential family, as listed in the subsection below. The subsections following it give a sequence of increasingly general mathematical definitions of an exponential family. A casual reader may wish to restrict attention to the first and simplest definition, which corresponds to a single-parameter family of discrete or continuous probability distributions.
Examples of exponential family distributions
Exponential families include many of the most common distributions. Among many others, exponential families include the following:
* normal
* exponential
* gamma
* chi-squared
* beta
* Dirichlet
* Bernoulli
* categorical
* Poisson
* Wishart
* inverse Wishart
* geometric
A number of common distributions are exponential families, but only when certain parameters are fixed and known. For example:
* binomial (with fixed number of trials)
* multinomial (with fixed number of trials)
* negative binomial (with fixed number of failures)
Note that in each case, the parameters which must be fixed are those that set a limit on the range of values that can possibly be observed.
Examples of common distributions that are ''not'' exponential families are Student's ''t'', most mixture distributions, and even the family of uniform distributions when the bounds are not fixed. See the section below on examples for more discussion.
Scalar parameter
A single-parameter exponential family is a set of probability distributions whose probability density function (or probability mass function, for the case of a discrete distribution) can be expressed in the form
$$f_X(x \mid \theta) = h(x)\,\exp\bigl[\eta(\theta)\cdot T(x) - A(\theta)\bigr]$$
where $T(x)$, $h(x)$, $\eta(\theta)$, and $A(\theta)$ are known functions. The function $h(x)$ must be non-negative. The value of $\theta$ is called the ''parameter'' of the family.
An alternative, equivalent form often given is
$$f_X(x \mid \theta) = h(x)\,g(\theta)\,\exp\bigl[\eta(\theta)\cdot T(x)\bigr]$$
or equivalently
$$f_X(x \mid \theta) = \exp\bigl[\eta(\theta)\cdot T(x) - A(\theta) + B(x)\bigr].$$
In terms of log probability,
$$\log f_X(x \mid \theta) = \eta(\theta)\cdot T(x) - A(\theta) + B(x).$$
Note that $g(\theta) = e^{-A(\theta)}$ and $h(x) = e^{B(x)}$.
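As an illustration of the single-parameter form $h(x)\exp[\eta(\theta)\,T(x) - A(\theta)]$, the following minimal sketch writes the Poisson distribution with rate $\lambda$ in that shape and compares it with the textbook formula. The identifications $h(x) = 1/x!$, $T(x) = x$, $\eta(\lambda) = \log\lambda$ and $A(\lambda) = \lambda$ are the standard ones for the Poisson case; the function names are hypothetical and chosen only for this example.

```python
import math

def poisson_pmf_expfam(x: int, lam: float) -> float:
    """Poisson pmf written as h(x) * exp(eta * T(x) - A)."""
    h = 1.0 / math.factorial(x)   # base measure h(x) = 1/x!
    eta = math.log(lam)           # eta(lambda) = log(lambda)
    T = x                         # sufficient statistic T(x) = x
    A = lam                       # A(lambda) = lambda
    return h * math.exp(eta * T - A)

def poisson_pmf_direct(x: int, lam: float) -> float:
    """Textbook form lambda^x * exp(-lambda) / x!."""
    return lam ** x * math.exp(-lam) / math.factorial(x)

if __name__ == "__main__":
    for x in range(5):
        print(x, poisson_pmf_expfam(x, 2.5), poisson_pmf_direct(x, 2.5))
```

The two functions should agree up to floating-point rounding for any non-negative integer `x` and positive `lam`.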
Support must be independent of $\theta$
Importantly, the ''support'' of $f_X(x \mid \theta)$ (all the possible $x$ values for which $f_X(x \mid \theta)$ is greater than $0$) is required to ''not'' depend on $\theta$. This requirement can be used to exclude a parametric family of distributions from being an exponential family.
For example, the Pareto distribution has a pdf which is defined for $x \geq x_m$ (the minimum value, $x_m$, being the scale parameter), and its support therefore has a lower limit of $x_m$. Since the support of $f_{\alpha, x_m}(x)$ depends on the value of the parameter, the family of Pareto distributions does not form an exponential family of distributions (at least when $x_m$ is unknown).
Another example: Bernoulli-type distributions (binomial, negative binomial, geometric distribution, and similar) can only be included in the exponential class if the number of Bernoulli trials, $n$, is treated as a fixed constant, excluded from the free parameter(s) $\theta$, since the allowed number of trials sets the limits for the number of "successes" or "failures" that can be observed in a set of trials.
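The role of the fixed number of trials can be made concrete with the binomial distribution. The sketch below assumes the standard identifications for fixed $n$ and success probability $p$: $h(x) = \binom{n}{x}$, $T(x) = x$, $\eta(p) = \log\frac{p}{1-p}$ and $A(p) = -n\log(1-p)$. Here $n$ appears in $h(x)$ and in the support $\{0, \dots, n\}$, which is why it has to be treated as known rather than as a free parameter. The function name is hypothetical.

```python
import math

def binomial_pmf_expfam(x: int, n: int, p: float) -> float:
    """Binomial pmf with n held fixed, as h(x) * exp(eta * T(x) - A)."""
    h = math.comb(n, x)               # base measure depends on the fixed n
    eta = math.log(p / (1.0 - p))     # natural parameter: log-odds of success
    T = x                             # sufficient statistic: number of successes
    A = -n * math.log(1.0 - p)        # normalizer for fixed n
    return h * math.exp(eta * T - A)

if __name__ == "__main__":
    n, p = 10, 0.3
    # The support {0, ..., n} is set by n alone, not by the free parameter p.
    print(sum(binomial_pmf_expfam(x, n, p) for x in range(n + 1)))
```

Summing over the support should give a value equal to one up to rounding, for any `p` strictly between 0 and 1.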
Vector-valued $x$ and $\theta$
Often $x$ is a vector of measurements, in which case $T(x)$ may be a function from the space of possible values of $x$ to the real numbers.
More generally, $\eta(\theta)$ and $T(x)$ can each be vector-valued such that $\eta(\theta)\cdot T(x)$ is real-valued. However, see the discussion below on vector parameters, regarding the ''curved'' exponential family.
Canonical formulation
If $\eta(\theta) = \theta$, then the exponential family is said to be in ''canonical form''. By defining a transformed parameter $\eta = \eta(\theta)$, it is always possible to convert an exponential family to canonical form. The canonical form is non-unique, since $\eta(\theta)$ can be multiplied by any nonzero constant, provided that $T(x)$ is multiplied by that constant's reciprocal, or a constant $c$ can be added to $\eta(\theta)$ and $h(x)$ multiplied by $\exp[-c \cdot T(x)]$ to offset it. In the special case that $\eta(\theta) = \theta$ and $T(x) = x$, the family is called a ''natural exponential family''.
Even when $x$ is a scalar, and there is only a single parameter, the functions $\eta(\theta)$ and $T(x)$ can still be vectors, as described below.
The function $A(\theta)$, or equivalently $g(\theta)$, is automatically determined once the other functions have been chosen, since it must assume a form that causes the distribution to be normalized (sum or integrate to one over the entire domain). Furthermore, both of these functions can always be written as functions of $\eta$, even when $\eta(\theta)$ is not a one-to-one function, i.e. two or more different values of $\theta$ map to the same value of $\eta(\theta)$, and hence $\eta(\theta)$ cannot be inverted. In such a case, all values of $\theta$ mapping to the same $\eta(\theta)$ will also have the same value for $A(\theta)$ and $g(\theta)$.
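The statement that the normalizer is determined by the other functions can be checked numerically. The sketch below uses the Bernoulli family in canonical form, assuming $h(x) = 1$ and $T(x) = x$ on the support $\{0, 1\}$, and recovers $A(\eta) = \log\sum_x h(x)\,e^{\eta T(x)}$ by summation. It also uses a well-known property of canonical-form families, that the derivative of $A$ with respect to $\eta$ equals the expected value of $T(x)$, which is approximated here by a central finite difference. All function names are hypothetical.

```python
import math

def log_partition(eta, support, h, T):
    """A(eta) = log of the sum over the support of h(x) * exp(eta * T(x))."""
    return math.log(sum(h(x) * math.exp(eta * T(x)) for x in support))

def bernoulli_A(eta):
    """Normalizer of the Bernoulli family, obtained purely by normalization."""
    return log_partition(eta, support=(0, 1), h=lambda x: 1.0, T=lambda x: x)

def derivative(f, t, eps=1e-5):
    """Central finite-difference approximation of f'(t)."""
    return (f(t + eps) - f(t - eps)) / (2.0 * eps)

eta = 0.7
p = 1.0 / (1.0 + math.exp(-eta))          # success probability implied by eta
mean_from_A = derivative(bernoulli_A, eta)  # approximates E[T(x)] = p

if __name__ == "__main__":
    print(bernoulli_A(eta), math.log(1.0 + math.exp(eta)))
    print(mean_from_A, p)
```

For this family the summation reproduces the closed form $\log(1 + e^{\eta})$, and the finite-difference derivative should match $p$ to within the approximation error of the difference scheme.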
Factorization of the variables involved
What is important to note, and what characterizes all exponential family variants, is that the parameter(s) and the observation variable(s) must factorize (can be separated into products each of which involves only one type of variable), either directly or within either part (the base or exponent) of an exponentiation operation. Generally, this means that all of the factors constituting the density or mass function must be of one of the following forms:
$$f(x),\quad g(\theta),\quad c^{f(x)},\quad c^{g(\theta)},\quad [f(x)]^{c},\quad [g(\theta)]^{c},\quad [f(x)]^{g(\theta)},\quad [g(\theta)]^{f(x)},\quad [f(x)]^{h(x)g(\theta)},\quad \text{or}\quad [g(\theta)]^{h(x)j(\theta)},$$
where $f$ and $h$ are arbitrary functions of $x$, the observed statistical variable; $g$ and $j$ are arbitrary functions of $\theta$, the fixed parameters defining the shape of the distribution; and $c$ is any arbitrary constant expression (i.e. a number or an expression that does not change with either $x$ or $\theta$).
There are further restrictions on how many such factors can occur. For example, the two expressions
$$[f(x)\,g(\theta)]^{h(x)j(\theta)} \quad\text{and}\quad [f(x)]^{h(x)j(\theta)}\,[g(\theta)]^{h(x)j(\theta)}$$
are the same, i.e. a product of two "allowed" factors. However, when rewritten into the factorized form,
$$[f(x)\,g(\theta)]^{h(x)j(\theta)} = [f(x)]^{h(x)j(\theta)}\,[g(\theta)]^{h(x)j(\theta)} = \exp\bigl\{[h(x)\log f(x)]\,j(\theta) + h(x)\,[j(\theta)\log g(\theta)]\bigr\},$$
it can be seen that it cannot be expressed in the required form. (However, a form of this sort is a member of a ''curved exponential family'', which allows multiple factorized terms in the exponent.)
To see why an expression of the form $[f(x)]^{g(\theta)}$ qualifies, note that
$$[f(x)]^{g(\theta)} = e^{g(\theta)\log f(x)}$$
and hence factorizes inside of the exponent. Similarly,
$$[f(x)]^{h(x)g(\theta)} = e^{h(x)g(\theta)\log f(x)} = e^{[h(x)\log f(x)]\,g(\theta)}$$
and again factorizes inside of the exponent.
A factor consisting of a sum where both types of variables are involved (e.g. a factor of the form $1 + f(x)g(\theta)$) cannot be factorized in this fashion (except in some cases where occurring directly in an exponent); this is why, for example, the Cauchy distribution and Student's ''t'' distribution are not exponential families.
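As a worked contrast, the following sketch compares a normal density with known variance against a Cauchy density with unit scale. Both densities are the standard textbook ones rather than formulas taken from this section; $\mu$ denotes the unknown location parameter in each case.

```latex
% Normal with known variance \sigma^2: every factor involves x alone,
% \mu alone, or an exponent that is a product of the two.
f(x \mid \mu)
  = \underbrace{\frac{1}{\sqrt{2\pi\sigma^2}}\,
      e^{-x^2/(2\sigma^2)}}_{\text{function of } x \text{ only}}
    \cdot
    \underbrace{e^{\mu x/\sigma^2}}_{\text{product inside the exponent}}
    \cdot
    \underbrace{e^{-\mu^2/(2\sigma^2)}}_{\text{function of } \mu \text{ only}}

% Cauchy with unit scale: the denominator is a sum that mixes x and \mu
% outside of any exponent, so no such factorization is available.
f(x \mid \mu)
  = \frac{1}{\pi\,\bigl[\,1 + (x-\mu)^2\,\bigr]}
  = \frac{1}{\pi\,\bigl[\,1 + x^2 - 2x\mu + \mu^2\,\bigr]}
```

In the normal case the middle factor supplies $\eta(\mu)\,T(x)$ with $\eta(\mu) = \mu/\sigma^2$ and $T(x) = x$, while in the Cauchy case the cross term $-2x\mu$ sits inside a sum and cannot be isolated.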
Vector parameter
The definition in terms of one ''real-number'' parameter can be extended to one ''real-vector'' parameter
$$\boldsymbol\theta \equiv [\theta_1, \theta_2, \ldots, \theta_s]^{\mathsf T}.$$
A family of distributions is said to belong to a vector exponential family if the probability density function (or probability mass function, for discrete distributions) can be written as
$$f_X(x \mid \boldsymbol\theta) = h(x)\,\exp\Bigl(\sum_{i=1}^{s} \eta_i(\boldsymbol\theta)\,T_i(x) - A(\boldsymbol\theta)\Bigr),$$
or in a more compact form,
$$f_X(x \mid \boldsymbol\theta) = h(x)\,\exp\bigl(\boldsymbol\eta(\boldsymbol\theta)\cdot\mathbf{T}(x) - A(\boldsymbol\theta)\bigr).$$
This form writes the sum as a dot product of vector-valued functions $\boldsymbol\eta(\boldsymbol\theta)$ and $\mathbf{T}(x)$.
An alternative, equivalent form often seen is
$$f_X(x \mid \boldsymbol\theta) = h(x)\,g(\boldsymbol\theta)\,\exp\bigl(\boldsymbol\eta(\boldsymbol\theta)\cdot\mathbf{T}(x)\bigr).$$
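A common instance of the vector-parameter form is the normal distribution with both mean $\mu$ and standard deviation $\sigma$ unknown. The sketch below assumes the standard identifications $\boldsymbol\eta = (\mu/\sigma^2,\ -1/(2\sigma^2))$, $\mathbf{T}(x) = (x,\ x^2)$, $h(x) = 1/\sqrt{2\pi}$ and $A = \mu^2/(2\sigma^2) + \log\sigma$, and checks the result against the usual density formula. The function names are hypothetical.

```python
import math

def normal_pdf_expfam(x: float, mu: float, sigma: float) -> float:
    """Normal density as h(x) * exp(eta . T(x) - A) with a 2-vector eta."""
    eta = (mu / sigma**2, -1.0 / (2.0 * sigma**2))   # natural parameters
    T = (x, x * x)                                   # sufficient statistics
    h = 1.0 / math.sqrt(2.0 * math.pi)               # base measure
    A = mu**2 / (2.0 * sigma**2) + math.log(sigma)   # normalizer
    dot = sum(e * t for e, t in zip(eta, T))         # dot product eta . T(x)
    return h * math.exp(dot - A)

def normal_pdf_direct(x: float, mu: float, sigma: float) -> float:
    """Usual form exp(-(x - mu)^2 / (2 sigma^2)) / (sigma * sqrt(2 pi))."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

if __name__ == "__main__":
    for x in (-1.0, 0.0, 0.5, 2.0):
        print(x, normal_pdf_expfam(x, 0.5, 1.5), normal_pdf_direct(x, 0.5, 1.5))
```

Writing the exponent as an explicit dot product mirrors the compact form of the definition; the two functions should agree up to floating-point rounding.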