Natural Exponential Family
   HOME

TheInfoList



OR:

In
probability Probability is the branch of mathematics concerning numerical descriptions of how likely an Event (probability theory), event is to occur, or how likely it is that a proposition is true. The probability of an event is a number between 0 and ...
and
statistics Statistics (from German language, German: ''wikt:Statistik#German, Statistik'', "description of a State (polity), state, a country") is the discipline that concerns the collection, organization, analysis, interpretation, and presentation of ...
, a natural exponential family (NEF) is a class of
probability distribution In probability theory and statistics, a probability distribution is the mathematical function that gives the probabilities of occurrence of different possible outcomes for an experiment. It is a mathematical description of a random phenomenon i ...
s that is a special case of an
exponential family In probability and statistics, an exponential family is a parametric set of probability distributions of a certain form, specified below. This special form is chosen for mathematical convenience, including the enabling of the user to calculate ...
(EF).


Definition


Univariate case

The natural exponential families (NEF) are a subset of the
exponential families In probability and statistics, an exponential family is a parametric set of probability distributions of a certain form, specified below. This special form is chosen for mathematical convenience, including the enabling of the user to calculate ...
. A NEF is an exponential family in which the natural parameter ''η'' and the natural statistic ''T''(''x'') are both the identity. A distribution in an
exponential family In probability and statistics, an exponential family is a parametric set of probability distributions of a certain form, specified below. This special form is chosen for mathematical convenience, including the enabling of the user to calculate ...
with parameter ''θ'' can be written with
probability density function In probability theory, a probability density function (PDF), or density of a continuous random variable, is a function whose value at any given sample (or point) in the sample space (the set of possible values taken by the random variable) can ...
(PDF) : f_X(x\mid \theta) = h(x)\ \exp\Big(\ \eta(\theta) T(x) - A(\theta)\ \Big) \,\! , where h(x) and A(\theta) are known functions. A distribution in a natural exponential family with parameter θ can thus be written with PDF : f_X(x\mid \theta) = h(x)\ \exp\Big(\ \theta x - A(\theta)\ \Big) \,\! . [Note that slightly different notation is used by the originator of the NEF, Carl Morris.Morris C. (2006) "Natural exponential families", ''Encyclopedia of Statistical Sciences''. Morris uses ''ω'' instead of ''η'' and ''ψ'' instead of ''A''.]


General multivariate case

Suppose that \mathbf \in \mathcal \subseteq \mathbb^p, then a natural exponential family of order ''p'' has density or mass function of the form: : f_X(\mathbf \mid \boldsymbol\theta) = h(\mathbf)\ \exp\Big(\boldsymbol\theta^ \mathbf - A(\boldsymbol\theta)\ \Big) \,\! , where in this case the parameter \boldsymbol\theta \in \mathbb^p .


Moment and cumulant generating functions

A member of a natural exponential family has
moment generating function In probability theory and statistics, the moment-generating function of a real-valued random variable is an alternative specification of its probability distribution. Thus, it provides the basis of an alternative route to analytical results compare ...
(MGF) of the form :M_X(\mathbf) = \exp\Big(\ A(\boldsymbol\theta + \mathbf) - A(\boldsymbol\theta)\ \Big) \, . The
cumulant generating function In probability theory and statistics, the cumulants of a probability distribution are a set of quantities that provide an alternative to the '' moments'' of the distribution. Any two probability distributions whose moments are identical will have ...
is by definition the logarithm of the MGF, so it is :K_X(\mathbf) = A(\boldsymbol\theta + \mathbf) - A(\boldsymbol\theta) \, .


Examples

The five most important univariate cases are: *
normal distribution In statistics, a normal distribution or Gaussian distribution is a type of continuous probability distribution for a real-valued random variable. The general form of its probability density function is : f(x) = \frac e^ The parameter \mu ...
with known variance *
Poisson distribution In probability theory and statistics, the Poisson distribution is a discrete probability distribution that expresses the probability of a given number of events occurring in a fixed interval of time or space if these events occur with a known co ...
*
gamma distribution In probability theory and statistics, the gamma distribution is a two-parameter family of continuous probability distributions. The exponential distribution, Erlang distribution, and chi-square distribution are special cases of the gamma distri ...
with known shape parameter ''α'' (or ''k'' depending on notation set used) *
binomial distribution In probability theory and statistics, the binomial distribution with parameters ''n'' and ''p'' is the discrete probability distribution of the number of successes in a sequence of ''n'' independent experiments, each asking a yes–no quest ...
with known number of trials, ''n'' *
negative binomial distribution In probability theory and statistics, the negative binomial distribution is a discrete probability distribution that models the number of failures in a sequence of independent and identically distributed Bernoulli trials before a specified (non-r ...
with known r These five examples – Poisson, binomial, negative binomial, normal, and gamma – are a special subset of NEF, called NEF with quadratic
variance function In statistics, the variance function is a smooth function which depicts the variance of a random quantity as a function of its mean. The variance function is a measure of heteroscedasticity and plays a large role in many settings of statisti ...
(NEF-QVF) because the variance can be written as a quadratic function of the mean. NEF-QVF are discussed below. Distributions such as the
exponential Exponential may refer to any of several mathematical topics related to exponentiation, including: *Exponential function, also: **Matrix exponential, the matrix analogue to the above * Exponential decay, decrease at a rate proportional to value *Exp ...
, chi-squared,
Bernoulli Bernoulli can refer to: People *Bernoulli family of 17th and 18th century Swiss mathematicians: ** Daniel Bernoulli (1700–1782), developer of Bernoulli's principle **Jacob Bernoulli (1654–1705), also known as Jacques, after whom Bernoulli numbe ...
, and
geometric distribution In probability theory and statistics, the geometric distribution is either one of two discrete probability distributions: * The probability distribution of the number ''X'' of Bernoulli trials needed to get one success, supported on the set \; * ...
s are special cases of the above five distributions. Many common distributions are either NEF or can be related to the NEF. For example: the
chi-squared distribution In probability theory and statistics, the chi-squared distribution (also chi-square or \chi^2-distribution) with k degrees of freedom is the distribution of a sum of the squares of k independent standard normal random variables. The chi-squa ...
is a special case of the
gamma distribution In probability theory and statistics, the gamma distribution is a two-parameter family of continuous probability distributions. The exponential distribution, Erlang distribution, and chi-square distribution are special cases of the gamma distri ...
. The
Bernoulli distribution In probability theory and statistics, the Bernoulli distribution, named after Swiss mathematician Jacob Bernoulli,James Victor Uspensky: ''Introduction to Mathematical Probability'', McGraw-Hill, New York 1937, page 45 is the discrete probabil ...
is a
binomial distribution In probability theory and statistics, the binomial distribution with parameters ''n'' and ''p'' is the discrete probability distribution of the number of successes in a sequence of ''n'' independent experiments, each asking a yes–no quest ...
with ''n'' = 1 trial. The
exponential distribution In probability theory and statistics, the exponential distribution is the probability distribution of the time between events in a Poisson point process, i.e., a process in which events occur continuously and independently at a constant average ...
is a gamma distribution with shape parameter α = 1 (or ''k'' = 1 ). Some exponential family distributions are not NEF. The
lognormal In probability theory, a log-normal (or lognormal) distribution is a continuous probability distribution of a random variable whose logarithm is normally distributed. Thus, if the random variable is log-normally distributed, then has a normal ...
and
Beta distribution In probability theory and statistics, the beta distribution is a family of continuous probability distributions defined on the interval , 1in terms of two positive parameters, denoted by ''alpha'' (''α'') and ''beta'' (''β''), that appear as ...
are in the exponential family, but not the natural exponential family. The parameterization of most of the above distributions has been written differently from the parameterization commonly used in textbooks and the above linked pages. For example, the above parameterization differs from the parameterization in the linked article in the Poisson case. The two parameterizations are related by \theta = \log(\lambda) , where λ is the mean parameter, and so that the density may be written as :f(k;\theta) = \frac \exp\Big(\ \theta\ k - \exp(\theta)\ \Big) \ , for \theta \in \mathbb, so :h(k) = \frac, \text A(\theta) = \exp(\theta)\ . This alternative parameterization can greatly simplify calculations in
mathematical statistics Mathematical statistics is the application of probability theory, a branch of mathematics, to statistics, as opposed to techniques for collecting statistical data. Specific mathematical techniques which are used for this include mathematical an ...
. For example, in
Bayesian inference Bayesian inference is a method of statistical inference in which Bayes' theorem is used to update the probability for a hypothesis as more evidence or information becomes available. Bayesian inference is an important technique in statistics, a ...
, a
posterior probability distribution The posterior probability is a type of conditional probability that results from updating the prior probability with information summarized by the likelihood via an application of Bayes' rule. From an epistemological perspective, the posterior p ...
is calculated as the product of two distributions. Normally this calculation requires writing out the probability distribution functions (PDF) and integrating; with the above parameterization, however, that calculation can be avoided. Instead, relationships between distributions can be abstracted due to the properties of the NEF described below. An example of the multivariate case is the
multinomial distribution In probability theory, the multinomial distribution is a generalization of the binomial distribution. For example, it models the probability of counts for each side of a ''k''-sided dice rolled ''n'' times. For ''n'' independent trials each of w ...
with known number of trials.


Properties

The properties of the natural exponential family can be used to simplify calculations involving these distributions.


Univariate case

1. The cumulants of an NEF can be calculated as derivatives of the NEF's cumulant generating function. The nth cumulant is the nth derivative of the cumulant generating function with respect to ''t'' evaluated at ''t'' = 0. The
cumulant generating function In probability theory and statistics, the cumulants of a probability distribution are a set of quantities that provide an alternative to the '' moments'' of the distribution. Any two probability distributions whose moments are identical will have ...
is :K_X(t) = A(\theta + t) - A(\theta) \, . The first cumulant is : \kappa_1 = K_X'(t)\Big, _ = \left. \frac A(\theta + t) \_ \, . The mean is the first moment and always equal to the first cumulant, so : \mu_1 = \kappa_1 = \operatorname = K'_X(0) = A'(\theta)\, . The variance is always the second cumulant, and it is always related to the first and second moments by : \operatorname = \kappa_2 = \mu_2 - \mu_1^2 \, , so that : \operatorname = K''_X(0) = A''(\theta) \, . Likewise, the ''n''th cumulant is : \kappa_n = A^(\theta) \, . 2. Natural exponential families (NEF) are closed under convolution. Given
independent identically distributed In probability theory and statistics, a collection of random variables is independent and identically distributed if each random variable has the same probability distribution as the others and all are mutually independent. This property is usual ...
(iid) X_1,\ldots,X_n with distribution from an NEF, then \sum_^n X_i\, is an NEF, although not necessarily the original NEF. This follows from the properties of the cumulant generating function. 3. The
variance function In statistics, the variance function is a smooth function which depicts the variance of a random quantity as a function of its mean. The variance function is a measure of heteroscedasticity and plays a large role in many settings of statisti ...
for random variables with an NEF distribution can be written in terms of the mean. :\operatorname(X) = V(\mu). 4. The first two moments of a NEF distribution uniquely specify the distribution within that family of distributions. : X \sim \operatorname mu, V(\mu).


Multivariate case

In the multivariate case, the mean vector and covariance matrix are : \operatorname = \nabla A(\boldsymbol\theta) \text \operatorname = \nabla \nabla^ A(\boldsymbol\theta)\, , where\nabla is the
gradient In vector calculus, the gradient of a scalar-valued differentiable function of several variables is the vector field (or vector-valued function) \nabla f whose value at a point p is the "direction and rate of fastest increase". If the gradi ...
and \nabla \nabla^ is the
Hessian matrix In mathematics, the Hessian matrix or Hessian is a square matrix of second-order partial derivatives of a scalar-valued function, or scalar field. It describes the local curvature of a function of many variables. The Hessian matrix was developed ...
.


Natural exponential families with quadratic variance functions (NEF-QVF)

A special case of the natural exponential families are those with quadratic variance functions. Six NEFs have quadratic variance functions (QVF) in which the variance of the distribution can be written as a quadratic function of the mean. These are called NEF-QVF. The properties of these distributions were first described by Carl Morris. : \operatorname(X) = V(\mu) = \nu_0 + \nu_1 \mu + \nu_2 \mu^2.


The six NEF-QVFs

The six NEF-QVF are written here in increasing complexity of the relationship between variance and mean. 1. The normal distribution with fixed variance X \sim N(\mu, \sigma^2) is NEF-QVF because the variance is constant. The variance can be written \operatorname(X) = V(\mu) = \sigma^2, so variance is a degree 0 function of the mean. 2. The Poisson distribution X \sim \operatorname(\mu) is NEF-QVF because all Poisson distributions have variance equal to the mean \operatorname(X) = V(\mu) = \mu, so variance is a linear function of the mean. 3. The Gamma distribution X \sim \operatorname(r, \lambda) is NEF-QVF because the mean of the Gamma distribution is \mu = r\lambda and the variance of the Gamma distribution is \operatorname(X) = V(\mu) = \mu^2/r, so the variance is a quadratic function of the mean. 4. The binomial distribution X \sim \operatorname(n, p) is NEF-QVF because the mean is \mu = np and the variance is \operatorname(X) = np(1-p) which can be written in terms of the mean as V(X) = - np^2 + np = -\mu^2/n + \mu. 5. The negative binomial distribution X \sim \operatorname(n, p) is NEF-QVF because the mean is \mu = np/(1-p) and the variance is V(\mu) = \mu^2/n + \mu. 6. The (not very famous) distribution generated by the generalized
hyperbolic secant distribution In probability theory and statistics, the hyperbolic secant distribution is a continuous probability distribution whose probability density function and characteristic function are proportional to the hyperbolic secant function. The hyperbolic sec ...
(NEF-GHS) has V(\mu) = \mu^2/n +n and \mu > 0.


Properties of NEF-QVF

The properties of NEF-QVF can simplify calculations that use these distributions. 1. Natural exponential families with quadratic variance functions (NEF-QVF) are closed under convolutions of a linear transformation. That is, a convolution of a linear transformation of an NEF-QVF is also an NEF-QVF, although not necessarily the original one. Given
independent identically distributed In probability theory and statistics, a collection of random variables is independent and identically distributed if each random variable has the same probability distribution as the others and all are mutually independent. This property is usual ...
(iid) X_1,\ldots,X_n with distribution from a NEF-QVF. A convolution of a linear transformation of an NEF-QVF is also an NEF-QVF. Let Y = \sum_^n (X_i - b)/c \, be the convolution of a linear transformation of ''X''. The mean of ''Y'' is \mu^* = n(\mu - b)/c \,. The variance of ''Y'' can be written in terms of the variance function of the original NEF-QVF. If the original NEF-QVF had variance function : \operatorname(X) = V(\mu) = \nu_0 + \nu_1 \mu + \nu_2 \mu^2, then the new NEF-QVF has variance function : \operatorname(Y) = V^*(\mu^*) = \nu^*_0 + \nu^*_1 \mu + \nu^*_2 \mu^2 , where : \nu^*_0 = nV(b)/c^2 \, , : \nu^*_1 = V'(b)/c \, , : \nu^*_2/n = \nu_2/n \, . 2. Let X_1 and X_2 be independent NEF with the same parameter θ and let Y = X_1 + X_2 . Then the conditional distribution of X_1 given Y has quadratic variance in Y if and only if X_1 and X_2 are NEF-QVF. Examples of such conditional distributions are the
normal Normal(s) or The Normal(s) may refer to: Film and television * ''Normal'' (2003 film), starring Jessica Lange and Tom Wilkinson * ''Normal'' (2007 film), starring Carrie-Anne Moss, Kevin Zegers, Callum Keith Rennie, and Andrew Airlie * ''Norma ...
,
binomial Binomial may refer to: In mathematics *Binomial (polynomial), a polynomial with two terms * Binomial coefficient, numbers appearing in the expansions of powers of binomials *Binomial QMF, a perfect-reconstruction orthogonal wavelet decomposition ...
,
beta Beta (, ; uppercase , lowercase , or cursive ; grc, βῆτα, bē̂ta or ell, βήτα, víta) is the second letter of the Greek alphabet. In the system of Greek numerals, it has a value of 2. In Modern Greek, it represents the voiced labiod ...
, hypergeometric and
geometric distribution In probability theory and statistics, the geometric distribution is either one of two discrete probability distributions: * The probability distribution of the number ''X'' of Bernoulli trials needed to get one success, supported on the set \; * ...
s, which are not all NEF-QVF. 3. NEF-QVF have
conjugate prior distribution In Bayesian probability theory, if the posterior distribution p(\theta \mid x) is in the same probability distribution family as the prior probability distribution p(\theta), the prior and posterior are then called conjugate distributions, and th ...
s on μ in the Pearson system of distributions (also called the
Pearson distribution The Pearson distribution is a family of continuous probability distribution, continuous probability distributions. It was first published by Karl Pearson in 1895 and subsequently extended by him in 1901 and 1916 in a series of articles on biostat ...
although the Pearson system of distributions is actually a family of distributions rather than a single distribution.) Examples of conjugate prior distributions of NEF-QVF distributions are the
normal Normal(s) or The Normal(s) may refer to: Film and television * ''Normal'' (2003 film), starring Jessica Lange and Tom Wilkinson * ''Normal'' (2007 film), starring Carrie-Anne Moss, Kevin Zegers, Callum Keith Rennie, and Andrew Airlie * ''Norma ...
,
gamma Gamma (uppercase , lowercase ; ''gámma'') is the third letter of the Greek alphabet. In the system of Greek numerals it has a value of 3. In Ancient Greek, the letter gamma represented a voiced velar stop . In Modern Greek, this letter re ...
, reciprocal gamma,
beta Beta (, ; uppercase , lowercase , or cursive ; grc, βῆτα, bē̂ta or ell, βήτα, víta) is the second letter of the Greek alphabet. In the system of Greek numerals, it has a value of 2. In Modern Greek, it represents the voiced labiod ...
, F-, and t- distributions. Again, these conjugate priors are not all NEF-QVF. 4. If X \mid \mu has an NEF-QVF distribution and μ has a conjugate prior distribution then the marginal distributions are well-known distributions. These properties together with the above notation can simplify calculations in
mathematical statistics Mathematical statistics is the application of probability theory, a branch of mathematics, to statistics, as opposed to techniques for collecting statistical data. Specific mathematical techniques which are used for this include mathematical an ...
that would normally be done using complicated calculations and calculus.


See also

*
Generalized linear model In statistics, a generalized linear model (GLM) is a flexible generalization of ordinary linear regression. The GLM generalizes linear regression by allowing the linear model to be related to the response variable via a ''link function'' and b ...
*
Pearson distribution The Pearson distribution is a family of continuous probability distribution, continuous probability distributions. It was first published by Karl Pearson in 1895 and subsequently extended by him in 1901 and 1916 in a series of articles on biostat ...
*
Sheffer sequence In mathematics, a Sheffer sequence or poweroid is a polynomial sequence, i.e., a sequence of polynomials in which the index of each polynomial equals its degree, satisfying conditions related to the umbral calculus in combinatorics. They are ...
*
Orthogonal polynomials In mathematics, an orthogonal polynomial sequence is a family of polynomials such that any two different polynomials in the sequence are orthogonality, orthogonal to each other under some inner product. The most widely used orthogonal polynomial ...


References

* Morris C. (1982) ''Natural exponential families with quadratic variance functions: statistical theory''. Dept of mathematics, Institute of Statistics, University of Texas, Austin. {{ProbDistributions, families Exponentials Types of probability distributions