In probability and statistics, a compound probability distribution (also known as a mixture distribution or contagious distribution) is the probability distribution that results from assuming that a random variable is distributed according to some parametrized distribution, with (some of) the parameters of that distribution themselves being random variables. If the parameter is a scale parameter, the resulting mixture is also called a scale mixture. The compound distribution ("unconditional distribution") is the result of marginalizing (integrating) over the latent random variable(s) representing the parameter(s) of the parametrized distribution ("conditional distribution").

Definition

A compound probability distribution is the probability distribution that results from assuming that a random variable X is distributed according to some parametrized distribution F with an unknown parameter θ that is again distributed according to some other distribution G. The resulting distribution H is said to be the distribution that results from compounding F with G. The parameter's distribution G is also called the mixing distribution or latent distribution. Technically, the unconditional distribution H results from marginalizing over G, i.e., from integrating out the unknown parameter(s) θ. Its probability density function is given by:

p_H(x) = \int p_F(x \mid \theta)\, p_G(\theta)\, d\theta

The same formula applies analogously if some or all of the variables are vectors. From the above formula, one can see that a compound distribution is essentially a special case of a marginal distribution: the joint distribution of x and θ is given by

p(x, \theta) = p(x \mid \theta)\, p(\theta)

and the compound results as its marginal distribution:

p(x) = \int p(x, \theta)\, d\theta

If the domain of θ is discrete, the integral becomes a sum, and the compound distribution is a special case of a mixture distribution.
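As a numerical check of the density formula, the Python sketch below approximates p_H(x) with a midpoint rule for the illustrative choice F = N(θ, τ²) and G = N(μ, σ²), whose compound is the normal distribution N(μ, τ² + σ²) (see Examples and the mean and variance formulas below). The parameter values, the integration bounds and the helper names `normal_pdf` and `compound_density` are assumptions made only for this illustration.

```python
import math


def normal_pdf(x: float, mean: float, var: float) -> float:
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def compound_density(x, cond_pdf, mix_pdf, lo, hi, n=20_000):
    """Approximate p_H(x) = ∫ p_F(x | theta) p_G(theta) d theta over [lo, hi]
    with the midpoint rule using n subintervals."""
    h = (hi - lo) / n
    total = 0.0
    for i in range(n):
        theta = lo + (i + 0.5) * h
        total += cond_pdf(x, theta) * mix_pdf(theta)
    return total * h


# Illustrative (assumed) choice: F = N(theta, tau2), G = N(mu, sigma2).
mu, sigma2, tau2 = 1.0, 4.0, 0.5
half_width = 12.0 * math.sqrt(sigma2)  # G has negligible mass outside mu +/- 12 sd

x = 2.3
approx = compound_density(
    x,
    cond_pdf=lambda value, theta: normal_pdf(value, theta, tau2),
    mix_pdf=lambda theta: normal_pdf(theta, mu, sigma2),
    lo=mu - half_width,
    hi=mu + half_width,
)
exact = normal_pdf(x, mu, tau2 + sigma2)
print(f"numerical p_H({x}) = {approx:.10f}, closed form = {exact:.10f}")
```

The conditional and mixing densities are passed in as functions, so the same integration helper can be reused for other pairs of F and G, provided the bounds cover essentially all of the mixing mass.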

Properties

General

The compound distribution H depends on the specific expression of each distribution, as well as on which parameter of F is distributed according to the distribution G, and the parameters of H include any parameters of G that are not marginalized, or integrated, out. The support of H is the same as that of F, and if the latter is a two-parameter distribution parameterized with the mean and variance, some general properties exist.

Mean and variance

The compound distribution's first two moments are given by the law of total expectation and the law of total variance:

\operatorname{E}_H[X] = \operatorname{E}_G\bigl[\operatorname{E}_F[X \mid \theta]\bigr]

\operatorname{Var}_H(X) = \operatorname{E}_G\bigl[\operatorname{Var}_F(X \mid \theta)\bigr] + \operatorname{Var}_G\bigl(\operatorname{E}_F[X \mid \theta]\bigr)

If the mean of F is distributed as G, which in turn has mean μ and variance σ², the expressions above imply

\operatorname{E}_H[X] = \operatorname{E}_G[\theta] = \mu

and

\operatorname{Var}_H(X) = \operatorname{Var}_F(X \mid \theta) + \operatorname{Var}_G(\theta) = \tau^2 + \sigma^2,

where τ² is the variance of F, assumed not to depend on θ.
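The moment formulas can be checked by simulation. The Python sketch below draws θ from G = N(μ, σ²) and then X from F = N(θ, τ²) (two-stage, or ancestral, sampling) and compares the sample mean and variance with μ and τ² + σ². The normal choices for F and G, the parameter values, the seed and the sample size are assumptions for illustration; the sample moments only approximate the exact ones.

```python
import random


def sample_compound(n, mu, sigma2, tau2, rng):
    """Two-stage sampling: theta ~ G = N(mu, sigma2), then X | theta ~ F = N(theta, tau2)."""
    sigma, tau = sigma2 ** 0.5, tau2 ** 0.5
    draws = []
    for _ in range(n):
        theta = rng.gauss(mu, sigma)          # latent parameter drawn from G
        draws.append(rng.gauss(theta, tau))   # observation drawn from F given theta
    return draws


rng = random.Random(12345)                    # fixed seed for reproducibility
mu, sigma2, tau2 = 1.0, 4.0, 0.5              # illustrative parameters
xs = sample_compound(200_000, mu, sigma2, tau2, rng)

sample_mean = sum(xs) / len(xs)
sample_var = sum((x - sample_mean) ** 2 for x in xs) / (len(xs) - 1)
print(f"mean:     {sample_mean:.3f}  (theory: mu = {mu})")
print(f"variance: {sample_var:.3f}  (theory: tau^2 + sigma^2 = {tau2 + sigma2})")
```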

Proof

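Let F and G be probability distributions parameterized with mean and variance as

\begin{align}
x &\sim \mathcal{F}(\theta, \tau^2) \\
\theta &\sim \mathcal{G}(\mu, \sigma^2)
\end{align}

Then, denoting the probability density functions as f(x | θ) = p_F(x | θ) and g(θ) = p_G(θ), respectively, and letting h(x) be the probability density of H (with ∫_F and ∫_G denoting integration over the supports of F and G), we have

\begin{align}
\operatorname{E}_H[X] = \int_F x\, h(x)\, dx &= \int_F x \int_G f(x \mid \theta)\, g(\theta)\, d\theta\, dx \\
&= \int_G \int_F x\, f(x \mid \theta)\, dx\; g(\theta)\, d\theta \\
&= \int_G \operatorname{E}_F[X \mid \theta]\, g(\theta)\, d\theta
\end{align}

and we have from the parameterizations \mathcal{F} and \mathcal{G} that

\begin{align}
\operatorname{E}_F[X \mid \theta] &= \int_F x\, f(x \mid \theta)\, dx = \theta \\
\operatorname{E}_G[\theta] &= \int_G \theta\, g(\theta)\, d\theta = \mu
\end{align}

and therefore the mean of the compound distribution is E_H[X] = μ, as per the expression for its first moment above. The variance of H is given by E_H[X²] − (E_H[X])², and

\begin{align}
\operatorname{E}_H[X^2] = \int_F x^2 h(x)\, dx &= \int_F x^2 \int_G f(x \mid \theta)\, g(\theta)\, d\theta\, dx \\
&= \int_G g(\theta) \int_F x^2 f(x \mid \theta)\, dx\; d\theta \\
&= \int_G g(\theta)(\tau^2 + \theta^2)\, d\theta \\
&= \tau^2 \int_G g(\theta)\, d\theta + \int_G g(\theta)\, \theta^2\, d\theta \\
&= \tau^2 + (\sigma^2 + \mu^2),
\end{align}

given the facts that

\int_F x^2 f(x \mid \theta)\, dx = \operatorname{E}_F[X^2 \mid \theta] = \operatorname{Var}_F(X \mid \theta) + (\operatorname{E}_F[X \mid \theta])^2 = \tau^2 + \theta^2

and

\int_G \theta^2 g(\theta)\, d\theta = \operatorname{E}_G[\theta^2] = \operatorname{Var}_G(\theta) + (\operatorname{E}_G[\theta])^2 = \sigma^2 + \mu^2.

Finally we get

\begin{align}
\operatorname{Var}_H(X) &= \operatorname{E}_H[X^2] - (\operatorname{E}_H[X])^2 \\
&= \tau^2 + \sigma^2 + \mu^2 - \mu^2 \\
&= \tau^2 + \sigma^2
\end{align}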

Applications

Testing

Distributions of common test statistics result as compound distributions under their null hypothesis, for example in Student's t-test (where the test statistic is the ratio of a standard normal random variable and the square root of an independent chi-squared random variable divided by its degrees of freedom), or in the F-test (where the test statistic is the ratio of two independent chi-squared random variables, each divided by its degrees of freedom).
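The t-test case can be made concrete: if Z is standard normal and V is an independent chi-squared variable with ν degrees of freedom, then T = Z / sqrt(V/ν) is, conditionally on V, normal with mean 0 and variance ν/V, so T follows a normal distribution compounded over a random variance (compare the Gaussian scale mixtures under Examples). The Python sketch below simulates T in exactly this two-stage way, using the fact that a chi-squared variable with ν degrees of freedom is a gamma variable with shape ν/2 and scale 2, and compares the sample variance with the Student's t variance ν/(ν − 2). The value of ν, the seed and the sample size are illustrative assumptions.

```python
import math
import random


def sample_t_as_compound(n, nu, rng):
    """Simulate Student's t with nu degrees of freedom as a compound:
    V ~ chi-squared(nu), then T | V ~ N(0, nu / V)."""
    draws = []
    for _ in range(n):
        v = rng.gammavariate(nu / 2.0, 2.0)  # chi-squared(nu) = Gamma(shape nu/2, scale 2)
        draws.append(rng.gauss(0.0, math.sqrt(nu / v)))  # normal with random variance nu / v
    return draws


rng = random.Random(2024)
nu = 10
ts = sample_t_as_compound(200_000, nu, rng)
mean_t = sum(ts) / len(ts)
var_t = sum((t - mean_t) ** 2 for t in ts) / (len(ts) - 1)
print(f"sample variance {var_t:.3f}, Student's t variance {nu / (nu - 2):.3f}")
```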

Overdispersion modeling

Compound distributions are useful for modeling outcomes exhibiting overdispersion, i.e., a greater amount of variability than would be expected under a certain model. For example, count data are commonly modeled using the Poisson distribution, whose variance is equal to its mean. The distribution may be generalized by allowing for variability in its rate parameter, implemented via a gamma distribution, which results in a marginal negative binomial distribution. This distribution is similar in shape to the Poisson distribution, but it allows for larger variances. Similarly, a binomial distribution may be generalized to allow for additional variability by compounding it with a beta distribution for its success probability parameter, which results in a beta-binomial distribution.
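For the Poisson-gamma case the compounding integral has a closed form: if X | λ ~ Poisson(λ) and λ ~ Gamma(shape r, scale s), then P(X = k) = Γ(k + r) / (k! Γ(r)) · p^k (1 − p)^r with p = s/(1 + s), a negative binomial distribution with mean rs and variance rs + rs² (by the law of total variance, the extra term rs² is Var(λ)). The Python sketch below compares this closed form with direct numerical integration and shows the variance exceeding the mean. The shape-scale parameterization of the gamma distribution, the values r = 3 and s = 2, and the function names are assumptions for illustration.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """Poisson probability P(X = k) for rate lam > 0."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))


def gamma_pdf(lam: float, shape: float, scale: float) -> float:
    """Gamma density with the given shape and scale, evaluated at lam > 0."""
    return math.exp((shape - 1.0) * math.log(lam) - lam / scale
                    - math.lgamma(shape) - shape * math.log(scale))


def compound_pmf_numeric(k, shape, scale, upper=100.0, n=20_000):
    """P(X = k) = ∫ Poisson(k | lam) Gamma(lam | shape, scale) d lam,
    approximated with the midpoint rule on (0, upper)."""
    h = upper / n
    return h * sum(
        poisson_pmf(k, (i + 0.5) * h) * gamma_pdf((i + 0.5) * h, shape, scale)
        for i in range(n)
    )


def neg_binomial_pmf(k, shape, scale):
    """Closed-form compound: Gamma(k + r) / (k! Gamma(r)) * p^k * (1 - p)^r, p = s / (1 + s)."""
    p = scale / (1.0 + scale)
    return math.exp(math.lgamma(k + shape) - math.lgamma(k + 1) - math.lgamma(shape)
                    + k * math.log(p) + shape * math.log1p(-p))


shape, scale = 3.0, 2.0  # illustrative gamma mixing distribution (r = 3, s = 2)
for k in range(6):
    print(k, f"{compound_pmf_numeric(k, shape, scale):.8f}", f"{neg_binomial_pmf(k, shape, scale):.8f}")

# Moments of the compound from its pmf (the tail beyond k = 400 is negligible here).
probs = [neg_binomial_pmf(k, shape, scale) for k in range(400)]
mean = sum(k * q for k, q in enumerate(probs))
variance = sum((k - mean) ** 2 * q for k, q in enumerate(probs))
print(f"mean {mean:.4f} vs r*s = {shape * scale}")
print(f"variance {variance:.4f} vs r*s + r*s^2 = {shape * scale + shape * scale ** 2}")
```

Unlike the Poisson distribution, the compound's variance exceeds its mean, which is exactly the overdispersion described above.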

Bayesian inference

Besides ubiquitous marginal distributions that may be seen as special cases of compound distributions, in Bayesian inference compound distributions arise when, in the notation above, F represents the distribution of future observations and G is the posterior distribution of the parameters of F, given the information in a set of observed data. This gives a posterior predictive distribution. Correspondingly, for the prior predictive distribution, F is the distribution of a new data point while G is the prior distribution of the parameters.
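A minimal worked case, assuming a Bernoulli model for each observation and a Beta(a, b) prior for its success probability (a conjugate pair chosen for illustration): after s successes in n trials the posterior G is Beta(a + s, b + n − s), and compounding the Bernoulli distribution of the next observation with this posterior gives a posterior predictive success probability equal to the posterior mean, (a + s)/(a + b + n), in line with the Bernoulli item listed under Examples. The Python sketch below checks this by numerical integration; the prior parameters and the data counts are hypothetical.

```python
import math


def beta_pdf(p: float, a: float, b: float) -> float:
    """Beta(a, b) density at p in (0, 1)."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1.0) * math.log(p) + (b - 1.0) * math.log1p(-p))


def posterior_predictive_success(a, b, successes, trials, n=20_000):
    """P(next observation = 1 | data): the Bernoulli success probability p
    integrated against the Beta(a + s, b + trials - s) posterior (midpoint rule)."""
    post_a, post_b = a + successes, b + trials - successes
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        p = (i + 0.5) * h
        total += p * beta_pdf(p, post_a, post_b)  # P(x = 1 | p) * g(p)
    return total * h


a, b = 2.0, 2.0            # hypothetical Beta prior
successes, trials = 7, 10  # hypothetical observed data
numeric = posterior_predictive_success(a, b, successes, trials)
closed_form = (a + successes) / (a + b + trials)
print(f"numerical {numeric:.8f}, closed form {closed_form:.8f}")
```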

Convolution

Convolution of probability distributions (to derive the probability distribution of a sum of independent random variables) may also be seen as a special case of compounding; here the sum's distribution essentially results from considering one summand as a random location parameter for the other summand.
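In the discrete case this is easy to see with exact arithmetic. Treating the value θ of one fair die as a random location parameter, the sum S = θ + D has conditional distribution P(S = s | θ) = P(D = s − θ), and summing over θ (the discrete analogue of integrating it out) yields the convolution. The Python sketch below uses two fair six-sided dice, an illustrative choice; the function name is hypothetical.

```python
from fractions import Fraction


def convolve_by_compounding(pmf_d, pmf_theta):
    """Distribution of S = theta + D for independent theta and D:
    P(S = s) = sum over theta of P(D = s - theta) * P(theta)."""
    result = {}
    for theta, p_theta in pmf_theta.items():
        for d, p_d in pmf_d.items():
            s = theta + d  # theta shifts the location of D
            result[s] = result.get(s, Fraction(0)) + p_d * p_theta
    return dict(sorted(result.items()))


die = {face: Fraction(1, 6) for face in range(1, 7)}
total = convolve_by_compounding(die, die)
for s, p in total.items():
    print(s, p)
```

Using `Fraction` keeps every probability exact, so the familiar triangular distribution of the sum of two dice appears without rounding.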

Computation

Compound distributions derived from exponential family distributions often have a closed form. If analytical integration is not possible, numerical methods may be necessary. Compound distributions may relatively easily be investigated using Monte Carlo methods, i.e., by generating random samples. It is often easy to generate random numbers from the distributions p(θ) as well as p(x | θ) and then utilize these to perform collapsed Gibbs sampling to generate samples from p(x). A compound distribution may usually also be approximated to a sufficient degree by a mixture distribution using a finite number of mixture components, allowing one to derive an approximate density, distribution function, etc. Parameter estimation (maximum-likelihood or maximum-a-posteriori estimation) within a compound distribution model may sometimes be simplified by utilizing the EM algorithm.
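The finite-mixture approximation can be sketched by replacing the mixing distribution G with K equally weighted atoms placed at its (k − 1/2)/K quantiles, k = 1, ..., K, which turns the integral into an average of K conditional densities. The Python sketch below does this for the illustrative pair F = N(θ, τ²), G = N(μ, σ²), whose exact compound is N(μ, τ² + σ²). The quantile-based placement of the components, the values of K and the parameters are assumptions; other discretizations (for example Gauss-Hermite quadrature for a normal G) would also work.

```python
from statistics import NormalDist


def mixture_approx_density(x, mu, sigma2, tau2, k=200):
    """Approximate p_H(x) for F = N(theta, tau2), G = N(mu, sigma2) by a
    K-component mixture: atoms theta_j at the (j + 1/2)/K quantiles of G,
    each with weight 1/K."""
    mixing = NormalDist(mu, sigma2 ** 0.5)
    atoms = [mixing.inv_cdf((j + 0.5) / k) for j in range(k)]
    return sum(NormalDist(theta, tau2 ** 0.5).pdf(x) for theta in atoms) / k


mu, sigma2, tau2 = 1.0, 4.0, 0.5         # illustrative parameters
exact_compound = NormalDist(mu, (tau2 + sigma2) ** 0.5)
x = 2.3
for k in (5, 20, 200):
    approx = mixture_approx_density(x, mu, sigma2, tau2, k)
    print(f"K = {k:3d}: mixture {approx:.6f}, exact {exact_compound.pdf(x):.6f}")
```

The approximation error shrinks as K grows; the resulting finite mixture can then also be used to evaluate an approximate distribution function.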

Examples

* Gaussian scale mixtures:
** Compounding a normal distribution with variance distributed according to an inverse gamma distribution (or equivalently, with precision distributed as a gamma distribution) yields a non-standardized Student's t-distribution. This distribution has the same symmetrical shape as a normal distribution with the same central point, but has greater variance and heavy tails.
** Compounding a Gaussian (or normal) distribution with variance distributed according to an exponential distribution (or with standard deviation according to a Rayleigh distribution) yields a Laplace distribution. More generally, compounding a Gaussian (or normal) distribution with variance distributed according to a gamma distribution yields a variance-gamma distribution.
** Compounding a Gaussian distribution with variance distributed according to an exponential distribution whose rate parameter is itself distributed according to a gamma distribution yields a normal-exponential-gamma distribution. (This involves two compounding stages. The variance itself then follows a Lomax distribution; see below.)
** Compounding a Gaussian distribution with standard deviation distributed according to a (standard) inverse uniform distribution yields a slash distribution.
** Compounding a Gaussian (normal) distribution with a Kolmogorov distribution yields a logistic distribution.
* Other Gaussian mixtures:
** Compounding a Gaussian distribution with mean distributed according to another Gaussian distribution yields (again) a Gaussian distribution.
** Compounding a Gaussian distribution with mean distributed according to a shifted exponential distribution yields an exponentially modified Gaussian distribution.
* Compounding a Bernoulli distribution with probability of success p distributed according to a distribution X that has a defined expected value yields a Bernoulli distribution with success probability E[X]. An interesting consequence is that the dispersion of X does not influence the dispersion of the resulting compound distribution.
* Compounding a binomial distribution with probability of success distributed according to a beta distribution yields a beta-binomial distribution. It possesses three parameters: n (the number of trials) from the binomial distribution and the shape parameters α and β from the beta distribution.
* Compounding a multinomial distribution with probability vector distributed according to a Dirichlet distribution yields a Dirichlet-multinomial distribution.
* Compounding a Poisson distribution with rate parameter distributed according to a gamma distribution yields a negative binomial distribution.
* Compounding a Poisson distribution with rate parameter distributed according to an exponential distribution yields a geometric distribution.
* Compounding an exponential distribution with its rate parameter distributed according to a gamma distribution yields a Lomax distribution.
* Compounding a gamma distribution with inverse scale parameter distributed according to another gamma distribution yields a three-parameter beta prime distribution.
* Compounding a half-normal distribution with its scale parameter distributed according to a Rayleigh distribution yields an exponential distribution. This follows immediately from the Laplace distribution resulting as a normal scale mixture; see above. The roles of conditional and mixing distributions may also be exchanged here; consequently, compounding a Rayleigh distribution with its scale parameter distributed according to a half-normal distribution also yields an exponential distribution.
* A Gamma(k = 2, θ)-distributed random variable whose scale parameter θ is itself uniformly distributed on an interval (0, c) marginally has an exponential distribution with mean c.
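Two of the results listed above can be checked quickly by simulation. The Python sketch below (a) compounds a zero-mean normal distribution with an exponentially distributed variance V with mean 2b², which should give a Laplace distribution with scale b, so that E|X| = b and E[X²] = 2b²; and (b) draws a Gamma(k = 2, θ) variable whose scale θ is uniform on (0, c], which should be exponential with mean c and P(X > c) = e⁻¹. The exponential-variance parameterization (mean 2b²), the values b = 1.5 and c = 3, the seed and the sample size are assumptions chosen for illustration.

```python
import math
import random

rng = random.Random(7)
n = 200_000

# (a) Normal with exponentially distributed variance -> Laplace(0, b).
b = 1.5
laplace_draws = []
for _ in range(n):
    v = rng.expovariate(1.0 / (2.0 * b * b))  # variance V with mean 2 b^2
    laplace_draws.append(rng.gauss(0.0, math.sqrt(v)))
mean_abs = sum(abs(x) for x in laplace_draws) / n
second_moment = sum(x * x for x in laplace_draws) / n

# (b) Gamma(k = 2, scale theta) with theta ~ Uniform(0, c] -> Exponential with mean c.
c = 3.0
exp_draws = []
for _ in range(n):
    theta = c * (1.0 - rng.random())  # uniform on (0, c], avoids theta = 0
    exp_draws.append(rng.gammavariate(2.0, theta))
exp_mean = sum(exp_draws) / n
tail = sum(x > c for x in exp_draws) / n  # exponential with mean c: P(X > c) = e^-1

print(f"Laplace:     E|X| ~ {mean_abs:.3f} (b = {b}), E[X^2] ~ {second_moment:.3f} (2b^2 = {2 * b * b})")
print(f"Exponential: mean ~ {exp_mean:.3f} (c = {c}), P(X > c) ~ {tail:.3f} (e^-1 = {math.exp(-1):.3f})")
```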

Similar terms

The notion of "compound distribution" as used, e.g., in the definition of a compound Poisson distribution or compound Poisson process is different from the definition found in this article. The meaning in this article corresponds to what is used in, e.g., Bayesian hierarchical modeling. The special case of compound probability distributions in which the parametrized distribution F is the Poisson distribution is also called a mixed Poisson distribution.
