
In probability theory and statistics, the normal-gamma distribution (or Gaussian-gamma distribution) is a bivariate four-parameter family of continuous probability distributions. It is the conjugate prior of a normal distribution with unknown mean and precision.


Definition

For a pair of
random variables A random variable (also called random quantity, aleatory variable, or stochastic variable) is a mathematical formalization of a quantity or object which depends on random events. It is a mapping or a function from possible outcomes (e.g., the po ...
, (''X'',''T''), suppose that the
conditional distribution In probability theory and statistics, given two jointly distributed random variables X and Y, the conditional probability distribution of Y given X is the probability distribution of Y when X is known to be a particular value; in some cases the co ...
of ''X'' given ''T'' is given by : X\mid T \sim N(\mu,1 /(\lambda T)) \,\! , meaning that the conditional distribution is a
normal distribution In statistics, a normal distribution or Gaussian distribution is a type of continuous probability distribution for a real-valued random variable. The general form of its probability density function is : f(x) = \frac e^ The parameter \mu ...
with
mean There are several kinds of mean in mathematics, especially in statistics. Each mean serves to summarize a given group of data, often to better understand the overall value (magnitude and sign) of a given data set. For a data set, the ''arithme ...
\mu and
precision Precision, precise or precisely may refer to: Science, and technology, and mathematics Mathematics and computing (general) * Accuracy and precision, measurement deviation from true value and its scatter * Significant figures, the number of digit ...
\lambda T — equivalently, with
variance In probability theory and statistics, variance is the expectation of the squared deviation of a random variable from its population mean or sample mean. Variance is a measure of dispersion, meaning it is a measure of how far a set of numbers ...
1 / (\lambda T) . Suppose also that the marginal distribution of ''T'' is given by :T \mid \alpha, \beta \sim \operatorname(\alpha,\beta), where this means that ''T'' has a
gamma distribution In probability theory and statistics, the gamma distribution is a two-parameter family of continuous probability distributions. The exponential distribution, Erlang distribution, and chi-square distribution are special cases of the gamma distri ...
. Here ''λ'', ''α'' and ''β'' are parameters of the joint distribution. Then (''X'',''T'') has a normal-gamma distribution, and this is denoted by : (X,T) \sim \operatorname(\mu,\lambda,\alpha,\beta).


Properties


Probability density function

The joint probability density function of (''X'',''T'') is

: f(x,\tau \mid \mu,\lambda,\alpha,\beta) = \frac{\beta^\alpha \sqrt{\lambda}}{\Gamma(\alpha)\sqrt{2\pi}} \, \tau^{\alpha-\frac{1}{2}} \, e^{-\beta\tau} \exp\left( -\frac{\lambda\tau(x-\mu)^2}{2} \right).
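
As a quick consistency check, the density can be coded directly from this formula and compared against its factorization into a conditional normal times a gamma marginal. A minimal Python sketch using NumPy and SciPy; `normal_gamma_pdf` is an illustrative helper, not a library function:

import numpy as np
from scipy.special import gamma as gamma_fn
from scipy.stats import norm, gamma

def normal_gamma_pdf(x, tau, mu, lam, alpha, beta):
    """Joint density f(x, tau | mu, lambda, alpha, beta), coded from the formula above."""
    const = beta**alpha * np.sqrt(lam) / (gamma_fn(alpha) * np.sqrt(2 * np.pi))
    return const * tau**(alpha - 0.5) * np.exp(-beta * tau - lam * tau * (x - mu)**2 / 2)

# The joint density should factor as N(x | mu, 1/(lam*tau)) * Gamma(tau | alpha, rate=beta).
x, tau, mu, lam, alpha, beta = 0.3, 1.2, 0.0, 2.0, 3.0, 0.5
direct = normal_gamma_pdf(x, tau, mu, lam, alpha, beta)
factored = norm.pdf(x, loc=mu, scale=1/np.sqrt(lam*tau)) * gamma.pdf(tau, a=alpha, scale=1/beta)
assert np.isclose(direct, factored)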


Marginal distributions

By construction, the marginal distribution of \tau is a gamma distribution, and the conditional distribution of x given \tau is a Gaussian distribution. The marginal distribution of x is a three-parameter non-standardized Student's t-distribution with parameters (\nu, \mu, \sigma^2) = (2\alpha, \mu, \beta/(\lambda\alpha)).
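
This marginal can be checked numerically; a small sketch, assuming SciPy's location-scale parameterization of `scipy.stats.t` (df = \nu, scale = \sigma):

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
mu, lam, alpha, beta = 1.0, 2.0, 3.0, 0.5

# Draw (X, T): T ~ Gamma(alpha, rate=beta), then X | T ~ N(mu, 1/(lam*T)).
tau = rng.gamma(shape=alpha, scale=1/beta, size=100_000)
x = rng.normal(loc=mu, scale=1/np.sqrt(lam * tau))

# Predicted marginal of X: Student's t with nu = 2*alpha, loc = mu, scale = sqrt(beta/(lam*alpha)).
t_marginal = stats.t(df=2*alpha, loc=mu, scale=np.sqrt(beta/(lam*alpha)))
print(stats.kstest(x, t_marginal.cdf))  # KS statistic should be near zero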


Exponential family

The normal-gamma distribution is a four-parameter exponential family with natural parameters \alpha - \tfrac{1}{2}, \; -\beta - \tfrac{\lambda\mu^2}{2}, \; \lambda\mu, \; -\tfrac{\lambda}{2} and natural statistics \ln\tau, \; \tau, \; \tau x, \; \tau x^2.


Moments of the natural statistics

The following moments can be easily computed using the moment generating function of the sufficient statistic:

: \operatorname{E}(\ln T) = \psi(\alpha) - \ln\beta,

where \psi(\alpha) is the digamma function, and

: \begin{align}
\operatorname{E}(T) & = \frac{\alpha}{\beta}, \\
\operatorname{E}(TX) & = \mu \frac{\alpha}{\beta}, \\
\operatorname{E}(TX^2) & = \frac{1}{\lambda} + \mu^2 \frac{\alpha}{\beta}.
\end{align}
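
These identities are easy to verify by Monte Carlo; a minimal sketch using NumPy and SciPy's digamma:

import numpy as np
from scipy.special import digamma

rng = np.random.default_rng(1)
mu, lam, alpha, beta = 1.0, 2.0, 3.0, 0.5

tau = rng.gamma(shape=alpha, scale=1/beta, size=1_000_000)
x = rng.normal(loc=mu, scale=1/np.sqrt(lam * tau))

print(np.log(tau).mean(), digamma(alpha) - np.log(beta))  # E[ln T]
print(tau.mean(),          alpha/beta)                    # E[T]
print((tau*x).mean(),      mu*alpha/beta)                 # E[TX]
print((tau*x**2).mean(),   1/lam + mu**2*alpha/beta)      # E[TX^2]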


Scaling

If (X,T) \sim \operatorname{NormalGamma}(\mu,\lambda,\alpha,\beta), then for any b > 0, (bX,bT) is distributed as \operatorname{NormalGamma}(b\mu, \lambda/b^3, \alpha, \beta/b).
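
A quick numerical sanity check of this property, comparing moments of the scaled pair against direct draws from the rescaled distribution (a sketch, not a proof):

import numpy as np

rng = np.random.default_rng(2)
mu, lam, alpha, beta, b = 1.0, 2.0, 3.0, 0.5, 1.7
n = 500_000

# (bX, bT) with (X, T) ~ NormalGamma(mu, lam, alpha, beta) ...
tau = rng.gamma(shape=alpha, scale=1/beta, size=n)
x = rng.normal(loc=mu, scale=1/np.sqrt(lam * tau))

# ... versus direct draws from NormalGamma(b*mu, lam/b**3, alpha, beta/b).
tau2 = rng.gamma(shape=alpha, scale=b/beta, size=n)
x2 = rng.normal(loc=b*mu, scale=1/np.sqrt((lam/b**3) * tau2))

print((b*x).mean(), x2.mean())      # first coordinates should agree in mean
print((b*tau).mean(), tau2.mean())  # second coordinates should agree in mean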


Posterior distribution of the parameters

Assume that ''x'' is distributed according to a normal distribution with unknown mean \mu and precision \tau,

: x \sim \mathcal{N}(\mu, \tau^{-1}),

and that the prior distribution on \mu and \tau, (\mu,\tau), has a normal-gamma distribution

: (\mu,\tau) \sim \operatorname{NormalGamma}(\mu_0,\lambda_0,\alpha_0,\beta_0),

for which the density satisfies

: \pi(\mu,\tau) \propto \tau^{\alpha_0 - \frac{1}{2}} \, \exp[-\beta_0\tau] \, \exp\left[-\frac{\lambda_0\tau(\mu-\mu_0)^2}{2}\right].

Suppose

: x_1,\ldots,x_n \mid \mu,\tau \sim \operatorname{i.i.d.} \operatorname{N}\left(\mu, \tau^{-1}\right),

i.e. the components of \mathbf{X} = (x_1,\ldots,x_n) are conditionally independent given \mu,\tau and the conditional distribution of each of them given \mu,\tau is normal with expected value \mu and variance 1/\tau. The posterior distribution of \mu and \tau given this dataset \mathbf{X} can be determined analytically by Bayes' theorem. Explicitly,

: \mathbf{P}(\tau,\mu \mid \mathbf{X}) \propto \mathbf{L}(\mathbf{X} \mid \tau,\mu) \, \pi(\tau,\mu),

where \mathbf{L} is the likelihood of the parameters given the data. Since the data are i.i.d., the likelihood of the entire dataset is equal to the product of the likelihoods of the individual data samples:

: \mathbf{L}(\mathbf{X} \mid \tau, \mu) = \prod_{i=1}^n \mathbf{L}(x_i \mid \tau, \mu).

This expression can be simplified as follows:

: \begin{align}
\mathbf{L}(\mathbf{X} \mid \tau, \mu) & \propto \prod_{i=1}^n \tau^{1/2} \exp\left[\frac{-\tau}{2}(x_i-\mu)^2\right] \\
& \propto \tau^{n/2} \exp\left[\frac{-\tau}{2}\sum_{i=1}^n (x_i-\mu)^2\right] \\
& \propto \tau^{n/2} \exp\left[\frac{-\tau}{2}\sum_{i=1}^n (x_i-\bar{x}+\bar{x}-\mu)^2\right] \\
& \propto \tau^{n/2} \exp\left[\frac{-\tau}{2}\sum_{i=1}^n \left((x_i-\bar{x})^2 + (\bar{x}-\mu)^2\right)\right] \\
& \propto \tau^{n/2} \exp\left[\frac{-\tau}{2}\left(ns + n(\bar{x}-\mu)^2\right)\right],
\end{align}

where \bar{x} = \frac{1}{n}\sum_{i=1}^n x_i, the mean of the data samples, and s = \frac{1}{n}\sum_{i=1}^n (x_i-\bar{x})^2, the sample variance. (The cross terms 2(x_i-\bar{x})(\bar{x}-\mu) vanish when summed over i, since \sum_{i=1}^n (x_i-\bar{x}) = 0.)

The posterior distribution of the parameters is proportional to the prior times the likelihood:

: \begin{align}
\mathbf{P}(\tau, \mu \mid \mathbf{X}) &\propto \mathbf{L}(\mathbf{X} \mid \tau,\mu) \, \pi(\tau,\mu) \\
&\propto \tau^{n/2} \exp\left[\frac{-\tau}{2}\left(ns + n(\bar{x}-\mu)^2\right)\right] \tau^{\alpha_0-\frac{1}{2}} \, \exp[-\beta_0\tau] \, \exp\left[-\frac{\lambda_0\tau(\mu-\mu_0)^2}{2}\right] \\
&\propto \tau^{\frac{n}{2}+\alpha_0-\frac{1}{2}} \exp\left[-\tau\left(\frac{1}{2}ns + \beta_0\right)\right] \exp\left[-\frac{\tau}{2}\left(\lambda_0(\mu-\mu_0)^2 + n(\bar{x}-\mu)^2\right)\right].
\end{align}

The final exponential term is simplified by completing the square:

: \begin{align}
\lambda_0(\mu-\mu_0)^2 + n(\bar{x}-\mu)^2 &= \lambda_0\mu^2 - 2\lambda_0\mu\mu_0 + \lambda_0\mu_0^2 + n\mu^2 - 2n\bar{x}\mu + n\bar{x}^2 \\
&= (\lambda_0 + n)\mu^2 - 2(\lambda_0\mu_0 + n\bar{x})\mu + \lambda_0\mu_0^2 + n\bar{x}^2 \\
&= (\lambda_0 + n)\left(\mu^2 - 2\frac{\lambda_0\mu_0 + n\bar{x}}{\lambda_0 + n}\mu\right) + \lambda_0\mu_0^2 + n\bar{x}^2 \\
&= (\lambda_0 + n)\left(\mu - \frac{\lambda_0\mu_0 + n\bar{x}}{\lambda_0 + n}\right)^2 + \lambda_0\mu_0^2 + n\bar{x}^2 - \frac{(\lambda_0\mu_0 + n\bar{x})^2}{\lambda_0 + n} \\
&= (\lambda_0 + n)\left(\mu - \frac{\lambda_0\mu_0 + n\bar{x}}{\lambda_0 + n}\right)^2 + \frac{\lambda_0 n (\bar{x}-\mu_0)^2}{\lambda_0 + n}.
\end{align}

On inserting this back into the expression above,

: \begin{align}
\mathbf{P}(\tau, \mu \mid \mathbf{X}) & \propto \tau^{\frac{n}{2}+\alpha_0-\frac{1}{2}} \exp\left[-\tau\left(\frac{1}{2}ns + \beta_0\right)\right] \exp\left[-\frac{\tau}{2}\left((\lambda_0 + n)\left(\mu - \frac{\lambda_0\mu_0 + n\bar{x}}{\lambda_0 + n}\right)^2 + \frac{\lambda_0 n (\bar{x}-\mu_0)^2}{\lambda_0 + n}\right)\right] \\
& \propto \tau^{\frac{n}{2}+\alpha_0-\frac{1}{2}} \exp\left[-\tau\left(\frac{1}{2}ns + \beta_0 + \frac{\lambda_0 n (\bar{x}-\mu_0)^2}{2(\lambda_0 + n)}\right)\right] \exp\left[-\frac{\tau}{2}(\lambda_0 + n)\left(\mu - \frac{\lambda_0\mu_0 + n\bar{x}}{\lambda_0 + n}\right)^2\right].
\end{align}

This final expression is in exactly the same form as a normal-gamma density, i.e.,

: \mathbf{P}(\tau, \mu \mid \mathbf{X}) = \operatorname{NormalGamma}\left(\frac{\lambda_0\mu_0 + n\bar{x}}{\lambda_0 + n},\ \lambda_0 + n,\ \alpha_0 + \frac{n}{2},\ \beta_0 + \frac{1}{2}\left(ns + \frac{\lambda_0 n (\bar{x}-\mu_0)^2}{\lambda_0 + n}\right)\right).
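
The four update equations read off from this result translate directly into code. A minimal Python sketch; `posterior_update` is an illustrative helper, not a library routine:

import numpy as np

def posterior_update(mu0, lam0, alpha0, beta0, data):
    """NormalGamma(mu0, lam0, alpha0, beta0) prior -> posterior parameters
    after observing i.i.d. normal data, per the derivation above."""
    x = np.asarray(data, dtype=float)
    n = x.size
    xbar = x.mean()
    s = ((x - xbar)**2).mean()  # sample variance with 1/n normalization, as above
    mu_n    = (lam0*mu0 + n*xbar) / (lam0 + n)
    lam_n   = lam0 + n
    alpha_n = alpha0 + n/2
    beta_n  = beta0 + 0.5*(n*s + lam0*n*(xbar - mu0)**2 / (lam0 + n))
    return mu_n, lam_n, alpha_n, beta_n

For example, posterior_update(0.0, 1.0, 1.0, 1.0, [1.2, 0.7, 1.9]) returns the posterior quadruple for a unit-strength prior centered at zero.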


Interpretation of parameters

The interpretation of parameters in terms of pseudo-observations is as follows:
* The new mean takes a weighted average of the old pseudo-mean and the observed mean, weighted by the number of associated (pseudo-)observations.
* The precision was estimated from 2\alpha pseudo-observations (i.e. possibly a different number of pseudo-observations, to allow the variance of the mean and precision to be controlled separately) with sample mean \mu and sample variance \frac{\beta}{\alpha} (i.e. with sum of squared deviations 2\beta).
* The posterior updates the number of pseudo-observations (\lambda_0) simply by adding the corresponding number of new observations (n).
* The new sum of squared deviations is computed by adding the previous respective sums of squared deviations. However, a third "interaction term" is needed because the two sets of squared deviations were computed with respect to different means, and hence the sum of the two underestimates the actual total squared deviation.

As a consequence, if one has a prior mean of \mu_0 from n_\mu samples and a prior precision of \tau_0 from n_\tau samples, the prior distribution over \mu and \tau is

: \mathbf{P}(\tau,\mu \mid \mathbf{X}) = \operatorname{NormalGamma}\left(\mu_0, n_\mu, \frac{n_\tau}{2}, \frac{n_\tau}{2\tau_0}\right),

and after observing n samples with mean \mu and variance s, the posterior probability is

: \mathbf{P}(\tau,\mu \mid \mathbf{X}) = \operatorname{NormalGamma}\left(\frac{n_\mu \mu_0 + n\mu}{n_\mu + n},\ n_\mu + n,\ \frac{1}{2}(n_\tau + n),\ \frac{1}{2}\left(\frac{n_\tau}{\tau_0} + ns + \frac{n_\mu n (\mu - \mu_0)^2}{n_\mu + n}\right)\right).

Note that in some programming languages, such as Matlab, the gamma distribution is implemented with the inverse definition of \beta, so the fourth argument of the normal-gamma distribution is 2\tau_0/n_\tau.
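
In code, this bookkeeping amounts to converting (\mu_0, n_\mu, \tau_0, n_\tau) into the four normal-gamma parameters. A sketch with a hypothetical helper name, using the rate convention for \beta described above:

def prior_from_pseudo_obs(mu0, n_mu, tau0, n_tau):
    """Normal-gamma prior parameters (mu, lambda, alpha, beta) from a prior mean
    mu0 worth n_mu samples and a prior precision tau0 worth n_tau samples.
    beta uses the rate convention; a shape/scale library would want 2*tau0/n_tau."""
    return mu0, n_mu, n_tau / 2.0, n_tau / (2.0 * tau0)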


Generating normal-gamma random variates

Generation of random variates is straightforward:
# Sample \tau from a gamma distribution with parameters \alpha and \beta.
# Sample x from a normal distribution with mean \mu and variance 1/(\lambda\tau).
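
A direct transcription of this two-step recipe in Python (NumPy); note that numpy.random's gamma sampler takes a scale parameter, i.e. the reciprocal of the rate \beta used here:

import numpy as np

def sample_normal_gamma(mu, lam, alpha, beta, size, seed=None):
    """Draw (x, tau) pairs: tau ~ Gamma(alpha, rate=beta), then x ~ N(mu, 1/(lam*tau))."""
    rng = np.random.default_rng(seed)
    tau = rng.gamma(shape=alpha, scale=1.0/beta, size=size)  # scale = 1/rate
    x = rng.normal(loc=mu, scale=1.0/np.sqrt(lam * tau))
    return x, tau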


Related distributions

* The normal-inverse-gamma distribution is essentially the same distribution parameterized by variance rather than precision.
* The normal-exponential-gamma distribution

