Scaled-inverse-chi-squared Distribution
The scaled inverse chi-squared distribution is the distribution for ''x'' = 1/''s''², where ''s''² is a sample mean of the squares of ν independent normal random variables that have mean 0 and inverse variance 1/σ² = τ². The distribution is therefore parametrised by the two quantities ν and τ², referred to as the ''number of chi-squared degrees of freedom'' and the ''scaling parameter'', respectively.

This family of scaled inverse chi-squared distributions is closely related to two other distribution families, the inverse-chi-squared distribution and the inverse-gamma distribution. Compared to the inverse-chi-squared distribution, the scaled distribution has an extra parameter τ², which scales the distribution horizontally and vertically, representing the inverse-variance of the original underlying process. Also, the scaled inverse chi-squared distribution is presented as the distribution for the inverse of the ''mean'' of ν squared deviates, rather than the inverse of their ''sum''. The two distributions thus have the relation that if

:X \sim \mbox{Scale-inv-}\chi^2(\nu, \tau^2)   then   \frac{X}{\nu\tau^2} \sim \mbox{inv-}\chi^2(\nu)

Compared to the inverse-gamma distribution, the scaled inverse chi-squared distribution describes the same data distribution, but using a different parametrization, which may be more convenient in some circumstances. Specifically, if

:X \sim \mbox{Scale-inv-}\chi^2(\nu, \tau^2)   then   X \sim \textrm{Inv-Gamma}\left(\frac{\nu}{2}, \frac{\nu\tau^2}{2}\right)

Either form may be used to represent the maximum entropy distribution for a fixed first inverse moment (E(1/X)) and first logarithmic moment (E(\ln(X))).

The scaled inverse chi-squared distribution also has a particular use in Bayesian statistics, somewhat unrelated to its use as a predictive distribution for ''x'' = 1/''s''². Specifically, the scaled inverse chi-squared distribution can be used as a conjugate prior for the variance parameter of a normal distribution. In this context the scaling parameter is denoted by σ₀² rather than by τ², and has a different interpretation. The application has more usually been presented using the inverse-gamma distribution formulation instead; however, some authors, following in particular Gelman ''et al.'' (1995/2004), argue that the inverse chi-squared parametrisation is more intuitive.
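The defining construction above (''x'' = 1/''s''², with ''s''² the mean of ν squared centred normals) can be illustrated by simulation. This is a sketch, not part of the article; it assumes NumPy and checks the empirical mean against the theoretical mean ντ²/(ν − 2), which holds for ν > 2.

```python
import numpy as np

# Simulate x = 1/s^2, where s^2 is the mean of nu squared N(0, sigma^2)
# draws with inverse variance 1/sigma^2 = tau^2.
rng = np.random.default_rng(0)
nu, tau2 = 10, 2.0
sigma = 1.0 / np.sqrt(tau2)          # standard deviation of the normals

z = rng.normal(0.0, sigma, size=(200_000, nu))
s2 = (z ** 2).mean(axis=1)           # sample mean of the nu squares
x = 1.0 / s2                         # draws from Scale-inv-chi2(nu, tau2)

# Theoretical mean of Scale-inv-chi2(nu, tau2), valid for nu > 2:
theoretical_mean = nu * tau2 / (nu - 2)
```

With 200,000 replications the empirical mean of `x` settles close to `theoretical_mean` (here 2.5).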


Characterization

The probability density function of the scaled inverse chi-squared distribution extends over the domain x>0 and is

:f(x; \nu, \tau^2) = \frac{(\tau^2\nu/2)^{\nu/2}}{\Gamma(\nu/2)}~ \frac{\exp\left[\frac{-\nu\tau^2}{2x}\right]}{x^{1+\nu/2}}

where \nu is the degrees of freedom parameter and \tau^2 is the scale parameter. The cumulative distribution function is

:F(x; \nu, \tau^2) = \Gamma\left(\frac{\nu}{2},\frac{\tau^2\nu}{2x}\right) \left/\Gamma\left(\frac{\nu}{2}\right)\right.
:= Q\left(\frac{\nu}{2},\frac{\tau^2\nu}{2x}\right)

where \Gamma(a,x) is the upper incomplete gamma function, \Gamma(x) is the gamma function and Q(a,x) is the regularized gamma function. The characteristic function is

:\varphi(t;\nu,\tau^2) = \frac{2}{\Gamma(\nu/2)}\left(\frac{-i\tau^2\nu t}{2}\right)^{\!\nu/4}\!\!K_{\nu/2}\left(\sqrt{-2i\tau^2\nu t}\right),

where K_{\nu/2}(z) is the modified Bessel function of the second kind.
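The density and distribution function above can be checked numerically. This sketch (not from the article) assumes SciPy: `scipy.stats.invgamma` supplies the equivalent Inv-Gamma(ν/2, ντ²/2) reference, and `scipy.special.gammaincc` is the regularized upper incomplete gamma function Q(a, x).

```python
import numpy as np
from scipy.stats import invgamma
from scipy.special import gammaln, gammaincc

def pdf(x, nu, tau2):
    """Density of Scale-inv-chi2(nu, tau2), written directly from its formula."""
    log_f = (0.5 * nu * np.log(0.5 * nu * tau2) - gammaln(0.5 * nu)
             - (1.0 + 0.5 * nu) * np.log(x) - 0.5 * nu * tau2 / x)
    return np.exp(log_f)

def cdf(x, nu, tau2):
    """CDF as the regularized upper incomplete gamma: Q(nu/2, nu*tau2/(2x))."""
    return gammaincc(0.5 * nu, 0.5 * nu * tau2 / x)

nu, tau2 = 4.0, 1.5
x = np.linspace(0.2, 8.0, 40)
# Reference: the same distribution in its inverse-gamma parametrisation.
ref = invgamma(a=0.5 * nu, scale=0.5 * nu * tau2)
```

Both `pdf` and `cdf` agree with `ref.pdf` and `ref.cdf` to floating-point precision.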


Parameter estimation

The maximum likelihood estimate of \tau^2 is

:\tau^2 = n\left/\sum_{i=1}^n \frac{1}{x_i}\right. .

The maximum likelihood estimate of \frac{\nu}{2} can be found using Newton's method on:

:\ln\left(\frac{\nu}{2}\right) - \psi\left(\frac{\nu}{2}\right) = \frac{1}{n} \sum_{i=1}^n \ln\left(x_i\right) - \ln\left(\tau^2\right),

where \psi(x) is the digamma function. An initial estimate can be found by taking the formula for the mean and solving it for \nu. Let \bar{x} = \frac{1}{n}\sum_{i=1}^n x_i be the sample mean. Then an initial estimate for \nu is given by:

:\frac{\nu}{2} = \frac{\bar{x}}{\bar{x} - \tau^2}.


Bayesian estimation of the variance of a normal distribution

The scaled inverse chi-squared distribution has a second important application, in the Bayesian estimation of the variance of a normal distribution. According to Bayes' theorem, the posterior probability distribution for quantities of interest is proportional to the product of a prior distribution for the quantities and a likelihood function:

:p(\sigma^2 \mid D, I) \propto p(\sigma^2 \mid I) \; p(D \mid \sigma^2)

where ''D'' represents the data and ''I'' represents any initial information about σ² that we may already have.

The simplest scenario arises if the mean μ is already known; or, alternatively, if it is the conditional distribution of σ² that is sought, for a particular assumed value of μ. Then the likelihood term ''L''(σ² | ''D'') = ''p''(''D'' | σ²) has the familiar form

:\mathcal{L}(\sigma^2 \mid D, \mu) = \frac{1}{\left(\sqrt{2\pi\sigma^2}\right)^n} \; \exp\left[-\frac{\sum_i (x_i-\mu)^2}{2\sigma^2}\right]

Combining this with the rescaling-invariant prior p(σ² | ''I'') = 1/σ², which can be argued (e.g. following Jeffreys) to be the least informative possible prior for σ² in this problem, gives a combined posterior probability

:p(\sigma^2 \mid D, I, \mu) \propto \frac{1}{\sigma^{n+2}} \; \exp\left[-\frac{\sum_i (x_i-\mu)^2}{2\sigma^2}\right]

This form can be recognised as that of a scaled inverse chi-squared distribution, with parameters ν = ''n'' and τ² = ''s''² = (1/''n'') Σ (''x''<sub>''i''</sub> − μ)².

Gelman ''et al.'' remark that the re-appearance of this distribution, previously seen in a sampling context, may seem remarkable; but given the choice of prior the "result is not surprising" (Gelman ''et al.'' (1995), ''Bayesian Data Analysis'', 1st ed., p. 68). In particular, the choice of a rescaling-invariant prior for σ² has the result that the probability for the ratio σ²/''s''² has the same form (independent of the conditioning variable) when conditioned on ''s''² as when conditioned on σ²:

:p(\tfrac{\sigma^2}{s^2} \mid s^2) = p(\tfrac{\sigma^2}{s^2} \mid \sigma^2)

In the sampling-theory case, conditioned on σ², the probability distribution for 1/''s''² is a scaled inverse chi-squared distribution; and so the probability distribution for σ² conditioned on ''s''², given a scale-agnostic prior, is also a scaled inverse chi-squared distribution.


Use as an informative prior

If more is known about the possible values of σ², a distribution from the scaled inverse chi-squared family, such as Scale-inv-χ²(''n''₀, ''s''₀²), can be a convenient form to represent a more informative prior for σ², as if from the result of ''n''₀ previous observations (though ''n''₀ need not necessarily be a whole number):

:p(\sigma^2 \mid I^\prime, \mu) \propto \frac{1}{(\sigma^2)^{1+n_0/2}} \; \exp\left[-\frac{n_0 s_0^2}{2\sigma^2}\right]

Such a prior would lead to the posterior distribution

:p(\sigma^2 \mid D, I^\prime, \mu) \propto \frac{1}{(\sigma^2)^{1+(n_0+n)/2}} \; \exp\left[-\frac{n_0 s_0^2 + n s^2}{2\sigma^2}\right]

which is itself a scaled inverse chi-squared distribution, Scale-inv-χ²(''n''₀ + ''n'', (''n''₀''s''₀² + ''ns''²)/(''n''₀ + ''n'')). The scaled inverse chi-squared distributions are thus a convenient conjugate prior family for σ² estimation.
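The conjugate update can be written as a small helper. This is a sketch; the function name `update` is illustrative, not from the article.

```python
import numpy as np

def update(n0, s02, data, mu):
    """Conjugate update of a Scale-inv-chi2(n0, s0^2) prior for sigma^2,
    given observations with known mean mu."""
    n = len(data)
    s2 = np.mean((np.asarray(data) - mu) ** 2)
    nu_post = n0 + n
    tau2_post = (n0 * s02 + n * s2) / (n0 + n)  # precision-weighted average
    return nu_post, tau2_post

# s2 here is mean([0.25, 2.25, 0.25, 2.25]) = 1.25, so the posterior is
# Scale-inv-chi2(9, (5*3.0 + 4*1.25)/9) = Scale-inv-chi2(9, 20/9).
nu_post, tau2_post = update(n0=5, s02=3.0, data=[1.0, 3.0, 2.0, 0.0], mu=1.5)
```

The posterior degrees of freedom simply add, while the scale is a weighted average of prior and sample sums of squares.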


Estimation of variance when mean is unknown

If the mean is not known, the most uninformative prior that can be taken for it is arguably the translation-invariant prior ''p''(μ | ''I'') ∝ const., which gives the following joint posterior distribution for μ and σ²,

:\begin{align} p(\mu, \sigma^2 \mid D, I) & \propto \frac{1}{\sigma^{n+2}} \exp\left[-\frac{\sum_i (x_i-\mu)^2}{2\sigma^2}\right] \\ & = \frac{1}{\sigma^{n+2}} \exp\left[-\frac{\sum_i (x_i-\bar{x})^2}{2\sigma^2}\right] \exp\left[-\frac{n(\mu-\bar{x})^2}{2\sigma^2}\right] \end{align}

The marginal posterior distribution for σ² is obtained from the joint posterior distribution by integrating out over μ,

:\begin{align} p(\sigma^2 \mid D, I) \; \propto \; & \frac{1}{\sigma^{n+2}} \; \exp\left[-\frac{\sum_i (x_i-\bar{x})^2}{2\sigma^2}\right] \; \int_{-\infty}^{\infty} \exp\left[-\frac{n(\mu-\bar{x})^2}{2\sigma^2}\right] d\mu \\ = \; & \frac{1}{\sigma^{n+2}} \; \exp\left[-\frac{\sum_i (x_i-\bar{x})^2}{2\sigma^2}\right] \; \sqrt{\frac{2\pi\sigma^2}{n}} \\ \propto \; & (\sigma^2)^{-(n+1)/2} \; \exp\left[-\frac{\sum_i (x_i-\bar{x})^2}{2\sigma^2}\right] \end{align}

This is again a scaled inverse chi-squared distribution, with parameters \nu = n-1 and \tau^2 = s^2 = \frac{1}{n-1}\sum_i (x_i-\bar{x})^2.
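The unknown-mean result can likewise be sketched. This is illustrative, not from the article; it assumes SciPy's `invgamma` to represent the marginal posterior Scale-inv-χ²(''n'' − 1, ''s''²).

```python
import numpy as np
from scipy.stats import invgamma

# Data with both mean and variance unknown; true sigma^2 = 4.
rng = np.random.default_rng(3)
data = rng.normal(5.0, 2.0, size=1000)

n = len(data)
s2 = np.var(data, ddof=1)              # s^2 = sum (x_i - xbar)^2 / (n - 1)
nu, tau2 = n - 1, s2                   # marginal posterior parameters

# Marginal posterior for sigma^2: Scale-inv-chi2(n-1, s^2).
posterior = invgamma(a=nu / 2, scale=nu * tau2 / 2)
lo, hi = posterior.ppf([0.025, 0.975]) # a 95% credible interval for sigma^2
```

Note the loss of one degree of freedom relative to the known-mean case, reflecting the estimation of μ from the data.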


Related distributions

* If X \sim \mbox{Scale-inv-}\chi^2(\nu, \tau^2) then k X \sim \mbox{Scale-inv-}\chi^2(\nu, k\tau^2)\,
* If X \sim \mbox{inv-}\chi^2(\nu)\, (inverse-chi-squared distribution) then X \sim \mbox{Scale-inv-}\chi^2(\nu, 1/\nu)\,
* If X \sim \mbox{Scale-inv-}\chi^2(\nu, \tau^2) then \frac{X}{\nu\tau^2} \sim \mbox{inv-}\chi^2(\nu)\, (inverse-chi-squared distribution)
* If X \sim \mbox{Scale-inv-}\chi^2(\nu, \tau^2) then X \sim \textrm{Inv-Gamma}\left(\frac{\nu}{2}, \frac{\nu\tau^2}{2}\right) (inverse-gamma distribution)
* The scaled inverse chi-squared distribution is a special case of the type 5 Pearson distribution
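The first scaling relation in the list can be verified numerically. This sketch (not from the article) uses the inverse-gamma equivalence via SciPy and the change-of-variables identity that the density of ''kX'' at ''x'' equals pdf_X(''x''/''k'')/''k''.

```python
import numpy as np
from scipy.stats import invgamma

def scale_inv_chi2(nu, tau2):
    """Scale-inv-chi2(nu, tau2) as its equivalent inverse-gamma distribution."""
    return invgamma(a=nu / 2, scale=nu * tau2 / 2)

nu, tau2, k = 7.0, 1.2, 3.0
x = np.linspace(0.5, 20.0, 60)

lhs = scale_inv_chi2(nu, k * tau2).pdf(x)      # claimed density of kX
rhs = scale_inv_chi2(nu, tau2).pdf(x / k) / k  # change-of-variables density
```

The two curves coincide, confirming that scaling the variable by ''k'' scales the τ² parameter by ''k'' while leaving ν unchanged.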


References

* Gelman, A. ''et al.'' (1995), ''Bayesian Data Analysis'', pp. 474–475; also pp. 47, 480