The scaled inverse chi-squared distribution is the distribution for ''x'' = 1/''s''², where ''s''² is a sample mean of the squares of ν independent normal random variables that have mean 0 and inverse variance 1/σ² = τ². The distribution is therefore parametrised by the two quantities ν and τ², referred to as the ''number of chi-squared degrees of freedom'' and the ''scaling parameter'', respectively.
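This construction can be checked by simulation. The sketch below (parameter values chosen only for illustration) draws ''x'' = 1/''s''² repeatedly and compares the empirical mean with the theoretical mean ντ²/(ν − 2) of the scaled inverse chi-squared distribution, which holds for ν > 2.

```python
import random

# Monte Carlo check of the construction above (nu, sigma chosen for illustration):
# s^2 is the mean of nu squared draws from N(0, sigma^2), and x = 1/s^2.
random.seed(42)
nu = 10
sigma = 2.0                      # so tau^2 = 1/sigma^2 = 0.25
tau2 = 1.0 / sigma ** 2

def draw_x():
    s2 = sum(random.gauss(0.0, sigma) ** 2 for _ in range(nu)) / nu
    return 1.0 / s2

n_reps = 100_000
empirical_mean = sum(draw_x() for _ in range(n_reps)) / n_reps
theoretical_mean = nu * tau2 / (nu - 2)   # mean of Scale-inv-chi2(nu, tau2) for nu > 2
print(empirical_mean, theoretical_mean)
```

With these values the theoretical mean is 10 × 0.25 / 8 = 0.3125, and the empirical mean agrees to well under a percent at this sample size.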
This family of scaled inverse chi-squared distributions is closely related to two other distribution families: the inverse-chi-squared distribution and the inverse-gamma distribution. Compared to the inverse-chi-squared distribution, the scaled distribution has an extra parameter τ², which scales the distribution horizontally and vertically, and which represents the inverse-variance of the original underlying process. Also, the scaled inverse chi-squared distribution is presented as the distribution for the inverse of the ''mean'' of ν squared deviates, rather than the inverse of their ''sum''. The two distributions thus have the relation that if
:X \sim \mbox{Scale-inv-}\chi^2(\nu, \tau^2)
then
:\frac{X}{\tau^2 \nu} \sim \mbox{inv-}\chi^2(\nu)
Compared to the inverse gamma distribution, the scaled inverse chi-squared distribution describes the same data distribution, but using a different
parametrization, which may be more convenient in some circumstances. Specifically, if
:X \sim \mbox{Scale-inv-}\chi^2(\nu, \tau^2)
then
:X \sim \textrm{Inv-Gamma}\!\left(\frac{\nu}{2}, \frac{\nu\tau^2}{2}\right)
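The equivalence of the two parametrizations can be confirmed numerically. The sketch below (ν and τ² chosen arbitrarily) evaluates both densities, written out from their standard formulas, at a few points and checks that they coincide.

```python
import math

# Check that Scale-inv-chi2(nu, tau2) and Inv-Gamma(alpha = nu/2, beta = nu*tau2/2)
# have the same probability density (values of nu, tau2 are arbitrary).
def scale_inv_chi2_pdf(x, nu, tau2):
    return ((tau2 * nu / 2) ** (nu / 2) / math.gamma(nu / 2)
            * math.exp(-nu * tau2 / (2 * x)) / x ** (1 + nu / 2))

def inv_gamma_pdf(x, alpha, beta):
    return beta ** alpha / math.gamma(alpha) * x ** (-alpha - 1) * math.exp(-beta / x)

nu, tau2 = 5, 0.8
for x in (0.1, 0.5, 1.0, 3.0):
    a = scale_inv_chi2_pdf(x, nu, tau2)
    b = inv_gamma_pdf(x, nu / 2, nu * tau2 / 2)
    assert math.isclose(a, b, rel_tol=1e-9)
print("densities agree")
```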
Either form may be used to represent the maximum entropy distribution for a fixed first inverse moment E(1/X) and first logarithmic moment E(ln X).
The scaled inverse chi-squared distribution also has a particular use in Bayesian statistics, somewhat unrelated to its use as a predictive distribution for ''x'' = 1/''s''². Specifically, the scaled inverse chi-squared distribution can be used as a conjugate prior for the variance parameter of a normal distribution. In this context the scaling parameter is denoted by σ₀² rather than by τ², and has a different interpretation. The application has more usually been presented using the inverse-gamma distribution formulation instead; however, some authors, following in particular Gelman ''et al.'' (1995/2004), argue that the inverse chi-squared parametrisation is more intuitive.
Characterization
The probability density function of the scaled inverse chi-squared distribution extends over the domain x > 0 and is
:f(x; \nu, \tau^2) = \frac{(\tau^2\nu/2)^{\nu/2}}{\Gamma(\nu/2)}\, \frac{\exp\left(\frac{-\nu\tau^2}{2x}\right)}{x^{1+\nu/2}}
where \nu is the degrees of freedom parameter and \tau^2 is the scale parameter. The cumulative distribution function is
:F(x; \nu, \tau^2) = \Gamma\!\left(\frac{\nu}{2}, \frac{\tau^2\nu}{2x}\right) \Big/\, \Gamma\!\left(\frac{\nu}{2}\right)
:= Q\!\left(\frac{\nu}{2}, \frac{\tau^2\nu}{2x}\right)
where \Gamma(a, x) is the incomplete gamma function, \Gamma(a) is the gamma function and Q(a, x) is a regularized gamma function. The characteristic function is
:\varphi(t; \nu, \tau^2) = \frac{2}{\Gamma\left(\frac{\nu}{2}\right)} \left(\frac{-i\tau^2\nu t}{2}\right)^{\frac{\nu}{4}} K_{\frac{\nu}{2}}\!\left(\sqrt{-2i\tau^2\nu t}\right)
where K_{\frac{\nu}{2}}(z) is the modified Bessel function of the second kind.
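These formulas can be exercised numerically. For ν = 6, ν/2 = 3 is an integer, so the regularized upper incomplete gamma function has the closed form Q(3, z) = e^(−z)(1 + z + z²/2); the sketch below (parameters assumed for illustration) checks the resulting CDF against simulated draws, using the fact that a Scale-inv-χ²(ν, τ²) variate is ντ² divided by a chi-squared variate with ν degrees of freedom.

```python
import math
import random

# Sketch with assumed parameters: nu = 6 makes nu/2 = 3 an integer, so
# Q(3, z) = exp(-z) * (1 + z + z^2/2) and F(x) = Q(nu/2, nu*tau2/(2x)).
nu, tau2 = 6, 1.0

def cdf(x):
    z = nu * tau2 / (2 * x)
    return math.exp(-z) * (1 + z + z * z / 2)   # Q(3, z)

random.seed(0)
# A Scale-inv-chi2(nu, tau2) draw is nu*tau2 divided by a chi-squared(nu) draw.
draws = [nu * tau2 / sum(random.gauss(0.0, 1.0) ** 2 for _ in range(nu))
         for _ in range(100_000)]

for x in (0.5, 1.0, 2.0):
    frac = sum(d <= x for d in draws) / len(draws)
    assert abs(frac - cdf(x)) < 0.01
print("CDF matches simulation")
```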
Parameter estimation
The maximum likelihood estimate of \tau^2 is
:\hat{\tau}^2 = \frac{n}{\sum_{i=1}^n \frac{1}{x_i}},
the harmonic mean of the observations.
The maximum likelihood estimate of \frac{\nu}{2} can be found using Newton's method on:
:\ln\!\left(\frac{\nu}{2}\right) - \psi\!\left(\frac{\nu}{2}\right) = \frac{1}{n}\sum_{i=1}^n \ln(x_i) - \ln\!\left(\hat{\tau}^2\right)
where \psi(x) is the digamma function. An initial estimate can be found by taking the formula for the mean and solving it for \nu. Let \bar{x} = \frac{1}{n}\sum_{i=1}^n x_i be the sample mean. Then an initial estimate for \nu is given by:
:\frac{\nu}{2} = \frac{\bar{x}}{\bar{x} - \hat{\tau}^2}
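A minimal sketch of this estimation procedure on synthetic data, assuming true parameters ν = 8 and τ² = 0.5 and a sample size chosen for illustration (the digamma approximation below is a standard asymptotic series, not taken from the source):

```python
import math
import random

def digamma(x):
    # Asymptotic series with upward recurrence; adequate accuracy for x > 0 here.
    r = 0.0
    while x < 6.0:
        r -= 1.0 / x
        x += 1.0
    f = 1.0 / (x * x)
    return r + math.log(x) - 0.5 / x - f * (1/12 - f * (1/120 - f / 252))

# Synthetic data: a Scale-inv-chi2(nu, tau2) draw is nu*tau2 / chi-squared(nu).
random.seed(1)
nu_true, tau2_true, n = 8, 0.5, 20_000
xs = [nu_true * tau2_true / sum(random.gauss(0, 1) ** 2 for _ in range(nu_true))
      for _ in range(n)]

tau2_hat = n / sum(1.0 / x for x in xs)                 # harmonic-mean MLE of tau^2
c = sum(math.log(x) for x in xs) / n - math.log(tau2_hat)

def g(h):
    return math.log(h) - digamma(h) - c                 # root gives h = nu/2

xbar = sum(xs) / n
h = xbar / (xbar - tau2_hat)                            # moment-based initial estimate
for _ in range(30):                                     # Newton iterations
    d = 1e-5
    h_new = h - g(h) / ((g(h + d) - g(h - d)) / (2 * d))
    h = h_new if h_new > 0 else h / 2                   # guard against overshoot
nu_hat = 2 * h
print(round(nu_hat, 2), round(tau2_hat, 3))
```

The numerical central difference stands in for the trigamma function in the Newton step; with an analytic trigamma the iteration would be the textbook form.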
Bayesian estimation of the variance of a normal distribution
The scaled inverse chi-squared distribution has a second important application, in the Bayesian estimation of the variance of a Normal distribution.
According to Bayes' theorem, the posterior probability distribution for quantities of interest is proportional to the product of a prior distribution for the quantities and a likelihood function:
:p(\sigma^2 \mid D, I) \propto p(\sigma^2 \mid I)\; p(D \mid \sigma^2, I)
where ''D'' represents the data and ''I'' represents any initial information about σ² that we may already have.
The simplest scenario arises if the mean μ is already known; or, alternatively, if it is the conditional distribution of σ² that is sought, for a particular assumed value of μ. Then the likelihood term ''L''(σ² | ''D'') = ''p''(''D'' | σ²) has the familiar form
:\mathcal{L}(\sigma^2 \mid D, \mu) = \frac{1}{\left(\sqrt{2\pi}\,\sigma\right)^n} \exp\left(-\frac{\sum_{i=1}^n (x_i - \mu)^2}{2\sigma^2}\right)
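Combining this likelihood with a Scale-inv-χ²(ν₀, σ₀²) prior yields the standard conjugate update: the posterior is Scale-inv-χ²(ν₀ + n, (ν₀σ₀² + SS)/(ν₀ + n)), where SS = Σᵢ(xᵢ − μ)². A sketch with toy numbers (all values illustrative):

```python
# Conjugate update for the variance of a normal with known mean mu
# (standard result; prior, mean, and data values are toy numbers):
#   prior      sigma^2 ~ Scale-inv-chi2(nu0, s0sq)
#   posterior  sigma^2 ~ Scale-inv-chi2(nu0 + n, (nu0*s0sq + SS) / (nu0 + n))
nu0, s0sq = 4, 2.0          # prior: worth 4 pseudo-observations of variance 2
mu = 1.0                    # known mean
data = [0.2, 1.9, 2.4, -0.5, 1.1, 0.8]

ss = sum((x - mu) ** 2 for x in data)   # sum of squared deviations from mu
nu_post = nu0 + len(data)
s_post = (nu0 * s0sq + ss) / nu_post
print(nu_post, round(s_post, 4))        # -> 10 1.371
```

The update has a natural reading: the prior acts like ν₀ earlier observations with average squared deviation σ₀², and the posterior scale is the pooled average squared deviation over prior and data.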