The deviance information criterion (DIC) is a hierarchical modeling generalization of the Akaike information criterion (AIC). It is particularly useful in Bayesian model selection problems where the posterior distributions of the models have been obtained by Markov chain Monte Carlo (MCMC) simulation. Like AIC, DIC is an asymptotic approximation that holds as the sample size becomes large. It is valid only when the posterior distribution is approximately multivariate normal.


Definition

Define the deviance as D(\theta)=-2 \log(p(y\mid\theta))+C, where y are the data, \theta are the unknown parameters of the model and p(y\mid\theta) is the likelihood function. C is a constant that cancels out in all calculations that compare different models, and which therefore does not need to be known.

There are two calculations in common usage for the effective number of parameters of the model. The first is p_D=\overline{D(\theta)}-D(\bar\theta), where \overline{D(\theta)} is the posterior mean of the deviance and \bar\theta is the posterior expectation of \theta. The second is p_D = p_V = \tfrac{1}{2}\widehat{\operatorname{var}}\left(D(\theta)\right), half the posterior variance of the deviance. The larger the effective number of parameters, the easier it is for the model to fit the data, and so the deviance must be penalized.

The deviance information criterion is calculated as

:\mathrm{DIC} = p_D+\overline{D(\theta)},

or equivalently as

:\mathrm{DIC} = D(\bar\theta)+2 p_D.

From this latter form, the connection with AIC is more evident.
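The quantities in this section (the posterior mean deviance, the deviance at the posterior mean, and both versions of the effective number of parameters) can be computed directly from posterior draws. The following sketch is a minimal illustration, assuming a hypothetical one-parameter normal model with known variance and a flat prior, so the posterior can be sampled exactly; all function and variable names are illustrative, not from any particular library.

```python
import math
import random
import statistics

def deviance(theta, y, sigma=1.0):
    # D(theta) = -2 log p(y | theta) for i.i.d. N(theta, sigma^2) data;
    # the additive constant C is dropped, as it cancels in model comparisons.
    return sum((yi - theta) ** 2 for yi in y) / sigma**2

def dic_from_samples(theta_samples, y):
    d = [deviance(t, y) for t in theta_samples]
    d_bar = statistics.fmean(d)                               # posterior mean deviance
    d_at_mean = deviance(statistics.fmean(theta_samples), y)  # deviance at posterior mean
    p_d = d_bar - d_at_mean                                   # first definition of p_D
    p_v = statistics.variance(d) / 2.0                        # second definition, p_V
    return {"DIC": d_bar + p_d, "p_D": p_d, "p_V": p_v}

random.seed(0)
y = [random.gauss(0.5, 1.0) for _ in range(50)]
# With a flat prior, the posterior for theta is N(mean(y), 1/n); draw from it
n, ybar = len(y), statistics.fmean(y)
theta_samples = [random.gauss(ybar, 1 / math.sqrt(n)) for _ in range(5000)]

out = dic_from_samples(theta_samples, y)
# For this one-parameter model, both p_D and p_V should come out close to 1.
```

In practice the draws of \theta would come from an MCMC run rather than from the exact posterior, but the computation of the summaries is identical.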


Motivation

The idea is that models with smaller DIC should be preferred to models with larger DIC. Models are penalized both by the value of \overline{D(\theta)}, which favors a good fit, and (similarly to AIC) by the effective number of parameters p_D. Since \overline{D(\theta)} will decrease as the number of parameters in a model increases, the p_D term compensates for this effect by favoring models with a smaller number of parameters.

An advantage of DIC over other criteria in the case of Bayesian model selection is that DIC is easily calculated from the samples generated by a Markov chain Monte Carlo simulation. AIC requires calculating the likelihood at its maximum over \theta, which is not readily available from the MCMC simulation. To calculate DIC, simply compute \overline{D(\theta)} as the average of D(\theta) over the samples of \theta, and D(\bar\theta) as the value of D evaluated at the average of the samples of \theta. The DIC then follows directly from these approximations. Claeskens and Hjort (2008, Ch. 3.5) show that the DIC is large-sample equivalent to the natural model-robust version of the AIC.
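The preference for smaller DIC can be illustrated with a toy comparison, assuming two hypothetical normal models for the same data: one with a free mean parameter and one with the mean fixed at zero. The data, models, and names below are illustrative assumptions, not taken from the literature.

```python
import math
import random
import statistics

def deviance(theta, y):
    # D(theta) = -2 log p(y | theta) for i.i.d. N(theta, 1) data, constant dropped
    return sum((yi - theta) ** 2 for yi in y)

random.seed(1)
y = [random.gauss(1.0, 1.0) for _ in range(50)]  # data with true mean 1
n, ybar = len(y), statistics.fmean(y)

# Model A: theta unknown; flat prior gives posterior N(mean(y), 1/n)
thetas = [random.gauss(ybar, 1 / math.sqrt(n)) for _ in range(5000)]
d = [deviance(t, y) for t in thetas]
d_bar = statistics.fmean(d)
p_d_a = d_bar - deviance(statistics.fmean(thetas), y)
dic_a = d_bar + p_d_a

# Model B: theta fixed at 0; no free parameters, so p_D = 0
dic_b = deviance(0.0, y)

# Model A fits data centered at 1 far better, so its DIC is smaller
# despite the small complexity penalty p_d_a.
```

Only posterior draws and pointwise deviance evaluations are needed, which is exactly what an MCMC run provides.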


Assumptions

In the derivation of DIC, it is assumed that the specified parametric family of probability distributions that generate future observations encompasses the true model. This assumption does not always hold, and it is desirable to consider model assessment procedures in that scenario. Also, the observed data are used both to construct the posterior distribution and to evaluate the estimated models. Therefore, DIC tends to select over-fitted models.


Extensions

A resolution to the issues above was the proposal of the Bayesian predictive information criterion (BPIC). Ando (2010, Ch. 8) provides a discussion of various Bayesian model selection criteria. To avoid the over-fitting problems of DIC, the BPIC was developed from a predictive viewpoint. The criterion is calculated as

:\mathit{IC} = \overline{D(\theta)}+2p_D = -2\mathbf{E}^\theta\left[\log(p(y\mid\theta))\right]+2p_D.

The first term is a measure of how well the model fits the data, while the second term is a penalty on the model complexity. Note that the p in this expression is the predictive distribution rather than the likelihood above.
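Taking the displayed formula at face value, BPIC differs from DIC only in doubling the complexity penalty. A minimal numeric sketch, using hypothetical posterior summaries rather than output from any fitted model:

```python
# Hypothetical summaries, as would be obtained from an MCMC run;
# the numbers are illustrative only.
d_bar = 51.0   # posterior mean deviance
p_d = 1.0      # effective number of parameters

dic = d_bar + p_d        # DIC penalizes complexity once
bpic = d_bar + 2 * p_d   # BPIC doubles the complexity penalty
```

The doubled penalty makes BPIC more conservative than DIC when comparing models of different complexity.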


See also

* Akaike information criterion (AIC)
* Bayesian information criterion (BIC)
* Focused information criterion (FIC)
* Hannan–Quinn information criterion
* Kullback–Leibler divergence
* Jensen–Shannon divergence
* Watanabe–Akaike information criterion (WAIC)


References

* Ando, T. (2010). ''Bayesian Model Selection and Statistical Modeling''. CRC Press. Chapter 7.
* Claeskens, G., and Hjort, N. L. (2008). ''Model Selection and Model Averaging''. Cambridge University Press. Section 3.5.
* van der Linde, A. (2005). "DIC in variable selection". ''Statistica Neerlandica'', 59: 45–56. doi:10.1111/j.1467-9574.2005.00278.x

