In statistics, deviance is a goodness-of-fit statistic for a statistical model; it is often used for statistical hypothesis testing. It is a generalization of the idea of using the sum of squares of residuals (SSR) in ordinary least squares to cases where model-fitting is achieved by maximum likelihood. It plays an important role in exponential dispersion models and generalized linear models.


Definition

The unit deviance d(y,\mu) is a bivariate function that satisfies the following conditions:
* d(y,y) = 0
* d(y,\mu) > 0 \quad\forall y \neq \mu

The total deviance D(\mathbf{y},\hat{\boldsymbol\mu}) of a model with predictions \hat{\boldsymbol\mu} of the observation \mathbf{y} is the sum of its unit deviances: D(\mathbf{y},\hat{\boldsymbol\mu}) = \sum_i d(y_i, \hat{\mu}_i).

The (total) deviance for a model ''M''0 with estimates \hat{\mu} = E[Y \mid \hat{\theta}_0], based on a dataset ''y'', may be constructed by its likelihood as (McCullagh and Nelder (1989): page 17): D(y,\hat{\mu}) = 2 \left(\log \left[p(y\mid\hat \theta_s)\right] - \log \left[p(y\mid\hat \theta_0)\right]\right). Here \hat \theta_0 denotes the fitted values of the parameters in the model ''M''0, while \hat \theta_s denotes the fitted parameters for the ''saturated model''; both sets of fitted values are implicitly functions of the observations ''y''. The saturated model is a model with a parameter for every observation, so that the data are fitted exactly. This expression is simply 2 times the log-likelihood ratio of the full model compared to the reduced model.

The deviance is used to compare two models, in particular in the case of generalized linear models (GLM), where it has a similar role to the residual sum of squares (RSS) from ANOVA in linear models.

Suppose that, in the framework of the GLM, we have two nested models, ''M1'' and ''M2''. In particular, suppose that ''M1'' contains the parameters in ''M2'', and ''k'' additional parameters. Then, under the null hypothesis that ''M2'' is the true model, the difference between the deviances for the two models follows, based on Wilks' theorem, an approximate chi-squared distribution with ''k'' degrees of freedom. This can be used for hypothesis testing on the deviance.

Some usage of the term "deviance" can be confusing. According to Collett (Collett (2003): page 76): "the quantity -2 \log \big[p(y\mid\hat \theta_0)\big] is sometimes referred to as a ''deviance''. This is [...] inappropriate, since unlike the deviance used in the context of generalized linear modelling, -2 \log \big[p(y\mid\hat \theta_0)\big] does not measure deviation from a model that is a perfect fit to the data." However, since the principal use is in the form of the difference of the deviances of two models, this confusion in definition is unimportant.
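The likelihood-based construction and the nested-model test can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration: the counts are made up, the model is Poisson, ''M2'' fits one common mean, and ''M1'' fits one mean per group (so ''k'' = 1). The helper names `poisson_loglik`, `deviance`, and `chi2_sf` are hypothetical and not taken from any library; the chi-squared tail probability is computed from the standard recurrence for the regularized incomplete gamma function, for integer degrees of freedom only.

```python
import math

def poisson_loglik(y, mu):
    """Poisson log-likelihood of counts y given one mean per observation."""
    total = 0.0
    for yi, mi in zip(y, mu):
        # y*log(mu) is taken as 0 when y == 0 (also covers a saturated fit with mu == 0)
        log_term = yi * math.log(mi) if yi > 0 else 0.0
        total += log_term - mi - math.lgamma(yi + 1)
    return total

def deviance(y, mu_hat):
    """2 * (saturated log-likelihood - fitted log-likelihood)."""
    saturated = poisson_loglik(y, y)  # saturated model: fitted mean equals each observation
    return 2.0 * (saturated - poisson_loglik(y, mu_hat))

def chi2_sf(x, k):
    """P(X > x) for a chi-squared variable with integer k >= 1 degrees of freedom."""
    if x <= 0:
        return 1.0
    h = x / 2.0
    if k % 2 == 0:
        a, q = 1.0, math.exp(-h)              # Q(1, h)
    else:
        a, q = 0.5, math.erfc(math.sqrt(h))   # Q(1/2, h)
    # Recurrence: Q(a + 1, h) = Q(a, h) + h^a * exp(-h) / Gamma(a + 1)
    while a < k / 2.0:
        q += math.exp(a * math.log(h) - h - math.lgamma(a + 1.0))
        a += 1.0
    return q

# Hypothetical count data in two groups (illustrative values only).
group_a = [2, 3, 1, 4, 2]
group_b = [6, 5, 8, 7, 4]
y = group_a + group_b

# M2 (null): one common mean. M1: a separate mean per group, i.e. k = 1 extra parameter.
# For these simple models the maximum-likelihood fitted means are the sample means.
mu_m2 = [sum(y) / len(y)] * len(y)
mu_m1 = ([sum(group_a) / len(group_a)] * len(group_a)
         + [sum(group_b) / len(group_b)] * len(group_b))

d_m2 = deviance(y, mu_m2)
d_m1 = deviance(y, mu_m1)
k = 1
p_value = chi2_sf(d_m2 - d_m1, k)

print(f"deviance M2 = {d_m2:.3f}, deviance M1 = {d_m1:.3f}")
print(f"difference = {d_m2 - d_m1:.3f}, approximate p-value = {p_value:.4f}")
```

The saturated log-likelihood cancels in the difference `d_m2 - d_m1`, which is why the ambiguity in the definition noted by Collett does not affect the test. The chi-squared reference distribution is only an asymptotic approximation, so a p-value from a sample this small should be read with caution.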


Examples

The unit deviance for the Poisson distribution is d(y,\mu) = 2\left(y\log\frac{y}{\mu}-y+\mu\right); the unit deviance for the normal distribution is given by d(y,\mu) = \left(y-\mu\right)^2.
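The two unit deviances can be written as short functions and summed into a total deviance. The sketch below is illustrative only: the function names and the observations and fitted means are hypothetical, and the term y\log(y/\mu) is taken as 0 when y = 0 (its limiting value), a convention the text above does not state.

```python
import math

def poisson_unit_deviance(y, mu):
    """d(y, mu) = 2 * (y*log(y/mu) - y + mu), for mu > 0."""
    # The y*log(y/mu) term is taken as 0 when y == 0 (its limiting value).
    log_term = y * math.log(y / mu) if y > 0 else 0.0
    return 2.0 * (log_term - y + mu)

def normal_unit_deviance(y, mu):
    """d(y, mu) = (y - mu)^2."""
    return (y - mu) ** 2

def total_deviance(unit_deviance, y, mu_hat):
    """Sum of unit deviances over all observations."""
    return sum(unit_deviance(yi, mi) for yi, mi in zip(y, mu_hat))

# Hypothetical observations and fitted means (illustrative values only).
y = [2.0, 0.0, 5.0, 3.0]
mu_hat = [1.5, 0.5, 4.0, 3.0]

poisson_D = total_deviance(poisson_unit_deviance, y, mu_hat)
normal_D = total_deviance(normal_unit_deviance, y, mu_hat)

# For the normal unit deviance, the total deviance is the residual sum of squares.
rss = sum((yi - mi) ** 2 for yi, mi in zip(y, mu_hat))

print(f"Poisson total deviance: {poisson_D:.4f}")
print(f"Normal total deviance:  {normal_D:.4f} (residual sum of squares: {rss:.4f})")
```

Both functions return 0 when y equals \mu and a positive value otherwise, matching the two defining conditions of a unit deviance. The normal case also shows concretely how the deviance generalizes the sum of squares of residuals.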


See also

* Akaike information criterion
* Deviance information criterion
* Hosmer–Lemeshow test, a quality of fit statistic that can be used for binary data
* Pearson's chi-squared test, an alternative quality of fit statistic for generalized linear models for count data
* Peirce's criterion



External links


* Generalized Linear Models - Edward F. Connor
