The goodness of fit of a statistical model describes how well it fits a set of observations. Measures of goodness of fit typically summarize the discrepancy between observed values and the values expected under the model in question. Such measures can be used in statistical hypothesis testing, e.g. to test for normality of residuals, to test whether two samples are drawn from identical distributions (see Kolmogorov–Smirnov test), or whether outcome frequencies follow a specified distribution (see Pearson's chi-square test). In the analysis of variance, one of the components into which the variance is partitioned may be a lack-of-fit sum of squares.
Fit of distributions
In assessing whether a given distribution is suited to a data set, the following tests and their underlying measures of fit can be used:
*Bayesian information criterion
*Kolmogorov–Smirnov test
*Cramér–von Mises criterion
*Anderson–Darling test
*Shapiro–Wilk test
*Chi-squared test
*Akaike information criterion
*Hosmer–Lemeshow test
*Kuiper's test
*Kernelized Stein discrepancy
*Zhang's Z_K, Z_C and Z_A tests
*Moran test
*Density Based Empirical Likelihood Ratio tests
Regression analysis
In regression analysis, the following topics relate to goodness of fit:
*Coefficient of determination (the R-squared measure of goodness of fit);
*Lack-of-fit sum of squares;
*Reduced chi-square;
*Regression validation;
*Mallows's Cp criterion.
Categorical data
The following are examples that arise in the context of categorical data.
Pearson's chi-square test
Pearson's chi-square test uses a measure of goodness of fit which is the sum of differences between observed and expected outcome frequencies (that is, counts of observations), each squared and divided by the expectation:
: \chi^2 = \sum_{i} \frac{(O_i - E_i)^2}{E_i}
where:
*''O_i'' = an observed count for bin ''i''
*''E_i'' = an expected count for bin ''i'', asserted by the null hypothesis.
The expected frequency is calculated by:
: E_i = \bigl(F(Y_u) - F(Y_l)\bigr) \, N
where:
*''F'' = the cumulative distribution function for the probability distribution being tested,
*''Y_u'' = the upper limit for class ''i'',
*''Y_l'' = the lower limit for class ''i'', and
*''N'' = the sample size
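The expected-frequency formula can be sketched in a few lines of Python. This is only an illustration: the exponential distribution, its rate, the class limits and the sample size below are assumptions chosen for the example, not values from the text.

```python
import math

def expected_counts(cdf, limits, n):
    """Return E_i = (F(Y_u) - F(Y_l)) * N for each consecutive pair of class limits."""
    return [(cdf(upper) - cdf(lower)) * n
            for lower, upper in zip(limits[:-1], limits[1:])]

def exp_cdf(x, rate=1.0):
    """CDF of a hypothetical exponential distribution (illustrative choice)."""
    return 1.0 - math.exp(-rate * x) if x > 0 else 0.0

# Illustrative class limits; the last class is open-ended.
limits = [0.0, 0.5, 1.0, 2.0, math.inf]
expected = expected_counts(exp_cdf, limits, n=200)
```

Because the classes cover the whole support of the distribution, the expected counts add up to the sample size.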
The resulting value can be compared with a chi-square distribution to determine the goodness of fit. The chi-square distribution has (''k'' − ''c'') degrees of freedom, where ''k'' is the number of non-empty cells and ''c'' is the number of estimated parameters (including location, scale and shape parameters) for the distribution plus one. For example, for a 3-parameter Weibull distribution, ''c'' = 4.
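A minimal sketch of the whole procedure, using only the Python standard library, might look as follows. The counts are hypothetical, and `chi2_sf` is a helper written for this illustration (it builds the upper-tail probability from the recurrence Q(a + 1, y) = Q(a, y) + y^a e^(−y) / Γ(a + 1) for the regularized incomplete gamma function); it is not a library function.

```python
import math

def pearson_chi2(observed, expected):
    """Sum of (O_i - E_i)^2 / E_i over all cells."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with integer df >= 1."""
    y = x / 2.0
    if df % 2:                       # odd df: start from Q(1/2, y) = erfc(sqrt(y))
        a, q = 0.5, math.erfc(math.sqrt(y))
    else:                            # even df: start from Q(1, y) = exp(-y)
        a, q = 1.0, math.exp(-y)
    while a < df / 2.0:              # step a up to df / 2
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q

# Hypothetical data: four cells, fully specified distribution (no estimated parameters).
observed = [18, 22, 31, 29]
expected = [25.0, 25.0, 25.0, 25.0]
n_estimated_params = 0

k = sum(1 for o in observed if o > 0)   # number of non-empty cells
c = n_estimated_params + 1              # estimated parameters plus one
df = k - c
stat = pearson_chi2(observed, expected)
p_value = chi2_sf(stat, df)
```

With these assumed counts the statistic is 4.4 on 3 degrees of freedom.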
Example: equal frequencies of men and women
For example, to test the hypothesis that a random sample of 100 people has been drawn from a population in which men and women are equal in frequency, the observed number of men and women would be compared to the theoretical frequencies of 50 men and 50 women. If there were 44 men in the sample and 56 women, then
: \chi^2 = \frac{(44 - 50)^2}{50} + \frac{(56 - 50)^2}{50} = 1.44.
If the null hypothesis is true (i.e., men and women are chosen with equal probability in the sample), the test statistic will be drawn from a chi-square distribution with one degree of freedom. Though one might expect two degrees of freedom (one each for the men and women), we must take into account that the total number of men and women is constrained (100), and thus there is only one degree of freedom (2 − 1). In other words, if the male count is known the female count is determined, and vice versa.
Consultation of the chi-square distribution for 1 degree of freedom shows that the probability of observing a difference at least this large (\chi^2 \ge 1.44) if men and women are equally numerous in the population is approximately 0.23. This probability is higher than the conventionally accepted criteria for statistical significance (a probability of 0.001 to 0.05), so normally we would not reject the null hypothesis that the number of men in the population is the same as the number of women (i.e., we would consider our sample within the range of what we would expect for a 50/50 male/female ratio).
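The arithmetic of this example can be checked with a short Python sketch. It relies on the fact that a chi-square variable with one degree of freedom is the square of a standard normal variable, so its upper-tail probability can be written with the complementary error function; the variable names are illustrative.

```python
import math

observed = [44, 56]   # men, women in the sample
expected = [50, 50]   # counts expected under a 50/50 population

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# For 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(chi2 / 2.0))

print(round(chi2, 2), round(p_value, 2))  # prints: 1.44 0.23
```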
Note the assumption that the mechanism that has generated the sample is random, in the sense of independent random selection with the same probability, here 0.5 for both males and females. If, for example, each of the 44 males selected brought a male buddy, and each of the 56 females brought a female buddy, each (''O_i'' − ''E_i'')^2 will increase by a factor of 4, while each ''E_i'' will increase by a factor of 2. The value of the statistic will double to 2.88. Knowing this underlying mechanism, we should of course be counting pairs. In general, the mechanism, if not defensibly random, will not be known. The distribution to which the test statistic should be referred may, accordingly, be very different from chi-square.
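The effect of such non-independent sampling on the statistic can be illustrated with a small sketch, in which every count is simply doubled. The function name is an illustrative choice.

```python
def pearson_chi2(observed, expected):
    """Sum of (O_i - E_i)^2 / E_i over all cells."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Independent sample of 100 people.
single = pearson_chi2([44, 56], [50, 50])

# Same sample, but every person brings a same-sex buddy: all counts double.
paired = pearson_chi2([88, 112], [100, 100])
```

Here `paired` is twice `single` although no new independent information has been collected, which is why the pairs, not the individuals, should be counted.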
Binomial case
A binomial experiment is a sequence of independent trials in which the trials can result in one of two outcomes, success or failure. There are ''n'' trials each with probability of success, denoted by ''p''. Provided that ''np_i'' ≫ 1 for every ''i'' (where ''i'' = 1, 2, ..., ''k''), then
: \chi^2 = \sum_{i=1}^{k} \frac{(O_i - n p_i)^2}{n p_i},
where ''O_i'' is the observed count in cell ''i'' and ''np_i'' is its expected count. This has approximately a chi-square distribution with ''k'' − 1 degrees of freedom. The fact that there are ''k'' − 1 degrees of freedom is a consequence of the restriction
: \sum_{i=1}^{k} O_i = n.
There are ''k'' observed cell counts; however, once any ''k'' − 1 are known, the remaining one is uniquely determined. In other words, there are only ''k'' − 1 freely determined cell counts, and thus ''k'' − 1 degrees of freedom.
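As a sketch of this case, the statistic can be computed directly from the cell counts and the hypothesized cell probabilities. The die-rolling data below are hypothetical and serve only to show the ''k'' − 1 degrees of freedom.

```python
def chi2_from_probs(counts, probs):
    """Sum of (O_i - n*p_i)^2 / (n*p_i), with n taken as the total count."""
    n = sum(counts)   # the restriction: cell counts add up to n
    return sum((o - n * p) ** 2 / (n * p) for o, p in zip(counts, probs))

# Hypothetical data: 60 rolls of a die assumed fair under the null hypothesis.
counts = [8, 9, 12, 11, 6, 14]
probs = [1.0 / 6.0] * 6

stat = chi2_from_probs(counts, probs)
df = len(counts) - 1   # k - 1: the last count is fixed by the other five
```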
''G''-test
''G''-tests are likelihood-ratio tests of statistical significance that are increasingly being used in situations where Pearson's chi-square tests were previously recommended.
The general formula for ''G'' is
: G = 2 \sum_{i} O_i \ln\left(\frac{O_i}{E_i}\right),
where ''O_i'' and ''E_i'' are the same as for the chi-square test, \ln denotes the natural logarithm, and the sum is taken over all non-empty cells. Furthermore, the total observed count should be equal to the total expected count:
: \sum_{i} O_i = \sum_{i} E_i = N,
where ''N'' is the total number of observations.
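A minimal Python sketch of the ''G'' statistic is given below. It reuses the 44/56 counts from the men-and-women example purely for comparison; the function name and the error handling are illustrative choices.

```python
import math

def g_statistic(observed, expected):
    """G = 2 * sum of O_i * ln(O_i / E_i), taken over the non-empty cells."""
    if not math.isclose(sum(observed), sum(expected)):
        raise ValueError("total observed and total expected counts must match")
    return 2.0 * sum(o * math.log(o / e)
                     for o, e in zip(observed, expected) if o > 0)

g = g_statistic([44, 56], [50, 50])
```

For these counts ''G'' comes out at about 1.44, close to the value of Pearson's statistic for the same data.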
''G''-tests have been recommended at least since the 1981 edition of the popular statistics textbook by Robert R. Sokal and F. James Rohlf.
See also
*All models are wrong
*Deviance (statistics) (related to GLM)
*Overfitting
*Statistical model validation
*Theil–Sen estimator
Further reading
*Vexler, Albert; Gurevich, Gregory (2010). "Empirical likelihood ratios applied to goodness-of-fit tests based on sample entropy". ''Computational Statistics & Data Analysis''. 54 (2): 531–545. doi:10.1016/j.csda.2009.09.025.