In statistics the Cramér–von Mises criterion is a criterion used for judging the goodness of fit of a cumulative distribution function F^* compared to a given empirical distribution function F_n, or for comparing two empirical distributions. It is also used as a part of other algorithms, such as minimum distance estimation. It is defined as
:\omega^2 = \int_{-\infty}^{\infty} [F_n(x) - F^*(x)]^2\,\mathrm{d}F^*(x)
In one-sample applications F^* is the theoretical distribution and F_n is the empirically observed distribution. Alternatively the two distributions can both be empirically estimated ones; this is called the two-sample case.

The criterion is named after Harald Cramér and Richard Edler von Mises, who first proposed it in 1928–1930. The generalization to two samples is due to Anderson. The Cramér–von Mises test is an alternative to the Kolmogorov–Smirnov test (1933).


Cramér–von Mises test (one sample)

Let x_1,x_2,\cdots,x_n be the observed values, in increasing order. Then the statistic is (Pearson, E.S., Hartley, H.O. (1972) ''Biometrika Tables for Statisticians, Volume 2'', CUP, page 118 and Table 54)
:T = n\omega^2 = \frac{1}{12n} + \sum_{i=1}^n \left[ \frac{2i-1}{2n} - F(x_i) \right]^2.
If this value is larger than the tabulated value, then the hypothesis that the data came from the distribution F can be rejected.
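The one-sample statistic can be computed directly from the sorted sample. The following is a minimal Python sketch of the formula for T, not a reference implementation; the function names `cramer_von_mises_one_sample` and `uniform_cdf` are illustrative assumptions, and the hypothesised distribution F is passed in as a plain callable. Critical values still have to be taken from tables.

```python
from typing import Callable, Sequence


def cramer_von_mises_one_sample(
    sample: Sequence[float], cdf: Callable[[float], float]
) -> float:
    """Return T = n * omega^2 for `sample` against the hypothesised CDF."""
    xs = sorted(sample)  # the formula assumes increasing order
    n = len(xs)
    if n == 0:
        raise ValueError("sample must not be empty")
    total = 1.0 / (12.0 * n)
    for i, x in enumerate(xs, start=1):  # i runs from 1 to n
        total += ((2 * i - 1) / (2.0 * n) - cdf(x)) ** 2
    return total


def uniform_cdf(x: float) -> float:
    """CDF of the uniform distribution on [0, 1] (illustrative choice of F)."""
    return min(max(x, 0.0), 1.0)


if __name__ == "__main__":
    # Observations sitting exactly at (2i-1)/(2n) give the minimum T = 1/(12n).
    print(cramer_von_mises_one_sample([0.25, 0.75], uniform_cdf))
    print(cramer_von_mises_one_sample([0.1, 0.2], uniform_cdf))
```

The second sample is concentrated near 0, so its T is noticeably larger than that of the evenly spread first sample.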


Watson test

A modified version of the Cramér–von Mises test is the Watson test (Watson, G.S. (1961) "Goodness-Of-Fit Tests on a Circle", ''Biometrika'', 48 (1/2), 109-114), which uses the statistic U^2, where
:U^2 = T - n\left( \bar{F} - \tfrac{1}{2} \right)^2,
where
:\bar{F} = \frac{1}{n} \sum_{i=1}^n F(x_i).
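As a rough illustration, Watson's U^2 can be obtained by computing the one-sample T and subtracting the correction term. The sketch below is self-contained and hedged: `watson_u2` and `uniform_cdf` are hypothetical names chosen for this example, and the uniform distribution on [0, 1] is only an assumed stand-in for F.

```python
from typing import Callable, Sequence


def watson_u2(sample: Sequence[float], cdf: Callable[[float], float]) -> float:
    """Return Watson's U^2 = T - n * (F_bar - 1/2)^2."""
    xs = sorted(sample)
    n = len(xs)
    if n == 0:
        raise ValueError("sample must not be empty")
    u = [cdf(x) for x in xs]  # F(x_i) for the ordered observations
    # One-sample Cramér-von Mises statistic T = n * omega^2.
    t = 1.0 / (12.0 * n) + sum(
        ((2 * i - 1) / (2.0 * n) - ui) ** 2 for i, ui in enumerate(u, start=1)
    )
    f_bar = sum(u) / n  # mean of the F(x_i)
    return t - n * (f_bar - 0.5) ** 2


def uniform_cdf(x: float) -> float:
    """CDF of the uniform distribution on [0, 1] (illustrative choice of F)."""
    return min(max(x, 0.0), 1.0)


if __name__ == "__main__":
    print(watson_u2([0.1, 0.2], uniform_cdf))
    print(watson_u2([0.6, 0.7], uniform_cdf))
```

In this small example the two samples differ only by a shift of 0.5, and the two printed U^2 values agree up to rounding even though the corresponding T values differ.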


Cramér–von Mises test (two samples)

Let x_1,x_2,\cdots,x_N and y_1,y_2,\cdots,y_M be the observed values in the first and second sample respectively, in increasing order. Let r_1,r_2,\cdots,r_N be the ranks of the x's in the combined sample, and let s_1,s_2,\cdots,s_M be the ranks of the y's in the combined sample. Anderson shows that
:T = \frac{NM}{N+M} \omega^2 = \frac{U}{NM(N+M)} - \frac{4MN-1}{6(M+N)}
where U is defined as
:U = N \sum_{i=1}^N (r_i-i)^2 + M \sum_{j=1}^M (s_j-j)^2
If the value of T is larger than the tabulated values, the hypothesis that the two samples come from the same distribution can be rejected. (Some books give critical values for U, which is more convenient, as it avoids the need to compute T via the expression above. The conclusion will be the same.)

The above assumes there are no duplicates in the x, y, and r sequences. So x_i is unique, and its rank is i in the sorted list x_1,\ldots,x_N. If there are duplicates, and x_i through x_j are a run of identical values in the sorted list, then one common approach is the ''midrank'' method (Ruymgaart, F. H. (1980) "A unified approach to the asymptotic distribution theory of certain midrank statistics". In: ''Statistique non Parametrique Asymptotique'', 1–18, J. P. Raoult (Ed.), Lecture Notes on Mathematics, No. 821, Springer, Berlin): assign each duplicate a "rank" of (i+j)/2. In the above equations, in the expressions (r_i-i)^2 and (s_j-j)^2, duplicates can modify all four variables r_i, i, s_j, and j.
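The two-sample computation, including the midrank treatment of ties, might be sketched as follows. This is an illustrative Python sketch under stated assumptions rather than a definitive implementation: the helper names `midranks` and `cramer_von_mises_two_sample` are hypothetical, and midranks are applied both to the combined-sample ranks (r_i, s_j) and to the within-sample positions (i, j), following the remark that duplicates can modify all four quantities.

```python
from typing import List, Sequence, Tuple


def midranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; a run of ties at sorted positions i..j gets (i + j) / 2."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend `end` to the last element of the run of equal values.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = ((pos + 1) + (end + 1)) / 2.0
        for k in order[pos:end + 1]:
            ranks[k] = shared
        pos = end + 1
    return ranks


def cramer_von_mises_two_sample(
    x: Sequence[float], y: Sequence[float]
) -> Tuple[float, float]:
    """Return (T, U) for two samples using the rank formula quoted above."""
    n, m = len(x), len(y)
    if n == 0 or m == 0:
        raise ValueError("both samples must be non-empty")
    combined = midranks(list(x) + list(y))
    r, s = combined[:n], combined[n:]  # ranks in the combined sample
    i_rank, j_rank = midranks(x), midranks(y)  # positions within each sample
    u = n * sum((a - b) ** 2 for a, b in zip(r, i_rank)) + m * sum(
        (a - b) ** 2 for a, b in zip(s, j_rank)
    )
    t = u / (n * m * (n + m)) - (4 * m * n - 1) / (6.0 * (m + n))
    return t, u


if __name__ == "__main__":
    print(cramer_von_mises_two_sample([1, 3], [2, 4]))  # interleaved samples
    print(cramer_von_mises_two_sample([1, 2], [3, 4]))  # fully separated samples
    print(cramer_von_mises_two_sample([1, 2], [2, 3]))  # one tie across samples
```

Pairing each observation with its own within-sample rank means the inputs do not need to be sorted beforehand. The fully separated samples give a larger T than the interleaved ones, as expected.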

