D'Agostino's K-squared Test

In statistics, D'Agostino's ''K''2 test, named for Ralph D'Agostino, is a goodness-of-fit measure of departure from normality. That is, the test aims to gauge the compatibility of given data with the null hypothesis that the data are a realization of independent, identically distributed Gaussian random variables. The test is based on transformations of the sample kurtosis and skewness, and has power only against alternatives under which the distribution is skewed and/or kurtic.


Skewness and kurtosis

In the following, { ''x''1, ..., ''x''''n'' } denotes a sample of ''n'' observations, ''g''1 and ''g''2 are the sample skewness and kurtosis, the ''m''''j''’s are the ''j''-th sample central moments, and \bar{x} is the sample mean. Frequently in the literature related to normality testing, the skewness and kurtosis are denoted as \sqrt{\beta_1} and ''β''2 respectively. Such notation can be inconvenient since, for example, \sqrt{\beta_1} can be a negative quantity.

The sample skewness and kurtosis are defined as

: \begin{align}
& g_1 = \frac{m_3}{m_2^{3/2}} = \frac{\tfrac{1}{n} \sum_{i=1}^n (x_i-\bar{x})^3}{\left(\tfrac{1}{n} \sum_{i=1}^n (x_i-\bar{x})^2\right)^{3/2}}\ , \\
& g_2 = \frac{m_4}{m_2^2} - 3 = \frac{\tfrac{1}{n} \sum_{i=1}^n (x_i-\bar{x})^4}{\left(\tfrac{1}{n} \sum_{i=1}^n (x_i-\bar{x})^2\right)^2} - 3\ .
\end{align}

These quantities consistently estimate the theoretical skewness and kurtosis of the distribution, respectively. Moreover, if the sample indeed comes from a normal population, then the exact finite-sample distributions of the skewness and kurtosis can themselves be analysed in terms of their means ''μ''1, variances ''μ''2, skewnesses ''γ''1, and kurtoses ''γ''2. This has been done by Pearson (1931), who derived the following expressions:

: \begin{align}
& \mu_1(g_1) = 0, \\
& \mu_2(g_1) = \frac{6(n-2)}{(n+1)(n+3)}, \\
& \gamma_1(g_1) \equiv \frac{\mu_3(g_1)}{\mu_2(g_1)^{3/2}} = 0, \\
& \gamma_2(g_1) \equiv \frac{\mu_4(g_1)}{\mu_2(g_1)^{2}} - 3 = \frac{36(n-7)(n^2+2n-5)}{(n-2)(n+5)(n+7)(n+9)},
\end{align}

and

: \begin{align}
& \mu_1(g_2) = -\frac{6}{n+1}, \\
& \mu_2(g_2) = \frac{24n(n-2)(n-3)}{(n+1)^2(n+3)(n+5)}, \\
& \gamma_1(g_2) \equiv \frac{\mu_3(g_2)}{\mu_2(g_2)^{3/2}} = \frac{6(n^2-5n+2)}{(n+7)(n+9)} \sqrt{\frac{6(n+3)(n+5)}{n(n-2)(n-3)}}, \\
& \gamma_2(g_2) \equiv \frac{\mu_4(g_2)}{\mu_2(g_2)^{2}} - 3 = \frac{36(15n^6-36n^5-628n^4+982n^3+5777n^2-6402n+900)}{n(n-3)(n-2)(n+7)(n+9)(n+11)(n+13)}.
\end{align}

For example, a sample of size ''n'' = 1000 drawn from a normally distributed population can be expected to have a skewness of 0 (SD 0.077) and a kurtosis of 0 (SD 0.154), where SD indicates the standard deviation.
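As an illustration, these formulas translate directly into code. The following is a sketch (not from the original article; NumPy only, with function names of our own choosing) that computes ''g''1 and ''g''2 for a sample and evaluates Pearson's null-distribution moments; for ''n'' = 1000 it reproduces the standard deviations quoted in the example above.

```python
import numpy as np

def sample_skew_kurtosis(x):
    """Sample skewness g1 = m3 / m2**1.5 and excess kurtosis g2 = m4 / m2**2 - 3."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    m2, m3, m4 = (np.mean(d ** k) for k in (2, 3, 4))
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0

def null_moments_g1(n):
    """Mean, variance, skewness, kurtosis of g1 under normality (Pearson)."""
    mu2 = 6.0 * (n - 2) / ((n + 1) * (n + 3))
    gamma2 = 36.0 * (n - 7) * (n ** 2 + 2 * n - 5) / (
        (n - 2) * (n + 5) * (n + 7) * (n + 9))
    return 0.0, mu2, 0.0, gamma2

def null_moments_g2(n):
    """Mean, variance, skewness, kurtosis of g2 under normality (Pearson)."""
    mu1 = -6.0 / (n + 1)
    mu2 = 24.0 * n * (n - 2) * (n - 3) / ((n + 1) ** 2 * (n + 3) * (n + 5))
    gamma1 = (6.0 * (n ** 2 - 5 * n + 2) / ((n + 7) * (n + 9))
              * np.sqrt(6.0 * (n + 3) * (n + 5) / (n * (n - 2) * (n - 3))))
    gamma2 = 36.0 * (15 * n ** 6 - 36 * n ** 5 - 628 * n ** 4 + 982 * n ** 3
                     + 5777 * n ** 2 - 6402 * n + 900) / (
        n * (n - 3) * (n - 2) * (n + 7) * (n + 9) * (n + 11) * (n + 13))
    return mu1, mu2, gamma1, gamma2

print(np.sqrt(null_moments_g1(1000)[1]))  # ~0.077, SD of g1 at n = 1000
print(np.sqrt(null_moments_g2(1000)[1]))  # ~0.154, SD of g2 at n = 1000
```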


Transformed sample skewness and kurtosis

The sample skewness ''g''1 and kurtosis ''g''2 are both asymptotically normal. However, the rate of their convergence to the distribution limit is frustratingly slow, especially for ''g''2. For example, even with ''n'' = 2000 observations the sample kurtosis ''g''2 has both a skewness and a kurtosis of approximately 0.3, which is not negligible. In order to remedy this situation, it has been suggested to transform the quantities ''g''1 and ''g''2 in a way that makes their distribution as close to standard normal as possible. In particular, D'Agostino (1970) suggested the following transformation for sample skewness:

: Z_1(g_1) = \delta \operatorname{asinh}\left( \frac{g_1}{\alpha\sqrt{\mu_2}} \right),

where the constants ''α'' and ''δ'' are computed as

: \begin{align}
& W^2 = \sqrt{2\gamma_2 + 4} - 1, \\
& \delta = 1 / \sqrt{\ln W}, \\
& \alpha^2 = 2 / (W^2 - 1),
\end{align}

and where ''μ''2 = ''μ''2(''g''1) is the variance of ''g''1 and ''γ''2 = ''γ''2(''g''1) is the kurtosis of ''g''1, using the expressions given in the previous section. Similarly, Anscombe & Glynn (1983) suggested a transformation for ''g''2, which works reasonably well for sample sizes of 20 or greater:

: Z_2(g_2) = \sqrt{\frac{9A}{2}} \left\{ 1 - \frac{2}{9A} - \left( \frac{1 - 2/A}{1 + \frac{g_2-\mu_1}{\sqrt{\mu_2}} \sqrt{2/(A-4)}} \right)^{\!1/3} \right\},

where

: A = 6 + \frac{8}{\gamma_1} \left( \frac{2}{\gamma_1} + \sqrt{1 + \frac{4}{\gamma_1^2}} \right),

and ''μ''1 = ''μ''1(''g''2), ''μ''2 = ''μ''2(''g''2), ''γ''1 = ''γ''1(''g''2) are the quantities computed by Pearson, as given in the previous section.
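Continuing the sketch above (same caveats: our own function names, formulas as given in this section), the two transformations can be written as:

```python
import numpy as np

def Z1(g1, n):
    """D'Agostino (1970) transform of the sample skewness g1."""
    mu2 = 6.0 * (n - 2) / ((n + 1) * (n + 3))
    gamma2 = 36.0 * (n - 7) * (n ** 2 + 2 * n - 5) / (
        (n - 2) * (n + 5) * (n + 7) * (n + 9))
    W2 = np.sqrt(2.0 * gamma2 + 4.0) - 1.0
    delta = 1.0 / np.sqrt(0.5 * np.log(W2))  # = 1 / sqrt(ln W), W = sqrt(W2)
    alpha = np.sqrt(2.0 / (W2 - 1.0))
    return delta * np.arcsinh(g1 / (alpha * np.sqrt(mu2)))

def Z2(g2, n):
    """Anscombe & Glynn (1983) transform of the sample excess kurtosis g2."""
    mu1 = -6.0 / (n + 1)
    mu2 = 24.0 * n * (n - 2) * (n - 3) / ((n + 1) ** 2 * (n + 3) * (n + 5))
    gamma1 = (6.0 * (n ** 2 - 5 * n + 2) / ((n + 7) * (n + 9))
              * np.sqrt(6.0 * (n + 3) * (n + 5) / (n * (n - 2) * (n - 3))))
    A = 6.0 + (8.0 / gamma1) * (2.0 / gamma1 + np.sqrt(1.0 + 4.0 / gamma1 ** 2))
    t = (1.0 - 2.0 / A) / (
        1.0 + (g2 - mu1) / np.sqrt(mu2) * np.sqrt(2.0 / (A - 4.0)))
    return np.sqrt(9.0 * A / 2.0) * (1.0 - 2.0 / (9.0 * A) - np.cbrt(t))
```

Here np.arcsinh is the inverse hyperbolic sine appearing in the formula for ''Z''1, and np.cbrt takes the real cube root, so negative values of the inner ratio are handled correctly.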


Omnibus ''K''2 statistic

Statistics ''Z''1 and ''Z''2 can be combined to produce an omnibus test, able to detect deviations from normality due to either skewness or kurtosis (D'Agostino and Pearson 1973):

: K^2 = Z_1(g_1)^2 + Z_2(g_2)^2.

If the null hypothesis of normality is true, then ''K''2 is approximately ''χ''2-distributed with 2 degrees of freedom. Note that the statistics ''g''1 and ''g''2 are not independent, only uncorrelated. Therefore, their transforms ''Z''1 and ''Z''2 will be dependent as well, rendering the validity of the ''χ''2 approximation questionable. Simulations show that under the null hypothesis the ''K''2 test statistic has a mean, standard deviation, and upper quantiles close to, though not exactly matching, those of the ''χ''2 distribution with 2 degrees of freedom, with the agreement improving as the sample size grows.
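As a final sketch (again building on the snippets above, so sample_skew_kurtosis, Z1 and Z2 are assumed to be in scope), the omnibus statistic and its ''χ''2 ''p''-value can be computed as follows. SciPy's scipy.stats.normaltest implements this same D'Agostino–Pearson test and serves as a cross-check.

```python
import numpy as np
from scipy import stats

def dagostino_k2(x):
    """Omnibus K^2 = Z1^2 + Z2^2 with its chi-squared(2) p-value."""
    x = np.asarray(x, dtype=float)
    n = x.size
    g1, g2 = sample_skew_kurtosis(x)     # from the first snippet
    k2 = Z1(g1, n) ** 2 + Z2(g2, n) ** 2
    return k2, stats.chi2.sf(k2, df=2)   # upper-tail probability of chi2(2)

rng = np.random.default_rng(0)
x = rng.normal(size=500)
print(dagostino_k2(x))      # a large p-value is expected for Gaussian data
print(stats.normaltest(x))  # SciPy's built-in version, for comparison
```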


See also

* Shapiro–Wilk test
* Jarque–Bera test

