Shapiro–Wilk test

The Shapiro–Wilk test is a test of normality in frequentist statistics. It was published in 1965 by Samuel Sanford Shapiro and Martin Wilk.


Theory

The Shapiro–Wilk test tests the null hypothesis that a sample x_1, \dots, x_n came from a normally distributed population. The test statistic is
:W = \frac{\left(\sum_{i=1}^{n} a_i x_{(i)}\right)^2}{\sum_{i=1}^{n} \left(x_i - \overline{x}\right)^2},
where
* x_{(i)}, with parentheses enclosing the subscript index ''i'', is the ''i''th order statistic, i.e., the ''i''th-smallest number in the sample (not to be confused with x_i).
* \overline{x} = \left( x_1 + \cdots + x_n \right) / n is the sample mean.
The coefficients a_i are given by (p. 593)
:(a_1,\dots,a_n) = \frac{m^{\mathsf{T}} V^{-1}}{C},
where ''C'' is a vector norm:
:C = \| V^{-1} m \| = (m^{\mathsf{T}} V^{-1} V^{-1} m)^{1/2}
and the vector ''m'',
:m = (m_1,\dots,m_n)^{\mathsf{T}},
is made of the expected values of the order statistics of independent and identically distributed random variables sampled from the standard normal distribution; finally, ''V'' is the covariance matrix of those normal order statistics.

There is no name for the distribution of ''W''. The cutoff values for the statistic are calculated through Monte Carlo simulations.
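
The definitions above can be turned into a small numerical sketch. The version below is an illustration only, not the published procedure: it estimates ''m'' and ''V'' by Monte Carlo simulation instead of using tabulated values, solves V y = m by Gaussian elimination, and then imposes the antisymmetry a_i = -a_{n+1-i} of the exact coefficients to damp simulation noise. All function names (`normal_order_moments`, `solve`, `sw_coefficients`, `shapiro_wilk_w`) and the defaults for `reps` and `seed` are assumptions made for this example.

```python
import math
import random


def normal_order_moments(n, reps=20000, seed=0):
    """Monte Carlo estimates of m (means) and V (covariances) of the
    order statistics of n standard normal draws."""
    rng = random.Random(seed)
    draws = [sorted(rng.gauss(0.0, 1.0) for _ in range(n)) for _ in range(reps)]
    m = [sum(d[i] for d in draws) / reps for i in range(n)]
    V = [[sum((d[i] - m[i]) * (d[j] - m[j]) for d in draws) / (reps - 1)
          for j in range(n)] for i in range(n)]
    return m, V


def solve(A, b):
    """Solve A y = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    y = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][c] * y[c] for c in range(r + 1, n))
        y[r] = (M[r][n] - tail) / M[r][r]
    return y


def sw_coefficients(n, reps=20000, seed=0):
    """Approximate (a_1, ..., a_n) = m^T V^{-1} / C from simulated m and V."""
    m, V = normal_order_moments(n, reps, seed)
    y = solve(V, m)  # y = V^{-1} m; V is symmetric, so m^T V^{-1} = y^T
    # Assumption of this sketch: enforce a_i = -a_{n+1-i} to reduce noise.
    y = [(y[i] - y[n - 1 - i]) / 2 for i in range(n)]
    C = math.sqrt(sum(v * v for v in y))  # C = ||V^{-1} m||
    return [v / C for v in y]


def shapiro_wilk_w(x, a):
    """W = (sum a_i x_(i))^2 / sum (x_i - mean)^2."""
    xs = sorted(x)  # order statistics x_(1) <= ... <= x_(n)
    mean = sum(xs) / len(xs)
    numerator = sum(ai * xi for ai, xi in zip(a, xs)) ** 2
    denominator = sum((xi - mean) ** 2 for xi in xs)
    return numerator / denominator


if __name__ == "__main__":
    a = sw_coefficients(5)
    print(a)
    print(shapiro_wilk_w([2.1, 3.4, 1.9, 5.6, 4.2], a))
```

Because the coefficients have unit length and sum to zero, ''W'' cannot exceed 1; values close to 1 are consistent with normality. For real work a library implementation should be preferred, since the simulated ''m'' and ''V'' here carry sampling error.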


Interpretation

The null hypothesis of this test is that the population is normally distributed. If the ''p'' value is less than the chosen alpha level, the null hypothesis is rejected and there is evidence that the data tested are not normally distributed. If the ''p'' value is greater than the chosen alpha level, the null hypothesis (that the data came from a normally distributed population) cannot be rejected. For example, at an alpha level of .05, a data set with a ''p'' value below .05 leads to rejection of the null hypothesis, whereas a data set with a ''p'' value above .05 fails to reject it.

Like most statistical significance tests, if the sample size is sufficiently large this test may detect even trivial departures from the null hypothesis (i.e., although there may be some statistically significant effect, it may be too small to be of any practical significance); thus, additional investigation of the ''effect size'' is typically advisable, e.g., with a Q–Q plot in this case.
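
As a rough illustration of the Q–Q check mentioned above, the following sketch computes the coordinates of a normal Q–Q plot using only the Python standard library. The plotting positions (i - 0.5)/n are one common convention and are an assumption of this example, as are the function names `qq_points` and `qq_correlation`.

```python
import math
from statistics import NormalDist


def qq_points(sample):
    """Return (theoretical quantile, ordered observation) pairs."""
    xs = sorted(sample)
    n = len(xs)
    std_normal = NormalDist()  # standard normal, mean 0 and sd 1
    # Assumed plotting positions: (i - 0.5) / n for i = 1..n.
    theoretical = [std_normal.inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(theoretical, xs))


def qq_correlation(points):
    """Pearson correlation of the Q-Q points; near 1 means nearly straight."""
    n = len(points)
    mean_t = sum(t for t, _ in points) / n
    mean_s = sum(s for _, s in points) / n
    cov = sum((t - mean_t) * (s - mean_s) for t, s in points)
    var_t = sum((t - mean_t) ** 2 for t, _ in points)
    var_s = sum((s - mean_s) ** 2 for _, s in points)
    return cov / math.sqrt(var_t * var_s)


if __name__ == "__main__":
    pts = qq_points([4.1, 5.0, 3.2, 6.3, 4.8, 5.5, 3.9])
    for theoretical, observed in pts:
        print(f"{theoretical:+.3f}  {observed:.1f}")
    print(qq_correlation(pts))
```

Points that fall close to a straight line suggest that any departure from normality is small in practical terms, even when the test itself rejects at a large sample size.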


Power analysis

Monte Carlo simulation has found that Shapiro–Wilk has the best power for a given significance, followed closely by Anderson–Darling, when comparing the Shapiro–Wilk, Kolmogorov–Smirnov, Lilliefors, and Anderson–Darling tests.
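
A power comparison of this kind can be sketched as a small simulation. The example below is not a reproduction of the cited study: to stay short and dependency-free it uses a stand-in statistic, the squared correlation between the ordered sample and approximate normal quantiles (a probability-plot statistic in the spirit of Shapiro–Francia), rather than ''W'' itself. The names `ppcc_statistic` and `estimate_power`, the exponential alternative, and all default parameters are assumptions chosen for illustration.

```python
import random
from statistics import NormalDist


def ppcc_statistic(sample, quantiles):
    """Squared correlation between the ordered sample and normal quantiles.
    Stand-in for W in this sketch; small values indicate non-normality."""
    xs = sorted(sample)
    n = len(xs)
    mx = sum(xs) / n
    mq = sum(quantiles) / n
    sxy = sum((x - mx) * (q - mq) for x, q in zip(xs, quantiles))
    sxx = sum((x - mx) ** 2 for x in xs)
    sqq = sum((q - mq) ** 2 for q in quantiles)
    return sxy * sxy / (sxx * sqq)


def estimate_power(draw, n=20, alpha=0.05, null_reps=2000, alt_reps=1000, seed=1):
    """Estimate the rejection rate when observations come from draw(rng)."""
    rng = random.Random(seed)
    quantiles = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]
    # Step 1: simulate the statistic under normality to get a critical value.
    null_stats = sorted(
        ppcc_statistic([rng.gauss(0.0, 1.0) for _ in range(n)], quantiles)
        for _ in range(null_reps)
    )
    critical = null_stats[int(alpha * null_reps)]  # lower alpha-quantile
    # Step 2: count how often samples from draw() fall below that value.
    rejections = sum(
        ppcc_statistic([draw(rng) for _ in range(n)], quantiles) < critical
        for _ in range(alt_reps)
    )
    return rejections / alt_reps


if __name__ == "__main__":
    print("exponential data:", estimate_power(lambda r: r.expovariate(1.0)))
    print("normal data:     ", estimate_power(lambda r: r.gauss(0.0, 1.0)))
```

The rejection rate for normal data should sit near the chosen significance level, while the rate for a skewed alternative estimates the power. Repeating the procedure with each competing statistic gives a comparison of the kind described.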


Approximation

Royston proposed an alternative method of calculating the coefficients vector by providing an algorithm for calculating values that extended the sample size from 50 to 2,000. This technique is used in several software packages, including GraphPad Prism, Stata, SPSS and SAS. Rahman and Govindarajulu extended the sample size further, up to 5,000.
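
The coefficient approximation can be sketched as follows. This is a hedged reconstruction, not an authoritative port: the polynomial constants and the (i - 3/8)/(n + 1/4) approximation for the expected normal order statistics are recalled from the published algorithm (AS R94) and should be checked against the original listing before being relied on. The function name `royston_coefficients` is hypothetical, and the sketch covers only n ≥ 6 and only the coefficients, not the ''p'' value calculation.

```python
import math
from statistics import NormalDist

# Polynomial constants as recalled from algorithm AS R94 (verify before use).
C1 = [0.0, 0.221157, -0.147981, -2.071190, 4.434685, -2.706056]
C2 = [0.0, 0.042981, -0.293762, -1.752461, 5.682633, -3.582633]


def royston_coefficients(n):
    """Approximate Shapiro-Wilk coefficients a_1..a_n without forming V."""
    if n < 6:
        raise ValueError("this sketch only covers n >= 6")
    nd = NormalDist()
    # Approximate expected values of the standard normal order statistics.
    m = [nd.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]
    mm = sum(v * v for v in m)
    u = 1.0 / math.sqrt(n)

    def poly(c):
        return sum(ck * u ** k for k, ck in enumerate(c))

    # The two largest coefficients get a polynomial correction in u.
    a_n = m[-1] / math.sqrt(mm) + poly(C1)
    a_n1 = m[-2] / math.sqrt(mm) + poly(C2)
    # Scale the remaining coefficients so that the whole vector has unit length.
    phi = (mm - 2 * m[-1] ** 2 - 2 * m[-2] ** 2) / (1 - 2 * a_n ** 2 - 2 * a_n1 ** 2)
    a = [v / math.sqrt(phi) for v in m]
    a[-1], a[-2] = a_n, a_n1
    a[0], a[1] = -a_n, -a_n1  # antisymmetry: a_i = -a_{n+1-i}
    return a


if __name__ == "__main__":
    print(royston_coefficients(10))
```

For n = 10 the two largest values should land close to the tabulated 0.5739 and 0.3291, which gives a quick sanity check on the constants.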


See also

* Anderson–Darling test
* Cramér–von Mises criterion
* D'Agostino's K-squared test
* Kolmogorov–Smirnov test
* Lilliefors test
* Normal probability plot
* Shapiro–Francia test


External links


* Worked example using Excel
* Algorithm AS R94 (Shapiro Wilk) FORTRAN code
* http://www.real-statistics.com/tests-normality-and-symmetry/statistical-tests-normality-symmetry/shapiro-wilk-expanded-test/ Real Statistics Using Excel: the Shapiro-Wilk Expanded Test