Shapiro–Wilk Test

The Shapiro–Wilk test is a test of normality. It was published in 1965 by Samuel Sanford Shapiro and Martin Wilk.


Theory

The Shapiro–Wilk test tests the null hypothesis that a sample ''x''1, ..., ''x''''n'' came from a normally distributed population. The test statistic is

W = \frac{\left(\sum_{i=1}^n a_i x_{(i)}\right)^2}{\sum_{i=1}^n \left(x_i - \overline{x}\right)^2},

where
* x_{(i)} (with parentheses enclosing the subscript index ''i'') is the ''i''th order statistic, i.e., the ''i''th-smallest number in the sample (not to be confused with x_i);
* \overline{x} = \left( x_1 + \cdots + x_n \right) / n is the sample mean.

The coefficients a_i are given by (p. 593)

(a_1,\dots,a_n) = \frac{m^{\mathsf T} V^{-1}}{C},

where ''C'' is the vector norm

C = \left\| V^{-1} m \right\| = \left( m^{\mathsf T} V^{-1} V^{-1} m \right)^{1/2}

and the vector ''m'' = (m_1,\dots,m_n)^{\mathsf T} is made of the expected values of the order statistics of independent and identically distributed random variables sampled from the standard normal distribution; finally, ''V'' is the covariance matrix of those normal order statistics. There is no name for the distribution of ''W''. The cutoff values for the statistic are calculated through Monte Carlo simulations.
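To make the formula concrete, here is a short Python sketch (not the exact published algorithm) that evaluates ''W'' for a given coefficient vector and builds an approximate coefficient vector by replacing ''V'' with the identity matrix and ''m'' with Blom's approximation of the expected normal order statistics; that simplification is the basis of the closely related Shapiro–Francia statistic. The helper names w_statistic and approx_coefficients are illustrative, and scipy.stats.shapiro is used only as a reference value.

```python
import numpy as np
from scipy import stats

def w_statistic(x, a):
    """Evaluate W = (sum_i a_i x_(i))^2 / sum_i (x_i - mean(x))^2."""
    x = np.sort(np.asarray(x, dtype=float))        # order statistics x_(1) <= ... <= x_(n)
    return np.sum(a * x) ** 2 / np.sum((x - x.mean()) ** 2)

def approx_coefficients(n):
    """Rough stand-in for the exact a_i: take V = I and approximate the expected
    normal order statistics m_i with Blom's plotting positions, then normalise.
    (This is the coefficient vector of the related Shapiro-Francia test.)"""
    m = stats.norm.ppf((np.arange(1, n + 1) - 0.375) / (n + 0.25))
    return m / np.linalg.norm(m)

rng = np.random.default_rng(0)
sample = rng.normal(size=50)
print("approximate W:", w_statistic(sample, approx_coefficients(sample.size)))
print("scipy W      :", stats.shapiro(sample).statistic)   # reference implementation
```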


Interpretation

The null hypothesis of this test is that the population is normally distributed. If the ''p''-value is less than the chosen alpha level, then the null hypothesis is rejected and there is evidence that the data tested are not normally distributed. Like most statistical significance tests, if the sample size is sufficiently large this test may detect even trivial departures from the null hypothesis (i.e., although there may be some statistically significant effect, it may be too small to be of any practical significance); thus, additional investigation of the ''effect size'' is typically advisable, e.g., a Q–Q plot in this case.
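As a sketch of this decision rule and the suggested follow-up, the following Python snippet (assuming SciPy and Matplotlib are available; the data are deliberately non-normal for illustration) rejects or retains the null hypothesis at a chosen alpha level and then draws a normal Q–Q plot:

```python
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
data = rng.lognormal(size=200)                 # deliberately non-normal sample

stat, p_value = stats.shapiro(data)
alpha = 0.05                                   # chosen alpha level
if p_value < alpha:
    print(f"W = {stat:.3f}, p = {p_value:.3g}: reject the null hypothesis of normality")
else:
    print(f"W = {stat:.3f}, p = {p_value:.3g}: no evidence against normality")

# With large samples even trivial departures become "significant", so follow up
# with a Q-Q plot to judge how large the departure actually is.
stats.probplot(data, dist="norm", plot=plt)
plt.title("Normal Q-Q plot")
plt.show()
```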


Power analysis

Monte Carlo simulation has found that Shapiro–Wilk has the best power for a given significance level, followed closely by Anderson–Darling, when comparing the Shapiro–Wilk, Kolmogorov–Smirnov, Lilliefors, and Anderson–Darling tests.
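Such comparisons can be reproduced in miniature with a Monte Carlo power estimate: draw many samples from a non-normal distribution, run the test on each, and record how often the null hypothesis is rejected at the chosen level. A minimal Python sketch for the Shapiro–Wilk test alone (the alternative distributions and settings are arbitrary illustrative choices):

```python
import numpy as np
from scipy import stats

def shapiro_power(sample_draw, n=50, alpha=0.05, n_sim=2000, seed=0):
    """Fraction of simulated samples for which Shapiro-Wilk rejects at level alpha."""
    rng = np.random.default_rng(seed)
    rejections = sum(stats.shapiro(sample_draw(rng, n)).pvalue < alpha
                     for _ in range(n_sim))
    return rejections / n_sim

# Power against two non-normal alternatives, plus the size under the null.
print("exponential alternative :", shapiro_power(lambda rng, n: rng.exponential(size=n)))
print("Student t(5) alternative:", shapiro_power(lambda rng, n: rng.standard_t(5, size=n)))
print("normal (should be ~0.05):", shapiro_power(lambda rng, n: rng.normal(size=n)))
```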


Approximation

Royston proposed an alternative method of calculating the coefficient vector by providing an algorithm for calculating values, which extended the sample size from 50 to 2,000. This technique is used in several software packages including GraphPad Prism, Stata, SPSS, and SAS. Rahman and Govindarajulu extended the sample size further, up to 5,000.
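For instance, SciPy's scipy.stats.shapiro, which according to its documentation follows Royston's approximation (Algorithm AS R94), can be applied directly to samples of this size; a minimal sketch:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
x = rng.normal(loc=10.0, scale=2.0, size=2000)   # a sample size covered by Royston's method

result = stats.shapiro(x)
print(f"n = {x.size}, W = {result.statistic:.4f}, p = {result.pvalue:.3f}")
```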


See also

* Anderson–Darling test
* Cramér–von Mises criterion
* D'Agostino's K-squared test
* Kolmogorov–Smirnov test
* Lilliefors test
* Normal probability plot
* Shapiro–Francia test


References


External links


* Worked example using Excel
* Algorithm AS R94 (Shapiro–Wilk) FORTRAN code
* Real Statistics Using Excel: the Shapiro–Wilk Expanded Test – http://www.real-statistics.com/tests-normality-and-symmetry/statistical-tests-normality-symmetry/shapiro-wilk-expanded-test/