In statistics, normality tests are used to determine if a data set is well-modeled by a normal distribution and to compute how likely it is for a random variable underlying the data set to be normally distributed.

More precisely, the tests are a form of model selection, and can be interpreted several ways, depending on one's interpretation of probability:

* In descriptive statistics terms, one measures a goodness of fit of a normal model to the data: if the fit is poor, then the data are not well modeled in that respect by a normal distribution, without making a judgment on any underlying variable.
* In frequentist statistical hypothesis testing, data are tested against the null hypothesis that they are normally distributed.
* In Bayesian statistics, one does not "test normality" per se, but rather computes the likelihood that the data come from a normal distribution with given parameters μ, σ (for all μ, σ), and compares that with the likelihood that the data come from other distributions under consideration, most simply using a Bayes factor (giving the relative likelihood of seeing the data given different models), or more finely taking a prior distribution on possible models and parameters and computing a posterior distribution given the computed likelihoods.

A normality test is used to determine whether sample data have been drawn from a normally distributed population (within some tolerance). A number of statistical tests, such as Student's t-test and the one-way and two-way ANOVA, require a normally distributed sample population.

Graphical methods

An informal approach to testing normality is to compare a histogram of the sample data to a normal probability curve. The empirical distribution of the data (the histogram) should be bell-shaped and resemble the normal distribution. This might be difficult to see if the sample is small. In this case one might proceed by regressing the data against the quantiles of a normal distribution with the same mean and variance as the sample. Lack of fit to the regression line suggests a departure from normality (see Anderson–Darling coefficient and Minitab).

A graphical tool for assessing normality is the normal probability plot, a quantile-quantile plot (QQ plot) of the standardized data against the standard normal distribution. Here the correlation between the sample data and normal quantiles (a measure of the goodness of fit) measures how well the data are modeled by a normal distribution. For normal data the points plotted in the QQ plot should fall approximately on a straight line, indicating high positive correlation. These plots are easy to interpret and also have the benefit that outliers are easily identified.
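The QQ-plot correlation described above can be sketched numerically without drawing the plot. The following is a minimal illustration using only the Python standard library; the function name `normal_qq_correlation` and the plotting positions `(i - 0.5) / n` are assumptions made for this sketch (several plotting-position conventions are in use), not something the text prescribes.

```python
from math import exp, sqrt
from statistics import NormalDist, mean, stdev

def normal_qq_correlation(sample):
    """Correlation between the sorted standardized sample and standard normal quantiles."""
    n = len(sample)
    m, s = mean(sample), stdev(sample)
    ordered = sorted((x - m) / s for x in sample)
    # Assumed plotting positions (i - 0.5) / n; other conventions exist.
    theoretical = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    mx, my = mean(theoretical), mean(ordered)
    sxy = sum((a - mx) * (b - my) for a, b in zip(theoretical, ordered))
    sxx = sum((a - mx) ** 2 for a in theoretical)
    syy = sum((b - my) ** 2 for b in ordered)
    return sxy / sqrt(sxx * syy)

# Illustrative data: evenly spaced normal quantiles, and a skewed transform of them.
grid = [NormalDist().inv_cdf((i - 0.5) / 50) for i in range(1, 51)]
r_normal = normal_qq_correlation([10 + 2 * z for z in grid])
r_skewed = normal_qq_correlation([exp(z) for z in grid])
print(f"near-normal: {r_normal:.4f}, skewed: {r_skewed:.4f}")
```

A correlation close to 1 corresponds to points lying near a straight line in the QQ plot; the skewed sample gives a visibly lower value. How low is "too low" depends on the sample size, so the number is best read alongside the plot itself.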

Back-of-the-envelope test

A simple back-of-the-envelope test takes the sample maximum and minimum and computes their z-score, or more properly t-statistic (the number of sample standard deviations that a sample is above or below the sample mean), and compares it to the 68–95–99.7 rule: if one has a 3σ event (properly, a 3s event) and substantially fewer than 300 samples, or a 4s event and substantially fewer than 15,000 samples, then a normal distribution will understate the maximum magnitude of deviations in the sample data.

This test is useful in cases where one faces kurtosis risk (where large deviations matter) and has the benefits that it is very easy to compute and to communicate: non-statisticians can easily grasp that "6σ events are very rare in normal distributions".
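As a rough sketch of this check, the code below computes the largest standardized deviation in a sample and the average number of normal draws needed to see one deviation that large. The function names and the `margin` parameter (a stand-in for "substantially fewer") are hypothetical choices for illustration. The two-sided normal tail gives about 370 draws per 3σ event and about 15,800 per 4σ event, in line with the rounded figures of 300 and 15,000 above.

```python
from math import erfc, sqrt
from statistics import mean, stdev

def largest_deviation(sample):
    """Largest |x - mean| in units of the sample standard deviation s."""
    m, s = mean(sample), stdev(sample)
    return max(max(sample) - m, m - min(sample)) / s

def samples_per_event(k):
    """Average number of normal draws per deviation of at least k sigma (two-sided)."""
    tail = erfc(k / sqrt(2))  # P(|Z| >= k) for a standard normal Z
    return float("inf") if tail == 0.0 else 1.0 / tail

def understates_extremes(sample, margin=3.0):
    """True when the largest deviation is at least 3s and the sample is much
    smaller than the size at which such a deviation would be expected.
    The 3s floor follows the text; margin is an arbitrary illustrative factor."""
    k = largest_deviation(sample)
    return k >= 3.0 and len(sample) * margin < samples_per_event(k)

print(round(samples_per_event(3)), round(samples_per_event(4)))
print(understates_extremes(list(range(20)) + [100]))  # one large outlier in 21 values
```

The check only looks at the single most extreme point, so it says nothing about skewness or the shape of the centre of the distribution.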

Frequentist tests

Tests of univariate normality include the following:

* D'Agostino's K-squared test,
* Jarque–Bera test,
* Anderson–Darling test,
* Cramér–von Mises criterion,
* Kolmogorov–Smirnov test (this one only works if the mean and the variance of the normal are assumed known under the null hypothesis),
* Lilliefors test (based on the Kolmogorov–Smirnov test, adjusted for when also estimating the mean and variance from the data),
* Shapiro–Wilk test, and
* Pearson's chi-squared test.

A 2011 study comparing the Shapiro–Wilk, Kolmogorov–Smirnov, Lilliefors, and Anderson–Darling tests concludes that Shapiro–Wilk has the best power for a given significance, followed closely by Anderson–Darling.

Some published works recommend the Jarque–Bera test, but the test has weaknesses. In particular, it has low power for distributions with short tails, especially for bimodal distributions. Some authors have declined to include its results in their studies because of its poor overall performance.

Historically, the third and fourth standardized moments (skewness and kurtosis) were some of the earliest tests for normality. The Lin-Mudholkar test specifically targets asymmetric alternatives. The Jarque–Bera test is itself derived from skewness and kurtosis estimates. Mardia's multivariate skewness and kurtosis tests generalize the moment tests to the multivariate case. Other early test statistics include the ratio of the mean absolute deviation to the standard deviation and of the range to the standard deviation.

More recent tests of normality include the energy test (Székely and Rizzo) and the tests based on the empirical characteristic function (ECF) (e.g. Epps and Pulley, Henze–Zirkler, BHEP test). The energy and the ECF tests are powerful tests that apply for testing univariate or multivariate normality and are statistically consistent against general alternatives.

The normal distribution has the highest entropy of any distribution for a given standard deviation. There are a number of normality tests based on this property, the first attributable to Vasicek.
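To make the moment-based approach concrete, here is a minimal sketch of the Jarque–Bera statistic computed from the sample skewness S and kurtosis K as JB = n/6 · (S² + (K − 3)²/4). The p-value uses the asymptotic chi-squared distribution with two degrees of freedom, whose survival function is exp(−x/2); that approximation is known to be poor for small samples, so the p-values here are illustrative only. The function name and example data are assumptions of this sketch.

```python
from math import exp

def jarque_bera(sample):
    """Return (JB statistic, asymptotic p-value) from the sample skewness and kurtosis."""
    n = len(sample)
    m = sum(sample) / n
    m2 = sum((x - m) ** 2 for x in sample) / n  # second central moment
    m3 = sum((x - m) ** 3 for x in sample) / n  # third central moment
    m4 = sum((x - m) ** 4 for x in sample) / n  # fourth central moment
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    jb = n / 6 * (skew ** 2 + (kurt - 3) ** 2 / 4)
    p_value = exp(-jb / 2)  # chi-squared survival function with 2 degrees of freedom
    return jb, p_value

# A short-tailed (uniform) sample and a sample with one large outlier.
jb_uniform, p_uniform = jarque_bera(list(range(-10, 11)))
jb_outlier, p_outlier = jarque_bera(list(range(20)) + [100])
print(f"uniform: JB={jb_uniform:.2f}, p={p_uniform:.2f}")
print(f"outlier: JB={jb_outlier:.2f}, p={p_outlier:.2g}")
```

The evenly spaced uniform sample is clearly not normal, yet its JB statistic is small and the test does not reject, which illustrates the low power against short-tailed alternatives mentioned above. The sample with an outlier is rejected decisively.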

Bayesian tests

Kullback–Leibler divergences between the whole posterior distributions of the slope and variance do not indicate non-normality. However, the ratio of expectations of these posteriors and the expectation of the ratios give similar results to the Shapiro–Wilk statistic except for very small samples, when non-informative priors are used.

Spiegelhalter suggests using a Bayes factor to compare normality with a different class of distributional alternatives. This approach has been extended by Farrell and Rogers-Stewart.
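The idea of a Bayes factor can be illustrated in its simplest form, where both competing models are fully specified so that each marginal likelihood reduces to an ordinary likelihood. The sketch below compares a standard normal model with a Laplace model of the same variance (scale b = 1/√2). This is a deliberate simplification and is not Spiegelhalter's procedure: a full Bayesian comparison would place priors on the parameters and integrate over them. All function names are hypothetical.

```python
from math import log, pi, sqrt
from statistics import NormalDist

def log_lik_normal(sample, mu=0.0, sigma=1.0):
    """Log-likelihood of the sample under a normal model with fixed mu and sigma."""
    return sum(-0.5 * log(2 * pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)
               for x in sample)

def log_lik_laplace(sample, mu=0.0, b=1 / sqrt(2)):
    """Log-likelihood under a Laplace model with fixed location mu and scale b."""
    return sum(-log(2 * b) - abs(x - mu) / b for x in sample)

def log_bayes_factor(sample):
    """Log Bayes factor, normal versus Laplace, for fixed-parameter (point) models.
    Positive values favour the normal model; negative values favour the Laplace."""
    return log_lik_normal(sample) - log_lik_laplace(sample)

# Illustrative data: evenly spaced standard normal quantiles.
normal_like = [NormalDist().inv_cdf((i - 0.5) / 50) for i in range(1, 51)]
print(f"log Bayes factor: {log_bayes_factor(normal_like):.2f}")
```

Working on the log scale avoids numerical underflow when many small densities are multiplied together.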

Applications

One application of normality tests is to the residuals from a linear regression model. If they are not normally distributed, the residuals should not be used in Z tests or in any other tests derived from the normal distribution, such as t tests, F tests and chi-squared tests. If the residuals are not normally distributed, then the dependent variable or at least one explanatory variable may have the wrong functional form, or important variables may be missing, etc. Correcting one or more of these systematic errors may produce residuals that are normally distributed; in other words, non-normality of residuals is often a model deficiency rather than a data problem.
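The effect of a wrong functional form on the residuals can be seen in a small, noise-free sketch. Here a straight line is fitted by ordinary least squares to data that are actually quadratic, and the residual skewness is used as a crude indicator of non-normality. The data and helper names are invented for illustration; in practice one of the tests listed earlier would be applied to the residuals.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

def residuals(xs, ys):
    """Observed minus fitted values for the least squares line."""
    slope, intercept = fit_line(xs, ys)
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]

def skewness(values):
    """Third standardized moment; near zero for symmetric (e.g. normal) data."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

xs = list(range(-10, 11))
ys = [x ** 2 for x in xs]                       # the true relationship is quadratic
wrong_form = residuals(xs, ys)                  # straight line in x
right_form = residuals([x ** 2 for x in xs], ys)  # regress on x squared instead
print(f"residual skewness with the wrong form: {skewness(wrong_form):.2f}")
print(f"largest residual with the right form: {max(abs(r) for r in right_form):.2g}")
```

With the linear form the residuals are strongly right-skewed; once the regressor is replaced by its square, the residuals vanish, showing that the apparent non-normality came from the model rather than from the data.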
