The normal probability plot is a graphical technique to identify substantive departures from normality. This includes identifying outliers, skewness, kurtosis, a need for transformations, and mixtures of distributions. Normal probability plots are made of raw data, residuals from model fits, and estimated parameters.

In a normal probability plot (also called a "normal plot"), the sorted data are plotted vs. values selected to make the resulting image look close to a straight line if the data are approximately normally distributed. Deviations from a straight line suggest departures from normality. The plotting can be done manually on a special graph paper called ''normal probability paper''. With modern computers, normal plots are commonly made with software.

The normal probability plot is a special case of the Q–Q probability plot for a normal distribution. The theoretical quantiles are generally chosen to approximate either the mean or the median of the corresponding order statistics.


Definition

The normal probability plot is formed by plotting the sorted data vs. an approximation to the means or medians of the corresponding order statistics; see rankit. Some plot the data on the vertical axis; others plot the data on the horizontal axis.

Different sources use slightly different approximations for rankits. The formula used by the "qqnorm" function in the basic "stats" package in R is as follows:

: z_i = \Phi^{-1}\left( \frac{i - a}{n + 1 - 2a} \right),

for i = 1, 2, \ldots, n, where

: a = 3/8 if n \le 10, and
: a = 0.5 for n > 10,

and \Phi^{-1} is the standard normal quantile function.

If the data are consistent with a sample from a normal distribution, the points should lie close to a straight line. As a reference, a straight line can be fit to the points. The farther the points deviate from this line, the greater the indication of departure from normality. If the sample has mean 0 and standard deviation 1, then a line through 0 with slope 1 could be used.

With more points, random deviations from a line will be less pronounced. Normal plots are often used with as few as 7 points, e.g., when plotting the effects in a saturated model from a 2-level fractional factorial experiment. With fewer points, it becomes harder to distinguish between random variability and a substantive deviation from normality.
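The plotting positions used by "qqnorm" can be sketched in Python as follows. This is a minimal illustration using only the standard library, not R's actual implementation: it assumes the positions (i - a) / (n + 1 - 2a) with a = 3/8 for n ≤ 10 and a = 0.5 otherwise. The function names `plotting_positions` and `normal_plot_points` are hypothetical, and the seven sample values are made up for illustration.

```python
from statistics import NormalDist


def plotting_positions(n):
    """Probabilities (i - a) / (n + 1 - 2a) for i = 1..n.

    Assumes a = 3/8 for n <= 10 and a = 1/2 otherwise, the
    convention described for R's qqnorm.
    """
    a = 3 / 8 if n <= 10 else 0.5
    return [(i - a) / (n + 1 - 2 * a) for i in range(1, n + 1)]


def normal_plot_points(data):
    """Pair each theoretical quantile z_i with the i-th smallest observation."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    z = [std_normal.inv_cdf(p) for p in plotting_positions(len(data))]
    return list(zip(z, sorted(data)))


# Made-up sample of seven values, for illustration only.
sample = [4.1, 5.3, 3.8, 6.0, 4.9, 5.5, 4.4]
for z_i, x_i in normal_plot_points(sample):
    print(f"{z_i:+.3f}  {x_i}")
```

Each printed pair is one point of the plot, here with the theoretical quantile first and the observation second. Either value can go on either axis, since both orientations are in use.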


Other distributions

Probability plots for distributions other than the normal are computed in exactly the same way. The normal quantile function is simply replaced by the quantile function of the desired distribution. In this way, a probability plot can easily be generated for any distribution for which one has the quantile function.

With a location-scale family of distributions, the location and scale parameters of the distribution can be estimated from the intercept and the slope of the line. For other distributions the parameters must first be estimated before a probability plot can be made.
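A rough sketch of this idea in Python is given below. It swaps in a different quantile function and reads the location and scale off a least-squares line, with the data on the vertical axis. Several parts are assumptions for illustration rather than a standard API: the helper names, the plotting positions (i - 0.5) / n (one common choice among several), and the "data", which are exact quantiles instead of a real sample so that the fitted line recovers the parameters.

```python
import math
from statistics import NormalDist, fmean


def probability_plot_points(data, quantile_fn):
    """Return (theoretical quantiles, sorted data).

    Uses plotting positions (i - 0.5) / n, an assumed convention.
    """
    n = len(data)
    probs = [(i - 0.5) / n for i in range(1, n + 1)]
    return [quantile_fn(p) for p in probs], sorted(data)


def fit_line(x, y):
    """Ordinary least-squares (intercept, slope) of y on x."""
    x_bar, y_bar = fmean(x), fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def exponential_quantile(p):
    """Quantile function of the exponential distribution with scale 1."""
    return -math.log(1.0 - p)


n = 25
probs = [(i - 0.5) / n for i in range(1, n + 1)]

# Idealised normal "data" with location 10 and scale 2.
normal_data = [NormalDist(10, 2).inv_cdf(p) for p in probs]
q, y = probability_plot_points(normal_data, NormalDist().inv_cdf)
intercept, slope = fit_line(q, y)
print(f"normal: location ~ {intercept:.3f}, scale ~ {slope:.3f}")

# Idealised exponential "data" with scale 3 (a scale family, location 0).
exp_data = [3 * exponential_quantile(p) for p in probs]
q_exp, y_exp = probability_plot_points(exp_data, exponential_quantile)
exp_intercept, exp_slope = fit_line(q_exp, y_exp)
print(f"exponential: scale ~ {exp_slope:.3f}")
```

With a real sample the points scatter around the line, so the intercept and slope are only estimates of the location and scale.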


Plot types

This is a sample of size 50 from a normal distribution, plotted as both a histogram and a normal probability plot.

Figure (normprob.png): Normal probability plot of a sample from a normal distribution. It looks fairly straight, at least when the few large and small values are ignored.
Figure (normhist.png): Histogram of a sample from a normal distribution. It looks fairly symmetric and unimodal.

This is a sample of size 50 from a right-skewed distribution, plotted as both a histogram and a normal probability plot.

Figure (normexpprob.png): Normal probability plot of a sample from a right-skewed distribution. It has an inverted C shape.
Figure (normexphist.png): Histogram of a sample from a right-skewed distribution. It looks unimodal and skewed right.

This is a sample of size 50 from a uniform distribution, plotted as both a histogram and a normal probability plot.

Figure (normunifprob.png): Normal probability plot of a sample from a uniform distribution. It has an S shape.
Figure (normunifhist.png): Histogram of a sample from a uniform distribution. It looks multimodal and supposedly roughly symmetric.
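The curvature behind these shapes can be checked numerically without drawing anything. The Python sketch below is an illustration under stated assumptions, not a reconstruction of the figures. It uses idealised "samples" of size 50, made of exact quantiles at positions (i - 0.5) / n, places the data on the vertical axis, and reports the slope between consecutive points. The function name `slopes_on_normal_plot` is hypothetical, and the exponential distribution stands in for "a right-skewed distribution".

```python
import math
from statistics import NormalDist


def slopes_on_normal_plot(quantile_fn, n=50):
    """Slopes between consecutive points of a normal probability plot.

    The "sample" is idealised: exact quantiles of the given distribution
    at positions (i - 0.5) / n, with the data on the vertical axis.
    """
    probs = [(i - 0.5) / n for i in range(1, n + 1)]
    z = [NormalDist().inv_cdf(p) for p in probs]
    y = [quantile_fn(p) for p in probs]
    return [(y[i + 1] - y[i]) / (z[i + 1] - z[i]) for i in range(n - 1)]


normal = slopes_on_normal_plot(NormalDist(5, 2).inv_cdf)
skewed = slopes_on_normal_plot(lambda p: -math.log(1.0 - p))  # exponential
uniform = slopes_on_normal_plot(lambda p: p)  # uniform on (0, 1)

for name, s in [("normal", normal), ("right-skewed", skewed), ("uniform", uniform)]:
    mid = s[len(s) // 2]
    print(f"{name:>12}: first {s[0]:.3f}  middle {mid:.3f}  last {s[-1]:.3f}")
```

In this orientation the normal case has constant slope (a straight line), the right-skewed case has slopes that keep increasing from left to right (a curve bending one way), and the uniform case is steep in the middle and flat at both ends (an S shape). Swapping the axes mirrors the direction of the bend.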


See also

* P–P plot
* Q–Q plot
* Rankit



External links


Engineering Statistics Handbook: Normal Probability Plot

Statit Support: Testing for "Near-Normality": The Probability Plot