In descriptive statistics, the seven-number summary is a collection of seven summary statistics, and is an extension of the five-number summary. There are three similar, common forms. As with the five-number summary, it can be represented by a modified box plot, adding hatch-marks on the "whiskers" for two of the additional numbers.


Seven-number summary

The following percentiles are (approximately) evenly spaced under a normally distributed variable:
# the 2nd percentile (better: 2.15%)
# the 9th percentile (better: 8.87%)
# the 25th percentile or lower quartile or ''first quartile''
# the 50th percentile or median (middle value, or ''second quartile'')
# the 75th percentile or upper quartile or ''third quartile''
# the 91st percentile (better: 91.13%)
# the 98th percentile (better: 97.85%)

The middle three values (the lower quartile, median, and upper quartile) are the usual statistics from the five-number summary and are the standard values for the box in a box plot. The two unusual percentiles at either end are used because the locations of all seven values will be approximately equally spaced if the data is normally distributed. Some statistical tests require normally distributed data, so the plotted values provide a convenient visual check for the validity of later tests, simply by scanning to see whether the marks for those seven percentiles appear to be equal distances apart on the graph.

Notice that whereas the extreme values of the five-number summary depend on the sample size, this seven-number summary does not, and is somewhat more stable, since its whisker-ends are protected from the usual wild swings in the extreme values of the sample by replacing them with the steadier 2nd and 98th percentiles.

The values can be represented using a modified box plot. The 2nd and 98th percentiles are represented by the ends of the whiskers, and hatch-marks across the whiskers mark the 9th and 91st percentiles.
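
The equal-spacing claim can be checked numerically. The following is a minimal sketch, not part of the original description: it uses Python's standard-library `statistics.NormalDist` to find where the seven percentiles fall on a standard normal curve and prints the gaps between neighbours. The names `FRACTIONS`, `normal_positions`, and `gaps` are illustrative choices.

```python
from statistics import NormalDist

# The seven percentiles as fractions, using the refined values
# (2.15%, 8.87%, ..., 97.85%) rather than the rounded ones.
FRACTIONS = [0.0215, 0.0887, 0.25, 0.50, 0.75, 0.9113, 0.9785]


def normal_positions(fractions=FRACTIONS):
    """Return the standard-normal quantile (z-score) for each fraction."""
    std_normal = NormalDist(mu=0.0, sigma=1.0)
    return [std_normal.inv_cdf(p) for p in fractions]


def gaps(values):
    """Return the differences between consecutive values."""
    return [b - a for a, b in zip(values, values[1:])]


positions = normal_positions()
spacing = gaps(positions)

for p, z in zip(FRACTIONS, positions):
    print(f"{100 * p:6.2f}%  z = {z:+.4f}")
print("gaps:", [round(g, 4) for g in spacing])
```

Each gap should come out close to the distance between the median and a quartile of the standard normal distribution (about 0.6745 standard deviations), which is why the marks look evenly spaced on a plot of normally distributed data.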


Bowley’s seven-figure summary

Arthur Bowley used a set of non-parametric statistics, called a "seven-figure summary", including the extremes, deciles, and quartiles, along with the median. Thus the numbers are:
# the sample minimum
# the 10th percentile (first decile)
# the 25th percentile or lower quartile or ''first quartile''
# the 50th percentile or median (middle value, or ''second quartile'')
# the 75th percentile or upper quartile or ''third quartile''
# the 90th percentile (last decile)
# the sample maximum

Note that the middle five of the seven numbers are very nearly the same as for the seven-number summary, above. The addition of the deciles allows one to compute the interdecile range, which for a normal distribution can be scaled to give a reasonably efficient estimate of the standard deviation, and the 10% midsummary, which when compared to the median gives an idea of the skewness in the tails.
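
As an illustration, here is a rough sketch of how Bowley's seven figures and the two derived quantities might be computed. It makes assumptions the text does not specify: quantiles are found by linear interpolation between order statistics (one of several common conventions), and the function names `quantile`, `bowley_summary`, `interdecile_sd_estimate`, and `ten_percent_midsummary` are hypothetical. The scaling constant is the width of the central 80% of a standard normal distribution, roughly 2.563 standard deviations.

```python
from statistics import NormalDist


def quantile(sorted_data, p):
    """Linearly interpolated quantile of already-sorted data (assumed convention)."""
    if not sorted_data:
        raise ValueError("data must be non-empty")
    pos = p * (len(sorted_data) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_data) - 1)
    frac = pos - lo
    return sorted_data[lo] + frac * (sorted_data[hi] - sorted_data[lo])


def bowley_summary(data):
    """Minimum, first decile, quartiles, median, last decile, maximum."""
    xs = sorted(data)
    return [quantile(xs, p) for p in (0.0, 0.10, 0.25, 0.50, 0.75, 0.90, 1.0)]


def interdecile_sd_estimate(summary):
    """Scale the interdecile range to estimate sigma, assuming normal data."""
    z90 = NormalDist().inv_cdf(0.90)  # about 1.2816
    return (summary[5] - summary[1]) / (2 * z90)


def ten_percent_midsummary(summary):
    """Average of the first and last deciles."""
    return (summary[1] + summary[5]) / 2


summary = bowley_summary(range(1, 102))  # the integers 1..101
print("summary:", summary)
print("sd estimate:", interdecile_sd_estimate(summary))
print("10% midsummary:", ten_percent_midsummary(summary), "median:", summary[3])
```

A 10% midsummary well above the median suggests a longer upper tail, and one well below it suggests a longer lower tail. The standard-deviation estimate is only meaningful when the data are roughly normal.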


Tukey’s seven-number summary

John Tukey used a seven-number summary consisting of the extremes, octiles, quartiles, and the median. The seven numbers are:
# the sample minimum
# the 12.5th percentile (first octile)
# the 25th percentile or lower quartile or ''first quartile''
# the 50th percentile or median (middle value, or ''second quartile'')
# the 75th percentile or upper quartile or ''third quartile''
# the 87.5th percentile (last octile)
# the sample maximum

Note that the middle five of the seven numbers can all be obtained by successive partitioning of the ordered data into subsets of equal size. Extending the seven-number summary by continued partitioning produces the ''nine-number summary'', the ''eleven-number summary'', and so on.
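
The successive-partitioning idea can be sketched in code. This sketch assumes that "continued partitioning" means halving the outermost fraction again at each step (1/2, 1/4, 1/8, 1/16, ...), and it uses linearly interpolated quantiles rather than any particular depth rule. The names `quantile`, `partition_fractions`, and `tukey_summary`, and the `levels` parameter, are hypothetical.

```python
def quantile(sorted_data, p):
    """Linearly interpolated quantile of already-sorted data (assumed convention)."""
    if not sorted_data:
        raise ValueError("data must be non-empty")
    pos = p * (len(sorted_data) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_data) - 1)
    frac = pos - lo
    return sorted_data[lo] + frac * (sorted_data[hi] - sorted_data[lo])


def partition_fractions(levels):
    """Fractions from repeated halving, plus the two extremes.

    levels=2 gives the five-number summary, levels=3 the seven-number
    summary, levels=4 the nine-number summary, and so on.
    """
    lower = [0.5 ** k for k in range(levels, 1, -1)]  # e.g. [1/8, 1/4]
    upper = [1 - f for f in reversed(lower)]          # e.g. [3/4, 7/8]
    return [0.0] + lower + [0.5] + upper + [1.0]


def tukey_summary(data, levels=3):
    """Extremes, median, and the quantiles at each halving level."""
    xs = sorted(data)
    return [quantile(xs, p) for p in partition_fractions(levels)]


data = list(range(17))  # the integers 0..16
seven = tukey_summary(data, levels=3)
nine = tukey_summary(data, levels=4)
print("seven-number:", seven)
print("nine-number: ", nine)
```

Each extra level adds one more pair of quantiles just inside the extremes, so the summary grows by two numbers at a time.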


See also

* Three-point estimation
* Stanine

