In descriptive statistics, the seven-number summary is a collection of seven summary statistics, and is an extension of the five-number summary. There are three similar, common forms.
As with the five-number summary, it can be represented by a modified box plot, adding hatch-marks on the "whiskers" for two of the additional numbers.
Seven-number summary
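The following percentiles are (approximately) evenly spaced under a normally distributed variable:
* the 2nd percentile
* the 9th percentile
* the 25th percentile (lower quartile)
* the 50th percentile (median)
* the 75th percentile (upper quartile)
* the 91st percentile
* the 98th percentile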
The middle three values – the lower quartile, median, and upper quartile – are the usual statistics from the five-number summary and are the standard values for the box in a box plot.
The two unusual percentiles at either end are used because the locations of all seven values will be approximately equally spaced if the data is
normally distributed.
Some statistical tests require
normally distributed data, so the plotted values provide a convenient visual check for validity of later tests, simply by scanning to see if the marks for those seven percentiles appear to be equal distances apart on the plot.
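The near-equal spacing can be checked numerically. The following is a minimal sketch, assuming a standard normal distribution and using Python's standard-library `statistics.NormalDist`; the variable names are illustrative only.

```python
from statistics import NormalDist

# Percentiles of the seven-number summary described above.
PERCENTILES = [2, 9, 25, 50, 75, 91, 98]

std_normal = NormalDist()  # mean 0, standard deviation 1

# Location of each percentile, in units of standard deviations.
z_scores = [std_normal.inv_cdf(p / 100) for p in PERCENTILES]

# Distance between each pair of neighbouring percentiles.
gaps = [b - a for a, b in zip(z_scores, z_scores[1:])]

for p, z in zip(PERCENTILES, z_scores):
    print(f"percentile {p:>2}: z = {z:+.3f}")
print("gaps:", [round(g, 3) for g in gaps])
```

Each of the six gaps comes out at roughly two thirds of a standard deviation, which is why the marks look evenly spaced when the data are close to normal.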
Whereas the extreme values of the five-number summary depend on the sample size, this seven-number summary does not. It is also somewhat more stable: its whisker ends are the 2nd and 98th percentiles, which are steadier than the sample extremes and so are protected from their usual wild swings.
The values can be represented using a modified box plot. The 2nd and 98th percentiles are represented by the ends of the whiskers, and hatch-marks across the whiskers mark the 9th and 91st percentiles.
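As an illustration, the seven values can be computed from a sample as sketched below. The function name `seven_number_summary` is hypothetical, and the choice of the "inclusive" interpolation method in `statistics.quantiles` is an assumption; the summary itself does not prescribe how sample percentiles are interpolated.

```python
from statistics import quantiles

# Percentiles used by this form of the seven-number summary.
PERCENTILES = (2, 9, 25, 50, 75, 91, 98)

def seven_number_summary(data):
    """Return the 2nd, 9th, 25th, 50th, 75th, 91st and 98th percentiles.

    Assumption: percentiles are interpolated with the "inclusive"
    method of statistics.quantiles; other conventions give slightly
    different values on small samples.
    """
    # n=100 yields 99 cut points; cut point p sits at index p - 1.
    cuts = quantiles(data, n=100, method="inclusive")
    return [cuts[p - 1] for p in PERCENTILES]

if __name__ == "__main__":
    sample = range(101)  # the integers 0, 1, ..., 100
    print(seven_number_summary(sample))
```

The returned list gives, in order, the lower whisker end, the lower hatch-mark, the three box values, the upper hatch-mark, and the upper whisker end of the modified box plot.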
Bowley’s seven-figure summary
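Arthur Bowley used a set of non-parametric statistics, called a "seven-figure summary", including the extremes, deciles, and quartiles, along with the median.
Thus the numbers are:
* the sample minimum
* the 10th percentile (first decile)
* the 25th percentile (lower quartile)
* the 50th percentile (median)
* the 75th percentile (upper quartile)
* the 90th percentile (last decile)
* the sample maximum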
Note that the middle five of the seven numbers are very nearly the same as for the seven-number summary, above.
The addition of the deciles allows one to compute the interdecile range, which for a normal distribution can be scaled to give a reasonably efficient estimate of the standard deviation, and the 10% midsummary, which when compared to the median gives an idea of the skewness in the tails.
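The two derived quantities can be sketched in code. This is an illustrative example, not Bowley's own procedure: the function names are hypothetical, the "inclusive" interpolation method is an assumption, and the scaling constant is simply the interdecile range of a unit normal distribution (about 2.563 standard deviations).

```python
from statistics import NormalDist, quantiles

def bowley_summary(data):
    """Return [min, 1st decile, Q1, median, Q3, 9th decile, max].

    Assumption: quantiles use the "inclusive" interpolation method.
    """
    xs = sorted(data)
    deciles = quantiles(xs, n=10, method="inclusive")   # 9 cut points
    quartiles = quantiles(xs, n=4, method="inclusive")  # 3 cut points
    return [xs[0], deciles[0], quartiles[0], quartiles[1],
            quartiles[2], deciles[8], xs[-1]]

def sigma_from_interdecile_range(summary):
    """Estimate the standard deviation, assuming normal data."""
    interdecile_range = summary[5] - summary[1]
    # For a normal distribution the deciles lie about 1.2816 standard
    # deviations on either side of the mean.
    z90 = NormalDist().inv_cdf(0.9)
    return interdecile_range / (2 * z90)

def midsummary_10(summary):
    """Average of the first and last deciles (the 10% midsummary)."""
    return (summary[1] + summary[5]) / 2

if __name__ == "__main__":
    s = bowley_summary(range(101))
    print(s, sigma_from_interdecile_range(s), midsummary_10(s))
```

A midsummary well above the median suggests a longer upper tail, and one well below it suggests a longer lower tail.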
Tukey’s seven-number summary
John Tukey used a seven-number summary consisting of the extremes, octiles, quartiles, and the median.
The seven numbers are:
* the sample minimum
* the 12.5th percentile (first octile)
* the 25th percentile (lower quartile)
* the 50th percentile (median)
* the 75th percentile (upper quartile)
* the 87.5th percentile (last octile)
* the sample maximum
Note that the middle five of the seven numbers can all be obtained by successive partitioning of the ordered data into subsets of equal size. Extending the seven-number summary by continued partitioning produces the ''nine-number summary'', the ''eleven-number summary'', and so on.
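Successive partitioning can be sketched as follows. This example assumes one common convention, the "depth" rule used for letter values, in which each new depth is half of one plus the integer part of the previous depth; the text above does not say which convention is intended, and the function names are hypothetical.

```python
import math

def values_at_depth(xs, depth):
    """Return the (lower, upper) values at a depth counted from each end.

    xs must be sorted. depth is 1-based; a depth ending in .5 means the
    average of the two neighbouring order statistics.
    """
    n = len(xs)
    lo, hi = math.floor(depth), math.ceil(depth)
    lower = (xs[lo - 1] + xs[hi - 1]) / 2
    upper = (xs[n - lo] + xs[n - hi]) / 2
    return lower, upper

def tukey_seven_number_summary(data):
    """Return [min, lower eighth, lower fourth, median,
    upper fourth, upper eighth, max] using the depth rule (assumed)."""
    xs = sorted(data)
    if not xs:
        raise ValueError("data must not be empty")
    n = len(xs)
    d_median = (n + 1) / 2
    d_fourth = (math.floor(d_median) + 1) / 2  # halve again
    d_eighth = (math.floor(d_fourth) + 1) / 2  # and again
    median, _ = values_at_depth(xs, d_median)
    lower_f, upper_f = values_at_depth(xs, d_fourth)
    lower_e, upper_e = values_at_depth(xs, d_eighth)
    return [xs[0], lower_e, lower_f, median, upper_f, upper_e, xs[-1]]

if __name__ == "__main__":
    print(tukey_seven_number_summary(range(1, 18)))
```

Repeating the halving step once more would give the sixteenths needed for a nine-number summary.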
See also
* Three-point estimation
* Stanine