Interquartile range

In descriptive statistics, the interquartile range (IQR) is a measure of statistical dispersion, that is, of the spread of the data. The IQR may also be called the midspread, middle 50%, fourth spread, or H‑spread. It is defined as the difference between the 75th and 25th percentiles of the data. To calculate the IQR, the data set is divided into quartiles, or four rank-ordered even parts, via linear interpolation. These quartiles are denoted by ''Q''1 (also called the lower quartile), ''Q''2 (the median), and ''Q''3 (also called the upper quartile). The lower quartile corresponds to the 25th percentile and the upper quartile to the 75th percentile, so IQR = ''Q''3 − ''Q''1. The IQR is an example of a trimmed estimator, defined as the 25% trimmed range, which improves the accuracy of dataset statistics by dropping lower-contribution, outlying points. It is also used as a robust measure of scale. It can be clearly visualized by the box on a box plot.


Use

Unlike the total range, whose breakdown point is 0%, the interquartile range has a breakdown point of 25%, and is thus often preferred to the total range. The IQR is used to build box plots, simple graphical representations of a probability distribution. The IQR is used in businesses as a marker for their income rates. For a symmetric distribution (where the median equals the midhinge, the average of the first and third quartiles), half the IQR equals the median absolute deviation (MAD). The median is the corresponding measure of central tendency. The IQR can be used to identify outliers (see below). The IQR may also indicate the skewness of the dataset. The quartile deviation or semi-interquartile range is defined as half the IQR.


Algorithm

The IQR of a set of values is calculated as the difference between the upper and lower quartiles, Q3 and Q1. Each quartile is a median calculated as follows. Given an even ''2n'' or odd ''2n+1'' number of values:
:''first quartile Q1'' = median of the ''n'' smallest values
:''third quartile Q3'' = median of the ''n'' largest values
The ''second quartile Q2'' is the same as the ordinary median.
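The median-of-halves rule above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function names `quartiles` and `iqr` and the sample lists are assumptions made for the example, and other quartile conventions (for instance, interpolation-based ones) can give different values on the same data.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves rule."""
    ordered = sorted(values)
    n = len(ordered) // 2  # size of each half; with 2n+1 values the middle one is left out
    if n == 0:
        raise ValueError("need at least two values")
    q1 = median(ordered[:n])   # median of the n smallest values
    q3 = median(ordered[-n:])  # median of the n largest values
    return q1, median(ordered), q3

def iqr(values):
    """Interquartile range, Q3 - Q1."""
    q1, _, q3 = quartiles(values)
    return q3 - q1

# Hypothetical data, chosen only to show the odd and even cases.
print(quartiles([1, 2, 3, 4, 5, 6, 7]))  # odd count: 2n+1 = 7, n = 3
print(iqr([1, 2, 3, 4]))                 # even count: 2n = 4, n = 2
```

With an odd number of values the middle value is excluded from both halves, which is what the `len(ordered) // 2` slice does.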


Examples


Data set in a table

The following table has 13 rows, and follows the rules for the odd number of entries. For the data in this table the interquartile range is IQR = Q3 − Q1 = 119 − 31 = 88.


Data set in a plain-text box plot

                    
                             +−−−−−+−+     
               * |−−−−−−−−−−−|     | |−−−−−−−−−−−|
                             +−−−−−+−+    
                    
 +−−−+−−−+−−−+−−−+−−−+−−−+−−−+−−−+−−−+−−−+−−−+−−−+   number line
 0   1   2   3   4   5   6   7   8   9   10  11  12
  
For the data set in this box plot:
* lower (first) quartile ''Q''1 = 7
* median (second quartile) ''Q''2 = 8.5
* upper (third) quartile ''Q''3 = 9
* interquartile range, IQR = ''Q''3 − ''Q''1 = 2
* lower 1.5*IQR whisker = ''Q''1 − 1.5 * IQR = 7 − 3 = 4. (If there is no data point at 4, then the lowest point greater than 4.)
* upper 1.5*IQR whisker = ''Q''3 + 1.5 * IQR = 9 + 3 = 12. (If there is no data point at 12, then the highest point less than 12.)
This means the 1.5*IQR whiskers can be uneven in length. The median, minimum, maximum, and the first and third quartiles constitute the five-number summary.
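The whisker arithmetic for this box plot can be checked with a short Python sketch. The quartiles 7 and 9 come from the text above; the helper names `fences` and `whisker_ends` and the sample list are hypothetical, since the underlying data points of the plot are not given.

```python
def fences(q1, q3, k=1.5):
    """Return the (lower, upper) limits Q1 - k*IQR and Q3 + k*IQR."""
    spread = q3 - q1
    return q1 - k * spread, q3 + k * spread

def whisker_ends(data, q1, q3, k=1.5):
    """Return the lowest and highest data points lying within the fences."""
    lower, upper = fences(q1, q3, k)
    # Non-empty whenever q1 and q3 really are quartiles of `data`.
    inside = [x for x in data if lower <= x <= upper]
    return min(inside), max(inside)

print(fences(7, 9))  # limits for Q1 = 7, Q3 = 9 as in the box plot

# Illustrative sample only (an assumption, not the plot's data):
# no point sits exactly on either fence, so the whiskers stop short of them.
sample = [3.5, 5, 7, 8, 9, 10, 11]
print(whisker_ends(sample, 7, 9))
```

Because each whisker ends at an actual data point, the two whiskers need not have the same length even though both fences lie 1.5 * IQR from the box.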


Distributions

The interquartile range of a continuous distribution can be calculated by integrating the probability density function (which yields the cumulative distribution function; any other means of calculating the CDF will also work). The lower quartile, ''Q''1, is a number such that the integral of the PDF from −∞ to ''Q''1 equals 0.25, while the upper quartile, ''Q''3, is a number such that the integral from −∞ to ''Q''3 equals 0.75; in terms of the CDF, the quartiles can be defined as follows:
:Q_1 = \text{CDF}^{-1}(0.25),
:Q_3 = \text{CDF}^{-1}(0.75),
where CDF^{-1} is the quantile function. The interquartile range and median of some common distributions are shown below.
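As a rough illustration of the quantile-function definition, the following Python sketch evaluates the inverse CDF at 0.25 and 0.75 for two distributions. It assumes Python 3.8 or later for `statistics.NormalDist.inv_cdf`; the helper `distribution_iqr`, the function `exponential_inv_cdf`, and the rate value are choices made for this example only.

```python
import math
from statistics import NormalDist

def distribution_iqr(inv_cdf):
    """IQR of a distribution given its quantile function (inverse CDF)."""
    return inv_cdf(0.75) - inv_cdf(0.25)

# Standard normal distribution: the quantile function is provided by the library.
std_normal = NormalDist(mu=0.0, sigma=1.0)
print(distribution_iqr(std_normal.inv_cdf))  # roughly 1.349 standard deviations

# Exponential distribution with rate `rate` (hypothetical value):
# CDF(x) = 1 - exp(-rate * x), so the inverse CDF is -ln(1 - p) / rate.
rate = 2.0

def exponential_inv_cdf(p):
    return -math.log(1.0 - p) / rate

print(distribution_iqr(exponential_inv_cdf))  # equals ln(3) / rate
```

When no closed-form quantile function is available, the same two values could instead be found by numerically solving CDF(x) = 0.25 and CDF(x) = 0.75.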


Interquartile range test for normality of distribution

The IQR, mean, and standard deviation of a population ''P'' can be used in a simple test of whether or not ''P'' is normally distributed, or Gaussian. If ''P'' is normally distributed, then the standard score of the first quartile, ''z''1, is −0.67, and the standard score of the third quartile, ''z''3, is +0.67. Given ''mean'' = \bar{P} and ''standard deviation'' = σ for ''P'', if ''P'' is normally distributed, the first quartile is
:Q_1 = (\sigma \, z_1) + \bar{P}
and the third quartile is
:Q_3 = (\sigma \, z_3) + \bar{P}
If the actual values of the first or third quartiles differ substantially from the calculated values, ''P'' is not normally distributed. However, a normal distribution can be trivially perturbed so that the standard scores of Q1 and Q3 remain at −0.67 and +0.67 while the distribution is no longer normal (so the above test would produce a false positive). A better test of normality, such as a Q–Q plot, would be indicated here.
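The comparison described above can be sketched in Python. The function names and the example numbers below are assumptions for illustration. The text does not say how large a difference counts as "substantial", so this sketch only reports the discrepancy and leaves the judgement to the reader.

```python
def expected_normal_quartiles(mean, sigma, z=0.67):
    """Quartiles a normal population with this mean and sigma would have.

    Uses Q1 = sigma * z1 + mean and Q3 = sigma * z3 + mean,
    with z1 = -0.67 and z3 = +0.67 as in the text.
    """
    q1 = sigma * (-z) + mean
    q3 = sigma * z + mean
    return q1, q3

def quartile_discrepancy(actual_q1, actual_q3, mean, sigma):
    """Actual minus expected quartiles, expressed in units of sigma."""
    expected_q1, expected_q3 = expected_normal_quartiles(mean, sigma)
    return (actual_q1 - expected_q1) / sigma, (actual_q3 - expected_q3) / sigma

# Hypothetical population summary: mean 100, standard deviation 15.
print(expected_normal_quartiles(100, 15))
# Hypothetical observed quartiles 92 and 104 compared against the expectation.
print(quartile_discrepancy(92, 104, 100, 15))
```

As the text notes, small discrepancies do not establish normality; they only fail to rule it out.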


Outliers

The interquartile range is often used to find outliers in data. Outliers here are defined as observations that fall below Q1 − 1.5 IQR or above Q3 + 1.5 IQR. In a boxplot, the highest and lowest occurring values within these limits are indicated by the ''whiskers'' of the box (frequently with an additional bar at the end of each whisker), and any outliers are drawn as individual points.
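A minimal Python sketch of this outlier rule follows. It assumes the quartiles are computed as medians of the lower and upper halves of the sorted data, which is one of several conventions; the function name `iqr_outliers` and the sample values are hypothetical.

```python
from statistics import median

def iqr_outliers(values, k=1.5):
    """Return the observations below Q1 - k*IQR or above Q3 + k*IQR."""
    ordered = sorted(values)
    n = len(ordered) // 2      # half size; the middle value is skipped for odd counts
    if n == 0:
        raise ValueError("need at least two values")
    q1 = median(ordered[:n])   # median of the n smallest values
    q3 = median(ordered[-n:])  # median of the n largest values
    spread = q3 - q1
    lower, upper = q1 - k * spread, q3 + k * spread
    return [x for x in ordered if x < lower or x > upper]

# Hypothetical sample with one low and one high extreme value.
print(iqr_outliers([1, 10, 11, 12, 13, 14, 15, 40]))
```

The multiplier `k` is left as a parameter so that a different cutoff can be tried without changing the function.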


See also

* Interdecile range
* Midhinge
* Probable error
* Robust measures of scale

