
In statistics, dispersion (also called variability, scatter, or spread) is the extent to which a distribution is stretched or squeezed. Common examples of measures of statistical dispersion are the variance, standard deviation, and interquartile range. For instance, when the variance of the data in a set is large, the data are widely scattered. On the other hand, when the variance is small, the data in the set are clustered.
Dispersion is contrasted with location or central tendency, and together they are the most used properties of distributions.
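The contrast between location and dispersion can be illustrated with a minimal sketch. The two data sets below are hypothetical and were chosen to share the same mean while differing in spread; the population variance from Python's standard `statistics` module is used only as one possible measure.

```python
from statistics import mean, pvariance

# Two hypothetical data sets with the same location (mean 10)
clustered = [9, 10, 10, 10, 11]
scattered = [0, 5, 10, 15, 20]

for name, data in (("clustered", clustered), ("scattered", scattered)):
    # pvariance is the population variance: mean squared deviation from the mean
    print(name, "mean =", mean(data), "variance =", pvariance(data))
```

Both sets have the same central tendency, so only a dispersion measure distinguishes them.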
Measures of statistical dispersion
A measure of statistical dispersion is a nonnegative real number that is zero if all the data are the same and increases as the data become more diverse.
Most measures of dispersion have the same units as the quantity being measured. In other words, if the measurements are in metres or seconds, so is the measure of dispersion. Examples of dispersion measures include:
* Standard deviation
* Interquartile range (IQR)
* Range
* Mean absolute difference (also known as Gini mean absolute difference)
* Median absolute deviation (MAD)
* Average absolute deviation (or simply called average deviation)
* Distance standard deviation
These are frequently used (together with scale factors) as estimators of scale parameters, in which capacity they are called estimates of scale.
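Most of the measures listed above can be computed in a few lines. The following is a sketch using only Python's standard library; the function names are illustrative, the sample is hypothetical, and some conventions are assumptions: the quartiles use the default "exclusive" method of `statistics.quantiles` (other quartile conventions give slightly different IQRs), the mean absolute difference averages over distinct pairs, and the population form of the standard deviation is used. The distance standard deviation is omitted.

```python
from itertools import combinations
from statistics import mean, median, pstdev, quantiles

def value_range(xs):
    """Largest value minus smallest value."""
    return max(xs) - min(xs)

def interquartile_range(xs):
    """Q3 - Q1, using the default quartile convention of statistics.quantiles."""
    q1, _, q3 = quantiles(xs, n=4)
    return q3 - q1

def mean_absolute_difference(xs):
    """Average of |a - b| over all distinct pairs (one common convention)."""
    pairs = list(combinations(xs, 2))
    return sum(abs(a - b) for a, b in pairs) / len(pairs)

def median_absolute_deviation(xs):
    """Median of the absolute deviations from the median."""
    m = median(xs)
    return median([abs(x - m) for x in xs])

def average_absolute_deviation(xs):
    """Mean of the absolute deviations from the mean."""
    m = mean(xs)
    return mean([abs(x - m) for x in xs])

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample
print("standard deviation:", pstdev(data))
print("interquartile range:", interquartile_range(data))
print("range:", value_range(data))
print("mean absolute difference:", mean_absolute_difference(data))
print("median absolute deviation:", median_absolute_deviation(data))
print("average absolute deviation:", average_absolute_deviation(data))
```

Each function returns zero when all values are identical, as a dispersion measure should.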
Robust measures of scale are those unaffected by a small number of outliers, and include the IQR and MAD.
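The effect of a single outlier can be seen in a small experiment. This is a sketch with hypothetical data, in which one value of a clean sample is replaced by a gross error; the helper names `iqr` and `mad` are illustrative, and the quartile convention is the default of `statistics.quantiles`.

```python
from statistics import median, pstdev, quantiles

def iqr(xs):
    """Interquartile range with the default quartile convention."""
    q1, _, q3 = quantiles(xs, n=4)
    return q3 - q1

def mad(xs):
    """Median absolute deviation from the median."""
    m = median(xs)
    return median([abs(x - m) for x in xs])

clean = [10, 11, 12, 13, 14, 15, 16]
contaminated = clean[:-1] + [1000]  # the largest value replaced by an outlier

for name, data in (("clean", clean), ("contaminated", contaminated)):
    print(name, "SD =", pstdev(data), "IQR =", iqr(data), "MAD =", mad(data))
```

In this example the standard deviation grows by more than two orders of magnitude, while the IQR and MAD are the same for both samples.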
All the above measures of statistical dispersion have the useful property that they are ''location-invariant'' and ''linear in scale''. This means that if a random variable $X$ has a dispersion of $S_X$, then a linear transformation $Y = aX + b$ for real $a$ and $b$ should have dispersion $S_Y = |a| S_X$, where $|a|$ is the absolute value of $a$, that is, $a$ with any preceding negative sign ignored.
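Location invariance and linearity in scale can be checked numerically. The sketch below applies a linear transformation with a scale factor `a` and a shift `b` (names and values chosen here for illustration) to a hypothetical sample and compares the standard deviation and the median absolute deviation before and after.

```python
from math import isclose
from statistics import median, pstdev

def mad(xs):
    """Median absolute deviation from the median."""
    m = median(xs)
    return median([abs(x - m) for x in xs])

x = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
a, b = -3.0, 10.0                  # scale factor and shift
y = [a * xi + b for xi in x]       # linear transformation of the sample

# The shift b has no effect; the scale a enters only through its absolute value.
sd_ok = isclose(pstdev(y), abs(a) * pstdev(x))
mad_ok = isclose(mad(y), abs(a) * mad(x))
print("SD scales by |a|:", sd_ok)
print("MAD scales by |a|:", mad_ok)
```

A negative scale factor is used deliberately, to show that the sign of the factor is ignored.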
Other measures of dispersion are dimensionless. In other words, they have no units even if the variable itself has units. These include:
* Coefficient of variation
* Quartile coefficient of dispersion
* Relative mean difference, equal to twice the Gini coefficient
* Entropy: While the entropy of a discrete variable is location-invariant and scale-independent, and therefore not a measure of dispersion in the above sense, the entropy of a continuous variable is location-invariant and additive in scale: if $H(X)$ is the entropy of a continuous variable $X$ and $Y = aX + b$, then $H(Y) = H(X) + \log|a|$.
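The following sketch illustrates two of these points under stated assumptions. First, a hypothetical set of heights is expressed in metres and in centimetres, and the coefficient of variation and the quartile coefficient of dispersion come out the same in both units. Second, the additive behaviour of continuous entropy is checked for the special case of a normal distribution, whose differential entropy has the closed form 0.5 * log(2 * pi * e * sigma^2). The function names and the quartile convention (the default of `statistics.quantiles`) are choices made here.

```python
from math import e, isclose, log, pi
from statistics import mean, pstdev, quantiles

def coefficient_of_variation(xs):
    """Standard deviation divided by the mean (for positive, ratio-scale data)."""
    return pstdev(xs) / mean(xs)

def quartile_coefficient_of_dispersion(xs):
    """(Q3 - Q1) / (Q3 + Q1)."""
    q1, _, q3 = quantiles(xs, n=4)
    return (q3 - q1) / (q3 + q1)

heights_m = [1.52, 1.60, 1.68, 1.75, 1.81, 1.90]   # hypothetical, in metres
heights_cm = [100 * h for h in heights_m]          # same data in centimetres

cv_same = isclose(coefficient_of_variation(heights_m),
                  coefficient_of_variation(heights_cm))
qcd_same = isclose(quartile_coefficient_of_dispersion(heights_m),
                   quartile_coefficient_of_dispersion(heights_cm))
print("CV unchanged by unit change:", cv_same)
print("QCD unchanged by unit change:", qcd_same)

def normal_entropy(sigma):
    """Differential entropy (in nats) of a normal distribution with SD sigma."""
    return 0.5 * log(2 * pi * e * sigma ** 2)

a, sigma = -3.0, 1.7   # illustrative scale factor and standard deviation
entropy_additive = isclose(normal_entropy(abs(a) * sigma),
                           normal_entropy(sigma) + log(abs(a)))
print("Entropy shifts by log|a|:", entropy_additive)
```

Unlike the measures in the first list, these ratio measures are not location-invariant: adding a constant to the data changes the denominator.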
There are other measures of dispersion:
* Variance (the square of the standard deviation) – location-invariant but not linear in scale.
* Variance-to-mean ratio – mostly used for count data, in which case the term coefficient of dispersion is used. The ratio is dimensionless when applied to count data, because counts are themselves dimensionless, but not otherwise.
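A short sketch of both items, using hypothetical data: the variance of a linearly transformed sample scales with the square of the scale factor rather than with its absolute value, and the variance-to-mean ratio is computed for a small set of counts. The names `a`, `b`, and `variance_to_mean_ratio` are illustrative.

```python
from math import isclose
from statistics import mean, pvariance

x = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
a, b = -3.0, 10.0
y = [a * xi + b for xi in x]

# The shift b drops out, but the scale enters as a**2, not |a|.
quadratic_in_scale = isclose(pvariance(y), a ** 2 * pvariance(x))
print("variance scales by a squared:", quadratic_in_scale)

def variance_to_mean_ratio(counts):
    """Population variance divided by the mean."""
    return pvariance(counts) / mean(counts)

counts = [0, 1, 1, 2, 2, 2, 3, 3, 4, 2]   # hypothetical event counts
vmr = variance_to_mean_ratio(counts)
print("variance-to-mean ratio:", vmr)
```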
Some measures of dispersion have specialized purposes. The Allan variance can be used for applications where the noise disrupts convergence. The Hadamard variance can be used to counteract linear frequency drift sensitivity.
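The difference in drift sensitivity can be sketched as follows. The estimators below are assumptions, not taken from the text above: they are the simple non-overlapping forms commonly given for fractional-frequency samples at the basic averaging time, with the Allan variance built from first differences and the Hadamard variance from second differences. The input is a hypothetical noise-free series with pure linear drift.

```python
def allan_variance(y):
    """Half the mean squared first difference of consecutive frequency samples."""
    diffs = [y[i + 1] - y[i] for i in range(len(y) - 1)]
    return sum(d * d for d in diffs) / (2 * len(diffs))

def hadamard_variance(y):
    """One sixth of the mean squared second difference of the samples."""
    second = [y[i + 2] - 2 * y[i + 1] + y[i] for i in range(len(y) - 2)]
    return sum(d * d for d in second) / (6 * len(second))

# Hypothetical frequency samples with linear drift only (arbitrary units)
drifting = [0.5 * i for i in range(10)]

print("Allan variance:", allan_variance(drifting))
print("Hadamard variance:", hadamard_variance(drifting))
```

Because second differences cancel any linear trend, the Hadamard estimate is zero for this series, whereas the Allan estimate is not.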
For categorical variables, it is less common to measure dispersion by a single number; see qualitative variation. One measure that does so is the discrete entropy.
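As a minimal sketch, the discrete (Shannon) entropy of a categorical sample can be estimated from the observed category frequencies. The function name, the choice of base-2 logarithm (bits), and the example labels are assumptions made for illustration.

```python
from collections import Counter
from math import log2

def shannon_entropy(labels):
    """Entropy in bits of the empirical distribution of the labels."""
    counts = Counter(labels)
    n = len(labels)
    # sum of p * log2(1 / p) over the observed categories
    return sum((c / n) * log2(n / c) for c in counts.values())

uniform = ["red", "green", "blue", "yellow"]   # four equally frequent categories
skewed = ["red", "red", "red", "blue"]         # one dominant category
constant = ["red", "red", "red", "red"]        # no variation at all

for name, labels in (("uniform", uniform), ("skewed", skewed), ("constant", constant)):
    print(name, shannon_entropy(labels))
```

The entropy is zero when every observation falls in one category and is largest when the categories are equally frequent.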
Sources
In the physical sciences, such variability may result from random measurement errors: instrument measurements are often not perfectly precise, i.e., reproducible, and there is additional inter-rater variability in interpreting and reporting the measured results. One may assume that the quantity being measured is stable, and that the variation between measurements is due to observational error. A system of a large number of particles is characterized by the mean values of a relatively small number of macroscopic quantities such as temperature, energy, and density. The standard deviation is an important measure in fluctuation theory, which explains many physical phenomena, including why the sky is blue.
In the biological sciences, the quantity being measured is seldom unchanging and stable, and the variation observed might additionally be ''intrinsic'' to the phenomenon: It may be due to ''inter-individual variability'', that is, distinct members of a population differing from each other. Also, it may be due to ''intra-individual variability'', that is, one and the same subject differing in tests taken at different times or in other differing conditions. Such types of variability are also seen in the arena of manufactured products; even there, the meticulous scientist finds variation.
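One way to separate the two sources is a variance decomposition, sketched here under simplifying assumptions: the data are hypothetical, every subject has the same number of repeated measurements, and population variances are used, so that the total variance equals the average within-subject variance plus the variance of the subject means.

```python
from math import isclose
from statistics import mean, pvariance

# Hypothetical repeated measurements for three subjects
subjects = {
    "A": [4.0, 6.0],
    "B": [9.0, 11.0],
    "C": [14.0, 16.0],
}

# Intra-individual part: average variance of each subject's own measurements
within = mean([pvariance(values) for values in subjects.values()])

# Inter-individual part: variance of the subject means
between = pvariance([mean(values) for values in subjects.values()])

# Total variance of all measurements pooled together
total = pvariance([v for values in subjects.values() for v in values])

print("within:", within, "between:", between, "total:", total)
print("decomposition holds:", isclose(total, within + between))
```

In this example most of the variation is between subjects rather than within them.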
A partial ordering of dispersion
A mean-preserving spread (MPS) is a change from one probability distribution A to another probability distribution B, where B is formed by spreading out one or more portions of A's probability density function while leaving the mean (the expected value) unchanged. The concept of a mean-preserving spread provides a partial ordering of probability distributions according to their dispersions: of two probability distributions, one may be ranked as having more dispersion than the other, or alternatively neither may be ranked as having more dispersion.
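A discrete analogue gives a concrete picture. In the sketch below, a distribution is represented as a dictionary mapping values to probabilities (a representation chosen here), and the hypothetical helper `spread_point` moves the probability mass at one point symmetrically outward, half to each side. The mean is unchanged and the variance increases.

```python
def mean_of(dist):
    """Expected value of a discrete distribution {value: probability}."""
    return sum(x * p for x, p in dist.items())

def variance_of(dist):
    """Variance of a discrete distribution {value: probability}."""
    m = mean_of(dist)
    return sum(p * (x - m) ** 2 for x, p in dist.items())

def spread_point(dist, x, delta):
    """Move the mass at x, half to x - delta and half to x + delta."""
    out = dict(dist)
    p = out.pop(x)
    for target in (x - delta, x + delta):
        out[target] = out.get(target, 0.0) + p / 2
    return out

A = {0: 0.25, 10: 0.5, 20: 0.25}
B = spread_point(A, 10, 5)   # B is a mean-preserving spread of A

print("mean:", mean_of(A), "->", mean_of(B))
print("variance:", variance_of(A), "->", variance_of(B))
```

Not every pair of distributions with equal means is related in this way, which is why the resulting ordering is only partial.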
See also
* Average
* Circular dispersion
* Dispersion matrix
* Probability density function
* Qualitative variation
* Measurement uncertainty
* Precision (statistics)
* Robust measures of scale
* Summary statistics