Variation (statistics)

In statistics, dispersion (also called variability, scatter, or spread) is the extent to which a distribution is stretched or squeezed. Common examples of measures of statistical dispersion are the variance, standard deviation, and interquartile range. For instance, when the variance of data in a set is large, the data is widely scattered. On the other hand, when the variance is small, the data in the set is clustered. Dispersion is contrasted with location or central tendency, and together they are the most used properties of distributions.


Measures

A measure of statistical dispersion is a nonnegative real number that is zero if all the data are the same and increases as the data become more diverse. Most measures of dispersion have the same units as the quantity being measured. In other words, if the measurements are in metres or seconds, so is the measure of dispersion. Examples of dispersion measures include:
* Standard deviation
* Interquartile range (IQR)
* Range
* Mean absolute difference (also known as Gini mean absolute difference)
* Median absolute deviation (MAD)
* Average absolute deviation (or simply called average deviation)
* Distance standard deviation

These are frequently used (together with scale factors) as estimators of scale parameters, in which capacity they are called estimates of scale.
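
As an illustration of the measures listed above, the following is a minimal sketch using only Python's standard library (3.8 or later is assumed for `statistics.quantiles` and `statistics.fmean`). The helper names and the sample data are made up for this example. Two conventions are assumptions rather than the only valid choices: the quartiles use the "inclusive" interpolation method, and the mean absolute difference averages over distinct pairs of observations. The distance standard deviation is not covered.

```python
import statistics
from itertools import combinations


def value_range(data):
    # Range: largest observation minus smallest observation.
    return max(data) - min(data)


def interquartile_range(data):
    # quantiles(n=4) returns the three quartile cut points Q1, Q2, Q3.
    # The "inclusive" method is one of several quartile conventions.
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return q3 - q1


def mean_absolute_difference(data):
    # Average |x_i - x_j| over all distinct pairs (an assumed convention).
    pairs = list(combinations(data, 2))
    return sum(abs(x - y) for x, y in pairs) / len(pairs)


def median_absolute_deviation(data):
    # Median of the absolute deviations from the median.
    med = statistics.median(data)
    return statistics.median(abs(x - med) for x in data)


def average_absolute_deviation(data):
    # Mean of the absolute deviations from the mean.
    mean = statistics.fmean(data)
    return sum(abs(x - mean) for x in data) / len(data)


# Hypothetical sample, e.g. eight measurements in metres.
sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

print("standard deviation:        ", statistics.pstdev(sample))
print("interquartile range:       ", interquartile_range(sample))
print("range:                     ", value_range(sample))
print("mean absolute difference:  ", mean_absolute_difference(sample))
print("median absolute deviation: ", median_absolute_deviation(sample))
print("average absolute deviation:", average_absolute_deviation(sample))
```

Every value printed here carries the units of the data, and each is zero for a sample whose observations are all identical.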

Robust measures of scale are those unaffected by a small number of outliers, and include the IQR and MAD.

All the above measures of statistical dispersion have the useful property that they are ''location-invariant'' and ''linear in scale''. This means that if a random variable X has a dispersion of S_X, then a linear transformation Y = aX + b for real a and b should have dispersion S_Y = |a| S_X, where |a| is the absolute value of a, that is, a with any preceding negative sign ignored.

Other measures of dispersion are dimensionless. In other words, they have no units even if the variable itself has units. These include:
* Coefficient of variation
* Quartile coefficient of dispersion
* Relative mean difference, equal to twice the Gini coefficient
* Entropy: While the entropy of a discrete variable is location-invariant and scale-independent, and therefore not a measure of dispersion in the above sense, the entropy of a continuous variable is location-invariant and additive in scale: if H(z) is the entropy of a continuous variable z and z = ax + b, then H(z) = H(x) + \log|a|.

There are other measures of dispersion:
* Variance (the square of the standard deviation): location-invariant but not linear in scale, since it scales with a^2 rather than |a|.
* Variance-to-mean ratio: mostly used for count data, in which case the term coefficient of dispersion is used and the ratio is dimensionless, since count data are themselves dimensionless; otherwise the ratio is not dimensionless.

Some measures of dispersion have specialized purposes. The Allan variance can be used for applications where the noise disrupts convergence. The Hadamard variance can be used to counteract linear frequency drift sensitivity.

For categorical variables, it is less common to measure dispersion by a single number; see qualitative variation. One measure that does so is the discrete entropy.
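
The invariance properties described in this section can be checked numerically. The sketch below is illustrative only: the data, the constants `a` and `b`, and the helper names are assumptions chosen for the example. It applies Y = aX + b to a small sample and compares the standard deviation, the MAD, the variance, and the coefficient of variation before and after the transformation.

```python
import math
import statistics


def mad(data):
    # Median absolute deviation from the median.
    med = statistics.median(data)
    return statistics.median(abs(v - med) for v in data)


def coefficient_of_variation(data):
    # Standard deviation divided by the mean (requires a nonzero mean).
    return statistics.pstdev(data) / statistics.fmean(data)


x = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]   # hypothetical sample
a, b = -3.0, 10.0                               # arbitrary real constants
y = [a * v + b for v in x]                      # linear transformation Y = aX + b

# Location-invariant and linear in scale: S_Y = |a| * S_X.
print(math.isclose(statistics.pstdev(y), abs(a) * statistics.pstdev(x)))
print(math.isclose(mad(y), abs(a) * mad(x)))

# Variance is location-invariant but scales with a**2, not |a|.
print(math.isclose(statistics.pvariance(y), a ** 2 * statistics.pvariance(x)))

# The coefficient of variation is unchanged by a positive rescaling
# (for instance a change of units) but is not location-invariant.
rescaled = [2.5 * v for v in x]
shifted = [v + 10.0 for v in x]
print(math.isclose(coefficient_of_variation(rescaled), coefficient_of_variation(x)))
print(math.isclose(coefficient_of_variation(shifted), coefficient_of_variation(x)))
```

Under these assumptions the first four comparisons hold and the last one does not, which matches the distinction drawn above between measures that carry units and dimensionless ones.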


Sources

In the physical sciences, such variability may result from random measurement errors: instrument measurements are often not perfectly precise, i.e., reproducible, and there is additional inter-rater variability in interpreting and reporting the measured results. One may assume that the quantity being measured is stable, and that the variation between measurements is due to observational error. A system of a large number of particles is characterized by the mean values of a relatively small number of macroscopic quantities such as temperature, energy, and density. The standard deviation is an important measure in fluctuation theory, which explains many physical phenomena, including why the sky is blue.

In the biological sciences, the quantity being measured is seldom unchanging and stable, and the variation observed might additionally be ''intrinsic'' to the phenomenon. It may be due to ''inter-individual variability'', that is, distinct members of a population differing from each other. It may also be due to ''intra-individual variability'', that is, one and the same subject differing in tests taken at different times or in other differing conditions. Such types of variability are also seen in the arena of manufactured products; even there, the meticulous scientist finds variation.

In economics, finance, and other disciplines, regression analysis attempts to explain the dispersion of a dependent variable, generally measured by its variance, using one or more independent variables, each of which itself has positive dispersion. The fraction of variance explained is called the coefficient of determination.
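
To make the "fraction of variance explained" concrete, the following is a minimal sketch for the simplest case: ordinary least squares with a single independent variable. The function name `r_squared` and the data are hypothetical. The code assumes that both variables have positive dispersion, since otherwise the divisions below are undefined.

```python
import statistics


def r_squared(xs, ys):
    """Coefficient of determination for a one-variable least-squares fit."""
    mean_x = statistics.fmean(xs)
    mean_y = statistics.fmean(ys)

    # Sums of squares and cross-products around the means.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

    # Least-squares slope and intercept (requires sxx > 0).
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # Total dispersion of y, and the part left unexplained by the fit.
    ss_total = sum((y - mean_y) ** 2 for y in ys)
    ss_residual = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

    # Fraction of the variance of y explained by x (requires ss_total > 0).
    return 1.0 - ss_residual / ss_total


# Hypothetical data: y is roughly 2x with a little noise.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
print(r_squared(xs, ys))
```

A value close to 1 means that most of the dispersion of the dependent variable is accounted for by the fitted line, while a value close to 0 means that very little of it is.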


A partial ordering of dispersion

A mean-preserving spread (MPS) is a change from one probability distribution A to another probability distribution B, where B is formed by spreading out one or more portions of A's probability density function while leaving the mean (the expected value) unchanged. The concept of a mean-preserving spread provides a partial ordering of probability distributions according to their dispersions: of two probability distributions, one may be ranked as having more dispersion than the other, or alternatively neither may be ranked as having more dispersion.
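
A small discrete example may help. The sketch below is an illustration, not a general test for the ordering: the distributions and the helper `spread_mass` are made up for this example. It takes probability mass from one point of A and moves it in equal halves to two points equally far on either side, which leaves the mean unchanged. Exact fractions are used so that the comparisons are not affected by rounding.

```python
from fractions import Fraction


def mean(dist):
    # Expected value of a discrete distribution given as {value: probability}.
    return sum(value * prob for value, prob in dist.items())


def variance(dist):
    # Expected squared deviation from the mean.
    mu = mean(dist)
    return sum(prob * (value - mu) ** 2 for value, prob in dist.items())


def spread_mass(dist, value, amount, offset):
    """Move `amount` of probability from `value`, half to value - offset
    and half to value + offset, so that the mean is preserved."""
    new = dict(dist)
    new[value] -= amount
    for target in (value - offset, value + offset):
        new[target] = new.get(target, 0) + amount / 2
    # Drop points that no longer carry any probability.
    return {v: p for v, p in new.items() if p != 0}


# Distribution A: mass 1/4 at -1, 1/2 at 0, 1/4 at 1.
dist_a = {-1: Fraction(1, 4), 0: Fraction(1, 2), 1: Fraction(1, 4)}

# Distribution B: spread the mass at 0 out to -2 and 2.
dist_b = spread_mass(dist_a, value=0, amount=Fraction(1, 2), offset=2)

print(mean(dist_a), mean(dist_b))          # the means agree
print(variance(dist_a), variance(dist_b))  # B is more dispersed than A
```

The variance is used here only as a convenient check. A larger variance does not by itself establish that one distribution is a mean-preserving spread of another, which is why the ordering is partial.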


See also

* Average
* Circular dispersion
* Dispersion matrix
* Qualitative variation
* Measurement uncertainty
* Robust measures of scale
* Summary statistics

