HOME

TheInfoList



OR:

In
statistics Statistics (from German: '' Statistik'', "description of a state, a country") is the discipline that concerns the collection, organization, analysis, interpretation, and presentation of data. In applying statistics to a scientific, indust ...
, the median absolute deviation (MAD) is a
robust Robustness is the property of being strong and healthy in constitution. When it is transposed into a system, it refers to the ability of tolerating perturbations that might affect the system’s functional body. In the same line ''robustness'' ca ...
measure of the variability of a univariate sample of
quantitative data Quantitative research is a research strategy that focuses on quantifying the collection and analysis of data. It is formed from a deductive approach where emphasis is placed on the testing of theory, shaped by empiricist and positivist philoso ...
. It can also refer to the
population Population typically refers to the number of people in a single area, whether it be a city or town, region, country, continent, or the world. Governments typically quantify the size of the resident population within their jurisdiction usi ...
parameter A parameter (), generally, is any characteristic that can help in defining or classifying a particular system (meaning an event, project, object, situation, etc.). That is, a parameter is an element of a system that is useful, or critical, when ...
that is
estimated Estimation (or estimating) is the process of finding an estimate or approximation, which is a value that is usable for some purpose even if input data may be incomplete, uncertain, or unstable. The value is nonetheless usable because it is der ...
by the MAD calculated from a sample. For a univariate data set ''X''1, ''X''2, ..., ''Xn'', the MAD is defined as the
median In statistics and probability theory, the median is the value separating the higher half from the lower half of a data sample, a population, or a probability distribution. For a data set, it may be thought of as "the middle" value. The basic f ...
of the
absolute deviation In mathematics and statistics, deviation is a measure of difference between the observed value of a variable and some other value, often that variable's mean. The sign of the deviation reports the direction of that difference (the deviation is posi ...
s from the data's median \tilde=\operatorname(X) : : \operatorname = \operatorname( , X_i - \tilde, ) that is, starting with the residuals (deviations) from the data's median, the MAD is the
median In statistics and probability theory, the median is the value separating the higher half from the lower half of a data sample, a population, or a probability distribution. For a data set, it may be thought of as "the middle" value. The basic f ...
of their
absolute value In mathematics, the absolute value or modulus of a real number x, is the non-negative value without regard to its sign. Namely, , x, =x if is a positive number, and , x, =-x if x is negative (in which case negating x makes -x positive), ...
s.


Example

Consider the data (1, 1, 2, 2, 4, 6, 9). It has a median value of 2. The absolute deviations about 2 are (1, 1, 0, 0, 2, 4, 7) which in turn have a median value of 1 (because the sorted absolute deviations are (0, 0, 1, 1, 2, 4, 7)). So the median absolute deviation for this data is 1.


Uses

The median absolute deviation is a measure of
statistical dispersion In statistics, dispersion (also called variability, scatter, or spread) is the extent to which a distribution is stretched or squeezed. Common examples of measures of statistical dispersion are the variance, standard deviation, and interquartile ...
. Moreover, the MAD is a robust statistic, being more resilient to outliers in a data set than the
standard deviation In statistics, the standard deviation is a measure of the amount of variation or dispersion of a set of values. A low standard deviation indicates that the values tend to be close to the mean (also called the expected value) of the set, whil ...
. In the standard deviation, the distances from the
mean There are several kinds of mean in mathematics, especially in statistics. Each mean serves to summarize a given group of data, often to better understand the overall value ( magnitude and sign) of a given data set. For a data set, the '' ar ...
are squared, so large deviations are weighted more heavily, and thus outliers can heavily influence it. In the MAD, the deviations of a small number of outliers are irrelevant. Because the MAD is a more robust estimator of scale than the sample
variance In probability theory and statistics, variance is the expectation of the squared deviation of a random variable from its population mean or sample mean. Variance is a measure of dispersion, meaning it is a measure of how far a set of numbe ...
or
standard deviation In statistics, the standard deviation is a measure of the amount of variation or dispersion of a set of values. A low standard deviation indicates that the values tend to be close to the mean (also called the expected value) of the set, whil ...
, it works better with distributions without a mean or variance, such as the
Cauchy distribution The Cauchy distribution, named after Augustin Cauchy, is a continuous probability distribution. It is also known, especially among physicists, as the Lorentz distribution (after Hendrik Lorentz), Cauchy–Lorentz distribution, Lorentz(ian) fun ...
.


Relation to standard deviation

The MAD may be used similarly to how one would use the deviation for the average. In order to use the MAD as a
consistent estimator In statistics, a consistent estimator or asymptotically consistent estimator is an estimator—a rule for computing estimates of a parameter ''θ''0—having the property that as the number of data points used increases indefinitely, the result ...
for the
estimation Estimation (or estimating) is the process of finding an estimate or approximation, which is a value that is usable for some purpose even if input data may be incomplete, uncertain, or unstable. The value is nonetheless usable because it is de ...
of the
standard deviation In statistics, the standard deviation is a measure of the amount of variation or dispersion of a set of values. A low standard deviation indicates that the values tend to be close to the mean (also called the expected value) of the set, whil ...
\sigma, one takes : \hat = k \cdot \operatorname, where k is a constant scale factor, which depends on the distribution. For normally distributed data k is taken to be : k = 1/\left(\Phi^(3/4)\right) \approx 1.4826, i.e., the
reciprocal Reciprocal may refer to: In mathematics * Multiplicative inverse, in mathematics, the number 1/''x'', which multiplied by ''x'' gives the product 1, also known as a ''reciprocal'' * Reciprocal polynomial, a polynomial obtained from another pol ...
of the
quantile function In probability and statistics, the quantile function, associated with a probability distribution of a random variable, specifies the value of the random variable such that the probability of the variable being less than or equal to that value equ ...
\Phi^ (also known as the inverse of the
cumulative distribution function In probability theory and statistics, the cumulative distribution function (CDF) of a real-valued random variable X, or just distribution function of X, evaluated at x, is the probability that X will take a value less than or equal to x. Eve ...
) for the
standard normal distribution In statistics, a normal distribution or Gaussian distribution is a type of continuous probability distribution for a real-valued random variable. The general form of its probability density function is : f(x) = \frac e^ The parameter \mu ...
Z = (X - \mu) / \sigma. The argument 3/4 is such that \pm \operatorname covers 50% (between 1/4 and 3/4) of the standard normal
cumulative distribution function In probability theory and statistics, the cumulative distribution function (CDF) of a real-valued random variable X, or just distribution function of X, evaluated at x, is the probability that X will take a value less than or equal to x. Eve ...
, i.e. : \frac 12 = P(, X - \mu, \le \operatorname) = P\left(\left, \frac\ \le \frac \sigma\right) = P\left(, Z, \le \frac\right). Therefore, we must have that : \Phi\left(\operatorname / \sigma\right) - \Phi\left(-\operatorname / \sigma\right) = 1/2. Noticing that : \Phi\left(-\operatorname / \sigma\right) = 1 - \Phi\left(\operatorname / \sigma\right), we have that \operatorname / \sigma = \Phi^(3/4) = 0.67449, from which we obtain the scale factor k = 1 / \Phi^(3/4) = 1.4826. Another way of establishing the relationship is noting that MAD equals the
half-normal distribution In probability theory and statistics, the half-normal distribution is a special case of the folded normal distribution. Let X follow an ordinary normal distribution, N(0,\sigma^2). Then, Y=, X, follows a half-normal distribution. Thus, the ha ...
median: : \operatorname = \sigma\sqrt\operatorname^(1/2) \approx 0.67449 \sigma. This form is used in, e.g., the
probable error In statistics, probable error defines the half-range of an interval about a central point for the distribution, such that half of the values from the distribution will lie within the interval and half outside.Dodge, Y. (2006) ''The Oxford Dictiona ...
. In the case of
complex Complex commonly refers to: * Complexity, the behaviour of a system whose components interact in multiple ways so possible interactions are difficult to describe ** Complex system, a system composed of many components which may interact with each ...
values (''X''+i''Y''), the relation of MAD to the standard deviation is unchanged for normally distributed data.


MAD using geometric median

Analogously to how the
median In statistics and probability theory, the median is the value separating the higher half from the lower half of a data sample, a population, or a probability distribution. For a data set, it may be thought of as "the middle" value. The basic f ...
generalizes to the
geometric median In geometry, the geometric median of a discrete set of sample points in a Euclidean space is the point minimizing the sum of distances to the sample points. This generalizes the median, which has the property of minimizing the sum of distances ...
(gm) in multivariate data, MAD can be generalized to MADGM (median of distances to gm) in n dimensions. This is done by replacing the absolute differences in one dimension by euclidian distances of the data points to the geometric median in n dimensions. This gives the identical result as the univariate MAD in 1 dimension and generalizes to any number of dimensions. MADGM needs the geometric median to be found, which is done by an iterative process.


The population MAD

The population MAD is defined analogously to the sample MAD, but is based on the complete
distribution Distribution may refer to: Mathematics * Distribution (mathematics), generalized functions used to formulate solutions of partial differential equations *Probability distribution, the probability of a particular value or value range of a vari ...
rather than on a sample. For a symmetric distribution with zero mean, the population MAD is the 75th
percentile In statistics, a ''k''-th percentile (percentile score or centile) is a score ''below which'' a given percentage ''k'' of scores in its frequency distribution falls (exclusive definition) or a score ''at or below which'' a given percentage fall ...
of the distribution. Unlike the
variance In probability theory and statistics, variance is the expectation of the squared deviation of a random variable from its population mean or sample mean. Variance is a measure of dispersion, meaning it is a measure of how far a set of numbe ...
, which may be infinite or undefined, the population MAD is always a finite number. For example, the standard
Cauchy distribution The Cauchy distribution, named after Augustin Cauchy, is a continuous probability distribution. It is also known, especially among physicists, as the Lorentz distribution (after Hendrik Lorentz), Cauchy–Lorentz distribution, Lorentz(ian) fun ...
has undefined variance, but its MAD is 1. The earliest known mention of the concept of the MAD occurred in 1816, in a paper by
Carl Friedrich Gauss Johann Carl Friedrich Gauss (; german: Gauß ; la, Carolus Fridericus Gauss; 30 April 177723 February 1855) was a German mathematician and physicist who made significant contributions to many fields in mathematics and science. Sometimes refer ...
on the determination of the accuracy of numerical observations.


See also

*
Deviation (statistics) In mathematics and statistics, deviation is a measure of difference between the observed value of a variable and some other value, often that variable's mean. The sign of the deviation reports the direction of that difference (the deviation is posi ...
*
Interquartile range In descriptive statistics, the interquartile range (IQR) is a measure of statistical dispersion, which is the spread of the data. The IQR may also be called the midspread, middle 50%, fourth spread, or H‑spread. It is defined as the differen ...
*
Probable error In statistics, probable error defines the half-range of an interval about a central point for the distribution, such that half of the values from the distribution will lie within the interval and half outside.Dodge, Y. (2006) ''The Oxford Dictiona ...
*
Robust measures of scale In statistics, robust measures of scale are methods that quantify the statistical dispersion in a sample of numerical data while resisting outliers. The most common such robust statistics are the ''interquartile range'' (IQR) and the ''median abs ...
* Relative mean absolute difference *
Average absolute deviation The average absolute deviation (AAD) of a data set is the average of the absolute deviations from a central point. It is a summary statistic of statistical dispersion or variability. In the general form, the central point can be a mean, median, m ...
*
Least absolute deviations Least absolute deviations (LAD), also known as least absolute errors (LAE), least absolute residuals (LAR), or least absolute values (LAV), is a statistical optimality criterion and a statistical optimization technique based minimizing the '' sum ...


Notes


References

* * * {{Machine learning evaluation metrics Statistical deviation and dispersion Robust statistics