In statistics, the median absolute deviation (MAD) is a robust measure of the variability of a univariate sample of quantitative data. It can also refer to the population parameter that is estimated
by the MAD calculated from a sample.

For a univariate data set ''X''₁, ''X''₂, ..., ''X''ₙ, the MAD is defined as the median of the absolute deviations from the data's median \tilde{X} = \operatorname{median}(X):
: \operatorname{MAD} = \operatorname{median}(|X_i - \tilde{X}|)
that is, starting with the residuals (deviations) from the data's median, the MAD is the median of their absolute values.

Example

Consider the data (1, 1, 2, 2, 4, 6, 9). It has a median value of 2. The absolute deviations about 2 are (1, 1, 0, 0, 2, 4, 7) which in turn have a median value of 1 (because the sorted absolute deviations are (0, 0, 1, 1, 2, 4, 7)). So the median absolute deviation for this data is 1.
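The calculation above can be reproduced with a few lines of Python. This is a minimal sketch using only the standard library; the function name `median_absolute_deviation` is chosen here for illustration and is not taken from any particular package.

```python
from statistics import median


def median_absolute_deviation(values):
    """Return the median of the absolute deviations from the median.

    `values` must be a reusable sequence (for example a list),
    because it is traversed twice.
    """
    center = median(values)
    return median(abs(x - center) for x in values)


data = [1, 1, 2, 2, 4, 6, 9]
print(median(data))                     # prints 2
print(median_absolute_deviation(data))  # prints 1
```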

Uses

The median absolute deviation is a measure of statistical dispersion. Moreover, the MAD is a robust statistic, being more resilient to outliers in a data set than the standard deviation. In the standard deviation, the distances from the mean are squared, so large deviations are weighted more heavily, and thus outliers can heavily influence it. In the MAD, the deviations of a small number of outliers are irrelevant.

Because the MAD is a more robust estimator of scale than the sample variance, it works better with distributions without a mean or variance, such as the Cauchy distribution.
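The difference in sensitivity to outliers can be illustrated with a small experiment. The sketch below is only an illustration: the data sets and the helper name `mad` are made up for this purpose. It replaces the largest value of a sample with an extreme one and compares how the sample standard deviation and the MAD react.

```python
from statistics import median, stdev


def mad(values):
    """Median absolute deviation of a list of numbers."""
    center = median(values)
    return median(abs(x - center) for x in values)


clean = [1, 1, 2, 2, 4, 6, 9]
contaminated = [1, 1, 2, 2, 4, 6, 900]  # largest value replaced by an outlier

# The MAD depends only on the middle of the sorted deviations,
# so a single extreme value leaves it unchanged.
print(mad(clean), mad(contaminated))

# The standard deviation squares every deviation,
# so the same outlier inflates it by orders of magnitude.
print(stdev(clean), stdev(contaminated))
```

In this example the MAD is 1 for both samples, while the standard deviation grows by more than a factor of one hundred.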

Relation to standard deviation

The MAD may be used in much the same way as the standard deviation is used with the mean. In order to use the MAD as a consistent estimator for the estimation of the standard deviation \sigma, one takes
: \hat{\sigma} = k \cdot \operatorname{MAD},
where k is a constant scale factor, which depends on the distribution.

For normally distributed data, k is taken to be
: k = 1/\left(\Phi^{-1}(3/4)\right) \approx 1.4826,

i.e., the reciprocal of the quantile function \Phi^{-1} (also known as the inverse of the cumulative distribution function) for the standard normal distribution Z = (X - \mu) / \sigma.

The argument 3/4 is such that \pm \operatorname{MAD} covers 50% (between 1/4 and 3/4) of the standard normal cumulative distribution function, i.e.
: \frac{1}{2} = P(|X - \mu| \le \operatorname{MAD}) = P\left(\left|\frac{X - \mu}{\sigma}\right| \le \frac{\operatorname{MAD}}{\sigma}\right) = P\left(|Z| \le \frac{\operatorname{MAD}}{\sigma}\right).
Therefore, we must have that
: \Phi\left(\operatorname{MAD} / \sigma\right) - \Phi\left(-\operatorname{MAD} / \sigma\right) = 1/2.
Noticing that
: \Phi\left(-\operatorname{MAD} / \sigma\right) = 1 - \Phi\left(\operatorname{MAD} / \sigma\right),
we have that \Phi\left(\operatorname{MAD} / \sigma\right) = 3/4, so \operatorname{MAD} / \sigma = \Phi^{-1}(3/4) \approx 0.67449, from which we obtain the scale factor k = 1 / \Phi^{-1}(3/4) \approx 1.4826.

Another way of establishing the relationship is noting that the MAD equals the median of the half-normal distribution:
: \operatorname{MAD} = \sigma\sqrt{2}\,\operatorname{erf}^{-1}(1/2) \approx 0.67449 \sigma.

This form is used in, e.g., the probable error.

In the case of complex values (''X''+i''Y''), the relation of MAD to the standard deviation is unchanged for normally distributed data.
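The scale factor for normally distributed data can be checked numerically. The following sketch uses `NormalDist` from the Python standard library; the simulated sample, its mean of 10, its standard deviation of 3, and the seed are arbitrary choices made for this illustration.

```python
import random
from statistics import NormalDist, median

std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1

# Quantile function evaluated at 3/4, and its reciprocal k.
q75 = std_normal.inv_cdf(0.75)  # approximately 0.67449
k = 1 / q75                     # approximately 1.4826

# The interval [-q75, q75] should hold half of the probability mass.
coverage = std_normal.cdf(q75) - std_normal.cdf(-q75)


def mad(values):
    """Median absolute deviation of a list of numbers."""
    center = median(values)
    return median(abs(x - center) for x in values)


# Estimate sigma from a simulated normal sample (illustrative values).
rng = random.Random(0)
true_sigma = 3.0
sample = [rng.gauss(10.0, true_sigma) for _ in range(20000)]
sigma_hat = k * mad(sample)

print(round(k, 4), round(coverage, 6))
print(sigma_hat)  # should be close to true_sigma
```

For data that are not normally distributed a different value of k is needed, so the constant 1.4826 should not be applied blindly.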

MAD using geometric median

Analogously to how the median generalizes to the geometric median (gm) in multivariate data, MAD can be generalized to MADGM (median of distances to gm) in n dimensions. This is done by replacing the absolute differences in one dimension by Euclidean distances of the data points to the geometric median in n dimensions. This gives the same result as the univariate MAD in 1 dimension and generalizes to any number of dimensions. MADGM needs the geometric median to be found, which is done by an iterative process.
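A possible implementation of MADGM is sketched below. The text does not say which iterative process is used; this sketch assumes Weiszfeld's algorithm, a common choice for the geometric median. The function names, the tolerance, the iteration cap, and the small floor on distances (used to avoid division by zero when an iterate coincides with a data point) are all assumptions made for illustration.

```python
import math
from statistics import median


def geometric_median(points, tol=1e-10, max_iter=1000):
    """Approximate the geometric median with Weiszfeld's iteration.

    `points` is a list of equal-length tuples. Distances are floored
    at 1e-12 so that an iterate landing on a data point does not
    cause a division by zero (a simple safeguard, assumed here).
    """
    dim = len(points[0])
    # Start from the centroid of the data.
    current = [sum(p[i] for p in points) / len(points) for i in range(dim)]
    for _ in range(max_iter):
        weights = [1.0 / max(math.dist(p, current), 1e-12) for p in points]
        total = sum(weights)
        updated = [
            sum(w * p[i] for w, p in zip(weights, points)) / total
            for i in range(dim)
        ]
        shift = math.dist(updated, current)
        current = updated
        if shift < tol:
            break
    return current


def madgm(points):
    """Median of the Euclidean distances to the geometric median."""
    gm = geometric_median(points)
    return median(math.dist(p, gm) for p in points)


# In one dimension the result agrees with the univariate MAD.
line = [(x,) for x in [1, 1, 2, 2, 4, 6, 9]]
print(madgm(line))  # close to 1, up to the iteration tolerance

# A small two-dimensional example, symmetric about the origin.
cross = [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)]
print(madgm(cross))
```

Because the geometric median is only approximated, the one-dimensional result matches the univariate MAD up to the chosen tolerance rather than exactly.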

The population MAD

The population MAD is defined analogously to the sample MAD, but is based on the complete distribution rather than on a sample. For a distribution that is symmetric about zero, the population MAD is the 75th percentile of the distribution.

Unlike the variance, which may be infinite or undefined, the population MAD is always a finite number. For example, the standard Cauchy distribution has undefined variance, but its MAD is 1.

The earliest known mention of the concept of the MAD occurred in 1816, in a paper by Carl Friedrich Gauss on the determination of the accuracy of numerical observations.
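The population MAD of a symmetric distribution centred at zero can be read off its quantile function at 0.75. The sketch below does this for the standard normal and the standard Cauchy distributions, and compares the Cauchy value with the MAD of a simulated sample. The helper name `cauchy_quantile`, the sample size, and the seed are illustrative assumptions.

```python
import math
import random
from statistics import NormalDist, median


def cauchy_quantile(p):
    """Quantile function of the standard Cauchy distribution."""
    return math.tan(math.pi * (p - 0.5))


# Population MAD as the 75th percentile of a distribution symmetric about 0.
normal_mad = NormalDist().inv_cdf(0.75)  # approximately 0.67449
cauchy_mad = cauchy_quantile(0.75)       # tan(pi/4), i.e. 1 up to rounding

# Sample MAD of simulated Cauchy data, drawn by inverse transform sampling.
rng = random.Random(1)
draws = [cauchy_quantile(rng.random()) for _ in range(20000)]
center = median(draws)
sample_mad = median(abs(x - center) for x in draws)

print(normal_mad, cauchy_mad)
print(sample_mad)  # should be close to 1 even though the variance is undefined
```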
