The Mahalanobis distance is a measure of the distance between a point ''P'' and a

distribution Distribution may refer to: Mathematics *Distribution (mathematics), generalized functions used to formulate solutions of partial differential equations * Probability distribution, the probability of a particular value or value range of a vari ...

''D'', introduced by P. C. Mahalanobis in 1936. Mahalanobis's definition was prompted by the problem of identifying the similarities of skulls based on measurements in 1927. It is a multi-dimensional generalization of the idea of measuring how many

standard deviations In statistics, the standard deviation is a measure of the amount of variation or dispersion of a set of values. A low standard deviation indicates that the values tend to be close to the mean (also called the expected value) of the set, while ...

away ''P'' is from the

mean There are several kinds of mean in mathematics, especially in statistics. Each mean serves to summarize a given group of data, often to better understand the overall value (magnitude and sign) of a given data set. For a data set, the ''arithme ...

of ''D''. This distance is zero for ''P'' at the mean of ''D'' and grows as ''P'' moves away from the mean along each

principal component Principal may refer to: Title or rank * Principal (academia), the chief executive of a university ** Principal (education), the office holder/ or boss in any school * Principal (civil service) or principal officer, the senior management level ...

axis. If each of these axes is re-scaled to have unit variance, then the Mahalanobis distance corresponds to standard

Euclidean distance In mathematics, the Euclidean distance between two points in Euclidean space is the length of a line segment between the two points. It can be calculated from the Cartesian coordinates of the points using the Pythagorean theorem, therefor ...

in the transformed space. The Mahalanobis distance is thus unitless, scale-invariant, and takes into account the

correlations In statistics, correlation or dependence is any statistical relationship, whether causal or not, between two random variables or bivariate data. Although in the broadest sense, "correlation" may indicate any type of association, in statistics ...

of the

data set A data set (or dataset) is a collection of data. In the case of tabular data, a data set corresponds to one or more database tables, where every column of a table represents a particular variable, and each row corresponds to a given record of the ...

Definition

Given a probability distribution

Q

\R^N

, with mean

\vec = (\mu_1, \mu_2, \mu_3, \dots , \mu_N)^\mathsf

and positive-definite covariance matrix

S

, the Mahalanobis distance of a point

\vec = (x_1, x_2, x_3, \dots, x_N )^\mathsf

from

Q

d_M(\vec, Q) = \sqrt.

Given two points

\vec

and

\vec

\R^N

, the Mahalanobis distance between them with respect to

Q

d_M(\vec ,\vec; Q) = \sqrt.

which means that

d_M(\vec, Q) = d_M(\vec,\vec; Q)

. Since

S

positive-definite In mathematics, positive definiteness is a property of any object to which a bilinear form or a sesquilinear form may be naturally associated, which is positive-definite. See, in particular: * Positive-definite bilinear form * Positive-definite fu ...

, so is

S^

, thus the square roots are always defined. We can find useful decompositions of the squared Mahalanobis distance that help to explain some reasons for the outlyingness of multivariate observations and also provide a graphical tool for identifying outliers. By the

spectral theorem In mathematics, particularly linear algebra and functional analysis, a spectral theorem is a result about when a linear operator or matrix (mathematics), matrix can be Diagonalizable matrix, diagonalized (that is, represented as a diagonal matrix i ...

S^

can be decomposed as

S^ = W^T W

for some real

N\times N

matrix, which gives us the equivalent definition

d_M(\vec, \vec; Q) = \, W(\vec - \vec)\,

where

\, \cdot\,

is the Euclidean norm. That is, the Mahalanobis distance is the Euclidean distance after a

whitening transformation A whitening transformation or sphering transformation is a linear transformation that transforms a vector of random variables with a known covariance matrix into a set of new variables whose covariance is the identity matrix, meaning that they ar ...

. The existence of

W

is guaranteed by the spectral theorem, but it is not unique. Different choices have different theoretical and practical advantages. In practice, the distribution

Q

is usually the sample distribution from a set of

IID In probability theory and statistics, a collection of random variables is independent and identically distributed if each random variable has the same probability distribution as the others and all are mutually independent. This property is us ...

samples from an underlying unknown distribution, so

\mu

is the sample mean, and

S

is the covariance matrix of the samples. When the affine span of the samples is not the entire

\R^N

, the covariance matrix would not be positive-definite, which means the above definition would not work. This is repaired by noting that, in general, the Mahalanobis distance is preserved under any full-rank affine transformation of the affine span of the samples. So in case the affine span is not the entire

\R^N

, the samples can be first projected non-degenerately to

\R^n

, where

n

is the dimension of the affine span of the samples, then the Mahalanobis distance can be computed as usual.

Intuitive explanation

Consider the problem of estimating the probability that a test point in ''N''-dimensional

Euclidean space Euclidean space is the fundamental space of geometry, intended to represent physical space. Originally, that is, in Euclid's Elements, Euclid's ''Elements'', it was the three-dimensional space of Euclidean geometry, but in modern mathematics ther ...

belongs to a set, where we are given sample points that definitely belong to that set. Our first step would be to find the

centroid In mathematics and physics, the centroid, also known as geometric center or center of figure, of a plane figure or solid figure is the arithmetic mean position of all the points in the surface of the figure. The same definition extends to any ob ...

or center of mass of the sample points. Intuitively, the closer the point in question is to this center of mass, the more likely it is to belong to the set. However, we also need to know if the set is spread out over a large range or a small range, so that we can decide whether a given distance from the center is noteworthy or not. The simplistic approach is to estimate the

standard deviation In statistics, the standard deviation is a measure of the amount of variation or dispersion of a set of values. A low standard deviation indicates that the values tend to be close to the mean (also called the expected value) of the set, while ...

of the distances of the sample points from the center of mass. If the distance between the test point and the center of mass is less than one standard deviation, then we might conclude that it is highly probable that the test point belongs to the set. The further away it is, the more likely that the test point should not be classified as belonging to the set. This intuitive approach can be made quantitative by defining the normalized distance between the test point and the set to be

\frac

, which reads:

\frac

. By plugging this into the normal distribution we can derive the probability of the test point belonging to the set. The drawback of the above approach was that we assumed that the sample points are distributed about the center of mass in a spherical manner. Were the distribution to be decidedly non-spherical, for instance ellipsoidal, then we would expect the probability of the test point belonging to the set to depend not only on the distance from the center of mass, but also on the direction. In those directions where the ellipsoid has a short axis the test point must be closer, while in those where the axis is long the test point can be further away from the center. Putting this on a mathematical basis, the ellipsoid that best represents the set's probability distribution can be estimated by building the covariance matrix of the samples. The Mahalanobis distance is the distance of the test point from the center of mass divided by the width of the ellipsoid in the direction of the test point.

Normal distributions

For a

normal distribution In statistics, a normal distribution or Gaussian distribution is a type of continuous probability distribution for a real-valued random variable. The general form of its probability density function is : f(x) = \frac e^ The parameter \mu ...

in any number of dimensions, the probability density of an observation

\vec

is uniquely determined by the Mahalanobis distance

d

: :

\,d\vec = \frac \exp \left(-\frac\right) \,d\vec = \frac \exp(-d^2/2) \,d\vec.

Specifically,

d^2

follows the

chi-squared distribution In probability theory and statistics, the chi-squared distribution (also chi-square or \chi^2-distribution) with k degrees of freedom is the distribution of a sum of the squares of k independent standard normal random variables. The chi-squa ...

with

n

degrees of freedom, where

n

is the number of dimensions of the normal distribution. If the number of dimensions is 2, for example, the probability of a particular calculated

d

being less than some threshold

t

1 - e^

. To determine a threshold to achieve a particular probability,

p

, use

t = \sqrt

, for 2 dimensions. For number of dimensions other than 2, the cumulative chi-squared distribution should be consulted. In a normal distribution, the region where the Mahalanobis distance is less than one (i.e. the region inside the ellipsoid at distance one) is exactly the region where the probability distribution is

concave Concave or concavity may refer to: Science and technology * Concave lens * Concave mirror Mathematics * Concave function, the negative of a convex function * Concave polygon, a polygon which is not convex * Concave set * The concavity In ca ...

. Mahalanobis distance is proportional, for a normal distribution, to the square root of the negative

log-likelihood The likelihood function (often simply called the likelihood) represents the probability of random variable realizations conditional on particular values of the statistical parameters. Thus, when evaluated on a given sample, the likelihood functi ...

(after adding a constant so the minimum is at zero).

Other forms of multivariate location and scatter

Mahalanobis-distance-location-and-scatter-methods

The sample mean and covariance matrix can be quite sensitive to outliers, therefore other approaches to calculating the multivariate location and scatter of data are also commonly used when calculating the Mahalanobis distance. The Minimum Covariance Determinant approach estimates multivariate location and scatter from a subset numbering

h

data points that has the smallest variance-covariance matrix determinant. The Minimum Volume Ellipsoid approach is similar to the Minimum Covariance Determinant approach in that it works with a subset of size

h

data points, but the Minimum Volume Ellipsoid estimates multivariate location and scatter from the ellipsoid of minimal volume that encapsulates the

h

data points. Each method varies in its definition of the distribution of the data, and therefore produces different Mahalanobis distances. The Minimum Covariance Determinant and Minimum Volume Ellipsoid approaches are more robust to samples that contain outliers, while the sample mean and covariance matrix tends to be more reliable with small and biased data sets.

Relationship to normal random variables

In general, given a normal (

Gaussian Carl Friedrich Gauss (1777–1855) is the eponym of all of the topics listed below. There are over 100 topics all named after this German mathematician and scientist, all in the fields of mathematics, physics, and astronomy. The English eponymo ...

) random variable

X

with variance

S=1

and mean

\mu = 0

, any other normal random variable

R

(with mean

\mu_1

and variance

S_1

) can be defined in terms of

X

by the equation

R = \mu_1 + \sqrtX.

Conversely, to recover a normalized random variable from any normal random variable, one can typically solve for

X = (R - \mu_1)/\sqrt

. If we square both sides, and take the square-root, we will get an equation for a metric that looks a lot like the Mahalanobis distance:

D = \sqrt = \sqrt = \sqrt.

The resulting magnitude is always non-negative and varies with the distance of the data from the mean, attributes that are convenient when trying to define a model for the data.

Relationship to leverage

Mahalanobis distance is closely related to the leverage statistic,

h

, but has a different scale:

D^2 = (N - 1)(h - \tfrac).

Applications

Mahalanobis distance is widely used in

cluster analysis Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense) to each other than to those in other groups (clusters). It is a main task of ...

and

classification Classification is a process related to categorization, the process in which ideas and objects are recognized, differentiated and understood. Classification is the grouping of related facts into classes. It may also refer to: Business, organizat ...

techniques. It is closely related to

Hotelling's T-square distribution In statistics, particularly in hypothesis testing, the Hotelling's ''T''-squared distribution (''T''2), proposed by Harold Hotelling, is a multivariate probability distribution that is tightly related to the ''F''-distribution and is most notab ...

used for multivariate statistical testing and Fisher's

Linear Discriminant Analysis Linear discriminant analysis (LDA), normal discriminant analysis (NDA), or discriminant function analysis is a generalization of Fisher's linear discriminant, a method used in statistics and other fields, to find a linear combination of features ...

that is used for

supervised classification Supervised learning (SL) is a machine learning paradigm for problems where the available data consists of labelled examples, meaning that each data point contains features (covariates) and an associated label. The goal of supervised learning alg ...

. In order to use the Mahalanobis distance to classify a test point as belonging to one of ''N'' classes, one first estimates the covariance matrix of each class, usually based on samples known to belong to each class. Then, given a test sample, one computes the Mahalanobis distance to each class, and classifies the test point as belonging to that class for which the Mahalanobis distance is minimal. Mahalanobis distance and leverage are often used to detect

outlier In statistics, an outlier is a data point that differs significantly from other observations. An outlier may be due to a variability in the measurement, an indication of novel data, or it may be the result of experimental error; the latter are ...

s, especially in the development of

linear regression In statistics, linear regression is a linear approach for modelling the relationship between a scalar response and one or more explanatory variables (also known as dependent and independent variables). The case of one explanatory variable is call ...

models. A point that has a greater Mahalanobis distance from the rest of the sample population of points is said to have higher leverage since it has a greater influence on the slope or coefficients of the regression equation. Mahalanobis distance is also used to determine multivariate outliers. Regression techniques can be used to determine if a specific case within a sample population is an outlier via the combination of two or more variable scores. Even for normal distributions, a point can be a multivariate outlier even if it is not a univariate outlier for any variable (consider a probability density concentrated along the line

x_1 = x_2

, for example), making Mahalanobis distance a more sensitive measure than checking dimensions individually. Mahalanobis distance has also been used in ecological niche modelling, as the convex elliptical shape of the distances relates well to the concept of the

fundamental niche In ecology, a niche is the match of a species to a specific environmental condition. Three variants of ecological niche are described by It describes how an organism or population responds to the distribution of resources and competitors (for ...

. Another example of usage is in finance, where Mahalanobis distance has been used to compute an indicator called the "turbulence index", which is a statistical measure of financial markets abnormal behaviour. An implementation as a Web API of this indicator is available online.

Software implementations

Many programs and statistics packages, such as R,

Python Python may refer to: Snakes * Pythonidae, a family of nonvenomous snakes found in Africa, Asia, and Australia ** ''Python'' (genus), a genus of Pythonidae found in Africa and Asia * Python (mythology), a mythical serpent Computing * Python (pro ...

, etc., include implementations of Mahalanobis distance.

References

External links

*
Mahalanobis distance tutorial
– interactive online program and spreadsheet computation

– overview of Mahalanobis distance, including MATLAB code
What is Mahalanobis distance?
– intuitive, illustrated explanation, from Rick Wicklin on blogs.sas.com {{DEFAULTSORT:Mahalanobis Distance Statistical distance Multivariate statistics Distance