Distance correlation

In statistics and in probability theory, distance correlation or distance covariance is a measure of dependence between two paired random vectors of arbitrary, not necessarily equal, dimension. The population distance correlation coefficient is zero if and only if the random vectors are independent. Thus, distance correlation measures both linear and nonlinear association between two random variables or random vectors. This is in contrast to Pearson's correlation, which can only detect linear association between two random variables. Distance correlation can be used to perform a statistical test of dependence with a permutation test. One first computes the distance correlation (involving the re-centering of Euclidean distance matrices) between two random vectors, and then compares this value to the distance correlations of many shuffles of the data.


Background

The classical measure of dependence, the Pearson correlation coefficient, is mainly sensitive to a linear relationship between two variables. Distance correlation was introduced in 2005 by Gábor J. Székely in several lectures to address this deficiency of Pearson's correlation, namely that it can easily be zero for dependent variables. Correlation = 0 (uncorrelatedness) does not imply independence, while distance correlation = 0 does imply independence. The first results on distance correlation were published in 2007 and 2009. It was proved that distance covariance is the same as the Brownian covariance. These measures are examples of energy distances.

The distance correlation is derived from a number of other quantities that are used in its specification, specifically: distance variance, distance standard deviation, and distance covariance. These quantities take the same roles as the ordinary moments with corresponding names in the specification of the Pearson product-moment correlation coefficient.


Definitions


Distance covariance

Let us start with the definition of the sample distance covariance. Let (''X''_''k'', ''Y''_''k''), ''k'' = 1, 2, ..., ''n'' be a statistical sample from a pair of real-valued or vector-valued random variables (''X'', ''Y''). First, compute the ''n'' × ''n'' distance matrices (''a''_{''j'',''k''}) and (''b''_{''j'',''k''}) containing all pairwise distances

: \begin{align} a_{j,k} &= \|X_j - X_k\|, \qquad j, k = 1, 2, \ldots, n, \\ b_{j,k} &= \|Y_j - Y_k\|, \qquad j, k = 1, 2, \ldots, n, \end{align}

where \|\cdot\| denotes Euclidean norm. Then take all doubly centered distances

: A_{j,k} := a_{j,k} - \overline{a}_{j\cdot} - \overline{a}_{\cdot k} + \overline{a}_{\cdot\cdot}, \qquad B_{j,k} := b_{j,k} - \overline{b}_{j\cdot} - \overline{b}_{\cdot k} + \overline{b}_{\cdot\cdot},

where \overline{a}_{j\cdot} is the ''j''-th row mean, \overline{a}_{\cdot k} is the ''k''-th column mean, and \overline{a}_{\cdot\cdot} is the grand mean of the distance matrix of the ''X'' sample. The notation is similar for the ''b'' values. (In the matrices of centered distances (''A''_{''j'',''k''}) and (''B''_{''j'',''k''}) all rows and all columns sum to zero.) The squared sample distance covariance (a scalar) is simply the arithmetic average of the products ''A''_{''j'',''k''} ''B''_{''j'',''k''}:

: \operatorname{dCov}^2_n(X,Y) := \frac{1}{n^2} \sum_{j=1}^n \sum_{k=1}^n A_{j,k}\,B_{j,k}.

The statistic ''T''_''n'' = ''n'' dCov^2_''n''(''X'', ''Y'') determines a consistent multivariate test of independence of random vectors in arbitrary dimensions. For an implementation see the ''dcov.test'' function in the ''energy'' package for R.

The population value of distance covariance can be defined along the same lines. Let ''X'' be a random variable that takes values in a ''p''-dimensional Euclidean space with probability distribution \mu and let ''Y'' be a random variable that takes values in a ''q''-dimensional Euclidean space with probability distribution \nu, and suppose that ''X'' and ''Y'' have finite expectations. Write

: a_\mu(x) := \operatorname{E}\|X - x\|, \qquad D(\mu) := \operatorname{E}[a_\mu(X)], \qquad d_\mu(x, x') := \|x - x'\| - a_\mu(x) - a_\mu(x') + D(\mu).

Finally, define the population value of squared distance covariance of ''X'' and ''Y'' as

: \operatorname{dCov}^2(X, Y) := \operatorname{E}\big[d_\mu(X, X')\,d_\nu(Y, Y')\big].

One can show that this is equivalent to the following definition:

: \begin{align} \operatorname{dCov}^2(X,Y) := {} & \operatorname{E}\|X-X'\|\,\|Y-Y'\| + \operatorname{E}\|X-X'\|\,\operatorname{E}\|Y-Y'\| \\ &\qquad - \operatorname{E}\|X-X'\|\,\|Y-Y''\| - \operatorname{E}\|X-X''\|\,\|Y-Y'\| \\ = {} & \operatorname{E}\|X-X'\|\,\|Y-Y'\| + \operatorname{E}\|X-X'\|\,\operatorname{E}\|Y-Y'\| \\ &\qquad - 2\operatorname{E}\|X-X'\|\,\|Y-Y''\|, \end{align}

where E denotes expected value, and (X, Y), (X', Y'), and (X'', Y'') are independent and identically distributed; the primed pairs (X', Y') and (X'', Y'') denote iid copies of the variables X and Y.

Distance covariance can be expressed in terms of the classical Pearson's covariance, cov, as follows:

: \operatorname{dCov}^2(X,Y) = \operatorname{cov}(\|X-X'\|, \|Y-Y'\|) - 2\operatorname{cov}(\|X-X'\|, \|Y-Y''\|).

This identity shows that the distance covariance is not the same as the covariance of distances, \operatorname{cov}(\|X-X'\|, \|Y-Y'\|), which can be zero even if ''X'' and ''Y'' are not independent.

Alternatively, the distance covariance can be defined as the weighted ''L''_2 norm of the distance between the joint characteristic function of the random variables and the product of their marginal characteristic functions:

: \operatorname{dCov}^2(X,Y) = \frac{1}{c_p c_q} \int_{\mathbb{R}^{p+q}} \frac{\left|\varphi_{X,Y}(s,t) - \varphi_X(s)\,\varphi_Y(t)\right|^2}{|s|_p^{1+p}\,|t|_q^{1+q}}\,dt\,ds,

where \varphi_{X,Y}(s,t), \varphi_X(s), and \varphi_Y(t) are the characteristic functions of (''X'', ''Y''), ''X'', and ''Y'', respectively, ''p'', ''q'' denote the Euclidean dimensions of ''X'' and ''Y'', and thus of ''s'' and ''t'', and ''c''_''p'', ''c''_''q'' are constants. The weight function (|s|_p^{1+p}\,|t|_q^{1+q})^{-1} is chosen to produce a scale equivariant and rotation invariant measure that does not go to zero for dependent variables. One interpretation of the characteristic function definition is that the variables e^{isX} and e^{itY} are cyclic representations of ''X'' and ''Y'' with different periods given by ''s'' and ''t'', and the expression in the numerator of the characteristic function definition of distance covariance is simply the classical covariance of e^{isX} and e^{itY}. The characteristic function definition clearly shows that dCov^2(''X'', ''Y'') = 0 if and only if ''X'' and ''Y'' are independent.
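The double-centering recipe above translates directly into a few lines of array code. The following is a minimal sketch in Python with NumPy, not the ''energy'' package implementation; the function names (dcov2_n and the helpers) are illustrative choices.

import numpy as np

def _pairwise_distances(Z):
    # Euclidean distance matrix of a sample of n points (rows of Z).
    Z = np.asarray(Z, dtype=float)
    if Z.ndim == 1:
        Z = Z[:, None]          # treat a 1-D sample as n points in R^1
    diff = Z[:, None, :] - Z[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def _double_center(D):
    # A_{j,k} = a_{j,k} - row mean_j - column mean_k + grand mean.
    return D - D.mean(axis=1, keepdims=True) - D.mean(axis=0, keepdims=True) + D.mean()

def dcov2_n(X, Y):
    # Squared sample distance covariance: average of the products A_{j,k} B_{j,k}.
    A = _double_center(_pairwise_distances(X))
    B = _double_center(_pairwise_distances(Y))
    return (A * B).mean()

# Example (arbitrary data): 30 paired observations of a 3-dimensional X and a 2-dimensional Y.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 3))
Y = X[:, :2] + 0.1 * rng.normal(size=(30, 2))   # Y depends on X
print(dcov2_n(X, Y))

Note that the two samples may have different dimensions; only the number of observations must match.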


Distance variance and distance standard deviation

The ''distance variance'' is a special case of distance covariance when the two variables are identical. The population value of distance variance is the square root of

: \operatorname{dVar}^2(X) := \operatorname{E}\|X-X'\|^2 + \operatorname{E}^2\|X-X'\| - 2\operatorname{E}\|X-X'\|\,\|X-X''\|,

where X, X', and X'' are independent and identically distributed random variables, \operatorname{E} denotes the expected value, and f^2(\cdot) = (f(\cdot))^2 for a function f(\cdot), e.g., \operatorname{E}^2\|\cdot\| = (\operatorname{E}\|\cdot\|)^2.

The ''sample distance variance'' is the square root of

: \operatorname{dVar}^2_n(X) := \operatorname{dCov}^2_n(X,X) = \tfrac{1}{n^2}\sum_{j,k} A_{j,k}^2,

which is a relative of Corrado Gini's mean difference introduced in 1912 (but Gini did not work with centered distances). The ''distance standard deviation'' is the square root of the ''distance variance''.


Distance correlation

The ''distance correlation'' of two random variables is obtained by dividing their ''distance covariance'' by the product of their ''distance standard deviations''. The distance correlation is the square root of

: \operatorname{dCor}^2(X,Y) = \frac{\operatorname{dCov}^2(X,Y)}{\sqrt{\operatorname{dVar}^2(X)\,\operatorname{dVar}^2(Y)}},

and the ''sample distance correlation'' is defined by substituting the sample distance covariance and distance variances for the population coefficients above. For easy computation of sample distance correlation see the ''dcor'' function in the ''energy'' package for R.
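A minimal NumPy sketch of the sample distance correlation, together with the permutation test of dependence described in the introduction, might look as follows. The function names and the choice of 999 permutations are illustrative assumptions; this is not the ''energy'' package's implementation.

import numpy as np

def _centered_distances(Z):
    # Doubly centered Euclidean distance matrix of a sample of n observations.
    Z = np.asarray(Z, dtype=float)
    if Z.ndim == 1:
        Z = Z[:, None]
    D = np.sqrt(((Z[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1))
    return D - D.mean(axis=1, keepdims=True) - D.mean(axis=0, keepdims=True) + D.mean()

def dcor_n(X, Y):
    # Sample distance correlation: sqrt(dCov_n^2 / sqrt(dVar_n^2(X) * dVar_n^2(Y))).
    A, B = _centered_distances(X), _centered_distances(Y)
    dcov2 = (A * B).mean()
    denom = np.sqrt((A * A).mean() * (B * B).mean())
    return 0.0 if denom == 0 else np.sqrt(max(dcov2, 0.0) / denom)

def permutation_test(X, Y, n_perm=999, seed=0):
    # Permutation p-value: compare the observed dCor with dCor under random re-pairings of Y.
    rng = np.random.default_rng(seed)
    observed = dcor_n(X, Y)
    Y = np.asarray(Y, dtype=float)
    count = sum(dcor_n(X, Y[rng.permutation(len(Y))]) >= observed for _ in range(n_perm))
    return observed, (count + 1) / (n_perm + 1)

Shuffling one sample relative to the other breaks any pairing between X and Y, so the shuffled distance correlations approximate the null distribution under independence.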


Properties


Distance correlation


Distance covariance

This last property is the most important effect of working with centered distances. The statistic \operatorname{dCov}^2_n(X,Y) is a biased estimator of \operatorname{dCov}^2(X,Y). Under independence of X and Y,

: \operatorname{E}\left[\operatorname{dCov}^2_n(X,Y)\right] = \frac{n-1}{n^2}\,\operatorname{E}\|X-X'\|\,\operatorname{E}\|Y-Y'\|.

An unbiased estimator of \operatorname{dCov}^2(X,Y) is given by Székely and Rizzo.
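The bias under independence can be made concrete with a small simulation. The sketch below is illustrative only (the sample size, distributions, and replication count are arbitrary choices): it repeatedly draws independent X and Y, averages the squared sample distance covariance, and compares the result with the (n−1)/n² · E‖X−X′‖ · E‖Y−Y′‖ value above, even though the population distance covariance is zero.

import numpy as np

def dcov2_n(x, y):
    # Squared sample distance covariance via double centering (as defined above).
    def centered(z):
        z = np.asarray(z, dtype=float)
        if z.ndim == 1:
            z = z[:, None]
        d = np.sqrt(((z[:, None, :] - z[None, :, :]) ** 2).sum(axis=-1))
        return d - d.mean(axis=1, keepdims=True) - d.mean(axis=0, keepdims=True) + d.mean()
    return (centered(x) * centered(y)).mean()

rng = np.random.default_rng(1)
n, reps = 20, 2000
estimates = [dcov2_n(rng.normal(size=n), rng.normal(size=n)) for _ in range(reps)]

# For two independent standard normals, E|X - X'| = 2 / sqrt(pi).
expected_dist = 2.0 / np.sqrt(np.pi)
predicted_mean = (n - 1) / n**2 * expected_dist * expected_dist

print(np.mean(estimates), predicted_mean)   # both should be close (about 0.06)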


Distance variance

Equality holds in (iv) if and only if one of the random variables ''X'' or ''Y'' is a constant.


Generalization

Distance covariance can be generalized to include powers of Euclidean distance. Define

: \begin{align} \operatorname{dCov}^2(X, Y; \alpha) := {} & \operatorname{E}\|X-X'\|^\alpha\,\|Y-Y'\|^\alpha + \operatorname{E}\|X-X'\|^\alpha\,\operatorname{E}\|Y-Y'\|^\alpha \\ &\qquad - 2\operatorname{E}\|X-X'\|^\alpha\,\|Y-Y''\|^\alpha. \end{align}

Then for every 0<\alpha<2, X and Y are independent if and only if \operatorname{dCov}^2(X, Y; \alpha) = 0. It is important to note that this characterization does not hold for exponent \alpha=2; in this case for bivariate (X, Y), \operatorname{dCor}(X, Y; \alpha=2) is a deterministic function of the Pearson correlation. If a_{j,k} and b_{j,k} are \alpha powers of the corresponding distances, 0<\alpha\leq 2, then the \alpha sample distance covariance can be defined as the nonnegative number for which

: \operatorname{dCov}^2_n(X, Y; \alpha) := \frac{1}{n^2}\sum_{j,k} A_{j,k}\,B_{j,k}.

One can extend \operatorname{dCov} to metric-space-valued random variables X and Y: if X has law \mu in a metric space with metric d, then define a_\mu(x) := \operatorname{E}[d(X, x)], D(\mu) := \operatorname{E}[a_\mu(X)], and (provided a_\mu is finite, i.e., X has finite first moment), d_\mu(x, x') := d(x, x') - a_\mu(x) - a_\mu(x') + D(\mu). Then if Y has law \nu (in a possibly different metric space with finite first moment), define

: \operatorname{dCov}^2(X, Y) := \operatorname{E}\big[d_\mu(X, X')\,d_\nu(Y, Y')\big].

This is non-negative for all such X, Y iff both metric spaces have negative type. Here, a metric space (M, d) has negative type if (M, d^{1/2}) is isometric to a subset of a Hilbert space. If both metric spaces have strong negative type, then \operatorname{dCov}^2(X, Y) = 0 iff X and Y are independent.
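Because the double centering only sees distance matrices, the α-exponent (and, more generally, the metric-space) version of the sample statistic differs from the Euclidean sketch above only in how the pairwise distances are computed. A minimal illustrative NumPy sketch; the function names and the default alpha=1.0 are assumptions made here for illustration:

import numpy as np

def dcov2_n_from_distances(DX, DY):
    # Squared sample distance covariance from precomputed pairwise distance
    # matrices (Euclidean, alpha powers of Euclidean, or another metric).
    def centered(D):
        D = np.asarray(D, dtype=float)
        return D - D.mean(axis=1, keepdims=True) - D.mean(axis=0, keepdims=True) + D.mean()
    return (centered(DX) * centered(DY)).mean()

def alpha_dcov2_n(X, Y, alpha=1.0):
    # alpha sample distance covariance, 0 < alpha <= 2.
    def dist(Z):
        Z = np.asarray(Z, dtype=float)
        if Z.ndim == 1:
            Z = Z[:, None]
        return np.sqrt(((Z[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1)) ** alpha
    return dcov2_n_from_distances(dist(X), dist(Y))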


Alternative definition of distance covariance

The original distance covariance has been defined as the square root of \operatorname{dCov}^2(X,Y), rather than the squared coefficient itself. \operatorname{dCov}(X,Y) has the property that it is the energy distance between the joint distribution of X, Y and the product of its marginals. Under this definition, however, the distance variance, rather than the distance standard deviation, is measured in the same units as the X distances. Alternately, one could define ''distance covariance'' to be the square of the energy distance: \operatorname{dCov}^2(X,Y). In this case, the distance standard deviation of X is measured in the same units as X distance, and there exists an unbiased estimator for the population distance covariance. Under these alternate definitions, the distance correlation is also defined as the square \operatorname{dCor}^2(X,Y), rather than the square root.


Alternative formulation: Brownian covariance

Brownian covariance is motivated by generalization of the notion of covariance to stochastic processes. The square of the covariance of random variables X and Y can be written in the following form:

: \operatorname{cov}(X,Y)^2 = \operatorname{E}\left[\big(X - \operatorname{E}(X)\big)\big(X' - \operatorname{E}(X')\big)\big(Y - \operatorname{E}(Y)\big)\big(Y' - \operatorname{E}(Y')\big)\right],

where E denotes the expected value and the prime denotes independent and identically distributed copies. We need the following generalization of this formula. If U(s), V(t) are arbitrary random processes defined for all real s and t, then define the U-centered version of X by

: X_U := U(X) - \operatorname{E}_X\left[U(X) \mid \{U(t)\}\right]

whenever the subtracted conditional expected value exists, and denote by Y_V the V-centered version of Y. The (U,V) covariance of (X,Y) is defined as the nonnegative number whose square is

: \operatorname{Cov}_{U,V}^2(X,Y) := \operatorname{E}\left[X_U X'_U Y_V Y'_V\right]

whenever the right-hand side is nonnegative and finite. The most important example is when U and V are two-sided independent Brownian motions/Wiener processes with expectation zero and covariance |s| + |t| - |s-t| = 2\min(s,t) (for nonnegative s, t only). (This is twice the covariance of the standard Wiener process; here the factor 2 simplifies the computations.) In this case the (''U'',''V'') covariance is called Brownian covariance and is denoted by

: \operatorname{Cov}_W(X,Y).

There is a surprising coincidence: the Brownian covariance is the same as the distance covariance:

: \operatorname{Cov}_W(X, Y) = \operatorname{dCov}(X, Y),

and thus Brownian correlation is the same as distance correlation. On the other hand, if we replace the Brownian motion with the deterministic identity function ''id'', then Cov_id(''X'',''Y'') is simply the absolute value of the classical Pearson covariance,

: \operatorname{Cov}_{\mathrm{id}}(X,Y) = \left\vert\operatorname{cov}(X,Y)\right\vert.
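The opening identity can be checked numerically. The following minimal sketch (sample size, coefficients, and distributions are arbitrary illustration choices) estimates both sides of cov(X,Y)² = E[(X − E X)(X′ − E X′)(Y − E Y)(Y′ − E Y′)] from a simulated correlated pair and an independent copy of it:

import numpy as np

rng = np.random.default_rng(2)
n = 200_000

# A correlated pair (X, Y) and an independent copy (X', Y') of the same pair.
x = rng.normal(size=n)
y = 0.6 * x + 0.8 * rng.normal(size=n)
x2 = rng.normal(size=n)
y2 = 0.6 * x2 + 0.8 * rng.normal(size=n)

lhs = np.cov(x, y)[0, 1] ** 2
rhs = np.mean((x - x.mean()) * (x2 - x2.mean()) * (y - y.mean()) * (y2 - y2.mean()))
print(lhs, rhs)   # both close to 0.6**2 = 0.36 up to Monte Carlo error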


Related metrics

Other correlational metrics, including kernel-based correlational metrics (such as the Hilbert-Schmidt Independence Criterion or HSIC), can also detect linear and nonlinear interactions. Both distance correlation and kernel-based metrics can be used in methods such as canonical correlation analysis and independent component analysis to yield stronger statistical power.


See also

* RV coefficient
* For a related third-order statistic, see Distance skewness.




External links


E-statistics (energy statistics)