Energy distance

Energy distance is a statistical distance between probability distributions. If X and Y are independent random vectors in ''R''^d with cumulative distribution functions (cdf) F and G respectively, then the energy distance between the distributions F and G is defined to be the square root of

: D^2(F, G) = 2\operatorname E\|X - Y\| - \operatorname E\|X - X'\| - \operatorname E\|Y - Y'\| \geq 0,

where (X, X', Y, Y') are independent, the cdf of X and X' is F, the cdf of Y and Y' is G, \operatorname E is the expected value, and \|\cdot\| denotes the length of a vector. Energy distance satisfies all axioms of a metric, and thus characterizes the equality of distributions: D(F, G) = 0 if and only if F = G. Energy distance for statistical applications was introduced in 1985 by Gábor J. Székely, who proved that for real-valued random variables D^2(F, G) is exactly twice Harald Cramér's distance:

: \int_{-\infty}^\infty (F(x) - G(x))^2 \, dx.

For a simple proof of this equivalence, see Székely (2002). In higher dimensions, however, the two distances are different because the energy distance is rotation invariant while Cramér's distance is not. (Notice that Cramér's distance is not the same as the distribution-free Cramér–von Mises criterion.)
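
The one-dimensional identity D^2(F, G) = 2\int (F - G)^2 \, dx can be checked numerically. The sketch below (assuming NumPy and SciPy; the choice of two unit-variance normal distributions and the sample size are arbitrary illustration values, and this is not code from the ''energy'' package) compares a Monte Carlo estimate of D^2 with the numerically integrated Cramér distance.

 import numpy as np
 from scipy import stats
 from scipy.integrate import quad
 
 # Two hypothetical one-dimensional distributions (arbitrary illustration choice).
 F = stats.norm(0.0, 1.0)
 G = stats.norm(1.0, 1.0)
 
 # Cramer's distance: the integral of (F(x) - G(x))^2 over the real line.
 cramer, _ = quad(lambda t: (F.cdf(t) - G.cdf(t)) ** 2, -np.inf, np.inf)
 
 # Monte Carlo estimate of D^2 = 2 E|X - Y| - E|X - X'| - E|Y - Y'|.
 rng = np.random.default_rng(1)
 n = 200_000
 x, xp = rng.normal(0.0, 1.0, size=(2, n))
 y, yp = rng.normal(1.0, 1.0, size=(2, n))
 d_sq = 2 * np.abs(x - y).mean() - np.abs(x - xp).mean() - np.abs(y - yp).mean()
 
 print(d_sq, 2 * cramer)  # the two values agree up to Monte Carlo error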


Generalization to metric spaces

One can generalize the notion of energy distance to probability distributions on metric spaces. Let (M, d) be a metric space with its Borel sigma algebra \mathcal{B}(M). Let \mathcal{P}(M) denote the collection of all probability measures on the measurable space (M, \mathcal{B}(M)). If μ and ν are probability measures in \mathcal{P}(M), then the energy distance D of μ and ν can be defined as the square root of

: D^2(\mu, \nu) = 2\operatorname E \, d(X, Y) - \operatorname E \, d(X, X') - \operatorname E \, d(Y, Y'),

where X and X' are independent with distribution μ, and Y and Y' are independent with distribution ν. This is not necessarily non-negative, however. The condition that D^2(\mu, \nu) \geq 0 for all μ and ν is expressed by saying that (M, d) has negative type. Negative type is not sufficient for D to be a metric; D is a metric, so that D(\mu, \nu) = 0 if and only if μ = ν, exactly when (M, d) has strong negative type, i.e. when d is a strongly negative definite kernel (Klebanov, L. B. (2005) ''N-distances and their Applications'', Karolinum Press, Charles University, Prague). In this situation, the energy distance is zero if and only if X and Y are identically distributed. An example of a metric of negative type but not of strong negative type is the plane with the taxicab metric. All Euclidean spaces and even separable Hilbert spaces have strong negative type. In the literature on kernel methods for machine learning, these generalized notions of energy distance are studied under the name of maximum mean discrepancy (MMD). The equivalence of distance-based and kernel methods for hypothesis testing has been covered by several authors.
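
The connection to maximum mean discrepancy can also be checked numerically. The sketch below (a minimal NumPy illustration, not an implementation from any particular library) computes the V-statistic form of the squared energy distance between two samples and the squared MMD with the distance-induced kernel k(a, b) = (d(a, z) + d(b, z) - d(a, b))/2 for an arbitrary anchor point z; for the Euclidean distance these satisfy D^2 = 2\,\mathrm{MMD}^2. The sample sizes, dimension, and anchor point are arbitrary choices for illustration.

 import numpy as np
 
 rng = np.random.default_rng(0)
 x = rng.normal(size=(100, 3))              # sample from mu
 y = rng.normal(loc=0.5, size=(120, 3))     # sample from nu
 
 def dist(a, b):
     """Matrix of Euclidean distances between rows of a and rows of b."""
     return np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
 
 # Squared energy distance between the two empirical measures (V-statistic form):
 # D^2 = 2 E d(X, Y) - E d(X, X') - E d(Y, Y').
 energy_sq = 2 * dist(x, y).mean() - dist(x, x).mean() - dist(y, y).mean()
 
 # Squared MMD with the distance-induced kernel and anchor point z = 0.
 z = np.zeros((1, 3))
 def kernel(a, b):
     return (dist(a, z) + dist(b, z).T - dist(a, b)) / 2
 
 mmd_sq = kernel(x, x).mean() + kernel(y, y).mean() - 2 * kernel(x, y).mean()
 print(energy_sq, 2 * mmd_sq)  # equal up to floating-point rounding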


Energy statistics

A related statistical concept, the notion of E-statistic or energy statistic, was introduced by Gábor J. Székely in the 1980s when he was giving colloquium lectures in Budapest, Hungary and at MIT, Yale, and Columbia. The concept is based on the notion of Newton's potential energy (Székely, G. J. (2002) E-statistics: The Energy of Statistical Samples, Technical Report BGSU No 02-16). The idea is to consider statistical observations as heavenly bodies governed by a statistical potential energy, which is zero only when an underlying statistical null hypothesis is true. Energy statistics are functions of distances between statistical observations.

Energy distance and E-statistics were considered as N-distances and N-statistics in Zinger, A. A., Kakosyan, A. V., and Klebanov, L. B., "Characterization of distributions by means of mean values of some statistics in connection with some probability metrics", Stability Problems for Stochastic Models, Moscow, VNIISI, 1989, 47–55 (in Russian; English translation: "A characterization of distributions by mean values of statistics and certain probabilistic metrics", Journal of Soviet Mathematics, 1992). The same paper gave a definition of the strongly negative definite kernel and provided the generalization to metric spaces discussed above. Klebanov's book on N-distances, cited above, gives these results and their applications to statistical testing, as well as some applications to recovering a measure from its potential.


Testing for equal distributions

Consider the null hypothesis that two random variables, ''X'' and ''Y'', have the same probability distributions: \mu = \nu. For statistical samples from ''X'' and ''Y'':

: x_1, \dots, x_n and y_1, \dots, y_m,

the following arithmetic averages of distances are computed between the X and the Y samples:

: A := \frac{1}{nm} \sum_{i=1}^n \sum_{j=1}^m \|x_i - y_j\|, \quad B := \frac{1}{n^2} \sum_{i=1}^n \sum_{j=1}^n \|x_i - x_j\|, \quad C := \frac{1}{m^2} \sum_{i=1}^m \sum_{j=1}^m \|y_i - y_j\|.

The E-statistic of the underlying null hypothesis is defined as follows:

: E_{n,m}(X, Y) := 2A - B - C.

One can prove that E_{n,m}(X, Y) \geq 0 and that the corresponding population value is zero if and only if ''X'' and ''Y'' have the same distribution (\mu = \nu). Under this null hypothesis the test statistic

: T = \frac{nm}{n+m} E_{n,m}(X, Y)

converges in distribution to a quadratic form of independent standard normal random variables. Under the alternative hypothesis ''T'' tends to infinity. This makes it possible to construct a consistent statistical test, the energy test for equal distributions. The E-coefficient of inhomogeneity can also be introduced. This is always between 0 and 1 and is defined as

: H = \frac{D^2(F, G)}{2\operatorname E\|X - Y\|} = \frac{2\operatorname E\|X - Y\| - \operatorname E\|X - X'\| - \operatorname E\|Y - Y'\|}{2\operatorname E\|X - Y\|},

where \operatorname E denotes the expected value. ''H'' = 0 exactly when ''X'' and ''Y'' have the same distribution.
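
A minimal NumPy sketch of the two-sample energy test is shown below (this is not the implementation in the R ''energy'' package; the function names are illustrative). Because the asymptotic null distribution of T is a quadratic form that depends on the unknown underlying distribution, the p-value is approximated here by a permutation test on the pooled sample.

 import numpy as np
 
 def mean_dist(a, b):
     """Average Euclidean distance between all rows of a and all rows of b."""
     return np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1).mean()
 
 def energy_statistics(x, y):
     """Return the E-statistic 2A - B - C, the test statistic T, and the
     plug-in estimate of the E-coefficient of inhomogeneity H."""
     n, m = len(x), len(y)
     A, B, C = mean_dist(x, y), mean_dist(x, x), mean_dist(y, y)
     e = 2 * A - B - C
     t = n * m / (n + m) * e
     h = e / (2 * A)
     return e, t, h
 
 def energy_test_pvalue(x, y, n_perm=999, seed=0):
     """Permutation p-value for the energy test of equal distributions."""
     rng = np.random.default_rng(seed)
     pooled = np.vstack([x, y])
     n = len(x)
     t_obs = energy_statistics(x, y)[1]
     exceed = 0
     for _ in range(n_perm):
         idx = rng.permutation(len(pooled))
         exceed += energy_statistics(pooled[idx[:n]], pooled[idx[n:]])[1] >= t_obs
     return (exceed + 1) / (n_perm + 1)

Rejecting the null hypothesis for large T (small permutation p-values) yields a consistent test, since T tends to infinity under the alternative.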


Goodness-of-fit

A multivariate goodness-of-fit measure is defined for distributions in arbitrary dimension (not restricted by sample size). The energy goodness-of-fit statistic is

: Q_n = n \left( \frac{2}{n} \sum_{i=1}^n \operatorname E \|x_i - X\|^\alpha - \operatorname E \|X - X'\|^\alpha - \frac{1}{n^2} \sum_{i=1}^n \sum_{j=1}^n \|x_i - x_j\|^\alpha \right),

where X and X' are independent and identically distributed according to the hypothesized distribution, and \alpha \in (0, 2). The only required condition is that X has a finite \alpha-moment under the null hypothesis. Under the null hypothesis \operatorname E Q_n = \operatorname E \|X - X'\|^\alpha, and the asymptotic distribution of Q_n is a quadratic form of centered Gaussian random variables. Under an alternative hypothesis, Q_n tends to infinity stochastically, and thus determines a statistically consistent test. For most applications the exponent \alpha = 1 (Euclidean distance) can be applied. The important special case of testing multivariate normality is implemented in the ''energy'' package for R. Tests have also been developed for heavy-tailed distributions such as the Pareto (power law) and stable distributions, by applying exponents \alpha \in (0, 1).
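
The sketch below, assuming NumPy, computes Q_n with the two expectations approximated by Monte Carlo draws from the hypothesized distribution; the ''energy'' package for R uses closed-form expressions for cases such as the multivariate normal, so this is only an illustrative approximation, and the `sampler` argument and the normality example are hypothetical.

 import numpy as np
 
 def energy_gof_stat(x, sampler, alpha=1.0, n_mc=20_000, seed=0):
     """Energy goodness-of-fit statistic Q_n, with E||x_i - X||^alpha and
     E||X - X'||^alpha approximated by Monte Carlo draws from `sampler`,
     a callable (m, rng) -> array of shape (m, d)."""
     rng = np.random.default_rng(seed)
     n = len(x)
     z, zp = sampler(n_mc, rng), sampler(n_mc, rng)
     # (2/n) * sum_i E||x_i - X||^alpha, estimated over the Monte Carlo draws
     term1 = 2 * (np.linalg.norm(x[:, None, :] - z[None, :, :], axis=-1) ** alpha).mean()
     # E||X - X'||^alpha
     term2 = (np.linalg.norm(z - zp, axis=-1) ** alpha).mean()
     # (1/n^2) * sum_{i,j} ||x_i - x_j||^alpha
     term3 = (np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1) ** alpha).mean()
     return n * (term1 - term2 - term3)
 
 # Hypothetical usage: testing whether a 3-dimensional sample is standard normal.
 rng = np.random.default_rng(1)
 x = rng.normal(size=(100, 3))
 q = energy_gof_stat(x, lambda m, g: g.normal(size=(m, 3)))

Critical values for Q_n are usually obtained by a parametric bootstrap: repeatedly draw samples of size n from the hypothesized distribution, recompute the statistic, and compare the observed value with the resulting reference distribution.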


Applications

Applications include:
* Hierarchical clustering (a generalization of Ward's method)
* Testing multivariate normality
* Testing the multi-sample hypothesis of equal distributions
* Change point detection
* Multivariate independence:
** distance correlation,
** Brownian covariance.
* Scoring rules: Gneiting and Raftery apply energy distance to develop a new and very general type of proper scoring rule for probabilistic predictions, the energy score.
* Robust statistics
* Scenario reduction
* Gene selection
* Microarray data analysis
* Material structure analysis
* Morphometric and chemometric data (E. Vaiciukynas, A. Verikas, A. Gelzinis, M. Bacauskiene, and I. Olenina (2015) Exploiting statistical energy test for comparison of multiple groups in morphometric and chemometric data, Chemometrics and Intelligent Laboratory Systems, 146, 10–23).

Applications of energy statistics are implemented in the open source ''energy'' package for R.

