In statistics, probability theory, and information theory, a statistical distance quantifies the distance between two statistical objects, which can be two random variables, or two probability distributions or samples, or the distance can be between an individual sample point and a population or a wider sample of points.
A distance between populations can be interpreted as measuring the distance between two probability distributions, and hence such distances are essentially measures of distances between probability measures. Where statistical distance measures relate to the differences between random variables, these may have statistical dependence,[Dodge, Y. (2003)—entry for distance] and hence these distances are not directly related to measures of distances between probability measures. Again, a measure of distance between random variables may relate to the extent of dependence between them, rather than to their individual values.
Statistical distance measures are not typically metrics, and they need not be symmetric. Some types of distance measures, which generalize ''squared'' distance, are referred to as (statistical) ''divergences''.
Terminology
Many terms are used to refer to various notions of distance; these are often confusingly similar, and may be used inconsistently between authors and over time, either loosely or with precise technical meaning. In addition to "distance", similar terms include deviance, deviation, discrepancy, discrimination, and divergence, as well as others such as contrast function and metric. Terms from information theory include cross entropy, relative entropy, discrimination information, and information gain.
Distances as metrics
Metrics
A metric on a set ''X'' is a function (called the ''distance function'' or simply distance) ''d'' : ''X'' × ''X'' → R<sup>+</sup> (where R<sup>+</sup> is the set of non-negative real numbers). For all ''x'', ''y'', ''z'' in ''X'', this function is required to satisfy the following conditions (a numerical spot-check of these axioms appears after the list):
# ''d''(''x'', ''y'') ≥ 0 (''non-negativity'')
# ''d''(''x'', ''y'') = 0 if and only if ''x'' = ''y'' (''identity of indiscernibles''; note that conditions 1 and 2 together produce ''positive definiteness'')
# ''d''(''x'', ''y'') = ''d''(''y'', ''x'') (''symmetry'')
# ''d''(''x'', ''z'') ≤ ''d''(''x'', ''y'') + ''d''(''y'', ''z'') (''subadditivity'' / ''triangle inequality'')
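The following is a minimal Python sketch (not from the source; the helper name and tolerance are illustrative) that numerically spot-checks the four axioms for a candidate distance function, here the absolute difference |''x'' − ''y''| on the real line. A check on sampled points can refute, but never prove, that a function is a metric, and it can only test the "if" direction of condition 2.
<syntaxhighlight lang="python">
import numpy as np

def check_metric_axioms(d, points, trials=1000, seed=0, tol=1e-12):
    """Spot-check the four metric axioms for d on randomly drawn triples."""
    rng = np.random.default_rng(seed)
    for _ in range(trials):
        x, y, z = rng.choice(points, size=3)
        assert d(x, y) >= 0                           # 1. non-negativity
        assert abs(d(x, x)) <= tol                    # 2. d(x, x) = 0
        assert abs(d(x, y) - d(y, x)) <= tol          # 3. symmetry
        assert d(x, z) <= d(x, y) + d(y, z) + tol     # 4. triangle inequality
    return True

# Example: the absolute difference on the real line is a metric.
points = np.random.default_rng(1).normal(size=100)
print(check_metric_axioms(lambda a, b: abs(a - b), points))  # True
</syntaxhighlight>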
Generalized metrics
Many statistical distances are not metrics, because they lack one or more properties of proper metrics. For example, pseudometrics violate property (2), identity of indiscernibles; quasimetrics violate property (3), symmetry; and semimetrics violate property (4), the triangle inequality. Statistical distances that satisfy (1) and (2) are referred to as divergences.
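For instance, the Kullback–Leibler divergence listed below satisfies properties (1) and (2) but not symmetry. A minimal Python sketch (assuming strictly positive finite distributions; the function name is illustrative):
<syntaxhighlight lang="python">
import numpy as np

def kl_divergence(p, q):
    """Kullback-Leibler divergence D(P || Q) for strictly positive
    finite distributions given as probability vectors."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.sum(p * np.log(p / q)))

p = [0.9, 0.05, 0.05]
q = [1/3, 1/3, 1/3]
print(kl_divergence(p, q))  # the two orderings give different values,
print(kl_divergence(q, p))  # so property (3), symmetry, fails
</syntaxhighlight>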
Examples
Metrics
* Total variation distance (sometimes just called "the" statistical distance; see the sketch after this list)
* Hellinger distance
* Lévy–Prokhorov metric
* Wasserstein metric: also known as the Kantorovich metric, or earth mover's distance
* Mahalanobis distance
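As an illustration (not part of the source), the total variation and Hellinger distances have simple closed forms for finite distributions; a Python sketch using their standard discrete formulas:
<syntaxhighlight lang="python">
import numpy as np

def total_variation(p, q):
    # TV(P, Q) = (1/2) * sum_i |p_i - q_i|
    return 0.5 * float(np.sum(np.abs(np.asarray(p) - np.asarray(q))))

def hellinger(p, q):
    # H(P, Q) = (1/sqrt(2)) * || sqrt(p) - sqrt(q) ||_2
    return float(np.linalg.norm(np.sqrt(p) - np.sqrt(q)) / np.sqrt(2))

p = np.array([0.4, 0.4, 0.2])
q = np.array([0.5, 0.3, 0.2])
print(total_variation(p, q))  # 0.1
print(hellinger(p, q))        # a value in [0, 1]
</syntaxhighlight>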
Divergences
* Kullback–Leibler divergence
* Rényi's divergence
* Jensen–Shannon divergence
* Bhattacharyya distance (despite its name it is not a distance, as it violates the triangle inequality)
* f-divergence: generalizes several distances and divergences (see the sketch after this list)
* Discriminability index, specifically the Bayes discriminability index, which is a positive-definite symmetric measure of the overlap of two distributions
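To illustrate the last point about generalization (a sketch under assumed notation, not from the source): on finite distributions the f-divergence is ''D''<sub>''f''</sub>(''P'' ‖ ''Q'') = Σ<sub>''i''</sub> ''q''<sub>''i''</sub> ''f''(''p''<sub>''i''</sub>/''q''<sub>''i''</sub>), and particular choices of the convex function ''f'' recover the Kullback–Leibler divergence and the total variation distance.
<syntaxhighlight lang="python">
import numpy as np

def f_divergence(p, q, f):
    """D_f(P || Q) = sum_i q_i * f(p_i / q_i) on finite distributions
    with strictly positive entries."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.sum(q * f(p / q)))

p = [0.4, 0.4, 0.2]
q = [0.5, 0.3, 0.2]

# f(t) = t log t recovers the Kullback-Leibler divergence.
kl = f_divergence(p, q, lambda t: t * np.log(t))
# f(t) = |t - 1| / 2 recovers the total variation distance.
tv = f_divergence(p, q, lambda t: 0.5 * np.abs(t - 1))
print(kl, tv)  # tv equals 0.1 here, matching the sketch in the previous section
</syntaxhighlight>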
See also
* Probabilistic metric space
* Similarity measure
Notes
External links
Distance and Similarity Measures (Wolfram Alpha)
References
*Dodge, Y. (2003) ''Oxford Dictionary of Statistical Terms'', OUP. ISBN 0-19-920613-9.