Total variation distance of probability measures

In probability theory, the total variation distance is a distance measure for probability distributions. It is an example of a statistical distance metric, and is sometimes called the statistical distance, statistical difference or variational distance.


Definition

Consider a measurable space (\Omega, \mathcal{F}) and probability measures P and Q defined on (\Omega, \mathcal{F}). The total variation distance between P and Q is defined as:

:\delta(P,Q) = \sup_{A \in \mathcal{F}} \left| P(A) - Q(A) \right|.

Informally, this is the largest possible difference between the probabilities that the two probability distributions can assign to the same event.
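For a finite or countable \Omega, the supremum is attained on the event A = \{\omega : P(\{\omega\}) > Q(\{\omega\})\}, which makes the distance easy to compute directly. The following is a minimal Python sketch under that assumption (the example distributions p and q are purely illustrative):

import numpy as np

def total_variation_distance(p, q):
    """Total variation distance between two discrete distributions,
    given as arrays of probabilities over the same finite outcome set.
    Equals half the L1 distance, and also sup_A |P(A) - Q(A)|,
    which is attained at A = {x : p(x) > q(x)}."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return 0.5 * np.abs(p - q).sum()

# Example: two distributions on the outcome set {0, 1, 2}
p = [0.5, 0.3, 0.2]
q = [0.2, 0.3, 0.5]
print(total_variation_distance(p, q))  # 0.3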


Properties


Relation to other distances

The total variation distance is related to the Kullback–Leibler divergence by Pinsker's inequality:

:\delta(P,Q) \le \sqrt{\frac{1}{2} D_{\mathrm{KL}}(P \parallel Q)}.

One also has the following inequality, due to Bretagnolle and Huber (see also Tsybakov), which has the advantage of providing a non-vacuous bound even when D_{\mathrm{KL}}(P \parallel Q) > 2:

:\delta(P,Q) \le \sqrt{1 - e^{-D_{\mathrm{KL}}(P \parallel Q)}}.

When \Omega is countable, the total variation distance is related to the L1 norm by the identity:

:\delta(P,Q) = \frac{1}{2} \| P - Q \|_1 = \frac{1}{2} \sum_{\omega \in \Omega} \left| P(\{\omega\}) - Q(\{\omega\}) \right|.

The total variation distance is related to the Hellinger distance H(P,Q) as follows:

:H^2(P,Q) \leq \delta(P,Q) \leq \sqrt{2}\, H(P,Q).

These inequalities follow immediately from the inequalities between the 1-norm and the 2-norm.
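As an illustrative numerical check of these bounds, here is a short Python sketch (the distributions p and q are arbitrary choices, and natural logarithms are used in the Kullback–Leibler divergence):

import numpy as np

p = np.array([0.5, 0.3, 0.2])
q = np.array([0.2, 0.3, 0.5])

tv = 0.5 * np.abs(p - q).sum()
kl = np.sum(p * np.log(p / q))                       # D_KL(P || Q)
hellinger = np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q))**2))

print(tv <= np.sqrt(kl / 2))                         # Pinsker's inequality
print(tv <= np.sqrt(1 - np.exp(-kl)))                # Bretagnolle–Huber
print(hellinger**2 <= tv <= np.sqrt(2) * hellinger)  # Hellinger bounds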


Connection to transportation theory

The total variation distance (or half the L1 norm) arises as the optimal transportation cost, when the cost function is c(x,y) = \mathbf{1}_{x \neq y}, that is,

:\frac{1}{2} \| P - Q \|_1 = \delta(P,Q) = \inf\left\{ \mathbb{P}(X \neq Y) : \operatorname{Law}(X) = P, \operatorname{Law}(Y) = Q \right\} = \inf_\pi \operatorname{E}_\pi\left[ \mathbf{1}_{x \neq y} \right],

where the expectation is taken with respect to the probability measure \pi on the space where (x,y) lives, and the infimum is taken over all such \pi with marginals P and Q, respectively.
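The infimum is attained by a maximal coupling, which places the common mass \min(P, Q) on the diagonal (so that X = Y as often as possible) and pairs off the excess mass off-diagonal. The following Python sketch builds one such coupling for finite distributions; the product form of the off-diagonal part is one convenient choice among many:

import numpy as np

p = np.array([0.5, 0.3, 0.2])
q = np.array([0.2, 0.3, 0.5])

common = np.minimum(p, q)              # mass kept on the diagonal (X = Y)
tv = 1.0 - common.sum()                # leftover mass equals delta(P, Q)

# Build a joint distribution pi with marginals p and q: the diagonal
# carries the common mass, and the excesses are paired off-diagonal.
pi = np.diag(common)
excess_p = p - common                  # where p exceeds q
excess_q = q - common                  # where q exceeds p
if tv > 0:
    pi += np.outer(excess_p, excess_q) / tv

print(np.allclose(pi.sum(axis=1), p), np.allclose(pi.sum(axis=0), q))
print(1.0 - np.trace(pi), tv)          # P(X != Y) equals the TV distance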


See also

* Total variation
* Kolmogorov–Smirnov test
* Wasserstein metric

