In probability theory, the total variation distance is a statistical distance between probability distributions, and is sometimes called the statistical distance, statistical difference or variational distance.


Definition

Consider a measurable space (\Omega, \mathcal{F}) and probability measures P and Q defined on (\Omega, \mathcal{F}). The total variation distance between P and Q is defined as

:\delta(P,Q) = \sup_{A \in \mathcal{F}} \left| P(A) - Q(A) \right|.

This is the largest absolute difference between the probabilities that the two probability distributions assign to the same event.
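On a small finite sample space the supremum in the definition can be computed directly by enumerating every event. The following is a minimal sketch; the two probability mass functions are illustrative, not taken from the article.

```python
# Sketch: total variation distance on a small discrete space, computed
# directly from the definition as a supremum over all events (subsets).
from itertools import combinations

def tv_by_definition(p, q):
    """sup over all events A of |P(A) - Q(A)|, for pmfs given as dicts."""
    outcomes = list(p)
    best = 0.0
    # Enumerate every subset A of the sample space.
    for r in range(len(outcomes) + 1):
        for event in combinations(outcomes, r):
            diff = abs(sum(p[x] for x in event) - sum(q[x] for x in event))
            best = max(best, diff)
    return best

p = {"a": 0.5, "b": 0.3, "c": 0.2}
q = {"a": 0.2, "b": 0.3, "c": 0.5}
print(tv_by_definition(p, q))  # ≈ 0.3, attained at the event {"a"}
```

Enumerating all 2^n events is exponential, so in practice one uses the half-L1 formula given later in the article; the brute-force version is only useful to illustrate the definition.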


Properties

The total variation distance is an ''f''-divergence and an integral probability metric.
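The f-divergence form can be made concrete: taking the generator f(t) = |t - 1|/2 recovers the total variation distance. A minimal sketch with illustrative pmfs:

```python
# Sketch: total variation as the f-divergence with generator
# f(t) = |t - 1| / 2, checked against the half-L1 formula.
def f(t):
    return abs(t - 1) / 2

p = [0.1, 0.4, 0.5]
q = [0.25, 0.25, 0.5]

# f-divergence form: sum_x q(x) * f(p(x) / q(x))  (all q(x) > 0 here)
tv_f = sum(qx * f(px / qx) for px, qx in zip(p, q))
# half-L1 form
tv_l1 = 0.5 * sum(abs(px - qx) for px, qx in zip(p, q))
print(tv_f, tv_l1)  # both ≈ 0.15
```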


Relation to other distances

The total variation distance is related to the Kullback–Leibler divergence by Pinsker's inequality:

:\delta(P,Q) \le \sqrt{\frac{1}{2} D_{\mathrm{KL}}(P \parallel Q)}.

One also has the following inequality, due to Bretagnolle and Huber, which has the advantage of providing a non-vacuous bound even when D_{\mathrm{KL}}(P \parallel Q) > 2:

:\delta(P,Q) \le \sqrt{1 - e^{-D_{\mathrm{KL}}(P \parallel Q)}}.

The total variation distance is half of the L^1 distance between the probability functions: on discrete domains, this is the distance between the probability mass functions

:\delta(P, Q) = \frac{1}{2} \sum_x |P(x) - Q(x)|,

and when the distributions have standard probability density functions p and q,

:\delta(P, Q) = \frac{1}{2} \int |p(x) - q(x)| \, \mathrm{d}x

(or the analogous distance between Radon–Nikodym derivatives with any common dominating measure). This result can be shown by noticing that the supremum in the definition is achieved exactly on the set where one distribution dominates the other.

The total variation distance is related to the Hellinger distance H(P,Q) as follows:

:H^2(P,Q) \leq \delta(P,Q) \leq \sqrt{2}\, H(P,Q).

These inequalities follow immediately from the inequalities between the 1-norm and the 2-norm.
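As a sanity check, the Pinsker, Bretagnolle–Huber, and Hellinger bounds can be verified numerically for pairs of Bernoulli distributions. A minimal sketch with illustrative parameter pairs (KL divergence in nats, Hellinger distance with the 1/2 normalization used above):

```python
# Sketch: numerically checking the three bounds above for Bernoulli pairs.
import math

def tv(p, q):          # half-L1 on {0, 1}
    return 0.5 * (abs(p - q) + abs((1 - p) - (1 - q)))

def kl(p, q):          # D_KL(Ber(p) || Ber(q)), in nats
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def hellinger(p, q):   # H(Ber(p), Ber(q)), with H^2 = (1/2) sum (sqrt p - sqrt q)^2
    s = (math.sqrt(p) - math.sqrt(q)) ** 2 \
        + (math.sqrt(1 - p) - math.sqrt(1 - q)) ** 2
    return math.sqrt(s / 2)

for p, q in [(0.5, 0.1), (0.9, 0.2), (0.6, 0.55)]:
    d, D, H = tv(p, q), kl(p, q), hellinger(p, q)
    assert d <= math.sqrt(D / 2) + 1e-12              # Pinsker
    assert d <= math.sqrt(1 - math.exp(-D)) + 1e-12   # Bretagnolle–Huber
    assert H ** 2 <= d + 1e-12                        # Hellinger lower bound
    assert d <= math.sqrt(2) * H + 1e-12              # Hellinger upper bound
print("all bounds hold")
```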


Connection to transportation theory

The total variation distance (or half the L^1 norm) arises as the optimal transportation cost, when the cost function is c(x,y) = \mathbf{1}_{x \neq y}; that is,

:\frac{1}{2} \|P - Q\|_1 = \delta(P,Q) = \inf\big\{ \mathbb{P}(X \neq Y) : \operatorname{Law}(X) = P,\ \operatorname{Law}(Y) = Q \big\} = \inf_\pi \operatorname{E}_\pi\left[ \mathbf{1}_{x \neq y} \right],

where the expectation is taken with respect to the probability measure \pi on the space where (x,y) lives, and the infimum is taken over all such \pi with marginals P and Q, respectively.
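For discrete distributions the infimum above is attained by the classical maximal-coupling construction: draw a common value from the overlap min(p, q) with probability equal to the overlap mass, and otherwise draw X and Y independently from the normalized residuals. A minimal sketch, with illustrative pmfs (here \delta(P,Q) = 0.3, so the mismatch frequency should concentrate near 0.3):

```python
# Sketch: a maximal coupling of two discrete distributions, which attains
# P(X != Y) = delta(P, Q) in the transportation formulation.
import random

def maximal_coupling_sample(p, q, rng=random):
    """Sample (X, Y) with marginals p and q and P(X != Y) = delta(p, q)."""
    overlap = {x: min(p[x], q.get(x, 0.0)) for x in p}
    mass = sum(overlap.values())          # = 1 - delta(p, q)
    if rng.random() < mass:
        # With probability 1 - delta(p, q), draw X = Y from the overlap.
        x = rng.choices(list(overlap), weights=list(overlap.values()))[0]
        return x, x
    # Otherwise draw X and Y independently from the normalized residuals.
    rp = {x: p[x] - overlap.get(x, 0.0) for x in p}
    rq = {y: q[y] - overlap.get(y, 0.0) for y in q}
    x = rng.choices(list(rp), weights=list(rp.values()))[0]
    y = rng.choices(list(rq), weights=list(rq.values()))[0]
    return x, y

p = {"a": 0.5, "b": 0.3, "c": 0.2}
q = {"a": 0.2, "b": 0.3, "c": 0.5}      # delta(p, q) = 0.3
n = 100_000
mismatch = sum(x != y for x, y in (maximal_coupling_sample(p, q) for _ in range(n)))
print(mismatch / n)  # ≈ 0.3, up to Monte Carlo error
```

The residual distributions have disjoint supports by construction, so every draw from the second branch produces X != Y; this is why the coupling is optimal rather than merely feasible.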


See also

* Total variation
* Kolmogorov–Smirnov test
* Wasserstein metric

