Total variation distance of probability measures

In probability theory, the total variation distance is a distance measure for probability distributions. It is an example of a statistical distance metric, and is sometimes called the statistical distance, statistical difference or variational distance.


Definition

Consider a measurable space (\Omega, \mathcal{F}) and probability measures P and Q defined on (\Omega, \mathcal{F}). The total variation distance between P and Q is defined as

:\delta(P,Q) = \sup_{A \in \mathcal{F}} \left| P(A) - Q(A) \right|.

This is the largest absolute difference between the probabilities that the two probability distributions assign to the same event.
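
For distributions on a finite sample space, this supremum can be computed directly, since it is attained on the set of outcomes where one distribution exceeds the other. A minimal sketch in Python (NumPy assumed; the particular vectors p and q are illustrative, not from the article):

```python
import numpy as np

# Two probability mass functions on the same finite sample space
# (illustrative values, each summing to 1).
p = np.array([0.4, 0.3, 0.2, 0.1])
q = np.array([0.1, 0.2, 0.3, 0.4])

# The supremum over events A is attained at A = {x : p(x) > q(x)},
# so delta(P, Q) = P(A) - Q(A) for that particular event.
A = p > q
tv = p[A].sum() - q[A].sum()
print(tv)  # 0.4

# Equivalently, half the L1 distance (see "Relation to other distances" below).
assert np.isclose(tv, 0.5 * np.abs(p - q).sum())
```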


Properties

The total variation distance is an ''f''-divergence and an integral probability metric.
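
Concretely, it is the ''f''-divergence generated by f(t) = \tfrac12 |t-1|, and its integral probability metric form takes the supremum over measurable functions bounded between 0 and 1:

:\delta(P,Q) = \frac12 \int \left| \frac{\mathrm{d}P}{\mathrm{d}\mu} - \frac{\mathrm{d}Q}{\mathrm{d}\mu} \right| \mathrm{d}\mu = \sup_{f:\, \Omega \to [0,1]} \left| \int f \,\mathrm{d}P - \int f \,\mathrm{d}Q \right|,

where \mu is any common dominating measure (for instance \mu = P + Q).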


Relation to other distances

The total variation distance is related to the Kullback–Leibler divergence by Pinsker's inequality:

:\delta(P,Q) \le \sqrt{\frac{1}{2} D_{\mathrm{KL}}(P \parallel Q)}.

One also has the following inequality, due to Bretagnolle and Huber (see also Tsybakov), which has the advantage of providing a non-vacuous bound even when D_{\mathrm{KL}}(P \parallel Q) > 2:

:\delta(P,Q) \le \sqrt{1 - e^{-D_{\mathrm{KL}}(P \parallel Q)}}.

The total variation distance is half of the L1 distance between the probability functions: on discrete domains, this is the distance between the probability mass functions,

:\delta(P, Q) = \frac12 \sum_x \left| P(x) - Q(x) \right|,

and the relationship holds more generally as well:

:\delta(P, Q) = \frac12 \int \left| p(x) - q(x) \right| \mathrm{d}x

when the distributions have standard probability density functions ''p'' and ''q'', or the analogous distance between Radon–Nikodym derivatives with any common dominating measure. This result can be shown by noticing that the supremum in the definition is achieved exactly on the set where one distribution dominates the other, i.e. A = \{x : p(x) \ge q(x)\}. The total variation distance is related to the Hellinger distance H(P,Q) as follows:

:H^2(P,Q) \leq \delta(P,Q) \leq \sqrt{2}\, H(P,Q).

These inequalities follow immediately from the inequalities between the 1-norm and the 2-norm.
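
These bounds are straightforward to check numerically. A small Python sketch (NumPy assumed; the pmfs are illustrative and chosen strictly positive so that the Kullback–Leibler divergence is finite, and the Hellinger distance uses the convention H^2(P,Q) = \tfrac12 \sum_x (\sqrt{P(x)} - \sqrt{Q(x)})^2, matching the inequalities above):

```python
import numpy as np

# Illustrative pmfs with full support, so D_KL(P || Q) is finite.
p = np.array([0.5, 0.3, 0.2])
q = np.array([0.2, 0.3, 0.5])

tv = 0.5 * np.abs(p - q).sum()                            # total variation distance
kl = np.sum(p * np.log(p / q))                            # D_KL(P || Q)
h = np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q))**2))   # Hellinger distance

assert tv <= np.sqrt(kl / 2)           # Pinsker's inequality
assert tv <= np.sqrt(1 - np.exp(-kl))  # Bretagnolle-Huber bound
assert h**2 <= tv <= np.sqrt(2) * h    # Hellinger sandwich
```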


Connection to transportation theory

The total variation distance (or half the L1 norm) arises as the optimal transportation cost when the cost function is c(x,y) = \mathbf{1}_{x \neq y}, that is,

:\frac12 \| P - Q \|_1 = \delta(P,Q) = \inf\big\{ \Pr(X \neq Y) : (X,Y) \sim \pi,\ X \sim P,\ Y \sim Q \big\} = \inf_\pi \operatorname{E}_{\pi}\left[ \mathbf{1}_{x \neq y} \right],

where the expectation is taken with respect to the probability measure \pi on the space where (x,y) lives, and the infimum is taken over all such \pi with marginals P and Q, respectively.
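
For discrete distributions, an optimal coupling is easy to construct: keep the shared mass \min(P(x), Q(x)) on the diagonal, so that X = Y as often as possible, and couple the leftover mass of P with the leftover mass of Q in any way. A sketch under those assumptions (NumPy; the normalized outer product for the leftover mass is just one convenient choice):

```python
import numpy as np

p = np.array([0.4, 0.3, 0.2, 0.1])
q = np.array([0.1, 0.2, 0.3, 0.4])

# Shared mass stays put (X = Y); only the surplus must move.
common = np.minimum(p, q)
tv = 1.0 - common.sum()  # equals the total variation distance

# Build the coupling pi: shared mass on the diagonal, surplus mass
# coupled via a normalized outer product (the surpluses of p and q
# have disjoint supports, so this adds nothing to the diagonal).
pi = np.diag(common)
if tv > 0:
    pi += np.outer(p - common, q - common) / tv

# pi has marginals p and q ...
assert np.allclose(pi.sum(axis=1), p)
assert np.allclose(pi.sum(axis=0), q)
# ... and Pr(X != Y) under pi equals delta(P, Q).
assert np.isclose(pi.sum() - np.trace(pi), 0.5 * np.abs(p - q).sum())
```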


See also

* Total variation
* Kolmogorov–Smirnov test
* Wasserstein metric

