In probability theory, the total variation distance is a distance measure for probability distributions. It is an example of a statistical distance metric, and is sometimes called the statistical distance, statistical difference or variational distance.
Definition
Consider a measurable space $(\Omega, \mathcal{F})$ and probability measures $P$ and $Q$ defined on $(\Omega, \mathcal{F})$.
The total variation distance between $P$ and $Q$ is defined as

$$\delta(P, Q) = \sup_{A \in \mathcal{F}} \left| P(A) - Q(A) \right|.$$
Informally, this is the largest possible difference between the probabilities that the two probability distributions can assign to the same event.
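For distributions on a small finite space, the supremum in the definition can be evaluated by brute force over all events. A minimal sketch, with made-up probability mass functions:

```python
from itertools import chain, combinations

# Two made-up probability mass functions on Omega = {0, 1, 2}
P = {0: 0.5, 1: 0.3, 2: 0.2}
Q = {0: 0.2, 1: 0.4, 2: 0.4}

def tv_distance(p, q):
    """Total variation distance via the definition:
    the largest |p(A) - q(A)| over all events A."""
    omega = list(p)
    events = chain.from_iterable(
        combinations(omega, r) for r in range(len(omega) + 1)
    )
    return max(abs(sum(p[w] for w in a) - sum(q[w] for w in a)) for a in events)

print(round(tv_distance(P, Q), 10))  # 0.3, attained at A = {0}
```

Enumerating all $2^{|\Omega|}$ events is only feasible for tiny spaces; it is used here purely to illustrate the definition.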
Properties
Relation to other distances
The total variation distance is related to the Kullback–Leibler divergence by Pinsker's inequality:

$$\delta(P, Q) \le \sqrt{\tfrac{1}{2} D_{\mathrm{KL}}(P \parallel Q)}.$$
One also has the following inequality, due to Bretagnolle and Huber (see also Tsybakov), which has the advantage of providing a non-vacuous bound even when $D_{\mathrm{KL}}(P \parallel Q) > 2$ (where Pinsker's bound exceeds 1 and is therefore uninformative):

$$\delta(P, Q) \le \sqrt{1 - e^{-D_{\mathrm{KL}}(P \parallel Q)}}.$$
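The advantage of the Bretagnolle–Huber bound can be checked numerically: for two made-up Bernoulli distributions whose KL divergence exceeds 2, Pinsker's bound is larger than 1 (vacuous, since the total variation distance never exceeds 1), while the Bretagnolle–Huber bound remains below 1:

```python
import math

# Made-up Bernoulli(p) vs Bernoulli(q) with very different parameters
p, q = 0.99, 0.05

def kl(p, q):
    """KL divergence D(P || Q) between Bernoulli(p) and Bernoulli(q)."""
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

d = kl(p, q)
tv = abs(p - q)                    # TV distance between two Bernoulli laws
pinsker = math.sqrt(d / 2)         # Pinsker's bound
bh = math.sqrt(1 - math.exp(-d))   # Bretagnolle-Huber bound

print(d > 2, pinsker > 1, tv <= bh <= 1)  # True True True
```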
When $\Omega$ is countable, the total variation distance is related to the $L^1$ norm by the identity:

$$\delta(P, Q) = \frac{1}{2} \| P - Q \|_1 = \frac{1}{2} \sum_{\omega \in \Omega} \left| P(\{\omega\}) - Q(\{\omega\}) \right|.$$
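In the countable case this identity gives a direct formula: half the sum of absolute differences, which also equals the sum of the positive parts $P(\{\omega\}) - Q(\{\omega\})$, attained at the event $A = \{\omega : P(\{\omega\}) > Q(\{\omega\})\}$. A sketch with made-up mass functions:

```python
# Made-up probability mass functions on a common countable support
P = {"a": 0.4, "b": 0.4, "c": 0.2}
Q = {"a": 0.1, "b": 0.5, "c": 0.4}

# Half the L1 norm of P - Q
half_l1 = 0.5 * sum(abs(P[w] - Q[w]) for w in P)

# Sum of positive parts: P(A) - Q(A) for A = {w : P(w) > Q(w)}
positive_part = sum(P[w] - Q[w] for w in P if P[w] > Q[w])

print(half_l1, positive_part)  # both 0.3 (up to float rounding)
```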
The total variation distance is related to the Hellinger distance $H(P, Q)$ as follows:

$$H^2(P, Q) \le \delta(P, Q) \le \sqrt{2}\, H(P, Q).$$

These inequalities follow immediately from the inequalities between the 1-norm and the 2-norm.
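These bounds can be verified numerically for discrete distributions; here the Hellinger distance is taken with the $\tfrac{1}{2}$ convention, $H^2(P, Q) = \tfrac{1}{2} \sum_\omega (\sqrt{P(\{\omega\})} - \sqrt{Q(\{\omega\})})^2$, and the distributions are made-up examples:

```python
import math

# Made-up probability vectors on a three-point space
P = [0.5, 0.3, 0.2]
Q = [0.2, 0.4, 0.4]

# Total variation distance (half the L1 norm)
tv = 0.5 * sum(abs(p - q) for p, q in zip(P, Q))

# Hellinger distance with the 1/2 convention
h = math.sqrt(0.5 * sum((math.sqrt(p) - math.sqrt(q)) ** 2 for p, q in zip(P, Q)))

# H^2 <= TV <= sqrt(2) * H
print(h**2 <= tv <= math.sqrt(2) * h)  # True
```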
Connection to transportation theory
The total variation distance (or half the $L^1$ norm) arises as the optimal transportation cost, when the cost function is $c(x, y) = \mathbf{1}_{x \neq y}$, that is,

$$\frac{1}{2} \| P - Q \|_1 = \delta(P, Q) = \inf_{\pi} \Pr_{(X, Y) \sim \pi}(X \neq Y) = \inf_{\pi} \operatorname{E}_{\pi}\!\left[ \mathbf{1}_{X \neq Y} \right],$$

where the expectation is taken with respect to the probability measure $\pi$ on the space where $(x, y)$ lives, and the infimum is taken over all such joint probability measures $\pi$ with marginals $P$ and $Q$, respectively.
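In the discrete case the optimal coupling is explicit: place mass $\min(P(\{\omega\}), Q(\{\omega\}))$ on the diagonal, so that $\Pr(X = Y) = \sum_\omega \min(P(\{\omega\}), Q(\{\omega\}))$ and $\Pr(X \neq Y)$ equals the total variation distance. A sketch with made-up mass functions:

```python
# Made-up probability mass functions on Omega = {0, 1, 2}
P = {0: 0.5, 1: 0.3, 2: 0.2}
Q = {0: 0.2, 1: 0.4, 2: 0.4}

# Total variation distance (half the L1 norm)
tv = 0.5 * sum(abs(P[w] - Q[w]) for w in P)

# The optimal coupling keeps X = Y with the overlapping mass min(P, Q),
# so Pr(X != Y) = 1 - sum of the overlap
pr_not_equal = 1 - sum(min(P[w], Q[w]) for w in P)

print(abs(tv - pr_not_equal) < 1e-12)  # True
```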
See also
* Total variation
* Kolmogorov–Smirnov test
* Wasserstein metric