In information theory, Pinsker's inequality, named after its inventor Mark Semenovich Pinsker, is an inequality that bounds the total variation distance (or statistical distance) in terms of the Kullback–Leibler divergence. The inequality is tight up to constant factors.
Formal statement
Pinsker's inequality states that, if <math>P</math> and <math>Q</math> are two probability distributions on a measurable space <math>(X, \Sigma)</math>, then
:<math>\delta(P,Q) \le \sqrt{\frac{1}{2} D_{\mathrm{KL}}(P \parallel Q)},</math>
where
:<math>\delta(P,Q) = \sup \bigl\{ |P(A) - Q(A)| : A \in \Sigma \bigr\}</math>
is the total variation distance (or statistical distance) between <math>P</math> and <math>Q</math>, and
:<math>D_{\mathrm{KL}}(P \parallel Q) = \operatorname{E}_P \left( \log \frac{\mathrm{d}P}{\mathrm{d}Q} \right) = \int_X \left( \log \frac{\mathrm{d}P}{\mathrm{d}Q} \right) \mathrm{d}P</math>
is the Kullback–Leibler divergence in nats. When the sample space <math>X</math> is a finite set, the Kullback–Leibler divergence is given by
:<math>D_{\mathrm{KL}}(P \parallel Q) = \sum_{i \in X} \left( \log \frac{P(i)}{Q(i)} \right) P(i).</math>
Note that in terms of the total variation norm <math>\| P - Q \|</math> of the signed measure <math>P - Q</math>, Pinsker's inequality differs from the one given above by a factor of two:
:<math>\| P - Q \| \le \sqrt{2 D_{\mathrm{KL}}(P \parallel Q)}.</math>
A proof of Pinsker's inequality uses the partition inequality for ''f''-divergences.
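The finite-space statement can be checked numerically. The following is a minimal sketch, not a reference implementation: the distributions `p` and `q` are hypothetical examples, and the total variation distance is computed as half the L1 distance, which on a finite space equals the supremum over events.

```python
import math

def total_variation(p, q):
    """Total variation distance on a finite space: half the L1 distance."""
    return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))

def kl_divergence(p, q):
    """KL divergence D_KL(P || Q) in nats; terms with P(i) = 0 contribute 0."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0.0:
            continue
        if qi == 0.0:
            return math.inf  # P puts mass where Q has none
        total += pi * math.log(pi / qi)
    return total

# Hypothetical example distributions on a three-point space (an assumption).
p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]

tv = total_variation(p, q)
kl = kl_divergence(p, q)
pinsker_bound = math.sqrt(0.5 * kl)

print(f"delta(P, Q)      = {tv:.4f}")
print(f"sqrt(D_KL / 2)   = {pinsker_bound:.4f}")
print(f"inequality holds = {tv <= pinsker_bound}")
```

For this pair the two sides are close, which illustrates that the bound is fairly sharp when the distributions are near each other.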
Alternative version
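Note that the form of Pinsker's inequality depends on the base of the logarithm used in the definition of the KL divergence. <math>D_{\mathrm{KL}}</math> is defined using <math>\ln</math> (logarithm in base <math>e</math>), whereas <math>D</math> is typically defined with <math>\log_2</math> (logarithm in base 2). Then,
:<math>D(P \parallel Q) = \frac{D_{\mathrm{KL}}(P \parallel Q)}{\ln 2}.</math>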
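Given the above comments, there is an alternative statement of Pinsker's inequality in some literature that relates information divergence to variation distance:
:<math>D(P \parallel Q) = \frac{D_{\mathrm{KL}}(P \parallel Q)}{\ln 2} \ge \frac{1}{2 \ln 2} V^2(p, q),</math>
i.e.,
:<math>\sqrt{\frac{D_{\mathrm{KL}}(P \parallel Q)}{2}} \ge \frac{V(p, q)}{2},</math>
in which
:<math>V(p, q) = \sum_{x \in \mathcal{X}} |p(x) - q(x)|</math>
is the (non-normalized) variation distance between two probability density functions <math>p</math> and <math>q</math> on the same alphabet <math>\mathcal{X}</math>.
This form of Pinsker's inequality shows that "convergence in divergence" is a stronger notion than "convergence in variation distance".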
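The change of logarithm base and the alternative form can be illustrated with a short computation. This is a sketch under stated assumptions: the two-letter distributions `p` and `q` are hypothetical, and the variable names are chosen here for illustration only.

```python
import math

# Hypothetical example distributions on a two-letter alphabet (an assumption).
p = [0.9, 0.1]
q = [0.5, 0.5]

# Non-normalized variation distance V(p, q): the full L1 distance.
v = sum(abs(pi - qi) for pi, qi in zip(p, q))

# Divergence in nats (natural log) and in bits (log base 2).
# q is strictly positive here, so no division by zero occurs.
kl_nats = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
kl_bits = sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Right-hand side of the alternative statement: V^2 / (2 ln 2).
bound_bits = v ** 2 / (2 * math.log(2))

print(f"D in bits         = {kl_bits:.6f}")
print(f"D_KL / ln 2       = {kl_nats / math.log(2):.6f}")
print(f"V^2 / (2 ln 2)    = {bound_bits:.6f}")
```

The first two printed values agree, which reflects the base conversion, and both exceed the third, which is the alternative form of the inequality.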
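A simple proof by John Pollard is shown by letting
:<math>r(x) = P(x)/Q(x) - 1 \ge -1:</math>
:<math>\begin{align}
D_{\mathrm{KL}}(P \parallel Q) &= \operatorname{E}_Q\bigl[(1 + r(x)) \log(1 + r(x)) - r(x)\bigr] \\
&\ge \frac{1}{2} \operatorname{E}_Q\left[\frac{r(x)^2}{1 + r(x)/3}\right] \\
&\ge \frac{1}{2} \frac{\operatorname{E}_Q[|r(x)|]^2}{\operatorname{E}_Q[1 + r(x)/3]} && \text{(from Titu's lemma)} \\
&= \frac{1}{2} \operatorname{E}_Q[|r(x)|]^2 && \text{(as } \operatorname{E}_Q[1 + r(x)/3] = 1\text{)} \\
&= \frac{1}{2} V(p, q)^2.
\end{align}</math>
Here Titu's lemma is also known as Sedrakyan's inequality.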
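Note that the bound from Pinsker's inequality is vacuous for any distributions where <math>D_{\mathrm{KL}}(P \parallel Q) > 2</math>, since the total variation distance is at most <math>1</math>. For such distributions, an alternative bound can be used, due to Bretagnolle and Huber (see also Tsybakov):
:<math>\delta(P,Q) \le \sqrt{1 - e^{-D_{\mathrm{KL}}(P \parallel Q)}}.</math>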
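To see where each bound is informative, the two upper bounds on the total variation distance can be compared for a few divergence values. This is an illustrative sketch: the function names and the sample values of the divergence are assumptions introduced here, and the Bretagnolle–Huber bound is taken in the form sqrt(1 - exp(-D_KL)).

```python
import math

def pinsker_bound(kl):
    """Pinsker upper bound on total variation: sqrt(D_KL / 2), D_KL in nats."""
    return math.sqrt(0.5 * kl)

def bretagnolle_huber_bound(kl):
    """Bretagnolle-Huber upper bound on total variation: sqrt(1 - exp(-D_KL))."""
    return math.sqrt(1.0 - math.exp(-kl))

# Hypothetical divergence values in nats, chosen to straddle D_KL = 2.
for kl in (0.1, 1.0, 2.0, 5.0):
    pb = pinsker_bound(kl)
    bh = bretagnolle_huber_bound(kl)
    print(f"D_KL = {kl:>4}: Pinsker = {pb:.4f}, Bretagnolle-Huber = {bh:.4f}")
```

For small divergences Pinsker's bound is the tighter of the two, while for large divergences it exceeds 1 and only the Bretagnolle–Huber bound remains below the trivial bound.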
History
Pinsker first proved the inequality with a greater constant. The inequality in the above form was proved independently by Kullback, Csiszár, and Kemperman.
Inverse problem
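A precise inverse of the inequality cannot hold: for every <math>\varepsilon > 0</math>, there are distributions <math>P_\varepsilon, Q</math> with <math>\delta(P_\varepsilon, Q) \le \varepsilon</math> but <math>D_{\mathrm{KL}}(P_\varepsilon \parallel Q) = \infty</math>. An easy example is given by the two-point space <math>\{0, 1\}</math> with <math>Q(0) = 0, Q(1) = 1</math> and <math>P_\varepsilon(0) = \varepsilon, P_\varepsilon(1) = 1 - \varepsilon</math>.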
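As a quick check of the two-point example, both quantities can be computed directly. The sketch below assumes the reading <math>Q(0) = 0, Q(1) = 1</math> and <math>P_\varepsilon(0) = \varepsilon, P_\varepsilon(1) = 1 - \varepsilon</math>, uses the fact that on a finite space the total variation distance equals half the L1 distance, and takes <math>a \log(a/0) = \infty</math> for <math>a > 0</math>:

```latex
\begin{align}
\delta(P_\varepsilon, Q)
  &= \tfrac{1}{2}\bigl( |\varepsilon - 0| + |(1 - \varepsilon) - 1| \bigr)
   = \varepsilon, \\
D_{\mathrm{KL}}(P_\varepsilon \parallel Q)
  &= \varepsilon \log\frac{\varepsilon}{0}
   + (1 - \varepsilon) \log\frac{1 - \varepsilon}{1}
   = \infty .
\end{align}
```

The divergence is infinite because <math>P_\varepsilon</math> assigns positive mass to a point where <math>Q</math> has none, however small <math>\varepsilon</math> is.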
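However, an inverse inequality holds on finite spaces <math>X</math> with a constant depending on <math>Q</math>. [see Lemma 4.1 in ] More specifically, it can be shown that with the definition <math>\alpha_Q := \min_{x \in X : Q(x) > 0} Q(x)</math> we have for any measure <math>P</math> which is absolutely continuous with respect to <math>Q</math>
:<math>\frac{1}{2} D_{\mathrm{KL}}(P \parallel Q) \le \frac{1}{\alpha_Q} \delta(P,Q)^2.</math>
As a consequence, if <math>Q</math> has full support (i.e. <math>Q(x) > 0</math> for all <math>x \in X</math>), then
:<math>\delta(P,Q)^2 \le \frac{1}{2} D_{\mathrm{KL}}(P \parallel Q) \le \frac{1}{\alpha_Q} \delta(P,Q)^2.</math>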
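The two-sided bound for a full-support reference distribution can be checked numerically. This is a minimal sketch: the distributions `p` and `q` are hypothetical, `alpha_q` stands for the smallest probability that `q` assigns to any point, and the chain checked is delta^2 <= D_KL / 2 <= delta^2 / alpha_q.

```python
import math

# Hypothetical example on a three-point space; q has full support (an assumption).
q = [0.5, 0.3, 0.2]
p = [0.6, 0.3, 0.1]

# Smallest positive mass of q, the constant in the inverse inequality.
alpha_q = min(qi for qi in q if qi > 0)

# Total variation distance on a finite space: half the L1 distance.
tv = 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))

# KL divergence in nats; valid here because q is strictly positive.
kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

lower = tv ** 2            # Pinsker side
middle = 0.5 * kl
upper = tv ** 2 / alpha_q  # inverse side, depends on q

print(f"{lower:.5f} <= {middle:.5f} <= {upper:.5f}")
```

The upper bound degrades as the smallest mass of `q` shrinks, consistent with the fact that no inverse inequality can hold uniformly over all reference distributions.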
Further reading
* Thomas M. Cover and Joy A. Thomas: ''Elements of Information Theory'', 2nd edition, Wiley-Interscience, 2006
* Nicolo Cesa-Bianchi and Gábor Lugosi: ''Prediction, Learning, and Games'', Cambridge University Press, 2006