Information Projection

In information theory, the information projection or I-projection of a probability distribution ''q'' onto a set of distributions ''P'' is

:p^* = \underset{p \in P}{\arg\min} \operatorname{D}_\text{KL}(p \parallel q),

where \operatorname{D}_\text{KL} is the Kullback–Leibler divergence from ''q'' to ''p''. Viewing the Kullback–Leibler divergence as a measure of distance, the I-projection p^* is the "closest" distribution to ''q'' of all the distributions in ''P''.
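As a concrete illustration, the following minimal sketch (not part of the article; the function name, the constraint, and the constants are assumptions chosen for the example) computes the I-projection of a discrete distribution ''q'' onto the convex set of distributions with a prescribed mean, P = \{p : \sum_x p(x) f(x) = t\}. For such a linear constraint the minimizer is an exponential tilting of ''q'', p^*(x) \propto q(x) e^{\lambda f(x)}, and the tilting parameter can be found by bisection because the tilted mean is monotone in \lambda.

<syntaxhighlight lang="python">
import numpy as np

def i_projection_mean_constraint(q, f, t, lo=-50.0, hi=50.0, iters=200):
    """I-projection of q onto P = {p : sum_x p(x) f(x) = t}, found by
    bisection on the exponential-tilting parameter lam."""
    q = np.asarray(q, dtype=float)
    f = np.asarray(f, dtype=float)

    def tilted(lam):
        # p_lam(x) proportional to q(x) * exp(lam * f(x)); subtracting f.max()
        # only rescales the weights (it cancels in the normalization) and
        # avoids overflow for large positive lam.
        w = q * np.exp(lam * (f - f.max()))
        return w / w.sum()

    # The mean of the tilted distribution is increasing in lam, so bisection
    # on lam locates the member of P that minimizes D_KL(p || q).
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if tilted(mid) @ f < t:
            lo = mid
        else:
            hi = mid
    return tilted(0.5 * (lo + hi))

def kl(p, q):
    """D_KL(p || q) for distributions on a common finite support."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

# Example: project the uniform distribution on {0,1,2,3} onto the set of
# distributions with mean 2.2 (the uniform distribution has mean 1.5).
q = np.full(4, 0.25)
f = np.arange(4)
p_star = i_projection_mean_constraint(q, f, t=2.2)
print(p_star, p_star @ f, kl(p_star, q))
</syntaxhighlight>

Running the example tilts the uniform distribution toward the larger values of f until the mean constraint is met, which is the "closest" distribution to ''q'' within this particular ''P''.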
The I-projection is useful in setting up information geometry, notably because of the following inequality, valid when ''P'' is convex:

:\operatorname{D}_\text{KL}(p \parallel q) \geq \operatorname{D}_\text{KL}(p \parallel p^*) + \operatorname{D}_\text{KL}(p^* \parallel q).

This inequality can be interpreted as an information-geometric version of the Pythagorean theorem, with the KL divergence viewed as squared distance in a Euclidean space.

Note that since \operatorname{D}_\text{KL}(p \parallel q) \geq 0 and is continuous in p, if ''P'' is closed and non-empty, then at least one minimizer of the optimization problem framed above exists. Furthermore, if ''P'' is convex, the minimizing distribution is unique.

The reverse I-projection, also known as the moment projection or M-projection, is

:p^* = \underset{p \in P}{\arg\min} \operatorname{D}_\text{KL}(q \parallel p).

Since the KL divergence is not symmetric in its arguments, the I-projection and the M-projection exhibit different behavior. The I-projection p(x) typically under-estimates the support of q(x) and locks onto one of its modes, because p(x) = 0 is required wherever q(x) = 0 to keep the KL divergence finite. The M-projection p(x) typically over-estimates the support of q(x), because p(x) > 0 is required wherever q(x) > 0 to keep the KL divergence finite.

The reverse I-projection plays a fundamental role in the construction of optimal e-variables.

The concept of information projection can be extended to arbitrary ''f''-divergences and other divergences.
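A quick numerical check of the Pythagorean-style inequality above is given in the sketch below (again illustrative, reusing the exponential-tilting construction and constants from the earlier example, which are assumptions rather than part of the article). Here ''P'' is the set of distributions on \{0, 1, 2, 3\} with mean 2.2 and ''q'' is uniform; for such a linear family the two sides in fact coincide.

<syntaxhighlight lang="python">
import numpy as np

f = np.arange(4)              # statistic whose mean defines the constraint
q = np.full(4, 0.25)          # reference distribution (uniform)
t = 2.2                       # P = {p : E_p[f] = 2.2}, a convex (linear) set

def kl(a, b):
    """D_KL(a || b) on a common finite support."""
    mask = a > 0
    return float(np.sum(a[mask] * np.log(a[mask] / b[mask])))

def tilt(lam):
    # Exponential tilting of q; the I-projection onto P has this form.
    w = q * np.exp(lam * f)
    return w / w.sum()

# Locate the tilting parameter by bisection (the tilted mean increases in lam).
lo, hi = -50.0, 50.0
for _ in range(200):
    mid = 0.5 * (lo + hi)
    lo, hi = (mid, hi) if tilt(mid) @ f < t else (lo, mid)
p_star = tilt(0.5 * (lo + hi))

# Any other member p of P, e.g. a distribution on {2, 3} with mean 2.2.
p = np.array([0.0, 0.0, 0.8, 0.2])

lhs = kl(p, q)                       # D_KL(p || q)
rhs = kl(p, p_star) + kl(p_star, q)  # D_KL(p || p*) + D_KL(p* || q)
print(lhs, rhs, lhs >= rhs - 1e-9)   # inequality holds (with equality here)
</syntaxhighlight>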
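The contrast between the mode-seeking I-projection and the mass-covering M-projection can also be seen numerically. The sketch below (illustrative only; the grid, the bimodal target, and the brute-force search over a single-Gaussian family are assumptions, not part of the article) minimizes each direction of the KL divergence: the fit minimizing \operatorname{D}_\text{KL}(p \parallel q) locks onto one mode of ''q'', while the fit minimizing \operatorname{D}_\text{KL}(q \parallel p) spreads over both. Note that a parametric Gaussian family is not a convex set of distributions, so this only illustrates the support behavior described above, not the Pythagorean inequality.

<syntaxhighlight lang="python">
import numpy as np

x = np.linspace(-6.0, 6.0, 601)

def discretized_normal(mu, sigma):
    # Gaussian density evaluated on the grid and renormalized to sum to 1.
    w = np.exp(-0.5 * ((x - mu) / sigma) ** 2)
    return w / w.sum()

# Bimodal target q: two well-separated modes.
q = 0.5 * discretized_normal(-2.0, 0.5) + 0.5 * discretized_normal(2.0, 0.5)

def kl(a, b):
    """D_KL(a || b) on the common grid."""
    mask = a > 0
    return float(np.sum(a[mask] * np.log(a[mask] / b[mask])))

# Brute-force search over a single-Gaussian family for each objective.
best_i = None   # minimizes D_KL(p || q): I-projection objective
best_m = None   # minimizes D_KL(q || p): M-projection objective
for mu in np.linspace(-3.0, 3.0, 61):
    for sigma in np.linspace(0.3, 3.0, 28):
        p = discretized_normal(mu, sigma)
        di, dm = kl(p, q), kl(q, p)
        if best_i is None or di < best_i[0]:
            best_i = (di, mu, sigma)
        if best_m is None or dm < best_m[0]:
            best_m = (dm, mu, sigma)

print("I-projection fit (mode-seeking):  mu=%.1f sigma=%.1f" % best_i[1:])
print("M-projection fit (mass-covering): mu=%.1f sigma=%.1f" % best_m[1:])
</syntaxhighlight>

The first fit sits on one of the two modes of ''q'' with a narrow width, while the second centers between the modes with a large width, matching the under- and over-estimation of support described above.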


See also

*Sanov's theorem

