In information geometry, the Fisher information metric is a particular Riemannian metric which can be defined on a smooth statistical manifold, ''i.e.'', a smooth manifold whose points are probability measures defined on a common probability space. It can be used to calculate the informational difference between measurements.
The metric is interesting in several respects. By Chentsov’s theorem, the Fisher information metric on statistical models is the only Riemannian metric (up to rescaling) that is invariant under sufficient statistics.
It can also be understood to be the infinitesimal form of the relative entropy (''i.e.'', the Kullback–Leibler divergence); specifically, it is the Hessian of the divergence. Alternately, it can be understood as the metric induced by the flat space Euclidean metric, after appropriate changes of variable. When extended to complex projective Hilbert space, it becomes the Fubini–Study metric; when written in terms of mixed states, it is the quantum Bures metric.
Considered purely as a matrix, it is known as the Fisher information matrix. Considered as a measurement technique, where it is used to estimate hidden parameters in terms of observed random variables, it is known as the observed information.
Definition
Given a statistical manifold with coordinates <math>\theta = (\theta_1, \theta_2, \ldots, \theta_n)</math>, one writes <math>p(x, \theta)</math> for the probability distribution as a function of <math>\theta</math>. Here <math>x</math> is drawn from the value space ''R'' for a (discrete or continuous) random variable ''X''. The probability is normalized by
:<math>\int_X p(x,\theta)\,dx = 1.</math>
The Fisher information metric then takes the form:
:<math>g_{jk}(\theta) = \int_X \frac{\partial \log p(x,\theta)}{\partial \theta_j} \frac{\partial \log p(x,\theta)}{\partial \theta_k} \, p(x,\theta)\,dx.</math>
The integral is performed over all values ''x'' in ''X''. The variable <math>\theta</math> is now a coordinate on a Riemannian manifold. The labels ''j'' and ''k'' index the local coordinate axes on the manifold.
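As a concrete illustration (a numerical sketch, not part of the article's formal development; the function names below are ad hoc), the metric can be approximated directly from this definition for a simple family. The univariate Gaussian, whose metric in <math>(\mu,\sigma)</math> coordinates has the known closed form <math>\operatorname{diag}(1/\sigma^2,\, 2/\sigma^2)</math>, is used purely as an example.

```python
# Sketch: approximate g_jk(theta) = ∫ (∂_j log p)(∂_k log p) p dx by quadrature
# for the univariate Gaussian family p(x; mu, sigma), and compare with the
# known closed form diag(1/sigma^2, 2/sigma^2) in (mu, sigma) coordinates.
import numpy as np

def log_p(x, theta):
    mu, sigma = theta
    return -0.5 * np.log(2 * np.pi * sigma**2) - (x - mu)**2 / (2 * sigma**2)

def fisher_metric(theta, x_grid, eps=1e-5):
    """Riemann-sum quadrature over x_grid with central-difference scores."""
    dx = x_grid[1] - x_grid[0]
    p = np.exp(log_p(x_grid, theta))
    scores = []
    for j in range(len(theta)):
        step = np.zeros(len(theta))
        step[j] = eps
        scores.append((log_p(x_grid, theta + step) - log_p(x_grid, theta - step)) / (2 * eps))
    n = len(theta)
    g = np.empty((n, n))
    for j in range(n):
        for k in range(n):
            g[j, k] = np.sum(scores[j] * scores[k] * p) * dx
    return g

theta = np.array([0.0, 1.5])                        # (mu, sigma)
x = np.linspace(-20.0, 20.0, 20001)                 # integration grid for x
print(fisher_metric(theta, x))                      # ≈ [[0.444, 0], [0, 0.889]]
print(np.diag([1 / theta[1]**2, 2 / theta[1]**2]))  # exact metric for comparison
```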
When the probability is derived from the Gibbs measure, as it would be for any Markovian process, then <math>\theta</math> can also be understood to be a Lagrange multiplier; Lagrange multipliers are used to enforce constraints, such as holding the expectation value of some quantity constant. If there are ''n'' constraints holding ''n'' different expectation values constant, then the dimension of the manifold is ''n'' dimensions smaller than the original space. In this case, the metric can be explicitly derived from the partition function; a derivation and discussion are presented there.
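To make the connection concrete (a sketch using standard exponential-family notation; the observables <math>H_j</math> and partition function <math>Z</math> are not defined elsewhere in this article), suppose
:<math>p(x,\theta) = \frac{1}{Z(\theta)} \exp\Big(-\sum_j \theta_j H_j(x)\Big), \qquad Z(\theta) = \int_X \exp\Big(-\sum_j \theta_j H_j(x)\Big)\,dx.</math>
Then <math>\partial \log p/\partial\theta_j = -H_j(x) + \mathrm{E}[H_j]</math>, so the metric reduces to the covariance of the observables, which is the Hessian of the log-partition function:
:<math>g_{jk}(\theta) = \mathrm{E}\big[(H_j - \mathrm{E}[H_j])(H_k - \mathrm{E}[H_k])\big] = \frac{\partial^2 \log Z(\theta)}{\partial\theta_j\,\partial\theta_k}.</math>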
Substituting <math>i(x,\theta) = -\log p(x,\theta)</math> from information theory, an equivalent form of the above definition is:
:<math>g_{jk}(\theta) = \int_X \frac{\partial^2 i(x,\theta)}{\partial\theta_j\,\partial\theta_k}\, p(x,\theta)\,dx = \mathrm{E}\!\left[\frac{\partial^2 i(x,\theta)}{\partial\theta_j\,\partial\theta_k}\right].</math>
To show that the equivalent form equals the above definition note that
:<math>\frac{\partial^2 \log p(x,\theta)}{\partial\theta_j\,\partial\theta_k} = \frac{1}{p(x,\theta)} \frac{\partial^2 p(x,\theta)}{\partial\theta_j\,\partial\theta_k} - \frac{\partial \log p(x,\theta)}{\partial\theta_j} \frac{\partial \log p(x,\theta)}{\partial\theta_k}</math>
and apply <math>\mathrm{E}[\,\cdot\,]</math> on both sides; the expectation of the first term on the right vanishes, because the normalization <math>\int_X p(x,\theta)\,dx = 1</math> is independent of <math>\theta</math>, so its derivatives with respect to <math>\theta_j</math> and <math>\theta_k</math> are zero.
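As a quick sanity check (a numerical sketch, not from the article; the names below are ad hoc), both forms can be evaluated for the Bernoulli family, where the Fisher information is known to be <math>1/(\theta(1-\theta))</math>:

```python
# Sketch: for p(x; theta) = theta^x (1 - theta)^(1 - x) with x in {0, 1},
# check that E[(d log p / d theta)^2] and E[d^2 i / d theta^2] (with
# i = -log p) both equal the known value 1 / (theta * (1 - theta)).
import numpy as np

def log_p(x, theta):
    return x * np.log(theta) + (1 - x) * np.log(1 - theta)

theta, eps = 0.3, 1e-4
xs = np.array([0.0, 1.0])
probs = np.exp(log_p(xs, theta))

# First form: expectation of the squared score (central difference in theta).
score = (log_p(xs, theta + eps) - log_p(xs, theta - eps)) / (2 * eps)
g_score = np.sum(score**2 * probs)

# Equivalent form: expectation of the second derivative of i(x, theta) = -log p.
d2i = -(log_p(xs, theta + eps) - 2 * log_p(xs, theta) + log_p(xs, theta - eps)) / eps**2
g_hessian = np.sum(d2i * probs)

print(g_score, g_hessian, 1 / (theta * (1 - theta)))  # all ≈ 4.7619
```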
Relation to the Kullback–Leibler divergence
Alternatively, the metric can be obtained as the second derivative of the ''relative entropy'' or Kullback–Leibler divergence. To obtain this, one considers two probability distributions <math>P = p(x,\theta)</math> and <math>Q = p(x,\theta + \Delta\theta)</math>, which are infinitesimally close to one another, so that
:<math>Q = P + \sum_j \Delta\theta_j \frac{\partial P}{\partial\theta_j}</math>
with <math>\Delta\theta_j</math> an infinitesimally small change of <math>\theta</math> in the ''j'' direction. Then, since the Kullback–Leibler divergence