In mathematical statistics, the Fisher information is a way of measuring the amount of information that an observable random variable ''X'' carries about an unknown parameter ''θ'' of a distribution that models ''X''. Formally, it is the variance of the score, or the expected value of the observed information.
The role of the Fisher information in the asymptotic theory of maximum-likelihood estimation was emphasized and explored by the statistician Sir Ronald Fisher (following some initial results by Francis Ysidro Edgeworth). The Fisher information matrix is used to calculate the covariance matrices associated with maximum-likelihood estimates. It can also be used in the formulation of test statistics, such as the Wald test.
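For illustration (a minimal numerical sketch added here, not part of the original text; the Bernoulli model, sample size, and seed are arbitrary choices), the connection between the Fisher information and the covariance of the maximum-likelihood estimate can be checked by simulation, comparing the empirical variance of the MLE with the asymptotic value <math>1/(n\,\mathcal{I}(\theta))</math>, where <math>\mathcal{I}(\theta) = 1/(\theta(1-\theta))</math> for a Bernoulli(''θ'') observation:

<syntaxhighlight lang="python">
import numpy as np

# Illustrative sketch: for X_1, ..., X_n ~ Bernoulli(theta), the MLE is the
# sample mean and its asymptotic variance is 1 / (n * I(theta)), where the
# per-observation Fisher information is I(theta) = 1 / (theta * (1 - theta)).
rng = np.random.default_rng(0)
theta, n, replications = 0.3, 500, 20_000  # arbitrary illustrative values

# Empirical variance of the MLE over many simulated samples.
mles = rng.binomial(n, theta, size=replications) / n
empirical_var = mles.var()

# Variance predicted from the Fisher information.
fisher_info = 1.0 / (theta * (1.0 - theta))
predicted_var = 1.0 / (n * fisher_info)

print(f"empirical variance of the MLE: {empirical_var:.6f}")
print(f"1 / (n I(theta)):              {predicted_var:.6f}")
</syntaxhighlight>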
In Bayesian statistics, the Fisher information plays a role in the derivation of non-informative prior distributions according to Jeffreys' rule. It also appears as the large-sample covariance of the posterior distribution, provided that the prior is sufficiently smooth (a result known as the Bernstein–von Mises theorem, which was anticipated by Laplace for exponential families). The same result is used when approximating the posterior with Laplace's approximation, where the Fisher information appears as the covariance of the fitted Gaussian.
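As a minimal sketch of this last use (an illustration added here, assuming a Bernoulli likelihood with a flat prior; the parameter value, sample size, and seed are arbitrary), the code below forms Laplace's approximation by placing the mean of the fitted Gaussian at the posterior mode and taking its variance to be the inverse of the observed Fisher information at that mode:

<syntaxhighlight lang="python">
import numpy as np

# Minimal sketch of Laplace's approximation for a Bernoulli likelihood with a
# flat prior on theta: the fitted Gaussian has its mean at the posterior mode
# and its variance equal to the inverse observed Fisher information there.
rng = np.random.default_rng(1)
theta_true, n = 0.7, 200           # arbitrary illustrative values
x = rng.binomial(1, theta_true, size=n)
k = x.sum()

# Posterior mode (equal to the MLE here, since the prior is flat).
theta_hat = k / n

# Observed Fisher information: minus the second derivative of the
# log-likelihood  k*log(theta) + (n - k)*log(1 - theta)  at the mode.
observed_info = k / theta_hat**2 + (n - k) / (1.0 - theta_hat)**2

laplace_mean = theta_hat
laplace_var = 1.0 / observed_info
print(f"Laplace approximation: N(mean={laplace_mean:.3f}, var={laplace_var:.6f})")
</syntaxhighlight>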
Statistical systems of a scientific nature (physical, biological, etc.) whose likelihood functions obey shift invariance have been shown to obey maximum Fisher information. The level of the maximum depends upon the nature of the system constraints.
Definition
The Fisher information is a way of measuring the amount of information that an observable random variable ''X'' carries about an unknown parameter ''θ'' upon which the probability of ''X'' depends. Let <math>f(X;\theta)</math> be the probability density function (or probability mass function) for ''X'' conditioned on the value of ''θ''. It describes the probability that we observe a given outcome of ''X'', ''given'' a known value of ''θ''. If ''f'' is sharply peaked with respect to changes in ''θ'', it is easy to indicate the "correct" value of ''θ'' from the data, or equivalently, that the data ''X'' provides a lot of information about the parameter ''θ''. If ''f'' is flat and spread-out, then it would take many samples of ''X'' to estimate the actual "true" value of ''θ'' that ''would'' be obtained using the entire population being sampled. This suggests studying some kind of variance with respect to ''θ''.
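The following short sketch (added for illustration; the Bernoulli model, grid, sample sizes, and seed are arbitrary choices) shows this effect numerically: the normalized likelihood of ''θ'' becomes much more sharply peaked, and therefore more informative, as the number of observations grows:

<syntaxhighlight lang="python">
import numpy as np

# Sketch of the intuition above: the likelihood of theta for a Bernoulli sample
# becomes more sharply peaked (more informative) as the sample size grows.
rng = np.random.default_rng(2)
theta_true = 0.4                         # arbitrary illustrative value
grid = np.linspace(0.01, 0.99, 99)       # grid of candidate theta values

for n in (10, 1000):
    x = rng.binomial(1, theta_true, size=n)
    k = x.sum()
    log_lik = k * np.log(grid) + (n - k) * np.log(1.0 - grid)
    lik = np.exp(log_lik - log_lik.max())            # normalized to max 1
    width = (lik > 0.5).sum() * (grid[1] - grid[0])  # rough width at half maximum
    print(f"n = {n:5d}: width of the likelihood peak ~ {width:.3f}")
</syntaxhighlight>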
Formally, the partial derivative with respect to ''θ'' of the natural logarithm of the likelihood function is called the ''score''. Under certain regularity conditions, if ''θ'' is the true parameter (i.e. ''X'' is actually distributed as <math>f(X;\theta)</math>), it can be shown that the expected value (the first moment) of the score, evaluated at the true parameter value ''θ'', is 0:
:<math>\operatorname{E}\left[\left.\frac{\partial}{\partial\theta} \log f(X;\theta)\,\right|\,\theta\right] = \int \frac{\frac{\partial}{\partial\theta} f(x;\theta)}{f(x;\theta)}\, f(x;\theta)\,dx = \frac{\partial}{\partial\theta} \int f(x;\theta)\,dx = \frac{\partial}{\partial\theta} 1 = 0.</math>
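This identity is easy to check by simulation. In the sketch below (an illustration added here, assuming a normal model with known standard deviation; the parameter values and seed are arbitrary), the score with respect to the mean is <math>(X-\mu)/\sigma^2</math>, and its Monte Carlo average at the true mean is close to zero:

<syntaxhighlight lang="python">
import numpy as np

# Numerical sketch: for X ~ N(mu, sigma^2) with known sigma, the score with
# respect to mu is (X - mu) / sigma^2; its expectation at the true parameter
# value should be (approximately) zero.
rng = np.random.default_rng(3)
mu, sigma, n = 1.5, 2.0, 1_000_000   # arbitrary illustrative values

x = rng.normal(mu, sigma, size=n)
score = (x - mu) / sigma**2
print(f"Monte Carlo estimate of E[score] at the true mu: {score.mean():.5f}")
</syntaxhighlight>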
The Fisher information is defined to be the variance of the score:
:<math>\mathcal{I}(\theta) = \operatorname{E}\left[\left.\left(\frac{\partial}{\partial\theta} \log f(X;\theta)\right)^2\,\right|\,\theta\right] = \int \left(\frac{\partial}{\partial\theta} \log f(x;\theta)\right)^2 f(x;\theta)\,dx.</math>
Note that <math>\mathcal{I}(\theta) \geq 0</math>. A random variable carrying high Fisher information implies that the absolute value of the score is often high. The Fisher information is not a function of a particular observation, as the random variable ''X'' has been averaged out.
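As a worked example (an illustration added here, not part of the original text), let ''X'' be a Bernoulli random variable with success probability ''θ'', so that <math>f(x;\theta) = \theta^x (1-\theta)^{1-x}</math> for <math>x \in \{0,1\}</math>. The score is <math>\tfrac{X}{\theta} - \tfrac{1-X}{1-\theta}</math>, which equals <math>1/\theta</math> with probability ''θ'' and <math>-1/(1-\theta)</math> with probability <math>1-\theta</math>, so the definition above gives
:<math>\mathcal{I}(\theta) = \theta \cdot \frac{1}{\theta^2} + (1-\theta) \cdot \frac{1}{(1-\theta)^2} = \frac{1}{\theta} + \frac{1}{1-\theta} = \frac{1}{\theta(1-\theta)}.</math>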
If <math>\log f(x;\theta)</math> is twice differentiable with respect to ''θ'', and under certain additional regularity conditions, then the Fisher information may also be written as
:<math>\mathcal{I}(\theta) = -\operatorname{E}\left[\left.\frac{\partial^2}{\partial\theta^2} \log f(X;\theta)\,\right|\,\theta\right].</math>
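The equivalence of the two expressions can also be checked numerically. In the sketch below (an illustration added here, assuming a Poisson model; the rate and seed are arbitrary), both the variance of the score and the negative expected second derivative of the log-likelihood approximate the exact value <math>\mathcal{I}(\lambda) = 1/\lambda</math>:

<syntaxhighlight lang="python">
import numpy as np

# Numerical sketch: for X ~ Poisson(lam), log f(x; lam) = x*log(lam) - lam - log(x!),
# so the score is x/lam - 1 and the second derivative is -x/lam**2.  Both
# quantities below should approximate the Fisher information I(lam) = 1/lam.
rng = np.random.default_rng(4)
lam, n = 3.0, 1_000_000              # arbitrary illustrative values
x = rng.poisson(lam, size=n)

variance_of_score = np.mean((x / lam - 1.0) ** 2)
neg_expected_second_derivative = np.mean(x / lam**2)

print(f"variance of the score:        {variance_of_score:.4f}")
print(f"-E[d^2/dlam^2 log f(X; lam)]: {neg_expected_second_derivative:.4f}")
print(f"exact value 1/lam:            {1.0 / lam:.4f}")
</syntaxhighlight>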
Begin by taking the second derivative of <math>\log f(X;\theta)</math>:
:<math>\frac{\partial^2}{\partial\theta^2} \log f(X;\theta) = \frac{\frac{\partial^2}{\partial\theta^2} f(X;\theta)}{f(X;\theta)} - \left(\frac{\frac{\partial}{\partial\theta} f(X;\theta)}{f(X;\theta)}\right)^2 = -\left(\frac{\partial}{\partial\theta} \log f(X;\theta)\right)^2 + \frac{\frac{\partial^2}{\partial\theta^2} f(X;\theta)}{f(X;\theta)}.</math>
Now take the expected value of each term, conditional on ''θ'':
:<math>\operatorname{E}\left[\left.\frac{\partial^2}{\partial\theta^2} \log f(X;\theta)\,\right|\,\theta\right] = -\operatorname{E}\left[\left.\left(\frac{\partial}{\partial\theta} \log f(X;\theta)\right)^2\,\right|\,\theta\right] + \operatorname{E}\left[\left.\frac{\frac{\partial^2}{\partial\theta^2} f(X;\theta)}{f(X;\theta)}\,\right|\,\theta\right].</math>
Next, we show that the last term is equal to 0:
:<math>\operatorname{E}\left[\left.\frac{\frac{\partial^2}{\partial\theta^2} f(X;\theta)}{f(X;\theta)}\,\right|\,\theta\right] = \int \frac{\frac{\partial^2}{\partial\theta^2} f(x;\theta)}{f(x;\theta)}\, f(x;\theta)\,dx = \frac{\partial^2}{\partial\theta^2} \int f(x;\theta)\,dx = \frac{\partial^2}{\partial\theta^2} 1 = 0,</math>
where exchanging differentiation and integration is justified by the regularity conditions.
Therefore,
:<math>\mathcal{I}(\theta) = \operatorname{E}\left[\left.\left(\frac{\partial}{\partial\theta} \log f(X;\theta)\right)^2\,\right|\,\theta\right] = -\operatorname{E}\left[\left.\frac{\partial^2}{\partial\theta^2} \log f(X;\theta)\,\right|\,\theta\right].</math>