A likelihood function (often simply called the likelihood) measures how well a
statistical model
explains
observed data by calculating the probability of seeing that data under different
parameter
values of the model. It is constructed from the
joint probability distribution
of the
random variable
that (presumably) generated the observations. When evaluated on the actual data points, it becomes a function solely of the model parameters.
In
maximum likelihood estimation
, the
argument that maximizes the likelihood function serves as a
point estimate for the unknown parameter, while the
Fisher information
(often approximated by the likelihood's
Hessian matrix
at the maximum) gives an indication of the estimate's
precision.
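The role of the likelihood in maximum likelihood estimation can be illustrated with a short sketch. The Bernoulli data below are invented for illustration, and a grid search stands in for analytic maximization (the analytic MLE for this model is simply the sample mean):

```python
import math

# Illustrative observations from a Bernoulli model: 1 = heads, 0 = tails.
data = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]  # seven heads out of ten

def log_likelihood(theta: float) -> float:
    """Log-likelihood of heads-probability theta for the sample above."""
    return sum(math.log(theta if x == 1 else 1 - theta) for x in data)

# Grid search over the open parameter space (0, 1).  The log-likelihood is
# concave here, so the grid point nearest the analytic MLE (7/10) wins.
grid = [i / 1000 for i in range(1, 1000)]
mle = max(grid, key=log_likelihood)
print(mle)  # 0.7
```

Maximizing the log-likelihood rather than the likelihood itself is standard practice: it avoids floating-point underflow for large samples and leaves the maximizer unchanged, since the logarithm is monotone.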
In contrast, in
Bayesian statistics
, the estimate of interest is the ''converse'' of the likelihood, the so-called
posterior probability
of the parameter given the observed data, which is calculated via
Bayes' rule.
Definition
The likelihood function, parameterized by a (possibly multivariate) parameter
\(\theta\), is usually defined differently for
discrete and continuous probability distributions
(a more general definition is discussed below). Given a probability density or mass function
\[x \mapsto f(x \mid \theta),\]
where \(x\) is a realization of the random variable \(X\), the likelihood function is
\[\theta \mapsto f(x \mid \theta),\]
often written
\[\mathcal{L}(\theta \mid x).\]
In other words, when \(f(x \mid \theta)\) is viewed as a function of \(x\) with \(\theta\) fixed, it is a probability density function, and when viewed as a function of \(\theta\) with \(x\) fixed, it is a likelihood function. In the
frequentist paradigm, the notation \(f(x \mid \theta)\) is often avoided and instead \(f(x; \theta)\) or \(f(x, \theta)\) are used to indicate that \(\theta\) is regarded as a fixed unknown quantity rather than as a
random variable being conditioned on.
The likelihood function does ''not'' specify the probability that \(\theta\) is the truth, given the observed sample \(X = x\). Such an interpretation is a common error, with potentially disastrous consequences (see
prosecutor's fallacy).
Discrete probability distribution
Let \(X\) be a discrete
random variable with
probability mass function \(p\) depending on a parameter \(\theta\). Then the function
\[\mathcal{L}(\theta \mid x) = p_\theta(x) = P_\theta(X = x),\]
considered as a function of \(\theta\), is the ''likelihood function'', given the
outcome \(x\) of the random variable \(X\). Sometimes the probability of "the value \(x\) of \(X\) for the parameter value \(\theta\)" is written as \(P(X = x \mid \theta)\) or \(P(X = x; \theta)\). The likelihood is the probability that a particular outcome \(x\) is observed when the true value of the parameter is \(\theta\), equivalent to the probability mass on \(x\); it is ''not'' a probability density over the parameter \(\theta\). The likelihood, \(\mathcal{L}(\theta \mid x)\), should not be confused with \(P(\theta \mid x)\), which is the posterior probability of \(\theta\) given the data \(x\).
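The dual reading of a probability mass function can be checked numerically. The Poisson model and the observed value \(x = 3\) below are illustrative assumptions, not part of the text:

```python
import math

def poisson_pmf(x: int, lam: float) -> float:
    """Poisson probability mass p(x; lam) = exp(-lam) * lam**x / x!"""
    return math.exp(-lam) * lam ** x / math.factorial(x)

# Fixed parameter lam = 2: summing the pmf over outcomes x gives 1,
# as a probability distribution over x must.
lam = 2.0
total_over_x = sum(poisson_pmf(x, lam) for x in range(100))
print(round(total_over_x, 6))  # 1.0

# Fixed outcome x = 3: the same formula, read as a function of lam,
# is the likelihood L(lam | x=3).  It ranks parameter values by fit
# but is not a probability distribution over lam.
x_obs = 3
print(poisson_pmf(x_obs, 1.0) < poisson_pmf(x_obs, 3.0))  # True: lam=3 fits x=3 better
```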
Example
Consider a simple statistical model of a coin flip: a single parameter \(p_\text{H}\) that expresses the "fairness" of the coin. The parameter is the probability that a coin lands heads up ("H") when tossed. \(p_\text{H}\) can take on any value within the range 0.0 to 1.0. For a perfectly
fair coin, \(p_\text{H} = 0.5\).
Imagine flipping a fair coin twice, and observing two heads in two tosses ("HH"). Assuming that each successive coin flip is
i.i.d., then the probability of observing HH is
\[P(\text{HH} \mid p_\text{H} = 0.5) = 0.5^2 = 0.25.\]
Equivalently, the likelihood of observing "HH" assuming \(p_\text{H} = 0.5\) is
\[\mathcal{L}(p_\text{H} = 0.5 \mid \text{HH}) = 0.25.\]
This is not the same as saying that \(P(p_\text{H} = 0.5 \mid \text{HH}) = 0.25\), a conclusion which could only be reached via
Bayes' theorem given knowledge about the marginal probabilities \(P(p_\text{H} = 0.5)\) and \(P(\text{HH})\).
Now suppose that the coin is not a fair coin, but instead that \(p_\text{H} = 0.3\). Then the probability of two heads on two flips is
\[P(\text{HH} \mid p_\text{H} = 0.3) = 0.3^2 = 0.09.\]
Hence
\[\mathcal{L}(p_\text{H} = 0.3 \mid \text{HH}) = 0.09.\]
More generally, for each value of \(p_\text{H}\), we can calculate the corresponding likelihood. The result of such calculations is displayed in Figure 1. The integral of \(\mathcal{L}\) over [0, 1] is 1/3; likelihoods need not integrate or sum to one over the parameter space.
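The likelihood curve of Figure 1 can be reproduced in a few lines; the numerical integration below is a midpoint-rule sketch standing in for the exact integral \(\int_0^1 p^2 \, dp = 1/3\):

```python
# Likelihood of p_H after observing two heads ("HH"): L(p_H | HH) = p_H**2.
def likelihood(p_h: float) -> float:
    """Likelihood of heads-probability p_h given two observed heads."""
    return p_h ** 2

print(likelihood(0.5))  # 0.25
print(likelihood(0.3))  # approximately 0.09

# Midpoint-rule integration of the likelihood over [0, 1].  The exact
# value is 1/3, illustrating that a likelihood need not integrate to one.
n = 100_000
total = sum(likelihood((i + 0.5) / n) for i in range(n)) / n
print(round(total, 4))  # 0.3333
```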
Continuous probability distribution
Let \(X\) be a
random variable following an
absolutely continuous probability distribution with
density function \(f\) (a function of \(x\)) which depends on a parameter \(\theta\). Then the function
\[\mathcal{L}(\theta \mid x) = f(x \mid \theta),\]
considered as a function of \(\theta\), is the ''likelihood function'' (of \(\theta\), given the
outcome \(X = x\)). Again, \(\mathcal{L}\) is not a probability density or mass function over \(\theta\), despite being a function of \(\theta\) given the observation \(X = x\).
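A minimal continuous sketch, assuming an invented single observation \(x = 1.7\) from a normal model with unit variance and unknown mean \(\mu\):

```python
import math

def normal_pdf(x: float, mu: float, sigma: float = 1.0) -> float:
    """Density of N(mu, sigma^2) evaluated at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

x_obs = 1.7  # a single observed data point (illustrative)

# Read as a function of x with mu fixed, normal_pdf is a density;
# read as a function of mu with x_obs fixed, it is the likelihood L(mu | x_obs).
likelihoods = {mu: normal_pdf(x_obs, mu) for mu in (0.0, 1.0, 1.7, 3.0)}

# For a single normal observation the likelihood is maximized at mu = x_obs.
best_mu = max(likelihoods, key=likelihoods.get)
print(best_mu)  # 1.7
```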
Relationship between the likelihood and probability density functions
The use of the
probability density in specifying the likelihood function above is justified as follows. Given an observation \(x_j\), the likelihood for the interval