In statistics, maximum likelihood estimation (MLE) is a method of estimating the parameters of an assumed probability distribution, given some observed data. This is achieved by maximizing a likelihood function so that, under the assumed statistical model, the observed data is most probable. The point in the parameter space that maximizes the likelihood function is called the maximum likelihood estimate. The logic of maximum likelihood is both intuitive and flexible, and as such the method has become a dominant means of statistical inference.
If the likelihood function is differentiable, the derivative test for finding maxima can be applied. In some cases, the first-order conditions of the likelihood function can be solved analytically; for instance, the ordinary least squares estimator for a linear regression model maximizes the likelihood when all observed outcomes are assumed to have normal distributions with the same variance.
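As a minimal numerical illustration of this equivalence (using simulated data and a fixed, assumed variance, so the details here are illustrative rather than part of the theory), the value of the mean that maximizes the normal likelihood coincides with the least-squares estimate, the sample mean:

```python
import numpy as np

# Sketch with simulated data: for y_i ~ N(mu, sigma^2) with known sigma,
# the mu maximizing the normal likelihood equals the least-squares estimate.
rng = np.random.default_rng(0)
y = rng.normal(loc=2.0, scale=1.0, size=500)

def neg_log_likelihood(mu, y, sigma=1.0):
    # -log L(mu) = (n/2) log(2 pi sigma^2) + sum((y - mu)^2) / (2 sigma^2)
    n = y.size
    return 0.5 * n * np.log(2 * np.pi * sigma**2) + np.sum((y - mu) ** 2) / (2 * sigma**2)

grid = np.linspace(0.0, 4.0, 4001)              # candidate values of mu, step 0.001
nll = np.array([neg_log_likelihood(m, y) for m in grid])
mu_mle = grid[np.argmin(nll)]                   # grid-search MLE
mu_ols = y.mean()                               # least-squares estimate

print(abs(mu_mle - mu_ols) < 1e-3)              # the two estimates agree
```

Minimizing the negative log-likelihood is equivalent to minimizing the sum of squared residuals here, because the only μ-dependent term is the quadratic one.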
From the perspective of Bayesian inference, MLE is generally equivalent to maximum a posteriori (MAP) estimation with a uniform prior distribution (or a normal prior distribution with a standard deviation of infinity). In frequentist inference, MLE is a special case of an extremum estimator, with the objective function being the likelihood.
Principles
We model a set of observations as a random sample from an unknown joint probability distribution which is expressed in terms of a set of parameters. The goal of maximum likelihood estimation is to determine the parameters for which the observed data have the highest joint probability. We write the parameters governing the joint distribution as a vector $\theta = [\theta_1, \theta_2, \ldots, \theta_k]^{\mathsf{T}}$ so that this distribution falls within a parametric family $\{ f(\cdot\,; \theta) \mid \theta \in \Theta \}$, where $\Theta$ is called the ''parameter space'', a finite-dimensional subset of Euclidean space. Evaluating the joint density at the observed data sample $\mathbf{y} = (y_1, y_2, \ldots, y_n)$ gives a real-valued function,
: \mathcal{L}_n(\theta) = \mathcal{L}_n(\theta; \mathbf{y}) = f_n(\mathbf{y}; \theta),
which is called the likelihood function. For independent and identically distributed random variables, $f_n(\mathbf{y}; \theta)$ will be the product of univariate density functions:
: f_n(\mathbf{y}; \theta) = \prod_{k=1}^{n} f(y_k; \theta).
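As a small sketch of this product form (with an assumed Bernoulli sample, so the data here are purely illustrative), the product of the univariate mass functions agrees with the closed-form expression $p^s (1-p)^{n-s}$, and the likelihood is maximized at the sample mean:

```python
import numpy as np

# Assumed sample: 6 successes in 8 Bernoulli trials.
y = np.array([1, 0, 1, 1, 0, 1, 1, 1])

def likelihood(p, y):
    # Joint likelihood as the product of univariate mass functions
    # f(y_k; p) = p^y_k * (1 - p)^(1 - y_k).
    return np.prod(p ** y * (1 - p) ** (1 - y))

s, n = y.sum(), y.size
p = 0.7
# The product form agrees with the closed form p^s (1-p)^(n-s):
print(np.isclose(likelihood(p, y), p ** s * (1 - p) ** (n - s)))

# Maximizing over a grid recovers the sample mean, the Bernoulli MLE:
grid = np.linspace(0.01, 0.99, 99)
p_mle = grid[np.argmax([likelihood(q, y) for q in grid])]
print(np.isclose(p_mle, y.mean()))
```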
The goal of maximum likelihood estimation is to find the values of the model parameters that maximize the likelihood function over the parameter space, that is,
: \hat{\theta} = \underset{\theta \in \Theta}{\operatorname{arg\,max}} \, \mathcal{L}_n(\theta; \mathbf{y}).
Intuitively, this selects the parameter values that make the observed data most probable. The specific value $\hat{\theta} = \hat{\theta}_n(\mathbf{y}) \in \Theta$ that maximizes the likelihood function $\mathcal{L}_n$ is called the maximum likelihood estimate. Further, if the function $\hat{\theta}_n : \mathbb{R}^n \to \Theta$ so defined is measurable, then it is called the maximum likelihood estimator. It is generally a function defined over the sample space, i.e. taking a given sample as its argument. A sufficient but not necessary condition for its existence is for the likelihood function to be continuous over a parameter space $\Theta$ that is compact. For an open $\Theta$ the likelihood function may increase without ever reaching a supremum value.
In practice, it is often convenient to work with the natural logarithm of the likelihood function, called the log-likelihood:
: \ell(\theta; \mathbf{y}) = \ln \mathcal{L}_n(\theta; \mathbf{y}).
Since the logarithm is a monotonic function, the maximum of $\ell(\theta; \mathbf{y})$ occurs at the same value of $\theta$ as does the maximum of $\mathcal{L}_n$. If $\ell(\theta; \mathbf{y})$ is differentiable in $\Theta$, the necessary conditions for the occurrence of a maximum (or a minimum) are
: \frac{\partial \ell}{\partial \theta_1} = 0, \quad \frac{\partial \ell}{\partial \theta_2} = 0, \quad \ldots, \quad \frac{\partial \ell}{\partial \theta_k} = 0,
known as the likelihood equations. For some models, these equations can be explicitly solved for $\hat{\theta}$, but in general no closed-form solution to the maximization problem is known or available, and an MLE can only be found via numerical optimization. Another problem is that in finite samples, there may exist multiple roots for the likelihood equations. Whether the identified root $\hat{\theta}$
of the likelihood equations is indeed a (local) maximum depends on whether the matrix of second-order partial and cross-partial derivatives, the so-called Hessian matrix
: \mathbf{H}\bigl(\hat{\theta}\bigr) = \left[ \left. \frac{\partial^2 \ell}{\partial \theta_i \, \partial \theta_j} \right|_{\theta = \hat{\theta}} \right]_{i, j = 1, \ldots, k},
is negative semi-definite at $\hat{\theta}$, as this indicates local concavity. Conveniently, most common probability distributions – in particular the exponential family – are logarithmically concave.
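As a worked example of the likelihood equations and the concavity check, consider an i.i.d. sample $y_1, \ldots, y_n$ from an exponential distribution with rate $\lambda$:

```latex
\ell(\lambda; \mathbf{y})
  = \sum_{i=1}^{n} \ln\!\left(\lambda e^{-\lambda y_i}\right)
  = n \ln \lambda - \lambda \sum_{i=1}^{n} y_i ,
\qquad
\frac{\partial \ell}{\partial \lambda}
  = \frac{n}{\lambda} - \sum_{i=1}^{n} y_i = 0
\quad\Longrightarrow\quad
\hat{\lambda} = \frac{n}{\sum_{i=1}^{n} y_i} = \frac{1}{\bar{y}} .
```

Here the second derivative $\partial^2 \ell / \partial \lambda^2 = -n / \lambda^2$ is negative everywhere, so the log-likelihood is globally concave and $\hat{\lambda}$ is indeed the maximum.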
Restricted parameter space
While the domain of the likelihood function (the parameter space) is generally a finite-dimensional subset of Euclidean space, additional restrictions sometimes need to be incorporated into the estimation process. The parameter space can be expressed as
: \Theta = \left\{ \theta : \theta \in \mathbb{R}^k, \; h(\theta) = 0 \right\},
where $h(\theta) = \left[ h_1(\theta), h_2(\theta), \ldots, h_r(\theta) \right]$ is a vector-valued function mapping $\mathbb{R}^k$ into $\mathbb{R}^r$. Estimating the true parameter $\theta$ belonging to $\Theta$ then, as a practical matter, means to find the maximum of the likelihood function subject to the constraint $h(\theta) = 0$.
Theoretically, the most natural approach to this constrained optimization problem is the method of substitution, that is, "filling out" the restrictions $h_1, h_2, \ldots, h_r$ to a set $h^{\ast} = \left[ h_1, h_2, \ldots, h_r, h_{r+1}, \ldots, h_k \right]$ in such a way that $h^{\ast}$ is a one-to-one function from $\mathbb{R}^k$ to itself, and reparameterizing the likelihood function by setting $\phi_i = h_i(\theta)$. Because of the equivariance of the maximum likelihood estimator, the properties of the MLE apply to the restricted estimates also. For instance, in a multivariate normal distribution the covariance matrix $\Sigma$ must be positive-definite; this restriction can be imposed by replacing $\Sigma = \Gamma^{\mathsf{T}} \Gamma$, where $\Gamma$ is a real upper triangular matrix and $\Gamma^{\mathsf{T}}$ is its transpose.
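A minimal sketch of this reparameterization (the dimension and parameter values below are hypothetical): any real upper triangular $\Gamma$ yields a symmetric positive semi-definite $\Sigma = \Gamma^{\mathsf{T}} \Gamma$, so an optimizer can search over the unconstrained entries of $\Gamma$ instead of over constrained covariance matrices:

```python
import numpy as np

def sigma_from_gamma(gamma_flat, k):
    # Pack the k(k+1)/2 free parameters into an upper triangular Gamma,
    # then form Sigma = Gamma^T Gamma, which is positive semi-definite
    # by construction (and positive-definite when Gamma is nonsingular).
    gamma = np.zeros((k, k))
    gamma[np.triu_indices(k)] = gamma_flat
    return gamma.T @ gamma

k = 3
rng = np.random.default_rng(1)
free = rng.normal(size=k * (k + 1) // 2)   # unconstrained parameters
sigma = sigma_from_gamma(free, k)

print(np.allclose(sigma, sigma.T))                  # symmetric
print(np.all(np.linalg.eigvalsh(sigma) >= -1e-12))  # no negative eigenvalues
```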
In practice, restrictions are usually imposed using the method of Lagrange which, given the constraints as defined above, leads to the ''restricted likelihood equations''
: \frac{\partial \ell}{\partial \theta} - \frac{\partial h(\theta)^{\mathsf{T}}}{\partial \theta} \lambda = 0
and
: h(\theta) = 0,
where $\lambda = \left[ \lambda_1, \lambda_2, \ldots, \lambda_r \right]^{\mathsf{T}}$ is a column-vector of Lagrange multipliers and $\frac{\partial h(\theta)^{\mathsf{T}}}{\partial \theta}$ is the $k \times r$ Jacobian matrix of partial derivatives.
Naturally, if the constraints are not binding at the maximum, the Lagrange multipliers should be zero. This in turn allows for a statistical test of the "validity" of the constraint, known as the
Lagrange multiplier test.
Properties
A maximum likelihood estimator is an extremum estimator obtained by maximizing, as a function of $\theta$, the objective function $\widehat{\ell}(\theta; \mathbf{y})$. If the data are independent and identically distributed, then we have
: \widehat{\ell}(\theta; \mathbf{y}) = \frac{1}{n} \sum_{i=1}^{n} \ln f(y_i \mid \theta),
this being the sample analogue of the expected log-likelihood $\ell(\theta) = \operatorname{E}\left[ \ln f(y_i \mid \theta) \right]$.
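As a quick numerical check of this sample-analogue relationship (with simulated standard-normal data, so everything below is an assumption for illustration), the average log-density over a large i.i.d. sample approaches the expected log-likelihood, which for $N(0, 1)$ equals $-\tfrac{1}{2}(\ln 2\pi + 1)$:

```python
import numpy as np

# Sketch: the average log-likelihood over an i.i.d. N(0, 1) sample
# approximates the expected log-density under the true distribution.
rng = np.random.default_rng(2)
y = rng.normal(size=100_000)

def log_density(x):
    # log pdf of the standard normal distribution
    return -0.5 * np.log(2 * np.pi) - 0.5 * x**2

sample_avg = log_density(y).mean()                 # sample analogue
expected = -0.5 * (np.log(2 * np.pi) + 1)          # E[ln f] for N(0, 1)

print(abs(sample_avg - expected) < 0.02)           # close for large n
```

By the law of large numbers the sample average converges to the expectation, which is what makes the MLE a natural estimator of the maximizer of the expected log-likelihood.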