In statistics, maximum likelihood estimation (MLE) is a method of estimating the parameters of an assumed probability distribution, given some observed data. This is achieved by maximizing a likelihood function so that, under the assumed statistical model, the observed data is most probable. The point in the parameter space that maximizes the likelihood function is called the maximum likelihood estimate. The logic of maximum likelihood is both intuitive and flexible, and as such the method has become a dominant means of statistical inference.
If the likelihood function is differentiable, the derivative test for finding maxima can be applied. In some cases, the first-order conditions of the likelihood function can be solved analytically; for instance, the ordinary least squares estimator for a linear regression model maximizes the likelihood when all observed outcomes are assumed to have normal distributions with the same variance.
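The equivalence between least squares and Gaussian maximum likelihood can be checked numerically. The sketch below uses hypothetical data and, for brevity, a no-intercept regression model y = b·x: the closed-form OLS slope attains a higher Gaussian log-likelihood than nearby slopes.

```python
import math

# Hypothetical data: y roughly proportional to x, for illustration only.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0.1, 2.2, 3.9, 6.1, 8.0]

def ols_slope(xs, ys):
    """Closed-form OLS slope for the no-intercept model y = b*x."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

def gaussian_loglik(b, xs, ys, sigma=1.0):
    """Log-likelihood under y_i ~ Normal(b*x_i, sigma^2)."""
    n = len(xs)
    rss = sum((y - b * x) ** 2 for x, y in zip(xs, ys))
    return -n / 2 * math.log(2 * math.pi * sigma**2) - rss / (2 * sigma**2)

b_hat = ols_slope(xs, ys)

# Because the log-likelihood is a decreasing function of the residual sum
# of squares, the OLS slope is also the Gaussian MLE.
assert gaussian_loglik(b_hat, xs, ys) > gaussian_loglik(b_hat + 0.1, xs, ys)
assert gaussian_loglik(b_hat, xs, ys) > gaussian_loglik(b_hat - 0.1, xs, ys)
```

The same argument extends to regression with an intercept: the log-likelihood differs from the negative residual sum of squares only by terms that do not depend on the coefficients.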
From the perspective of Bayesian inference, MLE is generally equivalent to maximum a posteriori (MAP) estimation with uniform prior distributions (or a normal prior distribution with a standard deviation of infinity). In frequentist inference, MLE is a special case of an extremum estimator, with the objective function being the likelihood.
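The MAP/MLE equivalence under a flat prior is easy to see directly: with a uniform prior the posterior is proportional to the likelihood, so both are maximized at the same point. A minimal sketch with a hypothetical Bernoulli sample:

```python
# Hypothetical Bernoulli sample: 3 successes out of 4.
data = [1, 1, 0, 1]

def likelihood(p):
    out = 1.0
    for x in data:
        out *= p if x else (1 - p)
    return out

def uniform_prior(p):
    return 1.0  # constant on (0, 1), so it cannot move the maximizer

# Maximize both objectives over a fine grid of the open interval (0, 1).
grid = [i / 1000 for i in range(1, 1000)]
mle = max(grid, key=likelihood)
map_est = max(grid, key=lambda p: likelihood(p) * uniform_prior(p))

# Both estimates equal the sample proportion 3/4.
assert mle == map_est == 0.75
```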
Principles
We model a set of observations as a random sample from an unknown joint probability distribution which is expressed in terms of a set of parameters. The goal of maximum likelihood estimation is to determine the parameters for which the observed data have the highest joint probability. We write the parameters governing the joint distribution as a vector θ = [θ₁, θ₂, …, θ_k]ᵀ so that this distribution falls within a parametric family { f(·; θ) : θ ∈ Θ }, where Θ is called the ''parameter space'', a finite-dimensional subset of Euclidean space. Evaluating the joint density at the observed data sample y = (y₁, y₂, …, y_n) gives a real-valued function,
: L_n(θ) = L_n(θ; y) = f_n(y; θ),
which is called the likelihood function. For independent and identically distributed random variables, f_n(y; θ) will be the product of univariate density functions:
: f_n(y; θ) = ∏_{k=1}^{n} f(y_k; θ).
The goal of maximum likelihood estimation is to find the values of the model parameters that maximize the likelihood function over the parameter space, that is:
: θ̂ = argmax_{θ ∈ Θ} L_n(θ; y).
Intuitively, this selects the parameter values that make the observed data most probable. The specific value θ̂ ∈ Θ that maximizes the likelihood function L_n is called the maximum likelihood estimate. Further, if the function θ̂_n : ℝⁿ → Θ so defined is measurable, then it is called the maximum likelihood estimator. It is generally a function defined over the sample space, i.e. taking a given sample as its argument. A sufficient but not necessary condition for its existence is for the likelihood function to be continuous over a parameter space Θ that is compact. For an open Θ the likelihood function may increase without ever reaching a supremum value.
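The maximization just described can be illustrated with a hypothetical Bernoulli sample, whose likelihood L(p) = ∏ p^x (1 − p)^(1−x) is maximized at the sample proportion. The grid search below is a sketch, not an efficient estimator:

```python
# Hypothetical Bernoulli sample: 7 successes out of 10.
data = [1, 0, 1, 1, 0, 1, 1, 1, 0, 1]

def likelihood(p, data):
    """L(p) = product of p^x * (1-p)^(1-x) over the sample."""
    out = 1.0
    for x in data:
        out *= p if x == 1 else (1 - p)
    return out

# Maximize over a fine grid of the parameter space (0, 1).
grid = [i / 1000 for i in range(1, 1000)]
p_hat = max(grid, key=lambda p: likelihood(p, data))

# The maximizer is the sample proportion, 7/10.
assert abs(p_hat - 0.7) < 1e-12
```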
In practice, it is often convenient to work with the natural logarithm of the likelihood function, called the log-likelihood:
: ℓ(θ; y) = ln L_n(θ; y).
Since the logarithm is a monotonic function, the maximum of ℓ(θ; y) occurs at the same value of θ as does the maximum of L_n.
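Beyond analytical convenience, the log-likelihood matters numerically: a product of many densities underflows floating-point arithmetic, while the corresponding sum of logarithms stays well-scaled and, by monotonicity, has the same maximizer. A sketch with a hypothetical sample:

```python
import math

# Hypothetical Bernoulli sample: 1200 successes out of 2000.
n = 2000
data = [1] * 1200 + [0] * 800

def likelihood(p):
    out = 1.0
    for x in data:
        out *= p if x else (1 - p)
    return out

def log_likelihood(p):
    k = sum(data)
    return k * math.log(p) + (n - k) * math.log(1 - p)

assert likelihood(0.6) == 0.0              # the raw product underflows to 0.0
assert math.isfinite(log_likelihood(0.6))  # the log-likelihood is well-behaved

# The log-likelihood is still maximized at the sample proportion 0.6.
grid = [i / 100 for i in range(1, 100)]
assert max(grid, key=log_likelihood) == 0.6
```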
If ℓ(θ; y) is differentiable in Θ, the necessary conditions for the occurrence of a maximum (or a minimum) are
: ∂ℓ/∂θ₁ = 0, ∂ℓ/∂θ₂ = 0, …, ∂ℓ/∂θ_k = 0,
known as the likelihood equations. For some models, these equations can be explicitly solved for θ̂, but in general no closed-form solution to the maximization problem is known or available, and an MLE can only be found via numerical optimization. Another problem is that in finite samples, there may exist multiple roots for the likelihood equations. Whether the identified root θ̂ of the likelihood equations is indeed a (local) maximum depends on whether the matrix of second-order partial and cross-partial derivatives, the so-called Hessian matrix
: H(θ̂) = [ ∂²ℓ/∂θ_i ∂θ_j ]_{θ = θ̂},
is negative semi-definite at θ̂, as this indicates local concavity. Conveniently, most common probability distributions – in particular the exponential family – are logarithmically concave.
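Both routes, analytic and numerical, can be compared on a one-parameter example. For an Exponential(λ) sample, ℓ(λ) = n ln λ − λ Σx, the likelihood equation dℓ/dλ = n/λ − Σx = 0 has the closed form λ̂ = n/Σx, and the second derivative is negative, confirming a maximum. The sketch below (with hypothetical data) also recovers λ̂ by Newton's method on the score:

```python
# Hypothetical exponential observations.
data = [0.8, 1.3, 0.4, 2.1, 0.9]
n, s = len(data), sum(data)

def score(lam):
    """dl/dlam = n/lam - sum(x)."""
    return n / lam - s

def score_prime(lam):
    """d2l/dlam2; negative everywhere, so any root is a maximum."""
    return -n / lam**2

lam = 1.0  # starting guess
for _ in range(50):  # Newton's method on the likelihood equation
    lam -= score(lam) / score_prime(lam)

assert abs(lam - n / s) < 1e-9   # matches the closed-form MLE, n / sum(x)
assert score_prime(lam) < 0      # second-order condition: a local maximum
```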
Restricted parameter space
While the domain of the likelihood function—the parameter space—is generally a finite-dimensional subset of Euclidean space, additional restrictions sometimes need to be incorporated into the estimation process. The parameter space can be expressed as
: Θ = { θ : θ ∈ ℝᵏ, h(θ) = 0 },
where h(θ) = [h₁(θ), h₂(θ), …, h_r(θ)] is a vector-valued function mapping ℝᵏ into ℝʳ. Estimating the true parameter θ belonging to Θ then, as a practical matter, means to find the maximum of the likelihood function subject to the constraint h(θ) = 0.
Theoretically, the most natural approach to this constrained optimization problem is the method of substitution, that is "filling out" the restrictions h₁, h₂, …, h_r to a set h₁, h₂, …, h_r, h_{r+1}, …, h_k in such a way that h* = [h₁, h₂, …, h_k] is a one-to-one function from ℝᵏ to itself, and reparameterize the likelihood function by setting φᵢ = hᵢ(θ). Because of the equivariance of the maximum likelihood estimator, the properties of the MLE apply to the restricted estimates also. For instance, in a multivariate normal distribution the covariance matrix Σ must be positive-definite; this restriction can be imposed by replacing Σ = ΓᵀΓ, where Γ is a real upper triangular matrix and Γᵀ is its transpose.
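The covariance reparameterization can be made concrete. The sketch below builds a hypothetical 2×2 covariance as Σ = ΓᵀΓ from the free entries of an upper triangular Γ; any Γ with nonzero diagonal yields a symmetric positive-definite Σ, so an optimizer can search over Γ unconstrained:

```python
def sigma_from_gamma(g11, g12, g22):
    """Build Sigma = Gamma^T Gamma, with Gamma = [[g11, g12], [0, g22]]."""
    return [
        [g11 * g11,              g11 * g12],
        [g11 * g12,  g12 * g12 + g22 * g22],
    ]

# Hypothetical free parameters; any values with g11, g22 != 0 work.
sigma = sigma_from_gamma(2.0, -1.0, 0.5)

# Symmetric by construction:
assert sigma[0][1] == sigma[1][0]
# Positive-definite by Sylvester's criterion (2x2 case):
det = sigma[0][0] * sigma[1][1] - sigma[0][1] * sigma[1][0]
assert sigma[0][0] > 0 and det > 0
```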
In practice, restrictions are usually imposed using the method of Lagrange which, given the constraints as defined above, leads to the ''restricted likelihood equations''
: ∂ℓ/∂θ − (∂h(θ)ᵀ/∂θ) λ = 0   and   h(θ) = 0,
where λ = [λ₁, λ₂, …, λ_r]ᵀ is a column-vector of Lagrange multipliers and ∂h(θ)ᵀ/∂θ is the Jacobian matrix of partial derivatives. Naturally, if the constraints are not binding at the maximum, the Lagrange multipliers should be zero. This in turn allows for a statistical test of the "validity" of the constraint, known as the Lagrange multiplier test.
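A classic worked instance of the Lagrange approach is the multinomial model: maximize ℓ(p) = Σ nᵢ ln pᵢ subject to h(p) = Σ pᵢ − 1 = 0. Stationarity of the Lagrangian gives nᵢ/pᵢ − λ = 0, and the constraint forces λ = n, so p̂ᵢ = nᵢ/n. The sketch below verifies this with hypothetical counts:

```python
import math

# Hypothetical category counts for a three-outcome multinomial.
counts = [3, 5, 2]
n = sum(counts)
p_hat = [c / n for c in counts]  # the Lagrange solution p_i = n_i / n

def loglik(p):
    return sum(c * math.log(q) for c, q in zip(counts, p))

# Every stationarity condition n_i / p_i = lambda holds with lambda = n.
assert all(abs(c / q - n) < 1e-9 for c, q in zip(counts, p_hat))
# And p_hat beats other feasible (summing-to-one) candidates.
assert loglik(p_hat) > loglik([0.4, 0.4, 0.2])
assert loglik(p_hat) > loglik([1 / 3, 1 / 3, 1 / 3])
```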
Properties
A maximum likelihood estimator is an extremum estimator obtained by maximizing, as a function of ''θ'', the objective function ℓ̂(θ; x). If the data are independent and identically distributed, then we have
: ℓ̂(θ; x) = (1/n) ∑_{i=1}^{n} ln f(xᵢ | θ),
this being the sample analogue of the expected log-likelihood ℓ(θ) = E[ln f(xᵢ | θ)].
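The "sample analogue" claim can be checked by simulation: by the law of large numbers, the average log-density over a large sample approaches the expected log-likelihood. For the standard normal density, E[ln f(X)] = −½ ln(2π) − ½ exactly. A seeded sketch:

```python
import math
import random

random.seed(0)  # deterministic hypothetical sample

def log_density(x):
    """ln f(x) for the standard normal density."""
    return -0.5 * math.log(2 * math.pi) - 0.5 * x * x

sample = [random.gauss(0.0, 1.0) for _ in range(100_000)]
avg_loglik = sum(log_density(x) for x in sample) / len(sample)
expected = -0.5 * math.log(2 * math.pi) - 0.5  # exact E[ln f(X)]

# The sample average of ln f is close to its expectation.
assert abs(avg_loglik - expected) < 0.02
```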