In statistics, the score (or informant) is the gradient of the log-likelihood function with respect to the parameter vector. Evaluated at a particular point of the parameter vector, the score indicates the steepness of the log-likelihood function and thereby the sensitivity to infinitesimal changes to the parameter values. If the log-likelihood function is continuous over the parameter space, the score will vanish at a local maximum or minimum; this fact is used in maximum likelihood estimation to find the parameter values that maximize the likelihood function.

Since the score is a function of the observations, which are subject to sampling error, it lends itself to a test statistic known as the ''score test'', in which the parameter is held at a particular value. Further, the ratio of two likelihood functions evaluated at two distinct parameter values can be understood as a definite integral of the score function.
Definition
The score is the gradient (the vector of partial derivatives) of \log L(\theta), the natural logarithm of the likelihood function, with respect to an ''m''-dimensional parameter vector \theta:

: s(\theta) \equiv \frac{\partial \log L(\theta)}{\partial \theta}

This differentiation yields a 1 \times m row vector, and indicates the sensitivity of the likelihood (its derivative normalized by its value).

In older literature, "linear score" may refer to the score with respect to an infinitesimal translation of a given density. This convention arises from a time when the primary parameter of interest was the mean or median of a distribution. In this case, the likelihood of an observation is given by a density of the form L(\theta; X) = f(X + \theta). The "linear score" is then defined as

: s_{\rm linear} = \frac{\partial}{\partial X} \log f(X)
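As an illustrative sketch (not part of the article, with hypothetical function names), the score of a normal location model can be computed analytically and checked against a finite-difference gradient of the log-likelihood:

```python
import math

def log_likelihood(theta, xs, sigma=1.0):
    """Log-likelihood of i.i.d. N(theta, sigma^2) observations xs."""
    return sum(-0.5 * math.log(2 * math.pi * sigma ** 2)
               - (x - theta) ** 2 / (2 * sigma ** 2) for x in xs)

def score(theta, xs, sigma=1.0):
    """Analytic score: d(log L)/d(theta) = sum(x - theta) / sigma^2."""
    return sum(x - theta for x in xs) / sigma ** 2

def score_numeric(theta, xs, h=1e-6):
    """Central finite-difference approximation of the same derivative."""
    return (log_likelihood(theta + h, xs) - log_likelihood(theta - h, xs)) / (2 * h)

xs = [0.5, 1.2, -0.3, 2.0]
print(score(1.0, xs))          # analytic score at theta = 1
print(score_numeric(1.0, xs))  # numerical check; should agree closely
```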
Properties
Mean
While the score is a function of \theta, it also depends on the observations x at which the likelihood function is evaluated, and in view of the random character of sampling one may take its expected value over the sample space. Under certain regularity conditions on the density functions of the random variables, the expected value of the score, evaluated at the true parameter value \theta, is zero. To see this, rewrite the likelihood function L as a probability density function L(\theta; x) = f(x; \theta), and denote the sample space \mathcal{X}. Then:

: \operatorname{E}(s \mid \theta) = \int_{\mathcal{X}} f(x; \theta)\, \frac{\partial}{\partial \theta} \log L(\theta; x)\, dx = \int_{\mathcal{X}} f(x; \theta)\, \frac{1}{f(x; \theta)} \frac{\partial f(x; \theta)}{\partial \theta}\, dx = \int_{\mathcal{X}} \frac{\partial f(x; \theta)}{\partial \theta}\, dx

The assumed regularity conditions allow the interchange of derivative and integral (see the Leibniz integral rule), hence the above expression may be rewritten as

: \frac{\partial}{\partial \theta} \int_{\mathcal{X}} f(x; \theta)\, dx = \frac{\partial}{\partial \theta} 1 = 0

It is worth restating the above result in words: the expected value of the score, at the true parameter value, is zero. Thus, if one were to repeatedly sample from some distribution and repeatedly calculate the score, the mean value of the scores would tend to zero asymptotically.
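This zero-mean property is easy to check by simulation. The sketch below (illustrative only, using a normal location model with assumed parameter values) averages the score over many samples drawn at the true parameter:

```python
import random

random.seed(0)
theta_true, sigma, n, n_reps = 1.0, 1.0, 10, 2000

def score(theta, xs, sigma=1.0):
    """Score of i.i.d. N(theta, sigma^2) observations."""
    return sum(x - theta for x in xs) / sigma ** 2

# Draw many samples from N(theta_true, sigma^2), evaluate the score at the
# true parameter each time, and average: the result should be near zero.
mean_score = sum(
    score(theta_true, [random.gauss(theta_true, sigma) for _ in range(n)])
    for _ in range(n_reps)
) / n_reps
print(mean_score)  # close to 0
```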
Variance
The variance of the score, \operatorname{Var}(s \mid \theta), can be derived from the above expression for the expected value. Since the expected value of the score is zero, its variance equals \operatorname{E}(s^2 \mid \theta). Differentiating the identity \operatorname{E}(s \mid \theta) = 0 once more with respect to \theta, and again interchanging derivative and integral, gives

: 0 = \frac{\partial}{\partial \theta} \int_{\mathcal{X}} f(x; \theta)\, \frac{\partial \log L(\theta; x)}{\partial \theta}\, dx = \int_{\mathcal{X}} f(x; \theta) \left( \frac{\partial \log L(\theta; x)}{\partial \theta} \right)^2 dx + \int_{\mathcal{X}} f(x; \theta)\, \frac{\partial^2 \log L(\theta; x)}{\partial \theta^2}\, dx

Hence the variance of the score is equal to the negative expected value of the Hessian matrix of the log-likelihood:

: \operatorname{Var}(s \mid \theta) = \operatorname{E}(s^2 \mid \theta) = -\operatorname{E}\!\left( \frac{\partial^2 \log L(\theta; x)}{\partial \theta^2} \,\Big|\, \theta \right)

The latter is known as the Fisher information and is written \mathcal{I}(\theta). Note that the Fisher information is not a function of any particular observation, as the random variable X has been averaged out. This concept of information is useful when comparing two methods of observation of some random process.
Examples
Bernoulli process
Consider observing the first ''n'' trials of a Bernoulli process, and seeing that ''A'' of them are successes and the remaining ''B'' are failures, where the probability of success is ''θ''.

Then the likelihood ''L'' is

: L(\theta; A, B) = \theta^A (1 - \theta)^B

so the score ''s'' is

: s = \frac{\partial}{\partial \theta} \log L = \frac{A}{\theta} - \frac{B}{1 - \theta}

We can now verify that the expectation of the score is zero. Noting that the expectation of ''A'' is ''nθ'' and the expectation of ''B'' is ''n''(1 − ''θ'') (recall that ''A'' and ''B'' are random variables), we can see that the expectation of ''s'' is

: \operatorname{E}(s) = \frac{n\theta}{\theta} - \frac{n(1 - \theta)}{1 - \theta} = n - n = 0

We can also check the variance of ''s''. We know that ''A'' + ''B'' = ''n'' (so ''B'' = ''n'' − ''A'') and the variance of ''A'' is ''nθ''(1 − ''θ''), so the variance of ''s'' is

: \operatorname{Var}(s) = \operatorname{Var}\!\left( \frac{A}{\theta} - \frac{n - A}{1 - \theta} \right) = \left( \frac{1}{\theta(1 - \theta)} \right)^2 \operatorname{Var}(A) = \frac{n}{\theta(1 - \theta)}
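Both facts, the zero mean and the variance n/(θ(1 − θ)), can be checked by simulating the Bernoulli process. This is an illustrative sketch with assumed values θ = 0.3 and n = 50, not code from the article:

```python
import random

random.seed(1)
theta, n, reps = 0.3, 50, 20000

def bernoulli_score(theta, a, b):
    """Score s = A/theta - B/(1 - theta) for A successes and B failures."""
    return a / theta - b / (1 - theta)

scores = []
for _ in range(reps):
    a = sum(1 for _ in range(n) if random.random() < theta)  # simulate A
    scores.append(bernoulli_score(theta, a, n - a))

mean_s = sum(scores) / reps
var_s = sum(s ** 2 for s in scores) / reps  # E(s) = 0, so Var(s) ~ E(s^2)
print(mean_s)  # close to 0
print(var_s)   # close to n / (theta * (1 - theta)) = 50 / 0.21
```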
Binary outcome model
For models with binary outcomes (''Y'' = 1 or 0), the model can be scored with the logarithm of predictions

: S = Y \log(p) + (1 - Y) \log(1 - p)

where ''p'' is the probability in the model to be estimated and ''S'' is the score.
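This formula (the log score, the negative of the familiar log loss) is a one-liner; the small sketch below is illustrative, not taken from the article:

```python
import math

def log_score(y, p):
    """Log score S = y*log(p) + (1 - y)*log(1 - p) for a binary outcome y."""
    return y * math.log(p) + (1 - y) * math.log(1 - p)

# A confident correct prediction scores near zero; a confident wrong
# prediction is penalized heavily (large negative score).
print(log_score(1, 0.9))  # log(0.9), about -0.105
print(log_score(1, 0.1))  # log(0.1), about -2.303
```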
Applications
Scoring algorithm
The scoring algorithm is an iterative method for numerically determining the maximum likelihood estimator.
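As a sketch of the idea (reusing the Bernoulli example above with hypothetical counts, not code from any particular library), each scoring iteration moves the current guess by the score divided by the Fisher information; for the Bernoulli likelihood a single step lands exactly on the MLE ''A''/''n'', since θ + (''A'' − ''n''θ)/''n'' = ''A''/''n'':

```python
def fisher_scoring_step(theta, a, b):
    """One Fisher scoring update theta + s(theta) / I(theta) for the
    Bernoulli likelihood, where s = a/theta - b/(1 - theta) and
    I = n / (theta * (1 - theta)) with n = a + b."""
    n = a + b
    s = a / theta - b / (1 - theta)
    info = n / (theta * (1 - theta))
    return theta + s / info

# Because s(theta) / I(theta) simplifies to (a - n*theta) / n, a single
# step from any interior starting value lands on the MLE a / n.
print(fisher_scoring_step(0.5, 30, 70))  # 0.3
```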
Score test
Note that the score s(\theta) is a function of \theta and the observation x, so that, in general, it is not a statistic. However, in certain applications, such as the score test, the score is evaluated at a specific value of \theta (such as a null-hypothesis value), in which case the result is a statistic. Intuitively, if the restricted estimator is near the maximum of the likelihood function, the score should not differ from zero by more than sampling error. In 1948, C. R. Rao first proved that the square of the score divided by the information matrix follows an asymptotic χ²-distribution under the null hypothesis.
Further note that the likelihood-ratio test is based on the identity

: \log \frac{L(\theta_1)}{L(\theta_0)} = \int_{\theta_0}^{\theta_1} s(\theta)\, d\theta

which means that the log of the likelihood ratio can be understood as the area under the score function between \theta_0 and \theta_1.
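This identity can be checked numerically. The sketch below (illustrative names, a Bernoulli likelihood with hypothetical counts) integrates the score with a simple midpoint rule and compares the result to the difference of log-likelihoods:

```python
import math

a, b = 30, 70  # hypothetical Bernoulli counts: successes and failures

def log_lik(theta):
    return a * math.log(theta) + b * math.log(1 - theta)

def score(theta):
    return a / theta - b / (1 - theta)

def integrate(f, lo, hi, steps=10000):
    """Midpoint-rule quadrature of f over [lo, hi]."""
    h = (hi - lo) / steps
    return sum(f(lo + (i + 0.5) * h) for i in range(steps)) * h

theta0, theta1 = 0.2, 0.4
lhs = log_lik(theta1) - log_lik(theta0)  # log of the likelihood ratio
rhs = integrate(score, theta0, theta1)   # area under the score
print(lhs, rhs)  # the two agree closely
```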
See also
* Fisher information
* Information theory
* Score test
* Scoring algorithm
* Standard score
* Support curve
Notes
References
*{{cite book |last=Schervish |first=Mark J. |title=Theory of Statistics |publisher=Springer |date=1995 |location=New York |pages=Section 2.3.1 |isbn=0-387-94546-6 |no-pp=true}}