Score (statistics)

In statistics, the score (or informant) is the gradient of the log-likelihood function with respect to the parameter vector. Evaluated at a particular point of the parameter vector, the score indicates the steepness of the log-likelihood function and thereby the sensitivity to infinitesimal changes to the parameter values. If the log-likelihood function is continuous over the parameter space, the score will vanish at a local maximum or minimum; this fact is used in maximum likelihood estimation to find the parameter values that maximize the likelihood function. Since the score is a function of the observations, which are subject to sampling error, it lends itself to a test statistic known as the ''score test'', in which the parameter is held at a particular value. Further, the ratio of two likelihood functions evaluated at two distinct parameter values can be understood as a definite integral of the score function.


Definition

The score is the gradient (the vector of partial derivatives) of \log \mathcal{L}(\theta), the natural logarithm of the likelihood function, with respect to an m-dimensional parameter vector \theta:

: s(\theta) \equiv \frac{\partial \log \mathcal{L}(\theta)}{\partial \theta}

This differentiation yields a (1 \times m) row vector and indicates the sensitivity of the likelihood (its derivative normalized by its value). In older literature, "linear score" may refer to the score with respect to an infinitesimal translation of a given density. This convention arises from a time when the primary parameter of interest was the mean or median of a distribution. In this case, the likelihood of an observation is given by a density of the form \mathcal{L}(\theta; X) = f(X + \theta). The "linear score" is then defined as

: s_{\text{linear}} = \frac{\partial}{\partial X} \log f(X)
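As a concrete illustration, a hand-derived score can be checked numerically by differentiating the log-likelihood. The following minimal Python sketch (an illustrative addition, not part of the definition) uses a normal model with unknown mean, for which the analytic score is \sum_i (x_i - \theta)/\sigma^2; the sample, seed, and step size are arbitrary choices made for this example, assuming NumPy is available.

```python
import numpy as np

def log_likelihood(theta, x, sigma=1.0):
    """Log-likelihood of an N(theta, sigma^2) model for the sample x."""
    n = x.size
    return (-0.5 * n * np.log(2.0 * np.pi * sigma**2)
            - 0.5 * np.sum((x - theta) ** 2) / sigma**2)

def score_analytic(theta, x, sigma=1.0):
    """Analytic score: d(log L)/d(theta) = sum(x - theta) / sigma^2."""
    return np.sum(x - theta) / sigma**2

def score_numeric(theta, x, h=1e-6):
    """Central finite-difference approximation of the same derivative."""
    return (log_likelihood(theta + h, x) - log_likelihood(theta - h, x)) / (2.0 * h)

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.0, size=100)
print(score_analytic(1.5, x))  # analytic gradient at theta = 1.5
print(score_numeric(1.5, x))   # nearly identical finite-difference value
```

The two values agree to within finite-difference error, which is one practical way to validate a derived score expression.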


Properties


Mean

While the score is a function of \theta, it also depends on the observations \mathbf{x} = (x_1, x_2, \ldots, x_n) at which the likelihood function is evaluated, and in view of the random character of sampling one may take its expected value over the sample space. Under certain regularity conditions on the density functions of the random variables, the expected value of the score, evaluated at the true parameter value \theta, is zero. To see this, rewrite the likelihood function \mathcal{L} as a probability density function \mathcal{L}(\theta; x) = f(x; \theta), and denote the sample space \mathcal{X}. Then:

: \begin{align} \operatorname{E}(s \mid \theta) &= \int_{\mathcal{X}} f(x; \theta) \frac{\partial}{\partial \theta} \log \mathcal{L}(\theta; x) \, dx \\ &= \int_{\mathcal{X}} f(x; \theta) \frac{1}{f(x; \theta)} \frac{\partial f(x; \theta)}{\partial \theta} \, dx = \int_{\mathcal{X}} \frac{\partial f(x; \theta)}{\partial \theta} \, dx \end{align}

The assumed regularity conditions allow the interchange of derivative and integral (see Leibniz integral rule), hence the above expression may be rewritten as

: \frac{\partial}{\partial \theta} \int_{\mathcal{X}} f(x; \theta) \, dx = \frac{\partial}{\partial \theta} 1 = 0.

It is worth restating the above result in words: the expected value of the score, at the true parameter value, is zero. Thus, if one were to repeatedly sample from some distribution and repeatedly calculate the score, the mean value of the scores would tend to zero asymptotically.
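The zero-mean property can be checked by simulation. The sketch below (illustrative only) repeatedly samples from a normal distribution with known true mean and averages the analytic score evaluated at that true value; the constants are arbitrary choices for the example.

```python
import numpy as np

rng = np.random.default_rng(1)
theta_true, sigma, n, reps = 2.0, 1.0, 50, 10_000

# Average the score s(theta_true) = sum(x - theta_true) / sigma^2 over many samples.
scores = np.empty(reps)
for i in range(reps):
    x = rng.normal(loc=theta_true, scale=sigma, size=n)
    scores[i] = np.sum(x - theta_true) / sigma**2

print(np.mean(scores))  # close to 0, as the derivation above predicts
```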


Variance

The variance of the score, \operatorname{Var}(s(\theta)) = \operatorname{E}(s(\theta) s(\theta)^{\mathsf{T}}), can be derived from the above expression for the expected value:

: \begin{align} 0 &= \frac{\partial}{\partial \theta^{\mathsf{T}}} \operatorname{E}(s \mid \theta) \\ &= \frac{\partial}{\partial \theta^{\mathsf{T}}} \int_{\mathcal{X}} \frac{\partial \log \mathcal{L}(\theta; x)}{\partial \theta} f(x; \theta) \, dx \\ &= \int_{\mathcal{X}} \frac{\partial}{\partial \theta^{\mathsf{T}}} \left\{ \frac{\partial \log \mathcal{L}(\theta; x)}{\partial \theta} f(x; \theta) \right\} dx \\ &= \int_{\mathcal{X}} \left\{ \frac{\partial^2 \log \mathcal{L}(\theta; x)}{\partial \theta \, \partial \theta^{\mathsf{T}}} f(x; \theta) + \frac{\partial \log \mathcal{L}(\theta; x)}{\partial \theta} \frac{\partial f(x; \theta)}{\partial \theta^{\mathsf{T}}} \right\} dx \\ &= \int_{\mathcal{X}} \frac{\partial^2 \log \mathcal{L}(\theta; x)}{\partial \theta \, \partial \theta^{\mathsf{T}}} f(x; \theta) \, dx + \int_{\mathcal{X}} \frac{\partial \log \mathcal{L}(\theta; x)}{\partial \theta} \frac{\partial \log \mathcal{L}(\theta; x)}{\partial \theta^{\mathsf{T}}} f(x; \theta) \, dx \\ &= \operatorname{E}\left( \frac{\partial^2 \log \mathcal{L}}{\partial \theta \, \partial \theta^{\mathsf{T}}} \right) + \operatorname{E}\left( \frac{\partial \log \mathcal{L}}{\partial \theta} \left[ \frac{\partial \log \mathcal{L}}{\partial \theta} \right]^{\mathsf{T}} \right) \end{align}

Hence the variance of the score is equal to the negative expected value of the Hessian matrix of the log-likelihood:

: \operatorname{E}(s(\theta) s(\theta)^{\mathsf{T}}) = - \operatorname{E}\left( \frac{\partial^2 \log \mathcal{L}}{\partial \theta \, \partial \theta^{\mathsf{T}}} \right)

The latter is known as the Fisher information and is written \mathcal{I}(\theta). Note that the Fisher information is not a function of any particular observation, as the random variable X has been averaged out. This concept of information is useful when comparing two methods of observation of some random process.
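This identity can likewise be checked by simulation. For a normal model with unknown mean \theta and known \sigma, the Fisher information is n/\sigma^2; the sketch below (illustrative, with arbitrarily chosen constants) compares it against the Monte Carlo variance of the score.

```python
import numpy as np

rng = np.random.default_rng(2)
theta_true, sigma, n, reps = 2.0, 1.0, 50, 10_000

scores = np.empty(reps)
for i in range(reps):
    x = rng.normal(loc=theta_true, scale=sigma, size=n)
    scores[i] = np.sum(x - theta_true) / sigma**2  # score at the true parameter

print(np.var(scores))  # Monte Carlo variance of the score
print(n / sigma**2)    # Fisher information of the normal mean: the two agree
```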


Examples


Bernoulli process

Consider observing the first ''n'' trials of a Bernoulli process, and seeing that ''A'' of them are successes and the remaining ''B'' are failures, where the probability of success is ''θ''. Then the likelihood \mathcal{L} is

: \mathcal{L}(\theta; A, B) = \frac{(A+B)!}{A! \, B!} \theta^A (1-\theta)^B,

so the score ''s'' is

: s = \frac{1}{\mathcal{L}} \frac{\partial \mathcal{L}}{\partial \theta} = \frac{A}{\theta} - \frac{B}{1-\theta}.

We can now verify that the expectation of the score is zero. Noting that the expectation of ''A'' is ''nθ'' and the expectation of ''B'' is ''n''(1 − ''θ'') (recall that ''A'' and ''B'' are random variables), we can see that the expectation of ''s'' is

: \operatorname{E}(s) = \frac{n\theta}{\theta} - \frac{n(1-\theta)}{1-\theta} = n - n = 0.

We can also check the variance of ''s''. We know that ''A'' + ''B'' = ''n'' (so ''B'' = ''n'' − ''A'') and the variance of ''A'' is ''nθ''(1 − ''θ''), so the variance of ''s'' is

: \begin{align} \operatorname{var}(s) &= \operatorname{var}\left( \frac{A}{\theta} - \frac{n-A}{1-\theta} \right) = \operatorname{var}\left( A \left( \frac{1}{\theta} + \frac{1}{1-\theta} \right) \right) \\ &= \left( \frac{1}{\theta} + \frac{1}{1-\theta} \right)^2 \operatorname{var}(A) = \frac{n}{\theta(1-\theta)}. \end{align}
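Both results are easy to verify by simulation. The following sketch (illustrative, with made-up values of ''θ'' and ''n'') draws ''A'' from a binomial distribution and checks the mean and variance of the score against the formulas above.

```python
import numpy as np

rng = np.random.default_rng(3)
theta, n, reps = 0.3, 100, 10_000

A = rng.binomial(n, theta, size=reps)  # successes in n Bernoulli(theta) trials
B = n - A                              # failures
s = A / theta - B / (1 - theta)        # score from the formula above

print(np.mean(s))                  # close to 0
print(np.var(s))                   # close to n / (theta * (1 - theta))
print(n / (theta * (1 - theta)))   # = 476.19...
```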


Binary outcome model

For models with binary outcomes (''Y'' = 1 or 0), the model can be scored with the logarithm of predictions

: S = Y \log(p) + (1 - Y) \log(1 - p)

where ''p'' is the probability in the model to be estimated and ''S'' is the score.
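As a small illustration, the per-observation score ''S'' can be computed directly from the formula; the outcome vector and predicted probabilities below are made-up values for the example.

```python
import numpy as np

def binary_score(y, p):
    """Log-probability the model assigns to the observed binary outcome y."""
    return y * np.log(p) + (1 - y) * np.log(1 - p)

y = np.array([1, 0, 1, 1])          # observed binary outcomes
p = np.array([0.9, 0.2, 0.6, 0.5])  # model-predicted success probabilities
print(binary_score(y, p))           # per-observation scores S
print(binary_score(y, p).sum())     # total log-likelihood of the sample
```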


Applications


Scoring algorithm

The scoring algorithm is an iterative method for numerically determining the maximum likelihood estimator.
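The core update of Fisher scoring is Newton-like, \theta \leftarrow \theta + s(\theta)/\mathcal{I}(\theta), with the Fisher information in place of the observed Hessian. A minimal sketch for the Bernoulli model of the example above (illustrative only; the starting value, tolerance, and iteration cap are arbitrary):

```python
def fisher_scoring(A, n, theta0=0.5, tol=1e-10, max_iter=50):
    """Iterate theta <- theta + s(theta) / I(theta) for the Bernoulli model."""
    theta = theta0
    for _ in range(max_iter):
        score = A / theta - (n - A) / (1 - theta)  # s(theta)
        info = n / (theta * (1 - theta))           # Fisher information I(theta)
        step = score / info
        theta += step
        if abs(step) < tol:
            break
    return theta

print(fisher_scoring(A=30, n=100))  # converges to the MLE A/n = 0.3
```

For this particular model the update lands on the MLE ''A''/''n'' in a single step; the loop structure is shown because several iterations are generally needed.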


Score test

Note that s is a function of \theta and the observation \mathbf{x} = (x_1, x_2, \ldots, x_n), so that, in general, it is not a statistic. However, in certain applications, such as the score test, the score is evaluated at a specific value of \theta (such as a null-hypothesis value), in which case the result is a statistic. Intuitively, if the restricted estimator is near the maximum of the likelihood function, the score should not differ from zero by more than sampling error. In 1948, C. R. Rao first proved that the square of the score divided by the information matrix follows an asymptotic χ²-distribution under the null hypothesis. Further note that the likelihood-ratio test statistic is given by

: -2 \left[ \log \mathcal{L}(\theta_0) - \log \mathcal{L}(\hat{\theta}) \right] = 2 \int_{\theta_0}^{\hat{\theta}} \frac{\partial \log \mathcal{L}(\theta)}{\partial \theta} \, d\theta = 2 \int_{\theta_0}^{\hat{\theta}} s(\theta) \, d\theta

which means that the likelihood-ratio test can be understood as twice the area under the score function between \theta_0 and \hat{\theta}.
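For the Bernoulli example above, Rao's statistic s(\theta_0)^2 / \mathcal{I}(\theta_0) reduces to (A - n\theta_0)^2 / (n\theta_0(1-\theta_0)). A minimal sketch (with made-up data) that compares it against the 5% critical value of a χ²-distribution with one degree of freedom:

```python
def score_test_stat(A, n, theta0):
    """Rao's score statistic s(theta0)^2 / I(theta0) for the Bernoulli model."""
    s = A / theta0 - (n - A) / (1 - theta0)  # score at the hypothesized value
    info = n / (theta0 * (1 - theta0))       # Fisher information at theta0
    return s**2 / info

# Test H0: theta = 0.5 after observing 60 successes in 100 trials.
stat = score_test_stat(A=60, n=100, theta0=0.5)
print(stat)  # 4.0, which exceeds the 5% chi-squared(1) critical value of 3.84
```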


See also

* Fisher information
* Information theory
* Score test
* Scoring algorithm
* Standard score
* Support curve

