In statistical hypothesis testing

, a uniformly most powerful (UMP) test is a

hypothesis test

which has the greatest

power

1 - \beta

among all possible tests of a given

size

''α''. For example, according to the

Neyman–Pearson lemma

, the likelihood-ratio test is UMP for testing simple (point) hypotheses.
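As an illustration of the simple-versus-simple case, the sketch below assumes a single observation from a normal distribution with unit variance and tests θ = 0 against θ = 1. The model, the parameter values, and the function names are assumptions chosen for this example only. In this model the likelihood ratio equals exp(x − 1/2), which is increasing in x, so rejecting for a large likelihood ratio is the same as rejecting for a large x.

```python
from statistics import NormalDist


def likelihood_ratio(x, theta0=0.0, theta1=1.0):
    """f_{theta1}(x) / f_{theta0}(x) for one N(theta, 1) observation."""
    return NormalDist(theta1, 1.0).pdf(x) / NormalDist(theta0, 1.0).pdf(x)


def lrt_threshold(alpha, theta0=0.0, theta1=1.0):
    """Return (c, k): reject when x > c, equivalently when the ratio exceeds k."""
    # The ratio is increasing in x here, so the size-alpha cutoff on x is the
    # (1 - alpha) quantile of the null distribution.
    c = NormalDist(theta0, 1.0).inv_cdf(1.0 - alpha)
    return c, likelihood_ratio(c, theta0, theta1)


def lrt_power(alpha, theta0=0.0, theta1=1.0):
    """Probability of rejecting H_0 when theta = theta1."""
    c, _ = lrt_threshold(alpha, theta0, theta1)
    return 1.0 - NormalDist(theta1, 1.0).cdf(c)


if __name__ == "__main__":
    c, k = lrt_threshold(0.05)
    print("cutoff on x:", c)
    print("cutoff on likelihood ratio:", k)
    print("power at theta = 1:", lrt_power(0.05))
```

With a single observation the power is modest; it grows as the two hypothesized means move apart.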

Setting

Let

X

denote a random vector (corresponding to the measurements), taken from a parametrized family of

probability density functions or probability mass functions

f_\theta(x)

, which depends on the unknown deterministic parameter

\theta \in \Theta

. The parameter space

\Theta

is partitioned into two disjoint sets

\Theta_0

and

\Theta_1

. Let

H_0

denote the hypothesis that

\theta \in \Theta_0

, and let

H_1

denote the hypothesis that

\theta \in \Theta_1

. The binary test of hypotheses is performed using a test function

\varphi(x)

with a reject region

R

(a subset of measurement space). :

\varphi(x) =
\begin{cases}
1 & \text{if } x \in R \\
0 & \text{if } x \in R^c
\end{cases}

meaning that

H_1

is in force if the measurement

X \in R

and that

H_0

is in force if the measurement

X\in R^c

. Note that

R \cup R^c

is a disjoint covering of the measurement space.

Formal definition

A test function

\varphi(x)

is UMP of size

\alpha

if for any other test function

\varphi'(x)

satisfying :

\sup_{\theta\in\Theta_0}\; \operatorname{E}[\varphi'(X)\mid\theta] = \alpha' \leq \alpha = \sup_{\theta\in\Theta_0}\; \operatorname{E}[\varphi(X)\mid\theta],

we have :

\forall \theta \in \Theta_1, \quad \operatorname{E}[\varphi'(X)\mid\theta] = 1 - \beta'(\theta) \leq 1 - \beta(\theta) = \operatorname{E}[\varphi(X)\mid\theta].
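The definition can be checked numerically in a small case. The sketch below assumes a single N(θ, 1) observation with the simple null θ = 0 and the alternative θ > 0, and compares two tests of the same size: the one-sided threshold test and the two-sided test that rejects for large |x|. The model and function names are assumptions made for this illustration. Both tests reject with probability α at θ = 0, but the one-sided test has at least as much power at every θ > 0.

```python
from statistics import NormalDist

STD = NormalDist()  # standard normal distribution


def power_one_sided(theta, alpha):
    """Rejection probability of the test 'reject if x > c' under N(theta, 1)."""
    c = STD.inv_cdf(1.0 - alpha)
    return 1.0 - STD.cdf(c - theta)


def power_two_sided(theta, alpha):
    """Rejection probability of the test 'reject if |x| > c' under N(theta, 1)."""
    c = STD.inv_cdf(1.0 - alpha / 2.0)
    return (1.0 - STD.cdf(c - theta)) + STD.cdf(-c - theta)


if __name__ == "__main__":
    alpha = 0.05
    for theta in (0.0, 0.5, 1.0, 2.0, 3.0):
        print(theta, power_one_sided(theta, alpha), power_two_sided(theta, alpha))
```

The comparison is restricted to θ > 0 on purpose: for θ < 0 the two-sided test rejects more often, which is consistent with the definition because those values lie outside the alternative considered here.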

The Karlin–Rubin theorem

The Karlin–Rubin theorem can be regarded as an extension of the Neyman–Pearson lemma for composite hypotheses (Casella, G.; Berger, R.L. (2008), ''Statistical Inference'', Brooks/Cole, Theorem 8.3.17). Consider a scalar measurement having a probability density function parameterized by a scalar parameter ''θ'', and define the likelihood ratio

l(x) = f_{\theta_1}(x) / f_{\theta_0}(x)

. If

l(x)

is monotone non-decreasing in

x

, for any pair

\theta_1 \geq \theta_0

(meaning that the greater

x

is, the more likely

H_1

is), then the threshold test: :

\varphi(x) =
\begin{cases}
1 & \text{if } x > x_0 \\
0 & \text{if } x < x_0
\end{cases}

where

x_0

is chosen such that

\operatorname{E}_{\theta_0}\varphi(X)=\alpha

is the UMP test of size ''α'' for testing

H_0: \theta \leq \theta_0 \text{ vs. } H_1: \theta > \theta_0 .

Note that exactly the same test is also UMP for testing

H_0: \theta = \theta_0 \text{ vs. } H_1: \theta > \theta_0 .
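A minimal sketch of the threshold test, assuming for illustration a single observation from an exponential distribution with mean θ, so that f_θ(x) = exp(−x/θ)/θ for x ≥ 0. This family and the function names are assumptions, not part of the theorem. For θ_1 ≥ θ_0 the likelihood ratio is non-decreasing in x, the threshold solving the size condition is x_0 = −θ_0 ln α, and the rejection probability exp(−x_0/θ) increases with θ, so its supremum over θ ≤ θ_0 is attained at θ_0.

```python
import math


def likelihood_ratio(x, theta0, theta1):
    """f_{theta1}(x) / f_{theta0}(x) for the exponential density with mean theta."""
    return (theta0 / theta1) * math.exp(x * (1.0 / theta0 - 1.0 / theta1))


def threshold(theta0, alpha):
    """x_0 such that P(X > x_0) = alpha when theta = theta0."""
    return -theta0 * math.log(alpha)


def rejection_probability(theta, x0):
    """P(X > x0) when X is exponential with mean theta."""
    return math.exp(-x0 / theta)


if __name__ == "__main__":
    theta0, alpha = 2.0, 0.05
    x0 = threshold(theta0, alpha)
    print("threshold x_0:", x0)
    for theta in (0.5, 1.0, 2.0, 4.0, 8.0):
        print(theta, rejection_probability(theta, x0))
```

Because the rejection probability stays at or below α for every θ ≤ θ_0, the same threshold serves both the composite null and the simple null, which matches the remark above.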

Important case: exponential family

Although the Karlin–Rubin theorem may seem weak because of its restriction to a scalar parameter and a scalar measurement, it turns out that there exists a host of problems for which the theorem holds. In particular, the one-dimensional

exponential family

of probability density functions or probability mass functions with :

f_\theta(x) = g(\theta) h(x) \exp(\eta(\theta) T(x))

has a monotone non-decreasing likelihood ratio in the

sufficient statistic

T(x)

, provided that

\eta(\theta)

is non-decreasing.
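The monotonicity can be verified directly from the form of the density. The short derivation below assumes that h(x) > 0 on the common support and that g is positive, so the ratio is well defined:

```latex
\frac{f_{\theta_1}(x)}{f_{\theta_0}(x)}
  = \frac{g(\theta_1)\, h(x) \exp\bigl(\eta(\theta_1) T(x)\bigr)}
         {g(\theta_0)\, h(x) \exp\bigl(\eta(\theta_0) T(x)\bigr)}
  = \frac{g(\theta_1)}{g(\theta_0)}
    \exp\Bigl( \bigl(\eta(\theta_1) - \eta(\theta_0)\bigr)\, T(x) \Bigr)
```

For θ_1 ≥ θ_0 and non-decreasing η, the coefficient η(θ_1) − η(θ_0) is non-negative, so the ratio is a non-decreasing function of T(x).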

Example

Let

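X=(X_0 ,\ldots , X_{M-1})

denote i.i.d. normally distributed

N

-dimensional random vectors with mean

\theta m

and covariance matrix

R

. We then have :

\begin{align}
f_\theta (X) = {} & (2 \pi)^{-MN/2} |R|^{-M/2} \exp \left\{-\frac{1}{2} \sum_{n=0}^{M-1} (X_n - \theta m)^T R^{-1}(X_n - \theta m) \right\} \\
= {} & (2 \pi)^{-MN/2} |R|^{-M/2} \exp \left\{-\frac{1}{2} \sum_{n=0}^{M-1} \theta^2 m^T R^{-1} m \right\} \\
& \exp \left\{-\frac{1}{2} \sum_{n=0}^{M-1} X_n^T R^{-1} X_n \right\} \exp \left\{\theta m^T R^{-1} \sum_{n=0}^{M-1}X_n \right\}
\end{align}

which is exactly in the form of the exponential family shown in the previous section, with the sufficient statistic being :

T(X) = m^T R^{-1} \sum_{n=0}^{M-1}X_n.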

Thus, we conclude that the test :

\varphi(T) = \begin{cases} 1 & T > t_0 \\ 0 & T < t_0 \end{cases} \qquad \operatorname{E}_{\theta_0} \varphi (T) = \alpha

is the UMP test of size

\alpha

for testing

H_0: \theta \leqslant \theta_0

vs.

H_1: \theta > \theta_0
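The threshold in this example can be made explicit. Under the model above, T is a linear function of Gaussian vectors, so it is itself normal with mean Mθq and variance Mq, where q = m^T R^{-1} m and M denotes the number of vectors in the sample. The size condition then gives t_0 = Mθ_0 q + z_{1−α} √(Mq), with z_{1−α} the standard normal quantile. The sketch below works this out for the two-dimensional case only; the helper names and the restriction to N = 2 are assumptions made to keep the example short and free of external libraries.

```python
import math
from statistics import NormalDist


def solve_2x2(R, m):
    """Return R^{-1} m for a 2x2 matrix R, using Cramer's rule."""
    (a, b), (c, d) = R
    det = a * d - b * c
    return [(d * m[0] - b * m[1]) / det, (a * m[1] - c * m[0]) / det]


def sufficient_statistic(samples, R, m):
    """T(X) = m^T R^{-1} sum_n X_n for a list of 2-dimensional samples."""
    w = solve_2x2(R, m)  # R is symmetric, so m^T R^{-1} equals (R^{-1} m)^T
    total = [sum(x[i] for x in samples) for i in range(2)]
    return w[0] * total[0] + w[1] * total[1]


def threshold(theta0, alpha, R, m, M):
    """t_0 such that P(T > t_0) = alpha when theta = theta0."""
    w = solve_2x2(R, m)
    q = m[0] * w[0] + m[1] * w[1]  # q = m^T R^{-1} m
    return M * theta0 * q + NormalDist().inv_cdf(1.0 - alpha) * math.sqrt(M * q)


def power(theta, t0, R, m, M):
    """P(T > t0) when T is normal with mean M*theta*q and variance M*q."""
    w = solve_2x2(R, m)
    q = m[0] * w[0] + m[1] * w[1]
    return 1.0 - NormalDist(M * theta * q, math.sqrt(M * q)).cdf(t0)


if __name__ == "__main__":
    R = [[2.0, 0.5], [0.5, 1.0]]  # hypothetical covariance matrix
    m = [1.0, 1.0]                # hypothetical mean direction
    t0 = threshold(0.0, 0.05, R, m, M=10)
    print("t_0:", t0)
    print("power at theta = 0.5:", power(0.5, t0, R, m, M=10))
```

The test rejects when `sufficient_statistic(samples, R, m)` exceeds `t0`.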

Further discussion

In general, UMP tests do not exist for vector parameters or for two-sided tests (a test in which one hypothesis lies on both sides of the alternative). The reason is that in these situations, the most powerful test of a given size for one possible value of the parameter (e.g. for

\theta_1

where

\theta_1 > \theta_0

) is different from the most powerful test of the same size for a different value of the parameter (e.g. for

\theta_2