Sequential probability ratio test

The sequential probability ratio test (SPRT) is a specific sequential hypothesis test, developed by Abraham Wald and later proven to be optimal by Wald and Jacob Wolfowitz. Neyman and Pearson's 1933 result inspired Wald to reformulate it as a sequential analysis problem. The Neyman-Pearson lemma, by contrast, offers a rule of thumb for when all the data is collected (and its likelihood ratio known). While originally developed for use in quality control studies in the realm of manufacturing, the SPRT has been formulated for use in the computerized testing of human examinees as a termination criterion.


Theory

As in classical hypothesis testing, the SPRT starts with a pair of hypotheses, say H_0 and H_1, for the null hypothesis and alternative hypothesis respectively. They must be specified as follows:

:H_0: p=p_0
:H_1: p=p_1

The next step is to calculate the cumulative sum of the log-likelihood ratio, \log \Lambda_i, as new data arrive: with S_0 = 0, then, for i=1,2,\ldots,

:S_i = S_{i-1} + \log \Lambda_i

The stopping rule is a simple thresholding scheme:
* a < S_i < b: continue monitoring (''critical inequality'')
* S_i \geq b: Accept H_1
* S_i \leq a: Accept H_0

where a and b (with a < 0 < b) depend on the desired type I and type II errors, \alpha and \beta. They may be chosen as follows:

:a \approx \log \frac{\beta}{1-\alpha} \qquad \text{and} \qquad b \approx \log \frac{1-\beta}{\alpha}

In other words, \alpha and \beta must be decided beforehand in order to set the thresholds appropriately. The numerical values will depend on the application. The reason these thresholds are only approximations is that, in the discrete case, the signal may cross a threshold between samples. Thus, depending on the penalty of making an error and the sampling frequency, one might set the thresholds more aggressively. The exact bounds are correct in the continuous case.
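
The thresholding scheme above translates directly into code. The following is a minimal sketch in Python; the function name and the way per-observation log-likelihood ratios are supplied are illustrative choices, not part of Wald's formulation.

```python
import math
from typing import Iterable, Tuple

def sprt(log_likelihood_ratios: Iterable[float], alpha: float, beta: float) -> Tuple[str, int]:
    """Run the SPRT over a stream of per-observation log-likelihood ratios log(Lambda_i).

    Returns which hypothesis was accepted ("H0" or "H1") and the number of
    observations used, or ("undecided", n) if the stream ends first.
    """
    # Wald's approximate thresholds.
    a = math.log(beta / (1.0 - alpha))          # lower threshold
    b = math.log((1.0 - beta) / alpha)          # upper threshold

    s, n = 0.0, 0                               # S_0 = 0
    for log_lr in log_likelihood_ratios:
        n += 1
        s += log_lr                             # S_i = S_{i-1} + log(Lambda_i)
        if s >= b:
            return "H1", n                      # crossed the upper boundary
        if s <= a:
            return "H0", n                      # crossed the lower boundary
    return "undecided", n                       # still inside the critical inequality
```

For the simple hypotheses H_0: p=p_0 versus H_1: p=p_1 above, each Bernoulli observation x \in \{0,1\} would contribute \log \Lambda_i = x \log(p_1/p_0) + (1-x)\log\big((1-p_1)/(1-p_0)\big).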


Example

A textbook example is parameter estimation of a probability distribution function. Consider the exponential distribution:

:f_\theta(x) = \theta^{-1} e^{-x/\theta}, \qquad x,\theta>0

The hypotheses are

:\begin{cases} H_0: \theta=\theta_0 \\ H_1: \theta=\theta_1 \end{cases} \qquad \theta_1>\theta_0.

Then the log-likelihood function (LLF) for one sample is

:\begin{align}
\log \Lambda(x) &= \log \left( \frac{\theta_1^{-1} e^{-x/\theta_1}}{\theta_0^{-1} e^{-x/\theta_0}} \right) \\
&= \log \left( \frac{\theta_0}{\theta_1} e^{\frac{x}{\theta_0} - \frac{x}{\theta_1}} \right) \\
&= \log \left( \frac{\theta_0}{\theta_1} \right) + \log \left( e^{\frac{x}{\theta_0} - \frac{x}{\theta_1}} \right) \\
&= -\log \left( \frac{\theta_1}{\theta_0} \right) + \left( \frac{x}{\theta_0} - \frac{x}{\theta_1} \right) \\
&= -\log \left( \frac{\theta_1}{\theta_0} \right) + \left( \frac{\theta_1-\theta_0}{\theta_0 \theta_1} \right) x
\end{align}

The cumulative sum of the LLFs for all samples x_i is

:S_n = \sum_{i=1}^n \log \Lambda(x_i) = -n \log \left( \frac{\theta_1}{\theta_0} \right) + \left( \frac{\theta_1-\theta_0}{\theta_0 \theta_1} \right) \sum_{i=1}^n x_i

Accordingly, the stopping rule is:

:a < -n \log \left( \frac{\theta_1}{\theta_0} \right) + \left( \frac{\theta_1-\theta_0}{\theta_0 \theta_1} \right) \sum_{i=1}^n x_i < b

After re-arranging we finally find

:a + n \log \left( \frac{\theta_1}{\theta_0} \right) < \left( \frac{\theta_1-\theta_0}{\theta_0 \theta_1} \right) \sum_{i=1}^n x_i < b + n \log \left( \frac{\theta_1}{\theta_0} \right)

The thresholds are simply two parallel lines with slope \log ( \theta_1/\theta_0 ). Sampling should stop when the sum of the samples makes an excursion outside the ''continue-sampling region''.
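
A minimal sketch of this exponential-distribution test in Python, under assumed values \theta_0 = 1, \theta_1 = 2 and \alpha = \beta = 0.05 (none of which come from the text); the data are simulated from the alternative purely for illustration.

```python
import math
import random

def exponential_log_lr(x: float, theta0: float, theta1: float) -> float:
    """Per-sample log Lambda(x) for H1: theta = theta1 against H0: theta = theta0."""
    return -math.log(theta1 / theta0) + (theta1 - theta0) / (theta0 * theta1) * x

theta0, theta1 = 1.0, 2.0                 # assumed parameter values
a = math.log(0.05 / (1 - 0.05))           # lower threshold for alpha = beta = 0.05
b = math.log((1 - 0.05) / 0.05)           # upper threshold

random.seed(1)
s, n = 0.0, 0
while a < s < b:                          # the continue-sampling region
    x = random.expovariate(1.0 / theta1)  # draw from the alternative (mean theta1)
    s += exponential_log_lr(x, theta0, theta1)
    n += 1

print("accepted", "H1" if s >= b else "H0", "after", n, "samples")
```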


Applications


Manufacturing

The test is carried out on the proportion metric, and tests whether a proportion ''p'' is equal to one of two desired points, ''p1'' or ''p2''. The region between these two points is known as the ''indifference region'' (IR). For example, suppose you are performing a quality control study on a factory lot of widgets. Management would like the lot to have 3% or fewer defective widgets, but 1% or less would be an ideal lot that passes with flying colors. In this example, ''p1 = 0.01'' and ''p2 = 0.03'', and the region between them is the IR because management considers these lots to be marginal and is willing to have them classified either way. Widgets would be sampled one at a time from the lot (sequential analysis) until the test determines, within an acceptable error level, that the lot is ideal or should be rejected.
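
For this widget example, each sampled widget is a Bernoulli observation (defective or not), so the per-widget log-likelihood ratio follows directly from the two hypothesized defect rates. The sketch below is illustrative only; the error rates alpha = beta = 0.05 are assumed, not taken from the text.

```python
import math

# Defect rates from the example; the error rates alpha = beta = 0.05 are assumed.
p1, p2 = 0.01, 0.03
alpha = beta = 0.05
a = math.log(beta / (1 - alpha))        # lower threshold (accept H0: p = p1, ideal lot)
b = math.log((1 - beta) / alpha)        # upper threshold (accept H1: p = p2, reject lot)

def widget_log_lr(defective: bool) -> float:
    """Log-likelihood ratio of one inspected widget for H1: p = p2 vs H0: p = p1."""
    return math.log(p2 / p1) if defective else math.log((1 - p2) / (1 - p1))

def classify_lot(inspections):
    """Inspect widgets one at a time until the lot is accepted or rejected."""
    s, n = 0.0, 0
    for defective in inspections:
        n += 1
        s += widget_log_lr(defective)
        if s >= b:
            return "reject lot", n      # evidence favours the 3% defect rate
        if s <= a:
            return "accept lot", n      # evidence favours the 1% defect rate
    return "undecided", n               # ran out of widgets inside the indifference band

# Example: a defect-free stream is accepted after roughly 150 inspections with these settings.
print(classify_lot(False for _ in range(1000)))
```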


Testing of human examinees

The SPRT is currently the predominant method of classifying examinees in a variable-length computerized classification test (CCT). The two parameters ''p1'' and ''p2'' are specified by determining a cutscore (threshold) for examinees on the proportion-correct metric and selecting a point above and below that cutscore. For instance, suppose the cutscore is set at 70% for a test. We could select ''p1 = 0.65'' and ''p2 = 0.75''. The test then evaluates the likelihood that an examinee's true score on that metric is equal to one of those two points. If the examinee is determined to be at 75%, they pass; if they are determined to be at 65%, they fail.

These points are not specified completely arbitrarily. A cutscore should always be set with a legally defensible method, such as a modified Angoff procedure. Again, the indifference region represents the range of scores that the test designer is willing to see classified either way (pass or fail). The upper parameter ''p2'' is conceptually the highest level that the test designer is willing to accept for a fail (because everyone below it has a good chance of failing), and the lower parameter ''p1'' is the lowest level that the test designer is willing to accept for a pass (because everyone above it has a decent chance of passing). While this definition may seem to be a relatively small burden, consider the high-stakes case of a licensing test for medical doctors: at just what point should we consider somebody to be at one of these two levels?

While the SPRT was first applied to testing in the days of classical test theory, as applied in the previous paragraph, Reckase (1983) suggested that item response theory be used to determine the ''p1'' and ''p2'' parameters. The cutscore and indifference region are defined on the latent ability (theta) metric and translated onto the proportion metric for computation (a sketch of this translation follows the list below). Research on CCT since then has applied this methodology for several reasons:
#Large item banks tend to be calibrated with IRT.
#This allows more accurate specification of the parameters.
#By using the item response function for each item, the parameters are easily allowed to vary between items.


Detection of anomalous medical outcomes

Spiegelhalter et al. have shown that the SPRT can be used to monitor the performance of doctors, surgeons and other medical practitioners in such a way as to give early warning of potentially anomalous results. In their 2003 paper, they showed how it could have helped identify Harold Shipman as a murderer well before he was actually identified.


Extensions


MaxSPRT

In 2011, an extension of the SPRT method called the Maximized Sequential Probability Ratio Test (MaxSPRT) was introduced. The salient features of MaxSPRT are the allowance of a composite, one-sided alternative hypothesis and the introduction of an upper stopping boundary. The method has been used in several medical research studies (Kulldorff, M., et al., "A Maximized Sequential Probability Ratio Test for Drug and Vaccine Safety Surveillance", ''Sequential Analysis: Design Methods and Applications'', vol. 30, issue 1; see the second-to-last paragraph of section 1: http://www.tandfonline.com/doi/full/10.1080/07474946.2011.539924).
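
As a rough illustration of the idea, the Poisson-based variant described by Kulldorff et al. compares cumulative observed events with cumulative expected events and uses the log-likelihood ratio maximized over a one-sided (relative risk greater than 1) alternative. The Python sketch below only computes that statistic; the cumulative counts are made up, and the critical value is a placeholder (in practice it comes from the exact calculations or tables in the cited paper).

```python
import math

def poisson_maxsprt_llr(observed: int, expected: float) -> float:
    """Log-likelihood ratio maximized over the one-sided alternative (relative risk > 1).

    observed: cumulative event count so far; expected: cumulative expected count under H0.
    """
    if observed <= expected:
        return 0.0   # the maximum over RR >= 1 is attained at RR = 1
    return observed * math.log(observed / expected) - (observed - expected)

CRITICAL_VALUE = 3.0   # placeholder only; real critical values come from the cited paper
for observed, expected in [(2, 1.1), (5, 2.4), (12, 3.8)]:   # made-up cumulative counts
    if poisson_maxsprt_llr(observed, expected) >= CRITICAL_VALUE:
        print("signal: stop surveillance and reject the null hypothesis")
        break
```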


See also

* CUSUM
* Computerized classification test
* Wald test
* Likelihood-ratio test


References


Further reading

* Holger Wilker: ''Sequential-Statistik in der Praxis'', BoD, Norderstedt 2012, ISBN 978-3848232529.


External links


* Wald's Sequential Probability Ratio Test for R, by Stéphane Bottine
* Wald's Sequential Probability Ratio Test for Python, by Zhenning Yu