Boschloo's test is a statistical hypothesis test for analysing 2x2 contingency tables. It examines the association of two Bernoulli-distributed random variables and is a uniformly more powerful alternative to Fisher's exact test. It was proposed in 1970 by R. D. Boschloo.
Setting
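A 2x2 contingency table visualizes $n$ independent observations of two binary variables $A$ and $B$:
: $\begin{array}{c|cc|c} & B = 1 & B = 0 & \text{Total} \\ \hline A = 1 & x_{11} & x_{10} & n_1 \\ A = 0 & x_{01} & x_{00} & n_0 \\ \hline \text{Total} & s_1 & s_0 & n \end{array}$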
The probability distribution of such tables can be classified into three distinct cases.
# The row sums $n_1, n_0$ and column sums $s_1, s_0$ are fixed in advance and not random. Then all cell counts $x_{ij}$ are determined by $x_{11}$. If $A$ and $B$ are independent, $x_{11}$ follows a hypergeometric distribution with parameters $n, n_1, s_1$:
: $x_{11} \sim \text{Hypergeometric}(n, n_1, s_1)$.
# The row sums $n_1, n_0$ are fixed in advance but the column sums $s_1, s_0$ are not. Then all cell counts are determined by $x_{11}$ and $x_{01}$, which follow binomial distributions with probabilities $p_1, p_0$:
: $x_{11} \sim B(n_1, p_1), \quad x_{01} \sim B(n_0, p_0)$
# Only the total number $n$ is fixed but the row sums $n_1, n_0$ and the column sums $s_1, s_0$ are not. Then the random vector $(x_{11}, x_{10}, x_{01}, x_{00})$ follows a multinomial distribution with probability vector $(p_{11}, p_{10}, p_{01}, p_{00})$.
Fisher's exact test is designed for the first case and is therefore an exact conditional test (because it conditions on the column sums). The typical example of such a case is the lady tasting tea: A lady tastes 8 cups of tea with milk. In 4 of those cups the milk is poured in before the tea. In the other 4 cups the tea is poured in first. The lady tries to assign the cups to the two categories. Following our notation, the random variable $A$ represents the method used (1 = milk first, 0 = milk last) and $B$ represents the lady's guesses (1 = milk first guessed, 0 = milk last guessed). Then the row sums are the fixed numbers of cups prepared with each method: $n_1 = 4, n_0 = 4$. The lady knows that there are 4 cups in each category, so she will assign 4 cups to each method. Thus, the column sums are also fixed in advance: $s_1 = 4, s_0 = 4$. If she is not able to tell the difference, $A$ and $B$ are independent and the number $x_{11}$ of correctly classified cups with milk first follows the hypergeometric distribution $\text{Hypergeometric}(8, 4, 4)$.
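The null distribution in the tea example can be tabulated directly. The following is a minimal sketch using only the Python standard library; the function name `hypergeom_pmf` and its argument order (total, row sum, column sum) are illustrative choices, not part of any cited source.

```python
from fractions import Fraction
from math import comb

def hypergeom_pmf(k, n, n1, s1):
    """P(x11 = k) when n1 of n cups are truly 'milk first' and s1 are guessed so."""
    # Outside the support the probability is zero.
    if k < max(0, n1 + s1 - n) or k > min(n1, s1):
        return Fraction(0)
    return Fraction(comb(n1, k) * comb(n - n1, s1 - k), comb(n, s1))

# Lady tasting tea: 8 cups, 4 prepared milk first, 4 guessed milk first.
n, n1, s1 = 8, 4, 4
null_dist = {k: hypergeom_pmf(k, n, n1, s1) for k in range(5)}
for k, prob in null_dist.items():
    print(k, prob)
```

Under independence, classifying all four milk-first cups correctly has probability 1/70, which is the one-sided p-value of Fisher's exact test for a perfect result.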
Boschloo's test is designed for the second case and is therefore an exact unconditional test. Examples of such a case are often found in medical research, where a binary endpoint is compared between two patient groups. Following our notation, $A = 1$ represents the first group that receives some medication of interest. $A = 0$ represents the second group that receives a placebo. $B$ indicates the cure of a patient (1 = cure, 0 = no cure). Then the row sums equal the group sizes and are usually fixed in advance. The column sums are the total numbers of cures and of disease continuations, respectively, and are not fixed in advance.
An example for the third case can be constructed as follows: Simultaneously flip two distinguishable coins $A$ and $B$ and do this $n$ times. If we count the number of results in our 2x2 table (1 = heads, 0 = tails), we neither know in advance how often coin $A$ shows heads or tails (row sums random), nor do we know how often coin $B$ shows heads or tails (column sums random).
Test hypothesis
The null hypothesis of Boschloo's one-tailed test (high values of $x_{11}$ favor the alternative hypothesis) is:
: $H_0: p_1 \le p_0$
The null hypothesis of the one-tailed test can also be formulated in the other direction (small values of $x_{11}$ favor the alternative hypothesis):
: $H_0: p_1 \ge p_0$
The null hypothesis of the two-tailed test is:
: $H_0: p_1 = p_0$
There is no universal definition of the two-tailed version of Fisher's exact test. Since Boschloo's test is based on Fisher's exact test, a universal two-tailed version of Boschloo's test does not exist either. In the following we deal with the one-tailed test and $H_0: p_1 \le p_0$.
Boschloo's idea
We denote the desired significance level by $\alpha$. Fisher's exact test is a conditional test and appropriate for the first of the above-mentioned cases. But if we treat the observed column sum $s_1$ as fixed in advance, Fisher's exact test can also be applied to the second case. The true size of the test then depends on the nuisance parameters $p_1$ and $p_0$. It can be shown that the size maximum $\max_{p_1 \le p_0} \text{size}(p_1, p_0)$ is attained for equal proportions $p = p_1 = p_0$ and is still controlled by $\alpha$.
However, Boschloo stated that for small sample sizes, the maximal size is often considerably smaller than $\alpha$. This leads to an undesirable loss of power.
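The gap between nominal level and actual size can be checked numerically. The sketch below assumes the second sampling case (two independent binomial groups with a common success probability), uses hypothetical group sizes of 5 and 5, and approximates the maximum over the nuisance parameter on a finite grid. The function names are illustrative.

```python
from math import comb

def fisher_p(x1, x0, n1, n0):
    """One-sided Fisher p-value P(X >= x1), X ~ Hypergeometric(n1 + n0, n1, x1 + x0)."""
    s1 = x1 + x0
    upper = min(n1, s1)
    tail = sum(comb(n1, k) * comb(n0, s1 - k) for k in range(x1, upper + 1))
    return tail / comb(n1 + n0, s1)

def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

def fisher_size(alpha, n1, n0, p):
    """Probability that Fisher's test rejects at level alpha when p1 = p0 = p."""
    return sum(
        binom_pmf(x1, n1, p) * binom_pmf(x0, n0, p)
        for x1 in range(n1 + 1)
        for x0 in range(n0 + 1)
        if fisher_p(x1, x0, n1, n0) <= alpha
    )

alpha, n1, n0 = 0.05, 5, 5            # hypothetical example values
grid = [i / 200 for i in range(201)]  # grid approximation of [0, 1]
max_size = max(fisher_size(alpha, n1, n0, p) for p in grid)
print(f"maximal size on the grid: {max_size:.4f} (nominal level {alpha})")
```

With these example values the grid maximum is 11/1024, roughly 0.011, so Fisher's test uses only about a fifth of the permitted error probability.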
Boschloo proposed to use Fisher's exact test with a greater nominal level $\alpha^* > \alpha$. Here, $\alpha^*$ should be chosen as large as possible such that the maximal size is still controlled by $\alpha$: $\max_{p \in [0, 1]} \text{size}(p) \le \alpha$. This method was especially advantageous at the time of Boschloo's publication because $\alpha^*$ could be looked up for common values of $\alpha$, $n_1$ and $n_0$. This made performing Boschloo's test computationally easy.
Test statistic
The decision rule of Boschloo's approach is based on Fisher's exact test. An equivalent way of formulating the test is to use the p-value of Fisher's exact test as the test statistic. Fisher's p-value is calculated from the hypergeometric distribution (for ease of notation we write $x_1, x_0$ instead of $x_{11}, x_{01}$):
: $p_F = 1 - F_{\text{Hypergeometric}(n, n_1, x_1 + x_0)}(x_1 - 1)$
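As an illustration, Fisher's one-sided p-value can be evaluated as one minus the hypergeometric distribution function at the observed count minus one. This is a minimal sketch in exact rational arithmetic; the function names and the trial counts are hypothetical and chosen only for the example.

```python
from fractions import Fraction
from math import comb

def hypergeom_cdf(k, n, n1, s1):
    """F(k) = P(X <= k) for X ~ Hypergeometric(n, n1, s1), as an exact fraction."""
    lo = max(0, n1 + s1 - n)   # smallest possible value of X
    hi = min(k, n1, s1)        # truncate the sum at k and at the support's upper end
    num = sum(comb(n1, j) * comb(n - n1, s1 - j) for j in range(lo, hi + 1))
    return Fraction(num, comb(n, s1))

def fisher_p_value(x1, x0, n1, n0):
    """One-sided Fisher p-value: 1 - F(x1 - 1), conditioning on s1 = x1 + x0."""
    return 1 - hypergeom_cdf(x1 - 1, n1 + n0, n1, x1 + x0)

# Hypothetical trial: 7 of 10 cured under treatment, 2 of 10 under placebo.
p_f = fisher_p_value(7, 2, 10, 10)
print(p_f, float(p_f))
```

Exact fractions avoid rounding issues when p-values from different tables are later compared or sorted.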
The distribution of $p_F$ is determined by the binomial distributions of $x_1$ and $x_0$ and depends on the unknown nuisance parameter $p$. For a specified significance level $\alpha$, the critical value of $p_F$ is the maximal value $\alpha^*$ that satisfies $\max_{p \in [0, 1]} P(p_F \le \alpha^*) \le \alpha$. The critical value $\alpha^*$ is equal to the nominal level of Boschloo's original approach.
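The search for the critical value can be sketched as follows. Two simplifications are assumptions of this sketch rather than part of the source: candidate critical values are restricted to the values that Fisher's p-value actually attains, and the maximum over the nuisance parameter is approximated on a finite grid. The function names and the group sizes are hypothetical.

```python
from fractions import Fraction
from math import comb

def fisher_p(x1, x0, n1, n0):
    """Exact one-sided Fisher p-value P(X >= x1), X ~ Hypergeometric(n1 + n0, n1, x1 + x0)."""
    s1 = x1 + x0
    num = sum(comb(n1, k) * comb(n0, s1 - k) for k in range(x1, min(n1, s1) + 1))
    return Fraction(num, comb(n1 + n0, s1))

def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

def max_size(region, n1, n0, grid):
    """Largest rejection probability over the grid of common success probabilities."""
    return max(
        sum(binom_pmf(x1, n1, p) * binom_pmf(x0, n0, p) for x1, x0 in region)
        for p in grid
    )

def boschloo_critical_value(alpha, n1, n0, grid_points=200):
    """Largest attained Fisher p-value whose rejection region keeps the size <= alpha."""
    grid = [i / grid_points for i in range(grid_points + 1)]
    p_f = {(x1, x0): fisher_p(x1, x0, n1, n0)
           for x1 in range(n1 + 1) for x0 in range(n0 + 1)}
    alpha_star = None
    # Rejection regions grow with the candidate, so stop at the first violation.
    for candidate in sorted(set(p_f.values())):
        region = [table for table, value in p_f.items() if value <= candidate]
        if max_size(region, n1, n0, grid) > alpha:
            break
        alpha_star = candidate
    return alpha_star

alpha, n1, n0 = 0.05, 5, 5   # hypothetical example values
alpha_star = boschloo_critical_value(alpha, n1, n0)
print(alpha_star, float(alpha_star))
```

For these example values the search returns 1/12, which exceeds the nominal level 0.05: the resulting test rejects for five of the 36 possible tables, whereas Fisher's test at level 0.05 rejects for only three. A finer grid or a numerical optimizer would be needed where the size comes close to the significance level.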
Modification
Boschloo's test deals with the unknown nuisance parameter $p$
by taking the maximum over the whole parameter space