Boschloo's test is a statistical hypothesis test for analysing 2x2 contingency tables. It examines the association of two Bernoulli distributed random variables and is a uniformly more powerful alternative to Fisher's exact test. It was proposed in 1970 by R. D. Boschloo.

Setting

A 2x2 contingency table visualizes $n$ independent observations of two binary variables $A$ and $B$:

: $\begin{array}{c|cc|c}
& B = 1 & B = 0 & \mbox{Total}\\
\hline
A = 1 & x_{11} & x_{10} & n_1 \\
A = 0 & x_{01} & x_{00} & n_0 \\
\hline
\mbox{Total} & s_1 & s_0 & n\\
\end{array}$

The probability distribution of such tables can be classified into three distinct cases.

# The row sums $n_1, n_0$ and column sums $s_1, s_0$ are fixed in advance and not random. Then all $x_{ij}$ are determined by $x_{11}$. If $A$ and $B$ are independent, $x_{11}$ follows a hypergeometric distribution with parameters $n, n_1, s_1$: $x_{11} \sim \mbox{Hypergeometric}(n, n_1, s_1)$.
# The row sums $n_1, n_0$ are fixed in advance but the column sums $s_1, s_0$ are not. Then all random entries of the table are determined by $x_{11}$ and $x_{01}$, which follow binomial distributions with probabilities $p_1, p_0$: $x_{11} \sim B(n_1, p_1)$ and $x_{01} \sim B(n_0, p_0)$.
# Only the total number $n$ is fixed but the row sums $n_1, n_0$ and the column sums $s_1, s_0$ are not. Then the random vector $(x_{11}, x_{10}, x_{01}, x_{00})$ follows a multinomial distribution with probability vector $(p_{11}, p_{10}, p_{01}, p_{00})$.

Fisher's exact test is designed for the first case and is therefore an exact conditional test (because it conditions on the column sums). The typical example of such a case is the Lady tasting tea: A lady tastes 8 cups of tea with milk. In 4 of those cups the milk is poured in before the tea. In the other 4 cups the tea is poured in first. The lady tries to assign the cups to the two categories. Following our notation, the random variable $A$ represents the method used (1 = milk first, 0 = milk last) and $B$ represents the lady's guesses (1 = milk first guessed, 0 = milk last guessed). Then the row sums are the fixed numbers of cups prepared with each method: $n_1 = 4, n_0 = 4$. The lady knows that there are 4 cups in each category, so she will assign 4 cups to each method. Thus, the column sums are also fixed in advance: $s_1 = 4, s_0 = 4$. If she is not able to tell the difference, $A$ and $B$ are independent and the number $x_{11}$ of correctly classified cups with milk first follows the hypergeometric distribution $\mbox{Hypergeometric}(8, 4, 4)$.

Boschloo's test is designed for the second case and is therefore an exact unconditional test. Examples of such a case are often found in medical research, where a binary endpoint is compared between two patient groups. Following our notation, $A = 1$ represents the first group, which receives some medication of interest. $A = 0$ represents the second group, which receives a placebo. $B$ indicates the cure of a patient (1 = cure, 0 = no cure). Then the row sums equal the group sizes and are usually fixed in advance. The column sums are the total numbers of cures and of disease continuations, respectively, and are not fixed in advance.

An example for the third case can be constructed as follows: Simultaneously flip two distinguishable coins $A$ and $B$ and do this $n$ times. If we count the number of results in our 2x2 table (1 = head, 0 = tail), we neither know in advance how often coin $A$ shows head or tail (row sums random), nor do we know how often coin $B$ shows head or tail (column sums random).
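As a small illustration of the first case, the hypergeometric probabilities of the Lady tasting tea example can be computed directly. The sketch below uses only the Python standard library; the helper name `hypergeom_pmf` is chosen here for illustration and does not come from any particular package.

```python
from math import comb

def hypergeom_pmf(k, n, n1, s1):
    """P(x11 = k) for x11 ~ Hypergeometric(n, n1, s1) (all margins fixed)."""
    if k < max(0, s1 - (n - n1)) or k > min(n1, s1):
        return 0.0
    return comb(n1, k) * comb(n - n1, s1 - k) / comb(n, s1)

# Lady tasting tea: n = 8 cups, n1 = 4 milk-first cups, s1 = 4 "milk first" guesses.
pmf = [hypergeom_pmf(k, 8, 4, 4) for k in range(5)]
for k, prob in enumerate(pmf):
    print(f"P(x11 = {k}) = {prob:.4f}")
```

Under independence, classifying all four milk-first cups correctly has probability 1/70, which is the one-sided p-value of Fisher's exact test for a perfect result.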

Test hypothesis

The null hypothesis of Boschloo's one-tailed test (high values of $x_1$, short for $x_{11}$, favor the alternative hypothesis) is:

: $H_0: p_1 \le p_0$

The null hypothesis of the one-tailed test can also be formulated in the other direction (small values of $x_1$ favor the alternative hypothesis):

: $H_0: p_1 \ge p_0$

The null hypothesis of the two-tailed test is:

: $H_0: p_1 = p_0$

There is no universal definition of the two-tailed version of Fisher's exact test. Since Boschloo's test is based on Fisher's exact test, a universal two-tailed version of Boschloo's test also doesn't exist. In the following we deal with the one-tailed test and $H_0: p_1 \le p_0$.

Boschloo's idea

We denote the desired significance level by $\alpha$. Fisher's exact test is a conditional test and appropriate for the first of the above mentioned cases. But if we treat the observed column sum $s_1$ as fixed in advance, Fisher's exact test can also be applied to the second case. The true size of the test then depends on the nuisance parameters $p_1$ and $p_0$. It can be shown that the size maximum $\max\limits_{p_1 \le p_0}\big(\mbox{size}(p_1, p_0)\big)$ is taken for equal proportions $p = p_1 = p_0$ and is still controlled by $\alpha$. However, Boschloo stated that for small sample sizes, the maximal size is often considerably smaller than $\alpha$. This leads to an undesirable loss of power.

Boschloo proposed using Fisher's exact test with a greater nominal level $\alpha^* > \alpha$. Here, $\alpha^*$ should be chosen as large as possible such that the maximal size is still controlled by $\alpha$:

: $\max\limits_{p \in [0, 1]}\big(\mbox{size}(p)\big) \le \alpha$

This method was especially advantageous at the time of Boschloo's publication because $\alpha^*$ could be looked up for common values of $\alpha, n_1$ and $n_0$. This made performing Boschloo's test computationally easy.
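The idea of raising the nominal level can be sketched numerically. The code below is an illustrative approximation, not Boschloo's tabulation method: it evaluates the size on an evenly spaced grid of $p$ (so the true maximum may be slightly underestimated), and the function names `fisher_p`, `max_size` and `raised_level` are hypothetical helpers introduced for this example.

```python
from math import comb

def fisher_p(x1, x0, n1, n0):
    """One-sided Fisher p-value P(X >= x1), X ~ Hypergeometric(n, n1, x1 + x0)."""
    s1 = x1 + x0
    tail = sum(comb(n1, k) * comb(n0, s1 - k) for k in range(x1, min(n1, s1) + 1))
    return tail / comb(n1 + n0, s1)

def max_size(level, n1, n0, grid=200):
    """Approximate maximal rejection probability over p = p1 = p0 in [0, 1]."""
    # Tables for which Fisher's test at the given nominal level rejects.
    region = [(a, b) for a in range(n1 + 1) for b in range(n0 + 1)
              if fisher_p(a, b, n1, n0) <= level + 1e-12]
    best = 0.0
    for i in range(grid + 1):
        p = i / grid
        # Both groups are binomial with the same success probability p.
        prob = sum(comb(n1, a) * comb(n0, b)
                   * p ** (a + b) * (1 - p) ** (n1 + n0 - a - b)
                   for a, b in region)
        best = max(best, prob)
    return best

def raised_level(alpha, n1, n0, grid=200):
    """Largest attainable Fisher p-value whose maximal size stays below alpha."""
    candidates = sorted({fisher_p(a, b, n1, n0)
                         for a in range(n1 + 1) for b in range(n0 + 1)})
    best = 0.0
    for c in candidates:
        if max_size(c, n1, n0, grid) > alpha:
            break  # the size is non-decreasing in the nominal level
        best = c
    return best

alpha, n1, n0 = 0.05, 10, 10
print("maximal size of Fisher's test:", max_size(alpha, n1, n0))
alpha_star = raised_level(alpha, n1, n0)
print("alpha*:", alpha_star, "maximal size:", max_size(alpha_star, n1, n0))
```

Only the attained Fisher p-values need to be tried as candidate levels, because the rejection region changes only at those values.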

Test statistic

The decision rule of Boschloo's approach is based on Fisher's exact test. An equivalent way of formulating the test is to use the p-value of Fisher's exact test as test statistic. Fisher's p-value is calculated from the hypergeometric distribution (for ease of notation we write $x_1, x_0$ instead of $x_{11}, x_{01}$):

: $p_F = 1-F_{\mbox{Hypergeometric}(n, n_1, x_1+x_0)}(x_1-1)$

The distribution of $p_F$ is determined by the binomial distributions of $x_1$ and $x_0$ and depends on the unknown nuisance parameter $p$. For a specified significance level $\alpha$, the critical value of $p_F$ is the maximal value $\alpha^*$ that satisfies $\max\limits_{p \in [0, 1]} P(p_F \le \alpha^*) \le \alpha$. The critical value $\alpha^*$ is equal to the nominal level of Boschloo's original approach.
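Using $p_F$ as test statistic also yields a p-value for an observed table: the maximum over $p$ of the probability that $p_F$ is at most its observed value. The following is a minimal sketch under the assumption that a plain grid over $[0, 1]$ approximates this maximum well enough; `boschloo_pvalue` and the example counts are illustrative, not taken from the source.

```python
from math import comb

def fisher_p(x1, x0, n1, n0):
    """One-sided Fisher p-value p_F = 1 - F(x1 - 1) under Hypergeometric(n, n1, x1 + x0)."""
    s1 = x1 + x0
    tail = sum(comb(n1, k) * comb(n0, s1 - k) for k in range(x1, min(n1, s1) + 1))
    return tail / comb(n1 + n0, s1)

def boschloo_pvalue(x1, x0, n1, n0, grid=200):
    """Return (Fisher p-value, grid approximation of Boschloo's p-value)."""
    observed = fisher_p(x1, x0, n1, n0)
    # All tables whose Fisher p-value is at least as extreme as the observed one.
    region = [(a, b) for a in range(n1 + 1) for b in range(n0 + 1)
              if fisher_p(a, b, n1, n0) <= observed * (1 + 1e-9)]
    best = 0.0
    for i in range(grid + 1):
        p = i / grid  # common success probability under p1 = p0 = p
        prob = sum(comb(n1, a) * comb(n0, b)
                   * p ** (a + b) * (1 - p) ** (n1 + n0 - a - b)
                   for a, b in region)
        best = max(best, prob)
    return observed, best

# Hypothetical data: 7 of 10 cures under treatment, 2 of 10 under placebo.
p_fisher, p_boschloo = boschloo_pvalue(7, 2, 10, 10)
print("Fisher:", p_fisher, "Boschloo (approx.):", p_boschloo)
```

Because Fisher's test never exceeds its nominal level, the value obtained this way cannot be larger than Fisher's p-value, which is the source of the power gain.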

Modification

Boschloo's test deals with the unknown nuisance parameter $p$ by taking the maximum over the whole parameter space $[0, 1]$. The Berger & Boos procedure takes a different approach by maximizing $P(p_F \le \alpha^*)$ over a $(1-\gamma)$ confidence interval of $p = p_1 = p_0$ and adding $\gamma$. Here, $\gamma$ is usually a small value such as 0.001 or 0.0001. This results in a modified Boschloo's test which is also exact.
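A rough sketch of the modified p-value is given below. Several choices are assumptions made for illustration and are not specified in the text above: the confidence interval is a Clopper-Pearson interval computed from the pooled count $x_1 + x_0$ out of $n$, its bounds are found by bisection, and the maximum is approximated on a grid. The function names are hypothetical.

```python
from math import comb

def fisher_p(x1, x0, n1, n0):
    """One-sided Fisher p-value P(X >= x1), X ~ Hypergeometric(n, n1, x1 + x0)."""
    s1 = x1 + x0
    tail = sum(comb(n1, k) * comb(n0, s1 - k) for k in range(x1, min(n1, s1) + 1))
    return tail / comb(n1 + n0, s1)

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ B(n, p)."""
    return sum(comb(n, j) * p ** j * (1 - p) ** (n - j) for j in range(k + 1))

def clopper_pearson(s, n, gamma):
    """(1 - gamma) Clopper-Pearson interval for p (assumed choice of interval)."""
    def root(f):
        # Bisection for an increasing function f on [0, 1].
        lo, hi = 0.0, 1.0
        for _ in range(60):
            mid = (lo + hi) / 2
            if f(mid) < 0:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2
    lower = 0.0 if s == 0 else root(lambda p: 1 - binom_cdf(s - 1, n, p) - gamma / 2)
    upper = 1.0 if s == n else root(lambda p: gamma / 2 - binom_cdf(s, n, p))
    return lower, upper

def berger_boos_pvalue(x1, x0, n1, n0, gamma=0.001, grid=200):
    """Maximize P(p_F <= observed) over the interval only, then add gamma."""
    observed = fisher_p(x1, x0, n1, n0)
    region = [(a, b) for a in range(n1 + 1) for b in range(n0 + 1)
              if fisher_p(a, b, n1, n0) <= observed * (1 + 1e-9)]
    lo, hi = clopper_pearson(x1 + x0, n1 + n0, gamma)
    best = 0.0
    for i in range(grid + 1):
        p = lo + (hi - lo) * i / grid
        prob = sum(comb(n1, a) * comb(n0, b)
                   * p ** (a + b) * (1 - p) ** (n1 + n0 - a - b)
                   for a, b in region)
        best = max(best, prob)
    return min(1.0, best + gamma)

# Hypothetical data: 7 of 10 cures under treatment, 2 of 10 under placebo.
print(berger_boos_pvalue(7, 2, 10, 10))
```

The added $\gamma$ compensates for the chance that the true $p$ lies outside the interval, which is why the result can never fall below $\gamma$.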

Comparison to other exact tests

All exact tests hold the specified significance level but can have varying power in different situations. Mehrotra et al. compared the power of some exact tests in different situations. The results regarding Boschloo's test are summarized in the following.

Modified Boschloo's test

Boschloo's test and the modified Boschloo's test have similar power in all considered scenarios. Boschloo's test has slightly more power in some cases, and vice versa in some other cases.

Fisher's exact test

Boschloo's test is by construction uniformly more powerful than Fisher's exact test. For small sample sizes (e.g. 10 per group) the power difference is large, ranging from 16 to 20 percentage points in the cases considered. The power difference is smaller for larger sample sizes.

Exact $Z$-Pooled test

This test is based on the test statistic

: $Z_P(x_1, x_0) = \frac{\hat p_1 - \hat p_0}{\sqrt{\tilde p(1 - \tilde p)\left(\frac{1}{n_1} + \frac{1}{n_0}\right)}},$

where $\hat p_i = \frac{x_i}{n_i}$ are the group event rates and $\tilde p = \frac{x_1 + x_0}{n_1 + n_0}$ is the pooled event rate. The power of this test is similar to that of Boschloo's test in most scenarios. In some cases, the $Z$-Pooled test has greater power, with differences mostly ranging from 1 to 5 percentage points. In very few cases, the difference goes up to 9 percentage points. This test can also be modified by the Berger & Boos procedure. However, the resulting test has very similar power to the unmodified test in all scenarios.
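For concreteness, the pooled statistic can be written as a short function. This is a sketch only: the name `z_pooled` is illustrative, and returning 0 when the pooled rate is 0 or 1 (where the denominator vanishes) is a convention assumed here.

```python
from math import sqrt

def z_pooled(x1, x0, n1, n0):
    """Z_P: difference of event rates scaled by the pooled standard error."""
    p1_hat, p0_hat = x1 / n1, x0 / n0
    p_pool = (x1 + x0) / (n1 + n0)
    variance = p_pool * (1 - p_pool) * (1 / n1 + 1 / n0)
    if variance == 0:
        return 0.0  # assumed convention: no events or only events in both groups
    return (p1_hat - p0_hat) / sqrt(variance)

# Hypothetical data: 7 of 10 events in group 1, 3 of 10 in group 0.
z = z_pooled(7, 3, 10, 10)
print(z)
```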

Exact $Z$-Unpooled test

This test is based on the test statistic

: $Z_U(x_1, x_0) = \frac{\hat p_1 - \hat p_0}{\sqrt{\frac{\hat p_1(1 - \hat p_1)}{n_1} + \frac{\hat p_0(1 - \hat p_0)}{n_0}}},$

where $\hat p_i = \frac{x_i}{n_i}$ are the group event rates. The power of this test is similar to that of Boschloo's test in many scenarios. In some cases, the $Z$-Unpooled test has greater power, with differences ranging from 1 to 5 percentage points. However, in some other cases, Boschloo's test has noticeably greater power, with differences up to 68 percentage points. This test can also be modified by the Berger & Boos procedure. The resulting test has similar power to the unmodified test in most scenarios. In some cases, the power is considerably improved by the modification but the overall power comparison to Boschloo's test remains unchanged.
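The unpooled statistic differs only in how the variance is estimated, using each group's own event rate. As before, this is a minimal sketch: `z_unpooled` is an illustrative name, and returning 0 for a zero denominator is an assumed convention.

```python
from math import sqrt

def z_unpooled(x1, x0, n1, n0):
    """Z_U: difference of event rates scaled by the unpooled standard error."""
    p1_hat, p0_hat = x1 / n1, x0 / n0
    variance = p1_hat * (1 - p1_hat) / n1 + p0_hat * (1 - p0_hat) / n0
    if variance == 0:
        return 0.0  # assumed convention: both group rates are exactly 0 or 1
    return (p1_hat - p0_hat) / sqrt(variance)

# Hypothetical data: 7 of 10 events in group 1, 3 of 10 in group 0.
z = z_unpooled(7, 3, 10, 10)
print(z)
```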

Software

The calculation of Boschloo's test can be performed in the following software:
* The function ''scipy.stats.boschloo_exact'' from SciPy
* Packages ''Exact'' and ''exact2x2'' of the programming language R
* StatXact
