Boschloo's test is a statistical hypothesis test for analysing 2 × 2 contingency tables. It examines the association of two Bernoulli distributed random variables and is a uniformly more powerful alternative to Fisher's exact test. It was proposed in 1970 by R. D. Boschloo.

Setting

A 2 × 2 contingency table visualizes <math>n</math> independent observations of two binary variables <math>A</math> and <math>B</math>:

: <math>
\begin{array}{c|cc|c}
& B = 1 & B = 0 & \mbox{Total}\\
\hline
A = 1 & x_{11} & x_{10} & n_1 \\
A = 0 & x_{01} & x_{00} & n_0 \\
\hline
\mbox{Total} & s_1 & s_0 & n\\
\end{array}
</math>

The probability distribution of such tables can be classified into three distinct cases.

# The row sums <math>n_1, n_0</math> and column sums <math>s_1, s_0</math> are fixed in advance and not random. Then all <math>x_{ij}</math> are determined by <math>x_{11}</math>. If <math>A</math> and <math>B</math> are independent, <math>x_{11}</math> follows a hypergeometric distribution with parameters <math>n, n_1, s_1</math>: <math>x_{11} \sim \mbox{Hypergeometric}(n, n_1, s_1)</math>.

# The row sums <math>n_1, n_0</math> are fixed in advance but the column sums <math>s_1, s_0</math> are not. Then all random parameters are determined by <math>x_{11}</math> and <math>x_{01}</math>, and <math>x_{11}, x_{01}</math> follow binomial distributions with probabilities <math>p_1, p_0</math>: <math>x_{11} \sim B(n_1, p_1)</math> and <math>x_{01} \sim B(n_0, p_0)</math>.

# Only the total number <math>n</math> is fixed but the row sums <math>n_1, n_0</math> and the column sums <math>s_1, s_0</math> are not. Then the random vector <math>(x_{11}, x_{10}, x_{01}, x_{00})</math> follows a multinomial distribution with probability vector <math>(p_{11}, p_{10}, p_{01}, p_{00})</math>.
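The three sampling schemes can be illustrated with a short simulation. The following is only a sketch: the margins, probabilities and seed are hypothetical values chosen for illustration, and NumPy's random generator is used as one convenient way to draw from the three distributions.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed, only for reproducibility
n1, n0 = 6, 4                    # hypothetical row sums
n = n1 + n0

# Case 1: row and column sums fixed, x11 is hypergeometric.
s1 = 5                           # hypothetical fixed column sum
x11 = rng.hypergeometric(ngood=n1, nbad=n0, nsample=s1)
x01 = s1 - x11
table_case1 = np.array([[x11, n1 - x11], [x01, n0 - x01]])

# Case 2: only row sums fixed, x11 and x01 are independent binomials.
p1, p0 = 0.6, 0.3                # hypothetical success probabilities
x11 = rng.binomial(n1, p1)
x01 = rng.binomial(n0, p0)
table_case2 = np.array([[x11, n1 - x11], [x01, n0 - x01]])

# Case 3: only the grand total fixed, the four cells are multinomial.
cell_probs = [0.3, 0.3, 0.1, 0.3]  # hypothetical (p11, p10, p01, p00)
table_case3 = rng.multinomial(n, cell_probs).reshape(2, 2)

print(table_case1, table_case2, table_case3, sep="\n\n")
```

In each case the constraints of the design can be read off the margins of the simulated table: both margins are reproduced in case 1, only the row sums in case 2, and only the grand total in case 3.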

Experiment type 1: Rare taste-test experiment, fully constrained

Fisher's exact test is designed for the first case and is therefore an exact conditional test (because it conditions on the column sums). The typical example of such a case is the Lady tasting tea experiment: A lady tastes 8 cups of tea with milk. In 4 of those cups the milk is poured in before the tea. In the other 4 cups the tea is poured in first. The lady tries to assign the cups to the two categories. Following our notation, the random variable <math>A</math> represents the used method (1 = milk first, 0 = milk last) and <math>B</math> represents the lady's guesses (1 = milk first guessed, 0 = milk last guessed). Then the row sums are the fixed numbers of cups prepared with each method: <math>n_1 = 4, n_0 = 4</math>. The lady knows that there are 4 cups in each category, so she will assign 4 cups to each method. Thus, the column sums are also fixed in advance: <math>s_1 = 4, s_0 = 4</math>. If she is not able to tell the difference, <math>A</math> and <math>B</math> are independent and the number <math>x_{11}</math> of correctly classified cups with milk first follows the hypergeometric distribution <math>\mbox{Hypergeometric}(8, 4, 4)</math>.
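As a small worked illustration of this distribution, the probabilities for the number of correctly identified milk-first cups can be tabulated with SciPy's `hypergeom`. The variable names are chosen here for illustration only.

```python
from scipy.stats import hypergeom

# Hypergeometric(8, 4, 4): 8 cups in total, 4 prepared milk-first,
# and 4 cups that the lady labels as milk-first.
lady = hypergeom(8, 4, 4)

# Probability of each possible number of correct milk-first guesses
pmf = {k: lady.pmf(k) for k in range(5)}

# Probability of classifying all four milk-first cups correctly by chance
p_all_correct = lady.sf(3)   # P(x11 >= 4)

for k, prob in pmf.items():
    print(k, round(prob, 4))
print("P(all correct) =", p_all_correct)
```

Under pure guessing, a perfect classification has probability 1/70, because exactly one of the 70 ways of choosing 4 cups out of 8 is entirely correct.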

Experiment type 2: Normal laboratory controlled experiment, only one margin constrained

Boschloo's test is designed for the second case and is therefore an exact unconditional test. Examples of such a case are often found in medical research, where a binary endpoint is compared between two patient groups. Following our notation, <math>A = 1</math> represents the first group that receives some medication of interest. <math>A = 0</math> represents the second group that receives a placebo. <math>B</math> indicates the cure of a patient (1 = cure, 0 = no cure). Then the row sums equal the group sizes and are usually fixed in advance. The column sums are the total numbers of cures and of disease continuations, respectively, and are not fixed in advance.

Experiment type 3: Field observation, no marginal constraints at all

Pearson's chi-squared test (without ''any'' "continuity correction") is the correct choice for the third case, where there are no constraints on either the row totals or the column totals. This third scenario describes most observational studies or "field observations", where data are collected as available in an uncontrolled environment.

For example, suppose one goes out collecting two species of butterfly that share some particular, predetermined color. The color can be recognized before capture, but it is ''not'' possible to distinguish whether a butterfly is species 1 or species 0 until it has been captured and closely examined: one can merely tell by its color that a butterfly being pursued must be one of the two species of interest. For any one day's session of butterfly collecting, one cannot predetermine how many of each species will be collected, only perhaps the total number of captures, depending on the collector's criterion for stopping. If the species are tallied in separate rows of the table, then the row sums are unconstrained and independently binomially distributed. The second distinction between the captured butterflies is whether the butterfly is female (type 1) or male (type 0), tallied in the columns. If determining its sex also requires close examination of the butterfly, that count is also independently binomially random.

Because of this experimental design, the column sums are unconstrained just as the row sums are: neither the count for either species nor the count of each sex among the captured butterflies is predetermined by the process of observation, and neither total constrains the other. The only possible constraint is the grand total of all butterflies captured, and even that may be unconstrained. One cannot reliably know beforehand, for any particular day in any particular meadow, how successful the pursuit will be, so it depends on whether the collection is limited by the time available to catch butterflies or by some predetermined total, perhaps chosen to ensure adequately significant statistics.

This type of 'experiment' (also called a "field observation") is almost entirely uncontrolled, hence some prefer to call it an 'observation' rather than an 'experiment'. All the numbers in the table are independently random. Each of the cells of the contingency table is a separate binomial probability, and neither Fisher's fully constrained 'exact' test nor Boschloo's partly constrained test is based on the statistics arising from this experimental design.

Pearson's chi-squared test is the appropriate test for an unconstrained observational study, and Pearson's test, in turn, employs the wrong statistical model for the other two types of experiment. (Note in passing that Pearson's chi-squared statistic should ''never'' have ''any'' "continuity correction" applied whatsoever, e.g. no "Yates' correction": the consequence of that "correction" is to distort its ''p''-values to match Fisher's test, i.e. to give the wrong answer.)
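For readers who want to run the uncorrected test in practice, the sketch below uses `scipy.stats.chi2_contingency`, which applies Yates' correction to 2 × 2 tables by default unless `correction=False` is passed. The butterfly counts are hypothetical and serve only to show the call.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical field counts: rows = species (1, 0), columns = sex (female, male)
table = np.array([[12, 8],
                  [5, 15]])

# Pearson's chi-squared test without any continuity correction
chi2_plain, p_plain, dof, expected = chi2_contingency(table, correction=False)

# The same call with SciPy's default Yates correction, for comparison only
chi2_yates, p_yates, _, _ = chi2_contingency(table, correction=True)

print("uncorrected:", chi2_plain, p_plain)
print("Yates:      ", chi2_yates, p_yates)
```

The corrected statistic is smaller and its ''p''-value larger, which is the shift described in the paragraph above.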

Test hypothesis

The null hypothesis of Boschloo's one-tailed test (high values of <math>x_1</math> favor the alternative hypothesis) is:

: <math>H_0: p_1 \le p_0</math>

The null hypothesis of the one-tailed test can also be formulated in the other direction (small values of <math>x_1</math> favor the alternative hypothesis):

: <math>H_0: p_1 \ge p_0</math>

The null hypothesis of the two-tailed test is:

: <math>H_0: p_1 = p_0</math>

There is no universal definition of the two-tailed version of Fisher's exact test. Since Boschloo's test is based on Fisher's exact test, a universal two-tailed version of Boschloo's test also does not exist. In the following we deal with the one-tailed test and <math>H_0: p_1 \le p_0</math>.

Boschloo's idea

We denote the desired significance level by <math>\alpha</math>. Fisher's exact test is a conditional test and appropriate for the first of the above mentioned cases. But if we treat the observed column sum <math>s_1</math> as fixed in advance, Fisher's exact test can also be applied to the second case. The true size of the test then depends on the nuisance parameters <math>p_1</math> and <math>p_0</math>. It can be shown that the size maximum <math>\max\limits_{p_1 \le p_0}\big(\mbox{size}(p_1, p_0)\big)</math> is taken for equal proportions <math>p = p_1 = p_0</math> and is still controlled by <math>\alpha</math>. However, Boschloo stated that for small sample sizes, the maximal size is often considerably smaller than <math>\alpha</math>. This leads to an undesirable loss of power.

Boschloo proposed to use Fisher's exact test with a greater nominal level <math>\alpha^* > \alpha</math>. Here, <math>\alpha^*</math> should be chosen as large as possible such that the maximal size is still controlled by <math>\alpha</math>: <math>\max\limits_{p \in [0, 1]}\big(\mbox{size}(p)\big) \le \alpha</math>. This method was especially advantageous at the time of Boschloo's publication because <math>\alpha^*</math> could be looked up for common values of <math>\alpha, n_1</math> and <math>n_0</math>. This made performing Boschloo's test computationally easy.
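The conservativeness of Fisher's exact test in the second case can be checked numerically. The sketch below is an illustration under stated assumptions, not Boschloo's original procedure: the helper names `fisher_p_value` and `fisher_size` are hypothetical, the group sizes are arbitrary, and the maximum over the nuisance parameter is approximated on a finite grid.

```python
import numpy as np
from scipy.stats import binom, hypergeom

def fisher_p_value(x1, x0, n1, n0):
    """One-sided Fisher p-value P(X >= x1), X ~ Hypergeometric(n, n1, x1 + x0)."""
    return hypergeom.sf(x1 - 1, n1 + n0, n1, x1 + x0)

def fisher_size(p, n1, n0, level):
    """Probability that Fisher's test rejects at `level` when p1 = p0 = p."""
    x1 = np.arange(n1 + 1)[:, None]          # all possible outcomes in group 1
    x0 = np.arange(n0 + 1)[None, :]          # all possible outcomes in group 0
    p_f = fisher_p_value(x1, x0, n1, n0)     # Fisher p-value of every table
    prob = binom.pmf(x1, n1, p) * binom.pmf(x0, n0, p)
    return prob[p_f <= level].sum()

alpha = 0.05
n1 = n0 = 10                                 # hypothetical group sizes
grid = np.linspace(0.01, 0.99, 99)           # grid approximation of [0, 1]
max_size = max(fisher_size(p, n1, n0, alpha) for p in grid)
print("approximate maximal size of Fisher's test:", max_size)
```

For small groups the printed value stays below the nominal level, and this gap is what Boschloo's larger nominal level is meant to recover.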

Test statistic

The decision rule of Boschloo's approach is based on Fisher's exact test. An equivalent way of formulating the test is to use the p-value of Fisher's exact test as test statistic. Fisher's p-value is calculated from the hypergeometric distribution (for ease of notation we write <math>x_1, x_0</math> instead of <math>x_{11}, x_{01}</math>):

: <math>p_F = 1 - F_{\mbox{Hypergeometric}(n, n_1, x_1 + x_0)}(x_1 - 1)</math>

The distribution of <math>p_F</math> is determined by the binomial distributions of <math>x_1</math> and <math>x_0</math> and depends on the unknown nuisance parameter <math>p</math>. For a specified significance level <math>\alpha</math>, the critical value of <math>p_F</math> is the maximal value <math>\alpha^*</math> that satisfies <math>\max\limits_{p \in [0, 1]} P(p_F \le \alpha^*) \le \alpha</math>. The critical value <math>\alpha^*</math> is equal to the nominal level of Boschloo's original approach.
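The same construction yields a p-value for an observed table: the maximum, over the nuisance parameter, of the probability that Fisher's p-value is at most its observed value. The following is a minimal sketch under assumptions: the function name `boschloo_p_value` is hypothetical, the supremum is approximated by a grid search (which can slightly underestimate it), and a small relative tolerance is used to treat numerically tied p-values as equal.

```python
import numpy as np
from scipy.stats import binom, hypergeom

def boschloo_p_value(x1_obs, x0_obs, n1, n0, grid_size=999):
    """Return (Fisher p-value, grid-approximated Boschloo p-value), one-sided."""
    x1 = np.arange(n1 + 1)[:, None]
    x0 = np.arange(n0 + 1)[None, :]
    # Fisher's one-sided p-value for every possible table (x1, x0)
    p_f = hypergeom.sf(x1 - 1, n1 + n0, n1, x1 + x0)
    p_f_obs = p_f[x1_obs, x0_obs]
    # Tables at least as extreme as the observed one under the p_F ordering
    extreme = p_f <= p_f_obs * (1 + 1e-12)
    best = 0.0
    # Interior grid over the nuisance parameter p = p1 = p0
    for p in np.linspace(0.0, 1.0, grid_size + 2)[1:-1]:
        prob = binom.pmf(x1, n1, p) * binom.pmf(x0, n0, p)
        best = max(best, prob[extreme].sum())
    return p_f_obs, best

# Hypothetical data: 7 of 10 events in group 1, 2 of 10 in group 0
fisher_p, boschloo_p = boschloo_p_value(7, 2, 10, 10)
print("Fisher:", fisher_p, "Boschloo:", boschloo_p)
```

Rejecting when this p-value is at most the significance level is equivalent to comparing <math>p_F</math> with the critical value <math>\alpha^*</math>. The Boschloo p-value can never exceed Fisher's p-value, which is why the test is at least as powerful.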

Modification

Boschloo's test deals with the unknown nuisance parameter <math>p</math> by taking the maximum over the whole parameter space <math>[0, 1]</math>. The Berger & Boos procedure takes a different approach by maximizing <math>P(p_F \le \alpha^*)</math> over a <math>(1 - \gamma)</math> confidence interval of <math>p = p_1 = p_0</math> and adding <math>\gamma</math>. <math>\gamma</math> is usually a small value such as 0.001 or 0.0001. This results in a modified Boschloo's test which is also exact.
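A p-value version of this modification can be sketched as follows. Several choices here are assumptions made for illustration and are not specified in the text: the function name `berger_boos_p_value` is hypothetical, a Clopper-Pearson interval based on the pooled count is used as the confidence interval for <math>p</math>, and the maximum is approximated on a grid.

```python
import numpy as np
from scipy.stats import beta, binom, hypergeom

def berger_boos_p_value(x1_obs, x0_obs, n1, n0, gamma=0.001, grid_size=1001):
    """Return (Fisher p-value, Berger-Boos modified Boschloo p-value)."""
    n, s = n1 + n0, x1_obs + x0_obs
    # Assumption: Clopper-Pearson (1 - gamma) interval for the common p
    lower = beta.ppf(gamma / 2, s, n - s + 1) if s > 0 else 0.0
    upper = beta.ppf(1 - gamma / 2, s + 1, n - s) if s < n else 1.0
    x1 = np.arange(n1 + 1)[:, None]
    x0 = np.arange(n0 + 1)[None, :]
    p_f = hypergeom.sf(x1 - 1, n, n1, x1 + x0)
    p_f_obs = p_f[x1_obs, x0_obs]
    extreme = p_f <= p_f_obs * (1 + 1e-12)
    best = 0.0
    # Maximize only over the confidence interval instead of all of [0, 1]
    for p in np.linspace(lower, upper, grid_size):
        prob = binom.pmf(x1, n1, p) * binom.pmf(x0, n0, p)
        best = max(best, prob[extreme].sum())
    # Add gamma to account for p falling outside the interval
    return p_f_obs, min(1.0, best + gamma)

fisher_p, bb_p = berger_boos_p_value(7, 2, 10, 10)   # hypothetical data
print("Fisher:", fisher_p, "Berger-Boos Boschloo:", bb_p)
```

Restricting the maximization can lower the p-value when the worst-case <math>p</math> is implausible given the data, while the added <math>\gamma</math> is the price paid for that restriction.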

Comparison to other exact tests

All exact tests hold the specified significance level but can have varying power in different situations. Mehrotra et al. compared the power of some exact tests in different situations. The results regarding Boschloo's test are summarized in the following.

Modified Boschloo's test

Boschloo's test and the modified Boschloo's test have similar power in all considered scenarios. Boschloo's test has slightly more power in some cases, and vice versa in some other cases.

Fisher's exact test

Boschloo's test is by construction uniformly more powerful than Fisher's exact test. For small sample sizes (e.g. 10 per group) the power difference is large, ranging from 16 to 20 percentage points in the regarded cases. The power difference is smaller for greater sample sizes.

Exact Z-Pooled test

This test is based on the test statistic

: <math>Z_P(x_1, x_0) = \frac{\hat p_1 - \hat p_0}{\sqrt{\tilde p (1 - \tilde p)\left(\frac{1}{n_1} + \frac{1}{n_0}\right)}},</math>

where <math>\hat p_i = \frac{x_i}{n_i}</math> are the group event rates and <math>\tilde p = \frac{x_1 + x_0}{n_1 + n_0}</math> is the pooled event rate. The power of this test is similar to that of Boschloo's test in most scenarios. In some cases, the <math>Z</math>-Pooled test has greater power, with differences mostly ranging from 1 to 5 percentage points. In very few cases, the difference goes up to 9 percentage points. This test can also be modified by the Berger & Boos procedure. However, the resulting test has very similar power to the unmodified test in all scenarios.
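A minimal sketch of the pooled statistic is given below. The function name `z_pooled` is hypothetical, and returning 0 for tables whose pooled rate is 0 or 1 is a convention assumed here, since the statistic is undefined in that case.

```python
import math

def z_pooled(x1, x0, n1, n0):
    """Pooled Z statistic for comparing two binomial proportions."""
    p1_hat, p0_hat = x1 / n1, x0 / n0          # group event rates
    p_pooled = (x1 + x0) / (n1 + n0)           # pooled event rate
    variance = p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n0)
    if variance == 0:
        # Assumed convention: no events or only events, no evidence either way
        return 0.0
    return (p1_hat - p0_hat) / math.sqrt(variance)

print(z_pooled(7, 2, 10, 10))   # hypothetical data
```

In the exact test this statistic only orders the possible tables by extremeness. The p-value is then obtained by maximizing over the nuisance parameter, as in Boschloo's test.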

Exact Z-Unpooled test

This test is based on the test statistic

: <math>Z_U(x_1, x_0) = \frac{\hat p_1 - \hat p_0}{\sqrt{\frac{\hat p_1 (1 - \hat p_1)}{n_1} + \frac{\hat p_0 (1 - \hat p_0)}{n_0}}},</math>

where <math>\hat p_i = \frac{x_i}{n_i}</math> are the group event rates. The power of this test is similar to that of Boschloo's test in many scenarios. In some cases, the <math>Z</math>-Unpooled test has greater power, with differences ranging from 1 to 5 percentage points. However, in some other cases, Boschloo's test has noticeably greater power, with differences up to 68 percentage points. This test can also be modified by the Berger & Boos procedure. The resulting test has similar power to the unmodified test in most scenarios. In some cases, the power is considerably improved by the modification but the overall power comparison to Boschloo's test remains unchanged.
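The unpooled statistic differs only in its variance estimate, as the following sketch shows. The function name `z_unpooled` is hypothetical, and the handling of a zero variance estimate (0 when the rates are equal, signed infinity otherwise) is an assumed convention, since implementations differ on this point.

```python
import math

def z_unpooled(x1, x0, n1, n0):
    """Unpooled Z statistic for comparing two binomial proportions."""
    p1_hat, p0_hat = x1 / n1, x0 / n0          # group event rates
    variance = p1_hat * (1 - p1_hat) / n1 + p0_hat * (1 - p0_hat) / n0
    diff = p1_hat - p0_hat
    if variance == 0:
        # Assumed convention for tables where both rates are 0 or 1
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / math.sqrt(variance)

print(z_unpooled(7, 2, 10, 10))   # hypothetical data
```

Because each group's variance is estimated separately, the statistic can become very large when one observed rate is close to 0 or 1.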

Software

The calculation of Boschloo's test can be performed in the following software:
* The function ''scipy.stats.boschloo_exact'' from SciPy
* Packages ''Exact'' and ''exact2x2'' of the programming language R
* StatXact
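As a usage sketch for the SciPy function, the example below runs the one-tailed test on hypothetical trial data. One point is an assumption based on a reading of the SciPy documentation and should be checked against the installed version: SciPy appears to treat the columns of the input table as the two groups, whereas this article places the groups in rows, so the table is transposed before the call.

```python
import numpy as np
from scipy.stats import boschloo_exact, fisher_exact

# Hypothetical data in the article's layout:
# rows = group (A = 1 medication, A = 0 placebo), columns = outcome (B = 1 cure, B = 0 no cure)
article_table = np.array([[7, 3],
                          [2, 8]])

# Assumption: SciPy expects the groups in columns, so transpose the table
scipy_table = article_table.T

# One-tailed test of H0: p1 <= p0 against p1 > p0
res = boschloo_exact(scipy_table, alternative="greater")
fisher_p = fisher_exact(scipy_table, alternative="greater")[1]

print("Boschloo p-value:", res.pvalue)
print("Fisher p-value:  ", fisher_p)
```

The Boschloo p-value should not exceed Fisher's one-sided p-value for the same table, which gives a quick sanity check on the orientation and the chosen alternative.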
