In statistics, the family-wise error rate (FWER) is the probability of making one or more false discoveries, or type I errors, when performing multiple hypothesis tests.
Familywise and Experimentwise Error Rates
Tukey (1953) developed the concept of a familywise error rate as the probability of making a Type I error among a specified group, or "family," of tests.
Based on Tukey (1953), Ryan (1959) proposed the related concept of an ''experimentwise error rate'', which is the probability of making a Type I error in a given experiment.
Hence, an experimentwise error rate is a familywise error rate for all of the tests that are conducted within an experiment.
As Ryan (1959, Footnote 3) explained, an experiment may contain two or more families of multiple comparisons, each of which relates to a particular statistical inference and each of which has its own separate familywise error rate.
Hence, familywise error rates are usually based on theoretically informative collections of multiple comparisons. In contrast, an experimentwise error rate may be based on a coincidental collection of comparisons that refer to a diverse range of separate inferences. Consequently, some have argued that it may not be useful to control the experimentwise error rate.
Indeed, Tukey was against the idea of experimentwise error rates (Tukey, 1956, personal communication, in Ryan, 1962, p. 302).
Background
Within the statistical framework, there are several definitions for the term "family":
* Hochberg & Tamhane (1987) defined "family" as "any collection of inferences for which it is meaningful to take into account some combined measure of error".
* According to Cox (1982), a set of inferences should be regarded as a family:
# To take into account the selection effect due to data dredging
# To ensure simultaneous correctness of a set of inferences so as to guarantee a correct overall decision
To summarize, a family could best be defined by the potential selective inference that is being faced: A family is the smallest set of items of inference in an analysis, interchangeable about their meaning for the goal of research, from which selection of results for action, presentation or highlighting could be made (Yoav Benjamini).
Classification of multiple hypothesis tests
Definition
The FWER is the probability of making at least one type I error in the family,
: <math>\mathrm{FWER} = \Pr(V \ge 1),</math>
or equivalently,
: <math>\mathrm{FWER} = 1 - \Pr(V = 0),</math>
where <math>V</math> is the number of type I errors, i.e. the number of true null hypotheses that are rejected.
Thus, by assuring <math>\mathrm{FWER} \le \alpha</math>, the probability of making one or more type I errors in the family is controlled at level <math>\alpha</math>.
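As an illustration of why such control is needed, consider a minimal sketch under assumptions that are not part of the definition itself: the tests are independent, every null hypothesis is true, and each test is run at the same unadjusted level. The probability of no type I error is then (1 − α)^m, so the FWER is 1 − (1 − α)^m. The function name below is hypothetical.

```python
def fwer_independent(alpha: float, m: int) -> float:
    """FWER for m independent tests, each run at level alpha,
    assuming every null hypothesis is true (illustrative assumption)."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if m < 1:
        raise ValueError("m must be a positive integer")
    # Pr(no type I error) = (1 - alpha)^m under independence.
    return 1.0 - (1.0 - alpha) ** m


if __name__ == "__main__":
    for m in (1, 5, 20, 100):
        print(m, round(fwer_independent(0.05, m), 4))
```

With 20 unadjusted tests at level 0.05, the chance of at least one false discovery under these assumptions is already about 0.64.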
A procedure controls the FWER ''in the weak sense'' if the FWER control at level <math>\alpha</math> is guaranteed ''only'' when all <math>m</math> null hypotheses are true (i.e. when the number of true null hypotheses <math>m_0</math> equals <math>m</math>, meaning the "global null hypothesis" is true).
A procedure controls the FWER ''in the strong sense'' if the FWER control at level <math>\alpha</math> is guaranteed for ''any'' configuration of true and non-true null hypotheses (whether the global null hypothesis is true or not).
Controlling procedures
Several classical procedures ensure strong level-<math>\alpha</math> FWER control, and some newer solutions also exist.
The Bonferroni procedure
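* Denote by <math>p_i</math> the ''p''-value for testing <math>H_i</math>, where <math>m</math> is the number of hypotheses in the family
* reject <math>H_i</math> if <math>p_i \le \frac{\alpha}{m}</math>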
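The rule above can be sketched in a few lines of Python. This is an illustrative sketch, not a reference implementation; the function name and the example ''p''-values are made up for the demonstration.

```python
from typing import List, Sequence


def bonferroni_reject(p_values: Sequence[float], alpha: float = 0.05) -> List[bool]:
    """Return one rejection flag per hypothesis: reject H_i when p_i <= alpha / m."""
    m = len(p_values)
    if m == 0:
        return []
    threshold = alpha / m  # the same cut-off is applied to every test
    return [p <= threshold for p in p_values]


if __name__ == "__main__":
    example = [0.001, 0.012, 0.03, 0.2]  # hypothetical p-values
    print(bonferroni_reject(example))    # threshold is 0.05 / 4 = 0.0125
```

Because every hypothesis is compared with the same fixed threshold, the decision for one test does not depend on the other ''p''-values.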
The Šidák procedure
* Testing each hypothesis at level <math>\alpha_{SID} = 1 - (1 - \alpha)^{\frac{1}{m}}</math> is Šidák's multiple testing procedure.
* This procedure is more powerful than Bonferroni, but the gain is small.
* This procedure can fail to control the FWER when the tests are negatively dependent.
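To see how small the gain over Bonferroni is, the sketch below compares the two per-test levels, assuming the usual Šidák level 1 − (1 − α)^(1/m). The function names are hypothetical. Under independence and a true global null, testing at the Šidák level gives an FWER of exactly α, which the last line checks numerically.

```python
def sidak_level(alpha: float, m: int) -> float:
    """Per-test level used by the Šidák procedure."""
    return 1.0 - (1.0 - alpha) ** (1.0 / m)


def bonferroni_level(alpha: float, m: int) -> float:
    """Per-test level used by the Bonferroni procedure."""
    return alpha / m


if __name__ == "__main__":
    alpha, m = 0.05, 10
    s, b = sidak_level(alpha, m), bonferroni_level(alpha, m)
    print(f"Sidak: {s:.6f}  Bonferroni: {b:.6f}")
    # With m independent tests and all nulls true, the FWER at the
    # Šidák level recovers alpha (up to floating-point error).
    print(1.0 - (1.0 - s) ** m)
```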
Tukey's procedure
* Tukey's procedure is only applicable for pairwise comparisons.
* It assumes independence of the observations being tested, as well as equal variation across observations (homoscedasticity).
* The procedure calculates for each pair the studentized range statistic: <math>\frac{Y_A - Y_B}{SE}</math> where <math>Y_A</math> is the larger of the two means being compared, <math>Y_B</math> is the smaller, and <math>SE</math> is the standard error of the data in question.
* Tukey's test is essentially a Student's ''t''-test, except that it corrects for the family-wise error rate.
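The statistic for each pair can be computed as in the following sketch. It only evaluates (larger mean − smaller mean) / SE for every pair; the standard error is taken as a given input, and the comparison against a critical value of the studentized range distribution is deliberately not shown. The function name, group labels and numbers are hypothetical.

```python
from itertools import combinations
from typing import Dict, Tuple


def studentized_range_stats(means: Dict[str, float], se: float) -> Dict[Tuple[str, str], float]:
    """Studentized range statistic for every pair of group means.

    `se` is the standard error supplied by the caller (an assumption of
    this sketch; how it is estimated is not covered here).
    """
    if se <= 0:
        raise ValueError("se must be positive")
    # abs() gives (larger mean - smaller mean) regardless of pair order.
    return {
        (a, b): abs(means[a] - means[b]) / se
        for a, b in combinations(sorted(means), 2)
    }


if __name__ == "__main__":
    group_means = {"A": 10.0, "B": 12.5, "C": 11.0}  # made-up data
    for pair, q in studentized_range_stats(group_means, se=0.5).items():
        print(pair, q)
```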
Holm's step-down procedure (1979)
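* Start by ordering the ''p''-values (from lowest to highest) <math>P_{(1)} \ldots P_{(m)}</math> and let the associated hypotheses be <math>H_{(1)} \ldots H_{(m)}</math>
* Let <math>k</math> be the minimal index such that <math>P_{(k)} > \frac{\alpha}{m+1-k}</math>
* Reject the null hypotheses <math>H_{(1)} \ldots H_{(k-1)}</math>. If <math>k = 1</math> then none of the hypotheses are rejected. If no such index exists, all of the hypotheses are rejected.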
This procedure is uniformly more powerful than the Bonferroni procedure.
This procedure controls the family-wise error rate for all ''m'' hypotheses at level ''α'' in the strong sense because it is a closed testing procedure in which each intersection hypothesis is tested using the simple Bonferroni test.
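The step-down rule can be sketched as follows, assuming the usual Holm thresholds α/(m + 1 − k) for the k-th smallest ''p''-value. The function name and example values are illustrative only.

```python
from typing import List, Sequence


def holm_reject(p_values: Sequence[float], alpha: float = 0.05) -> List[bool]:
    """Holm step-down: walk the sorted p-values from smallest to largest
    and stop at the first one that exceeds alpha / (m + 1 - rank)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    reject = [False] * m
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] > alpha / (m + 1 - rank):
            break  # this and all larger p-values are not rejected
        reject[idx] = True
    return reject


if __name__ == "__main__":
    example = [0.01, 0.04, 0.03, 0.005]  # hypothetical p-values
    print(holm_reject(example))
```

In the example the sorted ''p''-values 0.005 and 0.01 pass their thresholds (0.0125 and about 0.0167), while 0.03 exceeds 0.025, so the procedure stops there.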
Hochberg's step-up procedure
Hochberg's step-up procedure (1988) is performed using the following steps:
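* Start by ordering the ''p''-values (from lowest to highest) <math>P_{(1)} \ldots P_{(m)}</math> and let the associated hypotheses be <math>H_{(1)} \ldots H_{(m)}</math>
* For a given <math>\alpha</math>, let <math>R</math> be the largest <math>k</math> such that <math>P_{(k)} \le \frac{\alpha}{m-k+1}</math>
* Reject the null hypotheses <math>H_{(1)} \ldots H_{(R)}</math>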
Hochberg's procedure is more powerful than Holm's. Nevertheless, while Holm's is a closed testing procedure (and thus, like Bonferroni, has no restriction on the joint distribution of the test statistics), Hochberg's is based on the Simes test, so it holds only under non-negative dependence.
Dunnett's correction
Charles Dunnett (1955, 1966) described an alternative alpha error adjustment when ''k'' groups are compared to the same control group. Now known as Dunnett's test, this method is less conservative than the Bonferroni adjustment.
Scheffé's method
Resampling procedures
The procedures of Bonferroni and Holm control the FWER under any dependence structure of the ''p''-values (or equivalently the individual test statistics). Essentially, this is achieved by accommodating a "worst-case" dependence structure (which is close to independence for most practical purposes). But such an approach is conservative if dependence is actually positive. To give an extreme example, under perfect positive dependence, there is effectively only one test and thus, the FWER is uninflated.
Accounting for the dependence structure of the ''p''-values (or of the individual test statistics) produces more powerful procedures. This can be achieved by applying resampling methods, such as bootstrapping and permutation methods. The procedure of Westfall and Young (1993) requires a certain condition that does not always hold in practice (namely, subset pivotality). The procedures of Romano and Wolf (2005a,b) dispense with this condition and are thus more generally valid.
Harmonic mean ''p''-value procedure
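The harmonic mean ''p''-value (HMP) procedure provides a multilevel test that improves on the power of Bonferroni correction by assessing the significance of ''groups'' of hypotheses while controlling the strong-sense family-wise error rate. The significance of any subset <math>\mathcal{R}</math> of the <math>m</math> tests is assessed by calculating the HMP for the subset,
: <math>\overset{\circ}{p}_{\mathcal{R}} = \frac{\sum_{i \in \mathcal{R}} w_i}{\sum_{i \in \mathcal{R}} w_i / p_i},</math>
where <math>w_1, \dots, w_m</math> are weights that sum to one (i.e. <math>\sum_{i=1}^m w_i = 1</math>). An approximate procedure that controls the strong-sense family-wise error rate at level approximately <math>\alpha</math> rejects the null hypothesis that none of the ''p''-values in subset <math>\mathcal{R}</math> are significant when <math>\overset{\circ}{p}_{\mathcal{R}} \le \alpha \, w_{\mathcal{R}}</math> (where <math>w_{\mathcal{R}} = \sum_{i \in \mathcal{R}} w_i</math>). This approximation is reasonable for small <math>\alpha</math> (e.g. <math>\alpha < 0.05</math>) and becomes arbitrarily good as <math>\alpha</math> approaches zero. An asymptotically exact test is also available (see main article).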
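The approximate test can be sketched as below. The sketch assumes the weighted harmonic mean form of the HMP (sum of the subset's weights divided by the sum of weight over ''p''-value) and the approximate rejection rule "HMP ≤ α times the subset's total weight"; it does not implement the asymptotically exact test. Function names, weights and ''p''-values are hypothetical.

```python
from typing import Iterable, Sequence, Tuple


def harmonic_mean_p(p_values: Sequence[float], weights: Sequence[float],
                    subset: Iterable[int]) -> Tuple[float, float]:
    """Return (HMP of the subset, total weight of the subset).

    All p-values in the subset must be strictly positive.
    """
    idx = list(subset)
    w_r = sum(weights[i] for i in idx)
    hmp = w_r / sum(weights[i] / p_values[i] for i in idx)
    return hmp, w_r


def hmp_reject(p_values: Sequence[float], weights: Sequence[float],
               subset: Iterable[int], alpha: float = 0.05) -> bool:
    """Approximate test: reject for the subset when HMP <= alpha * w_R."""
    hmp, w_r = harmonic_mean_p(p_values, weights, subset)
    return hmp <= alpha * w_r


if __name__ == "__main__":
    p = [0.01, 0.02, 0.5, 0.8]    # made-up p-values
    w = [0.25, 0.25, 0.25, 0.25]  # equal weights summing to one
    for subset in ([0, 1], [2, 3], [0, 1, 2, 3]):
        print(subset, harmonic_mean_p(p, w, subset)[0], hmp_reject(p, w, subset))
```

Because the threshold scales with the subset's weight, the same function can be applied to nested groups of hypotheses, which is what makes the test multilevel.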
Alternative approaches
FWER control exerts a more stringent control over false discovery compared to false discovery rate (FDR) procedures. FWER control limits the probability of ''at least one'' false discovery, whereas FDR control limits (in a loose sense) the expected proportion of false discoveries. Thus, FDR procedures have greater power at the cost of increased rates of type I errors, i.e., rejecting null hypotheses that are actually true.
On the other hand, FWER control is less stringent than per-family error rate control, which limits the expected number of errors per family. Because FWER control is concerned with ''at least one'' false discovery, unlike per-family error rate control it does not treat multiple simultaneous false discoveries as any worse than one false discovery. The Bonferroni correction is often considered as merely controlling the FWER, but in fact also controls the per-family error rate.
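A short derivation sketch shows why. The notation is assumed here for illustration: <math>m</math> hypotheses are tested, <math>I_0</math> is the index set of the <math>m_0</math> true null hypotheses, <math>V</math> is the number of false rejections, and each ''p''-value under a true null satisfies <math>\Pr(p_i \le t) \le t</math>. When each hypothesis is rejected if <math>p_i \le \alpha/m</math>, linearity of expectation gives, with no assumption on dependence,
: <math>\operatorname{E}[V] = \sum_{i \in I_0} \Pr\left(p_i \le \frac{\alpha}{m}\right) \le m_0 \frac{\alpha}{m} \le \alpha.</math>
Since <math>\Pr(V \ge 1) \le \operatorname{E}[V]</math> by Markov's inequality, a bound on the per-family error rate implies the same bound on the FWER, but not the other way round.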
References
External links
Understanding Family Wise Error Rate- blog post including its utility relative to False Discovery Rate