In statistics, the false discovery rate (FDR) is a method of conceptualizing the rate of type I errors in null hypothesis testing when conducting multiple comparisons. FDR-controlling procedures are designed to control the FDR, which is the expected proportion of "discoveries" (rejected null hypotheses) that are false (incorrect rejections of the null).
Equivalently, the FDR is the expected ratio of the number of false positive classifications (false discoveries) to the total number of positive classifications (rejections of the null). The total number of rejections of the null includes both the number of false positives (FP) and true positives (TP). Simply put, the FDR is the expected value of FP / (FP + TP). FDR-controlling procedures provide less stringent control of Type I errors compared to family-wise error rate (FWER) controlling procedures (such as the Bonferroni correction), which control the probability of ''at least one'' Type I error. Thus, FDR-controlling procedures have greater power, at the cost of increased numbers of Type I errors.
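The difference between the two error criteria can be illustrated for a single experiment. The following is a minimal sketch in Python, not a standard library routine: the function names and the ten test outcomes are hypothetical, and the ratio is taken to be 0 when nothing is rejected (an assumed convention). The FDR is the expectation of this ratio over repeated experiments, whereas the FWER is the probability of the "at least one false positive" event.

```python
def false_discovery_proportion(rejected, is_true_null):
    """Return FP / (FP + TP) for one experiment, or 0.0 if nothing was rejected."""
    fp = sum(1 for r, n in zip(rejected, is_true_null) if r and n)
    tp = sum(1 for r, n in zip(rejected, is_true_null) if r and not n)
    total = fp + tp
    # Assumed convention: the proportion is 0 when there are no discoveries.
    return fp / total if total else 0.0


def any_false_positive(rejected, is_true_null):
    """Return True if at least one true null was rejected (the event the FWER bounds)."""
    return any(r and n for r, n in zip(rejected, is_true_null))


# Hypothetical outcome of 10 tests: which nulls were rejected, and which are really true.
rejected = [True, True, True, True, False, False, False, False, False, False]
is_true_null = [False, False, False, True, False, True, True, True, True, True]

fdp = false_discovery_proportion(rejected, is_true_null)  # 1 false positive among 4 rejections
fwer_event = any_false_positive(rejected, is_true_null)
print(fdp, fwer_event)
```

In this made-up experiment a quarter of the discoveries are false, which an FDR criterion may tolerate, while the single false positive already counts as a failure under the FWER criterion.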
History
Technological motivations
The modern widespread use of the FDR is believed to stem from, and be motivated by, the development in technologies that allowed the collection and analysis of a large number of distinct variables in several individuals (e.g., the expression level of each of 10,000 different genes in 100 different persons).
By the late 1980s and 1990s, the development of "high-throughput" sciences, such as genomics, allowed for rapid data acquisition. This, coupled with the growth in computing power, made it possible to seamlessly perform a very high number of statistical tests on a given data set. The technology of microarrays was a prototypical example, as it enabled thousands of genes to be tested simultaneously for differential expression between two biological conditions.
As high-throughput technologies became common, technological and/or financial constraints led researchers to collect datasets with relatively small sample sizes (e.g. few individuals being tested) and large numbers of variables being measured per sample (e.g. thousands of gene expression levels). In these datasets, too few of the measured variables showed statistical significance after classic correction for multiple tests with standard multiple comparison procedures. This created a need within many scientific communities to abandon FWER and unadjusted multiple hypothesis testing for other ways to highlight and rank in publications those variables showing marked effects across individuals or treatments that would otherwise be dismissed as non-significant after standard correction for multiple tests. In response to this, a variety of error rates have been proposed, and have become commonly used in publications, that are less conservative than FWER in flagging possibly noteworthy observations. The FDR is useful when researchers are looking for "discoveries" that will give them follow-up work (e.g., detecting promising genes for follow-up studies), and are interested in controlling the proportion of "false leads" they are willing to accept.
Literature
The FDR concept was formally described by Yoav Benjamini and Yosef Hochberg in 1995 (BH procedure) as a less conservative and arguably more appropriate approach for identifying the important few from the trivial many effects tested. The FDR has been particularly influential, as it was the first alternative to the FWER to gain broad acceptance in many scientific fields (especially in the life sciences, from genetics to biochemistry, oncology and plant sciences).
In 2005, the Benjamini and Hochberg paper from 1995 was identified as one of the 25 most-cited statistical papers.
Prior to the 1995 introduction of the FDR concept, various precursor ideas had been considered in the statistics literature. In 1979, Holm proposed the Holm procedure, a stepwise algorithm for controlling the FWER that is at least as powerful as the well-known Bonferroni adjustment. This stepwise algorithm sorts the ''p''-values and sequentially rejects the hypotheses starting from the smallest ''p''-values.
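The stepwise idea can be sketched as follows. This is a minimal illustration in Python under the usual step-down formulation, in which the ''k''-th smallest ''p''-value is compared with α/(''m'' − ''k'' + 1) and testing stops at the first failure; the function names, the default level of 0.05 and the example ''p''-values are assumptions made for the illustration, not taken from the original papers.

```python
def bonferroni_reject(p_values, alpha=0.05):
    """Reject every hypothesis whose p-value is at most alpha / m."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def holm_reject(p_values, alpha=0.05):
    """Holm step-down: test sorted p-values against alpha / (m - rank), stop at first failure."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices from smallest p-value up
    reject = [False] * m
    for rank, i in enumerate(order):  # rank is 0 for the smallest p-value
        if p_values[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # all larger p-values are retained as well
    return reject


# Hypothetical p-values from four tests.
p_values = [0.01, 0.04, 0.015, 0.005]
print(bonferroni_reject(p_values))
print(holm_reject(p_values))
```

On these example values Holm rejects all four hypotheses while Bonferroni rejects only two, which illustrates why the stepwise procedure is never less powerful than the single-threshold adjustment.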
Benjamini (2010) said that the false discovery rate, and the paper Benjamini and Hochberg (1995), had their origins in two papers concerned with multiple testing:
* The first paper is by Schweder and Spjotvoll (1982), who suggested plotting the ranked ''p''-values and assessing the number of true null hypotheses (<math>m_0</math>) via an eye-fitted line starting from the largest ''p''-values. The ''p''-values that deviate from this straight line then should correspond to the false null hypotheses. This idea was later developed into an algorithm and incorporated the estimation of <math>m_0</math> into procedures such as Bonferroni, Holm or Hochberg. This idea is closely related to the graphical interpretation of the BH procedure.
* The second paper is by Branko Soric (1989), which introduced the terminology of "discovery" in the multiple hypothesis testing context. Soric used the expected number of false discoveries divided by the number of discoveries <math>\left(E[V]/R\right)</math> as a warning that "a large part of statistical discoveries may be wrong". This led Benjamini and Hochberg to the idea that a similar error rate, rather than being merely a warning, can serve as a worthy goal to control.
The BH procedure was proven to control the FDR for independent tests in 1995 by Benjamini and Hochberg.
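The BH procedure is named here but not spelled out. The following is a minimal sketch in Python of its usual step-up formulation: with ''m'' sorted ''p''-values and target level ''q'', find the largest ''k'' such that the ''k''-th smallest ''p''-value is at most ''kq''/''m'', and reject the ''k'' hypotheses with the smallest ''p''-values. The function name, the default level and the example values are assumptions made for the illustration.

```python
def benjamini_hochberg_reject(p_values, q=0.05):
    """BH step-up: reject the k smallest p-values, k = largest rank with p_(k) <= k * q / m."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices from smallest p-value up
    k_max = 0
    for k, i in enumerate(order, start=1):
        if p_values[i] <= k * q / m:
            k_max = k  # keep scanning: a later rank may still satisfy its threshold
    reject = [False] * m
    for i in order[:k_max]:
        reject[i] = True
    return reject


# Hypothetical p-values from five tests.
p_values = [0.025, 0.035, 0.028, 0.004, 0.2]
print(benjamini_hochberg_reject(p_values))
```

In this example the second-smallest value (0.025) exceeds its own threshold of 0.02, yet it is still rejected because the fourth-smallest value (0.035) meets its threshold of 0.04. A step-down FWER procedure such as Holm's would stop at the first failure and reject only the smallest ''p''-value.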
In 1986, R. J. Simes offered the same procedure as the "Simes procedure", in order to control the FWER in the weak sense (under the intersection null hypothesis) when the statistics are independent.
Definitions
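Based on definitions below we can define <math>Q</math> as the proportion of false discoveries among the discoveries (rejections of the null hypothesis):
: <math>Q = \frac{V}{R} = \frac{V}{V+S},</math>
where <math>V</math> is the number of false discoveries, <math>S</math> is the number of true discoveries, and <math>R = V + S</math> is the total number of discoveries.
The false discovery rate (FDR) is then simply:
: <math>\mathrm{FDR} = Q_e = \mathrm{E}[Q],</math>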
where