In statistics, the false discovery rate (FDR) is a method of conceptualizing the rate of type I errors in null hypothesis testing when conducting multiple comparisons. FDR-controlling procedures are designed to control the FDR, which is the expected proportion of "discoveries" (rejected null hypotheses) that are false (incorrect rejections of the null). Equivalently, the FDR is the expected ratio of the number of false positive classifications (false discoveries) to the total number of positive classifications (rejections of the null). The total number of rejections of the null includes both the number of false positives (FP) and true positives (TP). Simply put, the FDR is the expected value of FP / (FP + TP), with the ratio taken to be 0 when there are no rejections. FDR-controlling procedures provide less stringent control of Type I errors compared to family-wise error rate (FWER) controlling procedures (such as the Bonferroni correction), which control the probability of ''at least one'' Type I error. Thus, FDR-controlling procedures have greater power, at the cost of increased numbers of Type I errors.

History

Technological motivations

The modern widespread use of the FDR is believed to stem from, and be motivated by, the development in technologies that allowed the collection and analysis of a large number of distinct variables in several individuals (e.g., the expression level of each of 10,000 different genes in 100 different persons). By the late 1980s and 1990s, the development of "high-throughput" sciences, such as genomics, allowed for rapid data acquisition. This, coupled with the growth in computing power, made it possible to seamlessly perform a very high number of statistical tests on a given data set. The technology of microarrays was a prototypical example, as it enabled thousands of genes to be tested simultaneously for differential expression between two biological conditions.

As high-throughput technologies became common, technological and/or financial constraints led researchers to collect datasets with relatively small sample sizes (e.g., few individuals being tested) and large numbers of variables being measured per sample (e.g., thousands of gene expression levels). In these datasets, too few of the measured variables showed statistical significance after classic correction for multiple tests with standard multiple comparison procedures. This created a need within many scientific communities to abandon FWER and unadjusted multiple hypothesis testing for other ways to highlight and rank in publications those variables showing marked effects across individuals or treatments that would otherwise be dismissed as non-significant after standard correction for multiple tests. In response to this, a variety of error rates have been proposed, and have become commonly used in publications, that are less conservative than FWER in flagging possibly noteworthy observations. The FDR is useful when researchers are looking for "discoveries" that will give them follow-up work (e.g., detecting promising genes for follow-up studies), and are interested in controlling the proportion of "false leads" they are willing to accept.

Literature

The FDR concept was formally described by Yoav Benjamini and Yosef Hochberg in 1995 (BH procedure) as a less conservative and arguably more appropriate approach for identifying the important few from the trivial many effects tested. The FDR has been particularly influential, as it was the first alternative to the FWER to gain broad acceptance in many scientific fields (especially in the life sciences, from genetics to biochemistry, oncology and plant sciences). In 2005, the Benjamini and Hochberg paper from 1995 was identified as one of the 25 most-cited statistical papers.

Prior to the 1995 introduction of the FDR concept, various precursor ideas had been considered in the statistics literature. In 1979, Holm proposed the Holm procedure, a stepwise algorithm for controlling the FWER that is at least as powerful as the well-known Bonferroni adjustment. This stepwise algorithm sorts the ''p''-values and sequentially rejects the hypotheses starting from the smallest ''p''-values.

Benjamini (2010) said that the false discovery rate, and the paper Benjamini and Hochberg (1995), had their origins in two papers concerned with multiple testing:
* The first paper is by Schweder and Spjotvoll (1982), who suggested plotting the ranked ''p''-values and assessing the number of true null hypotheses (m_0) via an eye-fitted line starting from the largest ''p''-values. The ''p''-values that deviate from this straight line should then correspond to the false null hypotheses. This idea was later developed into an algorithm and incorporated the estimation of m_0 into procedures such as Bonferroni, Holm or Hochberg. This idea is closely related to the graphical interpretation of the BH procedure.
* The second paper is by Branko Soric (1989), which introduced the terminology of "discovery" in the multiple hypothesis testing context. Soric used the expected number of false discoveries divided by the number of discoveries (E(V)/R) as a warning that "a large part of statistical discoveries may be wrong". This led Benjamini and Hochberg to the idea that a similar error rate, rather than being merely a warning, can serve as a worthy goal to control.

The BH procedure was proven to control the FDR for independent tests in 1995 by Benjamini and Hochberg. In 1986, R. J. Simes offered the same procedure as the "Simes procedure", in order to control the FWER in the weak sense (under the intersection null hypothesis) when the statistics are independent.

Definitions

Based on the definitions below, we can define Q as the proportion of false discoveries among the discoveries (rejections of the null hypothesis):

Q = V/R = V/(V+S),

where V is the number of false discoveries and S is the number of true discoveries. The false discovery rate (FDR) is then simply:

\mathrm{FDR} = Q_e = \mathrm{E}\!\left[Q\right],

where \mathrm{E}\!\left[Q\right] is the expected value of Q. The goal is to keep the FDR below a given threshold ''q''. To avoid division by zero, Q is defined to be 0 when R = 0. Formally,

\mathrm{FDR} = \mathrm{E}\!\left[V/R \mid R>0\right] \cdot \mathrm{P}\!\left(R>0\right).
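The definitions above can be made concrete in a short Python sketch. It assumes the truth of every null hypothesis is known, which holds only in simulation studies, never in real data; the function names are illustrative and not taken from any library.

```python
def false_discovery_proportion(rejected, is_true_null):
    """Realized Q = V / R for one experiment, with Q = 0 when R = 0.

    rejected[i] is True if hypothesis i was rejected (a discovery);
    is_true_null[i] is True if that null hypothesis is actually true.
    """
    r = sum(1 for rej in rejected if rej)
    v = sum(1 for rej, null in zip(rejected, is_true_null) if rej and null)
    return v / r if r > 0 else 0.0


def estimate_fdr(experiments):
    """Monte Carlo estimate of FDR = E[Q]: average Q over repeated
    (rejected, is_true_null) experiments."""
    qs = [false_discovery_proportion(rej, null) for rej, null in experiments]
    return sum(qs) / len(qs)


# One experiment with V = 1, R = 3 and one with no discoveries at all.
runs = [
    ([True, True, True, False], [True, False, False, True]),
    ([False, False], [True, True]),
]
print(estimate_fdr(runs))  # average of 1/3 and 0
```

The second experiment shows why the R = 0 convention matters: without it, its ratio would be the undefined 0/0.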

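Classification of multiple hypothesis tests

Suppose m null hypotheses are tested, of which m_0 are true and m - m_0 are false. The R rejected hypotheses (discoveries) split into V true nulls (false discoveries, i.e., type I errors) and S false nulls (true discoveries); the m - R non-rejected hypotheses split into U true nulls and T false nulls (type II errors). Thus V + U = m_0, S + T = m - m_0 and V + S = R. Only R is observed; V, S, U and T are unobservable random variables.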

Controlling procedures

The setting for many procedures is such that we have null hypotheses H_1 \ldots H_m tested and P_1 \ldots P_m their corresponding ''p''-values. We list these ''p''-values in ascending order and denote them by P_{(1)} \ldots P_{(m)}. A procedure that goes from a small test statistic to a large one (equivalently, from the largest ''p''-value toward the smallest) will be called a step-up procedure. In a similar way, in a "step-down" procedure we move from a large corresponding test statistic to a smaller one (from the smallest ''p''-value toward the largest).
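The difference between the two scanning directions can be seen in a minimal Python sketch. The function names are hypothetical, and the per-rank thresholds are passed in as an assumption of the sketch; the values used here are (k/m)·α with m = 4 and α = 0.05.

```python
def step_up(sorted_p, thresholds):
    """Scan from the largest p-value toward the smallest; the first rank k
    with P_(k) <= thresholds[k-1] means ranks 1..k are rejected."""
    for k in range(len(sorted_p), 0, -1):
        if sorted_p[k - 1] <= thresholds[k - 1]:
            return k
    return 0


def step_down(sorted_p, thresholds):
    """Scan from the smallest p-value upward; keep rejecting while
    P_(k) <= thresholds[k-1] and stop at the first failure."""
    k = 0
    while k < len(sorted_p) and sorted_p[k] <= thresholds[k]:
        k += 1
    return k


p_sorted = [0.005, 0.03, 0.032, 0.3]
alpha, m = 0.05, len(p_sorted)
thresholds = [k / m * alpha for k in range(1, m + 1)]
print(step_up(p_sorted, thresholds), step_down(p_sorted, thresholds))  # 3 1
```

With the same thresholds, the step-up scan rejects three hypotheses because 0.032 passes at rank 3 even though 0.03 failed at rank 2, while the step-down scan stops at that failure and rejects only one.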

Benjamini–Hochberg procedure

The ''Benjamini–Hochberg procedure'' (BH step-up procedure) controls the FDR at level α. It works as follows:
# For a given α, find the largest k such that P_{(k)} \leq \frac{k}{m} \alpha.
# Reject the null hypothesis (i.e., declare discoveries) for all H_{(i)} for i = 1, \ldots, k.
Geometrically, this corresponds to plotting P_{(k)} vs. k (on the y and x axes respectively), drawing the line through the origin with slope \frac{\alpha}{m}, and declaring discoveries for all points on the left, up to and including the last point that is not above the line. The BH procedure is valid when the m tests are independent, and also in various scenarios of dependence, but is not universally valid. It also satisfies the inequality:

E(Q) \leq \frac{m_0}{m}\alpha \leq \alpha
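The two steps of the procedure translate directly into code. The following is a minimal Python sketch of the BH step-up rule, not a reference implementation; the name `benjamini_hochberg` is hypothetical. Established libraries provide tested versions, for example `statsmodels.stats.multitest.multipletests` with `method="fdr_bh"`.

```python
def benjamini_hochberg(pvalues, alpha):
    """Return a list of booleans: True where BH at level alpha rejects H_i.

    Step-up rule: sort the p-values, find the largest rank k with
    P_(k) <= (k / m) * alpha, and reject the k smallest p-values.
    """
    m = len(pvalues)
    # Indices of hypotheses ordered by ascending p-value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    k = 0
    # Scan ranks from largest to smallest; stop at the first that passes.
    for rank in range(m, 0, -1):
        if pvalues[order[rank - 1]] <= rank / m * alpha:
            k = rank
            break
    rejected = [False] * m
    for i in order[:k]:
        rejected[i] = True
    return rejected


p = [0.3, 0.005, 0.032, 0.03]
print(benjamini_hochberg(p, alpha=0.05))  # [False, True, True, True]
```

In this example, 0.03 exceeds its own threshold (2/4) × 0.05 = 0.025 but is still rejected, because the larger rank 3 passes (0.032 ≤ 0.0375); this is what makes the procedure step-up.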

If an estimator of m_0 is inserted into the BH procedure, it is no longer guaranteed to achieve FDR control at the desired level. Adjustments may be needed in the estimator, and several modifications have been proposed. Note that the mean α for these m tests, that is, the average of the BH thresholds \frac{k}{m}\alpha over k = 1, \ldots, m, is

\frac{\alpha(m+1)}{2m},

the Mean(FDR α) or MFDR: α adjusted for m independent or positively correlated tests (see AFDR below). The MFDR expression here is for a single recomputed value of α and is not part of the Benjamini and Hochberg method.
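The closed form follows from \sum_{k=1}^m k = m(m+1)/2: the mean of the thresholds kα/m is (α/m²)·m(m+1)/2 = α(m+1)/(2m). A few lines of Python confirm this numerically; the helper name `mfdr` is hypothetical.

```python
def mfdr(alpha, m):
    """Mean of the BH thresholds (k/m)*alpha for k = 1..m: alpha*(m+1)/(2m)."""
    return alpha * (m + 1) / (2 * m)


alpha, m = 0.05, 10
thresholds = [k / m * alpha for k in range(1, m + 1)]
# The direct average and the closed form agree up to rounding error.
print(mfdr(alpha, m), sum(thresholds) / m)
```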

Benjamini–Yekutieli procedure

The ''Benjamini–Yekutieli'' procedure controls the false discovery rate under arbitrary dependence assumptions. This refinement modifies the threshold and finds the largest k such that:

P_{(k)} \leq \frac{k}{m \cdot c(m)} \alpha

* If the tests are independent or positively correlated (as in the Benjamini–Hochberg procedure): c(m) = 1.
* Under arbitrary dependence (including the case of negative correlation), c(m) is the harmonic number:

c(m) = \sum_{i=1}^m \frac{1}{i}.

Note that c(m) can be approximated by using the asymptotic expansion of the harmonic numbers and the Euler–Mascheroni constant (\gamma = 0.57721...):

\sum_{i=1}^m \frac{1}{i} \approx \ln(m) + \gamma + \frac{1}{2m}.
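How good this approximation is can be checked numerically. The Python sketch below compares the exact c(m) with ln(m) + γ + 1/(2m); the function names are hypothetical and γ is hard-coded to double precision.

```python
import math

# Euler–Mascheroni constant, to double precision.
EULER_GAMMA = 0.5772156649015329


def harmonic(m):
    """Exact c(m) = 1 + 1/2 + ... + 1/m."""
    return sum(1.0 / i for i in range(1, m + 1))


def harmonic_approx(m):
    """Approximation ln(m) + gamma + 1/(2m)."""
    return math.log(m) + EULER_GAMMA + 1.0 / (2 * m)


for m in (10, 100, 1000):
    print(m, harmonic(m), harmonic_approx(m), harmonic(m) - harmonic_approx(m))
```

The difference behaves roughly like -1/(12m²), the next term of the expansion, so the approximation is already accurate to about 10⁻³ at m = 10.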

Using the MFDR and the formulas above, an adjusted MFDR, or AFDR, is the min(mean α) for m dependent tests:

\mathrm{AFDR} = \frac{\mathrm{MFDR}}{c(m)} \approx \frac{\alpha(m+1)}{2m[\ln(m)+\gamma]+1}.

Another way to address dependence is by bootstrapping and rerandomization.
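A minimal Python sketch of the BY procedure follows. It is the BH step-up scan with every threshold divided by c(m), shown here with the harmonic-number choice for arbitrary dependence. The function names `benjamini_yekutieli` and `afdr` are hypothetical, and `afdr` uses the exact harmonic number rather than the ln(m) + γ approximation.

```python
def benjamini_yekutieli(pvalues, alpha):
    """BY step-up: reject the k smallest p-values, where k is the largest
    rank with P_(k) <= k / (m * c(m)) * alpha and c(m) = 1 + 1/2 + ... + 1/m."""
    m = len(pvalues)
    c_m = sum(1.0 / i for i in range(1, m + 1))
    order = sorted(range(m), key=lambda i: pvalues[i])
    k = 0
    for rank in range(m, 0, -1):
        if pvalues[order[rank - 1]] <= rank / (m * c_m) * alpha:
            k = rank
            break
    rejected = [False] * m
    for i in order[:k]:
        rejected[i] = True
    return rejected


def afdr(alpha, m):
    """AFDR = MFDR / c(m), with MFDR = alpha * (m + 1) / (2m)."""
    c_m = sum(1.0 / i for i in range(1, m + 1))
    return alpha * (m + 1) / (2 * m) / c_m


p = [0.3, 0.005, 0.032, 0.03]
print(benjamini_yekutieli(p, 0.05))  # [False, True, False, False]
print(afdr(0.05, 4))                 # about 0.015
```

With m = 4, c(4) = 25/12, so at α = 0.05 the BY thresholds are k × 0.006 and only the smallest p-value (0.005) is rejected.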

Properties

Adaptive and scalable

Using a multiplicity procedure that controls the FDR criterion is adaptive and scalable. This means that controlling the FDR can be very permissive (if the data justify it) or conservative (acting close to control of the FWER for sparse problems), all depending on the number of hypotheses tested and the level of significance.

The FDR criterion ''adapts'' so that the same number of false discoveries (V) will have different implications, depending on the total number of discoveries (R). This contrasts with the family-wise error rate criterion. For example, if inspecting 100 hypotheses (say, 100 genetic mutations or SNPs for association with some phenotype in some population):
* If we make 4 discoveries (R), having 2 of them be false discoveries (V) is often very costly. Whereas,
* If we make 50 discoveries (R), having 2 of them be false discoveries (V) is often not very costly.

The FDR criterion is ''scalable'' in that the same proportion of false discoveries out of the total number of discoveries (Q) remains sensible for different numbers of total discoveries (R). For example:
* If we make 100 discoveries (R), having 5 of them be false discoveries (q = 5%) may not be very costly.
* Similarly, if we make 1000 discoveries (R), having 50 of them be false discoveries (as before, q = 5%) may still not be very costly.
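The arithmetic behind these examples can be reproduced in a few lines of Python. Every case contains at least one false discovery, so a FWER criterion treats all four alike, whereas the realized proportion Q separates the costly case (50%) from the others (4% or 5%).

```python
# (V, R) pairs from the examples: V false discoveries out of R discoveries.
examples = [(2, 4), (2, 50), (5, 100), (50, 1000)]
proportions = {(v, r): v / r for v, r in examples}

for (v, r), q in proportions.items():
    # V >= 1 in every case, so each one would count as a family-wise error.
    print(f"V={v:>2} R={r:>4}  Q={q:.0%}  family-wise error: {v >= 1}")
```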

Dependency among the test statistics

Controlling the FDR using the linear step-up BH procedure at level q has several properties related to the dependency structure between the test statistics of the m null hypotheses that are being corrected for. If the test statistics are:
* Independent: \mathrm{FDR} \le \frac{m_0}{m}q
* Independent and continuous: \mathrm{FDR} = \frac{m_0}{m}q
* Positively dependent: \mathrm{FDR} \le \frac{m_0}{m}q
* In the general case: \mathrm{FDR} \le \frac{m_0}{m} q \left( 1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{m} \right) \approx \frac{m_0}{m} q \left( \ln(m) + \gamma + \frac{1}{2m} \right), where \gamma is the Euler–Mascheroni constant. Equivalently, running the procedure at level q / c(m) (the Benjamini–Yekutieli procedure) restores the bound \frac{m_0}{m}q.
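The equality for independent, continuous test statistics can be checked by simulation. The Python sketch below assumes one-sided z-tests whose alternatives are shifted by a hypothetical `shift`; all parameter values are arbitrary choices for illustration. With m_0/m = 15/20 and q = 0.1, the estimate should land near 0.075, up to Monte Carlo error.

```python
import math
import random


def bh_reject_count(pvalues, q):
    """Number of rejections R made by the BH step-up procedure at level q."""
    m = len(pvalues)
    ranked = sorted(pvalues)
    for k in range(m, 0, -1):
        if ranked[k - 1] <= k / m * q:
            return k
    return 0


def simulate_fdr(m=20, m0=15, q=0.1, shift=2.5, trials=5000, seed=1):
    """Monte Carlo estimate of the FDR of BH for independent one-sided z-tests.

    Statistics 0..m0-1 are true nulls drawn from N(0, 1); the remaining
    m - m0 are alternatives drawn from N(shift, 1).
    """
    rng = random.Random(seed)
    total_q = 0.0
    for _ in range(trials):
        z = [rng.gauss(0.0, 1.0) + (shift if i >= m0 else 0.0) for i in range(m)]
        # One-sided p-value 1 - Phi(z), written with erfc for accuracy.
        p = [0.5 * math.erfc(zi / math.sqrt(2.0)) for zi in z]
        r = bh_reject_count(p, q)
        if r == 0:
            continue  # Q = 0 when there are no rejections
        cutoff = sorted(p)[r - 1]
        # BH rejects the r smallest p-values (ties have probability 0 here).
        v = sum(1 for i in range(m0) if p[i] <= cutoff)
        total_q += v / r
    return total_q / trials


print(simulate_fdr())  # theory: (15 / 20) * 0.1 = 0.075
```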

Proportion of true hypotheses

If all of the null hypotheses are true (m_0 = m), then controlling the FDR at level q guarantees control over the FWER (this is also called "weak control of the FWER"):

\mathrm{FWER} = P\left( V \ge 1 \right) = E\left( \frac{V}{R} \right) = \mathrm{FDR} \le q,

simply because every rejection is then a false discovery (V = R), so the event of rejecting at least one true null hypothesis {V ≥ 1} is exactly the event {R > 0}, and the event {V = 0} is exactly the event {R = 0} (when V = R = 0, V/R = 0 by definition). But if there are some true discoveries to be made (m_0 < m), then FWER ≥ FDR. In that case there will be room for improving detection power. It also means that any procedure that controls the FWER will also control the FDR.
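A small Python simulation makes the identity concrete, assuming independent Uniform(0, 1) p-values for all m hypotheses; the function names and parameter values are hypothetical. Under independence, BH rejects at least one hypothesis exactly when the Simes test rejects the intersection null, so both estimates should be close to q.

```python
import random


def bh_reject_count(pvalues, q):
    """Number of rejections R made by the BH step-up procedure at level q."""
    m = len(pvalues)
    ranked = sorted(pvalues)
    for k in range(m, 0, -1):
        if ranked[k - 1] <= k / m * q:
            return k
    return 0


def global_null_rates(m=10, q=0.05, trials=20000, seed=7):
    """Estimate (FWER, FDR) of BH when all m null hypotheses are true.

    P-values are independent Uniform(0, 1). Every rejection is then a
    false discovery, so V = R and V/R equals 1 whenever R > 0.
    """
    rng = random.Random(seed)
    at_least_one = 0
    total_q = 0.0
    for _ in range(trials):
        r = bh_reject_count([rng.random() for _ in range(m)], q)
        v = r  # all nulls are true
        if v >= 1:
            at_least_one += 1
        total_q += v / r if r > 0 else 0.0
    return at_least_one / trials, total_q / trials


fwer, fdr = global_null_rates()
print(fwer, fdr)  # identical, and both near q = 0.05
```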

Related concepts

The discovery of the FDR was preceded and followed by many other types of error rates. These include:

* PCER (per-comparison error rate) is defined as: \mathrm{PCER} = E\left( \frac{V}{m} \right). Testing each hypothesis individually at level α guarantees that \mathrm{PCER} \le \alpha (this is testing without any correction for multiplicity).

* FWER (the family-wise error rate) is defined as: \mathrm{FWER} = P(V \ge 1). There are numerous procedures that control the FWER.

* k\text{-FWER} (the tail probability of the number of false discoveries V), suggested by Lehmann and Romano, van der Laan et al., is defined as: k\text{-FWER} = P(V \ge k) \le q.

* k\text{-FDR} (also called the ''generalized FDR'' by Sarkar in 2007) is defined as: k\text{-FDR} = E\left( \frac{V}{R} I_{(V \ge k)} \right) \le q.

* Q' is the proportion of false discoveries among the discoveries, suggested by Soric in 1989, and is defined as: Q' = \frac{E(V)}{R}. This is a mixture of expectations and realizations, and has the problem of control for m_0 = m.

* \mathrm{FDR}_{-1} (or Fdr) was used by Benjamini and Hochberg, and later called "Fdr" by Efron (2008) and earlier. It is defined as: \mathrm{FDR}_{-1} = Fdr = \frac{E(V)}{E(R)}. This error rate cannot be strictly controlled because it is 1 when m = m_0.

* \mathrm{FDR}_{+1} was used by Benjamini and Hochberg, and later called "pFDR" by Storey (2002). It is defined as: \mathrm{FDR}_{+1} = pFDR = E\left( \left. \frac{V}{R} \right| R>0 \right). This error rate cannot be strictly controlled because it is 1 when m = m_0. JD Storey promoted the use of the pFDR (a close relative of the FDR) and the q-value, which can be viewed as the proportion of false discoveries that we expect in an ordered table of results, up to the current line. Storey also promoted the idea (also mentioned by BH) that the actual number of null hypotheses, m_0, can be estimated from the shape of the probability distribution curve. For example, in a set of data where all null hypotheses are true, about 50% of results are expected to yield ''p''-values between 0.5 and 1.0 (and the other 50% between 0.0 and 0.5). We can therefore estimate m_0 by finding the number of results with P > 0.5 and doubling it, and this permits refinement of our calculation of the pFDR at any particular cut-off in the data set.

* False exceedance rate (the tail probability of the FDP), defined as: \mathrm{P}\left( \frac{V}{R} > q \right).

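* W\text{-FDR} (Weighted FDR). Associated with each hypothesis i is a weight w_i \ge 0; the weights capture importance/price. With R_i = 1 if H_i is rejected and V_i = 1 if H_i is a rejected true null (0 otherwise), the W-FDR is defined as: W\text{-FDR} = E\left( \frac{\sum w_i V_i}{\sum w_i R_i} \right).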

* FDCR (False Discovery Cost Rate). Stemming from statistical process control: associated with each hypothesis i is a cost c_i and with the intersection hypothesis H_{00} a cost c_0. The motivation is that stopping a production process may incur a fixed cost. It is defined as: \mathrm{FDCR} = E\left( c_0 V_0 + \frac{\sum_{i=1}^m c_i V_i}{c_0 R_0 + \sum_{i=1}^m c_i R_i} \right).

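* PFER (per-family error rate) is defined as: \mathrm{PFER} = E(V).

* FNR (false non-discovery rate), by Sarkar and by Genovese and Wasserman, is defined as: \mathrm{FNR} = E\left( \frac{T}{m - R} \right) = E\left( \frac{m - m_0 - (R - V)}{m - R} \right).

* \mathrm{Fdr}(z) is defined as: \mathrm{Fdr}(z) = \frac{p_0 F_0(z)}{F(z)}.

* \mathrm{fdr}: the local fdr is defined as: \mathrm{fdr}(z) = \frac{p_0 f_0(z)}{f(z)}, where p_0 is the proportion of true null hypotheses, f_0 is the density of the test statistic z under the null, f is its marginal (mixture) density, and F_0 and F are the corresponding cumulative distribution functions.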
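Storey's estimate of m_0 and its use at a chosen cut-off can be sketched in Python. This is a simplified illustration under stated assumptions: the tuning parameter `lam` (0.5 in the description above) and both function names are hypothetical, and `plug_in_fdr` computes the basic plug-in ratio (estimated m_0 times the cut-off t, divided by the number of discoveries), not Storey's exact pFDR estimator.

```python
def estimate_m0(pvalues, lam=0.5):
    """Estimate the number of true nulls from the p-values above lam.

    Null p-values are Uniform(0, 1), so a fraction (1 - lam) of them is
    expected above lam; with lam = 0.5 this is "count P > 0.5 and double".
    Assumes p-values from false nulls rarely exceed lam.
    """
    above = sum(1 for p in pvalues if p > lam)
    return min(len(pvalues), above / (1.0 - lam))


def plug_in_fdr(pvalues, t, lam=0.5):
    """Simplified plug-in estimate of the false discovery rate when
    rejecting all P <= t: (estimated m0 * t) / (number of rejections)."""
    r = sum(1 for p in pvalues if p <= t)
    if r == 0:
        return 0.0  # no discoveries at this cut-off
    return min(1.0, estimate_m0(pvalues, lam) * t / r)


p_values = [0.001, 0.002, 0.01, 0.03, 0.2, 0.45, 0.6, 0.7, 0.8, 0.95]
print(estimate_m0(p_values), plug_in_fdr(p_values, 0.01))
```

Here four of the ten p-values exceed 0.5, giving an estimate of m_0 = 8; rejecting the three p-values at or below t = 0.01 then gives an estimated rate of 8 × 0.01 / 3 ≈ 0.027.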

False coverage rate

The false coverage rate (FCR) is, in a sense, the FDR analog to the confidence interval. FCR indicates the average rate of false coverage, namely, not covering the true parameters, among the selected intervals. The FCR gives a simultaneous coverage at a 1 − α level for all of the parameters considered in the problem. Intervals with simultaneous coverage probability 1 − q can control the FCR to be bounded by ''q''. There are many FCR procedures such as: Bonferroni-Selected–Bonferroni-Adjusted, Adjusted BH-Selected CIs (Benjamini and Yekutieli (2005)), Bayes FCR (Yekutieli (2008)), and other Bayes methods.

Bayesian approaches

Connections have been made between the FDR and Bayesian approaches (including empirical Bayes methods), thresholding wavelet coefficients and model selection, and generalizing the confidence interval into the false coverage statement rate (FCR).

References

External links

False Discovery Rate Analysis in R – Lists links with popular R packages
False Discovery Rate Analysis in Python – Python implementations of false discovery rate procedures
False Discovery Rate: Corrected & Adjusted P-values – MATLAB/GNU Octave implementation and discussion on the difference between corrected and adjusted FDR p-values
Understanding False Discovery Rate – blog post
Understanding False Discovery Rate – Includes Excel VBA code to implement it, and an example in cell line development