Fisher's method

In statistics, Fisher's method, also known as Fisher's combined probability test, is a technique for data fusion or "meta-analysis" (analysis of analyses). It was developed by and named for Ronald Fisher. In its basic form, it is used to combine the results from several independent tests bearing upon the same overall hypothesis (''H''0).


Application to independent test statistics

Fisher's method combines extreme value probabilities from each test, commonly known as "p-values", into one test statistic (''X''2) using the formula

X^2_{2k} \sim -2\sum_{i=1}^{k} \ln(p_i),

where ''p''''i'' is the p-value for the ''i''th hypothesis test. When the p-values tend to be small, the test statistic ''X''2 will be large, which suggests that the null hypotheses are not true for every test.

When all the null hypotheses are true, and the ''p''''i'' (or their corresponding test statistics) are independent, ''X''2 has a chi-squared distribution with 2''k'' degrees of freedom, where ''k'' is the number of tests being combined. This fact can be used to determine the p-value for ''X''2.

The distribution of ''X''2 is a chi-squared distribution for the following reason: under the null hypothesis for test ''i'', the p-value ''p''''i'' follows a uniform distribution on the interval [0, 1]. The negative natural logarithm of a uniformly distributed value follows an exponential distribution. Scaling a value that follows an exponential distribution by a factor of two yields a quantity that follows a chi-squared distribution with two degrees of freedom. Finally, the sum of ''k'' independent chi-squared values, each with two degrees of freedom, follows a chi-squared distribution with 2''k'' degrees of freedom.


Limitations of independence assumption

Dependence among statistical tests is generally positive, which means that the p-value of ''X''2 is anti-conservative (too small) if the dependence is not taken into account. Thus, if Fisher's method for independent tests is applied in a dependent setting, and the p-value is not small enough to reject the null hypothesis, then that conclusion will continue to hold even if the dependence is not properly accounted for. However, if positive dependence is not accounted for and the meta-analysis p-value is found to be small, the evidence against the null hypothesis is generally overstated. The mean false discovery rate, \alpha(k+1)/(2k) (\alpha reduced for ''k'' independent or positively correlated tests), may suffice to control alpha for useful comparison to an over-small p-value from Fisher's ''X''2.
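As an illustration of this adjusted level: with ''k'' = 10 tests and a nominal \alpha = 0.05, the bound is 0.05 × (10 + 1)/(2 × 10) = 0.0275, so a Fisher p-value would need to fall below this smaller threshold before being treated as convincing under positive dependence.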


Extension to dependent test statistics

In cases where the tests are not independent, the null distribution of ''X''2 is more complicated. A common strategy is to approximate the null distribution with a scaled chi-squared random variable whose moments match those of ''X''2. Different approaches may be used depending on whether or not the covariance between the different p-values is known. Brown's method can be used to combine dependent p-values whose underlying test statistics have a multivariate normal distribution with a known covariance matrix. Kost's method extends Brown's to allow one to combine p-values when the covariance matrix is known only up to a scalar multiplicative factor. The harmonic mean ''p''-value offers an alternative to Fisher's method for combining ''p''-values when the dependency structure is unknown but the tests cannot be assumed to be independent.
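To make the moment-matching idea concrete, here is a minimal Python sketch of the core of a Brown-style approximation. It assumes the pairwise covariance terms cov(−2 ln ''p''''i'', −2 ln ''p''''j'') have already been obtained (Brown derived a polynomial approximation for them from the correlations of the underlying normal test statistics); the function name and interface here are illustrative, not taken from any particular library.

```python
import numpy as np
from scipy.stats import chi2

def browns_method(p, sum_cov):
    """Approximate combined p-value for dependent tests (Brown-style sketch).

    p       : array of k possibly dependent p-values.
    sum_cov : sum over pairs i < j of cov(-2 ln p_i, -2 ln p_j),
              assumed precomputed from the known covariance matrix
              of the underlying multivariate normal test statistics.
    """
    k = len(p)
    x2 = -2.0 * np.sum(np.log(p))

    # Moments of X^2 under H0: independence gives mean 2k and
    # variance 4k; dependence adds twice the pairwise covariances.
    mean = 2.0 * k
    var = 4.0 * k + 2.0 * sum_cov

    # Match X^2 to c * chi-squared_f by equating mean and variance.
    c = var / (2.0 * mean)
    f = 2.0 * mean ** 2 / var
    return chi2.sf(x2 / c, f)
```

With sum_cov = 0 this reduces to the independent case (c = 1, f = 2''k''), which is a convenient sanity check.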


Interpretation

Fisher's method is typically applied to a collection of independent test statistics, usually from separate studies having the same null hypothesis. The meta-analysis null hypothesis is that all of the separate null hypotheses are true. The meta-analysis alternative hypothesis is that at least one of the separate ''alternative'' hypotheses is true.

In some settings, it makes sense to consider the possibility of "heterogeneity", in which the null hypothesis holds in some studies but not in others, or where different alternative hypotheses may hold in different studies. A common reason for the latter form of heterogeneity is that effect sizes may differ among populations. For example, consider a collection of medical studies looking at the risk of a high-glucose diet for developing type II diabetes. Due to genetic or environmental factors, the true risk associated with a given level of glucose consumption may be greater in some human populations than in others.

In other settings, the alternative hypothesis is either universally false or universally true; there is no possibility of it holding in some settings but not in others. For example, consider several experiments designed to test a particular physical law. Any discrepancies among the results from separate studies or experiments must be due to chance, possibly driven by differences in power.

In the case of a meta-analysis using two-sided tests, it is possible to reject the meta-analysis null hypothesis even when the individual studies show strong effects in differing directions. In this case, we are rejecting the hypothesis that the null hypothesis is true in every study, but this does not imply that there is a uniform alternative hypothesis that holds across all studies. Thus, two-sided meta-analysis is particularly sensitive to heterogeneity in the alternative hypotheses. One-sided meta-analysis can detect heterogeneity in the effect magnitudes, but focuses on a single, pre-specified effect direction.


Relation to Stouffer's Z-score method

A closely related approach to Fisher's method is Stouffer's Z, based on Z-scores rather than p-values, which allows the incorporation of study weights. It is named for the sociologist Samuel A. Stouffer. If we let ''Z''''i'' = \Phi^{-1}(1 - p_i), where \Phi is the standard normal cumulative distribution function, then

Z \sim \frac{\sum_{i=1}^{k} Z_i}{\sqrt{k}}

is a Z-score for the overall meta-analysis. This Z-score is appropriate for one-sided right-tailed p-values; minor modifications can be made if two-sided or left-tailed p-values are being analysed. Specifically, if two-sided p-values are being analysed, ''p''''i''/2 is used in place of ''p''''i''; if left-tailed p-values are used, 1 − ''p''''i'' is used.

Since Fisher's method is based on the average of −ln(''p''''i'') values, and the Z-score method is based on the average of the ''Z''''i'' values, the relationship between these two approaches follows from the relationship between ''z'' and −ln(''p'') = −ln(1 − \Phi(''z'')). For the normal distribution, these two values are not perfectly linearly related, but they follow a highly linear relationship over the range of Z-values most often observed, from 1 to 5. As a result, the power of the Z-score method is nearly identical to the power of Fisher's method.

One advantage of the Z-score approach is that it is straightforward to introduce weights. If the ''i''th Z-score is weighted by ''w''''i'', then the meta-analysis Z-score is

Z \sim \frac{\sum_{i=1}^{k} w_i Z_i}{\sqrt{\sum_{i=1}^{k} w_i^2}},

which follows a standard normal distribution under the null hypothesis. While weighted versions of Fisher's statistic can be derived, the null distribution becomes a weighted sum of independent chi-squared statistics, which is less convenient to work with.
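Both the unweighted and weighted forms are short to compute. The sketch below (again leaning on SciPy; the p-values and weights are arbitrary illustrative numbers) builds the weighted Stouffer Z from one-sided right-tailed p-values and compares it against `scipy.stats.combine_pvalues`, which also supports `method='stouffer'` with weights.

```python
import numpy as np
from scipy.stats import norm, combine_pvalues

# One-sided right-tailed p-values and illustrative study weights
# (weights might, e.g., be chosen proportional to sqrt of sample size)
p = np.array([0.01, 0.20, 0.03, 0.30])
w = np.array([4.0, 1.0, 3.0, 2.0])

# Z_i = Phi^{-1}(1 - p_i), i.e. the upper-tail standard normal quantile
z = norm.isf(p)

# Weighted meta-analysis Z-score and its one-sided p-value
z_meta = np.sum(w * z) / np.sqrt(np.sum(w ** 2))
p_meta = norm.sf(z_meta)
print(z_meta, p_meta)

# SciPy's built-in weighted Stouffer combination should agree
stat, pval = combine_pvalues(p, method='stouffer', weights=w)
print(stat, pval)
```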



See also

* Extensions of Fisher's method
* An alternative source for Fisher's 1948 note

* Fisher's, Stouffer's Z-score, and a few related methods are implemented in the metap R package.