
In statistics, hypotheses suggested by a given dataset, when tested with the same dataset that suggested them, are likely to be accepted even when they are not true. This is because circular reasoning (double dipping) is involved: something seems true in the limited data set; we therefore hypothesize that it is true in general; we then wrongly test it on the same limited data set, which seems to confirm that it is true. Generating hypotheses based on data already observed, without testing them on new data, is referred to as post hoc theorizing (from Latin ''post hoc'', "after this"). The correct procedure is to test any hypothesis on a data set that was not used to generate the hypothesis.
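As a minimal sketch of that procedure, assuming a single pool of already collected records, one can set aside a confirmation portion before any exploration begins. The function name `split_exploration_confirmation` and its parameters are illustrative, not taken from any particular library.

```python
import random


def split_exploration_confirmation(records, confirm_fraction=0.5, seed=0):
    """Randomly partition records into an exploration set and a held-out
    confirmation set. Hypotheses are generated on the first and tested
    only on the second."""
    if not 0 < confirm_fraction < 1:
        raise ValueError("confirm_fraction must lie strictly between 0 and 1")
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_confirm = round(len(shuffled) * confirm_fraction)
    return shuffled[n_confirm:], shuffled[:n_confirm]


# Hypothetical usage: 100 record identifiers, 30% held out for confirmation.
explore, confirm = split_exploration_confirmation(range(100), confirm_fraction=0.3)
```

The split has to be made before looking at the data. If the confirmation portion is inspected while hypotheses are being formed, it no longer counts as new data.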


The general problem

Testing a hypothesis suggested by the data can very easily result in false positives (type I errors). If one looks long enough and in enough different places, data can eventually be found to support any hypothesis. Yet these positive data do not by themselves constitute evidence that the hypothesis is correct. The negative test data that were thrown out are just as important, because they give one an idea of how common the positive results are compared to chance. Running an experiment, seeing a pattern in the data, proposing a hypothesis from that pattern, then using the ''same'' experimental data as evidence for the new hypothesis is extremely suspect, because data from all other experiments, completed or potential, have essentially been "thrown out" by choosing to look only at the experiments that suggested the new hypothesis in the first place.

A large set of tests as described above greatly inflates the probability of type I error, as all but the data most favorable to the hypothesis are discarded. This is a risk not only in hypothesis testing but in all statistical inference, as it is often difficult to describe accurately the process that has been followed in searching and discarding data. In other words, one wants to keep all data (regardless of whether they tend to support or refute the hypothesis) from "good tests", but it is sometimes difficult to figure out what a "good test" is. It is a particular problem in statistical modelling, where many different models are rejected by trial and error before publishing a result (see also overfitting, publication bias). The error is particularly prevalent in data mining and machine learning. It also commonly occurs in academic publishing, where only reports of positive, rather than negative, results tend to be accepted, resulting in the effect known as publication bias.
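The inflation of type I error can be illustrated with a small simulation. The sketch below rests on assumptions that are not part of the article: every group is drawn from the same standard normal distribution (so any apparent difference is noise), the variance is treated as known, and a two-sided z-test is used. The hypothesis "suggested by the data" is that the groups with the highest and lowest sample means differ. All function and parameter names are illustrative.

```python
import random
from statistics import NormalDist, mean


def two_sided_p(sample_a, sample_b):
    """Two-sided z-test p-value for equal means, assuming unit variance."""
    n_a, n_b = len(sample_a), len(sample_b)
    z = (mean(sample_a) - mean(sample_b)) / (1 / n_a + 1 / n_b) ** 0.5
    return 2 * (1 - NormalDist().cdf(abs(z)))


def draw_groups(rng, n_groups, n_per_group):
    """Draw groups of pure noise: every group shares the same true mean."""
    return [[rng.gauss(0, 1) for _ in range(n_per_group)] for _ in range(n_groups)]


def simulate(n_trials=2000, n_groups=10, n_per_group=20, alpha=0.05, seed=0):
    """Return (false positive rate on the same data, rate on fresh data)."""
    rng = random.Random(seed)
    same_hits = fresh_hits = 0
    for _ in range(n_trials):
        groups = draw_groups(rng, n_groups, n_per_group)
        means = [mean(g) for g in groups]
        # Let the data suggest the hypothesis: pick the most extreme pair.
        hi = max(range(n_groups), key=means.__getitem__)
        lo = min(range(n_groups), key=means.__getitem__)
        # Circular test: reuse the data that suggested the hypothesis.
        if two_sided_p(groups[hi], groups[lo]) < alpha:
            same_hits += 1
        # Sound test: new samples for the same two groups.
        fresh_hi, fresh_lo = draw_groups(rng, 2, n_per_group)
        if two_sided_p(fresh_hi, fresh_lo) < alpha:
            fresh_hits += 1
    return same_hits / n_trials, fresh_hits / n_trials


same_rate, fresh_rate = simulate()
print(f"same data: {same_rate:.3f}, fresh data: {fresh_rate:.3f}")
```

Under these assumptions the rejection rate on fresh data should stay near the nominal level `alpha`, while the rate on the reused data should be many times larger, even though no real effect exists in either case.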


Correct procedures

All strategies for sound testing of hypotheses suggested by the data involve including a wider range of tests in an attempt to validate or refute the new hypothesis. These include:
* Collecting confirmation samples
* Cross-validation
* Methods of compensation for multiple comparisons
* Simulation studies including adequate representation of the multiple testing actually involved

Henry Scheffé's simultaneous test of all contrasts in multiple comparison problems is the best-known remedy in the case of analysis of variance (Henry Scheffé, "A Method for Judging All Contrasts in the Analysis of Variance", ''Biometrika'', 40, pages 87–104 (1953)). It is a method designed for testing hypotheses suggested by the data while avoiding the fallacy described above.
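One simple method of compensating for multiple comparisons is the Bonferroni correction, in which each of ''m'' tests is judged against the threshold alpha / ''m'' rather than alpha. The sketch below is illustrative only: the function name `bonferroni_reject` and the p-values are invented for the example, and this is not Scheffé's method.

```python
def bonferroni_reject(p_values, alpha=0.05):
    """Return one boolean per test: True if the null hypothesis is rejected
    after a Bonferroni correction for len(p_values) comparisons."""
    m = len(p_values)
    if m == 0:
        return []
    threshold = alpha / m  # stricter per-test level
    return [p <= threshold for p in p_values]


# Hypothetical p-values from five tests run on the same data set.
p_values = [0.001, 0.012, 0.03, 0.2, 0.8]
naive = [p <= 0.05 for p in p_values]    # ignores the number of tests
corrected = bonferroni_reject(p_values)  # per-test threshold is 0.05 / 5 = 0.01
```

The correction only helps if ''m'' counts every comparison that was actually examined, including those that were looked at and then discarded.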


See also

* Bonferroni correction
* Data analysis
* Data dredging
* Exploratory data analysis
* HARKing
* ''p''-hacking
* Post hoc analysis
* Predictive analytics
* Texas sharpshooter fallacy
* Type I and type II errors
* Uncomfortable science

