Invalid science consists of scientific claims based on experiments that cannot be reproduced or that are contradicted by experiments that can be reproduced. Recent analyses indicate that the proportion of retracted claims in the scientific literature is steadily increasing. The number of retractions has grown tenfold over the past decade, but they still make up approximately 0.2% of the 1.4m papers published annually in scholarly journals. The U.S. Office of Research Integrity (ORI) investigates scientific misconduct.

Incidence

''Science'' magazine ranked first for the number of articles retracted, at 70, just edging out ''PNAS'', which retracted 69. Thirty-two of the ''Science'' retractions were due to fraud or suspected fraud, and 37 to error. A subsequent "retraction index" indicated that journals with relatively high impact factors, such as ''Science'', ''Nature'' and ''Cell'', had a higher rate of retractions. Fewer than 0.1% of the more than 25 million papers in PubMed, going back to the 1940s, have been retracted. According to a study of 2,047 retracted papers published since 1977, about two-thirds of retractions were due to scientific misconduct, which included fraud and plagiarism. Another one-fifth were retracted because of mistakes, and the rest were pulled for unknown or other reasons. A separate study analyzed 432 claims of genetic links for various health risks that vary between men and women; only one of these claims proved to be consistently reproducible. Another meta-review found that of the 49 most-cited clinical research studies published between 1990 and 2003, more than 40 percent were later shown to be either totally wrong or significantly incorrect.

Biological sciences

In 2012 the biotech firm Amgen was able to reproduce just six of 53 important studies in cancer research. Earlier, a group at Bayer, a drug company, had successfully repeated only one-fourth of 67 important papers. Between 2000 and 2010, roughly 80,000 patients took part in clinical trials based on research that was later retracted because of mistakes or improprieties.

Paleontology

Nathan Myhrvold repeatedly failed to replicate the findings of several papers on dinosaur growth. Dinosaurs added a layer to their bones each year, which allows researchers to estimate an animal's age from its bones and hence its growth rate. ''Tyrannosaurus rex'' was thought to have gained more than 700 kg a year, until Myhrvold showed that this figure was a factor of two too large. In 4 of the 12 papers he examined, the original data had been lost. In three, the statistics were correct, while three had serious errors that invalidated their conclusions. Two papers mistakenly relied on data from these three. He also discovered that some of the papers' graphs did not reflect the data: in one case, only four of the nine points on a graph came from data cited in the paper.

Major retractions

Torcetrapib was originally hyped as a drug that could block a protein that converts HDL cholesterol into LDL, with the potential to "redefine cardiovascular treatment". One clinical trial showed that the drug could increase HDL and decrease LDL. Two days after Pfizer announced its plans for the drug, it ended the Phase III clinical trial because of higher rates of chest pain and heart failure and a 60 percent increase in overall mortality. Pfizer had invested more than $1 billion in developing the drug. An in-depth review of the most highly cited biomarkers (whose presence is used to infer illness and measure treatment effects) claimed that 83 percent of supposed correlations became significantly weaker in subsequent studies.

Homocysteine is an amino acid whose levels correlate with heart disease. However, a 2010 study showed that lowering homocysteine by nearly 30 percent had no effect on heart attack or stroke.

Priming

Priming studies claim that decisions can be influenced by apparently irrelevant events that a subject witnesses just before making a choice. Nobel laureate Daniel Kahneman has argued that much of this work is poorly founded, and researchers have been unable to replicate some of the most widely cited examples. A paper in ''PLoS ONE'' reported that nine separate experiments could not reproduce a study purporting to show that thinking about a professor before taking an intelligence test leads to a higher score than imagining a football hooligan. A further systematic replication involving 40 labs around the world also failed to replicate the main finding. However, it did find that participants who did not believe that thinking about a hooligan or a professor was related to their test performance were significantly more susceptible to the priming manipulation.

Potential causes

Competition

In the 1950s, when academic research accelerated during the Cold War, the total number of scientists was a few hundred thousand. In the new century, 6m to 7m researchers are active, but the number of research jobs has not kept pace. Every year six new PhDs compete for every academic post. Replicating other researchers’ results is not perceived to be valuable. The struggle to compete encourages exaggeration of findings and biased data selection. A recent survey found that one in three researchers knows of a colleague who has at least somewhat distorted their results.

Publication bias

Major journals reject in excess of 90% of submitted manuscripts and tend to favor the most dramatic claims. The statistical measures that researchers use to test their claims allow a fraction of false claims to appear valid, and invalid claims are more likely to be dramatic (because they are false). Without replication, such errors are less likely to be caught. Conversely, failures to prove a hypothesis are rarely even offered for publication: “negative results” now account for only 14% of published papers, down from 30% in 1990. Yet knowing what is not true is as important as knowing what is true.
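The effect of a significance filter on what gets published can be illustrated with a toy simulation. The sketch below assumes a hypothetical setup that is not taken from the sources above: thousands of small studies each estimate the same small true effect (with normally distributed noise of standard deviation 1), and only those reaching two-sided significance at the 5% level are "published". The function name and parameter values are illustrative only.

```python
import math
import random
import statistics


def simulate_publication_bias(true_effect=0.2, n=20, studies=5000, z_crit=1.96, seed=1):
    """Toy model of a journal that publishes only significant results.

    Each hypothetical study estimates `true_effect` from n observations with
    unit-variance noise; a study is 'published' only if |z| > z_crit.
    """
    rng = random.Random(seed)
    se = 1 / math.sqrt(n)  # standard error of a mean of n unit-variance draws
    all_estimates, published = [], []
    for _ in range(studies):
        # Draw the study's sample mean directly from its sampling distribution.
        estimate = rng.gauss(true_effect, se)
        all_estimates.append(estimate)
        if abs(estimate / se) > z_crit:  # two-sided test at the 5% level
            published.append(estimate)
    return statistics.mean(all_estimates), statistics.mean(published), len(published)


all_mean, published_mean, n_published = simulate_publication_bias()
print(f"mean of all studies:       {all_mean:.2f}")
print(f"mean of published studies: {published_mean:.2f} ({n_published} published)")
```

Because only the estimates that happen to overshoot clear the threshold, the average published effect comes out well above the true value, even though no individual study in the model is fraudulent.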

Peer review

Peer review is the primary validation technique employed by scientific publications. However, when a prominent medical journal tested the system, it found major failings: it sent reviewers research containing deliberately introduced errors, and most reviewers failed to spot the mistakes, even after being told of the tests. A fabricated paper on the effects of a chemical derived from lichen on cancer cells, submitted under a pseudonym, was sent to 304 journals for peer review. The paper was filled with errors of study design, analysis and interpretation, yet 157 lower-rated journals accepted it. Another study sent an article containing eight deliberate mistakes in study design, analysis and interpretation to more than 200 regular reviewers of the ''British Medical Journal''. On average, they reported fewer than two of the problems. Peer reviewers typically do not re-analyse data from scratch; they check only that the authors’ analysis is properly conceived.

Statistics

Type I and type II errors

Scientists divide errors into type I, incorrectly asserting the truth of a hypothesis (a false positive), and type II, rejecting a correct hypothesis (a false negative). Statistical checks assess the probability that data which seem to support a hypothesis could have arisen simply by chance. If that probability is less than 5%, the evidence is rated “statistically significant”. One consequence of this definition is that a false hypothesis will still pass the test about one time in 20, a type I error rate of 5%.
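The one-in-20 figure can be checked with a short simulation. This is a minimal sketch under stated assumptions: each hypothetical experiment draws 30 observations from a standard normal distribution, so there is no real effect to find, and a two-sided z-test with known standard deviation stands in for whatever test a real study would use.

```python
import math
import random
from statistics import NormalDist


def false_positive_rate(trials=10_000, n=30, alpha=0.05, seed=42):
    """Estimate how often a two-sided z-test rejects a true null hypothesis.

    Every simulated experiment draws n observations from N(0, 1), so the
    claim 'the mean differs from zero' is false by construction.
    """
    rng = random.Random(seed)
    std_normal = NormalDist()
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        z = (sum(sample) / n) * math.sqrt(n)  # z-statistic with known sigma = 1
        p_value = 2 * (1 - std_normal.cdf(abs(z)))
        if p_value < alpha:
            rejections += 1  # a type I error: "significant" but false
    return rejections / trials


print(false_positive_rate())  # close to 0.05, i.e. about one in 20
```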

Statistical power

In 2005 Stanford epidemiologist John Ioannidis showed that the idea that only one paper in 20 gives a false-positive result was incorrect. He claimed that “most published research findings are probably false.” He identified three categories of problems: insufficient statistical power (the ability to avoid type II errors); the unlikeliness of the hypotheses being tested; and publication bias favoring novel claims.

A statistically powerful study can detect factors that have only small effects on the data. In general, studies that run the experiment more times, on more subjects, have greater power. A power of 0.8 means that of ten true hypotheses tested, the effects of two are expected to be missed. Ioannidis found that in neuroscience the typical statistical power is 0.21; another study found that psychology studies average 0.35.

Unlikeliness is a measure of the degree of surprise in a result. Scientists prefer surprising results, which leads them to test hypotheses that range from unlikely to very unlikely. Ioannidis claimed that in epidemiology, some one in ten hypotheses should be true. In exploratory disciplines like genomics, which rely on examining voluminous data about genes and proteins, only one in a thousand should prove correct.

In a discipline in which 100 out of 1,000 hypotheses are true, studies with a power of 0.8 will find 80 and miss 20. Of the 900 incorrect hypotheses, 5%, or 45, will be accepted because of type I errors. Adding the 45 false positives to the 80 true positives gives 125 positive results, of which 36% are specious. Dropping statistical power to 0.4, a level that is still optimistic for many fields, would produce the same 45 false positives but only 40 true positives, so fewer than half of the positive results would be true. Negative results are more reliable: a statistical power of 0.8 produces 875 negative results, of which only 20 are false, giving an accuracy of over 97%.

Negative results, however, account for a minority of published results, with the share varying by discipline. A study of 4,600 papers found that the proportion of published negative results dropped from 30% to 14% between 1990 and 2007. Subatomic physics sets an acceptable false-positive rate of one in 3.5m (known as the five-sigma standard), but even this does not provide perfect protection. According to one review, such problems invalidate some three-quarters of machine learning studies.
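The arithmetic of the worked example can be reproduced in a few lines. The sketch below uses a hypothetical helper, `screen`, which is not part of any library; it assumes, as the example does, that every hypothesis is tested once at a 5% significance level and that outcomes equal their expected values. It also checks the five-sigma figure, assuming a one-sided tail of the normal distribution.

```python
from statistics import NormalDist


def screen(hypotheses, true_fraction, power, alpha=0.05):
    """Expected outcomes when many hypotheses are each tested once."""
    true_h = hypotheses * true_fraction
    false_h = hypotheses - true_h
    true_pos = power * true_h        # real effects detected
    false_neg = true_h - true_pos    # real effects missed (type II errors)
    false_pos = alpha * false_h      # spurious findings (type I errors)
    true_neg = false_h - false_pos   # false hypotheses correctly rejected
    return {
        "true_pos": true_pos,
        "false_pos": false_pos,
        "share_of_positives_false": false_pos / (true_pos + false_pos),
        "negatives": true_neg + false_neg,
        "negative_accuracy": true_neg / (true_neg + false_neg),
    }


print(screen(1000, 0.1, power=0.8))
print(screen(1000, 0.1, power=0.4))

# Five sigma: one-sided tail probability beyond 5 standard deviations.
tail = 1 - NormalDist().cdf(5)
print(f"five-sigma false-positive rate: about 1 in {1 / tail / 1e6:.1f} million")
```

With power 0.8 the helper gives 45 false positives among 125 positives (36%) and a negative accuracy of about 97.7%; at power 0.4 the 45 false positives remain while true positives fall to 40. The last line reports roughly 1 in 3.5 million.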

Statistical significance

Statistical significance is a measure for testing statistical correlation. It was introduced by the English mathematician Ronald Fisher in the 1920s. It defines a result as “significant” if data at least as extreme would be produced by chance less than 5 (or, more stringently, 1) percent of the time. A significant result is widely seen as an important indicator that the correlation is not random. While correlations track the relationship between truly independent measurements, such as smoking and cancer, they are much less effective when variables cannot be isolated, a common circumstance in biological systems. For example, statistics found a high correlation between lower back pain and abnormalities in spinal discs, although it was later discovered that serious abnormalities were present in two-thirds of pain-free patients.
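A significant correlation shows only that a pattern is unlikely to be due to chance, not why the variables move together. The sketch below illustrates this with hypothetical data in which a hidden factor drives two measurements that do not affect each other. It computes significance with a permutation test, one of several ways to obtain a p-value, chosen here because it needs only the Python standard library; the helper names are illustrative.

```python
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def permutation_p_value(xs, ys, rounds=999, seed=0):
    """Two-sided p-value: the share of shuffled pairings whose correlation
    is at least as strong as the observed one."""
    rng = random.Random(seed)
    observed = abs(pearson_r(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(rounds):
        rng.shuffle(shuffled)  # break any real pairing between xs and ys
        if abs(pearson_r(xs, shuffled)) >= observed:
            hits += 1
    return (hits + 1) / (rounds + 1)


# Hypothetical data: a hidden factor drives both measurements,
# but neither measurement causes the other.
rng = random.Random(7)
hidden = [rng.gauss(0, 1) for _ in range(200)]
a = [h + rng.gauss(0, 1) for h in hidden]
b = [h + rng.gauss(0, 1) for h in hidden]

r = pearson_r(a, b)
p = permutation_p_value(a, b)
print(f"r = {r:.2f}, p = {p:.3f}")
```

The test correctly reports that the association is not random, but it cannot distinguish a direct link from a shared cause; that judgment requires isolating the variables, which is often impractical in biological systems.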

Minimal-threshold publishers

Journals such as PLoS One use a “minimal-threshold” standard, seeking to publish as much science as possible, rather than to pick out the best work. Their peer reviewers assess only whether a paper is methodologically sound. Almost half of their submissions are still rejected on that basis.

Unpublished research

Only 22% of the clinical trials financed by the National Institutes of Health (NIH) released summary results within one year of completion, even though the NIH requires it. Fewer than half had published within 30 months, and a third remained unpublished after 51 months. When other scientists rely on invalid research, they may waste time on lines of research that are themselves invalid. The failure to report failures means that researchers waste money and effort exploring blind alleys already investigated by other scientists.

Fraud

In 21 surveys of academics (mostly in the biomedical sciences but also in civil engineering, chemistry and economics) carried out between 1987 and 2008, 2% admitted fabricating data, but 28% claimed to know of colleagues who engaged in questionable research practices.

Lack of access to data and software

Clinical trials are generally too costly to rerun, so access to trial data is the only practical approach to reassessment. A campaign to persuade pharmaceutical firms to make all trial data available won its first convert in February 2013, when GlaxoSmithKline agreed to do so. Software used in a trial is generally considered to be proprietary intellectual property and is not available to replicators, further complicating matters. Journals that insist on data-sharing tend not to do the same for software. Even well-written papers may lack the detail or tacit knowledge (subtle skills and extemporisations not considered notable) needed for a replication to succeed. One cause of replication failure is insufficient control of the protocol, which can cause disputes between the original and replicating researchers.

Reform

Statistics training

Geneticists have begun more careful reviews, particularly of the use of statistical techniques. The effect was to stop a flood of specious results from genome sequencing.

Protocol registration

Registering research protocols in advance and monitoring them over the course of a study can prevent researchers from modifying the protocol midstream to highlight preferred results. Providing raw data for other researchers to inspect and test can also hold researchers to account more effectively.
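One reason midstream changes matter is that a study which measures several outcomes and reports whichever one reaches significance has a much higher chance of a false positive than the nominal 5%. The sketch below is a hypothetical illustration that assumes independent outcomes and no true effect at all; real outcome measures are usually correlated, which typically reduces the inflation without removing it.

```python
import random


def best_of_k_false_positive_rate(k, alpha=0.05, trials=20_000, seed=3):
    """Simulate studies with k outcome measures where no real effect exists.

    Under a true null hypothesis a p-value is uniformly distributed on [0, 1],
    so each outcome's p-value is drawn uniformly. A study 'succeeds' if any
    of its k outcomes falls below alpha (i.e. the best one is reported).
    """
    rng = random.Random(seed)
    successes = 0
    for _trial in range(trials):
        smallest_p = min(rng.random() for _outcome in range(k))
        if smallest_p < alpha:
            successes += 1
    return successes / trials


for k in (1, 5, 10):
    simulated = best_of_k_false_positive_rate(k)
    analytic = 1 - (1 - 0.05) ** k  # exact value for independent outcomes
    print(f"{k:>2} outcomes: simulated {simulated:.3f}, analytic {analytic:.3f}")
```

A protocol fixed in advance names the single primary outcome before the data arrive, which keeps the false-positive rate at the stated 5% rather than letting it grow with the number of outcomes examined.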

Post-publication review

Replacing peer review with post-publication evaluations can encourage researchers to think more about the long-term consequences of excessive or unsubstantiated claims. Physics and mathematics have adopted this approach with good results.

Replication

Few researchers, especially junior workers, seek opportunities to replicate others' work, partly to protect relationships with senior researchers. Reproduction benefits from access to the original study's methods and data. More than half of 238 biomedical papers published in 84 journals failed to identify all the resources (such as chemical reagents) necessary to reproduce the results. In 2008 some 60% of researchers said they would share raw data; in 2013 just 45% said so. Journals have begun to demand that at least some raw data be made available, although only 143 of 351 randomly selected papers covered by some data-sharing policy actually complied. The Reproducibility Initiative is a service allowing life scientists to pay to have their work validated by an independent lab. In October 2013 the initiative received funding to review 50 of the highest-impact cancer findings published between 2010 and 2012. ''Blog Syn'' is a website run by graduate students that is dedicated to reproducing chemical reactions reported in papers.

In 2013 replication efforts received greater attention. In May, ''Nature'' and related publications introduced an 18-point checklist for life science authors (the Reporting Checklist for Life Sciences Articles) in an effort to ensure that their published research can be reproduced. Expanded "methods" sections and all data were to be made available online. The Center for Open Science opened as an independent laboratory focused on replication. The journal ''Perspectives on Psychological Science'' announced a section devoted to replications. Another project announced plans to replicate 100 studies published in the first three months of 2008 in three leading psychology journals. Major funders, including the European Research Council, the US National Science Foundation and Research Councils UK, have not changed their preference for new work over replications.


External links

* O’Grady, Cathleen (2020-12-09). "Psychology's replication crisis inspires ecologists to push for more reliable research". ''Science''. AAAS. https://www.science.org/content/article/psychology-s-replication-crisis-inspires-ecologists-push-more-reliable-research (accessed 2020-12-16).