Statistics, when used in a misleading fashion, can trick the casual observer into believing something other than what the data shows. That is, a misuse of statistics occurs when a statistical argument asserts a falsehood. In some cases, the misuse may be accidental. In others, it is purposeful and for the gain of the perpetrator. When the statistical reason involved is false or misapplied, this constitutes a statistical fallacy.
The false statistics trap can be quite damaging for the quest for knowledge. For example, in medical science, correcting a falsehood may take decades and cost lives.
Misuses can be easy to fall into. Professional scientists, mathematicians and even professional statisticians can be fooled by some simple methods, even if they are careful to check everything. Scientists have been known to fool themselves with statistics due to lack of knowledge of probability theory and lack of standardization of their tests.
Definition, limitations and context
One usable definition is: "Misuse of Statistics: Using numbers in such a manner that – either by intent or through ignorance or carelessness – the conclusions are unjustified or incorrect." The "numbers" include misleading graphics discussed elsewhere. The term is not commonly encountered in statistics texts and no authoritative definition is known. It is a generalization of lying with statistics, which was richly described by examples from statisticians 60 years ago.
The definition confronts some problems (some are addressed by the source):
# Statistics usually produces probabilities; conclusions are provisional
# The provisional conclusions have errors and error rates. Commonly 5% of the provisional conclusions of significance testing are wrong
# Statisticians are not in complete agreement on ideal methods
# Statistical methods are based on assumptions which are seldom fully met
# Data gathering is usually limited by ethical, practical and financial constraints.
''How to Lie with Statistics'' acknowledges that statistics can ''legitimately'' take many forms. Whether the statistics show that a product is "light and economical" or "flimsy and cheap" can be debated whatever the numbers. Some object to the substitution of statistical correctness for moral leadership (for example) as an objective. Assigning blame for misuses is often difficult because scientists, pollsters, statisticians and reporters are often employees or consultants.
An insidious misuse(?) of statistics is completed by the listener/observer/audience/juror. The supplier provides the "statistics" as numbers or graphics (or before/after photographs), allowing the consumer to draw (possibly unjustified or incorrect) conclusions. The poor state of public statistical literacy and the non-statistical nature of human intuition permit misleading without explicitly producing faulty conclusions. The definition is weak on the responsibility of the consumer of statistics.
A historian listed over 100 fallacies in a dozen categories including those of generalization and those of causation. A few of the fallacies are explicitly or potentially statistical including sampling, statistical nonsense, statistical probability, false extrapolation, false interpolation and insidious generalization. All of the technical/mathematical problems of applied probability would fit in the single listed fallacy of statistical probability. Many of the fallacies could be coupled to statistical analysis, allowing the possibility of a false conclusion flowing from a blameless statistical analysis.
An example use of statistics is in the analysis of medical research. The process includes experimental planning, the conduct of the experiment, data analysis, drawing the logical conclusions and presentation/reporting. The report is summarized by the popular press and by advertisers. Misuses of statistics can result from problems at any step in the process. The statistical standards ideally imposed on the scientific report are much different than those imposed on the popular press and advertisers; however, cases exist of advertising disguised as science. The definition of the misuse of statistics is weak on the required completeness of statistical reporting. The opinion is expressed that newspapers must provide at least the source for the statistics reported.
Simple causes
Many misuses of statistics occur because
* The source is a subject matter expert, not a statistics expert. The source may incorrectly use a method or interpret a result.
* The source is a statistician, not a subject matter expert. An expert should know when the numbers being compared describe different things. Numbers change, as reality does not, when legal definitions or political boundaries change.
* The subject being studied is not well defined, or some of its aspects are easy to quantify while others are hard to quantify or have no known quantification method (see McNamara fallacy). For example:
** While IQ tests are available and numeric, it is difficult to define what they measure, as intelligence is an elusive concept.
** Publishing "impact" has the same problem. Scientific papers and scholarly journals are often rated by "impact", quantified as the number of citations by later publications. Mathematicians and statisticians conclude that impact (while relatively objective) is not a very meaningful measure. "The sole reliance on citation data provides at best an incomplete and often shallow understanding of research, an understanding that is valid only when reinforced by other judgments. Numbers are not inherently superior to sound judgments."
** A seemingly simple question about the number of words in the English language immediately encounters questions about archaic forms, accounting for prefixes and suffixes, multiple definitions of a word, variant spellings, dialects, fanciful creations (like ectoplastistics from ectoplasm and statistics), technical vocabulary, and so on.
* Data quality is poor. Apparel provides an example. People have a wide range of sizes and body shapes. It is obvious that apparel sizing must be multidimensional. Instead it is complex in unexpected ways. Some apparel is sold by size only (with no explicit consideration of body shape), sizes vary by country and manufacturer and some sizes are deliberately misleading. While sizes are numeric, only the crudest of statistical analyses is possible using the size numbers with care.
* The popular press has limited expertise and mixed motives. If the facts are not "newsworthy" (which may require exaggeration) they may not be published. The motives of advertisers are even more mixed.
* "Politicians use statistics in the same way that a drunk uses lamp posts—for support rather than illumination" – Andrew Lang (WikiQuote) "What do we learn from these two ways of looking at the same numbers? We learn that a clever propagandist, right or left, can almost always find a way to present the data on economic growth that seems to support her case. And we therefore also learn to take any statistical analysis from a strongly political source with handfuls of salt." The term statistics originates from numbers generated for and utilized by the state. Good government may require accurate numbers, but popular government may require supportive numbers (not necessarily the same). "The use and misuse of statistics by governments is an ancient art."
Types of misuse
Discarding unfavorable observations
All a company has to do to promote a neutral (useless) product is to find or conduct, for example, 40 studies with a confidence level of 95%. If the product is really useless, this would on average produce one study showing the product was beneficial, one study showing it was harmful and thirty-eight inconclusive studies (38 is 95% of 40). This tactic becomes more effective the more studies there are available. Organizations that do not publish every study they carry out, such as tobacco companies denying a link between smoking and cancer, anti-smoking advocacy groups and media outlets trying to prove a link between smoking and various ailments, or miracle pill vendors, are likely to use this tactic.
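The arithmetic behind this tactic can be checked with a short calculation. The sketch below assumes the studies are independent and that each uses a two-sided test at the 5% level, so a truly useless product is flagged as beneficial with probability 2.5% per study. The function name and dictionary keys are illustrative only.

```python
def selective_reporting_odds(n_studies: int, alpha: float = 0.05) -> dict:
    """Expected outcomes when a truly useless product is tested n_studies times.

    Assumes independent studies and a two-sided test, so false positives
    split evenly between "beneficial" and "harmful".
    """
    per_direction = alpha / 2
    return {
        "expected_beneficial": n_studies * per_direction,
        "expected_harmful": n_studies * per_direction,
        "expected_inconclusive": n_studies * (1 - alpha),
        # Chance that at least one study can be quoted in the product's favour.
        "p_at_least_one_beneficial": 1 - (1 - per_direction) ** n_studies,
    }


odds = selective_reporting_odds(40)
for name, value in odds.items():
    print(f"{name}: {value:.3f}")
```

Under these assumptions, the chance of obtaining at least one favourable study out of 40 is well above one half, which is why publishing only the favourable results is so misleading.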
Ronald Fisher considered this issue in his famous lady tasting tea example experiment (from his 1935 book, ''The Design of Experiments''). Regarding repeated experiments he said, "It would clearly be illegitimate, and would rob our calculation of its basis, if unsuccessful results were not all brought into the account."
Another term related to this concept is cherry picking.
Ignoring important features
Multivariable datasets have two or more features/dimensions. If too few of these features are chosen for analysis (for example, if just one feature is chosen and simple linear regression is performed instead of multiple linear regression), the results can be misleading. This leaves the analyst vulnerable to any of various statistical paradoxes, or in some (not all) cases false causality as below.
Loaded questions
The answers to surveys can often be manipulated by wording the question in such a way as to induce a prevalence towards a certain answer from the respondent. For example, in polling support for a war, the questions:
* Do you support the attempt by the US to bring freedom and democracy to other places in the world?
* Do you support the unprovoked military action by the USA?
will likely result in data skewed in different directions, although they are both polling about the support for the war. A better way of wording the question could be "Do you support the current US military action abroad?" A still more nearly neutral way to put that question is "What is your view about the current US military action abroad?" The point should be that the person being asked has no way of guessing from the wording what the questioner might want to hear.
Another way to do this is to precede the question by information that supports the "desired" answer. For example, more people will likely answer "yes" to the question "Given the increasing burden of taxes on middle-class families, do you support cuts in income tax?" than to the question "Considering the rising federal budget deficit and the desperate need for more revenue, do you support cuts in income tax?"
The proper formulation of questions can be very subtle. The responses to two questions can vary dramatically depending on the order in which they are asked. "A survey that asked about 'ownership of stock' found that most Texas ranchers owned stock, though probably not the kind traded on the New York Stock Exchange."
Overgeneralization
Overgeneralization is a fallacy occurring when a statistic about a particular population is asserted to hold among members of a group for which the original population is not a representative sample.
For example, suppose 100% of apples are observed to be red in summer. The assertion "All apples are red" would be an instance of overgeneralization because the original statistic was true only of a specific subset of apples (those in summer), which is not expected to be representative of the population of apples as a whole.
A real-world example of the overgeneralization fallacy can be observed as an artifact of modern polling techniques, which prohibit calling cell phones for over-the-phone political polls. As young people are more likely than other demographic groups to lack a conventional "landline" phone, a telephone poll that exclusively surveys people who answer landline calls may undersample the views of young people if no other measures are taken to account for this skew in the sampling. Thus, a poll examining the voting preferences of young people using this technique may not accurately represent young people's true voting preferences as a whole, because the sample excludes young people who carry only cell phones, who may or may not have voting preferences that differ from the rest of the population.
Overgeneralization often occurs when information is passed through nontechnical sources, in particular mass media.
Biased samples
Scientists have learned at great cost that gathering good experimental data for statistical analysis is difficult. Example: The placebo effect (mind over body) is very powerful. 100% of subjects developed a rash when exposed to an inert substance that was falsely called poison ivy while few developed a rash to a "harmless" object that really was poison ivy. Researchers combat this effect by double-blind randomized comparative experiments. Statisticians typically worry more about the validity of the data than the analysis. This is reflected in a field of study within statistics known as the design of experiments.
Pollsters have learned at great cost that gathering good survey data for statistical analysis is difficult. The selective effect of cellular telephones on data collection (discussed in the Overgeneralization section) is one potential example: if young people with traditional telephones are not representative, the sample can be biased. Sample surveys have many pitfalls and require great care in execution. One effort required almost 3000 telephone calls to get 1000 answers. The simple random sample of the population "isn't simple and may not be random."
Misreporting or misunderstanding of estimated error
If a research team wants to know how 300 million people feel about a certain topic, it would be impractical to ask all of them. However, if the team picks a random sample of about 1000 people, they can be fairly certain that the results given by this group are representative of what the larger group would have said if they had all been asked.
This confidence can actually be quantified by the central limit theorem and other mathematical results. Confidence is expressed as a probability of the true result (for the larger group) being within a certain range of the estimate (the figure for the smaller group). This is the "plus or minus" figure often quoted for statistical surveys. The probability part of the confidence level is usually not mentioned; if so, it is assumed to be a standard number like 95%.
The two numbers are related. If a survey has an estimated error of ±5% at 95% confidence, it also has an estimated error of ±6.6% at 99% confidence. ±''x''% at 95% confidence is always ±1.32''x''% at 99% confidence for a normally distributed population.
The smaller the estimated error, the larger the required sample, at a given confidence level; for example, at
95.4% confidence:
* ±1% would require 10,000 people.
* ±2% would require 2,500 people.
* ±3% would require 1,111 people.
* ±4% would require 625 people.
* ±5% would require 400 people.
* ±10% would require 100 people.
* ±20% would require 25 people.
* ±25% would require 16 people.
* ±50% would require 4 people.
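The figures in this list are consistent with the usual margin-of-error formula for a proportion, n = (z / 2e)^2, under the worst-case assumption that the true proportion is 0.5. At 95.4% confidence z is approximately 2, so the formula reduces to n = 1/e^2. A minimal sketch follows; the function name is illustrative, and rounding to the nearest whole person is an assumption chosen to match the list.

```python
def required_sample_size(margin: float, z: float = 2.0) -> int:
    """Sample size for a given margin of error (as a fraction, e.g. 0.03).

    Assumes a simple random sample and the worst-case proportion p = 0.5,
    so the standard error is 0.5 / sqrt(n) and margin = z * 0.5 / sqrt(n).
    """
    return round((z / (2 * margin)) ** 2)


for margin in (0.01, 0.02, 0.03, 0.04, 0.05, 0.10, 0.20, 0.25, 0.50):
    print(f"±{margin:.0%} would require {required_sample_size(margin):,} people")
```

Because the required sample grows with the inverse square of the margin, halving the estimated error quadruples the number of people who must be polled.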
People may assume, because the confidence figure is omitted, that there is a 100% certainty that the true result is within the estimated error. This is not mathematically correct.
Many people may not realize that the randomness of the sample is very important. In practice, many opinion polls are conducted by phone, which distorts the sample in several ways, including exclusion of people who do not have phones, favoring the inclusion of people who have more than one phone, favoring the inclusion of people who are willing to participate in a phone survey over those who refuse, etc. Non-random sampling makes the estimated error unreliable.
On the other hand, people may consider that statistics are inherently unreliable because not everybody is called, or because they themselves are never polled. People may think that it is impossible to get data on the opinion of tens of millions of people by polling just a few thousand. This is also inaccurate. A poll with perfect unbiased sampling and truthful answers has a mathematically determined margin of error, which depends only on the number of people polled.
However, often only one margin of error is reported for a survey. When results are reported for population subgroups, a larger margin of error will apply, but this may not be made clear. For example, a survey of 1000 people may contain 100 people from a certain ethnic or economic group. The results focusing on that group will be much less reliable than results for the full population. If the margin of error for the full sample was 4%, say, then the margin of error for such a subgroup could be around 13%.
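The subgroup figure can be reproduced from the fact that, for a simple random sample, the margin of error scales with the inverse square root of the sample size. The sketch below is illustrative only: the function name is hypothetical, and it assumes the subgroup behaves like a simple random sample of its own population.

```python
import math


def subgroup_margin(full_margin: float, full_n: int, subgroup_n: int) -> float:
    """Scale a full-sample margin of error to a smaller subgroup.

    Assumes margin is proportional to 1 / sqrt(n), as for a simple random sample.
    """
    return full_margin * math.sqrt(full_n / subgroup_n)


margin_100 = subgroup_margin(4.0, full_n=1000, subgroup_n=100)
print(f"Subgroup margin of error: about ±{margin_100:.1f}%")
```

A subgroup one tenth the size of the full sample has a margin of error roughly 3.2 times as large, which is where the figure of around 13% comes from.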
There are also many other measurement problems in population surveys.
The problems mentioned above apply to all statistical experiments, not just population surveys.
False causality
When a statistical test shows a correlation between A and B, there are usually six possibilities:
# A causes B.
# B causes A.
# A and B both partly cause each other.
# A and B are both caused by a third factor, C.
# B is caused by C which is correlated to A.
# The observed correlation was due purely to chance.
The sixth possibility can be quantified by statistical tests that can calculate the probability that the correlation observed would be as large as it is just by chance if, in fact, there is no relationship between the variables. However, even if that possibility has a small probability, there are still the five others.
If the number of people buying ice cream at the beach is statistically related to the number of people who drown at the beach, then nobody would claim ice cream causes drowning because it's obvious that it isn't so. (In this case, both drowning and ice cream buying are clearly related by a third factor: the number of people at the beach).
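The third-factor case can be illustrated with a small simulation. Everything in the sketch below is invented for illustration: beach attendance drives both ice cream sales and a drowning index, and neither affects the other. The two series are nevertheless strongly correlated until attendance is accounted for by correlating the residuals of straight-line fits.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def residuals(ys, xs):
    """What is left of ys after removing a least-squares straight-line fit on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return [(y - my) - slope * (x - mx) for x, y in zip(xs, ys)]


rng = random.Random(0)  # fixed seed so the run is repeatable
crowd = [rng.uniform(100, 1000) for _ in range(500)]          # people at the beach
ice_cream = [0.3 * c + rng.gauss(0, 20) for c in crowd]       # depends only on crowd
drowning_index = [0.002 * c + rng.gauss(0, 0.3) for c in crowd]  # depends only on crowd

raw_r = pearson(ice_cream, drowning_index)
adjusted_r = pearson(residuals(ice_cream, crowd), residuals(drowning_index, crowd))
print(f"raw correlation: {raw_r:.2f}")
print(f"correlation after removing the effect of crowd size: {adjusted_r:.2f}")
```

The raw correlation is large even though the simulated ice cream sales have no effect on the drowning index; once crowd size is removed from both series, the remaining correlation is close to zero.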
This fallacy can be used, for example, to prove that exposure to a chemical causes cancer. Replace "number of people buying ice cream" with "number of people exposed to chemical X", and "number of people who drown" with "number of people who get cancer", and many people will believe you. In such a situation, there may be a statistical correlation even if there is no real effect. For example, if there is a perception that a chemical site is "dangerous" (even if it really isn't) property values in the area will decrease, which will entice more low-income families to move to that area. If low-income families are more likely to get cancer than high-income families (due to a poorer diet, for example, or less access to medical care) then rates of cancer will go up, even though the chemical itself is not dangerous. It is believed that this is exactly what happened with some of the early studies showing a link between EMF (electromagnetic fields) from power lines and cancer.
In well-designed studies, the effect of false causality can be eliminated by assigning some people into a "treatment group" and some people into a "control group" at random, and giving the treatment group the treatment and not giving the control group the treatment. In the above example, a researcher might expose one group of people to chemical X and leave a second group unexposed. If the first group had higher cancer rates, the researcher knows that there is no third factor that affected whether a person was exposed because he controlled who was exposed or not, and he assigned people to the exposed and non-exposed groups at random. However, in many applications, actually doing an experiment in this way is either prohibitively expensive, infeasible, unethical, illegal, or downright impossible. For example, it is highly unlikely that an IRB would accept an experiment that involved intentionally exposing people to a dangerous substance in order to test its toxicity. The obvious ethical implications of such types of experiments limit researchers' ability to empirically test causation.
Proof of the null hypothesis
In a statistical test, the null hypothesis ($H_0$) is considered valid until enough data proves it wrong. Then $H_0$ is rejected and the alternative hypothesis ($H_1$) is considered to be proven as correct. By chance this can happen, although $H_0$ is true, with a probability denoted $\alpha$ (the significance level). This can be compared to the judicial process, where the accused is considered innocent ($H_0$) until proven guilty ($H_1$) beyond reasonable doubt ($\alpha$).
But if data does not give us enough proof to reject $H_0$, this does not automatically prove that $H_0$ is correct. If, for example, a tobacco producer wishes to demonstrate that its products are safe, it can easily conduct a test with a small sample of smokers versus a small sample of non-smokers. It is unlikely that any of them will develop lung cancer (and even if they do, the difference between the groups has to be very big in order to reject $H_0$). Therefore, it is likely, even when smoking is dangerous, that our test will not reject $H_0$. If $H_0$ is accepted, it does not automatically follow that smoking is proven harmless. The test has insufficient power to reject $H_0$, so the test is useless and the value of the "proof" of $H_0$ is also null.
This can—using the judicial analogue above—be compared with the truly guilty defendant who is released just because the proof is not enough for a guilty verdict. This does not prove the defendant's innocence, but only that there is not proof enough for a guilty verdict.
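The weakness of a small study can be made concrete with a one-line probability. The numbers below are hypothetical and chosen only for illustration: if each smoker in the study has a 1% chance of developing lung cancer during the study period, a sample of 20 smokers will usually contain no cases at all, so the test has almost no chance of rejecting the null hypothesis.

```python
def prob_no_cases(sample_size: int, risk: float) -> float:
    """Probability that none of sample_size independent subjects develops the disease."""
    return (1 - risk) ** sample_size


# Hypothetical inputs: 20 smokers, 1% individual risk over the study period.
p_zero = prob_no_cases(sample_size=20, risk=0.01)
print(f"Chance the study observes no cases at all: {p_zero:.1%}")

# The same risk in a much larger sample almost always produces cases.
p_zero_large = prob_no_cases(sample_size=2000, risk=0.01)
print(f"Same risk, 2000 smokers: {p_zero_large:.2e}")
```

Under these assumed numbers, roughly four out of five such small studies would see nothing at all, regardless of whether the product is dangerous.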
"...the null hypothesis is never proved or established, but it is possibly disproved, in the course of experimentation. Every experiment may be said to exist only in order to give the facts a chance of disproving the null hypothesis." (Fisher in ''The Design of Experiments'') Many reasons for confusion exist including the use of double negative logic and terminology resulting from the merger of Fisher's "significance testing" (where the null hypothesis is never accepted) with "hypothesis testing" (where some hypothesis is always accepted).
Confusing statistical significance with practical significance
Statistical significance is a measure of probability; practical significance is a measure of effect. A baldness cure is statistically significant if a sparse peach-fuzz usually covers the previously naked scalp. The cure is practically significant when a hat is no longer required in cold weather and the barber asks how much to take off the top. The bald want a cure that is both statistically and practically significant: it will probably work and, if it does, it will have a big hairy effect. Scientific publication often requires only statistical significance. This has led to complaints (for the last 50 years) that statistical significance testing is a misuse of statistics.
Data dredging
Data dredging is an abuse of data mining. In data dredging, large compilations of data are examined in order to find a correlation, without any pre-defined choice of a hypothesis to be tested. Since the confidence level required to establish a relationship between two parameters is usually chosen to be 95% (meaning that a relationship as strong as the one observed would arise by chance alone only 5% of the time if the variables were unrelated), there is thus a 5% chance of finding an apparently significant correlation between any two sets of completely random variables. Given that data dredging efforts typically examine large datasets with many variables, and hence even larger numbers of pairs of variables, spurious but apparently statistically significant results are almost certain to be found by any such study.
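How quickly spurious findings accumulate can be estimated with a short calculation. The sketch below counts the pairs among a given number of unrelated variables and applies a 5% false-positive rate to each pair. It treats the pairwise tests as independent, which is a simplifying assumption, and the function name is illustrative.

```python
from math import comb


def dredging_risk(n_variables: int, alpha: float = 0.05):
    """Return (pairs, expected spurious hits, chance of at least one spurious hit).

    Assumes every variable is pure noise and that pairwise tests are independent.
    """
    pairs = comb(n_variables, 2)
    expected_hits = pairs * alpha
    p_at_least_one = 1 - (1 - alpha) ** pairs
    return pairs, expected_hits, p_at_least_one


pairs, expected_hits, p_any = dredging_risk(20)
print(f"{pairs} pairs, about {expected_hits:.1f} spurious 'significant' correlations expected")
print(f"chance of at least one spurious result: {p_any:.4f}")
```

With only 20 unrelated variables there are already 190 pairs to compare, so under these assumptions several apparently significant correlations are expected from noise alone.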
Note that data dredging is a valid way of ''finding'' a possible hypothesis but that hypothesis ''must'' then be tested with data not used in the original dredging. The misuse comes in when that hypothesis is stated as fact without further validation.
"You cannot legitimately test a hypothesis on the same data that first suggested that hypothesis. The remedy is clear. Once you have a hypothesis, design a study to search specifically for the effect you now think is there. If the result of this test is statistically significant, you have real evidence at last."
Data manipulation
Informally called "fudging the data," this practice includes selective reporting (see also publication bias) and even simply making up false data.
Examples of selective reporting abound. The easiest and most common examples involve choosing a group of results that follow a pattern consistent with the preferred hypothesis while ignoring other results or "data runs" that contradict the hypothesis.
Scientists, in general, question the validity of study results that cannot be reproduced by other investigators. However, some scientists refuse to publish their data and methods.
Data manipulation is a serious consideration even in the most honest of statistical analyses. Outliers, missing data and non-normality can all adversely affect the validity of statistical analysis. It is appropriate to study the data and repair real problems before analysis begins. "In any scatter diagram there will be some points more or less detached from the main part of the cloud: these points should be rejected only for cause."
Other fallacies
Pseudoreplication is a technical error associated with analysis of variance. Complexity hides the fact that statistical analysis is being attempted on a single sample (N=1). For this degenerate case the sample variance cannot be calculated, because its denominator, N−1, is zero. An analysis with N=1 will always give the researcher the highest statistical correlation between intent bias and actual findings.
The gambler's fallacy assumes that an event for which a future likelihood can be measured has the same likelihood of happening once part of it has already occurred. Thus, if someone has already tossed a coin 9 times and each toss has come up heads, people tend to assume that the odds of the tenth toss also being heads are 1023 to 1 against (which were the odds of ten consecutive heads before the first coin was tossed), when in fact the chance of the tenth head is 50% (assuming the coin is unbiased).
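The two probabilities being confused here can be checked by brute-force enumeration. The following Python sketch lists all 1024 equally likely sequences of ten fair coin tosses; the helper names are illustrative and the fair-coin assumption is the same one made in the text.

```python
from fractions import Fraction
from itertools import product

# Every possible outcome of ten tosses of a fair coin (2**10 = 1024 sequences).
SEQUENCES = list(product("HT", repeat=10))


def probability(event, given=lambda seq: True):
    """Exact probability of `event` among the sequences satisfying `given`."""
    conditioned = [seq for seq in SEQUENCES if given(seq)]
    favourable = [seq for seq in conditioned if event(seq)]
    return Fraction(len(favourable), len(conditioned))


def all_heads(seq):
    return all(toss == "H" for toss in seq)


def first_nine_heads(seq):
    return all(toss == "H" for toss in seq[:9])


def tenth_is_head(seq):
    return seq[9] == "H"


# Before any toss: ten heads in a row has probability 1/1024 (1023 to 1 against).
p_ten_heads = probability(all_heads)

# After nine heads have already happened: the tenth toss is still a fair coin.
p_tenth_given_nine = probability(tenth_is_head, given=first_nine_heads)

print(p_ten_heads, p_tenth_given_nine)
```

Only two sequences begin with nine heads, and exactly one of them ends in a head, which is why the conditional probability is one half.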
The prosecutor's fallacy assumes that the probability of an apparently criminal event being random chance is equal to the chance that the suspect is innocent. A prominent example in the UK is the wrongful conviction of Sally Clark for killing her two sons, who appeared to have died of sudden infant death syndrome (SIDS). In his expert testimony, the now-discredited Professor Sir Roy Meadow claimed that, due to the rarity of SIDS, the probability of Clark being innocent was 1 in 73 million. This was later questioned by the Royal Statistical Society (23 October 2001): even assuming Meadow's figure was accurate, one has to weigh up all the possible explanations against each other to conclude which most likely caused the unexplained deaths of the two children. Available data suggest that the odds would be in favour of double SIDS compared to double homicide by a factor of nine. The 1 in 73 million figure was also misleading because it was reached by finding the probability of a baby from an affluent, non-smoking family dying from SIDS and squaring it: this erroneously treats each death as statistically independent, assuming that there is no factor, such as genetics, that would make it more likely for two siblings to die from SIDS.
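Both errors can be made concrete with a short calculation. The Python sketch below rests on assumptions that are not in the text above: the single-death rate of 1 in 8,543 is the figure commonly reported as the basis of the 1 in 73 million claim, the relative risk of 10 for a second death is purely hypothetical, and the conversion from odds to probability assumes double SIDS and double homicide are the only two explanations.

```python
from fractions import Fraction

# Commonly reported SIDS rate for an affluent, non-smoking family
# (an assumption of this sketch; the text above gives only the squared figure).
P_SINGLE_SIDS = Fraction(1, 8543)

# Error 1: squaring treats the two deaths as statistically independent.
p_double_if_independent = P_SINGLE_SIDS ** 2

# If shared genetic or environmental factors raise the risk of a second
# death, the joint probability is larger. The factor of 10 is hypothetical.
HYPOTHETICAL_RELATIVE_RISK = 10
p_double_if_dependent = P_SINGLE_SIDS * (P_SINGLE_SIDS * HYPOTHETICAL_RELATIVE_RISK)


def probability_from_odds(odds_in_favour):
    """Convert odds of k to 1 in favour into a probability k / (k + 1)."""
    return Fraction(odds_in_favour, odds_in_favour + 1)


# Error 2: the rarity of double SIDS is not the probability of innocence.
# What matters is how it compares with the competing explanation. Using the
# factor of nine quoted above, and assuming only two possible explanations:
p_sids_given_two_deaths = probability_from_odds(9)

print(f"independent: 1 in {1 / p_double_if_independent}")
print(f"dependent (hypothetical): 1 in {float(1 / p_double_if_dependent):,.0f}")
print(f"P(double SIDS | two deaths) = {p_sids_given_two_deaths}")
```

Under these assumptions the squared figure is 1 in 72,982,849, which rounds to the quoted 1 in 73 million, while the comparison of explanations points to a probability of about 90% that the deaths were natural.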
This is also an example of the ecological fallacy, as it assumes the probability of SIDS in Clark's family was the same as the average of all affluent, non-smoking families; social class is a highly complex and multifaceted concept, with numerous other variables such as education, line of work, and many more. Assuming that an individual will have the same attributes as the rest of a given group fails to account for the effects of other variables, which in turn can be misleading.
The conviction of Sally Clark was eventually overturned and Meadow was struck from the medical register.
In the ludic fallacy, probabilities are based on simple models that ignore real (if remote) possibilities. Poker players do not consider that an opponent may draw a gun rather than a card. The insured (and governments) assume that insurers will remain solvent, but see AIG and systemic risk.
Other types of misuse
Other misuses include comparing apples and oranges, using the wrong average, regression toward the mean, and the umbrella phrase garbage in, garbage out. Some statistics are simply irrelevant to an issue.
Anscombe's quartet is a made-up dataset that exemplifies the shortcomings of simple descriptive statistics (and the value of data plotting before numerical analysis).
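A minimal Python sketch of the point follows. The data values are Anscombe's published quartet; the helper names and the choice to round summaries to two decimal places are assumptions made for this illustration.

```python
import math
from statistics import mean

# Anscombe's quartet: four (x, y) datasets with very different shapes.
X_SHARED = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
ANSCOMBE = {
    "I": (X_SHARED,
          [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X_SHARED,
           [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X_SHARED,
            [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def summarize(xs, ys):
    """Mean of x, mean of y, and correlation, each rounded to 2 decimals."""
    return round(mean(xs), 2), round(mean(ys), 2), round(pearson_r(xs, ys), 2)


summaries = {name: summarize(xs, ys) for name, (xs, ys) in ANSCOMBE.items()}
for name, summary in summaries.items():
    print(name, summary)
```

All four datasets yield essentially the same means and correlation (about 9, 7.5 and 0.82), yet a scatter plot of each shows a linear trend, a curve, a line with one outlier, and a vertical cluster with one extreme point respectively.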
See also
* Deception
* Ecological fallacy
* Ethics in mathematics
* Metascience
* Misuse of p-values
* Misleading graph
* Post hoc analysis
* Simpson's paradox
* Statcheck
Further reading
* The book is based on several hundred examples of misuse.
* Oldberg, T. and R. Christensen (1995) "Erratic Measure" in ''NDE for the Energy Industry 1995'', The American Society of Mechanical Engineers (pages 1–6). Republished on the Web by ndt.net.
* Oldberg, T. (2005) "An Ethical Problem in the Statistics of Defect Detection Test Reliability," Speech to the Golden Gate Chapter of the American Society for Nondestructive Testing. Published on the Web by ndt.net.
* Stone, M. (2009) ''Failing to Figure: Whitehall's Costly Neglect of Statistical Reasoning'', Civitas, London.