Statistical bias is a systematic tendency which causes differences between results and facts. Bias can enter at several stages of data analysis, including the source of the data, the estimator chosen, and the way the data are analyzed. Bias may have a serious impact on results. For example, in a survey of people's buying habits, if the sample size is not large enough, the results may not be representative of the buying habits of all the people. That is, there may be discrepancies between the survey results and the actual buying habits. Therefore, understanding the sources of statistical bias helps to assess whether the observed results are close to the real ones.
Bias can be differentiated from other mistakes such as inaccuracy (instrument failure or inadequacy), lack of data, or mistakes in transcription (typos). Bias implies that the data selection may have been skewed by the collection criteria.
Bias does not preclude the existence of any other mistakes. One may have a poorly designed sample, an inaccurate measurement device, and typos in recording data simultaneously.
It is also useful to recognize that the term “error” specifically refers to the outcome rather than the process (errors of rejection or acceptance of the hypothesis being tested). Using ''flaw'' or ''mistake'' to distinguish procedural errors from these specifically defined outcome-based terms is recommended.
Bias of an estimator
Statistical bias is a feature of a statistical technique or of its results whereby the expected value of the results differs from the true underlying quantitative parameter being estimated. The bias of an estimator of a parameter should not be confused with its degree of precision, as the degree of precision is a measure of the sampling error. The bias is defined as follows: let $T$ be a statistic used to estimate a parameter $\theta$, and let $\operatorname{E}(T)$ denote the expected value of $T$. Then,
: $\operatorname{bias}(T, \theta) = \operatorname{E}(T) - \theta$
is called the bias of the statistic $T$ (with respect to $\theta$). If $\operatorname{bias}(T, \theta) = 0$, then $T$ is said to be an ''unbiased estimator'' of $\theta$; otherwise, it is said to be a ''biased estimator'' of $\theta$.
The bias of a statistic $T$ is always relative to the parameter $\theta$ it is used to estimate, but the parameter $\theta$ is often omitted when it is clear from the context what is being estimated.
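The definition of bias as the expected value of an estimator minus the true parameter value can be approximated by simulation. The following is a minimal sketch, not a definitive implementation: the choice of variance estimators, the normal population, the sample size of 5, and all function names are assumptions made purely for illustration.

```python
import random

def var_divide_by_n(xs):
    """Sample variance dividing by n (known to be biased downward)."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def var_divide_by_n_minus_1(xs):
    """Sample variance dividing by n - 1 (unbiased for i.i.d. samples)."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def simulated_bias(estimator, true_value, sampler, n_trials=20000):
    """Monte Carlo approximation of E(T) - theta."""
    total = 0.0
    for _ in range(n_trials):
        total += estimator(sampler())
    return total / n_trials - true_value

rng = random.Random(0)   # fixed seed so the sketch is reproducible
n = 5                    # hypothetical sample size
true_variance = 1.0      # population is N(0, 1) by assumption

def draw_sample():
    return [rng.gauss(0.0, 1.0) for _ in range(n)]

bias_n = simulated_bias(var_divide_by_n, true_variance, draw_sample)
bias_n_minus_1 = simulated_bias(var_divide_by_n_minus_1, true_variance, draw_sample)

print(f"bias, divide by n:     {bias_n:+.3f}")          # theory: -1/n = -0.2
print(f"bias, divide by n - 1: {bias_n_minus_1:+.3f}")  # theory: 0
```

Under these assumptions the first estimator has a theoretical bias of $-\sigma^2/n$, while the second has zero bias; the simulated values should land close to those figures, up to Monte Carlo noise.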
Types
Statistical bias can arise at every stage of data analysis. The sources of bias below are grouped by the stage at which they occur.
Data selection
Selection bias involves some individuals being more likely to be selected for study than others, biasing the sample. This can also be termed selection effect, sampling bias and ''Berksonian bias''.
* Spectrum bias arises from evaluating diagnostic tests on biased patient samples, leading to an overestimate of the sensitivity and specificity of the test. For example, a high prevalence of disease in a study population increases positive predictive values, which causes a discrepancy between the predictive values observed in the study and the real ones.
* Observer selection bias occurs when the evidence presented has been pre-filtered by observers, which is the so-called anthropic principle. The data collected are filtered not only by the design of the experiment, but also by the necessary precondition that there must be someone to carry out a study. An example is the record of past impact events on Earth: an impact may have caused the extinction of intelligent animals, or may have occurred when there were no intelligent animals to observe it. Therefore, some impact events have not been observed, although they may have occurred in the past.
* Volunteer bias occurs when volunteers have intrinsically different characteristics from the target population of the study. Research has shown that volunteers tend to come from families with higher socioeconomic status. Furthermore, another study shows that women are more likely to volunteer for studies than men.
* Funding bias may lead to the selection of outcomes, test samples, or test procedures that favor a study's financial sponsor.
* Attrition bias arises due to a loss of participants, e.g., loss to follow-up during a study.
* Recall bias arises due to differences in the accuracy or completeness of participant recollections of past events; for example, patients cannot recall exactly how many cigarettes they smoked last week, leading to over-estimation or under-estimation.
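The effect of prevalence on positive predictive value, mentioned under spectrum bias above, follows directly from Bayes' rule. The sketch below is illustrative only: the sensitivity, specificity, and prevalence figures are hypothetical, and the function name is an assumption.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """P(disease | positive test) computed with Bayes' rule."""
    true_positives = sensitivity * prevalence
    false_positives = (1.0 - specificity) * (1.0 - prevalence)
    return true_positives / (true_positives + false_positives)

# Hypothetical test: 90% sensitivity and 90% specificity in both settings.
ppv_study = positive_predictive_value(0.90, 0.90, prevalence=0.50)    # high-prevalence study sample
ppv_general = positive_predictive_value(0.90, 0.90, prevalence=0.05)  # lower-prevalence population

print(f"PPV in study sample:       {ppv_study:.2f}")    # about 0.90
print(f"PPV in general population: {ppv_general:.2f}")  # about 0.32
```

With the test's sensitivity and specificity held fixed, the predictive value measured in a high-prevalence study sample is far higher than the one that would apply in a lower-prevalence population.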
Hypothesis testing
Type I and type II errors in statistical hypothesis testing lead to wrong results. A Type I error happens when the null hypothesis is correct but is rejected. For instance, suppose that the null hypothesis is that a driver is not speeding if the average driving speed is between 75 and 85 km/h, and that a driver whose average speed is outside that range is considered to be speeding. If someone receives a ticket even though their average driving speed was within that range, the decision maker has committed a Type I error. In other words, the average driving speed meets the null hypothesis but the null hypothesis is rejected. Conversely, a Type II error happens when the null hypothesis is not correct but is accepted.
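The two error types can be illustrated by simulating a test many times, once with the null hypothesis true and once with it false. This is a rough sketch under stated assumptions: a two-sided z-test with known standard deviation, a hypothetical null mean of 80 km/h, and illustrative values for the sample size and alternative mean; none of these come from the text above.

```python
import random
from statistics import NormalDist, mean

def z_test_rejects(sample, mu0, sigma, alpha=0.05):
    """Two-sided z-test of H0: mean == mu0, with known sigma."""
    z = (mean(sample) - mu0) / (sigma / len(sample) ** 0.5)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return abs(z) > z_crit

def rejection_rate(true_mu, mu0=80.0, sigma=5.0, n=25, trials=5000, seed=0):
    """Fraction of simulated samples for which H0 is rejected."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mu, sigma) for _ in range(n)]
        if z_test_rejects(sample, mu0, sigma):
            rejections += 1
    return rejections / trials

# H0 is true: every rejection is a Type I error (rate should be near alpha).
type_i_rate = rejection_rate(true_mu=80.0)

# H0 is false (hypothetical true mean of 82): every non-rejection is a Type II error.
type_ii_rate = 1.0 - rejection_rate(true_mu=82.0)

print(f"Type I error rate:  {type_i_rate:.3f}")
print(f"Type II error rate: {type_ii_rate:.3f}")
```

The Type I rate is controlled by the chosen significance level, whereas the Type II rate depends on how far the true mean is from the null value and on the sample size.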
Estimator selection
The bias of an estimator is the difference between an estimator's expected value and the true value of the parameter being estimated. Although an unbiased estimator is theoretically preferable to a biased estimator, in practice, biased estimators with small biases are frequently used. A biased estimator may be more useful for several reasons. First, an unbiased estimator may not exist without further assumptions. Second, an unbiased estimator is sometimes hard to compute. Third, a biased estimator may have a lower mean squared error.
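The third reason can be made precise with the standard decomposition of mean squared error into variance and squared bias. The notation here ($T$ for the estimator, $\theta$ for the parameter) is assumed for this sketch:

```latex
\operatorname{MSE}(T)
  = \operatorname{E}\!\left[(T - \theta)^2\right]
  = \operatorname{Var}(T) + \bigl(\operatorname{E}(T) - \theta\bigr)^2
```

A biased estimator therefore has a lower mean squared error than an unbiased one whenever its reduction in variance outweighs its squared bias.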
* For some estimation problems arising from the Poisson distribution, a biased estimator is better than any unbiased estimator. The value of the biased estimator is always positive and its mean squared error is smaller than that of the unbiased one, which makes the biased estimator more accurate.
* Omitted-variable bias is the bias that appears in estimates of parameters in regression analysis when the assumed specification omits an independent variable that should be in the model.
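Omitted-variable bias, described in the last item above, can be reproduced with a small simulation. The sketch below is a hedged illustration: the coefficients, the correlation between the two regressors, and the helper names are all hypothetical, and the full-model slope is obtained by residualizing on the second regressor (the Frisch-Waugh-Lovell approach) rather than by calling a regression library.

```python
import random

def ols_slope(x, y):
    """Slope of a simple least-squares regression of y on x (with intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def residuals(y, x):
    """Residuals of y after regressing it on x (with intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = ols_slope(x, y)
    return [b - my - slope * (a - mx) for a, b in zip(x, y)]

rng = random.Random(0)
n = 20000
beta1, beta2, gamma = 2.0, 3.0, 0.5   # hypothetical true coefficients

x1 = [rng.gauss(0.0, 1.0) for _ in range(n)]
x2 = [gamma * a + rng.gauss(0.0, 1.0) for a in x1]   # x2 is correlated with x1
y = [beta1 * a + beta2 * b + rng.gauss(0.0, 1.0) for a, b in zip(x1, x2)]

# Misspecified model: x2 is omitted, so x1 absorbs part of its effect.
slope_omitting_x2 = ols_slope(x1, y)

# Full model: coefficient on x1 after partialling out x2 from both x1 and y.
slope_full_model = ols_slope(residuals(x1, x2), residuals(y, x2))

print(f"slope with x2 omitted:  {slope_omitting_x2:.2f}")  # near beta1 + beta2 * gamma = 3.5
print(f"slope with x2 included: {slope_full_model:.2f}")   # near beta1 = 2.0
```

In this setup the misspecified slope converges to `beta1 + beta2 * gamma` rather than `beta1`, so the model attributes part of the omitted variable's effect to the included one.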
Analysis methods
* Detection bias occurs when a phenomenon is more likely to be observed for a particular set of study subjects. For instance, the syndemic involving obesity and diabetes may mean doctors are more likely to look for diabetes in obese patients than in thinner patients, leading to an inflation in diabetes among obese patients because of skewed detection efforts.
* In educational measurement, bias is defined as "Systematic errors in test content, test administration, and/or scoring procedures that can cause some test takers to get either lower or higher scores than their true ability would merit." The source of the bias is irrelevant to the trait the test is intended to measure.
* Observer bias arises when the researcher subconsciously influences the experiment due to cognitive bias, where judgment may alter how an experiment is carried out or how results are recorded.
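Detection bias, described in the first item above, can be sketched with a simulation in which two groups have the same true rate of a condition but are examined with different intensity. The prevalence and detection probabilities below are hypothetical values chosen only for illustration, as is the function name.

```python
import random

def observed_rate(true_prevalence, detection_prob, n, rng):
    """Fraction of n simulated subjects who are diagnosed.

    A subject is diagnosed only if they have the condition AND
    the condition is actually looked for and found.
    """
    diagnosed = 0
    for _ in range(n):
        has_condition = rng.random() < true_prevalence
        if has_condition and rng.random() < detection_prob:
            diagnosed += 1
    return diagnosed / n

rng = random.Random(0)

# Same true prevalence (10%) in both groups; only the detection effort differs.
rate_screened_often = observed_rate(0.10, detection_prob=0.9, n=50000, rng=rng)
rate_screened_rarely = observed_rate(0.10, detection_prob=0.5, n=50000, rng=rng)

print(f"observed rate, frequently screened group: {rate_screened_often:.3f}")
print(f"observed rate, rarely screened group:     {rate_screened_rarely:.3f}")
```

Although the underlying prevalence is identical by construction, the more intensively screened group shows a higher observed rate, which is the inflation that skewed detection effort produces.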
Interpretation
Reporting bias involves a skew in the availability of data, such that observations of a certain kind are more likely to be reported.
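A skew in which observations get reported can shift summary statistics even when the underlying data are centered on zero. The following sketch assumes a hypothetical reporting rule (large values are always reported, others only 20% of the time); the threshold, probability, and names are illustrative assumptions.

```python
import random
from statistics import mean

rng = random.Random(0)

# Hypothetical observations whose true mean is 0.
observations = [rng.gauss(0.0, 1.0) for _ in range(20000)]

def is_reported(value, threshold=1.0, p_report_otherwise=0.2):
    """Values above the threshold are always reported; others only sometimes."""
    if value > threshold:
        return True
    return rng.random() < p_report_otherwise

reported = [v for v in observations if is_reported(v)]

mean_all = mean(observations)
mean_reported = mean(reported)

print(f"mean of all observations:      {mean_all:+.3f}")
print(f"mean of reported observations: {mean_reported:+.3f}")
```

An analyst who sees only the reported subset would estimate a clearly positive mean, even though the full set of observations averages close to zero.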
See also
* Trueness
* Systematic error