Yates's Correction For Continuity

In statistics, Yates's correction for continuity (or Yates's chi-squared test) is a statistical test commonly used when analyzing count data organized in a contingency table, particularly when sample sizes are small. It is specifically designed for testing whether two categorical variables are related or independent of each other. The correction modifies the standard chi-squared test to account for the fact that a continuous distribution (chi-squared) is used to approximate discrete data. Almost exclusively applied to 2 × 2 contingency tables, it involves subtracting 0.5 from the absolute difference between observed and expected frequencies before squaring the result. Unlike the standard Pearson chi-squared statistic, Yates's correction is approximately unbiased for small sample sizes. It is considered more conservative than the uncorrected chi-squared test, as it increases the p-value and thus reduces the likelihood of rejecting the null hypothesis when it is true. While widely taught in introductory statistics courses, modern computational methods like Fisher's exact test may be preferred for analyzing small samples in 2 × 2 tables, with Yates's correction serving as a middle ground between uncorrected chi-squared tests and Fisher's exact test. The correction was first published by Frank Yates in 1934.
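
The relationship between the three tests can be seen by running them on the same small table. The following sketch assumes the SciPy library is available; the 2 × 2 counts are hypothetical and chosen only for illustration. In SciPy, scipy.stats.chi2_contingency applies Yates's correction to 2 × 2 tables when correction=True (its default), and scipy.stats.fisher_exact gives the exact test.

 # Sketch: compare uncorrected chi-squared, Yates-corrected chi-squared,
 # and Fisher's exact test on a small, hypothetical 2 x 2 table.
 import numpy as np
 from scipy import stats

 table = np.array([[3, 9],
                   [8, 4]])   # hypothetical counts with small expected frequencies

 chi2_plain, p_plain, _, _ = stats.chi2_contingency(table, correction=False)
 chi2_yates, p_yates, _, _ = stats.chi2_contingency(table, correction=True)
 _, p_fisher = stats.fisher_exact(table)

 print(f"uncorrected chi-squared: {chi2_plain:.3f}, p = {p_plain:.4f}")
 print(f"Yates-corrected:         {chi2_yates:.3f}, p = {p_yates:.4f}")
 print(f"Fisher's exact test:     p = {p_fisher:.4f}")

For tables this small, the corrected p-value is noticeably larger than the uncorrected one and is generally much closer to the Fisher exact p-value, which is the "middle ground" behaviour described above.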


Correction for approximation error

Using the chi-squared distribution to interpret Pearson's chi-squared statistic requires one to assume that the discrete probability of observed binomial frequencies in the table can be approximated by the continuous chi-squared distribution. This assumption is not quite correct, and introduces some error.

To reduce the error in approximation, Frank Yates, an English statistician, suggested a correction for continuity that adjusts the formula for Pearson's chi-squared test by subtracting 0.5 from the difference between each observed value and its expected value in a 2 × 2 contingency table (Yates 1934).
This reduces the chi-squared value obtained and thus increases its p-value. The effect of Yates's correction is to prevent overestimation of statistical significance for small data. The formula is chiefly used when at least one cell of the table has an expected count smaller than 5, or when the total count \sum_{i=1}^{N} O_i is itself small (roughly 20 or fewer observations).

The following is Yates's corrected version of Pearson's chi-squared statistic:

: \chi_\text{Yates}^2 = \sum_{i=1}^{N} \frac{(|O_i - E_i| - 0.5)^2}{E_i}

where:
:''O_i'' = an observed frequency
:''E_i'' = an expected (theoretical) frequency, asserted by the null hypothesis
:''N'' = number of distinct events
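
As a minimal sketch of this formula (assuming NumPy and SciPy are available, and reusing the hypothetical counts from the earlier example), the expected frequencies are computed from the table margins, the corrected statistic is summed over the cells, and the result is referred to a chi-squared distribution with one degree of freedom, as is appropriate for a 2 × 2 table.

 # Sketch: Yates's corrected chi-squared statistic, computed directly
 # from the formula sum((|O_i - E_i| - 0.5)**2 / E_i) over all cells.
 import numpy as np
 from scipy.stats import chi2

 observed = np.array([[3, 9],
                      [8, 4]], dtype=float)   # hypothetical 2 x 2 counts

 # Expected counts under independence: (row total * column total) / grand total.
 row_totals = observed.sum(axis=1, keepdims=True)
 col_totals = observed.sum(axis=0, keepdims=True)
 n = observed.sum()
 expected = row_totals * col_totals / n

 # Yates's correction: shrink each |O - E| by 0.5 before squaring.
 chi2_yates = np.sum((np.abs(observed - expected) - 0.5) ** 2 / expected)

 # A 2 x 2 table has (2 - 1) * (2 - 1) = 1 degree of freedom.
 p_value = chi2.sf(chi2_yates, df=1)
 print(f"chi-squared (Yates) = {chi2_yates:.3f}, p = {p_value:.4f}")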


2 × 2 table

As a short-cut, for a 2 × 2 table with cell counts ''a'', ''b'', ''c'', ''d'', row totals ''N_A'' = ''a'' + ''b'' and ''N_B'' = ''c'' + ''d'', column totals ''N_S'' = ''a'' + ''c'' and ''N_F'' = ''b'' + ''d'', and grand total ''N'', the corrected statistic can be written as

: \chi_\text{Yates}^2 = \frac{N(|ad - bc| - N/2)^2}{N_S N_F N_A N_B}.

When |''ad'' − ''bc''| is smaller than ''N''/2, subtracting ''N''/2 overshoots zero and, once squared, inflates the statistic again; clamping the corrected difference at zero avoids this over-correction, so in some cases the following form is better:

: \chi_\text{Yates}^2 = \frac{N(\max(0, |ad - bc| - N/2))^2}{N_S N_F N_A N_B}.

Yates's correction should always be applied, as it will tend to improve the accuracy of the p-value obtained. However, in situations with large sample sizes, using the correction has little effect on the value of the test statistic, and hence on the p-value. A short sketch of the short-cut formula follows.
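
The sketch below (plain Python, hypothetical counts matching the earlier examples) implements the short-cut and its clamped variant; the clamp only changes the result when |''ad'' − ''bc''| is smaller than ''N''/2.

 # Sketch: the 2 x 2 short-cut for Yates's corrected statistic,
 # chi2 = N * (|ad - bc| - N/2)^2 / (N_S * N_F * N_A * N_B),
 # together with the clamped variant that never over-corrects past zero.
 def yates_2x2(a, b, c, d, clamp=False):
     n = a + b + c + d
     n_a, n_b = a + b, c + d          # row totals
     n_s, n_f = a + c, b + d          # column totals
     diff = abs(a * d - b * c) - n / 2
     if clamp:
         diff = max(0.0, diff)        # avoid over-correction when |ad - bc| < N/2
     return n * diff ** 2 / (n_s * n_f * n_a * n_b)

 # Hypothetical counts as in the sketches above.
 print(yates_2x2(3, 9, 8, 4))              # matches the cell-by-cell formula
 print(yates_2x2(3, 9, 8, 4, clamp=True))  # identical here, since |ad - bc| > N/2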


See also

* Continuity correction
* Wilson score interval with continuity correction


References

* Yates, F. (1934). "Contingency tables involving small numbers and the χ² test". ''Supplement to the Journal of the Royal Statistical Society'', 1(2): 217–235.