Yates's correction for continuity

In statistics, Yates's correction for continuity (or Yates's chi-squared test) is used in certain situations when testing for independence in a contingency table. It aims at correcting the error introduced by assuming that the discrete probabilities of frequencies in the table can be approximated by a continuous distribution (chi-squared). In some cases, Yates's correction may adjust too far, and so its current use is limited.


Correction for approximation error

Using the chi-squared distribution to interpret Pearson's chi-squared statistic requires one to assume that the discrete probability of observed binomial frequencies in the table can be approximated by the continuous chi-squared distribution. This assumption is not quite correct, and introduces some error. To reduce the error in approximation, Frank Yates, an English statistician, suggested a correction for continuity that adjusts the formula for Pearson's chi-squared test by subtracting 0.5 from the absolute difference between each observed value and its expected value in a 2 × 2 contingency table (Yates, F (1934). "Contingency tables involving small numbers and the χ2 test". ''Supplement to the Journal of the Royal Statistical Society'' 1(2): 217–235).
This reduces the chi-squared value obtained and thus increases its p-value. The effect of Yates's correction is to prevent overestimation of statistical significance for small samples. The formula is chiefly used when at least one cell of the table has an expected count smaller than 5. Unfortunately, Yates's correction may tend to overcorrect. This can result in an overly conservative result that fails to reject the null hypothesis when it should (a type II error). So it is suggested that Yates's correction is unnecessary even with quite low sample sizes (Sokal RR, Rohlf FJ (1981). ''Biometry: The Principles and Practice of Statistics in Biological Research.'' Oxford: W.H. Freeman), such as:
: \sum_{i=1}^N O_i = 20
The following is Yates's corrected version of Pearson's chi-squared statistic:
: \chi_\text{Yates}^2 = \sum_{i=1}^{N} \frac{(|O_i - E_i| - 0.5)^2}{E_i}
where:
: O_i = an observed frequency
: E_i = an expected (theoretical) frequency, asserted by the null hypothesis
: N = number of distinct events
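
The corrected statistic can be illustrated with a small sketch. The counts below are hypothetical (chosen so that the total is 20 and two expected counts fall below 5), the function names are illustrative, and the expected frequencies are assumed to come from the usual independence model (row total × column total / grand total).

```python
def expected_counts(table):
    """Expected counts under independence: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_squared(table, yates=False):
    """Pearson's statistic; with yates=True, 0.5 is subtracted from each |O - E|."""
    expected = expected_counts(table)
    shift = 0.5 if yates else 0.0
    return sum(
        (abs(o - e) - shift) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )


table = [[7, 3], [2, 8]]  # hypothetical observed counts, sum = 20
plain = chi_squared(table)
corrected = chi_squared(table, yates=True)
print(f"Pearson: {plain:.4f}  Yates: {corrected:.4f}")
```

For these counts every |O_i − E_i| equals 2.5, so the correction lowers the statistic from roughly 5.05 to roughly 3.23, which is the reduction in chi-squared (and increase in p-value) described above.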


2 × 2 table

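As a short-cut, for a 2 × 2 table with the following entries:

        Column 1   Column 2   Total
Row 1   a          b          a + b
Row 2   c          d          c + d
Total   a + c      b + d      N

we can write, with N = a + b + c + d the total count,
: \chi_\text{Yates}^2 = \frac{N(|ad - bc| - N/2)^2}{(a+b)(c+d)(a+c)(b+d)}.
In some cases, the following variant, which truncates the corrected difference at zero, is better:
: \chi_\text{Yates}^2 = \frac{N(\max(0, |ad - bc| - N/2))^2}{(a+b)(c+d)(a+c)(b+d)}.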
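
As a rough check on the short-cut, the sketch below compares it with the cell-by-cell Yates sum on a hypothetical table, and shows what truncating the corrected difference at zero changes. The function names and the `truncate` flag are illustrative, not part of any library.

```python
def yates_2x2(a, b, c, d, truncate=False):
    """Short-cut Yates statistic for a 2 x 2 table with cells a, b / c, d."""
    n = a + b + c + d
    diff = abs(a * d - b * c) - n / 2
    if truncate:
        # Variant: never let the corrected difference go below zero.
        diff = max(0.0, diff)
    return n * diff ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def yates_cellwise(a, b, c, d):
    """Same statistic computed cell by cell: sum of (|O - E| - 0.5)^2 / E."""
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    observed = ((a, b), (c, d))
    return sum(
        (abs(observed[i][j] - rows[i] * cols[j] / n) - 0.5) ** 2
        / (rows[i] * cols[j] / n)
        for i in range(2)
        for j in range(2)
    )


# Hypothetical counts: the short-cut and the cell-by-cell sum agree.
print(yates_2x2(7, 3, 2, 8), yates_cellwise(7, 3, 2, 8))

# A perfectly balanced table: the plain short-cut is still positive,
# while the truncated variant returns zero.
print(yates_2x2(5, 5, 5, 5), yates_2x2(5, 5, 5, 5, truncate=True))
```

The two forms agree because in a 2 × 2 table every cell has the same absolute deviation |ad − bc| / N from its expected count. The balanced table shows why truncation can matter: when |ad − bc| is smaller than N/2, the untruncated correction overshoots and yields a positive statistic even though observed and expected counts coincide.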


See also

* Continuity correction
* Wilson score interval with continuity correction

