Cochran's C Test

In statistics, Cochran's C test, named after William G. Cochran, is a one-sided upper limit variance outlier test. The C test is used to decide whether a single estimate of a variance (or a standard deviation) is significantly larger than a group of variances (or standard deviations) with which the single estimate is supposed to be comparable. The C test is discussed in many textbooks (P. Konieczka, J. Namieśnik, ''Quality Assurance and Quality Control in the Analytical Chemical Laboratory – A Practical Approach'', CRC Press, Boca Raton, Florida, 2009) and has been recommended by IUPAC (W. Horwitz, "Harmonized protocol for the design and interpretation of collaborative studies", ''Trends in Analytical Chemistry'' 7(4), 118–120 (April 1988)) and ISO (ISO Standard 5725–2:1994, "Accuracy (trueness and precision) of measurement methods and results – Part 2: Basic method for the determination of repeatability and reproducibility of a standard measurement method", International Organization for Standardization, Geneva, Switzerland, 1994; http://www.iso.org/iso/iso_catalogue/catalogue_tc/catalogue_detail.htm?csnumber=11834).
Cochran's C test should not be confused with Cochran's Q test, which applies to the analysis of two-way randomized block designs.

The C test assumes a balanced design, i.e. the full data set under consideration should consist of individual data series that all have equal size. The C test further assumes that each individual data series is normally distributed. Although primarily an outlier test, the C test is also used as a simple alternative to regular homoscedasticity tests such as Bartlett's test, Levene's test and the Brown–Forsythe test to check a statistical data set for homogeneity of variances. An even simpler check of homoscedasticity is provided by Hartley's Fmax test, but that test has the disadvantage that it accounts only for the minimum and the maximum of the variance range, whereas the C test accounts for all variances within the range.
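
As a rough illustration of that last point, the sketch below compares Hartley's Fmax (largest variance divided by smallest variance) with the C ratio (largest variance divided by the sum of all variances) for two made-up sets of variances that share the same minimum and maximum. The function names and the numbers are illustrative assumptions and are not taken from the cited references.

```python
def hartley_fmax(variances):
    """Hartley's Fmax: largest variance divided by smallest variance."""
    return max(variances) / min(variances)


def cochran_c(variances):
    """Cochran's C for the largest variance: max variance / sum of all variances."""
    return max(variances) / sum(variances)


# Two hypothetical sets of variances with the same minimum (1.0) and maximum (8.0).
set_a = [1.0, 1.0, 1.0, 8.0]  # the large variance stands alone
set_b = [1.0, 7.0, 7.0, 8.0]  # the large variance has close company

for name, variances in (("A", set_a), ("B", set_b)):
    print(name, hartley_fmax(variances), round(cochran_c(variances), 3))
```

Fmax is 8.0 for both sets because it only sees the extremes, whereas C drops from 8/11 for set A to 8/23 for set B because the intermediate variances enter the denominator.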


Description

The C test detects one exceptionally large variance value at a time. The corresponding data series is then omitted from the full data set. According to ISO standard 5725 the C test may be iterated until no further exceptionally large variance values are detected, but such practice may lead to excessive rejections if the underlying data series are not normally distributed. The C test evaluates the ratio:

:C_j = \frac{S_j^2}{\sum_{i=1}^{N} S_i^2}

where:

:''C_j'' = Cochran's C statistic for data series ''j''
:''S_j'' = standard deviation of data series ''j''
:''N'' = number of data series that remain in the data set; ''N'' is decreased in steps of 1 upon each iteration of the C test
:''S_i'' = standard deviation of data series ''i'' (1 ≤ ''i'' ≤ ''N'')

The C test tests the null hypothesis (''H_0'') against the alternative hypothesis (''H_a''):

:''H_0'': All variances are equal.
:''H_a'': At least one variance value is significantly larger than the other variance values.
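
The ratio can be computed directly from raw measurements. The following is a minimal sketch, assuming each data series is given as a list of numbers and that the C statistic for a series is its sample variance divided by the sum of the sample variances of all series. The function name `cochran_c_statistics` and the data are hypothetical.

```python
from statistics import variance


def cochran_c_statistics(series):
    """Return C_j = S_j^2 / sum_i S_i^2 for every data series.

    `series` is a list of equally sized lists of measurements.
    """
    sizes = {len(s) for s in series}
    if len(sizes) != 1:
        raise ValueError("the C test assumes a balanced design (equal series sizes)")
    variances = [variance(s) for s in series]  # sample variances S_j^2
    total = sum(variances)
    return [v / total for v in variances]


# Hypothetical data: three series of three measurements each.
data = [
    [10.1, 10.2, 10.3],
    [9.9, 10.0, 10.1],
    [9.0, 10.0, 11.0],
]
c_values = cochran_c_statistics(data)
# Index of the series with the largest C value, i.e. the outlier candidate.
suspect = max(range(len(c_values)), key=c_values.__getitem__)
print(suspect, c_values)
```

In this made-up data the third series (index 2) has a sample variance of about 1.0 against about 0.01 for the other two, so its C value is roughly 0.98. Whether that counts as an outlier depends on the critical value for the chosen significance level.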


Critical values

The sample variance of data series ''j'' is considered an outlier at significance level ''α'' if ''C_j'' exceeds the upper limit critical value ''C_UL''. ''C_UL'' depends on the desired significance level ''α'', the number of considered data series ''N'', and the number of data points (''n'') per data series. Selections of values for ''C_UL'' have been tabulated at significance levels ''α'' = 0.01 (R. Moore, Mathematics Department, Macquarie University, Sydney, Australia, 1999: http://faculty.washington.edu/heagerty/Books/Biostatistics/TABLES/Cochran; R.U.E. 't Lam, "Scrutiny of variance results for outliers: Cochran's test optimized", ''Analytica Chimica Acta'' 659, 68–84 (2010)), ''α'' = 0.025, and ''α'' = 0.05. ''C_UL'' can also be calculated from:

:C_\text{UL}(\alpha, n, N) = \left[ 1 + \frac{N-1}{F_c(\alpha/N,\ n-1,\ (N-1)(n-1))} \right]^{-1}

Here:

:''C_UL'' = upper limit critical value for one-sided test on a balanced design
:''α'' = significance level, e.g., 0.05
:''n'' = number of data points per data series
:''F_c'' = critical value of Fisher's F ratio; ''F_c'' can be obtained from tables of the F distribution (Table of critical values of the F-distribution, NIST) or using computer software for this function.
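
A minimal sketch of this calculation follows, assuming the relation ''C_UL'' = [1 + (''N'' − 1)/''F_c'']^(−1), with ''F_c'' the upper critical value of the F distribution at level ''α''/''N'' with (''n'' − 1) and (''N'' − 1)(''n'' − 1) degrees of freedom. The function names are hypothetical. To stay within the standard library, the helper that supplies ''F_c'' uses a closed form that holds only when the numerator degrees of freedom equal 2 (that is, ''n'' = 3); for other ''n'' the value would have to come from tables or statistical software.

```python
def cochran_c_upper_limit(f_crit, n_series):
    """C_UL = 1 / (1 + (N - 1) / F_c) for a balanced design.

    f_crit   -- upper critical value F_c(alpha/N, n - 1, (N - 1)(n - 1)),
                taken from F tables or statistical software
    n_series -- number of data series N
    """
    return 1.0 / (1.0 + (n_series - 1) / f_crit)


def f_crit_numerator_df_2(p, df2):
    """Upper-tail F critical value, valid ONLY for numerator df = 2 (n = 3).

    For F(2, df2) the tail probability is P(F > x) = (1 + 2x/df2)**(-df2/2),
    which is inverted here in closed form.
    """
    return (df2 / 2.0) * (p ** (-2.0 / df2) - 1.0)


# Hypothetical case: N = 3 series, n = 3 points each, alpha = 0.05.
alpha, n, N = 0.05, 3, 3
f_c = f_crit_numerator_df_2(alpha / N, (N - 1) * (n - 1))
c_ul = cochran_c_upper_limit(f_c, N)
print(round(f_c, 2), round(c_ul, 4))
```

For this case the sketch gives ''F_c'' ≈ 13.49 and ''C_UL'' ≈ 0.8709. If SciPy is available, `scipy.stats.f.isf(alpha / N, n - 1, (N - 1) * (n - 1))` is one way to obtain ''F_c'' for general ''n''.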


Generalization

The C test can be generalized to include unbalanced designs, one-sided lower limit tests and two-sided tests at any significance level ''α'', for any number of data series ''N'', and for any number of individual data points ''n_j'' in data series ''j'' (R.U.E. 't Lam, Variance Outlier Test, blog: http://rtlam.blogspot.com/).


See also

* Bartlett's test
* Levene's test
* Brown–Forsythe test
* Hartley's test
* F-test of equality of variances


External links


* Critical C values
* Generalized Variance Outlier Test