Cochran's

C

test, named after William G. Cochran, is a one-sided upper limit variance

outlier In statistics, an outlier is a data point that differs significantly from other observations. An outlier may be due to a variability in the measurement, an indication of novel data, or it may be the result of experimental error; the latter are ...

statistical test . The C test is used to decide if a single

estimate Estimation (or estimating) is the process of finding an estimate or approximation, which is a value that is usable for some purpose even if input data may be incomplete, uncertain, or unstable. The value is nonetheless usable because it is de ...

of a

variance In probability theory and statistics, variance is the expected value of the squared deviation from the mean of a random variable. The standard deviation (SD) is obtained as the square root of the variance. Variance is a measure of dispersion ...

(or a

standard deviation In statistics, the standard deviation is a measure of the amount of variation of the values of a variable about its Expected value, mean. A low standard Deviation (statistics), deviation indicates that the values tend to be close to the mean ( ...

) is significantly larger than a group of variances (or standard deviations) with which the single estimate is supposed to be comparable. The C test is discussed in many text books P. Konieczka, J. Namieśnik, Quality Assurance and Quality Control in the Analytical Chemical Laboratory – A Practical Approach, CRC Press, Boca Raton, Florida, 2009; . and has been recommended by

IUPAC The International Union of Pure and Applied Chemistry (IUPAC ) is an international federation of National Adhering Organizations working for the advancement of the chemical sciences, especially by developing nomenclature and terminology. It is ...

W. Horwitz, Harmonized protocol for the design and interpretation of collaborative studies, Trends in Analytical Chemistry 7(4), 118–120 (April 1988). and

ISO The International Organization for Standardization (ISO ; ; ) is an independent, non-governmental, international standard development organization composed of representatives from the national standards organizations of member countries. Me ...

Standard 5725–2:1994, “

Accuracy Accuracy and precision are two measures of ''observational error''. ''Accuracy'' is how close a given set of measurements (observations or readings) are to their ''true value''. ''Precision'' is how close the measurements are to each other. The ...

(trueness and precision) of measurement methods and results – Part 2: Basic method for the determination of

repeatability Repeatability or test–retest reliability is the closeness of the agreement between the results of successive measurements of the same measure, when carried out under the same conditions of measurement. In other words, the measurements are take ...

and

reproducibility Reproducibility, closely related to replicability and repeatability, is a major principle underpinning the scientific method. For the findings of a study to be reproducible means that results obtained by an experiment or an observational study or ...

of a standard measurement method”, International Organization for Standardization, Geneva, Switzerland, 1994; https://www.iso.org/iso/iso_catalogue/catalogue_tc/catalogue_detail.htm?csnumber=11834 Cochran's C test should not be confused with Cochran's Q test, which applies to the

analysis Analysis (: analyses) is the process of breaking a complex topic or substance into smaller parts in order to gain a better understanding of it. The technique has been applied in the study of mathematics and logic since before Aristotle (38 ...

of two-way

randomized block design In the statistical theory of the design of experiments, blocking is the arranging of experimental units that are similar to one another in groups (blocks) based on one or more variables. These variables are chosen carefully to minimize the effect ...

s. The C test assumes a balanced design, i.e. the considered full

data set A data set (or dataset) is a collection of data. In the case of tabular data, a data set corresponds to one or more table (database), database tables, where every column (database), column of a table represents a particular Variable (computer sci ...

should consist of individual data series that all have equal size. The C test further assumes that each individual data series is

normally distributed In probability theory and statistics, a normal distribution or Gaussian distribution is a type of continuous probability distribution for a real number, real-valued random variable. The general form of its probability density function is f(x ...

. Although primarily an outlier test, the C test is also in use as a simple alternative for regular

homoscedasticity In statistics, a sequence of random variables is homoscedastic () if all its random variables have the same finite variance; this is also known as homogeneity of variance. The complementary notion is called heteroscedasticity, also known as hete ...

tests such as Bartlett's test, Levene's test and the

Brown–Forsythe test The Brown–Forsythe test is a statistical test for the equality of group variances based on performing an Analysis of Variance (ANOVA) on a transformation of the response variable. When a one-way ANOVA is performed, samples are assumed to have ...

to check a

statistical data Statistics (from German: ', "description of a state, a country") is the discipline that concerns the collection, organization, analysis, interpretation, and presentation of data. In applying statistics to a scientific, industrial, or socia ...

set for homogeneity of variances. An even simpler way to check homoscedasticity is provided by Hartley's F_max test, but Hartley's F_max test has the disadvantage that it only accounts for the minimum and the maximum of the variance range, while the C test accounts for all variances within the range.

Description

The C test detects one exceptionally large variance value at a time. The corresponding data series is then omitted from the full data set. According to ISO standard 5725 the C test may be

iterate Iteration is the repetition of a process in order to generate a (possibly unbounded) sequence of outcomes. Each repetition of the process is a single iteration, and the outcome of each iteration is then the starting point of the next iteration. ...

d until no further exceptionally large variance values are detected, but such practice may lead to excessive rejections if the underlying data series are not normally distributed. The C test evaluates the

ratio In mathematics, a ratio () shows how many times one number contains another. For example, if there are eight oranges and six lemons in a bowl of fruit, then the ratio of oranges to lemons is eight to six (that is, 8:6, which is equivalent to the ...

: ::

C_j =  \frac

where: :''C_j'' = Cochran's C statistic for data series ''j'' :''S_j'' = standard deviation of data series ''j'' :''N'' = number of data series that remain in the data set; ''N'' is decreased in steps of 1 upon each iteration of the C test :''S_i'' = standard deviation of data series i (1 ≤ ''i'' ≤ ''N'') The C test tests the

null hypothesis The null hypothesis (often denoted ''H''0) is the claim in scientific research that the effect being studied does not exist. The null hypothesis can also be described as the hypothesis in which no relationship exists between two sets of data o ...

(H₀) against the

alternative hypothesis In statistical hypothesis testing, the alternative hypothesis is one of the proposed propositions in the hypothesis test. In general the goal of hypothesis test is to demonstrate that in the given condition, there is sufficient evidence supporting ...

(H_a): :H₀: All variances are equal. :H_a: At least one variance value is significantly larger than the other variance values.

Critical values

The sample variance of data series ''j'' is considered an outlier at

significance level In statistical hypothesis testing, a result has statistical significance when a result at least as "extreme" would be very infrequent if the null hypothesis were true. More precisely, a study's defined significance level, denoted by \alpha, is the ...

''α'' if ''C_j'' exceeds the upper limit

critical value Critical value or threshold value can refer to: * A quantitative threshold in medicine, chemistry and physics * Critical value (statistics), boundary of the acceptance region while testing a statistical hypothesis * Value of a function at a crit ...

C_UL. C_UL depends on the desired significance level ''α'', the number of considered data series ''N'', and the number of data points (''n'') per data series. Selections of values for C_UL have been tabulated at significance levels α = 0.01,R. Moore, Mathematics Department, Macquarie University, Sydney, Australia, 1999: https://faculty.washington.edu/heagerty/Books/Biostatistics/TABLES/Cochran.R.U.E. 't Lam, Scrutiny of variance results for outliers: Cochran's test optimized, ''Analytica Chimica Acta'' 659, 68–84 (2010); α = 0.025, and α = 0.05. ''C''_UL can also be calculated from: :

C_\text (\alpha,n,N) = \left 1+ \frac \right .

Here: :''C''_UL = upper limit critical value for one-sided test on a balanced design :''α'' = significance level, e.g., 0.05 :''n'' = number of data points per data series :''F''_c = critical value of Fisher's F ratio; ''F''_c can be obtained from tables of the

F distribution In probability theory and statistics, the ''F''-distribution or ''F''-ratio, also known as Snedecor's ''F'' distribution or the Fisher–Snedecor distribution (after Ronald Fisher and George W. Snedecor), is a continuous probability distribut ...

Table of critical values of the F-distribution
NIST
/ref> or using computer software for this function.

Generalization

The C test can be generalized to include unbalanced designs, one-sided lower limit tests and two-sided tests at any significance level ''α'', for any number of data series ''N'', and for any number of individual data points ''n_j'' in data series ''j''.R.U.E. 't Lam, Variance Outlier Test, blog: https://rtlam.blogspot.com/

References

External links

Critical C valuesGeneralized Variance Outlier Test
{{Statistics Statistical tests

Description

Critical values

Generalization

See also

References

External links