Cochran's C test, named after William G. Cochran, is a one-sided upper limit variance outlier statistical test. The C test is used to decide if a single estimate of a variance (or a standard deviation) is significantly larger than a group of variances (or standard deviations) with which the single estimate is supposed to be comparable. The C test is discussed in many textbooks[P. Konieczka, J. Namieśnik, Quality Assurance and Quality Control in the Analytical Chemical Laboratory – A Practical Approach, CRC Press, Boca Raton, Florida, 2009.] and has been recommended by IUPAC[W. Horwitz, Harmonized protocol for the design and interpretation of collaborative studies, Trends in Analytical Chemistry 7(4), 118–120 (April 1988).] and ISO.[ISO Standard 5725–2:1994, “Accuracy (trueness and precision) of measurement methods and results – Part 2: Basic method for the determination of repeatability and reproducibility of a standard measurement method”, International Organization for Standardization, Geneva, Switzerland, 1994; https://www.iso.org/iso/iso_catalogue/catalogue_tc/catalogue_detail.htm?csnumber=11834] Cochran's C test should not be confused with Cochran's Q test, which applies to the analysis of two-way randomized block designs.
The C test assumes a balanced design, i.e. the considered full data set should consist of individual data series that all have equal size. The C test further assumes that each individual data series is normally distributed. Although primarily an outlier test, the C test is also in use as a simple alternative for regular homoscedasticity tests such as Bartlett's test, Levene's test and the Brown–Forsythe test to check a statistical data set for homogeneity of variances. An even simpler way to check homoscedasticity is provided by Hartley's ''F''<sub>max</sub> test, but Hartley's ''F''<sub>max</sub> test has the disadvantage that it only accounts for the minimum and the maximum of the variance range, while the C test accounts for all variances within the range.
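The difference between the two statistics can be illustrated with a small sketch. It assumes the usual definitions, namely Hartley's F<sub>max</sub> as the largest variance divided by the smallest, and Cochran's C as the largest variance divided by the sum of all variances. The function names and the variance values are hypothetical and chosen only for illustration.

```python
def hartley_fmax(variances):
    """Ratio of the largest to the smallest variance."""
    return max(variances) / min(variances)


def cochran_c_max(variances):
    """Largest variance as a fraction of the sum of all variances."""
    return max(variances) / sum(variances)


# Two hypothetical sets of variances with the same minimum and maximum
# but different intermediate values.
set_a = [1.0, 1.0, 1.0, 4.0]
set_b = [1.0, 3.0, 3.0, 4.0]

for name, variances in (("A", set_a), ("B", set_b)):
    print(name, hartley_fmax(variances), round(cochran_c_max(variances), 4))
```

F<sub>max</sub> is the same for both sets, whereas C is smaller for the second set, because the intermediate variances enter the denominator of C.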
Description
The C test detects one exceptionally large variance value at a time. The corresponding data series is then omitted from the full data set. According to ISO standard 5725, the C test may be iterated until no further exceptionally large variance values are detected, but such practice may lead to excessive rejections if the underlying data series are not normally distributed.
The C test evaluates the ratio:
:<math>C_j = \frac{S_j^2}{\sum_{i=1}^{N} S_i^2}</math>
where:
:''C<sub>j</sub>'' = Cochran's C statistic for data series ''j''
:''S<sub>j</sub>'' = standard deviation of data series ''j''
:''N'' = number of data series that remain in the data set; ''N'' is decreased in steps of 1 upon each iteration of the C test
:''S<sub>i</sub>'' = standard deviation of data series ''i'' (1 ≤ ''i'' ≤ ''N'')
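As a minimal sketch, the statistic can be computed for every data series as its sample variance divided by the sum of all sample variances. The function name `cochran_c` and the example data are hypothetical; the balanced-design check mirrors the assumption stated in the introduction.

```python
from statistics import variance


def cochran_c(series):
    """Return Cochran's C statistic for every data series.

    `series` is a list of equally sized lists of measurements.
    """
    sizes = {len(s) for s in series}
    if len(sizes) != 1:
        raise ValueError("balanced design required: all series must have equal size")
    # Sample variance S_i^2 of each series (n - 1 in the denominator).
    variances = [variance(s) for s in series]
    total = sum(variances)
    return [v / total for v in variances]


# Hypothetical example: three series of three measurements each.
data = [
    [1.0, 2.0, 3.0],  # sample variance 1
    [2.0, 4.0, 6.0],  # sample variance 4
    [1.0, 1.0, 2.0],  # sample variance 1/3
]
c_values = cochran_c(data)
print([round(c, 4) for c in c_values])
```

The values always sum to 1, so only the largest of them needs to be compared with the critical value.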
The C test tests the null hypothesis (H<sub>0</sub>) against the alternative hypothesis (H<sub>a</sub>):
:H<sub>0</sub>: All variances are equal.
:H<sub>a</sub>: At least one variance value is significantly larger than the other variance values.
Critical values
The sample variance of data series ''j'' is considered an outlier at significance level ''α'' if ''C<sub>j</sub>'' exceeds the upper limit critical value ''C''<sub>UL</sub>. ''C''<sub>UL</sub> depends on the desired significance level ''α'', the number of considered data series ''N'', and the number of data points (''n'') per data series. Selections of values for ''C''<sub>UL</sub> have been tabulated at significance levels ''α'' = 0.01,[R. Moore, Mathematics Department, Macquarie University, Sydney, Australia, 1999: https://faculty.washington.edu/heagerty/Books/Biostatistics/TABLES/Cochran.][R.U.E. 't Lam, Scrutiny of variance results for outliers: Cochran's test optimized, ''Analytica Chimica Acta'' 659, 68–84 (2010).] ''α'' = 0.025, and ''α'' = 0.05.
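The decision rule, combined with the iteration mentioned in the Description section, might be sketched as follows. The function `cochran_outliers` and its `critical_value(n, N)` callback are hypothetical names introduced here. The small lookup table holds rounded critical values for ''α'' = 0.05 and ''n'' = 3 that are quoted for illustration only and should be checked against a published table before real use.

```python
from statistics import variance


def cochran_outliers(series, critical_value):
    """Iteratively reject series whose variance is an upper outlier.

    `critical_value(n, N)` is a caller-supplied (hypothetical) function
    returning C_UL for n points per series and N remaining series.
    Returns the original indices of the rejected series, in order.
    """
    remaining = dict(enumerate(series))  # original index -> data series
    rejected = []
    while len(remaining) >= 2:
        n = len(next(iter(remaining.values())))
        variances = {j: variance(s) for j, s in remaining.items()}
        total = sum(variances.values())
        if total == 0:
            break  # all series are constant; nothing to test
        j_max = max(variances, key=variances.get)
        if variances[j_max] / total <= critical_value(n, len(remaining)):
            break  # largest variance is not an outlier; stop iterating
        rejected.append(j_max)
        del remaining[j_max]  # N decreases by 1 for the next iteration
    return rejected


# Illustrative critical values for alpha = 0.05 and n = 3, keyed by N.
c_ul_table = {2: 0.9750, 3: 0.8709, 4: 0.7679, 5: 0.6838}

# Hypothetical data: the last series is far more variable than the others.
data = [
    [10.0, 10.1, 9.9],
    [10.2, 10.0, 10.1],
    [9.9, 10.0, 10.1],
    [8.0, 10.0, 12.0],
]
rejected = cochran_outliers(data, lambda n, N: c_ul_table[N])
print(rejected)
```

Passing the critical value as a callback keeps the sketch independent of whether ''C''<sub>UL</sub> comes from a table or from a calculation.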
''C''<sub>UL</sub> can also be calculated from:
:<math>C_{UL}(\alpha, n, N) = \left[ 1 + \frac{N-1}{F_c(\alpha/N,\ (n-1),\ (N-1)(n-1))} \right]^{-1}</math>
Here:
:''C''<sub>UL</sub> = upper limit critical value for one-sided test on a balanced design
:''α'' = significance level, e.g., 0.05
:''n'' = number of data points per data series
:''F''<sub>c</sub> = critical value of Fisher's F ratio; ''F''<sub>c</sub> can be obtained from tables of the F distribution[Table of critical values of the F-distribution, NIST] or using computer software for this function.
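A sketch of such a calculation in software is given below. It assumes the commonly quoted form C<sub>UL</sub> = 1 / (1 + (N − 1) / F<sub>c</sub>), where F<sub>c</sub> is the upper-tail critical value of the F distribution at level α/N with (n − 1) and (N − 1)(n − 1) degrees of freedom. To stay dependency-free, the F quantile is obtained here from a hand-written regularized incomplete beta function and bisection; all helper names are hypothetical, and in practice a library routine such as `scipy.stats.f.isf` would normally be preferred.

```python
import math


def _betacf(a, b, x, max_iter=500, eps=1e-14):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step, then odd step, of the continued fraction.
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def _betainc(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def f_upper_tail(f, d1, d2):
    """P(F > f) for an F distribution with (d1, d2) degrees of freedom."""
    return _betainc(d2 / 2.0, d1 / 2.0, d2 / (d2 + d1 * f))


def f_critical(p, d1, d2):
    """Value f with P(F > f) = p, located by bisection."""
    lo, hi = 0.0, 1.0
    while f_upper_tail(hi, d1, d2) > p:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if f_upper_tail(mid, d1, d2) > p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def cochran_critical_value(alpha, n, N):
    """Upper limit critical value C_UL for a balanced design."""
    f_c = f_critical(alpha / N, n - 1, (N - 1) * (n - 1))
    return 1.0 / (1.0 + (N - 1) / f_c)


print(round(cochran_critical_value(0.05, 3, 3), 4))
```

Results from a sketch like this are best cross-checked against a published table of critical values for the same ''α'', ''n'' and ''N''.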
Generalization
The C test can be generalized to include unbalanced designs, one-sided lower limit tests and two-sided tests at any significance level ''α'', for any number of data series ''N'', and for any number of individual data points ''n<sub>j</sub>'' in data series ''j''.[R.U.E. 't Lam, Variance Outlier Test, blog: https://rtlam.blogspot.com/]
See also
*Bartlett's test
*Levene's test
*Brown–Forsythe test
* Hartley's test
* F-test of equality of variances
External links
Critical C values
Generalized Variance Outlier Test