In statistical theory, Chauvenet's criterion (named for William Chauvenet) is a means of assessing whether one piece of experimental data, an outlier from a set of observations, is likely to be spurious.


Derivation

The idea behind Chauvenet's criterion is to find a probability band, centered on the mean of a normal distribution, that should reasonably contain all n samples of a data set. Any data points from the n samples that lie outside this probability band can then be considered outliers and removed from the data set, and a new mean and standard deviation can be calculated from the remaining values and the new sample size. Outliers are identified by finding the number of standard deviations that corresponds to the bounds of the probability band around the mean (D_{\max}) and comparing that value to the absolute value of the difference between the suspected outlier and the mean, divided by the sample standard deviation (Eq.1).

D_{\max} \ge \frac{|x - \bar x|}{s_x} (Eq.1)

where
* D_{\max} is the maximum allowable deviation,
* | \cdot | is the absolute value,
* x is the value of the suspected outlier,
* \bar x is the sample mean, and
* s_x is the sample standard deviation.

In order to be considered as including all n observations in the sample, the probability band (centered on the mean) must only account for n-\tfrac12 samples (if n=3, then only 2.5 of the samples must be accounted for in the probability band). In reality we cannot have partial samples, so n-\tfrac12 (2.5 for n=3) is approximately n. Anything less than n-\tfrac12 is approximately n-1 (2 if n=3) and is not valid, because we want to find the probability band that contains n observations, not n-1 samples. In short, we are looking for the probability, P, that is equal to n-\tfrac12 out of n samples (Eq.2).

P = 1 - \frac{1}{2n} (Eq.2)

where
* P is the probability band centered on the sample mean and
* n is the sample size.

The quantity \tfrac{1}{2n} corresponds to the combined probability represented by the two tails of the normal distribution that fall outside of the probability band P. Because the normal distribution is symmetric, only the probability of one of the tails needs to be analyzed to find the standard deviation level associated with P (Eq.3).

P_z = \frac{1}{4n} (Eq.3)

where
* P_z is the probability represented by one tail of the normal distribution and
* n is the sample size.

Eq.1 is analogous to the Z-score equation (Eq.4).

Z = \frac{x - \mu}{\sigma} (Eq.4)

where
* Z is the Z-score,
* x is the sample value,
* \mu=0 is the mean of the standard normal distribution, and
* \sigma=1 is the standard deviation of the standard normal distribution.

Based on Eq.4, to find D_{\max} (Eq.1), look up the z-score corresponding to P_z in a Z-score table. D_{\max} is equal to the absolute value of the score for P_z. Using this method, D_{\max} can be determined for any sample size. In Excel, D_{\max} can be found with the following formula, with n replaced by the sample size: =ABS(NORM.S.INV(1/(4*n))).
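The lookup of the maximum allowable deviation from the one-tail probability of Eq.3 can also be sketched outside a spreadsheet. The following is a minimal Python illustration that mirrors the Excel formula using only the standard library; the function name chauvenet_d_max is a hypothetical choice for this sketch and is not taken from any source.

```python
from statistics import NormalDist


def chauvenet_d_max(n: int) -> float:
    """Maximum allowable deviation, in standard deviations, for sample size n.

    Hypothetical helper: mirrors =ABS(NORM.S.INV(1/(4*n))).
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    # One-tail probability outside the band (Eq.3): P_z = 1 / (4n).
    p_tail = 1.0 / (4.0 * n)
    # The lower-tail quantile of the standard normal is negative;
    # its magnitude is the maximum allowable deviation.
    return abs(NormalDist().inv_cdf(p_tail))


if __name__ == "__main__":
    for n in (3, 6, 10, 100):
        print(n, round(chauvenet_d_max(n), 4))
```

The threshold grows slowly with n, so larger samples tolerate larger deviations before a point is flagged.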


Calculation

To apply Chauvenet's criterion, first calculate the mean and standard deviation of the observed data. Based on how much the suspect datum differs from the mean, use the normal distribution function (or a table thereof) to determine the probability that a given data point will deviate from the mean by at least as much as the suspect data point. Multiply this probability by the number of data points taken. If the result is less than 0.5, the suspicious data point may be discarded, i.e., a reading may be rejected if the probability of obtaining the particular deviation from the mean is less than \tfrac{1}{2n}.
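The procedure above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than a reference implementation: it uses the sample standard deviation (n - 1 denominator), takes the probability to be two-tailed (a deviation at least as large in either direction), and the function name may_reject is hypothetical.

```python
from statistics import NormalDist, mean, stdev


def may_reject(data, suspect):
    """Return True if Chauvenet's criterion allows discarding `suspect`.

    Assumes approximately normal data; `suspect` is one of the values in `data`.
    """
    n = len(data)
    x_bar = mean(data)
    s_x = stdev(data)  # sample standard deviation (n - 1 denominator)
    if s_x == 0:
        return False  # all values identical, nothing deviates
    # Deviation of the suspect value in units of standard deviations.
    z = abs(suspect - x_bar) / s_x
    # Two-tailed probability of a deviation at least this large.
    p = 2.0 * (1.0 - NormalDist().cdf(z))
    # Expected number of such readings among n; reject if below one half.
    return n * p < 0.5


if __name__ == "__main__":
    readings = [9, 10, 10, 10, 11, 50]
    print(may_reject(readings, 50))
    print(may_reject(readings, 11))
```

Comparing n times the probability against 0.5 is the same test as comparing the probability against 1/(2n).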


Example

For instance, suppose a value is measured experimentally in several trials as 9, 10, 10, 10, 11, and 50, and we want to find out if 50 is an outlier.

First, we find P_z. Here P_z denotes the cumulative probability up to the upper bound of the band, that is, one minus the one-tail probability \tfrac{1}{4n}:

P_z = 1 - \frac{1}{4n} = 1 - \frac{1}{4 \cdot 6} = 1 - \frac{1}{24} \approx 0.9583

Then we find D_{\max} by plugging P_z into the quantile function of the standard normal distribution:

D_{\max} = Q(P_z) \approx 1.7317

Then we find the z-score of 50, using the sample mean \bar x \approx 16.67 and the sample standard deviation s_x \approx 16.34:

z = \frac{x - \bar x}{s_x} = \frac{50 - 16.67}{16.34} \approx 2.04

From there we see that z > D_{\max} and can conclude that 50 is an outlier according to Chauvenet's criterion.
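The arithmetic of this example can be checked with a short Python script. It is a sketch that assumes the sample standard deviation with an n - 1 denominator; the variable names are illustrative only.

```python
from statistics import NormalDist, mean, stdev

data = [9, 10, 10, 10, 11, 50]
n = len(data)

x_bar = mean(data)   # sample mean
s_x = stdev(data)    # sample standard deviation (n - 1 denominator)

# Cumulative probability up to the upper bound of the band: 1 - 1/(4n).
p_z = 1.0 - 1.0 / (4.0 * n)
# Quantile function of the standard normal gives the allowable deviation.
d_max = NormalDist().inv_cdf(p_z)

# z-score of the suspect value 50.
z = (50 - x_bar) / s_x

# Every value whose deviation exceeds the allowable maximum.
outliers = [x for x in data if abs(x - x_bar) / s_x > d_max]

print(f"P_z = {p_z:.4f}, D_max = {d_max:.4f}, z = {z:.2f}")
print("outliers:", outliers)
```

Only the value 50 exceeds the threshold; after removing it, the mean and standard deviation would be recomputed from the remaining five readings.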


Peirce's criterion

Another method for eliminating spurious data is called ''Peirce's criterion''. It was developed a few years before Chauvenet's criterion was published, and it is a more rigorous approach to the rational deletion of outlier data (Ross, Stephen (2003). University of New Haven article. J. Engr. Technology, Fall 2003. Retrieved from https://www.researchgate.net/profile/Stephen-Ross-9). Other methods, such as Grubbs's test for outliers, are mentioned under the listing for ''Outlier''.


Criticism

Deletion of outlier data is a controversial practice frowned on by many scientists and science instructors; while Chauvenet's criterion provides an objective and quantitative method for data rejection, it does not make the practice more scientifically or methodologically sound, especially in small sets or where a normal distribution cannot be assumed. Rejection of outliers is more acceptable in areas of practice where the underlying model of the process being measured and the usual distribution of measurement error are confidently known.


Bibliography

* Taylor, John R. ''An Introduction to Error Analysis''. 2nd edition. Sausalito, California: University Science Books, 1997. pp. 166–8.
* Barnett, Vic and Lewis, Toby. ''Outliers in Statistical Data''. 3rd edition. Chichester: J. Wiley and Sons, 1994. ISBN 0-471-93094-6.
* Zerbet, Aicha and Nikulin, Mikhail. "A new statistics for detecting outliers in exponential case". ''Communications in Statistics: Theory and Methods'', 2003, v. 32, pp. 573–584.