In statistics, Duncan's new multiple range test (MRT) is a multiple comparison procedure developed by David B. Duncan in 1955. Duncan's MRT belongs to the general class of multiple comparison procedures that use the studentized range statistic q_r to compare sets of means. David B. Duncan developed this test as a modification of the Student–Newman–Keuls method that would have greater power. Duncan's MRT is especially protective against false negative (Type II) errors, at the expense of a greater risk of false positive (Type I) errors. Duncan's test is commonly used in agronomy and other agricultural research. The result of the test is a set of subsets of means, where in each subset the means have been found not to be significantly different from one another. The test is often followed by the Compact Letter Display (CLD) methodology, which makes the output of such a test much more accessible to non-statistician audiences.


Definition

Assumptions:
1. A sample of observed means m_1, m_2, ..., m_n, which have been drawn independently from n normal populations with "true" means \mu_1, \mu_2, ..., \mu_n respectively.
2. A common standard error \sigma_m. This standard error is unknown, but there is available the usual estimate s_m, which is independent of the observed means and is based on a number of degrees of freedom, denoted by n_2. (More precisely, s_m has the property that \frac{n_2 s_m^2}{\sigma_m^2} is distributed as \chi^2 with n_2 degrees of freedom, independently of the sample means.)

The exact definition of the test is: The difference between any two means in a set of n means is significant provided the range of each and every subset which contains the given means is significant according to an \alpha_p level range test, where \alpha_p = 1-\gamma_p, \gamma_p = (1-\alpha)^{p-1} and p is the number of means in the subset concerned.

Exception: The sole exception to this rule is that no difference between two means can be declared significant if the two means concerned are both contained in a subset of the means which has a non-significant range.
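The relation between the nominal level \alpha and the level \alpha_p applied to a subset of p means can be illustrated with a few lines of Python. This is a minimal sketch of the formula \gamma_p = (1-\alpha)^{p-1} only; the function names are hypothetical and chosen for illustration.

```python
def protection_level(p: int, alpha: float = 0.05) -> float:
    """gamma_p = (1 - alpha)^(p - 1) for a subset of p means."""
    if p < 2:
        raise ValueError("a range test needs at least two means")
    return (1.0 - alpha) ** (p - 1)


def significance_level(p: int, alpha: float = 0.05) -> float:
    """alpha_p = 1 - gamma_p, the level of the range test on p means."""
    return 1.0 - protection_level(p, alpha)


# Levels for subsets of 2 to 5 means at a nominal alpha of 0.05.
for p in range(2, 6):
    print(p, round(protection_level(p), 4), round(significance_level(p), 4))
```

The level grows with p, which is why larger subsets are tested less strictly than pairs of adjacent means.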


Procedure

The procedure consists of a series of pairwise comparisons between means. Each comparison is performed at a significance level \alpha_p, defined by the number of means separating the two means compared (\alpha_p for p-2 separating means). The tests are performed sequentially, where the result of a test determines which test is performed next.

The tests are performed in the following order: the largest minus the smallest, the largest minus the second smallest, up to the largest minus the second largest; then the second largest minus the smallest, the second largest minus the second smallest, and so on, finishing with the second smallest minus the smallest.

With only one exception, given below, each difference is significant if it exceeds the corresponding shortest significant range; otherwise it is not significant. The shortest significant range is the significant studentized range multiplied by the standard error. It will be designated as R_{(p,\alpha)}, where p is the number of means in the subset. The sole exception to this rule is that no difference between two means can be declared significant if the two means concerned are both contained in a subset of the means which has a non-significant range.

An algorithm for performing the test is as follows:
1. Rank the sample means and index them so that m_1 \le m_2 \le ... \le m_n.
2. For each sample mean m_i, from largest to smallest, do the following:
2.1 For each sample mean m_j, from the smallest up to m_{i-1}:
2.1.1 Compare m_i - m_j to the critical value R_{(p,\alpha)}, with p = i-j+1 and \alpha = \alpha_p.
2.1.2 If m_i - m_j does not exceed the critical value, the subset (m_j, m_{j+1}, ..., m_i) is declared not significantly different:
2.1.2.1 Go to the next iteration of loop 2.
2.1.3 Otherwise, keep going with loop 2.1.
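The stepwise scheme can be sketched in Python as follows. This is an illustrative reading of the algorithm, not a reference implementation: the function name `duncan_stepwise` and its argument layout are assumptions, the shortest significant ranges R_p are taken as given inputs, and p is counted as the number of means spanned by the pair being compared.

```python
from typing import Dict, List, Set, Tuple


def duncan_stepwise(
    means: Dict[str, float], shortest_range: Dict[int, float]
) -> Tuple[Set[Tuple[str, str]], List[List[str]]]:
    """Return (significant pairs, subsets declared not significantly different).

    means          -- label -> observed mean
    shortest_range -- p -> R_p, the shortest significant range for p means
                      (assumed to have been computed beforehand)
    """
    order = sorted(means, key=means.get)      # ascending: order[0] is the smallest
    n = len(order)
    significant: Set[Tuple[str, str]] = set()
    subsets: List[Tuple[int, int]] = []       # index ranges found homogeneous

    for i in range(n - 1, 0, -1):             # upper mean, largest first
        for j in range(0, i):                 # lower mean, smallest first
            # Exception rule: a pair inside a non-significant subset is not tested.
            if any(lo <= j and i <= hi for lo, hi in subsets):
                break
            p = i - j + 1                     # number of means spanned by the pair
            if means[order[i]] - means[order[j]] > shortest_range[p]:
                significant.add((order[i], order[j]))
            else:
                subsets.append((j, i))        # m_j .. m_i not significantly different
                break                         # move on to the next upper mean

    groups = [order[lo:hi + 1] for lo, hi in subsets]
    return significant, groups
```

Called with a mapping of labels to means and a mapping of p to R_p, the function returns the pairs declared significant (larger mean first) and the subsets that were left undivided. Means that differ significantly from all others do not appear in any returned subset.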


Critical values

Duncan's multiple range test makes use of the studentized range distribution in order to determine critical values for comparisons between means. Note that different comparisons between means may differ in their significance levels, since the significance level depends on the size of the subset of means in question.

Let us denote by Q_{(p,\nu,\gamma_{(p,\alpha)})} the \gamma_{(p,\alpha)} quantile of the studentized range distribution with p observations and \nu degrees of freedom for the estimate of the standard error (see studentized range for more information), where \gamma_{(p,\alpha)} = (1-\alpha)^{p-1}. Let us denote by r_{(p,\nu,\alpha)} the standardized critical value, given by the rule:
If p=2
r_{(p,\nu,\alpha)} = Q_{(p,\nu,\gamma_{(p,\alpha)})}
Else
r_{(p,\nu,\alpha)} = \max( Q_{(p,\nu,\gamma_{(p,\alpha)})}, r_{(p-1,\nu,\alpha)} )

The shortest critical range (the actual critical value of the test) is computed as R_{(p,\nu,\alpha)} = \sigma_m \cdot r_{(p,\nu,\alpha)}. For \nu \to \infty, a tabulation exists for an exact value of Q (see External links). A word of caution is needed here: notations for Q and R are not the same throughout the literature, where Q is sometimes denoted as the shortest significant interval, and R as the significant quantile of the studentized range distribution (Duncan's 1955 paper uses both notations in different parts).
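The recursive rule for the standardized critical values can be sketched as below. The quantile function is passed in as a parameter, so the sketch does not depend on any particular library; `standardized_critical_values` and the signature of `quantile` are assumptions made for illustration. In practice a routine such as SciPy's `scipy.stats.studentized_range.ppf` might serve as the quantile function, provided its argument order is adapted.

```python
from typing import Callable, Dict


def standardized_critical_values(
    quantile: Callable[[float, int, float], float],
    n_means: int,
    df: float,
    alpha: float = 0.05,
) -> Dict[int, float]:
    """Return {p: r_p} for p = 2 .. n_means.

    quantile(prob, p, df) is assumed to return the `prob` quantile of the
    studentized range for p means and df degrees of freedom.
    """
    r: Dict[int, float] = {}
    for p in range(2, n_means + 1):
        gamma_p = (1.0 - alpha) ** (p - 1)        # protection level for p means
        q = quantile(gamma_p, p, df)
        r[p] = q if p == 2 else max(q, r[p - 1])  # never smaller than r_(p-1)
    return r
```

Taking the maximum with the previous value keeps the critical values non-decreasing in p, so a wider subset is never judged by a smaller threshold than a narrower one.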


Numeric example

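Let us look at the example of 5 treatment means:

Treatment   T1     T2     T3     T4     T5
Mean        9.8    15.4   17.6   21.6   10.8

with a standard error of s_m = 1.796 and \nu = 20 (degrees of freedom for estimating the standard error). Using a known tabulation for Q, one reaches the values of r_{(p,\nu,\alpha)}:
r_{(2,20,0.05)} = 2.95
r_{(3,20,0.05)} = 3.10
r_{(4,20,0.05)} = 3.18
r_{(5,20,0.05)} = 3.25

Now we may obtain the values of the shortest significant range by the formula R_{(p,\nu,\alpha)} = \sigma_m \cdot r_{(p,\nu,\alpha)}, reaching:
R_{(2,20,0.05)} = 3.75
R_{(3,20,0.05)} = 3.94
R_{(4,20,0.05)} = 4.04
R_{(5,20,0.05)} = 4.13

Note that these ranges correspond to a multiplier of about 1.27 = 1.796/\sqrt{2}. The quoted value 1.796 is therefore consistent with the standard error of a difference between two means, while the standard error of a single mean that enters the formula is about 1.27.

Then the observed differences between means are tested, beginning with the largest versus the smallest, which is compared with the shortest significant range R_{(5,20,0.05)} = 4.13. Next, the difference between the largest and the second smallest is computed and compared with the shortest significant range R_{(4,20,0.05)} = 4.04.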
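The arithmetic behind the shortest significant ranges can be checked with a short Python sketch. It rests on one assumption that is not stated in the text: the listed ranges are reproduced only if the multiplier is 1.796/\sqrt{2} (about 1.27), that is, if 1.796 is read as the standard error of a difference between two means.

```python
import math

# Standardized critical values for p = 2..5 means, as given in the text.
r = {2: 2.95, 3: 3.10, 4: 3.18, 5: 3.25}

# Assumption: 1.796 is the standard error of a difference, so the standard
# error of a single mean is 1.796 / sqrt(2).
s_mean = 1.796 / math.sqrt(2)

# Shortest significant ranges R_p = s_mean * r_p, rounded to two decimals.
R = {p: round(s_mean * r_p, 2) for p, r_p in r.items()}
print(R)  # {2: 3.75, 3: 3.94, 4: 4.04, 5: 4.13}
```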

If an observed difference is greater than the corresponding shortest significant range, then we conclude that the pair of means in question is significantly different. If an observed difference is smaller than the corresponding shortest significant range, all differences sharing the same upper mean are considered insignificant, in order to prevent contradictions (differences sharing the same upper mean are shorter by construction).

For our case, the comparisons yield:
4 vs. 1: 21.6 - 9.8 = 11.8 > 4.13 (R_5)
4 vs. 5: 21.6 - 10.8 = 10.8 > 4.04 (R_4)
4 vs. 2: 21.6 - 15.4 = 6.2 > 3.94 (R_3)
4 vs. 3: 21.6 - 17.6 = 4.0 > 3.75 (R_2)
3 vs. 1: 17.6 - 9.8 = 7.8 > 4.04 (R_4)
3 vs. 5: 17.6 - 10.8 = 6.8 > 3.94 (R_3)
3 vs. 2: 17.6 - 15.4 = 2.2 < 3.75 (R_2)
2 vs. 1: 15.4 - 9.8 = 5.6 > 3.94 (R_3)
2 vs. 5: 15.4 - 10.8 = 4.6 > 3.75 (R_2)
5 vs. 1: 10.8 - 9.8 = 1.0 < 3.75 (R_2)

We see that there are significant differences between all pairs of treatments except (T3,T2) and (T5,T1). A graph underlining those means that are not significantly different is shown below:
T1 T5   T2 T3   T4
-----   -----


Protection and significance levels based on degrees of freedom

The new multiple range test proposed by Duncan makes use of special protection levels based upon degrees of freedom. Let \gamma_{2,\alpha} = 1-\alpha be the protection level for testing the significance of a difference between two means; that is, the probability that a significant difference between two means will not be found if the population means are equal. Duncan reasons that one has p-1 degrees of freedom for testing p ranked means, and hence one may conduct p-1 independent tests, each with protection level \gamma_{2,\alpha} = 1-\alpha. Hence, the joint protection level is:

\gamma_{p,\alpha} = \gamma_{2,\alpha}^{p-1} = (1-\alpha)^{p-1}, where \alpha_p = 1-\gamma_{p,\alpha}

That is, the probability that one finds no significant differences in making p-1 independent tests, each at protection level \gamma_{2,\alpha} = 1-\alpha, is \gamma_{2,\alpha}^{p-1}, under the hypothesis that all p population means are equal. In general: the difference between any two means in a set of n means is significant provided the range of each and every subset which contains the given means is significant according to an \alpha_p-level range test, where p is the number of means in the subset concerned.

For \alpha = 0.05, the protection level can be tabulated for various values of p as follows:

p    \gamma_{p,\alpha}    \alpha_p
2    0.9500               0.0500
3    0.9025               0.0975
4    0.8574               0.1426
5    0.8145               0.1855
6    0.7738               0.2262
7    0.7351               0.2649

Note that although this procedure makes use of the studentized range, its error rate is neither on an experiment-wise basis (as with Tukey's) nor on a per-comparison basis. Duncan's multiple range test does not control the family-wise error rate. See the Criticism section for further details.


Duncan Bayesian multiple comparison procedure

Duncan (1965) also gave the first Bayesian multiple comparison procedure, for the pairwise comparisons among the means in a one-way layout. This multiple comparison procedure is different from the one discussed above. Duncan's Bayesian MCP deals with the differences between ordered group means, where the statistics in question are pairwise comparisons (no equivalent is defined for the property of a subset being 'significantly different'). Duncan modeled the consequences of two or more means being equal using additive loss functions within and across the pairwise comparisons. If one assumes the same loss function across the pairwise comparisons, one needs to specify only one constant K, which indicates the relative seriousness of type I to type II errors in each pairwise comparison.

A study by Juliet Popper Shaffer (1998) has shown that the method proposed by Duncan, modified to provide weak control of the family-wise error (FWE) and using an empirical estimate of the variance of the population means, has good properties both from the Bayesian point of view, as a minimum-risk method, and from the frequentist point of view, with good average power. In addition, the results indicate considerable similarity in both risk and average power between Duncan's modified procedure and the Benjamini and Hochberg (1995) false discovery rate-controlling procedure, with the same weak family-wise error control.


Criticism

Duncan's test has been criticised as being too liberal by many statisticians, including Henry Scheffé and John W. Tukey. Duncan argued that a more liberal procedure was appropriate because in real-world practice the global null hypothesis H_0 = "All means are equal" is often false, and thus traditional statisticians overprotect a probably false null hypothesis against type I errors. According to Duncan, one should adjust the protection levels for different p-mean comparisons according to the problem discussed. The example discussed by Duncan in his 1955 paper is a comparison of many means (e.g. 100), when one is interested only in two-mean and three-mean comparisons, and general p-mean comparisons (deciding whether there is some difference between p means) are of no special interest (if p is 15 or more, for example).

Duncan's multiple range test is very "liberal" in terms of Type I errors. The following example illustrates why. Let us assume one is truly interested, as Duncan suggested, only in the correct ranking of subsets of size 4 or below. Let us also assume that one performs the simple pairwise comparison with a protection level \gamma_2 = 0.95. Given an overall set of 100 means, let us look at the null hypotheses of the test:
There are \binom{100}{2} null hypotheses for the correct ranking of each 2 means. The significance level of each hypothesis is 1-0.95 = 0.05.
There are \binom{100}{3} null hypotheses for the correct ranking of each 3 means. The significance level of each hypothesis is 1-(0.95)^2 = 0.0975.
There are \binom{100}{4} null hypotheses for the correct ranking of each 4 means. The significance level of each hypothesis is 1-(0.95)^3 \approx 0.143.

As we can see, the test has two main problems regarding type I errors:
1. Duncan's test is based on the Newman–Keuls procedure, which does not protect the family-wise error rate (though it protects the per-comparison alpha level).
2. Duncan's test intentionally raises the alpha levels (Type I error rate) in each step of the Newman–Keuls procedure (significance levels of \alpha_p \geq \alpha).

Therefore, it is advised not to use the procedure discussed.

Duncan later developed the Duncan–Waller test, which is based on Bayesian principles. It uses the obtained value of F to estimate the prior probability of the null hypothesis being true.
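The counts and levels in the 100-means example can be reproduced with the Python standard library. This is only a check of the arithmetic; the function name `subset_tests` is hypothetical.

```python
from math import comb
from typing import List, Tuple


def subset_tests(n_means: int, alpha: float, sizes: List[int]) -> List[Tuple[int, int, float]]:
    """For each subset size p, return (p, number of subsets, alpha_p)."""
    rows = []
    for p in sizes:
        n_hypotheses = comb(n_means, p)          # subsets of p means out of n_means
        alpha_p = 1 - (1 - alpha) ** (p - 1)     # level of each p-mean range test
        rows.append((p, n_hypotheses, alpha_p))
    return rows


for p, count, level in subset_tests(100, 0.05, [2, 3, 4]):
    print(f"p={p}: {count} hypotheses, each at level {level:.4f}")
```

With 100 means there are 4950 pairs, 161700 triples and 3921225 subsets of four, so both the number of hypotheses and the level applied to each of them grow with p.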


Different approaches to the problem

If one still wishes to address the problem of finding similar subsets of group means, other solutions are found in the literature. Tukey's range test is commonly used to compare pairs of means; this procedure controls the family-wise error rate in the strong sense. Another solution is to perform Student's t-test on all pairs of means and then to use an FDR-controlling procedure (to control the expected proportion of incorrectly rejected null hypotheses). Other possible solutions, which do not involve hypothesis testing but result in a partition into subsets, include clustering and hierarchical clustering. These solutions differ from the approach presented in this method:
* By being distance/density based, and not distribution based.
* By needing a larger group of means in order to produce significant results, or by working with the entire data set.
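As an illustration of the FDR-based alternative mentioned above, the following is a minimal sketch of the Benjamini and Hochberg step-up rule applied to a list of p-values, such as those from pairwise t-tests. The function name is an assumption and the p-values are hypothetical, chosen only to show the mechanics.

```python
from typing import List


def benjamini_hochberg(p_values: List[float], q: float = 0.05) -> List[bool]:
    """Return one reject flag per hypothesis (step-up rule at FDR level q)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda k: p_values[k])   # indices by ascending p
    cutoff = 0                                            # largest rank that passes
    for rank, k in enumerate(order, start=1):
        if p_values[k] <= rank * q / m:
            cutoff = rank
    reject = [False] * m
    for k in order[:cutoff]:                              # reject all up to the cutoff
        reject[k] = True
    return reject


# Hypothetical p-values from pairwise t-tests, for illustration only.
print(benjamini_hochberg([0.001, 0.008, 0.039, 0.035, 0.20]))
```

Because the rule is step-up, a hypothesis whose p-value misses its own threshold can still be rejected when a hypothesis with a larger p-value passes its threshold.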


References

* Parsad, Rajender. Multiple Comparison Procedures. I.A.S.R.I., Library Avenue, New Delhi 110012.
* Harter, H. Leon (Champaign, IL); Balakrishnan, N. (McMaster University, Hamilton, Ontario, Canada). Tables for the Use of Range and Studentized Range in Tests of Hypotheses. Hardback, published Oct 27, 1997.


External links


Critical values for Duncan's multiple range tests