Tukey's range test, also known as Tukey's test, Tukey method, Tukey's honest significance test, or Tukey's HSD (honestly significant difference) test, is a single-step multiple comparison procedure and statistical test. It can be used to correctly interpret the statistical significance of the difference between means that have been selected for comparison because of their extreme values.
The method was initially developed and introduced by John Tukey for use in analysis of variance (ANOVA), and has usually been taught only in connection with ANOVA. However, the studentized range distribution used to determine the level of significance of the differences considered in Tukey's test has vastly broader application: it is useful for researchers who have searched their collected data for remarkable differences between groups, but then cannot validly determine how significant their discovered stand-out difference is using the standard statistical distributions of other conventional tests, for which the data must have been selected at random. Because stand-out data are by definition ''not'' selected at random, but specifically chosen because they are extreme, they need a different, stricter interpretation, provided by the likely frequency and size of the studentized range; the modern practice of "data mining" is an example where it is used.
Development
The test is named after John Tukey. It compares all possible pairs of means, and is based on a studentized range distribution ($q$); this distribution is similar to the distribution of $t$ from the $t$-test (see below).
Tukey's test compares the means of every treatment to the means of every other treatment; that is, it applies simultaneously to the set of all pairwise comparisons
: $\mu_i - \mu_j$
and identifies any difference between two means that is greater than the expected standard error. The confidence coefficient for the set, when all sample sizes are equal, is exactly $1 - \alpha$ for any $0 \leq \alpha \leq 1$. For unequal sample sizes, the confidence coefficient is greater than $1 - \alpha$. In other words, the Tukey method is conservative when there are unequal sample sizes.
This test is often followed by the Compact Letter Display (CLD) statistical procedure to render the output of this test more transparent to non-statistician audiences.
Assumptions
# The observations being tested are independent within and among the groups.
# The subgroups associated with each mean in the test are normally distributed.
# There is equal within-subgroup variance across the subgroups associated with each mean in the test (homogeneity of variance).
The test statistic
Tukey's test is based on a formula very similar to that of the $t$-test. In fact, Tukey's test is essentially a $t$-test, except that it corrects for family-wise error rate.
The formula for Tukey's test is
: $q_s = \frac{\left| Y_A - Y_B \right|}{\mathrm{SE}}$
where $Y_A$ and $Y_B$ are the two means being compared, and SE is the standard error for the sum of the means. The value $q_s$ is the sample's test statistic. (The notation $|x|$ means the absolute value of $x$: the magnitude of $x$ with the sign set to $+$, regardless of the original sign of $x$.)
This $q_s$ test statistic can then be compared to a $q$ value for the chosen significance level $\alpha$ from a table of the studentized range distribution. If the $q_s$ value is ''larger'' than the critical value $q_\alpha$ obtained from the distribution, the two means are said to be significantly different at level $\alpha$, where $0 \leq \alpha \leq 1$.
Since the null hypothesis for Tukey's test states that all means being compared are from the same population (i.e. $\mu_1 = \mu_2 = \mu_3 = \cdots = \mu_k$), the means should be normally distributed (according to the central limit theorem) with the same model standard deviation $\sigma$, estimated by the merged standard error, SE, for all the samples; its calculation is discussed in the following sections. This gives rise to the normality assumption of Tukey's test.
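As an illustration of the statistic described in this section, the following Python sketch computes $q_s$ for every pair of groups. It is not taken from the source: the function names are hypothetical, equal group sizes are assumed, and SE is taken to be $\sqrt{\mathrm{MSE}/n}$ (pooled within-group variance over the common group size), which is the scaling used by standard studentized range tables. A table that includes an extra $\sqrt{2}$ factor would need SE adjusted accordingly.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, variance


def pooled_variance(groups):
    """Pooled within-group variance (the ANOVA mean squared error)."""
    df = sum(len(g) - 1 for g in groups)
    return sum((len(g) - 1) * variance(g) for g in groups) / df


def tukey_q_statistics(groups):
    """Return {(i, j): q_s} for every pair of equally sized groups.

    Assumption of this sketch: SE = sqrt(MSE / n).
    """
    n = len(groups[0])
    if any(len(g) != n for g in groups):
        raise ValueError("this sketch assumes equal group sizes")
    se = sqrt(pooled_variance(groups) / n)
    means = [mean(g) for g in groups]
    return {
        (i, j): abs(means[i] - means[j]) / se
        for i, j in combinations(range(len(groups)), 2)
    }


if __name__ == "__main__":
    # Made-up illustrative data: three groups of three observations.
    data = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
    for pair, q_s in tukey_q_statistics(data).items():
        print(pair, round(q_s, 3))
```

For this made-up data the statistics are roughly 1.73, 6.93 and 5.20. Published tables give a critical value of about 4.34 for $\alpha = 0.05$, $k = 3$ and 6 degrees of freedom, so under these assumptions only the two pairs involving the third group would be declared significantly different.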
The studentized range (q) distribution
The Tukey method uses the studentized range distribution. Suppose that we take a sample of size $n$ from each of $k$ populations with the same normal distribution $N(\mu, \sigma^2)$, and suppose that $\bar{y}_{\min}$ is the smallest of these sample means and $\bar{y}_{\max}$ is the largest of these sample means, and suppose $S^2$ is the pooled sample variance from these samples. Then the following random variable has a studentized range distribution:
: $q \equiv \frac{\bar{y}_{\max} - \bar{y}_{\min}}{S \sqrt{2/n}}$
This definition of the statistic $q$ given above is the basis of the critically significant value for $q_\alpha$ discussed below, and is based on these three factors:
: $\alpha$, the Type I error rate, or the probability of rejecting a true null hypothesis;
: $k$, the number of sub-populations being compared;
: $\mathrm{df}$, the number of degrees of freedom for each mean ($\mathrm{df} = N - k$, where $N$ is the total number of observations).
The distribution of $q$ has been tabulated and appears in many textbooks on statistics. In some tables the distribution of $q$ has been tabulated without the $\sqrt{2}$ factor. To understand which table it is, we can compute the result for $k = 2$ and compare it to the result of the Student's t-distribution with the same degrees of freedom and the same $\alpha$. In addition, R offers a cumulative distribution function (ptukey) and a quantile function (qtukey).
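The distribution can also be approximated by simulation. The Python sketch below is an illustration under stated assumptions, not a replacement for `ptukey` or `qtukey`: the function names and the `scaled` flag are hypothetical, the populations are standard normal, and quantiles are estimated by Monte Carlo, so results carry sampling noise. With `scaled=False` the range of the sample means is divided by $S/\sqrt{n}$; with `scaled=True` it is divided by $S\sqrt{2/n}$, which makes the value for $k = 2$ coincide with $|t|$.

```python
import random
from math import sqrt


def simulate_studentized_range(k, n, rng, scaled=False):
    """Draw one studentized range from k standard-normal samples of size n."""
    samples = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(k)]
    means = [sum(s) / n for s in samples]
    # Pooled variance: within-group sums of squares over k * (n - 1) df.
    ss = sum((x - m) ** 2 for s, m in zip(samples, means) for x in s)
    s_pooled = sqrt(ss / (k * (n - 1)))
    # The two conventions differ only by a constant factor of sqrt(2).
    denom = s_pooled * sqrt(2.0 / n) if scaled else s_pooled / sqrt(n)
    return (max(means) - min(means)) / denom


def empirical_quantile(k, n, prob, draws=20000, seed=0, scaled=False):
    """Monte Carlo estimate of the prob-quantile of the studentized range."""
    rng = random.Random(seed)
    values = sorted(
        simulate_studentized_range(k, n, rng, scaled) for _ in range(draws)
    )
    return values[min(int(prob * draws), draws - 1)]


if __name__ == "__main__":
    print(empirical_quantile(2, 10, 0.95, scaled=True))
    print(empirical_quantile(2, 10, 0.95, scaled=False))
```

For $k = 2$ and $n = 10$ (18 degrees of freedom) the two-sided 5% critical value of Student's $t$ is about 2.101, so the estimated 0.95 quantile should be near 2.10 with `scaled=True` and near $2.101\sqrt{2} \approx 2.97$ with `scaled=False`. This is the same comparison that identifies which convention a printed table uses.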
Confidence limits
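The Tukey confidence limits for all pairwise comparisons with confidence coefficient of at least $1 - \alpha$ are
: $\bar{y}_{i\bullet} - \bar{y}_{j\bullet} \pm \frac{q_{\alpha;k;N-k}}{\sqrt{2}} \, \widehat{\sigma}_\varepsilon \sqrt{\frac{2}{n}}, \quad i, j = 1, \ldots, k, \quad i \neq j.$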
Notice that the point estimator and the estimated variance are the same as those for a single pairwise comparison. The only difference between the confidence limits for simultaneous comparisons and those for a single comparison is the multiple of the estimated standard deviation.
Also note that the sample sizes must be equal when using the studentized range approach. $\widehat{\sigma}_\varepsilon$ is the standard deviation of the entire design, not just that of the two groups being compared. It is possible to work with unequal sample sizes. In this case, one has to calculate the estimated standard deviation for each pairwise comparison, as formalized by Clyde Kramer in 1956; the procedure for unequal sample sizes is therefore sometimes referred to as the Tukey–Kramer method, which is as follows:
: $\bar{y}_{i\bullet} - \bar{y}_{j\bullet} \pm \frac{q_{\alpha;k;N-k}}{\sqrt{2}} \, \widehat{\sigma}_\varepsilon \sqrt{\frac{1}{n_i} + \frac{1}{n_j}}$
where $n_i$ and $n_j$ are the sizes of groups $i$ and $j$ respectively. The degrees of freedom for the whole design is also applied.
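The Tukey–Kramer limits can be computed directly once a critical value is available. The Python sketch below is illustrative only: `tukey_kramer_intervals` is a hypothetical name, and the studentized range critical value for $k$ groups and $N - k$ degrees of freedom is assumed to be supplied by the caller (for example from a table or from R's `qtukey`) rather than computed here. With equal group sizes the expression reduces to the equal-size form.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, variance


def tukey_kramer_intervals(groups, q_crit):
    """Simultaneous confidence limits for every pairwise mean difference.

    q_crit is the studentized range critical value for k groups and
    N - k degrees of freedom; looking it up is left to the caller.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    # Standard deviation of the entire design: root of the pooled variance.
    sse = sum((len(g) - 1) * variance(g) for g in groups)
    sigma_hat = sqrt(sse / (n_total - k))
    means = [mean(g) for g in groups]
    intervals = {}
    for i, j in combinations(range(k), 2):
        diff = means[i] - means[j]
        spread = sqrt(1 / len(groups[i]) + 1 / len(groups[j]))
        half_width = q_crit / sqrt(2) * sigma_hat * spread
        intervals[(i, j)] = (diff - half_width, diff + half_width)
    return intervals


if __name__ == "__main__":
    # Made-up data; 4.34 is roughly the tabulated value for alpha = 0.05,
    # k = 3 groups and 6 degrees of freedom.
    data = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
    for pair, limits in tukey_kramer_intervals(data, 4.34).items():
        print(pair, limits)
```

An interval that excludes zero corresponds to a pair of means declared significantly different at the chosen level. Passing the critical value in as an argument keeps the sketch independent of any particular table convention.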
Comparing ANOVA and Tukey–Kramer tests
Both ANOVA and Tukey–Kramer tests are based on the same assumptions. However, these two tests for $k$ groups (i.e. $\mu_1 = \mu_2 = \cdots = \mu_k$) may result in logical contradictions when $k > 2$, even if the assumptions do hold. It is possible to generate a set of pseudorandom samples of strictly positive measure such that the hypothesis $\mu_1 = \mu_2$ is rejected at one significance level while $\mu_1 = \mu_2 = \mu_3$ is not rejected even at another; this is contradictory because the second hypothesis implies the first.
See also
* Family-wise error rate
* Newman–Keuls method
External links
* "Tukey's method". e-Handbook of Statistical Methods. SEMATECH. National Institute of Standards and Technology / U.S. Department of Commerce. http://www.itl.nist.gov/div898/handbook/prc/section4/prc471.htm