In statistics, the studentized range, denoted ''q'', is the difference between the largest and smallest data in a sample, normalized by the sample standard deviation. It is named after William Sealy Gosset (who wrote under the pseudonym "''Student''"), and was introduced by him in 1927. The concept was later discussed by Newman (1939), Keuls (1952), and John Tukey in some unpublished notes. Its statistical distribution is the ''studentized range distribution''. This distribution is used for multiple comparison procedures, such as the single-step procedure Tukey's range test, the Newman–Keuls method, and Duncan's step-down procedure. It is also used for establishing confidence intervals that are still valid after data snooping has occurred.

Description

The value of the studentized range, most often represented by the variable ''q'', can be defined based on two ingredients. The first is a random sample ''x''₁, ..., ''x''_''n'' from the ''N''(0, 1) distribution. The second is another random variable ''s'' that is independent of all the ''x_i'' and such that ''νs''² has a ''χ''² distribution with ''ν'' degrees of freedom. Then

q_{n,\nu} = \frac{\max\{x_1, \ldots, x_n\} - \min\{x_1, \ldots, x_n\}}{s} = \max_{i,j} \left\{ \frac{x_i - x_j}{s} \right\}

has the studentized range distribution for ''n'' groups and ''ν'' degrees of freedom. In applications, the ''x_i'' are typically the means of samples each of size ''m''. In that case ''s''² is the pooled variance divided by ''m'', so that ''s'' estimates the standard deviation of each mean, and the degrees of freedom are ''ν'' = ''n''(''m'' − 1). The critical value of ''q'' is based on three factors:
# ''α'' (the probability of rejecting a true null hypothesis)
# ''n'' (the number of observations or groups)
# ''ν'' (the degrees of freedom used to estimate the sample variance)
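The following Python sketch illustrates the definition and the group-means application by computing ''q'' from equal-size groups. The function names `studentized_range` and `q_from_groups` are hypothetical. The sketch assumes every group has the same size ''m'', so the pooled variance is simply the average of the within-group sample variances.

```python
import math
from statistics import mean, variance


def studentized_range(x, s):
    """Return q = (max(x) - min(x)) / s, the largest pairwise difference over s."""
    if s <= 0:
        raise ValueError("s must be positive")
    return (max(x) - min(x)) / s


def q_from_groups(groups):
    """Studentized range of the group means for equal-size groups (assumption).

    Returns (q, n, nu), where n is the number of groups and nu = n * (m - 1).
    """
    n = len(groups)
    m = len(groups[0])
    if n < 2 or m < 2 or any(len(g) != m for g in groups):
        raise ValueError("need at least 2 groups, all of the same size m >= 2")
    means = [mean(g) for g in groups]
    # With equal group sizes, the pooled variance is the mean of the
    # within-group sample variances.
    pooled_var = mean([variance(g) for g in groups])
    # Each x_i is a mean of m observations, so s must estimate sigma / sqrt(m).
    s = math.sqrt(pooled_var / m)
    nu = n * (m - 1)
    return studentized_range(means, s), n, nu
```

Consider three groups of size 3, such as `[[4.0, 5.0, 6.0], [6.0, 7.0, 8.0], [9.0, 10.0, 11.0]]`. Their means are 5, 7 and 10, and each within-group variance is 1. The sketch therefore gives ''s'' = √(1/3) and ''q'' = 5√3 ≈ 8.66 with ''ν'' = 6. A procedure such as Tukey's range test then compares a value like this against a critical value for the chosen ''α'', ''n'' and ''ν''.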

Distribution

If ''X''₁, ..., ''X''_''n'' are independent identically distributed random variables that are normally distributed, the probability distribution of their studentized range is what is usually called the ''studentized range distribution''. Note that the definition of ''q'' does not depend on the expected value or the standard deviation of the distribution from which the sample is drawn. Shifting every value by a constant leaves all differences unchanged, and rescaling multiplies both the range and ''s'' by the same factor. Therefore the probability distribution of ''q'' is the same regardless of those parameters.
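Because the distribution does not depend on the mean or standard deviation, a critical value can be approximated by simulating from ''N''(0, 1). The sketch below uses Monte Carlo sampling, a swapped-in technique rather than an exact computation of the distribution. The names `simulate_q` and `critical_q` are hypothetical. The sketch generates ''s'' directly from ''ν'' independent normal draws, so that ''νs''²/''σ''² has a ''χ''² distribution with ''ν'' degrees of freedom.

```python
import math
import random


def simulate_q(n, nu, mu=0.0, sigma=1.0, rng=None):
    """Draw one studentized range from n normal values and an independent s.

    s is built from nu extra N(0, sigma^2) draws, so nu * s**2 / sigma**2
    follows a chi-squared distribution with nu degrees of freedom.
    """
    if rng is None:
        rng = random.Random()
    x = [mu + sigma * rng.gauss(0.0, 1.0) for _ in range(n)]
    s = math.sqrt(sum((sigma * rng.gauss(0.0, 1.0)) ** 2 for _ in range(nu)) / nu)
    return (max(x) - min(x)) / s


def critical_q(alpha, n, nu, trials=20000, seed=0):
    """Monte Carlo estimate of the upper-alpha critical value of q(n, nu)."""
    rng = random.Random(seed)
    draws = sorted(simulate_q(n, nu, rng=rng) for _ in range(trials))
    # Empirical (1 - alpha) quantile of the simulated draws.
    index = min(trials - 1, math.ceil((1 - alpha) * trials) - 1)
    return draws[index]
```

With 20,000 draws, `critical_q(0.05, 3, 10)` should land near the commonly tabulated value of about 3.88. Calling `simulate_q` twice with identically seeded generators but different `mu` and `sigma` returns the same ''q'' up to rounding, which shows the invariance to location and scale.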

''Studentization''

Generally, the term ''studentized'' means that the variable's scale was adjusted by dividing by an estimate of a population standard deviation (see also studentized residual). The standard deviation used is a ''sample'' standard deviation rather than the ''population'' standard deviation, so it differs from one random sample to the next. This fact is essential to the definition and the distribution of the ''studentized'' data. The variability in the value of the ''sample'' standard deviation adds uncertainty to the calculated values. This complicates the problem of finding the probability distribution of any statistic that is ''studentized''.
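The extra uncertainty contributed by the sample standard deviation can be seen numerically. This Python sketch (the function names `upper_quantile` and `compare_scaling` are illustrative) draws ''n'' standard normal values and divides their range two ways. The first divides by the known population standard deviation (1). The second divides by an independent sample estimate ''s'' with ''ν'' degrees of freedom. The sketch then compares the upper 5% points of the two versions using Monte Carlo simulation, which is an assumption of this example rather than a method from the text.

```python
import math
import random


def upper_quantile(values, alpha):
    """Empirical upper-alpha quantile of a list of draws."""
    ordered = sorted(values)
    return ordered[math.ceil((1 - alpha) * len(ordered)) - 1]


def compare_scaling(n, nu, alpha=0.05, trials=20000, seed=0):
    """Return (quantile of range / sigma, quantile of range / s) with sigma = 1."""
    rng = random.Random(seed)
    known, studentized = [], []
    for _ in range(trials):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        r = max(x) - min(x)
        # s is estimated from nu independent draws, so it varies between samples.
        s = math.sqrt(sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(nu)) / nu)
        known.append(r)  # sigma = 1 is known, so range / sigma is just the range
        studentized.append(r / s)
    return upper_quantile(known, alpha), upper_quantile(studentized, alpha)
```

For small ''ν'', the studentized quantile is noticeably larger than the one computed with the known standard deviation. As ''ν'' grows, ''s'' becomes a more precise estimate and the two quantiles approach each other.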
