Welch's ''t''-test

In statistics, Welch's ''t''-test, or unequal variances ''t''-test, is a two-sample location test which is used to test the hypothesis that two populations have equal means. It is named for its creator, Bernard Lewis Welch, and is an adaptation of Student's ''t''-test that is more reliable when the two samples have unequal variances and possibly unequal sample sizes. These tests are often referred to as "unpaired" or "independent samples" ''t''-tests, as they are typically applied when the statistical units underlying the two samples being compared are non-overlapping. Given that Welch's ''t''-test has been less popular than Student's ''t''-test and may be less familiar to readers, a more informative name is "Welch's unequal variances ''t''-test", or "unequal variances ''t''-test" for brevity.


Assumptions

Student's ''t''-test assumes that the sample means being compared for two populations are normally distributed, and that the populations have equal variances. Welch's ''t''-test is designed for unequal population variances, but the assumption of normality is maintained. Welch's ''t''-test is an approximate solution to the Behrens–Fisher problem.


Calculations

Welch's ''t''-test defines the statistic ''t'' by the following formula:

: t = \frac{\Delta\overline{X}}{s_{\Delta\overline{X}}} = \frac{\overline{X}_1 - \overline{X}_2}{\sqrt{s_{\overline{X}_1}^2 + s_{\overline{X}_2}^2}},

: s_{\overline{X}_i} = \frac{s_i}{\sqrt{N_i}},

where \overline{X}_i and s_{\overline{X}_i} are the i^\text{th} sample mean and its standard error, with s_i denoting the corrected sample standard deviation, and sample size N_i. Unlike in Student's ''t''-test, the denominator is ''not'' based on a pooled variance estimate.

The degrees of freedom \nu associated with this variance estimate is approximated using the Welch–Satterthwaite equation:

: \nu \approx \frac{\left(\frac{s_1^2}{N_1} + \frac{s_2^2}{N_2}\right)^2}{\frac{s_1^4}{N_1^2 \nu_1} + \frac{s_2^4}{N_2^2 \nu_2}}.

This expression can be simplified when N_1 = N_2 = N:

: \nu \approx \frac{\left(s_1^2 + s_2^2\right)^2}{s_1^4 + s_2^4} (N - 1).

Here, \nu_i = N_i - 1 is the degrees of freedom associated with the ''i''-th variance estimate. The statistic approximately follows the ''t''-distribution because the scaled variance estimate in the denominator approximately follows a chi-square distribution. The approximation is more accurate when both N_1 and N_2 are larger than 5 (Yates, Moore, and Starnes 2008, p. 792).
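As a concrete illustration, the statistic and the Welch–Satterthwaite degrees of freedom can be computed directly from the formulas above. A minimal R sketch (the helper name welch_t is illustrative, not a standard function):

 welch_t <- function(x1, x2) {
   N1 <- length(x1); N2 <- length(x2)
   v1 <- var(x1); v2 <- var(x2)   # corrected sample variances s_i^2
   se2 <- v1 / N1 + v2 / N2       # squared standard error of the difference
   t <- (mean(x1) - mean(x2)) / sqrt(se2)
   nu <- se2^2 / (v1^2 / (N1^2 * (N1 - 1)) + v2^2 / (N2^2 * (N2 - 1)))
   c(t = t, df = nu)
 }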


Statistical test

Once ''t'' and \nu have been computed, these statistics can be used with the ''t''-distribution to test one of two possible null hypotheses:

* that the two population means are equal, in which case a two-tailed test is applied; or
* that one of the population means is greater than or equal to the other, in which case a one-tailed test is applied.

The approximate degrees of freedom are real numbers \left(\nu \in \mathbb{R}^{+}\right) and are used as such in statistics-oriented software, whereas they are rounded down to the nearest integer in spreadsheets.
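Continuing the sketch above, the corresponding p-values can be read off the ''t''-distribution in R, where x1 and x2 are the two samples and welch_t is the illustrative helper defined earlier:

 res <- welch_t(x1, x2)
 2 * pt(-abs(res["t"]), df = res["df"])  # two-tailed: H0 is mu1 = mu2
 pt(res["t"], df = res["df"])            # one-tailed: H0 is mu1 >= mu2

Note that pt() accepts the fractional degrees of freedom directly, matching the behaviour of statistics-oriented software described above.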


Advantages and limitations

Welch's ''t''-test is more robust than Student's ''t''-test and maintains type I error rates close to nominal for unequal variances and for unequal sample sizes under normality. Furthermore, the power of Welch's ''t''-test comes close to that of Student's ''t''-test, even when the population variances are equal and sample sizes are balanced. Welch's ''t''-test can be generalized to more than two samples, yielding a test that is more robust than one-way analysis of variance (ANOVA) when variances are unequal.

It is ''not recommended'' to pre-test for equal variances and then choose between Student's ''t''-test and Welch's ''t''-test. Rather, Welch's ''t''-test can be applied directly, without any substantial disadvantage relative to Student's ''t''-test, as noted above. Welch's ''t''-test remains robust for skewed distributions and large sample sizes; its reliability decreases for skewed distributions and smaller samples, in which case one option is to apply the test to rank-transformed data.
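The type I error behaviour described above can be checked by simulation. The following R sketch, using illustrative parameter values in which the smaller sample has the larger variance, estimates the rejection rate of each test under a true null hypothesis:

 set.seed(1)
 reps <- 10000
 p_welch <- numeric(reps)
 p_student <- numeric(reps)
 for (i in seq_len(reps)) {
   x1 <- rnorm(10, mean = 20, sd = 3)  # smaller sample, larger variance
   x2 <- rnorm(20, mean = 20, sd = 1)  # equal means, so H0 is true
   p_welch[i] <- t.test(x1, x2)$p.value                     # Welch (default)
   p_student[i] <- t.test(x1, x2, var.equal = TRUE)$p.value # Student
 }
 mean(p_welch < 0.05)    # stays close to the nominal 0.05
 mean(p_student < 0.05)  # inflated when the smaller sample has the larger variance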


Examples

The following three examples compare Welch's ''t''-test and Student's ''t''-test. Samples are from random normal distributions generated using the R programming language. For all three examples, the population means were \mu_1 = 20 and \mu_2 = 22. Let A_1 and A_2 denote the two random samples drawn in each example.

The first example is for unequal but near variances (\sigma_1^2 = 7.9, \sigma_2^2 = 3.8) and equal sample sizes (N_1 = N_2 = 15). The second example is for unequal variances (\sigma_1^2 = 9.0, \sigma_2^2 = 0.9) and unequal sample sizes (N_1 = 10, N_2 = 20); here the smaller sample has the larger variance. The third example is for unequal variances (\sigma_1^2 = 1.4, \sigma_2^2 = 17.1) and unequal sample sizes (N_1 = 10, N_2 = 20); here the larger sample has the larger variance.

Reference p-values were obtained by simulating the distributions of the ''t'' statistics under the null hypothesis of equal population means (\mu_1 - \mu_2 = 0). All p-values are two-tailed. Welch's ''t''-test and Student's ''t''-test gave identical results when the two samples had similar variances and sample sizes (Example 1). Note, however, that even when data are sampled from populations with identical variances, the sample variances will differ, as will the results of the two ''t''-tests; with actual data, the two tests will almost always give somewhat different results. For unequal variances, Student's ''t''-test gave a low p-value when the smaller sample had the larger variance (Example 2) and a high p-value when the larger sample had the larger variance (Example 3). For unequal variances, Welch's ''t''-test gave p-values close to the simulated p-values.
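These comparisons can be reproduced with R's t.test() function, which implements both tests. A minimal sketch for Example 2, assuming fresh draws from the stated population parameters (the original sample values are not reproduced here, so the resulting p-values will differ in detail):

 # Example 2 parameters: mu1 = 20, sigma1^2 = 9.0, N1 = 10;
 #                       mu2 = 22, sigma2^2 = 0.9, N2 = 20
 A1 <- rnorm(10, mean = 20, sd = sqrt(9.0))
 A2 <- rnorm(20, mean = 22, sd = sqrt(0.9))
 t.test(A1, A2)                    # Welch's t-test (the default, var.equal = FALSE)
 t.test(A1, A2, var.equal = TRUE)  # Student's t-test for comparison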


Software implementations
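
Welch's ''t''-test is available in most statistical packages. For example, in R it is the default behaviour of t.test() (argument var.equal = FALSE); in Python it is performed by scipy.stats.ttest_ind() with equal_var=False; and in spreadsheet programs such as Microsoft Excel it corresponds to T.TEST() with type = 3 ("two-sample unequal variance").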


See also

* Student's ''t''-test
* ''Z''-test
* Factorial experiment
* One-way analysis of variance
* Hotelling's two-sample ''T''-squared statistic, a multivariate extension of Welch's ''t''-test


References

* Yates, Moore, and Starnes, ''The Practice of Statistics'', 3rd ed., p. 792. New York: W.H. Freeman and Company, 2008.