statistics Statistics (from German language, German: ', "description of a State (polity), state, a country") is the discipline that concerns the collection, organization, analysis, interpretation, and presentation of data. In applying statistics to a s ...

, Levene's test is an inferential statistic used to assess the equality of

variance In probability theory and statistics, variance is the expected value of the squared deviation from the mean of a random variable. The standard deviation (SD) is obtained as the square root of the variance. Variance is a measure of dispersion ...

s for a variable calculated for two or more groups. This test is used because some common statistical procedures assume that variances of the populations from which different samples are drawn are equal. Levene's test assesses this assumption. It tests the

null hypothesis The null hypothesis (often denoted ''H''0) is the claim in scientific research that the effect being studied does not exist. The null hypothesis can also be described as the hypothesis in which no relationship exists between two sets of data o ...

that the population variances are equal (called ''homogeneity of variance'' or ''

homoscedasticity In statistics, a sequence of random variables is homoscedastic () if all its random variables have the same finite variance; this is also known as homogeneity of variance. The complementary notion is called heteroscedasticity, also known as hete ...

''). If the resulting ''p''-value of Levene's test is less than some significance level (typically 0.05), the obtained differences in sample variances are unlikely to have occurred based on random sampling from a population with equal variances. Thus, the null hypothesis of equal variances is rejected and it is concluded that there is a difference between the variances in the population. Levene's test has been used in the past before a comparison of means to inform the decision on whether to use a pooled t-test or the Welch's t-test for two sample tests or

analysis of variance Analysis of variance (ANOVA) is a family of statistical methods used to compare the Mean, means of two or more groups by analyzing variance. Specifically, ANOVA compares the amount of variation ''between'' the group means to the amount of variati ...

or Welch's modified oneway ANOVA for multi-level tests. However, it was shown that such a two-step procedure may markedly inflate the type 1 error obtained with the t-tests and thus is not recommended. Instead, the preferred approach is to just use Welch's test in all cases. Levene's test may also be used as a main test for answering a stand-alone question of whether two sub-samples in a given population have equal or different variances. Levene's test was developed by and named after American statistician and geneticist Howard Levene.

Definition

Levene's test is equivalent to a 1-way between-groups analysis of variance (ANOVA) with the dependent variable being the absolute value of the difference between a score and the mean of the group to which the score belongs (shown below as

Z_ = , Y_ - \bar_,

). The test statistic,

W

, is equivalent to the

F

statistic that would be produced by such an ANOVA, and is defined as follows: :

W = \frac \cdot \frac ,

where *

k

is the number of different groups to which the sampled cases belong, *

N_i

is the number of cases in the

i

th group, *

N

is the total number of cases in all groups, *

Y_

is the value of the measured variable for the

j

th case from the

i

th group, *

Z_ = \begin
, Y_ - \bar_, , & \bar_ \text i\text, \\
, Y_ - \tilde_, , & \tilde_ \text i\text. \end

(Both definitions are in use though the second one is, strictly speaking, the Brown–Forsythe test – see below for comparison.) *

Z_ = \frac \sum_^ Z_

is the mean of the

Z_

for group

i

, *

Z_ = \frac \sum_^k \sum_^ Z_

is the mean of all

Z_

. The test statistic

W

is approximately F-distributed with

k-1

and

N-k

degrees of freedom, and hence is the significance of the outcome

w

W

tested against

F(1-\alpha;k-1,N-k)

where

F

is a quantile of the F-distribution, with

k-1

and

N-k

degrees of freedom, and

\alpha

is the chosen level of significance (usually 0.05 or 0.01).

Comparison with the Brown–Forsythe test

The Brown–Forsythe test uses the median instead of the mean in computing the spread within each group (

\bar

vs.

\tilde

, above). Although the optimal choice depends on the underlying distribution, the definition based on the median is recommended as the choice that provides good

robustness Robustness is the property of being strong and healthy in constitution. When it is transposed into a system A system is a group of interacting or interrelated elements that act according to a set of rules to form a unified whole. A system, ...

against many types of non-normal data while retaining good

statistical power In frequentist statistics, power is the probability of detecting a given effect (if that effect actually exists) using a given test in a given context. In typical use, it is a function of the specific test that is used (including the choice of tes ...

. If one has knowledge of the underlying distribution of the data, this may indicate using one of the other choices. Brown and Forsythe performed

Monte Carlo Monte Carlo ( ; ; or colloquially ; , ; ) is an official administrative area of Monaco, specifically the Ward (country subdivision), ward of Monte Carlo/Spélugues, where the Monte Carlo Casino is located. Informally, the name also refers to ...

studies that indicated that using the trimmed mean performed best when the underlying data followed a

Cauchy distribution The Cauchy distribution, named after Augustin-Louis Cauchy, is a continuous probability distribution. It is also known, especially among physicists, as the Lorentz distribution (after Hendrik Lorentz), Cauchy–Lorentz distribution, Lorentz(ian) ...

(a heavy-tailed distribution) and the median performed best when the underlying data followed a

chi-squared distribution In probability theory and statistics, the \chi^2-distribution with k Degrees of freedom (statistics), degrees of freedom is the distribution of a sum of the squares of k Independence (probability theory), independent standard normal random vari ...

with four degrees of freedom (a heavily

skewed distribution In probability theory and statistics, skewness is a measure of the asymmetry of the probability distribution of a real number, real-valued random variable about its mean. The skewness value can be positive, zero, negative, or undefined. For ...

). Using the mean provided the best power for symmetric, moderate-tailed, distributions.

Software implementations

Many

spreadsheet A spreadsheet is a computer application for computation, organization, analysis and storage of data in tabular form. Spreadsheets were developed as computerized analogs of paper accounting worksheets. The program operates on data entered in c ...

programs and statistics packages, such as R, Python, Julia, and

MATLAB MATLAB (an abbreviation of "MATrix LABoratory") is a proprietary multi-paradigm programming language and numeric computing environment developed by MathWorks. MATLAB allows matrix manipulations, plotting of functions and data, implementat ...

include implementations of Levene's test.

References

{{Reflist

External links

Parametric and nonparametric Levene's test in SPSS
* http://www.itl.nist.gov/div898/handbook/eda/section3/eda35a.htm Analysis of variance Statistical tests

Definition

Comparison with the Brown–Forsythe test

Software implementations

See also

References

External links