Squared deviations from the mean (SDM) result from squaring deviations. In probability theory and statistics, the definition of ''variance'' is either the expected value of the SDM (when considering a theoretical distribution) or its average value (for actual experimental data). Computations for ''analysis of variance'' involve the partitioning of a sum of SDM.

Background

An understanding of the computations involved is greatly enhanced by a study of the statistical value

: \operatorname{E}(X^2),

where E is the expected value operator. For a random variable ''X'' with mean ''μ'' and variance ''σ''²,

: \sigma^2 = \operatorname{E}(X^2) - \mu^2.

(Mood & Graybill: ''An Introduction to the Theory of Statistics'', McGraw Hill.) Therefore,

: \operatorname{E}(X^2) = \sigma^2 + \mu^2.

From the above, the following can be derived for sums over ''n'' independent observations of ''X'':

: \operatorname{E}\left(\sum X^2\right) = n\sigma^2 + n\mu^2,

: \operatorname{E}\left(\left(\sum X\right)^2\right) = n\sigma^2 + n^2\mu^2.
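The two derived expectations can be checked exactly on a small case. The following is a minimal sketch, not part of the source: it assumes ''n'' = 3 independent rolls of a fair six-sided die (a distribution chosen here only for convenience) and enumerates every outcome with exact rational arithmetic. The helper name `expectation` is hypothetical.

```python
from fractions import Fraction
from itertools import product

# Assumed example distribution: a fair six-sided die.
faces = [1, 2, 3, 4, 5, 6]
p = Fraction(1, len(faces))

mu = sum(p * x for x in faces)          # mean of one roll
ex2 = sum(p * x * x for x in faces)     # E(X^2) of one roll
sigma2 = ex2 - mu ** 2                  # variance, from sigma^2 = E(X^2) - mu^2

def expectation(n, statistic):
    """Exact expectation of statistic(sample) over n independent rolls."""
    weight = p ** n                     # every outcome is equally likely
    return sum(weight * statistic(s) for s in product(faces, repeat=n))

n = 3
e_sum_sq = expectation(n, lambda s: sum(x * x for x in s))   # E(sum of X^2)
e_sq_sum = expectation(n, lambda s: sum(s) ** 2)             # E((sum of X)^2)

assert e_sum_sq == n * sigma2 + n * mu ** 2
assert e_sq_sum == n * sigma2 + n ** 2 * mu ** 2
```

Exact fractions are used instead of floating point so that the identities can be tested with strict equality.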

Sample variance

The sum of squared deviations needed to calculate sample variance (before deciding whether to divide by ''n'' or ''n'' − 1) is most easily calculated as

: S = \sum x^2 - \frac{\left(\sum x\right)^2}{n}

From the two derived expectations above, the expected value of this sum is

: \operatorname{E}(S) = n\sigma^2 + n\mu^2 - \frac{n\sigma^2 + n^2\mu^2}{n}

which implies

: \operatorname{E}(S) = (n - 1)\sigma^2.

This justifies the divisor ''n'' − 1 in the calculation of an unbiased sample estimate of ''σ''²: dividing ''S'' by ''n'' − 1 gives an estimator whose expected value is exactly ''σ''².
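As a rough illustration, the shortcut form of ''S'' can be compared with the direct sum of squared deviations, and the expected value of ''S'' can be confirmed exactly for a small case. The function names `sdm_shortcut` and `sdm_direct`, the sample data, and the fair-die distribution are all assumptions made for this sketch.

```python
from fractions import Fraction
from itertools import product
import statistics

def sdm_shortcut(xs):
    """S = sum(x^2) - (sum x)^2 / n, in exact rational arithmetic."""
    n = len(xs)
    return sum(Fraction(x) ** 2 for x in xs) - Fraction(sum(xs)) ** 2 / n

def sdm_direct(xs):
    """S from the definition: sum of (x - mean)^2."""
    mean = Fraction(sum(xs), len(xs))
    return sum((x - mean) ** 2 for x in xs)

data = [2, 4, 4, 5, 7]  # hypothetical sample
assert sdm_shortcut(data) == sdm_direct(data)

# Dividing S by n - 1 matches the standard library's sample variance.
exact = [Fraction(x) for x in data]
assert sdm_shortcut(data) / (len(data) - 1) == statistics.variance(exact)

# Exact expectation of S over all samples of size n from a fair die (assumed).
faces = [1, 2, 3, 4, 5, 6]
n = 3
sigma2 = Fraction(35, 12)  # variance of one roll of a fair six-sided die
total = sum(sdm_shortcut(s) for s in product(faces, repeat=n))
expected_s = total / len(faces) ** n
assert expected_s == (n - 1) * sigma2
```

The last assertion is the statement E(''S'') = (''n'' − 1)''σ''² for this particular distribution and sample size; it is a check of one case, not a proof.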

Partition — analysis of variance

When data are available for ''k'' different treatment groups of size ''n''_''i'', where ''i'' runs from 1 to ''k'', it is assumed that the expected mean of each group is

: \operatorname{E}(\mu_i) = \mu + T_i

and that the variance of each treatment group is unchanged from the population variance ''σ''². Under the null hypothesis that the treatments have no effect, each of the ''T''_''i'' is zero.

It is now possible to calculate three sums of squares, where ''n'' is the total number of observations across all groups and, in ''T'', the inner sum runs over the observations in group ''i'':

;Individual
: I = \sum x^2
: \operatorname{E}(I) = n\sigma^2 + n\mu^2

;Treatments
: T = \sum_{i=1}^k \left(\left(\sum x\right)^2 / n_i\right)
: \operatorname{E}(T) = k\sigma^2 + \sum_{i=1}^k n_i(\mu + T_i)^2
: \operatorname{E}(T) = k\sigma^2 + n\mu^2 + 2\mu \sum_{i=1}^k (n_iT_i) + \sum_{i=1}^k n_i(T_i)^2

Under the null hypothesis that the treatments cause no differences and all the ''T''_''i'' are zero, the expectation simplifies to

: \operatorname{E}(T) = k\sigma^2 + n\mu^2.

;Combination
: C = \left(\sum x\right)^2 / n
: \operatorname{E}(C) = \sigma^2 + n\mu^2
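The expectation of ''T'' can be obtained by applying the second derived expectation of the Background section to each treatment group separately. As a sketch of the intermediate step, assuming independent observations within group ''i'', which has ''n''_''i'' observations and mean ''μ'' + ''T''_''i'':

: \operatorname{E}\left(\frac{\left(\sum x\right)^2}{n_i}\right) = \frac{n_i\sigma^2 + n_i^2(\mu + T_i)^2}{n_i} = \sigma^2 + n_i(\mu + T_i)^2

Summing over the ''k'' groups contributes one ''σ''² per group, which gives the ''k''''σ''² term. The same argument applied to all ''n'' observations as a single group, with every ''T''_''i'' equal to zero, gives the expectation of ''C''.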

Sums of squared deviations

Under the null hypothesis, the difference of any pair of ''I'', ''T'', and ''C'' does not contain any dependency on ''μ'', only ''σ''².

: \operatorname{E}(I - C) = (n - 1)\sigma^2 (total squared deviations, also known as the ''total sum of squares'')
: \operatorname{E}(T - C) = (k - 1)\sigma^2 (treatment squared deviations, also known as the ''explained sum of squares'')
: \operatorname{E}(I - T) = (n - k)\sigma^2 (residual squared deviations, also known as the ''residual sum of squares'')

The constants (''n'' − 1), (''k'' − 1), and (''n'' − ''k'') are normally referred to as the number of degrees of freedom.
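The partition can be sketched in a few lines of code. The functions `anova_sums` and `anova_partition` below are hypothetical helpers written for this illustration, and the three groups of data are made up; the sketch only shows that the treatment and residual parts add up to the total, together with their degrees of freedom.

```python
from fractions import Fraction

def anova_sums(groups):
    """Return the sums I, T, C for a list of treatment groups."""
    values = [Fraction(x) for g in groups for x in g]
    n = len(values)
    I = sum(x * x for x in values)                            # individual
    T = sum(Fraction(sum(g)) ** 2 / len(g) for g in groups)   # treatments
    C = sum(values) ** 2 / n                                  # combination
    return I, T, C

def anova_partition(groups):
    """Return (sum of squares, degrees of freedom) for each component."""
    n = sum(len(g) for g in groups)
    k = len(groups)
    I, T, C = anova_sums(groups)
    return {
        "total": (I - C, n - 1),
        "treatment": (T - C, k - 1),
        "residual": (I - T, n - k),
    }

# Hypothetical data: three treatment groups of unequal size.
groups = [[2, 4], [5, 7, 9], [1, 3]]
parts = anova_partition(groups)

# Treatment and residual add up to the total, as do the degrees of freedom.
assert parts["treatment"][0] + parts["residual"][0] == parts["total"][0]
assert parts["treatment"][1] + parts["residual"][1] == parts["total"][1]
```

Because ''I'' − ''C'' = (''T'' − ''C'') + (''I'' − ''T'') is an algebraic identity, the additivity holds for any data, whether or not the null hypothesis is true.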

Example

In a very simple example, 5 observations arise from two treatments. The first treatment gives three values, 1, 2, and 3, and the second treatment gives two values, 4 and 6.

: I = \frac{1^2}{1} + \frac{2^2}{1} + \frac{3^2}{1} + \frac{4^2}{1} + \frac{6^2}{1} = 66
: T = \frac{(1 + 2 + 3)^2}{3} + \frac{(4 + 6)^2}{2} = 12 + 50 = 62
: C = \frac{(1 + 2 + 3 + 4 + 6)^2}{5} = 256/5 = 51.2

Giving
: Total squared deviations = 66 − 51.2 = 14.8 with 4 degrees of freedom.
: Treatment squared deviations = 62 − 51.2 = 10.8 with 1 degree of freedom.
: Residual squared deviations = 66 − 62 = 4 with 3 degrees of freedom.
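The figures in this example can be reproduced, and compared with the sums of squared deviations computed directly from their definitions, with a short script. This is an illustrative sketch; the variable names and the helper `mean` are not from the source.

```python
from fractions import Fraction

groups = [[1, 2, 3], [4, 6]]   # the two treatments from the example
values = [x for g in groups for x in g]
n = len(values)

def mean(xs):
    """Exact arithmetic mean."""
    return Fraction(sum(xs), len(xs))

grand = mean(values)

# Definitional forms: squared deviations from the relevant mean.
total = sum((x - grand) ** 2 for x in values)
treatment = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
residual = sum((x - mean(g)) ** 2 for g in groups for x in g)

# Shortcut forms used in the text.
I = sum(x * x for x in values)
T = sum(Fraction(sum(g) ** 2, len(g)) for g in groups)
C = Fraction(sum(values) ** 2, n)

assert (I, T, C) == (66, 62, Fraction(256, 5))
assert total == I - C == Fraction(74, 5)       # 14.8
assert treatment == T - C == Fraction(54, 5)   # 10.8
assert residual == I - T == 4
```

The shortcut sums avoid computing any mean explicitly, which is why they are convenient for hand calculation.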

