Cochran's theorem

In statistics, Cochran's theorem, devised by William G. Cochran, is a theorem used to justify results relating to the probability distributions of statistics that are used in the analysis of variance.


Statement

Let U_1, ..., U_N be i.i.d. standard normally distributed random variables, and let U = [U_1, ..., U_N]^T. Let B^{(1)}, B^{(2)}, \ldots, B^{(k)} be symmetric matrices. Define r_i to be the rank of B^{(i)}. Define Q_i = U^T B^{(i)} U, so that the Q_i are quadratic forms. Further assume \sum_i Q_i = U^T U.

Cochran's theorem states that the following are equivalent:
* r_1 + \cdots + r_k = N,
* the Q_i are independent,
* each Q_i has a chi-squared distribution with r_i degrees of freedom.

Often the theorem is stated with \sum_i A_i = A, where A is idempotent, and \sum_i r_i = N replaced by \sum_i r_i = \operatorname{rank}(A). But after an orthogonal transform, A = \operatorname{diag}(I_M, 0), and so one reduces to the theorem above.
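As a quick numerical illustration (a sketch only, not part of the statement; it assumes Python with NumPy, and the matrices B1 = I - J/N and B2 = J/N are the pair used later in the Examples section), one can check the rank condition and the assumption \sum_i Q_i = U^T U:

import numpy as np

N = 5
J = np.ones((N, N))
B1 = np.eye(N) - J / N   # symmetric, rank N - 1
B2 = J / N               # symmetric, rank 1

r1 = np.linalg.matrix_rank(B1)
r2 = np.linalg.matrix_rank(B2)
assert r1 + r2 == N      # the rank condition of the theorem

rng = np.random.default_rng(0)
U = rng.standard_normal(N)
Q1 = U @ B1 @ U
Q2 = U @ B2 @ U
# The quadratic forms add up to U^T U, as the theorem assumes.
assert np.isclose(Q1 + Q2, U @ U)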


Proof

Claim: Let X be a standard Gaussian in \R^n. Then for any symmetric matrices Q, Q', if X^T Q X and X^T Q' X have the same distribution, then Q and Q' have the same eigenvalues (up to multiplicity).

Proof: Let the eigenvalues of Q be \lambda_1, ..., \lambda_n, and calculate the characteristic function of X^T Q X. It comes out to be

:\phi(t) = \left(\prod_j (1 - 2i\lambda_j t)\right)^{-1/2}.

(To calculate it, first diagonalize Q, change into that frame, then use the fact that the characteristic function of a sum of independent variables is the product of their characteristic functions.) For X^T Q X and X^T Q' X to be equal in distribution, their characteristic functions must be equal, so Q and Q' have the same eigenvalues (up to multiplicity).

Claim: I = \sum_i B^{(i)}.

Proof: U^T (I - \sum_i B^{(i)}) U = U^T U - \sum_i Q_i = 0. Since I - \sum_i B^{(i)} is symmetric and U^T (I - \sum_i B^{(i)}) U has the same distribution as U^T 0 U, the previous claim shows that I - \sum_i B^{(i)} has the same eigenvalues as the zero matrix, and hence is the zero matrix.

Lemma: If \sum_i M_i = I, where all the M_i are symmetric with eigenvalues 0 or 1, then they are simultaneously diagonalizable.

Proof: Fix i and consider the eigenvectors v of M_i with M_i v = v. Then v^T v = v^T I v = v^T v + \sum_{j \neq i} v^T M_j v, so all v^T M_j v = 0; since each M_j is positive semidefinite, this forces M_j v = 0. Thus we obtain a splitting of \R^N into V \oplus V^\perp, where V is the 1-eigenspace of M_i and lies in the 0-eigenspaces of all the other M_j. Now induct by moving into V^\perp.

Case: All Q_i are independent.

Fix some i, define C^{(i)} = I - B^{(i)} = \sum_{j \neq i} B^{(j)}, and diagonalize B^{(i)} by an orthogonal transform O. Then O C^{(i)} O^T = I - O B^{(i)} O^T is diagonalized as well. Let W = OU; then W is also standard Gaussian, and

:Q_i = W^T (O B^{(i)} O^T) W, \quad \sum_{j \neq i} Q_j = W^T (I - O B^{(i)} O^T) W.

Inspecting the diagonal entries, the independence of Q_i and \sum_{j \neq i} Q_j implies that their nonzero diagonal entries are disjoint, so every eigenvalue \lambda of B^{(i)} satisfies \lambda = 0 or 1 - \lambda = 0. Thus all eigenvalues of B^{(i)} are 0 or 1, and Q_i has a \chi^2 distribution with r_i degrees of freedom.

Case: Each Q_i has a \chi^2(r_i) distribution.

Fix any i, diagonalize B^{(i)} by an orthogonal transform O, and reindex so that O B^{(i)} O^T = \operatorname{diag}(\lambda_1, ..., \lambda_{r_i}, 0, ..., 0). Then Q_i = \sum_{j=1}^{r_i} \lambda_j U_j'^2 for some U'_j, a spherical rotation of U. Since Q_i \sim \chi^2(r_i), the first claim gives all \lambda_j = 1. So every B^{(i)} \succeq 0 has eigenvalues 0 or 1. Diagonalizing them simultaneously (by the lemma) and adding them up gives \sum_i r_i = N.

Case: r_1 + \cdots + r_k = N.

We first show that the matrices B^{(i)} can be simultaneously diagonalized by an orthogonal matrix and that their non-zero eigenvalues are all equal to +1. Once that is shown, take this orthogonal transform to the simultaneous eigenbasis, in which the random vector [U_1, ..., U_N]^T becomes [U'_1, ..., U'_N]^T, where all the U'_i are still independent and standard Gaussian. Then the result follows.

Each of the matrices B^{(i)} has rank r_i and thus r_i non-zero eigenvalues. For each i, the sum C^{(i)} \equiv \sum_{j \neq i} B^{(j)} has rank at most \sum_{j \neq i} r_j = N - r_i. Since B^{(i)} + C^{(i)} = I_N, it follows that C^{(i)} has rank exactly N - r_i. Therefore B^{(i)} and C^{(i)} can be simultaneously diagonalized. This can be shown by first diagonalizing B^{(i)}, by the spectral theorem. In this basis, it is of the form

:\begin{pmatrix}
\lambda_1 & 0 & \cdots & 0 & 0 & \cdots & 0 \\
0 & \lambda_2 & \cdots & 0 & 0 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots & \vdots & & \vdots \\
0 & 0 & \cdots & \lambda_{r_i} & 0 & \cdots & 0 \\
0 & 0 & \cdots & 0 & 0 & \cdots & 0 \\
\vdots & \vdots & & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & 0 & 0 & \cdots & 0
\end{pmatrix}.

Thus the lower N - r_i rows are zero. Since C^{(i)} = I - B^{(i)}, these rows of C^{(i)} in this basis contain a right block which is an (N - r_i) \times (N - r_i) unit matrix, with zeros in the rest of these rows. But since C^{(i)} has rank N - r_i, it must be zero elsewhere. Thus it is diagonal in this basis as well, and because B^{(i)} + C^{(i)} = I forces \lambda_j = 1 for j \le r_i, all the non-zero eigenvalues of both B^{(i)} and C^{(i)} are +1. This argument applies for all i, thus all B^{(i)} are positive semidefinite.

Moreover, the above analysis can be repeated in the diagonal basis for C^{(1)} = B^{(2)} + \sum_{j>2} B^{(j)}. In this basis C^{(1)} is the identity of an (N - r_1)-dimensional vector space, so it follows that both B^{(2)} and \sum_{j>2} B^{(j)} are simultaneously diagonalizable in this vector space (and hence also together with B^{(1)}). By iteration it follows that all the B^{(i)} are simultaneously diagonalizable. Thus there exists an orthogonal matrix S such that for all i, S^T B^{(i)} S \equiv B^{(i)\prime} is diagonal, where any entry B^{(i)\prime}_{x,y} with indices x = y, \sum_{j=1}^{i-1} r_j < x = y \le \sum_{j=1}^{i} r_j, is equal to 1, while any entry with other indices is equal to 0.
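The characteristic-function formula in the first claim can be checked numerically. The following sketch (assuming Python with NumPy; the symmetric matrix Q below is an arbitrary example, not one appearing in the theorem) compares a Monte Carlo estimate of E[exp(it X^T Q X)] with the product \prod_j (1 - 2i\lambda_j t)^{-1/2}:

import numpy as np

rng = np.random.default_rng(1)
n = 4
A = rng.standard_normal((n, n))
Q = (A + A.T) / 2                      # an arbitrary symmetric matrix
lam = np.linalg.eigvalsh(Q)            # its eigenvalues

t = 0.3
X = rng.standard_normal((200_000, n))
quad = np.einsum('ij,jk,ik->i', X, Q, X)        # X^T Q X for each sample
empirical = np.exp(1j * t * quad).mean()        # Monte Carlo characteristic function
theoretical = np.prod((1 - 2j * lam * t) ** -0.5)
print(empirical, theoretical)   # the two complex values should agree up to sampling error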


Examples


Sample mean and sample variance

If X_1, ..., X_n are independent normally distributed random variables with mean \mu and standard deviation \sigma, then

:U_i = \frac{X_i - \mu}{\sigma}

is standard normal for each i. Note that the total Q is equal to the sum of the squared U's, as shown here:

:\sum_i Q_i = \sum_{i,j,k} U_j B_{jk}^{(i)} U_k = \sum_{j,k} U_j U_k \sum_i B_{jk}^{(i)} = \sum_{j,k} U_j U_k \delta_{jk} = \sum_j U_j^2,

which stems from the original assumption that B^{(1)} + B^{(2)} + \ldots = I. So instead we will calculate this quantity and later separate it into Q_i's. It is possible to write

:\sum_{i=1}^n U_i^2 = \sum_{i=1}^n \left(\frac{X_i - \overline{X}}{\sigma}\right)^2 + n\left(\frac{\overline{X} - \mu}{\sigma}\right)^2

(here \overline{X} is the sample mean). To see this identity, multiply throughout by \sigma^2 and note that

:\sum (X_i - \mu)^2 = \sum (X_i - \overline{X} + \overline{X} - \mu)^2

and expand to give

:\sum (X_i - \mu)^2 = \sum (X_i - \overline{X})^2 + \sum (\overline{X} - \mu)^2 + 2\sum (X_i - \overline{X})(\overline{X} - \mu).

The third term is zero because it is equal to a constant times

:\sum (\overline{X} - X_i) = 0,

and the second term has just n identical terms added together. Thus

:\sum (X_i - \mu)^2 = \sum (X_i - \overline{X})^2 + n(\overline{X} - \mu)^2,

and hence

:\sum \left(\frac{X_i - \mu}{\sigma}\right)^2 = \sum \left(\frac{X_i - \overline{X}}{\sigma}\right)^2 + n\left(\frac{\overline{X} - \mu}{\sigma}\right)^2 = \overbrace{\sum \left(\frac{X_i - \overline{X}}{\sigma}\right)^2}^{Q_1} + \overbrace{n\left(\frac{\overline{X} - \mu}{\sigma}\right)^2}^{Q_2} = Q_1 + Q_2.

Now B^{(2)} = \frac{J_n}{n} with J_n the matrix of ones, which has rank 1. In turn B^{(1)} = I_n - \frac{J_n}{n}, given that I_n = B^{(1)} + B^{(2)}. This expression can also be obtained by expanding Q_1 in matrix notation. It can be shown that the rank of B^{(1)} is n - 1, as the sum of all its rows is equal to zero. Thus the conditions for Cochran's theorem are met.

Cochran's theorem then states that Q_1 and Q_2 are independent, with chi-squared distributions with n - 1 and 1 degree of freedom respectively. This shows that the sample mean and sample variance are independent. This can also be shown by Basu's theorem, and in fact this property characterizes the normal distribution; for no other distribution are the sample mean and sample variance independent.
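The independence and the chi-squared distributions can be illustrated by simulation. The following sketch (assuming Python with NumPy and SciPy; the sample size n, the parameters mu and sigma, and the number of replications reps are arbitrary choices) forms Q_1 and Q_2 from many simulated samples and compares them with \chi^2_{n-1} and \chi^2_1:

import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
mu, sigma, n, reps = 3.0, 2.0, 10, 100_000
X = rng.normal(mu, sigma, size=(reps, n))
xbar = X.mean(axis=1)

Q1 = ((X - xbar[:, None]) ** 2).sum(axis=1) / sigma**2   # sum of squared deviations / sigma^2
Q2 = n * (xbar - mu) ** 2 / sigma**2                      # scaled squared error of the mean

print(np.corrcoef(Q1, Q2)[0, 1])                          # near 0: Q1 and Q2 are independent
print(stats.kstest(Q1, 'chi2', args=(n - 1,)).statistic)  # small: Q1 ~ chi2(n - 1)
print(stats.kstest(Q2, 'chi2', args=(1,)).statistic)      # small: Q2 ~ chi2(1)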


Distributions

The result for the distributions is written symbolically as

:\sum \left(X_i - \overline{X}\right)^2 \sim \sigma^2 \chi^2_{n-1},
:n\left(\overline{X} - \mu\right)^2 \sim \sigma^2 \chi^2_1.

Both these random variables are proportional to the true but unknown variance \sigma^2. Thus their ratio does not depend on \sigma^2 and, because they are statistically independent, the distribution of their ratio is given by

:\frac{n\left(\overline{X} - \mu\right)^2}{\frac{1}{n-1}\sum \left(X_i - \overline{X}\right)^2} \sim \frac{\chi^2_1}{\frac{1}{n-1}\chi^2_{n-1}} \sim F_{1,n-1}

where F_{1,n-1} is the F-distribution with 1 and n - 1 degrees of freedom (see also Student's t-distribution). The final step here is effectively the definition of a random variable having the F-distribution.
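The F-distributed ratio can likewise be checked by simulation. A minimal sketch (assuming Python with NumPy and SciPy; mu, sigma, n and reps are arbitrary choices) compares the simulated ratio with the F(1, n - 1) distribution:

import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
mu, sigma, n, reps = 0.0, 1.5, 8, 100_000
X = rng.normal(mu, sigma, size=(reps, n))
xbar = X.mean(axis=1)
s2 = ((X - xbar[:, None]) ** 2).sum(axis=1) / (n - 1)   # unbiased sample variance

ratio = n * (xbar - mu) ** 2 / s2
print(stats.kstest(ratio, 'f', args=(1, n - 1)).statistic)  # small: ratio ~ F(1, n - 1)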


Estimation of variance

To estimate the variance \sigma^2, one estimator that is sometimes used is the maximum likelihood estimator of the variance of a normal distribution,

:\widehat{\sigma}^2 = \frac{1}{n}\sum \left(X_i - \overline{X}\right)^2.

Cochran's theorem shows that

:\frac{n\widehat{\sigma}^2}{\sigma^2} \sim \chi^2_{n-1}

and the properties of the chi-squared distribution show that

:\begin{align} E\left(\frac{n\widehat{\sigma}^2}{\sigma^2}\right) &= E\left(\chi^2_{n-1}\right) \\ \frac{n}{\sigma^2} E\left(\widehat{\sigma}^2\right) &= (n-1) \\ E\left(\widehat{\sigma}^2\right) &= \frac{\sigma^2 (n-1)}{n} \end{align}
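The bias of the maximum likelihood estimator derived above can be seen numerically. A short sketch (assuming Python with NumPy; mu, sigma, n and reps are arbitrary choices) averages \widehat{\sigma}^2 over many replications and compares it with \sigma^2 (n-1)/n:

import numpy as np

rng = np.random.default_rng(4)
mu, sigma, n, reps = 1.0, 2.0, 5, 200_000
X = rng.normal(mu, sigma, size=(reps, n))
sigma_hat2 = ((X - X.mean(axis=1, keepdims=True)) ** 2).mean(axis=1)  # MLE of the variance

print(sigma_hat2.mean())         # Monte Carlo estimate of E(sigma_hat^2)
print(sigma**2 * (n - 1) / n)    # exact value from the calculation above (here 3.2)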


Alternative formulation

The following version is often seen when considering linear regression. Suppose that Y \sim N_n(0, \sigma^2 I_n) is a multivariate normal random vector (here I_n denotes the n-by-n identity matrix), and that A_1, \ldots, A_k are n-by-n symmetric matrices with \sum_{i=1}^k A_i = I_n. Then, on defining r_i = \operatorname{rank}(A_i), any one of the following conditions implies the other two:
* \sum_{i=1}^k r_i = n,
* Y^T A_i Y \sim \sigma^2 \chi^2_{r_i} (thus the A_i are positive semidefinite),
* Y^T A_i Y is independent of Y^T A_j Y for i \neq j.
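A standard regression instance of this formulation uses the hat matrix. The sketch below (assuming Python with NumPy; the design matrix Z and the dimensions n and p are arbitrary choices) takes A_1 = H = Z(Z^T Z)^{-1} Z^T and A_2 = I - H, which are symmetric, sum to the identity, and have ranks p and n - p, so the rank condition \sum_i r_i = n holds:

import numpy as np

rng = np.random.default_rng(5)
n, p = 20, 3
Z = rng.standard_normal((n, p))                 # full-column-rank design matrix
H = Z @ np.linalg.inv(Z.T @ Z) @ Z.T            # hat (projection) matrix

A1, A2 = H, np.eye(n) - H
print(np.linalg.matrix_rank(A1), np.linalg.matrix_rank(A2))   # p and n - p
assert np.allclose(A1 + A2, np.eye(n))
assert np.linalg.matrix_rank(A1) + np.linalg.matrix_rank(A2) == n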


See also

* Cramér's theorem, on the decomposition of the normal distribution
* Infinite divisibility (probability)

