In statistics, Cochran's theorem, devised by William G. Cochran, is a theorem used to justify results relating to the probability distributions of statistics that are used in the analysis of variance.
Statement
Let ''U''_1, ..., ''U''_''N'' be i.i.d. standard normally distributed random variables, and let ''U'' = [''U''_1, ..., ''U''_''N'']^T. Let ''B''^(1), ''B''^(2), ..., ''B''^(''k'') be symmetric ''N''-by-''N'' matrices. Define ''r''_''i'' to be the rank of ''B''^(''i''). Define
:Q_i = U^\mathsf{T} B^{(i)} U,
so that the ''Q''_''i'' are quadratic forms. Further assume
:Q_1 + Q_2 + \cdots + Q_k = U^\mathsf{T} U.
Cochran's theorem states that the following are equivalent:
* ''r''_1 + ''r''_2 + ... + ''r''_''k'' = ''N'',
* the ''Q''_''i'' are independent,
* each ''Q''_''i'' has a chi-squared distribution with ''r''_''i'' degrees of freedom.
Often it is stated as
:\sum_i B^{(i)} = A,
where ''A'' is idempotent, and the condition \sum_i r_i = N is replaced by \sum_i r_i = \operatorname{rank}(A). But after an orthogonal transform, A = \operatorname{diag}(I_M, 0), and so the statement reduces to the theorem above.
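This reduction is easy to check numerically. The following is a minimal NumPy sketch (not part of the original text; the random projection matrix ''A'' is an illustrative choice) showing that a symmetric idempotent matrix is orthogonally similar to \operatorname{diag}(I_M, 0):

```python
import numpy as np

# Sketch: a symmetric idempotent (projection) matrix A is orthogonally
# similar to diag(I_M, 0), which is why the idempotent version of the
# theorem reduces to the version stated above.
rng = np.random.default_rng(0)
n, m = 6, 4
V, _ = np.linalg.qr(rng.standard_normal((n, m)))  # orthonormal columns
A = V @ V.T                                       # A = A^T and A @ A = A
assert np.allclose(A @ A, A)

w, O = np.linalg.eigh(A)           # A = O @ diag(w) @ O^T, O orthogonal
print(np.round(w, 10))             # n - m zeros followed by m ones
assert np.allclose(O @ np.diag(w) @ O.T, A)
```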
Proof
Claim: Let ''X'' be a standard Gaussian in \mathbb{R}^n. Then for any symmetric matrices ''Q'', ''Q''′, if X^\mathsf{T} Q X and X^\mathsf{T} Q' X have the same distribution, then ''Q'' and ''Q''′ have the same eigenvalues (up to multiplicity).

Proof: Let the eigenvalues of ''Q'' be λ_1, ..., λ_n, and calculate the characteristic function of X^\mathsf{T} Q X. It comes out to be
:\phi(t) = \prod_{j=1}^n (1 - 2 i t \lambda_j)^{-1/2}.
(To calculate it, first diagonalize ''Q'', change into that frame, then use the fact that the characteristic function of a sum of independent variables is the product of their characteristic functions.)

For X^\mathsf{T} Q X and X^\mathsf{T} Q' X to be equal in distribution, their characteristic functions must be equal, so ''Q'' and ''Q''′ have the same eigenvalues (up to multiplicity).
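This characteristic function is straightforward to verify by simulation. A minimal sketch (assuming NumPy; the symmetric test matrix is arbitrary) comparing the Monte Carlo estimate with the closed form:

```python
import numpy as np

# Compare the Monte Carlo characteristic function of X^T Q X with the
# closed form  prod_j (1 - 2 i t lambda_j)^(-1/2).
rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((n, n))
Q = (A + A.T) / 2                         # a symmetric test matrix
lam = np.linalg.eigvalsh(Q)               # its eigenvalues

X = rng.standard_normal((200_000, n))     # standard Gaussian samples
quad = np.einsum('ij,jk,ik->i', X, Q, X)  # X^T Q X for each sample

for t in (0.1, 0.3):
    mc = np.mean(np.exp(1j * t * quad))          # Monte Carlo estimate
    exact = np.prod((1 - 2j * t * lam) ** -0.5)  # closed form
    print(t, mc, exact)  # the two agree up to sampling error
```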
Claim: I = \sum_i B^{(i)}.

Proof: U^\mathsf{T}\left(I - \sum_i B^{(i)}\right)U = U^\mathsf{T} U - \sum_i Q_i = 0. Since I - \sum_i B^{(i)} is symmetric, and the quadratic form it defines has the same distribution as the zero quadratic form, by the previous claim it has the same eigenvalues as the zero matrix. A symmetric matrix whose eigenvalues are all zero is the zero matrix, so I = \sum_i B^{(i)}.
Lemma: If \sum_i M_i = I, where all M_i are symmetric with eigenvalues 0, 1, then they are simultaneously diagonalizable.

Proof: Fix ''i'', and consider the eigenvectors ''v'' of M_i such that M_i v = v. Then we have
:v^\mathsf{T} v = v^\mathsf{T} I v = v^\mathsf{T} v + \sum_{j \neq i} v^\mathsf{T} M_j v,
so all v^\mathsf{T} M_j v = 0. Since each M_j is positive semidefinite (its eigenvalues are 0 and 1), this forces M_j v = 0 for all ''j'' ≠ ''i''. Thus we obtain a decomposition of \mathbb{R}^N into V \oplus V^\perp, such that ''V'' is the 1-eigenspace of M_i and is contained in the 0-eigenspaces of all other M_j. Now induct by restricting to V^\perp.
Case: All ''Q''_''i'' are independent

Fix some ''i'', define C_i = I - B^{(i)} = \sum_{j \neq i} B^{(j)}, and diagonalize B^{(i)} by an orthogonal transform ''O''. Then consider O C_i O^\mathsf{T} = I - O B^{(i)} O^\mathsf{T}. It is diagonalized as well.

Let W = OU; then ''W'' is also standard Gaussian. We have
:Q_i = W^\mathsf{T} \left(O B^{(i)} O^\mathsf{T}\right) W, \qquad \sum_{j \neq i} Q_j = W^\mathsf{T} \left(I - O B^{(i)} O^\mathsf{T}\right) W.
Inspecting their diagonal entries shows that independence of Q_i and \sum_{j \neq i} Q_j implies that their nonzero diagonal entries are disjoint.

Thus all eigenvalues of B^{(i)} are 0 or 1, so Q_i is a \chi^2 distribution with ''r''_''i'' degrees of freedom.
Case: Each ''Q''_''i'' is a \chi^2(r_i) distribution

Fix any ''i'', diagonalize B^{(i)} by an orthogonal transform ''O'', and reindex so that O B^{(i)} O^\mathsf{T} = \operatorname{diag}(\lambda_1, ..., \lambda_{r_i}, 0, ..., 0). Then Q_i = \sum_j \lambda_j {U'_j}^2 for some U' = OU, a spherical rotation of ''U''.

Since Q_i \sim \chi^2(r_i), by the claim about characteristic functions we get all \lambda_j = 1. So all B^{(i)} are positive semidefinite and have eigenvalues 0, 1.

So diagonalize them simultaneously (by the lemma), and add them up, to find \sum_i r_i = N.
Case: r_1 + \cdots + r_k = N

We first show that the matrices ''B''^(''i'') can be simultaneously diagonalized by an orthogonal matrix and that their non-zero eigenvalues are all equal to +1. Once that is shown, take this orthogonal transform to the simultaneous eigenbasis, in which the random vector [U_1, ..., U_N]^\mathsf{T} becomes [U'_1, ..., U'_N]^\mathsf{T}, but all U'_i are still independent and standard Gaussian. Then the result follows.

Each of the matrices ''B''^(''i'') has rank ''r''_''i'' and thus ''r''_''i'' non-zero eigenvalues. For each ''i'', the sum C^{(i)} \equiv \sum_{j \neq i} B^{(j)} has at most rank \sum_{j \neq i} r_j = N - r_i. Since B^{(i)} + C^{(i)} = I_{N \times N}, it follows that ''C''^(''i'') has exactly rank ''N'' − ''r''_''i''.

Therefore ''B''^(''i'') and ''C''^(''i'') can be simultaneously diagonalized. This can be shown by first diagonalizing ''B''^(''i''), by the spectral theorem. In this basis, it is of the form:
:\begin{pmatrix} \bar{B}_{r_i \times r_i} & 0 \\ 0 & 0 \end{pmatrix}.
Thus the lower ''N'' − ''r''_''i'' rows are zero. Since C^{(i)} = I - B^{(i)}, it follows that in this basis these rows of ''C''^(''i'') contain a right block which is an (N - r_i) \times (N - r_i) unit matrix, with zeros in the rest of these rows. But since ''C''^(''i'') has rank ''N'' − ''r''_''i'', it must be zero elsewhere. Thus it is diagonal in this basis as well. It follows that all the non-zero eigenvalues of both ''B''^(''i'') and ''C''^(''i'') are +1. This argument applies for all ''i''; thus all ''B''^(''i'') are positive semidefinite.

Moreover, the above analysis can be repeated in the diagonal basis for C^{(1)} = B^{(2)} + \sum_{j > 2} B^{(j)}. In this basis ''C''^(1) is the identity of an (''N'' − ''r''_1)-dimensional vector space, so it follows that both ''B''^(2) and \sum_{j > 2} B^{(j)} are simultaneously diagonalizable in this vector space (and hence also together with ''B''^(1)). By iteration it follows that all the ''B''-s are simultaneously diagonalizable.

Thus there exists an orthogonal matrix ''S'' such that for all ''i'', S^\mathsf{T} B^{(i)} S \equiv B^{(i)\prime} is diagonal, where any entry B^{(i)\prime}_{x,y} with indices x = y, \sum_{j=1}^{i-1} r_j < x = y \le \sum_{j=1}^{i} r_j, is equal to 1, while any entry with other indices is equal to 0.
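For a concrete instance of this simultaneous diagonalization, the pair of matrices from the sample-mean/sample-variance example below can be checked numerically (a minimal NumPy sketch, anticipating the matrices B^{(1)} = I - J_n/n and B^{(2)} = J_n/n defined in the next section):

```python
import numpy as np

# B1 and B2 are symmetric, sum to the identity, and have ranks n-1 and 1,
# so by the argument above a single orthogonal matrix S diagonalizes both,
# with disjoint blocks of +1 eigenvalues.
n = 5
B2 = np.ones((n, n)) / n
B1 = np.eye(n) - B2

_, S = np.linalg.eigh(B1)  # eigenvectors of B1; they diagonalize B2 = I - B1 too
D1 = S.T @ B1 @ S
D2 = S.T @ B2 @ S
print(np.round(D1, 10))    # diagonal: one 0 and n-1 ones
print(np.round(D2, 10))    # diagonal: a single 1 in the complementary slot
assert np.allclose(D1 + D2, np.eye(n))
```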
Examples
Sample mean and sample variance
If ''X''_1, ..., ''X''_''n'' are independent normally distributed random variables with mean ''μ'' and standard deviation ''σ'', then
:U_i = \frac{X_i - \mu}{\sigma}
is standard normal for each ''i''. Note that the total ''Q'' is equal to the sum of squared ''U''s, as shown here:
:\sum_i Q_i = \sum_{j,i,k} U_j B^{(i)}_{jk} U_k = \sum_{j,k} U_j U_k \sum_i B^{(i)}_{jk} = \sum_{j,k} U_j U_k \delta_{jk} = \sum_j U_j^2,
which stems from the original assumption that \sum_i B^{(i)} = I.
So instead we will calculate this quantity and later separate it into ''Q''_''i'''s. It is possible to write
:\sum_{i=1}^n U_i^2 = \sum_{i=1}^n \left(\frac{X_i - \bar{X}}{\sigma}\right)^2 + n \left(\frac{\bar{X} - \mu}{\sigma}\right)^2
(here \bar{X} is the sample mean). To see this identity, multiply throughout by \sigma^2 and note that
:\sum (X_i - \mu)^2 = \sum (X_i - \bar{X} + \bar{X} - \mu)^2
and expand to give
:\sum (X_i - \mu)^2 = \sum (X_i - \bar{X})^2 + \sum (\bar{X} - \mu)^2 + 2 \sum (X_i - \bar{X})(\bar{X} - \mu).
The third term is zero because it is equal to a constant times
:\sum (\bar{X} - X_i) = 0,
and the second term has just ''n'' identical terms added together. Thus
:\sum (X_i - \mu)^2 = \sum (X_i - \bar{X})^2 + n(\bar{X} - \mu)^2,
and hence
:\sum \left(\frac{X_i - \mu}{\sigma}\right)^2 = \sum \left(\frac{X_i - \bar{X}}{\sigma}\right)^2 + n\left(\frac{\bar{X} - \mu}{\sigma}\right)^2 = Q_1 + Q_2.
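This identity is easy to confirm numerically for a single sample (a minimal sketch, assuming NumPy; sample size and parameters are arbitrary):

```python
import numpy as np

# One-sample check of the identity
# sum (Xi - mu)^2 = sum (Xi - Xbar)^2 + n * (Xbar - mu)^2.
rng = np.random.default_rng(5)
mu, n = 1.0, 7
x = rng.normal(mu, 2.0, size=n)
xbar = x.mean()
lhs = ((x - mu) ** 2).sum()
rhs = ((x - xbar) ** 2).sum() + n * (xbar - mu) ** 2
print(lhs, rhs)  # identical up to floating-point rounding
assert np.isclose(lhs, rhs)
```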
Now B^{(2)} = \tfrac{1}{n} J_n, with J_n the matrix of ones, which has rank 1. In turn
:Q_2 = n\left(\frac{\bar{X} - \mu}{\sigma}\right)^2,
given that \bar{X} = \tfrac{1}{n}\sum_i X_i. This expression can also be obtained by expanding ''Q''_2 in matrix notation. It can be shown that the rank of B^{(1)} = I - \tfrac{1}{n} J_n is ''n'' − 1, as the addition of all its rows is equal to zero. Thus the conditions for Cochran's theorem are met.
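The whole decomposition can also be checked by simulation. The following NumPy sketch (sample size and seed are arbitrary choices) verifies the ranks and the conclusions of the theorem:

```python
import numpy as np

# Numerical check of the decomposition Q = Q1 + Q2 used above, with
# B2 = J_n/n (J_n the matrix of ones) and B1 = I - J_n/n.
n = 10
B2 = np.ones((n, n)) / n
B1 = np.eye(n) - B2

print(np.linalg.matrix_rank(B1), np.linalg.matrix_rank(B2))  # n - 1 and 1
# Both matrices are idempotent with eigenvalues 0 and 1, and they sum to I:
assert np.allclose(B1 + B2, np.eye(n))
assert np.allclose(B1 @ B1, B1) and np.allclose(B2 @ B2, B2)

rng = np.random.default_rng(1)
U = rng.standard_normal((100_000, n))    # rows are samples of U
Q1 = np.einsum('ij,jk,ik->i', U, B1, U)  # ~ chi-squared, n-1 df
Q2 = np.einsum('ij,jk,ik->i', U, B2, U)  # ~ chi-squared, 1 df
print(Q1.mean(), Q2.mean())              # near n-1 and 1
print(np.corrcoef(Q1, Q2)[0, 1])         # near 0, consistent with independence
```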
Cochran's theorem then states that ''Q''_1 and ''Q''_2 are independent, with chi-squared distributions with ''n'' − 1 and 1 degree of freedom respectively. This shows that the sample mean and sample variance are independent. This can also be shown by Basu's theorem, and in fact this property ''characterizes'' the normal distribution – for no other distribution are the sample mean and sample variance independent.
Distributions
The result for the distributions is written symbolically as
:\sum \left(X_i - \bar{X}\right)^2 \sim \sigma^2 \chi^2_{n-1},
:n\left(\bar{X} - \mu\right)^2 \sim \sigma^2 \chi^2_1.
Both these random variables are proportional to the true but unknown variance ''σ''^2. Thus their ratio does not depend on ''σ''^2 and, because they are statistically independent, the distribution of their ratio is given by
:\frac{n\left(\bar{X} - \mu\right)^2}{\frac{1}{n-1}\sum\left(X_i - \bar{X}\right)^2} \sim \frac{\chi^2_1}{\frac{1}{n-1}\chi^2_{n-1}} \sim F_{1,n-1},
where ''F''_{1,''n'' − 1} is the F-distribution with 1 and ''n'' − 1 degrees of freedom (see also Student's t-distribution). The final step here is effectively the definition of a random variable having the F-distribution.
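A short simulation (a sketch assuming NumPy and SciPy) compares the empirical quantiles of this ratio with the F(1, n − 1) distribution:

```python
import numpy as np
from scipy import stats

# Simulate the F ratio derived above: for normal samples,
# n*(Xbar - mu)^2 / (sum (Xi - Xbar)^2 / (n-1)) should follow F(1, n-1).
rng = np.random.default_rng(2)
n, mu, sigma, reps = 8, 3.0, 2.0, 200_000
X = rng.normal(mu, sigma, size=(reps, n))
xbar = X.mean(axis=1)
num = n * (xbar - mu) ** 2
den = ((X - xbar[:, None]) ** 2).sum(axis=1) / (n - 1)
ratio = num / den

# Compare empirical quantiles with the F(1, n-1) distribution.
for q in (0.5, 0.9, 0.99):
    print(q, np.quantile(ratio, q), stats.f.ppf(q, 1, n - 1))
```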
Estimation of variance
To estimate the variance ''σ''^2, one estimator that is sometimes used is the maximum likelihood estimator of the variance of a normal distribution
:\widehat{\sigma}^2 = \frac{1}{n} \sum\left(X_i - \bar{X}\right)^2.
Cochran's theorem shows that
:\frac{n \widehat{\sigma}^2}{\sigma^2} \sim \chi^2_{n-1},
and the properties of the chi-squared distribution show that
:E\left(\frac{n \widehat{\sigma}^2}{\sigma^2}\right) = E\left(\chi^2_{n-1}\right) = n - 1, \qquad \text{so} \qquad E\left(\widehat{\sigma}^2\right) = \frac{(n-1)\,\sigma^2}{n}.
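The bias factor (''n'' − 1)/''n'' shows up directly in simulation (a minimal NumPy sketch; parameters are arbitrary):

```python
import numpy as np

# Check the bias result E[sigma_hat^2] = (n-1)/n * sigma^2
# for the maximum likelihood variance estimator.
rng = np.random.default_rng(3)
n, sigma, reps = 5, 2.0, 400_000
X = rng.normal(0.0, sigma, size=(reps, n))
sigma_hat2 = ((X - X.mean(axis=1, keepdims=True)) ** 2).mean(axis=1)
print(sigma_hat2.mean())          # near (n-1)/n * sigma^2 = 3.2
print((n - 1) / n * sigma ** 2)
```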
Alternative formulation
The following version is often seen when considering linear regression. Suppose that Y \sim N_n(0, \sigma^2 I_n) is a multivariate normal random vector (here I_n denotes the ''n''-by-''n'' identity matrix), and that A_1, ..., A_k are all ''n''-by-''n'' symmetric matrices with \sum_{i=1}^k A_i = I_n. Then, on defining r_i = \operatorname{rank}(A_i), any one of the following conditions implies the other two:

* \sum_{i=1}^k r_i = n,
* Y^\mathsf{T} A_i Y \sim \sigma^2 \chi^2_{r_i} for each ''i'' (thus the A_i are positive semidefinite),
* Y^\mathsf{T} A_i Y is independent of Y^\mathsf{T} A_j Y for all ''i'' ≠ ''j''.
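In the regression setting this applies, for instance, to the hat matrix H = X(X^\mathsf{T}X)^{-1}X^\mathsf{T} and its complement I_n - H. A minimal NumPy sketch (the design matrix here is random, purely for illustration):

```python
import numpy as np

# Illustration with linear-regression projection matrices:
# A1 = H (the hat matrix) and A2 = I - H satisfy A1 + A2 = I with
# rank(A1) + rank(A2) = n, so the alternative formulation applies.
rng = np.random.default_rng(4)
n, p = 20, 3
X = rng.standard_normal((n, p))        # design matrix, full column rank
H = X @ np.linalg.solve(X.T @ X, X.T)  # orthogonal projection onto col(X)
A1, A2 = H, np.eye(n) - H

print(np.linalg.matrix_rank(A1) + np.linalg.matrix_rank(A2))  # n
sigma = 1.5
Y = rng.normal(0.0, sigma, size=(50_000, n))
Q1 = np.einsum('ij,jk,ik->i', Y, A1, Y)  # ~ sigma^2 * chi-squared, p df
Q2 = np.einsum('ij,jk,ik->i', Y, A2, Y)  # ~ sigma^2 * chi-squared, n-p df
print(Q1.mean() / sigma ** 2, Q2.mean() / sigma ** 2)  # near p and n - p
```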
See also
* Cramér's theorem, on decomposing the normal distribution
* Infinite divisibility (probability)