Cochran's theorem

In statistics, Cochran's theorem, devised by William G. Cochran, is a theorem used to justify results relating to the probability distributions of statistics that are used in the analysis of variance.


Statement

Let U_1, ..., U_N be i.i.d. standard normally distributed random variables, and let U = [U_1, ..., U_N]^T. Let B^{(1)}, B^{(2)}, \ldots, B^{(k)} be symmetric matrices. Define r_i to be the rank of B^{(i)}. Define Q_i = U^T B^{(i)} U, so that the Q_i are quadratic forms. Further assume \sum_i Q_i = U^T U.

Cochran's theorem states that the following are equivalent:
* r_1 + \cdots + r_k = N,
* the Q_i are independent,
* each Q_i has a chi-squared distribution with r_i degrees of freedom.

Often the theorem is stated with \sum_i A_i = A, where A is idempotent, and \sum_i r_i = N replaced by \sum_i r_i = \operatorname{rank}(A). But after an orthogonal transform, A = \operatorname{diag}(I_M, 0), and so one reduces to the theorem above.
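As a quick numerical illustration (a sketch only, not part of the statement; it assumes Python with NumPy, and the matrices B1 = I - J/N and B2 = J/N are the pair used later in the Examples section), one can check the rank condition and the assumption \sum_i Q_i = U^T U:

import numpy as np

N = 5
J = np.ones((N, N))
B1 = np.eye(N) - J / N   # symmetric, rank N - 1
B2 = J / N               # symmetric, rank 1

r1 = np.linalg.matrix_rank(B1)
r2 = np.linalg.matrix_rank(B2)
assert r1 + r2 == N      # the rank condition of the theorem

rng = np.random.default_rng(0)
U = rng.standard_normal(N)
Q1 = U @ B1 @ U
Q2 = U @ B2 @ U
# The quadratic forms add up to U^T U, as the theorem assumes.
assert np.isclose(Q1 + Q2, U @ U)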


Proof

Claim: Let X be a standard Gaussian in \R^n. Then for any symmetric matrices Q, Q', if X^T Q X and X^T Q' X have the same distribution, then Q and Q' have the same eigenvalues (up to multiplicity).

Proof: Let the eigenvalues of Q be \lambda_1, ..., \lambda_n, and calculate the characteristic function of X^T Q X. It comes out to be

:\phi(t) = \left(\prod_j (1 - 2i\lambda_j t)\right)^{-1/2}.

(To calculate it, first diagonalize Q, change into that frame, then use the fact that the characteristic function of a sum of independent variables is the product of their characteristic functions.) For X^T Q X and X^T Q' X to be equal in distribution, their characteristic functions must be equal, so Q and Q' have the same eigenvalues (up to multiplicity).

Claim: I = \sum_i B^{(i)}.

Proof: U^T (I - \sum_i B^{(i)}) U = U^T U - \sum_i Q_i = 0. Since I - \sum_i B^{(i)} is symmetric and U^T (I - \sum_i B^{(i)}) U has the same distribution as U^T 0 U, the previous claim shows that I - \sum_i B^{(i)} has the same eigenvalues as the zero matrix, and hence is the zero matrix.

Lemma: If \sum_i M_i = I, where all the M_i are symmetric with eigenvalues 0 or 1, then they are simultaneously diagonalizable.

Proof: Fix i and consider the eigenvectors v of M_i with M_i v = v. Then v^T v = v^T I v = v^T v + \sum_{j \neq i} v^T M_j v, so all v^T M_j v = 0; since each M_j is positive semidefinite, this forces M_j v = 0. Thus we obtain a splitting of \R^N into V \oplus V^\perp, where V is the 1-eigenspace of M_i and lies in the 0-eigenspaces of all the other M_j. Now induct by moving into V^\perp.

Case: All Q_i are independent.

Fix some i, define C^{(i)} = I - B^{(i)} = \sum_{j \neq i} B^{(j)}, and diagonalize B^{(i)} by an orthogonal transform O. Then O C^{(i)} O^T = I - O B^{(i)} O^T is diagonalized as well. Let W = OU; then W is also standard Gaussian, and

:Q_i = W^T (O B^{(i)} O^T) W, \quad \sum_{j \neq i} Q_j = W^T (I - O B^{(i)} O^T) W.

Inspecting the diagonal entries, the independence of Q_i and \sum_{j \neq i} Q_j implies that their nonzero diagonal entries are disjoint, so every eigenvalue \lambda of B^{(i)} satisfies \lambda = 0 or 1 - \lambda = 0. Thus all eigenvalues of B^{(i)} are 0 or 1, and Q_i has a \chi^2 distribution with r_i degrees of freedom.

Case: Each Q_i has a \chi^2(r_i) distribution.

Fix any i, diagonalize B^{(i)} by an orthogonal transform O, and reindex so that O B^{(i)} O^T = \operatorname{diag}(\lambda_1, ..., \lambda_{r_i}, 0, ..., 0). Then Q_i = \sum_{j=1}^{r_i} \lambda_j U_j'^2 for some U'_j, a spherical rotation of U. Since Q_i \sim \chi^2(r_i), the first claim gives all \lambda_j = 1. So every B^{(i)} \succeq 0 has eigenvalues 0 or 1. Diagonalizing them simultaneously (by the lemma) and adding them up gives \sum_i r_i = N.

Case: r_1 + \cdots + r_k = N.

We first show that the matrices B^{(i)} can be simultaneously diagonalized by an orthogonal matrix and that their non-zero eigenvalues are all equal to +1. Once that is shown, take this orthogonal transform to the simultaneous eigenbasis, in which the random vector [U_1, ..., U_N]^T becomes [U'_1, ..., U'_N]^T, where all the U'_i are still independent and standard Gaussian. Then the result follows.

Each of the matrices B^{(i)} has rank r_i and thus r_i non-zero eigenvalues. For each i, the sum C^{(i)} \equiv \sum_{j \neq i} B^{(j)} has rank at most \sum_{j \neq i} r_j = N - r_i. Since B^{(i)} + C^{(i)} = I_N, it follows that C^{(i)} has rank exactly N - r_i. Therefore B^{(i)} and C^{(i)} can be simultaneously diagonalized. This can be shown by first diagonalizing B^{(i)}, by the spectral theorem. In this basis, it is of the form

:\begin{pmatrix}
\lambda_1 & 0 & \cdots & 0 & 0 & \cdots & 0 \\
0 & \lambda_2 & \cdots & 0 & 0 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots & \vdots & & \vdots \\
0 & 0 & \cdots & \lambda_{r_i} & 0 & \cdots & 0 \\
0 & 0 & \cdots & 0 & 0 & \cdots & 0 \\
\vdots & \vdots & & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & 0 & 0 & \cdots & 0
\end{pmatrix}.

Thus the lower N - r_i rows are zero. Since C^{(i)} = I - B^{(i)}, these rows of C^{(i)} in this basis contain a right block which is an (N - r_i) \times (N - r_i) unit matrix, with zeros in the rest of these rows. But since C^{(i)} has rank N - r_i, it must be zero elsewhere. Thus it is diagonal in this basis as well, and because B^{(i)} + C^{(i)} = I forces \lambda_j = 1 for j \le r_i, all the non-zero eigenvalues of both B^{(i)} and C^{(i)} are +1. This argument applies for all i, thus all B^{(i)} are positive semidefinite.

Moreover, the above analysis can be repeated in the diagonal basis for C^{(1)} = B^{(2)} + \sum_{j>2} B^{(j)}. In this basis C^{(1)} is the identity of an (N - r_1)-dimensional vector space, so it follows that both B^{(2)} and \sum_{j>2} B^{(j)} are simultaneously diagonalizable in this vector space (and hence also together with B^{(1)}). By iteration it follows that all the B^{(i)} are simultaneously diagonalizable. Thus there exists an orthogonal matrix S such that for all i, S^T B^{(i)} S \equiv B^{(i)\prime} is diagonal, where any entry B^{(i)\prime}_{x,y} with indices x = y, \sum_{j=1}^{i-1} r_j < x = y \le \sum_{j=1}^{i} r_j, is equal to 1, while any entry with other indices is equal to 0.
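The characteristic-function formula in the first claim can be checked numerically. The following sketch (assuming Python with NumPy; the symmetric matrix Q below is an arbitrary example, not one appearing in the theorem) compares a Monte Carlo estimate of E[exp(it X^T Q X)] with the product \prod_j (1 - 2i\lambda_j t)^{-1/2}:

import numpy as np

rng = np.random.default_rng(1)
n = 4
A = rng.standard_normal((n, n))
Q = (A + A.T) / 2                      # an arbitrary symmetric matrix
lam = np.linalg.eigvalsh(Q)            # its eigenvalues

t = 0.3
X = rng.standard_normal((200_000, n))
quad = np.einsum('ij,jk,ik->i', X, Q, X)        # X^T Q X for each sample
empirical = np.exp(1j * t * quad).mean()        # Monte Carlo characteristic function
theoretical = np.prod((1 - 2j * lam * t) ** -0.5)
print(empirical, theoretical)   # the two complex values should agree up to sampling error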


Examples


Sample mean and sample variance

If X_1, ..., X_n are independent normally distributed random variables with mean \mu and standard deviation \sigma, then

:U_i = \frac{X_i - \mu}{\sigma}

is standard normal for each i. Note that the total Q is equal to the sum of the squared U's, as shown here:

:\sum_i Q_i = \sum_{i,j,k} U_j B_{jk}^{(i)} U_k = \sum_{j,k} U_j U_k \sum_i B_{jk}^{(i)} = \sum_{j,k} U_j U_k \delta_{jk} = \sum_j U_j^2,

which stems from the original assumption that B^{(1)} + B^{(2)} + \ldots = I. So instead we will calculate this quantity and later separate it into Q_i's. It is possible to write

:\sum_{i=1}^n U_i^2 = \sum_{i=1}^n \left(\frac{X_i - \overline{X}}{\sigma}\right)^2 + n\left(\frac{\overline{X} - \mu}{\sigma}\right)^2

(here \overline{X} is the sample mean). To see this identity, multiply throughout by \sigma^2 and note that

:\sum (X_i - \mu)^2 = \sum (X_i - \overline{X} + \overline{X} - \mu)^2

and expand to give

:\sum (X_i - \mu)^2 = \sum (X_i - \overline{X})^2 + \sum (\overline{X} - \mu)^2 + 2\sum (X_i - \overline{X})(\overline{X} - \mu).

The third term is zero because it is equal to a constant times

:\sum (\overline{X} - X_i) = 0,

and the second term has just n identical terms added together. Thus

:\sum (X_i - \mu)^2 = \sum (X_i - \overline{X})^2 + n(\overline{X} - \mu)^2,

and hence

:\sum \left(\frac{X_i - \mu}{\sigma}\right)^2 = \sum \left(\frac{X_i - \overline{X}}{\sigma}\right)^2 + n\left(\frac{\overline{X} - \mu}{\sigma}\right)^2 = \overbrace{\sum \left(\frac{X_i - \overline{X}}{\sigma}\right)^2}^{Q_1} + \overbrace{n\left(\frac{\overline{X} - \mu}{\sigma}\right)^2}^{Q_2} = Q_1 + Q_2.

Now B^{(2)} = \frac{J_n}{n} with J_n the matrix of ones, which has rank 1. In turn B^{(1)} = I_n - \frac{J_n}{n}, given that I_n = B^{(1)} + B^{(2)}. This expression can also be obtained by expanding Q_1 in matrix notation. It can be shown that the rank of B^{(1)} is n - 1, as the sum of all its rows is equal to zero. Thus the conditions for Cochran's theorem are met.

Cochran's theorem then states that Q_1 and Q_2 are independent, with chi-squared distributions with n - 1 and 1 degree of freedom respectively. This shows that the sample mean and sample variance are independent. This can also be shown by Basu's theorem, and in fact this property characterizes the normal distribution; for no other distribution are the sample mean and sample variance independent.
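The independence and the chi-squared distributions can be illustrated by simulation. The following sketch (assuming Python with NumPy and SciPy; the sample size n, the parameters mu and sigma, and the number of replications reps are arbitrary choices) forms Q_1 and Q_2 from many simulated samples and compares them with \chi^2_{n-1} and \chi^2_1:

import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
mu, sigma, n, reps = 3.0, 2.0, 10, 100_000
X = rng.normal(mu, sigma, size=(reps, n))
xbar = X.mean(axis=1)

Q1 = ((X - xbar[:, None]) ** 2).sum(axis=1) / sigma**2   # sum of squared deviations / sigma^2
Q2 = n * (xbar - mu) ** 2 / sigma**2                      # scaled squared error of the mean

print(np.corrcoef(Q1, Q2)[0, 1])                          # near 0: Q1 and Q2 are independent
print(stats.kstest(Q1, 'chi2', args=(n - 1,)).statistic)  # small: Q1 ~ chi2(n - 1)
print(stats.kstest(Q2, 'chi2', args=(1,)).statistic)      # small: Q2 ~ chi2(1)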


Distributions

The result for the distributions is written symbolically as

:\sum \left(X_i - \overline{X}\right)^2 \sim \sigma^2 \chi^2_{n-1},
:n\left(\overline{X} - \mu\right)^2 \sim \sigma^2 \chi^2_1.

Both these random variables are proportional to the true but unknown variance \sigma^2. Thus their ratio does not depend on \sigma^2 and, because they are statistically independent, the distribution of their ratio is given by

:\frac{n\left(\overline{X} - \mu\right)^2}{\frac{1}{n-1}\sum \left(X_i - \overline{X}\right)^2} \sim \frac{\chi^2_1}{\frac{1}{n-1}\chi^2_{n-1}} \sim F_{1,n-1}

where F_{1,n-1} is the F-distribution with 1 and n - 1 degrees of freedom (see also Student's t-distribution). The final step here is effectively the definition of a random variable having the F-distribution.
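The F-distributed ratio can likewise be checked by simulation. A minimal sketch (assuming Python with NumPy and SciPy; mu, sigma, n and reps are arbitrary choices) compares the simulated ratio with the F(1, n - 1) distribution:

import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
mu, sigma, n, reps = 0.0, 1.5, 8, 100_000
X = rng.normal(mu, sigma, size=(reps, n))
xbar = X.mean(axis=1)
s2 = ((X - xbar[:, None]) ** 2).sum(axis=1) / (n - 1)   # unbiased sample variance

ratio = n * (xbar - mu) ** 2 / s2
print(stats.kstest(ratio, 'f', args=(1, n - 1)).statistic)  # small: ratio ~ F(1, n - 1)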


Estimation of variance

To estimate the variance \sigma^2, one estimator that is sometimes used is the maximum likelihood estimator of the variance of a normal distribution,

:\widehat{\sigma}^2 = \frac{1}{n}\sum \left(X_i - \overline{X}\right)^2.

Cochran's theorem shows that

:\frac{n\widehat{\sigma}^2}{\sigma^2} \sim \chi^2_{n-1}

and the properties of the chi-squared distribution show that

:\begin{align} E\left(\frac{n\widehat{\sigma}^2}{\sigma^2}\right) &= E\left(\chi^2_{n-1}\right) \\ \frac{n}{\sigma^2} E\left(\widehat{\sigma}^2\right) &= (n-1) \\ E\left(\widehat{\sigma}^2\right) &= \frac{\sigma^2 (n-1)}{n} \end{align}
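The bias of the maximum likelihood estimator derived above can be seen numerically. A short sketch (assuming Python with NumPy; mu, sigma, n and reps are arbitrary choices) averages \widehat{\sigma}^2 over many replications and compares it with \sigma^2 (n-1)/n:

import numpy as np

rng = np.random.default_rng(4)
mu, sigma, n, reps = 1.0, 2.0, 5, 200_000
X = rng.normal(mu, sigma, size=(reps, n))
sigma_hat2 = ((X - X.mean(axis=1, keepdims=True)) ** 2).mean(axis=1)  # MLE of the variance

print(sigma_hat2.mean())         # Monte Carlo estimate of E(sigma_hat^2)
print(sigma**2 * (n - 1) / n)    # exact value from the calculation above (here 3.2)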


Alternative formulation

The following version is often seen when considering linear regression. Suppose that Y \sim N_n(0, \sigma^2 I_n) is a multivariate normal random vector (here I_n denotes the n-by-n identity matrix), and that A_1, \ldots, A_k are n-by-n symmetric matrices with \sum_{i=1}^k A_i = I_n. Then, on defining r_i = \operatorname{rank}(A_i), any one of the following conditions implies the other two:
* \sum_{i=1}^k r_i = n,
* Y^T A_i Y \sim \sigma^2 \chi^2_{r_i} (thus the A_i are positive semidefinite),
* Y^T A_i Y is independent of Y^T A_j Y for i \neq j.
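A standard regression instance of this formulation uses the hat matrix. The sketch below (assuming Python with NumPy; the design matrix Z and the dimensions n and p are arbitrary choices) takes A_1 = H = Z(Z^T Z)^{-1} Z^T and A_2 = I - H, which are symmetric, sum to the identity, and have ranks p and n - p, so the rank condition \sum_i r_i = n holds:

import numpy as np

rng = np.random.default_rng(5)
n, p = 20, 3
Z = rng.standard_normal((n, p))                 # full-column-rank design matrix
H = Z @ np.linalg.inv(Z.T @ Z) @ Z.T            # hat (projection) matrix

A1, A2 = H, np.eye(n) - H
print(np.linalg.matrix_rank(A1), np.linalg.matrix_rank(A2))   # p and n - p
assert np.allclose(A1 + A2, np.eye(n))
assert np.linalg.matrix_rank(A1) + np.linalg.matrix_rank(A2) == n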


See also

* Cramér's theorem, on the decomposition of the normal distribution
* Infinite divisibility (probability)

