Hotelling's T-squared Distribution
   HOME

TheInfoList



OR:

In
statistics Statistics (from German language, German: ''wikt:Statistik#German, Statistik'', "description of a State (polity), state, a country") is the discipline that concerns the collection, organization, analysis, interpretation, and presentation of ...
, particularly in
hypothesis testing A statistical hypothesis test is a method of statistical inference used to decide whether the data at hand sufficiently support a particular hypothesis. Hypothesis testing allows us to make probabilistic statements about population parameters. ...
, the Hotelling's ''T''-squared distribution (''T''2), proposed by
Harold Hotelling Harold Hotelling (; September 29, 1895 – December 26, 1973) was an American mathematical statistician and an influential economic theorist, known for Hotelling's law, Hotelling's lemma, and Hotelling's rule in economics, as well as Hotelling's T ...
, is a
multivariate probability distribution Given two random variables that are defined on the same probability space, the joint probability distribution is the corresponding probability distribution on all possible pairs of outputs. The joint distribution can just as well be considere ...
that is tightly related to the ''F''-distribution and is most notable for arising as the distribution of a set of
sample statistics A statistic (singular) or sample statistic is any quantity computed from values in a sample which is considered for a statistical purpose. Statistical purposes include estimating a population parameter, describing a sample, or evaluating a hypot ...
that are natural generalizations of the statistics underlying the Student's ''t''-distribution. The Hotelling's ''t''-squared statistic (''t''2) is a generalization of Student's ''t''-statistic that is used in
multivariate Multivariate may refer to: In mathematics * Multivariable calculus * Multivariate function * Multivariate polynomial In computing * Multivariate cryptography * Multivariate division algorithm * Multivariate interpolation * Multivariate optical c ...
hypothesis testing A statistical hypothesis test is a method of statistical inference used to decide whether the data at hand sufficiently support a particular hypothesis. Hypothesis testing allows us to make probabilistic statements about population parameters. ...
.


Motivation

The distribution arises in
multivariate statistics Multivariate statistics is a subdivision of statistics encompassing the simultaneous observation and analysis of more than one outcome variable. Multivariate statistics concerns understanding the different aims and background of each of the dif ...
in undertaking
tests Test(s), testing, or TEST may refer to: * Test (assessment), an educational assessment intended to measure the respondents' knowledge or other abilities Arts and entertainment * ''Test'' (2013 film), an American film * ''Test'' (2014 film), ...
of the differences between the (multivariate) means of different populations, where tests for univariate problems would make use of a ''t''-test. The distribution is named for
Harold Hotelling Harold Hotelling (; September 29, 1895 – December 26, 1973) was an American mathematical statistician and an influential economic theorist, known for Hotelling's law, Hotelling's lemma, and Hotelling's rule in economics, as well as Hotelling's T ...
, who developed it as a generalization of Student's ''t''-distribution.


Definition

If the vector d is Gaussian multivariate-distributed with zero mean and unit
covariance matrix In probability theory and statistics, a covariance matrix (also known as auto-covariance matrix, dispersion matrix, variance matrix, or variance–covariance matrix) is a square matrix giving the covariance between each pair of elements of ...
N(\mathbf_, \mathbf_) and M is a p \times p matrix with unit
scale matrix In affine geometry, uniform scaling (or isotropic scaling) is a linear transformation that enlarges (increases) or shrinks (diminishes) objects by a ''scale factor'' that is the same in all directions. The result of uniform scaling is similar ...
and ''m''
degrees of freedom Degrees of freedom (often abbreviated df or DOF) refers to the number of independent variables or parameters of a thermodynamic system. In various scientific fields, the word "freedom" is used to describe the limits to which physical movement or ...
with a
Wishart distribution In statistics, the Wishart distribution is a generalization to multiple dimensions of the gamma distribution. It is named in honor of John Wishart, who first formulated the distribution in 1928. It is a family of probability distributions define ...
W(\mathbf_, m), then the
quadratic form In mathematics, a quadratic form is a polynomial with terms all of degree two ("form" is another name for a homogeneous polynomial). For example, :4x^2 + 2xy - 3y^2 is a quadratic form in the variables and . The coefficients usually belong to a ...
X has a Hotelling distribution (with parameters p and m): :X = m d^T M^ d \sim T^2(p, m). Furthermore, if a random variable ''X'' has Hotelling's ''T''-squared distribution, X \sim T^2_, then: : \frac X\sim F_ where F_ is the ''F''-distribution with parameters ''p'' and ''m−p+1''.


Hotelling ''t''-squared statistic

Let \hat be the
sample covariance The sample mean (or "empirical mean") and the sample covariance are statistics computed from a sample of data on one or more random variables. The sample mean is the average value (or mean value) of a sample of numbers taken from a larger po ...
: : \hat = \frac 1 \sum_^n (\mathbf_i -\overline) (\mathbf_i-\overline)' where we denote
transpose In linear algebra, the transpose of a matrix is an operator which flips a matrix over its diagonal; that is, it switches the row and column indices of the matrix by producing another matrix, often denoted by (among other notations). The tr ...
by an
apostrophe The apostrophe ( or ) is a punctuation mark, and sometimes a diacritical mark, in languages that use the Latin alphabet and some other alphabets. In English, the apostrophe is used for two basic purposes: * The marking of the omission of one o ...
. It can be shown that \hat is a positive (semi) definite matrix and (n-1)\hat follows a ''p''-variate
Wishart distribution In statistics, the Wishart distribution is a generalization to multiple dimensions of the gamma distribution. It is named in honor of John Wishart, who first formulated the distribution in 1928. It is a family of probability distributions define ...
with ''n''−1 degrees of freedom. The sample covariance matrix of the mean reads \hat_\overline=\hat/n. The Hotelling's ''t''-squared statistic is then defined as: : t^2=(\overline-\boldsymbol)'\hat_\overline^ (\overline-\boldsymbol), which is proportional to the
distance Distance is a numerical or occasionally qualitative measurement of how far apart objects or points are. In physics or everyday usage, distance may refer to a physical length or an estimation based on other criteria (e.g. "two counties over"). ...
between the sample mean and \boldsymbol. Because of this, one should expect the statistic to assume low values if \overline \approx \boldsymbol, and high values if they are different. From the
distribution Distribution may refer to: Mathematics *Distribution (mathematics), generalized functions used to formulate solutions of partial differential equations * Probability distribution, the probability of a particular value or value range of a vari ...
, :t^2 \sim T^2_=\frac F_ , where F_ is the ''F''-distribution with parameters ''p'' and ''n'' − ''p''. In order to calculate a ''p''-value (unrelated to ''p'' variable here), note that the distribution of t^2 equivalently implies that : \frac t^2 \sim F_ . Then, use the quantity on the left hand side to evaluate the ''p''-value corresponding to the sample, which comes from the ''F''-distribution. A
confidence region In statistics, a confidence region is a multi-dimensional generalization of a confidence interval. It is a set of points in an ''n''-dimensional space, often represented as an ellipsoid around a point which is an estimated solution to a problem, al ...
may also be determined using similar logic.


Motivation

Let \mathcal_p(\boldsymbol,) denote a ''p''-variate normal distribution with
location In geography, location or place are used to denote a region (point, line, or area) on Earth's surface or elsewhere. The term ''location'' generally implies a higher degree of certainty than ''place'', the latter often indicating an entity with an ...
\boldsymbol and known
covariance In probability theory and statistics, covariance is a measure of the joint variability of two random variables. If the greater values of one variable mainly correspond with the greater values of the other variable, and the same holds for the les ...
. Let :_1,\dots,_n\sim \mathcal_p(\boldsymbol,) be ''n'' independent identically distributed (iid)
random variable A random variable (also called random quantity, aleatory variable, or stochastic variable) is a mathematical formalization of a quantity or object which depends on random events. It is a mapping or a function from possible outcomes (e.g., the po ...
s, which may be represented as p\times1 column vectors of real numbers. Define :\overline=\frac to be the
sample mean The sample mean (or "empirical mean") and the sample covariance are statistics computed from a Sample (statistics), sample of data on one or more random variables. The sample mean is the average value (or mean, mean value) of a sample (statistic ...
with covariance _\overline=/ n. It can be shown that :(\overline-\boldsymbol)'_\overline^(\overline-\boldsymbol)\sim\chi^2_p , where \chi^2_p is the
chi-squared distribution In probability theory and statistics, the chi-squared distribution (also chi-square or \chi^2-distribution) with k degrees of freedom is the distribution of a sum of the squares of k independent standard normal random variables. The chi-squa ...
with ''p'' degrees of freedom.


Two-sample statistic

If _1,\dots,_\sim N_p(\boldsymbol,) and _1,\dots,_\sim N_p(\boldsymbol,), with the samples independently drawn from two
independent Independent or Independents may refer to: Arts, entertainment, and media Artist groups * Independents (artist group), a group of modernist painters based in the New Hope, Pennsylvania, area of the United States during the early 1930s * Independ ...
multivariate normal distribution In probability theory and statistics, the multivariate normal distribution, multivariate Gaussian distribution, or joint normal distribution is a generalization of the one-dimensional (univariate) normal distribution to higher dimensions. One d ...
s with the same mean and covariance, and we define :\overline=\frac\sum_^ \mathbf_i \qquad \overline=\frac\sum_^ \mathbf_i as the sample means, and :\hat_=\frac\sum_^ (\mathbf_i-\overline)(\mathbf_i-\overline)' :\hat_=\frac\sum_^ (\mathbf_i-\overline)(\mathbf_i-\overline)' as the respective sample covariance matrices. Then :\hat= \frac is the unbiased pooled covariance matrix estimate (an extension of
pooled variance In statistics, pooled variance (also known as combined variance, composite variance, or overall variance, and written \sigma^2) is a method for estimating variance of several different populations when the mean of each population may be different ...
). Finally, the Hotelling's two-sample ''t''-squared statistic is :t^2 = \frac(\overline-\overline)'\hat^(\overline-\overline) \sim T^2(p, n_x+n_y-2)


Related concepts

It can be related to the F-distribution by :\fract^2 \sim F(p,n_x+n_y-1-p). The non-null distribution of this statistic is the
noncentral F-distribution In probability theory and statistics, the noncentral ''F''-distribution is a continuous probability distribution that is a noncentral distribution, noncentral generalization of the (ordinary) F-distribution, ''F''-distribution. It describes the di ...
(the ratio of a non-central Chi-squared random variable and an independent central Chi-squared random variable) :\fract^2 \sim F(p,n_x+n_y-1-p;\delta), with :\delta = \frac\boldsymbol'\mathbf^\boldsymbol, where \boldsymbol=\mathbf is the difference vector between the population means. In the two-variable case, the formula simplifies nicely allowing appreciation of how the correlation, \rho, between the variables affects t^2. If we define :d_ = \overline_-\overline_, \qquad d_ = \overline_-\overline_ and :s_1 = \sqrt \qquad s_2 = \sqrt \qquad \rho = \Sigma_/(s_1 s_2) = \Sigma_/(s_1 s_2) then :t^2 = \frac \left \left ( \frac \right )^2+\left ( \frac \right )^2-2\rho \left ( \frac \right )\left ( \frac \right ) \right Thus, if the differences in the two rows of the vector \mathbf d = \overline-\overline are of the same sign, in general, t^2 becomes smaller as \rho becomes more positive. If the differences are of opposite sign t^2 becomes larger as \rho becomes more positive. A univariate special case can be found in
Welch's t-test In statistics, Welch's ''t''-test, or unequal variances ''t''-test, is a two-sample location test which is used to test the hypothesis that two populations have equal means. It is named for its creator, Bernard Lewis Welch, is an adaptation of ...
. More robust and powerful tests than Hotelling's two-sample test have been proposed in the literature, see for example the interpoint distance based tests which can be applied also when the number of variables is comparable with, or even larger than, the number of subjects.


See also

* Student's ''t''-test in univariate statistics * Student's ''t''-distribution in univariate probability theory *
Multivariate Student distribution In statistics, the multivariate ''t''-distribution (or multivariate Student distribution) is a multivariate probability distribution. It is a generalization to random vectors of the Student's ''t''-distribution, which is a distribution applicab ...
* ''F''-distribution (commonly tabulated or available in software libraries, and hence used for testing the ''T''-squared statistic using the relationship given above) *
Wilks's lambda distribution In statistics, Wilks' lambda distribution (named for Samuel S. Wilks), is a probability distribution used in multivariate hypothesis testing, especially with regard to the likelihood-ratio test and multivariate analysis of variance (MANOVA). De ...
(in
multivariate statistics Multivariate statistics is a subdivision of statistics encompassing the simultaneous observation and analysis of more than one outcome variable. Multivariate statistics concerns understanding the different aims and background of each of the dif ...
, Wilks's ''Λ'' is to Hotelling's ''T''2 as Snedecor's ''F'' is to Student's ''t'' in univariate statistics)


References


External links

* {{DEFAULTSORT:Hotelling's T-Squared Distribution Continuous distributions