In statistics, the multivariate ''t''-distribution (or multivariate Student distribution) is a multivariate probability distribution. It is a generalization to random vectors of the Student's ''t''-distribution, which is a distribution applicable to univariate random variables. While the case of a random matrix could be treated within this structure, the matrix ''t''-distribution is distinct and makes particular use of the matrix structure.

Definition

One common method of construction of a multivariate ''t''-distribution, for the case of $p$ dimensions, is based on the observation that if $\mathbf y$ and $u$ are independent and distributed as $N(\mathbf 0, \boldsymbol\Sigma)$ and $\chi^2_\nu$ (i.e. multivariate normal and chi-squared distributions) respectively, the matrix $\boldsymbol\Sigma$ is a ''p'' × ''p'' matrix, $\boldsymbol\mu$ is a constant vector, and $\mathbf y/\sqrt{u/\nu} = \mathbf x - \boldsymbol\mu$, then $\mathbf x$ has the density

$$\frac{\Gamma\left[(\nu+p)/2\right]}{\Gamma(\nu/2)\,\nu^{p/2}\,\pi^{p/2}\,\left|\boldsymbol\Sigma\right|^{1/2}}\left[1+\frac{1}{\nu}(\mathbf x-\boldsymbol\mu)^T\boldsymbol\Sigma^{-1}(\mathbf x-\boldsymbol\mu)\right]^{-(\nu+p)/2}$$

and is said to be distributed as a multivariate ''t''-distribution with parameters $\boldsymbol\Sigma, \boldsymbol\mu, \nu$. Note that $\boldsymbol\Sigma$ is not the covariance matrix, since the covariance is given by $\nu/(\nu-2)\,\boldsymbol\Sigma$ (for $\nu>2$).

The constructive definition of a multivariate ''t''-distribution simultaneously serves as a sampling algorithm:
# Generate $u \sim \chi^2_\nu$ and $\mathbf y \sim N(\mathbf 0, \boldsymbol\Sigma)$, independently.
# Compute $\mathbf x \gets \sqrt{\nu/u}\,\mathbf y + \boldsymbol\mu$.

This formulation gives rise to the hierarchical representation of a multivariate ''t''-distribution as a scale-mixture of normals: $u \sim \mathrm{Ga}(\nu/2,\nu/2)$, where $\mathrm{Ga}(a,b)$ indicates a gamma distribution with density proportional to $x^{a-1}e^{-bx}$, and $\mathbf x\mid u$ conditionally follows $N(\boldsymbol\mu, u^{-1}\boldsymbol\Sigma)$.

In the special case $\nu=1$, the distribution is a multivariate Cauchy distribution.
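The two-step sampling algorithm above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not a reference implementation: the helper names `cholesky` and `sample_mvt`, the example parameters, and the choice of a Cholesky factor to generate the normal vector are all assumptions made for the sketch. The chi-squared draw uses the fact that $\chi^2_\nu$ is a gamma distribution with shape $\nu/2$ and scale 2.

```python
import math
import random

def cholesky(sigma):
    """Return lower-triangular L with L L^T = sigma (sigma symmetric positive definite)."""
    p = len(sigma)
    L = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(sigma[i][i] - s)
            else:
                L[i][j] = (sigma[i][j] - s) / L[j][j]
    return L

def sample_mvt(mu, sigma, nu, rng):
    """Draw one vector x = mu + sqrt(nu / u) * y with y ~ N(0, sigma), u ~ chi^2_nu."""
    p = len(mu)
    L = cholesky(sigma)
    z = [rng.gauss(0.0, 1.0) for _ in range(p)]          # standard normal vector
    y = [sum(L[i][k] * z[k] for k in range(i + 1)) for i in range(p)]  # y ~ N(0, sigma)
    u = rng.gammavariate(nu / 2.0, 2.0)                  # chi-squared, nu degrees of freedom
    scale = math.sqrt(nu / u)
    return [mu[i] + scale * y[i] for i in range(p)]

# Illustrative parameters (assumed for this example only).
rng = random.Random(0)
mu = [1.0, -2.0]
sigma = [[2.0, 0.5], [0.5, 1.0]]
nu = 5.0
draws = [sample_mvt(mu, sigma, nu, rng) for _ in range(20000)]
mean = [sum(d[i] for d in draws) / len(draws) for i in range(2)]
print(mean)  # should lie close to mu, since nu > 1
```

For $\nu > 2$ the sample covariance of such draws should approach $\nu/(\nu-2)\,\boldsymbol\Sigma$ rather than $\boldsymbol\Sigma$ itself, in line with the remark on the covariance above.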

Derivation

There are in fact many candidates for the multivariate generalization of Student's ''t''-distribution. An extensive survey of the field has been given by Kotz and Nadarajah (2004). The essential issue is to define a probability density function of several variables that is the appropriate generalization of the formula for the univariate case. In one dimension ($p=1$), with $t=x-\mu$ and $\Sigma=1$, we have the probability density function

$$f(t) = \frac{\Gamma[(\nu+1)/2]}{\sqrt{\nu\pi}\,\Gamma[\nu/2]} (1+t^2/\nu)^{-(\nu+1)/2}$$

and one approach is to write down a corresponding function of several variables. This is the basic idea of elliptical distribution theory, where one writes down a corresponding function of $p$ variables $t_i$ that replaces $t^2$ by a quadratic function of all the $t_i$. It is clear that this only makes sense when all the marginal distributions have the same degrees of freedom $\nu$. With $\mathbf{A} = \boldsymbol\Sigma^{-1}$, one has a simple choice of multivariate density function

$$f(\mathbf t) = \frac{\Gamma((\nu+p)/2)\left|\mathbf{A}\right|^{1/2}}{\sqrt{\nu^p\pi^p}\,\Gamma(\nu/2)} \left(1+\sum_{i,j=1}^{p,p} A_{ij} t_i t_j/\nu\right)^{-(\nu+p)/2}$$

which is the standard but not the only choice.

An important special case is the standard bivariate ''t''-distribution, ''p'' = 2:

$$f(t_1,t_2) = \frac{\left|\mathbf{A}\right|^{1/2}}{2\pi} \left(1+\sum_{i,j=1}^{2,2} A_{ij} t_i t_j/\nu\right)^{-(\nu+2)/2}$$

Note that $\frac{\Gamma\left(\frac{\nu+2}{2}\right)}{\pi\,\nu\,\Gamma\left(\frac{\nu}{2}\right)}= \frac{1}{2\pi}$.

Now, if $\mathbf{A}$ is the identity matrix, the density is

$$f(t_1,t_2) = \frac{1}{2\pi} \left(1+(t_1^2 + t_2^2)/\nu\right)^{-(\nu+2)/2}.$$

The difficulty with the standard representation is revealed by this formula, which does not factorize into the product of the marginal one-dimensional distributions. When $\Sigma$ is diagonal the standard representation can be shown to have zero correlation, but the marginal distributions do not agree with statistical independence: the components are uncorrelated yet dependent.

Cumulative distribution function

The definition of the cumulative distribution function (cdf) in one dimension can be extended to multiple dimensions by defining the following probability (here $\mathbf{x}$ is a real vector and the inequality holds componentwise):

$$F(\mathbf{x}) = \mathbb{P}(\mathbf{X}\leq \mathbf{x}), \quad \textrm{where}\;\; \mathbf{X}\sim t_\nu(\boldsymbol\mu,\boldsymbol\Sigma).$$

There is no simple formula for $F(\mathbf{x})$, but it can be approximated numerically via Monte Carlo integration.
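A plain Monte Carlo estimate of $F(\mathbf{x})$ can be sketched as follows: draw from the distribution using the normal and chi-squared construction, and count the fraction of draws that fall componentwise below $\mathbf{x}$. The function name `mvt_cdf_mc` is hypothetical, and the caller is assumed to supply a lower-triangular factor `chol` with `chol chol^T` equal to the scale matrix. This is the simplest possible estimator, not an efficient one.

```python
import math
import random

def mvt_cdf_mc(x, mu, chol, nu, n_draws, rng):
    """Estimate P(X <= x componentwise) for X ~ t_nu(mu, Sigma), Sigma = chol chol^T."""
    p = len(mu)
    hits = 0
    for _ in range(n_draws):
        z = [rng.gauss(0.0, 1.0) for _ in range(p)]
        u = rng.gammavariate(nu / 2.0, 2.0)   # chi-squared, nu degrees of freedom
        s = math.sqrt(nu / u)
        if all(mu[i] + s * sum(chol[i][k] * z[k] for k in range(i + 1)) <= x[i]
               for i in range(p)):
            hits += 1
    return hits / n_draws

rng = random.Random(1)
identity = [[1.0, 0.0], [0.0, 1.0]]
estimate = mvt_cdf_mc([0.0, 0.0], [0.0, 0.0], identity, 4.0, 40000, rng)
print(estimate)  # by spherical symmetry the exact value is 1/4
```

The standard error of this estimator is $\sqrt{F(1-F)/n}$ for $n$ draws, so small tail probabilities need many draws or a more refined integration scheme.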

Conditional Distribution

The conditional distribution was demonstrated by Muirhead, though it had previously been derived by Cornish using the simpler ratio representation above. Let the vector $X$ follow the multivariate ''t'' distribution and partition it into two subvectors of $p_1, p_2$ elements:

$$X_p = \begin{bmatrix} X_1 \\ X_2 \end{bmatrix} \sim t_p \left(\mu_p, \Sigma_{p \times p}, \nu \right)$$

where $p_1 + p_2 = p$, the known mean vector is

$$\mu_p = \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix}$$

and the scale matrix is

$$\Sigma_{p \times p} = \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{bmatrix}.$$

Then

$$X_2 \mid X_1 \sim t_{p_2} \left( \mu_{2|1}, \frac{\nu + d_1}{\nu + p_1} \Sigma_{22|1}, \nu + p_1 \right)$$

where

$$\mu_{2|1} = \mu_2 + \Sigma_{21} \Sigma_{11}^{-1} \left(X_1 - \mu_1 \right)$$

is the conditional mean where it exists, or the median otherwise,

$$\Sigma_{22|1} = \Sigma_{22} - \Sigma_{21} \Sigma_{11}^{-1} \Sigma_{12}$$

is the Schur complement of $\Sigma_{11}$ in $\Sigma$, and

$$d_1 = (X_1 - \mu_1)^T \Sigma_{11}^{-1} (X_1 - \mu_1)$$

is the squared Mahalanobis distance of $X_1$ from $\mu_1$ with scale matrix $\Sigma_{11}$.

See for a simple proof of the above conditional distribution.
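As a small worked instance of the conditional formulas, the sketch below handles only the bivariate case $p_1 = p_2 = 1$, where every block of the scale matrix is a scalar and no matrix inversion is needed. The function name and the example numbers are assumptions for illustration. The general case follows the same steps with matrix operations.

```python
def conditional_t_bivariate(x1, mu, sigma, nu):
    """Parameters of X2 | X1 = x1 for a bivariate t (p1 = p2 = 1).

    mu = (mu1, mu2); sigma = ((s11, s12), (s21, s22)) is the scale matrix.
    Returns (location, scale, degrees of freedom) of the conditional t.
    """
    (s11, s12), (s21, s22) = sigma
    mu1, mu2 = mu
    p1 = 1
    cond_loc = mu2 + s21 / s11 * (x1 - mu1)     # mu_{2|1}
    schur = s22 - s21 * s12 / s11               # Sigma_{22|1}, the Schur complement
    d1 = (x1 - mu1) ** 2 / s11                  # squared Mahalanobis distance of x1
    cond_scale = (nu + d1) / (nu + p1) * schur  # scale grows with the distance d1
    return cond_loc, cond_scale, nu + p1

loc, scale, dof = conditional_t_bivariate(
    2.0, (0.0, 0.0), ((1.0, 0.5), (0.5, 2.0)), 4.0
)
print(loc, scale, dof)
```

Unlike the multivariate normal case, the conditional scale depends on the observed value of $X_1$ through $d_1$: observations far from $\mu_1$ inflate the conditional spread of $X_2$.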

Copulas based on the multivariate ''t''

The use of such distributions is enjoying renewed interest due to applications in mathematical finance, especially through the use of the Student's ''t'' copula.

Elliptical Representation

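Constructed as an elliptical distribution, and in the simplest centralised case with spherical symmetry and without scaling, $\Sigma = \operatorname{I}$, the multivariate ''t'' PDF takes the form

$$f_X(X)= g(X^T X) = \frac{\Gamma\left(\frac{1}{2}(\nu + p)\right)}{(\nu\pi)^{p/2}\,\Gamma\left(\frac{1}{2}\nu\right)} \bigg( 1 + \nu^{-1} X^T X \bigg)^{-(\nu + p)/2}$$

where $X =(x_1, \cdots ,x_p )^T$ is a $p$-vector and $\nu$ is the degrees of freedom. The expected covariance of $X$ (for $\nu > 2$) is

$$\operatorname{E}(XX^T) = \int_{-\infty}^\infty \cdots \int_{-\infty}^\infty f_X(x_1,\dots, x_p)\, XX^T \, dx_1 \dots dx_p = \frac{\nu}{\nu - 2} \operatorname{I}$$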

The aim is to convert the Cartesian PDF to a radial one. Kibria and Joarder, in a tutorial-style paper, define the radial measure $r_2 = R^2 = \frac{X^TX}{p}$ such that

$$\operatorname{E}(r_2) = \int_{-\infty}^\infty \cdots \int_{-\infty}^\infty f_X(x_1,\dots, x_p) \frac{X^TX}{p} \, dx_1 \dots dx_p$$

which is equivalent to the expected variance of the $p$-element vector $X$ treated as a univariate zero-mean random sequence. They note that $r_2$ follows the Fisher-Snedecor or $F$ distribution:

$$r_2 \sim f_F(p,\nu) = B \bigg( \frac{p}{2}, \frac{\nu}{2} \bigg)^{-1} \bigg(\frac{p}{\nu} \bigg)^{p/2} r_2^{p/2 - 1} \bigg( 1 + \frac{p}{\nu} r_2 \bigg)^{-(p + \nu)/2}$$

having mean value $\operatorname{E}(r_2) = \frac{\nu}{\nu - 2}$. By a change of random variable to $y = \frac{p}{\nu} r_2 = \frac{X^TX}{\nu}$ in the equation above, retaining the $p$-vector $X$, we have

$$\operatorname{E}(y) = \int_{-\infty}^\infty \cdots \int_{-\infty}^\infty f_X(X) \frac{X^TX}{\nu} \, dx_1 \dots dx_p = \frac{p}{\nu - 2}$$

and probability distribution

$$\begin{aligned} f_Y(y \mid p,\nu) & = \frac{\nu}{p}\, B \bigg( \frac{p}{2}, \frac{\nu}{2} \bigg)^{-1} \bigg(\frac{p}{\nu} \bigg)^{p/2} \bigg(\frac{p}{\nu} \bigg)^{-p/2 + 1} y^{p/2 - 1} \big( 1 + y \big)^{-(p + \nu)/2} \\
 & = B \bigg( \frac{p}{2}, \frac{\nu}{2} \bigg)^{-1} y^{p/2 - 1} (1 + y)^{-(\nu + p)/2} \end{aligned}$$

which is a regular Beta-prime distribution $y \sim \beta' \bigg(y; \frac{p}{2}, \frac{\nu}{2} \bigg)$ having mean value $\frac{\frac{1}{2}p}{\frac{1}{2}\nu - 1} = \frac{p}{\nu - 2}$. The cumulative distribution function of $y$ is thus known to be

$$F_Y(y) = I \bigg(\frac{y}{1 + y}; \, \frac{p}{2}, \frac{\nu}{2} \bigg)$$

where $I$ is the regularized incomplete Beta function.

These results can be derived by a straightforward transformation of coordinates from Cartesian to spherical. A constant-radius surface at $R = (X^TX)^{1/2}$ with PDF $p_X(X) \propto \bigg( 1 + \nu^{-1} R^2 \bigg)^{-(\nu + p)/2}$ is an iso-density surface. The quantum of probability in a surface shell of area $A_R$ and thickness $\delta R$ at $R$ is $\delta P = p_X(R) \, A_R \, \delta R$. The enclosed sphere in $p$ dimensions has surface area $A_R = \frac{2\pi^{p/2} R^{p-1}}{\Gamma(p/2)}$, and substitution into $\delta P$ shows that the shell has element of probability $\delta P = p_X(R) \frac{2\pi^{p/2} R^{p-1}}{\Gamma(p/2)} \delta R$. This is equivalent to a radial density function

$$f_R(R) = \frac{\Gamma\left(\frac{1}{2}(\nu + p)\right)}{\nu^{p/2}\pi^{p/2}\,\Gamma\left(\frac{1}{2}\nu\right)} \frac{2\pi^{p/2} R^{p-1}}{\Gamma(p/2)} \bigg( 1 + \frac{R^2}{\nu} \bigg)^{-(\nu + p)/2}$$

which simplifies to

$$f_R(R) = \frac{2}{\nu^{1/2} B\left(\frac{1}{2}p, \frac{1}{2}\nu\right)} \bigg( \frac{R^2}{\nu} \bigg)^{(p-1)/2} \bigg( 1 + \frac{R^2}{\nu} \bigg)^{-(\nu + p)/2}$$

where $B(*,*)$ is the Beta function. Changing the radial variable to $r_2=R^2$ (here $R^2 = X^TX$, without the factor $1/p$ used in the definition of $r_2$ above) gets

$$f_{R_2}(r_2) = \frac{1}{\nu B\left(\frac{1}{2}p, \frac{1}{2}\nu\right)} \bigg( \frac{r_2}{\nu} \bigg)^{p/2 - 1} \bigg( 1 + \frac{r_2}{\nu} \bigg)^{-(\nu + p)/2}$$

Finally, scaling to $y= r_2 / \nu$ returns the previous Beta Prime distribution

$$f_Y(y) = \frac{1}{B\left(\frac{1}{2}p, \frac{1}{2}\nu\right)} y^{p/2 - 1} \bigg( 1 + y \bigg)^{-(\nu + p)/2}$$
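The radial density can be sanity-checked by numerical integration. The sketch below assumes the simplified form $f_R(R) = \frac{2}{\nu^{1/2}B(p/2,\nu/2)}(R^2/\nu)^{(p-1)/2}(1+R^2/\nu)^{-(\nu+p)/2}$ with $R^2 = X^TX$. It checks that the density integrates to one and that $\operatorname{E}(R^2) = \nu\operatorname{E}(y) = p\nu/(\nu-2)$. The midpoint rule, the truncation at $R = 200$ and the parameter values are illustrative choices, not part of the source derivation.

```python
import math

def radial_pdf(R, p, nu):
    """Radial density f_R(R) of a spherical multivariate t (Sigma = I)."""
    log_beta = math.lgamma(p / 2) + math.lgamma(nu / 2) - math.lgamma((p + nu) / 2)
    y = R * R / nu
    return (2.0 / (math.sqrt(nu) * math.exp(log_beta))
            * y ** ((p - 1) / 2) * (1.0 + y) ** (-(nu + p) / 2))

def midpoint_integral(f, lower, upper, steps):
    """Composite midpoint rule on [lower, upper]."""
    h = (upper - lower) / steps
    return h * sum(f(lower + (k + 0.5) * h) for k in range(steps))

p, nu = 3, 5.0
# The tail beyond R = 200 decays like R^(-nu - 1), so truncation error is negligible here.
total = midpoint_integral(lambda R: radial_pdf(R, p, nu), 0.0, 200.0, 200000)
second = midpoint_integral(lambda R: R * R * radial_pdf(R, p, nu), 0.0, 200.0, 200000)
print(total, second, p * nu / (nu - 2))
```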

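To scale the radial variables without changing the radial shape function, define the scale matrix $\Sigma = \alpha \operatorname{I}$, yielding a 3-parameter Cartesian density function, i.e. the probability $\Delta$ in volume element $dx_1 \dots dx_p$ is

$$\Delta = f_X(X \mid \alpha, p, \nu)\, dx_1 \dots dx_p = \frac{\Gamma\left(\frac{1}{2}(\nu + p)\right)}{\Gamma\left(\frac{1}{2}\nu\right) \nu^{p/2}\pi^{p/2}\alpha^{p/2}} \bigg( 1 + \frac{X^TX}{\alpha\nu} \bigg)^{-(\nu + p)/2} \; dx_1 \dots dx_p$$

or, in terms of the scalar radial variable $R$,

$$f_R(R \mid \alpha, p, \nu) = \frac{2}{\alpha^{1/2}\nu^{1/2} B\left(\frac{1}{2}p, \frac{1}{2}\nu\right)} \bigg( \frac{R^2}{\alpha\nu} \bigg)^{(p-1)/2} \bigg( 1 + \frac{R^2}{\alpha\nu} \bigg)^{-(\nu + p)/2}$$

The moments of all the radial variables can be derived from the Beta Prime distribution. If $Z \sim \beta'(a,b)$ then $\operatorname{E}(Z^m) = \frac{B(a + m, b - m)}{B(a, b)}$, a known result. Thus, for variable $y$, proportional to $R^2$, we have

$$\operatorname{E}(y^m) = \frac{B\left(\frac{1}{2}p + m, \frac{1}{2}\nu - m\right)}{B\left(\frac{1}{2}p, \frac{1}{2}\nu\right)} = \frac{\Gamma\left(\frac{1}{2}p + m\right)\Gamma\left(\frac{1}{2}\nu - m\right)}{\Gamma\left(\frac{1}{2}p\right)\Gamma\left(\frac{1}{2}\nu\right)}, \quad \nu/2 > m$$

The moments of $r_2 = \nu \, y$ are

$$\operatorname{E}(r_2^m) = \nu^m\operatorname{E}(y^m)$$

while introducing the scale matrix yields

$$\operatorname{E}(r_2^m \mid \alpha) = \alpha^m \nu^m \operatorname{E}(y^m)$$

Moments relating to the radial variable $R$ are found by setting $R =(\alpha\nu y)^{1/2}$ and $M=2m$, whereupon

$$\operatorname{E}(R^M) = \operatorname{E}\Big[\big((\alpha \nu y)^{1/2}\big)^{2m}\Big] = (\alpha \nu)^{M/2} \operatorname{E}(y^{M/2}) = (\alpha \nu)^{M/2} \frac{B\left(\frac{1}{2}(p + M), \frac{1}{2}(\nu - M)\right)}{B\left(\frac{1}{2}p, \frac{1}{2}\nu\right)}$$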
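The moment formulas translate directly into code. The following sketch evaluates $\operatorname{E}(Z^m) = B(a+m, b-m)/B(a,b)$ through log-gamma functions, using the fact that the common factor $\Gamma(a+b)$ cancels in the ratio, and builds the radial moments on top of it. The function names and the parameter values are assumptions for illustration.

```python
import math

def beta_prime_moment(a, b, m):
    """E(Z^m) for Z ~ beta'(a, b); the moment exists only for -a < m < b."""
    if not (-a < m < b):
        raise ValueError("moment exists only for -a < m < b")
    # B(a + m, b - m) / B(a, b) = Gamma(a + m) Gamma(b - m) / (Gamma(a) Gamma(b))
    log_ratio = (math.lgamma(a + m) + math.lgamma(b - m)
                 - math.lgamma(a) - math.lgamma(b))
    return math.exp(log_ratio)

def radial_moment(M, p, nu, alpha=1.0):
    """E(R^M) for R = (alpha * nu * y)^(1/2) with y ~ beta'(p/2, nu/2)."""
    return (alpha * nu) ** (M / 2) * beta_prime_moment(p / 2, nu / 2, M / 2)

p, nu = 3, 6.0
mean_y = beta_prime_moment(p / 2, nu / 2, 1)   # should equal p / (nu - 2)
mean_R2 = radial_moment(2, p, nu)              # should equal p * nu / (nu - 2)
print(mean_y, mean_R2)
```

The existence condition $\nu/2 > m$ appears in the code as the check `m < b`. For example, the fourth radial moment ($M = 4$, $m = 2$) requires $\nu > 4$.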

Linear Combinations and Affine Transformation

Following section 3.3 of Kibria et al., let $Z$ be a $p$-vector sampled from a central spherical multivariate ''t'' distribution with $\nu$ degrees of freedom: $Z_p \sim mvt_p(0, \operatorname{I}, \nu)$. If $X$ is derived from $Z$ via a linear transformation

$$X = \mu + \Sigma^{1/2} Z$$

where $\Sigma$ has full rank, then

$$X \sim mvt_p(\mu, \Sigma, \nu)$$

That is, $\operatorname{E}(X) = \mu$ and the covariance of $X$ is $\operatorname{E}\left[(X - \mu)(X - \mu)^T\right] = \frac{\nu}{\nu - 2} \Sigma$.

Furthermore, if $A$ is a non-singular matrix then

$$Y = AX + b \sim mvt_p(A \mu + b, A \Sigma A^T, \nu)$$

with mean $\operatorname{E}(Y) = A \mu + b$ and covariance $\operatorname{E}\left[(Y - A\mu - b)(Y - A\mu - b)^T\right] = \frac{\nu}{\nu - 2} A\Sigma A^T$.

Roth (reference below) notes that if $A$ is a $p_1 \times p_2$ squat matrix with $p_1 < p_2$ then $Y$ has distribution $Y_{p_1} \sim mvt_{p_1}(A \mu + b, A \Sigma A^T, \nu)$. If $A$ takes the form

$$Y_{p_1} = \begin{bmatrix} \operatorname{I}_{p_1} & 0_{p_1 \times p_2} \end{bmatrix} X_p$$

then the PDF of $Y_{p_1}$ is the marginal distribution of the leading $p_1$ elements of $X_p$. The degrees of freedom parameter $\nu$ is invariant throughout.
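The parameter bookkeeping for $Y = AX + b$ can be sketched with plain nested lists. The helper names and the example matrices below are assumptions for illustration. The example uses a selection matrix of the form $[\operatorname{I} \;\; 0]$, so the result is the set of parameters of the marginal distribution of the leading two components.

```python
def mat_vec(A, v):
    """Matrix-vector product for nested lists."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def mat_mul(A, B):
    """Matrix-matrix product for nested lists."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def affine_t_params(A, b, mu, sigma, nu):
    """Parameters of Y = A X + b when X ~ mvt(mu, sigma, nu); nu is unchanged."""
    loc = [m + bi for m, bi in zip(mat_vec(A, mu), b)]
    scale = mat_mul(mat_mul(A, sigma), transpose(A))   # A Sigma A^T
    return loc, scale, nu

mu = [1.0, 2.0, 3.0]
sigma = [[2.0, 0.5, 0.1],
         [0.5, 1.0, 0.3],
         [0.1, 0.3, 3.0]]
select_leading_two = [[1.0, 0.0, 0.0],
                      [0.0, 1.0, 0.0]]
loc, scale, dof = affine_t_params(select_leading_two, [0.0, 0.0], mu, sigma, 7.0)
print(loc, scale, dof)
```

The marginal scale matrix is simply the leading block of $\Sigma$, and the degrees of freedom carry over unchanged, in contrast to the conditional distribution, where they increase by $p_1$.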

Related concepts

In univariate statistics, the Student's ''t''-test makes use of Student's ''t''-distribution. Hotelling's ''T''-squared distribution is a distribution that arises in multivariate statistics. The matrix ''t''-distribution is a distribution for random variables arranged in a matrix structure.

References

Literature


External links

Copula Methods vs Canonical Multivariate Distributions: the multivariate Student T distribution with general degrees of freedom