In probability theory and statistics, the multivariate normal distribution, multivariate Gaussian distribution, or joint normal distribution is a generalization of the one-dimensional (univariate) normal distribution to higher dimensions. One definition is that a random vector is said to be ''k''-variate normally distributed if every linear combination of its ''k'' components has a univariate normal distribution. Its importance derives mainly from the multivariate central limit theorem. The multivariate normal distribution is often used to describe, at least approximately, any set of (possibly) correlated real-valued random variables, each of which clusters around a mean value.
Definitions
Notation and parameterization
The multivariate normal distribution of a ''k''-dimensional random vector $\mathbf{X} = (X_1, \ldots, X_k)^{\mathrm T}$ can be written in the following notation:
: $\mathbf{X} \sim \mathcal{N}(\boldsymbol\mu, \boldsymbol\Sigma),$
or to make it explicitly known that ''X'' is ''k''-dimensional,
: $\mathbf{X} \sim \mathcal{N}_k(\boldsymbol\mu, \boldsymbol\Sigma),$
with ''k''-dimensional mean vector
: $\boldsymbol\mu = \operatorname{E}[\mathbf{X}] = (\operatorname{E}[X_1], \operatorname{E}[X_2], \ldots, \operatorname{E}[X_k])^{\mathrm T},$
and $k \times k$ covariance matrix
: $\Sigma_{i,j} = \operatorname{E}[(X_i - \mu_i)(X_j - \mu_j)] = \operatorname{Cov}[X_i, X_j]$
such that $1 \le i \le k$ and $1 \le j \le k$. The inverse of the covariance matrix is called the precision matrix, denoted by $\boldsymbol{Q} = \boldsymbol\Sigma^{-1}$.
Standard normal random vector
A real random vector $\mathbf{X} = (X_1, \ldots, X_k)^{\mathrm T}$ is called a standard normal random vector if all of its components $X_i$ are independent and each is a zero-mean unit-variance normally distributed random variable, i.e. if $X_i \sim \mathcal{N}(0, 1)$ for all $i = 1, \ldots, k$.
Centered normal random vector
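A real random vector $\mathbf{X} = (X_1, \ldots, X_k)^{\mathrm T}$ is called a centered normal random vector if there exists a deterministic $k \times \ell$ matrix $\boldsymbol{A}$ such that $\boldsymbol{A}\mathbf{Z}$ has the same distribution as $\mathbf{X}$, where $\mathbf{Z}$ is a standard normal random vector with $\ell$ components.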
Normal random vector
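A real random vector $\mathbf{X} = (X_1, \ldots, X_k)^{\mathrm T}$ is called a normal random vector if there exists a random $\ell$-vector $\mathbf{Z}$, which is a standard normal random vector, a $k$-vector $\boldsymbol\mu$, and a $k \times \ell$ matrix $\boldsymbol{A}$, such that $\mathbf{X} = \boldsymbol{A}\mathbf{Z} + \boldsymbol\mu$.
Formally:
: $\mathbf{X} \sim \mathcal{N}(\boldsymbol\mu, \boldsymbol\Sigma) \iff \text{there exist } \boldsymbol\mu \in \mathbb{R}^k,\ \boldsymbol{A} \in \mathbb{R}^{k \times \ell} \text{ such that } \mathbf{X} = \boldsymbol{A}\mathbf{Z} + \boldsymbol\mu \text{ and } Z_n \sim \mathcal{N}(0, 1) \text{ i.i.d. for all } n = 1, \ldots, \ell.$
Here the covariance matrix is $\boldsymbol\Sigma = \boldsymbol{A}\boldsymbol{A}^{\mathrm T}$.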
In the degenerate case where the covariance matrix is singular, the corresponding distribution has no density; see the section below for details. This case arises frequently in statistics; for example, in the distribution of the vector of residuals in ordinary least squares regression. The $X_i$ are in general ''not'' independent; they can be seen as the result of applying the matrix $\boldsymbol{A}$ to a collection of independent Gaussian variables $\mathbf{Z}$.
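The constructive definition (a matrix applied to independent standard normal variables, plus a shift) translates directly into a sampling recipe. The following is a minimal sketch, assuming NumPy is available; the mean vector, the matrix `A`, and the sample size are hypothetical values chosen for illustration. `A` is picked so that its third row is the sum of the first two, which makes the resulting covariance matrix singular.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible

# Hypothetical parameters: k = 3 components driven by l = 2 standard normals,
# so the covariance A A^T has rank 2 and the distribution is degenerate.
mu = np.array([1.0, -2.0, 0.5])
A = np.array([[1.0, 0.0],
              [0.5, 2.0],
              [1.5, 2.0]])  # third row = first row + second row

n_samples = 200_000
Z = rng.standard_normal(size=(n_samples, A.shape[1]))  # rows are standard normal vectors
X = Z @ A.T + mu                                       # each row is A z + mu

sigma_theory = A @ A.T                    # covariance implied by the definition
sigma_sample = np.cov(X, rowvar=False)    # empirical covariance of the samples
mean_sample = X.mean(axis=0)
rank = np.linalg.matrix_rank(sigma_theory)  # 2, i.e. a singular 3 x 3 covariance
```

The empirical covariance should approach `A @ A.T` as the sample size grows, and the components of each sample are visibly dependent because they share the same underlying `Z`.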
Equivalent definitions
The following definitions are equivalent to the definition given above. A random vector $\mathbf{X} = (X_1, \ldots, X_k)^{\mathrm T}$ has a multivariate normal distribution if it satisfies one of the following equivalent conditions.
*Every linear combination $Y = a_1 X_1 + \cdots + a_k X_k$ of its components is normally distributed. That is, for any constant vector $\mathbf{a} \in \mathbb{R}^k$, the random variable $Y = \mathbf{a}^{\mathrm T}\mathbf{X}$ has a univariate normal distribution, where a univariate normal distribution with zero variance is a point mass on its mean.
*There is a ''k''-vector $\boldsymbol\mu$ and a symmetric, positive semidefinite $k \times k$ matrix $\boldsymbol\Sigma$, such that the characteristic function of $\mathbf{X}$ is
: $\varphi_{\mathbf{X}}(\mathbf{u}) = \exp\left(i\mathbf{u}^{\mathrm T}\boldsymbol\mu - \tfrac{1}{2}\mathbf{u}^{\mathrm T}\boldsymbol\Sigma\mathbf{u}\right).$
The spherical normal distribution can be characterised as the unique distribution where components are independent in any orthogonal coordinate system.
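The linear-combination condition also fixes the parameters of each projection: for a normal vector with mean $\boldsymbol\mu$ and covariance $\boldsymbol\Sigma$, the combination $\mathbf{a}^{\mathrm T}\mathbf{X}$ has mean $\mathbf{a}^{\mathrm T}\boldsymbol\mu$ and variance $\mathbf{a}^{\mathrm T}\boldsymbol\Sigma\mathbf{a}$. The sketch below, assuming NumPy and hypothetical parameter values, checks only these two moments by simulation; it does not test normality itself.

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical parameters for a 3-dimensional normal vector
mu = np.array([1.0, 2.0, 3.0])
sigma = np.array([[2.0, 0.6, 0.0],
                  [0.6, 1.0, -0.3],
                  [0.0, -0.3, 0.5]])
a = np.array([0.5, -1.0, 2.0])  # arbitrary constant vector

X = rng.multivariate_normal(mu, sigma, size=200_000)
Y = X @ a                        # Y = a^T X for every sample

mean_theory = a @ mu             # a^T mu
var_theory = a @ sigma @ a       # a^T Sigma a
mean_sample, var_sample = Y.mean(), Y.var()
```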
Density function
Non-degenerate case
The multivariate normal distribution is said to be "non-degenerate" when the symmetric covariance matrix $\boldsymbol\Sigma$ is positive definite. In this case the distribution has density
: $f_{\mathbf{X}}(x_1, \ldots, x_k) = \frac{\exp\left(-\frac{1}{2}(\mathbf{x} - \boldsymbol\mu)^{\mathrm T}\boldsymbol\Sigma^{-1}(\mathbf{x} - \boldsymbol\mu)\right)}{\sqrt{(2\pi)^k |\boldsymbol\Sigma|}}$
where $\mathbf{x}$ is a real ''k''-dimensional column vector and $|\boldsymbol\Sigma| \equiv \det\boldsymbol\Sigma$ is the determinant of $\boldsymbol\Sigma$, also known as the generalized variance. The equation above reduces to that of the univariate normal distribution if $\boldsymbol\Sigma$ is a $1 \times 1$ matrix (i.e. a single real number).
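As an illustration, the non-degenerate density can be evaluated numerically through a Cholesky factorization, which avoids forming the inverse of the covariance matrix explicitly. This is a minimal sketch assuming NumPy; the function name `mvn_logpdf` and the test values are hypothetical, and working on the log scale is a design choice of the sketch rather than part of the definition.

```python
import numpy as np

def mvn_logpdf(x, mu, sigma):
    """Log-density of N(mu, sigma) at x for a positive definite sigma."""
    x, mu, sigma = (np.asarray(v, dtype=float) for v in (x, mu, sigma))
    k = mu.shape[0]
    L = np.linalg.cholesky(sigma)               # sigma = L L^T; raises if not positive definite
    z = np.linalg.solve(L, x - mu)              # z^T z = (x - mu)^T sigma^{-1} (x - mu)
    log_det = 2.0 * np.sum(np.log(np.diag(L)))  # log of the determinant of sigma
    return -0.5 * (k * np.log(2.0 * np.pi) + log_det + z @ z)

# With k = 1 the formula reduces to the univariate normal density
mu1, var1, x1 = 0.3, 2.0, 1.1
univariate = np.exp(-(x1 - mu1) ** 2 / (2 * var1)) / np.sqrt(2 * np.pi * var1)
multivariate = np.exp(mvn_logpdf([x1], [mu1], [[var1]]))

# Standard normal in two dimensions, evaluated at the mean: 1 / (2 pi)
at_origin = np.exp(mvn_logpdf([0.0, 0.0], [0.0, 0.0], np.eye(2)))
```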
The circularly symmetric version of the complex normal distribution has a slightly different form.
Each iso-density locus (the locus of points in ''k''-dimensional space each of which gives the same particular value of the density) is an ellipse or its higher-dimensional generalization; hence the multivariate normal is a special case of the elliptical distributions.
The quantity $\sqrt{(\mathbf{x} - \boldsymbol\mu)^{\mathrm T}\boldsymbol\Sigma^{-1}(\mathbf{x} - \boldsymbol\mu)}$ is known as the Mahalanobis distance, which represents the distance of the test point $\mathbf{x}$ from the mean $\boldsymbol\mu$. Note that in the case when $k = 1$, the distribution reduces to a univariate normal distribution and the Mahalanobis distance reduces to the absolute value of the standard score. See also Interval below.
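A short sketch of the Mahalanobis distance, assuming NumPy; the helper name `mahalanobis` and the input values are hypothetical. The one-dimensional call illustrates the reduction to the absolute standard score.

```python
import numpy as np

def mahalanobis(x, mu, sigma):
    """Mahalanobis distance of x from mu under covariance sigma (assumed positive definite)."""
    diff = np.asarray(x, dtype=float) - np.asarray(mu, dtype=float)
    return float(np.sqrt(diff @ np.linalg.solve(np.asarray(sigma, dtype=float), diff)))

# k = 1: the distance is |x - mu| / sigma, here |7 - 5| / sqrt(4)
d1 = mahalanobis([7.0], [5.0], [[4.0]])

# k = 2 with unit variances and correlation 0.5
d2 = mahalanobis([1.0, 1.0], [0.0, 0.0], [[1.0, 0.5], [0.5, 1.0]])
```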
Bivariate case
In the 2-dimensional nonsingular case ($k = \operatorname{rank}(\boldsymbol\Sigma) = 2$), the probability density function of a vector $[X\ Y]^{\mathrm T}$ is:
: $f(x, y) = \frac{1}{2\pi\sigma_X\sigma_Y\sqrt{1 - \rho^2}} \exp\left(-\frac{1}{2(1 - \rho^2)}\left[\left(\frac{x - \mu_X}{\sigma_X}\right)^2 - 2\rho\left(\frac{x - \mu_X}{\sigma_X}\right)\left(\frac{y - \mu_Y}{\sigma_Y}\right) + \left(\frac{y - \mu_Y}{\sigma_Y}\right)^2\right]\right)$
where $\rho$ is the correlation between $X$ and $Y$ and where $\sigma_X > 0$ and $\sigma_Y > 0$. In this case,
: $\boldsymbol\mu = \begin{pmatrix} \mu_X \\ \mu_Y \end{pmatrix}, \quad \boldsymbol\Sigma = \begin{pmatrix} \sigma_X^2 & \rho\sigma_X\sigma_Y \\ \rho\sigma_X\sigma_Y & \sigma_Y^2 \end{pmatrix}.$
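The bivariate density can be written either in terms of the standard deviations and the correlation or through the general matrix formula. The following sketch uses only the Python standard library and compares the two; the function names and the evaluation point are hypothetical.

```python
import math

def bivariate_pdf(x, y, mu_x, mu_y, sigma_x, sigma_y, rho):
    """Bivariate normal density in terms of sigma_X, sigma_Y and rho."""
    zx = (x - mu_x) / sigma_x
    zy = (y - mu_y) / sigma_y
    one_minus_rho2 = 1.0 - rho * rho
    quad = (zx * zx - 2.0 * rho * zx * zy + zy * zy) / one_minus_rho2
    norm = 2.0 * math.pi * sigma_x * sigma_y * math.sqrt(one_minus_rho2)
    return math.exp(-0.5 * quad) / norm

def matrix_form_pdf(x, y, mu_x, mu_y, sigma_x, sigma_y, rho):
    """Same density from the general formula, using an explicit 2 x 2 inverse."""
    a, b, d = sigma_x ** 2, rho * sigma_x * sigma_y, sigma_y ** 2  # Sigma = [[a, b], [b, d]]
    det = a * d - b * b
    dx, dy = x - mu_x, y - mu_y
    quad = (d * dx * dx - 2.0 * b * dx * dy + a * dy * dy) / det   # (x - mu)^T Sigma^{-1} (x - mu)
    return math.exp(-0.5 * quad) / (2.0 * math.pi * math.sqrt(det))

args = (0.4, -1.2, 1.0, 0.0, 2.0, 0.5, -0.6)  # hypothetical point and parameters
p_rho = bivariate_pdf(*args)
p_mat = matrix_form_pdf(*args)
```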
In the bivariate case, the first equivalent condition for multivariate reconstruction of normality can be made less restrictive, as it is sufficient to verify that a countably infinite set of distinct linear combinations of $X$ and $Y$ are normal in order to conclude that the vector $[X\ Y]^{\mathrm T}$ is bivariate normal.
The bivariate iso-density loci plotted in the $x,y$-plane are ellipses, whose principal axes are defined by the eigenvectors of the covariance matrix $\boldsymbol\Sigma$ (the major and minor semidiameters of the ellipse equal the square root of the ordered eigenvalues).
As the absolute value of the correlation parameter $\rho$ increases, these loci are squeezed toward the following line:
: $y(x) = \operatorname{sgn}(\rho)\frac{\sigma_Y}{\sigma_X}(x - \mu_X) + \mu_Y.$
This is because this expression, with $\operatorname{sgn}(\rho)$ (where sgn is the sign function) replaced by $\rho$, is the best linear unbiased prediction of $Y$ given a value of $X$.
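The geometric statements about the bivariate ellipses can be checked numerically. The sketch below, assuming NumPy and hypothetical parameter values, computes the principal axes from an eigendecomposition of the covariance matrix and confirms that the axis endpoints sit at Mahalanobis distance 1 from the mean (taken as the origin). It also computes the slope of the limiting line and the slope of the best linear predictor for comparison.

```python
import numpy as np

sigma_x, sigma_y, rho = 2.0, 1.0, 0.7      # hypothetical parameters
sigma = np.array([[sigma_x ** 2, rho * sigma_x * sigma_y],
                  [rho * sigma_x * sigma_y, sigma_y ** 2]])

eigvals, eigvecs = np.linalg.eigh(sigma)   # ascending eigenvalues, orthonormal eigenvector columns
semi_axes = np.sqrt(eigvals)               # minor, then major semidiameter of the distance-1 ellipse

# Column i of `endpoints` is sqrt(lambda_i) * v_i, an endpoint of a principal axis
endpoints = eigvecs * semi_axes
d2 = np.einsum("ij,ij->j", endpoints, np.linalg.solve(sigma, endpoints))  # squared Mahalanobis distances

limit_slope = np.sign(rho) * sigma_y / sigma_x  # line the loci are squeezed toward
blup_slope = rho * sigma_y / sigma_x            # slope of the best linear predictor of Y given X
```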
Degenerate case
If the covariance matrix $\boldsymbol\Sigma$ is not full rank, then the multivariate normal distribution is degenerate and does not have a density. More precisely, it does not have a density with respect to ''k''-dimensional Lebesgue measure (which is the usual measure assumed in calculus-level probability courses). Only random vectors whose distributions are absolutely continuous with respect to a measure are said to have densities (with respect to that measure). To talk about densities while avoiding measure-theoretic complications, it can be simpler to restrict attention to a subset of $\operatorname{rank}(\boldsymbol\Sigma)$ of the coordinates of $\mathbf{x}$ such that the covariance matrix for this subset is positive definite; the other coordinates may then be thought of as an affine function of these selected coordinates.
To talk about densities meaningfully in singular cases, then, we must select a different base measure. Using the disintegration theorem we can define a restriction of Lebesgue measure to the $\operatorname{rank}(\boldsymbol\Sigma)$-dimensional affine subspace of $\mathbb{R}^k$ where the Gaussian distribution is supported, i.e. $\left\{\boldsymbol\mu + \boldsymbol\Sigma^{1/2}\mathbf{v} : \mathbf{v} \in \mathbb{R}^k\right\}$. With respect to this measure the distribution has a density of the following form:
: $f(\mathbf{x}) = \frac{\exp\left(-\frac{1}{2}(\mathbf{x} - \boldsymbol\mu)^{\mathrm T}\boldsymbol\Sigma^{+}(\mathbf{x} - \boldsymbol\mu)\right)}{\sqrt{(2\pi)^{r}\det\nolimits^{*}(\boldsymbol\Sigma)}}$
where $\boldsymbol\Sigma^{+}$ is the generalized inverse, $r$ is the rank of $\boldsymbol\Sigma$ and $\det\nolimits^{*}$ is the pseudo-determinant.
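A sketch of the degenerate density, assuming NumPy: the generalized inverse and the pseudo-determinant are both obtained from the non-zero eigenvalues of the covariance matrix. The function name, the tolerance used to decide which eigenvalues count as zero, and the example are assumptions of this sketch. The example uses a rank-1 covariance for which the second component equals the first, so the support is the line through the origin with direction (1, 1).

```python
import numpy as np

def degenerate_mvn_pdf(x, mu, sigma, tol=1e-10):
    """Density on the support of N(mu, sigma) for a possibly singular sigma."""
    x, mu, sigma = (np.asarray(v, dtype=float) for v in (x, mu, sigma))
    eigvals, eigvecs = np.linalg.eigh(sigma)
    keep = eigvals > tol * eigvals.max()        # eigenvalues treated as non-zero
    r = int(keep.sum())                         # rank of sigma
    pseudo_det = float(np.prod(eigvals[keep]))  # product of the non-zero eigenvalues
    sigma_pinv = (eigvecs[:, keep] / eigvals[keep]) @ eigvecs[:, keep].T  # generalized inverse
    diff = x - mu
    # Assumes x lies in the support mu + range(sigma); the value is not meaningful elsewhere.
    quad = diff @ sigma_pinv @ diff
    return float(np.exp(-0.5 * quad) / np.sqrt((2.0 * np.pi) ** r * pseudo_det))

# Rank-1 example: X2 = X1 with X1 standard normal, evaluated at a point on the support
t = 0.7
on_line = degenerate_mvn_pdf([t, t], [0.0, 0.0], [[1.0, 1.0], [1.0, 1.0]])

# For a nonsingular covariance the formula agrees with the ordinary density
full_rank = degenerate_mvn_pdf([0.0, 0.0], [0.0, 0.0], np.eye(2))
```

In the rank-1 example the position along the line, measured by arc length, is normal with variance 2, which is the value that the pseudo-determinant supplies in the normalizing constant.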
Cumulative distribution function
The notion of cumulative distribution function (cdf) in dimension 1 can be extended in two ways to the multidimensional case, based on rectangular and ellipsoidal regions.
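The first way is to define the cdf $F(\mathbf{x})$ of a random vector $\mathbf{X}$ as the probability that all components of $\mathbf{X}$ are less than or equal to the corresponding values in the vector $\mathbf{x}$:
: $F(\mathbf{x}) = \mathbb{P}(\mathbf{X} \le \mathbf{x}), \quad \text{where } \mathbf{X} \sim \mathcal{N}(\boldsymbol\mu, \boldsymbol\Sigma).$
Though there is no closed form for $F(\mathbf{x})$, there are a number of algorithms that estimate it numerically.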
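One simple numerical approach, shown here only as an illustration and not as one of the specialised algorithms, is plain Monte Carlo: draw samples and count how often every component falls below its threshold. The sketch uses only the Python standard library and a bivariate vector with standard normal margins. The function name, sample size, and seed are hypothetical. At the origin the bivariate orthant probability happens to have a known closed form, $\tfrac{1}{4} + \arcsin(\rho)/(2\pi)$, which serves as a check.

```python
import math
import random

def mc_bivariate_cdf(x, y, rho, n_samples=200_000, seed=0):
    """Monte Carlo estimate of P(X <= x, Y <= y) for standard normal margins with correlation rho."""
    rng = random.Random(seed)
    s = math.sqrt(1.0 - rho * rho)
    hits = 0
    for _ in range(n_samples):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        sx = z1
        sy = rho * z1 + s * z2          # Cholesky factor of [[1, rho], [rho, 1]] applied to (z1, z2)
        hits += (sx <= x) and (sy <= y)
    return hits / n_samples

rho = 0.5
estimate = mc_bivariate_cdf(0.0, 0.0, rho)
exact_orthant = 0.25 + math.asin(rho) / (2.0 * math.pi)  # special case valid at the origin only
```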
Another way is to define the cdf $F(r)$ as the probability that a sample lies inside the ellipsoid determined by its Mahalanobis distance $r$ from the Gaussian, a direct generalization of the standard deviation.[Bensimhoun Michael, ''N-Dimensional Cumulative Function, And Other Useful Facts About Gaussians and Normal Densities'' (2006)]
In order to compute the values of this function, closed analytic formulae exist, as follows.
Interval
The interval for the multivariate normal distribution yields a region consisting of those vectors x satisfying
: $(\mathbf{x} - \boldsymbol\mu)^{\mathrm T}\boldsymbol\Sigma^{-1}(\mathbf{x} - \boldsymbol\mu) \le \chi^2_k(p).$
Here $\mathbf{x}$ is a $k$-dimensional vector, $\boldsymbol\mu$ is the known $k$-dimensional mean vector, $\boldsymbol\Sigma$ is the known covariance matrix and $\chi^2_k(p)$ is the quantile function for probability $p$ of the chi-squared distribution with $k$ degrees of freedom.
When $k = 2$, the expression defines the interior of an ellipse and the chi-squared distribution simplifies to an exponential distribution with mean equal to two (rate equal to half).
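For two dimensions the exponential form gives the quantile explicitly as $-2\ln(1 - p)$, so the region can be checked by simulation without any special functions. The following sketch uses only the Python standard library; the function names, the correlation value, the sample size, and the seed are hypothetical.

```python
import math
import random

def chi2_quantile_2dof(p):
    """Quantile of the chi-squared distribution with 2 degrees of freedom (exponential with mean 2)."""
    return -2.0 * math.log(1.0 - p)

def coverage(p, rho=0.8, n_samples=100_000, seed=1):
    """Fraction of bivariate normal samples that fall inside the region for probability p."""
    rng = random.Random(seed)
    threshold = chi2_quantile_2dof(p)
    s = math.sqrt(1.0 - rho * rho)
    inside = 0
    for _ in range(n_samples):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        x, y = z1, rho * z1 + s * z2                        # mean 0, covariance [[1, rho], [rho, 1]]
        d2 = (x * x - 2.0 * rho * x * y + y * y) / (s * s)  # squared Mahalanobis distance
        inside += d2 <= threshold
    return inside / n_samples

q95 = chi2_quantile_2dof(0.95)  # approximately 5.99
frac = coverage(0.95)           # should be close to 0.95
```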
Complementary cumulative distribution function (tail distribution)
The complementary cumulative distribution function (ccdf) or the tail distribution is defined as $\overline{F}(\mathbf{x}) = 1 - \mathbb{P}(\mathbf{X} \le \mathbf{x})$. When $\mathbf{X} \sim \mathcal{N}(\boldsymbol\mu, \boldsymbol\Sigma)$, the ccdf can be written as a probability involving the maximum of dependent Gaussian variables:
: $\overline{F}(\mathbf{x}) = \mathbb{P}\left(\bigcup_i \{X_i \ge x_i\}\right) = \mathbb{P}\left(\max_i Y_i \ge 0\right), \quad \text{where } \mathbf{Y} \sim \mathcal{N}(\boldsymbol\mu - \mathbf{x}, \boldsymbol\Sigma).$
While no simple closed formula exists for computing the ccdf, the maximum of dependent Gaussian variables can be estimated accurately via the Monte Carlo method.
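A minimal Monte Carlo sketch of the maximum formulation, assuming NumPy; the mean, covariance, threshold vector, and sample size are hypothetical. On the same samples, the estimate of the ccdf and the estimate of the rectangular cdf must add up to one, which the last lines illustrate.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical parameters
mu = np.array([0.0, 0.5, -0.5])
sigma = np.array([[1.0, 0.3, 0.2],
                  [0.3, 2.0, 0.4],
                  [0.2, 0.4, 1.5]])
x = np.array([1.0, 2.0, 1.0])

n_samples = 200_000
Y = rng.multivariate_normal(mu - x, sigma, size=n_samples)  # Y ~ N(mu - x, Sigma)
ccdf_estimate = np.mean(Y.max(axis=1) >= 0.0)               # P(max_i Y_i >= 0)

# Complementary event on the same samples: every component of Y below zero,
# which is the event X <= x for X = Y + x
cdf_estimate = np.mean(np.all(Y < 0.0, axis=1))
```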
Properties
Probability in different domains
The probability content of the multivariate normal in a quadratic domain defined by $q(\mathbf{x}) = \mathbf{x}^{\mathrm T}\mathbf{Q}_2\mathbf{x} + \mathbf{q}_1^{\mathrm T}\mathbf{x} + q_0 > 0$ (where $\mathbf{Q}_2$ is a matrix, $\mathbf{q}_1$ is a vector, and $q_0$ is a scalar), which is relevant for Bayesian classification/decision theory using Gaussian discriminant analysis, is given by the generalized chi-squared distribution.
The probability content within any general domain defined by $f(\mathbf{x}) > 0$ (where $f(\mathbf{x})$ is a general function) can be computed using the numerical method of ray-tracing (Matlab code).
Higher moments
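The ''k''th-order moments of x are given by
: $\mu_{1,\ldots,N}(\mathbf{x}) = \mu_{r_1,\ldots,r_N}(\mathbf{x}) = \operatorname{E}\left[\prod_{j=1}^{N} X_j^{r_j}\right]$
where $r_1 + r_2 + \cdots + r_N = k$.
The ''k''th-order central moments are as follows:
* If ''k'' is odd, $\mu_{1,\ldots,N}(\mathbf{x} - \boldsymbol\mu) = 0$.
* If ''k'' is even with $k = 2\lambda$, then
: $\mu_{1,\ldots,2\lambda}(\mathbf{x} - \boldsymbol\mu) = \sum\left(\sigma_{ij}\sigma_{k\ell}\cdots\sigma_{XZ}\right)$
where the sum is taken over all allocations of the set $\{1, \ldots, 2\lambda\}$ into ''λ'' (unordered) pairs. That is, for a ''k''th (= 2''λ'' = 6) central moment, one sums the products of ''λ'' = 3 covariances (the expected value ''μ'' is taken to be 0 in the interests of parsimony):
: $\begin{aligned}
\operatorname{E}[X_1 X_2 X_3 X_4 X_5 X_6] ={}& \operatorname{E}[X_1X_2]\operatorname{E}[X_3X_4]\operatorname{E}[X_5X_6] + \operatorname{E}[X_1X_2]\operatorname{E}[X_3X_5]\operatorname{E}[X_4X_6] + \operatorname{E}[X_1X_2]\operatorname{E}[X_3X_6]\operatorname{E}[X_4X_5] \\
&+ \operatorname{E}[X_1X_3]\operatorname{E}[X_2X_4]\operatorname{E}[X_5X_6] + \operatorname{E}[X_1X_3]\operatorname{E}[X_2X_5]\operatorname{E}[X_4X_6] + \operatorname{E}[X_1X_3]\operatorname{E}[X_2X_6]\operatorname{E}[X_4X_5] \\
&+ \operatorname{E}[X_1X_4]\operatorname{E}[X_2X_3]\operatorname{E}[X_5X_6] + \operatorname{E}[X_1X_4]\operatorname{E}[X_2X_5]\operatorname{E}[X_3X_6] + \operatorname{E}[X_1X_4]\operatorname{E}[X_2X_6]\operatorname{E}[X_3X_5] \\
&+ \operatorname{E}[X_1X_5]\operatorname{E}[X_2X_3]\operatorname{E}[X_4X_6] + \operatorname{E}[X_1X_5]\operatorname{E}[X_2X_4]\operatorname{E}[X_3X_6] + \operatorname{E}[X_1X_5]\operatorname{E}[X_2X_6]\operatorname{E}[X_3X_4] \\
&+ \operatorname{E}[X_1X_6]\operatorname{E}[X_2X_3]\operatorname{E}[X_4X_5] + \operatorname{E}[X_1X_6]\operatorname{E}[X_2X_4]\operatorname{E}[X_3X_5] + \operatorname{E}[X_1X_6]\operatorname{E}[X_2X_5]\operatorname{E}[X_3X_4].
\end{aligned}$
This yields $\tfrac{(2\lambda - 1)!}{2^{\lambda - 1}(\lambda - 1)!}$ terms in the sum (15 in the above case), each being the product of ''λ'' (in this case 3) covariances. For fourth-order moments (four variables) there are three terms. For sixth-order moments there are $3 \times 5 = 15$ terms, and for eighth-order moments there are $3 \times 5 \times 7 = 105$ terms.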
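The sum over pairings can be enumerated mechanically. The sketch below uses only the Python standard library; the function names and the covariance matrix are hypothetical. It generates every split of an index list into unordered pairs, counts the terms for fourth-, sixth-, and eighth-order moments, and evaluates two fourth-order central moments. Repeated indices in the list stand for repeated variables.

```python
from math import factorial

def pairings(items):
    """Yield every way to split an even-length list into unordered pairs."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        for tail in pairings(remaining):
            yield [(first, partner)] + tail

def central_moment(indices, cov):
    """E[prod (X_i - mu_i)] over `indices` (repeats allowed), as a sum over pairings."""
    if len(indices) % 2:
        return 0.0                       # odd-order central moments vanish
    total = 0.0
    for pairing in pairings(list(indices)):
        term = 1.0
        for i, j in pairing:
            term *= cov[i][j]            # one covariance per pair
        total += term
    return total

def n_terms(lam):
    """Number of pairings of 2 * lam items: (2 lam - 1)! / (2^(lam - 1) (lam - 1)!)."""
    return factorial(2 * lam - 1) // (2 ** (lam - 1) * factorial(lam - 1))

counts = [sum(1 for _ in pairings(list(range(2 * lam)))) for lam in (2, 3, 4)]  # [3, 15, 105]

cov = [[2.0, 0.5], [0.5, 1.0]]              # hypothetical covariance matrix
m4 = central_moment([0, 0, 0, 0], cov)      # 3 * cov[0][0]**2 = 12.0
m22 = central_moment([0, 0, 1, 1], cov)     # cov[0][0]*cov[1][1] + 2*cov[0][1]**2 = 2.5
```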
The covariances are then determined by replacing the terms of the list