In probability and statistics, a multivariate random variable or random vector is a list of mathematical variables each of whose value is unknown, either because the value has not yet occurred or because there is imperfect knowledge of its value. The individual variables in a random vector are grouped together because they are all part of a single mathematical system; often they represent different properties of an individual statistical unit. For example, while a given person has a specific age, height and weight, the representation of these features of ''an unspecified person'' from within a group would be a random vector. Normally each element of a random vector is a real number.
Random vectors are often used as the underlying implementation of various types of aggregate random variables, e.g. a random matrix, random tree, random sequence, stochastic process, etc.
More formally, a multivariate random variable is a column vector \mathbf{X} = (X_1,\ldots,X_n)^T (or its transpose, which is a row vector) whose components are scalar-valued random variables on the same probability space as each other, (\Omega, \mathcal{F}, P), where \Omega is the sample space, \mathcal{F} is the sigma-algebra (the collection of all events), and P is the probability measure (a function returning each event's probability).
Probability distribution
Every random vector gives rise to a probability measure on \mathbb{R}^n with the Borel algebra as the underlying sigma-algebra. This measure is also known as the joint probability distribution, the joint distribution, or the multivariate distribution of the random vector.
The distributions of each of the component random variables X_i are called marginal distributions. The conditional probability distribution of X_i given X_j is the probability distribution of X_i when X_j is known to be a particular value.
The cumulative distribution function F_{\mathbf{X}} \colon \mathbb{R}^n \to [0,1] of a random vector \mathbf{X}=(X_1,\ldots,X_n)^T is defined as
:F_{\mathbf{X}}(\mathbf{x}) = \operatorname{P}(X_1 \le x_1,\ldots,X_n \le x_n)
where \mathbf{x} = (x_1,\ldots,x_n)^T.
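As an illustration, the joint CDF can be estimated empirically from samples. The following Python sketch (using NumPy, with a two-component standard normal vector chosen only as an assumed example) approximates F_{\mathbf{X}} at the origin by Monte Carlo:

```python
import numpy as np

# Hypothetical example: estimate the joint CDF of a 2-dimensional random
# vector X = (X1, X2) by Monte Carlo, using independent standard normals.
rng = np.random.default_rng(0)
samples = rng.standard_normal((100_000, 2))  # each row is one draw of X

def empirical_cdf(samples, x):
    """Fraction of draws with every component <= the corresponding x_i."""
    return np.mean(np.all(samples <= x, axis=1))

# For independent components, F_X(x1, x2) = Phi(x1) * Phi(x2); at the
# origin this is 0.5 * 0.5 = 0.25.
value = empirical_cdf(samples, np.array([0.0, 0.0]))
print(value)  # close to 0.25
```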
Operations on random vectors
Random vectors can be subjected to the same kinds of
algebraic operations
Algebraic may refer to any subject related to algebra in mathematics and related branches like algebraic number theory and algebraic topology. The word algebra itself has several meanings.
Algebraic may also refer to:
* Algebraic data type, a dat ...
as can non-random vectors: addition, subtraction, multiplication by a
scalar, and the taking of
inner products
In mathematics, an inner product space (or, rarely, a Hausdorff pre-Hilbert space) is a real vector space or a complex vector space with an operation called an inner product. The inner product of two vectors in the space is a scalar, often d ...
.
Affine transformations
Similarly, a new random vector \mathbf{Y} can be defined by applying an affine transformation g\colon \mathbb{R}^n \to \mathbb{R}^n to a random vector \mathbf{X}:
:\mathbf{Y}=\mathcal{A}\mathbf{X}+b, where \mathcal{A} is an n \times n matrix and b is an n \times 1 column vector.
If \mathcal{A} is an invertible matrix and \mathbf{X} has a probability density function f_{\mathbf{X}}, then the probability density of \mathbf{Y} is
:f_{\mathbf{Y}}(y)=\frac{f_{\mathbf{X}}(\mathcal{A}^{-1}(y-b))}{|\det \mathcal{A}|}.
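The density formula can be sanity-checked numerically. In the following Python sketch, \mathbf{X} is taken to be a standard normal vector (an assumption made only for illustration), so that \mathbf{Y} = \mathcal{A}\mathbf{X} + b has a known normal density to compare against:

```python
import numpy as np

# Check the change-of-variables formula f_Y(y) = f_X(A^{-1}(y - b)) / |det A|
# against the closed form: if X is standard normal, Y = A X + b is normal
# with mean b and covariance A A^T.  (A, b, y are arbitrary choices.)
A = np.array([[2.0, 1.0], [0.0, 1.0]])
b = np.array([1.0, -1.0])
y = np.array([0.5, 0.3])

def std_normal_pdf(x):
    # density of an n-dimensional standard normal vector
    n = x.size
    return np.exp(-0.5 * x @ x) / (2 * np.pi) ** (n / 2)

# left-hand side: the change-of-variables formula
x = np.linalg.solve(A, y - b)
lhs = std_normal_pdf(x) / abs(np.linalg.det(A))

# right-hand side: N(b, A A^T) density evaluated directly
S = A @ A.T
d = y - b
rhs = np.exp(-0.5 * d @ np.linalg.solve(S, d)) / np.sqrt(
    (2 * np.pi) ** 2 * np.linalg.det(S))

print(lhs, rhs)  # the two agree
```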
Invertible mappings
More generally we can study invertible mappings of random vectors.
Let g be a one-to-one mapping from an open subset \mathcal{D} of \mathbb{R}^n onto a subset \mathcal{R} of \mathbb{R}^n, let g have continuous partial derivatives in \mathcal{D} and let the Jacobian determinant of g be zero at no point of \mathcal{D}. Assume that the real random vector \mathbf{X} has a probability density function f_{\mathbf{X}}(\mathbf{x}) and satisfies P(\mathbf{X} \in \mathcal{D}) = 1. Then the random vector \mathbf{Y}=g(\mathbf{X}) is of probability density
:f_{\mathbf{Y}}(\mathbf{y})=\left. \frac{f_{\mathbf{X}}(\mathbf{x})}{\left|\det\frac{\partial g(\mathbf{x})}{\partial \mathbf{x}}\right|}\right|_{\mathbf{x}=g^{-1}(\mathbf{y})} \mathbf{1}(\mathbf{y} \in R_{\mathbf{Y}})
where \mathbf{1} denotes the indicator function and the set R_{\mathbf{Y}} = \{\mathbf{y}=g(\mathbf{x}): f_{\mathbf{X}}(\mathbf{x})>0\} \subseteq \mathcal{R} denotes the support of \mathbf{Y}.
Expected value
The expected value or mean of a random vector \mathbf{X} is a fixed vector \operatorname{E}[\mathbf{X}] whose elements are the expected values of the respective random variables.
Covariance and cross-covariance
Definitions
The covariance matrix (also called second central moment or variance-covariance matrix) of an n \times 1 random vector is an n \times n matrix whose (''i,j'')th element is the covariance between the ''i''th and the ''j''th random variables. The covariance matrix is the expected value, element by element, of the n \times n matrix computed as [\mathbf{X}-\operatorname{E}[\mathbf{X}]][\mathbf{X}-\operatorname{E}[\mathbf{X}]]^T, where the superscript T refers to the transpose of the indicated vector:
:\operatorname{K}_{\mathbf{X}\mathbf{X}} = \operatorname{Var}[\mathbf{X}] = \operatorname{E}[(\mathbf{X}-\operatorname{E}[\mathbf{X}])(\mathbf{X}-\operatorname{E}[\mathbf{X}])^T]
By extension, the cross-covariance matrix between two random vectors \mathbf{X} and \mathbf{Y} (\mathbf{X} having n elements and \mathbf{Y} having p elements) is the n \times p matrix
:\operatorname{K}_{\mathbf{X}\mathbf{Y}} = \operatorname{Cov}[\mathbf{X},\mathbf{Y}] = \operatorname{E}[(\mathbf{X}-\operatorname{E}[\mathbf{X}])(\mathbf{Y}-\operatorname{E}[\mathbf{Y}])^T]
where again the matrix expectation is taken element-by-element in the matrix. Here the (''i,j'')th element is the covariance between the ''i''th element of \mathbf{X} and the ''j''th element of \mathbf{Y}.
Properties
The covariance matrix is a symmetric matrix, i.e.
:\operatorname{K}_{\mathbf{X}\mathbf{X}}^T = \operatorname{K}_{\mathbf{X}\mathbf{X}}.
The covariance matrix is a positive semidefinite matrix, i.e.
:\mathbf{a}^T \operatorname{K}_{\mathbf{X}\mathbf{X}} \mathbf{a} \ge 0 \quad \text{for all } \mathbf{a} \in \mathbb{R}^n.
The cross-covariance matrix \operatorname{Cov}[\mathbf{Y},\mathbf{X}] is simply the transpose of the matrix \operatorname{Cov}[\mathbf{X},\mathbf{Y}], i.e.
:\operatorname{K}_{\mathbf{Y}\mathbf{X}} = \operatorname{K}_{\mathbf{X}\mathbf{Y}}^T.
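Both properties can be observed directly on a sample covariance matrix. The following Python sketch uses arbitrarily chosen simulated data, so the numbers themselves carry no significance:

```python
import numpy as np

# Illustrative check: the sample covariance matrix of draws from any
# random vector is symmetric and positive semidefinite.
rng = np.random.default_rng(1)
X = rng.normal(size=(5000, 3)) @ np.array([[1.0, 0.5, 0.0],
                                           [0.0, 1.0, 0.3],
                                           [0.0, 0.0, 1.0]])
K = np.cov(X, rowvar=False)          # 3 x 3 covariance matrix estimate

symmetric = np.allclose(K, K.T)
eigenvalues = np.linalg.eigvalsh(K)  # all >= 0 for a PSD matrix
print(symmetric, eigenvalues.min() >= -1e-12)
```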
Uncorrelatedness
Two random vectors \mathbf{X}=(X_1,...,X_m)^T and \mathbf{Y}=(Y_1,...,Y_n)^T are called uncorrelated if
:\operatorname{E}[\mathbf{X} \mathbf{Y}^T] = \operatorname{E}[\mathbf{X}]\operatorname{E}[\mathbf{Y}]^T.
They are uncorrelated if and only if their cross-covariance matrix \operatorname{K}_{\mathbf{X}\mathbf{Y}} is zero.
Correlation and cross-correlation
Definitions
The correlation matrix (also called second moment) of an n \times 1 random vector is an n \times n matrix whose (''i,j'')th element is the correlation between the ''i''th and the ''j''th random variables. The correlation matrix is the expected value, element by element, of the n \times n matrix computed as \mathbf{X} \mathbf{X}^T, where the superscript T refers to the transpose of the indicated vector:
:\operatorname{R}_{\mathbf{X}\mathbf{X}} = \operatorname{E}[\mathbf{X}\mathbf{X}^T]
By extension, the cross-correlation matrix between two random vectors \mathbf{X} and \mathbf{Y} (\mathbf{X} having n elements and \mathbf{Y} having p elements) is the n \times p matrix
:\operatorname{R}_{\mathbf{X}\mathbf{Y}} = \operatorname{E}[\mathbf{X}\mathbf{Y}^T].
Properties
The correlation matrix is related to the covariance matrix by
:\operatorname{R}_{\mathbf{X}\mathbf{X}} = \operatorname{K}_{\mathbf{X}\mathbf{X}} + \operatorname{E}[\mathbf{X}]\operatorname{E}[\mathbf{X}]^T.
Similarly for the cross-correlation matrix and the cross-covariance matrix:
:\operatorname{R}_{\mathbf{X}\mathbf{Y}} = \operatorname{K}_{\mathbf{X}\mathbf{Y}} + \operatorname{E}[\mathbf{X}]\operatorname{E}[\mathbf{Y}]^T.
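A minimal numerical check of the first relation, using a toy distribution chosen only for illustration:

```python
import numpy as np

# Numerical check of R_XX = K_XX + E[X] E[X]^T on simulated data.
rng = np.random.default_rng(2)
X = rng.normal(loc=[1.0, -2.0], scale=[1.0, 3.0], size=(100_000, 2))

mu = X.mean(axis=0)                                 # estimate of E[X]
R = (X[:, :, None] * X[:, None, :]).mean(axis=0)    # E[X X^T], element-wise
K = R - np.outer(mu, mu)                            # implied covariance matrix

# compare against numpy's own covariance estimate
ok = np.allclose(K, np.cov(X, rowvar=False), atol=1e-2)
print(ok)
```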
Orthogonality
Two random vectors of the same size \mathbf{X}=(X_1,...,X_n)^T and \mathbf{Y}=(Y_1,...,Y_n)^T are called orthogonal if
:\operatorname{E}[\mathbf{X}^T \mathbf{Y}] = 0.
Independence
Two random vectors \mathbf{X} and \mathbf{Y} are called independent if for all \mathbf{x} and \mathbf{y}
:F_{\mathbf{X},\mathbf{Y}}(\mathbf{x},\mathbf{y}) = F_{\mathbf{X}}(\mathbf{x}) \cdot F_{\mathbf{Y}}(\mathbf{y})
where F_{\mathbf{X}}(\mathbf{x}) and F_{\mathbf{Y}}(\mathbf{y}) denote the cumulative distribution functions of \mathbf{X} and \mathbf{Y} and F_{\mathbf{X},\mathbf{Y}}(\mathbf{x},\mathbf{y}) denotes their joint cumulative distribution function. Independence of \mathbf{X} and \mathbf{Y} is often denoted by \mathbf{X} \perp\!\!\!\perp \mathbf{Y}.
Written component-wise, \mathbf{X} and \mathbf{Y} are called independent if for all x_1,\ldots,x_m,y_1,\ldots,y_n
:F_{X_1,\ldots,X_m,Y_1,\ldots,Y_n}(x_1,\ldots,x_m,y_1,\ldots,y_n) = F_{X_1,\ldots,X_m}(x_1,\ldots,x_m) \cdot F_{Y_1,\ldots,Y_n}(y_1,\ldots,y_n).
Characteristic function
The characteristic function of a random vector \mathbf{X} with n components is a function \mathbb{R}^n \to \mathbb{C} that maps every vector \mathbf{\omega} = (\omega_1,\ldots,\omega_n)^T to a complex number. It is defined by
:\varphi_{\mathbf{X}}(\mathbf{\omega}) = \operatorname{E}\left[e^{i(\mathbf{\omega}^T \mathbf{X})}\right] = \operatorname{E}\left[e^{i(\omega_1 X_1 + \ldots + \omega_n X_n)}\right].
Further properties
Expectation of a quadratic form
One can take the expectation of a quadratic form in the random vector \mathbf{X} as follows:
:\operatorname{E}[\mathbf{X}^T A\mathbf{X}] = \operatorname{E}[\mathbf{X}]^T A\operatorname{E}[\mathbf{X}] + \operatorname{tr}(A K_{\mathbf{X}\mathbf{X}}),
where K_{\mathbf{X}\mathbf{X}} is the covariance matrix of \mathbf{X} and \operatorname{tr} refers to the trace of a matrix, that is, to the sum of the elements on its main diagonal (from upper left to lower right). Since the quadratic form is a scalar, so is its expectation.
Proof: Let \mathbf{z} be an m \times 1 random vector with \operatorname{E}[\mathbf{z}] = \mu and \operatorname{Cov}[\mathbf{z}] = V and let A be an m \times m non-stochastic matrix.
Then based on the formula for the covariance, if we denote \mathbf{z}^T = \mathbf{X} and \mathbf{z}^T A^T = \mathbf{Y}, we see that:
:\operatorname{Cov}[\mathbf{X},\mathbf{Y}] = \operatorname{E}[\mathbf{X}\mathbf{Y}^T] - \operatorname{E}[\mathbf{X}]\operatorname{E}[\mathbf{Y}]^T
Hence
:\begin{align}
\operatorname{E}[XY^T] &= \operatorname{Cov}[X,Y] + \operatorname{E}[X]\operatorname{E}[Y]^T \\
\operatorname{E}[z^T Az] &= \operatorname{Cov}[z^T, z^T A^T] + \operatorname{E}[z^T]\operatorname{E}[z^T A^T]^T \\
&= \operatorname{Cov}[z^T, z^T A^T] + \mu^T (\mu^T A^T)^T \\
&= \operatorname{Cov}[z^T, z^T A^T] + \mu^T A \mu,
\end{align}
which leaves us to show that
:\operatorname{Cov}[z^T, z^T A^T] = \operatorname{tr}(AV).
This is true based on the fact that one can cyclically permute matrices when taking a trace without changing the end result (e.g.: \operatorname{tr}(AB) = \operatorname{tr}(BA)).
We see that
:\begin{align}
\operatorname{Cov}[z^T, z^T A^T] &= \operatorname{E}\left[\left(z^T - \operatorname{E}(z^T)\right)\left(z^T A^T - \operatorname{E}\left(z^T A^T\right)\right)^T\right] \\
&= \operatorname{E}\left[(z^T - \mu^T)(z^T A^T - \mu^T A^T)^T\right] \\
&= \operatorname{E}\left[(z - \mu)^T (Az - A\mu)\right].
\end{align}
And since
:\left(z - \mu\right)^T \left(Az - A\mu\right)
is a scalar, then
:(z - \mu)^T (Az - A\mu) = \operatorname{tr}\left((z - \mu)^T (Az - A\mu)\right) = \operatorname{tr}\left((z - \mu)^T A(z - \mu)\right)
trivially. Using the permutation we get:
:\operatorname{tr}\left((z - \mu)^T A(z - \mu)\right) = \operatorname{tr}\left(A(z - \mu)(z - \mu)^T\right),
and by plugging this into the original formula we get:
:\begin{align}
\operatorname{Cov}\left[z^T, z^T A^T\right] &= \operatorname{E}\left[(z - \mu)^T A(z - \mu)\right] \\
&= \operatorname{E}\left[\operatorname{tr}\left(A(z - \mu)(z - \mu)^T\right)\right] \\
&= \operatorname{tr}\left(A \operatorname{E}\left[(z - \mu)(z - \mu)^T\right]\right) \\
&= \operatorname{tr}(AV).
\end{align}
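The identity \operatorname{E}[\mathbf{X}^T A\mathbf{X}] = \operatorname{E}[\mathbf{X}]^T A\operatorname{E}[\mathbf{X}] + \operatorname{tr}(A K_{\mathbf{X}\mathbf{X}}) can also be checked by Monte Carlo. In the Python sketch below, the mean, covariance, and matrix A are arbitrary illustrative choices:

```python
import numpy as np

# Monte Carlo check of E[X^T A X] = E[X]^T A E[X] + tr(A K_XX).
rng = np.random.default_rng(3)
mu = np.array([1.0, 2.0])
K = np.array([[2.0, 0.6], [0.6, 1.0]])
A = np.array([[1.0, 0.2], [0.4, 3.0]])

X = rng.multivariate_normal(mu, K, size=200_000)
# compute x^T A x for every draw, then average
empirical = np.mean(np.einsum('ni,ij,nj->n', X, A, X))
theoretical = mu @ A @ mu + np.trace(A @ K)
print(empirical, theoretical)  # close
```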
Expectation of the product of two different quadratic forms
One can take the expectation of the product of two different quadratic forms in a zero-mean Gaussian random vector \mathbf{X} as follows:
:\operatorname{E}\left[(\mathbf{X}^T A\mathbf{X})(\mathbf{X}^T B\mathbf{X})\right] = 2\operatorname{tr}(A K_{\mathbf{X}\mathbf{X}} B K_{\mathbf{X}\mathbf{X}}) + \operatorname{tr}(A K_{\mathbf{X}\mathbf{X}})\operatorname{tr}(B K_{\mathbf{X}\mathbf{X}})
where again K_{\mathbf{X}\mathbf{X}} is the covariance matrix of \mathbf{X}. Again, since both quadratic forms are scalars and hence their product is a scalar, the expectation of their product is also a scalar.
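A Monte Carlo sketch of this Gaussian identity, with arbitrarily chosen symmetric matrices A and B and covariance matrix K:

```python
import numpy as np

# Check E[(X^T A X)(X^T B X)] = 2 tr(A K B K) + tr(A K) tr(B K)
# for a zero-mean Gaussian X with covariance K (toy values).
rng = np.random.default_rng(4)
K = np.array([[1.5, 0.4], [0.4, 1.0]])
A = np.array([[1.0, 0.3], [0.3, 2.0]])    # symmetric
B = np.array([[0.5, -0.2], [-0.2, 1.0]])  # symmetric

X = rng.multivariate_normal(np.zeros(2), K, size=500_000)
qa = np.einsum('ni,ij,nj->n', X, A, X)    # X^T A X per draw
qb = np.einsum('ni,ij,nj->n', X, B, X)    # X^T B X per draw
empirical = np.mean(qa * qb)
theoretical = 2 * np.trace(A @ K @ B @ K) + np.trace(A @ K) * np.trace(B @ K)
print(empirical, theoretical)
```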
Applications
Portfolio theory
In portfolio theory in finance, an objective often is to choose a portfolio of risky assets such that the distribution of the random portfolio return has desirable properties. For example, one might want to choose the portfolio return having the lowest variance for a given expected value. Here the random vector is the vector \mathbf{r} of random returns on the individual assets, and the portfolio return ''p'' (a random scalar) is the inner product of the vector of random returns with a vector ''w'' of portfolio weights (the fractions of the portfolio placed in the respective assets). Since ''p'' = ''w''^T\mathbf{r}, the expected value of the portfolio return is ''w''^T\operatorname{E}[\mathbf{r}] and the variance of the portfolio return can be shown to be ''w''^TC''w'', where C is the covariance matrix of \mathbf{r}.
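A minimal numerical illustration with made-up asset data (the return means, covariances, and weights below are invented for the example):

```python
import numpy as np

# Toy portfolio: p = w^T r has mean w^T E[r] and variance w^T C w.
mu = np.array([0.05, 0.08, 0.12])          # assumed expected asset returns
C = np.array([[0.010, 0.002, 0.001],
              [0.002, 0.020, 0.004],
              [0.001, 0.004, 0.040]])      # assumed covariance of returns
w = np.array([0.5, 0.3, 0.2])              # portfolio weights, sum to 1

expected_return = w @ mu                   # w^T E[r]
variance = w @ C @ w                       # w^T C w
print(expected_return, variance)
```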
Regression theory
In linear regression theory, we have data on ''n'' observations on a dependent variable ''y'' and ''n'' observations on each of ''k'' independent variables ''x_j''. The observations on the dependent variable are stacked into a column vector ''y''; the observations on each independent variable are also stacked into column vectors, and these latter column vectors are combined into a design matrix ''X'' (not denoting a random vector in this context) of observations on the independent variables. Then the following regression equation is postulated as a description of the process that generated the data:
:y = X \beta + e,
where β is a postulated fixed but unknown vector of ''k'' response coefficients, and ''e'' is an unknown random vector reflecting random influences on the dependent variable. By some chosen technique such as ordinary least squares, a vector \hat \beta is chosen as an estimate of β, and the estimate of the vector ''e'', denoted \hat e, is computed as
:\hat e = y - X \hat \beta.
Then the statistician must analyze the properties of \hat \beta and \hat e, which are viewed as random vectors since a randomly different selection of ''n'' cases to observe would have resulted in different values for them.
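The estimation step can be sketched on synthetic data (the coefficient values and noise level below are arbitrary choices):

```python
import numpy as np

# Generate y = X beta + e, then compute the OLS estimate
# beta_hat = (X^T X)^{-1} X^T y and the residual vector e_hat.
rng = np.random.default_rng(5)
n, k = 200, 3
X = rng.normal(size=(n, k))                    # design matrix
beta = np.array([2.0, -1.0, 0.5])              # true coefficients (assumed)
e = 0.1 * rng.normal(size=n)                   # random influences on y
y = X @ beta + e

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)   # OLS estimate of beta
e_hat = y - X @ beta_hat                       # estimated error vector

# OLS residuals are orthogonal to the columns of X by construction
print(beta_hat, np.abs(X.T @ e_hat).max())
```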
Vector time series
The evolution of a ''k''×1 random vector \mathbf{X} through time can be modelled as a vector autoregression (VAR) as follows:
:\mathbf{X}_t = c + A_1 \mathbf{X}_{t-1} + A_2 \mathbf{X}_{t-2} + \cdots + A_p \mathbf{X}_{t-p} + \mathbf{e}_t, \,
where the ''i''-periods-back vector observation \mathbf{X}_{t-i} is called the ''i''-th lag of \mathbf{X}, ''c'' is a ''k'' × 1 vector of constants (intercepts), ''A_i'' is a time-invariant ''k'' × ''k'' matrix and \mathbf{e}_t is a ''k'' × 1 random vector of error terms.
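A minimal simulation of a VAR(1) process with made-up coefficients, checking the implied long-run mean, which solves (I - A_1)\mu = c for a stable process:

```python
import numpy as np

# Simulate x_t = c + A1 x_{t-1} + e_t with invented, stable coefficients
# (the eigenvalues of A1 lie inside the unit circle).
rng = np.random.default_rng(6)
k, T = 2, 10_000
c = np.array([1.0, 0.5])
A1 = np.array([[0.5, 0.1], [0.0, 0.3]])

x = np.zeros(k)
history = np.empty((T, k))
for t in range(T):
    x = c + A1 @ x + 0.1 * rng.normal(size=k)   # one step of the VAR(1)
    history[t] = x

# the long-run mean solves (I - A1) mu = c
mu_theory = np.linalg.solve(np.eye(k) - A1, c)
mu_empirical = history[T // 10:].mean(axis=0)   # drop burn-in samples
print(mu_theory, mu_empirical)
```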
References
Further reading
*{{cite book |first=Henry |last=Stark |first2=John W. |last2=Woods |title=Probability, Statistics, and Random Processes for Engineers |publisher=Pearson |edition=Fourth |year=2012 |chapter=Random Vectors |pages=295–339 |isbn=978-0-13-231123-6}}