In probability theory and statistics, covariance is a measure of the joint variability of two random variables. If greater values of one variable mainly correspond with greater values of the other variable, and the same holds for lesser values (that is, the variables tend to show similar behavior), the covariance is positive. In the opposite case, when greater values of one variable mainly correspond to lesser values of the other (that is, the variables tend to show opposite behavior), the covariance is negative. The sign of the covariance therefore shows the tendency in the linear relationship between the variables. The magnitude of the covariance is the geometric mean of the variances that are in common for the two random variables. The correlation coefficient normalizes the covariance by dividing by the geometric mean of the total variances for the two random variables.

A distinction must be made between (1) the covariance of two random variables, which is a population parameter that can be seen as a property of the joint probability distribution, and (2) the sample covariance, which in addition to serving as a descriptor of the sample, also serves as an estimated value of the population parameter.


Definition

For two jointly distributed real-valued random variables X and Y with finite second moments, the covariance is defined as the expected value (or mean) of the product of their deviations from their individual expected values:

\operatorname{cov}(X, Y) = \operatorname{E}\left[\left(X - \operatorname{E}[X]\right)\left(Y - \operatorname{E}[Y]\right)\right]

where \operatorname{E}[X] is the expected value of X, also known as the mean of X. The covariance is also sometimes denoted \sigma_{XY} or \sigma(X, Y), in analogy to variance. By using the linearity property of expectations, this can be simplified to the expected value of their product minus the product of their expected values:

\begin{align}
\operatorname{cov}(X, Y) &= \operatorname{E}\left[\left(X - \operatorname{E}[X]\right)\left(Y - \operatorname{E}[Y]\right)\right] \\
&= \operatorname{E}\left[XY - X\operatorname{E}[Y] - \operatorname{E}[X]Y + \operatorname{E}[X]\operatorname{E}[Y]\right] \\
&= \operatorname{E}[XY] - \operatorname{E}[X]\operatorname{E}[Y] - \operatorname{E}[X]\operatorname{E}[Y] + \operatorname{E}[X]\operatorname{E}[Y] \\
&= \operatorname{E}[XY] - \operatorname{E}[X]\operatorname{E}[Y]
\end{align}

This identity is useful for mathematical derivations. From the viewpoint of numerical computation, however, it is susceptible to catastrophic cancellation (see the section on numerical computation below).

The units of measurement of the covariance \operatorname{cov}(X, Y) are those of X times those of Y. By contrast, correlation coefficients, which depend on the covariance, are a dimensionless measure of linear dependence. (In fact, correlation coefficients can simply be understood as a normalized version of covariance.)
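As a quick numerical illustration of the two equivalent forms, here is a minimal NumPy sketch; the data are synthetic and the coefficient and seed are arbitrary choices, so only the agreement of the two estimates matters:

```python
import numpy as np

# Hypothetical paired samples; Y is linearly related to X plus noise.
rng = np.random.default_rng(0)
x = rng.normal(size=100_000)
y = 2.0 * x + rng.normal(size=100_000)   # true cov(X, Y) = 2

# Definition: mean of the product of deviations from the means.
cov_def = np.mean((x - x.mean()) * (y - y.mean()))

# Simplified identity: E[XY] - E[X]E[Y].
cov_id = np.mean(x * y) - x.mean() * y.mean()

print(cov_def, cov_id)   # both approximately 2.0
```

The two expressions are algebraically identical, so they agree to floating-point accuracy here; the identity form only becomes problematic when the data sit far from zero, as discussed in the numerical computation section.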


Complex random variables

The covariance between two complex random variables Z, W is defined as

\operatorname{cov}(Z, W) = \operatorname{E}\left[\left(Z - \operatorname{E}[Z]\right)\overline{\left(W - \operatorname{E}[W]\right)}\right] = \operatorname{E}\left[Z\overline{W}\right] - \operatorname{E}[Z]\operatorname{E}\left[\overline{W}\right]

Notice the complex conjugation of the second factor in the definition. A related ''pseudo-covariance'' can also be defined.
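A brief sketch of the same computation for complex data, assuming NumPy and synthetic samples; here W is Z plus independent complex noise, so the true covariance is E[Z conj(Z)] = 2:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
z = rng.normal(size=n) + 1j * rng.normal(size=n)
w = z + rng.normal(size=n) + 1j * rng.normal(size=n)

# cov(Z, W) = E[Z conj(W)] - E[Z] E[conj(W)]; note the conjugate on W.
cov_zw = np.mean(z * np.conj(w)) - np.mean(z) * np.mean(np.conj(w))
print(cov_zw)   # approximately 2 + 0j: Var(Re Z) + Var(Im Z)
```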


Discrete random variables

If the (real) random variable pair (X, Y) can take on the values (x_i, y_i) for i = 1, \ldots, n, with equal probabilities p_i = 1/n, then the covariance can be equivalently written in terms of the means \operatorname{E}[X] and \operatorname{E}[Y] as

\operatorname{cov}(X, Y) = \frac{1}{n}\sum_{i=1}^n (x_i - E(X))(y_i - E(Y)).

It can also be equivalently expressed, without directly referring to the means, as

\operatorname{cov}(X, Y) = \frac{1}{n^2} \sum_{i=1}^n \sum_{j=1}^n \frac{1}{2}(x_i - x_j)(y_i - y_j) = \frac{1}{n^2} \sum_i \sum_{j > i} (x_i - x_j)(y_i - y_j).

More generally, if there are n possible realizations of (X, Y), namely (x_i, y_i) for i = 1, \ldots, n, but with possibly unequal probabilities p_i, then the covariance is

\operatorname{cov}(X, Y) = \sum_{i=1}^n p_i (x_i - E(X))(y_i - E(Y)).

In the case where two discrete random variables X and Y have a joint probability distribution, represented by elements p_{ij} corresponding to the joint probabilities P(X = x_i, Y = y_j), the covariance is calculated using a double summation over the indices of the matrix:

\operatorname{cov}(X, Y) = \sum_i \sum_j p_{ij} (x_i - E[X])(y_j - E[Y]).
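The equivalence of the mean-based and pairwise forms can be checked directly; this is a small NumPy sketch on arbitrary synthetic realizations with equal probabilities p_i = 1/n:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500
x = rng.normal(size=n)
y = 0.5 * x + rng.normal(size=n)

# Form using the means.
cov_means = np.mean((x - x.mean()) * (y - y.mean()))

# Pairwise form: no means needed, averaged over all ordered pairs (i, j).
cov_pairs = 0.5 * np.mean((x[:, None] - x[None, :]) * (y[:, None] - y[None, :]))

print(np.isclose(cov_means, cov_pairs))   # True: the two forms agree exactly
```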


Examples

Consider three independent random variables A, B, C and two constants q, r.

\begin{align}
X &= qA + B \\
Y &= rA + C \\
\operatorname{cov}(X, Y) &= qr\,\operatorname{var}(A)
\end{align}

In the special case q = 1 and r = 1, the covariance between X and Y is just the variance of A, and the name covariance is entirely appropriate.

Suppose that X and Y have the following joint probability mass function, in which the six central cells give the discrete joint probabilities f(x, y) of the six hypothetical realizations:

f(x, y)    x = 5    x = 6    x = 7
y = 8      0        0.4      0.1
y = 9      0.3      0        0.2

X can take on three values (5, 6 and 7) while Y can take on two (8 and 9). Their means are \mu_X = 5(0.3) + 6(0.4) + 7(0.1 + 0.2) = 6 and \mu_Y = 8(0.4 + 0.1) + 9(0.3 + 0.2) = 8.5. Then,

\begin{align}
\operatorname{cov}(X, Y) = \sigma_{XY} &= \sum_{(x,y)} f(x, y) \left(x - \mu_X\right)\left(y - \mu_Y\right) \\
&= (0)(5 - 6)(8 - 8.5) + (0.4)(6 - 6)(8 - 8.5) + (0.1)(7 - 6)(8 - 8.5)\; + \\
&\quad (0.3)(5 - 6)(9 - 8.5) + (0)(6 - 6)(9 - 8.5) + (0.2)(7 - 6)(9 - 8.5) \\
&= 0 + 0 - 0.05 - 0.15 + 0 + 0.1 = -0.1.
\end{align}
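The worked example can be verified mechanically. The NumPy sketch below enters the pmf table as a matrix and recomputes the means and the covariance; it also Monte Carlo checks the \operatorname{cov}(qA + B, rA + C) = qr\,\operatorname{var}(A) identity with illustrative values q = 2, r = -3:

```python
import numpy as np

# The joint pmf from the example: rows index x in (5, 6, 7), columns y in (8, 9).
x = np.array([5.0, 6.0, 7.0])
y = np.array([8.0, 9.0])
f = np.array([[0.0, 0.3],
              [0.4, 0.0],
              [0.1, 0.2]])

mu_x = f.sum(axis=1) @ x     # marginal of X times its values: 6.0
mu_y = f.sum(axis=0) @ y     # marginal of Y times its values: 8.5
cov = (f * np.outer(x - mu_x, y - mu_y)).sum()
print(mu_x, mu_y, cov)       # 6.0, 8.5 and approximately -0.1

# Monte Carlo check of cov(qA + B, rA + C) = q r var(A), A, B, C independent.
rng = np.random.default_rng(3)
a, b, c = rng.normal(size=(3, 200_000))
q, r = 2.0, -3.0
print(np.cov(q * a + b, r * a + c)[0, 1])   # approximately q*r*var(A) = -6
```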


Properties


Covariance with itself

The variance is a special case of the covariance in which the two variables are identical:

\operatorname{cov}(X, X) = \operatorname{var}(X) \equiv \sigma^2(X) \equiv \sigma_X^2.


Covariance of linear combinations

If X, Y, W, and V are real-valued random variables and a, b, c, d are real-valued constants, then the following facts are a consequence of the definition of covariance:

\begin{align}
\operatorname{cov}(X, a) &= 0 \\
\operatorname{cov}(X, X) &= \operatorname{var}(X) \\
\operatorname{cov}(X, Y) &= \operatorname{cov}(Y, X) \\
\operatorname{cov}(aX, bY) &= ab\,\operatorname{cov}(X, Y) \\
\operatorname{cov}(X + a, Y + b) &= \operatorname{cov}(X, Y) \\
\operatorname{cov}(aX + bY, cW + dV) &= ac\,\operatorname{cov}(X, W) + ad\,\operatorname{cov}(X, V) + bc\,\operatorname{cov}(Y, W) + bd\,\operatorname{cov}(Y, V)
\end{align}

For a sequence X_1, \ldots, X_n of real-valued random variables and constants a_1, \ldots, a_n, we have

\operatorname{var}\left(\sum_{i=1}^n a_i X_i\right) = \sum_{i=1}^n a_i^2 \sigma^2(X_i) + 2 \sum_{i < j} a_i a_j \operatorname{cov}(X_i, X_j) = \sum_{i, j} a_i a_j \operatorname{cov}(X_i, X_j).
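A numerical check of the variance-of-a-linear-combination formula, as a NumPy sketch; the mixing matrix and coefficients a_i are arbitrary, and the identity holds exactly for sample quantities because the sample covariance is bilinear as well:

```python
import numpy as np

rng = np.random.default_rng(4)
n_obs, n_vars = 100_000, 4
# Mix independent columns to induce correlations among X_1, ..., X_4.
data = rng.normal(size=(n_obs, n_vars)) @ rng.normal(size=(n_vars, n_vars))

a = np.array([1.0, -2.0, 0.5, 3.0])       # the constants a_1, ..., a_n
s = data @ a                              # samples of sum_i a_i X_i

sigma = np.cov(data, rowvar=False)        # sample covariance matrix [cov(X_i, X_j)]
# var(sum_i a_i X_i) = sum_{i,j} a_i a_j cov(X_i, X_j) = a^T Sigma a
print(np.isclose(np.var(s, ddof=1), a @ sigma @ a))   # True
```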


Hoeffding's covariance identity

A useful identity to compute the covariance between two random variables X, Y is Hoeffding's covariance identity:

\operatorname{cov}(X, Y) = \int_{\mathbb{R}} \int_{\mathbb{R}} \left(F_{XY}(x, y) - F_X(x) F_Y(y)\right) \,dx \,dy

where F_{XY}(x, y) is the joint cumulative distribution function of the random vector (X, Y) and F_X(x), F_Y(y) are the marginals.
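Hoeffding's identity can be sanity-checked by replacing the CDFs with empirical CDFs and the integral with a Riemann sum. This NumPy sketch assumes synthetic normal data and truncates the integration to a grid on [-5, 5], which is adequate here because the integrand decays in the tails:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 20_000
x = rng.normal(size=n)
y = 0.6 * x + 0.8 * rng.normal(size=n)   # true cov(X, Y) = 0.6

# Empirical CDFs evaluated on a grid covering (most of) the support.
s = np.linspace(-5.0, 5.0, 201)
ds = s[1] - s[0]
ix = (x[:, None] <= s).astype(float)     # indicators 1{x_k <= s_i}
iy = (y[:, None] <= s).astype(float)
fx, fy = ix.mean(axis=0), iy.mean(axis=0)
fxy = ix.T @ iy / n                      # joint empirical CDF F_XY(s_i, s_j)

# Riemann sum of (F_XY - F_X F_Y) over the grid approximates the identity.
hoeffding = ((fxy - np.outer(fx, fy)) * ds * ds).sum()
print(hoeffding, np.cov(x, y)[0, 1])     # both approximately 0.6
```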


Uncorrelatedness and independence

Random variables whose covariance is zero are called uncorrelated. Similarly, the components of random vectors whose covariance matrix is zero in every entry outside the main diagonal are also called uncorrelated.

If X and Y are independent random variables, then their covariance is zero. This follows because under independence,

\operatorname{E}[XY] = \operatorname{E}[X] \cdot \operatorname{E}[Y].

The converse, however, is not generally true. For example, let X be uniformly distributed in [-1, 1] and let Y = X^2. Clearly, X and Y are not independent, but

\begin{align}
\operatorname{cov}(X, Y) &= \operatorname{cov}\left(X, X^2\right) \\
&= \operatorname{E}\left[X \cdot X^2\right] - \operatorname{E}[X] \cdot \operatorname{E}\left[X^2\right] \\
&= \operatorname{E}\left[X^3\right] - \operatorname{E}[X]\operatorname{E}\left[X^2\right] \\
&= 0 - 0 \cdot \operatorname{E}\left[X^2\right] \\
&= 0.
\end{align}

In this case, the relationship between Y and X is non-linear, while correlation and covariance are measures of linear dependence between two random variables. This example shows that if two random variables are uncorrelated, that does not in general imply that they are independent. However, if two variables are jointly normally distributed (but not if they are merely individually normally distributed), uncorrelatedness ''does'' imply independence.

X and Y whose covariance is positive are called positively correlated, which implies that if X > E[X] then likely Y > E[Y]. Conversely, X and Y with negative covariance are negatively correlated, and if X > E[X] then likely Y < E[Y].
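A direct numerical check of the uniform-on-[-1, 1] counterexample (NumPy sketch with synthetic samples):

```python
import numpy as np

# X uniform on [-1, 1], Y = X^2: dependent but uncorrelated.
rng = np.random.default_rng(6)
x = rng.uniform(-1.0, 1.0, size=200_000)
y = x ** 2

print(np.cov(x, y)[0, 1])        # approximately 0: no linear relationship
# The dependence is visible through a nonlinear moment instead:
print(np.cov(x ** 2, y)[0, 1])   # approximately var(X^2) = 4/45 ≈ 0.089
```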


Relationship to inner products

Many of the properties of covariance can be extracted elegantly by observing that it satisfies similar properties to those of an inner product:

# bilinear: for constants a and b and random variables X, Y, Z, \operatorname{cov}(aX + bY, Z) = a\,\operatorname{cov}(X, Z) + b\,\operatorname{cov}(Y, Z)
# symmetric: \operatorname{cov}(X, Y) = \operatorname{cov}(Y, X)
# positive semi-definite: \sigma^2(X) = \operatorname{cov}(X, X) \ge 0 for all random variables X, and \operatorname{cov}(X, X) = 0 implies that X is constant almost surely.

In fact these properties imply that the covariance defines an inner product over the quotient vector space obtained by taking the subspace of random variables with finite second moment and identifying any two that differ by a constant. (This identification turns the positive semi-definiteness above into positive definiteness.) That quotient vector space is isomorphic to the subspace of random variables with finite second moment and mean zero; on that subspace, the covariance is exactly the L2 inner product of real-valued functions on the sample space.

As a result, for random variables with finite variance, the inequality

\left|\operatorname{cov}(X, Y)\right| \le \sqrt{\sigma^2(X)\,\sigma^2(Y)}

holds via the Cauchy–Schwarz inequality.

Proof: If \sigma^2(Y) = 0, then it holds trivially. Otherwise, let the random variable

Z = X - \frac{\operatorname{cov}(X, Y)}{\sigma^2(Y)} Y.

Then we have

\begin{align}
0 \le \sigma^2(Z) &= \operatorname{cov}\left(X - \frac{\operatorname{cov}(X, Y)}{\sigma^2(Y)} Y,\; X - \frac{\operatorname{cov}(X, Y)}{\sigma^2(Y)} Y\right) \\
&= \sigma^2(X) - \frac{(\operatorname{cov}(X, Y))^2}{\sigma^2(Y)},
\end{align}

which implies (\operatorname{cov}(X, Y))^2 \le \sigma^2(X)\sigma^2(Y), that is, \left|\operatorname{cov}(X, Y)\right| \le \sqrt{\sigma^2(X)\,\sigma^2(Y)}.
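The Cauchy–Schwarz bound can be spot-checked on data; since the sample covariance is itself an inner product on centered observations, the bound holds exactly for sample quantities as well (NumPy sketch, arbitrary synthetic data):

```python
import numpy as np

rng = np.random.default_rng(7)
x = rng.normal(size=10_000)
y = 0.3 * x + rng.normal(size=10_000)

c = np.cov(x, y)   # 2 x 2 matrix: variances on the diagonal, covariance off it
# |cov(X, Y)| <= sqrt(var(X) var(Y))
print(abs(c[0, 1]) <= np.sqrt(c[0, 0] * c[1, 1]))   # True
```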


Calculating the sample covariance

The sample covariances among K variables based on N observations of each, drawn from an otherwise unobserved population, are given by the K \times K matrix \overline{\mathbf{q}} = \left[q_{jk}\right] with the entries

:q_{jk} = \frac{1}{N - 1} \sum_{i=1}^N \left(X_{ij} - \bar{X}_j\right)\left(X_{ik} - \bar{X}_k\right),

which is an estimate of the covariance between variable j and variable k.

The sample mean and the sample covariance matrix are unbiased estimates of the mean and the covariance matrix of the random vector \mathbf{X}, a vector whose jth element (j = 1, \ldots, K) is one of the random variables. The reason the sample covariance matrix has N - 1 in the denominator rather than N is essentially that the population mean \operatorname{E}(\mathbf{X}) is not known and is replaced by the sample mean \bar{\mathbf{X}}. If the population mean \operatorname{E}(\mathbf{X}) is known, the analogous unbiased estimate is given by

:q_{jk} = \frac{1}{N} \sum_{i=1}^N \left(X_{ij} - \operatorname{E}\left(X_j\right)\right)\left(X_{ik} - \operatorname{E}\left(X_k\right)\right).
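A minimal NumPy sketch of this estimator on hypothetical data, checking the N - 1 form against numpy.cov (which uses the same denominator by default):

```python
import numpy as np

# N observations of K correlated variables (synthetic data).
rng = np.random.default_rng(8)
N, K = 1_000, 3
data = rng.normal(size=(N, K)) @ rng.normal(size=(K, K))

xbar = data.mean(axis=0)                    # sample mean of each variable
centered = data - xbar
q = centered.T @ centered / (N - 1)         # K x K matrix with entries q_jk

print(np.allclose(q, np.cov(data, rowvar=False)))   # True: matches NumPy's estimator
```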


Generalizations


Auto-covariance matrix of real random vectors

For a vector \mathbf{X} = \begin{bmatrix} X_1 & X_2 & \dots & X_m \end{bmatrix}^{\mathrm{T}} of m jointly distributed random variables with finite second moments, its auto-covariance matrix (also known as the variance–covariance matrix or simply the covariance matrix) \operatorname{K}_{\mathbf{XX}} (also denoted by \Sigma(\mathbf{X}) or \operatorname{cov}(\mathbf{X}, \mathbf{X})) is defined as

\begin{align}
\operatorname{K}_{\mathbf{XX}} = \operatorname{cov}(\mathbf{X}, \mathbf{X}) &= \operatorname{E}\left[(\mathbf{X} - \operatorname{E}[\mathbf{X}])(\mathbf{X} - \operatorname{E}[\mathbf{X}])^{\mathrm{T}}\right] \\
&= \operatorname{E}\left[\mathbf{X}\mathbf{X}^{\mathrm{T}}\right] - \operatorname{E}[\mathbf{X}]\operatorname{E}[\mathbf{X}]^{\mathrm{T}}.
\end{align}

Let \mathbf{X} be a random vector with covariance matrix \Sigma, and let \mathbf{A} be a matrix that can act on \mathbf{X} on the left. The covariance matrix of the matrix-vector product \mathbf{AX} is:

\begin{align}
\operatorname{cov}(\mathbf{AX}, \mathbf{AX}) &= \operatorname{E}\left[\mathbf{AX}(\mathbf{AX})^{\mathrm{T}}\right] - \operatorname{E}[\mathbf{AX}]\operatorname{E}\left[(\mathbf{AX})^{\mathrm{T}}\right] \\
&= \operatorname{E}\left[\mathbf{A}\mathbf{X}\mathbf{X}^{\mathrm{T}}\mathbf{A}^{\mathrm{T}}\right] - \operatorname{E}[\mathbf{AX}]\operatorname{E}\left[\mathbf{X}^{\mathrm{T}}\mathbf{A}^{\mathrm{T}}\right] \\
&= \mathbf{A}\operatorname{E}\left[\mathbf{X}\mathbf{X}^{\mathrm{T}}\right]\mathbf{A}^{\mathrm{T}} - \mathbf{A}\operatorname{E}[\mathbf{X}]\operatorname{E}\left[\mathbf{X}^{\mathrm{T}}\right]\mathbf{A}^{\mathrm{T}} \\
&= \mathbf{A}\left(\operatorname{E}\left[\mathbf{X}\mathbf{X}^{\mathrm{T}}\right] - \operatorname{E}[\mathbf{X}]\operatorname{E}\left[\mathbf{X}^{\mathrm{T}}\right]\right)\mathbf{A}^{\mathrm{T}} \\
&= \mathbf{A}\Sigma\mathbf{A}^{\mathrm{T}}.
\end{align}

This is a direct result of the linearity of expectation and is useful when applying a linear transformation, such as a whitening transformation, to a vector.
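The transformation rule can be confirmed empirically. In this NumPy sketch the matrix A and the mixing used to correlate X are arbitrary; the identity holds exactly for sample covariance matrices by bilinearity:

```python
import numpy as np

rng = np.random.default_rng(9)
m, n_obs = 3, 100_000
x = rng.normal(size=(n_obs, m)) @ rng.normal(size=(m, m))   # samples of X
a = rng.normal(size=(2, m))                                 # a 2 x m matrix A

sigma = np.cov(x, rowvar=False)      # sample covariance matrix of X
ax = x @ a.T                         # the corresponding samples of AX

# cov(AX) = A Sigma A^T
print(np.allclose(np.cov(ax, rowvar=False), a @ sigma @ a.T))   # True
```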


Cross-covariance matrix of real random vectors

For real random vectors \mathbf{X} \in \mathbb{R}^m and \mathbf{Y} \in \mathbb{R}^n, the m \times n cross-covariance matrix is equal to

\begin{align}
\operatorname{K}_{\mathbf{XY}} = \operatorname{cov}(\mathbf{X}, \mathbf{Y}) &= \operatorname{E}\left[(\mathbf{X} - \operatorname{E}[\mathbf{X}])(\mathbf{Y} - \operatorname{E}[\mathbf{Y}])^{\mathrm{T}}\right] \\
&= \operatorname{E}\left[\mathbf{X}\mathbf{Y}^{\mathrm{T}}\right] - \operatorname{E}[\mathbf{X}]\operatorname{E}[\mathbf{Y}]^{\mathrm{T}},
\end{align}

where \mathbf{Y}^{\mathrm{T}} is the transpose of the vector (or matrix) \mathbf{Y}. The (i, j)-th element of this matrix is equal to the covariance \operatorname{cov}(X_i, Y_j) between the i-th scalar component of \mathbf{X} and the j-th scalar component of \mathbf{Y}. In particular, \operatorname{cov}(\mathbf{Y}, \mathbf{X}) is the transpose of \operatorname{cov}(\mathbf{X}, \mathbf{Y}).
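A short NumPy sketch with synthetic data, computing both cross-covariance matrices and checking the transpose relationship:

```python
import numpy as np

rng = np.random.default_rng(10)
n_obs = 50_000
x = rng.normal(size=(n_obs, 3))                                  # X in R^3
y = x @ rng.normal(size=(3, 2)) + rng.normal(size=(n_obs, 2))    # Y in R^2

xc = x - x.mean(axis=0)
yc = y - y.mean(axis=0)
cov_xy = xc.T @ yc / (n_obs - 1)     # 3 x 2 cross-covariance matrix
cov_yx = yc.T @ xc / (n_obs - 1)     # 2 x 3 cross-covariance matrix

print(np.allclose(cov_xy, cov_yx.T))   # True: cov(Y, X) = cov(X, Y)^T
```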


Cross-covariance sesquilinear form of random vectors in a real or complex Hilbert space

More generally, let H_1 = (H_1, \langle\,\cdot,\cdot\,\rangle_1) and H_2 = (H_2, \langle\,\cdot,\cdot\,\rangle_2) be Hilbert spaces over \mathbb{R} or \mathbb{C} with \langle\,\cdot,\cdot\,\rangle antilinear in the first variable, and let \mathbf{X}, \mathbf{Y} be H_1- resp. H_2-valued random variables. Then the covariance of \mathbf{X} and \mathbf{Y} is the sesquilinear form on H_1 \times H_2 (antilinear in the first variable) given by

\begin{align}
\operatorname{K}_{\mathbf{XY}}(h_1, h_2) = \operatorname{cov}(\mathbf{X}, \mathbf{Y})(h_1, h_2) &= \operatorname{E}\left[\langle h_1, \mathbf{X} - \operatorname{E}[\mathbf{X}]\rangle_1 \langle \mathbf{Y} - \operatorname{E}[\mathbf{Y}], h_2\rangle_2\right] \\
&= \operatorname{E}\left[\langle h_1, \mathbf{X}\rangle_1 \langle \mathbf{Y}, h_2\rangle_2\right] - \operatorname{E}\left[\langle h_1, \mathbf{X}\rangle_1\right] \operatorname{E}\left[\langle \mathbf{Y}, h_2\rangle_2\right] \\
&= \left\langle h_1, \operatorname{E}\left[(\mathbf{X} - \operatorname{E}[\mathbf{X}])(\mathbf{Y} - \operatorname{E}[\mathbf{Y}])^\dagger\right] h_2 \right\rangle_1 \\
&= \left\langle h_1, \left(\operatorname{E}\left[\mathbf{X}\mathbf{Y}^\dagger\right] - \operatorname{E}[\mathbf{X}]\operatorname{E}[\mathbf{Y}]^\dagger\right) h_2 \right\rangle_1
\end{align}


Numerical computation

When \operatorname{E}[XY] \approx \operatorname{E}[X]\operatorname{E}[Y], the equation \operatorname{cov}(X, Y) = \operatorname{E}[XY] - \operatorname{E}[X]\operatorname{E}[Y] is prone to catastrophic cancellation if \operatorname{E}[XY] and \operatorname{E}[X]\operatorname{E}[Y] are not computed exactly; it should therefore be avoided in computer programs when the data have not been centered beforehand. Numerically stable algorithms should be preferred in this case.
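The failure mode is easy to reproduce: give the data a large common offset so that \operatorname{E}[XY] and \operatorname{E}[X]\operatorname{E}[Y] are huge and nearly equal. This NumPy sketch (the offset and sample size are arbitrary) compares the naive identity with a stable two-pass form:

```python
import numpy as np

# Data far from zero: a large common offset with small fluctuations on top.
rng = np.random.default_rng(11)
n = 1_000_000
offset = 1e9
dx = rng.normal(size=n)
x = offset + dx
y = offset + 0.5 * dx + rng.normal(size=n)   # true cov(X, Y) = 0.5

# Naive identity: subtracts two nearly equal numbers of magnitude ~1e18,
# so almost all significant digits cancel.
naive = np.mean(x * y) - np.mean(x) * np.mean(y)

# Stable two-pass form: center first, then average products of deviations.
stable = np.mean((x - x.mean()) * (y - y.mean()))

print(naive, stable)   # naive is dominated by rounding error; stable ≈ 0.5
```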


Comments

The covariance is sometimes called a measure of "linear dependence" between the two random variables. That does not mean the same thing as in the context of linear algebra (see linear dependence). When the covariance is normalized, one obtains the Pearson correlation coefficient, which gives the goodness of the fit for the best possible linear function describing the relation between the variables. In this sense covariance is a linear gauge of dependence.


Applications


In genetics and molecular biology

Covariance is an important measure in biology. Certain sequences of DNA are conserved more than others among species, and thus to study secondary and tertiary structures of proteins, or of RNA structures, sequences are compared in closely related species. If sequence changes are found, or no changes at all are found, in noncoding RNA (such as microRNA), sequences are found to be necessary for common structural motifs, such as an RNA loop.

In genetics, covariance serves as a basis for computation of the genetic relationship matrix (GRM) (also known as the kinship matrix), enabling inference on population structure from a sample with no known close relatives, as well as inference on the estimation of heritability of complex traits.

In the theory of evolution and natural selection, the Price equation describes how a genetic trait changes in frequency over time. The equation uses a covariance between a trait and fitness to give a mathematical description of evolution and natural selection. It provides a way to understand the effects that gene transmission and natural selection have on the proportion of genes within each new generation of a population.


In financial economics

Covariances play a key role in financial economics, especially in modern portfolio theory and in the capital asset pricing model. Covariances among various assets' returns are used to determine, under certain assumptions, the relative amounts of different assets that investors should (in a normative analysis) or are predicted to (in a positive analysis) choose to hold in a context of diversification.


In meteorological and oceanographic data assimilation

The covariance matrix is important in estimating the initial conditions required for running weather forecast models, a procedure known as data assimilation. The "forecast error covariance matrix" is typically constructed between perturbations around a mean state (either a climatological or ensemble mean). The "observation error covariance matrix" is constructed to represent the magnitude of combined observational errors (on the diagonal) and the correlated errors between measurements (off the diagonal). This is an example of its widespread application to Kalman filtering and more general state estimation for time-varying systems.


In micrometeorology

The eddy covariance technique is a key atmospheric measurement technique in which the covariance between the instantaneous deviation in vertical wind speed from its mean value and the instantaneous deviation in gas concentration is the basis for calculating vertical turbulent fluxes.


In signal processing

The covariance matrix is used to capture the spectral variability of a signal.


In statistics and image processing

The covariance matrix is used in principal component analysis to reduce feature dimensionality in data preprocessing.


See also

* Algorithms for calculating covariance
* Analysis of covariance
* Autocovariance
* Covariance function
* Covariance matrix
* Covariance operator
* Distance covariance, or Brownian covariance
* Law of total covariance
* Propagation of uncertainty

