Law of total variance

In probability theory, the law of total variance (also known as the variance decomposition formula, the conditional variance formula, the law of iterated variances, or Eve's law) states that if X and Y are random variables on the same probability space, and the variance of Y is finite, then

\operatorname{Var}(Y) = \operatorname{E}[\operatorname{Var}(Y \mid X)] + \operatorname{Var}(\operatorname{E}[Y \mid X]).

In language perhaps better known to statisticians than to probability theorists, the two terms are the "unexplained" and the "explained" components of the variance respectively (cf. fraction of variance unexplained, explained variation). In actuarial science, specifically credibility theory, the first component is called the expected value of the process variance (EVPV) and the second is called the variance of the hypothetical means (VHM). These two components are also the source of the term "Eve's law", from the initials EV VE for "expectation of variance" and "variance of expectation".
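The decomposition can be illustrated numerically. The following is a minimal sketch (not part of the original article) assuming only NumPy: it draws a discrete X and a conditionally normal Y, then compares the simulated \operatorname{Var}(Y) with \operatorname{E}[\operatorname{Var}(Y \mid X)] + \operatorname{Var}(\operatorname{E}[Y \mid X]) computed from the mixture parameters.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

# X takes the values 0, 1, 2 with probabilities 0.5, 0.3, 0.2 (illustrative choice).
p = np.array([0.5, 0.3, 0.2])
means = np.array([0.0, 2.0, 5.0])     # E[Y | X = x]
sds = np.array([1.0, 0.5, 2.0])       # sd(Y | X = x)

x = rng.choice(3, size=n, p=p)
y = rng.normal(means[x], sds[x])

# Left-hand side: total variance of Y, estimated from the sample.
total_var = y.var()

# Right-hand side: E[Var(Y|X)] + Var(E[Y|X]), computed exactly from the mixture.
expected_cond_var = np.sum(p * sds**2)                       # E[Var(Y | X)]
var_cond_mean = np.sum(p * means**2) - np.sum(p * means)**2  # Var(E[Y | X])

print(total_var, expected_cond_var + var_cond_mean)  # agree up to Monte Carlo error
```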


Formulation

There is a general variance decomposition formula for c \geq 2 components (see below).Bowsher, C.G. and P.S. Swain, Identifying sources of variation and the flow of information in biochemical networks, PNAS May 15, 2012 109 (20) E1320-E1328. For example, with two conditioning random variables:

\operatorname{Var}(Y) = \operatorname{E}\left[\operatorname{Var}\left(Y \mid X_1, X_2\right)\right] + \operatorname{E}\left[\operatorname{Var}\left(\operatorname{E}\left[Y \mid X_1, X_2\right] \mid X_1\right)\right] + \operatorname{Var}\left(\operatorname{E}\left[Y \mid X_1\right]\right),

which follows from the law of total conditional variance:

\operatorname{Var}(Y \mid X_1) = \operatorname{E}\left[\operatorname{Var}(Y \mid X_1, X_2) \mid X_1\right] + \operatorname{Var}\left(\operatorname{E}\left[Y \mid X_1, X_2\right] \mid X_1\right).

Note that the conditional expected value \operatorname{E}(Y \mid X) is a random variable in its own right, whose value depends on the value of X. The conditional expected value of Y given the event X = x is a function of x (this is where adherence to the conventional and rigidly case-sensitive notation of probability theory becomes important). If we write \operatorname{E}(Y \mid X = x) = g(x), then the random variable \operatorname{E}(Y \mid X) is just g(X). Similar comments apply to the conditional variance.

One special case (similar to the law of total expectation) states that if A_1, \ldots, A_n is a partition of the whole outcome space, that is, these events are mutually exclusive and exhaustive, then

\begin{align}
\operatorname{Var}(X) = {} & \sum_{i=1}^n \operatorname{Var}(X \mid A_i) \Pr(A_i) + \sum_{i=1}^n \operatorname{E}[X \mid A_i]^2 (1 - \Pr(A_i)) \Pr(A_i) \\
& - 2 \sum_{i=2}^n \sum_{j=1}^{i-1} \operatorname{E}[X \mid A_i] \Pr(A_i) \operatorname{E}[X \mid A_j] \Pr(A_j).
\end{align}

In this formula, the first component is the expectation of the conditional variance; the other two components together give the variance of the conditional expectation.
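The partition identity above can be verified exactly on a small finite example. The sketch below is illustrative only (the probabilities, values, and partition are arbitrary choices, not from the source); it computes \operatorname{Var}(X) directly and then again from the three sums of the formula.

```python
import numpy as np

# A small outcome space: probabilities, values of X, and a partition label per outcome.
prob = np.array([0.1, 0.2, 0.3, 0.15, 0.25])
xval = np.array([1.0, 2.0, 4.0, 8.0, 16.0])
label = np.array([0, 0, 1, 1, 2])   # assigns each outcome to one of A_0, A_1, A_2

# Direct computation of Var(X).
mean_x = np.sum(prob * xval)
var_x = np.sum(prob * xval**2) - mean_x**2

# Ingredients of the partition formula.
events = np.unique(label)
pr = np.array([prob[label == a].sum() for a in events])                       # Pr(A_i)
cond_mean = np.array([np.sum(prob[label == a] * xval[label == a]) / prob[label == a].sum()
                      for a in events])                                       # E[X | A_i]
cond_var = np.array([np.sum(prob[label == a] * xval[label == a]**2) / prob[label == a].sum() - m**2
                     for a, m in zip(events, cond_mean)])                     # Var(X | A_i)

# Right-hand side of the partition identity.
rhs = np.sum(cond_var * pr) + np.sum(cond_mean**2 * (1 - pr) * pr)
for i in range(1, len(events)):
    for j in range(i):
        rhs -= 2 * cond_mean[i] * pr[i] * cond_mean[j] * pr[j]

print(var_x, rhs)   # both numbers coincide
```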


Proof

The law of total variance can be proved using the law of total expectation.Neil A. Weiss, ''A Course in Probability'', Addison–Wesley, 2005, pages 380–383. First,

\operatorname{Var}(Y) = \operatorname{E}\left[Y^2\right] - \operatorname{E}[Y]^2

from the definition of variance. Again, from the definition of variance, and applying the law of total expectation, we have

\operatorname{E}\left[Y^2\right] = \operatorname{E}\left[\operatorname{E}[Y^2 \mid X]\right] = \operatorname{E}\left[\operatorname{Var}(Y \mid X) + [\operatorname{E}(Y \mid X)]^2\right].

Now we rewrite the conditional second moment of Y in terms of its variance and first moment, and apply the law of total expectation on the right-hand side:

\operatorname{E}\left[Y^2\right] - \operatorname{E}[Y]^2 = \operatorname{E}\left[\operatorname{Var}(Y \mid X) + [\operatorname{E}(Y \mid X)]^2\right] - [\operatorname{E}[\operatorname{E}(Y \mid X)]]^2.

Since the expectation of a sum is the sum of expectations, the terms can now be regrouped:

= \operatorname{E}[\operatorname{Var}(Y \mid X)] + \left(\operatorname{E}\left[\operatorname{E}(Y \mid X)^2\right] - [\operatorname{E}[\operatorname{E}(Y \mid X)]]^2\right).

Finally, we recognize the terms in the second set of parentheses as the variance of the conditional expectation \operatorname{E}(Y \mid X):

= \operatorname{E}[\operatorname{Var}(Y \mid X)] + \operatorname{Var}(\operatorname{E}(Y \mid X)).
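As a worked application of the result just proved (an illustrative example not contained in the source text), let N \sim \operatorname{Poisson}(\lambda) and, given N, let Y = Z_1 + \cdots + Z_N, where the Z_i are i.i.d. with mean \mu and variance \sigma^2, independent of N. Then \operatorname{E}(Y \mid N) = N\mu and \operatorname{Var}(Y \mid N) = N\sigma^2, so the law of total variance gives

\operatorname{Var}(Y) = \operatorname{E}[N\sigma^2] + \operatorname{Var}(N\mu) = \lambda\sigma^2 + \lambda\mu^2 = \lambda(\sigma^2 + \mu^2),

using \operatorname{E}(N) = \operatorname{Var}(N) = \lambda.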


General variance decomposition applicable to dynamic systems

The following formula shows how to apply the general, measure-theoretic variance decomposition formula to stochastic dynamic systems. Let Y(t) be the value of a system variable at time t. Suppose we have the internal histories (natural filtrations) H_{1t}, H_{2t}, \ldots, H_{c-1,t}, each one corresponding to the history (trajectory) of a different collection of system variables. The collections need not be disjoint. The variance of Y(t) can be decomposed, for all times t, into c \geq 2 components as follows:

\begin{align}
\operatorname{Var}[Y(t)] = {} & \operatorname{E}\left(\operatorname{Var}\left[Y(t) \mid H_{1t}, H_{2t}, \ldots, H_{c-1,t}\right]\right) \\
& + \sum_{j=2}^{c-1} \operatorname{E}\left(\operatorname{Var}\left(\operatorname{E}\left[Y(t) \mid H_{1t}, H_{2t}, \ldots, H_{jt}\right] \mid H_{1t}, H_{2t}, \ldots, H_{j-1,t}\right)\right) \\
& + \operatorname{Var}\left(\operatorname{E}\left[Y(t) \mid H_{1t}\right]\right).
\end{align}

The decomposition is not unique. It depends on the order of the conditioning in the sequential decomposition.
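A rough numerical illustration of the sequential decomposition with two conditioning variables is given below. It is a sketch under toy dynamics invented for the purpose (not taken from the cited paper): the conditional expectations are estimated by grouping on discrete states, and the three estimated components are compared with the total variance.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2_000_000

# A toy "system": two earlier discrete states and a final observation Y
# that depends on both (illustrative dynamics chosen only for the demonstration).
x1 = rng.integers(0, 3, size=n)                  # first conditioning variable / history
x2 = (x1 + rng.integers(0, 2, size=n)) % 3       # second, partially driven by the first
y = 2.0 * x1 - x2 + rng.normal(0.0, 1.0, size=n)

def group_mean(values, keys):
    """E[values | keys], estimated by averaging within each observed key."""
    out = np.empty_like(values)
    for k in np.unique(keys):
        mask = keys == k
        out[mask] = values[mask].mean()
    return out

# E[Y | X1, X2] and E[Y | X1], estimated by grouping on the discrete states.
key12 = x1 * 3 + x2
e_y_12 = group_mean(y, key12)
e_y_1 = group_mean(y, x1)

term1 = np.mean((y - e_y_12) ** 2)           # E[ Var(Y | X1, X2) ]
term2 = np.mean((e_y_12 - e_y_1) ** 2)       # E[ Var( E[Y | X1, X2] | X1 ) ]
term3 = np.var(e_y_1)                        # Var( E[Y | X1] )

print(np.var(y), term1 + term2 + term3)      # the decomposition holds up to sampling error
```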


The square of the correlation and explained (or informational) variation

In cases where (Y, X) are such that the conditional expected value is linear, that is, in cases where

\operatorname{E}(Y \mid X) = a X + b,

it follows from the bilinearity of covariance that

a = \frac{\operatorname{Cov}(Y, X)}{\operatorname{Var}(X)} \quad\text{and}\quad b = \operatorname{E}(Y) - a\operatorname{E}(X),

and the explained component of the variance divided by the total variance is just the square of the correlation between Y and X; that is, in such cases,

\frac{\operatorname{Var}(\operatorname{E}(Y \mid X))}{\operatorname{Var}(Y)} = \operatorname{Corr}(X, Y)^2.

One example of this situation is when (X, Y) have a bivariate normal (Gaussian) distribution. More generally, when the conditional expectation \operatorname{E}(Y \mid X) is a non-linear function of X,

\iota_{Y \mid X} = \frac{\operatorname{Var}(\operatorname{E}(Y \mid X))}{\operatorname{Var}(Y)} = \operatorname{Corr}(\operatorname{E}(Y \mid X), Y)^2,

which can be estimated as the R squared from a non-linear regression of Y on X, using data drawn from the joint distribution of (X, Y). When \operatorname{E}(Y \mid X) has a Gaussian distribution (and is an invertible function of X), or Y itself has a (marginal) Gaussian distribution, this explained component of variation sets a lower bound on the mutual information:

\operatorname{I}(Y; X) \geq \ln\left(\left[1 - \iota_{Y \mid X}\right]^{-1/2}\right).
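The linear and non-linear cases can be contrasted with a short simulation. In the sketch below (an illustration, not part of the article; the specific coefficients are arbitrary), the explained share of variance equals \operatorname{Corr}(X, Y)^2 when \operatorname{E}(Y \mid X) is linear in X, while in a non-linear case it instead matches \operatorname{Corr}(\operatorname{E}(Y \mid X), Y)^2.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1_000_000

# Linear case: E[Y | X] = 1.5 X + 2, so the explained share equals Corr(X, Y)^2.
x = rng.normal(size=n)
y_lin = 1.5 * x + 2.0 + rng.normal(0.0, 2.0, size=n)
explained_lin = np.var(1.5 * x + 2.0) / np.var(y_lin)        # Var(E[Y|X]) / Var(Y)
print(explained_lin, np.corrcoef(x, y_lin)[0, 1] ** 2)

# Non-linear case: E[Y | X] = X^2; the explained share exceeds Corr(X, Y)^2
# but still equals Corr(E[Y|X], Y)^2.
y_nl = x ** 2 + rng.normal(0.0, 1.0, size=n)
explained_nl = np.var(x ** 2) / np.var(y_nl)
print(explained_nl,
      np.corrcoef(x, y_nl)[0, 1] ** 2,          # near zero here
      np.corrcoef(x ** 2, y_nl)[0, 1] ** 2)     # matches explained_nl
```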


Higher moments

A similar law for the third central moment \mu_3 says

\mu_3(Y) = \operatorname{E}\left(\mu_3(Y \mid X)\right) + \mu_3(\operatorname{E}(Y \mid X)) + 3\operatorname{Cov}(\operatorname{E}(Y \mid X), \operatorname{Var}(Y \mid X)).

For higher cumulants, a generalization exists. See the law of total cumulance.
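The third-moment identity can be checked exactly for a finite mixture. In the sketch below (an illustrative construction, not from the source), X is discrete and Y is conditionally normal, so \mu_3(Y \mid X) = 0 and both sides of the identity can be evaluated in closed form.

```python
import numpy as np

# Discrete X with conditionally normal Y; all quantities below are exact (no simulation).
p = np.array([0.4, 0.6])            # P(X = x_i)
mu = np.array([0.0, 3.0])           # E[Y | X = x_i]
var = np.array([1.0, 4.0])          # Var(Y | X = x_i); normal => mu_3(Y | X) = 0

def central_moment(weights, values, k):
    m = np.sum(weights * values)
    return np.sum(weights * (values - m) ** k)

# Left-hand side: mu_3(Y) for the two-component normal mixture,
# E[(Y - EY)^3] = sum_i p_i * ( 3 (mu_i - EY) var_i + (mu_i - EY)^3 ).
ey = np.sum(p * mu)
mu3_y = np.sum(p * (3 * (mu - ey) * var + (mu - ey) ** 3))

# Right-hand side: E[mu_3(Y|X)] + mu_3(E[Y|X]) + 3 Cov(E[Y|X], Var(Y|X)).
term1 = 0.0                                                  # conditional normals are symmetric
term2 = central_moment(p, mu, 3)                             # mu_3 of the conditional means
term3 = 3 * (np.sum(p * mu * var) - ey * np.sum(p * var))    # 3 Cov(E[Y|X], Var(Y|X))

print(mu3_y, term1 + term2 + term3)    # the two sides agree exactly
```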


See also

* Law of total covariance − a generalization

