Law of total expectation

The proposition in probability theory known as the law of total expectation, the law of iterated expectations (LIE), Adam's law, the tower rule, and the smoothing theorem, among other names, states that if X is a random variable whose expected value \operatorname{E}(X) is defined, and Y is any random variable on the same probability space, then

: \operatorname{E}(X) = \operatorname{E}(\operatorname{E}(X \mid Y)),

i.e., the expected value of the conditional expected value of X given Y is the same as the expected value of X.

One special case states that if \{A_i\} is a finite or countable partition of the sample space, then

: \operatorname{E}(X) = \sum_i \operatorname{E}(X \mid A_i) \operatorname{P}(A_i).

Note: The conditional expected value \operatorname{E}(X \mid Z) is a random variable whose value depends on the value of Z. Note that the conditional expected value of X given the ''event'' Z = z is a function of z. If we write \operatorname{E}(X \mid Z = z) = g(z), then the random variable \operatorname{E}(X \mid Z) is g(Z). Similar comments apply to the conditional covariance.
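
To make this note concrete, here is a minimal Python sketch; the joint distribution and the names pmf, p_z, and g are illustrative assumptions, not from the source. It builds g(z) = E(X | Z = z) from a small discrete joint distribution and checks that E(g(Z)) equals E(X):

```python
# Hypothetical joint pmf P(X=x, Z=z); the values are illustrative only.
pmf = {(1, 0): 0.1, (2, 0): 0.3,
       (1, 1): 0.4, (2, 1): 0.2}

def p_z(z):
    """Marginal P(Z = z)."""
    return sum(p for (x, zz), p in pmf.items() if zz == z)

def g(z):
    """g(z) = E(X | Z = z), an ordinary function of z."""
    return sum(x * p for (x, zz), p in pmf.items() if zz == z) / p_z(z)

e_x = sum(x * p for (x, z), p in pmf.items())           # E(X) directly
e_gz = sum(g(z) * p_z(z) for z in {z for _, z in pmf})  # E(g(Z)) = E(E(X|Z))
print(e_x, e_gz)  # both are 1.5 (up to float rounding), as the law predicts
```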


Example

Suppose that only two factories supply light bulbs to the market. Factory X's bulbs work for an average of 5000 hours, whereas factory Y's bulbs work for an average of 4000 hours. It is known that factory X supplies 60% of the total bulbs available. What is the expected length of time that a purchased bulb will work?

Applying the law of total expectation, we have:

: \begin{align} \operatorname{E}(L) &= \operatorname{E}(L \mid X) \operatorname{P}(X) + \operatorname{E}(L \mid Y) \operatorname{P}(Y) \\ &= 5000(0.6) + 4000(0.4) \\ &= 4600 \end{align}

where

* \operatorname{E}(L) is the expected life of the bulb;
* \operatorname{P}(X) = 0.6 is the probability that the purchased bulb was manufactured by factory X;
* \operatorname{P}(Y) = 0.4 is the probability that the purchased bulb was manufactured by factory Y;
* \operatorname{E}(L \mid X) = 5000 is the expected lifetime of a bulb manufactured by X;
* \operatorname{E}(L \mid Y) = 4000 is the expected lifetime of a bulb manufactured by Y.

Thus each purchased light bulb has an expected lifetime of 4600 hours.
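
As a numerical sanity check, the following Python sketch simulates the bulb example; the exponential lifetime model is an assumption made only for the simulation (the law itself uses nothing beyond the conditional means):

```python
import random

random.seed(0)
N = 100_000
P_X = 0.6                         # P(bulb comes from factory X)
MEAN_X, MEAN_Y = 5000.0, 4000.0   # E(L | X) and E(L | Y), in hours

total = 0.0
for _ in range(N):
    mean = MEAN_X if random.random() < P_X else MEAN_Y
    # The exponential shape is an assumption; only its mean matters here.
    total += random.expovariate(1.0 / mean)

print("Monte Carlo estimate:", total / N)  # close to 4600
print("Law of total expectation:", P_X * MEAN_X + (1 - P_X) * MEAN_Y)  # 4600.0
```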


Proof in the finite and countable cases

Let the random variables X and Y, defined on the same probability space, assume a finite or countably infinite set of finite values. Assume that \operatorname{E}[X] is defined, i.e. \min(\operatorname{E}[X_+], \operatorname{E}[X_-]) < \infty. If \{A_i\} is a partition of the probability space \Omega, then

: \operatorname{E}(X) = \sum_i \operatorname{E}(X \mid A_i) \operatorname{P}(A_i).

Proof.

: \begin{align} \operatorname{E}\left(\operatorname{E}(X \mid Y)\right) &= \operatorname{E} \Bigg[ \sum_x x \cdot \operatorname{P}(X=x \mid Y) \Bigg] \\ &= \sum_y \Bigg[ \sum_x x \cdot \operatorname{P}(X=x \mid Y=y) \Bigg] \cdot \operatorname{P}(Y=y) \\ &= \sum_y \sum_x x \cdot \operatorname{P}(X=x, Y=y). \end{align}

If the series is finite, then we can switch the summations around, and the previous expression will become

: \begin{align} \sum_x \sum_y x \cdot \operatorname{P}(X=x, Y=y) &= \sum_x x \sum_y \operatorname{P}(X=x, Y=y) \\ &= \sum_x x \cdot \operatorname{P}(X=x) \\ &= \operatorname{E}(X). \end{align}

If, on the other hand, the series is infinite, then its convergence cannot be conditional, due to the assumption that \min(\operatorname{E}[X_+], \operatorname{E}[X_-]) < \infty. The series converges absolutely if both \operatorname{E}[X_+] and \operatorname{E}[X_-] are finite, and diverges to infinity when either \operatorname{E}[X_+] or \operatorname{E}[X_-] is infinite. In both scenarios, the above summations may be exchanged without affecting the sum.
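
The heart of the finite case is the exchange of the two summations. A small Python check makes this explicit; the joint distribution below is an assumed toy example, not from the source:

```python
# Toy joint distribution P(X=x, Y=y); values are assumed for illustration.
joint = {(0, 0): 0.20, (0, 1): 0.10,
         (3, 0): 0.25, (3, 1): 0.45}
xs = sorted({x for x, _ in joint})
ys = sorted({y for _, y in joint})

# Sum over x inside, then y outside -- the order appearing in E(E(X | Y)):
y_outer = sum(sum(x * joint[x, y] for x in xs) for y in ys)
# Sum over y inside, then x outside -- which collapses to E(X):
x_outer = sum(x * sum(joint[x, y] for y in ys) for x in xs)

print(y_outer, x_outer)  # both 2.1 = E(X); the exchange is harmless here
```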


Proof in the general case

Let (\Omega, \mathcal{F}, \operatorname{P}) be a probability space on which two sub σ-algebras \mathcal{G}_1 \subseteq \mathcal{G}_2 \subseteq \mathcal{F} are defined. For a random variable X on such a space, the smoothing law states that if \operatorname{E}[X] is defined, i.e. \min(\operatorname{E}[X_+], \operatorname{E}[X_-]) < \infty, then

: \operatorname{E}[\operatorname{E}[X \mid \mathcal{G}_2] \mid \mathcal{G}_1] = \operatorname{E}[X \mid \mathcal{G}_1] \quad \text{(a.s.)}.

Proof. Since a conditional expectation is a Radon–Nikodym derivative, verifying the following two properties establishes the smoothing law:

* \operatorname{E}[\operatorname{E}[X \mid \mathcal{G}_2] \mid \mathcal{G}_1] is \mathcal{G}_1-measurable;
* \int_{G_1} \operatorname{E}[\operatorname{E}[X \mid \mathcal{G}_2] \mid \mathcal{G}_1] \, d\operatorname{P} = \int_{G_1} X \, d\operatorname{P}, for all G_1 \in \mathcal{G}_1.

The first of these properties holds by definition of the conditional expectation. To prove the second one, note that

: \begin{align} \min\left(\int_{G_1} X_+ \, d\operatorname{P}, \int_{G_1} X_- \, d\operatorname{P}\right) &\leq \min\left(\int_\Omega X_+ \, d\operatorname{P}, \int_\Omega X_- \, d\operatorname{P}\right) \\ &= \min(\operatorname{E}[X_+], \operatorname{E}[X_-]) < \infty, \end{align}

so the integral \textstyle \int_{G_1} X \, d\operatorname{P} is defined (not equal to \infty - \infty).

The second property thus holds, since G_1 \in \mathcal{G}_1 \subseteq \mathcal{G}_2 implies

: \int_{G_1} \operatorname{E}[\operatorname{E}[X \mid \mathcal{G}_2] \mid \mathcal{G}_1] \, d\operatorname{P} = \int_{G_1} \operatorname{E}[X \mid \mathcal{G}_2] \, d\operatorname{P} = \int_{G_1} X \, d\operatorname{P}.

Corollary. In the special case when \mathcal{G}_1 = \{\emptyset, \Omega\} and \mathcal{G}_2 = \sigma(Y), the smoothing law reduces to

: \operatorname{E}[\operatorname{E}[X \mid Y]] = \operatorname{E}[X].

Alternative proof for \operatorname{E}[\operatorname{E}[X \mid Y]] = \operatorname{E}[X]: This is a simple consequence of the measure-theoretic definition of conditional expectation. By definition, \operatorname{E}[X \mid Y] := \operatorname{E}[X \mid \sigma(Y)] is a \sigma(Y)-measurable random variable that satisfies

: \int_A \operatorname{E}[X \mid Y] \, d\operatorname{P} = \int_A X \, d\operatorname{P},

for every measurable set A \in \sigma(Y). Taking A = \Omega proves the claim.
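
A discrete sketch of the general smoothing law in Python: conditioning on the pair (Y, Z) generates a finer σ-algebra than conditioning on Y alone, so averaging E[X | Y, Z] over Z given Y should reproduce E[X | Y]. The joint pmf and the helper names mass and e_x are assumptions for illustration:

```python
# Assumed joint pmf P(X=x, Y=y, Z=z) over eight outcomes (toy values).
pmf = {(1, 0, 0): 0.15, (2, 0, 0): 0.10, (1, 0, 1): 0.05, (2, 0, 1): 0.20,
       (1, 1, 0): 0.25, (2, 1, 0): 0.05, (1, 1, 1): 0.10, (2, 1, 1): 0.10}

def mass(pred):
    """P(event), where pred filters outcomes (x, y, z)."""
    return sum(p for k, p in pmf.items() if pred(k))

def e_x(pred):
    """E[X | event] computed directly from the joint pmf."""
    return sum(k[0] * p for k, p in pmf.items() if pred(k)) / mass(pred)

for y0 in (0, 1):
    p_y = mass(lambda k: k[1] == y0)
    # E[ E[X | Y, Z] | Y = y0 ]: average the finer conditional expectation
    # E[X | Y=y0, Z=z0] against the conditional weights P(Z=z0 | Y=y0).
    outer = sum(e_x(lambda k: k[1] == y0 and k[2] == z0)
                * mass(lambda k: k[1] == y0 and k[2] == z0) / p_y
                for z0 in (0, 1))
    print(y0, outer, e_x(lambda k: k[1] == y0))  # last two columns agree
```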


Proof of partition formula

: \begin{align} \sum_i \operatorname{E}(X \mid A_i) \operatorname{P}(A_i) &= \sum_i \int_\Omega X(\omega) \operatorname{P}(d\omega \mid A_i) \cdot \operatorname{P}(A_i) \\ &= \sum_i \int_\Omega X(\omega) \operatorname{P}(d\omega \cap A_i) \\ &= \sum_i \int_\Omega X(\omega) I_{A_i}(\omega) \operatorname{P}(d\omega) \\ &= \sum_i \operatorname{E}(X I_{A_i}), \end{align}

where I_{A_i} is the indicator function of the set A_i.

If the partition \{A_i\}_{i=0}^n is finite, then, by linearity, the previous expression becomes

: \operatorname{E}\left(\sum_{i=0}^n X I_{A_i}\right) = \operatorname{E}(X),

and we are done. If, however, the partition \{A_i\}_{i=0}^\infty is infinite, then we use the dominated convergence theorem to show that

: \operatorname{E}\left(\sum_{i=0}^n X I_{A_i}\right) \to \operatorname{E}(X).

Indeed, for every n \geq 0,

: \left| \sum_{i=0}^n X I_{A_i} \right| \leq |X| I_{\bigcup_{i=0}^n A_i} \leq |X|.

Since every element of the set \Omega lies in exactly one cell A_i of the partition, it is straightforward to verify that the sequence \left\{ \sum_{i=0}^n X I_{A_i} \right\}_{n=0}^\infty converges pointwise to X. By the initial assumption, \operatorname{E}|X| < \infty. Applying the dominated convergence theorem yields the desired result.
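
A short Monte Carlo illustration of the partition formula in Python; the Gaussian sample and the two-cell sign partition are assumed purely for illustration:

```python
import random

random.seed(1)
# Assumed sample: X ~ Normal(0.5, 1), so E(X) = 0.5.
xs = [random.gauss(0.5, 1.0) for _ in range(100_000)]

total = 0.0
for in_cell in (lambda x: x < 0, lambda x: x >= 0):  # the two cells A_i
    cell = [x for x in xs if in_cell(x)]
    p_ai = len(cell) / len(xs)          # empirical P(A_i)
    e_given_ai = sum(cell) / len(cell)  # empirical E(X | A_i)
    total += e_given_ai * p_ai

print(sum(xs) / len(xs), total)  # both estimates of E(X) coincide
```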


See also

* The fundamental theorem of poker, for one practical application
* Law of total probability
* Law of total variance
* Law of total covariance
* Law of total cumulance
* Product distribution#Expectation (an application of the law for proving that the product expectation is the product of expectations)

