The proposition in probability theory known as the law of total expectation, the law of iterated expectations (LIE), Adam's law, the tower rule, and the smoothing theorem, among other names, states that if X is a random variable whose expected value \operatorname{E}(X) is defined, and Y is any random variable on the same probability space, then

:\operatorname{E}(X) = \operatorname{E}(\operatorname{E}(X \mid Y)),

i.e., the expected value of the conditional expected value of X given Y is the same as the expected value of X.

One special case states that if \{A_i\} is a finite or countable partition of the sample space, then

:\operatorname{E}(X) = \sum_i \operatorname{E}(X \mid A_i) \operatorname{P}(A_i).

Note: The conditional expected value E(''X'' | ''Z'') is a random variable whose value depends on the value of ''Z''. Note that the conditional expected value of ''X'' given the ''event'' ''Z'' = ''z'' is a function of ''z''. If we write E(''X'' | ''Z'' = ''z'') = ''g''(''z''), then the random variable E(''X'' | ''Z'') is ''g''(''Z''). Similar comments apply to the conditional covariance.
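The identity can be checked empirically. The following Python sketch (not part of the original article; the die-plus-noise setup is a hypothetical example) estimates \operatorname{E}(X) directly and also as the weighted average of the group means \operatorname{E}(X \mid Y = y), which is the sample analogue of \operatorname{E}(\operatorname{E}(X \mid Y)):

```python
import random

random.seed(0)

# Hypothetical setup: Y is a fair die roll, X = Y + Gaussian noise.
# Then E(X | Y = y) = y, so E(E(X | Y)) = E(Y) = 3.5 = E(X).
n = 100_000
samples_y = [random.randint(1, 6) for _ in range(n)]
samples_x = [y + random.gauss(0, 1) for y in samples_y]

# Direct estimate of E(X).
e_x = sum(samples_x) / n

# Estimate E(X | Y = y) per group, then average over the distribution of Y.
by_y = {}
for x, y in zip(samples_x, samples_y):
    by_y.setdefault(y, []).append(x)
e_e = sum((sum(v) / len(v)) * (len(v) / n) for v in by_y.values())

print(round(e_x, 2), round(e_e, 2))  # both approximately 3.5
```

The two estimates agree because averaging group means weighted by group frequencies is algebraically the same as the overall mean.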


Example

Suppose that only two factories supply light bulbs to the market. Factory X's bulbs work for an average of 5000 hours, whereas factory Y's bulbs work for an average of 4000 hours. It is known that factory X supplies 60% of the total bulbs available. What is the expected length of time that a purchased bulb will work for?

Applying the law of total expectation, we have:

:\begin{align} \operatorname{E}(L) &= \operatorname{E}(L \mid X) \operatorname{P}(X) + \operatorname{E}(L \mid Y) \operatorname{P}(Y) \\ &= 5000(0.6) + 4000(0.4) \\ &= 4600 \end{align}

where
* \operatorname{E}(L) is the expected life of the bulb;
* \operatorname{P}(X) = 0.6 is the probability that the purchased bulb was manufactured by factory X;
* \operatorname{P}(Y) = 0.4 is the probability that the purchased bulb was manufactured by factory Y;
* \operatorname{E}(L \mid X) = 5000 is the expected lifetime of a bulb manufactured by X;
* \operatorname{E}(L \mid Y) = 4000 is the expected lifetime of a bulb manufactured by Y.

Thus each purchased light bulb has an expected lifetime of 4600 hours.
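The light-bulb calculation above is a direct weighted average, which can be written out in a few lines of Python (the dictionary names are illustrative, not from the article):

```python
# Direct computation of the light-bulb example: E(L) = sum over factories
# of E(L | factory) * P(factory).
e_life_given = {"X": 5000, "Y": 4000}   # E(L | factory), in hours
p_factory = {"X": 0.6, "Y": 0.4}        # P(factory)

e_life = sum(e_life_given[f] * p_factory[f] for f in p_factory)
print(e_life)  # 4600.0
```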


Proof in the finite and countable cases

Let the random variables X and Y, defined on the same probability space, assume a finite or countably infinite set of finite values. Assume that \operatorname{E}[X] is defined, i.e. \min(\operatorname{E}[X_+], \operatorname{E}[X_-]) < \infty. If \{A_i\} is a partition of the probability space \Omega, then

:\operatorname{E}(X) = \sum_i \operatorname{E}(X \mid A_i) \operatorname{P}(A_i).

Proof.

:\begin{align} \operatorname{E}\left(\operatorname{E}(X \mid Y)\right) &= \operatorname{E} \Bigg[ \sum_x x \cdot \operatorname{P}(X=x \mid Y) \Bigg] \\ &= \sum_y \Bigg[ \sum_x x \cdot \operatorname{P}(X=x \mid Y=y) \Bigg] \cdot \operatorname{P}(Y=y) \\ &= \sum_y \sum_x x \cdot \operatorname{P}(X=x, Y=y). \end{align}

If the series is finite, then we can switch the summations around, and the previous expression will become

:\begin{align} \sum_x \sum_y x \cdot \operatorname{P}(X=x, Y=y) &= \sum_x x \sum_y \operatorname{P}(X=x, Y=y) \\ &= \sum_x x \cdot \operatorname{P}(X=x) \\ &= \operatorname{E}(X). \end{align}

If, on the other hand, the series is infinite, then its convergence cannot be conditional, due to the assumption that \min(\operatorname{E}[X_+], \operatorname{E}[X_-]) < \infty. The series converges absolutely if both \operatorname{E}[X_+] and \operatorname{E}[X_-] are finite, and diverges to an infinity when either \operatorname{E}[X_+] or \operatorname{E}[X_-] is infinite. In both scenarios, the above summations may be exchanged without affecting the sum.
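The summation swap at the heart of this proof can be verified exactly on a small joint distribution. The sketch below (a made-up pmf, using exact rational arithmetic so the two summation orders can be compared without rounding) computes \sum_y \sum_x x \operatorname{P}(X=x, Y=y) in the order of \operatorname{E}(\operatorname{E}(X \mid Y)) and in the swapped order that yields \operatorname{E}(X):

```python
from fractions import Fraction as F

# A small joint pmf p(x, y) on {0,1,2} x {0,1}; probabilities sum to 1.
joint = {
    (0, 0): F(1, 8), (1, 0): F(1, 8), (2, 0): F(1, 4),
    (0, 1): F(1, 8), (1, 1): F(1, 4), (2, 1): F(1, 8),
}
xs = {x for x, _ in joint}
ys = {y for _, y in joint}

# Inner sum over x first (the E(E(X | Y)) order), then over y.
p_y = {y: sum(joint[(x, y)] for x in xs) for y in ys}
e_x_given_y = {y: sum(x * joint[(x, y)] for x in xs) / p_y[y] for y in ys}
lhs = sum(e_x_given_y[y] * p_y[y] for y in ys)

# Sum over y first: this recovers the marginal of X, hence E(X).
p_x = {x: sum(joint[(x, y)] for y in ys) for x in xs}
rhs = sum(x * p_x[x] for x in xs)

print(lhs, rhs)  # 9/8 9/8
```

With finitely many terms the exchange is always valid; the absolute-convergence argument above is what licenses it in the countably infinite case.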


Proof in the general case

Let (\Omega, \mathcal{F}, \operatorname{P}) be a probability space on which two sub-σ-algebras \mathcal{G}_1 \subseteq \mathcal{G}_2 \subseteq \mathcal{F} are defined. For a random variable X on such a space, the smoothing law states that if \operatorname{E}[X] is defined, i.e. \min(\operatorname{E}[X_+], \operatorname{E}[X_-]) < \infty, then

:\operatorname{E}[\operatorname{E}[X \mid \mathcal{G}_2] \mid \mathcal{G}_1] = \operatorname{E}[X \mid \mathcal{G}_1] \quad \text{(a.s.)}.

Proof. Since a conditional expectation is a Radon–Nikodym derivative, verifying the following two properties establishes the smoothing law:

* \operatorname{E}[\operatorname{E}[X \mid \mathcal{G}_2] \mid \mathcal{G}_1] is \mathcal{G}_1-measurable;
* \int_{G_1} \operatorname{E}[\operatorname{E}[X \mid \mathcal{G}_2] \mid \mathcal{G}_1] \, d\operatorname{P} = \int_{G_1} X \, d\operatorname{P}, for all G_1 \in \mathcal{G}_1.

The first of these properties holds by definition of the conditional expectation. To prove the second one, note that

:\begin{align} \min\left(\int_{G_1} X_+ \, d\operatorname{P}, \int_{G_1} X_- \, d\operatorname{P}\right) &\leq \min\left(\int_\Omega X_+ \, d\operatorname{P}, \int_\Omega X_- \, d\operatorname{P}\right) \\ &= \min(\operatorname{E}[X_+], \operatorname{E}[X_-]) < \infty, \end{align}

so the integral \int_{G_1} X \, d\operatorname{P} is defined (not equal to \infty - \infty).

The second property thus holds, since G_1 \in \mathcal{G}_1 \subseteq \mathcal{G}_2 implies

:\int_{G_1} \operatorname{E}[\operatorname{E}[X \mid \mathcal{G}_2] \mid \mathcal{G}_1] \, d\operatorname{P} = \int_{G_1} \operatorname{E}[X \mid \mathcal{G}_2] \, d\operatorname{P} = \int_{G_1} X \, d\operatorname{P}.

Corollary. In the special case when \mathcal{G}_1 = \{\emptyset, \Omega\} and \mathcal{G}_2 = \sigma(Y), the smoothing law reduces to

:\operatorname{E}[\operatorname{E}[X \mid Y]] = \operatorname{E}[X].

Alternative proof for \operatorname{E}[\operatorname{E}[X \mid Y]] = \operatorname{E}[X]: this is a simple consequence of the measure-theoretic definition of conditional expectation. By definition, \operatorname{E}[X \mid Y] := \operatorname{E}[X \mid \sigma(Y)] is a \sigma(Y)-measurable random variable that satisfies

:\int_A \operatorname{E}[X \mid Y] \, d\operatorname{P} = \int_A X \, d\operatorname{P},

for every measurable set A \in \sigma(Y). Taking A = \Omega proves the claim.
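In the discrete case, nested σ-algebras correspond to nested partitions, and conditional expectation given a partition is just the block average. The following sketch (a hypothetical 8-point uniform sample space, not from the article) illustrates the smoothing law with a coarser partition generating \mathcal{G}_1 and a finer one generating \mathcal{G}_2:

```python
# Smoothing law on a finite space: conditioning on the finer sigma-algebra
# G2 first and then on the coarser G1 equals conditioning on G1 alone.
omega = list(range(8))
x = {0: 3.0, 1: 5.0, 2: 2.0, 3: 6.0, 4: 1.0, 5: 1.0, 6: 4.0, 7: 2.0}

g2_blocks = [{0, 1}, {2, 3}, {4, 5}, {6, 7}]   # finer partition (generates G2)
g1_blocks = [{0, 1, 2, 3}, {4, 5, 6, 7}]       # coarser partition (generates G1)

def cond_exp(values, blocks):
    """E(values | sigma(blocks)) under the uniform measure: block averages,
    returned as a function of the outcome (a dict over omega)."""
    out = {}
    for b in blocks:
        avg = sum(values[w] for w in b) / len(b)
        for w in b:
            out[w] = avg
    return out

inner = cond_exp(x, g2_blocks)        # E(X | G2)
lhs = cond_exp(inner, g1_blocks)      # E(E(X | G2) | G1)
rhs = cond_exp(x, g1_blocks)          # E(X | G1)

print(lhs == rhs)  # True
```

Because each \mathcal{G}_1-block is a union of \mathcal{G}_2-blocks of equal probability, averaging the block averages reproduces the coarser average exactly.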


Proof of partition formula

:\begin{align} \sum_i \operatorname{E}(X \mid A_i) \operatorname{P}(A_i) &= \sum_i \int_\Omega X(\omega) \operatorname{P}(d\omega \mid A_i) \cdot \operatorname{P}(A_i) \\ &= \sum_i \int_\Omega X(\omega) \operatorname{P}(d\omega \cap A_i) \\ &= \sum_i \int_\Omega X(\omega) I_{A_i}(\omega) \operatorname{P}(d\omega) \\ &= \sum_i \operatorname{E}(X I_{A_i}), \end{align}

where I_{A_i} is the indicator function of the set A_i.

If the partition \{A_i\}_{i=1}^n is finite, then, by linearity, the previous expression becomes

:\operatorname{E}\left(\sum_{i=1}^n X I_{A_i}\right) = \operatorname{E}(X),

and we are done. If, however, the partition \{A_i\}_{i=1}^\infty is infinite, then we use the dominated convergence theorem to show that

:\operatorname{E}\left(\sum_{i=1}^n X I_{A_i}\right) \to \operatorname{E}(X).

Indeed, for every n \geq 0,

:\left| \sum_{i=1}^n X I_{A_i} \right| \leq |X| \, I_{\bigcup_{i=1}^n A_i} \leq |X|.

Since every element of the set \Omega falls into a specific partition cell A_i, it is straightforward to verify that the sequence \left\{ \sum_{i=1}^n X I_{A_i} \right\}_{n=1}^\infty converges pointwise to X. By initial assumption, \operatorname{E}|X| < \infty. Applying the dominated convergence theorem yields the desired result.
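For a finite partition, the chain of equalities above can be reproduced numerically. The sketch below (a made-up uniform sample space and partition) computes \sum_i \operatorname{E}(X \mid A_i)\operatorname{P}(A_i) cell by cell and compares it with \operatorname{E}(X):

```python
# Partition formula on a hypothetical 6-outcome uniform sample space,
# partitioned into A_1 = {0,1}, A_2 = {2,3,4}, A_3 = {5}.
omega = [0, 1, 2, 3, 4, 5]
x_vals = {0: 2.0, 1: 4.0, 2: 1.0, 3: 1.0, 4: 7.0, 5: 10.0}
partition = [{0, 1}, {2, 3, 4}, {5}]
p = 1 / len(omega)  # probability of each outcome

# Left-hand side: sum_i E(X | A_i) P(A_i).
lhs = 0.0
for a in partition:
    p_a = len(a) * p                               # P(A_i)
    e_given_a = sum(x_vals[w] for w in a) * p / p_a  # E(X | A_i)
    lhs += e_given_a * p_a

# Right-hand side: E(X) computed directly over the sample space.
rhs = sum(x_vals[w] * p for w in omega)

print(lhs, rhs)  # both equal 25/6, approximately 4.1667
```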


See also

* The fundamental theorem of poker for one practical application.
* Law of total probability
* Law of total variance
* Law of total covariance
* Law of total cumulance
* Product distribution#expectation (application of the law for proving that the product expectation is the product of expectations)


References

* (Theorem 34.4)
* Christopher Sims, "Notes on Random Variables, Expectations, Probability Densities, and Martingales", especially equations (16) through (18)