In probability theory, the chain rule (also called the general product rule) describes how to calculate the probability of the intersection of, not necessarily independent, events, or the joint distribution of random variables, using conditional probabilities. The rule allows one to express a joint probability in terms of conditional probabilities alone. It is notably used in the study of discrete stochastic processes and in applications such as Bayesian networks, which describe a probability distribution in terms of conditional probabilities.


Chain rule for events


Two events

For two events A and B, the chain rule states that
:\mathbb P(A \cap B) = \mathbb P(B \mid A)\, \mathbb P(A),
where \mathbb P(B \mid A) denotes the conditional probability of B given A.


Example

Urn A contains 1 black ball and 2 white balls, and urn B contains 1 black ball and 3 white balls. Suppose we pick an urn at random and then draw a ball from that urn. Let event A be choosing the first urn, so that \mathbb P(A) = \mathbb P(\overline A) = 1/2, where \overline A is the complementary event of A. Let event B be drawing a white ball. The chance of drawing a white ball, given that we have chosen the first urn, is \mathbb P(B \mid A) = 2/3. The intersection A \cap B then describes choosing the first urn and drawing a white ball from it. Its probability can be calculated by the chain rule as follows:
:\mathbb P(A \cap B) = \mathbb P(B \mid A)\, \mathbb P(A) = \frac 23 \cdot \frac 12 = \frac 13.
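The urn calculation can be checked by direct enumeration. The following Python sketch (not part of the original article) encodes the two urns as lists of balls and sums the probability of each way of choosing the first urn and then a white ball:

```python
from fractions import Fraction

# Urn A has 1 black and 2 white balls; urn B has 1 black and 3 white balls.
urns = {"A": ["black", "white", "white"],
        "B": ["black", "white", "white", "white"]}

# P(A and B) = sum over white balls in urn A of
#   P(pick urn A) * P(pick that particular ball from urn A)
p_first_urn_and_white = sum(
    Fraction(1, 2) * Fraction(1, len(urns["A"]))
    for ball in urns["A"] if ball == "white"
)

print(p_first_urn_and_white)  # 1/3, matching P(B|A) * P(A) = 2/3 * 1/2
```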


Finitely many events

For events A_1,\ldots,A_n whose intersection does not have probability zero, the chain rule states
:\begin{align}
\mathbb P\left(A_1 \cap A_2 \cap \ldots \cap A_n\right) &= \mathbb P\left(A_n \mid A_1 \cap \ldots \cap A_{n-1}\right) \mathbb P\left(A_1 \cap \ldots \cap A_{n-1}\right) \\
&= \mathbb P\left(A_n \mid A_1 \cap \ldots \cap A_{n-1}\right) \mathbb P\left(A_{n-1} \mid A_1 \cap \ldots \cap A_{n-2}\right) \mathbb P\left(A_1 \cap \ldots \cap A_{n-2}\right) \\
&= \mathbb P\left(A_n \mid A_1 \cap \ldots \cap A_{n-1}\right) \mathbb P\left(A_{n-1} \mid A_1 \cap \ldots \cap A_{n-2}\right) \cdot \ldots \cdot \mathbb P(A_3 \mid A_1 \cap A_2) \mathbb P(A_2 \mid A_1) \mathbb P(A_1)\\
&= \mathbb P(A_1) \mathbb P(A_2 \mid A_1) \mathbb P(A_3 \mid A_1 \cap A_2) \cdot \ldots \cdot \mathbb P(A_n \mid A_1 \cap \dots \cap A_{n-1})\\
&= \prod_{k=1}^n \mathbb P(A_k \mid A_1 \cap \dots \cap A_{k-1})\\
&= \prod_{k=1}^n \mathbb P\left(A_k \,\Bigg|\, \bigcap_{j=1}^{k-1} A_j\right).
\end{align}
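The telescoping product above can be verified numerically on any finite, uniform sample space. The following Python sketch (an illustration added here, assuming events are represented as sets of equally likely outcomes) compares the direct probability of the intersection with the chain-rule product:

```python
from fractions import Fraction

def prob(event, omega):
    """Probability of an event (a set of outcomes) under a uniform
    finite sample space omega."""
    return Fraction(len(event & omega), len(omega))

def chain_rule_product(events, omega):
    """Multiply P(A_k | A_1 ∩ ... ∩ A_{k-1}) over k, per the chain rule."""
    result, current = Fraction(1), set(omega)
    for a in events:
        # P(A_k | ∩_{j<k} A_j) under the uniform measure
        result *= Fraction(len(a & current), len(current))
        current &= a
    return result

# Toy sample space: two fair dice; events "first die even" and "sum is 7".
omega = {(i, j) for i in range(1, 7) for j in range(1, 7)}
A1 = {(i, j) for (i, j) in omega if i % 2 == 0}
A2 = {(i, j) for (i, j) in omega if i + j == 7}

direct = prob(A1 & A2, omega)
via_chain = chain_rule_product([A1, A2], omega)
assert direct == via_chain  # both equal 3/36 = 1/12
```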


Example 1

For n=4, i.e. four events, the chain rule reads
:\begin{align}
\mathbb P(A_1 \cap A_2 \cap A_3 \cap A_4) &= \mathbb P(A_4 \mid A_3 \cap A_2 \cap A_1)\, \mathbb P(A_3 \cap A_2 \cap A_1) \\
&= \mathbb P(A_4 \mid A_3 \cap A_2 \cap A_1)\, \mathbb P(A_3 \mid A_2 \cap A_1)\, \mathbb P(A_2 \cap A_1) \\
&= \mathbb P(A_4 \mid A_3 \cap A_2 \cap A_1)\, \mathbb P(A_3 \mid A_2 \cap A_1)\, \mathbb P(A_2 \mid A_1)\, \mathbb P(A_1).
\end{align}


Example 2

We randomly draw 4 cards (one at a time, without replacement) from a deck of 52 cards. What is the probability that all four are aces? First, we set A_n := \{\text{draw an ace in the } n\text{-th try}\}. We get the following probabilities:
:\mathbb P(A_1) = \frac{4}{52}, \qquad \mathbb P(A_2 \mid A_1) = \frac{3}{51}, \qquad \mathbb P(A_3 \mid A_1 \cap A_2) = \frac{2}{50}, \qquad \mathbb P(A_4 \mid A_1 \cap A_2 \cap A_3) = \frac{1}{49}.
Applying the chain rule,
:\mathbb P(A_1 \cap A_2 \cap A_3 \cap A_4) = \frac{4}{52} \cdot \frac{3}{51} \cdot \frac{2}{50} \cdot \frac{1}{49} = \frac{1}{270725}.
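The four-aces product is easy to reproduce exactly with rational arithmetic. The short Python sketch below (added here for illustration) multiplies the four conditional probabilities:

```python
from fractions import Fraction

# Chain-rule product for drawing 4 aces without replacement:
# P(A_1) * P(A_2|A_1) * P(A_3|A_1∩A_2) * P(A_4|A_1∩A_2∩A_3)
aces, deck = 4, 52
p = Fraction(1)
for k in range(4):
    p *= Fraction(aces - k, deck - k)

print(p)  # 1/270725
```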


Statement of the theorem and proof

Let (\Omega, \mathcal A, \mathbb P) be a probability space. Recall that the conditional probability of an event A \in \mathcal A given B \in \mathcal A is defined as
:\mathbb P(A \mid B) := \begin{cases} \dfrac{\mathbb P(A \cap B)}{\mathbb P(B)}, & \mathbb P(B) > 0,\\ 0, & \mathbb P(B) = 0. \end{cases}
Then the chain rule holds in the following form.

Theorem. For events A_1, \ldots, A_n \in \mathcal A,
:\mathbb P\left(\bigcap_{k=1}^n A_k\right) = \prod_{k=1}^n \mathbb P\left(A_k \,\Bigg|\, \bigcap_{j=1}^{k-1} A_j\right).

The proof proceeds by induction on n, applying the definition of conditional probability at each step; the convention \mathbb P(A \mid B) = 0 when \mathbb P(B) = 0 ensures that both sides vanish whenever some partial intersection has probability zero.
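The case distinction in the definition can be mirrored directly in code. The following Python sketch (a hypothetical helper, not from the original text) implements the definition, including the zero-probability convention:

```python
from fractions import Fraction

def conditional(p_a_and_b, p_b):
    """P(A|B) per the definition above, with the convention
    P(A|B) = 0 when P(B) = 0 (so the chain rule never divides by zero)."""
    return p_a_and_b / p_b if p_b > 0 else Fraction(0)

print(conditional(Fraction(1, 3), Fraction(1, 2)))  # 2/3
print(conditional(Fraction(0), Fraction(0)))        # 0 by convention
```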


Chain rule for discrete random variables


Two random variables

For two discrete random variables X, Y, we apply the definition above to the events A := \{X = x\} and B := \{Y = y\}, and find the joint distribution as
:\mathbb P(X = x, Y = y) = \mathbb P(X = x \mid Y = y)\, \mathbb P(Y = y),
or
:\mathbb P_{X,Y}(x, y) = \mathbb P_{X \mid Y}(x \mid y)\, \mathbb P_Y(y),
where \mathbb P_X(x) := \mathbb P(X = x) is the probability distribution of X and \mathbb P_{X \mid Y}(x \mid y) is the conditional probability distribution of X given Y.
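The identity joint = conditional × marginal can be checked cell by cell on any small probability table. The Python sketch below uses a hypothetical joint distribution of two binary random variables (the specific numbers are illustrative, chosen to sum to 1):

```python
from fractions import Fraction

# Hypothetical joint distribution of two discrete random variables X, Y
# taking values in {0, 1}.
joint = {(0, 0): Fraction(1, 8), (0, 1): Fraction(3, 8),
         (1, 0): Fraction(1, 4), (1, 1): Fraction(1, 4)}

def marginal_Y(y):
    """P(Y = y), summing the joint over x."""
    return sum(p for (x, yy), p in joint.items() if yy == y)

def cond_X_given_Y(x, y):
    """P(X = x | Y = y), by the definition of conditional probability."""
    return joint[(x, y)] / marginal_Y(y)

# Chain rule: P(X=x, Y=y) = P(X=x | Y=y) * P(Y=y) for every cell.
for (x, y), p in joint.items():
    assert p == cond_X_given_Y(x, y) * marginal_Y(y)
```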


Finitely many random variables

Let X_1, \ldots , X_n be discrete random variables and x_1, \dots, x_n \in \mathbb R. By the definition of conditional probability,
:\mathbb P\left(X_n = x_n, \ldots , X_1 = x_1\right) = \mathbb P\left(X_n = x_n \mid X_{n-1} = x_{n-1}, \ldots , X_1 = x_1\right) \mathbb P\left(X_{n-1} = x_{n-1}, \ldots , X_1 = x_1\right),
and using the chain rule, where we set A_k := \{X_k = x_k\}, we can find the joint distribution as
:\begin{align}
\mathbb P\left(X_1 = x_1, \ldots, X_n = x_n\right) &= \mathbb P\left(X_1 = x_1 \mid X_2 = x_2, \ldots, X_n = x_n\right) \mathbb P\left(X_2 = x_2, \ldots, X_n = x_n\right) \\
&= \mathbb P(X_1 = x_1)\, \mathbb P(X_2 = x_2 \mid X_1 = x_1)\, \mathbb P(X_3 = x_3 \mid X_1 = x_1, X_2 = x_2) \cdot \ldots \\
&\qquad \cdot \mathbb P(X_n = x_n \mid X_1 = x_1, \dots, X_{n-1} = x_{n-1}).
\end{align}
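This factorization is what makes discrete stochastic processes tractable: the probability of a whole trajectory is the product of stepwise conditional probabilities. As an illustration (an assumption added here, not from the original text), suppose X_1, X_2, \ldots form a two-state Markov chain, so each conditional \mathbb P(X_k \mid X_1, \ldots, X_{k-1}) reduces to \mathbb P(X_k \mid X_{k-1}):

```python
from fractions import Fraction

# Hypothetical two-state Markov chain: initial distribution and
# transition probabilities (rows sum to 1).
initial = {0: Fraction(1, 2), 1: Fraction(1, 2)}
transition = {0: {0: Fraction(3, 4), 1: Fraction(1, 4)},
              1: {0: Fraction(1, 3), 1: Fraction(2, 3)}}

def path_probability(path):
    """P(X_1=path[0], ..., X_n=path[-1]) via the chain-rule product,
    with each conditional collapsing to a one-step transition."""
    p = initial[path[0]]
    for prev, cur in zip(path, path[1:]):
        p *= transition[prev][cur]
    return p

print(path_probability([0, 0, 1]))  # 1/2 * 3/4 * 1/4 = 3/32
```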


Example

For n=3, i.e. for three random variables, the chain rule reads
:\begin{align}
\mathbb P_{X_1, X_2, X_3}(x_1, x_2, x_3) &= \mathbb P(X_1 = x_1, X_2 = x_2, X_3 = x_3)\\
&= \mathbb P(X_3 = x_3 \mid X_2 = x_2, X_1 = x_1)\, \mathbb P(X_2 = x_2, X_1 = x_1) \\
&= \mathbb P(X_3 = x_3 \mid X_2 = x_2, X_1 = x_1)\, \mathbb P(X_2 = x_2 \mid X_1 = x_1)\, \mathbb P(X_1 = x_1) \\
&= \mathbb P_{X_3 \mid X_2, X_1}(x_3 \mid x_2, x_1)\, \mathbb P_{X_2 \mid X_1}(x_2 \mid x_1)\, \mathbb P_{X_1}(x_1).
\end{align}
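The three-variable factorization can be verified exhaustively on a small table. The Python sketch below uses a hypothetical joint distribution of three binary random variables (the perturbed-uniform numbers are illustrative) and checks every cell against the product of conditionals:

```python
from fractions import Fraction
from itertools import product

# Hypothetical joint distribution of three binary random variables;
# the eight probabilities sum to 1.
joint = {bits: Fraction(1, 8) for bits in product((0, 1), repeat=3)}
joint[(0, 0, 0)] += Fraction(1, 16)
joint[(1, 1, 1)] -= Fraction(1, 16)

for x1, x2, x3 in product((0, 1), repeat=3):
    p1 = sum(q for (a, b, c), q in joint.items() if a == x1)
    p12 = sum(q for (a, b, c), q in joint.items() if (a, b) == (x1, x2))
    cond2 = p12 / p1                   # P(X2=x2 | X1=x1)
    cond3 = joint[(x1, x2, x3)] / p12  # P(X3=x3 | X1=x1, X2=x2)
    # Chain rule: the product of conditionals recovers the joint cell.
    assert joint[(x1, x2, x3)] == cond3 * cond2 * p1
```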



