In probability theory, the chain rule (also called the general product rule) describes how to calculate the probability of the intersection of, not necessarily independent, events, or the joint distribution of random variables, using conditional probabilities. The rule allows one to express a joint probability in terms of conditional probabilities alone. It is notably used in the study of discrete stochastic processes and in applications such as Bayesian networks, which describe a probability distribution in terms of conditional probabilities.


Chain rule for events


Two events

For two events A and B, the chain rule states that
:\mathbb P(A \cap B) = \mathbb P(B \mid A)\, \mathbb P(A),
where \mathbb P(B \mid A) denotes the conditional probability of B given A.


Example

Urn A contains 1 black ball and 2 white balls, and urn B contains 1 black ball and 3 white balls. Suppose we pick an urn at random and then draw a ball from that urn. Let event A be choosing the first urn, so that \mathbb P(A) = \mathbb P(\overline A) = 1/2, where \overline A is the complementary event of A. Let event B be drawing a white ball. The chance of drawing a white ball, given that we have chosen the first urn, is \mathbb P(B \mid A) = 2/3. The intersection A \cap B then describes choosing the first urn and drawing a white ball from it. Its probability can be calculated by the chain rule as follows:
:\mathbb P(A \cap B) = \mathbb P(B \mid A)\, \mathbb P(A) = \frac 23 \cdot \frac 12 = \frac 13.
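The urn calculation can be checked by direct enumeration. The following Python sketch (not part of the original article) encodes the two urns as lists of balls and sums the probability of each way of choosing the first urn and then a white ball:

```python
from fractions import Fraction

# Urn A has 1 black and 2 white balls; urn B has 1 black and 3 white balls.
urns = {"A": ["black", "white", "white"],
        "B": ["black", "white", "white", "white"]}

# P(A and B) = sum over white balls in urn A of
#   P(pick urn A) * P(pick that particular ball from urn A)
p_first_urn_and_white = sum(
    Fraction(1, 2) * Fraction(1, len(urns["A"]))
    for ball in urns["A"] if ball == "white"
)

print(p_first_urn_and_white)  # 1/3, matching P(B|A) * P(A) = 2/3 * 1/2
```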


Finitely many events

For events A_1,\ldots,A_n whose intersection does not have probability zero, the chain rule states
:\begin{align}
\mathbb P\left(A_1 \cap A_2 \cap \ldots \cap A_n\right) &= \mathbb P\left(A_n \mid A_1 \cap \ldots \cap A_{n-1}\right) \mathbb P\left(A_1 \cap \ldots \cap A_{n-1}\right) \\
&= \mathbb P\left(A_n \mid A_1 \cap \ldots \cap A_{n-1}\right) \mathbb P\left(A_{n-1} \mid A_1 \cap \ldots \cap A_{n-2}\right) \mathbb P\left(A_1 \cap \ldots \cap A_{n-2}\right) \\
&= \mathbb P\left(A_n \mid A_1 \cap \ldots \cap A_{n-1}\right) \mathbb P\left(A_{n-1} \mid A_1 \cap \ldots \cap A_{n-2}\right) \cdot \ldots \cdot \mathbb P(A_3 \mid A_1 \cap A_2) \mathbb P(A_2 \mid A_1) \mathbb P(A_1)\\
&= \mathbb P(A_1) \mathbb P(A_2 \mid A_1) \mathbb P(A_3 \mid A_1 \cap A_2) \cdot \ldots \cdot \mathbb P(A_n \mid A_1 \cap \dots \cap A_{n-1})\\
&= \prod_{k=1}^n \mathbb P(A_k \mid A_1 \cap \dots \cap A_{k-1})\\
&= \prod_{k=1}^n \mathbb P\left(A_k \,\Bigg|\, \bigcap_{j=1}^{k-1} A_j\right).
\end{align}
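The telescoping product above can be verified numerically on any finite, uniform sample space. The following Python sketch (an illustration added here, assuming events are represented as sets of equally likely outcomes) compares the direct probability of the intersection with the chain-rule product:

```python
from fractions import Fraction

def prob(event, omega):
    """Probability of an event (a set of outcomes) under a uniform
    finite sample space omega."""
    return Fraction(len(event & omega), len(omega))

def chain_rule_product(events, omega):
    """Multiply P(A_k | A_1 ∩ ... ∩ A_{k-1}) over k, per the chain rule."""
    result, current = Fraction(1), set(omega)
    for a in events:
        # P(A_k | ∩_{j<k} A_j) under the uniform measure
        result *= Fraction(len(a & current), len(current))
        current &= a
    return result

# Toy sample space: two fair dice; events "first die even" and "sum is 7".
omega = {(i, j) for i in range(1, 7) for j in range(1, 7)}
A1 = {(i, j) for (i, j) in omega if i % 2 == 0}
A2 = {(i, j) for (i, j) in omega if i + j == 7}

direct = prob(A1 & A2, omega)
via_chain = chain_rule_product([A1, A2], omega)
assert direct == via_chain  # both equal 3/36 = 1/12
```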


Example 1

For n=4, i.e. four events, the chain rule reads
:\begin{align}
\mathbb P(A_1 \cap A_2 \cap A_3 \cap A_4) &= \mathbb P(A_4 \mid A_3 \cap A_2 \cap A_1)\, \mathbb P(A_3 \cap A_2 \cap A_1) \\
&= \mathbb P(A_4 \mid A_3 \cap A_2 \cap A_1)\, \mathbb P(A_3 \mid A_2 \cap A_1)\, \mathbb P(A_2 \cap A_1) \\
&= \mathbb P(A_4 \mid A_3 \cap A_2 \cap A_1)\, \mathbb P(A_3 \mid A_2 \cap A_1)\, \mathbb P(A_2 \mid A_1)\, \mathbb P(A_1).
\end{align}


Example 2

We randomly draw 4 cards (one at a time, without replacement) from a deck of 52 cards. What is the probability that all four are aces? First, we set A_n := \{\text{draw an ace in the } n\text{-th try}\}. We get the following probabilities:
:\mathbb P(A_1) = \frac{4}{52}, \qquad \mathbb P(A_2 \mid A_1) = \frac{3}{51}, \qquad \mathbb P(A_3 \mid A_1 \cap A_2) = \frac{2}{50}, \qquad \mathbb P(A_4 \mid A_1 \cap A_2 \cap A_3) = \frac{1}{49}.
Applying the chain rule,
:\mathbb P(A_1 \cap A_2 \cap A_3 \cap A_4) = \frac{4}{52} \cdot \frac{3}{51} \cdot \frac{2}{50} \cdot \frac{1}{49} = \frac{1}{270725}.
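The four-aces product is easy to reproduce exactly with rational arithmetic. The short Python sketch below (added here for illustration) multiplies the four conditional probabilities:

```python
from fractions import Fraction

# Chain-rule product for drawing 4 aces without replacement:
# P(A_1) * P(A_2|A_1) * P(A_3|A_1∩A_2) * P(A_4|A_1∩A_2∩A_3)
aces, deck = 4, 52
p = Fraction(1)
for k in range(4):
    p *= Fraction(aces - k, deck - k)

print(p)  # 1/270725
```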


Statement of the theorem and proof

Let (\Omega, \mathcal A, \mathbb P) be a probability space. Recall that the conditional probability of an event A \in \mathcal A given B \in \mathcal A is defined as
:\mathbb P(A \mid B) := \begin{cases} \dfrac{\mathbb P(A \cap B)}{\mathbb P(B)}, & \mathbb P(B) > 0,\\ 0, & \mathbb P(B) = 0. \end{cases}
Then the chain rule holds in the following form.

Theorem. For events A_1, \ldots, A_n \in \mathcal A,
:\mathbb P\left(\bigcap_{k=1}^n A_k\right) = \prod_{k=1}^n \mathbb P\left(A_k \,\Bigg|\, \bigcap_{j=1}^{k-1} A_j\right).

The proof proceeds by induction on n, applying the definition of conditional probability at each step; the convention \mathbb P(A \mid B) = 0 when \mathbb P(B) = 0 ensures that both sides vanish whenever some partial intersection has probability zero.
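The case distinction in the definition can be mirrored directly in code. The following Python sketch (a hypothetical helper, not from the original text) implements the definition, including the zero-probability convention:

```python
from fractions import Fraction

def conditional(p_a_and_b, p_b):
    """P(A|B) per the definition above, with the convention
    P(A|B) = 0 when P(B) = 0 (so the chain rule never divides by zero)."""
    return p_a_and_b / p_b if p_b > 0 else Fraction(0)

print(conditional(Fraction(1, 3), Fraction(1, 2)))  # 2/3
print(conditional(Fraction(0), Fraction(0)))        # 0 by convention
```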


Chain rule for discrete random variables


Two random variables

For two discrete random variables X, Y, we apply the definition above to the events A := \{X = x\} and B := \{Y = y\}, and find the joint distribution as
:\mathbb P(X = x, Y = y) = \mathbb P(X = x \mid Y = y)\, \mathbb P(Y = y),
or
:\mathbb P_{X,Y}(x, y) = \mathbb P_{X \mid Y}(x \mid y)\, \mathbb P_Y(y),
where \mathbb P_X(x) := \mathbb P(X = x) is the probability distribution of X and \mathbb P_{X \mid Y}(x \mid y) is the conditional probability distribution of X given Y.
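The identity joint = conditional × marginal can be checked cell by cell on any small probability table. The Python sketch below uses a hypothetical joint distribution of two binary random variables (the specific numbers are illustrative, chosen to sum to 1):

```python
from fractions import Fraction

# Hypothetical joint distribution of two discrete random variables X, Y
# taking values in {0, 1}.
joint = {(0, 0): Fraction(1, 8), (0, 1): Fraction(3, 8),
         (1, 0): Fraction(1, 4), (1, 1): Fraction(1, 4)}

def marginal_Y(y):
    """P(Y = y), summing the joint over x."""
    return sum(p for (x, yy), p in joint.items() if yy == y)

def cond_X_given_Y(x, y):
    """P(X = x | Y = y), by the definition of conditional probability."""
    return joint[(x, y)] / marginal_Y(y)

# Chain rule: P(X=x, Y=y) = P(X=x | Y=y) * P(Y=y) for every cell.
for (x, y), p in joint.items():
    assert p == cond_X_given_Y(x, y) * marginal_Y(y)
```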


Finitely many random variables

Let X_1, \ldots , X_n be discrete random variables and x_1, \dots, x_n \in \mathbb R. By the definition of conditional probability,
:\mathbb P\left(X_n = x_n, \ldots , X_1 = x_1\right) = \mathbb P\left(X_n = x_n \mid X_{n-1} = x_{n-1}, \ldots , X_1 = x_1\right) \mathbb P\left(X_{n-1} = x_{n-1}, \ldots , X_1 = x_1\right),
and using the chain rule, where we set A_k := \{X_k = x_k\}, we can find the joint distribution as
:\begin{align}
\mathbb P\left(X_1 = x_1, \ldots, X_n = x_n\right) &= \mathbb P\left(X_1 = x_1 \mid X_2 = x_2, \ldots, X_n = x_n\right) \mathbb P\left(X_2 = x_2, \ldots, X_n = x_n\right) \\
&= \mathbb P(X_1 = x_1)\, \mathbb P(X_2 = x_2 \mid X_1 = x_1)\, \mathbb P(X_3 = x_3 \mid X_1 = x_1, X_2 = x_2) \cdot \ldots \\
&\qquad \cdot \mathbb P(X_n = x_n \mid X_1 = x_1, \dots, X_{n-1} = x_{n-1}).
\end{align}
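This factorization is what makes discrete stochastic processes tractable: the probability of a whole trajectory is the product of stepwise conditional probabilities. As an illustration (an assumption added here, not from the original text), suppose X_1, X_2, \ldots form a two-state Markov chain, so each conditional \mathbb P(X_k \mid X_1, \ldots, X_{k-1}) reduces to \mathbb P(X_k \mid X_{k-1}):

```python
from fractions import Fraction

# Hypothetical two-state Markov chain: initial distribution and
# transition probabilities (rows sum to 1).
initial = {0: Fraction(1, 2), 1: Fraction(1, 2)}
transition = {0: {0: Fraction(3, 4), 1: Fraction(1, 4)},
              1: {0: Fraction(1, 3), 1: Fraction(2, 3)}}

def path_probability(path):
    """P(X_1=path[0], ..., X_n=path[-1]) via the chain-rule product,
    with each conditional collapsing to a one-step transition."""
    p = initial[path[0]]
    for prev, cur in zip(path, path[1:]):
        p *= transition[prev][cur]
    return p

print(path_probability([0, 0, 1]))  # 1/2 * 3/4 * 1/4 = 3/32
```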


Example

For n=3, i.e. for three random variables, the chain rule reads
:\begin{align}
\mathbb P_{X_1, X_2, X_3}(x_1, x_2, x_3) &= \mathbb P(X_1 = x_1, X_2 = x_2, X_3 = x_3)\\
&= \mathbb P(X_3 = x_3 \mid X_2 = x_2, X_1 = x_1)\, \mathbb P(X_2 = x_2, X_1 = x_1) \\
&= \mathbb P(X_3 = x_3 \mid X_2 = x_2, X_1 = x_1)\, \mathbb P(X_2 = x_2 \mid X_1 = x_1)\, \mathbb P(X_1 = x_1) \\
&= \mathbb P_{X_3 \mid X_2, X_1}(x_3 \mid x_2, x_1)\, \mathbb P_{X_2 \mid X_1}(x_2 \mid x_1)\, \mathbb P_{X_1}(x_1).
\end{align}
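The three-variable factorization can be verified exhaustively on a small table. The Python sketch below uses a hypothetical joint distribution of three binary random variables (the perturbed-uniform numbers are illustrative) and checks every cell against the product of conditionals:

```python
from fractions import Fraction
from itertools import product

# Hypothetical joint distribution of three binary random variables;
# the eight probabilities sum to 1.
joint = {bits: Fraction(1, 8) for bits in product((0, 1), repeat=3)}
joint[(0, 0, 0)] += Fraction(1, 16)
joint[(1, 1, 1)] -= Fraction(1, 16)

for x1, x2, x3 in product((0, 1), repeat=3):
    p1 = sum(q for (a, b, c), q in joint.items() if a == x1)
    p12 = sum(q for (a, b, c), q in joint.items() if (a, b) == (x1, x2))
    cond2 = p12 / p1                   # P(X2=x2 | X1=x1)
    cond3 = joint[(x1, x2, x3)] / p12  # P(X3=x3 | X1=x1, X2=x2)
    # Chain rule: the product of conditionals recovers the joint cell.
    assert joint[(x1, x2, x3)] == cond3 * cond2 * p1
```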



