In probability theory, the chain rule (also called the general product rule) permits the calculation of any member of the joint distribution of a set of random variables using only conditional probabilities. The rule is useful in the study of Bayesian networks, which describe a probability distribution in terms of conditional probabilities.


Chain rule for events


Two events

The chain rule for two random events A and B says P(A \cap B) = P(B \mid A) \cdot P(A).


Example

This rule is illustrated in the following example. Urn 1 has 1 black ball and 2 white balls, and Urn 2 has 1 black ball and 3 white balls. Suppose we pick an urn at random and then select a ball from that urn. Let event A be choosing the first urn: P(A) = P(\overline{A}) = 1/2. Let event B be choosing a white ball. The chance of choosing a white ball, given that we have chosen the first urn, is P(B \mid A) = 2/3. Event A \cap B is their intersection: choosing the first urn and a white ball from it. The probability can be found by the chain rule for probability: \mathrm P(A \cap B) = \mathrm P(B \mid A) \cdot \mathrm P(A) = 2/3 \times 1/2 = 1/3.
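The urn calculation can be checked by simulation. The sketch below (plain Python, standard library only, with made-up helper names) estimates P(A \cap B) empirically and agrees with the chain-rule value of 1/3.

```python
import random

def trial():
    """One experiment: pick an urn uniformly at random, then a ball from it.

    Returns True exactly when event A ∩ B occurs: urn 1 was chosen
    AND the ball drawn from it is white.
    """
    if random.random() < 0.5:                      # event A: urn 1 (1 black, 2 white)
        return random.choice(["black", "white", "white"]) == "white"
    return False                                   # urn 2 chosen, so A ∩ B fails

random.seed(0)
n = 100_000
estimate = sum(trial() for _ in range(n)) / n
print(estimate)   # close to 1/3 = P(B | A) · P(A) = 2/3 × 1/2
```

The estimate converges to 1/3 as the number of trials grows, matching the chain-rule computation above.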


More than two events

For more than two events A_1,\ldots,A_n the chain rule extends to the formula \mathrm P\left(A_n \cap \ldots \cap A_1\right) = \mathrm P\left(A_n \mid A_{n-1} \cap \ldots \cap A_1\right) \cdot \mathrm P\left(A_{n-1} \cap \ldots \cap A_1\right), which by induction may be turned into \mathrm P\left(A_n \cap \ldots \cap A_1\right) = \prod_{k=1}^{n} \mathrm P\left(A_k \,\Big|\, \bigcap_{j=1}^{k-1} A_j\right).


Example

With four events (n=4), the chain rule is
\begin{align}
\mathrm P(A_1 \cap A_2 \cap A_3 \cap A_4) &= \mathrm P(A_4 \mid A_3 \cap A_2 \cap A_1) \cdot \mathrm P(A_3 \cap A_2 \cap A_1) \\
&= \mathrm P(A_4 \mid A_3 \cap A_2 \cap A_1) \cdot \mathrm P(A_3 \mid A_2 \cap A_1) \cdot \mathrm P(A_2 \cap A_1) \\
&= \mathrm P(A_4 \mid A_3 \cap A_2 \cap A_1) \cdot \mathrm P(A_3 \mid A_2 \cap A_1) \cdot \mathrm P(A_2 \mid A_1) \cdot \mathrm P(A_1)
\end{align}
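A concrete instance of the four-event product: drawing cards without replacement, the probability that the first four cards are all aces factors exactly as \mathrm P(A_1) \cdot \mathrm P(A_2 \mid A_1) \cdot \mathrm P(A_3 \mid A_2 \cap A_1) \cdot \mathrm P(A_4 \mid A_3 \cap A_2 \cap A_1). The short sketch below (an illustration, not part of the original article) computes this with exact rational arithmetic.

```python
from fractions import Fraction

# A_k = "the k-th card drawn is an ace", drawing without replacement
# from a standard 52-card deck. Each factor below is the conditional
# probability P(A_k | A_{k-1} ∩ ... ∩ A_1): after k aces are gone,
# 4-k aces remain among 52-k cards.
p = Fraction(1)
for k in range(4):
    p *= Fraction(4 - k, 52 - k)   # 4/52 · 3/51 · 2/50 · 1/49
print(p)                           # 1/270725
```

Multiplying the four conditional factors gives 24/6497400 = 1/270725, exactly as the chain rule prescribes.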


Chain rule for random variables


Two random variables

For two random variables X, Y, to find the joint distribution, we can apply the definition of conditional probability to obtain \mathrm P(X = x, Y = y) = \mathrm P(X = x \mid Y = y) \cdot \mathrm P(Y = y) for any possible values x of X and y of Y in the discrete case or, in general, \mathrm P(X \in A, Y \in B) = \mathrm P(X \in A \mid Y \in B) \cdot \mathrm P(Y \in B) for any measurable sets A and B. If one desires a notation for the probability distribution of X, one can use P_X, so that P_X(x) := P(X = x) in the discrete case or, in general, P_X(A) := P(X \in A) for a measurable set A. Note: in the examples below, expressions such as \mathrm P(X_4, X_3, X_2, X_1) are shorthand for the joint probability of the variables taking particular values; writing P(X) for a bare random variable X, or taking intersections of random variables as if they were events, is an abuse of notation and should be avoided.
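The discrete two-variable identity can be illustrated with a small probability table. The numbers below are made-up values chosen only for illustration; the construction multiplies each conditional probability by the corresponding marginal.

```python
# Build the joint distribution P(X=x, Y=y) = P(X=x | Y=y) · P(Y=y)
# from a marginal for Y and a conditional for X given Y.
# All probabilities here are invented illustrative values.
p_Y = {0: 0.4, 1: 0.6}                                    # marginal P(Y=y)
p_X_given_Y = {0: {0: 0.7, 1: 0.3},                       # P(X=x | Y=0)
               1: {0: 0.2, 1: 0.8}}                       # P(X=x | Y=1)

joint = {(x, y): p_X_given_Y[y][x] * p_Y[y]
         for y in p_Y for x in p_X_given_Y[y]}

print(joint[(0, 0)])          # P(X=0, Y=0) = 0.7 · 0.4
print(sum(joint.values()))    # total mass is 1 (up to floating-point rounding)
```

Because each conditional distribution sums to 1 over x, the resulting table is a valid joint distribution: its entries sum to 1.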


More than two random variables

Consider an indexed collection of random variables X_1, \ldots, X_n. To find the value of this member of the joint distribution, we can apply the definition of conditional probability to obtain \mathrm P\left(X_n, \ldots, X_1\right) = \mathrm P\left(X_n \mid X_{n-1}, \ldots, X_1\right) \cdot \mathrm P\left(X_{n-1}, \ldots, X_1\right). Repeating this process with each final term creates the product \mathrm P\left(X_n, \ldots, X_1\right) = \prod_{k=1}^{n} \mathrm P\left(X_k \,\Big|\, X_{k-1}, \ldots, X_1\right).


Example

With four variables (n=4), the chain rule produces this product of conditional probabilities:
\begin{align}
\mathrm P(X_4, X_3, X_2, X_1) &= \mathrm P(X_4 \mid X_3, X_2, X_1) \cdot \mathrm P(X_3, X_2, X_1) \\
&= \mathrm P(X_4 \mid X_3, X_2, X_1) \cdot \mathrm P(X_3 \mid X_2, X_1) \cdot \mathrm P(X_2, X_1) \\
&= \mathrm P(X_4 \mid X_3, X_2, X_1) \cdot \mathrm P(X_3 \mid X_2, X_1) \cdot \mathrm P(X_2 \mid X_1) \cdot \mathrm P(X_1)
\end{align}
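The four-variable factorization can be verified numerically: for any joint table, multiplying the conditional factors recovers the original joint probabilities exactly. The sketch below builds an arbitrary made-up joint distribution over four binary variables and checks the identity entry by entry.

```python
import itertools
import random

# An arbitrary (made-up) joint table over binary variables X1..X4,
# normalized so the probabilities sum to 1.
random.seed(1)
weights = {xs: random.random() for xs in itertools.product([0, 1], repeat=4)}
total = sum(weights.values())
joint = {xs: w / total for xs, w in weights.items()}

def marginal(k):
    """P(X1=x1, ..., Xk=xk), obtained by summing out X_{k+1}..X4."""
    out = {}
    for xs, p in joint.items():
        out[xs[:k]] = out.get(xs[:k], 0.0) + p
    return out

for xs in joint:
    # prod = P(x1) · P(x2|x1) · P(x3|x2,x1) · P(x4|x3,x2,x1),
    # with each conditional computed as a ratio of marginals.
    prod = 1.0
    for k in range(1, 5):
        num = marginal(k)[xs[:k]]
        den = marginal(k - 1)[xs[:k - 1]] if k > 1 else 1.0
        prod *= num / den
    assert abs(prod - joint[xs]) < 1e-12
print("chain-rule factorization matches the joint table")
```

The check succeeds for every entry because the product of conditionals telescopes back to the joint probability, which is exactly what the chain rule asserts.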



References

* Russell, Stuart J.; Norvig, Peter (2003). ''Artificial Intelligence: A Modern Approach'', p. 496.
* "The Chain Rule of Probability", ''developerWorks'', Nov 3, 2012.