Given two random variables that are defined on the same probability space, the joint probability distribution is the corresponding probability distribution on all possible pairs of outputs. The joint distribution can just as well be considered for any given number of random variables. The joint distribution encodes the marginal distributions, i.e. the distributions of each of the individual random variables. It also encodes the conditional probability distributions, which deal with how the outputs of one random variable are distributed when given information on the outputs of the other random variable(s).

In the formal mathematical setup of measure theory, the joint distribution is the pushforward of the sample space's probability measure under the map that pairs the given random variables together. In the case of real-valued random variables, the joint distribution, as a particular multivariate distribution, may be expressed by a multivariate cumulative distribution function, or by a multivariate probability density function together with a multivariate probability mass function. In the special case of continuous random variables, it is sufficient to consider probability density functions, and in the case of discrete random variables, it is sufficient to consider probability mass functions.

Examples

Draws from an urn

Suppose each of two urns contains twice as many red balls as blue balls, and no others, and suppose one ball is randomly selected from each urn, with the two draws independent of each other. Let A and B be discrete random variables associated with the outcomes of the draw from the first urn and second urn respectively. The probability of drawing a red ball from either of the urns is 2/3, and the probability of drawing a blue ball is 1/3. The joint probability distribution is presented in the following table:

              A = Red              A = Blue             P(B)
B = Red       (2/3)(2/3) = 4/9     (1/3)(2/3) = 2/9     4/9 + 2/9 = 2/3
B = Blue      (2/3)(1/3) = 2/9     (1/3)(1/3) = 1/9     2/9 + 1/9 = 1/3
P(A)          4/9 + 2/9 = 2/3      2/9 + 1/9 = 1/3      1

Each of the four inner cells shows the probability of a particular combination of results from the two draws; these probabilities are the joint distribution. In any one cell the probability of a particular combination occurring is (since the draws are independent) the product of the probability of the specified result for A and the probability of the specified result for B. The probabilities in these four cells sum to 1, as is always true for probability distributions. Moreover, the final row and the final column give the marginal probability distribution for A and the marginal probability distribution for B respectively. For example, for A the first of these cells gives the sum of the probabilities for A being red, regardless of which possibility for B in the column above the cell occurs, as 2/3. Thus the marginal probability distribution for A gives A's probabilities unconditional on B, in a margin of the table.
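The table can be reproduced with exact rational arithmetic. The sketch below is illustrative only (the colour labels and variable names are assumptions, not part of the example's notation): it builds each cell as a product of the single-draw probabilities and recovers the marginals as row and column sums.

```python
from fractions import Fraction
from itertools import product

# Marginal probabilities for a single draw: twice as many red as blue balls.
p_draw = {"red": Fraction(2, 3), "blue": Fraction(1, 3)}

# Joint pmf of (A, B): the draws are independent, so each cell is a product.
joint = {(a, b): p_draw[a] * p_draw[b] for a, b in product(p_draw, repeat=2)}

# Marginals are recovered by summing the joint pmf over the other variable,
# i.e. the row and column totals of the table.
marginal_a = {a: sum(joint[(a, b)] for b in p_draw) for a in p_draw}
marginal_b = {b: sum(joint[(a, b)] for a in p_draw) for b in p_draw}

print(joint[("red", "red")], joint[("red", "blue")], joint[("blue", "blue")])
print(sum(joint.values()), marginal_a["red"], marginal_b["blue"])
```

Using `Fraction` rather than floating point keeps the cell values exactly equal to 4/9, 2/9 and 1/9, so the check that the cells sum to 1 is exact.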

Coin flips

Consider the flip of two fair coins; let A and B be discrete random variables associated with the outcomes of the first and second coin flips respectively. Each coin flip is a Bernoulli trial and has a Bernoulli distribution. If a coin displays "heads" then the associated random variable takes the value 1, and it takes the value 0 otherwise. The probability of each of these outcomes is 1/2, so the marginal (unconditional) probability mass functions are

P(A)=1/2 \quad \text{for} \quad A\in\{0,1\};

P(B)=1/2 \quad \text{for} \quad B\in\{0,1\}.

The joint probability mass function of A and B defines probabilities for each pair of outcomes. All possible outcomes are

(A=0,B=0),
(A=0,B=1),
(A=1,B=0),
(A=1,B=1).

Since each outcome is equally likely, the joint probability mass function becomes

P(A,B)=1/4 \quad \text{for} \quad A,B\in\{0,1\}.

Since the coin flips are independent, the joint probability mass function is the product of the marginals:

P(A,B)=P(A)P(B) \quad \text{for} \quad A,B\in\{0,1\}.
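As a small check, one can enumerate the four outcomes and confirm that the uniform joint probability mass function factorizes into the two marginals. This is a minimal sketch; the dictionary layout is an illustrative choice.

```python
from fractions import Fraction
from itertools import product

# Marginal pmf of a single fair coin: 1 = heads, 0 = tails.
p_coin = {0: Fraction(1, 2), 1: Fraction(1, 2)}

# Joint pmf: the four outcomes (A, B) are equally likely.
joint = {(a, b): Fraction(1, 4) for a, b in product((0, 1), repeat=2)}

# Independence check: every joint probability equals the product of marginals.
factorizes = all(joint[(a, b)] == p_coin[a] * p_coin[b] for (a, b) in joint)
print(sorted(joint), factorizes)
```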

Rolling a die

Consider the roll of a fair die and let A=1 if the number is even (i.e. 2, 4, or 6) and A=0 otherwise. Furthermore, let B=1 if the number is prime (i.e. 2, 3, or 5) and B=0 otherwise. Then, the joint distribution of A and B, expressed as a probability mass function, is

\mathrm{P}(A=0,B=0)=P\{1\}=\frac{1}{6},\quad \quad \mathrm{P}(A=1,B=0)=P\{4,6\}=\frac{2}{6},

\mathrm{P}(A=0,B=1)=P\{3,5\}=\frac{2}{6},\quad \quad \mathrm{P}(A=1,B=1)=P\{2\}=\frac{1}{6}.

These probabilities necessarily sum to 1, since the probability of some combination of A and B occurring is 1.
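The four probabilities can be obtained mechanically by classifying each of the six equally likely faces. The sketch below is one way to do this (the `PRIMES` constant and loop structure are illustrative choices):

```python
from collections import defaultdict
from fractions import Fraction

PRIMES = {2, 3, 5}

# Accumulate the joint pmf of (A, B) over the six equally likely faces.
joint = defaultdict(Fraction)
for face in range(1, 7):
    a = 1 if face % 2 == 0 else 0     # A = 1 for an even face
    b = 1 if face in PRIMES else 0    # B = 1 for a prime face
    joint[(a, b)] += Fraction(1, 6)

print(dict(sorted(joint.items())))
print(sum(joint.values()))
```

The result reproduces the probabilities above; note that `Fraction` prints 2/6 in lowest terms as 1/3.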

Marginal probability distribution

If more than one random variable is defined in a random experiment, it is important to distinguish between the joint probability distribution of X and Y and the probability distribution of each variable individually. The individual probability distribution of a random variable is referred to as its marginal probability distribution. In general, the marginal probability distribution of X can be determined from the joint probability distribution of X and other random variables. If the joint probability density function of random variables X and Y is f_{X,Y}(x,y), the marginal probability density functions of X and Y, which define the marginal distributions, are given by

f_X(x) = \int f_{X,Y}(x,y) \; dy

f_Y(y) = \int f_{X,Y}(x,y) \; dx

where the first integral is over all points in the range of (X,Y) for which X=x and the second integral is over all points in the range of (X,Y) for which Y=y.
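Marginalization can also be carried out numerically. The following sketch assumes a hypothetical joint density f_{X,Y}(x,y) = x + y on the unit square (it integrates to 1 there), for which the exact marginal is f_X(x) = x + 1/2; the integrals over y and x are approximated with the midpoint rule.

```python
def joint_pdf(x, y):
    """Hypothetical joint density f_{X,Y}(x, y) = x + y on the unit square."""
    if 0.0 <= x <= 1.0 and 0.0 <= y <= 1.0:
        return x + y
    return 0.0

def marginal_x(x, n=1000):
    """f_X(x) = integral of f_{X,Y}(x, y) dy, via the midpoint rule on [0, 1]."""
    h = 1.0 / n
    return sum(joint_pdf(x, (j + 0.5) * h) for j in range(n)) * h

def marginal_y(y, n=1000):
    """f_Y(y) = integral of f_{X,Y}(x, y) dx, via the midpoint rule on [0, 1]."""
    h = 1.0 / n
    return sum(joint_pdf((i + 0.5) * h, y) for i in range(n)) * h

# Analytically f_X(x) = x + 1/2 on [0, 1]; the midpoint rule is exact for
# integrands that are linear in the integration variable (up to rounding).
print(marginal_x(0.3), marginal_y(0.9))
```

For densities without a convenient closed form the same idea applies, but the step count then controls the approximation error.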

Joint cumulative distribution function

For a pair of random variables X,Y, the joint cumulative distribution function (CDF) F_{X,Y} is given by

F_{X,Y}(x,y) = \operatorname{P}(X \leq x, Y \leq y)

where the right-hand side represents the probability that the random variable X takes on a value less than or equal to x and that Y takes on a value less than or equal to y.

For N random variables X_1,\ldots,X_N, the joint CDF F_{X_1,\ldots,X_N} is given by

F_{X_1,\ldots,X_N}(x_1,\ldots,x_N) = \operatorname{P}(X_1 \leq x_1,\ldots,X_N \leq x_N)

Interpreting the N random variables as a random vector \mathbf{X} = (X_1,\ldots,X_N)^T yields a shorter notation:

F_{\mathbf{X}}(\mathbf{x}) = \operatorname{P}(X_1 \leq x_1,\ldots,X_N \leq x_N)
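For discrete variables the joint CDF is a sum of probability mass over the "lower-left" region {a ≤ x, b ≤ y}. The sketch below evaluates it for the die example (A = 1 for an even face, B = 1 for a prime face); the function name is illustrative.

```python
from fractions import Fraction

# Joint pmf of the die example: A = 1 for even faces, B = 1 for prime faces.
pmf = {(0, 0): Fraction(1, 6), (1, 0): Fraction(2, 6),
       (0, 1): Fraction(2, 6), (1, 1): Fraction(1, 6)}

def joint_cdf(x, y):
    """F_{A,B}(x, y) = P(A <= x, B <= y): add up the mass in the lower-left quadrant."""
    return sum(p for (a, b), p in pmf.items() if a <= x and b <= y)

print(joint_cdf(0, 0), joint_cdf(0, 1), joint_cdf(1, 1), joint_cdf(-1, 5))
```

Because all mass sits on integer points, the resulting CDF is a step function: for example F(0.5, 0.5) equals F(0, 0).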

Joint density function or mass function

Discrete case

The joint probability mass function of two discrete random variables X, Y is

p_{X,Y}(x,y) = \mathrm{P}(X=x\ \mathrm{and}\ Y=y)

or written in terms of conditional distributions

p_{X,Y}(x,y) = \mathrm{P}(Y=y \mid X=x) \cdot \mathrm{P}(X=x) = \mathrm{P}(X=x \mid Y=y) \cdot \mathrm{P}(Y=y)

where \mathrm{P}(Y=y \mid X=x) is the probability of Y = y given that X = x.

The generalization of the preceding two-variable case is the joint probability distribution of n discrete random variables X_1, X_2, \dots, X_n, which is

p_{X_1,\ldots,X_n}(x_1,\ldots,x_n) = \mathrm{P}(X_1=x_1\ \mathrm{and}\ \dots\ \mathrm{and}\ X_n=x_n)

or equivalently

\begin{aligned}
p_{X_1,\ldots,X_n}(x_1,\ldots,x_n) & = \mathrm{P}(X_1=x_1) \cdot \mathrm{P}(X_2=x_2\mid X_1=x_1) \\ & \cdot \mathrm{P}(X_3=x_3\mid X_1=x_1,X_2=x_2) \\ & \dots \\ & \cdot \mathrm{P}(X_n=x_n\mid X_1=x_1,X_2=x_2,\dots,X_{n-1}=x_{n-1}).
\end{aligned}

This identity is known as the chain rule of probability.

Since these are probabilities, in the two-variable case

\sum_i \sum_j \mathrm{P}(X=x_i\ \mathrm{and}\ Y=y_j) = 1,

which generalizes for n discrete random variables X_1, X_2, \dots, X_n to

\sum_{i} \sum_{j} \dots \sum_{k} \mathrm{P}(X_1=x_{1i},X_2=x_{2j}, \dots, X_n=x_{nk}) = 1.
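Both the chain rule and the normalization condition can be checked on a small example. The sketch below assumes three hypothetical indicator variables on a single roll of a fair die (X1: even face, X2: prime face, X3: face greater than 3), builds their joint pmf, computes each conditional probability from the joint by summing out the remaining variables, and confirms that the chain-rule product reproduces every joint probability.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

# Hypothetical indicators on one roll of a fair die:
# X1 = face is even, X2 = face is prime, X3 = face is greater than 3.
def indicators(face):
    return (int(face % 2 == 0), int(face in {2, 3, 5}), int(face > 3))

joint = defaultdict(Fraction)
for face in range(1, 7):
    joint[indicators(face)] += Fraction(1, 6)

def prob(prefix):
    """P(X_1 = prefix[0], ..., X_k = prefix[k-1]), summing out the other variables."""
    k = len(prefix)
    return sum(p for outcome, p in joint.items() if outcome[:k] == prefix)

def chain_rule(x):
    """P(X1=x1) * P(X2=x2 | X1=x1) * P(X3=x3 | X1=x1, X2=x2)."""
    result = Fraction(1)
    for k in range(1, len(x) + 1):
        denom = prob(x[:k - 1])
        if denom == 0:
            return Fraction(0)  # conditioning event has probability zero
        result *= prob(x[:k]) / denom
    return result

outcomes = list(product((0, 1), repeat=3))
total = sum(joint.get(x, Fraction(0)) for x in outcomes)
matches = all(chain_rule(x) == joint.get(x, Fraction(0)) for x in outcomes)
print(total, matches)
```

Outcomes that never occur (such as all three indicators equal to 1) have joint probability 0, and the chain-rule product is 0 for them as well.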

Continuous case

The joint probability density function f_{X,Y}(x,y) for two continuous random variables is defined as the mixed partial derivative of the joint cumulative distribution function (see Joint cumulative distribution function above):

f_{X,Y}(x,y) = \frac{\partial^2 F_{X,Y}(x,y)}{\partial x\,\partial y}

This is equal to

f_{X,Y}(x,y) = f_{Y\mid X}(y\mid x)f_X(x) = f_{X\mid Y}(x\mid y)f_Y(y)

where f_{Y\mid X}(y\mid x) and f_{X\mid Y}(x\mid y) are the conditional distributions of Y given X=x and of X given Y=y respectively, and f_X(x) and f_Y(y) are the marginal distributions for X and Y respectively.

The definition extends naturally to more than two random variables:

f_{X_1,\ldots,X_n}(x_1,\ldots,x_n) = \frac{\partial^n F_{X_1,\ldots,X_n}(x_1,\ldots,x_n)}{\partial x_1 \ldots \partial x_n}

Again, since these are probability distributions, one has

\int_x \int_y f_{X,Y}(x,y) \; dy \; dx = 1

respectively

\int_{x_1} \ldots \int_{x_n} f_{X_1,\ldots,X_n}(x_1,\ldots,x_n) \; dx_n \ldots \; dx_1 = 1
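The relation between the joint CDF and the joint density can be checked numerically. This sketch assumes, purely for illustration, that X and Y are independent Exponential(1) variables, so F_{X,Y}(x,y) = (1 − e^{−x})(1 − e^{−y}) and f_{X,Y}(x,y) = e^{−x−y} for x, y ≥ 0. It estimates the mixed partial derivative with a central difference and approximates the total mass with a midpoint rule truncated at 20 (the neglected tail is about e^{−20}).

```python
import math

# Hypothetical example: X, Y independent Exponential(1).
def joint_cdf(x, y):
    """F_{X,Y}(x, y) = (1 - e^{-x})(1 - e^{-y}) for x, y > 0, else 0."""
    if x <= 0 or y <= 0:
        return 0.0
    return (1.0 - math.exp(-x)) * (1.0 - math.exp(-y))

def joint_pdf(x, y):
    """f_{X,Y}(x, y) = e^{-x-y} for x, y >= 0, else 0."""
    return math.exp(-x - y) if x >= 0 and y >= 0 else 0.0

def mixed_partial(F, x, y, h=1e-3):
    """Central-difference estimate of d^2 F / (dx dy) at (x, y)."""
    return (F(x + h, y + h) - F(x + h, y - h)
            - F(x - h, y + h) + F(x - h, y - h)) / (4.0 * h * h)

def total_mass(f, upper=20.0, n=400):
    """Midpoint-rule estimate of the double integral of f over [0, upper]^2."""
    h = upper / n
    return sum(f((i + 0.5) * h, (j + 0.5) * h)
               for i in range(n) for j in range(n)) * h * h

print(mixed_partial(joint_cdf, 0.5, 1.0), joint_pdf(0.5, 1.0))
print(total_mass(joint_pdf))
```

The finite-difference step `h` trades truncation error against rounding error; with h = 1e-3 the two printed density values should agree to several decimal places.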

Mixed case

The "mixed joint density" may be defined where one or more random variables are continuous and the other random variables are discrete. With one variable of each type,

\begin{aligned}
f_{X,Y}(x,y) = f_{X \mid Y}(x \mid y)\mathrm{P}(Y=y)= \mathrm{P}(Y=y \mid X=x) f_X(x).
\end{aligned}

One example of a situation in which one may wish to find the cumulative distribution of one random variable which is continuous and another random variable which is discrete arises when one wishes to use a logistic regression in predicting the probability of a binary outcome Y conditional on the value of a continuously distributed variable X. One must use the "mixed" joint density when finding the cumulative distribution of this binary outcome, because the pair (X,Y) is defined in such a way that it can be assigned neither a joint probability density function nor a joint probability mass function. Formally, f_{X,Y}(x,y) is the probability density function of (X,Y) with respect to the product measure on the respective supports of X and Y (Lebesgue measure for X and counting measure for Y). Either of these two decompositions can then be used to recover the joint cumulative distribution function:

\begin{aligned}
F_{X,Y}(x,y)&=\sum\limits_{t\le y}\int_{s=-\infty}^x f_{X,Y}(s,t)\;ds.
\end{aligned}

The definition generalizes to a mixture of arbitrary numbers of discrete and continuous random variables.
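A logistic-regression-style mixed pair can be sketched directly from the decomposition f_{X,Y}(x,y) = P(Y=y ∣ X=x) f_X(x). The model below is hypothetical: X is standard normal and P(Y=1 ∣ X=x) = σ(β₀ + β₁x) with illustrative coefficients β₀ = 0 and β₁ = 1. The joint CDF is computed by summing over t ≤ y and integrating over s with composite Simpson's rule, using −10 as a stand-in for −∞.

```python
import math

# Hypothetical model: X ~ Normal(0, 1), P(Y=1 | X=x) = sigmoid(BETA0 + BETA1 * x).
BETA0, BETA1 = 0.0, 1.0

def normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def mixed_density(x, y):
    """f_{X,Y}(x, y) = P(Y=y | X=x) * f_X(x): a density in x, a mass in y."""
    p1 = sigmoid(BETA0 + BETA1 * x)
    return (p1 if y == 1 else 1.0 - p1) * normal_pdf(x)

def simpson(g, lo, hi, n=2000):
    """Composite Simpson's rule on [lo, hi] with n (even) subintervals."""
    h = (hi - lo) / n
    total = g(lo) + g(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(lo + i * h)
    return total * h / 3.0

def joint_cdf(x, y, lower=-10.0):
    """F_{X,Y}(x, y) = sum over t <= y of the integral of f_{X,Y}(s, t) ds up to x.
    The lower limit -10 stands in for -infinity (the normal tail there is negligible)."""
    return sum(simpson(lambda s, t=t: mixed_density(s, t), lower, x)
               for t in (0, 1) if t <= y)

print(joint_cdf(0.0, 1), joint_cdf(8.0, 0), joint_cdf(8.0, 1))
```

With β₀ = 0 the logistic curve is symmetric about zero, so P(Y = 0) = 1/2; summing over both values of Y recovers the marginal CDF of X, so F(0, 1) = P(X ≤ 0) = 1/2.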

Additional properties

Joint distribution for independent variables

In general, two random variables X and Y are independent if and only if the joint cumulative distribution function satisfies

F_{X,Y}(x,y) = F_X(x) \cdot F_Y(y)

Two discrete random variables X and Y are independent if and only if the joint probability mass function satisfies

P(X = x \ \mbox{and} \ Y = y ) = P( X = x) \cdot P( Y = y)

for all x and y. As the number of independent random events grows, the probability that all of them occur decreases rapidly toward zero: it is the product of the individual probabilities, so if each event has probability at most p < 1, the joint probability of n such events is at most p^n, which decays exponentially in n.

Similarly, two absolutely continuous random variables are independent if and only if

f_{X,Y}(x,y) = f_X(x) \cdot f_Y(y)

for all x and y. This means that acquiring any information about the value of one or more of the random variables leads to a conditional distribution of any other variable that is identical to its unconditional (marginal) distribution; thus no variable provides any information about any other variable.
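The discrete criterion translates into a direct test: compute both marginals from the joint probability mass function and compare every cell with the product of its marginals. The sketch below applies it to the two-coin example, which passes, and to the die example (A = 1 for an even face, B = 1 for a prime face), which fails because P(A=0, B=0) = 1/6 while P(A=0)P(B=0) = 1/4. Function names are illustrative.

```python
from collections import defaultdict
from fractions import Fraction

def marginals(joint):
    """Return the marginal pmfs of X and Y from a joint pmf {(x, y): p}."""
    px, py = defaultdict(Fraction), defaultdict(Fraction)
    for (x, y), p in joint.items():
        px[x] += p
        py[y] += p
    return px, py

def is_independent(joint):
    """True if P(X=x, Y=y) == P(X=x) * P(Y=y) for every x and y in the supports."""
    px, py = marginals(joint)
    return all(joint.get((x, y), Fraction(0)) == px[x] * py[y]
               for x in px for y in py)

coins = {(a, b): Fraction(1, 4) for a in (0, 1) for b in (0, 1)}
die = {(0, 0): Fraction(1, 6), (1, 0): Fraction(2, 6),
       (0, 1): Fraction(2, 6), (1, 1): Fraction(1, 6)}
print(is_independent(coins), is_independent(die))
```

Exact fractions make the equality test reliable; with floating-point probabilities a tolerance would be needed instead.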

Joint distribution for conditionally dependent variables

If a subset A of the variables X_1,\ldots,X_n is conditionally dependent given another subset B of these variables (with A and B together making up all of the variables), then the probability mass function of the joint distribution, \mathrm{P}(X_1,\ldots,X_n), is equal to P(B)\cdot P(A\mid B). Therefore, it can be efficiently represented by the lower-dimensional probability distributions P(B) and P(A\mid B). Such conditional independence relations can be represented with a Bayesian network or copula functions.
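A minimal sketch of the factorization P(B)·P(A∣B), under stated assumptions: all variables are hypothetical binary variables with illustrative probabilities, B = {X1} and A = {X2, X3}, and X2 and X3 are additionally assumed conditionally independent given X1, so that P(A∣B) = P(X2∣X1)·P(X3∣X1). This is the structure of a three-node Bayesian network X2 ← X1 → X3.

```python
from fractions import Fraction
from itertools import product

# Hypothetical factors: B = {X1}, A = {X2, X3}, with X2 and X3 assumed
# conditionally independent given X1 (Bayesian network X2 <- X1 -> X3).
p_x1 = {0: Fraction(3, 5), 1: Fraction(2, 5)}
p_x2_given_x1 = {0: {0: Fraction(9, 10), 1: Fraction(1, 10)},
                 1: {0: Fraction(1, 4), 1: Fraction(3, 4)}}
p_x3_given_x1 = {0: {0: Fraction(1, 2), 1: Fraction(1, 2)},
                 1: {0: Fraction(1, 5), 1: Fraction(4, 5)}}

# Joint pmf P(X1, X2, X3) = P(B) * P(A | B), assembled from the small factors.
joint = {
    (x1, x2, x3): p_x1[x1] * p_x2_given_x1[x1][x2] * p_x3_given_x1[x1][x3]
    for x1, x2, x3 in product((0, 1), repeat=3)
}

# Free parameters: a full table over 3 binary variables needs 2**3 - 1 = 7,
# while the factored form needs 1 (X1) + 2 (X2 | X1) + 2 (X3 | X1) = 5.
print(sum(joint.values()), joint[(1, 1, 1)])
```

The saving here is small, but for many variables with few parents each the factored form grows far more slowly than the full joint table.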

Covariance

When two or more random variables are defined on a probability space, it is useful to describe how they vary together; that is, it is useful to measure the relationship between the variables. A common measure of the relationship between two random variables is the covariance. Covariance is a measure of linear relationship between the random variables. If the relationship between the random variables is nonlinear, the covariance might not be sensitive to it: for example, if X takes the values −1, 0 and 1 with equal probability and Y = X², then the covariance of X and Y is zero even though Y is completely determined by X. The covariance between the random variables X and Y, denoted as cov(X,Y), is

\operatorname{cov}(X,Y) = \sigma_{XY} = E[(X-\mu_X)(Y-\mu_Y)] = E(XY)-\mu_X\mu_Y

where \mu_X = E(X) and \mu_Y = E(Y).
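For a discrete joint distribution, the formula cov(X,Y) = E(XY) − μ_X μ_Y can be evaluated directly from the probability mass function. The sketch below uses two illustrative distributions: the die example (A = 1 for an even face, B = 1 for a prime face), and X uniform on {−1, 0, 1} with Y = X², where the dependence is purely nonlinear.

```python
from fractions import Fraction

def expectation(pmf, g):
    """E[g(X, Y)] for a discrete joint pmf {(x, y): p}."""
    return sum(p * g(x, y) for (x, y), p in pmf.items())

def covariance(pmf):
    """cov(X, Y) = E(XY) - mu_X * mu_Y."""
    mu_x = expectation(pmf, lambda x, y: x)
    mu_y = expectation(pmf, lambda x, y: y)
    return expectation(pmf, lambda x, y: x * y) - mu_x * mu_y

# Die example: A = 1 for an even face, B = 1 for a prime face.
die = {(0, 0): Fraction(1, 6), (1, 0): Fraction(2, 6),
       (0, 1): Fraction(2, 6), (1, 1): Fraction(1, 6)}
# Nonlinear dependence: X uniform on {-1, 0, 1} and Y = X**2.
square = {(x, x * x): Fraction(1, 3) for x in (-1, 0, 1)}

print(covariance(die), covariance(square))
```

For the die, E(AB) = 1/6 and μ_A = μ_B = 1/2, giving cov(A,B) = 1/6 − 1/4 = −1/12. For the second distribution the covariance is exactly 0 even though Y is a function of X.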

Correlation

There is another measure of the relationship between two random variables that is often easier to interpret than the covariance. The correlation scales the covariance by the product of the standard deviations of the two variables. Consequently, the correlation is a dimensionless quantity that can be used to compare the linear relationships between pairs of variables in different units. If the points in the joint probability distribution of X and Y that receive positive probability tend to fall along a line of positive (or negative) slope, ρ_XY is near +1 (or −1). If ρ_XY equals +1 or −1, it can be shown that the points in the joint probability distribution that receive positive probability fall exactly along a straight line. Two random variables with nonzero correlation are said to be correlated. Like the covariance, the correlation is a measure of the linear relationship between random variables. The correlation between random variables X and Y, denoted as ρ_XY, is

\rho_{XY}=\frac{\operatorname{cov}(X,Y)}{\sqrt{V(X)V(Y)}}=\frac{\sigma_{XY}}{\sigma_X\sigma_Y}

where V(X) = \sigma_X^2 and V(Y) = \sigma_Y^2 are the variances of X and Y.
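The correlation can be computed from the same moments as the covariance. This sketch applies ρ_XY = cov(X,Y)/√(V(X)V(Y)) to the die example, and to a hypothetical distribution in which X is uniform on {1, 2, 3} and Y = 3 − 2X, so that all the probability lies on a line of negative slope and ρ_XY should equal −1.

```python
import math
from fractions import Fraction

def moments(pmf):
    """Return (cov(X, Y), V(X), V(Y)) for a discrete joint pmf {(x, y): p}."""
    def e(g):
        return sum(p * g(x, y) for (x, y), p in pmf.items())
    mu_x, mu_y = e(lambda x, y: x), e(lambda x, y: y)
    cov = e(lambda x, y: x * y) - mu_x * mu_y
    var_x = e(lambda x, y: x * x) - mu_x ** 2
    var_y = e(lambda x, y: y * y) - mu_y ** 2
    return cov, var_x, var_y

def correlation(pmf):
    """rho_XY = cov(X, Y) / sqrt(V(X) V(Y))."""
    cov, var_x, var_y = moments(pmf)
    return float(cov) / math.sqrt(float(var_x * var_y))

# Die example: A = 1 for an even face, B = 1 for a prime face.
die = {(0, 0): Fraction(1, 6), (1, 0): Fraction(2, 6),
       (0, 1): Fraction(2, 6), (1, 1): Fraction(1, 6)}
# Exact linear relationship: X uniform on {1, 2, 3} and Y = 3 - 2X.
line = {(x, 3 - 2 * x): Fraction(1, 3) for x in (1, 2, 3)}

print(correlation(die), correlation(line))
```

For the die, cov(A,B) = −1/12 and V(A) = V(B) = 1/4, so ρ = (−1/12)/(1/4) = −1/3: a weak negative linear association, measured on a unit-free scale.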

Important named distributions

Named joint distributions that arise frequently in statistics include the multivariate normal distribution, the multivariate stable distribution, the multinomial distribution, the negative multinomial distribution, the multivariate hypergeometric distribution, and the elliptical distribution.
