Beliefs depend on the available information. This idea is formalized in probability theory by conditioning. Conditional probabilities, conditional expectations, and conditional probability distributions are treated on three levels: discrete probabilities, probability density functions, and measure theory. Conditioning leads to a non-random result if the condition is completely specified; otherwise, if the condition is left random, the result of conditioning is also random.
Conditioning on the discrete level
Example: A fair coin is tossed 10 times; the random variable ''X'' is the number of heads in these 10 tosses, and ''Y'' is the number of heads in the first 3 tosses. In spite of the fact that ''Y'' emerges before ''X'', it may happen that someone knows ''X'' but not ''Y''.
Conditional probability
Given that ''X'' = 1, the conditional probability of the event ''Y'' = 0 is
:<math> P (Y=0 \mid X=1) = \frac{ P (Y=0,\ X=1)}{ P (X=1)} = \frac{7/2^{10}}{10/2^{10}} = 0.7. </math>
More generally,
:<math> P (Y=0 \mid X=x) = \frac{\binom 7 x}{\binom{10} x} </math>
for 0 ≤ ''x'' ≤ 7; for 8 ≤ ''x'' ≤ 10 the conditional probability is 0, since more than 7 heads cannot all avoid the first 3 tosses. One may also treat the conditional probability as a random variable, a function of the random variable ''X'', namely,
:<math> g(X), \quad \text{where } g(x) = P (Y=0 \mid X=x). </math>
The expectation of this random variable is equal to the (unconditional) probability,
:<math> E ( g(X) ) = P (Y=0), </math>
namely,
:<math> \sum_{x=0}^{10} P (Y=0 \mid X=x) \, P (X=x) = P (Y=0) = \frac18, </math>
which is an instance of the law of total probability. Thus, P(''Y'' = 0 | ''X'' = 1) = 0.7 may be treated as the value of the random variable P(''Y'' = 0 | ''X'') corresponding to ''X'' = 1. On the other hand, P(''Y'' = 0 | ''X'' = 1) = 0.7 is well-defined irrespective of other possible values of ''X''.
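With only 2<sup>10</sup> = 1024 equally likely toss sequences, the conditional probabilities and the law of total probability above can be verified by exhaustive enumeration. A minimal Python sketch with exact rational arithmetic (the function and variable names are illustrative only):

```python
from itertools import product
from fractions import Fraction

# All 2^10 equally likely outcomes of 10 fair coin tosses (1 = heads).
outcomes = list(product([0, 1], repeat=10))

def cond_prob_Y0_given_X(x):
    """P(Y = 0 | X = x): no heads among the first 3 tosses, given x heads in all 10."""
    matching = [w for w in outcomes if sum(w) == x]
    favorable = [w for w in matching if sum(w[:3]) == 0]
    return Fraction(len(favorable), len(matching))

print(cond_prob_Y0_given_X(1))  # 7/10

# Law of total probability: sum over x of P(Y=0|X=x) P(X=x) equals P(Y=0) = 1/8.
# Terms with x > 7 vanish, so summing over x = 0..7 suffices.
total = sum(cond_prob_Y0_given_X(x) * Fraction(sum(1 for w in outcomes if sum(w) == x), 1024)
            for x in range(8))
print(total)  # 1/8
```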
Conditional expectation
Given that ''X'' = 1, the conditional expectation of the random variable ''Y'' is
:<math> E (Y \mid X=1) = 0.3. </math>
More generally,
:<math> E (Y \mid X=x) = 0.3 x </math>
for 0 ≤ ''x'' ≤ 10. (In this example it appears to be a linear function, but in general it is nonlinear.) One may also treat the conditional expectation as a random variable, a function of the random variable ''X'', namely,
:<math> E (Y \mid X) = 0.3 X. </math>
The expectation of this random variable is equal to the (unconditional) expectation of ''Y'',
:<math> E ( E (Y \mid X) ) = E (Y), </math>
namely,
:<math> \sum_{x=0}^{10} E (Y \mid X=x) \, P (X=x) = E (Y), </math>
or simply
:<math> E (0.3 X) = 0.3 \, E (X) = 0.3 \cdot 5 = 1.5 = E (Y), </math>
which is an instance of the
law of total expectation.
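The law of total expectation can likewise be checked by enumerating all 1024 toss sequences. The Python sketch below (helper names are illustrative) confirms E(''Y'' | ''X'' = ''x'') = 0.3''x'' and E(E(''Y'' | ''X'')) = E(''Y'') = 1.5:

```python
from itertools import product
from fractions import Fraction

# All 2^10 equally likely outcomes of 10 fair coin tosses (1 = heads).
outcomes = list(product([0, 1], repeat=10))

def cond_exp_Y_given_X(x):
    """E(Y | X = x): average number of heads among the first 3 tosses."""
    matching = [w for w in outcomes if sum(w) == x]
    return Fraction(sum(sum(w[:3]) for w in matching), len(matching))

# The conditional expectation is the linear function 0.3 x for every x.
assert all(cond_exp_Y_given_X(x) == Fraction(3, 10) * x for x in range(11))

# Law of total expectation: E(E(Y|X)) = E(Y) = 3 * 0.5 = 1.5.
e_y = sum(cond_exp_Y_given_X(x) * Fraction(sum(1 for w in outcomes if sum(w) == x), 1024)
          for x in range(11))
print(e_y)  # 3/2
```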
The random variable E(''Y'' | ''X'') is the best predictor of ''Y'' given ''X''. That is, it minimizes the mean square error
:<math> E ( Y - f(X) )^2 </math>
on the class of all random variables of the form ''f''(''X''). This class of random variables remains intact if ''X'' is replaced, say, with 2''X''. Thus,
:<math> E (Y \mid 2X) = E (Y \mid X). </math>
It does not mean that E(''Y'' | 2''X'') = 0.3 × 2''X''; rather, E(''Y'' | 2''X'') = 0.15 × 2''X'' = 0.3''X''. In particular, E(''Y'' | 2''X'' = 2) = 0.3. More generally,
:<math> E (Y \mid g(X)) = E (Y \mid X) </math>
for every function ''g'' that is one-to-one on the set of all possible values of ''X''. The values of ''X'' are irrelevant; what matters is the partition (denote it α<sub>''X''</sub>)
:<math> \Omega = \{ X = x_1 \} \uplus \{ X = x_2 \} \uplus \dots </math>
of the sample space Ω into disjoint sets {''X'' = ''x<sub>n</sub>''}. (Here <math>x_1, x_2, \dots</math> are all possible values of ''X''.) Given an arbitrary partition α of Ω, one may define the random variable E ( ''Y'' | α ). Still, E ( E ( ''Y'' | α ) ) = E ( ''Y'' ).
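That conditioning depends only on the partition generated by a random variable, not on its values, can be illustrated by grouping the toss sequences by ''X'' and by 2''X'': the cells coincide, so the conditional expectations do too. A sketch (helper names are illustrative):

```python
from itertools import product
from fractions import Fraction

outcomes = list(product([0, 1], repeat=10))  # 1 = heads, all equally likely

def cond_exp_given_partition(key):
    """E(Y | alpha) for the partition of outcomes induced by `key`,
    returned as a dict mapping each cell label to the conditional expectation."""
    cells = {}
    for w in outcomes:
        cells.setdefault(key(w), []).append(w)
    return {c: Fraction(sum(sum(w[:3]) for w in ws), len(ws)) for c, ws in cells.items()}

by_X = cond_exp_given_partition(lambda w: sum(w))       # condition on X
by_2X = cond_exp_given_partition(lambda w: 2 * sum(w))  # condition on g(X) = 2X
# Same partition of the sample space, hence the same conditional expectations.
assert all(by_X[x] == by_2X[2 * x] for x in range(11))
```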
Conditional probability may be treated as a special case of conditional expectation. Namely, P ( ''A'' | ''X'' ) = E ( ''Y'' | ''X'' ) if ''Y'' is the indicator of ''A''. Therefore the conditional probability also depends on the partition α<sub>''X''</sub> generated by ''X'' rather than on ''X'' itself; P ( ''A'' | ''g''(''X'') ) = P ( ''A'' | ''X'' ) = P ( ''A'' | α ), α = α<sub>''X''</sub> = α<sub>''g''(''X'')</sub>. On the other hand, conditioning on an event ''B'' is well-defined, provided that P(''B'') ≠ 0, irrespective of any partition that may contain ''B'' as one of several parts.
Conditional distribution
Given ''X'' = x, the conditional distribution of ''Y'' is
:<math> P (Y=y \mid X=x) = \frac{\binom 3 y \binom 7 {x-y}}{\binom{10} x} </math>
for 0 ≤ ''y'' ≤ min ( 3, ''x'' ). It is the hypergeometric distribution H ( ''x''; 3, 7 ), or equivalently, H ( 3; ''x'', 10 − ''x'' ). The corresponding expectation 0.3''x'', obtained from the general formula
:<math> n \frac{R}{R+W} </math>
for H ( ''n''; ''R'', ''W'' ), is nothing but the conditional expectation E (''Y'' | ''X'' = ''x'') = 0.3''x''.
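The hypergeometric form of the conditional distribution can be confirmed against direct enumeration; the sketch below compares the counted frequencies with the formula for H(''x''; 3, 7) and checks its mean (names are illustrative):

```python
from itertools import product
from fractions import Fraction
from math import comb

outcomes = list(product([0, 1], repeat=10))  # 1 = heads, all equally likely

def cond_dist(y, x):
    """P(Y = y | X = x) by direct enumeration."""
    matching = [w for w in outcomes if sum(w) == x]
    return Fraction(sum(1 for w in matching if sum(w[:3]) == y), len(matching))

# Agreement with the hypergeometric distribution H(x; 3, 7).
for x in range(11):
    for y in range(min(3, x) + 1):
        assert cond_dist(y, x) == Fraction(comb(3, y) * comb(7, x - y), comb(10, x))

# Its mean reproduces the conditional expectation 0.3 x.
assert all(sum(y * cond_dist(y, x) for y in range(4)) == Fraction(3, 10) * x
           for x in range(11))
```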
Treating H ( ''X''; 3, 7 ) as a random distribution (a random vector in the four-dimensional space of all measures on {0, 1, 2, 3}), one may take its expectation, getting the unconditional distribution of ''Y'', namely the binomial distribution Bin ( 3, 0.5 ). This fact amounts to the equality
:<math> \sum_{x=0}^{10} P (Y=y \mid X=x) \, P (X=x) = P (Y=y) = \frac{\binom 3 y}{8} </math>
for ''y'' = 0, 1, 2, 3, which is an instance of the law of total probability.
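The equality above, mixing the conditional hypergeometric distributions with weights P(''X'' = ''x'') to recover Bin(3, 0.5), can also be checked mechanically (names are illustrative):

```python
from itertools import product
from fractions import Fraction
from math import comb

outcomes = list(product([0, 1], repeat=10))  # 1 = heads, all equally likely

def cond_dist(y, x):
    """P(Y = y | X = x) by direct enumeration."""
    matching = [w for w in outcomes if sum(w) == x]
    return Fraction(sum(1 for w in matching if sum(w[:3]) == y), len(matching))

# Marginal distribution of X.
p_x = {x: Fraction(sum(1 for w in outcomes if sum(w) == x), 1024) for x in range(11)}

# The mixture of the conditional distributions is the binomial Bin(3, 1/2).
for y in range(4):
    mixture = sum(cond_dist(y, x) * p_x[x] for x in range(11))
    assert mixture == Fraction(comb(3, y), 8)
```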
Conditioning on the level of densities
Example. A point of the sphere ''x''<sup>2</sup> + ''y''<sup>2</sup> + ''z''<sup>2</sup> = 1 is chosen at random according to the uniform distribution on the sphere. The random variables ''X'', ''Y'', ''Z'' are the coordinates of the random point. The joint density of ''X'', ''Y'', ''Z'' does not exist (since the sphere is of zero volume), but the joint density ''f''<sub>''X'',''Y''</sub> of ''X'', ''Y'' exists,
:<math> f_{X,Y}(x,y) = \begin{cases} \frac{1}{2\pi\sqrt{1-x^2-y^2}} & \text{if } x^2+y^2<1, \\ 0 & \text{otherwise.} \end{cases} </math>
(The density is non-constant because of a non-constant angle between the sphere and the plane.) The density of ''X'' may be calculated by integration,
:<math> f_X(x) = \int_{-\sqrt{1-x^2}}^{+\sqrt{1-x^2}} f_{X,Y}(x,y) \, \mathrm{d}y; </math>
surprisingly, the result does not depend on ''x'' in (−1, 1),
:<math> f_X(x) = \begin{cases} 0.5 & \text{for } -1<x<1, \\ 0 & \text{otherwise.} \end{cases} </math>
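The surprising uniformity of ''X'' on (−1, 1) can be checked by simulation. The sketch below (the sample size and bin count are arbitrary choices) draws uniform points on the sphere by normalizing vectors of independent Gaussians:

```python
import math
import random

random.seed(0)
N = 200_000
xs = []
for _ in range(N):
    # A normalized vector of three independent standard normals is uniformly
    # distributed on the unit sphere.
    gx, gy, gz = random.gauss(0, 1), random.gauss(0, 1), random.gauss(0, 1)
    r = math.sqrt(gx * gx + gy * gy + gz * gz)
    xs.append(gx / r)

# If X is uniform on (-1, 1), each of 10 equal bins holds about N/10 samples.
bins = [0] * 10
for x in xs:
    bins[min(int((x + 1) * 5), 9)] += 1
print([round(b / N, 3) for b in bins])  # every entry close to 0.100
```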