The posterior probability is a type of conditional probability that results from updating the prior probability with information summarized by the likelihood via an application of Bayes' rule. From an epistemological perspective, the posterior probability contains everything there is to know about an uncertain proposition (such as a scientific hypothesis, or parameter values), given prior knowledge and a mathematical model describing the observations available at a particular time. After the arrival of new information, the current posterior probability may serve as the prior in another round of Bayesian updating.
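To make this updating cycle concrete, consider a minimal sketch (an illustration with made-up flip counts, not from the article's sources) of a Beta-Binomial model: because the Beta prior is conjugate to the Binomial likelihood, each posterior is again a Beta distribution and can serve directly as the prior for the next batch of observations.

<syntaxhighlight lang="python">
# Sequential Bayesian updating with a Beta-Binomial model (illustrative sketch).
# A Beta(a, b) prior on a coin's heads-probability is conjugate to the
# Binomial likelihood, so each update simply adds observed counts.

def update(a, b, heads, tails):
    """Return the Beta posterior parameters after observing new flips."""
    return a + heads, b + tails

a, b = 1.0, 1.0              # flat Beta(1, 1) prior
a, b = update(a, b, 7, 3)    # first batch: 7 heads, 3 tails -> Beta(8, 4)
a, b = update(a, b, 2, 8)    # the posterior becomes the prior for batch two
print(a, b)                  # Beta(10, 12): all 20 flips are accounted for
print(a / (a + b))           # posterior mean of the heads-probability, ~0.455
</syntaxhighlight>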
In the context of Bayesian statistics, the posterior probability distribution usually describes the epistemic uncertainty about statistical parameters conditional on a collection of observed data. From a given posterior distribution, various point and interval estimates can be derived, such as the maximum a posteriori (MAP) estimate or the highest posterior density interval (HPDI). But while conceptually simple, the posterior distribution is generally not tractable and therefore needs to be either analytically or numerically approximated.
Definition in the distributional case
In Bayesian statistics, the posterior probability is the probability of the parameters <math>\theta</math> given the evidence <math>X</math>, and is denoted <math>p(\theta \mid X)</math>.

It contrasts with the likelihood function, which is the probability of the evidence given the parameters: <math>p(X \mid \theta)</math>.

The two are related as follows. Given a prior belief that a probability distribution function is <math>p(\theta)</math> and that the observations <math>x</math> have a likelihood <math>p(x \mid \theta)</math>, the posterior probability is defined as

:<math>p(\theta \mid x) = \frac{p(x \mid \theta)}{p(x)} p(\theta),</math>

where <math>p(x)</math> is the normalizing constant, calculated as

:<math>p(x) = \int p(x \mid \theta)\, p(\theta)\, d\theta</math>

for continuous <math>\theta</math>, or by summing <math>p(x \mid \theta)\, p(\theta)</math> over all possible values of <math>\theta</math> for discrete <math>\theta</math>.

The posterior probability is therefore proportional to the product ''likelihood · prior probability''.
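As a minimal numerical sketch of this definition (the parameter values and likelihoods below are made-up illustrations), the discrete case can be computed directly: multiply likelihood by prior, sum to get the normalizing constant, and divide.

<syntaxhighlight lang="python">
# Posterior over a discrete parameter: posterior = likelihood * prior / p(x).
# The parameter values and likelihoods are arbitrary illustrative numbers.

thetas = [0.2, 0.5, 0.8]          # possible parameter values
prior = [1/3, 1/3, 1/3]           # uniform prior p(theta)
likelihood = [0.05, 0.25, 0.40]   # p(x | theta) for the observed data x

unnormalized = [l * p for l, p in zip(likelihood, prior)]
p_x = sum(unnormalized)           # normalizing constant: sum over all thetas
posterior = [u / p_x for u in unnormalized]
print(dict(zip(thetas, posterior)))  # sums to 1; mass shifts toward theta = 0.8
</syntaxhighlight>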
Example
Suppose there is a school with 60% boys and 40% girls as students. The girls wear trousers or skirts in equal numbers; all boys wear trousers. An observer sees a (random) student from a distance; all the observer can see is that this student is wearing trousers. What is the probability this student is a girl? The correct answer can be computed using Bayes' theorem.
The event <math>G</math> is that the student observed is a girl, and the event <math>T</math> is that the student observed is wearing trousers. To compute the posterior probability <math>P(G \mid T)</math>, we first need to know:
* <math>P(G)</math>, or the probability that the student is a girl regardless of any other information. Since the observer sees a random student, meaning that all students have the same probability of being observed, and the percentage of girls among the students is 40%, this probability equals 0.4.
* <math>P(G')</math>, or the probability that the student is not a girl (i.e. a boy) regardless of any other information (<math>G'</math> is the complementary event to <math>G</math>). This is 60%, or 0.6.
* <math>P(T \mid G)</math>, or the probability of the student wearing trousers given that the student is a girl. As girls are as likely to wear skirts as trousers, this is 0.5.
* <math>P(T \mid G')</math>, or the probability of the student wearing trousers given that the student is a boy. This is given as 1.
* <math>P(T)</math>, or the probability of a (randomly selected) student wearing trousers regardless of any other information. Since <math>P(T) = P(T \mid G)P(G) + P(T \mid G')P(G')</math> (via the law of total probability), this is <math>P(T) = 0.5 \times 0.4 + 1 \times 0.6 = 0.8</math>.

Given all this information, the posterior probability of the observer having spotted a girl given that the observed student is wearing trousers can be computed by substituting these values in the formula:

:<math>P(G \mid T) = \frac{P(T \mid G)\, P(G)}{P(T)} = \frac{0.5 \times 0.4}{0.8} = 0.25.</math>
An intuitive way to solve this is to assume the school has ''N'' students. The number of boys is 0.6''N'' and the number of girls is 0.4''N''. If ''N'' is sufficiently large, the total number of trouser wearers is 0.6''N'' + 50% of 0.4''N'', and the number of girl trouser wearers is 50% of 0.4''N''. Therefore, among trouser wearers, the proportion of girls is (50% of 0.4''N'')/(0.6''N'' + 50% of 0.4''N'') = 25%. In other words, if you separated out the group of trouser wearers, a quarter of that group would be girls. Therefore, if you see trousers, the most you can deduce is that you are looking at a single sample from a subset of students of whom 25% are girls, and by definition the chance of this random student being a girl is 25%. Every problem involving Bayes' theorem can be solved in this way.
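The same arithmetic can be spelled out in a few lines of Python (a direct transcription of the numbers above):

<syntaxhighlight lang="python">
# Bayes' theorem for the trousers example: P(G | T) = P(T | G) P(G) / P(T).

p_girl = 0.4            # P(G)
p_boy = 0.6             # P(G'), the complement
p_trousers_girl = 0.5   # P(T | G)
p_trousers_boy = 1.0    # P(T | G')

# Law of total probability gives the denominator.
p_trousers = p_trousers_girl * p_girl + p_trousers_boy * p_boy  # 0.8

posterior = p_trousers_girl * p_girl / p_trousers
print(posterior)        # 0.25
</syntaxhighlight>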
Calculation
The posterior probability distribution of one random variable given the value of another can be calculated with Bayes' theorem by multiplying the prior probability distribution by the likelihood function, and then dividing by the normalizing constant, as follows:

:<math>f_{X \mid Y = y}(x) = \frac{f_X(x)\, L_{X \mid Y = y}(x)}{\int_{-\infty}^{\infty} f_X(u)\, L_{X \mid Y = y}(u)\, du}</math>

gives the posterior probability density function for a random variable <math>X</math> given the data <math>Y = y</math>, where
* <math>f_X(x)</math> is the prior density of <math>X</math>,
* <math>L_{X \mid Y = y}(x) = f_{Y \mid X = x}(y)</math> is the likelihood function as a function of <math>x</math>,
* <math>\int_{-\infty}^{\infty} f_X(u)\, L_{X \mid Y = y}(u)\, du</math> is the normalizing constant, and
* <math>f_{X \mid Y = y}(x)</math> is the posterior density of <math>X</math> given the data <math>Y = y</math>.
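When the normalizing integral has no closed form, the posterior density can be approximated numerically. The following sketch (an illustration under assumed model choices: a Beta(2, 2) prior and 6 successes in 9 Bernoulli trials, not a method prescribed by this article) evaluates the numerator on a grid and normalizes with a simple Riemann sum:

<syntaxhighlight lang="python">
import numpy as np

# Grid approximation of a posterior density f(x | y) on [0, 1].
# Assumed setup for illustration: Beta(2, 2) prior on a coin's bias x,
# and 6 heads out of 9 Bernoulli trials as the observed data y.

grid = np.linspace(0.0, 1.0, 1001)
prior = 6.0 * grid * (1.0 - grid)            # Beta(2, 2) density
likelihood = grid**6 * (1.0 - grid)**3       # p(y | x) up to a constant

unnormalized = prior * likelihood
norm = unnormalized.sum() * (grid[1] - grid[0])  # numerical normalizing constant
posterior = unnormalized / norm              # integrates to ~1 on the grid

print(grid[np.argmax(posterior)])            # posterior mode (MAP): ~0.636
</syntaxhighlight>

Here the exact posterior is Beta(8, 5), so the grid result can be checked against the closed-form mode 7/11.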
Credible interval
The posterior probability is a conditional probability conditioned on randomly observed data; hence it is a random variable. For a random variable, it is important to summarize its amount of uncertainty. One way to achieve this is to provide a credible interval of the posterior probability.
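As a short illustrative sketch (the Beta(8, 5) posterior is an assumed example, echoing the grid sketch above), an equal-tailed 95% credible interval can be read off from posterior quantiles:

<syntaxhighlight lang="python">
import numpy as np

# Equal-tailed 95% credible interval from posterior draws (illustrative).
# Assumed posterior for the sketch: Beta(8, 5).

rng = np.random.default_rng(0)
samples = rng.beta(8, 5, size=100_000)       # draws from the posterior

lo, hi = np.percentile(samples, [2.5, 97.5])
print(lo, hi)                                # roughly (0.35, 0.86)
</syntaxhighlight>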
Classification
In classification, posterior probabilities reflect the uncertainty of assigning an observation to a particular class; see also class-membership probabilities. While statistical classification methods by definition generate posterior probabilities, machine learning methods usually supply membership values which do not induce any probabilistic confidence. It is desirable to transform or rescale membership values to class-membership probabilities, since these are comparable and additionally more easily applicable for post-processing.
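As one illustrative possibility (a sketch of a common rescaling, not the only one and not prescribed by this article), real-valued scores can be passed through a softmax transform so that they form a probability distribution over classes; the scores below are made up:

<syntaxhighlight lang="python">
import math

# Rescaling raw classifier scores to class-membership probabilities via
# softmax (one possible transform; the scores are illustrative).

def softmax(scores):
    """Map real-valued scores to probabilities that sum to 1."""
    m = max(scores)                          # subtract the max for stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

print(softmax([2.0, 1.0, -0.5]))             # ~[0.69, 0.25, 0.06]
</syntaxhighlight>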
See also
* Prediction interval
* Bernstein–von Mises theorem
* Probability of success
* Bayesian epistemology
* Metropolis–Hastings algorithm
References
Further reading
* {{cite book |title=Bayesian Statistics: An Introduction |last=Lee |first=Peter M. |publisher=Wiley |year=2004 |edition=3rd |isbn=0-340-81405-5}}