The posterior probability is a type of conditional probability that results from updating the prior probability with information summarized by the likelihood via an application of Bayes' rule. From an epistemological perspective, the posterior probability contains everything there is to know about an uncertain proposition (such as a scientific hypothesis, or parameter values), given prior knowledge and a mathematical model describing the observations available at a particular time. After the arrival of new information, the current posterior probability may serve as the prior in another round of Bayesian updating.

In the context of Bayesian statistics, the posterior probability distribution usually describes the epistemic uncertainty about statistical parameters conditional on a collection of observed data. From a given posterior distribution, various point and interval estimates can be derived, such as the maximum a posteriori (MAP) estimate or the highest posterior density interval (HPDI). But while conceptually simple, the posterior distribution is generally not tractable and therefore needs to be either analytically or numerically approximated.


Definition in the distributional case

In variational Bayesian methods, the posterior probability is the probability of the parameters \theta given the evidence X, and is denoted p(\theta \mid X). It contrasts with the likelihood function, which is the probability of the evidence given the parameters: p(X \mid \theta).

The two are related as follows. Given a prior belief that a probability distribution function is p(\theta) and that the observations x have a likelihood p(x \mid \theta), the posterior probability is defined as

:p(\theta \mid x) = \frac{p(x \mid \theta)}{p(x)} p(\theta)

where p(x) is the normalizing constant and is calculated as

:p(x) = \int p(x \mid \theta)\, p(\theta)\, d\theta

for continuous \theta, or by summing p(x \mid \theta)\, p(\theta) over all possible values of \theta for discrete \theta. The posterior probability is therefore proportional to the product ''Likelihood · Prior probability''.
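As an illustration of this relationship, the following sketch (a hypothetical coin-bias example in Python, not taken from the text) computes a posterior over a small discrete set of parameter values by multiplying the prior by the likelihood and dividing by the normalizing constant.

import numpy as np

# Hypothetical example: theta is the unknown bias of a coin, restricted
# to three candidate values so the sum over theta stays small.
theta = np.array([0.25, 0.50, 0.75])   # candidate parameter values
prior = np.array([1/3, 1/3, 1/3])      # p(theta): uniform prior belief

# Observed data: 7 heads out of 10 tosses.
heads, tosses = 7, 10

# Likelihood p(x | theta) for each candidate value (binomial, up to a
# constant factor that cancels in the normalization).
likelihood = theta**heads * (1 - theta)**(tosses - heads)

# Posterior is proportional to likelihood * prior; divide by the
# normalizing constant p(x) = sum over theta of p(x | theta) p(theta).
unnormalized = likelihood * prior
posterior = unnormalized / unnormalized.sum()

print(dict(zip(theta.round(2), posterior.round(3))))

Replacing the three candidate values with a fine grid and the sum with a numerical integral gives the continuous version of the same calculation.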


Example

Suppose a school has 60% boys and 40% girls as students. The girls wear trousers or skirts in equal numbers; all boys wear trousers. An observer sees a (random) student from a distance; all the observer can see is that this student is wearing trousers. What is the probability this student is a girl? The correct answer can be computed using Bayes' theorem.

The event G is that the student observed is a girl, and the event T is that the student observed is wearing trousers. To compute the posterior probability P(G \mid T), we first need to know:
* P(G), or the probability that the student is a girl regardless of any other information. Since the observer sees a random student, meaning that all students have the same probability of being observed, and the percentage of girls among the students is 40%, this probability equals 0.4.
* P(B), or the probability that the student is not a girl (i.e. a boy) regardless of any other information (B is the complementary event to G). This is 60%, or 0.6.
* P(T \mid G), or the probability of the student wearing trousers given that the student is a girl. As girls are as likely to wear skirts as trousers, this is 0.5.
* P(T \mid B), or the probability of the student wearing trousers given that the student is a boy. This is given as 1.
* P(T), or the probability of a (randomly selected) student wearing trousers regardless of any other information. Since P(T) = P(T \mid G)P(G) + P(T \mid B)P(B) (via the law of total probability), this is P(T) = 0.5 \times 0.4 + 1 \times 0.6 = 0.8.

Given all this information, the posterior probability of the observer having spotted a girl given that the observed student is wearing trousers can be computed by substituting these values in the formula:

:P(G \mid T) = \frac{P(T \mid G)\,P(G)}{P(T)} = \frac{0.5 \times 0.4}{0.8} = 0.25.

An intuitive way to solve this is to assume the school has N students. The number of boys is 0.6N and the number of girls is 0.4N. If N is sufficiently large, the total number of trouser wearers is 0.6N + 50% of 0.4N, and the number of girls wearing trousers is 50% of 0.4N. Therefore, among trouser wearers, the proportion of girls is (50% of 0.4N)/(0.6N + 50% of 0.4N) = 25%. In other words, if you separated out the group of trouser wearers, a quarter of that group would be girls. Therefore, if you see trousers, the most you can deduce is that you are looking at a single sample from a subset of students where 25% are girls; by definition, the chance of this random student being a girl is 25%. Every problem based on Bayes' theorem can be solved in this way.
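The following short Python check (a minimal sketch using only the probabilities stated above) reproduces the calculation numerically.

# Probabilities stated in the example above.
p_girl = 0.4           # P(G): prior probability that the student is a girl
p_boy = 0.6            # P(B): prior probability that the student is a boy
p_trousers_girl = 0.5  # P(T | G): girls wear trousers half the time
p_trousers_boy = 1.0   # P(T | B): all boys wear trousers

# Law of total probability: P(T) = P(T | G) P(G) + P(T | B) P(B).
p_trousers = p_trousers_girl * p_girl + p_trousers_boy * p_boy

# Bayes' theorem: P(G | T) = P(T | G) P(G) / P(T).
p_girl_given_trousers = p_trousers_girl * p_girl / p_trousers

print(p_trousers)             # 0.8
print(p_girl_given_trousers)  # 0.25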


Calculation

The posterior probability distribution of one random variable given the value of another can be calculated with Bayes' theorem by multiplying the prior probability distribution by the likelihood function, and then dividing by the normalizing constant, as follows:

:f_{X \mid Y=y}(x) = \frac{f_X(x)\, \mathcal L_{X \mid Y=y}(x)}{\int_{-\infty}^\infty f_X(u)\, \mathcal L_{X \mid Y=y}(u)\,du}

gives the posterior probability density function for a random variable X given the data Y = y, where
* f_X(x) is the prior density of X,
* \mathcal L_{X \mid Y=y}(x) = f_{Y \mid X=x}(y) is the likelihood function as a function of x,
* \int_{-\infty}^\infty f_X(u)\, \mathcal L_{X \mid Y=y}(u)\,du is the normalizing constant, and
* f_{X \mid Y=y}(x) is the posterior density of X given the data Y = y.
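For continuous parameters the normalizing integral in the denominator usually has no closed form, so in practice the posterior density is often approximated numerically. The sketch below (a hypothetical normal-likelihood, normal-prior example in Python/NumPy, not taken from the text) evaluates the numerator on a grid and normalizes with the trapezoidal rule.

import numpy as np

# Hypothetical setup: unknown mean theta of a normal distribution with
# known standard deviation 1, a standard normal prior on theta, and a
# single observation y.
y = 1.2

theta = np.linspace(-5, 5, 2001)                      # grid of parameter values
prior = np.exp(-0.5 * theta**2) / np.sqrt(2 * np.pi)  # prior density f_X(x)
likelihood = np.exp(-0.5 * (y - theta)**2) / np.sqrt(2 * np.pi)  # f_{Y|X=x}(y) as a function of x

# Numerator of the formula above, divided by the normalizing integral
# approximated with the trapezoidal rule.
unnormalized = prior * likelihood
posterior = unnormalized / np.trapz(unnormalized, theta)

# Sanity check: the approximate posterior density integrates to about 1.
print(np.trapz(posterior, theta))

In this conjugate case the exact posterior is normal with mean y/2 and variance 1/2, which gives a simple check on the grid approximation.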


Credible interval

Posterior probability is a conditional probability conditioned on randomly observed data; hence it is a random variable. For a random variable, it is important to summarize its amount of uncertainty. One way to achieve this goal is to provide a credible interval of the posterior probability.
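Continuing the grid sketch from the Calculation section (and reusing its hypothetical theta and posterior arrays), an equal-tailed 95% credible interval can be read off the approximate posterior cumulative distribution function.

import numpy as np

# Assumes `theta` and `posterior` from the grid sketch above.
dx = theta[1] - theta[0]
cdf = np.cumsum(posterior) * dx   # crude cumulative distribution function
cdf = cdf / cdf[-1]               # guard against small numerical error

# Equal-tailed 95% credible interval: the 2.5% and 97.5% posterior quantiles.
lower = np.interp(0.025, cdf, theta)
upper = np.interp(0.975, cdf, theta)
print(lower, upper)

A highest posterior density interval (HPDI) would instead be the shortest interval containing 95% of the posterior probability; for a symmetric, unimodal posterior such as this one, the two coincide.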


Classification

In classification, posterior probabilities reflect the uncertainty in assigning an observation to a particular class; see also Class membership probabilities. While statistical classification methods by definition generate posterior probabilities, machine learning methods usually supply membership values that do not induce any probabilistic confidence. It is desirable to transform or re-scale membership values into class-membership probabilities, since these are comparable and, additionally, more easily applicable for post-processing.
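One simple way to re-scale raw membership scores so that they sum to one and can be read as class-membership values is the softmax transformation; the sketch below is a generic illustration (hypothetical scores, Python/NumPy), not a calibration method prescribed by the text.

import numpy as np

def softmax(scores):
    """Re-scale raw membership scores to positive values that sum to 1."""
    shifted = scores - np.max(scores)  # subtract the maximum for numerical stability
    exps = np.exp(shifted)
    return exps / exps.sum()

# Hypothetical raw scores from a classifier for three classes.
scores = np.array([2.0, 0.5, -1.0])
print(softmax(scores))  # approximately [0.79, 0.18, 0.04]

Note that such re-scaled values are comparable across observations but are not automatically well-calibrated posterior probabilities; calibration generally requires an additional fitting step on held-out data.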


See also

* Prediction interval
* Bernstein–von Mises theorem
* Probability of success
* Bayesian epistemology


References


Further reading

* {{cite book |title=Bayesian Statistics: An Introduction |last=Lee |first=Peter M. |publisher=Wiley |year=2004 |edition=3rd |isbn=0-340-81405-5}}