A prior probability distribution of an uncertain quantity, simply called the prior, is its assumed probability distribution before some evidence is taken into account. For example, the prior could be the probability distribution representing the relative proportions of voters who will vote for a particular politician in a future election. The unknown quantity may be a parameter of the model or a latent variable rather than an observable variable.
In Bayesian statistics, Bayes' rule prescribes how to update the prior with new information to obtain the posterior probability distribution, which is the conditional distribution of the uncertain quantity given new data. Historically, the choice of priors was often constrained to a conjugate family of a given likelihood function, so that it would result in a tractable posterior of the same family. The widespread availability of Markov chain Monte Carlo methods, however, has made this less of a concern.
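Conjugacy can be seen in the standard beta–Bernoulli pair. A Beta(α, β) prior on a Bernoulli parameter, combined with 0/1 observations, yields another beta distribution: the counts of successes and failures are simply added to α and β. The Python sketch below is illustrative only. The function names and the data are hypothetical.

```python
from fractions import Fraction

def beta_bernoulli_update(alpha, beta, observations):
    """Return the Beta posterior parameters for a Beta(alpha, beta) prior
    on a Bernoulli parameter after a list of 0/1 observations."""
    successes = sum(observations)
    failures = len(observations) - successes
    # Conjugacy: the posterior stays in the beta family.
    return alpha + successes, beta + failures

def beta_mean(alpha, beta):
    """Mean alpha / (alpha + beta) of a Beta distribution, as an exact fraction."""
    return Fraction(alpha, alpha + beta)

# Beta(2, 2) prior; 7 successes and 3 failures (made-up data).
post = beta_bernoulli_update(2, 2, [1, 1, 0, 1, 1, 1, 0, 1, 1, 0])
print(post, beta_mean(*post))  # (9, 5) 9/14
```

No numerical integration is needed, because the posterior is available in closed form. This is what made conjugate families attractive before sampling methods were widespread.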
There are many ways to construct a prior distribution. In some cases, a prior may be determined from past information, such as previous experiments. A prior can also be ''elicited'' from the purely subjective assessment of an experienced expert. When no information is available, an uninformative prior may be adopted as justified by the principle of indifference.
In modern applications, priors are also often chosen for their mechanical properties, such as regularization and feature selection.
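A standard instance of a prior used for regularization: a zero-mean Gaussian prior on a regression weight makes the maximum a posteriori (MAP) estimate coincide with ridge regression, whose penalty is λ = σ²/τ². Analogously, a Laplace prior gives the lasso's L1 penalty, which is the usual link to feature selection. The sketch assumes a single slope with no intercept and known noise variance. The data and the variances are made up.

```python
def map_slope(xs, ys, noise_var, prior_var):
    """MAP estimate of w in y = w * x + noise, with noise ~ N(0, noise_var)
    and a zero-mean Gaussian prior w ~ N(0, prior_var). Setting the derivative
    of the negative log posterior to zero gives the ridge solution with
    penalty lam = noise_var / prior_var."""
    lam = noise_var / prior_var
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)

def neg_log_posterior(w, xs, ys, noise_var, prior_var):
    """Negative log posterior of w, up to an additive constant."""
    sse = sum((y - w * x) ** 2 for x, y in zip(xs, ys))
    return sse / (2 * noise_var) + w * w / (2 * prior_var)

xs = [1.0, 2.0, 3.0, 4.0]
ys = [1.1, 1.9, 3.2, 3.9]  # made-up data
w_ml = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)  # no prior
w_map = map_slope(xs, ys, noise_var=1.0, prior_var=0.1)             # shrunk toward 0
print(w_ml, w_map)
```

A tighter prior (smaller `prior_var`) means a larger penalty and therefore stronger shrinkage of the estimate toward zero.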
The prior distributions of model parameters will often depend on parameters of their own. Uncertainty about these hyperparameters can, in turn, be expressed as hyperprior probability distributions. For example, if one uses a beta distribution to model the distribution of the parameter ''p'' of a Bernoulli distribution, then:
* ''p'' is a parameter of the underlying system (Bernoulli distribution), and
* ''α'' and ''β'' are parameters of the prior distribution (beta distribution); hence ''hyper''parameters.
In principle, priors can be decomposed into many conditional levels of distributions, so-called ''hierarchical priors''.
Informative priors
An ''informative prior'' expresses specific, definite information about a variable.
An example is a prior distribution for the temperature at noon tomorrow.
A reasonable approach is to make the prior a normal distribution with expected value equal to today's noontime temperature, with variance equal to the day-to-day variance of atmospheric temperature, or a distribution of the temperature for that day of the year.
This example has a property in common with many priors, namely, that the posterior from one problem (today's temperature) becomes the prior for another problem (tomorrow's temperature); pre-existing evidence which has already been taken into account is part of the prior and, as more evidence accumulates, the posterior is determined largely by the evidence rather than any original assumption, provided that the original assumption admitted the possibility of what the evidence is suggesting. The terms "prior" and "posterior" are generally relative to a specific datum or observation.
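The "posterior becomes the next prior" idea can be checked in the simplest conjugate setting. Assume a normal prior on an unknown quantity and normal measurement noise with known variance. Updating on readings one at a time, with each posterior serving as the next prior, then gives the same result as a single update on all the readings. This is a minimal sketch; the readings and variances are hypothetical.

```python
def normal_update(prior_mean, prior_var, obs, obs_var):
    """Posterior of an unknown mean with a normal prior N(prior_mean, prior_var)
    after one normal observation with known noise variance obs_var."""
    post_var = 1.0 / (1.0 / prior_var + 1.0 / obs_var)  # precisions add
    post_mean = post_var * (prior_mean / prior_var + obs / obs_var)
    return post_mean, post_var

readings = [21.0, 23.5, 22.0]  # hypothetical measurements
prior_mean, prior_var, noise_var = 20.0, 9.0, 4.0

# Sequential: each posterior is used as the prior for the next reading.
mean, var = prior_mean, prior_var
for r in readings:
    mean, var = normal_update(mean, var, r, noise_var)

# Batch: one update with the average of all readings (noise variance noise_var / n).
n = len(readings)
batch_mean, batch_var = normal_update(prior_mean, prior_var, sum(readings) / n, noise_var / n)
print(mean, var)
print(batch_mean, batch_var)  # same, up to rounding
```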
Strong prior
A strong prior is a preceding assumption, theory, concept or idea upon which, after taking account of new information, a current assumption, theory, concept or idea is founded. A strong prior is a type of informative prior in which the information contained in the prior distribution dominates the information contained in the data being analyzed. The Bayesian analysis combines the information contained in the prior with that extracted from the data to produce the posterior distribution which, in the case of a "strong prior", would be little changed from the prior distribution.
Weakly informative priors
A ''weakly informative prior'' expresses partial information about a variable. It steers the analysis toward solutions consistent with existing knowledge and away from extreme estimates, without overly constraining the results. For example, when setting the prior distribution for the temperature at noon tomorrow in St. Louis, one might use a normal distribution with mean 50 degrees Fahrenheit and standard deviation 40 degrees. This very loosely constrains the temperature to the range (10 degrees, 90 degrees), one standard deviation either side of the mean. It leaves a small chance (two standard deviations out, under 5% in total) of the temperature being below −30 degrees or above 130 degrees. The purpose of a weakly informative prior is regularization, that is, to keep inferences in a reasonable range.
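The probabilities behind "very loosely constrains" and "a small chance" can be computed from the normal CDF. The sketch below uses only Python's `math.erf`. The mean and standard deviation are the ones in the example above.

```python
import math

def normal_cdf(x, mean, sd):
    """Cumulative distribution function of N(mean, sd**2), via the error function."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

mean, sd = 50.0, 40.0  # weakly informative prior for noon temperature, degrees F

p_loose_range = normal_cdf(90.0, mean, sd) - normal_cdf(10.0, mean, sd)
p_extreme = normal_cdf(-30.0, mean, sd) + (1.0 - normal_cdf(130.0, mean, sd))
print(f"P(10 < T < 90) = {p_loose_range:.3f}")
print(f"P(T < -30 or T > 130) = {p_extreme:.3f}")
```

The first probability is about 68%, which is a loose constraint. The second is about 4.6%, which is small but not negligible.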
Uninformative priors
An ''uninformative'', ''flat'', or ''diffuse prior'' expresses vague or general information about a variable.
The term "uninformative prior" is somewhat of a misnomer. Such a prior might also be called a ''not very informative prior'', or an ''objective prior'', i.e., one that is not subjectively elicited.
Uninformative priors can express "objective" information such as "the variable is positive" or "the variable is less than some limit". The simplest and oldest rule for determining a non-informative prior is the principle of indifference, which assigns equal probabilities to all possibilities. In parameter estimation problems, the use of an uninformative prior typically yields results which are not too different from conventional statistical analysis, as the likelihood function often yields more information than the uninformative prior.
Some attempts have been made at finding a priori probabilities, i.e., probability distributions in some sense logically required by the nature of one's state of uncertainty. These are a subject of philosophical controversy, with Bayesians roughly divided into two schools:
* "objective Bayesians", who believe such priors exist in many useful situations, and
* "subjective Bayesians", who believe that in practice priors usually represent subjective judgements of opinion that cannot be rigorously justified (Williamson 2010).
Perhaps the strongest arguments for objective Bayesianism were given by Edwin T. Jaynes, based mainly on the consequences of symmetries and on the principle of maximum entropy.
As an example of an a priori prior, due to Jaynes (2003), consider a situation in which one knows a ball has been hidden under one of three cups, A, B, or C, but no other information is available about its location. In this case a ''uniform prior'' of ''p''(''A'') = ''p''(''B'') = ''p''(''C'') = 1/3 seems intuitively like the only reasonable choice. More formally, we can see that the problem remains the same if we swap around the labels ("A", "B" and "C") of the cups. It would therefore be odd to choose a prior for which a permutation of the labels would cause a change in our predictions about which cup the ball will be found under; the uniform prior is the only one which preserves this invariance. If one accepts this invariance principle then one can see that the uniform prior is the logically correct prior to represent this state of knowledge. This prior is "objective" in the sense of being the correct choice to represent a particular state of knowledge, but it is not objective in the sense of being an observer-independent feature of the world: in reality the ball exists under a particular cup, and it only makes sense to speak of probabilities in this situation if there is an observer with limited knowledge about the system.
As a more contentious example, Jaynes published an argument based on the invariance of the prior under a change of parameters. It suggests that the prior representing complete uncertainty about a probability should be the Haldane prior $p^{-1}(1-p)^{-1}$.
The example Jaynes gives is of finding a chemical in a lab and asking whether it will dissolve in water in repeated experiments. The Haldane prior gives by far the most weight to $p = 0$ and $p = 1$, indicating that the sample will either dissolve every time or never dissolve, with equal probability. However, suppose one has observed samples of the chemical to dissolve in one experiment and not to dissolve in another. This prior is then updated to the uniform distribution on the interval [0, 1]. The result follows from applying Bayes' theorem, with the above prior, to the data set consisting of one observation of dissolving and one of not dissolving. The likelihood of these two observations is $p(1-p)$, which cancels the prior exactly and leaves a constant. The Haldane prior is an improper prior distribution (meaning that it has an infinite mass).
Harold Jeffreys devised a systematic way of designing uninformative priors, for example the Jeffreys prior $p^{-1/2}(1-p)^{-1/2}$ for the Bernoulli random variable.
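Jeffreys' rule takes the prior density proportional to the square root of the Fisher information. For a single Bernoulli trial, $I(p) = 1/(p(1-p))$, which yields the form above. Its normalizing constant is $B(1/2, 1/2) = \pi$, so the normalized prior is Beta(1/2, 1/2). The sketch below computes the Fisher information directly from its definition and compares the result with the closed form. The function names are illustrative.

```python
import math

def bernoulli_fisher_information(p):
    """I(p) = E[(d/dp log f(X | p))**2] for one Bernoulli trial,
    summing over the outcomes X = 1 (prob. p) and X = 0 (prob. 1 - p)."""
    score_if_1 = 1.0 / p           # d/dp log p
    score_if_0 = -1.0 / (1.0 - p)  # d/dp log(1 - p)
    return p * score_if_1 ** 2 + (1.0 - p) * score_if_0 ** 2

def jeffreys_unnormalized(p):
    """Jeffreys' rule: prior density proportional to sqrt(I(p))."""
    return math.sqrt(bernoulli_fisher_information(p))

for p in (0.1, 0.5, 0.9):
    print(p, jeffreys_unnormalized(p), p ** -0.5 * (1 - p) ** -0.5)  # columns agree
```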
Priors can be constructed which are proportional to the Haar measure if the parameter space ''X'' carries a natural group structure which leaves invariant our Bayesian state of knowledge.
This can be seen as a generalisation of the invariance principle used to justify the uniform prior over the three cups in the example above. For example, in physics we might expect that an experiment will give the same results regardless of our choice of the origin of a coordinate system. This induces the group structure of the translation group on ''X'', which determines the prior probability as a constant improper prior. Similarly, some measurements are naturally invariant to the choice of an arbitrary scale (e.g., whether centimeters or inches are used, the physical results should be equal). In such a case, the scale group is the natural group structure, and the corresponding prior on ''X'' is proportional to 1/''x''. It sometimes matters whether we use the left-invariant or right-invariant Haar measure. For example, the left and right invariant Haar measures on the affine group are not equal. Berger (1985, p. 413) argues that the right-invariant Haar measure is the correct choice.
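Under the scale-invariant prior $p(x) \propto 1/x$, the mass of an interval $[a, b]$ is $\log(b/a)$. It depends only on the ratio $b/a$, so every rescaled copy $[ca, cb]$ receives the same mass, and each order of magnitude carries equal weight. A flat prior instead multiplies the mass by $c$. The sketch below illustrates this. Both priors are improper, so only comparisons of masses are meaningful, and the helper names are hypothetical.

```python
import math

def scale_prior_mass(a, b):
    """Mass of [a, b], 0 < a < b, under the improper prior p(x) ∝ 1/x (integral of dx/x)."""
    return math.log(b / a)

def flat_prior_mass(a, b):
    """Mass of [a, b] under the improper flat prior p(x) ∝ 1."""
    return b - a

decades = [(1.0, 10.0), (10.0, 100.0), (100.0, 1000.0)]
print([scale_prior_mass(a, b) for a, b in decades])  # equal: log(10) each
print([flat_prior_mass(a, b) for a, b in decades])   # 9, 90, 900

c = 2.54  # an arbitrary change of units, e.g. inches -> centimeters
print(scale_prior_mass(3.0, 7.0), scale_prior_mass(3.0 * c, 7.0 * c))  # unchanged
```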
Another idea, championed by Edwin T. Jaynes, is to use the principle of maximum entropy (MAXENT). The motivation is that the Shannon entropy of a probability distribution measures the uncertainty the distribution expresses: the larger the entropy, the less information is provided by the distribution. Thus, by maximizing the entropy over a suitable set of probability distributions on ''X'', one finds the least informative distribution, that is, the one containing the least information consistent with the constraints that define the set. For example, the maximum entropy prior on a discrete space, given only that the probability is normalized to 1, is the prior that assigns equal probability to each state. In the continuous case, the maximum entropy prior given that the density is normalized with mean zero and unit variance is the standard normal distribution. The principle of ''minimum cross-entropy'' generalizes MAXENT to the case of "updating" an arbitrary prior distribution with suitable constraints in the maximum-entropy sense.
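Both maximum-entropy claims can be checked numerically. In the discrete case, the uniform distribution on four states beats a non-uniform one. In the continuous case, the closed-form differential entropies of three mean-zero, unit-variance densities put the normal first. The comparison set (Laplace and uniform) is chosen here for illustration only; it does not prove maximality, it just shows the ordering.

```python
import math

def shannon_entropy(probs):
    """Entropy (nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

# Discrete: with only normalization as a constraint, the uniform distribution wins.
print(shannon_entropy([0.25] * 4), shannon_entropy([0.4, 0.3, 0.2, 0.1]))

# Continuous: closed-form differential entropies (nats), each density
# having mean 0 and variance 1.
h_normal = 0.5 * math.log(2 * math.pi * math.e)   # standard normal, about 1.419
h_laplace = 1.0 + math.log(2.0 / math.sqrt(2.0))  # Laplace, scale 1/sqrt(2), about 1.347
h_uniform = math.log(2.0 * math.sqrt(3.0))        # uniform on [-sqrt(3), sqrt(3)], about 1.242
print(h_normal, h_laplace, h_uniform)
```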
A related idea, reference priors, was introduced by José-Miguel Bernardo. Here, the idea is to maximize the expected Kullback–Leibler divergence of the posterior distribution relative to the prior. This maximizes the expected posterior information about ''X'' when the prior density is ''p''(''x''). Thus, in some sense, ''p''(''x'') is the "least informative" prior about ''X''. The reference prior is defined in the asymptotic limit, i.e., one considers the limit of the priors so obtained as the number of data points goes to infinity. In the present case, the KL divergence between the prior and posterior distributions is given by

$$KL = \int p(t) \int p(x \mid t) \log \frac{p(x \mid t)}{p(x)} \, dx \, dt$$

Here, $t$ is a sufficient statistic for some parameter $x$. The inner integral is the KL divergence between the posterior $p(x \mid t)$ and prior $p(x)$ distributions, and the result is the weighted mean over all values of $t$. Split the logarithm into two parts, reverse the order of integration in the second part, and note that $\log p(x)$ does not depend on $t$. This yields

$$KL = \int p(t) \int p(x \mid t) \log p(x \mid t) \, dx \, dt - \int \log p(x) \int p(t) \, p(x \mid t) \, dt \, dx$$

The inner integral in the second part is the integral over $t$ of the joint density $p(x, t)$. This is the marginal distribution $p(x)$, so we have

$$KL = \int p(t) \int p(x \mid t) \log p(x \mid t) \, dx \, dt - \int p(x) \log p(x) \, dx$$
Now we use the concept of entropy. For probability distributions, entropy is the negative expected value of the logarithm of the probability mass or density function:

$$H(x) = -\int p(x) \log p(x) \, dx$$

Using this in the last equation yields

$$KL = -\int p(t) \, H(x \mid t) \, dt + H(x)$$

In words, KL is the negative expected value over $t$ of the entropy of $x$ conditional on $t$, plus the marginal (i.e., unconditional) entropy of $x$. In the limiting case where the sample size tends to infinity, the Bernstein–von Mises theorem states that the distribution of $x$ conditional on a given observed value of $t$ is normal. Its variance equals the reciprocal of the Fisher information at the 'true' value of $x$. The entropy of a normal density function is equal to half the logarithm of $2\pi e v$, where $v$ is the variance of the distribution. In this case therefore

$$H(x \mid t) = \log \sqrt{\frac{2\pi e}{N I(x^*)}}$$

where $N$ is the arbitrarily large sample size (to which Fisher information is proportional) and $x^*$ is the 'true' value. Since this does not depend on $t$, it can be taken out of the integral; the remaining integral is over a probability space, so it equals one. Hence we can write the asymptotic form of KL as
$$KL = \log \sqrt{k I(x^*)} - \int p(x) \log p(x) \, dx$$

where $k$ is proportional to the (asymptotically large) sample size. We do not know the value of $x^*$. Indeed, the very idea goes against the philosophy of Bayesian inference, in which 'true' values of parameters are replaced by prior and posterior distributions. So we remove $x^*$ by replacing it with $x$ and taking the expected value of the normal entropy, which we obtain by multiplying by $p(x)$ and integrating over $x$. This allows us to combine the logarithms, yielding

$$KL = -\int p(x) \log \frac{p(x)}{\sqrt{k I(x)}} \, dx$$
This is a quasi-KL divergence ("quasi" in the sense that the square root of the Fisher information may be the kernel of an improper distribution). Due to the minus sign, we need to minimise this in order to maximise the KL divergence with which we started. The minimum value of the last equation occurs where the two distributions in the logarithm argument, improper or not, do not diverge. This in turn occurs when the prior distribution is proportional to the square root of the Fisher information of the likelihood function. Hence in the single parameter case, reference priors and Jeffreys priors are identical, even though Jeffreys has a very different rationale.
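The conclusion can be checked numerically for the Bernoulli model, where $I(x) = 1/(x(1-x))$. The sketch evaluates $-\int q(x)\log[q(x)/\sqrt{I(x)}]\,dx$ for a few normalized priors $q$. It drops the $\log\sqrt{k}$ term, which adds the same constant for every normalized prior. Because the objective equals $\log\pi$ minus the KL divergence from the normalized Jeffreys prior, Jeffreys should score highest. The quadrature scheme and the choice of comparison priors are this sketch's own assumptions.

```python
import math

def reference_objective(prior_pdf, n=100_000):
    """Approximate  -∫_0^1 q(x) log[q(x) / sqrt(I(x))] dx  for a normalized prior
    density q on a Bernoulli parameter x, with per-trial Fisher information
    I(x) = 1 / (x (1 - x)). The constant log sqrt(k) term is omitted.

    Substituting x = sin(t)**2 (so dx = 2 sin(t) cos(t) dt) tames the endpoint
    behaviour, and a midpoint rule on t in (0, pi/2) is then accurate."""
    h = (math.pi / 2) / n
    total = 0.0
    for i in range(n):
        t = (i + 0.5) * h
        s, c = math.sin(t), math.cos(t)
        x = s * s
        q = prior_pdf(x)
        sqrt_fisher = 1.0 / (s * c)  # sqrt(I(x)) = 1 / sqrt(x (1 - x))
        jacobian = 2.0 * s * c       # dx/dt
        total += q * math.log(q / sqrt_fisher) * jacobian * h
    return -total

priors = {
    "uniform Beta(1, 1)": lambda x: 1.0,
    "Beta(2, 2)": lambda x: 6.0 * x * (1.0 - x),
    "Jeffreys Beta(1/2, 1/2)": lambda x: 1.0 / (math.pi * math.sqrt(x * (1.0 - x))),
}
scores = {name: reference_objective(pdf) for name, pdf in priors.items()}
for name, value in scores.items():
    print(f"{name:>24}: {value:.4f}")
```

The Jeffreys prior scores $\log\pi \approx 1.1447$ and the uniform prior scores 1, so the uniform prior falls short by $\log\pi - 1$. This is consistent with the single-parameter equivalence of reference and Jeffreys priors.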
Reference priors are often the objective prior of choice in multivariate problems, since other rules (e.g., Jeffreys' rule) may result in priors with problematic behavior.
Objective prior distributions may also be derived from other principles, such as information or coding theory (see e.g., minimum description length) or frequentist statistics (so-called probability matching priors). Such methods are used in Solomonoff's theory of inductive inference. Methods for constructing objective priors have recently been introduced in bioinformatics, especially for inference in cancer systems biology, where sample size is limited and a vast amount of prior knowledge is available. These methods use either an information-theoretic criterion, such as the KL divergence, or the log-likelihood function, for binary supervised learning problems and mixture model problems.
Philosophical problems with uninformative priors are associated with the choice of an appropriate metric, or measurement scale. Suppose we want a prior for the running speed of a runner who is unknown to us. We could specify, say, a normal distribution as the prior for his speed. Alternatively, we could specify a normal prior for the time he takes to complete 100 metres, which is proportional to the reciprocal of his speed. These are very different priors, but it is not clear which is to be preferred. Jaynes' method of transformation groups can answer this question in some situations.
Similarly, if asked to estimate an unknown proportion between 0 and 1, we might say that all proportions are equally likely, and use a uniform prior. Alternatively, we might say that all orders of magnitude for the proportion are equally likely, which gives the logarithmic prior, the uniform prior on the logarithm of the proportion. The Jeffreys prior attempts to solve this problem by computing a prior which expresses the same belief no matter which metric is used. The Jeffreys prior for an unknown proportion ''p'' is $p^{-1/2}(1-p)^{-1/2}$, which differs from Jaynes' recommendation (the Haldane prior).
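The metric problem can be made concrete by switching from the proportion $p$ to the log-odds $\varphi = \log(p/(1-p))$. The uniform prior on $p$ becomes the non-uniform density $p(1-p)$ on $\varphi$. The Jeffreys prior carried over to $\varphi$ coincides with the Jeffreys prior computed directly in $\varphi$. The Haldane prior Beta(0, 0) becomes flat on the log-odds scale. In this sketch, priors are unnormalized and the helper names are hypothetical.

```python
import math

def sigmoid(phi):
    """Map log-odds phi to a proportion p."""
    return 1.0 / (1.0 + math.exp(-phi))

def fisher_p(p):
    """Bernoulli Fisher information in the p parameterization: 1 / (p (1 - p))."""
    return p * (1.0 / p) ** 2 + (1.0 - p) * (1.0 / (1.0 - p)) ** 2

def fisher_logodds(phi):
    """Bernoulli Fisher information in the log-odds parameterization. The score
    of log f(x | phi) = x * phi - log(1 + e**phi) is x - p, so I = E[(X - p)**2]."""
    p = sigmoid(phi)
    return p * (1.0 - p) ** 2 + (1.0 - p) * p ** 2

def to_logodds(density_in_p, phi):
    """Density on phi induced by a density on p: multiply by |dp/dphi| = p (1 - p)."""
    p = sigmoid(phi)
    return density_in_p(p) * p * (1.0 - p)

jeffreys_p = lambda p: math.sqrt(fisher_p(p))  # unnormalized p**-0.5 (1-p)**-0.5
uniform_p = lambda p: 1.0
haldane_p = lambda p: 1.0 / (p * (1.0 - p))    # Beta(0, 0), improper

for phi in (-2.0, 0.0, 1.5):
    print(phi,
          to_logodds(jeffreys_p, phi), math.sqrt(fisher_logodds(phi)),  # agree
          to_logodds(uniform_p, phi),                                   # varies with phi
          to_logodds(haldane_p, phi))                                   # constant 1
```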
Priors based on notions of algorithmic probability are used in inductive inference as a basis for induction in very general settings.
Practical problems associated with uninformative priors include the requirement that the posterior distribution be proper. The usual uninformative priors on continuous, unbounded variables are improper. This need not be a problem if the posterior distribution is proper. Another issue of importance is that if an uninformative prior is to be used ''routinely'', i.e., with many different data sets, it should have good frequentist properties. Normally a Bayesian would not be concerned with such issues, but they can be important in this situation. For example, one would want any decision rule based on the posterior distribution to be admissible under the adopted loss function. Unfortunately, admissibility is often difficult to check, although some results are known (e.g., Berger and Strawderman 1996). The issue is particularly acute with hierarchical Bayes models; the usual priors (e.g., Jeffreys' prior) may give badly inadmissible decision rules if employed at the higher levels of the hierarchy.
Improper priors
Let events $A_1, A_2, \ldots, A_n$ be mutually exclusive and exhaustive. If Bayes' theorem is written as

$$P(A_i \mid B) = \frac{P(B \mid A_i) \, P(A_i)}{\sum_j P(B \mid A_j) \, P(A_j)},$$

then it is clear that the same result would be obtained if all the prior probabilities $P(A_i)$ and $P(A_j)$ were multiplied by a given constant. The same would be true for a continuous random variable. If the summation in the denominator converges, the posterior probabilities will still sum (or integrate) to 1 even if the prior values do not. The priors may therefore only need to be specified in the correct proportion. Taking this idea further, in many cases the sum or integral of the prior values may not even need to be finite to get sensible answers for the posterior probabilities. When this is the case, the prior is called an improper prior. However, the posterior distribution need not be a proper distribution if the prior is improper. This is clear from the case where event $B$ is independent of all of the $A_j$. Then $P(B \mid A_j)$ is the same for every $j$, the posterior is proportional to the prior, and the denominator diverges along with the prior's total mass.
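The invariance of the posterior under rescaling the prior can be checked with exact rational arithmetic. The sketch applies Bayes' theorem over three mutually exclusive, exhaustive events. It then multiplies every prior value by 1000, so the prior no longer sums to 1, and shows that the posterior is unchanged. The probabilities are made up for illustration.

```python
from fractions import Fraction

def posterior(prior, likelihood):
    """Bayes' theorem for mutually exclusive, exhaustive events A_1..A_n:
    P(A_i | B) = P(B | A_i) P(A_i) / sum_j P(B | A_j) P(A_j)."""
    joint = [lik * pr for lik, pr in zip(likelihood, prior)]
    total = sum(joint)
    return [j / total for j in joint]

prior = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
likelihood = [Fraction(1, 10), Fraction(1, 2), Fraction(9, 10)]  # P(B | A_i), made up
scaled = [1000 * p for p in prior]  # same proportions, total "mass" 1000

print(posterior(prior, likelihood))   # [Fraction(3, 22), Fraction(5, 11), Fraction(9, 22)]
print(posterior(scaled, likelihood))  # identical
```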
Statisticians sometimes use improper priors as uninformative priors. For example, if they need a prior distribution for the mean and variance of a random variable, they may assume $p(m, v) \propto 1/v$ (for $v > 0$). This suggests that any value for the mean is "equally likely" and that a value for the positive variance becomes "less likely" in inverse proportion to its value. Many authors (Lindley, 1973; De Groot, 1937; Kass and Wasserman, 1996) warn against the danger of over-interpreting those priors since they are not probability densities. The only relevance they have is found in the corresponding posterior, as long as it is well-defined for all observations. (The Haldane prior is a typical counterexample.)
By contrast, likelihood functions do not need to be integrated, and a likelihood function that is uniformly 1 corresponds to the absence of data (all models are equally likely, given no data). Bayes' rule multiplies a prior by the likelihood, and an empty product is just the constant likelihood 1. However, without starting with a prior probability distribution, one does not end up with a posterior probability distribution, and thus one cannot integrate or compute expected values or loss. See for details.
Examples
Examples of improper priors include:
* The uniform distribution on an infinite interval (i.e., a half-line or the entire real line).
* Beta(0,0), the beta distribution for ''α''=0, ''β''=0 (uniform distribution on log-odds scale).
* The logarithmic prior on the positive reals (uniform distribution on log scale).
These functions, interpreted as uniform distributions, can also be interpreted as the likelihood function in the absence of data, but are not proper priors.
Prior probability in statistical mechanics
In Bayesian statistics the prior probability is used to represent initial beliefs about an uncertain parameter. In statistical mechanics, by contrast, the a priori probability is used to describe the initial state of a system. The classical version is defined as the ratio of the number of elementary events (e.g., the number of times a die is thrown) to the total number of events, considered purely deductively, i.e., without any experimenting. In the case of the die, if we look at it on the table without throwing it, each elementary event is reasoned deductively to have the same probability. Thus the probability of each outcome of an imaginary throw of the (perfect) die, obtained simply by counting the number of faces, is 1/6. Each face of the die appears with equal probability, probability being a measure defined for each elementary event. The result is different if we throw the die twenty times and ask how many times (out of 20) the number 6 appears on the upper face. In this case time comes into play, and we have a different type of probability that depends on time or on the number of times the die is thrown. On the other hand, the a priori probability is independent of time. You can look at the die on the table as long as you like without touching it, and you deduce that the probability for the number 6 to appear on the upper face is 1/6.
Consider statistical mechanics, e.g., that of a gas contained in a finite volume $V$. Both the spatial coordinates $q_i$ and the momentum coordinates $p_i$ of the individual gas elements (atoms or molecules) are finite in the phase space spanned by these coordinates. In analogy to the case of the die, the a priori probability here (in the case of a continuum) is proportional to the phase space volume element $\Delta q \, \Delta p$ divided by $h$. It is the number of standing waves (i.e., states) therein, where $\Delta q$ is the range of the variable $q$ and $\Delta p$ is the range of the variable $p$ (here for simplicity considered in one dimension). In 1 dimension (length $L$), this number, or statistical weight, or a priori weighting, is $L \, \Delta p / h$. In the customary 3 dimensions (volume $V$), the corresponding number can be calculated to be $V \, 4\pi p^2 \, \Delta p / h^3$.
To understand this quantity as giving a number of states in quantum (i.e., wave) mechanics, recall that in quantum mechanics every particle is associated with a matter wave which is the solution of a Schrödinger equation. Consider free particles (of energy $\epsilon = p^2 / 2m$), like those of a gas in a box of volume $V = L^3$. Such a matter wave is explicitly

$$\psi \propto \sin\frac{n_x \pi x}{L} \, \sin\frac{n_y \pi y}{L} \, \sin\frac{n_z \pi z}{L},$$

where $n_x, n_y, n_z$ are positive integers. Consider the states in the region between $p$ and $p + dp$, where $p^2 = \hbar^2 \pi^2 (n_x^2 + n_y^2 + n_z^2) / L^2$. Counting the lattice points $(n_x, n_y, n_z)$ that fall in the corresponding spherical shell of the positive octant shows that the number of different $(n_x, n_y, n_z)$ values, and hence of states, is the above expression $V \, 4\pi p^2 \, dp / h^3$.
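The state-counting argument can be checked by brute force. With $p = \pi\hbar n/L$ (where $n$ is the length of the integer vector) and $h = 2\pi\hbar$, the expression $V \, 4\pi p^2 \, dp / h^3$ becomes $(\pi/2) n^2 \, dn$ in terms of $n$. The sketch below counts positive-integer triples in a thin shell and compares the count with that formula. The shell radius and thickness are arbitrary choices, and the two numbers should agree only approximately, because of surface and discreteness effects.

```python
import math

def count_standing_waves(n_lo, n_hi):
    """Count positive-integer triples (nx, ny, nz) with
    n_lo**2 <= nx**2 + ny**2 + nz**2 < n_hi**2, i.e. standing-wave states of
    a particle in a cubic box whose |n| lies in the shell [n_lo, n_hi)."""
    lo2, hi2 = n_lo * n_lo, n_hi * n_hi
    count = 0
    for nx in range(1, n_hi + 1):
        for ny in range(1, n_hi + 1):
            s = nx * nx + ny * ny
            if s >= hi2:
                break
            # nz >= 1 with nz**2 < hi2 - s, minus those with nz**2 < lo2 - s
            upper = math.isqrt(hi2 - s - 1)
            lower = math.isqrt(lo2 - s - 1) if lo2 - s > 0 else 0
            count += upper - lower
    return count

n, dn = 200, 2
exact = count_standing_waves(n, n + dn)
formula = (math.pi / 2) * n ** 2 * dn  # V 4 pi p^2 dp / h^3 rewritten in terms of n
print(exact, round(formula), exact / formula)
```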
Moreover, in view of the uncertainty relation, which in 1 spatial dimension is $\Delta q \, \Delta p \gtrsim h$, these states are indistinguishable (i.e., these states do not carry labels). An important consequence is a result known as Liouville's theorem, i.e., the time independence of this phase space volume element and thus of the a priori probability. A time dependence of this quantity would imply known information about the dynamics of the system, and hence would not be an a priori probability.
Thus the region $\Omega := \Delta q \, \Delta p$, when differentiated with respect to time $t$, yields zero (with the help of Hamilton's equations): $d\Omega/dt = 0$. The volume at time $t$ is the same as at time zero. This is also described as conservation of information.
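Liouville's theorem can be illustrated numerically. The determinant of the Jacobian of the time-$t$ map $(q_0, p_0) \mapsto (q_t, p_t)$ is the factor by which a small phase-space area is stretched. For Hamiltonian motion this factor is 1. The sketch stands in for the exact flow with a pendulum, $H = p^2/2 - \cos q$, integrated by the symplectic leapfrog scheme. It adds a damped (non-Hamiltonian) variant for contrast, where the area shrinks by roughly $e^{-\gamma t}$. The system, step size and damping constant are illustrative choices.

```python
import math

def leapfrog(q, p, steps=1000, dt=0.01, damping=0.0):
    """Integrate dq/dt = p, dp/dt = -sin(q) - damping * p with kick-drift-kick
    steps; for damping = 0 this is the pendulum with H = p**2 / 2 - cos(q)."""
    for _ in range(steps):
        p -= 0.5 * dt * (math.sin(q) + damping * p)
        q += dt * p
        p -= 0.5 * dt * (math.sin(q) + damping * p)
    return q, p

def phase_volume_ratio(q0, p0, eps=1e-5, **kw):
    """Determinant of d(q_t, p_t)/d(q_0, p_0) by central differences: the factor
    by which a tiny phase-space area around (q0, p0) is stretched."""
    qa, pa = leapfrog(q0 + eps, p0, **kw)
    qb, pb = leapfrog(q0 - eps, p0, **kw)
    qc, pc = leapfrog(q0, p0 + eps, **kw)
    qd, pd = leapfrog(q0, p0 - eps, **kw)
    dq_dq0, dp_dq0 = (qa - qb) / (2 * eps), (pa - pb) / (2 * eps)
    dq_dp0, dp_dp0 = (qc - qd) / (2 * eps), (pc - pd) / (2 * eps)
    return dq_dq0 * dp_dp0 - dq_dp0 * dp_dq0

print(phase_volume_ratio(1.0, 0.5))               # Hamiltonian: close to 1
print(phase_volume_ratio(1.0, 0.5, damping=0.1))  # damped, t = 10: close to exp(-1)
```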
In the full quantum theory one has an analogous conservation law. In this case, the phase space region is replaced by a subspace of the space of states, expressed in terms of a projection operator $P$. Instead of the probability in phase space, one has the probability density

$$\Sigma := \frac{P}{\operatorname{Tr} P},$$

where $\operatorname{Tr} P$ is the dimensionality of the subspace. The conservation law in this case is expressed by the unitarity of the S-matrix. In either case, the considerations assume a closed isolated system. This is a system with (1) a fixed energy $E$ and (2) a fixed number of particles $N$, in (3) a state of equilibrium. If one considers a huge number of replicas of this system, one obtains what is called a ''microcanonical ensemble''. It is for this system that quantum statistics postulates the "fundamental postulate of equal a priori probabilities of an isolated system." This says that the isolated system in equilibrium occupies each of its accessible states with the same probability. This fundamental postulate therefore allows us to equate the a priori probability to the degeneracy of a system, i.e., to the number of different states with the same energy.
Example
The following example illustrates the a priori probability (or a priori weighting) in (a) classical and (b) quantal contexts.
A priori probability and distribution functions
In statistical mechanics (see any textbook on the subject) one derives the so-called distribution functions $f$ for various statistics. In the case of Fermi–Dirac statistics and Bose–Einstein statistics these functions are, respectively,

$$f_i^{FD} = \frac{1}{e^{(\epsilon_i - \epsilon_0)/kT} + 1}, \qquad f_i^{BE} = \frac{1}{e^{(\epsilon_i - \epsilon_0)/kT} - 1},$$

where $k$ is the Boltzmann constant. These functions are derived for:
* (1) a system in dynamic equilibrium (i.e., under steady, uniform conditions),
* (2) a total (and huge) number of particles $N = \sum_i n_i$ (this condition determines the constant $\epsilon_0$), and
* (3) a total energy $E = \sum_i n_i \epsilon_i$, i.e., with each of the $n_i$ particles having the energy $\epsilon_i$.
An important aspect of the derivation is taking into account the indistinguishability of particles and states in quantum statistics, i.e., there particles and states do not have labels. In the case of fermions, like electrons, obeying the Pauli principle (only one particle per state or none allowed), one therefore has

$$0 \leq f_i^{FD} \leq 1.$$

Thus $f_i^{FD}$ is a measure of the fraction of states actually occupied by electrons at energy $\epsilon_i$ and temperature $T$. On the other hand, the a priori probability $g_i$ is a measure of the number of wave mechanical states available. Hence

$$n_i = f_i \, g_i.$$
$n_i$ is constant under uniform conditions, i.e., independent of time $t$: as many particles flow out of a volume element as flow in steadily, so the situation in the element appears static. Since $g_i$ is also independent of time $t$, as shown earlier, we obtain

$$\frac{df_i}{dt} = 0, \qquad f_i = f_i(t, \mathbf{v}_i, \mathbf{r}_i).$$

Expressing this equation in terms of its partial derivatives, one obtains the Boltzmann transport equation. How do coordinates such as $\mathbf{r}$ appear here suddenly? Above, no mention was made of electric or other fields. With no such fields present, we have the Fermi–Dirac distribution as above. With such fields present, however, $f$ acquires this additional dependence.
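The relations $0 \leq f_i^{FD} \leq 1$ and $n_i = f_i g_i$ can be illustrated with a few energy levels. In the sketch, the levels, their a priori weights $g_i$, and the values of $\epsilon_0$ (which plays the role of the chemical potential) and $kT$ are hypothetical, in arbitrary energy units.

```python
import math

def fermi_dirac(eps, eps0, kT):
    """Mean occupation f_i of a single-particle state at energy eps (fermions)."""
    return 1.0 / (math.exp((eps - eps0) / kT) + 1.0)

def bose_einstein(eps, eps0, kT):
    """Mean occupation f_i for bosons; only defined for eps > eps0."""
    return 1.0 / (math.exp((eps - eps0) / kT) - 1.0)

# Hypothetical energy levels eps_i with a priori weights g_i (number of states).
levels = [(0.0, 2), (0.5, 6), (1.0, 10), (1.5, 14)]
eps0, kT = 0.8, 0.2

for eps, g in levels:
    f = fermi_dirac(eps, eps0, kT)
    print(f"eps={eps:.1f}  g={g:2d}  f_FD={f:.4f}  n=f*g={f * g:.3f}")

# Bosons need every level above eps0, so a lower (hypothetical) eps0 is used here.
print(bose_einstein(1.0, -0.1, kT), fermi_dirac(1.0, -0.1, kT))  # BE occupation exceeds FD
```

Because $f^{FD}$ never exceeds 1, the occupation $n_i$ of a level can never exceed its number of available states $g_i$, which is the Pauli principle expressed through the a priori weights.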
See also
* Base rate
* Base rate fallacy
* Bayesian epistemology
* Strong prior
Notes
References
External links
PriorDB, a collaborative database of models and their priors