In Bayesian inference, the Bernstein–von Mises theorem provides the basis for using Bayesian credible sets for confidence statements in parametric models.
It states that under some conditions, a posterior distribution converges in total variation distance to a multivariate normal distribution centered at the maximum likelihood estimator $\hat{\theta}_n$, with covariance matrix given by $n^{-1}\mathcal{I}(\theta_0)^{-1}$, where $\theta_0$ is the true population parameter and $\mathcal{I}(\theta_0)$ is the Fisher information matrix at the true population parameter value:

$$\left\lVert P_{\theta \mid X_1, \dots, X_n} - \mathcal{N}\!\left(\hat{\theta}_n,\, n^{-1}\mathcal{I}(\theta_0)^{-1}\right) \right\rVert_{\mathrm{TV}} \xrightarrow{\;P\;} 0.$$
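The convergence above can be checked numerically for a simple model. The sketch below is only an illustration, not part of the theorem's statement: the Bernoulli model, the uniform prior, the sample sizes, and all function names are choices of this example. It compares the exact Beta posterior for a Bernoulli parameter with its normal approximation centered at the MLE, estimating their total variation distance on a grid:

```python
import math
import random

def beta_pdf(x, a, b):
    """Density of Beta(a, b) at x, via log-gamma for numerical stability."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    log_beta = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return math.exp((a - 1) * math.log(x) + (b - 1) * math.log(1 - x) - log_beta)

def normal_pdf(x, mu, var):
    """Density of N(mu, var) at x."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def tv_to_normal(n, theta0=0.3, grid=20_000):
    """TV distance between the Beta posterior (uniform prior, n Bernoulli(theta0)
    draws) and the normal approximation N(mle, 1 / (n * I(mle))), where
    I(t) = 1 / (t * (1 - t)) is the Bernoulli Fisher information."""
    s = sum(random.random() < theta0 for _ in range(n))
    s = min(max(s, 1), n - 1)          # keep the plug-in variance positive
    mle = s / n
    var = mle * (1 - mle) / n
    # TV = (1/2) * integral over (0, 1) of |posterior - normal|, midpoint rule
    h = 1.0 / grid
    total = 0.0
    for i in range(grid):
        x = (i + 0.5) * h
        total += abs(beta_pdf(x, 1 + s, 1 + n - s) - normal_pdf(x, mle, var)) * h
    return 0.5 * total

random.seed(0)
for n in (10, 100, 1000):
    print(n, round(tv_to_normal(n), 4))
```

As $n$ grows, the printed total variation distance shrinks toward zero, as the theorem predicts.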
The Bernstein–von Mises theorem links Bayesian inference with frequentist inference. It assumes there is some true probabilistic process that generates the observations, as in frequentism, and then studies the quality of Bayesian methods of recovering that process and making uncertainty statements about it. In particular, it states that asymptotically, many Bayesian credible sets of a certain credibility level $\alpha$ will act as confidence sets of confidence level $\alpha$, which allows for a frequentist interpretation of Bayesian credible sets.
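This frequentist reading of credible sets can be illustrated by simulation. The sketch below is a hypothetical example, not part of the theorem: the Bernoulli model, the uniform prior, and all function names are choices made here. It repeatedly draws data, forms an equal-tailed 95% posterior credible interval from Monte Carlo posterior samples, and measures how often the interval covers the true parameter:

```python
import random

def credible_interval(successes, n, level=0.95, draws=2000):
    """Equal-tailed credible interval from the Beta(1 + s, 1 + n - s)
    posterior under a uniform prior, via Monte Carlo posterior samples."""
    samples = sorted(
        random.betavariate(1 + successes, 1 + n - successes) for _ in range(draws)
    )
    lo_idx = int(draws * (1 - level) / 2)
    hi_idx = int(draws * (1 + level) / 2) - 1
    return samples[lo_idx], samples[hi_idx]

def coverage(theta0=0.3, n=200, reps=400):
    """Fraction of repetitions in which the 95% credible interval
    contains the true parameter theta0."""
    hits = 0
    for _ in range(reps):
        s = sum(random.random() < theta0 for _ in range(n))
        lo, hi = credible_interval(s, n)
        hits += lo <= theta0 <= hi
    return hits / reps

random.seed(1)
print(coverage())  # frequentist coverage, close to the 0.95 credibility level
```

Up to Monte Carlo noise, the empirical coverage lands near the nominal 0.95, which is the practical content of interpreting credible sets as confidence sets.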
Statement
Let $(P_\theta : \theta \in \Theta)$ be a well-specified statistical model, where the parameter space $\Theta$ is a subset of $\mathbb{R}^k$. Further, let data $X_1, \dots, X_n$ be independently and identically distributed from $P_{\theta_0}$. Suppose that all of the following conditions hold:
# The model admits densities $p_\theta$ with respect to some measure $\mu$.
# The Fisher information matrix $\mathcal{I}(\theta_0)$ is nonsingular.
# The model is differentiable in quadratic mean; that is, there exists a measurable function $\dot{\ell}_{\theta_0}$ such that $\int \left[ \sqrt{p_{\theta_0 + h}} - \sqrt{p_{\theta_0}} - \tfrac{1}{2}\, h^\top \dot{\ell}_{\theta_0} \sqrt{p_{\theta_0}} \right]^2 \mathrm{d}\mu = o(\lVert h \rVert^2)$ as $h \to 0$.
# For every $\varepsilon > 0$, there exists a sequence of test functions