In Bayesian inference, the Bernstein–von Mises theorem provides the basis for using Bayesian credible sets to make confidence statements in parametric models. It states that, under some regularity conditions, the posterior distribution converges in the limit of infinite data to a multivariate normal distribution centred at the maximum likelihood estimator with covariance matrix n^{-1} I(\theta_0)^{-1}, where \theta_0 is the true population parameter and I(\theta_0) is the Fisher information matrix at the true parameter value. The Bernstein–von Mises theorem thus links Bayesian inference with frequentist inference: it assumes, as in frequentism, that there is some true probabilistic process generating the observations, and it studies how well Bayesian methods recover that process and quantify the uncertainty about it. In particular, it states that Bayesian credible sets of a given credibility level \alpha are asymptotically confidence sets of confidence level \alpha, which allows a frequentist interpretation of Bayesian credible sets.
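For example, for n independent Bernoulli(\theta_0) observations with s successes, the Fisher information is I(\theta_0) = 1/(\theta_0(1 - \theta_0)), so for a sufficiently smooth prior the posterior is approximately

    N(\hat\theta_n, \hat\theta_n(1 - \hat\theta_n)/n), where \hat\theta_n = s/n is the maximum likelihood estimate,

which coincides with the usual frequentist normal approximation for a sample proportion.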


Heuristic statement

In a model (P_\theta : \theta \in \Theta), under certain regularity conditions (finite-dimensional parameter, well-specified model, smoothness, existence of tests), if the prior distribution \Pi on \theta has a density with respect to the Lebesgue measure that is sufficiently smooth near \theta_0 and bounded away from zero there, then the total variation distance between the rescaled posterior distribution (centred and rescaled as \sqrt{n}(\theta - \theta_0)) and a Gaussian distribution centred at any efficient estimator, with the inverse Fisher information as covariance, converges in probability to zero.
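Written compactly and without the rescaling, with \tilde\theta_n denoting an efficient estimator (for instance the maximum likelihood estimator) and \| \cdot \|_{TV} the total variation distance, the statement reads

    \| \Pi(\cdot \mid X_1, \ldots, X_n) - N(\tilde\theta_n, n^{-1} I(\theta_0)^{-1}) \|_{TV} \to 0 in P_{\theta_0}-probability.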


Bernstein–von Mises and maximum likelihood estimation

When the maximum likelihood estimator is an efficient estimator, it can be used as the centring point, which recovers a common, more specific version of the Bernstein–von Mises theorem.
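A minimal numerical sketch of this maximum-likelihood-centred version, assuming a Bernoulli(\theta_0) model with a uniform Beta(1, 1) prior (so the posterior is Beta(s + 1, n - s + 1) and I(\theta) = 1/(\theta(1 - \theta))):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    theta0, n = 0.3, 2000                      # true parameter and sample size
    x = rng.binomial(1, theta0, size=n)        # Bernoulli observations
    s = x.sum()

    theta_mle = s / n                                # maximum likelihood estimate
    fisher = 1.0 / (theta_mle * (1.0 - theta_mle))   # I(theta) = 1/(theta(1-theta))

    posterior = stats.beta(s + 1, n - s + 1)         # exact posterior under the uniform prior
    approx = stats.norm(theta_mle, np.sqrt(1.0 / (n * fisher)))  # Bernstein-von Mises normal approximation

    # Total variation distance, approximated by a Riemann sum on a fine grid.
    grid = np.linspace(1e-6, 1 - 1e-6, 100001)
    tv = 0.5 * np.sum(np.abs(posterior.pdf(grid) - approx.pdf(grid))) * (grid[1] - grid[0])
    print(f"approximate total variation distance for n={n}: {tv:.4f}")

Rerunning with larger n makes the estimated total variation distance shrink towards zero, as the theorem predicts.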


Implications

The most important implication of the Bernstein–von Mises theorem is that Bayesian inference is asymptotically correct from a frequentist point of view. This means that, given a large amount of data, one can use the posterior distribution to make statements about estimation and uncertainty that are valid from a frequentist point of view.
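A small simulation sketch, again assuming a Bernoulli(\theta_0) model with a uniform Beta(1, 1) prior, illustrates this: the frequentist coverage of the 95% equal-tailed posterior credible interval is close to the nominal 95% for moderately large samples.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    theta0, n, reps = 0.3, 500, 2000            # true parameter, sample size, replications

    covered = 0
    for _ in range(reps):
        s = rng.binomial(n, theta0)             # number of successes in one simulated data set
        post = stats.beta(s + 1, n - s + 1)     # posterior under the uniform prior
        lo, hi = post.ppf(0.025), post.ppf(0.975)   # 95% equal-tailed credible interval
        covered += (lo <= theta0 <= hi)

    print(f"empirical coverage of the 95% credible interval: {covered / reps:.3f}")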


History

The theorem is named after Richard von Mises and S. N. Bernstein, although the first proper proof was given by Joseph L. Doob in 1949 for random variables with finite probability space. Later Lucien Le Cam, his PhD student Lorraine Schwartz, David A. Freedman and Persi Diaconis extended the proof under more general assumptions.


Limitations

In the case of a misspecified model, the posterior distribution also becomes asymptotically Gaussian with the correct mean, but not necessarily with the Fisher information as the variance. This implies that Bayesian credible sets of level \alpha cannot in general be interpreted as confidence sets of level \alpha (see the sketch at the end of this section). In nonparametric statistics, the Bernstein–von Mises theorem usually fails to hold, with a notable exception of the Dirichlet process.

A remarkable result was found by Freedman in 1965: the Bernstein–von Mises theorem does not hold almost surely if the random variable has a countably infinite probability space; however, this depends on allowing a very broad range of possible priors. In practice, the priors typically used in research do have the desirable property even with a countably infinite probability space. Different summary statistics, such as the mode and mean, may behave differently in the posterior distribution. In Freedman's examples, the posterior density and its mean can converge to the wrong result, but the posterior mode is consistent and will converge to the correct result.
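A minimal sketch of the misspecification point, assuming for illustration that the data are actually drawn from N(0, 2^2) while the analyst's model is N(\theta, 1) with known unit variance and a flat prior: the posterior is then N(\bar{x}, 1/n), so it is centred correctly, but its nominal 95% credible intervals undercover the true mean because the sampling variance of \bar{x} is 4/n rather than 1/n.

    import numpy as np

    rng = np.random.default_rng(2)
    theta0, true_sd, n, reps = 0.0, 2.0, 200, 5000   # true mean, true s.d., sample size, replications

    covered = 0
    for _ in range(reps):
        x = rng.normal(theta0, true_sd, size=n)      # data really come from N(0, 4)
        xbar = x.mean()
        # Misspecified model N(theta, 1) with a flat prior: the posterior is N(xbar, 1/n),
        # so the nominal 95% credible interval is xbar +/- 1.96/sqrt(n).
        half_width = 1.96 / np.sqrt(n)
        covered += (xbar - half_width <= theta0 <= xbar + half_width)

    print(f"coverage of the nominal 95% credible interval: {covered / reps:.3f}")
    # Roughly 0.67 rather than 0.95, because the true sampling s.d. of xbar is 2/sqrt(n).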


References

* Doob, Joseph L. (1949). ''Application of the theory of martingales''. Colloq. Intern. du C.N.R.S (Paris), No. 13, pp. 23–27.
* Freedman, David A. (1963). ''On the asymptotic behaviour of Bayes estimates in the discrete case I''. The Annals of Mathematical Statistics, vol. 34, pp. 1386–1403.
* Freedman, David A. (1965). ''On the asymptotic behaviour of Bayes estimates in the discrete case II''. The Annals of Mathematical Statistics, vol. 36, pp. 454–456.
* Le Cam, Lucien (1986). ''Asymptotic Methods in Statistical Decision Theory''. Springer. (Pages 336 and 618–621.)
* Schwartz, Lorraine (1965). ''On Bayes procedures''. Z. Wahrscheinlichkeitstheorie, No. 4, pp. 10–26.