
In
statistics
Statistics (from German language, German: ', "description of a State (polity), state, a country") is the discipline that concerns the collection, organization, analysis, interpretation, and presentation of data. In applying statistics to a s ...
and
statistical physics
In physics, statistical mechanics is a mathematical framework that applies statistical methods and probability theory to large assemblies of microscopic entities. Sometimes called statistical physics or statistical thermodynamics, its applicati ...
, the Metropolis–Hastings algorithm is a
Markov chain Monte Carlo
In statistics, Markov chain Monte Carlo (MCMC) is a class of algorithms used to draw samples from a probability distribution. Given a probability distribution, one can construct a Markov chain whose elements' distribution approximates it – that ...
(MCMC) method for obtaining a sequence of
random samples from a
probability distribution
In probability theory and statistics, a probability distribution is a Function (mathematics), function that gives the probabilities of occurrence of possible events for an Experiment (probability theory), experiment. It is a mathematical descri ...
from which direct sampling is difficult. New samples are added to the sequence in two steps: first a new sample is proposed based on the previous sample, then the proposed sample is either added to the sequence or rejected depending on the value of the probability distribution at that point. The resulting sequence can be used to approximate the distribution (e.g. to generate a
histogram
A histogram is a visual representation of the frequency distribution, distribution of quantitative data. To construct a histogram, the first step is to Data binning, "bin" (or "bucket") the range of values— divide the entire range of values in ...
) or to
compute an integral (e.g. an
expected value
In probability theory, the expected value (also called expectation, expectancy, expectation operator, mathematical expectation, mean, expectation value, or first Moment (mathematics), moment) is a generalization of the weighted average. Informa ...
).
Metropolis–Hastings and other MCMC algorithms are generally used for sampling from multi-dimensional distributions, especially when the number of dimensions is high. For single-dimensional distributions, there are usually other methods (e.g.
adaptive rejection sampling) that can directly return independent samples from the distribution, and these are free from the problem of
autocorrelated samples that is inherent in MCMC methods.
History
The algorithm is named in part for
Nicholas Metropolis, the first coauthor of a 1953 paper, entitled ''
Equation of State Calculations by Fast Computing Machines'', with
Arianna W. Rosenbluth,
Marshall Rosenbluth,
Augusta H. Teller and
Edward Teller
Edward Teller (; January 15, 1908 – September 9, 2003) was a Hungarian and American Theoretical physics, theoretical physicist and chemical engineer who is known colloquially as "the father of the hydrogen bomb" and one of the creators of ...
. For many years the algorithm was known simply as the ''Metropolis algorithm''. The paper proposed the algorithm for the case of symmetrical proposal distributions, but in 1970,
W.K. Hastings extended it to the more general case.
[ The generalized method was eventually identified by both names, although the first use of the term "Metropolis-Hastings algorithm" is unclear.
Some controversy exists with regard to credit for development of the Metropolis algorithm. Metropolis, who was familiar with the computational aspects of the method, had coined the term "Monte Carlo" in an earlier article with ]Stanisław Ulam
Stanisław Marcin Ulam ( ; 13 April 1909 – 13 May 1984) was a Polish and American mathematician, nuclear physicist and computer scientist. He participated in the Manhattan Project, originated the History of the Teller–Ulam design, Telle ...
, and led the group in the Theoretical Division that designed and built the MANIAC I computer used in the experiments in 1952. However, prior to 2003 there was no detailed account of the algorithm's development. Shortly before his death, Marshall Rosenbluth attended a 2003 conference at LANL marking the 50th anniversary of the 1953 publication. At this conference, Rosenbluth described the algorithm and its development in a presentation titled "Genesis of the Monte Carlo Algorithm for Statistical Mechanics".[ Further historical clarification is made by Gubernatis in a 2005 journal article][ recounting the 50th anniversary conference. Rosenbluth makes it clear that he and his wife Arianna did the work, and that Metropolis played no role in the development other than providing computer time.
This contradicts an account by Edward Teller, who states in his memoirs that the five authors of the 1953 article worked together for "days (and nights)".][ In contrast, the detailed account by Rosenbluth credits Teller with a crucial but early suggestion to "take advantage of ]statistical mechanics
In physics, statistical mechanics is a mathematical framework that applies statistical methods and probability theory to large assemblies of microscopic entities. Sometimes called statistical physics or statistical thermodynamics, its applicati ...
and take ensemble averages instead of following detailed kinematics
In physics, kinematics studies the geometrical aspects of motion of physical objects independent of forces that set them in motion. Constrained motion such as linked machine parts are also described as kinematics.
Kinematics is concerned with s ...
". This, says Rosenbluth, started him thinking about the generalized Monte Carlo approach – a topic which he says he had discussed often with John Von Neumann
John von Neumann ( ; ; December 28, 1903 – February 8, 1957) was a Hungarian and American mathematician, physicist, computer scientist and engineer. Von Neumann had perhaps the widest coverage of any mathematician of his time, in ...
. Arianna Rosenbluth recounted (to Gubernatis in 2003) that Augusta Teller started the computer work, but that Arianna herself took it over and wrote the code from scratch. In an oral history recorded shortly before his death,[ Rosenbluth again credits Teller with posing the original problem, himself with solving it, and Arianna with programming the computer.
]
Description
The Metropolis–Hastings algorithm can draw samples from any probability distribution
In probability theory and statistics, a probability distribution is a Function (mathematics), function that gives the probabilities of occurrence of possible events for an Experiment (probability theory), experiment. It is a mathematical descri ...
with probability density , provided that we know a function proportional to the density
Density (volumetric mass density or specific mass) is the ratio of a substance's mass to its volume. The symbol most often used for density is ''ρ'' (the lower case Greek letter rho), although the Latin letter ''D'' (or ''d'') can also be u ...
and the values of can be calculated. The requirement that must only be proportional to the density, rather than exactly equal to it, makes the Metropolis–Hastings algorithm particularly useful, because it removes the need to calculate the density's normalization factor, which is often extremely difficult in practice.
The Metropolis–Hastings algorithm generates a sequence of sample values in such a way that, as more and more sample values are produced, the distribution of values more closely approximates the desired distribution. These sample values are produced iteratively in such a way, that the distribution of the next sample depends only on the current sample value, which makes the sequence of samples a Markov chain
In probability theory and statistics, a Markov chain or Markov process is a stochastic process describing a sequence of possible events in which the probability of each event depends only on the state attained in the previous event. Informally ...
. Specifically, at each iteration, the algorithm proposes a candidate for the next sample value based on the current sample value. Then, with some probability, the candidate is either accepted, in which case the candidate value is used in the next iteration, or it is rejected in which case the candidate value is discarded, and the current value is reused in the next iteration. The probability of acceptance is determined by comparing the values of the function of the current and candidate sample values with respect to the desired distribution.
The method used to propose new candidates is characterized by the probability distribution (sometimes written ) of a new proposed sample given the previous sample . This is called the ''proposal density'', ''proposal function'', or ''jumping distribution''. A common choice for is a Gaussian distribution
In probability theory and statistics, a normal distribution or Gaussian distribution is a type of continuous probability distribution for a real number, real-valued random variable. The general form of its probability density function is
f(x ...
centered at , so that points closer to are more likely to be visited next, making the sequence of samples into a Gaussian random walk. In the original paper by Metropolis et al. (1953), was suggested to be a uniform distribution limited to some maximum distance from . More complicated proposal functions are also possible, such as those of Hamiltonian Monte Carlo, Langevin Monte Carlo, or preconditioned Crank–Nicolson.
For the purpose of illustration, the Metropolis algorithm, a special case of the Metropolis–Hastings algorithm where the proposal function is symmetric, is described below.
;Metropolis algorithm (symmetric proposal distribution):
Let be a function that is proportional to the desired probability density function (a.k.a. a target distribution).
# Initialization: Choose an arbitrary point to be the first observation in the sample and choose a proposal function . In this section, is assumed to be symmetric; in other words, it must satisfy .
# For each iteration ''t'':
#* ''Propose'' a candidate for the next sample by picking from the distribution .
#* ''Calculate'' the ''acceptance ratio'' , which will be used to decide whether to accept or reject the candidate. Because ''f'' is proportional to the density of ''P'', we have that .
#* ''Accept or reject'':
#** Generate a uniform random number