Multiple-try Metropolis (MTM) is a sampling method that is a modified form of the Metropolis–Hastings method, first presented by Liu, Liang, and Wong in 2000. It is designed to help the sampling trajectory converge faster, by increasing both the step size and the acceptance rate.
Background
Problems with Metropolis–Hastings
In Markov chain Monte Carlo, the Metropolis–Hastings algorithm (MH) can be used to sample from a probability distribution which is difficult to sample from directly. However, the MH algorithm requires the user to supply a proposal distribution, which can be relatively arbitrary. In many cases, one uses a Gaussian distribution centered on the current point in the probability space, of the form $Q(x'; x) = \mathcal{N}(x, \sigma^2 I)$. This proposal distribution is convenient to sample from and may be the best choice if one has little knowledge about the target distribution, $\pi(x)$. If desired, one can use the more general multivariate normal distribution, $Q(x'; x) = \mathcal{N}(x, \Sigma)$, where $\Sigma$ is a covariance matrix which the user believes matches the scale and correlations of the target distribution.
Although this method must converge to the stationary distribution in the limit of infinite sample size, in practice the progress can be exceedingly slow. If $\sigma^2$ is too large, almost all steps under the MH algorithm will be rejected. On the other hand, if $\sigma^2$ is too small, almost all steps will be accepted, and the Markov chain will be similar to a random walk through the probability space. In the simpler case of $Q(x'; x) = \mathcal{N}(x, I)$, $N$ steps take us only an expected distance of about $\sqrt{N}$. In this event, the Markov chain will not fully explore the probability space in any reasonable amount of time. Thus the MH algorithm requires reasonable tuning of the scale parameter ($\sigma^2$ or $\Sigma$).
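To make the tuning problem concrete, here is a minimal random-walk Metropolis sketch in Python (a generic textbook construction, not code from the Liu–Liang–Wong paper; the function name and parameters are illustrative):

```python
import numpy as np

def random_walk_metropolis(log_target, x0, sigma=1.0, n_steps=10_000, rng=None):
    """Random-walk MH with an isotropic Gaussian proposal N(x, sigma^2 I).

    `log_target` is the log of the (possibly unnormalized) target pi(x).
    Because the proposal is symmetric, the acceptance probability
    reduces to min(1, pi(x') / pi(x)).
    """
    rng = rng or np.random.default_rng()
    x = np.asarray(x0, dtype=float)
    samples, accepted = [x.copy()], 0
    for _ in range(n_steps):
        proposal = x + sigma * rng.standard_normal(x.shape)
        # Accept with probability min(1, pi(x')/pi(x)), computed in log space.
        if np.log(rng.random()) < log_target(proposal) - log_target(x):
            x, accepted = proposal, accepted + 1
        samples.append(x.copy())
    return np.array(samples), accepted / n_steps

# Standard normal target: a sigma that is too large drives the acceptance
# rate toward 0, one that is too small toward 1 (a pure random walk).
samples, rate = random_walk_metropolis(lambda x: -0.5 * np.sum(x**2),
                                       x0=np.zeros(2), sigma=2.5)
print(f"acceptance rate: {rate:.2f}")
```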
Problems with high dimensionality
Even if the scale parameter is well-tuned, as the dimensionality of the problem increases, progress can still remain exceedingly slow. To see this, again consider $Q(x'; x) = \mathcal{N}(x, I)$. In one dimension, this corresponds to a Gaussian distribution with mean 0 and variance 1. This distribution has a mean step of zero; however, the mean squared step size is given by

$$\langle (x' - x)^2 \rangle = \int_{-\infty}^{\infty} \frac{(x' - x)^2}{\sqrt{2\pi}}\, e^{-(x' - x)^2 / 2}\, dx' = 1.$$
As the number of dimensions increases, the expected step size becomes larger and larger. In $N$ dimensions, the probability of moving a radial distance $r$ is related to the chi distribution, and is given by

$$P(r) \propto r^{N-1} e^{-r^2/2}.$$

This distribution is peaked at $r = \sqrt{N-1}$, which is approximately $\sqrt{N}$ for large $N$. This means that the step size will increase roughly as the square root of the number of dimensions. For the MH algorithm, large steps will almost always land in regions of low probability, and therefore be rejected.
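The location of the peak follows from setting the derivative of $\log P(r)$ to zero:

$$\frac{d}{dr}\log P(r) = \frac{d}{dr}\left[(N-1)\log r - \frac{r^2}{2}\right] = \frac{N-1}{r} - r = 0 \quad\Longrightarrow\quad r = \sqrt{N-1}.$$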
If we now add the scale parameter $\sigma$ back in, we find that to retain a reasonable acceptance rate we must make the transformation $\sigma \to \sigma/\sqrt{N}$. In this situation, the acceptance rate can now be made reasonable, but the exploration of the probability space becomes increasingly slow. To see this, consider a slice along any one dimension of the problem. By making the scale transformation above, the expected step size in any one dimension is not $\sigma$ but instead $\sigma/\sqrt{N}$. As this step size is much smaller than the "true" scale of the probability distribution (assuming that $\sigma$ is somehow known a priori, which is the best possible case), the algorithm executes a random walk along every parameter.
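As a quick numerical check of this scaling, reusing the `random_walk_metropolis` sketch above (the constant 2.4 is a commonly cited rule of thumb for Gaussian targets, not a value from the paper):

```python
# Shrinking the proposal scale as sigma / sqrt(N) keeps the acceptance
# rate roughly constant, but the per-dimension step size then shrinks
# as 1/sqrt(N), so each coordinate is explored by a slow random walk.
for n_dim in (1, 10, 100):
    sigma = 2.4 / np.sqrt(n_dim)
    _, rate = random_walk_metropolis(lambda x: -0.5 * np.sum(x**2),
                                     x0=np.zeros(n_dim), sigma=sigma,
                                     n_steps=5_000)
    print(f"N={n_dim:4d}  sigma={sigma:.3f}  acceptance={rate:.2f}")
```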
The multiple-try Metropolis algorithm
Suppose $Q(x, y)$ is an arbitrary proposal function. We require that $Q(x, y) > 0$ only if $Q(y, x) > 0$. Additionally, $\pi(x)$ is the likelihood function.
Define $w(x, y) = \pi(x)\, Q(x, y)\, \lambda(x, y)$, where $\lambda(x, y)$ is a non-negative symmetric function in $x$ and $y$ that can be chosen by the user.
Now suppose the current state is $x$. The MTM algorithm is as follows:

1) Draw $k$ independent trial proposals $y_1, \ldots, y_k$ from $Q(x, \cdot)$. Compute the weights $w(y_j, x)$ for each of these.

2) Select $y$ from the $y_j$ with probability proportional to the weights.

3) Now produce a reference set by drawing $x_1, \ldots, x_{k-1}$ from the distribution $Q(y, \cdot)$. Set $x_k = x$ (the current point).

4) Accept $y$ with probability

$$r = \min\left(1,\; \frac{w(y_1, x) + \cdots + w(y_k, x)}{w(x_1, y) + \cdots + w(x_k, y)}\right).$$
It can be shown that this method satisfies the detailed balance property and therefore produces a reversible Markov chain with $\pi(x)$ as the stationary distribution.
If $Q(x, y)$ is symmetric (as is the case for the multivariate normal distribution), then one can choose $\lambda(x, y) = 1/Q(x, y)$, which gives $w(x, y) = \pi(x)$.
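A minimal sketch of one MTM step under this simplification (symmetric Gaussian $Q$ and $\lambda = 1/Q$, so the weights reduce to $w(x, y) = \pi(x)$; the function names and defaults are illustrative, not from the paper):

```python
import numpy as np

def _logsumexp(a):
    # Numerically stable log(sum(exp(a))).
    m = np.max(a)
    return m + np.log(np.sum(np.exp(a - m)))

def mtm_step(log_pi, x, sigma=1.0, k=5, rng=None):
    """One multiple-try Metropolis step with Q(x, y) = N(y; x, sigma^2 I)
    and lambda(x, y) = 1/Q(x, y), so that w(x, y) = pi(x) and every
    weight can be computed from log pi alone."""
    rng = rng or np.random.default_rng()
    x = np.asarray(x, dtype=float)

    # 1) Draw k independent trials y_1..y_k from Q(x, .) and compute
    #    their log-weights log w(y_j, x) = log pi(y_j).
    trials = x + sigma * rng.standard_normal((k, x.size))
    log_w_y = np.array([log_pi(y) for y in trials])

    # 2) Select y among the trials with probability proportional to w.
    p = np.exp(log_w_y - _logsumexp(log_w_y))
    y = trials[rng.choice(k, p=p)]

    # 3) Reference set: x_1..x_{k-1} drawn from Q(y, .), and x_k = x.
    refs = y + sigma * rng.standard_normal((k - 1, x.size))
    log_w_x = np.array([log_pi(z) for z in refs] + [log_pi(x)])

    # 4) Accept y with probability
    #    min(1, sum_j w(y_j, x) / sum_j w(x_j, y)).
    if np.log(rng.random()) < _logsumexp(log_w_y) - _logsumexp(log_w_x):
        return y
    return x
```

Because each call evaluates $\pi$ at the $k$ trials and the $k - 1$ reference points, one MTM step costs $2k - 1$ density evaluations beyond the current point, which is the trade-off discussed below.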
Disadvantages
Multiple-try Metropolis needs to compute the energy of $2k - 1$ other states at every step: the $k$ trial proposals and the $k - 1$ points of the reference set. If the slow part of the process is calculating the energy, then this method can be slower. If the slow part of the process is finding neighbors of a given point, or generating random numbers, then again this method can be slower. It can be argued that this method only appears faster because it puts much more computation into a "single step" than Metropolis–Hastings does.
See also
* Markov chain Monte Carlo
* Metropolis–Hastings algorithm
* Detailed balance
References
* Liu, J. S., Liang, F. and Wong, W. H. (2000). The multiple-try method and local optimization in Metropolis sampling. ''Journal of the American Statistical Association'', 95(449): 121–134.