statistics Statistics (from German language, German: ''wikt:Statistik#German, Statistik'', "description of a State (polity), state, a country") is the discipline that concerns the collection, organization, analysis, interpretation, and presentation of ...

, Markov chain Monte Carlo (MCMC) methods comprise a class of

algorithm In mathematics and computer science, an algorithm () is a finite sequence of rigorous instructions, typically used to solve a class of specific Computational problem, problems or to perform a computation. Algorithms are used as specificat ...

s for sampling from a probability distribution. By constructing a Markov chain that has the desired distribution as its equilibrium distribution, one can obtain a sample of the desired distribution by recording states from the chain. The more steps that are included, the more closely the distribution of the sample matches the actual desired distribution. Various algorithms exist for constructing chains, including the

Metropolis–Hastings algorithm In statistics and statistical physics, the Metropolis–Hastings algorithm is a Markov chain Monte Carlo (MCMC) method for obtaining a sequence of random samples from a probability distribution from which direct sampling is difficult. This sequ ...

Application domains

MCMC methods are primarily used for calculating numerical approximations of multi-dimensional integrals, for example in

Bayesian statistics Bayesian statistics is a theory in the field of statistics based on the Bayesian interpretation of probability where probability expresses a ''degree of belief'' in an event. The degree of belief may be based on prior knowledge about the event, ...

computational physics Computational physics is the study and implementation of numerical analysis to solve problems in physics for which a quantitative theory already exists. Historically, computational physics was the first application of modern computers in science, ...

, computational biology and computational linguistics. In Bayesian statistics, the recent development of MCMC methods has made it possible to compute large hierarchical models that require integrations over hundreds to thousands of unknown parameters. In

rare event sampling Rare event sampling is an umbrella term for a group of computer simulation methods intended to selectively sample 'special' regions of the dynamic space of systems which are unlikely to visit those special regions through brute-force simulation. A ...

, they are also used for generating samples that gradually populate the rare failure region.

General explanation

Metropolis algorithm convergence example

Markov chain Monte Carlo methods create samples from a continuous random variable, with

probability density In probability theory, a probability density function (PDF), or density of a continuous random variable, is a function whose value at any given sample (or point) in the sample space (the set of possible values taken by the random variable) can ...

proportional to a known function. These samples can be used to evaluate an integral over that variable, as its expected value or

variance In probability theory and statistics, variance is the expectation of the squared deviation of a random variable from its population mean or sample mean. Variance is a measure of dispersion, meaning it is a measure of how far a set of numbe ...

. Practically, an

ensemble Ensemble may refer to: Art * Architectural ensemble * Ensemble (album), ''Ensemble'' (album), Kendji Girac 2015 album * Ensemble (band), a project of Olivier Alary * Ensemble cast (drama, comedy) * Ensemble (musical theatre), also known as the ...

of chains is generally developed, starting from a set of points arbitrarily chosen and sufficiently distant from each other. These chains are stochastic processes of "walkers" which move around randomly according to an algorithm that looks for places with a reasonably high contribution to the integral to move into next, assigning them higher probabilities. Random walk Monte Carlo methods are a kind of random

simulation A simulation is the imitation of the operation of a real-world process or system over time. Simulations require the use of models; the model represents the key characteristics or behaviors of the selected system or process, whereas the s ...

Monte Carlo method Monte Carlo methods, or Monte Carlo experiments, are a broad class of computational algorithms that rely on repeated random sampling to obtain numerical results. The underlying concept is to use randomness to solve problems that might be determi ...

. However, whereas the random samples of the integrand used in a conventional

Monte Carlo integration In mathematics, Monte Carlo integration is a technique for numerical integration using random numbers. It is a particular Monte Carlo method that numerically computes a definite integral. While other algorithms usually evaluate the integrand a ...

are

statistically independent Independence is a fundamental notion in probability theory, as in statistics and the theory of stochastic processes. Two events are independent, statistically independent, or stochastically independent if, informally speaking, the occurrence of o ...

, those used in MCMC are autocorrelated. Correlations of samples introduces the need to use the

Markov chain central limit theorem In the mathematical theory of random processes, the Markov chain central limit theorem has a conclusion somewhat similar in form to that of the classic central limit theorem (CLT) of probability theory, but the quantity in the role taken by the var ...

when estimating the error of mean values. These algorithms create

Markov chains A Markov chain or Markov process is a stochastic model describing a sequence of possible events in which the probability of each event depends only on the state attained in the previous event. Informally, this may be thought of as, "What happen ...

such that they have an equilibrium distribution which is proportional to the function given.

Reducing correlation

While MCMC methods were created to address multi-dimensional problems better than generic Monte Carlo algorithms, when the number of dimensions rises they too tend to suffer the

curse of dimensionality The curse of dimensionality refers to various phenomena that arise when analyzing and organizing data in high-dimensional spaces that do not occur in low-dimensional settings such as the three-dimensional physical space of everyday experience. T ...

: regions of higher probability tend to stretch and get lost in an increasing volume of space that contributes little to the integral. One way to address this problem could be shortening the steps of the walker, so that it doesn't continuously try to exit the highest probability region, though this way the process would be highly autocorrelated and expensive (i.e. many steps would be required for an accurate result). More sophisticated methods such as

Hamiltonian Monte Carlo The Hamiltonian Monte Carlo algorithm (originally known as hybrid Monte Carlo) is a Markov chain Monte Carlo method for obtaining a sequence of random samples which converge to being distributed according to a target probability distribution for whi ...

and the

Wang and Landau algorithm The Wang and Landau algorithm, proposed by Fugao Wang and David P. Landau, is a Monte Carlo method designed to estimate the density of states of a system. The method performs a non-Markovian random walk to build the density of states by quickly v ...

use various ways of reducing this autocorrelation, while managing to keep the process in the regions that give a higher contribution to the integral. These algorithms usually rely on a more complicated theory and are harder to implement, but they usually converge faster.

Examples

Random walk

: This method generates a Markov chain using a proposal density for new steps and a method for rejecting some of the proposed moves. It is actually a general framework which includes as special cases the very first and simpler MCMC (Metropolis algorithm) and many more recent alternatives listed below. **

Gibbs sampling In statistics, Gibbs sampling or a Gibbs sampler is a Markov chain Monte Carlo (MCMC) algorithm for obtaining a sequence of observations which are approximated from a specified multivariate probability distribution, when direct sampling is dif ...

: This method requires all the

conditional distribution In probability theory and statistics, given two jointly distributed random variables X and Y, the conditional probability distribution of Y given X is the probability distribution of Y when X is known to be a particular value; in some cases the ...

s of the target distribution to be sampled exactly. When drawing from the full-conditional distributions is not straightforward other samplers-within-Gibbs are used (e.g., see ). Gibbs sampling is popular partly because it does not require any 'tuning'. Algorithm structure of the

highly resembles that of the coordinate ascent variational inference in that both algorithms utilize the full-conditional distributions in the updating procedure. ** Metropolis-adjusted Langevin algorithm and other methods that rely on the gradient (and possibly second derivative) of the log target density to propose steps that are more likely to be in the direction of higher probability density. ** Pseudo-marginal Metropolis–Hastings: This method replaces the evaluation of the density of the target distribution with an unbiased estimate and is useful when the target density is not available analytically, e.g.

latent variable model A latent variable model is a statistical model that relates a set of observable variables (also called ''manifest variables'' or ''indicators'') to a set of latent variables. It is assumed that the responses on the indicators or manifest variabl ...

s. *

Slice sampling Slice sampling is a type of Markov chain Monte Carlo algorithm for pseudo-random number sampling, i.e. for drawing random samples from a statistical distribution. The method is based on the observation that to sample a random variable one can sampl ...

: This method depends on the principle that one can sample from a distribution by sampling uniformly from the region under the plot of its density function. It alternates uniform sampling in the vertical direction with uniform sampling from the horizontal 'slice' defined by the current vertical position. * Multiple-try Metropolis: This method is a variation of the Metropolis–Hastings algorithm that allows multiple trials at each point. By making it possible to take larger steps at each iteration, it helps address the curse of dimensionality. * Reversible-jump: This method is a variant of the Metropolis–Hastings algorithm that allows proposals that change the dimensionality of the space. Markov chain Monte Carlo methods that change dimensionality have long been used in

statistical physics Statistical physics is a branch of physics that evolved from a foundation of statistical mechanics, which uses methods of probability theory and statistics, and particularly the mathematical tools for dealing with large populations and approxim ...

applications, where for some problems a distribution that is a

grand canonical ensemble In statistical mechanics, the grand canonical ensemble (also known as the macrocanonical ensemble) is the statistical ensemble that is used to represent the possible states of a mechanical system of particles that are in thermodynamic equilibriu ...

is used (e.g., when the number of molecules in a box is variable). But the reversible-jump variant is useful when doing Markov chain Monte Carlo or Gibbs sampling over

nonparametric Nonparametric statistics is the branch of statistics that is not based solely on Statistical parameter, parametrized families of probability distributions (common examples of parameters are the mean and variance). Nonparametric statistics is based ...

Bayesian models such as those involving the

Dirichlet process In probability theory, Dirichlet processes (after the distribution associated with Peter Gustav Lejeune Dirichlet) are a family of stochastic processes whose realizations are probability distributions. In other words, a Dirichlet process is a p ...

Chinese restaurant process In probability theory, the Chinese restaurant process is a discrete-time stochastic process, analogous to seating customers at tables in a restaurant. Imagine a restaurant with an infinite number of circular tables, each with infinite capacity. C ...

, where the number of mixing components/clusters/etc. is automatically inferred from the data. * Hamiltonian (or Hybrid) Monte Carlo (HMC): Tries to avoid random walk behaviour by introducing an auxiliary momentum vector and implementing

Hamiltonian dynamics Hamiltonian mechanics emerged in 1833 as a reformulation of Lagrangian mechanics. Introduced by Sir William Rowan Hamilton, Hamiltonian mechanics replaces (generalized) velocities \dot q^i used in Lagrangian mechanics with (generalized) ''momenta ...

, so the potential energy function is the target density. The momentum samples are discarded after sampling. The end result of Hybrid Monte Carlo is that proposals move across the sample space in larger steps; they are therefore less correlated and converge to the target distribution more rapidly.

Interacting particle methods

Interacting MCMC methodologies are a class of

mean-field particle methods Mean-field particle methods are a broad class of ''interacting type'' Monte Carlo algorithms for simulating from a sequence of probability distributions satisfying a nonlinear evolution equation. These flows of probability measures can always be int ...

for obtaining

random samples In statistics, quality assurance, and Statistical survey, survey methodology, sampling is the selection of a subset (a statistical sample) of individuals from within a population (statistics), statistical population to estimate characteristics o ...

from a sequence of probability distributions with an increasing level of sampling complexity. These probabilistic models include path space state models with increasing time horizon, posterior distributions w.r.t. sequence of partial observations, increasing constraint level sets for conditional distributions, decreasing temperature schedules associated with some Boltzmann–Gibbs distributions, and many others. In principle, any Markov chain Monte Carlo sampler can be turned into an interacting Markov chain Monte Carlo sampler. These interacting Markov chain Monte Carlo samplers can be interpreted as a way to run in parallel a sequence of Markov chain Monte Carlo samplers. For instance, interacting

simulated annealing Simulated annealing (SA) is a probabilistic technique for approximating the global optimum of a given function. Specifically, it is a metaheuristic to approximate global optimization in a large search space for an optimization problem. ...

algorithms are based on independent Metropolis–Hastings moves interacting sequentially with a selection-resampling type mechanism. In contrast to traditional Markov chain Monte Carlo methods, the precision parameter of this class of interacting Markov chain Monte Carlo samplers is ''only'' related to the number of interacting Markov chain Monte Carlo samplers. These advanced particle methodologies belong to the class of Feynman–Kac particle models, also called Sequential Monte Carlo or

particle filter Particle filters, or sequential Monte Carlo methods, are a set of Monte Carlo algorithms used to solve filtering problems arising in signal processing and Bayesian statistical inference. The filtering problem consists of estimating the i ...

methods in Bayesian inference and

signal processing Signal processing is an electrical engineering subfield that focuses on analyzing, modifying and synthesizing ''signals'', such as sound, images, and scientific measurements. Signal processing techniques are used to optimize transmissions, ...

communities. Interacting Markov chain Monte Carlo methods can also be interpreted as a mutation-selection genetic particle algorithm with Markov chain Monte Carlo mutations.

Markov Chain quasi–Monte Carlo (MCQMC).

The advantage of

low-discrepancy sequence In mathematics, a low-discrepancy sequence is a sequence with the property that for all values of ''N'', its subsequence ''x''1, ..., ''x'N'' has a low discrepancy. Roughly speaking, the discrepancy of a sequence is low if the proportion of poi ...

s in lieu of random numbers for simple independent Monte Carlo sampling is well known. This procedure, known as

Quasi-Monte Carlo method In numerical analysis, the quasi-Monte Carlo method is a method for numerical integration and solving some other problems using low-discrepancy sequences (also called quasi-random sequences or sub-random sequences). This is in contrast to the regu ...

(QMC), yields an integration error that decays at a superior rate to that obtained by IID sampling, by the Koksma-Hlawka inequality. Empirically it allows the reduction of both estimation error and convergence time by an order of magnitude. The Array-RQMC method combines randomized quasi–Monte Carlo and Markov chain simulation by simulating

n

chains simultaneously in a way that the empirical distribution of the

n

states at any given step is a better approximation of the true distribution of the chain than with ordinary MCMC. In empirical experiments, the variance of the average of a function of the state sometimes converges at rate

O(n^)

or even faster, instead of the

O(n^)

Monte Carlo rate.

Convergence

Usually it is not hard to construct a Markov chain with the desired properties. The more difficult problem is to determine how many steps are needed to converge to the stationary distribution within an acceptable error. A good chain will have rapid mixing: the stationary distribution is reached quickly starting from an arbitrary position. A standard empirical method to assess convergence is to run several independent simulated Markov chains and check that the ratio of inter-chain to intra-chain variances for all the parameters sampled is close to 1. Typically, Markov chain Monte Carlo sampling can only approximate the target distribution, as there is always some residual effect of the starting position. More sophisticated Markov chain Monte Carlo-based algorithms such as

coupling from the past Among Markov chain Monte Carlo (MCMC) algorithms, coupling from the past is a method for sampling from the stationary distribution of a Markov chain. Contrary to many MCMC algorithms, coupling from the past gives in principle a perfect sample from ...

can produce exact samples, at the cost of additional computation and an unbounded (though finite in expectation)

running time In computer science, the time complexity is the computational complexity that describes the amount of computer time it takes to run an algorithm. Time complexity is commonly estimated by counting the number of elementary operations performed by t ...

. Many random walk Monte Carlo methods move around the equilibrium distribution in relatively small steps, with no tendency for the steps to proceed in the same direction. These methods are easy to implement and analyze, but unfortunately it can take a long time for the walker to explore all of the space. The walker will often double back and cover ground already covered. Further consideration of convergence is at

. See for a discussion of the theory related to convergence and stationarity of the Metropolis–Hastings algorithm.

Software

Several software programs provide MCMC sampling capabilities, for example:
ParaMonte
parallel Monte Carlo software available in multiple programming languages including C,

C%2B%2B C++ (pronounced "C plus plus") is a high-level general-purpose programming language created by Danish computer scientist Bjarne Stroustrup as an extension of the C programming language, or "C with Classes". The language has expanded signific ...

, Fortran,

MATLAB MATLAB (an abbreviation of "MATrix LABoratory") is a proprietary multi-paradigm programming language and numeric computing environment developed by MathWorks. MATLAB allows matrix manipulations, plotting of functions and data, implementa ...

, and

Python Python may refer to: Snakes * Pythonidae, a family of nonvenomous snakes found in Africa, Asia, and Australia ** ''Python'' (genus), a genus of Pythonidae found in Africa and Asia * Python (mythology), a mythical serpent Computing * Python (pro ...

.
Vandal
software for creation of Monte Carlo simulation available in

. * Packages that use dialects of the BUGS model language: **

WinBUGS WinBUGS is statistical software for Bayesian analysis using Markov chain Monte Carlo (MCMC) methods. It is based on the BUGS ( Bayesian inference Using Gibbs Sampling) project started in 1989. It runs under Microsoft Windows, though it can als ...

OpenBUGS OpenBUGS is a software application for the Bayesian analysis of complex statistical models using Markov chain Monte Carlo (MCMC) methods. OpenBUGS is the open source variant of WinBUGS ( Bayesian inference Using Gibbs Sampling). It runs under ...

MultiBUGS
** JAGS *

MCSim GNU MCSim is a suite of simulation software. It allows one to design one's own statistical or simulation models, perform Monte Carlo simulations, and Bayesian inference through (tempered) Markov chain Monte Carlo simulations. The latest version ...

* Julia language with packages like Turing.jl, DynamicHMC.jl, AffineInvariantMCMC.jl, and the ones in StanJulia repository. *

Python (programming language) Python is a high-level, general-purpose programming language. Its design philosophy emphasizes code readability with the use of significant indentation. Python is dynamically-typed and garbage-collected. It supports multiple programming p ...

with the package
emcee
ParaMonte

PyMC3 PyMC (formerly known as PyMC3) is a Python (programming language), Python package for Bayesian statistical modeling and probabilistic machine learning which focuses on advanced Markov chain Monte Carlo and variational fitting algorithms. It is a r ...

, and vandal. *

R (programming language) R is a programming language for statistical computing and graphics supported by the R Core Team and the R Foundation for Statistical Computing. Created by statisticians Ross Ihaka and Robert Gentleman, R is used among data miners, bioinfor ...

with the packages adaptMCMC, atmcmc, BRugs, mcmc, MCMCpack, ramcmc, rjags, rstan, etc. * Stan
TensorFlow Probability
(

probabilistic programming Probabilistic programming (PP) is a programming paradigm in which probabilistic models are specified and inference for these models is performed automatically. It represents an attempt to unify probabilistic modeling and traditional general pur ...

library built on

TensorFlow TensorFlow is a free and open-source software library for machine learning and artificial intelligence. It can be used across a range of tasks but has a particular focus on training and inference of deep neural networks. "It is machine learnin ...

)
Korali
high-performance framework for Bayesian UQ, optimization, and reinforcement learning.

References

Citations

Sources

* Christophe Andrieu, Nando De Freitas, Arnaud Doucet and Michael I. Jorda
''An Introduction to MCMC for Machine Learning''
2003 * * * * * * * ''(See Chapter 11.)'' * * * * * * * * * * *