Metropolis-adjusted Langevin algorithm

In computational statistics, the Metropolis-adjusted Langevin algorithm (MALA) or Langevin Monte Carlo (LMC) is a Markov chain Monte Carlo (MCMC) method for obtaining random samples – sequences of random observations – from a probability distribution for which direct sampling is difficult. As the name suggests, MALA uses a combination of two mechanisms to generate the states of a random walk that has the target probability distribution as an invariant measure:
* new states are proposed using (overdamped) Langevin dynamics, which use evaluations of the gradient of the target probability density function;
* these proposals are accepted or rejected using the Metropolis–Hastings algorithm, which uses evaluations of the target probability density (but not its gradient).

Informally, the Langevin dynamics drive the random walk towards regions of high probability in the manner of a gradient flow, while the Metropolis–Hastings accept/reject mechanism improves the mixing and convergence properties of this random walk. MALA was originally proposed by Julian Besag in 1994 (although the method had already been introduced in 1978 under the name Smart Monte Carlo), and its properties were examined in detail by Gareth Roberts together with Richard Tweedie and Jeff Rosenthal. Many variations and refinements have been introduced since then, e.g. the manifold variant of Girolami and Calderhead (2011). The method is equivalent to using the Hamiltonian Monte Carlo (hybrid Monte Carlo) algorithm with only a single discrete time step.


Further details

Let \pi denote a probability density function on \mathbb{R}^d, one from which it is desired to draw an ensemble of independent and identically distributed samples. We consider the overdamped Langevin Itô diffusion
: \dot{X} = \nabla \log \pi(X) + \sqrt{2} \, \dot{W}
driven by the time derivative of a standard Brownian motion W. (Note that another commonly-used normalization for this diffusion is
: \dot{X} = \tfrac{1}{2} \nabla \log \pi(X) + \dot{W},
which generates the same dynamics.) In the limit as t \to \infty, the probability distribution \rho(t) of X(t) approaches a stationary distribution, which is also invariant under the diffusion, which we denote \rho_\infty. It turns out that, in fact, \rho_\infty = \pi: substituting \rho = \pi into the associated Fokker–Planck equation \partial_t \rho = - \nabla \cdot ( \rho \nabla \log \pi ) + \Delta \rho makes the right-hand side vanish, since \nabla \cdot ( \pi \nabla \log \pi ) = \nabla \cdot \nabla \pi = \Delta \pi.

Approximate sample paths of the Langevin diffusion can be generated by many discrete-time methods. One of the simplest is the Euler–Maruyama method with a fixed time step \tau > 0. We set X_0 := x_0 and then recursively define an approximation X_k to the true solution X(k \tau) by
: X_{k+1} := X_k + \tau \nabla \log \pi(X_k) + \sqrt{2 \tau} \, \xi_k,
where each \xi_k is an independent draw from a multivariate normal distribution on \mathbb{R}^d with mean 0 and covariance matrix equal to the d \times d identity matrix.
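As a concrete illustration, the Euler–Maruyama recursion above is straightforward to implement. The following is a minimal sketch in Python/NumPy, not part of the original presentation; the function name ula_step and the choice of a standard Gaussian example target (for which \nabla \log \pi(x) = -x) are illustrative assumptions.

 import numpy as np

 def ula_step(x, grad_log_pi, tau, rng):
     # One Euler–Maruyama step of the Langevin diffusion
     # (the "unadjusted" Langevin algorithm: no accept/reject step).
     xi = rng.standard_normal(x.shape)  # xi ~ N(0, I_d)
     return x + tau * grad_log_pi(x) + np.sqrt(2.0 * tau) * xi

 # Example: standard Gaussian target pi(x) proportional to exp(-||x||^2 / 2),
 # whose score function is grad log pi(x) = -x.
 rng = np.random.default_rng(0)
 x = np.zeros(2)
 for _ in range(1000):
     x = ula_step(x, lambda y: -y, tau=0.1, rng=rng)

Because no correction step is applied, the discretization bias of this chain vanishes only in the limit \tau \to 0; the Metropolis adjustment described next removes it for any fixed \tau.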
Note that X_{k+1} is normally distributed with mean X_k + \tau \nabla \log \pi(X_k) and covariance equal to 2 \tau times the d \times d identity matrix.

In contrast to the Euler–Maruyama method for simulating the Langevin diffusion, which always updates X_k according to the update rule
: X_{k+1} := X_k + \tau \nabla \log \pi(X_k) + \sqrt{2 \tau} \, \xi_k,
MALA incorporates an additional step. We consider the above update rule as defining a ''proposal'' \tilde{X}_{k+1} for a new state,
: \tilde{X}_{k+1} := X_k + \tau \nabla \log \pi(X_k) + \sqrt{2 \tau} \, \xi_k.
This proposal is accepted or rejected according to the Metropolis–Hastings algorithm: set
: \alpha := \min \left\{ 1, \frac{\pi(\tilde{X}_{k+1}) \, q(X_k \mid \tilde{X}_{k+1})}{\pi(X_k) \, q(\tilde{X}_{k+1} \mid X_k)} \right\},
where
: q(x' \mid x) \propto \exp \left( - \frac{1}{4 \tau} \left\| x' - x - \tau \nabla \log \pi(x) \right\|_2^2 \right)
is the transition probability density from x to x' (note that, in general, q(x' \mid x) \neq q(x \mid x')). Let u be drawn from the continuous uniform distribution on the interval [0, 1]. If u \leq \alpha, then the proposal is accepted, and we set X_{k+1} := \tilde{X}_{k+1}; otherwise, the proposal is rejected, and we set X_{k+1} := X_k. The combined dynamics of the Langevin diffusion and the Metropolis–Hastings algorithm satisfy the detailed balance conditions necessary for the existence of a unique, invariant, stationary distribution \rho_\infty = \pi.
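Putting the proposal and the accept/reject test together gives the full MALA update. The sketch below (again Python/NumPy, with hypothetical names mala_step, log_pi and grad_log_pi) computes the acceptance probability in log space, which avoids numerical underflow when \pi takes very small values.

 import numpy as np

 def mala_step(x, log_pi, grad_log_pi, tau, rng):
     # log of the transition density q(x' | x), up to an additive constant:
     # q(x' | x) proportional to exp(-||x' - x - tau * grad log pi(x)||^2 / (4 tau))
     def log_q(xp, xc):
         diff = xp - xc - tau * grad_log_pi(xc)
         return -np.dot(diff, diff) / (4.0 * tau)

     # Langevin (Euler–Maruyama) proposal
     x_prop = (x + tau * grad_log_pi(x)
               + np.sqrt(2.0 * tau) * rng.standard_normal(x.shape))

     # Metropolis–Hastings acceptance probability alpha, in log space
     log_alpha = (log_pi(x_prop) + log_q(x, x_prop)
                  - log_pi(x) - log_q(x_prop, x))
     if np.log(rng.uniform()) <= min(0.0, log_alpha):
         return x_prop   # accept the proposal
     return x            # reject: the chain stays at x

Note that only log_pi and grad_log_pi are required, so \pi need only be known up to its normalizing constant; the unknown constant cancels in the ratio defining \alpha.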
Compared to naive Metropolis–Hastings, MALA has the advantage that it usually proposes moves into regions of higher \pi probability, which are then more likely to be accepted. On the other hand, when \pi is strongly anisotropic (i.e. it varies much more quickly in some directions than others), it is necessary to take 0 < \tau \ll 1 in order to properly capture the Langevin dynamics; the use of a positive-definite preconditioning matrix A \in \mathbb{R}^{d \times d} can help to alleviate this problem, by generating proposals according to
: \tilde{X}_{k+1} := X_k + \tau A \nabla \log \pi(X_k) + \sqrt{2 \tau A} \, \xi_k,
so that \tilde{X}_{k+1} has mean X_k + \tau A \nabla \log \pi(X_k) and covariance 2 \tau A. For limited classes of target distributions, the optimal acceptance rate for this algorithm can be shown to be 0.574; if the observed acceptance rate differs substantially from this value in practice, \tau should be modified accordingly.
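One way to realize the preconditioned proposal is to draw the noise through a Cholesky factor of A, so that the noise term has covariance 2 \tau A, and to evaluate the acceptance ratio with the matching Gaussian proposal density. The sketch below is a minimal illustration under these assumptions (hypothetical names, fixed user-supplied A), not a definitive implementation.

 import numpy as np

 def preconditioned_mala_step(x, log_pi, grad_log_pi, tau, A, rng):
     L = np.linalg.cholesky(A)   # A = L @ L.T, so L @ xi has covariance A
     A_inv = np.linalg.inv(A)    # in practice, factorize once outside the loop

     # log density of the proposal N(xc + tau*A*grad log pi(xc), 2*tau*A),
     # up to an additive constant
     def log_q(xp, xc):
         diff = xp - xc - tau * A @ grad_log_pi(xc)
         return -(diff @ A_inv @ diff) / (4.0 * tau)

     # proposal with mean x + tau*A*grad log pi(x) and covariance 2*tau*A
     x_prop = (x + tau * A @ grad_log_pi(x)
               + np.sqrt(2.0 * tau) * L @ rng.standard_normal(x.shape))

     log_alpha = (log_pi(x_prop) + log_q(x, x_prop)
                  - log_pi(x) - log_q(x_prop, x))
     if np.log(rng.uniform()) <= min(0.0, log_alpha):
         return x_prop
     return x

In practice, one would monitor the empirical acceptance rate over an initial tuning phase and adjust \tau towards the 0.574 target quoted above.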

