Approximate Bayesian computation (ABC) constitutes a class of
computational methods rooted in
Bayesian statistics
that can be used to estimate the posterior distributions of model parameters.
In all model-based
statistical inference
, the
likelihood function
is of central importance, since it expresses the probability of the observed data under a particular
statistical model
, and thus quantifies the support data lend to particular values of parameters and to choices among different models. For simple models, an analytical formula for the likelihood function can typically be derived. However, for more complex models, an analytical formula might be elusive or the likelihood function might be computationally very costly to evaluate.
ABC methods bypass the evaluation of the likelihood function. In this way, ABC methods widen the realm of models for which statistical inference can be considered. ABC methods are mathematically well-founded, but they inevitably make assumptions and approximations whose impact needs to be carefully assessed. Furthermore, the wider application domain of ABC exacerbates the challenges of
parameter estimation and
model selection.
ABC has rapidly gained popularity in recent years, in particular for the analysis of complex problems arising in
biological sciences
, e.g. in
population genetics
,
ecology
,
epidemiology
,
systems biology
, and in
radio propagation.
History
The first ABC-related ideas date back to the 1980s.
Donald Rubin, when discussing the interpretation of Bayesian statements in 1984,
described a hypothetical sampling mechanism that yields a sample from the
posterior distribution
. This scheme was more of a conceptual
thought experiment
to demonstrate what types of manipulations are performed when inferring the posterior distributions of parameters. The description of the sampling mechanism coincides exactly with that of the
ABC-rejection scheme, and this article can be considered to be the first to describe approximate Bayesian computation. However, a two-stage
quincunx
was constructed by
Francis Galton
in the late 1800s that can be seen as a physical implementation of an
ABC-rejection scheme for a single unknown (parameter) and a single observation.
[see figure 5 in ] Another prescient point was made by Rubin when he argued that in Bayesian inference, applied statisticians should not settle for analytically tractable models only, but instead consider computational methods that allow them to estimate the posterior distribution of interest. This way, a wider range of models can be considered. These arguments are particularly relevant in the context of ABC.
In 1984,
Peter Diggle and
Richard Gratton suggested using a systematic simulation scheme to approximate the likelihood function in situations where its analytic form is
intractable.
Their method was based on defining a grid in the parameter space and using it to approximate the likelihood by running several simulations for each grid point. The approximation was then improved by applying smoothing techniques to the outcomes of the simulations. While the idea of using simulation for hypothesis testing was not new,
Diggle and Gratton seemingly introduced the first procedure using simulation to perform statistical inference in settings where the likelihood is intractable.
Although Diggle and Gratton's approach had opened a new frontier, their method was not yet exactly identical to what is now known as ABC, as it aimed at approximating the likelihood rather than the posterior distribution. An article by
Simon Tavaré and co-authors was the first to propose an ABC algorithm for posterior inference.
In their seminal work, inference about the genealogy of DNA sequence data was considered, and in particular the problem of inferring the posterior distribution of the time to the
most recent common ancestor
of the sampled individuals. Such inference is analytically intractable for many demographic models, but the authors presented ways of simulating coalescent trees under the putative models. A sample from the posterior of model parameters was obtained by accepting/rejecting proposals based on comparing the number of segregating sites in the synthetic and real data. This work was followed by an applied study on modeling the variation in the human Y chromosome by
Jonathan K. Pritchard and co-authors using the ABC method.
Finally, the term approximate Bayesian computation was established by Mark Beaumont and co-authors, who further extended the ABC methodology and discussed the suitability of the ABC approach more specifically for problems in population genetics. Since then, ABC has spread to applications outside population genetics, such as systems biology, epidemiology, and
phylogeography.
Approximate Bayesian computation can be understood as a kind of Bayesian version of
indirect inference.
Several efficient Monte Carlo-based approaches have been developed to sample from the ABC posterior distribution for estimation and prediction problems. A popular choice is the sequential Monte Carlo (SMC) samplers algorithm adapted to the ABC context (SMC-ABC).
Method
Motivation
A common incarnation of Bayes' theorem relates the conditional probability (or density) of a particular parameter value \theta given data D to the probability of D given \theta by the rule

:p(\theta|D) = \frac{p(D|\theta)\,p(\theta)}{p(D)},

where p(\theta|D) denotes the posterior, p(D|\theta) the likelihood, p(\theta) the prior, and p(D) the evidence (also referred to as the marginal likelihood or the prior predictive probability of the data). Note that the denominator p(D) normalizes the total probability of the posterior density p(\theta|D) to one and can be calculated accordingly, as p(D) = \int p(D|\theta)\,p(\theta)\,d\theta.
The prior represents beliefs or knowledge (such as physical constraints) about \theta before D is available. Since the prior narrows down uncertainty, the posterior estimates have less variance, but might be biased. For convenience, the prior is often specified by choosing a particular distribution among a set of well-known and tractable families of distributions, such that both the evaluation of prior probabilities and random generation of values of \theta are relatively straightforward. For certain kinds of models, it is more pragmatic to specify the prior p(\theta) using a factorization of the joint distribution of all the elements of \theta in terms of a sequence of their conditional distributions. If one is only interested in the relative posterior plausibilities of different values of \theta, the evidence p(D) can be ignored, as it constitutes a normalising constant, which cancels for any ratio of posterior probabilities. It remains, however, necessary to evaluate the likelihood p(D|\theta) and the prior p(\theta). For numerous applications, it is computationally expensive, or even completely infeasible, to evaluate the likelihood, which motivates the use of ABC to circumvent this issue.
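As a small numerical illustration of the rule above, consider a toy coin-flip model with invented numbers, in which the likelihood happens to be available in closed form; ABC targets precisely those situations where the likelihood line below could not be written down or evaluated:

 from math import comb
 
 thetas = [0.3, 0.5, 0.7]   # candidate parameter values
 prior = [1/3, 1/3, 1/3]    # p(theta): uniform prior over the candidates
 k, n = 7, 10               # observed data D: 7 heads out of 10 tosses
 
 # p(D|theta): binomial likelihood, tractable for this toy model
 likelihood = [comb(n, k) * t**k * (1 - t)**(n - k) for t in thetas]
 
 # p(D): the evidence, which normalizes the posterior to sum to one
 evidence = sum(l * p for l, p in zip(likelihood, prior))
 posterior = [l * p / evidence for l, p in zip(likelihood, prior)]
 print(posterior)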
The ABC rejection algorithm
All ABC-based methods approximate the likelihood function by simulations, the outcomes of which are compared with the observed data.
More specifically, with the ABC rejection algorithm, the most basic form of ABC, a set of parameter points is first sampled from the prior distribution. Given a sampled parameter point \hat{\theta}, a data set \hat{D} is then simulated under the statistical model M specified by \hat{\theta}. If the generated \hat{D} is too different from the observed data D, the sampled parameter value is discarded. In precise terms, \hat{\theta} is accepted with tolerance \epsilon \ge 0 if:

:\rho(\hat{D}, D) \le \epsilon,

where the distance measure \rho(\hat{D}, D) determines the level of discrepancy between \hat{D} and D based on a given metric (e.g. Euclidean distance). A strictly positive tolerance is usually necessary, since the probability that the simulation outcome coincides exactly with the data (event \hat{D} = D) is negligible for all but trivial applications of ABC, which would in practice lead to rejection of nearly all sampled parameter points. The outcome of the ABC rejection algorithm is a sample of parameter values approximately distributed according to the desired posterior distribution, and, crucially, obtained without the need to explicitly evaluate the likelihood function.
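The algorithm can be sketched in a few lines of Python; the function below and the toy normal-mean model used to exercise it are illustrative inventions for this article, not a reference implementation:

 import random
 
 def abc_rejection(observed, prior_sampler, simulator, distance, eps, n_samples):
     """Basic ABC rejection: keep theta whenever rho(D_hat, D) <= eps."""
     accepted = []
     while len(accepted) < n_samples:
         theta = prior_sampler()               # sample theta from the prior
         d_hat = simulator(theta)              # simulate a data set under theta
         if distance(d_hat, observed) <= eps:  # compare simulation to data
             accepted.append(theta)            # approximate posterior draw
     return accepted
 
 # Toy usage: infer the mean of a normal distribution from a sample mean,
 # with a uniform prior on [-5, 5] (all values here are illustrative).
 draws = abc_rejection(
     observed=1.3,
     prior_sampler=lambda: random.uniform(-5, 5),
     simulator=lambda th: sum(random.gauss(th, 1) for _ in range(20)) / 20,
     distance=lambda a, b: abs(a - b),
     eps=0.1,
     n_samples=500,
 )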
Summary statistics
The probability of generating a data set \hat{D} with a small distance to D typically decreases as the dimensionality of the data increases. This leads to a substantial decrease in the computational efficiency of the above basic ABC rejection algorithm. A common approach to lessen this problem is to replace \hat{D} with a set of lower-dimensional summary statistics S(D), which are selected to capture the relevant information in D. The acceptance criterion in the ABC rejection algorithm becomes:

:\rho(S(\hat{D}), S(D)) \le \epsilon.

If the summary statistics are sufficient with respect to the model parameters \theta, the efficiency increase obtained in this way does not introduce any error. Indeed, by definition, sufficiency implies that all information in D about \theta is captured by S(D).

As elaborated below, it is typically impossible, outside the exponential family of distributions, to identify a finite-dimensional set of sufficient statistics. Nevertheless, informative but possibly insufficient summary statistics are often used in applications where inference is performed with ABC methods.
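In code, the change is confined to the distance computation: distances are taken between summaries rather than raw data sets. A sketch, assuming a hypothetical data set of many numbers summarized by its mean and variance (an informative but generally insufficient choice):

 def summary(data):
     """S(D): reduce a data set to its mean and variance."""
     n = len(data)
     mean = sum(data) / n
     var = sum((x - mean) ** 2 for x in data) / n
     return (mean, var)
 
 def summary_distance(d_hat, d):
     """rho(S(D_hat), S(D)): Euclidean distance between summaries."""
     s_hat, s = summary(d_hat), summary(d)
     return sum((a - b) ** 2 for a, b in zip(s_hat, s)) ** 0.5

A function like summary_distance can be passed directly as the distance argument of the abc_rejection sketch above.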
Example
An illustrative example is a bistable system that can be characterized by a hidden Markov model (HMM) subject to measurement noise. Such models are employed for many biological systems: they have, for example, been used in development, cell signaling, activation/deactivation, logical processing and non-equilibrium thermodynamics. For instance, the behavior of the Sonic hedgehog (Shh) transcription factor in ''Drosophila melanogaster'' can be modeled with an HMM.

The (biological) dynamical model consists of two states: A and B. If the probability of a transition from one state to the other is \theta in both directions, then the probability of remaining in the same state at each time step is 1 - \theta. The probability of measuring the state correctly is \gamma (and conversely, the probability of an incorrect measurement is 1 - \gamma).
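The model is straightforward to simulate forward even though, as noted below, its likelihood is tedious to evaluate. A minimal sketch in Python, assuming the chain starts in state A and that a transition is attempted before each measurement (details the description above leaves open):

 import random
 
 def simulate_hmm(theta, gamma, n_steps):
     """Simulate the two-state (A/B) system: switch state with probability
     theta, then record the state correctly with probability gamma."""
     state = 'A'                                   # assumed initial state
     observed = []
     for _ in range(n_steps):
         if random.random() < theta:               # transition A <-> B
             state = 'B' if state == 'A' else 'A'
         if random.random() < gamma:               # correct measurement
             observed.append(state)
         else:                                     # measurement error
             observed.append('B' if state == 'A' else 'A')
     return ''.join(observed)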
Due to the conditional dependencies between states at different time points, calculation of the likelihood of time-series data is somewhat tedious, which illustrates the motivation to use ABC. A computational issue for basic ABC is the large dimensionality of the data in an application like this. The dimensionality can be reduced using the summary statistic \omega, which is the frequency of switches between the two states. The absolute difference is used as a distance measure \rho(\cdot,\cdot), with tolerance \epsilon. The posterior inference about the parameter \theta can be done following the five steps presented below.
Step 1: Assume that the observed data form the state sequence AAAABAABBAAAAAABAAAA, which is generated using particular fixed values of \theta and \gamma. The associated summary statistic, the number of switches between the states in the experimental data, is \omega_E = 6.
Step 2: Assuming nothing is known about \theta, a uniform prior in the interval [0,1] is employed, since \theta is a probability.
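The remaining steps can be sketched in code by reusing the simulate_hmm function above; the tolerance, the value of \gamma (treated here as known and fixed), and the number of prior draws are illustrative choices for this sketch, not values from the original example:

 import random
 
 def n_switches(seq):
     """Summary statistic omega: number of switches in a state sequence."""
     return sum(1 for a, b in zip(seq, seq[1:]) if a != b)
 
 observed = 'AAAABAABBAAAAAABAAAA'
 omega_E = n_switches(observed)                       # = 6
 gamma, eps, n_draws = 0.8, 2, 100_000                # illustrative settings
 
 posterior_draws = []
 for _ in range(n_draws):
     theta = random.uniform(0, 1)                     # Step 2: sample the prior
     sim = simulate_hmm(theta, gamma, len(observed))  # Step 3: simulate data
     if abs(n_switches(sim) - omega_E) <= eps:        # Step 4: accept or reject
         posterior_draws.append(theta)                # Step 5: posterior sample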