Approximate Bayesian computation (ABC) constitutes a class of computational methods rooted in Bayesian statistics that can be used to estimate the posterior distributions of model parameters.
In all model-based statistical inference, the likelihood function is of central importance, since it expresses the probability of the observed data under a particular statistical model, and thus quantifies the support data lend to particular values of parameters and to choices among different models. For simple models, an analytical formula for the likelihood function can typically be derived. However, for more complex models, an analytical formula might be elusive or the likelihood function might be computationally very costly to evaluate.
ABC methods bypass the evaluation of the likelihood function. In this way, ABC methods widen the realm of models for which statistical inference can be considered. ABC methods are mathematically well-founded, but they inevitably make assumptions and approximations whose impact needs to be carefully assessed. Furthermore, the wider application domain of ABC exacerbates the challenges of parameter estimation and model selection.
ABC has rapidly gained popularity in recent years, in particular for the analysis of complex problems arising in the biological sciences, e.g. in population genetics, ecology, epidemiology, systems biology, and radio propagation.
History
The first ABC-related ideas date back to the 1980s. Donald Rubin, when discussing the interpretation of Bayesian statements in 1984, described a hypothetical sampling mechanism that yields a sample from the posterior distribution. This scheme was more of a conceptual thought experiment to demonstrate what type of manipulations are done when inferring the posterior distributions of parameters. The description of the sampling mechanism coincides exactly with that of the ABC-rejection scheme, and this article can be considered to be the first to describe approximate Bayesian computation. However, a two-stage quincunx was constructed by Francis Galton in the late 1800s that can be seen as a physical implementation of an ABC-rejection scheme for a single unknown (parameter) and a single observation. [see figure 5 in ] Another prescient point was made by Rubin when he argued that in Bayesian inference, applied statisticians should not settle for analytically tractable models only, but instead consider computational methods that allow them to estimate the posterior distribution of interest. This way, a wider range of models can be considered. These arguments are particularly relevant in the context of ABC.
In 1984, Peter Diggle and Richard Gratton suggested using a systematic simulation scheme to approximate the likelihood function in situations where its analytic form is intractable. Their method was based on defining a grid in the parameter space and using it to approximate the likelihood by running several simulations for each grid point. The approximation was then improved by applying smoothing techniques to the outcomes of the simulations. While the idea of using simulation for hypothesis testing was not new, Diggle and Gratton seemingly introduced the first procedure using simulation to do statistical inference under circumstances where the likelihood is intractable.
Although Diggle and Gratton's approach had opened a new frontier, their method was not yet exactly identical to what is now known as ABC, as it aimed at approximating the likelihood rather than the posterior distribution. An article by Simon Tavaré and co-authors was first to propose an ABC algorithm for posterior inference. In their seminal work, inference about the genealogy of DNA sequence data was considered, and in particular the problem of deciding the posterior distribution of the time to the most recent common ancestor of the sampled individuals. Such inference is analytically intractable for many demographic models, but the authors presented ways of simulating coalescent trees under the putative models. A sample from the posterior of model parameters was obtained by accepting/rejecting proposals based on comparing the number of segregating sites in the synthetic and real data. This work was followed by an applied study on modeling the variation in the human Y chromosome by Jonathan K. Pritchard and co-authors using the ABC method.
Finally, the term approximate Bayesian computation was established by Mark Beaumont and co-authors, extending further the ABC methodology and discussing the suitability of the ABC approach more specifically for problems in population genetics. Since then, ABC has spread to applications outside population genetics, such as systems biology, epidemiology, and phylogeography.
Method
Motivation
A common incarnation of Bayes' theorem relates the conditional probability (or density) of a particular parameter value \theta given data D to the probability of D given \theta by the rule
: p(\theta \mid D) = \frac{p(D \mid \theta)\, p(\theta)}{p(D)},
where p(\theta \mid D) denotes the posterior, p(D \mid \theta) the likelihood, p(\theta) the prior, and p(D) the evidence (also referred to as the marginal likelihood or the prior predictive probability of the data). Note that the denominator p(D) normalizes the total probability of the posterior density p(\theta \mid D) to one and can be calculated that way, i.e. p(D) = \int p(D \mid \theta)\, p(\theta)\, d\theta.
The prior represents beliefs or knowledge (such as, for example, physical constraints) about \theta before D is available. Since the prior narrows down uncertainty, the posterior estimates have less variance, but might be biased. For convenience the prior is often specified by choosing a particular distribution among a set of well-known and tractable families of distributions, such that both the evaluation of prior probabilities and random generation of values of \theta are relatively straightforward. For certain kinds of models, it is more pragmatic to specify the prior p(\theta) using a factorization of the joint distribution of all the elements of \theta in terms of a sequence of their conditional distributions. If one is only interested in the relative posterior plausibilities of different values of \theta, the evidence p(D) can be ignored, as it constitutes a normalising constant, which cancels for any ratio of posterior probabilities. It remains, however, necessary to evaluate the likelihood p(D \mid \theta) and the prior p(\theta). For numerous applications, it is computationally expensive, or even completely infeasible, to evaluate the likelihood, which motivates the use of ABC to circumvent this issue.
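The role of the normalizing denominator can be made concrete with a small numerical sketch. The binomial model, data, grid, and uniform prior below are invented purely for illustration; the point is that dividing by the (discretized) evidence makes the posterior sum to one, exactly as in Bayes' rule above:

```python
import numpy as np

# Hypothetical example: infer the success probability theta of a
# binomial model from k = 7 successes in n = 10 trials.
n, k = 10, 7
theta_grid = np.linspace(0.001, 0.999, 999)   # discretized parameter space

prior = np.ones_like(theta_grid)                        # uniform prior on (0, 1)
likelihood = theta_grid**k * (1 - theta_grid)**(n - k)  # p(D | theta), up to a constant

unnormalized = likelihood * prior
evidence = unnormalized.sum()        # discrete stand-in for p(D) = integral of p(D|theta) p(theta)
posterior = unnormalized / evidence  # now sums to one over the grid

assert abs(posterior.sum() - 1.0) < 1e-9
```

With a uniform prior the posterior peaks at the maximum-likelihood value k/n = 0.7, which the grid recovers; replacing `prior` with a non-uniform array shifts the peak, while the normalization step stays identical.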
The ABC rejection algorithm
All ABC-based methods approximate the likelihood function by simulations, the outcomes of which are compared with the observed data. More specifically, with the ABC rejection algorithm (the most basic form of ABC) a set of parameter points is first sampled from the prior distribution. Given a sampled parameter point \hat{\theta}, a data set \hat{D} is then simulated under the statistical model M specified by \hat{\theta}. If the generated \hat{D} is too different from the observed data D, the sampled parameter value is discarded. In precise terms, \hat{\theta} is accepted with tolerance \epsilon \ge 0 if:
: \rho(\hat{D}, D) \le \epsilon,
where the distance measure \rho(\hat{D}, D) determines the level of discrepancy between \hat{D} and D based on a given metric (e.g. Euclidean distance). A strictly positive tolerance is usually necessary, since the probability that the simulation outcome coincides exactly with the data (event \hat{D} = D) is negligible for all but trivial applications of ABC, and would in practice lead to rejection of nearly all sampled parameter points. The outcome of the ABC rejection algorithm is a sample of parameter values approximately distributed according to the desired posterior distribution, and, crucially, obtained without the need to explicitly evaluate the likelihood function.
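The rejection scheme just described fits in a few lines of code. The sketch below uses an invented toy setup (a Gaussian model with unknown mean, a uniform prior, a distance based on sample means, and an arbitrary tolerance) purely to illustrate the sample–simulate–compare–accept loop:

```python
import numpy as np

rng = np.random.default_rng(0)

# Observed data, assumed here to come from a Normal(mu = 3, sigma = 1) model.
observed = rng.normal(3.0, 1.0, size=50)

def simulate(mu, size=50):
    """Generative model: simulate a data set given the parameter mu."""
    return rng.normal(mu, 1.0, size=size)

def distance(x, y):
    """Discrepancy rho(x, y): here, distance between sample means."""
    return abs(x.mean() - y.mean())

epsilon = 0.1   # tolerance (illustrative choice)
accepted = []
for _ in range(20000):
    mu = rng.uniform(-10, 10)      # sample a parameter point from the prior
    synthetic = simulate(mu)       # simulate data under the model
    if distance(synthetic, observed) <= epsilon:
        accepted.append(mu)        # keep mu: simulation is close to the data

# `accepted` is an approximate sample from the posterior of mu,
# obtained without ever evaluating the likelihood.
```

Shrinking `epsilon` tightens the approximation to the true posterior but lowers the acceptance rate, which is the basic cost/accuracy trade-off of rejection ABC.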
Summary statistics
The probability of generating a data set \hat{D} with a small distance to D typically decreases as the dimensionality of the data increases. This leads to a substantial decrease in the computational efficiency of the above basic ABC rejection algorithm. A common approach to lessen this problem is to replace D with a set of lower-dimensional summary statistics S(D), which are selected to capture the relevant information in D. The acceptance criterion in the ABC rejection algorithm becomes:
: \rho(S(\hat{D}), S(D)) \le \epsilon.
If the summary statistics are sufficient with respect to the model parameters \theta, the efficiency increase obtained in this way does not introduce any error. Indeed, by definition, sufficiency implies that all information in D about \theta is captured by S(D). As elaborated below, it is typically impossible, outside the exponential family of distributions, to identify a finite-dimensional set of sufficient statistics. Nevertheless, informative but possibly insufficient summary statistics are often used in applications where inference is performed with ABC methods.
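The dimensionality effect can be demonstrated with a small sketch. In the invented toy setting below (a Gaussian model with known variance, where the sample mean is a sufficient statistic for the mean), acceptance based on the raw 100-dimensional data essentially never succeeds, while acceptance based on the one-dimensional summary does:

```python
import numpy as np

rng = np.random.default_rng(1)
observed = rng.normal(0.0, 1.0, size=100)

def simulate(mu):
    """Gaussian model with known unit variance and unknown mean mu."""
    return rng.normal(mu, 1.0, size=100)

n_sims = 5000
mus = rng.uniform(-5, 5, size=n_sims)          # draws from a uniform prior
sims = np.array([simulate(m) for m in mus])    # one simulated data set per draw

# Distance on the raw 100-dimensional data: almost never small, because
# independent simulation noise accumulates over every dimension.
raw_dist = np.linalg.norm(sims - observed, axis=1)

# Distance on a 1-dimensional summary statistic (the sample mean, which is
# sufficient for mu when the variance is known, so no information is lost).
summary_dist = np.abs(sims.mean(axis=1) - observed.mean())

eps = 0.2
print("accepted via raw data:", int((raw_dist <= eps).sum()))
print("accepted via summary: ", int((summary_dist <= eps).sum()))
```

With an insufficient summary the same efficiency gain is available, but at the price of an additional, harder-to-quantify approximation error.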
Example
An illustrative example is a bistable system that can be characterized by a hidden Markov model (HMM) subject to measurement noise. Such models are employed for many biological systems: They have, for example, been used in development, cell signaling, activation/deactivation, logical processing and non-equilibrium thermodynamics. For instance, the behavior of the Sonic hedgehog (Shh) transcription factor in ''Drosophila melanogaster'' can be modeled with an HMM.
The (biological) dynamical model consists of two states: A and B. If the probability of a transition from one state to the other is defined as \theta in both directions, then the probability to remain in the same state at each time step is 1 - \theta. The probability to measure the state correctly is \gamma (and conversely, the probability of an incorrect measurement is 1 - \gamma).
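This two-state model is straightforward to simulate. The sketch below follows the symbols in the text (\theta for the switching probability, \gamma for the measurement accuracy); the particular parameter values and sequence length are illustrative, and the switch count anticipates the summary statistic used in the steps that follow:

```python
import random

random.seed(0)

def simulate_hmm(theta, gamma, length=20):
    """Simulate the two-state (A/B) model with measurement noise.

    theta: probability of switching to the other state at each time step
    gamma: probability of measuring the true state correctly
    """
    state = random.choice("AB")
    measured = []
    for _ in range(length):
        # Transition: switch with probability theta, stay with 1 - theta.
        if random.random() < theta:
            state = "B" if state == "A" else "A"
        # Measurement: report the true state with probability gamma,
        # the wrong state with probability 1 - gamma.
        if random.random() < gamma:
            measured.append(state)
        else:
            measured.append("B" if state == "A" else "A")
    return "".join(measured)

def n_switches(seq):
    """Summary statistic: number of switches in a measured sequence."""
    return sum(a != b for a, b in zip(seq, seq[1:]))

seq = simulate_hmm(theta=0.25, gamma=0.8)   # illustrative parameter values
print(seq, n_switches(seq))
```

Plugging `simulate_hmm` and `n_switches` into a rejection loop then yields ABC posterior samples for \theta.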
Due to the conditional dependencies between states at different time points, calculation of the likelihood of time series data is somewhat tedious, which illustrates the motivation to use ABC. A computational issue for basic ABC is the large dimensionality of the data in an application like this. The dimensionality can be reduced using the summary statistic \omega, the number of switches between the two states. The absolute difference is used as a distance measure \rho(\cdot, \cdot) with tolerance \epsilon. The posterior inference about the parameter \theta can be done following the five steps below.
Step 1: Assume that the observed data form the state sequence AAAABAABBAAAAAABAAAA, which is generated using fixed underlying values of \theta and \gamma. The associated summary statistic, the number of switches between the states in the experimental data, is \omega_E = 6.
Step 2: Assuming nothing is known about \theta, a uniform prior in the interval