Approximate Bayesian computation (ABC) constitutes a class of computational methods rooted in

Bayesian statistics

that can be used to estimate the posterior distributions of model parameters. In all model-based

statistical inference

, the

likelihood function

is of central importance, since it expresses the probability of the observed data under a particular

statistical model

, and thus quantifies the support data lend to particular values of parameters and to choices among different models. For simple models, an analytical formula for the likelihood function can typically be derived. However, for more complex models, an analytical formula might be elusive or the likelihood function might be computationally very costly to evaluate. ABC methods bypass the evaluation of the likelihood function. In this way, ABC methods widen the realm of models for which statistical inference can be considered. ABC methods are mathematically well-founded, but they inevitably make assumptions and approximations whose impact needs to be carefully assessed. Furthermore, the wider application domain of ABC exacerbates the challenges of

parameter estimation

and

model selection

. ABC has rapidly gained popularity in recent years, in particular for the analysis of complex problems arising in

biological sciences

, e.g. in

population genetics, ecology, epidemiology, systems biology, and in radio propagation.

History

The first ABC-related ideas date back to the 1980s.

Donald Rubin

, when discussing the interpretation of Bayesian statements in 1984, described a hypothetical sampling mechanism that yields a sample from the

posterior distribution

. This scheme was more of a conceptual

thought experiment

to demonstrate what type of manipulations are done when inferring the posterior distributions of parameters. The description of the sampling mechanism coincides exactly with that of the ABC-rejection scheme, and this article can be considered to be the first to describe approximate Bayesian computation. However, a two-stage

quincunx

was constructed by

Francis Galton

in the late 1800s that can be seen as a physical implementation of an ABC-rejection scheme for a single unknown (parameter) and a single observation. Another prescient point was made by Rubin when he argued that in Bayesian inference, applied statisticians should not settle for analytically tractable models only, but instead consider computational methods that allow them to estimate the posterior distribution of interest. This way, a wider range of models can be considered. These arguments are particularly relevant in the context of ABC. In 1984,

Peter Diggle

and Richard Gratton suggested using a systematic simulation scheme to approximate the likelihood function in situations where its analytic form is intractable. Their method was based on defining a grid in the parameter space and using it to approximate the likelihood by running several simulations for each grid point. The approximation was then improved by applying smoothing techniques to the outcomes of the simulations. While the idea of using simulation for hypothesis testing was not new, Diggle and Gratton seemingly introduced the first procedure using simulation to do statistical inference under a circumstance where the likelihood is intractable. Although Diggle and Gratton's approach had opened a new frontier, their method was not yet exactly identical to what is now known as ABC, as it aimed at approximating the likelihood rather than the posterior distribution. An article by

Simon Tavaré

and co-authors was the first to propose an ABC algorithm for posterior inference. In their seminal work, inference about the genealogy of DNA sequence data was considered, and in particular the problem of deciding the posterior distribution of the time to the

most recent common ancestor

of the sampled individuals. Such inference is analytically intractable for many demographic models, but the authors presented ways of simulating coalescent trees under the putative models. A sample from the posterior of model parameters was obtained by accepting/rejecting proposals based on comparing the number of segregating sites in the synthetic and real data. This work was followed by an applied study on modeling the variation in the human Y chromosome by

Jonathan K. Pritchard

and co-authors using the ABC method. Finally, the term approximate Bayesian computation was established by Mark Beaumont and co-authors, extending further the ABC methodology and discussing the suitability of the ABC-approach more specifically for problems in population genetics. Since then, ABC has spread to applications outside population genetics, such as systems biology, epidemiology, and

phylogeography.

Method

Motivation

A common incarnation of Bayes’ theorem relates the

conditional probability

(or density) of a particular parameter value

\theta

given data

D

to the

probability of

D

given

\theta

by the rule:

p(\theta \mid D) = \frac{p(D \mid \theta)\, p(\theta)}{p(D)}

, where

p(\theta \mid D)

denotes the posterior,

p(D \mid \theta)

the likelihood,

p(\theta)

the prior, and

p(D)

the evidence (also referred to as the

marginal likelihood

or the prior predictive probability of the data). Note that the denominator

p(D)

normalizes the total probability of the posterior density

p(\theta \mid D)

to one and can be calculated that way. The prior represents beliefs or knowledge (such as physical constraints) about

\theta

before

D

is available. Since the prior narrows down uncertainty, the posterior estimates have less variance, but might be biased. For convenience the prior is often specified by choosing a particular distribution among a set of well-known and tractable families of distributions, such that both the evaluation of prior probabilities and random generation of values of

\theta

are relatively straightforward. For certain kinds of models, it is more pragmatic to specify the prior

p(\theta)

using a factorization of the joint distribution of all the elements of

\theta

in terms of a sequence of their conditional distributions. If one is only interested in the relative posterior plausibilities of different values of

\theta

, the evidence

p(D)

can be ignored, as it constitutes a normalising constant, which cancels for any ratio of posterior probabilities. It remains, however, necessary to evaluate the likelihood

p(D \mid \theta)

and the prior

p(\theta)

. For numerous applications, it is

computationally expensive

, or even completely infeasible, to evaluate the likelihood, which motivates the use of ABC to circumvent this issue.
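The cancellation of the evidence in ratios of posterior values can be checked numerically. The following is a minimal sketch under assumptions that are not part of the text above: a hypothetical coin-tossing model with a binomial likelihood, a uniform prior, and invented data (7 successes in 10 trials). The names `likelihood`, `prior`, and `unnormalized_posterior` are illustrative only.

```python
from math import comb

K, N = 7, 10  # hypothetical data: 7 successes in 10 trials


def likelihood(theta):
    """Binomial likelihood p(D | theta) for K successes in N trials."""
    return comb(N, K) * theta**K * (1 - theta) ** (N - K)


def prior(theta):
    """Uniform prior density p(theta) on [0, 1]."""
    return 1.0 if 0.0 <= theta <= 1.0 else 0.0


def unnormalized_posterior(theta):
    """Likelihood times prior; the evidence p(D) is never needed here."""
    return likelihood(theta) * prior(theta)


# Ratio of posterior densities at two parameter values, computed without p(D).
ratio = unnormalized_posterior(0.7) / unnormalized_posterior(0.5)

# For comparison only: approximate p(D) with a midpoint rule and normalize.
M = 10_000
evidence = sum(unnormalized_posterior((i + 0.5) / M) for i in range(M)) / M
normalized_ratio = (unnormalized_posterior(0.7) / evidence) / (
    unnormalized_posterior(0.5) / evidence
)

print(ratio, normalized_ratio)  # the two ratios agree up to rounding
```

Both ratios require the likelihood to be evaluated, which is exactly the step that ABC is designed to avoid when the likelihood is intractable.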

The ABC rejection algorithm

All ABC-based methods approximate the likelihood function by simulations, the outcomes of which are compared with the observed data. More specifically, with the ABC rejection algorithm—the most basic form of ABC—a set of parameter points is first sampled from the prior distribution. Given a sampled parameter point

\hat{\theta}

, a data set

\hat{D}

is then simulated under the statistical model

M

specified by

\hat{\theta}

. If the generated

\hat{D}

is too different from the observed data

D

, the sampled parameter value is discarded. In precise terms,

\hat{D}

is accepted with tolerance

\epsilon \ge 0

if:

\rho(\hat{D}, D) \le \epsilon

, where the distance measure

\rho(\hat{D}, D)

determines the level of discrepancy between

\hat{D}

and

D

based on a given

metric

(e.g.

Euclidean distance

). A strictly positive tolerance is usually necessary, since the probability that the simulation outcome coincides exactly with the data (event

\hat{D} = D

) is negligible for all but trivial applications of ABC, which would in practice lead to rejection of nearly all sampled parameter points. The outcome of the ABC rejection algorithm is a sample of parameter values approximately distributed according to the desired posterior distribution, and, crucially, obtained without the need to explicitly evaluate the likelihood function.
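The rejection scheme described above can be sketched in a few lines. This is an illustrative sketch, not a reference implementation: the function `abc_rejection` and its arguments are hypothetical names, and the toy model (number of heads in 20 coin tosses, uniform prior, an invented observation of 14 heads) is an assumption chosen only because its data are discrete, so that a tolerance of zero is feasible.

```python
import random


def abc_rejection(observed, sample_prior, simulate, distance, epsilon, n_draws, rng):
    """Return the prior draws whose simulated data lie within epsilon of the observed data."""
    accepted = []
    for _ in range(n_draws):
        theta_hat = sample_prior(rng)      # draw a parameter point from the prior
        d_hat = simulate(theta_hat, rng)   # simulate a data set under the model
        if distance(d_hat, observed) <= epsilon:
            accepted.append(theta_hat)     # keep the parameter, never the likelihood
    return accepted


# Hypothetical toy model: the data are the number of heads in 20 coin tosses.
N_TOSSES = 20


def sample_prior(rng):
    return rng.random()  # uniform prior on [0, 1)


def simulate(theta, rng):
    return sum(rng.random() < theta for _ in range(N_TOSSES))


def distance(a, b):
    return abs(a - b)


rng = random.Random(0)
posterior_sample = abc_rejection(14, sample_prior, simulate, distance, 0, 20_000, rng)
posterior_mean = sum(posterior_sample) / len(posterior_sample)
print(len(posterior_sample), round(posterior_mean, 3))
```

For this toy model the exact posterior is a Beta(15, 7) distribution with mean 15/22, so the sample mean of the accepted draws can be compared against a known value. Only about one draw in 21 is accepted here, which hints at how quickly the acceptance rate degrades for richer data.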

Summary statistics

The probability of generating a data set

\hat{D}

with a small distance to

D

typically decreases as the dimensionality of the data increases. This leads to a substantial decrease in the computational efficiency of the above basic ABC rejection algorithm. A common approach to lessen this problem is to replace

D

with a set of lower-dimensional

summary statistics

S(D)

, which are selected to capture the relevant information in

D

. The acceptance criterion in the ABC rejection algorithm becomes:

\rho(S(\hat{D}), S(D)) \le \epsilon

. If the summary statistics are

sufficient

with respect to the model parameters

\theta

, the efficiency increase obtained in this way does not introduce any error. Indeed, by definition, sufficiency implies that all information in

D

about

\theta

is captured by

S(D)

. As elaborated below, it is typically impossible, outside the exponential family of distributions, to identify a finite-dimensional set of sufficient statistics. Nevertheless, informative but possibly insufficient summary statistics are often used in applications where inference is performed with ABC methods.
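As a rough illustration of the modified acceptance criterion, the sketch below assumes a normal model with unknown mean and unit variance, for which the sample mean is a sufficient statistic. The model, the uniform prior on [-5, 5], the tolerance, and the stand-in "observed" data are all assumptions made for this example.

```python
import random
import statistics

N_OBS = 50  # observations per data set


def simulate(theta, rng):
    """Draw N_OBS observations from a normal model with mean theta and unit variance."""
    return [rng.gauss(theta, 1.0) for _ in range(N_OBS)]


def summary(data):
    """Sample mean, which is sufficient for theta in this particular model."""
    return statistics.fmean(data)


rng = random.Random(1)
observed = simulate(2.0, rng)  # stand-in for real data, generated with theta = 2
s_obs = summary(observed)

epsilon = 0.05
accepted = []
for _ in range(20_000):
    theta_hat = rng.uniform(-5.0, 5.0)         # uniform prior on [-5, 5]
    s_hat = summary(simulate(theta_hat, rng))  # S(D_hat)
    if abs(s_hat - s_obs) <= epsilon:          # rho(S(D_hat), S(D)) <= epsilon
        accepted.append(theta_hat)

print(len(accepted), round(statistics.fmean(accepted), 3), round(s_obs, 3))
```

Comparing the 50-dimensional data vectors directly would make acceptance within a similarly tight tolerance extremely rare, whereas the one-dimensional summary keeps the acceptance rate near one percent in this setting.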

Example

An illustrative example is a bistable system that can be characterized by a hidden Markov model (HMM) subject to measurement noise. Such models are employed for many biological systems: They have, for example, been used in development,

cell signaling

activation

/deactivation, logical processing and

non-equilibrium thermodynamics

. For instance, the behavior of the

Sonic hedgehog

(Shh) transcription factor in

Drosophila melanogaster

can be modeled with an HMM. The (biological) dynamical model consists of two states: A and B. If the probability of a transition from one state to the other is defined as

\theta

in both directions, then the probability to remain in the same state at each time step is

. The probability to measure the state correctly is

\gamma

(and conversely, the probability of an incorrect measurement is

1-\gamma

). Due to the conditional dependencies between states at different time points, calculation of the likelihood of time series data is somewhat tedious, which illustrates the motivation to use ABC. A computational issue for basic ABC is the large dimensionality of the data in an application like this. The dimensionality can be reduced using the summary statistic

S

, which is the frequency of switches between the two states. The absolute difference is used as a distance measure

\rho(\cdot,\cdot)

with tolerance

\epsilon=2

. The posterior inference about the parameter

\theta

can be done in the following five steps.

Step 1: Assume that the observed data form the state sequence AAAABAABBAAAAAABAAAA, which is generated using \theta=0.25 and \gamma=0.8. The associated summary statistic (the number of switches between the states in the experimental data) is \omega_E=6.

Step 2: Assuming nothing is known about \theta, a uniform prior in the interval [0,1] is employed. The parameter \gamma is assumed to be known and fixed to the data-generating value \gamma=0.8, but it could in general also be estimated from the observations. A total of n parameter points are drawn from the prior, and the model is simulated for each of the parameter points \theta_i, i = 1,\ldots,n, which results in n sequences of simulated data. In this example, n=5, with each drawn parameter and simulated dataset recorded in Table 1, columns 2-3. In practice, n would need to be much larger to obtain an appropriate approximation.



Step 3: The summary statistic is computed for each sequence of simulated data, \omega_{S,i}, i = 1,\ldots,n.

Step 4: The distance between the observed and simulated transition frequencies, \rho(\omega_{S,i}, \omega_E) = |\omega_{S,i} - \omega_E|, is computed for all parameter points. Parameter points for which the distance is smaller than or equal to \epsilon are accepted as approximate samples from the posterior.

Step 5: The posterior distribution is approximated with the accepted parameter points. The posterior distribution should have a non-negligible probability for parameter values in a region around the true value of \theta in the system if the data are sufficiently informative. In this example, the posterior probability mass is evenly split between the values 0.08 and 0.43.

The posterior probabilities are obtained via ABC with large n by utilizing the summary statistic (with \epsilon = 0 and \epsilon = 2) and the full data sequence (with \epsilon = 0). These are compared with the true posterior, which can be computed exactly and efficiently using the

Viterbi algorithm

. The summary statistic utilized in this example is not sufficient, as the deviation from the theoretical posterior is significant even under the stringent requirement of

\epsilon = 0

. A much longer observed data sequence would be needed to obtain a posterior concentrated around

\theta = 0.25

, the true value of

\theta

. This example application of ABC uses simplifications for illustrative purposes. More realistic applications of ABC are available in a growing number of peer-reviewed articles.
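The five steps of this example can be sketched as follows. The code is an illustration under stated assumptions rather than the procedure used to produce the results discussed above: the text does not specify the distribution of the initial hidden state, so a uniform choice between A and B is assumed, and the function names `simulate_hmm`, `count_switches`, and `abc_hmm` are hypothetical.

```python
import random


def simulate_hmm(theta, gamma, length, rng):
    """Simulate noisy observations of the two-state model.

    The hidden state flips with probability theta per time step, and each
    observation reports the hidden state correctly with probability gamma.
    """
    state = rng.choice("AB")  # assumption: uniform initial hidden state
    observations = []
    for _ in range(length):
        flipped = "B" if state == "A" else "A"
        observations.append(state if rng.random() < gamma else flipped)
        if rng.random() < theta:
            state = flipped
    return "".join(observations)


def count_switches(sequence):
    """Summary statistic: number of switches between consecutive observations."""
    return sum(a != b for a, b in zip(sequence, sequence[1:]))


observed = "AAAABAABBAAAAAABAAAA"    # Step 1: observed data
omega_e = count_switches(observed)   # equals 6


def abc_hmm(n, epsilon, gamma=0.8, seed=0):
    rng = random.Random(seed)
    accepted = []
    for _ in range(n):
        theta = rng.uniform(0.0, 1.0)                       # Step 2: draw from the prior
        simulated = simulate_hmm(theta, gamma, len(observed), rng)
        omega_s = count_switches(simulated)                 # Step 3: summary statistic
        if abs(omega_s - omega_e) <= epsilon:               # Step 4: distance and tolerance
            accepted.append(theta)
    return accepted                                         # Step 5: approximate posterior sample


posterior_sample = abc_hmm(n=20_000, epsilon=2)
print(len(posterior_sample), sum(posterior_sample) / len(posterior_sample))
```

With a much larger n than in the five-point illustration, the accepted values form a usable approximation to the posterior under this summary statistic and tolerance.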

Model comparison with ABC

Outside of parameter estimation, the ABC framework can be used to compute the posterior probabilities of different candidate models. In such applications, one possibility is to use rejection sampling in a hierarchical manner. First, a model is sampled from the prior distribution for the models. Then, parameters are sampled from the prior distribution assigned to that model. Finally, a simulation is performed as in single-model ABC. The relative acceptance frequencies for the different models now approximate the posterior distribution for these models. Again, computational improvements for ABC in the space of models have been proposed, such as constructing a particle filter in the joint space of models and parameters. Once the posterior probabilities of the models have been estimated, one can make full use of the techniques of

Bayesian model comparison

. For instance, to compare the relative plausibilities of two models

M_1

and

M_2

, one can compute their posterior ratio, which is related to the

Bayes factor

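B_{1,2}

:

\frac{p(M_1 \mid D)}{p(M_2 \mid D)} = \frac{p(D \mid M_1)}{p(D \mid M_2)} \frac{p(M_1)}{p(M_2)} = B_{1,2} \frac{p(M_1)}{p(M_2)}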

. If the model priors are equal—that is,

p(M_1)=p(M_2)

—the Bayes factor equals the posterior ratio. In practice, as discussed below, these measures can be highly sensitive to the choice of parameter prior distributions and summary statistics, and thus conclusions of model comparison should be drawn with caution.
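The hierarchical rejection scheme for model comparison can be sketched with a toy problem in which the Bayes factor is known in closed form. Everything specific here is an assumption for illustration: model M_1 is a fair coin, model M_2 is a coin with a uniformly distributed bias, the data are 14 heads in 20 tosses, and the model priors are equal. The function name `abc_model_choice` is hypothetical.

```python
import random
from math import comb


def simulate_heads(p, n, rng):
    """Number of heads in n tosses of a coin with head probability p."""
    return sum(rng.random() < p for _ in range(n))


def abc_model_choice(k_obs, n, n_draws, rng):
    """Count acceptances per model; their ratio approximates the posterior ratio."""
    counts = {1: 0, 2: 0}
    for _ in range(n_draws):
        model = rng.choice((1, 2))                 # sample a model from its (uniform) prior
        p = 0.5 if model == 1 else rng.random()    # M_1: fair coin; M_2: p ~ Uniform(0, 1)
        if simulate_heads(p, n, rng) == k_obs:     # tolerance epsilon = 0
            counts[model] += 1
    return counts


rng = random.Random(42)
counts = abc_model_choice(k_obs=14, n=20, n_draws=200_000, rng=rng)
posterior_ratio = counts[1] / counts[2]

# Closed form for this toy problem: p(k | M_1) = C(n, k) / 2^n and p(k | M_2) = 1 / (n + 1).
exact_bayes_factor = comb(20, 14) / 2**20 * 21

print(round(posterior_ratio, 3), round(exact_bayes_factor, 3))
```

Because the model priors are equal, the ratio of acceptance counts estimates the Bayes factor directly, and it can be compared with the closed-form value of roughly 0.78.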

Pitfalls and remedies

As for all statistical methods, a number of assumptions and approximations are inherently required for the application of ABC-based methods to real modeling problems. For example, setting the tolerance parameter

\epsilon

to zero ensures an exact result, but typically makes computations prohibitively expensive. Thus, values of

\epsilon

larger than zero are used in practice, which introduces a bias. Likewise, sufficient statistics are typically not available and instead, other summary statistics are used, which introduces an additional bias due to the loss of information. Additional sources of bias, for example in the context of model selection, may be more subtle. At the same time, some of the criticisms that have been directed at the ABC methods, in particular within the field of

phylogeography

, are not specific to ABC and apply to all Bayesian methods or even all statistical methods (e.g., the choice of prior distribution and parameter ranges). However, because of the ability of ABC methods to handle much more complex models, some of these general pitfalls are of particular relevance in the context of ABC analyses. This section discusses these potential risks and reviews possible ways to address them.

Approximation of the posterior

A non-negligible

\epsilon

comes with the price that one samples from

p(\theta \mid \rho(\hat{D}, D) \le \epsilon)

instead of the true posterior

p(\theta \mid D)

. With a sufficiently small tolerance, and a sensible distance measure, the resulting distribution

p(\theta \mid \rho(\hat{D}, D) \le \epsilon)

should often approximate the actual target distribution

p(\theta \mid D)

reasonably well. On the other hand, a tolerance that is large enough that every point in the parameter space becomes accepted will yield a replica of the prior distribution. There are empirical studies of the difference between

p(\theta \mid \rho(\hat{D}, D) \le \epsilon)

and

p(\theta \mid D)

as a function of

\epsilon

, and theoretical results for an upper

\epsilon

-dependent bound for the error in parameter estimates. The accuracy of the posterior (defined as the expected quadratic loss) delivered by ABC as a function of

\epsilon

has also been investigated. However, the convergence of the distributions when

\epsilon

approaches zero, and how it depends on the distance measure used, is an important topic that has yet to be investigated in greater detail. In particular, it remains difficult to disentangle errors introduced by this approximation from errors due to model mis-specification. As an attempt to correct some of the error due to a non-zero

\epsilon

, the usage of local linear weighted regression with ABC to reduce the variance of the posterior estimates has been suggested. The method assigns weights to the parameters according to how well simulated summaries adhere to the observed ones and performs linear regression between the summaries and the weighted parameters in the vicinity of observed summaries. The obtained regression coefficients are used to correct sampled parameters in the direction of observed summaries. An improvement was suggested in the form of nonlinear regression using a feed-forward neural network model. However, it has been shown that the posterior distributions obtained with these approaches are not always consistent with the prior distribution, which led to a reformulation of the regression adjustment that respects the prior distribution. Finally, statistical inference using ABC with a non-zero tolerance

\epsilon

is not inherently flawed: under the assumption of measurement errors, the optimal

\epsilon

can in fact be shown to be not zero. Indeed, the bias caused by a non-zero tolerance can be characterized and compensated by introducing a specific form of noise to the summary statistics. Asymptotic consistency for such “noisy ABC” has been established, together with formulas for the asymptotic variance of the parameter estimates for a fixed tolerance.
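The local linear regression adjustment mentioned in this section can be sketched for a single parameter and a single summary statistic. This is a simplified illustration, not the published method in full: the Epanechnikov-type weights, the normal toy model (the summary is simulated directly as the mean of 10 unit-variance observations), the observed summary of 1.0, and the deliberately loose tolerance are all assumptions, and `regression_adjust` is a hypothetical name.

```python
import random
import statistics


def regression_adjust(thetas, summaries, s_obs, epsilon):
    """Local-linear adjustment: theta* = theta - beta * (s - s_obs)."""
    # Epanechnikov-type weights, largest where the simulated summary is closest to s_obs.
    weights = [1.0 - ((s - s_obs) / epsilon) ** 2 for s in summaries]
    w_sum = sum(weights)
    s_bar = sum(w * s for w, s in zip(weights, summaries)) / w_sum
    t_bar = sum(w * t for w, t in zip(weights, thetas)) / w_sum
    # Weighted least-squares slope of the parameter on the summary.
    cov = sum(w * (s - s_bar) * (t - t_bar) for w, s, t in zip(weights, summaries, thetas))
    var = sum(w * (s - s_bar) ** 2 for w, s in zip(weights, summaries))
    beta = cov / var
    # Shift each accepted parameter toward what it would be at the observed summary.
    return [t - beta * (s - s_obs) for t, s in zip(thetas, summaries)]


rng = random.Random(2)
s_obs, epsilon = 1.0, 0.5  # hypothetical observed summary and a loose tolerance
thetas, summaries = [], []
for _ in range(20_000):
    theta = rng.uniform(-5.0, 5.0)       # uniform prior
    s = rng.gauss(theta, 10 ** -0.5)     # sample mean of 10 unit-variance observations
    if abs(s - s_obs) <= epsilon:        # plain rejection step
        thetas.append(theta)
        summaries.append(s)

adjusted = regression_adjust(thetas, summaries, s_obs, epsilon)
print(statistics.pvariance(thetas), statistics.pvariance(adjusted))
```

In this setting the adjusted sample is noticeably less dispersed than the raw accepted sample, because the spread contributed by the width of the acceptance window is regressed out.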

Choice and sufficiency of summary statistics

Summary statistics may be used to increase the acceptance rate of ABC for high-dimensional data. Low-dimensional sufficient statistics are optimal for this purpose, as they capture all relevant information present in the data in the simplest possible form. However, low-dimensional sufficient statistics are typically unattainable for statistical models where ABC-based inference is most relevant, and consequently, some

heuristic

is usually necessary to identify useful low-dimensional summary statistics. The use of a set of poorly chosen summary statistics will often lead to inflated

credible intervals

due to the implied loss of information, which can also bias the discrimination between models. A review of methods for choosing summary statistics is available, which may provide valuable guidance in practice. One approach to capture most of the information present in data would be to use many statistics, but the accuracy and stability of ABC appear to decrease rapidly with an increasing number of summary statistics. Instead, a better strategy is to focus on the relevant statistics only, with relevancy depending on the whole inference problem, on the model used, and on the data at hand. An algorithm has been proposed for identifying a representative subset of summary statistics, by iteratively assessing whether an additional statistic introduces a meaningful modification of the posterior. One of the challenges here is that a large ABC approximation error may heavily influence the conclusions about the usefulness of a statistic at any stage of the procedure. Another method decomposes into two main steps. First, a reference approximation of the posterior is constructed by minimizing the

entropy

. Sets of candidate summaries are then evaluated by comparing the ABC-approximated posteriors with the reference posterior. With both of these strategies, a subset of statistics is selected from a large set of candidate statistics. Instead, the

partial least squares regression

approach uses information from all the candidate statistics, each being weighted appropriately. Recently, a method for constructing summaries in a semi-automatic manner has attracted considerable interest. This method is based on the observation that the optimal choice of summary statistics, when minimizing the quadratic loss of the parameter point estimates, can be obtained through the posterior mean of the parameters, which is approximated by performing a linear regression based on the simulated data. Methods for the identification of summary statistics that could also simultaneously assess the influence on the approximation of the posterior would be of substantial value. This is because the choice of summary statistics and the choice of tolerance constitute two sources of error in the resulting posterior distribution. These errors may corrupt the ranking of models and may also lead to incorrect model predictions. Indeed, none of the methods above assesses the choice of summaries for the purpose of model selection.

Bayes factor with ABC and summary statistics

It has been shown that the combination of insufficient summary statistics and ABC for model selection can be problematic. Indeed, if one lets the Bayes factor based on the summary statistic S(D) be denoted by B_{1,2}^s, the relation between B_{1,2} and B_{1,2}^s takes the form:

B_{1,2} = \frac{p(D|M_1)}{p(D|M_2)} = \frac{p(D|S(D),M_1)}{p(D|S(D),M_2)} \frac{p(S(D)|M_1)}{p(S(D)|M_2)} = \frac{p(D|S(D),M_1)}{p(D|S(D),M_2)} B_{1,2}^s.

Thus, a summary statistic S(D) is sufficient for comparing two models M_1 and M_2 if and only if:

p(D|S(D),M_1) = p(D|S(D),M_2),

which results in B_{1,2} = B_{1,2}^s. It is also clear from the equation above that there might be a huge difference between B_{1,2} and B_{1,2}^s if the condition is not satisfied, as can be demonstrated by toy examples. Crucially, it was shown that sufficiency for M_1 or M_2 alone, or for both models, does not guarantee sufficiency for ranking the models. However, it was also shown that any sufficient summary statistic for a model M in which both M_1 and M_2 are nested is valid for ranking the nested models.

The computation of Bayes factors on S(D) may therefore be misleading for model selection purposes, unless the ratio between the Bayes factors on D and S(D) is available, or at least can be approximated reasonably well. Alternatively, necessary and sufficient conditions on summary statistics for a consistent Bayesian model choice have recently been derived, which can provide useful guidance.

However, this issue is only relevant for model selection when the dimension of the data has been reduced. ABC-based inference in which the actual data sets are directly compared, as is the case for some systems biology applications, circumvents this problem.
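A toy calculation can make the gap between the Bayes factor on the full data and the one on a summary statistic concrete. The sketch below is an illustrative setup, not taken from the text above: model 1 is assumed to be Poisson with an Exponential(1) prior on the rate, model 2 is assumed to be geometric on {0, 1, ...} with a Uniform(0, 1) prior on the success probability, and the summary is the sum of the observations, which is sufficient within each model separately. The marginal likelihoods have closed forms under these assumptions, so exact rational arithmetic is used.

```python
from fractions import Fraction
from math import comb, factorial, prod

def marginals(data):
    n, s = len(data), sum(data)
    # Model 1 (assumed): Poisson(lam), lam ~ Exponential(1).
    p_d_m1 = Fraction(factorial(s),
                      (n + 1) ** (s + 1) * prod(factorial(x) for x in data))
    p_s_m1 = Fraction(n ** s, (n + 1) ** (s + 1))
    # Model 2 (assumed): Geometric(p) on {0, 1, ...}, p ~ Uniform(0, 1).
    # The integral of p^n (1 - p)^s over (0, 1) is a Beta function.
    beta_int = Fraction(factorial(n) * factorial(s), factorial(n + s + 1))
    p_d_m2 = beta_int
    p_s_m2 = comb(n + s - 1, s) * beta_int
    return p_d_m1, p_d_m2, p_s_m1, p_s_m2

data = [0, 0, 0, 6]  # hypothetical observations
p_d_m1, p_d_m2, p_s_m1, p_s_m2 = marginals(data)

bf_full = p_d_m1 / p_d_m2      # Bayes factor on the full data D
bf_summary = p_s_m1 / p_s_m2   # Bayes factor on the summary S(D) = sum(D)
```

For this data set `bf_full` is below 1 while `bf_summary` is above 1, so the two Bayes factors point towards different models even though the sum is sufficient for the parameter of each model on its own.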

Indispensable quality controls

As the above discussion makes clear, any ABC analysis requires choices and trade-offs that can have a considerable impact on its outcomes. Specifically, the choice of competing models/hypotheses, the number of simulations, the choice of summary statistics, or the acceptance threshold cannot currently be based on general rules, but the effect of these choices should be evaluated and tested in each study. A number of heuristic approaches to the quality control of ABC have been proposed, such as the quantification of the fraction of parameter variance explained by the summary statistics. A common class of methods aims at assessing whether or not the inference yields valid results, regardless of the actually observed data. For instance, given a set of parameter values, which are typically drawn from the prior or the posterior distributions for a model, one can generate a large number of artificial datasets. In this way, the quality and robustness of ABC inference can be assessed in a controlled setting, by gauging how well the chosen ABC inference method recovers the true parameter values, and also models if multiple structurally different models are considered simultaneously. Another class of methods assesses whether the inference was successful in light of the given observed data, for example, by comparing the posterior predictive distribution of summary statistics to the summary statistics observed. Beyond that, cross-validation techniques and predictive checks represent promising future strategies to evaluate the stability and out-of-sample predictive validity of ABC inferences. This is particularly important when modeling large data sets, because then the posterior support of a particular model can appear overwhelmingly conclusive, even if all proposed models in fact are poor representations of the stochastic system underlying the observation data. 
Out-of-sample predictive checks can reveal potential systematic biases within a model and provide clues on how to improve its structure or parametrization. Fundamentally novel approaches for model choice that incorporate quality control as an integral step in the process have recently been proposed. ABC allows, by construction, estimation of the discrepancies between the observed data and the model predictions, with respect to a comprehensive set of statistics. These statistics are not necessarily the same as those used in the acceptance criterion. The resulting discrepancy distributions have been used for selecting models that are in agreement with many aspects of the data simultaneously, and model inconsistency is detected from conflicting and co-dependent summaries. Another quality-control-based method for model selection employs ABC to approximate the effective number of model parameters and the deviance of the posterior predictive distributions of summaries and parameters. The deviance information criterion is then used as a measure of model fit. It has also been shown that the models preferred based on this criterion can conflict with those supported by Bayes factors. For this reason, it is useful to combine different methods for model selection to obtain correct conclusions. Quality controls are achievable and indeed performed in many ABC-based works, but for certain problems, the assessment of the impact of the method-related parameters can be challenging. However, the rapidly increasing use of ABC can be expected to provide a more thorough understanding of the limitations and applicability of the method.
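The idea of checking an ABC procedure on artificial data sets with known parameters can be sketched as follows. The toy model, the Uniform(-5, 5) prior, the sample-mean summary, the fixed acceptance fraction, and the hypothetical function name `abc_rejection` are all assumptions chosen for illustration, not a recommended configuration.

```python
import numpy as np

rng = np.random.default_rng(1)

def abc_rejection(s_obs, n=30, n_sims=5000, keep=0.02):
    # Draw parameters from the assumed Uniform(-5, 5) prior, simulate n
    # Normal(theta, 1) observations each, and keep the draws whose sample
    # mean is closest to the observed sample mean.
    thetas = rng.uniform(-5.0, 5.0, size=n_sims)
    s_sim = rng.normal(thetas[:, None], 1.0, size=(n_sims, n)).mean(axis=1)
    order = np.argsort(np.abs(s_sim - s_obs))
    return thetas[order[: int(keep * n_sims)]]

# Pseudo-observed data sets generated from known ("true") parameter values.
true_thetas = rng.uniform(-5.0, 5.0, size=50)
covered, errors = 0, []
for theta in true_thetas:
    s_obs = rng.normal(theta, 1.0, size=30).mean()
    post = abc_rejection(s_obs)
    lo, hi = np.quantile(post, [0.05, 0.95])
    covered += int(lo <= theta <= hi)       # is the truth inside the 90% interval?
    errors.append(post.mean() - theta)      # error of the posterior-mean estimate

coverage = covered / len(true_thetas)
rmse = float(np.sqrt(np.mean(np.square(errors))))
```

A coverage far from the nominal 90% or a large `rmse` would suggest that the chosen summaries, tolerance, or number of simulations need to be revisited before the observed data are analysed.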

General risks in statistical inference exacerbated in ABC

This section reviews risks that are, strictly speaking, not specific to ABC but are relevant for other statistical methods as well. However, the flexibility offered by ABC to analyze very complex models makes them highly relevant to discuss here.

Prior distribution and parameter ranges

The specification of the range and the prior distribution of parameters strongly benefits from previous knowledge about the properties of the system. One criticism has been that in some studies the “parameter ranges and distributions are only guessed based upon the subjective opinion of the investigators”, which is connected to classical objections to Bayesian approaches.

With any computational method, it is typically necessary to constrain the investigated parameter ranges. The parameter ranges should if possible be defined based on known properties of the studied system, but may for practical applications necessitate an educated guess. However, theoretical results regarding objective priors are available, which may for example be based on the principle of indifference or the principle of maximum entropy. On the other hand, automated or semi-automated methods for choosing a prior distribution often yield improper densities. As most ABC procedures require generating samples from the prior, improper priors are not directly applicable to ABC.

One should also keep the purpose of the analysis in mind when choosing the prior distribution. In principle, uninformative and flat priors, which exaggerate our subjective ignorance about the parameters, may still yield reasonable parameter estimates. However, Bayes factors are highly sensitive to the prior distribution of parameters. Conclusions on model choice based on Bayes factors can be misleading unless the sensitivity of conclusions to the choice of priors is carefully considered.
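The sensitivity of Bayes factors to the prior can be seen in a small exact calculation that does not involve ABC at all. The setup is an assumption chosen for illustration: a sample mean `xbar` of `n` normal observations with known unit variance, a point model with mean zero, and an alternative model whose mean has a Normal(0, tau^2) prior. The function names are hypothetical.

```python
from math import exp, pi, sqrt

def normal_pdf(x, var):
    # Density of a zero-mean normal distribution with variance var.
    return exp(-x * x / (2.0 * var)) / sqrt(2.0 * pi * var)

def bayes_factor_01(xbar, n, tau, sigma=1.0):
    # Marginal of the sample mean: Normal(0, sigma^2/n) under the point model,
    # Normal(0, sigma^2/n + tau^2) under the alternative.
    se2 = sigma ** 2 / n
    return normal_pdf(xbar, se2) / normal_pdf(xbar, se2 + tau ** 2)

xbar, n = 0.4, 25  # hypothetical data: the sample mean lies two standard errors from zero
factors = {tau: bayes_factor_01(xbar, n, tau) for tau in (0.5, 5.0, 50.0)}
```

With the data held fixed, widening the prior under the alternative model moves the Bayes factor from favouring the alternative to favouring the point model, which is why a prior sensitivity analysis matters for model choice.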

Small number of models

Model-based methods have been criticized for not exhaustively covering the hypothesis space. Indeed, model-based studies often revolve around a small number of models, and due to the high computational cost to evaluate a single model in some instances, it may then be difficult to cover a large part of the hypothesis space.

An upper limit to the number of considered candidate models is typically set by the substantial effort required to define the models and to choose between many alternative options. There is no commonly accepted ABC-specific procedure for model construction, so experience and prior knowledge are used instead. Although more robust procedures for a priori model choice and formulation would be beneficial, there is no one-size-fits-all strategy for model development in statistics: sensible characterization of complex systems will always necessitate a great deal of detective work and use of expert knowledge from the problem domain.

Some opponents of ABC contend that since only a few models, subjectively chosen and probably all wrong, can be realistically considered, ABC analyses provide only limited insight. However, there is an important distinction between identifying a plausible null hypothesis and assessing the relative fit of alternative hypotheses. Since useful null hypotheses that potentially hold true can extremely seldom be put forward in the context of complex models, the predictive ability of statistical models as explanations of complex phenomena is far more important than the test of a statistical null hypothesis in this context. It is also common to average over the investigated models, weighted based on their relative plausibility, to infer model features (e.g., parameter values) and to make predictions.
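Averaging over models can be sketched in a few lines. The numbers below are made up for illustration, and the sketch assumes equal prior probabilities for the models, acceptance counts as a rough stand-in for posterior model probabilities, and a parameter that has the same meaning in every model.

```python
import numpy as np

# Hypothetical numbers: accepted ABC simulations per model, and each
# model's estimate of a shared parameter.
accepted_counts = np.array([620, 310, 70])
param_estimates = np.array([1.8, 2.4, 3.1])

# With equal model priors, acceptance frequencies approximate p(M_k | D).
weights = accepted_counts / accepted_counts.sum()

# Model-averaged estimate: each model contributes in proportion to its weight.
averaged = float(weights @ param_estimates)
```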

Large datasets

Large data sets may constitute a computational bottleneck for model-based methods. It was, for example, pointed out that in some ABC-based analyses, part of the data have to be omitted. A number of authors have argued that large data sets are not a practical limitation, although the severity of this issue depends strongly on the characteristics of the models. Several aspects of a modeling problem can contribute to the computational complexity, such as the sample size, number of observed variables or features, time or spatial resolution, etc. However, with increasing computing power, this issue will potentially be less important.

Instead of sampling parameters for each simulation from the prior, it has been proposed alternatively to combine the Metropolis-Hastings algorithm with ABC, which was reported to result in a higher acceptance rate than for plain ABC. Naturally, such an approach inherits the general burdens of MCMC methods, such as the difficulty of assessing convergence, correlation among the samples from the posterior, and relatively poor parallelizability.

Likewise, the ideas of sequential Monte Carlo (SMC) and population Monte Carlo (PMC) methods have been adapted to the ABC setting. The general idea is to iteratively approach the posterior from the prior through a sequence of target distributions. An advantage of such methods, compared to ABC-MCMC, is that the samples from the resulting posterior are independent. In addition, with sequential methods the tolerance levels need not be specified prior to the analysis, but are adjusted adaptively.

It is relatively straightforward to parallelize a number of steps in ABC algorithms based on rejection sampling and sequential Monte Carlo methods. It has also been demonstrated that parallel algorithms may yield significant speedups for MCMC-based inference in phylogenetics, which may be a tractable approach also for ABC-based methods. Yet an adequate model for a complex system is very likely to require intensive computation irrespective of the chosen method of inference, and it is up to the user to select a method that is suitable for the particular application in question.
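The combination of Metropolis-Hastings with ABC mentioned above can be sketched for a toy problem. The model, the Uniform(-5, 5) prior, the random-walk proposal, the tolerance, and the function names are assumptions for illustration; the chain is started at the observed summary so that it begins in a region where simulations can be accepted.

```python
import numpy as np

rng = np.random.default_rng(2)

def prior_pdf(theta):
    # Assumed Uniform(-5, 5) prior.
    return 0.1 if -5.0 <= theta <= 5.0 else 0.0

def simulate_summary(theta, n=30):
    # Sample mean of n draws from Normal(theta, 1), a hypothetical model.
    return rng.normal(theta, 1.0, size=n).mean()

def abc_mcmc(s_obs, eps=0.1, n_steps=10000, step=0.5):
    chain = np.empty(n_steps)
    theta = s_obs  # start where the observed summary is plausible
    for i in range(n_steps):
        proposal = theta + rng.normal(0.0, step)  # symmetric random walk
        # For a symmetric proposal the Metropolis-Hastings ratio reduces to
        # the prior ratio; the likelihood is replaced by the ABC check below.
        ratio = prior_pdf(proposal) / prior_pdf(theta)
        if rng.uniform() < min(1.0, ratio):
            if abs(simulate_summary(proposal) - s_obs) <= eps:
                theta = proposal
        chain[i] = theta  # rejected proposals repeat the current value
    return chain

chain = abc_mcmc(s_obs=1.5)
posterior_mean = float(chain[1000:].mean())
```

Consecutive values in `chain` are correlated, which illustrates the burden of MCMC-based variants noted above: convergence and effective sample size still have to be assessed separately.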

Curse of dimensionality

High-dimensional data sets and high-dimensional parameter spaces can require an extremely large number of parameter points to be simulated in ABC-based studies to obtain a reasonable level of accuracy for the posterior inferences. In such situations, the computational cost is severely increased and may in the worst case render the computational analysis intractable. These are examples of well-known phenomena, which are usually referred to with the umbrella term curse of dimensionality.

To assess how severely the dimensionality of a data set affects the analysis within the context of ABC, analytical formulas have been derived for the error of the ABC estimators as functions of the dimension of the summary statistics. In addition, Blum and François have investigated how the dimension of the summary statistics is related to the mean squared error for different correction adjustments to the error of ABC estimators. It was also argued that dimension reduction techniques are useful to avoid the curse of dimensionality, due to a potentially lower-dimensional underlying structure of summary statistics. Motivated by minimizing the quadratic loss of ABC estimators, Fearnhead and Prangle have proposed a scheme to project (possibly high-dimensional) data into estimates of the parameter posterior means; these means, now having the same dimension as the parameters, are then used as summary statistics for ABC.

ABC can be used for inference problems in high-dimensional parameter spaces, although one should account for the possibility of overfitting. However, the probability of accepting the simulated values for the parameters under a given tolerance with the ABC rejection algorithm typically decreases exponentially with increasing dimensionality of the parameter space (due to the global acceptance criterion). Although no computational method (based on ABC or not) seems to be able to break the curse of dimensionality, methods have recently been developed to handle high-dimensional parameter spaces under certain assumptions (e.g., based on polynomial approximation on sparse grids, which could potentially heavily reduce the simulation times for ABC). However, the applicability of such methods is problem dependent, and the difficulty of exploring parameter spaces should in general not be underestimated. For example, the introduction of deterministic global parameter estimation led to reports that the global optima obtained in several previous studies of low-dimensional problems were incorrect. For certain problems, it might therefore be difficult to know whether the model is incorrect or, as discussed above, whether the explored region of the parameter space is inappropriate. More pragmatic approaches are to cut the scope of the problem through model reduction, discretisation of variables and the use of canonical models such as noisy models. Noisy models exploit information on the conditional independence between variables.
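The drop in acceptance probability with growing dimension can be observed in a small simulation. The setup is hypothetical: each parameter coordinate has an independent Uniform(-3, 3) prior, each summary statistic is the corresponding parameter plus small Gaussian noise, the observed summaries are all zero, and a single Euclidean distance with a fixed tolerance plays the role of the global acceptance criterion.

```python
import numpy as np

rng = np.random.default_rng(3)

def acceptance_rate(dim, eps=1.0, n_sims=20000):
    # Assumed toy model: theta_j ~ Uniform(-3, 3) independently, and the
    # j-th summary statistic equals theta_j plus Normal(0, 0.1^2) noise.
    thetas = rng.uniform(-3.0, 3.0, size=(n_sims, dim))
    summaries = thetas + rng.normal(0.0, 0.1, size=(n_sims, dim))
    s_obs = np.zeros(dim)
    # One global distance over all coordinates, as in ABC rejection.
    dist = np.linalg.norm(summaries - s_obs, axis=1)
    return float(np.mean(dist <= eps))

rates = {d: acceptance_rate(d) for d in (1, 2, 4, 8)}
```

With the tolerance held fixed, the fraction of accepted simulations in `rates` falls steeply as the dimension grows, so many more simulations are needed to retain the same number of accepted parameter points.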

Software

A number of software packages are currently available for application of ABC to particular classes of statistical models. The suitability of individual software packages depends on the specific application at hand, the computer system environment, and the algorithms required.
