Importance sampling

Importance sampling is a Monte Carlo method for evaluating properties of a particular distribution, while only having samples generated from a different distribution than the distribution of interest. Its introduction in statistics is generally attributed to a paper by Teun Kloek and Herman K. van Dijk in 1978, but its precursors can be found in statistical physics as early as 1949. Importance sampling is also related to umbrella sampling in computational physics. Depending on the application, the term may refer to the process of sampling from this alternative distribution, the process of inference, or both.


Basic theory

Let X\colon \Omega\to \mathbb{R} be a random variable in some probability space (\Omega,\mathcal{F},P). We wish to estimate the expected value of X under P, denoted \mathbf{E}[X;P]. If we have statistically independent random samples x_1, \ldots, x_n, generated according to P, then an empirical estimate of \mathbf{E}[X;P] is

: \widehat{\mathbf{E}}_{n}[X;P] = \frac{1}{n} \sum_{i=1}^n x_i \quad \mathrm{where}\; x_i \sim P(X)

and the precision of this estimate depends on the variance of X:

: \operatorname{var}[\widehat{\mathbf{E}}_{n};P] = \frac{\operatorname{var}[X;P]}{n}.

The basic idea of importance sampling is to sample the states from a different distribution to lower the variance of the estimation of \mathbf{E}[X;P], or when sampling from P is difficult. This is accomplished by first choosing a random variable L\geq 0 such that \mathbf{E}[L;P] = 1 and such that L(\omega)\neq 0 P-almost everywhere. With the variable L we define a probability P^{(L)} that satisfies

: \mathbf{E}[X;P] = \mathbf{E}\left[\frac{X}{L};P^{(L)}\right].

The variable X/L will thus be sampled under P^{(L)} to estimate \mathbf{E}[X;P] as above, and this estimation is improved when

: \operatorname{var}\left[\frac{X}{L};P^{(L)}\right] < \operatorname{var}[X;P].

When X is of constant sign over \Omega, the best variable L would clearly be L^*=\frac{X}{\mathbf{E}[X;P]}\geq 0, so that X/L^* is the searched constant \mathbf{E}[X;P] and a single sample under P^{(L^*)} suffices to give its value. Unfortunately we cannot take that choice, because \mathbf{E}[X;P] is precisely the value we are looking for! However, this theoretical best case L^* gives us an insight into what importance sampling does:

: \begin{align}\forall a\in\mathbb{R}, \; P^{(L^*)}(X\in[a;a+da]) &= \int_{\omega\in\{X\in[a;a+da]\}} \frac{X(\omega)}{\mathbf{E}[X;P]} \, dP(\omega) \\ &= \frac{1}{\mathbf{E}[X;P]}\; a\,P(X\in[a;a+da]) \end{align}

On the right-hand side, a\,P(X\in[a;a+da]) is one of the infinitesimal elements that sum up to \mathbf{E}[X;P]:

: \mathbf{E}[X;P] = \int_{-\infty}^{+\infty} a\,P(X\in[a;a+da])

Therefore, a good probability change P^{(L)} in importance sampling will redistribute the law of X so that its samples' frequencies are sorted directly according to their weights in \mathbf{E}[X;P]. Hence the name "importance sampling."

Importance sampling is often used as a Monte Carlo integrator. When P is the uniform distribution and \Omega =\mathbb{R}, \mathbf{E}[X;P] corresponds to the integral of the real function X\colon \mathbb{R}\to\mathbb{R}.
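
As a concrete illustration, here is a minimal Python sketch (not part of the original article; the target P = N(0,1), the proposal P^{(L)} = N(0,2), and the integrand \omega^2 are all arbitrary illustrative choices). It estimates \mathbf{E}[\omega^2;P] = 1 by drawing from the wider proposal and attaching the factor 1/L = dP/dP^{(L)}, i.e. the ratio of the two densities:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
n = 100_000

# Target P = N(0, 1); the quantity of interest is E[omega^2; P] = 1.
# Proposal P^(L) = N(0, 2). Since dP^(L) = L dP, the factor attached to
# each sample is 1/L(x) = p(x) / q(x), the ratio of the two densities.
samples = rng.normal(0.0, 2.0, size=n)            # draws from the proposal
weights = norm.pdf(samples, 0, 1) / norm.pdf(samples, 0, 2)

estimate = np.mean(samples**2 * weights)          # empirical E[X/L; P^(L)]
print(estimate)                                   # close to the true value 1.0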


Application to probabilistic inference

Such methods are frequently used to estimate posterior densities or expectations in state and/or parameter estimation problems in probabilistic models that are too hard to treat analytically, for example in Bayesian networks.
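
In Bayesian problems the posterior is typically known only up to its normalizing constant, so a self-normalized variant of the weighting is used in practice. The sketch below (an illustrative conjugate Gaussian model with made-up data, not from the original text) estimates a posterior mean, where conjugacy provides an exact answer for comparison:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
y = np.array([0.8, 1.3, 0.4])        # observed data (illustrative)
n = 50_000

# Proposal q = N(0, 2); target is the unnormalized posterior
# prior(theta) * likelihood(y | theta), with prior N(0, 1) and noise N(theta, 1).
theta = rng.normal(0.0, 2.0, size=n)
log_w = (norm.logpdf(theta, 0, 1)                          # log prior
         + norm.logpdf(y[:, None], theta, 1).sum(axis=0)   # log likelihood
         - norm.logpdf(theta, 0, 2))                       # minus log proposal
w = np.exp(log_w - log_w.max())                            # stabilize before exponentiating

posterior_mean = np.sum(w * theta) / np.sum(w)             # self-normalized estimate
print(posterior_mean)     # conjugacy gives the exact value sum(y)/(len(y)+1) = 0.625
```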


Application to simulation

Importance sampling is a variance reduction technique that can be used in the Monte Carlo method. The idea behind importance sampling is that certain values of the input random variables in a simulation have more impact on the parameter being estimated than others. If these "important" values are emphasized by sampling more frequently, then the estimator variance can be reduced. Hence, the basic methodology in importance sampling is to choose a distribution which "encourages" the important values. This use of "biased" distributions will result in a biased estimator if it is applied directly in the simulation. However, the simulation outputs are weighted to correct for the use of the biased distribution, and this ensures that the new importance sampling estimator is unbiased. The weight is given by the likelihood ratio, that is, the Radon–Nikodym derivative of the true underlying distribution with respect to the biased simulation distribution.

The fundamental issue in implementing importance sampling simulation is the choice of the biased distribution which encourages the important regions of the input variables. Choosing or designing a good biased distribution is the "art" of importance sampling. The rewards for a good distribution can be huge run-time savings; the penalty for a bad distribution can be longer run times than for a general Monte Carlo simulation without importance sampling.

Consider X to be the sample and \frac{f(X)}{g(X)} to be the likelihood ratio, where f is the probability density (mass) function of the desired distribution and g is the probability density (mass) function of the biased/proposal/sample distribution. Then the problem can be characterized by choosing the sample distribution g that minimizes the variance of the scaled sample:

: g^* = \arg\min_g \operatorname{var}_g \left( X \frac{f(X)}{g(X)} \right).

It can be shown that the following distribution minimizes the above variance:

: g^*(X) = \frac{|X| f(X)}{\operatorname{E}_f[|X|]}.

Notice that when X\ge 0, this variance becomes 0.
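
The zero-variance case can be made visible in code. In the sketch below (an illustrative example; it presumes \operatorname{E}[X] is already known, which defeats the purpose in practice but exposes the property), X \sim Exponential(1) is nonnegative, so g^* can be sampled exactly (it is a Gamma(2, 1) density) and every weighted sample collapses to the constant \operatorname{E}[X]:

```python
import numpy as np
from scipy.stats import expon, gamma

rng = np.random.default_rng(2)
n = 10_000

# Estimate E[X] = 1 for X ~ Exponential(1), density f(x) = exp(-x) on x >= 0.
naive = rng.exponential(1.0, size=n)
print(naive.mean(), naive.var() / n)          # plain MC: estimator variance ~ 1/n

# Optimal proposal g*(x) = x f(x) / E[X] is the Gamma(shape=2, scale=1) density.
# Writing it down requires E[X] itself, so this only demonstrates the
# zero-variance property rather than giving a practical recipe.
x = rng.gamma(2.0, 1.0, size=n)
w = expon.pdf(x) / gamma.pdf(x, a=2.0, scale=1.0)
scaled = x * w                                # equals E[X] = 1 for every sample
print(scaled.mean(), scaled.var() / n)        # zero-variance IS estimator
```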


Mathematical approach

Consider estimating by simulation the probability p_t\, of an event X \ge t, where X is a random variable with distribution F and probability density function f(x)= F'(x)\,, where prime denotes derivative. A K-length independent and identically distributed (i.i.d.) sequence X_i\, is generated from the distribution F, and the number k_t of random variables that lie above the threshold t is counted. The random variable k_t is characterized by the binomial distribution

: P(k_t = k) = \binom{K}{k} p_t^k(1-p_t)^{K-k}, \quad k=0,1,\dots,K.

One can show that \operatorname{E}[k_t/K] = p_t and \operatorname{var}[k_t/K] = p_t(1-p_t)/K, so in the limit K \to \infty we are able to obtain p_t. Note that the variance is low if p_t \approx 1.

Importance sampling is concerned with the determination and use of an alternate density function f_*\, (for X), usually referred to as a biasing density, for the simulation experiment. This density allows the event to occur more frequently, so the sequence length K gets smaller for a given estimator variance. Alternatively, for a given K, use of the biasing density results in a variance smaller than that of the conventional Monte Carlo estimate. From the definition of p_t\,, we can introduce f_*\, as below:

: \begin{align} p_t & = \operatorname{E}[1(X \ge t)] \\ & = \int 1(x \ge t) \frac{f(x)}{f_*(x)} f_*(x) \,dx \\ & = \operatorname{E}_*[1(X \ge t) W(X)] \end{align}

where

: W(\cdot) \equiv \frac{f(\cdot)}{f_*(\cdot)}

is a likelihood ratio and is referred to as the weighting function. The last equality in the above equation motivates the estimator

: \hat p_t = \frac{1}{K}\,\sum_{i=1}^K 1(X_i \ge t) W(X_i), \quad X_i \sim f_*

This is the importance sampling estimator of p_t\, and is unbiased. That is, the estimation procedure is to generate i.i.d. samples from f_*\, and, for each sample which exceeds t\,, increment the estimate by the weight W\, evaluated at the sample value. The results are averaged over K\, trials. The variance of the importance sampling estimator is easily shown to be

: \begin{align} \operatorname{var}_*[\hat p_t] & = \frac{1}{K}\operatorname{var}_*[1(X \ge t)W(X)] \\ & = \frac{1}{K}\left\{ \operatorname{E}_*[1(X \ge t)^2 W^2(X)] - p_t^2 \right\} \\ & = \frac{1}{K}\left\{ \operatorname{E}[1(X \ge t) W(X)] - p_t^2 \right\} \end{align}

The importance sampling problem now focuses on finding a biasing density f_*\, such that the variance of the importance sampling estimator is less than the variance of the general Monte Carlo estimate. A biasing density function which minimizes the variance, and under certain conditions reduces it to zero, is called an optimal biasing density function.
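
The estimator \hat p_t is straightforward to implement. The sketch below (the standard-normal target, threshold t = 4, and the particular biasing density centered on the rare region are illustrative assumptions, not from the original text) compares plain Monte Carlo, which sees almost no events, against the weighted estimator:

```python
import numpy as np
from scipy.stats import norm

def is_tail_prob(rng, f_pdf, g_pdf, g_sampler, t, K):
    """Unbiased IS estimate of p_t = P(X >= t), X ~ f, sampling from f_* = g."""
    x = g_sampler(rng, K)
    w = f_pdf(x) / g_pdf(x)                  # weighting function W = f / f_*
    return np.mean((x >= t) * w)

rng = np.random.default_rng(3)
t, K = 4.0, 100_000                          # p_t = P(N(0,1) >= 4) ≈ 3.17e-5

# Plain MC: with K = 1e5 samples we expect only ~3 hits above t.
print(np.mean(rng.normal(size=K) >= t))

# Biasing density f_* = N(t, 1), centered on the rare-event region.
p_hat = is_tail_prob(rng,
                     f_pdf=lambda x: norm.pdf(x, 0, 1),
                     g_pdf=lambda x: norm.pdf(x, t, 1),
                     g_sampler=lambda rng, n: rng.normal(t, 1.0, size=n),
                     t=t, K=K)
print(p_hat)                                 # ≈ 3.17e-5, with far lower variance
```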


Conventional biasing methods

Although there are many kinds of biasing methods, the following two methods are most widely used in the applications of importance sampling.


Scaling

Shifting probability mass into the event region by positive scaling of the random variable X\, with a number greater than unity has the effect of increasing the variance (and also the mean) of the density function. This results in a heavier tail of the density, leading to an increase in the event probability. Scaling is probably one of the earliest biasing methods known and has been extensively used in practice. It is simple to implement and usually provides conservative simulation gains as compared to other methods.

In importance sampling by scaling, the simulation density is chosen as the density function of the scaled random variable aX\,, where usually a>1 for tail probability estimation. By transformation,

: f_*(x)=\frac{1}{a} f \bigg( \frac{x}{a} \bigg)\,

and the weighting function is

: W(x)= a \frac{f(x)}{f(x/a)} \,

While scaling shifts probability mass into the desired event region, it also pushes mass into the complementary region X<t\,, which is undesirable. If X\, is a sum of n\, random variables, the spreading of mass takes place in an n\,-dimensional space. The consequence of this is a decreasing importance sampling gain for increasing n\,, called the dimensionality effect. A modern version of importance sampling by scaling is so-called sigma-scaled sampling (SSS), which runs multiple Monte Carlo (MC) analyses with different scaling factors. In contrast to many other high-yield estimation methods (like worst-case distances, WCD), SSS does not suffer much from the dimensionality problem. Addressing multiple MC outputs also causes no degradation in efficiency. On the other hand, like WCD, SSS is designed only for Gaussian statistical variables, and in contrast to WCD, the SSS method is not designed to provide accurate statistical corners. Another SSS disadvantage is that MC runs with large scale factors may become difficult, e.g. due to model and simulator convergence problems. In addition, SSS involves a strong bias-variance trade-off: large scale factors give quite stable yield results, but the larger the scale factors, the larger the bias error. If the advantages of SSS do not matter much in the application of interest, other methods are often more efficient.
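
A minimal sketch of biasing by scaling (target, threshold, and scale factor are assumed for illustration): samples are drawn as aX and weighted by W(x) = a f(x)/f(x/a) as above:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(4)
t, K, a = 4.0, 100_000, 2.5          # scale factor a > 1 fattens the tail

# Sample from the scaled density f_*(x) = f(x/a)/a, i.e. draw aX with X ~ N(0,1).
x = a * rng.normal(size=K)
w = a * norm.pdf(x) / norm.pdf(x / a)         # weighting function W(x)

p_hat = np.mean((x >= t) * w)
print(p_hat)                                   # ≈ P(N(0,1) >= 4) ≈ 3.17e-5
```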


Translation

Another simple and effective biasing technique employs translation of the density function (and hence of the random variable) to place much of its probability mass in the rare event region. Translation does not suffer from a dimensionality effect and has been successfully used in several applications relating to simulation of digital communication systems. It often provides better simulation gains than scaling. In biasing by translation, the simulation density is given by

: f_*(x)= f(x-c), \quad c>0 \,

where c\, is the amount of shift, chosen to minimize the variance of the importance sampling estimator.
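
The shift c can be tuned empirically. In the sketch below (the standard-normal target, threshold, and candidate grid are illustrative assumptions), c is picked from a small grid by comparing the empirical variance of the weighted indicator:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(5)
t, K = 4.0, 50_000

def translated_estimate(c):
    """IS estimate of P(N(0,1) >= t) under f_*(x) = f(x - c), with its variance."""
    x = c + rng.normal(size=K)                    # samples from f(x - c)
    z = (x >= t) * norm.pdf(x) / norm.pdf(x - c)  # weighted indicator
    return z.mean(), z.var() / K

# Choose the shift c on a small grid by minimizing the estimator's variance.
grid = [2.0, 3.0, 4.0, 5.0]
best_c = min(grid, key=lambda c: translated_estimate(c)[1])
p_hat, var_hat = translated_estimate(best_c)
print(best_c, p_hat, var_hat)                     # p_hat ≈ 3.17e-5
```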


Effects of system complexity

The fundamental problem with importance sampling is that designing good biased distributions becomes more complicated as the system complexity increases. Complex systems are typically systems with long memory, since complex processing of a small number of inputs is much easier to handle. This dimensionality or memory can cause problems in three ways:

* long memory (severe intersymbol interference (ISI))
* unknown memory (Viterbi decoders)
* possibly infinite memory (adaptive equalizers)

In principle, the importance sampling ideas remain the same in these situations, but the design becomes much harder. A successful approach to combat this problem is essentially breaking down a simulation into several smaller, more sharply defined subproblems. Then importance sampling strategies are used to target each of the simpler subproblems. Examples of techniques to break the simulation down are conditioning and error-event simulation (EES) and regenerative simulation.


Evaluation of importance sampling

In order to identify successful importance sampling techniques, it is useful to be able to quantify the run-time savings due to the use of the importance sampling approach. The performance measure commonly used is \sigma^2_{MC} / \sigma^2_{IS} \,, which can be interpreted as the speed-up factor by which the importance sampling estimator achieves the same precision as the MC estimator. This ratio usually has to be computed empirically, since the estimator variances are unlikely to be analytically tractable when the mean itself is intractable. Other useful concepts in quantifying an importance sampling estimator are the variance bounds and the notion of asymptotic efficiency. One related measure is the so-called effective sample size (ESS).
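
The ESS is commonly computed from the importance weights as (\sum_i w_i)^2 / \sum_i w_i^2; this standard formula is not spelled out in the text above, so the following is a minimal sketch under that assumption:

```python
import numpy as np

def effective_sample_size(weights):
    """ESS = (sum w)^2 / sum w^2: equals K for uniform weights and
    approaches 1 when a single weight dominates."""
    w = np.asarray(weights, dtype=float)
    return w.sum() ** 2 / np.sum(w ** 2)

print(effective_sample_size(np.ones(1000)))          # 1000.0: ideal proposal
print(effective_sample_size([1.0] + [1e-6] * 999))   # ≈ 1: degenerate weights
```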


Variance cost function

Variance is not the only possible cost function for a simulation; other cost functions, such as the mean absolute deviation, are used in various statistical applications. Nevertheless, the variance is the primary cost function addressed in the literature, probably due to the use of variances in confidence intervals and in the performance measure \sigma^2_{MC} / \sigma^2_{IS} \,. An associated issue is the fact that the ratio \sigma^2_{MC} / \sigma^2_{IS} \, overestimates the run-time savings due to importance sampling, since it does not include the extra computing time required to compute the weight function. Hence, some people evaluate the net run-time improvement by various means. Perhaps a more serious overhead to importance sampling is the time taken to devise and program the technique and analytically derive the desired weight function.


Multiple and adaptive importance sampling

When different proposal distributions, g_n(x), n=1,\ldots,N, are jointly used for drawing the samples x_1, \ldots, x_N, different proper weighting functions can be employed. In an adaptive setting, the proposal distributions, g_{n,t}(x), n=1,\ldots,N and t=1,\ldots,T, are updated at each iteration t of the adaptive importance sampling algorithm. Hence, since a population of proposal densities is used, several suitable combinations of sampling and weighting schemes can be employed.
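
One widely used proper weighting for a population of proposals is the deterministic-mixture (balance-heuristic) weight, in which each sample is weighted against the equal-weight mixture of all proposals rather than its own proposal alone; naming this particular scheme, and the Gaussian proposals and standard-normal target below, are illustrative assumptions on top of the text:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(6)
f = lambda x: norm.pdf(x, 0, 1)              # target density (illustrative)

# N = 50 Gaussian proposals g_n = N(mu_n, 1), with M = 200 draws from each.
mus = np.linspace(-3.0, 3.0, 50)
x = rng.normal(mus[:, None], 1.0, size=(50, 200)).ravel()

# Deterministic-mixture weights: each sample is weighted against the
# equal-weight mixture (1/N) sum_n g_n(x), which keeps the weights
# bounded where the proposals overlap.
mixture = norm.pdf(x[:, None], mus[None, :], 1.0).mean(axis=1)
w = f(x) / mixture

# Self-normalized estimate of E_f[X^2] (true value 1 for the standard normal).
print(np.sum(w * x**2) / np.sum(w))
```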


See also

* Monte Carlo method
* Variance reduction
* Stratified sampling
* Recursive stratified sampling
* VEGAS algorithm
* Particle filter – a sequential Monte Carlo method, which uses importance sampling
* Auxiliary field Monte Carlo
* Rejection sampling
* Variable bitrate – a common audio application of importance sampling


External links


* Sequential Monte Carlo Methods (Particle Filtering) homepage at the University of Cambridge
* Introduction to importance sampling in rare-event simulations, European Journal of Physics (PDF document)
* Adaptive Monte Carlo methods for rare event simulations, Winter Simulation Conference