Subset simulation

Subset simulation is a method used in reliability engineering to compute small (i.e., rare event) failure probabilities encountered in engineering systems. The basic idea is to express a small failure probability as a product of larger conditional probabilities by introducing intermediate failure events. This conceptually converts the original rare event problem into a series of frequent event problems that are easier to solve. In the actual implementation, samples conditional on intermediate failure events are adaptively generated so that the sampled population gradually shifts from the frequent to the rare event region. These 'conditional samples' provide information for estimating the complementary cumulative distribution function (CCDF) of the quantity of interest (that governs failure), covering the high as well as the low probability regions. They can also be used for investigating the cause and consequence of failure events. The generation of conditional samples is not trivial but can be performed efficiently using Markov chain Monte Carlo (MCMC).

Subset simulation treats the relationship between the (input) random variables and the (output) response quantity of interest as a 'black box'. This can be attractive for complex systems where it is difficult to use other variance reduction or rare event sampling techniques that require prior information about the system behaviour. For problems where it is possible to incorporate prior information into the reliability algorithm, it is often more efficient to use other variance reduction techniques such as importance sampling. It has been shown that subset simulation is more efficient than traditional Monte Carlo simulation, but less efficient than line sampling, when applied to a fracture mechanics test problem.
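The product decomposition can be illustrated numerically. In the following sketch the response ''Y'' is taken to be standard normal and the threshold levels are hypothetical choices made for the example, not taken from the article: the rare probability ''P''(''Y'' > 3), about 1.3 × 10⁻³, factors into three conditional probabilities that are each of order 0.1.

```python
import math

def ccdf(x):
    # Standard normal complementary CDF, P(Y > x).
    return 0.5 * math.erfc(x / math.sqrt(2))

levels = [1.0, 2.0, 3.0]   # intermediate thresholds b_1 < b_2 < b_3 = b
# P(Y > b_1), then the conditional probabilities P(Y > b_i | Y > b_{i-1}):
factors = [ccdf(levels[0])] + [
    ccdf(levels[i]) / ccdf(levels[i - 1]) for i in range(1, len(levels))
]
# Each factor is of order 0.1, yet by telescoping their product recovers
# the rare probability P(Y > 3), which is about 1.3e-3.
product = math.prod(factors)
```

Estimating each factor by simulation is a frequent event problem, which is the efficiency gain subset simulation aims for.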


Basic idea

Let X be a vector of random variables and ''Y'' = ''h''(X) be a scalar (output) response quantity of interest for which the failure probability ''P''(''F'') = ''P''(''Y'' > ''b'') is to be determined. Each evaluation of ''h''(·) is expensive and so it should be avoided if possible. Using direct Monte Carlo methods one can generate independent and identically distributed (i.i.d.) samples of X and then estimate ''P''(''F'') simply as the fraction of samples with ''Y'' > ''b''. However, this is not efficient when ''P''(''F'') is small, because most samples will not fail (i.e., with ''Y'' ≤ ''b'') and in many cases an estimate of 0 results. As a rule of thumb, for small ''P''(''F'') one requires 10 failed samples to estimate ''P''(''F'') with a coefficient of variation of 30% (a moderate requirement). For example, 10,000 i.i.d. samples, and hence evaluations of ''h''(·), would be required for such an estimate if ''P''(''F'') = 0.001. Subset simulation attempts to convert a rare event problem into more frequent ones. Let b_1 < b_2 < \cdots < b_m = b be an increasing sequence of intermediate threshold levels. From the basic property of
conditional probability,

:\begin{align}
P(Y>b) &= P(Y>b_m \mid Y>b_{m-1}) P(Y>b_{m-1})\\
&= P(Y>b_m \mid Y>b_{m-1}) P(Y>b_{m-1} \mid Y>b_{m-2}) P(Y>b_{m-2})\\
&= \cdots \\
&= P(Y>b_m \mid Y>b_{m-1}) P(Y>b_{m-1} \mid Y>b_{m-2}) \cdots P(Y>b_2 \mid Y>b_1) P(Y>b_1)
\end{align}

The 'raw idea' of subset simulation is to estimate P(F) by estimating P(Y>b_1) and the conditional probabilities P(Y>b_i \mid Y>b_{i-1}) for i=2,\ldots,m , anticipating an efficiency gain when these probabilities are not small. To implement this idea there are two basic issues:

# Estimating the conditional probabilities by means of simulation requires the efficient generation of samples of X conditional on the intermediate failure events, i.e., the conditional samples. This is generally non-trivial.
# The intermediate threshold levels b_i should be chosen so that the intermediate probabilities are not too small (otherwise one ends up with a rare event problem again) but not too large (otherwise too many levels are required to reach the target event). However, this requires information about the CCDF, which is the very quantity to be estimated.

In the standard algorithm of subset simulation the first issue is resolved by using Markov chain Monte Carlo. More generic and flexible versions of the simulation algorithm, not based on Markov chain Monte Carlo, have recently been developed. The second issue is resolved by choosing the intermediate threshold levels adaptively using samples from the last simulation level. As a result, subset simulation in fact produces a set of estimates for ''b'' that correspond to different fixed values of ''p'' = ''P''(''Y'' > ''b''), rather than estimates of probabilities for fixed threshold values.

There are a number of variations of subset simulation used in different contexts in applied probability and stochastic operations research. For example, in some variations the simulation effort to estimate each conditional probability P(Y>b_i \mid Y>b_{i-1}) (i=2,\ldots,m) may not be fixed prior to the simulation, but may be random, similar to the splitting method in rare-event probability estimation. These versions of subset simulation can also be used to approximately sample from the distribution of X given the failure of the system (that is, conditional on the event \{Y > b\}). In that case, the relative variance of the (random) number of particles in the final level m can be used to bound the sampling error as measured by the total variation distance of probability measures.
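The adaptive procedure described above can be sketched in code. This is a minimal illustration under stated assumptions rather than the standard algorithm: the response function ''h'', the standard-normal input, the level probability p0 = 0.1 and the plain random-walk Metropolis proposal are all hypothetical choices made for the example (the standard algorithm instead uses a component-wise 'modified Metropolis' sampler).

```python
import numpy as np

rng = np.random.default_rng(0)

def h(x):
    # Hypothetical black-box response: sum of the input components,
    # so Y = h(X) ~ N(0, d) for X ~ N(0, I_d).
    return np.asarray(x).sum(axis=-1)

def subset_simulation(h, d, b, n=1000, p0=0.1, spread=1.0, max_levels=50):
    """Sketch of subset simulation for P(h(X) > b), X ~ N(0, I_d).

    Intermediate levels are chosen adaptively as the (1 - p0) sample
    quantile; conditional samples are grown with a random-walk
    Metropolis step restricted to the current intermediate failure event.
    """
    x = rng.standard_normal((n, d))      # i.i.d. samples at level 0
    y = h(x)
    n_seeds = int(p0 * n)
    prob = 1.0
    for _ in range(max_levels):
        order = np.argsort(y)[::-1]      # largest responses first
        b_i = y[order[n_seeds - 1]]      # adaptive intermediate level
        if b_i >= b:                     # target threshold reached
            return prob * np.mean(y > b)
        prob *= p0                       # P(Y > b_i | previous level) is ~p0
        seeds_x = x[order[:n_seeds]]     # samples already above b_i
        seeds_y = y[order[:n_seeds]]
        new_x, new_y = [], []
        for s in range(n_seeds):         # one Markov chain per seed
            cx, cy = seeds_x[s].copy(), seeds_y[s]
            for _ in range(n // n_seeds):
                cand = cx + spread * rng.standard_normal(d)
                # Metropolis test for the standard normal target; h is
                # evaluated only for accepted candidates, saving calls.
                if np.log(rng.random()) < 0.5 * (cx @ cx - cand @ cand):
                    cand_y = h(cand)
                    if cand_y > b_i:     # stay inside the failure event
                        cx, cy = cand.copy(), cand_y
                new_x.append(cx.copy())
                new_y.append(cy)
        x, y = np.array(new_x), np.array(new_y)
    return prob

# With d = 4, Y ~ N(0, 4), so P(Y > 6) = P(Z > 3) is about 1.3e-3.
p_hat = subset_simulation(h, d=4, b=6.0)
```

For a failure probability around 10⁻³ the sketch needs roughly three levels of 1,000 samples each, compared with the roughly 10⁴ samples suggested by the direct Monte Carlo rule of thumb above for a 30% coefficient of variation.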


See also

* Rare event sampling
* Curse of dimensionality
* Line sampling


Notes

* See Au & Wang for an introductory coverage of subset simulation and its application to engineering risk analysis.
* Schuëller & Pradlwarter report the performance of subset simulation (and other variance reduction techniques) in a set of stochastic mechanics benchmark problems.
* Chapter 4 of Phoon discusses the application of subset simulation (and other Monte Carlo methods) to geotechnical engineering problems.
* Zio & Pedroni discuss the application of subset simulation (and other methods) to a problem in nuclear engineering.

