Subset simulation

Subset simulation is a method used in reliability engineering to compute small (i.e., rare event) failure probabilities encountered in engineering systems. The basic idea is to express a small failure probability as a product of larger conditional probabilities by introducing intermediate failure events. This conceptually converts the original rare event problem into a series of frequent event problems that are easier to solve. In the actual implementation, samples conditional on intermediate failure events are adaptively generated so that the sampled population gradually shifts from the frequent to the rare event region. These 'conditional samples' provide information for estimating the complementary cumulative distribution function (CCDF) of the quantity of interest (that governs failure), covering the high as well as the low probability regions. They can also be used for investigating the cause and consequence of failure events. The generation of conditional samples is not trivial but can be performed efficiently using Markov chain Monte Carlo (MCMC).

Subset simulation treats the relationship between the (input) random variables and the (output) response quantity of interest as a 'black box'. This can be attractive for complex systems where it is difficult to use other variance reduction or rare event sampling techniques that require prior information about the system behaviour. For problems where it is possible to incorporate prior information into the reliability algorithm, it is often more efficient to use other variance reduction techniques such as importance sampling. It has been shown that subset simulation is more efficient than traditional Monte Carlo simulation, but less efficient than line sampling, when applied to a fracture mechanics test problem.
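The product decomposition can be illustrated numerically. In the following sketch the response ''Y'' is taken to be standard normal and the threshold levels are hypothetical choices made for the example, not taken from the article: the rare probability ''P''(''Y'' > 3), about 1.3 × 10⁻³, factors into three conditional probabilities that are each of order 0.1.

```python
import math

def ccdf(x):
    # Standard normal complementary CDF, P(Y > x).
    return 0.5 * math.erfc(x / math.sqrt(2))

levels = [1.0, 2.0, 3.0]   # intermediate thresholds b_1 < b_2 < b_3 = b
# P(Y > b_1), then the conditional probabilities P(Y > b_i | Y > b_{i-1}):
factors = [ccdf(levels[0])] + [
    ccdf(levels[i]) / ccdf(levels[i - 1]) for i in range(1, len(levels))
]
# Each factor is of order 0.1, yet by telescoping their product recovers
# the rare probability P(Y > 3), which is about 1.3e-3.
product = math.prod(factors)
```

Estimating each factor by simulation is a frequent event problem, which is the efficiency gain subset simulation aims for.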


Basic idea

Let X be a vector of random variables and ''Y'' = ''h''(X) be a scalar (output) response quantity of interest for which the failure probability ''P''(''F'') = ''P''(''Y'' > ''b'') is to be determined. Each evaluation of ''h''(·) is expensive and so it should be avoided if possible. Using direct Monte Carlo methods one can generate independent and identically distributed (i.i.d.) samples of X and then estimate ''P''(''F'') simply as the fraction of samples with ''Y'' > ''b''. However, this is not efficient when ''P''(''F'') is small, because most samples will not fail (i.e., with ''Y'' ≤ ''b'') and in many cases an estimate of 0 results. As a rule of thumb, for small ''P''(''F'') one requires 10 failed samples to estimate ''P''(''F'') with a coefficient of variation of 30% (a moderate requirement). For example, 10,000 i.i.d. samples, and hence evaluations of ''h''(·), would be required for such an estimate if ''P''(''F'') = 0.001. Subset simulation attempts to convert a rare event problem into more frequent ones. Let b_1 < b_2 < \cdots < b_m = b be an increasing sequence of intermediate threshold levels. From the basic property of
conditional probability,

:\begin{align}
P(Y>b) &= P(Y>b_m \mid Y>b_{m-1}) P(Y>b_{m-1})\\
&= P(Y>b_m \mid Y>b_{m-1}) P(Y>b_{m-1} \mid Y>b_{m-2}) P(Y>b_{m-2})\\
&= \cdots \\
&= P(Y>b_m \mid Y>b_{m-1}) P(Y>b_{m-1} \mid Y>b_{m-2}) \cdots P(Y>b_2 \mid Y>b_1) P(Y>b_1)
\end{align}

The 'raw idea' of subset simulation is to estimate P(F) by estimating P(Y>b_1) and the conditional probabilities P(Y>b_i \mid Y>b_{i-1}) for i=2,\ldots,m , anticipating an efficiency gain when these probabilities are not small. To implement this idea there are two basic issues:

# Estimating the conditional probabilities by means of simulation requires the efficient generation of samples of X conditional on the intermediate failure events, i.e., the conditional samples. This is generally non-trivial.
# The intermediate threshold levels b_i should be chosen so that the intermediate probabilities are not too small (otherwise one ends up with a rare event problem again) but not too large (otherwise too many levels are required to reach the target event). However, this requires information about the CCDF, which is the very quantity to be estimated.

In the standard algorithm of subset simulation the first issue is resolved by using Markov chain Monte Carlo. More generic and flexible versions of the simulation algorithm, not based on Markov chain Monte Carlo, have recently been developed. The second issue is resolved by choosing the intermediate threshold levels adaptively using samples from the last simulation level. As a result, subset simulation in fact produces a set of estimates for ''b'' that correspond to different fixed values of ''p'' = ''P''(''Y'' > ''b''), rather than estimates of probabilities for fixed threshold values.

There are a number of variations of subset simulation used in different contexts in applied probability and stochastic operations research. For example, in some variations the simulation effort to estimate each conditional probability P(Y>b_i \mid Y>b_{i-1}) (i=2,\ldots,m) may not be fixed prior to the simulation, but may be random, similar to the splitting method in rare-event probability estimation. These versions of subset simulation can also be used to approximately sample from the distribution of X given the failure of the system (that is, conditional on the event \{Y > b\}). In that case, the relative variance of the (random) number of particles in the final level m can be used to bound the sampling error as measured by the total variation distance of probability measures.
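The adaptive procedure described above can be sketched in code. This is a minimal illustration under stated assumptions rather than the standard algorithm: the response function ''h'', the standard-normal input, the level probability p0 = 0.1 and the plain random-walk Metropolis proposal are all hypothetical choices made for the example (the standard algorithm instead uses a component-wise 'modified Metropolis' sampler).

```python
import numpy as np

rng = np.random.default_rng(0)

def h(x):
    # Hypothetical black-box response: sum of the input components,
    # so Y = h(X) ~ N(0, d) for X ~ N(0, I_d).
    return np.asarray(x).sum(axis=-1)

def subset_simulation(h, d, b, n=1000, p0=0.1, spread=1.0, max_levels=50):
    """Sketch of subset simulation for P(h(X) > b), X ~ N(0, I_d).

    Intermediate levels are chosen adaptively as the (1 - p0) sample
    quantile; conditional samples are grown with a random-walk
    Metropolis step restricted to the current intermediate failure event.
    """
    x = rng.standard_normal((n, d))      # i.i.d. samples at level 0
    y = h(x)
    n_seeds = int(p0 * n)
    prob = 1.0
    for _ in range(max_levels):
        order = np.argsort(y)[::-1]      # largest responses first
        b_i = y[order[n_seeds - 1]]      # adaptive intermediate level
        if b_i >= b:                     # target threshold reached
            return prob * np.mean(y > b)
        prob *= p0                       # P(Y > b_i | previous level) is ~p0
        seeds_x = x[order[:n_seeds]]     # samples already above b_i
        seeds_y = y[order[:n_seeds]]
        new_x, new_y = [], []
        for s in range(n_seeds):         # one Markov chain per seed
            cx, cy = seeds_x[s].copy(), seeds_y[s]
            for _ in range(n // n_seeds):
                cand = cx + spread * rng.standard_normal(d)
                # Metropolis test for the standard normal target; h is
                # evaluated only for accepted candidates, saving calls.
                if np.log(rng.random()) < 0.5 * (cx @ cx - cand @ cand):
                    cand_y = h(cand)
                    if cand_y > b_i:     # stay inside the failure event
                        cx, cy = cand.copy(), cand_y
                new_x.append(cx.copy())
                new_y.append(cy)
        x, y = np.array(new_x), np.array(new_y)
    return prob

# With d = 4, Y ~ N(0, 4), so P(Y > 6) = P(Z > 3) is about 1.3e-3.
p_hat = subset_simulation(h, d=4, b=6.0)
```

For a failure probability around 10⁻³ the sketch needs roughly three levels of 1,000 samples each, compared with the roughly 10⁴ samples suggested by the direct Monte Carlo rule of thumb above for a 30% coefficient of variation.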


See also

* Rare event sampling
* Curse of dimensionality
* Line sampling


Notes

* See Au & Wang for an introductory coverage of subset simulation and its application to engineering risk analysis.
* Schuëller & Pradlwarter report the performance of subset simulation (and other variance reduction techniques) in a set of stochastic mechanics benchmark problems.
* Chapter 4 of Phoon discusses the application of subset simulation (and other Monte Carlo methods) to geotechnical engineering problems.
* Zio & Pedroni discuss the application of subset simulation (and other methods) to a problem in nuclear engineering.

