Probability bounds analysis (PBA) is a collection of methods of uncertainty propagation for making qualitative and quantitative calculations in the face of uncertainties of various kinds. It is used to project partial information about random variables and other quantities through mathematical expressions. For instance, it computes sure bounds on the distribution of a sum, product, or more complex function, given only sure bounds on the distributions of the inputs. Such bounds are called

probability box A probability box (or p-box) is a characterization of uncertain numbers consisting of both aleatoric and epistemic uncertainties that is often used in risk analysis or quantitative uncertainty modeling where numerical calculations must be pe ...

es, and constrain cumulative probability distributions (rather than

densities Density (volumetric mass density or specific mass) is the substance's mass per unit of volume. The symbol most often used for density is ''ρ'' (the lower case Greek letter rho), although the Latin letter ''D'' can also be used. Mathematicall ...

or mass functions). This bounding approach permits analysts to make calculations without requiring overly precise assumptions about parameter values, dependence among variables, or even distribution shape. Probability bounds analysis is essentially a combination of the methods of standard interval analysis and classical

probability theory Probability theory is the branch of mathematics concerned with probability. Although there are several different probability interpretations, probability theory treats the concept in a rigorous mathematical manner by expressing it through a set ...

. Probability bounds analysis gives the same answer as interval analysis does when only range information is available. It also gives the same answers as Monte Carlo simulation does when information is abundant enough to precisely specify input distributions and their dependencies. Thus, it is a generalization of both interval analysis and probability theory. The diverse methods comprising probability bounds analysis provide algorithms to evaluate mathematical expressions when there is uncertainty about the input values, their dependencies, or even the form of mathematical expression itself. The calculations yield results that are guaranteed to enclose all possible distributions of the output variable if the input p-boxes were also sure to enclose their respective distributions. In some cases, a calculated p-box will also be best-possible in the sense that the bounds could be no tighter without excluding some of the possible distributions. P-boxes are usually merely bounds on possible distributions. The bounds often also enclose distributions that are not themselves possible. For instance, the set of probability distributions that could result from adding random values without the independence assumption from two (precise) distributions is generally a proper subset of all the distributions enclosed by the p-box computed for the sum. That is, there are distributions within the output p-box that could not arise under any dependence between the two input distributions. The output p-box will, however, always contain all distributions that are possible, so long as the input p-boxes were sure to enclose their respective underlying distributions. This property often suffices for use in risk analysis and other fields requiring calculations under uncertainty.

History of bounding probability

The idea of bounding probability has a very long tradition throughout the history of probability theory. Indeed, in 1854

George Boole George Boole (; 2 November 1815 – 8 December 1864) was a largely self-taught English mathematician, philosopher, and logician, most of whose short career was spent as the first professor of mathematics at Queen's College, Cork in ...

used the notion of interval bounds on probability in his ''

The Laws of Thought ''An Investigation of the Laws of Thought on Which are Founded the Mathematical Theories of Logic and Probabilities'' by George Boole, published in 1854, is the second of Boole's two monographs on algebraic logic. Boole was a professor of mathem ...

''. Also dating from the latter half of the 19th century, the

inequality Inequality may refer to: Economics * Attention inequality, unequal distribution of attention across users, groups of people, issues in etc. in attention economy * Economic inequality, difference in economic well-being between population groups * ...

attributed to

Chebyshev Pafnuty Lvovich Chebyshev ( rus, Пафну́тий Льво́вич Чебышёв, p=pɐfˈnutʲɪj ˈlʲvovʲɪtɕ tɕɪbɨˈʂof) ( – ) was a Russian mathematician and considered to be the founding father of Russian mathematics. Chebyshe ...

described bounds on a distribution when only the mean and variance of the variable are known, and the related

attributed to

Markov Markov ( Bulgarian, russian: Марков), Markova, and Markoff are common surnames used in Russia and Bulgaria. Notable people with the name include: Academics *Ivana Markova (born 1938), Czechoslovak-British emeritus professor of psychology at ...

found bounds on a positive variable when only the mean is known. KyburgKyburg, H.E., Jr. (1999)
Interval valued probabilities
SIPTA Documention on Imprecise Probability. reviewed the history of interval probabilities and traced the development of the critical ideas through the 20th century, including the important notion of incomparable probabilities favored by

Keynes John Maynard Keynes, 1st Baron Keynes, ( ; 5 June 1883 – 21 April 1946), was an English economist whose ideas fundamentally changed the theory and practice of macroeconomics and the economic policies of governments. Originally trained in m ...

. Of particular note is Fréchet's derivation in the 1930s of bounds on calculations involving total probabilities without dependence assumptions. Bounding probabilities has continued to the present day (e.g., Walley's theory of

imprecise probability Imprecise probability generalizes probability theory to allow for partial probability specifications, and is applicable when information is scarce, vague, or conflicting, in which case a unique probability distribution may be hard to identify. There ...

.) The methods of probability bounds analysis that could be routinely used in risk assessments were developed in the 1980s. Hailperin described a computational scheme for bounding logical calculations extending the ideas of Boole. YagerYager, R.R. (1986). Arithmetic and other operations on Dempster–Shafer structures. ''International Journal of Man-machine Studies'' 25: 357–366. described the elementary procedures by which bounds on

convolutions In mathematics (in particular, functional analysis), convolution is a mathematical operation on two functions ( and ) that produces a third function (f*g) that expresses how the shape of one is modified by the other. The term ''convolution'' ...

can be computed under an assumption of independence. At about the same time, Makarov,Makarov, G.D. (1981). Estimates for the distribution function of a sum of two random variables when the marginal distributions are fixed. ''Theory of Probability and Its Applications'' 26: 803–806. and independently, Rüschendorf solved the problem, originally posed by

Kolmogorov Andrey Nikolaevich Kolmogorov ( rus, Андре́й Никола́евич Колмого́ров, p=ɐnˈdrʲej nʲɪkɐˈlajɪvʲɪtɕ kəlmɐˈɡorəf, a=Ru-Andrey Nikolaevich Kolmogorov.ogg, 25 April 1903 – 20 October 1987) was a Sovi ...

, of how to find the upper and lower bounds for the probability distribution of a sum of random variables whose marginal distributions, but not their joint distribution, are known. Frank et al.Frank, M.J., R.B. Nelsen and B. Schweizer (1987). Best-possible bounds for the distribution of a sum—a problem of Kolmogorov. ''Probability Theory and Related Fields'' 74: 199–211. generalized the result of Makarov and expressed it in terms of copulas. Since that time, formulas and algorithms for sums have been generalized and extended to differences, products, quotients and other binary and unary functions under various dependence assumptions.Williamson, R.C., and T. Downs (1990). Probabilistic arithmetic I: Numerical methods for calculating convolutions and dependency bounds. ''International Journal of Approximate Reasoning'' 4: 89–158.Ferson, S., V. Kreinovich, L. Ginzburg, D.S. Myers, and K. Sentz. (2003)
''Constructing Probability Boxes and Dempster–Shafer Structures''
. SAND2002-4015. Sandia National Laboratories, Albuquerque, NM.Berleant, D., and C. Goodman-Strauss (1998). Bounding the results of arithmetic operations on random variables of unknown dependency using intervals. ''Reliable Computing'' 4: 147–165.Ferson, S., R. Nelsen, J. Hajagos, D. Berleant, J. Zhang, W.T. Tucker, L. Ginzburg and W.L. Oberkampf (2004)
''Dependence in Probabilistic Modeling, Dempster–Shafer Theory, and Probability Bounds Analysis''
Sandia National Laboratories, SAND2004-3072, Albuquerque, NM.

Arithmetic expressions

Arithmetic expressions involving operations such as additions, subtractions, multiplications, divisions, minima, maxima, powers, exponentials, logarithms, square roots, absolute values, etc., are commonly used in risk analyses and uncertainty modeling. Convolution is the operation of finding the probability distribution of a sum of independent random variables specified by probability distributions. We can extend the term to finding distributions of other mathematical functions (products, differences, quotients, and more complex functions) and other assumptions about the intervariable dependencies. There are convenient algorithms for computing these generalized convolutions under a variety of assumptions about the dependencies among the inputs.

Mathematical details

Let

\mathbb

denote the space of distribution functions on the

real number In mathematics, a real number is a number that can be used to measure a ''continuous'' one-dimensional quantity such as a distance, duration or temperature. Here, ''continuous'' means that values can have arbitrarily small variations. Every ...

\R,

i.e., :

\mathbb = \.

A p-box is a quintuple :

\left \,

where

\overline, \underline \in \mathbb, m, v

are real intervals, and

\mathbf \subset \mathbb.

This quintuple denotes the set of distribution functions

F \in \mathbf \subset \mathbb

such that: :

&\int_\R x dF(x) \in m && \text \\ &\int_\R x^2 dF(x) - \left ( \int_\R x dF(x) \right )^2 \in v && \text \end

If a function satisfies all the conditions above it is said to be ''inside'' the p-box. In some cases, there may be no information about the moments or distribution family other than what is encoded in the two distribution functions that constitute the edges of the p-box. Then the quintuple representing the p-box

\

can be denoted more compactly as 'B''₁, ''B''₂ This notation harkens to that of intervals on the real line, except that the endpoints are distributions rather than points. The notation

X \sim F

denotes the fact that

X \in \R

is a random variable governed by the distribution function ''F'', that is, :

\\ x \mapsto \Pr (X \leq x) \end

Let us generalize the tilde notation for use with p-boxes. We will write ''X'' ~ ''B'' to mean that ''X'' is a random variable whose distribution function is unknown except that it is inside ''B''. Thus, ''X'' ~ ''F'' ∈ ''B'' can be contracted to X ~ B without mentioning the distribution function explicitly. If ''X'' and ''Y'' are independent random variables with distributions ''F'' and ''G'' respectively, then ''X'' + ''Y'' = ''Z'' ~ ''H'' given by :

H(z) = \int_ F(x) G(y) dz = \int_ F(x) G(z-x) dx = F * G.

This operation is called a

convolution In mathematics (in particular, functional analysis), convolution is a mathematical operation on two functions ( and ) that produces a third function (f*g) that expresses how the shape of one is modified by the other. The term ''convolution'' ...

on ''F'' and ''G''. The analogous operation on p-boxes is straightforward for sums. Suppose :

X \sim A =_1, A_2 \quad \text \quad Y \sim B =_1, B_2

If ''X'' and ''Y'' are stochastically independent, then the distribution of ''Z'' = ''X'' + ''Y'' is inside the p-box :

\left_1 * B_1, A_2 * B_2 \right

Finding bounds on the distribution of sums ''Z'' = ''X'' + ''Y'' ''without making any assumption about the dependence'' between ''X'' and ''Y'' is actually easier than the problem assuming independence. Makarov showed that :

Z \sim \left \sup_ \max ( F(x) +G(y) -1, 0), \inf_ \min (F(x)+G(y), 1) \right /math>

These bounds are implied by the Fréchet–Hoeffding copula bounds. The problem can also be solved using the methods of

mathematical programming Mathematical optimization (alternatively spelled ''optimisation'') or mathematical programming is the selection of a best element, with regard to some criterion, from some set of available alternatives. It is generally divided into two subfi ...

. The convolution under the intermediate assumption that ''X'' and ''Y'' have positive dependence is likewise easy to compute, as is the convolution under the extreme assumptions of perfect positive or perfect negative dependency between ''X'' and ''Y''. Generalized convolutions for other operations such as subtraction, multiplication, division, etc., can be derived using transformations. For instance, p-box subtraction ''A'' − ''B'' can be defined as ''A'' + (−''B''), where the negative of a p-box ''B'' = 'B''₁, ''B''₂is 'B''₂(−''x''), ''B''₁(−''x'')

Logical expressions

Logical or

Boolean expressions In computer science, a Boolean expression is an expression used in programming languages that produces a Boolean value when evaluated. A Boolean value is either true or false. A Boolean expression may be composed of a combination of the Boolean c ...

involving conjunctions (

AND or AND may refer to: Logic, grammar, and computing * Conjunction (grammar), connecting two words, phrases, or clauses * Logical conjunction in mathematical logic, notated as "∧", "⋅", "&", or simple juxtaposition * Bitwise AND, a boolea ...

operations), disjunctions ( OR operations), exclusive disjunctions, equivalences, conditionals, etc. arise in the analysis of fault trees and event trees common in risk assessments. If the probabilities of events are characterized by intervals, as suggested by Boole and

among others, these binary operations are straightforward to evaluate. For example, if the probability of an event A is in the interval P(A) = ''a'' = .2, 0.25 and the probability of the event B is in P(B) = ''b'' = .1, 0.3 then the probability of the

conjunction Conjunction may refer to: * Conjunction (grammar), a part of speech * Logical conjunction, a mathematical operator ** Conjunction introduction, a rule of inference of propositional logic * Conjunction (astronomy), in which two astronomical bodies ...

is surely in the interval : P(A & B) = ''a'' × ''b'' :::: = .2, 0.25× .1, 0.3:::: = .2 × 0.1, 0.25 × 0.3:::: = .02, 0.075so long as A and B can be assumed to be independent events. If they are not independent, we can still bound the conjunction using the classical Fréchet inequality. In this case, we can infer at least that the probability of the joint event A & B is surely within the interval : P(A & B) = env(max(0, ''a''+''b''−1), min(''a'', ''b'')) :::: = env(max(0, .2, 0.25 .1, 0.3��1), min( .2, 0.25 .1, 0.3) :::: = env( ax(0, 0.2+0.1–1), max(0, 0.25+0.3–1) in(0.2,0.1), min(0.25, 0.3) :::: = env( ,0 .1, 0.25 :::: = , 0.25where env( 'x''₁,''x''₂ 'y''₁,''y''₂ is in(''x''₁,''y''₁), max(''x''₂,''y''₂) Likewise, the probability of the

disjunction In logic, disjunction is a logical connective typically notated as \lor and read aloud as "or". For instance, the English language sentence "it is raining or it is snowing" can be represented in logic using the disjunctive formula R \lor S ...

is surely in the interval : P(A v B) = ''a'' + ''b'' − ''a'' × ''b'' = 1 − (1 − ''a'') × (1 − ''b'') :::: = 1 − (1 − .2, 0.25 × (1 − .1, 0.3 :::: = 1 − .75, 0.8× .7, 0.9:::: = 1 − .525, 0.72:::: = .28, 0.475if A and B are independent events. If they are not independent, the Fréchet inequality bounds the disjunction : P(A v B) = env(max(''a'', ''b''), min(1, ''a'' + ''b'')) :::: = env(max( .2, 0.25 .1, 0.3, min(1, .2, 0.25+ .1, 0.3) :::: = env( .2, 0.3 .3, 0.55 :::: = .2, 0.55 It is also possible to compute interval bounds on the conjunction or disjunction under other assumptions about the dependence between A and B. For instance, one might assume they are positively dependent, in which case the resulting interval is not as tight as the answer assuming independence but tighter than the answer given by the Fréchet inequality. Comparable calculations are used for other logical functions such as negation, exclusive disjunction, etc. When the Boolean expression to be evaluated becomes complex, it may be necessary to evaluate it using the methods of mathematical programming to get best-possible bounds on the expression. A similar problem one presents in the case of

probabilistic logic Probabilistic logic (also probability logic and probabilistic reasoning) involves the use of probability and logic to deal with uncertain situations. Probabilistic logic extends traditional logic truth tables with probabilistic expressions. A diffic ...

(see for example Gerla 1994). If the probabilities of the events are characterized by probability distributions or p-boxes rather than intervals, then analogous calculations can be done to obtain distributional or p-box results characterizing the probability of the top event.

Magnitude comparisons

The probability that an uncertain number represented by a p-box ''D'' is less than zero is the interval Pr(''D'' < 0) = u>''F''(0), ''F̅''(0) where ''F̅''(0) is the left bound of the probability box ''D'' and ''F''(0) is its right bound, both evaluated at zero. Two uncertain numbers represented by probability boxes may then be compared for numerical magnitude with the following encodings: :''A'' < ''B'' = Pr(''A'' − ''B'' < 0), :''A'' > ''B'' = Pr(''B'' − ''A'' < 0), :''A'' ≤ ''B'' = Pr(''A'' − ''B'' ≤ 0), and :''A'' ≥ ''B'' = Pr(''B'' − ''A'' ≤ 0). Thus the probability that ''A'' is less than ''B'' is the same as the probability that their difference is less than zero, and this probability can be said to be the value of the expression ''A'' < ''B''. Like arithmetic and logical operations, these magnitude comparisons generally depend on the stochastic dependence between ''A'' and ''B'', and the subtraction in the encoding should reflect that dependence. If their dependence is unknown, the difference can be computed without making any assumption using the Fréchet operation.

Sampling-based computation

Some analysts use sampling-based approaches to computing probability bounds, including Monte Carlo simulation, Latin hypercube methods or

importance sampling Importance sampling is a Monte Carlo method for evaluating properties of a particular distribution, while only having samples generated from a different distribution than the distribution of interest. Its introduction in statistics is generally at ...

. These approaches cannot assure mathematical rigor in the result because such simulation methods are approximations, although their performance can generally be improved simply by increasing the number of replications in the simulation. Thus, unlike the analytical theorems or methods based on mathematical programming, sampling-based calculations usually cannot produce verified computations. However, sampling-based methods can be very useful in addressing a variety of problems which are computationally difficult to solve analytically or even to rigorously bound. One important example is the use of Cauchy-deviate sampling to avoid the

curse of dimensionality The curse of dimensionality refers to various phenomena that arise when analyzing and organizing data in high-dimensional spaces that do not occur in low-dimensional settings such as the three-dimensional physical space of everyday experience. T ...

in propagating interval uncertainty through high-dimensional problems.Trejo, R., Kreinovich, V. (2001)
Error estimations for indirect measurements: randomized vs. deterministic algorithms for ‘black-box’ programs
''Handbook on Randomized Computing'', S. Rajasekaran, P. Pardalos, J. Reif, and J. Rolim (eds.), Kluwer, 673–729.

Relationship to other uncertainty propagation approaches

PBA belongs to a class of methods that use imprecise probabilities to simultaneously represent aleatoric and epistemic uncertainties. PBA is a generalization of both interval analysis and probabilistic

such as is commonly implemented with Monte Carlo simulation. PBA is also closely related to robust Bayes analysis, which is sometimes called Bayesian sensitivity analysis. PBA is an alternative to second-order Monte Carlo simulation.

Applications

References

Further references

* * * * {{cite book , last1 = Oberkampf , first1 = William L. , last2 = Roy , first2 = Christopher J. , title = Verification and Validation in Scientific Computing , publisher = Cambridge University Press , location = New York , year = 2010 , isbn = 978-0-521-11360-1

External links

Probability bounds analysis in environmental risk assessments

Intervals and probability distributions

Epistemic uncertainty project

The Society for Imprecise Probability: Theories and Applications
Mathematical analysis