In probability theory, Boole's inequality, also known as the union bound, says that for any finite or countable set of events, the probability that at least one of the events happens is no greater than the sum of the probabilities of the individual events. This inequality provides an upper bound on the probability of occurrence of at least one of a countable number of events in terms of the individual probabilities of the events. Boole's inequality is named for its discoverer, George Boole.

Formally, for a countable set of events ''A''₁, ''A''₂, ''A''₃, ..., we have

\mathbb P\left(\bigcup_{i=1}^{\infty} A_i \right) \le \sum_{i=1}^{\infty} \mathbb P(A_i).

In measure-theoretic terms, Boole's inequality follows from the fact that a measure (and certainly any probability measure) is ''σ''-sub-additive. Thus Boole's inequality holds not only for probability measures ℙ, but more generally when ℙ is replaced by any finite measure.
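The statement can be checked exactly on a small finite probability space. The following is a minimal sketch, assuming a fair six-sided die with the uniform measure; the sample space `OMEGA`, the events in `EVENTS`, and the helper names `prob` and `union_bound_sides` are illustrative choices, not part of the article.

```python
from fractions import Fraction

# Hypothetical example: a fair six-sided die with three overlapping events.
OMEGA = frozenset(range(1, 7))
EVENTS = [frozenset({1, 2}), frozenset({2, 3, 4}), frozenset({4, 6})]


def prob(event):
    """Probability of an event under the uniform measure on OMEGA."""
    return Fraction(len(event & OMEGA), len(OMEGA))


def union_bound_sides(events):
    """Return (P(union of the events), sum of the individual probabilities)."""
    union = frozenset().union(*events)
    return prob(union), sum(prob(e) for e in events)


lhs, rhs = union_bound_sides(EVENTS)
print(lhs, "<=", rhs)  # prints: 5/6 <= 7/6
```

The right-hand side exceeds 1 here, which illustrates that the union bound is always valid but not always informative when the events overlap heavily.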

Proof

Proof using induction

Boole's inequality may be proved for finite collections of ''n'' events using the method of induction. For the ''n'' = 1 case, it follows that

\mathbb P(A_1) \le \mathbb P(A_1).

Suppose the inequality holds for some ''n'', that is,

\mathbb P\left(\bigcup_{i=1}^{n} A_i \right) \le \sum_{i=1}^{n} \mathbb P(A_i).

Since

\mathbb P(A \cup B) = \mathbb P(A) + \mathbb P(B) - \mathbb P(A \cap B),

and because the union operation is associative, we have

\mathbb P\left(\bigcup_{i=1}^{n+1} A_i\right) = \mathbb P\left(\bigcup_{i=1}^n A_i\right) + \mathbb P(A_{n+1}) - \mathbb P\left(\bigcup_{i=1}^n A_i \cap A_{n+1}\right).

Since

\mathbb P\left(\bigcup_{i=1}^n A_i \cap A_{n+1}\right) \ge 0,

by the first axiom of probability, we have

\mathbb P\left(\bigcup_{i=1}^{n+1} A_i \right) \le \mathbb P\left(\bigcup_{i=1}^n A_i\right) + \mathbb P(A_{n+1}),

and therefore, by the induction hypothesis,

\mathbb P\left(\bigcup_{i=1}^{n+1} A_i \right) \le \sum_{i=1}^{n} \mathbb P(A_i) + \mathbb P(A_{n+1}) = \sum_{i=1}^{n+1} \mathbb P(A_i).
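The induction can be traced numerically by adding one event at a time and applying the two-event identity at each step. This is only a sketch under stated assumptions: a uniform measure on a hypothetical 12-point sample space, with the names `OMEGA`, `EVENTS`, `prob` and `induction_trace` invented for illustration.

```python
from fractions import Fraction

# Hypothetical uniform sample space and events, chosen only for illustration.
OMEGA = frozenset(range(12))
EVENTS = [
    frozenset({0, 1, 2}),
    frozenset({2, 3}),
    frozenset({3, 4, 5, 6}),
    frozenset({0, 6, 7}),
]


def prob(event):
    """Probability of an event under the uniform measure on OMEGA."""
    return Fraction(len(event), len(OMEGA))


def induction_trace(events):
    """For n = 1, 2, ..., return pairs (P(A_1 ∪ ... ∪ A_n), P(A_1) + ... + P(A_n))."""
    union = frozenset()
    total = Fraction(0)
    trace = []
    for a in events:
        # Two-event identity: P(U ∪ A) = P(U) + P(A) - P(U ∩ A).
        p_union = prob(union) + prob(a) - prob(union & a)
        union = union | a
        total += prob(a)
        trace.append((p_union, total))
    return trace


for n, (p_union, bound) in enumerate(induction_trace(EVENTS), start=1):
    print(n, p_union, bound, p_union <= bound)
```

At every step the subtracted intersection term is nonnegative, so the running union probability never exceeds the running sum, which is exactly the inductive step.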

Proof without using induction

Let events ''A''₁, ''A''₂, ''A''₃, ... in our probability space be given. The countable additivity of the measure ℙ states that if ''B''₁, ''B''₂, ''B''₃, ... are pairwise disjoint events, then

\mathbb P\left(\bigcup_{i} B_i\right) = \sum_i \mathbb P(B_i).

Set

B_i := A_i - \bigcup_{j=1}^{i-1} A_j.

Then ''B''₁, ''B''₂, ''B''₃, ... are pairwise disjoint. We claim that

\bigcup_{i=1}^{\infty} A_i = \bigcup_{i=1}^{\infty} B_i.

One inclusion is clear: since ''B''ᵢ ⊂ ''A''ᵢ for all ''i'', we have

\bigcup_{i=1}^{\infty} B_i \subset \bigcup_{i=1}^{\infty} A_i.

For the other inclusion, let

x \in \bigcup_{i=1}^{\infty} A_i

be given. Write ''k'' for the minimum positive integer such that ''x'' ∈ ''A''ₖ. Then

x \in A_k - \bigcup_{j=1}^{k-1} A_j = B_k.

Thus

x \in \bigcup_{i=1}^{\infty} B_i,

and hence

\bigcup_{i=1}^{\infty} A_i \subset \bigcup_{i=1}^{\infty} B_i.

Therefore

\mathbb P\left(\bigcup_iA_i\right) = \mathbb P\left(\bigcup_iB_i\right) = \sum_i \mathbb P (B_i) \leq \sum_i \mathbb P(A_i),

where the last inequality holds because ''B''ᵢ ⊂ ''A''ᵢ implies that ℙ(''B''ᵢ) ≤ ℙ(''A''ᵢ) for all ''i''.
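The construction of the disjoint pieces used in this proof can be sketched for a finite list of events represented as Python sets. The function name `disjointify` and the sample events are assumptions made for illustration.

```python
def disjointify(events):
    """Return the pieces B_i = A_i minus (A_1 ∪ ... ∪ A_{i-1}), in order."""
    seen = set()      # union of all events processed so far
    pieces = []
    for a in events:
        pieces.append(frozenset(a) - seen)
        seen |= set(a)
    return pieces


# Hypothetical events on the outcomes 1..5.
A = [{1, 2, 3}, {2, 3, 4}, {1, 5}]
B = disjointify(A)
print(B)
```

Each outcome lands in the piece indexed by the first event containing it, which mirrors the choice of the minimal index ''k'' in the argument above. The pieces are therefore pairwise disjoint, each is contained in the corresponding event, and together they cover the same union.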

Bonferroni inequalities

Boole's inequality for a finite number of events may be generalized to certain upper and lower bounds on the probability of finite unions of events. These bounds are known as Bonferroni inequalities, after Carlo Emilio Bonferroni. Let

S_1 := \sum_{i=1}^n \mathbb P(A_i), \quad S_2 := \sum_{1 \le i_1 < i_2 \le n} \mathbb P(A_{i_1} \cap A_{i_2}), \quad \ldots, \quad S_k := \sum_{1 \le i_1 < \cdots < i_k \le n} \mathbb P(A_{i_1} \cap \cdots \cap A_{i_k})

for all integers ''k'' in {1, ..., ''n''}. Then, when ''K'' ≤ ''n'' is odd,

\sum_{j=1}^K (-1)^{j-1} S_j \geq \mathbb P\Big(\bigcup_{i=1}^n A_i\Big) = \sum_{j=1}^n (-1)^{j-1} S_j

holds, and when ''K'' ≤ ''n'' is even,

\sum_{j=1}^K (-1)^{j-1} S_j \leq \mathbb P\Big(\bigcup_{i=1}^n A_i\Big) = \sum_{j=1}^n (-1)^{j-1} S_j

holds. The inequalities follow from the inclusion–exclusion principle, and Boole's inequality is the special case ''K'' = 1. Since the proof of the inclusion–exclusion principle requires only the finite additivity (and nonnegativity) of ℙ, the Bonferroni inequalities hold more generally when ℙ is replaced by any finite content, in the sense of measure theory.
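The alternating behaviour of the partial sums can be observed on a small example. The sketch below assumes a uniform measure on a hypothetical 8-point sample space; `OMEGA`, `EVENTS`, `prob`, `s` and `bonferroni_partial_sums` are illustrative names rather than an established API.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical uniform sample space and events, chosen only for illustration.
OMEGA = frozenset(range(8))
EVENTS = [
    frozenset({0, 1, 2, 3}),
    frozenset({2, 3, 4}),
    frozenset({3, 4, 5, 6}),
    frozenset({0, 3, 6}),
]


def prob(event):
    """Probability of an event under the uniform measure on OMEGA."""
    return Fraction(len(event), len(OMEGA))


def s(k, events):
    """S_k: sum of P(A_{i_1} ∩ ... ∩ A_{i_k}) over all k-element subfamilies."""
    return sum(
        (prob(frozenset.intersection(*combo)) for combo in combinations(events, k)),
        Fraction(0),
    )


def bonferroni_partial_sums(events):
    """Return the partial sums of (-1)^(j-1) * S_j for K = 1, ..., n."""
    partial, total = [], Fraction(0)
    for j in range(1, len(events) + 1):
        total += (-1) ** (j - 1) * s(j, events)
        partial.append(total)
    return partial


p_union = prob(frozenset().union(*EVENTS))
for K, bound in enumerate(bonferroni_partial_sums(EVENTS), start=1):
    relation = ">=" if K % 2 == 1 else "<="
    print(f"K={K}: {bound} {relation} {p_union}")
```

Odd values of ''K'' give upper bounds, even values give lower bounds, and the final partial sum (''K'' = ''n'') equals the probability of the union by inclusion–exclusion.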

Proof for odd K

Let

E = \bigcap_{i=1}^n B_i,

where

B_i \in \{A_i, A_i^c\}

for each ''i'' = 1, ..., ''n''. The sets ''E'' of this form partition the sample space, and for each ''E'' and every ''i'', ''E'' is either contained in ''A''ᵢ or disjoint from it. If

E = \bigcap_{i=1}^n A_i^c,

then ''E'' contributes 0 to both sides of the inequality. Otherwise, assume ''E'' is contained in exactly ''L'' of the ''A''ᵢ. Then ''E'' contributes exactly ℙ(''E'') to the right side of the inequality, while it contributes

\sum_{j=1}^K (-1)^{j-1} \binom{L}{j} \mathbb P(E)

to the left side of the inequality. By Pascal's rule, this is equal to

\sum_{j=1}^K (-1)^{j-1} \Big( \binom{L-1}{j-1} + \binom{L-1}{j} \Big) \mathbb P(E),

which telescopes to

\Big( 1 + \binom{L-1}{K} \Big) \mathbb P(E) \geq \mathbb P(E).

Thus the inequality holds for the contribution of every such ''E'', and so by summing over ''E'', we obtain the desired inequality:

\sum_{j=1}^K (-1)^{j-1} S_j \geq \mathbb P\Big(\bigcup_{i=1}^n A_i\Big)

The proof for even ''K'' is nearly identical.
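The telescoping step can be checked numerically for small values. The following sketch compares the alternating binomial sum with its closed form, written with a sign factor so that it covers both odd and even ''K''; the function names `left_coefficient` and `telescoped` are hypothetical.

```python
from math import comb


def left_coefficient(L, K):
    """Coefficient of P(E) on the left side: sum over j of (-1)^(j-1) * C(L, j)."""
    return sum((-1) ** (j - 1) * comb(L, j) for j in range(1, K + 1))


def telescoped(L, K):
    """Closed form after applying Pascal's rule: 1 + (-1)^(K-1) * C(L-1, K)."""
    return 1 + (-1) ** (K - 1) * comb(L - 1, K)


for L in range(1, 6):
    for K in range(1, 6):
        coeff = left_coefficient(L, K)
        assert coeff == telescoped(L, K)
        # Odd K: coefficient >= 1 (upper bound); even K: coefficient <= 1 (lower bound).
        print(L, K, coeff)
```

For odd ''K'' the sign factor is +1, which recovers the expression in the proof; for even ''K'' it is −1, which gives a coefficient of at most 1 and hence the reversed inequality.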

Example

Suppose that you are estimating five parameters based on a random sample, and you can control the accuracy of each estimate separately. If you want all five estimates to be good with probability 95%, how good must each individual estimate be? Making each estimate good with probability 95% is not enough, because the event "all are good" is a subset of each event "estimate ''i'' is good" and may therefore have probability well below 95%. Boole's inequality solves this problem. Taking the complement of the event "all five are good" turns the requirement into a condition on the event "at least one estimate is bad":

:''P''(at least one estimate is bad) ≤ ''P''(''A''₁ is bad) + ''P''(''A''₂ is bad) + ''P''(''A''₃ is bad) + ''P''(''A''₄ is bad) + ''P''(''A''₅ is bad)

If the right-hand side is at most 0.05, then so is the left-hand side. One way to achieve this is to make each term equal to 0.05/5 = 0.01, that is, 1%. In other words, you have to guarantee that each estimate is good with probability 99% (for example, by constructing a 99% confidence interval) to make sure that all five estimates are good with probability at least 95%. This is called the Bonferroni method of simultaneous inference.
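The arithmetic of this example can be written out as a short sketch. The helper names `per_estimate_error` and `guaranteed_joint_coverage` are hypothetical, and exact fractions are used to avoid floating-point rounding.

```python
from fractions import Fraction


def per_estimate_error(family_error, num_estimates):
    """Split the allowed overall error probability equally across the estimates."""
    return Fraction(family_error) / num_estimates


def guaranteed_joint_coverage(per_error, num_estimates):
    """Lower bound on P(all estimates good) given by Boole's inequality."""
    return 1 - num_estimates * Fraction(per_error)


family_error = Fraction(5, 100)  # at most a 5% chance that any estimate is bad
n = 5

# Naive choice: each estimate is bad with probability 5%.
print(guaranteed_joint_coverage(family_error, n))   # prints: 3/4

# Bonferroni choice: each estimate is bad with probability 0.05 / 5 = 1%.
alpha = per_estimate_error(family_error, n)
print(alpha)                                        # prints: 1/100
print(guaranteed_joint_coverage(alpha, n))          # prints: 19/20
```

With 95% per-estimate coverage the union bound only guarantees 75% joint coverage, whereas 99% per-estimate coverage guarantees the required 95%.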
