In probability theory and statistics, the Poisson binomial distribution is the discrete probability distribution of a sum of independent Bernoulli trials that are not necessarily identically distributed. The concept is named after Siméon Denis Poisson. In other words, it is the probability distribution of the number of successes in a collection of ''n'' independent yes/no experiments with success probabilities p_1, p_2, \dots, p_n. The ordinary binomial distribution is a special case of the Poisson binomial distribution, when all success probabilities are the same, that is p_1 = p_2 = \cdots = p_n.


Definitions


Probability mass function

The probability of having ''k'' successful trials out of a total of ''n'' can be written as the sum

:\Pr(K=k) = \sum\limits_{A\in F_k} \prod\limits_{i\in A} p_i \prod\limits_{j\in A^c} (1-p_j)

where F_k is the set of all subsets of ''k'' integers that can be selected from \{1,2,3,\dots,n\}. For example, if ''n'' = 3, then F_2=\left\{ \{1,2\},\{1,3\},\{2,3\} \right\}. A^c is the complement of A, i.e. A^c = \{1,2,3,\dots,n\} \smallsetminus A.

F_k will contain n!/((n-k)!k!) elements, the sum over which is infeasible to compute in practice unless the number of trials ''n'' is small (e.g. if ''n'' = 30, F_{15} contains over 10^{20} elements). However, there are other, more efficient ways to calculate \Pr(K=k).

As long as none of the success probabilities are equal to one, one can calculate the probability of ''k'' successes using the recursive formula

:\Pr(K=k)= \begin{cases} \prod\limits_{i=1}^n (1-p_i) & k=0 \\ \dfrac{1}{k} \sum\limits_{i=1}^k (-1)^{i-1}\Pr(K=k-i)\,T(i) & k>0 \end{cases}

where

:T(i)=\sum\limits_{j=1}^n \left( \dfrac{p_j}{1-p_j} \right)^i.

The recursive formula is not numerically stable, and should be avoided if n is greater than approximately 20.
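As a concrete illustration, the recursion translates directly into code. The following is a minimal Python sketch under the stated assumptions (every p_i < 1, small ''n''); the function name is illustrative, not from any particular library.

 import math

 def pmf_recursive(ps):
     """PMF of a Poisson binomial via the recursive formula.

     Assumes every p_i < 1; numerically unstable for n greater than ~20.
     """
     n = len(ps)

     def t(i):
         # T(i) = sum_j (p_j / (1 - p_j))^i
         return sum((p / (1.0 - p)) ** i for p in ps)

     pmf = [0.0] * (n + 1)
     pmf[0] = math.prod(1.0 - p for p in ps)  # Pr(K = 0)
     for k in range(1, n + 1):
         pmf[k] = sum((-1) ** (i - 1) * pmf[k - i] * t(i)
                      for i in range(1, k + 1)) / k
     return pmf

 print(pmf_recursive([0.1, 0.5, 0.9]))  # Pr(K = 0), ..., Pr(K = 3)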
An alternative is to use a divide-and-conquer algorithm: assuming n = 2^b is a power of two, and denoting by f(p_{i:j}) the Poisson binomial PMF of p_i, \dots, p_j and by * the convolution operator, we have

:f(p_{1:2^b}) = f(p_{1:2^{b-1}}) * f(p_{2^{b-1}+1:2^b}).

More generally, the probability mass function of a Poisson binomial can be expressed as the convolution of the vectors P_1, \dots, P_n where P_i = [1-p_i, p_i]. This observation leads to the direct convolution (DC) algorithm for computing \Pr(K=0) through \Pr(K=n):

 ''// PMF and nextPMF begin at index 0''
 function DC(p_1, \dots, p_n) is
     declare new PMF array of size 1
     PMF[0] = 1
     for i = 1 to n do
         declare new nextPMF array of size i + 1
         nextPMF[0] = (1 - p_i) * PMF[0]
         nextPMF[i] = p_i * PMF[i - 1]
         for k = 1 to i - 1 do
             nextPMF[k] = p_i * PMF[k - 1] + (1 - p_i) * PMF[k]
         repeat
         PMF = nextPMF
     repeat
     return PMF
 end function

\Pr(K=k) will be found in PMF[k]. DC is numerically stable, exact, and, when implemented as a software routine, exceptionally fast for n \leq 2000. It can also be quite fast for larger n, depending on the distribution of the p_i.
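The pseudocode above transcribes almost line-for-line into Python. The following sketch (our naming, not a library routine) is reused by the later examples in this article:

 def pmf_dc(ps):
     """Direct convolution (DC): exact, numerically stable Poisson binomial PMF."""
     pmf = [1.0]  # PMF of the empty sum: Pr(K = 0) = 1
     for i, p in enumerate(ps, start=1):
         next_pmf = [0.0] * (i + 1)
         next_pmf[0] = (1.0 - p) * pmf[0]
         next_pmf[i] = p * pmf[i - 1]
         for k in range(1, i):
             next_pmf[k] = p * pmf[k - 1] + (1.0 - p) * pmf[k]
         pmf = next_pmf
     return pmf  # pmf[k] == Pr(K = k)

 print(pmf_dc([0.1, 0.5, 0.9]))  # matches the recursive sketch above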
Another possibility is using the discrete Fourier transform:

:\Pr(K=k)=\frac{1}{n+1} \sum_{\ell=0}^n C^{-\ell k} \prod_{m=1}^n \left( 1+(C^\ell-1) p_m \right)

where C=\exp\left( \frac{2i\pi}{n+1} \right) and i=\sqrt{-1}.

Still other methods are described in "Statistical Applications of the Poisson-Binomial and Conditional Bernoulli Distributions" by Chen and Liu and in "A simple and fast method for computing the Poisson binomial distribution function" by Biscarri et al.
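The sum over \ell in the DFT expression is exactly what np.fft.fft computes, so the formula can be evaluated in a few lines of NumPy. A minimal sketch (ours, not one of the referenced implementations):

 import numpy as np

 def pmf_dft(ps):
     """Poisson binomial PMF via the discrete Fourier transform."""
     n = len(ps)
     ell = np.arange(n + 1)
     C = np.exp(2j * np.pi / (n + 1))
     # x[l] = prod_m (1 + (C^l - 1) p_m), the characteristic function on a grid
     x = np.prod(1.0 + (C ** ell[:, None] - 1.0) * np.asarray(ps)[None, :], axis=1)
     # np.fft.fft computes sum_l x[l] * C^(-l k); dividing by n+1 gives the PMF
     return np.real(np.fft.fft(x)) / (n + 1)

 print(pmf_dft([0.1, 0.5, 0.9]))  # agrees with the DC and recursive sketches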


Cumulative distribution function

The cumulative distribution function (CDF) can be expressed as:

:\Pr(K \leq k) = \sum_{\ell=0}^{k} \sum_{A\in F_\ell} \prod_{i\in A} p_i \prod_{j\in A^c} (1-p_j),

where F_\ell is the set of all subsets of size \ell that can be selected from \{1,2,3,\dots,n\}. It can be computed by invoking the DC function above, and then adding elements 0 through k of the returned PMF array.
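For instance, a short helper in Python (ours, building on the pmf_dc sketch above):

 def cdf_dc(ps, k):
     """Pr(K <= k), by summing the first k+1 entries of the DC PMF."""
     return sum(pmf_dc(ps)[:k + 1])  # pmf_dc as sketched earlier

 print(cdf_dc([0.1, 0.5, 0.9], 1))  # Pr(K <= 1)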


Properties


Mean and variance

Since a Poisson binomial distributed variable is a sum of ''n'' independent Bernoulli distributed variables, its mean and variance will simply be sums of the means and variances of the ''n'' Bernoulli distributions:

:\mu = \sum\limits_{i=1}^n p_i

:\sigma^2 =\sum\limits_{i=1}^n (1-p_i) p_i
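These identities are easy to check numerically against the exact PMF; a quick sketch reusing pmf_dc from above (example probabilities are arbitrary):

 ps = [0.1, 0.5, 0.9]
 pmf = pmf_dc(ps)  # exact PMF from the earlier sketch
 mean = sum(k * q for k, q in enumerate(pmf))
 var = sum((k - mean) ** 2 * q for k, q in enumerate(pmf))
 assert abs(mean - sum(ps)) < 1e-12                      # mu = sum of p_i
 assert abs(var - sum(p * (1 - p) for p in ps)) < 1e-12  # sigma^2 = sum of p_i(1-p_i)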


Entropy

There is no simple formula for the entropy of a Poisson binomial distribution, but the entropy is bounded above by the entropy of a binomial distribution with the same number parameter and the same mean. Therefore, the entropy is also bounded above by the entropy of a Poisson distribution with the same mean. The Shepp–Olkin concavity conjecture, due to Lawrence Shepp and Ingram Olkin in 1981, states that the entropy of a Poisson binomial distribution is a concave function of the success probabilities p_1, p_2, \dots , p_n. This conjecture was proved by Erwan Hillion and Oliver Johnson in 2015. The Shepp–Olkin monotonicity conjecture, also from the same 1981 paper, is that the entropy is monotone increasing in p_i, if all p_i \leq 1/2. This conjecture was also proved by Hillion and Johnson, in 2019.
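The binomial upper bound can be probed numerically. Below is a sketch (our helper names, using scipy.stats.binom and the pmf_dc sketch above) comparing the entropy of a Poisson binomial against a binomial with the same ''n'' and the same mean:

 import numpy as np
 from scipy.stats import binom

 def entropy(pmf):
     q = np.asarray(pmf)
     q = q[q > 0]
     return float(-np.sum(q * np.log(q)))

 ps = [0.1, 0.5, 0.9]
 n, p_bar = len(ps), float(np.mean(ps))
 h_pb = entropy(pmf_dc(ps))                              # Poisson binomial entropy
 h_bin = entropy(binom.pmf(np.arange(n + 1), n, p_bar))  # same n, same mean
 assert h_pb <= h_bin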


Chernoff bound

The probability that a Poisson binomial distributed variable K = \sum_i X_i gets large can be bounded using its moment generating function as follows (valid when s \geq \mu and for any t>0):

:\begin{align}
\Pr[K \ge s] &\le \exp(-st)\operatorname{E}\left[\exp\left(t \sum_i X_i\right)\right] \\
&= \exp(-st)\prod_i (1-p_i+e^t p_i) \\
&= \exp\left(-st + \sum_i \log\left( p_i (e^t - 1) + 1\right)\right) \\
&\le \exp\left(-st + \sum_i \log\left( \exp(p_i(e^t-1))\right)\right) \\
&= \exp\left(-st + \sum_i p_i(e^t-1)\right) \\
&= \exp\left(s-\mu-s\log\frac{s}{\mu}\right),
\end{align}

where we took t=\log\left(s/\mu\right). This is similar to the tail bounds of a binomial distribution.
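As a quick numeric illustration of the final bound (reusing pmf_dc from above; the example probabilities and threshold are arbitrary):

 import math

 ps = [0.3] * 10 + [0.6] * 10
 mu = sum(ps)                # mean, here 9.0
 s = 13                      # any threshold with s >= mu
 tail = sum(pmf_dc(ps)[s:])  # exact Pr(K >= s)
 bound = math.exp(s - mu - s * math.log(s / mu))
 assert tail <= bound        # the Chernoff bound holds, if loosely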


Related distributions


Approximation by binomial distribution

A Poisson binomial distribution PB can be approximated by a binomial distribution B where \mu, the mean of the p_i, is the success probability of B. The variances of PB and B are related by the formula

:\operatorname{Var}(PB)=\operatorname{Var}(B) - \sum_{i=1}^n (p_i-\mu)^2 .

As can be seen, the closer the p_i are to \mu, that is, the more the p_i tend to homogeneity, the larger PB's variance. When all the p_i are equal to \mu, PB becomes B, \operatorname{Var}(PB)=\operatorname{Var}(B), and the variance is at its maximum.

Ehm has determined bounds for the total variation distance of PB and B, in effect providing bounds on the error introduced when approximating PB with B. Let \nu = 1 - \mu and let d(PB,B) be the total variation distance of PB and B. Then

:d(PB,B) \le \left(1-\mu^{n+1}-\nu^{n+1}\right) \frac{\sum_{i=1}^n (p_i-\mu)^2}{(n+1)\mu\nu}

:d(PB,B) \ge C \min\left\{1, \frac{1}{n\mu\nu}\right\} \sum_{i=1}^n (p_i-\mu)^2

where C \ge \frac{1}{124}. d(PB,B) tends to 0 if and only if \operatorname{Var}(PB)/\operatorname{Var}(B) tends to 1.
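The sketch below (our code, using scipy.stats.binom and the pmf_dc sketch above) computes the total variation distance for a small example and checks it against Ehm's upper bound:

 import numpy as np
 from scipy.stats import binom

 ps = np.array([0.2, 0.3, 0.5, 0.7, 0.8])
 n, mu = len(ps), float(ps.mean())
 nu = 1.0 - mu
 pb = np.array(pmf_dc(list(ps)))         # exact PMF, from the earlier sketch
 b = binom.pmf(np.arange(n + 1), n, mu)  # approximating binomial B(n, mu)
 tv = 0.5 * np.abs(pb - b).sum()         # total variation distance
 upper = ((1 - mu ** (n + 1) - nu ** (n + 1))
          * ((ps - mu) ** 2).sum() / ((n + 1) * mu * nu))
 assert tv <= upper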


Approximation by Poisson distribution

A Poisson binomial distribution PB can also be approximated by a Poisson distribution Po with mean \lambda=\sum_{i=1}^n p_i. Barbour and Hall have shown that

:\frac{1}{32}\min\left\{\frac{1}{\lambda},1\right\} \sum_{i=1}^n p_i^2 \le d(PB, Po) \le \frac{1-e^{-\lambda}}{\lambda} \sum_{i=1}^n p_i^2

where d(PB,Po) is the total variation distance of PB and Po. It can be seen that the smaller the p_i, the better Po approximates PB. As \operatorname{Var}(Po)=\lambda=\sum_{i=1}^n p_i and \operatorname{Var}(PB) =\sum\limits_{i=1}^n p_i-\sum\limits_{i=1}^n p_i^2, we have \operatorname{Var}(Po)>\operatorname{Var}(PB); so a Poisson binomial distribution's variance is bounded above by that of a Poisson distribution with \lambda=\sum_{i=1}^n p_i, and the smaller the p_i, the closer \operatorname{Var}(Po) will be to \operatorname{Var}(PB).
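Analogously, a sketch (ours, using scipy.stats.poisson and the pmf_dc sketch above) that checks the Barbour–Hall upper bound on a small-p_i example:

 import numpy as np
 from scipy.stats import poisson

 ps = np.array([0.02, 0.05, 0.01, 0.03, 0.04])  # small p_i: the Poisson regime
 lam = float(ps.sum())
 pb = np.array(pmf_dc(list(ps)))                # exact PMF, from the earlier sketch
 po = poisson.pmf(np.arange(len(ps) + 1), lam)
 # TV distance; poisson.sf adds the Poisson mass beyond n, where PB has none
 tv = 0.5 * (np.abs(pb - po).sum() + poisson.sf(len(ps), lam))
 upper = (1 - np.exp(-lam)) / lam * (ps ** 2).sum()
 assert tv <= upper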


Computational methods

The reference discusses techniques of evaluating the probability mass function of the Poisson binomial distribution. The following software implementations are based on it:

* An R package ''poibin'' was provided along with the paper, and is available for computing the CDF, PMF, quantile function, and random number generation of the Poisson binomial distribution. For computing the PMF, a DFT algorithm or a recursive algorithm can be specified to compute the exact PMF, and approximation methods using the normal and Poisson distributions can also be specified.
* ''poibin'' – a Python implementation – can compute the PMF and CDF; it uses the DFT method described in the paper for doing so.


See also

* Le Cam's theorem

