In probability theory and statistics, the Poisson binomial distribution is the discrete probability distribution of a sum of independent Bernoulli trials that are not necessarily identically distributed. The concept is named after Siméon Denis Poisson.

In other words, it is the probability distribution of the number of successes in a collection of ''n'' independent yes/no experiments with success probabilities p_1, p_2, \dots , p_n. The ordinary binomial distribution is a special case of the Poisson binomial distribution, when all success probabilities are the same, that is p_1 = p_2 = \cdots = p_n.
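The reduction to the ordinary binomial distribution in the equal-probability case can be checked numerically with a short Python sketch (illustrative only; the function name is ad hoc, not from any package):

```python
from itertools import product
from math import comb, prod

def poisson_binomial_pmf(k, p):
    """Exact Pr(K = k) by enumerating all 2^n success/failure outcomes.
    Feasible only for small n; shown purely as a sanity check."""
    n = len(p)
    return sum(
        prod(p[i] if x else 1 - p[i] for i, x in enumerate(outcome))
        for outcome in product([0, 1], repeat=n)
        if sum(outcome) == k
    )

# With all p_i equal, the distribution reduces to Binomial(n, p).
n, p = 5, 0.3
for k in range(n + 1):
    binom_pmf = comb(n, k) * p ** k * (1 - p) ** (n - k)
    assert abs(poisson_binomial_pmf(k, [p] * n) - binom_pmf) < 1e-12
```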


Definitions


Probability Mass Function

The probability of having ''k'' successful trials out of a total of ''n'' can be written as the sum

:\Pr(K=k) = \sum\limits_{A\in F_k} \prod\limits_{i\in A} p_i \prod\limits_{j\in A^c} (1-p_j)

where F_k is the set of all subsets of ''k'' integers that can be selected from \{1,2,3,\dots,n\}. For example, if ''n'' = 3, then F_2=\left\{ \{1,2\},\{1,3\},\{2,3\} \right\}. A^c is the complement of A, i.e. A^c = \{1,2,3,\dots,n\}\setminus A.

F_k will contain n!/((n-k)!k!) elements, so the sum is infeasible to compute in practice unless the number of trials ''n'' is small (e.g. if ''n'' = 30, F_{15} contains over 10^{20} elements). However, there are other, more efficient ways to calculate \Pr(K=k).

As long as none of the success probabilities is equal to one, one can calculate the probability of ''k'' successes using the recursive formula

:\Pr (K=k)= \begin{cases} \prod\limits_{i=1}^n (1-p_i) & k=0 \\ \dfrac{1}{k} \sum\limits_{i=1}^k (-1)^{i-1}\Pr (K=k-i)\,T(i) & k>0 \end{cases}

where

: T(i)=\sum\limits_{j=1}^n \left( \frac{p_j}{1-p_j} \right)^i.

The recursive formula is not numerically stable, and should be avoided if n is greater than approximately 20.

An alternative is to use a divide-and-conquer algorithm: if we assume n = 2^b is a power of two, then, denoting by f(p_{i:j}) the Poisson binomial of p_i, \dots, p_j and by * the convolution operator, we have f(p_{1:2^b}) = f(p_{1:2^{b-1}})*f(p_{2^{b-1}+1:2^b}).

Another possibility is using the discrete Fourier transform:

:\Pr (K=k)=\frac{1}{n+1} \sum\limits_{l=0}^n C^{-lk} \prod\limits_{m=1}^n \left( 1+(C^l-1) p_m \right)

where C=\exp \left( \frac{2i\pi}{n+1} \right) and i=\sqrt{-1}.

Still other methods are described in "Statistical Applications of the Poisson-Binomial and Conditional Bernoulli Distributions" by Chen and Liu.
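The defining sum, the recursive formula, and the DFT formula can be sketched in Python; this is a minimal illustration under the assumptions noted in the comments, not a reference implementation (function names are ad hoc):

```python
import cmath
from itertools import combinations
from math import prod

def pmf_bruteforce(k, p):
    """Direct evaluation of the defining sum over all k-subsets (small n only)."""
    n = len(p)
    return sum(
        prod(p[i] for i in A) * prod(1 - p[j] for j in range(n) if j not in A)
        for A in map(set, combinations(range(n), k))
    )

def pmf_recursive(p):
    """Recursive formula; assumes no p_i equals 1 and n below ~20 for stability."""
    n = len(p)
    odds = [pi / (1 - pi) for pi in p]
    T = lambda i: sum(o ** i for o in odds)
    pr = [prod(1 - pi for pi in p)]  # Pr(K = 0)
    for k in range(1, n + 1):
        pr.append(sum((-1) ** (i - 1) * pr[k - i] * T(i)
                      for i in range(1, k + 1)) / k)
    return pr

def pmf_dft(p):
    """Discrete-Fourier-transform formula with C = exp(2*pi*i/(n+1))."""
    n = len(p)
    C = cmath.exp(2j * cmath.pi / (n + 1))
    pr = []
    for k in range(n + 1):
        total = 0j
        for l in range(n + 1):
            term = C ** (-l * k)
            for pm in p:
                term *= 1 + (C ** l - 1) * pm
            total += term
        pr.append(total.real / (n + 1))
    return pr

p = [0.1, 0.4, 0.5, 0.8]
rec, dft = pmf_recursive(p), pmf_dft(p)
assert all(abs(r - d) < 1e-9 for r, d in zip(rec, dft))
assert all(abs(pmf_bruteforce(k, p) - rec[k]) < 1e-9 for k in range(len(p) + 1))
```

All three methods agree on small examples; in practice only the convolution and DFT approaches scale to large ''n''.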


Properties


Mean and Variance

Since a Poisson binomial distributed variable is a sum of ''n'' independent Bernoulli distributed variables, its mean and variance will simply be sums of the means and variances of the ''n'' Bernoulli distributions:

: \mu = \sum\limits_{i=1}^n p_i

: \sigma^2 =\sum\limits_{i=1}^n (1-p_i) p_i

For fixed values of the mean (\mu) and size (''n''), the variance is maximal when all success probabilities are equal and we have a binomial distribution. When the mean is fixed, the variance is bounded from above by the variance of the Poisson distribution with the same mean, which is attained asymptotically as ''n'' tends to infinity.
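The mean and variance formulas can be verified against direct enumeration with a short Python sketch (the probability values are chosen arbitrarily for illustration):

```python
from itertools import product
from math import prod

def moments(p):
    """Mean and variance of the Poisson binomial by enumerating all 2^n outcomes."""
    mean = second = 0.0
    for outcome in product([0, 1], repeat=len(p)):
        w = prod(p[i] if x else 1 - p[i] for i, x in enumerate(outcome))
        k = sum(outcome)
        mean += w * k
        second += w * k * k
    return mean, second - mean ** 2

p = [0.2, 0.3, 0.9]
mu, var = moments(p)
assert abs(mu - sum(p)) < 1e-12                              # mu = sum p_i
assert abs(var - sum(pi * (1 - pi) for pi in p)) < 1e-12     # sigma^2 = sum p_i(1-p_i)

# With the same mean spread equally (the binomial case), the variance is larger:
q = [sum(p) / len(p)] * len(p)
assert sum(qi * (1 - qi) for qi in q) >= var
```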


Entropy

There is no simple formula for the entropy of a Poisson binomial distribution, but the entropy is bounded above by the entropy of a binomial distribution with the same number parameter ''n'' and the same mean. Therefore, the entropy is also bounded above by the entropy of a Poisson distribution with the same mean.

The Shepp–Olkin concavity conjecture, due to Lawrence Shepp and Ingram Olkin in 1981, states that the entropy of a Poisson binomial distribution is a concave function of the success probabilities p_1, p_2, \dots , p_n. This conjecture was proved by Erwan Hillion and Oliver Johnson in 2015. The Shepp–Olkin monotonicity conjecture, also from the same 1981 paper, is that the entropy is monotone increasing in p_i, if all p_i \leq 1/2. This conjecture was also proved by Hillion and Johnson, in 2019.
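The binomial entropy bound stated above can be illustrated numerically. The sketch below computes the PMF by sequentially convolving the ''n'' Bernoulli distributions (a simple special case of the convolution approach; probabilities are chosen arbitrarily):

```python
from math import log

def pb_pmf(p):
    """PMF of the Poisson binomial via sequential convolution of Bernoulli PMFs."""
    dist = [1.0]
    for pi in p:
        dist = [(1 - pi) * (dist[k] if k < len(dist) else 0.0)
                + pi * (dist[k - 1] if k >= 1 else 0.0)
                for k in range(len(dist) + 1)]
    return dist

def entropy(dist):
    """Shannon entropy in nats."""
    return -sum(q * log(q) for q in dist if q > 0)

p = [0.1, 0.3, 0.6, 0.7]
n, mu = len(p), sum(p)
h_pb = entropy(pb_pmf(p))
h_binom = entropy(pb_pmf([mu / n] * n))  # binomial with the same n and mean
assert h_pb <= h_binom + 1e-12           # the stated upper bound holds here
```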


Chernoff bound

The probability that a Poisson binomial distributed variable gets large can be bounded using its moment generating function as follows (valid when s \geq \mu and for any t>0):

:\begin{align} \Pr(K \ge s) &\le \exp(-st)\operatorname{E}\left[\exp\left(t \sum_i X_i\right)\right] \\ &= \exp(-st)\prod_i (1-p_i+e^t p_i) \\ &= \exp\left(-st + \sum_i \log\left( p_i (e^t - 1) + 1\right)\right) \\ &\le \exp\left(-st + \sum_i \log\left( \exp(p_i(e^t-1))\right)\right) \\ &= \exp\left(-st + \sum_i p_i(e^t-1)\right) \\ &= \exp\left(s-\mu-s\log\frac{s}{\mu}\right), \end{align}

where the X_i are the individual Bernoulli variables, K = \sum_i X_i, and we took t=\log\left(s\left/\sum_i p_i\right.\right) = \log(s/\mu). This is similar to the tail bounds of a binomial distribution.
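The final bound \exp(s-\mu-s\log(s/\mu)) can be compared against the exact tail probability for a small example (illustrative Python sketch; parameters are arbitrary):

```python
from itertools import product
from math import exp, log, prod

def tail(p, s):
    """Exact Pr(K >= s) by enumerating all 2^n outcomes (small n only)."""
    n = len(p)
    return sum(
        prod(p[i] if x else 1 - p[i] for i, x in enumerate(o))
        for o in product([0, 1], repeat=n)
        if sum(o) >= s
    )

def chernoff(p, s):
    """The bound exp(s - mu - s*log(s/mu)); valid for s >= mu."""
    mu = sum(p)
    return exp(s - mu - s * log(s / mu))

p = [0.1, 0.2, 0.3, 0.4, 0.5]  # mu = 1.5
for s in [2, 3, 4, 5]:
    assert tail(p, s) <= chernoff(p, s)  # bound dominates the exact tail
```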


Computational methods

The cited reference discusses techniques for evaluating the probability mass function of the Poisson binomial distribution. The following software implementations are based on it:

* An R package ''poibin'' was provided along with the paper, which is available for computing the cdf, pmf, quantile function, and random number generation of the Poisson binomial distribution. For computing the PMF, a DFT algorithm or a recursive algorithm can be specified to compute the exact PMF, and approximation methods using the normal and Poisson distributions can also be specified.
* ''poibin'' - a Python implementation - can compute the PMF and CDF; it uses the DFT method described in the paper for doing so.


See also

*Le Cam's theorem


References
