In
probability theory
Probability theory is the branch of mathematics concerned with probability. Although there are several different probability interpretations, probability theory treats the concept in a rigorous mathematical manner by expressing it through a set ...
and
statistics
Statistics (from German: '' Statistik'', "description of a state, a country") is the discipline that concerns the collection, organization, analysis, interpretation, and presentation of data. In applying statistics to a scientific, indust ...
, the Poisson binomial distribution is the
discrete probability distribution
In probability theory and statistics, a probability distribution is the mathematical function that gives the probabilities of occurrence of different possible outcomes for an experiment. It is a mathematical description of a random phenomenon ...
of a sum of
independent
Independent or Independents may refer to:
Arts, entertainment, and media Artist groups
* Independents (artist group), a group of modernist painters based in the New Hope, Pennsylvania, area of the United States during the early 1930s
* Independe ...
Bernoulli trial
In the theory of probability and statistics, a Bernoulli trial (or binomial trial) is a random experiment with exactly two possible outcomes, "success" and "failure", in which the probability of success is the same every time the experiment is c ...
s that are not necessarily identically distributed. The concept is named after
Siméon Denis Poisson
Baron Siméon Denis Poisson FRS FRSE (; 21 June 1781 – 25 April 1840) was a French mathematician and physicist who worked on statistics, complex analysis, partial differential equations, the calculus of variations, analytical mechanics, electri ...
.
In other words, it is the
probability distribution
In probability theory and statistics, a probability distribution is the mathematical function that gives the probabilities of occurrence of different possible outcomes for an experiment. It is a mathematical description of a random phenomenon ...
of the
number of successes in a collection of ''n''
independent
Independent or Independents may refer to:
Arts, entertainment, and media Artist groups
* Independents (artist group), a group of modernist painters based in the New Hope, Pennsylvania, area of the United States during the early 1930s
* Independe ...
yes/no experiments with success
probabilities
Probability is the branch of mathematics concerning numerical descriptions of how likely an event is to occur, or how likely it is that a proposition is true. The probability of an event is a number between 0 and 1, where, roughly speaking, ...
. The ordinary
binomial distribution
In probability theory and statistics, the binomial distribution with parameters ''n'' and ''p'' is the discrete probability distribution of the number of successes in a sequence of ''n'' independent experiments, each asking a yes–no ques ...
is a special case of the Poisson binomial distribution, when all success probabilities are the same, that is
.
Definitions
Probability Mass Function
The probability of having ''k'' successful trials out of a total of ''n'' can be written as the sum
:
where
is the set of all subsets of ''k'' integers that can be selected from . For example, if ''n'' = 3, then
.
is the
complement of
, i.e.
.
will contain
elements, the sum over which is infeasible to compute in practice unless the number of trials ''n'' is small (e.g. if ''n'' = 30,
contains over 10
20 elements). However, there are other, more efficient ways to calculate
.
As long as none of the success probabilities are equal to one, one can calculate the probability of ''k'' successes using the recursive formula
:
where
:
The recursive formula is not
numerically stable, and should be avoided if
is greater than approximately 20. An alternative is to use a
divide-and-conquer algorithm
In computer science, divide and conquer is an algorithm design paradigm. A divide-and-conquer algorithm recursively breaks down a problem into two or more sub-problems of the same or related type, until these become simple enough to be solved dire ...
: if we assume
is a power of two, denoting by
the Poisson binomial of
and
the
convolution
In mathematics (in particular, functional analysis), convolution is a mathematical operation on two functions ( and ) that produces a third function (f*g) that expresses how the shape of one is modified by the other. The term ''convolution'' ...
operator, we have
.
Another possibility is using the
discrete Fourier transform
In mathematics, the discrete Fourier transform (DFT) converts a finite sequence of equally-spaced samples of a function into a same-length sequence of equally-spaced samples of the discrete-time Fourier transform (DTFT), which is a comple ...
.
:
where
and
.
Still other methods are described in "Statistical Applications of the Poisson-Binomial and conditional Bernoulli distributions" by Chen and Liu.
Properties
Mean and Variance
Since a Poisson binomial distributed variable is a sum of ''n'' independent Bernoulli distributed variables, its mean and variance will simply be sums of the mean and variance of the ''n'' Bernoulli distributions:
:
:
For fixed values of the mean (
) and size (''n''), the variance is maximal when all success probabilities are equal and we have a binomial distribution. When the mean is fixed, the variance is bounded from above by the variance of the
Poisson distribution
In probability theory and statistics, the Poisson distribution is a discrete probability distribution that expresses the probability of a given number of events occurring in a fixed interval of time or space if these events occur with a known co ...
with the same mean which is attained asymptotically as ''n'' tends to infinity.
Entropy
There is no simple formula for the entropy of a Poisson binomial distribution, but the entropy is bounded above by the entropy of a binomial distribution with the same number parameter and the same mean. Therefore, the entropy is also bounded above by the entropy of a Poisson distribution with the same mean.
The Shepp–Olkin concavity conjecture, due to
Lawrence Shepp
Lawrence Alan Shepp (September 9, 1936 Brooklyn, NY – April 23, 2013, Tucson, AZ) was an American mathematician, specializing in statistics and computational tomography.
Shepp obtained his PhD from Princeton University in 1961 with a disser ...
and
Ingram Olkin
Ingram Olkin (July 23, 1924 – April 28, 2016) was a professor emeritus and chair of statistics and education at Stanford University and the Stanford Graduate School of Education. He is known for developing statistical analysis for evalua ...
in 1981, states that the entropy of a Poisson binomial distribution is a concave function of the success probabilities
. This conjecture was proved by Erwan Hillion and Oliver Johnson in 2015. The Shepp-Olkin monotonicity conjecture, also from the same 1981 paper, is that the entropy is monotone increasing in
, if all
. This conjecture was also proved by Hillion and Johnson, in 2019
Chernoff bound
The probability that a Poisson binomial distribution gets large, can be bounded using its moment generating function as follows (valid when
and for any
):
:
where_we_took_
t=\log\left(s\left/\sum_ip_i\right.\right)._This_is_similar_to_the_Binomial_distribution#Tail_bounds.html" ;"title="_\sum_i_X_i\rightright.html" ;"title="_\sum_i_X_i\right.html" ;"title="exp\left[t \sum_i X_i\right">exp\left[t \sum_i X_i\rightright">_\sum_i_X_i\right.html" ;"title="exp\left[t \sum_i X_i\right">exp\left[t \sum_i X_i\rightright\\&= \exp(-st)\prod_i (1-p_i+e^t p_i)
\\&= \exp\left(-st + \sum_i \log\left( p_i (e^t - 1) + 1\right)\right)
\\&\le \exp\left(-st + \sum_i \log\left( \exp(p_i(e^t-1))\right)\right)
\\&= \exp\left(-st + \sum_i p_i(e^t-1)\right)
\\&= \exp\left(s-\mu-s\log\frac\right),
\end
where we took . This is similar to the Binomial distribution#Tail bounds">tail bounds of a binomial distribution.
Computational methods
The reference [
] discusses techniques of evaluating the probability mass function of the Poisson binomial distribution. The following software implementations are based on it:
* An R packag
''poibin''
was provided along with the paper, which is available for the computing of the cdf, pmf, quantile function, and random number generation of the Poisson binomial distribution. For computing the PMF, a DFT algorithm or a recursive algorithm can be specified to compute the exact PMF, and approximation methods using the normal and Poisson distribution can also be specified.
''poibin'' - Python implementation
- can compute the PMF and CDF, uses the DFT method described in the paper for doing so.
See also
*Le Cam's theorem
References
{{ProbDistributions, discrete-finite
Discrete distributions
Factorial and binomial topics