In probability theory and statistics, the negative binomial distribution is a discrete probability distribution that models the number of failures in a sequence of independent and identically distributed Bernoulli trials before a specified (non-random) number of successes (denoted r) occurs. For example, we can define rolling a 6 on a die as a success, and rolling any other number as a failure, and ask how many failure rolls will occur before we see the third success (r = 3). In such a case, the probability distribution of the number of failures that appear will be a negative binomial distribution.

An alternative formulation is to model the number of total trials (instead of the number of failures). In fact, for a specified (non-random) number of successes r, the number of failures n − r is random because the total number of trials n is random. For example, we could use the negative binomial distribution to model the number of days n (random) a certain machine works (specified by r) before it breaks down.

The Pascal distribution (after Blaise Pascal) and the Pólya distribution (for George Pólya) are special cases of the negative binomial distribution. A convention among engineers, climatologists, and others is to use "negative binomial" or "Pascal" for the case of an integer-valued stopping-time parameter r and use "Pólya" for the real-valued case.

For occurrences of associated discrete events, like tornado outbreaks, the Pólya distributions can be used to give more accurate models than the Poisson distribution by allowing the mean and variance to be different, unlike the Poisson. The negative binomial distribution has a variance μ/p, where μ is its mean, with the distribution becoming identical to the Poisson in the limit p → 1 for a given mean μ. This can make the distribution a useful overdispersed alternative to the Poisson distribution, for example for a robust modification of Poisson regression. In epidemiology, it has been used to model disease transmission for infectious diseases where the likely number of onward infections may vary considerably from individual to individual and from setting to setting. More generally, it may be appropriate where events have positively correlated occurrences causing a larger variance than if the occurrences were independent, due to a positive covariance term.

The term "negative binomial" is likely due to the fact that a certain binomial coefficient that appears in the formula for the probability mass function of the distribution can be written more simply with negative numbers.


Definitions

Imagine a sequence of independent Bernoulli trials: each trial has two potential outcomes called "success" and "failure". In each trial the probability of success is p and of failure is 1 − p. We observe this sequence until a predefined number r of successes occurs. Then the random number of observed failures, X, follows the negative binomial (or Pascal) distribution:
: X \sim \operatorname{NB}(r, p)
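As a rough illustration of this definition, the sketch below simulates the Bernoulli trials directly, using the die example from the introduction (p = 1/6, r = 3). The helper name `sample_failures_before_r_successes`, the seed, and the number of repetitions are illustrative choices, not part of any library; the empirical mean of X should land close to r(1 − p)/p = 15.

```python
import random


# Hypothetical helper for illustration: simulates the definition of X ~ NB(r, p).
def sample_failures_before_r_successes(r: int, p: float, rng: random.Random) -> int:
    """Run Bernoulli(p) trials until r successes occur and return the number of failures."""
    successes = failures = 0
    while successes < r:
        if rng.random() < p:  # success with probability p
            successes += 1
        else:                 # failure with probability 1 - p
            failures += 1
    return failures


# Die example: a "success" is rolling a 6, and we stop at the third success.
r, p = 3, 1 / 6
rng = random.Random(2024)  # fixed seed so reruns give the same draws
draws = [sample_failures_before_r_successes(r, p, rng) for _ in range(50_000)]
empirical_mean = sum(draws) / len(draws)
print(f"empirical mean number of failures: {empirical_mean:.2f}")
print(f"r(1 - p)/p:                        {r * (1 - p) / p:.2f}")
```

With 50,000 repetitions the standard error of the empirical mean is roughly 0.04, so the two printed values should agree to about one decimal place.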


Probability mass function

The probability mass function of the negative binomial distribution is
: f(k; r, p) \equiv \Pr(X = k) = \binom{k+r-1}{k} (1-p)^k p^r
where r is the number of successes, k is the number of failures, and p is the probability of success on each trial. In this formulation, the mean is r(1 − p)/p and the variance is r(1 − p)/p². Here, the quantity in parentheses is the binomial coefficient, and is equal to
: \binom{k+r-1}{k} = \frac{(k+r-1)!}{(r-1)!\,k!} = \frac{(k+r-1)(k+r-2)\dotsm(r)}{k!}.
There are k failures chosen from k + r − 1 trials rather than k + r because the last of the k + r trials is by definition a success. This quantity can alternatively be written in the following manner, explaining the name "negative binomial":
: \begin{align} & \frac{(k+r-1)\dotsm(r)}{k!} \\ = {} & (-1)^k \frac{(-r)(-r-1)(-r-2)\dotsm(-r-k+1)}{k!} = (-1)^k\binom{-r}{k}. \end{align}
Note that by the last expression and the binomial series, for every 0 < p ≤ 1 and q = 1 − p,
: p^{-r} = (1-q)^{-r} = \sum_{k=0}^\infty \binom{-r}{k}(-q)^k = \sum_{k=0}^\infty \binom{k+r-1}{k} q^k,
hence the terms of the probability mass function indeed add up to one:
: \sum_{k=0}^\infty \binom{k+r-1}{k}(1-p)^k p^r = p^{-r} p^r = 1.
To understand the above definition of the probability mass function, note that the probability for every specific sequence of r successes and k failures is p^r (1 − p)^k, because the outcomes of the k + r trials are supposed to happen independently. Since the rth success always comes last, it remains to choose the k trials with failures out of the remaining k + r − 1 trials. The above binomial coefficient, due to its combinatorial interpretation, gives precisely the number of all these sequences of length k + r − 1.
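A minimal sketch of this probability mass function, assuming integer r and using only the standard library's `math.comb`. The function name `nb_pmf` and the parameter values are illustrative. It checks numerically that the terms add up to one and that the mean matches r(1 − p)/p, truncating the infinite support at a point where the remaining tail is negligible.

```python
from math import comb


# Hypothetical helper name; implements f(k; r, p) = C(k+r-1, k) (1-p)^k p^r.
def nb_pmf(k: int, r: int, p: float) -> float:
    """Pr(X = k): k failures before the r-th success, success probability p."""
    # comb(k + r - 1, k) counts where the k failures sit among the first k + r - 1 trials;
    # the final trial is always the r-th success.
    return comb(k + r - 1, k) * (1 - p) ** k * p ** r


r, p = 3, 0.4  # illustrative parameters
probs = [nb_pmf(k, r, p) for k in range(400)]  # truncate the infinite support at k = 399
total = sum(probs)
mean = sum(k * q for k, q in enumerate(probs))
print(f"sum of the pmf: {total:.12f}")
print(f"mean: {mean:.6f}, r(1 - p)/p = {r * (1 - p) / p:.6f}")
```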


Cumulative distribution function

The cumulative distribution function can be expressed in terms of the regularized incomplete beta function:
: F(k; r, p) \equiv \Pr(X \le k) = I_{p}(r, k+1) = 1 - I_{1-p}(k+1, r).
It can also be expressed in terms of the cumulative distribution function of the binomial distribution:
: F(k; r, p) = F_{\text{binomial}}(k; n = k+r, 1-p),
since X ≤ k exactly when at most k of the first k + r trials are failures.
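The binomial form of the CDF can be checked with a short sketch, assuming integer r. The helper names `nb_cdf` and `binom_cdf` are hypothetical, and both are computed by direct summation. The incomplete-beta form is not checked here, since that would need a special-function library (for example SciPy's `scipy.special.betainc(a, b, x)`, which returns the regularized function I_x(a, b)).

```python
from math import comb


# Hypothetical helpers: both CDFs are computed by summing probability mass functions.
def nb_cdf(k: int, r: int, p: float) -> float:
    """Pr(X <= k) for X = failures before the r-th success, by summing the pmf."""
    return sum(comb(j + r - 1, j) * (1 - p) ** j * p ** r for j in range(k + 1))


def binom_cdf(k: int, n: int, q: float) -> float:
    """Pr(B <= k) for B ~ Binomial(n, q)."""
    return sum(comb(n, j) * q ** j * (1 - q) ** (n - j) for j in range(k + 1))


r, p = 4, 0.35  # illustrative parameters
for k in range(15):
    # Failures among the first k + r trials are Binomial(k + r, 1 - p).
    lhs = nb_cdf(k, r, p)
    rhs = binom_cdf(k, k + r, 1 - p)
    print(f"k = {k:2d}: {lhs:.12f} {rhs:.12f}")
```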


Alternative formulations

Some sources may define the negative binomial distribution slightly differently from the primary one here. The most common variations are where the random variable X is counting different things; in every case p still refers to the probability of "success":
* X is the number k of failures before the rth success (the primary definition in this article): \Pr(X = k) = \binom{k+r-1}{k} p^r (1-p)^k, for k = 0, 1, 2, …
* X is the number n of trials needed to obtain r successes: \Pr(X = n) = \binom{n-1}{r-1} p^r (1-p)^{n-r}, for n = r, r + 1, r + 2, …
* X is the number k of successes before the rth failure (the primary definition with the roles of success and failure switched): \Pr(X = k) = \binom{k+r-1}{k} p^k (1-p)^r, for k = 0, 1, 2, …
* X is the number n of trials needed to obtain r failures: \Pr(X = n) = \binom{n-1}{r-1} p^{n-r} (1-p)^r, for n = r, r + 1, r + 2, …
Each of these definitions of the negative binomial distribution can be expressed in slightly different but equivalent ways. The first alternative formulation is simply an equivalent form of the binomial coefficient, that is:
: \binom{a}{b} = \binom{a}{a-b} \quad \text{for } 0 \leq b \leq a.
The second alternative formulation somewhat simplifies the expression by recognizing that the total number of trials is simply the number of successes plus the number of failures, that is: n = r + k. These second formulations may be more intuitive to understand; however, they are perhaps less practical as they have more terms.

The definition of the negative binomial distribution can be extended to the case where the parameter r can take on a positive real value. Although it is impossible to visualize a non-integer number of "successes", we can still formally define the distribution through its probability mass function. The problem of extending the definition to real-valued (positive) r boils down to extending the binomial coefficient to its real-valued counterpart, based on the gamma function:
: \binom{k+r-1}{k} = \frac{(k+r-1)(k+r-2)\dotsm(r)}{k!} = \frac{\Gamma(k+r)}{k!\,\Gamma(r)}
After substituting this expression in the original definition, we say that X has a negative binomial (or Pólya) distribution if it has a probability mass function:
: f(k; r, p) \equiv \Pr(X = k) = \frac{\Gamma(k+r)}{k!\,\Gamma(r)} (1-p)^k p^r \quad\text{for } k = 0, 1, 2, \dotsc
Here r is a real, positive number.

In negative binomial regression, the distribution is specified in terms of its mean, m = \frac{r(1-p)}{p}, which is then related to explanatory variables as in linear regression or other generalized linear models. From the expression for the mean m, one can derive p = \frac{r}{m+r} and 1 - p = \frac{m}{m+r}. Then, substituting these expressions in the one for the probability mass function when r is real-valued yields this parametrization of the probability mass function in terms of m:
: \Pr(X = k) = \frac{\Gamma(r+k)}{k!\,\Gamma(r)} \left(\frac{r}{r+m}\right)^r \left(\frac{m}{r+m}\right)^k \quad\text{for } k = 0, 1, 2, \dotsc
The variance can then be written as m + \frac{m^2}{r}. Some authors prefer to set \alpha = \frac{1}{r}, and express the variance as m + \alpha m^2. In this context, and depending on the author, either the parameter r or its reciprocal α is referred to as the "dispersion parameter", "shape parameter" or "clustering coefficient", or the "heterogeneity" or "aggregation" parameter. The term "aggregation" is particularly used in ecology when describing counts of individual organisms. Decrease of the aggregation parameter r towards zero corresponds to increasing aggregation of the organisms; increase of r towards infinity corresponds to absence of aggregation, as can be described by Poisson regression.
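The mean-based parametrization with a real-valued r can be sketched with `math.lgamma`, working in log space so that the gamma functions do not overflow for large k. The helper name `nb_pmf_mean` and the values m = 3, r = 0.7 are illustrative assumptions. The sketch sums the truncated support to confirm that the variance equals m + m²/r = m + αm².

```python
from math import exp, lgamma, log


# Hypothetical helper: Pr(X = k) in terms of the mean m and a real shape r > 0.
def nb_pmf_mean(k: int, m: float, r: float) -> float:
    """Pr(X = k) with mean m and real shape r > 0, evaluated in log space for stability."""
    log_coef = lgamma(r + k) - lgamma(k + 1) - lgamma(r)  # log of Gamma(r+k) / (k! Gamma(r))
    return exp(log_coef + r * log(r / (r + m)) + k * log(m / (r + m)))


m, r = 3.0, 0.7  # illustrative mean and non-integer dispersion parameter
probs = [nb_pmf_mean(k, m, r) for k in range(2000)]  # tail beyond k = 1999 is negligible
total = sum(probs)
mean = sum(k * q for k, q in enumerate(probs))
var = sum((k - mean) ** 2 * q for k, q in enumerate(probs))
alpha = 1 / r
print(f"total = {total:.10f}, mean = {mean:.6f}")
print(f"variance = {var:.6f}, m + alpha*m^2 = {m + alpha * m ** 2:.6f}")
```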


Alternative parameterizations

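Sometimes the distribution is parameterized in terms of its mean μ and variance σ² (which requires σ² > μ):
: \begin{align} & p = \frac{\mu}{\sigma^2}, \\ & r = \frac{\mu^2}{\sigma^2 - \mu}, \\ & \Pr(X=k) = \binom{k + \frac{\mu^2}{\sigma^2-\mu} - 1}{k} \left(1 - \frac{\mu}{\sigma^2}\right)^k \left(\frac{\mu}{\sigma^2}\right)^{\mu^2/(\sigma^2-\mu)}. \end{align}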
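The conversion from (μ, σ²) to (r, p) is a simple method-of-moments step. The sketch below, with a hypothetical helper `nb_params_from_moments` and illustrative moments μ = 4, σ² = 10, performs the conversion and then round-trips through the mean r(1 − p)/p and variance r(1 − p)/p² of the failures-counting form.

```python
# Hypothetical helper; assumes the failures-before-the-r-th-success form of NB(r, p).
def nb_params_from_moments(mu: float, var: float) -> tuple[float, float]:
    """Return (r, p) giving mean mu and variance var in the failures-before-r-th-success form."""
    if var <= mu:
        raise ValueError("the negative binomial requires variance > mean (overdispersion)")
    p = mu / var
    r = mu ** 2 / (var - mu)
    return r, p


r, p = nb_params_from_moments(4.0, 10.0)  # illustrative moments
print(f"r = {r:.4f}, p = {p:.4f}")
# Round trip through the mean r(1-p)/p and variance r(1-p)/p^2.
print(f"mean = {r * (1 - p) / p:.4f}, variance = {r * (1 - p) / p ** 2:.4f}")
```

The explicit check on var <= mu reflects the fact that no negative binomial distribution has a variance at or below its mean.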


Examples


Length of hospital stay

Hospital length of stay is an example of real-world data that can be modelled well with a negative binomial distribution via negative binomial regression.


Selling candy

Pat Collis is required to sell candy bars to raise money for the 6th grade field trip. Pat is (somewhat harshly) not supposed to return home until five candy bars have been sold. So the child goes door to door, selling candy bars. At each house, there is a 0.6 probability of selling one candy bar and a 0.4 probability of selling nothing.

What's the probability of selling the last candy bar at the nth house?

Successfully selling candy enough times is what defines our stopping criterion (as opposed to failing to sell it), so k in this case represents the number of failures and r represents the number of successes. Recall that the NB(r, p) distribution describes the probability of k failures and r successes in k + r Bernoulli(p) trials with success on the last trial. Selling five candy bars means getting five successes. The number of trials (i.e. houses) this takes is therefore n = k + 5. The random variable we are interested in is the number of houses, so we substitute k = n − 5 into an NB(5, 0.6) mass function and obtain the following mass function of the distribution of houses (for n ≥ 5):
: f(n) = \binom{n-1}{n-5} \, 0.6^5 \, 0.4^{n-5} = \binom{n-1}{n-5} \, 3^5 \, \frac{2^{n-5}}{5^n}.

What's the probability that Pat finishes on the tenth house?
: f(10) = 0.1003290624.

What's the probability that Pat finishes on or before reaching the eighth house?

To finish on or before the eighth house, Pat must finish at the fifth, sixth, seventh, or eighth house. Sum those probabilities:
: f(5) = 0.07776
: f(6) = 0.15552
: f(7) = 0.186624
: f(8) = 0.1741824
: \sum_{j=5}^{8} f(j) = 0.5940864.

What's the probability that Pat exhausts all 30 houses that happen to stand in the neighborhood?

This can be expressed as the probability that Pat does not finish on the fifth through the thirtieth house:
: 1 - \sum_{j=5}^{30} f(j) = 1 - I_{0.6}(5, 30-5+1) \approx 1 - 0.999999823 = 0.000000177.

Because of the rather high probability that Pat will sell to each house (60 percent), the probability of Pat not fulfilling the quest is vanishingly slim.
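The candy-bar numbers can be reproduced with a few lines of standard-library Python. The helper name `prob_finish_at_house` is hypothetical. For the 30-house question the sketch sums the binomial lower tail (fewer than 5 sales in 30 houses) directly, which avoids the floating-point cancellation in 1 − Σ f(j); it also computes the subtraction form for comparison.

```python
from math import comb


# Hypothetical helper: probability that the r-th sale happens exactly at house n.
def prob_finish_at_house(n: int, r: int = 5, p_sale: float = 0.6) -> float:
    """Probability that the r-th sale happens exactly at house n (n >= r)."""
    k = n - r  # houses without a sale before the final sale
    return comb(n - 1, k) * p_sale ** r * (1 - p_sale) ** k


f10 = prob_finish_at_house(10)
by_eighth = sum(prob_finish_at_house(n) for n in range(5, 9))
# Not finishing within 30 houses means fewer than 5 sales among those 30 houses;
# summing that binomial tail directly avoids the cancellation in 1 - sum(...).
no_finish = sum(comb(30, j) * 0.6 ** j * 0.4 ** (30 - j) for j in range(5))
no_finish_via_pmf = 1 - sum(prob_finish_at_house(n) for n in range(5, 31))
print(f"f(10) = {f10:.10f}")
print(f"P(finish by house 8) = {by_eighth:.7f}")
print(f"P(no finish by house 30) = {no_finish:.3e} (via pmf sum: {no_finish_via_pmf:.3e})")
```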


Properties


Expectation

In this section the distribution counts successes rather than failures: the expected total number of successes observed before the rth failure, in a negative binomial distribution with parameters (r, p), is rp/(1 − p), where p is still the probability of success. To see this, imagine an experiment simulating the negative binomial is performed many times. That is, a set of trials is performed until r failures are obtained, then another set of trials, and then another etc. Write down the number of trials performed in each experiment: a, b, c, … and set a + b + c + ⋯ = N. Now we would expect about Np successes in total. Say the experiment was performed n times. Then there are nr failures in total. So we would expect nr = N(1 − p), so N/n = r/(1 − p). See that N/n is just the average number of trials per experiment. That is what we mean by "expectation". The average number of successes per experiment is N/n − r = r/(1 − p) − r = rp/(1 − p). This agrees with the mean stated at the start of this section.

A rigorous derivation can be done by representing the negative binomial distribution as the sum of waiting times. Let X_r ∼ NB(r, p) with the convention that X_r represents the number of successes observed before r failures, with the probability of success being p. And let Y_i ∼ Geom(1 − p), where Y_i represents the number of successes before seeing a failure. We can think of Y_i as the waiting time (number of successes) between the (i − 1)th and the ith failure. Thus
: X_r = Y_1 + Y_2 + \cdots + Y_r.
The mean is
: E[X_r] = E[Y_1] + E[Y_2] + \cdots + E[Y_r] = \frac{rp}{1-p},
which follows from the fact that E[Y_i] = p/(1 − p).
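The waiting-time argument can be checked numerically without simulation: the sketch computes E[Y] for one geometric waiting time and E[X_r] directly from the pmf, both by truncated series, and compares r·E[Y] and E[X_r] with rp/(1 − p). The helper names and the values r = 4, p = 0.3 are illustrative assumptions.

```python
from math import comb


# Hypothetical helpers; both means are computed by summing truncated series.
def geom_mean(p: float, terms: int = 2000) -> float:
    """E[Y] for Y = successes before the first failure, by summing k * p^k * (1 - p)."""
    return sum(k * p ** k * (1 - p) for k in range(terms))


def nb_successes_mean(r: int, p: float, terms: int = 2000) -> float:
    """E[X_r] for X_r = successes before the r-th failure, by summing its pmf."""
    return sum(k * comb(k + r - 1, k) * p ** k * (1 - p) ** r for k in range(terms))


r, p = 4, 0.3  # illustrative parameters
print(f"r * E[Y]  = {r * geom_mean(p):.10f}")
print(f"E[X_r]    = {nb_successes_mean(r, p):.10f}")
print(f"rp/(1-p)  = {r * p / (1 - p):.10f}")
```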


Variance

When counting the number of successes given the number r of failures, the variance is rp/(1 − p)². When counting the number of failures before the rth success, the variance is r(1 − p)/p².
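Both variance formulas can be compared against direct summation over a truncated support. In the sketch below, `pmf_moments` is a hypothetical helper and r = 3, p = 0.25 are illustrative values; the two lambdas are the pmfs of the two counting conventions.

```python
from math import comb


# Hypothetical helper: mean and variance of a pmf on {0, 1, 2, ...}.
def pmf_moments(pmf, terms: int = 3000) -> tuple[float, float]:
    """Mean and variance of a pmf on {0, 1, 2, ...}, truncated after `terms` values."""
    probs = [pmf(k) for k in range(terms)]
    mean = sum(k * q for k, q in enumerate(probs))
    var = sum((k - mean) ** 2 * q for k, q in enumerate(probs))
    return mean, var


r, p = 3, 0.25  # illustrative parameters
# Successes before the r-th failure.
_, var_successes = pmf_moments(lambda k: comb(k + r - 1, k) * p ** k * (1 - p) ** r)
# Failures before the r-th success.
_, var_failures = pmf_moments(lambda k: comb(k + r - 1, k) * (1 - p) ** k * p ** r)
print(f"successes before r-th failure: {var_successes:.8f} vs {r * p / (1 - p) ** 2:.8f}")
print(f"failures before r-th success:  {var_failures:.8f} vs {r * (1 - p) / p ** 2:.8f}")
```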


Relation to the binomial theorem

Suppose Y is a random variable with a binomial distribution with parameters n and p. Assume p + q = 1, with p, q ≥ 0, then
: 1 = 1^n = (p+q)^n.
Using Newton's binomial theorem, this can equally be written as:
: (p+q)^n = \sum_{k=0}^\infty \binom{n}{k} p^k q^{n-k},
in which the upper bound of summation is infinite. In this case, the binomial coefficient
: \binom{n}{k} = \frac{n(n-1)(n-2)\cdots(n-k+1)}{k!}
is defined when n is a real number, instead of just a positive integer. But in our case of the binomial distribution it is zero when k > n. We can then say, for example, that for a non-integer exponent x (with p < q, so that the series converges)
: (p+q)^{x} = \sum_{k=0}^\infty \binom{x}{k} p^k q^{x-k}.
Now suppose r > 0 and we use a negative exponent:
: 1 = p^r \cdot p^{-r} = p^r (1-q)^{-r} = p^r \sum_{k=0}^\infty \binom{-r}{k} (-q)^k.
Then all of the terms are positive, and the term
: p^r \binom{-r}{k} (-q)^k
is just the probability that the number of failures before the rth success is equal to k, provided r is an integer. (If r is a negative non-integer, so that the exponent is a positive non-integer, then some of the terms in the sum above are negative, so we do not have a probability distribution on the set of all nonnegative integers.)

Now we also allow non-integer values of r. Then we have a proper negative binomial distribution, which is a generalization of the Pascal distribution and coincides with the Pascal distribution when r happens to be a positive integer.

The sum of independent negative-binomially distributed random variables with parameters r₁ and r₂ and the same value for parameter p is negative-binomially distributed with the same p but with r-value r₁ + r₂. This property persists when the definition is thus generalized, and affords a quick way to see that the negative binomial distribution is infinitely divisible.
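The generalized binomial coefficient in this argument is easy to evaluate for any real upper argument. The sketch below uses a hypothetical helper `gen_binom`, built one factor at a time to avoid huge intermediate factorials, with illustrative values r = 2.5, p = 0.6. It checks that every term p^r C(−r, k)(−q)^k is positive and that the terms sum to one, that a negative non-integer r (positive non-integer exponent) produces negative terms, and that for integer r the coefficient reduces to the ordinary one.

```python
from math import comb


# Hypothetical helper: generalized binomial coefficient for real a.
def gen_binom(a: float, k: int) -> float:
    """Generalized binomial coefficient a(a-1)...(a-k+1)/k! for real a and integer k >= 0."""
    c = 1.0
    for i in range(1, k + 1):
        c *= (a - i + 1) / i  # multiply in one factor (a - i + 1)/i at a time
    return c


r, p = 2.5, 0.6  # illustrative non-integer r
q = 1 - p
terms = [p ** r * gen_binom(-r, k) * (-q) ** k for k in range(300)]
print(f"all terms positive: {all(t > 0 for t in terms)}, sum = {sum(terms):.12f}")

# With a positive non-integer exponent (r = -1.5), negative terms appear.
r_bad = -1.5
bad_terms = [p ** r_bad * gen_binom(-r_bad, k) * (-q) ** k for k in range(10)]
print(f"some terms negative: {any(t < 0 for t in bad_terms)}")
```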


Recurrence relation

The following recurrence relation holds, where, as in the Expectation section, Pr(k) is the probability of k successes before the rth failure:
: \begin{align} & (k+1)\Pr(k+1) - p\Pr(k)(k+r) = 0, \\ & \Pr(0) = (1-p)^r. \end{align}
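The recurrence gives a factorial-free way to tabulate the pmf. The sketch below, with a hypothetical helper `pmf_by_recurrence` and illustrative parameters, starts from Pr(0) = (1 − p)^r, steps forward with Pr(k + 1) = p(k + r)Pr(k)/(k + 1), and compares the result with the closed form; it also works for non-integer r.

```python
from math import comb


# Hypothetical helper: pmf of k successes before the r-th failure, built from the recurrence.
def pmf_by_recurrence(r: float, p: float, kmax: int) -> list[float]:
    """Pr(k), k = 0..kmax, for k successes before the r-th failure, via the recurrence."""
    probs = [(1 - p) ** r]  # Pr(0) = (1 - p)^r
    for k in range(kmax):
        # Rearranged recurrence: Pr(k+1) = p (k + r) Pr(k) / (k + 1)
        probs.append(p * (k + r) * probs[k] / (k + 1))
    return probs


r, p = 3, 0.45  # illustrative parameters
rec = pmf_by_recurrence(r, p, 30)
closed = [comb(k + r - 1, k) * p ** k * (1 - p) ** r for k in range(31)]
print(max(abs(a - b) for a, b in zip(rec, closed)))
```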


Related distributions

* The geometric distribution (on {0, 1, 2, 3, …}) is a special case of the negative binomial distribution, with
:: \operatorname{Geom}(p) = \operatorname{NB}(1,\, 1-p)
: when NB(r, p) counts successes before the rth failure (in the primary, failures-counting definition above, this reads Geom(p) = NB(1, p)).
* The negative binomial distribution is a special case of the discrete phase-type distribution.
* The negative binomial distribution is a special case of the discrete compound Poisson distribution.


Poisson distribution

Consider a sequence of negative binomial random variables where the stopping parameter r goes to infinity, whereas the probability of success in each trial, p, goes to zero in such a way as to keep the mean of the distribution constant. (Here, as in the Expectation section, the distribution counts successes before the rth failure.) Denoting this mean as λ, the parameter p will be p = λ/(r + λ):
: \begin{align} \text{Mean:} \quad & \lambda = \frac{pr}{1-p} \quad \Rightarrow \quad p = \frac{\lambda}{r+\lambda}, \\ \text{Variance:} \quad & \lambda \left( 1 + \frac{\lambda}{r} \right) > \lambda, \quad \text{thus always overdispersed}. \end{align}
Under this parametrization the probability mass function will be
: f(k; r, p) = \frac{\Gamma(k+r)}{k!\,\Gamma(r)} p^k (1-p)^r = \frac{\lambda^k}{k!} \cdot \frac{\Gamma(r+k)}{\Gamma(r)\,(r+\lambda)^k} \cdot \frac{1}{\left(1+\frac{\lambda}{r}\right)^{r}}
Now if we consider the limit as r → ∞, the second factor will converge to one, and the third to e^{−λ}:
: \lim_{r\to\infty} f(k; r, p) = \frac{\lambda^k}{k!} \cdot 1 \cdot \frac{1}{e^\lambda},
which is the mass function of a Poisson-distributed random variable with expected value λ. In other words, the alternatively parameterized negative binomial distribution converges to the Poisson distribution and r controls the deviation from the Poisson. This makes the negative binomial distribution suitable as a robust alternative to the Poisson, which approaches the Poisson for large r, but which has larger variance than the Poisson for small r.
: \operatorname{Poisson}(\lambda) = \lim_{r \to \infty} \operatorname{NB}\left(r, \frac{\lambda}{\lambda + r}\right).
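The convergence can be watched numerically. In this sketch (hypothetical helper names, illustrative λ = 2), p = λ/(r + λ) keeps the mean fixed while r grows, and the largest pointwise gap between the negative binomial and Poisson pmfs over k = 0, …, 39 is printed for each r.

```python
from math import exp, lgamma, log


# Hypothetical helpers; both pmfs are evaluated in log space.
def nb_pmf_successes(k: int, r: float, p: float) -> float:
    """Pr(k successes before the r-th failure), real r > 0, via log-gamma."""
    return exp(lgamma(k + r) - lgamma(k + 1) - lgamma(r) + k * log(p) + r * log(1 - p))


def poisson_pmf(k: int, lam: float) -> float:
    return exp(k * log(lam) - lam - lgamma(k + 1))


lam = 2.0  # illustrative fixed mean
gaps = {}
for r in (1, 10, 100, 10_000):
    p = lam / (r + lam)  # keeps the mean pr/(1 - p) equal to lam
    gaps[r] = max(abs(nb_pmf_successes(k, r, p) - poisson_pmf(k, lam)) for k in range(40))
    print(f"r = {r:>6}: max |NB - Poisson| = {gaps[r]:.2e}")
```

The gap shrinks roughly in proportion to 1/r, in line with the variance λ(1 + λ/r) approaching λ.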


Gamma–Poisson mixture

The negative binomial distribution also arises as a continuous mixture of Poisson distributions (i.e. a compound probability distribution) where the mixing distribution of the Poisson rate is a gamma distribution. That is, we can view the negative binomial as a Poisson(λ) distribution, where λ is itself a random variable, distributed as a gamma distribution with shape r and scale θ = p/(1 − p), or correspondingly rate β = (1 − p)/p. (Throughout this section, as in the Poisson-limit section, NB(r, p) counts successes before the rth failure.)

To display the intuition behind this statement, consider two independent Poisson processes, "Success" and "Failure", with intensities p and 1 − p. Together, the Success and Failure processes are equivalent to a single Poisson process of intensity 1, where an occurrence of the process is a success if a corresponding independent coin toss comes up heads with probability p; otherwise, it is a failure. If r is a counting number, the coin tosses show that the count of successes before the rth failure follows a negative binomial distribution with parameters r and p. The count is also, however, the count of the Success Poisson process at the random time T of the rth occurrence in the Failure Poisson process. The Success count follows a Poisson distribution with mean pT, where T is the waiting time for r occurrences in a Poisson process of intensity 1 − p, i.e., T is gamma-distributed with shape parameter r and intensity 1 − p. Thus, the negative binomial distribution is equivalent to a Poisson distribution with mean pT, where the random variate T is gamma-distributed with shape parameter r and intensity 1 − p. The claim at the start of this section follows, because λ = pT is gamma-distributed with shape parameter r and intensity (1 − p)/p.

The following formal derivation (which does not depend on r being a counting number) confirms the intuition.
: \begin{align} \int_0^\infty f_{\operatorname{Poisson}(\lambda)}(k) \times f_{\operatorname{Gamma}\left(r,\, \frac{p}{1-p}\right)}(\lambda) \, \mathrm{d}\lambda & = \int_0^\infty \frac{\lambda^k}{k!} e^{-\lambda} \times \frac{1}{\Gamma(r)} \left(\frac{1-p}{p}\right)^r \lambda^{r-1} e^{-\lambda \frac{1-p}{p}} \, \mathrm{d}\lambda \\
& = \left(\frac{1-p}{p}\right)^r \frac{1}{k!\,\Gamma(r)} \int_0^\infty \lambda^{r+k-1} e^{-\lambda/p} \, \mathrm{d}\lambda \\
& = \left(\frac{1-p}{p}\right)^r \frac{1}{k!\,\Gamma(r)} \Gamma(r+k)\, p^{k+r} \int_0^\infty f_{\operatorname{Gamma}(r+k,\, p)}(\lambda) \, \mathrm{d}\lambda \\
& = \frac{\Gamma(r+k)}{k!\,\Gamma(r)} \, p^k (1-p)^r \\
& = f(k; r, p). \end{align}
Because of this, the negative binomial distribution is also known as the gamma–Poisson (mixture) distribution. The negative binomial distribution was originally derived as a limiting case of the gamma–Poisson distribution.
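The mixture integral can also be evaluated numerically. The sketch below integrates Poisson(k | λ) times the Gamma(shape r, scale p/(1 − p)) density over λ with a plain midpoint rule and compares the result with the closed-form pmf. The helper names, the integration range [0, 30], the step count, and the values r = 2.5, p = 0.3 are ad hoc assumptions chosen so the neglected tail and discretization error are tiny; this is a check, not a recommended integrator.

```python
from math import exp, lgamma, log


# Hypothetical helpers for illustration.
def nb_pmf_successes(k: int, r: float, p: float) -> float:
    """Pr(k successes before the r-th failure), real r > 0."""
    return exp(lgamma(k + r) - lgamma(k + 1) - lgamma(r) + k * log(p) + r * log(1 - p))


def gamma_poisson_pmf(k: int, r: float, p: float, upper: float = 30.0, steps: int = 20_000) -> float:
    """Integrate Poisson(k | lam) * Gamma(lam | shape r, scale p/(1-p)) over lam by the midpoint rule."""
    scale = p / (1 - p)
    h = upper / steps
    total = 0.0
    for i in range(steps):
        lam = (i + 0.5) * h  # midpoint of the i-th subinterval, never exactly 0
        log_poisson = k * log(lam) - lam - lgamma(k + 1)
        log_gamma = (r - 1) * log(lam) - lam / scale - lgamma(r) - r * log(scale)
        total += exp(log_poisson + log_gamma)
    return total * h


r, p = 2.5, 0.3  # illustrative; r need not be an integer
for k in range(6):
    print(k, f"{gamma_poisson_pmf(k, r, p):.8f}", f"{nb_pmf_successes(k, r, p):.8f}")
```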


Distribution of a sum of geometrically distributed random variables

If Y_r is a random variable following the negative binomial distribution with parameters r and p, and support {0, 1, 2, …}, then Y_r is a sum of r independent variables following the geometric distribution (on {0, 1, 2, 3, …}) with parameter p. As a result of the central limit theorem, Y_r (properly scaled and shifted) is therefore approximately normal for sufficiently large r.

Furthermore, if B_{s+r} is a random variable following the binomial distribution with parameters s + r and p, then
: \begin{align} \Pr(Y_r \leq s) & = 1 - I_{1-p}(s+1, r) \\ & = 1 - I_{1-p}((s+r)-(r-1), (r-1)+1) \\ & = 1 - \Pr(B_{s+r} \leq r-1) \\ & = \Pr(B_{s+r} \geq r) \\ & = \Pr(\text{after } s+r \text{ trials, there are at least } r \text{ successes}). \end{align}
In this sense, the negative binomial distribution is the "inverse" of the binomial distribution.

The sum of independent negative-binomially distributed random variables with parameters r₁ and r₂ and the same value for parameter p is negative-binomially distributed with the same p but with r-value r₁ + r₂.

The negative binomial distribution is infinitely divisible, i.e., if Y has a negative binomial distribution, then for any positive integer n, there exist independent identically distributed random variables Y₁, …, Yₙ whose sum has the same distribution that Y has.
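The additivity in r and the infinite divisibility can both be checked by discrete convolution. The sketch below uses hypothetical helpers `nb_pmf` (real r, failures-counting form) and `convolve`, with illustrative parameters. Truncating at K values is exact for the first K probabilities, because a sum of nonnegative variables that equals k only involves values up to k.

```python
from math import exp, lgamma, log


# Hypothetical helpers for illustration.
def nb_pmf(k: int, r: float, p: float) -> float:
    """Pr(k failures before the r-th success) for real r > 0 (success probability p)."""
    return exp(lgamma(k + r) - lgamma(k + 1) - lgamma(r) + k * log(1 - p) + r * log(p))


def convolve(a: list[float], b: list[float]) -> list[float]:
    """Pmf of the sum of two independent variables on {0, 1, ...}, truncated to len(a)."""
    return [sum(a[j] * b[k - j] for j in range(k + 1)) for k in range(len(a))]


p, r1, r2 = 0.4, 1.3, 2.2  # illustrative; r1 and r2 need not be integers
K = 60
summed = convolve([nb_pmf(k, r1, p) for k in range(K)], [nb_pmf(k, r2, p) for k in range(K)])
direct = [nb_pmf(k, r1 + r2, p) for k in range(K)]
print(max(abs(a - b) for a, b in zip(summed, direct)))

# Infinite divisibility, for n = 3: three iid NB(r/3, p) pieces add up to NB(r, p).
r = 2.0
piece = [nb_pmf(k, r / 3, p) for k in range(K)]
three = convolve(convolve(piece, piece), piece)
print(max(abs(a - nb_pmf(k, r, p)) for k, a in enumerate(three)))
```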


Representation as compound Poisson distribution

The negative binomial distribution NB(''r'',''p'') can be represented as a compound Poisson distribution: Let ''Y''<sub>1</sub>, ''Y''<sub>2</sub>, ... denote a sequence of independent and identically distributed random variables, each one having the logarithmic distribution Log(''p''), with probability mass function
: f(k; p) = \frac{-p^k}{k\ln(1-p)},\qquad k\in\{1,2,3,\dots\}.
Let ''N'' be a random variable, independent of the sequence, and suppose that ''N'' has a Poisson distribution with mean λ = −''r'' ln(1 − ''p''). Then the random sum
: X=\sum_{n=1}^N Y_n
is NB(''r'',''p'')-distributed. To prove this, we calculate the probability generating function ''G''<sub>''X''</sub> of ''X'', which is the composition of the probability generating functions ''G''<sub>''N''</sub> and ''G''<sub>''Y''<sub>1</sub></sub>. Using
: G_N(z)=\exp(\lambda(z-1)),\qquad z\in\mathbb{R},
and
: G_{Y_1}(z)=\frac{\ln(1-pz)}{\ln(1-p)},\qquad |z|<\frac1p,
we obtain
: \begin{align}G_X(z) & =G_N(G_{Y_1}(z))\\[4pt] &=\exp\biggl(\lambda\biggl(\frac{\ln(1-pz)}{\ln(1-p)}-1\biggr)\biggr)\\[4pt] &=\exp\bigl(-r(\ln(1-pz)-\ln(1-p))\bigr)\\[4pt] &=\biggl(\frac{1-p}{1-pz}\biggr)^r,\qquad |z|<\frac1p, \end{align}
which is the probability generating function of the NB(''r'',''p'') distribution.

The following table describes four distributions related to the number of successes in a sequence of draws:
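As a numerical sanity check of the compound Poisson representation, the sketch below computes the distribution of the random sum exactly rather than by simulation: since every ''Y''<sub>''n''</sub> ≥ 1, only the terms with ''N'' ≤ ''k'' contribute to Pr(''X'' = ''k''). It compares the result with the NB(''r'', ''p'') pmf whose generating function is ((1 − ''p'')/(1 − ''pz''))<sup>''r''</sup>. The helper names and the parameter values (''r'' = 2.5, ''p'' = 0.4) are arbitrary choices for illustration.

```python
import math


def log_series_pmf(k, p):
    """Logarithmic distribution Log(p) on k = 1, 2, 3, ..."""
    return -(p**k) / (k * math.log(1 - p))


def nb_pmf(k, r, p):
    """NB(r, p) with generating function ((1 - p) / (1 - p z))**r; r may be non-integer."""
    log_coef = math.lgamma(k + r) - math.lgamma(k + 1) - math.lgamma(r)
    return math.exp(log_coef + r * math.log(1 - p) + k * math.log(p))


def compound_poisson_pmf(r, p, k_max):
    """Exact P(X = k), k = 0..k_max, for X = Y_1 + ... + Y_N with N ~ Poisson(lam),
    lam = -r ln(1 - p), and Y_1, Y_2, ... iid Log(p) independent of N."""
    lam = -r * math.log(1 - p)
    f = [0.0] + [log_series_pmf(k, p) for k in range(1, k_max + 1)]
    conv = [1.0] + [0.0] * k_max  # pmf of the empty sum (n = 0)
    pmf = [0.0] * (k_max + 1)
    # A sum of n terms is >= n, so terms with n > k_max never reach k <= k_max
    for n in range(k_max + 1):
        weight = math.exp(-lam + n * math.log(lam) - math.lgamma(n + 1))  # P(N = n)
        for k in range(k_max + 1):
            pmf[k] += weight * conv[k]
        # Turn the n-fold convolution into the (n + 1)-fold one (f[0] = 0, so j < k)
        conv = [sum(conv[j] * f[k - j] for j in range(k)) for k in range(k_max + 1)]
    return pmf


r, p = 2.5, 0.4
compound = compound_poisson_pmf(r, p, 30)
max_rel_err = max(abs(compound[k] / nb_pmf(k, r, p) - 1) for k in range(31))
print(max_rel_err)  # of the order of floating-point rounding error
```

Working with exact convolutions avoids the sampling noise a Monte Carlo check would introduce, at a cost of roughly ''k''<sub>max</sub><sup>3</sup> operations.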


(a,b,0) class of distributions

The negative binomial, along with the Poisson and binomial distributions, is a member of the (''a'',''b'',0) class of distributions. All three of these distributions are special cases of the Panjer distribution. For fixed ''r'' (and, for the binomial, a fixed number of trials), they are also members of the natural exponential family.
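In the (''a'',''b'',0) class, successive probabilities satisfy Pr(''K'' = ''k'') = (''a'' + ''b''/''k'') Pr(''K'' = ''k'' − 1) for ''k'' ≥ 1. A minimal sketch of that recursion follows. The negative binomial coefficients used here, ''a'' = ''p'' and ''b'' = (''r'' − 1)''p'' with starting value (1 − ''p'')<sup>''r''</sup>, are derived from the ratio of consecutive terms of the pmf in this article's convention rather than quoted from the text; the Poisson case uses ''a'' = 0 and ''b'' = λ. Function names are hypothetical.

```python
import math


def ab0_pmf(a, b, p0, k_max):
    """P(K = 0..k_max) from the (a, b, 0) recursion
    P(K = k) = (a + b / k) * P(K = k - 1), starting from P(K = 0) = p0."""
    pmf = [p0]
    for k in range(1, k_max + 1):
        pmf.append((a + b / k) * pmf[-1])
    return pmf


def nb_pmf(k, r, p):
    """Closed form C(k + r - 1, k) (1 - p)**r p**k (successes before the r-th failure)."""
    log_coef = math.lgamma(k + r) - math.lgamma(k + 1) - math.lgamma(r)
    return math.exp(log_coef + r * math.log(1 - p) + k * math.log(p))


# Negative binomial: consecutive-term ratio p * (k + r - 1) / k = p + (r - 1) * p / k
r, p = 2.5, 0.4
nb_rec = ab0_pmf(a=p, b=(r - 1) * p, p0=(1 - p) ** r, k_max=30)

# Poisson with mean lam: consecutive-term ratio lam / k, so a = 0 and b = lam
lam = 3.0
pois_rec = ab0_pmf(a=0.0, b=lam, p0=math.exp(-lam), k_max=30)

nb_ok = all(math.isclose(nb_rec[k], nb_pmf(k, r, p), rel_tol=1e-9) for k in range(31))
pois_ok = all(
    math.isclose(pois_rec[k], math.exp(-lam) * lam**k / math.factorial(k), rel_tol=1e-9)
    for k in range(31)
)
print(nb_ok, pois_ok)  # True True
```

The recursion needs only one multiplication per term, which is why the (''a'',''b'',0) form is convenient for computing compound distributions.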


Statistical inference


Parameter estimation


MVUE for ''p''

Suppose ''p'' is unknown and an experiment is conducted where it is decided ahead of time that sampling will continue until ''r'' successes are found. A sufficient statistic for the experiment is ''k'', the number of failures. In estimating ''p'', the minimum variance unbiased estimator is
: \widehat{p}=\frac{r-1}{r+k-1}.
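A quick Monte Carlo sketch can illustrate the unbiasedness: it repeatedly samples until ''r'' successes, records ''k'', and averages the estimator. The values ''r'' = 3 and ''p'' = 0.3, the seed and the number of replications are arbitrary choices, and the estimator requires ''r'' ≥ 2. For contrast it also averages ''r''/(''r'' + ''k''), the known-''r'' maximum likelihood estimate discussed in the next subsection, which comes out noticeably larger than ''p''.

```python
import random


def failures_before_r_successes(r, p, rng):
    """Run Bernoulli(p) trials until the r-th success; return the number of failures k."""
    successes = failures = 0
    while successes < r:
        if rng.random() < p:
            successes += 1
        else:
            failures += 1
    return failures


def average_estimates(r, p, reps, seed=0):
    if r < 2:
        raise ValueError("(r - 1) / (r + k - 1) needs r >= 2")
    rng = random.Random(seed)
    mvue_sum = mle_sum = 0.0
    for _ in range(reps):
        k = failures_before_r_successes(r, p, rng)
        mvue_sum += (r - 1) / (r + k - 1)  # minimum variance unbiased estimator
        mle_sum += r / (r + k)             # known-r maximum likelihood estimate
    return mvue_sum / reps, mle_sum / reps


mvue_mean, mle_mean = average_estimates(r=3, p=0.3, reps=100_000)
print(mvue_mean, mle_mean)  # mvue_mean close to 0.3, mle_mean clearly above it
```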


Maximum likelihood estimation

When ''r'' is known, the maximum likelihood estimate of ''p'' is
: \widetilde{p}=\frac{r}{r+k},
but this is a biased estimate. Its inverse, (''r'' + ''k'')/''r'', is an unbiased estimate of 1/''p'', however. When ''r'' is unknown, the maximum likelihood estimator for ''p'' and ''r'' together only exists for samples for which the sample variance is larger than the sample mean. The likelihood function for ''N'' iid observations (''k''<sub>1</sub>, ..., ''k''<sub>''N''</sub>) is
: L(r,p)=\prod_{i=1}^N f(k_i;r,p)\,\!
from which we calculate the log-likelihood function
: \ell(r,p) = \sum_{i=1}^N \ln(\Gamma(k_i + r)) - \sum_{i=1}^N \ln(k_i !) - N\ln(\Gamma(r)) + \sum_{i=1}^N k_i \ln(1-p) + Nr \ln(p).
To find the maximum we take the partial derivatives with respect to ''r'' and ''p'' and set them equal to zero:
: \frac{\partial \ell(r,p)}{\partial p} = -\left[\sum_{i=1}^N k_i \frac{1}{1-p}\right] + Nr \frac{1}{p} = 0
and
: \frac{\partial \ell(r,p)}{\partial r} = \left[\sum_{i=1}^N \psi(k_i + r)\right] - N\psi(r) + N\ln(p) = 0
where
: \psi(k) = \frac{\Gamma'(k)}{\Gamma(k)} \!
is the digamma function. Solving the first equation for ''p'' gives:
: p = \frac{Nr}{Nr + \sum_{i=1}^N k_i}
Substituting this in the second equation gives:
: \frac{\partial \ell(r,p)}{\partial r} = \left[\sum_{i=1}^N \psi(k_i + r)\right] - N\psi(r) + N\ln\left(\frac{r}{r + \sum_{i=1}^N k_i/N}\right) = 0
This equation cannot be solved for ''r'' in closed form. If a numerical solution is desired, an iterative technique such as Newton's method can be used. Alternatively, the expectation–maximization algorithm can be used.
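The profile equation for ''r'' can be solved numerically. The sketch below makes two substitutions that are not in the text above: for integer counts it uses the identity ψ(''k'' + ''r'') − ψ(''r'') = Σ<sub>''j''=0</sub><sup>''k''−1</sup> 1/(''r'' + ''j'') (which follows from ψ(''x'' + 1) = ψ(''x'') + 1/''x'') so that no digamma implementation is needed, and it uses bisection instead of Newton's method because bisection only needs a sign change. It relies on the score being positive for very small ''r'' and negative for large ''r'', which is the case when the sample variance (divisor ''N'') exceeds the sample mean. The counts and function names are made up for illustration.

```python
import math


def profile_score(r, data):
    """d ell / d r after substituting p = N r / (N r + sum k_i).
    For integer k, psi(k + r) - psi(r) = sum_{j=0}^{k-1} 1 / (r + j)."""
    n = len(data)
    mean = sum(data) / n
    digamma_terms = sum(1.0 / (r + j) for k in data for j in range(k))
    return digamma_terms + n * math.log(r / (r + mean))


def nb_mle(data, rel_tol=1e-10):
    """Joint MLE (r, p) for iid nonnegative integer counts, failures-before-r-successes form."""
    n = len(data)
    mean = sum(data) / n
    var = sum((k - mean) ** 2 for k in data) / n  # divisor N
    if var <= mean:
        raise ValueError("no finite MLE: sample variance does not exceed the sample mean")
    lo, hi = 1e-8, 1.0
    # The score is positive as r -> 0 and negative for large enough r: grow hi to bracket the root
    while profile_score(hi, data) > 0:
        hi *= 2.0
    while hi - lo > rel_tol * hi:  # bisection keeps score(lo) > 0 >= score(hi)
        mid = 0.5 * (lo + hi)
        if profile_score(mid, data) > 0:
            lo = mid
        else:
            hi = mid
    r_hat = 0.5 * (lo + hi)
    p_hat = r_hat / (r_hat + mean)  # p = N r / (N r + sum k_i)
    return r_hat, p_hat


counts = [0, 0, 1, 5, 10]  # made-up overdispersed counts: mean 3.2, variance 14.96
r_hat, p_hat = nb_mle(counts)
print(r_hat, p_hat)
```

Swapping the bisection loop for Newton's method would need the derivative of the score, −Σ<sub>''i''</sub>Σ<sub>''j''</sub>1/(''r'' + ''j'')<sup>2</sup> + ''N''(1/''r'' − 1/(''r'' + mean)), and a safeguard against overshooting below zero.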


Occurrence and applications


Waiting time in a Bernoulli process

For the special case where ''r'' is an integer, the negative binomial distribution is known as the Pascal distribution. It is the probability distribution of a certain number of failures and successes in a series of independent and identically distributed Bernoulli trials. For ''k'' + ''r'' Bernoulli trials with success probability ''p'', the negative binomial gives the probability of ''k'' successes and ''r'' failures, with a failure on the last trial. In other words, the negative binomial distribution is the probability distribution of the number of successes before the ''r''th failure in a Bernoulli process, with probability ''p'' of success on each trial. A Bernoulli process is a discrete-time process, and so the numbers of trials, failures, and successes are integers.

Consider the following example. Suppose we repeatedly throw a die, and consider a 1 to be a failure. The probability of success on each trial is 5/6. The number of successes before the third failure belongs to the infinite set {0, 1, 2, 3, ...}. That number of successes is a negative-binomially distributed random variable.

When ''r'' = 1 we get the probability distribution of the number of successes before the first failure (i.e. the probability of the first failure occurring on the (''k'' + 1)st trial), which is a geometric distribution:
: f(k; 1, p) = (1-p) \cdot p^k \!
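The die example can be made concrete. The sketch below evaluates the Pascal pmf C(''k'' + ''r'' − 1, ''k'') ''p''<sup>''k''</sup>(1 − ''p'')<sup>''r''</sup> with exact fractions for ''p'' = 5/6 and ''r'' = 3, checks that ''r'' = 1 gives the geometric pmf, and compares the pmf with a seeded simulation of die rolls. The seed, the number of repetitions and the function names are arbitrary choices for illustration.

```python
import random
from fractions import Fraction
from math import comb


def pascal_pmf(k, r, p):
    """P(k successes before the r-th failure), success probability p, integer r >= 1."""
    return comb(k + r - 1, k) * p**k * (1 - p) ** r


def successes_before_rth_failure(r, rng):
    """Roll a fair die until the r-th 1 (a failure); count the other rolls (successes)."""
    successes = failures = 0
    while failures < r:
        if rng.randint(1, 6) == 1:
            failures += 1
        else:
            successes += 1
    return successes


p = Fraction(5, 6)
print(pascal_pmf(0, 3, p))  # 1/216: three 1s in a row
print(pascal_pmf(2, 3, p))  # 25/1296

# r = 1 reduces to the geometric pmf (1 - p) p^k
geometric_ok = all(pascal_pmf(k, 1, p) == (1 - p) * p**k for k in range(20))

# Compare with simulated die rolls
rng = random.Random(1)
reps = 50_000
counts = {}
for _ in range(reps):
    k = successes_before_rth_failure(3, rng)
    counts[k] = counts.get(k, 0) + 1
max_gap = max(abs(counts.get(k, 0) / reps - float(pascal_pmf(k, 3, p))) for k in range(21))
print(geometric_ok, max_gap)
```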


Overdispersed Poisson

The negative binomial distribution, especially in its alternative parameterization described above, can be used as an alternative to the Poisson distribution. It is especially useful for discrete data over an unbounded positive range whose sample variance exceeds the sample mean. In such cases, the observations are overdispersed with respect to a Poisson distribution, for which the mean is equal to the variance. Hence a Poisson distribution is not an appropriate model. Since the negative binomial distribution has one more parameter than the Poisson, the second parameter can be used to adjust the variance independently of the mean. See Cumulants of some discrete probability distributions. An application of this is to annual counts of tropical cyclones in the North Atlantic or to monthly to 6-monthly counts of wintertime extratropical cyclones over Europe, for which the variance is greater than the mean. In the case of modest overdispersion, this may produce substantially similar results to an overdispersed Poisson distribution. The negative binomial distribution is also commonly used to model data in the form of discrete sequence read counts from high-throughput RNA and DNA sequencing experiments.
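To illustrate how the extra parameter decouples the variance from the mean: in this article's convention NB(''r'', ''p'') has mean ''m'' = ''rp''/(1 − ''p'') and variance ''m''/(1 − ''p'') = ''m'' + ''m''<sup>2</sup>/''r'', which always exceeds ''m''. Matching the sample mean ''m'' and variance ''v'' (the method of moments, used here only as an illustration; it is not the maximum likelihood estimator discussed above) gives ''p'' = 1 − ''m''/''v'' and ''r'' = ''m''<sup>2</sup>/(''v'' − ''m''). The count data and helper names in the sketch are invented.

```python
from statistics import mean, pvariance


def nb_mean_var(r, p):
    """Mean and variance of NB(r, p), counting successes (probability p) before the r-th failure."""
    m = r * p / (1 - p)
    return m, m / (1 - p)  # variance = m + m**2 / r, always larger than m


def nb_method_of_moments(counts):
    """Illustrative fit: choose (r, p) so the NB mean and variance match the sample's."""
    m, v = mean(counts), pvariance(counts)
    if v <= m:
        raise ValueError("counts are not overdispersed (variance <= mean)")
    p = 1 - m / v        # from variance / mean = 1 / (1 - p)
    r = m * m / (v - m)  # from variance = m + m**2 / r
    return r, p


counts = [0, 0, 1, 5, 10]  # invented counts: mean 3.2, population variance 14.96
r_fit, p_fit = nb_method_of_moments(counts)
print(r_fit, p_fit, nb_mean_var(r_fit, p_fit))  # fitted NB reproduces mean 3.2, variance 14.96
```

A Poisson fit to the same counts would be forced to use variance 3.2; the negative binomial absorbs the excess through the small value of ''r''.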


History

This distribution was first studied in 1713 by Montmort, as the distribution of the number of trials required in an experiment to obtain a given number of successes. (Montmort PR de (1713) Essai d'analyse sur les jeux de hasard. 2nd ed. Quillau, Paris) It had previously been mentioned by Pascal. (Pascal B (1679) Varia Opera Mathematica. D. Petri de Fermat. Tolosae)


See also

* Coupon collector's problem
* Beta negative binomial distribution
* Extended negative binomial distribution
* Negative multinomial distribution
* Binomial distribution
* Poisson distribution
* Compound Poisson distribution
* Exponential family
* Negative binomial regression
* Vector generalized linear model


References
