In probability theory and statistics, the negative hypergeometric distribution describes probabilities for sampling without replacement from a finite population in which each element can be classified into one of two mutually exclusive categories, such as Pass/Fail or Employed/Unemployed. As random selections are made, each draw shrinks the remaining population, so the probability of success changes from draw to draw. Unlike the standard hypergeometric distribution, which describes the number of successes in a sample of fixed size, the negative hypergeometric distribution describes sampling that continues until r failures have been found, and gives the probability of finding k successes in such a sample. In other words, it describes the likelihood of k successes in a sample that ends with the r-th failure and therefore contains exactly r failures.


Definition

There are N elements, of which K are defined as "successes" and the rest are "failures". Elements are drawn one after the other, without replacement, until r failures are encountered. Then the drawing stops and the number k of successes is counted. The negative hypergeometric distribution, NHG_{N,K,r}(k), is the discrete distribution of this k (see "Negative hypergeometric distribution" in the Encyclopedia of Mathematics).
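The drawing procedure described above can be sketched as a small simulation. This is only an illustration: the function name `draw_until_r_failures`, the parameter check, and the example values N = 10, K = 4, r = 2 are assumptions chosen here, not part of the definition.

```python
import random

def draw_until_r_failures(N, K, r, rng):
    """Simulate one experiment and return the number of successes
    drawn before the r-th failure (hypothetical helper)."""
    if not (0 <= K <= N and 1 <= r <= N - K):
        raise ValueError("need 0 <= K <= N and 1 <= r <= N - K")
    population = [True] * K + [False] * (N - K)  # True marks a success
    rng.shuffle(population)  # a uniformly random order of draws without replacement
    successes = failures = 0
    for is_success in population:
        if is_success:
            successes += 1
        else:
            failures += 1
            if failures == r:  # stop as soon as the r-th failure appears
                break
    return successes

rng = random.Random(0)
samples = [draw_until_r_failures(10, 4, 2, rng) for _ in range(20000)]
print(sum(samples) / len(samples))  # empirical mean number of successes
```

Every simulated value lies between 0 and K, because the experiment can never draw more successes than the population contains.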
The negative hypergeometric distribution is a special case of the beta-binomial distribution with parameters \alpha = r and \beta = N-K-r+1, both integers (and n = K).

The outcome requires that we observe k successes in the first (k+r-1) draws and that the (k+r)-th draw be a failure. The probability of the former follows directly from the hypergeometric distribution, HG_{N,K,k+r-1}(k), and the probability of the latter is simply the number of failures remaining, N-K-(r-1), divided by the size of the remaining population, N-(k+r-1). The probability of having exactly k successes up to the r-th failure (i.e. the drawing stops as soon as the sample includes the predefined number of r failures) is then the product of these two probabilities:

\frac{\binom{K}{k} \binom{N-K}{r-1}}{\binom{N}{k+r-1}} \cdot \frac{N-K-(r-1)}{N-(k+r-1)} = \frac{\binom{k+r-1}{k} \binom{N-r-k}{K-k}}{\binom{N}{K}}.

Therefore, a random variable X follows the negative hypergeometric distribution if its probability mass function (pmf) is given by

f(k; N, K, r) \equiv \Pr(X = k) = \frac{\binom{k+r-1}{k} \binom{N-r-k}{K-k}}{\binom{N}{K}} \quad \text{for } k = 0, 1, 2, \dotsc, K

where
* N is the population size,
* K is the number of success states in the population,
* r is the number of failures,
* k is the number of observed successes,
* \binom{a}{b} is a binomial coefficient.
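As a minimal sketch, the pmf can be evaluated exactly with Python's `math.comb` and `fractions.Fraction`, and compared against the two-step argument (a hypergeometric probability for the first k+r-1 draws times the probability that the next draw is a failure). The function names and the example parameters N = 10, K = 4, r = 2 are assumptions made for this illustration.

```python
from fractions import Fraction
from math import comb

def nhg_pmf(k, N, K, r):
    """Pr(X = k): k successes drawn before the r-th failure."""
    if not 0 <= k <= K:
        return Fraction(0)
    return Fraction(comb(k + r - 1, k) * comb(N - r - k, K - k), comb(N, K))

def nhg_pmf_two_step(k, N, K, r):
    """Same probability built from the two-step argument."""
    # k successes and r-1 failures among the first k+r-1 draws
    first = Fraction(comb(K, k) * comb(N - K, r - 1), comb(N, k + r - 1))
    # the (k+r)-th draw is one of the remaining failures
    last = Fraction(N - K - (r - 1), N - (k + r - 1))
    return first * last

N, K, r = 10, 4, 2
for k in range(K + 1):
    assert nhg_pmf(k, N, K, r) == nhg_pmf_two_step(k, N, K, r)
    print(k, nhg_pmf(k, N, K, r))
```

For N = 10, K = 4, r = 2 the probabilities for k = 0, ..., 4 are 1/3, 1/3, 3/14, 2/21 and 1/42. Exact rational arithmetic is used so that the two expressions can be compared with `==` rather than with a floating-point tolerance.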
By design the probabilities sum to 1. To show this explicitly:

\sum_{k=0}^{K} \Pr(X=k) = \sum_{k=0}^{K} \frac{\binom{k+r-1}{k} \binom{N-r-k}{K-k}}{\binom{N}{K}} = \frac{1}{\binom{N}{K}} \sum_{k=0}^{K} \binom{k+r-1}{k} \binom{N-r-k}{K-k} = \frac{1}{\binom{N}{K}} \binom{N}{K} = 1,

where we have used that

\begin{aligned}
\sum_{j=0}^{k} \binom{j+m}{j} \binom{n-m-j}{k-j} &= \sum_{j=0}^{k} (-1)^{j} \binom{-m-1}{j} (-1)^{k-j} \binom{k-n+m-1}{k-j} \\
&= (-1)^{k} \binom{k-n-2}{k} = (-1)^{k} \binom{k-(n+1)-1}{k} = \binom{n+1}{k},
\end{aligned}

which can be derived using the binomial identity \binom{n}{k} = (-1)^{k} \binom{k-n-1}{k} and the Chu–Vandermonde identity, \sum_{j=0}^{k} \binom{m}{j} \binom{n-m}{k-j} = \binom{n}{k}, which holds for any complex values m and n and any non-negative integer k. The normalization sum is the case m = r-1, n = N-1 of this relationship, with K in place of k and k in place of j.

The relationship \sum_{j=0}^{k} \binom{j+m}{j} \binom{n-m-j}{k-j} = \binom{n+1}{k} can also be found by examining the coefficient of x^k in the expansion of \frac{1}{(1-x)^{m+1}} \frac{1}{(1-x)^{n-m-k+1}} = \frac{1}{(1-x)^{n-k+2}}, using Newton's binomial series.
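The summation relationship used for the normalization, \sum_{j=0}^{k} \binom{j+m}{j} \binom{n-m-j}{k-j} = \binom{n+1}{k}, can be spot-checked numerically. The sketch below is not a proof; it only tests non-negative integers with n >= m + k (so that every binomial coefficient has non-negative arguments), and the helper name `convolution_sum` is an assumption made here.

```python
from math import comb

def convolution_sum(m, n, k):
    """Left-hand side: sum over j of C(j+m, j) * C(n-m-j, k-j).
    Assumes integers with n >= m + k so all arguments are non-negative."""
    return sum(comb(j + m, j) * comb(n - m - j, k - j) for j in range(k + 1))

# Spot-check the relationship on a small grid of integer parameters.
for m in range(5):
    for k in range(5):
        for n in range(m + k, m + k + 6):
            assert convolution_sum(m, n, k) == comb(n + 1, k)

# Normalization of the pmf is the case m = r-1, n = N-1, k = K.
N, K, r = 10, 4, 2
assert convolution_sum(r - 1, N - 1, K) == comb(N, K)
print(convolution_sum(r - 1, N - 1, K), comb(N, K))
```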


Expectation

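When counting the number k of successes before r failures, the expected number of successes is \frac{rK}{N-K+1}, which can be derived as follows.

\begin{aligned}
E[X] &= \sum_{k=0}^{K} k \Pr(X=k) = \sum_{k=0}^{K} k \frac{\binom{k+r-1}{k} \binom{N-r-k}{K-k}}{\binom{N}{K}} = \frac{r}{\binom{N}{K}} \left[ \sum_{k=0}^{K} \frac{(k+r)}{r} \binom{k+r-1}{r-1} \binom{N-r-k}{K-k} \right] - r \\
&= \frac{r}{\binom{N}{K}} \left[ \sum_{k=0}^{K} \binom{k+r}{r} \binom{N-r-k}{K-k} \right] - r = \frac{r}{\binom{N}{K}} \left[ \sum_{k=0}^{K} \binom{k+r}{k} \binom{N-r-k}{K-k} \right] - r \\
&= \frac{r}{\binom{N}{K}} \left[ \binom{N+1}{K} \right] - r = \frac{rK}{N-K+1},
\end{aligned}

where we have used the relationship \sum_{j=0}^{k} \binom{j+m}{j} \binom{n-m-j}{k-j} = \binom{n+1}{k} (here with m = r and n = N), which was derived above to show that the negative hypergeometric distribution is properly normalized.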
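The closed form rK/(N-K+1) for the expected number of successes can be compared with a direct sum over the pmf \binom{k+r-1}{k} \binom{N-r-k}{K-k} / \binom{N}{K}. The following is a minimal sketch in exact arithmetic; the function names and the parameter triples are arbitrary choices made for illustration.

```python
from fractions import Fraction
from math import comb

def nhg_pmf(k, N, K, r):
    """Pr(X = k) for k = 0, ..., K."""
    return Fraction(comb(k + r - 1, k) * comb(N - r - k, K - k), comb(N, K))

def nhg_mean(N, K, r):
    """E[X] computed directly as the sum of k * Pr(X = k)."""
    return sum(k * nhg_pmf(k, N, K, r) for k in range(K + 1))

# Each triple satisfies 1 <= r <= N - K.
for N, K, r in [(10, 4, 2), (20, 7, 5), (3, 1, 1)]:
    assert nhg_mean(N, K, r) == Fraction(r * K, N - K + 1)
    print((N, K, r), nhg_mean(N, K, r))
```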


Variance

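The variance can be derived by the following calculation.

\begin{aligned}
E[X^2] &= \sum_{k=0}^{K} k^2 \Pr(X=k) = \left[ \sum_{k=0}^{K} (k+r)(k+r+1) \Pr(X=k) \right] - (2r+1) E[X] - r^2 - r \\
&= \frac{r(r+1)}{\binom{N}{K}} \left[ \sum_{k=0}^{K} \binom{k+r+1}{r+1} \binom{N+1-(r+1)-k}{K-k} \right] - (2r+1) E[X] - r^2 - r \\
&= \frac{r(r+1)}{\binom{N}{K}} \left[ \binom{N+2}{K} \right] - (2r+1) E[X] - r^2 - r = \frac{rK(N-r+Kr+1)}{(N-K+1)(N-K+2)}
\end{aligned}

Then the variance is

\textrm{Var}[X] = E[X^2] - \left( E[X] \right)^2 = \frac{rK(N+1)(N-K-r+1)}{(N-K+1)^2 (N-K+2)}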
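As a hedged numerical check, the second moment rK(N-r+Kr+1)/((N-K+1)(N-K+2)) and the variance rK(N+1)(N-K-r+1)/((N-K+1)^2 (N-K+2)) can be compared with values computed directly from the pmf. The helper names and test parameters below are assumptions made for this sketch.

```python
from fractions import Fraction
from math import comb

def nhg_pmf(k, N, K, r):
    """Pr(X = k) for k = 0, ..., K."""
    return Fraction(comb(k + r - 1, k) * comb(N - r - k, K - k), comb(N, K))

def nhg_moments(N, K, r):
    """Return (E[X], E[X^2], Var[X]) by direct summation over the support."""
    probs = [nhg_pmf(k, N, K, r) for k in range(K + 1)]
    mean = sum(k * p for k, p in enumerate(probs))
    second = sum(k * k * p for k, p in enumerate(probs))
    return mean, second, second - mean ** 2

for N, K, r in [(10, 4, 2), (20, 7, 5), (3, 1, 1)]:
    mean, second, var = nhg_moments(N, K, r)
    a = N - K + 1  # shorthand for the recurring denominator term
    assert second == Fraction(r * K * (N - r + K * r + 1), a * (a + 1))
    assert var == Fraction(r * K * (N + 1) * (N - K - r + 1), a * a * (a + 1))
    print((N, K, r), second, var)
```

For N = 10, K = 4, r = 2 this gives a second moment of 17/7 and a variance of 55/49.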


Related distributions

If the drawing stops after a constant number n of draws (regardless of the number of failures), then the number of successes has the hypergeometric distribution, HG_{N,K,n}(k). The two functions are related in the following way, where both sides are read as cumulative distribution functions (the probability of at most k successes before the r-th failure, and the probability of at most r-1 failures in k+r draws):

NHG_{N,K,r}(k) = 1 - HG_{N,N-K,k+r}(r-1)

The negative hypergeometric distribution (like the hypergeometric distribution) deals with draws without replacement, so that the probability of success is different in each draw. In contrast, the negative binomial distribution (like the binomial distribution) deals with draws with replacement, so that the probability of success is the same in each draw and the trials are independent. The four distributions related to drawing items can be summarized as follows:
* with replacement, number of successes in a constant number of draws: binomial distribution,
* without replacement, number of successes in a constant number of draws: hypergeometric distribution,
* with replacement, number of successes before a constant number of failures: negative binomial distribution,
* without replacement, number of successes before a constant number of failures: negative hypergeometric distribution.

Some authors (Khan, R. A. (1994). A note on the generating function of a negative hypergeometric distribution. Sankhya: The Indian Journal of Statistics B, 56(3), 309-313) define the negative hypergeometric distribution to be the number of draws required to get the r-th failure. If we let Y denote this number, then Y = X + r, where X is as defined above. Hence the pmf is

\Pr(Y=y) = \binom{y-1}{r-1} \frac{\binom{N-y}{N-K-r}}{\binom{N}{N-K}}.

Denoting the number of failures N-K by M, this becomes

\Pr(Y=y) = \binom{y-1}{r-1} \frac{\binom{N-y}{M-r}}{\binom{N}{M}}.

The support of Y is the set \{r, r+1, \dots, N-M+r\}. It follows that E[Y] = E[X] + r = \frac{r(N+1)}{N-K+1} and that \textrm{Var}[X] = \textrm{Var}[Y].
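The two relationships in this section can be checked numerically under stated assumptions. The sketch assumes that the relation between NHG and HG is meant for cumulative probabilities, with the hypergeometric side counting failures (N-K marked items) in k+r draws, and that the draw-count variable Y = X + r has pmf \binom{y-1}{r-1} \binom{N-y}{M-r} / \binom{N}{M} with M = N-K. All function names are hypothetical.

```python
from fractions import Fraction
from math import comb

def nhg_pmf(k, N, K, r):
    """Pr(X = k): k successes before the r-th failure."""
    return Fraction(comb(k + r - 1, k) * comb(N - r - k, K - k), comb(N, K))

def hg_pmf(k, N, K, n):
    """Hypergeometric: k marked items in n draws from N items, K of them marked."""
    return Fraction(comb(K, k) * comb(N - K, n - k), comb(N, n))

def draws_pmf(y, N, M, r):
    """Pr(Y = y): y draws are needed to reach the r-th of M failures."""
    return Fraction(comb(y - 1, r - 1) * comb(N - y, M - r), comb(N, M))

N, K, r = 10, 4, 2
M = N - K  # number of failures in the population

# Cumulative relation: Pr(X <= k) = 1 - Pr(at most r-1 failures in k+r draws).
for k in range(K + 1):
    nhg_cdf = sum(nhg_pmf(i, N, K, r) for i in range(k + 1))
    hg_cdf = sum(hg_pmf(j, N, M, k + r) for j in range(r))
    assert nhg_cdf == 1 - hg_cdf

# Shifted variable: Pr(Y = y) equals Pr(X = y - r) on the support r, ..., N-M+r.
for y in range(r, N - M + r + 1):
    assert draws_pmf(y, N, M, r) == nhg_pmf(y - r, N, K, r)

mean_y = sum(y * draws_pmf(y, N, M, r) for y in range(r, N - M + r + 1))
assert mean_y == Fraction(r * (N + 1), N - K + 1)
print(mean_y)
```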

