Coupon Collector Problem
   HOME

TheInfoList



OR:

In
probability theory Probability theory is the branch of mathematics concerned with probability. Although there are several different probability interpretations, probability theory treats the concept in a rigorous mathematical manner by expressing it through a set o ...
, the coupon collector's problem describes "collect all coupons and win" contests. It asks the following question: If each box of a brand of cereals contains a coupon, and there are ''n'' different types of coupons, what is the probability that more than ''t'' boxes need to be bought to collect all ''n'' coupons? An alternative statement is: Given ''n'' coupons, how many coupons do you
expect Expect is an extension to the Tcl scripting language written by Don Libes. The program automates interactions with programs that expose a text terminal interface. Expect, originally written in 1990 for the Unix platform, has since become avail ...
you need to draw with replacement before having drawn each coupon at least once? The mathematical analysis of the problem reveals that the
expected number In probability theory, the expected value (also called expectation, expectancy, mathematical expectation, mean, average, or first moment) is a generalization of the weighted average. Informally, the expected value is the arithmetic mean of a ...
of trials needed grows as \Theta(n\log(n)). For example, when ''n'' = 50 it takes about 225E(50) = 50(1 + 1/2 + 1/3 + ... + 1/50) = 224.9603, the expected number of trials to collect all 50 coupons. The approximation n\log n+\gamma n+1/2 for this expected number gives in this case 50\log 50+50\gamma+1/2 \approx 195.6011+28.8608+0.5\approx 224.9619. trials on average to collect all 50 coupons.


Solution


Calculating the expectation

Let time ''T'' be the number of draws needed to collect all ''n'' coupons, and let ''ti'' be the time to collect the ''i''-th coupon after ''i'' − 1 coupons have been collected. Then T=t_1 + \cdots + t_n. Think of ''T'' and ''ti'' as
random variable A random variable (also called random quantity, aleatory variable, or stochastic variable) is a mathematical formalization of a quantity or object which depends on random events. It is a mapping or a function from possible outcomes (e.g., the po ...
s. Observe that the probability of collecting a coupon is p_i = \frac = \frac. Therefore, t_i has
geometric distribution In probability theory and statistics, the geometric distribution is either one of two discrete probability distributions: * The probability distribution of the number ''X'' of Bernoulli trials needed to get one success, supported on the set \; * ...
with expectation \frac = \frac. By the linearity of expectations we have: : \begin \operatorname(T) & = \operatorname(t_1 + t_2 + \cdots + t_n) \\ & = \operatorname(t_1) + \operatorname(t_2) + \cdots + \operatorname(t_n) \\ & = \frac + \frac + \cdots + \frac \\ & = \frac + \frac + \cdots + \frac \\ & = n \cdot \left(\frac + \frac + \cdots + \frac\right) \\ & = n \cdot H_n. \end Here ''Hn'' is the ''n''-th
harmonic number In mathematics, the -th harmonic number is the sum of the reciprocals of the first natural numbers: H_n= 1+\frac+\frac+\cdots+\frac =\sum_^n \frac. Starting from , the sequence of harmonic numbers begins: 1, \frac, \frac, \frac, \frac, \dot ...
. Using the
asymptotics In mathematical analysis, asymptotic analysis, also known as asymptotics, is a method of describing limiting behavior. As an illustration, suppose that we are interested in the properties of a function as becomes very large. If , then as bec ...
of the harmonic numbers, we obtain: : \operatorname(T) = n \cdot H_n = n \log n + \gamma n + \frac + O(1/n), where \gamma \approx 0.5772156649 is the
Euler–Mascheroni constant Euler's constant (sometimes also called the Euler–Mascheroni constant) is a mathematical constant usually denoted by the lowercase Greek letter gamma (). It is defined as the limiting difference between the harmonic series and the natural l ...
. Using the
Markov inequality In probability theory, Markov's inequality gives an upper bound for the probability that a non-negative function (mathematics), function of a random variable is greater than or equal to some positive Constant (mathematics), constant. It is named a ...
to bound the desired probability: :\operatorname(T \geq cn H_n) \le \frac. The above can be modified slightly to handle the case when we've already collected some of the coupons. Let ''k'' be the number of coupons already collected, then: : \begin \operatorname(T_k) & = \operatorname(t_ + t_ + \cdots + t_n) \\ & = n \cdot \left(\frac + \frac + \cdots + \frac\right) \\ & = n \cdot H_ \end And when k=0 then we get the original result.


Calculating the variance

Using the independence of random variables ''ti'', we obtain: : \begin \operatorname(T)& = \operatorname(t_1 + \cdots + t_n) \\ & = \operatorname(t_1) + \operatorname(t_2) + \cdots + \operatorname(t_n) \\ & = \frac + \frac + \cdots + \frac \\ & < \left(\frac + \frac + \cdots + \frac\right) \\ & = n^2 \cdot \left(\frac + \frac + \cdots + \frac \right) \\ & < \frac n^2 \end since \frac6=\frac+\frac+\cdots+\frac+\cdots (see
Basel problem The Basel problem is a problem in mathematical analysis with relevance to number theory, concerning an infinite sum of inverse squares. It was first posed by Pietro Mengoli in 1650 and solved by Leonhard Euler in 1734, and read on 5 December 1735 ...
). Bound the desired probability using the
Chebyshev inequality In probability theory, Chebyshev's inequality (also called the Bienaymé–Chebyshev inequality) guarantees that, for a wide class of probability distributions, no more than a certain fraction of values can be more than a certain distance from t ...
: :\operatorname\left(, T- n H_n, \geq cn\right) \le \frac.


Tail estimates

A stronger tail estimate for the upper tail be obtained as follows. Let _i^r denote the event that the i-th coupon was not picked in the first r trials. Then : \begin P\left _i^r \right = \left(1-\frac\right)^r \le e^. \end Thus, for r = \beta n \log n, we have P\left _i^r \right \le e^ = n^. Via a union bound over the n coupons, we obtain : \begin P\left T > \beta n \log n \right = P \left \bigcup_i _i^ \right \le n \cdot P _1^ \le n^. \end


Extensions and generalizations

*
Pierre-Simon Laplace Pierre-Simon, marquis de Laplace (; ; 23 March 1749 – 5 March 1827) was a French scholar and polymath whose work was important to the development of engineering, mathematics, statistics, physics, astronomy, and philosophy. He summarized ...
, but also
Paul Erdős Paul Erdős ( hu, Erdős Pál ; 26 March 1913 – 20 September 1996) was a Hungarian mathematician. He was one of the most prolific mathematicians and producers of mathematical conjectures of the 20th century. pursued and proposed problems in ...
and
Alfréd Rényi Alfréd Rényi (20 March 1921 – 1 February 1970) was a Hungarian mathematician known for his work in probability theory, though he also made contributions in combinatorics, graph theory, and number theory. Life Rényi was born in Budapest to ...
, proved the limit theorem for the distribution of ''T''. This result is a further extension of previous bounds. A proof is found in. ::\operatorname(T < n\log n + cn) \to e^, \text n \to \infty. *
Donald J. Newman Donald Joseph (D. J.) Newman (July 27, 1930 – March 28, 2007) was an American mathematician. He gave simple proofs of the prime number theorem and the Hardy-Ramanujan partition formula. He excelled on multiple occasions at the annual Putnam com ...
and
Lawrence Shepp Lawrence Alan Shepp (September 9, 1936 Brooklyn, NY – April 23, 2013, Tucson, AZ) was an American mathematician, specializing in statistics and computational tomography. Shepp obtained his PhD from Princeton University in 1961 with a disserta ...
gave a generalization of the coupon collector's problem when ''m'' copies of each coupon need to be collected. Let ''Tm'' be the first time ''m'' copies of each coupon are collected. They showed that the expectation in this case satisfies: ::\operatorname(T_m) = n \log n + (m-1) n \log\log n + O(n), \text n \to \infty. :Here ''m'' is fixed. When ''m'' = 1 we get the earlier formula for the expectation. * Common generalization, also due to Erdős and Rényi: ::\operatorname\left(T_m < n\log n + (m-1) n \log\log n + cn\right) \to e^, \text n \to \infty. * In the general case of a nonuniform probability distribution, according to
Philippe Flajolet Philippe Flajolet (; 1 December 1948 – 22 March 2011) was a French computer scientist. Biography A former student of École Polytechnique, Philippe Flajolet received his PhD in computer science from University Paris Diderot in 1973 and state ...
et al. ::\operatorname(T)=\int_0^\infty \left(1 - \prod_^m \left(1-e^\right)\right)dt. :This is equal to ::\operatorname(T)=\sum_^ (-1)^ \sum_ \frac, :where ''m'' denotes the number of coupons to be collected and ''PJ'' denotes the probability of getting any coupon in the set of coupons ''J''.


See also

*
McDonald's Monopoly The McDonald's Monopoly game is a sales promotion run by fast food restaurant chain McDonald's, with a theme based on the Hasbro board game ''Monopoly''. The game first ran in the U.S. in 1987 and has since been used worldwide. The promotion has ...
– an example of the coupon collector's problem that further increases the challenge by making some coupons of the set rarer *
Watterson estimator In population genetics, the Watterson estimator is a method for describing the genetic diversity in a population. It was developed by Margaret Wu and G. A. Watterson in the 1970s. It is estimated by counting the number of polymorphic sites. It is a ...
*
Birthday problem In probability theory, the birthday problem asks for the probability that, in a set of randomly chosen people, at least two will share a birthday. The birthday paradox is that, counterintuitively, the probability of a shared birthday exceeds 5 ...


Notes


References

*. *. *. *. * *. *. *. {{refend


External links

*
Coupon Collector Problem
by Ed Pegg, Jr., the
Wolfram Demonstrations Project The Wolfram Demonstrations Project is an organized, open-source collection of small (or medium-size) interactive programs called Demonstrations, which are meant to visually and interactively represent ideas from a range of fields. It is hos ...
. Mathematica package. *
How Many Singles, Doubles, Triples, Etc., Should The Coupon Collector Expect?
', a short note by
Doron Zeilberger Doron Zeilberger (דורון ציילברגר, born 2 July 1950 in Haifa, Israel) is an Israeli mathematician, known for his work in combinatorics. Education and career He received his doctorate from the Weizmann Institute of Science in 1976, ...
. Articles containing proofs Gambling mathematics Probability theorems Probability problems