In probability theory, the birthday problem asks for the probability that, in a set of $n$ randomly chosen people, at least two will share a birthday. The birthday paradox is that, counterintuitively, the probability of a shared birthday exceeds 50% in a group of only 23 people.
The birthday paradox is a veridical paradox: it appears wrong, but is in fact true. While it may seem surprising that only 23 individuals are required to reach a 50% probability of a shared birthday, this result is made more intuitive by considering that the comparisons of birthdays will be made between every possible pair of individuals. With 23 individuals, there are (23 × 22) / 2 = 253 pairs to consider, much more than half the number of days in a year.
Real-world applications for the birthday problem include a cryptographic attack called the birthday attack, which uses this probabilistic model to reduce the complexity of finding a collision for a hash function, as well as calculating the approximate risk of a hash collision existing within the hashes of a given size of population.
The problem is generally attributed to Harold Davenport in about 1927, though he did not publish it at the time. Davenport did not claim to be its discoverer "because he could not believe that it had not been stated earlier". The first publication of a version of the birthday problem was by Richard von Mises in 1939.
Calculating the probability
From a permutations perspective, let $A$ be the event that a group of 23 people has no repeated birthdays, and let $B$ be the event that at least two people in the group share the same birthday, so that $P(B) = 1 - P(A)$. $P(A)$ is the ratio of $V_{nr}$, the number of possible birthday assignments without repetition where order matters (e.g. for a group of 2 people, in mm/dd birthday format, one possible outcome is (01/02, 05/20)), to $V_t$, the number of birthday assignments with repetition where order matters, which is the total space of outcomes of the experiment (e.g. for 2 people, one possible outcome is (01/02, 01/02)). Therefore $V_{nr}$ and $V_t$ are permutations.
: $$V_{nr} = \frac{365!}{(365-23)!}, \qquad V_t = 365^{23}, \qquad P(A) = \frac{V_{nr}}{V_t} \approx 0.492703, \qquad P(B) = 1 - P(A) \approx 0.507297$$
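The ratio described above can be checked numerically. The following is a minimal sketch, assuming 365 equally likely birthdays and a group of 23; the names `v_nr` and `v_t` are illustrative labels for the counts without and with repetition, and `math.perm` requires Python 3.8 or later.

```python
import math

DAYS = 365    # assumed number of equally likely birthdays
GROUP = 23    # group size used in the text

# Ordered selections of GROUP distinct birthdays (no repetition).
v_nr = math.perm(DAYS, GROUP)
# Ordered selections with repetition allowed: the full outcome space.
v_t = DAYS ** GROUP

p_no_match = v_nr / v_t      # probability that all birthdays differ
p_match = 1 - p_no_match     # probability of at least one shared birthday

print(f"no match: {p_no_match:.6f}, match: {p_match:.6f}")
# prints "no match: 0.492703, match: 0.507297"
```

Both counts are exact Python integers, so the only rounding happens in the final division.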
Another way the birthday problem can be solved is by asking for an approximate probability that in a group of $n$ people at least two have the same birthday. For simplicity, leap years, twins, selection bias, and seasonal and weekly variations in birth rates are generally disregarded, and instead it is assumed that there are 365 possible birthdays, and that each person's birthday is equally likely to be any of these days, independent of the other people in the group. For independent birthdays, the uniform distribution on birthdays is the distribution that minimizes the probability of two people with the same birthday; any unevenness increases this probability. The problem of a non-uniform number of births occurring during each day of the year was first addressed by Murray Klamkin in 1967. As it happens, the real-world distribution yields a critical size of 23 to reach 50%.
The goal is to compute $P(B)$, the probability that at least two people in the room have the same birthday. However, it is simpler to calculate $P(A)$, the probability that no two people in the room have the same birthday. Then, because $A$ and $B$ are the only two possibilities and are also mutually exclusive, $P(B) = 1 - P(A)$.
Here is the calculation of $P(A)$ for 23 people. Let the 23 people be numbered 1 to 23. The event that all 23 people have different birthdays is the same as the event that person 2 does not have the same birthday as person 1, and that person 3 does not have the same birthday as either person 1 or person 2, and so on, and finally that person 23 does not have the same birthday as any of persons 1 through 22. Let these events be called Event 2, Event 3, and so on. Event 1 is the event of person 1 having a birthday, which occurs with probability 1. This conjunction of events may be computed using conditional probability: the probability of Event 2 is 364/365, as person 2 may have any birthday other than the birthday of person 1. Similarly, the probability of Event 3 given that Event 2 occurred is 363/365, as person 3 may have any of the birthdays not already taken by persons 1 and 2. This continues until finally the probability of Event 23 given that all preceding events occurred is 343/365. Finally, the principle of conditional probability implies that $P(A)$ is equal to the product of these individual probabilities:
: $$P(A) = \frac{365}{365} \times \frac{364}{365} \times \frac{363}{365} \times \cdots \times \frac{343}{365} \qquad (1)$$
The terms of equation (1) can be collected to arrive at:
: $$P(A) = \left(\frac{1}{365}\right)^{23} \times (365 \times 364 \times 363 \times \cdots \times 343) \qquad (2)$$
Evaluating equation (2) gives $P(A) \approx 0.492703$.
Therefore, $P(B) \approx 1 - 0.492703 = 0.507297$ (50.7297%).
This process can be generalized to a group of $n$ people, where $p(n)$ is the probability of at least two of the $n$ people sharing a birthday. It is easier to first calculate the probability $\bar{p}(n)$ that all $n$ birthdays are ''different''. According to the pigeonhole principle, $\bar{p}(n)$ is zero when $n > 365$. When $n \leq 365$:
: $$\bar{p}(n) = 1 \times \left(1 - \frac{1}{365}\right) \times \left(1 - \frac{2}{365}\right) \times \cdots \times \left(1 - \frac{n-1}{365}\right) = \frac{365 \times 364 \times \cdots \times (365 - n + 1)}{365^n} = \frac{365!}{365^n (365 - n)!} = \frac{n! \cdot \binom{365}{n}}{365^n} = \frac{{}_{365}P_n}{365^n}$$
where $!$ is the factorial operator, $\binom{365}{n}$ is the binomial coefficient and ${}_{k}P_{r}$ denotes permutation.
The equation expresses the fact that the first person has no one to share a birthday, the second person cannot have the same birthday as the first ($364/365$), the third cannot have the same birthday as either of the first two ($363/365$), and in general the $n$th birthday cannot be the same as any of the $n - 1$ preceding birthdays.
The event of at least two of the $n$ persons having the same birthday is complementary to all $n$ birthdays being different. Therefore, its probability $p(n)$ is
: $$p(n) = 1 - \bar{p}(n).$$
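The general formula translates directly into a short loop. The sketch below is illustrative only: the function name `p_shared` and the default of 365 equally likely days are assumptions made for the example, not notation from the text.

```python
def p_shared(n: int, days: int = 365) -> float:
    """Probability that at least two of n people share a birthday."""
    if n > days:              # pigeonhole principle: a match is certain
        return 1.0
    p_distinct = 1.0
    for k in range(n):        # k earlier people occupy k distinct days
        p_distinct *= (days - k) / days
    return 1.0 - p_distinct

# Smallest group size whose match probability reaches 50%.
smallest = next(n for n in range(1, 367) if p_shared(n) >= 0.5)
print(smallest, round(p_shared(smallest), 6))   # prints "23 0.507297"
```

Multiplying the factors one at a time avoids the very large factorials in the closed form.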
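The following table shows the probability $p(n)$ for some other values of $n$ (for this table, the existence of leap years is ignored, and each birthday is assumed to be equally likely):
:   n      p(n)
:   1      0.0%
:   5      2.7%
:   10     11.7%
:   20     41.1%
:   23     50.7%
:   30     70.6%
:   40     89.1%
:   50     97.0%
:   60     99.4%
:   70     99.9%
:   366    100%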
Approximations
The Taylor series expansion of the exponential function (the constant $e \approx 2.718281828$)
: $$e^x = 1 + x + \frac{x^2}{2!} + \cdots$$
provides a first-order approximation for $e^x$ for $|x| \ll 1$:
: $$e^x \approx 1 + x.$$
To apply this approximation to the first expression derived for $\bar{p}(n)$, the probability that all $n$ birthdays are different, set $x = -\frac{a}{365}$. Thus,
: $$e^{-a/365} \approx 1 - \frac{a}{365}.$$
Then, replace $a$ with non-negative integers for each term in the formula of $\bar{p}(n)$ until $a = n - 1$, for example, when $a = 1$,
: $$e^{-1/365} \approx 1 - \frac{1}{365}.$$
The first expression derived for $\bar{p}(n)$ can be approximated as
: $$\bar{p}(n) \approx 1 \cdot e^{-1/365} \cdot e^{-2/365} \cdots e^{-(n-1)/365} = e^{-(1 + 2 + \cdots + (n-1))/365} = e^{-\frac{n(n-1)/2}{365}} = e^{-\frac{n(n-1)}{730}}.$$
Therefore,
: $$p(n) = 1 - \bar{p}(n) \approx 1 - e^{-\frac{n(n-1)}{730}}.$$
An even coarser approximation is given by
: $$p(n) \approx 1 - e^{-\frac{n^2}{730}},$$
which, as the graph illustrates, is still fairly accurate.
According to the approximation, the same approach can be applied to any number of "people" and "days". If rather than 365 days there are $d$, if there are $n$ persons, and if $n \ll d$, then using the same approach as above we achieve the result that if $p(n, d)$ is the probability that at least two out of $n$ people share the same birthday from a set of $d$ available days, then:
: $$p(n, d) \approx 1 - e^{-\frac{n(n-1)}{2d}} \approx 1 - e^{-\frac{n^2}{2d}}.$$
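To see how close the two exponential approximations come, they can be compared with the exact product. This is a rough sketch under the uniform-birthday assumption; the function names `p_exact`, `p_taylor` and `p_coarse` are made up for the illustration.

```python
import math

def p_exact(n: int, d: int = 365) -> float:
    """Exact match probability: 1 minus the product of (1 - k/d)."""
    p_distinct = 1.0
    for k in range(1, n):
        p_distinct *= 1 - k / d
    return 1 - p_distinct

def p_taylor(n: int, d: int = 365) -> float:
    """First-order approximation 1 - exp(-n(n-1)/(2d))."""
    return 1 - math.exp(-n * (n - 1) / (2 * d))

def p_coarse(n: int, d: int = 365) -> float:
    """Coarser approximation 1 - exp(-n^2/(2d))."""
    return 1 - math.exp(-n * n / (2 * d))

for n in (10, 23, 50):
    print(n, round(p_exact(n), 4), round(p_taylor(n), 4), round(p_coarse(n), 4))
```

For $n = 23$ the first-order form gives about 0.5000 and the coarser form about 0.5155, against an exact value of about 0.5073.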
A simple exponentiation
The probability of any two people not having the same birthday is $364/365$. In a room containing $n$ people, there are $\binom{n}{2} = \frac{n(n-1)}{2}$ pairs of people, i.e. $\binom{n}{2}$ events. The probability of no two people sharing the same birthday, $\bar{p}(n)$, can be approximated by assuming that these events are independent and hence by multiplying their probabilities together. In short, $364/365$ can be multiplied by itself $\binom{n}{2}$ times, which gives us
: $$\bar{p}(n) \approx \left(\frac{364}{365}\right)^{\binom{n}{2}}.$$
Since this is the probability of no one having the same birthday, the probability $p(n)$ of someone sharing a birthday is
: $$p(n) \approx 1 - \left(\frac{364}{365}\right)^{\binom{n}{2}}.$$
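The pairwise argument is easy to try out. The following is a small sketch that treats every pair as an independent event, as the paragraph assumes; `p_pairs` is a hypothetical helper name.

```python
import math

def p_pairs(n: int, d: int = 365) -> float:
    """Approximate match probability, treating all pairs as independent."""
    pairs = math.comb(n, 2)             # number of distinct pairs of people
    return 1 - ((d - 1) / d) ** pairs   # 1 - P(no pair matches)

print(math.comb(23, 2), round(p_pairs(23), 4))
```

For 23 people this uses 253 pairs and gives roughly 0.5005, slightly below the exact 0.5073 because the pair events are not truly independent.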
Poisson approximation
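Applying the Poisson approximation for the binomial on the group of 23 people, with $X$ the number of pairs that share a birthday,
: $$\operatorname{Poi}\left(\frac{\binom{23}{2}}{365}\right) = \operatorname{Poi}\left(\frac{253}{365}\right) \approx \operatorname{Poi}(0.6932)$$
so
: $$\Pr(X > 0) = 1 - \Pr(X = 0) \approx 1 - e^{-0.6932} \approx 1 - 0.499998 = 0.500002.$$
The result is over 50%, as in the previous descriptions. This approximation is the same as the one above based on the Taylor expansion that uses $e^x \approx 1 + x$.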
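The Poisson figures can be reproduced in a few lines. This is only a sketch of the arithmetic, assuming 365 equally likely days; `lam` stands for the Poisson mean, i.e. the expected number of matching pairs.

```python
import math

n, d = 23, 365
lam = math.comb(n, 2) / d       # expected number of matching pairs: 253/365
p_zero = math.exp(-lam)         # Poisson probability of zero matching pairs
p_at_least_one = 1 - p_zero     # probability of one or more matching pairs

print(round(lam, 4), round(p_at_least_one, 6))
```

The mean 253/365 is very close to $\ln 2 \approx 0.6931$, which is why the result lands so near one half.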
Square approximation
A good rule of thumb which can be used for mental calculation is the relation
: $$p(n) \approx \frac{n^2}{2m}$$
which can also be written as
: $$n \approx \sqrt{2m \times p(n)}$$
which works well for probabilities less than or equal to $\frac{1}{2}$. In these equations, $m$ is the number of days in a year.
For instance, to estimate the number of people required for a $\frac{1}{2}$ chance of a shared birthday, we get
: $$n \approx \sqrt{2 \times 365 \times \tfrac{1}{2}} = \sqrt{365} \approx 19,$$
which is not too far from the correct answer of 23.
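The rule of thumb can be written as a pair of one-line helpers. This is a minimal sketch; the function names are invented for the example and `m` defaults to 365 days.

```python
import math

def p_square(n: int, m: int = 365) -> float:
    """Rule-of-thumb match probability n^2 / (2m); intended for p <= 1/2."""
    return n * n / (2 * m)

def people_for_probability(p: float, m: int = 365) -> float:
    """Rule-of-thumb group size sqrt(2 * m * p) for a target probability p."""
    return math.sqrt(2 * m * p)

print(round(people_for_probability(0.5)))   # prints 19
```

The two helpers are inverses of each other, so `p_square(people_for_probability(p))` returns `p` up to rounding.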
Approximation of number of people
This can also be approximated using the following formula for the ''number'' of people necessary to have at least a $\frac{1}{2}$ chance of matching:
: $$n \geq \frac{1}{2} + \sqrt{\frac{1}{4} + 2 \times \ln(2) \times 365} = 22.999943.$$
This is a result of the good approximation that an event with probability $\frac{1}{k}$ will have a $\frac{1}{2}$ chance of occurring at least once if it is repeated $k \ln 2$ times.
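As a quick check of this estimate, the expression $\frac{1}{2} + \sqrt{\frac{1}{4} + 2 d \ln 2}$ can be evaluated directly. The sketch below assumes $d = 365$ by default; `people_needed` is an illustrative name.

```python
import math

def people_needed(d: int = 365) -> float:
    """Approximate group size for an even chance of a match among d days."""
    return 0.5 + math.sqrt(0.25 + 2 * math.log(2) * d)

estimate = people_needed()
print(f"{estimate:.6f}", math.ceil(estimate))   # prints "22.999943 23"
```

Rounding the estimate up to the next integer gives the familiar 23.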
Probability table
:
The lighter fields in this table show the number of hashes needed to achieve the given probability of collision (column) given a hash space of a certain size in bits (row). Using the birthday analogy: the "hash space size" resembles the "available days", the "probability of collision" resembles the "probability of shared birthday", and the "required number of hashed elements" resembles the "required number of people in a group". One could also use this chart to determine the minimum hash size required (given upper bounds on the number of hashes and the probability of error), or the probability of collision (for a fixed number of hashes and a fixed hash size).
For comparison, $10^{-18}$ to $10^{-15}$ is the uncorrectable bit error rate of a typical hard disk. In theory, 128-bit hash functions, such as MD5, should stay within that range until about $8.2 \times 10^{11}$ documents, even if its possible outputs are many more.
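Entries of this kind can be estimated by inverting the exponential approximation $p \approx 1 - e^{-n^2/(2H)}$, where $H = 2^{\text{bits}}$ is the size of the hash space. The sketch below rests on that assumption and is not necessarily how the table was produced; `hashes_for_collision` is a hypothetical helper.

```python
import math

def hashes_for_collision(bits: int, p: float) -> float:
    """Approximate number of hashed items giving collision probability p."""
    space = 2.0 ** bits
    # Invert p = 1 - exp(-n^2 / (2 * space)); log1p keeps tiny p accurate.
    return math.sqrt(2.0 * space * -math.log1p(-p))

print(f"{hashes_for_collision(128, 1e-15):.1e}")   # prints "8.2e+11"
print(f"{hashes_for_collision(128, 0.5):.1e}")     # prints "2.2e+19"
```

For a 128-bit hash the sketch gives roughly $8.2 \times 10^{11}$ items at a collision probability of $10^{-15}$, and roughly $2.2 \times 10^{19}$ items for an even chance.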
An upper bound on the probability and a lower bound on the number of people
The argument below is adapted from an argument of Paul Halmos.
As stated above, the probability that no two birthdays coincide is
: $$1 - p(n) = \bar{p}(n) = \prod_{k=1}^{n-1} \left(1 - \frac{k}{365}\right).$$
As in earlier paragraphs, interest lies in the smallest $n$ such that $p(n) > \frac{1}{2}$; or equivalently, the smallest $n$ such that $\bar{p}(n) < \frac{1}{2}$.
Using the inequality $1 - x < e^{-x}$ in the above expression we replace $1 - \frac{k}{365}$ with $e^{-k/365}$. This yields
: $$\bar{p}(n) = \prod_{k=1}^{n-1} \left(1 - \frac{k}{365}\right) < \prod_{k=1}^{n-1} e^{-k/365} = e^{-\frac{n(n-1)}{730}}.$$
Therefore, the expression above is not only an approximation, but also an upper bound of $\bar{p}(n)$. The inequality
: $$e^{-\frac{n(n-1)}{730}} < \frac{1}{2}$$
implies $\bar{p}(n) < \frac{1}{2}$. Solving for $n$ gives
: $$n^2 - n > 730 \ln 2.$$
Now, $730 \ln 2$ is approximately 505.997, which is barely below 506, the value of $n^2 - n$ attained when $n = 23$. Therefore, 23 people suffice. Incidentally, solving $n^2 - n = 730 \ln 2$ for $n$ gives the approximate formula of Frank H. Mathis cited above.
This derivation only shows that ''at most'' 23 people are needed to ensure a birthday match with even chance; it leaves open the possibility that $n = 22$ or fewer could also work.
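The final step of the bound is simple enough to verify by brute force. The following sketch searches for the smallest $n$ with $n^2 - n > 730 \ln 2$; the variable names are illustrative.

```python
import math

threshold = 730 * math.log(2)       # 2 * 365 * ln 2

n = 1
while n * (n - 1) <= threshold:     # stop at the first n with n^2 - n > threshold
    n += 1

print(round(threshold, 3), n)       # prints "505.997 23"
```

The loop stops at $n = 23$ because $22 \times 21 = 462$ falls short of the threshold while $23 \times 22 = 506$ just exceeds it.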
Generalizations
Arbitrary number of days
Given a year with $d$ days, the generalized birthday problem asks for the minimal number $n(d)$ such that, in a set of $n$ randomly chosen people, the probability of a birthday coincidence is at least 50%. In other words, $n(d)$ is the minimal integer $n$ such that
: $$1 - \left(1 - \frac{1}{d}\right)\left(1 - \frac{2}{d}\right) \cdots \left(1 - \frac{n-1}{d}\right) \geq \frac{1}{2}.$$
The classical birthday problem thus corresponds to determining $n(365)$. The first 99 values of $n(d)$ are given here:
:
A similar calculation shows that $n(d) = 23$ when $d$ is in the range 341–372.
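The definition of the minimal group size can be evaluated directly for any number of days. The sketch below assumes equally likely days and uses ordinary floating-point arithmetic; `min_people` is a made-up name for $n(d)$.

```python
def min_people(d: int) -> int:
    """Smallest n with at least a 50% chance of a match among d days."""
    p_distinct = 1.0    # probability that the first n people all differ
    n = 1
    while 1.0 - p_distinct < 0.5:
        p_distinct *= 1.0 - n / d   # person n + 1 avoids the n days in use
        n += 1
    return n

print(min_people(365))                                       # prints 23
days_with_23 = [d for d in range(300, 400) if min_people(d) == 23]
print(days_with_23[0], days_with_23[-1])                     # prints "341 372"
```

Running the same function for $d = 1, \dots, 99$ regenerates the small values of $n(d)$.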
A number of bounds and formulas for $n(d)$ have been published.
For any $d \geq 1$, the number $n(d)$ satisfies
: