In
combinatorics
Combinatorics is an area of mathematics primarily concerned with counting, both as a means and as an end to obtaining results, and certain properties of finite structures. It is closely related to many other areas of mathematics and has many ...
, Bertrand's ballot problem is the question: "In an
election
An election is a formal group decision-making process whereby a population chooses an individual or multiple individuals to hold Public administration, public office.
Elections have been the usual mechanism by which modern representative d ...
where candidate A receives ''p'' votes and candidate B receives ''q'' votes with ''p'' > ''q'', what is the
probability
Probability is a branch of mathematics and statistics concerning events and numerical descriptions of how likely they are to occur. The probability of an event is a number between 0 and 1; the larger the probability, the more likely an e ...
that A will be strictly ahead of B throughout the count under the assumption that votes are counted in a randomly picked order?" The answer is
:
The result was first published by
W. A. Whitworth in 1878, but is named after
Joseph Louis François Bertrand
Joseph Louis François Bertrand (; 11 March 1822 – 5 April 1900) was a French mathematician whose work emphasized number theory, differential geometry, probability theory, economics and thermodynamics.
Biography
Joseph Bertrand was the son of ...
who rediscovered it in 1887.
In Bertrand's original paper, he sketches a proof based on a general formula for the number of favourable sequences using a
recursion relation. He remarks that it seems probable that such a simple result could be proved by a more direct method. Such a proof was given by
Désiré André, based on the observation that the unfavourable sequences can be divided into two equally probable cases, one of which (the case where B receives the first vote) is easily computed; he proves the equality by an explicit
bijection
In mathematics, a bijection, bijective function, or one-to-one correspondence is a function between two sets such that each element of the second set (the codomain) is the image of exactly one element of the first set (the domain). Equival ...
. A variation of his method is popularly known as André's reflection method, although André did not use any reflections.
Bertrand's ballot theorem is related to the cycle lemma. They give similar formulas, but the cycle lemma considers
circular shift
In combinatorial mathematics, a circular shift is the operation of rearranging the entries in a tuple, either by moving the final entry to the first position, while shifting all other entries to the next position, or by performing the inverse ope ...
s of a given ballot counting order rather than all permutations.
Example
Suppose there are 5 voters, of whom 3 vote for candidate ''A'' and 2 vote for candidate ''B'' (so ''p'' = 3 and ''q'' = 2). There are ten equally likely orders in which the votes could be counted:
*''AAABB''
*''AABAB''
*''ABAAB''
*''BAAAB''
*''AABBA''
*''ABABA''
*''BAABA''
*''ABBAA''
*''BABAA''
*''BBAAA''
For the order ''AABAB'', the tally of the votes as the election progresses is:
For each column the tally for ''A'' is always larger than the tally for ''B'', so ''A'' is always strictly ahead of ''B''. For the order ''AABBA'' the tally of the votes as the election progresses is:
For this order, ''B'' is tied with ''A'' after the fourth vote, so ''A'' is not always strictly ahead of ''B''.
Of the 10 possible orders, ''A'' is always ahead of ''B'' only for ''AAABB'' and ''AABAB''. So the probability that ''A'' will always be strictly ahead is
:
and this is indeed equal to
as the theorem predicts.
Equivalent problems
Favourable orders
Rather than computing the probability that a random vote counting order has the desired property, one can instead compute the number of favourable counting orders, then divide by the total number of ways in which the votes could have been counted. (This is the method that was used by Bertrand.) The total number of ways is the
binomial coefficient
In mathematics, the binomial coefficients are the positive integers that occur as coefficients in the binomial theorem. Commonly, a binomial coefficient is indexed by a pair of integers and is written \tbinom. It is the coefficient of the t ...
; Bertrand's proof shows that the number of favourable orders in which to count the votes is
(though he does not give this number explicitly). And indeed after division this gives
.
Random walks
Another related problem is to calculate the number of
random walk
In mathematics, a random walk, sometimes known as a drunkard's walk, is a stochastic process that describes a path that consists of a succession of random steps on some Space (mathematics), mathematical space.
An elementary example of a rand ...
s on the
integer
An integer is the number zero (0), a positive natural number (1, 2, 3, ...), or the negation of a positive natural number (−1, −2, −3, ...). The negations or additive inverses of the positive natural numbers are referred to as negative in ...
s that consist of ''n'' steps of unit length, beginning at the origin and ending at the point ''m'', that never become negative. As ''n'' and ''m'' have the same parity and
, this number is
:
When
and
is even, this gives the
Catalan number
The Catalan numbers are a sequence of natural numbers that occur in various Enumeration, counting problems, often involving recursion, recursively defined objects. They are named after Eugène Charles Catalan, Eugène Catalan, though they were p ...
. Thus the probability that a random walk is never negative and returns to origin at time
is
. By Stirling's formula, when
, this probability is
.
[Note that
have the same parity as follows: let
be the number of "positive" moves, i.e., to the right, and let
be the number of "negative" moves, i.e., to the left. Since
and
, we have
and
. Since
and
are integers,
have the same parity]
Proof by reflection
For A to be strictly ahead of B throughout the counting of the votes, there can be no ties. Separate the counting sequences according to the first vote. Any sequence that begins with a vote for B must reach a tie at some point, because A eventually wins. For any sequence that begins with A and reaches a tie, reflect the votes up to the point of the first tie (so any A becomes a B, and vice versa) to obtain a sequence that begins with B. Hence every sequence that begins with A and reaches a tie is in one-to-one correspondence with a sequence that begins with B, and the probability that a sequence begins with B is
, so the probability that A always leads the vote is
:
the probability of sequences that tie at some point
:
the probability of sequences that tie at some point and begin with A or B
:
the probability of sequences that tie at some point and begin with B
:
the probability that a sequence begins with B
:
Proof by induction
Another method of proof is by
mathematical induction
Mathematical induction is a method for mathematical proof, proving that a statement P(n) is true for every natural number n, that is, that the infinitely many cases P(0), P(1), P(2), P(3), \dots all hold. This is done by first proving a ...
:
*We loosen the condition
to
. Clearly, the theorem is correct when
, since in this case the first candidate will not be ''strictly'' ahead after all the votes have been counted (so the probability is 0).
*Clearly the theorem is true if ''p'' > 0 and ''q'' = 0 when the probability is 1, given that the first candidate receives all the votes; it is also true when ''p'' = ''q'' > 0 as we have just seen.
*Assume it is true both when ''p'' = ''a'' − 1 and ''q'' = ''b'', and when ''p'' = ''a'' and ''q'' = ''b'' − 1, with ''a'' > ''b'' > 0. (We don't need to consider the case
here, since we have already disposed of it before.) Then considering the case with ''p'' = ''a'' and ''q'' = ''b'', the last vote counted is either for the first candidate with probability ''a''/(''a'' + ''b''), or for the second with probability ''b''/(''a'' + ''b''). So the probability of the first being ahead throughout the count to the penultimate vote counted (and also after the final vote) is:
::
*And so it is true for all ''p'' and ''q'' with ''p'' > ''q'' > 0.
Proof by the cycle lemma
A simple proof is based on the cycle lemma of Dvoretzky and Motzkin.
Call a ballot sequence ''dominating'' if A is strictly ahead of B throughout the counting of the votes. The cycle lemma asserts that any sequence of
A's and
B's, where
, has precisely
dominating cyclic permutations. To see this, just arrange the given sequence of
A's and B's in a circle and repeatedly remove adjacent pairs AB until only
A's remain. Each of these A's was the start of a dominating cyclic permutation before anything was removed. So
out of the
cyclic permutations of any arrangement of
A votes and
B votes are dominating.
Proof by martingales
Let
. Define the "backwards counting" stochastic process
where
is the lead of candidate A over B, after
votes have come in.
Claim:
is a
martingale process.
Given , we know that , so of the first votes, were for candidate A, and were for candidate B.
So, with probability , we have , and . Similarly for the other one. Then compute to find .
Define the stopping time
as either the minimum
such that
, or
if there's no such
. Then the probability that candidate A leads all the time is just