In statistics, the question of checking whether a coin is fair is important for two reasons: first, it provides a simple problem on which to illustrate basic ideas of statistical inference; second, it provides a simple problem that can be used to compare competing methods of statistical inference, including decision theory. The practical problem of checking whether a coin is fair might be considered easily solved by performing a sufficiently large number of trials, but statistics and probability theory can provide guidance on two types of question: how many trials to undertake, and how accurate an estimate of the probability of turning up heads is when it is derived from a given sample of trials.
A fair coin is an idealized randomizing device with two states (usually named "heads" and "tails") which are equally likely to occur. It is based on the coin flip used widely in sports and other situations where two parties must be given the same chance of winning. Either a specially designed chip or, more usually, a simple currency coin is used, although the latter might be slightly "unfair" due to an asymmetrical weight distribution, which might cause one state to occur more frequently than the other, giving one party an unfair advantage. So it might be necessary to test experimentally whether the coin is in fact "fair", that is, whether the probability of the coin's falling on either side when it is tossed is exactly 50%. It is of course impossible to rule out arbitrarily small deviations from fairness such as might be expected to affect only one flip in a lifetime of flipping; also it is always possible for an unfair (or "biased") coin to happen to turn up exactly 10 heads in 20 flips. Therefore, any fairness test can only establish a certain degree of confidence in a certain degree of fairness (a certain maximum bias). In more rigorous terminology, the problem is that of determining the parameters of a Bernoulli process, given only a limited sample of Bernoulli trials.
Preamble
This article describes experimental procedures for determining whether a coin is fair or unfair. There are many statistical methods for analyzing such an experimental procedure. This article illustrates two of them.
Both methods prescribe an experiment (or trial) in which the coin is tossed many times and the result of each toss is recorded. The results can then be analysed statistically to decide whether the coin is "fair" or "probably not fair".
* Posterior probability density function, or PDF (Bayesian approach). Initially, the true probability of obtaining a particular side when a coin is tossed is unknown, but the uncertainty is represented by the "prior distribution". The theory of Bayesian inference is used to derive the posterior distribution by combining the prior distribution and the likelihood function, which represents the information obtained from the experiment. The probability that this particular coin is a "fair coin" can then be obtained by integrating the PDF of the posterior distribution over the relevant interval that represents all the probabilities that can be counted as "fair" in a practical sense.
* Estimator of true probability (Frequentist approach). This method assumes that the experimenter can decide to toss the coin any number of times. The experimenter first decides on the level of confidence required and the tolerable margin of error. These parameters determine the minimum number of tosses that must be performed to complete the experiment.
An important difference between these two approaches is that the first approach gives some weight to one's prior experience of tossing coins, while the second does not. The question of how much weight to give to prior experience, depending on the quality (credibility) of that experience, is discussed under credibility theory.
Posterior probability density function
One method is to calculate the posterior probability density function of Bayesian probability theory.
A test is performed by tossing the coin ''N'' times and noting the observed numbers of heads, ''h'', and tails, ''t''. The symbols ''H'' and ''T'' represent more generalised variables expressing the numbers of heads and tails respectively that ''might'' have been observed in the experiment. Thus ''N'' = ''H''+''T'' = ''h''+''t''.
Next, let ''r'' be the actual probability of obtaining heads in a single toss of the coin. This is the property of the coin which is being investigated. Using Bayes' theorem, the posterior probability density of ''r'' conditional on ''h'' and ''t'' is expressed as follows:
: <math>f(r \mid H=h, T=t) = \frac{\Pr(H=h \mid r, N=h+t)\, g(r)}{\int_0^1 \Pr(H=h \mid p, N=h+t)\, g(p)\, dp},</math>
where ''g''(''r'') represents the prior probability density distribution of ''r'', which lies in the range 0 to 1.
The prior probability density distribution summarizes what is known about the distribution of ''r'' in the absence of any observation. We will assume that the prior distribution of ''r'' is uniform over the interval [0, 1]. That is, ''g''(''r'') = 1. (In practice, it would be more appropriate to assume a prior distribution which is much more heavily weighted in the region around 0.5, to reflect our experience with real coins.)
The probability of obtaining ''h'' heads in ''N'' tosses of a coin with a probability of heads equal to ''r'' is given by the binomial distribution:
: <math>\Pr(H=h \mid r, N=h+t) = \binom{N}{h}\, r^h (1-r)^t.</math>
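As a minimal sketch in Python, the binomial probability of ''h'' heads in ''N'' tosses, C(''N'', ''h'') ''r''<sup>''h''</sup> (1 − ''r'')<sup>''N'' − ''h''</sup>, can be evaluated directly. The helper name `binomial_pmf` is an illustrative assumption and does not come from the sources of this article.

```python
from math import comb

def binomial_pmf(h: int, n: int, r: float) -> float:
    """Probability of exactly h heads in n independent tosses when Pr(heads) = r."""
    return comb(n, h) * r**h * (1 - r) ** (n - h)

# Probability of 7 heads in 10 tosses of a perfectly fair coin.
print(binomial_pmf(7, 10, 0.5))  # 0.1171875
```

Even for a perfectly fair coin, 7 heads in 10 tosses is therefore not a rare outcome.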
Substituting this into the previous formula:
: <math>f(r \mid H=h, T=t) = \frac{\binom{N}{h}\, r^h (1-r)^t}{\int_0^1 \binom{N}{h}\, p^h (1-p)^t\, dp} = \frac{r^h (1-r)^t}{\int_0^1 p^h (1-p)^t\, dp}.</math>
This is in fact a beta distribution (the conjugate prior for the binomial distribution), whose denominator can be expressed in terms of the beta function:
: <math>f(r \mid H=h, T=t) = \frac{1}{\mathrm{B}(h+1, t+1)}\, r^h (1-r)^t.</math>
As a uniform prior distribution has been assumed, and because ''h'' and ''t'' are integers, this can also be written in terms of factorials:
: <math>f(r \mid H=h, T=t) = \frac{(h+t+1)!}{h!\, t!}\, r^h (1-r)^t.</math>
Example
For example, let ''N'' = 10, ''h'' = 7, i.e. the coin is tossed 10 times and 7 heads are obtained:
: <math>f(r \mid H=7, T=3) = \frac{(10+1)!}{7!\, 3!}\, r^7 (1-r)^3 = 1320\, r^7 (1-r)^3.</math>
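The factorial form of the posterior density under a uniform prior, (''h'' + ''t'' + 1)! / (''h''! ''t''!) · ''r''<sup>''h''</sup> (1 − ''r'')<sup>''t''</sup>, can be checked numerically for this example. The following Python sketch uses a hypothetical helper name, `posterior_pdf`, chosen only for illustration.

```python
from math import factorial

def posterior_pdf(r: float, h: int, t: int) -> float:
    """Posterior density of r under a uniform prior, after h heads and t tails."""
    # (h + t + 1)! / (h! t!) is always a whole number, so integer division is exact.
    norm = factorial(h + t + 1) // (factorial(h) * factorial(t))
    return norm * r**h * (1 - r) ** t

# Normalising constant for h = 7, t = 3.
print(factorial(11) // (factorial(7) * factorial(3)))  # 1320
# Density at r = 0.5.
print(posterior_pdf(0.5, 7, 3))  # 1.2890625
```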
The graph on the right shows the probability density function of ''r'' given that 7 heads were obtained in 10 tosses. (Note: ''r'' is the probability of obtaining heads when tossing the same coin once.)

The probability for an unbiased coin (defined for this purpose as one whose probability of coming down heads is somewhere between 45% and 55%)
: <math>\Pr(0.45 < r < 0.55) = \int_{0.45}^{0.55} f(p \mid H=7, T=3)\, dp \approx 13\%</math>
is small when compared with the alternative hypothesis (a biased coin). However, it is not small enough to cause us to believe that the coin has a significant bias. This probability is slightly ''higher'' than our presupposition of the probability that the coin was fair corresponding to the uniform prior distribution, which was 10%.
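The probability that ''r'' lies between 0.45 and 0.55 can be checked exactly, because the posterior density for 7 heads and 3 tails, 1320 ''r''<sup>7</sup> (1 − ''r'')<sup>3</sup>, is a polynomial. Expanding (1 − ''r'')<sup>3</sup> and integrating term by term gives the cumulative distribution 165 ''r''<sup>8</sup> − 440 ''r''<sup>9</sup> + 396 ''r''<sup>10</sup> − 120 ''r''<sup>11</sup>. The sketch below evaluates it with exact rational arithmetic; the function name `posterior_cdf` is an assumption made for illustration.

```python
from fractions import Fraction

def posterior_cdf(r) -> Fraction:
    """CDF of the posterior 1320 r^7 (1 - r)^3, integrated term by term."""
    r = Fraction(r)
    return 165 * r**8 - 440 * r**9 + 396 * r**10 - 120 * r**11

# Posterior probability that the coin is "fair" in the 45%-55% sense.
prob_fair = posterior_cdf("0.55") - posterior_cdf("0.45")
print(f"{float(prob_fair):.2f}")  # 0.13
```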
Using a prior distribution that reflects our prior knowledge of what a coin is and how it acts, the posterior distribution would not favor the hypothesis of bias. However, the number of trials in this example (10 tosses) is very small, and with more trials the choice of prior distribution would be somewhat less relevant.
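The effect of an informative prior can be illustrated with a conjugate beta prior. The sketch below assumes a Beta(50, 50) prior, a hypothetical choice made here only to represent a belief concentrated near 0.5; it is not a value taken from the sources. With a Beta(''a'', ''b'') prior, the posterior after ''h'' heads and ''t'' tails is Beta(''a'' + ''h'', ''b'' + ''t''), and the uniform prior is the special case Beta(1, 1). The integral is approximated with Simpson's rule.

```python
from math import exp, lgamma, log

def beta_pdf(r: float, a: float, b: float) -> float:
    """Density of a Beta(a, b) distribution at r, for 0 < r < 1."""
    log_norm = lgamma(a + b) - lgamma(a) - lgamma(b)
    return exp(log_norm + (a - 1) * log(r) + (b - 1) * log(1 - r))

def simpson(f, lo: float, hi: float, n: int = 1000) -> float:
    """Composite Simpson's rule with an even number n of subintervals."""
    step = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(lo + i * step)
    return total * step / 3

def prob_fair(a, b, h, t, lo=0.45, hi=0.55) -> float:
    """Posterior probability that r lies in [lo, hi], given a Beta(a, b) prior."""
    return simpson(lambda r: beta_pdf(r, a + h, b + t), lo, hi)

uniform = prob_fair(1, 1, 7, 3)     # uniform prior, as in the example
informed = prob_fair(50, 50, 7, 3)  # hypothetical prior concentrated near 0.5
print(round(uniform, 2), informed > uniform)  # 0.13 True
```

Under the assumed prior, the same 10 tosses leave most of the posterior probability inside the "fair" interval, which is the sense in which the posterior no longer favors bias.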
With the uniform prior, the posterior probability distribution ''f''(''r'' | ''H'' = 7, ''T'' = 3) achieves its peak at ''r'' = ''h'' / (''h'' + ''t'') = 0.7; this value is called the maximum ''a posteriori'' (MAP) estimate of ''r''. Also with the uniform prior, the expected value of ''r'' under the posterior distribution is
: <math>\operatorname{E}[r] = \int_0^1 r \cdot f(r \mid H=7, T=3)\, dr = \frac{h+1}{h+t+2} = \frac{2}{3}.</math>
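Both point estimates have closed forms under the uniform prior: the posterior mode is ''h'' / (''h'' + ''t''), and the posterior mean is B(''h'' + 2, ''t'' + 1) / B(''h'' + 1, ''t'' + 1) = (''h'' + 1) / (''h'' + ''t'' + 2). A short Python sketch, with function names assumed for illustration, evaluates them as exact fractions.

```python
from fractions import Fraction

def map_estimate(h: int, t: int) -> Fraction:
    """Mode of the posterior under a uniform prior: h / (h + t)."""
    return Fraction(h, h + t)

def posterior_mean(h: int, t: int) -> Fraction:
    """Mean of the Beta(h + 1, t + 1) posterior: (h + 1) / (h + t + 2)."""
    return Fraction(h + 1, h + t + 2)

print(map_estimate(7, 3), posterior_mean(7, 3))  # 7/10 2/3
```

The mean is pulled slightly toward 0.5 relative to the mode, and the two estimates converge as the number of tosses grows.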
Estimator of true probability
Using this approach, to decide the number of times the coin should be tossed, two parameters are required:
# The confidence level of the confidence interval, denoted by ''Z''
# The maximum (acceptable) error, denoted by ''E''
*The confidence level is denoted by ''Z'' and is given by the Z-value of a standard normal distribution. This value can be read off a standard score statistics table for the normal distribution. For example, ''Z'' = 1 corresponds to a 68.27% level of confidence, ''Z'' = 2 to 95.45% and ''Z'' = 3.3 to 99.90%.
*The maximum error (''E'') is defined by <math>|p - r| < E</math>, where <math>p</math> is the estimated probability of obtaining heads. Note: <math>r</math> is the same actual probability (of obtaining heads) as ''r'' of the previous section in this article.
*In statistics, the estimate of a proportion of a sample (denoted by ''p'') has a standard error given by:
: <math>s_p = \sqrt{\frac{p\,(1-p)}{n}},</math>
where ''n'' is the number of trials (which was denoted by ''N'' in the previous section).
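Instead of reading the Z-value off a table, it can be computed from the inverse cumulative distribution of the standard normal distribution. The sketch below uses `statistics.NormalDist` from the Python standard library; the helper names `z_value`, `confidence_level` and `standard_error` are assumptions made for illustration, and the confidence level is taken to be two-sided (the central mass between −''Z'' and +''Z'').

```python
from math import sqrt
from statistics import NormalDist

def z_value(confidence: float) -> float:
    """Z such that the central `confidence` mass of N(0, 1) lies in [-Z, Z]."""
    return NormalDist().inv_cdf(0.5 + confidence / 2)

def confidence_level(z: float) -> float:
    """Two-sided confidence level corresponding to a given Z-value."""
    return 2 * NormalDist().cdf(z) - 1

def standard_error(p: float, n: int) -> float:
    """Standard error of a sample proportion p estimated from n trials."""
    return sqrt(p * (1 - p) / n)

print(round(z_value(0.95), 4))             # 1.96
print(round(confidence_level(2), 4))       # 0.9545
print(round(standard_error(0.5, 10000), 6))  # 0.005
```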
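This standard error function of ''p'' has a maximum at <math>p = (1-p) = 0.5</math>. Further, in the case of a coin being tossed, it is likely that ''p'' will be not far from 0.5, so it is reasonable to take ''p'' = 0.5 in the following:
: <math>s_p = \sqrt{\frac{p\,(1-p)}{n}} = \sqrt{\frac{0.5 \times 0.5}{n}} = \frac{1}{2\sqrt{n}}.</math>
And hence the value of maximum error (''E'') is given by
: <math>E = Z s_p = \frac{Z}{2\sqrt{n}}.</math>
Solving for the required number of coin tosses, ''n'',
: <math>n = \frac{Z^2}{4E^2}.</math>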
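The two relations, ''E'' = ''Z'' / (2√''n'') and ''n'' = ''Z''<sup>2</sup> / (4''E''<sup>2</sup>), can be sketched in Python as follows. The function names are hypothetical, and rounding ''n'' up to a whole number of tosses is an assumption of this sketch. Exact rational arithmetic is used for ''n'' so that floating-point round-off cannot push a whole-number result up by one.

```python
from fractions import Fraction
from math import ceil, sqrt

def max_error(z: float, n: int) -> float:
    """Worst-case maximum error E = Z / (2 sqrt(n)), taking p = 0.5."""
    return z / (2 * sqrt(n))

def required_tosses(z, max_err) -> int:
    """Smallest whole number of tosses n satisfying n >= Z^2 / (4 E^2)."""
    # Convert via str() so that inputs such as 0.01 are treated as exact decimals.
    z_exact, e_exact = Fraction(str(z)), Fraction(str(max_err))
    return ceil(z_exact**2 / (4 * e_exact**2))

print(required_tosses(2, 0.01))  # 10000
print(max_error(2, 10000))       # 0.01
```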
Examples
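1. If a maximum error of 0.01 is desired, how many times should the coin be tossed?
: <math>n = \frac{Z^2}{4E^2} = \frac{Z^2}{4 \times 0.01^2} = 2500\, Z^2</math>
: <math>n = 2500</math> at 68.27% level of confidence (''Z'' = 1)
: <math>n = 10000</math> at 95.45% level of confidence (''Z'' = 2)
: <math>n = 27225</math> at 99.90% level of confidence (''Z'' = 3.3)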
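2. If the coin is tossed 10000 times, what is the maximum error of the estimator <math>p</math> on the value of <math>r</math> (the actual probability of obtaining heads in a coin toss)?
: <math>E = \frac{Z}{2\sqrt{n}}</math>
: <math>E = \frac{Z}{2\sqrt{10000}} = \frac{Z}{200}</math>
: <math>E = 0.0050</math> at 68.27% level of confidence (''Z'' = 1)
: <math>E = 0.0100</math> at 95.45% level of confidence (''Z'' = 2)
: <math>E = 0.0165</math> at 99.90% level of confidence (''Z'' = 3.3)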
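3. The coin is tossed 12000 times with a result of 5961 heads (and 6039 tails). What interval does the value of <math>r</math> (the true probability of obtaining heads) lie within if a confidence level of 99.999% is desired?
: <math>p = \frac{h}{h+t} = \frac{5961}{12000} = 0.4968</math>
Now find the value of ''Z'' corresponding to 99.999% level of confidence.
: <math>Z = 4.4172</math>
Now calculate ''E''
: <math>E = \frac{Z}{2\sqrt{n}} = \frac{4.4172}{2\sqrt{12000}} = 0.0202</math>
The interval which contains ''r'' is thus:
: <math>p - E < r < p + E</math>
: <math>0.4766 < r < 0.5170</math>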
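The third example (5961 heads in 12000 tosses at a 99.999% level of confidence) can be reproduced with a short Python sketch. The function name `conservative_interval` is an assumption made for illustration, and the interval uses the worst-case standard error at ''p'' = 0.5, as in the derivation above. Because the sketch does not round ''p'' and ''E'' before combining them, its upper bound comes out near 0.5169, against 0.5170 when the rounded values 0.4968 and 0.0202 are added.

```python
from math import sqrt
from statistics import NormalDist

def conservative_interval(heads: int, tosses: int, confidence: float):
    """Interval p ± Z / (2 sqrt(n)), using the worst-case standard error at p = 0.5."""
    p = heads / tosses
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # two-sided Z-value
    e = z / (2 * sqrt(tosses))
    return p - e, p + e

low, high = conservative_interval(5961, 12000, 0.99999)
print(f"{low:.4f} < r < {high:.4f}")
```

Since 0.5 lies inside this interval, the data in this example are consistent with a fair coin at the chosen level of confidence.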
Other approaches
Other approaches to the question of checking whether a coin is fair are available using decision theory, whose application would require the formulation of a loss function or utility function which describes the consequences of making a given decision. An approach that avoids requiring either a loss function or a prior probability (as in the Bayesian approach) is that of "acceptance sampling".
[Cox, D.R., Hinkley, D.V. (1974) ''Theoretical Statistics'' (Example 11.7), Chapman & Hall. ]
Other applications
The above mathematical analysis for determining if a coin is fair can also be applied to other uses. For example:
* Determining the proportion of defective items for a product subjected to a particular (but well defined) condition. Sometimes a product can be very difficult or expensive to produce. Furthermore, if testing such products will result in their destruction, a minimum number of items should be tested. Using a similar analysis, the probability density function of the product defect rate can be found.
* Two party polling. If a small random sample poll is taken where there are only two mutually exclusive choices, then this is similar to tossing a single coin multiple times using a possibly biased coin. A similar analysis can therefore be applied to determine the confidence to be ascribed to the actual ratio of votes cast. (If people are allowed to abstain then the analysis must take account of that, and the coin-flip analogy doesn't quite hold.)
* Determining the sex ratio in a large group of an animal species. Provided that the random sample taken is small in comparison with the total population, the analysis is similar to determining the probability of obtaining heads in a coin toss.
See also
* Binomial test
* Coin flipping
* Confidence interval
* Estimation theory
* Inferential statistics
* Loaded dice
* Margin of error
* Point estimation
* Statistical randomness
References
*Guttman, Wilks, and Hunter: ''Introductory Engineering Statistics'', John Wiley & Sons, Inc. (1971)
*Devinder Sivia: ''Data Analysis, a Bayesian Tutorial'', Oxford University Press (1996), ISBN 0-19-851889-7