In statistics, a binomial proportion confidence interval is a confidence interval for the probability of success calculated from the outcome of a series of success–failure experiments (Bernoulli trials). In other words, a binomial proportion confidence interval is an interval estimate of a success probability ''p'' when only the number of experiments ''n'' and the number of successes <math>n_S</math> are known.
There are several formulas for a binomial confidence interval, but all of them rely on the assumption of a binomial distribution. In general, a binomial distribution applies when an experiment is repeated a fixed number of times, each trial of the experiment has two possible outcomes (success and failure), the probability of success is the same for each trial, and the trials are statistically independent. Because the binomial distribution is a discrete probability distribution (i.e., not continuous) and difficult to calculate for large numbers of trials, a variety of approximations are used to calculate this confidence interval, all with their own tradeoffs in accuracy and computational intensity.
A simple example of a binomial distribution is the set of various possible outcomes, and their probabilities, for the number of heads observed when a coin is flipped ten times. The observed binomial proportion is the fraction of the flips that turn out to be heads. Given this observed proportion, the confidence interval for the true probability of the coin landing on heads is a range of possible proportions, which may or may not contain the true proportion. A 95% confidence interval for the proportion, for instance, will contain the true proportion 95% of the times that the procedure for constructing the confidence interval is employed.
Normal approximation interval or Wald interval
A commonly used formula for a binomial confidence interval relies on approximating the distribution of error about a binomially-distributed observation, <math>\hat p</math>, with a normal distribution. This approximation is based on the central limit theorem and is unreliable when the sample size is small or the success probability is close to 0 or 1.
Using the normal approximation, the success probability ''p'' is estimated as
: <math>\hat p \pm z \sqrt{\frac{\hat p \left(1 - \hat p\right)}{n}},</math>
or the equivalent
: <math>\frac{n_S}{n} \pm \frac{z}{n \sqrt{n}} \sqrt{n_S n_F},</math>
where <math>\hat p = n_S / n</math> is the proportion of successes in a Bernoulli trial process, measured with <math>n</math> trials yielding <math>n_S</math> successes and <math>n_F = n - n_S</math> failures, and <math>z</math> is the <math>1 - \tfrac{\alpha}{2}</math> quantile of a standard normal distribution (i.e., the probit) corresponding to the target error rate <math>\alpha</math>. For a 95% confidence level, the error <math>\alpha = 1 - 0.95 = 0.05</math>, so <math>1 - \tfrac{\alpha}{2} = 0.975</math> and <math>z = 1.96</math>.
An important theoretical derivation of this confidence interval involves the inversion of a hypothesis test. Under this formulation, the confidence interval represents those values of the population parameter that would have large ''p''-values if they were tested as a hypothesized population proportion. The collection of values, <math>\theta</math>, for which the normal approximation is valid can be represented as
: <math>\left\{ \theta \,\,\bigg|\,\, y \le \frac{\hat p - \theta}{\sqrt{\frac{1}{n} \hat p \left(1 - \hat p\right)}} \le z \right\},</math>
where <math>y</math> is the <math>\tfrac{\alpha}{2}</math> quantile of a standard normal distribution, so that <math>y = -z</math>.
Since the test in the middle of the inequality is a Wald test, the normal approximation interval is sometimes called the Wald interval or Wald method, after Abraham Wald, but it was first described by Pierre-Simon Laplace in 1812.
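As a minimal sketch of the normal approximation interval, the following Python function computes <math>\hat p \pm z \sqrt{\hat p (1 - \hat p) / n}</math> from a success count and a trial count. The function name `wald_interval` and its parameters are illustrative choices, not part of any library; the quantile comes from the standard library's `statistics.NormalDist`.

```python
from math import sqrt
from statistics import NormalDist


def wald_interval(n_success, n, confidence=0.95):
    """Normal-approximation (Wald) interval for a binomial proportion.

    Illustrative helper: returns (lower, upper) without clipping to [0, 1],
    so the overshoot and zero-width problems remain visible.
    """
    alpha = 1.0 - confidence
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # about 1.96 for 95%
    p_hat = n_success / n
    half_width = z * sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


print(wald_interval(20, 400))  # moderate n: a usable interval
print(wald_interval(1, 10))    # small n: lower bound falls below 0 (overshoot)
print(wald_interval(0, 10))    # no successes: zero-width interval
```

The last two calls illustrate why the approximation is considered unreliable for small samples or proportions near 0 or 1.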
Bracketing the confidence interval
Extending the normal approximation and Wald–Laplace interval concepts, Michael Short has shown that inequalities on the approximation error between the binomial distribution and the normal distribution can be used to accurately bracket the estimate of the confidence interval around <math>p</math>. In the resulting pair of inequalities, <math>p</math> is again the (unknown) proportion of successes in a Bernoulli trial process, measured with <math>n</math> trials and the observed number of successes; <math>z</math> is the <math>1 - \tfrac{\alpha}{2}</math> quantile of a standard normal distribution (i.e., the probit) corresponding to the target error rate <math>\alpha</math>; and the constants that appear in the bounds are simple algebraic functions of <math>z</math>.
For a fixed <math>\alpha</math> (and hence <math>z</math>), the inequalities give easily computed one- or two-sided intervals which bracket the exact binomial upper and lower confidence limits corresponding to the error rate <math>\alpha</math>.
Standard error of a proportion estimation when using weighted data
Let there be a simple random sample <math>X_1, \ldots, X_n</math> where each <math>X_i</math> is i.i.d. from a Bernoulli(''p'') distribution and <math>w_i</math> is the weight for each observation. Standardize the (positive) weights <math>w_i</math> so they sum to 1. The weighted sample proportion is <math>\hat p = \sum_{i=1}^n w_i X_i</math>. Since the <math>X_i</math> are independent and each one has variance <math>\operatorname{Var}(X_i) = p(1 - p)</math>, the sampling variance of the proportion therefore is:
: <math>\operatorname{Var}(\hat p) = \sum_{i=1}^n \operatorname{Var}(w_i X_i) = p(1 - p) \sum_{i=1}^n w_i^2</math>.
The standard error of <math>\hat p</math> is the square root of this quantity. Because we do not know <math>p(1 - p)</math>, we have to estimate it. Although there are many possible estimators, a conventional one is to use <math>\hat p</math>, the sample mean, and plug this into the formula. That gives:
: <math>\operatorname{SE}(\hat p) \approx \sqrt{\hat p (1 - \hat p) \sum_{i=1}^n w_i^2}</math>
For unweighted data, <math>w_i = 1/n</math>, giving <math>\sum_{i=1}^n w_i^2 = 1/n</math>. The SE becomes <math>\sqrt{\hat p (1 - \hat p) / n}</math>, leading to the familiar formulas, showing that the calculation for weighted data is a direct generalization of them.
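The plug-in standard error for weighted data can be sketched in a few lines of Python. The helper name `weighted_proportion_se` is hypothetical; it normalises the weights to sum to 1, forms the weighted proportion, and applies the plug-in formula, so that equal weights reproduce the unweighted standard error.

```python
from math import sqrt


def weighted_proportion_se(outcomes, weights):
    """Weighted sample proportion and its plug-in standard error.

    outcomes: 0/1 values; weights: positive numbers (any scale).
    Illustrative helper, not a library function.
    """
    total = sum(weights)
    w = [wi / total for wi in weights]                  # standardise: sum to 1
    p_hat = sum(wi * xi for wi, xi in zip(w, outcomes))
    se = sqrt(p_hat * (1.0 - p_hat) * sum(wi * wi for wi in w))
    return p_hat, se


outcomes = [1, 0, 1, 1, 0, 0, 1, 0]
# Equal weights: reduces to sqrt(p_hat * (1 - p_hat) / n).
print(weighted_proportion_se(outcomes, [1] * 8))
# Unequal weights: the first observation counts three times as much.
print(weighted_proportion_se(outcomes, [3, 1, 1, 1, 1, 1, 1, 1]))
```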
Wilson score interval
The Wilson score interval is an improvement over the normal approximation interval in multiple respects. It was developed by Edwin Bidwell Wilson (1927). Unlike the symmetric normal approximation interval (above), the Wilson score interval is asymmetric. It does not suffer from problems of ''overshoot'' and ''zero-width intervals'' that afflict the normal interval, and it may be safely employed with small samples and skewed observations. The observed coverage probability is consistently closer to the nominal value, <math>1 - \alpha</math>.
Like the normal interval, the interval can be computed directly from a formula.
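Wilson started with the normal approximation to the binomial:
: <math>z \approx \frac{p - \hat p}{\sigma_n}</math>
with the analytic formula for the sample standard deviation given by
: <math>\sigma_n = \sqrt{\frac{p \left(1 - p\right)}{n}}.</math>
Combining the two, and squaring out the radical, gives an equation that is quadratic in <math>p</math>:
: <math>\left(\hat p - p\right)^2 = z^2 \cdot \frac{p \left(1 - p\right)}{n}</math>
Transforming the relation into a standard-form quadratic equation for <math>p</math>, treating <math>\hat p</math> and <math>n</math> as known values from the sample (see prior section), and using the value of <math>z</math> that corresponds to the desired confidence for the estimate of <math>p</math> gives this:
: <math>\left(1 + \frac{z^2}{n}\right) p^2 + \left(-2 \hat p - \frac{z^2}{n}\right) p + \left(\hat p^2\right) = 0,</math>
where all of the values in parentheses are known quantities.
The solution for <math>p</math> estimates the upper and lower limits of the confidence interval for <math>p</math>. Hence the probability of success <math>p</math> is estimated by
: <math>p \approx \left(w^-, w^+\right) = \frac{1}{1 + \frac{z^2}{n}} \left(\hat p + \frac{z^2}{2n}\right) \pm \frac{z}{1 + \frac{z^2}{n}} \sqrt{\frac{\hat p \left(1 - \hat p\right)}{n} + \frac{z^2}{4 n^2}}</math>
or the equivalent
: <math>p \approx \frac{n_S + \tfrac{1}{2} z^2}{n + z^2} \pm \frac{z}{n + z^2} \sqrt{\frac{n_S n_F}{n} + \frac{z^2}{4}}.</math>
The practical observation from using this interval is that it has good properties even for a small number of trials and / or an extreme probability.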
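A minimal Python sketch of the closed-form Wilson score interval follows. It evaluates the two roots of the quadratic <math>(\hat p - p)^2 = z^2 p (1 - p) / n</math> as a centre plus or minus a half-width; the function name `wilson_interval` is an illustrative choice.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(n_success, n, confidence=0.95):
    """Wilson score interval (lower, upper) for a binomial proportion."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    p_hat = n_success / n
    denom = 1.0 + z * z / n
    centre = (p_hat + z * z / (2.0 * n)) / denom
    half_width = (z / denom) * sqrt(
        p_hat * (1.0 - p_hat) / n + z * z / (4.0 * n * n)
    )
    return centre - half_width, centre + half_width


print(wilson_interval(10, 20))  # roughly (0.299, 0.701)
print(wilson_interval(0, 10))   # lower bound at 0, upper bound still positive
```

With no observed successes the interval keeps a positive width, in contrast to the normal approximation interval.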
Intuitively, the center value of this interval is the weighted average of <math>\hat p</math> and <math>\tfrac{1}{2}</math>, with <math>\hat p</math> receiving greater weight as the sample size increases. Formally, the center value corresponds to using a pseudocount of <math>\tfrac{1}{2} z^2</math>, where <math>z</math> is the number of standard deviations of the confidence interval: add this number to both the count of successes and of failures to yield the estimate of the ratio. For the common two standard deviations in each direction interval (approximately 95% coverage, which itself is approximately 1.96 standard deviations), this yields the estimate <math>\left(n_S + 2\right) / \left(n + 4\right)</math>, which is known as the "plus four rule".
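Although the quadratic can be solved explicitly, in most cases Wilson's equations can also be solved numerically using the fixed-point iteration
: <math>p_{k+1} = \hat p \pm z \sqrt{\frac{p_k \left(1 - p_k\right)}{n}}</math>
with <math>p_0 = \hat p</math>.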
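The fixed-point iteration can be sketched as below, assuming <math>0 < \hat p < 1</math> and iterates that stay inside [0, 1] (the iteration stalls at the starting value when <math>\hat p</math> is exactly 0 or 1). The name `wilson_fixed_point`, the tolerance and the iteration cap are illustrative assumptions.

```python
from math import sqrt
from statistics import NormalDist


def wilson_fixed_point(p_hat, n, z, sign, tol=1e-12, max_iter=1000):
    """Iterate p <- p_hat + sign * z * sqrt(p (1 - p) / n), starting at p_hat.

    sign = +1 targets the upper Wilson bound, sign = -1 the lower one.
    Illustrative sketch; convergence is assumed, not guaranteed.
    """
    p = p_hat
    for _ in range(max_iter):
        # max(..., 0.0) only guards against a math domain error.
        p_next = p_hat + sign * z * sqrt(max(p * (1.0 - p), 0.0) / n)
        if abs(p_next - p) < tol:
            return p_next
        p = p_next
    return p


z = NormalDist().inv_cdf(0.975)
lower = wilson_fixed_point(0.5, 20, z, -1)
upper = wilson_fixed_point(0.5, 20, z, +1)
print(lower, upper)  # should agree with the closed-form Wilson bounds
```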
The Wilson interval can also be derived from the single sample z-test or Pearson's chi-squared test with two categories. The resulting interval,
: <math>\left\{ \theta \,\,\bigg|\,\, y \le \frac{\hat p - \theta}{\sqrt{\frac{1}{n} \theta \left(1 - \theta\right)}} \le z \right\},</math>
(with <math>y</math> the <math>\tfrac{\alpha}{2}</math> quantile and <math>z</math> the <math>1 - \tfrac{\alpha}{2}</math> quantile of a standard normal distribution) can then be solved for <math>\theta</math> to produce the Wilson score interval. The test in the middle of the inequality is a score test.
The interval equality principle
Since the interval is derived by solving from the normal approximation to the binomial, the Wilson score interval has the property of being guaranteed to obtain the same result as the equivalent z-test or chi-squared test.
This property can be visualised by plotting the probability density function for the Wilson score interval (see Wallis 2021: 297-313) and then plotting a normal pdf at each bound. The tail areas of the resulting Wilson and normal distributions, representing the chance of a significant result in that direction, must be equal.
The continuity-corrected Wilson score interval and the Clopper–Pearson interval are also compliant with this property. The practical import is that these intervals may be employed as significance tests, with identical results to the source test, and new tests may be derived by geometry.
Wilson score interval with continuity correction
The Wilson interval may be modified by employing a continuity correction, in order to align the minimum coverage probability, rather than the average coverage probability, with the nominal value, <math>1 - \alpha</math>.
Just as the Wilson interval mirrors Pearson's chi-squared test, the Wilson interval with continuity correction mirrors the equivalent Yates' chi-squared test.
The following formulae for the lower and upper bounds of the Wilson score interval with continuity correction <math>\left(w_{cc}^-, w_{cc}^+\right)</math> are derived from Newcombe (1998).
: <math>w_{cc}^- = \max\left\{0, \frac{2 n \hat p + z^2 - \left[z \sqrt{z^2 - \frac{1}{n} + 4 n \hat p \left(1 - \hat p\right) + \left(4 \hat p - 2\right)} + 1\right]}{2 \left(n + z^2\right)}\right\}</math>
: <math>w_{cc}^+ = \min\left\{1, \frac{2 n \hat p + z^2 + \left[z \sqrt{z^2 - \frac{1}{n} + 4 n \hat p \left(1 - \hat p\right) - \left(4 \hat p - 2\right)} + 1\right]}{2 \left(n + z^2\right)}\right\}</math>
However, if <math>\hat p = 0</math>, <math>w_{cc}^-</math> must be taken as 0; if <math>\hat p = 1</math>, <math>w_{cc}^+</math> is then 1.
Wallis (2021) identifies a simpler method for computing continuity-corrected Wilson intervals that employs functions. For the lower bound, let <math>\mathrm{Wilson_{lower}}\left(\hat p, n, \tfrac{\alpha}{2}\right) = w^-</math>, where <math>\alpha</math> is the selected error level for <math>z_\alpha</math>. Then <math>w_{cc}^- = \mathrm{Wilson_{lower}}\left(\max\left(\hat p - \tfrac{1}{2n}, 0\right), n, \tfrac{\alpha}{2}\right)</math>. This method has the advantage of being further decomposable.
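The two routes to the continuity-corrected bounds can be compared in a short Python sketch. The function names are hypothetical. The functional lower bound follows the description above (shift the observed proportion by 1/(2''n'') before applying the ordinary Wilson bound); the matching upper bound, which shifts in the opposite direction, is an assumption made here by symmetry. The direct version evaluates the closed-form Newcombe-style expressions with the special cases at 0 and 1.

```python
from math import sqrt
from statistics import NormalDist


def wilson_bound(p, n, z, sign):
    """Ordinary Wilson score bound at proportion p (sign=-1 lower, +1 upper)."""
    denom = 1.0 + z * z / n
    centre = (p + z * z / (2.0 * n)) / denom
    half_width = (z / denom) * sqrt(p * (1.0 - p) / n + z * z / (4.0 * n * n))
    return centre + sign * half_width


def wilson_cc_functional(n_success, n, confidence=0.95):
    """Continuity-corrected bounds via shifted Wilson bounds."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    p_hat = n_success / n
    shift = 1.0 / (2.0 * n)
    lower = wilson_bound(max(p_hat - shift, 0.0), n, z, -1)
    upper = wilson_bound(min(p_hat + shift, 1.0), n, z, +1)  # assumed by symmetry
    return lower, upper


def wilson_cc_direct(n_success, n, confidence=0.95):
    """Continuity-corrected bounds from the closed-form expressions."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    p_hat = n_success / n
    base = z * z - 1.0 / n + 4.0 * n * p_hat * (1.0 - p_hat)
    term = 4.0 * p_hat - 2.0
    denom = 2.0 * (n + z * z)
    lower, upper = 0.0, 1.0                 # special cases at p_hat = 0 and 1
    if n_success > 0:
        lower = max(0.0, (2.0 * n * p_hat + z * z
                          - (z * sqrt(base + term) + 1.0)) / denom)
    if n_success < n:
        upper = min(1.0, (2.0 * n * p_hat + z * z
                          + (z * sqrt(base - term) + 1.0)) / denom)
    return lower, upper


print(wilson_cc_functional(10, 20))
print(wilson_cc_direct(10, 20))
```

Keeping the ordinary Wilson bound as a separate function is what makes the functional version decomposable: the correction is applied to the input rather than to the formula.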
Jeffreys interval
The ''Jeffreys interval'' has a Bayesian derivation, but it has good frequentist properties. In particular, it has coverage properties that are similar to those of the Wilson interval, but it is one of the few intervals with the advantage of being ''equal-tailed'' (e.g., for a 95% confidence interval, the probabilities of the interval lying above or below the true value are both close to 2.5%). In contrast, the Wilson interval has a systematic bias such that it is centred too close to ''p'' = 0.5.
The Jeffreys interval is the Bayesian credible interval obtained when using the non-informative Jeffreys prior for the binomial proportion <math>p</math>. The Jeffreys prior for this problem is a Beta distribution with parameters <math>\left(\tfrac{1}{2}, \tfrac{1}{2}\right)</math>; it is a conjugate prior. After observing <math>x</math> successes in <math>n</math> trials, the posterior distribution for <math>p</math> is a Beta distribution with parameters <math>\left(x + \tfrac{1}{2}, n - x + \tfrac{1}{2}\right)</math>.
When <math>x \ne 0</math> and <math>x \ne n</math>, the Jeffreys interval is taken to be the <math>100 \left(1 - \alpha\right)\%</math> equal-tailed posterior probability interval, i.e., the <math>\tfrac{\alpha}{2}</math> and <math>1 - \tfrac{\alpha}{2}</math> quantiles of a Beta distribution with parameters <math>\left(x + \tfrac{1}{2}, n - x + \tfrac{1}{2}\right)</math>. These quantiles need to be computed numerically, although this is reasonably simple with modern statistical software.
In order to avoid the coverage probability tending to zero when <math>p \to 0</math> or <math>1</math>, when <math>x = 0</math> the upper limit is calculated as before but the lower limit is set to 0, and when <math>x = n</math> the lower limit is calculated as before but the upper limit is set to 1.
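A minimal sketch of the Jeffreys interval, assuming SciPy is available for the Beta quantile function (`scipy.stats.beta.ppf`). The function name `jeffreys_interval` is illustrative; the boundary handling follows the rule described above for zero successes and zero failures.

```python
from scipy.stats import beta


def jeffreys_interval(x, n, confidence=0.95):
    """Equal-tailed Jeffreys interval from the Beta(x + 1/2, n - x + 1/2) posterior."""
    alpha = 1.0 - confidence
    a, b = x + 0.5, n - x + 0.5
    # Boundary rule: pin the bound at 0 or 1 when no successes or no failures.
    lower = 0.0 if x == 0 else float(beta.ppf(alpha / 2.0, a, b))
    upper = 1.0 if x == n else float(beta.ppf(1.0 - alpha / 2.0, a, b))
    return lower, upper


print(jeffreys_interval(10, 20))
print(jeffreys_interval(0, 20))   # lower limit set to 0
print(jeffreys_interval(20, 20))  # upper limit set to 1
```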
Clopper–Pearson interval
The Clopper–Pearson interval is an early and very common method for calculating binomial confidence intervals. This is often called an 'exact' method, because it is based on the cumulative probabilities of the binomial distribution (i.e., exactly the correct distribution rather than an approximation). However, in cases where we know the population size, the intervals may not be the smallest possible. For instance, for a population of size 20 with true proportion of 50%, Clopper–Pearson gives [0.272, 0.728], which has width 0.456 (and where bounds are 0.0280 away from the "next achievable values" of 6/20 and 14/20); whereas Wilson's gives [0.299, 0.701], which has width 0.401 (and is 0.0007 away from the next achievable values).
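The Clopper–Pearson interval can be written as
: <math>S_{\le} \cap S_{\ge}</math>
or equivalently,
: <math>\left(\inf S_{\ge}, \sup S_{\le}\right)</math>
with
: <math>S_{\le} := \left\{ \theta \,\Big|\, P\left[\operatorname{Bin}(n; \theta) \le x\right] > \tfrac{\alpha}{2} \right\} \text{ and } S_{\ge} := \left\{ \theta \,\Big|\, P\left[\operatorname{Bin}(n; \theta) \ge x\right] > \tfrac{\alpha}{2} \right\},</math>
where 0 ≤ ''x'' ≤ ''n'' is the number of successes observed in the sample and Bin(''n''; ''θ'') is a binomial random variable with ''n'' trials and probability of success ''θ''.
Equivalently we can say that the Clopper–Pearson interval is <math>\left(\tfrac{x}{n} - \varepsilon_1, \tfrac{x}{n} + \varepsilon_2\right)</math> with confidence level <math>1 - \alpha</math> if <math>\varepsilon_i</math> is the infimum of those such that the following tests of hypothesis succeed with significance <math>\tfrac{\alpha}{2}</math>:
# H0: <math>\theta = \tfrac{x}{n} - \varepsilon_1</math> with HA: <math>\theta > \tfrac{x}{n} - \varepsilon_1</math>
# H0: <math>\theta = \tfrac{x}{n} + \varepsilon_2</math> with HA: <math>\theta < \tfrac{x}{n} + \varepsilon_2</math>.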
Because of a relationship between the binomial distribution and the beta distribution, the Clopper–Pearson interval is sometimes presented in an alternate format that uses quantiles from the beta distribution.
: <math>B\left(\tfrac{\alpha}{2}; x, n - x + 1\right) < \theta < B\left(1 - \tfrac{\alpha}{2}; x + 1, n - x\right)</math>
where ''x'' is the number of successes, ''n'' is the number of trials, and ''B''(''p''; ''v'',''w'') is the ''p''th quantile from a beta distribution with shape parameters ''v'' and ''w''.
Thus, <math>p_{\min} < p < p_{\max}</math>, where:
: <math>\frac{\Gamma(n + 1)}{\Gamma(x) \Gamma(n - x + 1)} \int_0^{p_{\min}} t^{x - 1} (1 - t)^{n - x} \, dt = \frac{\alpha}{2}</math>
: <math>\frac{\Gamma(n + 1)}{\Gamma(x + 1) \Gamma(n - x)} \int_0^{p_{\max}} t^{x} (1 - t)^{n - x - 1} \, dt = 1 - \frac{\alpha}{2}</math>
The binomial proportion confidence interval is then <math>\left(p_{\min}, p_{\max}\right)</math>, as follows from the relation between the Binomial distribution cumulative distribution function and the regularized incomplete beta function.
When <math>x</math> is either <math>0</math> or <math>n</math>, closed-form expressions for the interval bounds are available: when <math>x = 0</math> the interval is <math>\left(0, 1 - \left(\tfrac{\alpha}{2}\right)^{1/n}\right)</math> and when <math>x = n</math> it is <math>\left(\left(\tfrac{\alpha}{2}\right)^{1/n}, 1\right)</math>.
The beta distribution is, in turn, related to the F-distribution so a third formulation of the Clopper–Pearson interval can be written using F quantiles:
: <math>\left(1 + \frac{n - x + 1}{x \, F\left[\tfrac{\alpha}{2}; 2x, 2(n - x + 1)\right]}\right)^{-1} < \theta < \left(1 + \frac{n - x}{(x + 1) \, F\left[1 - \tfrac{\alpha}{2}; 2(x + 1), 2(n - x)\right]}\right)^{-1}</math>
where ''x'' is the number of successes, ''n'' is the number of trials, and ''F''(''c''; ''d''<sub>1</sub>, ''d''<sub>2</sub>) is the ''c'' quantile from an F-distribution with ''d''<sub>1</sub> and ''d''<sub>2</sub> degrees of freedom.
The Clopper–Pearson interval is an exact interval since it is based directly on the binomial distribution rather than any approximation to the binomial distribution. This interval never has less than the nominal coverage for any population proportion, but that means that it is usually conservative. For example, the true coverage rate of a 95% Clopper–Pearson interval may be well above 95%, depending on ''n'' and ''θ''. Thus the interval may be wider than it needs to be to achieve 95% confidence. In contrast, other intervals may fall short of their nominal coverage: the normal approximation (or "standard") interval, Wilson interval, Agresti–Coull interval, etc., with a nominal coverage of 95% may in fact cover less than 95%.
The definition of the Clopper–Pearson interval can also be modified to obtain exact confidence intervals for different distributions. For instance, it can also be applied to the case where the samples are drawn without replacement from a population of a known size, instead of repeated draws of a binomial distribution. In this case, the underlying distribution would be the hypergeometric distribution.
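The interval boundaries are easily computed with numerical functions such as qbeta in R and scipy.stats.beta.ppf in Python.
from scipy.stats import beta
k = 20  # number of successes
n = 400  # number of trials
alpha = 0.05  # target error rate (95% confidence)
# Lower bound: alpha/2 quantile of Beta(k, n - k + 1);
# upper bound: 1 - alpha/2 quantile of Beta(k + 1, n - k).
p_u, p_o = beta.ppf([alpha / 2, 1 - alpha / 2], [k, k + 1], [n - k + 1, n - k])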
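The definition in terms of binomial tail probabilities can also be checked without any beta quantile function. The sketch below, using only the Python standard library, finds each bound by bisection on the exact binomial cumulative probability; the helper names, the bisection routine and the fixed iteration count are illustrative assumptions rather than a reference implementation.

```python
from math import comb


def binom_cdf(k, n, theta):
    """P[Bin(n; theta) <= k], summed term by term."""
    return sum(comb(n, i) * theta ** i * (1.0 - theta) ** (n - i)
               for i in range(k + 1))


def _bisect_root(f, lo=0.0, hi=1.0, iters=100):
    """Root of an increasing function f on [lo, hi] with f(lo) < 0 < f(hi)."""
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if f(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def clopper_pearson(x, n, alpha=0.05):
    """Clopper-Pearson bounds from the binomial tail conditions."""
    # Lower bound: theta with P[Bin >= x] = alpha/2 (increasing in theta).
    lower = 0.0 if x == 0 else _bisect_root(
        lambda t: (1.0 - binom_cdf(x - 1, n, t)) - alpha / 2.0)
    # Upper bound: theta with P[Bin <= x] = alpha/2 (decreasing in theta).
    upper = 1.0 if x == n else _bisect_root(
        lambda t: alpha / 2.0 - binom_cdf(x, n, t))
    return lower, upper


print(clopper_pearson(10, 20))  # roughly (0.272, 0.728)
print(clopper_pearson(0, 10))   # upper bound equals 1 - (alpha/2) ** (1/n)
```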
Agresti–Coull interval
The Agresti–Coull interval is another approximate binomial confidence interval.
Given <math>n_S</math> successes in <math>n</math> trials, define
: <math>\tilde n = n + z^2</math>
and
: <math>\tilde p = \frac{1}{\tilde n} \left(n_S + \frac{z^2}{2}\right)</math>
Then, a confidence interval for <math>p</math> is given by
: <math>\tilde p \pm z \sqrt{\frac{\tilde p \left(1 - \tilde p\right)}{\tilde n}}</math>
where <math>z = \Phi^{-1}\left(1 - \tfrac{\alpha}{2}\right)</math> is the quantile of a standard normal distribution, as before (for example, a 95% confidence interval requires <math>\alpha = 0.05</math>, thereby producing <math>z = 1.96</math>). According to Brown, Cai, and DasGupta, taking <math>z = 2</math> instead of 1.96 produces the "add 2 successes and 2 failures" interval previously described by Agresti and Coull.
This interval can be summarised as employing the centre-point adjustment, <math>\tilde p</math>, of the Wilson score interval, and then applying the Normal approximation to this point.
: <math>\tilde p = \frac{\hat p + \frac{z^2}{2n}}{1 + \frac{z^2}{n}}</math>
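A minimal Python sketch of the Agresti–Coull interval: adjust the counts by the pseudocount, then apply the normal approximation at the adjusted centre. The function name `agresti_coull_interval` is an illustrative choice, and the bounds are returned unclipped.

```python
from math import sqrt
from statistics import NormalDist


def agresti_coull_interval(n_success, n, confidence=0.95):
    """Agresti-Coull interval (lower, upper); bounds are not clipped to [0, 1]."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    n_tilde = n + z * z
    p_tilde = (n_success + z * z / 2.0) / n_tilde   # Wilson centre point
    half_width = z * sqrt(p_tilde * (1.0 - p_tilde) / n_tilde)
    return p_tilde - half_width, p_tilde + half_width


print(agresti_coull_interval(10, 20))
print(agresti_coull_interval(2, 50))
```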
Arcsine transformation
The arcsine transformation has the effect of pulling out the ends of the distribution. While it can stabilize the variance (and thus confidence intervals) of proportion data, its use has been criticized in several contexts.
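Let ''X'' be the number of successes in ''n'' trials and let ''p'' = ''X''/''n''. The variance of ''p'' is
: <math>\operatorname{var}(p) = \frac{p (1 - p)}{n}.</math>
Using the arc sine transform the variance of the arcsine of ''p''<sup>1/2</sup> is[Shao J (1998) Mathematical statistics. Springer. New York, New York, USA]
: <math>\operatorname{var}\left(\arcsin \sqrt{p}\right) \approx \frac{\operatorname{var}(p)}{4 p (1 - p)} = \frac{p (1 - p)}{4 n p (1 - p)} = \frac{1}{4n}.</math>
So, the confidence interval itself has the following form:
: <math>\sin^2\left(\arcsin \sqrt{p} - \frac{z}{2 \sqrt{n}}\right) < \theta < \sin^2\left(\arcsin \sqrt{p} + \frac{z}{2 \sqrt{n}}\right)</math>
where <math>z</math> is the <math>1 - \tfrac{\alpha}{2}</math> quantile of a standard normal distribution.
This method may be used to estimate the variance of ''p'' but its use is problematic when ''p'' is close to 0 or 1.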
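The arcsine interval can be sketched in Python as follows. Clamping the transformed endpoints to [0, π/2] is an added assumption, not part of the formula above: without it, an angle that leaves that range would be folded back by the squared sine and give a misleading bound. The function name is illustrative.

```python
from math import asin, pi, sin, sqrt
from statistics import NormalDist


def arcsine_interval(x, n, confidence=0.95):
    """Interval from the variance-stabilising arcsine square-root transform."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    centre = asin(sqrt(x / n))
    delta = z / (2.0 * sqrt(n))          # z * sqrt(1 / (4n))
    # Assumption: clamp angles so sin^2 stays monotone on the interval.
    lo_angle = max(centre - delta, 0.0)
    hi_angle = min(centre + delta, pi / 2.0)
    return sin(lo_angle) ** 2, sin(hi_angle) ** 2


print(arcsine_interval(10, 20))
print(arcsine_interval(1, 20))  # near 0, where the method is problematic
```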
''t''<sub>''a''</sub> transform
Let ''p'' be the proportion of successes. For 0 ≤ ''a'' ≤ 2,
: <math>t_a = \log\left(\frac{p^a}{(1 - p)^{2 - a}}\right) = a \log(p) - (2 - a) \log(1 - p)</math>
This family is a generalisation of the logit transform, which is the special case ''a'' = 1, and can be used to transform a proportional data distribution to an approximately normal distribution. The parameter ''a'' has to be estimated for the data set.
Rule of three — for when no successes are observed
The rule of three is used to provide a simple way of stating an approximate 95% confidence interval for ''p'', in the special case that no successes (<math>\hat p = 0</math>) have been observed.[Steve Simon (2010) "Confidence interval with zero events", The Children's Mercy Hospital, Kansas City, Mo. (website: "Ask Professor Mean at Stats topics or Medical Research")] The interval is <math>(0, 3/n)</math>.
By symmetry, when only successes are observed (<math>\hat p = 1</math>), one could expect the interval to be <math>(1 - 3/n, 1)</math>.
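As a quick numerical check, the rule of three can be compared with the exact one-sided 95% upper bound for zero observed successes, which solves <math>(1 - p)^n = 0.05</math>. Since <math>-\ln 0.05 \approx 2.996</math>, the exact bound is close to 3/''n'' for moderate ''n''. Both function names below are illustrative.

```python
def rule_of_three_upper(n):
    """Approximate 95% upper bound for p when 0 successes are seen in n trials."""
    return 3.0 / n


def exact_upper_zero_successes(n, alpha=0.05):
    """Exact one-sided bound: the p that solves (1 - p) ** n = alpha."""
    return 1.0 - alpha ** (1.0 / n)


for n in (10, 30, 100, 1000):
    print(n, rule_of_three_upper(n), exact_upper_zero_successes(n))
```

The approximation is slightly conservative, and the gap shrinks as ''n'' grows.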
Comparison and discussion
There are several research papers that compare these and other confidence intervals for the binomial proportion. Both Agresti and Coull (1998) and Ross (2003) point out that exact methods such as the Clopper–Pearson interval may not work as well as certain approximations. The Normal approximation interval and its presentation in textbooks has been heavily criticised, with many statisticians advocating that it not be used. The principal problems are ''overshoot'' (bounds exceed [0, 1]), ''zero-width intervals'' at <math>\hat p</math> = 0 and 1 (falsely implying certainty), and overall inconsistency with significance testing.
Of the approximations listed above, Wilson score interval methods (with or without continuity correction) have been shown to be the most accurate and the most robust, though some prefer the Agresti–Coull approach for larger sample sizes. Wilson and Clopper–Pearson methods obtain consistent results with source significance tests, and this property is decisive for many researchers.
Many of these intervals can be calculated in R using packages like "binom".
See also
* Binomial distribution#Confidence intervals
* Estimation theory
* Pseudocount