In the theory of finite population sampling, Bernoulli sampling is a sampling process where each element of the population is subjected to an independent Bernoulli trial which determines whether the element becomes part of the sample. An essential property of Bernoulli sampling is that all elements of the population have equal probability of being included in the sample. Bernoulli sampling is therefore a special case of Poisson sampling, in which each element of the population may have a different probability of being included in the sample; in Bernoulli sampling, the probability is equal for all the elements. Because each element of the population is considered separately for the sample, the sample size is not fixed but rather follows a binomial distribution.

Example

The most basic Bernoulli method generates ''n'' random variates to extract a sample from a population of ''n'' items. Suppose you want to extract a given percentage ''pct'' of the population. The algorithm can be described as follows:

    for each item in the set
        generate a random non-negative integer R
        if (R mod 100) < pct then
            select item

Scaled binomial distribution

A percentage of 20%, say, is usually expressed as a probability ''p'' = 0.2. In that case, random variates are generated in the unit interval, and an item is selected when its variate is less than ''p''. After running the algorithm, a sample of size ''k'' will have been selected. One would expect to have

k \approx n \cdot p

and the ratio ''k''/''n'' is increasingly likely to be close to ''p'' as ''n'' grows. In fact, the probability of obtaining a sample of size ''k'' is given by the binomial distribution:

f(k,n,p) = \binom{n}{k} p^k (1-p)^{n-k}
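As a minimal sketch of the procedure described above, the following Python code selects each item with probability ''p'' using unit-interval variates and then evaluates the binomial formula for the sample size that was obtained. The names `bernoulli_sample` and `binomial_pmf`, the population of 1000 integers, and the seed are illustrative assumptions and do not come from the original description.

```python
import math
import random


def bernoulli_sample(items, p, rng=None):
    """Select each item independently with probability p."""
    if rng is None:
        rng = random.Random()
    # rng.random() is a variate in [0, 1); the item is kept when it falls below p.
    return [item for item in items if rng.random() < p]


def binomial_pmf(k, n, p):
    """Probability that a Bernoulli sample drawn from n items has exactly k elements.

    Direct evaluation of the formula; suitable for moderate n only,
    since the binomial coefficient grows very quickly.
    """
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


population = list(range(1000))           # hypothetical population
sample = bernoulli_sample(population, 0.2, random.Random(42))
k, n = len(sample), len(population)
print("sample size:", k, "expected about:", n * 0.2)
print("probability of exactly this size:", binomial_pmf(k, n, 0.2))
```

The sample size varies from run to run (or from seed to seed), which is the behaviour the binomial distribution describes.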

On the left this function is shown for four values of ''n'' and ''p'' = 0.2. In order to compare the values for different values of ''n'', the values of ''k'' on the abscissa are scaled from [0, ''n''] to the unit interval, while the value of the function, on the ordinate, is multiplied by the inverse scaling factor, so that the area under the graph maintains the same value; that area is related to the corresponding cumulative distribution function. The values are shown in logarithmic scale.

On the right are shown the minimum values of ''n'' that satisfy given error bounds with 95% probability. Given an error, the set of values of ''k'' within bounds can be described as follows:

K_{n,p} = \left\{ k \in \mathbb{N} : \left| \frac{k}{n} - p \right| < \mathrm{error} \right\}

The probability to end up within ''K'' is given again by the binomial distribution as:

\sum_{k \in K_{n,p}} f(k, n, p).
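As a small worked illustration, assume the bound is strict, that is, ''k'' is within bounds when |''k''/''n'' − ''p''| < error, and take the hypothetical values ''n'' = 10, ''p'' = 0.2 and error = 0.15. Then ''k''/10 must lie strictly between 0.05 and 0.35, so

K_{10,\,0.2} = \{1, 2, 3\}

\sum_{k=1}^{3} f(k, 10, 0.2) = \binom{10}{1}(0.2)(0.8)^{9} + \binom{10}{2}(0.2)^{2}(0.8)^{8} + \binom{10}{3}(0.2)^{3}(0.8)^{7}

= 0.268435456 + 0.301989888 + 0.201326592 = 0.771751936 \approx 0.772

With these values the probability of staying within bounds is about 77%, so a population of 10 items is too small to meet a 95% requirement at this error.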

The picture shows the lowest values of ''n'' such that the sum is at least 0.95. For ''p'' = 0.00 and ''p'' = 1.00 the algorithm delivers exact results for all values of ''n''. The values of ''p'' in between are obtained by bisection. Note that, if 100 · ''p'' is an integer percentage, an error of 0.005 guarantees that 100 · ''k''/''n'', rounded to the nearest integer, equals 100 · ''p''. Values as high as ''n'' = 38400 can be required for such an exact match.
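The search for the lowest ''n'' can be sketched in Python as below. This is an illustration under stated assumptions, not the method used to produce the picture: it assumes the bound |''k''/''n'' − ''p''| < error is strict, it scans ''n'' upward one value at a time, and the names `prob_within_error` and `min_population_size` and the cutoff `n_max` are hypothetical. The binomial term is computed through `math.lgamma` so that large binomial coefficients do not overflow.

```python
import math
from fractions import Fraction


def binomial_pmf(k, n, p):
    """f(k, n, p), evaluated in log space to avoid overflow for large n."""
    if p in (0.0, 1.0):
        # Degenerate cases: the sample is always empty (p = 0) or complete (p = 1).
        return 1.0 if k == n * p else 0.0
    log_f = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
             + k * math.log(p) + (n - k) * math.log(1 - p))
    return math.exp(log_f)


def prob_within_error(n, p, error):
    """Sum f(k, n, p) over all k with |k/n - p| < error (strict bound assumed)."""
    # Exact rational arithmetic keeps boundary values of k from being
    # misclassified by floating-point rounding.
    p_exact, e_exact = Fraction(str(p)), Fraction(str(error))
    lo = max(0, math.floor(n * (p_exact - e_exact)) + 1)  # smallest k above the lower bound
    hi = min(n, math.ceil(n * (p_exact + e_exact)) - 1)   # largest k below the upper bound
    return sum(binomial_pmf(k, n, float(p_exact)) for k in range(lo, hi + 1))


def min_population_size(p, error, confidence=0.95, n_max=10_000):
    """Lowest n up to n_max whose within-bounds probability reaches confidence."""
    for n in range(1, n_max + 1):
        if prob_within_error(n, p, error) >= confidence:
            return n
    return None  # not reached within n_max


print(min_population_size(0.2, 0.05))
```

The linear scan is used because the within-bounds probability need not increase monotonically with ''n''; its cost grows quickly as the error shrinks, so a tight error such as 0.005 would call for a faster evaluation of the sum.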


External links

Faster Random Samples With Gap Sampling