In statistics, a simple random sample (or SRS) is a subset of individuals (a sample) chosen randomly from a larger set (a population), with every individual having the same probability of being chosen. In SRS, each subset of ''k'' individuals has the same probability of being chosen for the sample as any other subset of ''k'' individuals. Simple random sampling is an unbiased sampling technique. It is a basic type of sampling and can be a component of other, more complex sampling methods.
Introduction
The principle of simple random sampling is that every set with the same number of items has the same probability of being chosen. For example, suppose ''N'' college students want a ticket for a basketball game, but only ''X'' < ''N'' tickets are available, so they decide on a fair way to choose who gets to go. Everybody is given a number in the range from 0 to ''N''-1, and random numbers are generated, either electronically or from a table of random numbers. Numbers outside the range from 0 to ''N''-1 are ignored, as are any numbers previously selected. The first ''X'' numbers identify the lucky ticket winners.
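The ticket lottery can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `ticket_lottery` and its parameters are hypothetical, and the random-number table is imitated by drawing fixed-width decimal numbers, some of which fall outside the range and are ignored.

```python
import random

def ticket_lottery(n_students, n_tickets, rng=None):
    """Pick n_tickets distinct winners from students numbered 0..n_students-1."""
    if not 0 <= n_tickets <= n_students:
        raise ValueError("need 0 <= n_tickets <= n_students")
    rng = rng or random.Random()
    # Imitate a random-number table: fixed-width numbers such as 000..999.
    digits = len(str(max(n_students - 1, 0)))
    upper = 10 ** digits
    winners = []
    seen = set()
    while len(winners) < n_tickets:
        number = rng.randrange(upper)
        if number >= n_students:  # outside the range 0..N-1: ignored
            continue
        if number in seen:        # previously selected: ignored
            continue
        seen.add(number)
        winners.append(number)
    return winners

winners = ticket_lottery(250, 40)
```

In practice a library routine such as `random.sample(range(N), X)` gives the same kind of sample more directly; the loop above only mirrors the manual procedure.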
In small populations and often in large ones, such sampling is typically done "without replacement", i.e., one deliberately avoids choosing any member of the population more than once. Although simple random sampling can be conducted with replacement instead, this is less common and would normally be described more fully as simple random sampling with replacement.
Sampling done without replacement is no longer independent, but still satisfies exchangeability, hence many results still hold. Further, for a small sample from a large population, sampling without replacement is approximately the same as sampling with replacement, since the probability of choosing the same individual twice is low.
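How low that probability is can be computed directly. The sketch below (the helper name `prob_no_repeats` is an assumption for illustration) returns the chance that a sample drawn with replacement contains no individual twice, in which case it is indistinguishable from a sample drawn without replacement.

```python
from math import prod

def prob_no_repeats(population_size, sample_size):
    """Probability that sample_size draws with replacement are all distinct."""
    # The i-th draw (0-based) must avoid the i individuals already chosen.
    return prod(
        (population_size - i) / population_size for i in range(sample_size)
    )

# A small sample from a large population almost never repeats anyone.
p_large_population = prob_no_repeats(1_000_000, 100)
# The same sample size from a small population repeats far more often.
p_small_population = prob_no_repeats(1_000, 100)
```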
An unbiased random selection of individuals is important so that if many samples were drawn, the average sample would accurately represent the population. However, this does not guarantee that a particular sample is a perfect representation of the population. Simple random sampling merely allows one to draw externally valid conclusions about the entire population based on the sample.
Conceptually, simple random sampling is the simplest of the probability sampling techniques. It requires a complete sampling frame, which may not be available or feasible to construct for large populations. Even if a complete frame is available, more efficient approaches may be possible if other useful information is available about the units in the population.
Advantages are that it is free of classification error, and it requires minimum advance knowledge of the population other than the frame. Its simplicity also makes it relatively easy to interpret data collected in this manner. For these reasons, simple random sampling best suits situations where not much information is available about the population and data collection can be efficiently conducted on randomly distributed items, or where the cost of sampling is small enough to make efficiency less important than simplicity. If these conditions do not hold, stratified sampling or cluster sampling may be a better choice.
Relationship between simple random sample and other methods
Equal probability sampling (epsem)
A sampling method for which each individual unit has the same chance of being selected is called equal probability sampling (epsem for short).
Using a simple random sample will always lead to an epsem, but not all epsem samples are SRS. For example, if a teacher has a class arranged in 5 rows of 6 columns and wants to take a random sample of 5 students, the teacher might pick one of the 6 columns at random. This would be an epsem sample, but not all subsets of 5 pupils are equally likely here, as only the subsets that are arranged as a single column are eligible for selection. There are also ways of constructing multistage samples that are not SRS but whose final sample is epsem. For example, systematic random sampling produces a sample for which each individual unit has the same probability of inclusion, but different sets of units have different probabilities of being selected.
Samples that are epsem are self-weighting, meaning that the inverse of the selection probability is the same for every sampled unit.
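The classroom example can be checked by enumeration. The sketch below is illustrative only (the variable names are assumptions): it lists the six possible column samples, confirms that every student has the same inclusion probability, and compares the number of possible samples with the number a simple random sample of 5 from 30 could produce.

```python
from fractions import Fraction
from math import comb

ROWS, COLS = 5, 6
students = [(r, c) for r in range(ROWS) for c in range(COLS)]

# Column sampling: the only possible samples are the six whole columns.
column_samples = [frozenset((r, c) for r in range(ROWS)) for c in range(COLS)]

# Inclusion probability of each student when a column is chosen uniformly.
inclusion_probs = {
    s: Fraction(sum(s in sample for sample in column_samples), len(column_samples))
    for s in students
}

# Under SRS every one of the C(30, 5) subsets of five students is possible.
srs_sample_count = comb(len(students), ROWS)
srs_inclusion_prob = Fraction(ROWS, len(students))
```

Every student is included with probability 1/6 under both designs, yet column sampling can produce only 6 distinct samples while SRS can produce any of the `srs_sample_count` subsets.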
Distinction between a systematic random sample and a simple random sample
Consider a school with 1000 students, and suppose that a researcher wants to select 100 of them for further study. All their names might be put in a bucket and then 100 names might be pulled out. Not only does each person have an equal chance of being selected, we can also easily calculate the probability (''P'') of a given person being chosen, since we know the sample size (''n'') and the population (''N''):
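1. In the case that any given person can only be selected once (i.e., after selection a person is removed from the selection pool):
: <math>P = 1 - \frac{N-1}{N} \cdot \frac{N-2}{N-1} \cdots \frac{N-n}{N-(n-1)} = 1 - \frac{N-n}{N} = \frac{n}{N} = \frac{100}{1000} = 10\%</math>
2. In the case that any selected person is returned to the selection pool (i.e., can be picked more than once):
: <math>P = 1 - \left(1 - \frac{1}{N}\right)^{n} = 1 - \left(\frac{999}{1000}\right)^{100} \approx 0.0952 = 9.52\%</math>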
This means that every student in the school has in any case approximately a 1 in 10 chance of being selected using this method. Further, any combination of 100 students has the same probability of selection.
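The two cases can be verified numerically. The following is a minimal sketch with hypothetical helper names; it computes the probability that a given person is never chosen, draw by draw, and subtracts it from 1, using exact fractions so that no rounding is involved.

```python
from fractions import Fraction

def inclusion_prob_without_replacement(N, n):
    """P(a given person is chosen) when chosen people leave the pool."""
    p_never = Fraction(1)
    for i in range(n):
        # Before draw i the pool holds N - i people; all but one are "not them".
        p_never *= Fraction(N - i - 1, N - i)
    return 1 - p_never

def inclusion_prob_with_replacement(N, n):
    """P(a given person is chosen at least once) when people are returned."""
    return 1 - Fraction(N - 1, N) ** n

p_once = inclusion_prob_without_replacement(1000, 100)
p_returned = float(inclusion_prob_with_replacement(1000, 100))
```

For the school example, `p_once` is exactly 1/10 and `p_returned` is slightly below it, at roughly 0.095.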
If a systematic pattern is introduced into random sampling, it is referred to as "systematic (random) sampling". An example would be if the students in the school had numbers attached to their names ranging from 0001 to 1000, and we chose a random starting point, e.g. 0533, and then picked every 10th name thereafter to give us our sample of 100 (starting over with 0003 after reaching 0993). In this sense, this technique is similar to cluster sampling, since the choice of the first unit will determine the remainder. This is no longer simple random sampling, because some combinations of 100 students have a larger selection probability than others; for instance, {4, 14, 24, ..., 994} has a 1/10 chance of selection, while {4, 13, 24, 34, ...} cannot be selected under this method.
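A short sketch makes the restriction concrete. The function below is an illustration, not a standard routine: its name and parameters are assumptions, and it assumes the step divides the population size evenly, as in the school example.

```python
import random

def systematic_sample(population_size, step, start=None, rng=None):
    """Take every step-th number from 1..population_size, wrapping around."""
    if population_size % step != 0:
        raise ValueError("this sketch assumes step divides population_size")
    rng = rng or random.Random()
    if start is None:
        start = rng.randrange(1, population_size + 1)  # random starting point
    count = population_size // step
    # Work with 0-based positions for the wrap-around, then shift back to 1-based.
    return [(start - 1 + i * step) % population_size + 1 for i in range(count)]

sample = systematic_sample(1000, 10, start=533)

# Enumerate every possible starting point and collect the distinct samples.
distinct_samples = {
    frozenset(systematic_sample(1000, 10, start=s)) for s in range(1, 1001)
}
```

Although there are 1000 possible starting points, `distinct_samples` contains only 10 different sets, one per last digit, so each possible sample has probability 1/10 and every other combination of 100 students has probability 0.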
Sampling a dichotomous population
If the members of the population come in two kinds, say "red" and "black", the number of red elements in a sample of given size will vary by sample and hence is a random variable whose distribution can be studied. That distribution depends on the numbers of red and black elements in the full population. For a simple random sample ''with'' replacement, the distribution is a ''binomial distribution''. For a simple random sample ''without'' replacement, one obtains a ''hypergeometric distribution''.
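Both distributions have simple closed forms, shown here as a hedged sketch. The function and parameter names are assumptions: `n_red` and `n_total` describe the population, `n_draws` is the sample size, and `k` is the number of red elements observed.

```python
from math import comb

def binomial_pmf(k, n_draws, n_red, n_total):
    """P(k red) for a sample drawn with replacement."""
    p = n_red / n_total  # chance of red on every draw
    return comb(n_draws, k) * p**k * (1 - p) ** (n_draws - k)

def hypergeometric_pmf(k, n_draws, n_red, n_total):
    """P(k red) for a sample drawn without replacement."""
    n_black = n_total - n_red
    if k < 0 or k > n_draws:
        return 0.0
    # Choose k of the red and n_draws - k of the black elements.
    return comb(n_red, k) * comb(n_black, n_draws - k) / comb(n_total, n_draws)

# Population of 20 with 8 red elements, sample of 5.
with_replacement = [binomial_pmf(k, 5, 8, 20) for k in range(6)]
without_replacement = [hypergeometric_pmf(k, 5, 8, 20) for k in range(6)]
```

Each list holds the probabilities of seeing 0 to 5 red elements and sums to 1. The two lists differ noticeably here because the sample is a quarter of the population; for a much larger population with the same proportion of red they become nearly identical.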
Algorithms
Several efficient algorithms for simple random sampling have been developed. A naive algorithm is the draw-by-draw algorithm, in which at each step one item is removed from the set, each remaining item being equally likely, and placed in the sample. This continues until the sample reaches the desired size ''k''. The drawback of this method is that it requires random access in the set.
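A minimal sketch of the draw-by-draw idea follows; the function name `draw_by_draw` and the swap-and-pop removal are implementation choices made here for illustration. The indexing into `pool` is where random access is needed.

```python
import random

def draw_by_draw(items, k, rng=None):
    """Remove k items one at a time, each remaining item equally likely."""
    rng = rng or random.Random()
    pool = list(items)  # copy, so the caller's data is left untouched
    if not 0 <= k <= len(pool):
        raise ValueError("need 0 <= k <= number of items")
    sample = []
    for _ in range(k):
        i = rng.randrange(len(pool))         # random access into the pool
        pool[i], pool[-1] = pool[-1], pool[i]  # move the chosen item to the end
        sample.append(pool.pop())            # remove it in constant time
    return sample

chosen = draw_by_draw(range(1, 1001), 100)
```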
The selection-rejection algorithm developed by Fan et al. in 1962 requires a single pass over the data; however, it is a sequential algorithm and requires knowledge of the total count of items in advance, which is not available in streaming scenarios.
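The single-pass idea can be sketched as follows. This is a common textbook formulation of sequential selection and is not claimed to match the 1962 presentation in detail; the function name and parameters are assumptions. Each item is accepted with probability (items still needed) / (items still to come), which is why the total count `n_total` must be known beforehand.

```python
import random

def selection_rejection(items, n_total, k, rng=None):
    """One sequential pass; n_total must equal the number of items."""
    if not 0 <= k <= n_total:
        raise ValueError("need 0 <= k <= n_total")
    rng = rng or random.Random()
    sample = []
    for seen, item in enumerate(items):
        if len(sample) == k:
            break  # sample complete, the rest of the data can be skipped
        remaining = n_total - seen   # items not yet examined, including this one
        needed = k - len(sample)     # items still to be selected
        # Accept with probability needed / remaining (certain when they are equal).
        if rng.random() * remaining < needed:
            sample.append(item)
    return sample

picked = selection_rejection(iter(range(1000)), 1000, 100)
```

The sample comes out in the original order of the data, and exactly `k` items are returned provided `n_total` is correct.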
A very simple random sort algorithm was proved correct by Sunter in 1977. The algorithm simply assigns a random number drawn from the uniform distribution on the interval (0, 1) as a key to each item, then sorts all items by key and selects the ''k'' items with the smallest keys.
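A sketch of the random sort approach is given below; the function name is hypothetical. Python's `random.random()` returns values in [0, 1), which serves as the uniform key for the purpose of this illustration.

```python
import random

def random_sort_sample(items, k, rng=None):
    """Key every item with a uniform random number and keep the k smallest."""
    rng = rng or random.Random()
    keyed = [(rng.random(), item) for item in items]
    # Sort on the key only, so the items themselves never need to be comparable.
    keyed.sort(key=lambda pair: pair[0])
    return [item for _, item in keyed[:k]]

subset = random_sort_sample(["ann", "bob", "cat", "dan", "eve"], 2)
```

Sorting everything costs more than is strictly needed to find the smallest keys, but the method is easy to express in systems that already provide a sort, which is part of its appeal.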
J. Vitter in 1985 proposed reservoir sampling algorithms, which are widely used. These algorithms do not require knowledge of the size of the population in advance, and use constant space (independent of the population size).
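The simplest member of the reservoir family, commonly called Algorithm R, can be sketched as follows. It is shown only as an illustration of the idea, not as the optimized variants from the 1985 work; the function name is an assumption. The reservoir holds ''k'' items at all times, so memory use does not grow with the length of the stream.

```python
import random

def reservoir_sample(stream, k, rng=None):
    """Keep a uniform sample of k items from a stream of unknown length."""
    rng = rng or random.Random()
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)  # the first k items fill the reservoir
        else:
            # Item number i + 1 replaces a random slot with probability k / (i + 1).
            j = rng.randrange(i + 1)
            if j < k:
                reservoir[j] = item
    return reservoir

# Works on a generator, whose length is not known in advance.
kept = reservoir_sample((x * x for x in range(10_000)), 5)
```

If the stream holds fewer than ''k'' items, the sketch simply returns all of them.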
Random sampling can also be accelerated by sampling from the distribution of gaps between samples and skipping over the gaps.
See also
* Multistage sampling
* Nonprobability sampling
* Opinion poll
* Quantitative marketing research
* Sampling design
* Bernoulli sampling
* Poisson sampling