In survey methodology, one-dimensional systematic sampling is a statistical method involving the selection of elements from an ordered sampling frame. The most common form of systematic sampling is an equiprobability method. This applies in particular when the sampled units are individuals, households or corporations. When a geographic area is sampled for a spatial analysis, bi-dimensional systematic sampling on an area sampling frame can be applied.
In one-dimensional systematic sampling, progression through the list is treated circularly, with a return to the top once the list ends. The sampling starts by selecting an element from the list at random, and then every ''k''th element in the frame is selected, where ''k'' is the sampling interval (sometimes known as the ''skip''). It is calculated as:
$$k = \frac{N}{n}$$
where ''n'' is the sample size, and ''N'' is the population size.
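A minimal Python sketch of the circular procedure just described is shown below. It assumes the frame is an in-memory list and, because the text does not say how a non-integral N/n is handled in this variant, it assumes the skip ''k'' is the integer part of N/n. The function name `circular_systematic_sample` and its optional `start` argument (a 0-based index supplied for reproducibility) are hypothetical, not taken from any library.

```python
import random
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def circular_systematic_sample(frame: Sequence[T], n: int,
                               start: Optional[int] = None) -> List[T]:
    """Random start anywhere in the frame, then every kth element, wrapping to the top."""
    N = len(frame)
    if not 0 < n <= N:
        raise ValueError("sample size must satisfy 0 < n <= N")
    k = N // n  # assumption: skip = integer part of N/n
    if start is None:
        start = random.randrange(N)  # 0-based random start, uniform over the frame
    # j * k < N for every j < n, so the n selected positions are all distinct
    return [frame[(start + j * k) % N] for j in range(n)]


houses = list(range(1, 21))  # hypothetical frame of 20 units
print(circular_systematic_sample(houses, 4, start=17))  # [18, 3, 8, 13]
```

Because every start is equally likely and each element is reached from exactly ''n'' different starts, each element's inclusion probability is ''n''/''N'' under this sketch, even when ''N'' is not a multiple of ''n''.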
Using this procedure, each element in the population has a known and equal probability of selection (an equal probability of selection method, or epsem). This makes systematic sampling functionally similar to simple random sampling (SRS). However, it is not the same as SRS, because not every possible sample of a given size has an equal chance of being chosen: for example, when ''k'' > 1, samples containing two elements that are adjacent in the frame can never be chosen by systematic sampling. It can, however, be more efficient than SRS, provided the variance within a systematic sample is larger than the variance of the population.
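Both claims can be checked by brute force on a small, hypothetical frame. This Python sketch assumes linear (non-circular) sampling with ''N'' an exact multiple of ''k'', so each of the ''k'' possible random starts yields exactly one sample; it compares that number with the count of samples SRS could draw. The helper `all_systematic_samples` is a hypothetical name.

```python
from math import comb
from typing import List, Tuple


def all_systematic_samples(N: int, k: int) -> List[Tuple[int, ...]]:
    """Every possible linear systematic sample of positions 1..N with skip k.

    Assumes N is a multiple of k: start r (1..k) gives the sample r, r+k, r+2k, ...
    """
    return [tuple(range(r, N + 1, k)) for r in range(1, k + 1)]


N, k = 12, 4          # hypothetical frame size and skip, so n = N / k = 3
n = N // k
samples = all_systematic_samples(N, k)

# Each start has probability 1/k and each unit lies in exactly one sample,
# so every unit's inclusion probability is 1/k = n/N (epsem).
appearances = {u: sum(u in s for s in samples) for u in range(1, N + 1)}

# No systematic sample contains two adjacent units (because k > 1) ...
has_adjacent = any(b - a == 1 for s in samples for a, b in zip(s, s[1:]))

# ... and only k of the comb(N, n) samples available to SRS can ever occur.
print(len(samples), comb(N, n))  # 4 220
```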
Systematic sampling should be applied only if the given population is logically homogeneous, because systematic sample units are spread uniformly over the population. The researcher must ensure that the chosen sampling interval does not coincide with a periodic pattern in the frame; any such pattern would undermine the randomness of the sample.
Example: Suppose a supermarket wants to study the buying habits of its customers. Using systematic sampling, it could select every 10th or 15th customer entering the supermarket and conduct the study on this sample.
This is random sampling with a system. From the sampling frame, a starting point is chosen at random, and choices thereafter are at regular intervals. For example, suppose you want to sample 8 houses from a street of 120 houses. 120/8 = 15, so every 15th house is chosen after a random starting point between 1 and 15. If the random starting point is 11, then the houses selected are 11, 26, 41, 56, 71, 86, 101, and 116. As an aside, if every 15th house were a "corner house", this corner pattern could destroy the randomness of the sample.
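The street example can be reproduced with a short Python sketch. It assumes ''N'' is an exact multiple of ''n'' and uses 1-based house numbers; `systematic_positions` and its optional `start` parameter are hypothetical names chosen for illustration.

```python
import random
from typing import List, Optional


def systematic_positions(N: int, n: int, start: Optional[int] = None) -> List[int]:
    """1-based positions for linear systematic sampling with an integer interval."""
    if N % n != 0:
        raise ValueError("this sketch assumes N is evenly divisible by n")
    k = N // n                        # sampling interval (skip)
    if start is None:
        start = random.randint(1, k)  # random start between 1 and k inclusive
    if not 1 <= start <= k:
        raise ValueError("start must lie between 1 and k")
    return [start + j * k for j in range(n)]


print(systematic_positions(120, 8, start=11))
# [11, 26, 41, 56, 71, 86, 101, 116]
```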
More often, the population size is not evenly divisible by the sample size. Suppose you want to sample 8 houses out of 125, where 125/8 = 15.625: should you take every 15th house or every 16th house? If you take every 16th house, 8*16 = 128, so there is a risk that the last house chosen does not exist. On the other hand, if you take every 15th house, 8*15 = 120, so the last five houses will never be selected. Instead, the interval should be kept non-integral (15.625), the random starting point should be a non-integer in the half-open interval (0, 15.625], and each selected value should be rounded up to the next integer; this gives every house an equal chance of being selected. If the random starting point is 3.6, then the houses selected are 4, 20, 35, 51, 67, 82, 98, and 113, and the gaps between consecutive selected houses consist of three intervals of 15 and four intervals of 16.
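The rounding-up rule can be sketched in Python with exact rational arithmetic (the standard-library `fractions.Fraction` type), which avoids floating-point surprises when a selection point lands close to an integer. The function name `fractional_systematic_positions` is hypothetical; the start is drawn from (0, ''k''] as described above.

```python
import math
import random
from fractions import Fraction
from typing import List, Optional


def fractional_systematic_positions(N: int, n: int,
                                    start: Optional[Fraction] = None) -> List[int]:
    """1-based positions using a non-integral interval k = N/n and rounding up."""
    k = Fraction(N, n)                        # e.g. 125/8 = 15.625
    if start is None:
        # 1 - random.random() lies in (0, 1], so the start lies in (0, k]
        start = Fraction(1 - random.random()) * k
    if not 0 < start <= k:
        raise ValueError("start must lie in the half-open interval (0, k]")
    # Each selection point start + j*k is rounded up to the next integer house number
    return [math.ceil(start + j * k) for j in range(n)]


houses = fractional_systematic_positions(125, 8, start=Fraction("3.6"))
print(houses)                                        # [4, 20, 35, 51, 67, 82, 98, 113]
print([b - a for a, b in zip(houses, houses[1:])])   # [16, 15, 16, 16, 15, 16, 15]
```

Since every selection point lies in (0, 125] and consecutive points are 15.625 apart, the rounded positions are always distinct, valid house numbers.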
To illustrate the danger of a systematic skip concealing a pattern, suppose we were to sample a planned neighborhood where each street has ten houses on each block. This places houses No. 1, 10, 11, 20, 21, 30, ... on block corners. Corner lots may be less valuable, since more of their area is taken up by street frontage and is unavailable for building. If we then sample every 10th household, our sample will either be made up ''only'' of corner houses (if we start at 1 or 10) or have ''no'' corner houses (any other start); either way, it will not be representative.
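The corner-house effect is easy to see by enumerating every possible start. This Python sketch assumes a hypothetical street of 100 houses in blocks of ten, numbered so that the first and last house on each block (1, 10, 11, 20, ...) are the corners; `is_corner` and `corner_counts` are illustrative names.

```python
from typing import Dict


def is_corner(house: int, block_size: int = 10) -> bool:
    """First or last house on a block: 1, 10, 11, 20, 21, 30, ... for blocks of ten."""
    return house % block_size in (0, 1)


def corner_counts(N: int, k: int) -> Dict[int, int]:
    """For each possible start 1..k, count corner houses in the every-kth-house sample."""
    return {start: sum(is_corner(h) for h in range(start, N + 1, k))
            for start in range(1, k + 1)}


print(corner_counts(100, 10))
# {1: 10, 2: 0, 3: 0, 4: 0, 5: 0, 6: 0, 7: 0, 8: 0, 9: 0, 10: 10}
```

Every start yields either 10 corner houses out of 10 or none at all, whereas 20 of the 100 houses on the street are corners.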
Systematic sampling may also be used with non-equal selection probabilities. In this case, rather than simply counting through the elements of the population and selecting every ''k''th unit, we allocate each element a space along a number line according to its selection probability. We then generate a random start from a uniform distribution between 0 and 1, and move along the number line in steps of 1.
Example: We have a population of 5 units (A to E). We want to give unit A a 20% probability of selection, unit B a 40% probability, and so on up to unit E (100%). Assuming we maintain alphabetical order, we allocate each unit to the following interval:
A: 0 to 0.2
B: 0.2 to 0.6 (= 0.2 + 0.4)
C: 0.6 to 1.2 (= 0.6 + 0.6)
D: 1.2 to 2.0 (= 1.2 + 0.8)
E: 2.0 to 3.0 (= 2.0 + 1.0)
If our random start was 0.156, we would first select the unit whose interval contains this number (i.e. A). Next, we would select the interval containing 1.156 (element C), then 2.156 (element E). If instead our random start was 0.350, we would select from points 0.350 (B), 1.350 (D), and 2.350 (E).
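The number-line procedure can be sketched in Python using cumulative sums and a binary search over the interval endpoints. This is an illustrative sketch that assumes each selection probability is at most 1 (so no unit can be hit twice) and treats each interval as half-open, [lower, upper); `pps_systematic` is a hypothetical name.

```python
import bisect
import random
from itertools import accumulate
from typing import List, Optional, Sequence


def pps_systematic(units: Sequence[str], probs: Sequence[float],
                   start: Optional[float] = None) -> List[str]:
    """Systematic sampling with unequal selection probabilities along a number line.

    Assumes every probability is at most 1. Unit i occupies
    [sum(probs[:i]), sum(probs[:i+1])); selection points are start, start + 1,
    start + 2, ... while they stay below the total length.
    """
    upper = list(accumulate(probs))      # upper endpoint of each unit's interval
    total = upper[-1]                    # equals the expected sample size
    if start is None:
        start = random.random()          # uniform random start in [0, 1)
    chosen = []
    point = start
    while point < total:
        # bisect_right finds the first interval whose upper endpoint exceeds point
        chosen.append(units[bisect.bisect_right(upper, point)])
        point += 1
    return chosen


probs = [0.2, 0.4, 0.6, 0.8, 1.0]
print(pps_systematic("ABCDE", probs, start=0.156))  # ['A', 'C', 'E']
print(pps_systematic("ABCDE", probs, start=0.350))  # ['B', 'D', 'E']
```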
See also
* Low-discrepancy sequence
External links
TRSL – Template Range Sampling Library is a free-software and open-source C++ library that implements systematic sampling behind an STL-like iterator interface.