Sample size determination or estimation is the act of choosing the number of observations or
replicates to include in a
statistical sample
. The sample size is an important feature of any empirical study in which the goal is to make
inferences about a
population
from a sample. In practice, the sample size used in a study is usually determined based on the cost, time, or convenience of collecting the data, and the need for it to offer sufficient
statistical power
. In complex studies, different sample sizes may be allocated, such as in stratified surveys or experimental designs with multiple treatment groups. In a
census
, data is sought for an entire population, hence the intended sample size is equal to the population size. In
experimental design, where a study may be divided into different
treatment groups, there may be different sample sizes for each group.
Sample sizes may be chosen in several ways:
*using experience – small samples, though sometimes unavoidable, can result in wide
confidence intervals and risk of errors in
statistical hypothesis testing
.
*using a target variance for an estimate to be derived from the sample eventually obtained, i.e., if a high precision is required (narrow confidence interval) this translates to a low target variance of the estimator.
*using a power target, i.e. the power of the
statistical test
to be applied once the sample is collected.
*using a confidence level, i.e. the larger the required confidence level, the larger the sample size (given a constant precision requirement).
Introduction
Sample size determination is a crucial aspect of research methodology that plays a significant role in ensuring the reliability and validity of study findings. It entails carefully choosing the number of participants or data points to include in a study, a choice that affects the accuracy of estimates, the power of statistical tests, and the general robustness of the research findings.
Consider a survey to determine the average satisfaction level of customers with a new product. To determine an appropriate sample size, we need to consider factors such as the desired level of confidence, the margin of error, and the variability in the responses. We might decide that we want a 95% confidence level, meaning we are 95% confident that the true average satisfaction level falls within the calculated range. We also decide on a margin of error of ±3%, which indicates the acceptable difference between our sample estimate and the true population parameter. Additionally, we may have some idea of the expected variability in satisfaction levels based on previous data or assumptions.
Importance
Larger sample sizes generally lead to increased
precision when
estimating unknown parameters. For instance, to accurately determine the prevalence of pathogen infection in a specific species of fish, it is preferable to examine a sample of 200 fish rather than 100 fish. Several fundamental facts of mathematical statistics describe this phenomenon, including the
law of large numbers and the
central limit theorem.
In some situations, the increase in precision for larger sample sizes is minimal, or even non-existent. This can result from the presence of systematic errors or strong dependence in the data, or from data that follow a heavy-tailed distribution.
Sample sizes may be evaluated by the quality of the resulting estimates. For example, if a proportion is being estimated, one may wish to have the 95%
confidence interval be less than 0.06 units wide. Alternatively, sample size may be assessed based on the
power of a hypothesis test. For example, if we are comparing the support for a certain political candidate among women with the support for that candidate among men, we may wish to have 80% power to detect a difference in the support levels of 0.04 units.
Estimation
Estimation of a proportion
A relatively simple situation is estimation of a
proportion. It is a fundamental aspect of statistical analysis, particularly when gauging the prevalence of a specific characteristic within a population. For example, we may wish to estimate the proportion of residents in a community who are at least 65 years old.
The
estimator of a
proportion is
<math>\hat p = X/n</math>, where ''X'' is the number of 'positive' instances (e.g., the number of people out of the ''n'' sampled people who are at least 65 years old). When the observations are
independent, this estimator has a (scaled)
binomial distribution
(and is also the
sample mean
of data from a
Bernoulli distribution
). The maximum
variance
of this distribution is 0.25, which occurs when the true
parameter
is ''p'' = 0.5. In practical applications, where the true parameter ''p'' is unknown, the maximum variance is often employed for sample size assessments. If a reasonable estimate for ''p'' is known, the quantity <math>p(1-p)</math> may be used in place of 0.25.
As the sample size ''n'' grows sufficiently large, the distribution of <math>\hat p</math> will be closely approximated by a
normal distribution
. Using this and the
Wald method for the binomial distribution yields a confidence interval, with ''Z'' representing the standard Z-score for the desired confidence level (e.g., 1.96 for a 95% confidence interval), in the form:
:<math>\left(\hat p - Z\sqrt{\frac{0.25}{n}},\ \hat p + Z\sqrt{\frac{0.25}{n}}\right)</math>
To determine an appropriate sample size ''n'' for estimating proportions, the equation below can be solved for ''n'', where ''W'' represents the desired width of the confidence interval:
:<math>Z\sqrt{\frac{0.25}{n}} = W/2</math>
This yields the sample size <math>n = \frac{Z^2}{W^2}</math>, in the case of using 0.5 as the most conservative estimate of the proportion. ''(Note: W/2 =
margin of error
.)''
In the figure below one can observe how sample sizes for binomial proportions change given different confidence levels and margins of error.
Otherwise, the formula would be <math>Z\sqrt{\frac{p(1-p)}{n}} = W/2</math>, which yields <math>n = \frac{4Z^2p(1-p)}{W^2}</math>.
For example, in estimating the proportion of the U.S. population supporting a presidential candidate with a 95% confidence interval width of 2 percentage points (0.02), a sample size of <math>1.96^2/0.02^2 = 9604</math> is required. It is reasonable to use the 0.5 estimate for ''p'' in this case because presidential races are often close to 50/50, and it is also prudent to use a conservative estimate. The margin of error in this case is 1 percentage point (half of 0.02).
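The calculation above can be sketched in a few lines of Python. This is a minimal illustration under the Wald (normal-approximation) assumptions of this section, not a reference implementation; the function name `sample_size_for_proportion` and its parameters are hypothetical, and the Z-score is taken from the standard library's `statistics.NormalDist` rather than rounded to 1.96.

```python
import math
from statistics import NormalDist


def sample_size_for_proportion(width, confidence=0.95, p=0.5):
    """Smallest n for which the Wald interval for a proportion is at most `width` wide.

    `p` is the planning value for the proportion; 0.5 is the conservative choice.
    """
    # Two-sided critical value Z (about 1.96 for 95% confidence).
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Solve Z * sqrt(p * (1 - p) / n) = width / 2 for n, then round up.
    n = 4 * z**2 * p * (1 - p) / width**2
    return math.ceil(n)


# Presidential-poll example from the text: interval width 0.02, p = 0.5.
print(sample_size_for_proportion(0.02))
# A planning value away from 0.5 gives a smaller requirement.
print(sample_size_for_proportion(0.02, p=0.3))
```

With the conservative value ''p'' = 0.5 the sketch reproduces the 9604 quoted above; supplying a planning value further from 0.5 reduces the requirement, which is why 0.5 is described as conservative.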
In practice, the formula
:<math>\left(\hat p - 1.96\sqrt{\frac{0.25}{n}},\ \hat p + 1.96\sqrt{\frac{0.25}{n}}\right)</math>
is commonly used to form a 95% confidence interval for the true proportion. Approximating 1.96 by 2, the equation
:<math>4\sqrt{\frac{0.25}{n}} = W</math>
can be solved for ''n'', providing a minimum sample size needed to meet the desired interval width ''W''. The foregoing is commonly simplified: ''n'' = 4/''W''<sup>2</sup> = 1/''B''<sup>2</sup>, where ''B'' = ''W''/2 is the error bound on the estimate, i.e., the estimate is usually given as ''within ± B''. For ''B'' = 10% one requires ''n'' = 100, for ''B'' = 5% one needs ''n'' = 400, for ''B'' = 3% the requirement approximates to ''n'' = 1000 (more precisely, about 1111), while for ''B'' = 1% a sample size of ''n'' = 10000 is required. These numbers are quoted often in news reports of
opinion poll
s and other
sample surveys. However, the reported values may not be exact, because sample sizes are rounded up: since ''n'' is the minimum number of
sample points needed to achieve the desired result, the number of respondents must be at or above this minimum.
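The rule of thumb ''n'' = 1/''B''<sup>2</sup> is easy to tabulate. The following is a small sketch; the function name `poll_sample_size` is hypothetical, and the rule itself assumes a 95% interval with the conservative ''p'' = 0.5 and 1.96 approximated by 2.

```python
def poll_sample_size(error_bound):
    """Rule-of-thumb sample size n = 1 / B^2 for an estimate reported as 'within ± B'."""
    return 1 / error_bound**2


# Error bounds quoted in the text, expressed as proportions.
for b in (0.10, 0.05, 0.03, 0.01):
    print(f"B = {b:.0%}: n is about {poll_sample_size(b):.0f}")
```

For ''B'' = 3% the rule gives roughly 1111, which is commonly quoted in rounded form as about 1000.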
Estimation of a mean
As a simple example, suppose we want to estimate the average time it takes for people to commute to work in a city. Instead of surveying the entire population, we can take a random sample of 100 individuals, record their commute times, and then calculate the mean (average) commute time for that sample. For example, person 1 takes 25 minutes, person 2 takes 30 minutes, ..., person 100 takes 20 minutes. Adding up all the commute times and dividing by the number of people in the sample (100 in this case) gives the estimate of the mean commute time for the entire population. This method is practical when it is not feasible to measure everyone in the population, and it provides a reasonable approximation based on a representative sample.
More precisely, when estimating the population mean using an independent and identically distributed (iid) sample of size ''n'', where each data value has variance ''σ''<sup>2</sup>, the
standard error
of the sample mean is:
:<math>\frac{\sigma}{\sqrt{n}}</math>
This expression describes quantitatively how the estimate becomes more precise as the sample size increases. Using the
central limit theorem to justify approximating the sample mean with a normal distribution yields a confidence interval of the form
:<math>\left(\bar x - \frac{Z\sigma}{\sqrt{n}},\ \bar x + \frac{Z\sigma}{\sqrt{n}}\right)</math>,
:where Z is a standard
Z-score
for the desired level of confidence (1.96 for a 95% confidence interval).
To determine the sample size ''n'' required for a confidence interval of width ''W'', with ''W''/2 as the margin of error on each side of the sample mean, the equation
:<math>\frac{Z\sigma}{\sqrt{n}} = W/2</math>
can be solved. This yields the sample size formula for ''n'':
:<math>n = \frac{4Z^2\sigma^2}{W^2}</math>
For instance, if estimating the effect of a drug on blood pressure with a 95% confidence interval that is six units wide, and the known standard deviation of blood pressure in the population is 15, the required sample size would be <math>\frac{4\times 1.96^2\times 15^2}{6^2} = 96.04</math>, which would be rounded up to 97, since sample sizes must be integers and must meet or exceed the calculated ''minimum'' value. Understanding these calculations is essential for researchers designing studies to accurately estimate population means within a desired level of confidence.
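As an illustration, the blood-pressure example can be reproduced with a short Python sketch. It assumes, as in the text, a known population standard deviation and a normal approximation for the sample mean; the function name `sample_size_for_mean` is hypothetical.

```python
import math
from statistics import NormalDist


def sample_size_for_mean(width, sigma, confidence=0.95):
    """Smallest n for which the normal-theory interval for a mean is at most `width` wide."""
    # Two-sided critical value Z (about 1.96 for 95% confidence).
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Solve Z * sigma / sqrt(n) = width / 2 for n, then round up to an integer.
    return math.ceil(4 * z**2 * sigma**2 / width**2)


# Blood-pressure example from the text: interval six units wide, sigma = 15.
print(sample_size_for_mean(width=6, sigma=15))
```

Because the required ''n'' grows with the square of ''σ''/''W'', halving the interval width quadruples the sample size.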
Required sample sizes for hypothesis tests
One of the most common problems faced by statisticians is calculating the sample size required to attain a specified statistical power for a test while maintaining a pre-determined
Type I error
rate ''α'', which is the significance level of the test. The required sample size can be estimated from pre-determined tables for certain values, by formulas, by simulation, by Mead's resource equation, or by the
cumulative distribution function
:
Tables
The table shown on the right can be used in a
two-sample t-test to estimate the sample sizes of an
experimental group and a
control group
that are of equal size, that is, the total number of individuals in the trial is twice that of the number given, and the desired
significance level is 0.05.
[Chapter 13, page 215]
The parameters used are:
*The desired
statistical power
of the trial, shown in the column to the left.
*
Cohen's d (= effect size), which is the expected difference between the
mean
s of the target values between the experimental group and the
control group
, divided by the expected
standard deviation
.
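Tabulated values of this kind can be approximated with a normal-approximation formula. The sketch below is an assumption introduced for illustration and is not the method behind the table referred to above: it uses the standard large-sample approximation ''n'' ≈ 2(''z''<sub>1−α/2</sub> + ''z''<sub>power</sub>)<sup>2</sup>/''d''<sup>2</sup> per group for a two-sided test, and exact ''t''-based tables give slightly larger values. The function name `n_per_group` is hypothetical.

```python
import math
from statistics import NormalDist


def n_per_group(effect_size_d, power=0.80, alpha=0.05):
    """Approximate per-group n for a two-sided, two-sample comparison of means.

    Normal approximation only; exact t-test calculations are slightly larger.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_power = NormalDist().inv_cdf(power)          # quantile matching the target power
    return math.ceil(2 * (z_alpha + z_power) ** 2 / effect_size_d**2)


# Cohen's conventional small, medium and large effect sizes.
for d in (0.2, 0.5, 0.8):
    print(d, n_per_group(d))
```

The total trial size is twice the per-group figure, matching the equal-group design described above.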
Formulas
Calculating a required sample size is often not easy since the distribution of the test statistic under the alternative hypothesis of interest is usually hard to work with. Approximate sample size formulas for specific problems are available in general references on the subject.
A computational approach (QuickSize)
The QuickSize algorithm
is a very general approach that is simple to use yet versatile enough to give an exact solution for a broad range of problems. It uses simulation together with a search algorithm.
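To illustrate the general idea of combining simulation with a search, here is a minimal Monte Carlo sketch. It is not the QuickSize algorithm, whose details are not given here; it is a plain linear search under assumptions chosen for the example: a one-sided z-test of ''H''<sub>0</sub>: ''μ'' = 0 against ''μ'' = ''μ''* with known ''σ''. All function and parameter names are hypothetical.

```python
import math
import random
from statistics import NormalDist, fmean


def simulated_power(n, mu_star, sigma, alpha, n_sims, rng):
    """Fraction of simulated samples drawn under H_a in which the z-test rejects H_0."""
    # One-sided rejection threshold for the sample mean under H_0: mu = 0.
    threshold = NormalDist().inv_cdf(1 - alpha) * sigma / math.sqrt(n)
    rejections = 0
    for _ in range(n_sims):
        sample = [rng.gauss(mu_star, sigma) for _ in range(n)]
        if fmean(sample) > threshold:
            rejections += 1
    return rejections / n_sims


def smallest_n_by_simulation(mu_star, sigma, alpha=0.05, power=0.80,
                             n_sims=4000, seed=0, n_max=1000):
    """Linear search for the first n whose simulated power reaches the target."""
    rng = random.Random(seed)  # fixed seed so that repeated runs agree
    for n in range(2, n_max + 1):
        if simulated_power(n, mu_star, sigma, alpha, n_sims, rng) >= power:
            return n
    raise ValueError("target power not reached within n_max")


print(smallest_n_by_simulation(mu_star=0.5, sigma=1.0))
```

Because the power is estimated from a finite number of simulated samples, the returned ''n'' carries Monte Carlo noise and can differ by one or two from an analytic answer. Increasing `n_sims` reduces that noise at the cost of run time, and a bisection search would be faster than the linear scan used here.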
Mead's resource equation
Mead
's resource equation is often used for estimating sample sizes of
laboratory animal
s, as well as in many other laboratory experiments. It may not be as accurate as using other methods in estimating sample size, but gives a hint of what is the appropriate sample size where parameters such as expected standard deviations or expected differences in values between groups are unknown or very hard to estimate.
[online Page 29]
All the parameters in the equation are in fact the degrees of freedom
of the number of their concepts, and hence, their numbers are subtracted by 1 before insertion into the equation.
The equation is:
:<math>E = N - B - T,</math>
where:
*''N'' is the total number of individuals or units in the study (minus 1)
*''B'' is the ''blocking component'', representing environmental effects allowed for in the design (minus 1)
*''T'' is the ''treatment component'', corresponding to the number of treatment groups (including control group
) being used, or the number of questions being asked (minus 1)
*''E'' is the degrees of freedom of the ''error component'' and should be somewhere between 10 and 20.
For example, if a study using laboratory animals is planned with four treatment groups (''T''=3), with eight animals per group, making 32 animals total (''N''=31), without any further stratification (''B''=0), then ''E'' would equal 28, which is above the cutoff of 20, indicating that sample size may be a bit too large, and six animals per group might be more appropriate.
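The worked example can be checked with a short sketch. The function name `mead_error_df` and its parameters are hypothetical; the function takes raw counts of units and treatment groups, subtracts 1 from each as described above, and takes the blocking component directly as degrees of freedom.

```python
def mead_error_df(total_units, treatment_groups, blocking_df=0):
    """Error degrees of freedom E = N - B - T from Mead's resource equation."""
    n_df = total_units - 1       # N: total units minus 1
    t_df = treatment_groups - 1  # T: treatment groups minus 1
    return n_df - blocking_df - t_df


# Four treatment groups with 8 and then 6 animals per group, no blocking.
for per_group in (8, 6):
    e = mead_error_df(total_units=4 * per_group, treatment_groups=4)
    verdict = "within 10-20" if 10 <= e <= 20 else "outside 10-20"
    print(f"{per_group} per group: E = {e} ({verdict})")
```

With eight animals per group the sketch gives ''E'' = 28, as in the text; with six per group it gives ''E'' = 20, the upper end of the suggested range.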
Cumulative distribution function
Let ''X<sub>i</sub>'', ''i'' = 1, 2, ..., ''n'' be independent observations taken from a normal distribution with unknown mean ''μ'' and known variance ''σ''<sup>2</sup>. Consider two hypotheses, a null hypothesis:
:<math>H_0: \mu = 0</math>
and an alternative hypothesis:
:<math>H_a: \mu = \mu^*</math>
for some 'smallest significant difference' ''μ''* > 0. This is the smallest value for which we care about observing a difference. Now, suppose we wish to (1) reject ''H''<sub>0</sub> with a probability of at least 1 − ''β'' when ''H''<sub>a</sub> is true (i.e. a power of 1 − ''β''), and (2) reject ''H''<sub>0</sub> with probability ''α'' when ''H''<sub>0</sub> is true. If ''z''<sub>''α''</sub> is the upper ''α'' percentage point of the standard normal distribution, then
:<math>\Pr\left(\bar x > z_\alpha \sigma/\sqrt{n} \mid H_0\right) = \alpha</math>
and so
: 'Reject ''H''<sub>0</sub> if our sample average (<math>\bar x</math>) is more than <math>z_\alpha \sigma/\sqrt{n}</math>'
is a decision rule which satisfies (2). (This is a 1-tailed test.) For requirement (1), the rule must reject ''H''<sub>0</sub> with a probability of at least 1 − ''β'' when ''H''<sub>a</sub> is true. In that case, the sample average comes from a normal distribution with mean ''μ''* and variance ''σ''<sup>2</sup>/''n''. Thus, the requirement is expressed as:
:<math>\Pr\left(\bar x > z_\alpha \sigma/\sqrt{n} \mid H_a\right) \geq 1 - \beta</math>
Through careful manipulation, this can be shown (see Statistical power Example) to happen when
:<math>n \geq \left(\frac{z_\alpha + \Phi^{-1}(1-\beta)}{\mu^*/\sigma}\right)^2</math>
where <math>\Phi</math> is the normal cumulative distribution function
.
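The bound on ''n'' can be evaluated numerically. The sketch below assumes the one-sided z-test described in this section with known ''σ''; the names `required_n` and `power` are hypothetical, and the example values (''μ''* = 0.5, ''σ'' = 1, ''α'' = 0.05, ''β'' = 0.2) are chosen only for illustration.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def required_n(mu_star, sigma, alpha=0.05, beta=0.20):
    """Smallest integer n with n >= ((z_alpha + Phi^-1(1 - beta)) / (mu*/sigma))^2."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha)  # upper alpha percentage point
    z_beta = STD_NORMAL.inv_cdf(1 - beta)    # Phi^-1(1 - beta)
    return math.ceil(((z_alpha + z_beta) / (mu_star / sigma)) ** 2)


def power(n, mu_star, sigma, alpha=0.05):
    """Probability that the sample mean exceeds z_alpha * sigma / sqrt(n) under H_a."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha)
    return 1 - STD_NORMAL.cdf(z_alpha - mu_star * math.sqrt(n) / sigma)


n = required_n(mu_star=0.5, sigma=1.0)
print(n, round(power(n, 0.5, 1.0), 3), round(power(n - 1, 0.5, 1.0), 3))
```

Evaluating `power` at ''n'' and at ''n'' − 1 is a simple check that the returned value is the smallest sample size meeting the power target.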
Stratified sample size
With more complicated sampling techniques, such as stratified sampling, the sample can often be split up into sub-samples. Typically, if there are ''H'' such sub-samples (from ''H'' different strata) then each of them will have a sample size ''n<sub>h</sub>'', ''h'' = 1, 2, ..., ''H''. These ''n<sub>h</sub>'' must conform to the rule that ''n''<sub>1</sub> + ''n''<sub>2</sub> + ... + ''n<sub>H</sub>'' = ''n'' (i.e., that the total sample size is given by the sum of the sub-sample sizes). Selecting these ''n<sub>h</sub>'' optimally can be done in various ways, using (for example) Neyman's optimal allocation.
There are many reasons to use stratified sampling: to decrease variances of sample estimates, to use partly non-random methods, or to study strata individually. A useful, partly non-random method would be to sample individuals where easily accessible, but, where not, sample clusters to save travel costs.
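In general, for ''H'' strata, a weighted sample mean is
:<math>\bar x_w = \sum_{h=1}^H W_h \bar x_h,</math>
with
:<math>\operatorname{Var}(\bar x_w) = \sum_{h=1}^H W_h^2 \operatorname{Var}(\bar x_h).</math>
The weights, <math>W_h</math>, frequently, but not always, represent the proportions of the population elements in the strata, and <math>W_h = N_h/N</math>, where <math>N_h</math> is the number of population elements in stratum ''h'' and <math>N</math> is the population size. For a fixed sample size, that is <math>n = \sum n_h</math>,
:<math>\operatorname{Var}(\bar x_w) = \sum_{h=1}^H W_h^2 S_h^2 \left(\frac{1}{n_h} - \frac{1}{N_h}\right),</math>
where <math>S_h</math> is the standard deviation within stratum ''h''. This variance can be made a minimum if the sampling rate within each stratum is made
proportional to the standard deviation within each stratum: <math>n_h/N_h = k S_h</math>, where <math>k</math> is a constant such that <math>\sum n_h = n</math>.
An "optimum allocation" is reached when the sampling rates within the strata
are made directly proportional to the standard deviations within the strata
and inversely proportional to the square root of the sampling cost per element
within the strata, <math>C_h</math>:
:<math>\frac{n_h}{N_h} = \frac{K S_h}{\sqrt{C_h}},</math>
where <math>K</math> is a constant such that <math>\sum n_h = n</math>, or, more generally, when
:<math>n_h = \frac{K' W_h S_h}{\sqrt{C_h}}.</math>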
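The allocation rule can be sketched as follows. This is an illustration only: the function name `optimum_allocation` and the example stratum sizes, standard deviations and costs are hypothetical, and the sketch returns fractional sizes without rounding them or capping them at the stratum sizes, which a practical design would need to do.

```python
import math


def optimum_allocation(n, stratum_sizes, stratum_sds, unit_costs=None):
    """Split a total sample size n so that n_h is proportional to N_h * S_h / sqrt(C_h).

    With equal unit costs this reduces to allocation proportional to N_h * S_h.
    """
    if unit_costs is None:
        unit_costs = [1.0] * len(stratum_sizes)
    scores = [
        size * sd / math.sqrt(cost)
        for size, sd, cost in zip(stratum_sizes, stratum_sds, unit_costs)
    ]
    total = sum(scores)
    # Scale the scores so that the stratum sample sizes sum to n.
    return [n * score / total for score in scores]


# Two strata: the smaller one is more variable, so both receive equal shares.
print(optimum_allocation(100, [1000, 2000], [10, 5]))
# Making the second stratum four times as costly per element shifts sample to the first.
print(optimum_allocation(100, [1000, 2000], [10, 5], unit_costs=[1, 4]))
```

The scaling step plays the role of the constant in the formulas above: it is fixed by the requirement that the stratum sample sizes sum to ''n''.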
Qualitative research
Qualitative research approaches sample size determination differently from quantitative methods. Rather than relying on predetermined formulas or statistical calculations, it generally involves a subjective, iterative judgment made as the research proceeds.
One common approach is to continually include additional participants or materials until a point of "saturation" is reached. Saturation occurs when new participants or data cease to provide fresh insights, indicating that the study has adequately captured the diversity of perspectives or experiences within the chosen sample. The number needed to reach saturation has been investigated empirically.
Unlike quantitative research
, qualitative studies face a scarcity of reliable guidance regarding sample size estimation prior to beginning the research.
For example, when conducting in-depth interviews with cancer survivors, qualitative researchers may use data saturation to determine the appropriate sample size. If, over a number of interviews, no fresh themes or insights emerge, saturation has been reached and more interviews might not add much to our knowledge of the survivors' experience. Thus, rather than following a preset statistical formula, the concept of attaining saturation serves as a dynamic guide for determining sample size in qualitative research. The suggestions that have been given for estimating sample sizes in advance vary widely. In an effort to introduce some structure to the sample size determination process in qualitative research, a tool analogous to quantitative power calculations has been proposed. This tool, based on the negative binomial distribution
, is particularly tailored for thematic analysis
.[Galvin R (2015). How many interviews are enough? Do qualitative interviews in building energy consumption research produce reliable knowledge? Journal of Building Engineering, 1:2–12.]
See also
*Design of experiments
*Engineering response surface example under Stepwise regression
* Cohen's h
*Receiver operating characteristic
References
General references
*Rens van de Schoot, Milica Miočević (eds.). 2020. Small Sample Size Solutions (Open Access): A Guide for Applied Researchers and Practitioners. Routledge.
Further reading
NIST: Selecting Sample Sizes
* ASTM
E122-07: Standard Practice for Calculating Sample Size to Estimate, With Specified Precision, the Average for a Characteristic of a Lot or Process
External links
A MATLAB script implementing Cochran's sample size formula