statistics Statistics (from German language, German: ', "description of a State (polity), state, a country") is the discipline that concerns the collection, organization, analysis, interpretation, and presentation of data. In applying statistics to a s ...

, a sampling distribution or finite-sample distribution is the

probability distribution In probability theory and statistics, a probability distribution is a Function (mathematics), function that gives the probabilities of occurrence of possible events for an Experiment (probability theory), experiment. It is a mathematical descri ...

of a given random-sample-based

statistic A statistic (singular) or sample statistic is any quantity computed from values in a sample which is considered for a statistical purpose. Statistical purposes include estimating a population parameter, describing a sample, or evaluating a hypot ...

. For an arbitrarily large number of samples where each sample, involving multiple observations (data points), is separately used to compute one value of a statistic (for example, the sample mean or sample

variance In probability theory and statistics, variance is the expected value of the squared deviation from the mean of a random variable. The standard deviation (SD) is obtained as the square root of the variance. Variance is a measure of dispersion ...

) per sample, the sampling distribution is the probability distribution of the values that the statistic takes on. In many contexts, only one sample (i.e., a set of observations) is observed, but the sampling distribution can be found theoretically. Sampling distributions are important in statistics because they provide a major simplification en route to

statistical inference Statistical inference is the process of using data analysis to infer properties of an underlying probability distribution.Upton, G., Cook, I. (2008) ''Oxford Dictionary of Statistics'', OUP. . Inferential statistical analysis infers properties of ...

. More specifically, they allow analytical considerations to be based on the probability distribution of a statistic, rather than on the

joint probability distribution A joint or articulation (or articular surface) is the connection made between bones, ossicles, or other hard structures in the body which link an animal's skeletal system into a functional whole.Saladin, Ken. Anatomy & Physiology. 7th ed. McGraw- ...

of all the individual sample values.

Introduction

The sampling distribution of a statistic is the distribution of that statistic, considered as a

random variable A random variable (also called random quantity, aleatory variable, or stochastic variable) is a Mathematics, mathematical formalization of a quantity or object which depends on randomness, random events. The term 'random variable' in its mathema ...

, when derived from a random sample of size

n

. It may be considered as the distribution of the statistic for ''all possible samples from the same population'' of a given sample size. The sampling distribution depends on the underlying distribution of the population, the statistic being considered, the sampling procedure employed, and the sample size used. There is often considerable interest in whether the sampling distribution can be approximated by an asymptotic distribution, which corresponds to the limiting case either as the number of random samples of finite size, taken from an infinite population and used to produce the distribution, tends to infinity, or when just one equally-infinite-size "sample" is taken of that same population. For example, consider a normal population with mean

\mu

and variance

\sigma^2

. Assume we repeatedly take samples of a given size from this population and calculate the

arithmetic mean In mathematics and statistics, the arithmetic mean ( ), arithmetic average, or just the ''mean'' or ''average'' is the sum of a collection of numbers divided by the count of numbers in the collection. The collection is often a set of results fr ...

\bar x

for each sample – this statistic is called the sample mean. The distribution of these means, or averages, is called the "sampling distribution of the sample mean". This distribution is normal

\mathcal(\mu, \sigma^2/n)

(''n'' is the sample size) since the underlying population is normal, although sampling distributions may also often be close to normal even when the population distribution is not (see central limit theorem). An alternative to the sample mean is the sample

median The median of a set of numbers is the value separating the higher half from the lower half of a Sample (statistics), data sample, a statistical population, population, or a probability distribution. For a data set, it may be thought of as the “ ...

. When calculated from the same population, it has a different sampling distribution to that of the mean and is generally not normal (but it may be close for large sample sizes). The mean of a sample from a population having a normal distribution is an example of a simple statistic taken from one of the simplest

statistical population In statistics, a population is a set of similar items or events which is of interest for some question or experiment. A statistical population can be a group of existing objects (e.g. the set of all stars within the Milky Way galaxy) or a hyp ...

s. For other statistics and other populations the formulas are more complicated, and often they do not exist in closed-form. In such cases the sampling distributions may be approximated through Monte-Carlo simulations, bootstrap methods, or asymptotic distribution theory.

Standard error

The

standard deviation In statistics, the standard deviation is a measure of the amount of variation of the values of a variable about its Expected value, mean. A low standard Deviation (statistics), deviation indicates that the values tend to be close to the mean ( ...

of the sampling distribution of a

is referred to as the standard error of the statistic. For the case where the statistic is the sample mean, and samples are uncorrelated, the standard error is:

\sigma_ = \frac

where

\sigma

is the standard deviation of the population distribution of that quantity and

n

is the sample size (number of items in the sample). An important implication of this formula is that the sample size must be quadrupled (multiplied by 4) to achieve half (1/2) the measurement error. When designing statistical studies where cost is a factor, this may have a role in understanding cost–benefit tradeoffs. For the case where the statistic is the sample total, and samples are uncorrelated, the standard error is:

\sigma_ = \sigma\sqrt

where, again,

\sigma

is the standard deviation of the population distribution of that quantity and

n

is the sample size (number of items in the sample).

Examples

References

* Merberg, A. and S.J. Miller (2008)
"The Sample Distribution of the Median"
''Course Notes for Math 162: Mathematical Statistics'', pgs 1–9.

External links

''Mathematica'' demonstration showing the sampling distribution of various statistics (e.g. Σ''x''²) for a normal population
{{Authority control Statistical inference Sampling (statistics)