In statistics, a sampling distribution or finite-sample distribution is the probability distribution of a given random-sample-based statistic. If an arbitrarily large number of samples, each involving multiple observations (data points), were each used to compute one value of a statistic (for example, the sample mean or sample variance), then the sampling distribution is the probability distribution of the values that the statistic takes on. In many contexts only one sample is observed, but the sampling distribution can be found theoretically.

Sampling distributions are important in statistics because they provide a major simplification en route to statistical inference. More specifically, they allow analytical considerations to be based on the probability distribution of a statistic, rather than on the joint probability distribution of all the individual sample values.


Introduction

The sampling distribution of a statistic is the distribution of that statistic, considered as a random variable, when derived from a random sample of size n. It may be considered as the distribution of the statistic for ''all possible samples from the same population'' of a given sample size. The sampling distribution depends on the underlying distribution of the population, the statistic being considered, the sampling procedure employed, and the sample size used. There is often considerable interest in whether the sampling distribution can be approximated by an asymptotic distribution, which corresponds to the limiting case as the sample size n tends to infinity.

For example, consider a normal population with mean \mu and variance \sigma^2. Assume we repeatedly take samples of a given size n from this population and calculate the arithmetic mean \bar x for each sample. This statistic is called the sample mean. The distribution of these means, or averages, is called the "sampling distribution of the sample mean". This distribution is normal, \mathcal{N}(\mu, \sigma^2/n), because the underlying population is normal, although sampling distributions are often close to normal even when the population distribution is not (see central limit theorem). An alternative to the sample mean is the sample median. When calculated from the same population, it has a different sampling distribution from that of the mean and is generally not normal (but it may be close to normal for large sample sizes).

The mean of a sample from a population having a normal distribution is an example of a simple statistic taken from one of the simplest statistical populations. For other statistics and other populations the formulas are more complicated, and often do not exist in closed form. In such cases the sampling distributions may be approximated through Monte Carlo simulations, bootstrap methods, or asymptotic distribution theory.
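The Monte Carlo approach mentioned above can be illustrated with a minimal sketch. It assumes a normal population with arbitrarily chosen parameters (mean 10, standard deviation 2) and sample size 25. The function name `simulate_sampling_distribution` and all parameter values are illustrative assumptions, not part of any library. The sketch draws many samples, computes the mean and the median of each, and compares the spread of the simulated means with the theoretical value \sigma/\sqrt{n}.

```python
import random
import statistics


def simulate_sampling_distribution(statistic, n, num_samples,
                                   mu=0.0, sigma=1.0, seed=0):
    """Hypothetical helper: draw `num_samples` samples of size `n` from a
    normal population N(mu, sigma^2) and return the value of `statistic`
    computed on each sample."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    return [
        statistic([rng.gauss(mu, sigma) for _ in range(n)])
        for _ in range(num_samples)
    ]


# Illustrative (assumed) population parameters and sample size.
mu, sigma, n = 10.0, 2.0, 25
num_samples = 5000

# The same seed is used for both calls, so the mean and the median
# are computed on the same simulated samples.
means = simulate_sampling_distribution(statistics.fmean, n, num_samples, mu, sigma)
medians = simulate_sampling_distribution(statistics.median, n, num_samples, mu, sigma)

print("mean of simulated sample means:  ", statistics.fmean(means))
print("sd of simulated sample means:    ", statistics.stdev(means))
print("theoretical sd, sigma / sqrt(n): ", sigma / n ** 0.5)
print("sd of simulated sample medians:  ", statistics.stdev(medians))
```

Under these assumptions the simulated means should cluster around \mu with a spread close to \sigma/\sqrt{n}, while the medians, computed on the same samples, should show a somewhat wider spread. A histogram of either list gives an empirical picture of the corresponding sampling distribution.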


Standard error

The standard deviation of the sampling distribution of a statistic is referred to as the standard error of that quantity. For the case where the statistic is the sample mean, and the observations in the sample are uncorrelated, the standard error is:

\sigma_{\bar x} = \frac{\sigma}{\sqrt{n}}

where \sigma is the standard deviation of the population distribution of that quantity and n is the sample size (number of items in the sample). An important implication of this formula is that the sample size must be quadrupled (multiplied by 4) to halve the standard error. When designing statistical studies where cost is a factor, this may have a role in understanding cost–benefit tradeoffs.

For the case where the statistic is the sample total, and the observations in the sample are uncorrelated, the standard error is:

\sigma_{\Sigma x} = \sigma\sqrt{n}

where, again, \sigma is the standard deviation of the population distribution of that quantity and n is the sample size (number of items in the sample).
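The two standard-error formulas can be checked numerically with a short sketch. The function names and the population standard deviation of 12 are illustrative assumptions. The loop quadruples the sample size at each step to show the standard error of the mean halving while the standard error of the total doubles.

```python
import math


def standard_error_of_mean(sigma, n):
    """Standard error of the sample mean for uncorrelated observations:
    sigma / sqrt(n). Hypothetical helper name."""
    return sigma / math.sqrt(n)


def standard_error_of_total(sigma, n):
    """Standard error of the sample total for uncorrelated observations:
    sigma * sqrt(n). Hypothetical helper name."""
    return sigma * math.sqrt(n)


sigma = 12.0  # assumed population standard deviation, for illustration only

# Each step multiplies the sample size by 4.
for n in (25, 100, 400):
    se_mean = standard_error_of_mean(sigma, n)
    se_total = standard_error_of_total(sigma, n)
    print(f"n = {n:3d}   SE(mean) = {se_mean:.2f}   SE(total) = {se_total:.2f}")
```

The trade-off is visible in the square root: each halving of the standard error of the mean costs four times as many observations.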


Examples



