A tolerance interval (TI) is a statistical interval within which, with some confidence level, a specified proportion of a sampled population falls. "More specifically, a tolerance interval provides limits within which at least a certain proportion (''p'') of the population falls with a given level of confidence (1−α)." "A (''p'', 1−α) tolerance interval (TI) based on a sample is constructed so that it would include at least a proportion ''p'' of the sampled population with confidence 1−α; such a TI is usually referred to as p-content − (1−α) coverage TI." (Krishnamoorthy, K. and Lian, Xiaodong (2011), "Closed-form approximate tolerance intervals for some general linear models and comparison studies", Journal of Statistical Computation and Simulation, first published 13 June 2011.) "A (''p'', 1−α) upper tolerance limit (TL) is simply a 1−α upper confidence limit for the 100 ''p'' percentile of the population."

Definition

Assume observations or random variates $\mathbf{x}=(x_1,\ldots,x_n)$ are realizations of independent random variables $\mathbf{X}=(X_1,\ldots,X_n)$ which have a common distribution $F_\theta$, with unknown parameter $\theta$. Then a tolerance interval with endpoints $(L(\mathbf{x}), U(\mathbf{x})]$ has the defining property:

$\inf_\theta\{\Pr_\theta\left(F_\theta(U(\mathbf{X})) - F_\theta(L(\mathbf{X})) \ge p\right)\} = 1-\alpha$

where $\inf\{\cdot\}$ denotes the infimum, $p$ is the proportion of the population to be covered, and $100(1-\alpha)\%$ is the confidence level. This is in contrast to a prediction interval with endpoints $[l(\mathbf{x}), u(\mathbf{x})]$, which has the defining property:

$\inf_\theta\{\Pr_\theta\left(X_0 \in [l(\mathbf{X}), u(\mathbf{X})]\right)\} = 1-\alpha$.

Here, $X_0$ is a random variable from the same distribution $F_\theta$ but independent of the first $n$ variables.

Notice that $X_0$ is not involved in the definition of a tolerance interval, which deals only with the first sample, of size ''n''.

Calculation

One-sided normal tolerance intervals have an exact solution in terms of the sample mean and sample variance based on the noncentral ''t''-distribution. Two-sided normal tolerance intervals can be estimated using the chi-squared distribution.
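As a rough illustration of the chi-squared route for the two-sided case, the sketch below computes an approximate tolerance factor ''k'' such that the interval "sample mean ± ''k'' × sample standard deviation" is intended to cover at least a proportion ''p'' of a normal population with confidence γ. It uses a commonly quoted approximation usually attributed to Howe, which is one of several possibilities and is not necessarily the method meant in the text above. To stay within the Python standard library, the chi-squared quantile is itself approximated with the Wilson-Hilferty formula, which is an additional assumption made here. The function names `chi2_quantile_wh` and `two_sided_k_factor` are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, used for z quantiles


def chi2_quantile_wh(q, dof):
    """Approximate q-quantile of a chi-squared distribution with `dof`
    degrees of freedom (Wilson-Hilferty approximation, not exact)."""
    z = _STD_NORMAL.inv_cdf(q)
    c = 2.0 / (9.0 * dof)
    return dof * (1.0 - c + z * sqrt(c)) ** 3


def two_sided_k_factor(n, p, gamma):
    """Approximate two-sided normal tolerance factor (Howe-style formula).

    n     : sample size
    p     : proportion of the population to cover
    gamma : confidence level
    """
    dof = n - 1
    z = _STD_NORMAL.inv_cdf((1.0 + p) / 2.0)       # e.g. about 1.96 for p = 0.95
    chi2_low = chi2_quantile_wh(1.0 - gamma, dof)  # lower-tail chi-squared quantile
    return z * sqrt(dof * (1.0 + 1.0 / n) / chi2_low)


if __name__ == "__main__":
    # The factor shrinks toward the plain z value as the sample grows.
    for n in (15, 50, 1000):
        print(n, round(two_sided_k_factor(n, p=0.95, gamma=0.95), 3))
```

For small samples the factor is noticeably larger than the corresponding z value, and it decreases toward that value as ''n'' grows. For work that needs exact factors, published tables or a statistics library with noncentral ''t'' and chi-squared quantile functions would be the safer choice.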

Relation to other intervals

"In the parameters-known case, a 95% tolerance interval and a 95% prediction interval are the same." If we knew a population's exact parameters, we would be able to compute a range within which a certain proportion of the population falls. For example, if we know a population is normally distributed with mean $\mu$ and standard deviation $\sigma$, then the interval $\mu \pm 1.96\sigma$ includes 95% of the population (1.96 is the z-score for 95% coverage of a normally distributed population).

However, if we have only a sample from the population, we know only the sample mean $\hat{\mu}$ and sample standard deviation $\hat{\sigma}$, which are only estimates of the true parameters. In that case, $\hat{\mu} \pm 1.96\hat{\sigma}$ will not necessarily include 95% of the population, due to variance in these estimates. A tolerance interval bounds this variance by introducing a confidence level $\gamma$, which is the confidence with which this interval actually includes the specified proportion of the population. For a normally distributed population, a z-score can be transformed into a "''k'' factor" or tolerance factor for a given $\gamma$ via lookup tables or several approximation formulas. "As the degrees of freedom approach infinity, the prediction and tolerance intervals become equal."

The tolerance interval is less widely known than the confidence interval and prediction interval, a situation some educators have lamented, as it can lead to misuse of the other intervals where a tolerance interval is more appropriate.

The tolerance interval differs from a confidence interval in that the confidence interval bounds a single-valued population parameter (the mean or the variance, for example) with some confidence, while the tolerance interval bounds the range of data values that includes a specific proportion of the population. Whereas a confidence interval's size is entirely due to sampling error, and will approach a zero-width interval at the true population parameter as sample size increases, a tolerance interval's size is due partly to sampling error and partly to actual variance in the population, and will approach the population's probability interval as sample size increases.

The tolerance interval is related to a prediction interval in that both put bounds on variation in future samples. However, the prediction interval only bounds a single future sample, whereas a tolerance interval bounds the entire population (equivalently, an arbitrary sequence of future samples). In other words, a prediction interval covers a specified proportion of a population ''on average'', whereas a tolerance interval covers it ''with a certain confidence level'', making the tolerance interval more appropriate if a single interval is intended to bound multiple future samples.
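The claim that a plug-in interval of the form "sample mean ± 1.96 × sample standard deviation" need not contain 95% of the population can be checked with a small simulation. The sketch below is illustrative only: it assumes a standard normal population, a sample size of 15, and an approximate tolerance factor (a Howe-style formula with a Wilson-Hilferty chi-squared quantile, both chosen here for convenience). The names `approx_k_factor` and `success_rate` are hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist, mean, stdev

STD_NORMAL = NormalDist()  # assumed "true" population: N(0, 1)


def approx_k_factor(n, p, gamma):
    """Approximate two-sided normal tolerance factor (Howe-style formula,
    with a Wilson-Hilferty chi-squared quantile); illustrative only."""
    dof = n - 1
    c = 2.0 / (9.0 * dof)
    chi2_low = dof * (1.0 - c + STD_NORMAL.inv_cdf(1.0 - gamma) * sqrt(c)) ** 3
    z = STD_NORMAL.inv_cdf((1.0 + p) / 2.0)
    return z * sqrt(dof * (1.0 + 1.0 / n) / chi2_low)


def success_rate(multiplier, n, p, trials, seed):
    """Fraction of simulated samples for which the interval
    mean +/- multiplier * s contains at least a proportion p of N(0, 1)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        centre, spread = mean(sample), stdev(sample)
        lo, hi = centre - multiplier * spread, centre + multiplier * spread
        # True population content of this particular interval.
        content = STD_NORMAL.cdf(hi) - STD_NORMAL.cdf(lo)
        hits += content >= p
    return hits / trials


if __name__ == "__main__":
    n, p, gamma = 15, 0.95, 0.95
    z = STD_NORMAL.inv_cdf((1.0 + p) / 2.0)  # about 1.96
    k = approx_k_factor(n, p, gamma)
    print("plug-in z interval:", success_rate(z, n, p, trials=5000, seed=1))
    print("tolerance interval:", success_rate(k, n, p, trials=5000, seed=1))
```

Under these assumptions the plug-in interval should reach the 95% content target in well under half of the simulated samples, while the interval built with the tolerance factor should do so in roughly a fraction γ of them, which is what the confidence level of a tolerance interval is meant to express.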

Examples

One source gives the following example:

So consider once again a proverbial EPA mileage test scenario, in which several nominally identical autos of a particular model are tested to produce mileage figures $y_1, y_2, \ldots, y_n$. If such data are processed to produce a 95% confidence interval for the mean mileage of the model, it is, for example, possible to use it to project the mean or total gasoline consumption for the manufactured fleet of such autos over their first 5,000 miles of use. Such an interval would, however, not be of much help to a person renting one of these cars and wondering whether the (full) 10-gallon tank of gas will suffice to carry him the 350 miles to his destination. For that job, a prediction interval would be much more useful. (Consider the differing implications of being "95% sure" that $\mu \ge 35$ as opposed to being "95% sure" that $y_{n+1} \ge 35$.) But neither a confidence interval for $\mu$ nor a prediction interval for a single additional mileage is exactly what is needed by a design engineer charged with determining how large a gas tank the model really needs to guarantee that 99% of the autos produced will have a 400-mile cruising range. What the engineer really needs is a tolerance interval for a fraction $p = .99$ of mileages of such autos.

Another example is the following:

The air lead levels were collected from $n=15$ different areas within the facility. It was noted that the log-transformed lead levels fitted a normal distribution well (that is, the data are from a lognormal distribution). Let $\mu$ and $\sigma^2$, respectively, denote the population mean and variance for the log-transformed data. If $X$ denotes the corresponding random variable, we thus have $X \sim \mathcal{N}(\mu, \sigma^2)$. We note that $\exp(\mu)$ is the median air lead level. A confidence interval for $\mu$ can be constructed the usual way, based on the ''t''-distribution; this in turn will provide a confidence interval for the median air lead level. If $\bar{X}$ and $S$ denote the sample mean and standard deviation of the log-transformed data for a sample of size $n$, a 95% confidence interval for $\mu$ is given by $\bar{X} \pm t_{n-1,0.975} S / \sqrt{n}$, where $t_{m,1-\alpha}$ denotes the $1-\alpha$ quantile of a ''t''-distribution with $m$ degrees of freedom. It may also be of interest to derive a 95% upper confidence bound for the median air lead level. Such a bound for $\mu$ is given by $\bar{X} + t_{n-1,0.95} S / \sqrt{n}$. Consequently, a 95% upper confidence bound for the median air lead is given by $\exp\left(\bar{X} + t_{n-1,0.95} S / \sqrt{n}\right)$.

Now suppose we want to predict the air lead level at a particular area within the laboratory. A 95% upper prediction limit for the log-transformed lead level is given by $\bar{X} + t_{n-1,0.95} S \sqrt{1+1/n}$. A two-sided prediction interval can be similarly computed. The meaning and interpretation of these intervals are well known. For example, if the confidence interval $\bar{X} \pm t_{n-1,0.975} S / \sqrt{n}$ is computed repeatedly from independent samples, 95% of the intervals so computed will include the true value of $\mu$, in the long run. In other words, the interval is meant to provide information concerning the parameter $\mu$ only. A prediction interval has a similar interpretation, and is meant to provide information concerning a single lead level only.

Now suppose we want to use the sample to conclude whether or not at least 95% of the population lead levels are below a threshold. The confidence interval and prediction interval cannot answer this question, since the confidence interval is only for the median lead level, and the prediction interval is only for a single lead level. What is required is a tolerance interval; more specifically, an upper tolerance limit. The upper tolerance limit is to be computed subject to the condition that at least 95% of the population lead levels are below the limit, with a certain confidence level, say 99%.
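For comparison with the confidence and prediction limits in this example, the upper tolerance limit for the log-transformed data can be written in the same form, as a sketch. It assumes normality of the log-transformed lead levels and uses the standard one-sided result based on the noncentral ''t''-distribution. The symbols $k_1$, $z_{0.95}$ and $t_{m,q}(\delta)$ are notation introduced here for illustration and do not appear in the quoted source.

```latex
% (p, gamma) = (0.95, 0.99) upper tolerance limit for the log-transformed data:
%   z_{0.95}          : 0.95 quantile of the standard normal distribution
%   t_{m,q}(\delta)   : q quantile of a noncentral t-distribution with
%                       m degrees of freedom and noncentrality parameter \delta
\bar{X} + k_1 S,
\qquad
k_1 = \frac{1}{\sqrt{n}}\, t_{n-1,\,0.99}\!\left(z_{0.95}\sqrt{n}\right)

% Back on the original (untransformed) scale of the lead levels:
\exp\!\left(\bar{X} + k_1 S\right)
```

The structure mirrors the other limits: the confidence bound scales $S$ by $t_{n-1,0.95}/\sqrt{n}$, the prediction limit by $t_{n-1,0.95}\sqrt{1+1/n}$, and the tolerance limit by the factor $k_1$, which depends on both the content 0.95 and the confidence level 0.99.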
