In statistical inference, specifically predictive inference, a prediction interval is an estimate of an interval in which a future observation will fall, with a certain probability, given what has already been observed. Prediction intervals are often used in regression analysis.

Prediction intervals are used in both frequentist statistics and Bayesian statistics: a prediction interval bears the same relationship to a future observation that a frequentist confidence interval or Bayesian credible interval bears to an unobservable population parameter. Prediction intervals predict the distribution of individual future points, whereas confidence intervals and credible intervals of parameters predict the distribution of estimates of the true population mean or other quantity of interest that cannot be observed.


Introduction

For example, if one makes the parametric assumption that the underlying distribution is a normal distribution, and has a sample set {''X''1, ..., ''X''''n''}, then confidence intervals and credible intervals may be used to estimate the population mean ''μ'' and population standard deviation ''σ'' of the underlying population, while prediction intervals may be used to estimate the value of the next sample variable, ''X''''n''+1. Alternatively, in Bayesian terms, a prediction interval can be described as a credible interval for the variable itself, rather than for a parameter of the distribution thereof.

The concept of prediction intervals need not be restricted to inference about a single future sample value but can be extended to more complicated cases. For example, in the context of river flooding, where analyses are often based on annual values of the largest flow within the year, there may be interest in making inferences about the largest flood likely to be experienced within the next 50 years.

Since prediction intervals are only concerned with past and future observations, rather than unobservable population parameters, they are advocated as a better method than confidence intervals by some statisticians, such as Seymour Geisser, following the focus on observables by Bruno de Finetti.


Normal distribution

Given a sample from a normal distribution, whose parameters are unknown, it is possible to give prediction intervals in the frequentist sense, i.e., an interval [''a'', ''b''] based on statistics of the sample such that on repeated experiments, ''X''''n''+1 falls in the interval the desired percentage of the time; one may call these "predictive confidence intervals".

A general technique of frequentist prediction intervals is to find and compute a pivotal quantity of the observables ''X''1, ..., ''X''''n'', ''X''''n''+1 – meaning a function of observables and parameters whose probability distribution does not depend on the parameters – that can be inverted to give a probability of the future observation ''X''''n''+1 falling in some interval computed in terms of the observed values so far, X_1,\dots,X_n. Such a pivotal quantity, depending only on observables, is called an ancillary statistic. The usual method of constructing pivotal quantities is to take the difference of two variables that depend on location, so that location cancels out, and then take the ratio of two variables that depend on scale, so that scale cancels out. The most familiar pivotal quantity is the Student's t-statistic, which can be derived by this method and is used in the sequel.


Known mean, known variance

A prediction interval [''ℓ'', ''u''] for a future observation ''X'' in a normal distribution ''N''(''µ'', ''σ''2) with known mean and variance may be calculated from

:\gamma = P(\ell < X < u) = P\left(\frac{\ell-\mu}{\sigma} < Z < \frac{u-\mu}{\sigma}\right),

where Z = \frac{X-\mu}{\sigma}, the standard score of ''X'', is distributed as standard normal. Hence

:\frac{\ell-\mu}{\sigma} = -z, \quad \frac{u-\mu}{\sigma} = z,

or

:\ell = \mu - z\sigma, \quad u = \mu + z\sigma,

with ''z'' the quantile in the standard normal distribution for which

:\gamma = P(-z < Z < z),

or equivalently

:\tfrac{1}{2}(1-\gamma) = P(Z > z).

The prediction interval is conventionally written as

:\left[\mu - z\sigma,\ \mu + z\sigma\right].

For example, to calculate the 95% prediction interval for a normal distribution with a mean (''µ'') of 5 and a standard deviation (''σ'') of 1, ''z'' is approximately 2. Therefore, the lower limit of the prediction interval is approximately 5 − (2·1) = 3, and the upper limit is approximately 5 + (2·1) = 7, giving a prediction interval of approximately 3 to 7.
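
In code, this is a one-liner around the standard normal quantile. The following is a minimal sketch (the function name is illustrative, and NumPy/SciPy are assumed to be available); the arguments mirror the worked example above:

```python
# Minimal sketch of the interval [mu - z*sigma, mu + z*sigma] with known
# mean and variance; mu = 5, sigma = 1, gamma = 0.95 mirror the example.
from scipy.stats import norm

def normal_prediction_interval(mu, sigma, gamma=0.95):
    """Two-sided interval with P(lower < X < upper) = gamma for X ~ N(mu, sigma^2)."""
    z = norm.ppf((1 + gamma) / 2)  # quantile z with P(-z < Z < z) = gamma
    return mu - z * sigma, mu + z * sigma

print(normal_prediction_interval(5, 1))  # ~ (3.04, 6.96), i.e. roughly 3 to 7
```

Note that the exact 97.5% quantile is about 1.96, which the worked example rounds to 2.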


Estimation of parameters

For a distribution with unknown parameters, a direct approach to prediction is to estimate the parameters and then use the associated quantile function – for example, one could use the sample mean \overline{X} as an estimate for ''μ'' and the sample variance ''s''2 as an estimate for ''σ''2. Note that there are two natural choices for ''s''2 here – dividing by ''n'' − 1 yields an unbiased estimate, while dividing by ''n'' yields the maximum likelihood estimator – and either might be used. One then uses the quantile function with these estimated parameters \Phi^{-1}_{\overline{X},s^2} to give a prediction interval.

This approach is usable, but the resulting interval will not have the repeated sampling interpretation – it is not a predictive confidence interval.

For the sequel, use the sample mean

:\overline{X} = \overline{X}_n = (X_1+\cdots+X_n)/n,

and the (unbiased) sample variance

:s^2 = s_n^2 = \frac{1}{n-1}\sum_{i=1}^n (X_i-\overline{X}_n)^2.


Unknown mean, known variance

Given a normal distribution with unknown mean ''μ'' but known variance 1, the sample mean \overline{X} of the observations X_1,\dots,X_n has distribution N(\mu,1/n), while the future observation X_{n+1} has distribution N(\mu,1). Taking the difference of these cancels the ''μ'' and yields a normal distribution of variance 1 + (1/n), thus

:\frac{X_{n+1}-\overline{X}}{\sqrt{1+(1/n)}} \sim N(0,1).

Solving for X_{n+1} gives the prediction distribution N(\overline{X}, 1+(1/n)), from which one can compute intervals as before. This is a predictive confidence interval in the sense that if one uses a quantile range of 100''p''%, then on repeated applications of this computation, the future observation X_{n+1} will fall in the predicted interval 100''p''% of the time. Notice that this prediction distribution is more conservative than using the estimated mean \overline{X} and known variance 1, as it uses variance 1 + (1/n), hence yields wider intervals. This is necessary for the desired confidence interval property to hold.
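
A sketch of the resulting interval (illustrative names; NumPy/SciPy assumed), written for a general known variance ''σ''2, with the text's case being ''σ'' = 1:

```python
# Predictive interval with unknown mean and known variance sigma^2:
# the predictive distribution is N(x_bar, sigma^2 * (1 + 1/n)), so the
# half-width carries the extra sqrt(1 + 1/n) factor discussed above.
import numpy as np
from scipy.stats import norm

def pred_interval_known_variance(x, sigma=1.0, gamma=0.95):
    x = np.asarray(x, dtype=float)
    n = len(x)
    z = norm.ppf((1 + gamma) / 2)
    half_width = z * sigma * np.sqrt(1 + 1 / n)  # wider than z * sigma alone
    return x.mean() - half_width, x.mean() + half_width
```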


Known mean, unknown variance

Conversely, given a normal distribution with known mean 0 but unknown variance \sigma^2, the sample variance s^2 of the observations X_1,\dots,X_n has, up to scale, a \chi^2_{n-1} distribution; more precisely,

:\frac{(n-1)s^2}{\sigma^2} \sim \chi^2_{n-1},

while the future observation X_{n+1} has distribution N(0,\sigma^2). Taking the ratio of the future observation and the sample standard deviation cancels the ''σ'', yielding a Student's t-distribution with ''n'' − 1 degrees of freedom:

:\frac{X_{n+1}}{s} \sim T^{n-1}.

Solving for X_{n+1} gives the prediction distribution s \cdot T^{n-1}, from which one can compute intervals as before. Notice that this prediction distribution is more conservative than using a normal distribution with the estimated standard deviation s and known mean 0, as it uses the t-distribution instead of the normal distribution, hence yields wider intervals. This is necessary for the desired confidence interval property to hold.
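
A corresponding sketch (illustrative names; NumPy/SciPy assumed), with the known mean taken to be 0 as in the text:

```python
# Predictive interval with known mean 0 and unknown variance:
# X_{n+1} / s follows Student's t with n - 1 degrees of freedom, so the
# interval is [-t*s, +t*s] around the known mean 0.
import numpy as np
from scipy.stats import t

def pred_interval_known_mean(x, gamma=0.95):
    x = np.asarray(x, dtype=float)
    n = len(x)
    s = x.std(ddof=1)  # sample standard deviation s_n as defined above
    t_q = t.ppf((1 + gamma) / 2, df=n - 1)
    return -t_q * s, t_q * s
```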


Unknown mean, unknown variance

Combining the above for a normal distribution N(\mu,\sigma^2) with both ''μ'' and ''σ''2 unknown yields the following ancillary statistic:

:\frac{X_{n+1}-\overline{X}_n}{s_n\sqrt{1+1/n}} \sim T^{n-1}.

This simple combination is possible because the sample mean and sample variance of the normal distribution are independent statistics; this is only true for the normal distribution, and in fact characterizes the normal distribution.

Solving for X_{n+1} yields the prediction distribution

:\overline{X}_n + s_n\sqrt{1+1/n} \cdot T^{n-1}.

The probability of X_{n+1} falling in a given interval is then

:\Pr\left(\overline{X}_n - T_a s_n\sqrt{1+1/n} \leq X_{n+1} \leq \overline{X}_n + T_a s_n\sqrt{1+1/n}\right) = 1 - p,

where ''T''''a'' is the 100(1 − ''p''/2)th percentile of Student's t-distribution with ''n'' − 1 degrees of freedom. Therefore, the numbers

:\overline{X}_n \pm T_a s_n\sqrt{1+1/n}

are the endpoints of a 100(1 − ''p'')% prediction interval for X_{n+1}.
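
A sketch of this interval in code (illustrative names; NumPy/SciPy assumed), parameterized by the error probability ''p'' as in the formula above:

```python
# 100*(1 - p)% prediction interval for X_{n+1} from a normal sample with
# both mean and variance unknown: x_bar +/- T_a * s_n * sqrt(1 + 1/n).
import numpy as np
from scipy.stats import t

def prediction_interval(x, p=0.05):
    x = np.asarray(x, dtype=float)
    n = len(x)
    x_bar, s = x.mean(), x.std(ddof=1)
    t_a = t.ppf(1 - p / 2, df=n - 1)  # 100*(1 - p/2)th percentile, n-1 df
    half_width = t_a * s * np.sqrt(1 + 1 / n)
    return x_bar - half_width, x_bar + half_width
```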


Non-parametric methods

One can compute prediction intervals without any assumptions on the population; formally, this is a non-parametric method. If one has a sample of independent and identically distributed random variables {''X''1, ..., ''X''''n''}, then the probability that the next observation ''X''''n''+1 will be the largest is 1/(''n'' + 1), since all observations have equal probability of being the maximum. In the same way, the probability that ''X''''n''+1 will be the smallest is 1/(''n'' + 1). The other (''n'' − 1)/(''n'' + 1) of the time, ''X''''n''+1 falls between the sample maximum and sample minimum of the sample {''X''1, ..., ''X''''n''}. Thus, denoting the sample maximum and minimum by ''M'' and ''m'', this yields an (''n'' − 1)/(''n'' + 1) prediction interval of [''m'', ''M'']. Notice that while this gives the probability that a future observation will fall in a range, it does not give any estimate as to where in a segment it will fall – notably, if it falls outside the range of observed values, it may be far outside the range. See extreme value theory for further discussion. Formally, this applies not just to sampling from a population, but to any exchangeable sequence of random variables, not necessarily independent or identically distributed.
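
A sketch of this distribution-free interval (illustrative names; NumPy assumed):

```python
# Distribution-free prediction interval: for n exchangeable observations,
# [min, max] contains the next observation with probability (n - 1)/(n + 1).
import numpy as np

def nonparametric_prediction_interval(x):
    x = np.asarray(x, dtype=float)
    n = len(x)
    coverage = (n - 1) / (n + 1)  # e.g. n = 39 gives 38/40 = 95%
    return (x.min(), x.max()), coverage
```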


Contrast with other intervals


Contrast with confidence intervals

Note that in the formula for the predictive confidence interval ''no mention'' is made of the unobservable parameters ''μ'' and ''σ'' of population mean and standard deviation – the observed ''sample'' statistics \overline{X}_n and s_n of sample mean and standard deviation are used, and what is estimated is the outcome of ''future'' samples.

Rather than using sample statistics as estimators of population parameters and applying confidence intervals to these estimates, one considers "the next sample" X_{n+1} as ''itself'' a statistic, and computes its sampling distribution. In parameter confidence intervals, one estimates population parameters; if one wishes to interpret this as prediction of the next sample, one models "the next sample" as a draw from this estimated population, using the (estimated) ''population'' distribution. By contrast, in predictive confidence intervals, one uses the ''sampling'' distribution of (a statistic of) a sample of ''n'' or ''n'' + 1 observations from such a population, and the population distribution is not directly used, though the assumption about its form (though not the values of its parameters) is used in computing the sampling distribution.


Contrast with tolerance intervals

Whereas a prediction interval is constructed so that a single future observation falls within it with a specified probability, a tolerance interval is constructed so that, with a stated confidence level, it contains at least a specified proportion of the sampled population.


Applications

Prediction intervals are commonly used as definitions of reference ranges, such as reference ranges for blood tests, to give an idea of whether a blood test result is normal or not. For this purpose, the most commonly used prediction interval is the 95% prediction interval, and a reference range based on it can be called a ''standard reference range''.


Regression analysis

A common application of prediction intervals is to regression analysis. Suppose the data is being modeled by a straight line regression:

:y_i = \alpha + \beta x_i + \varepsilon_i,

where y_i is the response variable, x_i is the explanatory variable, ''ε''''i'' is a random error term, and \alpha and \beta are parameters.

Given estimates \hat\alpha and \hat\beta for the parameters, such as from a simple linear regression, the predicted response value ''y''''d'' for a given explanatory value ''x''''d'' is

:\hat{y}_d = \hat\alpha + \hat\beta x_d

(the point on the regression line), while the actual response would be

:y_d = \alpha + \beta x_d + \varepsilon_d.

The point estimate \hat{y}_d is called the mean response, and is an estimate of the expected value of ''y''''d'', E(y \mid x_d).

A prediction interval instead gives an interval in which one expects ''y''''d'' to fall; this is not necessary if the actual parameters ''α'' and ''β'' are known (together with the error term ''ε''''i''), but if one is estimating from a sample, then one may use the standard errors of the estimates for the intercept and slope (\hat\alpha and \hat\beta), as well as their correlation, to compute a prediction interval. In regression, a distinction is made between intervals for predictions of the mean response and intervals for predictions of an observed response; the difference is essentially whether the unity term is included within the square root in the expansion factors above.
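
The interval formula itself is not reproduced above; the following sketch uses the standard textbook expression for simple linear regression (illustrative names; NumPy/SciPy assumed), in which dropping the leading 1 inside the square root gives the narrower mean-response interval instead:

```python
# Prediction interval for a new observation at x_d under simple linear
# regression: y_hat +/- t * s * sqrt(1 + 1/n + (x_d - x_bar)^2 / Sxx),
# where s is the residual standard error with n - 2 degrees of freedom.
import numpy as np
from scipy.stats import t

def regression_prediction_interval(x, y, x_d, gamma=0.95):
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    n = len(x)
    x_bar = x.mean()
    sxx = np.sum((x - x_bar) ** 2)
    beta_hat = np.sum((x - x_bar) * (y - y.mean())) / sxx  # slope estimate
    alpha_hat = y.mean() - beta_hat * x_bar                # intercept estimate
    resid = y - (alpha_hat + beta_hat * x)
    s = np.sqrt(np.sum(resid ** 2) / (n - 2))              # residual std. error
    t_q = t.ppf((1 + gamma) / 2, df=n - 2)
    y_hat = alpha_hat + beta_hat * x_d
    half_width = t_q * s * np.sqrt(1 + 1 / n + (x_d - x_bar) ** 2 / sxx)
    return y_hat - half_width, y_hat + half_width
```

The interval widens as ''x''''d'' moves away from \bar{x}, reflecting the growing uncertainty in the fitted line far from the centre of the data.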


Bayesian statistics

Seymour Geisser, a proponent of predictive inference, gives predictive applications of Bayesian statistics. In Bayesian statistics, one can compute (Bayesian) prediction intervals from the posterior probability of the random variable, as a credible interval. In theoretical work, credible intervals are not often calculated for the prediction of future events, but for inference of parameters – i.e., credible intervals of a parameter, not for the outcomes of the variable itself. However, particularly where applications are concerned with possible extreme values of yet to be observed cases, credible intervals for such values can be of practical importance.
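
As a concrete sketch, consider the simplest conjugate setting: a normal likelihood with known variance and a normal prior on the mean (all names and prior parameters below are illustrative assumptions, not a prescription from the text). The posterior predictive for the next observation is then itself normal, and the Bayesian prediction interval is a credible interval taken from its quantiles:

```python
# Bayesian prediction interval in a conjugate normal model: likelihood
# N(mu, sigma^2) with sigma known, prior mu ~ N(mu0, tau0^2).  The
# posterior predictive is N(mu_n, sigma^2 + tau_n^2), combining data
# noise with remaining parameter uncertainty.
import numpy as np
from scipy.stats import norm

def bayes_prediction_interval(x, sigma, mu0=0.0, tau0=10.0, gamma=0.95):
    x = np.asarray(x, dtype=float)
    n = len(x)
    prec = 1 / tau0**2 + n / sigma**2                     # posterior precision of mu
    mu_n = (mu0 / tau0**2 + x.sum() / sigma**2) / prec    # posterior mean of mu
    pred_sd = np.sqrt(sigma**2 + 1 / prec)                # predictive standard deviation
    z = norm.ppf((1 + gamma) / 2)
    return mu_n - z * pred_sd, mu_n + z * pred_sd
```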


See also

* Extrapolation
* Posterior probability
* Prediction
* Prediction band
* Seymour Geisser
* Statistical model validation
* Trend estimation




Further reading

* ISO 16269-8, ''Statistical Interpretation of Data – Part 8: Determination of Prediction Intervals''