
In statistics, the mean squared prediction error (MSPE), or mean squared error of the predictions, of a smoothing, curve fitting, or regression procedure is the expected value of the squared difference between the fitted values implied by the predictive function \widehat{g} and the values of the (unobservable) true function ''g''. It is an inverse measure of the explanatory power of \widehat{g}, and can be used in the process of cross-validation of an estimated model.

If the smoothing or fitting procedure has a projection matrix (i.e., hat matrix) ''L'', which maps the observed-values vector y to the fitted-values vector \hat{y} via \hat{y}=Ly, then

:\operatorname{MSPE}(L)=\frac{1}{n}\sum_{i=1}^n\operatorname{E}\left[\left(g(x_i)-\widehat{g}(x_i)\right)^2\right].

The MSPE can be decomposed into two terms: the mean of the squared biases of the fitted values and the mean of the variances of the fitted values:

:n\cdot\operatorname{MSPE}(L)=\sum_{i=1}^n\left(\operatorname{E}\left[\widehat{g}(x_i)\right]-g(x_i)\right)^2+\sum_{i=1}^n\operatorname{var}\left[\widehat{g}(x_i)\right].

Knowledge of ''g'' is required in order to calculate the MSPE exactly; otherwise, it can be estimated.
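The decomposition above can be checked exactly for any linear smoother, since \operatorname{E}[\widehat{g}]=Lg and \operatorname{var}[\widehat{g}(x_i)]=\sigma^2(LL^\mathsf{T})_{ii}. The sketch below does this for a hypothetical 3-point moving-average smoother; the choice of ''g'', the noise level, and the smoother itself are illustrative assumptions, not part of the definition.

```python
import numpy as np

# Verify n * MSPE = sum of squared biases + sum of variances for a
# linear smoother ghat = L y, where y = g + sigma * eps.
n, sigma = 20, 0.5
x = np.linspace(0.0, 1.0, n)
g = np.sin(2 * np.pi * x)        # true (normally unobservable) function

# A simple 3-point moving-average smoother as the projection matrix L.
L = np.zeros((n, n))
for i in range(n):
    idx = [j for j in (i - 1, i, i + 1) if 0 <= j < n]
    L[i, idx] = 1.0 / len(idx)

# n * MSPE = g^T (I-L)^T (I-L) g + sigma^2 * tr(L^T L)
I = np.eye(n)
n_mspe = g @ (I - L).T @ (I - L) @ g + sigma**2 * np.trace(L.T @ L)

# Decomposition: squared biases plus variances of the fitted values.
bias = L @ g - g                  # E[ghat(x_i)] - g(x_i)
var = sigma**2 * np.diag(L @ L.T)  # var[ghat(x_i)]
assert np.isclose(n_mspe, np.sum(bias**2) + np.sum(var))
```

Because both sides are computed analytically (no simulated noise is drawn), the assertion holds up to floating-point precision.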


Computation of MSPE over out-of-sample data

The mean squared prediction error can be computed exactly in two contexts. First, with a data sample of length ''n'', the data analyst may run the regression over only ''q'' of the data points (with ''q'' < ''n''), holding back the other ''n'' – ''q'' data points with the specific purpose of using them to compute the estimated model's MSPE out of sample (i.e., not using data that were used in the model estimation process). Since the regression process is tailored to the ''q'' in-sample points, normally the in-sample MSPE will be smaller than the out-of-sample MSPE computed over the ''n'' – ''q'' held-back points. If the increase in the out-of-sample MSPE relative to the in-sample MSPE is slight, the model is viewed favorably. And if two models are to be compared, the one with the lower MSPE over the ''n'' – ''q'' out-of-sample data points is viewed more favorably, regardless of the models' relative in-sample performances. The out-of-sample MSPE in this context is exact for the out-of-sample data points that it was computed over, but is merely an estimate of the model's MSPE for the mostly unobserved population from which the data were drawn. Second, as time goes on more data may become available to the data analyst, and then the MSPE can be computed over these new data.
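The hold-out scheme above can be sketched as follows. The linear data-generating process, the noise level, and the 70/30 split are illustrative assumptions.

```python
import numpy as np

# Fit on q in-sample points; compute MSPE over the n - q held-back points.
rng = np.random.default_rng(42)
n, q = 100, 70
x = rng.uniform(0.0, 1.0, n)
y = 1.0 + 2.0 * x + rng.normal(0.0, 0.3, n)

# Randomly hold back n - q points for out-of-sample evaluation.
perm = rng.permutation(n)
ins, outs = perm[:q], perm[q:]

# Fit a straight line on the in-sample points only.
coef = np.polyfit(x[ins], y[ins], deg=1)

mspe_in = np.mean((y[ins] - np.polyval(coef, x[ins])) ** 2)
mspe_out = np.mean((y[outs] - np.polyval(coef, x[outs])) ** 2)
# Typically mspe_out exceeds mspe_in, since the fit is tailored to the
# in-sample points; a small gap speaks in the model's favor.
```

The same comparison applied to two candidate models, using a common held-back set, implements the model-selection rule described above.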


Estimation of MSPE over the population

When the model has been estimated over all available data with none held back, the MSPE of the model over the entire population of mostly unobserved data can be estimated as follows. For the model y_i=g(x_i)+\sigma\varepsilon_i where \varepsilon_i\sim\mathcal{N}(0,1), one may write

:n\cdot\operatorname{MSPE}(L)=g^\mathsf{T}\left(I-L\right)^\mathsf{T}\left(I-L\right)g+\sigma^2\operatorname{tr}\left[L^\mathsf{T}L\right].

Using in-sample data values, the first term on the right side is equivalent to

:\sum_{i=1}^n\left(\operatorname{E}\left[g(x_i)-\widehat{g}(x_i)\right]\right)^2 =\operatorname{E}\left[\sum_{i=1}^n\left(y_i-\widehat{g}(x_i)\right)^2\right]-\sigma^2\operatorname{tr}\left[\left(I-L\right)^\mathsf{T}\left(I-L\right)\right].

Thus,

:n\cdot\operatorname{MSPE}(L)=\operatorname{E}\left[\sum_{i=1}^n\left(y_i-\widehat{g}(x_i)\right)^2\right]-\sigma^2\left(n-2\cdot\operatorname{tr}\left[L\right]\right).

If \sigma^2 is known or well-estimated by \widehat{\sigma}^2, it becomes possible to estimate MSPE by

:n\cdot\widehat{\operatorname{MSPE}}(L)=\sum_{i=1}^n\left(y_i-\widehat{g}(x_i)\right)^2-\widehat{\sigma}^2\left(n-2\cdot\operatorname{tr}\left[L\right]\right).
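A minimal sketch of this estimator for an ordinary least-squares fit, where the hat matrix is L = X(XᵀX)⁻¹Xᵀ. The design, the true function ''g'', and the noise level below are illustrative assumptions; since ''g'' is known here, the identity behind the estimator can also be verified exactly.

```python
import numpy as np

# Plug-in estimate of n * MSPE for an OLS fit with hat matrix L.
n, sigma = 50, 0.4
x = np.linspace(0.0, 1.0, n)
g = 1.0 + 3.0 * x                     # true regression function
X = np.column_stack([np.ones(n), x])  # design matrix (p = 2)
L = X @ np.linalg.inv(X.T @ X) @ X.T  # hat matrix
I = np.eye(n)

# Exact value of n * MSPE (computable here because g is known):
n_mspe = g @ (I - L).T @ (I - L) @ g + sigma**2 * np.trace(L.T @ L)

# E[RSS] = ||(I-L)g||^2 + sigma^2 tr[(I-L)^T (I-L)], so the identity
# n * MSPE = E[RSS] - sigma^2 (n - 2 tr[L]) checks out exactly:
exp_rss = g @ (I - L).T @ (I - L) @ g \
    + sigma**2 * np.trace((I - L).T @ (I - L))
assert np.isclose(n_mspe, exp_rss - sigma**2 * (n - 2 * np.trace(L)))

# With data, the observed RSS replaces E[RSS] in the plug-in estimate:
rng = np.random.default_rng(1)
y = g + sigma * rng.standard_normal(n)
rss = np.sum((y - L @ y) ** 2)
n_mspe_hat = rss - sigma**2 * (n - 2 * np.trace(L))
```

For an idempotent, symmetric hat matrix, tr[LᵀL] = tr[L] equals the number of estimated parameters, which is why tr[L] serves as the effective degrees of freedom of the smoother.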
Colin Mallows advocated this method in the construction of his model selection statistic ''Cp'', which is a normalized version of the estimated MSPE:

:C_p=\frac{\sum_{i=1}^n\left(y_i-\widehat{g}(x_i)\right)^2}{\widehat{\sigma}^2}-n+2p,

where ''p'' is the number of estimated parameters and \widehat{\sigma}^2 is computed from the version of the model that includes all possible regressors.
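A sketch of the ''Cp'' computation for a nested family of polynomial models, with \widehat{\sigma}^2 taken from the largest ("all regressors") model as described above. The cubic-free quadratic data-generating process and the candidate degrees are illustrative assumptions.

```python
import numpy as np

# Mallows's Cp = RSS_p / sigma_hat^2 - n + 2p over nested polynomial fits.
rng = np.random.default_rng(7)
n = 80
x = np.sort(rng.uniform(-1.0, 1.0, n))
y = 1.0 - x + 2.0 * x**2 + rng.normal(0.0, 0.2, n)

max_deg = 5

def rss(deg):
    coef = np.polyfit(x, y, deg)
    return np.sum((y - np.polyval(coef, x)) ** 2)

# sigma_hat^2 from the full model (p = max_deg + 1 parameters).
p_full = max_deg + 1
sigma2_hat = rss(max_deg) / (n - p_full)

# Cp for each candidate model; degree d has p = d + 1 parameters.
cp = {d: rss(d) / sigma2_hat - n + 2 * (d + 1) for d in range(max_deg + 1)}
# A well-specified model has Cp close to p; a large Cp flags omitted-
# variable bias, so underfitting models (degree 0 or 1 here) score badly.
```

Models are then ranked by ''Cp'', favoring the smallest value among those with ''Cp'' near ''p''.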


See also

* Mean squared error
* Errors and residuals in statistics
* Law of total variance

