In statistics, the mean squared prediction error (MSPE), or mean squared error of the predictions, of a smoothing or curve fitting procedure is the expected value of the squared difference between the fitted values implied by the predictive function ''ĝ'' and the values of the (unobservable) function ''g''. It is an inverse measure of the explanatory power of ''ĝ'' and can be used in the process of cross-validation of an estimated model.
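If the smoothing or fitting procedure has projection matrix (i.e., hat matrix) ''L'', which maps the observed values vector ''y'' to the predicted values vector ''ŷ'' via ''ŷ'' = ''Ly'', then
:<math>\operatorname{MSPE}(L) = \frac{1}{n}\sum_{i=1}^n \operatorname{E}\left[\left(\hat{g}(x_i) - g(x_i)\right)^2\right],</math>
where <math>\hat{g}(x_i)</math> is the fitted value at the ''i''-th of the ''n'' data points.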
The MSPE can be decomposed into two terms: the mean of squared biases of the fitted values and the mean of variances of the fitted values:
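:<math>\operatorname{MSPE}(L) = \frac{1}{n}\sum_{i=1}^n \left(\operatorname{E}\left[\hat{g}(x_i)\right] - g(x_i)\right)^2 + \frac{1}{n}\sum_{i=1}^n \operatorname{var}\left[\hat{g}(x_i)\right].</math>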
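The split into bias and variance terms can be checked pointwise. As a short sketch, treat each <math>g(x_i)</math> as a fixed number, write <math>\hat{g}(x_i)</math> for the (random) fitted value at the ''i''-th data point (notation assumed here), and add and subtract <math>\operatorname{E}[\hat{g}(x_i)]</math> inside the square:
:<math>\operatorname{E}\left[\left(\hat{g}(x_i) - g(x_i)\right)^2\right] = \operatorname{E}\left[\left(\hat{g}(x_i) - \operatorname{E}[\hat{g}(x_i)]\right)^2\right] + 2\left(\operatorname{E}[\hat{g}(x_i)] - g(x_i)\right)\operatorname{E}\left[\hat{g}(x_i) - \operatorname{E}[\hat{g}(x_i)]\right] + \left(\operatorname{E}[\hat{g}(x_i)] - g(x_i)\right)^2.</math>
The middle term vanishes because <math>\operatorname{E}\left[\hat{g}(x_i) - \operatorname{E}[\hat{g}(x_i)]\right] = 0</math>, which leaves the variance of the fitted value plus its squared bias. Averaging over the ''n'' data points gives the mean of variances and the mean of squared biases.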
Knowledge of ''g'' is required in order to calculate the MSPE exactly; otherwise, it can be estimated.
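When ''g'' is known, as in a simulation, the MSPE of a linear smoother can be evaluated directly from its hat matrix. The following Python sketch illustrates this under stated assumptions: the true function is taken to be a sine curve, the noise is independent with a known variance, and the smoother is an ordinary least-squares cubic polynomial fit. The helper names <code>hat_matrix</code> and <code>mspe_linear_smoother</code> are hypothetical and introduced only for this illustration.

```python
import numpy as np


def hat_matrix(X):
    """Hat matrix L = X (X^T X)^{-1} X^T of a least-squares fit (X has full column rank)."""
    return X @ np.linalg.pinv(X)


def mspe_linear_smoother(L, g_values, sigma2):
    """Return (mean squared bias, mean variance) of y_hat = L y.

    Assumes y = g + noise with independent, zero-mean noise of variance sigma2.
    """
    bias = L @ g_values - g_values               # E[g_hat(x_i)] - g(x_i)
    variances = sigma2 * np.sum(L**2, axis=1)    # var[g_hat(x_i)] = sigma2 * (L L^T)_ii
    return np.mean(bias**2), np.mean(variances)


rng = np.random.default_rng(0)
n, sigma = 50, 0.3
x = np.linspace(0.0, 1.0, n)
g_values = np.sin(2 * np.pi * x)                   # assumed "true" function g
X = np.column_stack([np.ones(n), x, x**2, x**3])   # cubic polynomial design matrix
L = hat_matrix(X)

# Exact MSPE from the bias-variance decomposition.
bias2, var = mspe_linear_smoother(L, g_values, sigma**2)
mspe_exact = bias2 + var

# Monte Carlo check: simulate many noisy samples and average the squared errors.
reps = 5000
Y = g_values + sigma * rng.standard_normal((reps, n))  # one simulated sample per row
G_hat = Y @ L.T                                        # fitted values L y for each row
mspe_mc = np.mean((G_hat - g_values) ** 2)

print("mean squared bias:", bias2)
print("mean variance:    ", var)
print("exact MSPE:       ", mspe_exact)
print("simulated MSPE:   ", mspe_mc)
```

In this setup the mean variance equals the noise variance times the number of fitted coefficients divided by ''n'', because the hat matrix of a least-squares fit is a symmetric projection. With real data ''g'' is not available, so only an estimate of the MSPE can be formed.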
Computation of MSPE over out-of-sample data
The mean squared prediction error can be computed exactly in two contexts. First, with a data sample of length ''n'', the data analyst may run the regression
over only ''q'' of the data points (with ''q'' < ''n''), holding back the other ''n – q'' data points specifically to compute the estimated model’s MSPE out of sample (i.e., not using data that were used in the model estimation process). Since the regression process is tailored to the ''q'' in-sample points, the in-sample MSPE will normally be smaller than the out-of-sample one computed over the ''n – q'' held-back points. If the out-of-sample MSPE is only slightly larger than the in-sample MSPE, the model is viewed favorably. And if two models are to be compared, the one with the lower MSPE over the ''n – q'' out-of-sample data poi