In statistics, the predicted residual error sum of squares (PRESS) is a form of cross-validation used in regression analysis to provide a summary measure of the fit of a model to a sample of observations that were not themselves used to estimate the model. It is calculated as the sum of squares of the prediction residuals for those observations. Once a fitted model has been produced, each observation in turn is removed and the model is refitted using the remaining observations. The out-of-sample predicted value is calculated for the omitted observation in each case, and the PRESS statistic is calculated as the sum of the squares of all the resulting prediction errors:

\operatorname{PRESS} = \sum_{i=1}^{n} (y_i - \hat{y}_{i,-i})^2
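The leave-one-out procedure can be illustrated with a minimal Python sketch. It assumes the simplest possible setting, a straight-line model y = a + b*x fitted by ordinary least squares; the helper names `fit_line`, `press` and `press_shortcut` and the sample data are illustrative only. For ordinary least squares, a standard identity gives the same quantity from a single fit by dividing each ordinary residual e_i by 1 - h_ii, where h_ii is the leverage of observation i; the second function applies that identity as a cross-check.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def press(xs, ys):
    """PRESS by explicit refitting: one fit per omitted observation."""
    total = 0.0
    for i in range(len(xs)):
        # Refit on every observation except the i-th one.
        a, b = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        # Squared out-of-sample prediction error for the omitted point.
        total += (ys[i] - (a + b * xs[i])) ** 2
    return total


def press_shortcut(xs, ys):
    """PRESS from a single fit, using e_i / (1 - h_ii) (least squares only)."""
    n = len(xs)
    mean_x = sum(xs) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a, b = fit_line(xs, ys)
    total = 0.0
    for x, y in zip(xs, ys):
        residual = y - (a + b * x)
        # Leverage of this observation in a straight-line fit.
        leverage = 1.0 / n + (x - mean_x) ** 2 / sxx
        total += (residual / (1.0 - leverage)) ** 2
    return total


# Made-up sample data, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.1, 1.9, 3.2, 3.8, 5.1]
print(press(xs, ys), press_shortcut(xs, ys))
```

The two functions should agree up to floating-point rounding. The explicit version generalises to any model that can be refitted, while the shortcut applies only to linear least-squares fits.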

Given this procedure, the PRESS statistic can be calculated for a number of candidate model structures for the same dataset, with the lowest values of PRESS indicating the best structures. Models that are over-parameterised (over-fitted) tend to give small residuals for observations included in the model-fitting but large residuals for observations that are excluded. The PRESS statistic has been used extensively in lazy learning and locally linear learning to speed up the assessment and selection of the neighbourhood size.
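Comparing candidate model structures by PRESS can be sketched as follows. This is an illustrative example under stated assumptions, not a reference implementation: the candidates are polynomials of increasing degree, fitted by solving the normal equations with a small hand-written Gaussian elimination, and all function names and data are hypothetical. The in-sample residual sum of squares (RSS) cannot increase as the degree grows, whereas PRESS can, which is why PRESS is the quantity used for selection.

```python
def solve(a, b):
    """Solve a square linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are left unchanged.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_poly(xs, ys, degree):
    """Least-squares polynomial coefficients, lowest power first."""
    k = degree + 1
    xtx = [[sum(x ** (i + j) for x in xs) for j in range(k)] for i in range(k)]
    xty = [sum((x ** i) * y for x, y in zip(xs, ys)) for i in range(k)]
    return solve(xtx, xty)


def predict(coefs, x):
    """Evaluate the fitted polynomial at x."""
    return sum(c * x ** p for p, c in enumerate(coefs))


def rss(xs, ys, degree):
    """In-sample residual sum of squares."""
    coefs = fit_poly(xs, ys, degree)
    return sum((y - predict(coefs, x)) ** 2 for x, y in zip(xs, ys))


def press(xs, ys, degree):
    """Leave-one-out PRESS for a polynomial of the given degree."""
    total = 0.0
    for i in range(len(xs)):
        coefs = fit_poly(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:], degree)
        total += (ys[i] - predict(coefs, xs[i])) ** 2
    return total


# Made-up, roughly linear data, for illustration only.
xs = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5]
ys = [0.1, 0.4, 1.2, 1.4, 2.1, 2.4, 3.2, 3.4]
candidates = [0, 1, 2, 3, 4]
for d in candidates:
    print(d, rss(xs, ys, d), press(xs, ys, d))

# Select the candidate structure with the lowest PRESS.
best = min(candidates, key=lambda d: press(xs, ys, d))
print("selected degree:", best)
```

The normal equations are used here only to keep the sketch free of dependencies. For higher degrees or widely spread x values they become ill-conditioned, and a QR-based least-squares routine from a numerical library would be a safer choice.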
