Mallows's Cp

In statistics, Mallows's ''Cp'', named for Colin Lingwood Mallows, is used to assess the fit of a regression model that has been estimated using ordinary least squares. It is applied in the context of model selection, where a number of predictor variables are available for predicting some outcome, and the goal is to find the best model involving a subset of these predictors. A small value of ''Cp'' means that the model is relatively precise. Mallows's ''Cp'' has been shown to be equivalent to the Akaike information criterion in the special case of Gaussian linear regression.


Definition and properties

Mallows's ''Cp'' addresses the issue of overfitting, in which model selection statistics such as the residual sum of squares always get smaller as more variables are added to a model. Thus, if we aim to select the model giving the smallest residual sum of squares, the model including all variables would always be selected. Instead, the ''Cp'' statistic calculated on a sample of data estimates the mean squared prediction error (MSPE) as its population target

: E\left[\sum_i (\hat{Y}_i - E(Y_i \mid X_i))^2\right] / \sigma^2,

where \hat{Y}_i is the fitted value from the regression model for the ''i''th case, E(Y_i \mid X_i) is the expected value for the ''i''th case, and \sigma^2 is the error variance (assumed constant across the cases). The MSPE will not automatically get smaller as more variables are added. The optimum model under this criterion is a compromise influenced by the sample size, the effect sizes of the different predictors, and the degree of collinearity between them.

If ''P'' regressors are selected from a set of ''K'' > ''P'', the ''Cp'' statistic for that particular set of regressors is defined as

: C_p = \frac{SSE_p}{S^2} - N + 2(P+1),

where
* SSE_p = \sum_{i=1}^N (Y_i - Y_{pi})^2 is the error sum of squares for the model with ''P'' regressors,
* Y_{pi} is the predicted value of the ''i''th observation of ''Y'' from the ''P'' regressors,
* S^2 is the residual mean square after regression on the complete set of ''K'' regressors, and can be estimated by the mean square error ''MSE'',
* and ''N'' is the sample size.
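To make the definition concrete, the following is a minimal sketch in Python with NumPy; the names mallows_cp, X_full, and subset are illustrative rather than part of any standard library, and each fit is assumed to include an intercept in addition to the ''P'' chosen regressors.

    import numpy as np

    def mallows_cp(X_full, y, subset):
        # Cp = SSE_P / S^2 - N + 2*(P + 1) for the regressors indexed by subset.
        # X_full: (N, K) matrix of all K candidate regressors (no intercept column),
        # y: (N,) response vector, subset: column indices of the P chosen regressors.
        N, K = X_full.shape

        def sse(X):
            # Ordinary least squares with an intercept; returns the residual
            # sum of squares of the fitted model.
            Xd = np.column_stack([np.ones(N), X])
            beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
            resid = y - Xd @ beta
            return resid @ resid

        # S^2: residual mean square from the full model with all K regressors.
        S2 = sse(X_full) / (N - K - 1)
        P = len(subset)
        return sse(X_full[:, list(subset)]) / S2 - N + 2 * (P + 1)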


Alternative definition

Given a linear model

: Y = \beta_0 + \beta_1 X_1 + \cdots + \beta_p X_p + \varepsilon,

where
* \beta_0, \ldots, \beta_p are coefficients for the predictor variables X_1, \ldots, X_p,
* \varepsilon represents error,

an alternative version of ''Cp'' can be defined as

: C_p = \frac{1}{n}(\mathrm{RSS} + 2p\hat{\sigma}^2),

where
* RSS is the residual sum of squares on a training set of data,
* ''p'' is the number of predictors,
* and \hat{\sigma}^2 refers to an estimate of the variance associated with each response in the linear model (estimated on a model containing all predictors).

Note that this version of ''Cp'' does not give equivalent values to the earlier version; however, the model with the smallest ''Cp'' from this definition will also be the model with the smallest ''Cp'' from the earlier definition.
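On the same assumptions as the sketch above (plain NumPy, illustrative names, an intercept in every fit), this alternative form can be sketched as follows, with \hat{\sigma}^2 estimated from the model containing all predictors as the definition requires.

    import numpy as np

    def mallows_cp_alt(X_full, y, subset):
        # Cp = (RSS + 2*p*sigma2_hat) / n for the predictors indexed by subset.
        n, k = X_full.shape

        def rss(X):
            # Residual sum of squares from an OLS fit with an intercept.
            Xd = np.column_stack([np.ones(n), X])
            beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
            r = y - Xd @ beta
            return r @ r

        sigma2_hat = rss(X_full) / (n - k - 1)  # variance estimate, full model
        p = len(subset)
        return (rss(X_full[:, list(subset)]) + 2 * p * sigma2_hat) / n

For fixed ''n'' and \hat{\sigma}^2, this quantity is an increasing affine function of the earlier statistic, which is why the two definitions rank candidate subsets identically even though their values differ.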


Limitations

The ''Cp'' criterion suffers from two main limitations (Giraud 2015):
# the ''Cp'' approximation is only valid for large sample sizes;
# ''Cp'' cannot handle complex collections of models, as in the variable selection (or feature selection) problem.


Practical use

The ''Cp'' statistic is often used as a stopping rule for various forms of stepwise regression. Mallows proposed the statistic as a criterion for selecting among many alternative subset regressions. Under a model not suffering from appreciable lack of fit (bias), ''Cp'' has expectation nearly equal to ''P''; otherwise the expectation is roughly ''P'' plus a positive bias term. Nevertheless, even though it has expectation greater than or equal to ''P'', there is nothing to prevent ''Cp'' < ''P'' or even ''Cp'' < 0 in extreme cases. It is suggested that one should choose the subset whose ''Cp'' approaches ''P'' from above, from a list of subsets ordered by increasing ''P''. In practice, the positive bias can be adjusted for by selecting a model from the ordered list of subsets such that ''Cp'' < 2''P''.

Since the sample-based ''Cp'' statistic is an estimate of the MSPE, using ''Cp'' for model selection does not completely guard against overfitting. For instance, it is possible that the selected model will be one in which the sample ''Cp'' was a particularly severe underestimate of the MSPE. Model selection statistics such as ''Cp'' are generally not used blindly; rather, information about the field of application, the intended use of the model, and any known biases in the data are taken into account in the process of model selection.
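As an illustration of the bias-adjusted rule ''Cp'' < 2''P'' described above, the following sketch reuses the hypothetical mallows_cp function from the definition section and scans subsets in order of increasing ''P''. It performs an exhaustive search, which is practical only for small ''K''; actual stepwise procedures examine far fewer subsets.

    from itertools import combinations

    def select_by_cp(X_full, y):
        # Walk subset sizes P = 1..K; at each size keep the best-Cp subset and
        # stop at the first one satisfying the bias-adjusted rule Cp < 2*P.
        n, K = X_full.shape
        for P in range(1, K + 1):
            cp, subset = min(
                (mallows_cp(X_full, y, s), s)
                for s in combinations(range(K), P)
            )
            if cp < 2 * P:
                return subset, cp
        # Fall back to the full model if no subset passes the rule.
        full = tuple(range(K))
        return full, mallows_cp(X_full, y, full)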


See also

* Goodness of fit: Regression analysis
* Coefficient of determination


References

* Giraud, C. (2015). ''Introduction to high-dimensional statistics''. Chapman & Hall/CRC.

Further reading

* Judge, George G.; Griffiths, William E.; Hill, R. Carter; Lee, Tsoung-Chao (1980). ''The Theory and Practice of Econometrics''. New York: Wiley. pp. 417–423. ISBN 978-0-471-05938-7.