In statistics, model validation is the task of evaluating whether a chosen statistical model is appropriate or not. Oftentimes in statistical inference, inferences from models that appear to fit their data may be flukes, resulting in a misunderstanding by researchers of the actual relevance of their model. To combat this, model validation is used to test whether a statistical model can hold up to permutations in the data. This topic is not to be confused with the closely related task of model selection, the process of discriminating between multiple candidate models: model validation does not concern so much the conceptual design of models as it tests only the consistency between a chosen model and its stated outputs.

There are many ways to validate a model. Residual plots plot the difference between the actual data and the model's predictions: correlations in the residual plots may indicate a flaw in the model. Cross validation is a method of model validation that iteratively refits the model, each time leaving out just a small sample, and compares whether the samples left out are predicted by the model: there are many kinds of cross validation. Predictive simulation is used to compare simulated data to actual data. External validation involves fitting the model to new data. The Akaike information criterion estimates the quality of a model.
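As a small illustration of the last point, the sketch below compares two polynomial fits of the same data by their Akaike information criterion (AIC) values. Everything in it is illustrative rather than drawn from a particular source: the data are synthetic, Gaussian errors are assumed, and the helper name gaussian_aic is hypothetical.

<syntaxhighlight lang="python">
import numpy as np

# Illustrative data: a linear trend with Gaussian noise.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 50)
y = 2.0 * x + 1.0 + rng.normal(scale=1.0, size=x.size)

def gaussian_aic(y, y_hat, n_params):
    """AIC = 2k - 2 ln(L), with L the maximized Gaussian likelihood of a least-squares fit."""
    n = y.size
    rss = np.sum((y - y_hat) ** 2)
    log_lik = -0.5 * n * (np.log(2.0 * np.pi * rss / n) + 1.0)
    return 2.0 * n_params - 2.0 * log_lik

# Compare a straight line with a needlessly flexible quintic fit.
for degree in (1, 5):
    coeffs = np.polyfit(x, y, degree)
    y_hat = np.polyval(coeffs, x)
    k = degree + 2  # polynomial coefficients plus the noise variance
    print(f"degree {degree}: AIC = {gaussian_aic(y, y_hat, k):.1f}")
</syntaxhighlight>

On data that is genuinely linear, the straight line will typically obtain the lower (better) AIC, because the extra coefficients of the quintic fit buy little reduction in residual error relative to the penalty they incur.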


Overview

Model validation comes in many forms, and the specific method of model validation a researcher uses is often constrained by their research design. In other words, there is no one-size-fits-all method for validating a model. For example, if a researcher is operating with a very limited set of data, but data about which they have strong prior assumptions, they may consider validating the fit of their model by using a Bayesian framework and testing the fit of their model under various prior distributions. However, if a researcher has a lot of data and is testing multiple nested models, these conditions may lend themselves toward cross validation and possibly a leave-one-out test. These are two abstract examples, and any actual model validation will have to consider far more intricacies than are described here, but they illustrate that model validation methods are always going to be circumstantial. In general, models can be validated using existing data or with new data; both approaches are discussed in the following subsections, along with a note of caution.


Validation with Existing Data

Validation based on existing data involves analyzing the goodness of fit of the model or analyzing whether the residuals seem to be random (i.e., residual diagnostics). This method involves analyses of the model's closeness to the data and trying to understand how well the model predicts its own data. One example of this method is in Figure 1, which shows a polynomial function fit to some data. We see that the polynomial function does not conform well to the data, which appears linear, and this might invalidate the polynomial model.
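A minimal sketch of such a check, assuming synthetic, roughly linear data (the values, the choice of R-squared as the goodness-of-fit summary, and the candidate polynomial degrees are all illustrative):

<syntaxhighlight lang="python">
import numpy as np

# Illustrative data that is close to linear.
rng = np.random.default_rng(1)
x = np.linspace(0.0, 2.0, 40)
y = 3.0 * x + 0.5 + rng.normal(scale=0.2, size=x.size)

def r_squared(y, y_hat):
    """Coefficient of determination: a simple goodness-of-fit summary."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Fit candidate polynomial models to the same data and summarize the fit.
for degree in (1, 4):
    coeffs = np.polyfit(x, y, degree)
    y_hat = np.polyval(coeffs, x)
    residuals = y - y_hat
    print(f"degree {degree}: R^2 = {r_squared(y, y_hat):.3f}, "
          f"max |residual| = {np.abs(residuals).max():.3f}")
</syntaxhighlight>

Summaries like these only describe how closely a model reproduces the data it was fitted to; the subsections below describe checks that go beyond that.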


Validation with New Data

If new data becomes available, an existing model can be validated by assessing whether the new data is predicted by the old model. If the new data is not predicted by the old model, then the model might not be valid for the researcher's goals.
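A minimal sketch of this comparison, under assumed values: the coefficients stand in for a model fitted earlier, and the "new" observations are simulated here purely for illustration.

<syntaxhighlight lang="python">
import numpy as np

# Coefficients of a hypothetical model fitted earlier on old data (slope, intercept).
old_coeffs = np.array([3.0, 0.5])

# Newly collected observations (simulated here for illustration),
# partly outside the range the model was originally fitted on.
rng = np.random.default_rng(2)
x_new = np.linspace(2.0, 4.0, 25)
y_new = 3.0 * x_new + 0.5 + rng.normal(scale=0.2, size=x_new.size)

# Compare the old model's predictions against the new data.
y_pred = np.polyval(old_coeffs, x_new)
rmse = np.sqrt(np.mean((y_new - y_pred) ** 2))
print(f"RMSE of the old model on new data: {rmse:.3f}")
# If this error is much larger than the residual scale seen when the model
# was fitted, the old model may not be valid for the researcher's goals.
</syntaxhighlight>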


A Note of Caution

A model can be validated only relative to some application area. A model that is valid for one application might be invalid for some other applications. As an example, consider the curve in Figure 1: if the application only used inputs from the interval [0, 2], then the curve might well be an acceptable model.


Methods for validating

When doing a validation, there are three notable causes of potential difficulty, according to the ''Encyclopedia of Statistical Sciences''. The three causes are these: lack of data; lack of control of the input variables; uncertainty about the underlying probability distributions and correlations. The usual methods for dealing with difficulties in validation include the following: checking the assumptions made in constructing the model; examining the available data and related model outputs; applying expert judgment. Note that expert judgment commonly requires expertise in the application area. Expert judgment can sometimes be used to assess the validity of a prediction ''without'' obtaining real data: e.g., for the curve in Figure 1, an expert might well be able to assess that a substantial extrapolation will be invalid. Additionally, expert judgment can be used in Turing-type tests, where experts are presented with both real data and related model outputs and then asked to distinguish between the two. For some classes of statistical models, specialized methods of performing validation are available. As an example, if the statistical model was obtained via a regression, then specialized analyses for regression model validation exist and are generally employed.


Residual diagnostics

Residual diagnostics comprise analyses of the residuals to determine whether the residuals seem to be effectively random. Such analyses typically require estimates of the probability distributions for the residuals. Estimates of the residuals' distributions can often be obtained by repeatedly running the model, i.e., by using repeated stochastic simulations (employing a pseudorandom number generator for the random variables in the model). If the statistical model was obtained via a regression, then regression-residual diagnostics exist and may be used; such diagnostics have been well studied.
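A minimal sketch of the simulation idea, with everything illustrative: the fitted model is an ordinary least-squares line, and the lag-1 autocorrelation of the residuals is used as the randomness check.

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(3)

# Illustrative data and a fitted straight-line model.
x = np.linspace(0.0, 2.0, 40)
y_obs = 3.0 * x + 0.5 + rng.normal(scale=0.2, size=x.size)
coeffs = np.polyfit(x, y_obs, 1)
obs_resid = y_obs - np.polyval(coeffs, x)

def lag1_autocorr(r):
    """Lag-1 autocorrelation: near zero if the residuals are effectively random."""
    return np.corrcoef(r[:-1], r[1:])[0, 1]

# Repeated stochastic simulation: rerun the fitted model with pseudorandom noise
# to build a reference distribution for the residual statistic.
noise_scale = obs_resid.std(ddof=2)
sim_stats = []
for _ in range(1000):
    y_sim = np.polyval(coeffs, x) + rng.normal(scale=noise_scale, size=x.size)
    sim_resid = y_sim - np.polyval(np.polyfit(x, y_sim, 1), x)
    sim_stats.append(lag1_autocorr(sim_resid))

obs_stat = lag1_autocorr(obs_resid)
p_value = np.mean(np.abs(sim_stats) >= np.abs(obs_stat))
print(f"lag-1 autocorrelation = {obs_stat:+.3f}, simulation p-value = {p_value:.3f}")
</syntaxhighlight>

A large observed autocorrelation relative to the simulated reference distribution (a small p-value) would suggest that the residuals are not effectively random, hinting at a flaw in the model.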


Cross Validation

Cross validation is a resampling method that involves leaving some parts of the data out of the fitting process and then seeing whether the data that were left out are close to or far from where the model predicts they would be. What that means practically is that cross validation techniques fit the model many times, each time on a portion of the data, and compare each fit against the portion that was held out. If the fitted models rarely describe the data they were not trained on, then the model is probably wrong.
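A minimal sketch of one common variant, k-fold cross validation, again with illustrative data, polynomial candidate models, and a hypothetical helper kfold_rmse:

<syntaxhighlight lang="python">
import numpy as np

# Illustrative data: linear trend plus noise.
rng = np.random.default_rng(4)
x = np.linspace(0.0, 2.0, 60)
y = 3.0 * x + 0.5 + rng.normal(scale=0.2, size=x.size)

def kfold_rmse(x, y, degree, k=5, seed=0):
    """Average held-out RMSE of a polynomial fit over k folds."""
    idx = np.random.default_rng(seed).permutation(x.size)
    errors = []
    for fold in np.array_split(idx, k):
        train = np.setdiff1d(idx, fold)  # everything except the held-out fold
        coeffs = np.polyfit(x[train], y[train], degree)
        pred = np.polyval(coeffs, x[fold])
        errors.append(np.sqrt(np.mean((y[fold] - pred) ** 2)))
    return float(np.mean(errors))

for degree in (1, 3, 8):
    print(f"degree {degree}: held-out RMSE = {kfold_rmse(x, y, degree):.3f}")
</syntaxhighlight>

A high-degree polynomial that chases noise in its training folds will tend to show a larger held-out error than a simpler model that captures the underlying trend.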


See also

* All models are wrong
* Cross-validation (statistics)
* Identifiability analysis
* Internal validity
* Model identification
* Overfitting
* Perplexity
* Predictive model
* Sensitivity analysis
* Spurious relationship
* Statistical conclusion validity
* Statistical model selection
* Statistical model specification
* Validity (statistics)




External links


* "How can I tell if a model fits my data?", ''Handbook of Statistical Methods'' (NIST)
* Hicks, Dan (July 14, 2017). "What are core statistical model validation techniques?", Stack Exchange. https://stats.stackexchange.com/q/291481