In machine learning, a learning curve (or training curve) plots the optimal value of a model's loss function for a training set against this loss function evaluated on a validation data set with the same parameters as produced the optimal function. Synonyms include ''error curve'', ''experience curve'', ''improvement curve'' and ''generalization curve''. More abstractly, the learning curve is a plot of (learning effort) against (predictive performance), where learning effort usually means the number of training samples and predictive performance usually means accuracy on testing samples. The machine learning curve is useful for many purposes, including comparing different algorithms, choosing model parameters during design, adjusting optimization to improve convergence, and determining the amount of data used for training.


Formal definition

One model of a machine learning task is producing a function, f(x), which, given some information, x, predicts some variable, y, from training data X_\text{train} and Y_\text{train}. It is distinct from mathematical optimization because f should predict well for x outside of X_\text{train}. We often constrain the possible functions to a parameterized family of functions, \{f_\theta(x) : \theta \in \Theta\}, so that our function is more generalizable, or so that the function has certain properties such as those that make finding a good f easier, or because we have some a priori reason to think that these properties are true. Given that it is not possible to produce a function that perfectly fits our data, it is then necessary to produce a loss function L(f_\theta(X), Y) to measure how good our prediction is. We then define an optimization process which finds a \theta which minimizes L(f_\theta(X), Y), referred to as \theta^*(X, Y); training fits \theta^*(X_\text{train}, Y_\text{train}).
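To make this concrete, here is a minimal sketch of the setup, assuming (as an illustration, not part of the definition) a linear family f_\theta(x) = \theta_0 + \theta_1 x, a squared-error loss L, and ordinary least squares as the optimization process producing \theta^*(X, Y); the data are synthetic.

<syntaxhighlight lang="python">
import numpy as np

def f(theta, x):
    """A parameterized family: f_theta(x) = theta[0] + theta[1] * x."""
    return theta[0] + theta[1] * x

def L(theta, x, y):
    """Squared-error loss L(f_theta(X), Y)."""
    return np.mean((f(theta, x) - y) ** 2)

def theta_star(x, y):
    """Optimization process: least squares returns the theta minimizing L."""
    A = np.column_stack([np.ones_like(x), x])
    return np.linalg.lstsq(A, y, rcond=None)[0]

# Illustrative training data: y is roughly 2x + 1 plus noise.
rng = np.random.default_rng(0)
x_train = rng.uniform(0, 1, 50)
y_train = 2.0 * x_train + 1.0 + rng.normal(0, 0.1, 50)

theta_opt = theta_star(x_train, y_train)   # theta*(X_train, Y_train)
print(L(theta_opt, x_train, y_train))      # training loss at the optimum
</syntaxhighlight>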


Training curve for amount of data

If our training data is \{x_1, x_2, \dots, x_n\}, \{y_1, y_2, \dots, y_n\} and our validation data is \{x_1', x_2', \dots, x_m'\}, \{y_1', y_2', \dots, y_m'\}, a learning curve is the plot of the two curves

1. i \mapsto L(f_{\theta^*(X_i, Y_i)}(X_i), Y_i)
2. i \mapsto L(f_{\theta^*(X_i, Y_i)}(X'), Y')

where X_i = \{x_1, x_2, \dots, x_i\} and Y_i = \{y_1, y_2, \dots, y_i\}.
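A minimal sketch of computing both curves, reusing the illustrative linear family and squared-error loss from the sketch above (the synthetic data and sizes are arbitrary choices, not prescribed by the definition):

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)

# Illustrative training data (x_1..x_n, y_1..y_n) and validation data (x'_1..x'_m, y'_1..y'_m).
x = rng.uniform(0, 1, 100);  y = 2.0 * x + 1.0 + rng.normal(0, 0.1, 100)
xv = rng.uniform(0, 1, 50);  yv = 2.0 * xv + 1.0 + rng.normal(0, 0.1, 50)

def theta_star(x, y):
    """Least squares for the linear family f_theta(x) = theta[0] + theta[1] * x."""
    A = np.column_stack([np.ones_like(x), x])
    return np.linalg.lstsq(A, y, rcond=None)[0]

def L(theta, x, y):
    """Squared-error loss."""
    return np.mean((theta[0] + theta[1] * x - y) ** 2)

train_curve, val_curve = [], []
for i in range(2, len(x) + 1):
    th = theta_star(x[:i], y[:i])              # theta*(X_i, Y_i) on X_i = {x_1, ..., x_i}
    train_curve.append(L(th, x[:i], y[:i]))    # curve 1: loss on X_i, Y_i
    val_curve.append(L(th, xv, yv))            # curve 2: loss on X', Y'
</syntaxhighlight>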


Training curve for number of iterations

Many optimization processes are iterative, repeating the same step until the process converges to an optimal value.
Gradient descent is one such algorithm. If we define \theta_i^* as the approximation of the optimal \theta after i steps, a learning curve is the plot of

1. i \mapsto L(f_{\theta_i^*}(X), Y)
2. i \mapsto L(f_{\theta_i^*}(X'), Y')
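A minimal sketch of this iteration-indexed curve, again assuming the illustrative linear family and squared-error loss; the step size and iteration count are arbitrary:

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 100);  y = 2.0 * x + 1.0 + rng.normal(0, 0.1, 100)
xv = rng.uniform(0, 1, 50);  yv = 2.0 * xv + 1.0 + rng.normal(0, 0.1, 50)

def L(theta, x, y):
    """Squared-error loss for f_theta(x) = theta[0] + theta[1] * x."""
    return np.mean((theta[0] + theta[1] * x - y) ** 2)

def grad(theta, x, y):
    """Gradient of the squared-error loss with respect to theta."""
    r = theta[0] + theta[1] * x - y
    return np.array([2 * r.mean(), 2 * (r * x).mean()])

theta = np.zeros(2)                           # initial guess theta_0^*
train_curve, val_curve = [], []
for i in range(200):
    theta = theta - 0.1 * grad(theta, x, y)   # one gradient-descent step: theta_i^*
    train_curve.append(L(theta, x, y))        # curve 1: i -> L(f_{theta_i^*}(X), Y)
    val_curve.append(L(theta, xv, yv))        # curve 2: i -> L(f_{theta_i^*}(X'), Y')
</syntaxhighlight>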


Choosing the size of the training dataset

The learning curve is a tool for finding out how much a machine learning model benefits from adding more training data, and whether the estimator suffers more from a variance error or a bias error. If both the validation score and the training score converge to a value that is too low as the size of the training set increases, the model will not benefit much from more training data. In the machine learning domain, learning curves thus come in two variants that differ in the x-axis: the experience of the model is graphed either as the number of training examples used for learning or as the number of iterations used in training the model.
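In practice such curves are rarely computed by hand; as one example, scikit-learn provides a learning_curve utility that trains an estimator on subsets of increasing size and cross-validates each one. The dataset and estimator below are illustrative assumptions, not part of any standard recipe:

<syntaxhighlight lang="python">
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import learning_curve

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, (200, 1))
y = 2.0 * X[:, 0] + 1.0 + rng.normal(0, 0.1, 200)

sizes, train_scores, val_scores = learning_curve(
    Ridge(alpha=1.0), X, y,
    train_sizes=np.linspace(0.1, 1.0, 8),
    cv=5, scoring="neg_mean_squared_error")

# If both curves plateau close together at a poor score, more data is unlikely
# to help (high bias); a persistent gap between them suggests high variance.
for n, tr, va in zip(sizes, -train_scores.mean(axis=1), -val_scores.mean(axis=1)):
    print(f"n={n:4d}  train MSE={tr:.4f}  val MSE={va:.4f}")
</syntaxhighlight>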


See also

* Overfitting
* Bias–variance tradeoff
* Model selection
* Cross-validation (statistics)
* Validity (statistics)
* Verification and validation
* Double descent

