Structural Risk Minimization

Structural risk minimization (SRM) is an inductive principle of use in machine learning. Commonly in machine learning, a generalized model must be selected from a finite data set, with the consequent problem of overfitting: the model becomes too strongly tailored to the particularities of the training set and generalizes poorly to new data. The SRM principle addresses this problem by balancing the model's complexity against its success at fitting the training data. The principle was first set out in a 1974 paper by Vladimir Vapnik and Alexey Chervonenkis and uses the VC dimension.

In practical terms, structural risk minimization is implemented by minimizing E_{\text{train}} + \beta H(W), where E_{\text{train}} is the training error, the function H(W) is called a regularization function, and \beta is a constant. H(W) is chosen such that it takes large values on parameters W that belong to high-capacity subsets of the parameter space. Minimizing H(W) in effect limits the capacity of the accessible subsets of the parameter space, thereby controlling the trade-off between minimizing the training error and minimizing the expected gap between the training error and the test error.

The SRM problem can be formulated in terms of data. Given n data points consisting of data x and labels y, the objective J(\theta) is often expressed as

J(\theta) = \frac{1}{n} \sum_{i=1}^{n} \left( h_\theta(x^i) - y^i \right)^2 + \frac{\lambda}{2} \sum_{j=1}^{d} \theta_j^2.

The first term is the mean squared error (MSE) between the value of the learned model, h_\theta, and the given labels y; this is the training error E_{\text{train}} discussed above. The second term places a prior over the weights that penalizes large weights. The trade-off coefficient \lambda is a hyperparameter that places more or less importance on the regularization term: larger \lambda shrinks the weights toward zero at the expense of a less optimal MSE, while smaller \lambda relaxes the regularization, allowing the model to fit the data more closely. Note that as \lambda \to \infty the weights all become zero, and as \lambda \to 0 the model typically suffers from overfitting.
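As an illustration, here is a minimal sketch (not part of the original formulation; it assumes Python with NumPy, a linear model h_\theta(x) = \theta^T x, and hypothetical synthetic data) that minimizes the objective above in closed form. Setting the gradient of J(\theta) to zero gives ((2/n) X^T X + \lambda I)\theta = (2/n) X^T y, so sweeping \lambda makes the capacity trade-off visible:

import numpy as np

def srm_fit(X, y, lam):
    # Minimize (1/n) * ||X @ theta - y||**2 + (lam/2) * ||theta||**2.
    # Zero gradient: (2/n) X^T (X theta - y) + lam * theta = 0,
    # i.e. ((2/n) X^T X + lam * I) theta = (2/n) X^T y.
    n, d = X.shape
    A = (2.0 / n) * X.T @ X + lam * np.eye(d)
    b = (2.0 / n) * X.T @ y
    return np.linalg.solve(A, b)

# Hypothetical synthetic data, for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
theta_true = np.array([3.0, -2.0, 0.0, 0.0, 1.0])
y = X @ theta_true + 0.5 * rng.normal(size=50)

for lam in [0.0, 0.1, 10.0, 1e6]:
    theta = srm_fit(X, y, lam)
    mse = np.mean((X @ theta - y) ** 2)    # training error E_train
    reg = 0.5 * lam * np.sum(theta ** 2)   # capacity penalty, beta * H(W)
    print(f"lambda={lam:g}  ||theta||={np.linalg.norm(theta):.3f}  train MSE={mse:.3f}")

In this sketch the weight norm shrinks and the training MSE grows as \lambda increases, matching the limits described above: as \lambda \to \infty the weights vanish, and at \lambda = 0 the model fits the training data as closely as possible.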


See also

* Vapnik–Chervonenkis theory
* Support vector machines
* Model selection
* Occam learning
* Empirical risk minimization



External links


Structural risk minimization at the support vector machines website.