In machine learning, the hinge loss is a loss function used for training classifiers. The hinge loss is used for "maximum-margin" classification, most notably for support vector machines (SVMs).
For an intended output $t = \pm 1$ and a classifier score $y$, the hinge loss of the prediction $y$ is defined as
$$\ell(y) = \max(0, 1 - t \cdot y).$$
Note that $y$ should be the "raw" output of the classifier's decision function, not the predicted class label. For instance, in linear SVMs, $y = \mathbf{w} \cdot \mathbf{x} + b$, where $(\mathbf{w}, b)$ are the parameters of the hyperplane and $\mathbf{x}$ is the input variable(s).
When $t$ and $y$ have the same sign (meaning $y$ predicts the right class) and $|y| \ge 1$, the hinge loss $\ell(y) = 0$. When they have opposite signs, $\ell(y)$ increases linearly with $y$, and similarly if $|y| < 1$, even if it has the same sign (correct prediction, but not by enough margin).
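The three regimes described above can be checked numerically. The following is a minimal sketch in plain Python; the names `linear_score` and `hinge_loss`, and the example weights and inputs, are illustrative assumptions rather than part of any library.

```python
def linear_score(w, x, b=0.0):
    """Raw decision value y = w . x + b of a linear classifier."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def hinge_loss(t, y):
    """Hinge loss max(0, 1 - t*y) for a label t in {-1, +1} and a raw score y."""
    return max(0.0, 1.0 - t * y)


# Hypothetical hyperplane parameters and input, chosen only for illustration.
w, b = [2.0, -1.0], 0.5
x = [1.0, 0.5]
y = linear_score(w, x, b)    # 2.0 - 0.5 + 0.5 = 2.0

print(hinge_loss(+1, y))     # 0.0: correct sign and |y| >= 1
print(hinge_loss(+1, 0.25))  # 0.75: correct sign, but inside the margin
print(hinge_loss(-1, y))     # 3.0: wrong sign, loss grows with |y|
```

Passing a predicted label such as `+1` in place of the raw score would hide the margin information, which is why the function takes the decision value itself.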
Extensions
While binary SVMs are commonly extended to multiclass classification in a one-vs.-all or one-vs.-one fashion, it is also possible to extend the hinge loss itself for such an end. Several different variations of multiclass hinge loss have been proposed.
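For example, Crammer and Singer defined it for a linear classifier as
$$\ell(y) = \max\left(0, 1 + \max_{y \ne t} \mathbf{w}_y \mathbf{x} - \mathbf{w}_t \mathbf{x}\right),$$
where $t$ is the target label, and $\mathbf{w}_t$ and $\mathbf{w}_y$ are the model parameters.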
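Weston and Watkins provided a similar definition, but with a sum rather than a max:
$$\ell(y) = \sum_{y \ne t} \max\left(0, 1 + \mathbf{w}_y \mathbf{x} - \mathbf{w}_t \mathbf{x}\right).$$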
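To make the difference between the two multiclass variants concrete, here is a small sketch in plain Python. It assumes one weight vector per class stored in a list `W`; the function names and the example numbers are hypothetical and chosen only for illustration.

```python
def scores(W, x):
    """Per-class scores w_y . x for a list W of per-class weight vectors."""
    return [sum(wi * xi for wi, xi in zip(w, x)) for w in W]


def crammer_singer_loss(W, x, t):
    """max(0, 1 + max_{y != t} w_y.x - w_t.x): only the strongest wrong class counts."""
    s = scores(W, x)
    best_wrong = max(s[y] for y in range(len(s)) if y != t)
    return max(0.0, 1.0 + best_wrong - s[t])


def weston_watkins_loss(W, x, t):
    """sum_{y != t} max(0, 1 + w_y.x - w_t.x): every margin-violating class counts."""
    s = scores(W, x)
    return sum(max(0.0, 1.0 + s[y] - s[t]) for y in range(len(s)) if y != t)


# Hypothetical parameters: three classes, two features.
W = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
x = [2.0, 1.5]
t = 0  # index of the target class

print(scores(W, x))                  # [2.0, 1.5, 1.75]
print(crammer_singer_loss(W, x, t))  # 0.75
print(weston_watkins_loss(W, x, t))  # 1.25
```

In this example the target class has the highest score, yet both losses are positive because the other classes are within a margin of 1; the sum-based variant is larger because it penalizes both of them.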
In structured prediction, the hinge loss can be further extended to structured output spaces. Structured SVMs with margin rescaling use the following variant, where $\mathbf{w}$ denotes the SVM's parameters, $\mathbf{y}$ the SVM's predictions, $\phi$ the joint feature function, and $\Delta$ the Hamming loss (with $\mathbf{x}$ the input and $\mathbf{t}$ the intended output):
$$\ell(\mathbf{y}) = \max\left(0, \Delta(\mathbf{y}, \mathbf{t}) + \langle \mathbf{w}, \phi(\mathbf{x}, \mathbf{y}) \rangle - \langle \mathbf{w}, \phi(\mathbf{x}, \mathbf{t}) \rangle\right).$$
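As a rough illustration of margin rescaling, the sketch below evaluates the structured hinge loss for a single candidate output. Everything specific here is an assumption made for the example: the outputs are short sequences of labels in {-1, +1}, `hamming` counts differing positions without normalization, and `phi` is a made-up two-dimensional joint feature function.

```python
def hamming(y, t):
    """Unnormalized Hamming loss: number of positions where y and t differ."""
    return sum(1 for yi, ti in zip(y, t) if yi != ti)


def phi(x, y):
    """Hypothetical joint feature map for an input sequence x and labels y."""
    emission = sum(xi * yi for xi, yi in zip(x, y))        # input/label agreement
    smooth = sum(1 for a, b in zip(y, y[1:]) if a == b)    # equal adjacent labels
    return [float(emission), float(smooth)]


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def structured_hinge(w, x, y, t):
    """max(0, Delta(y, t) + <w, phi(x, y)> - <w, phi(x, t)>)"""
    return max(0.0, hamming(y, t) + dot(w, phi(x, y)) - dot(w, phi(x, t)))


# Hypothetical parameters, input, target and prediction.
w = [0.5, 1.0]
x = [1.0, -2.0, 0.5]
t = [1, -1, 1]
y = [1, 1, 1]

print(structured_hinge(w, x, y, t))  # 1.0
print(structured_hinge(w, x, t, t))  # 0.0
```

Here the wrong output `y` scores exactly as high as the target under `w`, so the loss equals the Hamming term alone: the required margin scales with how wrong the output is.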
Optimization
The hinge loss is a convex function, so many of the usual convex optimizers used in machine learning can work with it. It is not differentiable, but has a subgradient with respect to model parameters $\mathbf{w}$ of a linear SVM with score function $y = \mathbf{w} \cdot \mathbf{x}$ that is given by
$$\frac{\partial \ell}{\partial w_i} = \begin{cases} -t \cdot x_i & \text{if } t \cdot y < 1, \\ 0 & \text{otherwise.} \end{cases}$$
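The subgradient above is enough to run a simple descent procedure. The following sketch uses per-example subgradient steps on a tiny, made-up data set; the learning rate, the number of epochs, and the absence of a bias term and of regularization are simplifying assumptions, so this is not a full SVM trainer.

```python
def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def hinge_subgradient(w, x, t):
    """One valid subgradient of max(0, 1 - t * (w . x)) with respect to w."""
    if t * dot(w, x) < 1:
        return [-t * xi for xi in x]
    return [0.0] * len(w)  # also used at the kink t * y == 1


def train(data, dim, lr=0.1, epochs=20):
    """Per-example subgradient descent on the hinge loss (no regularizer, no bias)."""
    w = [0.0] * dim
    for _ in range(epochs):
        for x, t in data:
            g = hinge_subgradient(w, x, t)
            w = [wi - lr * gi for wi, gi in zip(w, g)]
    return w


# Hypothetical, linearly separable toy data: (input, label) pairs.
data = [([2.0, 1.0], 1), ([1.0, 2.0], 1), ([-1.0, -1.5], -1), ([-2.0, -0.5], -1)]
w = train(data, dim=2)
losses = [max(0.0, 1 - t * dot(w, x)) for x, t in data]
print(w)
print(losses)
```

Once every example satisfies $t \cdot y \ge 1$, all subgradients are zero and the weights stop changing, which is what happens on this toy data after the first pass.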
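However, since the derivative of the hinge loss at $ty = 1$ is undefined, smoothed versions may be preferred for optimization, such as Rennie and Srebro's
$$\ell(y) = \begin{cases} \frac{1}{2} - ty & \text{if } ty \le 0, \\ \frac{1}{2}(1 - ty)^2 & \text{if } 0 < ty < 1, \\ 0 & \text{if } 1 \le ty, \end{cases}$$
or the quadratically smoothed
$$\ell_\gamma(y) = \begin{cases} \frac{1}{2\gamma} \max(0, 1 - ty)^2 & \text{if } ty \ge 1 - \gamma, \\ 1 - \frac{\gamma}{2} - ty & \text{otherwise,} \end{cases}$$
suggested by Zhang.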
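Both smoothed variants are short piecewise functions of the product $ty$. The sketch below writes them out in plain Python; the function names are illustrative assumptions, and the sample values are arbitrary.

```python
def smooth_hinge(t, y):
    """Rennie and Srebro's smoothed hinge loss, as a function of z = t * y."""
    z = t * y
    if z <= 0:
        return 0.5 - z
    if z < 1:
        return 0.5 * (1 - z) ** 2
    return 0.0


def quad_smoothed_hinge(t, y, gamma):
    """Quadratically smoothed hinge loss with smoothing parameter gamma > 0."""
    z = t * y
    if z >= 1 - gamma:
        return max(0.0, 1 - z) ** 2 / (2 * gamma)
    return 1 - gamma / 2 - z


for z in (-1.0, 0.5, 2.0):
    print(z, smooth_hinge(1, z), quad_smoothed_hinge(1, z, gamma=1.0))
```

With `gamma=1.0` the two functions agree on these inputs, which follows directly from comparing the two piecewise definitions; smaller values of `gamma` shrink the quadratic region and bring the loss closer to the plain hinge.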
The modified Huber loss $L$ is a special case of this loss function with $\gamma = 2$, specifically $L(t, y) = 4 \ell_2(y)$.
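This relationship can be checked numerically. The sketch below assumes the commonly stated piecewise form of the modified Huber loss, $\max(0, 1 - ty)^2$ for $ty \ge -1$ and $-4ty$ otherwise, which is not spelled out in the text above; the function names are hypothetical.

```python
def quad_smoothed_hinge(t, y, gamma):
    """Quadratically smoothed hinge loss with smoothing parameter gamma > 0."""
    z = t * y
    if z >= 1 - gamma:
        return max(0.0, 1 - z) ** 2 / (2 * gamma)
    return 1 - gamma / 2 - z


def modified_huber(t, y):
    """Modified Huber loss in its usual piecewise form (assumed here)."""
    z = t * y
    if z >= -1:
        return max(0.0, 1 - z) ** 2
    return -4 * z


# Compare the two on a few arbitrary scores for t = +1.
for y in (-3.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0):
    a = modified_huber(1, y)
    b = 4 * quad_smoothed_hinge(1, y, gamma=2.0)
    print(y, a, b)
```

The two columns agree on every sampled score, including the switch point $ty = -1$ where both branches meet.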