In machine learning, a margin classifier is a classifier which is able to give an associated distance from the decision boundary for each example. For instance, if a linear classifier (e.g. perceptron or linear discriminant analysis) is used, the distance (typically the Euclidean distance, though others may be used) of an example from the separating hyperplane is the margin of that example.
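A minimal sketch in Python (using NumPy; the weight vector and bias below are hypothetical values chosen only for illustration) of computing this margin for a linear classifier:

<syntaxhighlight lang="python">
import numpy as np

# Hypothetical linear classifier with decision rule sign(w . x + b);
# w and b are assumed values for illustration only.
w = np.array([2.0, -1.0])
b = 0.5

def margin(x, y):
    """Signed Euclidean distance of example x (label y in {-1, +1})
    from the hyperplane w . x + b = 0; positive if and only if x is
    classified correctly."""
    return y * (np.dot(w, x) + b) / np.linalg.norm(w)

print(margin(np.array([1.0, 1.0]), +1))  # ~0.67: correctly classified
</syntaxhighlight>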
The notion of margin is important in several machine learning classification algorithms, as it can be used to bound the generalization error of the classifier. These bounds are frequently shown using the VC dimension. Of particular prominence is the generalization error bound on boosting algorithms and support vector machines.
Support vector machine definition of margin
See support vector machines and maximum-margin hyperplane for details.
Margin for boosting algorithms
The margin for an iterative
boosting algorithm given a set of examples with two classes can be defined as follows. The classifier is given an example pair
where
is a domain space and
is the label of the example. The iterative boosting algorithm then selects a classifier
at each iteration
where
is a space of possible classifiers that predict real values. This hypothesis is then weighted by
as selected by the boosting algorithm. At iteration
, the margin of an example
can thus be defined as
:
By this definition, the margin is positive if the example is labeled correctly and negative if the example is labeled incorrectly.
This definition may be modified and is not the only way to define margin for boosting algorithms. However, there are reasons why this definition may be appealing.
Robert E. Schapire, Yoav Freund, Peter Bartlett and Wee Sun Lee (1998). "Boosting the margin: A new explanation for the effectiveness of voting methods", ''The Annals of Statistics'', 26(5):1651–1686.
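A minimal sketch of this computation (the base classifiers and weights below are hypothetical, chosen only to illustrate the formula above):

<syntaxhighlight lang="python">
# Hypothetical base classifiers h_j selected at iterations 1..t, each
# predicting a real value (here simple decision stumps on a 2-D input).
hypotheses = [lambda x: 1.0 if x[0] > 0 else -1.0,
              lambda x: 1.0 if x[1] > 0 else -1.0]
alphas = [0.7, 0.3]  # weights alpha_j chosen by the boosting algorithm

def boosting_margin(x, y):
    """Margin of example (x, y) after t iterations:
    y * sum_j alpha_j * h_j(x) / sum_j |alpha_j|."""
    score = sum(a * h(x) for a, h in zip(alphas, hypotheses))
    return y * score / sum(abs(a) for a in alphas)

print(boosting_margin((1.0, -1.0), +1))   # 0.4: labeled correctly
print(boosting_margin((-1.0, -1.0), +1))  # -1.0: labeled incorrectly
</syntaxhighlight>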
Examples of margin-based algorithms
Many classifiers can give an associated margin for each example. However, only some classifiers make use of this margin information while learning from a data set.
Many boosting algorithms rely on the notion of a margin to assign weights to examples. If a convex loss is utilized (as in AdaBoost, LogitBoost, and all members of the AnyBoost family of algorithms), then an example with a higher margin will receive less (or equal) weight than an example with a lower margin. This leads the boosting algorithm to focus its weight on low-margin examples. In nonconvex algorithms (e.g. BrownBoost), the margin still dictates the weighting of an example, though the weighting is non-monotone with respect to the margin. There exist boosting algorithms that provably maximize the minimum margin (e.g. see
Manfred Warmuth, Karen Glocer and Gunnar Rätsch (2007). "Boosting Algorithms for Maximizing the Soft Margin", in ''Advances in Neural Information Processing Systems 20'', pp. 1585–1592).
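For instance, AdaBoost's exponential loss assigns each example a weight proportional to exp(−score), where the score is the example's unnormalized voted margin. The sketch below (with purely illustrative scores, not tied to any particular implementation) shows how the weight decreases monotonically as the margin grows:

<syntaxhighlight lang="python">
import numpy as np

# Unnormalized voted scores y_i * sum_j alpha_j h_j(x_i) for four examples.
scores = np.array([-0.5, 0.0, 0.5, 2.0])

# AdaBoost's convex (exponential) loss: weight ~ exp(-score), so examples
# with small or negative margin dominate the next round's weighting.
weights = np.exp(-scores)
weights /= weights.sum()  # normalize to a distribution over examples

print(weights)  # largest weight on the -0.5 example, smallest on 2.0
</syntaxhighlight>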
Support vector machines provably maximize the margin of the separating hyperplane. Support vector machines that are trained using noisy data (where no perfect separation of the data exists in the given space) maximize the soft margin. More discussion of this can be found in the support vector machine article.
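As a concrete illustration, scikit-learn's SVC (one widely used implementation, shown here only as an example) exposes the soft-margin trade-off through its C parameter:

<syntaxhighlight lang="python">
import numpy as np
from sklearn.svm import SVC

# Toy two-class data with one mislabeled (noisy) point, so no perfect
# separation exists and the soft margin comes into play.
X = np.array([[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [0.9, 1.1], [0.1, 0.2]])
y = np.array([-1, -1, 1, 1, 1])  # last point is noise in the -1 cluster

# Smaller C tolerates more margin violations (a softer margin).
clf = SVC(kernel="linear", C=1.0)
clf.fit(X, y)

# Signed scores w . x + b; dividing by ||w|| gives geometric margins.
print(clf.decision_function(X))
</syntaxhighlight>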
The voted-perceptron algorithm is a margin-maximizing algorithm based on an iterative application of the classic perceptron algorithm.
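The following is a rough Python sketch of the voted-perceptron idea (simplified from Freund and Schapire's description; the toy data are illustrative):

<syntaxhighlight lang="python">
import numpy as np

def voted_perceptron_train(X, y, epochs=10):
    """Train a voted perceptron: keep every intermediate weight vector
    together with the number of examples it survived (its vote)."""
    w = np.zeros(X.shape[1])
    c = 1                      # survival count of the current w
    perceptrons = []           # list of (weight vector, vote) pairs
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if yi * np.dot(w, xi) <= 0:   # mistake: store old w, update
                perceptrons.append((w.copy(), c))
                w = w + yi * xi
                c = 1
            else:
                c += 1
    perceptrons.append((w, c))
    return perceptrons

def voted_perceptron_predict(perceptrons, x):
    """Weighted vote of all stored perceptrons."""
    s = sum(c * np.sign(np.dot(w, x)) for w, c in perceptrons)
    return 1 if s >= 0 else -1

X = np.array([[1.0, 1.0], [-1.0, -1.0], [1.0, 0.5], [-0.5, -1.0]])
y = np.array([1, -1, 1, -1])
model = voted_perceptron_train(X, y)
print([voted_perceptron_predict(model, xi) for xi in X])  # [1, -1, 1, -1]
</syntaxhighlight>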
Generalization error bounds
One theoretical motivation behind margin classifiers is that their generalization error may be bounded by parameters of the algorithm and a margin term. An example of such a bound is the one for the AdaBoost algorithm.
Let <math>S</math> be a set of <math>m</math> examples sampled independently at random from a distribution <math>D</math>. Assume the VC-dimension of the underlying base classifier is <math>d</math> and <math>m \geq d \geq 1</math>. Then with probability <math>1 - \delta</math> we have the bound

: <math>P_D\left(\frac{y f(x)}{\sum_j |\alpha_j|} \leq 0\right) \leq P_S\left(\frac{y f(x)}{\sum_j |\alpha_j|} \leq \theta\right) + O\left(\frac{1}{\sqrt{m}} \sqrt{\frac{d \log^2(m/d)}{\theta^2} + \log(1/\delta)}\right)</math>

for all <math>\theta > 0</math>, where <math>f(x) = \sum_j \alpha_j h_j(x)</math> is the weighted vote of the base classifiers, so that <math>y f(x) / \sum_j |\alpha_j|</math> is exactly the margin defined above.
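The term inside the <math>O(\cdot)</math> can be evaluated numerically to see how the bound scales; the sketch below omits the hidden constant, so the numbers indicate scaling behaviour only (all parameter values are illustrative):

<syntaxhighlight lang="python">
import math

def margin_bound_term(m, d, theta, delta):
    """Scaling term of the AdaBoost margin bound (big-O constant omitted):
    (1/sqrt(m)) * sqrt(d * log(m/d)**2 / theta**2 + log(1/delta))."""
    return (1 / math.sqrt(m)) * math.sqrt(
        d * math.log(m / d) ** 2 / theta ** 2 + math.log(1 / delta))

# More data shrinks the term; a smaller margin threshold theta inflates it.
print(margin_bound_term(m=10_000, d=10, theta=0.5, delta=0.05))
print(margin_bound_term(m=100_000, d=10, theta=0.5, delta=0.05))
print(margin_bound_term(m=10_000, d=10, theta=0.1, delta=0.05))
</syntaxhighlight>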
References
{{reflist}}