In machine learning, a margin classifier is a classifier which is able to give an associated distance from the decision boundary for each example. For instance, if a linear classifier (e.g. perceptron or linear discriminant analysis) is used, the distance (typically the Euclidean distance, though others may be used) of an example from the separating hyperplane is the margin of that example.
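A minimal sketch in Python (using NumPy; the weight vector and bias below are hypothetical values chosen only for illustration) of computing this margin for a linear classifier:

<syntaxhighlight lang="python">
import numpy as np

# Hypothetical linear classifier with decision rule sign(w . x + b);
# w and b are assumed values for illustration only.
w = np.array([2.0, -1.0])
b = 0.5

def margin(x, y):
    """Signed Euclidean distance of example x (label y in {-1, +1})
    from the hyperplane w . x + b = 0; positive if and only if x is
    classified correctly."""
    return y * (np.dot(w, x) + b) / np.linalg.norm(w)

print(margin(np.array([1.0, 1.0]), +1))  # ~0.67: correctly classified
</syntaxhighlight>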
The notion of margin is important in several machine learning classification algorithms, as it can be used to bound the generalization error of the classifier. These bounds are frequently shown using the VC dimension. Of particular prominence is the generalization error bound on boosting algorithms and support vector machines.
Support vector machine definition of margin
See support vector machines and maximum-margin hyperplane for details.
Margin for boosting algorithms
The margin for an iterative
boosting algorithm given a set of examples with two classes can be defined as follows. The classifier is given an example pair
where
is a domain space and
is the label of the example. The iterative boosting algorithm then selects a classifier
at each iteration
where
is a space of possible classifiers that predict real values. This hypothesis is then weighted by
as selected by the boosting algorithm. At iteration
, the margin of an example
can thus be defined as
:
By this definition, the margin is positive if the example is labeled correctly and negative if the example is labeled incorrectly.
This definition may be modified and is not the only way to define margin for boosting algorithms. However, there are reasons why this definition may be appealing.
Robert E. Schapire, Yoav Freund, Peter Bartlett and Wee Sun Lee (1998). "Boosting the margin: A new explanation for the effectiveness of voting methods", ''The Annals of Statistics'', 26(5):1651–1686.
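A minimal sketch of this computation (the base classifiers and weights below are hypothetical, chosen only to illustrate the formula above):

<syntaxhighlight lang="python">
# Hypothetical base classifiers h_j selected at iterations 1..t, each
# predicting a real value (here simple decision stumps on a 2-D input).
hypotheses = [lambda x: 1.0 if x[0] > 0 else -1.0,
              lambda x: 1.0 if x[1] > 0 else -1.0]
alphas = [0.7, 0.3]  # weights alpha_j chosen by the boosting algorithm

def boosting_margin(x, y):
    """Margin of example (x, y) after t iterations:
    y * sum_j alpha_j * h_j(x) / sum_j |alpha_j|."""
    score = sum(a * h(x) for a, h in zip(alphas, hypotheses))
    return y * score / sum(abs(a) for a in alphas)

print(boosting_margin((1.0, -1.0), +1))   # 0.4: labeled correctly
print(boosting_margin((-1.0, -1.0), +1))  # -1.0: labeled incorrectly
</syntaxhighlight>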
Examples of margin-based algorithms
Many classifiers can give an associated margin for each example. However, only some classifiers make use of this margin information while learning from a data set.
Many boosting algorithms rely on the notion of a margin to assign weights to examples. If a convex loss is utilized (as in AdaBoost, LogitBoost, and all members of the AnyBoost family of algorithms), then an example with a higher margin will receive less (or equal) weight than an example with a lower margin. This leads the boosting algorithm to focus its weight on low-margin examples. In nonconvex algorithms (e.g. BrownBoost), the margin still dictates the weighting of an example, though the weighting is non-monotone with respect to the margin. There exist boosting algorithms that provably maximize the minimum margin (e.g. see
Manfred Warmuth, Karen Glocer and Gunnar Rätsch (2007). "Boosting Algorithms for Maximizing the Soft Margin", in ''Advances in Neural Information Processing Systems 20'', pp. 1585–1592).
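For instance, AdaBoost's exponential loss assigns each example a weight proportional to exp(−score), where the score is the example's unnormalized voted margin. The sketch below (with purely illustrative scores, not tied to any particular implementation) shows how the weight decreases monotonically as the margin grows:

<syntaxhighlight lang="python">
import numpy as np

# Unnormalized voted scores y_i * sum_j alpha_j h_j(x_i) for four examples.
scores = np.array([-0.5, 0.0, 0.5, 2.0])

# AdaBoost's convex (exponential) loss: weight ~ exp(-score), so examples
# with small or negative margin dominate the next round's weighting.
weights = np.exp(-scores)
weights /= weights.sum()  # normalize to a distribution over examples

print(weights)  # largest weight on the -0.5 example, smallest on 2.0
</syntaxhighlight>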
Support vector machines provably maximize the margin of the separating hyperplane. Support vector machines that are trained using noisy data (where no perfect separation of the data exists in the given space) maximize the soft margin. More discussion of this can be found in the support vector machine article.
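As a concrete illustration, scikit-learn's SVC (one widely used implementation, shown here only as an example) exposes the soft-margin trade-off through its C parameter:

<syntaxhighlight lang="python">
import numpy as np
from sklearn.svm import SVC

# Toy two-class data with one mislabeled (noisy) point, so no perfect
# separation exists and the soft margin comes into play.
X = np.array([[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [0.9, 1.1], [0.1, 0.2]])
y = np.array([-1, -1, 1, 1, 1])  # last point is noise in the -1 cluster

# Smaller C tolerates more margin violations (a softer margin).
clf = SVC(kernel="linear", C=1.0)
clf.fit(X, y)

# Signed scores w . x + b; dividing by ||w|| gives geometric margins.
print(clf.decision_function(X))
</syntaxhighlight>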
The voted-perceptron algorithm is a margin-maximizing algorithm based on an iterative application of the classic perceptron algorithm.
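The following is a rough Python sketch of the voted-perceptron idea (simplified from Freund and Schapire's description; the toy data are illustrative):

<syntaxhighlight lang="python">
import numpy as np

def voted_perceptron_train(X, y, epochs=10):
    """Train a voted perceptron: keep every intermediate weight vector
    together with the number of examples it survived (its vote)."""
    w = np.zeros(X.shape[1])
    c = 1                      # survival count of the current w
    perceptrons = []           # list of (weight vector, vote) pairs
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if yi * np.dot(w, xi) <= 0:   # mistake: store old w, update
                perceptrons.append((w.copy(), c))
                w = w + yi * xi
                c = 1
            else:
                c += 1
    perceptrons.append((w, c))
    return perceptrons

def voted_perceptron_predict(perceptrons, x):
    """Weighted vote of all stored perceptrons."""
    s = sum(c * np.sign(np.dot(w, x)) for w, c in perceptrons)
    return 1 if s >= 0 else -1

X = np.array([[1.0, 1.0], [-1.0, -1.0], [1.0, 0.5], [-0.5, -1.0]])
y = np.array([1, -1, 1, -1])
model = voted_perceptron_train(X, y)
print([voted_perceptron_predict(model, xi) for xi in X])  # [1, -1, 1, -1]
</syntaxhighlight>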
Generalization error bounds
One theoretical motivation behind margin classifiers is that their generalization error may be bounded by parameters of the algorithm and a margin term. An example of such a bound is the one for the AdaBoost algorithm.
Let <math>S</math> be a set of <math>m</math> examples sampled independently at random from a distribution <math>D</math>. Assume the VC-dimension of the underlying base classifier is <math>d</math> and <math>m \geq d \geq 1</math>. Then with probability <math>1 - \delta</math> we have the bound

: <math>P_D\left(\frac{y f(x)}{\sum_j |\alpha_j|} \leq 0\right) \leq P_S\left(\frac{y f(x)}{\sum_j |\alpha_j|} \leq \theta\right) + O\left(\frac{1}{\sqrt{m}} \sqrt{\frac{d \log^2(m/d)}{\theta^2} + \log(1/\delta)}\right)</math>

for all <math>\theta > 0</math>, where <math>f(x) = \sum_j \alpha_j h_j(x)</math> is the weighted vote of the base classifiers, so that <math>y f(x) / \sum_j |\alpha_j|</math> is exactly the margin defined above.
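The term inside the <math>O(\cdot)</math> can be evaluated numerically to see how the bound scales; the sketch below omits the hidden constant, so the numbers indicate scaling behaviour only (all parameter values are illustrative):

<syntaxhighlight lang="python">
import math

def margin_bound_term(m, d, theta, delta):
    """Scaling term of the AdaBoost margin bound (big-O constant omitted):
    (1/sqrt(m)) * sqrt(d * log(m/d)**2 / theta**2 + log(1/delta))."""
    return (1 / math.sqrt(m)) * math.sqrt(
        d * math.log(m / d) ** 2 / theta ** 2 + math.log(1 / delta))

# More data shrinks the term; a smaller margin threshold theta inflates it.
print(margin_bound_term(m=10_000, d=10, theta=0.5, delta=0.05))
print(margin_bound_term(m=100_000, d=10, theta=0.5, delta=0.05))
print(margin_bound_term(m=10_000, d=10, theta=0.1, delta=0.05))
</syntaxhighlight>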
References
{{reflist}}