The winnow algorithm (Littlestone 1988) is a technique from machine learning for learning a linear classifier from labeled examples. It is very similar to the perceptron algorithm. However, the perceptron algorithm uses an additive weight-update scheme, while Winnow uses a multiplicative scheme that allows it to perform much better when many dimensions are irrelevant (hence its name ''winnow''). It is a simple algorithm that scales well to high-dimensional data. During training, Winnow is shown a sequence of positive and negative examples. From these it learns a decision hyperplane that can then be used to label novel examples as positive or negative. The algorithm can also be used in the online learning setting, where the learning and the classification phases are not clearly separated.
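
To make the additive/multiplicative contrast concrete, here is a minimal Python sketch of how each algorithm reacts to a mistake on a positive example; the names w, x, eta, and alpha are illustrative, not from the original paper:

    # Reaction to a mistake on a positive example (true label 1, predicted 0).
    # Both functions take a weight vector w and a 0/1 feature vector x.

    def perceptron_promote(w, x, eta=1.0):
        # Perceptron: additive update, add a multiple of the example to the weights.
        return [wi + eta * xi for wi, xi in zip(w, x)]

    def winnow_promote(w, x, alpha=2.0):
        # Winnow: multiplicative update, scale up only the active features' weights.
        return [wi * alpha if xi == 1 else wi for wi, xi in zip(w, x)]

Because Winnow rescales weights multiplicatively, only a logarithmic number of promotions is needed to make a relevant feature's weight large, which is why its mistake bound grows only logarithmically with the total number of attributes.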


Algorithm

The basic algorithm, Winnow1, is as follows. The instance space is X=\{0,1\}^n, that is, each instance is described as a set of Boolean-valued features. The algorithm maintains non-negative weights w_i for i\in \{1,\ldots,n\}, which are initially set to 1, one weight for each feature. When the learner is given an example (x_1,\ldots,x_n), it applies the typical prediction rule for linear classifiers:

* If \sum_{i=1}^n w_i x_i > \Theta, then predict 1.
* Otherwise predict 0.

Here \Theta is a real number that is called the ''threshold''. Together with the weights, the threshold defines a dividing hyperplane in the instance space. Good bounds are obtained if \Theta=n/2 (see below).

For each example with which it is presented, the learner applies the following update rule:

* If an example is correctly classified, do nothing.
* If an example is predicted incorrectly and the correct result was 0, then for each feature with x_i=1, the corresponding weight w_i is set to 0 (demotion step):
*: \forall x_i = 1,\; w_i = 0
* If an example is predicted incorrectly and the correct result was 1, then for each feature with x_i=1, the corresponding weight w_i is multiplied by \alpha (promotion step):
*: \forall x_i = 1,\; w_i = \alpha w_i

A typical value for \alpha is 2. A runnable sketch of Winnow1 appears below.

There are many variations to this basic approach. ''Winnow2'' is similar, except that in the demotion step the weights are divided by \alpha instead of being set to 0. ''Balanced Winnow'' maintains two sets of weights, and thus two hyperplanes. This can then be generalized for multi-label classification.
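
The rules above translate directly into code. The following is a minimal sketch of Winnow1 in Python, assuming examples arrive as pairs of a 0/1 feature list and a 0/1 label; the function name and interface are illustrative:

    def winnow1(examples, n, alpha=2.0):
        """Run Winnow1 over a sequence of (x, y) pairs.

        x is a list of n Boolean (0/1) features; y is the 0/1 label.
        Returns the learned weight vector.
        """
        theta = n / 2.0            # the threshold value that gives good bounds
        w = [1.0] * n              # all weights start at 1
        for x, y in examples:
            prediction = 1 if sum(wi * xi for wi, xi in zip(w, x)) > theta else 0
            if prediction == y:
                continue           # correctly classified: do nothing
            elif y == 0:
                # Demotion step: zero the weights of all active features.
                w = [0.0 if xi == 1 else wi for wi, xi in zip(w, x)]
            else:
                # Promotion step: multiply the weights of all active features by alpha.
                w = [alpha * wi if xi == 1 else wi for wi, xi in zip(w, x)]
        return w

Winnow2 would differ only in the demotion branch, where 0.0 is replaced by wi / alpha, so that demoted features can later recover.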


Mistake bounds

In certain circumstances, it can be shown that the number of mistakes Winnow makes as it learns has an upper bound that is independent of the number of instances with which it is presented. If the Winnow1 algorithm uses \alpha > 1 and \Theta \geq 1/\alpha on a target function that is a k-literal monotone disjunction given by f(x_1,\ldots,x_n)=x_{i_1}\cup \cdots \cup x_{i_k}, then for any sequence of instances the total number of mistakes is bounded by

*: \alpha k ( \log_\alpha \Theta + 1 ) + \frac{n}{\Theta}

(Littlestone 1989).
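
This bound is easy to check empirically. The sketch below runs Winnow1 on random Boolean instances labeled by a monotone disjunction of the first k features and compares the observed mistake count with the bound; the target choice, the uniform instance distribution, and all names are illustrative assumptions:

    import math
    import random

    def mistakes_on_disjunction(n=100, k=3, alpha=2.0, rounds=5000, seed=0):
        rng = random.Random(seed)
        theta = n / 2.0                 # satisfies theta >= 1/alpha
        w = [1.0] * n
        mistakes = 0
        for _ in range(rounds):
            x = [rng.randint(0, 1) for _ in range(n)]
            y = 1 if any(x[:k]) else 0  # target: x_1 OR ... OR x_k
            prediction = 1 if sum(wi * xi for wi, xi in zip(w, x)) > theta else 0
            if prediction != y:
                mistakes += 1
                if y == 0:
                    w = [0.0 if xi else wi for wi, xi in zip(w, x)]
                else:
                    w = [alpha * wi if xi else wi for wi, xi in zip(w, x)]
        bound = alpha * k * (math.log(theta, alpha) + 1) + n / theta
        return mistakes, bound

    # For these parameters the bound evaluates to about 42, and the observed
    # mistake count stays below it no matter how many rounds are run.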


References

* Littlestone, Nick (1988). "Learning Quickly When Irrelevant Attributes Abound: A New Linear-threshold Algorithm". ''Machine Learning'' 2 (4): 285–318.
* Littlestone, Nick (1989). "Mistake bounds and logarithmic linear-threshold learning algorithms". Technical report UCSC-CRL-89-11, University of California, Santa Cruz.