In
machine learning
Machine learning (ML) is a field of inquiry devoted to understanding and building methods that 'learn', that is, methods that leverage data to improve performance on some set of tasks. It is seen as a part of artificial intelligence.
Machine ...
, boosting is an
ensemble
Ensemble may refer to:
Art
* Architectural ensemble
* Ensemble (album), ''Ensemble'' (album), Kendji Girac 2015 album
* Ensemble (band), a project of Olivier Alary
* Ensemble cast (drama, comedy)
* Ensemble (musical theatre), also known as the ...
meta-algorithm for primarily reducing
bias
Bias is a disproportionate weight ''in favor of'' or ''against'' an idea or thing, usually in a way that is closed-minded, prejudicial, or unfair. Biases can be innate or learned. People may develop biases for or against an individual, a group ...
, and also variance in
supervised learning
Supervised learning (SL) is a machine learning paradigm for problems where the available data consists of labelled examples, meaning that each data point contains features (covariates) and an associated label. The goal of supervised learning alg ...
, and a family of machine learning algorithms that convert weak learners to strong ones. Boosting is based on the question posed by
Kearns and
Valiant (1988, 1989):
[Michael Kearns(1988)]
''Thoughts on Hypothesis Boosting''
Unpublished manuscript (Machine Learning class project, December 1988) "Can a set of weak learners create a single strong learner?" A weak learner is defined to be a
classifier that is only slightly correlated with the true classification (it can label examples better than random guessing). In contrast, a strong learner is a classifier that is arbitrarily well-correlated with the true classification.
Robert Schapire
Robert Elias Schapire is an American computer scientist, former David M. Siegel '83 Professor in the computer science department at Princeton University, and has recently moved to Microsoft Research. His primary specialty is theoretical and app ...
's affirmative answer in a 1990 paper
to the question of Kearns and Valiant has had significant ramifications in
machine learning
Machine learning (ML) is a field of inquiry devoted to understanding and building methods that 'learn', that is, methods that leverage data to improve performance on some set of tasks. It is seen as a part of artificial intelligence.
Machine ...
and
statistics, most notably leading to the development of boosting.
When first introduced, the ''hypothesis boosting problem'' simply referred to the process of turning a weak learner into a strong learner. "Informally,
he hypothesis boostingproblem asks whether an efficient learning algorithm
that outputs a hypothesis whose performance is only slightly better than random guessing
.e. a weak learnerimplies the existence of an efficient algorithm that outputs a hypothesis of arbitrary accuracy
.e. a strong learner"
Algorithms that achieve hypothesis boosting quickly became simply known as "boosting".
Freund and Schapire's arcing (Adapt
tve Resampling and Combining), as a general technique, is more or less synonymous with boosting.
Boosting algorithms
While boosting is not algorithmically constrained, most boosting algorithms consist of iteratively learning weak classifiers with respect to a distribution and adding them to a final strong classifier. When they are added, they are weighted in a way that is related to the weak learners' accuracy. After a weak learner is added, the data weights are readjusted, known as "re-
weighting
The process of weighting involves emphasizing the contribution of particular aspects of a phenomenon (or of a set of data) over others to an outcome or result; thereby highlighting those aspects in comparison to others in the analysis. That i ...
". Misclassified input data gain a higher weight and examples that are classified correctly lose weight. Thus, future weak learners focus more on the examples that previous weak learners misclassified.
There are many boosting algorithms. The original ones, proposed by
Robert Schapire
Robert Elias Schapire is an American computer scientist, former David M. Siegel '83 Professor in the computer science department at Princeton University, and has recently moved to Microsoft Research. His primary specialty is theoretical and app ...
(a
recursive
Recursion (adjective: ''recursive'') occurs when a thing is defined in terms of itself or of its type. Recursion is used in a variety of disciplines ranging from linguistics to logic. The most common application of recursion is in mathematics ...
majority gate formulation)
and
Yoav Freund
Joab (Hebrew Modern: ''Yōʼav'', Tiberian: ''Yōʼāḇ'') the son of Zeruiah, was the nephew of King David and the commander of his army, according to the Hebrew Bible.
Name
The name Joab is, like many other Hebrew names, theophoric - der ...
(boost by majority),
[Llew Mason, Jonathan Baxter, Peter Bartlett, and Marcus Frean (2000); ''Boosting Algorithms as Gradient Descent'', in S. A. Solla, T. K. Leen, and K.-R. Muller, editors, ''Advances in Neural Information Processing Systems'' 12, pp. 512-518, MIT Press] were not
adaptive
Adaptation, in biology, is the process or trait by which organisms or population better match their environment
Adaptation may also refer to:
Arts
* Adaptation (arts), a transfer of a work of art from one medium to another
** Film adaptation, a ...
and could not take full advantage of the weak learners. Schapire and Freund then developed
AdaBoost
AdaBoost, short for ''Adaptive Boosting'', is a statistical classification meta-algorithm formulated by Yoav Freund and Robert Schapire in 1995, who won the 2003 Gödel Prize for their work. It can be used in conjunction with many other types of ...
, an adaptive boosting algorithm that won the prestigious
Gödel Prize
The Gödel Prize is an annual prize for outstanding papers in the area of theoretical computer science, given jointly by the European Association for Theoretical Computer Science (EATCS) and the Association for Computing Machinery Special Interes ...
.
Only algorithms that are provable boosting algorithms in the
probably approximately correct learning
In computational learning theory, probably approximately correct (PAC) learning is a framework for mathematical analysis of machine learning. It was proposed in 1984 by Leslie Valiant.L. Valiant. A theory of the learnable.' Communications of the ...
formulation can accurately be called ''boosting algorithms''. Other algorithms that are similar in spirit to boosting algorithms are sometimes called "leveraging algorithms", although they are also sometimes incorrectly called boosting algorithms.
The main variation between many boosting algorithms is their method of
weighting
The process of weighting involves emphasizing the contribution of particular aspects of a phenomenon (or of a set of data) over others to an outcome or result; thereby highlighting those aspects in comparison to others in the analysis. That i ...
training data
In machine learning, a common task is the study and construction of algorithms that can learn from and make predictions on data. Such algorithms function by making data-driven predictions or decisions, through building a mathematical model from ...
points and
hypotheses.
AdaBoost
AdaBoost, short for ''Adaptive Boosting'', is a statistical classification meta-algorithm formulated by Yoav Freund and Robert Schapire in 1995, who won the 2003 Gödel Prize for their work. It can be used in conjunction with many other types of ...
is very popular and the most significant historically as it was the first algorithm that could adapt to the weak learners. It is often the basis of introductory coverage of boosting in university machine learning courses. There are many more recent algorithms such as
LPBoost Linear Programming Boosting (LPBoost) is a supervised classifier from the boosting family of classifiers. LPBoost maximizes a ''margin'' between training samples of different classes and hence also belongs to the class of margin-maximizing superv ...
, TotalBoost,
BrownBoost BrownBoost is a boosting algorithm that may be robust to noisy datasets. BrownBoost is an adaptive version of the boost by majority algorithm. As is true for all boosting algorithms, BrownBoost is used in conjunction with other machine learning ...
,
xgboost, MadaBoost,
LogitBoost In machine learning and computational learning theory, LogitBoost is a boosting algorithm formulated by Jerome Friedman, Trevor Hastie, and Robert Tibshirani. The original paper casts the AdaBoost algorithm into a statistical framework. Specif ...
, and others. Many boosting algorithms fit into the AnyBoost framework,
which shows that boosting performs
gradient descent
In mathematics, gradient descent (also often called steepest descent) is a first-order iterative optimization algorithm for finding a local minimum of a differentiable function. The idea is to take repeated steps in the opposite direction of the ...
in a
function space using a
convex
Convex or convexity may refer to:
Science and technology
* Convex lens, in optics
Mathematics
* Convex set, containing the whole line segment that joins points
** Convex polygon, a polygon which encloses a convex set of points
** Convex polytop ...
cost function.
Object categorization in computer vision
Given images containing various known objects in the world, a classifier can be learned from them to automatically
classify the objects in future images. Simple classifiers built based on some
image feature of the object tend to be weak in categorization performance. Using boosting methods for object categorization is a way to unify the weak classifiers in a special way to boost the overall ability of categorization.
Problem of object categorization
Object categorization is a typical task of
computer vision that involves determining whether or not an image contains some specific category of object. The idea is closely related with recognition, identification, and detection. Appearance based object categorization typically contains
feature extraction
In machine learning, pattern recognition, and image processing, feature extraction starts from an initial set of measured data and builds derived values (features) intended to be informative and non-redundant, facilitating the subsequent learning a ...
,
learning a
classifier, and applying the classifier to new examples. There are many ways to represent a category of objects, e.g. from
shape analysis,
bag of words model
The bag-of-words model is a simplifying representation used in natural language processing and information retrieval (IR). In this model, a text (such as a sentence or a document) is represented as the bag (multiset) of its words, disregarding g ...
s, or local descriptors such as
SIFT, etc. Examples of
supervised classifiers are
Naive Bayes classifiers,
support vector machines,
mixtures of Gaussians, and
neural networks. However, research has shown that object categories and their locations in images can be discovered in an
unsupervised manner as well.
Status quo for object categorization
The recognition of object categories in images is a challenging problem in
computer vision, especially when the number of categories is large. This is due to high intra class variability and the need for generalization across variations of objects within the same category. Objects within one category may look quite different. Even the same object may appear unalike under different viewpoint,
scale, and
illumination. Background clutter and partial occlusion add difficulties to recognition as well. Humans are able to recognize thousands of object types, whereas most of the existing
object recognition
Object recognition – technology in the field of computer vision for finding and identifying objects in an image or video sequence. Humans recognize a multitude of objects in images with little effort, despite the fact that the image of the ...
systems are trained to recognize only a few, e.g.
human faces
The face is the front of an animal's head that features the eyes, nose and mouth, and through which animals express many of their emotions. The face is crucial for human identity, and damage such as scarring or developmental deformities may aff ...
,
cars, simple objects, etc. Research has been very active on dealing with more categories and enabling incremental additions of new categories, and although the general problem remains unsolved, several multi-category objects detectors (for up to hundreds or thousands of categories) have been developed. One means is by
feature
Feature may refer to:
Computing
* Feature (CAD), could be a hole, pocket, or notch
* Feature (computer vision), could be an edge, corner or blob
* Feature (software design) is an intentional distinguishing characteristic of a software item ...
sharing and boosting.
Boosting for binary categorization
AdaBoost can be used for face detection as an example of
binary categorization. The two categories are faces versus background. The general algorithm is as follows:
#Form a large set of simple features
#Initialize weights for training images
#For T rounds
##Normalize the weights
##For available features from the set, train a classifier using a single feature and evaluate the training error
##Choose the classifier with the lowest error
##Update the weights of the training images: increase if classified wrongly by this classifier, decrease if correctly
#Form the final strong classifier as the linear combination of the T classifiers (coefficient larger if training error is small)
After boosting, a classifier constructed from 200 features could yield a 95% detection rate under a
false positive rate
In statistics, when performing multiple comparisons, a false positive ratio (also known as fall-out or false alarm ratio) is the probability of falsely rejecting the null hypothesis for a particular test. The false positive rate is calculated as th ...
.
Another application of boosting for binary categorization is a system that detects pedestrians using
patterns
A pattern is a regularity in the world, in human-made design, or in abstract ideas. As such, the elements of a pattern repeat in a predictable manner. A geometric pattern is a kind of pattern formed of geometric shapes and typically repeated li ...
of motion and appearance. This work is the first to combine both motion information and appearance information as features to detect a walking person. It takes a similar approach to the
Viola-Jones object detection framework.
Boosting for multi-class categorization
Compared with binary categorization,
multi-class categorization looks for common features that can be shared across the categories at the same time. They turn to be more generic
edge
Edge or EDGE may refer to:
Technology Computing
* Edge computing, a network load-balancing system
* Edge device, an entry point to a computer network
* Adobe Edge, a graphical development application
* Microsoft Edge, a web browser developed ...
like features. During learning, the detectors for each category can be trained jointly. Compared with training separately, it
generalizes better, needs less training data, and requires fewer features to achieve the same performance.
The main flow of the algorithm is similar to the binary case. What is different is that a measure of the joint training error shall be defined in advance. During each iteration the algorithm chooses a classifier of a single feature (features that can be shared by more categories shall be encouraged). This can be done via converting
multi-class classification
In machine learning and statistical classification, multiclass classification or multinomial classification is the problem of classifying instances into one of three or more classes (classifying instances into one of two classes is called binary c ...
into a binary one (a set of categories versus the rest), or by introducing a penalty error from the categories that do not have the feature of the classifier.
In the paper "Sharing visual features for multiclass and multiview object detection", A. Torralba et al. used
GentleBoost for boosting and showed that when training data is limited, learning via sharing features does a much better job than no sharing, given same boosting rounds. Also, for a given performance level, the total number of features required (and therefore the run time cost of the classifier) for the feature sharing detectors, is observed to scale approximately
logarithm
In mathematics, the logarithm is the inverse function to exponentiation. That means the logarithm of a number to the base is the exponent to which must be raised, to produce . For example, since , the ''logarithm base'' 10 of ...
ically with the number of class, i.e., slower than
linear
Linearity is the property of a mathematical relationship ('' function'') that can be graphically represented as a straight line. Linearity is closely related to '' proportionality''. Examples in physics include rectilinear motion, the linear ...
growth in the non-sharing case. Similar results are shown in the paper "Incremental learning of object detectors using a visual shape alphabet", yet the authors used
AdaBoost
AdaBoost, short for ''Adaptive Boosting'', is a statistical classification meta-algorithm formulated by Yoav Freund and Robert Schapire in 1995, who won the 2003 Gödel Prize for their work. It can be used in conjunction with many other types of ...
for boosting.
Convex vs. non-convex boosting algorithms
Boosting algorithms can be based on
convex
Convex or convexity may refer to:
Science and technology
* Convex lens, in optics
Mathematics
* Convex set, containing the whole line segment that joins points
** Convex polygon, a polygon which encloses a convex set of points
** Convex polytop ...
or non-convex optimization algorithms. Convex algorithms, such as
AdaBoost
AdaBoost, short for ''Adaptive Boosting'', is a statistical classification meta-algorithm formulated by Yoav Freund and Robert Schapire in 1995, who won the 2003 Gödel Prize for their work. It can be used in conjunction with many other types of ...
and
LogitBoost In machine learning and computational learning theory, LogitBoost is a boosting algorithm formulated by Jerome Friedman, Trevor Hastie, and Robert Tibshirani. The original paper casts the AdaBoost algorithm into a statistical framework. Specif ...
, can be "defeated" by random noise such that they can't learn basic and learnable combinations of weak hypotheses.
[P. Long and R. Servedio. 25th International Conference on Machine Learning (ICML), 2008, pp. 608--615.] This limitation was pointed out by Long & Servedio in 2008. However, by 2009, multiple authors demonstrated that boosting algorithms based on non-convex optimization, such as
BrownBoost BrownBoost is a boosting algorithm that may be robust to noisy datasets. BrownBoost is an adaptive version of the boost by majority algorithm. As is true for all boosting algorithms, BrownBoost is used in conjunction with other machine learning ...
, can learn from noisy datasets and can specifically learn the underlying classifier of the Long–Servedio dataset.
See also
*
AdaBoost
AdaBoost, short for ''Adaptive Boosting'', is a statistical classification meta-algorithm formulated by Yoav Freund and Robert Schapire in 1995, who won the 2003 Gödel Prize for their work. It can be used in conjunction with many other types of ...
*
Random forest
*
Alternating decision tree An alternating decision tree (ADTree) is a machine learning method for classification. It generalizes decision trees and has connections to boosting.
An ADTree consists of an alternation of decision nodes, which specify a predicate condition, and ...
*
Bootstrap aggregating
Bootstrap aggregating, also called bagging (from bootstrap aggregating), is a machine learning ensemble meta-algorithm designed to improve the stability and accuracy of machine learning algorithms used in statistical classification and regressi ...
(bagging)
*
Cascading
*
BrownBoost BrownBoost is a boosting algorithm that may be robust to noisy datasets. BrownBoost is an adaptive version of the boost by majority algorithm. As is true for all boosting algorithms, BrownBoost is used in conjunction with other machine learning ...
*
CoBoosting CoBoost is a semi-supervised training algorithm proposed by Collins and Singer in 1999. The original application for the algorithm was the task of Named Entity Classification using very weak learners.Michael Collins and Yoram Singer, Unsupervised M ...
*
LPBoost Linear Programming Boosting (LPBoost) is a supervised classifier from the boosting family of classifiers. LPBoost maximizes a ''margin'' between training samples of different classes and hence also belongs to the class of margin-maximizing superv ...
*
Logistic regression
In statistics, the logistic model (or logit model) is a statistical model that models the probability of an event taking place by having the log-odds for the event be a linear combination of one or more independent variables. In regression a ...
*
Maximum entropy methods
*
Neural networks
*
Support vector machines
*
Gradient boosting
Gradient boosting is a machine learning technique used in regression and classification tasks, among others. It gives a prediction model in the form of an ensemble of weak prediction models, which are typically decision trees. When a decision t ...
*
Margin classifier In machine learning, a margin classifier is a classifier which is able to give an associated distance from the decision boundary for each example. For instance, if a linear classifier (e.g. perceptron or linear discriminant analysis) is used, the ...
s
*
Cross-validation
*
Machine learning
Machine learning (ML) is a field of inquiry devoted to understanding and building methods that 'learn', that is, methods that leverage data to improve performance on some set of tasks. It is seen as a part of artificial intelligence.
Machine ...
*
List of datasets for machine learning research
These datasets are applied for machine learning research and have been cited in peer-reviewed academic journals. Datasets are an integral part of the field of machine learning. Major advances in this field can result from advances in learning ...
Implementations
*
scikit-learn
scikit-learn (formerly scikits.learn and also known as sklearn) is a free software machine learning library for the Python programming language.
It features various classification, regression and clustering algorithms including support-vector ...
, an open source machine learning library for
Python
Python may refer to:
Snakes
* Pythonidae, a family of nonvenomous snakes found in Africa, Asia, and Australia
** ''Python'' (genus), a genus of Pythonidae found in Africa and Asia
* Python (mythology), a mythical serpent
Computing
* Python (pro ...
*
Orange
Orange most often refers to:
*Orange (fruit), the fruit of the tree species '' Citrus'' × ''sinensis''
** Orange blossom, its fragrant flower
*Orange (colour), from the color of an orange, occurs between red and yellow in the visible spectrum
* ...
, a free data mining software suite, modul
Orange.ensemble*
Weka
The weka, also known as the Māori hen or woodhen (''Gallirallus australis'') is a flightless bird species of the rail family. It is endemic to New Zealand. It is the only extant member of the genus '' Gallirallus''. Four subspecies are recogni ...
is a machine learning set of tools that offers variate implementations of boosting algorithms like AdaBoost and LogitBoost
*
R packag
GBM(Generalized Boosted Regression Models) implements extensions to Freund and Schapire's AdaBoost algorithm and Friedman's gradient boosting machine.
jboost AdaBoost, LogitBoost, RobustBoost, Boostexter and alternating decision trees
* R packag
Applies Multiclass AdaBoost.M1, AdaBoost-SAMME and Bagging
* R packag
An implementation of gradient boosting for linear and tree-based models.
Notes
References
Further reading
* Yoav Freund and Robert E. Schapire (1997)
''A Decision-Theoretic Generalization of On-line Learning and an Application to Boosting'' Journal of Computer and System Sciences, 55(1):119-139
* Robert E. Schapire and Yoram Singer (1999)
Machine Learning, 37(3):297-336
External links
* Robert E. Schapire (2003)
''The Boosting Approach to Machine Learning: An Overview'' MSRI (Mathematical Sciences Research Institute) Workshop on Nonlinear Estimation and Classification
*
Zhou Zhi-Hua (2014
''Boosting 25 years'' CCL 2014 Keynote.
*
*
{{Authority control
Classification algorithms
Ensemble learning
Learning in computer vision
Object recognition and categorization