An incremental decision tree algorithm is an online machine learning algorithm that outputs a decision tree. Many decision tree methods, such as C4.5, construct a tree using a complete dataset. Incremental decision tree methods allow an existing tree to be updated using only new individual data instances, without having to re-process past instances. This may be useful in situations where the entire dataset is not available when the tree is updated (i.e., the data was not stored), the original dataset is too large to process, or the characteristics of the data change over time.
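The defining property can be sketched with a toy, hypothetical learner (not any of the algorithms below): the model keeps sufficient statistics, here class counts per feature value, so each new instance updates it in constant time and past instances never need to be revisited.

```python
from collections import defaultdict

class IncrementalStump:
    """Toy one-feature incremental classifier (hypothetical, for illustration).

    Running class counts per feature value are the only state, so learning
    from a new instance is O(1) and never re-processes past data."""

    def __init__(self, feature):
        self.feature = feature
        # feature value -> class label -> count
        self.counts = defaultdict(lambda: defaultdict(int))

    def learn_one(self, x, y):
        self.counts[x[self.feature]][y] += 1

    def predict_one(self, x):
        dist = self.counts.get(x[self.feature])
        if not dist:
            return None  # value never seen in the stream
        return max(dist, key=dist.get)

stream = [
    ({"outlook": "sunny"}, "no"),
    ({"outlook": "rain"}, "yes"),
    ({"outlook": "sunny"}, "no"),
]
model = IncrementalStump("outlook")
for x, y in stream:
    model.learn_one(x, y)  # one instance at a time

print(model.predict_one({"outlook": "sunny"}))  # -> no
```

A batch learner would instead need the full `stream` in hand every time the model is rebuilt, which is exactly the cost the incremental methods below avoid.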
Applications
* On-line learning
* Data streams
* Concept drift
* Data which can be modeled well using a hierarchical model.
* Systems where a user-interpretable output is desired.
Methods
Here is a short list of incremental decision tree methods, organized by their (usually non-incremental) parent algorithms.
CART family
CART (1984) is a nonincremental decision tree inducer for both classification and regression problems, developed in the mathematics and statistics communities. CART traces its roots to AID (1963).
* Incremental CART (1989). Crawford modified CART to incorporate data incrementally.
ID3/C4.5 family
ID3 (1986) and C4.5 (1993) were developed by Quinlan and have roots in Hunt's Concept Learning System (CLS, 1966). The ID3 family of tree inducers was developed in the engineering and computer science communities.
* ID3' (1986) was suggested by Schlimmer and Fisher. It was a brute-force method to make ID3 incremental: after each new data instance is acquired, an entirely new tree is induced using ID3.
* ID4 (1986) could incorporate data incrementally. However, certain concepts were unlearnable, because ID4 discards subtrees when a new test is chosen for a node.
* ID5 (1988) did not discard subtrees, but also did not guarantee that it would produce the same tree as ID3.
* ID5R (1989) output the same tree as ID3 for a dataset, regardless of the incremental training order. This was accomplished by recursively updating the tree's subnodes. It did not handle numeric variables, multiclass classification tasks, or missing values.
* ID6MDL (2007) is an extended version of the ID3 and ID5R algorithms.
* ITI (1997) is an efficient method for incrementally inducing decision trees. The same tree is produced for a dataset regardless of the data's presentation order, and regardless of whether the tree is induced incrementally or non-incrementally (''batch mode''). It can accommodate numeric variables, multiclass tasks, and missing values. Code is available on the web.
Note: ID6NB (2009) is not incremental.
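The ID3'-style brute-force scheme above can be sketched as a wrapper around any batch inducer (all names here are hypothetical; the toy "inducer" simply predicts the majority class):

```python
class BruteForceIncremental:
    """ID3'-style sketch: store every instance seen so far and re-induce an
    entirely new tree with a batch learner after each new instance.

    Incremental in interface only: each update costs a full re-induction
    over all stored data, which is what ID4, ID5R, ITI, etc. improve on."""

    def __init__(self, batch_induce):
        self.batch_induce = batch_induce  # e.g. an ID3 implementation
        self.instances = []
        self.tree = None

    def learn_one(self, x, y):
        self.instances.append((x, y))
        self.tree = self.batch_induce(self.instances)  # rebuild from scratch

# Toy stand-in for a batch tree inducer: returns the majority class label.
def majority_inducer(instances):
    labels = [y for _, y in instances]
    return max(set(labels), key=labels.count)

model = BruteForceIncremental(majority_inducer)
for x, y in [({}, "a"), ({}, "b"), ({}, "a")]:
    model.learn_one(x, y)
print(model.tree)  # -> a
```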
Other Incremental Learning Systems
There were several ''incremental'' concept learning systems that did not build decision trees, but which predated and influenced the development of the earliest incremental decision tree learners, notably ID4.
Notable among these was Schlimmer and Granger's STAGGER (1986), which learned disjunctive concepts incrementally. STAGGER was developed to examine concepts that changed over time (concept drift). Prior to STAGGER, Michalski and Larson (1978) investigated an incremental variant of AQ (Michalski, 1973), a supervised system for learning concepts in disjunctive normal form (DNF). Experience with these earlier systems and others, including incremental tree-structured unsupervised learning, contributed to a conceptual framework for evaluating incremental decision tree learners specifically, and incremental concept learning generally, along four dimensions that reflect the inherent tradeoffs between learning cost and quality: (1) the cost of a knowledge base update, (2) the number of observations required to converge on a knowledge base with given characteristics, (3) the total effort (as a function of the first two dimensions) that a system exerts, and (4) the quality (often consistency) of the final knowledge base. Some of the historical context in which incremental decision tree learners emerged is given in Fisher and Schlimmer (1988), which also expands on the four-factor framework used to evaluate and design incremental learning systems.
VFDT Algorithm
The Very Fast Decision Tree learner reduces training time for large incremental datasets by subsampling the incoming data stream.
* VFDT (2000)
* CVFDT (2001) can adapt to concept drift by using a sliding window on incoming data. Old data outside the window is forgotten.
* VFDTc (2006) extends VFDT for continuous data, concept drift, and application of Naive Bayes classifiers in the leaves.
* VFML (2003) is a toolkit, available on the web, developed by the creators of VFDT and CVFDT.
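VFDT's split decisions rest on the Hoeffding bound: with probability 1 − δ, the observed mean of n samples of a variable with range R is within ε = √(R² ln(1/δ) / 2n) of its true mean, so a leaf can commit to the best-looking split once the observed gain gap exceeds ε. A small sketch of the bound itself (function name is mine):

```python
import math

def hoeffding_bound(value_range, delta, n):
    """Hoeffding bound epsilon: with probability 1 - delta, the observed mean
    of n i.i.d. samples of a variable with range `value_range` lies within
    epsilon of the true mean. VFDT splits a leaf when the observed gain gap
    between the two best attributes exceeds this epsilon."""
    return math.sqrt((value_range ** 2) * math.log(1.0 / delta) / (2.0 * n))

# For binary-class information gain the range R is 1 bit.
# Epsilon shrinks as more stream instances reach the leaf:
for n in (100, 1000, 10000):
    print(n, hoeffding_bound(1.0, 1e-7, n))
```

Because ε depends only on n and δ, a leaf never needs to store its instances, only the attribute statistics, which is what makes VFDT's memory and time costs independent of the stream length.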
EFDT Algorithm
The Extremely Fast Decision Tree learner is statistically more powerful than VFDT, allowing it to learn more detailed trees from less data. It differs from VFDT in the method for deciding when to insert a new branch into the tree. VFDT waits until it is confident that the best available branch is better than any alternative. In contrast, EFDT splits as soon as it is confident that the best available branch is better than the current alternative. Initially, the current alternative is no branch. This allows EFDT to insert branches much more rapidly than VFDT. During incremental learning this means that EFDT can deploy useful trees much sooner than VFDT.
However, the new branch selection method greatly increases the likelihood of selecting a suboptimal branch. In consequence, EFDT keeps monitoring the performance of all branches and will replace a branch as soon as it is confident there is a better alternative.
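The contrast between the two split rules can be sketched as follows (a simplified illustration under the Hoeffding-bound framing, not the published pseudocode; function names are mine):

```python
import math

def hoeffding_bound(r, delta, n):
    # Epsilon from the Hoeffding bound for range r, confidence 1 - delta, n samples.
    return math.sqrt(r * r * math.log(1.0 / delta) / (2.0 * n))

def vfdt_should_split(gains, delta, n, r=1.0):
    """VFDT rule: split only when the best attribute's gain beats the
    second-best by more than epsilon (waits out near-ties)."""
    best, second = sorted(gains.values(), reverse=True)[:2]
    return best - second > hoeffding_bound(r, delta, n)

def efdt_should_split(gains, current_gain, delta, n, r=1.0):
    """EFDT rule (sketch): split as soon as the best attribute beats the
    current alternative -- initially 'no split' -- by more than epsilon,
    revisiting the choice later if a better branch appears."""
    best = max(gains.values())
    return best - current_gain > hoeffding_bound(r, delta, n)

gains = {"a1": 0.30, "a2": 0.28}  # two nearly-tied attributes
print(vfdt_should_split(gains, 1e-7, 1000))       # False: VFDT keeps waiting
print(efdt_should_split(gains, 0.0, 1e-7, 1000))  # True: EFDT splits now
```

With near-tied attributes the VFDT gap (0.02) may never exceed ε, while EFDT only needs the best gain to beat "no split", which is why EFDT deploys useful branches sooner but must keep re-checking them.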
OLIN and IFN
* OLIN (2002)
* IOLIN (2008), based on the Info-Fuzzy Network (IFN)
GAENARI
gaenari is an incremental decision tree implemented in C++. Code is available on the web.
See also
* Concept drift
* Decision tree learning
* Machine learning
* Online learning
References
External links
* ITI code: http://www-lrn.cs.umass.edu/iti/index.html
* VFML code: http://www.cs.washington.edu/dm/vfml/
* C++ incremental decision tree (gaenari): https://github.com/greenfish77/gaenari