![ID3 algorithm decision tree](https://upload.wikimedia.org/wikipedia/commons/4/46/ID3_algorithm_decision_tree.png)
In
decision tree learning
Decision tree learning is a supervised learning approach used in statistics, data mining and machine learning. In this formalism, a classification or regression decision tree is used as a predictive model to draw conclusions about a set of ob ...
, ID3 (Iterative Dichotomiser 3) is an
algorithm
In mathematics and computer science, an algorithm () is a finite sequence of rigorous instructions, typically used to solve a class of specific Computational problem, problems or to perform a computation. Algorithms are used as specificat ...
invented by
Ross Quinlan John Ross Quinlan is a computer science researcher in data mining and decision theory. He has contributed extensively to the development of decision tree algorithms, including inventing the canonical C4.5 and ID3 algorithms. He also contributed to ...
[Quinlan, J. R. 1986. Induction of Decision Trees. Mach. Learn. 1, 1 (Mar. 1986), 81–106] used to generate a
decision tree
A decision tree is a decision support tool that uses a tree-like model of decisions and their possible consequences, including chance event outcomes, resource costs, and utility. It is one way to display an algorithm that only contains condit ...
from a dataset. ID3 is the precursor to the
C4.5 algorithm
C4.5 is an algorithm used to generate a decision tree developed by Ross Quinlan. C4.5 is an extension of Quinlan's earlier ID3 algorithm. The decision trees generated by C4.5 can be used for classification, and for this reason, C4.5 is often referr ...
, and is typically used in the
machine learning
Machine learning (ML) is a field of inquiry devoted to understanding and building methods that 'learn', that is, methods that leverage data to improve performance on some set of tasks. It is seen as a part of artificial intelligence.
Machine ...
and
natural language processing
Natural language processing (NLP) is an interdisciplinary subfield of linguistics, computer science, and artificial intelligence concerned with the interactions between computers and human language, in particular how to program computers to pro ...
domains.
Algorithm
The ID3 algorithm begins with the original set
as the
root node
In computer science, a tree is a widely used abstract data type that represents a hierarchical tree structure with a set of connected nodes. Each node in the tree can be connected to many children (depending on the type of tree), but must be con ...
. On each
iteration
Iteration is the repetition of a process in order to generate a (possibly unbounded) sequence of outcomes. Each repetition of the process is a single iteration, and the outcome of each iteration is then the starting point of the next iteration. ...
of the algorithm, it iterates through every unused
attribute
Attribute may refer to:
* Attribute (philosophy), an extrinsic property of an object
* Attribute (research), a characteristic of an object
* Grammatical modifier, in natural languages
* Attribute (computing), a specification that defines a prope ...
of the set
and calculates the
entropy
Entropy is a scientific concept, as well as a measurable physical property, that is most commonly associated with a state of disorder, randomness, or uncertainty. The term and the concept are used in diverse fields, from classical thermodynam ...
or the
information gain
Information is an abstract concept that refers to that which has the power to inform. At the most fundamental level information pertains to the interpretation of that which may be sensed. Any natural process that is not completely random, ...
of that attribute. It then selects the attribute which has the smallest entropy (or largest information gain) value. The set
is then split or
partitioned by the selected attribute to produce subsets of the data. (For example, a node can be split into
child node
In computer science, a tree is a widely used abstract data type that represents a hierarchical tree structure with a set of connected nodes. Each node in the tree can be connected to many children (depending on the type of tree), but must be c ...
s based upon the subsets of the population whose ages are less than 50, between 50 and 100, and greater than 100.) The algorithm continues to
recurse
''Recurse'' is the debut studio album released by American electro-industrial band Finite Automata. It was released on December 29, 2012, by Beyond Therapy Records in digital format, and released on compact disc on February 22, 2013.
The album ...
on each subset, considering only attributes never selected before.
Recursion on a subset may
stop
Stop may refer to:
Places
* Stop, Kentucky, an unincorporated community in the United States
* Stop (Rogatica), a village in Rogatica, Republika Srpska, Bosnia and Herzegovina
Facilities
* Bus stop
* Truck stop, a type of rest stop for truck d ...
in one of these cases:
* every element in the subset belongs to the same class; in which case the node is turned into a
leaf node
In computer science, a tree is a widely used abstract data type that represents a hierarchical tree structure with a set of connected nodes. Each node in the tree can be connected to many children (depending on the type of tree), but must be con ...
and labelled with the class of the examples.
* there are no more attributes to be selected, but the examples still do not belong to the same class. In this case, the node is made a leaf node and labelled with the
most common class of the examples in the subset.
* there are
no examples in the subset, which happens when no example in the parent set was found to match a specific value of the selected attribute. An example could be the absence of a person among the
population
Population typically refers to the number of people in a single area, whether it be a city or town, region, country, continent, or the world. Governments typically quantify the size of the resident population within their jurisdiction using a ...
with age over 100 years. Then a leaf node is created and labelled with the most common class of the examples in the parent node's set.
Throughout the algorithm, the decision tree is constructed with each non-terminal node (
internal node
In computer science, a tree is a widely used abstract data type that represents a hierarchical tree structure with a set of connected nodes. Each node in the tree can be connected to many children (depending on the type of tree), but must be con ...
) representing the selected
attribute
Attribute may refer to:
* Attribute (philosophy), an extrinsic property of an object
* Attribute (research), a characteristic of an object
* Grammatical modifier, in natural languages
* Attribute (computing), a specification that defines a prope ...
on which the data was split, and terminal nodes (leaf nodes) representing the class label of the final subset of this branch.
Summary
# Calculate the
entropy
Entropy is a scientific concept, as well as a measurable physical property, that is most commonly associated with a state of disorder, randomness, or uncertainty. The term and the concept are used in diverse fields, from classical thermodynam ...
of every
attribute
Attribute may refer to:
* Attribute (philosophy), an extrinsic property of an object
* Attribute (research), a characteristic of an object
* Grammatical modifier, in natural languages
* Attribute (computing), a specification that defines a prope ...
of the data set
.
# Partition ("split") the set
into
subset
In mathematics, Set (mathematics), set ''A'' is a subset of a set ''B'' if all Element (mathematics), elements of ''A'' are also elements of ''B''; ''B'' is then a superset of ''A''. It is possible for ''A'' and ''B'' to be equal; if they are ...
s using the attribute for which the
resulting entropy after splitting is
minimized; or, equivalently, information gain is
maximum
In mathematical analysis, the maxima and minima (the respective plurals of maximum and minimum) of a function, known collectively as extrema (the plural of extremum), are the largest and smallest value of the function, either within a given ran ...
.
# Make a decision tree
node
In general, a node is a localized swelling (a "knot") or a point of intersection (a vertex).
Node may refer to:
In mathematics
*Vertex (graph theory), a vertex in a mathematical graph
*Vertex (geometry), a point where two or more curves, lines, ...
containing that attribute.
# Recurse on subsets using the remaining attributes.
Pseudocode
ID3 (Examples, Target_Attribute, Attributes)
Create a root node for the tree
If all examples are positive, Return the single-node tree Root, with label = +.
If all examples are negative, Return the single-node tree Root, with label = -.
If number of predicting attributes is empty, then Return the single node tree Root,
with label = most common value of the target attribute in the examples.
Otherwise Begin
A ← The Attribute that best classifies examples.
Decision Tree attribute for Root = A.
For each possible value, , of A,
Add a new tree branch below Root, corresponding to the test A = .
Let Examples() be the subset of examples that have the value for A
If Examples() is empty
Then below this new branch add a leaf node with label = most common target value in the examples
Else below this new branch add the subtree ID3 (Examples(), Target_Attribute, Attributes – )
End
Return Root
Properties
![ID3 decision tree- splicing](https://upload.wikimedia.org/wikipedia/commons/0/09/ID3_decision_tree-_splicing.png)
ID3 does not guarantee an optimal solution. It can converge upon
local optima
In applied mathematics and computer science, a local optimum of an optimization problem is a solution that is optimal (either maximal or minimal) within a neighboring set of candidate solutions. This is in contrast to a global optimum, which i ...
. It uses a
greedy strategy by selecting the locally best attribute to split the dataset on each iteration. The
algorithm's optimality
Mathematical optimization (alternatively spelled ''optimisation'') or mathematical programming is the selection of a best element, with regard to some criterion, from some set of available alternatives. It is generally divided into two subfi ...
can be improved by using
backtracking
Backtracking is a class of algorithms for finding solutions to some computational problems, notably constraint satisfaction problems, that incrementally builds candidates to the solutions, and abandons a candidate ("backtracks") as soon as it de ...
during the search for the optimal decision tree at the cost of possibly taking longer.
ID3 can
overfit
mathematical modeling, overfitting is "the production of an analysis that corresponds too closely or exactly to a particular set of data, and may therefore fail to fit to additional data or predict future observations reliably". An overfitt ...
the training data. To avoid overfitting, smaller decision trees should be preferred over larger ones. This algorithm usually produces small trees, but it does not always produce the smallest possible decision tree.
ID3 is harder to use on continuous data than on factored data (factored data has a discrete number of possible values, thus reducing the possible branch points). If the values of any given attribute are
continuous
Continuity or continuous may refer to:
Mathematics
* Continuity (mathematics), the opposing concept to discreteness; common examples include
** Continuous probability distribution or random variable in probability and statistics
** Continuous ...
, then there are many more places to split the data on this attribute, and searching for the best value to split by can be time-consuming.
Usage
The ID3 algorithm is used by training on a
data set A data set (or dataset) is a collection of data. In the case of tabular data, a data set corresponds to one or more database tables, where every column of a table represents a particular variable, and each row corresponds to a given record of the ...
to produce a
decision tree
A decision tree is a decision support tool that uses a tree-like model of decisions and their possible consequences, including chance event outcomes, resource costs, and utility. It is one way to display an algorithm that only contains condit ...
which is stored in
memory
Memory is the faculty of the mind by which data or information is encoded, stored, and retrieved when needed. It is the retention of information over time for the purpose of influencing future action. If past events could not be remembered, ...
. At
runtime, this decision tree is used to
classify new test cases (
feature vector
In machine learning and pattern recognition, a feature is an individual measurable property or characteristic of a phenomenon. Choosing informative, discriminating and independent features is a crucial element of effective algorithms in pattern r ...
s) by
traversing the decision tree using the features of the datum to arrive at a leaf node.
The ID3 metrics
Entropy
Entropy
Entropy is a scientific concept, as well as a measurable physical property, that is most commonly associated with a state of disorder, randomness, or uncertainty. The term and the concept are used in diverse fields, from classical thermodynam ...
is a measure of the amount of uncertainty in the (data) set
(i.e. entropy characterizes the (data) set
).
:
Where,
*
– The
current dataset for which entropy is being calculated
**This changes at each step of the ID3 algorithm, either to a subset of the previous set in the case of splitting on an attribute or to a "sibling" partition of the parent in case the recursion terminated previously.
*
– The set of classes in
*
– The
proportion
Proportionality, proportion or proportional may refer to:
Mathematics
* Proportionality (mathematics), the property of two variables being in a multiplicative relation to a constant
* Ratio, of one quantity to another, especially of a part compare ...
of the
number of elements in class
to the number of elements in set
When
, the set
is perfectly classified (i.e. all elements in
are of the same class).
In ID3, entropy is calculated for each remaining attribute. The attribute with the smallest entropy is used to split the set
on this iteration. Entropy in
information theory
Information theory is the scientific study of the quantification (science), quantification, computer data storage, storage, and telecommunication, communication of information. The field was originally established by the works of Harry Nyquist a ...
measures how much information is
expected to be gained upon
measuring
Measurement is the quantification of attributes of an object or event, which can be used to compare with other objects or events.
In other words, measurement is a process of determining how large or small a physical quantity is as compared ...
a
random variable
A random variable (also called random quantity, aleatory variable, or stochastic variable) is a mathematical formalization of a quantity or object which depends on random events. It is a mapping or a function from possible outcomes (e.g., the po ...
; as such, it can also be used to quantify the amount to which the
distribution Distribution may refer to:
Mathematics
*Distribution (mathematics), generalized functions used to formulate solutions of partial differential equations
* Probability distribution, the probability of a particular value or value range of a vari ...
of the quantity's values is unknown. A
constant quantity has zero entropy, as its distribution is
perfectly known. In contrast, a uniformly distributed random variable (
discretely or
continuously uniform) maximizes entropy. Therefore, the greater the entropy at a node, the less information is known about the classification of data at this stage of the tree; and therefore, the greater the potential to improve the classification here.
As such, ID3 is a
greedy heuristic
A heuristic (; ), or heuristic technique, is any approach to problem solving or self-discovery that employs a practical method that is not guaranteed to be optimal, perfect, or rational, but is nevertheless sufficient for reaching an immediate, ...
performing a
best-first search for
locally optimal entropy values. Its accuracy can be improved by preprocessing the data.
Information gain
Information gain
Information is an abstract concept that refers to that which has the power to inform. At the most fundamental level information pertains to the interpretation of that which may be sensed. Any natural process that is not completely random, ...
is the measure of the difference in entropy from before to after the set
is split on an attribute
. In other words, how much uncertainty in
was reduced after splitting set
on attribute
.
:
Where,
*
– Entropy of set
*
– The subsets created from splitting set
by attribute
such that
*
– The proportion of the number of elements in
to the number of elements in set
*
– Entropy of subset
In ID3, information gain can be calculated (instead of entropy) for each remaining attribute. The attribute with the largest information gain is used to split the set
on this iteration.
See also
*
Classification and regression tree
Decision tree learning is a supervised learning approach used in statistics, data mining and machine learning. In this formalism, a classification or regression decision tree is used as a predictive model to draw conclusions about a set of obs ...
(CART)
*
C4.5 algorithm
C4.5 is an algorithm used to generate a decision tree developed by Ross Quinlan. C4.5 is an extension of Quinlan's earlier ID3 algorithm. The decision trees generated by C4.5 can be used for classification, and for this reason, C4.5 is often referr ...
*
Decision tree learning
Decision tree learning is a supervised learning approach used in statistics, data mining and machine learning. In this formalism, a classification or regression decision tree is used as a predictive model to draw conclusions about a set of ob ...
**
Decision tree model
In computational complexity the decision tree model is the model of computation in which an algorithm is considered to be basically a decision tree, i.e., a sequence of ''queries'' or ''tests'' that are done adaptively, so the outcome of the pre ...
References
Further reading
*
*{{Cite journal, last=Grzymala-Busse, first=Jerzy W., date=February 1993, title=Selected Algorithms of Machine Learning from Examples, url=https://people.eecs.ku.edu/~jerzygb/j24-sel.pdf, journal=Fundamenta Informaticae, volume=18, issue=2, pages=193–207, via=ResearchGate
External links
* Seminars
http://www2.cs.uregina.ca/* Description and examples
* Description and examples
Decision Trees and Political Party Classification
Decision trees
Classification algorithms
Articles with example pseudocode