Conceptual clustering is a

machine learning Machine learning (ML) is a field of inquiry devoted to understanding and building methods that 'learn', that is, methods that leverage data to improve performance on some set of tasks. It is seen as a part of artificial intelligence. Machine ...

paradigm for

unsupervised classification Unsupervised learning is a type of algorithm that learns patterns from untagged data. The hope is that through mimicry, which is an important mode of learning in people, the machine is forced to build a concise representation of its world and t ...

that has been defined by Ryszard S. Michalski in 1980 (Fisher 1987, Michalski 1980) and developed mainly during the 1980s. It is distinguished from ordinary

data clustering Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense) to each other than to those in other groups (clusters). It is a main task of ...

by generating a concept description for each generated class. Most conceptual clustering methods are capable of generating hierarchical category structures; see

Categorization Categorization is the ability and activity of recognizing shared features or similarities between the elements of the experience of the world (such as Object (philosophy), objects, events, or ideas), organizing and classifying experience by a ...

for more information on hierarchy. Conceptual clustering is closely related to formal concept analysis,

decision tree learning Decision tree learning is a supervised learning approach used in statistics, data mining and machine learning. In this formalism, a classification or regression decision tree is used as a predictive model to draw conclusions about a set of obse ...

, and

mixture model In statistics, a mixture model is a probabilistic model for representing the presence of subpopulations within an overall population, without requiring that an observed data set should identify the sub-population to which an individual observation ...

learning.

Conceptual clustering vs. data clustering

Conceptual clustering is obviously closely related to data clustering; however, in conceptual clustering it is not only the inherent structure of the data that drives cluster formation, but also the Description language which is available to the learner. Thus, a statistically strong grouping in the data may fail to be extracted by the learner if the prevailing concept description language is incapable of describing that particular ''regularity''. In most implementations, the description language has been limited to feature

conjunction Conjunction may refer to: * Conjunction (grammar), a part of speech * Logical conjunction, a mathematical operator ** Conjunction introduction, a rule of inference of propositional logic * Conjunction (astronomy), in which two astronomical bodies ...

, although in COBWEB (see "

COBWEB A spider web, spiderweb, spider's web, or cobweb (from the archaic word '' coppe'', meaning "spider") is a structure created by a spider out of proteinaceous spider silk extruded from its spinnerets, generally meant to catch its prey. Spi ...

" below), the feature language is

probabilistic Probability is the branch of mathematics concerning numerical descriptions of how likely an Event (probability theory), event is to occur, or how likely it is that a proposition is true. The probability of an event is a number between 0 and ...

List of published algorithms

A fair number of algorithms have been proposed for conceptual clustering. Some examples are given below: * CLUSTER/2 (Michalski & Stepp 1983) *

(Fisher 1987) * CYRUS (Kolodner 1983) * GALOIS (Carpineto & Romano 1993), * GCF (Talavera & Béjar 2001) * INC (Hadzikadic & Yun 1989) * ITERATE (Biswas, Weinberg & Fisher 1998), * LABYRINTH (Thompson & Langley 1989) * SUBDUE (Jonyer, Cook & Holder 2001). * UNIMEM (Lebowitz 1987) * WITT (Hanson & Bauer 1989), More general discussions and reviews of conceptual clustering can be found in the following publications: * Michalski (1980) * Gennari, Langley, & Fisher (1989) * Fisher & Pazzani (1991) * Fisher & Langley (1986) * Stepp & Michalski (1986)

Example: A basic conceptual clustering algorithm

This section discusses the rudiments of the conceptual clustering algorithm COBWEB. There are many other algorithms using different heuristics and "

category goodness Category utility is a measure of "category goodness" defined in and . It attempts to maximize both the probability that two objects in the same category have attribute values in common, and the probability that objects from different categories ha ...

" or category evaluation criteria, but COBWEB is one of the best known. The reader is referred to the

bibliography Bibliography (from and ), as a discipline, is traditionally the academic study of books as physical, cultural objects; in this sense, it is also known as bibliology (from ). English author and bibliographer John Carter describes ''bibliography ...

for other methods.

Knowledge representation

The COBWEB data structure is a hierarchy (tree) wherein each node represents a given ''concept''. Each concept represents a set (actually, a

multiset In mathematics, a multiset (or bag, or mset) is a modification of the concept of a set that, unlike a set, allows for multiple instances for each of its elements. The number of instances given for each element is called the multiplicity of that e ...

or bag) of objects, each object being represented as a binary-valued property list. The data associated with each tree node (i.e., concept) are the integer property counts for the objects in that concept. For example, (see figure), let a concept

C_1

contain the following four objects (repeated objects being permitted). #

  0 1/code>
#  1 1/code>
#  1 0/code>
#  1 1/code>
The three properties might be, for example, s_male, has_wings, is_nocturnal 








S, or s, is the nineteenth letter in the Latin alphabet, used in the  modern English alphabet, the alphabets of other western European languages and others worldwide. Its name in English is  ''ess'' (pronounced ), plural ''esses''.
 History ...
/code>.  Then what is stored at this concept node is the property count   3 3/code>, indicating that 1 of the objects in the concept is male, 3 of the objects have wings, and 3 of the objects are nocturnal.  The concept ''description'' is the category-conditional probability (likelihood) of the properties at the node.  Thus, given that an object is a member of category (concept)  $C_1$ , the likelihood that it is male is  $1/4 = 0.25$ .  Likewise, the likelihood that the object has wings and  likelihood that the object is nocturnal  or both is  $3/4 = 0.75$ .  The concept description can therefore simply be given as 25 .75 .75 

5 (five) is a number, numeral (linguistics), numeral and numerical digit, digit. It is the natural number, and cardinal number, following 4 and preceding 6, and is a prime number. It has attained significance throughout history in part because typ ...
/code>, which corresponds to the  $C_1$ -conditional feature likelihood, i.e.,  $p(x, C_1) = (0.25, 0.75, 0.75)$ .

The figure to the right shows a concept tree with five concepts.   $C_0$   is the root concept, which contains all ten objects in the data set.  Concepts  $C_1$  and  $C_2$  are the children of  $C_0$ , the former containing four objects, and the later containing six objects.  Concept  $C_2$  is also the parent of concepts  $C_3$ ,  $C_4$ , and  $C_5$ , which contain three, two, and one object, respectively.  Note that each parent node (relative superordinate concept) contains all the objects contained by its child nodes (relative subordinate concepts).  In Fisher's (1987) description of COBWEB, he indicates that only the total attribute counts (not conditional probabilities, and not object lists) are stored at the nodes.  Any probabilities are computed from the attribute counts as needed.

 The COBWEB language

The description language of COBWEB is a "language" only in a loose sense, because being fully probabilistic it is capable of describing any concept.  However, if constraints are placed on the probability ranges which concepts may represent, then a stronger language is obtained.  For example,  we might permit only concepts wherein at least one probability differs from 0.5 by more than  $\alpha$ .  Under this constraint, with  $\alpha=0.3$ , a concept such as  6 .5 .7/code> could not be constructed by the learner;  however a concept such as  6 .5 .9/code> would be accessible because at least one probability differs from 0.5 by more than  $\alpha$ .  Thus, under constraints such as these, we obtain something like a traditional concept language.  In the limiting case where  $\alpha=0.5$  for every feature, and thus every probability in a concept must be 0 or 1, the result is a feature language base on conjunction; that is, every concept that can be represented can then be described as a conjunction of features (and their negations), and concepts that cannot be described in this way cannot be represented.

 Evaluation criterion

In Fisher's (1987) description of COBWEB, the measure he uses to evaluate the quality of the hierarchy is Gluck and Corter's (1985) category utility 
Category utility is a measure of "category goodness" defined in  and . It attempts to maximize both the probability that two objects in the same category have attribute values in common, and the probability that objects from different categories ha ...
 (CU) measure, which he re-derives in his paper.   The motivation for the measure is highly similar to the "information gain 






Information is an abstract concept that refers to that which has the power to inform.  At the most fundamental level information pertains to the  interpretation of that which may be sensed.  Any natural process that is not completely random,  ...
" measure introduced by Quinlan for decision tree learning.  It has previously been shown that the CU for feature-based classification  is the same as the mutual information 




In  probability theory and  information theory, the mutual information (MI) of two  random variables is a measure of the mutual  dependence between the two variables. More specifically, it quantifies the " amount of information" (in units such  ...
  between the feature variables and the class variable (Gluck & Corter, 1985; Corter & Gluck, 1992), and since this measure is much better known, we proceed here with mutual information as the measure of category "goodness".

What we wish to evaluate is the overall utility of grouping the objects into a particular hierarchical categorization structure.  Given a set of possible classification structures, we need to determine whether one is better than another.

  References 



*

*

*

*

*

*

*

*

*

*

*

*

*

*
{{refend

 External links

Bibliography of conceptual clusteringWorking python implementation of COBWEB

 Learning methods
 Classification algorithms
 Unsupervised learning