Hierarchical clustering is one method for finding community structures in a network. The technique arranges the network into a hierarchy of groups according to a specified weight function. The data can then be represented in a tree structure known as a dendrogram. Hierarchical clustering can be either agglomerative or divisive, depending on whether one proceeds through the algorithm by adding links to or removing links from the network, respectively. One divisive technique is the Girvan–Newman algorithm.
Algorithm
In the hierarchical clustering algorithm, a weight is first assigned to each pair of vertices in the network. The weight, which can vary depending on implementation (see section below), is intended to indicate how closely related the vertices are. Then, starting with all the nodes in the network disconnected, links are added between pairs of nodes in order from highest to lowest weight (in the divisive case, one starts from the original network and removes links from lowest to highest weight). As links are added, connected subsets begin to form. These represent the network's community structures.
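The agglomerative procedure can be illustrated with a minimal Python sketch. It assumes vertices are numbered 0 to n-1 and that the weights arrive as a dictionary keyed by vertex pair; the function name `agglomerate` and the example weights are hypothetical and chosen only for illustration.

```python
def agglomerate(n, weights):
    """Return the merge history for n vertices (hypothetical helper).

    weights maps a vertex pair (u, v) to a number; a higher value means
    the two vertices are more closely related.
    """
    parent = list(range(n))               # union-find forest
    members = {v: [v] for v in range(n)}  # component root -> member list

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    merges = []
    # Add links from highest to lowest weight.
    for (u, v), w in sorted(weights.items(), key=lambda kv: -kv[1]):
        ru, rv = find(u), find(v)
        if ru == rv:
            continue  # both endpoints already lie in one component
        parent[rv] = ru
        side_a, side_b = sorted(members[ru]), sorted(members.pop(rv))
        members[ru] = side_a + side_b
        merges.append((w, side_a, side_b))
    return merges


# Made-up weights for a four-vertex network.
weights = {(0, 1): 0.9, (2, 3): 0.8, (1, 2): 0.3,
           (0, 3): 0.2, (0, 2): 0.1, (1, 3): 0.1}
history = agglomerate(4, weights)
```

Each entry of `history` records the weight at which two connected subsets were joined, which is the information a dendrogram displays.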
The components at each iterative step are always subsets of other structures. Hence, the subsets can be represented using a tree diagram, or dendrogram. Horizontal slices of the tree at a given level indicate the communities that exist above and below a value of the weight.
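A horizontal slice can be sketched in Python under the assumption that cutting the tree at a given weight is the same as keeping only the links whose weight is at least that value and reading off the connected components. The function name `communities_at` and the example weights are hypothetical.

```python
def communities_at(n, weights, threshold):
    """Connected components formed by links with weight >= threshold."""
    parent = list(range(n))

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for (u, v), w in weights.items():
        if w >= threshold:
            parent[find(v)] = find(u)  # join the two components

    groups = {}
    for v in range(n):
        groups.setdefault(find(v), []).append(v)
    return sorted(groups.values())


# Made-up weights for a four-vertex network.
weights = {(0, 1): 0.9, (2, 3): 0.8, (1, 2): 0.3, (0, 3): 0.2}
high_cut = communities_at(4, weights, 0.5)   # slice near the leaves
low_cut = communities_at(4, weights, 0.25)   # slice near the root
```

Lowering the threshold moves the slice toward the root of the tree, so communities can only merge, never split.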
Weights
There are many possible weights for use in hierarchical clustering algorithms. The specific weight used is dictated by the data as well as considerations for computational speed. Additionally, the communities found in the network are highly dependent on the choice of weighting function. Hence, when tested against real-world data with a known community structure, the various weighting techniques have met with varying degrees of success.
Two weights that have been used previously with varying success are the number of node-independent paths between each pair of vertices and the total number of paths between vertices weighted by the length of the path. One disadvantage of these weights, however, is that both weighting schemes tend to separate single peripheral vertices from their rightful communities because of the small number of paths going to these vertices. For this reason, their use in hierarchical clustering techniques is far from optimal.
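The peripheral-vertex problem can be seen in a small Python sketch of the second weight. It rests on several assumptions that the text does not spell out: a path of length l is down-weighted by a factor alpha**l, only lengths up to `max_len` are counted, and "paths" are counted as walks (which may revisit vertices). The names `path_weights`, `alpha`, and `max_len` are hypothetical.

```python
def path_weights(adj, alpha=0.1, max_len=3):
    """Sum over l = 1..max_len of alpha**l times the number of length-l walks.

    adj is a square 0/1 adjacency matrix given as a list of lists.
    """
    n = len(adj)
    weights = [[0.0] * n for _ in range(n)]
    # power starts as the identity matrix, i.e. adj raised to the power 0.
    power = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    for length in range(1, max_len + 1):
        # After this product, power[i][j] counts walks of this length.
        power = [[sum(power[i][k] * adj[k][j] for k in range(n))
                  for j in range(n)] for i in range(n)]
        for i in range(n):
            for j in range(n):
                weights[i][j] += alpha ** length * power[i][j]
    return weights


# Triangle 0-1-2 with a single peripheral vertex 3 attached to vertex 2.
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
adj = [[0] * 4 for _ in range(4)]
for u, v in edges:
    adj[u][v] = adj[v][u] = 1

w = path_weights(adj)
```

In this example the pair (2, 3) receives a lower weight than the pair (0, 1), although vertex 3 has no other community to belong to, so an agglomerative pass would attach it last.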
Edge betweenness centrality has been used successfully as a weight in the Girvan–Newman algorithm. This technique is similar to a divisive hierarchical clustering algorithm, except the weights are recalculated with each step.
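The recalculation step can be sketched in Python as follows. This is an illustrative sketch, not the authors' reference implementation: it assumes an unweighted, undirected graph stored as a dictionary of neighbour sets, computes edge betweenness with a Brandes-style breadth-first accumulation, and stops after the first split. The helper names are hypothetical.

```python
from collections import deque


def edge_betweenness(adj):
    """Edge betweenness for an unweighted, undirected graph."""
    score = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for s in adj:
        # Breadth-first search from s, counting shortest paths.
        dist = {s: 0}
        sigma = {v: 0 for v in adj}
        sigma[s] = 1
        preds = {v: [] for v in adj}
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies from the farthest vertices back to s.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                share = sigma[v] / sigma[w] * (1.0 + delta[w])
                score[frozenset((v, w))] += share
                delta[v] += share
    # Every vertex pair was visited from both of its endpoints.
    return {e: b / 2.0 for e, b in score.items()}


def components(adj):
    """Connected components as a sorted list of sorted vertex lists."""
    seen, result = set(), []
    for s in adj:
        if s in seen:
            continue
        seen.add(s)
        stack, comp = [s], []
        while stack:
            v = stack.pop()
            comp.append(v)
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        result.append(sorted(comp))
    return sorted(result)


def split_once(adj):
    """Remove highest-betweenness edges, recomputing the scores after
    every removal, until the number of components increases."""
    adj = {v: set(nbrs) for v, nbrs in adj.items()}  # work on a copy
    start = len(components(adj))
    removed = []
    while len(components(adj)) == start:
        scores = edge_betweenness(adj)
        if not scores:
            break  # no edges left to remove
        u, v = sorted(max(scores, key=scores.get))
        adj[u].discard(v)
        adj[v].discard(u)
        removed.append((u, v))
    return components(adj), removed


# Two triangles joined by the single bridge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
adj = {v: set() for v in range(6)}
for a, b in edges:
    adj[a].add(b)
    adj[b].add(a)

communities, removed = split_once(adj)
```

Here the bridge carries every shortest path between the two triangles, so it has the highest betweenness and is removed first. Applying `split_once` repeatedly to the resulting pieces would build the rest of the hierarchy.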
The change in modularity of the network with the addition of a node has also been used successfully as a weight. This method provides a computationally less-costly alternative to the Girvan–Newman algorithm while yielding similar results.
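A modularity-based weight can be sketched in Python under two assumptions not stated in the text: modularity is taken in its usual network sense, Q = sum over communities of L_c/m - (d_c/2m)^2, where L_c is the number of internal edges, d_c the total degree of community c, and m the number of edges; and the weight of a candidate merge is the change in Q when two groups are joined. The function names are hypothetical.

```python
def modularity(edges, community_of):
    """Q = sum over communities c of L_c / m - (d_c / (2 m)) ** 2."""
    m = len(edges)
    internal = {}  # community -> number of edges inside it
    degree = {}    # community -> summed degree of its vertices
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree[cu] = degree.get(cu, 0) + 1
        degree[cv] = degree.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(internal.get(c, 0) / m - (d / (2 * m)) ** 2
               for c, d in degree.items())


def merge_gain(edges, community_of, a, b):
    """Change in Q if communities a and b were merged into one."""
    merged = {v: (a if c == b else c) for v, c in community_of.items()}
    return modularity(edges, merged) - modularity(edges, community_of)


# Two triangles joined by the single bridge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]

singletons = {v: v for v in range(6)}  # every vertex in its own community
gain_adjacent = merge_gain(edges, singletons, 0, 1)  # vertices share an edge
gain_distant = merge_gain(edges, singletons, 0, 3)   # no edge between them

two_triangles = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
q_split = modularity(edges, two_triangles)
```

A greedy agglomerative pass would repeatedly apply the merge with the largest gain. Recomputing Q from scratch, as done here, keeps the sketch short but is slower than updating the gain incrementally.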
See also
* Network topology
* Numerical taxonomy
* Tree structure