Hierarchical clustering of networks
   HOME

TheInfoList



OR:

Hierarchical clustering is one method for finding
community structure In the study of complex networks, a network is said to have community structure if the nodes of the network can be easily grouped into (potentially overlapping) sets of nodes such that each set of nodes is densely connected internally. In the part ...
s in a
network Network, networking and networked may refer to: Science and technology * Network theory, the study of graphs as a representation of relations between discrete objects * Network science, an academic field that studies complex networks Mathematics ...
. The technique arranges the network into a hierarchy of groups according to a specified weight function. The data can then be represented in a tree structure known as a dendrogram. Hierarchical clustering can either be agglomerative or divisive depending on whether one proceeds through the algorithm by adding links to or removing links from the network, respectively. One divisive technique is the
Girvan–Newman algorithm The Girvan–Newman algorithm (named after Michelle Girvan and Mark Newman) is a hierarchical method used to detect communities in complex systems.Girvan M. and Newman M. E. J.Community structure in social and biological networks Proc. Natl. Acad. ...
.


Algorithm

In the hierarchical clustering algorithm, a
weight In science and engineering, the weight of an object is the force acting on the object due to gravity. Some standard textbooks define weight as a vector quantity, the gravitational force acting on the object. Others define weight as a scalar qua ...
W_ is first assigned to each pair of vertices (i,j) in the network. The weight, which can vary depending on implementation (see section below), is intended to indicate how closely related the vertices are. Then, starting with all the nodes in the network disconnected, begin pairing nodes from highest to lowest weight between the pairs (in the divisive case, start from the original network and remove links from lowest to highest weight). As links are added, connected subsets begin to form. These represent the network's community structures. The components at each iterative step are always a subset of other structures. Hence, the subsets can be represented using a tree diagram, or dendrogram. Horizontal slices of the tree at a given level indicate the communities that exist above and below a value of the weight.


Weights

There are many possible weights for use in hierarchical clustering algorithms. The specific weight used is dictated by the data as well as considerations for computational speed. Additionally, the communities found in the network are highly dependent on the choice of weighting function. Hence, when compared to real-world data with a known community structure, the various weighting techniques have been met with varying degrees of success. Two weights that have been used previously with varying success are the number of node-independent paths between each pair of vertices and the total number of paths between vertices weighted by the length of the path. One disadvantage of these weights, however, is that both weighting schemes tend to separate single peripheral vertices from their rightful communities because of the small number of paths going to these vertices. For this reason, their use in hierarchical clustering techniques is far from optimal. Edge
betweenness centrality In graph theory, betweenness centrality (or "betweeness centrality") is a measure of centrality in a graph based on shortest paths. For every pair of vertices in a connected graph, there exists at least one shortest path between the vertices such ...
has been used successfully as a weight in the
Girvan–Newman algorithm The Girvan–Newman algorithm (named after Michelle Girvan and Mark Newman) is a hierarchical method used to detect communities in complex systems.Girvan M. and Newman M. E. J.Community structure in social and biological networks Proc. Natl. Acad. ...
. This technique is similar to a divisive hierarchical clustering algorithm, except the weights are recalculated with each step. The change in modularity of the network with the addition of a node has also been used successfully as a weight. This method provides a computationally less-costly alternative to the Girvan-Newman algorithm while yielding similar results.


See also

*
Network topology Network topology is the arrangement of the elements ( links, nodes, etc.) of a communication network. Network topology can be used to define or describe the arrangement of various types of telecommunication networks, including command and contr ...
*
Numerical taxonomy Numerical taxonomy is a classification system in biological systematics which deals with the grouping by numerical methods of taxonomic units based on their character states. It aims to create a taxonomy using numeric algorithms like cluster ...
*
Tree structure A tree structure, tree diagram, or tree model is a way of representing the hierarchical nature of a structure in a graphical form. It is named a "tree structure" because the classic representation resembles a tree, although the chart is genera ...


References

{{DEFAULTSORT:Hierarchical Clustering Of Networks Graph algorithms Network analysis