HOME

TheInfoList



OR:

The Girvan–Newman algorithm (named after
Michelle Girvan Michelle Girvan (born 1977) is an American physicist and network scientist whose research combines methods from dynamical systems, graph theory, and statistical mechanics and applies them to problems including epidemiology, gene regulation, a ...
and
Mark Newman Mark Newman is an English-American physicist and Anatol Rapoport Distinguished University Professor of Physics at the University of Michigan, as well as an external faculty member of the Santa Fe Institute. He is known for his fundamental co ...
) is a hierarchical method used to detect communities in
complex system A complex system is a system composed of many components which may interact with each other. Examples of complex systems are Earth's global climate, organisms, the human brain, infrastructure such as power grid, transportation or communicatio ...
s.Girvan M. and Newman M. E. J.
Community structure in social and biological networks
Proc. Natl. Acad. Sci. USA 99, 7821–7826 (2002)


Edge betweenness and community structure

The Girvan–Newman algorithm detects communities by progressively removing edges from the original network. The connected components of the remaining network are the communities. Instead of trying to construct a measure that tells us which edges are the most central to communities, the Girvan–Newman algorithm focuses on edges that are most likely "between" communities. Vertex betweenness is an indicator of highly
central Central is an adjective usually referring to being in the center of some place or (mathematical) object. Central may also refer to: Directions and generalised locations * Central Africa, a region in the centre of Africa continent, also known a ...
nodes in networks. For any node i, vertex betweenness is defined as the fraction of shortest paths between pairs of nodes that run through it. It is relevant to models where the network modulates transfer of goods between known start and end nonsense it is, under the assumption that such transfer seeks the shortest available route. The Girvan–Newman algorithm extends this definition to the case of edges, defining the "edge betweenness" of an edge as the number of shortest paths between pairs of nodes that run along it. If there is more than one shortest path between a pair of nodes, each path is assigned equal weight such that the total weight of all of the paths is equal to unity. If a network contains communities or groups that are only loosely connected by a few inter-group edges, then all shortest paths between different communities must go along one of these few edges. Thus, the edges connecting communities will have high edge betweenness (at least one of them). By removing these edges, the groups are separated from one another and so the underlying community structure of the network is revealed. The algorithm's steps for community detection are summarized below # The betweenness of all existing edges in the network is calculated first. # The edge(s) with the highest betweenness are removed. # The betweenness of all edges affected by the removal is recalculated. # Steps 2 and 3 are repeated until no edges remain. The fact that the only betweennesses being recalculated are only the ones which are affected by the removal, may lessen the running time of the process' simulation in computers. However, the betweenness centrality must be recalculated with each step, or severe errors occur. The reason is that the network adapts itself to the new conditions set after the edge removal. For instance, if two communities are connected by more than one edge, then there is no guarantee that all of these edges will have high betweenness. According to the method, we know that at least one of them will have, but nothing more than that is known. By recalculating betweennesses after the removal of each edge, it is ensured that at least one of the remaining edges between two communities will always have a high value. The end result of the Girvan–Newman algorithm is a
dendrogram A dendrogram is a diagram representing a tree. This diagrammatic representation is frequently used in different contexts: * in hierarchical clustering, it illustrates the arrangement of the clusters produced by the corresponding analyses. ...
. As the Girvan–Newman algorithm runs, the dendrogram is produced from the top down (i.e. the network splits up into different communities with the successive removal of links). The leaves of the dendrogram are individual nodes.


See also

* Closeness *
Hierarchical clustering In data mining and statistics, hierarchical clustering (also called hierarchical cluster analysis or HCA) is a method of cluster analysis that seeks to build a hierarchy of clusters. Strategies for hierarchical clustering generally fall into tw ...
*
Modularity Broadly speaking, modularity is the degree to which a system's components may be separated and recombined, often with the benefit of flexibility and variety in use. The concept of modularity is used primarily to reduce complexity by breaking a s ...


References

{{DEFAULTSORT:Girvan-Newman algorithm Graph algorithms Networks Network analysis