In
graph theory
In mathematics, graph theory is the study of ''graphs'', which are mathematical structures used to model pairwise relations between objects. A graph in this context is made up of '' vertices'' (also called ''nodes'' or ''points'') which are conne ...
and
network analysis Network analysis can refer to:
* Network theory, the analysis of relations through mathematical graphs
** Social network analysis, network theory applied to social relations
* Network analysis (electrical circuits)
See also
*Network planning and ...
, indicators of centrality assign numbers or rankings to
nodes
In general, a node is a localized swelling (a "knot") or a point of intersection (a Vertex (graph theory), vertex).
Node may refer to:
In mathematics
*Vertex (graph theory), a vertex in a mathematical graph
*Vertex (geometry), a point where two ...
within a graph corresponding to their network position. Applications include identifying the most influential person(s) in a
social network
A social network is a social structure made up of a set of social actors (such as individuals or organizations), sets of dyadic ties, and other social interactions between actors. The social network perspective provides a set of methods for an ...
, key infrastructure nodes in the
Internet
The Internet (or internet) is the global system of interconnected computer networks that uses the Internet protocol suite (TCP/IP) to communicate between networks and devices. It is a '' network of networks'' that consists of private, pub ...
or
urban network
, also referred to as , is one of the Japan Railways Group (JR Group) companies and operates in western Honshu. It has its headquarters in Kita-ku, Osaka. It is listed in the Tokyo Stock Exchange, is a constituent of the TOPIX Large70 index, and ...
s,
super-spreader
A superspreading event (SSEV) is an event in which an infectious disease is spread much more than usual, while an unusually contagious organism infection, infected with a pathogen, disease is known as a superspreader. In the human-to-human transmi ...
s of disease, and brain networks.
Centrality concepts were first developed in
social network analysis
Social network analysis (SNA) is the process of investigating social structures through the use of networks and graph theory. It characterizes networked structures in terms of ''nodes'' (individual actors, people, or things within the network) a ...
, and many of the terms used to measure centrality reflect their
sociological
Sociology is a social science that focuses on society, human social behavior, patterns of social relationships, social interaction, and aspects of culture associated with everyday life. It uses various methods of empirical investigation and ...
origin.
[Newman, M.E.J. 2010. ''Networks: An Introduction.'' Oxford, UK: Oxford University Press.]
Definition and characterization of centrality indices
Centrality indices are answers to the question "What characterizes an important vertex?" The answer is given in terms of a real-valued function on the vertices of a graph, where the values produced are expected to provide a ranking which identifies the most important nodes.
The word "importance" has a wide number of meanings, leading to many different definitions of centrality. Two categorization schemes have been proposed. "Importance" can be conceived in relation to a type of flow or transfer across the network. This allows centralities to be classified by the type of flow they consider important.
[ "Importance" can alternatively be conceived as involvement in the cohesiveness of the network. This allows centralities to be classified based on how they measure cohesiveness.] Both of these approaches divide centralities in distinct categories. A further conclusion is that a centrality which is appropriate for one category will often "get it wrong" when applied to a different category.[
Many, though not all, centrality measures effectively count the number of paths (also called walks) of some type going through a given vertex; the measures differ in how the relevant walks are defined and counted. Restricting consideration to this group allows for taxonomy which places many centralities on a spectrum from those concerned with walks of length one (]degree centrality
In graph theory and network theory, network analysis, indicators of centrality assign numbers or rankings to vertex (graph theory), nodes within a graph corresponding to their network position. Applications include identifying the most influent ...
) to infinite walks (eigenvector centrality
In graph theory, eigenvector centrality (also called eigencentrality or prestige score) is a measure of the influence of a node in a network. Relative scores are assigned to all nodes in the network based on the concept that connections to high-sco ...
). Other centrality measures, such as betweenness centrality
In graph theory, betweenness centrality (or "betweeness centrality") is a measure of centrality in a graph based on shortest paths. For every pair of vertices in a connected graph, there exists at least one shortest path between the vertices such ...
focus not just on overall connectedness but occupying positions that are pivotal to the network's connectivity.
Characterization by network flows
A network can be considered a description of the paths along which something flows. This allows a characterization based on the type of flow and the type of path encoded by the centrality. A flow can be based on transfers, where each indivisible item goes from one node to another, like a package delivery going from the delivery site to the client's house. A second case is serial duplication, in which an item is replicated so that both the source and the target have it. An example is the propagation of information through gossip, with the information being propagated in a private way and with both the source and the target nodes being informed at the end of the process. The last case is parallel duplication, with the item being duplicated to several links at the same time, like a radio broadcast which provides the same information to many listeners at once.[
Likewise, the type of path can be constrained to ]geodesics
In geometry, a geodesic () is a curve representing in some sense the shortest path ( arc) between two points in a surface, or more generally in a Riemannian manifold. The term also has meaning in any differentiable manifold with a connection. ...
(shortest paths), paths (no vertex is visited more than once), trails
A trail, also known as a path or track, is an unpaved lane or small road usually passing through a natural area. In the United Kingdom and the Republic of Ireland, a path or footpath is the preferred term for a pedestrian or hiking trail. The ...
(vertices can be visited multiple times, no edge is traversed more than once), or walks (vertices and edges can be visited/traversed multiple times).[
]
Characterization by walk structure
An alternative classification can be derived from how the centrality is constructed. This again splits into two classes. Centralities are either ''radial'' or ''medial.'' Radial centralities count walks which start/end from the given vertex. The degree
Degree may refer to:
As a unit of measurement
* Degree (angle), a unit of angle measurement
** Degree of geographical latitude
** Degree of geographical longitude
* Degree symbol (°), a notation used in science, engineering, and mathematics
...
and eigenvalue
In linear algebra, an eigenvector () or characteristic vector of a linear transformation is a nonzero vector that changes at most by a scalar factor when that linear transformation is applied to it. The corresponding eigenvalue, often denoted b ...
centralities are examples of radial centralities, counting the number of walks of length one or length infinity. Medial centralities count walks which pass through the given vertex. The canonical example is Freeman's betweenness
Betweenness is an algorithmic problem in order theory about ordering a collection of items subject to constraints that some items must be placed between others.. It has applications in bioinformatics. and was shown to be NP-complete by .
Problem ...
centrality, the number of shortest paths which pass through the given vertex.[
Likewise, the counting can capture either the ''volume'' or the ''length'' of walks. Volume is the total number of walks of the given type. The three examples from the previous paragraph fall into this category. Length captures the distance from the given vertex to the remaining vertices in the graph. Closeness centrality, the total geodesic distance from a given vertex to all other vertices, is the best known example.][ Note that this classification is independent of the type of walk counted (i.e. walk, trail, path, geodesic).
Borgatti and Everett propose that this typology provides insight into how best to compare centrality measures. Centralities placed in the same box in this 2×2 classification are similar enough to make plausible alternatives; one can reasonably compare which is better for a given application. Measures from different boxes, however, are categorically distinct. Any evaluation of relative fitness can only occur within the context of predetermining which category is more applicable, rendering the comparison moot.][
]
Radial-volume centralities exist on a spectrum
The characterization by walk structure shows that almost all centralities in wide use are radial-volume measures. These encode the belief that a vertex's centrality is a function of the centrality of the vertices it is associated with. Centralities distinguish themselves on how association is defined.
Bonacich showed that if association is defined in terms of walks, then a family of centralities can be defined based on the length of walk considered. Degree centrality
In graph theory and network theory, network analysis, indicators of centrality assign numbers or rankings to vertex (graph theory), nodes within a graph corresponding to their network position. Applications include identifying the most influent ...
counts walks of length one, while eigenvalue centrality counts walks of length infinity. Alternative definitions of association are also reasonable. Alpha centrality
In graph theory and social network analysis, alpha centrality is an alternative name for Katz centrality. It is a measure of centrality of nodes within a graph. It is an adaptation of eigenvector centrality with the addition that nodes are imbued ...
allows vertices to have an external source of influence. Estrada's subgraph centrality proposes only counting closed paths (triangles, squares, etc.).
The heart of such measures is the observation that powers of the graph's adjacency matrix gives the number of walks of length given by that power. Similarly, the matrix exponential is also closely related to the number of walks of a given length. An initial transformation of the adjacency matrix allows a different definition of the type of walk counted. Under either approach, the centrality of a vertex can be expressed as an infinite sum, either
:
for matrix powers or
:
for matrix exponentials, where
* is walk length,
* is the transformed adjacency matrix, and
* is a discount parameter which ensures convergence of the sum.
Bonacich's family of measures does not transform the adjacency matrix. Alpha centrality
In graph theory and social network analysis, alpha centrality is an alternative name for Katz centrality. It is a measure of centrality of nodes within a graph. It is an adaptation of eigenvector centrality with the addition that nodes are imbued ...
replaces the adjacency matrix with its resolvent. Subgraph centrality replaces the adjacency matrix with its trace. A startling conclusion is that regardless of the initial transformation of the adjacency matrix, all such approaches have common limiting behavior. As approaches zero, the indices converge to degree centrality
In graph theory and network theory, network analysis, indicators of centrality assign numbers or rankings to vertex (graph theory), nodes within a graph corresponding to their network position. Applications include identifying the most influent ...
. As approaches its maximal value, the indices converge to eigenvalue centrality.[
]
Game-theoretic centrality
The common feature of most of the aforementioned standard measures is that they assess the
importance of a node by focusing only on the role that a node plays by itself. However,
in many applications such an approach is inadequate because of synergies that may occur
if the functioning of nodes is considered in groups.
For example, consider the problem of stopping an epidemic. Looking at above image of network, which nodes should we vaccinate? Based on previously described measures, we want to recognize nodes that are the most important in disease spreading. Approaches based only on centralities, that focus on individual features of nodes, may not be good idea. Nodes in the red square, individually cannot stop disease spreading, but considering them as a group, we clearly see that they can stop disease if it has started in nodes , , and . Game-theoretic centralities try to consult described problems and opportunities, using tools from game-theory. The approach proposed in uses the Shapley value
The Shapley value is a solution concept in cooperative game theory. It was named in honor of Lloyd Shapley, who introduced it in 1951 and won the Nobel Memorial Prize in Economic Sciences for it in 2012. To each cooperative game it assigns a uniq ...
. Because of the time-complexity hardness of the Shapley value calculation, most efforts in this domain are driven into implementing new algorithms and methods which rely on a peculiar topology of the network or a special character of the problem. Such an approach may lead to reducing time-complexity from exponential to polynomial.
Similarly, the solution concept authority distribution
The solution concept authority distribution was formulated by Lloyd Shapley and his student X. Hu in 2003 to measure the authority power of players in a well-contracted organization. The index generates the Shapley-Shubik power index and can be u ...
() applies the Shapley-Shubik power index, rather than the Shapley value
The Shapley value is a solution concept in cooperative game theory. It was named in honor of Lloyd Shapley, who introduced it in 1951 and won the Nobel Memorial Prize in Economic Sciences for it in 2012. To each cooperative game it assigns a uniq ...
, to measure the bilateral direct influence between the players. The distribution is indeed a type of eigenvector centrality. It is used to sort big data objects in Hu (2020), such as ranking U.S. colleges.
Important limitations
Centrality indices have two important limitations, one obvious and the other subtle. The obvious limitation is that a centrality which is optimal for one application is often sub-optimal for a different application. Indeed, if this were not so, we would not need so many different centralities. An illustration of this phenomenon is provided by the Krackhardt kite graph
In graph theory, the Krackhardt kite graph is a simple graph with ten nodes. The graph is named after David Krackhardt, a researcher of social network theory.
Krackhardt introduced the graph in 1990 to distinguish different concepts of centrality. ...
, for which three different notions of centrality give three different choices of the most central vertex.
The more subtle limitation is the commonly held fallacy that vertex centrality indicates the relative importance of vertices. Centrality indices are explicitly designed to produce a ranking which allows indication of the most important vertices.[ This they do well, under the limitation just noted. They are not designed to measure the influence of nodes in general. Recently, network physicists have begun developing ]node influence metric In graph theory and network analysis, node influence metrics are measures that rank or quantify the influence of every node (also called vertex) within a graph. They are related to centrality indices. Applications include measuring the influence o ...
s to address this problem.
The error is two-fold. Firstly, a ranking only orders vertices by importance, it does not quantify the difference in importance between different levels of the ranking. This may be mitigated by applying Freeman centralization
Freeman, free men, or variant, may refer to:
* a member of the Third Estate in medieval society (commoners), see estates of the realm
* Freeman, an apprentice who has been granted freedom of the company, was a rank within Livery companies
* Free ...
to the centrality measure in question, which provide some insight to the importance of nodes depending on the differences of their centralization scores. Furthermore, Freeman centralization enables one to compare several networks by comparing their highest centralization scores. This approach, however, is seldom seen in practice.
Secondly, the features which (correctly) identify the most important vertices in a given network/application do not necessarily generalize to the remaining vertices.
For the majority of other network nodes the rankings may be meaningless. This explains why, for example, only the first few results of a Google image search appear in a reasonable order. The pagerank is a highly unstable measure, showing frequent rank reversals after small adjustments of the jump parameter.
While the failure of centrality indices to generalize to the rest of the network may at first seem counter-intuitive, it follows directly from the above definitions.
Complex networks have heterogeneous topology. To the extent that the optimal measure depends on the network structure of the most important vertices, a measure which is optimal for such vertices is sub-optimal for the remainder of the network.
Degree centrality
Historically first and conceptually simplest is degree centrality, which is defined as the number of links incident upon a node (i.e., the number of ties that a node has). The degree can be interpreted in terms of the immediate risk of a node for catching whatever is flowing through the network (such as a virus, or some information). In the case of a directed network (where ties have direction), we usually define two separate measures of degree centrality, namely indegree
In mathematics, and more specifically in graph theory, a directed graph (or digraph) is a graph that is made up of a set of vertices connected by directed edges, often called arcs.
Definition
In formal terms, a directed graph is an ordered pa ...
and outdegree
In mathematics, and more specifically in graph theory, a directed graph (or digraph) is a graph that is made up of a set of vertices connected by directed edges, often called arcs.
Definition
In formal terms, a directed graph is an ordered pai ...
. Accordingly, indegree is a count of the number of ties directed to the node and outdegree is the number of ties that the node directs to others. When ties are associated to some positive aspects such as friendship or collaboration, indegree is often interpreted as a form of popularity, and outdegree as gregariousness.
The degree centrality of a vertex , for a given graph with vertices and edges, is defined as
:
Calculating degree centrality for all the nodes in a graph takes in a dense
Density (volumetric mass density or specific mass) is the substance's mass per unit of volume. The symbol most often used for density is ''ρ'' (the lower case Greek letter rho), although the Latin letter ''D'' can also be used. Mathematically ...
adjacency matrix
In graph theory and computer science, an adjacency matrix is a square matrix used to represent a finite graph. The elements of the matrix indicate whether pairs of vertices are adjacent or not in the graph.
In the special case of a finite simp ...
representation of the graph, and for edges takes in a sparse matrix
In numerical analysis and scientific computing, a sparse matrix or sparse array is a matrix in which most of the elements are zero. There is no strict definition regarding the proportion of zero-value elements for a matrix to qualify as sparse b ...
representation.
The definition of centrality on the node level can be extended to the whole graph, in which case we are speaking of ''graph centralization''. Let be the node with highest degree centrality in . Let be the -node connected graph that maximizes the following quantity (with being the node with highest degree centrality in ):
: