In graph theory and network analysis, indicators of centrality assign numbers or rankings to nodes within a graph corresponding to their network position. Applications include identifying the most influential person(s) in a social network, key infrastructure nodes in the Internet or urban networks, super-spreaders of disease, and brain networks. Centrality concepts were first developed in social network analysis, and many of the terms used to measure centrality reflect their sociological origin (Newman, M.E.J. 2010. ''Networks: An Introduction.'' Oxford, UK: Oxford University Press).

Definition and characterization of centrality indices

Centrality indices are answers to the question "What characterizes an important vertex?" The answer is given in terms of a real-valued function on the vertices of a graph, where the values produced are expected to provide a ranking which identifies the most important nodes. The word "importance" has a wide range of meanings, leading to many different definitions of centrality. Two categorization schemes have been proposed. "Importance" can be conceived in relation to a type of flow or transfer across the network. This allows centralities to be classified by the type of flow they consider important. "Importance" can alternatively be conceived as involvement in the cohesiveness of the network. This allows centralities to be classified based on how they measure cohesiveness. Both of these approaches divide centralities into distinct categories. A further conclusion is that a centrality which is appropriate for one category will often "get it wrong" when applied to a different category.

Many, though not all, centrality measures effectively count the number of paths (also called walks) of some type going through a given vertex; the measures differ in how the relevant walks are defined and counted. Restricting consideration to this group allows for a taxonomy which places many centralities on a spectrum from those concerned with walks of length one (degree centrality) to infinite walks (eigenvector centrality). Other centrality measures, such as betweenness centrality, focus not just on overall connectedness but on occupying positions that are pivotal to the network's connectivity.

Characterization by network flows

A network can be considered a description of the paths along which something flows. This allows a characterization based on the type of flow and the type of path encoded by the centrality. A flow can be based on transfers, where each indivisible item goes from one node to another, like a package delivery going from the delivery site to the client's house. A second case is serial duplication, in which an item is replicated so that both the source and the target have it. An example is the propagation of information through gossip, with the information being propagated in a private way and with both the source and the target nodes being informed at the end of the process. The last case is parallel duplication, with the item being duplicated to several links at the same time, like a radio broadcast which provides the same information to many listeners at once.

Likewise, the type of path can be constrained to geodesics (shortest paths), paths (no vertex is visited more than once), trails (vertices can be visited multiple times, no edge is traversed more than once), or walks (vertices and edges can be visited/traversed multiple times).

Characterization by walk structure

An alternative classification can be derived from how the centrality is constructed. This again splits into two classes. Centralities are either ''radial'' or ''medial.'' Radial centralities count walks which start or end at the given vertex. The degree and eigenvalue centralities are examples of radial centralities, counting the number of walks of length one or length infinity. Medial centralities count walks which pass through the given vertex. The canonical example is Freeman's betweenness centrality, the number of shortest paths which pass through the given vertex.

Likewise, the counting can capture either the ''volume'' or the ''length'' of walks. Volume is the total number of walks of the given type. The three examples from the previous paragraph fall into this category. Length captures the distance from the given vertex to the remaining vertices in the graph. Closeness centrality, the total geodesic distance from a given vertex to all other vertices, is the best known example. Note that this classification is independent of the type of walk counted (i.e. walk, trail, path, geodesic).

Borgatti and Everett propose that this typology provides insight into how best to compare centrality measures. Centralities placed in the same box in this 2×2 classification are similar enough to make plausible alternatives; one can reasonably compare which is better for a given application. Measures from different boxes, however, are categorically distinct. Any evaluation of relative fitness can only occur within the context of predetermining which category is more applicable, rendering the comparison moot.

Radial-volume centralities exist on a spectrum

The characterization by walk structure shows that almost all centralities in wide use are radial-volume measures. These encode the belief that a vertex's centrality is a function of the centrality of the vertices it is associated with. Centralities distinguish themselves on how association is defined. Bonacich showed that if association is defined in terms of walks, then a family of centralities can be defined based on the length of walk considered. Degree centrality counts walks of length one, while eigenvalue centrality counts walks of length infinity. Alternative definitions of association are also reasonable. Alpha centrality allows vertices to have an external source of influence. Estrada's subgraph centrality proposes only counting closed paths (triangles, squares, etc.).

The heart of such measures is the observation that powers of the graph's adjacency matrix give the number of walks of length given by that power. Similarly, the matrix exponential is also closely related to the number of walks of a given length. An initial transformation of the adjacency matrix allows a different definition of the type of walk counted. Under either approach, the centrality of a vertex can be expressed as an infinite sum, either

\sum_{k=0}^\infty A_R^k \beta^k

for matrix powers or

\sum_{k=0}^\infty \frac{(A_R \beta)^k}{k!}

for matrix exponentials, where
* k is walk length,
* A_R is the transformed adjacency matrix, and
* β is a discount parameter which ensures convergence of the sum.

Bonacich's family of measures does not transform the adjacency matrix. Alpha centrality replaces the adjacency matrix with its resolvent. Subgraph centrality replaces the adjacency matrix with its trace. A startling conclusion is that regardless of the initial transformation of the adjacency matrix, all such approaches have common limiting behavior. As β approaches zero, the indices converge to degree centrality. As β approaches its maximal value, the indices converge to eigenvalue centrality.
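As a rough illustration of the walk-counting observation, the sketch below multiplies a small adjacency matrix by itself and then forms a truncated version of the discounted matrix-power sum. The three-vertex path graph, the helper names (`mat_mul`, `walk_counts`, `discounted_walk_scores`) and the cut-off `max_len` are assumptions made for the example; the untransformed adjacency matrix is used, and the infinite sum is only approximated.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def walk_counts(adj, length):
    """Return A^length; entry [i][j] counts walks of that length from i to j."""
    n = len(adj)
    result = [[int(i == j) for j in range(n)] for i in range(n)]  # identity
    for _ in range(length):
        result = mat_mul(result, adj)
    return result


def discounted_walk_scores(adj, beta, max_len):
    """Truncated sum over k = 0..max_len of beta^k * (row sums of A^k).

    Truncation is an assumption of this sketch; the full sum only
    converges when beta is small enough.
    """
    n = len(adj)
    scores = [0.0] * n
    power = [[int(i == j) for j in range(n)] for i in range(n)]
    for k in range(max_len + 1):
        for i in range(n):
            scores[i] += (beta ** k) * sum(power[i])
        power = mat_mul(power, adj)
    return scores


# Hypothetical example: the path graph 0 - 1 - 2.
PATH = [[0, 1, 0],
        [1, 0, 1],
        [0, 1, 0]]

# Row sums of A^1 are the degrees (walks of length one from each vertex).
degree = [sum(row) for row in walk_counts(PATH, 1)]
```

With a small β the score ordering follows the degrees, which matches the limiting behavior described above.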

Game-theoretic centrality

The common feature of most of the aforementioned standard measures is that they assess the importance of a node by focusing only on the role that a node plays by itself. However, in many applications such an approach is inadequate because of synergies that may occur if the functioning of nodes is considered in groups.

For example, consider the problem of stopping an epidemic. Looking at the above image of the network, which nodes should we vaccinate? Based on the previously described measures, we want to recognize the nodes that are the most important in disease spreading. Approaches based only on centralities, which focus on individual features of nodes, may not be a good idea. Nodes in the red square individually cannot stop the disease from spreading, but considering them as a group, we clearly see that they can stop the disease if it has started in nodes v_1, v_4, and v_5. Game-theoretic centralities try to address these problems and opportunities using tools from game theory. One proposed approach uses the Shapley value. Because computing the Shapley value is computationally hard, most efforts in this domain go into developing new algorithms and methods which rely on a particular topology of the network or a special character of the problem. Such an approach may reduce the time complexity from exponential to polynomial.

Similarly, the solution concept authority distribution applies the Shapley-Shubik power index, rather than the Shapley value, to measure the bilateral direct influence between the players. The distribution is indeed a type of eigenvector centrality. It is used to sort big data objects in Hu (2020), such as ranking U.S. colleges.

Important limitations

Centrality indices have two important limitations, one obvious and the other subtle. The obvious limitation is that a centrality which is optimal for one application is often sub-optimal for a different application. Indeed, if this were not so, we would not need so many different centralities. An illustration of this phenomenon is provided by the Krackhardt kite graph, for which three different notions of centrality give three different choices of the most central vertex.

The more subtle limitation is the commonly held fallacy that vertex centrality indicates the relative importance of vertices. Centrality indices are explicitly designed to produce a ranking which allows indication of the most important vertices. This they do well, under the limitation just noted. They are not designed to measure the influence of nodes in general. Recently, network physicists have begun developing node influence metrics to address this problem.

The error is two-fold. Firstly, a ranking only orders vertices by importance; it does not quantify the difference in importance between different levels of the ranking. This may be mitigated by applying Freeman centralization to the centrality measure in question, which provides some insight into the importance of nodes depending on the differences of their centralization scores. Furthermore, Freeman centralization enables one to compare several networks by comparing their highest centralization scores. This approach, however, is seldom seen in practice.

Secondly, the features which (correctly) identify the most important vertices in a given network/application do not necessarily generalize to the remaining vertices. For the majority of other network nodes the rankings may be meaningless. This explains why, for example, only the first few results of a Google image search appear in a reasonable order. PageRank is a highly unstable measure, showing frequent rank reversals after small adjustments of the jump parameter.

While the failure of centrality indices to generalize to the rest of the network may at first seem counter-intuitive, it follows directly from the above definitions. Complex networks have heterogeneous topology. To the extent that the optimal measure depends on the network structure of the most important vertices, a measure which is optimal for such vertices is sub-optimal for the remainder of the network.

Degree centrality

Historically first and conceptually simplest is degree centrality, which is defined as the number of links incident upon a node (i.e., the number of ties that a node has). The degree can be interpreted in terms of the immediate risk of a node for catching whatever is flowing through the network (such as a virus, or some information). In the case of a directed network (where ties have direction), we usually define two separate measures of degree centrality, namely indegree and outdegree. Accordingly, indegree is a count of the number of ties directed to the node and outdegree is the number of ties that the node directs to others. When ties are associated with some positive aspect such as friendship or collaboration, indegree is often interpreted as a form of popularity, and outdegree as gregariousness.

The degree centrality of a vertex v, for a given graph G := (V, E) with |V| vertices and |E| edges, is defined as

C_D(v) = \deg(v)

Calculating degree centrality for all the nodes in a graph takes Θ(V^2) in a dense adjacency matrix representation of the graph, and Θ(E) in a sparse matrix representation, where E is the number of edges.

The definition of centrality on the node level can be extended to the whole graph, in which case we are speaking of ''graph centralization''. Let v* be the node with highest degree centrality in G. Let X := (Y, Z) be the |Y|-node connected graph that maximizes the following quantity (with y* being the node with highest degree centrality in X):

H = \sum_{j=1}^{|Y|} \left[ C_D(y^*) - C_D(y_j) \right]

Correspondingly, the degree centralization of the graph G is as follows:

C_D(G) = \frac{\sum_{i=1}^{|V|} \left[ C_D(v^*) - C_D(v_i) \right]}{H}

The value of H is maximized when the graph X contains one central node to which all other nodes are connected (a star graph), and in this case

H = (n-1) \cdot ((n-1)-1) = n^2 - 3n + 2.

So, for any graph G := (V, E),

C_D(G) = \frac{\sum_{i=1}^{|V|} \left[ C_D(v^*) - C_D(v_i) \right]}{|V|^2 - 3|V| + 2}

Also, a new extensive global measure for degree centrality named Tendency to Make Hub (TMH) is defined as follows:

\text{TMH} = \frac{\sum_{i=1}^{|V|} \deg(v_i)^2}{\sum_{i=1}^{|V|} \deg(v_i)}

where TMH increases with the appearance of degree centrality in the network.
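The degree and degree-centralization definitions above can be sketched in a few lines. The adjacency-set representation, the function names and the two four-vertex example graphs are assumptions chosen for illustration; the graph is taken to be simple and undirected.

```python
def degree_centrality(adj_list):
    """Map each vertex to its degree (number of incident edges)."""
    return {v: len(neighbors) for v, neighbors in adj_list.items()}


def degree_centralization(adj_list):
    """Freeman degree centralization, normalised by the star-graph maximum.

    Uses the denominator n^2 - 3n + 2 from the text, so it needs n >= 3.
    """
    n = len(adj_list)
    if n < 3:
        raise ValueError("centralization needs at least three vertices")
    degrees = degree_centrality(adj_list)
    max_degree = max(degrees.values())
    numerator = sum(max_degree - d for d in degrees.values())
    return numerator / (n * n - 3 * n + 2)


# Hypothetical examples: a star (vertex 0 is the hub) and a 4-cycle.
STAR = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}
CYCLE = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {2, 0}}

star_score = degree_centralization(STAR)    # the star attains the maximum
cycle_score = degree_centralization(CYCLE)  # all degrees equal
```

The star reaches the maximum value because it is exactly the graph that maximizes H, while a regular graph such as the cycle scores zero.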

Closeness centrality

In a connected graph, the normalized closeness centrality (or closeness) of a node is the reciprocal of the average length of the shortest path between the node and all other nodes in the graph. Thus the more central a node is, the closer it is to all other nodes. Closeness was defined by Alex Bavelas (1950) as the reciprocal of the farness, that is

C_B(v) = \left( \sum_u d(u,v) \right)^{-1}

where d(u,v) is the distance between vertices ''u'' and ''v''. However, when speaking of closeness centrality, people usually refer to its normalized form, given by the previous formula multiplied by N-1, where N is the number of nodes in the graph:

C(v) = \frac{N-1}{\sum_u d(u,v)}.

This normalisation allows comparisons between nodes of graphs of different sizes. For many graphs, there is a strong correlation between the inverse of closeness and the logarithm of degree,

(C(v))^{-1} \approx -\alpha \ln(k_v) + \beta

where k_v is the degree of vertex ''v'' while α and β are constants for each network.

Taking distances ''from'' or ''to'' all other nodes is irrelevant in undirected graphs, whereas it can produce totally different results in directed graphs (e.g. a website can have a high closeness centrality from outgoing links, but low closeness centrality from incoming links).
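A minimal sketch of normalized closeness for an unweighted, connected graph, using breadth-first search to obtain the distances. The adjacency-set input, the function names and the path-graph example are assumptions for illustration; for a directed graph this version measures distances ''from'' the chosen vertex along outgoing links.

```python
from collections import deque


def bfs_distances(adj_list, source):
    """Hop distances from source to every reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj_list[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def closeness_centrality(adj_list, v):
    """Normalized closeness (N - 1) / sum of distances from v."""
    n = len(adj_list)
    if n < 2:
        raise ValueError("closeness needs at least two vertices")
    dist = bfs_distances(adj_list, v)
    if len(dist) != n:
        raise ValueError("this sketch assumes every vertex is reachable")
    farness = sum(dist.values())
    return (n - 1) / farness


# Hypothetical example: the path graph 0 - 1 - 2.
PATH = {0: {1}, 1: {0, 2}, 2: {1}}
middle = closeness_centrality(PATH, 1)
end = closeness_centrality(PATH, 0)
```

The middle vertex is at distance one from both others, so its farness is 2 and its normalized closeness is 2/2; an end vertex has farness 3.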

Harmonic centrality

In a (not necessarily connected) graph, the harmonic centrality reverses the sum and reciprocal operations in the definition of closeness centrality:

H(v) = \sum_{u \neq v} \frac{1}{d(u,v)}

where 1/d(u,v) = 0 if there is no path from ''u'' to ''v''. Harmonic centrality can be normalized by dividing by N-1, where N is the number of nodes in the graph. Harmonic centrality was proposed by Marchiori and Latora (2000) and then independently by Dekker (2005), using the name "valued centrality," and by Rochat (2009).
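The following sketch computes harmonic centrality on an unweighted, undirected graph that need not be connected. Unreachable vertices are simply never visited by the search, which has the same effect as treating 1/d(u,v) as 0. The function name, the `normalized` flag and the example graph with an isolated vertex are assumptions for illustration.

```python
from collections import deque


def harmonic_centrality(adj_list, v, normalized=False):
    """Sum of 1/d(u, v) over all u != v reachable from v (undirected graph)."""
    dist = {v: 0}
    queue = deque([v])
    while queue:
        u = queue.popleft()
        for w in adj_list[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    # Vertices missing from dist are unreachable and contribute 0.
    total = sum(1.0 / d for u, d in dist.items() if u != v)
    if normalized and len(adj_list) > 1:
        total /= len(adj_list) - 1
    return total


# Hypothetical example: path 0 - 1 - 2 plus an isolated vertex 3.
GRAPH = {0: {1}, 1: {0, 2}, 2: {1}, 3: set()}
h_middle = harmonic_centrality(GRAPH, 1)
h_end = harmonic_centrality(GRAPH, 0)
h_isolated = harmonic_centrality(GRAPH, 3)
```

Unlike closeness, the measure stays finite and well defined on this disconnected graph.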

Betweenness centrality

Betweenness is a centrality measure of a vertex within a graph (there is also edge betweenness, which is not discussed here). Betweenness centrality quantifies the number of times a node acts as a bridge along the shortest path between two other nodes. It was introduced by Linton Freeman as a measure for quantifying the control of a human on the communication between other humans in a social network. In his conception, vertices that have a high probability to occur on a randomly chosen shortest path between two randomly chosen vertices have a high betweenness.

The betweenness of a vertex v in a graph G := (V, E) with |V| vertices is computed as follows:
1. For each pair of vertices (''s'',''t''), compute the shortest paths between them.
2. For each pair of vertices (''s'',''t''), determine the fraction of shortest paths that pass through the vertex in question (here, vertex ''v'').
3. Sum this fraction over all pairs of vertices (''s'',''t'').

More compactly the betweenness can be represented as:

C_B(v) = \sum_{s \neq v \neq t \in V} \frac{\sigma_{st}(v)}{\sigma_{st}}

where σ_st is the total number of shortest paths from node s to node t and σ_st(v) is the number of those paths that pass through v. The betweenness may be normalised by dividing through the number of pairs of vertices not including ''v'', which for directed graphs is (n-1)(n-2) and for undirected graphs is (n-1)(n-2)/2. For example, in an undirected star graph, the center vertex (which is contained in every possible shortest path) would have a betweenness of (n-1)(n-2)/2 (1, if normalised) while the leaves (which are contained in no shortest paths) would have a betweenness of 0.

From a calculation aspect, both betweenness and closeness centralities of all vertices in a graph involve calculating the shortest paths between all pairs of vertices on a graph, which requires O(V^3) time with the Floyd–Warshall algorithm. However, on sparse graphs, Johnson's algorithm may be more efficient, taking O(V^2 log V + VE) time. In the case of unweighted graphs the calculations can be done with Brandes' algorithm which takes O(VE) time. Normally, these algorithms assume that graphs are undirected and connected with the allowance of loops and multiple edges. When specifically dealing with network graphs, often graphs are without loops or multiple edges to maintain simple relationships (where edges represent connections between two people or vertices). In this case, using Brandes' algorithm will divide final centrality scores by 2 to account for each shortest path being counted twice.
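The sketch below follows the usual outline of Brandes' algorithm for an unweighted, undirected simple graph: one breadth-first search per source vertex, followed by a backward accumulation of pair dependencies, with the final halving mentioned above. The adjacency-set input, the function name and the five-vertex star example are assumptions made for illustration.

```python
from collections import deque


def betweenness_centrality(adj_list):
    """Unnormalized betweenness for an undirected, unweighted simple graph."""
    bc = {v: 0.0 for v in adj_list}
    for s in adj_list:
        order = []                              # vertices by increasing distance
        preds = {v: [] for v in adj_list}       # shortest-path predecessors
        sigma = {v: 0 for v in adj_list}        # number of shortest paths from s
        dist = {v: -1 for v in adj_list}
        sigma[s] = 1
        dist[s] = 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj_list[v]:
                if dist[w] < 0:                 # first time w is reached
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:      # v lies on a shortest path to w
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj_list}
        while order:                            # farthest vertices first
            w = order.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair (s, t) was counted from both ends.
    return {v: score / 2.0 for v, score in bc.items()}


# Hypothetical example: a star with hub 0 and four leaves (n = 5).
STAR = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
scores = betweenness_centrality(STAR)
n = len(STAR)
normalized_hub = scores[0] / ((n - 1) * (n - 2) / 2)
```

For the star, the hub score equals (n-1)(n-2)/2 and normalizes to 1, while every leaf scores 0, in line with the example in the text.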

Eigenvector centrality

Eigenvector centrality (also called eigencentrality) is a measure of the influence of a node in a network. It assigns relative scores to all nodes in the network based on the concept that connections to high-scoring nodes contribute more to the score of the node in question than equal connections to low-scoring nodes. Google's PageRank and the Katz centrality are variants of the eigenvector centrality.

Using the adjacency matrix to find eigenvector centrality

For a given graph G := (V, E) with |V| vertices, let A = (a_{v,t}) be the adjacency matrix, i.e. a_{v,t} = 1 if vertex v is linked to vertex t, and a_{v,t} = 0 otherwise. The relative centrality score of vertex v can be defined as:

x_v = \frac{1}{\lambda} \sum_{t \in M(v)} x_t = \frac{1}{\lambda} \sum_{t \in V} a_{v,t} x_t

where M(v) is the set of the neighbors of v and λ is a constant. With a small rearrangement this can be rewritten in vector notation as the eigenvector equation

\mathbf{A}\mathbf{x} = \lambda \mathbf{x}

In general, there will be many different eigenvalues λ for which a non-zero eigenvector solution exists. Since the entries in the adjacency matrix are non-negative, there is a unique largest eigenvalue, which is real and positive, by the Perron–Frobenius theorem. This greatest eigenvalue results in the desired centrality measure. The v-th component of the related eigenvector then gives the relative centrality score of the vertex v in the network. The eigenvector is only defined up to a common factor, so only the ratios of the centralities of the vertices are well defined. To define an absolute score one must normalise the eigenvector, e.g., such that the sum over all vertices is 1 or the total number of vertices ''n''. Power iteration is one of many eigenvalue algorithms that may be used to find this dominant eigenvector. Furthermore, this can be generalized so that the entries in ''A'' can be real numbers representing connection strengths, as in a stochastic matrix.
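A minimal power-iteration sketch for the dominant eigenvector, normalized so that the scores sum to 1. One detail is an assumption of this sketch rather than part of the text: the iteration is applied to A + I instead of A. The two matrices share eigenvectors, and the shift avoids the oscillation that plain power iteration shows on bipartite graphs. The function name, tolerance and example graph are likewise illustrative, and the graph is assumed to be connected and undirected.

```python
def eigenvector_centrality(adj, iterations=200, tol=1e-12):
    """Dominant eigenvector of a symmetric 0/1 adjacency matrix, summing to 1."""
    n = len(adj)
    x = [1.0 / n] * n
    for _ in range(iterations):
        # Multiply by (A + I): same eigenvectors as A, eigenvalues shifted by 1.
        nxt = [x[v] + sum(adj[v][t] * x[t] for t in range(n)) for v in range(n)]
        total = sum(nxt)
        nxt = [value / total for value in nxt]   # normalise to sum 1
        if max(abs(a - b) for a, b in zip(nxt, x)) < tol:
            return nxt
        x = nxt
    return x


# Hypothetical example: the path graph 0 - 1 - 2.
PATH = [[0, 1, 0],
        [1, 0, 1],
        [0, 1, 0]]
x = eigenvector_centrality(PATH)
ratio = x[1] / x[0]   # only ratios of scores are well defined
```

For this path graph the largest eigenvalue is the square root of 2, and the middle vertex scores that many times the score of an end vertex.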

Katz centrality

Katz centrality is a generalization of degree centrality. Degree centrality measures the number of direct neighbors, whereas Katz centrality counts all nodes that can be reached through a path, with the contributions of distant nodes penalized. Mathematically, it is defined as

x_i = \sum_{k=1}^{\infty}\sum_{j=1}^N \alpha^k (A^k)_{ij}

where ''α'' is an attenuation factor in (0,1). Katz centrality can be viewed as a variant of eigenvector centrality. Another form of Katz centrality is

x_i = \alpha \sum_{j=1}^N a_{ij}(x_j+1).

Compared to the expression of eigenvector centrality, ''x_j'' is replaced by ''x_j'' + 1.

It can be shown that the principal eigenvector (associated with the largest eigenvalue ''λ'' of ''A'', the adjacency matrix) is the limit of Katz centrality as ''α'' approaches 1/''λ'' from below.
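The two forms of Katz centrality can be compared numerically. The sketch below is illustrative only: the function names and the list-of-lists matrix format are assumptions, and the attenuation factor is assumed to be smaller than the reciprocal of the largest eigenvalue of the adjacency matrix, since otherwise neither the series nor the iteration converges.

```python
def katz_centrality(adj, alpha, iterations=10_000, tol=1e-12):
    """Iterate the recursive form x_i = alpha * sum_j a_ij * (x_j + 1)."""
    n = len(adj)
    x = [0.0] * n
    for _ in range(iterations):
        y = [alpha * sum(adj[i][j] * (x[j] + 1.0) for j in range(n))
             for i in range(n)]
        if max(abs(a - b) for a, b in zip(x, y)) < tol:
            return y
        x = y
    raise ValueError("no convergence; alpha may be too large")


def katz_series(adj, alpha, terms):
    """Truncate the series form: sum over k = 1..terms of alpha^k * (A^k 1)_i."""
    n = len(adj)
    walks = [1.0] * n  # entries of A^0 applied to the all-ones vector
    x = [0.0] * n
    for k in range(1, terms + 1):
        # walks[i] becomes the number of walks of length k starting at i
        walks = [sum(adj[i][j] * walks[j] for j in range(n)) for i in range(n)]
        for i in range(n):
            x[i] += alpha ** k * walks[i]
    return x


# Path graph 0 - 1 - 2 with alpha = 0.25 (largest eigenvalue is sqrt(2)).
path = [[0, 1, 0],
        [1, 0, 1],
        [0, 1, 0]]
print(katz_centrality(path, 0.25))
print(katz_series(path, 0.25, 200))
```

For this small graph the recursive form can also be solved by hand, giving 3/7 for the end vertices and 5/7 for the middle vertex.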

PageRank centrality

PageRank satisfies the following equation

x_i = \alpha \sum_{j} a_{ji}\frac{x_j}{L(j)} + \frac{1-\alpha}{N},

where

L(j) = \sum_{i} a_{ji}

is the number of neighbors of node ''j'' (or the number of outbound links in a directed graph). Compared to eigenvector centrality and Katz centrality, one major difference is the scaling factor ''L''(''j''). Another difference between PageRank and eigenvector centrality is that the PageRank vector is a left-hand eigenvector (note that the factor ''a_ji'' has its indices reversed).
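The PageRank equation can be solved by repeated substitution, as in the following sketch. The function name, the default value 0.85 for ''α'' and the list-of-lists matrix format are assumptions made for illustration. The sketch also assumes that every node has at least one outbound link, since the equation as written divides by ''L''(''j'').

```python
def pagerank(adj, alpha=0.85, iterations=1000, tol=1e-12):
    """Iterate x_i = alpha * sum_j a_ji * x_j / L(j) + (1 - alpha) / N."""
    n = len(adj)
    out_links = [sum(row) for row in adj]  # L(j) = sum_i a_ji
    if any(count == 0 for count in out_links):
        raise ValueError("nodes without outbound links are not handled here")
    x = [1.0 / n] * n
    for _ in range(iterations):
        y = [alpha * sum(adj[j][i] * x[j] / out_links[j] for j in range(n))
             + (1.0 - alpha) / n
             for i in range(n)]
        if max(abs(a - b) for a, b in zip(x, y)) < tol:
            return y
        x = y
    return x


# Directed graph with edges 0->1, 0->2, 1->2, 2->0 (adj[j][i] = 1 for j->i).
links = [[0, 1, 1],
         [0, 0, 1],
         [1, 0, 0]]
print(pagerank(links))
```

Note that the sum runs over adj[j][i], the transposed index order, which reflects the left-eigenvector character of PageRank mentioned above.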

Percolation centrality

A large number of centrality measures exist to determine the ‘importance’ of a single node in a complex network. However, these measures quantify the importance of a node in purely topological terms, and the value of the node does not depend on the ‘state’ of the node in any way. It remains constant regardless of network dynamics. This is true even for the weighted betweenness measures. However, a node may very well be centrally located in terms of betweenness centrality or another centrality measure, but may not be ‘centrally’ located in the context of a network in which there is percolation.

Percolation of a ‘contagion’ occurs in complex networks in a number of scenarios. For example, viral or bacterial infection can spread over social networks of people, known as contact networks. The spread of disease can also be considered at a higher level of abstraction, by contemplating a network of towns or population centres, connected by road, rail or air links. Computer viruses can spread over computer networks. Rumours or news about business offers and deals can also spread via social networks of people. In all of these scenarios, a ‘contagion’ spreads over the links of a complex network, altering the ‘states’ of the nodes as it spreads, either recoverably or otherwise.

For example, in an epidemiological scenario, individuals go from the ‘susceptible’ to the ‘infected’ state as the infection spreads. The states the individual nodes can take in the above examples could be binary (such as received/not received a piece of news), discrete (susceptible/infected/recovered), or even continuous (such as the proportion of infected people in a town), as the contagion spreads. The common feature in all these scenarios is that the spread of contagion results in the change of node states in networks. Percolation centrality (PC) was proposed with this in mind; it specifically measures the importance of nodes in terms of aiding the percolation through the network. This measure was proposed by Piraveenan et al.

Percolation centrality is defined for a given node, at a given time, as the proportion of ‘percolated paths’ that go through that node. A ‘percolated path’ is a shortest path between a pair of nodes, where the source node is percolated (e.g., infected). The target node can be percolated, non-percolated, or in a partially percolated state.

PC^t(v)= \frac{1}{N-2}\sum_{s \neq v \neq r}\frac{\sigma_{sr}(v)}{\sigma_{sr}}\frac{x^t_s}{\sum_i x^t_i - x^t_v}

where ''σ_sr'' is the total number of shortest paths from node ''s'' to node ''r'' and ''σ_sr''(''v'') is the number of those paths that pass through ''v''. The percolation state of node ''i'' at time ''t'' is denoted by ''x^t_i''. Two special cases are ''x^t_i'' = 0, which indicates a non-percolated state at time ''t'', and ''x^t_i'' = 1, which indicates a fully percolated state at time ''t''. The values in between indicate partially percolated states (e.g., in a network of townships, this would be the percentage of people infected in that town).

The weights attached to the percolation paths depend on the percolation levels assigned to the source nodes, based on the premise that the higher the percolation level of a source node is, the more important are the paths that originate from that node. Nodes which lie on shortest paths originating from highly percolated nodes are therefore potentially more important to the percolation. The definition of PC may also be extended to include target node weights as well. Percolation centrality calculations run in ''O''(''NM'') time with an efficient implementation adapted from Brandes' fast algorithm; if the calculation needs to consider target node weights, the worst-case time is ''O''(''N''^3).
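The definition can be evaluated directly on small unweighted graphs, as in the following sketch. It is a straightforward all-pairs computation rather than the faster Brandes-style implementation mentioned above. The function names and the dictionary-based graph format are assumptions, as is the convention of returning 0 for a node when the denominator of its weight term vanishes.

```python
from collections import deque


def shortest_path_counts(graph, source):
    """Breadth-first search returning distances and shortest-path counts."""
    dist = {source: 0}
    sigma = {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in graph[u]:
            if w not in dist:  # first visit fixes the distance
                dist[w] = dist[u] + 1
                sigma[w] = 0
                queue.append(w)
            if dist[w] == dist[u] + 1:  # u precedes w on a shortest path
                sigma[w] += sigma[u]
    return dist, sigma


def percolation_centrality(graph, state):
    """graph: node -> iterable of neighbours; state: node -> value in [0, 1]."""
    nodes = list(graph)
    n = len(nodes)
    if n < 3:
        raise ValueError("the definition needs at least three nodes")
    info = {s: shortest_path_counts(graph, s) for s in nodes}
    total_state = sum(state[i] for i in nodes)
    result = {}
    for v in nodes:
        denominator = total_state - state[v]
        acc = 0.0
        if denominator > 0:  # assumed convention: score 0 otherwise
            dist_v, sigma_v = info[v]
            for s in nodes:
                dist_s, sigma_s = info[s]
                if s == v or v not in dist_s:
                    continue
                for r in nodes:
                    if r in (s, v) or r not in dist_s or r not in dist_v:
                        continue
                    # v lies on a shortest s-r path iff the distances add up
                    if dist_s[v] + dist_v[r] == dist_s[r]:
                        through_v = sigma_s[v] * sigma_v[r]
                        acc += (through_v / sigma_s[r]) * state[s] / denominator
        result[v] = acc / (n - 2)
    return result


# Path 0 - 1 - 2 where only node 0 is percolated.
path = {0: [1], 1: [0, 2], 2: [1]}
print(percolation_centrality(path, {0: 1.0, 1: 0.0, 2: 0.0}))
```

In the example, the middle node carries the only percolated path (from node 0 to node 2), so it receives the entire score.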

Cross-clique centrality

Cross-clique centrality of a single node in a complex graph determines the connectivity of a node to different cliques. A node with high cross-clique connectivity facilitates the propagation of information or disease in a graph. Cliques are subgraphs in which every node is connected to every other node in the clique. The cross-clique connectivity of a node ''v'' for a given graph ''G'' := (''V'', ''E'') with |''V''| vertices and |''E''| edges is defined as ''X''(''v''), where ''X''(''v'') is the number of cliques to which vertex ''v'' belongs. This measure was used by Faghani in 2013 but was first proposed by Everett and Borgatti in 1998, who called it clique-overlap centrality.
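The following sketch counts clique memberships for each vertex of a small graph. It assumes that only maximal cliques are counted, which the definition above does not state explicitly, and it enumerates them with the basic Bron-Kerbosch procedure without pivoting. The function names and the dictionary-of-sets graph format are illustrative.

```python
def maximal_cliques(graph):
    """Yield every maximal clique of an undirected graph.

    graph maps each node to the set of its neighbours (hypothetical format).
    """
    def expand(clique, candidates, excluded):
        if not candidates and not excluded:
            yield frozenset(clique)  # cannot be extended: maximal
            return
        for v in list(candidates):
            yield from expand(clique | {v},
                              candidates & graph[v],
                              excluded & graph[v])
            candidates = candidates - {v}
            excluded = excluded | {v}

    yield from expand(set(), set(graph), set())


def cross_clique_centrality(graph):
    """Count, for each node, the maximal cliques that contain it."""
    counts = {v: 0 for v in graph}
    for clique in maximal_cliques(graph):
        for v in clique:
            counts[v] += 1
    return counts


# Triangle 0-1-2 with a pendant edge 2-3.
graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
print(cross_clique_centrality(graph))
```

Node 2 belongs to both the triangle and the pendant edge, so it is the only node that bridges two cliques in this example.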

Freeman centralization

The centralization of any network is a measure of how central its most central node is in relation to how central all the other nodes are. Centralization measures then (a) calculate the sum of the differences in centrality between the most central node in a network and all other nodes; and (b) divide this quantity by the theoretically largest such sum of differences in any network of the same size. Thus, every centrality measure can have its own centralization measure. Defined formally, if ''C_x''(''p_i'') is any centrality measure of point ''i'', if ''C_x''(''p_*'') is the largest such measure in the network, and if

\max \sum_{i=1}^{N} (C_x(p_*)-C_x(p_i))

is the largest sum of differences in point centrality ''C_x'' for any graph with the same number of nodes, then the centralization of the network is

C_x=\frac{\sum_{i=1}^{N} (C_x(p_*)-C_x(p_i))}{\max \sum_{i=1}^{N} (C_x(p_*)-C_x(p_i))}.

The concept is due to Linton Freeman.
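As a worked instance, the sketch below computes centralization for degree centrality on a simple undirected graph. For this measure the largest possible sum of differences among graphs with ''N'' nodes is attained by the star graph and equals (''N'' − 1)(''N'' − 2). The function name and the dictionary-based graph format are assumptions made for illustration.

```python
def degree_centralization(graph):
    """Freeman centralization using raw degree as the centrality measure.

    graph maps each node to a collection of its neighbours
    (simple undirected graph assumed).
    """
    n = len(graph)
    if n < 3:
        raise ValueError("need at least three nodes")
    degrees = [len(neighbours) for neighbours in graph.values()]
    largest = max(degrees)                          # C_x(p_*)
    observed = sum(largest - d for d in degrees)    # numerator
    maximum = (n - 1) * (n - 2)                     # star graph attains this
    return observed / maximum


star = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0]}
ring = {i: [(i - 1) % 5, (i + 1) % 5] for i in range(5)}
print(degree_centralization(star), degree_centralization(ring))
```

The star is maximally centralized, whereas the ring, in which all nodes have the same degree, has no centralization at all.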

Dissimilarity-based centrality measures

In order to obtain better results in the ranking of the nodes of a given network, dissimilarity measures (specific to the theory of classification and data mining) have been used to enrich the centrality measures in complex networks. This is illustrated with eigenvector centrality, calculating the centrality of each node through the solution of the eigenvalue problem

W\mathbf{c}=\lambda \mathbf{c}

where ''W_ij'' = ''A_ij'' ''D_ij'' (coordinate-to-coordinate product) and ''D_ij'' is an arbitrary dissimilarity matrix, defined through a dissimilarity measure, e.g., the Jaccard dissimilarity given by

D_{ij}=1-\dfrac{|V^{+}(i)\cap V^{+}(j)|}{|V^{+}(i)\cup V^{+}(j)|}

where ''V''⁺(''i'') denotes the neighborhood of node ''i''. This measure quantifies the topological contribution (hence the name contribution centrality) of each node to the centrality of a given node, giving more weight to those nodes with greater dissimilarity, since these give the given node access to nodes that it cannot reach directly.

Note that ''W'' is non-negative because ''A'' and ''D'' are non-negative matrices, so the Perron–Frobenius theorem can be used to ensure that the above problem has a unique solution for ''λ'' = ''λ_max'' with ''c'' non-negative, allowing us to infer the centrality of each node in the network. Therefore, the centrality of the ''i''-th node is

c_i=\dfrac{1}{\lambda}\sum_{j=1}^{n}W_{ij}c_{j}, \,\,\,\,\,\, i=1,\cdots,n

where ''n'' is the number of the nodes in the network. Several dissimilarity measures and networks were tested, with improved results reported in the studied cases.
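A minimal sketch of this construction follows. It weights each edge by the Jaccard dissimilarity of the endpoint neighborhoods and then finds the dominant eigenvector of ''W'' by power iteration. Several choices here are assumptions rather than part of the description above: the neighborhood of a node is taken to be its set of neighbors excluding the node itself, the iteration uses ''W'' + ''I'' to avoid oscillation on bipartite graphs, the scores are normalised to sum to 1, and all names are illustrative.

```python
def jaccard_dissimilarity(graph, i, j):
    """1 - |N(i) & N(j)| / |N(i) | N(j)| for neighbour sets N (assumed open)."""
    union = graph[i] | graph[j]
    if not union:
        return 0.0
    return 1.0 - len(graph[i] & graph[j]) / len(union)


def contribution_centrality(graph, iterations=1000, tol=1e-12):
    """Dominant eigenvector of W, with W_ij = A_ij * D_ij.

    graph maps each node to the set of its neighbours (hypothetical format).
    """
    nodes = list(graph)
    # Only existing edges (A_ij = 1) give non-zero entries of W.
    weights = {i: {j: jaccard_dissimilarity(graph, i, j) for j in graph[i]}
               for i in nodes}
    c = {i: 1.0 / len(nodes) for i in nodes}
    for _ in range(iterations):
        # Multiply by (W + I): same eigenvectors as W.
        new = {i: c[i] + sum(w * c[j] for j, w in weights[i].items())
               for i in nodes}
        total = sum(new.values())
        new = {i: value / total for i, value in new.items()}
        if max(abs(new[i] - c[i]) for i in nodes) < tol:
            return new
        c = new
    return c


# Path 0 - 1 - 2: adjacent nodes share no neighbours, so W equals A here.
path = {0: {1}, 1: {0, 2}, 2: {1}}
print(contribution_centrality(path))
```

On the path graph every edge has dissimilarity 1, so the result coincides with ordinary eigenvector centrality; on graphs with many shared neighbors the edge weights shrink and the ranking can differ.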
