Random walk closeness centrality is a measure of

centrality In graph theory and network analysis, indicators of centrality assign numbers or rankings to nodes within a graph corresponding to their network position. Applications include identifying the most influential person(s) in a social network, key ...

in a

network Network, networking and networked may refer to: Science and technology * Network theory, the study of graphs as a representation of relations between discrete objects * Network science, an academic field that studies complex networks Mathematics ...

, which describes the average speed with which randomly walking processes reach a node from other nodes of the network. It is similar to the

closeness centrality In a connected graph, closeness centrality (or closeness) of a node is a measure of centrality in a network, calculated as the reciprocal of the sum of the length of the shortest paths between the node and all other nodes in the graph. Thus, the mo ...

except that the farness is measured by the expected length of a

random walk In mathematics, a random walk is a random process that describes a path that consists of a succession of random steps on some mathematical space. An elementary example of a random walk is the random walk on the integer number line \mathbb Z ...

rather than by the

shortest path In graph theory, the shortest path problem is the problem of finding a path between two vertices (or nodes) in a graph such that the sum of the weights of its constituent edges is minimized. The problem of finding the shortest path between tw ...

. The concept was first proposed by White and Smyth (2003) under the name ''Markov centrality''.

Intuition

Consider a network with a finite number of nodes and a random walk process that starts in a certain node and proceeds from node to node along the edges. From each node, it chooses randomly the edge to be followed. In an unweighted network, the probability of choosing a certain edge is equal across all available edges, while in a weighted network it is proportional to the edge weights. A node is considered to be close to other nodes, if the random walk process initiated from any node of the network arrives to this particular node in relatively few steps on average.

Definition

Consider a weighted network – either directed or undirected – with n nodes denoted by j=1, …, n; and a random walk process on this network with a transition matrix M. The

m_

element of M describes the probability of the random walker that has reached node i, proceeds directly to node j. These probabilities are defined in the following way. :

m_=\frac

where

a_

is the (i,j)th element of the weighting matrix A of the network. When there is no edge between two nodes, the corresponding element of the A matrix is zero. The random walk closeness centrality of a node i is the inverse of the average mean first passage time to that node: :

C_^ = \frac

where

H(j, i)

is the mean first passage time from node j to node i.

Mean first passage time

The mean first passage time from node i to node j is the expected number of steps it takes for the process to reach node j from node i for the first time: :

H(i,j)=\sum_^ rP(i,j,r)

where P(i,j,r) denotes the probability that it takes exactly r steps to reach j from i for the first time. To calculate these probabilities of reaching a node for the first time in r steps, it is useful to regard the target node as an absorbing one, and introduce a transformation of M by deleting its j-th row and column and denoting it by

M_

. As the probability of a process starting at i and being in k after r-1 steps is simply given by the (i,k)th element of

M_^

, P(i,j,r) can be expressed as :

P(i,j,r)=\sum_((M_^))_ m_

Substituting this into the expression for mean first passage time yields :

H(i,j)=\sum_^ r \sum_((M_^))_ m_

Using the formula for the summation of

geometric series In mathematics, a geometric series is the sum of an infinite number of terms that have a constant ratio between successive terms. For example, the series :\frac \,+\, \frac \,+\, \frac \,+\, \frac \,+\, \cdots is geometric, because each succ ...

for matrices yields :

H(i,j)=\sum_((I-M_)^)_ m_

where I is the n-1 dimensional

identity matrix In linear algebra, the identity matrix of size n is the n\times n square matrix with ones on the main diagonal and zeros elsewhere. Terminology and notation The identity matrix is often denoted by I_n, or simply by I if the size is immaterial o ...

. For computational convenience, this expression can be vectorized as :

H(.,j)= (I-M_)^e

where

H(.,j)

is the vector for first passage times for a walk ending at node j, and e is an n-1 dimensional vector of ones. Mean first passage time is not symmetric, even for undirected graphs.

In model networks

According to simulations performed by Noh and Rieger (2004), the distribution of random walk closeness centrality in a Barabási-Albert model is mainly determined by the

degree distribution In the study of graphs and networks, the degree of a node in a network is the number of connections it has to other nodes and the degree distribution is the probability distribution of these degrees over the whole network. Definition The degre ...

. In such a network, the random walk closeness centrality of a node is roughly proportional to, but does not increase monotonically with its degree.

Applications for real networks

Random walk closeness centrality is more relevant measure than the simple

in case of applications where the concept of shortest paths is not meaningful or is very restrictive for a reasonable assessment of the nature of the system. This is the case for example when the analyzed process evolves in the network without any specific intention to reach a certain point, or without the ability of finding the shortest path to reach its target. One example for a random walk in a network is the way a certain coin circulates in an economy: it is passed from one person to another through transactions, without any intention of reaching a specific individual. Another example where the concept of shortest paths is not very useful is a densely connected network. Furthermore, as shortest paths are not influenced by self-loops, random walk closeness centrality is a more adequate measure than

when analyzing networks where self-loops are important. An important application on the field of economics is the analysis of the

input-output model In computing, input/output (I/O, or informally io or IO) is the communication between an information processing system, such as a computer, and the outside world, possibly a human or another information processing system. Inputs are the signals ...

of an economy, which is represented by a densely connected weighted network with important self-loops. The concept is widely used in natural sciences as well. One biological application is the analysis of protein-protein interactions.

Random walk betweenness centrality

A related concept, proposed by Newman, is random walk betweenness centrality. Just as random walk closeness centrality is a random walk counterpart of

, random walk betweenness centrality is, similarly, the random walk counterpart of

betweenness centrality In graph theory, betweenness centrality (or "betweeness centrality") is a measure of centrality in a graph based on shortest paths. For every pair of vertices in a connected graph, there exists at least one shortest path between the vertices such ...

. Unlike the usual betweenness centrality measure, it does not only count shortest paths passing through the given node, but all possible paths crossing it. Formally, the random walk betweenness centrality of a node is :

C_^ = \sum_ r_

where the

r_

element of matrix R contains the probability of a random walk starting at node j with absorbing node k, passing through node i. Calculating random walk betweenness in large networks is computationally very intensive.

Second order centrality

Another

based centrality is the second order centrality.A.-M. Kermarrec, E. Le Merrer, B. Sericola, G. Trédan: Second order centrality: Distributed assessment of nodes criticity in complex networks. Elsevier Computer Communications 34(5):619-628, 2011. Instead of counting the shortest paths passing through a given node (as for random walk betweenness centrality), it focuses on another characteristic of random walks on graphs. The expectation of the

standard deviation In statistics, the standard deviation is a measure of the amount of variation or dispersion of a set of values. A low standard deviation indicates that the values tend to be close to the mean (also called the expected value) of the set, while ...

of the return times of a random walk to a node constitutes its

. The lower that deviation, the more central that node is. Calculating the second order betweenness on large arbitrary graphs is also intensive, as its complexity is

O(n^3)

(worst case achieved on the

Lollipop graph In the mathematical discipline of graph theory, the (''m'',''n'')-lollipop graph is a special type of graph consisting of a complete graph (clique) on ''m'' vertices and a path graph on ''n'' vertices, connected with a bridge. The special case o ...

References

{{Reflist Graph theory

Intuition

Definition

Mean first passage time

In model networks

Applications for real networks

Random walk betweenness centrality

Second order centrality

See also

References