Configuration Model
   HOME

TheInfoList



OR:

In
network science Network science is an academic field which studies complex networks such as telecommunication networks, computer networks, biological networks, cognitive and semantic networks, and social networks, considering distinct elements or actors rep ...
, the configuration model is a method for generating random networks from a given
degree Degree may refer to: As a unit of measurement * Degree (angle), a unit of angle measurement ** Degree of geographical latitude ** Degree of geographical longitude * Degree symbol (°), a notation used in science, engineering, and mathematics ...
sequence. It is widely used as a reference model for real-life
social network A social network is a social structure made up of a set of social actors (such as individuals or organizations), sets of dyadic ties, and other social interactions between actors. The social network perspective provides a set of methods for ...
s, because it allows the modeler to incorporate arbitrary degree distributions.


Rationale for the model

In the configuration model, the degree of each vertex is pre-defined, rather than having a probability distribution from which the given degree is chosen. As opposed to the
Erdős–Rényi model In the mathematical field of graph theory, the Erdős–Rényi model is either of two closely related models for generating random graphs or the evolution of a random network. They are named after Hungarian mathematicians Paul Erdős and Alfrà ...
, the degree sequence of the configuration model is not restricted to have a
Poisson distribution In probability theory and statistics, the Poisson distribution is a discrete probability distribution that expresses the probability of a given number of events occurring in a fixed interval of time or space if these events occur with a known co ...
, the model allows the user to give the network any desired degree distribution.


Algorithm

The following algorithm describes the generation of the model: # Take a degree sequence, i. e. assign a degree k_ito each vertex. The degrees of the vertices are represented as half-links or stubs. The sum of stubs must be even in order to be able to construct a graph (\sum k_i = 2m ). The degree sequence can be drawn from a theoretical distribution or it can represent a real network (determined from the
adjacency matrix In graph theory and computer science, an adjacency matrix is a square matrix used to represent a finite graph. The elements of the matrix indicate whether pairs of vertices are adjacent or not in the graph. In the special case of a finite simp ...
of the network). # Choose two stubs uniformly at random and connect them to form an edge. Choose another pair from the remaining 2m-2 stubs and connect them. Continue until you run out of stubs. The result is a network with the pre-defined degree sequence. The realization of the network changes with the order in which the stubs are chosen, they might include cycles (b), self-loops (c) or multi-links (d) (Figure 1). Yet, the expected number of self-loops and multi-links goes to zero in the ''N'' → ∞ limit.


Self-loops, multi-edges and implications

The algorithm described above matches any stubs with the same probability. The uniform distribution of the matching is an important property in terms of calculating other features of the generated networks. The network generation process does not exclude the event of generating a self-loop or a multi-link. If we designed the process where self-loops and multi-edges are not allowed, the matching of the stubs would not follow a uniform distribution. The expected total number of multi-links in a configuration model network would be: \frac\Big frac-1\Big2 where \langle \rangle is the n-th moment of the degree distribution. Therefore, the average number of self-loops and multi-links is a constant for some large networks, and the density of self-loops and multi-links, meaning the number per node, goes to zero as N\rightarrow\infty as long as \langle \rangle is constant and finite. For some power-law degree distributions where the second moment diverges, density of multi-links may not vanish or may do so more slowly than ^. A further consequence of self-loops and multi-edges is that not all possible networks are generated with the same probability. In general, all possible realizations can be generated by permuting the stubs of all vertices in every possible way. The number of permutation of the stubs of node i is k_i! , so the number of realizations of a degree sequence is N\ = \prod_i k_i ! . This would mean that each realization occurs with the same probability. However, self-loops and multi-edges can change the number of realizations, since permuting self-edges can result an unchanged realization. Given that the number of self-loops and multi-links vanishes as N\rightarrow\infty, the variation in probabilities of different realization will be small but present.


Properties


Edge probability

A stub of node i can be connected to 2m-1 other stubs (there are 2m stubs altogether, and we have to exclude the one we are currently observing). The vertex j has k_j stubs to which node i can be connected with the same probability (because of the uniform distribution). The probability of a stub of node i being connected to one of these k_j stubs is \frac . Since node i has k_i stubs, the probability of i being connected to j is \frac (\frac for sufficiently large m ). Note that this formula can only be viewed as a probability if k_ik_j/2m\ll 1, and more precisely it describes the expected number of edges between nodes i and j. Note that this formula does not apply to the case of self-edges. Given a configuration model with a degree distribution p_k , the probability of a randomly chosen node i having degree k is p_k . But if we took one of the vertices to which we can arrive following one of edges of i, the probability of having degree k is \frac\times np_k = \frac . (The probability of reaching a node with degree k is \frac , and there are np_k such nodes.) This fraction depends on kp_k as opposed to the degree of the typical node with p_k . Thus, a neighbor of a typical node is expected to have higher degree than the typical node itself. This feature of the configuration model describes well the phenomenon of "my friends having more friends than I do".


Clustering coefficient

The
clustering coefficient In graph theory, a clustering coefficient is a measure of the degree to which nodes in a graph tend to cluster together. Evidence suggests that in most real-world networks, and in particular social networks, nodes tend to create tightly knit groups ...
C_g (the average probability that the neighbors of a node are connected) is computed approximately as follows: : C_g = \sum_^\infty q_q_ \frac, where q_k denotes the probability that a random edge reaches a degree-k vertex, and the factors of the form "k_i-1" rather than "k_i" appear because one stub has been accounted for by the fact that these are neighbors of a common vertex. Evaluating the above results in : C_g = \frac \left \sum_^\infty (k-1)q_k \right 2. Using q_k=kp_k/\langle k\rangle and 2m=N\langle k\rangle, with p_k denoting the degree distribution, \langle k\rangle denoting the average degree, and N denoting the number of vertices, the above becomes : C_g = \frac \left \sum_^\infty (k-1)kP(k) \right 2 =\frac, with \langle k^2\rangle denoting the second moment of the degree distribution. Assuming that \langle k^2\rangle and \langle k\rangle are constant, the above behaves as : C_g \sim \frac, where the constant depends on p_k . Thus, the clustering coefficient C_g becomes small in the N\gg 1 limit.


Giant component

In the configuration model, a
giant component In network theory, a giant component is a connected component of a given random graph that contains a finite fraction of the entire graph's vertices. Giant component in Erdős–Rényi model Giant components are a prominent feature of the ErdŠ...
(GC) exists if : \langle k^2 \rangle- 2 \langle k \rangle > 0, where \langle k \rangle and \langle k^2 \rangle are the first and second moments of the
degree distribution In the study of graphs and networks, the degree of a node in a network is the number of connections it has to other nodes and the degree distribution is the probability distribution of these degrees over the whole network. Definition The degre ...
. That means that, the critical threshold solely depends on quantities which are uniquely determined by the degree distribution p_k. Configuration model generates locally tree-like networks, meaning that any local neighborhood in such a network takes the form of a tree. More precisely, if you start at any node in the network and form the set of all nodes at distance d or less from that starting node, the set will, with probability tending to 1 as n → ∞, take the form of a tree. In tree-like structures, the number of second neighbors averaged over the whole network, c_2, is: c_2 =\langle k^2 \rangle- \langle k \rangle. Then, in general, the average number at distance d can be written as: : c_d = \left(\frac\right)^ c_1. Which implies that if the ratio of \frac is larger than one, then the network can have a giant component. This is famous as the Molloy-Reed criterion. The intuition behind this criterion is that if the giant component (GC) exists, then the average degree of a randomly chosen vertex i in a connected component should be at least 2. Molloy-Reed criterion can also be expressed as: \sum_i k_i(k_i - 2) >0, which implies that, although the size of the GC may depend on p_0and p_2, the number of nodes of degree 0 and 2 have no contribution in the existence of the giant component.


Diameter

Configuration model can assume any degree distribution and shows the small-world effect, since to leading order the diameter of the configuration model is just d = \frac.


Components of finite size

As total number of vertices N tends to infinity, the probability to find two giant components is vanishing. This means that in the sparse regime, the model consist of one giant component (if any) and multiple connected components of finite size. The sizes of the connected components are characterized by their size distribution w_n - the probability that a randomly sampled vertex belongs to a connected component of size n. There is a correspondence between the degree distribution p_k and the size distribution w_n. When total number of vertices tends to infinity, N\rightarrow\infty, the following relation takes place: : w_n=\begin \frac u_1^(n-2),& n>1, \\ p_0 & n=1, \end where u_1(k) := \frac p_, and u_1^ denotes the n -fold
convolution power In mathematics, the convolution power is the ''n''-fold iteration of the convolution with itself. Thus if x is a Function (mathematics), function on Euclidean space R''d'' and n is a natural number, then the convolution power is defined by : x^ = \ ...
. Moreover, explicit asymptotes for w_n are known when n\gg1 and , \langle k^2 \rangle- 2 \langle k \rangle , is close to zero. The analytical expressions for these asymptotes depend on finiteness of the moments of p_k, the degree distribution tail exponent \beta (when p_kfeatures a heavy tail), and the sign of Molloy–Reed criterium. The following table summarises these relationships (the constants are provided in).


Modelling


Comparison to real-world networks

Three general properties of
complex network In the context of network theory, a complex network is a graph (network) with non-trivial topological features—features that do not occur in simple networks such as lattices or random graphs but often occur in networks representing real ...
s are heterogeneous degree distribution, short average path length and high clustering. Having the opportunity to define any arbitrary degree sequence, the first condition can be satisfied by design, but as shown above, the global clustering coefficient is an inverse function of the network size, so for large configuration networks, clustering tends to be small. This feature of the baseline model contradicts the known properties of empirical networks, but extensions of the model can solve this issue (see ). All the networks generated by this model are locally tree-like provided the average of the excess degree distribution is either constant or grows slower than the square root of number of links, \sqrt m. In other words, this model prevents forming substructures such as loops in the large size limit. Vanishing of clustering coefficient, is a special case of this more general result. While the tree-like property makes the model not very realistic, so many calculations, such as the generating function methods, are possible for the configuration model thanks to this feature.


Application: modularity calculation

The configuration model is applied as benchmark in the calculation of network modularity. Modularity measures the degree of division of the network into modules. It is computed as follows: Q = \frac \sum_\Bigl(A_-\frac\Bigr)\delta(C_i,C_j) in which the
adjacency matrix In graph theory and computer science, an adjacency matrix is a square matrix used to represent a finite graph. The elements of the matrix indicate whether pairs of vertices are adjacent or not in the graph. In the special case of a finite simp ...
of the network is compared to the probability of having an edge between node i and j (depending on their degrees) in the configuration model (see the page modularity for details).


Directed configuration model

In the DCM (directed configuration model), each node is given a number of half-edges called tails and heads. Then tails and heads are matched uniformly at random to form directed edges. The size of the giant component, the typical distance, and the diameter of DCM have been studied mathematically. There also has been extensive research on
random walks In mathematics, a random walk is a random process that describes a path that consists of a succession of random steps on some mathematical space. An elementary example of a random walk is the random walk on the integer number line \mathbb Z ...
on DCM. Some real-world complex networks have been modelled by DCM, such as neural networks, finance and social networks.


References

{{Reflist Networks