Lancichinetti–Fortunato–Radicchi benchmark is an algorithm that generates benchmark networks (artificial networks that resemble real-world networks). They have ''a priori'' known

communities A community is a Level of analysis, social unit (a group of living things) with commonality such as place (geography), place, Norm (social), norms, religion, values, Convention (norm), customs, or Identity (social science), identity. Communiti ...

and are used to compare different community detection methods. The advantage of the benchmark over other methods is that it accounts for the heterogeneity in the distributions of

node In general, a node is a localized swelling (a "knot") or a point of intersection (a vertex). Node may refer to: In mathematics *Vertex (graph theory), a vertex in a mathematical graph *Vertex (geometry), a point where two or more curves, lines, ...

degrees and of community sizes.A. Lancichinetti, S. Fortunato, and F. Radicchi.(2008) Benchmark graphs for testing community detection algorithms. Physical Review E, 78.

The algorithm

The node degrees and the community sizes are distributed according to a

power law In statistics, a power law is a Function (mathematics), functional relationship between two quantities, where a Relative change and difference, relative change in one quantity results in a proportional relative change in the other quantity, inde ...

, with different exponents. The benchmark assumes that both the degree and the community size have power law distributions with different exponents,

\gamma

and

\beta

, respectively.

N

is the number of nodes and the average degree is

\langle k \rangle

. There is a mixing parameter

\mu

, which is the average fraction of neighboring nodes of a node that do not belong to any community that the benchmark node belongs to. This parameter controls the fraction of edges that are between communities. Thus, it reflects the amount of noise in the network. At the extremes, when

\mu = 0

all links are within community links, if

\mu = 1

all links are between nodes belonging to different communities. One can generate the benchmark network using the following steps. Step 1: Generate a network with nodes following a power law distribution with exponent

\gamma

and choose extremes of the distribution

k_

and

k_

to get desired average degree is

\langle k\rangle

. Step 2:

(1 - \mu)

fraction of links of every node is with nodes of the same community, while fraction

\mu

is with the other nodes. Step 3: Generate community sizes from a power law distribution with exponent

\beta

. The sum of all sizes must be equal to

N

. The minimal and maximal community sizes

s_

and

s_

must satisfy the definition of community so that every non-isolated node is in at least in one community: :

s_ > k_

s_ > k_

Step 4: Initially, no nodes are assigned to communities. Then, each node is randomly assigned to a community. As long as the number of neighboring nodes within the community does not exceed the community size a new node is added to the community, otherwise stays out. In the following iterations the “homeless” node is randomly assigned to some community. If that community is complete, i.e. the size is exhausted, a randomly selected node of that community must be unlinked. Stop the iteration when all the communities are complete and all the nodes belong to at least one community. Step 5: Implement rewiring of nodes keeping the same node degrees but only affecting the fraction of internal and external links such that the number of links outside the community for each node is approximately equal to the mixing parameter

\mu

Testing

Consider a

partition Partition may refer to: Computing Hardware * Disk partitioning, the division of a hard disk drive * Memory partition, a subdivision of a computer's memory, usually for use by a single job Software * Partition (database), the division of a ...

into communities that do not overlap. The communities of randomly chosen nodes in each iteration follow a

p(C)

distribution that represents the probability that a randomly picked node is from the community

C

. Consider a partition of the same network that was predicted by some community finding algorithm and has

p(C_2)

distribution. The benchmark partition has

p(C_1)

distribution. The joint distribution is

p(C_1, C_2)

. The similarity of these two partitions is captured by the normalized

mutual information In probability theory and information theory, the mutual information (MI) of two random variables is a measure of the mutual dependence between the two variables. More specifically, it quantifies the " amount of information" (in units such ...

. :

I_n = \frac

I_n=1

the benchmark and the detected partitions are identical, and if

I_n=0

then they are independent of each other.Barabasi, A.-L. (2014). "Network Science". Chapter 9: Communities.

References

{{DEFAULTSORT:Lancichinetti-Fortunato-Radicchi benchmark Algorithms Random graphs Benchmarks (computing) Statistical methods