Degree Preserving Randomization is a technique used in
Network Science
Network science is an academic field which studies complex networks such as telecommunication networks, computer networks, biological networks, cognitive and semantic networks, and social networks, considering distinct elements or actors repr ...
that aims to assess whether or not variations observed in a given graph could simply be an artifact of the graph's inherent structural properties rather than properties unique to the nodes, in an observed network.
Background
Cataloged as early as 1996, the simplest implementation of degree preserving randomization relies on a
Monte Carlo
Monte Carlo (; ; french: Monte-Carlo , or colloquially ''Monte-Carl'' ; lij, Munte Carlu ; ) is officially an administrative area of the Principality of Monaco, specifically the ward of Monte Carlo/Spélugues, where the Monte Carlo Casino i ...
algorithm that rearranges, or "rewires" the network at random such that, with a sufficient number of rewires, the network's degree distribution is identical to the initial degree distribution of the network, though the topological structure of the network has become completely distinct from the original network.
Degree preserving randomization, while it has many different forms, typically takes on the form of a relatively simple approach: for any network consisting of
nodes with
edges, select two dyadically tied nodes. For each of these dyadic pairs, switch the edges such that the new dyadic pairs are mismatched. After a sufficient number of these mismatches, the network increasingly loses its original observed topography.
As is common with algorithms based on
Markov chain
A Markov chain or Markov process is a stochastic model describing a sequence of possible events in which the probability of each event depends only on the state attained in the previous event. Informally, this may be thought of as, "What happen ...
s, the number of iterations, or individual rewires, that must occur on a given graph such that the graph is sufficiently random and distinct from the original graph is unknown, though Espinoza asserts that a safe minimum threshold is
, where
"is at least 100" (Espinoza). Others have provided input for this issue, including one author who states that a safe minimum may instead be at least
, where epsilon lies in a range between
and
, though ultimately the correct number is not presently known.
Uses
There are several cases in which published research have explicitly employed degree preserving randomization in order to analyze network properties. Dekker used rewiring in order to more accurately model observed social networks by adding a secondary variable,
, which introduces a high-degree attachment bias. Liu et al. have additionally employed degree preserving randomization to assert that the
Control Centrality, a metric they identify, alters little when compared to the Control Centrality of an
Erdős–Rényi model
In the mathematical field of graph theory, the Erdős–Rényi model is either of two closely related models for generating random graphs or the evolution of a random network. They are named after Hungarian mathematicians Paul Erdős and Alf ...
containing the same number of
nodes in their simulations - Liu et al. have also used degree preserving randomization models in subsequent work exploring
network controllability.
Additionally, some work has been done in investigating how Degree Preserving Randomization may be used in addressing considerations of anonymity in networked data research, which has been shown to be a cause for concern in
Social Network Analysis
Social network analysis (SNA) is the process of investigating social structures through the use of networks and graph theory. It characterizes networked structures in terms of ''nodes'' (individual actors, people, or things within the network) ...
, as in the case of a study by Lewis et al. Ultimately the work conducted by Ying and Wu, starting from a foundation of Degree Preserving Randomization, and then forwarding several modifications, has showed moderate advances in protecting anonymity without compromising the integrity of the underlying utility of the observed network.
Additionally, the method is similar in nature to the broadly used
Exponential random graph models
Exponential family random graph models (ERGMs) are a family of statistical models for analyzing data from social and other networks. Examples of networks examined using ERGM include knowledge networks, organizational networks, colleague networks, ...
popularized in social science,
and indeed the various forms of modeling networks against observed networks in order to identify and theorize about the differences expressed in real networks. Importantly, Degree Preserving Randomization provides a simple algorithmic design for those familiar with programming to apply a model to an available observed network.
Example
What follows is a small example showing how one may apply Degree Preserving Randomization to an observed network in an effort to understand the network against otherwise random variation while maintaining the degree distributional aspect of the network. The
Association of Internet Researchers The Association of Internet Researchers (AoIR) is a learned society dedicated to the advancement of the transdisciplinary field of Internet studies. Founded in 1999, it is an international, member-based support network promoting critical and schola ...
has a
Listserv
The term Listserv (styled by the registered trademark licensee, L-Soft International, Inc., as LISTSERV) has been used to refer to electronic mailing list software applications in general, but is more properly applied to a few early instances of ...
that constitutes the majority of discussion threads surrounding their work. On it, members post updates about their own research, upcoming conferences, calls for papers and also engage one another in substantive discussions in their field. These emails can in turn constitute a directed and temporal network graph, where nodes are individual e-mail accounts belonging to the Listserv and edges are cases in which one e-mail address responds to another e-mail address on the Listserv.
In this observed network, the properties of the Listserv are relatively simple to calculate - for the network of 3,235 individual e-mail accounts and 9,824 exchanges in total, the observed
reciprocity of the network is about 0.074, and the
average path lengthis about 4.46. Could these values be arrived at simply through the nature of the network's inherent structure?
Applying the
rule, this network would require around 67,861 individual edge rewires to construct a likely sufficiently random degree-preserved graph. If we construct many random, degree preserving graphs from the real graph, we can then create a probability space for characteristics, such as reciprocity and average path length, and assess the degree to which the network could have expressed these characteristics at random. 534 networks were generated using Degree Preserving Randomization. As both reciprocity and average path length in this graph are normally distributed, and as the standard deviation for both reciprocity and average path length are far too narrow to include the observed case, we can reasonably posit that this network is expressing characteristics that are non-random (and thus open for further theory and modeling).
References
{{Reflist, 2
External links
Dataset for example provided
Networks
Network theory