Exponential family random graph models (ERGMs) are a set of statistical models used to study the structure and patterns within networks, such as those in social, organizational, or scientific contexts. They analyze how connections (edges) form between individuals or entities (nodes) by modeling the likelihood of network features, like clustering or centrality, across diverse examples including knowledge networks, organizational networks, colleague networks, social media networks, networks of scientific collaboration, and more. Part of the exponential family of distributions, ERGMs help researchers understand and predict network behavior in fields ranging from sociology to data science.
Background
Many metrics exist to describe the structural features of an observed network, such as the density, centrality, or assortativity. However, these metrics describe the observed network, which is only one instance of a large number of possible alternative networks. This set of alternative networks may have similar or dissimilar structural features. To support statistical inference on the processes influencing the formation of network structure, a statistical model should consider the set of all possible alternative networks weighted on their similarity to an observed network. However, because network data is inherently relational, it violates the assumptions of independence and identical distribution of standard statistical models like linear regression.
Alternative statistical models should reflect the uncertainty associated with a given observation, permit inference about the relative frequency of network substructures of theoretical interest, disambiguate the influence of confounding processes, efficiently represent complex structures, and link local-level processes to global-level properties. Degree-preserving randomization, for example, is a specific way in which an observed network could be considered in terms of multiple alternative networks.
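Degree-preserving randomization is commonly implemented with repeated double-edge swaps: two edges $(a,b)$ and $(c,d)$ are replaced by $(a,d)$ and $(c,b)$, which leaves every node's degree unchanged. The following sketch is illustrative only (the function name is our own, not taken from any library) and assumes a simple undirected graph stored as a set of node-pair tuples:

```python
import random

def degree_preserving_randomization(edges, n_swaps=1000, seed=0):
    """Randomize an undirected simple graph by repeated double-edge swaps.

    Each swap replaces edges (a, b) and (c, d) with (a, d) and (c, b),
    which leaves every node's degree unchanged.  Proposed swaps that
    would create a self-loop or a duplicate edge are skipped.
    """
    rng = random.Random(seed)
    edge_set = {tuple(sorted(e)) for e in edges}
    for _ in range(n_swaps):
        (a, b), (c, d) = rng.sample(sorted(edge_set), 2)
        new1 = tuple(sorted((a, d)))
        new2 = tuple(sorted((c, b)))
        if a == d or c == b or new1 in edge_set or new2 in edge_set:
            continue  # would create a loop or a multi-edge; skip
        edge_set.remove((a, b))
        edge_set.remove((c, d))
        edge_set.add(new1)
        edge_set.add(new2)
    return edge_set
```

After many accepted swaps, the result is a graph drawn (approximately uniformly) from the set of simple graphs with the same degree sequence as the input, one concrete realization of the "multiple alternative networks" idea above.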
Definition
The exponential family is a broad family of models covering many types of data, not just networks. An ERGM is a model from this family which describes networks.
Formally a random graph $Y$ consists of a set of $n$ nodes and a collection of tie variables $\{Y_{ij} : i, j = 1, \dots, n\}$, indexed by pairs of nodes $(i, j)$, where $Y_{ij} = 1$ if the nodes $(i, j)$ are connected by an edge and $Y_{ij} = 0$ otherwise. A pair of nodes $(i, j)$ is called a dyad, and a dyad is an edge if $Y_{ij} = 1$.
The basic assumption of these models is that the structure in an observed graph $y$ can be explained by a given vector of sufficient statistics $s(y)$ which are a function of the observed network and, in some cases, nodal attributes. This way, it is possible to describe any kind of dependence between the dyadic variables:

$$P(Y = y \mid \theta) = \frac{\exp(\theta^{\mathsf{T}} s(y))}{c(\theta)},$$

where $\theta$ is a vector of model parameters associated with $s(y)$ and $c(\theta) = \sum_{y' \in \mathcal{Y}} \exp(\theta^{\mathsf{T}} s(y'))$ is a normalising constant.
These models represent a probability distribution on each possible network on $n$ nodes. However, the size of the set of possible networks for an undirected network (simple graph) of size $n$ is $2^{n(n-1)/2}$. Because the number of possible networks in the set vastly outnumbers the number of parameters which can constrain the model, the ideal probability distribution is the one which maximizes the Gibbs entropy.
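For very small networks, the normalising constant can be computed by brute force. The sketch below is purely illustrative (the function names are our own, not a standard API): it enumerates all $2^{n(n-1)/2}$ undirected loopless graphs, which makes concrete why this approach fails beyond a handful of nodes.

```python
import itertools
import math

def all_undirected_graphs(n):
    """Yield each undirected loopless graph on n nodes as a set of edges."""
    dyads = list(itertools.combinations(range(n), 2))
    for bits in itertools.product([0, 1], repeat=len(dyads)):
        yield {d for d, b in zip(dyads, bits) if b}

def normalizing_constant(n, theta, stats):
    """Brute-force c(theta) = sum over all graphs y of exp(theta . s(y)).

    `stats(g, n)` returns the vector of sufficient statistics of graph g.
    Enumerating 2^(n(n-1)/2) graphs is intractable beyond very small n.
    """
    return sum(
        math.exp(sum(t * s for t, s in zip(theta, stats(g, n))))
        for g in all_undirected_graphs(n)
    )
```

For the single-statistic model $s(y) = (\text{number of edges})$, the constant factorises as $c(\theta) = (1 + e^{\theta_1})^{n(n-1)/2}$, which gives a quick sanity check on the enumeration.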
Example
Let $N = \{1, 2, 3\}$ be a set of three nodes and let $\mathcal{Y}$ be the set of all undirected, loopless graphs on $N$. ''Loopless'' implies that for all $i \in N$ it is $y_{ii} = 0$ and ''undirected'' implies that for all $i, j \in N$ it is $y_{ij} = y_{ji}$, so that there are three binary tie variables ($y_{12}, y_{13}, y_{23}$) and $2^3 = 8$ different graphs in this example.

Define a two-dimensional vector of statistics by $s(y) = (s_1(y), s_2(y))$, where $s_1(y)$ is defined to be the number of edges in the graph $y$ and $s_2(y)$ is defined to be the number of closed triangles in $y$. Finally, let the parameter vector be defined by $\theta = (-\ln 2, \ln 3)$, so that the probability of every graph $y \in \mathcal{Y}$ in this example is given by:

$$P(y) = \frac{\exp(-\ln 2 \cdot s_1(y) + \ln 3 \cdot s_2(y))}{c}.$$
We note that in this example, there are just four graph isomorphism classes: the graph with zero edges, three graphs with exactly one edge, three graphs with exactly two edges, and the graph with three edges. Since isomorphic graphs have the same number of edges and the same number of triangles, they also have the same probability in this example ERGM. For a representative $y$ of each isomorphism class, we first compute the term $\exp(-\ln 2 \cdot s_1(y) + \ln 3 \cdot s_2(y))$, which is proportional to the probability of $y$ (up to the normalizing constant $c$).
If $y$ is the graph with zero edges, then it is $s_1(y) = 0$ and $s_2(y) = 0$, so that

$$\exp(-\ln 2 \cdot 0 + \ln 3 \cdot 0) = \exp(0) = 1.$$

If $y$ is a graph with exactly one edge, then it is $s_1(y) = 1$ and $s_2(y) = 0$, so that

$$\exp(-\ln 2 \cdot 1 + \ln 3 \cdot 0) = \exp(-\ln 2) = \frac{1}{2}.$$

If $y$ is a graph with exactly two edges, then it is $s_1(y) = 2$ and $s_2(y) = 0$, so that

$$\exp(-\ln 2 \cdot 2 + \ln 3 \cdot 0) = \exp(-2 \ln 2) = \frac{1}{4}.$$

If $y$ is the graph with exactly three edges, then it is $s_1(y) = 3$ and $s_2(y) = 1$, so that

$$\exp(-\ln 2 \cdot 3 + \ln 3 \cdot 1) = \exp(-3 \ln 2 + \ln 3) = \frac{3}{8}.$$
The normalizing constant is computed by summing $\exp(-\ln 2 \cdot s_1(y) + \ln 3 \cdot s_2(y))$ over all eight different graphs $y \in \mathcal{Y}$. This yields:

$$c = 1 + 3 \cdot \frac{1}{2} + 3 \cdot \frac{1}{4} + \frac{3}{8} = \frac{29}{8}.$$

Finally, the probability of every graph $y \in \mathcal{Y}$ is given by $P(y) = \frac{1}{c} \exp(-\ln 2 \cdot s_1(y) + \ln 3 \cdot s_2(y))$. Explicitly, we get that the graph with zero edges has probability $\frac{8}{29}$, every graph with exactly one edge has probability $\frac{4}{29}$, every graph with exactly two edges has probability $\frac{2}{29}$, and the graph with exactly three edges has probability $\frac{3}{29}$ in this example.
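The worked example can be verified directly by enumerating all eight graphs. The sketch below is illustrative (the variable names are our own); it assumes the edge parameter $-\ln 2$ and the triangle parameter $\ln 3$ implied by the halving/tripling pattern discussed in this section:

```python
import itertools
import math

# Enumerate all 8 undirected loopless graphs on 3 nodes and compute
# their ERGM probabilities for theta = (-ln 2, ln 3).
dyads = [(0, 1), (0, 2), (1, 2)]
theta_edges, theta_triangles = -math.log(2), math.log(3)

weights = {}
for bits in itertools.product([0, 1], repeat=3):
    g = frozenset(d for d, b in zip(dyads, bits) if b)
    n_edges = len(g)
    # On 3 nodes, only the complete graph contains a closed triangle.
    n_triangles = 1 if n_edges == 3 else 0
    weights[g] = math.exp(theta_edges * n_edges + theta_triangles * n_triangles)

c = sum(weights.values())            # normalizing constant
probs = {g: w / c for g, w in weights.items()}
```

Summing the weights reproduces the normalizing constant, and dividing by it reproduces the probability of each isomorphism class by edge count.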
Intuitively, the structure of graph probabilities in this ERGM example is consistent with typical patterns of social or other networks. The negative parameter ($\theta_1 = -\ln 2$) associated with the number of edges implies that, all other things being equal, networks with fewer edges have a higher probability than networks with more edges. This is consistent with the sparsity that is often found in empirical networks, namely that the empirical number of edges typically grows at a slower rate than the maximally possible number of edges. The positive parameter ($\theta_2 = \ln 3$) associated with the number of closed triangles implies that, all other things being equal, networks with more triangles have a higher probability than networks with fewer triangles. This is consistent with a tendency for triadic closure that is often found in certain types of social networks. Compare these patterns with the graph probabilities computed above. The addition of every edge divides the probability by two. However, when going from a graph with two edges to the graph with three edges, the number of triangles increases by one, which additionally multiplies the probability by three.
We note that the explicit calculation of all graph probabilities is only possible because there are so few different graphs in this example. Since the number of different graphs scales exponentially in the number of tie variables, which in turn scales quadratically in the number of nodes, computing the normalizing constant is in general computationally intractable, even for a moderate number of nodes.
Sampling from an ERGM
Exact sampling from a given ERGM is computationally intractable in general since computing the normalizing constant requires summation over all $y \in \mathcal{Y}$. Efficient approximate sampling from an ERGM can be done via Markov chains and is applied in current methods to approximate expected values and to estimate ERGM parameters.
Informally, given an ERGM on a set of graphs $\mathcal{Y}$ with probability mass function $P$, one selects an initial graph $y^{(0)} \in \mathcal{Y}$ (which might be arbitrarily, or randomly, chosen or might represent an observed network) and implicitly defines transition probabilities (or jump probabilities) $P\left(Y^{(t)} = y' \mid Y^{(t-1)} = y\right)$, which are the conditional probabilities that the Markov chain is on graph $y'$ after Step $t$, given that it is on graph $y$ after Step $t-1$. The transition probabilities do not depend on the graphs in earlier steps ($y^{(0)}, \dots, y^{(t-2)}$), which is a defining property of Markov chains, and they do not depend on $t$, that is, the Markov chain is time-homogeneous. The goal is to define the transition probabilities such that for all $y \in \mathcal{Y}$ it is

$$\lim_{t \to \infty} P\left(Y^{(t)} = y\right) = P(y),$$

independent of the initial graph $y^{(0)}$. If this is achieved, one can run the Markov chain for a large number of steps and then return the current graph as a random sample from the given ERGM. The probability to return a graph $y$ after a finite but large number of update steps is approximately the probability defined by the ERGM.
Current methods for sampling from ERGMs with Markov chains usually define an update step by two sub-steps: first, randomly select a ''candidate'' $y'$ in a neighborhood of the current graph $y$ and, second, accept $y'$ with a probability that depends on the probability ratio of the current graph $y$ and the candidate $y'$. (If the candidate is not accepted, the Markov chain remains on the current graph $y$.) If the set of graphs $\mathcal{Y}$ is unconstrained (i.e., contains any combination of values on the binary tie variables), a simple method for candidate selection is to choose one tie variable $Y_{ij}$ uniformly at random and to define the candidate by flipping this single variable (i.e., to set $y'_{ij} = 1 - y_{ij}$; all other variables take the same value as in $y$). A common way to define the acceptance probability is to accept $y'$ with the conditional probability

$$P\left(y' \mid \{y, y'\}\right) = \frac{P(y')}{P(y) + P(y')},$$

where the graph probabilities are defined by the ERGM. Crucially, the normalizing constant $c(\theta)$ cancels out in this fraction, so that the acceptance probabilities can be computed efficiently.
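An update step of this kind can be sketched as follows. The code is illustrative (function names are our own) and assumes the edge-and-triangle statistics used earlier in this article; only unnormalized weights $\exp(\theta^{\mathsf{T}} s(y))$ are needed, since the normalizing constant cancels in the acceptance ratio:

```python
import itertools
import math
import random

def ergm_weight(g, n, theta):
    """Unnormalized ERGM weight exp(theta . s(y)) for the edge/triangle model."""
    edges = len(g)
    triangles = sum(
        1 for a, b, c in itertools.combinations(range(n), 3)
        if {(a, b), (a, c), (b, c)} <= g
    )
    return math.exp(theta[0] * edges + theta[1] * triangles)

def sample_ergm(n, theta, steps=10_000, seed=0):
    """Approximate sampling via a tie-flip Markov chain.

    Each step picks one dyad uniformly at random, flips its tie variable,
    and accepts the candidate with probability P(y') / (P(y) + P(y')).
    """
    rng = random.Random(seed)
    dyads = list(itertools.combinations(range(n), 2))
    g = set()                     # initial graph: the empty graph
    w = ergm_weight(g, n, theta)
    for _ in range(steps):
        d = rng.choice(dyads)     # select one tie variable uniformly
        g_new = set(g)
        g_new.symmetric_difference_update({d})   # flip the chosen tie
        w_new = ergm_weight(g_new, n, theta)
        if rng.random() < w_new / (w + w_new):   # c(theta) has cancelled
            g, w = g_new, w_new
    return g
```

After a long burn-in, the returned graph is approximately distributed according to the ERGM; running the chain with the example parameters $(-\ln 2, \ln 3)$ on three nodes should return the empty graph roughly $8/29$ of the time.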
See also
* Autologistic actor attribute models
References
Further reading
* van Duijn, M. A. J.; Gile, K. J.; Handcock, M. S. (2009). "A framework for the comparison of maximum pseudo-likelihood and maximum likelihood estimation of exponential family random graph models". ''Social Networks''. 31 (1): 52–62. doi:10.1016/j.socnet.2008.10.003.