Exponential Random Graph Model

Exponential family random graph models (ERGMs) are a family of statistical models for analyzing data from social and other networks. Examples of networks examined using ERGMs include knowledge networks, organizational networks, colleague networks, social media networks, networks of scientific development, and others.


Background

Many metrics exist to describe the structural features of an observed network, such as density, centrality, or assortativity. However, these metrics describe the observed network, which is only one instance of a large number of possible alternative networks. This set of alternative networks may have similar or dissimilar structural features. To support statistical inference on the processes influencing the formation of network structure, a statistical model should consider the set of all possible alternative networks, weighted by their similarity to the observed network. However, because network data is inherently relational, it violates the assumptions of independence and identical distribution made by standard statistical models such as linear regression. Alternative statistical models should reflect the uncertainty associated with a given observation, permit inference about the relative frequency of network substructures of theoretical interest, disambiguate the influence of confounding processes, efficiently represent complex structures, and link local-level processes to global-level properties. Degree-preserving randomization, for example, is a specific way in which an observed network could be considered in terms of multiple alternative networks.
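As an illustration of how an observed network can be viewed as one member of a set of alternatives, the following is a minimal sketch of degree-preserving randomization by repeated double-edge swaps. It assumes an undirected simple graph given as a list of node pairs; the function names `degree_preserving_swap` and `degrees`, the `max_tries` cap, and the example edge list are hypothetical choices made for this sketch, not part of any particular library.

```python
import random


def degree_preserving_swap(edges, n_swaps, seed=0, max_tries=10_000):
    """Randomize an undirected simple graph with double-edge swaps.

    Each accepted swap replaces edges (a, b) and (c, d) with (a, d) and
    (c, b), so every node keeps its original degree.
    Assumes `edges` has at least two edges, no self-loops, no duplicates.
    """
    rng = random.Random(seed)
    edge_list = [tuple(sorted(e)) for e in edges]
    edge_set = set(edge_list)
    done = tries = 0
    while done < n_swaps and tries < max_tries:
        tries += 1
        i, j = rng.sample(range(len(edge_list)), 2)
        a, b = edge_list[i]
        c, d = edge_list[j]
        if rng.random() < 0.5:
            c, d = d, c  # consider both orientations of the second edge
        # Reject swaps that would create a self-loop.
        if a == d or c == b:
            continue
        new1, new2 = tuple(sorted((a, d))), tuple(sorted((c, b)))
        # Reject swaps that would create a duplicate edge.
        if new1 in edge_set or new2 in edge_set or new1 == new2:
            continue
        edge_set -= {edge_list[i], edge_list[j]}
        edge_set |= {new1, new2}
        edge_list[i], edge_list[j] = new1, new2
        done += 1
    return edge_list


def degrees(edges):
    """Return a {node: degree} dictionary for an edge list."""
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    return deg


# Hypothetical observed network: a 6-cycle with one chord.
original = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0), (0, 3)]
shuffled = degree_preserving_swap(original, n_swaps=20, seed=1)
```

Every graph produced this way has the same degree sequence as the observed one, so a structural metric computed on many such graphs gives a reference distribution against which the observed value can be compared.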


Definition

The exponential family is a broad family of models covering many types of data, not just networks. An ERGM is a model from this family that describes networks. Formally, a random graph $Y \in \mathcal{Y}$ consists of a set of $n$ nodes and $m$ dyads (edges) $\{Y_{ij} : i = 1, \dots, n;\ j = 1, \dots, n\}$, where $Y_{ij} = 1$ if the nodes $(i,j)$ are connected and $Y_{ij} = 0$ otherwise. The basic assumption of these models is that the structure in an observed graph $y$ can be explained by a given vector of sufficient statistics $s(y)$, which are a function of the observed network and, in some cases, nodal attributes. This way, it is possible to describe any kind of dependence between the dyadic variables:

$$P(Y = y \mid \theta) = \frac{\exp(\theta^{T} s(y))}{c(\theta)}, \quad \forall y \in \mathcal{Y}$$

where $\theta$ is a vector of model parameters associated with $s(y)$ and $c(\theta) = \sum_{y' \in \mathcal{Y}} \exp(\theta^{T} s(y'))$ is a normalising constant. These models assign a probability to each possible network on $n$ nodes. However, the size of the set of possible networks for an undirected network (simple graph) of size $n$ is $2^{n(n-1)/2}$. Because the number of possible networks in the set vastly outnumbers the number of parameters which can constrain the model, the ideal probability distribution is the one which maximizes the Gibbs entropy.
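To make the definition concrete, the sketch below evaluates the ERGM probability by brute force: it enumerates every undirected simple graph on a small number of nodes, computes the unnormalised weight exp(theta^T s(y)) for each, and divides by their sum c(theta). The function names, the choice of statistics (edge count and triangle count), and the parameter values are assumptions made for this illustration; they are not prescribed by the model.

```python
from itertools import combinations, product
from math import exp


def ergm_distribution(n, theta, stats):
    """Return {edge_set: probability} over all undirected simple graphs on n nodes.

    theta: sequence of model parameters.
    stats: function (n, edges) -> sequence s(y) with the same length as theta.
    """
    dyads = list(combinations(range(n), 2))  # the n(n-1)/2 possible edges
    weights = {}
    for bits in product((0, 1), repeat=len(dyads)):
        edges = frozenset(d for d, b in zip(dyads, bits) if b)
        s = stats(n, edges)
        # Unnormalised weight exp(theta^T s(y)).
        weights[edges] = exp(sum(t * x for t, x in zip(theta, s)))
    c = sum(weights.values())  # normalising constant c(theta)
    return {y: w / c for y, w in weights.items()}


def edges_and_triangles(n, edges):
    """Illustrative choice of s(y): (number of edges, number of triangles)."""
    triangles = sum(
        1
        for i, j, k in combinations(range(n), 3)
        if {(i, j), (i, k), (j, k)} <= edges
    )
    return (len(edges), triangles)


# Hypothetical parameters: edges are penalised, triangles are rewarded.
dist = ergm_distribution(4, theta=(-1.0, 0.5), stats=edges_and_triangles)
```

With n = 4 there are 2^6 = 64 graphs to enumerate; because the count grows as 2^{n(n-1)/2}, this direct evaluation of c(theta) is only practical for very small n.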


Further reading

* Harris, Jenine K. (2014). An introduction to exponential random graph modeling. Sage.
* van Duijn, M. A. J.; Gile, K. J.; Handcock, M. S. (2009). "A framework for the comparison of maximum pseudo-likelihood and maximum likelihood estimation of exponential family random graph models". Social Networks. 31 (1): 52–62. doi:10.1016/j.socnet.2008.10.003. PMID 23170041. PMC 3500576.