Hammersley–Clifford theorem
The Hammersley–Clifford theorem is a result in probability theory, mathematical statistics and statistical mechanics that gives necessary and sufficient conditions under which a strictly positive probability distribution (of events in a probability space) can be represented as events generated by a Markov network (also known as a Markov random field). It is the fundamental theorem of random fields. It states that a probability distribution with a strictly positive mass or density satisfies one of the Markov properties with respect to an undirected graph ''G'' if and only if it is a Gibbs random field, that is, its density can be factorized over the cliques (or complete subgraphs) of the graph.

The study of the relationship between Markov and Gibbs random fields was initiated by Roland Dobrushin and Frank Spitzer in the context of statistical mechanics. The theorem is named after John Hammersley and Peter Clifford, who proved the equivalence in an unpublished paper in 1971. Simpler proofs using the inclusion–exclusion principle were given independently by Geoffrey Grimmett, Preston and Sherman in 1973, with a further proof by Julian Besag in 1974.
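Written out explicitly, the Gibbs condition says that for random variables X_1, \dots, X_n indexed by the vertices of ''G'', the joint density takes the form

\Pr(X_1 = x_1, \dots, X_n = x_n) = \frac{1}{Z}\prod_{C \in \mathcal{C}(G)} \phi_C(x_C)

where \mathcal{C}(G) denotes the set of cliques of ''G'', each \phi_C is a strictly positive function of the variables x_C belonging to the clique C, and Z is a normalizing constant (the partition function).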


Proof outline

It is a trivial matter to show that a Gibbs random field satisfies every Markov property. As an example of this fact, consider a Gibbs random field over a graph whose maximal cliques are \{A,B,D\}, \{A,C,D\}, \{C,D,F\} and \{C,E,F\}; it has the form

\Pr(A,B,C,D,E,F) \propto f_1(A,B,D)\,f_2(A,C,D)\,f_3(C,D,F)\,f_4(C,E,F).

If variables C and D are fixed, then the global Markov property requires that A, B \perp E, F \mid C, D (see conditional independence), since C, D forms a barrier between A, B and E, F. With C and D held constant at values c and d,

\Pr(A,B,E,F \mid C=c, D=d) \propto [f_1(A,B,d)\,f_2(A,c,d)] \cdot [f_3(c,d,F)\,f_4(c,E,F)] = g_1(A,B)\,g_2(E,F)

where g_1(A,B) = f_1(A,B,d)\,f_2(A,c,d) and g_2(E,F) = f_3(c,d,F)\,f_4(c,E,F). This implies that A, B \perp E, F \mid C, D. (A numerical check of this example is sketched after the proof outline.)

To establish that every positive probability distribution that satisfies the local Markov property is also a Gibbs random field, the following lemma, which provides a means for combining different factorizations, needs to be proved.

Lemma 1. Let U denote the set of all random variables under consideration, and let \Theta, \Phi_1, \Phi_2, \dots, \Phi_n \subseteq U and \Psi_1, \Psi_2, \dots, \Psi_m \subseteq U denote arbitrary sets of variables. (Here, given an arbitrary set of variables X, X will also denote an arbitrary assignment to the variables from X.) If

\Pr(U) = f(\Theta)\prod_{i=1}^n g_i(\Phi_i) = \prod_{j=1}^m h_j(\Psi_j)

for functions f, g_1, g_2, \dots, g_n and h_1, h_2, \dots, h_m, then there exist functions h'_1, h'_2, \dots, h'_m and g'_1, g'_2, \dots, g'_n such that

\Pr(U) = \bigg(\prod_{j=1}^m h'_j(\Theta \cap \Psi_j)\bigg)\bigg(\prod_{i=1}^n g'_i(\Phi_i)\bigg).

In other words, \prod_{j=1}^m h_j(\Psi_j) provides a template for further factorization of f(\Theta).

Proof of Lemma 1. In order to use \prod_{j=1}^m h_j(\Psi_j) as a template to further factorize f(\Theta), all variables outside of \Theta need to be fixed. To this end, let \bar{\theta} be an arbitrary fixed assignment to the variables from U \setminus \Theta (the variables not in \Theta). For an arbitrary set of variables X, let \bar{\theta}[X] denote the assignment \bar{\theta} restricted to the variables from X \setminus \Theta (the variables from X, excluding the variables from \Theta).

Moreover, to factorize only f(\Theta), the other factors g_1(\Phi_1), g_2(\Phi_2), \dots, g_n(\Phi_n) need to be rendered moot for the variables from \Theta. To do this, the factorization

\Pr(U) = f(\Theta)\prod_{i=1}^n g_i(\Phi_i)

will be re-expressed as

\Pr(U) = \bigg(f(\Theta)\prod_{i=1}^n g_i(\Phi_i \cap \Theta, \bar{\theta}[\Phi_i])\bigg)\bigg(\prod_{i=1}^n \frac{g_i(\Phi_i)}{g_i(\Phi_i \cap \Theta, \bar{\theta}[\Phi_i])}\bigg).

For each i = 1, 2, \dots, n, the factor g_i(\Phi_i \cap \Theta, \bar{\theta}[\Phi_i]) is g_i(\Phi_i) with all variables outside of \Theta fixed to the values prescribed by \bar{\theta}. Let

f'(\Theta) = f(\Theta)\prod_{i=1}^n g_i(\Phi_i \cap \Theta, \bar{\theta}[\Phi_i]) and g'_i(\Phi_i) = \frac{g_i(\Phi_i)}{g_i(\Phi_i \cap \Theta, \bar{\theta}[\Phi_i])} for each i = 1, 2, \dots, n,

so that

\Pr(U) = f'(\Theta)\prod_{i=1}^n g'_i(\Phi_i) = \prod_{j=1}^m h_j(\Psi_j).

What is most important is that g'_i(\Phi_i) = \frac{g_i(\Phi_i)}{g_i(\Phi_i \cap \Theta, \bar{\theta}[\Phi_i])} = 1 when the values assigned to \Phi_i do not conflict with the values prescribed by \bar{\theta}, making g'_i(\Phi_i) "disappear" when all variables not in \Theta are fixed to the values from \bar{\theta}. Fixing all variables not in \Theta to the values from \bar{\theta} gives

\Pr(\Theta, \bar{\theta}) = f'(\Theta) \prod_{i=1}^n g'_i(\Phi_i \cap \Theta, \bar{\theta}[\Phi_i]) = \prod_{j=1}^m h_j(\Psi_j \cap \Theta, \bar{\theta}[\Psi_j]).

Since g'_i(\Phi_i \cap \Theta, \bar{\theta}[\Phi_i]) = 1,

f'(\Theta) = \prod_{j=1}^m h_j(\Psi_j \cap \Theta, \bar{\theta}[\Psi_j]).

Letting h'_j(\Theta \cap \Psi_j) = h_j(\Psi_j \cap \Theta, \bar{\theta}[\Psi_j]) gives

f'(\Theta) = \prod_{j=1}^m h'_j(\Theta \cap \Psi_j),

which finally gives

\Pr(U) = \bigg(\prod_{j=1}^m h'_j(\Theta \cap \Psi_j)\bigg)\bigg(\prod_{i=1}^n g'_i(\Phi_i)\bigg).

This proves Lemma 1.

Lemma 1 provides a means of combining two different factorizations of \Pr(U). The local Markov property implies that for any random variable x \in U, there exist factors f_x and f_{-x} such that

\Pr(U) = f_x(x, \partial x)\,f_{-x}(U \setminus \{x\}),

where \partial x denotes the neighbors of node x.
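Before the lemma is applied, its construction can be checked numerically on a minimal case. The following Python sketch is an illustration only: the variables, the potentials phi and psi, and the fixed value c_bar are arbitrary assumptions chosen for the example. It takes U = \{a, b, c\} with \Pr(U) \propto \phi(a,b)\,\psi(b,c), uses \Theta = \{a,b\}, \Phi_1 = \{b,c\} as the first factorization and h_1 = \phi, h_2 = \psi as the second, and verifies the identities derived in the proof.

import itertools, random

random.seed(0)

# Arbitrary strictly positive potentials, random purely for illustration.
phi = {t: random.uniform(0.1, 1.0) for t in itertools.product([0, 1], repeat=2)}  # phi(a, b)
psi = {t: random.uniform(0.1, 1.0) for t in itertools.product([0, 1], repeat=2)}  # psi(b, c)

# Two factorizations of the same Pr(U) with U = {a, b, c}:
#   Pr(U) = f(Theta) * g1(Phi_1)   with Theta = {a, b}, Phi_1 = {b, c}
#   Pr(U) = h1(Psi_1) * h2(Psi_2)  with Psi_1 = {a, b}, Psi_2 = {b, c}
f  = lambda a, b: phi[a, b]
g1 = lambda b, c: psi[b, c]
h1, h2 = f, g1

c_bar = 0  # fixed assignment (theta-bar) to the variables outside Theta: here only c

# The constructions from the proof of Lemma 1:
f_prime  = lambda a, b: f(a, b) * g1(b, c_bar)    # f'(Theta)
g1_prime = lambda b, c: g1(b, c) / g1(b, c_bar)   # g1'(Phi_1); equals 1 when c == c_bar
h1_prime = lambda a, b: h1(a, b)                  # h1'(Theta ∩ Psi_1); Psi_1 ⊆ Theta, nothing to fix
h2_prime = lambda b: h2(b, c_bar)                 # h2'(Theta ∩ Psi_2) = h2 with c fixed to c_bar

for a, b, c in itertools.product([0, 1], repeat=3):
    # f'(Theta) factorizes over the template sets Theta ∩ Psi_j ...
    assert abs(f_prime(a, b) - h1_prime(a, b) * h2_prime(b)) < 1e-12
    # ... and the combined factorization reproduces the original Pr(U).
    assert abs(f(a, b) * g1(b, c)
               - h1_prime(a, b) * h2_prime(b) * g1_prime(b, c)) < 1e-12
print("Lemma 1 construction verified")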
Applying Lemma 1 repeatedly eventually factors \Pr(U) into a product of clique potentials, which completes the proof.
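The worked example at the start of the proof outline can likewise be verified by brute-force enumeration. In the following Python sketch, the clique potentials are arbitrary strictly positive values chosen at random purely for illustration; the script builds the unnormalized density f_1(A,B,D)\,f_2(A,C,D)\,f_3(C,D,F)\,f_4(C,E,F) over binary variables, conditions on C and D, and confirms that \Pr(A,B,E,F \mid C=c, D=d) factorizes as g_1(A,B)\,g_2(E,F):

import itertools, random

random.seed(0)

def rand_potential(k):
    # Strictly positive potential on k binary variables, random for illustration.
    return {t: random.uniform(0.1, 1.0) for t in itertools.product([0, 1], repeat=k)}

f1, f2, f3, f4 = rand_potential(3), rand_potential(3), rand_potential(3), rand_potential(3)

def joint(a, b, c, d, e, f):
    # Unnormalized Gibbs density over the cliques {A,B,D}, {A,C,D}, {C,D,F}, {C,E,F}.
    return f1[a, b, d] * f2[a, c, d] * f3[c, d, f] * f4[c, e, f]

c, d = 1, 0  # condition on C = c, D = d (any values work)

# Normalized conditional distribution Pr(A,B,E,F | C=c, D=d).
cond = {(a, b, e, f): joint(a, b, c, d, e, f)
        for a, b, e, f in itertools.product([0, 1], repeat=4)}
Z = sum(cond.values())
cond = {k: v / Z for k, v in cond.items()}

# Conditional marginals of (A,B) and of (E,F).
pairs = list(itertools.product([0, 1], repeat=2))
pAB = {(a, b): sum(cond[a, b, e, f] for e, f in pairs) for a, b in pairs}
pEF = {(e, f): sum(cond[a, b, e, f] for a, b in pairs) for e, f in pairs}

# Global Markov property: the conditional factorizes, so A,B ⊥ E,F | C,D.
for (a, b, e, f), p in cond.items():
    assert abs(p - pAB[a, b] * pEF[e, f]) < 1e-12
print("A,B independent of E,F given C,D: verified")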


See also

* Markov random field
* Conditional random field

