Hammersley–Clifford theorem
The Hammersley–Clifford theorem is a result in probability theory, mathematical statistics and statistical mechanics that gives necessary and sufficient conditions under which a strictly positive probability distribution (of events in a probability space) can be represented as events generated by a Markov network (also known as a Markov random field). It is the fundamental theorem of random fields. It states that a probability distribution with a strictly positive mass or density satisfies one of the Markov properties with respect to an undirected graph ''G'' if and only if it is a Gibbs random field, that is, its density can be factorized over the cliques (or complete subgraphs) of the graph.

The study of the relationship between Markov and Gibbs random fields was initiated by Roland Dobrushin and Frank Spitzer in the context of statistical mechanics. The theorem is named after John Hammersley and Peter Clifford, who proved the equivalence in an unpublished paper in 1971. Simpler proofs using the inclusion–exclusion principle were given independently by Geoffrey Grimmett, Preston and Sherman in 1973, with a further proof by Julian Besag in 1974.
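Written out explicitly, the Gibbs condition says that for random variables X_1, \dots, X_n indexed by the vertices of ''G'', the joint density takes the form

\Pr(X_1 = x_1, \dots, X_n = x_n) = \frac{1}{Z}\prod_{C \in \mathcal{C}(G)} \phi_C(x_C)

where \mathcal{C}(G) denotes the set of cliques of ''G'', each \phi_C is a strictly positive function of the variables x_C belonging to the clique C, and Z is a normalizing constant (the partition function).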


Proof outline

It is a trivial matter to show that a Gibbs random field satisfies every Markov property. As an example of this fact, consider a Gibbs random field over a graph whose maximal cliques are \{A,B,D\}, \{A,C,D\}, \{C,D,F\} and \{C,E,F\}; it has the form

\Pr(A,B,C,D,E,F) \propto f_1(A,B,D)\,f_2(A,C,D)\,f_3(C,D,F)\,f_4(C,E,F).

If variables C and D are fixed, then the global Markov property requires that A, B \perp E, F \mid C, D (see conditional independence), since C, D forms a barrier between A, B and E, F. With C and D held constant at values c and d,

\Pr(A,B,E,F \mid C=c, D=d) \propto [f_1(A,B,d)\,f_2(A,c,d)] \cdot [f_3(c,d,F)\,f_4(c,E,F)] = g_1(A,B)\,g_2(E,F)

where g_1(A,B) = f_1(A,B,d)\,f_2(A,c,d) and g_2(E,F) = f_3(c,d,F)\,f_4(c,E,F). This implies that A, B \perp E, F \mid C, D. (A numerical check of this example is sketched after the proof outline.)

To establish that every positive probability distribution that satisfies the local Markov property is also a Gibbs random field, the following lemma, which provides a means for combining different factorizations, needs to be proved.

Lemma 1. Let U denote the set of all random variables under consideration, and let \Theta, \Phi_1, \Phi_2, \dots, \Phi_n \subseteq U and \Psi_1, \Psi_2, \dots, \Psi_m \subseteq U denote arbitrary sets of variables. (Here, given an arbitrary set of variables X, X will also denote an arbitrary assignment to the variables from X.) If

\Pr(U) = f(\Theta)\prod_{i=1}^n g_i(\Phi_i) = \prod_{j=1}^m h_j(\Psi_j)

for functions f, g_1, g_2, \dots, g_n and h_1, h_2, \dots, h_m, then there exist functions h'_1, h'_2, \dots, h'_m and g'_1, g'_2, \dots, g'_n such that

\Pr(U) = \bigg(\prod_{j=1}^m h'_j(\Theta \cap \Psi_j)\bigg)\bigg(\prod_{i=1}^n g'_i(\Phi_i)\bigg).

In other words, \prod_{j=1}^m h_j(\Psi_j) provides a template for further factorization of f(\Theta).

Proof of Lemma 1. In order to use \prod_{j=1}^m h_j(\Psi_j) as a template to further factorize f(\Theta), all variables outside of \Theta need to be fixed. To this end, let \bar{\theta} be an arbitrary fixed assignment to the variables from U \setminus \Theta (the variables not in \Theta). For an arbitrary set of variables X, let \bar{\theta}[X] denote the assignment \bar{\theta} restricted to the variables from X \setminus \Theta (the variables from X, excluding the variables from \Theta).

Moreover, to factorize only f(\Theta), the other factors g_1(\Phi_1), g_2(\Phi_2), \dots, g_n(\Phi_n) need to be rendered moot for the variables from \Theta. To do this, the factorization

\Pr(U) = f(\Theta)\prod_{i=1}^n g_i(\Phi_i)

will be re-expressed as

\Pr(U) = \bigg(f(\Theta)\prod_{i=1}^n g_i(\Phi_i \cap \Theta, \bar{\theta}[\Phi_i])\bigg)\bigg(\prod_{i=1}^n \frac{g_i(\Phi_i)}{g_i(\Phi_i \cap \Theta, \bar{\theta}[\Phi_i])}\bigg).

For each i = 1, 2, \dots, n, the factor g_i(\Phi_i \cap \Theta, \bar{\theta}[\Phi_i]) is g_i(\Phi_i) with all variables outside of \Theta fixed to the values prescribed by \bar{\theta}. Let

f'(\Theta) = f(\Theta)\prod_{i=1}^n g_i(\Phi_i \cap \Theta, \bar{\theta}[\Phi_i]) and g'_i(\Phi_i) = \frac{g_i(\Phi_i)}{g_i(\Phi_i \cap \Theta, \bar{\theta}[\Phi_i])} for each i = 1, 2, \dots, n,

so that

\Pr(U) = f'(\Theta)\prod_{i=1}^n g'_i(\Phi_i) = \prod_{j=1}^m h_j(\Psi_j).

What is most important is that g'_i(\Phi_i) = \frac{g_i(\Phi_i)}{g_i(\Phi_i \cap \Theta, \bar{\theta}[\Phi_i])} = 1 when the values assigned to \Phi_i do not conflict with the values prescribed by \bar{\theta}, making g'_i(\Phi_i) "disappear" when all variables not in \Theta are fixed to the values from \bar{\theta}. Fixing all variables not in \Theta to the values from \bar{\theta} gives

\Pr(\Theta, \bar{\theta}) = f'(\Theta) \prod_{i=1}^n g'_i(\Phi_i \cap \Theta, \bar{\theta}[\Phi_i]) = \prod_{j=1}^m h_j(\Psi_j \cap \Theta, \bar{\theta}[\Psi_j]).

Since g'_i(\Phi_i \cap \Theta, \bar{\theta}[\Phi_i]) = 1,

f'(\Theta) = \prod_{j=1}^m h_j(\Psi_j \cap \Theta, \bar{\theta}[\Psi_j]).

Letting h'_j(\Theta \cap \Psi_j) = h_j(\Psi_j \cap \Theta, \bar{\theta}[\Psi_j]) gives

f'(\Theta) = \prod_{j=1}^m h'_j(\Theta \cap \Psi_j),

which finally gives

\Pr(U) = \bigg(\prod_{j=1}^m h'_j(\Theta \cap \Psi_j)\bigg)\bigg(\prod_{i=1}^n g'_i(\Phi_i)\bigg).

This proves Lemma 1.

Lemma 1 provides a means of combining two different factorizations of \Pr(U). The local Markov property implies that for any random variable x \in U, there exist factors f_x and f_{-x} such that

\Pr(U) = f_x(x, \partial x)\,f_{-x}(U \setminus \{x\}),

where \partial x denotes the neighbors of node x.
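Before the lemma is applied, its construction can be checked numerically on a minimal case. The following Python sketch is an illustration only: the variables, the potentials phi and psi, and the fixed value c_bar are arbitrary assumptions chosen for the example. It takes U = \{a, b, c\} with \Pr(U) \propto \phi(a,b)\,\psi(b,c), uses \Theta = \{a,b\}, \Phi_1 = \{b,c\} as the first factorization and h_1 = \phi, h_2 = \psi as the second, and verifies the identities derived in the proof.

import itertools, random

random.seed(0)

# Arbitrary strictly positive potentials, random purely for illustration.
phi = {t: random.uniform(0.1, 1.0) for t in itertools.product([0, 1], repeat=2)}  # phi(a, b)
psi = {t: random.uniform(0.1, 1.0) for t in itertools.product([0, 1], repeat=2)}  # psi(b, c)

# Two factorizations of the same Pr(U) with U = {a, b, c}:
#   Pr(U) = f(Theta) * g1(Phi_1)   with Theta = {a, b}, Phi_1 = {b, c}
#   Pr(U) = h1(Psi_1) * h2(Psi_2)  with Psi_1 = {a, b}, Psi_2 = {b, c}
f  = lambda a, b: phi[a, b]
g1 = lambda b, c: psi[b, c]
h1, h2 = f, g1

c_bar = 0  # fixed assignment (theta-bar) to the variables outside Theta: here only c

# The constructions from the proof of Lemma 1:
f_prime  = lambda a, b: f(a, b) * g1(b, c_bar)    # f'(Theta)
g1_prime = lambda b, c: g1(b, c) / g1(b, c_bar)   # g1'(Phi_1); equals 1 when c == c_bar
h1_prime = lambda a, b: h1(a, b)                  # h1'(Theta ∩ Psi_1); Psi_1 ⊆ Theta, nothing to fix
h2_prime = lambda b: h2(b, c_bar)                 # h2'(Theta ∩ Psi_2) = h2 with c fixed to c_bar

for a, b, c in itertools.product([0, 1], repeat=3):
    # f'(Theta) factorizes over the template sets Theta ∩ Psi_j ...
    assert abs(f_prime(a, b) - h1_prime(a, b) * h2_prime(b)) < 1e-12
    # ... and the combined factorization reproduces the original Pr(U).
    assert abs(f(a, b) * g1(b, c)
               - h1_prime(a, b) * h2_prime(b) * g1_prime(b, c)) < 1e-12
print("Lemma 1 construction verified")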
Applying Lemma 1 repeatedly eventually factors \Pr(U) into a product of clique potentials, which completes the proof.
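The worked example at the start of the proof outline can likewise be verified by brute-force enumeration. In the following Python sketch, the clique potentials are arbitrary strictly positive values chosen at random purely for illustration; the script builds the unnormalized density f_1(A,B,D)\,f_2(A,C,D)\,f_3(C,D,F)\,f_4(C,E,F) over binary variables, conditions on C and D, and confirms that \Pr(A,B,E,F \mid C=c, D=d) factorizes as g_1(A,B)\,g_2(E,F):

import itertools, random

random.seed(0)

def rand_potential(k):
    # Strictly positive potential on k binary variables, random for illustration.
    return {t: random.uniform(0.1, 1.0) for t in itertools.product([0, 1], repeat=k)}

f1, f2, f3, f4 = rand_potential(3), rand_potential(3), rand_potential(3), rand_potential(3)

def joint(a, b, c, d, e, f):
    # Unnormalized Gibbs density over the cliques {A,B,D}, {A,C,D}, {C,D,F}, {C,E,F}.
    return f1[a, b, d] * f2[a, c, d] * f3[c, d, f] * f4[c, e, f]

c, d = 1, 0  # condition on C = c, D = d (any values work)

# Normalized conditional distribution Pr(A,B,E,F | C=c, D=d).
cond = {(a, b, e, f): joint(a, b, c, d, e, f)
        for a, b, e, f in itertools.product([0, 1], repeat=4)}
Z = sum(cond.values())
cond = {k: v / Z for k, v in cond.items()}

# Conditional marginals of (A,B) and of (E,F).
pairs = list(itertools.product([0, 1], repeat=2))
pAB = {(a, b): sum(cond[a, b, e, f] for e, f in pairs) for a, b in pairs}
pEF = {(e, f): sum(cond[a, b, e, f] for a, b in pairs) for e, f in pairs}

# Global Markov property: the conditional factorizes, so A,B ⊥ E,F | C,D.
for (a, b, e, f), p in cond.items():
    assert abs(p - pAB[a, b] * pEF[e, f]) < 1e-12
print("A,B independent of E,F given C,D: verified")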


See also

* Markov random field
* Conditional random field

