The Hammersley–Clifford theorem is a result in probability theory, mathematical statistics and statistical mechanics that gives necessary and sufficient conditions under which a strictly positive probability distribution can be represented as events generated by a Markov network (also known as a Markov random field). It is the fundamental theorem of random fields. It states that a probability distribution with a strictly positive mass or density satisfies one of the Markov properties with respect to an undirected graph ''G'' if and only if it is a Gibbs random field, that is, its density can be factorized over the cliques (or complete subgraphs) of the graph.

The relationship between Markov and Gibbs random fields was initiated by Roland Dobrushin and Frank Spitzer in the context of statistical mechanics. The theorem is named after John Hammersley and Peter Clifford, who proved the equivalence in an unpublished paper in 1971. Simpler proofs using the inclusion–exclusion principle were given independently by Geoffrey Grimmett, Preston and Sherman in 1973, with a further proof by Julian Besag in 1974.
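In symbols, for a random vector X = (X_v)_{v \in V} indexed by the vertices of a finite undirected graph G = (V, E) and having a strictly positive joint mass or density, the theorem asserts that X satisfies the Markov properties with respect to G if and only if its mass or density factorizes as

\Pr(X = x) = \frac{1}{Z}\prod_{C \in \mathcal{C}(G)} \varphi_C(x_C),

where \mathcal{C}(G) denotes the set of cliques of G, x_C is the restriction of the configuration x to the variables in C, each \varphi_C is a strictly positive function of x_C only, and Z is a normalizing constant. (The notation \varphi_C, \mathcal{C}(G) and Z is a standard formulation introduced here for concreteness, not notation fixed by the original paper.)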


Proof outline

It is a trivial matter to show that a Gibbs random field satisfies every Markov property. As an example of this fact, consider the following: a Gibbs random field over a graph whose cliques are \{A,B,D\}, \{A,C,D\}, \{C,D,F\} and \{C,E,F\} has the form

\Pr(A,B,C,D,E,F) \propto f_1(A,B,D)\,f_2(A,C,D)\,f_3(C,D,F)\,f_4(C,E,F).

If variables C and D are fixed, then the global Markov property requires that A, B \perp E, F \mid C, D (see conditional independence), since \{C, D\} forms a barrier between \{A, B\} and \{E, F\}. With C and D constant,

\Pr(A,B,E,F \mid C = c, D = d) \propto [f_1(A,B,d)\,f_2(A,c,d)] \cdot [f_3(c,d,F)\,f_4(c,E,F)] = g_1(A,B)\,g_2(E,F),

where g_1(A,B) = f_1(A,B,d)\,f_2(A,c,d) and g_2(E,F) = f_3(c,d,F)\,f_4(c,E,F). This implies that A, B \perp E, F \mid C, D.

To establish that every positive probability distribution that satisfies the local Markov property is also a Gibbs random field, the following lemma, which provides a means for combining different factorizations, needs to be proved:

Lemma 1. Let U denote the set of all random variables under consideration, and let \Theta, \Phi_1, \Phi_2, \dots, \Phi_n \subseteq U and \Psi_1, \Psi_2, \dots, \Psi_m \subseteq U denote arbitrary sets of variables. (Here, given an arbitrary set of variables X, X will also denote an arbitrary assignment to the variables from X.) If

\Pr(U) = f(\Theta)\prod_{i=1}^n g_i(\Phi_i) = \prod_{j=1}^m h_j(\Psi_j)

for functions f, g_1, g_2, \dots, g_n and h_1, h_2, \dots, h_m, then there exist functions h'_1, h'_2, \dots, h'_m and g'_1, g'_2, \dots, g'_n such that

\Pr(U) = \bigg(\prod_{j=1}^m h'_j(\Theta \cap \Psi_j)\bigg)\bigg(\prod_{i=1}^n g'_i(\Phi_i)\bigg).

In other words, \prod_{j=1}^m h_j(\Psi_j) provides a template for further factorization of f(\Theta).

Proof of Lemma 1. In order to use \prod_{j=1}^m h_j(\Psi_j) as a template to further factorize f(\Theta), all variables outside of \Theta need to be fixed. To this end, let \bar{\theta} be an arbitrary fixed assignment to the variables from U \setminus \Theta (the variables not in \Theta). For an arbitrary set of variables X, let \bar{\theta}[X] denote the assignment \bar{\theta} restricted to the variables from X \setminus \Theta (the variables from X, excluding the variables from \Theta). Moreover, to factorize only f(\Theta), the other factors g_1(\Phi_1), g_2(\Phi_2), \dots, g_n(\Phi_n) need to be rendered moot for the variables from \Theta. To do this, the factorization

\Pr(U) = f(\Theta)\prod_{i=1}^n g_i(\Phi_i)

will be re-expressed as

\Pr(U) = \bigg(f(\Theta)\prod_{i=1}^n g_i(\Phi_i \cap \Theta, \bar{\theta}[\Phi_i])\bigg)\bigg(\prod_{i=1}^n \frac{g_i(\Phi_i)}{g_i(\Phi_i \cap \Theta, \bar{\theta}[\Phi_i])}\bigg).

For each i = 1, 2, \dots, n, the factor g_i(\Phi_i \cap \Theta, \bar{\theta}[\Phi_i]) is g_i(\Phi_i) with all variables outside of \Theta fixed to the values prescribed by \bar{\theta}. Let

f'(\Theta) = f(\Theta)\prod_{i=1}^n g_i(\Phi_i \cap \Theta, \bar{\theta}[\Phi_i]) \qquad \text{and} \qquad g'_i(\Phi_i) = \frac{g_i(\Phi_i)}{g_i(\Phi_i \cap \Theta, \bar{\theta}[\Phi_i])}

for each i = 1, 2, \dots, n, so that

\Pr(U) = f'(\Theta)\prod_{i=1}^n g'_i(\Phi_i) = \prod_{j=1}^m h_j(\Psi_j).

What is most important is that g'_i(\Phi_i) = \frac{g_i(\Phi_i)}{g_i(\Phi_i)} = 1 when the values assigned to \Phi_i do not conflict with the values prescribed by \bar{\theta}, making g'_i(\Phi_i) "disappear" when all variables not in \Theta are fixed to the values from \bar{\theta}. Fixing all variables not in \Theta to the values from \bar{\theta} gives

\Pr(\Theta, \bar{\theta}) = f'(\Theta)\prod_{i=1}^n g'_i(\Phi_i \cap \Theta, \bar{\theta}[\Phi_i]) = \prod_{j=1}^m h_j(\Psi_j \cap \Theta, \bar{\theta}[\Psi_j]).
Since g'_i(\Phi_i \cap \Theta, \bar{\theta}[\Phi_i]) = 1, it follows that

f'(\Theta) = \prod_{j=1}^m h_j(\Psi_j \cap \Theta, \bar{\theta}[\Psi_j]).
Letting h'_j(\Theta \cap \Psi_j) = h_j(\Psi_j \cap \Theta, \bar{\theta}[\Psi_j]) gives

f'(\Theta) = \prod_{j=1}^m h'_j(\Theta \cap \Psi_j),

which finally gives

\Pr(U) = \bigg(\prod_{j=1}^m h'_j(\Theta \cap \Psi_j)\bigg)\bigg(\prod_{i=1}^n g'_i(\Phi_i)\bigg).

Lemma 1 provides a means of combining two different factorizations of \Pr(U). The local Markov property implies that for any random variable x \in U there exist factors f_x and f_{-x} such that

\Pr(U) = f_x(x, \partial x)\,f_{-x}(U \setminus \{x\}),

where \partial x denotes the neighbors of node x. Applying Lemma 1 repeatedly eventually factors \Pr(U) into a product of clique potentials. End of proof.
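To see how a single application of Lemma 1 works, consider the three-node chain x — y — z (an illustration added here for concreteness; it is not part of the original argument). The local Markov property gives the two factorizations

\Pr(U) = f_x(x, y)\,f_{-x}(y, z) = f_{-z}(x, y)\,f_z(z, y).

Applying Lemma 1 with \Theta = \{x, y\}, \Phi_1 = \{y, z\}, \Psi_1 = \{y, z\} and \Psi_2 = \{x, y\} yields

\Pr(U) = h'_1(\Theta \cap \Psi_1)\,h'_2(\Theta \cap \Psi_2)\,g'_1(\Phi_1) = h'_1(y)\,h'_2(x, y)\,g'_1(y, z),

in which every factor now depends only on a clique of the chain, as the theorem requires.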
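The first half of the argument (Gibbs implies Markov) is purely computational, so it can also be checked numerically. The following Python sketch (illustrative code added here, not part of the original article; the random potential tables and the variable indexing A=0, ..., F=5 are arbitrary choices) enumerates a six-variable binary Gibbs random field over the example graph above and verifies A, B \perp E, F \mid C, D by brute force:

import itertools
import random

# Brute-force check that a Gibbs random field over the example graph
# (cliques {A,B,D}, {A,C,D}, {C,D,F}, {C,E,F}) satisfies A,B ⊥ E,F | C,D.
random.seed(0)

CLIQUES = [(0, 1, 3), (0, 2, 3), (2, 3, 5), (2, 4, 5)]  # A=0, B=1, ..., F=5

# One strictly positive potential table per clique.
potentials = [
    {cfg: random.uniform(0.1, 1.0) for cfg in itertools.product((0, 1), repeat=3)}
    for _ in CLIQUES
]

def weight(x):
    """Unnormalized Gibbs weight of a full binary configuration x."""
    w = 1.0
    for clique, phi in zip(CLIQUES, potentials):
        w *= phi[tuple(x[v] for v in clique)]
    return w

for c, d in itertools.product((0, 1), repeat=2):
    def marg(a=None, b=None, e=None, f=None):
        """Sum of weights with C=c, D=d and any of A,B,E,F optionally clamped."""
        clamp = {0: a, 1: b, 2: c, 3: d, 4: e, 5: f}
        return sum(
            weight(x)
            for x in itertools.product((0, 1), repeat=6)
            if all(v is None or x[i] == v for i, v in clamp.items())
        )

    z = marg()  # normalizer given C=c, D=d
    for a, b, e, f in itertools.product((0, 1), repeat=4):
        lhs = marg(a, b, e, f) / z                          # P(a,b,e,f | c,d)
        rhs = (marg(a=a, b=b) / z) * (marg(e=e, f=f) / z)   # P(a,b|c,d) * P(e,f|c,d)
        assert abs(lhs - rhs) < 1e-12, (c, d, a, b, e, f)

print("verified: A,B and E,F are conditionally independent given C,D")

Changing the seed does not affect the outcome: the identity holds exactly (up to floating-point rounding) for any strictly positive potentials, which is precisely what the first half of the proof establishes.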


See also

* Markov random field
* Conditional random field

