The Hammersley–Clifford theorem is a result in probability theory, mathematical statistics and statistical mechanics that gives necessary and sufficient conditions under which a strictly positive probability distribution can be represented as events generated by a Markov network (also known as a Markov random field). It is the fundamental theorem of random fields. It states that a probability distribution with a strictly positive mass or density satisfies one of the Markov properties with respect to an undirected graph ''G'' if and only if it is a Gibbs random field, that is, its density can be factorized over the cliques (or complete subgraphs) of the graph.

The relationship between Markov and Gibbs random fields was initiated by Roland Dobrushin and Frank Spitzer in the context of statistical mechanics. The theorem is named after John Hammersley and Peter Clifford, who proved the equivalence in an unpublished paper in 1971. Simpler proofs using the inclusion–exclusion principle were given independently by Geoffrey Grimmett, Preston and Sherman in 1973, with a further proof by Julian Besag in 1974.
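In symbols, for a random vector X = (X_v)_{v \in V} indexed by the vertices of a finite undirected graph G = (V, E) and having a strictly positive joint mass or density, the theorem asserts that X satisfies the Markov properties with respect to G if and only if its mass or density factorizes as

\Pr(X = x) = \frac{1}{Z}\prod_{C \in \mathcal{C}(G)} \varphi_C(x_C),

where \mathcal{C}(G) denotes the set of cliques of G, x_C is the restriction of the configuration x to the variables in C, each \varphi_C is a strictly positive function of x_C only, and Z is a normalizing constant. (The notation \varphi_C, \mathcal{C}(G) and Z is a standard formulation introduced here for concreteness, not notation fixed by the original paper.)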


Proof outline

It is a trivial matter to show that a Gibbs random field satisfies every Markov property. As an example of this fact, consider the following: a Gibbs random field over a graph whose cliques are \{A,B,D\}, \{A,C,D\}, \{C,D,F\} and \{C,E,F\} has the form

\Pr(A,B,C,D,E,F) \propto f_1(A,B,D)\,f_2(A,C,D)\,f_3(C,D,F)\,f_4(C,E,F).

If variables C and D are fixed, then the global Markov property requires that A, B \perp E, F \mid C, D (see conditional independence), since \{C, D\} forms a barrier between \{A, B\} and \{E, F\}. With C and D constant,

\Pr(A,B,E,F \mid C = c, D = d) \propto [f_1(A,B,d)\,f_2(A,c,d)] \cdot [f_3(c,d,F)\,f_4(c,E,F)] = g_1(A,B)\,g_2(E,F),

where g_1(A,B) = f_1(A,B,d)\,f_2(A,c,d) and g_2(E,F) = f_3(c,d,F)\,f_4(c,E,F). This implies that A, B \perp E, F \mid C, D.

To establish that every positive probability distribution that satisfies the local Markov property is also a Gibbs random field, the following lemma, which provides a means for combining different factorizations, needs to be proved:

Lemma 1. Let U denote the set of all random variables under consideration, and let \Theta, \Phi_1, \Phi_2, \dots, \Phi_n \subseteq U and \Psi_1, \Psi_2, \dots, \Psi_m \subseteq U denote arbitrary sets of variables. (Here, given an arbitrary set of variables X, X will also denote an arbitrary assignment to the variables from X.) If

\Pr(U) = f(\Theta)\prod_{i=1}^n g_i(\Phi_i) = \prod_{j=1}^m h_j(\Psi_j)

for functions f, g_1, g_2, \dots, g_n and h_1, h_2, \dots, h_m, then there exist functions h'_1, h'_2, \dots, h'_m and g'_1, g'_2, \dots, g'_n such that

\Pr(U) = \bigg(\prod_{j=1}^m h'_j(\Theta \cap \Psi_j)\bigg)\bigg(\prod_{i=1}^n g'_i(\Phi_i)\bigg).

In other words, \prod_{j=1}^m h_j(\Psi_j) provides a template for further factorization of f(\Theta).

Proof of Lemma 1. In order to use \prod_{j=1}^m h_j(\Psi_j) as a template to further factorize f(\Theta), all variables outside of \Theta need to be fixed. To this end, let \bar{\theta} be an arbitrary fixed assignment to the variables from U \setminus \Theta (the variables not in \Theta). For an arbitrary set of variables X, let \bar{\theta}[X] denote the assignment \bar{\theta} restricted to the variables from X \setminus \Theta (the variables from X, excluding the variables from \Theta). Moreover, to factorize only f(\Theta), the other factors g_1(\Phi_1), g_2(\Phi_2), \dots, g_n(\Phi_n) need to be rendered moot for the variables from \Theta. To do this, the factorization

\Pr(U) = f(\Theta)\prod_{i=1}^n g_i(\Phi_i)

will be re-expressed as

\Pr(U) = \bigg(f(\Theta)\prod_{i=1}^n g_i(\Phi_i \cap \Theta, \bar{\theta}[\Phi_i])\bigg)\bigg(\prod_{i=1}^n \frac{g_i(\Phi_i)}{g_i(\Phi_i \cap \Theta, \bar{\theta}[\Phi_i])}\bigg).

For each i = 1, 2, \dots, n, the factor g_i(\Phi_i \cap \Theta, \bar{\theta}[\Phi_i]) is g_i(\Phi_i) with all variables outside of \Theta fixed to the values prescribed by \bar{\theta}. Let

f'(\Theta) = f(\Theta)\prod_{i=1}^n g_i(\Phi_i \cap \Theta, \bar{\theta}[\Phi_i]) \qquad \text{and} \qquad g'_i(\Phi_i) = \frac{g_i(\Phi_i)}{g_i(\Phi_i \cap \Theta, \bar{\theta}[\Phi_i])}

for each i = 1, 2, \dots, n, so that

\Pr(U) = f'(\Theta)\prod_{i=1}^n g'_i(\Phi_i) = \prod_{j=1}^m h_j(\Psi_j).

What is most important is that g'_i(\Phi_i) = \frac{g_i(\Phi_i)}{g_i(\Phi_i)} = 1 when the values assigned to \Phi_i do not conflict with the values prescribed by \bar{\theta}, making g'_i(\Phi_i) "disappear" when all variables not in \Theta are fixed to the values from \bar{\theta}. Fixing all variables not in \Theta to the values from \bar{\theta} gives

\Pr(\Theta, \bar{\theta}) = f'(\Theta)\prod_{i=1}^n g'_i(\Phi_i \cap \Theta, \bar{\theta}[\Phi_i]) = \prod_{j=1}^m h_j(\Psi_j \cap \Theta, \bar{\theta}[\Psi_j]).
Since g'_i(\Phi_i \cap \Theta, \bar{\theta}[\Phi_i]) = 1, it follows that

f'(\Theta) = \prod_{j=1}^m h_j(\Psi_j \cap \Theta, \bar{\theta}[\Psi_j]).
Letting h'_j(\Theta \cap \Psi_j) = h_j(\Psi_j \cap \Theta, \bar{\theta}[\Psi_j]) gives

f'(\Theta) = \prod_{j=1}^m h'_j(\Theta \cap \Psi_j),

which finally gives

\Pr(U) = \bigg(\prod_{j=1}^m h'_j(\Theta \cap \Psi_j)\bigg)\bigg(\prod_{i=1}^n g'_i(\Phi_i)\bigg).

Lemma 1 provides a means of combining two different factorizations of \Pr(U). The local Markov property implies that for any random variable x \in U there exist factors f_x and f_{-x} such that

\Pr(U) = f_x(x, \partial x)\,f_{-x}(U \setminus \{x\}),

where \partial x denotes the neighbors of node x. Applying Lemma 1 repeatedly eventually factors \Pr(U) into a product of clique potentials. End of proof.
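To see how a single application of Lemma 1 works, consider the three-node chain x — y — z (an illustration added here for concreteness; it is not part of the original argument). The local Markov property gives the two factorizations

\Pr(U) = f_x(x, y)\,f_{-x}(y, z) = f_{-z}(x, y)\,f_z(z, y).

Applying Lemma 1 with \Theta = \{x, y\}, \Phi_1 = \{y, z\}, \Psi_1 = \{y, z\} and \Psi_2 = \{x, y\} yields

\Pr(U) = h'_1(\Theta \cap \Psi_1)\,h'_2(\Theta \cap \Psi_2)\,g'_1(\Phi_1) = h'_1(y)\,h'_2(x, y)\,g'_1(y, z),

in which every factor now depends only on a clique of the chain, as the theorem requires.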
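The first half of the argument (Gibbs implies Markov) is purely computational, so it can also be checked numerically. The following Python sketch (illustrative code added here, not part of the original article; the random potential tables and the variable indexing A=0, ..., F=5 are arbitrary choices) enumerates a six-variable binary Gibbs random field over the example graph above and verifies A, B \perp E, F \mid C, D by brute force:

import itertools
import random

# Brute-force check that a Gibbs random field over the example graph
# (cliques {A,B,D}, {A,C,D}, {C,D,F}, {C,E,F}) satisfies A,B ⊥ E,F | C,D.
random.seed(0)

CLIQUES = [(0, 1, 3), (0, 2, 3), (2, 3, 5), (2, 4, 5)]  # A=0, B=1, ..., F=5

# One strictly positive potential table per clique.
potentials = [
    {cfg: random.uniform(0.1, 1.0) for cfg in itertools.product((0, 1), repeat=3)}
    for _ in CLIQUES
]

def weight(x):
    """Unnormalized Gibbs weight of a full binary configuration x."""
    w = 1.0
    for clique, phi in zip(CLIQUES, potentials):
        w *= phi[tuple(x[v] for v in clique)]
    return w

for c, d in itertools.product((0, 1), repeat=2):
    def marg(a=None, b=None, e=None, f=None):
        """Sum of weights with C=c, D=d and any of A,B,E,F optionally clamped."""
        clamp = {0: a, 1: b, 2: c, 3: d, 4: e, 5: f}
        return sum(
            weight(x)
            for x in itertools.product((0, 1), repeat=6)
            if all(v is None or x[i] == v for i, v in clamp.items())
        )

    z = marg()  # normalizer given C=c, D=d
    for a, b, e, f in itertools.product((0, 1), repeat=4):
        lhs = marg(a, b, e, f) / z                          # P(a,b,e,f | c,d)
        rhs = (marg(a=a, b=b) / z) * (marg(e=e, f=f) / z)   # P(a,b|c,d) * P(e,f|c,d)
        assert abs(lhs - rhs) < 1e-12, (c, d, a, b, e, f)

print("verified: A,B and E,F are conditionally independent given C,D")

Changing the seed does not affect the outcome: the identity holds exactly (up to floating-point rounding) for any strictly positive potentials, which is precisely what the first half of the proof establishes.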


See also

* Markov random field
* Conditional random field

