In the domain of

physics Physics is the natural science that studies matter, its fundamental constituents, its motion and behavior through space and time, and the related entities of energy and force. "Physical science is that department of knowledge which ...

and

probability Probability is the branch of mathematics concerning numerical descriptions of how likely an event is to occur, or how likely it is that a proposition is true. The probability of an event is a number between 0 and 1, where, roughly speaking, ...

, a Markov random field (MRF), Markov network or undirected

graphical model A graphical model or probabilistic graphical model (PGM) or structured probabilistic model is a probabilistic model for which a graph expresses the conditional dependence structure between random variables. They are commonly used in probabili ...

is a set of

random variable A random variable (also called random quantity, aleatory variable, or stochastic variable) is a mathematical formalization of a quantity or object which depends on random events. It is a mapping or a function from possible outcomes (e.g., the po ...

s having a

Markov property In probability theory and statistics, the term Markov property refers to the memoryless property of a stochastic process. It is named after the Russian mathematician Andrey Markov. The term strong Markov property is similar to the Markov propert ...

described by an

undirected graph In discrete mathematics, and more specifically in graph theory, a graph is a structure amounting to a set of objects in which some pairs of the objects are in some sense "related". The objects correspond to mathematical abstractions called '' ve ...

. In other words, a

random field In physics and mathematics, a random field is a random function over an arbitrary domain (usually a multi-dimensional space such as \mathbb^n). That is, it is a function f(x) that takes on a random value at each point x \in \mathbb^n(or some other ...

is said to be a

Markov Markov ( Bulgarian, russian: Марков), Markova, and Markoff are common surnames used in Russia and Bulgaria. Notable people with the name include: Academics *Ivana Markova (born 1938), Czechoslovak-British emeritus professor of psychology at ...

random field if it satisfies Markov properties. The concept originates from the

Sherrington–Kirkpatrick model In condensed matter physics, a spin glass is a magnetic state characterized by randomness, besides cooperative behavior in freezing of spins at a temperature called 'freezing temperature' ''Tf''. In ferromagnetic solids, component atoms' magn ...

. A Markov network or MRF is similar to a

Bayesian network A Bayesian network (also known as a Bayes network, Bayes net, belief network, or decision network) is a probabilistic graphical model that represents a set of variables and their conditional dependencies via a directed acyclic graph (DAG). Ba ...

in its representation of dependencies; the differences being that Bayesian networks are directed and acyclic, whereas Markov networks are undirected and may be cyclic. Thus, a Markov network can represent certain dependencies that a Bayesian network cannot (such as cyclic dependencies ); on the other hand, it can't represent certain dependencies that a Bayesian network can (such as induced dependencies ). The underlying graph of a Markov random field may be finite or infinite. When the joint probability density of the random variables is strictly positive, it is also referred to as a Gibbs random field, because, according to the

Hammersley–Clifford theorem The Hammersley–Clifford theorem is a result in probability theory, mathematical statistics and statistical mechanics that gives necessary and sufficient conditions under which a strictly positive probability distribution (of events in a probabili ...

, it can then be represented by a

Gibbs measure In mathematics, the Gibbs measure, named after Josiah Willard Gibbs, is a probability measure frequently seen in many problems of probability theory and statistical mechanics. It is a generalization of the canonical ensemble to infinite systems. Th ...

for an appropriate (locally defined) energy function. The prototypical Markov random field is the

Ising model The Ising model () (or Lenz-Ising model or Ising-Lenz model), named after the physicists Ernst Ising and Wilhelm Lenz, is a mathematical model of ferromagnetism in statistical mechanics. The model consists of discrete variables that represent ...

; indeed, the Markov random field was introduced as the general setting for the Ising model. In the domain of

artificial intelligence Artificial intelligence (AI) is intelligence—perceiving, synthesizing, and inferring information—demonstrated by machines, as opposed to intelligence displayed by animals and humans. Example tasks in which this is done include speech ...

, a Markov random field is used to model various low- to mid-level tasks in

image processing An image is a visual representation of something. It can be two-dimensional, three-dimensional, or somehow otherwise feed into the visual system to convey information. An image can be an artifact, such as a photograph or other two-dimensio ...

and

computer vision Computer vision is an interdisciplinary scientific field that deals with how computers can gain high-level understanding from digital images or videos. From the perspective of engineering, it seeks to understand and automate tasks that the human ...

Definition

Given an undirected graph

G=(V,E)

, a set of random variables

X = (X_v)_

indexed by

V

form a Markov random field with respect to

G

if they satisfy the local Markov properties: :Pairwise Markov property: Any two non-adjacent variables are conditionally independent given all other variables: ::

X_u \perp\!\!\!\perp X_v \mid X_

:Local Markov property: A variable is conditionally independent of all other variables given its neighbors: ::

X_v \perp\!\!\!\perp X_ \mid X_

:where

\operatorname(v)

is the set of neighbors of

v

, and

= v \cup \operatorname(v)

is the closed neighbourhood of

v

. :Global Markov property: Any two subsets of variables are conditionally independent given a separating subset: ::

X_A \perp\!\!\!\perp X_B \mid X_S

:where every path from a node in

A

to a node in

B

passes through

S

. The Global Markov property is stronger than the Local Markov property, which in turn is stronger than the Pairwise one. However, the above three Markov properties are equivalent for positive distributions (those that assign only nonzero probabilities to the associated variables). The relation between the three Markov properties is particularly clear in the following formulation: * Pairwise: For any

i, j \in V

not equal or adjacent,

X_i \perp\!\!\!\perp X_j ,  X_

. * Local: For any

i\in V

and

J\subset V

not containing or adjacent to

i

X_i \perp\!\!\!\perp X_J ,  X_

. * Global: For any

I, J\subset V

not intersecting or adjacent,

X_I \perp\!\!\!\perp X_J ,  X_

Clique factorization

As the Markov property of an arbitrary probability distribution can be difficult to establish, a commonly used class of Markov random fields are those that can be factorized according to the

clique A clique ( AusE, CanE, or ), in the social sciences, is a group of individuals who interact with one another and share similar interests. Interacting with cliques is part of normative social development regardless of gender, ethnicity, or popular ...

s of the graph. Given a set of random variables

X = (X_v)_

, let

P(X=x)

be the

of a particular field configuration

x

X

. That is,

P(X=x)

is the probability of finding that the random variables

X

take on the particular value

x

. Because

X

is a set, the probability of

x

should be understood to be taken with respect to a ''joint distribution'' of the

X_v

. If this joint density can be factorized over the cliques of

G

: :

P(X=x) = \prod_ \phi_C (x_C)

then

X

forms a Markov random field with respect to

G

. Here,

\operatorname(G)

is the set of cliques of

G

. The definition is equivalent if only maximal cliques are used. The functions

\phi_C

are sometimes referred to as ''factor potentials'' or ''clique potentials''. Note, however, conflicting terminology is in use: the word ''potential'' is often applied to the logarithm of

\phi_C

. This is because, in

statistical mechanics In physics, statistical mechanics is a mathematical framework that applies statistical methods and probability theory to large assemblies of microscopic entities. It does not assume or postulate any natural laws, but explains the macroscopic b ...

\log(\phi_C)

has a direct interpretation as the

potential energy In physics, potential energy is the energy held by an object because of its position relative to other objects, stresses within itself, its electric charge, or other factors. Common types of potential energy include the gravitational potenti ...

of a

configuration Configuration or configurations may refer to: Computing * Computer configuration or system configuration * Configuration file, a software file used to configure the initial settings for a computer program * Configurator, also known as choice bo ...

x_C

. Some MRF's do not factorize: a simple example can be constructed on a cycle of 4 nodes with some infinite energies, i.e. configurations of zero probabilities, even if one, more appropriately, allows the infinite energies to act on the complete graph on

V

. MRF's factorize if at least one of the following conditions is fulfilled: * the density is positive (by the

) * the graph is chordal (by equivalence to a

) When such a factorization does exist, it is possible to construct a

factor graph A factor graph is a bipartite graph representing the factorization of a function. In probability theory and its applications, factor graphs are used to represent factorization of a probability distribution function, enabling efficient computatio ...

for the network.

Exponential family

Any positive Markov random field can be written as exponential family in canonical form with feature functions

f_k

such that the full-joint distribution can be written as :

P(X=x) = \frac \exp \left( \sum_ w_k^ f_k (x_) \right)

where the notation :

w_k^ f_k (x_) = \sum_^ w_ \cdot f_(x_)

is simply a

dot product In mathematics, the dot product or scalar productThe term ''scalar product'' means literally "product with a scalar as a result". It is also used sometimes for other symmetric bilinear forms, for example in a pseudo-Euclidean space. is an alg ...

over field configurations, and ''Z'' is the partition function: :

Z = \sum_ \exp \left(\sum_ w_k^ f_k(x_)\right).

Here,

\mathcal

denotes the set of all possible assignments of values to all the network's random variables. Usually, the feature functions

f_

are defined such that they are

indicators Indicator may refer to: Biology * Environmental indicator of environmental health (pressures, conditions and responses) * Ecological indicator of ecosystem health (ecological processes) * Health indicator, which is used to describe the health o ...

of the clique's configuration, ''i.e.''

f_(x_) = 1

x_

corresponds to the ''i''-th possible configuration of the ''k''-th clique and 0 otherwise. This model is equivalent to the clique factorization model given above, if

N_k=, \operatorname(C_k),

is the cardinality of the clique, and the weight of a feature

f_

corresponds to the logarithm of the corresponding clique factor, ''i.e.''

w_ = \log \phi(c_)

, where

c_

is the ''i''-th possible configuration of the ''k''-th clique, ''i.e.'' the ''i''-th value in the domain of the clique

C_k

. The probability ''P'' is often called the Gibbs measure. This expression of a Markov field as a logistic model is only possible if all clique factors are non-zero, ''i.e.'' if none of the elements of

\mathcal

are assigned a probability of 0. This allows techniques from matrix algebra to be applied, ''e.g.'' that the trace of a matrix is log of the

determinant In mathematics, the determinant is a scalar value that is a function of the entries of a square matrix. It characterizes some properties of the matrix and the linear map represented by the matrix. In particular, the determinant is nonzero if a ...

, with the matrix representation of a graph arising from the graph's incidence matrix. The importance of the partition function ''Z'' is that many concepts from

, such as

entropy Entropy is a scientific concept, as well as a measurable physical property, that is most commonly associated with a state of disorder, randomness, or uncertainty. The term and the concept are used in diverse fields, from classical thermodyna ...

, directly generalize to the case of Markov networks, and an ''intuitive'' understanding can thereby be gained. In addition, the partition function allows variational methods to be applied to the solution of the problem: one can attach a driving force to one or more of the random variables, and explore the reaction of the network in response to this

perturbation Perturbation or perturb may refer to: * Perturbation theory, mathematical methods that give approximate solutions to problems that cannot be solved exactly * Perturbation (geology), changes in the nature of alluvial deposits over time * Perturbat ...

. Thus, for example, one may add a driving term ''J''_''v'', for each vertex ''v'' of the graph, to the partition function to get: :

= \sum_ \exp \left(\sum_ w_k^ f_k(x_) + \sum_v J_v x_v\right)

Formally differentiating with respect to ''J''_''v'' gives the

expectation value In probability theory, the expected value (also called expectation, expectancy, mathematical expectation, mean, average, or first moment) is a generalization of the weighted average. Informally, the expected value is the arithmetic mean of a l ...

of the random variable ''X''_''v'' associated with the vertex ''v'': :

= \frac \left.\frac\_.

Correlation function A correlation function is a function that gives the statistical correlation between random variables, contingent on the spatial or temporal distance between those variables. If one considers the correlation function between random variables r ...

s are computed likewise; the two-point correlation is: :

= \frac \left.\frac\_.

Unfortunately, though the likelihood of a logistic Markov network is convex, evaluating the likelihood or gradient of the likelihood of a model requires inference in the model, which is generally computationally infeasible (see 'Inference' below).

Examples

Gaussian

multivariate normal distribution In probability theory and statistics, the multivariate normal distribution, multivariate Gaussian distribution, or joint normal distribution is a generalization of the one-dimensional ( univariate) normal distribution to higher dimensions. One ...

forms a Markov random field with respect to a graph

G=(V,E)

if the missing edges correspond to zeros on the precision matrix (the inverse

covariance matrix In probability theory and statistics, a covariance matrix (also known as auto-covariance matrix, dispersion matrix, variance matrix, or variance–covariance matrix) is a square matrix giving the covariance between each pair of elements of ...

): :

X=(X_v)_ \sim \mathcal N (\boldsymbol \mu, \Sigma)

such that :

(\Sigma^)_ =0 \quad \text \quad \ \notin E .

Inference

As in a Bayesian network, one may calculate the

conditional distribution In probability theory and statistics, given two jointly distributed random variables X and Y, the conditional probability distribution of Y given X is the probability distribution of Y when X is known to be a particular value; in some cases the ...

of a set of nodes

V' = \

given values to another set of nodes

W' = \

in the Markov random field by summing over all possible assignments to

u \notin V',W'

; this is called exact inference. However, exact inference is a #P-complete problem, and thus computationally intractable in the general case. Approximation techniques such as

Markov chain Monte Carlo In statistics, Markov chain Monte Carlo (MCMC) methods comprise a class of algorithms for sampling from a probability distribution. By constructing a Markov chain that has the desired distribution as its equilibrium distribution, one can obtain ...

and loopy

belief propagation A belief is an attitude that something is the case, or that some proposition is true. In epistemology, philosophers use the term "belief" to refer to attitudes about the world which can be either true or false. To believe something is to take i ...

are often more feasible in practice. Some particular subclasses of MRFs, such as trees (see Chow–Liu tree), have polynomial-time inference algorithms; discovering such subclasses is an active research topic. There are also subclasses of MRFs that permit efficient

MAP A map is a symbolic depiction emphasizing relationships between elements of some space, such as objects, regions, or themes. Many maps are static, fixed to paper or some other durable medium, while others are dynamic or interactive. Although ...

, or most likely assignment, inference; examples of these include associative networks. Another interesting sub-class is the one of decomposable models (when the graph is chordal): having a closed-form for the MLE, it is possible to discover a consistent structure for hundreds of variables.

Conditional random fields

One notable variant of a Markov random field is a

conditional random field Conditional random fields (CRFs) are a class of statistical modeling methods often applied in pattern recognition and machine learning and used for structured prediction. Whereas a classifier predicts a label for a single sample without consid ...

, in which each random variable may also be conditioned upon a set of global observations

o

. In this model, each function

\phi_k

is a mapping from all assignments to both the

''k'' and the observations

o

to the nonnegative real numbers. This form of the Markov network may be more appropriate for producing discriminative classifiers, which do not model the distribution over the observations. CRFs were proposed by

John D. Lafferty John D. Lafferty is an American scientist, Professor at Yale University and leading researcher in machine learning. He is best known for proposing the Conditional Random Fields with Andrew McCallum and Fernando C.N. Pereira. Biography In 2017, ...

Andrew McCallum Andrew McCallum is a professor in the computer science department at University of Massachusetts Amherst. His primary specialties are in machine learning, natural language processing, information extraction, information integration, and socia ...

and Fernando C.N. Pereira in 2001.

Varied applications

Markov random fields find application in a variety of fields, ranging from

computer graphics Computer graphics deals with generating images with the aid of computers. Today, computer graphics is a core technology in digital photography, film, video games, cell phone and computer displays, and many specialized applications. A great de ...

to computer vision,

machine learning Machine learning (ML) is a field of inquiry devoted to understanding and building methods that 'learn', that is, methods that leverage data to improve performance on some set of tasks. It is seen as a part of artificial intelligence. Machine ...

computational biology Computational biology refers to the use of data analysis, mathematical modeling and Computer simulation, computational simulations to understand biological systems and relationships. An intersection of computer science, biology, and big data, the ...

, and

information retrieval Information retrieval (IR) in computing and information science is the process of obtaining information system resources that are relevant to an information need from a collection of those resources. Searches can be based on full-text or other c ...

. MRFs are used in image processing to generate textures as they can be used to generate flexible and stochastic image models. In image modelling, the task is to find a suitable intensity distribution of a given image, where suitability depends on the kind of task and MRFs are flexible enough to be used for image and texture synthesis,

image compression Image compression is a type of data compression applied to digital images, to reduce their cost for storage or transmission. Algorithms may take advantage of visual perception and the statistical properties of image data to provide superior re ...

and restoration,

image segmentation In digital image processing and computer vision, image segmentation is the process of partitioning a digital image into multiple image segments, also known as image regions or image objects ( sets of pixels). The goal of segmentation is to simpli ...

, 3D image inference from 2D images, image registration, texture synthesis, super-resolution, stereo matching and

. They can be used to solve various computer vision problems which can be posed as energy minimization problems or problems where different regions have to be distinguished using a set of discriminating features, within a Markov random field framework, to predict the category of the region. Markov random fields were a generalization over the Ising model and have, since then, been used widely in combinatorial optimizations and networks.

References

{{Stochastic processes Graphical models