Maximal entropy random walk (MERW) is a popular type of biased random walk on a graph, in which transition probabilities are chosen according to the principle of maximum entropy, which says that the probability distribution which best represents the current state of knowledge is the one with largest entropy. While the standard random walk chooses for every vertex a uniform probability distribution among its outgoing edges, locally maximizing the entropy rate, MERW maximizes it globally (average entropy production) by assuming a uniform probability distribution among all paths in a given graph.
MERW is used in various fields of science. A direct application is choosing probabilities to maximize the transmission rate through a constrained channel, analogously to Fibonacci coding. Its properties have also made it useful in the analysis of complex networks, for example for link prediction, community detection, robust transport over networks and centrality measures, and in image analysis, for example for detecting visual saliency regions, object localization [L. Wang, J. Zhao, X. Hu, J. Lu, ''Weakly supervised object localization via maximal entropy random walk'', ICIP, 2014], tampering detection, and the tractography problem in neuroscience. Additionally, it recreates some properties of quantum mechanics, suggesting a way to repair the discrepancy between diffusion models and quantum predictions, like Anderson localization.
Basic model
Consider a graph with $n$ vertices, defined by an adjacency matrix $A \in \{0,1\}^{n \times n}$: $A_{ij} = 1$ if there is an edge from vertex $i$ to $j$, 0 otherwise. For simplicity assume it is an undirected graph, which corresponds to a symmetric $A$; however, MERW can also be generalized for directed and weighted graphs (obtaining for example a Boltzmann distribution among paths instead of a uniform one).
We would like to choose a random walk as a Markov process on this graph: for every vertex $i$ and its outgoing edge to $j$, choose the probability $S_{ij}$ of the walker randomly using this edge after visiting $i$. Formally, find a stochastic matrix $S$ (containing the transition probabilities of a Markov chain) such that
* $0 \le S_{ij} \le A_{ij}$ for all $i, j$, and
* $\sum_{j=1}^n S_{ij} = 1$ for all $i$.
Assuming this graph is connected and not periodic, ergodic theory says that evolution of this stochastic process leads to some stationary probability distribution $\rho$ such that $\rho S = \rho$.
Using the Shannon entropy for every vertex and averaging over the probability of visiting each vertex (to be able to use its entropy), we get the following formula for the average entropy production (entropy rate) of the stochastic process:
: $H(S) = \sum_{i=1}^n \rho_i \sum_{j=1}^n S_{ij} \log \frac{1}{S_{ij}}$
This definition turns out to be equivalent to the asymptotic average entropy (per length) of the probability distribution in the space of paths for this stochastic process.
In the standard random walk, referred to here as the generic random walk (GRW), we naturally choose each outgoing edge to be equally probable:
: $S_{ij} = \frac{A_{ij}}{\sum_{k=1}^n A_{ik}}$.
For a symmetric $A$ this leads to a stationary probability distribution $\rho$ with
: $\rho_i = \frac{\sum_{k=1}^n A_{ik}}{\sum_{j,k=1}^n A_{jk}}$.
It locally maximizes entropy production (uncertainty) for every vertex, but usually leads to a suboptimal averaged global entropy rate $H(S)$.
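The GRW formulas above can be checked directly; the following is a minimal pure-Python sketch, where the example graph (a 4-cycle with one chord) is an arbitrary choice for illustration:

```python
# Generic random walk (GRW): S[i][j] = A[i][j] / degree(i),
# with stationary distribution rho_i = degree(i) / sum of all degrees.

# Adjacency matrix of an arbitrary example graph: a 4-cycle plus the chord 1-3.
A = [[0, 1, 0, 1],
     [1, 0, 1, 1],
     [0, 1, 0, 1],
     [1, 1, 1, 0]]
n = len(A)

deg = [sum(row) for row in A]
S = [[A[i][j] / deg[i] for j in range(n)] for i in range(n)]
rho = [deg[i] / sum(deg) for i in range(n)]

# Each row of S is a probability distribution.
assert all(abs(sum(S[i]) - 1.0) < 1e-12 for i in range(n))

# rho is stationary: (rho S)_j = rho_j for all j (A is symmetric here).
rhoS = [sum(rho[i] * S[i][j] for i in range(n)) for j in range(n)]
assert all(abs(rhoS[j] - rho[j]) < 1e-12 for j in range(n))
```

The stationarity check works because, for symmetric $A$, $(\rho S)_j = \sum_i \frac{d_i}{D} \frac{A_{ij}}{d_i} = \frac{d_j}{D} = \rho_j$.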
MERW chooses the stochastic matrix which maximizes $H(S)$, or equivalently assumes a uniform probability distribution among all paths in a given graph. Its formula is obtained by first calculating the dominant eigenvalue $\lambda$ and corresponding eigenvector $\psi$ of the adjacency matrix, i.e. the largest $\lambda \in \mathbb{R}$ with corresponding $\psi \in \mathbb{R}^n$ such that $A\psi = \lambda\psi$. Then the stochastic matrix is given by
: $S_{ij} = \frac{A_{ij}}{\lambda} \frac{\psi_j}{\psi_i}$,
for which every possible path of length $l$ from the $i$-th to the $j$-th vertex has probability
: $\frac{1}{\lambda^l} \frac{\psi_j}{\psi_i}$.
Its entropy rate is $\log(\lambda)$ and the stationary probability distribution $\rho$ is
: $\rho_i = \frac{\psi_i^2}{\sum_{j=1}^n \psi_j^2}$.
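These formulas can be verified numerically. Below is a pure-Python sketch; the graph is the same arbitrary example as above, and the dominant eigenpair is found by power iteration (an assumption of this sketch, not part of the original definition):

```python
import math

# MERW: S_ij = (A_ij / lam) * (psi_j / psi_i), where (lam, psi) is the
# dominant eigenpair of the adjacency matrix A.

A = [[0, 1, 0, 1],
     [1, 0, 1, 1],
     [0, 1, 0, 1],
     [1, 1, 1, 0]]
n = len(A)

# Power iteration; the Perron-Frobenius theorem guarantees a positive
# dominant eigenvector for a connected graph.
psi = [1.0] * n
for _ in range(1000):
    psi = [sum(A[i][j] * psi[j] for j in range(n)) for i in range(n)]
    norm = math.sqrt(sum(x * x for x in psi))
    psi = [x / norm for x in psi]
lam = sum(psi[i] * A[i][j] * psi[j] for i in range(n) for j in range(n))

S = [[A[i][j] / lam * psi[j] / psi[i] for j in range(n)] for i in range(n)]
rho = [x * x for x in psi]  # psi is normalized, so rho sums to 1

# Rows of S sum to 1 (this is exactly the eigenvector equation A psi = lam psi).
assert all(abs(sum(row) - 1.0) < 1e-9 for row in S)

# The entropy rate H(S) equals log(lam).
H = sum(rho[i] * S[i][j] * -math.log(S[i][j])
        for i in range(n) for j in range(n) if S[i][j] > 0)
assert abs(H - math.log(lam)) < 1e-9
```

The entropy-rate identity follows because $\log(1/S_{ij}) = \log\lambda + \log\psi_i - \log\psi_j$ on every used edge, and the $\psi$ terms cancel on average for symmetric $A$.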
In contrast to GRW, the MERW transition probabilities generally depend on the structure of the entire graph (they are nonlocal). Hence, they should not be imagined as directly applied by the walker – if random-looking decisions are made based on the local situation, as a person would make them, the GRW approach is more appropriate. MERW is based on the principle of maximum entropy, making it the safest assumption when we don't have any additional knowledge about the system. For example, it would be appropriate for modelling our knowledge about an object performing some complex dynamics – not necessarily random, like a particle.
Sketch of derivation
Assume for simplicity that the considered graph is undirected, connected and aperiodic, allowing us to conclude from the Perron–Frobenius theorem that the dominant eigenvector $\psi$ is unique. Hence $A^l$ can be asymptotically ($l \to \infty$) approximated by $\lambda^l \psi \psi^T$ (or $\lambda^l |\psi\rangle\langle\psi|$ in bra–ket notation), where $\psi$ is taken to be normalized: $\sum_i \psi_i^2 = 1$.
MERW requires a uniform distribution along paths. The number $m_{i,l}$ of paths of length $2l$ with the $i$-th vertex in the center is
: $m_{i,l} = \sum_{j,k=1}^n (A^l)_{ji} (A^l)_{ik} \approx \lambda^{2l} \psi_i^2 \Big( \sum_{j=1}^n \psi_j \Big)^2$,
hence for all $i$,
: $\rho_i \propto \psi_i^2$, i.e. $\rho_i = \frac{\psi_i^2}{\sum_{j=1}^n \psi_j^2}$.
Analogously calculating the probability distribution for two succeeding vertices, one obtains that the probability of being at the $i$-th vertex and next at the $j$-th vertex is proportional to
: $\psi_i \frac{A_{ij}}{\lambda} \psi_j$.
Dividing by the probability of being at the $i$-th vertex, i.e. $\rho_i \propto \psi_i^2$, gives for the conditional probability of the $j$-th vertex being next after the $i$-th vertex
: $S_{ij} = \frac{A_{ij}}{\lambda} \frac{\psi_j}{\psi_i}$.
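The path-uniformity claim behind this derivation can be checked by brute force: under the MERW matrix, every path of length $l$ between fixed endpoints $i, j$ gets the same probability $\lambda^{-l} \psi_j / \psi_i$. A pure-Python sketch (graph, endpoints and path length are arbitrary choices for illustration):

```python
import math
from itertools import product

A = [[0, 1, 0, 1],
     [1, 0, 1, 1],
     [0, 1, 0, 1],
     [1, 1, 1, 0]]
n = len(A)

# Dominant eigenpair by power iteration.
psi = [1.0] * n
for _ in range(1000):
    psi = [sum(A[i][j] * psi[j] for j in range(n)) for i in range(n)]
    norm = math.sqrt(sum(x * x for x in psi))
    psi = [x / norm for x in psi]
lam = sum(psi[i] * A[i][j] * psi[j] for i in range(n) for j in range(n))

S = [[A[i][j] / lam * psi[j] / psi[i] for j in range(n)] for i in range(n)]

# Enumerate all length-3 paths from vertex 0 to vertex 2 and collect the
# conditional probability prod_t S[g_t][g_{t+1}] of each.
l, i, j = 3, 0, 2
probs = []
for mid in product(range(n), repeat=l - 1):
    path = (i,) + mid + (j,)
    if all(A[path[t]][path[t + 1]] for t in range(l)):
        p = 1.0
        for t in range(l):
            p *= S[path[t]][path[t + 1]]
        probs.append(p)

# Every admissible path has the same probability lam^(-l) * psi_j / psi_i.
expected = lam ** (-l) * psi[j] / psi[i]
assert probs and all(abs(p - expected) < 1e-9 for p in probs)
```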
Weighted MERW: Boltzmann path ensemble
We have assumed that $A_{ij} \in \{0, 1\}$, for which MERW corresponds to the uniform ensemble among paths. However, the above derivation also works for real nonnegative $A_{ij}$. Parametrizing $A_{ij} = \exp(-E_{ij})$ and asking for the probability of a length-$l$ path $(\gamma_0, \gamma_1, \ldots, \gamma_l)$, we get:
: $\rho_{\gamma_0} S_{\gamma_0 \gamma_1} \cdots S_{\gamma_{l-1} \gamma_l} = \frac{\psi_{\gamma_0} \psi_{\gamma_l}}{\lambda^l \sum_{i=1}^n \psi_i^2} \, e^{-(E_{\gamma_0 \gamma_1} + \cdots + E_{\gamma_{l-1} \gamma_l})}$,
as in a Boltzmann distribution of paths for energy defined as the sum of $E_{ij}$ over the given path. For example, this allows one to calculate the probability distribution of patterns in the Ising model.
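The weighted case can be verified on a toy system; in the sketch below the $3 \times 3$ energy matrix is an arbitrary illustrative choice, and the dominant eigenpair is again found by power iteration:

```python
import math

# Weighted MERW: A_ij = exp(-E_ij); path probabilities then follow a
# Boltzmann distribution with energy = sum of E_ij along the path.
E = [[0.0, 1.0, 2.0],
     [1.0, 0.0, 0.5],
     [2.0, 0.5, 0.0]]
n = len(E)
A = [[math.exp(-E[i][j]) for j in range(n)] for i in range(n)]

# Dominant eigenpair by power iteration (all entries of A are positive).
psi = [1.0] * n
for _ in range(1000):
    psi = [sum(A[i][j] * psi[j] for j in range(n)) for i in range(n)]
    norm = math.sqrt(sum(x * x for x in psi))
    psi = [x / norm for x in psi]
lam = sum(psi[i] * A[i][j] * psi[j] for i in range(n) for j in range(n))

S = [[A[i][j] / lam * psi[j] / psi[i] for j in range(n)] for i in range(n)]
rho = [x * x for x in psi]

# Check the boxed path formula for one specific path gamma = (0, 1, 2, 1).
g = [0, 1, 2, 1]
l = len(g) - 1
p = rho[g[0]]
for t in range(l):
    p *= S[g[t]][g[t + 1]]
energy = sum(E[g[t]][g[t + 1]] for t in range(l))
expected = psi[g[0]] * psi[g[-1]] / lam ** l * math.exp(-energy)
assert abs(p - expected) < 1e-12
```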
Examples
Let us first look at a simple nontrivial situation: Fibonacci coding, where we want to transmit a message as a sequence of 0s and 1s, but not using two successive 1s: after a 1 there has to be a 0. To maximize the amount of information transmitted in such a sequence, we should assume a uniform probability distribution in the space of all possible sequences fulfilling this constraint. To practically use such long sequences, after a 1 we have to use a 0, but there remains the freedom of choosing the probability of a 0 after a 0. Let us denote this probability by $q$; then entropy coding would allow encoding a message using this chosen probability distribution. The stationary probability distribution of symbols for a given $q$ turns out to be $(\Pr(0), \Pr(1)) = \big( \tfrac{1}{2-q}, \tfrac{1-q}{2-q} \big)$. Hence, the entropy production is
: $H(q) = \frac{-q \log_2 q - (1-q) \log_2 (1-q)}{2-q}$,
which is maximized for $q = \frac{\sqrt{5}-1}{2} \approx 0.618$, the reciprocal of the golden ratio. In contrast, the standard random walk would choose the suboptimal $q = 1/2$. While choosing a larger $q$ reduces the amount of information produced after a 0, it also reduces the frequency of 1s, after which we cannot write any information.
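The maximization above can be checked numerically; a sketch that scans $q$ on a grid and compares the maximizer with $(\sqrt{5}-1)/2$:

```python
import math

def entropy_rate(q):
    # Stationary Pr(0) = 1/(2-q); only state 0 produces entropy h(q),
    # since after a 1 the next symbol is forced to be 0.
    h = -q * math.log2(q) - (1 - q) * math.log2(1 - q)
    return h / (2 - q)

# Scan q over a fine grid and take the maximizer.
qs = [i / 100000 for i in range(1, 100000)]
best_q = max(qs, key=entropy_rate)

inv_golden = (math.sqrt(5) - 1) / 2  # ~0.618
assert abs(best_q - inv_golden) < 1e-3

# The maximal rate is log2 of the golden ratio, the capacity of this constraint.
assert abs(entropy_rate(best_q) - math.log2((1 + math.sqrt(5)) / 2)) < 1e-6
```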
A more complex example is a defected one-dimensional cyclic lattice: say 1000 nodes connected in a ring, for which all nodes but the defects have a self-loop (an edge to itself). In the standard random walk (GRW) the stationary probability of a defect would be 2/3 of the probability of a non-defect vertex – there is nearly no localization, and analogously for standard diffusion, which is the infinitesimal limit of GRW. For MERW we have to first find the dominant eigenvector of the adjacency matrix – maximizing $\lambda$ in
: $\lambda \psi_x = \psi_{x-1} + (1 - V_x) \psi_x + \psi_{x+1}$
for all positions $x$, where $V_x = 1$ for defects, 0 otherwise. Substituting $E = 3 - \lambda$ and multiplying the equation by $-1$, we get
: $E \psi_x = -(\psi_{x-1} - 2\psi_x + \psi_{x+1}) + V_x \psi_x$,
where $E$ is minimized now, becoming the analog of energy. The formula inside the bracket is the discrete Laplace operator, making this equation a discrete analogue of the stationary Schrödinger equation. As in quantum mechanics, MERW predicts that the probability distribution $\rho_x \propto \psi_x^2$ should be exactly that of the quantum ground state, with its strongly localized density (in contrast to standard diffusion). Taking the infinitesimal limit, we can get the standard continuous stationary (time-independent) Schrödinger equation ($E\psi = -C\psi'' + V\psi$ for a constant $C$) here.[J. Duda, ''Extended Maximal Entropy Random Walk'', PhD Thesis, 2012.]
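The localization contrast can be reproduced numerically; a pure-Python sketch on a smaller ring (100 nodes instead of 1000, with two defects at arbitrary positions) that compares the GRW and MERW stationary distributions:

```python
import math

# Ring of n nodes: every node has edges to its two neighbours, and all
# nodes except the defects also have a self-loop.
n = 100
defects = {10, 40}  # arbitrary defect positions for illustration

# GRW stationary distribution is proportional to degree (2 at defects, 3 elsewhere).
deg = [2 if x in defects else 3 for x in range(n)]
rho_grw = [d / sum(deg) for d in deg]

# MERW: rho_x = psi_x^2 for the normalized dominant eigenvector of the
# adjacency matrix, found here by power iteration on the sparse structure.
psi = [1.0] * n
for _ in range(20000):
    new = [psi[(x - 1) % n] + psi[(x + 1) % n]
           + (0.0 if x in defects else psi[x]) for x in range(n)]
    norm = math.sqrt(sum(v * v for v in new))
    psi = [v / norm for v in new]
rho_merw = [v * v for v in psi]

# GRW: defect probability is exactly 2/3 of a non-defect's - no localization.
assert abs(rho_grw[10] / rho_grw[0] - 2 / 3) < 1e-12

# MERW: probability concentrates far from the defects (strong localization);
# the most probable vertex dwarfs the defect's probability.
assert max(rho_merw) > 100 * rho_merw[10]
```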
See also
* Principle of maximum entropy
* Eigenvector centrality
* Markov chain
* Anderson localization
References
{{reflist}}
External links
* Gábor Simonyi
* Y. Lin, Z. Zhang, ''Mean first-passage time for maximal-entropy random walks in complex networks'', Scientific Reports, 2014.
* ''Electron Conductance Models Using Maximal Entropy Random Walks'', Wolfram Demonstrations Project
Network theory
Diffusion
Information theory
Quantum mechanics