In mathematics, a stochastic matrix is a
square matrix
In mathematics, a square matrix is a matrix with the same number of rows and columns. An ''n''-by-''n'' matrix is known as a square matrix of order Any two square matrices of the same order can be added and multiplied.
Square matrices are often ...
used to describe the transitions of a
Markov chain
A Markov chain or Markov process is a stochastic model describing a sequence of possible events in which the probability of each event depends only on the state attained in the previous event. Informally, this may be thought of as, "What happe ...
. Each of its entries is a
nonnegative
In mathematics, the sign of a real number is its property of being either positive, negative, or zero. Depending on local conventions, zero may be considered as being neither positive nor negative (having no sign or a unique third sign), or it ...
real number
In mathematics, a real number is a number that can be used to measure a ''continuous'' one-dimensional quantity such as a distance, duration or temperature. Here, ''continuous'' means that values can have arbitrarily small variations. Every real ...
representing a
probability
Probability is the branch of mathematics concerning numerical descriptions of how likely an Event (probability theory), event is to occur, or how likely it is that a proposition is true. The probability of an event is a number between 0 and ...
.
It is also called a probability matrix, transition matrix,
substitution matrix
In bioinformatics and evolutionary biology, a substitution matrix describes the frequency at which a character in a nucleotide sequence or a protein sequence changes to other character states over evolutionary time. The information is often in t ...
, or Markov matrix.
[ The stochastic matrix was first developed by ]Andrey Markov
Andrey Andreyevich Markov, first name also spelled "Andrei", in older works also spelled Markoff) (14 June 1856 – 20 July 1922) was a Russian mathematician best known for his work on stochastic processes. A primary subject of his research lat ...
at the beginning of the 20th century, and has found use throughout a wide variety of scientific fields, including probability theory
Probability theory is the branch of mathematics concerned with probability. Although there are several different probability interpretations, probability theory treats the concept in a rigorous mathematical manner by expressing it through a set o ...
, statistics, mathematical finance
Mathematical finance, also known as quantitative finance and financial mathematics, is a field of applied mathematics, concerned with mathematical modeling of financial markets.
In general, there exist two separate branches of finance that require ...
and linear algebra
Linear algebra is the branch of mathematics concerning linear equations such as:
:a_1x_1+\cdots +a_nx_n=b,
linear maps such as:
:(x_1, \ldots, x_n) \mapsto a_1x_1+\cdots +a_nx_n,
and their representations in vector spaces and through matrices.
...
, as well as computer science
Computer science is the study of computation, automation, and information. Computer science spans theoretical disciplines (such as algorithms, theory of computation, information theory, and automation) to Applied science, practical discipli ...
and population genetics
Population genetics is a subfield of genetics that deals with genetic differences within and between populations, and is a part of evolutionary biology. Studies in this branch of biology examine such phenomena as adaptation, speciation, and pop ...
.[ There are several different definitions and types of stochastic matrices:][
:A right stochastic matrix is a real square matrix, with each row summing to 1.
:A left stochastic matrix is a real square matrix, with each column summing to 1.
:A ]doubly stochastic matrix In mathematics, especially in probability and combinatorics, a doubly stochastic matrix
(also called bistochastic matrix) is a square matrix X=(x_) of nonnegative real numbers, each of whose rows and columns sums to 1, i.e.,
:\sum_i x_=\sum_j x_=1 ...
is a square matrix of nonnegative real numbers with each row and column summing to 1.
In the same vein, one may define a stochastic vector (also called probability vector) as a vector
Vector most often refers to:
*Euclidean vector, a quantity with a magnitude and a direction
*Vector (epidemiology), an agent that carries and transmits an infectious pathogen into another living organism
Vector may also refer to:
Mathematic ...
whose elements are nonnegative real numbers which sum to 1. Thus, each row of a right stochastic matrix (or column of a left stochastic matrix) is a stochastic vector.[ A common convention in English language mathematics literature is to use ]row vector
In linear algebra, a column vector with m elements is an m \times 1 matrix consisting of a single column of m entries, for example,
\boldsymbol = \begin x_1 \\ x_2 \\ \vdots \\ x_m \end.
Similarly, a row vector is a 1 \times n matrix for some n, c ...
s of probabilities and right stochastic matrices rather than column vector
In linear algebra, a column vector with m elements is an m \times 1 matrix consisting of a single column of m entries, for example,
\boldsymbol = \begin x_1 \\ x_2 \\ \vdots \\ x_m \end.
Similarly, a row vector is a 1 \times n matrix for some n, c ...
s of probabilities and left stochastic matrices; this article follows that convention.[ In addition, a substochastic matrix is a real square matrix whose row sums are all
]
History
The stochastic matrix was developed alongside the Markov chain by Andrey Markov
Andrey Andreyevich Markov, first name also spelled "Andrei", in older works also spelled Markoff) (14 June 1856 – 20 July 1922) was a Russian mathematician best known for his work on stochastic processes. A primary subject of his research lat ...
, a Russian mathematician and professor at St. Petersburg University
Saint Petersburg State University (SPBU; russian: Санкт-Петербургский государственный университет) is a public research university in Saint Petersburg, Russia. Founded in 1724 by a decree of Peter the ...
who first published on the topic in 1906. His initial intended uses were for linguistic analysis and other mathematical subjects like card shuffling
Shuffling is a procedure used to randomize a deck of playing cards to provide an element of chance in card games. Shuffling is often followed by a cut, to help ensure that the shuffler has not manipulated the outcome.
__TOC__
Techniques
Overha ...
, but both Markov chains and matrices rapidly found use in other fields.
Stochastic matrices were further developed by scholars like Andrey Kolmogorov
Andrey Nikolaevich Kolmogorov ( rus, Андре́й Никола́евич Колмого́ров, p=ɐnˈdrʲej nʲɪkɐˈlajɪvʲɪtɕ kəlmɐˈɡorəf, a=Ru-Andrey Nikolaevich Kolmogorov.ogg, 25 April 1903 – 20 October 1987) was a Sovi ...
, who expanded their possibilities by allowing for continuous-time Markov processes. By the 1950s, articles using stochastic matrices had appeared in the fields of econometrics
Econometrics is the application of Statistics, statistical methods to economic data in order to give Empirical evidence, empirical content to economic relationships.M. Hashem Pesaran (1987). "Econometrics," ''The New Palgrave: A Dictionary of ...
and circuit theory
Circuit may refer to:
Science and technology
Electrical engineering
* Electrical circuit, a complete electrical network with a closed-loop giving a return path for current
** Analog circuit, uses continuous signal levels
** Balanced circui ...
. In the 1960s, stochastic matrices appeared in an even wider variety of scientific works, from behavioral science
Behavioral sciences explore the cognitive processes within organisms and the behavioral interactions between organisms in the natural world. It involves the systematic analysis and investigation of human and animal behavior through naturalistic o ...
to geology to residential planning. In addition, much mathematical work was also done through these decades to improve the range of uses and functionality of the stochastic matrix and Markovian processes more generally.
From the 1970s to present, stochastic matrices have found use in almost every field that requires formal analysis, from structural science to medical diagnosis
Medical diagnosis (abbreviated Dx, Dx, or Ds) is the process of determining which disease or condition explains a person's symptoms and signs. It is most often referred to as diagnosis with the medical context being implicit. The information re ...
to personnel management
Employment is a relationship between two parties regulating the provision of paid labour services. Usually based on a contract, one party, the employer, which might be a corporation, a not-for-profit organization, a co-operative, or any other ...
. In addition, stochastic matrices have found wide use in land change modeling
Land change models (LCMs) describe, project, and explain changes in and the dynamics of land use and land-cover. LCMs are a means of understanding ways that humans change the Earth's surface in the past, present, and future.
Land change models a ...
, usually under the term Markov matrix.
Definition and properties
A stochastic matrix describes a Markov chain
A Markov chain or Markov process is a stochastic model describing a sequence of possible events in which the probability of each event depends only on the state attained in the previous event. Informally, this may be thought of as, "What happe ...
over a finite
Finite is the opposite of infinite. It may refer to:
* Finite number (disambiguation)
* Finite set, a set whose cardinality (number of elements) is some natural number
* Finite verb, a verb form that has a subject, usually being inflected or marked ...
state space
A state space is the set of all possible configurations of a system. It is a useful abstraction for reasoning about the behavior of a given system and is widely used in the fields of artificial intelligence and game theory.
For instance, the toy ...
with cardinality
In mathematics, the cardinality of a set is a measure of the number of elements of the set. For example, the set A = \ contains 3 elements, and therefore A has a cardinality of 3. Beginning in the late 19th century, this concept was generalized ...
.
If the probability
Probability is the branch of mathematics concerning numerical descriptions of how likely an Event (probability theory), event is to occur, or how likely it is that a proposition is true. The probability of an event is a number between 0 and ...
of moving from to in one time step is , the stochastic matrix is given by using as the -th row and -th column element, e.g.,
:
Since the total of transition probability from a state to all other states must be 1,
:
thus this matrix is a right stochastic matrix.[
The above elementwise sum across each row of may be more concisely written as , where is the -dimensional column vector of all ones. Using this, it can be seen that the product of two right stochastic matrices and is also right stochastic: . In general, the -th power of a right stochastic matrix is also right stochastic. The probability of transitioning from to in two steps is then given by the -th element of the square of :
:
In general, the probability transition of going from any state to another state in a finite Markov chain given by the matrix in steps is given by .
An initial probability distribution of states, specifying where the system might be initially and with what probabilities, is given as a ]row vector
In linear algebra, a column vector with m elements is an m \times 1 matrix consisting of a single column of m entries, for example,
\boldsymbol = \begin x_1 \\ x_2 \\ \vdots \\ x_m \end.
Similarly, a row vector is a 1 \times n matrix for some n, c ...
.
A ''stationary'' probability vector
In mathematics and statistics, a probability vector or stochastic vector is a vector with non-negative entries that add up to one.
The positions (indices) of a probability vector represent the possible outcomes of a discrete random variable, and ...
is defined as a distribution, written as a row vector, that does not change under application of the transition matrix; that is, it is defined as a probability distribution on the set which is also a row eigenvector
In linear algebra, an eigenvector () or characteristic vector of a linear transformation is a nonzero vector that changes at most by a scalar factor when that linear transformation is applied to it. The corresponding eigenvalue, often denoted b ...
of the probability matrix, associated with eigenvalue
In linear algebra, an eigenvector () or characteristic vector of a linear transformation is a nonzero vector that changes at most by a scalar factor when that linear transformation is applied to it. The corresponding eigenvalue, often denoted b ...
1:
:
It can be shown that the spectral radius
In mathematics, the spectral radius of a square matrix is the maximum of the absolute values of its eigenvalues. More generally, the spectral radius of a bounded linear operator is the supremum of the absolute values of the elements of its spectru ...
of any stochastic matrix is one. By the Gershgorin circle theorem
In mathematics, the Gershgorin circle theorem may be used to bound the spectrum of a square matrix. It was first published by the Soviet mathematician Semyon Aronovich Gershgorin in 1931. Gershgorin's name has been transliterated in several diffe ...
, all of the eigenvalues of a stochastic matrix have absolute values less than or equal to one. Additionally, every right stochastic matrix has an "obvious" column eigenvector associated to the eigenvalue 1: the vector , whose coordinates are all equal to 1 (just observe that multiplying a row of times equals the sum of the entries of the row and, hence, it equals 1). As left and right eigenvalues of a square matrix are the same, every stochastic matrix has, at least, a row eigenvector
In linear algebra, an eigenvector () or characteristic vector of a linear transformation is a nonzero vector that changes at most by a scalar factor when that linear transformation is applied to it. The corresponding eigenvalue, often denoted b ...
associated to the eigenvalue
In linear algebra, an eigenvector () or characteristic vector of a linear transformation is a nonzero vector that changes at most by a scalar factor when that linear transformation is applied to it. The corresponding eigenvalue, often denoted b ...
1 and the largest absolute value of all its eigenvalues is also 1. Finally, the Brouwer Fixed Point Theorem
Brouwer's fixed-point theorem is a fixed-point theorem in topology, named after L. E. J. (Bertus) Brouwer. It states that for any continuous function f mapping a compact convex set to itself there is a point x_0 such that f(x_0)=x_0. The simplest ...
(applied to the compact convex set of all probability distributions of the finite set ) implies that there is some left eigenvector which is also a stationary probability vector.
On the other hand, the Perron–Frobenius theorem
In matrix theory, the Perron–Frobenius theorem, proved by and , asserts that a real square matrix with positive entries has a unique largest real eigenvalue and that the corresponding eigenvector can be chosen to have strictly positive component ...
also ensures that every irreducible
In philosophy, systems theory, science, and art, emergence occurs when an entity is observed to have properties its parts do not have on their own, properties or behaviors that emerge only when the parts interact in a wider whole.
Emergence ...
stochastic matrix has such a stationary vector, and that the largest absolute value of an eigenvalue is always 1. However, this theorem cannot be applied directly to such matrices because they need not be irreducible.
In general, there may be several such vectors. However, for a matrix with strictly positive entries (or, more generally, for an irreducible aperiodic stochastic matrix), this vector is unique and can be computed by observing that for any we have the following limit,
:
where is the -th element of the row vector . Among other things, this says that the long-term probability of being in a state is independent of the initial state . That both of these computations give the same stationary vector is a form of an ergodic theorem
Ergodic theory (Greek: ' "work", ' "way") is a branch of mathematics that studies statistical properties of deterministic dynamical systems; it is the study of ergodicity. In this context, statistical properties means properties which are expres ...
, which is generally true in a wide variety of dissipative dynamical systems: the system evolves, over time, to a stationary state
A stationary state is a quantum state with all observables independent of time. It is an eigenvector of the energy operator (instead of a quantum superposition of different energies). It is also called energy eigenvector, energy eigenstate, ener ...
.
Intuitively, a stochastic matrix represents a Markov chain; the application of the stochastic matrix to a probability distribution redistributes the probability mass of the original distribution while preserving its total mass. If this process is applied repeatedly, the distribution converges to a stationary distribution for the Markov chain.[
]
Example: the cat and mouse
Suppose there is a timer and a row of five adjacent boxes. At time zero, a cat is in the first box, and a mouse is in the fifth box. The cat and the mouse both jump to a random adjacent box when the timer advances. E.g. if the cat is in the second box and the mouse is in the fourth, the probability that ''the cat will be in the first box and the mouse in the fifth after the timer advances'' is one fourth. If the cat is in the first box and the mouse is in the fifth, the probability that ''the cat will be in box two and the mouse will be in box four after the timer advances'' is one. The cat eats the mouse if both end up in the same box, at which time the game ends. Let the random variable
A random variable (also called random quantity, aleatory variable, or stochastic variable) is a mathematical formalization of a quantity or object which depends on random events. It is a mapping or a function from possible outcomes (e.g., the po ...
''K'' be the time the mouse stays in the game.
The Markov chain
A Markov chain or Markov process is a stochastic model describing a sequence of possible events in which the probability of each event depends only on the state attained in the previous event. Informally, this may be thought of as, "What happe ...
that represents this game contains the following five states specified by the combination of positions (cat,mouse). Note that while a naive enumeration of states would list 25 states, many are impossible either because the mouse can never have a lower index than the cat (as that would mean the mouse occupied the cat's box and survived to move past it), or because the sum of the two indices will always have even parity
Parity may refer to:
* Parity (computing)
** Parity bit in computing, sets the parity of data for the purpose of error detection
** Parity flag in computing, indicates if the number of set bits is odd or even in the binary representation of the r ...
. In addition, the 3 possible states that lead to the mouse's death are combined into one:
* State 1: (1,3)
* State 2: (1,5)
* State 3: (2,4)
* State 4: (3,5)
* State 5: game over: (2,2), (3,3) & (4,4).
We use a stochastic matrix, (below), to represent the transition probabilities
A Markov chain or Markov process is a stochastic model describing a sequence of possible events in which the probability of each event depends only on the state attained in the previous event. Informally, this may be thought of as, "What happe ...
of this system (rows and columns in this matrix are indexed by the possible states listed above, with the pre-transition state as the row and post-transition state as the column).[ For instance, starting from state 1 – 1st row – it is impossible for the system to stay in this state, so ; the system also cannot transition to state 2 – because the cat would have stayed in the same box – so , and by a similar argument for the mouse, . Transitions to states 3 or 5 are allowed, and thus .
:
]
Long-term averages
No matter what the initial state, the cat will eventually catch the mouse (with probability 1) and a stationary state π = (0,0,0,0,1) is approached as a limit.[ To compute the long-term average or expected value of a stochastic variable , for each state and time there is a contribution of . Survival can be treated as a binary variable with for a surviving state and for the terminated state. The states with do not contribute to the long-term average.
]
Phase-type representation
As State 5 is an absorbing state, the distribution of time to absorption is discrete phase-type distributed. Suppose the system starts in state 2, represented by the vector