In mathematics, a stochastic matrix is a

square matrix In mathematics, a square matrix is a Matrix (mathematics), matrix with the same number of rows and columns. An ''n''-by-''n'' matrix is known as a square matrix of order Any two square matrices of the same order can be added and multiplied. Squ ...

used to describe the transitions of a

Markov chain In probability theory and statistics, a Markov chain or Markov process is a stochastic process describing a sequence of possible events in which the probability of each event depends only on the state attained in the previous event. Informally ...

. Each of its entries is a

nonnegative In mathematics, the sign of a real number is its property of being either positive, negative, or 0. Depending on local conventions, zero may be considered as having its own unique sign, having no sign, or having both positive and negative sign. ...

real number In mathematics, a real number is a number that can be used to measure a continuous one- dimensional quantity such as a duration or temperature. Here, ''continuous'' means that pairs of values can have arbitrarily small differences. Every re ...

representing a

probability Probability is a branch of mathematics and statistics concerning events and numerical descriptions of how likely they are to occur. The probability of an event is a number between 0 and 1; the larger the probability, the more likely an e ...

. It is also called a probability matrix, transition matrix, ''

substitution matrix In bioinformatics and evolutionary biology, a substitution matrix describes the frequency at which a character in a Nucleic acid sequence, nucleotide sequence or a Protein primary structure, protein sequence changes to other character states ove ...

'', or Markov matrix. The stochastic matrix was first developed by

Andrey Markov Andrey Andreyevich Markov (14 June 1856 – 20 July 1922) was a Russian mathematician best known for his work on stochastic processes. A primary subject of his research later became known as the Markov chain. He was also a strong, close to mas ...

at the beginning of the 20th century, and has found use throughout a wide variety of scientific fields, including

probability theory Probability theory or probability calculus is the branch of mathematics concerned with probability. Although there are several different probability interpretations, probability theory treats the concept in a rigorous mathematical manner by expre ...

, statistics,

mathematical finance Mathematical finance, also known as quantitative finance and financial mathematics, is a field of applied mathematics, concerned with mathematical modeling in the financial field. In general, there exist two separate branches of finance that req ...

and

linear algebra Linear algebra is the branch of mathematics concerning linear equations such as :a_1x_1+\cdots +a_nx_n=b, linear maps such as :(x_1, \ldots, x_n) \mapsto a_1x_1+\cdots +a_nx_n, and their representations in vector spaces and through matrix (mathemat ...

, as well as

computer science Computer science is the study of computation, information, and automation. Computer science spans Theoretical computer science, theoretical disciplines (such as algorithms, theory of computation, and information theory) to Applied science, ...

and

population genetics Population genetics is a subfield of genetics that deals with genetic differences within and among populations, and is a part of evolutionary biology. Studies in this branch of biology examine such phenomena as Adaptation (biology), adaptation, s ...

. There are several different definitions and types of stochastic matrices: *A right stochastic matrix is a square matrix of nonnegative real numbers, with each row summing to 1 (so it is also called a row stochastic matrix). *A left stochastic matrix is a square matrix of nonnegative real numbers, with each column summing to 1 (so it is also called a column stochastic matrix). *A ''

doubly stochastic matrix In mathematics, especially in probability and combinatorics, a doubly stochastic matrix (also called bistochastic matrix) is a square matrix X=(x_) of nonnegative real numbers, each of whose rows and columns sums to 1, i.e., :\sum_i x_=\sum_j x_ ...

'' is a square matrix of nonnegative real numbers with each row and column summing to 1. *A substochastic matrix is a real square matrix whose row sums are all

\le1.

In the same vein, one may define a probability vector as a

vector Vector most often refers to: * Euclidean vector, a quantity with a magnitude and a direction * Disease vector, an agent that carries and transmits an infectious pathogen into another living organism Vector may also refer to: Mathematics a ...

whose elements are nonnegative real numbers which sum to 1. Thus, each row of a right stochastic matrix (or column of a left stochastic matrix) is a probability vector. Right stochastic matrices act upon

row vector In linear algebra, a column vector with elements is an m \times 1 matrix consisting of a single column of entries, for example, \boldsymbol = \begin x_1 \\ x_2 \\ \vdots \\ x_m \end. Similarly, a row vector is a 1 \times n matrix for some , co ...

s of probabilities by multiplication from the right (hence their name) and the matrix entry in the -th row and -th column is the probability of transition from state to state . Left stochastic matrices act upon

column vector In linear algebra, a column vector with elements is an m \times 1 matrix consisting of a single column of entries, for example, \boldsymbol = \begin x_1 \\ x_2 \\ \vdots \\ x_m \end. Similarly, a row vector is a 1 \times n matrix for some , c ...

s of probabilities by multiplication from the left (hence their name) and the matrix entry in the -th row and -th column is the probability of transition from state to state . This article uses the right/row stochastic matrix convention.

History

The stochastic matrix was developed alongside the Markov chain by

, a Russian mathematician and professor at

St. Petersburg University Saint Petersburg State University (SPBGU; ) is a public research university in Saint Petersburg, Russia, and one of the oldest and most prestigious universities in Russia. Founded in 1724 by a decree of Peter the Great, the university from the be ...

who first published on the topic in 1906. His initial intended uses were for linguistic analysis and other mathematical subjects like card shuffling, but both Markov chains and matrices rapidly found use in other fields. Stochastic matrices were further developed by scholars such as

Andrey Kolmogorov Andrey Nikolaevich Kolmogorov ( rus, Андре́й Никола́евич Колмого́ров, p=ɐnˈdrʲej nʲɪkɐˈlajɪvʲɪtɕ kəlmɐˈɡorəf, a=Ru-Andrey Nikolaevich Kolmogorov.ogg, 25 April 1903 – 20 October 1987) was a Soviet ...

, who expanded their possibilities by allowing for continuous-time Markov processes. By the 1950s, articles using stochastic matrices had appeared in the fields of

econometrics Econometrics is an application of statistical methods to economic data in order to give empirical content to economic relationships. M. Hashem Pesaran (1987). "Econometrics", '' The New Palgrave: A Dictionary of Economics'', v. 2, p. 8 p. 8 ...

and

circuit theory Circuit may refer to: Science and technology Electrical engineering * Electrical circuit, a complete electrical network with a closed-loop giving a return path for current ** Analog circuit, uses continuous signal levels ** Balanced circu ...

. In the 1960s, stochastic matrices appeared in an even wider variety of scientific works, from

behavioral science Behavioural science is the branch of science concerned with Human behavior, human behaviour.Hallsworth, M. (2023). A manifesto for applying behavioural science. ''Nature Human Behaviour'', ''7''(3), 310-322. While the term can technically be ap ...

to geology to residential planning. In addition, much mathematical work was also done through these decades to improve the range of uses and functionality of the stochastic matrix and Markovian processes more generally. From the 1970s to present, stochastic matrices have found use in almost every field that requires formal analysis, from structural science to

medical diagnosis Medical diagnosis (abbreviated Dx, Dx, or Ds) is the process of determining which disease or condition explains a person's symptoms and signs. It is most often referred to as a diagnosis with the medical context being implicit. The information ...

to personnel management. In addition, stochastic matrices have found wide use in

land change modeling Land change models (LCMs) describe, project, and explain changes in and the dynamics of land use and Land cover, land-cover. LCMs are a means of understanding ways that humans change the Earth's surface in the past, present, and future. Land chan ...

, usually under the term Markov matrix.

Definition and properties

A stochastic matrix describes a

over a finite

state space In computer science, a state space is a discrete space representing the set of all possible configurations of a system. It is a useful abstraction for reasoning about the behavior of a given system and is widely used in the fields of artificial ...

with

cardinality The thumb is the first digit of the hand, next to the index finger. When a person is standing in the medical anatomical position (where the palm is facing to the front), the thumb is the outermost digit. The Medical Latin English noun for thum ...

. If the

of moving from to in one time step is , the stochastic matrix is given by using as the -th row and -th column element, e.g.,

P=\left[\begin
P_&P_&\dots&P_&\dots&P_\\
P_&P_&\dots&P_&\dots&P_\\
\vdots&\vdots&\ddots&\vdots&\ddots&\vdots\\
P_&P_&\dots&P_&\dots&P_\\
\vdots&\vdots&\ddots&\vdots&\ddots&\vdots\\
P_&P_&\dots&P_&\dots&P_\\
\end\right].

Since the total of transition probability from a state to all other states must be 1,

\forall i \in \,\quad \sum_^\alpha P_=1;\,

thus this matrix is a right stochastic matrix. The above elementwise sum across each row of may be more concisely written as , where is the -dimensional column vector of all ones. Using this, it can be seen that the product of two right stochastic matrices and is also right stochastic: . In general, the -th power of a right stochastic matrix is also right stochastic. The probability of transitioning from to in two steps is then given by the -th element of the square of :

\left(P ^\right)_.

In general, the probability transition of going from any state to another state in a finite Markov chain given by the matrix in steps is given by . An initial probability distribution of states, specifying where the system might be initially and with what probabilities, is given as a

. A ''stationary'' probability vector is defined as a distribution, written as a row vector, that does not change under application of the transition matrix; that is, it is defined as a probability distribution on the set which is also a left eigenvector of the probability matrix, associated with

eigenvalue In linear algebra, an eigenvector ( ) or characteristic vector is a vector that has its direction unchanged (or reversed) by a given linear transformation. More precisely, an eigenvector \mathbf v of a linear transformation T is scaled by a ...

\boldsymbolP=\boldsymbol.

It can be shown that the

spectral radius ''Spectral'' is a 2016 Hungarian-American military science fiction action film co-written and directed by Nic Mathieu. Written with Ian Fried (screenwriter), Ian Fried & George Nolfi, the film stars James Badge Dale as DARPA research scientist Ma ...

of any stochastic matrix is one. By the Gershgorin circle theorem, all of the eigenvalues of a stochastic matrix have absolute values less than or equal to one. More precisely, the eigenvalues of

n

-by-

n

stochastic matrices are restricted to lie within a subset of the complex unit disk, known as Karpelevič regions. This result was originally obtained by Fridrikh Karpelevich, following a question originally posed by Kolmogorov and partially addressed by Nikolay Dmitriyev and Eugene Dynkin. Additionally, every right stochastic matrix has an "obvious" column eigenvector associated to the eigenvalue 1: the vector used above, whose coordinates are all equal to 1. As left and right eigenvalues of a square matrix are the same, every stochastic matrix has, at least, a left eigenvector associated to the

1 and the largest absolute value of all its eigenvalues is also 1. Finally, the Brouwer Fixed Point Theorem (applied to the compact convex set of all probability distributions of the finite set ) implies that there is some left eigenvector which is also a stationary probability vector. On the other hand, the Perron–Frobenius theorem also ensures that every irreducible stochastic matrix has such a stationary vector, and that the largest absolute value of an eigenvalue is always 1. However, this theorem cannot be applied directly to such matrices because they need not be irreducible. In general, there may be several such vectors. However, for a matrix with strictly positive entries (or, more generally, for an irreducible aperiodic stochastic matrix), this vector is unique and can be computed by observing that for any we have the following limit,

\lim_\left(P^k \right)_=\boldsymbol_j,

where is the -th element of the row vector . Among other things, this says that the long-term probability of being in a state is independent of the initial state . That both of these computations give the same stationary vector is a form of an

ergodic theorem Ergodic theory is a branch of mathematics that studies statistical properties of deterministic dynamical systems; it is the study of ergodicity. In this context, "statistical properties" refers to properties which are expressed through the behav ...

, which is generally true in a wide variety of dissipative dynamical systems: the system evolves, over time, to a

stationary state A stationary state is a quantum state with all observables independent of time. It is an eigenvector of the energy operator (instead of a quantum superposition of different energies). It is also called energy eigenvector, energy eigenstate, ene ...

. Intuitively, a stochastic matrix represents a Markov chain; the application of the stochastic matrix to a probability distribution redistributes the probability mass of the original distribution while preserving its total mass. If this process is applied repeatedly, the distribution converges to a stationary distribution for the Markov chain. Stochastic matrices and their product form a

category Category, plural categories, may refer to: General uses *Classification, the general act of allocating things to classes/categories Philosophy * Category of being * ''Categories'' (Aristotle) * Category (Kant) * Categories (Peirce) * Category ( ...

, which is both a subcategory of the

category of matrices In mathematics, the category of matrices, often denoted \mathbf, is the category (category theory), category whose object (category theory), objects are natural numbers and whose morphisms are matrix (mathematics), matrices, with composition give ...

and of the one of

Markov kernel In probability theory, a Markov kernel (also known as a stochastic kernel or probability kernel) is a map that in the general theory of Markov processes plays the role that the transition matrix does in the theory of Markov processes with a finit ...

Example: Cat and mouse

Suppose there is a timer and a row of five adjacent boxes. At time zero, a cat is in the first box, and a mouse is in the fifth box. The cat and the mouse both jump to a random adjacent box when the timer advances. For example, if the cat is in the second box and the mouse is in the fourth, the probability that ''the cat will be in the first box and the mouse in the fifth after the timer advances'' is one fourth. If the cat is in the first box and the mouse is in the fifth, the probability that ''the cat will be in box two and the mouse will be in box four after the timer advances'' is one. The cat eats the mouse if both end up in the same box, at which time the game ends. Let the

random variable A random variable (also called random quantity, aleatory variable, or stochastic variable) is a Mathematics, mathematical formalization of a quantity or object which depends on randomness, random events. The term 'random variable' in its mathema ...

''K'' be the time the mouse stays in the game. The

that represents this game contains the following five states specified by the combination of positions (cat,mouse). Note that while a naive enumeration of states would list 25 states, many are impossible either because the mouse can never have a lower index than the cat (as that would mean the mouse occupied the cat's box and survived to move past it), or because the sum of the two indices will always have even parity. In addition, the 3 possible states that lead to the mouse's death are combined into one: * State 1: (1,3) * State 2: (1,5) * State 3: (2,4) * State 4: (3,5) * State 5: game over: (2,2), (3,3) & (4,4). We use a stochastic matrix,

P

(below), to represent the transition probabilities of this system (rows and columns in this matrix are indexed by the possible states listed above, with the pre-transition state as the row and post-transition state as the column). For instance, starting from state 1 – 1st row – it is impossible for the system to stay in this state, so

P_=0

; the system also cannot transition to state 2 – because the cat would have stayed in the same box – so

P_=0

, and by a similar argument for the mouse,

P_=0

. Transitions to states 3 or 5 are allowed, and thus

P_,P_\neq0

P = 
\begin
    0  & 0    & 1/2  &   0  & 1/2 \\
    0  & 0    & 1    &   0  & 0 \\
  1/4  & 1/4  & 0    & 1/4  & 1/4 \\
    0  & 0    & 1/2  &   0  & 1/2 \\
    0  & 0    & 0    &   0  & 1
\end.

Long-term averages

No matter what the initial state, the cat will eventually catch the mouse (with probability 1) and a stationary state π = (0,0,0,0,1) is approached as a limit. To compute the long-term average or expected value of a stochastic variable

Y

, for each state

S_j

and time

t_k

there is a contribution of

Y_\cdot P(S=S_j,t=t_k)

. Survival can be treated as a binary variable with

Y=1

for a surviving state and

Y=0

for the terminated state. The states with

Y=0

do not contribute to the long-term average.

Phase-type representation

As State 5 is an absorbing state, the distribution of time to absorption is discrete phase-type distributed. Suppose the system starts in state 2, represented by the vector

,1,0,0,0 /math>. The states where the mouse has perished don't contribute to the survival average so state five can be ignored. The initial state and transition matrix can be reduced to, \boldsymbol=,1,0,0 \qquad T=\begin
    0          & 0           & \frac  &   0\\
    0          & 0           & 1            &   0\\
  \frac  & \frac &           0  & \frac\\
    0          & 0           & \frac  &   0
\end, and (I-T)^\boldsymbol =\begin2.75 \\ 4.5 \\ 3.5 \\ 2.75\end, where I is the

identity matrix In linear algebra, the identity matrix of size n is the n\times n square matrix with ones on the main diagonal and zeros elsewhere. It has unique properties, for example when the identity matrix represents a geometric transformation, the obje ...

, and

\mathbf

represents a column matrix of all ones that acts as a sum over states. Since each state is occupied for one step of time the expected time of the mouse's survival is just the sum of the probability of occupation over all surviving states and steps in time,

\boldsymbol \left (I+T+T^2+\cdots \right )\boldsymbol=\boldsymbol(I-T)^\boldsymbol=4.5.

Higher order moments are given by

n!\boldsymbol(I-)^^\mathbf\,.

References