probability theory Probability theory is the branch of mathematics concerned with probability. Although there are several different probability interpretations, probability theory treats the concept in a rigorous mathematical manner by expressing it through a set o ...

, a Markov kernel (also known as a stochastic kernel or probability kernel) is a map that in the general theory of

Markov process A Markov chain or Markov process is a stochastic model describing a sequence of possible events in which the probability of each event depends only on the state attained in the previous event. Informally, this may be thought of as, "What happen ...

es plays the role that the transition matrix does in the theory of Markov processes with a

finite Finite is the opposite of infinite. It may refer to: * Finite number (disambiguation) * Finite set, a set whose cardinality (number of elements) is some natural number * Finite verb Traditionally, a finite verb (from la, fīnītus, past partici ...

state space A state space is the set of all possible configurations of a system. It is a useful abstraction for reasoning about the behavior of a given system and is widely used in the fields of artificial intelligence and game theory. For instance, the t ...

Formal definition

Let

(X,\mathcal A)

and

(Y,\mathcal B)

measurable space In mathematics, a measurable space or Borel space is a basic object in measure theory. It consists of a set and a σ-algebra, which defines the subsets that will be measured. Definition Consider a set X and a σ-algebra \mathcal A on X. Then ...

s. A ''Markov kernel'' with source

(X,\mathcal A)

and target

(Y,\mathcal B)

is a map

\kappa : \mathcal B \times X  \to,1 /math> with the following properties: 
# For every (fixed) B \in \mathcal B, the map x \mapsto \kappa(B, x) is \mathcal A -measurable 
# For every (fixed) x \in X, the map B \mapsto \kappa(B, x) is a

probability measure In mathematics, a probability measure is a real-valued function defined on a set of events in a probability space that satisfies measure properties such as ''countable additivity''. The difference between a probability measure and the more g ...

(Y, \mathcal B)

In other words it associates to each point

x \in X

\kappa(dy, x): B \mapsto \kappa(B, x)

(Y,\mathcal B)

such that, for every measurable set

B\in\mathcal B

, the map

x\mapsto \kappa(B, x)

is measurable with respect to the

\sigma

-algebra

\mathcal A

Examples

Simple random walk on the integers

Take

X=Y=\Z

, and

\mathcal A = \mathcal B = \mathcal P(\Z)

(the

power set In mathematics, the power set (or powerset) of a set is the set of all subsets of , including the empty set and itself. In axiomatic set theory (as developed, for example, in the ZFC axioms), the existence of the power set of any set is ...

\Z

). Then a Markov kernel is fully determined by the probability it assigns to singletons

\,\, m \in Y = \Z

for each

n \in X = \Z

: :

\kappa(B, n )=\sum_\kappa(\, n), \qquad \forall n \in \mathbb, \, \forall B \in \mathcal B

. Now the random walk

\kappa

that goes to the right with probability

p

and to the left with probability

1 - p

is defined by :

\kappa(\, n)= p \delta_+ (1-p) \delta_, \quad \forall n,m \in \Z

where

\delta

is the

Kronecker delta In mathematics, the Kronecker delta (named after Leopold Kronecker) is a function of two variables, usually just non-negative integers. The function is 1 if the variables are equal, and 0 otherwise: \delta_ = \begin 0 &\text i \neq j, \\ 1 ...

. The transition probabilities

P(m, n) = \kappa(\, n)

for the random walk are equivalent to the Markov kernel.

General
Markov process A Markov chain or Markov process is a stochastic model describing a sequence of possible events in which the probability of each event depends only on the state attained in the previous event. Informally, this may be thought of as, "What happen ...
es with countable state space

More generally take

X

and

Y

both countable and

\mathcal A =  \mathcal P(X),\ \mathcal B = \mathcal P(Y)

. Again a Markov kernel is defined by the probability it assigns to singleton sets for each

i \in X

\kappa(B, i)=\sum_\kappa(\, i), \qquad \forall i \in X, \, \forall B \in \mathcal B

, We define a Markov process by defining a transition probability

P(j, i) = K_

where the numbers

K_

define a (countable)

stochastic matrix In mathematics, a stochastic matrix is a square matrix used to describe the transitions of a Markov chain. Each of its entries is a nonnegative real number representing a probability. It is also called a probability matrix, transition matrix, ...

(K_)

i.e. :

\begin
 K_ &\ge 0, \qquad &\forall (j,i) \in Y\times X, \\
\sum_K_&=1, \qquad &\forall i \in X.\\
\end

We then define :

\kappa(\ ,  i)  = K_ = P(j, i), \qquad \forall i \in X, \quad \forall B \in \mathcal B

. Again the transition probability, the stochastic matrix and the Markov kernel are equivalent reformulations.

Markov kernel defined by a kernel function and a measure

Let

\nu

be a measure on

(Y, \mathcal B)

, and

k: Y \times X\to, \infty /math> a

measurable function In mathematics and in particular measure theory, a measurable function is a function between the underlying sets of two measurable spaces that preserves the structure of the spaces: the preimage of any measurable set is measurable. This is i ...

with respect to the product

\sigma

-algebra

\mathcal A \otimes \mathcal B

such that :

\int_Y k(y, x)\nu(\mathrm y) = 1, \qquad \forall x \in X

, then

\kappa(dy , x) = k(y, x)\nu(dy)

i.e. the mapping :

\\ \kappa(B, x)=\int_k(y, x)\nu(\mathrm y) \end

defines a Markov kernel. This example generalises the countable Markov process example where

\nu

was the

counting measure In mathematics, specifically measure theory, the counting measure is an intuitive way to put a measure on any set – the "size" of a subset is taken to be the number of elements in the subset if the subset has finitely many elements, and infin ...

. Moreover it encompasses other important examples such as the convolution kernels, in particular the Markov kernels defined by the heat equation. The latter example includes the

Gaussian kernel In mathematics, a Gaussian function, often simply referred to as a Gaussian, is a function of the base form f(x) = \exp (-x^2) and with parametric extension f(x) = a \exp\left( -\frac \right) for arbitrary real constants , and non-zero . It is ...

X = Y = \mathbb R

with

\nu(dx) = dx

standard Lebesgue measure and :

k_t(y, x) = \frace^

Measurable functions

Take

(X, \mathcal)

and

(Y, \mathcal)

arbitrary measurable spaces, and let

f:X \to Y

be a measurable function. Now define

\kappa(dy, x) = \delta_(dy)

i.e. :

\kappa(B, x) = \mathbf_B(f(x)) = \mathbf_(x) = \begin1 & \text f(x) \in B\\ 0 & \text\end

for all

B \in \mathcal

. Note that the indicator function

\mathbf_

\mathcal

-measurable for all

B \in \mathcal

iff

f

is measurable. This example allows us to think of a Markov kernel as a generalised function with a (in general) random rather than certain value. That is, it is a

multivalued function In mathematics, a multivalued function, also called multifunction, many-valued function, set-valued function, is similar to a function, but may associate several values to each input. More precisely, a multivalued function from a domain to ...

where the values are not equally weighted.

Galton–Watson process The Galton–Watson process is a branching stochastic process arising from Francis Galton's statistical investigation of the extinction of family names. The process models family names as patrilineal (passed from father to son), while offspri ...

As a less obvious example, take

X = \N, \mathcal A = \mathcal P(\N)

, and

(Y, \mathcal B)

the real numbers

\R

with the standard sigma algebra of

Borel set In mathematics, a Borel set is any set in a topological space that can be formed from open sets (or, equivalently, from closed sets) through the operations of countable union, countable intersection, and relative complement. Borel sets are name ...

s. Then :

\kappa(B, n)=\begin \mathbf_B(0)  & n=0\\ \Pr(\xi_1 + \cdots + \xi_x \in B) & n \neq 0 \\ \end

with i.i.d.

random variable A random variable (also called random quantity, aleatory variable, or stochastic variable) is a mathematical formalization of a quantity or object which depends on random events. It is a mapping or a function from possible outcomes (e.g., the p ...

\xi_i

(usually with mean 0) and where

\mathbf_B

is the indicator function. For the simple case of coin flips this models the different levels of a

Galton board The Galton board, also known as the Galton box or quincunx or bean machine, is a device invented by Sir Francis Galton to demonstrate the central limit theorem, in particular that with sufficient sample size the binomial distribution approximat ...

Composition of Markov Kernels and the Markov Category

Given measurable spaces

(X, \mathcal A)

(Y, \mathcal B)

we consider a Markov kernel

\kappa: \mathcal B \times X \to,1 /math> as a morphism \kappa: X \to Y . Intuitively, rather than assigning to each x \in X a sharply defined point y \in Y the kernel assigns a "fuzzy" point in Y which is only known  with some level of uncertainty, much like actual physical measurements. If we have a third measurable space (Z, \mathcal C), and probability kernels \kappa: X \to Y and \lambda: Y \to Z, we can define a composition \lambda \circ \kappa : X \to Z by 
: (\lambda \circ \kappa) (dz, x) = \int_Y \lambda(dz ,  y)\kappa(dy, x) .
The composition is associative by the Monotone Convergence Theorem and the identity function considered as a Markov kernel (i.e. the delta measure \kappa_(dx', x) = \delta_x(dx')) is the unit for this composition. 

This composition defines the structure of a

category Category, plural categories, may refer to: Philosophy and general uses *Categorization, categories in cognitive science, information science and generally * Category of being * ''Categories'' (Aristotle) * Category (Kant) * Categories (Peirce) ...

on the measurable spaces with Markov kernels as morphisms first defined by Lawvere. The category has the empty set as initial object and the one point set

*

as the terminal object. From this point of view a probability space

(\Omega, \mathcal A, \mathbb P)

is the same thing as a pointed space

* \to \Omega

in the

Markov category Markov ( Bulgarian, russian: Марков), Markova, and Markoff are common surnames used in Russia and Bulgaria. Notable people with the name include: Academics *Ivana Markova (born 1938), Czechoslovak-British emeritus professor of psychology at ...

Probability Space defined by Probability Distribution and a Markov Kernel

A composition of a probability space

(X, \mathcal A, P_X)

and a probability kernel

\kappa: (X, \mathcal A) \to (Y, \mathcal B)

defines a probability space

(Y, \mathcal B, P_Y = \kappa \circ P_X)

, where the probability measure is given by :

P_Y(B) = \int_X \int_B \kappa(dy, x) P_X(dx) = \int_X \kappa(B, x)P_X(dx) = \mathbb_\kappa(B, \cdot).

Properties

Semidirect product

Let

(X, \mathcal A, P)

be a probability space and

\kappa

a Markov kernel from

(X, \mathcal A)

to some

(Y, \mathcal B)

. Then there exists a unique measure

Q

(X \times Y, \mathcal A \otimes \mathcal B)

, such that: :

Q(A \times B) = \int_A \kappa(B, x)\,P(dx), \quad \forall A \in \mathcal A, \quad \forall B \in\mathcal B.

Regular conditional distribution

Let

(S,Y)

be a Borel space,

X

(S,Y)

-valued random variable on the measure space

(\Omega, \mathcal, P)

and

\mathcal G \subseteq \mathcal F

a sub-

\sigma

-algebra. Then there exists a Markov kernel

\kappa

from

(\Omega, \mathcal G)

(S,Y)

, such that

\kappa(\cdot,B)

is a version of the

conditional expectation In probability theory, the conditional expectation, conditional expected value, or conditional mean of a random variable is its expected value – the value it would take “on average” over an arbitrarily large number of occurrences – given ...

\mathbb mathbf 1_ \mid \mathcal G /math> for every B \in Y, i.e.

: P(X \in B\mid\mathcal G)=\mathbb \left mathbf 1_\mid\mathcal G \right = \kappa(\cdot,B), \qquad P\text\,\, \forall B \in \mathcal G. It is called regular conditional distribution of X given \mathcal G and is not uniquely defined.

Generalizations

Transition kernel In the mathematics of probability, a transition kernel or kernel is a function in mathematics that has different applications. Kernels can for example be used to define random measures or stochastic processes. The most important example of kernel ...

s generalize Markov kernels in the sense that for all

x \in X

, the map :

B \mapsto \kappa(B,  x)

can be any type of (non negative) measure, not necessarily a probability measure.

References

* {{citation, first1=Heinz, last1=Bauer, title=Probability Theory, publisher=de Gruyter, year=1996, isbn=3-11-013935-9 :: §36. Kernels and semigroups of kernels Markov processes