In probability theory, a Markov kernel (also known as a stochastic kernel or probability kernel) is a map that in the general theory of Markov processes plays the role that the transition matrix does in the theory of Markov processes with a finite state space.


Formal definition

Let (X,\mathcal A) and (Y,\mathcal B) be measurable spaces. A ''Markov kernel'' with source (X,\mathcal A) and target (Y,\mathcal B) is a map \kappa : \mathcal B \times X \to [0,1] with the following properties:
# For every (fixed) B \in \mathcal B, the map x \mapsto \kappa(B, x) is \mathcal A-measurable.
# For every (fixed) x \in X, the map B \mapsto \kappa(B, x) is a probability measure on (Y, \mathcal B).
In other words, it associates to each point x \in X a probability measure \kappa(dy, x): B \mapsto \kappa(B, x) on (Y,\mathcal B) such that, for every measurable set B \in \mathcal B, the map x \mapsto \kappa(B, x) is measurable with respect to the \sigma-algebra \mathcal A.
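As a concrete illustration (not part of the article), a Markov kernel between finite spaces can be stored through its values on singletons; the helper make_kernel and the toy data below are illustrative assumptions, written as a minimal Python sketch:

    def make_kernel(point_probs):
        """point_probs[x] maps each y to kappa({y}, x)."""
        for x, dist in point_probs.items():
            # Property 2: kappa(., x) must be a probability measure
            assert abs(sum(dist.values()) - 1.0) < 1e-9
        def kappa(B, x):
            # Property 1 is automatic here: every map on a finite space is measurable
            return sum(p for y, p in point_probs[x].items() if y in B)
        return kappa

    # Example: a kernel with source X = {0, 1} and target Y = {'a', 'b'}
    kappa = make_kernel({0: {'a': 0.3, 'b': 0.7}, 1: {'a': 1.0}})
    print(kappa({'a', 'b'}, 0))  # 1.0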


Examples


Simple random walk on the integers

Take X=Y=\Z, and \mathcal A = \mathcal B = \mathcal P(\Z) (the power set of \Z). Then a Markov kernel is fully determined by the probability it assigns to singletons \{m\},\ m \in Y = \Z, for each n \in X = \Z:
:\kappa(B, n) = \sum_{m \in B} \kappa(\{m\}, n), \qquad \forall n \in \Z, \, \forall B \in \mathcal B.
Now the random walk \kappa that goes to the right with probability p and to the left with probability 1 - p is defined by
:\kappa(\{m\}, n) = p \, \delta_{m,n+1} + (1-p) \, \delta_{m,n-1}, \quad \forall n,m \in \Z
where \delta is the Kronecker delta. The transition probabilities P(m, n) = \kappa(\{m\}, n) for the random walk are equivalent to the Markov kernel.
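A short sketch of this kernel (not part of the article), with p = 0.7 chosen purely for illustration:

    p = 0.7  # probability of a step to the right (illustrative value)

    def kappa(B, n):
        # kappa(B, n) = p * 1_B(n + 1) + (1 - p) * 1_B(n - 1)
        return p * (n + 1 in B) + (1 - p) * (n - 1 in B)

    print(kappa({1}, 0))      # 0.7: step to the right
    print(kappa({-1, 1}, 0))  # 1.0: the walk moves to one of its neighbours with certainty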


General Markov processes with countable state space

More generally take X and Y both countable and \mathcal A = \mathcal P(X),\ \mathcal B = \mathcal P(Y). Again a Markov kernel is defined by the probability it assigns to singleton sets for each i \in X:
:\kappa(B, i) = \sum_{j \in B} \kappa(\{j\}, i), \qquad \forall i \in X, \, \forall B \in \mathcal B.
We define a Markov process by defining a transition probability P(j, i) = K_{ji} where the numbers K_{ji} define a (countable) stochastic matrix (K_{ji}), i.e.
:\begin{align} K_{ji} &\ge 0, \qquad &\forall (j,i) \in Y\times X, \\ \sum_{j \in Y} K_{ji} &= 1, \qquad &\forall i \in X. \end{align}
We then define
:\kappa(\{j\}, i) = K_{ji} = P(j, i), \qquad \forall i \in X, \quad \forall j \in Y.
Again the transition probability, the stochastic matrix and the Markov kernel are equivalent reformulations.
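On a finite state space the same data is just a column-stochastic matrix; the following sketch (with illustrative values, not from the article) spells that out:

    # Columns are indexed by i in X, rows by j in Y; K[j][i] = kappa({j}, i) = P(j, i).
    K = [[0.9, 0.2],
         [0.1, 0.8]]

    def kappa(B, i):
        return sum(K[j][i] for j in B)

    print(kappa({0}, 1))     # 0.2
    print(kappa({0, 1}, 0))  # 1.0: each column of K sums to 1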


Markov kernel defined by a kernel function and a measure

Let \nu be a measure on (Y, \mathcal B), and k: Y \times X \to [0, \infty] a measurable function with respect to the product \sigma-algebra \mathcal A \otimes \mathcal B such that
:\int_Y k(y, x) \, \nu(\mathrm d y) = 1, \qquad \forall x \in X,
then \kappa(dy, x) = k(y, x) \, \nu(dy), i.e. the mapping
:\begin{cases} \kappa : \mathcal B \times X \to [0,1] \\ \kappa(B, x) = \int_B k(y, x) \, \nu(\mathrm d y) \end{cases}
defines a Markov kernel. This example generalises the countable Markov process example where \nu was the counting measure. Moreover it encompasses other important examples such as the convolution kernels, in particular the Markov kernels defined by the heat equation. The latter example includes the Gaussian kernel on X = Y = \mathbb R with \nu(dx) = dx the standard Lebesgue measure and
:k_t(y, x) = \frac{1}{\sqrt{4\pi t}} \, e^{-\frac{(y-x)^2}{4t}}.
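A sketch of the heat-kernel case (not part of the article; the Gaussian normalisation follows the formula above and the quadrature is a crude midpoint Riemann sum, both illustrative choices):

    import math

    def k_t(y, x, t):
        # Density of kappa_t(., x) with respect to Lebesgue measure
        return math.exp(-(y - x) ** 2 / (4 * t)) / math.sqrt(4 * math.pi * t)

    def kappa_t(a, b, x, t, steps=10_000):
        # kappa_t([a, b], x) approximated by a midpoint Riemann sum
        h = (b - a) / steps
        return sum(k_t(a + (i + 0.5) * h, x, t) * h for i in range(steps))

    print(round(kappa_t(-10, 10, 0.0, 1.0), 4))  # ~1.0: k_t(., x) integrates to 1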


Measurable functions

Take (X, \mathcal A) and (Y, \mathcal B) arbitrary measurable spaces, and let f: X \to Y be a measurable function. Now define \kappa(dy, x) = \delta_{f(x)}(dy), i.e.
:\kappa(B, x) = \mathbf{1}_B(f(x)) = \mathbf{1}_{f^{-1}(B)}(x) = \begin{cases} 1 & \text{if } f(x) \in B \\ 0 & \text{otherwise} \end{cases}
for all B \in \mathcal B. Note that the indicator function \mathbf{1}_{f^{-1}(B)} is \mathcal A-measurable for all B \in \mathcal B if and only if f is measurable. This example allows us to think of a Markov kernel as a generalised function with a (in general) random rather than certain value. That is, it is a multivalued function where the values are not equally weighted.
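A sketch of such a deterministic kernel (not from the article; the function f used here is an arbitrary illustrative choice):

    def delta_kernel(f):
        # kappa(B, x) = delta_{f(x)}(B): all the mass sits on the single point f(x)
        def kappa(B, x):
            return 1.0 if f(x) in B else 0.0
        return kappa

    kappa = delta_kernel(lambda x: x * x)
    print(kappa({4}, 2))  # 1.0
    print(kappa({5}, 2))  # 0.0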


Galton–Watson process

As a less obvious example, take X = \N, \mathcal A = \mathcal P(\N), and (Y, \mathcal B) the real numbers \R with the standard sigma algebra of Borel sets. Then
:\kappa(B, n) = \begin{cases} \mathbf{1}_B(0) & n = 0 \\ \Pr(\xi_1 + \cdots + \xi_n \in B) & n \neq 0 \end{cases}
with i.i.d. random variables \xi_i (usually with mean 0) and where \mathbf{1}_B is the indicator function. For the simple case of coin flips this models the different levels of a Galton board.
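The second case can be estimated by simulation; the sketch below (not part of the article) assumes fair ±1 coin-flip steps as in the Galton-board illustration:

    import random

    def kappa_estimate(B, n, samples=100_000):
        # Monte Carlo estimate of kappa(B, n) = Pr(xi_1 + ... + xi_n in B)
        if n == 0:
            return 1.0 if 0 in B else 0.0
        hits = sum(sum(random.choice((-1, 1)) for _ in range(n)) in B for _ in range(samples))
        return hits / samples

    print(kappa_estimate({0}, 2))  # roughly 0.5: two fair steps return to 0 with probability 1/2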


Composition of Markov Kernels and the Markov Category

Given measurable spaces (X, \mathcal A), (Y, \mathcal B), we consider a Markov kernel \kappa: \mathcal B \times X \to [0,1] as a morphism \kappa: X \to Y. Intuitively, rather than assigning to each x \in X a sharply defined point y \in Y, the kernel assigns a "fuzzy" point in Y which is only known with some level of uncertainty, much like actual physical measurements. If we have a third measurable space (Z, \mathcal C), and probability kernels \kappa: X \to Y and \lambda: Y \to Z, we can define a composition \lambda \circ \kappa : X \to Z by
:(\lambda \circ \kappa)(dz, x) = \int_Y \lambda(dz, y) \, \kappa(dy, x).
The composition is associative by the monotone convergence theorem and the identity function considered as a Markov kernel (i.e. the delta measure \kappa_{\mathrm{id}}(dx', x) = \delta_x(dx')) is the unit for this composition. This composition defines the structure of a category on the measurable spaces with Markov kernels as morphisms, first defined by Lawvere. The category has the empty set as initial object and the one point set * as the terminal object. From this point of view a probability space (\Omega, \mathcal A, \mathbb P) is the same thing as a pointed space * \to \Omega in the Markov category.
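On finite spaces the composition formula reduces to matrix multiplication of the corresponding stochastic matrices; a minimal sketch with illustrative values (not from the article):

    def compose(lam, kap):
        # (lambda o kappa)[z][x] = sum_y lam[z][y] * kap[y][x]
        ny, nx = len(kap), len(kap[0])
        return [[sum(lam[z][y] * kap[y][x] for y in range(ny)) for x in range(nx)]
                for z in range(len(lam))]

    kappa = [[0.5, 0.1],   # kernel X -> Y (columns sum to 1)
             [0.5, 0.9]]
    lam   = [[0.3, 0.6],   # kernel Y -> Z
             [0.7, 0.4]]
    print(compose(lam, kappa))  # kernel X -> Z; its columns again sum to 1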


Probability Space defined by Probability Distribution and a Markov Kernel

A composition of a probability space (X, \mathcal A, P_X) and a probability kernel \kappa: (X, \mathcal A) \to (Y, \mathcal B) defines a probability space (Y, \mathcal B, P_Y = \kappa \circ P_X), where the probability measure is given by
:P_Y(B) = \int_X \int_B \kappa(dy, x) \, P_X(dx) = \int_X \kappa(B, x) \, P_X(dx) = \mathbb{E}_{P_X} \kappa(B, \cdot).
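On finite spaces this is just a weighted average of the kernel over P_X; a minimal sketch with illustrative data (not from the article):

    P_X = {0: 0.25, 1: 0.75}
    kernel = {0: {'a': 0.3, 'b': 0.7},
              1: {'a': 1.0, 'b': 0.0}}

    def kappa(B, x):
        return sum(p for y, p in kernel[x].items() if y in B)

    def P_Y(B):
        # P_Y(B) = sum_x kappa(B, x) * P_X({x})
        return sum(kappa(B, x) * px for x, px in P_X.items())

    print(P_Y({'a'}))       # 0.25 * 0.3 + 0.75 * 1.0 = 0.825
    print(P_Y({'a', 'b'}))  # 1.0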


Properties


Semidirect product

Let (X, \mathcal A, P) be a probability space and \kappa a Markov kernel from (X, \mathcal A) to some (Y, \mathcal B). Then there exists a unique measure Q on (X \times Y, \mathcal A \otimes \mathcal B) such that
:Q(A \times B) = \int_A \kappa(B, x) \, P(dx), \quad \forall A \in \mathcal A, \quad \forall B \in \mathcal B.
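Continuing the finite illustration used above (illustrative data, not from the article), Q assigns mass \kappa(B, x) P(\{x\}) to each rectangle:

    P = {0: 0.25, 1: 0.75}
    kernel = {0: {'a': 0.3, 'b': 0.7},
              1: {'a': 1.0, 'b': 0.0}}

    def Q(A, B):
        # Q(A x B) = sum_{x in A} kappa(B, x) * P({x})
        return sum(sum(p for y, p in kernel[x].items() if y in B) * P[x] for x in A)

    print(Q({0}, {'a'}))          # 0.25 * 0.3 = 0.075
    print(Q({0, 1}, {'a', 'b'}))  # 1.0: Q is a probability measure on X x Y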


Regular conditional distribution

Let (S, Y) be a Borel space, X an (S, Y)-valued random variable on the measure space (\Omega, \mathcal F, P), and \mathcal G \subseteq \mathcal F a sub-\sigma-algebra. Then there exists a Markov kernel \kappa from (\Omega, \mathcal G) to (S, Y) such that \kappa(\cdot, B) is a version of the conditional expectation \mathbb E[\mathbf 1_{\{X \in B\}} \mid \mathcal G] for every B \in Y, i.e.
:P(X \in B \mid \mathcal G) = \mathbb E\left[\mathbf 1_{\{X \in B\}} \mid \mathcal G\right] = \kappa(\cdot, B), \qquad P\text{-a.s.} \,\, \forall B \in Y.
It is called the regular conditional distribution of X given \mathcal G and is not uniquely defined.
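A toy finite example (not from the article, with all choices illustrative): \Omega is a fair die roll, \mathcal G is generated by the parity of the outcome, and X is the identity, so \kappa(\omega, \cdot) is the uniform distribution on the \mathcal G-atom containing \omega.

    omega_space = [1, 2, 3, 4, 5, 6]  # fair die, P uniform

    def kappa(omega, B):
        # kappa(omega, B) = P(X in B | parity of omega)
        atom = [w for w in omega_space if w % 2 == omega % 2]  # G-atom containing omega
        return sum(1 for w in atom if w in B) / len(atom)

    print(kappa(2, {2}))     # 1/3: given "even", each of 2, 4, 6 is equally likely
    print(kappa(1, {2, 4}))  # 0.0: given "odd", X cannot land in an even set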


Generalizations

Transition kernels generalize Markov kernels in the sense that for all x \in X, the map
:B \mapsto \kappa(B, x)
can be any type of (non-negative) measure, not necessarily a probability measure.


References

* Bauer, Heinz (1996), ''Probability Theory'', de Gruyter, ISBN 3-11-013935-9, §36. Kernels and semigroups of kernels.