probability theory Probability theory or probability calculus is the branch of mathematics concerned with probability. Although there are several different probability interpretations, probability theory treats the concept in a rigorous mathematical manner by expre ...

, a Markov kernel (also known as a stochastic kernel or probability kernel) is a map that in the general theory of

Markov process In probability theory and statistics, a Markov chain or Markov process is a stochastic process describing a sequence of possible events in which the probability of each event depends only on the state attained in the previous event. Informally, ...

es plays the role that the transition matrix does in the theory of Markov processes with a finite

state space In computer science, a state space is a discrete space representing the set of all possible configurations of a system. It is a useful abstraction for reasoning about the behavior of a given system and is widely used in the fields of artificial ...

Formal definition

Let

(X,\mathcal A)

and

(Y,\mathcal B)

measurable space In mathematics, a measurable space or Borel space is a basic object in measure theory. It consists of a set and a σ-algebra, which defines the subsets that will be measured. It captures and generalises intuitive notions such as length, area, an ...

s. A ''Markov kernel'' with source

(X,\mathcal A)

and target

(Y,\mathcal B)

, sometimes written as

\kappa:(X,\mathcal)\to(Y,\mathcal)

, is a function

\kappa : \mathcal B \times X  \to,1 /math> with the following properties: 
# For every (fixed) B_0 \in \mathcal B, the map x \mapsto \kappa(B_0, x) is \mathcal A -

measurable In mathematics, the concept of a measure is a generalization and formalization of geometrical measures (length, area, volume) and other common notions, such as magnitude, mass, and probability of events. These seemingly distinct concepts hav ...

# For every (fixed)

x_0 \in X

, the map

B \mapsto \kappa(B, x_0)

is a

probability measure In mathematics, a probability measure is a real-valued function defined on a set of events in a σ-algebra that satisfies Measure (mathematics), measure properties such as ''countable additivity''. The difference between a probability measure an ...

(Y, \mathcal B)

In other words it associates to each point

x \in X

\kappa(dy, x): B \mapsto \kappa(B, x)

(Y,\mathcal B)

such that, for every measurable set

B\in\mathcal B

, the map

x\mapsto \kappa(B, x)

is measurable with respect to the

\sigma

-algebra

\mathcal A

Examples

Simple random walk on the integers

Take

X=Y=\Z

, and

\mathcal A = \mathcal B = \mathcal P(\Z)

(the

power set In mathematics, the power set (or powerset) of a set is the set of all subsets of , including the empty set and itself. In axiomatic set theory (as developed, for example, in the ZFC axioms), the existence of the power set of any set is po ...

\Z

). Then a Markov kernel is fully determined by the probability it assigns to singletons

\,\, m \in Y = \Z

for each

n \in X = \Z

: :

\kappa(B, n )=\sum_\kappa(\, n), \qquad \forall n \in \mathbb, \, \forall B \in \mathcal B

. Now the random walk

\kappa

that goes to the right with probability

p

and to the left with probability

1 - p

is defined by :

\kappa(\, n)= p \delta_+ (1-p) \delta_, \quad \forall n,m \in \Z

where

\delta

is the

Kronecker delta In mathematics, the Kronecker delta (named after Leopold Kronecker) is a function of two variables, usually just non-negative integers. The function is 1 if the variables are equal, and 0 otherwise: \delta_ = \begin 0 &\text i \neq j, \\ 1 &\ ...

. The transition probabilities

P(m, n) = \kappa(\, n)

for the random walk are equivalent to the Markov kernel.

General
Markov process In probability theory and statistics, a Markov chain or Markov process is a stochastic process describing a sequence of possible events in which the probability of each event depends only on the state attained in the previous event. Informally, ...
es with countable state space

More generally take

X

and

Y

both countable and

\mathcal A =  \mathcal P(X),\ \mathcal B = \mathcal P(Y)

. Again a Markov kernel is defined by the probability it assigns to singleton sets for each

i \in X

\kappa(B, i)=\sum_\kappa(\, i), \qquad \forall i \in X, \, \forall B \in \mathcal B

, We define a Markov process by defining a transition probability

P(j, i) = K_

where the numbers

K_

define a (countable)

stochastic matrix In mathematics, a stochastic matrix is a square matrix used to describe the transitions of a Markov chain. Each of its entries is a nonnegative real number representing a probability. It is also called a probability matrix, transition matrix, ''s ...

(K_)

i.e. :

\begin
 K_ &\ge 0, \qquad &\forall (j,i) \in Y\times X, \\
\sum_K_&=1, \qquad &\forall i \in X.\\
\end

We then define :

\kappa(\ ,  i)  = K_ = P(j, i), \qquad \forall i \in X, \quad \forall B \in \mathcal B

. Again the transition probability, the stochastic matrix and the Markov kernel are equivalent reformulations.

Markov kernel defined by a kernel function and a measure

Let

\nu

be a measure on

(Y, \mathcal B)

, and

k: Y \times X\to, \infty /math> a

measurable function In mathematics, and in particular measure theory, a measurable function is a function between the underlying sets of two measurable spaces that preserves the structure of the spaces: the preimage of any measurable set is measurable. This is in ...

with respect to the product

\sigma

-algebra

\mathcal A \otimes \mathcal B

such that :

\int_Y k(y, x)\nu(\mathrm y) = 1, \qquad \forall x \in X

, then

\kappa(dy , x) = k(y, x)\nu(dy)

i.e. the mapping :

\\ \kappa(B, x)=\int_k(y, x)\nu(\mathrm y) \end

defines a Markov kernel. This example generalises the countable Markov process example where

\nu

was the

counting measure In mathematics, specifically measure theory, the counting measure is an intuitive way to put a measure on any set – the "size" of a subset is taken to be the number of elements in the subset if the subset has finitely many elements, and infinit ...

. Moreover it encompasses other important examples such as the convolution kernels, in particular the Markov kernels defined by the heat equation. The latter example includes the Gaussian kernel on

X = Y = \mathbb R

with

\nu(dx) = dx

standard Lebesgue measure and :

k_t(y, x) = \frace^

Measurable functions

Take

(X, \mathcal)

and

(Y, \mathcal)

arbitrary measurable spaces, and let

f:X \to Y

be a measurable function. Now define

\kappa(dy, x) = \delta_(dy)

i.e. :

\kappa(B, x) = \mathbf_B(f(x)) = \mathbf_(x) = \begin1 & \text f(x) \in B\\ 0 & \text\end

for all

B \in \mathcal

. Note that the indicator function

\mathbf_

\mathcal

-measurable for all

B \in \mathcal

iff

f

is measurable. This example allows us to think of a Markov kernel as a generalised function with a (in general) random rather than certain value. That is, it is a

multivalued function In mathematics, a multivalued function, multiple-valued function, many-valued function, or multifunction, is a function that has two or more values in its range for at least one point in its domain. It is a set-valued function with additional p ...

where the values are not equally weighted.

Galton–Watson process

As a less obvious example, take

X = \N, \mathcal A = \mathcal P(\N)

, and

(Y, \mathcal B)

the real numbers

\R

with the standard sigma algebra of

Borel set In mathematics, a Borel set is any subset of a topological space that can be formed from its open sets (or, equivalently, from closed sets) through the operations of countable union, countable intersection, and relative complement. Borel sets ...

s. Then :

\kappa(B, n)=\begin \mathbf_B(0)  & n=0\\ \Pr(\xi_1 + \cdots + \xi_x \in B) & n \neq 0 \\ \end

where

x

is the number of element at the state

n

\xi_i

are i.i.d.

random variable A random variable (also called random quantity, aleatory variable, or stochastic variable) is a Mathematics, mathematical formalization of a quantity or object which depends on randomness, random events. The term 'random variable' in its mathema ...

s (usually with mean 0) and where

\mathbf_B

is the indicator function. For the simple case of coin flips this models the different levels of a Galton board.

Composition of Markov Kernels

Given measurable spaces

(X, \mathcal A)

(Y, \mathcal B)

we consider a Markov kernel

\kappa: \mathcal B \times X \to,1 /math> as a morphism \kappa: X \to Y . Intuitively, rather than assigning to each x \in X a sharply defined point y \in Y the kernel assigns a "fuzzy" point in Y which is only known  with some level of uncertainty, much like actual physical measurements. If we have a third measurable space (Z, \mathcal C), and probability kernels \kappa: X \to Y and \lambda: Y \to Z, we can define a composition \lambda \circ \kappa : X \to Z by the Chapman-Kolmogorov equation : (\lambda \circ \kappa) (dz, x) = \int_Y \lambda(dz ,  y)\kappa(dy, x) .
The composition is associative by the Monotone Convergence Theorem and the identity function considered as a Markov kernel (i.e. the delta measure \kappa_(dx', x) = \delta_x(dx')) is the unit for this composition. 

This composition defines the structure of a

category Category, plural categories, may refer to: General uses *Classification, the general act of allocating things to classes/categories Philosophy * Category of being * ''Categories'' (Aristotle) * Category (Kant) * Categories (Peirce) * Category ( ...

on the measurable spaces with Markov kernels as morphisms, first defined by Lawvere, the

category of Markov kernels In mathematics, the category of Markov kernels, often denoted Stoch, is the category whose objects are measurable spaces and whose morphisms are Markov kernels. It is analogous to the category of sets and functions, but where the arrows can be ...

Probability Space defined by Probability Distribution and a Markov Kernel

A composition of a probability space

(X, \mathcal A, P_X)

and a probability kernel

\kappa: (X, \mathcal A) \to (Y, \mathcal B)

defines a probability space

(Y, \mathcal B, P_Y = \kappa \circ P_X)

, where the probability measure is given by :

P_Y(B) = \int_X \int_B \kappa(dy, x) P_X(dx) = \int_X \kappa(B, x)P_X(dx) = \mathbb_\kappa(B, \cdot).

Properties

Semidirect product

Let

(X, \mathcal A, P)

be a probability space and

\kappa

a Markov kernel from

(X, \mathcal A)

to some

(Y, \mathcal B)

. Then there exists a unique measure

Q

(X \times Y, \mathcal A \otimes \mathcal B)

, such that: :

Q(A \times B) = \int_A \kappa(B, x)\,P(dx), \quad \forall A \in \mathcal A, \quad \forall B \in\mathcal B.

Regular conditional distribution

Let

(S,Y)

be a Borel space,

X

(S,Y)

-valued random variable on the measure space

(\Omega, \mathcal, P)

and

\mathcal G \subseteq \mathcal F

a sub-

\sigma

-algebra. Then there exists a Markov kernel

\kappa

from

(\Omega, \mathcal G)

(S,Y)

, such that

\kappa(\cdot,B)

is a version of the

conditional expectation In probability theory, the conditional expectation, conditional expected value, or conditional mean of a random variable is its expected value evaluated with respect to the conditional probability distribution. If the random variable can take on ...

\mathbb mathbf 1_ \mid \mathcal G /math> for every B \in Y, i.e.

: P(X \in B\mid\mathcal G)=\mathbb \left mathbf 1_\mid\mathcal G \right = \kappa(\cdot,B), \qquad P\text\,\, \forall B \in \mathcal G. It is called regular conditional distribution of X given \mathcal G and is not uniquely defined.

Generalizations

Transition kernels generalize Markov kernels in the sense that for all

x \in X

, the map :

B \mapsto \kappa(B,  x)

can be any type of (non negative) measure, not necessarily a probability measure.

External links

Markov kernel
i
nLab

References

* {{citation, first1=Heinz, last1=Bauer, title=Probability Theory, publisher=de Gruyter, year=1996, isbn=3-11-013935-9 :: §36. Kernels and semigroups of kernels

Formal definition

Examples

Simple random walk on the integers

Markov kernel defined by a kernel function and a measure

Measurable functions

Galton–Watson process

Composition of Markov Kernels

Probability Space defined by Probability Distribution and a Markov Kernel

Properties

Semidirect product

Regular conditional distribution

Generalizations

External links

References

See also