Stochastic Chains With Memory Of Variable Length
   HOME

TheInfoList



OR:

Stochastic chains with memory of variable length are a family of stochastic chains of finite order in a finite alphabet, such as, for every time pass, only one finite suffix of the past, called context, is necessary to predict the next symbol. These models were introduced in the information theory literature by
Jorma Rissanen Jorma Johannes Rissanen (October 20, 1932 – May 9, 2020) was an information theorist, known for originating the minimum description length (MDL) principle and practical approaches to arithmetic coding for lossless data compression. His work in ...
in 1983, as a universal tool to
data compression In information theory, data compression, source coding, or bit-rate reduction is the process of encoding information using fewer bits than the original representation. Any particular compression is either lossy or lossless. Lossless compression ...
, but recently have been used to model data in different areas such as
biology Biology is the scientific study of life. It is a natural science with a broad scope but has several unifying themes that tie it together as a single, coherent field. For instance, all organisms are made up of cells that process hereditary i ...
,
linguistics Linguistics is the scientific study of human language. It is called a scientific study because it entails a comprehensive, systematic, objective, and precise analysis of all aspects of language, particularly its nature and structure. Linguis ...
and
music Music is generally defined as the art of arranging sound to create some combination of form, harmony, melody, rhythm or otherwise expressive content. Exact definitions of music vary considerably around the world, though it is an aspect ...
.


Definition

A stochastic chain with memory of variable length is a stochastic chain (X_n)_, taking values in a finite alphabet A, and characterized by a probabilistic context tree (\tau,p), so that *\tau is the group of all contexts. A context X_,\ldots,X_, being l the size of the context, is a finite portion of the past X_,\ldots,X_, which is relevant to predict the next symbol X_; *p is a family of transition probabilities associated with each context.


History

The class of stochastic chains with memory of variable length was introduced by
Jorma Rissanen Jorma Johannes Rissanen (October 20, 1932 – May 9, 2020) was an information theorist, known for originating the minimum description length (MDL) principle and practical approaches to arithmetic coding for lossless data compression. His work in ...
in the article ''A universal system for data compression system''. Such class of stochastic chains was popularized in the statistical and probabilistic community by P. Bühlmann and A. J. Wyner in 1999, in the article ''Variable Length Markov Chains''. Named by Bühlmann and Wyner as “variable length
Markov chain A Markov chain or Markov process is a stochastic model describing a sequence of possible events in which the probability of each event depends only on the state attained in the previous event. Informally, this may be thought of as, "What happe ...
s” (VLMC), these chains are also known as “
variable-order Markov model In the mathematical theory of stochastic processes, variable-order Markov (VOM) models are an important class of models that extend the well known Markov chain models. In contrast to the Markov chain models, where each random variable in a sequence ...
s" (VOM), “probabilistic
suffix tree In computer science, a suffix tree (also called PAT tree or, in an earlier form, position tree) is a compressed trie containing all the suffixes of the given text as their keys and positions in the text as their values. Suffix trees allow particu ...
s” and “context
tree model In historical linguistics, the tree model (also Stammbaum, genetic, or cladistic model) is a model of the evolution of languages analogous to the concept of a family tree, particularly a phylogenetic tree in the biological evolution of species. ...
s”. The name “stochastic chains with memory of variable length” seems to have been introduced by Galves and Löcherbach, in 2008, in the article of the same name.


Examples


Interrupted light source

Consider a
system A system is a group of Interaction, interacting or interrelated elements that act according to a set of rules to form a unified whole. A system, surrounded and influenced by its environment (systems), environment, is described by its boundaries, ...
by a lamp, an observer and a door between both of them. The lamp has two possible states: on, represented by 1, or off, represented by 0. When the lamp is on, the observer may see the light through the door, depending on which state the door is at the time: open, 1, or closed, 0. such states are independent of the original state of the lamp. Let (X_n)_ a
Markov chain A Markov chain or Markov process is a stochastic model describing a sequence of possible events in which the probability of each event depends only on the state attained in the previous event. Informally, this may be thought of as, "What happe ...
that represents the state of the lamp, with values in A= and let p be a probability transition matrix. Also, let (\xi _n)_ be a sequence of
independent random variables Independent or Independents may refer to: Arts, entertainment, and media Artist groups * Independents (artist group), a group of modernist painters based in the New Hope, Pennsylvania, area of the United States during the early 1930s * Independ ...
that represents the door's states, also taking values in A, independent of the chain (X_n)_ and such that : \mathbb(\xi_n = 1) = 1 - \varepsilon where 0 < \epsilon < 1. Define a new sequence (Z_n)_ such that : Z_n = X_n \xi_n for every (Z_n)_. In order to determine the last instant that the observer could see the lamp on, i.e. to identify the least instant k, with k in which Z_k=1. Using a context tree it's possible to represent the past states of the sequence, showing which are relevant to identify the next state. The stochastic chain (Z_n)_ is, then, a chain with memory of variable length, taking values in A and compatible with the probabilistic context tree (\tau,p), where : \tau = \ \cup \.


Inferences in chains with variable length

Given a sample X_,\ldots,X_, one can find the appropriated context tree using the following algorithms.


The context algorithm

In the article ''A Universal Data Compression System'', Rissanen introduced a consistent algorithm to estimate the probabilistic context tree that generates the data. This algorithm's function can be summarized in two steps: # Given the sample produced by a chain with memory of variable length, we start with the maximum tree whose branches are all the candidates to contexts to the sample; # The branches in this tree are then cut until you obtain the smallest tree that's well adapted to the data. Deciding whether or not shortening the context is done through a given gain function, such as the ratio of the log-likelihood. Be X_,\ldots,X_ a sample of a finite probabilistic tree (\tau,p). For any sequence x_^ with j \leq n, it is possible to denote by N_n(x_^) the number of occurrences of the sequence in the sample, i.e., : N_n(x_^) = \sum_^ \mathbf\left\ Rissanen first built a context maximum candidate, given by X_^, where K(n)=C\log and C is an arbitrary positive constant. The intuitive reason for the choice of C\log comes from the impossibility of estimating the probabilities of sequence with lengths greater than \log based in a sample of size n. From there, Rissanen shortens the maximum candidate through successive cutting the branches according to a sequence of tests based in statistical likelihood ratio. In a more formal definition, if bANnxk1b0 define the probability estimator of the transition probability p by : \hat_n(a\mid x_^) = \frac where x_^a=(x_, \ldots, x_,a). If \sum_N_n(x_^b) \,=\,0, define \hat_n(a\mid x_^) \,=\, 1/, A, . To i \geq 1, define : \Lambda_n (x_^ ) \,=\, 2 \, \sum_ \sum_ N_n(y x_^a) \log\left frac \right, where y x_^=(y,x_, \ldots , x_) and : \hat_n(a\mid x_^y)= \frac. Note that \Lambda_n (x_^) is the ratio of the log-likelihood to test the consistency of the sample with the probabilistic context tree (\tau,p) against the alternative that is consistent with (\tau',p'), where \tau and \tau' differ only by a set of sibling knots. The length of the current estimated context is defined by : \hat_n(X_0^)= \max \left\\, where C is any positive constant. At last, by Rissanen, there's the following result. Given X_0,\ldots, X_ of a finite probabilistic context tree (\tau,p), then : P\left( \hat_n(X_0^) \neq \ell(X_0^) \right) \longrightarrow 0, when n \rightarrow \infty.


Bayesian information criterion (BIC)

The estimator of the context tree by BIC with a penalty constant c>0 is defined as : \hat_\mathrm=\underset\


Smallest maximizer criterion (SMC)

The smallest maximizer criterion is calculated by selecting the smallest tree ''τ'' of a set of champion trees ''C'' such that : \lim_ \frac = 0


See also

*
Variable-order Markov model In the mathematical theory of stochastic processes, variable-order Markov (VOM) models are an important class of models that extend the well known Markov chain models. In contrast to the Markov chain models, where each random variable in a sequence ...
*
Markov chain A Markov chain or Markov process is a stochastic model describing a sequence of possible events in which the probability of each event depends only on the state attained in the previous event. Informally, this may be thought of as, "What happe ...
*
Stochastic process In probability theory and related fields, a stochastic () or random process is a mathematical object usually defined as a family of random variables. Stochastic processes are widely used as mathematical models of systems and phenomena that appea ...


References

{{Stochastic processes Stochastic models