Variable-order Markov Models

In the mathematical theory of stochastic processes, variable-order Markov (VOM) models are an important class of models that extend the well-known Markov chain models. In contrast to the Markov chain models, where each random variable in a sequence with a Markov property depends on a fixed number of random variables, in VOM models this number of conditioning random variables may vary based on the specific observed realization. This realization sequence is often called the ''context''; therefore the VOM models are also called ''context trees''. VOM models are nicely rendered by colorized probabilistic suffix trees (PST). The flexibility in the number of conditioning random variables turns out to be of real advantage for many applications, such as statistical analysis, classification and prediction.


Example

Consider for example a sequence of random variables, each of which takes a value from the ternary alphabet {''a'', ''b'', ''c''}. Specifically, consider the string ''aaabcaaabcaaabcaaabc…aaabc'' constructed from infinite concatenations of the sub-string ''aaabc''.

The VOM model of maximal order 2 can approximate the above string using ''only'' the following five conditional probability components: Pr(''a'' | ''aa'') = 0.5, Pr(''b'' | ''aa'') = 0.5, Pr(''c'' | ''b'') = 1.0, Pr(''a'' | ''c'') = 1.0, Pr(''a'' | ''ca'') = 1.0. In this example, Pr(''c'' | ''ab'') = Pr(''c'' | ''b'') = 1.0; therefore, the shorter context ''b'' is sufficient to determine the next character. Similarly, the VOM model of maximal order 3 can generate the string exactly using only five conditional probability components, which are all equal to 1.0.

To construct the Markov chain of order 1 for the next character in that string, one must estimate the following 9 conditional probability components: Pr(''a'' | ''a''), Pr(''a'' | ''b''), …, Pr(''c'' | ''c''). To construct the Markov chain of order 2 for the next character, one must estimate 27 conditional probability components: Pr(''a'' | ''aa''), Pr(''a'' | ''ab''), …, Pr(''c'' | ''cc''). And to construct the Markov chain of order 3 for the next character, one must estimate the following 81 conditional probability components: Pr(''a'' | ''aaa''), Pr(''a'' | ''aab''), …, Pr(''c'' | ''ccc'').

In practical settings there is seldom sufficient data to accurately estimate the exponentially increasing number of conditional probability components as the order of the Markov chain increases. The variable-order Markov model assumes that in realistic settings, there are certain realizations of states (represented by contexts) in which some past states are independent from the future states; accordingly, "a great reduction in the number of model parameters can be achieved."
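The parameter saving described above can be checked empirically by counting which (context, next-symbol) pairs actually occur in the string. The following Python sketch is only an illustration of that count over an assumed finite prefix of the periodic string, not a VOM construction algorithm; the helper name conditional_counts and the prefix length are arbitrary choices.

 from collections import Counter, defaultdict
 
 def conditional_counts(text, order):
     """Count (context, next symbol) pairs for contexts of exactly `order` symbols."""
     counts = defaultdict(Counter)
     for i in range(order, len(text)):
         counts[text[i - order:i]][text[i]] += 1
     return counts
 
 # A long finite prefix of the periodic string aaabcaaabc... from the example.
 s = "aaabc" * 1000
 
 for order in (1, 2, 3):
     counts = conditional_counts(s, order)
     observed = sum(len(c) for c in counts.values())
     full = 3 ** (order + 1)  # a full order-d chain over {a, b, c} has 3**d contexts x 3 symbols
     print(f"order {order}: {observed} observed components vs {full} in the full chain")

For orders 1, 2 and 3 this prints 4, 5 and 5 observed components, against the 9, 27 and 81 parameters of the corresponding full-order chains, matching the counts discussed above.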


Definition

Let A be a state space (finite alphabet) of size |A|. Consider a sequence with the Markov property x_1^{n} = x_1 x_2 \dots x_n of realizations of random variables, where x_i \in A is the state (symbol) at position i (1 \le i \le n), and the concatenation of states x_i and x_{i+1} is denoted by x_i x_{i+1}. Given a training set of observed states, x_1^{n}, the construction algorithm of the VOM models learns a model that provides a probability assignment for each state in the sequence given its past (previously observed symbols) or future states. Specifically, the learner generates a conditional probability distribution P(x_i \mid s) for a symbol x_i \in A given a context s \in A^*, where the * sign represents a sequence of states of any length, including the empty context. VOM models attempt to estimate conditional distributions of the form P(x_i \mid s), where the context length |s| \le D varies depending on the available statistics. In contrast, conventional Markov models attempt to estimate these conditional distributions by assuming a fixed context length |s| = D and can hence be considered special cases of the VOM models. Effectively, for a given training sequence, the VOM models are found to obtain better model parameterization than the fixed-order Markov models, which leads to a better variance-bias tradeoff of the learned models.
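As a concrete reading of this definition, the sketch below implements a deliberately simplified variable-order predictor in Python: it stores counts for every context of length 0 to D observed in training and, at prediction time, backs off to the longest suffix of the query context for which counts exist. The class name SimpleVOM, the back-off rule and the absence of pruning or smoothing are illustrative assumptions rather than any particular published construction algorithm.

 from collections import Counter, defaultdict
 
 class SimpleVOM:
     """Simplified variable-order predictor over contexts of length 0..D."""
 
     def __init__(self, max_order):
         self.D = max_order
         self.counts = defaultdict(Counter)  # context string -> Counter of next symbols
 
     def fit(self, sequence):
         for i in range(len(sequence)):
             for k in range(min(i, self.D) + 1):      # all context lengths 0..D
                 self.counts[sequence[i - k:i]][sequence[i]] += 1
         return self
 
     def prob(self, symbol, context):
         # Back off to the longest suffix of `context` (at most D symbols) seen in training.
         suffix = context[-self.D:] if self.D else ""
         while suffix and suffix not in self.counts:
             suffix = suffix[1:]
         total = sum(self.counts[suffix].values())
         return self.counts[suffix][symbol] / total if total else 0.0
 
 model = SimpleVOM(max_order=2).fit("aaabc" * 200)
 print(model.prob("c", "ab"))   # 1.0: this context fully determines the next symbol
 print(model.prob("a", "aa"))   # 0.5

A practical implementation, such as a probabilistic suffix tree, would additionally prune contexts whose conditional distribution is already captured by a shorter suffix (as with Pr(''c'' | ''ab'') = Pr(''c'' | ''b'') in the example above) and smooth the estimated probabilities, which this sketch omits.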


Application areas

Various efficient algorithms have been devised for estimating the parameters of the VOM model. VOM models have been successfully applied to areas such as machine learning, information theory and bioinformatics, including specific applications such as coding and data compression, document compression, classification and identification of DNA and protein sequences, statistical process control, spam filtering, haplotyping, speech recognition, sequence analysis in social sciences, and others.


See also

* Stochastic chains with memory of variable length
* Examples of Markov chains
* Variable-order Bayesian network
* Markov process
* Markov chain Monte Carlo
* Semi-Markov process
* Artificial intelligence

