Variable-order Markov Models

In the mathematical theory of stochastic processes, variable-order Markov (VOM) models are an important class of models that extend the well-known Markov chain models. In contrast to the Markov chain models, where each random variable in a sequence with a Markov property depends on a fixed number of random variables, in VOM models this number of conditioning random variables may vary based on the specific observed realization. This realization sequence is often called the ''context''; therefore the VOM models are also called ''context trees''. VOM models are nicely rendered by colorized probabilistic suffix trees (PST). The flexibility in the number of conditioning random variables turns out to be of real advantage for many applications, such as statistical analysis, classification and prediction.


Example

Consider for example a sequence of random variables, each of which takes a value from the ternary alphabet {''a'', ''b'', ''c''}. Specifically, consider the string ''aaabcaaabcaaabcaaabc…aaabc'' constructed from infinite concatenations of the sub-string ''aaabc''.

The VOM model of maximal order 2 can approximate the above string using ''only'' the following five conditional probability components: Pr(''a'' | ''aa'') = 0.5, Pr(''b'' | ''aa'') = 0.5, Pr(''c'' | ''b'') = 1.0, Pr(''a'' | ''c'') = 1.0, Pr(''a'' | ''ca'') = 1.0. In this example, Pr(''c'' | ''ab'') = Pr(''c'' | ''b'') = 1.0; therefore, the shorter context ''b'' is sufficient to determine the next character. Similarly, the VOM model of maximal order 3 can generate the string exactly using only five conditional probability components, which are all equal to 1.0.

To construct the Markov chain of order 1 for the next character in that string, one must estimate the following 9 conditional probability components: Pr(''a'' | ''a''), Pr(''a'' | ''b''), …, Pr(''c'' | ''c''). To construct the Markov chain of order 2 for the next character, one must estimate 27 conditional probability components: Pr(''a'' | ''aa''), Pr(''a'' | ''ab''), …, Pr(''c'' | ''cc''). And to construct the Markov chain of order 3 for the next character, one must estimate the following 81 conditional probability components: Pr(''a'' | ''aaa''), Pr(''a'' | ''aab''), …, Pr(''c'' | ''ccc'').

In practical settings there is seldom sufficient data to accurately estimate the exponentially increasing number of conditional probability components as the order of the Markov chain increases. The variable-order Markov model assumes that in realistic settings, there are certain realizations of states (represented by contexts) in which some past states are independent from the future states; accordingly, "a great reduction in the number of model parameters can be achieved."
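The parameter saving described above can be checked empirically by counting which (context, next-symbol) pairs actually occur in the string. The following Python sketch is only an illustration of that count over an assumed finite prefix of the periodic string, not a VOM construction algorithm; the helper name conditional_counts and the prefix length are arbitrary choices.

 from collections import Counter, defaultdict
 
 def conditional_counts(text, order):
     """Count (context, next symbol) pairs for contexts of exactly `order` symbols."""
     counts = defaultdict(Counter)
     for i in range(order, len(text)):
         counts[text[i - order:i]][text[i]] += 1
     return counts
 
 # A long finite prefix of the periodic string aaabcaaabc... from the example.
 s = "aaabc" * 1000
 
 for order in (1, 2, 3):
     counts = conditional_counts(s, order)
     observed = sum(len(c) for c in counts.values())
     full = 3 ** (order + 1)  # a full order-d chain over {a, b, c} has 3**d contexts x 3 symbols
     print(f"order {order}: {observed} observed components vs {full} in the full chain")

For orders 1, 2 and 3 this prints 4, 5 and 5 observed components, against the 9, 27 and 81 parameters of the corresponding full-order chains, matching the counts discussed above.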


Definition

Let A be a state space (finite alphabet) of size |A|. Consider a sequence with the Markov property x_1^{n} = x_1 x_2 \dots x_n of realizations of random variables, where x_i \in A is the state (symbol) at position i (1 \le i \le n), and the concatenation of states x_i and x_{i+1} is denoted by x_i x_{i+1}. Given a training set of observed states, x_1^{n}, the construction algorithm of the VOM models learns a model that provides a probability assignment for each state in the sequence given its past (previously observed symbols) or future states. Specifically, the learner generates a conditional probability distribution P(x_i \mid s) for a symbol x_i \in A given a context s \in A^*, where the * sign represents a sequence of states of any length, including the empty context. VOM models attempt to estimate conditional distributions of the form P(x_i \mid s), where the context length |s| \le D varies depending on the available statistics. In contrast, conventional Markov models attempt to estimate these conditional distributions by assuming a fixed context length |s| = D and can hence be considered special cases of the VOM models. Effectively, for a given training sequence, the VOM models are found to obtain better model parameterization than the fixed-order Markov models, which leads to a better variance-bias tradeoff of the learned models.
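As a concrete reading of this definition, the sketch below implements a deliberately simplified variable-order predictor in Python: it stores counts for every context of length 0 to D observed in training and, at prediction time, backs off to the longest suffix of the query context for which counts exist. The class name SimpleVOM, the back-off rule and the absence of pruning or smoothing are illustrative assumptions rather than any particular published construction algorithm.

 from collections import Counter, defaultdict
 
 class SimpleVOM:
     """Simplified variable-order predictor over contexts of length 0..D."""
 
     def __init__(self, max_order):
         self.D = max_order
         self.counts = defaultdict(Counter)  # context string -> Counter of next symbols
 
     def fit(self, sequence):
         for i in range(len(sequence)):
             for k in range(min(i, self.D) + 1):      # all context lengths 0..D
                 self.counts[sequence[i - k:i]][sequence[i]] += 1
         return self
 
     def prob(self, symbol, context):
         # Back off to the longest suffix of `context` (at most D symbols) seen in training.
         suffix = context[-self.D:] if self.D else ""
         while suffix and suffix not in self.counts:
             suffix = suffix[1:]
         total = sum(self.counts[suffix].values())
         return self.counts[suffix][symbol] / total if total else 0.0
 
 model = SimpleVOM(max_order=2).fit("aaabc" * 200)
 print(model.prob("c", "ab"))   # 1.0: this context fully determines the next symbol
 print(model.prob("a", "aa"))   # 0.5

A practical implementation, such as a probabilistic suffix tree, would additionally prune contexts whose conditional distribution is already captured by a shorter suffix (as with Pr(''c'' | ''ab'') = Pr(''c'' | ''b'') in the example above) and smooth the estimated probabilities, which this sketch omits.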


Application areas

Various efficient algorithms have been devised for estimating the parameters of the VOM model. VOM models have been successfully applied to areas such as machine learning, information theory and bioinformatics, including specific applications such as coding and data compression, document compression, classification and identification of DNA and protein sequences, statistical process control, spam filtering, haplotyping, speech recognition, sequence analysis in social sciences, and others.


See also

* Stochastic chains with memory of variable length
* Examples of Markov chains
* Variable-order Bayesian network
* Markov process
* Markov chain Monte Carlo
* Semi-Markov process
* Artificial intelligence

