
The Viterbi algorithm is a dynamic programming algorithm for obtaining the maximum a posteriori probability estimate of the most likely sequence of hidden states, called the Viterbi path, that results in a sequence of observed events, especially in the context of Markov information sources and hidden Markov models (HMM). The algorithm has found universal application in decoding the convolutional codes used in both CDMA and GSM digital cellular, dial-up modems, satellite, deep-space communications, and 802.11 wireless LANs. It is now also commonly used in speech recognition, speech synthesis, diarization, keyword spotting, computational linguistics, and bioinformatics. For example, in speech-to-text (speech recognition), the acoustic signal is treated as the observed sequence of events, and a string of text is considered to be the "hidden cause" of the acoustic signal. The Viterbi algorithm finds the most likely string of text given the acoustic signal.


History

The Viterbi algorithm is named after Andrew Viterbi, who proposed it in 1967 as a decoding algorithm for convolutional codes over noisy digital communication links. It has, however, a history of multiple invention, with at least seven independent discoveries, including those by Viterbi, Needleman and Wunsch, and Wagner and Fischer. It was introduced to natural language processing as a method of part-of-speech tagging as early as 1987.

''Viterbi path'' and ''Viterbi algorithm'' have become standard terms for the application of dynamic programming algorithms to maximization problems involving probabilities. For example, in statistical parsing a dynamic programming algorithm can be used to discover the single most likely context-free derivation (parse) of a string, which is commonly called the "Viterbi parse". Another application is in target tracking, where the track is computed that assigns a maximum likelihood to a sequence of observations.


Extensions

A generalization of the Viterbi algorithm, termed the ''max-sum algorithm'' (or ''max-product algorithm''), can be used to find the most likely assignment of all or some subset of latent variables in a large number of graphical models, e.g. Bayesian networks, Markov random fields and conditional random fields. The latent variables need, in general, to be connected in a way somewhat similar to a hidden Markov model (HMM), with a limited number of connections between variables and some type of linear structure among the variables. The general algorithm involves ''message passing'' and is substantially similar to the belief propagation algorithm (which is the generalization of the forward-backward algorithm).

With the algorithm called iterative Viterbi decoding, one can find the subsequence of an observation that matches best (on average) to a given hidden Markov model. This algorithm was proposed by Qi Wang et al. to deal with turbo codes. Iterative Viterbi decoding works by iteratively invoking a modified Viterbi algorithm, reestimating the score for a filler until convergence.

An alternative algorithm, the Lazy Viterbi algorithm, has been proposed. For many applications of practical interest, under reasonable noise conditions, the lazy decoder (using the Lazy Viterbi algorithm) is much faster than the original Viterbi decoder (using the Viterbi algorithm). While the original Viterbi algorithm calculates every node in the trellis of possible outcomes, the Lazy Viterbi algorithm maintains a prioritized list of nodes to evaluate in order, and the number of calculations required is typically fewer (and never more) than the ordinary Viterbi algorithm for the same result. However, it is not so easy to parallelize in hardware.
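The idea of evaluating trellis nodes in priority order can be illustrated with an ordinary Dijkstra-style best-first search over the trellis. The sketch below is not the published Lazy Viterbi decoder, which uses its own metric and queue design; it only assumes that each branch is given the cost -log(probability), so that the most likely path becomes the cheapest path. The function name `best_first_viterbi` and the two-state toy model with its numbers are hypothetical, chosen for illustration.

```python
import heapq
import math


def best_first_viterbi(obs, states, start_p, trans_p, emit_p):
    """Return (most likely state path, its probability) by best-first search.

    Trellis nodes are (time, state) pairs. Each branch costs -log(p), so the
    cheapest path to the last time step is the most probable one.
    """
    def cost(p):
        return -math.log(p)  # p lies in (0, 1], so the cost is non-negative

    last = len(obs) - 1
    heap = []  # entries: (accumulated cost, time, state, path so far)
    for s in states:
        p = start_p[s] * emit_p[s][obs[0]]
        if p > 0:
            heapq.heappush(heap, (cost(p), 0, s, (s,)))

    settled = set()  # nodes whose best cost is already final
    while heap:
        c, t, s, path = heapq.heappop(heap)
        if (t, s) in settled:
            continue
        settled.add((t, s))
        if t == last:
            # The first final-time node popped is the cheapest overall,
            # so any nodes still in the queue never need to be expanded.
            return list(path), math.exp(-c)
        for nxt in states:
            p = trans_p[s][nxt] * emit_p[nxt][obs[t + 1]]
            if p > 0 and (t + 1, nxt) not in settled:
                heapq.heappush(heap, (c + cost(p), t + 1, nxt, path + (nxt,)))
    return [], 0.0  # every path has probability zero


# Hypothetical two-state model, for illustration only.
obs = ("normal", "cold", "dizzy")
states = ("Healthy", "Fever")
start_p = {"Healthy": 0.6, "Fever": 0.4}
trans_p = {
    "Healthy": {"Healthy": 0.7, "Fever": 0.3},
    "Fever": {"Healthy": 0.4, "Fever": 0.6},
}
emit_p = {
    "Healthy": {"normal": 0.5, "cold": 0.4, "dizzy": 0.1},
    "Fever": {"normal": 0.1, "cold": 0.3, "dizzy": 0.6},
}

path, prob = best_first_viterbi(obs, states, start_p, trans_p, emit_p)
print(path, prob)
```

Storing the whole path in each queue entry keeps the sketch short; a practical decoder would more likely keep back pointers per node instead.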


Pseudocode

This algorithm generates a path $X=(x_1,x_2,\ldots,x_T)$, which is a sequence of states $x_n \in S=\{s_1,s_2,\dots,s_K\}$ that generate the observations $Y=(y_1,y_2,\ldots,y_T)$ with $y_n \in O=\{o_1,o_2,\dots,o_N\}$, where $N$ is the number of possible observations in the observation space $O$.

Two 2-dimensional tables of size $K \times T$ are constructed:
* Each element $T_1[i,j]$ of $T_1$ stores the probability of the most likely path so far $\hat{X}=(\hat{x}_1,\hat{x}_2,\ldots,\hat{x}_j)$ with $\hat{x}_j=s_i$ that generates $Y=(y_1,y_2,\ldots,y_j)$.
* Each element $T_2[i,j]$ of $T_2$ stores $\hat{x}_{j-1}$ of the most likely path so far $\hat{X}=(\hat{x}_1,\hat{x}_2,\ldots,\hat{x}_{j-1},\hat{x}_j = s_i)$, for all $j$ with $2\leq j \leq T$.

The table entries $T_1[i,j]$ and $T_2[i,j]$ are filled by increasing order of $K\cdot j+i$:

$$T_1[i,j]=\max_{k}\left(T_1[k,j-1]\cdot A_{ki}\cdot B_{iy_j}\right),$$

$$T_2[i,j]=\operatorname{argmax}_{k}\left(T_1[k,j-1]\cdot A_{ki}\cdot B_{iy_j}\right),$$

with $A_{ki}$ and $B_{iy_j}$ as defined below. Note that $B_{iy_j}$ does not need to appear in the latter expression, as it is non-negative and independent of $k$ and thus does not affect the argmax.

Input:
* the observation space $O=\{o_1,o_2,\dots,o_N\}$,
* the state space $S=\{s_1,s_2,\dots,s_K\}$,
* an array of initial probabilities $\Pi = (\pi_1,\pi_2,\dots,\pi_K)$ such that $\pi_i$ stores the probability that $x_1 = s_i$,
* a sequence of observations $Y=(y_1,y_2,\ldots,y_T)$ such that $y_t=o_i$ if the observation at time $t$ is $o_i$,
* a transition matrix $A$ of size $K\times K$ such that $A_{ij}$ stores the transition probability of transiting from state $s_i$ to state $s_j$,
* an emission matrix $B$ of size $K\times N$ such that $B_{ij}$ stores the probability of observing $o_j$ from state $s_i$.

Output:
* the most likely hidden state sequence $X=(x_1,x_2,\ldots,x_T)$.

    function VITERBI(O, S, Π, Y, A, B): X
        for each state i = 1, 2, …, K do
            T_1[i, 1] ← π_i · B_{i y_1}
            T_2[i, 1] ← 0
        end for
        for each observation j = 2, 3, …, T do
            for each state i = 1, 2, …, K do
                T_1[i, j] ← max_k (T_1[k, j-1] · A_{ki} · B_{i y_j})
                T_2[i, j] ← argmax_k (T_1[k, j-1] · A_{ki} · B_{i y_j})
            end for
        end for
        z_T ← argmax_k (T_1[k, T])
        x_T ← s_{z_T}
        for j = T, T-1, …, 2 do
            z_{j-1} ← T_2[z_j, j]
            x_{j-1} ← s_{z_{j-1}}
        end for
        return X
    end function

Restated in a succinct near-Python:

    function viterbi(O, S, Π, Tm, Em): best_path
        # Tm: transition matrix
        # Em: emission matrix
        trellis ← matrix(length(S), length(O))   # To hold probability of each state given each observation
        pointers ← matrix(length(S), length(O))  # To hold backpointer to best prior state
        for s in range(length(S)):               # Determine each hidden state's probability at time 0…
            trellis[s, 0] ← Π[s] · Em[s, O[0]]
        for o in range(1, length(O)):            # …and after, tracking each state's most likely prior state, k
            for s in range(length(S)):
                k ← argmax(k in trellis[k, o-1] · Tm[k, s] · Em[s, O[o]])
                trellis[s, o] ← trellis[k, o-1] · Tm[k, s] · Em[s, O[o]]
                pointers[s, o] ← k
        best_path ← list()
        k ← argmax(k in trellis[k, length(O)-1])  # Find k of best final state
        for o in range(length(O)-1, -1, -1):      # Backtrack from last observation
            best_path.insert(0, S[k])             # Insert previous state on most likely path
            k ← pointers[k, o]                    # Use backpointer to find best previous state
        return best_path

Explanation:

Suppose we are given a hidden Markov model (HMM) with state space $S$, initial probabilities $\pi_i$ of being in state $i$ and transition probabilities $a_{i,j}$ of transitioning from state $i$ to state $j$. Say we observe outputs $y_1,\dots,y_T$. The most likely state sequence $x_1,\dots,x_T$ that produces the observations is given by the recurrence relations (Xing E, slide 11)

$$\begin{aligned} V_{1,k} &= \mathrm{P}\big(y_1 \mid k\big)\cdot\pi_k, \\ V_{t,k} &= \max_{x \in S}\left(\mathrm{P}\big(y_t \mid k\big)\cdot a_{x,k}\cdot V_{t-1,x}\right). \end{aligned}$$

Here $V_{t,k}$ is the probability of the most probable state sequence $\mathrm{P}\big(x_1,\dots,x_t,y_1,\dots,y_t\big)$ responsible for the first $t$ observations that has $k$ as its final state. The Viterbi path can be retrieved by saving back pointers that remember which state $x$ was used in the second equation. Let $\mathrm{Ptr}(k,t)$ be the function that returns the value of $x$ used to compute $V_{t,k}$ if $t > 1$, or $k$ if $t=1$. Then

$$\begin{aligned} x_T &= \arg\max_{x \in S}\,(V_{T,x}), \\ x_{t-1} &= \mathrm{Ptr}(x_t,t). \end{aligned}$$

Here we are using the standard definition of arg max.

The complexity of this implementation is $O(T\times\left|S\right|^2)$. A better estimate exists if the maximum in the internal loop is instead found by iterating only over states that directly link to the current state (i.e. there is an edge from $k$ to $j$). Then using amortized analysis one can show that the complexity is $O(T\times(\left|S\right| + \left|E\right|))$, where $E$ is the number of edges in the graph.
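The edge-based bound can be made concrete with a small sketch that stores, for every state, only the states with an edge into it. This is one possible arrangement, assuming the model is given as nested dictionaries in which missing transitions are simply absent; the function name `viterbi_sparse` and the three-state left-to-right model are hypothetical and serve only as an illustration.

```python
def viterbi_sparse(obs, states, start_p, trans_p, emit_p):
    """Return (most likely state path, its joint probability).

    trans_p[k] lists only the edges k -> j that exist, so the inner loop
    visits each edge once per time step instead of all |S|^2 state pairs.
    Assumes obs is non-empty and at least one path has non-zero probability.
    """
    # preds[j]: every state k with an edge k -> j in the transition graph.
    preds = {j: [k for k in states if j in trans_p[k]] for j in states}

    # V[t][k] is the probability of the best path ending in k at step t;
    # ptr[t][k] is the back pointer for t > 0 (ptr[0] is never followed).
    V = [{k: start_p[k] * emit_p[k][obs[0]] for k in states}]
    ptr = [{k: None for k in states}]

    for t in range(1, len(obs)):
        V.append({})
        ptr.append({})
        for j in states:
            best_prev, best = None, 0.0
            for k in preds[j]:
                score = V[t - 1][k] * trans_p[k][j]
                if best_prev is None or score > best:
                    best_prev, best = k, score
            # The emission factor does not depend on k, so it is applied
            # once, after the maximum over predecessors has been found.
            V[t][j] = best * emit_p[j][obs[t]]
            ptr[t][j] = best_prev

    # Backtrack from the best final state.
    last = max(states, key=lambda k: V[-1][k])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.insert(0, ptr[t][path[0]])
    return path, V[-1][last]


# Hypothetical left-to-right model with 5 edges instead of 3 * 3 = 9.
states = ("A", "B", "C")
start_p = {"A": 1.0, "B": 0.0, "C": 0.0}
trans_p = {
    "A": {"A": 0.5, "B": 0.5},
    "B": {"B": 0.5, "C": 0.5},
    "C": {"C": 1.0},
}
emit_p = {
    "A": {"x": 0.9, "y": 0.1},
    "B": {"x": 0.2, "y": 0.8},
    "C": {"x": 0.5, "y": 0.5},
}

path, prob = viterbi_sparse(("x", "y", "y"), states, start_p, trans_p, emit_p)
print(path, prob)  # path is ['A', 'B', 'B'], probability about 0.144
```

Building the predecessor lists once up front costs O(|S|^2) here because of the dictionary layout; a model that already stores incoming edges would avoid that step.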


Example

Consider a village where all villagers are either healthy or have a fever, and only the village doctor can determine whether each has a fever. The doctor diagnoses fever by asking patients how they feel. The villagers may only answer that they feel normal, dizzy, or cold.

The doctor believes that the health condition of the patients operates as a discrete Markov chain. There are two states, "Healthy" and "Fever", but the doctor cannot observe them directly; they are ''hidden'' from the doctor. On each day, there is a certain chance that a patient will tell the doctor "I feel normal", "I feel cold", or "I feel dizzy", depending on the patient's health condition.

The ''observations'' (normal, cold, dizzy) along with a ''hidden'' state (healthy, fever) form a hidden Markov model (HMM), and can be represented as follows in the Python programming language:

    obs = ("normal", "cold", "dizzy")
    states = ("Healthy", "Fever")
    start_p = {"Healthy": 0.6, "Fever": 0.4}
    trans_p = {
        "Healthy": {"Healthy": 0.7, "Fever": 0.3},
        "Fever": {"Healthy": 0.4, "Fever": 0.6},
    }
    emit_p = {
        "Healthy": {"normal": 0.5, "cold": 0.4, "dizzy": 0.1},
        "Fever": {"normal": 0.1, "cold": 0.3, "dizzy": 0.6},
    }

In this piece of code, start_p represents the doctor's belief about which state the HMM is in when the patient first visits (all the doctor knows is that the patient tends to be healthy). The particular probability distribution used here is not the equilibrium one, which is (given the transition probabilities) approximately {'Healthy': 0.57, 'Fever': 0.43}. The trans_p represents the change of the health condition in the underlying Markov chain. In this example, a patient who is healthy today has only a 30% chance of having a fever tomorrow. The emit_p represents how likely each possible observation (normal, cold, or dizzy) is, given the underlying condition (healthy or fever). A patient who is healthy has a 50% chance of feeling normal; one who has a fever has a 60% chance of feeling dizzy.

A patient visits three days in a row, and the doctor discovers that the patient feels normal on the first day, cold on the second day, and dizzy on the third day. The doctor has a question: what is the most likely sequence of health conditions of the patient that would explain these observations? This is answered by the Viterbi algorithm.

    def viterbi(obs, states, start_p, trans_p, emit_p):
        # V[t][st] holds the best path probability ending in st at time t
        # and the previous state on that path.
        V = [{}]
        for st in states:
            V[0][st] = {"prob": start_p[st] * emit_p[st][obs[0]], "prev": None}

        # Run Viterbi when t > 0
        for t in range(1, len(obs)):
            V.append({})
            for st in states:
                max_tr_prob = V[t - 1][states[0]]["prob"] * trans_p[states[0]][st] * emit_p[st][obs[t]]
                prev_st_selected = states[0]
                for prev_st in states[1:]:
                    tr_prob = V[t - 1][prev_st]["prob"] * trans_p[prev_st][st] * emit_p[st][obs[t]]
                    if tr_prob > max_tr_prob:
                        max_tr_prob = tr_prob
                        prev_st_selected = prev_st

                max_prob = max_tr_prob
                V[t][st] = {"prob": max_prob, "prev": prev_st_selected}

        for line in dptable(V):
            print(line)

        opt = []
        max_prob = 0.0
        best_st = None
        # Get most probable state and its backtrack
        for st, data in V[-1].items():
            if data["prob"] > max_prob:
                max_prob = data["prob"]
                best_st = st
        opt.append(best_st)
        previous = best_st

        # Follow the backtrack till the first observation
        for t in range(len(V) - 2, -1, -1):
            opt.insert(0, V[t + 1][previous]["prev"])
            previous = V[t + 1][previous]["prev"]

        print("The steps of states are " + " ".join(opt) + " with highest probability of %s" % max_prob)

    def dptable(V):
        # Print a table of steps from dictionary
        yield " " * 5 + " ".join(("%3d" % i) for i in range(len(V)))
        for state in V[0]:
            yield "%.7s: " % state + " ".join("%.7s" % ("%lf" % v[state]["prob"]) for v in V)

The function viterbi takes the following arguments: obs is the sequence of observations, e.g. ['normal', 'cold', 'dizzy']; states is the set of hidden states; start_p is the start probability; trans_p are the transition probabilities; and emit_p are the emission probabilities. For simplicity of code, we assume that the observation sequence obs is non-empty and that trans_p[i][j] and emit_p[i][j] are defined for all states i, j.

In the running example, the forward/Viterbi algorithm is used as follows:

    viterbi(obs, states, start_p, trans_p, emit_p)

The output of the script is

    $ python viterbi_example.py
           0   1   2
    Healthy: 0.30000 0.08400 0.00588
    Fever: 0.04000 0.02700 0.01512
    The steps of states are Healthy Healthy Fever with highest probability of 0.01512

This reveals that the observations ['normal', 'cold', 'dizzy'] were most likely generated by states ['Healthy', 'Healthy', 'Fever']. In other words, given the observations, the patient was most likely to have been healthy on the first day and also on the second day (despite feeling cold that day), and only to have contracted a fever on the third day.

The operation of Viterbi's algorithm can be visualized by means of a trellis diagram. The Viterbi path is essentially the shortest path through this trellis.
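The printed table can be checked by hand with the recurrence for $V_{t,k}$. The arithmetic below is a sketch that assumes start probabilities of 0.6 (Healthy) and 0.4 (Fever), transition probabilities Healthy→Healthy 0.7, Healthy→Fever 0.3, Fever→Healthy 0.4, Fever→Fever 0.6, and emission probabilities of 0.5, 0.4, 0.1 (Healthy) and 0.1, 0.3, 0.6 (Fever) for normal, cold, dizzy. These values agree with the percentages quoted in the text and reproduce the table. Time is indexed from 1 here, whereas the table columns are numbered from 0.

```latex
\begin{align*}
V_{1,\text{Healthy}} &= 0.6 \cdot 0.5 = 0.30, \\
V_{1,\text{Fever}}   &= 0.4 \cdot 0.1 = 0.04, \\
V_{2,\text{Healthy}} &= 0.4 \cdot \max(0.30 \cdot 0.7,\; 0.04 \cdot 0.4) = 0.4 \cdot 0.21 = 0.084, \\
V_{2,\text{Fever}}   &= 0.3 \cdot \max(0.30 \cdot 0.3,\; 0.04 \cdot 0.6) = 0.3 \cdot 0.09 = 0.027, \\
V_{3,\text{Healthy}} &= 0.1 \cdot \max(0.084 \cdot 0.7,\; 0.027 \cdot 0.4) = 0.1 \cdot 0.0588 = 0.00588, \\
V_{3,\text{Fever}}   &= 0.6 \cdot \max(0.084 \cdot 0.3,\; 0.027 \cdot 0.6) = 0.6 \cdot 0.0252 = 0.01512.
\end{align*}
```

In every maximum the first argument, the transition out of Healthy, is the larger one, so each back pointer refers to Healthy. The largest final value is $V_{3,\text{Fever}} = 0.01512$, and backtracking from Fever gives Healthy, Healthy, Fever.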


Soft output Viterbi algorithm

The soft output Viterbi algorithm (SOVA) is a variant of the classical Viterbi algorithm. SOVA differs from the classical Viterbi algorithm in that it uses a modified path metric which takes into account the ''a priori probabilities'' of the input symbols, and produces a ''soft'' output indicating the ''reliability'' of the decision.

The first step in the SOVA is the selection of the survivor path, passing through one unique node at each time instant, ''t''. Since each node has 2 branches converging at it (with one branch being chosen to form the ''survivor path'', and the other being discarded), the difference in the branch metrics (or ''cost'') between the chosen and discarded branches indicates the ''amount of error'' in the choice. This ''cost'' is accumulated over the entire sliding window (usually at least five constraint lengths long), to indicate the ''soft output'' measure of reliability of the ''hard bit decision'' of the Viterbi algorithm.


See also

* Expectation–maximization algorithm
* Baum–Welch algorithm
* Forward-backward algorithm
* Forward algorithm
* Error-correcting code
* Viterbi decoder
* Hidden Markov model
* Part-of-speech tagging
* A* search algorithm




External links

* Implementations in Java, F#, Clojure, C# on Wikibooks
* Tutorial on convolutional coding with Viterbi decoding, by Chip Fleming
* A tutorial for a Hidden Markov Model toolkit (implemented in C) that contains a description of the Viterbi algorithm
* Viterbi algorithm, by Dr. Andrew J. Viterbi (scholarpedia.org)


Implementations


* Mathematica has an implementation as part of its support for stochastic processes
* The Susa signal processing framework provides a C++ implementation for forward error correction codes and channel equalization
* C++
* C#
* Java
* Java 8
* Julia (HMMBase.jl)
* Perl
* Prolog
* Go
* SFIHMM includes code for Viterbi decoding