The Viterbi algorithm is a
dynamic programming
Dynamic programming is both a mathematical optimization method and a computer programming method. The method was developed by Richard Bellman in the 1950s and has found applications in numerous fields, from aerospace engineering to economics. ...
algorithm
In mathematics and computer science, an algorithm () is a finite sequence of rigorous instructions, typically used to solve a class of specific problems or to perform a computation. Algorithms are used as specifications for performing ...
for obtaining the
maximum a posteriori probability estimate of the most
likely sequence of hidden states—called the Viterbi path—that results in a sequence of observed events, especially in the context of
Markov information sources and
hidden Markov model
A hidden Markov model (HMM) is a statistical Markov model in which the system being modeled is assumed to be a Markov process — call it X — with unobservable ("''hidden''") states. As part of the definition, HMM requires that there be an o ...
s (HMM).
The algorithm has found universal application in decoding the
convolutional code
In telecommunication, a convolutional code is a type of error-correcting code that generates parity symbols via the sliding application of a boolean polynomial function to a data stream. The sliding application represents the 'convolution' of t ...
s used in both
CDMA
Code-division multiple access (CDMA) is a channel access method used by various radio communication technologies. CDMA is an example of multiple access, where several transmitters can send information simultaneously over a single communicatio ...
and
GSM
The Global System for Mobile Communications (GSM) is a standard developed by the European Telecommunications Standards Institute (ETSI) to describe the protocols for second-generation ( 2G) digital cellular networks used by mobile devices such ...
digital cellular,
dial-up modems, satellite, deep-space communications, and
802.11
IEEE 802.11 is part of the IEEE 802 set of local area network (LAN) technical standards, and specifies the set of media access control (MAC) and physical layer (PHY) protocols for implementing wireless local area network (WLAN) computer com ...
wireless LANs. It is now also commonly used in
speech recognition
Speech recognition is an interdisciplinary subfield of computer science and computational linguistics that develops methodologies and technologies that enable the recognition and translation of spoken language into text by computers with the ...
,
speech synthesis,
diarization,
keyword spotting,
computational linguistics, and
bioinformatics. For example, in
speech-to-text
Speech recognition is an interdisciplinary subfield of computer science and computational linguistics that develops methodologies and technologies that enable the recognition and translation of spoken language into text by computers with the mai ...
(speech recognition), the acoustic signal is treated as the observed sequence of events, and a string of text is considered to be the "hidden cause" of the acoustic signal. The Viterbi algorithm finds the most likely string of text given the acoustic signal.
History
The Viterbi algorithm is named after
Andrew Viterbi
Andrew James Viterbi (born Andrea Giacomo Viterbi, March 9, 1935) is an American electrical engineer and businessman who co-founded Qualcomm Inc. and invented the Viterbi algorithm. He is the Presidential Chair Professor of Electrical Engineeri ...
, who proposed it in 1967 as a decoding algorithm for
convolutional codes
In telecommunication, a convolutional code is a type of error-correcting code that generates parity symbols via the sliding application of a boolean polynomial function to a data stream. The sliding application represents the 'convolution' of t ...
over noisy digital communication links. It has, however, a history of
multiple invention
Multiple may refer to:
Economics
*Multiple finance, a method used to analyze stock prices
*Multiples of the price-to-earnings ratio
*Chain stores, are also referred to as 'Multiples'
* Box office multiple, the ratio of a film's total gross to th ...
, with at least seven independent discoveries, including those by Viterbi,
Needleman and Wunsch, and
Wagner and Fischer.
It was introduced to
Natural Language Processing as a method of
part-of-speech tagging
In corpus linguistics, part-of-speech tagging (POS tagging or PoS tagging or POST), also called grammatical tagging is the process of marking up a word in a text (corpus) as corresponding to a particular part of speech, based on both its definitio ...
as early as 1987.
''Viterbi path'' and ''Viterbi algorithm'' have become standard terms for the application of dynamic programming algorithms to maximization problems involving probabilities.
For example, in
statistical parsing Statistical parsing is a group of parsing methods within natural language processing. The methods have in common that they associate grammar rules with a probability. Grammar rules are traditionally viewed in computational linguistics as defining ...
a dynamic programming algorithm can be used to discover the single most likely context-free derivation (parse) of a string, which is commonly called the "Viterbi parse". Another application is in
target tracking
Target may refer to:
Physical items
* Shooting target, used in marksmanship training and various shooting sports
** Bullseye (target), the goal one for which one aims in many of these sports
** Aiming point, in field artillery, fi ...
, where the track is computed that assigns a maximum likelihood to a sequence of observations.
Extensions
A generalization of the Viterbi algorithm, termed the ''max-sum algorithm'' (or ''max-product algorithm'') can be used to find the most likely assignment of all or some subset of
latent variable
In statistics, latent variables (from Latin: present participle of ''lateo'', “lie hidden”) are variables that can only be inferred indirectly through a mathematical model from other observable variables that can be directly observed or me ...
s in a large number of
graphical model
A graphical model or probabilistic graphical model (PGM) or structured probabilistic model is a probabilistic model for which a graph expresses the conditional dependence structure between random variables. They are commonly used in probabili ...
s, e.g.
Bayesian network
A Bayesian network (also known as a Bayes network, Bayes net, belief network, or decision network) is a probabilistic graphical model that represents a set of variables and their conditional dependencies via a directed acyclic graph (DAG). Bay ...
s,
Markov random field
In the domain of physics and probability, a Markov random field (MRF), Markov network or undirected graphical model is a set of random variables having a Markov property described by an undirected graph. In other words, a random field is said to b ...
s and
conditional random field
Conditional random fields (CRFs) are a class of statistical modeling methods often applied in pattern recognition and machine learning and used for structured prediction. Whereas a classifier predicts a label for a single sample without consid ...
s. The latent variables need, in general, to be connected in a way somewhat similar to a
hidden Markov model
A hidden Markov model (HMM) is a statistical Markov model in which the system being modeled is assumed to be a Markov process — call it X — with unobservable ("''hidden''") states. As part of the definition, HMM requires that there be an o ...
(HMM), with a limited number of connections between variables and some type of linear structure among the variables. The general algorithm involves ''message passing'' and is substantially similar to the
belief propagation
A belief is an attitude that something is the case, or that some proposition is true. In epistemology, philosophers use the term "belief" to refer to attitudes about the world which can be either true or false. To believe something is to take i ...
algorithm (which is the generalization of the
forward-backward algorithm).
With the algorithm called
iterative Viterbi decoding Iterative Viterbi decoding is an algorithm that spots the subsequence ''S'' of an observation ''O'' = having the highest average probability (i.e., probability scaled by the length of ''S'') of being generated by a given hidden Markov model ''M'' w ...
one can find the subsequence of an observation that matches best (on average) to a given hidden Markov model. This algorithm is proposed by Qi Wang et al. to deal with
turbo code
In information theory, turbo codes (originally in French ''Turbocodes'') are a class of high-performance forward error correction (FEC) codes developed around 1990–91, but first published in 1993. They were the first practical codes to closel ...
. Iterative Viterbi decoding works by iteratively invoking a modified Viterbi algorithm, reestimating the score for a filler until convergence.
An alternative algorithm, the
Lazy Viterbi algorithm, has been proposed. For many applications of practical interest, under reasonable noise conditions, the lazy decoder (using Lazy Viterbi algorithm) is much faster than the original
Viterbi decoder (using Viterbi algorithm). While the original Viterbi algorithm calculates every node in the
trellis of possible outcomes, the Lazy Viterbi algorithm maintains a prioritized list of nodes to evaluate in order, and the number of calculations required is typically fewer (and never more) than the ordinary Viterbi algorithm for the same result. However, it is not so easy to parallelize in hardware.
Pseudocode
This algorithm generates a path
, which is a sequence of states
that generate the observations
with
, where
is the number of possible observations in the observation space
.
Two 2-dimensional tables of size
are constructed:
* Each element