In
machine learning
Machine learning (ML) is a field of study in artificial intelligence concerned with the development and study of Computational statistics, statistical algorithms that can learn from data and generalise to unseen data, and thus perform Task ( ...
, sequence labeling is a type of
pattern recognition
Pattern recognition is the task of assigning a class to an observation based on patterns extracted from data. While similar, pattern recognition (PR) is not to be confused with pattern machines (PM) which may possess PR capabilities but their p ...
task that involves the algorithmic assignment of a
categorical label to each member of a sequence of observed values. A common example of a sequence labeling task is
part of speech tagging
In corpus linguistics, part-of-speech tagging (POS tagging, PoS tagging, or POST), also called grammatical tagging, is the process of marking up a word in a text ( corpus) as corresponding to a particular part of speech, based on both its definiti ...
, which seeks to assign a
part of speech
In grammar, a part of speech or part-of-speech ( abbreviated as POS or PoS, also known as word class or grammatical category) is a category of words (or, more generally, of lexical items) that have similar grammatical properties. Words that are ...
to each word in an input sentence or document. Sequence labeling can be treated as a set of independent
classification
Classification is the activity of assigning objects to some pre-existing classes or categories. This is distinct from the task of establishing the classes themselves (for example through cluster analysis). Examples include diagnostic tests, identif ...
tasks, one per member of the sequence. However, accuracy is generally improved by making the optimal label for a given element dependent on the choices of nearby elements, using special algorithms to choose the ''globally'' best set of labels for the entire sequence at once.
As an example of why finding the globally best label sequence might produce better results than labeling one item at a time, consider the part-of-speech tagging task just described. Frequently, many words are members of multiple parts of speech, and the correct label of such a word can often be deduced from the correct label of the word to the immediate left or right. For example, the word "sets" can be either a noun or verb. In a phrase like "he sets the books down", the word "he" is unambiguously a pronoun, and "the" unambiguously a
determiner
Determiner, also called determinative ( abbreviated ), is a term used in some models of grammatical description to describe a word or affix belonging to a class of noun modifiers. A determiner combines with a noun to express its reference. Examp ...
, and using either of these labels, "sets" can be deduced to be a verb, since nouns very rarely follow pronouns and are less likely to precede determiners than verbs are. But in other cases, only one of the adjacent words is similarly helpful. In "he sets and then knocks over the table", only the word "he" to the left is helpful (cf. "...picks up the sets and then knocks over..."). Conversely, in "... and also sets the table" only the word "the" to the right is helpful (cf. "... and also sets of books were ..."). An algorithm that proceeds from left to right, labeling one word at a time, can only use the tags of left-adjacent words and might fail in the second example above; vice versa for an algorithm that proceeds from right to left.
Most sequence labeling algorithms are
probabilistic in nature, relying on
statistical inference
Statistical inference is the process of using data analysis to infer properties of an underlying probability distribution.Upton, G., Cook, I. (2008) ''Oxford Dictionary of Statistics'', OUP. . Inferential statistical analysis infers properties of ...
to find the best sequence. The most common statistical models in use for sequence labeling make a Markov assumption, i.e. that the choice of label for a particular word is directly dependent only on the immediately adjacent labels; hence the set of labels forms a
Markov chain
In probability theory and statistics, a Markov chain or Markov process is a stochastic process describing a sequence of possible events in which the probability of each event depends only on the state attained in the previous event. Informally ...
. This leads naturally to the
hidden Markov model (HMM), one of the most common statistical models used for sequence labeling. Other common models in use are the
maximum entropy Markov model and
conditional random field.
See also
*
Artificial intelligence
Artificial intelligence (AI) is the capability of computer, computational systems to perform tasks typically associated with human intelligence, such as learning, reasoning, problem-solving, perception, and decision-making. It is a field of re ...
*
Bayesian networks (of which HMMs are an example)
*
Classification (machine learning)
When classification is performed by a computer, statistical methods are normally used to develop the algorithm.
Often, the individual observations are analyzed into a set of quantifiable properties, known variously as explanatory variables or ''fe ...
*
Linear dynamical system, which applies to tasks where the "label" is actually a real number
*
Machine learning
Machine learning (ML) is a field of study in artificial intelligence concerned with the development and study of Computational statistics, statistical algorithms that can learn from data and generalise to unseen data, and thus perform Task ( ...
*
Pattern recognition
Pattern recognition is the task of assigning a class to an observation based on patterns extracted from data. While similar, pattern recognition (PR) is not to be confused with pattern machines (PM) which may possess PR capabilities but their p ...
*
Sequence mining
References
Further reading
* Erdogan H.
"Sequence labeling: generative and discriminative approaches, hidden Markov models, conditional random fields and structured SVMs," ICMLA 2010 tutorial, Bethesda, MD (2010)
{{DEFAULTSORT:Sequence Labeling
Machine learning