HOME

TheInfoList



OR:

In
machine learning Machine learning (ML) is a field of inquiry devoted to understanding and building methods that 'learn', that is, methods that leverage data to improve performance on some set of tasks. It is seen as a part of artificial intelligence. Machine ...
, sequence labeling is a type of
pattern recognition Pattern recognition is the automated recognition of patterns and regularities in data. It has applications in statistical data analysis, signal processing, image analysis, information retrieval, bioinformatics, data compression, computer graphi ...
task that involves the algorithmic assignment of a categorical label to each member of a sequence of observed values. A common example of a sequence labeling task is
part of speech tagging In corpus linguistics, part-of-speech tagging (POS tagging or PoS tagging or POST), also called grammatical tagging is the process of marking up a word in a text (corpus) as corresponding to a particular part of speech, based on both its definit ...
, which seeks to assign a
part of speech In grammar, a part of speech or part-of-speech (abbreviated as POS or PoS, also known as word class or grammatical category) is a category of words (or, more generally, of lexical items) that have similar grammatical properties. Words that are assi ...
to each word in an input sentence or document. Sequence labeling can be treated as a set of independent
classification Classification is a process related to categorization, the process in which ideas and objects are recognized, differentiated and understood. Classification is the grouping of related facts into classes. It may also refer to: Business, organizat ...
tasks, one per member of the sequence. However, accuracy is generally improved by making the optimal label for a given element dependent on the choices of nearby elements, using special algorithms to choose the ''globally'' best set of labels for the entire sequence at once. As an example of why finding the globally best label sequence might produce better results than labeling one item at a time, consider the part-of-speech tagging task just described. Frequently, many words are members of multiple parts of speech, and the correct label of such a word can often be deduced from the correct label of the word to the immediate left or right. For example, the word "sets" can be either a noun or verb. In a phrase like "he sets the books down", the word "he" is unambiguously a pronoun, and "the" unambiguously a
determiner A determiner, also called determinative (abbreviated ), is a word, phrase, or affix that occurs together with a noun or noun phrase and generally serves to express the reference of that noun or noun phrase in the context. That is, a determiner m ...
, and using either of these labels, "sets" can be deduced to be a verb, since nouns very rarely follow pronouns and are less likely to precede determiners than verbs are. But in other cases, only one of the adjacent words is similarly helpful. In "he sets and then knocks over the table", only the word "he" to the left is helpful (cf. "...picks up the sets and then knocks over..."). Conversely, in "... and also sets the table" only the word "the" to the right is helpful (cf. "... and also sets of books were ..."). An algorithm that proceeds from left to right, labeling one word at a time, can only use the tags of left-adjacent words and might fail in the second example above; vice versa for an algorithm that proceeds from right to left. Most sequence labeling algorithms are
probabilistic Probability is the branch of mathematics concerning numerical descriptions of how likely an Event (probability theory), event is to occur, or how likely it is that a proposition is true. The probability of an event is a number between 0 and ...
in nature, relying on
statistical inference Statistical inference is the process of using data analysis to infer properties of an underlying probability distribution, distribution of probability.Upton, G., Cook, I. (2008) ''Oxford Dictionary of Statistics'', OUP. . Inferential statistical ...
to find the best sequence. The most common statistical models in use for sequence labeling make a Markov assumption, i.e. that the choice of label for a particular word is directly dependent only on the immediately adjacent labels; hence the set of labels forms a
Markov chain A Markov chain or Markov process is a stochastic model describing a sequence of possible events in which the probability of each event depends only on the state attained in the previous event. Informally, this may be thought of as, "What happe ...
. This leads naturally to the
hidden Markov model A hidden Markov model (HMM) is a statistical Markov model in which the system being modeled is assumed to be a Markov process — call it X — with unobservable ("''hidden''") states. As part of the definition, HMM requires that there be an ob ...
(HMM), one of the most common statistical models used for sequence labeling. Other common models in use are the
maximum entropy Markov model In statistics, a maximum-entropy Markov model (MEMM), or conditional Markov model (CMM), is a graphical model for sequence labeling that combines features of hidden Markov models (HMMs) and maximum entropy (MaxEnt) models. An MEMM is a discrimina ...
and
conditional random field Conditional random fields (CRFs) are a class of statistical modeling methods often applied in pattern recognition and machine learning and used for structured prediction. Whereas a classifier predicts a label for a single sample without consid ...
.


See also

*
Artificial intelligence Artificial intelligence (AI) is intelligence—perceiving, synthesizing, and inferring information—demonstrated by machines, as opposed to intelligence displayed by animals and humans. Example tasks in which this is done include speech re ...
*
Bayesian network A Bayesian network (also known as a Bayes network, Bayes net, belief network, or decision network) is a probabilistic graphical model that represents a set of variables and their conditional dependencies via a directed acyclic graph (DAG). Bay ...
s (of which HMMs are an example) *
Classification (machine learning) In statistics, classification is the problem of identifying which of a set of categories (sub-populations) an observation (or observations) belongs to. Examples are assigning a given email to the "spam" or "non-spam" class, and assigning a diagno ...
*
Linear dynamical system Linear dynamical systems are dynamical systems whose evaluation functions are linear. While dynamical systems, in general, do not have closed-form solutions, linear dynamical systems can be solved exactly, and they have a rich set of mathematical ...
, which applies to tasks where the "label" is actually a real number *
Machine learning Machine learning (ML) is a field of inquiry devoted to understanding and building methods that 'learn', that is, methods that leverage data to improve performance on some set of tasks. It is seen as a part of artificial intelligence. Machine ...
*
Pattern recognition Pattern recognition is the automated recognition of patterns and regularities in data. It has applications in statistical data analysis, signal processing, image analysis, information retrieval, bioinformatics, data compression, computer graphi ...
*
Sequence mining Sequential pattern mining is a topic of data mining concerned with finding statistically relevant patterns between data examples where the values are delivered in a sequence. It is usually presumed that the values are discrete, and thus time serie ...


References


Further reading

* Erdogan H.

"Sequence labeling: generative and discriminative approaches, hidden Markov models, conditional random fields and structured SVMs," ICMLA 2010 tutorial, Bethesda, MD (2010) {{DEFAULTSORT:Sequence Labeling Machine learning