Keyword extraction is tasked with the automatic identification of terms that best describe the subject of a document.
''Key phrases'', ''key terms'', ''key segments'' or just ''keywords'' are the names used for the terms that represent the most relevant information contained in a document. Although the terminology differs, the function is the same: characterizing the topic discussed in a document. Keyword extraction is an important problem in text mining, information extraction, information retrieval and natural language processing (NLP).
Keyword assignment vs. extraction
Methods for obtaining the keywords of a document can be roughly divided into:
* keyword assignment (keywords are chosen from a controlled vocabulary or taxonomy) and
* keyword extraction (keywords are chosen from words that are explicitly mentioned in the original text).
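The distinction between the two approaches can be illustrated with a minimal Python sketch. The controlled vocabulary, its trigger words, the stopword list and the function names below are all hypothetical and chosen only for illustration; real systems use far richer vocabularies and matching rules.

```python
import re

# Hypothetical controlled vocabulary: each preferred term is mapped to
# trigger words that suggest it (an assumption made for this sketch).
CONTROLLED_VOCABULARY = {
    "machine learning": {"classifier", "training", "neural"},
    "information retrieval": {"search", "query", "ranking"},
}

# Tiny illustrative stopword list, not a standard one.
STOPWORDS = {"a", "an", "the", "is", "are", "of", "for", "and", "to", "in", "each"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def assign_keywords(text):
    """Keyword assignment: pick terms from the controlled vocabulary.

    The returned terms need not occur in the text itself.
    """
    tokens = set(tokenize(text))
    return sorted(
        term for term, triggers in CONTROLLED_VOCABULARY.items() if tokens & triggers
    )


def extract_keywords(text):
    """Keyword extraction: candidates are words that occur in the text."""
    seen = []
    for token in tokenize(text):
        if token not in STOPWORDS and token not in seen:
            seen.append(token)
    return seen


doc = "The search engine improves ranking for each query."
print(assign_keywords(doc))
print(extract_keywords(doc))
```

In this sketch the assigned term may be absent from the document's wording, whereas every extracted keyword is taken directly from it.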
Methods for automatic keyword extraction can be supervised, semi-supervised, or unsupervised. Unsupervised methods can be further divided into simple statistical, linguistic, and graph-based methods, or ensemble methods that combine some or most of these.
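Two of the unsupervised families can be sketched in a few lines of Python. The example below is only an illustration under stated assumptions: the statistical variant ranks words by raw frequency, and the graph-based variant applies a PageRank-style iteration to a word co-occurrence graph, in the spirit of TextRank-like methods. The stopword list, function names and parameter defaults are hypothetical.

```python
import re
from collections import Counter, defaultdict

# Tiny illustrative stopword list (an assumption for this sketch).
STOPWORDS = {"a", "an", "the", "is", "are", "of", "for", "and", "to", "in", "that"}


def candidates(text):
    """Return lowercase alphabetic tokens that are not stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def frequency_keywords(text, top_n=3):
    """Simple statistics: rank candidate words by raw frequency."""
    counts = Counter(candidates(text))
    # Ties are broken alphabetically so the result is deterministic.
    return sorted(counts, key=lambda w: (-counts[w], w))[:top_n]


def graph_keywords(text, top_n=3, window=2, damping=0.85, iterations=50):
    """Graph-based: PageRank-style scores on an undirected co-occurrence graph.

    Two words are linked when they occur within `window` positions of each
    other in the stopword-filtered token sequence.
    """
    words = candidates(text)
    neighbors = defaultdict(set)
    for i, word in enumerate(words):
        for other in words[i + 1 : i + window]:
            if other != word:
                neighbors[word].add(other)
                neighbors[other].add(word)
    # Every node in `neighbors` has at least one edge, so no division by zero.
    scores = {w: 1.0 for w in neighbors}
    for _ in range(iterations):
        scores = {
            w: (1 - damping)
            + damping * sum(scores[n] / len(neighbors[n]) for n in neighbors[w])
            for w in neighbors
        }
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_n]


doc = "Graph methods rank words. Graph methods link words that occur together."
print(frequency_keywords(doc))
print(graph_keywords(doc))
```

An ensemble method could, for instance, combine the two rankings, though how the scores are merged is a design choice that this sketch leaves open.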