Apache OpenNLP
   HOME

TheInfoList



OR:

The Apache OpenNLP library is a
machine learning Machine learning (ML) is a field of inquiry devoted to understanding and building methods that 'learn', that is, methods that leverage data to improve performance on some set of tasks. It is seen as a part of artificial intelligence. Machine ...
based toolkit for the processing of natural language text. It supports the most common NLP tasks, such as
language detection In natural language processing, language identification or language guessing is the problem of determining which natural language given content is in. Computational approaches to this problem view it as a special case of text categorization, s ...
,
tokenization Tokenization may refer to: * Tokenization (lexical analysis) in language processing * Tokenization (data security) in the field of data security * Word segmentation * Tokenism Tokenism is the practice of making only a perfunctory or symbolic ...
,
sentence segmentation Sentence boundary disambiguation (SBD), also known as sentence breaking, sentence boundary detection, and sentence segmentation, is the problem in natural language processing of deciding where sentences begin and end. Natural language processing to ...
,
part-of-speech tagging In corpus linguistics, part-of-speech tagging (POS tagging or PoS tagging or POST), also called grammatical tagging is the process of marking up a word in a text (corpus) as corresponding to a particular part of speech, based on both its definitio ...
, named entity extraction, chunking,
parsing Parsing, syntax analysis, or syntactic analysis is the process of analyzing a string of symbols, either in natural language, computer languages or data structures, conforming to the rules of a formal grammar. The term ''parsing'' comes from Lati ...
and
coreference resolution In linguistics, coreference, sometimes written co-reference, occurs when two or more expressions refer to the same person or thing; they have the same referent. For example, in ''Bill said Alice would arrive soon, and she did'', the words ''Alice'' ...
. These tasks are usually required to build more advanced text processing services.Apache OpenNLP Proposal
/ref>


See also

*
Unstructured Information Management Architecture UIMA ( ), short for Unstructured Information Management Architecture, is an OASIS standard for content analytics, originally developed at IBM. It provides a component software architecture for the development, discovery, composition, and deplo ...
(UIMA) *
General Architecture for Text Engineering General Architecture for Text Engineering or GATE is a Java suite of tools originally developed at the University of Sheffield beginning in 1995 and now used worldwide by a wide community of scientists, companies, teachers and students for many nat ...
(GATE) *
cTAKES Apache cTAKES: clinical Text Analysis and Knowledge Extraction System is an open-source Natural Language Processing (NLP) system that extracts clinical information from electronic health record unstructured text. It processes clinical notes, id ...


References


External links


Apache OpenNLP Website
{{Apache Software Foundation Natural language processing Statistical natural language processing Natural language processing toolkits
OpenNLP The Apache OpenNLP library is a machine learning based toolkit for the processing of natural language text. It supports the most common NLP tasks, such as language detection, tokenization, sentence segmentation, part-of-speech tagging, named en ...
Java (programming language) libraries Cross-platform software 2004 software