LanguageWare is a natural language processing (NLP) technology developed by IBM, which allows applications to process natural language text. It comprises a set of Java libraries which provide a range of NLP functions: language identification, text segmentation/tokenization, normalization, entity and relationship extraction, and semantic analysis and disambiguation. The analysis engine uses a finite-state machine approach at multiple levels, which helps performance while keeping the footprint reasonably small.
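As a rough illustration of why finite-state techniques suit lexicon lookup, the sketch below builds a character-level automaton (a trie) from a few phrases and scans text for the longest match at each position. It is not LanguageWare code: the names `State`, `build_lexicon` and `find_matches`, the labels, and the sample phrases are all hypothetical, and the toy matcher ignores word boundaries and morphology.

```python
from typing import Dict, List, Optional, Tuple


class State:
    """One automaton state: outgoing transitions plus an optional label."""

    def __init__(self) -> None:
        self.next: Dict[str, "State"] = {}
        self.label: Optional[str] = None  # set only on accepting states


def build_lexicon(entries: Dict[str, str]) -> State:
    """Compile phrase -> label pairs into a trie and return its start state."""
    start = State()
    for phrase, label in entries.items():
        state = start
        for ch in phrase.lower():
            state = state.next.setdefault(ch, State())
        state.label = label
    return start


def find_matches(text: str, start: State) -> List[Tuple[int, int, str]]:
    """Return (begin, end, label) for the longest lexicon match at each position.

    Assumes lowercasing does not change the text length (true for ASCII).
    """
    lowered = text.lower()
    matches: List[Tuple[int, int, str]] = []
    i = 0
    while i < len(lowered):
        state = start
        best: Optional[Tuple[int, int, str]] = None
        j = i
        # Follow transitions as far as the input allows, remembering
        # the last accepting state seen (longest match wins).
        while j < len(lowered) and lowered[j] in state.next:
            state = state.next[lowered[j]]
            j += 1
            if state.label is not None:
                best = (i, j, state.label)
        if best is not None:
            matches.append(best)
            i = best[1]  # resume scanning after the match
        else:
            i += 1
    return matches


# Hypothetical sample lexicon and input.
lexicon = build_lexicon({
    "IBM": "ORG",
    "natural language": "TERM",
    "natural language processing": "TERM",
})
print(find_matches("IBM builds natural language processing tools", lexicon))
# prints [(0, 3, 'ORG'), (11, 38, 'TERM')]
```

Each input character is consumed by a single dictionary lookup, so the cost of a scan depends on the text length rather than on the number of lexicon entries, and phrases sharing a prefix share states, which keeps the structure compact.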
The behaviour of the system is driven by a set of configurable lexico-semantic resources which describe the characteristics and domain of the processed language. A default set of resources comes as part of LanguageWare and these describe the native language characteristics, such as morphology, and the basic vocabulary for the language. Supplemental resources have been created which capture additional vocabularies, terminologies, rules and grammars, which may be generic to the language or specific to one or more domains.
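The layering of default and supplemental resources described above can be pictured as a lookup chain in which domain entries override the general vocabulary. The following is only a conceptual sketch: plain Python dictionaries stand in for compiled resources, and the lexicon contents and the `layer_resources` helper are hypothetical, not LanguageWare's resource format or API.

```python
from collections import ChainMap

# Hypothetical default resource: basic vocabulary for the language.
default_lexicon = {"cell": "noun", "culture": "noun", "run": "verb"}

# Hypothetical supplemental resource: domain-specific terminology.
biomedical_lexicon = {"cell": "biological-unit", "culture": "lab-sample"}


def layer_resources(*lexicons):
    """Combine lexicons; earlier arguments take precedence over later ones."""
    return ChainMap(*lexicons)


resources = layer_resources(biomedical_lexicon, default_lexicon)
print(resources["cell"])  # domain entry overrides the default
print(resources["run"])   # falls back to the default vocabulary
```

Swapping the supplemental lexicon for one from another domain changes how the same text is interpreted without touching the default resource.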
An Eclipse-based customization toolset, the LanguageWare Resource Workbench, is available on IBM's alphaWorks site. It allows domain knowledge to be compiled into these resources and thereby incorporated into the analysis process.
LanguageWare can be deployed as a set of UIMA-compliant annotators, Eclipse plug-ins or Web Services.
See also
* Data Discovery and Query Builder
* Finite state machine
* Formal language
* IBM Omnifind
* Linguistics
* Semantic Web
* Semantics
* Service-oriented architecture
* Web services
* UIMA
References
External links
* IBM LanguageWare Resource Workbench on alphaWorks
* IBM LanguageWare Miner for Multidimensional Socio-Semantic Networks on alphaWorks
* UIMA Homepage at the Apache Software Foundation
* UIMA Framework on SourceForge
* IBM OmniFind Yahoo! Edition (FREE enterprise search engine)
* Semantic Information Systems and Language Engineering Group
* SemanticDesktop.org
Related Papers
* Branimir K. Boguraev, "Annotation-Based Finite State Processing in a Large-Scale NLP Architecture", IBM Research Report, 2004
* Alexander Troussov, Mikhail Sogrin, "IBM LanguageWare Ontological Network Miner"
* Sheila Kinsella, Andreas Harth, Alexander Troussov, Mikhail Sogrin, John Judge, Conor Hayes, John G. Breslin, "Navigating and Annotating Semantically-Enabled Networks of People and Associated Objects"
* Mikhail Kotelnikov, Alexander Polonsky, Malte Kiesel, Max Völkel, Heiko Haller, Mikhail Sogrin, Pär Lannerö, Brian Davis, "Interactive Semantic Wikis"
* Sebastian Trüg, Jos van den Oever, Stéphane Laurière, "The Social Semantic Desktop: Nepomuk"
* Séamus Lawless, Vincent Wade, "Dynamic Content Discovery, Harvesting and Delivery"
* Alex Nevidomsky, "UIMA Framework and Knowledge Discovery at IBM", 4th Text Mining Symposium, Fraunhofer SCAI, 2006. https://web.archive.org/web/20070209100619/http://www.scai.fraunhofer.de/fileadmin/download/vortraege/tms_06/IBM_Nevidomsky.pdf