UIMA

	UIMA UIMA ( ), short for Unstructured Information Management Architecture, is an OASIS standard for content analytics, originally developed at IBM. It provides a component software architecture for the development, discovery, composition, and deployment of multi-modal analytics for the analysis of unstructured information and integration with search technologies. Structure The UIMA architecture can be thought of in four dimensions: # It specifies component interfaces in an analytics pipeline. # It describes a set of design patterns. # It suggests two data representations: an in-memory representation of annotations for high-performance analytics and an XML representation of annotations for integration with remote web services. # It suggests development roles allowing tools to be used by users with diverse skills. Implementations and uses Apache UIMA, a reference implementation of UIMA, is maintained by the Apache Software Foundation. UIMA is used in a number of software p ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
	CTAKES Apache cTAKES: clinical Text Analysis and Knowledge Extraction System is an open-source Natural Language Processing (NLP) system that extracts clinical information from electronic health record unstructured text. It processes clinical notes, identifying types of clinical named entities — drugs, diseases/disorders, signs/symptoms, anatomical sites and procedures. Each named entity has attributes for the text span, the ontology mapping code, context (family history of, current, unrelated to patient), and negated/not negated. cTAKES was built using the UIMA Unstructured Information Management Architecture framework and OpenNLP natural language processing toolkit. Components Components of cTAKES are specifically trained for the clinical domain, and create rich linguistic and semantic annotations that can be utilized by clinical decision support systems and clinical research. These components include: * Named Section identifier * Sentence boundary detector * Rule-based tok ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
	LanguageWare LanguageWare is a natural language processing (NLP) technology developed by IBM, which allows applications to process natural language text. It comprises a set of Java libraries which provide a range of NLP functions: language identification, text segmentation/tokenization, normalization, entity and relationship extraction, and semantic analysis and disambiguation. The analysis engine uses Finite State Machine approach at multiple levels, which aids its performance characteristics, while maintaining a reasonably small footprint. The behaviour of the system is driven by a set of configurable lexico-semantic resources which describe the characteristics and domain of the processed language. A default set of resources comes as part of LanguageWare and these describe the native language characteristics, such as morphology, and the basic vocabulary for the language. Supplemental resources have been created which capture additional vocabularies, terminologies, rules and grammars, which m ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
	Unstructured Information Unstructured data (or unstructured information) is information that either does not have a pre-defined data model or is not organized in a pre-defined manner. Unstructured information is typically text-heavy, but may contain data such as dates, numbers, and facts as well. This results in irregularities and ambiguities that make it difficult to understand using traditional programs as compared to data stored in fielded form in databases or annotated ( semantically tagged) in documents. In 1998, Merrill Lynch said "unstructured data comprises the vast majority of data found in an organization, some estimates run as high as 80%." It's unclear what the source of this number is, but nonetheless it is accepted by some. Other sources have reported similar or higher percentages of unstructured data. , IDC and Dell EMC project that data will grow to 40 zettabytes by 2020, resulting in a 50-fold growth from the beginning of 2010. More recently, IDC and Seagate predict that the global data ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
	Unstructured Data Unstructured data (or unstructured information) is information that either does not have a pre-defined data model or is not organized in a pre-defined manner. Unstructured information is typically text-heavy, but may contain data such as dates, numbers, and facts as well. This results in irregularities and ambiguities that make it difficult to understand using traditional programs as compared to data stored in fielded form in databases or annotated ( semantically tagged) in documents. In 1998, Merrill Lynch said "unstructured data comprises the vast majority of data found in an organization, some estimates run as high as 80%." It's unclear what the source of this number is, but nonetheless it is accepted by some. Other sources have reported similar or higher percentages of unstructured data. , IDC and Dell EMC project that data will grow to 40 zettabytes by 2020, resulting in a 50-fold growth from the beginning of 2010. More recently, IDC and Seagate predict that the global data ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
picture info	OASIS (organization) The Organization for the Advancement of Structured Information Standards (OASIS; ) is a nonprofit consortium that works on the development, convergence, and adoption of open standards for cybersecurity, blockchain, Internet of things (IoT), emergency management, cloud computing, legal data exchange, energy, content technologies, and other areas. History OASIS was founded under the name "SGML Open" in 1993. It began as a trade association of Standard Generalized Markup Language (SGML) tool vendors to cooperatively promote the adoption of SGML through mainly educational activities, though some amount of technical activity was also pursued including an update of the CALS Table Model specification and specifications for fragment interchange and entity management. In 1998, with the movement of the industry to XML, SGML Open changed its emphasis from SGML to XML, and changed its name to OASIS Open to be inclusive of XML and reflect an expanded scope of technical work and standar ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
	IBM Omnifind IBM OmniFind was an enterprise search platform from IBM. It did come in several packages adapted to different business needs, including OmniFind Enterprise Edition, OmniFind Enterprise Starter Edition, and OmniFind Discovery Edition. IBM OmniFind as a standalone product was withdrawn in April 2011 and is now part of IBM Watson Content Analytics with Enterprise Search. IBM OmniFind Yahoo! Edition was a free-of-charge version that could handle up to 500,000 documents in its index and was intended for small businesses. IBM OmniFind Yahoo! Edition was simple to install, provided a user friendly front end for administration, and incorporated technology from the open source Lucene project. IBM withdrew this product from marketing effective September 22, 2010 and withdrew support effective June 30, 2011. IBM OmniFind Personal E-mail Search was a research product launched in 2007 for doing semantic search over personal emails by extracting and organizing concepts and relationships (such a ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
	General Architecture For Text Engineering General Architecture for Text Engineering or GATE is a Java suite of tools originally developed at the University of Sheffield beginning in 1995 and now used worldwide by a wide community of scientists, companies, teachers and students for many natural language processing tasks, including information extraction in many languages. As of May 28, 2011, 881 people are on the gate-users mailing list at SourceForge.net, and 111,932 downloads from SourceForge are recorded since the project moved to SourceForge in 2005. The paper "GATE: A framework and graphical development environment for robust NLP tools and applications" has received over 2000 citations since publication (according to Google Scholar). Books covering the use of GATE, in addition to the GATE User Guide, include "Building Search Applications: Lucene, LingPipe, and Gate", by Manu Konchady, and "Introduction to Linguistic Annotation and Text Analytics", by Graham Wilcock. GATE community and research has been involved in seve ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
	Ubiquitous Knowledge Processing Lab The Ubiquitous Knowledge Processing Lab (also UKP Lab) is a research lab at the Department of Computer Science at the Technische Universität Darmstadt. It was founded in 2006 by Iryna Gurevych. Research Activities UKP Lab develops natural language processing techniques for automatically understanding written text and applies them to information management like information retrieval, question answering, and structuring information in Wikis. The Ubiquitous Knowledge Processing Lab is among the leading research institutes in the field of utilizing Web 2.0 content as the source of lexical semantic information for natural language processing (NLP). Wikipedia and Wiktionary are employed as collaboratively constructed lexical semantic resources and used to improve expert-built resources like WordNet. These resources are used to develop semantically enhanced algorithms for information retrieval and question answering. An example is semantic search: If a user enters the query "pie-frui ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
picture info	Watson (computer) IBM Watson is a question-answering computer system capable of answering questions posed in natural language, developed in IBM's DeepQA project by a research team led by principal investigator David Ferrucci. Watson was named after IBM's founder and first CEO, industrialist Thomas J. Watson. The computer system was initially developed to answer questions on the quiz show ''Jeopardy!'' and, in 2011, the Watson computer system competed on ''Jeopardy!'' against champions Brad Rutter and Ken Jennings, winning the first place prize of $1 million. In February 2013, IBM announced that Watson's first commercial application would be for utilization management decisions in lung cancer treatment at Memorial Sloan Kettering Cancer Center, New York City, in conjunction with WellPoint (now Anthem). Description Watson was created as a question answering (QA) computing system that IBM built to apply advanced natural language processing, information retrieval, knowledge representation, au ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
picture info	IBM Research IBM Research is the research and development division for IBM, an American multinational information technology company headquartered in Armonk, New York, with operations in over 170 countries. IBM Research is the largest industrial research organization in the world and has twelve labs on six continents. IBM employees have garnered six Nobel Prizes, six Turing Awards, 20 inductees into the U.S. National Inventors Hall of Fame, 19 National Medals of Technology, five National Medals of Science and three Kavli Prizes. , the company has generated more patents than any other business in each of 25 consecutive years, which is a record. History The roots of today's IBM Research began with the 1945 opening of the Watson Scientific Computing Laboratory at Columbia University. This was the first IBM laboratory devoted to pure science and later expanded into additional IBM Research locations in Westchester County, New York, starting in the 1950s,Beatty, Jack, (editor''Colussus: how t ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
picture info	Apache Software Foundation Projects The Apache () are a group of culturally related Native American tribes in the Southwestern United States, which include the Chiricahua, Jicarilla, Lipan, Mescalero, Mimbreño, Ndendahe (Bedonkohe or Mogollon and Nednhi or Carrizaleño and Janero), Salinero, Plains (Kataka or Semat or "Kiowa-Apache") and Western Apache ( Aravaipa, Pinaleño, Coyotero, Tonto). Distant cousins of the Apache are the Navajo, with whom they share the Southern Athabaskan languages. There are Apache communities in Oklahoma and Texas, and reservations in Arizona and New Mexico. Apache people have moved throughout the United States and elsewhere, including urban centers. The Apache Nations are politically autonomous, speak several different languages, and have distinct cultures. Historically, the Apache homelands have consisted of high mountains, sheltered and watered valleys, deep canyons, deserts, and the southern Great Plains, including areas in what is now Eastern Arizona, Northern Mexico ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
	Entity Extraction Named-entity recognition (NER) (also known as (named) entity identification, entity chunking, and entity extraction) is a subtask of information extraction that seeks to locate and classify named entities mentioned in unstructured text into pre-defined categories such as person names, organizations, locations, medical codes, time expressions, quantities, monetary values, percentages, etc. Most research on NER/NEE systems has been structured as taking an unannotated block of text, such as this one: And producing an annotated block of text that highlights the names of entities: In this example, a person name consisting of one token, a two-token company name and a temporal expression have been detected and classified. State-of-the-art NER systems for English produce near-human performance. For example, the best system entering MUC-7 scored 93.39% of F-measure while human annotators scored 97.60% and 96.95%. Named-entity recognition platforms Notable NER platforms include ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]