CTAKES

Apache cTAKES (clinical Text Analysis and Knowledge Extraction System) is an open-source natural language processing (NLP) system that extracts clinical information from the unstructured text of electronic health records. It processes clinical notes and identifies types of clinical named entities: drugs, diseases/disorders, signs/symptoms, anatomical sites, and procedures. Each named entity has attributes for the text span, the ontology mapping code, the context (family history of, current, unrelated to patient), and the negation status (negated/not negated). cTAKES was built using the UIMA (Unstructured Information Management Architecture) framework and the OpenNLP natural language processing toolkit.
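
The attributes listed above can be pictured as a small record type. The following is only an illustrative sketch in Python, not the cTAKES type system (which is defined in Java on top of UIMA); the names `ClinicalEntity`, `EntityType`, `Context`, and the placeholder ontology code are assumptions made for this example.

```python
from dataclasses import dataclass
from enum import Enum


class EntityType(Enum):
    """The named entity types listed in the article."""
    DRUG = "drug"
    DISEASE_DISORDER = "disease/disorder"
    SIGN_SYMPTOM = "sign/symptom"
    ANATOMICAL_SITE = "anatomical site"
    PROCEDURE = "procedure"


class Context(Enum):
    """The context values listed in the article."""
    CURRENT = "current"
    FAMILY_HISTORY = "family history of"
    UNRELATED_TO_PATIENT = "unrelated to patient"


@dataclass(frozen=True)
class ClinicalEntity:
    """Hypothetical record for one extracted entity (not the cTAKES API)."""
    begin: int            # start offset of the text span (inclusive)
    end: int              # end offset of the text span (exclusive)
    entity_type: EntityType
    ontology_code: str    # ontology mapping code; placeholder value below
    context: Context
    negated: bool

    def covered_text(self, note: str) -> str:
        """Return the part of the note that this entity's span covers."""
        return note[self.begin:self.end]


note = "Patient denies chest pain."
entity = ClinicalEntity(
    begin=15,
    end=25,
    entity_type=EntityType.SIGN_SYMPTOM,
    ontology_code="CODE-0001",  # made-up placeholder, not a real ontology code
    context=Context.CURRENT,
    negated=True,
)
print(entity.covered_text(note), entity.negated)
```

Storing character offsets rather than a copy of the text keeps each entity tied to its exact position in the original note.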


Components

Components of cTAKES are specifically trained for the clinical domain, and create rich linguistic and semantic annotations that can be utilized by clinical decision support systems and clinical research. These components include:
* Named Section identifier
* Sentence boundary detector
* Rule-based tokenizer
* Formatted list identifier
* Normalizer
* Context dependent tokenizer
* Part-of-speech tagger
* Phrasal chunker
* Dictionary lookup annotator
* Context annotator
* Negation detector
* Uncertainty detector
* Subject detector
* Dependency parser
* Patient smoking status identifier
* Drug mention annotator
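
To illustrate how components like these can be chained, the sketch below runs three toy annotators in sequence over a shared annotation store, each one building on the output of the previous step. It is a simplified illustration in Python and does not use the UIMA or cTAKES APIs; the function names, the regular expressions, and `TOY_DICTIONARY` are all assumptions made for this example.

```python
import re
from typing import Callable, Dict, List

Annotations = Dict[str, list]
Annotator = Callable[[str, Annotations], None]

# Made-up miniature dictionary; a real system would use a large terminology.
TOY_DICTIONARY = {"fever": "sign/symptom", "cough": "sign/symptom", "aspirin": "drug"}


def detect_sentences(text: str, ann: Annotations) -> None:
    """Toy sentence boundary detector: a sentence ends at '.', '!' or '?'."""
    ann["sentences"] = [
        (m.start(), m.end())
        for m in re.finditer(r"[^\s.!?][^.!?]*[.!?]?", text)
    ]


def tokenize(text: str, ann: Annotations) -> None:
    """Toy rule-based tokenizer: word runs or single punctuation marks."""
    tokens = []
    for s_begin, s_end in ann["sentences"]:
        for m in re.finditer(r"\w+|[^\w\s]", text[s_begin:s_end]):
            tokens.append((s_begin + m.start(), s_begin + m.end()))
    ann["tokens"] = tokens


def lookup_dictionary(text: str, ann: Annotations) -> None:
    """Toy dictionary lookup: match single tokens against TOY_DICTIONARY."""
    ann["entities"] = [
        {"begin": b, "end": e, "text": text[b:e],
         "type": TOY_DICTIONARY[text[b:e].lower()]}
        for b, e in ann["tokens"]
        if text[b:e].lower() in TOY_DICTIONARY
    ]


def run_pipeline(text: str, annotators: List[Annotator]) -> Annotations:
    """Run each annotator in order; later steps read earlier results."""
    ann: Annotations = {}
    for annotate in annotators:
        annotate(text, ann)
    return ann


result = run_pipeline("No fever. Reports cough.",
                      [detect_sentences, tokenize, lookup_dictionary])
for entity in result["entities"]:
    print(entity)
```

The order of the list passed to `run_pipeline` matters: the tokenizer needs sentence spans, and the dictionary lookup needs tokens.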


History

Development of cTAKES began at the Mayo Clinic in 2006. The development team, led by Dr. Guergana Savova and Dr. Christopher Chute, included physicians, computer scientists and software engineers. After its deployment, cTAKES became an integral part of Mayo's clinical data management infrastructure, processing more than 80 million clinical notes. When Dr. Savova moved to Boston Children's Hospital in early 2010, the core development team grew to include members there. Further external collaborations include:
* University of Colorado
* Brandeis University
* University of Pittsburgh
* University of California at San Diego

Such collaborations have extended cTAKES' capabilities into other areas such as temporal reasoning, clinical question answering, and coreference resolution for the clinical domain. In 2010, cTAKES was adopted by the i2b2 program and is a central component of SHARP Area 4. In 2013, cTAKES made its first release as an Apache incubator project: cTAKES 3.0. In March 2013, cTAKES became an Apache Top Level Project (TLP).


See also

* OpenNLP
* UIMA
* Electronic Health Record
* Unified Medical Language System


External links


cTAKES Official Website


from ASF

Abstract (JAMIA)

Open Health Natural Language Processing (OHNLP) Consortium

Strategic Health IT Advanced Research Projects (SHARP) Program

SHARP Area 4 - Secondary Use of EHR Data




was developed as part of the i2b2 project. It is a rule-based NLP pipeline, based on the GATE framework, developed by Informatics for Integrating Biology and the Bedside (i2b2).

Computational Language and Education Research toolkit (cleartk) (no longer maintained): developed at the University of Colorado at Boulder, it provides a framework for developing statistical NLP components in Java. It is built on top of Apache UIMA.
NegEx: a tool developed at the University of Pittsburgh to detect negated terms in clinical text. The system uses trigger terms to identify likely negation scenarios within a sentence.

ConText: an extension to NegEx, also developed by the University of Pittsburgh. ConText extends NegEx to detect not only negated concepts but also temporality (recent, historical, or hypothetical scenarios) and the subject of the experience (patient or other).
MetaMap (by the United States National Library of Medicine): a comprehensive concept tagging system built on top of the Unified Medical Language System (UMLS). It requires an active UMLS Metathesaurus License Agreement (and account) for use.
MedEx: a tool for extracting medication information from clinical text. MedEx processes free-text clinical records to recognize medication names and signature information, such as drug dose, frequency, route, and duration. Use is free with a UMLS license. It is a standalone application for Linux and Windows.
SecTag (section tagging hierarchy): recognizes note section headers using NLP, Bayesian, spelling correction, and scoring techniques. Use is free with either a UMLS or LOINC license.
Stanford Named Entity Recognizer (NER): Stanford’s NER is a Conditional Random Field sequence model, together with well-engineered features for Named Entity Recognition in English and German.
Stanford CoreNLP: an integrated suite of natural language processing tools for English in Java, including tokenization, part-of-speech tagging, named entity recognition, parsing, and coreference.
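
The trigger-term idea described for NegEx in the list above can be sketched in a few lines of Python. This is a deliberately reduced illustration and not the NegEx algorithm or its trigger list: it only checks for a trigger word within a fixed window before the concept, and the trigger set, the window size, and the function name `is_negated` are assumptions made for this example.

```python
import re
from typing import List

# Illustrative subset of trigger terms (an assumption, not the NegEx lexicon).
NEGATION_TRIGGERS = frozenset({"no", "not", "denies", "without"})


def _words(text: str) -> List[str]:
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def is_negated(sentence: str, concept: str, window: int = 5) -> bool:
    """Return True if a trigger term occurs within `window` tokens
    before an occurrence of `concept` in `sentence`."""
    tokens = _words(sentence)
    concept_tokens = _words(concept)
    if not concept_tokens:
        return False
    n = len(concept_tokens)
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == concept_tokens:
            preceding = tokens[max(0, i - window):i]
            if any(tok in NEGATION_TRIGGERS for tok in preceding):
                return True
    return False


print(is_negated("Patient denies chest pain.", "chest pain"))
print(is_negated("Patient reports chest pain.", "chest pain"))
```

A sketch this small misclassifies many real sentences, for example when a trigger word belongs to a different clause than the concept, which is one reason production tools rely on much richer rule sets.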