National Centre For Text Mining

The National Centre for Text Mining (NaCTeM) is a publicly funded text mining (TM) centre. It was established to provide support, advice, and information on TM technologies and to disseminate information from the larger TM community, while also providing tailored services and tools in response to the requirements of the United Kingdom academic community. The software tools and services which NaCTeM supplies allow researchers to apply text mining techniques to problems within their specific areas of interest – examples of these tools are highlighted below. In addition to providing services, the centre is also involved in, and makes significant contributions to, the text mining research community both nationally and internationally in initiatives such as Europe PubMed Central. The centre is located in the Manchester Institute of Biotechnology and is operated and organised by the Department of Computer Science, University of Manchester. NaCTeM contributes expertise in natural language processing and information extraction, including named-entity recognition and the extraction of complex relationships (or events) that hold between named entities, along with parallel and distributed data mining systems in biomedical and clinical applications.


Services


TerMine

TerMine is a domain-independent method for automatic term recognition which can be used to locate the most important terms in a document and automatically rank them.


AcroMine

AcroMine finds all known expanded forms of acronyms as they have appeared in Medline entries. Conversely, it can be used to find possible acronyms of expanded forms as they have previously appeared in Medline, and it disambiguates them.


Medie

Medie is an intelligent search engine for the semantic retrieval of sentences containing biomedical correlations from Medline abstracts.


Facta+

Facta+ is a Medline search engine for finding associations between biomedical concepts.


Facta+ Visualizer

Facta+ Visualizer is a web application that aids in understanding FACTA+ search results through intuitive graphical visualisation.


KLEIO

KLEIO is a faceted semantic information retrieval system over Medline abstracts.


Europe PMC EvidenceFinder

Europe PMC EvidenceFinder helps users to explore facts that involve entities of interest within the full-text articles of the Europe PubMed Central database.


EUPMC Evidence Finder for Anatomical entities with meta-knowledge

EUPMC Evidence Finder for Anatomical entities with meta-knowledge is similar to the Europe PMC EvidenceFinder, allowing exploration of facts involving anatomical entities within the full-text articles of the Europe PubMed Central database. Facts can be filtered according to various aspects of their interpretation (e.g., negation, certainty level, novelty).


Info-PubMed

Info-PubMed provides information and graphical representation of biomedical interactions extracted from Medline using deep semantic parsing technology. This is supplemented with a term dictionary consisting of over 200,000 protein/gene names and identification of disease types and organisms.


Clinical Trial Protocols (ASCOT)

ASCOT is an efficient, semantically-enhanced search application, customised for clinical trial documents.


History of Medicine (HOM)

HOM is a semantic search system over historical medical document archives.


Resources


BioLexicon

BioLexicon is a large-scale terminological resource for the biomedical domain.


GENIA

GENIA is a collection of reference materials for the development of biomedical text mining systems.


GREC

GREC is a semantically annotated corpus of Medline abstracts intended for training IE systems and/or resources which are used to extract events from biomedical literature.


Metabolite and Enzyme Corpus

This is a corpus of Medline abstracts annotated by experts with metabolite and enzyme names.


Anatomy Corpora

A collection of corpora manually annotated with fine-grained, species-independent anatomical entities, to facilitate the development of text mining systems that can carry out detailed and comprehensive analyses of biomedical scientific text.


Meta-knowledge corpus

This is an enrichment of the GENIA Event corpus, in which events are enriched with various levels of information pertaining to their interpretation. The aim is to allow systems to be trained that can distinguish events that describe factual information from those that describe experimental analyses, definite information from speculated information, etc.


Projects


Argo

The objective of the Argo project is to develop a workbench for analysing (primarily annotating) textual data. The workbench, which is accessed as a web application, supports the combination of elementary text-processing components to form comprehensive processing workflows. It provides functionality to manually intervene in the otherwise automatic process of annotation by correcting or creating new annotations, and facilitates user collaboration by providing sharing capabilities for user-owned resources. Argo benefits users such as text-analysis designers by providing an integrated environment for the development of processing workflows; annotators/curators by providing manual annotation functionalities supported by automatic pre-processing and post-processing; and developers by providing a workbench for testing and evaluating text analytics.


Big Mechanism

Big mechanisms are large, explanatory models of complicated systems in which interactions have important causal effects. Whilst the collection of big data is increasingly automated, the creation of big mechanisms remains a largely human effort, which is becoming increasingly challenging owing to the fragmentation and distribution of knowledge. The ability to automate the construction of big mechanisms could have a major impact on scientific research. This project is one of a number that make up the Big Mechanism programme, funded by DARPA. Its aim is to assemble an overarching big mechanism from the literature and prior experiments and to utilise this for the probabilistic interpretation of new patient panomics data. The project integrates machine reading of the cancer literature with probabilistic reasoning across cancer claims using specially designed ontologies, computational modelling of cancer mechanisms (pathways), automated hypothesis generation to extend knowledge of the mechanisms, and a 'Robot Scientist' that performs experiments to test the hypotheses. A repetitive cycle of text mining, modelling, experimental testing, and worldview updating is intended to lead to increased knowledge about cancer mechanisms.


Pathtext

Pathtext/Refine is a system designed to integrate a pathway visualiser, text mining systems and annotation tools.


COPIOUS

This project aims to produce a knowledge repository of Philippine biodiversity by combining the domain-relevant expertise and resources of Philippine partners with the text mining-based big data analytics of the University of Manchester's National Centre for Text Mining. The repository will be a synergy of different types of information, e.g., taxonomic, occurrence, ecological, biomolecular, biochemical, thus providing users with a comprehensive view on species of interest that will allow them to (1) carry out predictive analysis on species distributions, and (2) investigate potential medicinal applications of natural products derived from Philippine species.


Europe PMC Project

This is a collaboration with the Text-Mining group at the European Bioinformatics Institute (EBI) and Mimas, forming a work package in the Europe PubMed Central project (formerly UKPMC) hosted and coordinated by the British Library. Europe PMC, as a whole, forms a European version of the PubMed Central paper repository, in collaboration with the National Institutes of Health (NIH) in the United States. Europe PMC is funded by a consortium of key biomedical research funders. NaCTeM's contribution to this major project is the application of text mining solutions to enhance information retrieval and knowledge discovery. As such, it applies technology developed in other NaCTeM projects on a large scale and in a prominent resource for the biomedicine community.


Mining Biodiversity

This project aims to transform the Biodiversity Heritage Library (BHL) into a next-generation social digital library resource to facilitate the study and discussion (via social media integration) of legacy science documents on biodiversity by a worldwide community and to raise awareness of the changes in biodiversity over time in the general public. The project integrates novel text mining methods, visualisation, crowdsourcing and social media into the BHL. The resulting digital resource will provide fully interlinked and indexed access to the full content of BHL library documents, via semantically enhanced and interactive browsing and searching capabilities, allowing users to locate precisely the information of interest to them in an easy and efficient manner.


Mining for Public Health

This project aims to conduct novel research in text mining and machine learning to transform the way in which evidence-based public health (EBPH) reviews are conducted. The aims of the project are to develop new unsupervised text mining methods for deriving term similarities, to support screening while searching in EBPH reviews, and to develop new algorithms for ranking and visualising meaningful associations of multiple types in a dynamic and iterative manner. These newly developed methods will be evaluated in EBPH reviews, based on implementation of a pilot, to ascertain the level of transformation in EBPH reviewing.


External links

* http://www.nactem.ac.uk