A relationship extraction task requires the detection and classification of semantic relationship mentions within a set of artifacts, typically from text or XML documents. The task is very similar to that of information extraction (IE), but IE additionally requires the removal of repeated relations (disambiguation) and generally refers to the extraction of many different relationships.


Concept and applications

The concept of relationship extraction was first introduced during the 7th Message Understanding Conference in 1998. Relationship extraction involves identifying relations between entities, and it usually focuses on binary relations. Application domains where it is useful include gene-disease relationships and protein-protein interactions. Current relationship extraction studies use machine learning technologies, which treat relationship extraction as a classification problem. Never-Ending Language Learning is a semantic machine learning system, developed by a research team at Carnegie Mellon University, that extracts relationships from the open web.
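
The classification framing can be illustrated with a minimal sketch. It assumes that each candidate entity pair is represented by the words found between the two mentions and that a small naive Bayes model assigns a relation label. The training phrases, the label names, and the functions `train` and `classify` are hypothetical and chosen only for illustration; they do not reproduce any particular published system.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy training data: the words between two entity mentions,
# paired with an illustrative relation label.
TRAIN = [
    ("is associated with", "gene_disease"),
    ("mutations cause", "gene_disease"),
    ("is linked to", "gene_disease"),
    ("binds to", "protein_protein"),
    ("interacts with", "protein_protein"),
    ("forms a complex with", "protein_protein"),
]


def train(examples):
    """Count labels and per-label word frequencies."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for context, label in examples:
        label_counts[label] += 1
        for word in context.lower().split():
            word_counts[label][word] += 1
            vocab.add(word)
    return label_counts, word_counts, vocab


def classify(context, model):
    """Return the label with the highest naive Bayes log-score."""
    label_counts, word_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in label_counts.items():
        score = math.log(count / total)  # log prior
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in context.lower().split():
            # Add-one smoothing so unseen words do not zero out the score.
            score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


MODEL = train(TRAIN)
print(classify("interacts with", MODEL))
print(classify("mutations cause", MODEL))
```

Real systems use far richer features (entity types, dependency paths, learned embeddings) and much larger training sets; the sketch only shows how a relation mention becomes an input to a classifier.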


Approaches

Several methods are used to extract relationships. Text-based relationship extraction either relies on pretrained relationship structure information or learns the structure in order to reveal relationships. Another approach uses domain ontologies. A further approach relies on visual detection: the user controls software that automatically permutes a data table, and meaningful relationships among the parametric values of the listed objects become apparent as the objects shift position. The poor coverage, rarity and development cost of structured resources such as semantic lexicons (e.g. WordNet, UMLS) and domain ontologies (e.g. the Gene Ontology) have given rise to new approaches based on broad, dynamic background knowledge on the Web. For instance, the ARCHILES technique uses only Wikipedia and search engine page counts to acquire coarse-grained relations for constructing lightweight ontologies.

The relationships can be represented using a variety of formalisms and languages. One such representation language for data on the Web is RDF. More recently, end-to-end systems that jointly learn to extract entity mentions and their semantic relations have been proposed, with strong potential to obtain high performance. Most of the reported systems have demonstrated their approach on English datasets. However, data and systems have also been described for other languages, e.g., Russian and Vietnamese.
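
As a sketch of how an extracted binary relation might be written down as RDF, the example below serializes a (subject, predicate, object) tuple as a single N-Triples line. The `Relation` type, the `to_ntriple` function, and the `http://example.org/` namespace are assumptions made for illustration; a real pipeline would map entities and relations to IRIs from an established vocabulary or ontology.

```python
from typing import NamedTuple
from urllib.parse import quote

# Hypothetical namespace used only for this illustration.
BASE = "http://example.org/"


class Relation(NamedTuple):
    """A binary relation between two entities."""
    subject: str
    predicate: str
    obj: str


def to_ntriple(rel: Relation, base: str = BASE) -> str:
    """Serialize a relation as one N-Triples statement with IRI terms."""
    def iri(name: str) -> str:
        # Replace spaces and percent-encode anything unsafe in an IRI path.
        return "<" + base + quote(name.replace(" ", "_"), safe="") + ">"

    return f"{iri(rel.subject)} {iri(rel.predicate)} {iri(rel.obj)} ."


# Illustrative gene-disease relation.
print(to_ntriple(Relation("BRCA1", "associated_with", "breast cancer")))
```

Each statement names the two entities and the relation between them, which matches the binary relations that relationship extraction usually targets.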


Datasets

Researchers have constructed multiple datasets for benchmarking relationship extraction methods. One such dataset is DocRED, a document-level relationship extraction dataset released in 2019. It uses relations from Wikidata and text from the English Wikipedia. The dataset has been used by other researchers, and a prediction competition has been set up at CodaLab.


See also

* Text analytics
* Semantic analytics
* Semantic role labeling
* Information extraction
* Business intelligence

