BabelNet is a
multilingual
Multilingualism is the use of more than one language, either by an individual speaker or by a group of speakers. When the languages are just two, it is usually called bilingualism. It is believed that multilingual speakers outnumber monolin ...
lexical-semantic
knowledge graph
In knowledge representation and reasoning, a knowledge graph is a knowledge base that uses a Graph (discrete mathematics), graph-structured data model or topology to represent and operate on data. Knowledge graphs are often used to store interl ...
,
ontology
Ontology is the philosophical study of existence, being. It is traditionally understood as the subdiscipline of metaphysics focused on the most general features of reality. As one of the most fundamental concepts, being encompasses all of realit ...
and encyclopedic
dictionary
A dictionary is a listing of lexemes from the lexicon of one or more specific languages, often arranged Alphabetical order, alphabetically (or by Semitic root, consonantal root for Semitic languages or radical-and-stroke sorting, radical an ...
developed at the
NLP group of the
Sapienza University of Rome
The Sapienza University of Rome (), formally the Università degli Studi di Roma "La Sapienza", abbreviated simply as Sapienza ('Wisdom'), is a Public university, public research university located in Rome, Italy. It was founded in 1303 and is ...
under the supervision of
Roberto Navigli.
[R. Navigli and S. P Ponzetto. 2012]
BabelNet: The Automatic Construction, Evaluation and Application of a Wide-Coverage Multilingual Semantic Network
Artificial Intelligence, 193, Elsevier, pp. 217-250. BabelNet was automatically created by linking
Wikipedia
Wikipedia is a free content, free Online content, online encyclopedia that is written and maintained by a community of volunteers, known as Wikipedians, through open collaboration and the wiki software MediaWiki. Founded by Jimmy Wales and La ...
to the most popular computational
lexicon
A lexicon (plural: lexicons, rarely lexica) is the vocabulary of a language or branch of knowledge (such as nautical or medical). In linguistics, a lexicon is a language's inventory of lexemes. The word ''lexicon'' derives from Greek word () ...
of the
English language
English is a West Germanic language that developed in early medieval England and has since become a English as a lingua franca, global lingua franca. The namesake of the language is the Angles (tribe), Angles, one of the Germanic peoples th ...
,
WordNet
WordNet is a lexical database of semantic relations between words that links words into semantic relations including synonyms, hyponyms, and meronyms. The synonyms are grouped into ''synsets'' with short definitions and usage examples. It can thu ...
. The integration is done using an automatic mapping and by filling in lexical gaps in resource-poor
language
Language is a structured system of communication that consists of grammar and vocabulary. It is the primary means by which humans convey meaning, both in spoken and signed language, signed forms, and may also be conveyed through writing syste ...
s by using
statistical machine translation
Statistical machine translation (SMT) is a machine translation approach where translations are generated on the basis of statistical models whose parameters are derived from the analysis of bilingual text corpora. The statistical approach contra ...
. The result is an
encyclopedic dictionary that provides
concept
A concept is an abstract idea that serves as a foundation for more concrete principles, thoughts, and beliefs.
Concepts play an important role in all aspects of cognition. As such, concepts are studied within such disciplines as linguistics, ...
s and
named entities lexicalized
In linguistics, lexicalization is the process of adding words, set phrases, or word patterns to a language's lexicon.
Whether ''word formation'' and ''lexicalization'' refer to the same process is controversial within the field of linguistics. Mo ...
in many languages and connected with large amounts of
semantic relations
Semantics is the study of linguistic meaning. It examines what meaning is, how words get their meaning, and how the meaning of a complex expression depends on its parts. Part of this process involves the distinction between sense and referenc ...
. Additional lexicalizations and definitions are added by linking to free-license wordnets, OmegaWiki, the English
Wiktionary
Wiktionary (, ; , ; rhyming with "dictionary") is a multilingual, web-based project to create a free content dictionary of terms (including words, phrases, proverbs, linguistic reconstructions, etc.) in all natural languages and in a number o ...
,
Wikidata
Wikidata is a collaboratively edited multilingual knowledge graph hosted by the Wikimedia Foundation. It is a common source of open data that Wikimedia projects such as Wikipedia, and anyone else, are able to use under the CC0 public domain ...
,
FrameNet
FrameNet is a group of online lexical databases based upon the theory of meaning known as Frame semantics, developed by linguist Charles J. Fillmore. The project's fundamental notion is simple: most words' meanings may be best understood in ter ...
,
VerbNet and others. Similarly to WordNet, BabelNet groups
word
A word is a basic element of language that carries semantics, meaning, can be used on its own, and is uninterruptible. Despite the fact that language speakers often have an intuitive grasp of what a word is, there is no consensus among linguist ...
s in different languages into sets of
synonyms
A synonym is a word, morpheme, or phrase that means precisely or nearly the same as another word, morpheme, or phrase in a given language. For example, in the English language, the words ''begin'', ''start'', ''commence'', and ''initiate'' are a ...
, called ''Babel
synsets
In metadata, a synonym ring or synset, is a group of data elements that are considered semantically equivalent for the purposes of information retrieval. These data elements are frequently found in different metadata registries. Although a group ...
''. For each Babel synset, BabelNet provides short definitions (called
glosses) in many languages harvested from both WordNet and Wikipedia.
Statistics of BabelNet
, BabelNet (version 5.3) covers 600
language
Language is a structured system of communication that consists of grammar and vocabulary. It is the primary means by which humans convey meaning, both in spoken and signed language, signed forms, and may also be conveyed through writing syste ...
s. It contains almost 23 million synsets and around 1.7 billion
word sense
In linguistics, a word sense is one of the meanings of a word. For example, a dictionary may have over 50 different senses of the word "play", each of these having a different meaning based on the context of the word's usage in a sentence, as f ...
s (regardless of their language). Each Babel synset contains 2 synonyms per language, i.e., word senses, on average. The semantic network includes all the lexico-semantic relations from WordNet (
hypernymy and hyponymy
Hypernymy and hyponymy are the wikt:Wiktionary:Semantic relations, semantic relations between a generic term (''hypernym'') and a more specific term (''hyponym''). The hypernym is also called a ''supertype'', ''umbrella term'', or ''blanket term ...
,
meronymy
In linguistics, meronymy () is a semantic relation between a meronym denoting a part and a holonym denoting a whole. In simpler terms, a meronym is in a ''part-of'' relationship with its holonym. For example, ''finger'' is a meronym of ''hand, ...
and
holonymy
In linguistics, meronymy () is a semantic relation between a meronym denoting a part and a holonym denoting a whole. In simpler terms, a meronym is in a ''part-of'' relationship with its holonym. For example, ''finger'' is a meronym of ''hand, ...
,
antonymy
In lexical semantics, opposites are words lying in an inherently incompatible binary relationship. For example, something that is ''even'' entails that it is not ''odd''. It is referred to as a 'binary' relationship because there are two members i ...
and
synonymy, etc., totaling around 364,000 relation edges) as well as an underspecified relatedness relation from Wikipedia (totaling around 1.9 billion edges).
Version 5.3 also associates around 61 million images with Babel synsets and provides a Lemon
RDF encoding of the resource, available via a
SPARQL endpoint. 2.67 million synsets are assigned domain labels.
Applications
BabelNet has been shown to enable multilingual
natural language processing
Natural language processing (NLP) is a subfield of computer science and especially artificial intelligence. It is primarily concerned with providing computers with the ability to process data encoded in natural language and is thus closely related ...
applications. The lexicalized
knowledge
Knowledge is an Declarative knowledge, awareness of facts, a Knowledge by acquaintance, familiarity with individuals and situations, or a Procedural knowledge, practical skill. Knowledge of facts, also called propositional knowledge, is oft ...
available in BabelNet has been shown to obtain state-of-the-art results in:
*
Semantic relatedness
Semantic similarity is a metric defined over a set of documents or terms, where the idea of distance between items is based on the likeness of their meaning or semantic content as opposed to lexicographical similarity. These are mathematical too ...
,
* Multilingual
word-sense disambiguation
Word-sense disambiguation is the process of identifying which sense of a word is meant in a sentence or other segment of context. In human language processing and cognition, it is usually subconscious.
Given that natural language requires ref ...
and
entity linking
In natural language processing, Entity Linking, also referred to as named-entity disambiguation (NED), named-entity recognition and disambiguation (NERD), named-entity normalization (NEN), or Concept Recognition, is the task of assigning a unique ...
, with the
Babelfy system,
*
Video games with a purpose.
Prizes and acknowledgments
BabelNet received the META prize 2015 for "groundbreaking work in overcoming language barriers through a multilingual lexicalised semantic network and ontology making use of heterogeneous data sources".
The Artificial Intelligence Journal paper that describes BabelNet
won the Prominent Paper Award in 2017.
BabelNet featured prominently in a ''
Time
Time is the continuous progression of existence that occurs in an apparently irreversible process, irreversible succession from the past, through the present, and into the future. It is a component quantity of various measurements used to sequ ...
'' magazine article
[Steinmetz, Katy (May 12, 2016)]
"Redefining the Modern Dictionary"
''Time
Time is the continuous progression of existence that occurs in an apparently irreversible process, irreversible succession from the past, through the present, and into the future. It is a component quantity of various measurements used to sequ ...
''. 187: 20-21. about the new age of innovative and up-to-date lexical knowledge resources available on the Web.
See also
*
Babelfy
*
EuroWordNet
*
Knowledge acquisition
*
Linguistic Linked Open Data
*
Semantic network
A semantic network, or frame network is a knowledge base that represents semantic relations between concepts in a network. This is often used as a form of knowledge representation. It is a directed or undirected graph consisting of vertices, ...
*
Semantic relatedness
Semantic similarity is a metric defined over a set of documents or terms, where the idea of distance between items is based on the likeness of their meaning or semantic content as opposed to lexicographical similarity. These are mathematical too ...
*
Wikidata
Wikidata is a collaboratively edited multilingual knowledge graph hosted by the Wikimedia Foundation. It is a common source of open data that Wikimedia projects such as Wikipedia, and anyone else, are able to use under the CC0 public domain ...
*
Wiktionary
Wiktionary (, ; , ; rhyming with "dictionary") is a multilingual, web-based project to create a free content dictionary of terms (including words, phrases, proverbs, linguistic reconstructions, etc.) in all natural languages and in a number o ...
*
Word sense disambiguation
Word-sense disambiguation is the process of identifying which sense of a word is meant in a sentence or other segment of context. In human language processing and cognition, it is usually subconscious.
Given that natural language requires re ...
*
Word sense induction
*
UBY
References
External links
*
{{Natural language processing
Lexical databases
Knowledge bases
Ontology (information science)
Knowledge representation
Computational linguistics
Online dictionaries
Multilingualism