BabelNet

BabelNet is a multilingual lexicalized semantic network and ontology developed at the NLP group of the Sapienza University of Rome (R. Navigli and S. P. Ponzetto. 2012. BabelNet: The Automatic Construction, Evaluation and Application of a Wide-Coverage Multilingual Semantic Network. Artificial Intelligence, 193, Elsevier, pp. 217-250). BabelNet was automatically created by linking Wikipedia to WordNet, the most popular computational lexicon of the English language. The integration is done using an automatic mapping and by filling in lexical gaps in resource-poor languages using statistical machine translation. The result is an encyclopedic dictionary that provides concepts and named entities lexicalized in many languages and connected by a large number of semantic relations. Additional lexicalizations and definitions are added by linking to free-license wordnets, OmegaWiki, the English Wiktionary, Wikidata, FrameNet, VerbNet and others. Like WordNet, BabelNet groups words in different languages into sets of synonyms, called ''Babel synsets''. For each Babel synset, BabelNet provides short definitions (called glosses) in many languages harvested from both WordNet and Wikipedia.
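The grouping described above can be pictured as a small data structure. The following Python sketch is illustrative only: the class name `BabelSynsetSketch`, its fields, the identifier `example:0001`, and the sample lemmas and gloss are assumptions made for this example, not BabelNet's actual API or data.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class BabelSynsetSketch:
    """Hypothetical model of one concept lexicalized in several languages."""

    synset_id: str
    # language code -> set of synonymous lemmas in that language
    senses: Dict[str, Set[str]] = field(default_factory=lambda: defaultdict(set))
    # language code -> list of short definitions (glosses)
    glosses: Dict[str, List[str]] = field(default_factory=lambda: defaultdict(list))

    def add_sense(self, lang: str, lemma: str) -> None:
        self.senses[lang].add(lemma)

    def add_gloss(self, lang: str, text: str) -> None:
        self.glosses[lang].append(text)

    def lemmas(self, lang: str) -> List[str]:
        # .get avoids creating an empty entry for an unknown language
        return sorted(self.senses.get(lang, ()))

    def languages(self) -> List[str]:
        return sorted(self.senses)


# Illustrative data only (not taken from BabelNet).
synset = BabelSynsetSketch("example:0001")
synset.add_sense("EN", "begin")
synset.add_sense("EN", "start")
synset.add_sense("IT", "iniziare")
synset.add_gloss("EN", "To set something in motion.")

print(synset.lemmas("EN"))   # ['begin', 'start']
print(synset.languages())    # ['EN', 'IT']
```

Keying both senses and glosses by language keeps a single concept identifier shared across all languages, which is the property that distinguishes a multilingual synset from a per-language one.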


Statistics of BabelNet

BabelNet (version 5.0) covers 500 languages. It contains almost 20 million synsets and around 1.4 billion word senses (regardless of their language). Each Babel synset contains on average 2 synonyms (i.e., word senses) per language. The semantic network includes all the lexico-semantic relations from WordNet (hypernymy and hyponymy, meronymy and holonymy, antonymy and synonymy, etc., totaling around 364,000 relation edges) as well as an underspecified relatedness relation from Wikipedia (totaling around 1.3 billion edges). Version 5.0 also associates around 51 million images with Babel synsets and provides a Lemon RDF encoding of the resource, available via a SPARQL endpoint. Domain labels are assigned to 2.67 million synsets.
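As a rough worked calculation, the rounded figures above imply an average number of senses per synset and show how small the WordNet-derived share of edges is. The Python sketch below uses only the approximate numbers quoted in this section; the constant and function names are chosen for this example, and the languages-per-synset estimate assumes that the average of 2 senses per language is taken over the languages in which a synset is lexicalized.

```python
# Rounded figures quoted in this section (version 5.0).
SYNSETS = 20_000_000             # "almost 20 million"
SENSES = 1_400_000_000           # "around 1.4 billion"
AVG_SENSES_PER_LANGUAGE = 2
WORDNET_EDGES = 364_000          # lexico-semantic relation edges
WIKIPEDIA_EDGES = 1_300_000_000  # underspecified relatedness edges


def derived_stats() -> dict:
    """Back-of-the-envelope ratios derived from the rounded figures."""
    senses_per_synset = SENSES / SYNSETS
    # Assumption: the per-language average covers only lexicalized languages.
    languages_per_synset = senses_per_synset / AVG_SENSES_PER_LANGUAGE
    wordnet_share = WORDNET_EDGES / (WORDNET_EDGES + WIKIPEDIA_EDGES)
    return {
        "senses_per_synset": senses_per_synset,
        "languages_per_synset": languages_per_synset,
        "wordnet_edge_share": wordnet_share,
    }


stats = derived_stats()
print(stats["senses_per_synset"])            # 70.0
print(stats["languages_per_synset"])         # 35.0
print(f"{stats['wordnet_edge_share']:.3%}")  # 0.028%
```

Because the inputs are rounded, these ratios are order-of-magnitude estimates rather than published statistics.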


Applications

BabelNet has been shown to enable multilingual natural language processing applications. The lexicalized knowledge available in BabelNet has been shown to obtain state-of-the-art results in:
* semantic relatedness
* multilingual word sense disambiguation
* multilingual word sense disambiguation and entity linking with the Babelfy system
* video games with a purpose


Prizes and acknowledgments

BabelNet received the META prize 2015 for "groundbreaking work in overcoming language barriers through a multilingual lexicalised semantic network and ontology making use of heterogeneous data sources". BabelNet featured prominently in a ''Time'' magazine article about the new age of innovative and up-to-date lexical knowledge resources available on the Web (Steinmetz, Katy (May 12, 2016). "Redefining the Modern Dictionary". ''Time''. 187: 20-21).


See also

* Babelfy
* EuroWordNet
* Knowledge acquisition
* Linguistic Linked Open Data
* Semantic network
* Semantic relatedness
* Wikidata
* Wiktionary
* Word sense disambiguation
* Word sense induction
* UBY

