
BabelNet is a multilingual lexical-semantic knowledge graph, ontology and encyclopedic dictionary developed at the NLP group of the Sapienza University of Rome under the supervision of Roberto Navigli (R. Navigli and S. P. Ponzetto, 2012, "BabelNet: The Automatic Construction, Evaluation and Application of a Wide-Coverage Multilingual Semantic Network", Artificial Intelligence, 193, Elsevier, pp. 217-250).
BabelNet was automatically created by linking Wikipedia to WordNet, the most popular computational lexicon of the English language. The integration uses an automatic mapping, and lexical gaps in resource-poor languages are filled in with statistical machine translation. The result is an encyclopedic dictionary that provides concepts and named entities lexicalized in many languages and connected by a large number of semantic relations. Additional lexicalizations and definitions are added by linking to free-license wordnets, OmegaWiki, the English Wiktionary, Wikidata, FrameNet, VerbNet and others. Similarly to WordNet, BabelNet groups words in different languages into sets of synonyms, called ''Babel synsets''. For each Babel synset, BabelNet provides short definitions (called glosses) in many languages harvested from both WordNet and Wikipedia.
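The grouping described above, in which one synset holds synonyms and glosses for several languages, can be pictured with a small data structure. The following is only a minimal sketch under stated assumptions: the class `SynsetSketch`, the function `build_index`, the identifiers `toy:0001` and `toy:0002`, and all example words are hypothetical and invented for illustration; none of them are part of BabelNet or its API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class SynsetSketch:
    """Hypothetical stand-in for a Babel synset (not the BabelNet API)."""
    synset_id: str                                               # made-up identifier
    lemmas: Dict[str, Set[str]] = field(default_factory=dict)    # language -> synonyms
    glosses: Dict[str, List[str]] = field(default_factory=dict)  # language -> definitions

    def add_lemma(self, lang: str, lemma: str) -> None:
        # Synonyms are stored per language, so one concept spans many languages.
        self.lemmas.setdefault(lang, set()).add(lemma)

    def add_gloss(self, lang: str, text: str) -> None:
        self.glosses.setdefault(lang, []).append(text)


def build_index(synsets: List[SynsetSketch]) -> Dict[Tuple[str, str], List[str]]:
    """Map (language, lemma) to the ids of every synset containing that lemma."""
    index: Dict[Tuple[str, str], List[str]] = {}
    for synset in synsets:
        for lang, words in synset.lemmas.items():
            for word in words:
                index.setdefault((lang, word), []).append(synset.synset_id)
    return index


# Toy data, invented for illustration only.
car = SynsetSketch("toy:0001")
for lang, lemma in [("en", "car"), ("en", "automobile"),
                    ("it", "automobile"), ("it", "macchina")]:
    car.add_lemma(lang, lemma)
car.add_gloss("en", "A motor vehicle with four wheels.")

machine = SynsetSketch("toy:0002")
machine.add_lemma("en", "machine")
machine.add_lemma("it", "macchina")

index = build_index([car, machine])
# The Italian word belongs to both toy synsets, i.e. it has two word senses.
print(index[("it", "macchina")])
```

Indexing by (language, lemma) pairs rather than by bare strings keeps identical spellings in different languages apart, which is one plausible way to model a word sense as a word paired with a synset.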


Statistics of BabelNet

BabelNet (version 5.3) covers 600 languages. It contains almost 23 million synsets and around 1.7 billion word senses (regardless of their language). Each Babel synset contains on average 2 synonyms (i.e., word senses) per language. The semantic network includes all the lexico-semantic relations from WordNet (hypernymy and hyponymy, meronymy and holonymy, antonymy and synonymy, etc., totaling around 364,000 relation edges) as well as an underspecified relatedness relation from Wikipedia (totaling around 1.9 billion edges). Version 5.3 also associates around 61 million images with Babel synsets and provides a Lemon RDF encoding of the resource, available via a SPARQL endpoint. Domain labels are assigned to 2.67 million synsets.
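A few rough ratios follow from the approximate figures quoted above. The sketch below is back-of-the-envelope arithmetic only: the constants are the rounded values from the text, the variable names are illustrative, and the estimate of languages per synset assumes that the average of 2 synonyms per language applies uniformly, which the text does not state.

```python
# Rounded figures quoted in the text for BabelNet 5.3 (approximate by nature).
SYNSETS = 23_000_000                 # "almost 23 million" synsets
WORD_SENSES = 1_700_000_000          # "around 1.7 billion" word senses
SENSES_PER_LANGUAGE = 2              # average synonyms per language in a synset
DOMAIN_LABELED = 2_670_000           # synsets with domain labels
WORDNET_EDGES = 364_000              # lexico-semantic relation edges from WordNet
WIKIPEDIA_EDGES = 1_900_000_000      # relatedness edges from Wikipedia

# Average number of word senses per synset, across all languages.
senses_per_synset = WORD_SENSES / SYNSETS

# Assumption: dividing by the per-language average gives a rough number of
# languages in which a typical synset is lexicalized.
languages_per_synset = senses_per_synset / SENSES_PER_LANGUAGE

# Share of synsets that carry a domain label.
labeled_share = DOMAIN_LABELED / SYNSETS

# Share of all relation edges that come from WordNet.
wordnet_edge_share = WORDNET_EDGES / (WORDNET_EDGES + WIKIPEDIA_EDGES)

print(f"senses per synset:    {senses_per_synset:.1f}")
print(f"languages per synset: {languages_per_synset:.1f}")
print(f"domain-labeled share: {labeled_share:.1%}")
print(f"WordNet edge share:   {wordnet_edge_share:.3%}")
```

Under these assumptions a synset averages roughly 74 word senses, and fewer than one relation edge in a thousand originates from WordNet, which illustrates how heavily the graph relies on the Wikipedia-derived relatedness relation.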


Applications

BabelNet has been shown to enable multilingual natural language processing applications. The lexicalized knowledge available in BabelNet has been shown to obtain state-of-the-art results in:
* Semantic relatedness
* Multilingual word-sense disambiguation and entity linking, with the Babelfy system
* Video games with a purpose


Prizes and acknowledgments

BabelNet received the META prize 2015 for "groundbreaking work in overcoming language barriers through a multilingual lexicalised semantic network and ontology making use of heterogeneous data sources". The Artificial Intelligence Journal paper that describes BabelNet won the Prominent Paper Award in 2017. BabelNet featured prominently in a ''Time'' magazine article about the new age of innovative and up-to-date lexical knowledge resources available on the Web (Katy Steinmetz, "Redefining the Modern Dictionary", ''Time'', May 12, 2016, 187: 20-21).


See also

* Babelfy
* EuroWordNet
* Knowledge acquisition
* Linguistic Linked Open Data
* Semantic network
* Semantic relatedness
* Wikidata
* Wiktionary
* Word sense disambiguation
* Word sense induction
* UBY

