lexical database of semantic relations between words in more than 200 languages. WordNet links words into semantic relations including synonyms, hyponyms, and meronyms. The synonyms are grouped into ''synsets
'' with short definitions and usage examples. WordNet can thus be seen as a combination and extension of a dictionary and thesaurus. While it is accessible to human users via a web browser, its primary use is in automatic text analysis and artificial intelligence applications. WordNet was first created in the English language, and the English WordNet database and software tools have been released under a BSD-style license and are freely available for download from the WordNet website.
History and team members
WordNet was first created in English only in the Cognitive Science Laboratory of Princeton University under the direction of psychology professor George Armitage Miller starting in 1985 and was later directed by Christiane Fellbaum. The project was initially funded by the U.S. Office of Naval Research and later also by other U.S. government agencies including DARPA, the National Science Foundation, the Disruptive Technology Office (formerly the Advanced Research and Development Activity), and REFLEX. George Miller and Christiane Fellbaum were awarded the 2006 Antonio Zampolli Prize for their work with WordNet.
The Global WordNet Association is a non-commercial organization that provides a platform for discussing, sharing and connecting WordNets for all languages in the world.
The database contains 155,327 words organized in 175,979 synsets for a total of 207,016 word-sense pairs; in compressed form, it is about 12 megabytes in size.
WordNet includes the lexical categories nouns, verbs, adjectives and adverbs but ignores prepositions, determiners and other function words.
Words from the same lexical category that are roughly synonymous are grouped into synsets. Synsets include simplex words as well as collocations like "eat out" and "car pool." The different senses of a polysemous word form are assigned to different synsets. The meaning of a synset is further clarified with a short defining ''gloss'' and one or more usage examples. An example adjective synset is:
: good, right, ripe – (most suitable or right for a particular purpose; "a good time to plant tomatoes"; "the right time to act"; "the time is ripe for great sociological changes")
All synsets are connected to other synsets by means of semantic relations. These relations, which are not all shared by all lexical categories, include:
* Nouns
**''hypernyms'': ''Y'' is a hypernym of ''X'' if every ''X'' is a (kind of) ''Y'' (''canine'' is a hypernym of ''dog'')
**''hyponyms'': ''Y'' is a hyponym of ''X'' if every ''Y'' is a (kind of) ''X'' (''dog'' is a hyponym of ''canine'')
**''coordinate terms'': ''Y'' is a coordinate term of ''X'' if ''X'' and ''Y'' share a hypernym (''wolf'' is a coordinate term of ''dog'', and ''dog'' is a coordinate term of ''wolf'')
**''meronym'': ''Y'' is a meronym of ''X'' if ''Y'' is a part of ''X'' (''window'' is a meronym of ''building'')
**''holonym'': ''Y'' is a holonym of ''X'' if ''X'' is a part of ''Y'' (''building'' is a holonym of ''window'')
* Verbs
**''hypernym'': the verb ''Y'' is a hypernym of the verb ''X'' if the activity ''X'' is a (kind of) ''Y'' (''to perceive'' is a hypernym of ''to listen'')
**''troponym'': the verb ''Y'' is a troponym of the verb ''X'' if the activity ''Y'' is doing ''X'' in some manner (''to lisp'' is a troponym of ''to talk'')
**''entailment'': the verb ''Y'' is entailed by ''X'' if by doing ''X'' you must be doing ''Y'' (''to sleep'' is entailed by ''to snore'')
**''coordinate terms'': those verbs sharing a common hypernym (''to lisp'' and ''to yell'')
These semantic relations hold among all members of the linked synsets. Individual synset members (words) can also be connected with lexical relations. For example, (one sense of) the noun "director" is linked to (one sense of) the verb "direct" from which it is derived via a "morphosemantic" link.
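The noun relations listed above can be illustrated with a small sketch. It does not use the WordNet database or any WordNet API; the dictionary `HYPERNYMS` and the function names are hypothetical, and the three entries are toy data based on the ''dog''/''wolf''/''canine'' examples. It shows that hyponymy is simply the inverse of hypernymy and that coordinate terms are synsets sharing a hypernym.

```python
# Hypothetical toy data: each synset name maps to its direct hypernym(s).
HYPERNYMS = {
    "dog": ["canine"],
    "wolf": ["canine"],
    "canine": ["carnivore"],
}


def hyponyms_of(synset, hypernyms=HYPERNYMS):
    """Return synsets whose direct hypernym is `synset` (the inverse relation)."""
    return sorted(s for s, parents in hypernyms.items() if synset in parents)


def coordinate_terms(synset, hypernyms=HYPERNYMS):
    """Return synsets that share at least one direct hypernym with `synset`."""
    parents = set(hypernyms.get(synset, []))
    return sorted(
        other
        for other, other_parents in hypernyms.items()
        if other != synset and parents & set(other_parents)
    )


if __name__ == "__main__":
    print(hyponyms_of("canine"))
    print(coordinate_terms("dog"))
```

Storing only one direction of each relation and deriving the inverse on demand is a design choice of this sketch, not a description of how the WordNet files are laid out.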
The morphology functions of the software distributed with the database try to deduce the lemma or stem form of a word from the user's input. Irregular forms are stored in a list, and looking up "ate" will return "eat," for example.
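A lookup of this kind can be sketched as follows. This is not the code shipped with WordNet: the exception list `IRREGULAR`, the suffix rules and the function `lemmatize` are assumptions made for illustration, and the actual rules in the distributed software may differ. The sketch checks the irregular-form list first and otherwise strips common endings until it finds a form present in a given vocabulary.

```python
# Hypothetical exception list for irregular forms (illustrative entries only).
IRREGULAR = {"ate": "eat", "geese": "goose", "went": "go"}

# Hypothetical suffix rules: (ending to strip, replacement to append).
SUFFIX_RULES = [
    ("ies", "y"),
    ("es", "e"),
    ("es", ""),
    ("s", ""),
    ("ing", ""),
    ("ed", ""),
]


def lemmatize(word, vocabulary):
    """Return a base form of `word` found in `vocabulary`, or None."""
    word = word.lower()
    # Irregular forms are resolved by direct lookup in the exception list.
    if word in IRREGULAR:
        return IRREGULAR[word]
    # A word that is already a known base form is returned unchanged.
    if word in vocabulary:
        return word
    # Otherwise try each suffix rule and keep the first candidate that is known.
    for suffix, replacement in SUFFIX_RULES:
        if word.endswith(suffix):
            candidate = word[: len(word) - len(suffix)] + replacement
            if candidate in vocabulary:
                return candidate
    return None


if __name__ == "__main__":
    vocab = {"eat", "dog", "walk", "city"}
    for form in ["ate", "dogs", "cities", "walked"]:
        print(form, "->", lemmatize(form, vocab))
```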
Knowledge structure
Both nouns and verbs are organized into hierarchies, defined by hypernym or ''IS A'' relationships. For instance, one sense of the word ''dog'' is found in the following hypernym hierarchy; the words at the same level represent synset members. Each set of synonyms has a unique index.
* dog, domestic dog, Canis familiaris
** canine, canid
*** carnivore
**** placental, placental mammal, eutherian, eutherian mammal
***** mammal
****** vertebrate, craniate
******* chordate
******** animal, animate being, beast, brute, creature, fauna
********* ...
At the top level, these hierarchies are organized into 25 beginner "trees" for nouns and 15 for verbs (called ''lexicographic files'' at a maintenance level). All are linked to a unique beginner synset, "entity".
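Walking such a hierarchy amounts to following hypernym links until none remain. The sketch below is a simplification: the mapping `HYPERNYM` is hand-copied from the ''dog'' hierarchy listed above, each synset is named by its first member, every synset is assumed to have a single hypernym, and the chain stops at "animal" only because the listing above is elided at that point. The function name is hypothetical.

```python
# Direct hypernym links copied from the "dog" hierarchy listed in the text.
# Each synset is named by its first member for brevity.
HYPERNYM = {
    "dog": "canine",
    "canine": "carnivore",
    "carnivore": "placental",
    "placental": "mammal",
    "mammal": "vertebrate",
    "vertebrate": "chordate",
    "chordate": "animal",
}


def hypernym_chain(synset, hypernym=HYPERNYM):
    """Follow hypernym links upward and return the visited synsets in order."""
    chain = [synset]
    while chain[-1] in hypernym:
        chain.append(hypernym[chain[-1]])
    return chain


if __name__ == "__main__":
    print(" -> ".join(hypernym_chain("dog")))
```

The length of the returned chain minus one gives the number of levels between a synset and the top of the recorded hierarchy, which is one way to compare the depth of different hierarchies.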
Noun hierarchies are far deeper than verb hierarchies.
Adjectives are not organized into hierarchical trees. Instead, two "central" antonyms such as "hot" and "cold" form binary poles, while "satellite" synonyms such as "steaming" and "chilly" connect to their respective poles via "similarity" relations. The adjectives can be visualized in this way as "dumbbells" rather than as "trees".
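The dumbbell arrangement can be sketched with two small mappings. The names `ANTONYM`, `SIMILAR_TO`, `head_of` and `opposite_pole` are hypothetical, and the data contains only the four adjectives mentioned above; it is an illustration of the structure, not an excerpt from the database.

```python
# Hypothetical miniature "dumbbell": two central adjectives linked by antonymy,
# each with a satellite attached through a similarity relation.
ANTONYM = {"hot": "cold", "cold": "hot"}
SIMILAR_TO = {"steaming": "hot", "chilly": "cold"}


def head_of(adjective):
    """Return the central adjective for a satellite, or the word itself if central."""
    if adjective in ANTONYM:
        return adjective
    return SIMILAR_TO.get(adjective)


def opposite_pole(adjective):
    """Follow satellite -> central adjective -> antonymous central adjective."""
    head = head_of(adjective)
    if head is None:
        return None
    return ANTONYM.get(head)


if __name__ == "__main__":
    print(opposite_pole("steaming"))
    print(opposite_pole("cold"))
```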
Psycholinguistic aspects
The initial goal of the WordNet project was to build a lexical database that would be consistent with theories of human semantic memory developed in the late 1960s. Psychological experiments indicated that speakers organized their knowledge of concepts in an economic, hierarchical fashion. Retrieval time required to access conceptual knowledge seemed to be directly related to the number of hierarchies the speaker needed to "traverse" to access the knowledge. Thus, speakers could more quickly verify that ''canaries can sing'' because a canary is a songbird, but required slightly more time to verify that ''canaries can fly'' (where they had to access the concept "bird" on the superordinate level) and even more time to verify ''canaries have skin'' (requiring look-up across multiple levels of hyponymy, up to "animal").
While such psycholinguistic experiments and the underlying theories have been subject to criticism, some of WordNet's organization is consistent with experimental evidence. For example, anomic aphasia selectively affects speakers' ability to produce words from a specific semantic category, a WordNet hierarchy. Antonymous adjectives (WordNet's central adjectives in the dumbbell structure) are found to co-occur far more frequently than chance, a fact that has been found to hold for many languages.
As a lexical ontology
WordNet is sometimes called an ontology, a persistent claim that its creators do not make. The hypernym/hyponym relationships among the noun synsets can be interpreted as specialization relations among conceptual categories. In other words, WordNet can be interpreted and used as a lexical ontology in the computer science sense. However, such an ontology should be corrected before being used, because it contains hundreds of basic semantic inconsistencies; for example, there are (i) common specializations for exclusive categories and (ii) redundancies in the specialization hierarchy. Furthermore, transforming WordNet into a lexical ontology usable for knowledge representation should normally also involve (i) distinguishing the specialization relations into ''subtypeOf'' and ''instanceOf'' relations, and (ii) associating intuitive unique identifiers to each category. Although such corrections and transformations have been performed and documented as part of the integration of WordNet 1.7 into the cooperatively updatable knowledge base of WebKB-2, most projects claiming to re-use WordNet for knowledge-based applications (typically, knowledge-oriented information retrieval) simply re-use it directly.
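One possible reading of a redundancy in the specialization hierarchy is a direct link from a category to a supertype that is already implied through another of its supertypes. The sketch below checks for that pattern on hypothetical toy data; it is not the procedure used for WebKB-2, and the function names and the `parents` mapping are assumptions made for illustration.

```python
def ancestors(node, parents):
    """Return all categories reachable by following parent links upward."""
    seen = set()
    stack = list(parents.get(node, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


def redundant_links(parents):
    """Find (child, parent) links already implied by another parent of the child."""
    redundant = []
    for child, direct in parents.items():
        for parent in direct:
            others = [p for p in direct if p != parent]
            if any(parent in ancestors(other, parents) for other in others):
                redundant.append((child, parent))
    return sorted(redundant)


if __name__ == "__main__":
    # Hypothetical hierarchy: "dog" points to "animal" both directly and
    # indirectly through "canine" and "carnivore".
    toy = {
        "dog": ["canine", "animal"],
        "canine": ["carnivore"],
        "carnivore": ["animal"],
    }
    print(redundant_links(toy))
```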
WordNet has also been converted to a formal specification, by means of a hybrid bottom-up top-down methodology to automatically extract association relations from WordNet, and interpret these associations in terms of a set of conceptual relations, formally defined in the DOLCE foundational ontology.
In most works that claim to have integrated WordNet into ontologies, the content of WordNet has not simply been corrected when it seemed necessary; instead, WordNet has been heavily re-interpreted and updated whenever suitable. This was the case when, for example, the top-level ontology of WordNet was re-structured according to the OntoClean-based approach or when WordNet was used as a primary source for constructing the lower classes of the SENSUS ontology.
Limitations
The most widely discussed limitation of WordNet (and related resources like ImageNet) is that some of the semantic relations are more suited to concrete concepts than to abstract concepts. For example, it is easy to create hyponym/hypernym relationships to capture that a "conifer" is a type of "tree", a "tree" is a type of "plant", and a "plant" is a type of "organism", but it is difficult to classify emotions like "fear" or "happiness" into equally deep and well-defined hyponym/hypernym relationships.
Many of the concepts in WordNet are specific to certain languages and the most accurate reported mapping between languages is 94%. Synonyms, hyponyms, meronyms, and antonyms occur in all languages with a WordNet so far, but other semantic relationships are language-specific. This limits the interoperability across languages. However, it also makes WordNet a resource for highlighting and studying the differences between languages, so it is not necessarily a limitation for all use cases.
WordNet does not include information about the etymology or the pronunciation of words and it contains only limited information about usage. WordNet aims to cover most everyday words and does not include much domain-specific terminology.
WordNet is the most commonly used computational lexicon of English for word-sense disambiguation (WSD), a task that aims to assign the context-appropriate meanings (i.e. synset members) to words in a text. However, it has been argued that WordNet encodes sense distinctions that are too fine-grained. This issue prevents WSD systems from achieving a level of performance comparable to that of humans, who do not always agree when confronted with the task of selecting a sense from a dictionary that matches a word in a context. The granularity issue has been tackled by proposing clustering methods that automatically group together similar senses of the same word.
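To make the WSD task concrete, the sketch below uses a simplified gloss-overlap heuristic in the spirit of the Lesk algorithm, a technique not described in this article and chosen here only for illustration. The sense labels and glosses are invented toy data rather than WordNet entries, and `simplified_lesk` is a hypothetical function name. The sense whose gloss shares the most words with the context is selected; ties go to the first sense listed.

```python
def simplified_lesk(context_words, sense_glosses, stopwords=frozenset()):
    """Pick the sense whose gloss overlaps most with the context words."""
    context = {w.lower() for w in context_words} - stopwords
    best_sense, best_overlap = None, -1
    for sense, gloss in sense_glosses.items():
        gloss_words = set(gloss.lower().split()) - stopwords
        overlap = len(context & gloss_words)
        # Strictly greater, so the earliest sense wins a tie.
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense


if __name__ == "__main__":
    # Invented glosses for two senses of "bank" (not taken from WordNet).
    senses = {
        "bank.financial": "an institution that accepts deposits and lends money",
        "bank.river": "sloping land beside a body of water such as a river",
    }
    sentence = "she sat on the bank of the river and watched the water"
    print(simplified_lesk(sentence.split(), senses))
```

With very similar glosses, as with fine-grained senses, the overlap counts tend to be close or tied, which is one way to see why fine sense distinctions make the task harder.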
Offensive content
WordNet includes words that can be perceived as pejorative or offensive. The interpretation of a word can change over time and between social groups, so it is not always possible for WordNet to define a word as "pejorative" or "offensive" in isolation. Therefore, people using WordNet must apply their own methods to identify offensive or pejorative words.
However, this limitation is true of other lexical resources like dictionaries and thesauruses, which also contain pejorative and offensive words. Some dictionaries indicate words that are pejoratives, but do not include all the contexts in which words might be acceptable or offensive to different social groups. Therefore, people using dictionaries must apply their own methods to identify all offensive words.
Licensed vs. Open WordNets
Some wordnets were subsequently created for other languages. A 2012 survey lists the wordnets and their availability. In an effort to propagate the usage of WordNets, the Global WordNet community has been slowly re-licensing its WordNets to an open domain where researchers and developers can easily access and use WordNets as language resources to provide ontological and lexical knowledge in natural-language processing (NLP) tasks.
The Open Multilingual WordNet provides access to open licensed wordnets in a variety of languages, all linked to the Princeton Wordnet of English (PWN). The goal is to make it easy to use wordnets in multiple languages.
Applications
WordNet has been used for a number of purposes in information systems, including word-sense disambiguation, information retrieval, machine translation and even automatic crossword puzzle generation.
A common use of WordNet is to determine the similarity between words. Various algorithms have been proposed, including measuring the distance among words and synsets in WordNet's graph structure, such as by counting the number of edges among synsets. The intuition is that the closer two words or synsets are, the closer their meaning. A number of WordNet-based word similarity algorithms are implemented in a Perl package called WordNet::Similarity, and in a Python package called NLTK. Other more sophisticated WordNet-based similarity techniques include ADW, whose implementation is available in Java. WordNet can also be used to inter-link other vocabularies.
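The edge-counting idea can be sketched without any of the packages named above. In the example below the graph is hypothetical toy data, the function names are assumptions, and the score 1 / (1 + distance) is just one common way to turn an edge count into a similarity; it is not presented as the formula used by any particular package.

```python
from collections import deque

# Hypothetical toy hierarchy given as (hyponym, hypernym) links.
EDGES = [
    ("dog", "canine"),
    ("wolf", "canine"),
    ("canine", "carnivore"),
    ("cat", "feline"),
    ("feline", "carnivore"),
]


def build_graph(edges):
    """Build an undirected adjacency map so paths can go up and down the hierarchy."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def edge_distance(graph, start, goal):
    """Return the number of edges on the shortest path, or None if unconnected."""
    if start not in graph or goal not in graph:
        return None
    queue = deque([(start, 0)])
    visited = {start}
    while queue:
        node, dist = queue.popleft()
        if node == goal:
            return dist
        for neighbour in graph[node]:
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None


def path_similarity(graph, a, b):
    """Map an edge count to a score in (0, 1]; closer synsets score higher."""
    dist = edge_distance(graph, a, b)
    return None if dist is None else 1.0 / (1 + dist)


if __name__ == "__main__":
    g = build_graph(EDGES)
    print(edge_distance(g, "dog", "wolf"), path_similarity(g, "dog", "wolf"))
    print(edge_distance(g, "dog", "cat"), path_similarity(g, "dog", "cat"))
```

Breadth-first search is used because all edges count equally, so the first time the goal is reached the path is guaranteed to be a shortest one.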
Interfaces
Princeton maintains a list of related projects that includes links to some of the widely used application programming interfaces available for accessing WordNet using various programming languages and environments.
Related projects and extensions
WordNet is connected to several databases of the Semantic Web. WordNet is also commonly re-used via mappings between the WordNet synsets and the categories from ontologies. Most often, only the top-level categories of WordNet are mapped.
Global WordNet Association
The Global WordNet Association (GWA) is a public and non-commercial organization that provides a platform for discussing, sharing and connecting wordnets for all languages in the world. The GWA also promotes the standardization of wordnets across languages, to ensure uniformity in how synsets are enumerated across human languages. The GWA keeps a list of wordnets developed around the world.
* Arabic Ontology, a linguistic ontology that has the same structure as WordNet and is mapped to it.
* The BalkaNet project has produced WordNets for six European languages (Bulgarian, Czech, Greek, Romanian, Turkish and Serbian). For this project, a freely available XML-based WordNet editor was developed. This editor, VisDic, is no longer in active development but is still used for the creation of various WordNets. Its successor, DEBVisDic, is a client-server application and is currently used for the editing of several WordNets (Dutch in the Cornetto project, Polish, Hungarian, several African languages, Chinese).
* BulNet is a Bulgarian version of the WordNet developed at the Department of Computational Linguistics of the Institute for Bulgarian Language, Bulgarian Academy of Sciences.
* CWN (Chinese Wordnet or 中文詞彙網路) supported by National Taiwan University.
* The EuroWordNet project has produced WordNets for several European languages and linked them together; these are not freely available, however. The Global Wordnet project attempts to coordinate the production and linking of "wordnets" for all languages. Oxford University Press, the publisher of the Oxford English Dictionary, has voiced plans to produce their own online competitor to WordNet.
* FinnWordNet is a Finnish version of the WordNet where all entries of the original English WordNet were translated.
* GermaNet is a German version of the WordNet developed by the University of Tübingen.
* The IndoWordNet (Pushpak Bhattacharyya, IndoWordNet, Lexical Resources Engineering Conference 2010 (LREC 2010), Malta, May 2010) is a linked lexical knowledge base of wordnets of 18 scheduled languages of India, including Assamese, Bangla, Bodo, Gujarati, Hindi, Kannada, Kashmiri, Malayalam, Marathi, Nepali, Odia, Punjabi, Sanskrit, Tamil, and Telugu.
* Malayalam WordNet, developed by Cochin University of Science and Technology.
* Multilingual Central Repository (MCR) integrates wordnets for Spanish, Catalan, Basque, Galician and Portuguese, linked to English, within the same EuroWordNet framework.
* The MultiWordNet project is a multilingual WordNet aimed at producing an Italian WordNet strongly aligned with the Princeton WordNet.
* OpenDutchWordNet is a Dutch lexical semantic database.
* OpenWN-PT is a Brazilian Portuguese version of the original WordNet, freely available for download under a CC-BY-SA license.
* plWordNet is a Polish-language version of WordNet developed by Wrocław University of Technology.
* PolNet is a Polish-language version of WordNet developed by Adam Mickiewicz University in Poznań (distributed under the CC BY-NC-ND 3.0 license).
Projects such as BalkaNet and EuroWordNet made it feasible to create standalone wordnets linked to the original one. Examples include the Russian WordNet, sponsored by Petersburg State University of Means of Communication and led by S.A. Yablonsky, and Russnet, by Saint Petersburg State University.
* UWN is an automatically constructed multilingual lexical knowledge base extending WordNet to cover over a million words in many different languages.
* WOLF (WordNet Libre du Français), a French version of WordNet.
semantic network with millions of concepts obtained by integrating WordNet and Wikipedia using an automatic mapping algorithm.
* The SUMO ontology provides a mapping between all of the WordNet synsets (including nouns, verbs, adjectives and adverbs) and SUMO classes. The most recent addition to the mappings provides links to all of the more specific terms in the Mid-Level Ontology (MILO), which extends SUMO.
* OpenCyc, an open ontology and knowledge base of everyday common sense knowledge, has 12,000 terms linked to WordNet synonym sets.
* DOLCE is the first module of the WonderWeb Foundational Ontologies Library (WFOL). This upper ontology has been developed in light of rigorous ontological principles inspired by the philosophical tradition, with a clear orientation toward language and cognition. OntoWordNet is the result of an experimental alignment of WordNet's upper level with DOLCE. It is suggested that such an alignment could lead to an "ontologically sweetened" WordNet, meant to be conceptually more rigorous, cognitively transparent, and efficiently exploitable in several applications.
* DBpedia, a database of structured information, is linked to WordNet.
* The eXtended WordNet is a project at the University of Texas at Dallas which aims to improve WordNet by semantically parsing the glosses, thus making the information contained in these definitions available for automatic knowledge processing systems. It is freely available under a license similar to WordNet's.
* The GCIDE project produced a dictionary by combining a public domain ''Webster's Dictionary'' from 1913 with some WordNet definitions and material provided by volunteers. It was released under the copyleft license GPL.
* ImageNet is an image database organized according to the WordNet hierarchy (currently only the nouns), in which each node of the hierarchy is depicted by hundreds and thousands of images. Currently, it has over 500 images per node on average.
* BioWordnet, a biomedical extension of WordNet, was abandoned due to stability issues across versions.
* WikiTax2WordNet, a mapping between WordNet synsets and Wikipedia categories.
* WordNet++, a resource including millions of semantic edges harvested from Wikipedia and connecting pairs of WordNet synsets.
* SentiWordNet, a resource for supporting opinion mining applications obtained by tagging all the WordNet 3.0 synsets according to their estimated degrees of positivity, negativity, and neutrality.
* ColorDict is an Android application for mobile phones that uses the WordNet database and other sources, such as Wikipedia.
* UBY-LMF, a database of 10 resources including WordNet.
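Several of the resources listed above (for example the SUMO mapping, SentiWordNet, and ImageNet) work by attaching additional data to WordNet synsets. As a rough illustration only, the following Python sketch shows one way such an annotation layer could be keyed by synset identifier. The identifiers, the scores, the assumption that the three scores sum to 1, and the `polarity` helper are all hypothetical and are not taken from SentiWordNet or any other listed resource.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sentiment:
    """Per-synset sentiment scores (hypothetical layout)."""
    positive: float
    negative: float

    @property
    def neutral(self) -> float:
        # Assumption for this sketch: the three scores sum to 1.
        return 1.0 - self.positive - self.negative


# Hypothetical synset identifiers and made-up scores, for illustration only.
ANNOTATIONS = {
    "good.a.01": Sentiment(positive=0.75, negative=0.0),
    "bad.a.01": Sentiment(positive=0.0, negative=0.625),
    "table.n.01": Sentiment(positive=0.0, negative=0.0),
}


def polarity(synset_id: str) -> str:
    """Return a coarse label for a synset, or 'unknown' if it is not annotated."""
    scores = ANNOTATIONS.get(synset_id)
    if scores is None:
        return "unknown"
    if scores.positive > scores.negative:
        return "positive"
    if scores.negative > scores.positive:
        return "negative"
    return "neutral"
```

Because the annotations are keyed by synset rather than by word, two senses of the same word can carry different values, which is the main reason such resources build on WordNet's synset inventory.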
Related projects
* FrameNet is a lexical database that shares some similarities with, and refers to, WordNet.
* Lexical markup framework (LMF) is an ISO standard specified within ISO/TC37 in order to define a common standardized framework for the construction of lexicons, including WordNet. The subset of LMF for Wordnet is called Wordnet-LMF. An instantiation has been made within the KYOTO project.
* UNL Programme is a project under the auspices of UNO that aims to consolidate lexicosemantic data of many languages for use in machine translation and information extraction systems.
Meaning Monkey is a free online dictionary based on the WordNet database.
Distributions
WordNet Database is distributed as a dictionary package (usually a single file) for the following software:
* GoldenDict
Machine-readable dictionary
* Synonym Ring
* Taxonomy