WordNet is a lexical database of semantic relations between words, linking them through relations such as synonymy, hyponymy, and meronymy. The synonyms are grouped into ''synsets'' with short definitions and usage examples. It can thus be seen as a combination and extension of a dictionary and thesaurus. Its primary use is in automatic text analysis and artificial intelligence applications. It was first created in the English language, and the English WordNet database and software tools have been released under a BSD-style license and are freely available for download. The latest official release from Princeton appeared in 2011, and Princeton currently has no plans to release new versions because of staffing and funding issues.
New versions are still being released annually through the Open English WordNet website. Until about 2024, an online version was available through wordnet.princeton.edu. That version of WordNet has been deprecated, but a new online version is available at en-word.net. There are now WordNets in more than 200 languages.
History and team members
WordNet was first created in 1985, in English only, in the Cognitive Science Laboratory of Princeton University under the direction of psychology professor George Armitage Miller. It was later directed by Christiane Fellbaum. The project was initially funded by the U.S. Office of Naval Research, and later also by other U.S. government agencies, including DARPA, the National Science Foundation, the Disruptive Technology Office (formerly the Advanced Research and Development Activity), and REFLEX. George Miller and Christiane Fellbaum received the 2006 Antonio Zampolli Prize for their work with WordNet.
The Global WordNet Association is a non-commercial organization that provides a platform for discussing, sharing and connecting WordNets for all languages in the world. Christiane Fellbaum and Piek Th.J.M. Vossen are its co-presidents.
Database contents

The database contains 155,327 words organized in 175,979 synsets for a total of 207,016 word-sense pairs; in compressed form, it is about 12 megabytes in size.
It includes the lexical categories nouns, verbs, adjectives and adverbs but ignores prepositions, determiners and other function words.
Words from the same lexical category that are roughly synonymous are grouped into synsets, which include simplex words as well as collocations like "eat out" and "car pool." The different senses of a polysemous word form are assigned to different synsets. A synset's meaning is further clarified with a short defining ''gloss'' and one or more usage examples. An example adjective synset is:
: good, right, ripe – (most suitable or right for a particular purpose; "a good time to plant tomatoes"; "the right time to act"; "the time is ripe for great sociological changes")
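A synset can be modeled as a small record holding its lexical category, its member word forms, a gloss, and usage examples. A separate index maps each word form to every synset it belongs to, which is where polysemy shows up. The Python sketch below assumes this hypothetical in-memory layout. It is not WordNet's file format or any library's API, and the second "ripe" synset is invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Synset:
    """Toy synset record (hypothetical layout, not WordNet's file format)."""
    pos: str                        # lexical category, e.g. "a" for adjective
    lemmas: tuple[str, ...]         # roughly synonymous word forms
    gloss: str                      # short defining gloss
    examples: tuple[str, ...] = ()  # usage examples

    def __str__(self) -> str:
        # Render in the 'members – (gloss; "example"; ...)' style shown above.
        parts = [self.gloss] + [f'"{e}"' for e in self.examples]
        return f"{', '.join(self.lemmas)} – ({'; '.join(parts)})"


def build_index(synsets: list[Synset]) -> dict[str, list[Synset]]:
    """Map each word form to every synset it belongs to (one entry per sense)."""
    index: dict[str, list[Synset]] = {}
    for synset in synsets:
        for lemma in synset.lemmas:
            index.setdefault(lemma, []).append(synset)
    return index


suitable = Synset(
    "a", ("good", "right", "ripe"),
    "most suitable or right for a particular purpose",
    ("a good time to plant tomatoes", "the right time to act",
     "the time is ripe for great sociological changes"),
)
# Invented second sense, so that "ripe" is polysemous in this toy index.
mature = Synset("a", ("ripe",), "fully developed and ready to be eaten", ("ripe peaches",))

INDEX = build_index([suitable, mature])
print(len(INDEX["ripe"]))  # 2: one synset per sense of the word form
```

Because the record is frozen, synsets are hashable and can serve as graph nodes when relations are added.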
All synsets are connected by means of semantic relations. These relations, not all of which apply to every lexical category, include:
* Nouns
**''hypernym'': ''Y'' is a hypernym of ''X'' if every ''X'' is a (kind of) ''Y'' (''canine'' is a hypernym of ''dog'')
**''hyponym'': ''Y'' is a hyponym of ''X'' if every ''Y'' is a (kind of) ''X'' (''dog'' is a hyponym of ''canine'')
**''coordinate term'': ''Y'' is a coordinate term of ''X'' if ''X'' and ''Y'' share a hypernym (''wolf'' is a coordinate term of ''dog'', and ''dog'' is a coordinate term of ''wolf'')
**''holonym'': ''Y'' is a holonym of ''X'' if ''X'' is a part of ''Y'' (''building'' is a holonym of ''window'')
**''meronym'': ''Y'' is a meronym of ''X'' if ''Y'' is a part of ''X'' (''window'' is a meronym of ''building'')
* Verbs
**''hypernym'': the verb ''Y'' is a hypernym of the verb ''X'' if the activity ''X'' is a (kind of) ''Y'' (''to perceive'' is a hypernym of ''to listen'')
**''troponym'': the verb ''Y'' is a troponym of the verb ''X'' if the activity ''Y'' is doing ''X'' in some manner (''to lisp'' is a troponym of ''to talk'')
**''entailment'': the verb ''Y'' is entailed by the verb ''X'' if by doing ''X'' you must be doing ''Y'' (''to sleep'' is entailed by ''to snore'')
**''coordinate term'': the verb ''Y'' is a coordinate term of the verb ''X'' if ''X'' and ''Y'' share a hypernym (''to lisp'' is a coordinate term of ''to yell'', and ''to yell'' is a coordinate term of ''to lisp'')
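Most of these relations come in inverse pairs: hypernym/hyponym, holonym/meronym, and for verbs hypernym/troponym. A store therefore only needs to record one direction and can derive the other, and coordinate terms fall out of the hypernym links. For brevity, the sketch below works on bare words, whereas WordNet links synsets. Its small tables are hypothetical and built from the examples in the list above.

```python
from collections import defaultdict

# Hypothetical toy tables built from the examples above. Real WordNet links
# synsets, not bare words, and stores far more entries.
HYPERNYMS = {  # X -> {Y}: every X is a (kind of) Y; for verbs, X is Y done in some manner
    "dog": {"canine"}, "wolf": {"canine"},
    "listen": {"perceive"}, "lisp": {"talk"}, "yell": {"talk"},
}
HOLONYMS = {"window": {"building"}}   # X -> {Y}: X is a part of Y
ENTAILS = {"snore": {"sleep"}}        # X -> {Y}: doing X means you must be doing Y


def invert(relation):
    """Derive the inverse relation, e.g. hyponyms from hypernyms."""
    inverse = defaultdict(set)
    for x, ys in relation.items():
        for y in ys:
            inverse[y].add(x)
    return dict(inverse)


HYPONYMS = invert(HYPERNYMS)   # for verbs, these inverse links are troponyms
MERONYMS = invert(HOLONYMS)    # Y -> {X}: X is a part of Y


def coordinate_terms(x):
    """Terms that share at least one hypernym with x (excluding x itself)."""
    return {
        sibling
        for parent in HYPERNYMS.get(x, set())
        for sibling in HYPONYMS.get(parent, set())
    } - {x}


print(sorted(HYPONYMS["canine"]))   # ['dog', 'wolf']
print(coordinate_terms("lisp"))     # {'yell'}
print(MERONYMS["building"])         # {'window'}
```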
These semantic relations hold among all members of the linked synsets. Individual synset members (words) can also be connected with lexical relations. For example, (one sense of) the noun "director" is linked by a "morphosemantic" link to (one sense of) the verb "direct", from which it is derived.
The morphology functions of the software distributed with the database try to deduce the lemma or stem form of a word from the user's input. Irregular forms are stored in a list, and looking up "ate" will return "eat," for example.
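The lookup described above works in two steps. It checks the irregular-form list first, then strips inflectional suffixes and keeps the first candidate that exists in the lexicon. The rule lists and the tiny lexicon below are hypothetical and abbreviated. They follow the general suffix-detachment idea rather than reproducing the rules shipped with WordNet.

```python
from typing import Optional

# Hypothetical, heavily abbreviated data for illustration only.
EXCEPTIONS = {                       # irregular forms stored as an explicit list
    "verb": {"ate": "eat", "went": "go"},
    "noun": {"geese": "goose"},
}
SUFFIX_RULES = {                     # (suffix to strip, replacement), tried in order
    "noun": [("ses", "s"), ("xes", "x"), ("ies", "y"), ("s", "")],
    "verb": [("ies", "y"), ("es", "e"), ("es", ""), ("ed", "e"),
             ("ed", ""), ("ing", "e"), ("ing", ""), ("s", "")],
}
LEXICON = {
    "noun": {"goose", "box", "city", "dog"},
    "verb": {"eat", "go", "bake", "walk", "carry"},
}


def base_form(word: str, pos: str) -> Optional[str]:
    """Return the lemma of an inflected word, or None if nothing matches."""
    word = word.lower()
    if word in EXCEPTIONS.get(pos, {}):      # irregular forms first
        return EXCEPTIONS[pos][word]
    if word in LEXICON[pos]:                 # already a base form
        return word
    for suffix, replacement in SUFFIX_RULES[pos]:
        if word.endswith(suffix):
            candidate = word[: -len(suffix)] + replacement
            if candidate in LEXICON[pos]:    # accept only real entries
                return candidate
    return None


print(base_form("ate", "verb"))      # eat
print(base_form("baked", "verb"))    # bake
print(base_form("cities", "noun"))   # city
```

Checking each candidate against the lexicon lets several overlapping rules (such as the two "ed" rules) coexist safely, because a wrong guess is simply discarded.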
Knowledge structure
Both nouns and verbs are organized into hierarchies, defined by hypernym or ''IS A'' relationships. For instance, one sense of the word ''dog'' has the following hypernym hierarchy; the words on the same line are members of the same synset. Each set of synonyms has a unique index.
* dog, domestic dog, Canis familiaris
** canine, canid
*** carnivore
**** placental, placental mammal, eutherian, eutherian mammal
***** mammal
****** vertebrate, craniate
******* chordate
******** animal, animate being, beast, brute, creature, fauna
********* ...
At the top level, these hierarchies are organized into 25 beginner "trees" for nouns and 15 for verbs (called ''lexicographic files'' at a maintenance level). The noun trees are all linked to a single unique beginner synset, "entity".
Noun hierarchies are far deeper than verb hierarchies.
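Walking the hypernym links upward from a synset reproduces a listing like the one for ''dog'' above. The sketch assumes a hypothetical table that gives each synset a single parent and uses the first member of each synset as its label. In WordNet itself a synset can have more than one hypernym. The chain here also stops at "animal", because the listing above is elided after that point.

```python
# Hypothetical single-parent table copied from the "dog" listing above; each key
# and value is the first member of a synset, used as a stand-in identifier.
HYPERNYM_OF = {
    "dog": "canine",
    "canine": "carnivore",
    "carnivore": "placental",
    "placental": "mammal",
    "mammal": "vertebrate",
    "vertebrate": "chordate",
    "chordate": "animal",
}


def hypernym_chain(synset: str) -> list[str]:
    """Follow hypernym links upward until a synset with no recorded parent."""
    chain = [synset]
    while chain[-1] in HYPERNYM_OF:
        chain.append(HYPERNYM_OF[chain[-1]])
    return chain


# Print with one more asterisk per level, as in the listing above.
for depth, label in enumerate(hypernym_chain("dog")):
    print("*" * (depth + 1), label)
```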
Adjectives are not organized into hierarchical trees. Instead, two "central" antonyms such as "hot" and "cold" form binary poles, while "satellite" synonyms such as "steaming" and "chilly" connect to their respective poles via a "similarity" relation. The adjectives can be visualized in this way as "dumbbells" rather than as "trees".
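The dumbbell arrangement can be sketched as two small tables: direct antonym links between the central adjectives, and similarity links from each satellite to its pole. Following a satellite to its pole and then across to the opposite pole yields what is usually called an indirect antonym. The data below is a hypothetical toy cluster built from the examples in the paragraph above.

```python
from typing import Optional

# Hypothetical toy data in the "dumbbell" layout: two poles plus satellites.
DIRECT_ANTONYM = {"hot": "cold", "cold": "hot"}       # central adjectives
SIMILAR_TO = {"steaming": "hot", "chilly": "cold"}    # satellite -> its pole


def pole_of(adjective: str) -> Optional[str]:
    """A central adjective is its own pole; a satellite points to one."""
    if adjective in DIRECT_ANTONYM:
        return adjective
    return SIMILAR_TO.get(adjective)


def antonym_via_pole(adjective: str) -> Optional[str]:
    """Cross the dumbbell: adjective -> its pole -> the opposite pole."""
    pole = pole_of(adjective)
    return DIRECT_ANTONYM[pole] if pole is not None else None


print(antonym_via_pole("steaming"))   # cold
print(antonym_via_pole("chilly"))     # hot
```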
Psycholinguistic aspects
The initial goal of the WordNet project was to build a lexical database that would be consistent with theories of human semantic memory developed in the late 1960s. Psychological experiments indicated that speakers organized their knowledge of concepts in an economical, hierarchical fashion. The time required to retrieve conceptual knowledge seemed to be directly related to the number of hierarchical levels the speaker needed to "traverse" to reach it. Thus, speakers could more quickly verify that ''canaries can sing'' because a canary is a songbird, but required slightly more time to verify that ''canaries can fly'' (where they had to access the concept "bird" on the superordinate level) and even more time to verify ''canaries have skin'' (requiring look-up across multiple levels of the hierarchy, up to "animal").
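The hierarchical-economy idea can be sketched as a network in which each property is stored once, at the most general concept it applies to. Verifying a statement then means climbing IS-A links until the property is found. The number of links climbed is the quantity that retrieval time was taken to track. The concepts and properties below form a hypothetical toy network modeled on the canary example, not data from WordNet.

```python
from typing import Optional

# Hypothetical toy network: shared properties are stored once, at the most
# general concept that has them, rather than repeated on every subordinate.
ISA = {"canary": "bird", "bird": "animal"}
PROPERTIES = {
    "canary": {"can sing", "is yellow"},
    "bird": {"can fly", "has wings"},
    "animal": {"has skin", "can move"},
}


def levels_to_verify(concept: str, prop: str) -> Optional[int]:
    """Number of IS-A links climbed before prop is found, or None if never found."""
    levels = 0
    node: Optional[str] = concept
    while node is not None:
        if prop in PROPERTIES.get(node, set()):
            return levels
        node = ISA.get(node)
        levels += 1
    return None


for prop in ("can sing", "can fly", "has skin"):
    print(prop, levels_to_verify("canary", prop))   # 0, 1, 2 links respectively
```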
While such psycholinguistic experiments and the underlying theories have been subject to criticism, some of WordNet's organization is consistent with experimental evidence. For example, anomic aphasia selectively affects speakers' ability to produce words from a specific semantic category, which corresponds to a WordNet hierarchy. Antonymous adjectives (WordNet's central adjectives in the dumbbell structure) co-occur far more often than chance would predict, a pattern found to hold in many languages.
As a lexical ontology
WordNet is sometimes called an ontology, a persistent claim that its creators do not make. The hypernym/hyponym relationships among the noun synsets can be interpreted as specialization relations among conceptual categories. In other words, WordNet can be interpreted and used as a lexical ontology in the computer science sense. However, such an ontology should be corrected before being used, because it contains hundreds of basic semantic inconsistencies; for example, there are (i) common specializations of mutually exclusive categories and (ii) redundancies in the specialization hierarchy. Furthermore, transforming WordNet into a lexical ontology usable for knowledge representation should normally also involve (i) splitting the specialization relations into ''subtypeOf'' and ''instanceOf'' relations, and (ii) associating intuitive unique identifiers with each category. Although such corrections and transformations have been performed and documented as part of the integration of WordNet 1.7 into the cooperatively updatable knowledge base of WebKB-2, most projects claiming to reuse WordNet for knowledge-based applications (typically, knowledge-oriented information retrieval) simply reuse it directly.
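The two kinds of inconsistency named above can be detected mechanically once a specialization hierarchy is available. The sketch below assumes a hypothetical multi-parent hierarchy and a hand-written list of category pairs declared mutually exclusive. The exclusivity declarations are an assumption supplied from outside WordNet, and the entries are invented to trigger each check. This is not the procedure used for WebKB-2.

```python
# Hypothetical fragment: category -> direct generalizations (multiple parents allowed).
SUBTYPE_OF = {
    "dog": {"canine", "domestic animal", "animal"},  # "animal" is already implied
    "canine": {"animal"},
    "domestic animal": {"animal"},
    "toy": {"artifact"},
    "stuffed animal": {"toy", "animal"},             # specializes two exclusive categories
}
# Assumed, externally supplied declaration that these categories cannot overlap.
EXCLUSIVE = [("artifact", "animal")]


def ancestors(category: str) -> set[str]:
    """All direct and indirect generalizations of a category."""
    seen: set[str] = set()
    stack = [category]
    while stack:
        for parent in SUBTYPE_OF.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def exclusivity_violations() -> list[tuple[str, str, str]]:
    """(i) Categories that specialize both members of an exclusive pair."""
    return sorted(
        (category, a, b)
        for category in SUBTYPE_OF
        for a, b in EXCLUSIVE
        if {a, b} <= ancestors(category)
    )


def redundant_links() -> list[tuple[str, str]]:
    """(ii) Direct links already implied through another direct parent."""
    return sorted(
        (category, parent)
        for category, parents in SUBTYPE_OF.items()
        for parent in parents
        if any(parent in ancestors(other) for other in parents if other != parent)
    )


print(exclusivity_violations())   # [('stuffed animal', 'artifact', 'animal')]
print(redundant_links())          # [('dog', 'animal')]
```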
WordNet has also been converted to a formal specification by means of a hybrid bottom-up/top-down methodology that automatically extracts association relations from it and interprets these associations in terms of a set of conceptual relations formally defined in the DOLCE foundational ontology.
In most works that claim to have integrated WordNet into ontologies, the content of WordNet has not simply been corrected when it seemed necessary; instead, it has been heavily reinterpreted and updated whenever suitable. This was the case when, for example, the top-level ontology of WordNet was restructured according to the OntoClean-based approach, or when it was used as a primary source for constructing the lower classes of the SENSUS ontology.
Limitations
The most widely discussed limitation of WordNet (and related resources like ImageNet) is that some of the semantic relations are more suited to concrete concepts than to abstract concepts. For example, it is easy to create hyponym/hypernym relationships to capture that a "conifer" is a type of "tree", a "tree" is a type of "plant", and a "plant" is a type of "organism", but it is difficult to classify emotions like "fear" or "happiness" into equally deep and well-defined hyponym/hypernym relationships.
Many of the concepts in WordNet are specific to certain languages, and the most accurate reported mapping between languages reaches 94%. Synonyms, hyponyms, meronyms, and antonyms occur in all languages with a WordNet so far, but other semantic relationships are language-specific. This limits interoperability across languages. However, it also makes WordNet a resource for highlighting and studying the differences between languages, so it is not necessarily a limitation for all use cases.
WordNet does not include information about the etymology or the pronunciation of words, and it contains only limited information about usage. WordNet aims to cover most everyday words and does not include much domain-specific terminology.
WordNet is the most commonly used computational lexicon of English for word-sense disambiguation (WSD), a task aimed at assigning the context-appropriate meanings (i.e. synset members) to words in a text. However, it has been argued that WordNet encodes sense distinctions that are too fine-grained. This issue prevents WSD systems from achieving performance comparable to that of humans, who themselves do not always agree when asked to select the dictionary sense that matches a word in context. The granularity issue has been tackled by proposing clustering methods that automatically group together similar senses of the same word.
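The text does not tie WSD to any particular algorithm. To illustrate how glosses can drive sense selection, the sketch below implements the simplified Lesk heuristic, a well-known baseline swapped in here. It picks the sense whose gloss shares the most content words with the surrounding sentence. The two-sense inventory for "bank" and the stopword list are hypothetical and are not taken from WordNet.

```python
import re

# Hypothetical sense inventory: sense id -> gloss (made up for illustration).
SENSES = {
    "bank": {
        "bank/river": "sloping land beside a body of water such as a river",
        "bank/finance": "a financial institution that accepts deposits of money and makes loans",
    }
}
STOPWORDS = {"a", "an", "the", "of", "and", "that", "such", "as", "to", "in", "on", "at"}


def content_words(text: str) -> set[str]:
    """Lower-cased alphabetic tokens minus a small stopword list."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def simplified_lesk(word: str, sentence: str) -> str:
    """Pick the sense whose gloss overlaps most with the context (first sense wins ties)."""
    context = content_words(sentence) - {word}
    best, best_overlap = None, -1
    for sense, gloss in SENSES[word].items():
        overlap = len(context & content_words(gloss))
        if overlap > best_overlap:
            best, best_overlap = sense, overlap
    return best


print(simplified_lesk("bank", "He deposited money at the bank"))          # bank/finance
print(simplified_lesk("bank", "They fished from the bank of the river"))  # bank/river
```

Exact word matching is deliberately naive in this sketch: "deposited" does not match "deposits", and only "money" decides the first example.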
Offensive content
WordNet includes words that can be perceived as pejorative or offensive. The interpretation of a word can change over time and between social groups, so it is not always possible for WordNet to define a word as "pejorative" or "offensive" in isolation. Therefore, people using WordNet must apply their own methods to identify offensive or pejorative words.
The same limitation applies to other lexical resources such as dictionaries and thesauruses, which also contain pejorative and offensive words. Some dictionaries mark pejorative words, but they do not record all the contexts in which a word might be acceptable or offensive to different social groups. Therefore, people using dictionaries must apply their own methods to identify all offensive words.
Licensed vs. Open WordNets
Wordnets were subsequently created for other languages. A 2012 survey lists the wordnets and their availability. In an effort to propagate the usage of WordNets, the Global WordNet community had been gradually re-licensing its WordNets under open licenses, so that researchers and developers can easily access and use them as language resources providing ontological and lexical knowledge for natural-language processing (NLP) tasks.
The Open Multilingual WordNet provides access to open-licensed wordnets in a variety of languages, all linked to the Princeton WordNet of English (PWN). The goal is to make it easy to use wordnets in multiple languages.
Applications
WordNet has been used for a number of purposes in information systems, including word-sense disambiguation, information retrieval, automatic text classification, automatic text summarization, machine translation and even automatic crossword puzzle generation.
A common use of WordNet is to determine the similarity between words. Various algorithms have been proposed, including measuring the distance between words and synsets in WordNet's graph structure, for example by counting the number of edges between synsets. The intuition is that the closer two words or synsets are, the closer their meanings. A number of WordNet-based word similarity algorithms are implemented in a Perl package called WordNet::Similarity and in a Python package called NLTK. More sophisticated WordNet-based similarity techniques include ADW, whose implementation is available in Java. WordNet can also be used to inter-link other vocabularies.
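The edge-counting idea can be sketched as a breadth-first search over IS-A links, followed in both directions. The shortest distance d becomes a score 1/(d + 1), so identical senses score 1 and the score falls as the path grows. This is the formula NLTK documents for its `path_similarity` method. The code below, however, is an independent toy rather than NLTK's implementation. Its small taxonomy is hypothetical, partly drawn from the conifer and dog examples in this article and partly invented.

```python
from collections import deque
from typing import Optional

# Hypothetical toy IS-A taxonomy (child -> parent), partly from this article's examples.
PARENT = {
    "dog": "canine", "wolf": "canine", "canine": "carnivore",
    "cat": "feline", "feline": "carnivore", "carnivore": "animal",
    "animal": "organism", "conifer": "tree", "tree": "plant", "plant": "organism",
}
CHILDREN: dict[str, list[str]] = {}
for child, parent in PARENT.items():
    CHILDREN.setdefault(parent, []).append(child)


def neighbours(node: str) -> list[str]:
    """IS-A links followed in both directions (up to the parent, down to children)."""
    up = [PARENT[node]] if node in PARENT else []
    return up + CHILDREN.get(node, [])


def edge_distance(a: str, b: str) -> Optional[int]:
    """Length of the shortest path between a and b, found by breadth-first search."""
    seen, queue = {a}, deque([(a, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == b:
            return dist
        for nxt in neighbours(node):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


def path_similarity(a: str, b: str) -> Optional[float]:
    """1 / (shortest distance + 1): 1.0 for identical nodes, smaller as paths grow."""
    d = edge_distance(a, b)
    return None if d is None else 1 / (d + 1)


for other in ("wolf", "cat", "conifer"):
    print(other, round(path_similarity("dog", other), 3))   # 0.333, 0.2, 0.125
```

The ordering matches the intuition in the text: "dog" is closest to "wolf" (two edges via "canine") and farthest from "conifer" (seven edges via "organism").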
Interfaces
Princeton maintains a list of related projects that includes links to some of the widely used application programming interfaces available for accessing WordNet using various programming languages and environments.
Related projects and extensions
WordNet is connected to several databases of the Semantic Web. WordNet is also commonly reused via mappings between the WordNet synsets and the categories from ontologies. Most often, only the top-level categories of WordNet are mapped.
Global WordNet Association
The Global WordNet Association (GWA) is a public and non-commercial organization that provides a platform for discussing, sharing and connecting wordnets for all languages in the world. The GWA also promotes the standardization of wordnets across languages, to ensure uniformity in how synsets are enumerated across human languages. The GWA keeps a list of wordnets developed around the world.
Other languages
* Arabic WordNet: a WordNet for the Arabic language.
* Arabic Ontology, a linguistic ontology that has the same structure as WordNet and is mapped to it.
* The BalkaNet project has produced WordNets for six European languages (Bulgarian, Czech, Greek, Romanian, Turkish and Serbian). For this project, a freely available XML-based WordNet editor, VisDic, was developed. VisDic is no longer in active development but is still used for the creation of various WordNets. Its successor, DEBVisDic, is a client-server application and is currently used for editing several WordNets (Dutch in the Cornetto project, Polish, Hungarian, several African languages, and Chinese).
* BulNet is a Bulgarian version of WordNet developed at the Department of Computational Linguistics of the Institute for Bulgarian Language, Bulgarian Academy of Sciences.
* CWN (Chinese Wordnet, 中文詞彙網路), supported by National Taiwan University.
* The EuroWordNet project has produced WordNets for several European languages and linked them together; however, these are not freely available. The Global WordNet project attempts to coordinate the production and linking of "wordnets" for all languages. Oxford University Press, the publisher of the Oxford English Dictionary, has voiced plans to produce its own online competitor to WordNet.
* FinnWordNet is a Finnish version of WordNet in which all entries of the original English WordNet were translated.
* GermaNet is a German version of WordNet developed by the University of Tübingen.
* The IndoWordNet [Pushpak Bhattacharyya, IndoWordNet, Language Resources and Evaluation Conference 2010 (LREC 2010), Malta, May 2010.] is a linked lexical knowledge base of wordnets of 18 scheduled languages of India, viz., Assamese, Bangla, Bodo, Gujarati, Hindi, Kannada, Kashmiri, Konkani, Malayalam, Meitei (Manipuri), Marathi, Nepali, Odia, Punjabi, Sanskrit, Tamil, Telugu and Urdu.
* JAWS (Just Another WordNet Subset), a French version of WordNet built using Wiktionary and semantic spaces.
* WordNet Bahasa, a WordNet for the Malay and Indonesian languages, developed by Nanyang Technological University.
* Malayalam WordNet, developed by Cochin University of Science and Technology.
* Multilingual Central Repository (MCR) integrates wordnets for Spanish, Catalan, Basque, Galician and Portuguese within the same EuroWordNet framework, linked to English.
* The MultiWordNet project, a multilingual WordNet aimed at producing an Italian WordNet strongly aligned with the Princeton WordNet.
* OpenDutchWordNet is a Dutch lexical semantic database.
* OpenWN-PT is a Brazilian Portuguese version of the original WordNet freely available for download under CC-BY-SA license.
* plWordNet is a Polish-language version of WordNet developed by Wrocław University of Technology.
* PolNet is a Polish-language version of WordNet developed by Adam Mickiewicz University in Poznań (distributed under CC BY-NC-ND 3.0 license).
* Projects such as BalkaNet and EuroWordNet made it feasible to create standalone wordnets linked to the original one. Two such projects were the Russian WordNet, supported by Petersburg State University of Means of Communication and led by S.A. Yablonsky, and Russnet, supported by Saint Petersburg State University.
* UWN is an automatically constructed multilingual lexical knowledge base extending WordNet to cover over a million words in many different languages.
* WOLF (WordNet Libre du Français), a French version of WordNet.
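Several of the projects above link language-specific wordnets through a shared index of concepts. EuroWordNet, for example, calls this the Inter-Lingual-Index. Equivalent synsets in different languages point to the same index record, so one language can be reached from another through that record. The sketch below shows the idea with hypothetical index IDs (`ili-0001`, …) and tiny hand-written synsets. It is not the data model of any particular project.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical per-language synsets: (language, lemmas, shared index ID).
SYNSETS: List[Tuple[str, List[str], str]] = [
    ("en", ["dog", "domestic_dog"], "ili-0001"),
    ("it", ["cane"], "ili-0001"),
    ("nl", ["hond"], "ili-0001"),
    ("en", ["house"], "ili-0002"),
    ("it", ["casa"], "ili-0002"),
]

# Group synsets by their shared ID so languages can be crossed via the ID.
BY_INDEX: Dict[str, List[Tuple[str, List[str]]]] = defaultdict(list)
for lang, lemmas, index_id in SYNSETS:
    BY_INDEX[index_id].append((lang, lemmas))


def translations(lemma: str, source: str, target: str) -> List[str]:
    """Lemmas in `target` whose synsets share an index ID with `lemma`."""
    result: List[str] = []
    for lang, lemmas, index_id in SYNSETS:
        if lang == source and lemma in lemmas:
            for other_lang, other_lemmas in BY_INDEX[index_id]:
                if other_lang == target:
                    result.extend(other_lemmas)
    return result
```

For instance, `translations("dog", "en", "it")` returns `["cane"]`, and `translations("house", "en", "nl")` returns an empty list because no Dutch synset is linked to that record in this toy data.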
Linked data
* BabelNet, a very large multilingual semantic network with millions of concepts, obtained by integrating WordNet and Wikipedia using an automatic mapping algorithm.
* The SUMO ontology has a complete manual mapping between all of the WordNet synsets and all of SUMO (including its domain ontologies, when WordNet contains a word sense for a given SUMO term), which is browsable at, for example, …
* OpenCyc, an open ontology and knowledge base of everyday common-sense knowledge, has 12,000 terms linked to WordNet synonym sets.
* DOLCE is the first module of the WonderWeb Foundational Ontologies Library (WFOL). This upper ontology has been developed in light of rigorous ontological principles inspired by the philosophical tradition, with a clear orientation toward language and cognition. OntoWordNet is the result of an experimental alignment of WordNet's upper level with DOLCE. It has been suggested that such an alignment could lead to an "ontologically sweetened" WordNet, meant to be conceptually more rigorous, cognitively transparent, and efficiently exploitable in several applications.
* DBpedia, a database of structured information, is linked to WordNet.
* The eXtended WordNet is a project at the University of Texas at Dallas that aims to improve WordNet by semantically parsing its glosses, making the information contained in these definitions available to automatic knowledge-processing systems. It is freely available under a license similar to WordNet's.
* The GCIDE project produced a dictionary by combining a public domain ''Webster's Dictionary'' from 1913 with some WordNet definitions and material provided by volunteers. It was released under the GPL, a copyleft license.
* ImageNet is an image database organized according to the WordNet hierarchy (currently only the nouns), in which each node of the hierarchy is depicted by hundreds or thousands of images. Currently, it has over 500 images per node on average.
* BioWordnet, a biomedical extension of WordNet, was abandoned because of stability issues across versions.
* WikiTax2WordNet, a mapping between WordNet synsets and Wikipedia categories.
* WordNet++, a resource containing millions of semantic edges harvested from Wikipedia that connect pairs of WordNet synsets.
* SentiWordNet, a resource for supporting opinion mining applications obtained by tagging all the WordNet 3.0 synsets according to their estimated degrees of positivity, negativity, and neutrality.
* ColorDict is an Android application for mobile phones that uses the WordNet database and other sources, such as Wikipedia.
* UBY-LMF, a database of 10 resources including WordNet.
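SentiWordNet attaches a positivity score and a negativity score to each synset. The remaining mass, 1 minus their sum, is treated as objectivity, which corresponds to the neutrality mentioned above. A simple way to derive a word-level prior polarity is to average (positivity minus negativity) over the word's senses. The sketch below does that with made-up scores and hypothetical sense IDs rather than real SentiWordNet values. The averaging scheme is only one of several possible choices.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Iterable


@dataclass(frozen=True)
class SenseScore:
    sense_id: str  # hypothetical sense identifier
    pos: float     # positivity in [0, 1]
    neg: float     # negativity in [0, 1]; pos + neg <= 1

    @property
    def obj(self) -> float:
        # Whatever is neither positive nor negative counts as objective.
        return 1.0 - (self.pos + self.neg)


# Made-up scores for the senses of one word (NOT real SentiWordNet values).
SENSES = [
    SenseScore("s1", 0.75, 0.0),
    SenseScore("s2", 0.5, 0.125),
    SenseScore("s3", 0.25, 0.0),
]


def word_polarity(senses: Iterable[SenseScore]) -> float:
    """Unweighted mean of (pos - neg) over a word's senses."""
    return mean(s.pos - s.neg for s in senses)
```

Here the three senses contribute 0.75, 0.375 and 0.25, so the word's averaged polarity is about 0.458. A frequency-weighted average would favor the most common sense instead.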
Related projects
* TaxoLLaMa is a WordNet-based model designed to enhance LLMs' ability to capture lexical-semantic knowledge.
* FrameNet is a lexical database that shares some similarities with, and refers to, WordNet.
* Lexical Markup Framework (LMF) is an ISO standard specified within ISO/TC37 to define a common standardized framework for the construction of lexicons, including WordNet. The subset of LMF for WordNet is called Wordnet-LMF. An instantiation has been made within the KYOTO project.
* The UNL Programme is a project under the auspices of the UNO aimed at consolidating lexico-semantic data from many languages for use in machine translation and information extraction systems.
* Meaning Monkey is a free online dictionary based on the WordNet database.
* Dictionary.video is a video dictionary focusing on pronunciations. Its text content is extended from WordNet.
Distributions
The WordNet database is distributed as a dictionary package (usually a single file) for the following software:
* Babylon
* GoldenDict
* Lingoes
* LexSemantic: a digital platform for publishing reference works (dictionaries, encyclopedias, etc.). Includes WordnetPlus.
See also
* Lexical Markup Framework
* Machine-readable dictionary
* Synonym Ring
* Taxonomy
References