PlWordNet

plWordNet is a lexico-semantic database of the Polish language. It includes sets of synonymous lexical units (synsets) followed by short definitions. plWordNet serves as a thesaurus-dictionary where concepts (synsets) and individual word meanings (lexical units) are defined by their location in the network of mutual relations, reflecting the lexico-semantic system of the Polish language. plWordNet is also used as one of the basic resources for the construction of natural language processing tools for Polish.


History

plWordNet is being developed at Wrocław University of Technology as part of CLARIN. The work has been carried out by the Language Technology Group G4.19 since 2005, funded by the Ministry of Science and Higher Education and by the EU. The thesaurus has been built from the ground up by lexicographers and natural language engineers. The first version of plWordNet was published in 2009; it contained 20,223 lemmas, 26,990 lexical units and 17,695 synsets. Version 4.0 was released in 2018. The most recent version is plWordNet 4.2.


Content

Currently, plWordNet contains 195k lemmas, 295k lexical units and 228k synsets. (Detailed comparative statistics of plWN and PWN can be found on the plWN webpage: http://plwordnet.pwr.wroc.pl/wordnet/stats, accessed 30.06.2014.) It has already outgrown Princeton WordNet with respect to the number of lexical units. plWordNet consists of nouns (135k), verbs (21k), adjectives (29k) and adverbs (8k). Each meaning of a given word is a separate lexical unit. Units that represent the same concept and do not differ significantly in stylistic register have been combined into synsets (sets of synonyms). Each lexical unit is assigned to one of the domains (semantic categories), indicating its general meaning. plWordNet domains correspond to Princeton WordNet lexicographers' files.
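The organisation described above (one lexical unit per word meaning, units grouped into synsets, each unit carrying a domain) can be pictured with a minimal Python sketch. This is an illustration only and not plWordNet's actual schema or API: the class name, field names, synset identifiers, domain label and the three sample entries are all assumptions invented for the example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class LexicalUnit:
    """One meaning of one lemma (hypothetical layout, not the real schema)."""
    lemma: str
    sense: int       # sense number of the lemma
    domain: str      # semantic category; the label used below is a placeholder
    synset_id: int   # invented identifier of the synset the unit belongs to


def group_into_synsets(units):
    """Group lexical units that share a synset identifier."""
    synsets = defaultdict(list)
    for unit in units:
        synsets[unit.synset_id].append(unit)
    return dict(synsets)


def senses_of(lemma, units):
    """Return every lexical unit (meaning) recorded for a lemma."""
    return [unit for unit in units if unit.lemma == lemma]


# Invented toy data: "zamek" has two meanings, hence two lexical units.
UNITS = [
    LexicalUnit("zamek", 1, "artifact", 100),     # 'castle'
    LexicalUnit("zamek", 2, "artifact", 200),     # 'lock'
    LexicalUnit("warownia", 1, "artifact", 100),  # 'stronghold'
]

SYNSETS = group_into_synsets(UNITS)
```

In this toy data the lemma "zamek" yields two lexical units, and the first of them shares a synset with "warownia"; whether the real database groups these particular words this way is not claimed here.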


Semantic categories in plWordNet


Lexical unit description

Some lexical units are provided with information about stylistic register, a short definition, usage examples and a link to the relevant Wikipedia article. The most important elements defining word meanings are the lexico-semantic and derivational relations which hold between synsets and between lexical units. A synset groups lexical units that share the same set of relations. (Maziarz M., Piasecki M., Szpakowicz S., Rabiega-Wiśniewska J., Semantic Relations Among Nouns in Polish Wordnet Grounded in Lexicographic and Semantic Tradition, Cognitive Studies/Études Cognitives, vol. 11, pp. 161-181, 2011.) Based on the relations assigned to synsets and units, natural language processing tools can draw conclusions about the meaning of a lemma, which is important, for example, in word-sense disambiguation.
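To illustrate how relations can tell two meanings of the same lemma apart, the following Python sketch stores relations as (source, relation, target) triples and walks upward through them. It is a simplified sketch under stated assumptions: the node labels, the relation name "hypernym" and the sample triples are invented for the example and are not taken from plWordNet, and the walk follows only the first parent of each node.

```python
from collections import defaultdict

# Invented toy relations in the form (source, relation name, target).
RELATIONS = [
    ("zamek#1", "hypernym", "budowla#1"),      # castle -> building
    ("budowla#1", "hypernym", "wytwór#1"),     # building -> artifact
    ("zamek#2", "hypernym", "urządzenie#1"),   # lock -> device
    ("urządzenie#1", "hypernym", "wytwór#1"),  # device -> artifact
]


def build_index(triples):
    """Index the triples as node -> relation name -> list of targets."""
    index = defaultdict(lambda: defaultdict(list))
    for source, relation, target in triples:
        index[source][relation].append(target)
    return index


def hypernym_chain(node, index):
    """Follow the first 'hypernym' link upward until none is left."""
    chain = []
    seen = {node}  # guards against cycles in malformed data
    while True:
        parents = index.get(node, {}).get("hypernym", [])
        if not parents or parents[0] in seen:
            return chain
        node = parents[0]
        seen.add(node)
        chain.append(node)


INDEX = build_index(RELATIONS)
CASTLE_PATH = hypernym_chain("zamek#1", INDEX)
LOCK_PATH = hypernym_chain("zamek#2", INDEX)
```

The two senses end at the same top node but differ in their immediate parent, which is the kind of signal a disambiguation tool could compare against the words surrounding an occurrence of the lemma.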


Selected noun relations

Polish synsets are connected to the corresponding Princeton WordNet synsets by a set of inter-lingual lexico-semantic relations (for instance synonymy, partial synonymy and hyponymy). 91,578 synsets have been mapped so far (about 2/3 of plWordNet synsets, mainly nouns). The mapping enables the application of plWordNet in machine translation, e.g. in the online service offered by Google Translate. The mapping can also be instrumental in transferring textual analysis tools from English to Polish.
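As a rough picture of how such a mapping might be consulted, the Python sketch below looks up the English synsets linked to a Polish synset and orders them by relation type. Everything in it is hypothetical: the identifiers, the sample links and the preference order (synonymy first, then partial synonymy, then hyponymy) are assumptions made for the example, not a description of how plWordNet or any translation service actually ranks links.

```python
# Assumed preference order among the inter-lingual relation types.
PREFERENCE = ("synonymy", "partial synonymy", "hyponymy")

# Invented toy mapping: Polish synset -> list of (relation, English synset).
MAPPING = {
    "pl:zamek#1": [("synonymy", "en:castle#1")],
    "pl:zamek#2": [
        ("hyponymy", "en:fastener#1"),
        ("partial synonymy", "en:lock#1"),
    ],
}


def english_candidates(pl_synset, mapping):
    """Return linked English synsets, closest relation type first."""
    rank = {name: position for position, name in enumerate(PREFERENCE)}
    links = [link for link in mapping.get(pl_synset, []) if link[0] in rank]
    links.sort(key=lambda link: rank[link[0]])
    return [target for _, target in links]


LOCK_CANDIDATES = english_candidates("pl:zamek#2", MAPPING)
```

Links with a relation type outside the assumed preference list are dropped, and a synset with no mapping simply yields an empty list.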


Applications

plWordNet is available under an open access license allowing free browsing. It has been made available to users in the form of an online dictionary, a mobile application and web services. Some applications of plWordNet:
* constructing and developing tools for automatic language processing
* word-sense disambiguation (WSD)
* automatic classification of texts
* machine translation
* aphasia treatment
* Polish-English and English-Polish dictionary
* Polish language semantic dictionary
* dictionary of synonyms and thesaurus
* dictionary of antonyms

