Wikt

Wiktionary (rhyming with "dictionary") is a multilingual, web-based project to create a free content dictionary of terms (including words, phrases, proverbs, linguistic reconstructions, etc.) in all natural languages and in a number of artificial languages. These entries may contain definitions, images for illustration, pronunciations, etymologies, inflections, usage examples, quotations, related terms, and translations of terms into other languages, among other features. It is collaboratively edited via a wiki. Its name is a portmanteau of the words ''wiki'' and ''dictionary''. It is available in many languages and in Simple English.

Like its sister project Wikipedia, Wiktionary is run by the Wikimedia Foundation and is written collaboratively by volunteers, dubbed "Wiktionarians". Its wiki software, MediaWiki, allows almost anyone with access to the website to create and edit entries. Because Wiktionary is not limited by print space considerations, most of Wiktionary's language editions provide definitions and translations of terms from many languages, and some editions offer additional information typically found in thesauri. Wiktionary's data is frequently used in various natural language processing tasks.


History and development

Wiktionary was brought online on December 12, 2002, following a proposal by Daniel Alston and an idea by Larry Sanger, co-founder of Wikipedia. On March 28, 2004, the first non-English Wiktionaries were initiated in French and Polish. Wiktionaries in numerous other languages have since been started. Wiktionary was hosted on a temporary domain name (wiktionary.wikipedia.org) until May 1, 2004, when it switched to the current domain name.

Wiktionary features over 30 million articles (and even more entries) across its editions. The largest of the language editions is the English Wiktionary, with over 7.5 million entries, followed by the French Wiktionary with over 4.7 million and the Malagasy Wiktionary with over 3.5 million entries. Forty-three Wiktionary language editions contain over 100,000 entries each.

Many of the definitions at the project's largest language editions were created by bots that found creative ways to generate entries or (rarely) automatically imported thousands of entries from previously published dictionaries. Seven of the 18 bots registered at the English Wiktionary in 2007 created 163,000 of the entries there (among them TheDaveBot, TheCheatBot, Websterbot, PastBot, and NanshuBot). Another of these bots, "ThirdPersBot", was responsible for the addition of a number of third-person conjugations that would not have received their own entries in standard dictionaries; for instance, it defined "smoulders" as the "third-person singular simple present form of smoulder."

Of the 1,269,938 definitions the English Wiktionary provides for 996,450 English words, 478,068 are "form of" definitions of this kind. This means that even without such entries, its coverage of English is significantly larger than that of major monolingual print dictionaries. ''Merriam-Webster's Third New International Dictionary of the English Language, Unabridged'', for instance, has 475,000 entries (with many additional embedded headwords); the ''Oxford English Dictionary'' has 615,000 headwords, but includes Middle English as well, for which the English Wiktionary has an additional 34,234 gloss definitions. Detailed statistics exist to show how many entries of various kinds exist.

The English Wiktionary does not rely on bots to the extent that some other editions do. The French and Vietnamese Wiktionaries, for example, imported large sections of the Free Vietnamese Dictionary Project (FVDP), which provides free content bilingual dictionaries to and from Vietnamese. These imported entries make up virtually all of the Vietnamese edition's contents. Like the English edition, the French Wiktionary has imported approximately 20,000 entries from the Unihan database of Chinese, Japanese, and Korean characters. The French Wiktionary grew rapidly in 2006, thanks in large part to bots that copied many entries from old, freely licensed dictionaries, such as the eighth edition (1935, around 35,000 words) of one such dictionary, and that added words from other Wiktionary editions with French translations. The Russian edition grew by nearly 80,000 entries as "LXbot" added boilerplate entries (with headings, but without definitions) for words in English and German.

As of July 2021, the English Wiktionary has over 791,870 gloss definitions and over 1,269,938 total definitions (including different forms) for English entries alone, with a total of over 9,928,056 definitions across all languages.
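
The definition counts quoted in this section can be checked against each other with a few lines of arithmetic. The sketch below uses only figures stated above; the variable names are illustrative, and because definitions, entries, and headwords are not the same unit, the comparison with print dictionaries is only a rough one.

```python
# Figures quoted in this section for English entries in the English Wiktionary.
total_definitions = 1_269_938    # all definitions, including "form of" ones
form_of_definitions = 478_068    # e.g. "third-person singular ... form of smoulder"

# Definitions left once the "form of" definitions are set aside.
gloss_definitions = total_definitions - form_of_definitions

# Print dictionaries cited for comparison (entries / headwords).
websters_third_entries = 475_000
oed_headwords = 615_000

# Share of all English definitions that are "form of" definitions.
form_of_share = form_of_definitions / total_definitions

print(gloss_definitions)
print(gloss_definitions > websters_third_entries)
print(gloss_definitions > oed_headwords)
print(f"{form_of_share:.1%}")
```

The difference matches the gloss-definition figure given for July 2021, which suggests the two sets of numbers describe the same snapshot.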


Logos

Wiktionary has historically lacked a uniform logo across its numerous language editions. Some editions use logos that depict a dictionary entry about the term "Wiktionary", based on the previous English Wiktionary logo, which was designed by Brooke Vibber, a MediaWiki developer. Because a purely textual logo must vary considerably from language to language, a four-phase contest to adopt a uniform logo was held at the Wikimedia Meta-Wiki from September to October 2006. Some communities adopted the winning entry by "Smurrayinchester", a 3×3 grid of wooden tiles, each bearing a character from a different writing system. However, the poll did not see as much participation from the Wiktionary community as some community members had hoped, and a number of the larger wikis ultimately kept their textual logos.

In April 2009, the issue was resurrected with a new contest. This time, a depiction by "AAEngelman" of an open hardbound dictionary won a head-to-head vote against the 2006 logo, but the process to refine and adopt the new logo then stalled. In the following years, some wikis replaced their textual logos with one of the two newer logos. In 2012, 55 wikis that had been using the English Wiktionary logo received localized versions of the 2006 design by "Smurrayinchester". In July 2016, the English Wiktionary adopted a variant of this logo. In total, 135 wikis, representing 61% of Wiktionary's entries, use a logo based on the 2006 design by "Smurrayinchester", 33 wikis (36%) use a textual logo, and three wikis (3%) use the 2009 design by "AAEngelman".


Multi-lingual

Wiktionary sites exist for many languages; some are active and some are closed. The counts of sites and languages come from Wikimedia's MediaWiki API:Sitematrix (retrieved from Data:Wikipedia statistics/meta.tab), and the article counts for active and closed sites come from Wikimedia's MediaWiki API:Siteinfo (retrieved from Data:Wikipedia statistics/data.tab). The sites also report registered users and how many of them are recently active. For a complete list of language projects with mainspace article totals, see Wikimedia Statistics.
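
API:Sitematrix is cited in this section as the source of the site counts. As a rough illustration of how active and closed Wiktionary sites could be tallied from such a response, the sketch below walks a small hand-made sample. The response layout it assumes (language groups under numeric keys, each with a `site` list whose items carry a `code` and an optional `closed` key) and the function name `count_wiktionaries` are assumptions for illustration; the sample is not real API output.

```python
from typing import Any


def count_wiktionaries(sitematrix: dict[str, Any]) -> tuple[int, int]:
    """Return (active, closed) counts of Wiktionary sites in a sitematrix-like dict."""
    active = closed = 0
    for key, group in sitematrix.items():
        # Language groups are assumed to sit under numeric keys;
        # skip other keys such as "count" and "specials".
        if not key.isdigit():
            continue
        for site in group.get("site", []):
            if site.get("code") != "wiktionary":
                continue
            # Assumption: a closed site is marked by the presence of a "closed" key.
            if "closed" in site:
                closed += 1
            else:
                active += 1
    return active, closed


# Hand-made sample in the assumed layout (not real data).
sample = {
    "count": 3,
    "0": {"code": "en", "site": [
        {"url": "https://en.wikipedia.org", "code": "wiki"},
        {"url": "https://en.wiktionary.org", "code": "wiktionary"},
    ]},
    "1": {"code": "fr", "site": [
        {"url": "https://fr.wiktionary.org", "code": "wiktionary"},
    ]},
    "2": {"code": "xx", "site": [
        {"url": "https://xx.wiktionary.example", "code": "wiktionary", "closed": ""},
    ]},
    "specials": [],
}

active, closed = count_wiktionaries(sample)
print(active, closed)
```

Keeping the tally separate from any network call makes the counting logic easy to check against a saved response.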


Critical reception

Critical reception of Wiktionary has been mixed. In 2006, Jill Lepore wrote in the article "Noah's Ark" for ''The New Yorker'':

"There's no show of hands at ''Wiktionary''. There's not even an editorial staff. 'Be your own lexicographer!', might be ''Wiktionary's'' motto. Who needs experts? Why pay good money for a dictionary written by lexicographers when we could cobble one together ourselves? ''Wiktionary'' isn't so much republican or democratic as Maoist. And it's only as good as the copyright-expired books from which it pilfers."

Keir Graff's review for ''Booklist'' was less critical:

"Is there a place for Wiktionary? Undoubtedly. The industry and enthusiasm of its many creators are proof that there's a market. And it's wonderful to have another strong source to use when searching the odd terms that pop up in today's fast-changing world and the online environment. But as with so many Web sources (including this column), it's best used by sophisticated users in conjunction with more reputable sources."

References in other publications are fleeting and part of larger discussions of Wikipedia, not progressing beyond a definition, although David Brooks in ''The Nashua Telegraph'' described it as "wild and woolly". One of the impediments to independent coverage of Wiktionary is the continuing confusion that it is merely an extension of Wikipedia. A study of the correctness of inflections for a subset of the Polish words in the English Wiktionary found this grammatical data to be very stable: only 131 out of 4,748 Polish words have had their inflection data corrected. Wiktionary has seen growing use in academia.


Wiktionary data in natural language processing

Wiktionary has semi-structured data. Wiktionary lexicographic data can be converted to machine-readable format in order to be used in natural language processing tasks.

Mining Wiktionary's data is a complex task. The difficulties include:
* (1) the constant and frequent changes to data and schemata,
* (2) the heterogeneity in Wiktionary language edition schemata, and
* (3) the human-centric nature of a wiki.

There are several parsers for different Wiktionary language editions:
* DBpedia Wiktionary: a subproject of DBpedia. The data are extracted from English, French, German, and Russian Wiktionaries and include language, parts of speech, definitions, semantic relations and translations. A declarative description of the page schema, regular expressions and finite state transducers are used in order to extract information.
* JWKTL (Java Wiktionary Library): provides access to English Wiktionary and German Wiktionary dumps via a Java Wiktionary API. The data includes language, parts of speech, definitions, quotations, semantic relations, etymologies and translations. JWKTL is distributed under the Apache License.
* wikokit: the parser of English Wiktionary and Russian Wiktionary. The parsed data includes language, parts of speech, definitions, quotations, semantic relations and translations. This is multi-licensed open-source software.
* Etymological entries have been parsed in the Etymological WordNet project.

Examples of natural language processing tasks which have been solved with the help of Wiktionary data include:
* Rule-based machine translation between Dutch and Afrikaans; data of English Wiktionary, Dutch Wiktionary and Wikipedia were used with the Apertium machine translation platform.
* Construction of a machine-readable dictionary by the parser NULEX, which integrates open linguistic resources: English Wiktionary, WordNet, and VerbNet. NULEX scrapes English Wiktionary for tense information (verbs), plural form and parts of speech (nouns).
* Speech recognition and synthesis, where Wiktionary was used to automatically create pronunciation dictionaries. Word-pronunciation pairs were retrieved from six Wiktionary language editions (Czech, English, French, Spanish, Polish, and German). Pronunciations are given in the International Phonetic Alphabet. The automatic speech recognition (ASR) system based on English Wiktionary has the highest word error rate, with every third phoneme needing to be changed.
* Ontology engineering and semantic network construction.
* Ontology matching.
* Text simplification. Medero & Ostendorf assessed vocabulary difficulty (reading level detection) with the help of Wiktionary data. Properties of words extracted from Wiktionary entries (definition length and POS, sense, and translation counts) were investigated. Medero & Ostendorf expected that
** (1) very common words will be more likely to have multiple parts of speech,
** (2) common words will be more likely to have multiple senses,
** (3) common words will be more likely to have been translated into multiple languages.
: These features extracted from Wiktionary entries were useful in distinguishing word types that appear in Simple English Wikipedia articles from words that only appear in the comparable Standard English articles.
* Part-of-speech tagging. Li et al. (2012) built multilingual POS-taggers for eight resource-poor languages on the basis of English Wiktionary and hidden Markov models.
* Sentiment analysis.

"Wikidata:Lexicographical data" was started in 2018 to provide structured data support to Wiktionaries. It stores word data of all languages in a machine-readable data model, under a dedicated "Lexeme" namespace in Wikidata. As of October 2021, the project has amassed over 600,000 lexeme entries of various languages.

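The regular-expression style of extraction mentioned for the parsers in this section, and the word properties Medero & Ostendorf examined (definition length and POS, sense, and translation counts), can be sketched together on a toy entry. This is a minimal illustration and not the code of any of the named tools: the simplified wikitext, the patterns, the names `parse_entry` and `word_features`, and the choice to measure definition length in words are all assumptions.

```python
import re
from dataclasses import dataclass, field

# Assumed subset of part-of-speech headings (illustrative only).
POS_NAMES = {"Noun", "Verb", "Adjective", "Adverb"}

HEADING = re.compile(r"^(=+)\s*(.+?)\s*\1\s*$")      # e.g. "===Noun==="
DEFINITION = re.compile(r"^#\s+(.+)$")               # "# text"; skips "#:" example lines
TRANSLATION = re.compile(r"\{\{t\+?\|([a-z-]+)\|")   # "{{t+|fr|chien|m}}" -> "fr"


@dataclass
class Entry:
    language: str = ""
    senses: dict[str, list[str]] = field(default_factory=dict)  # POS -> definitions
    translation_langs: set[str] = field(default_factory=set)


def parse_entry(wikitext: str) -> Entry:
    """Extract language, per-POS definitions, and translation languages."""
    entry = Entry()
    current_pos = None
    for line in wikitext.splitlines():
        heading = HEADING.match(line)
        if heading:
            level, title = len(heading.group(1)), heading.group(2)
            if level == 2:
                entry.language = title
            current_pos = title if title in POS_NAMES else None
            if current_pos:
                entry.senses.setdefault(current_pos, [])
            continue
        definition = DEFINITION.match(line)
        if definition and current_pos:
            entry.senses[current_pos].append(definition.group(1))
        entry.translation_langs.update(TRANSLATION.findall(line))
    return entry


def word_features(entry: Entry) -> dict[str, float]:
    """Compute POS, sense, and translation counts plus mean definition length in words."""
    definitions = [d for defs in entry.senses.values() for d in defs]
    total_words = sum(len(d.split()) for d in definitions)
    return {
        "pos_count": len(entry.senses),
        "sense_count": len(definitions),
        "mean_definition_length": total_words / len(definitions) if definitions else 0.0,
        "translation_count": len(entry.translation_langs),
    }


# Simplified, hand-written entry text (not copied from Wiktionary).
SAMPLE = """==English==
===Noun===
# A domesticated mammal.
#: ''The dog barked.''
# A mean person.
====Translations====
* French: {{t+|fr|chien|m}}
* German: {{t+|de|Hund|m}}
===Verb===
# To follow persistently.
"""

entry = parse_entry(SAMPLE)
features = word_features(entry)
print(entry.language, features)
```

Line-oriented patterns like these are brittle in exactly the ways the listed difficulties suggest: a schema change or a different language edition's layout would need new patterns, which is one reason dedicated parsers exist.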

See also

* Lingua Libre


External links

* Wikipedia:List of Wiktionaries
* List of all Wiktionary editions
* /en.wiktionary.org/wiki/Wiktionary:Multilingual_statistics Wiktionary's multilingual statistics
* Wikimedia's page on Wiktionary (including list of all existing Wiktionaries)
* Pages about Wiktionary in Meta