Wiktionary is a multilingual, web-based project to create a free
content dictionary of all words in all languages. It is
collaboratively edited via a wiki, and its name is a portmanteau of
the words wiki and dictionary. It is available in 171 languages and in
Simple English. Like its sister project Wikipedia, Wiktionary is run
by the Wikimedia Foundation, and is written collaboratively by
volunteers, dubbed "Wiktionarians". Its wiki software, MediaWiki,
allows almost anyone with access to the website to create and edit
entries. Because Wiktionary is not limited by print space considerations, most
of Wiktionary's language editions provide definitions and translations
of words from many languages, and some editions offer additional
information typically found in thesauri and lexicons. The English
Wiktionary includes a thesaurus (formerly known as Wikisaurus) of
synonyms of various words.
Wiktionary data are frequently used in various natural language
processing tasks.
History and development
Wiktionary was brought online on December 12, 2002,[a] following a
proposal by Daniel Alston and an idea by Larry Sanger, co-founder of
Wikipedia.[b] On March 28, 2004, the first non-English Wiktionaries
were initiated in French and Polish. Wiktionaries in numerous other
languages have since been started.
Wiktionary was hosted on a
temporary domain name (wiktionary.wikipedia.org) until May 1, 2004,
when it switched to the current domain name.[c] As of
Wiktionary features over 25.9 million entries
across its editions. The largest of the language editions is the
English Wiktionary, with over 5.5 million entries, followed by the
Wiktionary with over 4 million bot-generated entries and the
Wiktionary with over 3.2 million. Forty-one
editions now contain over 100,000 entries each.[d]
[Figure: article counts at the eight largest Wiktionary editions. The
use of bots to generate large numbers of articles is visible as
"growth spurts". Data as of December 2009.]
Most of the entries and many of the definitions at the project's
largest language editions were created by bots that found creative
ways to generate entries or (rarely) automatically imported thousands
of entries from previously published dictionaries. Seven of the 18
bots registered at the English Wiktionary[e] created 163,000 of the
entries there.
Another of these bots, "ThirdPersBot," was responsible for the
addition of a number of third-person conjugations that would not have
received their own entries in standard dictionaries; for instance, it
defined "smoulders" as the "third-person singular simple present form
of smoulder." Of the 648,970 definitions the English Wiktionary
provides for 501,171 English words, 217,850 are "form of" definitions
of this kind. This means its coverage of English is slightly
smaller than that of major monolingual print dictionaries. The Oxford
English Dictionary, for instance, has 615,000 headwords, while
Merriam-Webster's Third New International Dictionary of the English
Language, Unabridged has 475,000 entries (with many additional
embedded headwords). Detailed statistics exist to show how many
entries of various kinds exist.
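The arithmetic behind this comparison can be made explicit. The sketch below uses only the figures quoted above. The variable and function names are illustrative, and comparing the non-"form of" definitions against the print dictionaries' counts is an assumption about the intended reasoning rather than something the text states outright.

```python
# Figures quoted in the text above.
total_definitions = 648_970      # definitions for English words
form_of_definitions = 217_850    # "form of" definitions such as "smoulders"
oed_headwords = 615_000          # Oxford English Dictionary
webster_third_entries = 475_000  # Webster's Third, Unabridged


def non_form_of(total: int, form_of: int) -> int:
    """Definitions that remain once 'form of' definitions are set aside."""
    return total - form_of


def share(part: int, whole: int) -> float:
    """Fraction of `whole` represented by `part`."""
    return part / whole


remaining = non_form_of(total_definitions, form_of_definitions)
form_of_share = share(form_of_definitions, total_definitions)

print(remaining)               # 431120
print(f"{form_of_share:.1%}")  # 33.6%
# Assumed comparison: remaining definitions versus print-dictionary counts.
print(remaining < webster_third_entries < oed_headwords)  # True
```

Roughly a third of the definitions are "form of" definitions, which is why the raw total overstates coverage relative to print dictionaries that do not give inflected forms their own entries.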
The English Wiktionary does not rely on bots to the extent that some
other editions do. The French and Vietnamese Wiktionaries, for
example, imported large sections of the Free Vietnamese Dictionary
Project (FVDP), which provides free content bilingual dictionaries to
and from Vietnamese.[f] These imported entries make up virtually all
of the Vietnamese edition's contents. Almost all non-Malagasy-language
entries of the Malagasy Wiktionary were copied by bot from other
Wiktionaries. Like the English edition, the French Wiktionary has
imported the approximately 20,000 entries from the Unihan database of
Chinese, Japanese, and Korean characters. The French Wiktionary grew
rapidly in 2006 thanks in large part to bots copying many entries from
old, freely licensed dictionaries, such as the eighth edition of the
Dictionnaire de l'Académie française (1935, around 35,000 words),
and using bots to add words from other Wiktionary editions with French
translations. The Russian edition grew by nearly 80,000 entries as
"LXbot" added boilerplate entries (with headings, but without
definitions) for words in English and German.
In 2017, the English part of en.wiktionary had over 500,000 gloss
definitions and over 900,000 definitions (including different
Wiktionary has historically lacked a uniform logo across its numerous
language editions. Some editions use logos that depict a dictionary
entry about the term "Wiktionary", based on the previous English
Wiktionary logo, which was designed by Brion Vibber, a MediaWiki
developer.[g] Because a purely textual logo must vary considerably
from language to language, a four-phase contest to adopt a uniform
logo was held at the Wikimedia Meta-Wiki from September to October
2006.[h] Some communities adopted the winning entry by
"Smurrayinchester", a 3×3 grid of wooden tiles, each bearing a
character from a different writing system. However, the poll did not
see as much participation from the Wiktionary community as some
community members had hoped, and a number of the larger wikis
ultimately kept their textual logos.[h]
In April 2009, the issue was resurrected with a new contest. This
time, a depiction by "AAEngelman" of an open hardbound dictionary won
a head-to-head vote against the 2006 logo, but the process to refine
and adopt the new logo then stalled.[i] In the following years, some
wikis replaced their textual logos with one of the two newer logos. In
2012, 55 wikis that had been using the English Wiktionary logo
received localized versions of the 2006 design by
"Smurrayinchester".[j] In July 2016, the English Wiktionary adopted a
variant of this logo. As of 4 July 2016, 135
wikis, representing 61% of Wiktionary's entries, use a logo based on
the 2006 design by "Smurrayinchester", 33 wikis (36%) use a textual
logo, and three wikis (3%) use the 2009 design by "AAEngelman".[k]
To ensure accuracy, the English
Wiktionary has a policy requiring that
terms be attested. Terms in major languages such as English and
Chinese must be verified by:
clearly widespread use, or
use in permanently recorded media, conveying meaning, in at least
three independent instances spanning at least a year.
For smaller languages such as Creek and extinct languages such as
Latin, one use in a permanently recorded medium or one mention in a
reference work is sufficient verification.
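The attestation rules above can be read as a simple decision procedure. The following is a minimal sketch of that reading, not Wiktionary's own tooling: the `Citation` type, the function name, the 365-day definition of "a year", and the use of distinct source names to stand in for "independent instances" are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta

ONE_YEAR = timedelta(days=365)  # assumption: "a year" taken as 365 days


@dataclass(frozen=True)
class Citation:
    source: str  # hypothetical identifier of an independent source
    day: date    # date of the permanently recorded use


def is_attested(citations, *, major_language=True,
                widespread_use=False, reference_mentions=0):
    """Apply the attestation criteria described in the text."""
    if not major_language:
        # Smaller or extinct languages: one use or one mention suffices.
        return len(citations) >= 1 or reference_mentions >= 1
    if widespread_use:
        return True
    # Simplification: keep the earliest citation from each source so that
    # repeated uses by one source count as a single independent instance.
    earliest = {}
    for c in citations:
        if c.source not in earliest or c.day < earliest[c.source]:
            earliest[c.source] = c.day
    if len(earliest) < 3:
        return False
    days = list(earliest.values())
    return max(days) - min(days) >= ONE_YEAR


uses = [
    Citation("novel", date(2000, 1, 1)),
    Citation("newspaper", date(2000, 6, 1)),
    Citation("journal", date(2001, 1, 1)),
]
print(is_attested(uses))      # True
print(is_attested(uses[:2]))  # False
```

In practice, whether two uses are independent and whether a medium counts as permanently recorded are editorial judgments that a sketch like this cannot capture.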
Critical reception of Wiktionary has been mixed. In 2006 Jill Lepore
wrote in the article "Noah's Ark" for The New Yorker:[l]
There's no show of hands at Wiktionary. There's not even an editorial
staff. "Be your own lexicographer!", might be Wiktionary's motto. Who
needs experts? Why pay good money for a dictionary written by
lexicographers when we could cobble one together ourselves?
Wiktionary isn't so much republican or democratic as Maoist. And it's
only as good as the copyright-expired books from which it pilfers.
Keir Graff's review for Booklist was less critical:
Is there a place for Wiktionary? Undoubtedly. The industry and
enthusiasm of its many creators are proof that there's a market. And
it's wonderful to have another strong source to use when searching the
odd terms that pop up in today's fast-changing world and the online
environment. But as with so many Web sources (including this column),
it's best used by sophisticated users in conjunction with more
reputable sources.
References in other publications are fleeting and part of larger
discussions of Wikipedia, not progressing beyond a definition,
although David Brooks in The Nashua Telegraph described it as "wild
and woolly".[m] One of the impediments to independent coverage of
Wiktionary is the continuing confusion that it is merely an extension
of Wikipedia.[n] In 2005, PC Magazine rated Wiktionary as one of the
Internet's "Top 101 Web Sites", although little information was
given about the site.
A study of the inflection data for a subset of the Polish words in
the English Wiktionary found this grammatical data to be very stable:
only 131 out of 4748 Polish words have had their inflection data
corrected.
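As a quick check on what "very stable" means here, the correction rate can be computed from the two counts given above. The variable names are illustrative only.

```python
corrected_words = 131  # Polish words whose inflection data was corrected
sampled_words = 4748   # Polish words examined

correction_rate = corrected_words / sampled_words
unchanged_words = sampled_words - corrected_words

print(f"{correction_rate:.2%}")  # 2.76%
print(unchanged_words)           # 4617
```

Fewer than 3 in 100 of the examined words needed a correction.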
Wiktionary data in natural language processing
Wiktionary has semi-structured data. Wiktionary lexicographic data
can be converted to machine-readable format in order to be used in
natural language processing tasks.
Mining Wiktionary data is a complex task. The main difficulties are
(1) the constant and frequent changes to data and schemata, (2) the
heterogeneity of schemata across Wiktionary language editions,[o] and
(3) the human-centric nature of a wiki.
There are several parsers for different Wiktionary language editions:
DBpedia Wiktionary: a subproject of DBpedia that extracts data from
the English, French, German and Russian Wiktionaries; the data
includes language, part of speech, definitions, semantic relations and
translations. A declarative description of the page schema, regular
expressions and a finite state transducer are used to extract the
information.
JWKTL (Java Wiktionary Library): provides access to English Wiktionary
and German Wiktionary dumps via a Java API. The data includes
language, part of speech, definitions, quotations, semantic relations,
etymologies and translations. JWKTL is available for non-commercial
use.
wikokit: a parser of the English Wiktionary and Russian Wiktionary.
The parsed data includes language, part of speech, definitions,
quotations,[p] semantic relations and translations. It is
multi-licensed open-source software.
Etymological entries have been parsed in the Etymological WordNet
project.
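To give a feel for the kind of extraction these parsers perform, here is a minimal sketch of regular-expression-based parsing of an entry's wikitext. It is not the method used by DBpedia Wiktionary, JWKTL or wikokit. It assumes a heavily simplified English Wiktionary layout (a level-2 heading per language, deeper headings for parts of speech, and definition lines starting with `#`), and the sample entry, the `POS_NAMES` set and the function name are invented for illustration.

```python
import re
from collections import defaultdict

# "==English==" (level 2) or "===Noun===" (level 3 and deeper).
HEADING = re.compile(r"^(={2,6})\s*(.+?)\s*\1\s*$")
# "# definition", but not "#:" (example), "#*" (quotation) or "##" (subsense).
DEFINITION = re.compile(r"^#\s*([^#:*].*)$")
# Assumption: a tiny illustrative subset of part-of-speech headings.
POS_NAMES = {"Noun", "Verb", "Adjective", "Adverb"}


def parse_entry(wikitext):
    """Return {language: {part_of_speech: [definitions]}}."""
    result = defaultdict(lambda: defaultdict(list))
    language = pos = None
    for line in wikitext.splitlines():
        heading = HEADING.match(line)
        if heading:
            level, title = len(heading.group(1)), heading.group(2)
            if level == 2:
                language, pos = title, None
            else:
                # Non-POS sections (e.g. Etymology) reset the current POS.
                pos = title if title in POS_NAMES else None
            continue
        definition = DEFINITION.match(line)
        if definition and language and pos:
            text = definition.group(1).strip()
            if text:
                result[language][pos].append(text)
    return {lang: dict(by_pos) for lang, by_pos in result.items()}


# Invented sample entry, not copied from Wiktionary.
SAMPLE = """==English==
===Etymology===
Some etymology text.
===Noun===
# A unit of language.
#: I like that word.
# A promise.
===Verb===
# To phrase something.
==French==
===Noun===
# Hypothetical French sense.
"""

parsed = parse_entry(SAMPLE)
print(parsed["English"]["Noun"])  # ['A unit of language.', 'A promise.']
```

Real entries rely heavily on templates and differ between language editions, which is the schema heterogeneity noted above. A hand-written pattern set like this would need to be maintained separately for each edition.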
Various natural language processing tasks have been solved with the
help of Wiktionary data:
Rule-based machine translation between Dutch and Afrikaans; data from
the English Wiktionary, Dutch Wiktionary and Wikipedia were used with
the Apertium machine translation platform.
Construction of a machine-readable dictionary by the parser NULEX,
which integrates open linguistic resources: English Wiktionary,
WordNet, and VerbNet. NULEX scrapes English Wiktionary for tense
information (verbs), plural form and part of speech (nouns).
Speech recognition and synthesis, where Wiktionary was used to
automatically create pronunciation dictionaries. Word-pronunciation
pairs were retrieved from six Wiktionary language editions (Czech,
English, French, Spanish, Polish, and German). Pronunciations are in
terms of the International Phonetic Alphabet.[q] The ASR system based
on English Wiktionary has the highest word error rate, where each
third phoneme has to be changed.
Ontology engineering and semantic network construction.[r]
Text simplification. Medero & Ostendorf assessed vocabulary
difficulty (reading level detection) with the help of Wiktionary.
Properties of words extracted from Wiktionary entries (definition
length and POS, sense, and translation counts) were investigated.
Medero & Ostendorf expected that (1) very common words would be
more likely to have multiple parts of speech, (2) common words would
be more likely to have multiple senses, and (3) common words would be
more likely to have been translated into multiple languages. These
features of Wiktionary entries were useful in distinguishing word
types that appear in Simple English Wikipedia articles from words that
only appear in the comparable Standard English articles.
Part-of-speech tagging. Li et al. (2012) built multilingual
POS-taggers for eight resource-poor languages on the basis of English
Wiktionary and Hidden Markov Models.[s]
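The word properties named in the text simplification item above (definition length and POS, sense, and translation counts) are straightforward to compute once an entry has been parsed. The sketch below shows one way to do so. The `Entry` structure, its field names, and measuring definition length in whitespace-separated words are assumptions for illustration, not details taken from Medero & Ostendorf.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    """Hypothetical parsed Wiktionary entry for a single word."""
    word: str
    senses: list  # (part_of_speech, definition) pairs
    translations: set = field(default_factory=set)  # target language codes


def word_features(entry):
    """Compute the four entry-level properties named in the text."""
    # Assumption: definition length is measured in words.
    lengths = [len(definition.split()) for _, definition in entry.senses]
    return {
        "mean_definition_length": sum(lengths) / len(lengths) if lengths else 0.0,
        "pos_count": len({pos for pos, _ in entry.senses}),
        "sense_count": len(entry.senses),
        "translation_count": len(entry.translations),
    }


# Invented example entry.
run = Entry(
    word="run",
    senses=[("verb", "to move quickly on foot"), ("noun", "an act of running")],
    translations={"de", "fr", "pl"},
)
print(word_features(run))
```

Under the expectations listed above, a common word would tend to score higher on the three counts than a rare one, which is what makes these values usable as features for reading level detection.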
List of Wiktionaries
^ mailing list archive discussion announcing the opening of
Wiktionary project – Retrieved May 3, 2011
^ mailing list archive discussion from Larry Sanger giving the idea
on Wiktionary – Retrieved May 3, 2011
^ Wiktionary's current URL is www.wiktionary.org.
Wiktionary total article counts are here. Detailed statistics by
word type are available here .
^ The user list at the English
Wiktionary identifies accounts that
have been given "bot status".
^ Hồ Ngọc Đức, Free Vietnamese
Dictionary Project. Details at
the Vietnamese Wiktionary.
Wiktionary Logo", English Wiktionary, Wikimedia
^ a b "Wiktionary/logo", Meta-Wiki, Wikimedia Foundation.
^ "Wiktionary/logo/refresh/voting", Meta-Wiki, Wikimedia Foundation.
^ [Translators-l] 56 Wiktionaries got a localised logo
^ m:Wiktionary/logo#Logo use statistics.
^ The full article is not available on-line.
^ David Brooks, "Online, interactive encyclopedia not just for geeks
anymore, because everyone seems to need it now, more than ever!" The
Nashua Telegraph (August 4, 2004)
^ In this citation, the author refers to Wiktionary as part of the
Wikipedia site: Adapted from an article by Naomi DeTullio (2006).
"Wikis for Librarians" (PDF). NETLS News #142. Northeast Texas Library
System. p. 15. Archived from the original (PDF newsletter) on
2007-06-05. Retrieved April 21, 2007.
^ E.g. compare the entry structure and formatting rules in English
Wiktionary and Russian Wiktionary.
^ Quotations are extracted only from Russian Wiktionary.
^ If there are several IPA notations on a
Wiktionary page – either
for different languages or for pronunciation variants, then the first
pronunciation was extracted.
^ The source code and the results of POS-tagging are available at
^ "Wiktionary.org Site Info". Alexa Internet. Retrieved
^ TheDaveBot, TheCheatBot, Websterbot, PastBot, NanshuBot (each
archived 2007-10-11 at the Wayback Machine).
^ Detailed statistics as of 1 July 2013
^ LXbot Archived May 24, 2008, at the Wayback Machine.
^ Wiktionary statistics
^ "Wiktionary:Criteria for inclusion". Wiktionary. Retrieved 13 March
^ Lepore 2006.
^ PC Mag 2005.
^ Kurmas 2010.
^ Meyer & Gurevych 2012, p. 140.
^ Zesch, Müller & Gurevych 2008, p. 4, Figure 1.
^ Meyer & Gurevych 2010, p. 40.
^ Krizhanovsky, Transformation 2010, p. 1.
^ Hellmann & Auer 2013, p. 302, p. 16 in PDF.
^ Hellmann, Brekle & Auer 2012, p. 3, Table 1.
Wiktionary Archived 2013-05-04 at the Wayback Machine.
^ Hellmann, Brekle & Auer 2012, pp. 8–9.
^ Hellmann, Brekle & Auer 2012, p. 10.
^ Hellmann, Brekle & Auer 2012, p. 11.
^ Zesch, Müller & Gurevych 2008.
^ Krizhanovsky, Transformation 2010.
^ a b Smirnov 2012.
^ Krizhanovsky, Comparison 2010.
^ Etymological WordNet
^ Otte & Tyers 2011.
^ McFate & Forbus 2011.
^ Schlippe, Ochs & Schultz 2012.
^ Schlippe, Ochs & Schultz 2012, p. 4802.
^ Schlippe, Ochs & Schultz 2012, p. 4804.
^ Meyer & Gurevych 2012.
^ Lin & Krizhanovsky 2011.
^ Medero & Ostendorf 2009.
^ Li, Graça & Taskar 2012.
^ Chesley et al. 2006.
Chesley, Paula; Vincent, Bruce; Xu, Li; Srihari, Rohini K. (2006).
"Using verbs and adjectives to automatically classify blog sentiment"
(PDF). Training. 580: 233–235. Retrieved May 9, 2013.
Hellmann, Sebastian; Brekle, Jonas; Auer, Sören (2012). "Leveraging
the Crowdsourcing of Lexical Resources for Bootstrapping a Linguistic
Data Cloud" (PDF). Proc. Joint Int. Semantic Technology Conference
(JIST). Nara, Japan.
Hellmann, S.; Auer, S. (2013). "Towards Web-Scale Collaborative
Knowledge Extraction" (PDF). In Gurevych, Iryna; Kim, Jungi. The
People's Web Meets NLP. Theory and Applications of Natural Language
Processing. Springer-Verlag. pp. 287–313.
Krizhanovsky, Andrew (2010). "Transformation of Wiktionary entry
structure into tables and relations in a relational database schema".
Krizhanovsky, Andrew (2010). "The comparison of Wiktionary thesauri
transformed into the machine-readable format". arXiv:1006.5040
Kurmas, Zachary (July 2010). Zawilinski: a library for studying
grammar in Wiktionary. Proceedings of the 6th International Symposium
on Wikis and Open Collaboration. Gdansk, Poland. Retrieved
Li, Shen; Graça, Joao V.; Taskar, Ben (2012). "Wiki-ly supervised
part-of-speech tagging" (PDF). Proceedings of the 2012 Joint
Conference on Empirical Methods in Natural Language Processing and
Computational Natural Language Learning. Jeju Island, Korea:
Association for Computational Linguistics. pp. 1389–1398.
Lepore, Jill (November 6, 2006). "Noah's Ark" (Abstract). The New
Yorker. Retrieved April 21, 2007.
Lin, Feiyu; Krizhanovsky, Andrew (2011). "Multilingual ontology
matching based on
Wiktionary data accessible via SPARQL endpoint".
Proc. of the 13th Russian Conference on Digital Libraries RCDL'2011.
Voronezh, Russia. pp. 19–26. arXiv:1109.0732 .
McFate, Clifton J.; Forbus, Kenneth D. (2011). "NULEX: an open-license
broad coverage lexicon" (PDF). The 49th Annual Meeting of the
Association for Computational Linguistics: Human Language
Technologies, Proceedings of the Conference. Portland, Oregon, USA:
The Association for Computational Linguistics. pp. 363–367.
Medero, Julie; Ostendorf, Mari (2009). "Analysis of vocabulary
difficulty using wiktionary" (PDF). Proc. SLaTE Workshop.
Meyer, C. M.; Gurevych, I. (2010). "Worth its Weight in Gold or Yet
Another Resource - A Comparative Study of Wiktionary, OpenThesaurus
and GermaNet" (PDF). Proc. 11th International Conference on
Intelligent Text Processing and Computational Linguistics, Iasi,
Romania. pp. 38–49.
Otte, Pim; Tyers, F. M. (2011). "Rapid rule-based machine translation
between Dutch and Afrikaans" (PDF). In Forcada, Mikel L.; Depraetere,
Heidi; Vandeghinste, Vincent. 16th Annual Conference of the European
Association of Machine Translation, EAMT11. Leuven, Belgium.
Schlippe, Tim; Ochs, Sebastian; Schultz, Tanja (2012).
"Grapheme-to-phoneme model generation for Indo-European languages"
(PDF). Acoustics, Speech and Signal Processing (ICASSP). Kyoto, Japan.
Smirnov A., Levashova T., Karpov A., Kipyatkova I., Ronzhin A.,
Krizhanovsky A., Krizhanovsky N.. Analysis of the quotation corpus of
the Russian Wiktionary. Research in Computing Science. 2012 [Retrieved
Zesch, Torsten; Müller, Christof; Gurevych, Iryna (2008). "Extracting
Lexical Semantic Knowledge from Wikipedia and Wiktionary" (PDF).
Proceedings of the Conference on Language Resources and Evaluation
(LREC). Marrakech, Morocco.
"Wiktionary". Top 101 Web Sites. PC Magazine. April 6, 2005. Retrieved
December 16, 2005.