Spellcheck (comics)

In software, a spell checker (or spelling checker or spell check) is a software feature that checks for misspellings in a text. Spell-checking features are often embedded in software or services, such as a word processor, email client, electronic dictionary, or search engine.


Design

A basic spell checker carries out the following processes:
* It scans the text and extracts the words contained in it.
* It then compares each word with a known list of correctly spelled words (i.e. a dictionary). This might contain just a list of words, or it might also contain additional information, such as hyphenation points or lexical and grammatical attributes.
* An additional step is a language-dependent algorithm for handling morphology. Even for a lightly inflected language like English, the spell checker will need to consider different forms of the same word, such as plurals, verbal forms, contractions, and possessives. For many other languages, such as those featuring agglutination and more complex declension and conjugation, this part of the process is more complicated.

It is unclear whether morphological analysis (allowing for many forms of a word depending on its grammatical role) provides a significant benefit for English, though its benefits for highly synthetic languages such as German, Hungarian, or Turkish are clear.

As an adjunct to these components, the program's user interface allows users to approve or reject replacements and modify the program's operation.

Spell checkers can use approximate string matching algorithms such as Levenshtein distance to find correct spellings of misspelled words. An alternative type of spell checker uses solely statistical information, such as n-grams, to recognize errors instead of correctly spelled words. This approach usually requires a lot of effort to obtain sufficient statistical information. Key advantages include needing less runtime storage and the ability to correct errors in words that are not included in a dictionary.

In some cases, spell checkers use a fixed list of misspellings and suggestions for those misspellings; this less flexible approach is often used in paper-based correction methods, such as the ''see also'' entries of encyclopedias. Clustering algorithms have also been used for spell checking combined with phonetic information.
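The scan, lookup, and suggestion steps described in this section can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the tiny `WORDS` set stands in for a real dictionary, the function names (`tokenize`, `levenshtein`, `check`) and the `max_distance` cutoff are hypothetical, and morphology handling is left out entirely.

```python
import re

# Toy word list standing in for a real dictionary (assumption for illustration).
WORDS = {"the", "quick", "brown", "fox", "jumps", "over", "lazy", "dog"}


def tokenize(text):
    """Extract lowercase word tokens (letters and apostrophes) from the text."""
    return re.findall(r"[a-z']+", text.lower())


def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions, or substitutions."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def check(text, words=WORDS, max_distance=2):
    """Map each unknown word to dictionary words within max_distance edits."""
    report = {}
    for token in tokenize(text):
        if token in words or token in report:
            continue
        scored = sorted((levenshtein(token, w), w) for w in words)
        report[token] = [w for distance, w in scored if distance <= max_distance]
    return report
```

Calling `check("The quikc brown fxo")` reports `quikc` and `fxo` as unknown and proposes `quick` and `fox`. Comparing every unknown word against the whole dictionary is acceptable for a toy list, but a practical checker would likely need an index or a candidate-generation step to stay fast.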


History


Pre-PC

In 1961, Les Earnest, who headed the research on this budding technology, saw it necessary to include the first spell checker, which accessed a list of 10,000 acceptable words. Ralph Gorin, a graduate student under Earnest at the time, created the first true spelling checker program written as an applications program (rather than research) for general English text: SPELL for the DEC PDP-10 at Stanford University's Artificial Intelligence Laboratory, in February 1971. Gorin wrote SPELL in assembly language for faster action; he made the first spelling corrector by searching the word list for plausible correct spellings that differ by a single letter or by a transposition of adjacent letters and presenting them to the user. Gorin made SPELL publicly accessible, as was done with most SAIL (Stanford Artificial Intelligence Laboratory) programs, and it soon spread around the world via the new ARPAnet, about ten years before personal computers came into general use. SPELL, its algorithms and data structures inspired the Unix ''ispell'' program.

The first spell checkers were widely available on mainframe computers in the late 1970s. A group of six linguists from Georgetown University, including Maria Mariani, developed the first spell-check system for the IBM corporation. Henry Kučera invented one for the VAX machines of Digital Equipment Corp in 1981.


PCs

The first spell checkers for personal computers appeared in 1980, such as "WordCheck" for Commodore systems, which was released in late 1980 in time for advertisements to go to print in January 1981. Developers such as Maria Mariani and Random House rushed OEM packages or end-user products into the rapidly expanding software market. On the pre-Windows PCs, these spell checkers were standalone programs, many of which could be run in TSR (terminate-and-stay-resident) mode from within word-processing packages on PCs with sufficient memory.

However, the market for standalone packages was short-lived, as by the mid-1980s developers of popular word-processing packages like WordStar and WordPerfect had incorporated spell checkers in their packages, mostly licensed from the above companies, who quickly expanded support from just English to many European and eventually even Asian languages. However, this required increasing sophistication in the morphology routines of the software, particularly with regard to heavily agglutinative languages like Hungarian and Finnish. Although the size of the word-processing market in a country like Iceland might not have justified the investment of implementing a spell checker, companies like WordPerfect nonetheless strove to localize their software for as many national markets as possible as part of their global marketing strategy.

When Apple developed "a system-wide spelling checker" for Mac OS X so that "the operating system took over spelling fixes," it was a first: one "didn't have to maintain a separate spelling checker for each" program. Mac OS X's spellcheck coverage includes virtually all bundled and third-party applications.

Visual Tools' ''VT Speller'', introduced in 1994, was "designed for developers of applications that support Windows." It came with a dictionary but could also build and use secondary dictionaries.


Browsers

Firefox 2.0, a web browser, has spell check support for user-written content, such as when editing Wikitext or writing on many webmail sites, blogs, and social networking websites. The web browsers Google Chrome, Konqueror, and Opera, the email client KMail, and the instant messaging client Pidgin also offer spell checking support, transparently using GNU Aspell in the past and Hunspell currently as their engine.


Specialties

Some spell checkers have separate support for medical dictionaries to help prevent medical errors.


Functionality

The first spell checkers were "verifiers" instead of "correctors." They offered no suggestions for incorrectly spelled words. This was helpful for typos but it was not so helpful for logical or phonetic errors. The challenge the developers faced was the difficulty in offering useful suggestions for misspelled words. This requires reducing words to a skeletal form and applying pattern-matching algorithms.

It might seem logical that where spell-checking dictionaries are concerned, "the bigger, the better," so that correct words are not marked as incorrect. In practice, however, an optimal size for English appears to be around 90,000 entries. If there are more than this, incorrectly spelled words may be skipped because they are mistaken for others. For example, a linguist might determine on the basis of corpus linguistics that the word ''baht'' is more frequently a misspelling of ''bath'' or ''bat'' than a reference to the Thai currency. Hence, it would typically be more useful if a few people who write about Thai currency were slightly inconvenienced than if the spelling errors of the many more people who discuss baths were overlooked.

The first MS-DOS spell checkers were mostly used in proofing mode from within word processing packages. After preparing a document, a user scanned the text looking for misspellings. Later, however, batch processing was offered in such packages as Oracle's short-lived CoAuthor, which allowed a user to view the results after a document was processed and correct only the words that were known to be wrong. When memory and processing power became abundant, spell checking was performed in the background in an interactive way, as in Sector Software's Spellbound program, released in 1987, and in Microsoft Word since Word 95.

Spell checkers became increasingly sophisticated and are now capable of recognizing grammatical errors. However, even at their best, they rarely catch all the errors in a text (such as homophone errors) and will flag neologisms and foreign words as misspellings. Nonetheless, spell checkers can be considered a type of foreign language writing aid that non-native language learners can rely on to detect and correct their misspellings in the target language.
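The remark earlier in this section about reducing words to a skeletal form can be made concrete with a small Python sketch. The text does not say which reduction early correctors used, so the rule below (keep the first letter, drop later vowels, collapse doubled consonants) is an assumption chosen for illustration, and the names `skeleton`, `build_index`, and `suggest` are hypothetical.

```python
from collections import defaultdict

VOWELS = set("aeiou")


def skeleton(word):
    """Reduce a word to its first letter plus later consonants, collapsing repeats."""
    word = word.lower()
    if not word:
        return ""
    kept = [word[0]]
    for ch in word[1:]:
        if ch in VOWELS or not ch.isalpha():
            continue  # drop vowels and punctuation after the first letter
        if ch != kept[-1]:
            kept.append(ch)  # skip a consonant that repeats the previous kept letter
    return "".join(kept)


def build_index(words):
    """Group dictionary words by skeleton so lookups are a single dict access."""
    index = defaultdict(set)
    for word in words:
        index[skeleton(word)].add(word)
    return index


def suggest(word, index):
    """Return dictionary words that share the misspelling's skeleton."""
    return sorted(index.get(skeleton(word), ()))
```

With an index built from `["receive", "accommodate", "separate"]`, the misspellings `recieve`, `acomodate`, and `seperate` each map back to the intended word, because vowel mix-ups and dropped double letters do not change the skeleton. Errors in the first letter or in the consonants themselves would be missed by this particular rule.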


Spell-checking for languages other than English

English is unusual in that most words used in formal writing have a single spelling that can be found in a typical dictionary, with the exception of some jargon and modified words. In many languages, words are often concatenated into new combinations of words. In German, compound nouns are frequently coined from other existing nouns. Some scripts do not clearly separate one word from another, requiring word-splitting algorithms. Each of these presents unique challenges to non-English language spell checkers.
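One way a checker might accept freshly coined compounds is to test whether an unknown word can be split entirely into known words. The Python sketch below illustrates that idea only; the function name `split_compound`, the `min_part` threshold, and the four-word lexicon in the usage note are hypothetical, and real German compounding also involves linking elements (such as an inserted "s") that this sketch ignores.

```python
from functools import lru_cache


def split_compound(word, lexicon, min_part=3):
    """Split word into lexicon entries, preferring longer parts; None if impossible."""
    word = word.lower()

    @lru_cache(maxsize=None)
    def solve(start):
        # Reaching the end of the word means everything before it was covered.
        if start == len(word):
            return ()
        # Try the longest candidate part first, down to min_part characters.
        for end in range(len(word), start + min_part - 1, -1):
            part = word[start:end]
            if part in lexicon:
                rest = solve(end)
                if rest is not None:
                    return (part,) + rest
        return None

    result = solve(0)
    return list(result) if result is not None else None
```

Given the lexicon `{"donau", "dampf", "schiff", "fahrt"}`, `split_compound("Donaudampfschifffahrt", lexicon)` yields the four parts in order, while a string that cannot be fully covered returns `None` and would be flagged. The same search could serve as a starting point for scripts without word separators, though those typically need frequency information to choose between competing splits.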


Context-sensitive spell checkers

There has been research on developing algorithms that are capable of recognizing a misspelled word, even if the word itself is in the vocabulary, based on the context of the surrounding words. Not only does this allow correctly spelled but misused words to be caught, but it also mitigates the detrimental effect of enlarging dictionaries, allowing more words to be recognized. For example, ''baht'' in the same paragraph as ''Thai'' or ''Thailand'' would not be recognized as a misspelling of ''bath''.

The most common errors caught by such a system are homophone errors. In the following sentence, several words are spelled correctly but are wrong in context (the intended sentence is "They're coming to see if it's real"):
:Their coming too sea if its reel.

The most successful algorithm to date is Andrew Golding and Dan Roth's "Winnow-based spelling correction algorithm", published in 1999, which is able to recognize about 96% of context-sensitive spelling errors, in addition to ordinary non-word spelling errors. Context-sensitive spell checkers appeared in the now-defunct applications Microsoft Office 2007 and Google Wave.

Grammar checkers attempt to fix problems with grammar beyond spelling errors, including incorrect choice of words.
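The ''baht''/''bath'' example can be turned into a small Python sketch of context-based checking. This is a naive cue-word counter written for illustration, not the Winnow-based algorithm of Golding and Roth; the confusion set, the hand-picked `CUES` table, the `window` size, and the function names are all assumptions, and a real system would learn such associations from a corpus.

```python
import re

# Hypothetical confusion set: valid words that are easily typed for one another.
CONFUSION_SETS = [{"baht", "bath"}]

# Hypothetical cue words for each candidate (a real system would learn these).
CUES = {
    "baht": {"thai", "thailand", "currency", "exchange"},
    "bath": {"water", "soap", "tub", "towel"},
}


def tokenize(text):
    """Extract lowercase word tokens from the text."""
    return re.findall(r"[a-z']+", text.lower())


def flag_context_errors(text, window=5):
    """Return (token_index, word, better_candidate) where context favours another word."""
    tokens = tokenize(text)
    findings = []
    for i, token in enumerate(tokens):
        for group in CONFUSION_SETS:
            if token not in group:
                continue
            # Words within `window` positions on either side of the token.
            context = set(tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window])
            scores = {cand: len(context & CUES.get(cand, set())) for cand in group}
            best = max(sorted(scores), key=scores.get)
            # Only flag when another candidate is strictly better supported.
            if scores[best] > scores[token]:
                findings.append((i, token, best))
    return findings
```

For "I paid 500 bath at the Thai currency exchange", the sketch flags `bath` and proposes `baht`, while ''baht'' near ''Thailand'' and ''bath'' near ''water'' and ''soap'' are left alone. Requiring a strictly higher score keeps the checker silent when the context gives no evidence either way.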


See also

* Cupertino effect
* Grammar checker
* Record linkage problem
* Spelling suggestion
* Words (Unix)
* Autocorrection
* LanguageTool


External links

* "How to Write a Spelling Corrector", by Peter Norvig (Norvig.com)
* "Spellchecking by computer", by Roger Mitton (BBK.ac.uk)
* "Spell-Check Crutch Curtails Correctness", by Lloyd de Vries (CBSNews.com)
* History and text of "Candidate for a Pullet Surprise" by Mark Eckman and Jerrold H. Zar