In software, a spell checker (or spelling checker or spell check) is a software feature that checks for misspellings in a text. Spell-checking features are often embedded in software or services, such as a word processor, email client, electronic dictionary, or search engine.


Design

A basic spell checker carries out the following processes:
* It scans the text and extracts the words contained in it.
* It then compares each word with a known list of correctly spelled words (i.e. a dictionary). This might contain just a list of words, or it might also contain additional information, such as hyphenation points or lexical and grammatical attributes.
* An additional step is a language-dependent algorithm for handling morphology. Even for a lightly inflected language like English, the spell checker will need to consider different forms of the same word, such as plurals, verbal forms, contractions, and possessives. For many other languages, such as those featuring agglutination and more complex declension and conjugation, this part of the process is more complicated.

It is unclear whether morphological analysis, which allows for many forms of a word depending on its grammatical role, provides a significant benefit for English, though its benefits for highly synthetic languages such as German, Hungarian, or Turkish are clear.

As an adjunct to these components, the program's user interface allows users to approve or reject replacements and modify the program's operation.

Spell checkers can use approximate string matching algorithms such as Levenshtein distance to find correct spellings of misspelled words. An alternative type of spell checker uses solely statistical information, such as n-grams, to recognize errors instead of correctly spelled words. This approach usually requires a lot of effort to obtain sufficient statistical information. Its key advantages are that it needs less runtime storage and can correct errors in words that are not included in a dictionary.

In some cases, spell checkers use a fixed list of misspellings and suggestions for those misspellings; this less flexible approach is often used in paper-based correction methods, such as the "see also" entries of encyclopedias. Clustering algorithms have also been used for spell checking combined with phonetic information.
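The extract, look up, and suggest pipeline described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not any particular product's implementation: the word list DICTIONARY, the suffix list SUFFIXES, and all function names are hypothetical, suffix stripping is a crude stand-in for real morphological analysis, and suggestions are ranked by plain Levenshtein distance.

```python
import re

# Hypothetical toy word list; a real dictionary holds tens of thousands of entries.
DICTIONARY = {"the", "cat", "sat", "on", "mat", "dog", "walk", "spell", "checker"}

# Hypothetical naive English suffixes, standing in for genuine morphology.
SUFFIXES = ("'s", "s", "ed", "ing")


def extract_words(text):
    """Step 1: pull lowercase words (letters and apostrophes) out of the text."""
    return re.findall(r"[a-z']+", text.lower())


def is_known(word, dictionary=DICTIONARY):
    """Steps 2 and 3: look the word up, then retry with one suffix stripped."""
    if word in dictionary:
        return True
    return any(
        word.endswith(suffix) and word[: -len(suffix)] in dictionary
        for suffix in SUFFIXES
    )


def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions, or substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when equal)
            ))
        prev = curr
    return prev[-1]


def suggest(word, dictionary=DICTIONARY, max_distance=2):
    """Dictionary words within max_distance edits, closest first, then alphabetical."""
    ranked = sorted((levenshtein(word, w), w) for w in dictionary)
    return [w for distance, w in ranked if distance <= max_distance]


def check(text, dictionary=DICTIONARY):
    """Map each unrecognized word to its suggested replacements."""
    return {
        w: suggest(w, dictionary)
        for w in extract_words(text)
        if not is_known(w, dictionary)
    }
```

For example, `check("The cat's dgo walked")` returns `{"dgo": ["dog"]}`: "cat's" and "walked" are accepted once a suffix is stripped. Note that plain Levenshtein distance counts a transposition such as dgo/dog as two edits; the Damerau variant counts it as one.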


History


Pre-PC

In 1961, Les Earnest, who headed research on a then-budding technology, found it necessary to include the first spell checker, which accessed a list of 10,000 acceptable words. Ralph Gorin, a graduate student under Earnest at the time, created the first true spelling checker program written as an applications program (rather than research) for general English text: SPELL for the DEC PDP-10 at Stanford University's Artificial Intelligence Laboratory, in February 1971. Gorin wrote SPELL in assembly language for speed; he made the first spelling corrector by searching the word list for plausible correct spellings that differ by a single letter or by a transposition of adjacent letters, and presenting them to the user. Gorin made SPELL publicly accessible, as was done with most SAIL (Stanford Artificial Intelligence Laboratory) programs, and it soon spread around the world via the new ARPAnet, about ten years before personal computers came into general use. SPELL's algorithms and data structures inspired the Unix ispell program. Spell checkers became widely available on mainframe computers in the late 1970s. A group of six linguists from Georgetown University developed the first spell-check system for the IBM corporation. Henry Kučera invented one for the VAX machines of Digital Equipment Corp in 1981.
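Gorin's correction strategy, as described above, amounts to generating every near-miss of a word and keeping those found in the word list. The Python sketch below illustrates that idea only; it is not a reconstruction of SPELL, which was written in PDP-10 assembly. It interprets "differ by a single letter" as one substituted letter, assumes a lowercase ASCII alphabet, and the function names are hypothetical.

```python
import string


def single_edit_candidates(word, alphabet=string.ascii_lowercase):
    """Strings differing from `word` by one substituted letter or one adjacent swap."""
    candidates = set()
    # One letter replaced by a different letter.
    for i, original in enumerate(word):
        for letter in alphabet:
            if letter != original:
                candidates.add(word[:i] + letter + word[i + 1:])
    # Two adjacent (and different) letters transposed.
    for i in range(len(word) - 1):
        if word[i] != word[i + 1]:
            candidates.add(word[:i] + word[i + 1] + word[i] + word[i + 2:])
    return candidates


def corrections(word, dictionary):
    """Dictionary words reachable by one such edit, sorted for stable output."""
    return sorted(single_edit_candidates(word) & set(dictionary))
```

For example, `corrections("teh", {"the", "ten", "tea", "tech"})` returns `["tea", "ten", "the"]`: "the" comes from swapping the last two letters, the others from a single substitution. The number of candidates grows with the length of the word, not with the size of the word list, so each candidate can be checked by a fast membership test.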


Unix

The International Ispell program commonly used in Unix is based on R. E. Gorin's SPELL. It was converted to C by Pace Willisson at MIT. The GNU project has its own spell checker, GNU Aspell. Aspell's main improvement is that it can more accurately suggest correct alternatives for misspelled English words. Because traditional spell checkers could not check words in languages with complex inflection, the Hungarian developer László Németh created Hunspell, a spell checker that supports agglutinative languages and complex compound words. Hunspell also uses Unicode in its dictionaries. Hunspell replaced the previous MySpell in OpenOffice.org in version 2.0.2. Enchant is another general spell checker, derived from AbiWord. Its goal is to combine programs supporting different languages, such as Aspell, Hunspell, Nuspell, Hspell (Hebrew), Voikko (Finnish), Zemberek (Turkish), and AppleSpell, under one interface.
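Enchant's idea of putting several spell-checking engines behind one interface can be sketched as a broker that routes each language to a provider. The classes below (Provider, WordListProvider, Broker) are hypothetical and are not Enchant's actual API; the toy providers stand in for wrappers around engines such as Hunspell or Voikko, and provider order is assumed to express preference.

```python
from typing import Iterable, Protocol


class Provider(Protocol):
    """Hypothetical back-end interface; a real one might wrap Aspell or Hunspell."""

    name: str

    def supports(self, lang: str) -> bool: ...

    def check(self, word: str) -> bool: ...


class WordListProvider:
    """Toy back end that accepts words from an in-memory list for given languages."""

    def __init__(self, name: str, languages: Iterable[str], words: Iterable[str]):
        self.name = name
        self.languages = set(languages)
        self.words = {w.lower() for w in words}

    def supports(self, lang: str) -> bool:
        return lang in self.languages

    def check(self, word: str) -> bool:
        return word.lower() in self.words


class Broker:
    """Routes each request to the first provider (in preference order) for the language."""

    def __init__(self, providers: Iterable[Provider]):
        self.providers = list(providers)

    def provider_for(self, lang: str) -> Provider:
        for provider in self.providers:
            if provider.supports(lang):
                return provider
        raise LookupError(f"no spell-checking provider for {lang!r}")

    def check(self, lang: str, word: str) -> bool:
        return self.provider_for(lang).check(word)
```

The calling application only ever talks to the broker, so adding support for a new language means registering another provider rather than changing every program that checks spelling.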


PCs

The first spell checkers for personal computers appeared in 1980; one example is "WordCheck" for Commodore systems, released in late 1980 in time for advertisements to go to print in January 1981. Developers such as Maria Mariani and Random House rushed OEM packages or end-user products into the rapidly expanding software market. On pre-Windows PCs, these spell checkers were standalone programs, many of which could be run in terminate-and-stay-resident mode from within word-processing packages on PCs with sufficient memory. However, the market for standalone packages was short-lived: by the mid-1980s, developers of popular word-processing packages like WordStar and WordPerfect had incorporated spell checkers in their packages, mostly licensed from the companies above, which quickly expanded support from just English to many European and eventually even Asian languages. This required increasing sophistication in the morphology routines of the software, particularly for heavily agglutinative languages like Hungarian and Finnish. Although the size of the word-processing market in a country like Iceland might not have justified the investment of implementing a spell checker, companies like WordPerfect nonetheless strove to localize their software for as many national markets as possible as part of their global marketing strategy.

When Apple developed "a system-wide spelling checker" for Mac OS X so that "the operating system took over spelling fixes," it was a first: one "didn't have to maintain a separate spelling checker for each" program. Mac OS X's spell-check coverage includes virtually all bundled and third-party applications. Visual Tools' VT Speller, introduced in 1994, was "designed for developers of applications that support Windows." It came with a dictionary but could also build and use secondary dictionaries.


Browsers

Web browsers such as Firefox and Google Chrome offer spell-checking support using Hunspell. Before adopting Hunspell, Firefox and Chrome used MySpell and GNU Aspell, respectively.


Specialties

Some spell checkers have separate support for medical dictionaries to help prevent medical errors.


Functionality

The first spell checkers were "verifiers" rather than "correctors": they offered no suggestions for incorrectly spelled words. This was helpful for typos but less so for logical or phonetic errors. The challenge developers faced was offering useful suggestions for misspelled words, which requires reducing words to a skeletal form and applying pattern-matching algorithms.

It might seem logical that, where spell-checking dictionaries are concerned, "the bigger, the better," so that correct words are not marked as incorrect. In practice, however, an optimal size for English appears to be around 90,000 entries. Beyond that, misspelled words may go unflagged because they happen to match other, rarely used words. For example, a linguist might determine on the basis of corpus linguistics that the word baht is more frequently a misspelling of bath or bat than a reference to the Thai currency. Hence, it would typically be more useful to slightly inconvenience the few people who write about Thai currency than to overlook the spelling errors of the many more people who discuss baths.

The first MS-DOS spell checkers were mostly used in proofing mode from within word-processing packages: after preparing a document, a user scanned the text looking for misspellings. Later, batch processing was offered in packages such as Oracle's short-lived CoAuthor, which allowed a user to view the results after a document was processed and correct only the words that were known to be wrong. When memory and processing power became abundant, spell checking was performed interactively in the background, as in Sector Software's Spellbound program, released in 1987, and in Microsoft Word since Word 95.

Spell checkers became increasingly sophisticated, and some can now recognize grammatical errors. Even at their best, however, they rarely catch all the errors in a text (such as homophone errors), and they flag neologisms and foreign words as misspellings. Nonetheless, spell checkers can be considered a type of foreign-language writing aid that non-native learners can rely on to detect and correct their misspellings in the target language.
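The "skeletal form" idea and the dictionary-size trade-off in the baht example can be illustrated together. The skeleton function below is a hypothetical key (first letter kept, later vowels dropped, repeated consonants collapsed), not a published algorithm such as Soundex, and the word counts in the usage example are made up for illustration rather than drawn from a corpus.

```python
from collections import defaultdict

VOWELS = set("aeiou")


def skeleton(word):
    """Hypothetical skeletal key: keep the first letter, drop later vowels,
    and skip a consonant equal to the last one kept."""
    word = word.lower()
    if not word:
        return ""
    key = [word[0]]
    for ch in word[1:]:
        if ch in VOWELS or ch == key[-1]:
            continue
        key.append(ch)
    return "".join(key)


def build_index(frequencies):
    """Group dictionary words by skeleton so lookups need not scan the whole list."""
    index = defaultdict(list)
    for word in frequencies:
        index[skeleton(word)].append(word)
    return index


def suggest_by_skeleton(misspelling, frequencies, index):
    """Words sharing the misspelling's skeleton, most frequent first."""
    matches = index.get(skeleton(misspelling), [])
    return sorted(matches, key=lambda w: (-frequencies[w], w))


def accepted(word, frequencies, min_count=5):
    """Prune rare entries: a word below min_count is flagged instead of trusted."""
    return frequencies.get(word, 0) >= min_count
```

With hypothetical counts `{"spelling": 120, "sapling": 15, "spieling": 2, "bath": 300, "baht": 4}`, `suggest_by_skeleton("speling", ...)` ranks "spelling" first, and `accepted("baht", ...)` is `False`, so a typed "baht" is flagged rather than silently masking a misspelled "bath".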


Spell-checking for languages other than English

English is unusual in that most words used in formal writing have a single spelling that can be found in a typical dictionary, with the exception of some jargon and modified words. In many languages, words are often concatenated into new combinations of words. In German, compound nouns are frequently coined from other existing nouns. Some scripts do not clearly separate one word from another, requiring word-splitting algorithms. Each of these presents unique challenges to non-English language spell checkers.
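For compounds and for scripts without word separators, one common approach is dictionary-driven segmentation by dynamic programming. The sketch below, with a hypothetical split_compound function and toy lexicon, returns the split that uses the fewest lexicon entries, or None if no split exists. It deliberately ignores German linking elements (the -s- in Arbeitszimmer), which a production checker would have to model.

```python
def split_compound(word, lexicon):
    """Split `word` into the fewest lexicon entries that concatenate to it, or None."""
    word = word.lower()
    n = len(word)
    # best[i] holds the shortest known split of word[:i], or None if there is none.
    best = [None] * (n + 1)
    best[0] = []
    for end in range(1, n + 1):
        for start in range(end):
            piece = word[start:end]
            if best[start] is not None and piece in lexicon:
                candidate = best[start] + [piece]
                if best[end] is None or len(candidate) < len(best[end]):
                    best[end] = candidate
    return best[n]
```

For example, `split_compound("Donaudampfschiff", {"donau", "dampf", "schiff"})` returns `["donau", "dampf", "schiff"]`, while `split_compound("Arbeitszimmer", {"arbeit", "zimmer"})` returns `None` because of the linking -s-.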


Context-sensitive spell checkers

There has been research on developing algorithms that can recognize a misspelled word, even if the word itself is in the vocabulary, based on the context of the surrounding words. Not only does this allow errors such as those in the poem "Candidate for a Pullet Surprise" to be caught, but it also mitigates the detrimental effect of enlarging dictionaries, allowing more words to be recognized. For example, baht in the same paragraph as Thai or Thailand would not be recognized as a misspelling of bath. The most common errors caught by such a system are homophone errors, such as every word except "coming" and "if" in the following sentence:

Their coming too sea if its reel.

The most successful algorithm to date is Andrew Golding and Dan Roth's "Winnow-based spelling correction algorithm", published in 1999, which is able to recognize about 96% of context-sensitive spelling errors, in addition to ordinary non-word spelling errors. Context-sensitive spell checkers appeared in the now-defunct applications Microsoft Office 2007 and Google Wave. Grammar checkers attempt to fix problems with grammar beyond spelling errors, including incorrect choice of words.
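The Winnow-based approach mentioned above learns, for a set of easily confused words, which surrounding words point to each one. The following is a much-simplified sketch, not Golding and Roth's published system: the class name WinnowConfusionSet, the promotion factor alpha = 2, the tie-break rule, and the hand-picked context-word features are all assumptions made for illustration.

```python
from collections import defaultdict


class WinnowConfusionSet:
    """Chooses between confusable words (e.g. bath/baht) from nearby context words.

    Each candidate keeps its own feature weights, starting at 1.0. Training is
    mistake-driven and multiplicative: on a wrong guess, the correct word's
    weights for the active features are multiplied by alpha, and the wrongly
    guessed word's weights are divided by alpha.
    """

    def __init__(self, candidates, alpha=2.0):
        # List the more common word first: ties are resolved in its favor.
        self.candidates = list(candidates)
        self.alpha = alpha
        self.weights = {c: defaultdict(lambda: 1.0) for c in self.candidates}

    def score(self, candidate, features):
        return sum(self.weights[candidate][f] for f in features)

    def predict(self, features):
        # max() returns the first maximal item, so ties favor earlier candidates.
        return max(self.candidates, key=lambda c: self.score(c, features))

    def train(self, examples, epochs=3):
        for _ in range(epochs):
            for features, correct in examples:
                guess = self.predict(features)
                if guess == correct:
                    continue  # no update when the prediction is already right
                for f in features:
                    self.weights[correct][f] *= self.alpha  # promote
                    self.weights[guess][f] /= self.alpha    # demote


model = WinnowConfusionSet(["bath", "baht"])
model.train([
    ({"hot", "water", "soak"}, "bath"),
    ({"thai", "currency", "price"}, "baht"),
    ({"towel", "water", "tub"}, "bath"),
    ({"thailand", "exchange", "rate"}, "baht"),
])
```

After training, `model.predict({"thai", "restaurant"})` returns "baht", while `model.predict({"hot", "water"})` returns "bath". The second case is decided by the tie-break toward the first-listed word, which mirrors the earlier argument that the more common word should win when the context offers no evidence.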


See also

* Cupertino effect
* Grammar checker
* Record linkage problem
* Spelling suggestion
* Words (Unix)
* Autocorrection
* LanguageTool



External links


* "How to Write a Spelling Corrector", by Peter Norvig (Norvig.com)
* "Spellchecking by computer", by Roger Mitton (BBK.ac.uk)
* "Spell-Check Crutch Curtails Correctness", by Lloyd de Vries (CBSNews.com)
* History and text of "Candidate for a Pullet Surprise", by Mark Eckman and Jerrold H. Zar