
Statistical machine translation (SMT) is a machine translation paradigm where translations are generated on the basis of statistical models whose parameters are derived from the analysis of bilingual text corpora. The statistical approach contrasts with the rule-based approaches to machine translation as well as with example-based machine translation, and has more recently been superseded by neural machine translation in many applications.

The first ideas of statistical machine translation were introduced by Warren Weaver in 1949, including the idea of applying Claude Shannon's information theory. Statistical machine translation was re-introduced in the late 1980s and early 1990s by researchers at IBM's Thomas J. Watson Research Center and contributed to a significant resurgence of interest in machine translation. Before the introduction of neural machine translation, it was by far the most widely studied machine translation method.


Basis

The idea behind statistical machine translation comes from information theory. A document is translated according to the probability distribution p(e|f) that a string e in the target language (for example, English) is the translation of a string f in the source language (for example, French).

The problem of modeling the probability distribution p(e|f) has been approached in a number of ways. One approach which lends itself well to computer implementation is to apply Bayes' theorem, that is p(e|f) \propto p(f|e) p(e), where the translation model p(f|e) is the probability that the source string is the translation of the target string, and the language model p(e) is the probability of seeing that target language string. This decomposition is attractive as it splits the problem into two subproblems. Finding the best translation \tilde{e} is done by picking the one that gives the highest probability:

: \tilde{e} = \arg\max_{e \in e^*} p(e|f) = \arg\max_{e \in e^*} p(f|e) p(e)

For a rigorous implementation of this, one would have to perform an exhaustive search through all strings e^* in the target language. Performing the search efficiently is the work of a machine translation decoder, which uses the foreign string, heuristics and other methods to limit the search space while keeping acceptable quality. This trade-off between quality and time usage can also be found in speech recognition.

Because translation systems cannot store all target-language strings and their translations, a document is typically translated sentence by sentence, but even this is not enough. Language models are typically approximated by smoothed ''n''-gram models, and similar approaches have been applied to translation models, but there is additional complexity due to different sentence lengths and word orders in the languages.

The statistical translation models were initially word based (IBM Models 1-5, the hidden Markov model from Stephan Vogel, and Model 6 from Franz Josef Och), but significant advances were made with the introduction of phrase based models. Later work incorporated syntax or quasi-syntactic structures (D. Chiang (2005), "A Hierarchical Phrase-Based Model for Statistical Machine Translation", in ''Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL'05)'').
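
As an illustration of the decomposition into a translation model and a language model, the following minimal Python sketch scores a handful of candidate translations by the product of the two factors and keeps the best one. The probability tables, the candidate list and the function names are invented for this example; a real decoder searches a vastly larger space with learned models.

```python
import math

# Hypothetical toy probabilities, invented for illustration only.
# TRANSLATION_MODEL[(f, e)] stands for p(f | e); LANGUAGE_MODEL[e] stands for p(e).
TRANSLATION_MODEL = {
    ("la maison est petite", "the house is small"): 0.4,
    ("la maison est petite", "small the is house"): 0.4,
    ("la maison est petite", "the home is little"): 0.2,
}
LANGUAGE_MODEL = {
    "the house is small": 0.05,
    "small the is house": 0.0001,
    "the home is little": 0.01,
}


def noisy_channel_score(f, e):
    """Return log p(f | e) + log p(e), or -inf if either factor is unknown."""
    tm = TRANSLATION_MODEL.get((f, e), 0.0)
    lm = LANGUAGE_MODEL.get(e, 0.0)
    if tm == 0.0 or lm == 0.0:
        return float("-inf")
    return math.log(tm) + math.log(lm)


def decode(f, candidates):
    """Pick the candidate e that maximises p(f | e) * p(e)."""
    return max(candidates, key=lambda e: noisy_channel_score(f, e))


candidates = list(LANGUAGE_MODEL)
best = decode("la maison est petite", candidates)
print(best)
```

In this toy setting the translation model cannot distinguish the fluent candidate from the scrambled one, so the language model decides between them. Working in log space avoids numerical underflow when many small probabilities are multiplied.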


Benefits

The most frequently cited benefits of statistical machine translation over rule-based approaches are:
* More efficient use of human and data resources
** There are many parallel corpora in machine-readable format and even more monolingual data.
** Generally, SMT systems are not tailored to any specific pair of languages.
** Rule-based translation systems require the manual development of linguistic rules, which can be costly and which often do not generalize to other languages.
* More fluent translations owing to the use of a language model


Shortcomings

* Corpus creation can be costly.
* Specific errors are hard to predict and fix.
* Results may have superficial fluency that masks translation problems.
* Statistical machine translation usually works less well for language pairs with significantly different word order.
* The benefits obtained for translation between Western European languages are not representative of results for other language pairs, owing to smaller training corpora and greater grammatical differences.


Word-based translation

In word-based translation, the fundamental unit of translation is a word in some natural language. Typically, the number of words in translated sentences differs because of compound words, morphology and idioms. The ratio of the lengths of sequences of translated words is called fertility, which tells how many foreign words each native word produces. Information theory necessarily assumes that each covers the same concept; in practice this is not really true. For example, the English word ''corner'' can be translated into Spanish as either ''rincón'' or ''esquina'', depending on whether it means the internal or the external angle.

Simple word-based translation cannot translate between languages with different fertility. Word-based translation systems can relatively simply be made to cope with high fertility, so that they can map a single word to multiple words, but not the other way around. For example, when translating from English to French, each English word could produce any number of French words, sometimes none at all, but there is no way to group two English words so that they produce a single French word.

An example of a word-based translation system is the freely available GIZA++ package (licensed under the GPL), which includes the training programs for the IBM models, the HMM model and Model 6. Word-based translation is not widely used today; phrase-based systems are more common. Most phrase-based systems still use GIZA++ to align the corpus. The alignments are used to extract phrases or deduce syntax rules, and matching words in bi-text remains a problem actively discussed in the community. Because of the predominance of GIZA++, there are now several distributed implementations of it online.


Phrase-based translation

In phrase-based translation, the aim is to reduce the restrictions of word-based translation by translating whole sequences of words, where the lengths may differ. The sequences of words are called blocks or phrases, but they are typically not linguistic phrases; rather, they are phrasemes found using statistical methods from corpora. It has been shown that restricting the phrases to linguistic phrases (syntactically motivated groups of words, see syntactic categories) decreases the quality of translation.

The chosen phrases are further mapped one-to-one based on a phrase translation table, and may be reordered. This table can be learnt based on word alignment, or directly from a parallel corpus. In the latter case, the model is trained using the expectation-maximization algorithm, similarly to the word-based IBM models.
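
Learning a phrase table from word alignment can be sketched as follows. This is a simplified version of the usual consistency criterion: a source span and a target span form a phrase pair if no alignment link leaves the pair. The function name, the `max_len` parameter and the example alignment are assumptions made for this illustration, and the sketch does not extend target spans over unaligned boundary words as full extraction heuristics do.

```python
def extract_phrase_pairs(src, tgt, alignment, max_len=4):
    """Collect phrase pairs that are consistent with a word alignment.

    src, tgt: lists of tokens.
    alignment: set of (src_index, tgt_index) links.
    """
    pairs = set()
    for s_start in range(len(src)):
        for s_end in range(s_start, min(s_start + max_len, len(src))):
            # Target positions linked to any word inside the source span.
            linked = [t for (s, t) in alignment if s_start <= s <= s_end]
            if not linked:
                continue
            t_start, t_end = min(linked), max(linked)
            if t_end - t_start >= max_len:
                continue
            # Consistency: no word inside the target span may be linked
            # to a source word outside the source span.
            consistent = all(
                s_start <= s <= s_end
                for (s, t) in alignment
                if t_start <= t <= t_end
            )
            if consistent:
                pairs.add((" ".join(src[s_start:s_end + 1]),
                           " ".join(tgt[t_start:t_end + 1])))
    return pairs


src = "John does not live here".split()
tgt = "John wohnt hier nicht".split()
# Hand-written alignment for illustration; "does" (index 1) is left unaligned.
alignment = {(0, 0), (2, 3), (3, 1), (4, 2)}

pairs = extract_phrase_pairs(src, tgt, alignment, max_len=5)
for pair in sorted(pairs):
    print(pair)
```

Among the extracted pairs is ("does not", "nicht"), a many-to-one mapping that a purely word-based model cannot express. Relative frequencies of such pairs over a whole corpus would then give phrase translation probabilities.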


Syntax-based translation

Syntax-based translation is based on the idea of translating syntactic units, i.e. (partial) parse trees of sentences or utterances, rather than single words or strings of words (as in phrase-based MT). The idea of syntax-based translation is quite old in MT, though its statistical counterpart did not take off until the advent of strong stochastic parsers in the 1990s. Examples of this approach include DOP-based MT and, more recently, synchronous context-free grammars.


Hierarchical phrase-based translation

Hierarchical phrase-based translation combines the strengths of phrase-based and syntax-based translation. It uses synchronous context-free grammar rules, but the grammars may be constructed by an extension of methods for phrase-based translation without reference to linguistically motivated syntactic constituents. This idea was first introduced in Chiang's Hiero system (2005).


Language models

A language model is an essential component of any statistical machine translation system, which aids in making the translation as fluent as possible. It is a function that takes a translated sentence and returns the probability of it being said by a native speaker. A good language model will, for example, assign a higher probability to the sentence "the house is small" than to "small the is house". Besides word order, language models may also help with word choice: if a foreign word has multiple possible translations, the language model may give higher probabilities to certain translations in specific contexts in the target language.
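
The behaviour described above can be reproduced with a very small bigram model. The sketch below is an assumption-laden toy: the four training sentences are invented, and add-one smoothing stands in for the more refined smoothing methods used in practice. It only aims to show that a fluent word order receives a higher score than a scrambled one.

```python
import math
from collections import Counter


def train_bigram_lm(sentences):
    """Count bigrams and their histories over whitespace-tokenised sentences."""
    histories, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        histories.update(tokens[:-1])
        bigrams.update(zip(tokens[:-1], tokens[1:]))
    vocab = {w for s in sentences for w in s.split()} | {"</s>"}
    return histories, bigrams, len(vocab)


def log_prob(sentence, model):
    """Log probability of a sentence under the add-one smoothed bigram model."""
    histories, bigrams, vocab_size = model
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    total = 0.0
    for prev, word in zip(tokens[:-1], tokens[1:]):
        # Add-one smoothing keeps unseen bigrams from getting zero probability.
        total += math.log((bigrams[(prev, word)] + 1) / (histories[prev] + vocab_size))
    return total


# Toy training corpus, invented for illustration.
corpus = [
    "the house is small",
    "the house is big",
    "the car is small",
    "a small house is nice",
]
model = train_bigram_lm(corpus)

fluent = log_prob("the house is small", model)
scrambled = log_prob("small the is house", model)
print(fluent > scrambled)
```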


Challenges with statistical machine translation

Problems that statistical machine translation has to deal with include:


Sentence alignment

In parallel corpora, single sentences in one language can be found translated into several sentences in the other and vice versa. Long sentences may be broken up, and short sentences may be merged. There are even some languages that use writing systems without a clear indication of a sentence end (for example, Thai). Sentence alignment can be performed with the Gale-Church alignment algorithm. With this and other mathematical models, efficient search and retrieval of the highest-scoring sentence alignment is possible.
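
The search for the highest-scoring sentence alignment is typically a dynamic program. The sketch below is a simplified length-based aligner in the spirit of Gale-Church, not the algorithm itself: it uses the absolute difference in character length plus a hand-picked merge penalty as the cost, where Gale-Church uses a probabilistic model of length ratios, and it only allows 1-1, 1-2 and 2-1 groupings. All names and values are assumptions for illustration.

```python
def align_sentences(src, tgt):
    """Align two sentence lists using 1-1, 1-2 and 2-1 groupings.

    Returns a list of ((src_start, src_end), (tgt_start, tgt_end)) spans
    with exclusive end indices.
    """
    beads = [(1, 1, 0), (1, 2, 2), (2, 1, 2)]  # (n_src, n_tgt, assumed penalty)
    n, m = len(src), len(tgt)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            for di, dj, penalty in beads:
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                len_src = sum(len(s) for s in src[i:ni])
                len_tgt = sum(len(t) for t in tgt[j:nj])
                c = cost[i][j] + abs(len_src - len_tgt) + penalty
                if c < cost[ni][nj]:
                    cost[ni][nj] = c
                    back[ni][nj] = (i, j)
    if cost[n][m] == inf:
        raise ValueError("no alignment exists with the allowed groupings")
    # Follow back-pointers from the end to recover the best path.
    spans, i, j = [], n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        spans.append(((pi, i), (pj, j)))
        i, j = pi, pj
    return spans[::-1]


english = ["I went home.", "It was late and I was very tired."]
french = ["Je suis rentré.", "Il était tard.", "J'étais très fatigué."]
spans = align_sentences(english, french)
print(spans)
```

Here the long second English sentence is paired with the two short French sentences, which is the case of one sentence being broken up in translation.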


Word alignment

Sentence alignment is usually either provided by the corpus or obtained by the aforementioned Gale-Church alignment algorithm. To learn, for example, the translation model, however, we need to know which words align in a source-target sentence pair. Common solutions are the IBM models or the HMM approach. One of the problems presented is function words that have no clear equivalent in the target language. For example, when translating the sentence "John does not live here" from English to German, the word "does" has no clear alignment in the translated sentence "John wohnt hier nicht." Through logical reasoning, it may be aligned with the word "wohnt" (as in English it carries grammatical information for the word "live") or with "nicht" (as it only appears in the sentence because the sentence is negated), or it may be left unaligned.
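
To give a feel for how word-level translation probabilities can be learned without word-aligned data, here is a minimal sketch of expectation-maximization training in the style of IBM Model 1. The three-sentence corpus, the function name and the number of iterations are assumptions for illustration; tools such as GIZA++ implement far more elaborate models.

```python
from collections import defaultdict


def train_ibm_model1(pairs, iterations=10):
    """Estimate t[(f, e)], an approximation of p(f | e), by EM.

    pairs: list of (source_tokens, target_tokens) sentence pairs.
    A NULL target token lets source words remain effectively unaligned.
    """
    src_vocab = {f for fs, _ in pairs for f in fs}
    uniform = 1.0 / len(src_vocab)
    t = defaultdict(lambda: uniform)  # uniform initialisation
    for _ in range(iterations):
        count = defaultdict(float)
        total = defaultdict(float)
        # E-step: distribute each source word over the target words it could align to.
        for fs, es in pairs:
            es_null = ["NULL"] + es
            for f in fs:
                z = sum(t[(f, e)] for e in es_null)
                for e in es_null:
                    c = t[(f, e)] / z
                    count[(f, e)] += c
                    total[e] += c
        # M-step: renormalise the expected counts per target word.
        for (f, e), c in count.items():
            t[(f, e)] = c / total[e]
    return t


# Toy German-English corpus, invented for illustration.
corpus = [
    ("das Haus".split(), "the house".split()),
    ("das Buch".split(), "the book".split()),
    ("ein Buch".split(), "a book".split()),
]
t = train_ibm_model1(corpus)
print(t[("das", "the")] > t[("Haus", "the")])
print(t[("Buch", "book")] > t[("das", "book")])
```

Although no sentence says which word translates which, co-occurrence across sentence pairs is enough for the estimates to favour "das" for "the" and "Buch" for "book".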


Statistical anomalies

Real-world training sets may override translations of, say, proper nouns. An example would be that "I took the train to Berlin" gets mis-translated as "I took the train to Paris" due to an abundance of "train to Paris" in the training set.


Idioms

Depending on the corpora used, idioms may not translate "idiomatically". For example, using Canadian Hansard as the bilingual corpus, "hear" may almost invariably be translated to "Bravo!" since in Parliament "Hear, Hear!" becomes "Bravo!". This problem is connected with word alignment: in very specific contexts, the words of an idiomatic expression may align with words that form an idiomatic expression of the same meaning in the target language. However, this is unlikely, as such an alignment usually does not work in any other context. For that reason, idioms should only be subjected to phrasal alignment, as they cannot be decomposed further without losing their meaning. This problem is therefore specific to word-based translation.


Different word orders

Word order differs between languages. Some classification can be done by naming the typical order of subject (S), verb (V) and object (O) in a sentence, and one can talk, for instance, of SVO or VSO languages. There are also additional differences in word order, for instance, where modifiers for nouns are located, or where the same words are used as a question or a statement.

In speech recognition, the speech signal and the corresponding textual representation can be mapped to each other in blocks in order. This is not always the case with the same text in two languages. In SMT, the translation system can only manage short sequences of words, so word order has to be handled explicitly by the system designer. Attempts at solutions have included re-ordering models, where a distribution of location changes for each item of translation is estimated from aligned bi-text. Different location changes can be ranked with the help of the language model and the best can be selected.

The Skype voice communication service has started testing speech translation. However, machine translation is following technological trends in speech at a slower rate than speech recognition. In fact, some ideas from speech recognition research have been adopted by statistical machine translation.


Out of vocabulary (OOV) words

SMT systems typically store different word forms as separate symbols without any relation to each other, so word forms or phrases that were not in the training data cannot be translated. This may be due to a lack of training data, changes in the domain where the system is used, or differences in morphology.
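
The effect of treating word forms as unrelated symbols can be seen in a deliberately naive sketch. The lookup table and function below are hypothetical and far simpler than a real SMT system; they only show that a table containing "house" says nothing about "houses", and that a common fallback is to copy the unknown word through unchanged.

```python
# Hypothetical word table, invented for illustration.
WORD_TABLE = {"the": "das", "house": "Haus", "is": "ist", "small": "klein"}


def translate_word_by_word(sentence, table):
    """Translate token by token and report out-of-vocabulary words."""
    output, oov = [], []
    for word in sentence.split():
        if word in table:
            output.append(table[word])
        else:
            output.append(word)  # copy the unknown word through unchanged
            oov.append(word)
    return " ".join(output), oov


translation, unknown = translate_word_by_word("the houses", WORD_TABLE)
print(translation, unknown)
```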


Mobile devices

The rapid increase in the computing power of tablets and smartphones, combined with the wide availability of high-speed mobile Internet access, makes it possible for them to run machine translation systems. Experimental systems have already been developed to assist foreign health workers in developing countries. Similar systems are already available on the market. For example, Apple's iOS 8 allows users to dictate text messages. A built-in automatic speech recognition (ASR) system recognizes the speech and the recognition results are edited by an online system.

Projects such as Universal Speech Translation Advanced Research (U-STAR, a continuation of the A-STAR project) and EU-BRIDGE are currently conducting research in translation of full sentences recognized from spoken language. Recent years have seen a growing interest in combining speech recognition, machine translation and speech synthesis. To achieve speech-to-speech translation, n-best lists are passed from the ASR to the statistical machine translation system. However, combining those systems raises problems of how to achieve the sentence segmentation, de-normalization and punctuation prediction needed for quality translations.


Systems implementing statistical machine translation

* Google Translate (started transition to neural machine translation in 2016)
* Microsoft Translator (started transition to neural machine translation in 2016)
* SYSTRAN (started transition to neural machine translation in 2016)
* Yandex.Translate (switched to hybrid approach incorporating neural machine translation in 2017)


See also

* AppTek
* Cache language model
* Duolingo
* Europarl corpus
* Example-based machine translation
* Google Translate
* Hybrid machine translation
* Microsoft Translator
* Moses (machine translation), free software
* Rule-based machine translation
* SDL Language Weaver
* Statistical parsing


External links


* Statistical Machine Translation: includes introduction to research, conference, corpus and software listings
* Moses: a state-of-the-art open source SMT system
* Web-based translation: a statistical machine translation tool
* Includes links to freely available statistical machine translation software
* Garuda DIKTI: an open national journal