Hybrid machine translation is a method of

machine translation Machine translation, sometimes referred to by the abbreviation MT (not to be confused with computer-aided translation, machine-aided human translation or interactive translation), is a sub-field of computational linguistics that investigates t ...

that is characterized by the use of multiple machine translation approaches within a single machine translation system. The motivation for developing hybrid machine translation systems stems from the failure of any single technique to achieve a satisfactory level of accuracy. Many hybrid machine translation systems have been successful in improving the accuracy of the translations, and there are several popular machine translation systems which employ hybrid methods.

Approaches

Multi-engine

This approach to hybrid machine translation involves running multiple machine translation systems in parallel. The final output is generated by combining the output of all the sub-systems. Most commonly, these systems use statistical and rule-based translation subsystems,Hutchins, J. 2007
Machine translation: A concise history
. Computer-aided translation: Theory and practice. but other combinations have been explored. For example, researchers at

Carnegie Mellon University Carnegie Mellon University (CMU) is a private research university in Pittsburgh, Pennsylvania. One of its predecessors was established in 1900 by Andrew Carnegie as the Carnegie Technical Schools; it became the Carnegie Institute of Technology ...

have had some success combining example-based, transfer-based, knowledge-based and

statistical Statistics (from German: '' Statistik'', "description of a state, a country") is the discipline that concerns the collection, organization, analysis, interpretation, and presentation of data. In applying statistics to a scientific, industr ...

translation sub-systems into one machine translation system.

Statistical rule generation

This approach involves using statistical data to generate

lexical Lexical may refer to: Linguistics * Lexical corpus or lexis, a complete set of all words in a language * Lexical item, a basic unit of lexicographical classification * Lexicon, the vocabulary of a person, language, or branch of knowledge * Lexica ...

and

syntactic In linguistics, syntax () is the study of how words and morphemes combine to form larger units such as phrases and sentences. Central concerns of syntax include word order, grammatical relations, hierarchical sentence structure ( constituenc ...

rules. The input is then processed with these rules as if it were a rule-based translator. This approach attempts to avoid the difficult and time-consuming task of creating a set of comprehensive, fine-grained linguistic rules by extracting those rules from the training corpus. This approach still suffers from many problems of normal

statistical machine translation Statistical machine translation (SMT) is a machine translation paradigm where translations are generated on the basis of statistical models whose parameters are derived from the analysis of bilingual text corpora. The statistical approach contrast ...

, namely that the accuracy of the translation will depend heavily on the similarity of the input text to the text of the training corpus. As a result, this technique has had the most success in domain-specific applications, and has the same difficulties with domain adaptation as many

systems.

Multi-Pass

This approach involves serially processing the input multiple times. The most common technique used in multi-pass machine translation systems is to pre-process the input with a rule-based machine translation system. The output of the rule-based pre-processor is passed to a

system, which produces the final output. This technique is used to limit the amount of information a statistical system need consider, significantly reducing the processing power required. It also removes the need for the rule-based system to be a complete translation system for the language, significantly reducing the amount of human effort and labor necessary to build the system.Hovy, E. 1996
Deepening wisdom or compromised principles?-the hybridization of statistical and symbolic MT systems
IEEE Expert, 11 (2), pp. 16--18.

Confidence-Based

This approach differs from the other hybrid approaches in that in most cases only one translation technology is used. A confidence metric is produced for each translated sentence from which a decision can be made whether to try a secondary translation technology or to proceed with the initial translation output. SMT is also used when common error patterns such as multiple repeat words appear in sequence, as is common with NMT when the attention mechanism is confused.

References

* {{Approaches to machine translation Machine translation Machine translation, example-based