Hybrid Machine Translation
   HOME

TheInfoList



OR:

Hybrid machine translation is a method of machine translation that is characterized by the use of multiple machine translation approaches within a single machine translation system. The motivation for developing hybrid machine translation systems stems from the failure of any single technique to achieve a satisfactory level of accuracy. Many hybrid machine translation systems have been successful in improving the accuracy of the translations, and there are several popular machine translation systems which employ hybrid methods.


Approaches


Multi-engine

This approach to hybrid machine translation involves running multiple machine translation systems in parallel. The final output is generated by combining the output of all the sub-systems. Most commonly, these systems use statistical and rule-based translation subsystems,Hutchins, J. 2007
Machine translation: A concise history
. Computer-aided translation: Theory and practice.
but other combinations have been explored. For example, researchers at
Carnegie Mellon University Carnegie Mellon University (CMU) is a private research university in Pittsburgh, Pennsylvania. One of its predecessors was established in 1900 by Andrew Carnegie as the Carnegie Technical Schools; it became the Carnegie Institute of Technology ...
have had some success combining example-based, transfer-based, knowledge-based and
statistical Statistics (from German: ''Statistik'', "description of a state, a country") is the discipline that concerns the collection, organization, analysis, interpretation, and presentation of data. In applying statistics to a scientific, industria ...
translation sub-systems into one machine translation system.


Statistical rule generation

This approach involves using statistical data to generate lexical and
syntactic In linguistics, syntax () is the study of how words and morphemes combine to form larger units such as phrases and sentences. Central concerns of syntax include word order, grammatical relations, hierarchical sentence structure (constituency), ...
rules. The input is then processed with these rules as if it were a rule-based translator. This approach attempts to avoid the difficult and time-consuming task of creating a set of comprehensive, fine-grained linguistic rules by extracting those rules from the training corpus. This approach still suffers from many problems of normal statistical machine translation, namely that the accuracy of the translation will depend heavily on the similarity of the input text to the text of the training corpus. As a result, this technique has had the most success in domain-specific applications, and has the same difficulties with domain adaptation as many statistical machine translation systems.


Multi-Pass

This approach involves serially processing the input multiple times. The most common technique used in multi-pass machine translation systems is to pre-process the input with a rule-based machine translation system. The output of the rule-based pre-processor is passed to a statistical machine translation system, which produces the final output. This technique is used to limit the amount of information a statistical system need consider, significantly reducing the processing power required. It also removes the need for the rule-based system to be a complete translation system for the language, significantly reducing the amount of human effort and labor necessary to build the system.Hovy, E. 1996
Deepening wisdom or compromised principles?-the hybridization of statistical and symbolic MT systems
IEEE Expert, 11 (2), pp. 16--18.


Confidence-Based

This approach differs from the other hybrid approaches in that in most cases only one translation technology is used. A confidence metric is produced for each translated sentence from which a decision can be made whether to try a secondary translation technology or to proceed with the initial translation output. SMT is also used when common error patterns such as multiple repeat words appear in sequence, as is common with NMT when the attention mechanism is confused.


See also

* Machine translation * Neural machine translation *
Natural language processing Natural language processing (NLP) is an interdisciplinary subfield of linguistics, computer science, and artificial intelligence concerned with the interactions between computers and human language, in particular how to program computers to pro ...


References

* {{Approaches to machine translation Machine translation Machine translation, example-based