IBM alignment models are a sequence of increasingly complex models used in
statistical machine translation
to train a translation model and an alignment model, starting with lexical translation probabilities and moving to reordering and word duplication. They underpinned the majority of statistical machine translation systems for almost twenty years starting in the early 1990s, until
neural machine translation began to dominate. These models offer a principled probabilistic formulation and (mostly) tractable inference.
The original work on statistical machine translation at
IBM proposed five models, and a Model 6 was proposed later. The sequence of the six models can be summarized as:
* Model 1: lexical translation
* Model 2: additional absolute alignment model
* Model 3: extra fertility model
* Model 4: added relative alignment model
* Model 5: fixed the deficiency problem
* Model 6: Model 4 combined with an HMM alignment model in a log-linear way
Mathematical setup
The IBM alignment models treat translation as a conditional probability model. For each source-language ("foreign") sentence f, we generate both a target-language ("English") sentence e and an alignment a. The problem then is to find a good statistical model for p(e, a | f), the probability that we would generate the English sentence e and the alignment a given a foreign sentence f.
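To translate, one ultimately wants p(e | f), which is obtained by marginalizing the joint model over all alignments. The sketch below makes that sum explicit by brute-force enumeration; `p_joint` is a hypothetical callable standing in for whichever model p(e, a | f) is used:

```python
from itertools import product

def p_e_given_f(e, f, p_joint):
    """Translation probability p(e | f), computed by summing the joint
    model p(e, a | f) over every possible alignment a.  Each English
    position may align to foreign position 0..len(f) (0 = unaligned),
    so there are (len(f) + 1) ** len(e) candidate alignments."""
    total = 0.0
    for a in product(range(len(f) + 1), repeat=len(e)):
        total += p_joint(e, a, f)
    return total
```

Brute-force enumeration is exponential in the sentence length; it is shown here only to pin down what the marginalization means (Model 1 and 2 admit a much cheaper factorized sum).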
The meaning of an alignment grows increasingly complicated as the model version number increases. See Model 1 for the simplest version.
Model 1
Word alignment
Given any foreign-English sentence pair (f, e) with lengths l_f and l_e, an alignment for the sentence pair is a function of type {1, ..., l_e} → {1, ..., l_f}. That is, we assume that the English word at location j is "explained" by the foreign word at location a(j). For example, consider the following pair of sentences
It will surely rain tomorrow -- 明日 は きっと 雨 だ
We can align some English words to corresponding Japanese words, but not all of them:
it -> ?
will -> ?
surely -> きっと
rain -> 雨
tomorrow -> 明日
This happens in general due to the different grammar and conventions of speech in different languages. English sentences require a subject, and when there is no subject available, English uses a
dummy pronoun
''it''. Japanese verbs do not have different forms for future and present tense, and the future tense is implied by the noun 明日 (tomorrow). Conversely, the
topic-marker は and the grammar word だ (roughly "to be") do not correspond to any word in the English sentence.
So, we can write the alignment as
1 -> 0; 2 -> 0; 3 -> 3; 4 -> 4; 5 -> 1
where 0 means that there is no corresponding foreign word.
Thus, we see that the alignment function is in general a function of type {1, ..., l_e} → {0, 1, ..., l_f}.
Later models will allow one English word to be aligned with multiple foreign words.
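The example alignment above can be written out directly in code. This is just one possible encoding (a plain dictionary from English positions to foreign positions), not a data structure prescribed by the models themselves:

```python
# alignment[j] gives, for the English word at 1-based position j, the
# 1-based position of the foreign word that "explains" it; 0 means
# there is no corresponding foreign word.
english = ["it", "will", "surely", "rain", "tomorrow"]
foreign = ["明日", "は", "きっと", "雨", "だ"]
alignment = {1: 0, 2: 0, 3: 3, 4: 4, 5: 1}

pairs = []
for j, word in enumerate(english, start=1):
    src = foreign[alignment[j] - 1] if alignment[j] > 0 else None
    pairs.append((word, src))
# pairs recovers the word-level correspondences listed above:
# it/None, will/None, surely/きっと, rain/雨, tomorrow/明日
```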
Statistical model
Given the above definition of alignment, we can define the statistical model used by Model 1:
* Start with a "dictionary". Its entries are of the form t(e | f), which can be interpreted as saying "the foreign word f is translated to the English word e with probability t(e | f)".
* After being given a foreign sentence f with length l_f, we first generate an English sentence length l_e uniformly in a range
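Although the generative story is only partially stated here, Model 1's well-known closed form for the joint probability is p(e, a | f) = ε / (l_f + 1)^{l_e} · ∏_j t(e_j | f_{a(j)}), where position 0 holds a special NULL word and ε stands in for the sentence-length probability. A minimal sketch, in which the translation table t and the constant ε are hypothetical stand-ins:

```python
def model1_joint(e, f, a, t, epsilon=1.0):
    """p(e, a | f) under IBM Model 1.

    e: list of English words; f: list of foreign words.
    a: alignment, where a[j] in 0..len(f) picks the foreign word
       (0 = the special NULL word) that generates e[j].
    t: lexical translation table, t[(english, foreign)] -> probability.
    epsilon: stand-in constant for the length probability p(l_e | l_f).
    """
    f_null = [None] + list(f)                 # position 0 is NULL
    prob = epsilon / (len(f) + 1) ** len(e)   # uniform over alignments
    for j, e_word in enumerate(e):
        prob *= t[(e_word, f_null[a[j]])]     # lexical translation term
    return prob
```

Note that the alignment probability is uniform, 1 / (l_f + 1) per English position, which is exactly what makes Model 1 tractable: the sum over alignments factorizes per position.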