In modular arithmetic computation, Montgomery modular multiplication, more commonly referred to as Montgomery multiplication, is a method for performing fast modular multiplication. It was introduced in 1985 by the American mathematician Peter L. Montgomery. (For a colloquial explanation, see Martin Kochanski, "Montgomery Multiplication".)

Montgomery modular multiplication relies on a special representation of numbers called Montgomery form. The algorithm uses the Montgomery forms of a and b to efficiently compute the Montgomery form of ab mod N. The efficiency comes from avoiding expensive division operations. Classical modular multiplication reduces the double-width product ab using division by N and keeping only the remainder. This division requires quotient digit estimation and correction. The Montgomery form, in contrast, depends on a constant R > N which is coprime to N, and the only division necessary in Montgomery multiplication is division by R. The constant R can be chosen so that division by R is easy, significantly improving the speed of the algorithm. In practice, R is always a power of two, since division by powers of two can be implemented by bit shifting.

The need to convert a and b into Montgomery form and their product out of Montgomery form means that computing a single product by Montgomery multiplication is slower than the conventional or Barrett reduction algorithms. However, when performing many multiplications in a row, as in modular exponentiation, intermediate results can be left in Montgomery form. Then the initial and final conversions become a negligible fraction of the overall computation. Many important cryptosystems such as RSA and Diffie–Hellman key exchange are based on arithmetic operations modulo a large odd number, and for these cryptosystems, computations using Montgomery multiplication with R a power of two are faster than the available alternatives.

Modular arithmetic

Let N denote a positive integer modulus. The quotient ring Z/NZ consists of residue classes modulo N, that is, its elements are sets of the form

\{a + kN : k \in \mathbf{Z}\},

where a ranges across the integers. Each residue class is a set of integers such that the difference of any two integers in the set is divisible by N (and the residue class is maximal with respect to that property; integers aren't left out of the residue class unless they would violate the divisibility condition). The residue class corresponding to a is denoted ā. Equality of residue classes is called congruence and is denoted

\bar a \equiv \bar b \pmod{N}.

Storing an entire residue class on a computer is impossible because the residue class has infinitely many elements. Instead, residue classes are stored as representatives. Conventionally, these representatives are the integers a for which 0 ≤ a ≤ N − 1. If a is an integer, then the representative of ā is written a mod N. When writing congruences, it is common to identify an integer with the residue class it represents. With this convention, the above equality is written a ≡ b mod N.

Arithmetic on residue classes is done by first performing integer arithmetic on their representatives. The output of the integer operation determines a residue class, and the output of the modular operation is determined by computing the residue class's representative. For example, if N = 17, then the sum of the residue classes of 7 and 15 is computed by finding the integer sum 7 + 15 = 22, then determining 22 mod 17, the integer between 0 and 16 whose difference with 22 is a multiple of 17. In this case, that integer is 5, so 7 + 15 ≡ 5 (mod 17).
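The worked sum can be checked with a few lines of Python. This is only a minimal sketch: the helper name `representative` is made up for illustration, and it relies on Python's `%` operator returning a value in the range 0 to n − 1 for a positive modulus.

```python
def representative(a: int, n: int) -> int:
    """Return the representative of the residue class of a in [0, n - 1]."""
    return a % n  # for n > 0, Python's % always lands in [0, n - 1]

N = 17
total = 7 + 15                       # integer sum of the two representatives
result = representative(total, N)    # reduce back to a representative
print(total, result)                 # 22 5
assert (total - result) % N == 0     # 22 and 5 differ by a multiple of 17
```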

Montgomery form

If a and b are integers in the range [0, N − 1], then their sum is in the range [0, 2N − 2] and their difference is in the range [−N + 1, N − 1], so determining the representative in [0, N − 1] requires at most one subtraction or addition (respectively) of N. However, the product ab is in the range [0, N² − 2N + 1]. Storing the intermediate integer product ab requires twice as many bits as either a or b, and efficiently determining the representative in [0, N − 1] requires division. Mathematically, the integer between 0 and N − 1 that is congruent to ab can be expressed by applying the Euclidean division theorem:

ab = qN + r,

where q is the quotient ⌊ab / N⌋ and r, the remainder, is in the interval [0, N − 1]. The remainder r is ab mod N. Determining r can be done by computing q, then subtracting qN from ab. For example, with N = 17, the product of the residue classes of 7 and 15 is determined by computing 7 ⋅ 15 = 105, dividing to get ⌊105 / 17⌋ = 6, and subtracting to get 105 − 6 ⋅ 17 = 105 − 102 = 3.

Because the computation of q requires division, it is undesirably expensive on most computer hardware. Montgomery form is a different way of expressing the elements of the ring in which modular products can be computed without expensive divisions. While divisions are still necessary, they can be done with respect to a different divisor R. This divisor can be chosen to be a power of two, for which division can be replaced by shifting, or a whole number of machine words, for which division can be replaced by omitting words. These divisions are fast, so most of the cost of computing modular products using Montgomery form is the cost of computing ordinary products.

The auxiliary modulus R must be a positive integer such that gcd(R, N) = 1. For computational purposes it is also necessary that division and reduction modulo R are inexpensive, and the modulus is not useful for modular multiplication unless R > N. The Montgomery form of the residue class ā with respect to R is aR mod N, that is, it is the representative of the residue class of aR. For example, suppose that N = 17 and that R = 100. The Montgomery forms of 3, 5, 7, and 15 are 300 mod 17 = 11, 500 mod 17 = 7, 700 mod 17 = 3, and 1500 mod 17 = 4.

Addition and subtraction in Montgomery form are the same as ordinary modular addition and subtraction because of the distributive law:

aR + bR = (a + b)R,

aR - bR = (a - b)R.

This is a consequence of the fact that, because gcd(R, N) = 1, multiplication by R is an isomorphism on the additive group Z/NZ. For example, (7 + 15) mod 17 = 5, which in Montgomery form becomes (3 + 4) mod 17 = 7.

Multiplication in Montgomery form, however, is seemingly more complicated. The usual product of aR and bR does not represent the product of a and b because it has an extra factor of R:

(aR \bmod N)(bR \bmod N) \bmod N = (abR)R \bmod N.
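The running example with N = 17 and R = 100 can be reproduced directly. The sketch below uses a hypothetical helper `to_montgomery` and checks two things stated above: addition carries over to Montgomery form unchanged, while the plain product of two Montgomery forms picks up an extra factor of R.

```python
N, R = 17, 100  # running example; R is coprime to N

def to_montgomery(a: int) -> int:
    """Montgomery form of a: the representative of a*R modulo N."""
    return (a * R) % N

forms = {a: to_montgomery(a) for a in (3, 5, 7, 15)}
print(forms)  # {3: 11, 5: 7, 7: 3, 15: 4}

# Addition commutes with the conversion: 3 + 4 = 7 is the form of (7 + 15) mod 17 = 5.
assert (forms[7] + forms[15]) % N == to_montgomery((7 + 15) % N)

# Multiplication does not: the plain product 12 is (7 * 15 * R) * R mod N,
# not the Montgomery form 11 of 7 * 15 mod 17 = 3.
plain_product = (forms[7] * forms[15]) % N
assert plain_product == (7 * 15 * R * R) % N
assert plain_product != to_montgomery((7 * 15) % N)
```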

Computing products in Montgomery form requires removing the extra factor of R. While division by R is cheap, the intermediate product (aR mod N)(bR mod N) is not divisible by R because the modulo operation has destroyed that property. So for instance, the product of the Montgomery forms of 7 and 15 modulo 17 is the product of 3 and 4, which is 12. Since 12 is not divisible by 100, additional effort is required to remove the extra factor of R.

Removing the extra factor of R can be done by multiplying by an integer R′ such that RR′ ≡ 1 (mod N), that is, by an R′ whose residue class is the modular inverse of R mod N. Then, working modulo N,

(aR \bmod N)(bR \bmod N)R' \equiv (aR)(bR)R^{-1} \equiv (ab)R \pmod{N}.

The integer R′ exists because of the assumption that R and N are coprime. It can be constructed using the extended Euclidean algorithm. The extended Euclidean algorithm efficiently determines integers R′ and N′ that satisfy Bézout's identity: 0 < R′ < N, 0 < N′ < R, and:

RR' - NN' = 1.

This shows that it is possible to do multiplication in Montgomery form. A straightforward algorithm to multiply numbers in Montgomery form is therefore to multiply aR mod N, bR mod N, and R′ as integers and reduce modulo N.

For example, to multiply 7 and 15 modulo 17 in Montgomery form, again with R = 100, compute the product of 3 and 4 to get 12 as above. The extended Euclidean algorithm implies that 8 ⋅ 100 − 47 ⋅ 17 = 1, so R′ = 8. Multiply 12 by 8 to get 96 and reduce modulo 17 to get 11. This is the Montgomery form of 3, as expected.
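The straightforward method can be sketched in Python as follows. The function name `montgomery_multiply_naive` is an assumption for illustration, and the inverse R′ is obtained with the built-in three-argument `pow` (Python 3.8 or later) rather than a hand-written extended Euclidean algorithm.

```python
def montgomery_multiply_naive(a_bar: int, b_bar: int, R: int, N: int) -> int:
    """Multiply two Montgomery forms by multiplying by R' = R^-1 mod N.

    Correct but not fast: it still performs a reduction modulo N.
    """
    r_inv = pow(R, -1, N)  # raises ValueError if gcd(R, N) != 1
    return (a_bar * b_bar * r_inv) % N

N, R = 17, 100
a_bar, b_bar = (7 * R) % N, (15 * R) % N            # Montgomery forms 3 and 4
product_bar = montgomery_multiply_naive(a_bar, b_bar, R, N)
print(pow(R, -1, N), product_bar)                   # 8 11
assert product_bar == ((7 * 15) % N * R) % N        # Montgomery form of 3
```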

The REDC algorithm

While the above algorithm is correct, it is slower than multiplication in the standard representation because of the need to multiply by R′ and divide by N. Montgomery reduction, also known as REDC, is an algorithm which simultaneously computes the product by R′ and reduces modulo N more quickly than the naïve method. Unlike conventional modular reduction, which focuses on making the number smaller than N, Montgomery reduction focuses on making the number more divisible by R. It does this by adding a small multiple of N which is chosen to cancel the residue modulo R. Dividing the result by R yields a much smaller number. This number is so much smaller that it is nearly the reduction modulo N, and computing the reduction modulo N requires only a final conditional subtraction. Because all computations are done using only reduction and divisions with respect to R, not N, the algorithm runs faster than a straightforward modular reduction by division.

    function REDC is
        input: Integers R and N with gcd(R, N) = 1,
               Integer N′ in [0, R − 1] such that NN′ ≡ −1 mod R,
               Integer T in the range [0, RN − 1].
        output: Integer S in the range [0, N − 1] such that S ≡ TR⁻¹ mod N

        m ← ((T mod R)N′) mod R
        t ← (T + mN) / R
        if t ≥ N then
            return t − N
        else
            return t
        end if
    end function

To see that this algorithm is correct, first observe that m is chosen precisely so that T + mN is divisible by R. A number is divisible by R if and only if it is congruent to zero mod R, and we have:

T + mN \equiv T + (((T \bmod R)N') \bmod R)N \equiv T + T N' N \equiv T - T \equiv 0 \pmod{R}.

Therefore, t is an integer. Second, the output is either t or t − N, both of which are congruent to t mod N, so to prove that the output is congruent to TR⁻¹ mod N, it suffices to prove that t is. Modulo N, t satisfies:

t \equiv (T + mN)R^{-1} \equiv TR^{-1} + (mR^{-1})N \equiv TR^{-1} \pmod{N}.

Therefore, the output has the correct residue class. Third, m is in [0, R − 1], and therefore T + mN is between 0 and (RN − 1) + (R − 1)N < 2RN. Hence t is less than 2N, and because it's an integer, this puts t in the range [0, 2N − 1]. Therefore, reducing t into the desired range requires at most a single subtraction, so the algorithm's output lies in the correct range.

To use REDC to compute the product of 7 and 15 modulo 17, first convert to Montgomery form and multiply as integers to get 12 as above. Then apply REDC with R = 100, N = 17, N′ = 47, and T = 12. The first step sets m to 12 ⋅ 47 mod 100 = 64. The second step sets t to (12 + 64 ⋅ 17) / 100. Notice that 12 + 64 ⋅ 17 is 1100, a multiple of 100 as expected. t is set to 11, which is less than 17, so the final result is 11, which agrees with the computation of the previous section.

As another example, consider the product 7 ⋅ 15 mod 17 but with R = 10. Using the extended Euclidean algorithm, compute −5 ⋅ 10 + 3 ⋅ 17 = 1, so N′ will be −3 mod 10 = 7. The Montgomery forms of 7 and 15 are 70 mod 17 = 2 and 150 mod 17 = 14, respectively. Their product 28 is the input T to REDC, and since 28 < RN = 170, the assumptions of REDC are satisfied. To run REDC, set m to (28 mod 10) ⋅ 7 mod 10 = 56 mod 10 = 6. Then 28 + 6 ⋅ 17 = 130, so t = 13. Because 30 mod 17 = 13, this is the Montgomery form of 3 = 7 ⋅ 15 mod 17.
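A direct Python transcription of REDC, run on both worked examples, might look like the sketch below. The helper `neg_inverse` is a made-up name that computes N′ with the built-in `pow` (Python 3.8 or later) instead of the extended Euclidean algorithm, and the input checks are added for illustration only.

```python
def redc(T: int, R: int, N: int, N_prime: int) -> int:
    """Montgomery reduction: return T * R^-1 mod N for 0 <= T < R * N.

    Requires gcd(R, N) == 1 and N * N_prime == -1 (mod R).
    """
    assert 0 <= T < R * N
    assert (N * N_prime) % R == R - 1
    m = ((T % R) * N_prime) % R      # chosen so that T + m*N is divisible by R
    t = (T + m * N) // R             # exact division, no remainder
    return t - N if t >= N else t    # at most one conditional subtraction

def neg_inverse(N: int, R: int) -> int:
    """N' = -N^-1 mod R (illustrative helper, not part of REDC itself)."""
    return (-pow(N, -1, R)) % R

print(neg_inverse(17, 100), redc(12, 100, 17, 47))  # 47 11
print(neg_inverse(17, 10), redc(28, 10, 17, 7))     # 7 13
```

When R is a power of two, the two `% R` operations become bit masks and `// R` becomes a right shift, which is where the speed advantage comes from.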

Arithmetic in Montgomery form

Many operations of interest modulo N can be expressed equally well in Montgomery form. Addition, subtraction, negation, comparison for equality, multiplication by an integer not in Montgomery form, and greatest common divisors with N may all be done with the standard algorithms. The Jacobi symbol can be calculated as

\big(\tfrac{a}{N}\big) = \big(\tfrac{aR}{N}\big) / \big(\tfrac{R}{N}\big)

as long as the Jacobi symbol (R/N) is stored.

When R > N, most other arithmetic operations can be expressed in terms of REDC. This assumption implies that the product of two representatives mod N is less than RN, the exact hypothesis necessary for REDC to generate correct output. In particular, the product of aR mod N and bR mod N is REDC((aR mod N)(bR mod N)). The combined operation of multiplication and REDC is often called Montgomery multiplication.

Conversion into Montgomery form is done by computing REDC((a mod N)(R² mod N)). Conversion out of Montgomery form is done by computing REDC(aR mod N). The modular inverse of aR mod N is REDC((aR mod N)⁻¹(R³ mod N)). Modular exponentiation can be done using exponentiation by squaring by initializing the initial product to the Montgomery representation of 1, that is, to R mod N, and by replacing the multiply and square steps by Montgomery multiplies.

Performing these operations requires knowing at least N′ and R² mod N. When R is a power of a small positive integer b, N′ can be computed by Hensel's lemma: The inverse of N modulo b is computed by a naïve algorithm (for instance, if b = 2 then the inverse is 1), and Hensel's lemma is used repeatedly to find the inverse modulo higher and higher powers of b, stopping when the inverse modulo R is known; N′ is the negation of this inverse. The constants R mod N and R³ mod N can be generated as REDC(R² mod N) and as REDC((R² mod N)(R² mod N)). The fundamental operation is to compute REDC of a product. When standalone REDC is needed, it can be computed as REDC of a product with 1 mod N. The only place where a direct reduction modulo N is necessary is in the precomputation of R² mod N.
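The operations in this section can be put together in a short Python sketch. It makes several assumptions beyond the text: N is odd, R is taken to be the smallest power of two exceeding N, the class and method names are invented for illustration, and the Hensel step is written as the Newton-style iteration that doubles the number of correct bits each round. The modular inverse and the constant R³ mod N are left out.

```python
def neg_inverse_pow2(N: int, k: int) -> int:
    """Return N' with N * N' == -1 (mod 2**k) for odd N, by Hensel lifting."""
    assert N % 2 == 1
    inv, bits = 1, 1                     # N * 1 == 1 (mod 2) because N is odd
    while bits < k:
        bits *= 2                        # each round doubles the correct bits
        inv = (inv * (2 - N * inv)) % (1 << bits)
    return (-inv) % (1 << k)

class Montgomery:
    """Arithmetic modulo an odd N with R = 2**k > N (illustrative names)."""

    def __init__(self, N: int):
        self.N = N
        self.k = N.bit_length()              # smallest k with 2**k > N
        self.mask = (1 << self.k) - 1        # reduction mod R is a bit mask
        self.N_prime = neg_inverse_pow2(N, self.k)
        self.R2 = (1 << (2 * self.k)) % N    # the only direct reduction mod N

    def redc(self, T: int) -> int:
        m = ((T & self.mask) * self.N_prime) & self.mask
        t = (T + m * self.N) >> self.k
        return t - self.N if t >= self.N else t

    def mul(self, x: int, y: int) -> int:
        """Montgomery multiplication: multiply, then REDC."""
        return self.redc(x * y)

    def to_form(self, a: int) -> int:
        assert 0 <= a < self.N
        return self.redc(a * self.R2)

    def from_form(self, x: int) -> int:
        return self.redc(x)

    def power(self, a: int, e: int) -> int:
        """a**e mod N by square-and-multiply, staying in Montgomery form."""
        result = self.redc(self.R2)          # R mod N, the form of 1
        base = self.to_form(a)
        while e > 0:
            if e & 1:
                result = self.mul(result, base)
            base = self.mul(base, base)
            e >>= 1
        return self.from_form(result)

M = Montgomery(17)
print(M.from_form(M.mul(M.to_form(7), M.to_form(15))))  # 3
print(M.power(7, 5) == pow(7, 5, 17))                   # True
```

Note that with this choice of R the example uses R = 32 rather than the decimal R = 100 of the earlier sections, so the intermediate Montgomery forms differ even though the final results agree.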

Montgomery arithmetic on multiprecision integers

Most cryptographic applications require numbers that are hundreds or even thousands of bits long. Such numbers are too large to be stored in a single machine word. Typically, the hardware performs multiplication mod some base B, so performing larger multiplications requires combining several small multiplications. The base B is typically 2 for microelectronic applications, 2⁸ for 8-bit firmware, or 2³² or 2⁶⁴ for software applications.

The REDC algorithm requires products modulo R, and typically R > N so that REDC can be used to compute products. However, when R is a power of B, there is a variant of REDC which requires products only of machine word sized integers. Suppose that positive multi-precision integers are stored little endian, that is, x is stored as an array x[0], ..., x[ℓ − 1] such that 0 ≤ x[i] < B for all i and x = Σ x[i] B^i. The algorithm begins with a multiprecision integer T and reduces it one word at a time. First an appropriate multiple of N is added to make T divisible by B. Then a multiple of N is added to make T divisible by B², and so on. Eventually T is divisible by R, and after division by R the algorithm is in the same place as REDC was after the computation of t.

    function MultiPrecisionREDC is
        Input: Integer N with gcd(B, N) = 1, stored as an array of p words,
               Integer R = B^r,     --thus, r = log_B R
               Integer N′ in [0, B − 1] such that NN′ ≡ −1 (mod B),
               Integer T in the range 0 ≤ T ≤ RN − 1, stored as an array of r + p words.

        Output: Integer S in [0, N − 1] such that TR⁻¹ ≡ S (mod N), stored as an array of p words.

        Set T[r + p] = 0  (extra carry word)
        for 0 ≤ i < r do
            --loop1- Make T divisible by B^(i+1)

            c ← 0
            m ← T[i] ⋅ N′ mod B
            for 0 ≤ j < p do
                --loop2- Add the low word of m ⋅ N[j] and the carry from earlier, and find the new carry

                x ← T[i + j] + m ⋅ N[j] + c
                T[i + j] ← x mod B
                c ← ⌊x / B⌋
            end for
            for p ≤ j ≤ r + p − i do
                --loop3- Continue carrying

                x ← T[i + j] + c
                T[i + j] ← x mod B
                c ← ⌊x / B⌋
            end for
        end for

        for 0 ≤ i ≤ p do
            S[i] ← T[i + r]
        end for

        if S ≥ N then
            return S − N
        else
            return S
        end if
    end function

The final comparison and subtraction is done by the standard algorithms.

The above algorithm is correct for essentially the same reasons that REDC is correct. Each time through the i loop, m is chosen so that T[i] + mN[0] is divisible by B. Then mNB^i is added to T. Because this quantity is zero mod N, adding it does not affect the value of T mod N. If m_i denotes the value of m computed in the ith iteration of the loop, then the algorithm sets S to (T + (Σ m_i B^i)N) / R. Because MultiPrecisionREDC and REDC produce the same output, the sum Σ m_i B^i is the same as the choice of m that the REDC algorithm would make.

The last word of T, T[r + p] (and consequently S[p]), is used only to hold a carry, as the initial reduction result is bound to a result in the range 0 ≤ S < 2N. It follows that this extra carry word can be avoided completely if it is known in advance that R ≥ 2N. On a typical binary implementation, this is equivalent to saying that this carry word can be avoided if the number of bits of N is smaller than the number of bits of R. Otherwise, the carry will be either zero or one. Depending upon the processor, it may be possible to store this word as a carry flag instead of a full-sized word.

It is possible to combine multiprecision multiplication and REDC into a single algorithm. This combined algorithm is usually called Montgomery multiplication. Several different implementations are described by Koç, Acar, and Kaliski. The algorithm may use as little as p + 2 words of storage (plus a carry bit).

As an example, let B = 10, N = 997, and R = 1000. Suppose that a = 314 and b = 271. The Montgomery representations of a and b are 314000 mod 997 = 942 and 271000 mod 997 = 813. Compute 942 ⋅ 813 = 765846. The initial input T to MultiPrecisionREDC will be [6, 4, 8, 5, 6, 7]. The number N will be represented as [7, 9, 9]. The extended Euclidean algorithm says that −299 ⋅ 10 + 3 ⋅ 997 = 1, so N′ will be 7. In the tables below, T is written least significant digit first and includes the extra carry word.

    i ← 0
    m ← 6 ⋅ 7 mod 10 = 2

    j  T        c
    -  -------  -
    0  0485670  2    (After first iteration of first loop)
    1  0485670  2
    2  0485670  2
    3  0487670  0    (After first iteration of second loop)
    4  0487670  0
    5  0487670  0
    6  0487670  0

    i ← 1
    m ← 4 ⋅ 7 mod 10 = 8

    j  T        c
    -  -------  -
    0  0087670  6    (After first iteration of first loop)
    1  0067670  8
    2  0067670  8
    3  0067470  1    (After first iteration of second loop)
    4  0067480  0
    5  0067480  0

    i ← 2
    m ← 6 ⋅ 7 mod 10 = 2

    j  T        c
    -  -------  -
    0  0007480  2    (After first iteration of first loop)
    1  0007480  2
    2  0007480  2
    3  0007400  1    (After first iteration of second loop)
    4  0007401  0

Therefore, before the final comparison and subtraction, S = 1047. The final subtraction yields the number 50. Since the Montgomery representation of 314 ⋅ 271 mod 997 = 349 is 349000 mod 997 = 50, this is the expected result.

When working in base 2, determining the correct m at each stage is particularly easy: If the current working bit is even, then m is zero and if it's odd, then m is one. Furthermore, because each step of MultiPrecisionREDC requires knowing only the lowest bit, Montgomery multiplication can be easily combined with a carry-save adder.
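The word-by-word reduction can be sketched in Python using lists of base-B digits stored least significant first. The helper names `to_int` and `to_digits` are assumptions for illustration, and for brevity the final comparison and subtraction convert back to Python integers instead of using word-level routines.

```python
def to_int(digits, B):
    """Little-endian digit list -> integer."""
    return sum(d * B**i for i, d in enumerate(digits))

def to_digits(x, B, length):
    """Integer -> little-endian digit list of the given length."""
    return [(x // B**i) % B for i in range(length)]

def multi_precision_redc(T, N, N_prime, B, r):
    """Word-by-word REDC on little-endian base-B digit lists.

    N has p words, T has r + p words, R = B**r, and N_prime is a single
    word with N * N_prime == -1 (mod B). Returns T * R^-1 mod N as p words.
    """
    p = len(N)
    T = list(T) + [0]                        # extra carry word T[r + p]
    for i in range(r):                       # make T divisible by B**(i + 1)
        c = 0
        m = (T[i] * N_prime) % B
        for j in range(p):                   # add m * N * B**i word by word
            x = T[i + j] + m * N[j] + c
            T[i + j], c = x % B, x // B
        for j in range(p, r + p - i + 1):    # propagate the remaining carry
            x = T[i + j] + c
            T[i + j], c = x % B, x // B
    S = T[r:]                                # divide by R: drop the low r words
    if to_int(S, B) >= to_int(N, B):         # final comparison and subtraction
        S = to_digits(to_int(S, B) - to_int(N, B), B, p + 1)
    return S[:p]

T = to_digits(942 * 813, 10, 6)              # [6, 4, 8, 5, 6, 7]
S = multi_precision_redc(T, [7, 9, 9], 7, 10, 3)
print(S, to_int(S, 10))                      # [0, 5, 0] 50
```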

Side-channel attacks

Because Montgomery reduction avoids the correction steps required in conventional division when quotient digit estimates are inaccurate, it is mostly free of the conditional branches which are the primary targets of timing and power side-channel attacks; the sequence of instructions executed is independent of the input operand values. The only exception is the final conditional subtraction of the modulus, but it is easily modified (to always subtract something, either the modulus or zero) to make it resistant. It is of course necessary to ensure that the exponentiation algorithm built around the multiplication primitive is also resistant (see Marc Joye and Sung-Ming Yen, "The Montgomery Powering Ladder", 2002).
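The "always subtract something" modification can be illustrated as below. This is only a sketch of the data flow under loud assumptions: the function name is invented, and Python's arbitrary-precision integers give no timing guarantees, so a real constant-time implementation would be written with fixed-width words in a lower-level language.

```python
def redc_branchless(T: int, R: int, N: int, N_prime: int) -> int:
    """REDC whose final step subtracts either N or 0 without an if/else.

    Illustrative only: Python integers do not run in constant time.
    """
    m = ((T % R) * N_prime) % R
    t = (T + m * N) // R                 # 0 <= t < 2N
    diff = t - N                         # lies in [-N, N - 1]
    borrow = diff >> N.bit_length()      # -1 (all ones) if t < N, else 0
    subtract_mask = ~borrow              # -1 if t >= N, else 0
    return t - (N & subtract_mask)       # subtracts N or 0

print(redc_branchless(12, 100, 17, 47))    # 11 (t < N, subtracts 0)
print(redc_branchless(1699, 100, 17, 47))  # 9  (t = 26 >= N, subtracts 17)
```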

External links

* Henry S. Warren, Jr. (July 2012). "Theory and practice of Montgomery multiplication". CiteSeerX 10.1.1.450.6124.