Improving Euclidean Division and Modular Reduction for Some Classes of Divisors

Improving Euclidean Division and Modular Reduction for some Classes of Divisors

Jean-Claude Bajard, Laurent Imbert, and Thomas Plantard LIRMM, CNRS, Universit´eMontpellier II 161 rue Ada, 34392 Montpellier cedex 5, FRANCE

Abstract cific modulus provides very efficient solutions but there are cases where only very few of those specific numbers Modular arithmetic is becoming an area of major im- are available. In this paper, we propose an intermedi- portance for many modern applications; RNS is widely ate approach by defining new classes of modulus with used in digital signal processing, and most public-key wider size and high scalability. Our solution seems very cryptographic algorithms require very fast modular mul- efficient compared to commonly used techniques. tiplication, and exponentiation. When such an arithmetic is required, specific values such as Fermat or 2 Problem and Notations Mersenne numbers are often chosen since they allow for very efficient implementations. However, there are Given the two positive integers D and 0 ≤ X < cases where only very few of those numbers are avail- D2, we compute the results of the Euclidean division able. We present an algorithm for the Euclidean divi- X/D ; i.e. the integers Q, R which satisfy the following sion with remainder and we give the classes of divisors equation: for which our algorithm is particularly efficient compared to commonly used method. X = QD + R, with R < D. (1)

Throughout the paper we consider the following nota- 1 Introduction tions: D is a n-digit integer in base β: βn−1 ≤ D < βn, (2) With no doubt, modular multiplication is the most important arithmetic operation of today’s public-key X is a 2n-digit number in base β ; for instance, the cryptographic algorithms [6]. During the last three result of a multiplication of two n-digit numbers, such decades, many solutions have been proposed to speed- that: up modular arithmetic. Although modular multiplica- 0 ≤ X < D2. (3) tion can be performed by interleaving multiplications We use a for the difference between βn and D and modular reductions [2], evaluating the product and then reducing is often a preferred option since one D = βn − a, (4) can take advantage of fast multiplication algorithms (see [3, 4, 9] for more details). and denote k its size in base β: |a| = k. We define Modular multiplication and reduction algorithms aβn are thus very closely related. They can be classified A = . (5) in many different ways. For instance, a considera- D tion that can be taken into account for the classifi- cation is the requirement precomputed values [1] or 3 Algorithm look-up tables[8, 5]. We can also distinguish between those which do not depend on any specific modulus, Our algorithm proceeds in two steps. Instead of like the widely used Montgomery’s algorithm [7], and evaluating the quotient Q = bX/Dc, we first compute those that only consider specific ones, like Fermat or an approximation of the quotient Qˆ = Q − e, with Mersenne numbers. Of course, taking advantage of spe- e ∈ {0, 1, 2}, and we deduce an approximation of the remainder Rˆ = X −QDˆ . Since e ∈ {0, 1, 2}, the correct For example, if a2 < βn, we define ψ(A) = a, and we remainder is obtained with at most two subtractions. evaluate  j X k  Using (4), the evaluation of Q = bX/Dc can be X + n × a 0  β  rewritten as Q = .  βn  $ % X (D + a) X + X a Q = = D . D βn βn In any cases, we are able to get can an approximation of A s.t. |A − ψ(A)| ≤ 1. Table 1 gives different X a approximations of A depending on the size of a. Using the same trick, i.e. by introducing β in D , and eq. (5), we obtain

$ X % X + n × A Size of a Approximation of A Q = β . (6) βn n |a| ≤ ψ(A) = a 2 At this point, note that the divisions by βn reduce to 2n 2 simple shifts in base β. |a| ≤ ψ(A) = a + b a c 3 βn We propose the following approximation: n i |a| = k ψ(A) = P n−k a $X + ϕ( X ) ψ(A)% i=1 β(i−1)n ˆ βn Q = n , (7) β Table 1. Different approximations of A. or more exactly diﬀerent approximations with increasing accuracy which depend on the level of accuracy of X 3.3 Bounds on Q ϕ( βn ) and ψ(A).

X X The approximations on n and A lead to an error 3.1 Evaluation of ϕ( n ) β β on Q such that: We consider two cases depending on the error we Qˆ = Q − e with e ∈ {0, 1, 2}. can afford on X/βn. As we shall see further, this error is controlled by the size of a. Thus, the correct result is obtained with at most two In the first case, ϕ( X ) is just the integer part of X . βn βn subtractions by D. Clearly, we have Proof: Let us denote e the error on ϕ(X/βn): X X 1 ϕ = − e, (8) βn βn X X ϕ = − e1, with e ∈ [0; 1). βn βn In the second case, we only consider the k + 1 most X and e2 the error on ψ(A): significant digits of the integer part of βn . We compute: X X ψ(A) = A − e2, with e2 ∈ [0, 1). (11) ϕ = βn−(k+1) , (9) βn β2n−(k+1) Using (7), we get which lead to an error e ∈ [0; βn−(k+1)).  X  X + βn − e1 (A − e2) Qˆ =   3.2 Evaluation of ψ(A)  βn 

We ﬁrst remark that A can be written as $ X X % X + βn A − e1A − e2 βn + e1e2 n = aβ a n A = = . β βn − a 1 − a/βn $ X % $ X % X + n A e1A + e2 n − e1e2 Thus, we obtain diﬀerent approximations of A with ≥ β + − β βn βn increasing accuracy by evaluating the following series at increasing orders: From (6), we obtain a2 a3 ak A = a + + + ··· + + ··· (10) ˆ βn β2n β(k−1)n Q ≥ Q − e, with Data:

& X ' D (n = |D| = 10) 9995566778 e1A + e2 n − e1e2 e = β ≥ 0. βn a (k = |a| = 7) 4433222 X 56789098765432101234 Evaluation of ψ(A):

Let us look at e in more details. Clearly, from (5), 2 replacing A by its value yields a 19653457301284 a2 1965 1010 2 & aβn X ' a e1 + e2 n ψ(A) = a + 4435187 e ≤ D β 1010 βn Evaluation of Qˆ: X 56789098765432101234 Since X ≤ D2, we have X ϕ( X ) = 10n−k 5678909000 10n 102n−k X n 2 ϕ( 10n )ψ(A) 25187023370983000 &e aβ + e D ' 2 1 D 2 βn a D X e ≤ ≤ e1 + e2 . X + ϕ( 10n )ψ(A) 56814285788803084234 βn D β2n $ % X + ϕ( X )ψ(A) Qˆ = 10n 5681428578 10n Replacing e and e by their respective maximum value 1 2 Evaluation of Rˆ: given by (9) and (11), we obtain: X 56789098765432101234 QDˆ 56789098745836581684 a D2 e ≤ βn−(k−1) + ˆ ˆ D β2n R = X − QD 00000000019595519550 Final correction: R = Rˆ − D 9599952772 n D2 Since D < β , the second term β2n is less than 1. For the ﬁrst term we have 5 Complexity

a βk βn−1 We analyze the computational complexity of our al- βn−(k−1) ≤ βn−(k−1) < < 1. D D D gorithm by counting the number of elementary opera- tions, i.e. the number of multiplication of single digits in radix β. We do not consider the evaluation of ψ(A) Taking the ceil function gives 0 ≤ e ≤ 2, and since e is in our complexity analysis since it can be precompu- tated if necessary, and for some values of D does not an integer, we have e ∈ {0, 1, 2}. requires any computations at all (the case ψ(A) = a). The evaluation of Qˆ requires the multiplication ϕ(X/βn)ψ(A), where the two operands are of size at most k + 1 digits. Thus the cost is (k + 1)2. 4 Example The complexity for the evaluation of Rˆ is more tricky. Since Rˆ < 3D, we only need to compute the |3D| less significant digits of QDˆ , i.e. n + 2 in base 2, and n + 1 for all base β greater than 2. One can Let us consider the following example in radix β = also remark that since D = βn − a, it is more inter- 10. esting to evaluate Rˆ = X − Qβˆ n + Qaˆ . Thus, we only consider the cost of the product Qaˆ , which requires 6 Conclusions (k − 3)(k − 2) kn − Pk−3 i = kn − elementary multi- i=1 2 The proposed algorithm is interesting if we can not plications. The total cost is then use the usual specific divisors (Fermat, Mersenne), or when we need more than those available in the dy- (k − 3)(k − 2) C = (k + 1)2 + kn − . namic range. If the size of a is less than 70% the size 2 of D, the cost of our reduction is less than one multiplication. So, in this case, the cost of our modular In table 2 we roughly estimate k, the size of a, which multiplication is less than two multiplications which is yield to some given complexities. We consider the cases the best we can have with Montgomery multiplication. 1 1/2, 1, 3/4, and 1/2 multiplications. In figure 1 we Also, when |a| is about 50% |D|, we do not need to perform any precomputation since the approximation ψ(A) = a holds. Cost Exact bounds on k

2 q C < 3n k ≤ − 9 − n + 89 + 9n + 4n2 2 2 4 References q 2 9 89 2 C < n k ≤ − 2 − n + 4 + 9n + 3n [1] P. D. Barret. Implementing the rivest shamir adle- 3n2 9 q 89 5n2 C < 4 k ≤ − 2 − n + 4 + 9n + 2 man pubic key encryption algorithm on a standard 2 q digital signal processor. In Advances in Cryptology – C < n k ≤ − 9 − n + 89 + 9n + 2n2 2 2 4 CRYPTO’86, number 263 in LNCS, pages 311–323. Springer-Verlag, 1986. Table 2. Bounds on k for some given costs. [2] S. R. Duss´eand J. B. S. Kaliski. A cryptographic library for the motorola DSP56000. In Advances in Cryptoloy have plotted the relative size of a according to D which – Eurocrypt 90, number 473 in LNCS, pages 230–244, allows us to reach the costs considered in the previous New York, 1990. Springer-Verlag. [3] A. Karatsuba and Y. Ofman. Multiplication of multi- table. For example, the lowest curve means that when digit numbers on automata. Soviet Physics—Doklady, k represent about 40% of n, the cost of our algorithm 7(7):595–596, Jan. 1963. is less that half a multiplication. [4] D. E. Knuth. The Art of Computer Programming, Vol. 2: Seminumerical Algorithms. Addison-Wesley, Read- ing, MA, third edition, 1997. [5] C. H. Lim, H. S. Hwang, and P. J. Lee. Fast modular reduction with precomputation. In Proceedings of Korea-Japan joint Workshop on Information Security 90 and Cryptology, Seoul, Korea, October 1997. [6] A. J. Menezes, P. C. Van Oorschot, and S. A. Vanstone. 80 Handbook of applied cryptography. CRC Press, 2000 N.W. Corporate Blvd., Boca Raton, FL 33431-9868, 70 USA, 1997. [7] P. L. Montgomery. Modular multiplication without trial 60 division. Mathematics of Computation, 44(170):519– 521, April 1985. 50 [8] N. Takagi and S. Yajima. Modular multiplication hard- ware algorithms with a redundant representation and

40 their application to RSA cryptosystem. IEEE Transac- tions on Computers, 41(7):887–891, July 1992.

30 [9] D. Zuras. More on squaring and multiplying larges integers. IEEE Transactions on Computers, 43(8):899–908, 1994. 20 40 60 80 100 120 Size of D

Figure 1. Complexity and size of a based on the size of D.