<<

J Cryptogr Eng (2017) 7:245–253 DOI 10.1007/s13389-016-0134-5

SHORT COMMUNICATION

Efficient regular modular using multiplicative half-size splitting

Christophe Negre1,2 · Thomas Plantard3,4

Received: 14 August 2015 / Accepted: 23 June 2016 / Published online: 13 July 2016 © Springer-Verlag Berlin Heidelberg 2016

Abstract In this paper, we consider efficient RSA modular x K mod N where N = pq with p and q prime. The private x K mod N which are regular and con- data are the two prime factors of N and the private exponent stant time. We first review the multiplicative splitting of an K used to decrypt or sign a message. In order to insure a integer x modulo N into two half-size integers. We then sufficient security level, N and K are chosen large enough take advantage of this splitting to modify the square-and- to render the factorization of N infeasible: they are typically multiply exponentiation as a regular sequence of squarings 2048-bit integers. The basic approach to efficiently perform always followed by a multiplication by a half-size inte- the modular exponentiation is the square-and-multiply algo- ger. The proposed method requires around 16% less word rithm which scans the bits ki of the exponent K and perform operations compared to Montgomery-ladder, square-always a sequence of squarings followed by a multiplication when and square-and-multiply-always exponentiations. These the- ki is equal to one. oretical results are validated by our implementation results When the cryptographic computations are performed on which show an improvement by more than 12% compared an embedded device, an adversary can monitor power con- approaches which are both regular and constant time. sumption [2] or electronic emanation [3]. If the power or electromagnetic traces of a multiplication and a squaring Keywords RSA · Regular exponentiation · Constant time differ sufficiently, an adversary can read the sequence of exponentiation · Multiplicative splitting squarings and multiplications directly on a single power or electromagnetic trace of a modular exponentiation. In the lit- erature these attacks are referred to as simple power analysis 1 Introduction (SPA) and simple electromagnetic analysis (SEMA), respec- tively. Currently, RSA [1] is the most used public key cryptosystem. Consequently, modular exponentiations have to be imple- The main operation in RSA protocols is an exponentiation mented in order to prevent such side channel analysis. The first direct approach which prevents this attack is the B Christophe Negre multiply-always exponentiation which performs all squar- [email protected] ings as multiplications. But, unfortunately, it has been shown Thomas Plantard in [4] that this multiply-always strategy is still weak against [email protected] an SPA or SEMA: an operation r × r and r × r  have dif- 1 DALI, Universite de Perpignan, Perpignan, France ferent output Hamming weight. The authors in [5] proposed a square-always approach which performs a multiplication 2 LIRMM, Université de Montpellier and National Center for Scientific Research (CNRS), 161 Rue Ada, 34090 as the combination of two squarings. They then noticed Montpellier, France that in this case the attack of [4] does not apply. But both 3 Centre for Computer and Information Security Research multiply-always and square-always approaches still leak (CCISR), University of Wollongong, Wollongong, Australia some information about the exponent: the computation time 4 School of Computer Science and Software Engineering is correlated with the Hamming weight of the exponent, (SCSSE), University of Wollongong, Wollongong, Australia which is then leaked out. 123 246 J Cryptogr Eng (2017) 7:245–253

Table 1 Complexity in terms of word operations per iteration of loop body for a modular exponentiation Regular Constant time Complexity per loop body for t-word integers

# Word add. # Word mult.

✗✗ 2 + ( ) 5 2 + ( ) Square-and-multiply 5t O t 2 t O t Multiply-always [5]  ✗ 6t2 + O(t) 3t2 + O(t) Square-always [5]  ✗ 6t2 + O(t) 3t2 + O(t)  2 + ( ) 7 2 + ( ) Square-and-multiply-always [6] 7t O t 2 t O t  2 + ( ) 7 2 + ( ) Montgomery-ladder [7] 7t O t 2 t O t Montgomery-ladder with CM [9]  6t2 + O(t) 3t2 + O(t)  2 + ( ) 5 2 + ( ) Proposed approach 5t O t 2 t O t

A prerequisite to be SPA resistant is then to be regu- 2 Review of regular modular exponentiation lar and constant time. A first method which satisfies both of these properties is the square-and-multiply-always expo- We review in this section several methods for performing an nentiation proposed by Coron [6]. Its principle is to always exponentiation x K mod N. The simplest and most popular perform a multiplication after a squaring, i.e., if the bit ki = 0 method is the square-and-multiply exponentiation [10]. The then a dummy multiplication is performed. Another popular bits of the exponent K are scanned from left to right, for each strategy is the Montgomery-ladder [7] which also performs bit a squaring is performed and is followed by a multiplica- an exponentiation through a regular sequence of squarings tion by x if the bit is equal to 1. This method is detailed in always followed by a multiplication. Algorithm 1. We present in this paper an alternative approach for regular K and constant time exponentiation x mod N. Our method Algorithm 1 Square-and-multiply uses a multiplicative splitting of x into two halves. We mod- Require: x ∈{0,...,N − 1} and K = (k−1,...,k0)2 ify the square-and-multiply algorithm as a regular sequence 1: r ← 1 of squarings always followed by a multiplication with a 2: for i from  − 1 downto 0 do ← 2 half-size integer. The half-size multiplications and squarings 3: r r mod N 4: if ki = 1 then modulo N are computed with the method of Montgomery [8], 5: r ← r × x mod N and we then also provide a version of the proposed exponen- 6: end if tiation with Montgomery modular multiplications adapted 7: end for to the size of the operands. We provide a complexity analy- 8: return r sis when modular operations are computed with word-level algorithm: computer words are w-bit long and considered The sequence of squarings and multiplications in the integers modulo N haveasizeoft computer words. The square-and-multiply method has some irregularities due to complexity of the proposed approach is given in Table 1 the irregular sequence of bits ki equal to 1. This can be which contains the costs of the loop bodies of the consid- used to mount a side channel attack by monitoring the power ered exponentiation . We notice that the proposed consumption or the electromagnetic emanation of the circuit approach always reaches the best complexity while having performing the computations. Indeed, if the monitored sig- the higher security level compared to best known methods of nal of a multiplication and a squaring have a different shape, the literature. then we can directly read on the power trace the sequence of The remainder of the paper is organized as follows. Sec- squarings and multiplications. If a trace of a multiplication tion 2 summarizes state of the art methods for regular modular appears between two subsequent squarings, then we deduce exponentiation. In Sect. 2.1 we review techniques to com- that the corresponding bit is 1, otherwise it is 0. pute a multiplicative splitting of an integer modulo N.In This means that a secure implementation of modular expo- Sect. 3 we present a new modular exponentiation algorithm nentiation must be computed through a regular sequence of which uses this splitting to render regular the square-and- squarings and multiplications uncorrelated with the key bits. multiply exponentiation. In Sect. 4, we present a version In the literature the following strategies were proposed to of the proposed exponentiation which incorporates Mont- prevent SPA: gomery modular multiplications. Finally, in Sect. 5,we evaluate the complexity of the proposed algorithm and pro- • Multiply-always [5]. This approach performs all the vide implementation results. squarings of the square-and-multiply exponentiation as 123 J Cryptogr Eng (2017) 7:245–253 247

3 multiplications. This leads to a sequence of 2 multipli- 2.1 Multiplicative splitting of an integer x modulo N cations on average. Unfortunately, this multiply-always approach can be threatened by the attack of [4]: this attack We consider an RSA modulus N and an integer x ∈[0, N] differentiates a power trace of a multiplication r × r (i.e. that corresponds to the message we want to decrypt or sign a hidden squaring) by a multiplication r × x with x = r by computing x K mod N. We will show in this section that based on a difference of the Hamming weight of the out- x can be split into two parts as follows put bits. • = −1 × | |, | |≤ 1/2. Square-always [5]. This approach is an improvement of x x0 x1 mod N with x0 x1 N (2) the Multiply-always and prevents the attack of [4]. The authors in [5] use the fact that a multiplication can be The idea to split multiplicatively is not new, we can find it in performed with two squarings: a number of references of the literature: for example in [14] the authors use it to randomize an RSA exponent. (r + x)2 − (r − x)2 In order to get a multiplicative splitting of x modulo N, r × x = . (1) 4 we use the method presented in [15] which consists in a partial execution of the extended . The They re-express all the multiplications of the square- Euclidean algorithm computes the and-multiply exponentiation in order to get a square- of x and N through a sequence of reductions: we start with always exponentiation. This square-always exponentia- r0 = N, r1 = x and perform the following iteration tion requires 2 squarings in average. Unfortunately, both multiply-always and square-always approaches suffer ri+1 = ri−1 mod ri for i = 1, 2,... (3) from a weakness: they do not process the exponentia- tion in a constant time, and then the Hamming weight of The sequence r0, r1,...,ri is a decreasing sequence of posi- the key can be leaked out by the computation time. tive integers and the last non zero r satisfies r = gcd(x, N). • i i Square-and-multiply-always [6]. The first method which The extended Euclidean algorithm computes, in addition is regular and constant time is the square-and-multiply- to gcd(x, N), two integers a, b satisfying always exponentiation proposed by Coron in [6]. The idea of Coron is to perform a dummy multiplication when we ax + bN = gcd(x, N), (4) read a bit that is equal to 0. This results in a power trace of a regular sequence of traces of squarings always followed which is called a Bezout identity. In order to compute a and b, by a trace of a multiplication. the extended Euclidean algorithm maintains two sequences • Montgomery-ladder [7]. The square-and-multiply-always ai and bi satisfying exponentiation is effective to counteract SPA and SEMA along with timing attacks. But it is still under the threat a x + b N = r (5) of another kind of side channel attack: the safe error fault i i i injection attack [11,12]. This problem was fixed by the where the integers r , i = 0, 1,..., are the consecutive Montgomery-ladder approach for modular exponentia- i remainders in (3) computed in the Euclidean algorithm. The tion [7]. The Montgomery-ladder is regular and constant integers a , b , i = 1, 2,...,are computed as follows time and any error injected during the computation will i i affect the final results, yielding a natural resilience to safe q = r − /r , error fault injection attack. i i 1 i ri+1 = ri−1 − qi ri , (6) Both square-and-multiply-always and Montgomery- ai+1 = ai−1 − qi ai ,  ladder exponentiations have a complexity of squarings and bi+1 = bi−1 − qi bi ,  multiplications for an -bit exponent K . starting from r0 = N, r1 = x and a0 = 0, a1 = 1 and b0 = Remark 1 In this paper we focus on methods that require 1, b1 = 0. Then, when ri is equal to gcd(x, N) the identity (5) at most two intermediate variables. But the reader might be is a valid Bezout relation (4). For a detailed presentation of aware that there are some alternative methods in the litera- this method the reader may refer to [16]. ture ensuring a regularity of the operations while reducing the In order to obtain a multiplicative splitting of x, the authors ∼ number of multiplications. These methods use a larger num- in [15] stop the extended Euclidean algorithm when ri = 1/2 ∼ 1/2 ber of intermediate variable. This is for example the case of N and ai = N : indeed, due to (5), for any i we have the methods reported in [13] which use a regular windowing = −1 x ai ri mod N. This method computing the splitting of recoding of the exponent K . an integer x is reviewed in Algorithm 2. 123 248 J Cryptogr Eng (2017) 7:245–253

Algorithm 2 Multiplicative splitting modulo N [15] bound its cost above with an upper bound of the complexity Require: 0 ≤ x < N < c2 ∈ N with gcd(x, N)

Lemma 1 Let ai and ri be the two sequences of coefficients We have: computed in Algorithm 2. They satisfy the following proper- • Cost of one loop body. If we assume that the integers ai ties: and ri in Algorithm 2 are stored on t words, each loop i−1 (i) (−1) ai ≥ 1 for all i ≥ 1. body requires 2t word subtractions. i • (ii) ai+1ri − ai ri = (−1) N for all i ≥ 1. Number of iterations. At each iteration we remove the most significant bit of ri or ri−1 by at least one bit. This  ( )+ ( ) The proof of Lemma 1 is reviewed in the Appendix. reduces the bit length log2 ri log2 ri−1 by one. The following lemma asserts that Algorithm 2 outputs a This implies that the number of iterations before we get | |, | | < r = 0isatmostlog (x)+log (N)≤2tw. pair aic and ric which satisfy aic ric c. i 2 2 1/2 Lemma 2 Let c ∈ N such that c > N and let At the end the total number of operations is at most 2tw × , ,..., , ,..., 2 a0 a1 aic and r0 r1 ric be the sequences computed 2t = 4t w word subtractions.  in Algorithm 2. Then Algorithm 2 correctly outputs a pair , aic ric such that 3 Regular exponentiation with half-size − multiplicative splitting x = a 1 × r mod N with |a | < c and |r | < c. ic ic ic ic

Proof The proof is a direct consequence of Lemma 1: state- Given a multiplicative splitting (2)ofx into two half-size ments (i) and (ii) imply that for i ≥ 1 integers, we can modify the square-and-multiply method in order to distribute a full multiplication by x to one half-size | |+ | |= . ri−1 ai ri ai−1 N (7) multiplication by x0 when ki = 0 and one half-size multi- √ plication by x1 when ki = 1. This approach is depicted in So if r − is the last remainder such that r − ≥ c > N ic 1 ic 1 Algorithm 3. This algorithm reaches our goal since it is regu- then we have r < c. Then taking i = i in (7)wehave ic c lar: each iteration of the loop body is a squaring followed by r − |a |+r |a − |=N then one must have |a |≤ ic 1 ic ic ic 1 ic a half-size multiplication. It is also robust against safe error N/r − ≤ N/c < c.  ic 1 fault injection attack: each error in one half-size multiplica- A direct consequence of Lemma 2 is the following. If tion will affect the final result. N 1/2 is not an integer and if Algorithm 2 is executed with 1/2 Algorithm 3 Regular exponentiation with half-size multipli- c =N  then the multiplicative splitting ai , ri output c c cations by the algorithm satisfies Require: x ∈{0,...,N − 1} and K = (k−1,...,k0)2 / / Ensure: r = xk mod N | | <  1 2 | | <  1 2. − aic N and ric N = 1 × , =∼ 1/2 1: Split. x x0 x1 mod N with x0 x1 N . ← −1 2: r x0 In other words, it is a half-size multiplicative splitting. 3: for i from  − 1 downto 0 do 4: temp ← ki x1 + (1 − ki )x0 Complexity. For the sake of simplicity, we will only give an 5: r ← r 2 × temp mod N upper bound on the cost of the multiplicative splitting. Specif- 6: end for ically, since computing a multiplicative splitting consists in a 7: r ← r × x0 mod N partial execution of the extended Euclidean algorithm, we can 8: return r 123 J Cryptogr Eng (2017) 7:245–253 249

The following lemma establishes the validity of Algo- and z satisfies z = (xyM−1) mod N and z < 2N.In = K = n+1 = ( ) rithm 3, i.e., that it correctly computes r x mod N. practice taking M 2 with n log2 N simplifies the reduction and the division by M. This method also Lemma 4 Let K = (k− ,...,k ) with k ∈{ , } be an 1 0 2 i 0 1 applies for a squaring, i.e., x = y and, in the sequel this -bit integer and let N and x be two positive integers such will be referred to as FMS for Full Montgomery Squaring. that x < N. Ifweset K = (k− ,...,k ) , then the value i 1 i 2 • Half Montgomery Multiplication√ (HMM):Letm be an of r after the iteration i satisfies: integer such that m > N and gcd(N, m) = 1. Let y be a −  ( )  ( )/  r = x Ki x 1 mod N. log2 N -bit integer and x be a log2 N 2 -bit integer. 0 Then the HMM works as follows: − Proof We prove the assertion by a decreasing induction on q ← (−x × y × N 1) mod m i: we assume it is true for i and we prove it for i − 1. We z ← (x × y + q × N)/m −1 denote ri the value of r after the execution of iteration i in and z satisfies z = (xym ) mod N and z < 2N. Then, − = Ki × 1 = n/2+1 = ( ) Algorithm 3 and we assume that it satisfies ri x x0 . in practice, taking m 2 where n log2 N Then if ki−1 = 1 the execution of iteration i − 1gives: simplifies the computation of a reduction and a division by m. = 2 × ri−1 ri x1 − = 2Ki × 2 × x x0 x1 The proposed regular exponentiation which incorporates + − = 2Ki 1 × 1 FMS and HMM is depicted in Algorithm 4. x x0 − = Ki−1 × 1. x x0 Algorithm 4 Regular exponentiation with half-size Mont- since Ki−1 = 2Ki + 1. And, if ki−1 = 0, the execution of gomery modular multiplications − iteration i 1gives: Require: x ∈{0,...,N − 1} and K = (k−1,...,k0)2 Ensure: r = x K mod N = 2 × = −1 × , =∼ 1/2 ri−1 r x0 1: Split x x0 x1 mod N with x0 x1 N . i − −2 2: r = x 1 × m × M mod N // Montgomery repre- = x2Ki × x × x 0 0 0 sentation − = 2Ki × 1 3: for i from  − 1 downto 0 do x x0 − 4: r ← FMS(r, r) = Ki−1 × 1. x x0 5: temp ← ki x1 + (1 − ki )x0 6: r ← HMM(r, temp) since in this case Ki−1 = 2Ki .  7: end for  −1 −1 8: r ← r × x0 × m × M mod N 9: return r 4 Exponentiation with half-size splitting and Montgomery multiplication

Lemma 5 Let K = (k− ,...,k ) with k ∈{0, 1} be an - An RSA modulus N looks like a random integer: the binary 1 0 2 i bit integer, and let N be a positive integer and x ∈[0, N −1]. representation is not sparse and has no other underlying struc- If we set K = (k− ,...,k ) then the value r after the ture which can be used to speed-up a reduction modulo N. i 1 i 2 iteration i in Algorithm 4 satisfies: The most commonly used method to perform a multiplication − modulo a random integer is the Montgomery method [8]. We r = (x Ki x 1 Mm) mod N. modify Algorithm 3 in order to use the Montgomery multi- 0 N plication for the squarings and multiplications modulo .A Proof We prove it by induction on i. If we denote ri the value − squaring in Algorithm 3 involves integers of size log (N) = ( Ki 1 ) 2 of r after the iteration i, then it satisfies ri x x0 Mm bits, while a multiplication involves two kinds of multipli- mod N. Then the squaring with FMS provides:  ( ) cand: one integer of size log2 N bits and one integer of ∼ − − size = log (N)/2 bits. This compels us to use two kinds ( ) = 2Ki 2 2 2 1 2 FMS ri x x0 M m M mod N − of Montgomery multiplications: = 2Ki 2 2 . x x0 Mm mod N • Full Montgomery Multiplication (FMM):LetM be an Now if k − = 0 the algorithm computes: integer such that M > N and gcd(N, M) = 1. Let y and i 1 x  (N) − be two integers of size log2 bits. Then the FMM = ( 2Ki 2 2, ) ri−1 HMM x x0 Mm x0 worksasfollows: − − − = ( 2Ki 2 2) 1 q ← (−x × y × N 1) mod M x x0 Mm x0m mod N − ← ( × + × )/ = 2Ki 1 z x y q N M x x0 Mm mod N 123 250 J Cryptogr Eng (2017) 7:245–253 which satisfies the induction hypothesis since Ki−1 = 2Ki . Algorithm 5 Word-level Montgomery multiplication [17] w − Now if ki−1 = 1 the algorithm computes: Require: N < 2 t 1 the modulus, w the word size, x = (xt−1,...,x0)2w and y = (ys−1,...,y0)2w integers in [0, N[ and  = (− −1) w 2Ki −2 2 N N mod 2 r − = HMM(x x Mm , x ) −w i 1 0 1 Ensure: z = x · y · 2 s mod N 2Ki −2 2 −1 ← = (x x Mm )x1m mod N 1: z 0 0 − + − 2: for i from 0 to s 1 do = 2Ki 1 1 x x0 Mm mod N 3: z ← z + yi · x  w 4: q ←|z|2w · N mod 2 5: z ← (z + q · N)/2w which satisfies the induction hypothesis since Ki−1 = 2Ki +  6: end for 1. 7: if z ≥ N then 8: z ← z − N 9: end if 5 Complexity comparison and implementation 10: return z results Table 2 Step by step complexity evaluation of word-level Montgomery In this section we first briefly review word-level forms of multiplication (Algorithm 5) Montgomery multiplication and squaring along with their Operations # Word add. # Word mul. complexities. We then deduce the complexity of the proposed s Step 3 xi × ystst exponentiation and compare it with the approaches reviewed z + (xi y) s(t + 1) 0 in Sect. 2.  s Step 4 |z|2w · N 0 s s Step 5 q × Nstst 5.1 Word-level Montgomery multiplication and z + (qN) s(t + 1) 0 squaring Step 7 z − Nt 0 Total s(4t + 2) + ts(2t + 1) The proposed exponentiation in Algorithm 4 involves Mont- gomery modular squarings and multiplications with adapted  ( ) sizes to the operands, i.e., of size either log2 N or  ( )/  Montgomery multiplication. However, a squaring can be log2 N 2 bits. The subsequent word-level form of Mont- gomery multiplication can take as input two integers of optimized by considering that we may save some redundant · · different sizes. word multiplications xi x j and x j xi . We review here the formulation of the Montgomery squaring provided in [9]. Word-level Montgomery multiplication. We consider two 2 w The squaring x is rewritten as follows: integers x = (xt−1,...,x0)2w where t =N/2  and y = ( ,..., ) w = =/  ys−1 y0 2 with s t or s t 2 . The word-level t−1 t−1 2 w(i+ j) form of the Montgomery multiplication interleaves multi- x = xi x j 2 precision multiplication and small Montgomery reduction i=0 j=0 = , ,..., − by sequentially performing for i 0 1 s 1: t−2 t−1 t−1 = w(i+ j) + 2 2iw 2 xi x j 2 xi 2 z ← z + x × yi i=0 j=i+1 i=0 − w q ←−z × N 1 mod 2 t−1 t−i−1 w( ) w w = 2i ( + j ) z ← (z + qN)/2 xi 2 xi 2 xi+ j 2 i=0 j=1 t−1 w(2i) = xi 2 xi . (8) where z is initially set to 0 and, at the end, it is equal to x × i=0 y × 2−sw mod N. This method is detailed in Algorithm 5.  The complexity of Algorithm 5 is evaluated step by step  = ( + t−i−1 wj ) The integer xi xi 2 j=1 xi+ j 2 can be deduced in Table 2. The cost of each step is expressed in terms of the    from x = 2x = (x − ,...,x )2w as complexity of a t-word addition or of a 1 × t multiplication t 1 0 which costs t word multiplications and t word additions with   x = (x ,...,x , |2x + | w , x ) w . carry. i t−1 i+2 i 1 2 i 2

Word-level Montgomery squaring. The Montgomery squar- With the formulation (8) the authors in [9] could derive a ing of a t-word integer x can be computed with the word-level word-level Montgomery squaring as shown in Algorithm 6. 123 J Cryptogr Eng (2017) 7:245–253 251

Algorithm 6 Word-level Montgomery squaring [9] Now, we deduce the cost of the following approaches for wt−1  Require: N < 2 the modulus, x, with x = (xt−1,...,x0)2w with an bit exponent for a modular exponentiation: w  −1 w 0 ≤ xi < 2 where w is the word size, N =−N mod 2 −w Ensure: z ≡ x2 × 2 t mod N and z < N •   ← + The square-and-multiplication exponentiation requires 1: x x x / 2: z ← 0 FMS and 2 FMM in average. 3: for i from 0 to (t − 1) do • The multiply-always exponentiation necessitates 3/2    ← ( ,..., , | | w , ) w 4: xi xt−1 xi+2 2xi+1 2 xi 2 FMM in average. wi 5: z ← z + xi · xi · 2 •   w The square-always exponentiation necessitates 2 FMS 6: q ←|z|2w · N mod 2 ← ( + · )/ w in average. 7: z z q N 2 • 8: end for The square-and-multiply-always and Montgomery-ladder 9: if z ≥ N then exponentiation require  FMS and  FMM. 10: z ← z − N • The Montgomery-ladder exponentiation with common 11: end if multiplicand [9]: this necessitates  word-level com- 12: return z bined Montgomery multiplications AB, AC, which have a reduced complexity by sharing some computations The complexity of Algorithm 6 is evaluated step by step involved in reductions modulo N (cf. [9] for details). in Table 3. Only the complexity evaluation of Step 5 needs to be detailed. We first notice that: The complexities of these approaches in terms of the num- ber of word additions and multiplications are given in Table 4. • xi × xi requires t −i word multiplications and t −i word additions. For the proposed regular exponentiation with half-size wi Montgomery multiplication (Algorithm 4), we need  FMS • z + 2 (xi xi ) requires t − i + 1 word additions. and  HMM in the  iterations of the loop body. For the com-  t−1 putation of the multiplicative splitting x with Algorithm 2, Weadd the contributions of all iterations and we get = (t− 0  i 0 2 t(t+1) t−1 the cost is, using Lemma 3, bounded above by 4t w word i) = word multiplications and = (2t − 2i + 1) = − 2 i 0 x 1 t(t + 1) + t = t2 + 2t word additions for t Step 5, as stated additions. The computation of 0 has also a cost bounded 2w in Table 3. above by 4t word additions since it is computed with the extended Euclidean algorithm. The resulting overall com- plexity of the proposed regular exponentiation is given in 5.2 Complexity comparison terms of the number of word additions and multiplications in Table 4. Now, we can deduce the cost of a FMM, a FMS and a HMM We notice that the fastest approach is the non-secure from the complexity of the word-level Montgomery multi- square-and-multiply exponentiation. We also notice that our plication and squaring. Specifically, the cost of a FMS with w approach has complexity really close to the one of the M = 2t is the same as the one shown in Table 3. To obtain w square-and-multiply: only the precomputation costs make it the complexity of FMM with M = 2t ,wetakes = t in the less efficient. Moreover, our approach is better by roughly formula of Table 2 and to get the complexity of a HMM with w/ 16 % than all regular approaches: the square-always and m = 2t 2 we take s = t/2 in the formula of Table 2.This multiply-always exponentiations and also the Montgomery- leads to the complexities shown in the upper part of Table 4. ladder and square-and-multiply always approaches.

Table 3 Step by Step complexity evaluation of a word-level Mont- 5.3 Implementation results gomery squaring (Algorithm 5) Operations # Word add. # Word mul. We implemented in C language the different approaches and compiled them on an Intel Core i7 Broadwell 5600U with + Step 1 x xt 0 gcc-4.8.4 and on a quad-core ARMv7 Cortex-A7 with gcc- | | w t Step 4 2xi+1 2 t 0 4.9.2. For modular multiplication and modular squaring, we + wi 2 + t(t+1) t Step 5 z 2 xi xi t 2t 2 implemented Algorithm 6 and Algorithm 5 using low-level  t Step 6 |z|2w · N 0 t functions of GMP library (cf. GMP 6.0.0, https://gmplib. t Step 7 q × Nt2 t2 org)for1×t multiplications and t-word additions. We could z + (qN) t(t + 1) 0 then implement all the exponentiation algorithms considered Step 10 z − Nt 0 in this paper. The multiplicative splitting of our approach 2 + 3t2 + 3t was implemented using the low-level function of GMP for Total 3t 6t 2 2 Euclidean division. The timings obtained for a number of 123 252 J Cryptogr Eng (2017) 7:245–253

Table 4 Complexity comparison Algorithm # Word add. # Word mul.

Multiplication and FMM 4t2 + 3t 2t2 + t squaring modulo N 2 + 3t2 + 3t FMS 3t 6t 2 2 2 + 2 + t HMM 2t 2tt2 ( 2 + 15t ) + 2 + ( 5t2 + 4t ) + 2 + Exponentiation mod Square-and-multiply 5t 2 8t 6t 2 2 4t 2t N with no side channel protection ( 2 + 9t ) + 2 + ( 2 + 3t ) + 2 + Non constant time Multiply-always 6t 2 8t 6t 3t 2 4t 2t regular exponentiation Square-always (6t2 + 12t) + 8t2 + 6t (3t2 + 3t) + 4t2 + 2t ( 2 + ) + 2 + ( 7t2 + 5t ) + 2 + Regular and constant Square-and-multiply-always 7t 9t 8t 6t 2 2 4t 2t time exponentiation ( 2 + ) + 2 + ( 7t2 + 5t ) + 2 + Montgomery-ladder 7t 9t 8t 6t 2 2 4t 2t Montgomery-ladder CM [9] (6t2 + 9t + 1) + 8t2 + 8t (3t2 + 4t + 3) + 4t2 + 4t + 2 ( 2 + ) + 2 + ( 5t2 + ) + w 2 + 2 + 5t Proposed (Algorithm 4) 5t 8t 10t 8t 2 2t 8 t 5t 2

Table 5 Timings in 103 clock-cycles of modular exponentiation Algorithm Timings on a Core i7 Timings on an ARMv7 2048 bits 3072 bits 4096 bits 2048 bits 3072 bits 4096 bits

Exponentiation Square-and-multiply 12,811 40,207 93,094 155,005 502,142 1,175,073 without side channel protection Non constant time Multiply-always [5] 13,896 45,407 106,177 175,946 575,859 1,373,075 regular exponentiations Square-always [5] 16,751 52,120 118,744 193,493 620,804 1,450,404 Regular and constant Montgomery-ladder [7] 17,669 56,449 130,436 214,077 702,270 1,633,957 time exponentiations Montgomery-ladder with CM [9] 15,478 48,963 113,133 183,707 598,020 1,398,734 Square-and-multiply-always [6] 17,619 56,137 130,043 214,249 697,046 1,634,550 Proposed (Algorithm 4) 13,616 42,139 96,547 158,805 509,769 1,188,119 practical bit lengths of N (i.e., 2048, 3072 and 4096) are not entirely secure against SPA, and more than 12 % com- reported in Table 5. We used Papi library [18] to get cycle pared to the other regular approaches. counts on both platforms. These timings are the average of 1000 timings obtained with random input messages x and 6 Conclusion random exponents K . We notice that the reported timings relate to the complex- We presented in this paper a new approach for regular ity results shown in Table 4. Indeed, the fastest approach modular exponentiation. We first introduced a multiplica- is the square-and-multiply exponentiation which is not pro- tive splitting of an integer x modulo N. We showed that tected against simple side channel analysis. Our approach is this splitting can be used to modify the square-and-multiply less than 6.3 slower than square-and-multiply for any key algorithm in order to have a regular sequence of squarings size. Our approach is better than all other approaches: by 1– always followed by a multiplication with a half-size integer. 13.4% compared to the multiply-always approach, which is We then modified this algorithm in order to perform modular 123 J Cryptogr Eng (2017) 7:245–253 253 multiplication with the Montgomery method. Compared to References the usual regular and constant time modular exponentiations, the proposed method involves only multiplications by half- 1. Rivest, R., Shamir, A., Adleman, L.: A method for obtaining digital size integers instead of full multiplications. This leads to a signatures and public-key cryptosystems. Commun. ACM 21, 120– 126 (1978) reduction of the complexity by 16% and an improvement of 2. Kocher, P.C., Jaffe, J., Jun, B.: Differential power analysis. In: the timing by 12% compared to other approach which are Wiener, M.J. (ed.): Advances in Cryptology–CRYPTO ’99, 19th both regular and constant time. Annual International Cryptology Conference, Santa Barbara, Cal- ifornia, USA, August 15–19, 1999, Proceedings, Lecture Notes in Computer Science, vol. 1666, pp. 388–397. Springer, Berlin (1999) Acknowledgements This work was supported by PAVOIS ANR 12 3. Mangard, S.: Exploiting Radiated Emissions - EM Attacks on BS02 002 02. Cryptographic ICs. In: Austrochip 2003, Linz, Austria, October 1st, pp. 13–16 (2003) 4. Amiel, F., Feix, B., Tunstall, M., Whelan, C., Marnane, W.: Distin- Appendix guishing Multiplications from Squaring Operations. In: SAC 2008, ser. LNCS, vol. 5381, pp. 346–360. Springer (2009) • 5. Clavier, C., Feix, B., Gagnerot, G., Roussellet, M., Verneuil, Proof of Lemma 1 Proof of (i). We prove by induction V.: Square Always Exponentiation. In: Progress in Cryptology - i−1 on i that (−1) ai ≥ 1 for all i ≥ 1. For i = 1wehave INDOCRYPT, 2011 ser. LNCS, vol. 7107, pp. 40–57. Springer i−1 a1 = 1 which implies (−1) ai = 1 as required. For (2011) 1 6. Coron, J.-S.: Resistance against differential power analysis for i = 2wehavea2 =−q1a1 which implies (−1) a2 = ≥ elliptic curve cryptosystems. In: Koç, Ç.K., Paar, C. (eds.): Cryp- q1a1 1. Now, we suppose that the inequality holds for tographic Hardware and Embedded Systems. First International- i − 1 and i , i.e., Workshop, CHES’99 Worcester, MA, USA, August 12–13, 1999, Proceedings, Lecture Notes in Computer Science, vol. 1717, pp. i−2 i−1 292–302. Springer, Berlin (1999) (−1) ai−1 ≥ 1 and (−1) ai ≥ 1, (9) 7. Joye, M., Yen, S.: The Montgomery Powering Ladder. In: CHES, + 20002 ser. LNCS, vol. 2523, pp. 291–302. Springer (2002) and we prove that the inequality is also true for i 1. We 8. Montgomery, P.: Modular multiplication without . i starts with (−1) ai+1 and replace ai+1 by its expression Math. Comput. 44, 519–521 (1985) in terms of ai , ai−1, ri and ri−1 in Algorithm 2. We obtain 9. Negre, C., Plantard, T., Robert, J.: Efficient Modular Exponentia- the following: tion Based on Multiple Multiplications by a Common Operand. In: 22nd IEEE Symposium on Computer Arithmetic 2015, pp. 144– 151 (2015) i i (−1) ai+1 = (−1) (ai−1 − ri−1/ri ai ) 10. Menezes, A., van Oorschot, P., Vanstone, S.: Handbook of Applied i i Cryptography. CRC Press, Boca Raton (1996) = (−1) ai− − ri− /ri (−1) ai 1 1 11. Yen, S.-M., Joye, M.: Checking before output may not be enough i−2 i−1 = (−1) ai−1 + ri−1/ri (−1) ai against fault-based cryptanalysis. IEEE Trans. Comput. 49(9), 967–970 (2000) ≥ 1 + r − /r (Using (9)) i 1 i 12. Yen, S.-M., Kim, S., Lim, S., Moon, S.-J.: A Countermeasure against One Physical Cryptanalysis May Benefit Another Attack. i Therefore, we have proven by induction that (−1) ai ≥ 1 In: ICISC, 2001 ser. LNCS, vol. 2288, pp. 414–427. Springer for all i. (2001) • Proof of (ii). We follow the proof of [16]: we express 13. Joye, M., Tunstall, M.: Exponent Recoding and Regular Exponen- the inductive expression of ai and ri as a 2 × 2matrix tiation Algorithms. In: Progress in Cryptology - AFRICACRYPT, product: 2009 ser. LNCS, vol. 5580, pp. 334–349. Springer (2009)      14. Bryant, E., Rambhia, A., Atallah, M. and Rice, J.: Software Trusted − / ai+1 ri+1 = ri−1 ri 1 ai ri . Platform Module and Application Security Wrapper,” Jan 2011, ai ri 10ai−1 ri−1 US Patent 7,870,399. [Online]. https://www.google.ch/patents/   US7870399 − r − /r 1 Now since for all i we have det i 1 i =−1,we 10 15. Gallant, R., Lambert, R., Vanstone, S.: Faster Point Multiplication obtain by induction that on Elliptic Curves with Efficient Endomorphisms. In: Advances     in Cryptology-CRYPTO, 2001 ser. LNCS, vol. 2139, pp. 190–200 a + r + a r Springer (2001) det i 1 i 1 = (−1)i det 1 1 ai ri a0 r0 16. von zur Gathen, J.: Modern Computer Algebra, 3rd edn. Cambridge   University Press, Cambridge (2013) 1 x = (−1)i det 17. Bosselaers, A., Govaerts, R. and Vandewalle, J.: “Comparison of 0 N Three Modular Reduction Functions,” in Advances in Cryptology- = (−1)i N. CRYPTO’93, ser. LNCS, vol. 773. Springer, pp. 175–186 (1993) 18. Papi, M.: “Performance Application Programming Interface (PAPI).” [Online]. Available: http://icl.cs.utk.edu/papi/ Finally we obtain that

i ∀i ≥ 0, ai+1ri − ai ri+1 = (−1) N.

123