
MATHEMATICS OF COMPUTATION, Volume 76, Number 257, January 2007, Pages 443–454. S 0025-5718(06)01849-7. Article electronically published on September 14, 2006.

MODULAR EXPONENTIATION VIA THE EXPLICIT CHINESE REMAINDER THEOREM

DANIEL J. BERNSTEIN AND JONATHAN P. SORENSON

Abstract. Fix pairwise coprime positive integers p1, p2, ..., ps. We propose representing integers u modulo m, where m is any positive integer up to roughly √(p1p2···ps), as vectors (u mod p1, u mod p2, ..., u mod ps). We use this representation to obtain a new result on the parallel complexity of modular exponentiation: there is an algorithm for the Common CRCW PRAM that, given positive integers x, e, and m in binary, of total bit length n, computes x^e mod m in time O(n/lg lg n) using n^{O(1)} processors. For comparison, a parallelization of the standard binary algorithm takes superlinear time; Adleman and Kompella gave an O((lg n)^3) expected time algorithm using exp(O(√(n lg n))) processors; von zur Gathen gave an NC algorithm for the highly special case that m is polynomially smooth.

1. Introduction

In this paper we consider the problem of computing x^e mod m for large integers x, e, and m. This is the bottleneck in Rabin's algorithm for testing primality, the Diffie-Hellman algorithm for exchanging cryptographic keys, and many other common algorithms. See, e.g., [18, Section 4.5.4].

The usual solution for computing x^e mod m is to compute small integers that are congruent modulo m to various powers of x. See, e.g., [18, Section 4.6.3], [11, Section 1.2], [16], [20], and [9]. For example, say e = 10. One can compute x1 = x mod m, then x2 = x1^2 mod m, then x4 = x2^2 mod m, then x5 = x1x4 mod m, and finally x10 = x5^2 mod m. It is often convenient to allow x1, x2, x4, x5 to be slightly larger than m.

The output x^e mod m and inputs x, e, m are conventionally written in binary. Standard practice is to also use the binary representation for x1, x1^2, x2, x2^2, etc. We instead use the residue representation: x1 is represented as (x1 mod p1, x1 mod p2, ..., x1 mod ps), x1^2 is represented as (x1^2 mod p1, x1^2 mod p2, ..., x1^2 mod ps), etc. Here p1, p2, ..., ps are small pairwise coprime positive integers, typically primes, such that P = p1p2···ps is sufficiently large. Note that, because pj is small, it is easy to compute (e.g.) x1x4 mod pj from x1 mod pj and x4 mod pj.
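To make the example concrete, here is a small Python sketch (ours, purely illustrative; the values of m, x, and the moduli are arbitrary choices). It carries out the e = 10 addition chain once on binary integers reduced modulo m and once componentwise on residue vectors; in the second computation, each step works modulo a single pj.

```python
# Illustrative sketch (ours, not from the paper): the e = 10 addition chain,
# once with ordinary arithmetic modulo m and once componentwise on residues.
m = 1000003
x = 123456
primes = [101, 103, 107, 109, 113]      # small pairwise coprime moduli p1,...,ps

# Binary representation: x1, x2, x4, x5, x10 as in the text.
x1 = x % m
x2 = x1 * x1 % m
x4 = x2 * x2 % m
x5 = x1 * x4 % m
x10 = x5 * x5 % m
assert x10 == pow(x, 10, m)

# Residue representation: the same chain, but every step works modulo one pj,
# so x1*x4 mod pj is computed from x1 mod pj and x4 mod pj alone.
r1 = [x % p for p in primes]
r2 = [a * a % p for a, p in zip(r1, primes)]
r4 = [a * a % p for a, p in zip(r2, primes)]
r5 = [a * b % p for a, b, p in zip(r1, r4, primes)]
r10 = [a * a % p for a, p in zip(r5, primes)]
assert all(r == pow(x, 10, p) for r, p in zip(r10, primes))
```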

Received by the editor August 18, 2003 and, in revised form, June 15, 2005.
2000 Mathematics Subject Classification. Primary 11Y16; Secondary 68W10.
This paper combines and improves the preliminary papers [7] by Bernstein and [29] by Sorenson. Bernstein was supported by the National Science Foundation under grants DMS–9600083 and DMS–9970409. Sorenson was supported by the National Science Foundation under grant CCR–9626877. Sorenson completed part of this work while on sabbatical at Purdue University in Fall 1998.

©2006 by the authors


[Diagram: two routes from the residue representations of x1 and x4 to the residue representation of x5. The easy first step multiplies componentwise to obtain the residue representation of x1x4. One route then applies the CRT to obtain the binary representation of something congruent to x1x4 modulo P, reduces modulo m to obtain the binary representation of x5, and reduces modulo each pj to obtain the residue representation of x5. The shortcuts use the explicit CRT modulo m (residues of x1x4 directly to the binary representation of x5) and the explicit CRT modulo m modulo pj (residues of x1x4 directly to residues of x5).]

The usual Chinese remainder theorem says that (for example) x1x4 mod P is determined by the residue representation of x1x4. In fact, any integer u is congruent modulo P to a particular linear combination of u mod p1, u mod p2, ..., u mod ps. The explicit Chinese remainder theorem says, in a computationally useful form, exactly what multiple of P must be subtracted from the linear combination to obtain u. See §2.

Once we know the binary representation of x1x4, we could reduce it modulo m to obtain the binary representation of x5. We actually use the explicit CRT modulo m to obtain the binary representation of x5 directly from the residue representation of x1x4. See §3.

Once we know the binary representation of x5, we reduce it modulo each pj to obtain the residue representation of x5. An alternative is to use the explicit CRT modulo m modulo pj to obtain the residue representation of x5 directly from the residue representation of x1x4. See §4.

If P is sufficiently large then, as discussed in §5, one can perform several multiplications in the residue representation before reduction modulo m. This is particularly beneficial for parallel computation: operations in the residue representation are more easily parallelized than reductions modulo m. By optimizing parameters, we obtain the following result:

Theorem 1.1. There is an algorithm for the Common CRCW PRAM that, given the binary representations of positive integers x, e, and m, of total bit length n, computes the binary representation of x^e mod m in time O(n/lg lg n) using n^{O(1)} processors.

In §6 we define the Common CRCW PRAM, in §8 we present the algorithm, and in §7 we present a simpler algorithm taking time O(n).

For comparison: A naive parallelization of the standard binary algorithm takes superlinear time; see §7. Adleman and Kompella gave an O((lg n)^3) expected time algorithm using exp(O(√(n lg n))) processors; see [2]. Von zur Gathen gave an NC algorithm for the highly special case that m is polynomially smooth; see [35]. It is unknown whether general integer modular exponentiation is in NC. For more on parallel exponentiation algorithms, see Gordon's survey [16].

Our techniques can easily be applied to exponentiation in finite rings more general than Z/m. For example, the deterministic polynomial-time primality test of Agrawal, Kayal, and Saxena in [4] can be carried out in sublinear time using a polynomial number of processors on the Common CRCW PRAM; the bottleneck is exponentiation in a ring of the form (Z/m)[x]/(x^k − 1).

2. The explicit Chinese remainder theorem

Algorithms for integer arithmetic using the residue representation were introduced in the 1950s by Svoboda, Valach, and Garner, according to [18, Section 4.3.2]. This section discusses conversion from the residue representation to the binary representation.

For each α such that α − 1/2 ∉ Z, define round α as the unique integer r with |r − α| < 1/2.

Theorem 2.1. Let p1, p2, ..., ps be pairwise coprime positive integers. Write P = p1p2···ps. Let q1, q2, ..., qs be integers with qi(P/pi) ≡ 1 (mod pi). Let u be an integer with |u| < P/2. Let t1, t2, ..., ts be integers with ti ≡ uqi (mod pi). Then u = Σ_i ti(P/pi) − rP, where r = round(Σ_i ti/pi).

Proof. For each j we have Σ_i ti(P/pi) ≡ tj(P/pj) ≡ uqj(P/pj) ≡ u (mod pj), so Σ_i ti(P/pi) − u is a multiple of P, say kP. Dividing by P gives Σ_i ti/pi − u/P = k; since |u/P| < 1/2, round(Σ_i ti/pi) = k. Thus u = Σ_i ti(P/pi) − rP. □
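For concreteness, the following Python sketch (ours, purely illustrative) applies Theorem 2.1 directly: it reconstructs u from its residues, computing the rounding exactly with rational arithmetic.

```python
# Sketch of Theorem 2.1 (ours): reconstruct u from its residues via
# u = sum_i t_i*(P/p_i) - r*P,  r = round(sum_i t_i/p_i).
from fractions import Fraction
from math import prod

def explicit_crt(residues, moduli):
    P = prod(moduli)
    total, alpha = 0, Fraction(0)
    for u_mod_p, p in zip(residues, moduli):
        Pi = P // p
        qi = pow(Pi, -1, p)              # q_i with q_i*(P/p_i) == 1 (mod p_i)
        ti = u_mod_p * qi % p            # t_i == u*q_i (mod p_i), 0 <= t_i < p_i
        total += ti * Pi
        alpha += Fraction(ti, p)
    return total - round(alpha) * P      # exact round of sum_i t_i/p_i

moduli = [101, 103, 107, 109, 113]       # P is about 1.4e10
u = -1234567                             # any u with 2|u| < P
assert explicit_crt([u % p for p in moduli], moduli) == u
```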

Theorem 2.2. Let β1, β2, ..., βs be real numbers. Let r and a be integers. If |r − Σ_i βi| < 1/4 and 2^a ≥ 2s, then r = ⌊3/4 + 2^{−a} Σ_i ⌊2^a βi⌋⌋.

Proof. r < 1/4 + Σ_i βi = 1/4 + 2^{−a} Σ_i 2^a βi ≤ 1/4 + 2^{−a}(s + Σ_i ⌊2^a βi⌋) ≤ 1/4 + 2^{−a}(2^a/2 + Σ_i ⌊2^a βi⌋) = 3/4 + 2^{−a} Σ_i ⌊2^a βi⌋ ≤ 3/4 + Σ_i βi < r + 1. Hence r = ⌊3/4 + 2^{−a} Σ_i ⌊2^a βi⌋⌋. □


In the situation of Theorem 2.1, assume further that 4|u| < P. Then |r − Σ_i ti/pi| = |u|/P < 1/4, so Theorem 2.2, applied with βi = ti/pi, says that r = ⌊3/4 + 2^{−a} Σ_i ⌊2^a ti/pi⌋⌋ for any integer a with 2^a ≥ 2s. In other words, r can be computed from a-bit fixed-point approximations to the fractions ti/pi; exact divisions are not needed.
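In code (ours, illustrative), the rounding can be done with a-bit truncations ⌊2^a ti/pi⌋ exactly as in Theorem 2.2; the parameters below satisfy 4|u| < P and 2^a ≥ 2s.

```python
# Sketch (ours) of Theorem 2.2 with beta_i = t_i/p_i: compute
# r = floor(3/4 + 2**(-a) * sum_i floor(2**a * t_i/p_i)).
from math import prod

def ecrt_round(ts, moduli, a):
    assert 2**a >= 2 * len(moduli) and a >= 2
    approx = sum((ti << a) // p for ti, p in zip(ts, moduli))   # floor(2^a t_i/p_i), summed
    return (3 * 2**a // 4 + approx) >> a                        # floor(3/4 + 2^-a * approx)

moduli = [101, 103, 107, 109, 113]
P = prod(moduli)
u = 987654321                            # 4|u| < P, so the rounding hypothesis holds
ts = [u * pow(P // p, -1, p) % p for p in moduli]
r = ecrt_round(ts, moduli, a=4)          # 2**4 >= 2*5
assert sum(t * (P // p) for t, p in zip(ts, moduli)) - r * P == u
```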

3. The explicit CRT mod m

Theorem 3.1. Let p1, p2, ..., ps be pairwise coprime positive integers. Write P = p1p2···ps. Let q1, q2, ..., qs be integers with qi(P/pi) ≡ 1 (mod pi). Let u be an integer with |u| < P/2. Let m be a positive integer. Define ti = uqi mod pi and r = round(Σ_i ti/pi). Then v ≡ u (mod m), where v = Σ_i ti(P/pi mod m) − (P mod m)r; furthermore v lies between −m Σ_i pi and m Σ_i pi.

Proof. By Theorem 2.1, u = Σ_i ti(P/pi) − rP; reducing each coefficient modulo m shows that v ≡ u (mod m). Each ti satisfies 0 ≤ ti < pi and each P/pi mod m satisfies 0 ≤ P/pi mod m < m, so 0 ≤ Σ_i ti(P/pi mod m) < m Σ_i pi; also 0 ≤ r ≤ s and 0 ≤ P mod m < m, so 0 ≤ (P mod m)r ≤ ms ≤ m Σ_i pi. The bounds on v follow. □

The hypotheses on p, P, q, u are the same as in Theorem 2.1. The hypothesis on t is more restrictive: Theorem 2.1 allowed any integer ti ≡ uqi (mod pi), whereas Theorem 3.1 insists that 0 ≤ ti < pi, i.e., that ti = uqi mod pi.

Theorem 3.1 suggests the following representation of integers modulo m. Select moduli p1, ..., ps whose product P = p1···ps exceeds 4(m Σ_i pi)^2. Use the vector (x mod p1, ..., x mod ps), where x is any integer between −m Σ_i pi and m Σ_i pi, to represent x mod m. Note that each element of Z/m has many representations. The following procedure, given two such vectors (x mod p1, ..., x mod ps) and (y mod p1, ..., y mod ps), computes another such vector (v mod p1, ..., v mod ps) with v ≡ xy (mod m):

(1) Precompute qi, P/pi mod m, and P mod m.
(2) Compute ti = ((x mod pi)(y mod pi)qi) mod pi, so that ti = (xy)qi mod pi.
(3) Compute round(Σ_i ti/pi) by Theorem 2.2.
(4) Compute v = Σ_i ti(P/pi mod m) − (P mod m) round(Σ_i ti/pi). Then v is between −m Σ_i pi and m Σ_i pi, and v ≡ xy (mod m), by Theorem 3.1.
(5) Compute the residues (v mod p1, ..., v mod ps).
The output can then be used in subsequent multiplications; a code sketch of the whole procedure appears below. One can carry out more componentwise operations before applying Theorem 3.1 if P is chosen larger; see §5 for further discussion.

Montgomery and Silverman in [23, Section 4] suggested computing u mod m by first computing v. (Another way to compute u mod m, well suited for FFT-based arithmetic, is to perform the computation of [8, Section 13] modulo m.) The idea of using v for subsequent operations, and not bothering to compute u mod m, was introduced in [7].
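Here is a sequential Python model of the five-step procedure (ours; the function and variable names are illustrative, and exact rational rounding stands in for the fixed-point rounding of Theorem 2.2 used in the parallel algorithm).

```python
# Serial model (ours) of the Section 3 procedure: from residue vectors of x and
# y, produce the residue vector of some v with v == x*y (mod m) and with v
# between -m*sum(p_i) and m*sum(p_i).  Requires P > 4*(m*sum(p_i))**2.
from fractions import Fraction
from math import prod

def ecrt_mulmod(xres, yres, moduli, m):
    P = prod(moduli)
    assert P > 4 * (m * sum(moduli)) ** 2
    q = [pow(P // p, -1, p) for p in moduli]                 # step 1: q_i
    c = [(P // p) % m for p in moduli]                       # step 1: P/p_i mod m
    t = [x * y % p * qi % p for x, y, qi, p in zip(xres, yres, q, moduli)]  # step 2
    r = round(sum(Fraction(ti, p) for ti, p in zip(t, moduli)))             # step 3
    v = sum(ti * ci for ti, ci in zip(t, c)) - (P % m) * r                  # step 4
    assert abs(v) < m * sum(moduli)
    return v, [v % p for p in moduli]                        # step 5: residues of v

m = 2**31 - 1
moduli = [p for p in range(1000, 1100) if all(p % d for d in range(2, 34))]
x, y = 123456789, 987654321
v, vres = ecrt_mulmod([x % p for p in moduli], [y % p for p in moduli], moduli, m)
assert v % m == x * y % m      # v represents xy mod m; in general v != xy mod m
```

In the parallel algorithm the step-1 quantities are precomputed once, step 3 uses the fixed-point rounding of Theorem 2.2, and v itself need never be formed; §4 shows how to pass from residues directly to residues.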


4. The explicit CRT mod m mod pj

Theorem 4.1. Under the assumptions of Theorem 3.1,

v ≡ Σ_i ti(P/pi mod m mod pj) − (P mod m mod pj) round(Σ_i ti/pi)  (mod pj).

Proof. Reduce the definition of v modulo pj. 

In §3 we discussed precomputing P/pi mod m, precomputing P mod m, computing v, and reducing v modulo pj. Theorem 4.1 suggests a different approach: precompute P/pi mod m mod pj; precompute P mod m mod pj; compute v mod pj as (Σ_i ti(P/pi mod m mod pj) − (P mod m mod pj) round(Σ_i ti/pi)) mod pj. This idea was introduced in [7].

Theorem 4.1 is particularly well suited to parallel computation, as briefly observed in [7]. By using additional ideas described in the remaining sections of this paper, one can exponentiate on a polynomial-size parallel computer in sublinear time, as shown in [29]. This paper includes all the results from [7] and [29].
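A sequential Python model of this residues-to-residues multiplication (ours, illustrative; the fixed-point rounding follows Theorem 2.2, and the final check converts back only to verify the answer) looks as follows.

```python
# Serial model (ours) of Theorem 4.1: compute the residues of v directly from
# the residues of x and y, without ever forming v or any other long integer.
from fractions import Fraction
from math import prod

def ecrt_mulmod_residues(xres, yres, moduli, m):
    P = prod(moduli)                                     # precomputation, shown inline
    assert P > 4 * (m * sum(moduli)) ** 2
    a = max(2, (2 * len(moduli)).bit_length())           # 2**a >= 2s, for Theorem 2.2
    q = [pow(P // p, -1, p) for p in moduli]
    c = [[(P // p) % m % pj for pj in moduli] for p in moduli]   # P/p_i mod m mod p_j
    Pm = [P % m % pj for pj in moduli]                           # P mod m mod p_j
    t = [x * y % p * qi % p for x, y, qi, p in zip(xres, yres, q, moduli)]
    r = (3 * 2**a // 4 + sum((ti << a) // p for ti, p in zip(t, moduli))) >> a
    return [(sum(ti * c[i][j] for i, ti in enumerate(t)) - Pm[j] * r) % pj
            for j, pj in enumerate(moduli)]

m = 2**31 - 1
moduli = [p for p in range(1000, 1100) if all(p % d for d in range(2, 34))]
x, y = 123456789, 987654321
vres = ecrt_mulmod_residues([x % p for p in moduli], [y % p for p in moduli], moduli, m)

# Verify by converting back with Theorem 2.1 (v is small enough to reconstruct).
P = prod(moduli)
t = [vj * pow(P // pj, -1, pj) % pj for vj, pj in zip(vres, moduli)]
r = round(sum(Fraction(ti, pj) for ti, pj in zip(t, moduli)))
assert (sum(ti * (P // pj) for ti, pj in zip(t, moduli)) - r * P) % m == x * y % m
```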

5. Higher powers

We begin by reviewing the left-to-right base-2 algorithm for computing x^e mod m:
  Let l denote the number of bits in e;
  Write e = Σ_{k=0}^{l−1} ek 2^k, where ek ∈ {0, 1};
  y ← 1;
  For (k ← l − 1; k ≥ 0; k ← k − 1) do: y ← y^2 x^{ek} mod m;
  Output(y);
More generally, let b be a power of 2. The base-b algorithm is as follows:
  Let l denote the number of base-b digits in e;
  Write e = Σ_{k=0}^{l−1} ek b^k, where 0 ≤ ek < b;
  y ← 1;
  For (k ← l − 1; k ≥ 0; k ← k − 1) do: y ← y^b x^{ek} mod m;
  Output(y);
Each iteration of the base-b algorithm multiplies as many as 2b − 1 factors, namely b copies of y and ek ≤ b − 1 copies of x, before the reduction modulo m. In the residue representation of §3 these multiplications are all componentwise, and one application of Theorem 3.1 (or Theorem 4.1) per iteration suffices if P is chosen large enough, as in §8; a direct transcription of the base-b algorithm appears below.
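The base-b algorithm transcribes directly into Python (our sketch; b is an arbitrary power of 2 here) and reduces to the base-2 algorithm when b = 2.

```python
# Direct transcription (ours) of the left-to-right base-b exponentiation algorithm.
def powmod_base_b(x, e, m, b=16):
    digits = []                          # base-b digits of e, least significant first
    while e:
        digits.append(e % b)
        e //= b
    y = 1
    for ek in reversed(digits):          # k = l-1 down to 0
        y = pow(y, b, m) * pow(x, ek, m) % m       # y <- y^b * x^(e_k) mod m
    return y

M = 2**61 - 1
assert powmod_base_b(123456789, 987654321, M) == pow(123456789, 987654321, M)
```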


6. Definition of the CREW PRAM and the Common CRCW PRAM

We use two models of parallel computation in this paper. This section defines the models. In both models, computers are “parallel random-access machines” (PRAMs) in which many processors can access locations in a shared memory. The models differ in how they handle memory conflicts: the “Common CRCW PRAM” allows many processors to simultaneously write a common value to a single memory location, while the “CREW PRAM” does not. Our exponentiation algorithm in §8 relies on the extra power of the Common CRCW PRAM; the simplified algorithm in §7 works in either model.

We caution the reader that phrases such as “CREW PRAM” are not sufficient to pin down a model of computation in the literature. For example, [25, Sections 2.6 and 15.2] defines unrealistic PRAMs in which the processor word size is permitted to grow linearly as a computation progresses; at the other extreme, [14] presents computations that fit into logarithmic-size index registers and constant-size data registers. The complexity of a computation can depend heavily on details of the model. See [15] for a comparison of several models.

Computers. A parallel computer is parametrized by positive integers p, s, w, r, i such that p ≤ 2^w and s ≤ 2^w. The computer has p processors, numbered from 0 through p − 1, operating on s words of shared memory, labelled mem[0] through mem[s − 1]. Each word is a w-bit string, often interpreted as the binary representation of an integer between 0 and 2^w − 1. Each processor has r registers, labelled reg[0] through reg[r − 1]; each register contains one word. The computer also has space for i instructions; each processor has an instruction pointer.

Algorithms. A parallel algorithm is a sequence of instructions. (The algorithm fails on a computer of size p, s, w, r, i if it has more than i instructions.) Here are the possible instructions:
(1) Clear j: Set reg[j] ← 0. (The algorithm fails if j ≥ r.)
(2) Increment j, k: Set reg[j] ← (reg[k] + 1) mod 2^w.
(3) Add j, k, ℓ: Set reg[j] ← (reg[k] + reg[ℓ]) mod 2^w.
(4) Subtract j, k, ℓ: Set reg[j] ← (reg[k] − reg[ℓ]) mod 2^w.
(5) Shift left j, k, ℓ: Set reg[j] ← (reg[k] 2^{reg[ℓ]}) mod 2^w.
(6) Shift right j, k, ℓ: Set reg[j] ← ⌊reg[k]/2^{reg[ℓ]}⌋.
(7) Word size j: Set reg[j] ← w.
(8) Identify j: Set reg[j] ← this processor's number.
(9) Read j, k: Set reg[j] ← mem[reg[k]]. (The algorithm fails if reg[k] ≥ s.)
(10) Write j, k: Set mem[reg[k]] ← reg[j].
(11) Jump j, k, ℓ: If reg[k] ≥ reg[ℓ], go to instruction j, rather than proceeding to the next instruction as usual.
An n-bit input to the algorithm is placed into memory at time 0, with n in the first word of memory, and the n bits packed into the next ⌈n/w⌉ words of memory. (The algorithm fails if 2^w ≤ n, or if s < 1 + ⌈n/w⌉.) All other words of memory, registers, etc. are set to 0. Each processor performs one instruction at time 1; each processor performs one instruction at time 2; and so on until all the processors have halted. The output is then in memory, encoded in the same way as the input.
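As a small illustration (ours; the bit order inside a word is our own choice, not specified above), the input convention packs an n-bit string into 1 + ⌈n/w⌉ words:

```python
# Illustration (ours) of the input layout: word 0 holds n, and the n bits are
# packed w per word into the next ceil(n/w) words of shared memory.
def pack_input(bits, w):
    n = len(bits)
    assert n < 2**w                      # the algorithm fails if 2**w <= n
    words = [n]
    for i in range(0, n, w):
        chunk = ''.join(str(b) for b in bits[i:i + w]).ljust(w, '0')
        words.append(int(chunk, 2))      # most-significant-bit-first within a word
    return words

print(pack_input([1, 0, 1, 1, 0, 1, 1], w=4))    # -> [7, 11, 6]
```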


Memory conflicts. During one time step, a single memory location might be accessed by more than one processor. The concurrent-read exclusive-write PRAM, or CREW PRAM, allows any number of processors to simultaneously read the same memory location, but it does not allow two processors to simultaneously write to a single memory location. The concurrent-read common-concurrent-write PRAM, or Common CRCW PRAM, allows any number of processors to simultaneously read the same memory location, and allows any number of processors to simultaneously write the same memory location, if they all write the same value. (If two processors attempt to write different values, the algorithm fails.) A memory location cannot be read and written simultaneously.

Asymptotics. When we say that a parallel algorithm handles an n-bit input using (for example) time O(n) on O(n^3) processors with a word size of O(lg n) using O(n^4) memory locations, we mean that it works on any sufficiently large computer: there are functions p(n) ∈ O(n^3), s(n) ∈ O(n^4), w(n) ∈ O(lg n), and t(n) ∈ O(n) such that, for every n, every p ≥ p(n), every s ≥ s(n), every w ≥ w(n), every r larger than the highest register number used in the algorithm, every i larger than the length of the algorithm, and every input string of length n, the algorithm runs without failure on that input in time at most t(n) on a parallel computer of size p, s, w, r, i.

When we do not mention the word size (and number of memory locations), we always mean that the required word size is logarithmic in the time-processor product (and, consequently, the required number of memory locations is polynomial in the time-processor product).

7. Modular exponentiation on the CREW PRAM

In this section we present three successively better parallel algorithms, using a polynomial number of processors, for modular exponentiation on the CREW PRAM.

Time O(n(lg n)^2). Given the binary representations of n-bit integers x and y, one can compute xy in time O(lg n) using n^{O(1)} processors. One can compute ⌊x/y⌋ and x mod y by Newton's method in time O((lg n)^2) using n^{O(1)} processors. See [32, Theorem 12.1].

The base-2 exponentiation algorithm shown in §5, using these subroutines, computes x^e mod m in time O(n(lg n)^2) if x, e, m have n bits. There are O(n) iterations in the algorithm, each iteration involving a constant number of multiplications and divisions of O(n)-bit integers. The number of processors is polynomial in n.

Time O(n lg n). A division algorithm of Beame, Cook, and Hoover takes time only O(lg n) using n^{O(1)} processors, after a precomputation taking time O((lg n)^2). See [5].

The main subroutine in the Beame-Cook-Hoover algorithm computes powers x^v, where x has n bits and v ∈ {0, 1, ..., n}, in time O(lg n). The idea is to use the CRT to recover x^v from the remainders x^v mod p for enough small primes p. Beame, Cook, and Hoover precompute the primes p and the powers u^v mod p for all u ∈ {0, 1, ..., p − 1} and all v ∈ {0, 1, ..., n}; then they can compute x^v mod p as (x mod p)^v mod p. This is one of the ideas that we use in §8.
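The table idea is easy to mimic serially in Python (our sketch, ignoring the parallel division and CRT machinery that makes the real subroutine run in time O(lg n)):

```python
# Sketch (ours) of the precomputed power tables: x**v mod p is a single lookup
# at (x mod p, v), and x**v itself is then recovered from such residues by CRT.
def build_tables(primes, n):
    return {p: [[pow(u, v, p) for v in range(n + 1)] for u in range(p)] for p in primes}

primes = [101, 103, 107, 109, 113]
tables = build_tables(primes, n=10)

x, v = 1234, 3
residues = [tables[p][x % p][v] for p in primes]       # table lookups only
assert all(r == pow(x, v, p) for r, p in zip(residues, primes))
# Here x**v = 1234**3 < 101*103*107*109*113, so these residues determine x**v
# (e.g., via Theorem 2.1).
```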


Time O(n). To save another lg n factor, we choose an integer b ≥ 2 so that lg b ∈ Θ(lg n), and we use the base-b exponentiation algorithm. The number of iterations drops to O(log_b e) = O(n/lg n). Each iteration involves computing y^b x^v mod m for some v ∈ {0, 1, ..., b − 1}; we compute y^b and x^v in time O(lg n) by the Beame-Cook-Hoover subroutine, then multiply, then use the Beame-Cook-Hoover algorithm again to divide by m.

8. Modular exponentiation on the Common CRCW PRAM

In this section, we present our sublinear-time algorithm, which uses a polynomial number of processors, for modular exponentiation on the Common CRCW PRAM. We begin by informally explaining how we save an additional factor of lg lg n from the running time of our linear-time algorithm from §7. Next we review some FFT-based subroutines that we use in the algorithm. We then present our algorithm along with a proof of its complexity, followed by some discussion.

How we save a lg lg n factor. The algorithm in this section, like the simplified linear-time algorithm of §7, performs several multiplications before each reduction modulo m. It achieves better parallelization than the algorithm of §7 as follows:
(1) It avoids the binary representation in the main loop. It uses the explicit Chinese remainder theorem modulo m modulo pj, as described in §4, to work consistently in the residue representation.
(2) It uses the Cole-Vishkin parallel-prefix algorithm to add O(n) integers, each having O(lg n) bits, in time O((lg n)/lg lg n) using n^{1+o(1)} processors. See [12] and [34]. This is where we are making use of the increased power of the CRCW PRAM over the CREW PRAM. Computing such sums is the bottleneck in using the explicit Chinese remainder theorem modulo m modulo pj.
Perhaps the same lg lg n speedup can be obtained with other redundant representations of integers modulo m; we leave this exploration to the reader.

FFT-based subroutines. We take advantage of several standard FFT-based tools here:
(1) Given n bits representing two integers x, y (in binary), one can use the Schönhage-Strassen algorithm to compute the product xy in time (lg n)^{O(1)} using n^{1+o(1)} processors. See [28].
(2) Given n bits representing two integers x, y with y ≠ 0, one can compute the quotient ⌊x/y⌋ in time (lg n)^{O(1)} using n^{1+o(1)} processors. See [32, Theorem 12.1].
(3) Given n bits representing s integers, one can compute the product of all the integers in time (lg n)^{O(1)} using n^{1+o(1)} processors, by multiplying pairs in parallel.
(4) Given n bits representing integers u, p1, p2, ..., ps, one can use the Borodin-Moenck remainder-tree algorithm to compute u mod p1, ..., u mod ps in time (lg n)^{O(1)} using n^{1+o(1)} processors (a serial sketch appears after this list). See [21], [10, Sections 4–6], and [8, Section 18].
Without the FFT-based tools, Step 4 below requires roughly n^3 processors to be carried out in polylogarithmic time. This is (after some serialization) adequate for Theorem 8.1, but it becomes a bottleneck when e is much shorter than m.
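Item (4) can be modeled serially in a few lines (our sketch; the parallel version evaluates each level of the tree with fast multiplication and division):

```python
# Serial sketch (ours) of a remainder tree: build the product tree of the
# moduli, then push u down the tree, reducing at each node.
def product_tree(moduli):
    tree = [list(moduli)]
    while len(tree[-1]) > 1:
        layer = tree[-1]
        tree.append([layer[i] * layer[i + 1] if i + 1 < len(layer) else layer[i]
                     for i in range(0, len(layer), 2)])
    return tree

def remainder_tree(u, moduli):
    tree = product_tree(moduli)
    rems = [u % tree[-1][0]]                         # u mod (p_1*...*p_s)
    for layer in reversed(tree[:-1]):
        rems = [rems[i // 2] % node for i, node in enumerate(layer)]
    return rems                                      # rems[i] == u mod moduli[i]

moduli = [p for p in range(1000, 1100) if all(p % d for d in range(2, 34))]
u = 12345678901234567890
assert remainder_tree(u, moduli) == [u % p for p in moduli]
```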


The algorithm. Fix ε > 0. Some of the O constants below depend on ε; we indicate when this happens by adding ε as a subscript.

Theorem 8.1. There is an algorithm for the Common CRCW PRAM that, given the binary representations of positive integers x, e, and m, of total bit length n, computes the binary representation of x^e mod m in time O(n/lg lg n) using O(n^{2+ε}) processors.

Outside the inner loop (Step 6 below), the algorithm takes time only (lg n)^{O(1)} using O(n^{2+ε}) processors.

Step 1: Build multiplication tables. Fix a positive rational number δ ≤ ε/3. Compute an integer a ≥ 1 within 1 of δ lg n. Note that an a-bit integer fits into O(1) words, since a ∈ O(lg n).

Compute xy and (for y ≠ 0) ⌊x/y⌋ for each pair x, y of a-bit integers in parallel. Store the results in a table of O(2^{2a}) words. This takes time O(a) using 2^{2a} processors; i.e., time O(lg n) using O(n^{2δ}) processors. Note that, if x and y are O(lg n)-bit integers, then x and y are also O(a)-bit integers, so one processor can compute x + y, x − y, xy, and ⌊x/y⌋ in time O(1) with the help of this table.

Step 2: Find primes. Define b = 2^a. Note that b ∈ Θ(n^δ).

Find the smallest integer k ≥ 6 such that 2^k ≥ 4 + 4b(n + 2k). Use a parallel sieve, as described in [30], to find the primes p1, p2, ..., ps between 2 and 2^k. This takes time O((1 + δ) lg n) using O(n^{1+δ}) processors; note that s ≤ ps ∈ O(n^{1+δ}). (A serial sketch of this parameter selection appears after Step 3.)

Define P = p1p2···ps. Note that lg P ∈ O(n^{1+δ}) by, e.g., [27, Theorem 9]. Note also that P ≥ 4(m Σ_i pi)^{2b}. (Indeed, 2^k ≥ 41, so log P = log p1 + log p2 + ··· + log ps ≥ 2^k(1 − 1/log 2^k) ≥ 2^{k−1} by [27, Theorem 4]; also p1 + p2 + ··· + ps ≤ 2^{2k}. Thus lg P ≥ log P ≥ 2^{k−1} ≥ 2 + 2b(n + 2k) ≥ lg 4 + 2b(lg m + lg Σ_i pi).)

Multiply p1, ..., ps to compute P. This takes time (lg lg P)^{O(1)} using (lg P)^{1+o(1)} processors; i.e., time (lg n)^{O(1)} using n^{1+δ+o(1)} processors.

Step 3: Build power tables. For each i ∈ {1, 2, ..., s} in parallel, for each u ∈ {0, 1, ..., pi − 1} in parallel, for each v ∈ {0, 1, ..., b} in parallel (or serial), compute u^v mod pi. Store the results in a table. This takes time O(lg b) using O(s ps(b + 1)) processors; i.e., time O(lg n) using O(n^{2+2δ} lg n) processors.

(An alternative approach, with smaller tables, is to find a generator gi for the multiplicative group (Z/pi)^*, build a table of powers of gi, and build a table of discrete logarithms base gi. See [30].)
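The parameter selection in Step 2 can be mimicked serially as follows (our sketch: it simply searches for the smallest k that makes P large enough for Step 6, using an ordinary sieve in place of the parallel sieve of [30]; the sizes of m and b are hypothetical).

```python
# Serial sketch (ours) of Step 2's parameter choice: take the primes up to 2**k
# for the least k >= 6 with P = p_1*...*p_s >= 4*(m*sum(p_i))**(2*b).
from math import prod

def primes_up_to(N):
    sieve = bytearray([1]) * (N + 1)
    sieve[:2] = b'\x00\x00'
    for i in range(2, int(N**0.5) + 1):
        if sieve[i]:
            sieve[i * i::i] = bytearray(len(sieve[i * i::i]))
    return [i for i in range(2, N + 1) if sieve[i]]

def choose_moduli(m, b, k=6):
    while True:
        ps = primes_up_to(2**k)
        if prod(ps) >= 4 * (m * sum(ps)) ** (2 * b):
            return k, ps
        k += 1

m, b = 2**64 + 13, 8                      # hypothetical sizes, for illustration only
k, ps = choose_moduli(m, b)
print(k, len(ps))                         # the chosen k and the number of moduli s
```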

Step 4: Compute ECRT coefficients. For each i in parallel, compute Pi = P/pi; Pi mod pi; Pi mod m; and Pi mod m mod p1, Pi mod m mod p2, ..., Pi mod m mod ps. This takes time (lg lg P)^{O(1)} using s(lg P)^{1+o(1)} processors; i.e., time (lg n)^{O(1)} using n^{2+2δ+o(1)} processors.

Next, for each i in parallel, compute qi = (Pi mod pi)^{pi−2} mod pi, so that qi is the inverse of Pi modulo pi. This takes time O(lg ps) using s processors; i.e., time O(lg n) using O(n^{1+δ}) processors.

Step 5: Convert to the residue representation. Set x ← x mod m. This takes time (lg n)^{O(1)} using n^{1+o(1)} processors.

Set x1 ← x mod p1, x2 ← x mod p2, ..., xs ← x mod ps. This takes time (lg n)^{O(1)} using n^{1+δ+o(1)} processors. Also set y1 ← 1, y2 ← 1, ..., ys ← 1.

Let l be the number of base-b digits in e. Write e = Σ_{k=0}^{l−1} ek b^k with 0 ≤ ek < b.

Step 6: Inner loop. Repeat the following substeps for k ← l − 1, k ← l − 2, ..., k ← 0. Each substep will take time O((lg n)/lg lg n), so the total time here is O(n/lg lg n).

Invariant: The integer y represented by (y1, y2, ..., ys) is between −m Σ_i pi and m Σ_i pi, and is congruent to x^{e_{l−1}b^{l−k−2}+···+e_{k+2}b+e_{k+1}} modulo m. The plan is to compute the residue representation of u = y^b x^{ek}, and then change y to the integer v identified in Theorem 3.1. Note that |u| < (m Σ_i pi)^{2b} ≤ P/4; hence v ≡ u ≡ x^{e_{l−1}b^{l−k−1}+···+e_{k+2}b^2+e_{k+1}b+e_k} (mod m).

For each i in parallel, compute ti ← ((yi^b mod pi)(xi^{ek} mod pi)qi) mod pi. This takes time O(1) using s processors, thanks to the precomputed power table.

Compute r ← round(Σ_i ti/pi) with the help of Theorem 2.2. The fixed-point divisions take time O(1) using s processors. The addition takes time O((lg s)/lg lg s) using s^{1+o(1)} processors using the Cole-Vishkin addition algorithm.

For each j in parallel, compute yj ← Σ_i ti(Pi mod m mod pj). This takes time O((lg s)/lg lg s) using s^{2+o(1)} processors, again using the Cole-Vishkin algorithm; i.e., time O((lg n)/lg lg n) using n^{2+2δ+o(1)} processors.

For each j in parallel, compute yj ← (yj − r(P mod m mod pj)) mod pj. This takes time O(1) using s processors.
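One iteration of Step 6 can be modeled serially in Python as follows (our sketch: the precomputed tables of Steps 1-4 are built inline, table lookups become pow calls, and ordinary sums replace the Cole-Vishkin summations; all parameter sizes are hypothetical). The driver runs the whole loop and checks the final answer against Python's built-in modular exponentiation.

```python
# Serial model (ours) of the Step 6 update: replace the residue vector of y by
# the residue vector of the integer v of Theorem 3.1 with v == y^b * x^(e_k) (mod m).
from math import prod

def step6(yres, xres, ek, b, moduli, q, c, Pm, a):
    # t_i = ((y_i^b mod p_i)(x_i^(e_k) mod p_i) q_i) mod p_i  (table lookups in the paper)
    t = [pow(y, b, p) * pow(x, ek, p) % p * qi % p
         for y, x, qi, p in zip(yres, xres, q, moduli)]
    # r = round(sum_i t_i/p_i), by the fixed-point formula of Theorem 2.2
    r = (3 * 2**a // 4 + sum((ti << a) // p for ti, p in zip(t, moduli))) >> a
    # y_j <- (sum_i t_i*(P_i mod m mod p_j) - r*(P mod m mod p_j)) mod p_j  (Theorem 4.1)
    return [(sum(ti * c[i][j] for i, ti in enumerate(t)) - r * Pm[j]) % pj
            for j, pj in enumerate(moduli)]

# Hypothetical small parameters, for illustration only.
moduli = [p for p in range(1000, 1200) if all(p % d for d in range(2, 35))]
m, b, x, e = 65537, 4, 31415926, 27182
P = prod(moduli)
assert P >= 4 * (m * sum(moduli)) ** (2 * b)           # the Step 2 requirement
a = max(2, (2 * len(moduli)).bit_length())             # 2**a >= 2s
q = [pow(P // p, p - 2, p) for p in moduli]            # Step 4: q_i by Fermat
c = [[(P // p) % m % pj for pj in moduli] for p in moduli]     # P_i mod m mod p_j
Pm = [P % m % pj for pj in moduli]                     # P mod m mod p_j
xres = [x % m % p for p in moduli]                     # Step 5
yres = [1] * len(moduli)

digits = []
while e:
    digits.append(e % b)
    e //= b
for ek in reversed(digits):                            # Step 6, most significant digit first
    yres = step6(yres, xres, ek, b, moduli, q, c, Pm, a)

# Step 7: convert back with the explicit CRT and reduce modulo m.
t = [yj * qj % pj for yj, qj, pj in zip(yres, q, moduli)]
r = (3 * 2**a // 4 + sum((ti << a) // p for ti, p in zip(t, moduli))) >> a
y = sum(ti * (P // p) for ti, p in zip(t, moduli)) - r * P
assert y % m == pow(31415926, 27182, 65537)
```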

Step 7: Convert to the binary representation. Use Theorem 2.1 to compute the binary representation of the integer y whose residue representation is (y1, y2, ..., ys). This takes time (lg n)^{O(1)} using n^{2+2δ+o(1)} processors.

Now x^e = x^{e_{l−1}b^{l−1}+···+e_1b+e_0} ≡ y (mod m). The output of the algorithm is y mod m.

Discussion. The algorithm can be converted into a polynomial-size unbounded-fan-in circuit of depth O(n/lg lg n), using the techniques explained in [31]. The last algorithm discussed in §7 can be converted into a polynomial-size bounded-fan-in circuit of depth O(n) by the same techniques.

The only previous sublinear-time algorithms were algorithms that use many more processors or that drastically restrict m. See [16, Section 6] for a survey.

We do not know an exponentiation algorithm that takes time O(n/lg lg n) using n^{2+o(1)} processors. Allowing ε to vary with n would hurt the time bound: for example, choosing ε as 1/lg lg lg lg n would produce an algorithm that takes time O(n lg lg lg lg n/lg lg n) using O(n^{2+1/lg lg lg lg n}) processors.

In a few applications, the exponent e is only a small part of the input, perhaps √n or ∛n bits. The running time depends primarily on the length of e: the algorithm takes time O((lg e)/lg lg n) + (lg n)^{O(1)} using O(n^{2+ε}) processors.

References

[1] Leonard M. Adleman, Kireeti Kompella, Proceedings of the 20th ACM symposium on theory of computing, Association for Computing Machinery, New York, 1988.
[2] Leonard M. Adleman, Kireeti Kompella, Using smoothness to achieve parallelism, in [1] (1988), 528–538.
[3] Amod Agashe, Kristin Lauter, Ramarathnam Venkatesan, Constructing elliptic curves with a known number of points over a prime field, in [33] (2004), 1–17. URL: http://research.microsoft.com/~klauter/. MR2005m:11112


[4] Manindra Agrawal, Neeraj Kayal, Nitin Saxena, PRIMES is in P, Annals of Mathematics 160 (2004), 781–793. URL: http://ProjectEuclid.org/Dienst/UI/1.0/Summarize/euclid.annm/1111770735. MR2123939 (2006a:11170)
[5] Paul W. Beame, Stephen A. Cook, H. James Hoover, Log depth circuits for division and related problems, SIAM Journal on Computing 15 (1986), 994–1003. ISSN 0097-5397. MR0861365 (88b:68059)
[6] Daniel J. Bernstein, Detecting perfect powers in essentially linear time, and other studies in computational number theory, Ph.D. thesis, University of California at Berkeley, 1995.
[7] Daniel J. Bernstein, Multidigit modular multiplication with the explicit Chinese remainder theorem, in [6] (1995). URL: http://cr.yp.to/papers.html#mmecrt.
[8] Daniel J. Bernstein, Fast multiplication and its applications, to appear. URL: http://cr.yp.to/papers.html#multapps. ID 8758803e61822d485d54251b27b1a20d.
[9] Daniel J. Bernstein, Pippenger's exponentiation algorithm, to be incorporated into author's High-speed cryptography book. URL: http://cr.yp.to/papers.html#pippenger.
[10] Allan Borodin, Robert T. Moenck, Fast modular transforms, Journal of Computer and System Sciences 8 (1974), 366–386; older version, not a subset, in [21]. ISSN 0022-0000. URL: http://cr.yp.to/bib/entries.html#1974/borodin. MR0371144 (51:7365)
[11] Henri Cohen, A course in computational algebraic number theory, Graduate Texts in Mathematics, 138, Springer-Verlag, Berlin, 1993. ISBN 3-540-55640-0. MR1228206 (94i:11105)
[12] Richard Cole, Uzi Vishkin, Faster optimal parallel prefix sums and list ranking, Information and Computation 81 (1989), 334–352. ISSN 0890-5401. MR1000070 (90k:68056)
[13] Jean-Marc Couveignes, Computing a square root for the number field sieve, in [19] (1993), 95–102. MR13212222
[14] Shawna Meyer Eikenberry, Jonathan P. Sorenson, Efficient algorithms for computing the Jacobi symbol, Journal of Symbolic Computation 26 (1998), 509–523. ISSN 0747-7171. MR1646683 (99h:11146)
[15] Faith E. Fich, The complexity of computation on the parallel random access machine, in [26] (1993), 843–899. MR1212591 (94c:68086)
[16] Daniel M. Gordon, A survey of fast exponentiation methods, Journal of Algorithms 27 (1998), 129–146. ISSN 0196-6774. URL: http://www.ccrwest.org/gordon/dan.html. MR1613189 (99g:94014)
[17] Richard M. Karp (chairman), 13th annual symposium on switching and automata theory, IEEE Computer Society, Northridge, 1972.
[18] Donald E. Knuth, The art of computer programming, volume 2: seminumerical algorithms, 3rd edition, Addison-Wesley, Reading, 1997. ISBN 0-201-89684-2.
[19] Arjen K. Lenstra, Hendrik W. Lenstra, Jr. (editors), The development of the number field sieve, Lecture Notes in Mathematics, 1554, Springer-Verlag, Berlin, 1993. ISBN 3-540-57013-6. MR1321216 (96m:11116)
[20] Alfred J. Menezes, Paul C. van Oorschot, Scott A. Vanstone, Handbook of applied cryptography, CRC Press, Boca Raton, Florida, 1996. ISBN 0-8493-8523-7. URL: http://cacr.math.uwaterloo.ca/hac. MR1412797 (99g:94015)
[21] Robert T. Moenck, Allan Borodin, Fast modular transforms via division, in [17] (1972), 90–96; newer version, not a superset, in [10]. URL: http://cr.yp.to/bib/entries.html#1972/moenck. MR0371144 (51:7365)
[22] Peter L. Montgomery, An FFT extension of the elliptic curve method of factorization, Ph.D. thesis, University of California at Los Angeles, 1992. URL: http://cr.yp.to/bib/entries.html#1992/montgomery.
[23] Peter L. Montgomery, Robert D. Silverman, An FFT extension to the P − 1 factoring algorithm, Mathematics of Computation 54 (1990), 839–854. ISSN 0025-5718. MR1011444 (90j:11142)
[24] Andrew M. Odlyzko, Gary Walsh, Hugh Williams (editors), Conference on the mathematics of public key cryptography: the Fields Institute for Research in the Mathematical Sciences, Toronto, Ontario, June 12–17, 1999, book of preprints distributed at the conference, 1999.
[25] Christos M. Papadimitriou, Computational complexity, Addison-Wesley, Reading, Massachusetts, 1994. ISBN 0-201-53082-1. MR1251285 (95f:68082)
[26] John H. Reif (editor), Synthesis of parallel algorithms, Morgan Kaufman, San Mateo, California, 1993. ISBN 1-55860-135-X. MR1212591 (94c:68086)


[27] J. Barkley Rosser, Lowell Schoenfeld, Approximate formulas for some functions of prime numbers, Illinois Journal of Mathematics 6 (1962), 64–94. ISSN 0019-2082. URL: http://cr.yp.to/bib/entries.html#1962/rosser. MR0137689 (25:1139)
[28] Arnold Schönhage, Volker Strassen, Schnelle Multiplikation großer Zahlen, Computing 7 (1971), 281–292. ISSN 0010-485X. URL: http://cr.yp.to/bib/entries.html#1971/schoenhage-mult. MR0292344 (45:1431)
[29] Jonathan P. Sorenson, A sublinear-time parallel algorithm for integer modular exponentiation, in [24] (1999).
[30] Jonathan P. Sorenson, Ian Parberry, Two fast parallel prime number sieves, Information and Computation 144 (1994), 115–130. ISSN 0890-5401. MR1294306 (95h:11097)
[31] Larry Stockmeyer, Uzi Vishkin, Simulation of parallel random access machines by circuits, SIAM Journal on Computing 13 (1984), 409–422. ISSN 0097-5397. MR0739997 (85g:68018)
[32] Stephen R. Tate, Newton iteration and integer division, in [26] (1993), 539–572. MR1212591 (94c:68086)
[33] Alf van der Poorten, Andreas Stein (editors), High primes and misdemeanours: lectures in honour of the 60th birthday of Hugh Cowie Williams, American Mathematical Society, Providence, 2004. ISBN 0-8218-3353-7. MR2005b:11003
[34] Uzi Vishkin, Advanced parallel prefix-sums, list ranking and connectivity, in [26] (1993), 215–257. MR1212591 (94c:68086)
[35] Joachim von zur Gathen, Computing powers in parallel, SIAM Journal on Computing 16 (1987), 930–945. MR0908877 (89j:68060)

Department of Mathematics, Statistics, and Computer Science (M/C 249), The University of Illinois at Chicago, Chicago, IL 60607–7045
E-mail address: [email protected]

Department of Computer Science and Software Engineering, Butler University, 4600 Sunset Avenue, Indianapolis, Indiana 46208
E-mail address: [email protected]
