Modular Exponentiation Via the Explicit Chinese Remainder Theorem

Modular Exponentiation Via the Explicit Chinese Remainder Theorem

MATHEMATICS OF COMPUTATION Volume 76, Number 257, January 2007, Pages 443–454 S 0025-5718(06)01849-7 Article electronically published on September 14, 2006 MODULAR EXPONENTIATION VIA THE EXPLICIT CHINESE REMAINDER THEOREM DANIEL J. BERNSTEIN AND JONATHAN P. SORENSON Abstract. Fix pairwise coprime positive integers p1,p2,...,ps.Wepropose representing√ integers u modulo m,wherem is any positive integer up to roughly p1p2 ···ps,asvectors(u mod p1,umod p2,...,umod ps). We use this representation to obtain a new result on the parallel complexity of mod- ular exponentiation: there is an algorithm for the Common CRCW PRAM that, given positive integers x, e,andm in binary, of total bit length n, computes xe mod m in time O(n/lg lg n)usingnO(1) processors. For com- parison, a parallelization of the standard binary algorithm takes superlinear 3 time; Adleman√ and Kompella gave an O((lg n) ) expected time algorithm us- ing exp(O( n lg n)) processors; von zur Gathen gave an NC algorithm for the highly special case that m is polynomially smooth. 1. Introduction In this paper we consider the problem of computing xe mod m for large integers x, e,andm. This is the bottleneck in Rabin’s algorithm for testing primality, the Diffie-Hellman algorithm for exchanging cryptographic keys, and many other common algorithms. See, e.g., [18, Section 4.5.4]. The usual solution for computing xe mod m is to compute small integers that are congruent modulo m to various powers of x. See, e.g., [18, Section 4.6.3], [11, Section 1.2], [16], [20], and [9]. For example, say e = 10. One can compute 2 2 x1 = x mod m,thenx2 = x1 mod m,thenx4 = x2 mod m,thenx5 = x1x4 mod m, 2 and finally x10 = x5 mod m. It is often convenient to allow x1,x2,x4,x5 to be slightly larger than m. The output xe mod m and inputs x, e, m are conventionally written in binary. 2 2 Standard practice is to also use the binary representation for x1, x1, x2, x2,etc. We instead use the residue representation: x1 is represented as (x1 mod p1, 2 2 2 2 x1 mod p2,...,x1 mod ps), x1 is represented as (x1 mod p1,x1 mod p2,...,x1 mod ps), etc. Here p1,p2,...,ps are small pairwise coprime positive integers, typically primes, such that P = p1p2 ···ps is sufficiently large. Note that, because pj is small, it is easy to compute (e.g.) x1x4 mod pj from x1 mod pj and x4 mod pj. Received by the editor August 18, 2003 and, in revised form, June 15, 2005. 2000 Mathematics Subject Classification. Primary 11Y16; Secondary 68W10. This paper combines and improves the preliminary papers [7] by Bernstein and [29] by Soren- son. Bernstein was supported by the National Science Foundation under grants DMS-9600083 and DMS–9970409. Sorenson was supported by the National Science Foundation under grant CCR–9626877. Sorenson completed part of this work while on sabbatical at Purdue University in Fall 1998. c 2006 by the authors 443 License or copyright restrictions may apply to redistribution; see https://www.ams.org/journal-terms-of-use 444 DANIEL J. BERNSTEIN AND JONATHAN P. SORENSON Residue representations of x1 and x4 easy Residue representation of x1x4 CRT Binary representation ECRT of something congruent to x1x4 modulo P ECRT reduction mod m ECRT ( modulo P mod m Binary representation mod pj of x1x4 reduction + modulo m Binary representation of x5 reduction + modulo pj Residue representation of x5 The usual Chinese remainder theorem says that (for example) x1x4 mod P is determined by the residue representation of x1x4. In fact, any integer u is congruent modulo P to a particular linear combination of u mod p1,umod p2,...,umod ps. The explicit Chinese remainder theorem says, in a computationally useful form, exactly what multiple of P must be subtracted from the linear combination to obtain u.See§2. Once we know the binary representation of x1x4, we could reduce it modulo m to obtain the binary representation of x5. We actually use the explicit CRT modulo m to obtain the binary representation of x5 directly from the residue representation of x1x4.See§3. Once we know the binary representation of x5, we reduce it modulo each pj to obtain the residue representation of x5. An alternative is to use the explicit CRT modulo m modulo pj to obtain the residue representation of x5 directly from the residue representation of x1x4.See§4. If P is sufficiently large then, as discussed in §5, one can perform several multi- plications in the residue representation before reduction modulo m.Thisispartic- ularly beneficial for parallel computation: operations in the residue representation are more easily parallelized than reductions modulo m. By optimizing parameters, we obtain the following result: Theorem 1.1. There is an algorithm for the Common CRCW PRAM that, given the binary representations of positive integers x, e,andm, of total bit length n, License or copyright restrictions may apply to redistribution; see https://www.ams.org/journal-terms-of-use MODULAR EXPONENTIATION VIA CHINESE REMAINDER THEOREM 445 computes the binary representation of xe mod m in time O(n/lg lg n) using nO(1) processors. In §6wedefinetheCommonCRCWPRAM,in§8 we present the algorithm, and in §7 we present a simpler algorithm taking time O(n). For comparison: A naive parallelization of the standard binary algorithm takes 3 superlinear time; see §7.√ Adleman and Kompella gave an O((lg n) ) expected time algorithm using exp(O( n lg n)) processors; see [2]. Von zur Gathen gave an NC algorithm for the highly special case that m is polynomially smooth; see [35]. It is unknown whether general integer modular exponentiation is in NC. For more on parallel exponentiation algorithms, see Gordon’s survey [16]. Our techniques can easily be applied to exponentiation in finite rings more gen- eral than Z/m. For example, the deterministic polynomial-time primality test of Agrawal, Kayal, and Saxena in [4] can be carried out in sublinear time using a polynomial number of processors on the Common CRCW PRAM; the bottleneck is exponentiation in a ring of the form (Z/m)[x]/(xk − 1). 2. The explicit Chinese remainder theorem Algorithms for integer arithmetic using the residue representation were intro- duced in the 1950s by Svoboda, Valach, and Garner, according to [18, Section 4.3.2]. This section discusses conversion from the residue representation to the binary representation. For each real number α such that α − 1/2 ∈/ Z, define round α as the unique integer r with |r − α| < 1/2. Theorem 2.1. Let p1,p2,...,ps be pairwise coprime positive integers. Write P = p1p2 ···ps.Letq1,q2,...,qs be integers with qiP/pi ≡ 1(modpi).Letu be an integer with |u| <P/2.Letu1,u2,...,us be integers with ui ≡ u (mod pi).Let ≡ − t1,t2,...,ts be integers with ti uiqi (mod pi).Thenu = Pα P round α where α = ti/pi. i ≡ ≡ ≡ Proof. Pα = i ti(P/pi) i uiqi(P/pi) ujqj(P/pj) u (mod pj)foreachj, so Pα ≡ u (mod P ). Write r = α − u/P .Thenr is an integer, and |r − α| = |u/P | < 1/2, so r = round α, i.e., u/P = α − round α. ≡ The usual Chinese remainder theorem says that u i tiP/pi (mod P ); this is the integer version of Lagrange’s interpolation formula. This is a popular way to compute u from the residues ui: first compute ti = uiqi mod pi or simply ti = uiqi, then compute Pα = i tiP/pi, then reduce Pα modulo P to the right range. The explicit Chinese remainder theorem, Theorem 2.1, suggests another way to divide Pα by P .Useti and pi directly to compute a low-precision approximation to α = i ti/pi with sufficient accuracy to determine round α; see, for example, Theorem 2.2 below. Then subtract P round α from Pα to obtain u. As far as we know, the first use of the explicit Chinese remainder theorem was by Montgomery and Silverman in [23, Section 4]. It has also appeared in [22], [13, Section 2.1], [7], [29], and [3, Section 5]. Theorem 2.2. Let β1,β2,...,βs be real numbers. Let r and a be integers. If a −a a |r − βi| < 1/4 and 2 ≥ 2s then r = 3/4+2 2 βi. i i −a a ≤ −a a ≤ Proof. r<1/4+ i βi =1/4+2 i 2 βi 1/4+2(s + i 2 βi ) 1/4+ −a a a −a a ≤ 2 (2 /2+ i 2 βi )=3/4+2 i 2 βi 3/4+ i βi <r+1. License or copyright restrictions may apply to redistribution; see https://www.ams.org/journal-terms-of-use 446 DANIEL J. BERNSTEIN AND JONATHAN P. SORENSON In the situation of Theorem 2.1, assume further that |u| <P/4. We can then use Theorem 2.2 to quickly compute r = round α.Wechoosea with 2a ≥ 2s, −a a then compute the fixed-point approximations 2 2 ti/pi to ti/pi, then compute −a a r = 3/4+2 i 2 ti/pi . 3. The explicit CRT mod m Theorem 3.1. Let p1,p2,...,ps be pairwise coprime positive integers. Write P = p1p2 ···ps.Letq1,q2,...,qs be integers with qiP/pi ≡ 1(modpi).Letu be an | | integer with u <P/2.Writeti = uqi mod pi.Letm be a positive integer. Write − ≡ v = iti(P/pi mod m) (P mod m) round i ti/pi.Thenu v (mod m),and | |≤ v m i pi.

View Full Text

Details

  • File Type
    pdf
  • Upload Time
    -
  • Content Languages
    English
  • Upload User
    Anonymous/Not logged-in
  • File Pages
    12 Page
  • File Size
    -

Download

Channel Download Status
Express Download Enable

Copyright

We respect the copyrights and intellectual property rights of all users. All uploaded documents are either original works of the uploader or authorized works of the rightful owners.

  • Not to be reproduced or distributed without explicit permission.
  • Not used for commercial purposes outside of approved use cases.
  • Not used to infringe on the rights of the original creators.
  • If you believe any content infringes your copyright, please contact us immediately.

Support

For help with questions, suggestions, or problems, please contact us