On the Amount of Sieving in Factorization Methods Proefschrift
Total Page:16
File Type:pdf, Size:1020Kb
On the Amount of Sieving in Factorization Methods Proefschrift ter verkrijging van de graad van Doctor aan de Universiteit Leiden, op gezag van Rector Magnificus prof.mr. P.F. van der Heijden, volgens besluit van het College voor Promoties te verdedigen op woensdag 20 januari 2010 klokke 13.45 uur door Willemina Hendrika Ekkelkamp geboren te Almelo in 1978 Samenstelling van de promotiecommissie: promotoren: Prof.dr. R. Tijdeman Prof.dr. A.K. Lenstra (EPFL, Lausanne, Zwitserland) copromotor: Dr.ir. H.J.J. te Riele (CWI) overige leden: Prof.dr. R.J.F. Cramer (CWI / Universiteit Leiden) Prof.dr. H.W. Lenstra Prof.dr. P. Stevenhagen Dr. P. Zimmermann (INRIA, Nancy, Frankrijk) The research in this thesis has been carried out at the Centrum Wiskunde & Informa- tica (CWI) and at the Universiteit Leiden. It has been financially supported by the Netherlands Organisation for Scientific Research (NWO), project 613.000.314. On the Amount of Sieving in Factorization Methods Ekkelkamp, W.H., 1978 – On the Amount of Sieving in Factorization Methods ISBN: THOMAS STIELTJES INSTITUTE FOR MATHEMATICS c W.H. Ekkelkamp, Den Ham 2009 The° illustration on the cover is a free interpretation of line and lattice sieving in combination with the amount of relations found. Contents 1 Introduction 1 1.1 Factoringlargenumbers .......................... 1 1.2 Multiple polynomial quadratic sieve . 2 1.3 Number field sieve . 4 1.4 Smoothandsemismoothnumbers. 6 1.5 Simulating the sieving . 7 1.6 Conclusion ................................. 7 2 Smooth and Semismooth Numbers 9 2.1 Introduction................................. 9 2.2 Basictools.................................. 11 2.3 Type1expansions ............................. 11 2.3.1 Smoothnumbers .......................... 12 2.3.2 1-semismoothnumbers. 12 2.3.3 2-semismoothnumbers. 13 2.3.4 k-semismoothnumbers....................... 14 2.4 Type2expansions ............................. 15 2.4.1 Smoothnumbers .......................... 15 2.4.2 k-semismoothnumbers....................... 16 2.5 ProofofTheorem6............................. 18 2.6 ProofofTheorem7............................. 22 3 From Theory to Practice 29 3.1 Introduction................................. 29 3.2 Computing the approximating Ψ-function of Corollary5 ................................. 30 β1 β2 α 3.2.1 Main term of Ψ2(x,x ,x ,x ) ................. 31 β1 β2 α 3.2.2 Second order term of Ψ2(x,x ,x ,x )............. 36 3.3 ExperimentingwithMPQS ........................ 40 3.3.1 Equal upper bounds . 42 3.3.2 Different upper bounds . 48 3.3.3 Optimal bounds . 50 iv CONTENTS 3.4 ExperimentingwithNFS ......................... 52 4 Predicting the Sieving Effort for the Number Field Sieve 61 4.1 Introduction................................. 61 4.2 Simulatingrelations ............................ 63 4.2.1 CaseI................................ 64 4.2.2 CaseII ............................... 68 4.2.3 Special primes . 70 4.3 Thestopcriterion ............................. 71 4.3.1 Duplicates.............................. 72 4.4 Experiments................................. 72 4.4.1 CaseI,linesieving ......................... 73 4.4.2 Case I, lattice sieving . 76 4.4.3 CaseII,linesieving ........................ 77 4.4.4 Case II, lattice sieving . 79 4.4.5 Comparing line and lattice sieving . 79 4.5 Whichcasetochoose............................ 82 4.6 Additional experiments . 85 4.6.1 Size of the sample sieve test . 85 4.6.2 Expected sieving area and sieving time . 89 4.6.3 Growth behavior of useful relations . 92 4.6.4 Oversquareness and matrix size . 95 5 Conclusions and Suggestions for Future Research 99 5.1 Smoothandsemismoothnumbers. 99 5.2 Simulating the sieving . 100 Bibliography 103 Samenvatting 107 Acknowledgements 109 Curriculum Vitae 111 Chapter 1 Introduction 1.1 Factoring large numbers In 1977 Rivest, Shamir, and Adleman introduced the RSA public-key cryptosystem [44]. The safety of this cryptosystem is closely related to the difficulty of factoring large integers. It has not been proved that breaking RSA is equivalent with factor- ing, but it is getting close: see, for instance, the work of Aggarwal and Maurer [1]. Nowadays RSA is widely used, so factoring large integers is not merely an academic exercise, but has important practical implications. The searches for large primes and for the prime factors of composite numbers have a long history. Euclid ( 300 B.C.) was one of the first persons to write about compositeness of numbers, followed± by Eratosthenes (276–194 B.C.), who came up with an algorithm that finds all primes up to a certain bound, the so-called Sieve of Eratosthenes. For an excellent overview of the history of factorization methods, we refer to the thesis of Elkenbracht-Huizing ([18], Ch. 2). Over the years, people came up with faster factorization methods, when the fac- tors to be found are large. Based on ideas of Kraitchik, and Morrison and Brillhart, Schroeppel introduced the linear sieve (1976/1977). With a small modification by Pomerance this became the quadratic sieve; albeit a very basic version. Improvements are due to Davis and Holdridge, and Montgomery (cf. [15], [39], [45]). Factoring al- gorithms based on sieving are much faster than earlier algorithms, as the expensive divisions are replaced by additions. The expected running time of the quadratic sieve, based on heuristic assumptions, is L(N) = exp((1 + o(1))(log N)1/2(log log N)1/2), as N , → ∞ where N is the number to be factored and the logarithms are natural [14]. An improvement of the quadratic sieve is the multiple polynomial quadratic sieve. The same idea of sieving is used in the number field sieve, based on ideas of Pollard (pp. 4–10 of [27]) and further developed by H.W. Lenstra, among others. 2 Introduction The heuristic expected running time of the number field sieve is given by (cf. [27]) L(N) = exp(((64/9)1/3 + o(1))(log N)1/3(log log N)2/3), as N . → ∞ Asymptotically, the number field sieve is the fastest known general method for factor- ing large integers, and in practice for numbers of about 90 and more decimal digits. We will go into the details of the multiple polynomial quadratic sieve and the number field sieve in the next two sections, as we use them for experiments in this thesis. Sections 1.4, 1.5, and 1.6 contain an overview of the thesis. 1.2 Multiple polynomial quadratic sieve The multiple polynomial quadratic sieve (MPQS) is based on the quadratic sieve, so we start with a short description of the basic quadratic sieve (QS). Take N as the (odd) composite number (not a perfect power) to be factored. Then QS searches for integers x and y that satisfy the equation x2 y2 mod N. ≡ If this holds, and x y mod N and x y mod N, then gcd(x y, N) is a proper factor of N. 6≡ 6≡ − − The algorithm starts with the polynomial f(t) = t2 N, t Z. For every prime factor p of f(t) the congruence t2 N mod p must hold.− For odd∈ primes p this can N ≡ be expressed as the condition ( p ) = 1 (Legendre symbol). A number is called F -smooth if all its prime factors are below some given bound F . These numbers are collected in the sieving step of the algorithm. The factorbase FB is defined as the set of 1 (for practical reasons), 2, and all the odd primes p −N up to a bound F for which ( p ) = 1. Now we evaluate f(t) for all integer values t [√N M, √N + M], where M is subexponential in N, and keep only those for which∈ f(−t) is F -smooth. We can write this as | | t2 pep(t) mod N, (1.1) ≡ p FB ∈Y where ep(t) 0 and we call expression (1.1) a relation. The left-hand≥ side is a square. Hence the product of such expressions is also a square. If we generate relations until we have more distinct relations than primes in the factorbase, there is a non-empty subset of this set of relations such that multipli- cation of the relations in this subset gives a square on the right-hand side. To find a correct combination of relations, we use linear algebra to find dependencies in a matrix with the exponents ep(t) mod 2 (one relation per row). Possible algorithms for finding such a dependency include Gaussian elimination, block Lanczos, and block Wiedemann; the running time of Gaussian elimination is O(n3) and the running time of block Lanczos and block Wiedemann is O(nW ), where n is the dimension of the matrix and W the weight of the matrix ([12], [13], [33], [49]). 1.2 Multiple polynomial quadratic sieve 3 Once we have a dependency of k relations for t = t1,t2,...,tk, we set x = t1t2 ...tk mod N and ··· ep(t1)+ +ep(tk) y = p 2 mod N. p FB ∈Y Then x2 y2 (mod N). Now we compute gcd(x y, N) and hopefully we have found a proper≡ factor. If not, we continue with the next− dependency. As the probability of finding a factor is at least 1/2 for each dependency (as N has at least two distinct prime factors), we are in practice almost certain to have found the prime factors after having computed a small number of gcd’s as above. To make clear where the term ‘sieving’ from the title comes from, we take a closer look at the polynomial f(t). If we know that 2 is a divisor of f(3), then 2 is a di- visor of f(3 + 2k), for all k Z; more generally, once we have located a t-value for which a given prime p divides∈ f(t), we know a residue class modulo p of t-values for which p divides f(t). This can be exploited as follows. Initialize an array a as log( f(t) ) for all integers t [√N M, √N + M], so we have a[i] = log( f(i) ) for i| [√| N M, √N + M]∈ for some− appropriate M.