Implementation of the Quadratic Sieve
Mark Gordon
Cryptography 475
November 30th, 2008

Introduction

Finding the factorization of a number n is a computationally difficult problem that is at the heart of the security of the RSA encryption algorithm. The naïve solution, trial division, takes O(sqrt(n)) time and is ineffective for even moderately large n. The quadratic sieve improves on this and is currently the asymptotically second-fastest known algorithm for factoring integers. In this paper I present my implementation of the quadratic sieve and its results.

The Quadratic Sieve

The main idea behind the quadratic sieve is that if we have a and b such that a^2 ≡ b^2 (mod n), then a^2 - b^2 ≡ 0 (mod n), which can be rewritten as (a - b)(a + b) ≡ 0 (mod n). Therefore we can take either a - b or a + b and use the Euclidean algorithm to test whether it shares a nontrivial factor with n, which happens whenever a is not congruent to ±b (mod n).

One idea for finding perfect squares is to find pairs (x, Q(x)), where Q(x) = x^2 - n (different polynomials are possible), and test whether Q(x) is a perfect square. Note that x^2 ≡ Q(x) (mod n), so if Q(x) happens to be a perfect square we have a congruence of squares, which will allow us to factor n.

Finding Squares

Unfortunately there are very few x such that Q(x) is a perfect square. An improvement on this concept is to combine pairs (x, Q(x)) and (y, Q(y)) to yield (xy, Q(x)Q(y)). Note that the congruence (xy)^2 ≡ Q(x)Q(y) (mod n) still holds, so if Q(x)Q(y) happens to be a perfect square we once again have a congruence of squares. If we view Q(x) as an exponent vector, Q(x) = p_1^v_1 * p_2^v_2 * ... * p_B^v_B, then Q(x) is a perfect square if and only if every v_i is even.

Linear System

To restate the problem at hand: we want to take the product of some subset of the Q(x_i) such that each exponent in the exponent vector of the product is even. This is equivalent to saying we want to add some subset of the exponent vectors such that they sum to 0 modulo 2.
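The combination of pairs described above can be illustrated on a toy number. The following Python sketch is only an illustration under stated assumptions, not the implementation presented in this paper: the modulus n = 1649, the two-prime factor base, the chosen values of x, and the helper name exponent_vector are all hypothetical choices made for the example.

```python
from math import gcd

def exponent_vector(m, primes):
    """Exponents of m over `primes`, or None if m does not factor completely."""
    exps = []
    for p in primes:
        e = 0
        while m % p == 0:
            m //= p
            e += 1
        exps.append(e)
    return exps if m == 1 else None

n = 1649           # toy modulus (assumed example), 1649 = 17 * 97
primes = [2, 5]    # tiny illustrative factor base
xs = [41, 43]      # Q(41) = 32 = 2^5, Q(43) = 200 = 2^3 * 5^2

vectors = [exponent_vector(x * x - n, primes) for x in xs]
total = [sum(col) for col in zip(*vectors)]   # exponents of Q(41) * Q(43)
assert all(e % 2 == 0 for e in total)         # the product is a perfect square

# a is the product of the x values, b is the square root of the product of Q(x).
a = 1
for x in xs:
    a = a * x % n
b = 1
for p, e in zip(primes, total):
    b = b * pow(p, e // 2, n) % n

factors = (gcd(a - b, n), gcd(a + b, n))
print(vectors, a, b, factors)   # [[5, 0], [3, 2]] 114 80 (17, 97)
```

Neither Q(41) nor Q(43) is a square on its own, but their exponent vectors (5, 0) and (3, 2) sum to (8, 2), which is 0 modulo 2 in every coordinate.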
Finding such a subset is the same as finding a linear dependence between the vectors modulo 2, which can be solved easily with Gaussian elimination.

Factor Base

Because many different primes could divide Q(x), which would require us to collect a lot of pairs (x, Q(x)) (and because factoring is hard), it helps to choose a factor base F and collect only pairs (x, Q(x)) such that Q(x) factors completely over F. With a restricted size on F we need to gather only |F|+1 pairs to guarantee a linear dependence, and thus be able to generate a congruence of squares with which to attack the factoring problem.

Preparations

In order to do anything with large integers, a library for basic integer operations is required, since native integer types on any processor have a fixed (small) size. To accomplish this I wrote a bigint class that stores the number it represents in base 2^30 and handles all of the basic arithmetic operations (+, -, *, /, %) and the comparisons one would expect.

In addition to basic arithmetic, the bigint library contains some other functions that were needed to implement the quadratic sieve. These include:

- A square root function that computes the floor of the square root of a big integer.
- A gcd function for computing the greatest common divisor of two big integers using the Euclidean algorithm.
- A modular exponentiation algorithm for raising b^e modulo m using repeated squaring.
- A modular inverse algorithm for computing x^-1 modulo m using the extended Euclidean algorithm.
- A method for detecting whether a number is probably prime using the Miller-Rabin probabilistic primality test.
- A method for generating random numbers of a fixed number of bits.
- A method for generating a random prime of a fixed number of bits. (This was useful for testing.)
- A method for computing the Legendre symbol. This symbol tells you whether a number has a square root modulo a prime. This was implemented as described in [4].
- A method for computing the square root of a number modulo a prime. This uses the Shanks-Tonelli algorithm and is implemented as described in [3].

The Algorithm

All of the previous preparation has been for the purpose of implementing the quadratic sieve, which is located within the bigint::factor(bool) method in my implementation (Appendix B). In fact the factor method is not just the quadratic sieve: it first tries trial division to remove small primes, checks whether the number is probably prime (and therefore has no factors), runs Pollard's Rho algorithm for a fixed number of iterations to try to get lucky and find some factors, and only then moves on to the quadratic sieve.

Computing the factor base

The quadratic sieve starts out by calculating a factor base. The size of the factor base is calculated as approximately B = 2 * e^(sqrt(ln(n) ln(ln(n))) / 4), as suggested in [1]. The elements of F are chosen as the first B primes p for which n is a quadratic residue modulo p. The Sieve of Eratosthenes is used to calculate primes quickly, and each prime is kept only if the Legendre symbol of n modulo p equals 1.

After calculating the factor base, the square root of n modulo p is calculated for each p in F using the Shanks-Tonelli algorithm. This square root is useful because it allows us to find up to two values of x such that Q(x) is divisible by p. This can then be used, together with the fact that Q(x) ≡ Q(x + p) (mod p), to find two arithmetic progressions that represent all x such that p divides Q(x).

Sieving

Now we're ready to sieve. Traditionally sieving is done by selecting some sieving interval and, for each prime p, dividing out the largest power of p from each element of the arithmetic progressions associated with that prime within the sieving interval. If an element is 1 at the end of sieving, then the associated Q(x) factors completely over the factor base.

Instead of doing that, I chose to start at x = 1 + floor(sqrt(n)) and just keep going forward iteratively.
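The factor-base setup described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than the bigint code of this paper: the function names primes_up_to, legendre, factor_base and first_hit, the limit parameter, and the example n = 15347 are hypothetical; the modular square roots are found by brute force as a stand-in for Shanks-Tonelli; and the prime 2 is simply admitted without a Legendre test.

```python
from math import isqrt

def primes_up_to(limit):
    """Sieve of Eratosthenes: all primes <= limit."""
    sieve = bytearray([1]) * (limit + 1)
    sieve[0:2] = b"\x00\x00"
    for i in range(2, isqrt(limit) + 1):
        if sieve[i]:
            sieve[i * i :: i] = bytes(len(range(i * i, limit + 1, i)))
    return [i for i, is_prime in enumerate(sieve) if is_prime]

def legendre(a, p):
    """Legendre symbol (a/p) for an odd prime p, via Euler's criterion."""
    r = pow(a, (p - 1) // 2, p)
    return -1 if r == p - 1 else r

def factor_base(n, size, limit=10_000):
    """First `size` primes p <= limit with n a quadratic residue mod p,
    each paired with the roots r of r^2 = n (mod p)."""
    base = []
    for p in primes_up_to(limit):
        if p == 2 or legendre(n, p) == 1:
            # Brute-force roots (assumption: small p only; not Shanks-Tonelli).
            roots = [r for r in range(p) if (r * r - n) % p == 0]
            base.append((p, roots))
            if len(base) == size:
                break
    return base

def first_hit(start, r, p):
    """Smallest x >= start with x = r (mod p)."""
    return start + (r - start) % p

n = 15347                      # toy example (assumption)
start = isqrt(n) + 1           # first x with Q(x) = x^2 - n > 0
for p, roots in factor_base(n, 4):
    # Each root r gives the progression first_hit, first_hit + p, ...
    print(p, roots, [first_hit(start, r, p) for r in roots])
```

For n = 15347 the first four admissible primes are 2, 17, 23 and 29, and every x in a progression printed for p satisfies p | Q(x).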
To tell which primes should be divided out of Q(x), I instead keep a heap that tracks which primes will appear next. The main advantages of this approach are that I don't need large amounts of extra memory, I don't have to guess the size of the sieving interval, and I can get pairs (x, Q(x)) early and test for linear dependencies as I go. However, these advantages come at a slight cost to runtime due to the heap operations.

Gaussian Elimination

Unlike the approach described in [1], I search for linear dependencies as I find new pairs. This usually reduces the number of pairs required, but the reduction is not substantial. The search for linear dependencies is done using Gaussian elimination optimized for sparse matrices. The elimination records enough information that the actual subset of pairs that formed a linear dependence can be reconstructed.

Constructing a Solution

After finding the subset of pairs that creates the linear dependence, we can calculate a and b such that a^2 ≡ b^2 (mod n). The value a is calculated as the product, modulo n, of the x_i in the selected pairs. Then b is computed by calculating the exponent vector of the product of the Q(x_i) and halving each exponent. Finally we can take gcd(a - b, n) and gcd(a + b, n) to try to find a nontrivial factor of n.

Double Large Primes

In addition to tracking pairs (x, Q(x)) that factor completely over the factor base, I also track cases where Q(x) almost factors over the factor base, save for one large prime L. In this case I check whether I have seen any other pair (y, Q(y)) such that Q(y) factors over the factor base except for L. If so, I combine the pairs to form a pair (x*y, Q(x)*Q(y)). Now Q(x)*Q(y) factors entirely over the factor base except for an L^2 term, which I can account for when I calculate the square root of the product of the selected Q(x_i).
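The bookkeeping for this large-prime combination can be sketched in Python as follows. This is only an illustration under stated assumptions, not the code from this paper: the names split_over_base and add_relation, the dictionary partials keyed by the leftover cofactor, and the example values are hypothetical, and the sketch does not check that the cofactor is prime or bound its size.

```python
def split_over_base(q, base):
    """Divide every base prime out of q; return (exponents, leftover cofactor)."""
    exps = []
    for p in base:
        e = 0
        while q % p == 0:
            q //= p
            e += 1
        exps.append(e)
    return exps, q

def add_relation(x, n, base, partials):
    """Return (a, exps, L) with a^2 = L^2 * prod(p^e) (mod n), or None
    if x only yields a partial relation that is stored for later."""
    exps, cofactor = split_over_base(x * x - n, base)
    if cofactor == 1:
        return (x % n, exps, 1)              # factors completely over the base
    if cofactor in partials:
        y, y_exps = partials[cofactor]       # earlier pair with the same L
        combined = [e + f for e, f in zip(exps, y_exps)]
        return (x * y % n, combined, cofactor)
    partials[cofactor] = (x, exps)           # remember the partial relation
    return None

n, base = 15347, [2, 17, 23, 29]             # toy example (assumption)
partials = {}
first = add_relation(125, n, base, partials)   # Q(125) = 2 * 139, stored
second = add_relation(153, n, base, partials)  # Q(153) = 2 * 29 * 139, combined
print(first, second)   # None (3778, [2, 0, 0, 1], 139)
```

The returned triple keeps L alongside the exponent vector, so the square root of the combined product can later include one factor of L.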
Since there may be many of these large-prime Q(x), only the R pairs with the smallest associated L are kept in memory, where R was selected to be 10,000,000.

Performance

To test the effectiveness and correctness of my implementation of the quadratic sieve, I generated two probable primes p and q and fed p * q to my factoring algorithm. Below is a table of the performance of my quadratic sieve on different sizes of p * q. Note that the quadratic sieve does not change

Size of p * q (bits)   Time to factor          Factor base size   (x, Q(x)) pairs needed
160                    2 hours 21 minutes      6454               6288
144                    7 minutes 13 seconds    3650               3910
128                    5 minutes 16 seconds    2313               2114
112                    39 seconds              1280               1096
96                     33 seconds              368                350.7

Known bugs

There appear to be some implementation bugs present when large semiprimes are tested.