Integer Factorization - An Investigation of Methods and Implementation

Josh Boone Southern Illinois University at Carbondale Carbondale, IL 62901

April 24, 2007

Abstract

Integer factorization is the breaking down of a composite integer into its prime factors. This unique factorization can then be used to analyze the number or, in the case of most cryptographic applications, to break the cryptoscheme. There are many methods of factorization, but we will focus on those based on Fermat’s Factorization Method. We will give a proof of correctness of this method, as well as an example. We will discuss Dixon’s Factorization Method at length, with an example given to show how it works. Finally, we will give some insight into the Quadratic Sieve Method, a factorization algorithm that uses quadratic congruences to reduce the amount of time needed to factor an integer.

1 Introduction

Integer factorization has been a topic of study since the beginning of number theory. The French mathematician Pierre de Fermat (1601-1665) is credited with one of the earliest algorithms, aptly named Fermat’s Factorization Method. This method is based on a congruence of squares, which is the backbone of many factorization methods. We will prove the correctness and show the strengths and weaknesses of this algorithm, and discuss two extensions of Fermat’s Method that are more efficient: Dixon’s Factorization Method and the Quadratic Sieve Method. Before we begin discussion of these methods, we need some results and definitions from elementary number theory.

2 Some Number Theory

All of our methods will rely upon the following important theorem, without which factorization would be unimportant.

Theorem 2.1 (Fundamental Theorem of Arithmetic). Every integer greater than 1 can be written as a unique product of prime numbers.

This rather intuitive result was proven by Euclid in a more limited form; the first complete proof was given by Carl Friedrich Gauss at the age of 21. Here is a simple proof.

Proof:

Factorization: Assume that there exists an integer greater than 1 that is not a product of primes. By the well-ordering principle, there must be a smallest integer with this property; call it n. It must be the case that n > 1 and that n is composite (since any prime is obviously a product of primes). Then n = ab, where a and b are positive integers less than n. By the minimality of n, both a and b must be products of primes. So n = ab must be a product of primes, a contradiction.

Uniqueness: Assume we have two factorizations, n = p1 p2 · · · ps = q1 q2 · · · qt, where the pi and qi are primes and, WLOG, s ≤ t. Also WLOG, we can assume that the primes are written in increasing order, i.e. p1 ≤ p2 ≤ ... ≤ ps and q1 ≤ q2 ≤ ... ≤ qt. We have p1 | q1 q2 · · · qt, so p1 = qk for some k (since the qi are prime), which gives p1 ≥ q1. Similarly, q1 | p1 p2 · · · ps, so q1 ≥ p1. So p1 = q1. Cancelling and continuing this argument, we end up with 1 = q_{s+1} q_{s+2} · · · qt. Since no prime equals 1, there can be no primes left on the right-hand side. So we have s = t and pi = qi for each i, and the factorizations are identical.

We will also need some definitions to discuss our algorithms.

Definition 2.2 (Congruence of Squares). Let n be a positive integer. Two integers x and y with x ≢ ±y (mod n) satisfy a congruence of squares modulo n if x² ≡ y² (mod n). Notice that this congruence implies x² − y² ≡ 0 (mod n), i.e. (x − y)(x + y) ≡ 0 (mod n).
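To see concretely why such a congruence matters for factoring, here is a tiny Python illustration (my own addition; the numbers anticipate Example 4.2 below, and the use of gcd here simply previews the later algorithms):

    from math import gcd

    n = 6077
    x, y = 81, 22                         # 81^2 = 6561 ≡ 484 = 22^2 (mod 6077)
    assert (x * x - y * y) % n == 0       # the congruence of squares holds
    print(gcd(x - y, n), gcd(x + y, n))   # prints 59 103, both nontrivial factors of 6077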

Definition 2.3 (Quadratic Residue). Let a, m be positive integers. We say a is a quadratic residue of m if gcd(a, m) = 1 and x² ≡ a (mod m) has a solution. This term will come up when we discuss the Quadratic Sieve Method.
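When the modulus is an odd prime p, Euler’s criterion gives a quick computational test for quadratic residuosity. The following one-liner is an illustration I am adding here, not part of the original text:

    def is_quadratic_residue(a, p):
        # Euler's criterion: for an odd prime p with gcd(a, p) = 1,
        # a is a quadratic residue mod p iff a^((p-1)/2) ≡ 1 (mod p).
        return pow(a, (p - 1) // 2, p) == 1

    print(is_quadratic_residue(2, 7))   # True, since 3^2 = 9 ≡ 2 (mod 7)
    print(is_quadratic_residue(5, 7))   # False, 5 is not a square mod 7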

3 The Most Basic Algorithm: Trial Division

Now that we know a little background material, we can discuss some factorization algorithms. We will begin with the most basic of algorithms, trial division. This algorithm should be familiar, since nearly every student has used it to factor an integer in an algebra class.

Algorithm 3.1 (Trial Division). To factor an integer n,

1. For p = 2 and each odd p from 3 to √n, if p divides n, then p is a factor of n.

2. For each p dividing n, the multiplicity of the factor p is the largest j ∈ Z+ such that p^j | n.

If n has t prime factors, we get the factorization n = p1^j1 · p2^j2 · · · pt^jt. However, this factorization takes, on average, √n / 2 steps.[1] So, if n has two factors of similar size (as in most cryptographic schemes), this algorithm is certainly computationally infeasible for large n.
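Here is a minimal Python sketch of trial division as described above (the function name and return format are my own choices, not part of the paper):

    def trial_division(n):
        # Returns the factorization of n as a list of (prime, multiplicity) pairs.
        factors = []
        p = 2
        while p * p <= n:              # only candidates up to sqrt(n) are needed
            if n % p == 0:
                j = 0
                while n % p == 0:      # divide out p to find its multiplicity j
                    n //= p
                    j += 1
                factors.append((p, j))
            p += 1
        if n > 1:                      # whatever is left over is itself prime
            factors.append((n, 1))
        return factors

    print(trial_division(6077))        # [(59, 1), (103, 1)]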

4 Fermat’s Factorization Method

Now we present our first algorithm involving the congruence of squares.

Algorithm 4.1 (Fermat’s Factorization Method).

INPUT: An odd composite integer n.
OUTPUT: Two integers a, b such that n = ab.

1. r ← ⌈√n⌉, s0 ← r² − n

2. While s0 is not a perfect square: r ← r + 1, s0 ← r² − n

3. If r = (n + 1)/2, return ’error: n is prime’. Otherwise, return a = r − √s0, b = r + √s0.
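A direct Python translation of Algorithm 4.1 might look like the following sketch (the function name and error handling are my own choices):

    import math

    def fermat_factor(n):
        # Fermat's Factorization Method for an odd composite integer n.
        r = math.isqrt(n)
        if r * r < n:                       # step 1: r = ceil(sqrt(n))
            r += 1
        s0 = r * r - n
        while math.isqrt(s0) ** 2 != s0:    # step 2: stop when s0 is a perfect square
            r += 1
            s0 = r * r - n
        if r == (n + 1) // 2:               # step 3: only the trivial factorization 1 * n exists
            raise ValueError("n is prime")
        s = math.isqrt(s0)
        return r - s, r + s                 # a = r - sqrt(s0), b = r + sqrt(s0)

    print(fermat_factor(6077))              # (59, 103), as in Example 4.2 below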

Example 4.2. Factor n = 6077 using Fermat’s Method

r = ⌈√n⌉ = 78
78² − 6077 = 7
79² − 6077 = 164
80² − 6077 = 323
81² − 6077 = 484

Since 484 = 22², we see that 6077 = 81² − 22². Hence, 6077 = (81 − 22)(81 + 22) = 59 · 103.

Of course, this method does not always give the full factorization, just two odd integers that divide n. The idea is that this information will lead to an easy analysis of n (or a complete factorization if n is simply a product of two primes). This seems like it would be a perfect algorithm for breaking the RSA modulus, so why is this not the end of our discussion?

5 Efficiency of Fermat’s Method

Notice that Fermat’s method is very fast if n = pq, where p ≈ q ≈ √n. Because of this fact, RSA primes are chosen carefully to not have this property, just as they are chosen to not have very small factors (which are easy to find with trial division). For this reason, Fermat’s method is not as efficient for breaking cryptoschemes as it looks at first glance. In fact, as the distance between p and √n increases, the running time increases faster than exponentially.[2] So, with just a tiny bit of foresight, it is easy to design an RSA modulus that makes Fermat’s method computationally infeasible. However, Fermat’s Method is a very important topic of study, since it was the first factoring method to use the congruence of squares as a basis. We will see that two important extensions of Fermat’s method are still in use today, one of which is the premier factoring algorithm for numbers with less than 115 decimal digits. For now, let us study Fermat’s method in its entirety.

6 Correctness of Fermat’s Method

Now we will show that Fermat’s method does indeed always find a factor of n. Say n = ab. We want to show that the algorithm will always find these factors, i.e. that the corresponding value of r lies in the range of our iteration. The proof will also explain why n is prime if r reaches (n + 1)/2.

Theorem 6.1 (Correctness of Fermat’s Method). For any odd composite integer n, Algorithm 4.1 will always find a divisor of n.

Proof:

Let n = ab. Then,

n = ab = (1/4)(2ab + 2ab) = (1/4)((a + b)² − (a − b)²) = ((a + b)/2)² − ((a − b)/2)²

So, if we let r = (a + b)/2 and s = (a − b)/2, we see that n = r² − s².
Note: r and s are integers, because n odd ⇒ a, b also odd.

We will now show that r is in the range of the iteration, i.e. ⌈√n⌉ ≤ r < (n + 1)/2.

Assume that r < √n. Then,

n = r² − s² < (√n)² − s² = n − s² ⇒ s² < 0, an obvious contradiction.

Assume that r ≥ (n + 1)/2. Then,

n = r² − s² ≥ ((n + 1)/2)² − s²
⇒ s² ≥ ((n + 1)/2)² − n = (n²/4 + n/2 + 1/4) − n = n²/4 − n/2 + 1/4 = ((n − 1)/2)²
⇒ s ≥ (n − 1)/2

So, r + s ≥ (n + 1)/2 + (n − 1)/2 = n. But we know that n = r² − s² = (r + s)(r − s), so it must be that r + s = n and r − s = 1 ⇒ n is prime, another contradiction.

So it must be that the value r can always be found by Algorithm 4.1; hence the factors a and b are always found. □

7 Extension 1 of Fermat’s Method - Dixon’s Method

Fermat’s method is academically interesting, even if it has limited application, because it has extensions that are very useful for factoring integers. John D. Dixon of Carleton University, Ontario, devised one such extension of Fermat’s method in 1981.[3] Dixon’s method uses congruences of squares to find a divisor of n. It also uses Gaussian elimination to solve the resulting matrix. We also must prepare a table of primes before we begin, the size of which we will discuss later.

Algorithm 7.1 (Dixon’s Factorization Method).

INPUT: A composite integer n to be factored; a set {S}, called the factor base [3], consisting of all primes less than some integer S called the prime bound; and an integer R called the relation bound.
OUTPUT: An integer a such that a | n.

1. s ← |{S}|, x ← ⌈√n⌉, r ← 0

2. While r < R:
If f(x) = x² (mod n) factors over {S}, store (x, f(x)) [these pairs are called relations], r ← r + 1, x ← x + 1.
Else, x ← x + 1.

3. For each of the r relations (x, f(x)), factor f(x) = p1^e1 · p2^e2 · · · ps^es and store the exponents ei modulo 2 in a row vector.

4. Place the r row vectors in an r × s matrix M, and find [using Gaussian elimination over GF(2)] a nonzero vector c of zeroes and ones that selects a set of rows of M summing to the zero vector modulo 2.

5. For the relations selected by c [those with ck = 1]:
∏k xk² ≡ ∏k f(xk) (mod n).
Let x = ∏k xk (mod n). Because of the way we have chosen c, ∏k f(xk) is a perfect square; let y be its square root. Then x² ≡ y² (mod n), so we have a congruence of squares.

6. For each congruence of squares: we know (x + y)(x − y) ≡ 0 (mod n). Compute a = gcd(x − y, n) [with the Euclidean Algorithm]. If 1 < a < n, return a.
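The following Python sketch strings these steps together for small inputs. It is purely illustrative: the parameter choices are arbitrary, the dependency c is found by brute force over subsets rather than by Gaussian elimination, and no attention is paid to efficiency.

    import math
    from itertools import combinations

    def factor_over_base(m, base):
        # Return the exponent vector of m over `base`, or None if m is not smooth.
        if m == 0:
            return None
        exps = [0] * len(base)
        for i, p in enumerate(base):
            while m % p == 0:
                m //= p
                exps[i] += 1
        return exps if m == 1 else None

    def dixon(n, base=(2, 3, 5, 7)):
        # Step 2: collect a few more relations than there are primes in the base.
        relations, x = [], math.isqrt(n) + 1
        while len(relations) < len(base) + 1:
            exps = factor_over_base(x * x % n, base)
            if exps is not None:
                relations.append((x, exps))
            x += 1
        # Steps 3-5: find a subset whose exponent vectors sum to zero mod 2
        # (a real implementation would use Gaussian elimination over GF(2)).
        for size in range(1, len(relations) + 1):
            for subset in combinations(relations, size):
                total = [sum(e[i] for _, e in subset) for i in range(len(base))]
                if any(t % 2 for t in total):
                    continue
                X, Y = 1, 1
                for xk, _ in subset:
                    X = X * xk % n
                for p, t in zip(base, total):
                    Y = Y * pow(p, t // 2, n) % n
                # Step 6: X^2 ≡ Y^2 (mod n); either gcd may expose a factor.
                for a in (math.gcd(X - Y, n), math.gcd(X + Y, n)):
                    if 1 < a < n:
                        return a
        return None

    print(dixon(23449))    # 131 or 179, as in Example 7.2 below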

Because of the construction of our matrix, each dependency c found in step 4 has a 50% chance of giving a factor of n. This shows that we must make S sufficiently large so that we can find enough dependencies to make the chance of not finding a factor negligible. However, if we make S too large, the algorithm will run slowly. It has been proven that the best value of S is e^(2√(log(n) log(log(n)))), which also happens to be the running time of Dixon’s method.[4] The correctness of this algorithm has a rather complicated proof, so we omit it for simplicity’s sake. Instead, we provide an example:

Example 7.2 (Using Dixon’s Algorithm). Find a factorization of 23449.

Say we want to factor n = 23449 over {S} = {2, 3, 5, 7}. First, note x = ⌈√n⌉ = 154. Starting here, the first relations we get are:

970² mod 23449 = 2940 = 2² · 3 · 5 · 7²
8621² mod 23449 = 11760 = 2⁴ · 3 · 5 · 7²

So, (970 · 8621)² ≡ (2³ · 3 · 5 · 7²)² (mod 23449). That is, 14526² ≡ 5880² (mod 23449).

Now, we find that:
gcd(14526 − 5880, 23449) = 131
gcd(14526 + 5880, 23449) = 179
and indeed n = 131 · 179, so we are finished.

It should be noted that, in addition to the improvement in efficiency, this algorithm can be run in parallel by many machines, making this a great method for factoring very large numbers. The parallel algorithm is exactly as above, but each machine tries a different x value. This type of parallel attack was used to factor RSA-129 in 1994 over a time period of eight months using the next method, the Quadratic Sieve, which is also very easy to run in parallel.[5]
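As an aside, the arithmetic of Example 7.2 is easy to check in a few lines of Python (an illustrative check, not part of the original example):

    from math import gcd

    n = 23449
    assert 970 ** 2 % n == 2940 and 8621 ** 2 % n == 11760
    assert (970 * 8621) % n == 14526 and pow(14526, 2, n) == pow(5880, 2, n)
    print(gcd(14526 - 5880, n), gcd(14526 + 5880, n))   # prints 131 179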

8 Extension 2 of Fermat’s Method - Quadratic Sieve

Dixon’s elegant factorization method was improved upon the very year it was published. Carl Pomerance published his method, the Quadratic Sieve, in 1981 while at the University of Georgia.[6] This optimization of Dixon’s method uses a smaller set of relations for faster results. Where Dixon’s method blindly tries to factor each f(x) over the prime base {S}, the Quadratic Sieve only considers primes p in {S} for which n is a quadratic residue modulo p, that is:

n ≡ t² (mod p) for some integer t, where n is the number we are attempting to factor. This way, it is easy to see when a given prime will divide f(x) = x² − n: if t is a square root of n modulo p, then p | f(x) for x = t, t + p, t + 2p, t + 3p, ...

This extension of Dixon’s Method is much faster, and is actually the fastest algorithm for factoring numbers with less than 115 decimal digits[5]. The only real difference between it and Dixon’s Method is the sieving step:

8.1 Quadratic Sieving

Rather than just choosing values of x and blindly checking whether each f(x) factors over the factor base, we use the fact that n is a quadratic residue modulo each prime p in our factor base {S}. Knowing this, it is easy to see that, for any integer k, the function y(x) = x² − n satisfies:

y(x + kp) = (x + kp)² − n = x² + 2kpx + (kp)² − n ≡ y(x) (mod p)

Using this result, we can see that by solving y(x) ≡ 0 (mod p) we can find an entire family of values of x such that p | y(x). The speed increase comes in this step, since finding a square root modulo a prime number is easily accomplished with efficient algorithms such as the Shanks-Tonelli Algorithm. By using x values that satisfy this family of congruences, we can find a factor of n much more quickly.
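A rough Python sketch of this sieving step is given below. The brute-force square-root search stands in for the Shanks-Tonelli Algorithm mentioned above, and the interface (function names, returning a list of divisor lists) is my own invention.

    import math

    def sqrt_mod_p(n, p):
        # All square roots of n modulo the prime p, found here by brute force;
        # a real sieve would use the Shanks-Tonelli Algorithm instead.
        return [t for t in range(p) if t * t % p == n % p]

    def sieve_interval(n, factor_base, length):
        # For each x in [ceil(sqrt(n)), ceil(sqrt(n)) + length), record which
        # primes of the factor base divide y(x) = x^2 - n.
        start = math.isqrt(n) + 1
        divisors = [[] for _ in range(length)]
        for p in factor_base:
            for t in sqrt_mod_p(n, p):            # solutions of x^2 ≡ n (mod p)
                first = start + (t - start) % p   # smallest x ≥ start with x ≡ t (mod p)
                for x in range(first, start + length, p):
                    divisors[x - start].append(p) # p divides y(x) along this whole progression
        return start, divisors

    # Each prime p with a root t marks the arithmetic progression x = t, t + p, t + 2p, ...
    start, divs = sieve_interval(23449, [2, 3, 5, 13], 30)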

8.2 Continuation of the Algorithm

After the sieving step, we continue as in Dixon’s algorithm by building the exponent matrix and finding a dependency among its rows modulo 2. The desired result is obtained by computing the gcd, just as in the last step of Dixon’s method.

9 Efficiency of the Quadratic Sieve Method

This method requires about e^(√(log(n) log(log(n)))) steps to factor a given integer. The exponent is half that of Dixon’s Method, so the number of steps required is roughly the square root of the number Dixon’s Method requires, a formidable improvement. This method was the fastest and most used method until the invention of the General Number Field Sieve (GNFS) method in 1993.[7] The GNFS is currently the fastest method in use today.
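To get a feel for the difference, one can evaluate the two step-count estimates quoted above (a rough, purely illustrative comparison; the constants in such heuristic estimates should not be taken too literally, and the sample n is hypothetical):

    import math

    def qs_steps(n):
        # e^sqrt(log(n) log(log(n))), the estimate quoted for the Quadratic Sieve
        return math.exp(math.sqrt(math.log(n) * math.log(math.log(n))))

    def dixon_steps(n):
        # the exponent quoted for Dixon's Method is twice as large
        return qs_steps(n) ** 2

    n = 10 ** 50                                          # a hypothetical 51-digit target
    print(f"{qs_steps(n):.2e} vs {dixon_steps(n):.2e}")   # roughly 1e10 vs 1e20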

10 Conclusion

The obvious conclusion is that the Quadratic Sieve Method is superior to the three previously discussed algorithms. It gives up nothing in robustness or generality; that is, it can factor any number that Fermat’s Method can factor, and in nearly all cases it is faster. The other conclusion is that the simpler a factoring algorithm is, the slower it usually is. The complete analysis and correctness proofs of the extensions of Fermat’s Method are beyond the scope of this course (and beyond the scope of many 400-level number theory courses!), and although these algorithms are complicated, they also exhibit an elegance that the simpler algorithms simply do not have. For this reason, more than any other, they deserve the study and analysis of anyone hoping to understand the factorization of integers.

References

[1] Wikipedia.org. Trial Division. http://en.wikipedia.org/wiki/Trial_division, 2007.

[2] Andreas Carlsen. Prime Factorization - Implementation in a Functional Language. www.daimi.au.dk/~akc/crypt/project/primefactoring.pdf, 2005.

[3] Weisstein, Eric. Dixon’s Factorization Method. http://mathworld.wolfram.com/DixonsFactorizationMethod.html, 1999.

[4] Ben Lynn. Factoring and Discrete Logarithms. http://rooster.stanford.edu/~ben/crypto/factoring.html, 2002.

[5] Kenneth H. Rosen. Elementary Number Theory and its Applications, Fifth Edition. Pearson, Addison Wesley. New York, New York, USA, 2005.

[6] Wikipedia.org. Quadratic Sieve. http://en.wikipedia.org/wiki/Quadratic_sieve, 2007.

[7] Eric Landquist. The Quadratic Sieve Factoring Algorithm. http://www.math.uiuc.edu/~landquis/quadsieve.pdf, 2001.
