
Reverse Factorization and Comparison of Factorization in attack to RSA

Sadi Evren SEKER
Dept. of Business Administration
Istanbul Medeniyet University
[email protected]

Cihan MERT
Electrical Engineering Dept.
The University of Texas at Dallas
[email protected]

ABSTRACT
Factorization algorithms play a major role in computer security and cryptography. Most of the widely used cryptographic algorithms, like RSA, are built on the mathematical difficulty of factoring the product of big prime numbers. This research proposes a new approach to factorization by using two new enhancements. The new approach is also compared with six different factorization algorithms, and its performance is evaluated in a big data environment. The algorithms covered include the elliptic curve method, Fermat's method, and the Pollard rho method. Success rates are compared over a million numbers with different difficulties. We have implemented our own random number generation, which is also explained in the paper. We also empirically show that the new approach has an advantage in the factorization attack on RSA.

Keywords
Factorization, Cryptography, Benchmarking

Acknowledgement
Work of Sadi Evren SEKER is supported by Istanbul University, research projects department under project number YADOP-27254

1. INTRODUCTION
This study can be viewed as three major steps. In the first step, we have generated big integers with a new approach to the generation. After the generation, the factorization algorithms, including the new approach, are executed. Finally, in the last step, the performance of the algorithms is evaluated.

In this paper, the problem will be defined and an overview of the problem will be demonstrated in the problem statement section. The related work section will cover a brief literature review of the contemporary studies on factorization algorithms. The background chapter will briefly describe the details of the factorization algorithms we have implemented in this study. The experiments section will go into the details of the big numbers and their properties after generation, and into the evaluation of the algorithms.

2. PROBLEM STATEMENT
A stepwise approach to the study can be viewed as in Figure 1.

Figure 1. Overview of Study

In order to simulate the RSA factorization problem, we have concentrated only on semi-prime numbers. The random generator is designed to generate semi-prime numbers. In order to make the time performance more explicit, we have generated a huge number of semi-prime numbers and stored them in a database. After storing, the factorization algorithms are executed on those numbers. Finally, each algorithm is evaluated on its time performance.

3. BACKGROUND
From the early times, the factorization of composite numbers has been an interesting area of study, and some algorithms date back to antiquity, like the Sieve of Eratosthenes (276 – 194 BC).

Also, with the spreading usage of modern cryptographic systems, some of which are built on the difficulty of factoring, like RSA [1], the factorization problem has remained an active area of study.

Initially, factoring started with dividing a number by larger and larger primes until the factorization was complete. This trial division was not improved until Fermat's method, in which the factorization of the difference of two squares is used. While Fermat's method is much faster than trial division, when it comes to the real world of factoring, for example factoring an RSA modulus several hundred digits long, the purely iterative Fermat's method is too slow. This led to the development of several other methods, such as a pair of probabilistic methods by Pollard in the mid 70's, the $p - 1$ method and the $\rho$ method, and the Elliptic Curve Method discovered by H. Lenstra in 1987 [4]. However, the fastest algorithms, such as the Number Field Sieve (and its variants), the Quadratic Sieve (and its variants), and the Continued Fraction Method, utilize the same trick as Fermat. The remainder of this paper will briefly discuss some of the above methods and focus on the reverse factorization method, a new approach.

3.1. Factorization by Trial Division
Trial division is a brute-force method of finding a divisor of an integer $N$ by simply testing whether $N$ is divisible by 2, 3, 5, 7, 11, 13, 17, ..., i.e., all primes which are less than or equal to $\sqrt{N}$, in succession, until a divisor is reached.

Trial division is an effective and simple method to factor $N$ partially or completely. It is reasonable to use trial division as a factoring method when $N$ is not too large.

3.2. Fermat Factorization
Fermat's factorization method [2] looks for the representation of an odd integer $N$ as the difference of two squares, $N = a^2 - b^2$. Then
$N = (a - b)(a + b)$
and $N$ is factored.

To factor any number $N$, first calculate $\sqrt{N}$. Then compute $a^2 - N$, starting with $a$ as the smallest integer not less than $\sqrt{N}$, and continue with increasing $a$ until reaching a square $b^2$. Since $a^2 - N = b^2$, we have $N = a^2 - b^2$, so $N$ is factorized into $N = (a - b)(a + b)$. If the only factors found are $N$ and 1, then $N$ is a prime number. If $N$ is not prime, use the same algorithm for each factor.

Fermat's method works well when the number is factorized into two terms of approximately equal size. It works poorly when the factors are of very different sizes.

3.3. Quadratic Sieve
To factorize a number $n$, the quadratic sieve method [5] attempts to find two numbers $x$ and $y$ such that $x \not\equiv \pm y \pmod{n}$ and $x^2 \equiv y^2 \pmod{n}$. If two such numbers are found, this implies that $(x - y)(x + y) \equiv 0 \pmod{n}$. Then $x - y$ must have a non-trivial factor in common with $n$. A common strategy for finding such $x$ and $y$ is the following.

Choose a smoothness bound $B$. The number $\pi(B)$, which denotes the number of prime numbers less than $B$, will control both the number of vectors needed and the length of the vectors. Then use sieving to locate $\pi(B) + 1$ numbers $x_i$ such that $y_i \equiv x_i^2 \pmod{n}$ is $B$-smooth. Factor the $y_i$ and generate exponent vectors mod 2 for each one. Find a subset of these vectors which add to the zero vector. Multiply the corresponding $x_i$ together, naming the result mod $n$ as $x$, and multiply the $y_i$ together, which yields a $B$-smooth square $y^2$.

Next, the obtained equality $x^2 \equiv y^2 \pmod{n}$ gives two square roots of $x^2 \bmod n$: one obtained by taking the square root of $y^2$ in the integers, namely $y$, and the other being the $x$ computed in the previous step. Having the desired identity $(x - y)(x + y) \equiv 0 \pmod{n}$, compute $\gcd(x - y, n)$. This gives a factor. If the factor is trivial, try again with a different linear dependency or a different $x$.

3.4. Pollard Rho
Pollard's rho method [3] is based on a combination of two ideas, Floyd's cycle-finding algorithm and the birthday paradox, that are also useful for various other factoring methods.

Let $N$ be a number that is neither a perfect power nor a prime, and let $p$ be the smallest prime factor of $N$. Generate a sequence of numbers $x_0, x_1, x_2, \ldots$ from $Z_N$ uniformly and independently at random. Then, after at most $p + 1$ such pickings, there are for the first time two numbers $x_i$ and $x_s$ with $i < s$ such that $x_i \equiv x_s \pmod{p}$. Since $N$ is not a perfect power, there is another prime factor $q > p$ of $N$. Since the numbers $x_i$ and $x_s$ are randomly chosen from $Z_N$, by the Chinese remainder theorem, $x_i \not\equiv x_s \pmod{q}$ with probability $1 - 1/q$, even under the condition that $x_i \equiv x_s \pmod{p}$. Therefore, $\gcd(x_i - x_s, N)$ is a nontrivial factor of $N$ with probability at least $1 - 1/q$.

Since the $x_i \bmod p$ behave more or less as random integers in $\{0, 1, \ldots, p - 1\}$, by computing $\gcd(x_i - x_j, N)$ for $i \neq j$, the factorization of $N$ can be computed after about $c\sqrt{p}$ elements of the sequence, for some small constant $c$.

This suggests that approximately $(c\sqrt{p})^2 / 2$ pairs $x_i, x_j$ have to be considered. However, this can easily be avoided by only computing $\gcd(x_i - x_{2i}, N)$ for $i = 0, 1, \ldots$, i.e., by generating two copies of the sequence, one at the regular speed and one at double speed. This can be expected to result in a factorization of $N$ after approximately $2\sqrt{p}$ gcd computations. If this GCD ever comes to $N$, then the algorithm terminates with failure, since this means $x_i = x_{2i}$ and therefore, by Floyd's cycle-finding algorithm, the sequence has cycled and continuing any further would only be repeating previous work.

4. Semi-prime Factorization in RSA
This study focuses on the fast and efficient factorization of semi-prime numbers. A semi-prime number is the product of two prime numbers, say $p$ and $q$. In some sources the semi-prime numbers are also named pq numbers for this reason.

The advantage in factorizing the semi-prime numbers of the RSA crypto system is that the two prime factors should have an equal, or almost equal, number of digits. The reason is that, if one prime factor of the semi-prime number has fewer digits than the other, the system would have a weakness.

The weakness can be explained like this. The RSA system is built on the time complexity of factorizing the semi-prime number into two factors. The time complexity increases with the number of digits. For example, the time required to factorize a 20 digit number is much higher than the time required for factorizing a 19 digit number. But suppose one of the factors of the high digit number is very small. As an extreme case, take a one digit prime like 2, 3, 5 or 7. Then factorizing the number would be much easier, and finding any factor of the number immediately yields the second factor. So, in most cases, RSA uses two prime numbers with equal digits to generate a semi-prime number.

The novel approach proposed in this study considers this as a vulnerability and shows that using primes with the same number of digits to generate a semi-prime also makes it easier to obtain the factorization with the novel method explained in this paper.

5. A Novel Approach to Semi-prime Factorization
In the new approach, we see the problem as a search problem, where the smaller of the factors $p$ and $q$ of a semi-prime number $sp$ is at most $\sqrt{sp}$. We propose to implement a sieving approach, which increases the speed of searching by eliminating some of the possibilities in each check. In addition, we propose to keep a factor tree for fast elimination of the alternatives.

The sieving approaches like Eratosthenes' Sieve [6], the Sieve of Atkin or the Rational Sieve [7] eliminate alternatives starting from the smallest prime number, and the number searched increases in each step.

This iterative approach from small to bigger prime numbers has a certain advantage while finding the prime factors of a general composite number. But in the case of factoring semi-prime numbers which are specially generated for the RSA crypto system, starting from small prime numbers is a disadvantage, since we are aware that the searched prime number is much closer to the square root of the semi-prime number ($\sqrt{sp}$).

Our approach is as in Algorithm 1. By definition, any composite number $cn$ can be rewritten as in equation (1).

$cn = p \prod_{i=1}^{m} c_i$ (1)

where the number of prime factors of $cn$ is considered as $m + 1$.

For the given $cn$, equation (2) can be concluded.

$(n \in N \wedge n \mid f_i) \Leftrightarrow n_i \mid \{\mathbb{Z} \mid n = (\bigcap_{i=1}^{m} \mathbb{Z} \mid f_i)\}$ (2)

where $N$ is the domain set of the search for the prime numbers, which are the numbers from 2 to $\sqrt{cn}$.

If any number $n \in N$ is also a composite number with $m$ factors, then testing whether $n$ divides $cn$ means that all the $m$ factors of $n$ are already tested. Since we are running a search algorithm, if the searched factor is found, then the search finishes. If $n$ is not a factor of $cn$, then the search can be reduced by also eliminating the factors of $n$ from the search space.

5.1. Sample Run
In order to present the new approach, we also demonstrate a sample run over the semi-prime number 47 x 53 = 2491.

The search space is the numbers from 1 to 49, since $\lfloor \sqrt{2491} \rfloor = 49$. The search algorithm starts with a sieve and tests the first alternative number, 49, from the end of the sieve. Since 49 does not divide 2491, we can remove all the factors of the composite number 49, that is, 7 and every multiple of 7, from the search space.

Table 1. Removing first factors of composite number 49 after first iteration
 1  2  3  4  5  6  7
 8  9 10 11 12 13 14
15 16 17 18 19 20 21
22 23 24 25 26 27 28
29 30 31 32 33 34 35
36 37 38 39 40 41 42
43 44 45 46 47 48 49

In the second iteration, the second number from the end of the search space is considered, which is 48. Since 48 does not divide 2491, all the factors of the composite number 48 can be removed from the search space. The prime factors of 48 are 2 and 3, and the composite numbers that can be generated from those factors are eliminated as shown in Table 2.

Table 2. After eliminating the factors of 48, and the composite numbers generated from those factors, in the second iteration
 1  2  3  4  5  6  7
 8  9 10 11 12 13 14
15 16 17 18 19 20 21
22 23 24 25 26 27 28
29 30 31 32 33 34 35
36 37 38 39 40 41 42
43 44 45 46 47 48 49

From the sieve in Table 2, the next number in the search space is 47. Since 47 divides 2491, the search is finished.

If 47 had not divided 2491, the search would have continued and, since all the numbers between 47 and 43 are eliminated, the next number in the search iteration would be 43. In Table 2, the search space is reduced to only 14 possibilities from the initial 49 numbers.

From the sample run, we have found the factor in 3 steps. Any sieving approach starting from the smallest primes would find the factor only after trying all the prime numbers up to 47. This clearly brings a performance advantage.

During the elimination of the factors of any composite number, a factor tree can be implemented.

Figure 2. Factorization tree for composite number 48

In Figure 2, the factor tree holding the factors of 48 is demonstrated. The tree is ambiguous, since the same tree can be redrawn as in Figure 3.

Figure 3. Ambiguous alternative factorization tree for composite number 48

Any drawing of the tree can be useful in the elimination of the search space. The maximum depth of the deepest tree for any composite number is given in equation (3).

$\text{Max depth of factor tree} = \log_2 n / 2 - 1$ (3)

Please remember that the smallest prime number is 2 and that the maximum internal node count can be one less than half of the total number of nodes in a binary tree.

Algorithm 1: A Novel Factorization for RSA Semi-Prime

Let SP be a semi-prime with high factors
for i ← √SP down to 2 begin
    if i divides SP return i as factor
    else begin
        create a factor tree of i;
        eliminate all factors in sieve;
        decrease i;
    end else;
end for;

The above algorithm demonstrates the execution of the novel approach. The iterator value i starts from $\sqrt{sp}$ and iterates down to the smallest prime number. In fact, we are aware that one of the factors of an RSA semi-prime can never be 2 because of the vulnerability, but the algorithm is designed in this manner for the worst case analysis.

6. EVALUATION
The results of the executions of the various algorithms are demonstrated in Table 3.

Table 3. Execution Performance of the Factorization Algorithms
Method            Average Execution
Pollard Rho       398 mins
ECM               3443 mins
Fermat            30 mins
Quadratic Sieve   326 mins
Erathostene       1267052 mins
Trial Division    5510739 mins
New Approach      5 mins

In Table 3, the results are gathered from the execution on thousands of random numbers with 8 digits. Also, in order to visualize the increase in the time spent by the algorithms, the execution times for 6 of the methods are plotted in Figure 4.

Figure 4. Performance evaluation of methods while the number of digits are increasing.

The digit counts are quite low in Figure 4 and plotting is stopped at 5 digits, where the algorithms are still close to each other. After the number of digits is increased, some of the algorithms consume more time than the rest.
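The evaluation setup, random semi-primes whose prime factors have the same number of digits, timed against a factorization method, can be sketched as follows. This is a minimal illustration under assumptions and not the generator or benchmark used in the study: the helper names (`random_semiprime`, `time_method`, and the others) are hypothetical, primality is checked by plain trial division, and Fermat's method from Section 3.2 serves as the example method. The sketch assumes at least two digits per prime so that every semi-prime is odd.

```python
import random
import time
from math import isqrt


def is_prime(n):
    """Primality test by trial division (adequate for small inputs only)."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True


def random_prime(digits, rng):
    """Draw random integers with the given digit count until one is prime."""
    lo, hi = 10 ** (digits - 1), 10 ** digits - 1
    while True:
        candidate = rng.randint(lo, hi)
        if is_prime(candidate):
            return candidate


def random_semiprime(digits, rng):
    """Return (p, q, p * q) for two distinct primes with `digits` digits each."""
    p = random_prime(digits, rng)
    q = random_prime(digits, rng)
    while q == p:
        q = random_prime(digits, rng)
    return p, q, p * q


def fermat_factor(n):
    """Fermat's method for odd n: find a such that a*a - n is a perfect square."""
    a = isqrt(n)
    if a * a < n:
        a += 1  # start at the smallest integer not less than sqrt(n)
    while True:
        b2 = a * a - n
        b = isqrt(b2)
        if b * b == b2:
            return a - b, a + b
        a += 1


def time_method(method, numbers):
    """Return the wall-clock seconds spent running `method` on every number."""
    start = time.perf_counter()
    for n in numbers:
        method(n)
    return time.perf_counter() - start


if __name__ == "__main__":
    rng = random.Random(2491)  # fixed seed so that a run is repeatable
    numbers = [random_semiprime(4, rng)[2] for _ in range(1000)]
    print(f"Fermat: {time_method(fermat_factor, numbers):.3f} s")
```

Passing a different function to `time_method` allows other methods to be compared on the same list of numbers; the measured times depend on the machine and are not comparable to the figures reported in Table 3.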

Figure 5. Performance evaluation of methods while the number of digits are further more increasing.

Depending on the setup time and the difficulty of the numbers, some algorithms yield worse results than the rest.

From the analytical perspective, the time complexities of the algorithms are known to be as in Table 4.

Table 4. Time Complexity of the Methods
Method            Time Complexity
Pollard Rho       $O(B \times \log B \times \log^2 n)$, where $B$ is the bound and $n$ is the composite number.
ECM               $O(L(p) M(\log n))$, where $M(\log n)$ is the complexity of multiplication mod $n$, and $L(p) = e^{c (\log p)^{\alpha} (\log\log p)^{1-\alpha}}$
Fermat            $O(d)$, where $d$ is the distance between the two factors of the composite number.
Quadratic Sieve   $O(\log B \log\log B)$, where $B$ is the bound.
Erathostene       $O(\sqrt{n} + p)$, where $p$ is the number of primes below $\sqrt{n}$
Trial Division    $O(\sqrt{n})$
New Approach      $O(dp)$, where $dp$ is the number of primes within the two factors of the composite number.

7. CONCLUSION
This study brings up a new approach to semi-prime number factorization, very similar to Fermat's factorization algorithm. The biggest impact of semi-prime number factorization is the attack against crypto systems like RSA. During the study, we have evaluated the new approach and compared its success against the most significant factorization algorithms. The success rate of the new approach seems quite convincing, besides the encouraging analytical performance of the algorithm. We would also like to test the success of the new approach on bigger integers, like 50+ digits, and parallelization would also be a challenging future work.

REFERENCES

[1] Rivest, R.; A. Shamir; L. Adleman (1978). "A Method for Obtaining Digital Signatures and Public-Key Cryptosystems". Communications of the ACM 21 (2): 120–126. doi:10.1145/359340.359342.

[2] McKee, J. Speeding Fermat's Factoring Method, Math. Comput. 68, 1729–1738, 1999.

[3] Pollard, J. M. A Monte Carlo method for factorization, BIT, Vol. 15 (1975) pp. 331–334.

[4] Lenstra, H. W. Jr. "Factoring Integers with Elliptic Curves." Ann. Math. 126, 649–673, 1987.

[5] Gerver, J. Factoring Large Numbers with a Quadratic Sieve, Math. Comput. 41, 287–294, 1983.

[6] Horsley, Rev. Samuel, F. R. S., "Κόσκινον Ερατοσθένους or, The Sieve of Eratosthenes. Being an account of his method of finding all the Prime Numbers," Philosophical Transactions (1683–1775), Vol. 62. (1772), pp. 327–347.

[7] A.O.L. Atkin, D.J. Bernstein, Prime sieves using binary quadratic forms, Math. Comp. 73 (2004), 1023–1030.