Integer Factorization Algorithms
Thesis for a Bachelor's degree in mathematics
Joakim Nilsson
Supervisor: Klas Markström
Examiner: Gerold Jäger

Abstract
The field of integer factorization has come a long way since the early days of Pierre de Fermat. From simple methods such as trial division, through Pollard's rho algorithm developed in the last century, to the more complex Quadratic Sieve (QS), we have now arrived at the General Number Field Sieve (GNFS), which is recognized as the fastest known integer factorization algorithm for very large numbers. Today, research on integer factorization has many applications, among others in the security of encryption methods such as the well-known RSA algorithm. In this thesis I will describe the integer factorization methods mentioned above and give worked examples of each.
I will also compare the factorization speed of trial division, Pollard's rho method and Fermat's method. The programming in this thesis is done in Python, and the time complexity comparison uses Wolfram Mathematica.

Table of contents
1. Introduction
2. Definitions
3. Background
4. Integer factorization
   1. Sieve of Eratosthenes
   2. B-smooth numbers
5. Trial division
   1. Time complexity
6. Fermat's method
   1. Kraitchik-Fermat method
7. Pollard's rho method
8. The Quadratic Sieve
9. The General Number Field Sieve
   1. Set up of GNFS
10. Comparison
11. Applications
12. Summary
List of references
Appendix

Chapter 1. Introduction
In integer factorization we try to write an integer as a product of prime numbers. The study of integer factorization has a very long history and a wide range of applications. In this thesis I will focus on its applications in cryptography. Although there are many integer factorization algorithms to choose from, I have chosen to focus on five of them. First, I will describe the three most fundamental algorithms, which are easy to understand and implement: the trial division method, Fermat's method and Pollard's rho method.
Then I will turn to the more mathematically challenging Quadratic Sieve, which led up to the General Number Field Sieve, today the fastest known method for factorizing integers larger than $10^{100}$. There are also other factorization methods, such as Schnorr's method, which is based on lattice basis reduction [Schnorr, C. P.], and the continued fraction factorization method, CFRAC [Lehmer, D. H., Powers, R. E.], described as early as 1931 and later made into a computer algorithm by Michael A. Morrison and John Brillhart in 1975 [Morrison, M., Brillhart, J.].
Integer factorization plays an important role in cryptography, and in this thesis I have chosen to focus particularly on the RSA public-key cryptosystem. The RSA public-key cryptosystem relies on the difficulty of solving equations of the form $x^e \equiv c \pmod{N}$, where the quantities $e$, $c$ and $N$ are known and $x$ is the unknown [Hoffstein, J. p119]. However, not all numbers are equally difficult to factorize. Among the hardest numbers to factorize are the semiprimes, products of two primes, particularly when both primes are large. We will also see how another special kind of number, the Carmichael numbers, can play an important role when implementing the RSA cryptosystem.
I will begin with a section of definitions, giving the short explanations needed to follow the rest of this thesis. I will also give a short introduction to the theorems that I will use.

Chapter 2. Definitions
A few commonly used mathematical notations that will be used in this thesis:
ℝ: The set of all real numbers (rational and irrational)
ℕ: The set of all nonnegative integers 0, 1, 2, …
ℤ: The set of all integers …, −2, −1, 0, 1, 2, …
ℚ: The set of all rational numbers, for example 3/2, −15/3, …
∀: For all…
∃: There exists an element…

Definition 1. Greatest common divisor: Let a and b be integers, not both zero. The largest positive integer d such that d|a and d|b is called the greatest common divisor of a and b.
The greatest common divisor of a and b is denoted by gcd(a, b).

Definition 2. Congruence: If a and b are integers and m is a positive integer, then a is congruent to b modulo m if m divides a − b. We use the notation a ≡ b (mod m) to indicate that a is congruent to b modulo m:
a ≡ b (mod m) <=> m | (a − b)

Definition 3. Semiprime: A semiprime is a number that is the product of two prime numbers. Two small examples are 4 = 2 ∗ 2 and 6 = 2 ∗ 3. When the prime factors are sufficiently large, even the fastest known factorization algorithms take so long that factoring the semiprime is infeasible.

Definition 4. Polynomial: A function of a single variable $t$ is a polynomial if we can put it in the form $a_n t^n + a_{n-1} t^{n-1} + \dots + a_1 t + a_0$, where $a_n, a_{n-1}, \dots, a_1, a_0$ are constants [Barbeau, E.J. p1].

Definition 5. Ring: A ring in the mathematical sense is a set S together with two binary operations + and ∗ satisfying the following conditions:
1. Additive associativity: ∀ a, b, c ∈ S, (a + b) + c = a + (b + c).
2. Additive commutativity: ∀ a, b ∈ S, a + b = b + a.
3. Additive identity: ∃ an element 0 ∈ S s.t. ∀ a ∈ S, 0 + a = a + 0 = a.
4. Additive inverse: ∀ a ∈ S, ∃ −a ∈ S s.t. a + (−a) = (−a) + a = 0.
5. Left and right distributivity: ∀ a, b, c ∈ S, a ∗ (b + c) = (a ∗ b) + (a ∗ c) and (b + c) ∗ a = (b ∗ a) + (c ∗ a).
6. Multiplicative associativity: ∀ a, b, c ∈ S, (a ∗ b) ∗ c = a ∗ (b ∗ c).

Definition 6. Field: A (commutative) ring in which every nonzero element has a multiplicative inverse is called a field [Hoffstein, J, p96].

Definition 7. $\mathcal{O}(n)$-notation: Also called Big O notation, it describes the limiting behaviour of a function when the argument tends toward a specific value or towards infinity. For example, $\mathcal{O}(n^2)$ denotes a quantity that grows at most as fast as a constant multiple of $n^2$ as $n$ increases. In complexity theory it is used to describe the efficiency of algorithms.

Definition 8.
B-smooth numbers: A positive integer $n$ is said to be B-smooth if it does not have any prime factor exceeding B [Pomerance, C. p48].

Definition 9. Witness: Fix an integer $n$. Then an integer $a$ is called a witness for the compositeness of $n$ if $a^n \not\equiv a \pmod{n}$.

Definition 10. Continued fraction: An infinite continued fraction is an expression of the form
$$a_0 + \cfrac{1}{a_1 + \cfrac{1}{a_2 + \cfrac{1}{a_3 + \cdots}}}$$
where $a_0, a_1, a_2, \dots \in \mathbb{Z}$. A finite continued fraction is an expression of the form
$$[a_0; a_1, a_2, \dots, a_n] = a_0 + \cfrac{1}{a_1 + \cfrac{1}{a_2 + \cdots + \cfrac{1}{a_n}}}$$
where $a_0, a_1, \dots, a_n \in \mathbb{Z}$ and $a_1, \dots, a_n$ are positive.

Chapter 3. Background
Fundamental Theorem of Arithmetic:
For each natural number $n$ there is a unique factorization $n = p_1^{a_1} p_2^{a_2} \cdots p_k^{a_k}$, where the exponents $a_i$ are positive integers and $p_1 < p_2 < \dots < p_k$ are primes.
Proof:
The theorem can be proven by contradiction. Assume there are integers that can be represented as a product of primes in more than one way. Let $n$ be such a positive integer, with two different representations as a product of primes: $p_1 p_2 \cdots p_k = q_1 q_2 \cdots q_l = n$. Some of the primes $p$ may be identical to some of the primes $q$. Dividing both sides by these common primes, we obtain $p_{i_1} p_{i_2} p_{i_3} \cdots = q_{j_1} q_{j_2} q_{j_3} \cdots = m$, where no $p_{i_r}$ equals any $q_{j_s}$. Since the prime $p_{i_1}$ divides $m$, it must divide one of the primes $q_{j_s}$, and a prime that divides another prime must be equal to it. This contradicts the fact that no $p_{i_r}$ equals any $q_{j_s}$, so the assumption that some positive integer can be represented as a product of primes in more than one way was false.

Chinese Remainder Theorem:
Let $m_0, \dots, m_{r-1}$ be positive, pairwise coprime moduli with product $M = \prod_{i=0}^{r-1} m_i$. Let $r$ respective residues $n_i$ also be given. Then the system comprising the $r$ relations and inequality $n \equiv n_i \pmod{m_i}$, $0 \le n < M$ has a unique solution. Furthermore, this solution is given explicitly by the least nonnegative residue modulo $M$ of $\sum_{i=0}^{r-1} n_i v_i M_i$, where $M_i = M/m_i$, and the $v_i$ are inverses defined by $v_i M_i \equiv 1 \pmod{m_i}$ [Pomerance, C, p87].
Proof:
We consider the case of two coprime moduli $p$ and $q$, with residue $a$ modulo $p$ and residue $b$ modulo $q$. Let $p_1 \equiv p^{-1} \pmod{q}$ and $q_1 \equiv q^{-1} \pmod{p}$.
These inverses exist since $p$ and $q$ are coprime. If $y$ is an integer such that $y \equiv a q q_1 + b p p_1 \pmod{pq}$, then $y$ satisfies both congruences: modulo $p$ we have $y \equiv a q q_1 \equiv a \pmod{p}$, since $q q_1 \equiv 1 \pmod{p}$.
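
The explicit formula in the statement of the Chinese Remainder Theorem can be checked numerically. The following is a minimal Python sketch, not code taken from this thesis: the function name `crt` and the sample residues and moduli are illustrative assumptions, and the modular inverse uses Python's built-in three-argument `pow` with exponent −1, which requires Python 3.8 or later.

```python
from math import prod


def crt(residues, moduli):
    """Return the unique n with 0 <= n < M and n ≡ residues[i] (mod moduli[i]).

    Assumes the moduli are positive and pairwise coprime (not checked here).
    """
    M = prod(moduli)                      # M = m_0 * m_1 * ... * m_{r-1}
    total = 0
    for n_i, m_i in zip(residues, moduli):
        M_i = M // m_i                    # M_i = M / m_i
        v_i = pow(M_i, -1, m_i)           # v_i * M_i ≡ 1 (mod m_i)
        total += n_i * v_i * M_i          # one term of the sum in the theorem
    return total % M                      # least nonnegative residue modulo M


# Hypothetical example: n ≡ 2 (mod 3), n ≡ 3 (mod 5), n ≡ 2 (mod 7)
print(crt([2, 3, 2], [3, 5, 7]))  # prints 23
```

With two moduli $p$ and $q$, the sum has exactly the two terms $a q q_1 + b p p_1$ used in the proof, with $M_0 = q$, $v_0 = q_1$, $M_1 = p$ and $v_1 = p_1$.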