MTH6115 Cryptography

Notes 8: Public-key cryptography—Primes and factorisation

8.1 Primality testing

Primality testing is easy, but factorisation is hard. This assertion is the basis of the security of the RSA cipher. In this section and the next, we consider it further.

8.1.1 Trial division

The basic algorithm for primality testing is trial division. In a very crude form it asserts that, if n > 1 and n is not divisible by any integer m such that 2 ≤ m ≤ √n, then n is prime. The first thing to say about this algorithm is that, with minor modification, it leads to a factorisation of n into primes. If n is not prime, then the first divisor we find will be a prime p, and we continue dividing by p while this is possible to establish the exact power of p. The quotient is divisible only by primes greater than p, so we can continue the trial divisions from the point where we left off.

The second thing is that this simple algorithm does not run in polynomial time: the input is the string of digits of n, and the number of trial divisions is about √n, (very) roughly 2^(b/2) if n has b digits to the base 2, and roughly 10^(d/2) if n has d decimal digits. There are some minor improvements to the algorithm, but they do not solve the fundamental problem that the algorithm is not polynomial time. We do not need to try a potential divisor if we know it is composite; this is because a ∤ n implies that ab ∤ n for all integers b. But there are about 2√n/log n primes in the required interval, which is far too many to allow the algorithm to run in polynomial time. Making sure that a potential divisor is prime (presumably by trial division) is a bad idea (it would take too long). Also, using the Sieve of Eratosthenes (or variants) to enumerate all primes up to √n is also not recommended, because of memory issues. A reasonable compromise is to divide by 2, 3 and all numbers 6k ± 1 in the required range, for about (1/3)√n trial divisions.

Nevertheless, for the very smallest n, trial division is the most efficient method to prove primality. For example, if we wish to prove that an integer n ≤ 1680 is prime, we should divide it by those of the primes 2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37 that do not exceed √n. For larger n, it is probably more efficient to trial divide by some 'small' primes to root out composite numbers quickly, before applying a more sophisticated test. In this context, 'small' can mean quite a bit larger than you are used to. Also, here it would be worth pre-computing the list of 'small' primes we try. (Some tests may require checking that n is not divisible by some of the very smallest primes, such as 2 and 3, but it will always be advisable to use trial division somewhat beyond this.)
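
As a minimal sketch, the compromise just described (divide by 2, 3 and then the numbers 6k ± 1 up to √n) might look like this in Python; the function name and the handling of the leftover cofactor are mine, not part of the notes.

    def trial_division(n):
        """Factorise n > 1 by dividing by 2, 3 and then 6k - 1, 6k + 1."""
        factors = []
        for p in (2, 3):
            while n % p == 0:
                factors.append(p)
                n //= p
        d = 5                      # candidates 5, 7, 11, 13, ... (6k +/- 1)
        while d * d <= n:
            for q in (d, d + 2):
                while n % q == 0:
                    factors.append(q)
                    n //= q
            d += 6
        if n > 1:
            factors.append(n)      # whatever remains is prime
        return factors

For example, trial_division(589) returns [19, 31].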

8.1.2 Some potentially useful results

It is a bit surprising at first that primality testing can be easier than factorisation. This holds because there are algorithms which decide whether or not a number is prime without actually finding a factor if it is composite! Three examples of such theorems are given below.

Theorem 34 (Fermat's Little Theorem) If n > 1 is prime then x^n ≡ x (mod n) for any integer x.

Theorem 35 If n > 1 is prime and (for some integer x) x^2 ≡ 1 (mod n) then x ≡ 1 or −1 (mod n).

Theorem 36 (Wilson's Theorem) An integer n > 1 is prime if and only if we have (n − 1)! ≡ −1 (mod n).

We have already seen the proof of the little Fermat theorem. For the second of these we note that if p is prime and x^2 ≡ 1 (mod p) then p | (x^2 − 1) = (x + 1)(x − 1), and so p | (x + 1) or p | (x − 1).

Here is the proof of Wilson's Theorem. Suppose that n = p is prime. We know that every number x in the set {1, ..., p − 1} has an inverse y modulo p (so that xy ≡ 1 (mod p)). The only numbers which are equal to their inverses are 1 and p − 1: for if x is equal to its inverse, then x^2 ≡ 1 (mod p), and so x ≡ ±1 (mod p) (see above). The other p − 3 numbers in the range can be paired with their inverses, so that the product of each pair is congruent to 1 mod p. Now multiplying all these numbers together gives (p − 1)! ≡ 1 · 1^((p−3)/2) · (p − 1) ≡ −1 (mod p). Conversely, suppose that n is composite. If q is a prime divisor of n, then certainly q divides (n − 1)!, and so (n − 1)! is congruent to a multiple of q modulo n, and so cannot be −1. (In this case, it is quite easy to show that n is a divisor of (n − 1)!, with the single exception of n = 4.) The proof of Wilson's Theorem is not examinable.

Can any of these results give us a quick test for primality? Fermat's Little Theorem gives a necessary, but not sufficient, condition that n be prime. We shall say more about this in the next subsection.

For Theorem 35, we note that if n = 2, 4, p^a or 2p^a, where p is an odd prime and a ∈ Z^+, then x^2 ≡ 1 (mod n) implies that x ≡ ±1 (mod n); if n > 1 is any other integer then this implication does not hold. We can detect whether n is even or a proper power relatively easily, and thus we can easily know if n has any hope of being in {2, 4, p^a (a ≥ 2), 2p^a}. So we reduce to the case where n is an odd prime or has two distinct odd prime divisors. In the former case, we do not know how to show (directly and fast enough) that if x^2 ≡ 1 (mod n) then x ≡ ±1 (mod n). And in the latter case, we do not know how to find (in an efficient way, even probabilistically, except if n is somewhat special) x such that x^2 ≡ 1 (mod n) and x ≢ ±1 (mod n). (The Miller–Rabin test makes the latter potential problem into a non-issue.)

In contrast, Wilson's Theorem gives a necessary and sufficient condition for n to be prime. That is, if (n − 1)! ≡ −1 (mod n), then n is prime; if not, then not. But nobody knows how to calculate (n − 1)! (mod n) sufficiently quickly (or even how to determine whether gcd(n, (n − 1)!) = 1 efficiently); the natural method would require n − 2 multiplications.
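
For concreteness, the natural method just mentioned can be written in a few lines of Python (a sketch; the function name is mine), which makes its cost of n − 2 modular multiplications plain.

    def wilson_test(n):
        """Decide primality of n > 1 via Wilson's Theorem, computing
        (n - 1)! mod n with n - 2 multiplications: correct but far too slow."""
        f = 1
        for a in range(2, n):
            f = f * a % n
        return f == n - 1          # (n - 1)! ≡ -1 (mod n) iff n is prime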

8.1.3 The Fermat test

Strictly speaking this is a compositeness test: if n > 1 and you have found an x such that x^n ≢ x (mod n) then you know that n is composite, without having found any of the factors of n. If there is an x with this property, then we can reduce it modulo n to get an x with 0 ≤ x < n that still satisfies x^n ≢ x (mod n). As we explained in Notes 7, there is an efficient way to calculate x^n mod n, involving at most 2 log_2 n multiplications of numbers not exceeding n and calculation of the remainder modulo n after each multiplication, so each individual Fermat test is polynomial time. The possible outcomes of each Fermat test are (n is definitely) COMPOSITE if x^n ≢ x (mod n) and (n is) MAYBE PRIME if x^n ≡ x (mod n).

Thus, for example, 2^589 ≡ 326 ≢ 2 (mod 589), and so we know that 589 is composite. For two larger examples, we take m = 7 · 10^11 + 1 and n = 4 · 10^12 + 1 and find that 2^m ≡ 295817535273 ≢ 2 (mod m) and 2^n ≡ 3374782511254 ≢ 2 (mod n). In none of these three cases did the test reveal a (prime) factor of the number proven to be composite. (The factorisations are 589 = 19 · 31 (very easy), 7 · 10^11 + 1 = 41149 · 17011349 and 4 · 10^12 + 1 = 277 · 7213 · 2002001; part of the latter factorisation stems from the algebraic identity 4X^4 + 1 = (2X^2 + 2X + 1)(2X^2 − 2X + 1).) Unfortunately, this test doesn't work in the other direction. For example, we have

2^341 ≡ 2 (mod 341), even though 341 = 11 · 31 is not prime. We say that 341 is a pseudoprime to (the) base 2.

Definition An integer n > 1 is a (weak or Fermat) pseudoprime to the base a if n is not prime but a^n ≡ a (mod n). (We really ought to specify that the integer a is at least 2, since every composite number would be a pseudoprime to the bases 0 and 1.) Any a such that a^n ≢ a (mod n) is a Fermat witness (to the compositeness of n), while the other a are Fermat liars.
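
A minimal sketch of a single Fermat test in Python (the function name and string outputs are mine; Python's built-in pow performs modular exponentiation by repeated squaring):

    def fermat_test(n, x):
        """One Fermat test of n with base x: 'COMPOSITE' is always correct,
        'MAYBE PRIME' only means no contradiction was found."""
        if pow(x, n, n) != x % n:
            return "COMPOSITE"
        return "MAYBE PRIME"

With this, fermat_test(589, 2) returns COMPOSITE, while fermat_test(341, 2) returns MAYBE PRIME even though 341 = 11 · 31.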

Pseudoprimes are not very common, though it is known that there are still infinitely many of them, and if we are prepared to take the small risk that the number we chose is a pseudoprime rather than a prime, then we could simply accept such numbers. The smallest pseudoprimes to base 2 are 341, 561, 645, 1105, 1387, 1729, 1905, 2047. We could feel safer if we checked different values. For example, although 341 is a pseudoprime to base 2, we find that 3^341 ≡ 168 ≢ 3 (mod 341), so that 341 is definitely composite. Unfortunately even this does not give us complete confidence. A Carmichael number is defined to be a composite number that is a pseudoprime to every possible base. It seems unlikely that such numbers exist, but they do! However, they are quite rare. More formally, we have the following.

Definition An integer n > 1 is said to be a Carmichael number if n is not prime and one of the following two equivalent conditions hold.

(a) We have a^n ≡ a (mod n) for all integers a.

(b) We have a^(n−1) ≡ 1 (mod n) for all integers a such that gcd(a, n) = 1.

Of course, we can restrict a in the above so that 0 ≤ a < n.

Proposition 37 The following all hold.

1. Conditions (a) and (b) in the above definition are equivalent.

2. The positive integer n is a Carmichael number if and only if it is composite and λ(n) divides n − 1.

3. Every Carmichael number is the product of distinct primes.

(Note that λ(n) | (n − 1) if and only if n is square-free and (p − 1) | (n − 1) for every prime divisor p of n.)

Proof (Not examinable.) The implication (a) ⇒ (b) is trivial (we can cancel the a's when gcd(a, n) = 1). Let n be a Carmichael number with respect to version (b) of the definition. Pick a such that a has order λ(n) modulo n (we know such an a exists). A theorem from Notes 7 then gives λ(n) | (n − 1). Conversely, if λ(n) | (n − 1), say n − 1 = kλ(n) for some k ∈ Z, then for any a with gcd(a, n) = 1 we have a^(n−1) = (a^λ(n))^k ≡ 1^k = 1 (mod n), so that n is Carmichael with respect to version (b) of the definition. And if p^2 | n for some prime p then p | λ(n), and so λ(n) ∤ (n − 1), and therefore n is not Carmichael (w.r.t. version (b)). Finally, we can now prove that if n is version (b) Carmichael then it is version (a) Carmichael. Note that n = p_1 ··· p_s is a product of distinct primes. For all a we have a^(p_i) ≡ a (mod p_i) for all i (by Fermat), and so for all i we have a^n ≡ a (mod p_i) (since (p_i − 1) | λ(n) and λ(n) | (n − 1)). The Chinese Remainder Theorem then gives a^n ≡ a (mod n).

Now consider n = 561 = 3 · 11 · 17. We have λ(n) = lcm(2, 10, 16) = 80 and 80 divides 560, and so 561 is a Carmichael number. The twelve smallest Carmichael numbers are 561, 1105, 1729, 2465, 2821, 6601, 8911, 10585, 15841, 29341, 41041 and 46657. Famously, 1729 = 1^3 + 12^3 = 9^3 + 10^3; this helps in its factorisation since x^3 + y^3 = (x + y)(x^2 − xy + y^2) for all x and y. It is known that there are infinitely many Carmichael numbers.

It is natural to ask which integers a one should use in the Fermat test when trying to determine the primality of n. Obviously, a = 0 or 1 is useless, and a = −1 is useless if n is odd. (And of course, if n is odd, any a such that a ≡ 0, ±1 (mod n) is useless.) And if a^n ≡ a (mod n) and b^n ≡ b (mod n) then (ab)^n ≡ ab (mod n), that is, the set of Fermat liars is closed under multiplication. So we should only try a = −1 (for n even only) or prime values of a. If n is composite and not Carmichael, then at least 1/2 of all integers a such that gcd(a, n) = 1 and 0 < a < n are Fermat witnesses to the compositeness of n. (This statement need not apply to those a which are prime.) Refinements of this (Fermat) test due to Solovay and Strassen and to Rabin give fast algorithms which could conclude either that n is certainly composite or that n is 'probably prime', where our degree of confidence could be made as close to 1 as required.
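
A brute-force check of condition (a), feasible only for small n, might look like this in Python (both function names are mine and purely illustrative):

    from math import isqrt

    def is_prime_naive(n):
        """Trial division; adequate for the small n used here."""
        return n > 1 and all(n % d != 0 for d in range(2, isqrt(n) + 1))

    def is_carmichael(n):
        """Composite n with a^n ≡ a (mod n) for every a, i.e. condition (a)."""
        return not is_prime_naive(n) and all(pow(a, n, n) == a for a in range(n))

Then is_carmichael(561) returns True, while is_carmichael(341) returns False (341 fails at base 3, as noted above).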

8.1.4 The Miller–Rabin primality test

(The examinable material in this subsection consists of the basic test; the definition of strong pseudoprimes and their relation to weak pseudoprimes; the Fact and its use to give a one-sided Monte Carlo test for primality; possible factorisation in certain cases. Material such as Bach's Theorem and ERH is not examinable.)

This is a probabilistic algorithm based on the same idea as we saw in the last chapter, for factorising N = pq when an encryption–decryption pair (e, d) for RSA modulo N is known. This test (Miller–Rabin) combines the Fermat test with the observation that if p is prime and x^2 ≡ 1 (mod p) then x ≡ ±1 (mod p). The possible outputs of each run of the algorithm are (definitely) COMPOSITE or MAYBE PRIME. (We note that there is only any real point in applying Miller–Rabin when n is odd; for n even it is simply the Fermat test.) In its simple form, a single run of Miller–Rabin to test n for primality (or not) operates as follows. (Unlike the corresponding earlier algorithm, there is no need to check whether gcd(x, n) = 1.)

• Let n − 1 = 2^b · c where c is an odd integer. (This is really preprocessing.)

• Choose an integer x such that 0 < x < n.

• Now let y = x^c mod n. (Use the repeated squaring method, and perform reductions mod n whenever possible.)

• By repeated squaring and reduction modulo n, calculate y_0 = y, y_1, y_2, ..., y_b, so that y_i = (y_(i−1))^2 mod n for i = 1, 2, ..., b. (Thus y_i ≡ x^(2^i · c) (mod n) for all i.)

• If y_0 ≡ 1 (mod n) or y_i ≡ −1 (mod n) for some i with 0 ≤ i ≤ b − 1 then return MAYBE PRIME (and exit).

• In any other case, return COMPOSITE.

The first thing to note about this algorithm is that y_b ≡ x^(n−1) (mod n), and that the only cases for which MAYBE PRIME is returned have y_b ≡ 1 (mod n), and so in all such cases n is a Fermat pseudoprime to base x. The converse of this is false. For example, we saw that 341 is a Fermat pseudoprime to base 2. But it is not a Miller–Rabin pseudoprime to that base, since we have 341 − 1 = 2^2 · 85 and 2^85 ≡ 32 (mod 341) and 2^170 ≡ 1 (mod 341). And if n is prime, then gcd(x, n) = 1 and so x^(n−1) ≡ 1 (mod n), and so Miller–Rabin will return MAYBE PRIME in that case. But there are pseudoprimes, even for this test.
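
A minimal Python sketch of one run of this test, together with the k-round 'probably prime' wrapper discussed later in this subsection (the function names and string outputs are mine):

    import random

    def miller_rabin(n, x):
        """One run of the basic Miller-Rabin test on odd n > 2 with 0 < x < n."""
        b, c = 0, n - 1
        while c % 2 == 0:            # write n - 1 = 2^b * c with c odd
            b += 1
            c //= 2
        y = pow(x, c, n)             # y_0 = x^c mod n
        if y == 1 or y == n - 1:     # y_0 ≡ ±1 (mod n)
            return "MAYBE PRIME"
        for _ in range(b - 1):       # compute y_1, ..., y_{b-1}
            y = y * y % n
            if y == n - 1:
                return "MAYBE PRIME"
        return "COMPOSITE"

    def is_probable_prime(n, k=20):
        """One-sided Monte Carlo test: COMPOSITE is always correct; a wrong
        PROBABLY PRIME verdict has probability at most 1/4**k."""
        for _ in range(k):
            if miller_rabin(n, random.randrange(1, n)) == "COMPOSITE":
                return "COMPOSITE"
        return "PROBABLY PRIME"

For instance miller_rabin(341, 2) returns COMPOSITE while miller_rabin(2047, 2) returns MAYBE PRIME, matching the discussion above and below.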

Definition We say that an integer n > 1 is a (strong or Miller–Rabin) pseudoprime to base a if n is composite, and either we are in the degenerate case where n | a or the Miller–Rabin test returns MAYBE PRIME when using the integer x = a mod n. (We normally limit this terminology to the case a ≥ 2.)

The ten smallest Miller–Rabin pseudoprimes to base 2 are 2047 = 2^11 − 1 = 23 · 89, 3277, 4033, 4681, 8321, 15841, 29341, 42799, 49141 and 52633. When we apply

Miller–Rabin to n = 2047 at the base 2, we first note that n − 1 = 2 · 1023 (and that 1023 = 11 · 93 = 3 · 11 · 31), and so we get y_0 ≡ 2^1023 ≡ 1 (mod 2047). But 2047 is not a Miller–Rabin pseudoprime to base 3 since 3^1023 ≡ 1565 ≢ ±1 (mod 2047) and 3^2046 ≡ 1013 ≢ 1 (mod 2047), so 2047 is not even a Fermat pseudoprime to base 3. Given our experience with Carmichael numbers, it is reasonable to ask whether a composite number can be a strong pseudoprime to all bases. This time, the answer is "no", and for given composite n it is easy to find a Miller–Rabin witness.

Fact Let n > 1 be an odd composite integer. Then n is a strong pseudoprime to base a for at most 1/4 of the a such that 0 < a < n. (The constant 1/4 is usually a massive overestimate. Numbers that come close to achieving it are 9, certain Carmichael numbers and (m+1)(2m+1) where both m+1 and 2m+1 are prime. The constant 1/4 represents the maximum probability of 'failure'; in the corresponding algorithm from Notes 7, the maximum probability of 'failure' at each attempt is 1/2.)

We turn this into a probabilistic (one-sided Monte Carlo) test as follows. Take an odd n > 1, and run Miller–Rabin k times, for a different random x each time. If any of these tests outputs COMPOSITE, then we do not run any more of the tests, and output that n is COMPOSITE. If all of the tests output MAYBE PRIME, then we return that n is PRIME or, more accurately, PROBABLY PRIME. In this case, the output COMPOSITE is guaranteed to be correct, and if the output is PRIME then n is composite with probability at most 1/4^k; for k = 20, this probability is less than one in a billion. (I use the word billion in its proper sense, that is one billion is 10^12; there are some strange uses of this word.) A randomised 'algorithm' that always gives a correct result is termed Las Vegas; the corresponding algorithm from Notes 7 is Las Vegas.

It must be stressed that these probabilities apply to numbers n encountered in the 'wild'. In an adversarial setting, if you choose to apply Miller–Rabin only to certain bases, an enemy could pass off as 'prime' a composite number that was a pseudoprime to those bases. The same applies to poorly chosen 'randomisation', but not to true randomisation (except if you are unlucky).

(With the exception of one paragraph, the rest of this subsection, §8.1.4, is not examinable.)

Another question is whether we can make this into a deterministic algorithm by testing enough bases. We have the following.

Theorem 38 (Eric Bach) Let n > 1 be an odd integer. If ERH (the Extended Riemann Hypothesis) holds, and the Miller–Rabin test returns MAYBE PRIME for each integer a such that 2 ≤ a ≤ min(2(log n)^2, n − 2), then n is prime.

This would make Miller–Rabin into a polynomial time algorithm. Unfortunately, the truth of ERH is an unsolved problem. Even the 'basic' Riemann Hypothesis remains

unsolved (as of 2019); it is also one of the Millennium Prize Problems. The various Riemann Hypotheses are problems about the locations of zeros of certain functions. We can also use Miller–Rabin to produce deterministic primality tests of the form: if n > 1 is odd and n < N and Miller–Rabin returns MAYBE PRIME when testing n for each base a ∈ S, then n is prime. Some such results are given below.

N                     members of S
2047                  2
1373653               2, 3
25326001              2, 3, 5
3215031751            2, 3, 5, 7
2152302898747         2, 3, 5, 7, 11
3474749660383         2, 3, 5, 7, 11, 13
341550071728321       2, 3, 5, 7, 11, 13, 17

But some people have done a lot of work determining pseudoprimes (using something more sophisticated than trial division), so that you don't have to. In fact, 1373653 is a strong pseudoprime to bases 2 and 3. In contrast to the Fermat case, there is no multiplicative closure for Miller–Rabin liars. However, most strong pseudoprimes to bases 2 and 3 are also strong pseudoprimes to base 6; the smallest one that isn't is 106485121.

(This paragraph is examinable.) We can sometimes use Miller–Rabin to find factors of composite n. In particular, this applies to Carmichael numbers, where if Miller–Rabin returns COMPOSITE we have either gcd(n, x) ≠ 1 or a z such that z^2 ≡ 1 (mod n) but z ≢ ±1 (mod n). In the latter case, we have y_b ≡ x^(n−1) ≡ 1 (mod n) and y_0 ≢ 1 (mod n), so for some i we have (y_i)^2 ≡ 1 ≢ y_i (mod n), and we have y_i ≢ −1 (mod n), since we did not return MAYBE PRIME. We then take gcd(y_i + 1, n) and gcd(y_i − 1, n) to get proper factors of n. This method works pretty well whenever n has a good chance of being a Fermat pseudoprime to a given base, for example n = 2047, where this probability is 23/89 (or 1/4 for bases coprime to n). For example, if we apply Miller–Rabin to n = 2047 at the base a = 25 we get a^1023 ≡ 622 (mod 2047) and a^2046 ≡ 1 (mod 2047), and so 622^2 ≡ 1 (mod 2047), giving gcd(621, 2047) = 23 and gcd(623, 2047) = 89 as proper factors of 2047.

When implemented on a computer, a single run of Miller–Rabin to test n for primality (or not) operates roughly as follows. This version also includes opportunities to find non-trivial factors of n. (This more complicated version is not examinable.)

• Let n − 1 = 2^b · c where c is an odd integer. (This is really preprocessing.)

• Choose an integer x such that 0 < x < n.

• Calculate gcd(x, n). If this is not 1 then we can return COMPOSITE and exit. We also get a non-trivial factor of n. (The only point of performing this step is to get the factor.)

• Now let y = x^c mod n. (Use the repeated squaring method, and perform reductions mod n whenever possible.) Also set i = 0.

• If y ≡ 1 (mod n) then we return MAYBE PRIME and exit.

• While i < b and y ≢ ±1 (mod n) we do the operations z := y, y := y^2 mod n, i := i + 1.
(We note that the value of y corresponding to i is y_i = x^(2^i · c) mod n. When we exit this loop it'll be because (i) 0 ≤ i < b and y = y_i ≡ −1 (mod n); (ii) 0 < i < b and y = y_i ≡ z^2 ≡ 1 (mod n) and z ≢ ±1 (mod n); or (iii) i = b.)

• If y ≡ −1 (mod n) then we return MAYBE PRIME and exit.

• If y ≡ 1 (mod n) then we return COMPOSITE, and if desired gcd(z + 1,n) and gcd(z − 1,n), which are proper factors of n. We then exit.

• In all other cases (we now have i = b and y = y_b ≢ ±1 (mod n)), we return COMPOSITE and then exit.
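
A Python sketch of this fuller version, returning a pair of proper factors when one of the gcd opportunities arises (names mine; the logic follows the steps above):

    from math import gcd

    def miller_rabin_with_factors(n, x):
        """One run on odd n > 2 with 0 < x < n; returns a verdict and,
        when possible, a pair of proper factors of n."""
        g = gcd(x, n)
        if g != 1:
            return "COMPOSITE", (g, n // g)
        b, c = 0, n - 1
        while c % 2 == 0:                  # n - 1 = 2^b * c, c odd
            b += 1
            c //= 2
        y, i = pow(x, c, n), 0
        if y == 1:
            return "MAYBE PRIME", None
        while i < b and y != 1 and y != n - 1:
            z, y, i = y, y * y % n, i + 1
        if y == n - 1:
            return "MAYBE PRIME", None
        if y == 1:                         # z^2 ≡ 1 but z ≢ ±1 (mod n)
            return "COMPOSITE", (gcd(z + 1, n), gcd(z - 1, n))
        return "COMPOSITE", None           # i = b and y ≢ ±1 (mod n)

For example miller_rabin_with_factors(2047, 25) returns ('COMPOSITE', (89, 23)), recovering 2047 = 23 · 89 as in the worked example above.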

8.1.5 Verifying that a number is prime

Is it possible to verify that a number n > 1 is prime in polynomial time, given some extra information? The answer to this question is "yes", and here is one way to do it. From earlier results (in Notes 7), we have λ(n) ≤ φ(n) < n − 1 if n is composite, while if n is prime then λ(n) = φ(n) = n − 1 and a primitive element modulo n (one of order n − 1) exists. A result from Notes 7 tells us when something has order m. Putting all this together gives us the following two theorems.

Theorem 39 An integer n > 1 is prime if and only if there is an integer a such that a has order n − 1 modulo n.

Theorem 40 (Lucas) An integer n > 1 is prime if and only if there is an integer a such that (i) a^(n−1) ≡ 1 (mod n), and (ii) a^((n−1)/q) ≢ 1 (mod n) whenever q is a prime divisor of n − 1.

The latter theorem forms the basis of the test. We must apply it recursively to the prime factors of n − 1, and so on, until all primes we have are small enough to be proven prime by trial division. The associated certificate (the Pratt certificate) consists of all

non-small primes p found during the recursion (including n, of course), together with a primitive element modulo p and the prime factors of p − 1 (multiplicity does not matter). Verification of this certificate runs in polynomial time. At each step, we must perform at most log n (if n > 42) computations a^m (mod n) for various integers m with 0 < m < n, so each step is certainly polynomial in the input size (a constant times log n modular exponentiations). We must also consider how many primes the recursion forces us to verify in this manner. It is not large enough to destroy the polynomial time property.

On the other hand, finding this certificate requires factorising p − 1 for various primes p, and so is hard in general. (Finding the primitive elements is, at least conjecturally, not really an issue.)
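
A sketch of the verification step for one prime in such a certificate, in Python (the function name is mine; a full verifier would also check, recursively or by trial division, that the listed factors of n − 1 are themselves prime):

    def lucas_check(n, a, prime_factors):
        """Check conditions (i) and (ii) of Theorem 40 for n, using the
        claimed primitive element a and the claimed prime factors of n - 1."""
        if pow(a, n - 1, n) != 1:
            return False
        return all(pow(a, (n - 1) // q, n) != 1 for q in prime_factors)

With the example below, lucas_check(4 * 10**13 + 1, 3, [2, 5]) returns True.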

Example We first consider p = 4 · 10^13 + 1 = 40000000000001, which is suspected of being prime. In this case, it is rather easy to factor p − 1: we have p − 1 = 2^15 · 5^13. We must first find a primitive element modulo p, by testing in turn the numbers 2, 3, 4, 5, ... that are not proper powers. We claim 3 is a primitive element modulo p (we note that 2 has order 2 · 10^13 modulo p). This is because 3^((p−1)/5) = 3^8000000000000 ≡ 24408569584485 ≢ 1 (mod p) and 3^((p−1)/2) = 3^20000000000000 ≡ 40000000000000 ≡ −1 ≢ 1 (mod p), and so 3^(p−1) = 3^40000000000000 ≡ 1 (mod p). And that's it! We have now proved that p is indeed a prime. There is no recursion in this case, since 2 and 5 are obviously prime. This is obviously a lot faster than trial dividing by (at least) all primes up to and including ⌊√p⌋ = 6324555. (There are 433652 of them.)

Example We now consider the 18-digit prime p = 850444127862962813. This example has quite a bit of recursion. Horizontal lines separate the layers of recursion.

Prime q               Primitive Element   Factors of q − 1
850444127862962813    2                   2^2 · 212611031965740703
---------------------------------------------------------------------
212611031965740703    3                   2 · 3 · 7 · 47189 · 107274310279
---------------------------------------------------------------------
107274310279          3                   2 · 3 · 24499 · 729787
47189                 2                   2^2 · 47 · 251
---------------------------------------------------------------------
729787                2                   2 · 3 · 121631
24499                 2                   2 · 3^2 · 1361
---------------------------------------------------------------------
121631                22                  2 · 5 · 12163
1361                  3                   2^4 · 5 · 17
---------------------------------------------------------------------
12163                 5                   2 · 3 · 2027
---------------------------------------------------------------------
2027                  2                   2 · 1013
---------------------------------------------------------------------
1013                  3                   2^2 · 11 · 23

8.1.6 The Agrawal–Kayal–Saxena primality test (not examinable)

The method finally used by Agrawal, Kayal and Saxena was a kind of combination of Fermat's Little Theorem and Wilson's Theorem, together with some ingenuity. They begin with the remark that n is prime if and only if

(x − 1)^n ≡ x^n − 1 (mod n)

as polynomials, rather than integers. This is because the coefficients in (x − 1)^n are binomial coefficients C(n, i); if n is prime, then C(n, i) is a multiple of n for i = 1, ..., n − 1, but if n has a prime factor q then C(n, q) is not a multiple of n. This is no good as it stands; we can raise x − 1 to the nth power with only 2 log_2 n multiplications, but the polynomials we have to deal with along the way have as many as n terms, too many to write down. So the trick is to work mod (n, x^d − 1) for some carefully chosen number d. I refer to their paper for the details.

This test has great theoretical importance as the first provably polynomial time primality test, but appears not to be used in practice as of 2019. (Polynomial of too high a degree and bad implied constant.) Instead, the best (fully proved) general purpose algorithm appears to be the Elliptic Curve Primality Prover (ECPP), which may well run in polynomial time anyway.
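
As an illustration of the opening remark only (and not of the AKS algorithm itself), one can check the polynomial congruence directly via the binomial coefficients; the following Python sketch is exponential-time, which is exactly the problem the 'mod (n, x^d − 1)' trick addresses.

    from math import comb

    def naive_aks_criterion(n):
        """n > 1 is prime iff every binomial coefficient C(n, i), 0 < i < n,
        is divisible by n, i.e. (x - 1)^n ≡ x^n - 1 (mod n) as polynomials."""
        return all(comb(n, i) % n == 0 for i in range(1, n))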

8.1.7 Two special primality tests (not examinable)

(This subsection was inserted for interest only and it is certainly not examinable. Also, it wasn't lectured, and possibly never will be.)

We first state Pépin's Test, which determines whether a Fermat number F_n = 2^(2^n) + 1 is prime. (Other numbers of the form 2^m + 1, with m not a power of 2, have an algebraic factor.) As of the time of writing only F_0, ..., F_4 (that is 3, 5, 17, 257, 65537) are known to be prime. It is known that F_5, ..., F_32 and various F_n for n > 36 are composite, and the status of F_33 is unknown. The test (a form of the Pratt certificate) is as follows.

Theorem 41 (Pépin) Let n > 1 be an integer. Then the Fermat number F_n = 2^(2^n) + 1 is prime if and only if

3^(2^(2^n − 1)) ≡ −1 (mod 2^(2^n) + 1).

Note that if this congruence holds, then 3 has multiplicative order 2^(2^n) modulo F_n, and so F_n is prime. On the other hand, if n > 1 and F_n is prime, it can be shown that 3 is a non-square modulo F_n, and thus must be a primitive element modulo F_n. And since (3^(2^(2^n − 1)))^2 = 3^(2^(2^n)) ≡ 1 (mod F_n), we get that 3^(2^(2^n − 1)) ≡ −1 (mod F_n), since only ±1 square to 1 modulo a prime.
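
A short Python sketch of Pépin's Test (the function name is mine; it is feasible only for small n, since F_n grows doubly exponentially):

    def pepin(n):
        """Pépin's test: F_n = 2^(2^n) + 1 is prime iff
        3^((F_n - 1)/2) ≡ -1 (mod F_n)."""
        F = 2 ** (2 ** n) + 1
        return pow(3, (F - 1) // 2, F) == F - 1

For example pepin(4) returns True (65537 is prime) and pepin(5) returns False.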

For large n, it is rather expensive to perform Pépin's Test (for example F_33 has 2585827973 decimal digits). The largest number for which there is any evidence (known to me) of the test having been applied is F_24 = 2^16777216 + 1, with 5050446 decimal digits. It is usually easier to prove F_n is composite by finding a non-trivial factor, using trial division! But one does not have to try every prime factor: if q is a prime divisor of F_n then 2^(2^n) ≡ −1 (mod q), so that 2 has order 2^(n+1) modulo q. This implies (by Lagrange's Theorem) that 2^(n+1) | q − 1, that is q ≡ 1 (mod 2^(n+1)). And if n ≥ 2 then q ≡ 1 (mod 8), and so 2 is a square modulo q (using a standard result on quadratic residues), and √2 has order 2^(n+2) modulo q. Therefore q ≡ 1 (mod 2^(n+2)). The trial division by q consists of n squarings and reductions modulo q, starting with 2; we have success if the final result is congruent to −1 modulo q.

For example, to show F_4 = 2^16 + 1 = 65537 is prime, we need only test for divisibility by 65, 129 and 193 (as 257 > √65537 is too large), and we needn't consider the obviously composite 65 and 129, leaving only 193 to test.

For F_5 = 2^32 + 1 = 4294967297, candidate factors are in the list 129, 257, 385, 513, 641, 769, 897, 1025, 1153, 1281, ..., and removing some obvious composites (for example, multiples of 3 and 5) leaves us with 257, 641, 769, 1153, 1409, 1537, 1793, 1921, 2177, 2561, 2689, ..., and testing these soon gives us the factor 641. The cofactor is 6700417, which is prime, which can be confirmed by testing divisibility by numbers in this list less than √6700417 ≈ 2588.516. (Some of these are composite, for example 1537 = 29 · 53.) Let us see how divisibility by 641 works. We start off with 2, and square it repeatedly five times, reducing modulo 641 at each step. The outputs are 4, 16, 256, 65536 ≡ 154 (mod 641) and 154^2 = 23716 ≡ 640 ≡ −1 (mod 641), and so F_5 = 2^32 + 1 is divisible by 641. (There is a proof that 641 | (2^32 + 1) based on 641 = 2^7 · 5 + 1 = 5^4 + 2^4, so that 2^32 + 1 ≡ −(2^28 · 5^4) + 1 = −(2^7 · 5)^4 + 1 ≡ −(−1)^4 + 1 = 0 (mod 641); this does not generalise in any sensible manner.)

The second test is for the Mersenne numbers M_m = 2^m − 1. Now 2^a − 1 divides 2^m − 1 whenever a | m, since x^a − 1 divides x^m − 1 in this case. So there is only any point in testing 2^m − 1 for primality when m is prime (which can be done by trial division, as this will be much faster than anything else you do).

Theorem 42 (Lucas–Lehmer) Let p be an odd prime. Define the sequence u_0, u_1, u_2, ... of integers recursively by

u_0 = 4,   u_(j+1) = u_j^2 − 2 for all j ≥ 0.

Then the Mersenne number M_p = 2^p − 1 is prime if and only if u_(p−2) ≡ 0 (mod M_p). (Of course M_2 = 3 is prime.)

It is readily shown (by induction) that u_j = (2 + √3)^(2^j) + (2 − √3)^(2^j) for all integers j ≥ 0. Thus the Lucas–Lehmer Test asserts that (for p an odd prime) M_p is prime if and only if 2 + √3 has order 2^p in something, which I am not going to specify.

The sequence u_0, u_1, u_2, ... grows alarmingly quickly; the first few terms are 4, 14, 194, 37634, 1416317954. In order to make Lucas–Lehmer into an effective test, we must reduce the u_j modulo M_p at every step, that is we define u_(j+1) = (u_j^2 − 2) mod M_p. Even then, the Lucas–Lehmer Test is expensive (on multi-million digit numbers), and it is cheaper to use trial division to locate 'small' factors of M_p. It can be shown that if q is a prime divisor of M_p (p an odd prime) then q ≡ 1 (mod 2p) and q ≡ ±1 (mod 8). The trial division must be performed modulo q to check whether 2^p ≡ 1 (mod q). The Great Internet Mersenne Prime Search (GIMPS) does all these things.

The Pépin and Lucas–Lehmer Tests for the numbers 2^m ± 1 are essentially of equal difficulty, and much faster than any general primality test would be for numbers of this size. They both involve about m squarings of (at most) m-bit numbers, perhaps m subtractions of 2, and m reductions modulo 2^m ± 1. For computers, working in binary, the modular reductions are particularly easy, being either addition or subtraction of m-bit numbers. (There is one trivial and very easily detected exception to all this.) When testing numbers of realistic size, for example the primes M_13466917 and M_20996011, and the composite number F_24 = 2^16777216 + 1, even doing the multiplications (squarings) is a problem, especially since we must do so many of them. We want something a lot faster than up to C_1 m^2 bit operations (Long Multiplication) or C_2 m^(log_2 3) bit operations (Karatsuba), where C_1 and C_2 are constants. We want something more like Schönhage–Strassen, which takes time C_3 m log m log log m, for some constant C_3. There is some devil in the detail of the constants, and unsurprisingly C_3 > C_2 > C_1; for m ≈ 2^24 = 16777216 we are comfortably in the realm where Schönhage–Strassen outperforms Karatsuba and Long Multiplication. (Please seek more information elsewhere, if you are interested.)

So now we can say why so many of the largest known primes are Mersenne numbers: they have a faster primality test than almost any other numbers of similar size, and in a given size range there are so many more Mersenne numbers than Fermat numbers.
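
A minimal Python sketch of the Lucas–Lehmer Test with the reduction modulo M_p built in (function name mine):

    def lucas_lehmer(p):
        """For an odd prime p, decide whether M_p = 2^p - 1 is prime by
        checking u_{p-2} ≡ 0 (mod M_p), with u_0 = 4, u_{j+1} = u_j^2 - 2."""
        M = 2 ** p - 1
        u = 4
        for _ in range(p - 2):
            u = (u * u - 2) % M
        return u == 0

For instance lucas_lehmer(13) returns True (8191 is prime) while lucas_lehmer(11) returns False, since M_11 = 2047 = 23 · 89 as we saw earlier.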

8.2 Factorisation

There is not a lot to say about factorisation: it is a hard problem! There are some special tricks which have been used to factorise huge numbers of some special forms, such as Fermat numbers 2^(2^m) + 1 and Mersenne numbers 2^p − 1 (for p prime). (More generally, the Cunningham project tries to factor the numbers b^m ± 1 for b = 2, 3, 5, 6, 7, 10, 11, 12 and large m; other people extend this to larger bases b.) Of course, we would avoid such special numbers when designing a cryptosystem.

However, one should not overestimate the difficulty of factorisation. Numbers with well over 100 digits can be factorised with today's technology; several powerful

factoring algorithms exist. The gap between primality testing and factorisation is sufficiently narrow that it is necessary to keep updating an RSA system to use larger primes.

And nor should we underestimate the power of today's computers. My desktop verified that p = (10^19 − 1)/9 was prime in under 10 seconds using trial division, so any integer up to this size can be factored at least as quickly using trial division. (In this case, it is much faster to produce and verify the Pratt certificate for p, even if we use trial division to factor p − 1 = 2 · 3^2 · 5 · 7 · 11 · 13 · 19 · 37 · 52579 · 333667; this is because all the prime factors of p − 1 are fairly small. We have not taken advantage of the algebraic relation that p − 1 = 10(10^9 − 1)(10^9 + 1)/9, which would speed up the factorisation.)

Later we may touch on quantum computing and see why the advent of this technology (if and when it comes) will allow efficient factorisation of large numbers and make the RSA system insecure. (As of November 2019 we are not there; 55 qubits [reports vary] are simply insufficient to apply Shor's Algorithm to RSA-sized numbers. But Shor's Algorithm isn't the only quantum factorisation algorithm.)

We discuss briefly just two factorisation techniques other than trial division: Fermat factorisation and Pollard's p − 1 method. Neither is a particularly general method, but they are easier to describe than other methods. And there is a powerful method based on Fermat factorisation.

8.2.1 Fermat factorisation

We only describe the basic form of this method, and do not seek improvements. This method is effective at finding factors of N in the vicinity of √N.

The basis for this method is that x^2 − y^2 = (x + y)(x − y). So we attempt to write the integer N we wish to factorise as N = x^2 − y^2 = (x − y)(x + y) for suitable integers x and y. Now x − y and x + y are both even or both odd, so this method cannot work when N is twice an odd number. We would normally use this method when N is odd.

The basic method is as follows. Let M = ⌈√N⌉ and for x = M, M + 1, M + 2, ... we test whether x^2 − N is a perfect square, since this should be y^2. We stop when we get bored or have found x such that x^2 − N is a perfect square. Letting x = M or M + 1 effectively tests N for potential factors in an interval of length at least 2N^(1/4) roughly centred around √N. But unfortunately testing further values of x gives diminishing returns in terms of the length of interval tested. It is better to use trial division rather than Fermat to find the small factors of N.

Example Let N = 589. Then M = ⌈√N⌉ = 25. We have M^2 − N = 625 − 589 = 36 = 6^2, which is a square, and so we get the proper factorisation 589 = (25 − 6)(25 + 6) = 19 · 31 at

the first attempt, clearly beating trial division. Clearly both 19 and 31 are prime.

Example Let N = 4 · 10^12 + 1. Then M = ⌈√N⌉ = 2000001, and we have M^2 − N = 4000000 = 2000^2. So again, we get a factorisation at the first attempt, in this case N = 1998001 · 2002001. The number 2002001 is prime. It is best not to use Fermat to factor N_1 = 1998001 = 277 · 7213. (We have M_1 = ⌈√N_1⌉ = 1414, and Fermat succeeds by noting that (M_1 + 2331)^2 − N_1 = 3468^2.)

Example Let N = 20106739. Then M = ⌈√N⌉ = 4485. We have the following table for the smaller values of x (those at most M + 6).

a                0      1      2      3      4      5      6
(M + a)^2 − N    8486   17457  26430  35405  44382  53361  62342

Of these (M + a)^2 − N values, only 53361 = 231^2 is a square. So the factors are 4485 + 5 ± 231 = 4490 ± 231, and so N = 20106739 = 4259 · 4721. This is much faster than trial division. (It is worth noting that some values of a cannot work. In this example whenever a is even then (M + a)^2 − N must be twice an odd number, and so cannot be a square.) It turns out that both 4259 and 4721 are prime.

Example Let N = 7 · 10^11 + 1 = 41149 · 17011349. Then M = ⌈√N⌉ = 836661. In this case, the Fermat method runs for quite a long time before producing a factor, only succeeding when it finds that (M + 7689588)^2 − N = 8485100^2. In this case, the trial division was much faster.
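
A straightforward Python sketch of the basic method (the step limit and the function name are mine):

    from math import isqrt

    def fermat_factorise(N, max_steps=10**6):
        """Look for x >= ceil(sqrt(N)) with x^2 - N a perfect square y^2,
        giving the factorisation N = (x - y)(x + y).  N should be odd."""
        x = isqrt(N)
        if x * x < N:
            x += 1                      # x = ceil(sqrt(N))
        for _ in range(max_steps):
            y = isqrt(x * x - N)
            if y * y == x * x - N:      # x^2 - N is a perfect square
                return x - y, x + y
            x += 1
        return None                     # "we stop when we get bored"

Then fermat_factorise(589) returns (19, 31) and fermat_factorise(20106739) returns (4259, 4721), as in the examples above.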

8.2.2 Pollard's p − 1 method (not examinable)

This method (not to be confused with the Pollard ρ method by the same person) works well if the number N we are trying to factorise has a prime factor p such that p − 1 has only small divisors. Suppose that we can choose a number b such that every prime power divisor q of p − 1 satisfies q ≤ b. The algorithm works as follows. [Currently, there are some lies in this paragraph; I shall remove them and replace them by more accurate statements as time permits.] Choose any number a > 1, and by successive powering compute x = a^(b!) mod N. By assumption, every prime power divisor of p − 1 is at most b, and hence divides b!. Hence p − 1 divides b! [not quite true]. Thus, a^(b!) ≡ 1 (mod p) by Fermat's Little Theorem, so that p divides x − 1. By assumption, p divides N. Hence gcd(x − 1, N) is a multiple of p, and so is a non-trivial divisor of N. (Indeed, in the RSA case, if N is the product of two primes, then gcd(x − 1, N) will be a prime factor of N [not quite true either].)

Here is an example. Let N = 6824347 and b = 10. Choosing a = 2, we find that x = 5775612 and gcd(x − 1,N) = 2521. Thus, 2521 is a factor of N, and with a bit more work we find that it is prime and that N = 2521 · 2707 is the prime factorisation of N. The method succeeds because

2521 − 1 = 2^3 · 3^2 · 5 · 7

and all the prime power divisors are smaller than 10. Of course, if this condition were not satisfied, the method would probably fail. If we replace 2521 by 2531 in the above example, we find that N = 2531 · 2707 = 6851417, x = 2^(10!) mod N = 6414464, and gcd(x − 1, N) = 1.

Because we have to calculate a^(b!) mod N by successively replacing a by a^i mod N for i = 1, ..., b, we have to perform b − 1 exponentiations mod N. So the method will not be polynomial-time unless b ≤ (log N)^k for some k. So we are only guaranteed success in polynomial time if the prime-power factors of p − 1, for at least one of the prime divisors p of N, are at most (log N)^k – this is small compared to the magnitudes of the primes involved, which may be roughly √N. Thus, in choosing the primes p and q for an RSA key, we should if possible avoid primes for which p − 1 or q − 1 have only small prime power divisors; these are the most vulnerable.
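
A minimal Python sketch of the method as described (the parameter defaults and function name are mine):

    from math import gcd

    def pollard_p_minus_1(N, b=10, a=2):
        """Compute a^(b!) mod N by successive powering and hope that
        gcd(a^(b!) - 1, N) is a proper factor of N."""
        x = a
        for i in range(2, b + 1):
            x = pow(x, i, N)             # after the loop, x = a^(b!) mod N
        g = gcd(x - 1, N)
        return g if 1 < g < N else None  # None: try a larger b or another a

With the example above, pollard_p_minus_1(6824347, b=10) returns 2521, while pollard_p_minus_1(6851417, b=10) returns None, since 2531 − 1 = 2 · 5 · 11 · 23 has prime power divisors bigger than 10.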
