Computational Number Theory

C. Pomerance

1 Introduction

Historically, computation has been a driving force in the development of mathematics. To help measure the sizes of their fields, the Egyptians invented geometry. To help predict the positions of the planets, the Greeks invented trigonometry. Algebra was invented to deal with equations that arose when mathematics was used to model the world. The list goes on, and it is not just historical. If anything, computation is more important than ever. Much of modern technology rests on algorithms that compute quickly: examples range from the wavelets that allow CAT scans, to the numerical extrapolation of extremely complex systems in order to predict weather and global warming, and to the combinatorial algorithms that lie behind Internet search engines (see Section ?? of The Mathematics of Algorithm Design).

In pure mathematics we also compute, and many of our great theorems and conjectures are, at root, motivated by computational experience. It is said that Gauss, who was an excellent computationalist, needed only to work out a concrete example or two to discover, and then prove, the underlying theorem. While some branches of pure mathematics have perhaps lost contact with their computational origins, the advent of cheap computational power and convenient mathematical software has helped to reverse this trend.

One mathematical area where the new emphasis on computation can be clearly felt is number theory, and that is the main topic of this article. A prescient call-to-arms was issued by Gauss as long ago as 1801:

    The problem of distinguishing prime numbers from composite numbers, and of resolving the latter into their prime factors, is known to be one of the most important and useful in arithmetic. It has engaged the industry and wisdom of ancient and modern geometers to such an extent that it would be superfluous to discuss the problem at length. Nevertheless we must confess that all methods that have been proposed thus far are either restricted to very special cases or are so laborious and difficult that even for numbers that do not exceed the limits of tables constructed by estimable men, they try the patience of even the practiced calculator. And these methods do not apply at all to larger numbers. . . Further, the dignity of the science itself seems to require that every possible means be explored for the solution of a problem so elegant and so celebrated.

Factorization into primes is a very basic issue in number theory, but essentially all branches of number theory have a computational component. And in some areas there is such a robust computational literature that we discuss the algorithms involved as mathematically interesting objects in their own right. In this article we will briefly present a few examples of the computational spirit: in analytic number theory (the distribution of primes and the Riemann hypothesis); in Diophantine equations (Fermat's last theorem and the abc conjecture); and in elementary number theory (primality and factorization). A secondary theme that we shall explore is the strong and constructive interplay between computation, heuristic reasoning, and conjecture.

2 Distinguishing Prime Numbers from Composite Numbers

The problem is simple to state. Given an integer n > 1, decide if n is prime or composite. And we all know an algorithm. Divide n by each positive integer in turn. Either we find a proper factor, in which case we know that n is composite, or we do not, in which case we know that n is prime. For example, take n = 269. It is odd, so it has no even divisors. It is not a multiple of 3, so it has no divisor which is a multiple of 3. Continuing, we rule out 5, 7, 11, and 13. The next possibility, 17, has a square that is greater than 269, which means that if 269 were a multiple of 17, then it would also have to be a multiple of some number less than 17.
Since we have ruled that out, we can stop our trial division at 13 and conclude that 269 is prime. (If we were actually carrying out the algorithm, we might try dividing 269 by 17, in which case we would discover that 269 = 15 × 17 + 14. At that point we would notice that the quotient, 15, is less than 17, which is what would tell us that 17^2 is greater than 269. Then we could stop.) In general, since a composite number n has a proper factor d with d ≤ √n, one can give up on the trial dividing once one passes √n, at which point we know that n is prime.
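In code, the whole algorithm fits in a few lines. This is a minimal sketch (the function name is mine, not the article's):

```python
def is_prime_trial(n):
    """Trial division: return True if n > 1 has no proper factor.

    We may stop once d * d > n, since a composite n must have a
    proper factor d with d <= sqrt(n).
    """
    if n < 2:
        return False
    d = 2
    while d * d <= n:          # stop just past sqrt(n)
        if n % d == 0:         # found a proper factor
            return False
        d += 1
    return True

print(is_prime_trial(269))     # True: 2, 3, 5, 7, 11, 13 all fail to divide
```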
This straightforward method is excellent for mental computation with small numbers, and for machine computation for somewhat larger numbers. But it scales poorly, in that if you double the number of digits of n, then the time for the worst case is squared; it is therefore an "exponential time" algorithm. One might tolerate such an algorithm for 20-digit inputs, but think how long it would take to establish the primality of a 40-digit number! And you can forget about numbers with hundreds or thousands of digits. The issue of how the running time of an algorithm scales when one goes to larger inputs is absolutely paramount in measuring one algorithm against another. In contrast to the exponential time it takes to use trial division to recognize primes, consider the problem of multiplying two numbers. The school method of multiplication is to take each digit of one number in turn and multiply it by the other number, forming a parallelogram array. One then performs an addition to obtain the answer. If you now double the number of digits in each number, then the parallelogram becomes twice as large in each dimension, so the running time grows by a factor of about 4. Multiplication of two numbers is an example of a "polynomial time" algorithm; its runtime scales by a constant factor when the input length is doubled.

One might then rephrase Gauss's call to arms as follows. Is there a polynomial time algorithm that distinguishes prime numbers from composite numbers? Is there a polynomial time algorithm that can produce a nontrivial factor of a composite number? It might not be apparent at this point that these are two different questions, since trial division does both. We will see, though, that it is convenient to separate them, as did Gauss.

Let us focus on recognizing primes. What we would like is a simply computed criterion that primes satisfy and composites do not, or vice versa. An old theorem of Wilson might just fit the bill. Note that 6! = 720, which is just one less than a multiple of seven. Wilson's theorem asserts that if n is prime, then (n − 1)! ≡ −1 (mod n). (The meaning of this and similar statements is explained in modular arithmetic.) This cannot hold when n is composite, for if p is a prime factor of n and is smaller than n, then it is a factor of (n − 1)!, so it cannot possibly be a factor of (n − 1)! + 1. Thus, we have an ironclad criterion for primality. However, the Wilson criterion does not meet the standard of being simply computed, since we know no especially rapid way of computing factorials modulo another number. For example, Wilson predicts that 268! ≡ −1 (mod 269), as we have already seen that 269 is prime. But if we did not know this already, how in the world could we quickly find the remainder when 268! is divided by 269? We can work out the product 268! one factor at a time, but this would take many more steps than trying divisors up to 17. It is hard to prove that something cannot be done, and in fact there is no theorem that says we cannot compute a! mod b in polynomial time. We do know some ways of speeding up the computation over the totally naive method, but all methods known so far take exponential time. So, Wilson's theorem initially seems promising, but in fact it is no help at all unless we can find a fast way to compute a! mod b.

How about Fermat's little theorem? Note that 2^7 = 128, which is 2 more than a multiple of 7. Or take 3^5 = 243, which is 3 mod 5. Fermat's little theorem tells us that if n is prime and a is any integer, then a^n ≡ a (mod n). If computing a large factorial modulo n is hard, perhaps it is also hard to compute a large power modulo n.

It cannot hurt to try it out for some moderate example to see if any ideas pop up. Take a = 2 and n = 91, so that we are trying to compute 2^91 mod 91. A powerful idea in mathematics is that of reduction. Can we reduce this computational problem to a smaller one? Notice that if we had already computed 2^45 mod 91, obtaining a remainder r_1, say, then 2^91 ≡ 2r_1^2 (mod 91).
That is, it is just a short additional calculation to get to our goal, yet the power 45 is only half as big. How to continue is clear: we further reduce to the exponent 22, which is less than half of 45. If 2^22 mod 91 = r_2, then 2^45 ≡ 2r_2^2 (mod 91). And of course 2^22 is the square of 2^11, and so on. It is not so hard to "automate" this procedure: the exponent sequence

1, 2, 5, 11, 22, 45, 91

can be read directly from the binary (base 2) representation of 91 as 1011011, since the above sequence in binary is

1, 10, 101, 1011, 10110, 101101, 1011011.

These are the initial strings from the left of 1011011. And it is plain that the transition from one term to the next is either the double or the double plus 1. This procedure scales nicely. When the number of digits of n is doubled, so is the sequence of exponents, and the time it takes to get from one exponent to the next, being a modular multiplication, is multiplied by 4. (As with naive multiplication, naive divide-with-remainder also takes four times as long when the size of the problem is doubled.) Thus, the overall time is multiplied by 8, yielding a polynomial time method. We call this the "powermod" algorithm.
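Python's built-in pow(a, e, n) performs exactly this computation; the sketch below (names mine) makes the exponent sequence visible by reading the binary digits of the exponent from the left:

```python
def powermod(a, e, n):
    """Compute a**e mod n by reading e's binary digits from the left.

    Each bit doubles the current exponent (a squaring) and, if the
    bit is 1, adds one more (a multiplication by a) -- so for
    e = 91 = 0b1011011 the exponents visited are exactly
    1, 2, 5, 11, 22, 45, 91.
    """
    result = 1
    for bit in bin(e)[2:]:                  # most significant bit first
        result = result * result % n        # exponent doubles
        if bit == '1':
            result = result * a % n         # ...plus one
    return result

print(powermod(2, 269, 269))   # 2, as Fermat's little theorem
print(pow(2, 269, 269))        # requires for the prime 269
```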
So, let us try to illustrate Fermat's little theorem, taking a = 2 and n = 91. Our sequence of powers is

2^1 ≡ 2, 2^2 ≡ 4, 2^5 ≡ 32, 2^11 ≡ 46, 2^22 ≡ 23, 2^45 ≡ 57, 2^91 ≡ 37,

where each congruence is modulo 91, and each term in the sequence is found by squaring the prior one mod 91 or squaring and multiplying by 2 mod 91. Wait a second: does Fermat's little theorem not say that we are supposed to get 2 for the final residue? Well, yes, but this is guaranteed only if n is prime. And as you have probably already noticed, 91 is composite. In fact, the computation proves this. Quite remarkably, here is an example of a computation that proves that n is composite, yet it does not reveal any nontrivial factorization!

You are invited to try out the powermod algorithm as above, but to change the base of the power from 2 to 3. The answer you should come to is that 3^91 ≡ 3 (mod 91): that is, the congruence for Fermat's little theorem holds. Since you already know that 91 is composite, I am sure you would not jump to the false conclusion that it is prime! So, as it stands, Fermat's little theorem can be used to sometimes recognize composites, but it cannot be used to recognize primes.

There are two interesting further points to be made regarding Fermat's little theorem. First, on the negative side, there are some composites, such as n = 561, where the Fermat congruence holds for every integer a. These numbers n are called Carmichael numbers, and unfortunately (from the point of view of testing primality) there are infinitely many of them, a result due to Alford, Granville, and me. But, on the positive side, if one were to choose randomly among all pairs a, n for which a^n ≡ a (mod n), with a < n ≤ x, then almost surely (that is, with probability tending to 1 as x grows) you would choose a pair with n prime, a result of Erdős and myself.

It is possible to combine Fermat's little theorem with another elementary property of (odd) prime numbers. If n is an odd prime, there are exactly two solutions to the congruence x^2 ≡ 1 (mod n), namely ±1. Actually, some composites have this property as well, but composites divisible by two different odd primes do not.

Now let us suppose that n is an odd number and that we wish to determine whether it is prime. Suppose that we pick some number a with 1 ≤ a ≤ n − 1 and discover that a^(n−1) ≡ 1 (mod n). If we set x = a^((n−1)/2), then x^2 = a^(n−1) ≡ 1 (mod n); so, by the simple property of primes just mentioned, if n is prime, then x must be ±1. Therefore, if we calculate a^((n−1)/2) and discover that it is not congruent to ±1 (mod n), then n must be composite.

Let us try this idea with a = 2, n = 561. We know already that 2^560 ≡ 1 (mod 561), so what is 2^280 mod 561? This too turns out to be 1, so we have not shown that 561 is composite. However, we can go further, since now we know that 2^140 is also a square root of 1, and computing this we find that 2^140 ≡ 67 (mod 561). So now we have found a square root of 1 that is not ±1, which proves that 561 is composite. (Of course, for this particular number, it is obviously divisible by 3, so there was not really any mystery about whether it was prime or composite. But the method can be used in much less obvious cases.) In practice, there is no need to backtrack from a higher exponent to a smaller one. Indeed, in order to calculate 2^560 (mod 561) by the efficient method outlined earlier, one calculates the numbers 2^140 and 2^280 along the way, so that this generalization of the earlier test is both quicker and stronger.
Here is the general principle that we have illustrated. Suppose that n is an odd prime and let a be an integer not divisible by n. Write n − 1 = 2^s t, where t is odd. Then

either a^t ≡ 1 (mod n) or a^(2^i t) ≡ −1 (mod n) for some i = 0, 1, ..., s − 1.

Call this the strong Fermat congruence. The wonderful thing here is that there is no analogue of a Carmichael number—as proved independently by Monier and Rabin. They showed that if n is an odd composite, then the strong Fermat congruence fails for at least three-quarters of the numbers a in [1, n − 1].

If you want only to be able to distinguish between primes and composites in practice, and you do not insist on proof, then you have read enough. Namely, given a large odd number n, choose 20 values of a at random from [1, n − 1], and begin trying to verify the strong Fermat congruence with these bases a. If it should ever fail, you may stop: the number n must be composite. And if the strong Fermat congruence holds, we might surmise that n is actually prime. Indeed, if n were composite, the Monier–Rabin theorem says that the chance that the strong Fermat congruence would hold for 20 random bases is at most 4^(−20), which is less than one chance in a trillion. Thus we have a remarkable probabilistic test for primality. If it tells us that n is composite, then we know for sure that n is composite; if it tells us that n is prime, then the chances that n is not prime are so small as to be more or less negligible.
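Here is a minimal sketch of the recipe just described, the standard strong-probable-prime (Miller–Rabin) test; the function names are mine:

```python
import random

def strong_fermat(a, n):
    """True if a, n satisfy the strong Fermat congruence (n odd, n > 2).

    Write n - 1 = 2**s * t with t odd; check a**t == 1 (mod n) or
    a**(2**i * t) == -1 (mod n) for some 0 <= i < s.
    """
    s, t = 0, n - 1
    while t % 2 == 0:
        s, t = s + 1, t // 2
    x = pow(a, t, n)
    if x == 1 or x == n - 1:
        return True
    for _ in range(s - 1):
        x = x * x % n
        if x == n - 1:
            return True
    return False

def probably_prime(n, trials=20):
    """Each random base catches an odd composite with probability at
    least 3/4 (Monier-Rabin), so 20 passes leave a chance below
    4**-20 that n is composite."""
    return all(strong_fermat(random.randrange(1, n), n)
               for _ in range(trials))

print(probably_prime(561))   # False (with overwhelming probability):
                             # even this Carmichael number is caught
print(probably_prime(269))   # True
```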
If three-quarters of the numbers a in [1, n − 1] provide the key to an easily checkable proof that the odd composite number n is indeed composite, surely it should not be so hard to find just one! How about checking small numbers a, in order, until one is found? Excellent, but when do we stop? Let us think about this for a moment. We have given up the power of randomness and are forcing ourselves to choose sequentially among small numbers for the trial bases a. Can we argue heuristically that they continue to behave as if they were random choices? Well, there are some connections among them. For example, if taking a = 2 does not result in a proof that n is composite, then neither will taking any power of 2. It is theoretically possible for 2 and 3 not to give proofs that n is composite but for 6 to work just fine, but this turns out not to be very common. So let us amend the heuristic and assume that we have independence for prime values of a. Up to log n log log n there are about log n primes (via the prime number theorem discussed later in this article); so, heuristically, the probability that n is composite, but that none of these primes help us to prove it, is about 4^(−log n), which is smaller than n^(−4/3).

Such heuristics suggest that quite small bases a always suffice to expose a composite n, but they fall well short of a proof. Miller showed in the 1970s that if a suitable generalization of the Riemann hypothesis holds, then there is a constant c such that every odd composite n fails the strong Fermat congruence for some positive integer a ≤ c(log n)^2. (A discussion of this generalized Riemann hypothesis is beyond the scope of this article.) In further work, Bach was able to show that we may take c = 2 in this last result. Summarizing, if this generalized Riemann hypothesis holds, and if the strong Fermat congruence holds for every positive integer a ≤ 2(log n)^2, then n is prime. So, provided that a famous unproved hypothesis in another field of mathematics is correct, one can decide in polynomial time, via a deterministic algorithm, whether n is prime or composite. (It has been tempting to use this conditional test, for if it should ever lie to you and tell you that a particular composite number is prime, then this failure—if you were able to detect it—would be a disproof of one of the most famous conjectures in mathematics. Perhaps this is not too disastrous a failure!)

After Miller's test in the 1970s, the question continually challenging us was whether it is possible to test for primality in polynomial time without assuming unproved hypotheses. Recently, Agrawal et al. (2004) answered this question with a resounding yes. Their idea begins with a combination of the binomial theorem and Fermat's little theorem. Given an integer a, consider the polynomial (x + a)^n and expand it in the usual way through the binomial theorem. Each intermediate term between the leading x^n and the trailing a^n has the coefficient n!/(j!(n−j)!) for some j between 1 and n − 1. If n is prime, then this coefficient, which is an integer, is divisible by n because n appears as a factor in the numerator that is not cancelled by any factors in the denominator. That is, the coefficient is 0 (mod n). For example, (x + 1)^7 is equal to

x^7 + 7x^6 + 21x^5 + 35x^4 + 35x^3 + 21x^2 + 7x + 1,

and we see each internal coefficient is a multiple of 7. Thus, we have (x + 1)^7 ≡ x^7 + 1 (mod 7). (Two polynomials are congruent mod n if corresponding coefficients are congruent mod n.) In general, if n is prime and a is any integer, then via this binomial-theorem idea and Fermat's little theorem we have

(x + a)^n ≡ x^n + a^n ≡ x^n + a (mod n).

Conversely, this polynomial congruence fails when n is composite (at least for a coprime to n), so it gives another ironclad criterion for primality; checked naively it is no better than Wilson's, since there are about n coefficients to examine. The insight of Agrawal, Kayal, and Saxena is that it suffices to verify the congruence modulo x^r − 1 for a single fairly small auxiliary number r and for a small range of integers a, and that this abbreviated verification still distinguishes primes from composites, in polynomial time.

3 Factoring Composite Numbers

Suppose, then, that we have established that some number n is composite. How do we find a nontrivial factor? One fast tool that we do have is Euclid's ancient algorithm, which computes the greatest common divisor (GCD) of two numbers in polynomial time. So, if we can build up a special number m that may be likely to have a nontrivial factor in common with n, we can use Euclid's algorithm to discover this factor. For example, Pollard and Strassen (independently) used this idea, together with fast subroutines for multiplication and polynomial evaluation, to enhance the trial division method discussed in the last section. Somewhat miraculously, one can take the integers up to n^(1/2), break them into n^(1/4) subintervals of length n^(1/4), and for each subinterval calculate the GCD of n with the product of all the integers in the subinterval, spending only about n^(1/4) elementary steps in total. If n is composite, then at least one GCD will be larger than 1, and then a search over the first such subinterval will locate a nontrivial factor of n. To date, this algorithm is the fastest rigorous and deterministic method of factoring that we know.

Most practical factoring algorithms are based on unproved but reasonable-seeming hypotheses about the natural numbers. Although we may not know how to prove rigorously that these methods will always produce a factorization, or do so quickly, in practice they do. This situation resembles the experimental sciences, where hypotheses are tested against experiments. Our experience with certain factoring algorithms is now so overwhelming that a scientist might claim that a physical law is involved. As mathematicians, we still search for proof, but fortunately the numbers we factor do not feel the need to wait for us.

I often mention a contest problem from my high school years: factor 8051. The trick is to notice that 8051 = 90^2 − 7^2 = (90 − 7)(90 + 7), from which the factorization 83 · 97 can be read off. In fact every odd composite can be factored as the difference of two squares, an idea that goes back to Fermat. Indeed, if n has the nontrivial factorization ab, then let u = (a + b)/2 and v = (a − b)/2, so that n = u^2 − v^2, and a = u + v, b = u − v. This method works very well if n has a divisor very close to n^(1/2), as n = 8051 does, but in the worst case, the Fermat method is slower than trial division.
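A sketch of Fermat's difference-of-squares method (for odd composite n; the function name is mine):

```python
from math import isqrt

def fermat_factor(n):
    """Factor odd composite n as a difference of squares: find u with
    u*u - n a perfect square v*v, so that n = (u - v) * (u + v)."""
    u = isqrt(n)
    if u * u < n:
        u += 1                      # start just above sqrt(n)
    while True:
        v2 = u * u - n
        v = isqrt(v2)
        if v * v == v2:             # u^2 - n is a square
            return u - v, u + v
        u += 1

print(fermat_factor(8051))          # (83, 97), since 8051 = 90^2 - 7^2
```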
My quadratic sieve method (which follows work of Kraitchik, Brillhart–Morrison, and Schroeppel) tries to efficiently extend Fermat's idea to all odd composites. For example, take n = 1649. We start just above n^(1/2) with j = 41, and consider the numbers j^2 − 1649. As j runs, we will eventually hit a value where j^2 − 1649 is a square, and so be able to use Fermat's method. Let's try it:

41^2 − 1649 = 32,
42^2 − 1649 = 115,
43^2 − 1649 = 200,
...

Well, no squares yet, which is not surprising, since the Fermat method is often very poor. But wait, do the first and third lines not multiply together to give a square? Yes they do: 32 · 200 = 80^2. So, multiplying the first and third lines, and treating them as congruences mod 1649, we have

(41 · 43)^2 ≡ 80^2 (mod 1649).

That is, we have a pair u, v with u^2 ≡ v^2 (mod 1649). This is not quite the same as having u^2 − v^2 = 1649, but we do have 1649 a divisor of u^2 − v^2 = (u − v)(u + v). Now maybe 1649 divides one of these factors, but if it does not, then it is split between them, and so a computation of the GCD of u − v (or u + v) with 1649 will reveal a proper factor. Now v = 80 and u = 41 · 43 ≡ 114 (mod 1649), and so we see instantly that u ≢ ±v (mod 1649), so we are in business. The GCD of 114 − 80 = 34 with 1649 is 17. Dividing, we see that 1649 = 17 · 97, and we are done.

Can we generalize this? In trying to factor n = 1649 we considered consecutive values of the quadratic polynomial f(j) = j^2 − n for j starting just above √n, and viewed these as congruences j^2 ≡ f(j) (mod n). Then we found a set of numbers M with ∏_{j∈M} f(j) equal to a square, say v^2. We then let u = ∏_{j∈M} j, so that u^2 ≡ v^2 (mod n). Provided that u ≢ ±v (mod n), we can split n via the GCD of u − v and n.

There is another lesson that we can learn from our small example with n = 1649. We used 32 and 200 to form our square, but we ignored 115. If we had thought about it, we might have noticed from the start that 32 and 200 were more likely to be useful than 115. The reason is that 32 and 200 are smooth numbers (meaning that they have only small prime factors), while 115 is not smooth, having the relatively large prime factor 23. Say you have k + 1 positive integers that involve in their prime factorizations only the first k primes. It is an easy theorem that some nonempty subset of these numbers has product a square. The proof has us associate with each of these numbers, which can be written in the form p_1^(a_1) p_2^(a_2) ··· p_k^(a_k), an exponent vector (a_1, a_2, ..., a_k). Since squares are detected by all even exponents, we really only care whether the exponents a_i are odd or even. Thus, we think of these vectors as having coordinates 0 and 1, and when we add them (which corresponds to multiplying the underlying numbers), we do so mod 2. Since we have k + 1 vectors, each with only k coordinates, an easy matrix calculation leads quickly to a nonempty subset that adds up to the 0-vector. The product of the corresponding integers is then a square.
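The 1649 example can be tied together in a few lines. This is a sketch only: in a real implementation the dependency would be found by linear algebra mod 2 over many smooth values, not by inspection, and the helper name is mine:

```python
from math import gcd, isqrt

def exponent_vector(m, primes):
    """Exponent vector of m over the factor base, or None if m is
    not smooth over those primes."""
    vec = []
    for p in primes:
        e = 0
        while m % p == 0:
            m //= p
            e += 1
        vec.append(e)
    return vec if m == 1 else None

n, base = 1649, [2, 3, 5]
for j in (41, 42, 43):
    print(j, j * j - n, exponent_vector(j * j - n, base))
# 41 32  [5, 0, 0]   -> (1, 0, 0) mod 2
# 42 115 None        -> 115 = 5 * 23 is not smooth over {2, 3, 5}
# 43 200 [3, 0, 2]   -> (1, 0, 0) mod 2: the two vectors cancel mod 2

u = 41 * 43 % n                            # 114
v = isqrt((41 * 41 - n) * (43 * 43 - n))   # sqrt(32 * 200) = 80
print(gcd(u - v, n), gcd(u + v, n))        # 17 97, and 17 * 97 = 1649
```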
vectors, each with only k coordinates, an easy matrix These thoughts lead us to a time bound of about calculation leads quickly to a nonempty subset that exp( log n log log n ) adds up to the 0-vector. The product of the corre- sponding integers is then a square. for the quadratic sieve to factor n. Instead of being In our toy example with n = 1649, the first and exponential in the number of digits of n, as with trial third numbers, which are 32 = 253050 and 200 = division, this is exponential in about the square root 233052, have exponent vectors (5, 0, 0) and (3, 0, 2), of the number of digits of n. This is certainly a huge which reduce to (1, 0, 0) and (1, 0, 0), so we see that the improvement, but it is still a far cry from polynomial sum of them is (0, 0, 0), which indicates that we have time. a square. We were lucky that we could make do with Lenstra and I actually have a rigorous random fac- just two vectors, instead of the four that the above toring method with the same time complexity as that argument shows would be sufficient. above for the quadratic sieve. (It is random in the sense that a coin is flipped at various junctures, and In general with the quadratic sieve, one finds decisions on what to do next depend on the outcomes smooth numbers in the sequence j2 − n, forms the of these flips. Through this process, we expect to get exponent vectors mod 2, and then uses a matrix to a bona fide factorization within the advertised time find a nonempty subset which adds up to the 0-vector, bound.) However, the method is not so computer prac- which then corresponds to a set M where f(j) j∈M tical, and if you had to choose in practice between the is a square. two, then you should go with the nonrigorous quad- In addition, the “sieve” in the quadratic sieve comes ratic sieve. A triumph for the quadratic sieve was the 2 − in with the search for smooth values of f(j)=j n. 1994 factorization of the 129-digit RSA cryptographic These numbers are the consecutive values of a (quad- challenge first published in Martin Gardner’s column ratic) polynomial, so those divisible by a given prime in Scientific American in 1977. can be found in regular places in the sequence. For The number field sieve, which is another sieve-based 2 − example, in our illustration, j 1649 is divisible by 5 factoring algorithm, was discovered in the late 1980s ≡ precisely when j 2 or 3 (mod 5). A sieve, very much by Pollard for integers close to powers, and later devel- like the sieve of Eratosthenes, can then be used to oped by Buhler, Lenstra and me for general integers. 2 − efficiently find the special numbers j where j n is The method is similar in spirit to the quadratic sieve, smooth. A key issue though is how smooth a value f(j) but assembles its squares from the product of certain has to be for us to decide to accept it. If we choose sets of algebraic integers. The number field sieve has a smaller bound for the primes involved, we do not a conjectured time complexity of the type have to find all that many of them to use the matrix 1/3 2/3 method. But such very smooth values might be very exp(c(log n) (log log n) ), rare. If we use a larger bound for the primes involved, for a value of c slightly below 2. For composite num- then smooth values of f(j) may be more common, bers beyond 100 digits or so that have no small prime but we will need many of them. Somewhere between factor, it is the method of choice, with the current smaller and larger is just right! In order to make the record being 200 decimal digits. 
Finally, note that if the final GCD yields only a trivial factor with n, one can continue just a bit longer and find more linear dependencies, each with a fresh chance at splitting n.

These thoughts lead us to a time bound of about

exp(√(log n log log n))

for the quadratic sieve to factor n. Instead of being exponential in the number of digits of n, as with trial division, this is exponential in about the square root of the number of digits of n. This is certainly a huge improvement, but it is still a far cry from polynomial time.

Lenstra and I actually have a rigorous random factoring method with the same time complexity as that above for the quadratic sieve. (It is random in the sense that a coin is flipped at various junctures, and decisions on what to do next depend on the outcomes of these flips. Through this process, we expect to get a bona fide factorization within the advertised time bound.) However, the method is not so computer practical, and if you had to choose in practice between the two, then you should go with the nonrigorous quadratic sieve. A triumph for the quadratic sieve was the 1994 factorization of the 129-digit RSA cryptographic challenge first published in Martin Gardner's column in Scientific American in 1977.

The number field sieve, which is another sieve-based factoring algorithm, was discovered in the late 1980s by Pollard for integers close to powers, and later developed by Buhler, Lenstra and me for general integers. The method is similar in spirit to the quadratic sieve, but assembles its squares from the product of certain sets of algebraic integers. The number field sieve has a conjectured time complexity of the type

exp(c (log n)^(1/3) (log log n)^(2/3)),

for a value of c slightly below 2. For composite numbers beyond 100 digits or so that have no small prime factor, it is the method of choice, with the current record being 200 decimal digits.

The sieve-based factorization methods share the property that if you use them, then all composite numbers of about the same size are equally hard to factor. For instance, factoring n will be about as difficult if n is a product of five primes each roughly near the fifth root of n as it will be if n is a product of two primes roughly near the square root of n. This is quite unlike trial division, which is happiest when there is a small prime factor. We will now describe a famous factorization method due to Lenstra that detects small prime factors before large ones, and beyond baby cases is much superior to trial dividing. This is his elliptic curve method.

Just as the quadratic sieve searches for a number m with a nontrivial GCD with n, so does the elliptic curve method. But where the quadratic sieve (and the number field sieve) painstakingly build up to a successful m from many small successes, the elliptic curve method hopes to hit upon m with essentially one lucky choice. Choosing random numbers m and testing their GCD with n can also have instant success, but you can well imagine that if n has no small prime factors, then the expected time for success would be enormous. Instead, the elliptic curve method involves considerably more cleverness.

Consider first the "p − 1 method" of Pollard. Suppose you have a number n you wish to factor and a certain large number k. Unbeknownst to you, n has a prime factor p with p − 1 a divisor of k, and another prime factor q with q − 1 not a divisor of k. You can use this imbalance to split n. First of all, by Fermat's little theorem there are many numbers u with u^k ≡ 1 (mod p) and u^k ≢ 1 (mod q). Say you have one of these, and let m be u^k − 1 reduced mod n. Then the GCD of m and n is a nontrivial factor of n; it is divisible by p but not by q. Pollard suggests taking k as the least common multiple of the integers to some moderate bound so that it has many divisors and perhaps a decent chance that it is divisible by p − 1. The best case of Pollard's method is when n has a prime factor p with p − 1 smooth (has all small prime factors—see the quadratic sieve discussion above). But if n has no prime factors p with p − 1 smooth, Pollard's method fares poorly.
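A compact sketch of Pollard's p − 1 method. Here the exponent used is effectively B!, which is divisible by the least common multiple of 1, ..., B; the modulus 3841 = 23 · 167 is my own toy example, chosen so that 23 − 1 = 22 divides the exponent while 167 − 1 = 2 · 83 does not:

```python
from math import gcd

def pollard_p_minus_1(n, bound):
    """Pollard's p-1 method: compute u**k - 1 mod n for a highly
    divisible k and hope that p - 1 | k for some prime p | n while
    q - 1 does not divide k for another prime q | n."""
    u = 2
    for j in range(2, bound + 1):
        u = pow(u, j, n)        # after the loop, u = 2**(bound!) mod n
    g = gcd(u - 1, n)
    return g if 1 < g < n else None

print(pollard_p_minus_1(3841, 11))
# 23: 3841 = 23 * 167, and 22 divides 11! while 166 = 2 * 83 does not
```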
What is going on here is that corresponding to the prime p we have the multiplicative group of the p − 1 nonzero residues mod p. Furthermore, when doing arithmetic mod n with numbers relatively prime to n, we are, whether we realize it or not, doing arithmetic in this group. We are exploiting the fact that u^k is the group identity mod p, but not mod q.

Lenstra had the brilliant idea of using the Pollard method in the context of elliptic curve groups. There are many elliptic curve groups associated with the prime p, and therefore many chances to hit upon one where the number of elements is smooth. Of great importance here are theorems of Hasse and Deuring. An elliptic curve mod p (for p > 3) can be taken as the set of solutions to the congruence y^2 ≡ x^3 + ax + b (mod p), for given integers a, b with the property that x^3 + ax + b does not have repeated roots mod p. There is one additional "point at infinity" thrown in (see below). A fairly simple addition law (but not as simple as adding coordinatewise!) makes the elliptic curve into a group, with the point at infinity as the identity. Hasse, in a result later generalized by Weil with his famous proof of the "Riemann hypothesis for curves," showed us that the number of elements in the elliptic curve group always lies between p + 1 − 2√p and p + 1 + 2√p. And Deuring proved that every number in this range is indeed associated with some elliptic curve mod p.

Say we randomly choose integers x_1, y_1, a, and then choose b so that y_1^2 is congruent to x_1^3 + ax_1 + b (mod n). This gives us the curve with coefficients a, b and a point P = (x_1, y_1) on the curve. One can then mimic the Pollard strategy, with a number k as before with many divisors, and with the point P playing the role of u. Let kP denote the k-fold sum of P added to itself using elliptic curve addition. If kP is the point at infinity on the curve considered mod p (which it will be if the number of points on the curve is a divisor of k), but not on the curve considered mod q, then this gives us a number m whose GCD with n is divisible by p and not by q. We will have factored n.

To see where m comes from it is convenient to consider the curve projectively: we take solutions (x, y, z) of the congruence y^2 z ≡ x^3 + axz^2 + bz^3 (mod p). The triple (cx, cy, cz) when c ≠ 0 is considered to be the same as (x, y, z). The mysterious point at infinity is now demystified; it is just (0, 1, 0). And our point P is (x_1, y_1, 1). (This is the mod p version of the classical projective geometry which is described in Section ?? of Some Fundamental Mathematical Definitions.) Say we work mod n and compute the point kP = (x_k, y_k, z_k). Then the candidate for the number m is just z_k. Indeed, if kP is the point at infinity mod p, then z_k ≡ 0 (mod p), and if it is not the point at infinity mod q, then z_k ≢ 0 (mod q).

When Pollard's p − 1 method fails, our only recourse is to raise k or give up. With the elliptic curve method, if things do not work for our randomly chosen curve, we can pick another. Corresponding to the hidden prime p in n, we are actually picking new elliptic curve groups mod p, and so gaining a fresh chance for the number of elements in the group to be smooth. The elliptic curve method has been quite successful in factoring numbers which have a prime factor up to about 50 decimal digits, and even occasionally somewhat larger primes have been discovered. We conjecture that the expected time for the elliptic curve method to find the least prime factor p of n is about

exp(√(2 log p log log p))

arithmetic operations mod n.
What is holding us back from proving this conjecture is not lack of knowledge about elliptic curves, but rather lack of knowledge of the distribution of smooth numbers.

For more on these and other factorization methods, the reader is referred to Crandall and Pomerance (2005).

4 The Riemann Hypothesis and the Distribution of the Primes

As a teenager looking at a modest table of primes, Gauss conjectured that their frequency decays logarithmically and that li(x) = ∫_2^x dt/log t should be a good approximation for π(x), the number of primes between 1 and x. Sixty years later, Riemann showed how Gauss's conjecture can be proved if one assumes that the Riemann zeta function ζ(s) = Σ_n n^(−s) has no zeros in the complex half-plane where the real part of s is greater than 1/2. The series for ζ(s) converges only for Re s > 1, but it may be analytically continued to Re s > 0, with a simple pole at s = 1. (For a brief description of the process of analytic continuation, see Section ?? of Some fundamental definitions.) This continuation may be seen quite concretely via the identity

ζ(s) = s/(s − 1) − s ∫_1^∞ {x} x^(−s−1) dx,

with {x} the fractional part of x (so that {x} = x − [x]): note that this integral converges quite nicely in the half-plane Re s > 0. In fact, via Riemann's functional equation mentioned below, ζ(s) can be continued to a meromorphic function in the whole complex plane, with the single pole at s = 1.

The assertion that ζ(s) ≠ 0 for Re s > 1/2 is known as the Riemann hypothesis; arguably it is the most famous unsolved problem in mathematics. Though Hadamard and de la Vallée Poussin were able in 1896 to prove (independently) a weak form of Gauss's conjecture known as the prime number theorem, the apparent breathtaking strength of the approximation li(x) to π(x) is uncanny. For example, take x = 10^22. We have

π(10^22) = 201 467 286 689 315 906 290

exactly, and, to the nearest integer, we have

li(10^22) ≈ 201 467 286 691 248 261 497.

As you can plainly see, Gauss's guess is right on the money!
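One side of this comparison is reproducible on a desktop; a sketch using the mpmath Python library, whose li function computes the logarithmic integral (offset=True gives the integral from 2, the li of the text, and the prime count is the exact value quoted above):

```python
from mpmath import mp, li

mp.dps = 30                       # 30 digits of working precision
x = 10**22
pi_x = 201467286689315906290      # Gourdon's exact count, from the text
approx = li(x, offset=True)       # integral from 2 to x of dt/log t
print(approx)                     # about 2.01467286691248261497e+20
print(approx - pi_x)              # about 1.93e9: tiny next to 10^22, and
                                  # far below sqrt(x)*log(x), about 5e12
```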
The numerical computation of li(x) is simple via numerical methods for integration, and it is directly obtainable in various mathematics computing packages. However, the computation of π(10^22) (due to Gourdon) is far from trivial. It would be far too laborious to count these approximately 2 × 10^20 primes one by one, so how are they counted? In fact, we have various combinatorial tricks to count without listing everything. For example, one does not need to count one by one to see that there are exactly 2[10^22/6] + 1 integers in the interval from 1 to 10^22 that are relatively prime to 6. Rather one thinks of these numbers grouped in blocks of six, with two in each block coprime to 6. (The "+1" comes from the partial block at the end.) Building on early ideas of Meissel and Lehmer, Lagarias, Miller, and Odlyzko presented an elegant combinatorial method for computing π(x) that takes about x^(2/3) elementary steps. The method was refined by Deléglise and Rivat, and then Gourdon found a way to distribute the computation to many computers.

From work of von Koch, and later Schoenfeld, we know that the Riemann hypothesis is equivalent to the assertion that

|π(x) − li(x)| < √x log x     (1)

for all x ≥ 3 (see Crandall and Pomerance 2005, Exercise 1.37). Thus, the mammoth calculation of π(10^22) might be viewed as computational evidence for the Riemann hypothesis—in fact, if the count had turned out to violate (1), we would have had a disproof.

It may not be obvious what (1) has to do with the location of the zeros of ζ(s). The link is Riemann's "explicit formula," which expresses the count of primes up to x as a smooth main term corrected by a sum of oscillating terms x^ρ/ρ, one for each nontrivial zero ρ of ζ(s), the sum taken in the symmetric sense where we sum over those ρ with |Im ρ| ≤ T and let T → ∞. A zero ρ contributes a term of size about x^(Re ρ), so the closer the zeros lie to the line Re s = 1/2, the smaller the error in Gauss's approximation; if they all lie on that line, the total error is of square-root size, which is what (1) expresses.

How, then, does one verify numerically that the zeros lie on the line Re s = 1/2? Via the functional equation one can construct a real-valued function g(t) of a real variable t whose sign changes mark zeros of ζ(s) on the critical line at height t. One counts the sign changes of g(t) up to height T and compares with the total number of zeros ρ with |Im ρ| ≤ T, a count that can be computed analytically. If the two counts agree, then the zeros to height T all have real part 1/2 (and, in addition, they are all simple zeros). If the counts did not match, it would not be a disproof of the Riemann hypothesis, but certainly it would indicate a region where we should be checking the data more closely. So far, whenever we have tried this approach, the counts have matched, though sometimes we have been forced to evaluate g(t) at very closely spaced points.

The first few nontrivial zeros were computed by Riemann himself. The famous cryptographer and early computer scientist Turing also computed some zeta zeros. The current record for this kind of calculation is held by Gourdon, who has shown that the first 10^13 zeta zeros with positive imaginary part all have real part equal to 1/2, as predicted by Riemann. Gourdon's method is a modification of that pioneered by Odlyzko and Schönhage (1988), who ushered in the modern age of zeta-zero calculations.

Explicit zeta-function calculations can lead to highly useful explicit prime number estimates. If p_n is the nth prime, then the prime number theorem implies that p_n ∼ n log n as n → ∞. Actually, there is a secondary term of order n log log n, and so for all sufficiently large n, we have p_n > n log n. By using explicit zeta estimates, Rosser was able to put a numerical bound on the "sufficiently large" in this statement, and then by checking small cases, was able to prove that in fact p_n > n log n for every n. The paper of Rosser and Schoenfeld (1962) is filled with highly useful and numerically explicit inequalities of this kind.

Let us imagine for a moment that the Riemann hypothesis had been proved. Mathematics is never "used up"; there is always that next problem around the bend. Even if we know that all of zeta's nontrivial zeros lie on the line Re s = 1/2, we can still ask how they are distributed on this line. We have a fairly concise understanding of how many zeros there should be up to a given height T. In fact, as already found by Riemann, this count is about (1/2π) T log T. Thus, on average, the zeros would tend to get closer and closer, with about (1/2π) log T of them in a unit interval near height T.

This tells us the average distance, or spacing, between one zeta zero and the next, but there is much more that one can ask about how these spacings are distributed. In order to discuss this question, it is very convenient to "normalize" the spacings, so that the average (normalized) gap between consecutive zeros is 1. By Riemann's result, together with our assumption of the Riemann hypothesis, this can be done if we multiply a gap near T by (1/2π) log T, or, equivalently, if for each zero ρ we replace its imaginary part t = Im ρ by (1/2π) t log t. In this way we arrive at a sequence δ_1, δ_2, ... of normalized gaps between consecutive zeros, which on average are about 1.
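On a vastly smaller scale than the data discussed below, one can compute normalized gaps with mpmath, whose zetazero function returns the nth nontrivial zero. One caveat of mine: this low on the critical line the asymptotic factor (1/2π) log t overshoots the true zero density, so the sketch uses the refined factor (1/2π) log(t/2π), which is the same to leading order:

```python
from mpmath import mp, zetazero, log, pi

mp.dps = 15
ts = [zetazero(k).imag for k in range(1, 101)]    # first 100 zeros

# normalized gap: multiply each gap by the local zero density,
# (1/2pi) log(t/2pi), so that the average spacing becomes 1
deltas = [(b - a) * log(a / (2 * pi)) / (2 * pi)
          for a, b in zip(ts, ts[1:])]

print(sum(deltas) / len(deltas))    # close to 1 on average
print(min(deltas), max(deltas))     # yet individual gaps vary widely
```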
Checking numerically, we see that some δ_n are large, with others close to 0; it is just the average that is 1. Mathematics is well equipped to study random phenomena, and we have names for various probability distributions, such as Poisson, Gaussian, etc. Is this what is happening here? These zeta zeros are not random at all, but perhaps thinking in terms of randomness has promise.

In the early twentieth century, Hilbert and Pólya suggested that the zeros of the zeta function might correspond to the eigenvalues of some operator. Now this is provocative! But what operator? Some 50 years later in a now famous conversation between Dyson and Montgomery at the Institute for Advanced Study, it was conjectured that the nontrivial zeros behave like the eigenvalues of a random matrix from the so-called Gaussian unitary ensemble. This conjecture, now known as the GUE conjecture, can be numerically tested in various ways. Odlyzko has done this, and found persuasive evidence for the conjecture: the higher the batches of zeros one looks at, the more closely their distribution corresponds to what the GUE conjecture predicts.

For example, take the 1 041 417 089 numbers δ_n with n starting at 10^23 + 17 368 588 794. (The imaginary parts of these zeros are around 1.3 × 10^22.) For each interval (j/100, (j + 1)/100] we can compute the proportion of these normalized gaps that lie in this interval, and plot it. If we were dealing with eigenvalues from a random matrix from the GUE, we would expect these statistics to converge to a certain distribution known as the Gaudin distribution (for which there is no closed formula, but which is easily computable). Odlyzko has kindly supplied me with the graph in Figure 1, which plots the Gaudin distribution against the data just described (but leaves out every second data point to avoid clutter). Like pearls on a necklace! The fit is absolutely remarkable.

[Figure 1: Nearest neighbor spacing. Density plotted against normalized spacing (0 to 3.0): the Gaudin distribution together with Odlyzko's normalized zero-gap data.]

The vital interplay of thought experiments and numerical computation has taken us to what we feel is a deeper understanding of the zeta function. But where do we go next? The GUE conjecture suggests a connection to random matrix theory, and pursuing further connections seems promising to many. It may be that random matrix theory will allow us only to formulate great conjectures about the zeta function, and will not lead to great theorems. But then again, who can deny the power of a glimpse at the truth? We await the next chapter in this development.
5 Diophantine Equations and the abc Conjecture

Fermat's last theorem is the assertion that for each integer n ≥ 3 the equation x^n + y^n = z^n has no solutions in positive integers x, y, z.(1) It remained unproved for three-and-a-half centuries until Andrew Wiles published a proof in 1995. In addition, perhaps more important than the solution of this particular Diophantine equation (that is, an equation where the unknowns are restricted to the integers), the centuries-long quest for a proof helped establish the field of algebraic number theory. And the proof itself established a long-sought and wonderful connection between modular forms and elliptic curves.

    (1) Actually, Fermat himself had a simple proof in the case n = 4, but we ignore this.

But do you know why Fermat's last theorem is true? That is, just in case you are not an expert on all of the intricacies of the proof, are you surprised that there are in fact no solutions? In fact, there is a fairly simple heuristic argument that supports the assertion. First note that the case n = 3, namely x^3 + y^3 = z^3, can be handled by elementary methods, and this in fact had already been done by Euler. So, let us focus on the cases when n ≥ 4. Let S_n be the set of positive nth powers of integers. How likely is it that the sum of two members of S_n is itself a member of S_n? Well, not at all likely, since Wiles has proved that this never occurs! But recall that we are trying to think naively. Let us try to mimic our situation by replacing the set S_n with a random set. In fact, we will throw all of the powers together into one set. Following an idea of Erdős and Ulam (1971) we create a set R by a random process: each integer m is considered independently, and the chance it gets thrown into R is proportional to m^(−3/4). This process would typically give us about x^(1/4) numbers in R in the interval [1, x], or at least this would be the order of magnitude. Now the total number of fourth and higher powers between 1 and x is also about x^(1/4), so we can take our random set R as modelling the situation for these powers, namely the union of all sets S_n for n ≥ 4. We ask how likely it is to have a + b = c where a, b, c all come from R.

The probability that a number m may be represented as a + b, with a and b both in R, is at most a quantity that is proportional to m^(−1/2). (There are fewer than m ways to split m as a + b, and a given way has probability about a^(−3/4) (m − a)^(−3/4); summing these contributions over a gives order m^(−1/2).) Now the events that would have to occur for m to be given as such a sum involve numbers smaller than m, so the event that m itself is in R is independent of these. Therefore, the probability that m is not only the sum of two members of R, but also itself a member of R, is at most a quantity proportional to m^(−1/2) m^(−3/4) = m^(−5/4). So now we can count how many times we should expect a sum of two members of R to itself be a member of R. This is at most a constant times Σ_m m^(−5/4). But this sum is convergent, so we expect only finitely many examples. Further, since the tail of a convergent series is tiny, we do not expect any large examples.

Thus, this argument suggests that there are at most finitely many positive integer solutions to

x^u + y^v = z^w,     (2)

where the exponents u, v, w are at least 4. Since Fermat's last theorem is the special case when u = v = w, we would have at most finitely many counterexamples to that as well.

This seems tidy enough, but now we get a surprise! There are actually infinitely many solutions to (2) in positive integers with u, v, w all at least 4. For example, note that 17^4 + 34^4 = 17^5. This is the case a = 1, b = 2, u = 4 of a more general identity: if a, b are positive integers, and c = a^u + b^u, we have (ac)^u + (bc)^u = c^(u+1). Another way to get infinitely many examples is to build on the possible existence of just one example. If x, y, z, u, v, w are positive integers satisfying (2), then with the same exponents, we may replace x, y, z with a^(vw) x, a^(uw) y, a^(uv) z for any integer a, and so get infinitely many solutions.

The point is that events of the kind that we are considering—that a given integer is a power—are not quite independent. For instance, if A and B are both uth powers, then so is AB, and this idea is exploited in the infinite families just mentioned.
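A line of code confirms the identity behind the first family (a one-off sketch; the helper name is mine):

```python
def family(a, b, u):
    """Build the solution (a*c)**u + (b*c)**u == c**(u + 1)
    from c = a**u + b**u."""
    c = a**u + b**u
    assert (a * c)**u + (b * c)**u == c**(u + 1)
    return a * c, b * c, c

print(family(1, 2, 4))    # (17, 34, 17): 17**4 + 34**4 == 17**5
```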
So how do we neatly bar these trivialities and come to the rescue of our heuristic argument? One simple way to do this is to insist that the numbers x, y, z in (2) be relatively prime. This gives no restriction whatsoever in the Fermat case of equal exponents, since a solution to x^n + y^n = z^n with d the greatest common divisor of x, y, z leads to the coprime solution (x/d)^n + (y/d)^n = (z/d)^n.

Concerning Fermat's last theorem, one might ask how far it had actually been verified before the final proof by Wiles. The paper by Buhler et al. (1993) reports a verification for all exponents n up to 4 000 000. This type of calculation, which is far from trivial, has its roots in nineteenth century work of Kummer and early twentieth century work of Vandiver. In fact, Buhler et al. (1993) also verify in the same range a related conjecture of Vandiver dealing with cyclotomic fields, but this conjecture may in fact be false in general.

The probabilistic thinking above, combined with computation of small cases, can carry us deeply into some very provocative conjectures. The above probabilistic argument can easily be extended to suggest that (2) has at most finitely many relatively prime solutions x, y, z over all possible exponent triples u, v, w with 1/u + 1/v + 1/w < 1. This conjecture has come to be known as the Fermat–Catalan conjecture, since it contains within it essentially Fermat's last theorem and also the Catalan conjecture (recently proved by Mihăilescu) that 8 and 9 are the only consecutive powers.

It is good that we do allow for the possibility that there are some solutions, and this is where our main topic of computing comes in. For example, since 1 + 8 = 9, we have a solution to x^7 + y^3 = z^2, where x = 1, y = 2, and z = 3. (The exponent 7 is chosen to insure that the reciprocal sum of the exponents is less than 1. Of course, we could replace 7 by any larger integer, but since in each case the power involved is the number 1, they should all together be considered as just one example.) Here are the known solutions to (2):

1^n + 2^3 = 3^2,
2^5 + 7^2 = 3^4,
13^2 + 7^3 = 2^9,
2^7 + 17^3 = 71^2,
3^5 + 11^4 = 122^2,
33^8 + 1 549 034^2 = 15 613^3,
1414^3 + 2 213 459^2 = 65^7,
9262^3 + 15 312 283^2 = 113^7,
17^7 + 76 271^3 = 21 063 928^2,
43^8 + 96 222^3 = 30 042 907^2.

The larger members were found in an exhaustive computer search by Beukers and Zagier. Perhaps this is the complete list of all solutions, or perhaps not—we have no proof.

However, for particular choices of u, v, w, more can be said. Using results from a famous paper of Faltings, Darmon and Granville (1995) have shown that for any fixed choice of u, v, w with reciprocal sum at most 1, there are at most finitely many coprime triples x, y, z solving (2). For a particular choice of exponents, one might try to actually find all of the solutions. If it can be handled at all, this task can involve a delicate interplay between arithmetic geometry, effective methods in transcendental number theory, and good hard computing. In particular, the exponent triples {2, 3, 7}, {2, 3, 8}, {2, 3, 9}, and {2, 4, 5} are known to have all their solutions in the above table. See Poonen et al. (forthcoming) for the treatment of the case {2, 3, 7} and links to other work.

The "abc conjecture" of Oesterlé and Masser is deceptively simple. It involves positive integer solutions to the equation a + b = c, hence the name. To put some meaning into a + b = c, we define the "radical" of a nonzero integer n as the product of the primes that divide n, denoting this as rad(n). So, for example, rad(10) = 10, rad(72) = 6, and rad(65 536) = 2. In particular, high powers have small radicals in comparison to the number itself, and so do many other numbers. Basically, the abc conjecture asserts that if a + b = c, then the radical of abc cannot be too small. More specifically we have the following.

The abc conjecture. For each ε > 0 there are at most finitely many relatively prime positive integer triples a, b, c with a + b = c and rad(abc) < c^(1−ε).

One can ask just how small rad(abc) can actually be, and probabilistic reasoning of the kind used earlier in this section suggests an answer. Basic to this argument is a perfectly rigorous result on the distribution of integers n for which rad(n) is below some bound. These ideas are worked through in the thesis of van Frankenhuijsen and also the new paper by Stewart and Tenenbaum (forthcoming). Here is a slightly weaker statement than the one suggested by these authors: if a + b = c are relatively prime positive integers and c is sufficiently large, then we have

rad(abc) > c^(1 − 1/√log c).     (3)

One might wonder how the numerical evidence stacks up against (3). This inequality asserts that if rad(abc) = r, then log(c/r)/√log c < 1. So, let T(a, b, c) denote the test statistic log(c/r)/√log c. A website maintained by Nitaj (www.math.unicaen.fr/~nitaj/abc.html) contains a wealth of information about the abc conjecture. Checking the data, there are quite a few examples with T(a, b, c) ≥ 1, the champion so far being

a = 7^2 · 41^2 · 311^3 = 2 477 678 547 239,
b = 11^16 · 13^2 · 79 = 613 474 843 408 551 921 511,
c = 2 · 3^3 · 5^23 · 953 = 613 474 845 886 230 468 750,
r = 2 · 3 · 5 · 7 · 11 · 13 · 41 · 79 · 311 · 953 = 28 828 335 646 110,

so that T(a, b, c) is comfortably larger than 1.
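The champion is easy to verify; a sketch, with rad and T as defined above, and with log taken as the natural logarithm (my assumption, since the value of the statistic depends on that choice):

```python
from math import log, sqrt

def rad(n):
    """Product of the distinct primes dividing n."""
    r, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            r *= p
            while n % p == 0:
                n //= p
        p += 1
    return r * (n if n > 1 else 1)

a = 7**2 * 41**2 * 311**3
b = 11**16 * 13**2 * 79
c = 2 * 3**3 * 5**23 * 953
assert a + b == c                     # the abc relation holds

r = rad(a * b * c)
print(r)                              # 28 828 335 646 110
print(log(c / r) / sqrt(log(c)))      # T(a, b, c), well above 1
```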
Remarks and Acknowledgments

I would like to recommend to the reader the book by Cohen (1993) for a discussion of computational