<<

for instance, that there are no solutions to the equation x2 + y2 = 13 475, since 13 475 = 52 × 72 × 11, Analytic Theory and 11 appears an odd number of times in this prod- uct.) Andrew Granville Once one begins the process of determining which are primes and which are not, it is soon appar- ent that there are many primes. However, as one goes 1 Introduction further and further, the primes seem to consist of a smaller and smaller proportion of the positive inte- What is ? One might have thought that gers. They also seem to come in a somewhat irregular it was simply the study of , but that is too pattern, which raises the question of whether there is broad a definition, since numbers are almost ubiqui- any formula that describes all of them. Failing that, tous in . To see what distinguishes num- can one perhaps describe a large class of them? We can ber theory from the rest of mathematics, let us look at also ask whether there are infinitely many primes? If the equation x2 + y2 = 15 925, and consider whether there are, can we quickly determine how many there it has any solutions. One answer is that it certainly are up to a given point? Or at least give a good esti- √does: indeed, the solution set forms a circle of radius mate for this number? Finally, when one has spent 15 925 in the plane. However, a number theorist is long enough looking for primes, one cannot help but interested in integer solutions, and now it is much less ask whether there is a quick way of recognizing them. obvious whether any such solutions exist. This last question is discussed in computational A useful first step in considering the above question number theory; the rest motivate this article. is to notice that 15 925 is a multiple of 25: in fact, Now that we have discussed what marks number it is 25 × 637. Furthermore, the number 637 can be theory out from the rest of mathematics, we are ready decomposed further: it is 49 × 13. That is, 15 925 = to make a further distinction: between algebraic and 52 × 72 × 13. This information helps us a lot, because . The main difference is that if we can find integers a and b such that a2 + b2 = 13, in theory (which is the main topic then we can multiply them by 5 × 7=35andwe of algebraic numbers) one typically considers ques- will have a solution to the original equation. Now we tions with answers that are given by exact formulas, notice that a = 2 and b = 3 works, since 22 +32 = whereas in analytic number theory, the topic of this 13. Multiplying these numbers by 35, we obtain the article, one looks for good approximations. For the sort solution 702 + 1052 = 15 925 to the original equation. of quantity that one estimates in analytic number the- As this simple example shows, it is often useful ory, one does not expect an exact formula to exist, to decompose positive integers multiplicatively into except perhaps one of a rather artificial and unillu- components that cannot be broken down any further. minating kind. One of the best examples of such a These components are called prime numbers, and the quantity is one we shall discuss in detail: the number fundamental theorem of states that of primes less than or equal to x. every positive integer can be written as a product of Since we are discussing approximations, we shall primes in exactly one way. That is, there is a one- need terminology that allows us to give some idea of to-one correspondence between positive integers and the quality of an approximation. Suppose, for exam- finite products of primes. In many situations we know ple, that we have a rather erratic function f(x) but are what we need to know about a positive integer once we able to show that, once x is large enough, f(x) is never have decomposed it into its prime factors and under- bigger than 25x2. This is useful because we under- stood those, just as we can understand a lot about stand the function g(x)=x2 quite well. In general, molecules by studying the atoms of which they are if we can find a constant c such that |f(x)|  cg(x) composed. For example, it is known that the equation for every x, then we write f(x)=O(g(x)). A typical x2 +y2 = n has an integer solution if and only if every usage occurs in the sentence “the average number of prime of the form 4m + 3 occurs an even number of prime factors of an integer up to x is log log x+O(1)”; times in the prime factorization of n. (This tells us, in other words, there exists some constant c>0 such

1 2 Princeton Companion to Mathematics Proof

that |the average − log log x|  c once x is sufficiently a1,a2,...,ak  0}. But, as Euler observed, this large. implies that a sum involving the elements of the first

We write f(x) ∼ g(x)iflimx→∞ f(x)/g(x)=1;and set should equal the analogous sum involving the ele- also f(x) ≈ g(x) when we are being a little less precise, ments of the second set: that is, when we want to say that f(x) and g(x) come 1 ns close when x is sufficiently large, but we cannot be, or n1 do not want to be, more specific about what we mean n a positive integer 1 by “come close.” = a1 a2 ··· ak s It is convenient for us to use the notation for  (p1 p2 pk ) a1,a2,...,ak 0 sums and for product. Typically we will indicate 1 1 1 = ··· beneath the symbol what terms the sum, or product, a1 s a2 s ak s (p1 ) (p2 ) (pk ) a10 a20 ak0 is to be taken over. For example, m2 will be a sum k −1 over all integers m that are greater than or equal to 2, − 1 = 1 s . whereas will be a product over all primes p. pj p prime j=1 The last equality holds because each sum in the 2 Bounds for the Number of Primes second-last line is the sum of a geometric progres- sion. Euler then noted that if we take s = 1, the Ancient Greek knew that there are right-hand side equals some (since ∞ infinitely many primes. Their beautiful proof by con- each pj > 1) whereas the left-hand side equals . tradiction goes as follows. Suppose that there are only This is a contradiction, so there cannot be finitely finitely many primes, say k of them, which we will many primes. (To see why the left-hand side is infi- nite when s = 1, note that (1/n)  n+1(1/t)dt denote by p1,p2,...,pk. What are the prime factors of n ··· since the function 1/t is decreasing, and therefore p1p2 pk + 1? Since this number is greater than 1 it − N 1(1/n)  N (1/t)dt = log N which tends to ∞ must have at least one prime factor, and this must be n=1 1 as N →∞.) pj for some j (since all primes are contained amongst During the proof above, we gave a formula for p1,p2,...,pk). But then pj divides both p1p2 ···pk n−s under the false assumption that there are only and p1p2 ···pk +1, and hence their difference, 1, which is impossible. finitely many primes. To correct it, all we have to do is rewrite it in the obvious way without that assumption: Many people dislike this proof, since it does not −1 actually exhibit infinitely many primes: it merely 1 1 = 1 − . (1) shows that there cannot be finitely many. It is more ns ps n1 p prime or less possible to correct this deficiency by defin- n a positive integer ing the sequence x1 =2,x2 = 3 and then xk+1 = Now, however, we need to be a little careful about x1x2 ···xk +1 for each k  2. Then each xk must con- whether the two sides of the formula converge. It is tain at least one prime factor, qk say, and these prime safe to write down such a formula when both sides are factors must be distinct, since if k<, then qk divides absolutely convergent, and this is true when s>1. xk which divides x −1, while q divides x. This gives (An infinite sum or product is absolutely convergent if us an infinite sequence of primes. the value does not change when we take the terms in In the seventeenth century Euler gave a differ- any order we want.) ent proof that there are infinitely many primes, one Like Euler, we want to be able to interpret what that turned out to be highly influential in what was happens to (1) when s = 1. Since both sides converge to come later. Suppose again that the list of primes and are equal when s>1, the natural thing to do is p1,p2,...,pk. As we have mentioned, the funda- is consider their common limit as s tends to 1 from mental theorem of arithmetic implies that there is a above. To do this we note, as above, that the left-hand one-to-one correspondence between the set of all inte- side of (1) is well approximated by gers and the set of products of the primes, which, if ∞ dt 1 a1 a2 ak = , those are the only primes, is the set {p p ···p : s − 1 2 k 1 t s 1 Princeton Companion to Mathematics Proof 3 so it diverges as s → 1+. We deduce that are 1 and the primes up to x, since every composite 1 has a prime factor no bigger than its square root. So, 1 − =0. (2) p is (4) a good approximation√ for the number of primes p prime up to x when y = x? Upon taking logarithms and discarding negligible To answer this question, we need to be more pre- terms, this implies that cise about what the formula in (4) is estimating. It is 1 = ∞. (3) supposed to approximate the number of integers up p p prime to x that have no prime factors less than or equal to So how numerous are the primes? One way to get y, plus the number of primes up to y. The so-called an idea is to determine the behaviour of the sum inclusion–exclusion principle can be used to show that analogous to (3) for other sequences of integers. For the approximation given in (4) is accurate to within k instance, 1/n2 converges, so the primes are, in 2 , where k is the number of primes less than or equal n1 k this sense, more numerous than the squares. This to y. Unless k is very small, this error term of 2 is argument works if we replace the power 2 by any far larger than the quantity we are trying to estimate, and the approximation is useless. It is quite good if k s>1, since then, as we have just observed, the sum s − is less than a small constant times log x, but, as we n1 1/n is about 1/(s 1) and in particular con- verges. In fact, since 1/n(log n)2 converges, we have seen, this is far less than√ the number of primes n1 ≈ see that the primes are in the same sense more numer- we expect up to y if y x. Thus it is not clear ous than the numbers {n(log n)2 : n  1}, and hence whether (4) can be used to obtain a good estimate there are infinitely many integers x for which the for the number of primes up to x. What we can do, number of primes less than or equal to x is at least however, is use this argument to give an upper bound x/(log x)2. for the number of primes up to x, since the number of Thus, there seem to be primes in abundance, but primes up to x is never more than the number of inte- we would also like to verify our observations, made gers up to x that are free of prime factors less than or from calculations, that the primes constitute a smaller equal to y, plus the number of primes up to y, which k and smaller proportion of the integers as the integers is no more than 2 plus the expression in (4). become larger and larger. The easiest way to see this is Now, by (2), we know that as y gets larger and − to try to count the primes using the “sieve of Eratos- larger the product py(1 1/p) converges to zero. thenes.” In the sieve of Eratosthenes one starts with Therefore, for any small positive number ε we can find − all the positive integers up to some number x. From a y such that py(1 1/p) <ε/2. Since every term these, one deletes the numbers 4, 6, 8 and so on—that in this product is at least 1/2, the product is at least k  2k k is, all multiples of 2 apart from 2 itself. One then takes 1/2 . Hence, for any x 2 our error term, 2 ,is the first undeleted integer greater than 2, which is 3, no bigger than the quantity in (4), and therefore the and deletes all its multiples—again, not including the number of primes up to x is no larger than twice (4), number 3 itself. Then one removes all multiples of 5 which, by our choice of y, is less than εx. Since we apart from 5, and so on. By the end of this process, were free to make ε as small as we liked, the primes one is left with the primes up to x. are indeed a vanishing proportion of all the integers, This suggests a way to guess at how many there as we predicted. are. After deleting every second integer up to x other Even though the error term in the inclusion– than 2 (which we call “sieving by 2”) one is left with exclusion principle is too large for√ us to use that roughly half the integers up to x; after sieving by 3, method to estimate (4) when y = x, we can still one is left with roughly two-thirds of those that had hope that (4) is a good approximation for the number remained; continuing like this we expect to have about of primes up to x: perhaps a different argument would give us a much smaller error term. And this turns 1 x 1 − (4) out to be the case: in fact, the error never gets much p √ py bigger than (4). However, when y = x the number of integers left by the time we√ have sieved with all the primes up to x is actually about 8/9 times (4). So why primes up to y. Once y = x the undeleted integers does (4) not give a good approximation? After sieving 4 Princeton Companion to Mathematics Proof with prime p we supposed that roughly 1 in every p of Table 1 Primes up to various x, and the the remaining integers were deleted: a careful analy- overcount in Gauss’s prediction. sis yields that this can be justified when p is small,  Overcount: but that this becomes an increasingly poor approxi- x dt mation of what really happens for larger p; in fact (4) xπ(x)=#{primes  x} − π(x) 2 log t does not give a correct approximation once y is bigger than a fixed power of x. So what goes wrong? In the 108 5 761 455 753 9 hope that the proportion is roughly 1/p lies the unspo- 10 50 847 534 1 700 10 ken assumption that the consequences of sieving by p 10 455 052 511 3 103 11 are independent of what happened with the primes 10 4 118 054 813 11 587 12 smaller than p. But if the primes under consideration 10 37 607 912 018 38 262 1013 346 065 536 839 108 970 are no longer small, then this assumption is false. This 1014 3 204 941 750 802 314 889 is one of the main reasons that it is hard to estimate 1015 29 844 570 422 669 1 052 618 the number of primes up to x, and indeed similar dif- 1016 279 238 341 033 925 3 214 631 ficulties lie at the heart of many related problems. 1017 2 623 557 157 654 233 7 956 588 One can refine the bounds given above but they 1018 24 739 954 287 740 860 21 949 554 do not seem to yield an asymptotic estimate for the 1019 234 057 667 276 344 607 99 877 774 primes (that is, an estimate which is correct to within 1020 2 220 819 602 560 918 840 222 744 643 a factor that tends to 1 as x gets large). The first good 1021 21 127 269 486 018 731 928 597 394 253 guesses for such an estimate emerged at the begin- 1022 201 467 286 689 315 906 290 1 932 355 207 ning of the nineteenth century, none better than what emerges from Gauss’s observation, made when study- ing tables of primes up to three million, at 16 years of age, that “the density of primes at around x is about the primes, has the same properties as a “typical” 1/ log x.” Interpreting this, we guess that the number sequence of 0s and 1s, and to use this principle to of primes up to x is about make precise conjectures about the primes. More pre- cisely, let X3,X4,... be an infinite sequence of ran- x x 1 dt dom variables ≈ . taking the values 0 or 1, and let the log n log t n=2 2 variable Xn equal 1 with probability 1/ log n (so that it equals 0 with probability 1 − 1/ log n). Assume also Let us compare this prediction (rounded to the nearest that the variables are independent, so for each m integer) with the latest data on numbers of primes, dis- knowledge about the variables other than X tells covered by a mixture of ingenuity and computational m us nothing about X itself. Cram´er’s suggestion was power. Table 1 shows the actual numbers of primes m that any statement about the distribution of 1s in the up to various powers of 10 together with the differ- sequence that represents the primes will be true if and ence between these numbers and what Gauss’s formula only if it is true with probability 1 for his random gives. The differences are far smaller than the numbers sequences. Some care is needed in interpreting this themselves, so his prediction is amazingly accurate. It statement: for example, with probability 1 a random does seem always to be an overcount, but since the sequence will contain infinitely many even numbers. width of the last column is about half that of the cen- However, it is possible to formulate a general princi- tral one it appears that the difference is something √ ple that takes account of such examples. like x. Here is an example of a use of the Gauss–Cram´er In the 1930s, the great probability theorist, Cram´er, model. With the help of the central limit theorem gave a probabilistic way of interpreting Gauss’s predic- one can prove that, with probability 1, there are tion. We can represent the primes as a sequence of 0s x dt √ and 1s: Putting a “1” each time we encounter a prime, + O( x log x) and a “0” otherwise, we obtain, starting from 3, the 2 log t sequence 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 1,.... Cram´er’s idea 1s among the first x terms in our sequence. The model is to suppose that this sequence, which represents tells us that the same should be true of the sequence Princeton Companion to Mathematics Proof 5 representing primes, and so we predict that is defined everywhere except z = 1.) Riemann proved x √ the remarkable fact that confirming Gauss’s conjec- { } dt # primes up to x = + O( x log x), (5) ture for the number of primes up to x is equivalent to 2 log t gaining a good understanding of the zeros of the func- just as the table suggests. tion ζ(s)—that is, of the values of s for which ζ(s)=0. The Gauss–Cram´er model provides a beautiful way Riemann’s deep work gave birth to our subject, so it to think about distribution questions concerning the seems worthwhile to at least sketch the key steps in the prime numbers, but it does not give proofs, and it argument linking these seemingly unconnected topics. does not seem likely that it can be made into such a Riemann’s starting point was Euler’s formula (1). tool; so for proofs we must look elsewhere. In analytic It is not hard to prove that this formula is valid when number theory one attempts to count objects that s is complex, as long as its real part is greater than 1, appear naturally in arithmetic, yet which resist being so we have counted easily. So far, our discussion of the primes −1 1 has concentrated on upper and lower bounds that ζ(s)= 1 − . ps follow from their basic definition and a few elemen- p prime tary properties—notably the fundamental theorem of If we take the logarithm of both sides and then differ- arithmetic. Some of these bounds are good and some entiate, we obtain the equation  not so good. To improve on these bounds we shall do ζ (s) log p log p − = = . something that seems unnatural at first, and reformu- ζ(s) ps − 1 pms late our question as a question about complex func- p prime p prime m1 tions. This will allow us to draw on deep tools from We need some way to distinguish between primes p  analysis. x and primes p>x; that is, we want to count those primes p for which x/p  1, but not those with x/p < 1. This can be done using the step function that takes 3 The “Analysis” in Analytic Number the value 0 for y<1 and the value 1 for y>1 (so Theory that its graph looks like a step). At y = 1, the point of discontinuity, it is convenient to give the function the 1 These analytic techniques were born in an 1859 mem- average value, 2 . Perron’s formula, one of the big tools oir of Riemann, in which he looked at the function of analytic number theory. describes this step function that appears in the formula (1) of Euler, but with one by an , as follows. For any c>0, ⎧ crucial difference: now he considered complex values ⎪ ⎨⎪0if0

We can justify swapping the order of the sum and He determined that this can be done by taking 1 − −s/2 1 the integral if c is taken large enough, since every- ξ(s):= 2 s(s 1)π Γ ( 2 s)ζ(s). Here Γ (s)isthe thing then converges absolutely. Now the left-hand famous , which equals the factorial side of the above equation is not counting the number function at positive integers (that is, Γ (n)=(n−1)!), of primes up to x but rather a “weighted” version: for and is well-defined and continuous for all other s. each prime p we add a weight of log p to the count. A careful analysis of (1) reveals that there are no It turns out, though, that Gauss’s prediction for the zeros of ζ(s) with Re(s) > 1. Then, with the help of number of primes up to x follows so long as we can (8), we can deduce that the only zeros of ζ(s) with show that x is a good estimate for this weighted count Re(s) < 0 lie at the negative even integers −2, −4,... when x is large. Notice that the sum in (6) is exactly (the “trivial zeros”). So, to be able to use (7), we need the logarithm of the lowest common multiple of the to determine the zeros inside the critical strip, the set integers less than or equal to x, which perhaps explains of all s such that 0  Re(s)  1. Here Riemann made why this weighted counting function for the primes is yet another extraordinary observation which, if true, a natural function to consider. Another explanation is would allow us tremendous insight into virtually every that if the density of primes near p is indeed about aspect of the distribution of primes. 1/ log p, then multiplying by a weight of log p makes The . If 0  Re(s)  1 and the density everywhere about 1. 1 ζ(s) = 0, then Re(s)= 2 . If you know some , then you will know that Cauchy’s residue theorem allows one to It is known that there are infinitely many zeros on the line Re(s)= 1 , crowding closer and closer together evaluate the integral in (6) in terms of the “residues” 2  as we go up the line. The Riemann hypothesis has of the integrand (ζ (s)/ζ(s))(xs/s), that is, the poles been verified computationally for the ten billion zeros of this function. Moreover, for any function f that is of lowest height (that is, with |Im(s)| smallest), it can analytic except perhaps at finitely many points, the  be shown to hold for at least 40% of all zeros, and it fits poles of f (s)/f(s) are the of f. Each  nicely with many different heuristic assertions about pole of f (s)/f(s) has order 1, and the residue is sim- the distribution of primes and other sequences. Yet, for ply the order of the corresponding zero, or minus the all that, it remains an unproved hypothesis, perhaps order of the corresponding pole, of f. Using these facts the most famous and tantalizing in all of mathematics. we can obtain the explicit formula How did Riemann think of his “hypothesis”? Rie-  xρ ζ (0) mann’s memoir gives no hint as to how he came up log p = x − − . (7) ρ ζ(0) with such an extraordinary conjecture, and for a long p prime,m1 ρ:ζ(ρ)=0 pmx time afterwards it was held up as an example of the Here the zeros of ζ(s) are counted with multiplicity: great heights to which humankind could ascend by that is, if ρ is a zero of ζ(s) of order k, then there pure thought alone. However, in the 1920s Siegel and Weil are k terms for ρ in the sum. It is astonishing that got hold of Riemann’s unpublished notes and there can be such a formula, an exact expression for from these it is evident that Riemann had been able the number of primes up to x in terms of the zeros to determine the lowest few zeros to several decimal of a complicated function: you can see why Riemann’s places through extensive hand calculations—so much work stretched people’s imagination and had such an for “pure thought alone”! Nevertheless, the Riemann hypothesis is a mammoth leap of imagination and to impact. have come up with an algorithm to calculate zeros Riemann made another surprising observation of ζ(s) is a remarkable achievement. (See computa- which allows us to easily determine the values of ζ(s) tional number theory for a discussion of how zeros on the left-hand side of the complex plane (where the of ζ(s) can be calculated.) function is not naturally defined). The idea is to multi- If the Riemann hypothesis is true, then it is not ply ζ(s) by some simple function so that the resulting hard to prove the bound product ξ(s) satisfies the functional equation ρ 1/2 x x  . ξ(s)=ξ(1 − s) for all s. (8) ρ |Im(ρ)| Princeton Companion to Mathematics Proof 7

Inserting this into (7) one can deduce that How does one arrive at such a guess given that the √ table of primes extends only up to 1022? One begins by log p = x + O( x log2 x); (9) using the first thousand terms of the right-hand side p prime px of (10) to approximate the left-hand side; wherever it which, in turn, can be “translated” into (5). In fact looks as though it could be negative, one approximates these estimates hold if and only if the Riemann with more terms, maybe a million, until one becomes hypothesis is true. pretty certain that the value is indeed negative. The Riemann hypothesis is not an easy thing to It is not uncommon to try to understand a given understand, nor to fully appreciate. The equivalent, function better by representing it as a sum of sines (5), is perhaps easier. Another version, which I prefer, and cosines like this; indeed this is how one studies the is that, for every N  100, harmonics in music and (10) becomes quite compelling √ from this perspective. Some experts suggest that (10) |log(lcm[1, 2,...,N]) − N|  2 N log N. tells us that “the primes have music in them” and To focus on the overcount in Gauss’s guesstimate thus makes the Riemann hypothesis believable, even for the number of primes up to x, we use the following desirable. To prove unconditionally that approximation, which can be deduced from (7) if, and only if, the Riemann hypothesis is true: x dt #{primes  x}∼ , 2 log t x dt − #{primes  x} 2 log t √ the so-called “ theorem,” we can take x/ log x the same approach as above but, since we are not ask- sin(γ log x) ≈ 1+2 . (10) ing for such a strong approximation to the number of γ all real numbers γ>0 primes up to x, we need to show only that the zeros 1 such that 2 +iγ is a zero of ζ(s) near to the line Re(s) = 1 do not contribute much to the formula (7). By the end of the nineteenth cen- The right-hand side here is the overcount in Gauss’s tury this task had been reduced to showing that there prediction for the number of primes up to x, divided √ are no zeros actually on the line Re(s) = 1: this was by something that grows like x. When we looked at eventually established by de la Vallee´ Poussin and the table of primes it seemed that this quantity should Hadamard in 1896. be roughly constant. However, that is not quite true as Subsequent research has provided wider and wider we see upon examining the right-hand side. The first subregions of the critical strip without zeros of ζ(s) term on the right-hand side, the “1”, corresponds to (and thus improved approximations to the number of the contribution of the squares of the primes in (7). primes up to x), without coming anywhere near to The subsequent terms correspond to the terms involv- proving the Riemann hypothesis. This remains as an ing the zeros of ζ(s) in (7); these terms have denomi- outstanding open problem of mathematics. nator γ so the most significant terms in this sum are A simple question like “How many primes are there those with the smallest values of γ. Moreover, each up to x?” deserves a simple answer, one that uses ele- of these terms is a sine wave, which oscillates, half mentary methods rather than all of these methods of the time positive and half the time negative. Having complex analysis, which seem far from the question the “log x” in there means that these oscillations hap- at hand. However, (7) tells us that the prime number pen slowly (which is why we hardly notice them in theorem is true if and only if there are no zeros of ζ(s) the table above), but they do happen, and indeed the on the line Re(s) = 1, and so one might argue that it quantity in (10) does eventually get negative. No one is inevitable that complex analysis must be involved has yet determined a value of x for which this is neg- in such a proof. In 1949 Selberg and Erd˝os surprised ative (that is, a value of x for which there are more the mathematical world by giving an elementary proof than x(1/ log t)dt primes up to x), though our best 2 of the . Here, the word “ele- guess is that the first time this happens is for mentary” does not mean “easy” but merely that the x ≈ 1.398 × 10316. proof does not use advanced tools such as complex 8 Princeton Companion to Mathematics Proof analysis—in fact, their argument is a complicated one. An easy but important example of a character mod q

Of course their proof must somehow show that there is the principal character χq, which takes the value 1 if is no zero on the line Re(s) = 1, and indeed their com- (n, q) = 1 and 0 otherwise. If q is prime, then another binatorics cunningly masks a subtle complex analysis important example is the Legendre symbol (·/q): one proof beneath the surface (read Ingham’s discussion sets (n/q)tobe0ifn is a multiple of q,1ifn is a (1949) for a careful examination of the argument). quadratic residue mod q, and −1ifn is a quadratic nonresidue mod q. (An integer n is called a quadratic residue mod q if n is congruent mod q to a perfect 4 Primes in Arithmetic Progressions square.) If q is composite, then a function known as the Legendre–Jacobi symbol (·/q), which generalizes After giving good estimates for the number of primes the Legendre symbol, is also a character. This too is up to x, which from now on we shall denote by π(x), an important example that helps us, in a slightly less we might ask for the number of such primes that direct way, to recognize squares mod q. are congruent to a mod q.( These characters are all real-valued, which is the is discussed in Part III.) Let us write π(x; q, a) for exception rather than the rule. Here is an example this quantity. To start with, note that there is only of a genuinely complex-valued character in the case one prime congruent to 2 mod 4, and indeed there q = 5. Set χ(n)tobe0ifn ≡ 0 (mod 5), i if n ≡ can be no more than one prime in any arithmetic 2, −1ifn ≡ 4, −iifn ≡ 3, and 1 if n ≡ 1. To a, a + q, a +2q,... if a and q have a common fac- see that this is a character, note that the powers of tor greater than 1. Let φ(q) denote the number of 2mod5are2, 4, 3, 1, 2, 4, 3, 1,..., while the powers of integers a,1 a  q, such that (a, q) = 1. (The − − − − notation (a, q) stands for the highest common factor i are i, 1, i, 1, i, 1, i, 1,.... of a and q.) Then all but a small finite number of the It can be shown that there are precisely φ(q) dis- infinitely many primes belong to the φ(q) arithmetic tinct characters mod q. Their usefulness to us comes progressions a, a + q, a +2q,... with 1  a

The sum on the left-hand side is a natural adaptation to hold when x is just a little bigger than q.Soeven of the sum we considered earlier when we were count- the Riemann hypothesis and its generalizations are ing all primes. And we can estimate it if we can get not powerful enough to tell us the precise behaviour good estimates for each of the sums of the distribution of primes. Throughout the twentieth century much thought χ(pm) log p. was put in to bounding the number of zeros of Dirich- p prime,m1 pmx let L-functions near to the 1-line. It turns out that We approach these sums much as we did before, one can make enormous improvements in the range of obtaining an explicit formula, analogous to (7), (10), x for which (11) holds (to “halfway between polyno- now in terms of the zeros of the Dirichlet L-function: mial in q and exponential in q”) provided there are no Siegel zeros. These putative zeros β of L(s, (·/q)) χ(n) √ L(s, χ):= . would be real numbers with β>1 − c/ q; they can ns n1 be shown to be extremely rare if they exist at all. This function turns out to have properties closely That Siegel zeros are rare is a consequence of analogous to the main properties of ζ(s). In particular, the Deuring–Heilbronn phenomenon: that zeros of it is here that the multiplicativity of χ is all-important, L-functions repel each other, rather like similarly since it gives us a formula similar to (1): charged particles. (This phenomenon is akin to −1 the fact that different algebraic numbers repel one χ(n) χ(p) = 1 − . (12) another, part of the basis of the subject of Diophantine ns ps n1 p prime approximation.) That is, L(s, χ) has an . We also believe How big is the smallest prime congruent to a mod q the “generalized Riemann hypothesis” that all zeros ρ when (a, q) = 1? Despite the possibility of the exis- of L(ρ, χ) = 0 in the critical strip satisfy Re(ρ)= 1 . tence of Siegel zeros, one can prove that there is always 2 5.5 This would imply that the number of primes up to x such a prime less than q if q is sufficiently large. that are congruent to a mod q can be estimated as Obtaining a result of this type is not difficult when there are no Siegel zeros. If there are Siegel zeros, then π(x) √ π(x; q, a)= + O( x log2(qx)). (13) we go back to the explicit formula, which is similar to φ(q) (7) but now concerns zeros of L(s, χ). If β is a Siegel Therefore, the generalized Riemann hypothesis zero, then it turns out that in the explicit formula implies the estimate we were hoping for (formula (11)), there are now two obviously large terms: x/φ(q) and 2 provided that x is a little bigger than q . −(a/q)xβ /βφ(q). When (a/q) = 1 it appears that they In what range can we prove (11) unconditionally— might almost cancel (since β is close to 1), but with that is, without the help of the generalized Riemann more care we obtain hypothesis? Although we can more or less translate xβ 1 the proof of the prime number theorem over into this x−(a/q) =(x−xβ )+xβ 1− ∼ x(1−β) log x. β β new setting, we find that it gives (11) only when x is very large. In fact, x has to be bigger than an expo- This is a smaller main term than before, but it is not nential in a power of q—which is a lot bigger than the too hard to show that it is bigger than the contri- “x is a little larger than q2” that we obtained from the butions of all of the other zeros combined, because generalized Riemann hypothesis. We see a new type the Deuring–Heilbronn phenomenon implies that the of problem emerging here, in which we are asking for repels those zeros, forcing them to be far a good starting point for the range of x for which we to the left. When (a/q)=−1, the same two terms tell obtain good estimates, as a function of the modulus us that if (1 − β) log x is small, then there are twice q; this does not have an analogy in our exploration of as many primes as we would expect up to x that are the prime number theorem. By the way, even though congruent to a mod q. this bound “x is a little larger than q2” is far out of There is a close connection between Siegel zeros reach of current methods, it still does not seem to be and class numbers, which are defined and discussed in the best answer; calculations reveal that (11) seems Section ?? of algebraic numbers. Dirichlet’s class 10 Princeton Companion to Mathematics Proof

√ number formula states that L(1, (·/q)) = πh−q/ q Rabinowitch, is rather surprising: it happens if and − for q>√6, where h−q is the class number of the only if h−q = 1, where q =4p 1. Gauss did extensive field Q( −q) (for more on this topic, see Section 7 calculations of class numbers and predicted that there of Algebraic Numbers). A class number is always are just nine values of q with h−q = 1, the largest of a positive integer, so this result immediately implies × − √ which is 163 = 4 41 1. Using the Deuring–Heilbronn that L(1, (·/q))  π/ q. Another consequence is that phenomenon researchers showed, in the 1930s, that h−q is small if and only if L(1, (·/q)) is small. The there is at most one q with h−q = 1 that is not reason this gives us information about Siegel zeros is already on Gauss’s list; but as usual with such meth- that one can show that the derivative L(σ, (·/q)) is ods, one could not give a bound on the size of the positive (and not too small) for real numbers σ close putative extra counterexample. It was not until the to 1. This implies that L(1, (·/q)) is small if and only 1960s that Baker and Stark proved that there was no if L(s, (·/q)) has a real zero close to 1, that is, a Siegel tenth q, both proofs involving techniques far removed zero β. When h−q = 1, the link is more direct: it from those here (in fact Heegner gave what we now can be shown that the Siegel zero β is approximately √ understand to have been a correct proof in the 1950s 1−6/(π q). (There are also more complicated formu- but he was so far ahead of his time that it was difficult las for larger values of h−q.) for mathematicians to appreciate his arguments and These connections show that getting good lower to believe that all of the details were correct). In the bounds on h−q is equivalent to getting good bounds 1980s Goldfeld, Gross, and Zagier gave the best result 1 on the possible range for Siegel zeros. Siegel showed −  to date, showing that h q 7700 log q this time using that for any ε>0 there exists a constant cε > 0 the Deuring–Heilbronn phenomenon with the zeros of −ε such that L(1, (·/q))  cεq . His proof was unsatis- yet another type of L-function to repel the zeros of factory because by its very nature one cannot give L(s, (·/q)). an explicit value for cε. Why not? Well, the proof This idea that primes are well distributed in arith- comes in two parts. The first assumes the generalized metic progressions except for a few rare moduli was Riemann hypothesis, in which case an explicit bound exploited by Bombieri and Vinogradov to prove that follows easily. The second obtains a lower bound in (11) holds “almost always” when x is a little big- terms of the first counterexample to the generalized ger than q2 (that is, in the same range that we get Riemann hypothesis. So if the generalized Riemann “always” from the generalized Riemann hypothesis). hypothesis is true but remains unproved, then Siegel’s More precisely, for given large x√we have that (11) proof cannot be exploited to give explicit bounds. holds for “” q less than x/(log x)2 and for This dichotomy, between what can be proved with an all a such that (a, q)√ = 1. “Almost all” means that, explicit constant and what cannot be, is seen far and out of all q less than x/(log x)2, the proportion for wide in analytic number theory—and when it appears which (11) does not hold for every a with (a, q)=1 it usually stems from an application of Siegel’s result, tends to 0 as x →∞. Thus, the possibility is not ruled and especially its consequences for the range in which out that there are infinitely many counterexamples. the estimate (11) is valid. However, since this would contradict the generalized A polynomial with integer coefficients cannot Riemann hypothesis, we do not believe that it is so. always take on prime values when we substitute in The Barban–Davenport–Halberstam theorem gives a an integer. To see this, note that if p divides f(m) weaker result, but it is valid for the whole feasible then p also divides f(m + p),f(m +2p),....How- range: for any given large x, the estimate (11) holds ever, there are some prime-rich polynomials, a famous for “almost all” pairs q and a such that q  x/(log x)2 2 example being the polynomial x + x + 41, which is and (a, q)=1. prime for x =0, 1, 2,...,39. There are almost cer- tainly quadratic polynomials that take on more con- secutive prime values, though their coefficients would 5 Primes in Short Intervals have to be very large. If we ask the more restricted question of when the polynomial x2 + x + p is prime The prediction of Gauss referred to the primes for x =0, 1, 2,...,p − 2, then the answer, given by “around” x, so it perhaps makes more sense to inter- Princeton Companion to Mathematics Proof 11 pret his statement by considering the number of Table 2 The largest known gaps between primes. primes in short intervals at around x. If we believe − − pn+1 pn Gauss, then we might expect the number of primes pn pn+1 pn 2 log pn between x and x + y to be about y/ log x. That is, in terms of the prime-counting function π, we might 113 14 0.6264 expect that 1 327 34 0.6576 y 31 397 72 0.6715 π(x + y) − π(x) ∼ (14) log x 370 261 112 0.6812 2 010 733 148 0.7026 for |y|  x/2. However, we have to be a little care- 20 831 323 210 0.7395 ful about the range for y. For example, if y = 1 log x, 2 25 056 082 087 456 0.7953 then we certainly cannot expect to have half a prime in 2 614 941 710 599 652 0.7975 each interval. Obviously we need y to be large enough 19 581 334 192 423 766 0.8178 that the prediction can be interpreted in a way that 218 209 405 436 543 906 0.8311 makes sense; indeed, the Gauss–Cram´er model sug- 1 693 182 318 746 371 1132 0.9206 gests that (14) should hold when |y| is a little bigger than (log x)2. If we attempt to prove (14) using the same methods we used in the proof of the prime number theorem, we the differences can get really small, and whether the find ourselves bounding differences between ρth pow- differences can get really large. The Gauss–Cram´er ers as follows: model suggests that the proportion of n for which the ρ ρ x+y (x + y) − x − gap between consecutive primes is more than λ times = tρ 1 dt the average, that is pn+1 − pn >λlog pn, is approxi- ρ x −λ x+y mately e ; and, similarly, the proportion of intervals Re(ρ)−1 Re(ρ)−1  t dt  y(x + y) . [x, x + λ log x] containing exactly k primes is approx- x imately e−λλk/k!, a suggestion which, as we shall With bounds on the density of zeros of ζ(s) well to see, is supported by other considerations. By looking 1 the right of 2 , it has been shown that (14) holds for y 7/12 at the tail of this distribution, Cram´er conjectured a little bigger than x ; but there is little hope, even 2 that lim sup →∞(pn+1 − pn)/(log pn) = 1, and the assuming the Riemann hypothesis, that such methods n √ evidence we have seems to support this (see Table 2). will lead to a proof of (14) for intervals of length x The Gauss–Cram´er model does have a big draw- or less. back: it does not “know any arithmetic.” In particu- In 1949 Selberg showed that (14) is true for “almost all” x when |y| is a little bigger than (log x)2, assum- lar, as we noted earlier, it does not predict divisibility ing the Riemann hypothesis. Once again, “almost all” by small primes. One manifestation of this failing is means 100%, rather than “all,” and it is feasible that that it predicts that there should be just about as there are infinitely many counterexamples, though at many gaps of length 1 between primes as there are that time it seemed highly unlikely. It therefore came of length 2. However, there is only one gap of length as a surprise when Maier showed, in 1984, that, for any 1, since if two primes differ by 1, then one of them fixed A>0, the estimate (14) fails for infinitely many must be even, whereas there are many examples of integers x, with y = (log x)A. His ingenious proof rests pairs of primes differing by 2—and there are believed on showing that the small primes do not always have to be infinitely many. For the model to make correct as many multiples in an interval as one might expect. conjectures about prime pairs, we must consider divis-

Let p1 =2

Gauss–Cram´er model suggests that a random integer f1(n),f2(n),...,fk(n) are prime is about up to x is prime with probability roughly 1/ log x,so − we might expect x/(log x)2 prime pairs q, q +2 up (1 ωf (p)/p) (1 − 1/p)k to x. However, we do have to account for the small p prime primes, as the q, q + 1 example shows, so let us con- × x | | | |··· | | (15) sider 2-divisibility. The proportion of random pairs of log f1(x) log f2(x) log fk(x) 1 integers that are both odd is 4 , whereas the propor- once x is sufficiently large. One can use a similar tion of random q such that q and q + 2 are both odd heuristic to make predictions in Goldbach’s conjec- 1 2 is 2 . Thus we should adjust our guess x/(log x) by a ture, that is, for the number of pairs of primes p, q for 1 1 factor ( 2 )/( 4 ) = 2. Similarly, the proportion of ran- which p + q =2N. Again, these predictions are very dom pairs of integers that are both not divisible by well matched by the computational evidence. 2 2 3 (or indeed by any given odd prime p)is(3 ) (and There are just a few cases of conjecture (15) that (1 − 1/p)2, respectively), whereas the proportion of have been proved. Modifications of the proof of the random q such that q and q + 2 are both not divisible prime number theorem give such a result for admis- 1 − by 3 (or by prime p)is 3 (and (1 2/p), respectively). sible polynomials qt + a (in other words, for primes Adjusting our formula for each prime p we end up with in arithmetic progressions) and for admissible at2 + the prediction btu + cu2 ∈ Z[t, u] (as well as some other polynomials in two variables of degree two). It is also known for a #{q  x : q and q + 2 both prime} certain type of polynomial in n variables of degree n (1 − 2/p) x ∼ 2 . (the admissible “norm-forms”). (1 − 1/p)2 (log x)2 p an odd prime There was little improvement on this situation dur- This is known as the “asymptotic twin-prime conjec- ing the twentieth century until quite recently, when, ture.” Despite its plausibility there do not seem to by very different methods, Friedlander and Iwaniec be any practical ideas around for turning the heuris- broke through this stalemate showing such a result tic argument above into something rigorous. The one for the polynomial t2 + u4, and then Heath-Brown did good unconditional result known is that the number so for any admissible homogenous polynomial in two of twin primes less than or equal to x is never more variables of degree three. than four times the quantity we have just predicted. Another truly extraordinary breakthrough occurred One can make a more precise prediction replacing recently with a result of Green and Tao, proved in 2 x 2 x/(log x) by 2 (1/(log t) )dt, and then we expect 2004, which states that for every k there are infinitely that the√ difference between the two sides is no more many k-term arithmetic progressions of primes: that than c x for some constant c>0, a guesstimate that is, pairs of integers a, d such that a, a+d, a+2d,...,a+ is well supported by computational evidence. (k − 1)d are all prime. Green and Tao are currently 14 Princeton Companion to Mathematics Proof hard at work attempting to show that the number of are close together than one might expect, which we four-term arithmetic progressions of primes is indeed express informally by saying that the zeros of ζ(s) well approximated by (15). They are also extending repel one another. their results to other families of polynomials. In a now-famous conversation that took place at the Institute for Advanced Study in Princeton, Mont- gomery mentioned his ideas to the physicist Freeman 8 Gaps between Primes Revisited Dyson. Dyson immediately recognized (16) as a func- tion that comes up in modelling energy levels in quan- In the 1970s Gallagher deduced from the conjectured tum chaos. Believing that this was unlikely to be a prediction (15) (with fj (t)=t + aj ) that the propor- coincidence, he suggested that the zeros of the Rie- tion of intervals [x, x + λ log x] which contain exactly mann zeta function are distributed, in all aspects, −λ k k primes is close to e λ /k! (as was also deduced, like energy levels, which are in turn modelled on the in Section 5 above, from the Gauss–Cram´er heuris- distribution of eigenvalues of random Hermitian tics). This has recently been extended to support matrices. There is now substantial computational the prediction that, as we vary x from X to 2X, and theoretical evidence that Dyson’s suggestion is the number of primes in the interval [x, x + y]is correct and can be extended to Dirichlet L-functions, normally distributed with mean x+y(1/ log t)dt and x as well as other types of L-functions, and even to other variance (1 − δ)y/ log x, where δ is some constant about L-functions. strictly between 0 and 1 and we take y to be xδ. √ One note of caution. Few of the conjectured con- When y> x the Riemann zeta function supplies sequences of this new “random matrix theory” have information on the distribution of primes in intervals been unconditionally proved, or seem likely to be in [x, x + y) via the explicit formula (7). Indeed when we the foreseeable future. It simply provides a tool to compute the “variance” make predictions where that was too difficult to do 2X 2 1 before. However, there is at least one key question log p − y dx X about which we still cannot make a well-substantiated X p prime, x

(which is what was used by Goldfeld, Gross, and Further Reading Zagier, as mentioned in Section 4, to get their lower Hardy and Wright’s classic book (1980) stands alone bound on h− ). q amongst introductory number theory texts for the L-functions arise in many areas of arithmetic geom- quality of its discussion of analytic topics. The best etry, and their coefficients typically describe the num- introduction to the heart of analytic number theory is ber of points satisfying certain equations mod p. The the masterful book by Davenport (2000). Everything Langlands program seeks to understand these connec- you have ever wanted to know about the Riemann tions at a deep level. zeta-function is in Titchmarsh (1986). Finally, there It seems that every “natural” L-function has many are two recently released books by modern masters of of the same analytic properties as those discussed the subject (Iwaniec and Kowalski 2004; Montgomery in this article. Selberg has proposed that this phe- and Vaughan 2006) that introduce the reader to the nomenon should be even more general. Consider sums key issues of the subject. s A(s)= n1 an/n that The reference list below includes several papers, sig- nificant for this article, whose content is not discussed • are well-defined when Re( s) > 1, in any of the listed books. s 2s • have an Euler product (1 + b /p + b 2 /p + p p p Davenport, H. 2000. Multiplicative Number Theory, 3rd ··· ) in this (or an even smaller) region, edn. Springer. • have coefficients an that are smaller than any Deligne, P. 1977. Applications de la formule des traces given power of n, once n is sufficiently large, aux sommes trigonometriques. In Cohomologie Etale´ • | | θ 1 (SGA 4 1/2). Lecture Notes in Mathematics, Vol- satisfy bn <κn for some constants θ< 2 and κ>0. ume 569. Springer. Green, B., and T. Tao. Forthcoming. The primes contain arbitrarily long arithmetic progressions. Annals of Math- Selberg conjectures that we should be able to give a ematics, in press. good definition to A(s) on the whole complex plane, Hardy, G. H., and E. M. Wright. 1980. An Introduction to and that A(s) should have a symmetry connecting the Theory of Numbers, 5th edn. Oxford: Oxford Science the value of A(s) with A(1 − s). Furthermore, he Publications. Ingham, A. E. 1949. Review 10,595c (MR0029411). Math- conjectures that the Riemann hypothesis should hold ematical Reviews. Providence, RI: American Mathemat- for A(s)! ical Society. The current wishful thinking is that Selberg’s family Iwaniec, H., and E. Kowalski. 2004. Analytic Number The- of L-functions is precisely the same as those considered ory. Colloquium Publications, Volume 53. Providence, by Langlands. RI: American Mathematical Society. Montgomery, H. L., and R. C. Vaughan. 2006. Multiplica- tive Number Theory I: Classical Theory. Cambridge Uni- versity Press. 13 Conclusion Soundararajan, K. Forthcoming. Small gaps between prime numbers: the work of Goldston–Pintz–Yildirim. Bulletin of the American Mathematical Society, in press. In this article we have described current thinking on Titchmarsh, E. C. 1986. The Theory of the Riemann Zeta- several key questions about the distribution of primes. Function, 2nd edn. Oxford University Press. It is frustrating that after centuries of research so little has been proved, the primes guarding their mysteries so jealously. Each new breakthrough seems to require brilliant ideas and extraordinary technical prowess. As Euler wrote in 1770:

Mathematicians have tried in vain to discover some order in the sequence of prime numbers but we have every reason to believe that there are some mysteries which the human mind will never penetrate.