<<

LTCC basic course: Analytic Theory

Andrew Granville

3

Preface

Riemann’s seminal 1860 memoir showed how questions on the distribution of prime are more-or-less equivalent to questions on the distribution of zeros of the . This was the starting point for the beautiful theory which is at the heart of analytic , which we introduce in this mini- course. Selberg showed how sieve bounds can be obtained by optimizing values over a wide class of combinatorial objects, making them a very flexible tool. We will also see how such methods can be used to help understand primes in short intervals. Key phrases: By a conjecture we mean a proposition that has not been proven, but which is favoured by some compelling evidence. An open question is a problem where the evidence is not especially convincing in either direction. The distinction between conjecture and open question is somewhat subjective and may change over time.1 We trust, though, that there will be no similar ambiguity concerning the theorems! The exercises: In order to really learn the subject the keen student should try to fully answer the exercises. We have marked several with † if they are difficult, and occasionally †† if extremely difficult. Some exercises are embedded in the text and need to be completed to fully understand the text; there are many other exercises at the end of each chapter. Textbooks: There is a bibliography at the end of these notes of worthy books. Davenport’s [Da] is, for us, the greatest introduction to the key ideas of the subject. Davenport keeps the focus narrow and gives a masterful, if terse, explanation of the proof of the theorem and related issues. Similarly Bombieri’s [Bo] is focussed on all there is to know about the large sieve and applications, and Crandall and Pomerance [CP] on computational issues. Titchmarsh [Ti] long ago write the definitive treatise on the Riemann zeta function, and Edwards [Edw] a wonderful historical development. Narkiewicz’s historical treatise [Nar] on the distribution of prime numbers is a beautiful and informative read —- the reader is asked to find more about who did what from that source. Ribenboim’s [Rib] is source of joy to professional and amateur prime seekers alike. There are two recent tracts by the modern masters of the subject, [IK] and [MV], covering a wide cross-section of topics in analytic number theory; these are much broader than what we have attempted here and are recommended to someone who wishes to gain broad expertise in our subject. ———————————————– These notes have been cobbled together out of some of the course notes from two full length courses I have taught in the past, so please excuse any repetitions.

1For example, whether or not there are odd perfect numbers was an open question; however it is now known that if one exists it is > 10300, which entitles us to conjecture that there are none.

Contents

Chapter 1. Proofs that there are infinitely many primes, without analysis 1 1.1. Euclid and beyond 1 1.2. Various other non-analytic proofs 3 1.3. Primes in certain progressions 4 1.4. Prime divisors of polynomials 7 1.5. A diversion: Dynamical systems proofs 9 1.6. Formulas for primes 11 1.7. Special types of primes 13

Chapter 2. Infinitely many primes, with analysis 15 2.1. First Counting Proofs 15 2.2. Euler’s proof and the Riemann zeta-function 16 2.3. Upper bound on the number of primes up to x 19 2.4. An explicit lower bound on the sum of reciprocals of the primes 19 2.5. Another explicit lower bound 20 2.6. Binomial coefficients: First bounds 20 2.7. Bertrand’s postulate 21 2.8. Big Oh and other notation 23 2.9. How many prime factors does a typical have? 25 2.10. How many primes are there up to x? 28 2.11. The Gauss-Cram`erheuristic 30 2.12. Smoothing out the prime counting function 31 2.13. An attempt to prove the 33

Chapter 3. The prime number theorem 35 3.1. Partial Summation 35 3.2. Chebyshev’s elementary estimates 37 3.3. Multiplicative functions and Dirichlet 38 3.4. The average value of the divisor function and Dirichlet’s hyperbola method 40 3.5. The prime number theorem and the M¨obiusfunction: proof of Theorem 3.3.1 41 3.6. Selberg’s formula 43

Chapter 4. Introduction to 49 4.1. The sieve of Eratosthenes 49 4.2. A first combinatorial Eratosthenes 50 4.3. without large prime factors: “Smooth” or “friable” numbers 52 4.4. An upper bound on the number of smooth numbers 55 4.5. Large gaps between primes 56

5 6 CONTENTS

4.6. Exercises 59

Chapter 5. Selberg’s sieve applied to an 61 5.1. Selberg’s sieve 61 5.2. The Fundamental Lemma of Sieve Theory 63 5.3. A stronger Brun-Titchmarsh Theorem 65 5.4. Additional exercises 67

Chapter 6. Primer on analysis 69 6.1. Inequalities 69 6.2. Fourier series 70 6.3. Poisson summation 72 6.4. The order of a function 74 6.5. 75 6.6. Perron’s formula and its variants 77 6.7. 79 6.8. The 80

Chapter 7. Riemann’s plan for proving the prime number theorem 83 7.1. A method to accurately estimate the number of primes 83 7.2. Linking number theory and complex analysis 85 7.3. The functional equation 86 7.4. The zeros of the Riemann zeta-function 86 7.5. Counting primes 87 7.6. Riemann’s revolutionary formula 88

Chapter 8. The fundamental properties of ζ(s) 91 8.1. Representations of ζ(s) 91 8.2. A functional equation 91 8.3. A functional equation for the Riemann zeta function 92 8.4. 8.4. Properties of ξ(s) 92 8.5. A zero-free region for ζ(s) 93 8.6. Approximations to ζ0(s)/ζ(s) 95 8.7. On the number of zeros of ζ(s) 96

Chapter 9. The proof of the Prime Number Theorem 99 9.1. The explicit formula 99 9.2. Proving the prime number theorem 101 9.3. Assuming the 101 9.4. Primes in short intervals, via zeta functions 102

Chapter 10. The prime number theorem for arithmetic progressions 103 10.1. Uniformity in√ arithmetic progressions 104 10.2. Breaking the x-barrier 106 10.3. Improvements to the Brun-Titchmarsh Theorem 107 10.4. To be put later 107

Chapter 11. Gaps between primes, I 109 11.1. The Gauss-Cram´erheuristic predictions about primes in short intervals 109 11.2. The prime k-tuplets conjecture 110 CONTENTS 7

11.3. A quantitative prime k-tuplets conjecture 111 11.4. Densely packed primes in short intervals 113 Chapter 12. Goldston-Pintz-Yıldırım’s argument 115 12.1. The set up 115 12.2. Evaluating the sums over n 116 12.3. Evaluating the sums using Perron’s formula 117 12.4. Evaluating the sums using Fourier analysis 119 12.5. Evaluating the sums using Selberg’s combinatorial approach, I 120 12.6. Sums of multiplicative functions 121 12.7. Selberg’s combinatorial approach, II 121 12.8. Finding a positive difference; the proof of Theorem 10.4.1 122

Chapter 13. Goldston-Pintz-Yıldırım in higher dimensional analysis 125 13.1. The set up 125 13.2. The 126 13.3. Sums of multiplicative functions 126 13.4. The combinatorics, II 127 13.5. Finding a positive difference 128 13.6. A special case 128 13.7. Maynard’s F s, and gaps between primes 128 13.8. F as a product of one dimensional functions 129 13.9. The optimal choice 130 13.10. Tao’s upper bound on ρ(F ) 132 CHAPTER 1

Proofs that there are infinitely many primes, without analysis

Introduction The fundamental theorem of arithmetic states that every positive integer may be factored into a product of primes in a unique way.1 Moreover any finite product of prime numbers equals some positive integer. Therefore there is a 1-to-1 corre- spondence between positive integers and finite products of primes. This means that we can understand positive integers by decomposing them into their prime factors and studying these, just as we can understand molecules by studying the atoms of which they are composed. Once one begins to determine which integers are primes and which are not, one observes that there are many of them, though as we go further out, they seem to become sparser, a decreasing proportion of the positive integers. It is also tempting to look for patterns amongst the primes: Can we find a formula that describes all of the primes? Or at least some of them? Can we prove that there are infinitely many? And, if so, can we quickly determine how many there are up to a given point? Or at least give a good estimate? How are they distributed? Regularly, or unevenly? These questions motivate this mini-course. This first, preliminary chapter investigates what is known about the infinitude of prime numbers using only imaginative elementary arguments. This is not nec- essarily part of this mini-course but the student might enjoy playing with some of the concepts here.

1.1. Euclid and beyond Ancient Greek mathematicians knew that there are infinitely many primes. Their beautiful proof by contradiction goes as follows: Suppose that there are only finitely many primes, say k of them, which we will denote by

p1 = 2 < p2 = 3 < . . . < pk.

What are the prime factors of p1p2 . . . pk + 1? Since this number is > 1 it must have a prime factor (by the Fundamental Theorem of Arithmetic), and this must be some pj, for some j, 1 ≤ j ≤ k (since, by assumption, p1, p2, . . . , pk is a list of all of the primes). But then pj divides both p1p2 . . . pk and p1p2 . . . pk + 1, and 2 hence pj divides their difference, 1, which is impossible.

1A positive integer n is prime if it is divisible only by 1 and itself. Note that −1 is not considered to be prime. An integer n is composite if it is divisible by two (not necessarily distinct) primes; note that this includes negative integers as well as 0. Neither word applies to 1 or −1, or to −p for any prime p. 2Euclid gives this proof in Book 9 Proposition 20 of his Elements, assuming that there are just three primes. The reader is evidently meant to infer that the same proof works no matter how

1 2 1. PROOFS THAT THERE ARE INFINITELY MANY PRIMES, WITHOUT ANALYSIS

This proof has the disadvantage that it does not exhibit infinitely many primes, but only shows that it is impossible that there are finitely many. It is possible to more-or-less correct this deficiency by defining the sequence

a1 = 2, a2 = 3 and then an = a1a2 . . . an−1 + 1 for each n ≥ 2.

Let pn be some prime divisor of an, for each n ≥ 1. We claim that the pn are all distinct so we have an infinite sequence of distinct primes. We know that these primes are distinct since if m < n then

(pm, pn) divides (am, an) = (am, 1) = 1, since an ≡ 1 (mod am) by our construction, which implies that (pm, pn) = 1; that is, pm and pn must be different. 2n Fermat conjectured that the integers bn = 2 + 1 are primes for all n ≥ 0. His claim starts off correct: 3, 5, 17, 257, 65537 are all prime, but is false for b5 = 641 × 6700417, as Euler famously noted. Nonetheless we can prove that if pn is some prime divisor of bn for each n ≥ 0 then p0, p1,... is an infinite sequence of distinct primes, in this case because bn = b1b2 . . . bn−1 + 2 for each n ≥ 1, and so 3 (bm, bn) = (bm, 2) = 1 for all m < n, since bn ≡ 2 (mod bm). A related proof involves letting qn be the smallest prime factor of n! + 1. We see that there cannot be a largest prime n since qn is always larger.

Exercises

Exercise 1.1.1. (Other proofs inspired by Euclid’s proof) Suppose that we are given distinct primes p1, p2, . . . , pk, and let m denote their product, p1p2 . . . pk. Show that each of the following integers N has a prime factor which is not equal to any prime in this list, and so deduce that there are infinitely many primes. a) For any r, 1 ≤ r ≤ k, let n = p1p2 . . . pr and N := n + m/n. Pk b) (Reminiscent of proofs of the Chinese Remainder theorem): Let N := i=1 m/pi. Exercise 1.1.2. In this question we give an algorithm that determines all of the primes. Let p1 < p2 < . . . < pk be the distinct primes up to pk, and let m = p1p2 . . . pk. Pk a) Prove that if N = i=1 Ni, where the set of prime factors of Ni is precisely {p1, p2, . . . , pk}\{pi} for each i, then (N, m) = 1. b) Use the Chinese Remainder theorem to show that, for any given integer N with Pk (N, m) = 1, there exist integers Mi ∈ [1, m] for which N ≡ i=1 Mi (mod m), where the prime factors of Mi are exactly {p1, . . . , pk}\{pi}. Pk c) Prove that if 1 ≤ N ≤ i=1 m/pi with (N, m) = 1 then there exist integers Ni as in part a, with each |Ni| ≤ m. (Hint: Select Ni = Mi or Mi − m in part b). large a finite number of primes we assume there to be. The notation of those times was far less flexible than that of today, so that the astute reader necessarily had to deduce the full content of the statement of a theorem, or of a proof, from what was written, and could not necessarily learn all that was meant from what was actually written. Even Renaissance thinkers like Fermat and Descartes faced these notational limitations and deplored those who could not navigate these difficulties adroitly. 3This proof appeared in a letter from Goldbach to Euler in July 1730. 1.2. VARIOUS OTHER NON-ANALYTIC PROOFS 3 d) The discussion√ in part c) gives us a way to√ determine, with proof, each prime N between x and x, given the primes up to x.4 Find all the primes between 10 and 100 in this way, finding a representation of N as in part c). Exercise 1.1.3. Use the sequence of Fermat numbers to prove that, for each integer k ≥ 1, there are infinitely many primes ≡ 1 (mod 2k).

Exercise 1.1.4. Suppose that p1 = 2 < p2 = 3 < . . . is the sequence of prime numbers. Use the fact that every Fermat number has a distinct prime divisor to 2n prove that pn ≤ 2 + 1. What can one deduce about the number of primes up to x? Exercise 1.1.5. (Open questions). Are there infinitely many primes of the form an? If p1 = 2 < p2 = 3 < . . . is the sequence of prime numbers then are there infinitely many n for which p1p2 . . . pn + 1 is prime? For which p1p2 . . . pn − 1 is prime? Let us determine an infinite sequence of primes by starting with prime q1, and then letting qn be some prime divisor of q1q2 . . . qn−1 + 1 for each n ≥ 2. Can the sequence q1, q2, ... be a re-arrangement of the set of all primes? What if qn is the smallest prime divisor of q1q2 . . . qn−1 + 1? 1.2. Various other non-analytic proofs n The Mersenne numbers take the form Mn = 2 − 1. Suppose that p is prime and q is a prime dividing 2p − 1. The order of 2 mod q, must be divisible by p, and must divide q − 1, hence p ≤ q − 1. Thus there cannot be a largest prime p, since any prime factor q of Mp is larger, and so there are infinitely many primes. Furstenberg gave an extraordinary proof using point set : Define a topology on the set of integers Z in which a set S is open if it is empty or if for every a ∈ S there is an arithmetic progression Za,q := {a + nq : n ∈ Z} which is a subset of S, for some integer q 6= 0. Evidently each Za,q is open, and it is also closed since Za,q = Z \ ∪b: 0≤b≤q−1, b6=aZb,q. If there are only finitely many primes p then A = ∪pZ0,p is also closed, and so Z\A = {−1, 1} is open, but this is obviously false since A does not contain any arithmetic progression Z1,q. Hence there are infinitely many primes.

Exercises

Exercise 1.2.1. a) Prove that if f(t) ∈ Z[t] and r, s ∈ Z then r − s divides f(r) − f(s). b) Prove that if 2n − 1 is prime then n is prime. c) Prove that if 2n + 1 is prime then n is either 0 or a power of 2. Exercise 1.2.2. Prove that if prime q divides 2p − 1, where p is prime, then 2 has order p mod q. Deduce that (2p − 1, 2` − 1) = 1 for all primes p and `. Exercise 1.2.3. (Open questions). Prove that there are infinitely many Mersenne primes, 2p − 1. (This is equivalent to asking whether there are infinitely many even perfect numbers, since n is an even perfect number if and only if it is of the form √ 4The sieve of Eratosthenes also allows us to find the primes in ( x, x], given the primes up √ to x, but subsequently, the only way to prove to a skeptic that what you claim are the primes, are actually prime, would be for that person to redo the calculation. On the other hand the proof here provides easily verified proofs of primality. 4 1. PROOFS THAT THERE ARE INFINITELY MANY PRIMES, WITHOUT ANALYSIS

2p−1(2p − 1) with 2p − 1 prime.5) Prove that there are infinitely many Fermat n n primes, 22 + 1. Prove that there are integers n for which 22 + 1 is composite.6 Exercise 1.2.4. (Open). Prove that there are infinitely primes p for which 2p − 1 is composite. (This is a conjecture, rather than an open question, because one can prove, and you should prove, that if p ≡ 3 (mod 4) and q = 2p + 1 is also prime then q divides 2p − 1, so that 2p − 1 is composite.) Exercise 1.2.5. We know that 2 2 22−1 22 −1−1 22 − 1, 22 −1 − 1, 22 −1 − 1 and 22 −1 − 1 are all prime. Is it true that the next term in the sequence is prime? How about all the terms of the sequence?

1.3. Primes in certain arithmetic progressions Any prime ≡ a (mod m) is divisible by (a, m), and so if (a, m) > 1 there cannot be more than one prime ≡ a (mod m). Thus all but finitely many primes are distributed among the φ(m) arithmetic progressions a (mod m) with (a, m) = 1. We will eventually prove that all such arithmetic progressions contain infinitely many primes, and that the primes are roughly equally distributed amongst these φ(m) arithmetic progressions (mod m). For now we will prove results along these lines using only elementary methods. We begin by proving that there are infinitely many primes in each of the two feasible residue classes mod 3. There are infinitely many primes ≡ −1 (mod 3), for if there are only finitely many, say p1, p2, . . . , pk, then N = 3p1p2 . . . pk −1 must have a prime factor q ≡ −1 (mod 3), else N ≡ 1 (mod 3), and so q divides both N and N + 1 and hence their difference 1, which is impossible. A similar proof works for primes ≡ −1 (mod 4), and indeed for much more general sets of primes – see exercise 1.3.1. How about the primes ≡ 1 (mod 3)? Suppose that prime q divides a2 + a + 1 for some integer a. Then a3 ≡ 1 (mod q) (since a2 + a + 1 divides a3 − 1) and so a has order 1 or 3 mod q. In the first case a ≡ 1 (mod q) in which case 0 ≡ a2 + a + 1 ≡ 3 (mod q) so that q = 3. Otherwise 3 divides q − 1; that is, q ≡ 1 (mod 3). Therefore if there are only finitely many primes ≡ 1 (mod 3), say 2 p1, p2, . . . , pk, let a = 3p1p2 . . . pk so that any prime divisor q of N = a + a + 1 is different from 3 and the pi’s, and ≡ 1 (mod 3), contradicting our assumption. To generalize this argument to primes ≡ 1 (mod m), we replace the polynomial a2 + a + 1 by a polynomial that recognizes when a has order m. Evidently this must be a divisor of the polynomial am − 1, indeed am − 1 divided through by all of the factors corresponding to orders which are proper divisors of m. So we define the cyclotomic polynomials φn(t) ∈ Z[t], inductively, by the requirement m Q m t − 1 = d|m φd(t) for all m ≥ 1, with each φd(t) monic. The roots of t − 1 are the distinct mth roots of unity, so our definition implies that the roots of φm(t) are exactly the primitive mth roots of unity, that is those α ∈ C for which αm = 1

5It is known that 2p − 1 is prime for p = 2, 3, 5, 7, 13, 17, 19,..., 57885161, a total of 48 values as of late 2015. There is a long history of the search for Mersenne primes, from the first serious computers to the first great distributed computing project, GIMPS (The Great Internet Mersenne Prime Search). n n 6There are no primes known of the form 22 + 1 other than for n ≤ 4, and we know 22 + 1 is composite for 5 ≤ n ≤ 30 and many other n besides. It is always a significant moment when a Fermat number is factored for the first time. 1.3. PRIMES IN CERTAIN ARITHMETIC PROGRESSIONS 5 but αr 6= 1 for all r, 1 ≤ r ≤ m − 1. These can be written more explicitly as exp(2iπj/m) with (j, m) = 1 so that φm(t) has degree φ(m). A proof of the infinitude of primes ≡ 1 (mod m), along these lines, is given by exercise 1.3.3.

Exercises

Exercise 1.3.1. Let G be a proper subgroup of the multiplicative group of elements mod m (that is, the residue classes coprime to m). a) Show that if N is an integer with (N, m) = 1 where N (mod m) is not an element of G, then N has a prime factor which is not an element of G. b) Given any finite set of primes p1, p2, . . . , pk which do not divide m, and a residue class b (mod m), show that there exists an integer N such that N ≡ b (mod m) and (N, p1p2 . . . pk) = 1. c) We will now prove that there are infinitely many primes whose residues mod m do not belong to the subgroup G. For if there are only finitely many, say p1, p2, . . . , pk, select b appropriately and then N as in part b), and finally use part a) to deduce the desired conclusion. d) Deduce that there are infinitely many primes in at least two of the arithmetic progressions 3 mod 8, 5 mod 8, and 7 mod 8. Exercise 1.3.2. a) Prove that the set of primitive mth roots of unity can be written in the form exp(2iπj/m) with (j, m) = 1. Q d µ(n/d) b) Prove that φn(x) = d|n(x − 1) , where ( (−1)k if n = p p . . . p is squarefree; µ(n) := 1 2 k 0 if n is divisible by p2 for some prime p. (Hint: Use the Mobius inversion formula). Exercise 1.3.3. The prime p is a primitive prime factor of am − 1 if p divides am − 1 but p does not divide an − 1 for any n, 1 ≤ n ≤ m − 1. m a) Show that every primitive prime factor of a − 1 divides φm(a). b) Prove that gcd(am − 1, an − 1) = a(m,n) − 1. Deduce that if prime p divides m φm(a) but is not a primitive prime factor of a − 1 then p divides φd(a) for some proper divisor, d, of m. c) Prove that if d divides m and prime p divides gcd(φm(a), φd(a)) then m/d is a power of p. (Hint: Suppose that prime q divides m/d. Show that φm(a) divides m m/q m/q m m/q (a −1)/(a −1), and φd(a) divides a −1. By considering (a −1)/(a −1) m/q (mod a − 1), establish that gcd(φm(a), φd(a)) divides q.) d) Prove that φm(0) = 1 for all m > 1. Deduce that if p is a prime factor of φm(ma) for any integer a then p ≡ 1 (mod m). e) Use part d) to deduce that there are infinitely primes ≡ 1 (mod m).

In the next exercise we more precisely understand the primes that divide φn(a) as n varies, for fixed a with |a| > 1. Exercise 1.3.4. a) Show that if d is the smallest integer for which p divides φd(a) then d divides p − 1. 6 1. PROOFS THAT THERE ARE INFINITELY MANY PRIMES, WITHOUT ANALYSIS b) Use part a) and 1.3.3c) to show that if p divides φm(a) but is not a primitive prime factor of am − 1 then m = prd for some r ≥ 1 and some d < p. Deduce that there can be at most one such prime p. m m/p 2 c) Prove that if odd prime p divides φm/p(a) then (a −1)/(a −1) ≡ p (mod p ). Deduce, using part b), that if φm(a) has no primitive prime factor then either k |φm(a)| = p, a prime dividing m, or m = p = 2 and a = ±2 − 1, for some k ≥ 1. d) Our goal now is to show that if m ≥ 3 and |a| ≥ 2 then |φm(a)| > p where p is the largest prime factor of m, other than the case φ3(−2) = φ6(2) = 3. Use 1.3.2b) to show that if a ≥ 2 then Y  1  |φ (a)|/|a|φ(m) ≥ 1 − . m 2d d≥1 A calculation reveals that this equals 0.288788 ...; in particular it is > 1/4, and φ(m)−2 φ(m)−2 hence |φm(a)| > 2 . Next prove that 2 ≥ m for all m > 18; and then that 2φ(m)−2 ≥ p, except when m = 1, 2, 3, 4, 6, 10. For m = 3, 4, 6, 10, show that |φm(a)| > p for |a| ≥ 4. Verify the remaining cases to justify our claim. e) Deduce that φm(a) has a primitive prime factor whenever |a| > 1, except the k cases φ3(−2) = φ6(2) = 3, φ2(±2 − 1) and φ1(a) with a 6= 2. f) Deduce that there is a prime ≡ 1 (mod m) which is ≤ 2m − 1. (Note that this cannot be improved when m = 3.) In the next exercise we see further applications and developments of the theory that we have developed in exercise 1.3.4.

Exercise 1.3.5. a) Writing ad = 1 + kpr where p - k, prove that if pr 6= 2 then pd r+1 r+2 a − 1 is divisible by p but not by p . Deduce that if p divides φm(a) but is m 2 not a primitive prime factor of a − 1 then p does not divide φm(a), except when m = 2 and a ≡ 3 (mod 4). b) Deduce that for each integer a with |a| ≥ 2, the integers (am − 1)/(a − 1) have a primitive (and thus distinct) prime factor for all integers m 6= 1, 2, 3 or 6.

{xn}n≥0 is a Lucas sequence if x0 = 0, x1 = 1 and

(1.3.1) xn+2 = bxn+1 + cxn for all n ≥ 0, n for given non-zero, coprime integers b, c; the integers xn = (a − 1)/(a − 1) form a Lucas sequence with b = a + 1 and c = −a for each n ≥ 0. In 1913 Carmichael 2 showed that if the discriminant ∆ := b + 4c > 0 then xn has a primitive prime factor for each n 6= 1, 2 or 6 except for F12 = 144 where Fn is the Fibonacci sequence 0 0 n−1 (b = c = 1), and for F12 where Fn = (−1) Fn (b = −1, c = 1). It is much more difficult to prove that Lucas sequences with negative discriminant have primitive prime factors. Nonetheless, in 1974 Schinzel succeeded in showing that xn has a primitive prime factor once n > n0, for some sufficiently large n0, if ∆ 6= 0, other than in the periodic case b = ±1, c = −1. Determining the smallest possible value of n0 has required great efforts culminating in the beautiful work of Bilu, Hanrot and Voutier [BHV] who proved that n0 = 30 is best possible (indeed if b = 1, c = −2 then x5, x8, x12, x13, x18, x30 have no primitive prime factors). Exercise 1.3.6. In exercise 1.3.3e we saw that there are infinitely many primes ≡ 1 (mod 8). Now we prove that there are infinitely many primes in the other arithmetic progressions (mod 8). 1.4. PRIME DIVISORS OF POLYNOMIALS 7 a) For b = 3, 5 or −1 suppose that there are only finitely many primes ≡ b (mod 8), and let nb be their product. Establish a contradiction by considering the prime 2 factors of nb + b − 1. (Hint: Use the law of quadratic reciprocity.) b) Generalize this argument to prime divisors of values of other quadratic polyno- mials.

1.4. Prime divisors of polynomials

In 1837 Dirichlet proved that any linear polynomial mt+a ∈ Z[t] with (m, a) = 1 takes on infinitely many prime values (which is equivalent to the fact that there are infinitely many primes ≡ a (mod m) if (a, m) = 1). We wish to generalize this to any irreducible f(t) ∈ Z[t], with suitable restrictions. To formulate the restrictions there is one subtlety: the reason we need (m, a) = 1 in the linear case is that (m, a) divides mn+a for every integer n, and in fact (m, a) = gcd{mn+a : n ∈ Z}. Hence let us define Content(f) to be the gcd of the integers f(n) as we vary over n ∈ Z. It is conjectured that for any irreducible f(t) ∈ Z[t] with Content(f) = 1 there are infinitely many integers n for which f(n) is prime. (In fact that for any irreducible f(t) ∈ Z[t] there are infinitely many integers n for which f(n)/Content(f) is prime.) The polynomial n2 + n + 41 is prime for n = 0, 1,..., 39, though composite for n = 40. One can ask whether there are any polynomials f(t) ∈ Z[t] such that f(n) is prime for all integers n? The answer is no: We now show that if f(t) ∈ Z[t] has degree ≥ 1 then f(n) is composite for infinitely many integers n. Since a non-zero polynomial has only finitely many roots, for example (f(t) − 1)f(t)(f(t) + 1), thus |f(n)| ≥ 2 for all but finitely many integers n. Select any such n and let m = f(n). Now f(n + km) ≡ f(n) ≡ 0 (mod m) for any integer k, by exercise 1.2.1a, so that f(n + km) is composite all k except for the finitely many n + km which are roots of (f(t) − m)f(t)(f(t) + m). One can use a minor variation to show that if f(t) ∈ Z[t] has degree d and there are more than 2d distinct integers n for which |f(n)| is prime then f(t) is irreducible. To prove this suppose f(t) = g(t)h(t); for each n where |f(n)| is prime we have that either g(n) = ±1 or h(n) = ±1. Now there are no more than 2 deg(g) roots of (g(t) − 1)(g(t) + 1), and no more than 2 deg(h) roots of (h(t) − 1)(h(t) + 1), and therefore ≤ 2 deg(g) + 2 deg(h) = 2d distinct integers n for which |f(n)| prime. (The “2d” in this result can be improved to “d + 2”, and this is probably best possible for all d ≥ 6. If we ask for f(n) to be prime then the correct bound is d.) We finish this section by proving that for any f(t) ∈ Z[t] of degree ≥ 1 there are infinitely many distinct primes p for which p divides f(n) for some integer n. We may assume that f(n) 6= 0 for all n ∈ Z else we are done. Now suppose that p1, . . . , pk are the only primes which divide values of f and let m = p1 . . . pk. Then f(nmf(0)) ≡ f(0) (mod mf(0)) for every integer n, by exercise 1.2.1a, so that f(nmf(0))/f(0) ≡ 1 (mod m). Therefore f(nmf(0)) has prime divisors other than those dividing m for all n other than the finitely many n which are roots of (f(tmf(0)) − f(0))(f(tmf(0)) + f(0)), a contradiction.

Exercises on prime values of reducible polynomials

Exercise 1.4.1. a) Prove that the gcd of the coefficients of f divides Content(f). b) Give an example to show that Content(f) can be larger than the gcd of the coefficients of f. 8 1. PROOFS THAT THERE ARE INFINITELY MANY PRIMES, WITHOUT ANALYSIS c) If a polynomial f(t) of degree d only takes on integer values then show that there Pd t exist integers b0, b1, . . . , bd for which f(t) = j=0 bj j . d) Prove that Content(f) = gcd0≤j≤d bj = gcd0≤n≤d f(n). Exercise 1.4.2. Use the proof above to show that if f has degree d then there exists an integer n, with |n| ≤ d(1 + max|m|≤d |f(m)|), for which |f(n)| is composite. Can you significantly improve this?

Exercise 1.4.3. Suppose that m1, m2, . . . , mk are a set of pairwise coprime integers such that mj divides f(aj) for some integer aj, for each j. Prove that there are infinitely many integers n for which m1m2 . . . mk divides f(n).

Exercise 1.4.4. a) Suppose that g(t) ∈ Z[t], where m1, . . . , mk are integers that satisfy g(mi) = 1, and n1, . . . , n` are integers that satisfy g(nj) = −1. Prove Qk Q` that i=1(mi − nj) divides 2 for each j, and j=1(mi − nj) divides 2 for each i. b) Deduce that if k + ` ≥ 4 with k, ` ≥ 1 then k + ` = 4 and the four integers are consecutive. In fact g(x) = ±G(±x + a) for some choice of sign ± and some integer a where either G(x) ≡ x(x−1)−1 or x(x+1)(x−2)+1 mod (x−2)(x−1)x(x+1). In particular show that if k + ` > deg(g) then g(x) = ±G(±x + a) where G(x) is either x(x + 1)(x − 2) + 1 with k = 3, ` = 1, or x(x − 1) − 1 with k = ` = 2, or 2x2 − 1 with k = 2, ` = 1, or 2x − 1 with k = ` = 1, or x − 1 with k = ` = 1. c) Suppose that f(t) ∈ Z[t] is reducible of degree d, for which there are more than d integers n with |f(n)| is prime. Deduce that f(t) has a proper factor g(t), as in part b). d) Suppose that f(t) ∈ Z[t] is reducible of degree d, for which there are ≥ d + 4 integers n for which |f(n)| is prime. Deduce that there exist integers a and b such that f(t) = g(t + b) where g(t) = ((t − a)(t − a − 1) − 1)(t(t − 1) − 1) so that the prime values are g(2) = g(a−1) = a2 −3a+1, −g(1) = −g(a) = a2 −a−1, −g(0) = −g(a + 1) = a2 + a − 1 and g(−1) = g(a + 2) = a2 + 3a + 1. (These are prime for a = 4, 5, 10, 55 and 550 and, we conjecture, for infinitely many values of a. This is the same as asking that t2 − t − 1 is prime for t = a − 1, a, a + 1 and a + 2.) e) Show that if d > 4 then for any reducible polynomial f(t) ∈ Z[t] of degree d, the values of |f(n)| can be prime no more than d + 2 times. 2 f) We conjecture that ri − ri − 1 is prime for some infinite sequence of integers r1, r2,... . For any given integer d ≥ 2, let d−2 2 Y f(t) = (t + t − 1)(1 + a (t + ri)), i=1 a degree d, reducible polynomial, with f(−ri) prime for 1 ≤ i ≤ d − 2. We would like to find four more prime values, for values of n where |n2 + n − 1| = 1. The prime k-tuplets conjecture states that if a1t + b1, . . . , akt + bk ∈ Z[t] and Qk Content( j=1(ajt + bj)) = 1 then there are infinitely many integers n for which a1n + b1, . . . , akn + bk are all prime. Use the prime k-tuplets conjecture to deduce that there exist an integer a such that |f(t)| takes on prime values at d + 2 different integer values for t. If one asks for prime values of f(n)/Content(f) then the answer is surprisingly different: The number of prime values that can be taken by f(n)/Content(f) where 1.5. A DIVERSION: DYNAMICAL SYSTEMS PROOFS 9 f(t) ∈ Z[t] is a reducible polynomial of degree d is something like 2cd/ log d, for some constant c > 0.7 See [ChRu] for details. The final exercise in this section leads to the wonderful theorem that a poly- nomial f(t) with non-negative integer coefficients, can be proved to be irreducible simply by finding one integer n for which f(n) is prime.

Pd i Exercise 1.4.5. a) For a given polynomial f(t) = i=0 ait ∈ Z[t] define H := max0≤i≤d−1 |ai/ad|. Show that if f(α) = 0 then |α| < H + 1. b) Show that if f is reducible and n is an integer for which |f(n)| is prime then there exists a root α of f(α) = 0 for which |n − α| ≤ 1. c) Deduce that if n is an integer, with |n| ≥ H + 2, for which |f(n)| is prime then f(t) is irreducible. d) Show that these results cannot be much improved by considering the example f(t) = t(t + 1 − p) for p prime. e) Show the following result.

d If p = a0 +a110+...+ad10 is a prime written in base 10, then the polynomial d a0 + a1t + ... + adt is irreducible.

d P j (Hint: For α as in part b), prove that |f(α)/α | ≥ ad −9 j≥2 1/|α| , as Re(1/α) > 0.) Generalize this argument to any base b ≥ 3 in place of 10.

1.5. A diversion: Dynamical systems proofs The prime divisors of a sequence of integers > 1 form an infinite sequence of primes if the integers in the sequence are pairwise coprime. To generalize the constructions from section 1.1, we simplify the description of the sequences an and bn.

an+1 − 1 = a1a2 . . . an = (a1a2 . . . an−1)an = (an − 1)an, 2 so that an+1 = f(an) where f(t) := t − t + 1. (Similarly bn+1 = g(bn) where 2 g(t) := t − 2t + 2.) How do we explain the fact that an ≡ 1 (mod am) for all m < n? Well

am+1 = f(am) ≡ f(0) = 1 (mod am) and, thereafter,

an+1 = f(an) ≡ f(1) = 1 (mod am) by induction on n ≥ m + 1. (Similarly bm+1 = g(bm) ≡ g(0) = 2 (mod bm) and bn+1 = g(bn) ≡ g(2) = 2 (mod bm) by induction.) So the only requirements on f seem to be that f(0) = f(1) = 1, and f = 1 + t(t − 1) is the simplest such polynomial, though any polynomial 1 + h(t)t(t − 1), where h(t) ∈ Z[t] has positive leading coefficient, will work. In this section we will determine all polynomials f(t) ∈ Z[t] that can be used into this framework. We introduce a little terminology from dynamical systems: A number a is said to be preperiodic for f, if the sequence a, f(a), f(f(a)),... is eventually periodic.

7It is standard practice in this subject to let log be in base e, whereas the logarithm in base b is denoted logb. 10 1. PROOFS THAT THERE ARE INFINITELY MANY PRIMES, WITHOUT ANALYSIS

Proposition 1.5.1. Let f(t) ∈ Z[t] have degree > 1, positive leading coefficient, and f(0) 6= 0. Suppose that 0 is a preperiodic point for f but that 0 is not part of the period, and let ` be the least common multiple of the integers in the sequence f(0), f(f(0)), f(f(f(0))),... . If a0 ∈ Z with an+1 = f(an) for all n ≥ 0, and (an, `) = 1 for all n ≥ 0, then we obtain an infinite sequence of distinct primes by selecting one prime factor from each an.

Proof. Let w0 = 0 and wn+1 = f(wn) for all n ≥ 0, so that am+1 = f(am) ≡ f(0) = w1 (mod am) and, thereafter, am+j+1 = f1(am+j) ≡ f(wj) = wj+1 (mod am) by induction on j ≥ 1. Therefore if m < n then (am, an) = (am, wn−m) which divides (am, `), which equals 1 by the hypothesis. The rest of the proof follows as above.  To be able to use Proposition 1.5.1 we need a good idea of when 0 is a prepe- riodic point, which turns out to be simpler than one might guess:

Proposition 1.5.2. Suppose that f(t) ∈ Z[t] and that the sequence {un : n ≥ 0}, with u0 ∈ Z and un+1 = f(un) for all n ≥ 0, is periodic. The shortest period has length 1 or 2.

Proof. Suppose that un+p = un for all n ≥ 0. One knows that x − y divides f(x) − f(y) for any integers x, y; in particular that un+1 − un divides f(un+1) − f(un) = un+2 − un+1. Therefore u1 − u0 divides u2 − u1, which divides u3 − u2, . . . , which divides up −up−1, which divides up+1 −up = u1 −u0. That is, we have a sequence of integers that all divide one another and so must all be equal in absolute value. If they are all 0 then p = 1. If not then they cannot all be equal, say to d 6= 0, else 0 = (u1 − u0) + (u2 − u1) + (u3 − u2) + ··· + (up − up−1) = pd. Therefore there must be two consecutive terms that have opposite signs, yet have the same absolute value, so that un+2 − un+1 = −(un+1 − un) and therefore un+2 = un. But then applying f, p − n times to both sides, we deduce that u2 = u0 and therefore p = 2.  This allows us to classify all such polynomials f:

Theorem 1.5.3. Suppose that f(t) ∈ Z[t] and that 0 is a preperiodic point for f but is not in the period. The basic possibilities are: a) The period has length 1, and either f(t) = u with 0 → u → u → ... , or, for u = 1 or 2, f(t) = (2/u)t2 − u with 0 → −u → u → u → ... ; b) The period has length 2, and either f(t) = 1 + ut − t2 with 0 → 1 → u → 1 → ... , or f(t) = 1 + t + t2 − t3 with 0 → 1 → 2 → −1 → 2 → .... Other examples arise by replacing f(t) by −f(−t), or by adding a polynomial mul- Qk tiple of i=1(t − ai) where the ai are the distinct integers in the orbit of 0. Proof by Exercises

Exercise 1.5.1. Let f(t) ∈ Z[t], and assume that f has a period of length 1, say f(u) = u. Prove that a) f must be of the form f(t) = u + (t − u)g(t) for some g(t) ∈ Z[t]. b) If f(v) = u with v 6= u then f(t) = u + (t − u)(t − v)g(t) for some g(t) ∈ Z[t]. 1.6. FORMULAS FOR PRIMES 11 c) If f(w) = v then v −w = w −u = ±1 or ±2, = δ say and g(t) = 2/δ +(t−w)h(t) for some h(t) ∈ Z[t]. d) If f(x) = w then (x − u)(x − v) divides (w − u), which is impossible.

Exercise 1.5.2. Assume f(t) ∈ Z[t], and f has a period of length 2, say f(u) = v and f(v) = u. Prove that a) f must be of the form f(t) = v + u − t + (t − u)(t − v)g(t) for some g(t) ∈ Z[t]. b) If f(w) = v then w − v = ±1, so that g(t) = w − v + (t − w)h(t) for some h(t) ∈ Z[t]. c) If f(x) = w then x − u = ±1. If x − u = w − v = δ then 2 = (x − v)(w − v + (x − w)h(x)); this implies that x − v = δ, 2δ, −δ or −2δ each of which can be ruled out. If x − u = −(w − v) then u, x, w, v are consecutive integers (in this order), and h(t) = −1 + (t − x)j(t) for some j(t) ∈ Z[t]. d) Show that if f(y) = x then y −u divides |x−v| = 2, and y −v divides |x−u| = 1, which is impossible. Exercise 1.5.3. a) Deduce the cases of the theorem by setting x = 0, then w = 0 and then v = 0. Exercise 1.5.4. The second case in Theorem 1.5.3a) with u = 2, gives f(t) = 2 t − 2, with 0 → −2 → 2 → 2 → .... The sequence starting with x0 = 4 and then let xn+1 = f(xn) for all n ≥ 0 is known as the Lucas-Lehmer sequence, who proved n that the Mersenne number 2 − 1 is prime if and only if it divides xn−2. Prove that gcd(xm, xn) = 2 for all m 6= n and deduce that there is an infinite sequence of distinct odd primes {pn : n ≥ 1} where each pn is a prime divisor of xn. Exercise 1.5.5. Going back to the proof of Proposition 1.5.2, now suppose that f(x) ∈ A[x] where A is the of integers of some number field, and that u0 → u1 → ... → up−1 → u0 where p is prime, and u0 ∈ A. Prove that the quotient (ui+m − ui)/(um − u0) is a unit in A for all i and all m, 1 ≤ m ≤ p − 1.

We have considered iterations of the map n → f(n) where f(t) ∈ Z[t]. If one allows f(t) ∈ Q[t] then it is an open question as to the possible period lengths in the integers. Even the simplest imaginable case, f(x) = x2 + c, with c ∈ Q, is not only open but leads to the magnificent world of dynamical systems (see []). It would be 2 1 interesting to know what primes divide the numerators of xn, when xn = xn−1 + 2 .

1.6. Formulas for primes

Let p1 = 2 < p2 = 3 < . . . be the sequence of primes and define X m2 α = pm/10 = .2003000050000007000000011 .... m≥1

m2 2m−1 (m−1)2 Then pm = [10 α] − 10 [10 α]. Is such a magical number α truly inter- esting? If one could easily describe α (other than by the definition that we gave) then it might provide an easy way to determine the primes. But with its artificial definition it does not seem like it can be used in any practical way. At first one might suppose that, useful or not, α is quite unique; however the following exercise shows that this is not so. 12 1. PROOFS THAT THERE ARE INFINITELY MANY PRIMES, WITHOUT ANALYSIS

Exercise 1.6.1. Show that there are uncountably many numbers α ∈ (0, 1) such that one can read the prime numbers out from the decimal digits, in-between the 0’s.

——————————- An even less practical formula for primes is derived as follows: Wilson’s theorem tells us that n is a prime if and only if n divides (n − 1)! + 1. We deduce that (   (n − 1)! + 1 1 if n is prime; cos 2π = n 0 if n is composite.

Summing this over all integers up to x we have an exact formula for the number of primes up to x. This formula is surely useless, because we have no idea how to sum up such terms. ——————————- In 1971 Matijaseviˇcindicated how to construct a polynomial f in many vari- ables, such that the positive values taken by f when each variable is an integer, is precisely the set of primes.8 One can find many different such polynomials; we will give one below with 104 variables of degree 25, and it is known that one can cut the degree to as low as 5 though for an astronomical cost in terms of the number of variables. No one knows the minimum possible degree, or the minimum possible number of variables. Our polynomials below is in 26 variables, only allowed to take positive integer values, since we can write each variable as a sum of four squares plus 1.9 Our polynomial, which is reducible, is k + 2 times

1 − (n + l + v − y)2 − (2n + p + q + z − e)2 − (wz + h + j − q)2 − (ai + k + 1 − l − i)2 − ((gk + 2g + k + 1)(h + j) + h − z)2 − (z + pl(a − p) + t(2ap − p2 − 1) − pm)2 − (p + l(a − n − 1) + b(2an + 2a − n2 − 2n − 2) − m)2 − ((a2 − 1)l2 + 1 − m2)2 − (q + y(a − p − 1) + s(2ap + 2a − p2 − 2p − 2) − x)2 − ((a2 − 1)y2 + 1 − x2)2 − (16(k + 1)3(k + 2)(n + 1)2 + 1 − f 2)2 − (e3(e + 2)(a + 1)2 + 1 − o2)2 − (16r2y4(a2 − 1) + 1 − u2)2 − (((a + u2(u2 − a))2 − 1)(n + 4dy)2 + 1 − (x + cu)2)2.

For this to take a positive value, the displayed quantity must equal 1, and k + 2 be a prime. The displayed quantity equals 1 minus a sum of squares, so each of these squares must equal 0. Evidently there are some ideas hidden in the algebraic expressions given for the squares, and the only way to appreciate this polynomial is to understand its derivation – see [JSW]. In the current state of knowledge it seems that this absolutely extraordinary and beautiful polynomial is entirely useless in helping us better understand the distribution of primes!

8One can also construct such polynomials for the set of Fibonacci numbers, for the set of Fermat primes, for the set of Mersenne primes and the set of even perfect numbers, and indeed any diophantine set. 9Lagrange proved that n is a non-negative integer if and only if it can be written as the sum of four squares of integers. 1.7. SPECIAL TYPES OF PRIMES 13

1.7. Special types of primes In section 1.2 we asked whether there are infinitely many primes of the form 2n + 1, or of the form 2n − 1, both of which are open questions. One can generalize this by asking whether there are infinitely many primes of the form k·2n ±1 or of the form k ± 2n for given integer k. At first sight this seems like a much more difficult question but Erd˝osshowed, ingeniously, how these questions can be resolved for certain integers k: 2n Let bn = 2 + 1 be the Fermat numbers (remember that b0, b1, b2, b3, b4 are prime and b5 = 641 × 6700417), and let k be any positive integer such that k ≡ 1 (mod 641b0b1b2b3b4) and k ≡ −1 (mod 6700417). Now n 1 • if n ≡ 1 (mod 2) then k · 2 + 1 ≡ 1 · 2 + 1 = b0 ≡ 0 (mod b0); n 2 • if n ≡ 2 (mod 4) then k · 2 + 1 ≡ 1 · 2 + 1 = b1 ≡ 0 (mod b1); n 22 • if n ≡ 4 (mod 8) then k · 2 + 1 ≡ 1 · 2 + 1 = b2 ≡ 0 (mod b2); n 23 • if n ≡ 8 (mod 16) then k · 2 + 1 ≡ 1 · 2 + 1 = b3 ≡ 0 (mod b3); n 24 • if n ≡ 16 (mod 32) then k · 2 + 1 ≡ 1 · 2 + 1 = b4 ≡ 0 (mod b4); n 25 • if n ≡ 32 (mod 64) then k · 2 + 1 ≡ 1 · 2 + 1 = b5 ≡ 0 (mod 641); and • if n ≡ 0 (mod 64) then k · 2n + 1 ≡ −1 · 20 + 1 = 0 (mod 6700417). Every integer n belongs to one of these arithmetic progressions (these are called a covering system of congruences), and so we have exhibited a prime factor of k · 2n + 1 for every integer n. This proof works for all k in some arithmetic progression mod 264 − 1, and therefore we have shown that for a positive proportion of integers k, there is no prime p such that (p − 1)/k is a power of 2.

Exercise 1.7.1. Select k as above. a) Prove that k · 2n + 1 is composite for every integer n ≥ 0.. b) Prove that 2n + k is composite for every integer n ≥ 0. (That is, there is no prime p for which p − k is a power of 2.) n c) Let ` be any positive integer for which ` ≡ −k (mod b6 −2). Prove that `·2 −1 and |2n −`| is composite for every integer n ≥ 0. Deduce that a positive proportion of odd integers m cannot be written in the form p + 2n with p prime.

Exercise 1.7.2. (Open question) John Selfridge found the smallest k known for which k · 2n + 1 is always composite: At least one of the primes 3, 5, 7, 13, 19, 37, and 73 divides 78557 · 2n + 1. Is this the smallest such k?

Exercise 1.7.3. Let ` be any integer for which ` ≡ 1 (mod 641b0b2b3b4) and 23 4 25 ≡ 2 (mod 6700417), so that k = ` ≡ 1 (mod 641b0b2b3b4) and is ≡ 2 ≡ −1 (mod 6700417). If n ≡ 2 (mod 4) then, writing n = 4m + 2 we have k · 2n + 1 = 4(`2m)4 + 1 and the polynomial 4t4 + 1 = (2t2 + 2t + 1)(2t2 − 2t + 1). Prove that k · 2n + 1 is composite for every integer n ≥ 0. This last exercise shows that we can have k · 2n + 1 composite for all n for reasons other than having a covering system.

Exercise 1.7.4. Prove that bn − 2 = b0b1 . . . bn−1 cannot be written in the k ` form p + 2 + 2 where p is prime and k > ` ≥ 0. (Hint: Consider divisibility by br where 2r is the highest power of 2 dividing k − `.) 14 1. PROOFS THAT THERE ARE INFINITELY MANY PRIMES, WITHOUT ANALYSIS

Exercise 1.7.5. Combine the ideas above to show that there are infinitely many integers m which cannot be written in the form p + 2k + 2` where p is prime and k ≥ ` ≥ 0. CHAPTER 2

Infinitely many primes, with analysis

In this chapter the reader is introduced to some basic counting concepts while proving that there are infinitely many primes, giving upper and lower bounds for the number of primes, or certain functions of the number of primes, and culminating in a discussion of the inter-relation between counting functions.

2.1. First Counting Proofs It is instructive to see several proofs involving simple counting arguments, as these will give a flavor of things to come. Suppose that there are only finitely many primes, say p1 < p2 < . . . < pk. Let m = p1p2 . . . pk. If d is squarefree then d divides m, and so the number of integers up to m that are divisible by d is m/d. Therefore, by the inclusion-exclusion principle, the number of positive integers up to m that are not divisible by any prime is

k k k X m X m Y  1  Y m − + − · · · = m 1 − = (pi − 1). pi pipj pi i=1 1≤i

(2.1.1) e1 log p1 + e2 log p2 + ··· + ek log pk ≤ log x. If one adjoins to each lattice point a unit cube in the positive direction, one obtains a shape S whose volume is exactly the number of lattice points, and which is roughly the same shape as the original tetrahedron itself. The tetrahedron T (x) has volume

k 1 Y log x (2.1.2) , k! log p i=1 i so we might expect this to be a good approximation to the number of such lattice points; actually it is a lower bound on the number of lattice points since T (x) ⊂ S. On the other hand S is contained in the tetrahedron given by adding 1 in each

15 16 2. INFINITELY MANY PRIMES, WITH ANALYSIS direction to the diagonal boundary of the tetrahedron, that is T (xp1 . . . pk). There- fore if p1 < p2 < . . . < pk are the only primes then x = N = |S| ≤ |T (xp1 . . . pk)| for all x; and so if m = p1p2 . . . pk then k 1 Y log(mx) (2.1.3) x ≤ , k! log p i=1 i which is certainly false if x is large enough (see exercise 2.1.1). One can easily obtain good enough upper bounds on the number of solutions to (2.1.1) with less work: In any solution to (2.1.1) we must have ej log pj is no more than the left side of (2.1.1) and thus than log x, so that 0 ≤ ej ≤ (log x)/(log pj), Qk and therefore N ≤ i=1(log xpj)/(log pj).

Exercises

Exercise 2.1.1. Prove that (2.1.3) is false for x sufficiently large. Find a constant C such that (2.1.3) is false for x = (Ck log m)k. It is useful to be able to count the number of integers up to x divisible by a given integer d ≥ 1. These are the set of integers of the form dn where n ≥ 1 and dn ≤ x, in other words these are in 1-to-1 correspondence with the set of integers n in the range 1 ≤ n ≤ x/d. There are [x/d] such integers (where [t], the integer part of t, denotes the largest integer ≤ t), and [x/d] ≤ x/d. What about the number of positive integers up to x which are ≡ a (mod d)? Exercise 2.1.2. Give a precise expression for the number of positive integers up to x which are ≡ a (mod d) in terms of the least positive residue of a (mod d)? Give an approximation, involving a smooth function involving only the variables x and d, that is out from the correct count by at most 1?

2.2. Euler’s proof and the Riemann zeta-function In the eighteenth century Euler gave a different proof that there are infinitely many primes, one which would prove highly influential in what was to come later. Suppose again that the list of primes is p1 < p2 < ··· < pk. Euler observed that the fundamental theorem of arithmetic implies that there is a 1-to-1 corre- a1 a2 ak spondence between the sets {n ≥ 1 : n is a positive integer} and {p1 p2 . . . pk : a1, a2, . . . , ak ≥ 0}. Thus a sum involving the elements of the first set should equal the analogous sum involving the elements of the second set: X 1 X 1 = s a1 a2 ak s n (p1 p2 . . . pk ) n≥1 a1,a2,...,ak≥0 n a positive integer       X 1 X 1 X 1 = ...  a1 s   a2 s   ak s  (p1 ) (p2 ) (pk ) a1≥0 a2≥0 ak≥0 k −1 Y  1  = 1 − , p s j=1 j for any real number s > 0. The last equality holds because each sum in the second- to-last line is over a geometric progression. Euler then noted that if we take s = 1 2.2. EULER’S PROOF AND THE RIEMANN ZETA-FUNCTION 17 then the right side equals some (since each pj > 1) whereas the left side equals ∞, a contradiction (and thus there cannot be finitely many primes). P We prove that n≥1 1/n diverges in exercise 2.2.1 below What is wonderful about Euler’s formula is that something like it holds without assumption, involving the infinity of primes; that is

−1 X 1 Y  1  (2.2.1) = 1 − . ns ps n≥1 p prime n a positive integer One does need to be a little careful about convergence issues. It is safe to write down such a formula when both sides are “absolutely convergent”, which takes place when s > 1. In fact they are absolutely convergent even if s is a so long as Re(s) > 1. We have just seen that (2.2.1) makes sense when s is to the right of the hor- izontal line in the going through the point 1. Like Euler, we want to be able to interpret what happens to (2.2.1) when s = 1. To not fall afoul of convergence issues we need to take the limit of both sides as s → 1+, since (2.2.1) only obviously holds for values of s larger than (2.2.1). To do this it is convenient R ∞ dt 1 to note that the left side of (2.2.1) is well approximated by 1 ts = s−1 , and thus diverges as s → 1+. We deduce that

Y  1 (2.2.2) 1 − = 0 p p prime which, upon taking logarithms, implies that X 1 (2.2.3) = ∞. p p prime So how numerous are the primes? One way to get an idea is to determine the behaviour of the sum analogous to (2.2.3) for other sequences of integers. For P 1 instance n≥1 n2 converges, so the primes are, in this sense, more numerous than the squares. We can do better than this from our observation, just above, that P 1 1 n≥1 ns ≈ s−1 is convergent for any s > 1 (see exercise 2.2.2 below). In fact, P 1 since n≥1 n(log n)2 converges, we see that the primes are in the same sense more numerous than the numbers {n(log n)2 : n ≥ 1}, and hence there are infinitely many integers x for which there are more than x/(log x)2 primes ≤ x. There is another derivation of (2.2.1) that is worth seeing. One begins with P 1 s n≥1 ns , the sum of 1/n over all integers n. Now suppose that we wish to remove the even integers from this sum. Their contribution to this sum is X 1 X 1 1 X 1 = = ns (2m)s 2s ms n≥1 m≥1 m≥1 n even writing even n as 2m, and hence

X 1 X 1 X 1  1  X 1 = − = 1 − . ns ns ns 2s ns n≥1 n≥1 n≥1 n≥1 (n,2)=1 n even 18 2. INFINITELY MANY PRIMES, WITH ANALYSIS

If we wish to remove the multiples of 3 we can proceed similarly, to obtain X 1  1   1  X 1 = 1 − 1 − ; ns 2s 3s ns n≥1 n≥1 (n,2·3)=1 Q and for arbitrary y, letting m = p≤y p, X 1 Y  1  X 1 = 1 − · . ns ps ns n≥1 p≤y n≥1 (n,m)=1 As y → ∞, the left side becomes the sum over all integers n ≥ 1 which do not have any prime factors: the only such integer is n = 1 so the left hand side becomes 1/1s = 1. Hence Y  1  X 1 1 − · = 1 ps ns p prime n≥1 an alternative formulation of (2.2.1). The advantage of this proof is that we see what happens when we “sieve” by various primes, that is remove the integers from our set that are divisible by the given prime.

Exercises

Exercise 2.2.1. The box with corners at (n, 0), (n + 1, 0), (n, 1/n), (n + 1, 1/n) has area 1/n and contains the area under the curve y = 1/x between x = n and R n+1 1 x = n + 1. Therefore 1/n ≥ n t dt. P R N+1 1 a) Deduce that n≤N 1/n ≥ 1 t dt = log(N + 1), and that the sum of the reciprocals of the positive integers diverges. b) Now draw the box of height 1/n and width 1 to the left of the line x = n, for P each n ≥ 2, and obtain the upper bound n≤N 1/n ≤ log(N) + 1. c) Our goal is to prove that limN→∞(1/1 + 1/2 + 1/3 + ··· + 1/N − log N) exists — it is usually denoted by γ and called the Euler-Mascheroni constant. Now let xn = 1/1 + 1/2 + 1/3 + ··· + 1/n − log n for each integer n ≥ 1. By the same argument as in parts a) and b), show that if n > m then

0 ≤ xm − xn ≤ log(1 + 1/m) − log(1 + 1/n) < 1/m.

Thus xm is a Cauchy sequence and converges to a limit as desired. It can be shown that γ = .5772156649 ... . d) Deduce that 0 ≤ 1/1 + 1/2 + 1/3 + ··· + 1/N − log N − γ ≤ 1/N. e) Let {t} = t − [t] denote the fractional part of t. Show that

N Z N {t} X 1 Z ∞ {t} 1 − dt = − log N, and deduce that γ = 1 − dt. t2 n t2 1 n=1 1 In exercise 3.6.1 we begin with this formula to deduce (d). Exercise 2.2.2. Use the method of exercise 2.2.1 to show that if σ > 1 then 1 X 1 σ ≤ ≤ . σ − 1 nσ σ − 1 n≥1 2.4. AN EXPLICIT LOWER BOUND ON THE SUM OF RECIPROCALS OF THE PRIMES 19

Exercise 2.2.3. Given that P 1/p diverges, deduce that there are arbitrarily p √ √ large values of x for which #{p ≤ x : p prime} ≥ x. Improve the x here as much as you can using these methods.

Exercise 2.2.4. Prove that there are infinitely many primes p with a 1 in their decimal expansion. (Hint: How many integers do not have a 1 in their decimal expansion?) Prove that any finite digit pattern appears in the decimal expansion of infinitely many primes.

2.3. Upper bound on the number of primes up to x In the previous section we introduced sieving, that is removing the multiples of the small primes from the set of integers. Since primes > y are defined to be integers without small prime factors, one might hope that sieve methods will help one to count the number of primes up to a given point. We begin by proving that the primes give a vanishing proportion of the integers up to x. Q Fix  > 0. By (2.2.2) we know that there exists y such that p≤y(1−1/p) < /3. Let m be the product of the primes ≤ y, and select x > 3y/. If k = [x/m], so that km ≤ x < (k + 1)m < 2km, then the number of primes up to x is no more than the number of primes up to (k + 1)m, which is no more than the number of primes up to y plus the number of integers up to (k + 1)m which have all of their prime factors > y. Since there are no more than y primes up to y, and since the set of integers up to (k + 1)m is {jm + i : 1 ≤ i ≤ m, 0 ≤ j ≤ k}, we deduce that the number of primes up to x is

k X X ≤ y + 1 = y + (k + 1)φ(m) j=0 1≤i≤m (jm+i,m)=1 Y < x/3 + 2km (1 − 1/p) < x/3 + 2x/3 = x. p≤y In other words 1 lim #{p ≤ x : p prime} → 0. x→∞ x

2.4. An explicit lower bound on the sum of reciprocals of the primes One of our goals now is to find simple combinatorial techniques to give good upper and lower bounds on the number of primes up to x, perhaps with certain weights. This is good practice in using analytic methods. Every integer n up to x can be written as a squarefree integer m times a square, r2. Moreover any squarefree integer ≤ x can be written as a product of distinct primes ≤ x, so that

X 1 X 1 X µ2(m) Y  1 ≤ ≤ 2 1 + , n r2 m p n≤x r≥1 m≤x p prime p≤x where µ(m) is the Mobius function and the last inequality follows from exercise 2.2.2. Since 1+1/p ≤ e1/p we may take the logarithm of both sides to obtain, using 20 2. INFINITELY MANY PRIMES, WITH ANALYSIS exercise 2.2.1, X 1 (2.4.1) ≥ log log(x + 1) − log 2. p p prime p≤x This gives a good quantitative lower bound on the number of primes, counted with the weight 1/p.

2.5. Another explicit lower bound

This next argument takes a sum and evaluates it in one way using analysis, in another way using algebra, and when equating the two, we obtain an advance. Using this dichotomy is something that happens repeatedly in number theory.

We begin by writing $\log n = \sum_{p^a|n}\log p$, where the sum is over all of the prime powers that divide n. Summing over all positive integers ≤ N, we obtain the identity
$$(2.5.1)\qquad \sum_{n\le N}\log n = \sum_{n\le N}\sum_{p^a|n}\log p = \sum_{\substack{p\ \text{prime},\,a\ge1\\ p^a\le N}}\log p\sum_{\substack{n\le N\\ p^a|n}}1 = \sum_{\substack{p\ \text{prime},\,a\ge1\\ p^a\le N}}\log p\left[\frac{N}{p^a}\right].$$
Using a modification of the method of exercise 2.2.1 we prove that
$$\sum_{n\le N}\log n \ge \int_1^N\log t\,dt = N(\log N - 1) + 1.$$
Then (2.5.1) yields
$$N(\log N - 1) + 1 \le \sum_{\substack{p\ \text{prime}\\ p\le N}}\log p\sum_{a\ge1}\frac{N}{p^a} = N\sum_{\substack{p\ \text{prime}\\ p\le N}}\frac{\log p}{p-1},$$
and therefore
$$(2.5.2)\qquad \sum_{\substack{p\ \text{prime}\\ p\le N}}\frac{\log p}{p-1} > \log N - 1.$$
We obtain an entirely explicit bound in exercise 2.6.3.

2.6. Binomial coefficients: First bounds

The numerator of any binomial coefficient
$$\binom{m+n}{n} = \frac{(m+1)(m+2)\cdots(m+n)}{1\cdot2\cdots n}$$
with m > n is evidently divisible by all the primes in (m, m + n], while the denominator is not. Therefore we can hope to study primes in intervals by analyzing the binomial coefficients, both algebraically (by being more precise about prime divisibility), and analytically, by estimating accurately the size of binomial coefficients.

Every prime in (n + 1, 2n + 1] divides the numerator of the binomial coefficient $\binom{2n+1}{n}$. Therefore the product of these primes is $\le\binom{2n+1}{n}$. Since $\binom{2n+1}{n} = \binom{2n+1}{n+1}$, therefore $\binom{2n+1}{n} \le \frac12\sum_k\binom{2n+1}{k} = \frac12\times 2^{2n+1} = 4^L$, where L = n is the number of integers in the interval (n + 1, 2n + 1]. We deduce by induction that the product of the primes up to N is $\le 4^{N-1}$ for all N ≥ 1. Taking logarithms (in base e) we obtain
$$(2.6.1)\qquad \sum_{\substack{p\ \text{prime}\\ p\le x}}\log p \le (x-1)\log 4.$$
If n > 1 then $\binom{n}{[n/2]}$ is the largest of the binomial coefficients $\binom{n}{a}$, and is at least as large as the sum of the two smallest. Therefore $2^n \le \sum_{a=0}^n\binom na \le n\binom{n}{[n/2]}$. Combining this with exercise 2.6.1c) we obtain that
$$\frac{2^n}{n} \le \binom{n}{[n/2]} \le \prod_{\substack{p\ \text{prime}\\ p\le n}}p^{e_p} \le n^{\#\{p\ \text{prime}:\ p\le n\}},$$
where $p^{e_p}\,\|\,\binom{n}{[n/2]}$, so that
$$(2.6.2)\qquad \#\{p\ \text{prime}: p \le n\} \ge (\log 2)\frac{n}{\log n} - 1.$$

Exercises

Exercise 2.6.1. a) Use exercise 2.1.2 to prove that the power of prime p which divides n! is $\sum_{e\ge1}[n/p^e]$.
b) Deduce that the power of p that divides the binomial coefficient $\binom{a+b}{a}$ is given by the number of carries when adding a and b in base p.
c) Deduce that if $p^e$ divides the binomial coefficient $\binom nm$ then $p^e \le n$.

Exercise 2.6.2. Modify the argument used to prove (2.6.1) to now show that
$$(2.6.3)\qquad \#\{p \le n : p\ \text{prime}\} \le (\log 4)\frac{n}{\log n} + 2(\log 4)^2\frac{n}{(\log n)^2}$$
for all n ≥ 2. (Hint: First prove this for all n ≤ 100 by calculations, and then by induction for larger n.)

Exercise 2.6.3. a) For each prime p ≤ N let $k_p$ be the largest integer k for which $p^k \le N$. Prove that
$$\frac{N}{p-1} - \sum_{a\ge1}\left[\frac{N}{p^a}\right] \le \frac{N}{p^{k_p}(p-1)} + k_p \le 2 + \frac{\log N}{\log p}.$$
b) Using (2.5.1) and results from section 2.5 deduce that
$$(2.6.4)\qquad \sum_{\substack{p\ \text{prime}\\ p\le N}}\frac{\log p}{p-1} < \log N + 4$$
for N sufficiently large.

Exercise 2.6.4. Prove that every prime in (n, 6n] divides $\binom{6n}{3n}\binom{3n}{2n}$. Use this to improve (2.6.1). Can you obtain further improvements? (For example, exercise 3.6.5.)

2.7. Bertrand's postulate

Theorem 2.7.1 Bertrand’s postulate

For every integer n > 1, there is a prime number between n and 2n.

Equations (2.6.2) and (2.6.3), taken together, just fail to imply Bertrand's postulate. Paul Erdős's ingenious approach involves a more detailed analysis of the prime factors of binomial coefficients, similar to those that arose in section 2.6, by considering the primes in different intervals: Let $p^{e_p}$ be the exact power of the prime p dividing $\binom{2n}{n}$. From exercise 2.6.1b we know that $e_p = 1$ if $n < p \le 2n$, and $e_p = 0$ if $2n/3 < p \le n$ (verify this as an exercise). By exercise 2.6.1c we know that $e_p \le 1$ if $\sqrt{2n} < p \le 2n/3$, and that $p^{e_p} \le 2n$ if $p \le \sqrt{2n}$. Therefore we have
$$\frac{2^{2n}}{2n} \le \binom{2n}{n} = \prod_{p\le 2n}p^{e_p} \le \prod_{p\le\sqrt{2n}}2n\prod_{\sqrt{2n}<p\le 2n/3}p\prod_{n<p\le 2n}p \le (2n)^{(\sqrt{2n}+1)/2}\,4^{2n/3}\prod_{n<p\le 2n}p,$$
using (2.6.1), and since there are no more than $(\sqrt{2n}+1)/2$ primes up to $\sqrt{2n}$ (every prime other than 2 is odd). Taking logarithms we deduce that
$$(2.7.1)\qquad \sum_{\substack{p\ \text{prime}\\ n<p\le 2n}}\log p > \frac{\log 4}{3}\,n - \frac{\sqrt{2n}+3}{2}\log(2n).$$
The right-hand side is positive once n is sufficiently large, so there is then a prime in (n, 2n); the remaining small values of n are handled in the exercises below.

Exercises

Exercise 2.7.1. Write a computer program to verify (2.7.1) for all positive integers n ≤ 10000.

Exercise 2.7.2. Prove Bertrand's postulate for all n up to 20000 using only the primes 2, 3, 5, 7, 13, 23, 43, 83, 163, 317, 631, 1259, 2503, 5003, 9973, 10007.

Exercise 2.7.3. Prove that there are infinitely many primes p with a 1 as the leftmost digit in their decimal expansion.

Exercise 2.7.4. Use Bertrand's postulate to show, by induction, that every integer n > 6 can be written as the sum of distinct primes. (Hint: Use induction to show that, for each n ≥ 6, every integer in [7, 2p_n + 6] is the sum of distinct primes in {2, 3, . . . , p_n}, where p_n is the nth smallest prime.)
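For exercises 2.7.1 and 2.7.2 it may help to experiment on a computer. Here is a minimal sketch in Python (the helper names are our own, and the right-hand side in the code is the quantity in (2.7.1) as displayed above); it checks (2.7.1), and in particular Bertrand's postulate, for all n up to 10000.

    from math import log, sqrt

    def primes_up_to(N):
        """Sieve of Eratosthenes: return the list of primes up to N."""
        is_prime = [True] * (N + 1)
        is_prime[0] = is_prime[1] = False
        for p in range(2, int(N ** 0.5) + 1):
            if is_prime[p]:
                for m in range(p * p, N + 1, p):
                    is_prime[m] = False
        return [p for p in range(2, N + 1) if is_prime[p]]

    N = 10000
    prime_set = set(primes_up_to(2 * N))

    # theta[m] = sum of log p over primes p <= m (Chebyshev's theta of section 2.12)
    theta = [0.0] * (2 * N + 1)
    for m in range(2, 2 * N + 1):
        theta[m] = theta[m - 1] + (log(m) if m in prime_set else 0.0)

    for n in range(1, N + 1):
        lhs = theta[2 * n] - theta[n]   # sum of log p over primes in (n, 2n]
        rhs = (log(4) / 3) * n - ((sqrt(2 * n) + 3) / 2) * log(2 * n)
        assert lhs > rhs, n             # this is (2.7.1)
        if n > 1:
            assert lhs > 0, n           # hence a prime lies between n and 2n

    print("(2.7.1) and Bertrand's postulate verified for all n up to", N)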

Exercise 2.7.5. Our goal is to prove the theorem of Sylvester and Schur, that the product of any k consecutive integers, beginning at n + 1 with n ≥ k, is divisible by a prime > k. We suppose that this is false.
a) Show that $\binom{n+k}{n}$ only has prime factors ≤ k and so is $\le (n+k)^{\pi(k)}$.
b) Since $\binom{n+k}{n} > (n/k)^k$, use (2.6.3) to deduce that n ≤ 8k if k ≥ 62503. Using Stirling's formula (see exercise 2.8.3) in the form $e(k/e)^k \le k! \le ek(k/e)^k$ for k ≥ 7, improve the above.
c) Sylvester actually proved that if (m, d) = 1 where d ≥ 1 and m > n then (m + d)(m + 2d)...(m + nd) has a prime factor larger than n. Modify your work above to go some way to proving this.

2.8. Big Oh and other notation

We might guess that the sum in (2.5.2) tends to log N + C for some constant C, where −1 < C < 4, so that the bounds obtained in (2.5.2) or (2.6.4) do not reflect the true value of C. So the numbers obtained in these bounds are of limited interest. If we focus on proving only that there exist bounds on C, then our work is easier, and it pays to have notation that reflects this new objective: If A(x) and B(x) are functions of x, and there exists some constant c > 0 such that |A(x)| ≤ cB(x) for all x ≥ 1 then we write A(x) = O(B(x)) (we say "A(x) is big oh of B(x)"), or $A(x) \ll B(x)$ ("A(x) is less than, less than B(x)"), or even $B(x) \gg A(x)$ ("B(x) is greater than, greater than A(x)"). If $A(x) \ll B(x)$ and $B(x) \ll A(x)$, that is A(x), B(x) > 0 and there exist constants $c_1, c_2 > 0$ such that $c_1B(x) \le A(x) \le c_2B(x)$, then we write $A(x) \asymp B(x)$. For example, $\#\{p\ \text{prime} : p \le n\} \asymp n/\log n$ by (2.6.2) and (2.6.3).

Now log N! equals the left side of (2.5.1), which equals
$$\int_1^N\log t\,dt + O(\log N) = N(\log N - 1) + O(\log N).$$
The right side equals
$$\sum_{\substack{p\ \text{prime},\,a\ge1\\ p^a\le N}}\log p\left[\frac{N}{p^a}\right] = \sum_{\substack{p\ \text{prime},\,a\ge1\\ p^a\le N}}\log p\left(\frac{N}{p^a} + O(1)\right) = N\sum_{\substack{p\ \text{prime},\,a\ge1\\ p^a\le N}}\frac{\log p}{p^a} + O(N)$$
by (2.6.1), and
$$\sum_{\substack{p\ \text{prime},\,a\ge2\\ p^a\le N}}\frac{\log p}{p^a} \le \sum_{p\ \text{prime}}\frac{\log p}{p(p-1)} = O(1),$$
by exercise 2.2.2. Therefore, combining the above displayed equations with (2.5.1), we obtain
$$(2.8.1)\qquad \sum_{\substack{p\ \text{prime}\\ p\le N}}\frac{\log p}{p} = \log N + O(1).$$

We define $E(N) := \sum_{p\le N}\frac{\log p}{p} - \log N$, so that E(N) = O(1) by (2.8.1). From the identity
$$(2.8.2)\qquad \sum_{\substack{p\ \text{prime}\\ p\le N}}\frac1p = \frac{1}{\log N}\sum_{\substack{p\ \text{prime}\\ p\le N}}\frac{\log p}{p} + \int_2^N\frac{1}{t(\log t)^2}\Bigg(\sum_{\substack{p\ \text{prime}\\ p\le t}}\frac{\log p}{p}\Bigg)dt$$
we obtain, using (2.8.1),
$$\sum_{\substack{p\ \text{prime}\\ p\le N}}\frac1p = \frac{\log N + E(N)}{\log N} + \int_2^N\frac{\log t + E(t)}{t(\log t)^2}\,dt = 1 + \frac{E(N)}{\log N} + \log\frac{\log N}{\log 2} + \int_2^\infty\frac{E(t)}{t(\log t)^2}dt - \int_N^\infty\frac{E(t)}{t(\log t)^2}dt.$$
Now since $|E(t)| \ll 1$ we have $\int_2^\infty\frac{E(t)}{t(\log t)^2}dt \ll \int_2^\infty\frac{dt}{t(\log t)^2} = \left[-\frac{1}{\log t}\right]_2^\infty = \frac{1}{\log 2}$, and that $\int_N^\infty\frac{E(t)}{t(\log t)^2}dt \ll \frac{1}{\log N}$. So let c be the constant $1 - \log\log 2 + \int_2^\infty\frac{E(t)}{t(\log t)^2}dt$, and then the above reads
$$(2.8.3)\qquad \sum_{\substack{p\ \text{prime}\\ p\le N}}\frac1p = \log\log N + c + O\left(\frac{1}{\log N}\right),$$
a big improvement on (2.4.1), although determining c from this proof would be difficult.

How difficult is it to obtain the seemingly magical identity (2.8.2)? Can we find such identities in other circumstances? This leads us to the subject of partial summation: Suppose that a(n) is some function on the integers (for example a(p) = (log p)/p if p is prime, and a(n) = 0 otherwise), and let $A(x) := \sum_{n\le x}a(n)$. If f(t) is a differentiable function on $\mathbb{R}^+$ then, formally, we have
$$(2.8.4)\qquad \sum_{n\le x}a(n)f(n) = \int_{1^-}^{x}f(t)\,dA(t) = [f(t)A(t)]_{1^-}^{x} - \int_{1^-}^{x}f'(t)A(t)\,dt = f(x)\sum_{n\le x}a(n) - \int_1^x f'(t)\Bigg(\sum_{n\le t}a(n)\Bigg)dt$$
after integrating by parts, since $A(1^-) = 0$. Calculating a sum in this manner is called partial summation. We see that (2.8.2) is an example of this, with f(t) = 1/log t, a decreasing function. If a(n) ≥ 0 for all n and f(x) > 0 for all x > x_0, then this formula is most successful with decreasing f(t), since then f'(t) < 0 and all but finitely many of the terms in (2.8.4) take the same sign.

Let's look at an example where f(t) is increasing: Again let a(p) = (log p)/p if p is prime, and a(n) = 0 otherwise, but this time take f(t) = t/log t, so as to count the number of primes up to N. Then, by (2.8.4) we have
$$\sum_{\substack{p\ \text{prime}\\ p\le N}}1 = \frac{N}{\log N}\sum_{\substack{p\ \text{prime}\\ p\le N}}\frac{\log p}{p} - \int_1^N\left(\frac{1}{\log t} - \frac{1}{(\log t)^2}\right)\Bigg(\sum_{\substack{p\ \text{prime}\\ p\le t}}\frac{\log p}{p}\Bigg)dt,$$
so by (2.8.1) the right side becomes
$$\frac{N}{\log N}(\log N + O(1)) - \int_1^N\left(\frac{1}{\log t} - \frac{1}{(\log t)^2}\right)(\log t + O(1))\,dt$$
$$(2.8.5)\qquad = N + O\left(\frac{N}{\log N}\right) - \left\{N + O\left(\int_2^N\frac{dt}{\log t}\right)\right\} \ll \frac{N}{\log N}$$
(we could just as well have written O(N/log N) at the final equality). We see here that the main terms cancel and that the secondary terms are not known accurately enough to get an asymptotic estimate. Typically, when f(t) is increasing there will be this cancellation between the main terms of the two halves of the formula, so we usually use partial summation when f is decreasing.
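A quick numerical illustration of (2.8.1) and (2.8.3) is easy to carry out; the following is a minimal Python sketch (the helper names are our own). It prints E(N) and the quantity that should converge to the constant c of (2.8.3).

    from math import log

    def primes_up_to(N):
        """Sieve of Eratosthenes: return the list of primes up to N."""
        is_prime = [True] * (N + 1)
        is_prime[0] = is_prime[1] = False
        for p in range(2, int(N ** 0.5) + 1):
            if is_prime[p]:
                for m in range(p * p, N + 1, p):
                    is_prime[m] = False
        return [p for p in range(2, N + 1) if is_prime[p]]

    for N in [10**3, 10**4, 10**5, 10**6]:
        ps = primes_up_to(N)
        e_n = sum(log(p) / p for p in ps) - log(N)        # E(N) in the text
        c_n = sum(1 / p for p in ps) - log(log(N))        # should approach c
        print(f"N = {N:>8}:  E(N) = {e_n:+.4f},  sum 1/p - loglog N = {c_n:.4f}")

One sees E(N) staying bounded while the second column settles down to a constant, as (2.8.1) and (2.8.3) predict.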

Exercises

These exercises are very much focussed on using the analytic notions discussed above, something that is essential to any course in analytic number theory.

Exercise 2.8.1. Prove that $\sum_{p\le x}\frac{\log p}{p} - \sum_{p\le x}\frac{\log p}{p-1} \ll 1$.

Exercise 2.8.2. Prove (2.8.4) by considering the coefficient of a(n) on both sides, for each n.

Exercise 2.8.3. Let $a_N = \log N! - N(\log N - 1)$ for N ≥ 1. Prove that $\log n = \int_{n-1/2}^{n+1/2}\log t\,dt + O(1/n^2)$, and deduce that $a_N - a_M = \frac12\log(N/M) + O(1/M)$. Deduce that there exists a constant C such that $N! = (N/e)^N(CN)^{1/2}\{1 + O(1/N)\}$. (In fact C = 2π, which is Stirling's formula.) Another approach is given in exercise 3.6.2.

Exercise 2.8.4. Prove that $\sum_{n\le x}(\log(x/n))^k = \int_1^x(\log(x/t))^k\,dt + O((\log x)^k)$, and then that this equals $k!\,x + O((k + \log x)^k)$.

Exercise 2.8.5. Show that if f(x) is a function such that, for every ε > 0, we have f(x) = 1 + O(ε), then f(x) = 1 + o(1).

Exercise 2.8.6. Let $A(x) = x + \sqrt{x}$. Show that $\{1 + o(1)\}A(x) - \{1 + o(1)\}x$ is not equal to $\{1 + o(1)\}\sqrt{x}$, and indicate what it is equal to.

2.9. How many prime factors does a typical integer have?

We begin this section by estimating the average number of prime factors of an integer ≤ x. To be clear we must distinguish whether we are counting the prime factors with their multiplicities or not; that is, one might count 12 = 2² × 3 as having two or three prime factors depending on whether one counts the 2² as one or two primes. So we will work with either definition:
$$\omega(n) = \sum_{\substack{p\ \text{prime}\\ p|n}}1 \qquad\text{and}\qquad \Omega(n) = \sum_{\substack{p\ \text{prime},\,a\ge1\\ p^a|n}}1.$$

First note that

$$\sum_{n\le x}\omega(n) = \sum_{n\le x}\sum_{\substack{p\ \text{prime}\\ p|n}}1 = \sum_{p\ \text{prime}}\sum_{\substack{n\le x\\ p|n}}1 = \sum_{\substack{p\ \text{prime}\\ p\le x}}\left[\frac xp\right]$$
$$(2.9.1)\qquad = \sum_{\substack{p\ \text{prime}\\ p\le x}}\left(\frac xp + O(1)\right) = x\left(\log\log x + c + O\left(\frac{1}{\log x}\right)\right)$$
by (2.8.3) and (2.8.5). Therefore the average number of distinct prime factors of an integer up to x is about log log x + c + o(1). The notation "o(1)", which we will use repeatedly, stands for a function A(x) for which A(x) → 0 as x → ∞. We use this in the context that we do not much care what the function is, simply that it goes to 0 as x goes to ∞. When we allow multiplicities, that is we count also the number of times each prime factor divides n, then there is not much change in our estimate. Indeed the difference between the two is given by

$$\sum_{n\le x}\{\Omega(n) - \omega(n)\} = \sum_{n\le x}\sum_{\substack{p\ \text{prime},\,a\ge2\\ p^a|n}}1 = \sum_{\substack{p\ \text{prime},\,a\ge2\\ p^a\le x}}\left[\frac{x}{p^a}\right]$$
$$(2.9.2)\qquad = \sum_{\substack{p\ \text{prime}\\ p\le\sqrt x}}\sum_{\substack{a\ge2\\ p^a\le x}}\left(\frac{x}{p^a} + O(1)\right).$$

For each prime p ≤ √x, there are O(log x) error terms O(1); we extend the sum to all integers a ≥ 2, obtaining x/(p(p − 1)) with an additional error of $\sum_{a:\,p^a>x}x/p^a < 1 + 1/p + \cdots = O(1)$. In total these error terms contribute $O(\pi(\sqrt x)\log x) = O(\sqrt x)$ by (2.8.5). We also extend the sum to all primes p, and therefore have a main term of $c_0x$ where $c_0 = \sum_{p\ \text{prime}}\frac{1}{p(p-1)}$, with an additional error of $x\sum_{n\ge\sqrt x}\frac{1}{n(n-1)} = O(\sqrt x)$. We have therefore proved that the quantity in (2.9.2) equals $c_0x + O(\sqrt x)$. This implies that the average number of (not necessarily distinct) prime factors of an integer up to x is $\log\log x + c + c_0 + o(1)$, not much different from the distinct case.

We are going to go one step further and ask how much ω(n) varies from its mean, that is we are going to compute the statistical quantity, the variance. We begin with a standard identity for the variance (which we will use repeatedly, see exercise 2.9.2):

 2  2 1 X 1 X 1 X 1 X (2.9.3) ω(n) − ω(m) = ω(n)2 − ω(m) . x  x  x x  n≤x m≤x n≤x m≤x 2.9. HOW MANY PRIME FACTORS DOES A TYPICAL INTEGER HAVE? 27

Now the sum in the first term here is
$$\sum_{n\le x}\omega(n)^2 = \sum_{n\le x}\sum_{\substack{p\ \text{prime}\\ p|n}}\sum_{\substack{q\ \text{prime}\\ q|n}}1 = \sum_{\substack{p,q\ \text{prime}\\ q=p}}\left[\frac xp\right] + \sum_{\substack{p,q\ \text{prime}\\ q\ne p}}\left[\frac{x}{pq}\right] \le \sum_{\substack{p\ \text{prime}\\ p\le x}}\frac xp + \sum_{\substack{p,q\ \text{prime},\,q\ne p\\ pq\le x}}\frac{x}{pq} \le x\sum_{\substack{p\ \text{prime}\\ p\le x}}\frac1p + x\Bigg(\sum_{\substack{p\ \text{prime}\\ p\le x}}\frac1p\Bigg)^2.$$
Therefore (2.9.1) implies that (2.9.3), the variance, is
$$(2.9.4)\qquad \le \sum_{\substack{p\ \text{prime}\\ p\le x}}\frac1p + \Bigg(\sum_{\substack{p\ \text{prime}\\ p\le x}}\frac1p\Bigg)^2 - \Bigg(\sum_{\substack{p\ \text{prime}\\ p\le x}}\frac1p + O\left(\frac{1}{\log x}\right)\Bigg)^2 = \log\log x + O(1).$$
By the inequality $(a+b)^2 \le 2(a^2 + b^2)$ we see that
$$(\omega(n) - \log\log x)^2 \le 2\left(\omega(n) - \frac1x\sum_{m\le x}\omega(m)\right)^2 + 2\left(\frac1x\sum_{m\le x}\omega(m) - \log\log x\right)^2.$$
Taking the average of this over n ≤ x we deduce from (2.9.1) and (2.9.3) an inequality originally proved by Turán:
$$(2.9.5)\qquad \frac1x\sum_{n\le x}(\omega(n) - \log\log x)^2 \le 2\log\log x + O(1).$$
From here we can deduce an elegant theorem of Hardy and Ramanujan: Let S be the set of n ≤ x for which $|\omega(n) - \log\log x| \ge (\log\log x)^{2/3}$; then the left side of (2.9.5) is
$$\ge \frac1x\sum_{n\in S}(\log\log x)^{4/3} + \frac1x\sum_{\substack{n\le x\\ n\notin S}}0 = (|S|/x)(\log\log x)^{4/3}.$$
Therefore, by (2.9.5), we have $|S| \ll x/(\log\log x)^{1/3}$, which is, of course, o(x). Therefore almost all integers n ≤ x have $\log\log x + O((\log\log x)^{2/3})$ distinct prime factors. (By almost all integers up to x, we mean "for all but at most o(x) integers up to x".) One more thing, log log n = log log x + o(1) for almost all integers n ≤ x (see exercise 2.9.1c), and so almost all integers n ≤ x have $\log\log n + O((\log\log n)^{2/3})$ distinct prime factors. So we have proved

Theorem 2.9.1 Hardy and Ramanujan

Almost all integers n have {1 + o(1)} log log n distinct prime factors.

We need to further explain the notation. The term "o(1)" stands, as before, for a function A(n) for which A(n) → 0 as n → ∞. We could equally have written log log n + o(log log n) where "o(log log n)" stands for a function B(n) for which B(n)/log log n → 0 as n → ∞. By "almost all integers n" we mean "almost all integers n ≤ x, for all sufficiently large x"; and by that we mean that there are no more than o(x) integers n ≤ x for which this is false.
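The Hardy-Ramanujan phenomenon is easy to observe numerically. The following is a minimal Python sketch (the variable names are our own): it tabulates ω(n) for n ≤ x and compares the empirical mean and variance with log log x.

    from math import log

    x = 10**6
    omega = [0] * (x + 1)
    for p in range(2, x + 1):
        if omega[p] == 0:               # no smaller prime divides p, so p is prime
            for m in range(p, x + 1, p):
                omega[m] += 1           # p is one distinct prime factor of m

    values = omega[2:]                  # ignore n = 0, 1
    mean = sum(values) / len(values)
    var = sum(v * v for v in values) / len(values) - mean ** 2
    print(f"x = {x}: average omega(n) = {mean:.3f},  log log x = {log(log(x)):.3f}")
    print(f"variance of omega(n) = {var:.3f}  (Turan's bound is of size log log x)")

Both the mean and the variance come out close to log log x, in line with (2.9.1) and (2.9.5).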

Exercises

Exercise 2.9.1. a) Use (2.9.1) to prove that the average number of prime factors of an integer in [x, 2x] is $\log\log x + c + O\left(\frac{1}{\log x}\right)$.
b) Show that if n is in this interval then this quantity equals $\log\log n + c + O\left(\frac{1}{\log n}\right)$.
c) Continue on to prove that log log n = log log x + o(1) for almost all integers n ≤ x (Hint: One way to do so would be to start by considering integers n in the two ranges x ≥ n ≥ x/log x, and then n ≤ x/log x).

Exercise 2.9.2. a) If $a_1,\ldots,a_N$ have average m show that $\frac1N\sum_{n\le N}(a_n - m)^2 = \frac1N\sum_{n\le N}a_n^2 - m^2$.
b) Justify (2.9.3).

Exercise 2.9.3. (Erdős's multiplication table theorem) Show that there are o(x²) distinct integers n ≤ x² that can be written as the product of two integers ≤ x. (Hint: Compare the typical number of prime factors of ab where a, b ≤ x with the typical number of prime factors of n.)

2.10. How many primes are there up to x?

Let π(x) denote the number of primes up to x. After (2.6.2) and (2.6.3) we know that for any ε > 0, if x is sufficiently large then
$$(\log 2 - \varepsilon)\frac{x}{\log x} \le \pi(x) \le (\log 4 + \varepsilon)\frac{x}{\log x};$$
so one might guess that there exists some number L, between log 2 and log 4, such that $\pi(x)/(x/\log x) \to L$ as x → ∞. Such a suggestion was first published by Legendre in 1808, with L = 1, in fact with the more precise assertion that there exists a constant B such that π(x) is well approximated by x/(log x − B) for large enough x. Actually, back in 1792 or 1793, at the age of 15 or 16, Gauss had already made a much better guess, based on studying tables of primes.¹ His observation may be best quoted as: About 1 in log x of the integers near x are prime. This suggests that a good approximation to the number of primes² up to x is $\sum_{n=2}^{x}1/\log n$. Using the method of exercise 2.2.1 this is, up to an error of O(1), equal to
$$(2.10.1)\qquad \int_2^x\frac{dt}{\log t}.$$
We denote this quantity by Li(x) and call it the logarithmic integral. Here is a comparison of Gauss's prediction with the actual count of primes up to various values of x:

¹On Christmas Eve 1847 he wrote to Encke, "In 1792 or 1793 ... I turned my attention to the decreasing frequency of primes ... counting the primes in intervals of length 1000. I soon recognized that behind all of the fluctuations, this frequency is on average inversely proportional to the logarithm."
²Logarithms in base e are the natural logarithms. This base becomes the natural choice, after learning about Gauss's correct prediction for the density of primes at around x.

x        π(x) = #{primes ≤ x}         Overcount: Li(x) − π(x)
10^3     168                           10
10^4     1229                          17
10^5     9592                          38
10^6     78498                         130
10^7     664579                        339
10^8     5761455                       754
10^9     50847534                      1701
10^10    455052511                     3104
10^11    4118054813                    11588
10^12    37607912018                   38263
10^13    346065536839                  108971
10^14    3204941750802                 314890
10^15    29844570422669                1052619
10^16    279238341033925               3214632
10^17    2623557157654233              7956589
10^18    24739954287740860             21949555
10^19    234057667276344607            99877775
10^20    2220819602560918840           222744644
10^21    21127269486018731928          597394254
10^22    201467286689315906290         1932355208
10^23    1925320391606818006727        7236148412
10^24    18435599767349200867866       17146907278
10^25    176846309399143769411680      55160980939
Table 1: Primes up to various x, and the overcount in Gauss's prediction.

Gauss's prediction is amazingly accurate. It does seem to always be an overcount, and since the width of the last column is about half that of the central one, it appears that the difference is no bigger than √x, perhaps multiplied by a constant. The data certainly suggests that π(x)/Li(x) → 1 as x → ∞.

If we integrate (2.10.1) by parts then
$$\mathrm{Li}(x) = \frac{x}{\log x} + O(1) + \int_2^x\frac{dt}{(\log t)^2} = \frac{x}{\log x} + O\left(\frac{x}{(\log x)^2}\right).$$
Integrating by parts several more times we find, for any fixed integer N > 0, that
$$(2.10.2)\qquad \mathrm{Li}(x) = \frac{x}{\log x} + \frac{x}{(\log x)^2} + \frac{2!\,x}{(\log x)^3} + \cdots + \frac{(N-1)!\,x}{(\log x)^N} + O_N\left(\frac{x}{(\log x)^{N+1}}\right),$$
and so $\mathrm{Li}(x)/(x/\log x) \to 1$ as x → ∞. Combining this with Gauss's prediction gives that $\pi(x)/(x/\log x) \to 1$ as x → ∞. The notation of limits is rather cumbersome; it is easier to write
$$(2.10.3)\qquad \pi(x) \sim \frac{x}{\log x}$$
as x → ∞, "π(x) is asymptotic to x/log x". In general, A(x) ∼ B(x) is equivalent to $\lim_{x\to\infty}A(x)/B(x) = 1$. If one wishes to be more precise than (2.10.3) about the difference between the two sides one might write
$$(2.10.4)\qquad \pi(x) = \frac{x}{\log x} + O\left(\frac{x}{(\log x)^2}\right),$$
so the difference between π(x) and x/log x is bounded by a constant multiple of x/(log x)². Confronted with a formula like (2.10.4) one might only be concerned as to whether the secondary term of the right side, O(x/(log x)²), is actually much smaller than the main term, x/log x. Specifically whether $(x/(\log x)^2)\big/(x/\log x) \to 0$ as x → ∞: of course it does, and we use the notation x/(log x)² = o(x/log x). In general, A(x) = o(B(x)) is equivalent to $\lim_{x\to\infty}A(x)/B(x) = 0$ and we say "A(x) is little oh of B(x)". Thus the right side of (2.10.4) can be rewritten as
$$\frac{x}{\log x} + o\left(\frac{x}{\log x}\right), \quad\text{or even}\quad \{1 + o(1)\}\frac{x}{\log x},$$
which is equivalent to (2.10.3).

The asymptotic (2.10.3) is called The Prime Number Theorem and its proof had to wait until the end of the nineteenth century, requiring various remarkable developments. The proof was a high point of nineteenth century mathematics and there is still no straightforward proof. There are reasons for this: Surprisingly the prime number theorem is equivalent to a statement about zeros of an analytic continuation, and although a proof can be given that hides this fact, it is still lurking somewhere just beneath the surface, perhaps inevitably so.

We will now show that if L exists (as at the start of this section) then it must equal 1: Suppose that π(x) = {L + o(1)}x/log x. Using partial summation (taking f(t) = (log t)/t in (2.8.4) where a(n) = 1 if n is prime and 0 otherwise) we obtain
$$\sum_{p\le x}\frac{\log p}{p} = \pi(x)\frac{\log x}{x} + \int_1^x\frac{\log t - 1}{t^2}\,\pi(t)\,dt = O(1) + \{L + o(1)\}\int_1^x\frac{\log t - 1}{t^2}\cdot\frac{t}{\log t}\,dt = L\log x + o(\log x).$$
Comparing this to (2.8.1), we deduce that L = 1.
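One can reproduce the flavour of Table 1 (on a smaller scale) with a short computation. The following Python sketch (helper names ours) counts π(x) by a sieve and computes Gauss's sum $\sum_{n=2}^{x}1/\log n$, which differs from Li(x) only by a bounded amount, so the last column differs slightly from the table above.

    from math import log

    def pi_and_gauss(x):
        """Count the primes up to x and compute Gauss's sum of 1/log n for 2 <= n <= x."""
        is_prime = [True] * (x + 1)
        is_prime[0] = is_prime[1] = False
        for p in range(2, int(x ** 0.5) + 1):
            if is_prime[p]:
                for m in range(p * p, x + 1, p):
                    is_prime[m] = False
        pi_x = sum(is_prime)
        gauss = sum(1 / log(n) for n in range(2, x + 1))
        return pi_x, gauss

    for x in [10**3, 10**4, 10**5, 10**6]:
        pi_x, gauss = pi_and_gauss(x)
        print(f"x = {x:>8}: pi(x) = {pi_x:>7}, Gauss sum = {gauss:10.1f}, overcount = {gauss - pi_x:7.1f}")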

Exercises

Exercise 2.10.1. Prove that the difference between Gauss's prediction, Li(x), and Legendre's prediction, x/(log x − B), is ≥ x/(2(log x)³) for x sufficiently large, no matter what the choice of B.

Exercise 2.10.2. Prove (2.10.2).

Exercise 2.10.3. Assuming the prime number theorem, show that for all ε > 0 there are primes between x and x + εx if x is sufficiently large. Deduce that $\mathbb{R}_{\ge0}$ is the set of limit points of the set {p/q : p, q primes}.

Exercise 2.10.4. Assume that π(x) = x/log x + {1 + o(1)}x/(log x)². Prove that if x is sufficiently large then there are more primes up to x than between x and 2x.

Exercise 2.10.5. Show that if $\pi(x) = x/\log x + x/(\log x)^2 + O(x/(\log x)^{2+\varepsilon})$ for some ε > 0 then there exists a constant c such that $\sum_{p\le x}(\log p)/p = \log x + c + o(1)$.

2.11. The Gauss-Cramér heuristic

Let $X_3, X_4, X_5,\ldots$ be a sequence of independent random variables, each only taking values 0 or 1, with $\mathrm{Prob}(X_n = 1) = 1/\log n$ for each n ≥ 3. Thinking of the odd primes as the sequence $x_3 = 1, x_4 = 0, x_5 = 1,\ldots$ etc., where $x_n = 1$ if and only if n is an odd prime, the Gauss-Cramér heuristic states that anything that holds with probability 1 for sequences of 0's and 1's (according to the probability distribution on the $X_n$'s) is also true for the primes. For example one can show that there exists a constant A such that
$$\sum_{n\le N}X_n = \mathrm{Li}(N) + O(N^{1/2}(\log N)^A)$$
holds with probability → 1 as N → ∞, and therefore we believe it is true of the primes; that is, the difference between the actual count of primes and Gauss's guesstimate should, more-or-less, be bounded by √x. This is supported by our data: In Table 1 in section 2.10 we observed that the difference Li(x) − π(x) appears to be about the square root of π(x). If we do not worry about the power of log x then, given the heuristic and the data, we feel comfortable in conjecturing that there exists a constant A > −1 such that
$$(2.11.1)\qquad \pi(x) = \mathrm{Li}(x) + O(\sqrt{x}(\log x)^A).$$
One of the great conjectures of mathematics is the Riemann Hypothesis, which we shall discuss in detail.³ The Riemann Hypothesis is a statement concerning the zeros of the analytic continuation of the function we defined in (2.2.1) for Re(s) > 1. At first sight it is a rather difficult and esoteric question to appreciate. Riemann developed this question because he was thinking about the difference Li(x) − π(x). He showed the remarkable result that the Riemann Hypothesis is equivalent to (2.11.1). It is intriguing how a question of esoteric complex analysis could be equivalent to something as down-to-earth as the count of primes.

The Gauss-Cramér heuristic leads us to various other plausible conjectures including Cramér's conjectured bound on the largest possible gaps between consecutive primes: Let $p_1 = 2, p_2 = 3,\ldots$ be the sequence of primes. Cramér conjectured that
$$(2.11.2)\qquad \max_{p_n\le x}(p_{n+1} - p_n) = \{1 + o(1)\}(\log x)^2.$$
One must be a little wary of the Gauss-Cramér heuristic method since it can offer some ridiculous predictions; for example, that there are infinitely many pairs of primes m, m + 1. This is because the heuristic does not take account of divisibility by small primes (e.g. that an even integer > 2 must be composite), it only takes account of the density of primes amongst the integers of a certain size. One can however modify the heuristic to take account also of divisibility by small primes and thus not make such egregiously wrong predictions.

³If you are reading this book there is a good chance you have heard of this notorious conjecture.

2.12. Smoothing out the prime counting function

In this section we will explore counting the primes with different weights, which leads us to a better reformulation of Gauss's heuristic and the prime number theorem.

Table 1 indicates that Gauss's guesstimate Li(x) is indeed a startlingly good approximation to π(x), the number of primes up to x. Although Li(x), as defined in (2.10.1), can be written in a rather compact form, its value is most easily approximated by the complicated asymptotic series (2.10.2) which is far from easy to work with. Early researchers realized that if one counts primes with the weight log p then Gauss's guesstimate looks a lot nicer: We define
$$\theta(x) := \sum_{\substack{p\ \text{prime}\\ p\le x}}\log p$$
and expect that this should be well-approximated by $\sum_{n\le x}\log n\cdot\frac{1}{\log n} = x + O(1)$. Most easily we have
$$\theta(x) \le \sum_{\substack{p\ \text{prime}\\ p\le x}}\log x = \pi(x)\log x.$$
More precisely
$$\theta(x) - x = \int_1^x\log t\;d\{\pi(t) - \mathrm{Li}(t)\} = (\pi(x) - \mathrm{Li}(x))\log x - \int_1^x\frac{\pi(t) - \mathrm{Li}(t)}{t}\,dt,$$
from which we deduce that
$$(2.12.1)\qquad |\theta(x) - x| \le 2\log x\cdot\max_{1\le t\le x}|\pi(t) - \mathrm{Li}(t)|.$$
Therefore if π(x) is well approximated by Li(x) then θ(x) is well approximated by x. In the other direction
$$(2.12.2)\qquad \pi(x) - \mathrm{Li}(x) = \int_{2^-}^{x}\frac{1}{\log t}\,d\{\theta(t) - t\} = \frac{\theta(x) - x}{\log x} + \frac{2}{\log 2} + \int_2^x\frac{\theta(t) - t}{t(\log t)^2}\,dt.$$
Riemann recognized, for technical reasons that we will come to later, that including the prime powers up to x leads to a "better" counting function. So define
$$\psi(x) := \sum_{\substack{p\ \text{prime},\,a\ge1\\ p^a\le x}}\log p.$$
It is not difficult to relate ψ(x) to θ(x), and so to π(x); see exercise 2.12.2.

How about the size of the nth prime: Let $p_1 = 2 < p_2 = 3 < \ldots$ be the sequence of primes. By inverting the relation π(x) ∼ Li(x) we can deduce that $p_n \sim n\log n$, and obtain better estimates for $p_n$ the smaller that |π(x) − Li(x)| is. In the previous section we discussed the conjecture that
$$\pi(x) = \mathrm{Li}(x) + O(\sqrt{x}(\log x)^A).$$
This is equivalent to the two estimates $\theta(x) = x + O(\sqrt{x}(\log x)^{A+1})$, and $\psi(x) = x + O(\sqrt{x}(\log x)^{A+1})$, using partial summation.
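To see numerically how θ(x) and ψ(x) hug the line y = x much more closely than π(x) hugs x/log x, one can run a small computation; a minimal Python sketch (helper names ours):

    from math import log

    x = 10**6
    is_prime = [True] * (x + 1)
    is_prime[0] = is_prime[1] = False
    for p in range(2, int(x ** 0.5) + 1):
        if is_prime[p]:
            for m in range(p * p, x + 1, p):
                is_prime[m] = False

    primes = [p for p in range(2, x + 1) if is_prime[p]]
    theta = sum(log(p) for p in primes)
    psi = theta
    for p in primes:                     # add log p for every higher prime power p^a <= x
        pa = p * p
        while pa <= x:
            psi += log(p)
            pa *= p

    print(f"x = {x}")
    print(f"pi(x) = {len(primes)},  x/log x = {x / log(x):.1f}")
    print(f"theta(x) - x = {theta - x:+.1f}")
    print(f"psi(x)   - x = {psi - x:+.1f}")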

Exercises

Exercise 2.12.1. Use (2.12.1) and (2.12.2) to show that
$$\max_{\sqrt x\le t\le x}|\theta(t) - t| \ll \log x\,\max_{\sqrt x\le t\le x}|\pi(t) - \mathrm{Li}(t)| + O(\sqrt x).$$

Exercise 2.12.2. Prove that $\psi(x) = \theta(x) + \theta(x^{1/2}) + \theta(x^{1/3}) + \cdots$. Deduce that $\psi(x) = \theta(x) + O(\sqrt x)$.

Exercise 2.12.3. Prove that π(x) ∼ Li(x) if and only if ψ(x) ∼ x.

Exercise 2.12.4. Suppose that π(x) is asymptotic to the first three terms of the asymptotic expansion of Li(x) as in (2.10.2). What does this imply about $p_n$?

Exercise 2.12.5. Prove that $\mathrm{lcm}[1, 2, \ldots, n] = e^{\psi(n)}$.

2.13. An attempt to prove the prime number theorem

Our proof of Bertrand's postulate can be re-interpreted as writing sums of log p in terms of sums of log(αn)! with various weights, for various α, and then calculating that sum using (weak forms of) Stirling's formula. This leads one to ask whether one can do this more precisely. We shall prove the identity
$$(2.13.1)\qquad \psi(x) = \sum_{\substack{p\ \text{prime},\,a\ge1\\ p^a\le x}}\log p = \sum_{d\le x}\mu(d)\log\left(\left[\frac xd\right]!\right).$$
To prove this it is convenient to use von Mangoldt's notation,
$$(2.13.2)\qquad \Lambda(n) = \begin{cases}\log p & \text{if } n = p^a,\ \text{where } p\ \text{is prime, and } a\ge1;\\ 0 & \text{otherwise.}\end{cases}$$
We have $\log n = \sum_{m|n}\Lambda(m)$. By Möbius inversion one deduces the remarkable identity
$$(2.13.3)\qquad \Lambda(n) = \sum_{d|n}\mu(d)\log(n/d).$$
Now summing this over all n ≤ x we obtain
$$\psi(x) = \sum_{n\le x}\Lambda(n) = \sum_{n\le x}\sum_{dm=n}\mu(d)\log m = \sum_{dm\le x}\mu(d)\log m = \sum_{d\le x}\mu(d)\sum_{m\le x/d}\log m,$$
which is (2.13.1). One must either estimate log([x/d]!) in terms of the non-smooth function [x/d], or accept an error of size O(log(x/d)). The first case is too complicated, and the errors accumulate to O(x) by exercise 2.8.4 in the second case. This is a valid approach to the prime number theorem, which we look at further in section 3.5.
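The identity (2.13.1) is exact and can be checked on a computer. Below is a minimal Python sketch (helper names ours); it uses math.lgamma, for which lgamma(n + 1) = log(n!), and a simple sieve for the Möbius function.

    from math import log, lgamma

    x = 2000

    # Mobius mu and a primality table, via sieving
    mu = [1] * (x + 1)
    is_prime = [True] * (x + 1)
    for p in range(2, x + 1):
        if is_prime[p]:
            for m in range(p, x + 1, p):
                if m > p:
                    is_prime[m] = False
                mu[m] *= -1
            for m in range(p * p, x + 1, p * p):
                mu[m] = 0

    # psi(x) = sum of log p over all prime powers p^a <= x
    psi = 0.0
    for p in range(2, x + 1):
        if is_prime[p]:
            pa = p
            while pa <= x:
                psi += log(p)
                pa *= p

    rhs = sum(mu[d] * lgamma(x // d + 1) for d in range(1, x + 1))
    print(f"psi({x})                    = {psi:.4f}")
    print(f"sum_d mu(d) log([x/d]!)     = {rhs:.4f}")

The two printed numbers agree (up to floating-point rounding), as (2.13.1) requires.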

CHAPTER 3

The prime number theorem

In this chapter we repeat some of the arguments from the previous chapter but in more sophisticated language, which allows us to better develop some of the key objects of study. As a boy Gauss determined, from studying the primes up to three million, that the density of primes around x is 1/ log x, leading him to conjecture that the number of primes up to x is well-approximated by the estimate

$$(3.0.4)\qquad \pi(x) := \sum_{p\le x}1 \sim \frac{x}{\log x}.$$

It is less intuitive, but simpler, to weight each prime with log p; and to include the prime powers in the sum (which has little impact on the size). Thus we define the von Mangoldt function

$$(3.0.5)\qquad \Lambda(n) := \begin{cases}\log p & \text{if } n = p^m,\ \text{where } p\ \text{is prime, and } m\ge1\\ 0 & \text{otherwise,}\end{cases}$$
and then, in place of (3.0.4), we conjecture that

$$(3.0.6)\qquad \psi(x) := \sum_{n\le x}\Lambda(n) \sim x.$$

The equivalent estimates (3.0.4) and (3.0.6), known as the prime number theorem, are difficult to prove. In this chapter we show how the prime number theorem is equivalent to understanding the mean value of the M¨obiusfunction. This will motivate our study of multiplicative functions in general, and provide new ways of looking at many of the classical questions in analytic number theory.

3.1. Partial Summation

Given a sequence of complex numbers an, and some function f : R → C, we wish to determine the value of

$$\sum_{n=A+1}^{B}a_nf(n)$$


from estimates for the partial sums $S(t) := \sum_{k\le t}a_k$. Usually f is continuously differentiable on [A, B], so we can replace our sum by the appropriate Riemann-Stieltjes integral, and then integrate by parts as follows:¹
$$\sum_{n=A+1}^{B}a_nf(n) = \int_{A^+}^{B}f(t)\,d(S(t)) = [S(t)f(t)]_A^B - \int_A^B S(t)f'(t)\,dt.$$
The Riemann zeta function is defined, for Re(s) > 1, by
$$\zeta(s) := \sum_{n=1}^{\infty}\frac{1}{n^s} = \prod_p\left(1 - \frac{1}{p^s}\right)^{-1}.$$
This definition is restricted to the region Re(s) > 1, since it is only there that this sum and this product both converge absolutely (see the next subsection for definitions).

Exercise 3.1.2. (i) Prove that for Re(s) > 1
$$\zeta(s) = s\int_1^\infty\frac{[y]}{y^{s+1}}\,dy = \frac{s}{s-1} - s\int_1^\infty\frac{\{y\}}{y^{s+1}}\,dy,$$
where throughout we write [t] for the integer part of t, and {t} for its fractional part (so that t = [t] + {t}). The right hand side is an analytic function of s in the region Re(s) > 0 except for a simple pole at s = 1 with residue 1. Thus we have an analytic continuation of ζ(s) to this larger region, and near s = 1 we have the Laurent expansion
$$\zeta(s) = \frac{1}{s-1} + \gamma + c_1(s-1) + \cdots.$$
(The value of the constant γ is given in exercise 3.6.1.)

¹The notation "t+" denotes a real number "marginally" larger than t.

(ii) Deduce that $\zeta\left(1 + \frac{1}{\log x}\right) = \log x + \gamma + O_{x\to\infty}\left(\frac{1}{\log x}\right)$.
(iii) † Adapt the argument in Exercise 3.6.2 to obtain an analytic continuation of ζ(s) to the region Re(s) > −1.
(iv) † Generalize.

3.2. Chebyshev's elementary estimates

Chebyshev made significant progress on the distribution of primes by showing that there are constants 0 < c < 1 < C with
$$(3.2.1)\qquad (c + o(1))\frac{x}{\log x} \le \pi(x) \le (C + o(1))\frac{x}{\log x}$$
(as we saw in (2.6.2) and (2.6.3)). Moreover he showed that if
$$\lim_{x\to\infty}\frac{\pi(x)}{x/\log x}$$
exists, then it must equal 1 (as we saw at the end of section 2.10). We now rework these proofs in slightly more sophisticated notation.

The key to obtaining such information is to write the prime factorization of n in the form
$$\log n = \sum_{d|n}\Lambda(d).$$
Summing both sides over n (and re-writing "d|n" as "n = dk"), we obtain that
$$(3.2.2)\qquad \sum_{n\le x}\log n = \sum_{n\le x}\sum_{n=dk}\Lambda(d) = \sum_{k=1}^{\infty}\psi(x/k).$$
Using Stirling's formula, Exercise 3.6.2, we deduce that
$$(3.2.3)\qquad \sum_{k=1}^{\infty}\psi(x/k) = x\log x - x + O(\log x).$$

Exercise 3.2.1. Use (3.2.3) to prove that
$$\limsup_{x\to\infty}\frac{\psi(x)}{x} \ge 1 \ge \liminf_{x\to\infty}\frac{\psi(x)}{x},$$
so that if $\lim_{x\to\infty}\psi(x)/x$ exists it must be 1.

To obtain Chebyshev's estimates (3.2.1), take (3.2.2) at 2x and subtract twice that relation taken at x. This yields
$$x\log 4 + O(\log x) = \psi(2x) - \psi(2x/2) + \psi(2x/3) - \psi(2x/4) + \cdots,$$
and upper and lower estimates for the right hand side above follow upon truncating the series after an odd or even number of steps. In particular we obtain that ψ(2x) ≥ x log 4 + O(log x), which gives the lower bound of (3.2.1) with c = log 2 a permissible value. And we also obtain that ψ(2x) − ψ(x) ≤ x log 4 + O(log x), which, when used at x/2, x/4, ... and summed, leads to ψ(x) ≤ x log 4 + O((log x)²). Thus we obtain the upper bound in (3.2.1) with C = log 4 a permissible value.

Returning to (3.2.2), we may recast it as
$$\sum_{n\le x}\log n = \sum_{d\le x}\Lambda(d)\sum_{k\le x/d}1 = \sum_{d\le x}\Lambda(d)\left(\frac xd + O(1)\right).$$
Using Stirling's formula, and the recently established ψ(x) = O(x), we conclude that
$$x\log x + O(x) = x\sum_{d\le x}\frac{\Lambda(d)}{d},$$
or in other words
$$(3.2.4)\qquad \sum_{p\le x}\frac{\log p}{p} = \sum_{n\le x}\frac{\Lambda(n)}{n} + O(1) = \log x + O(1).$$
In this proof we see interplay between algebra (summing the identity $\log n = \sum_{d|n}\Lambda(d)$) and analysis (evaluating log [x]! using Stirling's formula), which foreshadows much of what is to come.

3.3. Multiplicative functions and Dirichlet series

The main objects of study in this book are multiplicative functions. These are functions f : N → C satisfying f(mn) = f(m)f(n) for all coprime integers m and n. If the relation f(mn) = f(m)f(n) holds for all integers m and n we say that f is completely multiplicative. If $n = \prod_j p_j^{\alpha_j}$ is the prime factorization of n, where the primes $p_j$ are distinct, then $f(n) = \prod_j f(p_j^{\alpha_j})$ for multiplicative functions f. Thus a multiplicative function is specified by its values at prime powers and a completely multiplicative function is specified by its values at primes.

One can study the multiplicative function f(n) using the Dirichlet series,
$$F(s) = \sum_{n=1}^{\infty}\frac{f(n)}{n^s} = \prod_p\left(1 + \frac{f(p)}{p^s} + \frac{f(p^2)}{p^{2s}} + \cdots\right).$$
The product over primes above is called an Euler product, and viewed formally the equality of the Dirichlet series and the Euler product above is a restatement of the unique factorization of integers into primes. If we suppose that the multiplicative function f does not grow rapidly, for example, that $|f(n)| \ll n^A$ for some constant A, then the Dirichlet series and Euler product will converge absolutely in some half-plane with Re(s) suitably large.

Given any two functions f and g from N → C (not necessarily multiplicative), their (Dirichlet) convolution f ∗ g is defined by
$$(f*g)(n) = \sum_{ab=n}f(a)g(b).$$
If $F(s) = \sum_{n=1}^{\infty}f(n)n^{-s}$ and $G(s) = \sum_{n=1}^{\infty}g(n)n^{-s}$ are the associated Dirichlet series, then the convolution f ∗ g corresponds to their product:
$$F(s)G(s) = \sum_{n=1}^{\infty}\frac{(f*g)(n)}{n^s}.$$
The basic multiplicative functions and their associated Dirichlet series are:
• The function δ(1) = 1 and δ(n) = 0 for all n ≥ 2 has the associated Dirichlet series 1.

• The function 1(n) = 1 for all n ∈ N has the associated Dirichlet series ζ(s), which converges absolutely when Re(s) > 1, and whose analytic continuation we discussed in Exercise 3.1.2.
• For a natural number k, the k-divisor function $d_k(n)$ counts the number of ways of writing n as $a_1\cdots a_k$. That is, $d_k$ is the k-fold convolution of the function 1(n), and its associated Dirichlet series is $\zeta(s)^k$. The function $d_2(n)$ is called the divisor function and denoted simply by d(n). More generally, for any complex number z, the z-th divisor function $d_z(n)$ is defined as the coefficient of $1/n^s$ in the Dirichlet series $\zeta(s)^z$.²
• The Möbius function µ(n) is defined to be 0 if n is divisible by the square of some prime and, if n is square-free, µ(n) is 1 or −1 depending on whether n has an even or odd number of prime factors. The associated Dirichlet series is $\sum_{n=1}^{\infty}\mu(n)n^{-s} = \zeta(s)^{-1}$, so that µ is the same as $d_{-1}$. We deduce that µ ∗ 1 = δ.
• The von Mangoldt function Λ(n) is not multiplicative, but is of great interest to us. We write its associated Dirichlet series as L(s). Since

$$\log n = \sum_{d|n}\Lambda(d) = (1*\Lambda)(n),$$
hence $-\zeta'(s) = L(s)\zeta(s)$, that is $L(s) = (-\zeta'/\zeta)(s)$. Writing this as

$$\frac{1}{\zeta(s)}\cdot(-\zeta'(s))$$
we deduce that
$$(3.3.1)\qquad \Lambda(n) = (\mu*\log)(n) = \sum_{ab=n}\mu(a)\log b.$$
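The convolution identities µ ∗ 1 = δ and (3.3.1) are easy to verify numerically. Here is a minimal Python sketch (the helper names and the brute-force convolution are our own, not an optimized implementation):

    from math import log

    N = 1000

    # Mobius mu and von Mangoldt Lambda up to N, built from a sieve
    mu = [1] * (N + 1)
    Lambda = [0.0] * (N + 1)
    is_prime = [True] * (N + 1)
    for p in range(2, N + 1):
        if is_prime[p]:
            pa = p
            while pa <= N:
                Lambda[pa] = log(p)          # Lambda(p^a) = log p
                pa *= p
            for m in range(p, N + 1, p):
                if m > p:
                    is_prime[m] = False
                mu[m] *= -1
            for m in range(p * p, N + 1, p * p):
                mu[m] = 0

    def convolve(f, g, n):
        """Dirichlet convolution (f*g)(n) = sum over ab = n of f(a) g(b)."""
        return sum(f[a] * g[n // a] for a in range(1, n + 1) if n % a == 0)

    log_list = [0.0] + [log(n) for n in range(1, N + 1)]
    one = [1] * (N + 1)
    for n in range(1, N + 1):
        assert abs(convolve(mu, log_list, n) - Lambda[n]) < 1e-9   # (3.3.1)
        assert convolve(mu, one, n) == (1 if n == 1 else 0)        # mu * 1 = delta
    print("Checked Lambda = mu * log and mu * 1 = delta for all n up to", N)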

As mentioned earlier, our goal in this chapter is to show that the prime number theorem is equivalent to a statement about the mean value of the multiplicative function µ. We now formulate this equivalence precisely.

Theorem 3.3.1 PNT and the mean of the Möbius function

The prime number theorem, namely ψ(x) = x + o(x), is equivalent to
$$(3.3.2)\qquad M(x) = \sum_{n\le x}\mu(n) = o(x).$$

In other words, half the non-zero values of µ(n) equal 1, the other half −1. Before we can prove this, we need one more ingredient: namely, we need to understand the average value of the divisor function.
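The slow decay of M(x)/x can be observed directly. A minimal Python sketch (sieve and names ours) prints M(x)/x at a few checkpoints; the values shrink, consistent with (3.3.2), though only slowly.

    N = 10**6
    mu = [1] * (N + 1)
    is_prime = [True] * (N + 1)
    for p in range(2, N + 1):
        if is_prime[p]:
            for m in range(p, N + 1, p):
                if m > p:
                    is_prime[m] = False
                mu[m] *= -1
            for m in range(p * p, N + 1, p * p):
                mu[m] = 0

    M = 0
    checkpoints = {10**3, 10**4, 10**5, 10**6}
    for n in range(1, N + 1):
        M += mu[n]
        if n in checkpoints:
            print(f"x = {n:>8}:  M(x) = {M:>6},  M(x)/x = {M / n:+.5f}")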

²To explicitly determine $\zeta(s)^z$ it is easiest to expand each factor in the Euler product using the generalized binomial theorem, so that $\zeta(s)^z = \prod_p\left(1 + \sum_{k\ge1}\binom{-z}{k}(-p^{-s})^k\right)$.

3.4. The average value of the divisor function and Dirichlet's hyperbola method

We wish to evaluate asymptotically $\sum_{n\le x}d(n)$. An immediate idea gives
$$\sum_{n\le x}d(n) = \sum_{n\le x}\sum_{d|n}1 = \sum_{d\le x}\sum_{\substack{n\le x\\ d|n}}1 = \sum_{d\le x}\left[\frac xd\right] = \sum_{d\le x}\left(\frac xd + O(1)\right) = x\log x + O(x).$$
Dirichlet realized that one can substantially improve the error term above by pairing each divisor a of an integer n with its complementary divisor b = n/a; one minor exception is when n = m² and the divisor m cannot be so paired. Since a or n/a must be ≤ √n we have
$$d(n) = \sum_{d|n}1 = 2\sum_{\substack{d|n\\ d<\sqrt n}}1 + \delta_n,$$
where $\delta_n = 1$ if n is a square, and 0 otherwise. Therefore
$$\sum_{n\le x}d(n) = 2\sum_{n\le x}\sum_{\substack{d|n\\ d<\sqrt n}}1 + \sum_{\substack{n\le x\\ n=d^2}}1 = \sum_{d\le\sqrt x}\Bigg(1 + 2\sum_{\substack{d^2<n\le x\\ d|n}}1\Bigg) = \sum_{d\le\sqrt x}\left(2\left[\frac xd\right] - 2d + 1\right),$$
and so, using exercise 2.2.1,
$$(3.4.1)\qquad \sum_{n\le x}d(n) = x\log x + (2\gamma - 1)x + O(\sqrt x)$$
(improving the error term here is the famous Dirichlet divisor problem). For our subsequent work, we use Exercise 3.6.2 to recast (3.4.1) as
$$(3.4.2)\qquad \sum_{n\le x}(\log n - d(n) + 2\gamma) = O(\sqrt x).$$

Since we are developing our technique here, we ask: What proportion of pairs of integers are coprime? In combinatorics the inclusion-exclusion principle often boils down to the identity $(1-1)^n = 1$ if n = 0, and 0 otherwise. In analytic number theory it often boils down to the identity
$$\sum_{d|m}\mu(d) = \begin{cases}1 & \text{if } m = 1\\ 0 & \text{otherwise.}\end{cases}$$
Here, the Möbius function, µ(n), is defined to be $(-1)^k$ if n is squarefree and the product of k distinct primes, and is 0 if n is divisible by the square of a prime. So, to determine the number of pairs of integers up to x that are coprime we have
$$\sum_{\substack{a,b\le x\\ (a,b)=1}}1 = \sum_{a,b\le x}\sum_{d|(a,b)}\mu(d) = \sum_{d\le x}\mu(d)\sum_{\substack{a,b\le x\\ d|(a,b)}}1 = \sum_{d\le x}\mu(d)\sum_{\substack{a,b\le x\\ d|a,\ d|b}}1 = \sum_{d\le x}\mu(d)\left[\frac xd\right]^2.$$
Finally we replace [t] by t + O(1) and it is the reader's challenge to complete the proof that this comes to $cx^2 + O(x\log x)$ where $c = \prod_{p\ \text{prime}}(1 - 1/p^2)$, so that the proportion of pairs of integers that are coprime is c (which happens to equal 6/π²).
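The hyperbola-style identity above is also a fast way to compute the divisor sum exactly, and one can watch the error in (3.4.1) stay of size about √x. A minimal Python sketch (helper names ours; the main term below is the one in (3.4.1) as displayed above):

    from math import log, isqrt

    GAMMA = 0.5772156649015329   # the Euler-Mascheroni constant, to double precision

    def divisor_sum(x):
        """Exact sum_{n<=x} d(n), pairing each divisor d < sqrt(n) with n/d."""
        r = isqrt(x)
        return sum(2 * (x // d) - 2 * d + 1 for d in range(1, r + 1))

    for x in [10**4, 10**6, 10**8]:
        s = divisor_sum(x)
        main = x * log(x) + (2 * GAMMA - 1) * x
        print(f"x = {x:>10}: sum d(n) = {s}, error = {s - main:+.1f}, sqrt x = {x ** 0.5:.0f}")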

Exercises

Exercise 3.4.1. What is the average size of the gcd of two integers ≤ x?
(ii) Prove that the average is c(x) + O(1) where $c(x) := \sum_{d\le x}\varphi(d)/d^2$. (Hint: Write $g = \sum_{d|g}\varphi(d)$ where g = (a, b).)
(iii) Prove that c(x) → ∞ as x → ∞, but that c(x) ≤ log x + 1.
(iv) Show that $\varphi(m)/m = \sum_{d|m}\mu(d)/d$. Deduce that
$$c(x) = \sum_{d\le x}\frac{\mu(d)}{d^2}\sum_{n\le x/d}\frac1n = \frac{6}{\pi^2}\log x + O(1).$$
(v) Prove that the average size of the gcd of k integers ≤ x is ζ(k − 1)/ζ(k) + o(1) for each k ≥ 3.

Exercise 3.4.2. Prove that the average lcm of two randomly chosen integers is, asymptotically, their product times a constant κ, where κ = ζ(3)/ζ(2). (Part of the problem is to formulate the question so as to be able to apply analytic methods.)

3.5. The prime number theorem and the Möbius function: proof of Theorem 3.3.1

This and the next section are not part of this course but are included for the student's interest.

First we show that the estimate $M(x) = \sum_{n\le x}\mu(n) = o(x)$ implies the prime number theorem ψ(x) = x + o(x). Define the arithmetic function a(n) = log n − d(n) + 2γ, so that $a(n) = (1*(\Lambda - 1))(n) + 2\gamma\,1(n)$. When we form the Dirichlet convolution of a with the Möbius function we therefore obtain
$$(\mu*a)(n) = (\mu*1*(\Lambda - 1))(n) + 2\gamma(\mu*1)(n) = (\Lambda - 1)(n) + 2\gamma\delta(n),$$
where δ(1) = 1, and δ(n) = 0 for n > 1. Hence, when we sum (µ ∗ a)(n) over all n ≤ x, we obtain
$$\sum_{n\le x}(\mu*a)(n) = \sum_{n\le x}(\Lambda(n) - 1) + 2\gamma = \psi(x) - x + O(1).$$
On the other hand, we may write the left hand side above as
$$\sum_{dk\le x}\mu(d)a(k),$$
and, as in the hyperbola method, split this into terms where k ≤ K or k > K (in which case d ≤ x/K). Thus we find that
$$\sum_{dk\le x}\mu(d)a(k) = \sum_{k\le K}a(k)M(x/k) + \sum_{d\le x/K}\mu(d)\sum_{K<k\le x/d}a(k).$$
By (3.4.2) the second term here is $\ll \sum_{d\le x/K}\sqrt{x/d} \ll x/\sqrt{K}$. Let ε > 0 and select K to be the smallest integer > 1/ε², and then let $\alpha_K := \sum_{k\le K}|a(k)|/k$. Finally choose $y_\varepsilon$ so that $|M(y)| \le (\varepsilon/\alpha_K)y$ whenever $y \ge y_\varepsilon$. Inserting all this into the last line for $x \ge Ky_\varepsilon$ yields
$$\psi(x) - x \ll (\varepsilon/\alpha_K)\,x\sum_{k\le K}|a(k)|/k + \varepsilon x \ll \varepsilon x.$$
We may conclude that ψ(x) − x = o(x), the prime number theorem.

Now we turn to the converse. Consider the arithmetic function −µ(n) log n, which is the coefficient of $1/n^s$ in the Dirichlet series $(1/\zeta(s))'$. Since
$$\left(\frac{1}{\zeta(s)}\right)' = -\frac{\zeta'(s)}{\zeta(s)^2} = -\frac{\zeta'}{\zeta}(s)\cdot\frac{1}{\zeta(s)},$$
we obtain the identity $-\mu(n)\log n = (\mu*\Lambda)(n)$. As µ ∗ 1 = δ, we find that
$$(3.5.1)\qquad \sum_{n\le x}(\mu*(\Lambda - 1))(n) = -\sum_{n\le x}\mu(n)\log n - 1.$$
The right hand side of (3.5.1) is
$$-\log x\sum_{n\le x}\mu(n) + \sum_{n\le x}\mu(n)\log(x/n) - 1 = -(\log x)M(x) + O\Bigg(\sum_{n\le x}\log(x/n)\Bigg) = -(\log x)M(x) + O(x),$$
upon using Exercise 3.6.2. The left hand side of (3.5.1) is
$$\sum_{ab\le x}\mu(a)(\Lambda(b) - 1) = \sum_{a\le x}\mu(a)\Big(\psi(x/a) - [x/a]\Big).$$
Now suppose that ψ(x) − [x] = o(x), the prime number theorem, so that, for given ε > 0 we have $|\psi(t) - t| \le \varepsilon t$ if $t \ge T_\varepsilon$. Suppose that $T \ge T_\varepsilon$ and $x > T^{1/\varepsilon}$. Using this, $|\psi(x/a) - x/a| \le \varepsilon x/a$ for a ≤ x/T (so that x/a > T), and the Chebyshev estimate $|\psi(x/a) - x/a| \ll x/a$ for x/T ≤ a ≤ x, we find that the left hand side of (3.5.1) is
$$\ll \sum_{a\le x/T}\varepsilon\,\frac xa + \sum_{x/T\le a\le x}\frac xa \ll \varepsilon\,x\log x + x\log T.$$
Combining these observations, we find that
$$|M(x)| \ll \varepsilon x + x\frac{\log T}{\log x} \ll \varepsilon x,$$
if x is sufficiently large. Since ε was arbitrary, we have demonstrated that M(x) = o(x).

3.6. Selberg's formula

The elementary techniques discussed above were brilliantly used by Selberg to get an asymptotic formula for a suitably weighted sum of primes and products of two primes. Selberg's formula then led Erdős and Selberg to discover elementary proofs of the prime number theorem. We will not discuss these elementary proofs of the prime number theorem here, but let us see how Selberg's formula follows from the ideas developed so far.

Theorem 3.6.1 Selberg’s formula

We have
$$\sum_{p\le x}(\log p)^2 + \sum_{pq\le x}(\log p)(\log q) = 2x\log x + O(x).$$

Proof. We define $\Lambda_2(n) := \Lambda(n)\log n + \sum_{\ell m=n}\Lambda(\ell)\Lambda(m)$. Thus $\Lambda_2(n)$ is the coefficient of $1/n^s$ in the Dirichlet series
$$\left(\frac{\zeta'}{\zeta}\right)'(s) + \left(\frac{\zeta'}{\zeta}\right)^2(s) = \frac{\zeta''(s)}{\zeta(s)},$$
so that $\Lambda_2 = (\mu*(\log)^2)$. In the previous section we exploited the fact that Λ = (µ ∗ log) and that the function d(n) − 2γ has the same average value as log n. Now we search for a divisor type function which has the same average as (log n)². By partial summation we find that
$$\sum_{n\le x}(\log n)^2 = x(\log x)^2 - 2x\log x + 2x + O((\log x)^2).$$

Using Exercise 3.6.11 we may find constants $c_2$ and $c_1$ such that
$$\sum_{n\le x}(2d_3(n) + c_2d(n) + c_1) = x(\log x)^2 - 2x\log x + 2x + O(x^{2/3+\varepsilon}).$$
Set $b(n) = (\log n)^2 - 2d_3(n) - c_2d(n) - c_1$, so that the last two displayed equations give
$$(3.6.1)\qquad \sum_{n\le x}b(n) = O(x^{2/3+\varepsilon}).$$

Now consider $(\mu*b)(n) = \Lambda_2(n) - 2d(n) - c_2 - c_1\delta(n)$, and summing this over all n ≤ x we get that
$$\sum_{n\le x}(\mu*b)(n) = \sum_{n\le x}\Lambda_2(n) - 2x\log x + O(x).$$
The left hand side is
$$\sum_{k\le x}\mu(k)\sum_{l\le x/k}b(l) \ll \sum_{k\le x}(x/k)^{2/3+\varepsilon} \ll x$$
by (3.6.1), and we conclude that
$$\sum_{n\le x}\Lambda_2(n) = 2x\log x + O(x).$$
The difference between the left hand side above and the left hand side of our desired formula is the contribution of the prime powers, which is easily shown to be $\ll \sqrt x\log x$, and so our Theorem follows.

Exercises

Exercise 3.6.1. ∗ (i) Using partial summation, prove that for any x ≥ 1
$$\sum_{1\le n\le x}\frac1n = \log x + \frac{[x]}{x} - \int_1^x\frac{\{t\}}{t^2}\,dt.$$
(ii) Deduce that for any x ≥ 1 we have the approximation

$$\left|\sum_{n\le x}\frac1n - (\log x + \gamma)\right| \le \frac1x,$$
where γ is the Euler-Mascheroni constant,
$$\gamma := \lim_{N\to\infty}\left(\sum_{n=1}^{N}\frac1n - \log N\right) = 1 - \int_1^\infty\frac{\{t\}}{t^2}\,dt.$$
This exercise gives a different approach to the result in exercise 2.2.1.

Exercise 3.6.2. (i) For an integer N ≥ 1 show that
$$\log N! = N\log N - N + 1 + \int_1^N\frac{\{t\}}{t}\,dt.$$
(ii) Deduce that $x - 1 \ge \sum_{n\le x}\log(x/n) \ge x - 2 - \log x$ for all x ≥ 1.
(iii) Using that $\int_1^x(\{t\} - 1/2)\,dt = (\{x\}^2 - \{x\})/2$ and integrating by parts, show that
$$\int_1^N\frac{\{t\}}{t}\,dt = \frac12\log N - \frac12\int_1^N\frac{\{t\} - \{t\}^2}{t^2}\,dt.$$
(iv) Conclude that $N! = C\sqrt N(N/e)^N\{1 + O(1/N)\}$, where
$$C = \exp\left(1 - \frac12\int_1^\infty\frac{\{t\} - \{t\}^2}{t^2}\,dt\right).$$
In fact C = √(2π), and the resulting asymptotic for N!, namely $N! \sim \sqrt{2\pi N}(N/e)^N$, is known as Stirling's formula. This exercise gives a different approach to that of exercise 2.8.3.

Exercise 3.6.3. ∗ (i) Prove that for Re(s) > 0 we have

$$\sum_{n=1}^{N}\frac{1}{n^s} - \int_1^N\frac{dt}{t^s} = \zeta(s) - \frac{1}{s-1} + s\int_N^\infty\frac{\{y\}}{y^{s+1}}\,dy.$$
(ii) Deduce that, in this same range but with s ≠ 1, we can define

$$\zeta(s) = \lim_{N\to\infty}\left\{\sum_{n=1}^{N}\frac{1}{n^s} - \frac{N^{1-s}}{1-s}\right\}.$$

Exercise 3.6.4. ∗ Using that ψ(2x) − ψ(x) + ψ(2x/3) ≥ x log 4 + O(log x), prove Bertrand's postulate that there is a prime between N and 2N, for N sufficiently large. (This is a more sophisticated reworking of the proof in section 2.7.)

Exercise 3.6.5. The first part of this exercise is discussed in section 2.8.
(i) Using (3.2.2), prove that if $L(x) := \sum_{n\le x}\log n$ then
$$\psi(x) - \psi(x/6) \le L(x) - L(x/2) - L(x/3) - L(x/5) + L(x/30) \le \psi(x).$$
(ii) Deduce, using (3.2.3), that with
$$\kappa = \frac{\log 2}{2} + \frac{\log 3}{3} + \frac{\log 5}{5} - \frac{\log 30}{30} = 0.9212920229\ldots,$$
we have $\kappa x + O(\log x) \le \psi(x) \le \frac65\kappa x + O(\log x)$.
(iii) † Improve on these bounds by similar methods.

Exercise 3.6.6. (i) Use partial summation to prove that if
$$\lim_{N\to\infty}\sum_{n\le N}\frac{\Lambda(n) - 1}{n}\quad\text{exists},$$
then the prime number theorem, in the form ψ(x) = x + o(x), follows.
(ii) † Prove that the prime number theorem implies that this limit holds.
(iii) Using exercise 3.1.2, prove that $-(\zeta'/\zeta)(s) - \zeta(s)$ has a Taylor expansion $-2\gamma + c_1'(s - 1) + \cdots$ around s = 1.
(iv) Explain why we cannot then deduce that
$$\lim_{N\to\infty}\sum_{n\le N}\frac{\Lambda(n) - 1}{n} = \lim_{s\to1^+}\sum_{n\ge1}\frac{\Lambda(n) - 1}{n^s},\ \text{which exists and equals}\ -2\gamma.$$

Exercise 3.6.7. ∗ The first part of this exercise was worked out in section 2.8, but the reader should try it for her-or-himself.
(i) Use (3.2.4) and partial summation to show that there is a constant c such that
$$\sum_{p\le x}\frac1p = \log\log x + c + O\left(\frac{1}{\log x}\right).$$
(ii) Deduce Mertens' Theorem, that there exists a constant γ such that
$$\prod_{p\le x}\left(1 - \frac1p\right) \sim \frac{e^{-\gamma}}{\log x}.$$

In the two preceding exercises the constant γ is in fact the Euler-Mascheroni constant, but this is not so straightforward to establish. The next exercise gives one way of obtaining information about the constant in Exercise 3.6.7.

Exercise 3.6.8. † In this exercise, put σ = 1 + 1/log x.
(i) Show that
$$\sum_{p>x}\log\left(1 - \frac{1}{p^\sigma}\right)^{-1} = \sum_{p>x}\frac{1}{p^\sigma} + O\left(\frac1x\right) = \int_1^\infty\frac{e^{-t}}{t}\,dt + O\left(\frac{1}{\log x}\right).$$
(ii) Show that
$$\sum_{p\le x}\left\{\log\left(1 - \frac{1}{p^\sigma}\right)^{-1} - \log\left(1 - \frac1p\right)^{-1}\right\} = -\int_0^1\frac{1 - e^{-t}}{t}\,dt + O\left(\frac{1}{\log x}\right).$$
(iii) Conclude, using exercise 3.1.2, that the constant γ in exercise 3.6.7(ii) equals
$$\int_0^1\frac{1 - e^{-t}}{t}\,dt - \int_1^\infty\frac{e^{-t}}{t}\,dt.$$
That this equals the Euler-Mascheroni constant is established in [?].

Exercise 3.6.9. ∗ Uniformly for η in the range $\frac{1}{\log y} \ll \eta < 1$, show that
$$\sum_{p\le y}\frac{\log p}{p^{1-\eta}} \ll \frac{y^\eta}{\eta};$$
and
$$\sum_{p\le y}\frac{1}{p^{1-\eta}} \le \log(1/\eta) + O\left(\frac{y^\eta}{\log(y^\eta)}\right).$$
Hint: Split the sum into those primes with $p^\eta \ll 1$, and those with $p^\eta \gg 1$.

Exercise 3.6.10. ∗ If f and g are functions from N to C, show that the relation f = 1 ∗ g is equivalent to the relation g = µ ∗ f. (Give two proofs.) This is known as Möbius inversion.

Exercise 3.6.11. (i) Given a natural number k, use the hyperbola method together with induction and partial summation to show that
$$\sum_{n\le x}d_k(n) = xP_k(\log x) + O(x^{1-1/k+\varepsilon}),$$
where $P_k(t)$ denotes a polynomial of degree k − 1 with leading term $t^{k-1}/(k-1)!$.
(ii) Deduce, using partial summation, that if $R_k(t) + R_k'(t) = P_k(t)$ then
$$\sum_{n\le x}d_k(n)\log(x/n) = xR_k(\log x) + O(x^{1-1/k+\varepsilon}).$$
(iii) Deduce, using partial summation, that if $Q_k(u) = P_k(u) + \int_{t=0}^{u}P_k(t)\,dt$ then

$$\sum_{n\le x}\frac{d_k(n)}{n} = Q_k(\log x) + O(1).$$
Analogues of these estimates hold for any real k > 0, in which case (k − 1)! is replaced by Γ(k).

Exercise 3.6.12. Modify the above proof to show that

(i) If $M(x) \ll x/(\log x)^A$ then $\psi(x) - x \ll x(\log\log x)^2/(\log x)^A$.
(ii) Conversely, if $\psi(x) - x \ll x/(\log x)^A$ then $M(x) \ll x/(\log x)^{\min(1,A)}$.

Exercise 3.6.13. (i) ∗ Show that
$$M(x)\log x = -\sum_{p\le x}\log p\;M(x/p) + O(x).$$
(ii) Deduce that
$$\liminf_{x\to\infty}\frac{M(x)}{x} + \limsup_{x\to\infty}\frac{M(x)}{x} = 0.$$
(iii) Use Selberg's formula to prove that
$$(\psi(x) - x)\log x = -\sum_{p\le x}\log p\left(\psi\left(\frac xp\right) - \frac xp\right) + O(x).$$
(iv) Deduce that
$$\liminf_{x\to\infty}\frac{\psi(x) - x}{x} + \limsup_{x\to\infty}\frac{\psi(x) - x}{x} = 0.$$
Compare!

CHAPTER 4

Introduction to Sieve Theory

4.1. The sieve of Eratosthenes

In ancient Greece, Eratosthenes gave a rather efficient algorithm to determine all of the primes up to some given point x. The procedure works recursively: once we know the primes up to N, we write out the integers between 1 and N², and for each prime p ≤ N we cross out every pth integer (starting with p) up to N². What remains at the end are 1 and the primes in (N, N²].

One can analyze the number of steps fairly easily: First we initialize an array of length N², which can be done in N² steps. Then for each prime p we mark every pth element in the array, about N²/p marks, and in total roughly $N^2\sum_{p\le N}1/p \sim N^2\log\log N$. Then we need to search for the unmarked elements of the array to identify the primes. Now log log(the number of atoms in the universe) is < 7 and so this algorithm will take no more than about 8N² steps! Pretty efficient.

Can the algorithm be analyzed so as to give an estimate for the number of primes up to x? When we remove the integers divisible by 2, the number left is
$$x - \left[\frac x2\right] = x - \left(\frac x2 + O(1)\right) = \frac x2 + O(1).$$
In fact the O(1) is either 0 or 1. When we also remove the integers divisible by 3, we have, by the inclusion-exclusion principle,
$$x - \left[\frac x2\right] - \left[\frac x3\right] + \left[\frac x6\right] = x - \frac x2 - \frac x3 + \frac x6 + O(1) = \left(1 - \frac12\right)\left(1 - \frac13\right)x + O(1).$$
We need to be a little careful about the "O(1)"; here we have combined three different O(1)'s. As we generalize, say to sieving by the primes up to y, the exact formula is
$$x - \sum_{p\le y}\left[\frac xp\right] + \sum_{p<q\le y}\left[\frac{x}{pq}\right] - \sum_{p<q<r\le y}\left[\frac{x}{pqr}\right] + \cdots$$
Replacing each [T] by T + O(1) leaves us with a main term of
$$(4.1.1)\qquad x - \sum_{p\le y}\frac xp + \sum_{p<q\le y}\frac{x}{pq} - \sum_{p<q<r\le y}\frac{x}{pqr} + \cdots = x\prod_{p\le y}\left(1 - \frac1p\right).$$
What about the error term? That is, the accumulation of the O(1)'s? The number of O(1)'s equals the total number of terms in our sum, and these are in 1-to-1 correspondence with the subsets of the primes ≤ y; that is, there are $2^k$ terms where k = π(y). This potential error term doubles each time we add a prime to the set we are sieving by, so as soon as k > c log x, for c = 1/log 2, the error term is, as far as we know, larger than the main term. But c log x is not large at all, especially when we compare k with π(√x), which is how far we wish to sieve. So this direct approach is not going to give us an accurate estimate for π(x), and the reason is that there are far too many terms to handle. In the next section we will see how one can restrict the number of terms in the sum.

A few minor remarks. If $m = \prod_{p\le y}p$ then the terms in the sum are in 1-to-1 correspondence with the divisors of m, and so we can rewrite our exact formula as
$$(4.1.2)\qquad \sum_{d|m}\mu(d)\left[\frac xd\right].$$
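The recursive procedure described at the start of this section is easy to express in code. Here is a minimal Python sketch (the function name is ours): given the primes up to N it crosses out multiples and returns the primes in (N, N²].

    def eratosthenes_step(small_primes, N):
        """Given the primes up to N, return the primes in (N, N^2], by crossing
        out every p-th integer up to N^2 for each prime p <= N."""
        limit = N * N
        crossed = [False] * (limit + 1)
        for p in small_primes:
            for multiple in range(p, limit + 1, p):
                crossed[multiple] = True
        return [n for n in range(N + 1, limit + 1) if not crossed[n]]

    # Starting from the primes up to 10 we recover the primes up to 100:
    print(eratosthenes_step([2, 3, 5, 7], 10))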

For any term with d > x the contribution here is 0, but any approximation x/d introduces an unnecessary error, so we wish to restrict the sum to those d ≤ x (or to d substantially smaller). This sieving discussion works for any interval of length x; indeed the number of integers in (N,N + x] free of prime factors ≤ y, is exactly

$$\sum_{d|m}\mu(d)\left(\left[\frac{N+x}{d}\right] - \left[\frac Nd\right]\right).$$

Exercise 4.1.1. Suppose that the positive integers m and x are given. Define
$$E(N) := \#\{n\in(N, N+x] : (n, m) = 1\} - \frac{\varphi(m)}{m}x.$$
(i) Prove that $|E(N)| \le 2^{k-1}$, where k = ω(m).
(ii) Prove that E(N + m) = E(N) for all N.
(iii) Prove that $\sum_{N=0}^{m-1}E(N) = 0$.

Exercise 4.1.2. Now suppose that m has no prime factors ≤ x. Let $T(d) := \#\{n\ (\mathrm{mod}\ m) : (n(n+d), m) = 1\}$.
(i) Prove that $T(d) = \varphi_2(m)$ if 1 ≤ |d| ≤ x, where $\varphi_2(m) := m\prod_{p|m}(1 - 2/p)$.
(ii) Prove that $\sum_{N=0}^{m-1}E(N)^2 = \sum_{|d|\le x}(x - |d|)T(d) - (\varphi(m)x)^2/m$.
(iii) Deduce that
$$\frac1m\sum_{N=0}^{m-1}E(N)^2 = \frac{\varphi_2(m)}{m}x(x-1) - \left(\frac{\varphi(m)}{m}\right)^2x^2 + \frac{\varphi(m)}{m}x \le \frac{\varphi(m)}{m}x.$$
This exercise tells us that the number of integers coprime to m in an interval of length x is (4.1.1), which is (φ(m)/m)x, with an error term typically bounded by a small multiple of the square root of the main term.

4.2. A first combinatorial Eratosthenes

If we define $0^0 = 1$ then we have the convenient identity

$$(1 - 1)^k = \sum_{j=0}^{k}\binom kj(-1)^j.$$

We use the easily verified inequalities

$$\sum_{j=0}^{2J+1}\binom kj(-1)^j \le (1 - 1)^k \le \sum_{j=0}^{2J}\binom kj(-1)^j,$$
which hold for any J ≥ 0. We let $1_m(n) = 1$ if (m, n) = 1, and = 0 if (m, n) > 1. If k is the number of prime factors of (m, n) then $1_m(n) = (1 - 1)^k$. Therefore
$$\sum_{j=0}^{2J+1}(-1)^j\sum_{\substack{d|(m,n)\\ \omega(d)=j}}1 \le 1_m(n) \le \sum_{j=0}^{2J}(-1)^j\sum_{\substack{d|(m,n)\\ \omega(d)=j}}1,$$
where ω(d) denotes the number of distinct prime factors of d.

Let $m = \prod_{p\le y}p$. We wish to count the number of integers up to x that are coprime to m. We sum the above up over all n ≤ x to obtain
$$\sum_{j=0}^{2J+1}(-1)^j\sum_{\substack{d|m\\ \omega(d)=j}}\sum_{\substack{n\le x\\ d|n}}1 \le \sum_{\substack{n\le x\\ (n,m)=1}}1 \le \sum_{j=0}^{2J}(-1)^j\sum_{\substack{d|m\\ \omega(d)=j}}\sum_{\substack{n\le x\\ d|n}}1.$$
For K = 2J or 2J + 1 we are interested in evaluating
$$\sum_{j=0}^{K}(-1)^j\sum_{\substack{d|m\\ \omega(d)=j}}\left[\frac xd\right] = \sum_{j=0}^{K}(-1)^j\sum_{\substack{d|m\\ \omega(d)=j}}\left(\frac xd + O(1)\right) = x\sum_{j=0}^{K}(-1)^j\sum_{\substack{d|m\\ \omega(d)=j}}\frac1d + O(y^K),$$
since each integer d counted here is $\le y^K$. We complete the sum on the right-hand side (that is, we add in all the terms with j > K) to obtain the expected main term
$$x\sum_{j\ge0}(-1)^j\sum_{\substack{d|m\\ \omega(d)=j}}\frac1d = x\prod_{p\le y}\left(1 - \frac1p\right),$$
with an error that is bounded by
$$x\sum_{j>K}\sum_{\substack{d|m\\ \omega(d)=j}}\frac1d.$$

Now each of these d can be written as $p_1p_2\ldots p_j$, the product of distinct primes ≤ y, and so
$$\sum_{\substack{d|m\\ \omega(d)=j}}\frac1d \le \frac{1}{j!}\Bigg(\sum_{p\le y}\frac1p\Bigg)^j.$$
We denote the quantity inside the brackets by L, and L = log log y + O(1) by (2.8.3). If we select K ≥ 2L then $L^{j+1}/(j+1)! \le \frac12L^j/j!$ for all j ≥ K. Hence
$$\sum_{j>K}\sum_{\substack{d|m\\ \omega(d)=j}}\frac1d \le \sum_{j>K}\frac{L^j}{j!} \le 2\frac{L^K}{K!} \ll \left(\frac{eL}{K}\right)^K \le e^{-K} \ll \frac{1}{(\log y)^3}$$
by Stirling's formula, and then taking $K = [e^2L] + 1$. Now if $y \le x^{1/2K}$ then we combine the above to obtain
$$(4.2.1)\qquad \sum_{\substack{n\le x\\ (n,m)=1}}1 = x\prod_{p\le y}\left(1 - \frac1p\right)\left(1 + O\left(\frac{1}{\log x}\right)\right) + O(x^{1/2}),$$
where $y = x^{c/\log\log x}$ for some constant c > 0.

We have not obtained an understanding of π(x) through sieve methods, but it is a start, getting an accurate estimate for the number of integers up to x free of prime factors $\le x^{c/\log\log x}$. Moreover this proof gives the same estimate for the number of such integers in any interval of length x.

The observations of this section heralded the birth of sieve theory. The first great work, due to Brun, worked with a more complicated set of d, though still all substantially less than x. His famous result states that whereas Euler had shown that $\sum_{p\ \text{prime}}1/p$ diverges, the sum
$$\sum_{\substack{p\\ p,\,p+2\ \text{both prime}}}\frac1p \qquad\text{converges.}$$
In the next section we will encounter a situation where the number of integers left in an interval of length x, after removing those sharing a common prime factor with m, is substantially less than the expected (φ(m)/m)x. We will use this to find long intervals that contain no primes.

4.3. Integers without large prime factors: "Smooth" or "friable" numbers

Let p(n) and P(n) be the smallest and largest prime factors of n, respectively. Given a real number y ≥ 2, the integers n, all of whose prime factors are at most y (that is, for which P(n) ≤ y), are called "y-smooth" or "y-friable".¹ Smooth numbers appear all over analytic number theory. For example most factoring algorithms search for smooth numbers (in an intermediate step) which appear in a certain way, since they are relatively easy to factor. Moreover all smooth numbers n may be factored as ab, where a ∈ (A/y, A] for any given A, 1 ≤ A ≤ n. This "well-factorability" is useful in attacking Waring's problem and in finding small gaps between consecutive primes (see chapter ??).

However, counting the y-smooth numbers up to x can be surprisingly tricky. Define
$$\Psi(x, y) := \sum_{\substack{n\le x\\ P(n)\le y}}1.$$
We can formulate this as a question about multiplicative functions by considering the multiplicative function given by $f(p^k) = 1$ if p ≤ y, and $f(p^k) = 0$ otherwise. If x ≤ y then clearly Ψ(x, y) = [x] = x + O(1). Next suppose that y ≤ x ≤ y². If n ≤ x is not y-smooth then it must be divisible by a unique prime p ∈ (y, x]. Thus, by exercise 3.6.7(i),
$$\Psi(x, y) = [x] - \sum_{y<p\le x}\left[\frac xp\right] = x + O(1) - \sum_{y<p\le x}\left(\frac xp + O(1)\right) = x(1 - \log u) + O\left(\frac{x}{\log y}\right),$$
where $u := \log x/\log y$.

1“Friable” is French (and also appears in the O.E.D.) for “crumbly”. Its usage, in this context, is spreading, because the word “smooth” is already overused in mathematics. 4.3. INTEGERS WITHOUT LARGE PRIME FACTORS: “SMOOTH” OR “FRIABLE” NUMBERS53
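Since counting y-smooth numbers up to x can be surprisingly tricky, it helps to have a reference computation for small parameters. The sketch below (added to these notes for illustration; the function names are ours) evaluates Ψ(x, y) by brute force from the definition, by computing the largest prime factor of each n ≤ x.

def largest_prime_factor(n):
    p, P = 2, 1
    while p * p <= n:
        while n % p == 0:
            P, n = p, n // p
        p += 1
    return max(P, n) if n > 1 else P

def Psi(x, y):
    """Count the n <= x whose largest prime factor is at most y (n = 1 counts as y-smooth)."""
    return sum(1 for n in range(1, x + 1) if largest_prime_factor(n) <= y)

x = 10_000
for y in (10, 30, 100, 1000):
    print(f"Psi({x}, {y}) = {Psi(x, y)},  proportion {Psi(x, y) / x:.4f}")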

One can continue the process begun above, using the principle of inclusion and exclusion to evaluate Ψ(y^u, y) by subtracting from [y^u] the number of integers which are divisible by a prime larger than y, adding back the contribution from integers divisible by two primes larger than y, and so on.² However one can proceed more easily via identities for the function Ψ(x, y). One begins by noting that an integer n is not y-smooth if it has a prime factor p > y. When n < y², the prime p is unique, but this is not necessarily true for larger n, so we select the largest prime factor p of n. Then we can write n = mp where m is p-smooth. Summing over all such n ≤ x leads to de Bruijn's identity:

(4.3.1)    Ψ(x, y) = [x] − Σ_{y < p ≤ x} Ψ(x/p, p).

The counts Ψ(y^u, y) are governed by a function ρ(u): set ρ(u) = 1 for 0 ≤ u ≤ 1, and for u > 1 we will see that ρ can be defined as the unique continuous solution to the integral (delay) equation

u ρ(u) = ∫_{u−1}^{u} ρ(t) dt.

Differentiating yields the differential-difference equation u ρ′(u) = −ρ(u − 1). We will indicate how one proves the following result:

Theorem 4.3.1 Counting Smooth numbers

For all x ≥ y ≥ exp((log log y)²) we have

Ψ(x, y) = ρ(u) x ( 1 + O( log(u + 1) / log y ) ),

where x = y^u; that is, u = log x / log y.

In other words, the proportion of the integers ≤ x that have all of their prime factors ≤ x1/u, tends to ρ(u), as x → ∞.
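Although ρ(u) has no simple closed form beyond the first ranges (for 1 ≤ u ≤ 2 the delay equation gives ρ(u) = 1 − log u, so ρ(2) = 1 − log 2), it is easy to tabulate numerically from u ρ′(u) = −ρ(u − 1) with ρ = 1 on [0, 1]. The Python sketch below is an added illustration, not part of the original notes; the step size and the crude Euler scheme are arbitrary choices, accurate enough to see the rapid (roughly u^{−u}) decay.

from math import log

def rho_table(u_max, h=1e-3):
    """Tabulate rho(u) on [0, u_max] from u rho'(u) = -rho(u-1), with rho = 1 on [0,1]."""
    n = int(round(u_max / h))
    rho = [1.0] * (n + 1)            # rho = 1 on [0,1]; entries for u > 1 get overwritten
    lag = int(round(1.0 / h))        # grid offset corresponding to u - 1
    for i in range(lag, n):
        u = i * h
        rho[i + 1] = rho[i] - h * rho[i - lag] / u   # simple Euler step; enough for illustration
    return rho

h = 1e-3
rho = rho_table(5.0, h)
for u in (1.5, 2.0, 3.0, 4.0, 5.0):
    print(f"rho({u}) ~ {rho[int(round(u / h))]:.5f}")
print("exact rho(2) = 1 - log 2 =", 1 - log(2))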

²A result of this type for small values of u may be found in Ramanujan's unpublished manuscripts (collected in the Lost Notebook).

Sketch. Let x = yu, and we start with X  X  X Ψ(x, y) log x = log n + O log(x/n) = log n + O(x). n≤x n≤x n≤x P (n)≤y P (n)≤y P Using log n = d|n Λ(d) we have X X X log n = Λ(d)Ψ(x/d, y) = (log p)Ψ(x/p, y) + O(x), n≤x d≤x p≤y P (n)≤y P (d)≤y since the contribution of prime powers pk (with k ≥ 2) is easily seen to be O(x). Thus X x  (4.3.2) Ψ(x, y) log x = log p Ψ , y + O(x). p p≤y (Compare this to the formulae in Exercise 3.6.13.) This is easier to work with than (4.3.1). In the terms in the sum in (4.3.2) we have the following range of u-values: log 2 log x/p log p u − ≥ = u − ≥ u − 1. log y log y log y If we think of y as fixed, then all of these values are all significantly smaller than u, so we can proceed by some form of induction.3 The main term on the right-hand side is then X x  log p  log p ρ u − . p log y p≤y P log p Next we use partial summation with the sum S(T ) = p≤T p , so our sum equals x times Z y  log T  ρ u − dS(T ). T =2− log y In (2.8.1) we saw that S(T ) = log T + O(1), so just substituting in the main term gives, as d(log T ) = dT/T , Z y  log T dT Z 1 ρ u − = ρ(u − t)dt · log y + O(1) = ρ(u) log y + O(1) T =2− log y T t=0 writing T = yt, and since each |ρ(u−t)| ≤ 1. This is how one obtains the definition of ρ(.). Sorting out the error terms here is tricky, something we will leave for another course.  The lower bound in the range in this theorem, y ≥ exp((log log y)2) seems complicated and is dependent on the strongest fully proved version of the prime number theorem. What if we assume more than has been proved? Hildebrand showed that the asymptotic Ψ(x, y) ∼ ρ(u)x holds in larger range y ≥ (log x)2+o(1) if and only if the Riemann Hypothesis is true. Exercise 4.3.1. Use the integral and differential equations defining ρ(.) to prove the following: (i) ρ(u) > 0 for all u ≥ 0,

3Here is where we are vague about the details, since this kind of continuous induction can get technical. 4.4. AN UPPER BOUND ON THE NUMBER OF SMOOTH NUMBERS 55

(ii) ρ0(u) < 0 for all u ≥ 1, so that ρ(u) is decreasing in this range. (iii) uρ(u) ≤ ρ(u − 1) for all u ≥ 2 (iv) Deduce that ρ(u) ≤ 1/[u]!. We showed in exercise 4.1.2 that an interval of length x, typically has ∼ Q 1 x y

4.4. An upper bound on the number of smooth numbers Good upper bounds may be obtained for Ψ(x, y), uniformly in a wide range, by a simple application of Rankin’s trick. In analytic number theory one often faces a sum over the integers n ≤ x, and it would be more natural, for analytic methods, to sum over all positive integers (like in (2.2.1)). So we need to develop techniques that extend the sum but do not greatly change its value. Rankin’s beautiful idea when the summands are all non-negative, is to multiply through each term by (x/n)σ for some real number σ > 0. We see that (x/n)σ ≥ 1 for all n ≤ x, and (x/n)σ > 0 for all n > x. Hence X X  x σ (4.4.1) Ψ(x, y) = 1 ≤ = xσζ(σ, y) n n≥1 n≥1 P (n)≤y P (n)≤y where we define −1 X 1 Y  1  ζ(s, y) := = 1 − . ns ps n≥1 p≤y P (n)≤y The product and the series are both absolutely convergent in the half-plane Re(s) > 0. To obtain as good an upper bound on Ψ(x, y) as possible, we wish to minimize the right-hand side of (4.4.1), over all σ > 0. Exercise 4.4.1. ∗ Show that, for any real numbers x ≥ 1 and y ≥ 2, the function xσζ(σ, y) for σ ∈ (0, ∞) attains its minimum when σ = α = α(x, y) satisfying X log p log x = . pα − 1 p≤y To effectively use Rankin’s method we need to be able to get decent estimate for this sum, so as to approximate the value of α, substitute this into (4.4.1), and then bound that in terms of easily understood functions. Using the prime number u 3 log(u log u) theorem one can show that for x = y with y > (log x) , α ≈ 1 − log y , which 1 is > 2 . Now, using exercise 3.6.9, we have  log(u log u)  X 1  log y  log ζ 1 − , y = + O(1) ≤ log + O(u). log y pσ log(u log u) p≤y Substituting this into (4.4.1), we deduce that  C u (4.4.2) Ψ(x, y) ≤ x · log y u log u 56 4. INTRODUCTION TO SIEVE THEORY for some constant C > 0. By a more sophisticated argument, using the saddle point method, Hildebrand and Tenenbaum [?] established an asymptotic formula for Ψ(x, y) uniformly in x ≥ y ≥ 2: xαζ(α, y)   1  log y  (4.4.3) Ψ(x, y) = p 1 + O + O , α 2πφ2(α, y) u y

with α as in Exercise 4.4.1, φ(s, y) = log ζ(s, y) and φ₂(s, y) = (d²/ds²) φ(s, y). This implies that if y ≥ (log x)^{1+δ} then Ψ(x, y) ≍ x^α ζ(α, y)/(√u log y), so the upper bound obtained in (4.4.1) is larger than Ψ(x, y) by only a small factor. One can improve Rankin's upper bound, with a slightly more complicated argument, to obtain an upper bound for Ψ(x, y) which is too large by a factor of only ≪ √u log u.
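Rankin's bound is simple to evaluate numerically. The sketch below (an added illustration; the parameter choices, the grid search over σ, and the helper names are ours, the grid standing in for solving the equation of Exercise 4.4.1) computes x^σ ζ(σ, y), minimizes over σ, and compares the result with a brute-force count of Ψ(x, y).

def primes_up_to(y):
    sieve = [True] * (y + 1)
    sieve[0:2] = [False, False]
    for p in range(2, int(y**0.5) + 1):
        if sieve[p]:
            sieve[p*p::p] = [False] * len(sieve[p*p::p])
    return [p for p in range(2, y + 1) if sieve[p]]

def zeta_y(s, y):
    """zeta(s, y) = prod_{p <= y} (1 - p^(-s))^(-1)."""
    out = 1.0
    for p in primes_up_to(y):
        out /= (1.0 - p ** (-s))
    return out

def psi_brute(x, y):
    count = 0
    for n in range(1, x + 1):
        m, p = n, 2
        while p <= y and m > 1:
            while m % p == 0:
                m //= p
            p += 1
        count += (m == 1)      # m == 1 exactly when n is y-smooth
    return count

x, y = 100_000, 20
best = min(x**s * zeta_y(s, y) for s in [i / 200 for i in range(20, 200)])
print("Rankin upper bound :", round(best))
print("exact Psi(x, y)    :", psi_brute(x, y))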

4.5. Large gaps between primes

We now apply our estimates for smooth numbers to construct large gaps between primes. The gaps between primes get arbitrarily large since each of m! + 2, m! + 3, ..., m! + m is composite, so if p is the largest prime ≤ m! + 1, and q the next prime, then q − p ≥ m. Note that m ∼ log p/(log log p) by Stirling's formula (Exercise 3.6.2), whereas we expect, from (3.0.4), gaps as large as log p. Can such techniques establish that there are gaps between primes that are substantially larger than log p (and substantially smaller)? That is, if p₁ = 2 < p₂ = 3 < ... is the sequence of prime numbers then is it true that

(4.5.1)    lim sup_{n→∞} (p_{n+1} − p_n)/log p_n = ∞ ?

We establish such a result in the remainder of this chapter. In section ?? we will return to such questions and prove that

(4.5.2)    lim inf_{n→∞} (p_{n+1} − p_n)/log p_n = 0.

Theorem 4.5.1 Long gaps between consecutive primes

There are arbitrarily large pn for which

p_{n+1} − p_n ≥ (1/2) log p_n (log log p_n)(log log log log p_n)/(log log log p_n)².

In particular (4.5.1) holds.

In 2015, Ford, Green, Konyagin, Maynard and Tao improved this to

p_{n+1} − p_n ≥ c log p_n (log log p_n)(log log log log p_n)/(log log log p_n),

for some constant c > 0. This is still a long way from Cramér's conjecture (see (2.11.2)) that there are arbitrarily large p_n for which p_{n+1} − p_n ≥ {1 + o(1)}(log p_n)².

Proof. The idea is to construct a long interval in which every integer is known to be composite since it is divisible by a small prime. Let m = ∏_{p≤z} p. Our goal is to show that there exists an interval (T, T + x] for which (T + j, m) > 1 for each j, 1 ≤ j ≤ x, with T > z (so that every element of the interval is composite). Erdős formulated an easy way to think about this problem:

The Erdős shift: There exists an integer T for which (T + j, m) > 1 for each j, 1 ≤ j ≤ x, if and only if there is a choice of residue class a_p (mod p) for every prime p|m such that for each j, 1 ≤ j ≤ x there exists a prime p|m for which j ≡ a_p (mod p).

Proof of the Erdős shift. Given T, let each a_p = −T, since if (T + j, m) > 1 then there exists a prime p|m with p | T + j and so j ≡ −T ≡ a_p (mod p). In the other direction select T ≡ −a_p (mod p) for each p|m, using the Chinese Remainder Theorem; then if j ≡ a_p (mod p) we have T + j ≡ (−a_p) + a_p ≡ 0 (mod p) and so p | (T + j, m). □
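The Erdős shift is entirely constructive: once residue classes a_p (mod p) covering 1, ..., x have been found, the Chinese Remainder Theorem produces T explicitly. The Python sketch below is an added illustration of that mechanism on a tiny example; it is not the optimized three-step construction used below, and instead simply searches exhaustively over all choices of classes for the primes up to 13 (all names and parameters are ours).

from itertools import product as cartesian
from math import gcd, prod

primes = [2, 3, 5, 7, 11, 13]
m = prod(primes)

def covered_prefix(classes):
    """Largest x such that every j in 1..x lies in some class a_p (mod p)."""
    j = 1
    while any(j % p == a for p, a in zip(primes, classes)):
        j += 1
    return j - 1

# Exhaustive search over all 30030 choices of residue classes.
best = max(cartesian(*[range(p) for p in primes]), key=covered_prefix)
x = covered_prefix(best)
print("classes a_p:", dict(zip(primes, best)), " cover 1 ..", x)

# Erdos shift: pick T with T = -a_p (mod p) for every p (brute-force CRT is fine here).
T = next(t for t in range(1, m + 1)
         if all((t + a) % p == 0 for p, a in zip(primes, best)))
print("T =", T, "; gcd(T+j, m) > 1 for all j <= x:",
      all(gcd(T + j, m) > 1 for j in range(1, x + 1)))

For these six primes the search finds classes covering an interval of length 21, i.e. 21 consecutive integers each divisible by a prime ≤ 13.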

The y-smooth integers up to x can be viewed as the set of integers up to x with the integers in the residue classes 0 (mod p) sieved out, for each prime p in the range y < p ≤ x. The proportion of the integers that are left unsieved is ρ(u) (as we proved above), which is roughly 1/u^u. This is far smaller than the proportion suggested by the usual heuristic:⁴

∏_{y < p ≤ x} (1 − 1/p) ∼ log y / log x = 1/u.

(I) We select the congruence classes ap = 0 (mod p) for each prime p ∈ (y, z]. Let

N0 := {n ≤ x : n 6∈ 0 (mod p) for all p ∈ (y, z]}.

The integers n counted in N0 either have a prime factor p > z or not. If they do then we can write n = mp so that m = n/p ≤ x/z ≤ y and therefore m is composed only of prime factors ≤ y. Otherwise if n does not have a large prime

4For a randomly chosen interval, the proportion of integers removed when we sieve by the 1 prime p is p ; and the different primes act “independently”. 58 4. INTRODUCTION TO SIEVE THEORY factor then all of its prime factors are ≤ y. By this decomposition, and Exercise 3.6.7, we have X X x  x  #N = [x/p] + Ψ(x, y) ≤ + o 0 p log x z

(II) Now for each consecutive prime pj ≤ y let

Nj : = {n ∈ N0 : n 6∈ ap (mod p) for all p = p1, . . . , pj}

= {n ∈ Nj−1 : n 6∈ ap (mod p) for p = pj}.

We select ap for p = pj so as to maximize #{n ∈ Nj−1 : n ≡ ap (mod p)}, which 1 1 must be at least the average #Nj−1. Hence #Nj ≤ (1 − )#Nj−1, and so if pk p pj is the largest prime ≤ y then, by induction, we obtain that Y  1 e−γ log log x z r := #N ≤ 1 − #N ∼ x ∼ e−γ (1 + ) k p 0 log y log x log z p≤y using Mertens’ Theorem. This implies that r < #{p ∈ (z, z]} using (3.2.1) (which we proved there with constant c = log 2), since e−γ < log 2.

(III) Let Nk = {b1, . . . , br}, and let p`+1 < p`+2 < . . . < p`+r be the r smallest primes in (z, z]. Now let ap = bj for p = p`+j for j =, 2, . . . , r. Hence every integer n ≤ x belongs to an arithmetic progression ap (mod p) for some p ≤ z.

We have now shown how to choose ap (mod p) for each p ≤ z so that every n ≤ x belongs to at least one of these arithmetic progressions. By the Erd˝osshift Q we know that there exists T (mod m), where m = p≤z p for which (T + j, m) > 1 for 1 ≤ j ≤ x. We select T ∈ (m, 2m] to guarantee that every element of the interval (T,T + x] is greater than any of the prime factors of m. Hence if pn is the largest prime ≤ T , then pn+1 − pn > x. We need to determine how big this gap is compared to the size of the primes involved. Now pn ≤ 2m and log m ≤ ψ(z) ≤ z log 4 + O(log z) by (3.2.1), so that 2 z ≥ 3 log pn. This implies the theorem.  Exercise 4.5.1. ∗ Assuming the prime number theorem, improve the constant 1 γ 2 in this lower bound to e + o(1). The Erd˝osshift for arithmetic progressions: It is not difficult to modify the above argument to obtain large gaps between primes in any given arithmetic progression. However there is a direct connection between strings of consecutive composite num- bers, and strings of consecutive composite numbers in an arithmetic progression: Let m be the product of a finite set of primes that do not divide q. Select integer r for which qr ≡ 1 (mod m). Hence (a + jq, m) = (ar + jqr, m) = (ar + j, m), and so, for T = ar, (4.5.3) #{1 ≤ j ≤ N :(a + jq, m) = 1} = #{1 ≤ j ≤ N :(T + j, m) = 1}. 4.6. EXERCISES 59

In other words, the sieving problem in an arithmetic progression is equivalent to sieving an interval. This leads us to the following useful result, which follows from the Erdős-Rankin construction:

Corollary 4.5.1. Suppose that q has no prime factor < z. There exists an integer a in any given interval of length m = ∏_{p≤z} p such that each integer a + jq, with 1 ≤ j ≤ N, is divisible by some prime ≤ z, where

N = (e^γ + o(1)) z log z log log log z / (log log z)².

4.6. Exercises

Exercise 4.6.1. Prove that if f is a non-negative arithmetic function, and F(σ) is convergent for some 0 < σ < 1, then

Σ_{n≤x} f(n) + x Σ_{n>x} f(n)/n ≤ x^σ F(σ).

Exercise 4.6.2. Prove that

ρ(u) = ( (e + o(1)) / (u log u) )^u.

(Hint: Select c maximal such that ρ(u) ≥ (c/(u log u))^u. By using the functional equation for ρ deduce that c ≥ e. Take a similar approach for the implicit upper bound.)

Exercise 4.6.3.∗ A permutation π ∈ S_n is m-smooth if its cycle decomposition contains only cycles with length at most m. Let N(n, m) denote the number of m-smooth permutations in S_n.
(i) Prove that

n · N(n, m)/n! = Σ_{j=1}^{m} N(n − j, m)/(n − j)!.

(ii) Deduce that N(n, m) ≥ ρ(n/m) n! holds for all m, n ≥ 1.
(iii)† Prove that there is a constant C such that for all m, n ≥ 1, we have

N(n, m)/n! ≤ ρ(n/m) + C/m.

(One can take C = 1 in this result.) Therefore, a random permutation in S_n is n/u-smooth with probability → ρ(u) as n → ∞.

CHAPTER 5

Selberg’s sieve applied to an arithmetic progression

In this chapter we develop Selberg’s sieve, obtaining a strong upper estimate for the number of primes in an interval, without getting into the deep ideas needed for the proof of the prime number theorem.

5.1. Selberg’s sieve Let I be the set of integers in the interval (x, x + y], that are ≡ a (mod q). For a given integer P which is coprime to q, we wish to estimate the number of integers in I that are coprime to P ; that is, the integers that remain when I is sieved (or sifted) by the primes dividing P . Selberg’s sieve yields a good upper bound for this quantity. Note that this quantity, plus the number of primes dividing P , is an upper bound for the number of primes in I; selecting P well will give us the Brun-Titchmarsh theorem. When P is the product of the primes ≤ x1/u, other than those that divide q, we will obtain (for suitably large u) strong upper and lower bounds for the size of the sifted set; this result, which we develop in Section 5.2, is a simplified version of the fundamental lemma of sieve theory. Let λ1 = 1 and let λd be any sequence of real numbers for which λd 6= 0 only when d ∈ S(R,P ), which is the set of integers d ≤ R such that d is composed entirely of primes dividing P (where R is a parameter to be chosen later). We say that λ is supported on S(R,P ). Selberg’s sieve is based on the simple idea that squares of real numbers are ≥ 0, and so (  X 2 = 1 if (n, P ) = 1 λd is ≥ 0 always. d|n Therefore we obtain that X X  X 2 1 ≤ λd . n∈I n∈I d|n (n,P )=1 Expanding out the inner sum over d, the first term on the right hand side above is X X λd1 λd2 1,

d1,d2 x

61 62 5. SELBERG’S SIEVE APPLIED TO AN ARITHMETIC PROGRESSION that

X y X λd1 λd2 X 1 ≤ + |λd1 λd2 | q [d1, d2] n∈I d1,d2 d1,d2 (n,P )=1 2 y X λd1 λd2  X  (5.1.1) = + |λd| q [d1, d2] d1,d2 d The second term here is obtained from the accumulated errors obtained when we estimated the number of elements of I in given congruence classes. In order that each error is small compared to the main term, we need that 1 is small compared to y/(q[d1, d2]), that is [d1, d2] should be small compared to y/q. Now if d1, d2 are coprime and close to R then this forces the restriction that R  py/q. The first term in (5.1.1) is a in the variables λd, which we wish to minimize subject to the linear constraint λ1 = 1. Selberg made the remarkable observation that this quadratic form can be elegantly diagonalized, which allowed him to determine the optimal choices for the λd: Since [d1, d2] = d1d2/(d1, d2), and (d , d ) = P φ(`) we have 1 2 `|(d1,d2)

2 X λd1 λd2 X X λd1 λd2 X φ(`) X λd`  X φ(`) 2 (5.1.2) = φ(`) = 2 = 2 ξ` , [d1, d2] d1 d2 ` d ` d1,d2 ` `|d1 ` d ` `|d2 where each X λd` ξ = . ` d d

So we have diagonalized the quadratic form. Note that if ξ` 6= 0 then ` ∈ S(R,P ), just like the λd’s. We claim that (5.1.2) provides the desired diagonalization of the quadratic form. To prove this, we must show that this change of variables is invertible, which is not difficult using the fact that µ ∗ 1 = δ. Thus

X λd` X X X λd` X µ(r) λ = µ(r) = µ(r) = ξ . d ` ` r dr ` r|` r r|` r

In particular, the constraint λ1 = 1 becomes X µ(r) (5.1.3) 1 = ξ . r r r We have transformed our problem to minimizing the diagonal quadratic form in (5.1.2) subject to the constraint in (5.1.3). reveals that the optimal choice is when ξr is proportional to µ(r)r/φ(r) for each r ∈ S(R,P ) (and 0 otherwise). The constant of proportionality can be determined from (5.1.3) and we conclude that the optimal choice is to take (for r ∈ S(R,P ))

1 rµ(r) X µ(r)2 (5.1.4) ξ = where L(R; P ) := . r L(R; P ) φ(r) φ(r) r≤R p|r =⇒ p|P 5.2. THE FUNDAMENTAL LEMMA OF SIEVE THEORY 63

For this choice, the quadratic form in (5.1.2) attains its minimum value, which is 1/L(R; P ). Note also that for this choice of ξ, we have (for d ∈ S(R,P )) 1 X dµ(r)µ(dr) λ = , d L(R; P ) φ(dr) r≤R/d p|r =⇒ p|P and so X 1 X µ(dr)2d 1 X µ(n)2σ(n) (5.1.5) |λ | ≤ = , d L(R; P ) φ(dr) L(R; P ) φ(n) d≤R d,r n≤R dr≤R p|n =⇒ p|P p|dr =⇒ p|P P where σ(n) = d|n d. Putting these observations together, we arrive at the following result. Proposition 5.1.1. Suppose that (P, q) = 1. The number of integers from the interval [x, x+y] that are in the arithmetic progression a (mod q), and are coprime to P , is bounded above by y 1  X µ(n)2σ(n)2 + qL(R; P ) L(R; P )2 φ(n) n≤R p|n =⇒ p|P for any given R ≥ 1, where L(R; P ) is as in (5.1.4).
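To see Selberg's optimization in action one can compute the weights λ_d from the formula following (5.1.4) for a concrete choice of P and R, and check numerically that λ_1 = 1, that |λ_d| ≤ 1, and that (Σ_{d|n} λ_d)² really does dominate the indicator of (n, P) = 1. The Python sketch below is an added illustration; the choice of sieving primes and of R is arbitrary, and all function names are ours.

from itertools import combinations
from math import prod, gcd

sieving_primes = [2, 3, 5, 7, 11, 13, 17, 19]   # the primes dividing P
R = 30

# All squarefree d <= R composed of primes dividing P, i.e. the support S(R, P).
support = sorted(prod(c) for k in range(len(sieving_primes) + 1)
                 for c in combinations(sieving_primes, k) if prod(c) <= R)

def mu(n):     # Moebius function on this squarefree support
    return (-1) ** sum(1 for p in sieving_primes if n % p == 0)

def phi(n):    # Euler phi of a squarefree n with prime factors among sieving_primes
    return prod(p - 1 for p in sieving_primes if n % p == 0) if n > 1 else 1

L = sum(1 / phi(r) for r in support)             # L(R; P), since mu(r)^2 = 1 on the support

lam = {}
for d in support:
    lam[d] = sum(d * mu(r) * mu(d * r) / phi(d * r)
                 for r in support if d * r <= R and gcd(d, r) == 1) / L

print("lambda_1 =", round(lam[1], 6))
print("max |lambda_d| =", round(max(abs(v) for v in lam.values()), 6))

# Sanity check of the basic inequality on a few integers:
P = prod(sieving_primes)
for n in range(1000, 1010):
    s = sum(lam[d] for d in support if n % d == 0) ** 2
    print(n, "coprime to P" if gcd(n, P) == 1 else "not coprime", round(s, 4))

For n coprime to P the inner sum collapses to λ_1 = 1, while for the remaining n the square is a non-negative (often small) quantity, exactly as the sieve requires.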

5.2. The Fundamental Lemma of Sieve Theory Sieve theory provides a strong estimate for the number of integers in an interval of an arithmetic progression that are left unsieved by a subset of the primes up to some bound.

Theorem 5.2.1 The Fundamental Lemma of Sieve Theory

Let P be an integer with (P, q) = 1, such that every prime factor of P is ≤ (y/q)1/u for some given u ≥ 1. Then, uniformly, we have X y φ(P )  y 1/2+o(1) 1 = 1 + O(u−u/2) + O . q P q x

We will obtain the upper bound of the Fundamental Lemma by directly ap- plying Proposition 5.1.1 and using our understanding of multiplicative functions to evaluate the various terms there. We will deduce the lower bound from the upper bound, via a sieve identity, which is a technique that often works in sieve theory. We have already seen sieve identities in the previous chapter (e.g. (4.3.2) and (4.3.1)), and they are often used to turn upper bounds into lower bounds. In this case we wish to count the number of integers in a given set I that are coprime to a given integer P . We begin by Qj−1 writing P = p1 ··· pk with p1 < p2 < . . . < pk, and Pj = i=1 pi for each j > 1, 64 5. SELBERG’S SIEVE APPLIED TO AN ARITHMETIC PROGRESSION with P1 being interpreted as 1. Since every element in I is either coprime to P , or its common factor with P has a smallest prime factor pj for some j, we have

k X (5.2.1) #{n ∈ I :(n, P ) = 1} = #I − #{n ∈ I : pj|n and (n, Pj) = 1}. j=1

Good upper bounds on each #{n ∈ I : pj|n and (n, Pj) = 1} will therefore yield a good lower bound on #{n ∈ I :(n, P ) = 1}.

Proof Sketch. It will take us too far afield to evaluate the two sums in Proposition 5.1.1 with all the details, but less us discuss how to get the necessary estimates. One completes the sum L(R; P ) to obtain

X µ(r)2 Y  1  P L(∞; P ) := = 1 + = . φ(r) p − 1 φ(P ) r≥1 p|P p|r =⇒ p|P So we wish to bound the tail; that is, the contribution of the terms with r ≥ R. We select here R = py/q, so that if p|P then p ≤ z := R2/u by hypothesis. That is the r ≥ R are z-smooth and we saw in Theorem 4.3.1 that such integers are smooth. Hence they contribute something like 1/uu/2 times the total sum. The other sum in Proposition 5.1.1 is also over z-smooth integers, but now the summands have size ≥ 1, so the main contribution to the sum comes from the large z-smooth integers which are ≤ R, and hence are a sparse subset of all the integers ≤ R. One can therefore obtain the claimed upper bound. We now prove the lower bound using (5.2.1), and that #I = y/q + O(1). The upper bound that we just proved implies that X X 1 = 1

n∈I x/pj 0. We deal with the sum over j by then noting that φ(Pj)/Pj  (uj/u)φ(P )/P and so

k 2 X 1 φ(Pj) u φ(P ) u X log p φ(P ) 2   , pj Pj u P log(y/q) p P j=1 j p≤(y/q)1/u by (3.2.4). This completes our proof.  5.3. A STRONGER BRUN-TITCHMARSH THEOREM 65

Exercise 5.2.1. ∗ Suppose that y, z, q are integers for which log q ≤ z ≤ y/q, Q and let m = p≤z p. Use the Fundamental Lemma of Sieve Theory to prove that if (a, q) = 1 then X y 1  . φ(q) log z x

Taking the special case here with z = (y/q)1/2 , and trivially bounding the num- ber of primes ≤ z that are ≡ a (mod q), we deduce the most interesting corollary to Theorem 5.2.1:

Corollary 5.2.1 (The Brun-Titchmarsh Theorem). Let π(x; q, a) denote the number of primes p ≤ x with p ≡ a (mod q). There exists a constant κ > 0 such that

π(x + y; q, a) − π(x; q, a) ≤ κy / (φ(q) log(y/q)).

5.3. A stronger Brun-Titchmarsh Theorem We have just seen that sieve methods can give an upper bound for the number of primes in an interval (x, x + y] that belong to the arithmetic progression a (mod q). The smallest explicit constant κ known for Corollary 5.2.1 is κ = 2, due to Montgomery and Vaughan, for which we sketch a proof in this section using the Selberg sieve:

Theorem 5.3.1 The strong Brun-Titchmarsh Theorem

There is a constant C > 1 such that if y/q ≥ C then

(5.3.1)    π(x + y; q, a) − π(x; q, a) ≤ 2y / (φ(q) log(y/q)),

for any arithmetic progression a (mod q) with (a, q) = 1.
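A crude numerical check of this bound is straightforward. The sketch below is an added illustration (naive trial-division primality testing, parameters chosen arbitrarily); it compares π(x + y; q, a) − π(x; q, a) with 2y/(φ(q) log(y/q)).

from math import gcd, log

def is_prime(n):
    if n < 2:
        return False
    p = 2
    while p * p <= n:
        if n % p == 0:
            return False
        p += 1
    return True

def phi(q):
    return sum(1 for a in range(1, q + 1) if gcd(a, q) == 1)

def primes_in_ap(x, y, q, a):
    """pi(x+y; q, a) - pi(x; q, a), by brute force."""
    return sum(1 for n in range(x + 1, x + y + 1) if n % q == a and is_prime(n))

x = 10**6
for q, a, y in [(3, 1, 30000), (7, 2, 30000), (10, 3, 50000)]:
    count = primes_in_ap(x, y, q, a)
    bound = 2 * y / (phi(q) * log(y / q))
    print(f"q={q}, a={a}, y={y}: primes found = {count}, bound (5.3.1) = {bound:.1f}")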

Since π(x + y; q, a) − π(x; q, a) ≤ y/q + 1, we deduce (5.3.1) for q ≤ y ≤ q exp(q/φ(q)). One can considerably simplify proofs in this area using Selberg’s monotonicity principle: For given integers ω(p) < p, for each prime p, and any integer N, define

S^+(N, {ω(p)}_p) := max_{I an interval, #(I∩Z)=N}  max_{Ω(p)⊂Z/pZ, #Ω(p)=ω(p) ∀p}   #{n ∈ I : n ∉ Ω(p) for all primes p} / ( #I ∏_p (1 − ω(p)/p) ),

where the first "max" is over all intervals containing exactly N integers, and the second "max" is over all sets Ω(p) of ω(p) residue classes mod p, for each prime p. We can analogously define S^−(N, {ω(p)}_p) as the minimum.

Lemma 5.3.1 (Selberg’s monotonicity principle). If ω1(p) ≤ ω2(p) for all primes p then, for all integers N ≥ 1,

S^+(N, {ω₂(p)}_p) ≥ S^+(N, {ω₁(p)}_p) ≥ S^−(N, {ω₁(p)}_p) ≥ S^−(N, {ω₂(p)}_p).

Proof. We shall establish the result when ω0(p) = ω(p) for all primes p 6= q, and ω0(q) = ω(q) + 1, and then the full result follows by induction. So given the sets {Ω(p)}p and an interval I, let N := {n ∈ I : n 6∈ Ω(p) for all primes p}. Let m be the product of all primes p 6= q with ω(p) 6= 0, and then define Ij := I + jm for j = 0, 1, . . . , q − 1. Define J := {j ∈ [0, q − 1] : −jm 6∈ Ω(q)} so that #J = q − ω(q). Let Ωj(p) = Ω(p) for all p 6= q and Ωj(q) = (Ω(q) + jm) ∪ {0}; notice that #Ωj(q) = #Ω(q)+1 whenever j ∈ J. Moreover, letting Nj := {n+jm ∈ Ij : n + jm 6∈ Ωj(p) for all primes p} we have

#Nj = #N\ #{n ∈ N : n ≡ −jm (mod q)}. We sum this equality over every j ∈ J. Notice that each n ∈ N satisfies n ≡ −jm P (mod q) for a unique j ∈ J, and hence j∈J #Nj = (#J − 1)#N , which implies that (1 − ω(q)/q) #N ≤ max #Nj; (1 − ω0(q)/q) j∈J + + 0 and therefore S (N, {ω(p)}p) ≤ S (N, {ω (p)}p). The last step can be reworked, − − 0 analogously, to also yield S (N, {ω(p)}p) ≥ S (N, {ω (p)}p). 

Proof of Theorem 5.3.1. Let P be the set of primes ≤ R. One can show that L(R; P ) ≥ log R + γ0 + o(1) 0 P log p where γ := γ + p p(p−1) ; and that

X µ(n)2σ(n) 15 = R + o(R). φ(n) π2 n≤R

π2 p y Inserting these estimates into Theorem 5.1.1 with R := 15 2 we deduce that 2y (5.3.2) #{n ∈ [x, x + y):(n, P ) = 1} ≤ log y + c + o(1) where c := 2γ0 − 1 − log 2 + 2 log(π2/15) = 0.1346 ... This implies (5.3.1) for q = 1 when y ≥ C, for some constant C (given by when c + o(1) > 0). Given y and q, let Y = y/q and let m be the product of the primes ≤ R that do not divide q. Suppose that Y ≥ C. Let {a + jq : 1 ≤ j ≤ N} be the integers ∈ (x, x + y] in the arithmetic progression a (mod q) (so that N = Y +O(1)). By (4.5.3) we know that the number of these integers that are coprime to m, equals exactly the number of integers in + some interval of length N that are coprime to m, and this is ≤ S (N, {ω1(p)}p), by definition, where Ω1(p) = {0} for each p|m and Ω1(p) = ∅ otherwise. Now suppose that Ω2(p) = {0} for each p|P and Ω2(p) = ∅ otherwise, so that Selberg’s + + monotonicity principle implies that S (N, {ω1(p)}p) ≤ S (N, {ω2(p)}p). In other words P/m max #{n ∈ (x, x+N]:(n, m) = 1} ≤ ·max #{n ∈ (T,T +N]:(n, P ) = 1}, x φ(P/m) T and the result follows from (5.3.2) since P/m divides q.  5.4. ADDITIONAL EXERCISES 67

5.4. Additional exercises ∗ Exercise 5.4.1. Prove that our choice of λd (as in section 5.1) is only sup- ported on squarefree integers d and that 0 ≤ µ(d)λd ≤ 1. Exercise 5.4.2. ∗ (i) Prove the following reciprocity law: If L(d) and Y (r) are supported only on the squarefree integers then X X Y (r) := µ(r) L(m) for all r ≥ 1 if and only if L(d) = µ(d) Y (n) for all d ≥ 1. m: r|m n: d|n

(ii) Deduce the relationship, given in Selberg’s sieve, between the sequences λd/d and µ(r)ξr/r. (iii) Suppose that g is a multiplicative function and f = 1 ∗ g. Prove that X X 2 L(d1)L(d2)f((d1, d2)) = g(n)Y (n) .

d1,d2≥1 n≥1 (iv) Suppose that L is supported only on squarefree integers in S(R,P ). Show that to maximize the expression in (iii), where each f(p) > 1, subject to the constraint L(1) = 1, we have that Y is supported only on S(R,P ), and then Y (n) = c/g(n) P where c = n 1/g(n). Use this to determine the value of each L(m) in terms of g. (v) Prove that 0 ≤ f(m)µ(m)L(m) ≤ 1 for all m; and if R = ∞ then L(m) = µ(m)/f(m) for all m ∈ S(P ).

CHAPTER 6

Primer on analysis

To study analytic number theory it helps to know some of the basics of “hard analysis”. We made a start in section 2 with o(.),O(.), , , , ∼, etc., but now it is time to discuss, possibly review, the techniques and ideas that can be first encountered in courses in real analysis, complex analysis, Fourier analysis, applied analysis etc.

6.1. Inequalities

The Cauchy-Schwarz inequality is remarkably useful. It states that if a₁, a₂, ..., a_k, b₁, b₂, ..., b_k ∈ C then

(6.1.1)    | Σ_{i=1}^{k} a_i b_i |² ≤ ( Σ_{i=1}^{k} |a_i|² ) ( Σ_{j=1}^{k} |b_j|² ).

There are many proofs in the literature. The main use comes from the separating of variables, the a_i from the b_j, which might have arisen for quite different reasons. The Cauchy-Schwarz inequality is a special case of Hölder's inequality, which states that if p and q are positive with 1/p + 1/q = 1 then

(6.1.2)    | Σ_{i=1}^{k} a_i b_i | ≤ ( Σ_{i=1}^{k} |a_i|^p )^{1/p} ( Σ_{j=1}^{k} |b_j|^q )^{1/q}.

There is a continuous version of this, namely, if f(t), g(t) ∈ C(t) then

(6.1.3)    | ∫_u^v f(t) g(t) dt | ≤ ( ∫_u^v |f(t)|^p dt )^{1/p} ( ∫_u^v |g(t)|^q dt )^{1/q}.

Given h(t) define e^{I(m)} := ∫_u^v |h(t)|^m dt. If ℓ < m < n then we can take p = (n − ℓ)/(n − m), so that q = (n − ℓ)/(m − ℓ), and then f(t) = h(t)^{ℓ/p} and g(t) = h(t)^{n/q}, in (6.1.3) to obtain

(6.1.4)    (n − ℓ) I(m) ≤ (n − m) I(ℓ) + (m − ℓ) I(n);

that is, I(m) can be bounded by the appropriate convex linear combination of I(ℓ) and I(n), a convexity bound.

Exercises

Exercise 6.1.1. You interview a set of single parent families, first asking each parent the number of children in their family and determining the average of the responses, and then asking each child the number of children in their family and determining the average of the responses. Which average is larger, or are they always the same?


Exercise 6.1.2. Use the method of Lagrange multipliers to prove Holder’s inequality. Show that one gets equality if and only if there exists a non-zero constant p q 1 1 c such that |ai| = c|bi| for all i. (Note : p + q = 1 if and only if (p−1)(q −1) = 1.)

6.2. Fourier series You have probably wondered what “radio waves” and “sound waves” have to do with waves. Sounds don’t usually seem to be very “wavy,” but rather are fractured, broken up, stopping and starting and changing abruptly. So what’s the connection? The idea is that all sounds can be converted into a sum of waves. For example, let’s 1 imagine that our “sound” is represented by the gradually ascending line y = x − 2 , considered on the interval 0 ≤ x ≤ 1, which is shown in the first graph of Figure 1.


Figure 1: The line y = x − 1/2 and the wave y = −(1/π) sin 2πx.

If we try to approximate this with a "wave," we can come pretty close in the middle of the line using the function y = −(1/π) sin(2πx). However, as we see in the second graph of Figure 1, the approximation is rather poor when x < 1/4 or x > 3/4. How can we improve this approximation? The idea is to "add" a second wave to the first, this second wave going through two complete cycles over the interval [0, 1] rather than only one cycle. This corresponds to hearing the sound of the two waves at the same time, superimposed; mathematically, we literally add the two functions together. As it turns out, adding the function y = −(1/(2π)) sin(4πx) makes the approximation better for a range of x-values that is quite a bit larger than the range of good approximation we obtained with one wave, as we see in Figure 2.


Figure 2: Adding the wave y = −(1/(2π)) sin 4πx to the wave y = −(1/π) sin 2πx.

We can continue in this way, adding more and more waves that go through three, four, or five complete cycles in the interval, and so on, to get increasingly better approximations to the original straight line. The approximation we get by using one hundred superimposed waves is really quite good, except near the endpoints 0 and 1. 6.2. FOURIER SERIES 71


Figure 3: The sum of one hundred carefully chosen waves.

If we were to watch these one hundred approximations being built up one additional wave at a time, we would quickly be willing to wager that the more waves we allowed ourselves, the better the resulting approximation would be, perhaps becoming as accurate as could ever be hoped for. As long as we allow a tiny bit of error in the approximation (and shut our eyes to what happens very near the endpoints¹), we can in fact construct a sufficiently close approximation if we use enough waves. However, to get a "perfect" copy of the original straight line, we would need to use infinitely many sine waves; more precisely, the ones on the right-hand side of the formula

(6.2.1)    x − 1/2 = −2 Σ_{n=1}^{∞} sin(2πnx)/(2πn),

which can be shown to hold whenever 0 < x < 1. (We can read off, in the n = 1 and n = 2 terms of this sum, the two waves that we chose for Figures 1 and 2.) This formula is not of immediate practical use, since we can't really transmit infinitely many waves at once. However each finite truncation (that is, the terms with 1 ≤ n ≤ N) gives an approximation to the right-hand side, which gets better as N gets larger. In general, for any function f(x) defined on the interval [0, 1] that is not "too irregular," we can find numbers a_n and b_n such that f(x) can be written as a sum of trigonometric functions, namely,

(6.2.2)    f(x) = a₀ + Σ_{n=1}^{∞} ( a_n cos(2πnx) + b_n sin(2πnx) ).

This formula, together with a way to calculate the coefficients an and bn, is one of the key identities from “Fourier analysis,” and it and its many generalizations are the subject of the field of mathematics known as harmonic analysis. In terms of waves, the numbers 2πn are the frequencies of the various component waves

1The inability of these finite sums of waves to approximate an original function well near an “unwavelike” feature, like an endpoint or a discontinuity, is a persistent issue in Fourier analysis known as the Gibbs phenomenon. It is not avoidable since a finite sum of waves is continuous so cannot model, very precisely, a discontinuity. 72 6. PRIMER ON ANALYSIS

(controlling how fast they go through their cycles), while the coefficients an and bn are their amplitudes (controlling how far up and down they go). In fact the right side of (6.2.2) defines a function on all the reals that is evidently periodic of period 1 (being a sum of periodic functions of period 1), that is f(x+n) = f(x) for all x ∈ R, n ∈ Z. We may write X X 2iπmx (6.2.3) a0 + (am cos(2πmx) + bm sin(2πmx)) = cme , m≥1 m∈Z where a0 = c0 and am = cm + c−m, bm = i(cm − c−m) with Z 1 −2iπmt (6.2.4) cm = f(t)e dt, 0 for each m ∈ Z. Notice that if we integrate the series in (6.2.3) multiplied by −2iπjt e , over [0, 1), we obtain cj provided that we can swap the order of summation and integration without anything surprising happening. Dirichlet proved that this is 1 true at any point of continuity of f, and that the value of (6.2.3) is 2 (limt→x− f(t)+ limt→x+ f(t)) at any point of discontinuity of f. An amazingly useful result. Note that we can rewrite (6.2.1) as 1 X e2iπkx {x} − = − . 2 2iπk k∈Z k6=0
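The convergence of the partial sums of (6.2.1) is easy to see numerically. The sketch below (an added illustration, with arbitrary sample points) evaluates the truncation with N terms at a few points of (0, 1) and compares with x − 1/2.

from math import sin, pi

def sawtooth_partial(x, N):
    """Partial sum of -2 * sum_{n<=N} sin(2 pi n x) / (2 pi n), approximating x - 1/2."""
    return -2 * sum(sin(2 * pi * n * x) / (2 * pi * n) for n in range(1, N + 1))

for x in (0.1, 0.25, 0.5, 0.9):
    exact = x - 0.5
    approx = {N: round(sawtooth_partial(x, N), 4) for N in (1, 2, 10, 100, 1000)}
    print(f"x = {x}: exact {exact:+.4f}, partial sums {approx}")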

6.3. Poisson summation Suppose that we have a function g(x) that is real valued and piecewise contin- uous. If f(t) = g(x + t) on t ∈ [0, 1) and is periodic, of period 1, then it is also real valued and piecewise continuous, so we can use the Fourier series (6.2.3) and (6.2.4) to prove that 1 X Z 1 (g(x) + g(x + 1)) = e2iπmx g(x + u)e−2iπmudu 2 0 m∈Z X Z x+1 = e2iπmx g(t)e−2iπmtdt. x m∈Z Repeating this process by shifting x → x + 1, etc, we obtain, since e2iπmx = e2iπm(x+1), the Poisson summation formula, for y − x ∈ Z, (6.3.1) 1 1 X Z y g(x) + g(x + 1) + g(x + 2) + ··· + g(y − 1) + g(y) = e2iπmx g(t)e−2iπmtdt. 2 2 x m∈Z This implies, taking y = −x where x is an integer, that we allow to get increasing large, that X X Z ∞ (6.3.2) f(n) = fˆ(m) where fˆ(u) := f(t)e−2iπutdt. −∞ n∈Z m∈Z One can easily deduce that if h = f ∗ g, that is h is the convolution of f and g defined by Z ∞ h(u) := f(t)g(u − t)dt then hˆ(t) = fˆ(t)ˆg(t). −∞ 6.3. POISSON SUMMATION 73

One can “invert” the Fourier transform to obtain Z ∞ f(x) = fˆ(t)e2iπtxdt. −∞ One can deduce Parseval’s identity Z ∞ Z ∞ f(x)g(x)dx = fˆ(t)gˆ(t)dt, −∞ −∞ which itself implies Z ∞ Z ∞ |f(x)|2dx = |fˆ(t)|2dt. −∞ −∞ If f has period q then (6.3.1) implies q−1 X X Z q f(n) = f(t)e−2iπmtdt. 0 n=0 m∈Z 2 For the example f(t) = e2iπt /q we obtain, swapping m and −m, q−1 Z q Z 1 X 2 X 2 X 2 e2iπn /q = e2iπ(t /q+mt)dt = q e2iπq(u +mu)du 0 0 n=0 m∈Z m∈Z Z 1 X 2 2 = q e−iπqm /2 e2iπq(u+m/2) du 0 m∈Z Z m/2+1 Z m/2+1 X 2 X 2 = q e2iπqv dv + q(−i)q e2iπqv dv m/2 m/2 m∈Z m∈Z m even m odd ∞ ∞ Z 2 √ Z 2 = q(1 + (−i)q) e2iπqv dv = q(1 + (−i)q) e2iπw dw, −∞ −∞ taking t = uq and completing the square, then putting v = u + m/2 and combining √ the , and finally letting w = qv. Taking q = 1 we see that the integral here equals 1/(1 − i), and therefore if q is prime and χ is the character of order 2 mod q then X  a X  n2  g(χ) := (1 + χ(a)) exp 2iπ = exp 2iπ q q a (mod q) n (mod q) ( √ (1 + (−i)q) √ 1 if q ≡ 1 (mod 4) (6.3.3) = q = q · 1 − i i if q ≡ 3 (mod 4) Thus we have evaluated the sign of the Gauss sum, something that is difficult to do without some machinery. There are several important number theoretic consequences of this formula. We define a finite Fourier transform by q 1 X  kn fˆ(k) = f(n)e − , q q n=1 so that q X kn f(n) = fˆ(k)e . q k=1 74 6. PRIMER ON ANALYSIS

The analogy to Parseval’s identity is q q X X |f(n)|2 = q |fˆ(k)|2. n=1 k=1
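The evaluation (6.3.3) of the quadratic Gauss sum can be checked numerically for small primes. The following Python sketch is an added illustration (the list of primes is an arbitrary choice); it compares Σ_{n (mod q)} exp(2iπn²/q) with √q or i√q according to q mod 4.

from cmath import exp, pi

def gauss_sum(q):
    """sum_{n mod q} exp(2 pi i n^2 / q)."""
    return sum(exp(2j * pi * n * n / q) for n in range(q))

for q in (3, 5, 7, 11, 13, 17, 19, 23):
    g = gauss_sum(q)
    predicted = complex(q ** 0.5, 0) if q % 4 == 1 else complex(0, q ** 0.5)
    print(f"q={q}: g = {g.real:+.4f}{g.imag:+.4f}i, "
          f"predicted = {predicted.real:+.4f}{predicted.imag:+.4f}i")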

6.4. The order of a function An analytic function f(z) has finite order d if log |f(z)|  |z|d+o(1) for all complex numbers z with |z| ≥ 1, and d is the smallest such number. If f has no zeros we can write it in the form g(z) P n e ; where g(z) = n≥0 cnz . Hence Z 1 n 2iπt −2iπnt d+o(1) |cn|R = g(Re )e dt  R , 0 so that cn = 0 if n > k, and therefore g must be a polynomial of degree d (which must therefore be an integer).2 Q For a given polynomial f(x) = a0 1≤i≤k(x − αi), we have Mahler’s measure Y M(f) := |a0| max{1, |αi|}. 1≤i≤k Jensen’s formula links this definition, which depends on the algebraic part of f (that is, its roots), with the analytic part of f (its size): Z 1 (6.4.1) log M(f) = log |f(e2iπt)| dt. 0 This is easily proved by factoring f and proving this for each linear polynomial f. Rather more generally, if g is an analytic function of finite order with g(0) 6= 0 then Z 1  Y R exp log |g(Re2iπt)| dt = |g(0)| |z| 0 z: |z|≤R g(z)=0 where we count multiple zeros with their multiplicity. This can be proved by writing g(z) = f(z)h(z) where f is a polynomial the same zeros as g, and h is analytic without zeros so we can write it as eP (z) where P is a polynomial, as above. If one orders the zeros z1, z2,... of g to have absolute value 0 < |z1| = r1 ≤ |z2| = r2 ≤ ... and there are k zeros with |z| ≤ R then   k   k−1     Z R  Y R  X R X ri+1 R dr log   = log = i log + k log = n(r)  |z| ri ri rk r z: |z|≤R i=1 i=1 0 g(z)=0 where n(r) := #{i : ri ≤ r}. Therefore Z eR dr Z eR dr Z 1 n(R) ≤ n(R) ≤ n(r) ≤ log |g(Re2iπt)/g(0)| dt  Rd+o(1), R r R r 0

2Note that this argument works if we simply have an infinite sequence of values of R, getting arbitrarily large, for which log |f(z)|  Rd+o(1) whenever |z| = R. 6.5. COMPLEX ANALYSIS 75 and hence X 1 Z ∞ n(r) (6.4.2) = α dr  1 |z|α r1+α z: |z|≤R 0 g(z)=0 if α > d. Let us now focus on the case d = 1, and consider the function

Y z/zn G(z) := (1 − z/zn)e . n≥1 Taking logarithms we see that this is an analytic function that converges everywhere by (6.4.2). Since n(R)  R1+o(1) we can certainly find arbitrarily large values of R o(1) 1+o(1) such that if |z| = R then |zn − z| ≥ 1/R for all n. Hence |1 − z/zn| ≥ 1/R if |zn| ≤ 2R, implying that X X 2 − log |G(z)|  (log R + R/|zn|) + |R/zn|

n: |zn|≤2R n: |zn|>2R Z 2R Z ∞ n(t) 2 n(t) 1+o(1)  n(2R) log R + R 2 dt + R 3 dt  R . 0 t 2R t Therefore g(z)/G(z) is an analytic function of order 1 with no zeros and so must be of the form eAz+B for some constant A, B. Hence we may rewrite g(z) as

Az+B Y z/zn (6.4.3) g(z) = e (1 − z/zn)e n≥1 P which is everywhere convergent. If n 1/|zn| converges then one can modify the above to show that log |g(z)|  |z|. The representation (6.4.3) of an analytic function g(.) of order 1, is very useful since it more-or-less allows us to use techniques developed for polynomials, on a much wider range of functions.

6.5. Complex analysis If one wants to make sense of calculus when one is working with functions of a R b complex variable then one needs to determine how to evaluate a f(z)dz. If a and b are real numbers and z a real variable, then the Riemann-Stieltje interpretation of the integral involves allowing the z variable to travel along a path from a to b, there being only one choice along the x-axis, though one can, if needs be, re-interpret the path as going from a to c, and then from c to b, whether or not c lies in-between a and b. In all cases this should give the same answer. When we work with functions of a complex variable, and a, b ∈ C, we want this to again be true; that is that the R b value of a f(z)dz does not depend on the path taken to get from a to b; and, in the complex plane, there are an infinite number of possible choices of path. In fact if we wish to compare the value of the integral along two paths from a to b, we could append the paths into a circuit C by going from a to b along one path, and then R going backwards on the other from b to a. We would then want that C f(z)dz = 0. This is true for any closed circuit C when f(z) can be written at each point in terms of a Taylor series (that is, f is analytic); and in fact it is true for C when f is analytic inside and on C. If f is analytic except for a finite number of points, say z1, . . . , zk, called poles, for each of which there exists an integer q1, . . . , qk > 0 qj such that (z − zj) f(z) can be written as a Taylor series at z = zk, then we say that f is meromorphic. From what we have just written, to be able to interpret 76 6. PRIMER ON ANALYSIS all integrals involving f we simply need to determine the value of the integral in a closed circuit around each zj, that circuit not enclosing zi for any i 6= j. P j Suppose that f(z) can be written as J≥r aj(z − z0) with aJ 6= 0, for some J (which equals = −q` when z0 = z`) in a circle C of radius r around z0. Then, 2iπt taking z = z0 + re ,

1 I X 1 I (6.5.1) f(z) dz = a · (z − z )jdz = a , 2iπ j 2iπ 0 −1 C j≥J C H where C indicates that the integral is evaluated on a path that goes around the circle in an anti-clockwise direction, since, for given integer j,

I Z 1 ( 1 j 2iπt j+1 1 if j = −1 (z − z0) dz = (re ) dt = 2iπ C 0 0 otherwise.

−1 Hence only the coefficient of (z − z0) contributes to the integral around a closed circuit like this. If a−1 6= 0 then we say that the pole at z = z0 of f(z) contributes a residue of a−1 to the integral. 1 H If we wish to determine 2iπ C f(z) dz for a function f which is meromorphic inside C then we can break the region C up into a finite number of pieces, each containing at most one pole of f. If C1,C2,...,Cr denote the boundaries of these pieces then r 1 I X 1 I f(z) dz = f(z) dz, 2iπ 2iπ C i=1 Ci where all of the Ci are traversed in an anti-clockwise direction, so that the con- tribution of any internal boundary cancels, having been traversed once in each direction. The birth of analytic number theory came in recognizing that one could use (6.5.1) to sum numbers a−1, by summing up functions f, and then using the tools of calculus to determine the answer. One example of such a tool is the argument principle. Suppose that f(z) is a and we wish to count the number of zeros of f inside a closed circuit C. To simplify matters we carefully select our circuit to never run P j though a zero or pole of f. If f = j≥J aj(z − z0) with aJ 6= 0, for some J then 0 f (z)/f(z)−J/(z −z0) is analytic at z0. The integral of J/(z −z0) only contributes if J 6= 0, that is if z0 is either a pole or zero of f. Hence, by (6.5.1)

1 I f 0(z) (6.5.2) dz = #{zeros of f inside C} − #{poles of f inside C}, 2iπ C f(z) where we count a zero of order J > 0, J times, and a pole of order J < 0, |J| times. Now if f(z) = er(z)+2iπa(z) on the boundary then f 0(z)/f(z) = r0(z) + 2iπa0(z). H 0 Comparing the imaginary parts of both sides of (6.5.2), we see that C r (z)dz = 0, H 0 and so the left side of (6.5.2) equals C a (z) dz, which is the change in a(z) as we go round the circuit C. Therefore the number of zeros of f minus the number of poles inside C equals the change in argument of f(z), divided by 2π, as we go around the circuit C. 6.6. PERRON’S FORMULA AND ITS VARIANTS 77

6.6. Perron’s formula and its variants We begin with a way of identifying whether a real number t is positive or negative, using the discontinuous integral  0 if u < 0 1 Z c+i∞ ds  (6.6.1) esu = 1/2 if u = 0 2iπ s c−i∞ 1 if u > 0 which holds for any c > 0. It is proved using the ideas of complex analysis: The key is that the value of the integral in a closed contour C, is given by the contribution of the poles of esu/s (as discussed in the previous section). Since the only pole lies at s = 0, the value of the integral equals 0 if s = 0 does not lie within C, and equals 1 if it does. Here we take the line from c − i∞ to c + i∞ to be part of the boundary, and the rest of the boundary going from N + i∞ to N − i∞ for some N 6= c.3 For u < 0 we take N to be a very large positive real number and we will bound the size of the integral: The size of the integrand changes little while the argument goes through a full revolution. Thus writing s = N + it we find that su Nu itu e /s = e e /(N + it). Allowing t to vary from t0 > 0 to t0 + 2π/|u| we obtain:

1 Z N+i(t0+2π/|u|) ds 1 Z t0+2π/|u| eNueitu esu = dt 2iπ N+it0 s 2π t0 N + it eNu Z t0+2π/|u|  eitu eitu  = − dt, 2π t0 N + it N + it0

Nu 2 2 2 which is ≤ πe /(u (N + t0)) in absolute value. Thus taking t0 = 2πm/|u| for m = 0, 1, 2,... (and the conjugate argument for negative t), we obtain

1 Z N+i∞ ds X eNu X eNu 2eNu esu ≤ 2 + 2 ≤ ; 2iπ s (uN)2 (2πm)2 πN|u| N−i∞ m≤N|u|/2π m≥N|u|/2π and this → 0 as we allow N to get larger and larger. Thus we have proved the case with u < 0. The result with u > 0 is similar though now we move our contour to the left, that is letting N → −∞, and therefore the pole at s = 0 now contributes to the value of the integral. The case with u = 0 is easiest approached by adding the contributions of c − it and c + it, and changing variable t = cv to obtain 1 R ∞ 1 1 π 0 1+v2 dv = 2 via the substitution v = tan θ.

Typically we use Perron’s formula to recognize integers n ≤ x so we apply it with u = log(x/n) to obtain  0 if n > x 1 Z c+i∞  x s ds  (6.6.2) = 1/2 if n = x 2iπ n s c−i∞ 1 if n < x

3If one is uncomfortable with the idea that all vertical lines end in the same place, then one can truncate the lines at c ± iT, N ± iT , as we will explain. 78 6. PRIMER ON ANALYSIS

P 2 Therefore, to estimate n≤x φ(n)/n from section 2.14, we have, assuming x is not an integer so we can avoid the n = x case of (6.6.2), X φ(n) X φ(n) 1 Z c+i∞  x s ds 1 Z c+i∞ X φ(n) ds = = xs n2 n2 2iπ n s 2iπ n2+s s n≤x n≥1 c−i∞ c−i∞ n≥1 where we did not justify swapping the order of the summation and the integration though, as usual, it can easily be justified if everything is convergent enough, which P 2+s in practice involves taking c large enough. Now the Dirichlet series n≥1 φ(n)/n involves the sum of a multiplicative function, so is the product of the same sum taken over the powers of each prime p, which is  1  1 1   1    1  1 + 1 − + + ... = 1 − 1 − . p p1+s (p2)1+s p2+s p1+s Therefore our integral becomes 1 Z c+i∞ ζ(1 + s) ds xs . 2iπ c−i∞ ζ(2 + s) s To evaluate this we again shift the contour in the same way. The key thing to note is that |xs| = xc so we shrink the size of the integrand by moving the contour to the left. When we do this the first pole that we meet lies at s = 0 and is in fact a pole of order two, since a pole is contributed by the ζ(1 + s) term. Let’s compute the residue at this point by working out the series for the integrand: sζ(1+s) = 1+γs+O(s2), ζ(2+s) = ζ(2)+sζ0(2)+O(s2), xs = 1+s log x+O(s2), so altogether we obtain that the coefficient of 1/s in our integrand, and thus our 0 6 residue, is (log x + γ − ζ (2)/ζ(2))/ζ(2) = π2 log x + O(1). At this point we expect that this is a good estimate for our original sum. The next pole of our integrand has Res < −1,4 so if we move the contour to the vertical line at Re(s) = −1 then we simply need to show that this integrand is small to achieve our original objective, and this can be done.

Typically it is more useful to describe Perron’s formula in terms of an integral that goes to a large height T , rather than to ∞. Proposition 6.6.1. For c > 0 and u ∈ R we have  0 if u < 0 ( cu 1 Z c+iT ds  e if u 6= 0 esu − 1/2 if u = 0  1+T |u| c 2iπ c−iT s  if u = 0. 1 if u > 0 πT Proof. In the bounds we gave above for the integrand on the line Re(s) = N, one can simply take N = c and m0 = [|u|T/2π], so that our missing integral is no cu 2 cu more twice than the sum of e /4πm over all m ≥ m0, which is ≤ e /(T |u|). The 1 R ∞ 1 1 R ∞ 1 c missing integral when u = 0 equals π T/c 1+v2 dv ≤ π T/c v2 dv = πT . Suppose that u < 0. We will create a closed circuit C by going up the path from c − iT to c + iT , and returning on most of a semi-circle, to the right of this line. Thus our integral is minus the integral on this arc. Since the arc is to the right of the line we have |esu| ≤ ecu throughout, and evidently |s| = r where r = c2 + T 2. 1 cu cu Therefore as the arc has length ≤ 2πr the integral is ≤ 2π · 2πr · e /r = e . An analogous proof works when u > 0 by going to the right.
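The truncated Perron integral can also be evaluated numerically to see the behaviour described in Proposition 6.6.1. The sketch below (an added illustration; the choices c = 2, T = 1000 and the crude Riemann sum are ours) approximates (1/2πi) ∫_{c−iT}^{c+iT} e^{su} ds/s and compares it with 0, 1/2 and 1 for u < 0, u = 0 and u > 0.

from math import pi
from cmath import exp

def perron(u, c=2.0, T=1000.0, steps=100_000):
    """Riemann-sum approximation of (1/(2 pi i)) * integral over s = c + it, |t| <= T, of e^{su} ds / s."""
    total = 0j
    dt = 2 * T / steps
    for k in range(steps):
        t = -T + (k + 0.5) * dt
        s = complex(c, t)
        total += exp(s * u) / s * complex(0, dt)   # ds = i dt
    return (total / (2j * pi)).real

for u in (-1.0, -0.1, 0.0, 0.1, 1.0):
    print(f"u = {u:+.1f}: truncated Perron integral ~ {perron(u):.4f}")

The deviations from 0, 1/2, 1 shrink as T grows, in line with the error terms of the proposition.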

4This follows from knowing that ζ is non zero on the 1-line, and has no poles other than at 1. 6.7. ANALYTIC CONTINUATION 79



Exercises

Exercise 6.6.1. Use Perron’s formula to give an integral involving L-functions that gives a precise formula for the number of distinct integers that are the sum of two squares. Why is this difficult to estimate by the methods described above?

P φ(n) 6 φ(n) Exercise 6.6.2. Prove that n≤x n2 = π2 log x + O(1), by writing n = P µ(d) d|n d .

6.7. Analytic continuation There is a beautiful phenomenon in the theory of functions of a complex variable called “analytic continuation.” It tells us that functions that are originally defined only for certain complex numbers often have unique “sensible” definitions for other complex numbers. For example the definition of ζ(s), that X 1 ζ(s) := , ns n≥1 only works when σ = Re(s) > 1, since it is only is this domain that the sum is absolutely convergent. However “analytic continuation” will allow us to define ζ(s) for every complex number s other than s = 1. This description of analytic continuation looks disconcertingly magical. Fortunately, there is a quite explicit way to show how ζ(σ + it) can be “sensibly” defined at least for the larger region where σ > 0. We start with an expression for the product (1 − 21−s)ζ(s) and then perform some sleight-of-hand manipulations:  2  2 1 − 21−s ζ(s) = 1 − ζ(s) = ζ(s) − ζ(s) 2s 2s X 1 X 1 = − 2 ns (2m)s n≥1 m≥1 X 1 X 1 X 1 X 1 = − 2 = − ns ns ms ns n≥1 n≥1 m≥1 n≥1 n even m odd n even  1 1   1 1   1 1  = − + − + − + ··· . 1s 2s 3s 4s 5s 6s Solving for ζ(s), we find that 1  1 1   1 1   1 1   ζ(s) = − + − + − + ··· . (1 − 21−s) 1s 2s 3s 4s 5s 6s All of these manipulations were valid for complex numbers s = σ + it with σ > 1. However, it turns out that the infinite series in curly brackets actually converges whenever σ > 0. Therefore, we can take this last equation as the new “sensible” definition of the Riemann zeta-function on this larger domain. Note that the special number s = 1 causes the otherwise innocuous factor of 1/(1−21−s) to be undefined; the Riemann zeta-function intrinsically has a problem there, one that cannot be swept away with clever rearrangements of the infinite series. 80 6. PRIMER ON ANALYSIS

What is surprising is that the values given by this representation of ζ(s) are the only possible values that ζ(s) could take, given what it equals to the right of the 1-line, and that it is more-or-less analytic.
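The alternating-series representation is easy to compute with. The sketch below (an added illustration; the truncation point and the simple tail-averaging trick are our choices) sums Σ (−1)^{n−1}/n^s and divides by 1 − 2^{1−s}. At s = 2 it reproduces ζ(2) = π²/6, and it also gives values in the strip 0 < σ < 1, where the original series diverges.

from math import pi

def zeta_via_eta(s, N=100_000):
    """zeta(s) = (1 - 2^(1-s))^(-1) * sum_{n>=1} (-1)^(n-1) / n^s, valid for Re(s) > 0, s != 1.
    Averaging two consecutive partial sums of the alternating series reduces the truncation error."""
    partial = 0j
    for n in range(1, N + 1):
        partial += (-1) ** (n - 1) / n ** s
    eta = partial + 0.5 * (-1) ** N / (N + 1) ** s
    return eta / (1 - 2 ** (1 - s))

print("zeta(2) ~", zeta_via_eta(2 + 0j), " (pi^2/6 =", pi**2 / 6, ")")
print("zeta(0.5) ~", zeta_via_eta(0.5 + 0j), " (true value about -1.4604)")
print("zeta(0.5 + 14.1347i) ~", zeta_via_eta(0.5 + 14.1347j), " (close to the first zero)")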

6.8. The Gamma function Euler defined Z ∞ (6.8.1) Γ(s) := e−tts−1dt, 0 which converges whenever Re(s) > 0. Writing t = −n log x we obtain Z 1 (6.8.2) Γ(s) = ns xn−1(log x−1)s−1dx, 0 for any n > 0; and taking t = n2πx we get Z ∞  s  s s s −1 −n2πx (6.8.3) Γ = π 2 n x 2 e dx. 2 0 The Gamma function is best known as a continuous extrapolation of the fac- torial function:

Exercise 6.8.1. Show that Γ(s + 1) = sΓ(s) for all s ∈ C whenever Re(s) > 0. (Hint: Integrate (6.8.1) by parts.) Noting that Γ(1) = 1 we deduce, by induction, that Γ(n+1) = n! for all integers n ≥ 0. We can define Γ(s) on the whole of C, from the value on Re(s) > 0, using the identity in exercise 6.8.1. We see that Γ(s) has no zeroes; moreover Γ(s) has simple poles at s = 0, −1, −2,... and nowhere else. By showing that Γ(s) is an analytic function of order 1, Weierstrass then used (6.4.3) to give the formula 1 Y (6.8.4) = eγs (1 + s/n)e−s/n, sΓ(s) n≥1 valid for all s ∈ C. Exercise 6.8.2. From (6.4.3) Weierstrass obtained the factor eAs+B before the product in (6.8.4), for some A, B. Show that A = γ, B = 0. (Hint: Let s be a large integer and N a much larger integer. Show that the terms in the product with n > PN 1 N contribute negligibly. Then that the product is ∼ exp(s(log N − n=1 n ))/s!.) Taking the logarithmic derivative we deduce from (6.8.4) that Γ0(s) 1 X  1 1  (6.8.5) − = γ + + − Γ(s) s s + n n n≥1 One can deduce from (6.8.4) that π  1 (6.8.6) Γ(s)Γ(1 − s) = and Γ(s)Γ s + = 21−2sπ1/2Γ(2s). sin πs 2 Stirling’s formula gives, provided | arg s| < π − δ for fixed δ > 0, ss   1  (6.8.7) Γ(s) = p2π/s 1 + O . e δ |s| 6.8. THE GAMMA FUNCTION 81
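For real arguments the identities above are easy to test numerically with Python's built-in gamma function; the sketch below (an added illustration, with arbitrarily chosen test points) checks Γ(n + 1) = n!, the functional equation Γ(s + 1) = sΓ(s), and the reflection formula Γ(s)Γ(1 − s) = π/sin(πs) from (6.8.6).

from math import gamma, factorial, pi, sin

for n in range(1, 6):
    print(f"Gamma({n+1}) = {gamma(n+1):.1f},  {n}! = {factorial(n)}")

for s in (0.3, 0.5, 1.7, 3.2):
    print(f"s = {s}: Gamma(s+1) - s*Gamma(s) = {gamma(s+1) - s*gamma(s):.2e}, "
          f"Gamma(s)Gamma(1-s) - pi/sin(pi s) = {gamma(s)*gamma(1-s) - pi/sin(pi*s):.2e}")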

Exercises

Exercise 6.8.3. Show that −Γ0(1)/Γ(1) = γ and so deduce, from the definition R ∞ −t 0 of Γ(s) that we have 0 e log tdt = Γ (1) = −γ. Exercise 6.8.4. Using the Taylor series for − log(1 − t) at t = −1, and (6.8.5), deduce that −Γ0(1/2)/Γ(1/2) = γ + log 4.

Γ0(s) Exercise 6.8.5. Prove that if | arg s| < π − δ for fixed δ > 0 then Γ(s) = log |s| + Oδ(1). Why is the restricted range necessary?

CHAPTER 7

Riemann’s plan for proving the prime number theorem

7.1. A method to accurately estimate the number of primes

Up to the middle of the nineteenth century, every approach to estimating π(x) = #{primes ≤ x} was relatively direct, based upon elementary number theory and combinatorial principles, or the theory of quadratic forms. In 1859, however, the great geometer Riemann took up the challenge of counting the primes in a very different way. He wrote just one paper that could be called "number theory," but that one short memoir had an impact that has lasted nearly a century and a half, and its ideas have defined the subject we now call analytic number theory. Riemann's memoir described a surprising approach to the problem, an approach using the theory of complex analysis, which was at that time still very much a developing subject.¹ This new approach of Riemann seemed to stray far away from the realm in which the original problem lived. However, it has two key features:
• it is a potentially practical way to settle the question once and for all;
• it makes predictions that are similar, though not identical, to the Gauss prediction. Indeed it even suggests a secondary term to compensate for the overcount we saw in the data in the table in section 2.10.
Riemann's method is the basis of our main proof of the prime number theorem, and we shall spend this chapter giving a leisurely introduction to the key ideas in it. Let us begin by extracting the key prediction from Riemann's memoir and restating it in entirely elementary language:

lcm[1, 2, 3, ..., x] is about e^x.

Using data to test its accuracy we obtain:

x          Nearest integer to log(lcm[1, 2, 3, ..., x])    Difference
100        94                                              -6
1000       997                                             -3
10000      10013                                           13
100000     100052                                          57
1000000    999587                                          -413

Riemann's prediction can be expressed precisely and explicitly as

(7.1.1)    | log(lcm[1, 2, ..., x]) − x | ≤ 2 √x (log x)²    for all x ≥ 100.
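The entries in this table are easy to regenerate: as explained below, log lcm[1, 2, ..., x] equals Σ log p, summed over the prime powers p^k ≤ x, i.e. Chebyshev's function ψ(x). The following Python sketch (an added illustration) recomputes the table by a simple sieve.

from math import log

def psi(x):
    """psi(x) = log lcm[1,...,x] = sum over primes p <= x of k*log p, where p^k <= x < p^(k+1)."""
    sieve = [True] * (x + 1)
    total = 0.0
    for p in range(2, x + 1):
        if sieve[p]:
            for multiple in range(p * p, x + 1, p):
                sieve[multiple] = False
            k = 1
            while p ** (k + 1) <= x:
                k += 1
            total += k * log(p)
    return total

for x in (100, 1000, 10000, 100000, 1000000):
    v = psi(x)
    print(f"x = {x:7d}: nearest integer to log lcm = {round(v):7d}, difference {round(v) - x:+d}")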

1Indeed, Riemann’s memoir on this number-theoretic problem was a significant factor in the development of the theory of analytic functions, notably their global aspects.


Since the power of prime p which divides lcm[1, 2, 3, ··· , x] is precisely the largest power of p not exceeding x, we have that  Y   Y   Y  p × p × p × · · · = lcm[1, 2, 3, ··· , x]. p≤x p2≤x p3≤x Combining this with Riemann’s prediction and taking logarithms, we deduce that  X   X   X  log p + log p + log p + ··· is about x. p≤x p2≤x p3≤x The primes in the first sum here are precisely the primes counted by π(x), the primes in the second sum those counted by π(x1/2), and so on. By partial summation we deduce that Z x 1 1/2 1 1/3 dt π(x) + 2 π(x ) + 3 π(x ) + · · · ≈ = Li(x). 2 log t If we solve for π(x) in an appropriate way, we find the equivalent form

π(x) ≈ Li(x) − (1/2) Li(x^{1/2}) + ··· .

Hence Riemann's method yields more-or-less the same prediction as Gauss, but with something extra, a secondary term that will hopefully compensate for the overcount that we witnessed in the Gauss prediction. Reviewing the data (where the column headed "Riemann" refers to Li(x) − (1/2) Li(√x) − π(x), while the column headed "Gauss" refers to Li(x) − π(x) as before) we have:

x       #{primes ≤ x}                Gauss          Riemann
10^8    5761455                      753            131
10^9    50847534                     1700           -15
10^10   455052511                    3103           -1711
10^11   4118054813                   11587          -2097
10^12   37607912018                  38262          -1050
10^13   346065536839                 108970         -4944
10^14   3204941750802                314889         -17569
10^15   29844570422669               1052618        76456
10^16   279238341033925              3214631        333527
10^17   2623557157654233             7956588        -585236
10^18   24739954287740860            21949554       -3475062
10^19   234057667276344607           99877774       23937697
10^20   2220819602560918840          222744643      -4783163
10^21   21127269486018731928         597394253      -86210244
10^22   201467286689315906290        1932355207     -126677992
10^23   1925320391606803968923       7236148412     1034233576
10^24   18435599767349200867866      17146907278    -1657067862
10^25   176846309399143769411680     55160980939    -1830820885

Table 6: Primes up to various x, and Gauss’s and Riemann’s predictions. Riemann’s prediction does seem to fare better than that of Gauss, though not by a lot. But the fact that the error in Riemann’s prediction takes both positive and negative values suggests that this might be about the best that can be done. 7.2. LINKING NUMBER THEORY AND COMPLEX ANALYSIS 85

7.2. Linking number theory and complex analysis

Riemann showed that the number of primes up to x can be given in terms of the complex zeros of the function
$$\zeta(s) = \frac{1}{1^s} + \frac{1}{2^s} + \frac{1}{3^s} + \cdots,$$
studied by Euler, which we now call the Riemann zeta-function. In this definition s is a complex number, which we write as s = σ + it when we want to refer to its real and imaginary parts σ and t separately. If s were a real number, we would know from first-year calculus that the series in the definition of ζ(s) converges if and only if s > 1; that is, we can sum up the infinite series and obtain a finite, unique value. In a similar way, it can be shown that the series only converges for complex numbers s such that σ > 1. But what about when σ ≤ 1? How do we get around the fact that the series does not sum up (that is, converge)? As we discussed in section 6.7, one can "analytically continue" ζ(s) so that it is well-defined on the whole complex plane. More than that, ζ(s) − 1/(s−1) is analytic, so that ζ is meromorphic, indeed analytic other than its only pole at s = 1, which is a simple pole with residue 1.

Riemann showed that confirming Gauss's conjecture for the number of primes up to x is equivalent to gaining a good understanding of the zeros of the function ζ(s), so we now begin to sketch the key steps in the argument that link these seemingly unconnected topics. The starting point is to take the derivative of the logarithm of Euler's identity (2.2.1)

(7.2.1)  $\zeta(s) = \sum_{n\ge 1} \frac{1}{n^s} = \prod_{p\ \mathrm{prime}} \left(1 - \frac{1}{p^s}\right)^{-1},$

to obtain
$$-\frac{\zeta'(s)}{\zeta(s)} = \sum_{p\ \mathrm{prime}} \frac{\log p}{p^s - 1} = \sum_{p\ \mathrm{prime}}\sum_{m\ge 1} \frac{\log p}{p^{ms}}.$$
Perron's formula (6.6.2) allows one to describe a "step-function" in terms of a continuous function, so that if x is not a prime power then we obtain

(7.2.2)  $\Psi(x) := \sum_{\substack{p\ \mathrm{prime},\ m\ge 1\\ p^m\le x}} \log p = \sum_{p\ \mathrm{prime},\ m\ge 1} \log p\ \frac{1}{2\pi i}\int_{s:\,\mathrm{Re}(s)=c} \left(\frac{x}{p^m}\right)^{\!s} \frac{ds}{s} = -\frac{1}{2\pi i}\int_{s:\,\mathrm{Re}(s)=c} \frac{\zeta'(s)}{\zeta(s)}\,\frac{x^s}{s}\,ds.$

Here we can justify swapping the order of the sum and the integral if c is taken large enough, since then everything converges absolutely. Note that we are not counting the number of primes up to x but rather the "weighted" version, Ψ(x).

The next step is perhaps the most difficult. The idea is to replace the line Re(s) = c, along which the integral has been taken, by a line far to the left, on which we can show that the integral is small, in fact smaller the further we go to the left. The difference between the values along these two integrals is given by a sum of residues, as described in sections 6.5 and 6.6. Now for any meromorphic function f, the poles of f'(s)/f(s) are given by the zeros and poles of f, all of order 1, and the residue is simply the order of that zero, or minus the order of that pole. In this way we can obtain the following:

The explicit formula

(7.2.3)  $\Psi(x) = \sum_{\substack{p\ \mathrm{prime},\ m\ge 1\\ p^m\le x}} \log p = x - \sum_{\rho:\ \zeta(\rho)=0} \frac{x^\rho}{\rho} - \frac{\zeta'(0)}{\zeta(0)},$

where, if ρ is a zero of ζ(s) of order k, then there are k terms for ρ in the sum. One might ask how we add up the (possibly) infinite sum over zeros ρ of ζ(s)? Simple: add up in order of ascending |ρ| values and it will work out. It is hard to believe that there can be such a formula, an exact expression for the number of primes up to x in terms of the zeros of a complicated function. You can see why Riemann's work stretched people's imagination and had such an amazing impact.
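One can watch the formula converge numerically. The sketch below (ours, not from the notes) truncates the sum over zeros at the first 100 of them, using mpmath.zetazero to supply the zeros and the standard value ζ'(0)/ζ(0) = log 2π; the choice of x, the truncation point and the precision are all illustrative, and the script takes a few seconds to run.

```python
# Illustrative check of the explicit formula (7.2.3), truncated to 100 zeros.
from mpmath import mp, zetazero, log, mpf

mp.dps = 20
x = mpf(1000.5)                      # avoid prime powers, where Psi jumps

def psi(x):
    """Psi(x) computed directly from its definition."""
    xi = int(x)
    sieve = bytearray([1]) * (xi + 1)
    sieve[0:2] = b'\x00\x00'
    total = mpf(0)
    for p in range(2, xi + 1):
        if sieve[p]:
            sieve[p * p :: p] = bytearray(len(range(p * p, xi + 1, p)))
            q = p
            while q <= xi:
                total += log(p)
                q *= p
    return total

# x - sum over zeros of x^rho/rho - zeta'(0)/zeta(0) - (1/2) log(1 - 1/x^2);
# each zero rho = beta + i*gamma is paired with its conjugate, hence 2*Re(...).
approx = x - log(2 * mp.pi)
for n in range(1, 101):
    rho = zetazero(n)
    approx -= 2 * (x ** rho / rho).real
approx -= 0.5 * log(1 - 1 / x ** 2)  # total contribution of the trivial zeros

print("Psi(x)            =", psi(x))
print("truncated formula =", approx)
```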

7.3. The functional equation

We saw in section 6.7 how to analytically continue ζ(s) to all s for which Re(s) > 0. Riemann made an amazing observation which allows us to easily determine the values of ζ(s) on the left side of the complex plane (where the function is not naturally defined) in terms of the right side. The idea is to multiply ζ(s) through by some simple function so that this new product ξ(s) satisfies the functional equation

(7.3.1)  $\xi(s) = \xi(1-s)$  for all complex numbers s.

Riemann determined that this can be done by taking
$$\xi(s) := \tfrac12\,s(s-1)\,\pi^{-s/2}\,\Gamma\!\left(\frac{s}{2}\right)\zeta(s).$$
Here Γ(s) is the Gamma function which we encountered at the end of the last chapter; it has simple poles at s = 0, −1, −2, ... and is well-defined and analytic for all other s.

7.4. The zeros of the Riemann zeta-function

An analysis of the right side of (7.2.1) reveals that there are no zeros of ζ(s) with Re(s) > 1. Therefore, using (7.3.1) and (6.8.4), we deduce that the only zeros of ζ(s) with Re(s) < 0 are those at the negative even integers −2, −4, ..., the so-called trivial zeros. Thus to be able to use (7.2.3) we need to determine the zeros of ζ(s) inside the critical strip 0 ≤ Re(s) ≤ 1. After some calculation, Riemann made yet another extraordinary observation which, if true, would allow us tremendous insight into virtually every aspect of the distribution of primes:2

The Riemann Hypothesis

If ζ(s) = 0 with 0 ≤ Re(s) ≤ 1 then Re(s) = 1/2.

Clever people have computed literally billions of zeros of ζ(s),3 and every single zero inside the critical strip that has been computed does indeed have real part equal to 1/2. For example, the nontrivial zeros closest to the real axis are s = 1/2 + iγ₁ and s = 1/2 − iγ₁, where γ₁ ≈ 14.1347.... Note that if the Riemann Hypothesis is true, then we can write all the non-trivial zeros of the Riemann zeta-function in the form ρ = 1/2 + iγ (together with their conjugates 1/2 − iγ, since ζ(1/2 + iγ) = 0 if and only if ζ(1/2 − iγ) = 0), where γ is a positive real number.

We believe that the positive numbers γ occurring in the nontrivial zeros look more or less like random real numbers, in the sense that none of them are related to others by simple linear equations with integer coefficients (or even by more complicated polynomial equations with algebraic numbers as coefficients). However, nothing along these lines has ever been proved; indeed all we know how to do is to approximate these nontrivial zeros numerically to a given accuracy.

We will show that there are infinitely many zeros β + iγ of ζ(s) in the critical strip, indeed about $\frac{T}{2\pi}\log\frac{T}{2\pi e}$ with 0 ≤ γ ≤ T. It is not difficult to find all of the zeros up to a given height T. The Riemann Hypothesis can be shown to hold for at least forty percent of all zeros, and it fits nicely with many different heuristic assertions about the distribution of primes and other sequences, yet remains an unproved hypothesis, perhaps the most famous and tantalizing in all of mathematics.

2No reference to these calculations of Riemann appeared in the literature until Siegel discovered them in Riemann's personal, unpublished notes long after Riemann's death.
3At least the ten billion zeros of lowest height; that is, with |Im(s)| smallest.

7.5. Counting primes

At first sight it seems to make sense to use partial summation on (7.2.3) to get an exact expression for π(x), such as
$$\pi(x) + \tfrac12\,\pi(x^{1/2}) + \tfrac13\,\pi(x^{1/3}) + \cdots = \mathrm{Li}(x) - \sum_{\rho:\ \zeta(\rho)=0} \mathrm{Li}(x^\rho) + \mathrm{Small}(x) - \log 2,$$
where $\mathrm{Small}(x) = \int_x^\infty \frac{dt}{(t^3-t)\log t}$.4 However this is a lot more complicated than (7.2.3), so it will be easier to do partial summation at the end of our calculations rather than before.

Since ζ(s) has infinitely many zeros in the critical strip, (7.2.3) is a difficult formula to use in practice. Indeed we would not expect to use infinitely many sine waves from the formula (7.2.1) to approximate {x} − 1/2 in practice, but instead we would use a finite number of sine waves, as in our discussion there, presumably those with the largest amplitudes. Similarly we modify (7.2.3) to include only a finite number of zeros, in particular those up to a certain height T, that is in the box
$$B(T) := \{\rho:\ \zeta(\rho)=0,\ 0\le\mathrm{Re}(\rho)\le1,\ -T\le\mathrm{Im}(\rho)\le T\}.$$
This though is an approximation, not an exact formula, and so comes at the cost of an error term, which depends on the height T: For 1 ≤ T ≤ x we have5

(7.5.1)  $\Psi(x) = x - \sum_{\rho\in B(T)} \frac{x^\rho}{\rho} + O\!\left(\frac{x\log x\log T}{T}\right).$

4This expression appeared in Riemann's paper. The simpler expression (7.2.3) is due to von Mangoldt.
5The trivial zeros lie at −2, −4, −6, ... and so contribute $\sum_{m\ge1} \frac{1}{2m\,x^{2m}} = -\frac12\log\left(1 - \frac{1}{x^2}\right)$ in total to (7.2.3), which is O(1) for x ≥ 1.

Each term $x^\rho/\rho$ is a complex number and so consists of a magnitude and direction, and we might guess that there is considerable cancellation amongst these terms, resulting from the different directions in which they point. However we are unable to prove anything along these lines so, rather disappointingly, we simply bound each term in absolute value:

$$\left|\sum_{\rho\in B(T)} \frac{x^\rho}{\rho}\right| \le \sum_{\rho\in B(T)} \left|\frac{x^\rho}{\rho}\right| \le \max_{\rho\in B(T)} x^{\mathrm{Re}(\rho)} \sum_{\rho\in B(T)} \frac{1}{|\rho|} \ll x^{\beta(T)}\,(\log T)^2,$$
using the fact that there are about $\frac{T}{2\pi}\log\frac{T}{2e\pi}$ zeros in B(T) for all T, where β(T) is the largest real part of any zero in B(T).

The final step in proving the prime number theorem is therefore to produce zero-free regions for ζ(s): that is, regions of the complex plane, close to the line Re(s) = 1, without zeros of ζ(s). For instance in section 8.6 we show that we may take β(T) = 1 − c/log T for some constant c > 0. Therefore choosing T so that log T = (log x)^{1/2} we deduce that

$$\Psi(x) = x + O\!\left(x/e^{c'(\log x)^{1/2}}\right)$$
for some constant c' > 0, which implies the prime number theorem,
$$\pi(x) = \mathrm{Li}(x) + O\!\left(x/e^{c'(\log x)^{1/2}}\right).$$
One can see that any improvements in the zero-free region for ζ(s) will immediately imply improvements in the error term of the prime number theorem. For example if the Riemann Hypothesis is true, so that β(T) = 1/2 for all T, then by taking T = √x in (7.5.1) we obtain that Ψ(x) = x + O(x^{1/2}(log x)^2),6 and so

(7.5.2)  $\pi(x) = \int_2^x \frac{dt}{\log t} + O(\sqrt{x}\,\log x).$

We will show that this is not only implied by the Riemann Hypothesis, but actually implies the Riemann Hypothesis. With more care one can prove that the more precise bound
$$|\pi(x) - \mathrm{Li}(x)| \le \sqrt{x}\,\log x \quad\text{for all } x\ge 3,$$
is equivalent to the Riemann Hypothesis.

7.6. Riemann's revolutionary formula

Riemann's formula (7.2.3) is a little hard to appreciate at first glance. If we assume that the Riemann Hypothesis is true then each non-trivial zero can be written as 1/2 + iγ and therefore contributes $x^{1/2+i\gamma}/(\tfrac12+i\gamma)$. Now as we vary over zeros of ζ(s) the value of γ grows, and 1/2 + iγ will be dominated by the iγ. Therefore $x^{1/2+i\gamma}/(\tfrac12+i\gamma)$ is roughly $x^{1/2+i\gamma}/(i\gamma)$. Summing this together with the term for 1/2 − iγ, we get, roughly,
$$\frac{x^{1/2+i\gamma}}{i\gamma} - \frac{x^{1/2-i\gamma}}{i\gamma} = 2x^{1/2}\,\frac{\sin(\gamma\log x)}{\gamma}.$$
Combining this information, (7.2.3) becomes
$$\Psi(x)\ \text{ is roughly }\ x - 2x^{1/2}\sum_{\gamma>0:\ \zeta(\frac12+i\gamma)=0} \frac{\sin(\gamma\log x)}{\gamma} + O(1).$$

6Which is a weak form of (7.1.1) since Ψ(x) = log(lcm[1, 2, ..., x]).

We want to convert this into information about the number of primes up to x. If we proceed by partial summation then Ψ(x) should be replaced by π(x) + ½π(x^{1/2}) + ..., as in section 7.1, and x^{1/2} by x^{1/2}/log x. Therefore, after some re-arrangement,

(7.6.1)  $\frac{\displaystyle\int_2^x \frac{dt}{\log t} - \#\{\text{primes} \le x\}}{\sqrt{x}/\log x} \ \approx\ 1 + 2\sum_{\substack{\gamma>0 \text{ such that}\\ \frac12+i\gamma \text{ is a zero of } \zeta(s)}} \frac{\sin(\gamma\log x)}{\gamma}.$

The numerator of the left side of this formula is the overcount term when comparing Gauss's prediction Li(x) with the actual count π(x) for the number of primes up to x. The denominator, being roughly of size √x, corresponds to the magnitude of the overcount as we observed earlier in our data.

The right side of the formula bears much in common with our formula for {x} − 1/2. It is a sum of sine functions, with the numbers γ employed in two different ways in place of 2πn: each γ is used inside the sine (as the "frequency"), and the reciprocal of each γ forms the coefficient of the sine (as the "amplitude"). We even get the same factor of 2 in each formula. However, the numbers γ here are much more subtle than the straightforward numbers 2πn in the corresponding formula for {x} − 1/2.

This formula can perhaps be best paraphrased as

The primes should be counted as a sum of waves.

It should be noted that this formula is valid if and only if the Riemann Hypothesis is true – and is thus widely believed to be correct. There is a similar formula if the Riemann Hypothesis is false, but it is rather complicated and technically far less pleasant. The main difficulty arises because the coefficients, 1/γ, are replaced by functions of x. So we want the Riemann Hypothesis to hold because it leads to the simpler formula (7.6.1), and that formula is a delight. Indeed this formula is similar enough to the formulas for sound-waves for some experts to muse that (7.6.1) asserts that "the primes have music in them."

CHAPTER 8

The fundamental properties of ζ(s)

8.1. Representations of ζ(s)

Let us begin this section by noting that for Re(s) > 1 we have
$$\left(1 - \frac{2}{2^s}\right)\zeta(s) = \left(1 - \frac{2}{2^s}\right)\sum_{n\ge1}\frac{1}{n^s} = \sum_{n\ge1}\frac{1}{n^s} - 2\sum_{m\ge1}\frac{1}{(2m)^s} = \sum_{m\ge1}\left(\frac{1}{(2m-1)^s} - \frac{1}{(2m)^s}\right).$$
Just as in section 6.7 we find that with the terms grouped like this the right side converges for Re(s) > 0. This defines an analytic continuation for ζ(s) except perhaps where s − 1 is an integer multiple of 2πi/log 2 (the points where the factor 1 − 2^{1−s} vanishes). Bounding each term above by $|s|\int_{2m-1}^{2m} t^{-1-\mathrm{Re}(s)}\,dt$ yields that the right side is ≤ |s|/Re(s). Another approach is given in exercise 3.1.2.

Exercise 8.1.1. a) Use exercise 3.1.2 to prove that
$$\lim_{s\to1}\left(\zeta(s) - \frac{1}{s-1}\right) = \lim_{s\to1}\left(\frac{\zeta'(s)}{\zeta(s)} + \frac{1}{s-1}\right) = \gamma.$$
b) Prove that $\zeta(\bar{s}) = \overline{\zeta(s)}$; and deduce that if ζ(σ + it) = 0 then ζ(σ − it) = 0.
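The grouped series really can be used to evaluate ζ(s) to the right of the imaginary axis. The sketch below (ours) sums the pairs directly and compares with mpmath.zeta as a reference; the truncation point M is an arbitrary choice, and since the series converges slowly this is only a rough check.

```python
# Rough check of the continuation (1 - 2^{1-s}) zeta(s) = sum ((2m-1)^{-s} - (2m)^{-s}).
from mpmath import zeta as mp_zeta

def zeta_from_pairs(s, M=200000):
    """Valid for Re(s) > 0; converges slowly, so M is large."""
    total = 0j
    for m in range(1, M + 1):
        total += (2 * m - 1) ** (-s) - (2 * m) ** (-s)
    return total / (1 - 2 ** (1 - s))

for s in (2 + 0j, 0.25 + 3j, 0.5 + 14.134725j):
    print(s, zeta_from_pairs(s), complex(mp_zeta(s)))
```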

8.2. A functional equation

Lemma 8.2.1. For any a ∈ R and x > 0 we have

(8.2.1)  $\sum_{n\in\mathbb{Z}} e^{-\pi(n+a)^2/x} = \sqrt{x}\sum_{n\in\mathbb{Z}} e^{-\pi n^2 x - 2i\pi na}.$

Proof. By (6.3.2) we have, taking t = xu − a,
$$\sum_{n\in\mathbb{Z}} e^{-\pi(n+a)^2/x} = \sum_{m\in\mathbb{Z}} \int_{-\infty}^{\infty} e^{-\pi(t+a)^2/x + 2i\pi mt}\,dt = x\sum_{m\in\mathbb{Z}} e^{-\pi(xm^2 + 2ima)} \int_{-\infty}^{\infty} e^{-\pi x(u-im)^2}\,du.$$
If we change variables v = u − im in the final integral we are integrating the function $e^{-\pi x v^2}$ from −∞ to ∞ along a path shifted a little bit up or down. The value of the integral does not change since there are no singularities of this function, so its value is $\int_{-\infty}^{\infty} e^{-\pi x v^2}\,dv = C/\sqrt{x}$, letting $w = \sqrt{x}\,v$, where $C := \int_{-\infty}^{\infty} e^{-\pi w^2}\,dw$. This gives (8.2.1) with the right side multiplied through by C; taking a = 0, x = 1 we deduce that C = 1 and hence our result.
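Both sides of (8.2.1) decay extremely fast, so the identity is easy to verify numerically. A short sketch (ours; the test values of a, x and the truncation are illustrative):

```python
# Numerical sanity check of the theta transformation formula (8.2.1).
from math import cos, exp, pi, sqrt

def lhs(a, x, N=50):
    return sum(exp(-pi * (n + a) ** 2 / x) for n in range(-N, N + 1))

def rhs(a, x, N=50):
    # pairing n with -n shows the right side is real, leaving a cosine series
    return sqrt(x) * (1 + 2 * sum(exp(-pi * n * n * x) * cos(2 * pi * n * a)
                                  for n in range(1, N + 1)))

for a, x in [(0.3, 1.7), (0.0, 0.5), (0.25, 2.0)]:
    print(a, x, lhs(a, x), rhs(a, x))
```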


If we differentiate (8.2.1) with respect to a we obtain

(8.2.2)  $\sum_{n\in\mathbb{Z}} (n+a)\,e^{-\pi(n+a)^2/x} = i\,x^{3/2} \sum_{n\in\mathbb{Z}} n\,e^{-\pi n^2 x - 2i\pi na}.$

8.3. A functional equation for the Riemann zeta function

Suppose that Re(s) > 1. Writing $\omega(x) := \sum_{n\ge1} e^{-\pi n^2 x}$, we obtain from (8.2.1) with a = 0 that $2\omega(1/x) + 1 = \sqrt{x}\,(2\omega(x)+1)$. Therefore
$$\int_0^1 x^{\frac{s}{2}-1}\omega(x)\,dx = \int_1^\infty x^{-\frac{s}{2}-1}\omega(1/x)\,dx = \int_1^\infty x^{-\frac{s}{2}-1}\left(\frac{\sqrt{x}-1}{2} + \sqrt{x}\,\omega(x)\right)dx$$
$$= \frac{1}{s-1} - \frac{1}{s} + \int_1^\infty x^{-\frac{s+1}{2}}\omega(x)\,dx = \frac{1}{s(s-1)} + \int_1^\infty x^{\frac{1-s}{2}-1}\omega(x)\,dx.$$
Hence by (7.8.3) we obtain
$$\pi^{-\frac{s}{2}}\Gamma\!\left(\frac{s}{2}\right)\zeta(s) = \sum_{n\ge1}\int_0^\infty x^{\frac{s}{2}-1} e^{-\pi n^2 x}\,dx = \int_0^\infty x^{\frac{s}{2}-1}\omega(x)\,dx$$

(8.3.1)  $= -\frac{1}{s(1-s)} + \int_1^\infty \left(x^{\frac{s}{2}} + x^{\frac{1-s}{2}}\right)\omega(x)\,\frac{dx}{x}.$

This equation is important for two reasons. Firstly, since ω(x) gets small very rapidly as x gets larger, we see that the integral on the right of (8.3.1) converges for all s, not just those with Re(s) > 1. Thus this formula provides an analytic continuation of $\Gamma(\frac{s}{2})\zeta(s)$ except at the points s = 0, 1 where we get poles of order 1. Moreover, one can see that (8.3.1) remains unchanged if we replace s by 1 − s. A convenient way to write this information is to

Define $\xi(s) := \tfrac12\,s(s-1)\,\pi^{-\frac{s}{2}}\,\Gamma\!\left(\frac{s}{2}\right)\zeta(s)$,

so that ξ(s) is analytic and satisfies the functional equation

(8.3.2)  $\xi(s) = \xi(1-s).$
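The functional equation (8.3.2) is easy to test numerically once ζ and Γ are available; a hedged sketch using mpmath (the precision and the three test points are our own arbitrary choices):

```python
# Numerical check of xi(s) = xi(1 - s), using mpmath for Gamma and zeta.
from mpmath import mp, mpc, gamma, zeta, pi

mp.dps = 25

def xi(s):
    # xi(s) = (1/2) s (s-1) pi^(-s/2) Gamma(s/2) zeta(s)
    return 0.5 * s * (s - 1) * pi ** (-s / 2) * gamma(s / 2) * zeta(s)

for s in (mpc(0.3, 7.2), mpc(2.5, -1.0), mpc(-3.7, 0.4)):
    print(s, xi(s), xi(1 - s))
```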

Now $\frac{s}{2}\Gamma(\frac{s}{2})$ has no zeros (see section 7.9), and so the poles of (s − 1)ζ(s) are the same as those of ξ(s) (of which there are none). Therefore the only pole of ζ(s) lies at s = 1; and if Re(s) < 0 then ζ(s) has trivial zeros at s = −2, −4, −6, ...

Exercise 8.3.1. Use (6.8.5) to rewrite (8.3.2) as
$$\zeta(1-s) = 2^{1-s}\pi^{-s}\cos\!\left(\frac{\pi s}{2}\right)\Gamma(s)\,\zeta(s).$$

8.4. Properties of ξ(s)

Using the bound obtained on ζ(s) in section 8.1 and Stirling's formula we see that log|ξ(s)| ∼ |s| log|s| for Re(s) ≥ 1/2, as |s| → ∞, provided ζ(s) ≠ 0. We also get this bound in Re(s) ≤ 1/2 using the functional equation (8.3.2). Therefore ξ(s) is an analytic function of order 1 and so we can write

(8.4.1)  $\xi(s) = e^{As+B} \prod_{\rho:\ \xi(\rho)=0} \left(1 - \frac{s}{\rho}\right) e^{s/\rho}$

by (6.4.3). The zeros of ξ(s) are precisely the non-trivial zeros of ζ(s); that is, the zeros in the critical strip 0 ≤ Re(s) ≤ 1, the others having been cancelled by the poles of Γ(s/2). From section 6.4 we know that $\sum_{\rho:\ \xi(\rho)=0} 1/|\rho|^{1+\epsilon}$ converges for every ε > 0. However $\sum_{\rho:\ \xi(\rho)=0} 1/|\rho|$ must diverge, else we would have the bound log|ξ(s)| ≪ |s|.1 Taking the logarithmic derivative of (8.4.1) gives

(8.4.2)  $\frac{\xi'(s)}{\xi(s)} = A + \sum_{\rho:\ \xi(\rho)=0} \left(\frac{1}{s-\rho} + \frac{1}{\rho}\right),$

so that, by (6.8.5) we obtain, noting that the zeros of ζ(s) are precisely those of ξ(s) together with the trivial zeros −2, −4, −6, ...,

(8.4.3)  $\frac{\zeta'(s)}{\zeta(s)} = \frac{1}{1-s} + A' + \sum_{\rho:\ \zeta(\rho)=0} \left(\frac{1}{s-\rho} + \frac{1}{\rho}\right),$

where $A' = A + \frac{\gamma}{2} + \frac12\log\pi$. By (8.3.2), we have $\frac{\xi'(s)}{\xi(s)} + \frac{\xi'(1-s)}{\xi(1-s)} = 0$, and if ξ(ρ) = 0 then ξ(1 − ρ) = 0; hence, by (8.4.2),
$$0 = 2A + \sum_{\rho:\ \xi(\rho)=0}\left(\frac{1}{s-\rho}+\frac{1}{\rho}\right) + \sum_{\rho':\ \xi(\rho')=0}\left(\frac{1}{1-s-\rho'}+\frac{1}{\rho'}\right) = 2A + 2\sum_{\rho:\ \xi(\rho)=0}\frac{1}{\rho},$$
adding the terms 1/(s − ρ) and 1/(1 − s − ρ') where ρ' = 1 − ρ. Therefore

(8.4.4)  $A = -\sum_{\rho:\ \xi(\rho)=0} \frac{1}{\rho}.$

We have seen that this sum does not converge absolutely, but if we pair up the ρ and 1 − ρ terms, or the ρ and ρ̄ terms, then it does. Moreover if ρ = β + iγ then Re(1/ρ + 1/ρ̄) = 2β/(β² + γ²), so A is a negative real number.

Exercise 8.4.1. In this exercise we evaluate A and B in (8.4.1).
a) Use (6.8.6) to show that Γ(1/2) = π^{1/2}, and deduce, using the definition of ξ, that $e^B = \xi(0) = \xi(1) = 1/2$.
b) Use (8.4.2), the functional equation, and exercises 6.8.4 and 8.1.1 to show that $A = \xi'(0)/\xi(0) = -\xi'(1)/\xi(1) = \tfrac12\log 4\pi - 1 - \tfrac{\gamma}{2} = -0.0230957084\ldots$. Deduce that $A' = \log 2\pi - 1 = 0.837877\ldots$.
c) Deduce, using (8.4.4), that if ξ(ρ) = 0 with Re(ρ) ≥ 1/2 then |ρ| ≥ 6.580128218....

8.5. A zero-free region for ζ(s)

We begin by proving that ζ(1 + it) ≠ 0 for all real t. This was the final step in the proof of the prime number theorem in 1896, and the proof is quite beautiful. Starting from the Euler product we have
$$\log\zeta(\sigma+it) = -\sum_{p\ \mathrm{prime}} \log\left(1 - \frac{1}{p^{\sigma+it}}\right) = \sum_{p\ \mathrm{prime}}\sum_{m\ge1} \frac{1}{m\,p^{m(\sigma+it)}}$$
for σ > 1, so that

(8.5.1)  $\log|\zeta(\sigma+it)| = \mathrm{Re}\,(\log\zeta(\sigma+it)) = \sum_{p\ \mathrm{prime}}\sum_{m\ge1} \frac{\cos(mt\log p)}{m\,p^{m\sigma}}.$

1Note that this implies that ζ(s) has infinitely many zeros in the critical strip.

Now if ζ(1 + it) = 0 then (8.5.1) yields that the cos(mt log p) have a bias as we vary over prime powers p^m, pointing significantly more often in the negative than positive direction. But this implies that cos(2mt log p) should point significantly more often in the positive than negative direction, so that ζ(1 + 2it) is unbounded, which we know is impossible. The proof (of Mertens) that we now give formalizes this heuristic.

The first thing to notice is that for any θ,
$$3 + 4\cos\theta + \cos 2\theta = 2(1+\cos\theta)^2 \ge 0$$
(since cos 2θ = 2cos²θ − 1), so that $3\log|\zeta(\sigma)| + 4\log|\zeta(\sigma+it)| + \log|\zeta(\sigma+2it)| \ge 0$ by (8.5.1), and hence

(8.5.2)  $\zeta(\sigma)^3\cdot|\zeta(\sigma+it)|^4\cdot|\zeta(\sigma+2it)| \ge 1.$

Now assume that ζ(1 + it) = 0, so that ζ(σ + it) ∼ C(σ − 1)^r for some integer r ≥ 1 and constant C ≠ 0, as σ → 1⁺. We also know that ζ(σ) ∼ 1/(σ − 1) as σ → 1⁺. But then (8.5.2) implies that there exists ε > 0 such that if |σ − 1| < ε then
$$|\zeta(\sigma+2it)| \ge \frac{1}{2C^4(\sigma-1)^{4r-3}} \ge \frac{1}{2C^4(\sigma-1)}.$$
This implies that 1 + 2it is a pole of ζ(s), giving a contradiction.

We can extend this proof to obtain a zero-free region for ζ(s), that is a region of the complex plane without zeros of ζ(s). Now, by (8.4.3) and exercise 6.8.5, we have

(8.5.3)  $\frac{\zeta'(s)}{\zeta(s)} = \frac{1}{1-s} - \log|s| + O(1) + \sum_{\substack{\rho:\ \zeta(\rho)=0\\ 0\le\mathrm{Re}(\rho)\le1}} \left(\frac{1}{s-\rho} + \frac{1}{\rho}\right).$

Now Re(1/ρ) > 0 as Re(ρ) > 0, and so if Re(s) ≥ 1 > Re(ρ) then Re(1/(s − ρ)) ≥ 0. Suppose that s = σ + it where σ > 1, so that Re(ζ'(s)/ζ(s)) ≥ Re(1/(1 − s)) − log|s| + O(1); and if ζ(β + it) = 0 for some β < 1 then we can add a 1/(σ − β) to the lower bound. Next we again use the cosine inequality, this time with the series
$$-\mathrm{Re}\,\frac{\zeta'(s)}{\zeta(s)} = \sum_{p\ \mathrm{prime}}\sum_{m\ge1} \log p\ \frac{\cos(mt\log p)}{p^{m\sigma}},$$
so that

(8.5.4)  $0 \le -3\,\mathrm{Re}\,\frac{\zeta'(\sigma)}{\zeta(\sigma)} - 4\,\mathrm{Re}\,\frac{\zeta'(\sigma+it)}{\zeta(\sigma+it)} - \mathrm{Re}\,\frac{\zeta'(\sigma+2it)}{\zeta(\sigma+2it)}.$

Assuming that ζ(β + it) = 0 and σ is close to 1, this is

(8.5.5)  $\le \frac{3}{\sigma-1} - \frac{4}{\sigma-\beta} + 5\log(|t|+2) + O(1)$

since Re(1/(σ + it − 1)) ≤ 1/|t − 1| ≪ 1. Selecting σ = 1 + 1/(10 log(|t| + 2)), we deduce that

(8.5.6)  $\beta \le 1 - \frac{1}{70\log(|t|+2)} + O\!\left(\frac{1}{(\log(|t|+2))^2}\right).$

Since there are no zeros near to σ = 1 (by exercise 8.4.1c), one can prove by these methods the more convenient
$$\beta \le 1 - \frac{1}{71\log(|t|+2)}.$$

In 1922, Littlewood enlarged the width of the zero-free region to ≫ log log|t|/log|t|, and in 1958 Korobov and Vinogradov enlarged it to ≫ 1/(log|t|)^{2/3+ε}, a central result that has not been improved in fifty years.

8.6. Approximations to ζ'(s)/ζ(s)

Following (8.4.4) and (8.5.3) we have

(8.6.1)  $\sum_{\substack{\rho:\ \zeta(\rho)=0\\ 0\le\mathrm{Re}(\rho)\le1}} \frac{1}{s-\rho} = \frac{\zeta'(s)}{\zeta(s)} + \frac{1}{s-1} + \log|s| + O(1).$

The right side equals log T + O(1) when s = 2 + iT for large T. For 0 ≤ β ≤ 1, the real part of 1/(2 + iT − (β + iγ)) is (2 − β)/((2 − β)² + (T − γ)²) ≥ 1/(4 + (T − γ)²), and so

(8.6.2)  $\sum_{\substack{\rho=\beta+i\gamma:\ \zeta(\rho)=0\\ 0\le\beta\le1}} \frac{1}{4 + (T-\gamma)^2} \le \log T + O(1).$

We deduce that there are ≤ 8 log T + O(1) zeros β + iγ for which |T − γ| ≤ 2. Now take (8.6.1) with s = σ + iT (which is not a zero of ζ(s)) and σ bounded, and subtract (8.6.1) with s = 2 + iT. The terms corresponding to a zero ρ give

$$\left|\frac{1}{s-\rho} - \frac{1}{2+iT-\rho}\right| = \frac{2-\sigma}{|s-\rho|\cdot|2+iT-\rho|} \le \frac{2-\sigma}{|T-\gamma|^2}.$$
We will use this bound when |T − γ| ≥ 2; and note that 1/|2 + iT − ρ| ≤ 1 for all such ρ, by considering the real part. Therefore for s = σ + iT we deduce from (8.6.2) that

(8.6.3)  $\left|\frac{\zeta'(s)}{\zeta(s)} - \sum_{\substack{\rho:\ \zeta(\rho)=0\\ |T-\gamma|\le2}} \frac{1}{s-\rho}\right| \le \sum_{\substack{\rho:\ \xi(\rho)=0\\ |T-\gamma|\le2}} \frac{1}{|2+iT-\rho|} + \sum_{\substack{\rho:\ \xi(\rho)=0\\ |T-\gamma|\ge2}} \frac{2(2-\sigma)}{4+|T-\gamma|^2} + O(1) \le (12-2\sigma)\log T + O(1).$

Suppose that we select s which is not too close to any zero of ζ(s), that is, |s − ρ| ≫ 1/log T for every ρ such that ζ(ρ) = 0. Then the contribution of the sum on the left side of (8.6.3) is ≪ (log T)², as the sum contains ≪ log T terms, and so we can deduce that for |σ| ≤ 2

(8.6.4)  $\frac{\zeta'(\sigma+it)}{\zeta(\sigma+it)} \ll (\log(|t|+2))^2.$

If σ ≤ −1 we can do better by using the functional equation as presented in exercise 8.3.1. Thus if s = 1 − (σ + it) then

(8.6.5)  $-\frac{\zeta'(1-s)}{\zeta(1-s)} = -\log(2\pi) - \frac{\pi}{2}\tan\!\left(\frac{\pi s}{2}\right) + \frac{\Gamma'(s)}{\Gamma(s)} + \frac{\zeta'(s)}{\zeta(s)} = \frac{1}{s-2m-1} + \log|s| + O(1)$

using exercise 6.8.5, where 2m is the even integer nearest to −σ.

8.7. On the number of zeros of ζ(s)

We know that ζ(s) has the same zeros in the critical strip as ξ(s); and ξ(s) has the advantage that it is analytic. Therefore the number of zeros, N(T), of ζ(s) inside C := {s : 0 ≤ Re(s) ≤ 1, 0 ≤ Im(s) ≤ T} is given by

(8.7.1)  $N(T) = \frac{1}{2i\pi}\oint_C \frac{\xi'(s)}{\xi(s)}\,ds = \frac{1}{2\pi}\,\Delta_C(\arg\xi(s))$

by the argument principle as discussed in section 6.5, so long as there are no zeros on C: We showed in section 8.5 that ξ(s) has no zeros with Re(s) = 1, which implies via the functional equation (8.3.2) that ζ(s) has no zeros with Re(s) = 0. By exercise 8.4.1c there are no zeros in this region with Im(s) = 0 (or even small). We need only to make sure that there is no zero with Im(s) = T.

Now, from (8.3.2) we know that $\xi(s) = \xi(1-s) = \overline{\xi(1-\bar{s})}$; in particular $\xi(\sigma+it) = \overline{\xi(1-\sigma+it)}$; and so the change of argument as we proceed along the path P which goes from 1/2 to 1, then 1 to 1 + iT, and then 1 + iT to 1/2 + iT, is the same as when we proceed around the rest of C. Moreover ξ(s) is real-valued (by definition) and positive for −1 ≤ s ≤ 2 since it has no zeros close to 0, hence there is no change in arg(ξ(s)) as we go along this line. We have therefore proved that N(T) equals 1/π times the change in argument of ξ(s) along the path L which goes from 1 to 1 + iT, and then from 1 + iT to 1/2 + iT.

For the next part of the calculation it is easiest if we widen C, to allow −1 ≤ Re(s) ≤ 2: this does not change the value of (8.7.1) since there are no further zeros of ξ(s) in this region, nor any of the arguments above. By definition
$$\arg(\xi(s)) = \arg(s) + \arg(s-1) - \frac{\log\pi}{2}\,\mathrm{Im}(s) + \arg\!\left(\Gamma\!\left(\frac{s}{2}\right)\right) + \arg(\zeta(s)).$$
Now the arguments of both s and s − 1 change from 0 to π/2 + O(1/T). Stirling's formula (see exercise 8.7.1 below) tells us that arg(Γ(s/2)) changes from 0 to $\frac{T}{2}\log\frac{T}{2e} - \frac{\pi}{8} + O\!\left(\frac1T\right)$. Therefore

(8.7.2)  $N(T) = \frac{T}{2\pi}\log\frac{T}{2e\pi} + \frac{7}{8} + S(T) + O\!\left(\frac1T\right),$

where $S(T) := \frac{1}{\pi}\arg\zeta(\tfrac12 + iT)$ (since arg ζ(2) = 0). By exercise 8.7.4 we see that arg ζ(2 + iT) is bounded, and so, using (8.6.3),
$$\pi S(T) = \arg\zeta(\tfrac12+iT) - \arg\zeta(2+iT) + O(1) = -\,\mathrm{Im}\int_{\frac12+iT}^{2+iT} \frac{\zeta'(s)}{\zeta(s)}\,ds + O(1)$$
$$= -\sum_{\substack{\rho:\ \zeta(\rho)=0\\ |T-\gamma|\le2}} \mathrm{Im}\int_{\frac12+iT}^{2+iT} \frac{ds}{s-\rho} + O(\log T) = -\sum_{\substack{\rho:\ \zeta(\rho)=0\\ |T-\gamma|\le2}} \left(\arg(\tfrac12+iT-\rho) - \arg(2+iT-\rho)\right) + O(\log T).$$
Evidently each such change in argument contributes at most π, and we have seen that the sum has ≪ log T terms, and so we deduce that

(8.7.3)  $S(T) \ll \log T,$

which implies that

(8.7.4)  $N(T) = \frac{T}{2\pi}\log\frac{T}{2e\pi} + O(\log T).$

We deduce, by partial summation, that

(8.7.5)  $\sum_{\substack{\rho:\ \zeta(\rho)=0\\ 0<\gamma\le T}} \frac{1}{|\rho|} = \frac{1}{2\pi}\log^2\!\left(\frac{T}{2\pi}\right) + O(1).$
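Formula (8.7.2) is already very accurate at small heights. The sketch below (ours) counts the zeros up to T = 100 with mpmath.zetazero and compares the count with the main term of (8.7.2), ignoring S(T); the height and the neglect of S(T) are our own illustrative choices.

```python
# Compare a direct count of zeros up to height T with the main term of (8.7.2).
from mpmath import mp, zetazero, log, pi, e, mpf

mp.dps = 15
T = 100

count = 0
while zetazero(count + 1).imag <= T:    # zeros with 0 < Im(rho) <= T
    count += 1

main = T / (2 * pi) * log(T / (2 * e * pi)) + mpf(7) / 8
print("N(T) by direct count :", count)
print("main term of (8.7.2) :", main)
```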

Exercises

Exercise 8.7.1. Use (6.8.5) and the Taylor series for log(1 + z) to show that
$$-4\log\Gamma\!\left(\frac14 + i\,\frac{T}{2}\right) = \pi T + \log\frac{T}{2e} + 1 - 2\log(2\pi) + i\left(\frac{\pi}{2} - 2T\log\frac{T}{2e}\right) + O\!\left(\frac1T\right).$$

Exercise 8.7.2. Use (8.6.2) to show that N(T + 1) − N(T) ≪ log T. Use this together with (8.7.2) to show that N(T + 1) − N(T) ≫ log T for at least a positive proportion of integers T. Deduce also that there exists t ∈ [T, T + 1] which is at a distance ≫ 1/log T from the nearest zero.

Exercise 8.7.3. Show that there exists a constant Δ₀ such that if Δ ≥ Δ₀ then N(T + Δ) − N(T) ≫ Δ log T for all sufficiently large T.

Exercise 8.7.4. Prove that the argument of ζ(2 + it) is bounded.

CHAPTER 9

The proof of the Prime Number Theorem

9.1. The explicit formula

In chapter 3 we explained that one can work with the weighted sum
$$\Psi(x) := \sum_{\substack{p\ \mathrm{prime},\ m\ge1\\ p^m\le x}} \log p.$$
The reason that we prefer this to π(x) is that when we use Perron's formula (6.6.2), we obtain (provided x is not a prime power) the rather elegant formula

(9.1.1)  $\Psi(x) = \frac{1}{2i\pi}\int_{c-i\infty}^{c+i\infty} \sum_{p\ \mathrm{prime},\ m\ge1} \frac{\log p}{p^{ms}}\,\frac{x^s}{s}\,ds = \frac{1}{2i\pi}\int_{c-i\infty}^{c+i\infty} \left(-\frac{\zeta'(s)}{\zeta(s)}\right)\frac{x^s}{s}\,ds,$

provided c > 1, so that we can justify swapping the order of summation and integration. We now move the contour away to the left, to Re(s) = −N for some large odd integer N. What are the poles of the integrand? For each zero and pole of ζ(s) with −N < Re(s) < c we have a pole of order 1, as well as a pole at s = 0. So the contribution of the pole at s = 0 is −ζ'(0)/ζ(0) and of the pole of ζ(s) at s = 1 is x. If ρ is a zero of ζ(s) of order m then its contribution is −m x^ρ/ρ. Hence we obtain
$$\Psi(x) = x - \sum_{\substack{\rho:\ \zeta(\rho)=0\\ \mathrm{Re}(\rho)>-N}} \frac{x^\rho}{\rho} - \frac{\zeta'(0)}{\zeta(0)} + \frac{1}{2i\pi}\int_{-N-i\infty}^{-N+i\infty} \left(-\frac{\zeta'(s)}{\zeta(s)}\right)\frac{x^s}{s}\,ds,$$
where we count a zero ρ of ζ(s) of multiplicity m, m times in the sum. Now, the integrand can be proved to go to 0 as N → ∞. Moreover the zero ρ = −2m contributes 1/(2m x^{2m}), which gives a total of −½ log(1 − 1/x²) when we sum up over m from 1 to ∞. Hence we have:


The explicit formula

(9.1.2)  $\Psi(x) = x - \sum_{\substack{\rho:\ \zeta(\rho)=0\\ 0\le\mathrm{Re}(\rho)\le1}} \frac{x^\rho}{\rho} - \frac{\zeta'(0)}{\zeta(0)} - \frac12\log\left(1 - \frac{1}{x^2}\right).$

This amazing exact formula gives the number of primes up to x, a discontinuous function, as a sum of many continuous functions. As beautiful as this formula is, it has the drawback that it involves infinitely many terms, so it is far from clear how to manipulate such a sum. It is more convenient for us to truncate the sum at a height T, that is, to consider only those zeros β + iγ inside C = {s : 0 ≤ Re(s) ≤ 1, −T ≤ Im(s) ≤ T}; and we have a spectacularly accurate estimate for the number of zeros inside this box thanks to (8.7.2). To work with such a truncated sum we must use (the technically complicated) Proposition 6.6.1 rather than (the rather more elegant) (6.6.2). Proceeding analogously to the above we obtain that
$$\left|\Psi(x) - \frac{1}{2i\pi}\int_{c-iT}^{c+iT} \left(-\frac{\zeta'(s)}{\zeta(s)}\right)\frac{x^s}{s}\,ds\right| \ll \sum_{p\ \mathrm{prime},\ m\ge1} \frac{\log p}{1 + T|\log(x/p^m)|}\left(\frac{x}{p^m}\right)^{\!c}.$$
If |log(x/p^m)| ≤ 1 then let d be the closest integer to x − p^m, so that |log(x/p^m)| = |log(1 + (p^m − x)/x)| ≫ |d|/x. Therefore the right side here is, taking c = 1 + 1/log x,
$$\ll \sum_{p\ \mathrm{prime},\ m\ge1} \frac{\log p}{1+T}\left(\frac{x}{p^m}\right)^{\!c} + \sum_{0\le|d|\le x} \frac{\log x}{1 + T|d|/x} \ll \frac{x^c}{T}\left(-\frac{\zeta'(c)}{\zeta(c)}\right) + \frac{x\log x\log T}{T} + \log x \ll \frac{x\log x\log T}{T} + \log x.$$
Next we wish to evaluate the integral, and the idea will be to make a closed contour, namely the rectangle with vertices at c − iT, c + iT, −N + iT, −N − iT; the integral around this contour is the sum of the contributions of the residues inside the contour, namely
$$x - \sum_{\substack{\rho:\ \zeta(\rho)=0\\ |\mathrm{Im}(\rho)|<T,\ \mathrm{Re}(\rho)>-N}} \frac{x^\rho}{\rho} - \frac{\zeta'(0)}{\zeta(0)} + \sum_{\substack{m\ge1\\ 2m<N}} \frac{1}{2m\,x^{2m}}.$$

Combining these estimates and letting N → ∞ we obtain

(9.1.3)  $\Psi(x) = x - \sum_{\substack{\rho:\ \zeta(\rho)=0\\ 0\le\mathrm{Re}(\rho)\le1,\ |\mathrm{Im}(\rho)|\le T}} \frac{x^\rho}{\rho} - \frac{\zeta'(0)}{\zeta(0)} - \frac12\log\left(1 - \frac{1}{x^2}\right) + O\!\left(\frac{x\log x\log T}{T} + \log x\right).$

9.2. Proving the prime number theorem

Our objective is to prove that Ψ(x) ∼ x. Therefore we want to pick T to be somewhat bigger than (log x)² so that the error term in (9.1.4) is o(x). This leaves us with a sum over the zeros of the Riemann zeta-function, and this is
$$\left|\sum_{\substack{\rho:\ \zeta(\rho)=0\\ 0<\gamma\le T}} \frac{x^\rho}{\rho}\right| \le \sum_{\substack{\rho:\ \zeta(\rho)=0\\ 0<\gamma\le T}} \frac{x^{\mathrm{Re}(\rho)}}{|\rho|} \le x^{1-1/(71\log T)}\left(\frac{(\log T)^2}{2\pi} + O(1)\right)$$

by (8.5.6) for T sufficiently large, and (8.7.5). Now, selecting $T = \exp(\tfrac19(\log x)^{1/2})$ and inserting the last estimate into (9.1.4) we obtain

(9.2.1)  $\Psi(x) = x + O\!\left(x\,e^{-\frac{1}{10}(\log x)^{1/2}}\right).$
By partial summation one can deduce that

(9.2.2)  $\pi(x) = \mathrm{Li}(x) + O\!\left(x\,e^{-\frac{1}{10}(\log x)^{1/2}}\right).$

9.3. Assuming the Riemann Hypothesis

If we now assume that Re(ρ) ≤ λ for some fixed λ ∈ [1/2, 1) then the sum in (9.1.4) is ≪ x^λ(log T)². Taking T = x we deduce that

(9.3.1)  $\Psi(x) = x + O\!\left(x^{\lambda}(\log x)^2\right) \quad\text{and}\quad \pi(x) = \mathrm{Li}(x) + O\!\left(x^{\lambda}\log x\right).$

We can take λ = 1/2 if the Riemann Hypothesis is true. Surprisingly we can also prove a converse result. If Re(s) > 1 then

(9.3.2)  $-\frac{\zeta'(s)}{\zeta(s)} = \sum_{p\ \mathrm{prime},\ m\ge1} \frac{\log p}{p^{ms}} = \frac{s}{s-1} + s\int_1^\infty \frac{\Psi(x) - x}{x^{s+1}}\,dx.$

If (9.3.1) holds then the right side of (9.3.2) converges for Re(s) > λ. This implies that ζ(s) has no zeros with Re(s) > λ else the left side would diverge at that point. Therefore

Theorem 9.3.1

We have the following equivalence: Ψ(x) = x + O(x^{1/2}(log x)²) if and only if the Riemann Hypothesis is true.

9.4. Primes in short intervals, via zeta functions

Exercise 9.4.1. Assuming the Riemann Hypothesis, deduce the following statements from Theorem 9.3.1:

(i) For any fixed ε > 0 there exists a constant κ_ε such that if x ≥ 1 then every interval (x, x + κ_ε x^{1/2+ε}] contains a prime number.
(ii) Prove that if n is sufficiently large then there is a prime between n³ and (n + 1)³.
(iii) Prove that for any fixed k ≥ 3, there is a prime between any two consecutive, sufficiently large, kth powers.

This leaves open the famous conjecture that there is a prime between any two consecutive squares. One can attempt to prove (11.1.1) unconditionally by modifying the methods used in the proof of the prime number theorem. This leads to having to bound differences
$$\left|\frac{(x+y)^\rho - x^\rho}{\rho}\right| = \left|\int_x^{x+y} t^{\rho-1}\,dt\right| \le y\,(x+y)^{\mathrm{Re}(\rho)-1}.$$
With bounds on the density of zeros of ζ(s) well to the right of 1/2, of the form
$$\#\{\rho:\ \zeta(\rho)=0,\ \mathrm{Re}(\rho)\ge\sigma,\ |\mathrm{Im}(\rho)|\le T\} < c\,T^{\kappa(1-\sigma)}(\log T)^A$$
for some constants c, κ, A > 0, one can show that (11.1.1) holds for y > x^{1−1/κ+ε} for any ε > 0. Such zero-density estimates are known with κ = 12/5, and believed to hold for any κ > 2 (which is called the density hypothesis); from σ = 1/2 we see that one cannot have such a result with κ < 2; and thus, even assuming the Riemann Hypothesis, such methods seem unlikely to allow intervals shorter than x^{1/2}, and so this question seems stuck.

CHAPTER 10

The prime number theorem for arithmetic progressions

Any prime divisor of (a, q) divides every number in the arithmetic progression a (mod q), and so there can only be finitely many primes ≡ a (mod q) if (a, q) > 1. On the other hand, if (a, q) = 1 then there appear to be infinitely many primes of the form qn + a. For example, if we look at the primes in the arithmetic progressions (mod 10):

11, 31, 41, 61, 71, 101, 131, 151, 181, 191, 211, 241, ...
3, 13, 23, 43, 53, 73, 83, 103, 113, 163, 173, 193, 223, 233, ...
7, 17, 37, 47, 67, 97, 107, 127, 137, 157, 167, 197, 227, ...
19, 29, 59, 79, 89, 109, 139, 149, 179, 199, 229, 239, ...

then there seem to be lots of primes in each such arithmetic progression. Moreover there seem to be roughly equal numbers in each, and this pattern persists as we look further out. Let φ(q) denote the number of a (mod q) for which (a, q) = 1 (which are the only arithmetic progressions in which there can be more than one prime), and so we expect that

$$\Theta(x; q, a) := \sum_{\substack{p\ \mathrm{prime},\ p\le x\\ p\equiv a\ (\mathrm{mod}\ q)}} \log p \ \sim\ \frac{x}{\phi(q)} \quad\text{as } x\to\infty.$$

This is the prime number theorem for arithmetic progressions, and it was first proved by suitably modifying the proof of the prime number theorem. That there are infinitely many primes ≡ a (mod q) whenever (a, q) = 1 was first proved by Dirichlet, and is a difficult result to establish. To establish the prime number theorem for arithmetic progressions one obtains a result that is analogous to (9.1.4). To start with we define the Dirichlet characters (mod q), which are (multiplicative) homomorphisms χ : Z → C, periodic with period q, with χ(n) = 0 whenever (n, q) > 1. These have the key property that

$$\frac{1}{\phi(q)}\sum_{\chi\ (\mathrm{mod}\ q)} \chi(m) = \begin{cases} 1 & \text{if } m\equiv1\ (\mathrm{mod}\ q);\\ 0 & \text{otherwise}.\end{cases}$$
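For a prime modulus this orthogonality relation is easy to exhibit concretely, since the characters can be built from a primitive root. The sketch below (ours; the choices q = 7 and g = 3 are illustrative, and the construction via a primitive root is one standard way to realise the characters) averages χ(m) over all characters mod 7.

```python
# Demonstration of the orthogonality relation above for q = 7.
import cmath

q, g = 7, 3                                        # q prime, g a primitive root mod q
index = {pow(g, k, q): k for k in range(q - 1)}    # discrete log table

def chi(j, n):
    """The j-th character mod q, defined by chi_j(g^k) = exp(2*pi*i*j*k/(q-1))."""
    if n % q == 0:
        return 0
    return cmath.exp(2 * cmath.pi * 1j * j * index[n % q] / (q - 1))

for m in range(1, q):
    s = sum(chi(j, m) for j in range(q - 1)) / (q - 1)
    print(m, round(s.real, 12), round(abs(s.imag), 12))
# the average is 1 for m = 1 and (numerically) 0 for every other residue class
```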


To recognize an integer n in the arithmetic progression a (mod q), we let m ≡ n/a (mod q), and therefore
$$\Psi(x; q, a) := \sum_{\substack{p\ \mathrm{prime},\ k\ge1\\ p^k\le x\\ p\equiv a\ (\mathrm{mod}\ q)}} \log p = \sum_{\substack{p\ \mathrm{prime},\ k\ge1\\ p^k\le x}} \log p\ \frac{1}{\phi(q)}\sum_{\chi\ (\mathrm{mod}\ q)} \chi(p^k)\,\overline{\chi}(a) = \frac{1}{\phi(q)}\sum_{\chi\ (\mathrm{mod}\ q)} \overline{\chi}(a) \sum_{\substack{p\ \mathrm{prime},\ k\ge1\\ p^k\le x}} \chi(p^k)\log p =: \frac{1}{\phi(q)}\sum_{\chi\ (\mathrm{mod}\ q)} \overline{\chi}(a)\,\Psi(x,\chi),$$
where $\overline{\chi}(a) = \overline{\chi(a)} = \chi(1/a)$. Each of the Ψ(x, χ) can be analyzed analogously to how we analyzed Ψ(x) in the previous chapter. The simplest character mod q is the principal character χ₀(n), which is 1 if (n, q) = 1 and 0 otherwise. The value of Ψ(x, χ₀) differs from that of Ψ(x) by O(ω(q) log x), which is small compared to the main term x. Therefore we obtain something like
$$\Psi(x; q, a) = \frac{\Psi(x)}{\phi(q)} - \frac{1}{\phi(q)}\sum_{\substack{\chi\ (\mathrm{mod}\ q)\\ \chi\ne\chi_0}} \overline{\chi}(a) \sum_{\substack{\rho:\ L(\rho,\chi)=0\\ 0\le\mathrm{Re}(\rho)\le1}} \frac{x^\rho}{\rho} + \text{Small Error},$$
where $L(s,\chi) := \sum_{n\ge1} \frac{\chi(n)}{n^s}$ for Re(s) > 1, and then one uses analytic continuation to extend the function to the whole complex plane. As we did for Ψ(x) we again need to find zero-free regions, this time for the L(s, χ). This implies the prime number theorem for arithmetic progressions for q fixed as x → ∞.
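The equidistribution being asserted is easy to observe numerically at small moduli. A hedged sketch (ours; the choices q = 10 and x = 10^6 are illustrative) computes Θ(x; q, a) for each reduced residue class and compares with x/φ(q):

```python
# Theta(x; q, a) = sum of log p over primes p <= x with p = a (mod q).
from math import gcd, log

q, x = 10, 10 ** 6
coprime = [a for a in range(1, q) if gcd(a, q) == 1]
theta = {a: 0.0 for a in coprime}

sieve = bytearray([1]) * (x + 1)
sieve[0:2] = b'\x00\x00'
for p in range(2, x + 1):
    if sieve[p]:
        if p * p <= x:
            sieve[p * p :: p] = bytearray(len(range(p * p, x + 1, p)))
        if p % q in theta:
            theta[p % q] += log(p)

print("x/phi(q) =", x / len(coprime))
for a in coprime:
    print("Theta(x; q, a) for a =", a, ":", round(theta[a]))
```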

10.1. Uniformity in arithmetic progressions

By when are we guaranteed that the primes are more-or-less equi-distributed amongst the arithmetic progressions a (mod q) with (a, q) = 1? That is, for what x do we have

(10.1.1)  $\Theta(x; q, a) \sim \frac{x}{\phi(q)}$  for all (a, q) = 1?

Here x should be a function of q, and the asymptotic should hold as q → ∞. Calculations suggest that, for any ε > 0, if q is sufficiently large and x ≥ q^{1+ε} then the primes up to x are equi-distributed amongst the arithmetic progressions a (mod q) with (a, q) = 1, that is, (10.1.1) holds. However no one has a plausible plan of how to prove such a result at the moment. The slightly weaker statement, that (10.1.1) holds for any x ≥ q^{2+ε}, can be shown to be true assuming the Generalized Riemann Hypothesis. This gives us a clear plan for proving such a result, but one which has seen little progress in the last century!

The best unconditional results known involve much larger values of x, equidistribution only being proved once x ≥ e^{q^{ε}}. This is the Siegel-Walfisz Theorem, and it can be stated in several (equivalent) ways with an error term: For any B > 0 we have

(10.1.2)  $\Theta(x; q, a) = \frac{x}{\phi(q)} + O\!\left(\frac{x}{(\log x)^B}\right)$  for all (a, q) = 1.

Or: for any A > 0 there exists B > 0 such that if q < (log x)^A then

(10.1.3)  $\Theta(x; q, a) = \frac{x}{\phi(q)}\left(1 + O\!\left(\frac{1}{(\log x)^B}\right)\right)$  for all (a, q) = 1.

That x needs to be so large compared to q limits the applicability of this result.

The big difficulty is that we cannot rule out non-principal characters χ with χ² = χ₀, which might have a real zero β that is very close to 1: If, for example, β = 1 − 1/(log q)² and x = q^A then x^β/β = x(1 + O(1/log q)), which provides a secondary term of size x/φ(q), which is roughly the same as the main term. Then we surprisingly seem to have

(10.1.4)  $\Theta(x; q, a) = \begin{cases} o(x/\phi(q)) & \text{if } \chi(a) = 1;\\ (2 + o(1))\,x/\phi(q) & \text{if } \chi(a) = -1.\end{cases}$

In other words, if we have such a β then, at least for a certain range of values of x, almost all of the primes p ≤ x satisfy χ(p) = −1.

The great breakthrough of the second half of the twentieth century came in appreciating that, for many applications, it is not so important that we know that equidistribution holds for every a with (a, q) = 1, and every q up to some Q, but rather that it holds for most such q (with Q = x^{1/2−ε}). It takes some juggling of variables to state the Bombieri-Vinogradov Theorem: We are interested, for each modulus q, in the size of the largest error term

$$\max_{\substack{a\ \mathrm{mod}\ q\\ (a,q)=1}} \left|\Theta(x; q, a) - \frac{x}{\phi(q)}\right|,$$
or even

$$\max_{y\le x}\ \max_{\substack{a\ \mathrm{mod}\ q\\ (a,q)=1}} \left|\Theta(y; q, a) - \frac{y}{\phi(q)}\right|.$$
The bounds
$$-\frac{x}{\phi(q)} \le \Theta(x; q, a) - \frac{x}{\phi(q)} \le \left(\frac{x}{q} + 1\right)\log x$$
are trivial, the upper bound obtained by bounding the possible contribution from each term of the arithmetic progression. We would like to improve on these bounds, perhaps by a power of log x (as in (10.1.2)), but we are unable to do so for all q. However, what we can prove is that exceptional q are few and far between,1 and the Bombieri-Vinogradov Theorem expresses this in a useful form. The "trivial" upper bound, obtained by adding up the above quantities over all q ≤ Q < x, is
$$\sum_{q\le Q}\ \max_{\substack{a\ \mathrm{mod}\ q\\ (a,q)=1}} \left|\Theta(x; q, a) - \frac{x}{\phi(q)}\right| \le \sum_{q\le Q}\left(\frac{2x}{q}\log x + \frac{x}{\phi(q)}\right) \ll x(\log x)^2.$$
(Throughout, the symbol "≪", as in "f(x) ≪ g(x)", means "there exists a constant c > 0 such that f(x) ≤ c g(x)".)

The Bombieri-Vinogradov Theorem states that we can beat this trivial bound by an arbitrary power of log x, provided Q is a little smaller than √x:

1Exceptional q being those q for which |Θ(x; q, a) − x/φ(q)| is not small, for some a coprime to q.

The Bombieri-Vinogradov Theorem

For any given A > 0 there exists a constant B = B(A), such that

$$\sum_{q\le Q}\ \max_{\substack{a\ \mathrm{mod}\ q\\ (a,q)=1}} \left|\Theta(x; q, a) - \frac{x}{\phi(q)}\right| \ll_A \frac{x}{(\log x)^A}$$
where Q = x^{1/2}/(log x)^B.

In fact one can take B = 2A + 5; and one can also replace the summand here by the expression above with the maximum over y (though we will not need to use this here).

10.2. Breaking the √x-barrier

It is believed that estimates like that in the Bombieri-Vinogradov Theorem hold with Q significantly larger than √x; indeed Elliott and Halberstam conjectured [?] that one can take Q = x^c for any constant c < 1:

The Elliott-Halberstam conjecture

For any given A > 0 and η, 0 < η < 1/2, we have

$$\sum_{q\le Q}\ \max_{\substack{a\ \mathrm{mod}\ q\\ (a,q)=1}} \left|\Theta(x; q, a) - \frac{x}{\phi(q)}\right| \ll \frac{x}{(\log x)^A}$$
where Q = x^{1/2+η}.

It was shown in [?] that one cannot go so far as to take Q = x/(log x)^B. The Elliott-Halberstam conjecture seems to be too difficult to prove for now, but progress has been made when restricting to one particular residue class: Fix an integer a ≠ 0. We believe that for any fixed η, 0 < η < 1/2, one has

$$\sum_{\substack{q\le Q\\ (q,a)=1}} \left|\Theta(x; q, a) - \frac{x}{\phi(q)}\right| \ll \frac{x}{(\log x)^A}$$
where Q = x^{1/2+η}, which follows from the Elliott-Halberstam conjecture (but is weaker). The key to progress has been to notice that if one can "factor" the key terms here then the extra flexibility allows one to make headway; for example by factoring the modulus q as, say, dr where d and r are roughly some pre-specified sizes. The simplest class of integers q for which this can be done is the y-smooth integers, those integers whose prime factors are all ≤ y. For example if we are given a y-smooth integer q and we want q = dr with d not much smaller than D, then we select d to be the largest divisor of q that is ≤ D and we see that D/y < d ≤ D. This is precisely the class of moduli that Zhang considered in 2013.

Theorem 10.2.1 Yitang Zhang’s Theorem

There exist constants η, δ > 0 such that for any given integer a, we have

(10.2.1)  $\sum_{\substack{q\le Q,\ (q,a)=1\\ q\ \text{is }y\text{-smooth and squarefree}}} \left|\Theta(x; q, a) - \frac{x}{\phi(q)}\right| \ll_A \frac{x}{(\log x)^A}$
where Q = x^{1/2+η} and y = x^δ.

Zhang [?] proved his Theorem for η/2 = δ = 1/1168. We expect that this estimate holds for every η ∈ [0, 1/2) and every δ ∈ (0, 1], but just proving it for any positive pair η, δ > 0 is an extraordinary breakthrough that has an enormous effect on number theory, since it is such an applicable result (and technique). This is the technical result that truly lies at the heart of Zhang's result about bounded gaps between primes.

10.3. Improvements to the Brun-Titchmarsh Theorem

The Brun-Titchmarsh Theorem (Theorem 5.3.1) gives, as an upper bound, twice as many primes in an arithmetic progression as we expect, so we might hope to replace the "2" there by a "1", asymptotically. In particular, when x = 0, we have that if y ≥ C then
$$\pi(y; q, a) \le \frac{2y}{\phi(q)\log(y/q)}$$
for any arithmetic progression a (mod q) with (a, q) = 1. If we could improve this, obtaining 2 − ε in place of 2, say for y ≥ q^B, then taking A ≥ max{B, 2/ε}, the bound obtained for y = q^A here would contradict the second case in (10.1.4). This would imply that Dirichlet L-functions do not have Siegel zeros, which would be an exciting development. For this reason we believe that it is likely to be difficult to improve the Brun-Titchmarsh Theorem.

10.4. To be put later

The Elliott-Halberstam conjecture was the starting point for the work of Goldston, Pintz and Yıldırım [?]. It can be applied to obtain the following result, which we will prove.

Theorem 10.4.1 (Goldston-Pintz-Yıldırım). [?] Let k ≥ 2, l ≥ 1 be integers, and 0 < η < 1/2, such that

(10.4.1)  $1 + 2\eta > \left(1 + \frac{1}{2l+1}\right)\left(1 + \frac{2l+1}{k}\right).$

Assume that the Elliott-Halberstam conjecture holds with Q = x^{1/2+η}. If x + a₁, ..., x + a_k is an admissible set of forms then there are infinitely many integers n such that at least two of n + a₁, ..., n + a_k are prime numbers.

The conclusion here is exactly the statement of Zhang's main theorem.

If the Elliott-Halberstam conjecture holds for some η > 0 then select l to be an integer so large that $1 + \frac{1}{2l+1} < \sqrt{1+2\eta}$. Theorem 10.4.1 then implies Zhang's theorem for k = (2l + 1)².

CHAPTER 11

Gaps between primes, I

11.1. The Gauss-Cramér heuristic predictions about primes in short intervals

In section 2.11 we presented Cramér's interpretation of Gauss's statement about the density of primes as a heuristic to model the distribution of primes: Let X₃, X₄, ... be independent random variables with Prob(Xₙ = 1) = 1/log n and Prob(Xₙ = 0) = 1 − 1/log n. Cramér suggested that the indicator sequence for the primes, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 1, ..., is "typical" of the space of sequences of 0's and 1's under these probabilities, and so anything that can be said for this space with probability 1 is surely true of the primes. For example this led to the prediction (2.11.1), which is equivalent to the Riemann Hypothesis.

Gauss referred to the primes "around" x, so perhaps his statement is best applied to understanding the distribution of primes in short intervals around x. Gauss's statement suggests that one should ask whether

(11.1.1)  $\pi(x+y) - \pi(x) \sim \frac{y}{\log x}$

is true, for "small" y (to be close to x), say for |y| ≤ x/2. The asymptotic (11.1.1) cannot be true if y is too small. For example, if y = ½ log x we cannot have half a prime in each interval. We need y to be large enough for the prediction to make sense; the Gauss-Cramér model suggests that (11.1.1) should hold provided |y| > (log x)^{2+ε}. (In other words the analogous statement holds for sequences of 0's and 1's, under our probabilistic model, with probability → 1 as x → ∞, for all y in the range (log x)^{2+ε} ≤ |y| ≤ x/2.)

Selberg was able to show that (11.1.1) is true for "almost all" x when |y| = (log x)^{2+ε}, assuming the Riemann Hypothesis; here "almost all" means 100%, not "all", and it is feasible that there are infinitely many counterexamples, though at that time it seemed unlikely. It therefore came as a surprise when Maier showed that (11.1.1) fails for infinitely many integers x with y = (log x)^{2+ε}, or even y = (log x)^A for any fixed A > 0. His ingenious proof rests on showing that the small primes do not always divide as many integers in an interval as one might expect.

For a "typical" sequence of 0's and 1's, how many 1's are there in intervals of length ½ log x around x? On average the answer is ½, but one cannot have half a prime in an interval. Rather there must be some sort of distribution function: some of the time there are no primes in such an interval, sometimes 1, and sometimes more. In our model, when n is close to x, the function 1/log n is very close to 1/log x, and so we are essentially asking about the distribution of the sum of ½L independent copies of the probability distribution X with Prob(X = 1) = 1/L and Prob(X = 0) = 1 − 1/L, where L = log x. This sum has expectation ½, and is well known to be approximated by the Poisson distribution. That is, we expect that the


proportion of values of x for which [x, x + ½ log x] contains exactly k primes should be ∼ e^{−1/2}(1/2)^k/k!. A similar result can be proved with any fixed real number λ > 0 in place of ½: The proportion of intervals [x, x + λ log x] containing exactly k primes is ∼ e^{−λ}λ^k/k!. The probability that one has no primes in [x, x + λ log x] is ∼ e^{−λ}.

How many intervals of length λ log x are there up to x? This depends how you count them: To be disjoint one has x/(λ log x) such intervals; for each to start at a different integer, more like x − λ log x. Either way, if λ is not too large, there are not far from x intervals, so we expect something like e^{−λ}x of them with no primes. For this to be at least one interval requires λ ≤ log x; and therefore we recover Cramér's conjecture that the largest gap between prime numbers up to x is ∼ (log x)².

A similar argument also suggests the following about the sequence of gaps between primes. Let p₁ = 2 < p₂ = 3 < ... be the sequence of primes. The Gauss-Cramér model suggests that the proportion of n for which p_{n+1} − p_n > λ log p_n is ∼ e^{−λ}. Moreover the probability distributions for consecutive gaps are independent.
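The Poisson prediction can be compared with the actual primes at modest heights. The sketch below (ours; the sieve limit, the range of x and the choice λ = 2 are all illustrative) tabulates the proportion of intervals [x, x + λ log x] containing exactly k primes against e^{−λ}λ^k/k!; at these small heights the agreement is only rough.

```python
# Empirical distribution of the number of primes in [x, x + lam*log x].
from math import exp, factorial, log

N, lam = 10 ** 6, 2.0
sieve = bytearray([1]) * (N + 1)
sieve[0:2] = b'\x00\x00'
for p in range(2, int(N ** 0.5) + 1):
    if sieve[p]:
        sieve[p * p :: p] = bytearray(len(range(p * p, N + 1, p)))

pi = [0] * (N + 1)                       # pi[n] = number of primes <= n
for n in range(1, N + 1):
    pi[n] = pi[n - 1] + sieve[n]

counts, xs = {}, range(N // 2, N - 100)
for x in xs:
    k = pi[int(x + lam * log(x))] - pi[x]      # primes in (x, x + lam*log x]
    counts[k] = counts.get(k, 0) + 1

for k in sorted(counts):
    print(k, round(counts[k] / len(xs), 4), round(exp(-lam) * lam ** k / factorial(k), 4))
```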

11.2. The prime k-tuplets conjecture

Studying the sequence of prime numbers, patterns begin to emerge; for example, primes often come in pairs: 3 and 5; 5 and 7; 11 and 13; 17 and 19; 29 and 31; 41 and 43; 59 and 61, ... One might guess that there are infinitely many such prime pairs. But this is an open, elusive question, the twin prime conjecture. Until recently there was a lot of data, but little theoretical evidence for it.

Staring at the list of primes, other patterns emerge too. For example, we find four primes which have all the same digits, except the last one: 11, 13, 17 and 19; which is repeated with 101, 103, 107, 109; then 191, 193, 197, 199; and one can find many more such examples – are there infinitely many? More simply, how about prime pairs with difference 4: 3 and 7; 7 and 11; 13 and 17; 19 and 23; 37 and 41; 43 and 47; 67 and 71, ...; or difference 10: 3 and 13; 7 and 17; 13 and 23; 19 and 29; 31 and 41; 37 and 47; 43 and 53, ...? Are there infinitely many such pairs? Such questions were probably asked back to antiquity, but the first clear mention of twin primes in the literature appears in a presentation by Alphonse de Polignac, a student at the École Polytechnique in Paris, in 1849. In his honour we now call any integer h, for which there are infinitely many prime pairs p, p + h, a de Polignac number. Then there are the Sophie Germain pairs, primes p and q := 2p + 1: 2 and 5; 3 and 7; 5 and 11; 11 and 23; 23 and 47; 29 and 59; 41 and 83; ...

Can one predict which prime patterns can occur and which do not? Let's start with differences between primes: One of any two consecutive integers must be even, and so can be prime only if it equals 2. Hence there is just the one pair, 2 and 3, of primes with difference 1. One can make a similar argument for prime pairs with odd difference. Hence if h is an integer for which there are infinitely many prime pairs of the form p, q = p + h then h must be even. We discussed examples for h = 2, for h = 4 and for h = 10 above, and the reader can similarly construct lists of examples for h = 6 and for h = 8, and indeed for any other even h that takes her or his fancy. This leads us to bet on the generalized twin prime conjecture, which states that for any even integer 2k there are infinitely many prime pairs p, q = p + 2k.

What about prime triples? or quadruples? We saw two examples of prime quadruples of the form 10n + 1, 10n + 3, 10n + 7, 10n + 9, and believe that there are infinitely many. What about other patterns? Evidently any pattern that includes an odd difference cannot succeed. Are there any other obstructions? The simplest pattern that avoids an odd difference is n, n + 2, n + 4. One finds the one example 3, 5, 7 of such a prime triple, but no others. Further examination makes it clear why not: One of the three numbers is always divisible by 3. This is analogous to one of n, n + 1 being divisible by 2; and, similarly, one of n, n + 6, n + 12, n + 18, n + 24 is always divisible by 5. The general obstruction can be described as follows: For a given set of distinct integers a₁ < a₂ < ... < a_k we say that prime p is an obstruction if p divides at least one of n + a₁, ..., n + a_k, for every integer n. In other words, p divides

$$\mathcal{P}(n) = (n + a_1)(n + a_2)\cdots(n + a_k)$$
for every integer n; which can be classified by the condition that the set a₁, a₂, ..., a_k (mod p) includes all of the residue classes mod p. If no prime is an obstruction then we say that x + a₁, ..., x + a_k is an admissible set of forms.1 In 1904 Dickson made the optimistic conjecture that if there is no such obstruction to a set of linear forms being infinitely often prime, then they are infinitely often simultaneously prime. That is:

Conjecture: If x + a1, . . . , x + ak is an admissible set of forms then there are infinitely many integers n such that n + a1, . . . , n + ak are all prime numbers.
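Admissibility is a finite, checkable condition: only primes p ≤ k can cover all residue classes with only k numbers. The following small helper (ours; the function name and example tuples are illustrative) implements the test directly.

```python
# Check whether a tuple (a_1,...,a_k) is admissible: no prime p may have all of
# its residue classes hit by the a_i (only p <= k can possibly be obstructions).
def is_admissible(a):
    k = len(a)
    for p in range(2, k + 1):
        if all(p % d for d in range(2, p)):          # p is prime
            if len({ai % p for ai in a}) == p:       # every class mod p is hit
                return False
    return True

print(is_admissible([0, 2]))          # twin primes: admissible
print(is_admissible([0, 2, 4]))       # hits every class mod 3: not admissible
print(is_admissible([0, 2, 6, 8]))    # the pattern 11, 13, 17, 19: admissible
```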

In this case, we call n + a₁, ..., n + a_k a k-tuple of prime numbers. Dickson's prime k-tuple conjecture states that if a set b₁x + a₁, ..., b_kx + a_k of linear forms is admissible (that is, if the forms are all positive at infinitely many integers x and, for each prime p, there exists an integer n such that $p \nmid \mathcal{P}(n) := \prod_j (b_j n + a_j)$), then there are infinitely many integers n for which b₁n + a₁, ..., b_kn + a_k are all primes.

11.3. A quantitative prime k-tuplets conjecture

We are going to develop a heuristic to guesstimate the number of pairs of twin primes p, p + 2 up to x. We start with Gauss's statement that "the density of primes at around x is roughly 1/log x". Hence the probability that p is prime is 1/log x, and the probability that p + 2 is prime is 1/log x so, assuming that these events are independent, the probability that p and p + 2 are simultaneously prime is
$$\frac{1}{\log x}\cdot\frac{1}{\log x} = \frac{1}{(\log x)^2};$$
and so we might expect about x/(log x)² pairs of twin primes p, p + 2 ≤ x. However there is a problem with this reasoning, since we are implicitly assuming that the events "p is prime for an arbitrary integer p ≤ x" and "p + 2 is prime for an arbitrary integer p ≤ x" can be considered to be independent. This is obviously

1Notice that a₁, a₂, ..., a_k (mod p) can occupy no more than k residue classes mod p and so, if p > k then p cannot be an obstruction. Hence, to check whether a given set A of k integers is admissible, one needs only find one residue class b_p (mod p), for each prime p ≤ k, which does not contain any element of A.

false since, for example, if p is even then p + 2 must also be.2 So, we correct for the non-independence modulo small primes q, by the ratio of the probability that both p and p + 2 are not divisible by q, to the probability that p and p' are not divisible by q.

Now the probability that q divides an arbitrary integer p is 1/q; and hence the probability that p is not divisible by q is 1 − 1/q. Therefore the probability that both of two independently chosen integers are not divisible by q is (1 − 1/q)². The probability that q does not divide either p or p + 2 equals the probability that p ≢ 0 or −2 (mod q). If q > 2 then p can be in any one of q − 2 residue classes mod q, which occurs, for a randomly chosen p (mod q), with probability 1 − 2/q. If q = 2 then p can be in just one residue class mod 2, which occurs with probability 1/2. Hence the "correction factor" for divisibility by 2 is
$$\frac{(1 - \frac12)}{(1 - \frac12)^2} = 2,$$
and the "correction factor" for divisibility by any prime q > 2 is
$$\frac{(1 - \frac{2}{q})}{(1 - \frac{1}{q})^2}.$$
Divisibility by different small primes is independent, as we vary over values of n, by the Chinese Remainder Theorem, and so we might expect to multiply together all of these correction factors, corresponding to each "small" prime q. The question then becomes, what does "small" mean? In fact, it doesn't matter much, because the product of the correction factors over larger primes is very close to 1, and hence we can simply extend the correction to be a product over all primes q. (More precisely, the infinite product over all q converges.) Hence we define the twin prime constant to be
$$C := 2\prod_{\substack{q\ \mathrm{prime}\\ q\ge3}} \frac{(1 - \frac{2}{q})}{(1 - \frac{1}{q})^2} \approx 1.3203236316,$$
the total correction factor over all primes q. We then conjecture that the number of prime pairs p, p + 2 ≤ x is
$$\sim C\,\frac{x}{(\log x)^2}.$$
Computational evidence suggests that this is a pretty good guess. An analogous argument implies the conjecture that the number of prime pairs p, p + 2k ≤ x is
$$\sim C \prod_{\substack{p\mid k\\ p\ge3}} \frac{p-1}{p-2}\ \frac{x}{(\log x)^2}.$$
This argument is easily modified to make an analogous prediction for any k-tuple: Given a₁, ..., a_k, let Ω(p) be the set of distinct residues given by a₁, ..., a_k (mod p), and then let ω(p) = |Ω(p)|. None of the n + a_i is divisible by p if and only

2This reasoning can be seen to be false for a more dramatic reason: The analogous argument implies that there are ∼ x/(log x)² prime pairs p, p + 1 ≤ x.

if n is in any one of p − ω(p) residue classes mod p, and therefore the correction factor for prime p is
$$\frac{(1 - \frac{\omega(p)}{p})}{(1 - \frac{1}{p})^k}.$$
Hence we predict that the number of prime k-tuplets n + a₁, ..., n + a_k ≤ x is
$$\sim C(a)\,\frac{x}{(\log x)^k} \quad\text{where}\quad C(a) := \prod_p \frac{(1 - \frac{\omega(p)}{p})}{(1 - \frac{1}{p})^k}.$$
Computational evidence suggests that these conjectures do approach the truth, even though this rests on the rather shaky theoretical framework given here. One can give a more convincing theoretical framework based on the circle method.

Gallagher showed that some of the predictions about the distribution of primes in short intervals, obtained via the Gauss-Cramér heuristic, can also be justified via a uniform version of the prime k-tuplets conjecture:
• The proportion of intervals [x, x + λ log x] containing exactly k primes is ∼ e^{−λ}λ^k/k!.
• The proportion of n for which p_{n+1} − p_n > λ log p_n is ∼ e^{−λ}.
One can extend Gallagher's arguments to predict that as we vary over x ∈ [X, 2X], the primes in [x, x + y] are normally distributed with mean $\int_x^{x+y}\frac{dt}{\log t}$ and variance (1 − δ)y/log x, where y = x^{δ+o(1)} and 0 < δ < 1.

Primes between n² and (n + 1)².
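The twin-prime prediction above is easy to put next to the data. The sketch below (ours; the cutoffs are illustrative) computes the constant C from its product over primes, counts the twin prime pairs up to 10^7, and prints the prediction C x/(log x)²; since the asymptotic is approached slowly, the two counts agree only to within roughly fifteen percent at this height.

```python
# Twin prime constant from its Euler product, and a count of twin primes.
from math import log

def prime_sieve(n):
    s = bytearray([1]) * (n + 1)
    s[0:2] = b'\x00\x00'
    for p in range(2, int(n ** 0.5) + 1):
        if s[p]:
            s[p * p :: p] = bytearray(len(range(p * p, n + 1, p)))
    return s

# C = 2 * prod over odd primes q of (1 - 2/q)/(1 - 1/q)^2
small = prime_sieve(10 ** 6)
C = 2.0
for q in range(3, 10 ** 6 + 1):
    if small[q]:
        C *= (1 - 2 / q) / (1 - 1 / q) ** 2
print("C ~", C)                              # close to 1.3203236...

x = 10 ** 7
s = prime_sieve(x)
twins = sum(1 for n in range(3, x - 1) if s[n] and s[n + 2])
print("twin prime pairs up to x:", twins)
print("prediction C*x/(log x)^2:", round(C * x / log(x) ** 2))
```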

11.4. Densely packed primes in short intervals

One can ask what is the largest number of primes one can have in an interval of length y. Extensive calculations seem to indicate that one cannot do much better than the initial interval of length y (or, better, the numbers from 2 to y + 1). This led Hardy and Littlewood to conjecture that π(x + y) ≤ π(x) + π(y) whenever x, y ≥ 2, in the same paper in which they formulated the prime k-tuplets conjecture. Although this conjecture seems completely reasonable, it turns out that, thanks to a brilliant argument of Hensley and Richards, we now believe it to be wrong. In fact, it is definitely wrong if the prime k-tuplets conjecture holds, though to date no one has found a counterexample (and we shall explain below why we expect this to be a very challenging problem).

A set of integers B is called admissible if, for every prime p, there exists a residue class a_p (mod p) such that b ≢ a_p (mod p) for all b ∈ B. The prime k-tuplets conjecture allows us, for any fixed y, to determine the largest number N such that there are infinitely many integers x for which there are N primes in (x, x + y]:

Lemma 11.4.1. For given integer y let S = S(y) be a largest admissible subset of [1, y]. If the prime k-tuplets conjecture holds then
$$\max_{x\ge y}\,(\pi(x+y) - \pi(x)) = |S(y)|;$$
and there are infinitely many integers x for which π(x + y) − π(x) = |S(y)|.

Proof. Suppose that N = maxx≥y(π(x + y) − π(x)) and that for some given x ≥ y we have π(x+y)−π(x) = N, where the primes in (x, x+y] are x+a1, . . . , x+ aN . Evidently these form am admissible set since none of the ai are ≡ 0 (mod p) 114 11. GAPS BETWEEN PRIMES, I for any prime p ≤ y. In the other direction, if a1, . . . , aN ⊂ [1, y] is admissible then the prime k-tuplets conjecture predicts that there are infinitely many integers x for which each of x + a1, . . . , x + aN are prime and so π(x + y) − π(x) = N.  Theorem 11.4.2 (Hensley-Richards). Fix  > 0. If N is sufficiently large then the set S := {p, −p : N/ log N < p ≤ N/2} is an admissible subset of [−N/2,N/2]. Note that N N  N  |S| = + (1 + log 2 − ) + O > π(N). log N (log N)2 (log N)3

Proof. Evidently we can take ap = 0 for all primes p ≤ N/ log N. If p > |S| then we can simply take ap to be any residue class not containing an element of S, and these must exist by the . log N(log log log N)2 So suppose N/ log N < p ≤ |S|. Let z =  log log N log log log log N in Corollary Q 4.5.1 so that if Q = q≤z q then Q = o(N) and y > (N + Q)/p. Then there exists Q rp ∈ [−N/2 − Q, −N/2), where Q = q≤z q such that rp + jp has a common factor with Q, and so is not in S, for each j in the range 1 ≤ j ≤ y. In other words there are no elements of S which are ≡ rp (mod p).  Combining these last two results we obtain the promised result, contradicting Hardy and Littlewood had conjectured that π(M + N) ≤ π(M) + π(N) for all M,N ≥ 2. Corollary 11.4.3. Assume the prime k-tuplets conjecture. If N is sufficiently large then there exists some M such that π(M + N) > π(M) + π(N). There have been extensive calculations of admissible sets. We can ask what is the smallest y for which there is an admissible set in an interval of length y with more than π(y) elements. It is known that 2340 ≤ y ≤ 3158. There is an admissible set of size 447 amongst the first 3158 integers and yet π(3158) = 446. See http://math.mit.edu/∼primegaps/ for data. One can extend this argument to do rather more than contradict the Hardy- Littlewood conjecture: Clark and Jarvis found an admissible subset on [1,N], for N = 130, 808, 636, with N N ≥ + (1 + 2 log 2 − ) log N (log N)2 elements, hence showing that π(x + N) − π(x) ≤ 2π(N/2) is incompatible with the prime k-tuplets conjecture. CHAPTER 12

Goldston-Pintz-Yıldırım’s argument

The combinatorial argument of Goldston-Pintz-Yıldırım [?] lies at the heart of the proof that there are bounded gaps between primes. (Henceforth we will call it “the GPY argument”. See [?] for a more complete discussion of their ideas.)

12.1. The set up

Let H = (a1 < a2 < . . . < ak) be an admissible k-tuple, and take x > ak. Our goal is to select a weight for which weight(n) ≥ 0 for all n, such that

k ! X X (12.1.1) weight(n) θ(n + ai) − log 3x > 0, x

k ! X weight(n) θ(n + ai) − log 3x > 0. i=1 In that case weight(n) 6= 0 so that weight(n) > 0, and therefore

k X θ(n + ai) > log 3x. i=1

However each n + ai ≤ 2x + ak < 2x + x and so each θ(n + ai) < log 3x. This implies that at least two of the θ(n + ai) are non-zero, that is, at least two of n + a1, . . . , n + ak are prime. A simple idea, but the difficulty comes in selecting the function weight(n) with these properties in such a way that we can evaluate the sums in (12.1.1). Moreover in [?] they also require that weight(n) is sensitive to when each n + ai is “almost prime”. All of these properties can be acquired by using a construction championed by Selberg. In order that weight(n) ≥ 0 one can simply take it to be a square. Hence we select  2  X  weight(n) :=  λ(d) ,   d|P(n) d≤R where the sum is over the positive integers d that divide P(n), and  log d  λ(d) := µ(d)G , log R

115 116 12. GOLDSTON-PINTZ-YILDIRIM’S ARGUMENT where G(.) is a measurable, bounded function, supported only on [0, 1], and µ is the M¨obiusfunction. Therefore λ(d) is supported only on squarefree, positive integers, that are ≤ R. (This generalizes the discussion at the end of section ??.) We can select G(t) = (1 − t)m/m! to obtain the results of this section but it will pay, for our understanding of the Maynard-Tao construction, if we prove the GPY result for more general G(.).

12.2. Evaluating the sums over n Expanding the above sum gives   k X X X X  (12.2.1) λ(d1)λ(d2)  θ(n + ai) − log 3x 1 .   d1,d2≤R i=1 x

k X X X X X (12.2.2) θ(n + ai) − log 3x 1,

i=1 m∈Ωi(D) x

The two sums over d1 and d2 in (12.2.3) are not easy to evaluate: The use of the M¨obiusfunction leads to many terms being positive, and many negative, so that there is a lot of cancelation. There are several techniques in analytic number theory that allow one to get accurate estimates for such sums, two more analytic ([?], [?]), the other more combinatorial ([?], [?]). We will discuss them all.

12.3. Evaluating the sums using Perron’s formula Perron’s formula allows one to study inequalities using complex analysis:  1 if y > 1; 1 Z ys  ds = 1/2 if y = 1; 2iπ s Re(s)=2 0 if 0 < y < 1. (Here the subscript “Re(s) = 2” means that we integrate along the line s : Re(s) = 2; that is s = 2 + it, as t runs from −∞ to +∞.) So to determine whether d < R we simply compute this integral with y = R/d. (The special case, d = R, has a negligible effect on our sums, and can be avoided by selecting R 6∈ Z). Hence the second sum in (12.2.3) equals

Z s1 Z s2 X ω(D) 1 (R/d1) 1 (R/d2) λ(d1)λ(d2) · ds1 · ds2. D 2iπ Re(s1)=2 s1 2iπ Re(s2)=2 s2 d1,d2≥1 D:=[d1,d2] Re-organizing this we obtain   Z 1 X λ(d1)λ(d2) ω(D) ds2 ds1 (12.3.1)   Rs1+s2 · 2 Re(s )=2  s1 s2  (2iπ) 1  d1 d2 D  s2 s1 Re(s2)=2 d1,d2≥1 D:=[d1,d2] We will compute the sum in the middle in the special case that λ(d) = µ(d), the more general case following from a variant of this argument. Hence we have

X µ(d1)µ(d2) ω([d1, d2]) (12.3.2) s1 s2 . d1 d2 [d1, d2] d1,d2≥1 The summand is a multiplicative function, which means that we can evaluate it 2 prime-by-prime. For any given prime p, the summand is 0 if p divides d1 or d2 118 12. GOLDSTON-PINTZ-YILDIRIM’S ARGUMENT

(since then µ(d1) = 0 or µ(d2) = 0). Therefore we have only four cases to consider: p - d1, d2; p|d1, p - d2; p - d1, p|d2; p|d1, p|d2, so the pth factor is 1 ω(p) 1 ω(p) 1 ω(p) 1 − · − · + · . ps1 p ps2 p ps1+s2 p We have seen that ω(p) = k for all sufficiently large p so, in that case, the above becomes k k k (12.3.3) 1 − − + . p1+s1 p1+s2 p1+s1+s2 In the analytic approach, we compare the integrand to a (carefully selected) power of the Riemann-zeta function, which is defined as

−1 X 1 Y  1  ζ(s) = = 1 − for Re(s) > 1. ns ps n≥1 p prime

−1  1  The pth factor of ζ(s) is 1 − ps so, as a first approximation, (12.3.3) is roughly

 1 −k  1 k  1 k 1 − 1 − 1 − . p1+s1+s2 p1+s1 p1+s2 Substituting this back into (12.3.1) we obtain

1 ZZ ζ(1 + s + s )k ds ds 1 2 s1+s2 2 1 2 k k G(s1, s2) R · . (2iπ) Re(s1)=2 ζ(1 + s1) ζ(1 + s2) s2 s1 Re(s2)=2 where k −k −k Y  1   1   1   ω(p) ω(p) ω(p)  G(s1, s2) := 1 − 1 − 1 − 1 − − + . p1+s1+s2 p1+s1 p1+s2 p1+s1 p1+s2 p1+s1+s2 p prime To determine the value of this integral we move both contours in the integral slightly to the left of the lines Re(s1) =Re(s2) = 0, and show that the main contribution comes, via Cauchy’s Theorem, from the pole at s1 = s2 = 0. This can be achieved using our understanding of the Riemann-zeta function, and by noting that

−k Y  ω(p)  1 G(0, 0) := 1 − 1 − = C(a) 6= 0. p p p prime Remarkably when one does the analogous calculation with the first sum in (12.2.3), one takes k − 1 in place of k, and then

−(k−1) Y  ω∗(p)  1 G∗(0, 0) := 1 − 1 − = C(a), p − 1 p p prime also. Since it is so unlikely that these two quite different products give the same constant by co-incidence, one can feel sure that the method is correct! This was the technique used in [?] and, although the outline of the method is quite compelling, the details of the contour shifting can be complicated. 12.4. EVALUATING THE SUMS USING FOURIER ANALYSIS 119

12.4. Evaluating the sums using Fourier analysis Both analytic approaches depend on the simple pole of the Riemann zeta func- tion at s = 1. The Fourier analytic approach (first used, to my knowledge, by Green and Tao, and in this context, on Tao’s blog) avoids some of the more mysterious, geometric technicalities (which emerge when shifting contours in high dimensional space), since the focus is more on the pole itself. To appreciate the method we prove a fairly general result, starting with smooth log R functions F,H : [0, +∞) → R that are supported on the finite interval [0, log x ], and log d1 log d2 then letting λ(d1) = µ(d1)F ( log x ) and λ(d2) = µ(d2)H( log x ). Note that λ(d1) is supported only when d1 ≤ R, and similarly λ(d2). Our goal is to evaluate the sum

X log d1 log d2 ω(D) (12.4.1) µ(d )µ(d ) F ( )H( ) . 1 2 log x log x D d1,d2≥1 D:=[d1,d2] The function etF (t) (and similarly etH(t)) is also a smooth, finitely supported, function. Such a function has a Fourier expansion Z ∞ Z ∞ v −itv log d1 f(t) e F (v) = e f(t) dt and so F ( ) = 1+it dt, t=−∞ log x t=−∞ log x d1 1 for some smooth, bounded function f : R → C that is√ rapidly decreasing, and therefore the tail of the integral (for instance when |t| > log x) does not contribute much. Substituting this and the analogous formula for H(.) into (12.4.1), we obtain Z ∞ X µ(d1)µ(d2) ω(D) f(t)h(u) 1+it 1+iu dtdu. t,u=−∞ log x log x D d1,d2≥1 d1 d2 D:=[d1,d2] 1+it 1+iu We evaluated this same sum, (12.3.2) with s1 = log x and s2 = log x , in the previous approach, and so know that our integral equals 2+i(t+u) Z ∞ ζ(1 + )k 1 + it 1 + iu f(t)h(u) log x G( , ) dtdu. 1+it k 1+iu k t,u=−∞ ζ(1 + log x ) ζ(1 + log x ) log x log x √ One can show that the contribution with |t|, |u|√> log x does not contribute much since f and h decay so rapidly. When |t|, |u| ≤ log x we are near the pole of ζ(s), and can get very good approximations for the zeta-values by using the Laurent 1+it 1+iu expansion ζ(s) = 1/s + O(1). Moreover G( log x , log x ) = G(0, 0) + o(1) using its Taylor expansion, and therefore our integral is very close to G(0, 0) Z ∞ (1 + it)(1 + iu) k f(t)h(u) dtdu. (log x) t,u=−∞ 2 + i(t + u) 0 R ∞ −(1+it)v By the definition of F we have F (v) = − t=−∞(1 + it)e f(t) dt, and so Z ∞ Z ∞ Z ∞  F 0(v)H0(v)dv = f(t)h(u)(1 + it)(1 + iu) e−(1+it)v−(1+iu)vdv dtdu v=0 t,u=−∞ v=0 Z ∞ (1 + it)(1 + iu) = f(t)h(u) dtdu. t,u=−∞ 2 + i(t + u)

1 −A That is, for any given A > 0 there exists a constant cA such that |f(ξ)| ≤ cA/|ξ| . 120 12. GOLDSTON-PINTZ-YILDIRIM’S ARGUMENT

Combining the last two displayed equations, and remembering that G(0, 0) = C(a) we deduce that (12.4.1) is asymptotically equal to Z ∞ C(a) 0 0 k F (v)H (v)dv. (log x) v=0 One can do the analogous calculation with the first sum in (12.2.3), taking k − 1 in place of k, and obtaining the constant G∗(0, 0) = C(a).

12.5. Evaluating the sums using Selberg’s combinatorial approach, I As discussed, the difficulty in evaluating the sums in (12.2.3) is that there are many positive terms and many negative terms. In developing his upper bound sieve method, Selberg encountered a similar problem and dealt with it in a surprising way, using combinatorial identities to remove this issue. The method rests on a reciprocity law: Suppose that L(d) and Y (r) are sequences of numbers, supported only on the squarefree integers. If X0 Y (r) := µ(r) L(m) for all r ≥ 1, m: r|m then X0 L(d) = µ(d) Y (n) for all d ≥ 1 n: d|n X0 From here on, denotes the restriction to squarefree integers that are ≤ R. 2

Let φω be the multiplicative function (defined here, only on squarefree integers) for which φω(p) = p − ω(p). We apply the above reciprocity law with λ(d)ω(d) y(r)ω(r) L(d) := and Y (r) := . d φω(r)

Now since d1d2 = D(d1, d2) we have

ω(D) (d2, d2) λ(d1)λ(d2) = L(d1)L(d2) D ω((d2, d2)) and therefore X0 ω(D) X X0 (d1, d2) S1 := λ(d1)λ(d2) = Y (r)Y (s) µ(d1)µ(d2) . D ω((d1, d2)) d1,d2 r,s d1,d2 D:=[d1,d2] d1|r, d2|s The summand (of the inner sum) is multiplicative and so we can work out its value, prime-by-prime. We see that if p|r but p - s (or vice-versa) then the sum is 1−1 = 0. Hence if the sum is non-zero then r = s (as r and s are both squarefree). In that case, if p|r then the sum is 1−1−1+p/ω(p) = φω(p)/ω(p). Hence the sum becomes 2 X φω(r) X y(r) ω(r) (12.5.1) S = Y (r)2 = . 1 ω(r) φ (r) r r ω We will select  log r  y(r) := F log R

2Selberg developed similar ideas in his construction of a small sieve, though he neither formu- lated a reciprocity law, nor applied his ideas to the question of small gaps between primes. 12.7. SELBERG’S COMBINATORIAL APPROACH, II 121 when r is squarefree, where F (t) is measurable and supported only on [0, 1]; and y(r) = 0 otherwise. Hence we now have a sum with all positive terms so we do not have to fret about complicated cancelations.

12.6. Sums of multiplicative functions An important theme in analytic number theory is to understand the behaviour of sums of multiplicative functions, some being easier than others. Multiplicative functions f for which the f(p) are fixed, or almost fixed, were the first class of non-trivial sums to be determined. Indeed from the Selberg-Delange theorem,3 one can deduce that X g(n) (log x)k (12.6.1) ∼ κ(g) · , n k! n≤x where k Y  g(p) g(p2)   1 κ(g) := 1 + + + ... 1 − p p2 p p prime when g(p) is typically “sufficiently close” to some given positive integer k that the Euler product converges. Moreover, by partial summation, one deduces that X g(n) log n Z 1 tk−1 (12.6.2) F ∼ κ(g)(log x)k · F (t) dt. n log x (k − 1)! n≤x 0 We apply this in the sum above, noting that here κ(g) = 1/C(a), to obtain 2 X ω(r)  log r  Z 1 tk−1 C(a)S = C(a) F ∼ (log R)k · F (t)2 dt. 1 φ (r) log R (k − 1)! r ω 0 A similar calculation reveals that Z 1 k−1 k t k C(a)λ(d) ∼ µ(d) · (1 − vd) F (t) dt · (log R) , vd (k − 1)! log d where vd := log R .

Remark 12.6.0.1. If, as in section 12.4, we replace λ(d1)λ(d2) by λ1(d1)λ2(d2) then we obtain Z 1 k−1 k t C(a)S1 ∼ (log R) · F1(t)F2(t) dt. 0 (k − 1)! 12.7. Selberg’s combinatorial approach, II A completely analogous calculation, but now applying the reciprocity law with λ(d)ω∗(d) y∗(r)ω∗(r) L(d) := and Y (r) := , φ(d) φω(r) yields that X0 ω∗(D) X y∗(r)2ω∗(r) (12.7.1) S2 := λ(d1)λ(d2) = . φ(D) φω(r) d1,d2 r D:=[d1,d2]

3This also follows from the relatively easy proof of Theorem 1.1 of [?]. 122 12. GOLDSTON-PINTZ-YILDIRIM’S ARGUMENT

We need to determine y∗(r) in terms of the y(r), which we achieve by applying the reciprocity law twice: ∗ ∗ φω(r) X ω (d) d X y(n)ω(n) y (r) = µ(r) ∗ µ(d) ω (r) φ(d) ω(d) φω(n) d: r|d n: d|n r X y(n) X ω∗(d/r)d/r = µ(d/r) ω(n/d) φ(r) φω(n/r) φ(d/r) n: r|n d: d/r|n/r X0 y(n) r X0 y(mr) = r = φ(n) φ(r) φ(m) n: r|n m:(m,r)=1 Z 1 ∼ F (t)dt · log R, log r log R where the last estimate was obtained by applying (12.6.2) with k = 1, and taking care with the Euler product. We now can insert this into (12.7.1), and apply (12.6.2) with k replaced by k − 1, noting that κ(g∗) = 1/C(a), to obtain

2 X y∗(r)2ω∗(r) Z 1 Z 1  tk−2 C(a)S = C(a) ∼ (log R)k+1 · F (u)du dt. 2 φ (r) (k − 2)! r ω 0 t

Remark 12.7.0.2. If, as in section 12.4, we replace λ(d1)λ(d2) by λ1(d1)λ2(d2) then we obtain Z 1 Z 1  Z 1  k−2 k+1 t C(a)S2 ∼ (log R) · F1(u)du F2(v)dv dt. 0 t t (k − 2)!

12.8. Finding a positive difference; the proof of Theorem 10.4.1 From these estimate, we deduce that C(a) times (12.2.3) is asymptotic to x(log 3x)(log R)k times

2 log R Z 1 Z 1  tk−2 Z 1 tk−1 (12.8.1) k · F (u)du dt − F (t)2 dt. log 3x 0 t (k − 2)! 0 (k − 1)! Assume that the Elliott-Halberstam conjecture holds with exponent 1 + η, and let √ 2 R = Q. This then equals Z 1 k−1     2 t 1 1 F (t) dt · + η ρk(F ) − 1 0 (k − 1)! 2 2 where Z 1 Z 1 2 k−2  Z 1 k−1 t 2 t (12.8.2) ρk(F ) := k F (u)du dt F (t) dt. 0 t (k − 2)! 0 (k − 1)! Therefore if 1 1  + η ρ (F ) > 1 2 2 k for some F that satisfies the above hypotheses, then (12.8.1) is > 0, which implies that (12.2.3) is > 0, and so (12.1.1) is also > 0, as desired. 12.8. FINDING A POSITIVE DIFFERENCE; THE PROOF OF THEOREM ?? 123

We now need to select a suitable function F (t) to proceed. A good choice is (1−t)` F (t) = `! . Using the beta integral identity Z 1 vk (1 − v)` 1 dv = , 0 k! `! (k + ` + 1)! we obtain Z 1 k−1 Z 1 2` k−1   2 t (1 − t) t 1 2` F (t) dt = 2 dt = , 0 (k − 1)! 0 `! (k − 1)! (k + 2`)! ` and 2 2 Z 1 Z 1  tk−2 Z 1 (1 − t)`+1  tk−2 1 2` + 2 F (u)du dt = dt = . 0 t (k − 2)! 0 ` + 1 (k − 2)! (k + 2` + 1)! ` + 1 Therefore (12.8.2) is > 0 if (10.4.1) holds, and so we deduce Theorem 10.4.1.

1 To summarize: If the Elliott-Halberstam conjecture holds with exponent 2 +η, 2  1  and if ` is an integer such that 1 + 2η > 1 + 2`+1 then for every admissible k- tuple, with k = (2`+1)2, there are infinitely many n for which the k-tuple, evaluated at n, contains (at least) two primes.

CHAPTER 13

Goldston-Pintz-Yıldırım in higher dimensional analysis

In the GPY argument, we studied the divisors d of the product of the k-tuple values; that is d|P(n) = (n + a1) ... (n + ak). with d ≤ R. Maynard and Tao (independently) realized that they could instead study the k-tuples of divisors d1, d2, . . . , dk of each individual element of the k-tuple; that is

d1|n + a1, d2|n + a2, . . . , dk|n + ak.

Now, instead of d ≤ R, we take d1d2 . . . dk ≤ R.

13.1. The set up One can proceed much as in the previous section, though technically it is easier to restrict our attention to when n is in an appropriate congruence class mod m where m is the product of the primes for which ω(p) < k, because, if ω(p) = k then p can only divide one n + ai at a time. Therefore we study 2  k    X X X X S0 :=  θ(n + aj) − h log 3x  λ(d1, . . . , dk) r∈Ω(m) n∼x j=1 d |n+a for each i n≡r (mod m) i i which upon expanding, as (di, m)|(n + ai, m) = 1, equals

 k  X X X X λ(d1, . . . , dk)λ(e1, . . . , ek)  θ(n + aj) − h log 3x . d ,...,d ≥1 r∈Ω(m) n∼x j=1 1 k n≡r (mod m) e1,...,ek≥1 [d ,e ]|n+a for each i (diei,m)=1 for each i i i i

Next notice that [di, ei] is coprime with [dj, ej] whenever i 6= j, since their gcd divides (n + aj) − (n + ai), which divides m, and so equals 1 as (diei, m) = 1. Hence, in our internal sum, the values of n belong to an arithmetic progression Q with modulus m i[di, ei]. Also notice that if n + aj is prime then dj = ej = 1. Therefore, ignoring error terms, X ω(m) ω(m) S = S · x − h S · x log 3x 0 φ(m) 2,` m 1 1≤`≤k where X λ(d1, . . . , dk)λ(e1, . . . , ek) S1 := Q i [di, ei] d1,...,dk≥1 e1,...,ek≥1 (di,ej )=1 for i6=j

125 126 13. GOLDSTON-PINTZ-YILDIRIM IN HIGHER DIMENSIONAL ANALYSIS and X λ(d1, . . . , dk)λ(e1, . . . , ek) S2,` := Q . i φ([di, ei]) d1,...,dk≥1 e1,...,ek≥1 (di,ej )=1 for i6=j d`=e`=1 These sums can be evaluated in several ways. Tao (see his blog) gave what is perhaps the simplest approach, generalizing the Fourier analysis technique discussed in section 12.4 (see [?], Lemma 4.1). Maynard [?] gave what is, to my taste, the more elegant approach, generalizing Selberg’s combinatorial technique:

13.2. The combinatorics The reciprocity law generalizes quite beautifully to higher dimension: Suppose k that L(d) and Y (r) are two sequences of complex numbers, indexed by d, r ∈ Z≥1, and non-zero only when each di (or ri) is squarefree. Then

k Y X L(d1, . . . , dk) = µ(di) Y (r1, . . . , rk)

i=1 r1,...,rk≥1 di|ri for all i if and only if

k Y X Y (r1, . . . , rk) = µ(ri) L(d1, . . . , dk).

i=1 d1,...,dk≥1 ri|di for all i We use this much as above, in the first instance with

λ(d1, . . . , dk) y(r1, . . . , rk) L(d1, . . . , dk) = and Y (r1, . . . , rk) = d1, . . . , dk φk(r1 . . . rk) where log r log r  y(r , . . . , r ) = F 1 ,..., k 1 k log R log R with F ∈ C[t1, . . . , tk], such that that there is a uniform bound on all of the first order partial derivatives, and F is only supported on

Tk := {(t1, . . . , tk) : Each tj ≥ 1, and t1 + ... + tk ≤ 1}. Proceeding much as before we obtain 2 X y(r1, . . . , rk) (13.2.1) S1 ∼ . φk(r1 . . . rk) r1,...,rk≥1

13.3. Sums of multiplicative functions By (12.6.1) we have

2 X µ (n) Y p − 1 Y (p − 1)φk−1(p) (13.3.1) = · (log N + O(1)) φk(n) p p φk(p) 1≤n≤N p|m p-m (n,m)=1 13.4. THE COMBINATORICS, II 127

We apply this k times; firstly with m replaced by mr1 . . . rk−1 and n by rk, then with m replaced by mr1 . . . rk−2, etc By the end we obtain 2 X µ (r1 . . . rkm) Y (13.3.2) Cm(a) = (log Ri + O(1)), φk(r1, . . . , rk) 1≤r1≤R1, i ..., 1≤rk≤Rk where −k −k Y  1 Y  k   1 C (a) := 1 − 1 − 1 − . m p p p p|m p-m From this, and partial summation, we deduce from (13.2.1), that Z k 2 (13.3.3) Cm(a)S1 ∼ (log R) · F (t1, . . . , tk) dtk . . . dt1. t1,...,tk∈Tk Had we stopped our calculation one step earlier we would have found 2 X µ (r1 . . . rk−1m) m Y (13.3.4) Cm(a) = · (log Ri + O(1)), φk(r1, . . . , rk−1) φ(m) 1≤r1≤R1, i ..., 1≤rk−1≤Rk−1

Remark 13.3.0.3. If we replace λ(d)λ(e) by λ1(d)λ2(e) in the definition of S1, R 2 R then the analogous argument yields F (t) dt replaced by F1(t)F2(t)dt in t∈Tk t∈Tk (13.3.3).

13.4. The combinatorics, II We will deal only with the case ` = k, the other cases being analogous. Now we use the higher dimensional reciprocity law with

λ(d1, . . . , dk−1, 1) yk(r1, . . . , rk−1) L(d1, . . . , dk−1) = and Yk(r1, . . . , rk−1) = φ(d1 . . . dk−1) φk(r1 . . . rk−1) where dk = rk = 1, so that, with the exactly analogous calculations as before, 2 X yk(r1, . . . , rk−1) S2,k ∼ . φk(r1 . . . rk−1) r1,...,rk−1≥1

Using the reciprocity law twice to determine the yk(r) in terms of the y(n), we obtain that φ(m) Z yk(r1, . . . , rk−1) ∼ · F (ρ1, . . . , ρk−1, t)dt · log R m t≥0

ρi where each ri = N . Therefore, using (13.3.4), we obtain (13.4.1) Z Z 2 φ(m) k+1 Cm(a)S2,k ∼ F (t1, . . . , tk−1, tk)dtk dtk−1 . . . dt1· (log R) . 0≤t1,...,tk−1≤1 tk≥0 m

Remark 13.4.0.4. If we replace λ(d)λ(e) by λ1(d)λ2(e) in the definition of S2, R 2 R R then the analogous argument yields ( F (t)dtk) replaced by ( F1(t)dtk)( F2(t)dtk) tk≥0 tk≥0 tk≥0 in (13.4.1). 128 13. GOLDSTON-PINTZ-YILDIRIM IN HIGHER DIMENSIONAL ANALYSIS

13.5. Finding a positive difference By the Bombieri-Vinogradov Theorem we can take R = x1/4−o(1), so that, by ω(m) k (13.3.3) and (13.4.1), Cm(a)S0 equals m x(log 3x)(log R) times k Z Z 2 Z 1 X Y 2 F (t1, . . . , tk)dt` dtj−h F (t1, . . . , tk) dtk . . . dt1+o(1). 0≤t ≤1 for 4 i t ≥0 t ,...,t ∈T `=1 1≤i≤k, i6=` ` 1≤j≤k 1 k k i6=`

One can show that the optimal choice for F must be symmetric. Hence S0 > 0 follows if there exists a symmetric F (with the restrictions above) for which the ratio R R 2 k F (t1, . . . , tk)dtk dtk−1 . . . dt1 t1,...,tk−1≥0 tk≥0 ρ(F ) := R 2 . F (t1, . . . , tk) dtk . . . dt1 t1,...,tk≥0 satisfies ρ(F ) > 4h. We have proved the following Proposition which leaves us to find functions F with certain properties, in order to obtain the main results:

Proposition 13.5.1. Fix h ≥ 1. Suppose that there exists F ∈ C(x1, . . . , xk) which is measurable, supported on Tk, for which there is a uniform bound on the first order partial derivatives and such that ρ(F ) > 4h. Then, for every admissible k-tuple of linear forms, there are infinitely many integers n such that there are > h primes amongst the k linear forms when evaluated at n. If the Elliott-Halberstam conjecture holds then we only need that ρ(F ) > 2h.

13.6. A special case

If F (t1, . . . , tk) = f(t1 + ... + tk) then since Z tk−1 dtk−1 . . . dt1 = , t1,...,tk≥0 (k − 1)! t1+...+tk=t we deduce that

ρ(F ) = ρk(f) as defined in (12.8.2). That is, we have reverted to the original GPY argument, which was not quite powerful enough for our needs. We want to select F that does not lead us back to the original GPY argument, so we should avoid selecting F to be a function of one variable. Pk j Since F is symmetric, we can define the symmetric sums, Pj = i=1 ti . In the GPY argument F was a function of P1. A first guess might be to work now with functions of P1 and P2, so as to consider functions F that do not appear in the GPY argument.

13.7. Maynard’s F s, and gaps between primes For k = 5 let 2 F (t1, . . . , t5) = 70P1P2 − 49P1 − 75P2 + 83P1 − 34. A calculation yields that 1417255 ρ(F ) = > 2. 708216 13.8. F AS A PRODUCT OF ONE DIMENSIONAL FUNCTIONS 129

Therefore, by Proposition 13.5.1, if we assume the Elliott-Halberstam conjecture with h = 1 then for every admissible 5-tuple of linear forms, there are infinitely many integers n such that there are at least two primes amongst the five linear forms when evaluated at n. In particular, from the admissible forms {x, x + 2, x + 6, x + 8, x + 12} we deduce that there are infinitely many pairs of distinct primes that differ by no more than 12. Also from the admissible forms {x + 1, 2x + 1, 4x + 1, 8x + 1, 16x + 1} we deduce that there are infinitely many pairs of distinct primes, p, q for which (p − 1)/(q − 1) = 2j for j = 0, 1, 2, 3 or 4. Maynard [?] showed that there exists a polynomial of the form X a b ca,b(1 − P1) P2 a,b≥0 a+2b≤11 with k = 105, for which ρ(F ) = 4.0020697 ... By Proposition 13.5.1 with h = 1, we can then deduce that for every admissible 105-tuple of linear forms, there are infinitely many integers n such that there are at least two primes amongst the 105 linear forms when evaluated at n. How did Maynard find his polynomial, F , of the above form? The numerator T and denominator of ρ(F ) are quadratic forms in the 42 variables ca,b, say v M2v T and v M1v, respectively, where v is the vector of c-values. The matrices M1 and M2 are easily determined. By the theory of Lagrangian multipliers, Maynard showed that −1 M1 M2v = ρ(F )v −1 so that ρ(F ) is the largest eigenvalue of M1 M2, and v is the corresponding eigen- vector. These calculations are easily accomplished using a pack- age and yield the result above.

13.8. F as a product of one dimensional functions We select ( g(kt1) . . . g(ktk) if t1 + ... + tk ≤ 1 F (t1, . . . tk) = 0 otherwise, R 2 where g is some integrable function supported only on [0,T ]. Let γ := t≥0 g(t) dt, so that the denominator of ρ(F ) is Z Z 2 2 −k k Ik = f(t1, . . . tk) dtk . . . dt1 ≤ (g(kt1) . . . g(ktk)) dtk . . . dt1 = k γ . t∈Tk t1,...,tk≥0

We rewrite the numerator of ρ(F ) as Lk − Mk where Z Z 2 Z 2 −k k−1 Lk := k g(kt1) . . . g(ktk)dtk dtk−1 . . . dt1 = k γ g(t)dt . t1,...,tk−1≥0 tk≥0 t≥0

As g(t) is only supported in [0,T ] we have, by Cauchying and letting uj = ktj, Z Z !2 Mk : = g(kt1) . . . g(ktk)dtk dtk−1 . . . dt1 t1,...,tk−1≥0 tk≥1−t1−...−tk−1 Z −k 2 2 ≤ k T g(u1) . . . g(uk) du1 . . . duk. u1,...,uk≥0 u1+...+uk≥k 130 13. GOLDSTON-PINTZ-YILDIRIM IN HIGHER DIMENSIONAL ANALYSIS

R 2 R 2 Now assume that µ := t tg(t) dt ≤ (1 − η) t g(t) dt = (1 − η)γ for some given η > 0; that is, that the “weight” of g2 is centered around values of t ≤ 1 − η. We have  1 2 1 ≤ η−2 (u + ... + u ) − µ/γ k 1 k whenever u1 + ... + uk ≥ k. Therefore,

Z  2 −2 −k 2 2 1 Mk ≤ η k T g(u1) . . . g(uk) (u1 + ... + uk) − µ/γ du1 . . . duk u1,...,uk≥0 k Z −2 −k−1 2 2 2 2 2 = η k T g(u1) . . . g(uk) (u1 − µ /γ )du1 . . . duk u1,...,uk≥0 Z  Z = η−2k−k−1γk−1T u2g(u)2du − µ2/γ ≤ η−2k−k−1γk−1T u2g(u)2du, u≥0 u≥0 by symmetry. We deduce that

2 R  η−2T R 2 2 t≥0 g(t)dt − k u≥0 u g(u) du (13.8.1) ρ(F ) ≥ R 2 . t≥0 g(t) dt

Notice that we can multiply g through by a scalar and not affect the value in (13.8.1).

13.9. The optimal choice We wish to find the value of g that maximizes the right-hand side of (13.8.1). This can be viewed as an optimization problem: R Maximize t≥0 g(t)dt, subject to the constraints R 2 R 2 t≥0 g(t) dt = γ and t≥0 tg(t) dt = µ. One can approach this using the calculus of variations or even by discretizing g and employing the technique of Lagrangian multipliers. The latter gives rise to (a discrete form of)

Z Z  Z  g(t)dt − α g(t)2dt − γ − β tg(t)2dt − µ , t≥0 t≥0 t≥0 for unknowns α and β. Differentiating with respect to g(v) for each v ∈ [0,T ], we obtain 1 − 2αg(v) − 2βvg(v) = 0; that is, after re-scaling,

1 g(t) = for 0 ≤ t ≤ T, 1 + At 13.9. THE OPTIMAL CHOICE 131 for some real A > 0. We select T so that 1 + AT = eA, and let A > 1. We then calculate the integrals in (13.8.1): Z 1 γ = g(t)2dt = (1 − e−A), t A Z 2 1 −A tg(t) dt = 2 A − 1 + e , t A Z 2 2 1 A −A t g(t) dt = 3 e − 2A − e , t A Z and g(t)dt = 1, t 1 − (A − 1)e−A so that η = > 0, A(1 − e−A) which is necessary. (13.8.1) then becomes (13.9.1) A e2A (1 − e−A)2 e2A ρ(F ) ≥ − 1 − 2Ae−A − e−2A ≥ A − (1 − e−A) Ak (1 − (A − 1)e−A)2 Ak 1 1 Taking A = 2 log k + 2 log log k, we deduce that 1 1 ρ(F ) ≥ log k + log log k − 2. 2 2 Hence, for every m ≥ 1 we find that ρ(F ) > 4m provided e8m+4 < k log k. This implies the following result: Theorem 13.9.1. For any given integer m ≥ 2, let k be the smallest integer 8m+4 with k log k > e . For any admissible k-tuple of linear forms L1,...,Lk there exists infinitely many integers n such that at least m of the Lj(n), 1 ≤ j ≤ k are prime. For any m ≥ 1, we let k be the smallest integer with k log k > e8m+4, so that k k > 10000; in this range it is known that π(k) ≤ log k−4 . Next we let x = 2k log k > 5 x 1 10 and, for this range it is known that π(x) ≥ log x (1 + log x ). Hence 2k log k  1  k π(2k log k) − π(k) ≥ 1 + − log(2k log k) log(2k log k) log k − 4 and this is > k for k ≥ 311 by an easy calculation. We therefore apply the theorem with the k smallest primes > k, which form an admissible set ⊂ [1, 2k log k], to obtain: 8m+5 Corollary 13.9.2. For any given integer m ≥ 2, let Bm = e . There are infinitely many integers x for which there are at least m distinct primes within the interval [x, x + Bm]. By a slight modification of this construction, Maynard in [?] obtains

(13.9.2) ρ(F ) ≥ log k − 2 log log k − 1 + ok→∞(1) 3 4m from which he analogously deduces that Bm  m e . By employing Zhang’s improvement, Theorem 10.2.1, to the Bombieri-Vinogradov (4− 28 )m Theorem, one can improve this to Bm  me 157 and if we assume the Elliott- (2+o(1))m Halberstam conjecture, to Bm  e . The key to significantly improving these upper bounds on Bm is to obtain a much better lower bound on ρ(F ). 132 13. GOLDSTON-PINTZ-YILDIRIM IN HIGHER DIMENSIONAL ANALYSIS

13.10. Tao’s upper bound on ρ(F ) By the Cauchy-Schwarz inequality we have 2 Z 1−t1−...−tk−1  Z 1−t1−...−tk−1 dtk F (t1, . . . , tk)dtk ≤ · tk=0 tk=0 1 − t1 − ... − tk−1 + (k − 1)tk Z 1−t1−...−tk−1 2 × F (t1, . . . , tk) (1 − t1 − ... − tk−1 + (k − 1)tk)dtk. tk=0 Letting u = A + (k − 1)t we have Z A dt 1 Z kA du log k = = . 0 A + (k − 1)t k − 1 A u k − 1 Applying this with A = 1 − t1 − ... − tk−1 and t = tk, yields an upper bound for the numerator of ρ(F ): k log k Z X ≤ · F (t , . . . , t )2 (1 − t − ... − t + kt )dt . . . dt k − 1 1 k 1 k j k 1 t1,...,tk≥0 j=1 Z k log k 2 = · F (t1, . . . , tk) dtk . . . dt1. k − 1 t1,...,tk≥0 Hence k log k ρ(F ) ≤ = log k + o (1) k − 1 k→∞ so there is little room for improvement of Maynard’s upper bound, (13.9.2).