<<

The Theorem

Ben Green

Mathematical Institute, Oxford E-mail address: [email protected] 2000 Mathematics Subject Classification. Primary Contents

0.1. Overview 1 0.2. Notation 1 0.3. Quantities 2

Chapter 1. Basic facts about the primes 3 1.1. Euclid’s proof 3 1.2. Elementary bounds 3 1.3. The 6

Chapter 2. Arithmetic functions 9 2.1. M¨obiusinversion. 9

Chapter 3. Introducing the Riemann ζ-function 11 3.1. Dirichlet 11 3.2. The ζ-function 11 3.3. Looking forward 13 3.4. Meromorphic continuation to Re s > 0. 14

Chapter 4. Some 15 4.1. Introduction 15 4.2. Fourier analysis of Schwarz functions on R 16 4.3. Fourier analysis on Z and R/Z 20 4.4. The Poisson summation formula 24

Chapter 5. The and functional 25 5.1. The Γ-function 25 5.2. Statement of the functional equation for ζ 28 5.3. Proof of the functional equation 28

Chapter 6. The partial fraction expansion 31 6.1. Statement of the partial fraction expansion 31 6.2. The product formula for ζ 32 6.3. The size of a holomorphic function and its zeros 33 6.4. Weierstrass product of an integral function of order one: proof 36

v vi CONTENTS

Chapter 7. Mellin transforms and the explicit formula 39 7.1. Definitions and statement of the formula 39 7.2. Proof of the explicit formula – overview 41 7.3. Estimates for the Mellin transforms. 43 7.4. Bounds for ζ0/ζ at judicuously chosen points 45

Chapter 8. The prime number theorem 49

Chapter 9. The zero-free region. Error terms. 53 9.1. The classical zero-free region 53 9.2. The prime number theorem with classical error term 55 9.3. The and its implications for primes 57

Chapter 10. *An introduction to sieve theory 59 10.1. Introduction 59 10.2. Selberg’s weights 60

Appendix A. *Some smooth bump functions 65

Appendix B. Infinite products 69

Bibliography 73 0.2. NOTATION 1

0.1. Overview

These notes give a proof of the prime , together with background on complex analysis, the Riemann ζ-function, and Fourier analysis. My main aim is to give some approximation to the “correct” proof of the theorem which, in my opinion, follows Davenport’s book [1] up to a point but emphasises the role of smooth cutoff functions throughout and has no mention of “Perron’s formula”. Although this demands a little more sophistication on the part of the reader, the payoff is that different types of error terms are cleanly separated, and one can see very clearly the link between the primes and the zeta zeros. A technical advantage is that one may establish the prime number theorem using only the nonvanishing of ζ on Re s = 1, rather than a zero-free region. Other technical advantages are that one may get away with relatively crude estimates in many places, so there is no need for a careful asymptotic analysis of the Γ-function, for example. There are surprisingly few accounts along these lines in the literature. The closest I know if is Kowalski’s nice book [2]. However, this is a little faster-paced, wider-ranging and lighter on detail than we aim to be here, and some readers may be put off by the fact that it is in French. Davenport’s book is the traditional connoisseur’s choice for a course of this nature. It is an excellent book. However, as mentioned above, there are some notable differences in emphasis with our approach here. The book of Iwaniec and Kowalski [3] contains a vast amount of material and is really for the expert, but it should find space on the shelf of any person wishing to describe themselves in that way. The notes have evolved over around 14 years from various courses I have given in Cambridge and Oxford, most recently “C3.8 Analytic Number Theory” in Hilary Term 2018 at the latter institution.

0.2. Notation

As with any course, a certain amount of notation will be introduced as we go along. One very important point to be made at the outset is that log x always means the natural logarithm of x. Some people consider the notation ln x, which can be found in some books, tasteless. logC x is the same as (log x)C . We will very often use the notation btc, which means the greatest integer less than or equal to t. Throughout the course we will be using asymptotic notation. This is vital in handling the many inequalities and rough estimates we will encounter. Here is a 2 CONTENTS summary of the notation we will see. We suggest the reader not worry too much about this now; we will gain plenty of practice with this notation. • A  B means that there is an absolute constant C > 0 such that |A| 6 CB. In this notation, A and B will typically be variable quantities, depending on some other parameter. For example, x + 1  x for x > 1, because |x + 1| 6 2x in this range. It is important to note that the constant C may be different in different instances of the notation. • A = O(B) means the same thing.

• A k,l,m B means that |A| 6 CB, but now C is allowed to depend on some other parameters k, l, m. For example, kx k x, k + l + m =

Ok,l,m(1). • A  B is the same as B  A. • O(A) means some quantity bounded in magnitude by CA for some ab- solute constant C > 0. In particular, O(1) simply means a quantity 5x bounded by an absolute positive constant. For example, 1+x = O(1) for x > 0. • A = o(B) means that |A| 6 εB as some other parameter becomes large enough. The other parameter will usually be clear from context. For 1 example, log x = o(1) (as x → ∞). • A = ok,l,m(B) means that |A| 6 εB as some other parameter becomes large enough, but how large it needs to be may depend on the other k parameters k, l, m. For example, log x = ok(1). • A ∼ B, which we read as “A is asymptotic to B” means that A = (1 + o(1))B.

0.3. Quantities

In understanding analytic number theory, it is important to develop a robust intuitive feeling for the rough size of certain quantities. For example, one should be absolutely clear about the fact that, for X large, √ 10 log X 0.01 log X ≪ e ≪ X . CHAPTER 1

Basic facts about the primes

1.1. Euclid’s proof

This course is largely about the prime numbers 2, 3, 5, 7, ... . The most basic fact about them is the following, proven by Euclid over 2000 years ago.

Theorem 1.1. There are infinitely many primes.

Proof. Suppose not, and that p1, . . . , pN is a complete list of the primes. Consider the number M = p1 ··· pN + 1. It must have a prime factor q. However, M is manifestly not divisible by any of p1, . . . , pN , and so q∈ / {p1, . . . , pN }. This is a contradiction.

Of course, it is possible to ask more refined questions. Does the sequence of prime numbers grow extremely rapidly (like the powers of two 1, 2, 22, 23,... ), or slowly like the odd numbers 1, 3, 5, 7,... , or somewhere in between? This is the question that will occupy us in this course.

1.2. Elementary bounds

If X > 1 is a real number then we write π(X) for the number of primes less than or equal to X. Rather elementary methods suffice to get the correct order of magnitude for π(X).

Theorem 1.2. There are constants 0 < c1 < 1 < c2 such that, for all sufficiently large X, X X c π(X) c . 1 log X 6 6 2 log X

X Remarks. In asymptotic notation, we may thus assert that log X  π(X)  X log X , and the upper bound is saying that π(X) = O(X/ log X). The lower bound immediately implies the infinitude of primes (and with a much better bound than Euclid’s proof). The upper bound implies, in particular, that the density of the primes up to X, that is to say π(X)/X, tends to zero as X → ∞.

3 4 1. BASIC FACTS ABOUT THE PRIMES

Proof. We begin with the lower bound, which is slightly easier. For this, it suffices to show that

Y n (1.1) p > C1 p62n for some C1 > 1, and for sufficiently large n. Indeed, we then have

n Y π(2n) C1 6 p 6 (2n) , p62n n from which we obtain π(2n)  log(2n) . This implies the lower bound in Theorem 1.2 when X is an even integer; the general case follows very quickly from this. 2n To prove (1.1), we look at the prime factorisation of n , which we write as 2n Y = pvp(n). n p62n Now we claim the following three facts.

√ (i) If p > 2n then vp(n) 6 1; v (n) (ii) For all p, p p 6 2n. 2n 4n (iii) n > 2n+1 .

For items (i) and (ii) we use the formula ∞ X 2n n (1.2) v (n) = b c − 2b c. p pi i=1 P∞ m This is a consequence of the fact that the power of p dividing m! is i=1b pi c, this m m 2 sum being composed of b p c multiples of p, b p2 c multiples of p , and so on. (We 2n remark that sums such as (1.2) are all finite, since for example b pi c = 0 when pi > 2n.) Proof of (i): each term in (1.2) is at most 1, since b2xc − 2bxc ∈ {0, 1} for all √ real x. For p > 2n, only the term with i = 1 can be nonzero. log(2n) Proof of (ii): all terms in (1.2) are zero or 1, and those with i > log p , that i log(2n) is to say with p > 2n, are zero. Therefore vp(n) 6 log p , from which (ii) follows immediately. P2n 2n n 2n Proof of (iii): note that j=0 j = 4 and, in this sum, the middle term n is the largest one. Putting these facts together gives 4n √ Y √ Y (2n)π( 2n) p (2n) 2n p. 2n + 1 6 6 p62n p62n

It follows that (1.1) holds for any C1 < 4, provided n is sufficiently large. 1.2. ELEMENTARY BOUNDS 5

We turn now to the upper bound in Theorem 1.2. We claim that now it suffices to show that

Y X (1.3) p 6 C2 X/2 X/2, and so X X √ π( ) 2 X. 2m 6 2m 6 √ i 1 For i 6 m we have log(X/2 ) 6 log X = 2 log X. Combining these observations with (1.4) gives m √ 2 log C2 X X π(X) 2 X + . 6 1 log X 2i 2 i=1 X This is evidently O( log X ), which is the upper bound in Theorem 1.2. It remains to establish (1.3). For this, note that if n is an integer and if p > n 2n then p divides n precisely once. Therefore Y n p 6 4 . n

Y X p 6 2 . X/2

Y X p 6 X2 . X/2

From this, (1.3) follows immediately (with any C2 > 2).

Additional remark. Euclid’s proof of the infinitude of primes can be modified to give an explicit lower bound on π(X), but it is very weak: we leave it as an exercise to show that π(X)  log log X by this method. 6 1. BASIC FACTS ABOUT THE PRIMES

1.3. The prime number theorem

The main aim of this course will be to prove the prime number theorem, which states that c1 and c2 in Theorem 1.2 can be taken arbitrarily close to 1.

Theorem 1.3 (Prime number theorem). Suppose that 0 < c1 < 1 < c2. Then, if X is sufficiently large, we have X X c π(X) c . 1 log X 6 6 2 log X X X Equivalently, π(X) = (1 + o(1)) log X , or π(X) ∼ log X .

It is convenient to reformulate the prime number theorem in various ways. One very simple way of doing so is to observe that it is (tautologically) equivalent to the statement that X X 1 (n) ∼ , P log X n6X where 1P is the characteristic function of the primes, that is to say the function which is 1 on the primes and 0 elsewhere. It turns out that a much more natural function than the characteristic function of the primes is the . It is defined by ( log p if n = pk, p a prime; Λ(n) := 0 otherwise. One should think of Λ as being roughly a function that assigns weight log p to the k prime p; the contribution of the genuine prime powers p , k > 2, is negligible. The reason this is a natural definition will become gradually apparent. For now, however, let us note that the prime number theorem has an equivalent formulation in terms of the von Mangoldt function. We define X ψ(X) := Λ(n). n6X

Proposition 1.1. The prime number theorem is equivalent to the statement that ψ(X) ∼ X, that is to say that the average value of Λ(n) over n 6 X is asymptotically 1.

Proof. Write π(X) a(X) := , b(X) := ψ(X)/X. X/ log X Thus the task is to show that a(X) → 1 if and only if b(X) → 1. We first obtain an upper bound for b(X) in terms of a(X). For a given prime k p, the maximum k for which p 6 X is k = blog X/ log pc. The contribution of this 1.3. THE PRIME NUMBER THEOREM 7 p to ψ(X), X log p, k k:p 6X is therefore at most log X log pb c log X. log p 6 Only those primes p with p 6 X contribute at all, and so we have the inequality

ψ(X) 6 π(X) log X, and thus

(1.5) b(X) 6 a(X). Now we obtain a bound in the other direction. To this end, note that 1 X (1.6) a(X) − b(X) (log X − log p), 6 X p6X k the contribution of the proper prime powers p , k > 2, to −b(X) being negative. −2 −2 We split the sum over p into two ranges p 6 X(log X) and X(log X) < p 6 X (there is a certain amount of flexibility in these choices). We bound the contribution from the first range trivially: 1 X 1 (1.7) (log X − log p) < . X log X −2 p6X(log X) On the second range,

(1.8) log X − log p  log log X.

Thus 1 X log log X (log X − log p)  #{p : X(log X)−2 < p X} X X 6 −2 X(log X)

(1.9) a(X) 6 (1 + o(1))b(X) + o(1), which is enough for our purposes. The fact that if b(X) → 1 then a(X) → 1 follows immediately from (1.5), (1.9). 8 1. BASIC FACTS ABOUT THE PRIMES

The other direction is just slightly trickier: if a(X) → 1 then it certainly follows from (1.5) that b(X) 6 1 + o(1). In particular b(X) 6 2 for X sufficiently large, and so the term o(1)b(X) in (1.9) may be replaced by o(1). We then have a(X) 6 b(X) + o(1), which implies that b(X) > 1 + o(1) as well.

To conclude this chapter, let is record an immediate consequence of (1.5) and Theorem 1.2.

Proposition 1.2. We have the bound X Λ(n) = O(X). n6X CHAPTER 2

Arithmetic functions

An is simply a function f from N to C. The most important arithmetic function in this course is the von Mangoldt function Λ, introduced in the last section. However, this is far from the only interesting arithmetic function.

2.1. M¨obiusinversion.

Some of the material of this section is not really necessary for the main devel- opment of the course, and in particular for proving the prime number theorem. However, it is certainly part of the general culture of analytic number theory, and helps us place Λ in a more general context. k The M¨obiusfunction µ is defined by µ(n) = (−1) if n = p1 ··· pk for distinct primes p1, . . . , pk, and µ(n) = 0 otherwise. Dirichlet If f, g : N → C are two arithmetical functions then we define X X f ? g(n) := f(d)g(n/d) = f(a)g(b). d|n ab=n

Proposition 2.1 (M¨obiusinversion). Let f, g : N → C be two arithmetical functions. Then g = f ? 1 if and only if f = g ? µ.

Proof. Observe that Dirichlet convolution is commutative and associative. Note that µ ? 1 = δ, where δ(n) = 1 if n = 1 and 0 otherwise. This is a routine check: if

α1 ak n = p1 . . . pk then

X X ε1 εk µ(d) = µ(p1 . . . pk ) = (1 − 1) ··· (1 − 1) = 0.

d|n εi∈{0,1} Now if g = f ? 1 then

g ? µ = (f ? 1) ? µ = f ? (1 ? µ) = f ? δ = f.

Conversely if f = g ? µ then

f ? 1 = (g ? µ) ? 1 = g ? (µ ? 1) = g ? δ = g.

This concludes the proof.

9 10 2. ARITHMETIC FUNCTIONS

This leads to an important link between the von Mangoldt function and the M¨obiusfunction. P Lemma 2.1. We have Λ = µ ? log, that is to say Λ(n) = d|n µ(d) log(n/d).

Proof. This follows from M¨obiusinversion and the fact, obvious upon considering the prime factorization of n, that X Λ ? 1(n) = Λ(d) = log n. d|n

Here are some other commonly occurring arithmetical functions. • Euler’s φ-function φ(n) is defined to be the number of integers x 6 n which are coprime to n, which is the same thing as the order of the multiplicative group (Z/nZ)×, the group of invertible elements in Z/nZ. • The divisor function τ(n) (sometimes written d(n)) is defined to be the number of positive integer divisors of n, including 1 and n itself. P • The sum-of-divisors function σ(n) is defined to be d|n d. Various properties of these may be found in the exercises. CHAPTER 3

Introducing the Riemann ζ-function

3.1.

An important tool for working with arithmetical functions – particularly when they arise from considerations that are somehow multiplicative – is that of Dirichlet series. Let f : N → C be an arithmetical function. Then the Dirichlet Series of f is

X −s Df (s) := f(n)n . n At the moment this is just a “formal” Dirichlet series; if f grows extremely rapidly, the series may not make sense as an actual number for any value of s at all. In practice, f will have reasonable growth. Commonly, for example, |f(n)| = no(1): this is the case when f is the constant function 1, the M¨obiusfunction µ, the von Mangoldt function Λ, or the divisor function τ (by contrast to the first three, this last one is not completely obvious). In this case the series for Df (s) converges, and defines a holomorphic function of s, in the domain Re s > 1.

Indeed if Re s > 1 + δ then for N > N0(δ) sufficiently large we have ∞ ∞ X −s X δ/2 −1−δ −δ/2 | f(n)n | 6 n n  N . n=N+1 n=N+1 Thus if we set N (N) X −s Df (s) := f(n)n n=1 (N) then Df (s) → Df (s) uniformly in Re s > 1 + δ, which implies that Df is holo- morphic on the interior of any such domain.

3.2. The ζ-function

Perhaps the most basic arithmetical function is the constant function f = 1. The Dirichlet series of this function is called the Riemann ζ-function. Thus ∞ X ζ(s) = n−s. n=1

11 12 3. INTRODUCING THE RIEMANN ζ-FUNCTION

As just noted, the series converges and defines a holomorphic function in the domain Re s > 1. Euler product. Unique factorization into primes convinces us of the identity 1 1 1 1 1 1 ζ(s) = 1 + + + ··· = (1 + + + ... )(1 + + + ... ) ··· , 2s 3s 2s 22s 3s 32s where there is one product for each prime p = 2, 3, 5,... . Summing each of the geometric series suggests then that Y (3.1) ζ(s) = (1 − p−s)−1. p Let us pause to justify this formally. Let s, Re s > 1, be fixed. Fix an N, and suppose that m > log2 N. Then the product

 m  Y X −js P (m, N) :=  p  p6N j=0 may be expanded, and it equals 1 + 2−s + ··· + N −s plus a number of further, distinct, terms n−s with n > N. Thus ∞ X −s −δ |P (m, N) − ζ(s)| 6 |n |  N n=N+1 uniformly in m. Letting m → ∞, we thus obtain Y | (1 − p−s)−1 − ζ(s)|  N −δ. p6N Finally, letting N → ∞, we confirm the equality (3.1).

Equation (3.1) is called the Euler product for ζ and it is valid for Re s > 1. An amusing consequence of the above is yet another proof (our third) that there are infinitely many primes. Suppose not. Then, letting s → 1+ in (3.1) we see that lims→1+ ζ(s) < ∞. However this is not the case, since N N X X lim n−s = n−1, s→1+ n=1 n=1 and the harmonic series diverges. It turns out that the Dirichlet series of many of the basic arithmetical functions can be expressed in terms of ζ. In the following proposition we detail the most basic such relations; others may be found on the exercise sheet.

Proposition 3.1. We have the following facts about Dirichlet series.

(i) If the Dirichlet series of f is F (s), and that of g is G(s), then the Dirichlet series of the convolution f ? g is FG. 3.3. LOOKING FORWARD 13

P −s (ii) If Re s > 1 then n µ(n)n = 1/ζ(s); P −s 0 (iii) If Re s > 1 then n Λ(n)n = −ζ (s)/ζ(s).

Proof. For (i), note that the Dirichlet Series of f ? g is X X X ( f(a)g(b))n−s = f(a)g(b)(ab)−s. n ab=n a,b This proves the result. (ii) is best proved using the Euler product for ζ, which gives 1 Y X = (1 − p−s) = µ(n)n−s ζ(s) p n (the second step follows on expanding out the product: we leave the rigorous jus- tification that this is permissible to the reader). Assuming that the relevant series may be differentiated term-by-term, part (iii) can be proven using (i), (ii), the fact that Λ = µ ? log and the observation that X (3.2) ζ0(s) = − log n · n−s. n Alternatively, write X log ζ(s) = − log(1 − p−s) p and differentiate. Again, we leave rigorous justifcations to the reader. To justify P −s (3.2), it would be best to define g(s) := − n log n · n , and show that term-by- term integration is possible and gives ζ(s).

Corollary 3.1. We have ζ(s) 6= 0 when Re s > 1.

Proof. This follows immediately from (ii).

3.3. Looking forward

One of the central results of the course is the fact that ζ, though it is currently defined only for Re s > 1, extends to a meromorphic function on the complex plane, holomorphic except for a simple pole at s = 1. The function (s − 1)ζ(s) is then entire (holomorphic on the whole complex plane). It has zeros ρ, and we shall show that they are of two types: the trivial zeros −2, −4, −6,... and the nontrivial zeros, which all lie in the region 0 6 Re s 6 1. To understand ζ in terms of its zeros, it Q is natural to ask to what extent (s − 1)ζ(s) ∼ ρ(s − ρ). Though it is not quite possible to make sense of this statement, a slight variant of it does hold. Here, then, is a summary of the main facts about ζ we will establish in this chapter that will be important for the later theory. 14 3. INTRODUCING THE RIEMANN ζ-FUNCTION

Theorem 3.1. The Riemann ζ-function extends to a meromorphic function, an- alytic except for a simple pole at s = 1. It has “trivial” zeros at s = −2, −4, −6,... , all simple, and (possibly) “nontrivial zeros” in the critical strip 0 6 Re s 6 1. Fi- nally, we have the Weierstrass product expansion Y s (s − 1)ζ(s) = eAs+B (1 − )es/ρ, ρ ρ where the product is over all zeros of ζ and A, B are some constants (which can be evaluated explicitly).

Let us remark that at least one of the facts used in the proof of this theorem – the functional equation, Theorem 5.1 – is of great importance in its own right.

3.4. Meromorphic continuation to Re s > 0.

To conclude this introductory discussion of the ζ-function, we give a quick proof that it may be meromorphically continued a little to the left of its current domain of definition.

Proposition 3.2. The ζ(s) has a meromorphic contin- uation to the right half-plane Re(s) > 0, holomorphic except for a simple pole at s = 1.

Proof. For Re s > 1 we have ∞ ∞ X X ζ(s) = n−s = n(n−s − (n + 1)−s) n=1 n=1 ∞ X Z n+1 = s n x−s−1 dx n=1 n Z ∞ = s bxcx−s−1 dx 1 s Z ∞ = − s {x}x−s−1 dx. s − 1 1 (Here, {x} := x − bxc.) The integral here defines an on Re s > 0.

The formula for ζ(s) found here will sometimes be useful in its own right. CHAPTER 4

Some Fourier analysis

A key role will be played in the rest of the course by Fourier analysis. We take the time to develop the parts of the subject that we need in this chapter.

4.1. Introduction

Most students will have met both the and “Fourier series” at undergraduate level. It turns out that these are just two instances of the same concept, that of a Fourier transform on a locally compact abelian group (LCAG). We will not go into any details of the theory in this generality – in fact there is not even any need to define a LCAG. The reader may, however, benefit from seeing the unified context at least in vague outline. If G is a LCAG then we associate to G its dual Gb, which consists of all continuous homomorphisms (characters) from G to S1, the unit circle {z ∈ C : |z| = 1}. It is easy to see that Gb is a group under pointwise multiplication. If f : G → C is a “suitably nice” function and γ ∈ Gb is a character then the Fourier transform fb(γ) is defined to be Z fb(γ) := f(x)γ(x)dµG(x) = hf, γi, G where µG is the Haar measure on G, a certain uniquely-defined measure with natural properties. We will not need the general theory here, and instead illustrate with examples.

Example 4.1 (G = R). It turns out that all characters have the form x 7→ eiξx, for ξ ∈ R. Thus Rb = R, that is R is self-dual. If f : R → C is integrable then we define Z ∞ fb(ξ) := f(x)e−iξx dx. −∞ (Stricly speaking, this is an abuse of notation: the Fourier transform is really defined “at the character x 7→ eiξx, and not at ξ.)

Example 4.2 (G = Z). It turns out that all characters have the form n 7→ e2πiθn, where θ ∈ R/Z. Thus Zb = R/Z. If f : Z → C is suitably nice we define X fb(θ) := f(n)e−2πiθn. n∈Z 15 16 4. SOME FOURIER ANALYSIS

(This is again an abuse of notation, similar to the previous one.)

Example 4.3 (G = R/Z: “Fourier series”). Here all the characters have the form θ 7→ e2πinθ, where n ∈ Z. Thus Rd/Z = Z. If f : R/Z → C is suitably nice we define Z 1 fb(n) := f(θ)e−2πiθn dθ. 0

The last two groups, Z and R/Z, are an example of a dual pair.

Example 4.4 (G = Z/NZ: Discrete Fourier transform). Here all characters have the form x 7→ e2πirx/N , where r ∈ Z/NZ (thus Gb =∼ G). In this finite setting all functions are “suitably nice” and we define

−2πirx/N 1 X −2πirx/N fb(r) := x∈ /N f(x)e = f(x)e . E Z Z N x∈Z/NZ

Example 4.5 (G = × : Mellin transform). In fact, G is isomorphic to via R>0 R the logarithm map x 7→ log x. The characters are x 7→ xiξ for ξ ∈ R. We have Z ∞ fˆ(ξ) = f(x)x−iξd×x, 0 ×x dx where d = x .

The Mellin transform will feature prominently later on. We will need to extend its domain of definition from iR to all of C, defining Z ∞ f˜(s) := f(x)xsd×x 0 (we recover the previous definition on taking s = −iξ). Often we will be look at a fixed vertical contour s = σ + iR, in which case the Mellin transform really is a Fourier transform, in fact of the function f(x)x−σ on × . R>0 Much of the theory of Fourier analysis is concerned with what is meant by “suit- ably nice”, and with such questions as when one can prove an inversion formula and what the decay of Fourier coefficients tells us about the smoothness of a func- tion. This is a fascinating theory. However for much of analytic number theory the deeper parts of this theory can be avoided. In this course we will work with particularly nice classes of functions called Schwartz functions, where most of the analytic issues can be avoided and Fourier analysis takes on an almost “algebraic” flavour.

4.2. Fourier analysis of Schwarz functions on R In this section, and for the rest of the course, we use the symbol ∂ for the R ∞ differentiation operator. We also use the notation kfk1 := −∞ |f(x)|dx. 4.2. FOURIER ANALYSIS OF SCHWARZ FUNCTIONS ON R 17

The Fourier transform of a smooth function decays rapidly. This fundamental fact will be used repeatedly in this course.

Lemma 4.1 (Fourier decay properties). Suppose that g : R → R is a smooth function, all of whose derivatives lie in L1(R)(that is, are integrable) and decay to infinity. Then for any m we have the decay estimate

−m m |gb(ξ)| 6 |ξ| k∂ gk1.

−m Remark. In particular, if g is regarded as fixed then we have |gˆ(ξ)| m |ξ| for all m. Proof. Recall the definition of the Fourier transform, that is to say Z ∞ gˆ(ξ) := g(x)e−iξxdx. −∞ Repeated integration by parts gives Z ∞ 1 m m iξx gb(ξ) = − ∂ g(x)e dx. iξ −∞ The result then follows immediately from the triangle inequality.

We now introduce a proper notion of “sufficiently nice” function which is ap- propriate to Fourier analysis on R is that of a Schwartz function. This is a smooth function, all of whose derivatives decay rapidly at infinity.

Definition 4.1 (Schwartz space S(R)). Let f ∈ C∞(R), that is to say suppose that f is infinitely differentiable. We say that f belongs to Schwartz space S(R) if lim |x|n∂mf(x) = 0 |x|→∞ for all integers m, n > 0.

We note in particular that, since ∂mf(x)  |x|−2 for large x, every derivative of a Schwartz function f lies in L1(R) and hence has well-defined Fourier transform. One of the main reasons for introducing this definition is that the Fourier trans- form maps Schwartz functions to Schwartz functions.

Lemma 4.2 (Fourier transforms of Schwartz functions are Schwartz). Suppose that f ∈ S(R). Then fb ∈ S(R).

n Proof. Note first of all that if f ∈ S(R) then, for any fixed n ∈ Z>0, x f ∈ S(R) as well. This follows easily by differentiating xnf repeatedly using the product rule; n0 j 0 each term is of the form x ∂ f for some n 6 n and j > 0, and every such term decays quicker than any power of x. 18 4. SOME FOURIER ANALYSIS

Now differentiation under the integral in the definition of Fourier transform furnishes the following relation:

∂nfb(ξ) = (−2πi)n(xnf)∧(ξ).

Since xnf is a Schwarz function, all of its derivatives lie in L1(R). It follows from Lemma 4.1 that ∂nfb(ξ) decays quicker than any polynomial, and hence fˆ ∈ S(R).

Using Fubini’s theorem, which is always valid when dealing with Schwartz func- tions, we can deduce the following important property of the Fourier transform.

Lemma 4.3. Suppose that f, g ∈ S(R). Then Z Z fgb = fg.b R R We will need an explicit form for the Fourier transform of Gaussian functions. This is also used, in its own right, in the proof of Lemma 5.1.

Lemma 4.4 (Fourier transform of Gaussians). Suppose that t ∈ R+, and set 2 2 f(x) = e−πx t. Then f(ξ) = √1 e−ξ /4πt. b t

Proof. Observe that Z ∞ Z −ξ2/4πt −πt(x+ iξ )2 −ξ2/4πt (4.1) fb(ξ) = e e 2πt dx = e f(z) dz, −∞ Γ1 where Γ1 is the line contour running from ∞ + iξ/2πt to −∞ + iξ/2πt. To evaluate this contour integral, integrate f (which is clearly defined on all of C) around the box defined by contours Γ1,R = [R+iξ/2πt, −R+iξ/2πt], Γ2,R = [−R+iξ/2πt, −R], R R Γ3,R = [−R,R] and Γ4,R = [R,R + iξ/2πt]. Clearly f(z) dz and f(z) dz Γ2,R Γ4,R both tend to 0 as R → ∞, and so by Cauchy’s theorem ∞ Z Z Z Z 2 f(z) dz = lim f(z) dz = − lim f(z) dz = e−πx t dx. R→∞ R→∞ Γ1 Γ1,R Γ3,R −∞ √ This last integral may be evaluated as 1/ t by making a simple substitution and recalling the Gaussian integral ∞ Z 2 e−πx dx = 1. −∞ This concludes the proof of the lemma.

Let  > 0. Taking t = 2/4π2 in the preceding lemma we see that if

1 2 2 g (x) := e−πx /   4.2. FOURIER ANALYSIS OF SCHWARZ FUNCTIONS ON R 19 then g(u) = φb(u), where

1 2 2 φ (x) := e−x  /4π.  2π We are now in a position to prove one of the key results concerning Fourier analysis on R, the Fourier inversion formula.

Proposition 4.1 (Fourier inversion formula for Schwartz functions). Suppose that f ∈ S(R). Then 1 Z ∞ f(x) = fb(ξ)eixξ dξ. 2π −∞

Remark. The factor 2π is a manifestation of the fact that we have two copies of R here, namely R and Rb. In our definition of the Fourier transform we have implicitly given an isomorphism from R to Rb. Under this isomorphism Lebesgue measure on R gets multiplied by a factor of 2π.

Proof. It suffices to establish this when x = 0, since if f is Schwartz then so is the “translate” function g defined by g(t) = g(t + x). One may check that gˆ(ξ) = eixξfˆ(ξ), and so by applying the inversion formula at 0 to g yields Z ∞ Z ∞ f(x) = g(0) = gˆ(ξ)dξ = fˆ(ξ)eixξdξ. −∞ −∞ Whilst not a crucial step, this simplification affords some notational simplicity. Using Lemma 4.3 we obtain Z ∞ Z ∞ 1 −ξ22/4π f(u)g(u) du = fb(ξ)e dξ. −∞ 2π −∞ Taking limits as  → 0, we see that ∞ ∞ 1 Z 2 2 1 Z fb(ξ)e−ξ  /4π dξ → fb(ξ) dξ. 2π −∞ 2π −∞ It suffices, then, to show that Z ∞ f(0) = lim f(u)g(u) du. →0 −∞ This is “clear” and not too difficult to justify rigorously either. To do so, let δ > 0. R Since g = 1, we see that it suffices to show that Z ∞ lim (f(u) − f(0))g(u) du = 0. →0 −∞ To do this we divide the integral into the two ranges |u| 6 δ and |u| > δ. The integral over the first range is bounded by

sup |f(u) − f(0)|. |u|6δ 20 4. SOME FOURIER ANALYSIS

The integral over the second is at most Z 2kfk∞ g(u)du. |u|>δ Note that Z Z −πx2 g(x) dx = e dx, x>δ x>δ/ which clearly tends to 0 as δ/ → ∞. Putting these two facts together we obtain Z ∞

(f(u) − f(0))g(u) du 6 sup |f(0) − f(u)| + kfk∞oδ/→∞(1). −∞ |u|6δ Since f is continuous at 0, this can be made as small as desired by choosing δ small, and then  even smaller (say  = δ2) so that δ/ → ∞.

Remark. It is possible to axiomatize the properties of the system (g) that made this work, obtaining the notion of a sequence of “approximations to the identity”. We shall not do so here.

A consequence of this and Lemma 4.3 is Plancherel’s formula.

Proposition 4.2 (Plancherel). Suppose that f ∈ S(R). Then Z ∞ Z ∞ |f(x)|2dx = 2π |fb(ξ)|2dξ. −∞ −∞

Proof. Take g := fb in Lemma 4.3, which stated that Z ∞ Z ∞ fgb = fg.b −∞ −∞

R 2 The right hand side is |fb| . Meanwhile Z 1 g(x) = fb(ξ)e−iξxdξ = f(x), b 2π by the inversion formula. The result follows.

4.3. Fourier analysis on Z and R/Z In the proof of the functional equation for ζ we will make use of some Fourier analysis on the group Z and its dual R/Z. We will only develop this theory for sufficiently smooth “Schwartz” functions, and it parallels the theory over R – just discussed very closely. This is an illustration of the fact that the natural place for harmonic analysis is on a general LCAG.

Definition 4.2 (Schwarz spaces). We define S(R/Z) to be the same as C∞(R/Z), the space of smooth functions on R/Z. (There is no “decay at infinity” condition 4.3. FOURIER ANALYSIS ON Z AND R/Z 21 for this group, which is compact.) We define S(Z) to be the space of functions k f : Z → C with the property that lim|n|→∞ |n| |f(n)| = 0 for all k. (There is no “smoothness” condition for this group, which is discrete.)

Recall the definition of the Fourier transform on R/Z: Z 1 (4.2) fb(n) := f(θ)e−2πinθ dθ 0 for n ∈ Z. Recall also the definition of the Fourier transform on Z: X (4.3) fb(θ) = f(n)e−2πiθn n∈Z for θ ∈ R/Z. Although we are using the hat symbol in two rather different contexts at once, little(!) confusion should result so long as the reader is careful to note which group each function under consideration is defined upon. Differentiation of (4.2) tells us that if f ∈ S(Z) then ∂kfb(θ) = (−2πi)k(nkf)∧(θ). Integration by parts of (4.3) tells us that if f ∈ S(R/Z) then (2πin)kfb(n) = (∂kf)∧(n). These properties once again allow us to establish closure of the Schwartz classes under the Fourier transform.

Lemma 4.5 (Fourier transform respects Schwartz classes). If f ∈ S(R/Z) then fb ∈ S(Z). If f ∈ S(Z) then fb ∈ S(R/Z).

Proof. This is actually somewhat easier than the proof of Lemma 4.2 and is left to the reader.

Once again an application of Fubini’s theorem (in this case an interchange of integration and summation) allows one to conclude the following. P Lemma 4.6. Suppose that f ∈ S(R/Z) and g ∈ S(Z). Then n fb(n)g(n) = R 1 f(θ)g(θ) dθ. 0 b Now it is time to prove the inversion formula. In this setting, with two different groups Z and R/Z, there are two different inversion formulæ. One is almost trivial:

Proposition 4.3 (Fourier inversion formula for S(Z)). Suppose that f ∈ S(Z). Then Z 1 f(n) = fb(θ)e2πinθ dθ. 0

Proof. Substitute in the definition of fb(θ) and swap the order of summation and integration. Then use the fact that Z 1 e2πimθ dθ = 0 0 22 4. SOME FOURIER ANALYSIS when m ∈ Z \{0}.

The other inversion formula is less trivial (it is also the one that we need in establishing the Poisson summation formula). It is proved using a type of “approx- imation to the identity” rather analogous to the use of gaussians in the proof of Proposition 4.1.

Proposition 4.4 (Fourier inversion formula for S(R/Z)). Suppose that f ∈ S(R/Z). Then X f(α) = fb(n)e2πinα. n∈Z

Proof. We will use Lemma 4.6 and will deal with the case α = 0 for notational simplicity (the general case may, as before, be deduced from this rather easily). Define, for N ∈ N, |n| g (n) := 1 − 1 . N N |n|6N By direct computation one may see that

−2πiθN 2 2 1 1 − e sin (πθN) gN (θ) = = . b N 1 − e−2πiθ N sin2(πθ) ◦ (A more conceptual way to establish this is to observe that gN = NhN ∗ hN , where −1 ◦ hN (n) := N 106n6N−1, the ∗ denotes convolution and hN (n) = hN (−n). The Fourier transform of gN is then just the absolute value of the square of bhN . We note that the gaussian g is also the convolution of a function with itself, that function being a gaussian of half the width.) Write sin2(πθN) KN (θ) := . N sin2(πθ) This is called the Fej´erkernel. From Lemma 4.6 we know that X Z 1/2 fb(n)gN (n) = f(θ)KN (θ) dθ. −1/2 n∈Z (Later on, it is notationally convenient to take the range of integration over [−1/2, 1/2]; since the integrand has period 1, this is permissible.) It is quite easy to see that X X lim fb(n)gN (n) = fb(n). N→∞ n∈Z n∈Z It remains, then, to show that Z 1/2 (4.4) lim f(θ)KN (θ) dθ = f(0). N→∞ −1/2 4.3. FOURIER ANALYSIS ON Z AND R/Z 23

Since for large N the mass of KN (θ) is concentrated near θ ≈ 0, so this is “intuitively clear”, in the same way it was in the proof of Proposition 4.1. The details are remarkably similar as well, and depend on the following properties of the Fej´er kernel, which again constitute a notion of “approximation to the identity”:

(i) KN (θ) > 0 R 1 (ii) 0 KN (θ) dθ = 1 R 1 (iii) KN (θ) dθ  . 1/2>|θ|>δ Nδ To prove (ii), one would do best to recall the definition of KN (θ) as gbN (θ), rather than attempt to integrate the closed form of KN directly. The proof of (iii) is straightforward using the fact that | sin t| > 2|t|/π for |t| 6 π/2. Indeed we have Z 1 Z 1/2 1 1 K (θ) dθ  dθ  . N N |θ|2 δN 1/2>|θ|>δ δ Returning to the proof of (4.4), observe that from (ii) we have Z 1/2 Z 1/2 f(θ)KN (θ)dθ − f(0) = (f(θ) − f(0))KN (θ)dθ. −1/2 −1/2 1 Split the range of integration into two ranges, |θ| 6 δ and δ < |θ| 6 2 . The integral over the first range is bounded by Z 1/2 sup |f(θ) − f(0)| |KN (θ)|dθ |θ|6δ −1/2 which, by (i) and (ii), is at most

(4.5) sup |f(θ) − f(0)|. |θ|6δ The integral over the second range is bounded by Z 2kfk∞ |KN (θ)|dθ 1 δ<|θ|6 2 which, by (iii), is at most 1 (4.6)  kfk . ∞ Nδ Taking N = d1/δ2e (say) and letting δ → 0, we see that both (4.5) and (4.6) tend to 0. This concludes the proof of Theorem 4.4. We conclude this section by stating, as a corollary, the only result concerning Fourier series that is actually needed in the course (in the proof of the Poisson summation formula).

Corollary 4.1 (Uniqueness for Fourier series). Suppose that f1, f2 ∈ S(R/Z) are two functions with fb1(n) = fb2(n) for all n ∈ Z. Then f1 ≡ f2 identically. Proof. This is absolutely immediate from Proposition 4.4. 24 4. SOME FOURIER ANALYSIS

4.4. The Poisson summation formula

An important ingredient in the proof of the functional equation for the ζ-function is a version of the Poisson summation formula.

Lemma 4.7 (Poisson summation). Suppose that f ∈ S(R). Then X X f(n) = fb(2πn). n∈Z n∈Z Proof. Consider two functions F,G : R/Z → C, defined by X F (θ) := fb(2πn)e2πinθ n∈Z and X G(θ) := f(θ + k). k∈Z Due to our conditions on f it is straightforward to check that F and G are C∞- functions. We will compute their Fourier coefficients, the aim being to show that these are equal. On the one hand we have, for m ∈ Z, ! Z 1 X Fb(m) := fb(2πn)e2πinθ e−2πimθ dθ 0 n∈Z X Z 1 = fb(2πn) e2πi(n−m)θ dθ 0 n∈Z = fb(2πm).

On the other hand, we have ! Z 1 X Gb(m) := f(θ + k) e−2πimθ dθ 0 k∈Z ! Z 1 X = f(θ + k) e−2πim(θ+k) 0 k∈Z X Z k+1 = f(x)e−2πimx dx, k k∈Z which is also equal to fb(2πm). Thus indeed Fb(n) = Gb(n), which by Corollary 4.1 implies that F = G identically. In particular F (0) = G(0), which is exactly the Poisson summation formula. CHAPTER 5

The analytic continuation and functional equation

5.1. The Γ-function

Define the Γ-function by Z ∞ (5.1) Γ(z) = e−ttz−1 dt. 0 for Re z > 0. Observe that if Re z > 0 then, since |tz−1| = tRe z−1, the integral converges. In Proposition 5.1 below, we establish the basic facts about Γ. Before doing this we record a result from complex analysis connected with “Weierstrass products”. For a proof (not examinable) of this result, see Appendix B.

Proposition B.1 (Weierstrass product). Suppose that Ω is such that 0 ∈/ Ω P −2 and ρ∈Ω |ρ| < ∞ (where the sum over Ω is taken with multiplicity). Then the function Y z E (z) := (1 − )ez/ρ Ω ρ ρ∈Ω is well-defined, entire (i.e. holomorphic on the whole complex plane), and has zeros at Ω with the correct multiplicities and nowhere else.

Of course, the proposition seems eminently reasonable and if Ω were finite it would be trivial. The factors of ez/ρ ensure that the product converges; note that 2 (1 − w)ew ≈ 1 − w2 ≈ e−w for small w, and so one may guess that convergence of the product is intimiately tied to convergence of P ρ−2.

Proposition 5.1. We have the following basic facts about the Γ-function.

(i) Γ(z), as defined by (5.1), is holomorphic in Re z > 0. (ii) Γ(z) extends to a meromorphic function all of C, satisfying the functional equation zΓ(z) = Γ(z + 1). It has simple poles at z = 0, −1, −2,... and no other poles. (iii) Set Ω = {−1, −2, −3,... }. Then we have the Weierstrass formula ∞ 1 Y = zeγzE (z) = zeγz (1 + z/n)e−z/n Γ(z) Ω n=1

25 26 5. THE ANALYTIC CONTINUATION AND FUNCTIONAL EQUATION

for all complex z, where  1 1  γ := lim 1 + + ··· + − log n n→∞ 2 n is Euler’s constant. In particular, Γ(z) is never 0.

Pn −1 Remark. One can easily establish that γ exists by comparing the sum j=1 j R n with the integral 1 dx/x. We have γ = 0.577215665... Euler’s constant γ is rather mysterious, and in fact it is not even known to be irrational. R ∞ −t z−1 Proof. (i) We claim that the derivative of Γ is 0 e t log tdt (that is, the “obvious” choice based on differentiating under the integral). To prove this we must show that Z ∞ th − 1 lim e−ttz−1 − log tdt = 0. h→0 0 h Since the derivative of ts at s = 0 is log t, this follows by interchanging the order of the limit and the integral, provided this can be justified. Suppose that Re z = δ. By the dominated convergence theorem, it is enough to show that h −t z−1 t − 1 (5.2) e t sup − log t h |h|6δ/2 is integrable from 0 to ∞. Now (th − 1)/h is an analytic function of h (it has a removable singularity at h = 0). By the maximum modulus principle we have th − 1 th − 1 sup | | = sup | |  1 + tδ/2. h h δ |h|6δ/2 |h|=δ/2 Thus (5.2) is bounded by

−t δ−1 δ/2 δ e t (1 + t + | log t|).

This function is indeed integrable: as t → ∞, the e−t more than compensates for the other terms, and as t → 0 we are only integrating functions that behave like t−c for c < 1. (ii) Suppose first that Re z > 0. Integrating by parts we have, provided Re z > 0, Z ∞ z −t ∞ −t z−1 Γ(z + 1) = [−t e ]0 + z e t dt 0 = zΓ(z).

This relation may be used to extend Γ meromorphically to the entire complex plane. One begins by extending it to the domain Re z > −1 via the formula Γ(z) := Γ(z + 1)/z. One can then iterate, extending in turn to the domains Re z > −n, n = 2, 3,... . Note that in the process of extending to Re z > −1 we introduce a pole at z = 0. This propagates along so that we get simple poles at all negative 5.1. THE Γ-FUNCTION 27 integers n = −1, −2,... as well. Γ(z) is meromorphic, and the poles just described are the only ones. (iii) Define the function Γe by ∞ 1 Y (5.3) := zeγz (1 + z/n)e−z/n. Γ(e z) n=1 By Proposition B.1, 1 is entire and has zeros only at 0, −1, −2,... . Therefore Γ is Γ˜ e meromorphic on C (with poles at 0, −1, −2,... ). Our aim, of course, is to show that Γ(z) = Γ(˜ z) for all z ∈ C. By the identity principle and the fact that both Γ and Γ˜ are meromorphic, it suffices to establish this when Re z > 0. Consider, for positive integer m, the function

m fm(t) := (1 − t/m) 1[0,m](t).

−t −x Note that fm(t) 6 e for all m (since 1 − x 6 e when x > 0) and that −t limm→∞ fm(t) = e for every t. By the dominated convergence theorem it follows that if Re z > 0 then Z ∞ Z ∞ z−1 −t z−1 (5.4) lim fm(t)t dt = e t dt = Γ(z). m→∞ 0 0 On the other hand one may verify, by repeated integration by parts, the formula Z ∞ z z−1 m m! (5.5) fm(t)t dt = 0 z(z + 1) ... (z + m) (first make the substitution t = mu to write the integral as Z 1 mz (1 − u)muz−1 du.) 0 Thus, from (5.3) and some rearrangement we have Z ∞ ∞ z−1 (γ−1− 1 −···− 1 +log m)z Y −z/n fm(t)t dt = Γ(˜ z)e 2 m (1 + z/n)e . 0 n=m+1 Since the infinite product converges for every z, it follows from this and the defini- tion of γ that Z ∞ z−1 lim fm(t)t dt = Γ(˜ z). m→∞ 0 Comparing this with (5.4) tells us that indeed Γ(z) = Γ(˜ z) for Re z > 0.

We note that, as an easy deduction from (ii), we have Γ(k + 1) = k! for all positive integers k. 28 5. THE ANALYTIC CONTINUATION AND FUNCTIONAL EQUATION

5.2. Statement of the functional equation for ζ

Define the completed ζ-function by

Ξ(s) = γ(s)ζ(s), where the gamma factor γ(s) is defined to equal π−s/2Γ(s/2). Our objective in this section is to prove the following theorem, the functional equation for ζ.

Theorem 5.1. The completed ζ-function Ξ is meromorphic in C, and its only poles are simple ones at s = 0 and 1. It satisfies the functional equation

Ξ(s) = Ξ(1 − s).

As a corollary we obtain a the first part of Theorem 3.1, the main result of this chapter about the ζ-function.

Corollary 5.1. The ζ-function has a meromorphic continuation to all of C. It has a simple pole at s = 1 and no other poles. It has zeros at s = −2, −4, −6,... and no other zeros outside of the critical strip 0 6 Re s 6 1.

Proof. The meromorphic continuation is obvious, since ζ(s) = πs/2(Γ(s/2))−1Ξ(s) and the three functions here are all meromorphic. The statements about are a consequence of the following information, which we have assembled at various earlier points of the course: • Ξ(s) = Ξ(1 − s) and the only poles of Ξ are simple ones at s = 0 and s = 1; • Γ has no zeros and simple poles at 0, −1, −2,... ; • ζ has no zeros with Re s > 1.

The set of all zeros of ζ will be denoted by Z. We further divide this set into the set of trivial zeros Ztriv, by which we mean −2, −4, −6,... , and nontrivial zeros

Znontriv, by which we mean all the other zeros. By the above corollary every zero in Znontriv lies in the critical strip 0 6 Re s 6 1. Note that, with our knowledge at this point, Znontriv could be empty. In fact it is infinite and on can obtain quite good control on the number of nontrivial zeros with imaginary part at most T . We will return to this later.

5.3. Proof of the functional equation

We turn now to the proof of Theorem 5.1. A key tool here will be the θ-function. 5.3. PROOF OF THE FUNCTIONAL EQUATION 29

We will apply Lemma 4.7 to derive a functional equation for the θ-function. Let us now define that function. For z in the upper half-plane H := {z : =z > 0} set

X 2 θ(z) := eiπn z. n∈Z It is quite easy to see that θ is analytic in H.

5.1 Suppose that t ∈ +. Then θ(it) = √1 θ(i/t). Lemma . R t

Remark which may be of interest to some: Then this functional equation for θ may √ be analytically extended to all of H to give something like θ(−1/z) = zθ(z). We have been deliberately vague about the branch of square root to be taken here. In conjunction with the easily verified relation θ(z + 2) = θ(z), this means that the function θ(2z) is a modular form of weight 1/2 for the group Γ0(4) of matrices ! a b with a, b, c, d ∈ Z, ad − bc = 1 and 4|c. c d

2 Proof. Suppose that t ∈ R+, and set f(x) = e−πx t. We calculated (by direct calculation) the Fourier transform fˆ(λ) in Lemma 4.4; indeed

1 2 fb(ξ) = √ e−ξ /4πt. t Lemma 5.1 is simply a matter of applying the Poisson summation formula in this case.

The next lemma relates the ζ-function to the θ-function. It turns out that the ζ-function is, roughly speaking, the Mellin transform of θ (we will see Mellin trans- forms again later).

Lemma 5.2. Suppose that Re s > 1. Then Z ∞ θ(ix) − 1 dx Ξ(s) = xs/2 . 0 2 x

Proof. Observe that ∞ ∞ Z 2 dx Z du e−πn xxs/2 = π−s/2n−s e−uus/2 = π−s/2Γ(s/2)n−s. 0 x 0 u Now simply sum over n ∈ N.

Proof of Theorem 5.1. Suppose that Re s > 1. We split the integral in Lemma 5.2 into the ranges [0, 1] and [1, ∞), and then apply Lemma 5.1 to the first of these. 30 5. THE ANALYTIC CONTINUATION AND FUNCTIONAL EQUATION

We have Z 1 θ(ix) − 1 dx 1 Z 1 dx 1 xs/2 = θ(ix)xs/2 − 0 2 x 2 0 x s 1 Z 1 i dx 1 = θ( )x(s−1)/2 − 2 0 x x s 1 Z ∞ du 1 = θ(iu)u(1−s)/2 − 2 1 u s Z ∞ θ(iu) − 1 du 1 1 = u(1−s)/2 − − . 1 2 u s 1 − s Thus Z ∞ θ(ix) − 1 dx Z ∞ θ(ix) − 1 dx 1 1 xs/2 = (xs/2 + x(1−s)/2) − − , 0 2 x 1 2 x s 1 − s which implies that Z ∞ θ(ix) − 1 dx 1 1 Ξ(s) = (xs/2 + x(1−s)/2) − − . 1 2 x s 1 − s A priori this formula is valid only for Re s > 1. Note, however, that the right-hand side defines a function which is meromorphic in the whole complex plane, with simple poles at s = 0 and 1, and which manifestly satisfies the claimed relation Ξ(s) = Ξ(1 − s). CHAPTER 6

The partial fraction expansion

6.1. Statement of the partial fraction expansion

In prime number theory, the ζ-function itself is less important than the loga- rithmic derivative ζ0/ζ. This is, of course, because ζ0/ζ is the Dirichlet series of the 0 P −s von Mangoldt function Λ, that is to say ζ (s)/ζ(s) = n Λ(n)n . The heart of the link between primes and zeros of ζ is the partial fraction expansion of ζ0/ζ.

Proposition 6.1. There is a constant C such that ζ0(s) 1 X 1 1 = C − + + . ζ(s) s − 1 s − ρ ρ ρ∈Z

Here sum is over all zeros Z = Ztriv ∪ Znontriv of ζ: the trivial zeros Ztriv := {−2, −4, −6,... } and the nontrivial zeros Znontriv in the critical strip 0 6 Re s 6 1.

The constant C can be computed explicitly, but this is not important for most √ applications; in fact eC = π 2/e. Proposition 6.1 is a consequence of a product expansion (Theorem 6.1) and repeated use of the fact that (fg)0 f 0 g = + . fg f g0 In fact, we need some version of this for infinite products; the justification of this is an exercise on sheet 4.

Theorem 6.1 (Hadamard Product for ζ). There are constants A and B (which can be evaluated explicitly) such that Y s (s − 1)ζ(s) = eAs+B (1 − )es/ρ, ρ ρ∈Z P −2 where again Z denotes both trivial and nontrivial zeros of ζ. We have ρ∈Z |ρ| < ∞, so the product here is a Weierstrass product and converges as described in Proposition B.1.

Theorem 6.1 is quite reminiscent of the well-known fact that if f : C → C is a Q polynomial then f(z) = C ρ(z−ρ), where the product is over the (finite) collection

31 32 6. THE PARTIAL FRACTION EXPANSION of zeros of f. Of course, (s−1)ζ(s) is not a polynomial. However, Theorem 6.1 may still be obtained as a consequence of a quite general theorem about entire functions, rather than any particular specific properties of ζ. We state such a theorem now.

Definition 6.1. Suppose that f : C → C is an entire function satisfying a 1+ε Cε|z| growth condition |f(z)| ε e for all ε > 0. Then we say that f is an integral function of order 1.

Proposition 6.2. Suppose that f is an integral function of order 1. Suppose that f has a zero of order r at 0, and write Ω for the set of other zeros of f. Then P −1−ε ρ∈Ω |ρ| < ∞ and there are constants A and B such that Y z f(z) = zreAz+B (1 − )ez/ρ. ρ ρ Proposition 6.2 is, as we said, a result of complex analysis and has no arithmetic content. The proof is given in Sections 6.3 and 6.4.

6.2. The product formula for ζ

In this section we will deduce Theorem 6.1, the Hadamard product for ζ, from Proposition 6.2, the product expansion for integral functions of order one. Rather than prove that (s − 1)ζ(s) is an integral function of order one, it is convenient to first include some Γ- and other factors.

Lemma 6.1. The function f(s) = s(1 − s)π−s/2Γ(s/2)ζ(s) = s(1 − s)Ξ(s) is an integral function of order one.

Proof. We have already noted that Ξ is meromorphic except for simple poles at s = 0, 1, so f is entire. Therefore we need only establish a growth condition of the form 1+ε |f(s)|  eCε|s| for all ε > 0. In fact we will establish the stronger bound

|f(s)|  exp(O(|s| log |s|)).

Moreover, from the functional equation we have f(s) = f(1−s), and so it is enough 1 1 to prove this when Re s > 2 . When Re s > 2 we have the integral representation s Z ∞ ζ(s) = − s {x}x−s−1 dx. s − 1 1 that we found in Proposition 3.2. This in fact implies that |(s − 1)ζ(s)| = O(|s|2) in this domain. For the Γ-factor we also have the integral representation Z ∞ Γ(s/2) = e−tts/2−1dt, 0 6.3. THE SIZE OF A HOLOMORPHIC FUNCTION AND ITS ZEROS 33 so 1 1 |Γ(s/2)| Γ( Re s) d Re s + 1e! = exp(O(|s| log |s|)) 6 2 6 2 This final bound follows from the fact that X!  exp(O(X log X)), which in turn follows from the estimate P log n ∼ X log X. This may be established by n6X R X comparison with the integral 1 log tdt. See Exercises 1, Q8 for a more precise result. It follows from Proposition 6.2 that Y s (6.1) f(s) = eAs+B (1 − )es/ρ, ρ ρ where the product is over the zeros ρ of f. In making this claim we have observed that f(0) 6= 0; the simple zero of s cancels the simple pole of Γ(s/2), and ζ(0) 6= 0 (the calculation of its exact value being an exercise on sheet 3). The zeros ρ satisfy P −2 ρ |ρ| < ∞. Note that these zeros ρ are precisely the nontrivial zeros Znontriv 1 of ζ(s), namely those in the critical strip 0 6 Re s 6 1, because the factor of Γ( 2 s) cancels out the trivial zeros of ζ at −2, −4, −6,... . It is clear that P |ρ|−2 < ρ∈Ztriv P −2 ∞, so ρ∈Z |ρ| < ∞. Of course, πs/2 1 (s − 1)ζ(s) = − f(s). s Γ(s/2) We may now recover the claimed product formula for (s − 1)ζ(s) from (6.1) and the Weierstrass product for 1/Γ, that is to say Proposition 5.1 (iii).

6.3. The size of a holomorphic function and its zeros

To prove Proposition 6.2 we need various facts about the relation between the size of a holomorphic function and the number of its zeros. We collect the facts we need in this section.

Write BR for the domain |z| < R.

Theorem 6.2 (Jensen’s formula). Let R,  > 0. Suppose that f is holomorphic on BR+, and that f(z) 6= 0 for R 6 |z| < R +  and for z = 0. Then Z 1 X R (6.2) log |f(Re2πiθ)| dθ = log |f(0)| + log , |ρ| 0 ρ where the sum is over zeros ρ of f in BR, counted with multiplicity.

Proof. Observe that if the identity is true for functions f1 and f2 then it is also true for f1f2. Write R(z − ρ) g (z) = ρ R2 − ρz 34 6. THE PARTIAL FRACTION EXPANSION and define a meromorphic function F by Y f(z) = CF (z) gρ(z), ρ where C is chosen so that F (0) = 1. If  is chosen so small that the poles z = R2/ρ lie outside of BR+ then F has no zeros in BR+. Jensen’s formula being manifestly true for constant functions, it suffices to check it for F and for the functions gρ.

Now we may define a single-valued, holomorphic logarithm of F in BR+ by the formula Z F 0(w) log F (z) := dw. [0→z] F (w) −1 Since F (0) = 1 the function z log F (z) is also holomorphic in BR+. Hence by Cauchy’s theorem we have Z log F (z) dz = 0. ∂BR z 2πiθ Parametrising the circle ∂BR by z = Re , 0 6 θ < 1 we get Z 1 log F (Re2πiθ) dθ = 0. 0 Taking real parts gives Z 1 log |F (Re2πiθ)| dθ = 0. 0 This is one side of (6.2); the other side is clearly zero. Thus we have verified the formula for F .

Turning our attention to the functions gρ, note that if |z| = R then

z(z − ρ) |gρ(z)| = = 1. R2 − ρz Thus the left-hand side of (6.2) equals 0. As for the right-hand side, note that

|gρ(0)| = |ρ|/R. Thus the right-hand side is zero as well.

The next result applies Jensen’s formula to get a fairly tight relation between the size of a holomorphic function and the number of its zeros.

Corollary 6.1. Let f be an entire function with f(0) = 1. Then for any R the number of zeros ρ of f with |ρ| < R is at most 2 sup log |f(z)|. |z|63R

Proof. Pick a radius R0 ∈ [2R, 3R] such that f(z) 6= 0 whenever |z| = R0. This is possible since otherwise f would have infinitely many zeros in the compact set |z| 6 3R, contrary to the identity principle. Jensen’s formula immediately gives Z 1 X R0 log = log |f(R e2πiθ)|dθ sup log |f(z)| sup log |f(z)|, |ρ| 0 6 6 ρ 0 |z|=R0 |z|63R 6.3. THE SIZE OF A HOLOMORPHIC FUNCTION AND ITS ZEROS 35

R0 where the sum is over all zeros ρ inside BR0 . Each term log |ρ| is positive, and the 1 terms corresponding to zeros ρ with |ρ| < R contribute at least log 2 > 2 .

A corollary of this and some of our earlier estimates is a bound for the number of zeros of ζ.

Proposition 6.3. The number of nontrivial zeros of ρ of the Riemann ζ- function with imaginary part at most T is O(T log T ).

Proof. Consider the function f(s) = s(1−s)π−s/2Γ(s/2)ζ(s), which vanishes at all nontrivial zeros of ζ. We showed earlier in the chapter that f is entire and satisfies the growth condition |f(s)|  eO(|s| log |s|). It follows from Corollary 6.1 that the number of nontrivial zeros of ζ with |ρ| 6 T + 1 is O(T log T ). If ρ is nontrivial and |=ρ| 6 T then |ρ| 6 T + 1, and so the result follows.

The next result does not use Jensen’s formula. It tells us the structure of entire functions of moderate growth with no zeros.

Lemma 6.2. Suppose that g is an entire function with no zeros which satisfies 3/2 the bound |g(z)| = exp(O(|z| )) whenever |z| = Rj, j = 1, 2,... , where Rj → ∞ as j → ∞. Then g(z) = eAz+B for some constants A, B.

Proof. Multiplying through by a constant, we may assume that g(0) = 1. Since g is nonvanishing, it has a holomorphic branch of logarithm h(z) := log g(z) with h(0) = 0, exactly as in the proof of Jensen’s formula. We cannot immediately 3/2 3/2 conclude that |h(z)|  |z| = Rj for |z| = Rj, but it does at least follow that 3/2 Re h(z) = log |g(z)| 6 CRj for some absolute constant C, and hence 3/2 | Re h(z)| 6 2CRj − Re h(z)

(consider the cases Re h(z) > 0 and Re h(z) 6 0 separately). It follows that Z 1 Z 1 2πiθ 3/2 2πiθ 3/2 (6.3) | Re h(Rje )|dθ 6 2CRj − Re h(Rje )dθ  Rj , 0 0 h(z) using the fact that z is holomorphic, as in the proof of Jensen’s formula. P∞ n Now if the Taylor expansion of h(z) is n=0 cnz then we have

2πiθ X n 2πinθ X n −2πinθ 2 Re(h(Rje )) = cnRj e + cnRj e . n>0 n>0 By orthogonality it follows that Z 1 1 −m 2πiθ −2πimθ cm = Rj Re(h(Rje ))e dθ, 2 0 36 6. THE PARTIAL FRACTION EXPANSION and so by the triangle inequality Z 1 −m 1 2πiθ |cm| 6 Rj | Re h(Rje )|dθ. 2 0 By (6.3) it follows that 3/2−m |cm|  Rj .

Letting j → ∞, this implies that c2 = c3 = ··· = 0, and so h(z) = c0 +c1z is linear. This concludes the proof.

6.4. Weierstrass product of an integral function of order one: proof

The aim of this section is to give a detailed proof of Proposition 6.2. The reader should start by recall the statement of Proposition B.1, which de- scribes the convergence properties of Weierstrass products EΩ(z). From the growth condition on f and Corollary 6.1 it follows very easily that

1+ε (6.4) #{ρ : R 6 |ρ| 6 2R} ε R , P −2 for every ε > 0. In particular (taking any ε < 1) we see that ρ6=0 |ρ| < ∞. Therefore we may apply Proposition B.1 with Ω being the set of zeros of f other than

0, obtaining an entire function EΩ(z) which vanishes (with the correct multiplicity) precisely at Ω. If f has a zero of order r at z = 0, then we define f(z) g(z) := r . z EΩ(z) By construction, g is an entire function of z with no zeros. To complete the proof of Proposition 6.2, we need only show that g(z) = eAz+B. Moreover we already have a tool for doing precisely this, namely Lemma 6.2. Applying this lemma, we see that all we need do is establish an upper bound

CR3/2 |g(z)| 6 e j for |z| = Rj, for some sequence of radii Rj → ∞. Since we already have a bound on f (by assumption) and z−r decays quickly, it is enough to establish the lower bound

−C0R3/2 (6.5) |EΩ(z)| > e j for |z| = Rj. This is slightly delicate, and must involve a careful choice of the Rj, since EΩ(z) vanishes at the points of Ω. In the light of this, it obviously makes sense to choose the radii Rj to lie away from any of the zeros ρ. j j+1 We will show that for every j we can choose an Rj ∈ [2 , 2 ] such that (6.5) j j+1 is satisfied. Write Sj for the number of zeros with 2 6 |ρ| 6 2 . Then, by (6.4) 6.4. WEIERSTRASS PRODUCT OF AN INTEGRAL FUNCTION OF ORDER ONE: PROOF37

2j with ε = 1, Sj  2 . It follows easily that Rj may be chosen in such a way that

(6.6) |z − ρ|  2−j whenever |z| = Rj, for all zeros ρ. Suppose from now on that |z| = Rj.

To get a lower bound on EΩ(z), we divide the product into dyadic subproducts

(j0) Y z/ρ EΩ (z) := (1 − z/ρ)e ,

ρ∈Sj0 0 j = 0, 1, 2,... . The contribution from |ρ| 6 1 is clearly bounded below, indepen- dently of R, by some constant (which may depend on f). 0 0 (j ) Now if j < j − 10 (say) then, for any zero ρ occurring in the product EΩ (z), 0 z/ρ −2j−j we have |1 − z/ρ| > 1. Furthermore |e | > e , and so

0 j−j0 j j0/10 (j ) −2 S j0 −C2 2 |EΩ (z)| > (e ) 2 > e (say), the last bound here being a consequence of (6.4) with ε := 1/10. 0 −2j If j−10 < j < j+10 then we employ the bound |1−z/ρ| > c2 , a consequence of (6.6). For any such j0 we have |ez/ρ|  1, and so

0 3j/2 (j ) −2j S j0 −C2 |EΩ (z)|  (c2 ) 2  e ,

1 this last bound following from (6.4) with any value of ε < 2 . 0 Finally, suppose that j > j + 10. By a crude Taylor series expansion one has 2 |(1 − w)ew| > e−10|w| for |w| < 1/10. Thus, for these values of j,

0 2j−2j0 2j −j0/2 (j ) −10·2 S j0 −C2 2 |EΩ (z)| > (e ) 2 > e ,

1 applying (6.4) with ε = 2 . (j0) It is a simple matter to check that the product of all these estimates for EΩ (z), 3j/2 over all j0, is bounded below by e−C2 , for some absolute constant C. This concludes the proof.

CHAPTER 7

Mellin transforms and the explicit formula

In this section we will take steps towards clarifying the relationship between primes and the zeros of the Riemann ζ-function, by proving the so-called explicit formula.

7.1. Definitions and statement of the formula

Recall that the von Mangoldt function Λ and it is defined by ( log p if n = pm is a prime power Λ(n) := 0 otherwise. The fact that Λ(n) 6= 0 for prime powers pm as well as for the primes themselves is almost never more than a slight annoyance, since the prime powers are so sparse. The prime number theorem asserts the ψ(X) ∼ X, where X ψ(X) := Λ(n). n6X Another way of writing this is X n ψ(X) = Λ(n)W ( ), X n where W : R → R is 1[0,1], the function defined to equal 1 on [0, 1], and zero else- where. The explicit formula is a formula for sums like this in terms of the zeros of ζ. However, we will only prove the formula when W is smooth, that is to say infinitely differentiable, and also compactly supported (that is, zero outside some closed interval). This means that in order to apply it to the prime number theorem we must approximate 1[0,1] by smooth, compactly supported functions. This is not especially difficult, but also not entirely straightforward – indeed, even the existence of a smooth compactly supported function other than 0 is not immediately obvi- ous. (The smoothness assumption can be relaxed to some condition involving only finitely many derivatives of W , if desired, and the compact support condition can also be relaxed to condition involving sufficiently rapid decay of W and sufficiently many of its derivatives. We will include an exercise about this.) The explicit formula involves the Mellin transform. This was discussed briefly in Chapter 4. We reintroduce it now: it is convenient to denote it with a tilde

39 40 7. MELLIN TRANSFORMS AND THE EXPLICIT FORMULA rather than a hat, as we will require the Fourier transform on R in our arguments. Moreover, the Mellin transform as defined earlier had, as its domain, the imaginary axis {it : t ∈ C}. Now we will extend the domain of definition to the whole complex plane, which takes us a little outside the scope of the discussion in Chapter 4.

Definition 7.1 (Mellin Transform). Suppose that W : R → R is has compact support contained in (0, ∞). For any complex number s we define the Mellin transform W˜ by Z ∞ dx W˜ (s) := W (x)xs . 0 x The assumption that W has compact support bounded away from 0 means that W is well-defined and entire, the derivative of W being obtained by differentiating under the integral.

Theorem 7.1 (Explicit formula). Let W be a smooth, compactly supported func- tion, supported on [1, ∞). Suppose that X > 1. Then X n Z X Λ(n)W ( ) = X W  − XρW˜ (ρ). X n ρ∈Z

The sum here is over the set Z = Ztriv ∪ Znontriv of all zeros of ζ.

Remarks. Since W˜ (1) = R W , one could if desired write the right-hand side as

X w − ordζ (w)X W˜ (w), w∈C where ordζ (w) is the order of ζ at w, that is to say r if w is a zero of order r, and −r if w is a pole of order r. The assumption that W be compactly supported is very convenient for the proof, for instance because in this case it is clear that W˜ (s) is holomorphic. However, Theorem 7.1 does hold under weaker conditions. We will not discuss this here. A particularly easy variant is to relax the support condition to Supp(W ) ⊂ [c, ∞). Under this condition, the explicit formula holds for X > 1/c. It is an easy exercise (left to the reader) to deduce this statement from Theorem 7.1. In some ways, W is most naturally thought of as a function on the multiplicative positive real line R+. To this end, we associate a function w : R+ → R via w := W ◦ exp, that is to say w(u) := W (eu). Saying Supp(W ) is a compact subset of (0, ∞) is then equivalent to saying that w is compactly supported, whilst the condition Supp(W ) ⊂ [1, ∞) is equivalent to Supp(w) ⊂ [0, ∞). One may note that Z ∞ dx W˜ (s) = W (x)x−s =w ˆ(is), 0 x 7.2. PROOF OF THE EXPLICIT FORMULA – OVERVIEW 41 wherew ˆ denotes the Fourier transform of w (but allowed to take arguments in C, not just in R). The explicit formula can be written in terms of w as X Z ∞ X Λ(n)w(log n − log X) = X w(log x)dx − Xρwˆ(iρ). n 0 ρ∈Z Before turning to the proof of the explicit formula, let is make some remarks about its form. The term X(R W ) is the “main term”; one would expect the sum on the left to equal roughly this, at least if one assumes the prime number theorem, P n R since Λ has average value 1 and n W ( X ) ≈ X( W ). The remaining terms P ˜ ρ∈Z W (ρ) are, therefore, to be thought of as “error terms”. It turns out that the contribution from ρ ∈ Z is negligible, leaving the sum P XRe ρW˜ (ρ). triv ρ∈Znontriv Let us imagine, for the moment (and we will prove various rigorous assertions about this in the next section) that W˜ (ρ) is on the order of 1. The contribution of this term rather depends on what we know about the location of the nontrivial zeros

Znontriv; at the moment, all we know is that they lie in 0 6 Re ρ 6 1. For all we know, it could be that some zero ρ lies on the line Re s = 1, in which case one would expect the term XRe ρW˜ (ρ) to have the same order of magnitude, O(X), as the main term. Thus, once one has the explicit formula, further progress on the prime number theorem depends on pinning down further information about the nontrivial zeros.

7.2. Proof of the explicit formula – overview

In this section we outline the proof of the explicit formula, leaving some technical details to the next section. A key ingredient is the inversion formula for the Mellin transform, which is basically equivalent to the Fourier inversion formula.

Proposition 7.1. Let W : R → R be a smooth function with compact support contained in (0, ∞). Then for any σ ∈ R we have the Mellin inversion formula 1 Z σ+i∞ W (x) = W˜ (s)x−s ds. 2πi σ−i∞ Proof. Suppose that s = σ + it. Then

(7.1) W˜ (s) = w\expσ(−t), where w is defined as before, and now the Fourier transform takes a real argument −t. Here, exp is the exponential function, so expσ(x) := eσx. The function w expσ is of course compactly supported and smooth, so we may apply the Fourier inversion formula to get 1 Z ∞ 1 Z ∞ w(u) = e−σu w\expσ(−t)e−itudt = e−σu W˜ (s)e−itudt, 2π −∞ 2π −∞ 42 7. MELLIN TRANSFORMS AND THE EXPLICIT FORMULA which is equivalent to the stated formula if one substitutes u = log x.

Let us recall that we know how to obtain decay estimates for the Fourier trans- form of smooth functions: see for example Lemma 4.1. It is therefore not surprising that (7.1) will be useful in its own right for obtaining corresponding estimates for the Mellin transform. See Proposition 7.1 below for details. Now we can outline the proof of the explicit formula as stated in Theorem 7.1. The proof begins with the Dirichlet series X ζ0(s) Λ(n)n−s = − , ζ(s) n valid for Re s > 1. We established this earlier in the course. By the Mellin inversion formula, Proposition 7.1 and this Dirichlet series expan- sion we have X n X 1 Z σ+i∞ Λ(n)W ( ) = Λ(n) W˜ (s)(n/X)−s ds X 2πi n n σ−i∞ 1 Z σ+i∞ X = Λ(n)n−sXsW˜ (s) ds 2πi σ−i∞ n 1 Z σ+i∞ ζ0(s) (7.2) = − XsW˜ (s) ds, 2πi σ−i∞ ζ(s) the interchange in the order of integration and summation being justified if σ > 1. Take σ = 2 for definiteness. We now reach the heart of the argument. The aim is to move the contour of integration far to the left, picking up residues due to the poles of ζ0/ζ at s = 1, at the non-trivial zeros ρ ∈ Znontriv in the critical strip, and at the trivial zeros

Ztriv = {−2, −4, −6,... }. To this end we note that ζ0 Res = −1, s=−1 ζ ζ0 Res = 1, s=−2k ζ ζ0 and Ress=ρ ζ is equal to the multiplicity of the zero ρ. To perform this contour integration rigorously we employ a rectangular contour (1) (2) (3) (4) (1) (2) Ck,T = CT ∪Ck,T ∪Ck,T ∪Ck,T , where CT = [2−iT, 2+iT ], Ck,T = [2+iT, −2k− (3) (4) 1 + iT ], Ck,T = [−2k − 1 + iT, −2k − 1 − iT ] and Ck,T = [−2k − 1 − iT, 2 + iT ], and then choose suitable values of k, T tending to infinity. Here k is an integer; (3) 0 with this choice, Ck,T avoids the poles of ζ /ζ at the trivial zeros Ztriv. The choice of T is rather more delicate and in particular we need to make sure that neither (2) (4) 0 Ck,T nor Ck,T pass close to any of the poles of ζ /ζ at the nontrivial zeros Znontriv. 7.3. ESTIMATES FOR THE MELLIN TRANSFORMS. 43

Later on, we will choose a sequence T1,T2,... of “good” values of T such that, in (2) (4) particular, neither Ck,T nor Ck,T contain any zeros of ζ. For j = 1, 2, 3, 4 define

Z 0 (j) 1 ζ (s) s ˜ (7.3) Ik,T := − X W (s) ds. 2πi (j) ζ(s) Ck,T P4 (j) To calculate j=1 Ik,T we apply the residue theorem and the preceding remarks, obtaining

4 k X (j) ˜ X ρ ˜ X −2j ˜ (7.4) Ik,T = XW (1) − X W (ρ) − X W (−2j). j=1 ρ∈Znontriv j=1 |ρ|

(1) X n (7.5) lim lim I = Λ(n)W ( ) k,Tj k→∞ Tj →∞ X n as well as

(7.6) lim lim I(2) ,I(4) = 0 k,Tj k,Tj k→∞ Tj →∞ and

(7.7) lim lim I(3) = 0. k,Tj k→∞ Tj →∞

We will prove these statements (as well as define the sequence Tj) in the next section.

7.3. Estimates for the Mellin transforms.

The following lemma gives a decay estimate for the Mellin transform of a smooth function. In this lemma, we have obeyed the usual cultural norm, which is that s = σ + it denotes a general element of the complex plane, while ρ denotes a zero of ζ which, if ρ is nontrivial, we write as β + iγ.

Lemma 7.1 (Mellin decay properties). Suppose that W : R+ → R is smooth and has compact support contained in [1, ∞). Then 44 7. MELLIN TRANSFORMS AND THE EXPLICIT FORMULA

(i) Suppose that −∞ < σ 6 2. Then we have

(7.8) |W˜ (σ + it)| W 1

and, for any integer m > 0,

m −m (7.9) |W˜ (σ + it)| m,W (1 + |σ|) |t|

uniformly for |t| > 1. (ii) Suppose that Supp W ⊂ [1, 10]. We have the bounds

j −m (7.10) |W˜ (ρ)| m sup k∂ W k1|ρ| 06j6m

whenever ρ ∈ Znontriv,|ρ| > 2, and

(7.11) |W˜ (ρ)|  kW k1

uniformly for all ρ ∈ Z = Ztriv ∪ Znontriv.

Proof. We remark that we have explicitly formulated this result in parts with future applications in mind. Part (i) is the bound we will use in proving the explicit formula, whereas (ii) are the bounds we will use in applying the explicit formula, where it is important to understand the penalty one must pay when approximating a rough cutoff by a smooth function W . The proof of all parts uses (7.1), that is to say

(7.12) W˜ (σ + it) = w\expσ(−t), where w := W ◦ exp. We also use Lemma 4.1, which states that

ˆ −m m (7.13) |f(t)| 6 |t| k∂ fk1 for any f ∈ S(R). For (7.8) and (7.11), we use the case m = 0 of (7.13). This gives Z ∞ ˜ σu σ W (σ + it) 6 |w(u)|e du 6 max(Supp(W )) kwk1. 0 The bound (7.8) follows immediately from this. The bound (7.11) follows from this and the additional observation that Z Z Z W (x) kwk = w(u)du = W (eu)du = dx kW k . 1 x 6 1 For the other statements, we use (7.13) with m > 1, which requires us to estimate the derivatives. By Leibniz’s rule and the triangle inequality,

m σ j m−j σ k∂ (w exp )k1 m sup k(∂ w)(∂ exp )k1 06j6m j m−j σ = sup k(∂ w)σ exp )k1 06j6m 6 E1E2E3, 7.4. BOUNDS FOR ζ0/ζ AT JUDICUOUSLY CHOSEN POINTS 45 where

m−j σu j E1 := sup |σ| ,E2 := sup e ,E3 := sup k∂ wk1. 06j6m u∈Supp w 06j6m m For (7.9), we use the bounds E1  (1 + |σ|) , E2 W 1 and E3 W,m 1. For (7.10), set ρ = β + iγ, where 0 6 β 6 1, and note that |ρ|  |γ|; therefore it suffices to get bounds in terms of |γ|−m (which is what (7.13) gives) rather than −m |ρ| . In the definition of E1,E2,E3, σ must be replaced by β. We have the bounds E1 m 1, which holds uniformly for 0 6 β 6 1, and E2  1, which again holds uniformly for 0 6 β 6 1 since Supp(w) ⊂ [0, 3]. To complete the proof, we must bound E3 by showing that

j j (7.14) sup k∂ wk1 m sup k∂ W k1. 06j6m 06j6m To see this, observe that by Leibniz’s rule ∂jw(u) is a sum of terms of the form j0u j0 u 0 e ∂ W (e ) for j 6 j. But Z Z j0u j0 u j0−1 j0 j0 e ∂ W (e )du = x ∂ W (x)dx j0 k∂ W k1, where in the last step use was made of the support property Supp(W ) ⊂ [1, 10].

7.4. Bounds for ζ0/ζ at judicuously chosen points

Now it is time to start thinking about how to choose the ordinates Tj in such a way that the poles of ζ0(s)/ζ(s) do not interfere with the horizontal contours C(2) ,C(4) . Very crude estimates suffice, ultimately because we are dealing with k,Tj k,Tj smooth functions W which enjoy very advantageous Mellin decay properties.

Lemma 7.2. There is a sequence of T tending to infinity such that dist(s, Z) > 1/T uniformly for all s with =s = T .

Proof. If T > 1 then certainly dist(s, Ztriv) > 1 and so it suffices to handle the nontrivial zeros. Let U be large, and suppose that for all T with U 6 T 6 2U there 1 is some s = σ + iT with dist(s, Ztriv) 6 U . Let {T1,...,Tm} ⊂ [U, 2U] be a set 2 U 2 with |Ti − Tj| > U for all i 6= j. If U is large then there is such a set with m > 4 . 1 For each i there is some σi and some zero ρi so that |σi + iTi − ρi| 6 U . If i 6= j, 2 |=ρ − =ρ | |T − T | − > 0, i j > i j U and so ρi 6= ρj. All the zeros ρi satisfy |ρi| 6 3U, and so we obtain that the number 2 j of zeros of ζ with |ρ| 6 3U is  U . Summing over U = 2 , with j taking integer values, it follows that P |ρ|−2 diverges. This is contrary to what we established earlier. 46 7. MELLIN TRANSFORMS AND THE EXPLICIT FORMULA

We call a value of T for which the conclusion of Lemma 7.2 holds good. If T is 0 good then certainly there are no elements of Znontriv, and hence no poles of ζ /ζ, on the horizontal lines =s = ±T , and hence in particular no such poles on the (2) (4) contours Ck,T ,Ck,T . We actually need rather more than this, namely a reasonable upper bound for ζ0/ζ on these contours. That said, a fairly crude bound will do, and we provide such a bound in the next proposition.

Proposition 7.2. Suppose that |s| > 3. Then ζ0(s)  |s|3 dist(s, Z)−1, ζ(s) where Z is the set of all zeros of ζ.

Remark. This is a crude bound, and stronger ones could be obtained. For our 0 purposes, however, any bound of the form |s|C dist(s, Z)−C would be sufficient. Proof. We use the partial fraction expansion ζ0(s) 1 X  1 1 = B0 − + + . ζ(s) s − 1 s − ρ ρ ρ∈Z

The first two terms are clearly O(1) in the domain |s| > 3. Note that the right-hand side is  1, since dist(s, Z) > dist(s, −2) > 1. To bound the sum over the zeros in the partial fraction expansion, our main ingredient will be the estimate X (7.15) |ρ|−2 < ∞. ρ∈Z

We will split into the sum into two parts. Suppose first that |ρ| > 2|s|. Then 1 s 1 1 |s/ρ| 6 2 , and hence | ρ − 1| > 2 and hence |s − ρ| > 2 |ρ|. Hence X  1 1 X 1 X 1 (7.16) + |s| 2|s|  |s|, s − ρ ρ 6 |s − ρ||ρ| 6 |ρ|2 |ρ|>2|s| |ρ|>2|s| |ρ|>2|s| the last step of course following from (7.15). To estimate the contribution from the remaining zeros, those with |ρ| < 2|s|, we proceed extremely crudely. By (7.15) it follows immediately that the number of such zeros is O(|s|2). The contribution from a given one is |s|  |s| dist(s, Z)−1, |s − ρ||ρ| 1 where here we used the fact that |ρ|  1, which follows simply from the fact that 0 is not a zero of ζ. Therefore X  1 1 +  |s|3 dist(s, Z)−1, s − ρ ρ |ρ|<2|s| and the proposition follows. 7.4. BOUNDS FOR ζ0/ζ AT JUDICUOUSLY CHOSEN POINTS 47

Now we complete the proof of the explicit formula by verifying (7.5), (7.6) and (7.7). (1) Proof of (7.5). If s ∈ CT then Re s = 2, and so dist(s, Z) > 1. It follows from Proposition 7.2 that ζ0(2+it)/ζ(2+it)  |t|3. However, in this case a better bound is available via an easier route: we have ζ0(2 + it) X X (7.17) | | = | Λ(n)n−2−it| Λ(n)n−2 = O(1), ζ(2 + it) 6 n n uniformly in t. Recall that Z 2+iT 0 (1) 1 s ˜ ζ (s) Ik,T = − X W (s) ds. 2πi 2−iT ζ(s) (Note that this integral does not, in fact, depend on k.) Recall from (7.9) that we have the decay estimate |W˜ (2 + it)|  |t|−2. Putting this and (7.17) together gives Z 2+i∞ ζ0(s) Z ∞ X2 XsWf(s) ds  X2 |t|−2dt  . 2+iT ζ(s) T T Therefore Z 2+i∞ 0 2 (1) 1 s ˜ ζ (s) X Ik,T = − X W (s) ds + O( ) 2πi 2−i∞ ζ(s) T X n X2 = Λ(n)W ( ) + O( ), X T n by (7.2). The statement (7.5) now follows immediately. (2) (4) 1 Proof of (7.6). If s ∈ Ck,T ∪ Ck,T then, since T is good, dist(s, Z)  T . It (2) (4) follows from Proposition 7.2 that, uniformly for s ∈ Ck,T ∪ Ck,T , we have ζ0(s)  |s|3T  k3T 4. ζ(s) It follows that Z 2 (2) (4) 3 4 σ ˜ Ik,T ,Ik,T  k T X |W (σ + iT )| dσ. −2k−1 The bound (7.9) with m = 5 tells is that

5 −5 |W˜ (σ + iT )| W (1 + |σ|) T uniformly for −∞ < σ 6 2. Therefore we obtain Z 2 (2) (4) 3 −1 5 σ 3 −1 Ik,T ,Ik,T W k T (1 + |σ|) X dσ W,X k T , −2k−1 where here we used the fact that X > 1. Thus

(2) (4) 3 −1 Ik,T ,Ik,T W,X k T . 48 7. MELLIN TRANSFORMS AND THE EXPLICIT FORMULA

Statement (7.6) follows (note that it is important that we take the limit as T → ∞ first, and only then the limit as k → ∞). (3) Proof of (7.7). If s ∈ Ck,T then dist(s, Z) > 1 and so Proposition 7.2 furnishes 0 ζ (s) 3 3 3 the bound ζ(s)  |s| . If s = −2k − 1 + it then (crudely) this is  k max(1, |t|) . From (7.9) with m = 5 we have

5 −5 |W˜ (−2k − 1 + it)| W k max(1, |t|) .

Thus ζ0(s) XsW˜ (s)  k8 max(1, |t|)−2X−2k−1, ζ(s) W which implies that Z −2k−1+iT 0 (3) ζ (s) s ˜ −2k−1 Ik,T = X W (s)ds k,W X , −2k−1−iT ζ(s) uniformly in T . From this and the fact that X > 1 we immediately conclude (7.7), that is to say that indeed (3) lim lim Ik,T = 0 k→∞ T →∞ Note that for the estimation of I(3) there was no need to restrict to T good. CHAPTER 8

The prime number theorem

In the last section we established the explicit formula, which provides a rela- tionship between the distribution of prime numbers and the nontrivial zeros ρ of the ζ-function. To make use of this result, and in particular to prove the prime number theorem, it is important to have some information about the location of those zeros. The prime number theorem turns out to be more-or-less equivalent to the fol- lowing statement.

Proposition 8.1. There are no zeros of ζ on the line Re s = 1.

Proof. Consider the Euler product identity Y ζ(s) = (1 − p−s)−1, p valid for Re s > 1. Set s = σ + it; we will let σ → 1. Taking logs, we have ∞ ∞ X X X 1 X X 1 (8.1) log ζ(s) = − log(1 − p−s) = = e−imt log p. mpms mpmσ p p m=1 p m=1 Now we invoke the inequality

2 (8.2) 3 + 4 cos θ + cos 2θ = 2(1 + cos θ) > 0. Applying this with θ = mt log p and comparing with (8.1), we get

3 log ζ(σ) + 4 Re log ζ(σ + it) + Re log ζ(σ + 2it) > 0, and thus 3 4 ζ(σ) |ζ(σ + it)| |ζ(σ + 2it)| > 1. Now suppose that ζ(1 + it) = 0, and let σ → 1 in the inequality just proved. We have ζ(σ) ∼ (σ−1)−1, but |ζ(σ+it)|4  (σ−1)4. Since ζ(σ+2it) remains bounded as σ → 1, we have ζ(σ)3|ζ(σ + it)|4|ζ(σ + 2it)| → 0, a contradiction.

We will actually prove the following statement, from which the prime number theorem is a relatively easy deduction.

49 50 8. THE PRIME NUMBER THEOREM

Proposition 8.2. Suppose that W : R → R is smooth and that Supp(W ) ⊂ [1, 10]. Then 1 X n Z lim Λ(n)W ( ) = W. X→∞ X X n

Proof. We regard W as fixed throughout the proof and do not indicate dependen- cies on W . By the explicit formula it suffices to show that X lim Xρ−1W˜ (ρ) = 0. X→∞ ρ We handle the contribution from the trivial zeros and the nontrivial ones separately.

By (7.11) we have W˜ (ρ)  1 uniformly for ρ ∈ Ztriv, and so ∞ X ρ−1 X −2j−1 X W˜ (ρ) W X .

ρ∈Ztriv j=1 This certainly tends to 0 as X → ∞. Turning now to the contribution from the nontrivial zeros, we use (7.10) with −2 m = 2, which tells us that W˜ (ρ)  |ρ| uniformly for ρ ∈ Znontriv. Noting also ρ−1 P −2 that |X | 6 1 (since Re ρ 6 1) and recalling that ρ |ρ| < ∞, it follows that for some K = K(ε) we have

X ρ−1 ε X W˜ (ρ) . 6 2 ρ∈Znontriv:|ρ|>K For the nontrivial zeros with |ρ| 6 K it follows from (7.9) and the fact that there are only finitely many such zeros that we have

X ρ−1 β(K)−1 X W˜ (ρ) K X ,

ρ∈Znontriv:|ρ|

X ρ−1 X W˜ (ρ) < ε,

ρ∈Znontriv which is what we wanted to prove.

Proposition 8.2 is a kind of “smoothed” prime number theorem. Now we show how the prime number theorem itself follows from it. As shown earlier in the course, the prime number theorem is equivalent to the asymptotic X Λ(n) = (1 + o(1))X. n6X 8. THE PRIME NUMBER THEOREM 51

Here, we will show that if X is big enough in terms of ε then X (8.3) Λ(n) 6 (1 + ε)X. n6X A corresponding lower bound may be established in exactly the same way.

Take a smooth function W : R+ → R such that Supp(W ) ⊂ [2 − ε, 4 + ε], 0 6 W (x) 6 1 for all x, and W (x) = 1 for 2 6 x 6 4. The existence of such a W is “obvious by picture” but perhaps not so obvious mathematically. A construction is given in Appendix A. Applying the explicit formula to W tells us that if Y is sufficiently large in terms of ε then X X n ε Z 2ε Λ(n) Λ(n)W ( ) ( + W )Y (1 + )2Y. 6 Y 6 3 6 3 2Y

CHAPTER 9

The zero-free region. Error terms.

9.1. The classical zero-free region

In the last section we showed that ζ has no zeros on the line Re s = 1, and we saw that this implies the prime number theorem. In this section we obtain, using a related argument, a region to the left of Re s = 1 in which there are no zeros of ζ. This may be used to prove a stronger version of the prime number theorem, with a good error term: we provide details of this in the next section. The following is known as the “classical zero-free region”.

Theorem 9.1. There is an absolute constant c > 0 such that any zero ρ = β+iγ with |γ| > 2 satisfies c β < 1 − . log |γ|

Remark. Since there are no zeros on the line Re s = 1, one can if desired state a zero-free region of the form c0 β < 1 − log(|γ| + 2) for all γ. Proof. We may assume that |γ| > 10, since for small γ the result is a consequence of the fact that there are no zeros on the line Re s = 1 (established in the last chapter) and a compactness argument. Of course, for Re s > 1 we have ∞ ζ0(s) X = − Λ(n)n−s. ζ(s) n=1 Taking real parts gives, provided σ > 1, ∞ ζ0(σ + it) X − Re = Λ(n)n−σ cos(t log n). ζ(σ + it) n=1 Using the inequality

2 3 + 4 cos θ + cos 2θ = 2(1 + cos θ) > 0

53 54 9. THE ZERO-FREE REGION. ERROR TERMS.

(exactly as in the last section) it follows that ζ0(σ) ζ0(σ + it) ζ0(σ + 2it) (9.1) −3 Re − 4 Re − Re 0 ζ(σ) ζ(σ + it) ζ(σ + 2it) > for any σ > 1 and for any t ∈ R. ζ0(s) 1 Now we assemble some further inequalities. Since ζ(s) + s−1 is holomorphic, and hence continuous, it is bounded on the compact interval [1, 2]. That is, if 1 < σ 6 2 then ζ0(σ) 3 (9.2) −3 Re + C , ζ(σ) 6 σ − 1 1 for some absolute constant C1. To get bounds for the other two terms in (9.1), we use the partial fraction expansion ζ0(σ + it) 1 X 1 1 = B − + ( + ). ζ(σ + it) σ + it − 1 σ + it − ρ ρ ρ∈Z The first two terms are bounded by O(1). To bound the sum over ρ, we treat the trivial zeros separately. If 1 6 σ 6 2 and |t| > 2, the contribution from the trivial zeros is X X σ + it X |t|  √ . ρ(σ + it − ρ) k k2 + t2 k ρ=−2k−1 k>1 To estimate this sum, we split it into k 6 |t| and k > |t|. On the first range the 1 summand is  k , so we get a contribution of O(log |t|). On the second range the |t| summand is  k2 , and so we get a contribution of O(1). It follows that ζ0(σ + it) X 1 1 −4 Re = O(log |t|) − 4 Re + . ζ(σ + it) σ + it − ρ ρ ρ∈Znontriv Of course, we also have ζ0(σ + 2it) X 1 1 − Re = O(log |t|) − Re + . ζ(σ + 2it) σ + 2it − ρ ρ ρ∈Znontriv Combining all this information, we obtain (9.3) X 1 1 X 1 1 3 4 Re +  + Re +  C log |t| + σ + it − ρ ρ σ + 2it − ρ ρ 6 2 σ − 1 ρ∈Znontriv ρ∈Znontriv for some C2. 1 a Since Re a+ib = a2+b2 and σ > 1 > Re ρ > 0 for every ρ ∈ Znontriv, every term on the left here is non-negative. Ignoring all terms except the contribution of the zero ρ = β + iγ to the leftmost sum, and setting t = γ, we obtain 4 3 C log |γ| + . σ − β 6 2 σ − 1 9.2. THE PRIME NUMBER THEOREM WITH CLASSICAL ERROR TERM 55

Setting σ = 1 + 1/2C2 log |γ| and rearranging then gives 1 β 6 1 − . 14C2 log |γ| This has been proved whenever |γ| > 2, and this completes the proof.

We remark that the classical zero-free region is not the largest one known. In fact, the stronger inequality c β < 1 − log2/3+ε |γ| is known to hold for any ε > 0, a result of Vinogradov and Korobov. The methods necessary to prove this lie considerably deeper.

9.2. The prime number theorem with classical error term

In this section we examine the implications of the classical zero-free region for the prime number theorem. The main result is the following.

√ Theorem 9.2. We have ψ(X) = X + O(Xe−c log X ).

Here, ψ(X) = P Λ(n), as usual. The letter c denotes a positive absolute n6X constant, but it may vary from line to line. The proof is very closely related to that in Chapter 8, except now we must be a little more precise in how we approximate 1[2,4] by smooth functions. Let ε > 0. We claim that there is a smooth function Wε : R+ → R with Supp(Wε) ⊂ [2 − ε, 4 + ε], 0 6 Wε(x) 6 1 for all x and Wε(x) = 1 for 2 6 x 6 4, satisfying the derivative bounds 1 (9.4) k∂W k  1, k∂2W k  , ε 1 ε 1 ε This is reasonably “clear by picture”. One way to proceed more rigorously is to

first define W1/2 and then to set

t Wε(2 + t) = W1/2(2 + 2ε ) |t| < ε, t Wε(4 + t) = W1/2(4 + 2ε ) |t| < ε, Wε(x) = 1 2 + ε 6 x 6 4 − ε.

That is, Wε is a kind of contracted version of W1/2. We leave the proof of the bounds (9.4) to the reader. 56 9. THE ZERO-FREE REGION. ERROR TERMS.

By the explicit formula and the properties of Wε we have X n ψ(4Y ) − ψ(2Y ) Λ(n)W ( ) 6 ε Y n Z X ρ = Y W − Y W˜ ε(ρ) ρ X ρ ˜ 6 2Y (1 + ε) − Y Wε(ρ) ρ X ρ ˜ 6 2Y (1 + ε) + |Y ||Wε(ρ)|. ρ To bound the sum over ρ, we use the estimates (7.10), (7.11). As in the last chapter, the contribution from the trivial zeros is  Y −3 and can be ignored. For

ρ ∈ Znontriv we apply (7.11) with m = 1, 2, obtaining (by (9.4)) the estimates 1 W˜ (ρ) min(|ρ|−1, |ρ|−2). ε ε Thus X 1 1 ψ(4Y ) − ψ(2Y ) 2Y + O(εY ) + O(1) |Y ρ| min( , ). 6 |ρ| ε|ρ|2 ρ∈Znontriv

To estimate the sum over ρ, we split into exponential ranges, defining Zj to be the j j+1 set of zeros ρ ∈ Znontriv with e 6 |ρ| < e . The number of zeros in such a range is  jej by Proposition 6.3. If ρ = β + iγ is such a zero then, by the classical ρ 1−c/j zero-free region, β 6 1 − c/j, and hence |Y | 6 Y . In the range j 6 log(1/ε) we therefore have X 1 |Y ρ|  jY 1−c/j, |ρ| ρ∈Zj and therefore X X 1 1 |Y ρ|  log2( )Y 1−c/ log(1/ε). |ρ| ε j6log(1/ε) ρ∈Zj In the range j > log(1/ε) we have X 1 1 |Y ρ|  jY 1−c/j , ε|ρ2| εej ρ∈Zj and so X X 1 |Y ρ|  log(1/ε)Y 1−c/ log(1/ε). ε|ρ2| j>log(1/ε) ρ∈Zj Putting all this together yields

2 1−c/ log(1/ε) ψ(4Y ) − ψ(2Y ) 6 2Y + O(εY ) + O(log (1/ε)Y ). √ Now we must choose a good value of ε. With the choice ε = e− log Y we obtain √ −c log Y ψ(4Y ) − ψ(2Y ) 6 2Y + O(Y e ). 9.3. THE RIEMANN HYPOTHESIS AND ITS IMPLICATIONS FOR PRIMES 57

By the usual telescoping sum argument, using the above with X/4, X/8,... , we see that √ −c log X ψ(X) 6 X + O(Xe ). (We leave the precise details of this telescoping sum argument to the reader.)

By entirely analogous arguments, only using a minorant to the interval 1[2,4] rather than the majorants Wε, we may obtain the corresponding lower bound √ −c log X ψ(X) > X − O(Xe ). This completes the proof of the prime number theorem with classical error term.

9.3. The Riemann hypothesis and its implications for primes

The Riemann hypothesis is the assertion that all the nontrivial zeros Znontriv lie on the line Re s = 1/2. If the Riemann hypothesis holds then we can, of course, improve the estimate ρ 1−c/j ρ 1/2 |Y | 6 Y in the above argument to |Y | 6 Y . By essentially the same arguments as before (splitting the zeros into ranges Zj) we obtain

2 1/2 ψ(4Y ) − ψ(2Y ) 6 2Y + O(εY ) + O(log (1/ε)Y ). Choosing ε = Y −2/3 (say) yields

1/2 2 ψ(4Y ) − ψ(2Y ) 6 2Y + O(Y log Y ). Telescoping the sum in the usual manner gives

1/2 2 ψ(X) 6 X + O(X log X). Once again, a corresponding lower bound may be obtained using an analogous argument.

CHAPTER 10

*An introduction to sieve theory

10.1. Introduction

In this chapter we introduce a different kind of method in the study of prime numbers, the sieve. Sieve theory is an enormous topic, but we will only consider one very particular problem: that of bounding from above the quantity π(X+Y )−π(X), the number of primes between X and X + Y . We note to begin with that, at least if X is much larger than Y , the prime number theorem is totally useless. Moreover, we certainly do not expect any kind of asymptotic formula for this quantity, since there could be no primes at all in such an interval. It is instructive to consider the efficacy of the most naive sieve method, the sieve of Erathosthenes (a.k.a. inclusion-exclusion). Writing f(d) := #{n : X < n 6 X + Y, d|n}, an upper bound for π(X + Y ) − π(X) is

X |I| Y f(1) − f(2) − f(3) − · · · + f(6) + f(10) + ··· = (−1) f( pi), I⊂[k] i∈I where p1, . . . , pk are the first k primes. We have Y f(d) = + O(1), d and in general it is not possible to say anything useful about the O(1) term. There- fore these error terms of O(1) add up to something that cannot be bounded by better than O(2k), and we have

X |I| Y 1 k π(X + Y ) − π(X) 6 Y (−1) + O(2 ) pi I⊂[k] i∈I k Y 1 = Y (1 − ) + O(2k). p i=1 i Clearly we cannot take k larger than a multiple of log Y and hope to get a useful result. However, Qk (1− 1 ) is of size roughly 1/ log k, and so the best bound this i=1 pi method can give is Y π(X + Y ) − π(Y )  . log log Y

59 60 10. *AN INTRODUCTION TO SIEVE THEORY

The main aim of this section is to prove the following result, giving the best possible order of magnitude.

Theorem 10.1. π(X + Y ) − π(X), the number of primes between X and X + Y Y is at most (2 + o(1)) log Y (with the o(1) being as Y → ∞).

Remark. It is in fact known that the number of primes in this range is at most 2π(Y ). The argument we give here can be used, with a little more care, to prove a result almost as good as this, namely that π(X + Y ) − π(X)  (2 + o(1))Y/ log Y . It is a major unsolved problem to improve the constant 2, though it is conjecture that it can be replaced by 1. However, Hensley and Richards have shown that one should not expect that π(X + Y ) − π(X) 6 π(Y ), in the sense that they have proved that this inequality fails if one assumes some widely-believed conjectures about configurations of primes.

10.2. Selberg’s weights

To get an upper bound on π(X + Y ) − π(X) we use an idea of Selberg which, in retrospect, is very simple. Let (λd)d>1 be an sequence of real numbers with λ1 = 1, let R < Y be a threshold to be specified later (we will choose it to be Y c for c 1 slightly less than 2 ), and consider the weight function X 2 ν(n) := λd . d|n:d6R Evdiently ν(n) > 0 for all n. Moreover, if n is a prime > R, then the sum P λd collapses to simply λ1 = 1, and therefore ν(n) = 1. In other words, d|n:d6R ν(n) is a majorant for the characteristic function of the primes, and hence X (10.1) π(X + Y ) − π(X) 6 ν(n) + R. X

The inner sum, that is to say the number of n ∈ (X,X + Y ] divisible by d1 and d2, is Y + O(1) (where here [a, b] means the l.c.m. of a and b) and so [d1,d2] 10.2. SELBERG’S WEIGHTS 61

X π(X + Y ) − π(Y ) 6 ν(n) + R X6n6X+Y

X λd1 λd2 X (10.2) = Y + O(1) |λd1 ||λd2 | + R. [d1, d2] d1,d26R d1,d26R

At this point the λd are still completely arbitrary, subject only to λ1 = 1. The strategy henceforth is to choose them to minimise the quadratic form λ λ λ λ ~ X d1 d2 X d1 d2 Q(λ) := = (d1, d2), [d1, d2] d1d2 d1,d26R d1,d26R which forms the main term above, and hope that the error term looks after itself. Under what conditions can the error term be “expected to look after itself?” If we assume that the λd are “reasonably bounded” in the sense that

o(1) (10.3) λd  d , then the error term here is  R2+o(1). If we choose R = Y c for some c < 1/2 then this will, if we take ε sufficiently small, be much smaller than Y/ log Y , the main term in our result. To carry out this strategy, we need to diagonalise the form Q. To do this, we note that X (d1, d2) = φ(δ),

δ|(d1,d2) where φ is Euler’s φ-function. We showed this earlier in the course. Substituting in to the definition of Q yields

X X λd 2 X (10.4) Q(~λ) = φ(δ)  = φ(δ)u2, d δ δ6R δ|d δ6R d6R where X λd u := . δ d δ|d d6R We claim that the change of variables is invertible, or in other words that we can express λd in terms of the uδ. This is a slight twist of the M¨obiusinversion formula; indeed we claim that

λd X δ (10.5) = µ( )u . d d δ d|δ δ6R This is easily verified: the right-hand side is

X λd0 X δ µ( ), d0 d 0 0 d 6R d|δ|d 62 10. *AN INTRODUCTION TO SIEVE THEORY and the inner sum vanishes except when d0 = d.

Note in particular that the constraint λ1 = 1 becomes

X (10.6) 1 = µ(δ)uδ. δ6R Minimising Q(~λ), as given in (10.4), subject to (10.6), is a standard task. Indeed by Cauchy-Schwarz we have 2 X X 1/2 X µ (δ) 1/2 1 = µ(δ)u φ(δ)u2  , δ 6 δ φ(δ) δ6R δ6R δ6R µ(δ) and moreover equality can occur by taking uδ ∝ φ(δ) . Thus the minimum value of Q(~λ) subject to (10.6) is 1/D, where X µ2(δ) X 1 D := = , φ(δ) φ(δ) δ6R δ6R:δ squarefree this being attained when µ(δ) u = . δ Dφ(δ) To complete the proof, it is enough to show that

(10.7) D > log R, as well as the fact that, with this choice of the uδ, the λd as specified in (10.5) ε satisfy (10.3), that is to say |λd| ε d . We handle these tasks in turn. Proof of (10.7). By definition we have X 1 D = . φ(d) d6R:d squarefree Q 1 If d is squarefree then φ(d) = p|d(1 − p ) and therefore this can be written as X 1 Y 1 1 (1 + + ··· ). d p p2 d6R:d squarefree p|d

α1 αk Now every m 6 R can be written as a product dp1 ··· pk , where d is squarefree and pi|d (simply take d = p1 ··· pk, where the pi are the primes dividing m). Therefore X 1 D log R, > m > m6R which is what we wanted to prove.

Proof of (10.3). By the choice of uδ and (10.5) we have d X µ(δ/d)µ(δ) λ = . d D φ(δ) d|δ,δ6R 10.2. SELBERG’S WEIGHTS 63

Note that the sum is supported where δ is squarefree. Thus, writing δ0 := δ/d, we have φ(δ) = φ(δ0)φ(d). The sum is therefore bounded above by d X 1 d 0 = . Dφ(d) 0 0 φ(δ ) φ(d) δ 6R,δ squarefree (Note that if µ(δ) 6= 0 then δ0, d are coprime and hence φ(δ) = φ(d)φ(δ0), and certainly δ0 must be squarefree.) Thus we only need prove that

1−ε (10.8) φ(d) ε d when d is squarefree. This is clear if we factor d as a product of primes: if p > p0(ε) 1−ε is a sufficiently large prime then φ(p) = p − 1 > p , whilst the contribution from the smaller primes is just a constant.

APPENDIX A

*Some smooth bump functions

In this appendix we discuss some smooth functions – for example, smooth ap- proximations to the interval [−1, 1]. Traditionally a “trick” is used to construct these involving the function

( 2 e1/(1−t ) |t| < 1; f(t) := 0 |t| > 1, which can be shown (a standard undergraduate exercise) to lie in C∞(R). I dislike this because it is a “trick”. Furthermore, though we know from Lemma ˆ ˆ −m ?? that the Fourier transform f satisfies the estimate |f(ξ)| 6 Cm|ξ| for |ξ| > 1, it is surprisingly difficult to get any effective bound for Cm. In this appendix we present a more natural construction. This is no doubt “classical”, but we learned of it in connection with (the easy direction of) something called the Denjoy-Carleman theorem. The idea is to construct a function f as an infinite convolution of normalised characteristic functions of intervals. If δ > 0, write 1 ν (x) := 1 . δ 2δ |x|6δ

Thus kνδk1 = 1.

It is extremely easy to compute the Fourier transform of νδ: we have Z δ 1 −ixξ sin δξ νˆδ(ξ) = e dx = . 2δ −δ δξ In particular we have the trivial bound 2 (A.1) |νˆ (ξ)| . δ 6 δ|ξ| Recall that the convolution of two functions f and g, f ∗ g, is defined by Z (f ∗ g)(x) := f(x − y)g(y)dy.

Suppose that f : R → [0, 1] is a function with f(x) = 1 for |x| 6 1 − η and f(x) = 0 for |x| > 1 + η. Then it is easy to see that f ∗ νδ takes values in [0, 1], has f(x) = 1 for |x| 6 1 − η − δ, and f(x) = 0 for |x| > 1 + η + δ. This observation leads quickly to the following lemma.

65 66 A. *SOME SMOOTH BUMP FUNCTIONS

Lemma A.1. Let ε > 0, and let n ∈ N. Then there is a function f = fn,ε : R → [0, 1] such that f(x) = 1 for |x| 6 1 − ε, f(x) = 0 for |x| > 1 + ε, and ˆ −n−1 2n+1 2 −n |f(ξ)| 6 C(n, ε)|ξ| . We may take C(n, ε) = 2 (n!) ε .

Proof. Define

f := ν1 ∗ νδ1 ∗ · · · ∗ νδn , 2 where δj := ε/2j . (There is considerable flexibility here; the important feature of the sequence 1/j2 is that it is summable but slowly.) Since P j−2 < 2, this has the support properties claimed. ˆ We have f =ν ˆ1νˆδ1 ... νˆδn , and so from the trivial bound (A.1) we obtain n+1 ˆ 2 −n−1 |f(ξ)| 6 |ξ| . δ1 . . . δn The result follows.

A slightly more refined analysis allows one to show that, for fixed ε > 0, the sequence fn,ε converges as n → ∞. This gives a version of Lemma A.1 in which f depends only on ε, and not on n.

∞ Lemma A.2. Let ε > 0 be fixed. Let (δj)j=1 be a sequence of positive real P numbers with j δj 6 ε. Set fn = fn,ε := ν1 ∗ νδ1 ∗ · · · ∗ νδn . Then fn converges uniformly to a function f : R → [0, 1] with f(x) = 1 for |x| 6 1 − ε and f(x) = 0 ˆ for |x| > 1 + ε. The Fourier transform f(ξ) satisfies n+1 ˆ 2 −n−1 |f(ξ)| 6 inf |ξ| . n δ1 . . . δn

Proof. Since ε is fixed throughout the argument, we write fn instead of fn,ε. First note that f1 = ν1 ∗ νδ1 satisfies the Lipschitz property

0 2 0 |f1(x) − f1(x )| 6 |x − x |. δ1 Indeed Z 0 0 |f1(x) − f1(x )| = ν1(y)(νδ1 (x − y) − νδ1 (x − y)) dy 1 Z |ν (x − y) − ν (x0 − y)|dy, 6 2 δ1 δ1 which can easily be computed to be at most the claimed bound. Next note that any such Lipschitz bound is preserved under convolution with an arbitrary non-negative A. *SOME SMOOTH BUMP FUNCTIONS 67

0 0 0 function ν with integral 1; indeed if |f(x) − f(x )| 6 C|x − x | for all x, x then Z |f ∗ ν(x) − f ∗ ν(x0)| = | ν(y)(f(x − y) − f(x0 − y))dy| Z 0 6 ν(y)|f(x − y) − f(x − y)|dy

0 6 C|x − x |.

It follows that each of the functions fn satisfies the same Lipschitz bound as f1. 0 0 Now suppose that f satisfies the Lipschitz bound |f(x)−f(x )| 6 C|x−x |, and that φ is a non-negative function with integral 1, supported on [−δ, δ]. Then Z kf − f ∗ φk∞ = sup | (f(x) − f(x − y))φ(y)dy| x Z 6 sup |f(x) − f(x − y)| φ(y)dy 6 Cδ. |y|6δ It follows from these observations that n 2 X kf − f k δ . m n ∞ 6 δ j 1 j=m+1

In particular, (fn) is Cauchy in the uniform norm and hence fn tends to some continuous function f. To obtain the stated bound on the Fourier transform, first note that fˆ(ξ) = ˆ limn→∞ fn(ξ). This follows immediately from the fact that fn → f uniformly, and the support of fn is contained in [−2, 2] for all n. Note also that if kνk1 = 1 then ∧ ˆ ˆ |(fn ∗ ν) (ξ)| = |fn(ξ)||νˆ(ξ)| 6 |fn(ξ)|. ˆ This means that |fn(ξ)| is a non-increasing function of n. The claimed bound now follows using (A.1).

Corollary A.1. Let ε > 0. Then for any κ > 0 there is a continuous function f = fε,κ : R → [0, 1] such that f(x) = 1 for |x| 6 1 − ε, f(x) = 0 for |x| > 1 + ε and 1−κ ˆ −Cε,κ|ξ| |f(ξ)| 6 e . Remark. Taking κ = 1/2 is certainly sufficient for any application I know of, ˆ −A and usually the bound |f(ξ)| A |ξ| , with no explicit dependence of the implied constant on A, is enough. 1+κ/2 Proof. Apply the preceding lemma with δj := c/j for an appropriately small c = c(ε, κ). For every n we have the bound

ˆ n 1+κ/2 −n |f(ξ)| 6 C (n!) |ξ| . Choosing n ∼ |ξ|1−κ/2 gives the claimed bound. 68 A. *SOME SMOOTH BUMP FUNCTIONS

We have said nothing about the smoothness (or otherwise) of the functions fn and f. One could examine this using the definition in terms of convolution. However, now that we have bounds on Fourier transforms, we may as well use the following lemma.

Lemma A.3. Suppose that f is continuous, that f ∈ L1(R) and that |fˆ(ξ)|  |ξ|−n. Then f is ? times differentiable. APPENDIX B

Infinite products

In this section we supply the proofs of some very basic results on infinite products from the course. The following key inequality underlies everything.

N Lemma B.1. Let (zn)n=1 be a sequence of complex numbers. Then N PN Y |zn| (1 + zn) − 1 6 e n=1 − 1. n=1 Proof. Expanding out the product gives

N Y X X X (1 + zn) − 1 = zi + zi1 zi2 + zi1 zi2 zi3 ...

n=1 i1 i1

This concludes the proof.

Proposition B.1 (Weierstrass products). Suppose that Ω ⊂ C is a countable P −2 multiset, not containing 0, and such that ρ∈Ω |ρ| < ∞. Then the function Y z E (z) := (1 − )ez/ρ Ω ρ ρ∈Ω is well-defined, entire, and has zeros with the correct multiplicities and nowhere else.

Proof. For each pair of positive integers M,N with M < N, define the truncated products (M,N) Y z E (z) := (1 − )ez/ρ. Ω ρ ρ∈Ω M<|ρ|6N Write (N) (0,N) Y z E = E = (1 − )ez/ρ. Ω Ω ρ ρ∈Ω |ρ|6N

69 70 B. INFINITE PRODUCTS

(N) ∞ for short. To make sense of EΩ, it suffices to show that the sequence (EΩ (z))N=1 is uniformly Cauchy “on compacta”, that is to say on compact subsets of C. If this can be shown, it then follows from standard results in complex analysis that (N) EΩ(z) := limN→∞ EΩ (z) exists and is holomorphic. (N) A first step is to show that the EΩ (z) are uniformly bounded on compacta. To establish this, note first that the z/ρ are uniformly bounded by some quantity K = K(R) as z ranges over |z| 6 R, because some ball about 0 contains none (1−w)ew−1 of the ρ. By Taylor expansion, the function F (w) := w2 has a removable singularity at 0, and hence is bounded on |w| 6 K; let us say that |F (w)| 6 CK for |w| 6 K, thus z |z|2 (B.1) |(1 − )ez/ρ − 1| C C R2|ρ|−2 ρ 6 K |ρ|2 6 K

(N) for all |z| 6 R. It now follows from Lemma B.1 that the EΩ (z) are indeed uniformly bounded on compacta. Now we show the Cauchy property. We start with the observation that

(N) (M) (M) Y z z/ρ |E (z) − E (z)| = |E (z)| (1 − )e − 1 Ω Ω Ω ρ M<|ρ|6N Y z z/ρ R (1 − )e − 1 ρ M<|ρ|6N uniformly for |z| 6 R. By Lemma B.1 and (B.1), it follows that P −2 (N) (M) OR( M<|ρ| N |ρ| ) |EΩ (z) − EΩ (z)| R e 6 .

P −2 The Cauchy property is now immediate from the convergence of ρ |ρ| . We have now established that EΩ(z) is well-defined as a holomorphic function. It remains to show that it has zeros with the correct multiplicity at the points ρ and at no other points. To this end it is enough to show that Y z (B.2) (1 − )ez/ρ 6= 0 when z∈ / Ω. ρ ρ∈Ω

Indeed, this obviously implies that EΩ(z) 6= 0 when z∈ / Ω, but it also implies that

EΩ(z) has a zero of exactly the right multiplicity m(ρ) at ρ by writing

z/ρm(ρ) EΩ(z) = (1 − z/ρ)e EΩ\{ρ}(z).

To establish (B.2), fix z. We need only consider the product over |ρ| > R, for large enough R, the remaining part of the product being finite. Here we use Lemma B.1 once again, together with the estimate |z|2 |(1 − z/ρ)ez/ρ − 1|  , |ρ|2 B. INFINITE PRODUCTS 71 which follows from the same argument used to prove (B.1), noting that we can |z| assume that |w| = |ρ| 6 1 by taking R large enough. This gives

Y z z/ρ O(z2 P |ρ|−2) 1 (1 + )e − 1 e |ρ|>R ρ 6 6 2 R<|ρ|6N provided that R is big enough. Thus, in this range,

Y z z/ρ 1 (1 + )e . ρ > 2 R<|ρ|6N Note that this bound does not depend on the choice of N in any way, so we may let N → ∞ to conclude the proof.

Bibliography

[1] H. Davenport, Multiplicative number theory, Graduate Texts in Mathematics, 74. Springer- Verlag, New York, 2000. xiv+177 pp. [2] E. Kowalski, Un cours de th´eorieanalytique des nombres, Cours Sp´ecialis´es 13, Soci´et´e Math´ematiquede France, Paris, 2004. [3] E. Kowalski and H. Iwaniec, Analytic Number Theory, American Mathematical Society Col- loquium Publications, 53. American Mathematical Society, Providence, RI, 2004.

73