Notes for the course in Analytic Number Theory

G. Molteni

Fall 2019

revision 8.0

Disclaimer

These are the notes I have written for the course in Analytic Number Theory in A.Y. 2011–’20. I wish to thank my former students (in alphabetical order) Guglielmo Beretta, Alexey Beshenov, Alessandro Ghirardi, Davide Redaelli and Federico Zerbini for their careful reading and for the suggestions that improved these notes. I alone am responsible for any remaining errors.

The image on the cover shows Riemann’s 1859 scratch note in which, in 1932, C. Siegel recognized the celebrated Riemann–Siegel formula (an identity that allows one to compute the values of the Riemann zeta function inside the critical strip with extraordinary precision). This image is a resized version of the image in H. M. Edwards, Riemann’s Zeta Function, Dover Publications, New York, 2001, page 156. The author has not been able to discover whether this image is covered by any copyright and believes that it can appear here under fair use. He will remove it if a copyright infringement is brought to his attention.

Giuseppe Molteni

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. This means that:

(Attribution) You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.

(NonCommercial) You may not use the material for commercial purposes.

(NoDerivatives) If you remix, transform, or build upon the material, you may not distribute the modified material.

Contents

Disclaimer
Notation
Chapter 1. Prime number theorem
  1.1. Preliminary facts: a warm-up
  1.2. Two general formulas
  1.3. The ring of arithmetical functions
  1.4. Dirichlet series: as formal series
  1.5. Dirichlet series: as complex functions
  1.6. The […] of ζ(s)
  1.7. Some elementary results
  1.8. The Prime Number Theorem
Chapter 2. Primes in arithmetic progressions
Chapter 3. Sieve methods
  3.1. Eratosthenes-Legendre’s sieve
  3.2. Selberg’s Λ2-method
  3.3. Sifting more classes
  3.4. Two sets with positive density
Chapter 4. Sumsets
Chapter 5. Waring’s problem
  5.1. First step: cancellation in exponential sums
  5.2. Second step: representation
Appendix. Bibliography

Notation

• Let $f$ and $g\colon \mathbb{R} \to [0,+\infty)$ be functions. Then:

• $f(x) = O(g(x))$ as $x \to x_0 \in \overline{\mathbb{R}} := \mathbb{R} \cup \{\pm\infty\}$ means that the quotient $f(x)/g(x)$ is locally bounded in a neighborhood of $x_0$, i.e. that there exist a constant $C \in \mathbb{R}^+$ and an open set $U(x_0)$ such that

\[ f(x)/g(x) \le C \quad \forall x \in U(x_0). \]

• $f(x) \ll g(x)$ as $x \to x_0 \in \overline{\mathbb{R}}$ is an equivalent notation for $f(x) = O(g(x))$.

• $f(x) = o(g(x))$ as $x \to x_0 \in \overline{\mathbb{R}}$ means that the quotient $f(x)/g(x)$ tends to $0$ as $x \to x_0$ (in other words, the constant $C$ in the first item can be taken arbitrarily small).

• $f(x) \asymp g(x)$ as $x \to x_0 \in \overline{\mathbb{R}}$ means that both the quotients $f(x)/g(x)$ and $g(x)/f(x)$ are locally bounded in a neighborhood of $x_0$, i.e. that there exist a constant $C \in \mathbb{R}^+$ and an open set $U(x_0)$ such that

\[ \tfrac{1}{C}\, g(x) \le f(x) \le C\, g(x) \quad \forall x \in U(x_0). \]

• $f(x) = \Omega(g(x))$ as $x \to x_0 \in \overline{\mathbb{R}} := \mathbb{R} \cup \{\pm\infty\}$ means that $f(x) = o(g(x))$ is false. This means that there exists a constant $C > 0$ such that every open set $U(x_0)$ contains a point $x$ with $|f(x)|/|g(x)| > C$.

• Given $x \in \mathbb{R}$, the integer part of $x$ is defined as $\lfloor x \rfloor := \max\{n \in \mathbb{Z} : n \le x\}$. It must not be confused with $\lceil x \rceil := \min\{n \in \mathbb{Z} : n \ge x\}$. The fractional part of $x$ is $\{x\} := x - \lfloor x \rfloor$ (in signal processing this is called the sawtooth function). According to the definition, $\{x\}$ is a 1-periodic function $\mathbb{R} \to \mathbb{R}$ with a discontinuity at every integer:

\[ \lim_{x \to n^+} \{x\} = 0, \qquad \lim_{x \to n^-} \{x\} = 1, \qquad \forall n \in \mathbb{Z}. \]

• Usually we denote by $s$ the complex argument of a complex function. In Analytic Number Theory it is customary to use $\sigma$ and $t$ to denote the real and the imaginary parts of $s$, respectively. In other words, $s = \sigma + it$ with $\sigma, t \in \mathbb{R}$.

• Given two integers $m$ and $n$, $m \mid n$ means that $m$ divides $n$. $(m, n)$ denotes their greatest common divisor and $[m, n]$ their least common multiple, so that $mn = (m, n)[m, n]$.

• $e(x)$ denotes the function $e(x) := e^{2\pi i x}$.

Chapter 1

Prime number theorem

1.1. Preliminary facts: a warm-up

Let $\mathcal{P}$ be the set of prime numbers. For every $x \in \mathbb{R}^+$, let $\pi(x) := \#\{p \in \mathcal{P} : p \le x\}$. How large can $\pi(x)$ be?

Proposition 1.1 (Euclid) $\mathcal{P}$ is not finite, therefore $\pi(x) \to \infty$ as $x \to \infty$.

Proof. Let $p_1, \dots, p_n$ be any finite set of primes. Let $N := 1 + p_1 p_2 \cdots p_n$. Then $N$ is an integer greater than 1, thus it has a prime factor $p$, and $p$ is not equal to any $p_j$, since $N \equiv 1 \pmod{p_j}$. $\square$

The argument can be modified so as to produce a quantitative result.

Proposition 1.2 $\pi(x) \gg \ln\ln x$ as $x \to \infty$.

Proof. Let $p_1 = 2 < p_2 < \cdots$ be the complete sequence of primes (an infinite set, according to the previous result). We use the previous argument to prove that $p_n \le 2^{2^{n-1}}$ for every $n$. In fact, the claim is true for $n = 1$ (because $p_1 = 2 \le 2^{2^{1-1}}$). By induction on $n$ and following the argument proving Proposition 1.1, we know that

\[ p_n \le 1 + p_1 p_2 \cdots p_{n-1} \le 1 + \prod_{j=1}^{n-1} 2^{2^{j-1}} = 1 + 2^{\sum_{j=1}^{n-1} 2^{j-1}} = 1 + 2^{2^{n-1}-1} \le 2^{2^{n-1}}. \]

For every $x \ge 2$, let $n$ be such that $2^{2^{n-1}} \le x < 2^{2^n}$. Then

\[ \pi(x) \ge \pi\bigl(2^{2^{n-1}}\bigr) \ge n \ge \log_2\log_2 x \gg \ln\ln x. \qquad \square \]

There are several alternative and elementary proofs of these facts.

Pólya. For every $n \in \mathbb{N}$ let $F_n := 2^{2^n} + 1$, the $n$th Fermat number. These numbers are pairwise coprime, i.e.

\[ (F_n, F_m) = 1 \quad \forall n \ne m. \]

Proof. The sequence of Fermat numbers satisfies a kind of recursive formula; in fact $\prod_{j=0}^{m-1} F_j = F_m - 2$ for every $m \ge 1$, an identity which can be easily proved by induction. The formula shows that $F_n$ divides $F_m - 2$ whenever $n < m$; in particular every prime dividing both $F_n$ and $F_m$ divides both $F_m$ and $F_m - 2$, hence it must be 2. Nevertheless, Fermat numbers are odd, therefore they cannot have 2 as a common factor. $\square$
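The product identity and the pairwise coprimality of the Fermat numbers can be checked numerically for the first few indices. The following is only a sketch in Python; the function names are illustrative choices and are not part of the notes.

```python
from math import gcd


def fermat(n: int) -> int:
    """Return the n-th Fermat number F_n = 2**(2**n) + 1."""
    return 2 ** (2 ** n) + 1


def product_identity_holds(m: int) -> bool:
    """Check the identity F_0 * F_1 * ... * F_{m-1} == F_m - 2."""
    prod = 1
    for j in range(m):
        prod *= fermat(j)
    return prod == fermat(m) - 2


def pairwise_coprime(limit: int) -> bool:
    """Check gcd(F_n, F_m) == 1 for all 0 <= n < m < limit."""
    return all(
        gcd(fermat(n), fermat(m)) == 1
        for m in range(limit)
        for n in range(m)
    )


if __name__ == "__main__":
    print(all(product_identity_holds(m) for m in range(1, 8)))
    print(pairwise_coprime(8))
```

Python integers have arbitrary precision, so the check is exact; it is of course no substitute for the induction argument.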

The coprimality implies that every $F_n$ has a prime factor $p_n$ of its own, which divides $F_n$ and does not divide any $F_m$ with $m \ne n$. In particular, there are infinitely many primes (because there are infinitely many Fermat numbers) and the $n$th prime is at most $F_{n-2} = 2^{2^{n-2}} + 1 \le 2^{2^{n-1}}$, as proved before. (Note that we use here the fact that $p_1 = 2$ and $p_2 = 3 = F_0$.)

Pólya (variation). For every $n \in \mathbb{N}$ let $M_n := 2^n - 1$, the $n$th Mersenne number. These numbers satisfy the relation $M_m = 2^{m-n} M_n + M_{m-n}$ for every $m \ge n$, so that the greatest common divisor of $M_m$ and $M_n$ is $M_{(m,n)}$. Let $p_1 = 2, \dots, p_k$ be any set of distinct primes. Then $M_{p_1}, \dots, M_{p_k}$ are pairwise coprime, therefore there are at least $k$ distinct odd prime numbers (because every $M_n$ is an odd number); in particular there is at least one odd prime number which is greater than $p_k$, and hence one more prime. The argument proves that if $2 = p_1 < p_2 < p_3 < \cdots$ is the sequence of primes, then $p_{n+1} < 2^{p_n}$. This upper bound for $p_n$ can be used to produce a lower bound for $\pi(x)$, but of incredibly low quality.

Erdős. Every positive integer $n$ can be written in a unique way as the product of a square $m^2$ and a squarefree integer $q$. Fix $x > 2$ and apply that decomposition to every integer $n \le x$. There are at most $\sqrt{x}$ possible values for $m$ and at most $2^{\pi(x)}$ values for $q$. Hence

\[ \lfloor x \rfloor = \#\{n \in \mathbb{N} : n \le x\} \le \#\{m\} \cdot \#\{q\} \le \sqrt{x} \cdot 2^{\pi(x)}, \]

implying that $\pi(x) \gg \ln x$. Note that this simple argument already improves Proposition 1.2.

Euler. Consider the product

\[ \prod_{p \le x} (1 - 1/p)^{-1} = \prod_{p \le x} \Bigl(1 + \frac{1}{p} + \frac{1}{p^2} + \frac{1}{p^3} + \cdots\Bigr). \]

Every positive integer $n$ can be written in a unique way as a product of prime powers, and when $n \le x$ the primes appearing in its factorization are also $\le x$ (trivial). Therefore the previous product gives

\[ \prod_{p \le x} (1 - 1/p)^{-1} = \prod_{p \le x} \Bigl(1 + \frac{1}{p} + \frac{1}{p^2} + \frac{1}{p^3} + \cdots\Bigr) \ge \sum_{n \le x} \frac{1}{n}. \tag{1.1} \]

The right-hand side diverges as $x \to \infty$, hence this inequality already proves the existence of infinitely many primes: only in that case can the product on the left-hand side diverge.

The following argument deduces an interesting lower bound from this inequality. The inequality $1 - y \ge e^{-2y}$ holds whenever $y \in [0, 1/2]$, hence

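\[ (1 - 1/p)^{-1} \le e^{2/p} \]

for every prime $p$, so that

\[ e^{2\sum_{p \le x} \frac{1}{p}} = \prod_{p \le x} e^{2/p} \ge \prod_{p \le x} (1 - 1/p)^{-1} \ge \sum_{n \le x} \frac{1}{n}, \]

i.e.,

\[ \sum_{p \le x} \frac{1}{p} \ge \frac{1}{2} \ln\Bigl(\sum_{n \le x} \frac{1}{n}\Bigr). \tag{1.2} \]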

This inequality proves that $\sum_{p \le x} \frac{1}{p} \gg \ln\ln x$, because $\sum_{n \le x} \frac{1}{n} \sim \ln x$.

Exercise 1.1 The following steps improve the lower bound (1.2) by removing the constant 1/2 appearing there.

1) Prove that $(1 - x)e^x = 1 + O(x^2)$ for $x \to 0$ and use this equality to prove that

\[ e^{\sum_{p \le x} \frac{1}{p}} = \prod_{p \le x} e^{1/p} = \prod_{p \le x} \Bigl(1 - \frac{1}{p}\Bigr)^{-1} \cdot \prod_{p \le x} \Bigl(1 + \frac{O(1)}{p^2}\Bigr). \]

2) Prove that $\prod_{p \le x} (1 + c \cdot p^{-2})$ converges to a nonzero constant for every fixed constant $c \ge 0$; deduce that

\[ \sum_{p \le x} \frac{1}{p} = \ln\Bigl(\prod_{p \le x} \Bigl(1 - \frac{1}{p}\Bigr)^{-1}\Bigr) + O(1). \]

3) Use (1.1) to deduce that

\[ \sum_{p \le x} \frac{1}{p} \ge \ln\Bigl(\sum_{n \le x} \frac{1}{n}\Bigr) + O(1). \]

4) Conclude that

\[ \sum_{p \le x} \frac{1}{p} \ge \ln\ln x + O(1). \tag{1.3} \]

In Proposition 1.14 we will see that $\ln\ln x$ is the right behavior for the sum of the inverses of the primes, since

\[ \sum_{p \le x} \frac{1}{p} \sim \ln\ln x. \]

This is part of a very famous set of results proved with totally elementary tools by Mertens, long before the proof of the Prime Number Theorem.
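To get a feeling for the size of these sums, one can compare the sum of the reciprocals of the primes up to $x$ with the lower bound in (1.2) and with $\ln\ln x$. The following Python sketch does this; the sieve and all the function names are illustrative assumptions, not part of the notes.

```python
from math import isqrt, log


def primes_up_to(x: int) -> list[int]:
    """Sieve of Eratosthenes: all primes p <= x (assumes x >= 2)."""
    is_prime = bytearray([1]) * (x + 1)
    is_prime[0] = is_prime[1] = 0
    for p in range(2, isqrt(x) + 1):
        if is_prime[p]:
            # cross out the multiples p*p, p*p + p, ... up to x
            count = len(range(p * p, x + 1, p))
            is_prime[p * p :: p] = bytearray(count)
    return [n for n in range(2, x + 1) if is_prime[n]]


def prime_reciprocal_sum(x: int) -> float:
    """Sum of 1/p over the primes p <= x."""
    return sum(1.0 / p for p in primes_up_to(x))


def harmonic_sum(x: int) -> float:
    """Sum of 1/n over the integers 1 <= n <= x."""
    return sum(1.0 / n for n in range(1, x + 1))


if __name__ == "__main__":
    for x in (10, 100, 10_000, 1_000_000):
        s = prime_reciprocal_sum(x)
        # columns: x, sum of 1/p, right-hand side of (1.2), sum minus ln ln x
        print(x, s, 0.5 * log(harmonic_sum(x)), s - log(log(x)))
```

The last column stays bounded in the tested range, which is what (1.3) and the asymptotic stated above lead one to expect.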

1.2. Two general formulas

The following formula is due to Abel. It is essentially the discrete version of the well-known formula for integration by parts.

Proposition 1.3 (Partial summation formula: 1st version) Let $f, g\colon \mathbb{N} \to \mathbb{C}$. Let $F(x) := \sum_{1 \le k \le x} f(k)$ for $x \ge 1$, and $F(x) := 0$ if $x < 1$. Then

\[ \sum_{n=1}^{N} f(n)g(n) = F(N)g(N) - \sum_{n=1}^{N-1} F(n)\bigl(g(n+1) - g(n)\bigr). \]

Proof. $f(n) = F(n) - F(n-1)$ for every $n \ge 1$, hence

\begin{align*}
\sum_{n=1}^{N} f(n)g(n) &= \sum_{n=1}^{N} \bigl(F(n) - F(n-1)\bigr)g(n) = \sum_{n=1}^{N} F(n)g(n) - \sum_{n=1}^{N} F(n-1)g(n) \\
&= \sum_{n=1}^{N} F(n)g(n) - \sum_{n=1}^{N-1} F(n)g(n+1) \\
&= F(N)g(N) - \sum_{n=1}^{N-1} F(n)\bigl(g(n+1) - g(n)\bigr). \qquad \square
\end{align*}
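The identity of Proposition 1.3 is purely algebraic, so it can be tested on arbitrary integer-valued data with exact arithmetic. The Python sketch below does so; the helper names and the test functions are hypothetical choices made only for illustration.

```python
from typing import Callable


def direct_sum(f: Callable[[int], int], g: Callable[[int], int], N: int) -> int:
    """Left-hand side: sum of f(n) * g(n) for n = 1..N."""
    return sum(f(n) * g(n) for n in range(1, N + 1))


def partial_summation(f: Callable[[int], int], g: Callable[[int], int], N: int) -> int:
    """Right-hand side: F(N) g(N) - sum_{n=1}^{N-1} F(n) (g(n+1) - g(n))."""
    F = 0        # running value of F(n) = f(1) + ... + f(n)
    total = 0    # running value of the sum over n = 1..N-1
    for n in range(1, N):
        F += f(n)
        total += F * (g(n + 1) - g(n))
    F += f(N)    # now F equals F(N)
    return F * g(N) - total


if __name__ == "__main__":
    f = lambda n: (-1) ** n * n   # an oscillating sequence
    g = lambda n: n * n           # a smooth weight
    print(direct_sum(f, g, 50), partial_summation(f, g, 50))
```

Both functions assume $N \ge 1$; for $N = 1$ the inner sum is empty and the identity reduces to $f(1)g(1) = F(1)g(1)$.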

Suppose now that $g \in C^1([0, +\infty))$; then $g(n+1) - g(n) = \int_n^{n+1} g'(x)\,dx$, so that the formula becomes

\[ \sum_{n=1}^{N} f(n)g(n) = F(N)g(N) - \sum_{n=1}^{N-1} F(n) \int_n^{n+1} g'(x)\,dx. \]

Here $F(x) = F(n)$ for $x \in [n, n+1)$, therefore

\begin{align*}
\sum_{n=1}^{N} f(n)g(n) &= F(N)g(N) - \sum_{n=1}^{N-1} \int_n^{n+1} F(x)g'(x)\,dx \\
&= F(N)g(N) - \int_1^{N} F(x)g'(x)\,dx.
\end{align*}

In this way we have proved the following useful formula.

Proposition 1.4 (Partial summation formula: 2nd version) Let $f\colon \mathbb{N} \to \mathbb{C}$ and $g\colon [0, +\infty) \to \mathbb{C}$, with $g \in C^1([0, +\infty))$. Let $F(x) := \sum_{1 \le k \le x} f(k)$ for $x \ge 1$, and $F(x) := 0$ if $x < 1$. Then

\[ \sum_{n=1}^{N} f(n)g(n) = F(N)g(N) - \int_1^{N} F(x)g'(x)\,dx. \tag{1.4} \]

The importance of the partial summation formula comes from the fact that the function $F(x) := \sum_{n \le x} f(n)$ (sometimes called the cumulating function of $f$) is no less regular than $f$. The following result is a simple instance of this fact.

Proposition 1.5 (Cesàro mean value) Let $f\colon \mathbb{N} \to \mathbb{R}$ and suppose that $f(k) \to \ell \in \overline{\mathbb{R}}$ as $k$ diverges. Then $\frac{1}{x}F(x) \to \ell$, too.

Note that $\frac{1}{x}F(x)$ is the mean value of the set $\{f(k)\}_{k \le x}$, thus the proposition claims that the mean value of $f$ is at least as regular as the original sequence $f$.

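Proof. Suppose $\ell \in \mathbb{R}$. Let $\varepsilon > 0$ be fixed. There exists an integer $K$ such that $k \ge K$ implies $f(k) \in (\ell - \varepsilon, \ell + \varepsilon)$. Let $x \ge K$ with $x \in \mathbb{N}$; then

\[ \frac{1}{x}F(x) - \ell = \frac{1}{x}\sum_{k \le x} f(k) - \ell = \frac{1}{x}\sum_{k \le x} \bigl(f(k) - \ell\bigr), \]

so that

\begin{align*}
\Bigl|\frac{1}{x}F(x) - \ell\Bigr| &\le \frac{1}{x}\sum_{k \le x} |f(k) - \ell| = \frac{1}{x}\Bigl(\sum_{k < K} |f(k) - \ell| + \sum_{K \le k \le x} |f(k) - \ell|\Bigr) \\
&\le \frac{c}{x} + \frac{1}{x}\sum_{K \le k \le x} \varepsilon \le \frac{c}{x} + \varepsilon,
\end{align*}

where $c$ is independent of $x$. This formula proves the claim when $x$ diverges in $\mathbb{N}$. Trivial bounds involving $x$ and $\lfloor x \rfloor$ prove the claim for the general case $x \in \mathbb{R}$. Finally, the statement for $\ell \in \{\pm\infty\}$ can be proved in a similar way. $\square$

Usually $F(x)$ has a better behavior (in some sense) than $f(n)$. The following exercise shows this principle in action: there $f(n) = \exp(2\pi i n\theta)$ oscillates as a function of $n$, but $F$ stays bounded, and this allows the convergence of the series.

Exercise 1.2 Recall that $e(x) := \exp(2\pi i x)$.

1) Prove that

\[ F(x, \theta) := \sum_{0 \le n \le x} e(n\theta) = \sum_{0 \le n \le \lfloor x \rfloor} e^{2\pi i n\theta} = \frac{e^{2\pi i(\lfloor x \rfloor + 1)\theta} - 1}{e^{2\pi i\theta} - 1} = e^{\pi i \lfloor x \rfloor \theta}\, \frac{\sin(\pi(\lfloor x \rfloor + 1)\theta)}{\sin(\pi\theta)} \]

for every $\theta \in \mathbb{R} \setminus \mathbb{Z}$.

2) Deduce that for every $\theta \in \mathbb{R} \setminus \mathbb{Z}$ there is a constant $c_\theta > 0$ such that $|F(x, \theta)| \le c_\theta$ independently of $x$.

3) Using Proposition 1.4 deduce that the series

\[ \sum_{n=1}^{\infty} \frac{e(n\theta)}{n} \]

converges for every $\theta \in \mathbb{R} \setminus \mathbb{Z}$.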
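The closed form in step 1 and the uniform bound in step 2 can be compared numerically with the direct sum. The following Python sketch is only a floating-point illustration, and the function names are assumptions introduced here; the bound used is $c_\theta = 1/|\sin(\pi\theta)|$, which follows from the closed form.

```python
import cmath
import math


def e(x: float) -> complex:
    """The function e(x) = exp(2 pi i x)."""
    return cmath.exp(2j * math.pi * x)


def F_direct(x: float, theta: float) -> complex:
    """Sum of e(n * theta) over the integers 0 <= n <= x (assumes x >= 0)."""
    return sum(e(n * theta) for n in range(0, math.floor(x) + 1))


def F_closed(x: float, theta: float) -> complex:
    """Closed form from step 1; theta must not be an integer."""
    m = math.floor(x)
    ratio = math.sin(math.pi * (m + 1) * theta) / math.sin(math.pi * theta)
    return cmath.exp(1j * math.pi * m * theta) * ratio


if __name__ == "__main__":
    theta = 0.3
    for x in (10.0, 100.5, 1000.0):
        bound = 1.0 / abs(math.sin(math.pi * theta))
        print(x, abs(F_direct(x, theta) - F_closed(x, theta)), abs(F_direct(x, theta)), bound)
```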

Exercise 1.3 Let $\{z_j\}_{j=1}^{N}$ be any finite set of distinct complex numbers, with $|z_j| = 1$ for every $j$. Let $\alpha_1, \dots, \alpha_N$ be complex numbers, not all equal to zero. Prove that

\[ s_k := \alpha_1 z_1^k + \cdots + \alpha_N z_N^k = \Omega(1) \quad \text{as } k \to \infty, \]

i.e., that the claim $\lim_{k \to \infty} s_k = 0$ is false.
Hint: by contradiction, suppose that the limit exists and is 0. Suppose $\alpha_1 \ne 0$. Deduce a contradiction by computing the limit of $\frac{1}{x}\sum_{k \le x} s_k z_1^{-k}$ in two different ways.

The following formula compares a sum with the corresponding integral.

Proposition 1.6 (Euler, Maclaurin) Let $c$ be an integer and $f\colon [c, +\infty) \to \mathbb{C}$ be a $C^1$ function. Then

\[ \sum_{k=c}^{N} f(k) = \int_c^{N} f(x)\,dx + \frac{1}{2}\bigl(f(N) + f(c)\bigr) + \int_c^{N} f'(x)\bigl(\{x\} - \tfrac{1}{2}\bigr)\,dx. \tag{1.5} \]

Proof. It is immediate to verify (integrating by parts) that

\[ \frac{1}{2}\bigl(g(0) + g(1)\bigr) = \int_0^1 g(x)\,dx + \int_0^1 g'(x)\bigl(x - \tfrac{1}{2}\bigr)\,dx \]

for every $g \in C^1([0, 1])$. Now write this formula for the functions $g_k(x) := f(x + k)$, where the integer $k \ge c$ is arbitrarily fixed; we get

\[ \frac{1}{2}\bigl(f(k) + f(k+1)\bigr) = \int_0^1 f(x + k)\,dx + \int_0^1 f'(x + k)\bigl(x - \tfrac{1}{2}\bigr)\,dx. \]

In the integrals we take the shift $x + k \to x$, so that

\[ \frac{1}{2}\bigl(f(k) + f(k+1)\bigr) = \int_k^{k+1} f(x)\,dx + \int_k^{k+1} f'(x)\bigl(x - k - \tfrac{1}{2}\bigr)\,dx. \]

For $x \in [k, k+1)$ we have $x - k = \{x\}$, hence the equality can also be written as

\[ \frac{1}{2}\bigl(f(k) + f(k+1)\bigr) = \int_k^{k+1} f(x)\,dx + \int_k^{k+1} f'(x)\bigl(\{x\} - \tfrac{1}{2}\bigr)\,dx. \]

Now we add the equalities for $k = c, \dots, N-1$, obtaining

\[ \sum_{k=c}^{N} f(k) - \frac{1}{2}\bigl(f(N) + f(c)\bigr) = \int_c^{N} f(x)\,dx + \int_c^{N} f'(x)\bigl(\{x\} - \tfrac{1}{2}\bigr)\,dx, \]

which is the claim. $\square$

Exercise 1.4 We have proved Proposition 1.6 directly, but it can also be deduced from the partial summation formula (1.4). Hint: set $f(x) = 1$ in that formula, so that $F(x) = \lfloor x \rfloor = x - \{x\}$, and integrate by parts the integral containing the factor $x$.

Example 1.1 Applying the formula to $f(x) = 1/x$ we have

\[ \sum_{k=1}^{N} \frac{1}{k} = \ln N + \gamma + O(1/N), \tag{1.6} \]

where $\gamma := 1 - \int_1^{\infty} \frac{\{x\}}{x^2}\,dx = 0.5772\ldots$ is the Euler–Mascheroni constant. In fact,

\[ \sum_{k=1}^{N} \frac{1}{k} = \int_1^{N} \frac{dx}{x} + \frac{1}{2}\Bigl(1 + \frac{1}{N}\Bigr) - \int_1^{N} \frac{\{x\} - \frac{1}{2}}{x^2}\,dx. \]

The integral $\int_1^{\infty} \frac{\{x\} - 1/2}{x^2}\,dx$ converges absolutely because $|\{x\} - \frac{1}{2}| \le \frac{1}{2}$, thus we can write the equality as

\[ \sum_{k=1}^{N} \frac{1}{k} = \ln N + \frac{1}{2}\Bigl(1 + \frac{1}{N}\Bigr) - \int_1^{\infty} \frac{\{x\} - \frac{1}{2}}{x^2}\,dx + \int_N^{\infty} \frac{\{x\} - \frac{1}{2}}{x^2}\,dx. \tag{1.7} \]

Now the claim follows by setting $\gamma := \frac{1}{2} - \int_1^{\infty} \frac{\{x\} - 1/2}{x^2}\,dx = 1 - \int_1^{\infty} \frac{\{x\}}{x^2}\,dx$ and noticing that

\[ \Bigl|\int_N^{\infty} \frac{\{x\} - \frac{1}{2}}{x^2}\,dx\Bigr| \le \frac{1}{2}\int_N^{\infty} \frac{dx}{x^2} = \frac{1}{2N}. \tag{1.8} \]

The constant $\gamma$ is probably the most famous and important constant in Mathematics after 0, 1, $\pi$, $e$ and $i$, and it is not yet well understood. For instance, the known algorithms for its computation are not very efficient (when compared to the analogous algorithms for other constants, $\pi$ for instance). It is conjectured that $\gamma$ is a transcendental number, but it is still unknown whether $\gamma \in \mathbb{Q}$. $\square$

Exercise 1.5 From (1.7) and (1.8), deduce that

\[ 0 \le \sum_{k=1}^{N} \frac{1}{k} - \ln N - \gamma \le \frac{1}{N} \quad \forall N \ge 1. \]

This formula can be used to compute the value of $\gamma$, but it is not very efficient (it needs $N$ terms to get $\gamma$ with an approximation $1/N$).

Exercise 1.6 Use (1.6) to prove that

\[ \sum_{\substack{k=1 \\ k \text{ even}}}^{N} \frac{1}{k} = \frac{1}{2}\ln(N/2) + \frac{\gamma}{2} + O(1/N), \qquad \sum_{\substack{k=1 \\ k \text{ odd}}}^{N} \frac{1}{k} = \frac{1}{2}\ln(2N) + \frac{\gamma}{2} + O(1/N). \]

Use this result to prove that

\[ -\sum_{k=1}^{N} \frac{(-1)^k}{k} = \ln 2 + O(1/N). \]

With the same tool, prove that for every $m, n \ge 1$ the alternating sum

\[ \underbrace{1 + \tfrac{1}{3} + \cdots + \tfrac{1}{2m-1}}_{m \text{ terms}} - \underbrace{\bigl(\tfrac{1}{2} + \tfrac{1}{4} + \cdots + \tfrac{1}{2n}\bigr)}_{n \text{ terms}} + \underbrace{\tfrac{1}{2m+1} + \tfrac{1}{2m+3} + \cdots + \tfrac{1}{4m-1}}_{m \text{ terms}} - \underbrace{\bigl(\tfrac{1}{2n+2} + \tfrac{1}{2n+4} + \cdots + \tfrac{1}{4n}\bigr)}_{n \text{ terms}} + \cdots = \tfrac{1}{2}\ln(4m/n). \]

This is a concrete example of the phenomenon proved by Riemann: a series of real numbers converges unconditionally (i.e. the convergence and the value of the series are independent of any reordering of its terms) if and only if it converges absolutely. In other words, if a series converges only conditionally, then there are reorderings of the same numbers producing new series which converge to different values.
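The dependence of the limit on the block sizes $m$ and $n$ is easy to observe numerically. The sketch below sums the rearranged series block by block in floating point; the function name and the number of blocks are arbitrary choices made for illustration.

```python
from math import log


def rearranged_partial_sum(m: int, n: int, blocks: int) -> float:
    """Sum `blocks` rounds of: m positive odd-denominator terms,
    then n negative even-denominator terms of the alternating harmonic series."""
    total = 0.0
    next_odd, next_even = 1, 2
    for _ in range(blocks):
        for _ in range(m):
            total += 1.0 / next_odd
            next_odd += 2
        for _ in range(n):
            total -= 1.0 / next_even
            next_even += 2
    return total


if __name__ == "__main__":
    for m, n in ((1, 1), (2, 1), (1, 2), (3, 5)):
        print(m, n, rearranged_partial_sum(m, n, 20_000), 0.5 * log(4 * m / n))
```

For $m = n = 1$ this is the usual ordering and the limit is $\ln 2$; other choices of $(m, n)$ give visibly different values.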

Exercise 1.7 Use Proposition 1.6 to prove that for every $\ell \in \mathbb{N}$,

\[ \sum_{k=1}^{N} (\ln k)^{\ell} = N(\ln N)^{\ell} + O\bigl(N(\ln N)^{\ell-1}\bigr). \]

With an inductive argument on $\ell$, and using also Proposition 1.4, prove the more precise equality

\[ \sum_{k=1}^{N} (\ln k)^{\ell} = N P_{\ell}(\ln N) + O\bigl((\ln N)^{\ell}\bigr), \]

where $P_{\ell}(x)$ is the polynomial $\sum_{n=0}^{\ell} (-1)^{\ell-n} \frac{\ell!}{n!} x^n$.

The formula (1.5) can be considerably extended. Let $\{B_n(x)\}_{n \ge 0}$ be the set of Bernoulli polynomials, i.e. the family of polynomials which are defined recursively as

\[ B_0(x) = 1, \qquad B_n'(x) = nB_{n-1}(x), \qquad \int_0^1 B_n(x)\,dx = 0, \qquad \forall n \ge 1. \]

For example:

\begin{align*}
B_0(x) &= 1, & B_4(x) &= x^4 - 2x^3 + x^2 - \tfrac{1}{30}, \\
B_1(x) &= x - \tfrac{1}{2}, & B_5(x) &= x^5 - \tfrac{5}{2}x^4 + \tfrac{5}{3}x^3 - \tfrac{1}{6}x, \\
B_2(x) &= x^2 - x + \tfrac{1}{6}, & B_6(x) &= x^6 - 3x^5 + \tfrac{5}{2}x^4 - \tfrac{1}{2}x^2 + \tfrac{1}{42}, \\
B_3(x) &= x^3 - \tfrac{3}{2}x^2 + \tfrac{1}{2}x, & B_7(x) &= x^7 - \tfrac{7}{2}x^6 + \tfrac{7}{2}x^5 - \tfrac{7}{6}x^3 + \tfrac{1}{6}x.
\end{align*}
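The recursive definition translates directly into a small exact computation: integrate $nB_{n-1}$ term by term and choose the constant term so that the integral over $[0, 1]$ vanishes. The Python sketch below does this with rational coefficients; the list representation (coefficient of $x^k$ at index $k$) and the function names are assumptions made for the example. It also evaluates sums of powers through the classical formula $S_n(N) = \frac{1}{n+1}(B_{n+1}(N+1) - B_{n+1}(0))$.

```python
from fractions import Fraction


def bernoulli_polynomials(max_n: int) -> list[list[Fraction]]:
    """Return [B_0, ..., B_max_n]; each polynomial is a list of coefficients,
    with the coefficient of x**k stored at index k."""
    polys = [[Fraction(1)]]
    for n in range(1, max_n + 1):
        prev = polys[-1]
        # antiderivative of n * B_{n-1}, with constant term still undetermined
        coeffs = [Fraction(0)] + [n * a / (k + 1) for k, a in enumerate(prev)]
        # choose the constant term so that the integral over [0, 1] is zero
        coeffs[0] = -sum(b / (k + 1) for k, b in enumerate(coeffs))
        polys.append(coeffs)
    return polys


def evaluate(coeffs: list[Fraction], x: int) -> Fraction:
    """Evaluate a polynomial at x with Horner's scheme."""
    result = Fraction(0)
    for a in reversed(coeffs):
        result = result * x + a
    return result


def power_sum(n: int, N: int, polys: list[list[Fraction]]) -> Fraction:
    """S_n(N) = 0**n + 1**n + ... + N**n via Bernoulli polynomials (n >= 1)."""
    B = polys[n + 1]
    return (evaluate(B, N + 1) - evaluate(B, 0)) / (n + 1)


if __name__ == "__main__":
    polys = bernoulli_polynomials(8)
    print(polys[4])                 # coefficients of B_4, lowest degree first
    print(power_sum(3, 10, polys))  # compare with sum(k**3 for k in range(11))
```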

Exercise 1.8 The following steps give a uniform bound for $B_n(x)$.

1) Deduce from $\int_0^1 B_n(x)\,dx = 0$ that $B_n$ has at least one root in $(0, 1)$ when $n \ge 1$.

2) Using Lagrange’s mean value theorem deduce that $\|B_n\|_{\infty} \le \|B_n'\|_{\infty}$ when $n \ge 1$, where the sup norms are for $x \in [0, 1]$.

3) Use the recursive definition of $B_n$ to deduce that $\|B_n\|_{\infty} \le n\|B_{n-1}\|_{\infty}$, and by induction conclude that $\|B_n\|_{\infty} \le n!$.

This is not the correct order of growth for $\|B_n\|_{\infty}$, since it is known that $\|B_n\|_{\infty} \asymp n!/(2\pi)^n$, but the simple argument captures the main feature of these polynomials, i.e. their over-exponential growth. I am indebted to Guglielmo Beretta for this argument (thanks!).

Exercise 1.9 The following steps prove some relations among Bernoulli polynomials.

1) Let $F(x, t) := \sum_{n=0}^{\infty} B_n(x)\frac{t^n}{n!}$. Prove that

\[ F(x, t) = \frac{t e^{xt}}{e^t - 1}. \]

Hint: use the recursive formula to deduce that $\partial_x F(x, t) = tF(x, t)$, so that $F(x, t) = a(t)e^{xt}$ for some function $a(t)$. Then, integrating term by term the definition of $F(x, t)$, prove that $a(t)\frac{e^t - 1}{t} = \int_0^1 F(x, t)\,dx = 1$.

2) From 1) deduce that $F(1 - x, t) = F(x, -t)$, i.e. that $B_n(1 - x) = (-1)^n B_n(x)$.

3) From 1) deduce that $F(0, t) + \frac{t}{2}$ is an even function of $t$, and that therefore $B_n(0) = 0$ when $n$ is odd and $> 2$.

4) Conclude that $B_n(1) = B_n(0)$ for every $n \ge 2$.

5) Prove that $B_n(x) = \sum_{k=0}^{n} \binom{n}{k} B_k(0)x^{n-k}$ for every $n$. Hint: use the equality $F(x, t) = e^{xt}F(0, t)$.

6) Specializing the previous formula to $x = 1$ and using 4), deduce a recursive formula for the sequence $B_n(0)$, $n \in \mathbb{N}$.

7) Prove that $|B_n(x)| \le n!\,e^{|x|}$ for every $x \in \mathbb{R}$ and every $n$. Hint: use the formula in Step 5 and the bound in Ex. 1.8.

8) Prove that $\sum_{k=0}^{n} \binom{n+1}{k} B_k(x) = (n + 1)x^n$ for every $n$. Hint: use the equality $(e^t - 1)F(x, t) = te^{xt}$.

9) Prove that $B_n(x + 1) - B_n(x) = nx^{n-1}$ for every $n$. Hint: use the equality $F(x + 1, t) - F(x, t) = te^{xt}$.

10) For every $N \ge 0$ and $n \in \mathbb{N}$, $n \ge 1$, let $S_n(N) := \sum_{k=0}^{N} k^n$. Prove that

\[ S_n(N) = \frac{1}{n+1}\bigl(B_{n+1}(N + 1) - B_{n+1}(0)\bigr). \]

This is Faulhaber’s formula, giving the value of the sum of the $n$th powers of the integers up to $N$ as a polynomial in $N$.

Note that $B_1(x) = x - 1/2$, so that $\{x\} - 1/2$ is $B_1(\{x\})$. Suppose that $f \in C^2$ and let $k$ be any integer, then

\[ \int_k^{k+1} f'(x)\bigl(\{x\} - \tfrac{1}{2}\bigr)\,dx = \lim_{\varepsilon \to 0^+} \int_{k+\varepsilon}^{k+1-\varepsilon} f'(x)B_1(\{x\})\,dx, \]

where we have introduced the parameter $\varepsilon$ because $\frac{1}{2}B_2(x)$ is a primitive for $B_1(x)$ for every $x$, but $\frac{1}{2}B_2(\{x\})$ is a primitive for $B_1(\{x\})$ only for $x \in \mathbb{R} \setminus \mathbb{Z}$. Now we can integrate by parts, getting

\begin{align*}
&= \lim_{\varepsilon \to 0^+} \Bigl[ f'(x)\frac{B_2(\{x\})}{2}\Big|_{k+\varepsilon}^{k+1-\varepsilon} - \int_{k+\varepsilon}^{k+1-\varepsilon} f''(x)\frac{B_2(\{x\})}{2}\,dx \Bigr] \\
&= \lim_{\varepsilon \to 0^+} f'(x)\frac{B_2(\{x\})}{2}\Big|_{k+\varepsilon}^{k+1-\varepsilon} - \int_k^{k+1} f''(x)\frac{B_2(\{x\})}{2}\,dx \\
&= \lim_{\varepsilon \to 0^+} \Bigl[ f'(k + 1 - \varepsilon)\frac{B_2(1 - \varepsilon)}{2} - f'(k + \varepsilon)\frac{B_2(\varepsilon)}{2} \Bigr] - \int_k^{k+1} f''(x)\frac{B_2(\{x\})}{2}\,dx \\
&= f'(k + 1)\frac{B_2(1)}{2} - f'(k)\frac{B_2(0)}{2} - \int_k^{k+1} f''(x)\frac{B_2(\{x\})}{2}\,dx \\
&= \frac{B_2(0)}{2}\bigl(f'(k + 1) - f'(k)\bigr) - \int_k^{k+1} f''(x)\frac{B_2(\{x\})}{2}\,dx,
\end{align*}

where the last step uses $B_2(1) = B_2(0)$. Summing for $k = c, \dots, N - 1$ we get

\[ \int_c^{N} f'(x)\bigl(\{x\} - \tfrac{1}{2}\bigr)\,dx = \frac{B_2(0)}{2}\bigl(f'(N) - f'(c)\bigr) - \int_c^{N} f''(x)\frac{B_2(\{x\})}{2}\,dx, \]

so that (1.5) becomes

\[ \sum_{k=c}^{N} f(k) = \int_c^{N} f(x)\,dx - B_1(0)\bigl(f(N) + f(c)\bigr) + \frac{B_2(0)}{2}\bigl(f'(N) - f'(c)\bigr) - \frac{1}{2}\int_c^{N} f''(x)B_2(\{x\})\,dx. \tag{1.9} \]

Now it is evident in which way the formula can be further iterated for sufficiently regular functions.

Remark 1.1 Why are the polynomials $B_n$ normalized by setting $\int_0^1 B_n(x)\,dx = 0$? Because this condition says that the integral mean value of $B_n$ is zero, and this is convenient for the estimation of the remainder term $\int_0^{N} f''(x)B_2(\{x\})\,dx$ (or even the more general $\int_0^{N} f^{(n)}(x)B_n(\{x\})\,dx$). $\square$

Example 1.2 Applying (1.9) to $f(x) = \ln x$ we have

\[ \ln(N!) = \sum_{k=1}^{N} \ln k = N\ln N - N + \frac{1}{2}\ln N + c + O(1/N), \tag{1.10} \]

where $c$ is a constant that the method does not allow us to determine, but which can be found with a different argument (for instance as a consequence of Wallis’ formula): $c = \frac{1}{2}\ln(2\pi)$. The resulting formula for $\ln(N!)$ is due to Stirling. $\square$

Exercise 1.10 Probably you are curious about the Wallis formula we mentioned before, i.e. about a possible way to identify the constant $c$ in (1.10). Here is a sketch of the proof.

1) Let $I_n := \int_0^{\pi} (\sin x)^n\,dx$. Prove that $I_0 = \pi$, $I_1 = 2$ and that $I_n = \frac{n-1}{n}I_{n-2}$ for every $n \ge 2$.

2) Recall that the double factorial $!!$ is defined as $0!! = 1!! = 1$ and $n!! := n(n - 2)(n - 4)\cdots$ for $n \ge 2$, where the product decreases down to 1 or 2, according to the parity of $n$. Prove that

\[ (2k)!! = 2^k k!, \qquad (2k + 1)!! = \frac{(2k + 1)!}{(2k)!!} = \frac{(2k + 1)!}{2^k k!}. \]

3) Prove that $I_{2k+1} = \int_0^{\pi} (\sin x)^{2k+1}\,dx \le \int_0^{\pi} (\sin x)^{2k}\,dx = I_{2k}$ […]

4) […]

\[ \frac{I_{2k}}{I_{2k+1}} = \frac{(2k - 1)!!\,(2k + 1)!!}{(2k)!!\,(2k)!!}\,\frac{I_0}{I_1} = \frac{(2k - 1)!\,(2k + 1)!\,k}{2^{4k}\,k!^4}\,\pi. \]

5) The result in Example 1.2 can be written as $N! = (N/e)^N e^c \sqrt{N}\, e^{O(1/N)}$, so that $N! \sim (N/e)^N e^c \sqrt{N}$. Inserting this asymptotic into 4), after some simplifications prove that the right-hand side tends to $2\pi e^{-2c}$. This constant must be 1, by 3). This proves that $c = \frac{\ln(2\pi)}{2}$, i.e. that

\[ N! \sim (N/e)^N \sqrt{2\pi N}. \]

There are several alternative proofs. Some of them are collected in the first chapter of [EL].

Exercise 1.11 Use (1.9) and the fact that $|B_2(x)| \le \frac{1}{6}$ for $x \in [0, 1]$ to prove that

\[ -\frac{1}{6N^2} \le \sum_{k=1}^{N} \frac{1}{k} - \ln N - \frac{1}{2N} - \gamma \le 0 \quad \forall N \ge 1. \]

This formula can be used to compute the value of $\gamma$ and is more efficient than the one in Ex. 1.5 (it needs $\sqrt{N}$ terms to get $\gamma$ with an approximation $1/N$). This formula for $\gamma$ can be further improved using the Euler-Maclaurin formula at higher levels (involving Bernoulli polynomials of higher order), and was used by Euler exactly for this purpose.
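The difference in efficiency between the two approximations of $\gamma$ (the plain one from Ex. 1.5 and the corrected one from Ex. 1.11) is easy to observe. The following Python sketch compares both with a reference value of $\gamma$; the reference digits are the commonly tabulated ones and are not taken from these notes, and the function names are illustrative.

```python
from math import log

# Reference value of the Euler-Mascheroni constant (standard tabulated digits;
# an external assumption, the notes only state gamma = 0.5772...).
GAMMA_REFERENCE = 0.5772156649015329


def harmonic(N: int) -> float:
    """H_N = 1 + 1/2 + ... + 1/N."""
    return sum(1.0 / k for k in range(1, N + 1))


def gamma_simple(N: int) -> float:
    """Approximation from Ex. 1.5: H_N - ln N (error between 0 and 1/N)."""
    return harmonic(N) - log(N)


def gamma_corrected(N: int) -> float:
    """Approximation from Ex. 1.11: H_N - ln N - 1/(2N)
    (error between -1/(6 N^2) and 0)."""
    return harmonic(N) - log(N) - 1.0 / (2 * N)


if __name__ == "__main__":
    for N in (10, 100, 1000):
        print(N,
              gamma_simple(N) - GAMMA_REFERENCE,
              gamma_corrected(N) - GAMMA_REFERENCE)
```

The first error column decays like $1/N$ and the second like $1/N^2$, in agreement with the two exercises; for very large $N$ floating-point rounding in the harmonic sum becomes the limiting factor.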

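Exercise 1.12 Let $\alpha \in (0, 1)$. Prove that there exists a constant $c_{\alpha}$ such that

\[ \sum_{n \le N} \frac{1}{n^{\alpha}} = \frac{N^{1-\alpha}}{1 - \alpha} + c_{\alpha} + \frac{\xi(\alpha, N)}{N^{\alpha}} \quad \forall N \ge 1, \]

with $\xi(\alpha, N) \in [0, 1]$ for every $N$.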

1.3. The ring of arithmetical functions

In Analytic Number Theory it is customary to call arithmetical function every function $f\colon \mathbb{N} \setminus \{0\} \to \mathbb{C}$; the notion of arithmetical function therefore overlaps with the one of sequence. The set of arithmetical functions has a natural structure of commutative and associative ring with unit, with respect to the pointwise sum

\[ (f + g)(n) := f(n) + g(n) \quad \forall n \in \mathbb{N} \setminus \{0\}, \]

and the Dirichlet product

\[ (f * g)(n) := \sum_{d \mid n} f(d)g(n/d) = \sum_{\substack{d, d' \\ dd' = n}} f(d)g(d'). \]

The second representation shows immediately the equality $f * g = g * f$. The unit with respect to this product is the function $\delta$ which is defined as: $\delta(1) = 1$, $\delta(n) = 0$ for every $n > 1$.

Exercise 1.13 Prove that $f$ is invertible in the ring of the arithmetical functions if and only if $f(1) \ne 0$.

A function $f$ is called:
-) multiplicative, when $f(mn) = f(m)f(n)$ for every pair of coprime integers $m$, $n$;
-) completely multiplicative, when $f(mn) = f(m)f(n)$ for every pair of integers $m$, $n$;
-) additive, when $f(mn) = f(m) + f(n)$ for every pair of coprime integers $m$, $n$;
-) completely additive, when $f(mn) = f(m) + f(n)$ for every pair of integers $m$, $n$.

We will see several examples of arithmetical functions having (one of) these properties. The most interesting is certainly multiplicativity, as a consequence of the following fact.

Proposition 1.7 Let $f$ and $g$ be multiplicative, then $f * g$ is multiplicative, too.

Proof. Let $m$ and $n$ be coprime. Then every divisor of $mn$ can be factorized in a unique way as a product of two integers $d$ and $d'$, with $d$ dividing $m$ and $d'$ dividing $n$. Vice versa, every pair $d$, $d'$ of divisors of $m$ and $n$ respectively produces a divisor $dd'$ of $mn$. As a consequence

\begin{align*}
(f * g)(mn) &= \sum_{\substack{d \mid m \\ d' \mid n}} f(dd')\,g\Bigl(\frac{m}{d}\frac{n}{d'}\Bigr) = \sum_{d \mid m}\sum_{d' \mid n} f(dd')\,g\Bigl(\frac{m}{d}\frac{n}{d'}\Bigr) \\
&= \sum_{d \mid m}\sum_{d' \mid n} f(d)f(d')\,g\Bigl(\frac{m}{d}\Bigr)g\Bigl(\frac{n}{d'}\Bigr) = \Bigl(\sum_{d \mid m} f(d)\,g\Bigl(\frac{m}{d}\Bigr)\Bigr)\Bigl(\sum_{d' \mid n} f(d')\,g\Bigl(\frac{n}{d'}\Bigr)\Bigr) \\
&= (f * g)(m)\,(f * g)(n),
\end{align*}

where for the intermediate equality we have used the fact that $(m, n) = 1$, $d \mid m$, $d' \mid n$ imply $(d, d') = 1 = (m/d, n/d')$, and the multiplicativity of $f$ and $g$. $\square$

Exercise 1.14 Prove that if $f$ is invertible and multiplicative, then also $f^{-1}$ is multiplicative; this proves that the multiplicative and invertible arithmetical functions form an abelian group.

Here we recall some of the most useful arithmetical functions.

$\delta_{\mathcal{P}}$, the characteristic function of the primes, defined as: $\delta_{\mathcal{P}}(n) = 1$ when $n \in \mathcal{P}$, 0 otherwise;

$\delta$, the delta function, defined as: $\delta(1) = 1$, $\delta(n) = 0$ for all $n > 1$;

$\mathbb{1}$, the unit function, defined as: $\mathbb{1}(n) = 1$ for all $n \ge 1$;

$I$, the identity function, defined as: $I(n) := n$ for all $n$;

$\omega$, the omega function, defined as: $\omega(n) := \#\{p : p \mid n\}$ (i.e., the number of distinct primes dividing $n$);

$\mu$, the Möbius function, defined as: $\mu(n) := 0$ if $n$ is not squarefree, $\mu(n) := (-1)^{\omega(n)}$ if $n$ is squarefree;

$\Lambda$, the von Mangoldt function, defined as: $\Lambda(n) := 0$ if $n$ is not a prime power, $\Lambda(n) := \ln p$ if $n = p^k$ for some prime $p$ and some power $k > 0$;

$\sigma_{\tau}$, the $\tau$-divisor function, defined as: $\sigma_{\tau}(n) := \sum_{d \mid n} d^{\tau}$ for every $n$, when $\tau$ is a fixed parameter in $\mathbb{C}$;

$d$, the divisor function, defined as: $d(n) := \sigma_0(n) = \sum_{d \mid n} 1$ for every $n$;

$\sigma$, the sum of divisors function, defined as: $\sigma(n) := \sigma_1(n) = \sum_{d \mid n} d$ for every $n$;

$\varphi$, Euler’s totient function, defined as: $\varphi(n) :=$ the cardinality of the set of integers in $1, \dots, n$ which are coprime to $n$; in other words, $\varphi(n) = \#(\mathbb{Z}/n\mathbb{Z})^*$.

The following facts can be verified directly:

1) $\omega$ is additive;

2) $\mu$, $\mathbb{1}$, $\sigma_{\tau}$, $d$, $\sigma$ are multiplicative;

3) $\varphi$ is multiplicative, for instance as a consequence of the Chinese remainder theorem, and $\varphi(n) = n\prod_{p \mid n}(1 - 1/p)$;

4) $d = \mathbb{1} * \mathbb{1}$, $\sigma = \mathbb{1} * I$, $\sigma_{\tau} = \mathbb{1} * I^{\tau}$.

The following equality is less trivial and is fundamental for our purposes. It is called the second form of the Möbius identity (the first one being a slightly different identity):

\[ \mathbb{1} * \mu = \delta, \tag{1.11} \]

i.e., $\sum_{d \mid n} \mu(d) = 0$ for every $n > 1$. This equality shows that $\mu$ is the inverse of $\mathbb{1}$ in the ring of the arithmetical functions.

Proof. Both $\delta$ and $\mathbb{1} * \mu$ are multiplicative, therefore it is sufficient to prove their equality for prime powers. Let $k > 0$ and let $p$ be any prime number, then

\[ (\mathbb{1} * \mu)(p^k) = \sum_{d \mid p^k} \mu(d) = \mu(1) + \mu(p) = 0 = \delta(p^k), \]

which proves the claim. $\square$

The associativity of the ring implies that

\[ F = \mathbb{1} * f \iff f = \mu * F, \tag{1.12} \]

i.e., that

\[ F(n) = \sum_{d \mid n} f(d) \iff f(n) = \sum_{d \mid n} F(d)\mu(n/d). \tag{1.13} \]
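The Dirichlet product and the Möbius inversion (1.12)–(1.13) are easy to experiment with on a computer. The Python sketch below uses a naive divisor enumeration, adequate only for small $n$; all the function names are illustrative assumptions, and the arbitrary function `f` in the last check is chosen just for testing.

```python
from math import gcd
from typing import Callable

Arithmetic = Callable[[int], int]


def divisors(n: int) -> list[int]:
    """All positive divisors of n (naive trial division)."""
    return [d for d in range(1, n + 1) if n % d == 0]


def dirichlet(f: Arithmetic, g: Arithmetic) -> Arithmetic:
    """Dirichlet product: (f * g)(n) = sum over d | n of f(d) g(n / d)."""
    return lambda n: sum(f(d) * g(n // d) for d in divisors(n))


def delta(n: int) -> int:
    return 1 if n == 1 else 0


def one(n: int) -> int:
    return 1


def identity(n: int) -> int:
    return n


def mobius(n: int) -> int:
    """Moebius function computed by trial division."""
    result = 1
    p = 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:      # p**2 divides the original n
                return 0
            result = -result
        p += 1
    if n > 1:                   # one remaining prime factor
        result = -result
    return result


def phi(n: int) -> int:
    """Euler's totient, straight from the definition."""
    return sum(1 for k in range(1, n + 1) if gcd(k, n) == 1)


if __name__ == "__main__":
    R = range(1, 101)
    print(all(dirichlet(one, mobius)(n) == delta(n) for n in R))   # (1.11)
    print(all(dirichlet(identity, mobius)(n) == phi(n) for n in R))
    f = lambda n: n * n + 1                                        # arbitrary f
    F = dirichlet(one, f)
    print(all(dirichlet(mobius, F)(n) == f(n) for n in R))         # (1.13)
```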

Exercise 1.15 (Erdős) Let $f\colon \mathbb{N} \to (0, +\infty)$ be a multiplicative and monotone function. Then there exists $\alpha \in \mathbb{R}$ such that $f(n) = n^{\alpha}$ for every $n$.¹

¹ The original proof was very difficult but now it has been considerably simplified, for instance see E. Howe: A new proof of Erdős’ theorem on monotone multiplicative functions, Amer. Math. Monthly 93(8), 593–595, 1986. See also [T], Ch. 1.2 Ex. 10.

Exercise 1.16 Prove that $I = \mathbb{1} * \varphi$ and that $\varphi = I * \mu$, i.e. that $n = \sum_{d \mid n} \varphi(d)$ and $\varphi(n) = \sum_{d \mid n} d\,\mu(n/d)$.

Exercise 1.17 Prove that $\ln = \mathbb{1} * \Lambda$ and that $\Lambda = \mu * \ln$, i.e. that $\ln n = \sum_{d \mid n} \Lambda(d)$ and $\Lambda(n) = \sum_{d \mid n} \mu(d)\ln(n/d)$. Deduce that $\Lambda(n) = -\sum_{d \mid n} \mu(d)\ln d$.

Exercise 1.18 Let $h$ be a completely additive map. Let $D_h$ be defined on the ring of arithmetical functions by setting $D_h f\colon n \mapsto (D_h f)(n) := f(n)h(n)$ (i.e., the pointwise multiplication by the values of $h$). Prove that $D_h$ is a derivation, i.e. that $D_h(f * g) = (D_h f) * g + f * D_h g$. The ring of arithmetical functions also supports other derivations which are not of this kind.²

Exercise 1.19 The ring of arithmetical functions is a Unique Factorization Domain, i.e. every arithmetical function can be written in a unique way (up to reordering) as a product of irreducible arithmetical functions.³ This is not known for the ring of somewhere converging Dirichlet series, and this is a pity because, if proved, such a unique factorization property would have many important consequences for number theory.

1.4. Dirichlet series: as formal series

To every arithmetical function $f$ we associate a formal series $F$ and vice versa, in the following way:

\[ f\colon \mathbb{N} \setminus \{0\} \to \mathbb{C} \iff F(s) := \sum_{n=1}^{\infty} \frac{f(n)}{n^s}. \]

Note that we are not assuming any hypothesis about the convergence of the series defining $F(s)$, so that we consider it (for the moment) only as a formal series. Any series of the form $F$ is called a Dirichlet series. For instance

\begin{align*}
\mathbb{1} &\iff \sum_{n=1}^{\infty} \frac{1}{n^s} =: \zeta(s) \quad \text{Riemann’s zeta function}, \\
\delta &\iff \sum_{n=1}^{\infty} \frac{\delta(n)}{n^s} = 1, \\
I &\iff \sum_{n=1}^{\infty} \frac{n}{n^s} = \zeta(s - 1), \\
I^{\tau} &\iff \sum_{n=1}^{\infty} \frac{n^{\tau}}{n^s} = \zeta(s - \tau).
\end{align*}

² See H. N. Shapiro: On the convolution ring of arithmetic functions, Comm. Pure Appl. Math. 25, 287–336, 1972.
³ See E. D. Cashwell, C. J. Everett: The ring of number-theoretic functions, Pacific J. Math. 9, 975–985, 1959; and Formal power series, Pacific J. Math. 13, 45–64, 1963.

As usual when dealing with formal series, we consider two Dirichlet series $F$ and $G$ as equal if and only if they have the same coefficients. The set of formal Dirichlet series is also a ring, with respect to the pointwise sum

\[ F(s) = \sum_{n=1}^{\infty} \frac{f(n)}{n^s}, \quad G(s) = \sum_{n=1}^{\infty} \frac{g(n)}{n^s} \implies (F + G)(s) = \sum_{n=1}^{\infty} \frac{f(n) + g(n)}{n^s}, \]

and product

\[ (FG)(s) := \Bigl(\sum_{m=1}^{\infty} \frac{f(m)}{m^s}\Bigr)\Bigl(\sum_{n=1}^{\infty} \frac{g(n)}{n^s}\Bigr) = \sum_{m=1}^{\infty}\sum_{n=1}^{\infty} \frac{f(m)g(n)}{(mn)^s} = \sum_{n=1}^{\infty} \frac{(f * g)(n)}{n^s}. \]

The last formula shows that the ring of arithmetical functions (with the Dirichlet product $*$) and the ring of formal Dirichlet series (with pointwise sum and product) are isomorphic, so that we can prove identities about arithmetical functions simply by multiplying the corresponding Dirichlet series (and vice versa, of course). For instance, we have

\begin{align*}
d = \mathbb{1} * \mathbb{1} &\implies \sum_{n=1}^{\infty} \frac{d(n)}{n^s} = \zeta^2(s), \\
\sigma = \mathbb{1} * I &\implies \sum_{n=1}^{\infty} \frac{\sigma(n)}{n^s} = \zeta(s)\zeta(s - 1), \\
\sigma_{\tau} = \mathbb{1} * I^{\tau} &\implies \sum_{n=1}^{\infty} \frac{\sigma_{\tau}(n)}{n^s} = \zeta(s)\zeta(s - \tau), \\
\mu = \mathbb{1}^{-1} &\implies \sum_{n=1}^{\infty} \frac{\mu(n)}{n^s} = \zeta^{-1}(s).
\end{align*}

For multiplicative arithmetical functions $f$ an alternative representation is possible (for the moment only as a formal identity, without any notion of convergence). In fact, from the unique factorization of every integer as a product of prime powers we have

\[ \prod_{p} \Bigl(1 + \frac{f(p)}{p^s} + \frac{f(p^2)}{p^{2s}} + \frac{f(p^3)}{p^{3s}} + \cdots\Bigr) = \sum_{n=1}^{\infty} \frac{f(p_1^{\nu_1}) \cdot f(p_2^{\nu_2}) \cdots f(p_k^{\nu_k})}{n^s} \]

where $n = p_1^{\nu_1} \cdot p_2^{\nu_2} \cdots p_k^{\nu_k}$ is the factorization of $n$ in prime powers (for $n = 1$ the product is empty and its value is taken equal to 1, by definition). When $f$ is multiplicative we have $f(p_1^{\nu_1}) \cdot f(p_2^{\nu_2}) \cdots f(p_k^{\nu_k}) = f(n)$, so that we get the final equality

\[ \prod_{p} \Bigl(1 + \frac{f(p)}{p^s} + \frac{f(p^2)}{p^{2s}} + \frac{f(p^3)}{p^{3s}} + \cdots\Bigr) = \sum_{n=1}^{\infty} \frac{f(n)}{n^s}, \tag{1.14} \]

which is called the representation as Euler product. When $f$ is completely multiplicative the identity can be further elaborated, because in this case

\[ 1 + \frac{f(p)}{p^s} + \frac{f(p^2)}{p^{2s}} + \frac{f(p^3)}{p^{3s}} + \cdots = 1 + \frac{f(p)}{p^s} + \frac{f(p)^2}{p^{2s}} + \frac{f(p)^3}{p^{3s}} + \cdots = \Bigl(1 - \frac{f(p)}{p^s}\Bigr)^{-1}, \]

again as a formal identity between power series in the variable $f(p)/p^s$. In this case the Euler product becomes

\[ \prod_{p} \Bigl(1 - \frac{f(p)}{p^s}\Bigr)^{-1} = \sum_{n=1}^{\infty} \frac{f(n)}{n^s}. \tag{1.15} \]

The multiplicativity of the functions $\mathbb{1}$ and $\mu$ gives the representations

\begin{align*}
\prod_{p} \Bigl(1 - \frac{1}{p^s}\Bigr)^{-1} &= \sum_{n=1}^{\infty} \frac{1}{n^s} = \zeta(s), \\
\prod_{p} \Bigl(1 - \frac{1}{p^s}\Bigr) &= \sum_{n=1}^{\infty} \frac{\mu(n)}{n^s} = \zeta^{-1}(s),
\end{align*}

so that now the equality (1.11), saying that $\mathbb{1} * \mu = \delta$, simply becomes

\[ \sum_{n=1}^{\infty} \frac{1}{n^s} \sum_{n=1}^{\infty} \frac{\mu(n)}{n^s} = \zeta(s)\zeta^{-1}(s) = 1, \]

and Equalities (1.12) and (1.13) can be formulated by saying that

\[ \zeta(s)F(s) = G(s) \iff F(s) = \zeta^{-1}(s)G(s). \]

Exercise 1.20 Using the multiplicativity and the representation as Euler product it is now easy to prove that

\[ \sum_{n=1}^{\infty} \frac{\varphi(n)}{n^s} = \frac{\zeta(s - 1)}{\zeta(s)}. \]

Exercise 1.21 Prove that

\[ \sum_{n \text{ squarefree}} \frac{1}{n^s} = \prod_{p} \Bigl(1 + \frac{1}{p^s}\Bigr); \]

deduce that

\[ \sum_{n \text{ squarefree}} \frac{1}{n^s} = \frac{\zeta(s)}{\zeta(2s)}. \]

Exercise 1.22 This is a generalization of the previous exercise. Let $r$ be a fixed positive integer. An integer $n$ is called $r$-power free when $p \mid n$ implies that $p^r \nmid n$ (i.e., when $n$ is not divisible by any $r$-power which is not equal to 1). Prove that

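$$\sum_{n\ r\text{-power free}} \frac{1}{n^s} = \frac{\zeta(s)}{\zeta(rs)}.$$

Exercise. 1.23 Again, using the multiplicativity and the representation as Euler product prove that for every pair of arbitrarily fixed $\tau, \nu \in \mathbb{C}$, one has

$$\sum_{n=1}^{\infty} \frac{\sigma_\tau(n)\sigma_\nu(n)}{n^s} = \frac{\zeta(s)\zeta(s-\tau)\zeta(s-\nu)\zeta(s-\tau-\nu)}{\zeta(2s-\tau-\nu)}.$$

This identity is due to Ramanujan. As special cases we have

$$\sum_{n=1}^{\infty} \frac{|\sigma_\tau(n)|^2}{n^s} = \frac{\zeta(s)\zeta(s-\tau)\zeta(s-\bar\tau)\zeta(s-\tau-\bar\tau)}{\zeta(2s-\tau-\bar\tau)} \qquad \forall\, \tau \in \mathbb{C},$$
$$\sum_{n=1}^{\infty} \frac{|\sigma_{i\tau}(n)|^2}{n^s} = \frac{\zeta^2(s)\zeta(s-i\tau)\zeta(s+i\tau)}{\zeta(2s)} \qquad \forall\, \tau \in \mathbb{R},$$
$$\sum_{n=1}^{\infty} \frac{d(n)^2}{n^s} = \frac{\zeta^4(s)}{\zeta(2s)}.$$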
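The correspondence between Dirichlet products and products of formal Dirichlet series can be checked on the first few coefficients. The following is only a minimal sketch: the helper names (`dirichlet_convolution`, `one`, `identity`, `mobius`) are illustrative choices and do not come from these notes.

```python
def dirichlet_convolution(f, g, limit):
    """Return the list h with h[n] = (f*g)(n) = sum_{d | n} f(d) g(n/d)
    for 1 <= n <= limit (index 0 is unused)."""
    h = [0] * (limit + 1)
    for d in range(1, limit + 1):
        fd = f(d)
        for m in range(1, limit // d + 1):
            h[d * m] += fd * g(m)
    return h


def one(n):
    """The constant function 1."""
    return 1


def identity(n):
    """The function I(n) = n."""
    return n


def mobius(n):
    """Moebius function by trial division (adequate for small n)."""
    result = 1
    p = 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0  # a square divides the original n
            result = -result
        p += 1
    if n > 1:
        result = -result  # one remaining prime factor
    return result


LIMIT = 30
d = dirichlet_convolution(one, one, LIMIT)           # d = 1 * 1
sigma = dirichlet_convolution(one, identity, LIMIT)  # sigma = 1 * I
delta = dirichlet_convolution(one, mobius, LIMIT)    # 1 * mu = delta
print(d[12], sigma[12], delta[1:6])
```

The three lists contain the coefficients of $\zeta^2(s)$, of $\zeta(s)\zeta(s-1)$ and of $\zeta(s)\zeta^{-1}(s) = 1$ respectively, up to the chosen limit.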

1.5. Dirichlet series: as complex functions

We have seen how Dirichlet series are already useful when considered as formal series. Nevertheless, their full strength appears when they are considered as complex functions, but for this we have to discuss their convergence as series. There is a very good introduction to this topic in [Titch], Ch. IX, and Hardy and Riesz [HR] have dedicated an entire book to this subject.

Theorem 1.1 Suppose that the Dirichlet series $F(s) := \sum_{n=1}^{\infty} f(n)/n^s$ converges for $s = s_0 \in \mathbb{C}$. Then it converges for every $s$ with $\mathrm{Re}(s) > \mathrm{Re}(s_0)$ and the convergence is uniform in the sector $S_\ell := \{s\colon |\mathrm{Im}(s - s_0)| \le \ell\, \mathrm{Re}(s - s_0)\}$, for every $\ell > 0$.

[Figure 1.1. A sector with vertex $s_0$ in the complex plane; axes $\mathrm{Re}(s)$ and $\mathrm{Im}(s)$, with the points $s_0$ and $s$ marked.]

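Proof. Without loss of generality we can assume that $F(s_0) = 0$, because we can satisfy this condition simply by changing the value of $f(1)$ into $f(1) - F(s_0)$. Let $S(x) := \sum_{n \le x} f(n)/n^{s_0}$, with $S(x) = 0$ when $x < 1$. By Proposition 1.4 we get that for every $M < N \in \mathbb{N}$

$$\sum_{n=M+1}^{N} \frac{f(n)}{n^s} = \sum_{n=M+1}^{N} \frac{f(n)}{n^{s_0}} \cdot \frac{1}{n^{s-s_0}} = \frac{S(N)}{N^{s-s_0}} - \frac{S(M)}{M^{s-s_0}} + (s - s_0)\int_M^N S(x)\, x^{s_0-s-1}\, dx.$$

Now, fix $\varepsilon > 0$ and take $M$ large enough to have $|S(m)| < \varepsilon$ for every $m \ge M$: such an $M$ exists because $F(s_0) = 0$, by hypothesis. Moreover, we take $s$ with $\mathrm{Re}(s) > \mathrm{Re}(s_0)$, so that we deduce that

$$\Big|\sum_{n=M+1}^{N} \frac{f(n)}{n^s}\Big| \le \frac{|S(N)|}{N^{\mathrm{Re}(s-s_0)}} + \frac{|S(M)|}{M^{\mathrm{Re}(s-s_0)}} + |s - s_0| \int_M^N |S(x)|\, x^{\mathrm{Re}(s_0-s)-1}\, dx$$
$$\le 2\varepsilon + \varepsilon |s - s_0| \int_M^N x^{\mathrm{Re}(s_0-s)-1}\, dx \le 2\varepsilon + \varepsilon |s - s_0| \int_1^{+\infty} x^{\mathrm{Re}(s_0-s)-1}\, dx$$
$$= \varepsilon\Big(2 + \frac{|s - s_0|}{\mathrm{Re}(s - s_0)}\Big) \le \varepsilon\Big(3 + \frac{|\mathrm{Im}(s - s_0)|}{\mathrm{Re}(s - s_0)}\Big).$$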

When $s \in S_\ell$ the last inequality becomes $\le (3 + \ell)\varepsilon$, so that the convergence (uniform in $S_\ell$) follows by the Cauchy test. □

In analogy with the power series, the previous result motivates the introduction of the notion of abscissa of convergence, which is

$$\sigma_c := \inf\{\mathrm{Re}(s_0)\colon F(s) \text{ converges at } s = s_0\},$$

with $\sigma_c = +\infty$ when the series does not converge at any point, and $\sigma_c = -\infty$ when the series converges everywhere. In fact, from the previous theorem we deduce immediately the following fact.

Corollary 1.1 Suppose that the Dirichlet series $F(s) := \sum_{n=1}^{\infty} f(n)/n^s$ converges somewhere (hence $\sigma_c \in [-\infty, +\infty)$), then the series converges in the half-plane $\mathrm{Re}(s) > \sigma_c$ and the convergence is uniform in every compact subset.

Each function $1/n^s = \exp(-s \ln n)$ is holomorphic in $\mathbb{C}$, so that from Morera's theorem we deduce immediately the following regularity result.

Corollary 1.2 Suppose that the Dirichlet series $F(s) := \sum_{n=1}^{\infty} f(n)/n^s$ converges somewhere (hence $\sigma_c \in [-\infty, +\infty)$), then $F(s)$ is holomorphic in the half-plane $H := \{s \in \mathbb{C}\colon \mathrm{Re}(s) > \sigma_c\}$ and its derivative can be computed termwise, so that

$$F'(s) = -\sum_{n=1}^{\infty} \frac{f(n) \ln n}{n^s} \qquad \forall s\colon \mathrm{Re}(s) > \sigma_c.$$

Proof. $F(s)$ is continuous in $\mathrm{Re}(s) > \sigma_c$, because the convergence is uniform in every compact set (so we have uniform convergence in a suitable open neighborhood of every fixed point) and the summands are evidently continuous. Let $\Gamma$ be any closed (simple and regular) curve contained in $H$. Then

$$\int_\Gamma F(s)\, ds = \int_\Gamma \sum_{n=1}^{\infty} \frac{f(n)}{n^s}\, ds = \sum_{n=1}^{\infty} \int_\Gamma \frac{f(n)}{n^s}\, ds$$

where the inversion of the sum and the integral is allowed by the uniform convergence (the curve $\Gamma$ is evidently a compact set). Each integral $\int_\Gamma \frac{f(n)}{n^s}\, ds$ is null, because the map $n^{-s}$ is holomorphic in $\mathbb{C}$, hence

$$\int_\Gamma F(s)\, ds = 0.$$

Therefore $F(s)$ is a complex map which is continuous and whose integral over every closed curve is zero: Morera's theorem allows us to conclude that $F(s)$ is holomorphic in $\mathrm{Re}(s) > \sigma_c$.

The proof of the formula for $F'(s)$ runs as follows. Let $s_0 \in H$ be fixed, and let $\Gamma$ be a circle centered at $s_0$, sufficiently small to be contained in $H$ and positively oriented. Then the Cauchy formula for the derivative of a holomorphic function says that

$$F'(s_0) = \frac{1}{2\pi i} \int_\Gamma \frac{F(z)}{(z - s_0)^2}\, dz = \frac{1}{2\pi i} \int_\Gamma \sum_{n=1}^{\infty} \frac{f(n) n^{-z}}{(z - s_0)^2}\, dz.$$

The uniform convergence on $\Gamma$ allows us to exchange the integral and the series, so that

$$F'(s_0) = \sum_{n=1}^{\infty} f(n)\, \frac{1}{2\pi i} \int_\Gamma \frac{n^{-z}}{(z - s_0)^2}\, dz.$$

The inner integral is the derivative of the map $s \mapsto n^{-s}$ at $s_0$ (because this map is holomorphic), thus its value is $-n^{-s_0} \ln n$, so that

$$F'(s_0) = \sum_{n=1}^{\infty} \frac{-f(n) \ln n}{n^{s_0}},$$

which is the claim. □

Exercise. 1.24 Note that Corollary 1.2 shows that when $F(s)$ is a Dirichlet series, then $F'(s)$ is a Dirichlet series too, and that if we denote by $\sigma_c$ and $\sigma'_c$ the abscissa of convergence for $F(s)$ and $F'(s)$ respectively, then $\sigma'_c \le \sigma_c$. Prove that actually $\sigma'_c = \sigma_c$.
Hint: $F(s)$ is a primitive of $F'(s)$, and the integration along every compact path can be done termwise, by the uniform convergence of $F'(s)$ in $\mathrm{Re}(s) > \sigma'_c$.

The notion of abscissa of convergence is similar to the one of radius of convergence for power series; nevertheless there is an important difference: a Dirichlet series can have a finite abscissa $\sigma_c$ without having any singularity along the vertical line $\mathrm{Re}(s) = \sigma_c$. For power series, there is always a singularity on the critical circle. This different behavior is due essentially to the lack of compactness (the vertical line is not compact, while the critical circle is evidently a compact set). An example of this phenomenon is the series $\sum_{n=1}^{\infty} (-1)^n/n^s$ for which $\sigma_c = 0$ (prove it, for instance by using Proposition 1.3 with $f(n) = (-1)^n$ and $g(n) = n^{-s}$ to prove that $\lim_{N \to \infty} \sum_{n=1}^{N} (-1)^n/n^s$ exists and is finite whenever $\mathrm{Re}(s) > 0$) and that admits an analytic continuation to $\mathbb{C}$ as holomorphic function (see Ex. 1.28). However, for Dirichlet series with non-negative coefficients this phenomenon cannot happen: this is the claim of the following result, due to Landau.

Theorem 1.2 (Landau) Suppose that the Dirichlet series $F(s) := \sum_{n=1}^{\infty} f(n)/n^s$ converges somewhere, and that $F(s)$ has an analytic continuation in $\Omega \setminus \{\sigma_c\}$ where $\Omega$ is an open set containing the point $s = \sigma_c$. If $f(n) \ge 0$ for every $n$, then $\sigma_c$ is a singularity for $F(s)$.

Proof. Without loss of generality we can assume that $\sigma_c = 0$ (because we can always translate the problem to the analogous problem for $F(s + \sigma_c)$ whose abscissa of convergence is 0). By contradiction, suppose that $F(s)$ is holomorphic in an open neighborhood $U$ of $s = 0$, so that its Taylor series centered at 1,

$$F(s) = \sum_{k=0}^{\infty} \frac{F^{(k)}(1)}{k!} (s - 1)^k,$$

has a convergence radius strictly greater than 1. By Corollary 1.2

$$F^{(k)}(1) = (-1)^k \sum_{n=1}^{\infty} \frac{f(n) \ln^k n}{n} \qquad \forall k,$$

hence

$$F(s) = \sum_{k=0}^{\infty} \sum_{n=1}^{\infty} \frac{(1 - s)^k}{k!} \cdot \frac{f(n) \ln^k n}{n}.$$

Let $s \in U$ be negative; then each term appearing in this double series is non-negative (here we use the assumption $f(n) \ge 0$ for all $n$) so that the order of summation can be exchanged without modifying its value; in this way we get

$$F(s) = \sum_{n=1}^{\infty} \frac{f(n)}{n} \sum_{k=0}^{\infty} \frac{(1 - s)^k \ln^k n}{k!}.$$

The inner sum here is $\exp((1 - s) \ln n) = n^{1-s}$, hence we have proved that

$$F(s) = \sum_{n=1}^{\infty} \frac{f(n)}{n^s}$$

holds for some negative $s$. This means that the Dirichlet series converges for some negative $s$, which is impossible since we have assumed that $\sigma_c = 0$. □

In its essence, the previous theorem holds because a non-negative double series can be reordered without assuming its convergence. The following exercise provides another instance of this fact, this time in the more familiar setting of the power series.

Exercise. 1.25 (Pringsheim's Theorem) Let $F(z) := \sum_{n=0}^{\infty} a(n) z^n$ be a power series with convergence radius equal to 1 and suppose that it has an analytic continuation in an open set containing $z = 1$. Prove that if $a(n) \ge 0$ for every $n$ then the point $z = 1$ is a singularity for $F(z)$.
Hint: Imitate the proof of the Landau theorem. Suppose (by contradiction) that $F(z)$ is holomorphic in an open set containing $z = 1$. Consider the representation of $F$ as power series at $z = 1/2$. The convergence radius of this power series is strictly greater than $1/2$ (why?). Use the representation as power series at $z = 0$ to find $F^{(k)}(1/2)$ and substitute in the power series at $z = 1/2$. In this way you get a double series whose order of summation can be exchanged (because its terms are non-negative).

A Dirichlet series $\sum_{n=1}^{\infty} f(n)/n^s$ is called absolutely convergent at $s_0 \in \mathbb{C}$ when

$$\sum_{n=1}^{\infty} \Big|\frac{f(n)}{n^{s_0}}\Big| = \sum_{n=1}^{\infty} \frac{|f(n)|}{n^{\mathrm{Re}(s_0)}} < \infty.$$

Note that the absolute convergence depends only on $\mathrm{Re}(s_0)$, so that if the series converges absolutely at $s_0$ then it converges absolutely at every point of the vertical line $\mathrm{Re}(s) = \mathrm{Re}(s_0)$. The absolute convergence implies the usual convergence (a simple application of the Cauchy test). Moreover, it is immediate to prove that if a Dirichlet series converges absolutely at $s_0$, then it converges absolutely at every point $s$ with $\mathrm{Re}(s) \ge \mathrm{Re}(s_0)$ and that the convergence is uniform in every half-plane $H_\varepsilon := \{s\colon \mathrm{Re}(s) > \mathrm{Re}(s_0) + \varepsilon\}$. Therefore, it is useful to introduce the notion of abscissa of absolute convergence, which is

σa := inf{σ ∈ R: F (σ) converges absolutely}.

Evidently $\sigma_c$ is always $\le \sigma_a$.

Exercise. 1.26 Let $F(s) = \sum_{n=1}^{\infty} f(n)/n^s$. Suppose that it converges somewhere, so that $\sigma_c < +\infty$. Prove that $\sigma_a \le \sigma_c + 1$, i.e. that the convergence is absolute at every point $s$ with $\mathrm{Re}(s) > \sigma_c + 1$.

The previous exercise proves that $\sigma_c \le \sigma_a \le \sigma_c + 1$; in general it is not possible to be more precise: in fact for every choice of $u \in [0, 1]$ it is possible to define a Dirichlet series with $\sigma_c = 0$ and $\sigma_a = u$.

Exercise. 1.27 Let $\ell \in \mathbb{N}$. Let $F_\ell(s) := \sum_{n=1}^{\infty} (-1)^n/n^{\ell s}$. Note that this is a Dirichlet series, since $\ell \in \mathbb{N}$. Prove that for this series $\sigma_c = 0$ (using the partial summation formula) and $\sigma_a = 1/\ell$. As a more elaborate example, let $\alpha \ge 1$. Let $F_\alpha$ be the Dirichlet series

$$F_\alpha(s) = \sum_{n=1}^{\infty} \frac{(-1)^n}{\lceil n^\alpha \rceil^s}$$

where $\lceil x \rceil := \inf\{n \in \mathbb{Z}\colon x \le n\}$. Prove that for this series $\sigma_c = 0$ and $\sigma_a = 1/\alpha$.

For multiplicative functions the alternative representation as Euler product is possible. We see now that this representation is valid whenever the Dirichlet series converges absolutely.

Theorem 1.3 Let $F(s) = \sum_{n=1}^{\infty} f(n)/n^s$ converge absolutely at $s_0$. Let $f(n)$ be multiplicative, then the infinite product

$$\prod_p \Big(1 + \sum_{k=1}^{\infty} \frac{f(p^k)}{p^{ks}}\Big) \tag{1.16}$$

converges absolutely at $s_0$ to $F(s_0)$.

Proof. Recall that an infinite product $\prod_n (1 + a_n)$ converges absolutely (by definition) when $\prod_n (1 + |a_n|)$ converges, and that this happens if and only if $\sum_n |a_n|$ converges too. In fact, the inequality $1 + y \le e^y$ implies that

$$\prod_n (1 + |a_n|) \le \exp\Big(\sum_n |a_n|\Big)$$

so that

$$\sum_n |a_n| < \infty \implies \prod_n (1 + |a_n|) < \infty.$$

On the other hand, for $y \in [0, 1]$ we have $e^{y/2} \le 1 + y$. If $\prod_n (1 + |a_n|) < +\infty$ then $|a_n|$ goes to zero⁴, so that $|a_n| < 1$ if $n$ is large enough, $n > N$ say. Then

$$\exp\Big(\tfrac{1}{2} \sum_{n > N} |a_n|\Big) \le \prod_{n > N} (1 + |a_n|) \tag{1.17}$$

so that

$$\prod_n (1 + |a_n|) < \infty \implies \sum_n |a_n| < \infty.$$

Therefore, in order to prove the absolute convergence of (1.16) at $s_0$ it is sufficient to prove that

$$\sum_p \Big|\sum_{k=1}^{\infty} \frac{f(p^k)}{p^{k s_0}}\Big| < \infty.$$

This is almost immediate, since

$$\sum_p \Big|\sum_{k=1}^{\infty} \frac{f(p^k)}{p^{k s_0}}\Big| \le \sum_p \sum_{k=1}^{\infty} \frac{|f(p^k)|}{p^{k \mathrm{Re}(s_0)}} \le \sum_{n=1}^{\infty} \frac{|f(n)|}{n^{\mathrm{Re}(s_0)}} < \infty,$$

by hypothesis. Now we have to prove that the product converges to $F(s_0)$. We fix $P > 0$ and consider the finite product

$$\prod_{p \le P} \Big(1 + \sum_{k=1}^{\infty} \frac{f(p^k)}{p^{k s_0}}\Big).$$

In this product each factor is a power series converging absolutely, therefore we can rearrange the terms without modifying its value. The multiplicativity of $f$

shows that the rearrangement is the sum

$$\sum_{n \in A} \frac{f(n)}{n^{s_0}}$$

where $A$ denotes the set of integers whose prime factors are all $\le P$. We note that

$$F(s_0) - \sum_{n \in A} \frac{f(n)}{n^{s_0}} = \sum_{n \in B} \frac{f(n)}{n^{s_0}}$$

where $B = A^c$ is the set of integers having at least one prime factor greater than $P$. Hence

$$\Big|\prod_{p \le P} \Big(1 + \sum_{k=1}^{\infty} \frac{f(p^k)}{p^{k s_0}}\Big) - F(s_0)\Big| = \Big|\sum_{n \in B} \frac{f(n)}{n^{s_0}}\Big| \le \sum_{n \in B} \Big|\frac{f(n)}{n^{s_0}}\Big| \le \sum_{n \ge P} \Big|\frac{f(n)}{n^{s_0}}\Big|,$$

since each integer in $B$ is not lower than $P$. The last sum tends to 0 when $P \to \infty$, since $\sum_n \big|\frac{f(n)}{n^{s_0}}\big|$ converges, by hypothesis. □

⁴ In fact, the sequence $\prod_{n=1}^{N} (1 + |a_n|)$ increases and is larger than 1. Thus its limit $\ell$, say, is not zero and $1 + |a_N| = \frac{\prod_{n=1}^{N} (1 + |a_n|)}{\prod_{n=1}^{N-1} (1 + |a_n|)}$ goes to $\ell/\ell = 1$. Notice that this argument does not apply to simply convergent products, since in that case it is not possible to exclude the case that the limit $\ell$ is 0.

The following proposition proves an important fact about the localization of zeros of an absolutely convergent product; its relevance for Dirichlet series with multiplicative coefficients comes from the previous theorem.

Proposition 1.8 Let $\prod_n (1 + a_n)$ be an absolutely converging infinite product. Then its value is 0 if and only if some factor $1 + a_n$ is zero.

Proof. For $y \in [0, 1/2]$ we have $e^{-2y} \le 1 - y$. By hypothesis $\sum_n a_n$ converges absolutely, thus $|a_n| < 1/2$ if $n$ is large enough, $n > N$ say. Then

$$\exp\Big(-2 \sum_{n > N} |a_n|\Big) \le \prod_{n > N} (1 - |a_n|) \le \prod_{n > N} |1 + a_n| = \Big|\prod_{n > N} (1 + a_n)\Big|.$$

In particular $\big|\prod_{n > N} (1 + a_n)\big|$ is strictly positive. This proves that $\prod_n (1 + a_n)$ can be equal to 0 if and only if the finite product $\prod_{n \le N} (1 + a_n)$ is equal to zero, which is the claim. □

When applied to the Dirichlet series defining the Riemann zeta function, the previous theorems prove that

Corollary 1.3 In the half-plane $\mathrm{Re}(s) > 1$ the Dirichlet series $\zeta(s) = \sum_n 1/n^s$ converges absolutely, is a holomorphic function, has the representation

$$\zeta(s) = \sum_{n=1}^{\infty} \frac{1}{n^s} = \prod_p \Big(1 - \frac{1}{p^s}\Big)^{-1}$$

and is not equal to zero.
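The Euler product of Corollary 1.3 can be observed numerically at $s = 2$, where the classical value $\zeta(2) = \pi^2/6$ is available for comparison. The sketch below is only an illustration: the function names and the truncation bound on the primes are arbitrary choices, not part of these notes.

```python
import math


def primes_up_to(limit):
    """Sieve of Eratosthenes: list of all primes <= limit (limit >= 1)."""
    sieve = bytearray([1]) * (limit + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, math.isqrt(limit) + 1):
        if sieve[p]:
            # cross out p^2, p^2 + p, ... (smaller multiples are already done)
            sieve[p * p :: p] = bytearray(len(range(p * p, limit + 1, p)))
    return [n for n in range(limit + 1) if sieve[n]]


def euler_product_zeta(s, prime_bound):
    """Truncated Euler product: product over primes p <= prime_bound
    of (1 - p^(-s))^(-1), for real s > 1."""
    product = 1.0
    for p in primes_up_to(prime_bound):
        product /= 1.0 - p ** (-s)
    return product


approx = euler_product_zeta(2.0, 10_000)
exact = math.pi ** 2 / 6
print(approx, exact)
```

Every factor is larger than 1, so the truncated product increases with the bound and stays below $\zeta(2)$.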

Remark. 1.2 It is sufficient to modify even only the sign of a finite set of coefficients of $\zeta$ to produce a new function having zeros in $\mathrm{Re}(s) > 1$, and therefore badly violating the Riemann hypothesis. The new function no longer has a representation as Euler product: this fact suggests that if RH is true, then somewhere in its proof arithmetic will have to make a fundamental contribution (see Ex. 1.53). □

There is still one important aspect of Dirichlet series that we have to discuss. When we consider them as formal series, we define the equality $\sum_{n=1}^{\infty} f(n)/n^s = \sum_{n=1}^{\infty} g(n)/n^s$ by saying that this happens if and only if $f(n) = g(n)$ for every $n$. What happens to this condition when the series converge somewhere and therefore can be considered as functions of a complex variable? The following proposition will allow us to show that the series are equal as complex functions if and only if they are equal as formal series, i.e. that they are equal if and only if $f(n) = g(n)$ for every $n$.

Proposition 1.9 Let $F(s) = \sum_{n=1}^{\infty} f(n)/n^s$ converge somewhere. Let $\sigma_0$ be any real number greater than $\sigma_a$, the abscissa of absolute convergence. Then

$$\lim_{T \to \infty} \frac{1}{2T} \int_{-T}^{T} |F(\sigma_0 + it)|^2\, dt = \sum_{n=1}^{\infty} \frac{|f(n)|^2}{n^{2\sigma_0}} < \infty.$$

As a consequence, if $F(s)$ is the null function then $f(n) = 0$ for every $n$.

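Proof. The absolute convergence at $\sigma_0$ implies the absolute convergence of the series at every point of the vertical line $\{s\colon s = \sigma_0 + it,\ t \in \mathbb{R}\}$, uniformly in $t$. The theorem about the absolute convergence of the product of series proves that

$$|F(\sigma_0 + it)|^2 = \sum_{m=1}^{\infty} \frac{f(m)}{m^{\sigma_0 + it}} \cdot \sum_{n=1}^{\infty} \frac{\overline{f(n)}}{n^{\sigma_0 - it}} = \sum_{m,n=1}^{\infty} \frac{f(m)\overline{f(n)}}{(mn)^{\sigma_0}} \Big(\frac{n}{m}\Big)^{it}$$

and that this double series converges absolutely and uniformly for $t \in \mathbb{R}$. The uniform convergence gives

$$\frac{1}{2T} \int_{-T}^{T} |F(\sigma_0 + it)|^2\, dt = \sum_{m,n=1}^{\infty} \frac{f(m)\overline{f(n)}}{(mn)^{\sigma_0}} \cdot \frac{1}{2T} \int_{-T}^{T} \Big(\frac{n}{m}\Big)^{it} dt.$$

A simple computation shows that

$$\frac{1}{2T} \int_{-T}^{T} \Big(\frac{n}{m}\Big)^{it} dt = \begin{cases} 1 & \text{if } n = m, \\ \dfrac{\sin(T \ln(n/m))}{T \ln(n/m)} & \text{if } n \ne m. \end{cases}$$

Moreover, $\Big|\frac{1}{2T} \int_{-T}^{T} \big(\frac{n}{m}\big)^{it} dt\Big| \le \frac{1}{2T} \int_{-T}^{T} \Big|\big(\frac{n}{m}\big)^{it}\Big| dt = \frac{1}{2T} \int_{-T}^{T} dt = 1$, so that the convergence of the double series in $m, n$ is uniform in $T$. Hence the limit as $T \to \infty$ can be computed termwise, and we get

$$\lim_{T \to \infty} \frac{1}{2T} \int_{-T}^{T} |F(\sigma_0 + it)|^2\, dt = \sum_{m,n=1}^{\infty} \frac{f(m)\overline{f(n)}}{(mn)^{\sigma_0}} \lim_{T \to \infty} \frac{1}{2T} \int_{-T}^{T} \Big(\frac{n}{m}\Big)^{it} dt = \sum_{m,n=1}^{\infty} \frac{f(m)\overline{f(n)}}{(mn)^{\sigma_0}}\, \delta_{m,n} = \sum_{n=1}^{\infty} \frac{|f(n)|^2}{n^{2\sigma_0}}.$$ □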
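The mean value formula of Proposition 1.9 can be tested on a finite Dirichlet polynomial, for which every convergence issue disappears. The sketch below is purely illustrative: the coefficients $1, 2, -1$, the abscissa $\sigma_0 = 1$, the height $T$ and the midpoint rule are arbitrary choices, not taken from these notes.

```python
def mean_square(coeffs, sigma, T, steps=100_000):
    """Midpoint-rule approximation of (1/2T) * int_{-T}^{T} |F(sigma + it)|^2 dt,
    where F(s) = sum_n coeffs[n-1] / n^s is a finite Dirichlet polynomial."""
    dt = 2 * T / steps
    total = 0.0
    for j in range(steps):
        t = -T + (j + 0.5) * dt
        # n ** complex(-sigma, -t) is n^(-(sigma + it))
        value = sum(c * n ** complex(-sigma, -t)
                    for n, c in enumerate(coeffs, start=1))
        total += abs(value) ** 2
    return total * dt / (2 * T)


def limit_value(coeffs, sigma):
    """The limit predicted by the proposition: sum_n |f(n)|^2 / n^(2 sigma)."""
    return sum(abs(c) ** 2 / n ** (2 * sigma)
               for n, c in enumerate(coeffs, start=1))


coeffs = [1, 2, -1]  # hypothetical coefficients f(1), f(2), f(3)
print(mean_square(coeffs, 1.0, 1000.0), limit_value(coeffs, 1.0))
```

The off-diagonal terms contribute at most a quantity of order $1/T$, in agreement with the formula for $\frac{1}{2T}\int_{-T}^{T} (n/m)^{it}\, dt$ when $n \ne m$.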

1.6. The analytic continuation of ζ(s)

From the previous section we know that $\zeta(s) = \sum_{n=1}^{\infty} 1/n^s$ is a holomorphic function in $\mathrm{Re}(s) > 1$. In this section we prove that $\zeta(s)$ admits a 'natural' extension as a function in $\mathbb{C}$. For every fixed $s$, from Proposition 1.6 we get that

$$\sum_{n=1}^{N} \frac{1}{n^s} = \int_1^N \frac{dx}{x^s} + \frac{1}{2}\Big(1 + \frac{1}{N^s}\Big) - s \int_1^N \frac{B_1(\{x\})}{x^{s+1}}\, dx = \frac{1}{s - 1} + \frac{N^{1-s}}{1 - s} + \frac{1}{2}\Big(1 + \frac{1}{N^s}\Big) - s \int_1^N \frac{B_1(\{x\})}{x^{s+1}}\, dx.$$

Now, suppose that $\mathrm{Re}(s) > 1$, then we can take the limit $N \to \infty$, getting

$$\zeta(s) = \frac{1}{s - 1} + \frac{1}{2} - s \int_1^{\infty} \frac{B_1(\{x\})}{x^{s+1}}\, dx.$$

This relation is an identity for $\mathrm{Re}(s) > 1$, but the integral exists in the larger region $\mathrm{Re}(s) > 0$ (the integrand here decays at $\infty$ as $\ll \frac{1}{x^{\mathrm{Re}(s)+1}}$, because $B_1(\{x\})$ is bounded). Moreover, this integral defines a holomorphic function in $\mathrm{Re}(s) > 0$ (standard argument, again based upon Morera's theorem), and $(s - 1)^{-1}$ is evidently a meromorphic function in $\mathbb{C}$ with a unique pole at $s = 1$, which is simple and where the residue is equal to 1. Thus, the previous equality can be used to define $\zeta(s)$ in the larger region $\mathrm{Re}(s) > 0$, as a meromorphic function in $\mathrm{Re}(s) > 0$ having a unique pole (simple, and with residue equal to 1) at $s = 1$. The general theory of analytic continuation allows us to conclude that this definition is the unique one which extends $\zeta(s)$ as a meromorphic function.

We can pursue this argument. With an integration by parts we get

$$\int_1^{\infty} \frac{B_1(\{x\})}{x^{s+1}}\, dx = -\frac{B_2(0)}{2} + \frac{s + 1}{2} \int_1^{\infty} \frac{B_2(\{x\})}{x^{s+2}}\, dx, \qquad \text{when } \mathrm{Re}(s) > 0.$$

As before, this is an equality in $\mathrm{Re}(s) > 0$ but the integral on the right-hand side exists in $\mathrm{Re}(s) > -1$ and defines a holomorphic function there. Thus this equality can be used to define the function on the left-hand side in this larger region. Iterating this argument $m$ times we get a formula providing the analytic extension of $\zeta(s)$ to $\mathrm{Re}(s) > 1 - m$. Concluding, we have proved the following result.

Corollary 1.4 The Riemann zeta function $\zeta(s)$ admits an analytic continuation in $\mathbb{C}$ as meromorphic function, with a unique pole at $s = 1$ which is simple and with $\mathrm{Res}_{s=1}\, \zeta(s) = 1$.

There are more elegant ways to prove the same conclusion. For example we can extend $\zeta$ to $\mathrm{Re}(s) > 0$ with the previous argument and then use the functional equation satisfied by $\zeta$ (a relation connecting the values of $\zeta(s)$ and $\zeta(1 - s)$) to get the analytic continuation. The functional equation is a fundamental tool for the comprehension of the deep analytical properties of $\zeta(s)$, but it is not useful for our limited purpose (the proof of the Prime Number Theorem). We will not mention it anymore in these notes.
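The finite-$N$ identity used above also gives a practical way to evaluate $\zeta(s)$ in $0 < \mathrm{Re}(s) < 1$: subtracting it from the formula for $\zeta(s)$ gives $\zeta(s) = \sum_{n \le N} n^{-s} - \frac{N^{1-s}}{1-s} - \frac{1}{2N^s} - s\int_N^\infty B_1(\{x\})x^{-s-1}dx$. The sketch below simply drops the last integral, which is small for large $N$; the function name and the choice $N = 1000$ are assumptions made for this illustration, and the reference value of $\zeta(1/2)$ in the comment is quoted from standard tables, not from these notes.

```python
import math


def zeta_truncated(s, N=1000):
    """Approximate zeta(s) for Re(s) > 0, s != 1, by
        sum_{n <= N} n^(-s) - N^(1-s)/(1-s) - N^(-s)/2,
    i.e. the finite-N formula with the tail integral
    s * int_N^infty B_1({x}) x^(-s-1) dx neglected."""
    partial = sum(n ** (-s) for n in range(1, N + 1))
    return partial - N ** (1 - s) / (1 - s) - 0.5 * N ** (-s)


# Inside the region of absolute convergence: compare with pi^2/6.
print(zeta_truncated(2.0), math.pi ** 2 / 6)

# Inside the critical strip, where the Dirichlet series itself diverges;
# tables give zeta(1/2) = -1.4603545...
print(zeta_truncated(0.5))
```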

Exercise. 1.28 Prove that $f(s) := \sum_{n=1}^{\infty} (-1)^n/n^s = (2^{1-s} - 1)\zeta(s)$ when $\mathrm{Re}(s) > 1$. Notice that this equality provides the analytic continuation to $\mathbb{C}$ of $f(s)$, as holomorphic function.
Hint: Look for a representation of $\sum_{n \text{ even}} 1/n^s$ and $\sum_{n \text{ odd}} 1/n^s$ in terms of $\zeta(s)$, and subtract. For the second part, note that $(2^{1-s} - 1)$ is holomorphic in $\mathbb{C}$ and has a zero at $s = 1$.
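The identity of Exercise 1.28 can be sanity-checked numerically at two points. At $s = 2$ the right-hand side is $(2^{-1} - 1)\pi^2/6 = -\pi^2/12$; at $s = 1$ the zero of $2^{1-s} - 1$ cancels the pole of $\zeta$ and the limit of the right-hand side is $-\ln 2$. The following sketch (function name and truncation point chosen arbitrarily) only compares partial sums with these two values; it is not a proof.

```python
import math


def alternating_partial_sum(s, N):
    """Partial sum of sum_{n >= 1} (-1)^n / n^s up to n = N (real s > 0)."""
    return sum((-1) ** n / n ** s for n in range(1, N + 1))


print(alternating_partial_sum(2, 10_000), -math.pi ** 2 / 12)
print(alternating_partial_sum(1, 10_000), -math.log(2))
```

Both series are alternating with decreasing terms, so the error of each partial sum is at most the first omitted term.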

1.7. Some elementary results

Before facing the Prime Number Theorem it is a good idea to begin with a more modest target. Mertens' result (in its simplest formulation) and Dirichlet's result about the mean value of the divisor function will be two good tests.

We need a preliminary upper bound for $\pi(x)$. When we split the integers $\le x$ in couples, each couple contains at most one prime; this simple remark gives the upper bound $\pi(x) \le \lceil x/2 \rceil$ which with a more analytical language we write as

$$\pi(x) \le \frac{x}{2} + O(1).$$

This idea can be pushed much further. Let us take another integer, say 6. We split the integers up to $x$ into distinct blocks of six consecutive integers each. Apart from the first block, in each block at most two primes can appear, because an integer which is not coprime to 6 cannot be a prime unless it is a divisor of 6 itself, and there are only $\varphi(6) = 2$ integers coprime to 6 in every block. This proves that $\pi(x) \le \big\lfloor \frac{2x}{6} \big\rfloor + R$ where $R \le 6$. In other words,

$$\pi(x) \le \frac{2x}{6} + O(1).$$

Repeating this argument with a generic integer $N$, we get that

$$\pi(x) \le \frac{\varphi(N)}{N}\, x + O(N), \tag{1.18}$$

where $O(N)$ denotes a quantity which is $\le N$. This innocent bound shows that

$$\limsup_{x \to \infty} \frac{\pi(x)}{x} \le \frac{\varphi(N)}{N} \qquad \forall N,$$

which is an interesting upper bound, since now we have the possibility to act on the parameter $N$ in order to bound $\pi(x)$ in a non-trivial way. For instance, let $N$ be the product of all prime numbers below $L$ (a new parameter): $N = \prod_{p \le L} p$. Then

$$\frac{\varphi(N)}{N} = \prod_{p \mid N} \Big(1 - \frac{1}{p}\Big) = \prod_{p \le L} \Big(1 - \frac{1}{p}\Big).$$

The inequality $1 - y \le e^{-y}$ shows that

$$\frac{\varphi(N)}{N} \le e^{-\sum_{p \le L} \frac{1}{p}}. \tag{1.19}$$

We have already proved that $\sum_{p \le L} \frac{1}{p} \to \infty$ as $L$ diverges; this implies that

$$\liminf_{N \to \infty} \frac{\varphi(N)}{N} = 0$$

so that from (1.18) we conclude that

$$\pi(x) = o(x). \tag{1.20}$$

With a slightly bigger effort we can improve this result. From (1.18), (1.19) and (1.3) with $N = \prod_{p \le L} p$ we get

$$\frac{\pi(x)}{x} \le \frac{\varphi(N)}{N} + O\Big(\frac{N}{x}\Big) \le e^{-\sum_{p \le L} \frac{1}{p}} + O\Big(\frac{N}{x}\Big) \le \frac{O(1)}{\ln L} + O\Big(\frac{L^L}{x}\Big)$$

because $N = \prod_{p \le L} p \le L^L$. The bound suggests to select $L$ in such a way that the two terms have approximately the same size, in other words in such a way that $L^L/x \approx 1/\ln L$, i.e., $L^L \ln L \approx x$. The equation $L^L \ln L = x$ has a complicated solution for $L = L(x)$, but it is easy to see that $L(x)$ is asymptotic to $\frac{\ln x}{\ln \ln x}$. Setting $L = \frac{\ln x}{\ln \ln x}$ the bound becomes

$$\frac{\pi(x)}{x} \le \frac{O(1)}{\ln \ln x} + O\big(\exp(L \ln L - \ln x)\big)$$
$$= \frac{O(1)}{\ln \ln x} + O\Big(\exp\Big(\frac{\ln x}{\ln \ln x} \ln\Big(\frac{\ln x}{\ln \ln x}\Big) - \ln x\Big)\Big)$$
$$= \frac{O(1)}{\ln \ln x} + O\Big(\exp\Big(-\frac{\ln x\, \ln \ln \ln x}{\ln \ln x}\Big)\Big) = \frac{O(1)}{\ln \ln x} + O\Big(\frac{1}{(\ln \ln x)^{\frac{\ln x}{\ln \ln x}}}\Big).$$

This bound proves the following result.

Proposition 1.10 $\pi(x) \ll \frac{x}{\ln \ln x}$ as $x \to \infty$.

The previous claim is interesting, but it is still far from the truth. The next result is still elementary in its tools but represents a major improvement on all previous results because it claims that $\pi(x) \ll \frac{x}{\ln x}$: it is due to Chebyshev.

Let $n$ be any positive integer, then the binomial coefficient $\binom{2n}{n}$ is lower than $4^n$, because

$$\binom{2n}{n} \le \sum_{k=0}^{2n} \binom{2n}{k} = (1 + 1)^{2n} = 4^n.$$

On the other hand, every prime in $(n, 2n]$ divides $(2n)!$ but does not divide $n!$, hence

$$\prod_{n < p \le 2n} p \le \binom{2n}{n} \le 4^n$$

[…] $4$ when $x$ is large enough. With some care in some steps we can improve the result to $c = \ln 4 = 1.386\ldots$, for every $x$. With some extra tricks Chebyshev succeeded in refining it further, proving that $\frac{6}{5} \ln(2^{1/2} 3^{1/3} 5^{1/5}/30^{1/30}) = 1.105\ldots$ is a possible value for $c$.

Chebyshev was also able to apply his method to deduce a lower bound for $\pi(x)$, namely that there exists a constant $c' > 0$ such that $\pi(x) \ge c' \frac{x}{\ln x}$ for every $x$, and that when $x$ is large enough one can take $c' = \ln(2^{1/2} 3^{1/3} 5^{1/5}/30^{1/30}) = 0.921\ldots$. Following a very nice argument of Nair⁵ we prove an only slightly weaker bound.

Proposition 1.13 $\pi(n) \ge \ln 2\, \frac{n}{\ln n}$ for every integer $n \ge 7$; hence $\pi(x) \ge (\ln 2 + o(1)) \frac{x}{\ln x}$ as $x \to \infty$.
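Before the proof, the statement of Proposition 1.13 (and the inequality $d_n \ge 2^n$ for $d_n := \operatorname{lcm}(1, \ldots, n)$, which is the heart of Nair's argument) can be checked by brute force on an initial range. This is only a numerical sanity check under arbitrary choices (function name, range, trial-division primality test); it proves nothing beyond the range tested.

```python
import math


def check_nair_bounds(limit):
    """Check d_n >= 2^n and pi(n) >= ln(2) * n / ln(n) for 7 <= n <= limit,
    where d_n = lcm(1, ..., n) and pi(n) counts the primes <= n."""
    d = 1            # running value of lcm(1, ..., n)
    prime_count = 0  # running value of pi(n)
    for n in range(1, limit + 1):
        d = d * n // math.gcd(d, n)
        # trial division: n is prime when no q in [2, sqrt(n)] divides it
        if n >= 2 and all(n % q for q in range(2, math.isqrt(n) + 1)):
            prime_count += 1
        if n >= 7:
            if d < 2 ** n:
                return False
            if prime_count < math.log(2) * n / math.log(n):
                return False
    return True


print(check_nair_bounds(2000))
```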

Proof. Let $d_n$ denote the least common multiple of $1, 2, \ldots, n$ and for every pair of integers $m, n$ with $1 \le m \le n$ let

$$I(m, n) := \int_0^1 x^{m-1} (1 - x)^{n-m}\, dx.$$

Using the binomial formula we have

$$I(m, n) = \sum_{k=0}^{n-m} \binom{n-m}{k} (-1)^k \int_0^1 x^{k+m-1}\, dx = \sum_{k=0}^{n-m} \binom{n-m}{k} \frac{(-1)^k}{k + m},$$

proving that $d_n I(m, n)$ is an integer. On the other hand, with an integration by parts one gets that

$$I(m, n) = -\frac{x^{m-1} (1 - x)^{n-m+1}}{n - m + 1}\Big|_0^1 + \frac{m - 1}{n - m + 1} \int_0^1 x^{m-2} (1 - x)^{n-m+1}\, dx = \frac{m - 1}{n - m + 1}\, I(m - 1, n) \qquad \forall m \ge 2,$$

and since it is evident that $I(1, n) = \frac{1}{n}$, by induction (in $m$) one proves that $I(m, n) = \big[m \binom{n}{m}\big]^{-1}$ for every $m$ and $n$. Thus we have proved that $m \binom{n}{m} \mid d_n$ so that, in particular,

$$d_n \ge c_n := \lceil n/2 \rceil \binom{n}{\lceil n/2 \rceil} \qquad \forall n.$$

A direct computation proves that $c_{n+2}/c_n \ge 4$ for every $n$⁶, and since $c_7 = 140 > 2^7$ and $c_8 = 280 > 2^8$, we deduce that $c_n \ge 2^n$ holds for every $n \ge 7$. On the other hand, $d_n \le n^{\pi(n)}$ because there are $\pi(n)$ distinct primes dividing $d_n$ and when $p^a \| d_n$ then $p^a \le n$. Thus

$$\pi(n) \ln n \ge \ln d_n \ge \ln c_n \ge n \ln 2 \qquad \forall n \ge 7,$$

which is the claim. □

⁵ See M. Nair: On Chebyshev-type inequalities for primes, Amer. Math. Monthly 89(2), 126–129, 1982.

⁶ In order to check the inequality it is convenient to split the cases according to the parity of $n$. When $n = 2k$ (hence $k \ge 1$) one gets

$$\frac{c_{2k+2}}{c_{2k}} = \frac{(k + 1) \binom{2k+2}{k+1}}{k \binom{2k}{k}} = \frac{(k + 1)(2k + 2)!\, k!^2}{k\, (k + 1)!^2\, (2k)!} = \frac{2(2k + 1)}{k} > 4,$$

and when $n = 2k - 1$ (hence $k \ge 1$, again) one gets

$$\frac{c_{2k+1}}{c_{2k-1}} = \frac{(k + 1) \binom{2k+1}{k+1}}{k \binom{2k-1}{k}} = \frac{(k + 1)(2k + 1)!\, k!\, (k - 1)!}{k\, (k + 1)!\, k!\, (2k - 1)!} = \frac{2(2k + 1)}{k} > 4.$$

We can now prove Mertens' result. We need the elementary equality

$$N! = \prod_p p^{\nu_p}, \qquad \nu_p = \sum_{k=1}^{\infty} \Big\lfloor \frac{N}{p^k} \Big\rfloor.$$

Note that in this product only finitely many factors are different from 1 (in fact $\lfloor N/p^k \rfloor = 0$ when $p^k > N$), so that there is no need to discuss its convergence. This equality comes from the remark that there are $\lfloor N/p \rfloor$ integers among $1, \ldots, N$ which are divisible by $p$, and this is the first contribution to $\nu_p$. Among these ones, there are $\lfloor N/p^2 \rfloor$ which are divisible by $p^2$, so that also $\lfloor N/p^2 \rfloor$ must be added to produce $\nu_p$, and so on. In terms of the von Mangoldt $\Lambda$-function we deduce that

$$\ln(N!) = \sum_p \nu_p \ln p = \sum_p \sum_{k=1}^{\infty} \Big\lfloor \frac{N}{p^k} \Big\rfloor \ln p = \sum_n \Big\lfloor \frac{N}{n} \Big\rfloor \Lambda(n).$$

The sum is naturally restricted to integers $n \le N$. We approximate $\lfloor N/n \rfloor$ by $N/n$, with an error which is bounded by 1 (it is the fractional part of $N/n$), so that

$$\ln(N!) = N \sum_{n \le N} \frac{\Lambda(n)}{n} + O\Big(\sum_{n \le N} \Lambda(n)\Big)$$

so that by Proposition 1.11 we get

$$\ln(N!) = N \sum_{n \le N} \frac{\Lambda(n)}{n} + O(N).$$

At last, the Stirling formula (1.10) gives $\ln(N!) = N \ln N + O(N)$, therefore

$$\sum_{n \le N} \frac{\Lambda(n)}{n} = \ln N + O(1),$$

which is an interesting formula giving the exact asymptotic of a sum intimately connected to the prime numbers (actually we have obtained also an explicit bound for the remainder term). We can further modify this equality. In fact,

$$\sum_p \sum_{k \ge 2} \frac{\ln p}{p^k} = \sum_p \ln p \sum_{k \ge 2} \frac{1}{p^k} = \sum_p \frac{\ln p}{p^2} \cdot \frac{1}{1 - 1/p} = \sum_p \frac{\ln p}{p(p - 1)} < +\infty$$

therefore

$$\sum_{n \le N} \frac{\Lambda(n)}{n} = \sum_{p \le N} \frac{\ln p}{p} + \sum_{\substack{p^k \le N \\ k \ge 2}} \frac{\ln p}{p^k} = \sum_{p \le N} \frac{\ln p}{p} + O(1)$$

so that the previous result gives

$$\sum_{p \le N} \frac{\ln p}{p} = \ln N + O(1). \tag{1.24}$$

From this asymptotic we can extract the asymptotic of $\sum_{p \le N} 1/p$, which is the celebrated result of Mertens.

Proposition 1.14 There exists a constant $c \in \mathbb{R}$ such that $\sum_{p \le N} \frac{1}{p} = \ln \ln N + c + O\big(\frac{1}{\ln N}\big)$.
Moreover, there exists another constant $c' \in \mathbb{R}$ such that $\prod_{p \le N} \big(1 - \frac{1}{p}\big) = \frac{e^{c'}}{\ln N} \big(1 + O\big(\frac{1}{\ln N}\big)\big)$. A different and more complicated argument shows that $c' = -\gamma$, the (opposite of the) Euler–Mascheroni constant (see [HW], Theorem 429, or [MV] Theorem 2.7).
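Both claims of Proposition 1.14 are easy to observe numerically. The sketch below computes $\sum_{p \le N} 1/p - \ln \ln N$ and $\ln N \cdot \prod_{p \le N} (1 - 1/p)$ for one value of $N$. The function names and the choice of $N$ are arbitrary, and the reference values in the comments ($c \approx 0.2615$ and $e^{-\gamma} \approx 0.5615$) are quoted from standard tables rather than derived in these notes.

```python
import math


def primes_up_to(limit):
    """Sieve of Eratosthenes: list of all primes <= limit (limit >= 1)."""
    sieve = bytearray([1]) * (limit + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, math.isqrt(limit) + 1):
        if sieve[p]:
            sieve[p * p :: p] = bytearray(len(range(p * p, limit + 1, p)))
    return [n for n in range(limit + 1) if sieve[n]]


def mertens_data(N):
    """Return (sum_{p<=N} 1/p - ln ln N,  ln N * prod_{p<=N} (1 - 1/p))."""
    primes = primes_up_to(N)
    reciprocal_sum = sum(1.0 / p for p in primes)
    product = 1.0
    for p in primes:
        product *= 1.0 - 1.0 / p
    return reciprocal_sum - math.log(math.log(N)), product * math.log(N)


first, second = mertens_data(100_000)
print(first)   # expected to be close to c, about 0.2615
print(second)  # expected to be close to exp(-gamma), about 0.5615
```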

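Proof. We write $\sum_{p \le N} \frac{1}{p}$ as $\sum_{n \le N} \delta_P(n) \frac{\ln n}{n} \cdot \frac{1}{\ln n}$ and we use the partial summation formula (Proposition 1.4) with $f(n) = \delta_P(n) \frac{\ln n}{n}$ and $g(n) = 1/\ln n$. Writing $F(x) := \sum_{n \le x} f(n)$, we get

$$\sum_{p \le N} \frac{1}{p} = \sum_{n \le N} f(n) g(n) = F(N) g(N) - \int_2^N F(x) g'(x)\, dx = \frac{F(N)}{\ln N} + \int_2^N \frac{F(x)}{x \ln^2 x}\, dx.$$

By (1.24) we know that $F(N) = \ln N + R(N)$ with $R(N) = O(1)$, so that

$$\sum_{p \le N} \frac{1}{p} = \frac{\ln N + R(N)}{\ln N} + \int_2^N \frac{\ln x + R(x)}{x \ln^2 x}\, dx$$
$$= 1 + O\Big(\frac{1}{\ln N}\Big) + \int_2^N \frac{dx}{x \ln x} + \int_2^N \frac{R(x)}{x \ln^2 x}\, dx$$
$$= \ln \ln N + 1 - \ln \ln 2 + O\Big(\frac{1}{\ln N}\Big) + \int_2^N \frac{R(x)}{x \ln^2 x}\, dx.$$

The last integral converges absolutely as $N \to +\infty$, since $R(x) = O(1)$, thus

$$= \ln \ln N + 1 - \ln \ln 2 + \int_2^{+\infty} \frac{R(x)}{x \ln^2 x}\, dx + O\Big(\frac{1}{\ln N}\Big) - \int_N^{+\infty} \frac{R(x)}{x \ln^2 x}\, dx$$

which is the claim with $c := 1 - \ln \ln 2 + \int_2^{+\infty} \frac{R(x)}{x \ln^2 x}\, dx$, since $\int_N^{+\infty} \frac{R(x)}{x \ln^2 x}\, dx \ll \frac{1}{\ln N}$.

The second claim comes from the identity

$$\sum_{p \le N} \ln\Big(1 - \frac{1}{p}\Big) = -\sum_{p \le N} \frac{1}{p} + \sum_{p \le N} \Big[\ln\Big(1 - \frac{1}{p}\Big) + \frac{1}{p}\Big].$$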

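In fact the second sum converges, since $\ln(1 - x) + x = O(x^2)$, so that

$$\sum_{p \le N} \ln\Big(1 - \frac{1}{p}\Big) = -\sum_{p \le N} \frac{1}{p} + \sum_p \Big[\ln\Big(1 - \frac{1}{p}\Big) + \frac{1}{p}\Big] - \sum_{p > N} \Big[\ln\Big(1 - \frac{1}{p}\Big) + \frac{1}{p}\Big]$$
$$= -\sum_{p \le N} \frac{1}{p} + \sum_p \Big[\ln\Big(1 - \frac{1}{p}\Big) + \frac{1}{p}\Big] + O\Big(\sum_{p > N} \frac{1}{p^2}\Big)$$
$$= -\sum_{p \le N} \frac{1}{p} + \sum_p \Big[\ln\Big(1 - \frac{1}{p}\Big) + \frac{1}{p}\Big] + O\Big(\frac{1}{N}\Big),$$

and now the conclusion comes from the first claim, with $c' := -c + \sum_p \big[\ln\big(1 - \frac{1}{p}\big) + \frac{1}{p}\big]$. □

Mertens spent a great effort in improving his result. In its stronger formulation he proved that

$$\Big|\sum_{p \le N} \frac{1}{p} - \ln \ln N - c\Big| \le \frac{d}{\ln N},$$

for a couple of explicit constants $c$ and $d$.

The second 'preparatory' result is the following statement, due to Dirichlet, giving the mean value of the divisor function.

Proposition 1.15 Recall that $d(n) := \sigma_0(n) = \sum_{d \mid n} 1$. Then

$$\sum_{n \le N} d(n) = N \ln N + (2\gamma - 1) N + O(\sqrt{N}) \qquad \text{as } N \to \infty.$$

By Stirling's Formula (1.10), this result can be written also as

$$\Delta(N) \ll \sqrt{N}, \qquad \text{where } \Delta(N) := \sum_{n \le N} (\ln n - d(n) + 2\gamma).$$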

This result shows that $d(n)$ is $\ln n + 2\gamma$ 'in mean', i.e., that a 'typical' integer $n$ has $\ln n + 2\gamma$ divisors: this claim acquires interest when compared to the fact that $\ln n/\ln p + 1$ is exactly the number of divisors of $n$ when $n$ is a power of $p$, and is the first manifestation of a general phenomenon (that can be stated and proved precisely, but that is out of reach of this course): the 'typical' integer is a product of many small primes.

Proof. The result follows quite easily from a very clever idea, called the hyperbola method, devised by Dirichlet. We note that

$$\sum_{n \le N} d(n) = \sum_{n \le N} \sum_{d \mid n} 1 = \sum_{\substack{m, n \\ mn \le N}} 1,$$

so that the sum counts the number of points in the lattice $\mathbb{Z} \times \mathbb{Z}$ which are in the first quadrant and below the hyperbola $XY = N$. The symmetry of the problem shows that this number is 2 times the number of points whose abscissa is $\le \sqrt{N}$, diminished by the number of points which are contained in the square of side $\sqrt{N}$, with an error of order $\sqrt{N}$ due to the points which could sit on the border of this square and that are counted twice.

[Figure 1.2. The hyperbola $XY = N$ in the $(X, Y)$-plane, with the value $\sqrt{N}$ marked on both axes.]

In other words, we have

$$\sum_{n \le N} d(n) = 2 \sum_{n \le \sqrt{N}} \Big\lfloor \frac{N}{n} \Big\rfloor - N + O(\sqrt{N}).$$

Approximating $\big\lfloor \frac{N}{n} \big\rfloor$ with $\frac{N}{n}$ we introduce an error term of order 1 in the summand and hence of order $\sqrt{N}$ in the sum, getting

$$\sum_{n \le N} d(n) = 2N \sum_{n \le \sqrt{N}} \frac{1}{n} - N + O(\sqrt{N}),$$

and the claim follows by the formula (1.6). □

The exact size of $\Delta(N)$ is not known and its determination is called the divisor problem. Hardy and Landau proved in 1916 that $\Delta(N) = \Omega(N^{1/4})$, and it is believed that $N^{1/4}$ is the 'right' order of $\Delta(N)$. The best bound known is due to Huxley [2003]: $\Delta(N) \ll N^{131/416} (\ln N)^{2.26}$. This result is the culminating point of a very complicated strategy with fundamental contributions from Voronoi, Littlewood, Chen, Vinogradov and Iwaniec (just to cite a few of them).
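The hyperbola method is also an efficient algorithm. Counting the lattice points exactly (with $\lfloor \sqrt{N} \rfloor^2$ points in the square, instead of the approximation $N + O(\sqrt{N})$) gives $\sum_{n \le N} d(n) = 2 \sum_{n \le \sqrt{N}} \lfloor N/n \rfloor - \lfloor \sqrt{N} \rfloor^2$, which needs only about $\sqrt{N}$ operations. The sketch below compares this with a direct computation and with the main term of Proposition 1.15; the function names are illustrative and the numerical value of $\gamma$ is quoted from standard tables.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant (tabulated value)


def divisor_sum_direct(N):
    """Sum of d(n) for n <= N, computing every d(n) with a divisor sieve."""
    d = [0] * (N + 1)
    for m in range(1, N + 1):
        for multiple in range(m, N + 1, m):
            d[multiple] += 1
    return sum(d)


def divisor_sum_hyperbola(N):
    """Same sum via the hyperbola method: twice the points with abscissa
    <= sqrt(N), minus the points of the square counted twice."""
    r = math.isqrt(N)
    return 2 * sum(N // n for n in range(1, r + 1)) - r * r


def main_term(N):
    """N ln N + (2 gamma - 1) N, the main term of Proposition 1.15."""
    return N * math.log(N) + (2 * EULER_GAMMA - 1) * N


N = 10 ** 6
print(divisor_sum_hyperbola(N), main_term(N))
```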

Exercise. 1.29 Let $f(z) := \sum_{k=1}^{\infty} \frac{z^k}{1 - z^k}$.
1) Prove that this function is well defined in $\{z \in \mathbb{C}\colon |z| < 1\}$ and that the circle $|z| = 1$ is its natural boundary (i.e., $f$ cannot be analytically continued to any open set containing the disk $|z| \le 1$).
2) Prove that $f(z) = \sum_{n=1}^{\infty} d(n) z^n$ for $|z| < 1$.
3) Prove that $f(z) \sim -\frac{\ln(1 - z)}{1 - z}$ for $z \to 1^-$, $z \in \mathbb{R}$.
Hint: For the last step use the result in Proposition 1.15 and the result in [Titch], Sec. 7.5 p. 224.

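Exercise. 1.30 Let $\tau \in \mathbb{C}$, $\mathrm{Re}(\tau) > 1$. Prove that

$$\sum_{n \le N} \sigma_\tau(n) = \frac{\zeta(\tau + 1)}{\tau + 1} N^{\tau + 1} + O(N^{\mathrm{Re}(\tau)}) \qquad \text{as } N \to \infty.$$

What happens if $\mathrm{Re}(\tau) \in (0, 1)$?
Hint: Use the identity $\sum_{n \le N} \sigma_\tau(n) = \sum_{m, n\colon mn \le N} n^\tau = \sum_{m \le N} \sum_{n \le N/m} n^\tau$ and Proposition 1.6 to deal with the inner sum.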

Exercise. 1.31 Let $g\colon \mathbb{N} \to \mathbb{C}$ be the arithmetical function giving the coefficients of the Dirichlet series $1/\zeta(2s)$.
1) Prove that $g(k^2) = \mu(k)$ for every integer $k$, and that $g(n) = 0$ for every non-square $n$.
2) Recalling Ex. 1.21, prove that $|\mu(n)| = \sum_{d \mid n} g(d)$. Deduce that

$$\sum_{n \le x} |\mu(n)| = \sum_{\substack{n, m \\ nm \le x}} g(m) = \sum_{\substack{n, k \\ nk^2 \le x}} \mu(k) = \sum_{k \le \sqrt{x}} \mu(k) \sum_{n \le x/k^2} 1 = \sum_{k \le \sqrt{x}} \mu(k) \Big\lfloor \frac{x}{k^2} \Big\rfloor.$$

3) Use the previous equality to deduce that

$$\#\{n \in \mathbb{N}\colon n \le x,\ n \text{ is squarefree}\} = \sum_{n \le x} |\mu(n)| = \frac{x}{\zeta(2)} + O(\sqrt{x}).$$

Roughly speaking, this result shows that a randomly chosen integer lower than $x$ is squarefree with a probability which is approximately 61%, if $x$ is large enough.
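The identity in step 2) and the asymptotic in step 3) of Exercise 1.31 can be checked numerically (this does not replace the proofs asked for in the exercise). The sketch below counts the squarefree integers up to $x$ in two ways, directly and through $\sum_{k \le \sqrt{x}} \mu(k) \lfloor x/k^2 \rfloor$, and compares the result with $x/\zeta(2) = 6x/\pi^2$. All function names are illustrative choices.

```python
import math


def mobius_table(limit):
    """List mu with mu[n] = Moebius function of n for 1 <= n <= limit
    (mu[0] is unused), computed with a sieve."""
    mu = [1] * (limit + 1)
    is_prime = [True] * (limit + 1)
    for p in range(2, limit + 1):
        if is_prime[p]:
            for m in range(p, limit + 1, p):
                if m > p:
                    is_prime[m] = False
                mu[m] = -mu[m]          # one more prime factor
            for m in range(p * p, limit + 1, p * p):
                mu[m] = 0               # divisible by a square
    return mu


def squarefree_count_mobius(x):
    """Number of squarefree n <= x via sum_{k <= sqrt(x)} mu(k) * floor(x / k^2)."""
    r = math.isqrt(x)
    mu = mobius_table(r)
    return sum(mu[k] * (x // (k * k)) for k in range(1, r + 1))


def squarefree_count_direct(x):
    """Number of squarefree n <= x, crossing out the multiples of every square."""
    flags = [True] * (x + 1)
    for k in range(2, math.isqrt(x) + 1):
        for m in range(k * k, x + 1, k * k):
            flags[m] = False
    return sum(flags[1:])


x = 100_000
print(squarefree_count_mobius(x), 6 * x / math.pi ** 2)
```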

1.8. The Prime Number Theorem

Propositions 1.12 and 1.13 together show that

$$\pi(x) \asymp \frac{x}{\ln x};$$

the Prime Number Theorem claims that they are asymptotically equal, i.e. that

$$\pi(x) \sim \frac{x}{\ln x} \qquad \text{as } x \to \infty. \tag{1.25}$$

It was conjectured by Legendre and independently by Gauss⁷. We will prove it in the following stronger form.

Theorem 1.4 (Prime Number Theorem) Let $\mathrm{li}(x) := \int_2^x \frac{du}{\ln u}$. For every constant $A > 0$,

$$\pi(x) = \mathrm{li}(x) + O_A\Big(\frac{x}{\ln^A x}\Big) \qquad \text{as } x \to \infty.$$

As asymptotic formula, this result was proved by Hadamard and de la Vallée-Poussin (independently) in 1896, following the ideas proposed by Riemann in his celebrated (and unique) paper in number theory published in 1859. A proof which does not involve complex analysis was found in 1949 by Erdős and Selberg (it was an 'unintentional collaboration'⁸). We will follow a more recent approach, devised by Bombieri and Wirsing, which has the merit of producing that estimate for the remainder term with a level of difficulty substantially equivalent to the one of the previous asymptotic proofs. The best estimate known for the remainder is $\ll x \exp(-c (\ln x)^{3/5}/(\ln \ln x)^{1/5})$ for some $c > 0$, which is due to Vinogradov, but its proof is too difficult for a first course in Analytic Number Theory.
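The difference in quality between the two approximations $x/\ln x$ and $\mathrm{li}(x)$ is already visible for moderate values of $x$. The following sketch computes $\pi(x)$ with a sieve and $\mathrm{li}(x) = \int_2^x du/\ln u$ with the composite Simpson rule after the substitution $u = e^t$; the function names, the number of integration steps and the value of $x$ are arbitrary choices made for this illustration.

```python
import math


def prime_count(x):
    """pi(x) for an integer x >= 2, by the sieve of Eratosthenes."""
    sieve = bytearray([1]) * (x + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, math.isqrt(x) + 1):
        if sieve[p]:
            sieve[p * p :: p] = bytearray(len(range(p * p, x + 1, p)))
    return sum(sieve)


def li(x, steps=100_000):
    """li(x) = int_2^x du / ln(u) = int_{ln 2}^{ln x} e^t / t dt,
    approximated by the composite Simpson rule (steps must be even)."""
    a, b = math.log(2.0), math.log(x)
    h = (b - a) / steps
    total = math.exp(a) / a + math.exp(b) / b
    for j in range(1, steps):
        t = a + j * h
        total += (4 if j % 2 else 2) * math.exp(t) / t
    return total * h / 3


x = 10 ** 5
print(prime_count(x), x / math.log(x), li(x))
```

For this $x$ the error of $\mathrm{li}(x)$ is a few dozen units, while $x/\ln x$ misses $\pi(x)$ by several hundred.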

⁷ With only an intuitive meaning for the symbol $\sim$, since the notion of asymptotic equality was not yet formalized at that time. Actually, Legendre's claim was that
\[ \pi(x) \sim \frac{x}{\ln x + c} \]
with an explicit $c = -1.08366$, while Gauss' claim was that
\[ \pi(x) \sim \int_2^x \frac{du}{\ln u}. \]
When $\sim$ is used with the modern meaning, both claims become equivalent to (1.25).
⁸ To fully understand this claim see for example Goldfeld's paper freely available at the web address: www.math.columbia.edu/~goldfeld/ErdosSelbergDispute.pdf

Remark. 1.3 We notice that $\mathrm{li}(x) = \frac{x}{\ln x} - \frac{2}{\ln 2} + \int_2^x \frac{du}{\ln^2 u}$ with $\int_2^x \frac{du}{\ln^2 u} \sim \frac{x}{\ln^2 x}$. As a consequence the theorem proves both that
\[ |\pi(x) - \mathrm{li}(x)| \ll_A \frac{x}{\ln^A x} \]
for every $A$, and that
\[ \pi(x) - \frac{x}{\ln x} \sim \frac{x}{\ln^2 x}, \]
while the claim as conjectured by Legendre would imply that
\[ \pi(x) - \frac{x}{\ln x} \ge (-c + o(1))\frac{x}{\ln^2 x} = (1.08366 + o(1))\frac{x}{\ln^2 x}. \]
This shows that Gauss' conjecture is closer to the truth than Legendre's. □

The aim of this section is to prove this theorem. First we reformulate the problem.

Proposition 1.16 (Chebyshev) The following claims are equivalent:
1A. $\pi(x) \sim \frac{x}{\ln x}$,
2A. $\vartheta(x) \sim x$, where $\vartheta(x) := \sum_{p\le x}\ln p$,
3A. $\psi(x) \sim x$, where $\psi(x) := \sum_{n\le x}\Lambda(n)$.
Also the following claims are equivalent:
1B. $\pi(x) - \mathrm{li}(x) = O_A\big(\frac{x}{\ln^A x}\big)$, for every $A > 1$,
2B. $\vartheta(x) - x = O_A\big(\frac{x}{\ln^A x}\big)$, for every $A > 1$,
3B. $\psi(x) - x = O_A\big(\frac{x}{\ln^A x}\big)$, for every $A > 1$.

Proof. Equality (1.23) says that $\psi(x) = \vartheta(x) + O(\sqrt{x})$, so that the equivalences 2A ⟺ 3A and 2B ⟺ 3B immediately follow.
1A ⟺ 2A. The definition of $\vartheta$ and the monotonicity of $\ln$ produce the double bound
\[ (\pi(x) - \pi(x/\ln x))\ln(x/\ln x) \le \sum_{x/\ln x < p \le x}\ln p \le \vartheta(x) \le \pi(x)\ln x. \tag{1.26} \]

Using the result in Proposition 1.12 (but also the weaker Proposition 1.10, or the even weaker bound (1.20), suffices) we get
\[ \frac{\vartheta(x)}{\ln x} \le \pi(x) \le \frac{\vartheta(x)}{\ln x - \ln\ln x} + O\Big(\frac{x}{\ln^2 x}\Big) \]
so that it is now clear that 2A implies 1A.

Now the proof of the other set of relations.
1B ⟹ 2B. The partial summation formula and the decomposition $\vartheta(x) = \sum_{p\le x}\ln p = \sum_{n\le x}\delta_P(n)\cdot\ln n$ give
\[ \vartheta(x) = \pi(x)\ln x - \int_2^x \frac{\pi(u)}{u}\,du. \]
Moreover,
\[ \int_2^x \frac{\mathrm{li}(u)}{u}\,du = \ln(x)\,\mathrm{li}(x) - x + 2 \]
(because in our definition $\mathrm{li}(2) = 0$), therefore
\[ \vartheta(x) - x = (\pi(x) - \mathrm{li}(x))\ln x - \int_2^x (\pi(u) - \mathrm{li}(u))\frac{du}{u} + O(1), \]
which shows immediately that 1B implies 2B.
2B ⟹ 1B. The partial summation formula and the decomposition $\pi(x) = \sum_{p\le x} 1 = \sum_{n\le x}\delta_P(n)\ln n\cdot\frac{1}{\ln n}$ give
\[ \pi(x) = \frac{\vartheta(x)}{\ln x} + \int_2^x \frac{\vartheta(u)}{u\ln^2 u}\,du. \]
Moreover,
\[ \mathrm{li}(x) = \frac{x}{\ln x} + \int_2^x \frac{du}{\ln^2 u} - \frac{2}{\ln 2}, \]
so that
\[ \pi(x) - \mathrm{li}(x) = \frac{\vartheta(x) - x}{\ln x} + \int_2^x \Big(\frac{\vartheta(u)}{u} - 1\Big)\frac{du}{\ln^2 u} + O(1), \]
which shows immediately that 2B implies 1B. □

The following statement is similar to the previous ones, but has a more complicated proof.

Proposition 1.17 Let $M(x) := \sum_{n\le x}\mu(n)$. Then:
1. $\psi(x) \sim x$ if and only if $M(x) = o(x)$;
2. $\psi(x) - x = O_A\big(\frac{x}{\ln^A x}\big)$ if and only if $M(x) \ll_A \frac{x}{\ln^A x}$, for every $A > 1$.

Proof. We prove only the implications ⇐=, for the opposite implications see [IK], p. 33. We have proved that Λ = µ ∗ ln, 1 = µ ∗ d and δ = µ ∗ 1. Therefore, X Λ(n) − 1 + 2γδ(n) = µ(k)(ln(n/k) − d(n/k) + 2γ). k|n Adding this equality for n ≤ x we have X X ψ(x) − x + O(1) = µ(k)(ln(n/k) − d(n/k) + 2γ) n≤x k|n X = µ(m)(ln(n) − d(n) + 2γ), m,n mn≤x where the Big-O term appears to take account of the 2γ constant and the fact that {x} can be non-zero. We can write this sum in two different ways, namely: X X as (ln(n) − d(n) + 2γ) µ(m), n≤x m≤x/n X X and as µ(m) (ln(n) − d(n) + 2γ). m≤x n≤x/m Actually, both of them are not useful for us per se, however we introduce a new parameter K and we split the original sum according to the first formula when n ≤ K, and to the second formula when K < n ≤ x. In this way we get: X ψ(x) − x + O(1) = µ(m)(ln(n) − d(n) + 2γ) m,n mn≤x X X = µ(m)(ln(n) − d(n) + 2γ) + µ(m)(ln(n) − d(n) + 2γ) m,n m,n mn≤x mn≤x n≤K n>K X X = (ln(n) − d(n) + 2γ) µ(m) n≤K m≤x/n X X + µ(m) (ln(n) − d(n) + 2γ) m≤x/K K

In absolute value this means that
\[ |\psi(x) - x| \le O(1) + \sum_{n\le K}|\ln(n) - d(n) + 2\gamma|\cdot|M(x/n)| + \sum_{m\le x/K}\big(|\Delta(x/m)| + |\Delta(K)|\big). \]
Recalling the result in Proposition 1.15 we get
\[ |\psi(x) - x| \le O(1) + \sum_{n\le K}|\ln(n) - d(n) + 2\gamma|\cdot|M(x/n)| + O\Big(\sum_{m\le x/K}\big(\sqrt{x/m} + \sqrt{K}\big)\Big), \]
and recalling the result in Ex. 1.12 we conclude that there exists a constant $c > 0$ independent of $K$ such that
\[ |\psi(x) - x| \le O(1) + \sum_{n\le K}|\ln(n) - d(n) + 2\gamma|\cdot|M(x/n)| + cx/\sqrt{K}. \tag{1.27} \]

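Suppose that $M(x) = o(x)$. Let $\varepsilon > 0$ be arbitrarily fixed. We first choose $K$ such that $c/\sqrt{K} < \varepsilon$, then we choose $x_\varepsilon$ large enough to have
\[ \sum_{n\le K}|\ln(n) - d(n) + 2\gamma|\cdot|M(x/n)| \le \varepsilon x \qquad \forall x \ge x_\varepsilon: \]
such an $x_\varepsilon$ exists, because $M(x) = o(x)$ and $K$ is fixed. As a consequence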

\[ |\psi(x) - x| \le 2\varepsilon x + O(1) \qquad \forall x > x_\varepsilon, \]
and this means that $\psi(x) = x + o(x)$, i.e. $\psi(x) \sim x$.
Now, suppose $M(x) \ll_A \frac{x}{\ln^A x}$. Fix an (arbitrary) value for $A$, and take $K = \ln^{A/2} x$ in (1.27). Then
\begin{align*}
|\psi(x) - x| &\le \sum_{n\le \ln^{A/2} x}|\ln(n) - d(n) + 2\gamma|\cdot|M(x/n)| + O(x/\ln^{A/4} x) \\
&\le O_A\Big(\sum_{n\le \ln^{A/2} x}(\ln(n) + d(n) + 2\gamma)\frac{x/n}{(\ln x - \ln n)^A}\Big) + O(x/\ln^{A/4} x) \\
&= O_A\Big(\frac{x}{(\ln x - A\ln\ln x)^A}\sum_{n\le \ln^{A/2} x}(\ln(n) + d(n) + 2\gamma)\Big) + O(x/\ln^{A/4} x).
\end{align*}
Recalling that $\sum_{n\le N}\ln n \sim N\ln N$ (Stirling) and that $\sum_{n\le N} d(n) \sim N\ln N$ (Dirichlet, see Prop. 1.15), we have
\begin{align*}
&= O_A\Big(\frac{x}{(\ln x - A\ln\ln x)^A}\,\ln^{A/2} x\,\ln\ln x\Big) + O(x/\ln^{A/4} x) \\
&= O_A(x/\ln^{A/4} x),
\end{align*}
which is the claim. □

Exercise. 1.32 Prove that if $\ell \in \mathbb{N}$ and $\mathrm{Re}(s) > 1$, then
\[ \int_1^{+\infty}\frac{(\ln x)^\ell}{x^s}\,dx = \frac{\ell!}{(s-1)^{\ell+1}}. \]
Hint: Prove the equality for $s \in \mathbb{R}$, $s > 1$ with the substitution $x = e^{u/(s-1)}$. Then observe that both RHS and LHS are holomorphic functions in $\mathrm{Re}(s) > 1$ and apply the identity principle for holomorphic functions to conclude that the equality holds in the entire half-plane $\mathrm{Re}(s) > 1$. Alternatively, apply the substitution $x = e^{u/(s-1)}$ to deduce that
\[ \int_1^{+\infty}\frac{(\ln x)^\ell}{x^s}\,dx = \frac{1}{(s-1)^{\ell+1}}\int_\Gamma u^\ell e^{-u}\,du \]
where $\Gamma$ is a ray from $0$ to $\infty$ contained in the right half of $\mathbb{C}$ (here the hypothesis $\mathrm{Re}(s) > 1$ is used), then prove that $\Gamma$ can be deformed to the positive real line and recall that $\int_0^{+\infty} u^\ell e^{-u}\,du = \ell!$. Alternatively (again!) prove the claim for $\ell = 0$ directly, and deduce the claim for general $\ell$ by taking the $\ell$th derivative of the claim for $\ell = 0$ (but you need a good argument allowing you to pass the derivative inside the integral).

Exercise. 1.33 Prove that if $\ell \in \mathbb{N}$ and $N \to \infty$, then
\[ \int_N^{+\infty}\frac{(\ln x)^\ell}{x^2}\,dx \ll_\ell \frac{(\ln N)^\ell}{N}. \]
Hint: The claim is true when $\ell = 0$. Then apply an integration by parts to get
\[ \int_N^{+\infty}\frac{(\ln x)^\ell}{x^2}\,dx = \frac{(\ln N)^\ell}{N} + \ell\int_N^{+\infty}\frac{(\ln x)^{\ell-1}}{x^2}\,dx \]
and use the inductive hypothesis.

Exercise. 1.34 (Perron) Prove that for every $\sigma > 0$ and for every $y > 0$,
\[ \frac{1}{2\pi i}\int_{\sigma-i\infty}^{\sigma+i\infty}\frac{y^s}{s^2}\,ds = \ln^+ y := \begin{cases} \ln y & \text{if } y \ge 1,\\ 0 & \text{if } y \in (0,1]. \end{cases} \]
Hint: Let $T > 0$. If $y < 1$ take the integral along the path $\sigma - iT \to T - iT \to T + iT \to \sigma + iT \to \sigma - iT$ (this means that the original integration path is deformed to a path extending itself to the right half-plane: this is useful because $y^s$ is small here). The integral is zero by Cauchy, then let $T \to +\infty$. If $y > 1$, take the integral along the path $\sigma - iT \to -T - iT \to -T + iT \to \sigma + iT \to \sigma - iT$ (this means that the original integration path is deformed to a path extending itself to the left half-plane: this is useful because $y^s$ is small here). The integral is equal to $\ln y$, which is the residue at $0$, then let $T \to +\infty$.

Exercise. 1.35 (Perron) Prove that for every $\sigma > 0$, for every $y > 0$ and for every integer $\kappa \in \mathbb{N}$,
\[ \frac{\kappa!}{2\pi i}\int_{\sigma-i\infty}^{\sigma+i\infty}\frac{y^s}{s^{\kappa+1}}\,ds = (\ln^+ y)^\kappa. \]
The formula is correct also for $\kappa = 0$, but in this case the integral is not absolutely convergent and must be defined as $\lim_{T\to\infty}\int_{\sigma-iT}^{\sigma+iT}$ (and in that case it produces the value $0$ for $y < 1$, $1/2$ when $y = 1$ and $1$ for $y > 1$). What happens if $\kappa$ is taken in $\mathbb{C}$? (difficult).

Exercise. 1.36 Let $f$ be a function admitting derivatives of every order. Then
\[ f\,\Big(\frac{1}{f}\Big)^{(\ell)} = \ell!\sum_{\substack{a_1,a_2,a_3,\ldots\ge 0\\ a_1+2a_2+3a_3+\cdots=\ell}}\frac{(a_1+a_2+a_3+\cdots)!}{a_1!\,a_2!\,a_3!\cdots}\Big(\frac{-f'}{1!\,f}\Big)^{a_1}\Big(\frac{-f''}{2!\,f}\Big)^{a_2}\Big(\frac{-f'''}{3!\,f}\Big)^{a_3}\cdots \]
Hint: The claim is evident for $\ell = 0$; then differentiate the equality for $\ell$, obtaining on the LHS
\[ f\,\Big(\frac{1}{f}\Big)^{(\ell+1)} + \frac{f'}{f}\,f\,\Big(\frac{1}{f}\Big)^{(\ell)}. \]
Now move the second term to the RHS; applying the inductive hypothesis, the claim for $\ell + 1$ follows (after a very tedious computation). Alternatively, this is the case $g(x) = 1/x$ of Faà di Bruno's formula for the $\ell$th derivative of $g(f(x))$.⁹

A few words are necessary in order to understand the main points of the proof of the Prime Number Theorem. We will prove it by proving a bound for $M(x) = \sum_{n\le x}\mu(n)$, and using Proposition 1.17 to deduce an analogous bound for $\psi(x) - x$ and Proposition 1.16 to get the claim for $\pi(x) - \mathrm{li}(x)$.
The sum $M(x)$ can be represented as a complex integral (via a celebrated formula due to Perron) involving the Dirichlet series associated with the Möbius function. This function is $1/\zeta(s)$, thus in order to get an upper bound for $M(x)$ we need an upper bound for $1/\zeta(s)$, i.e. a lower bound for $\zeta(s)$. This means that we need some kind of control on the positions of the zeros of $\zeta(s)$ (besides the control on the singularities of $\zeta(s)$). The result in Proposition 1.18 here below is the key

ingredient allowing one to deduce a lower bound for $\zeta(s)$ from an analogous upper bound for $\zeta(s)$: although given in a different language, it was already present in the original works of Hadamard and de la Vallée-Poussin. The argument will be concluded with a standard application of the Euler-Maclaurin formula giving the necessary upper bound for $\zeta(s)$ (see Proposition 1.19). The good bound for the remainder term will come out by applying this approach to the generic derivative of $\zeta(s)$ and not to $\zeta(s)$ alone.

⁹ For a nice presentation of this result see the paper of Warren P. Johnson: The Curious History of Faà di Bruno's Formula, Amer. Math. Monthly 109(3), 217–234, 2002.

Proposition 1.18 Let $\sigma > 1$ and $t \in \mathbb{R}$, then
\[ \zeta(\sigma)^3\,|\zeta(\sigma+it)|^4\,|\zeta(\sigma+2it)|^2 \ge 1. \]
Proof. In the region $\mathrm{Re}(s) > 1$ the Riemann zeta function is represented by its Euler product and has no zeros, therefore $\ln\zeta(s)$ is well defined here and can be computed as the series (over $p$) of the logarithms of the Euler factors, thus
\begin{align*}
\ln\big(\zeta(\sigma)^3|\zeta(\sigma+it)|^4|\zeta(\sigma+2it)|^2\big)
&= \ln\big(\zeta(\sigma)^3\,\zeta^2(\sigma+it)\,\overline{\zeta^2(\sigma+it)}\,\zeta(\sigma+2it)\,\overline{\zeta(\sigma+2it)}\big) \\
&= \ln\big(\zeta(\sigma)^3\,\zeta^2(\sigma+it)\,\zeta^2(\sigma-it)\,\zeta(\sigma+2it)\,\zeta(\sigma-2it)\big) \\
&= -\sum_p\Big[3\ln\Big(1-\frac{1}{p^\sigma}\Big) + 2\ln\Big(1-\frac{p^{-it}}{p^\sigma}\Big) + 2\ln\Big(1-\frac{p^{it}}{p^\sigma}\Big) + \ln\Big(1-\frac{p^{-2it}}{p^\sigma}\Big) + \ln\Big(1-\frac{p^{2it}}{p^\sigma}\Big)\Big] \\
&= \sum_p\sum_{m=1}^\infty\frac{1}{mp^{m\sigma}}\big(3 + 2p^{-imt} + 2p^{imt} + p^{-2imt} + p^{2imt}\big) \\
&= \sum_p\sum_{m=1}^\infty\frac{1}{mp^{m\sigma}}\big(1 + p^{imt} + p^{-imt}\big)^2 \ge 0. \qquad\square
\end{align*}

Corollary 1.5 $\zeta(1+it) \ne 0$ for every $t \in \mathbb{R}$.
Proof. By contradiction, suppose that $\zeta$ has a zero along the vertical line $\sigma = 1$, so that there exists $t_0$ such that $\zeta(1+it_0) = 0$. Then $t_0 \ne 0$, because $\zeta$ has a simple pole at $s = 1$. The function $\zeta$ is holomorphic in an open neighborhood of $1+it_0$, therefore $\zeta(s) \sim c(s-1-it_0)^n$ for some constant $c \in \mathbb{C}^*$ and some integer $n \ge 1$. Moreover, $\zeta(s)$ is regular in a neighborhood of $1+2it_0$, hence from the inequality proved in the previous proposition we deduce that
\[ 1 \le \zeta(\sigma)^3|\zeta(\sigma+it_0)|^4|\zeta(\sigma+2it_0)|^2 \ll (\sigma-1)^{-3}(\sigma-1)^{4n} \quad\text{as } \sigma\to 1^+, \]
which is impossible because $4n - 3 > 0$. □

Remark. 1.4 There are other ways to prove the previous corollary. One of them, actually a particularly elegant one, is due to Ingham and is worthy of mention here. Suppose by contradiction that $\zeta(1+it_0) = 0$. Evidently $t_0 \ne 0$ and by the Schwarz reflection principle it follows that $\zeta(1-it_0)$ is $0$, as well. The function $F(s) := \zeta(2s)\sum_{n=1}^\infty|\sigma_{it_0}(n)|^2 n^{-s}$ is a Dirichlet series with positive Dirichlet coefficients. The Ramanujan identity in Ex. 1.19 shows that this function equals $\zeta^2(s)\zeta(s-it_0)\zeta(s+it_0)$, so that it is holomorphic in $\mathbb{C}$ (the double pole of $\zeta^2(s)$ at $s = 1$ is cancelled out by the two zeros $\zeta(1+it_0)$ and $\zeta(1-it_0)$).
By the Landau Theorem 1.2 the Dirichlet series representing F (s) converges at every complex point, but this P∞ 2 −s is impossible since the representation F (s) := ζ(2s) n=1 |σit0 (n)| n shows that the Dirichlet coefficients f(n) in F (s) corresponding to squares are greater than 1, so that the series cannot converge when Re(s) ≤ 1/2.  Proposition 1.19 Let Re(s) > 1, then `! (−1)`ζ(`)(s) = + O (ln(3|s|))`+1 (s − 1)`+1 ` where the symbol O` means that the implicit constant depends on `, so that the result is not uniform in this parameter. Proof. In the region Re(s) > 1 we have ∞ X (ln n)` (−1)`ζ(`)(s) = . ns n=1 We introduce a parameter N whose value will be chosen later, and we split the P P series as n 1, therefore X (ln n)` X (ln n)` X 1 ≤ ≤ (ln N)`  (ln N)`+1. ns nσ n n

(here we have used the assumption $\mathrm{Re}(s) > 1$ and the bound $|x^{-s-1}| \le x^{-2}$). Recalling Ex. 1.32 we can write this bound also as

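\begin{align*}
&= \int_1^\infty\frac{(\ln x)^\ell}{x^s}\,dx - \int_1^N\frac{(\ln x)^\ell}{x^s}\,dx + O_\ell\Big(|s|\frac{(\ln N)^\ell}{N}\Big) \\
&= \frac{\ell!}{(s-1)^{\ell+1}} + O\big((\ln N)^{\ell+1}\big) + O_\ell\Big(|s|\frac{(\ln N)^\ell}{N}\Big).
\end{align*}
Collecting the results, we see that
\[ (-1)^\ell\zeta^{(\ell)}(s) = \frac{\ell!}{(s-1)^{\ell+1}} + O\big((\ln N)^{\ell+1}\big) + O_\ell\Big(|s|\frac{(\ln N)^\ell}{N}\Big), \]
when $\mathrm{Re}(s) > 1$, for every choice of the parameter $N \ge 1$. We get the claim by setting $N = 3|s|$. □

Finally we can attack the proof of the bound
\[ M(x) \ll_A \frac{x}{\ln^A x} \qquad \forall A > 1. \]
Suppose $s \in \mathbb{C}$, with $\mathrm{Re}(s) > 1$ and $\ell \in \mathbb{N}$. Let
\[ G(s) := (-1)^\ell(1/\zeta)^{(\ell)}(s) = \sum_{n=1}^\infty\mu(n)(\ln n)^\ell/n^s. \]
Perron's formula in Ex. 1.34 gives¹⁰
\begin{align*}
\frac{1}{2\pi i}\int_{\sigma-i\infty}^{\sigma+i\infty}G(s)X^s\frac{ds}{s^2}
&= \frac{1}{2\pi i}\int_{\sigma-i\infty}^{\sigma+i\infty}\sum_{n=1}^\infty\mu(n)(\ln n)^\ell\Big(\frac{X}{n}\Big)^s\frac{ds}{s^2} \\
&= \sum_{n=1}^\infty\mu(n)(\ln n)^\ell\,\frac{1}{2\pi i}\int_{\sigma-i\infty}^{\sigma+i\infty}\Big(\frac{X}{n}\Big)^s\frac{ds}{s^2}
\end{align*}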

\[ = \sum_{n\le X}\mu(n)(\ln n)^\ell\ln(X/n) =: F(X). \]

¹⁰ Recall that
\[ \frac{1}{2\pi i}\int_{\sigma-i\infty}^{\sigma+i\infty}\sum_{n=1}^\infty\mu(n)(\ln n)^\ell\Big(\frac{X}{n}\Big)^s\frac{ds}{s^2} = \frac{1}{2\pi}\int_{\mathbb{R}}\sum_{n=1}^\infty\mu(n)(\ln n)^\ell\Big(\frac{X}{n}\Big)^{\sigma+it}\frac{dt}{(\sigma+it)^2} \]
so that
\begin{align*}
\frac{1}{2\pi}\int_{\mathbb{R}}\sum_{n=1}^\infty\Big|\mu(n)(\ln n)^\ell\Big(\frac{X}{n}\Big)^{\sigma+it}\frac{1}{(\sigma+it)^2}\Big|\,dt
&\le \frac{1}{2\pi}\int_{\mathbb{R}}\sum_{n=1}^\infty(\ln n)^\ell\Big(\frac{X}{n}\Big)^\sigma\frac{dt}{\sigma^2+t^2} \\
&= X^\sigma\cdot\sum_{n=1}^\infty\frac{(\ln n)^\ell}{n^\sigma}\cdot\frac{1}{2\pi}\int_{\mathbb{R}}\frac{dt}{\sigma^2+t^2} < +\infty,
\end{align*}
because $\sigma > 1$ by hypothesis. By Fubini's Theorem, the convergence of this integral allows one to exchange the integral and the series.

We use this equality to deduce a bound for $F(X)$ from a bound for $G(s)$. We need some intermediate bounds. Proposition 1.19 gives
\[ \zeta(s) = \frac{1}{s-1} + O(\ln(3|s|)) \qquad \mathrm{Re}(s) > 1. \]
From Proposition 1.18 we have for $|t| > 1$ and $\mathrm{Re}(s) \in (1,2)$
\[ 1 \le \zeta(\sigma)^3|\zeta(\sigma+it)|^4|\zeta(\sigma+2it)|^2 \ll \frac{1}{(\sigma-1)^3}|\zeta(s)|^4(\ln(3|s|))^2 \]
(the bound $\zeta(\sigma) \ll (\sigma-1)^{-1}$ holds because $\mathrm{Re}(s) \in (1,2)$, and $\zeta(\sigma+2it) \ll \ln(3|s|)$ because $|t| > 1$) so that
\[ \frac{1}{|\zeta(s)|} \ll (\sigma-1)^{-3/4}(\ln(3|s|))^{1/2}, \qquad \mathrm{Re}(s) \in (1,2),\ |t| > 1. \tag{1.28} \]
This bound has been proved under the assumption that $|t| > 1$, but it is true also for $|t| \le 1$ (and $\mathrm{Re}(s) \in [1,2]$), since in this set the function $\frac{1}{|\zeta(s)|}$ is bounded (the pole of $\zeta(s)$ at $1$ produces a zero for $1/\zeta(s)$, and there are no zeros for $\zeta(s)$ here by Corollaries 1.3 and 1.5, so that $1/\zeta(s)$ is continuous and hence bounded in the compact set $[1,2]\times[-1,1]$). Therefore, we can conclude that
\[ \frac{1}{|\zeta(s)|} \ll (\sigma-1)^{-3/4}(\ln(3|s|))^{1/2}, \qquad \mathrm{Re}(s) \in (1,2), \tag{1.29} \]
which is the lower bound for $|\zeta(s)|$ that we have mentioned in the introduction to the proof of the Prime Number Theorem.
Let $\zeta^*(s) := (s-1)\zeta(s)$. This function is essentially equivalent to the Riemann zeta function, but has a better behavior for $s \to 1$. Since $(\zeta^*)^{(\ell)}(s) = (s-1)\zeta^{(\ell)}(s) + \ell\zeta^{(\ell-1)}(s)$, from Proposition 1.19 (applied twice, to $\zeta^{(\ell)}(s)$ and $\zeta^{(\ell-1)}(s)$) we get that
\[ (\zeta^*)^{(\ell)}(s) \ll_\ell |s|(\ln(3|s|))^{\ell+1} \qquad \mathrm{Re}(s) \in (1,2), \tag{1.30} \]
and from (1.28) that
\[ \frac{1}{|\zeta^*(s)|} \ll |s|^{-1}(\sigma-1)^{-3/4}(\ln(3|s|))^{1/2}, \qquad \mathrm{Re}(s) \in (1,2),\ |t| > 1. \]
As before, this bound is proved for $|t| > 1$, but it holds also in $|t| \le 1$, because $\frac{1}{|\zeta^*(s)|}$ is continuous in the compact set $[1,2]\times[-1,1]$. Therefore, we have also
\[ \frac{1}{|\zeta^*(s)|} \ll |s|^{-1}(\sigma-1)^{-3/4}(\ln(3|s|))^{1/2}, \qquad \mathrm{Re}(s) \in (1,2), \]
and using this bound and (1.30) we deduce
\[ \frac{(\zeta^*)^{(\ell)}(s)}{\zeta^*(s)} \ll_\ell (\sigma-1)^{-\frac34}(\ln(3|s|))^{\ell+\frac32} \qquad \mathrm{Re}(s) \in (1,2). \]
The formula in Ex. 1.36 shows that
\[ |(1/\zeta^*)^{(\ell)}(s)| \ll \frac{1}{|\zeta^*(s)|}\sum_{\substack{a_1,a_2,a_3,\ldots\ge 0\\ a_1+2a_2+3a_3+\cdots=\ell}}\Big(\frac{|(\zeta^*)'(s)|}{|\zeta^*(s)|}\Big)^{a_1}\Big(\frac{|(\zeta^*)''(s)|}{|\zeta^*(s)|}\Big)^{a_2}\cdots \]
so that
\[ |(1/\zeta^*)^{(\ell)}(s)| \ll \frac{(\sigma-1)^{-\frac34}}{|s|}(\ln(3|s|))^{\frac12}\cdot\sum_{\substack{a_1,a_2,a_3,\ldots\ge 0\\ a_1+2a_2+3a_3+\cdots=\ell}}(\sigma-1)^{-\frac34(a_1+a_2+\cdots)}(\ln(3|s|))^{(1+\frac32)a_1+(2+\frac32)a_2+\cdots}. \]
We take $\sigma \in (1,2)$; in this range $(\sigma-1)^{-\frac34} > 1$ and $a_1+a_2+\cdots \le \ell$, hence $(\sigma-1)^{-\frac34(a_1+a_2+\cdots)} \le (\sigma-1)^{-\frac34\ell}$. Moreover, $(1+\frac32)a_1 + (2+\frac32)a_2 + \cdots \le (1+\frac32)a_1 + 2(1+\frac32)a_2 + \cdots = \frac52\ell$, therefore $(\ln(3|s|))^{(1+\frac32)a_1+(2+\frac32)a_2+\cdots} \le (\ln(3|s|))^{\frac52\ell}$. In this way we obtain the upper bound
\[ (1/\zeta^*)^{(\ell)}(s) \ll_\ell |s|^{-1}(\sigma-1)^{-\frac34(\ell+1)}(\ln(3|s|))^{\frac52(\ell+1)} \qquad \mathrm{Re}(s) \in (1,2). \]
At last, using the representation $1/\zeta(s) = (s-1)/\zeta^*(s)$ giving
\[ \Big(\frac{1}{\zeta(s)}\Big)^{(\ell)} = (s-1)\Big(\frac{1}{\zeta^*(s)}\Big)^{(\ell)} + \ell\Big(\frac{1}{\zeta^*(s)}\Big)^{(\ell-1)}, \]
and the previous bound, we get
\[ G(s) = (-1)^\ell(1/\zeta)^{(\ell)}(s) \ll_\ell (\sigma-1)^{-\frac34(\ell+1)}(\ln(3|s|))^{\frac52(\ell+1)} \qquad \mathrm{Re}(s) \in (1,2). \tag{1.31} \]
Plugging this bound into the integral formula for $F(X)$ we have
\begin{align*}
F(X) = \frac{1}{2\pi i}\int_{\sigma-i\infty}^{\sigma+i\infty}G(s)X^s\frac{ds}{s^2}
&\ll \int_{\mathbb{R}}|G(s)|X^\sigma\frac{dt}{\sigma^2+t^2} \\
&\ll_\ell (\sigma-1)^{-\frac34(\ell+1)}\int_{\mathbb{R}}(\ln(3|\sigma+it|))^{\frac52(\ell+1)}X^\sigma\frac{dt}{\sigma^2+t^2} \\
&\ll_\ell X^\sigma(\sigma-1)^{-\frac34(\ell+1)}.
\end{align*}
This bound being uniform in $\sigma \in (1,2)$, for $\sigma = 1 + 1/\ln X$ we have
\[ \sum_{n\le X}\mu(n)(\ln n)^\ell\ln(X/n) = F(X) \ll_\ell X(\ln X)^{\frac34(\ell+1)}, \tag{1.32} \]
uniformly in $X$.

Remark. 1.5 We take a break, now, to discuss the result we have just proved and P ` its importance for our purpose. Consider the quantity n≤X µ(n)(ln n) ln(X/n). In this sum there are X terms; lot of them have order ln X, but they have different signs as a consequence of the factor µ(n). If we leave out the cancellation coming from the different signs, i.e. if we consider the sum X |µ(n)|(ln n)` ln(X/n), n≤X then we have a sum which has order X(ln X)`: the upper bound follows by the Euler-Maclaurin summation formula, while the lower bound follows by noticing that X X |µ(n)|(ln n)` ln(X/n) ≥ |µ(n)|(ln n)` ln(X/n) n≤X X X 3 0 is parameter that we will choose later. In fact, X  Y  F (X + Y ) − F (X) = µ(n)(ln n)` ln 1 + X n≤X X + µ(n)(ln n)` ln((X + Y )/n), X

 X  + O (ln n)` ln((X + Y )/n) X

We have finally reached our goal: the proof of the Prime Number Theorem with a convenient estimate for the remainder term. What next? We cannot leave the present topic without an explicit mention of the Riemann Hypothesis and of its relevance for the Prime Number Theorem. What is the 'right' estimate for the remainder term in the P.N.Th.? Using the argument in Propositions 1.16 and 1.17 it can be proved that
\[ \pi(x) = \mathrm{li}(x) + O(x^\delta) \quad\text{for some fixed } \delta \in (1/2,1), \]
if and only if
\[ \psi(x) = x + O(x^\delta) \quad\text{for some fixed } \delta \in (1/2,1), \]
if and only if
\[ M(x) \ll x^\delta \quad\text{for some fixed } \delta \in (1/2,1), \]
if and only if
\[ \frac{1}{\zeta(s)} = \sum_{n=1}^\infty\frac{\mu(n)}{n^s} \quad\text{converges for every } s \text{ with } \mathrm{Re}(s) > \delta, \]
if and only if
\[ \zeta(s) \ne 0 \quad\text{for every } s \text{ with } \mathrm{Re}(s) > \delta. \]
The general theory of Hadamard on Weierstrass products (or the more complete Nevanlinna theory about the distribution of values of entire functions), connecting the growth of an entire function to the number of its zeros, implies that $\zeta(s)$ has infinitely many zeros in the vertical strip $\mathrm{Re}(s) \in [0,1]$. The functional equation connecting $\zeta(s)$ with $\zeta(1-s)$ proves that the vertical strip $\mathrm{Re}(s) \in [1/2,1]$ contains infinitely many zeros of $\zeta(s)$, therefore the best (smallest) possible value for $\delta$ is $1/2 + \varepsilon$, with $\varepsilon > 0$ arbitrarily small. The Riemann Hypothesis (RH, in brief) actually claims that the prime number theorem holds with such a choice for $\delta$, or in other words, that $\zeta(s) \ne 0$ if $\mathrm{Re}(s) > 1/2$. To date, no admissible $\delta < 1$ is known: the best (largest) proved zero-free region for $\zeta(s)$ is simply the set
\[ \Big\{\sigma + it\colon\ \sigma > 1 - \frac{c}{(\ln|t|)^{2/3}(\ln\ln|t|)^{1/3}},\ |t| > 3\Big\} \]
for some $c > 0$, and is due to joint work of Korobov and Vinogradov.

Remark. 1.6 There are also other possible formulations of RH. The next two are quite striking because they come from apparently unrelated fields.

Nyman. For every $\alpha \in (0,1)$, let $\rho_\alpha(x) := \{\alpha/x\} - \alpha\{1/x\}$. Let $B$ be the closure in $L^2((0,1))$ of the $\mathbb{C}$-span of the functions $\rho_\alpha$. Then, RH holds iff $B = L^2((0,1))$.¹¹
Lagarias. Let $H_n := \sum_{j\le n} j^{-1}$ (the $n$th harmonic number). Then, RH holds iff
\[ \sigma(n) = \sum_{d|n} d \le H_n + \exp(H_n)\ln(H_n) \qquad \forall n, \]
with equality only for $n = 1$.¹²

¹¹ B. Nyman, On some groups and semigroups of translations, Ph.D. Thesis, Uppsala, 1950.
¹² J. C. Lagarias, An elementary problem equivalent to the Riemann hypothesis, Amer. Math. Monthly 109(6), 534–543 (2002).

In spite of the research that these conditions have triggered, at present they have not been of any concrete use towards a proof of RH. □

Remark. 1.7 Let two bounded arithmetical functions $f, g$ be given and suppose that
\[ \frac{1}{x}\sum_{n\le x} f(n)g(n) \]
tends to a constant as $x$ diverges. This limit is a kind of scalar product of $f$ and $g$. In this setting, it is natural to call orthogonal every pair of functions for which the limit is $0$. With this language, the fact that $M(x) = o(x)$ can be stated by saying that the Möbius function and the constant function are orthogonal, and it would be very interesting to determine/classify the set of functions which are orthogonal to $\mu$. In the next chapter we will see that $\mu$ is orthogonal to every Dirichlet character; for every pair of coprime integers $m$ and $q$ the function $e(\frac{m}{q}\,\cdot)\colon n \mapsto e(\frac{m}{q}n) := e^{2\pi i mn/q}$ can be written as a linear combination of Dirichlet characters modulo $q$. This fact shows immediately that $\mu$ and $e(\frac{m}{q}\,\cdot)$ are orthogonal. H. Davenport in 1937 was able to prove that this is a particular case of a more general result saying that $\mu$ and every $e(\alpha\,\cdot)\colon n \mapsto e(\alpha n) := e^{2\pi i\alpha n}$ are orthogonal, for each $\alpha \in \mathbb{R}$.¹³ In 2010 P. Sarnak proposed the conjecture that actually every function $g$ having 'low complexity' is orthogonal to $\mu$ (in a technical sense connected to the entropy of the dynamical system generated by $g$). As a sub-conjecture, Sarnak proposed a concrete class of functions that should be orthogonal to $\mu$, and this is exactly what B. Green has proved.¹⁴ □

At last, it is important to realize that RH is not the 'definitive' hypothesis, i.e. the key tool which will give an answer to all problems in Analytic Number Theory. For example, some sharp predictions about the behavior of the gaps between prime numbers are not consequences of RH in itself but follow from conjectures about the distribution of the zeros of $\zeta(s)$ along the vertical line $\mathrm{Re}(s) = 1/2$. We conclude the chapter with some exercises.

Exercise. 1.37 Let $\{p_n\}$ be the sequence of prime numbers. Prove that $\pi(x) \sim x/\ln x$ if and only if $p_n \sim n\ln n$.

Exercise. 1.38 The Prime Number Theorem, in the form we have proved it, implies that $\pi(x) = x/\ln x + x(1+o(1))/(\ln x)^2$. Deduce that $p_n = n(\ln n + \ln\ln n - 1 + o(1))$.
Remark: A stronger conclusion is proposed as Exercise 5 to Ch. 6.2 in [MV].
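The two approximations for $p_n$ in Ex. 1.37 and Ex. 1.38 can be compared with the true primes. The following sketch (helper names are ours; the sieve limit is an arbitrary choice large enough for the values of $n$ used) prints $p_n$ next to $n\ln n$ and $n(\ln n + \ln\ln n - 1)$.

```python
from math import isqrt, log

def primes_upto(limit):
    """List of all primes <= limit, via the sieve of Eratosthenes."""
    sieve = bytearray([1]) * (limit + 1)
    sieve[0] = sieve[1] = 0
    for p in range(2, isqrt(limit) + 1):
        if sieve[p]:
            sieve[p * p :: p] = bytearray(len(range(p * p, limit + 1, p)))
    return [n for n in range(2, limit + 1) if sieve[n]]

def first_order(n):
    """n ln n, the approximation of Ex. 1.37."""
    return n * log(n)

def second_order(n):
    """n (ln n + ln ln n - 1), the approximation of Ex. 1.38."""
    return n * (log(n) + log(log(n)) - 1)

if __name__ == "__main__":
    primes = primes_upto(200_000)
    for n in (10, 100, 1000, 10_000):
        print(n, primes[n - 1], round(first_order(n)), round(second_order(n)))
```

The second approximation is visibly closer, in agreement with the extra terms $\ln\ln n - 1$.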

¹³ See for example Th. 13.10 in [IK].
¹⁴ B. Green, On (not) computing the Möbius function using bounded depth circuits, Combin. Probab. Comput. 21(6), 2012, 942–951.

Exercise. 1.39 Fix $w > 0$. Prove that the set $S_w := \{p^w/q^w\colon p, q \in P\}$ is dense in $\mathbb{R}^+$.
Hint: first prove the claim for $w = 1$ using Ex. 1.37, then use the fact that the map $x \mapsto x^w$ is locally Lipschitz in $(0,+\infty)$.

Exercise. 1.40 Let $f\colon \mathbb{R}\to\mathbb{R}$ be a locally Hölder map (i.e., a map such that for every compact set $K$ there exist $c > 0$ and $\alpha > 0$ (in general depending on $K$) for which $|f(x) - f(y)| \le c|x-y|^\alpha$ holds for every $x, y \in K$). Prove that if $S$ is dense in $\mathbb{R}$ then $f(S)$ is dense in $f(\mathbb{R})$. Use this remark to extend the claim in Ex. 1.39.

Exercise. 1.41 (Primes in 'fat' intervals) Use the Prime Number Theorem in the form $\pi(x) \sim x/\ln x$ to prove that in $[x, x+h]$ there are asymptotically $h/\ln x$ prime numbers when $h = h(x) \asymp x$.

Exercise. 1.42 (Primes in 'not so fat' intervals) Let $B$ be a fixed positive number. Use the Prime Number Theorem in the form $\pi(x) = \mathrm{li}(x) + O_A(x/(\ln x)^A)$ to prove that in $[x, x+h]$ there are asymptotically $h/\ln x$ prime numbers when $h = h(x) \gg_B x/(\ln x)^B$.

Exercise. 1.43 (Primes in 'extra short' intervals) Prove that under RH in $[x, x+h]$ there are asymptotically $h/\ln x$ prime numbers whenever $h = h(x) \gg x^{1/2+\varepsilon}$.

Exercise. 1.44 Let $M(x) := \sum_{n\le x}\mu(n)$.
1) Prove that for every $N \ge 1$, $K \ge 0$ and $s \in \mathbb{C}$:
\[ \sum_{n=N+1}^{N+K}\frac{\mu(n)}{n^s} = \frac{M(N+K)}{(N+K)^s} - \frac{M(N)}{N^s} + s\int_N^{N+K}\frac{M(x)}{x^{s+1}}\,dx, \]
so that for $\sigma \ge 1$
\[ \Big|\sum_{n=N+1}^{N+K}\frac{\mu(n)}{n^s}\Big| \le \frac{|M(N+K)|}{N+K} + \frac{|M(N)|}{N} + |s|\int_N^{N+K}\frac{|M(x)|}{x^2}\,dx. \]
2) Using the upper bound for $M(x)$, deduce that the series $\sum_{n=1}^\infty\mu(n)/n^s$ converges in the closed half-plane $\sigma \ge 1$, uniformly in every bounded subset.

3) Deduce that $\sum_{n=1}^\infty\mu(n)/n^s = 1/\zeta(s)$ in $\sigma \ge 1$. In particular
\[ \sum_{n=1}^\infty\frac{\mu(n)}{n^{1+it}} = \frac{1}{\zeta(1+it)}, \qquad \forall t \in \mathbb{R}. \]
4) By Step 3 deduce that
\[ \sum_{n=1}^\infty\frac{\mu(n)}{n} = 0. \]
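The conclusion of Step 4 can be observed numerically, although of course a computation proves nothing about the limit. The sketch below (function names are ours) tabulates $M(N)$ and the partial sums $\sum_{n\le N}\mu(n)/n$ from a sieved table of $\mu$.

```python
def mobius_upto(n):
    """Return a list mu with mu[k] = Moebius function of k for 1 <= k <= n."""
    mu = [1] * (n + 1)
    is_prime = [True] * (n + 1)
    for p in range(2, n + 1):
        if is_prime[p]:
            for m in range(p, n + 1, p):
                if m > p:
                    is_prime[m] = False
                mu[m] = -mu[m]
            for m in range(p * p, n + 1, p * p):
                mu[m] = 0
    return mu

def mertens_and_weighted(n):
    """Return (M(n), sum_{k <= n} mu(k)/k)."""
    mu = mobius_upto(n)
    mertens = sum(mu[1:])
    weighted = sum(mu[k] / k for k in range(1, n + 1))
    return mertens, weighted

if __name__ == "__main__":
    for n in (10, 100, 1000, 10_000, 100_000):
        print(n, *mertens_and_weighted(n))
```

The partial sums are seen to be small compared with $1$, as one expects from the convergence to $0$ claimed in Step 4.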

Exercise. 1.45 Suppose $\mathrm{Re}(s) > 1$. How can one compute the value of $\zeta(s)$? The first idea is to use the representation $\zeta(s) = \sum_{n=1}^\infty 1/n^s$, but its rate of convergence is not very good.
1) Using the Euler-Maclaurin formula, in fact, prove that
\[ \Big|\sum_{n=S}^\infty\frac{1}{n^s}\Big| \le \sum_{n=S}^\infty\frac{1}{n^\sigma} \le \Big(1 + \frac{S}{\sigma-1}\Big)\frac{1}{S^\sigma}, \]
so that
\[ \Big|\zeta(s) - \sum_{n=1}^{S-1}\frac{1}{n^s}\Big| \le \Big(1 + \frac{S}{\sigma-1}\Big)\frac{1}{S^\sigma} \]
and in order to get the value of $\zeta(s)$ with an error below $10^{-4}$, for example, you need to take $S > 10^{4/(\sigma-1)}$.

2) Recall that $\zeta(s) = (1 - 2^{1-s})^{-1}\xi(s)$, where $\xi(s) := \sum_{n=1}^\infty(-1)^{n+1}/n^s$. Let $F(x) := \sum_{n\le x}(-1)^{n+1}$. Using the partial summation formula prove that
\[ \sum_{n=S}^\infty\frac{(-1)^{n+1}}{n^s} = -\frac{F(S)}{S^s} + s\int_S^{+\infty}\frac{F(x)}{x^{s+1}}\,dx \]
so that
\[ \Big|\sum_{n=S}^\infty\frac{(-1)^{n+1}}{n^s}\Big| \le \frac{1 + |s|/\sigma}{S^\sigma}. \]
It follows that
\[ \Big|\zeta(s) - (1 - 2^{1-s})^{-1}\sum_{n=1}^{S-1}\frac{(-1)^{n+1}}{n^s}\Big| \le \frac{(1 + |s|/\sigma)\,|1 - 2^{1-s}|^{-1}}{S^\sigma}. \tag{1.33} \]
With this formula, to get the value of $\zeta(s)$ with an error below $10^{-4}$ you need to take $S > 10^{4/\sigma}$: the advantage with respect to the previous formula is evident. (For example, in order to compute $\zeta(1.1+i)$ within $10^{-4}$ using the formula in Step 1 you need $S = 10^{50}$, while $S = 14000$ suffices with the second formula.)
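The approximation in (1.33) is easy to try out. The sketch below is an illustration only (the function names are ours): it evaluates the truncated alternating series together with the right-hand side of (1.33).

```python
from math import pi

def zeta_alternating(s, S):
    """Approximate zeta(s), for Re(s) > 0 and s != 1, by
    (1 - 2^(1-s))^(-1) * sum_{n=1}^{S-1} (-1)^(n+1) / n^s."""
    partial = sum((-1) ** (n + 1) * n ** (-s) for n in range(1, S))
    return partial / (1 - 2 ** (1 - s))

def error_bound(s, S):
    """Right-hand side of (1.33)."""
    sigma = complex(s).real
    return (1 + abs(s) / sigma) / (abs(1 - 2 ** (1 - s)) * S ** sigma)

if __name__ == "__main__":
    # Real test case with a known value: zeta(2) = pi^2 / 6.
    approx = zeta_alternating(2.0, 10_001)
    print(approx, pi ** 2 / 6, error_bound(2.0, 10_001))
    # The example of the text: s = 1.1 + i with S = 14000.
    s = 1.1 + 1j
    print(zeta_alternating(s, 14_000), error_bound(s, 14_000))
```

For $s = 1.1 + i$ and $S = 14000$ the bound evaluates to slightly less than $10^{-4}$, in agreement with the example in Step 2.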

3) The equality $\zeta(s) = (1 - 2^{1-s})^{-1}\xi(s)$ holds for every $s \in \mathbb{C}$ and the representation $\xi(s) = \sum_{n=1}^\infty(-1)^{n+1}/n^s$ holds for every $s$ with $\mathrm{Re}(s) > 0$. As a consequence, we can still compute $\zeta(s)$ with $\mathrm{Re}(s) > 0$ using that formula. In particular, the bound in (1.33) holds for every $\mathrm{Re}(s) > 0$, $s \ne 1$.

Exercise. 1.46 The equality $\zeta(s) = (1 - 2^{1-s})^{-1}\xi(s)$ and its approximated version in Step 2 of Ex. 1.45 are not very useful for the computation of $\zeta(1+it)$ for $t \in \mathbb{R}$ (because the factor $1 - 2^{1-s}$ has zeros when $t = 2\pi k/\ln 2$ for any $k \in \mathbb{Z}$, while $\zeta$ is regular there. This means that $\xi(1+it)$ is zero at the same points, so that the formula gives $\zeta(1+it)$ there as a quotient of two very small quantities, a fact which introduces instabilities in numerical computations). An alternative formula is the following.
1) Recall that $\zeta(s) = \frac{1}{s-1} + \frac12 - s\int_1^\infty\frac{B_1(\{x\})}{x^{s+1}}\,dx$ for $\mathrm{Re}(s) > 0$, and that

\[ \sum_{n=1}^N\frac{1}{n^s} = \frac{1}{s-1} + \frac{N^{1-s}}{1-s} + \frac12\Big(1 + \frac{1}{N^s}\Big) - s\int_1^N\frac{B_1(\{x\})}{x^{s+1}}\,dx. \]
Subtracting these identities prove that
\[ \Big|\zeta(s) - \sum_{n=1}^N\frac{1}{n^s} + \frac{N^{1-s}}{1-s} + \frac{1}{2N^s}\Big| \le \frac{|s|}{2\sigma N^\sigma} \qquad \mathrm{Re}(s) = \sigma > 0. \]
2) Deduce that
\[ |\zeta(\sigma+it)| \ll \sum_{n=1}^N\frac{1}{n^\sigma} + \frac{N^{1-\sigma}}{|t|} + \frac{|s|}{\sigma N^\sigma} \qquad \forall\sigma > 0,\ \forall|t| > 1. \]
For $\sigma \in (0,1)$ this bound gives
\[ |\zeta(\sigma+it)| \ll_\sigma |t|^{1-\sigma} \qquad \forall\sigma \in (0,1),\ \forall|t| > 1, \]
proving that $\zeta$ grows along the vertical lines at most as a power of $t$ (but note that the bound is not uniform in $\sigma$).
3) For $\sigma = 1$ the formula gives
\[ |\zeta(1+it)| \ll \sum_{n=1}^N\frac{1}{n} + O\Big(\frac{|t|}{N}\Big) \qquad \forall|t| > 1, \]
which with $N = |t|$ shows that $|\zeta(1+it)| \ll \ln|t|$ $\forall|t| > 1$.

Exercise. 1.47 The function $P(s) := \sum_p 1/p^s$ is called the Prime zeta function; also this Dirichlet series converges for $\sigma > 1$. The following steps will give a formula for a quick computation of the values of $P(s)$.
1) Let $\sigma > 1$. Using the representation of the Riemann zeta function as Euler product, prove that
\[ P(s) = \sum_{n=1}^\infty\frac{\mu(n)}{n}\ln(\zeta(ns)). \]

2) Recall the result in Ex. 1.45 Step 1:

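\[ \Big|\sum_{k=2}^\infty\frac{1}{k^s}\Big| \le \sum_{k=2}^\infty\frac{1}{k^\sigma} \le \Big(1 + \frac{2}{\sigma-1}\Big)\frac{1}{2^\sigma}. \]
3) Using the inequality $\ln(1+y) \le y$, prove that for every $L \ge 3$
\[ \Big|\sum_{n=L}^\infty\frac{\mu(n)}{n}\ln(\zeta(ns))\Big| \le \sum_{n=L}^\infty\frac{1}{n}\ln\Big(1 + \sum_{k=2}^\infty\frac{1}{k^{n\sigma}}\Big) \le \frac{2(\sigma-1)^{-1}}{2^{L\sigma}-1}, \]
so that
\[ \Big|P(s) - \sum_{n=1}^{L-1}\frac{\mu(n)}{n}\ln(\zeta(ns))\Big| \le \frac{2(\sigma-1)^{-1}}{2^{L\sigma}-1}. \]
4) Use the previous formula to compute $P(2)$ with an error lower than $10^{-4}$ (the necessary values for $\zeta$ can be computed using the approximation in Ex. 1.45, or using their exact values $\zeta(2) = \pi^2/6$, $\zeta(4) = \pi^4/90$ and so on).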

Exercise. 1.48 How can one compute the value of $c := \prod_{p>2}\big(1 - \frac{1}{(p-1)^2}\big)$? The following argument is due to Wrench.¹⁵
1. Check that $-\ln\big(1 - \frac{x^2}{(1-x)^2}\big) = \sum_{m=2}^\infty\frac{2^m-2}{m}x^m$ for $|x| < 1/2$.
2. Deduce that $-\ln c = \sum_{m=2}^\infty\frac{2^m-2}{m}P_{\mathrm{odd}}(m)$, where $P_{\mathrm{odd}}(m) := \sum_{p>2}\frac{1}{p^m}$ is the odd primes zeta function, i.e. the restriction to the odd primes of the usual Prime zeta function already introduced in Ex. 1.47.
3. Prove that $P_{\mathrm{odd}}(m) = \sum_{p>2}p^{-m} \le 3^{-m} + \sum_{n>4}n^{-m} \le 3^{-m} + m4^{1-m}$ for every $m \ge 2$, so that
\[ \Big|-\ln c - \sum_{m=2}^{K-1}\frac{2^m-2}{m}P_{\mathrm{odd}}(m)\Big| \le \frac{3}{K}\Big(\frac23\Big)^K + \frac{8}{2^K}. \]
The values of $P_{\mathrm{odd}}(m) = P(m) - 2^{-m}$ can be quickly computed using the result in Ex. 1.47.

Exercise. 1.49 Use the Prime Number Theorem to reprove Proposition 1.14 (but notice that we have proved that proposition without the PNT).

¹⁵ John W. Wrench Jr., Evaluation of Artin's constant and the twin-prime constant, Math. Comp. 15, 396–398 (1961). On a possible solution of similar computational problems see also P. Moree, Approximation of singular series and automata, Manuscripta Math. 101(3), 385–399 (2000).

Exercise. 1.50 Use the Prime Number Theorem to prove that
\[ \sum_{p\le x}\frac{\ln p}{p} = \ln x + d + O\Big(\frac{1}{\ln x}\Big) \]
for a suitable constant $d$. This relation improves (1.24).

Exercise. 1.51 Let $\alpha \in [0,1)$. Use the Prime Number Theorem to prove that
\[ \sum_{p\le x}\frac{1}{p^\alpha} = \Big(\frac{1}{1-\alpha} + O\Big(\frac{1}{\ln x}\Big)\Big)\frac{x^{1-\alpha}}{\ln x}. \]
Is the result uniform in $\alpha$?

Exercise. 1.52 The following steps prove that if $\psi(x) = x + O(x^\delta)$ for some $\delta > 0$, then $\zeta(s) \ne 0$ for $\mathrm{Re}(s) > \delta$.
1) Let
\[ H(s) := -\frac{\zeta'(s)}{\zeta(s)} - \zeta(s) = \sum_{n=1}^\infty\frac{\Lambda(n)-1}{n^s}. \]
Note that $H(s)$ is meromorphic in $\mathbb{C}$, is holomorphic in an open neighborhood of $s = 1$ (the simple pole of $\zeta(s)$ is cancelled out) and has simple poles at every zero of $\zeta$.
2) Using the partial summation formula prove that
\[ H(s) = s\int_1^{+\infty}\frac{\psi(x) - \lfloor x\rfloor}{x^{s+1}}\,dx \]
when $\mathrm{Re}(s) > 1$.
3) By the identity principle for meromorphic functions the previous equality holds not only in $\mathrm{Re}(s) > 1$, but also in every open and connected set of $\mathbb{C}$ where both LHS and RHS are holomorphic. Assume that $|\psi(x) - x| \ll x^\delta$; then the RHS of the previous equality is holomorphic in $\mathrm{Re}(s) > \delta$. This means that $H(s)$ is holomorphic there, i.e. $\zeta$ has no zeros there.

Exercise. 1.53 Let $g(s) := 1 - \sum_{n=2}^{1000}n^{-s}$ and $h(s) := \sum_{n=1001}^\infty n^{-s}$.
1) Prove that $g(s)$ has a real zero $s_0$ in the interval $[1.72, 1.73]$.

2) Let $\Gamma := \{s \in \mathbb{C}\colon |s - s_0| = 0.03\}$. Prove that $|h(s)| \le 0.012$ and that $|g(s)| \ge 0.04$ for every $s \in \Gamma$.
3) Let $f(s) := g(s) + h(s)$. Using Rouché's Theorem, deduce that $f(s)$ has a zero in the disk encircled by $\Gamma$.
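For Item 1, the real zero $s_0$ can be located numerically by bisection. This is only a sketch of the kind of computation suggested for this exercise (names and tolerance are ours); it does not replace the rigorous bounds required in Items 1 and 2.

```python
def g(s, n_max=1000):
    """g(s) = 1 - sum_{n=2}^{n_max} n^(-s) for real s."""
    return 1.0 - sum(n ** (-s) for n in range(2, n_max + 1))

def bisect_root(f, lo, hi, tol=1e-10):
    """Find a zero of f in [lo, hi] by bisection; f(lo) and f(hi)
    must have opposite signs."""
    f_lo = f(lo)
    if f_lo * f(hi) > 0:
        raise ValueError("no sign change on the given interval")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        f_mid = f(mid)
        if f_lo * f_mid <= 0:
            hi = mid                 # the zero lies in [lo, mid]
        else:
            lo, f_lo = mid, f_mid    # the zero lies in [mid, hi]
    return (lo + hi) / 2

if __name__ == "__main__":
    s0 = bisect_root(g, 1.72, 1.73)
    print(s0, g(s0))
```

Since $g$ is increasing on the real axis, the sign change at the endpoints $1.72$ and $1.73$ already shows that the zero is unique in the interval.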

Comments: The function $f(s)$ differs from the Riemann zeta function only in the sign of the coefficients whose index $n$ is in $2, \ldots, 1000$; this 'little' change does not modify the general properties of the function (analytic continuation to $\mathbb{C}$ as a meromorphic function, unique pole at $s = 1$ with residue equal to $1$, a kind of functional equation, and so on), thus under this aspect we could consider $f(s)$ as a 'small' change of $\zeta(s)$. Nevertheless, $f(s)$ completely loses its connection with arithmetic: the coefficients of $f(s)$ are no longer a multiplicative function, so $f(s)$ does not have a representation as an Euler product. Hence, in this sense $f(s)$ is a 'great' variation of the Riemann zeta function. Besides, $f(s)$ has zeros in $\sigma > 1$, a behavior which is in total contrast with that of $\zeta$: this example suggests that, probably, the arithmetic (i.e., the existence of an Euler product representation) will have a fundamental role in the proof (?) of the Riemann Hypothesis.
Hint: the Euler-Maclaurin formula should be used (as in Ex. 1.45) to get some explicit formulas giving good approximations of $h(s)$. Software can be used to perform the (large number of) computations needed for Items 1 and 2.

Chapter 2

Primes in arithmetic progressions

Let $a$ and $q$ be fixed positive integers: How many primes $p$ satisfy the congruence $p \equiv a \pmod q$? Note that if $(a,q) > 1$ there are only finitely many of them, thus the question is interesting only under the assumption that $a$ and $q$ are coprime. For special values of $a$ and $q$ there are several elementary ways to prove that the set is actually infinite. For example, the numbers $N_k := 4p_1\cdots p_k - 1$ can be used to prove that there are infinitely many primes $p$ with $p \equiv -1 \pmod 4$,¹ and the numbers $N_k := (2p_1\cdots p_k)^2 + 1$ can be used to prove that there are infinitely many primes $p$ with $p \equiv 1 \pmod 4$.² Similar arguments, for example based upon the cyclotomic polynomials, give the analogous result for other choices of $a$ and $q$.³ Therefore there is good evidence in favor of the conjecture that every arithmetic progression $a \bmod q$ contains infinitely many primes whenever $(a,q) = 1$. In spite of the aforementioned partial results, the conjecture resisted for several years, until it was finally proved, in a much stronger version, by Dirichlet. His proof was a far-reaching extension of the original argument of Euler proving the divergence of $\sum_p 1/p$, and in fact he proved the conjecture in the following form.

Theorem 2.1 (Dirichlet) Let $(a, q) = 1$, then
$$\sum_{p \equiv a\ (q)} \frac{1}{p^\sigma} \sim \frac{1}{\varphi(q)} \sum_{p} \frac{1}{p^\sigma} \qquad \text{as } \sigma \to 1^+.$$

1. Every odd prime is congruent to $1$ or to $-1$ modulo $4$. Let $p_1, \dots, p_k$ be primes, all congruent to $-1$ modulo $4$. Take $N_k := 4p_1 \cdots p_k - 1$; $N_k$ is $\equiv -1 \pmod 4$, so it has a prime factor which is congruent to $-1$ modulo $4$ (otherwise $N_k$ would be a product of numbers congruent to $1$, so that it would be congruent to $1$, too). This prime factor cannot be equal to any $p_j$, because $N_k$ is congruent to $-1$, not to $0$, modulo $p_j$.
2. Let $p$ be a prime divisor of $N_k$; $N_k$ is odd, hence $p$ is odd too. By its definition we have $-1 \equiv (2p_1 \cdots p_k)^2 \pmod p$, in particular $-1$ is a square modulo $p$, so that $p \equiv 1 \pmod 4$. The proof concludes because it is evident that $p \neq p_1, \dots, p_k$.
3. In 1918 Schur proved that if $a^2 \equiv 1 \pmod q$ then there is a polynomial $P \in \mathbb{Z}[x]$ such that for all integers $n$ large enough, all prime divisors of $P(n)$ are $\equiv a \pmod q$: this fact immediately proves that there are infinitely many primes $p \equiv a \pmod q$, and the argument is essentially what we have used before for primes in $-1 \pmod 4$ (set $P(x) = 4x - 1$) and $1 \pmod 4$ (set $P(x) = 4x^2 + 1$). In 1988 M. Ram Murty proved that this is also a necessary condition for such a kind of proof. In other words, he proved that if such a polynomial exists then $a^2 \equiv 1 \pmod q$. Murty's original paper Primes in certain arithmetic progressions, J. Madras Univ. (1988), 161–169 is not easily accessible, but he reprinted it with some improvements in Prime numbers in certain arithmetic progressions, Funct. Approx. Comment. Math. 35 (2006), 249–259. For a nice exposition of his theorem see also these notes of Keith Conrad: www.math.uconn.edu/~kconrad/blurbs/gradnumthy/dirichleteuclid.pdf .
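The Euclid-type argument of footnote 1 can be run mechanically. The following is only a minimal sketch: the helper names `prime_factors` and `next_prime_3_mod_4` are illustrative, factoring is done by naive trial division, and only a few steps are taken because $N_k$ grows very quickly.

```python
from math import prod

def prime_factors(n):
    """Prime factors of n (with repetition), found by trial division."""
    factors, d = [], 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)
    return factors

def next_prime_3_mod_4(known):
    """From primes p_j = 3 (mod 4), build N = 4*p_1*...*p_k - 1 and
    return its smallest prime factor congruent to 3 modulo 4."""
    N = 4 * prod(known) - 1
    # N = 3 (mod 4), so at least one prime factor is 3 (mod 4).
    return min(p for p in prime_factors(N) if p % 4 == 3)

primes = [3]
for _ in range(3):
    primes.append(next_prime_3_mod_4(primes))
print(primes)
```

Each new prime is automatically different from the previous ones, because $N_k \equiv -1$ modulo every $p_j$ already in the list.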

As attested by the appearance of the variable $\sigma$, his proof was essentially real analytic (footnote 4). After the proof of the Prime Number Theorem, Landau was able to merge the two techniques and prove the result of Dirichlet in the stronger form
$$\pi(x; a, q) := \sum_{\substack{p \equiv a\ (q) \\ p \le x}} 1 = \frac{1}{\varphi(q)}\,\mathrm{li}(x) + O_{a,q}\big(x e^{-c\sqrt{\ln x}}\big).$$
His proof was complex analytic in essence. In this chapter we prove the following intermediate result (stronger than Dirichlet's result but weaker than Landau's one).

Theorem 2.2 Let $(a, q) = 1$, then
$$\pi(x; a, q) := \sum_{\substack{p \equiv a\ (q) \\ p \le x}} 1 = \frac{1}{\varphi(q)}\,\mathrm{li}(x) + O_{a,q,A}\Big(\frac{x}{(\ln x)^A}\Big), \qquad \forall A > 0.$$

Actually, its proof is very similar to the one for the Prime Number Theorem, thus we will skip the entire set of analytic lemmas and concentrate our attention only upon the new tools.
The first step is to introduce a way to distinguish the integers belonging to a given arithmetic progression $a \pmod q$, with $(a, q) = 1$. This is done by using the characters of the group $(\mathbb{Z}/q\mathbb{Z})^*$, a tool which was devised by Dirichlet exactly for this purpose and has now become fundamental for the theory of groups. Here we give a self-contained proof of the main result we need, but for a better comprehension of this topic we refer the reader to specialized texts (footnote 5).
Let $G$ be a finite abelian group. A character of $G$ is a map $\chi\colon G \to \mathbb{C}^*$ which is multiplicative, i.e. such that $\chi(ab) = \chi(a)\chi(b)$. The constant map $\chi_0$ defined by $\chi_0(a) = 1$ for every $a$ is the simplest character.

Proposition 2.1
1) $\chi(1) = 1$.
2) $\chi(a)$ is a $\#G$-th root of unity.

3) $\chi(a^{-1}) = \chi(a)^{-1} = \overline{\chi(a)}$.

4. Dirichlet said that the idea for the proof came to him during a visit to Florence, while watching the chandelier suspended at the top of the cathedral. Considering that the same chandelier inspired Galilei, four centuries earlier, to formulate his law on the (almost) independence of the period of the oscillations of a pendulum from the amplitude, we can deduce that in case of a strong mathematical difficulty a visit to Florence could be a major step toward the solution.
5. For example [H] and [I].

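4) The set of characters of $G$ is itself a finite abelian group, which is denoted as $\hat G$, with $\#\hat G = \#G$, and $\chi_0$ is the unit of this group.
5) $\frac{1}{\#G}\sum_{a \in G} \chi(a)$ is $1$ when $\chi = \chi_0$, $0$ otherwise.
6) $\frac{1}{\#G}\sum_{a \in G} \chi(a)\overline{\eta(a)}$ is $1$ when $\chi = \eta$, $0$ otherwise.
7) $\frac{1}{\#G}\sum_{\chi \in \hat G} \chi(a)\overline{\chi(b)}$ is $1$ when $a = b$, $0$ otherwise.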
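The orthogonality relations in Items 5, 6 and 7 are easy to test numerically. The sketch below is only an illustration under a simplifying assumption: it takes $G = (\mathbb{Z}/q\mathbb{Z})^*$ with $q$ prime, so that $G$ is cyclic and its characters can be written down from a generator $g$ as $\chi_k(g^j) = e^{2\pi i jk/(q-1)}$. The function names are hypothetical.

```python
import cmath

def primitive_root(q):
    """Smallest generator of (Z/qZ)^* for a prime q (brute force)."""
    for g in range(1, q):
        if len({pow(g, k, q) for k in range(q - 1)}) == q - 1:
            return g
    raise ValueError("no generator found: q is assumed to be prime")

def characters(q):
    """All q-1 characters of (Z/qZ)^*, each stored as a dict a -> chi(a)."""
    g, n = primitive_root(q), q - 1
    chars = []
    for k in range(n):
        chi = {pow(g, j, q): cmath.exp(2j * cmath.pi * j * k / n)
               for j in range(n)}
        chars.append(chi)
    return chars

q = 7
G = list(range(1, q))
chars = characters(q)

def inner(chi, eta):
    """Normalized sum over G appearing in Item 6."""
    return sum(chi[a] * eta[a].conjugate() for a in G) / len(G)

def dual_inner(a, b):
    """Normalized sum over the characters appearing in Item 7."""
    return sum(chi[a] * chi[b].conjugate() for chi in chars) / len(chars)

# Item 6: different characters are orthogonal.
print(abs(inner(chars[1], chars[1])), abs(inner(chars[1], chars[2])))
# Item 7: different elements are separated by the characters.
print(abs(dual_inner(3, 3)), abs(dual_inner(3, 5)))
```

Item 5 is the special case of Item 6 with $\eta = \chi_0$, which is `chars[0]` in this construction.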

Proof. 1) Evident. 2) $a^{\#G} = 1$ for every $a \in G$ (this is the analogue of the little Fermat theorem for finite groups), therefore $(\chi(a))^{\#G} = \chi(a^{\#G}) = \chi(1) = 1$. 3) The first claim follows from the multiplicativity, the second claim follows from the second item. 4) Let $\hat G$ be the set of characters of $G$. In $\hat G$ we introduce a composition law defining the product of two characters $\chi$ and $\eta$ pointwise, i.e. setting $(\chi\eta)(a) := \chi(a)\eta(a)$ for every $a \in G$. It is evident that $\chi\eta$ is a character, that $\chi_0$ is the unit and that $\overline{\chi}$ acts as the inverse of $\chi$ in $\hat G$: this proves that $\hat G$ has the structure of an abelian group. The difficult point is proving that $\hat G$ is finite and that its order is equal to the one of $G$. There are several possible proofs; the simplest one consists in proving the claim for cyclic groups (in this case the set of characters can be concretely produced) and then using the fundamental theorem giving the structure of every finite abelian group $G$ as a direct product of suitable cyclic subgroups. Here we reproduce a different proof, which is longer but more interesting. Let $V(G, \mathbb{C}) := \{f\colon G \to \mathbb{C}\}$, the set of all maps from $G$ to $\mathbb{C}$. It is a vector space with respect to the pointwise operations. Since $G$ is finite, it is evidently finite dimensional with $d := \dim(V(G, \mathbb{C})) = \#G$.

Lemma 2.1 (Weil) Any distinct characters $\chi_1, \dots, \chi_n$ of $G$ are linearly independent elements of $V(G, \mathbb{C})$.

Proof. By contradiction, suppose that a set of linearly dependent characters exists, and let $\chi_1, \dots, \chi_n$ be such a set with minimal cardinality $n$. This means that there exist $c_1, \dots, c_n \in \mathbb{C}$ such that $\sum_{j=1}^n c_j\chi_j(a) = 0$ for every $a \in G$, and the minimality of $n$ gives $c_j \neq 0$ for every $j$. The values of any character are non-zero, hence in such a sum at least two characters appear. Let $\alpha$ be an

element in $G$ such that $\chi_1(\alpha) \neq \chi_2(\alpha)$. Then
$$\sum_{j=1}^n c_j\chi_1(\alpha)\chi_j(a) = \chi_1(\alpha)\sum_{j=1}^n c_j\chi_j(a) = 0 \qquad \forall a \in G$$
and
$$\sum_{j=1}^n c_j\chi_j(\alpha)\chi_j(a) = \sum_{j=1}^n c_j\chi_j(\alpha a) = 0 \qquad \forall a \in G.$$
Subtracting them we get
$$\sum_{j=1}^n c_j\big(\chi_1(\alpha) - \chi_j(\alpha)\big)\chi_j(a) = 0 \qquad \forall a \in G.$$
In this sum the $j = 1$ term does not appear, and the one with $j = 2$ has a non-zero coefficient, therefore the equality shows that the characters $\chi_2, \dots, \chi_n$ are $\mathbb{C}$-linearly dependent. This is impossible since this set contains only $n - 1$ characters, so that it violates the minimality of $n$. □

The lemma implies that any collection of distinct characters contains at most $d$ elements, because $d$ is the dimension of $V(G, \mathbb{C})$. As a consequence $\hat G$ is a finite set, with $\hat d := \#\hat G \le d$. The argument may be repeated for the new group $\hat G$, and proves that $\hat{\hat d} \le \hat d$ where $\hat{\hat d} := \#\hat{\hat G}$.
Now, we prove that given any subgroup $H \subseteq G$ and any character $\psi \in \hat H$, there is always a character $\psi' \in \hat G$ whose restriction to $H$ coincides with $\psi$. In fact, let $x$ be any element in $G \setminus H$. Consider the group $H_x := \langle H, x\rangle$. Let $q$ be the order of $[x]_H$ (the class of $x$ modulo $H$) as an element of $G/H$ (here we are using the assumption that $G$ is abelian to ensure that $H$ is normal in $G$). Then $x^q \in H$, so that $\psi(x^q)$ is well defined. Let $\xi$ be any $q$-th root of $\psi(x^q)$, and set
$$\psi'(hx^m) := \psi(h)\xi^m$$
for every $h \in H$ and $m \in \mathbb{Z}$. We notice that $\psi'$ is well defined. In fact, if $hx^m = h'x^n$ for some $h, h' \in H$ and $m, n \in \mathbb{Z}$, then $n = m + kq$ for some $k \in \mathbb{Z}$ (because $x^{n-m} = hh'^{-1} \in H$), so that
$$\psi(h) = \psi(h'x^{n-m}) = \psi(h'(x^q)^k) = \psi(h')\psi(x^q)^k = \psi(h')(\xi^q)^k = \psi(h')\xi^{n-m},$$
i.e., $\psi(h)\xi^m = \psi(h')\xi^n$. Since $G$ is abelian, this also proves that $\psi'$ is multiplicative in $H_x$ (here again we use this fundamental hypothesis on $G$), so that it is a character of $H_x$. Its definition also shows that $\psi'$ coincides with $\psi$ on $H$. If $G = H_x$ the proof terminates, otherwise we repeat the argument, producing

an even larger group. In finitely many steps we arrive at $G$ (because $G$ is finite), so that the claim is proved in full generality.
Now, let $a \in G$ be an element which is not the unit. Then there exists a character $\chi \in \hat G$ such that $\chi(a) \neq 1$. In fact, let $q$ be the order of $a$ in $G$, let $H := \langle a\rangle$ and let $\xi := e^{2\pi i/q}$. Then setting $\psi(a^n) := \xi^n = e^{2\pi i n/q}$ we produce a character on $H$ which is not trivial at $a$ (because $a \neq e$, hence $q \neq 1$), and the previous argument proves that $\psi$ can be lifted to a character of $G$ having the same value at $a$.
Finally, for every $a \in G$, let $\mathrm{ev}_a$ be the map $\hat G \to \mathbb{C}^*$ such that $\mathrm{ev}_a(\chi) = \chi(a)$. This is evidently multiplicative, so that it is an element of $\hat{\hat G}$. Let the evaluation map $\mathrm{ev}\colon G \to \hat{\hat G}$ be defined as $\mathrm{ev}(a) := \mathrm{ev}_a$. This is multiplicative, too, and the claim we have just proved states that it is injective. This shows that $d \le \hat{\hat d}$. Thus
$$d \le \hat{\hat d} \le \hat d \le d,$$
which proves that $d = \hat d = \hat{\hat d}$.

5) The claim for $\chi = \chi_0$ is evident. Let $\chi$ be not equal to $\chi_0$. Then there exists $b \in G$ such that $\chi(b) \neq 1$. Let $S := \sum_{a \in G} \chi(a)$. The multiplicativity implies that $\chi(b)S = \sum_{a \in G} \chi(ab) = \sum_{a \in G} \chi(a) = S$ (because the product by $b$ simply permutes the elements of $G$), so that $S = 0$.
6) A simple consequence of Item 5, since $\chi\overline{\eta}$ is a character.

7) Let $d$ be the cardinality of $G$ (and hence of $\hat G$, by Item 4). Let $\chi_1, \dots, \chi_d$ be the complete set of characters and let $a_1, \dots, a_d$ be the complete set of elements in $G$. Let $M$ be the square matrix
$$M = \frac{1}{\sqrt d}\begin{pmatrix} \chi_1(a_1) & \chi_2(a_1) & \cdots & \chi_d(a_1) \\ \chi_1(a_2) & \chi_2(a_2) & \cdots & \chi_d(a_2) \\ \vdots & & & \vdots \\ \chi_1(a_d) & \chi_2(a_d) & \cdots & \chi_d(a_d) \end{pmatrix}.$$
Item 6 proves that $M^*M = I$. Then $MM^* = I$ too (footnote 6), which is the statement of Item 7. □

6. We use here the fact that if the product $AB$ of two square matrices $A$ and $B$ is the identity, then $B$ is the inverse of $A$ (and vice versa); in other words, a right-inverse of $A$ is necessarily its (two-sided) inverse. The proof is the following. Let $AB = I$. Then the determinant of $A$ is not zero (because $\det(A)\det(B) = \det(AB) = 1$), hence $A$ is invertible. Multiplying by $A^{-1}$ we get the equality $A^{-1} = A^{-1}AB = B$, proving the claim.

For every $a \in G$, let $\delta_a\colon G \to \mathbb{C}$ be the Dirac delta at $a$, i.e. the function whose value $\delta_a(b)$ is $1$ when $b = a$ and $0$ otherwise. The family $\{\delta_a\}_{a \in G}$ is evidently a basis for $V(G, \mathbb{C})$, with the identity $f = \sum_{a \in G} f(a)\delta_a$ giving the decomposition of $f$ on the basis of deltas. Proposition 2.1 shows that the elements of $\hat G$, i.e. the characters, are an alternative basis for the same space, with claim 6) giving a way to compute the coordinates of $f$ with respect to this new basis:
$$f = \sum_{\chi \in \hat G} \hat f(\chi)\chi, \qquad \text{where } \hat f(\chi) := \frac{1}{\#G}\sum_{a \in G} f(a)\overline{\chi(a)}.$$
This is an example (the easiest one, actually) of Fourier duality and can be extended to every locally compact abelian group (but the extension needs several sophisticated tools).

Let $G$ be the group of integers modulo $q$ which are coprime with $q$, i.e. $(\mathbb{Z}/q\mathbb{Z})^*$. This is an abelian group whose cardinality is $\varphi(q)$. We extend every character $\chi$ of $G$ to all of $\mathbb{Z}$ by setting
$$\chi(n) = \begin{cases} 0 & \text{if } (n, q) \neq 1, \\ \chi(n \bmod q) & \text{if } (n, q) = 1. \end{cases}$$

The arithmetical function we have generated in this way is called a Dirichlet character modulo $q$ and by abuse of notation it is still denoted as $\chi$. It is totally multiplicative, i.e. $\chi(mn) = \chi(m)\chi(n)$ for every pair of integers $m$, $n$, not necessarily coprime. The key idea for the proof of the theorem is the following identity, which is an immediate application of the previous proposition (Item 7):
$$\sum_{\substack{n \equiv a\ (q) \\ n \le x}} \Lambda(n) = \frac{1}{\varphi(q)}\sum_{\chi\ (\mathrm{mod}\ q)} \overline{\chi(a)}\sum_{n \le x} \chi(n)\Lambda(n).$$
In this way the original sum on the arithmetic progression has been written as a (sum of) more conventional sums on the entire set of integers $n \le x$, but for $\chi(n)\Lambda(n)$. In this sum we single out the term coming from the trivial character $\chi_0$:
$$\sum_{\substack{n \equiv a\ (q) \\ n \le x}} \Lambda(n) = \frac{1}{\varphi(q)}\sum_{n \le x} \chi_0(n)\Lambda(n) + \frac{1}{\varphi(q)}\sum_{\substack{\chi\ (\mathrm{mod}\ q) \\ \chi \neq \chi_0}} \overline{\chi(a)}\sum_{n \le x} \chi(n)\Lambda(n). \tag{2.1}$$
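The identity above can be checked directly in the smallest interesting case. The sketch below assumes $q = 4$, where the only two characters are the trivial one and the real character taking the values $1, -1$ on the classes $1, 3$; the names `chi0`, `chi4` and `mangoldt` are illustrative.

```python
from math import log

def mangoldt(n):
    """von Mangoldt function: ln p if n = p^k with k >= 1, otherwise 0."""
    if n < 2:
        return 0.0
    p = next(d for d in range(2, n + 1) if n % d == 0)  # smallest prime factor
    while n % p == 0:
        n //= p
    return log(p) if n == 1 else 0.0

def chi0(n):
    """Trivial character modulo 4."""
    return 1 if n % 2 else 0

def chi4(n):
    """Non-trivial (real) character modulo 4."""
    return {1: 1, 3: -1}.get(n % 4, 0)

x, q, phi_q = 2000, 4, 2
Lam = [mangoldt(n) for n in range(x + 1)]

results = {}
for a in (1, 3):
    # Left-hand side: sum restricted to the progression n = a (mod q).
    lhs = sum(Lam[n] for n in range(1, x + 1) if n % q == a)
    # Right-hand side: unrestricted sums, twisted by the characters.
    rhs = sum(chi(a).conjugate() * sum(chi(n) * Lam[n] for n in range(1, x + 1))
              for chi in (chi0, chi4)) / phi_q
    results[a] = (lhs, rhs)
    print(a, lhs, rhs)
```

Both characters are real here, so the complex conjugate is only kept to mirror the general formula.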

The effect of the presence of $\chi_0(n)$ in the first term is the exclusion from the sum of the integers which are not coprime with $q$ (because $\chi_0(n) = 0$ for them), and therefore this term is essentially equal to the well-known sum $\psi(x)$. Hence, the proof consists in proving that
$$\psi_\chi(x) := \sum_{n \le x} \chi(n)\Lambda(n) = \begin{cases} x + O_A\Big(\dfrac{x}{(\ln x)^A}\Big) & \text{if } \chi = \chi_0, \\[2mm] O_A\Big(\dfrac{x}{(\ln x)^A}\Big) & \text{if } \chi \neq \chi_0. \end{cases} \tag{2.2}$$
The Dirichlet series
$$L(s, \chi) := \sum_{n=1}^{\infty} \frac{\chi(n)}{n^s}$$
is called the Dirichlet L-function associated with the character $\chi$. Since $|\chi(n)| \le 1$, the series converges for $\mathrm{Re}(s) > 1$, and since $\chi$ is a totally multiplicative map, we get also that
$$L(s, \chi) = \prod_p \Big(1 - \frac{\chi(p)}{p^s}\Big)^{-1}, \qquad \mathrm{Re}(s) > 1.$$
From this representation as an Euler product it is clear that
$$L(s, \chi_0) = \prod_{p \mid q}\Big(1 - \frac{1}{p^s}\Big) \cdot \zeta(s);$$
this equality shows that $L(s, \chi_0)$ admits an analytic continuation to $\mathbb{C}$ as a meromorphic function with a unique (simple) pole at $s = 1$, and that its zeros in $\mathrm{Re}(s) > 0$ coincide with the zeros of $\zeta(s)$.
Let $\chi \neq \chi_0$ and let $A_\chi(x) := \sum_{n \le x} \chi(n)$. From Item 5 in Proposition 2.1 we have $\sum_{n=a+1}^{a+q} \chi(n) = 0$ for every integer $a$, therefore $A_\chi(x)$ is bounded (for example by $\varphi(q)/2$, but the exact bound is not important here; see footnote 7). Hence by partial summation it follows that the Dirichlet series defining $L(s, \chi)$ converges in $\mathrm{Re}(s) > 0$ (not absolutely in $\mathrm{Re}(s) \in (0, 1]$) and that in this half-plane we have
$$L(s, \chi) = s\int_1^{+\infty} \frac{A_\chi(x)}{x^{s+1}}\,dx \qquad \text{if } \mathrm{Re}(s) > 0. \tag{2.3}$$
Summarizing, we have proved that the Dirichlet L-functions are represented by a Dirichlet series and an Euler product in $\mathrm{Re}(s) > 1$, admit a continuation to $\mathrm{Re}(s) > 0$ as meromorphic functions, and that actually only $L(s, \chi_0)$ has a pole in this region; it is reminiscent of the one of the Riemann zeta function and in fact it is simple and located at $s = 1$. Our ability to prove the Prime Number Theorem was based upon our ability to find upper bounds for the derivatives of the inverse of the Riemann zeta function, i.e.
a lower bound for the zeta itself and an upper bound for its derivatives. In particular, we have deduced the lower bound for ζ from an upper bound, using

the special identity in Proposition 1.18. The upper bound for the derivatives of $L(s, \chi)$ can actually be proved following word by word the argument we have used for $\zeta$; the existence of a special identity for $L(s, \chi)$ is more delicate, but we are lucky and such a formula exists.

7. The Pólya-Vinogradov inequality says that $\max_x |A_\chi(x)| < \sqrt{q}\,\ln q$.
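The boundedness of $A_\chi(x)$ and the resulting convergence at $s = 1$ can be observed on a concrete example. The sketch below assumes the non-trivial character modulo $4$ (called `chi4` here, a hypothetical name), for which the series $L(1, \chi)$ is the classical Leibniz series with sum $\pi/4$.

```python
from math import pi

def chi4(n):
    """Non-trivial character modulo 4: values 1, 0, -1, 0 on n = 1, 2, 3, 4 (mod 4)."""
    return {1: 1, 3: -1}.get(n % 4, 0)

N = 100_000

# Partial sums A_chi(x) = sum_{n <= x} chi(n): they only take the values 0 and 1.
A, max_abs_A = 0, 0
for n in range(1, N + 1):
    A += chi4(n)
    max_abs_A = max(max_abs_A, abs(A))

# Partial sum of the (conditionally convergent) series for L(1, chi).
L1 = sum(chi4(n) / n for n in range(1, N + 1))

print(max_abs_A)          # bound for |A_chi(x)|, here phi(4)/2 = 1
print(L1, pi / 4)         # the partial sum is close to pi/4
```

In this example the partial sums also illustrate the bound $|L(1, \chi)| \le \max_x |A_\chi(x)|$ of Exercise 2.4.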

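Proposition 2.2 Let $\sigma > 1$ and $t \in \mathbb{R}$, then
$$L(\sigma, \chi_0)^3\,|L(\sigma + it, \chi)|^4\,|L(\sigma + 2it, \chi^2)|^2 \ge 1.$$
Proof. In the region $\mathrm{Re}(s) > 1$ all the involved functions have a representation as an (absolutely convergent) Euler product, so that
$$\begin{aligned} \ln(\mathrm{LHS}) &= -\sum_p \Big[ 3\ln\Big(1 - \frac{\chi_0(p)}{p^\sigma}\Big) + 2\ln\Big(1 - \frac{\chi(p)p^{-it}}{p^\sigma}\Big) + 2\ln\Big(1 - \frac{\overline{\chi}(p)p^{it}}{p^\sigma}\Big) \\ &\qquad\qquad + \ln\Big(1 - \frac{\chi^2(p)p^{-2it}}{p^\sigma}\Big) + \ln\Big(1 - \frac{\overline{\chi}^2(p)p^{2it}}{p^\sigma}\Big) \Big] \\ &= \sum_p \sum_{m=1}^{\infty} \frac{1}{mp^{m\sigma}}\Big( 3\chi_0(p^m) + 2\chi(p^m)p^{-imt} + 2\overline{\chi}(p^m)p^{imt} + \chi^2(p^m)p^{-2imt} + \overline{\chi}^2(p^m)p^{2imt} \Big) \\ &= \sum_p \sum_{m=1}^{\infty} \frac{1}{mp^{m\sigma}}\Big( \chi_0(p^m) + \chi(p^m)p^{-imt} + \overline{\chi}(p^m)p^{imt} \Big)^2 \ge 0. \end{aligned}$$
□
As we have done for the Riemann zeta function, we can use the previous inequality to prove the non-existence of zeros for the Dirichlet L-functions along the line $\mathrm{Re}(s) = 1$, with the exception of the point $s = 1$. For the point $s = 1$ a different argument will be needed.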
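The last step of the proof rests on an elementary identity: for $|z| = 1$ one has $3 + 2z + 2\bar z + z^2 + \bar z^2 = (1 + z + \bar z)^2$, with $z = \chi(p^m)p^{-imt}$ when $p \nmid q$. The following sketch, with illustrative function names, only checks this identity numerically on a grid of points $z = e^{i\theta}$.

```python
import cmath
from math import cos, pi

def expanded(theta):
    """3 + 2z + 2*conj(z) + z^2 + conj(z)^2 for z = exp(i*theta)."""
    z = cmath.exp(1j * theta)
    zb = z.conjugate()
    return 3 + 2 * z + 2 * zb + z ** 2 + zb ** 2

def squared(theta):
    """(1 + z + conj(z))^2 = (1 + 2 cos(theta))^2, a nonnegative real number."""
    return (1 + 2 * cos(theta)) ** 2

grid = [2 * pi * k / 360 for k in range(360)]
max_gap = max(abs(expanded(t) - squared(t)) for t in grid)
print(max_gap)
```

The cross term $2z\bar z = 2$ is what turns the constant $1$ of the square into the coefficient $3$ of $L(\sigma, \chi_0)$.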

Corollary 2.1 $L(1 + it, \chi) \neq 0$ for every $t \in \mathbb{R}$.
Proof. By contradiction, suppose that $L(s, \chi)$ has a zero along the vertical line $\sigma = 1$, so that there exists $t_0$ such that $L(1 + it_0, \chi) = 0$. When $\chi$ is real, i.e. when $\chi^2 = \chi_0$, we further suppose that $t_0 \neq 0$. $L(s, \chi)$ is holomorphic in an open neighborhood of $1 + it_0$, therefore $L(s, \chi) \sim c(s - 1 - it_0)^n$ for some constant $c \in \mathbb{C}^*$ and some integer $n \ge 1$. Moreover, $L(s, \chi^2)$ is regular in a neighborhood of $1 + 2it_0$ (here we use the assumption $t_0 \neq 0$ for the characters $\chi$ with $\chi^2 = \chi_0$), hence from the inequality proved in the previous proposition we deduce that
$$1 \le L(\sigma, \chi_0)^3\,|L(\sigma + it_0, \chi)|^4\,|L(\sigma + 2it_0, \chi^2)|^2 \ll (\sigma - 1)^{-3}(\sigma - 1)^{4n}$$
as $\sigma \to 1^+$, which is impossible because $4n - 3 > 0$. To complete the proof we need to prove that $L(1, \chi) \neq 0$ when $\chi$ is a real non-trivial character.

There are several ways to prove this fact, both algebraic (for example considering $L(s, \chi)$ as a factor of the Dedekind zeta function of a quadratic field and using the class number formula) and analytic. We reproduce here a classical analytic proof, based upon Landau's Theorem 1.2. Let $H(s) := \zeta(s)L(s, \chi)$. This function is meromorphic in $\mathrm{Re}(s) > 0$ and is represented by a Dirichlet series $\sum_{n=1}^{+\infty} a(n)n^{-s}$ with nonnegative coefficients. In fact, $a(n) = \sum_{d \mid n} \chi(d)$ (because $H$ is the product of two Dirichlet series, so that its coefficient is the $*$-product of their coefficients), hence it is multiplicative (because $\chi$ is multiplicative), and
$$a(p^k) = \sum_{j=0}^{k} \chi(p^j) = \sum_{j=0}^{k} \chi(p)^j = \begin{cases} 1 & \text{if } \chi(p) = 0, \\ k + 1 & \text{if } \chi(p) = 1, \\ \frac{1}{2}\big(1 + (-1)^k\big) & \text{if } \chi(p) = -1, \end{cases}$$
which proves that $a(p^k) \ge 0$ in any case. This computation also shows that $a(p^{2k}) \ge 1$ whenever $p \nmid q$, so that $a(n^2) \ge 1$ for every integer $n$ which is coprime to $q$, yielding the lower bound
$$H(\sigma) \ge \sum_{\substack{n=1 \\ (n,q)=1}}^{+\infty} \frac{1}{n^{2\sigma}}. \tag{2.4}$$

Now, suppose that $L(1, \chi) = 0$. Then $H(s)$ is regular for $\sigma > 0$ (the zero of $L(s, \chi)$ kills the simple pole of $\zeta$ at $s = 1$) and it is a Dirichlet series with nonnegative coefficients. Therefore its Dirichlet series converges in $\sigma > 0$ (by Landau's Theorem 1.2). However this is impossible, since by (2.4) the Dirichlet series diverges when $\sigma = 1/2$. □
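The two properties of the coefficients $a(n) = \sum_{d \mid n} \chi(d)$ used in this proof are easy to observe numerically. The sketch below assumes the real non-trivial character modulo $4$ (named `chi4` for illustration) and checks nonnegativity of $a(n)$ together with $a(n^2) \ge 1$ for $n$ coprime to $q = 4$.

```python
def chi4(n):
    """Real non-trivial character modulo 4."""
    return {1: 1, 3: -1}.get(n % 4, 0)

def a(n):
    """Coefficient of n^{-s} in zeta(s) * L(s, chi4): the sum of chi4 over the divisors of n."""
    return sum(chi4(d) for d in range(1, n + 1) if n % d == 0)

coeffs = [a(n) for n in range(1, 301)]
squares = [a(n * n) for n in range(1, 60, 2)]   # n odd, i.e. coprime to 4

print(min(coeffs))    # smallest coefficient found
print(min(squares))   # smallest coefficient at a square
```

Since $a(n^2) \ge 1$, the series $\sum a(n)n^{-\sigma}$ dominates $\sum_{(n,q)=1} n^{-2\sigma}$, which is the lower bound (2.4).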

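The proof of Theorem 2.1 is almost immediate. In fact, let $\sigma \to 1^+$. By the Euler product
$$\begin{aligned} \ln(L(\sigma, \chi)) &= -\sum_p \ln\Big(1 - \frac{\chi(p)}{p^\sigma}\Big) = \sum_p \frac{\chi(p)}{p^\sigma} - \sum_p \Big[\ln\Big(1 - \frac{\chi(p)}{p^\sigma}\Big) + \frac{\chi(p)}{p^\sigma}\Big] \\ &= \sum_p \frac{\chi(p)}{p^\sigma} + O\Big(\sum_p \frac{1}{p^{2\sigma}}\Big) = \sum_p \frac{\chi(p)}{p^\sigma} + O(1). \end{aligned}$$
When $\chi \neq \chi_0$ we know that $L(1, \chi) \neq 0$, thus $\ln(L(\sigma, \chi)) = O(1)$ and the previous computation shows that
$$\sum_p \frac{\chi(p)}{p^\sigma} = O(1) \qquad \text{as } \sigma \to 1^+.$$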

This bound and the orthogonality of characters give
$$\begin{aligned} \sum_{p \equiv a\ (q)} \frac{1}{p^\sigma} &= \frac{1}{\varphi(q)}\sum_\chi \overline{\chi(a)}\sum_p \frac{\chi(p)}{p^\sigma} = \frac{1}{\varphi(q)}\sum_p \frac{\chi_0(p)}{p^\sigma} + \frac{1}{\varphi(q)}\sum_{\chi \neq \chi_0} \overline{\chi(a)}\sum_p \frac{\chi(p)}{p^\sigma} \\ &= \frac{1}{\varphi(q)}\sum_p \frac{1}{p^\sigma} + O(1), \end{aligned}$$
which is Theorem 2.1. In particular, we notice that its claim depends only on the behavior of the Dirichlet L-functions on the real interval $(1, +\infty)$. On the contrary, the proof of Theorem 2.2 is longer and starts with formulas (2.1) and (2.2), and then reproduces in this new setting the tools we have already used for the proof of the Prime Number Theorem: all key ingredients hold also in this new case, and only ancillary (yet fundamental!) computations are needed. They are proposed in the following exercises. Concluding, we notice that the proof of Theorem 2.2 relies on the behavior of the Dirichlet L-functions on the closed half-plane $\mathrm{Re}(s) \ge 1$.
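Theorem 2.2 implies that the primes are asymptotically equidistributed among the $\varphi(q)$ reduced classes modulo $q$. The sketch below is only a numerical illustration, with $q = 10$ and $x = 10^5$ chosen arbitrarily; it counts the primes up to $x$ in each class and prints the proportion, to be compared with $1/\varphi(q) = 1/4$.

```python
from math import gcd

def primes_up_to(x):
    """List of primes <= x, by a plain sieve of Eratosthenes."""
    is_prime = [True] * (x + 1)
    is_prime[0:2] = [False, False]
    for p in range(2, int(x ** 0.5) + 1):
        if is_prime[p]:
            for m in range(p * p, x + 1, p):
                is_prime[m] = False
    return [n for n in range(2, x + 1) if is_prime[n]]

x, q = 100_000, 10
primes = primes_up_to(x)
classes = [a for a in range(1, q) if gcd(a, q) == 1]

# pi(x; a, q) for every reduced class a modulo q.
counts = {a: sum(1 for p in primes if p % q == a) for a in classes}
total = sum(counts.values())

for a in classes:
    print(a, counts[a], counts[a] / total)
```

Only the primes dividing $q$ (here $2$ and $5$) fall outside the reduced classes.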

Exercise. 2.1 Let $q > 1$ and let $\chi$ be a character modulo $q$, $\chi \neq \chi_0$. Prove that $\sum_{n \le x} \chi(n)\ln n \ll_q \ln x$.

Exercise. 2.2 Let $q > 1$ and let $\chi$ be a character modulo $q$, $\chi \neq \chi_0$. Let $M_\chi(x) := \sum_{n \le x} \chi(n)\mu(n)$ and $\psi_\chi(x) := \sum_{n \le x} \chi(n)\Lambda(n)$. Prove that
$$M_\chi(x) \ll_A \frac{x}{(\ln x)^A} \quad \forall A > 0 \qquad \Longrightarrow \qquad \psi_\chi(x) \ll_A \frac{x}{(\ln x)^A} \quad \forall A > 0.$$
Suggestion: imitate the proof of Proposition 1.17 and use the previous exercise.

Exercise. 2.3 Let $q > 1$ and let $\chi$ be a character modulo $q$, $\chi \neq \chi_0$. Prove that
$$L^{(\ell)}(s, \chi) \ll_{\ell,q} (\ln(3|s|))^{\ell+1} \qquad \text{for } \mathrm{Re}(s) > 1,$$
where the symbol $\ll_{\ell,q}$ means that the implicit constant depends on $\ell$ and $q$, so that the result is not uniform in these parameters. Suggestion: imitate the proof of Proposition 1.19.

Exercise. 2.4 Let $q > 1$ and let $\chi$ be a character modulo $q$, $\chi \neq \chi_0$. Prove that $|L(1, \chi)| \le \max_x |A_\chi(x)|$, so that $|L(1, \chi)| \le \varphi(q)/2$. The better bound $|L(1, \chi)| \le \sqrt{q}\,\ln q$ comes from the Pólya-Vinogradov inequality. Suggestion: use (2.3).

Chapter 3

Sieve methods

With the exception of some original digressions and some clarifications, this chapter is based upon [MV], Ch. 3.

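3.1. Eratosthenes-Legendre's sieve

Sylvester's inclusion–exclusion principle has many formulations; one of them is the following: let $\chi_A$ be the characteristic function of the set $A$ and let $A_1, \dots, A_n$ be sets of some kind, then
$$\chi_{A_1 \cup A_2 \cup \cdots \cup A_n} = \sum_j \chi_{A_j} - \sum_{j_1 < j_2} \chi_{A_{j_1} \cap A_{j_2}} + \sum_{j_1 < j_2 < j_3} \chi_{A_{j_1} \cap A_{j_2} \cap A_{j_3}} + \cdots. \tag{3.1}$$
In fact,
$$\begin{aligned} 1 - \chi_{A_1 \cup A_2 \cup \cdots \cup A_n} &= \chi_{(A_1 \cup A_2 \cup \cdots \cup A_n)^c} = \chi_{A_1^c \cap A_2^c \cap \cdots \cap A_n^c} = \chi_{A_1^c}\chi_{A_2^c} \cdots \chi_{A_n^c} \\ &= (1 - \chi_{A_1})(1 - \chi_{A_2}) \cdots (1 - \chi_{A_n}) \\ &= 1 - \sum_j \chi_{A_j} + \sum_{j_1 < j_2} \chi_{A_{j_1}}\chi_{A_{j_2}} - \sum_{j_1 < j_2 < j_3} \chi_{A_{j_1}}\chi_{A_{j_2}}\chi_{A_{j_3}} + \cdots, \end{aligned}$$
which is (3.1), because $\chi_A\chi_B = \chi_{A \cap B}$.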

Let (X, µ, Σ) be any measure space and suppose that the sets {Aj}j belong to the algebra Σ. Integrating (3.1) with respect to µ we get

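$$\mu(A_1 \cup A_2 \cup \cdots \cup A_n) = \sum_j \mu(A_j) - \sum_{j_1 < j_2} \mu(A_{j_1} \cap A_{j_2}) + \sum_{j_1 < j_2 < j_3} \mu(A_{j_1} \cap A_{j_2} \cap A_{j_3}) + \cdots. \tag{3.2}$$
Let $Y := A_1 \cup A_2 \cup \cdots \cup A_n$ and set $S_0 := \mu(Y)$ and, for $k > 0$,
$$\Xi_k := \sum_{j_1 < j_2 < \cdots < j_k} \chi_{A_{j_1} \cap A_{j_2} \cap \cdots \cap A_{j_k}}, \qquad S_k := \int_Y \Xi_k\,d\mu = \sum_{j_1 < j_2 < \cdots < j_k} \mu(A_{j_1} \cap A_{j_2} \cap \cdots \cap A_{j_k}),$$
then

Proposition 3.1 (Bonferroni) $(-1)^\ell \sum_{k=0}^{\ell} (-1)^k S_k \ge 0$ for every $\ell$, so that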

S1 − S2 ≤ S0 ≤ S1

S1 − S2 + S3 − S4 ≤ S0 ≤ S1 − S2 + S3

S1 − S2 + S3 − S4 + S5 − S6 ≤ S0 ≤ S1 − S2 + S3 − S4 + S5

... ≤ S0 ≤ ...
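The Bonferroni inequalities can be tested on a small example with the counting measure. The sketch below assumes the convention $S_0 = \mu(A_1 \cup \cdots \cup A_n)$ and $S_k = \sum_{j_1 < \cdots < j_k} \mu(A_{j_1} \cap \cdots \cap A_{j_k})$ for $k > 0$, which is the one consistent with the displayed inequalities; the sets (multiples of 2, 3, 5, 7 up to 100) are an arbitrary choice.

```python
from itertools import combinations

def S(sets, k):
    """S_0 = size of the union; S_k = sum of the sizes of all k-fold intersections."""
    if k == 0:
        return len(set().union(*sets))
    return sum(len(set.intersection(*c)) for c in combinations(sets, k))

# A_j = multiples of p_j in {1, ..., 100}, for p_j = 2, 3, 5, 7.
sets = [set(range(p, 101, p)) for p in (2, 3, 5, 7)]
n = len(sets)

# Alternating partial sums (-1)^l * sum_{k <= l} (-1)^k S_k, for l = 0, ..., n.
partial = []
for l in range(n + 1):
    alt = sum((-1) ** k * S(sets, k) for k in range(l + 1))
    partial.append((-1) ** l * alt)

print([S(sets, k) for k in range(n + 1)])
print(partial)   # all entries are nonnegative, the last one is 0
```

The vanishing of the last partial sum is exactly the inclusion-exclusion identity (3.2).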

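Proof. For every $x \in Y$, let $I(x)$ be the number of sets $A_j$ containing $x$. Then when $k > 0$
$$\int_Y \Xi_k(x)\,d\mu(x) = \int_Y \sum_{j_1 < j_2 < \cdots < j_k} \chi_{A_{j_1} \cap A_{j_2} \cap \cdots \cap A_{j_k}}(x)\,d\mu(x) = \int_Y \binom{I(x)}{k}\,d\mu(x).$$
Since $I(x) \ge 1$ on $Y$, the same formula holds for $k = 0$, and the binomial identity of footnote 1 gives
$$(-1)^\ell \sum_{k=0}^{\ell} (-1)^k S_k = \int_Y \binom{I(x) - 1}{\ell}\,d\mu(x) \ge 0.$$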

1. Prove it, for example by induction on $\ell$, or using the power series identity
$$\Big(\sum_{\ell=0}^{\infty} x^\ell\Big)\Big(\sum_{\ell=0}^{\infty} \binom{\omega}{\ell}(-x)^\ell\Big) = \sum_{\ell=0}^{\infty} \binom{\omega - 1}{\ell}(-x)^\ell,$$
which is only another way to write $(1 - x)^{-1} \cdot (1 - x)^\omega = (1 - x)^{\omega - 1}$. Equating the coefficients of $x^\ell$ on the left-hand side and on the right-hand side, we get the equality
$$\sum_{j=0}^{\ell} (-1)^j \binom{\omega}{j} = (-1)^\ell \binom{\omega - 1}{\ell}, \qquad \forall\omega,\ \forall\ell.$$
Note that if $\binom{\omega}{j}$ is defined as $\frac{\omega(\omega - 1)\cdots(\omega - j + 1)}{j!}$, then the same argument proves the identity also for non-integer values of $\omega$.

generality the Eratosthenes' sieve is a very similar way to reach the same purpose, for the set of primes. In order to appreciate this remark, recall that the Eratosthenes' sieve is a method to build a complete list of all prime numbers up to a fixed integer $N$. This algorithm runs as follows:
1. write down all the integers $2, 3, \dots, N$ in an ordered list,
2. save the first element of the list and erase all its proper multiples,
3. repeat the second step until you have reached the integer $\lceil\sqrt{N}\rceil$,
4. the saved integers and the remaining entries are the complete list of primes $\le N$.

For example, let $N = 20$, so that we begin with
2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20.
We save the number two and we delete every proper multiple of two (i.e. numbers $2n$ with $n > 1$). Marking the saved numbers with ↓ and the erased numbers with brackets, we have
↓2 3 [4] 5 [6] 7 [8] 9 [10] 11 [12] 13 [14] 15 [16] 17 [18] 19 [20].
Now we repeat the step, with the previous list: we save the number three and we delete every multiple of three which is not equal to 3 itself: we have
↓2 ↓3 [4] 5 [6] 7 [8] [9] [10] 11 [12] 13 [14] [15] [16] 17 [18] 19 [20].
Note that some integers (6, 12 and 18) have been excluded twice. One more run, with the previous list: we save the number five and we delete every multiple of five which is not equal to 5 itself. We get
↓2 ↓3 [4] ↓5 [6] 7 [8] [9] [10] 11 [12] 13 [14] [15] [16] 17 [18] 19 [20].
The algorithm stops here, because we have reached $\lceil\sqrt{20}\rceil = 5$. Hence the prime numbers lower than 20 are: 2 3 5 7 11 13 17 19.
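The four steps of the algorithm translate almost literally into code. This is a minimal sketch (the function name `eratosthenes` and the boolean array `erased` are illustrative choices), with the stopping point taken to be $\lceil\sqrt{N}\rceil$ as in the description above.

```python
from math import isqrt

def eratosthenes(N):
    """Return the list of primes <= N following the four steps of the sieve."""
    erased = [False] * (N + 1)          # step 1: the list 2, 3, ..., N
    bound = isqrt(N)
    if bound * bound < N:
        bound += 1                      # bound = ceil(sqrt(N))
    for p in range(2, bound + 1):       # step 3: repeat up to ceil(sqrt(N))
        if not erased[p]:               # p is the first surviving element: save it
            for m in range(2 * p, N + 1, p):
                erased[m] = True        # step 2: erase its proper multiples
    # step 4: saved integers and remaining entries are the primes <= N
    return [n for n in range(2, N + 1) if not erased[n]]

print(eratosthenes(20))   # [2, 3, 5, 7, 11, 13, 17, 19]
```

As in the worked example, some integers are erased more than once; the inclusion-exclusion corrections discussed in the sequel account for this multiple counting.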

Why is there no need to repeat the second step for every integer below $N$, but it is sufficient to reach $\sqrt{N}$? Because if $n \le N$ is not a prime number, then it is a product of the form $uv$ with $u, v \ge 2$ and at least one of them, say $u$, is not larger than $\lceil\sqrt{N}\rceil$. Thus, the number $n$ has already been erased when a prime divisor of $u$ has been considered.
Now, how can we cast this algorithm in a formula for $\pi(N)$? The first step gives an evident upper bound $\pi(N) \le N - 1$ (we subtract one because 1 is not a prime). The second step erases the proper multiples of all primes which are below $\sqrt{N}$, hence
$$\pi(N) \approx N - 1 - \sum_{p \le \sqrt{N}} \Big(\Big\lfloor\frac{N}{p}\Big\rfloor - 1\Big);$$
note that $\lfloor N/p\rfloor - 1$ appears here, not simply $\lfloor N/p\rfloor$, because we subtract the number of proper multiples. This difference actually provides a lower bound for $\pi(N)$, i.e.
$$\pi(N) \ge N - 1 + \pi(\sqrt{N}) - \sum_{p \le \sqrt{N}} \Big\lfloor\frac{N}{p}\Big\rfloor,$$
because the integers which are divisible by two primes lower than $\sqrt{N}$ have been excluded two times in $\sum_p$. To restore them we add a new term:
$$\pi(N) \approx N - 1 + \pi(\sqrt{N}) - \sum_{p \le \sqrt{N}} \Big\lfloor\frac{N}{p}\Big\rfloor + \sum_{p_1 < p_2 \le \sqrt{N}} \Big\lfloor\frac{N}{p_1p_2}\Big\rfloor.$$

2. This is much more than a simple analogy. Gian Carlo Rota proved in the late '60s that they are simply two realizations of a general principle which is valid for partially ordered sets. See E. A. Bender and J. R. Goldman: On the applications of Möbius inversion in combinatorial analysis, Amer. Math. Monthly 82(8), 789–803, 1975.

Legendre organized this argument in a more flexible way. Given an integer $P$, let $S(x, y; P)$ be the number of integers in $(x, x + y]$ which are coprime with $P$: in a formula
$$S(x, y; P) := \sum_{\substack{x < n \le x + y \\ (n, P) = 1}} 1.$$

Substituting the integer parts with their arguments we introduce an $O(1)$ error in each summand, so that
$$= y\sum_{d \mid P} \frac{\mu(d)}{d} + O\Big(\sum_{d \mid P} |\mu(d)|\Big).$$

The proof concludes recalling that $\varphi = \mu * I$, i.e. $\varphi(P) = \sum_{d \mid P} \mu(d)\frac{P}{d}$, and that $\sum_{d \mid P} |\mu(d)| = 2^{\omega(P)}$ (footnote 3). □
The previous result is non-trivial only when the explicit term $\frac{\varphi(P)}{P}y$ represents the 'main term', i.e. when it is actually larger than the error term $2^{\omega(P)}$. This is a condition on $P$ with respect to $y$. We have already mentioned that Eratosthenes' sieve corresponds to the claim
$$\pi(x + y) - \pi(x) = S(x, y; P) \qquad \text{for } P := \prod_{p \le \sqrt{x + y}} p.$$
This holds as an equality, but it needs a fixed and very large value for $P$: so large, actually, that in this case the error term in Proposition 3.2 becomes exponentially larger than the explicit term appearing there, so that nothing useful can be deduced from this equality. On the other hand, it is still possible to deduce something interesting from this tool: we simply need to give up the demand of an equality. Independently of our choice of $P$, each prime number has only two possibilities: either it divides $P$ or it is coprime with $P$. As a consequence
$$\pi(x + y) - \pi(x) \le \omega(P) + S(x, y; P). \tag{3.4}$$
This is now only an upper bound, but it holds for every $P$. Suppose $P = \prod_{p \le z} p$, for a convenient $z$ that we will choose later. Then $\omega(P) = \pi(z)$ so that (3.4) and Proposition 3.2 give
$$\pi(x + y) - \pi(x) \le \pi(z) + y\prod_{p \le z}\Big(1 - \frac{1}{p}\Big) + O(2^{\pi(z)}).$$
Recalling Mertens' result $\prod_{p \le z}(1 - 1/p)^{-1} \sim e^{\gamma}\ln z$ and setting $z = \ln y$ we conclude that
$$\pi(x + y) - \pi(x) \le (e^{-\gamma} + o(1))\frac{y}{\ln\ln y}.$$
This is a weak result (for example when $y = x$, the left-hand side has order $x/\ln x$ while the right-hand side is $x/\ln\ln x$, so that it does not capture the real order of growth of the left-hand side), but it is still interesting since it is totally uniform in $x$. For example, it gives an upper bound for primes in extremely short intervals which are not accessible with the prime number theorem, not even assuming the Riemann Hypothesis. At last, for $x = 1$ it also produces the upper bound
$$\pi(y) \le (e^{-\gamma} + o(1))\frac{y}{\ln\ln y}$$

which is not optimal but has been obtained without great effort.

3. Let $F(m) := \sum_{d \mid m} |\mu(d)|$, then $F$ is multiplicative, since $|\mu|$ is multiplicative, and $F(p^k) = 2$ for every prime $p$ and every power $k \ge 1$.
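Legendre's identity $S(x, y; P) = \sum_{d \mid P} \mu(d)\big(\lfloor (x+y)/d \rfloor - \lfloor x/d \rfloor\big)$, which follows from inclusion-exclusion, and the size of the error with respect to the main term $\frac{\varphi(P)}{P}y$ can be checked numerically. The sketch below assumes that $P$ is given as a product of distinct primes, so that its divisors are the products of subsets and $\mu(d) = (-1)^k$ for a product of $k$ primes; function names and sample values are illustrative.

```python
from itertools import combinations
from math import gcd, prod

def S_direct(x, y, P):
    """Number of integers n in (x, x+y] coprime with P, by direct counting."""
    return sum(1 for n in range(x + 1, x + y + 1) if gcd(n, P) == 1)

def S_legendre(x, y, primes):
    """The same quantity via the Moebius sum over the divisors d of P."""
    total = 0
    for k in range(len(primes) + 1):
        for c in combinations(primes, k):
            d = prod(c)                                  # d | P with mu(d) = (-1)^k
            total += (-1) ** k * ((x + y) // d - x // d)
    return total

primes = [2, 3, 5, 7, 11]
P = prod(primes)
x, y = 1000, 500

direct = S_direct(x, y, P)
legendre = S_legendre(x, y, primes)
main_term = y * prod(1 - 1 / p for p in primes)          # y * phi(P) / P
print(direct, legendre, main_term, 2 ** len(primes))
```

Each floor difference differs from $y/d$ by less than $1$, so the total error is below $2^{\omega(P)}$, which is already $32$ for these five primes.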

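3.2. Selberg's $\Lambda^2$-method

How can one improve this result? In 1915 Brun realized that in order to get an upper bound for $S(x, y; P)$ we can substitute the Möbius function $\mu$ with any function $\lambda^+$ such that
$$\lambda_1^+ = 1, \qquad \sum_{d \mid m} \lambda_d^+ \ge 0 \quad \forall m. \tag{3.5}$$
In fact, repeating with this generic function what we have done for the Möbius function, we get
$$S(x, y; P) = \sum_{\substack{x < n \le x + y \\ (n, P) = 1}} 1 \le \sum_{x < n \le x + y}\ \sum_{d \mid (n, P)} \lambda_d^+$$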

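$$\lambda_1^- = 1, \qquad \sum_{d \mid m} \lambda_d^- \le 0 \quad \forall m > 1. \tag{3.6}$$
Therefore, we have the opportunity to improve the previous result (i.e., getting a smaller error term), with a suitable choice of $\lambda^\pm$. Brun's concrete realization of this principle was by setting
$$\lambda_d^+ = \begin{cases} \mu(d) & \text{if } d \in D(2r), \\ 0 & \text{otherwise,} \end{cases} \qquad \lambda_d^- = \begin{cases} \mu(d) & \text{if } d \in D(2r - 1), \\ 0 & \text{otherwise,} \end{cases}$$
where $D(\ell) := \{n\colon \omega(n) \le \ell\}$ and $r$ is a new parameter. In fact $\lambda_1^\pm = 1$ and for each squarefree $m$ we have (see Equation (3.3) and Footnote 1)
$$\sum_{d \mid m} \lambda_d^+ = \sum_{j=0}^{2r} \sum_{\substack{d \mid m \\ \omega(d) = j}} \mu(d) = \sum_{j=0}^{2r} (-1)^j \binom{\omega(m)}{j} = \binom{\omega(m) - 1}{2r}$$
and
$$\sum_{d \mid m} \lambda_d^- = \sum_{j=0}^{2r-1} \sum_{\substack{d \mid m \\ \omega(d) = j}} \mu(d) = \sum_{j=0}^{2r-1} (-1)^j \binom{\omega(m)}{j} = -\binom{\omega(m) - 1}{2r - 1},$$
proving that the conditions (3.5-3.6) are satisfied.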
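The sign conditions for Brun's weights can be verified by brute force on squarefree integers $m > 1$. In the sketch below a squarefree $m$ is represented by the list of its prime factors, its divisors by the subsets of that list, and $\mu(d) = (-1)^{\omega(d)}$; the helper name `truncated_moebius_sum` is hypothetical.

```python
from itertools import combinations
from math import comb

def truncated_moebius_sum(prime_factors, ell):
    """Sum of mu(d) over the divisors d of m with omega(d) <= ell,
    where m is the product of the distinct primes in prime_factors."""
    total = 0
    for k in range(min(ell, len(prime_factors)) + 1):
        for _ in combinations(prime_factors, k):   # one divisor d with omega(d) = k
            total += (-1) ** k
    return total

all_primes = [2, 3, 5, 7, 11, 13]
checks = []
for w in range(1, len(all_primes) + 1):            # omega(m) = w >= 1, i.e. m > 1
    m_factors = all_primes[:w]
    for r in (1, 2, 3):
        plus = truncated_moebius_sum(m_factors, 2 * r)        # sum of lambda^+_d
        minus = truncated_moebius_sum(m_factors, 2 * r - 1)   # sum of lambda^-_d
        checks.append((w, r, plus, minus))

print(checks[:4])
```

The values agree with the closed forms $\binom{\omega(m)-1}{2r}$ and $-\binom{\omega(m)-1}{2r-1}$, so the first is nonnegative and the second nonpositive whenever $m > 1$.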

Exercise. 3.1 Let $0 \le a_0 \le a_1 \le a_2 \le \cdots$. Prove that $(-1)^\ell \sum_{j=0}^{\ell} (-1)^j a_j \ge 0$ for every $\ell$.
Let $a_0, a_1, a_2, \dots, a_n$ be a unimodal (footnote 4) sequence of nonnegative numbers such that $\sum_{j=0}^{n} (-1)^j a_j = 0$. Prove that $(-1)^\ell \sum_{j=0}^{\ell} (-1)^j a_j \ge 0$ for every $\ell$.
Remark: Since $a_j = \binom{\omega(P)}{j}$ is unimodal and $\sum_{j=0}^{n} (-1)^j a_j = (1 - 1)^{\omega(P)} = 0$, we can use this argument to prove that $(-1)^\ell \sum_{j=0}^{\ell} (-1)^j \binom{\omega(P)}{j} \ge 0$ for every $\ell$.

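With Brun's choice of $\lambda^\pm$ it is a bit difficult (but perfectly possible) to estimate the asymptotic behavior of $\sum_{d \mid P} \lambda_d^\pm/d$. In 1946 Selberg had a clever idea circumventing this difficulty. He remarked that for every sequence $\Lambda_n$ of real numbers with $\Lambda_1 = 1$, one has
$$\Big(\sum_{d \mid m} \Lambda_d\Big)^2 \begin{cases} = 1 & \text{if } m = 1, \\ \ge 0 & \text{if } m > 1, \end{cases}$$
so that the argument giving an upper bound for $S(x, y; P)$ becomes:
$$S(x, y; P) = \sum_{\substack{x < n \le x + y \\ (n, P) = 1}} 1 \le \sum_{x < n \le x + y} \Big(\sum_{d \mid (n, P)} \Lambda_d\Big)^2$$

$$= y\sum_{\substack{d \mid P \\ e \mid P}} \frac{\Lambda_d\Lambda_e}{[d, e]} + O\Big(\Big(\sum_{d \mid P} |\Lambda_d|\Big)^2\Big).$$

4. i.e., a sequence for which there exists an index $h$ such that $0 \le a_0 \le a_1 \le \cdots \le a_h \ge a_{h+1} \ge a_{h+2} \ge \cdots \ge a_n$.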

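This fits with Brun's remark: it amounts to taking $\lambda_n^+ = \sum_{d, e\colon [d, e] = n} \Lambda_d\Lambda_e$. Our purpose is to find the sequence $\Lambda_d$ minimizing the quadratic form
$$\sum_{\substack{d \mid P \\ e \mid P}} \frac{\Lambda_d\Lambda_e}{[d, e]}$$
under the restriction $\Lambda_1 = 1$, with the hope that for this sequence we also have a small error term $\big(\sum_{d \mid P} |\Lambda_d|\big)^2$. We also assume that $\Lambda_d = 0$ for $d \ge z$, where $z$ is a parameter that we will set later: this assumption will contribute to control the size of the error term.
With suitable transformations we can diagonalize the quadratic form. In fact, the least common multiple $[d, e]$ and the greatest common divisor $(d, e)$ satisfy the identity $[d, e](d, e) = de$, therefore
$$\sum_{\substack{d \mid P \\ e \mid P}} \frac{\Lambda_d\Lambda_e}{[d, e]} = \sum_{\substack{d \mid P \\ e \mid P}} \frac{\Lambda_d}{d}\frac{\Lambda_e}{e}(d, e).$$
Since $I = \varphi * 1$, i.e. $m = \sum_{f \mid m} \varphi(f)$ for every $m$, we can write this sum as
$$\sum_{\substack{d \mid P \\ e \mid P}} \frac{\Lambda_d\Lambda_e}{[d, e]} = \sum_{\substack{d \mid P \\ e \mid P}} \frac{\Lambda_d}{d}\frac{\Lambda_e}{e}\sum_{\substack{f \mid d \\ f \mid e}} \varphi(f) = \sum_{f \mid P} \varphi(f)\Big(\sum_{d\colon f \mid d \mid P} \frac{\Lambda_d}{d}\Big)^2 = \sum_{f \mid P} \varphi(f)\,y_f^2,$$
with
$$y_f := \sum_{d\colon f \mid d \mid P} \frac{\Lambda_d}{d}.$$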

This is a linear transformation $\{\Lambda_d\}_{d \mid P} \to \{y_f\}_{f \mid P}$ which is non-singular since we can easily invert the previous relation, obtaining that
$$\Lambda_d = d\sum_{f\colon d \mid f \mid P} y_f\,\mu(f/d).$$
In fact,
$$\sum_{f\colon d \mid f \mid P} y_f\,\mu(f/d) = \sum_{f\colon d \mid f \mid P} \mu(f/d)\sum_{h\colon f \mid h \mid P} \frac{\Lambda_h}{h} = \sum_{h\colon d \mid h \mid P} \frac{\Lambda_h}{h}\sum_{f\colon d \mid f \mid h} \mu(f/d).$$
Here both $h$ and $f$ are divisible by $d$, thus setting $f = du$ and $h = dh'$, the inner sum becomes $\sum_{u \mid h'} \mu(u)$ which is $1$ when $h' = 1$ (i.e. $h = d$) and $0$ otherwise, hence the sum is equal to $\Lambda_d/d$.
In terms of the new variables $y_f$ the restriction $\Lambda_1 = 1$ becomes $\sum_{f \mid P} y_f\,\mu(f) = 1$, and the assumption $\Lambda_d = 0$ for $d \ge z$ becomes $y_f = 0$ for $f \ge z$.

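Exercise. 3.2 For a finite set of indexes $f$, let $c_f$, $d_f$ be real numbers, with $c_f > 0$ for every $f$. Prove that the minimum of $\sum_f c_f y_f^2$ restricted to $\sum_f d_f y_f = 1$ is reached when $y_f = \frac{d_f/c_f}{\sum_h d_h^2/c_h}$ and is equal to $\big(\sum_h d_h^2/c_h\big)^{-1}$.
Hint: complete the squares, or use the Lagrange condition for the stationary points (to get the $y_f$) and the convexity (to prove that they produce a minimum).

Let
$$L_P(z) := \sum_{\substack{n \le z \\ n \mid P}} \frac{\mu(n)^2}{\varphi(n)},$$
then for $y_f = \frac{\mu(f)}{\varphi(f)L_P(z)}$ when $f \le z$ the quadratic form assumes its minimum value, which is $1/L_P(z)$. Now we have to find a convenient bound for the size of the error term. We have
$$\Lambda_d = d\sum_{f\colon d \mid f \mid P} y_f\,\mu(f/d) = \frac{d}{L_P(z)}\sum_{\substack{f\colon d \mid f \mid P \\ f \le z}} \frac{\mu(f)\mu(f/d)}{\varphi(f)}.$$
We are assuming that $P$ is squarefree, therefore $d$ and $f/d$ are coprime, so that $\mu(f) = \mu(d)\mu(f/d)$ and $\varphi(f) = \varphi(d)\varphi(f/d)$, and $\mu(f/d)^2 = 1$, hence
$$\Lambda_d = \frac{d\,\mu(d)}{L_P(z)\varphi(d)}\sum_{\substack{f\colon d \mid f \mid P \\ f \le z}} \frac{1}{\varphi(f/d)} = \frac{d\,\mu(d)}{L_P(z)\varphi(d)}\sum_{\substack{m\colon m \mid (P/d) \\ m \le z/d}} \frac{1}{\varphi(m)}.$$
Therefore
$$\sum_{d \le z} |\Lambda_d| \le \frac{1}{L_P(z)}\sum_{\substack{d\colon d \mid P \\ d \le z}} \frac{d}{\varphi(d)}\sum_{\substack{m\colon m \mid (P/d) \\ m \le z/d}} \frac{1}{\varphi(m)} \le \frac{1}{L_P(z)}\sum_{d \le z} \frac{d}{\varphi(d)}\sum_{m \le z/d} \frac{1}{\varphi(m)} = \frac{1}{L_P(z)}\sum_{m \le z} \frac{1}{\varphi(m)}\sum_{d \le z/m} \frac{d}{\varphi(d)}.$$
Note that $\frac{d}{\varphi(d)} = \sum_{r \mid d} \frac{\mu(r)^2}{\varphi(r)}$ (prove it, for example using the multiplicativity) so that
$$\sum_{d \le y} \frac{d}{\varphi(d)} = \sum_{d \le y}\sum_{r \mid d} \frac{\mu(r)^2}{\varphi(r)} = \sum_{r \le y} \frac{\mu(r)^2}{\varphi(r)}\sum_{\substack{d \le y \\ r \mid d}} 1 \le \sum_{r \le y} \frac{1}{\varphi(r)}\Big\lfloor\frac{y}{r}\Big\rfloor \le y\sum_{r \le y} \frac{1}{r\varphi(r)} \ll y,$$
because $\sum_r \frac{1}{r\varphi(r)} = \prod_p \big(1 + \frac{p}{(p - 1)(p^2 - 1)}\big) < \infty$ (footnote 5). Hence
$$\sum_{d \le z} |\Lambda_d| \ll \frac{z}{L_P(z)}\sum_{m \le z} \frac{1}{m\varphi(m)} \ll \frac{z}{L_P(z)}.$$
Summarizing, we have proved the following result.

Theorem 3.1 Let $P$ be squarefree, and $z \ge 1$. Then
$$S(x, y; P) \le \frac{y}{L_P(z)} + O\Big(\Big(\frac{z}{L_P(z)}\Big)^2\Big), \qquad \text{where } L_P(z) := \sum_{\substack{n \le z \\ n \mid P}} \frac{\mu(n)^2}{\varphi(n)}.$$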
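The optimal weights can be computed exactly for a small squarefree $P$, which gives a check of the two key facts of the construction: $\Lambda_1 = 1$ and the minimum value $1/L_P(z)$ of the quadratic form. The sketch below is an illustration only: $P = 2 \cdot 3 \cdot 5 \cdot 7$, $z = 10$ and the interval are arbitrary choices, the weights are supported on $d \le z$, and exact rational arithmetic is used to avoid rounding issues.

```python
from fractions import Fraction
from itertools import combinations
from math import gcd, prod

primes = [2, 3, 5, 7]
P, z = prod(primes), 10

# Divisors of the squarefree P, each with the tuple of its prime factors.
divs = {prod(c): c for k in range(len(primes) + 1) for c in combinations(primes, k)}

def mu(d):
    return (-1) ** len(divs[d])

def phi(d):
    return prod(p - 1 for p in divs[d])

# L_P(z), the optimal y_f, and the weights Lambda_d obtained by Moebius inversion.
L = sum(Fraction(1, phi(n)) for n in divs if n <= z)
y = {f: (Fraction(mu(f), phi(f)) / L if f <= z else Fraction(0)) for f in divs}
Lam = {d: d * sum(y[f] * mu(f // d) for f in divs if f % d == 0) for d in divs}

# Quadratic form sum Lambda_d Lambda_e / [d, e], using 1/[d, e] = (d, e)/(d e).
Q = sum(Lam[d] * Lam[e] * Fraction(gcd(d, e), d * e) for d in divs for e in divs)

# Selberg's pointwise inequality, summed over an interval (x0, x0 + y0].
x0, y0 = 1000, 200
sifted = sum(1 for n in range(x0 + 1, x0 + y0 + 1) if gcd(n, P) == 1)
upper = sum(sum(Lam[d] for d in divs if n % d == 0) ** 2
            for n in range(x0 + 1, x0 + y0 + 1))

print(Lam[1], Q, 1 / L)
print(sifted, float(upper))
```

The comparison `sifted <= upper` holds for any real weights with $\Lambda_1 = 1$; the optimization only serves to make the upper bound as small as possible on average.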

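The claim of the theorem is useful only if we have a lower bound for $L_P(z)$. The following lemma gives such a bound.

Lemma 3.1
$$\sum_{n \le z} \frac{\mu(n)^2}{\varphi(n)} \ge \ln z.$$
Exercise 3.3 below gives a way to prove that the sum is asymptotic to $\ln z$.
Proof. Given an integer $m$, let $s(m)$ denote the squarefree part of $m$, i.e. the greatest squarefree integer dividing $m$. Let $n$ be squarefree, then
$$\frac{1}{\varphi(n)} = \frac{1}{n}\prod_{p \mid n}\Big(1 - \frac{1}{p}\Big)^{-1} = \frac{1}{n}\prod_{p \mid n}\Big(1 + \frac{1}{p} + \frac{1}{p^2} + \cdots\Big) = \sum_{m\colon s(m) = n} \frac{1}{m}.$$
Since $s(m) \le m$, we have that $m \le z$ implies $s(m) \le z$, therefore
$$\sum_{n \le z} \frac{\mu(n)^2}{\varphi(n)} = \sum_{\substack{n \le z \\ n \text{ squarefree}}}\ \sum_{m\colon s(m) = n} \frac{1}{m} = \sum_{m\colon s(m) \le z} \frac{1}{m} \ge \sum_{m \le z} \frac{1}{m} \ge \ln z.$$
□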
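The lower bound of Lemma 3.1 can be observed numerically. The sketch below computes $\varphi(n)$ and the squarefree indicator $\mu(n)^2$ with a simple sieve and compares the sum with $\ln z$ for a few arbitrary values of $z$; the function name `selberg_sum` is illustrative.

```python
from math import log

def selberg_sum(z):
    """Sum of mu(n)^2 / phi(n) over n <= z, with phi and squarefreeness sieved."""
    phi = list(range(z + 1))
    squarefree = [True] * (z + 1)
    for p in range(2, z + 1):
        if phi[p] == p:                          # p is prime: phi[p] is still untouched
            for m in range(p, z + 1, p):
                phi[m] -= phi[m] // p            # multiply phi[m] by (1 - 1/p)
            for m in range(p * p, z + 1, p * p):
                squarefree[m] = False            # m is divisible by p^2
    return sum(1 / phi[n] for n in range(1, z + 1) if squarefree[n])

for z in (1, 10, 100, 1000, 10000):
    print(z, selberg_sum(z), log(z))
```

The difference between the two columns appears to stabilize as $z$ grows, in agreement with the remark that the sum is asymptotic to $\ln z$.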

Anyway, the lemma gives a lower bound for $L_P(z)$ only if $P$ is such that $n \le z$ and squarefree already implies that $n|P$, and a way to realize this condition is by setting $P = \prod_{p\le z} p$. In this way we have:

Corollary 3.1 Let $z \ge 1$ and let $P = \prod_{p\le z} p$. Then
\[ S(x,y;P) \le \frac{y}{\ln z} + O\Big(\frac{z^2}{\ln^2 z}\Big), \]
in particular, for $z = \sqrt{y}$ we have
\[ S(x,y;P) \le \frac{2y}{\ln y} + O\Big(\frac{y}{\ln^2 y}\Big). \]

$^5$ Use the multiplicativity to write $\sum_r \frac{1/\varphi(r)}{r^s}$ as an Euler product, and then take $s = 1$.

The second claim is a remarkable improvement on the result proved with the Eratosthenes-Legendre sieve. Using the bound (3.4) we get that

Corollary 3.2
\[ \pi(x+y) - \pi(x) \le \frac{2y}{\ln y} + O\Big(\frac{y}{\ln^2 y}\Big). \]
Apart from the constant 2, this upper bound is essentially the best we can obtain without imposing additional conditions on the size of $x$ and $y$.

We can combine the previous result with a general lemma, obtaining a more general conclusion. The lemma will be given as a part of the proof.

Theorem 3.2 Let $P$ be squarefree. Then
\[ S(x,y;P) \le e^{\gamma} y \prod_{\substack{p|P\\ p\le\sqrt{y}}} \Big(1-\frac1p\Big)\Big(1 + O\Big(\frac{1}{\ln y}\Big)\Big). \]
Proof. Let $q$ be a squarefree integer, coprime with $P$. Then
\[ \varphi(q) S(x,y;P) = \sum_{m=1}^{q} S(x+mP, y; qP). \tag{3.7} \]
In fact, by definition the right-hand side is equal to
\[ \sum_{m=1}^{q} S(x+mP, y; qP) = \sum_{m=1}^{q} \sum_{\substack{x+mP<n\le x+mP+y\\ (n,qP)=1}} 1 = \sum_{\substack{x<r\le x+y\\ (r,P)=1}}\ \sum_{\substack{m=1\\ (r+mP,q)=1}}^{q} 1. \]
The inner sum is equal to $\varphi(q)$, because the map $j\colon \mathbb{Z}/q\mathbb{Z} \to \mathbb{Z}/q\mathbb{Z}$, defined by $j([m]_q) = [r+mP]_q$, is injective (here we use the hypothesis $(q,P) = 1$), thus
\[ = \varphi(q) \sum_{\substack{x<r\le x+y\\ (r,P)=1}} 1 = \varphi(q) S(x,y;P). \]

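Let $M(y;P) := \max_x S(x,y;P)$, then from (3.7) we get
\[ M(y;P) \le \frac{q}{\varphi(q)} M(y;qP). \tag{3.8} \]
Now, let $P$ be given as in the statement of the theorem. Let
\[ P_1 := \prod_{\substack{p|P\\ p\le\sqrt{y}}} p, \qquad q_1 := \prod_{\substack{p\nmid P\\ p\le\sqrt{y}}} p, \]
with $P_1$ and/or $q_1$ equal to 1 if the corresponding product is empty. The product $q_1P_1$ is the product of the complete set of primes up to $\sqrt{y}$, therefore the bound of Corollary 3.1 applies to $M(y;q_1P_1)$. By (3.8) we get
\begin{align*}
M(y;P_1) &\le \frac{q_1}{\varphi(q_1)} M(y;q_1P_1) \le \prod_{p|q_1}\Big(1-\frac1p\Big)^{-1} \cdot \Big(\frac{2y}{\ln y} + O\Big(\frac{y}{\ln^2 y}\Big)\Big) \\
&= y \prod_{p|q_1}\Big(1-\frac1p\Big)^{-1} \cdot e^{\gamma} \prod_{p\le\sqrt{y}}\Big(1-\frac1p\Big)\Big(1 + O\Big(\frac{1}{\ln y}\Big)\Big),
\end{align*}
where we have used the result in Ex. 1.49. The primes $p \le \sqrt{y}$ are exactly those dividing $q_1P_1$, so the factors with $p|q_1$ cancel. Therefore,
\[ M(y;P_1) \le y e^{\gamma} \prod_{\substack{p|P\\ p\le\sqrt{y}}}\Big(1-\frac1p\Big)\Big(1 + O\Big(\frac{1}{\ln y}\Big)\Big). \]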

Now the claim follows, because S(x, y; P ) ≤ S(x, y; P1) ≤ M(y; P1) (the first claim follows because P1|P , the second one by the definition of M(y; P1)). 
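To get a feeling for Corollary 3.2, the sketch below counts the primes in a few short intervals $(x, x+y]$ and compares the count with the main term $2y/\ln y$ alone. The function names are illustrative, and the comparison ignores the $O(y/\ln^2 y)$ term, so it should be read only as an experiment on sample values.

```python
import math

def prime_flags(limit):
    """Sieve of Eratosthenes: flags[n] is True exactly when n is prime."""
    flags = [True] * (limit + 1)
    flags[0] = False
    if limit >= 1:
        flags[1] = False
    for i in range(2, math.isqrt(limit) + 1):
        if flags[i]:
            for j in range(i * i, limit + 1, i):
                flags[j] = False
    return flags

def primes_in_interval(flags, x, y):
    """Number of primes in the half-open interval (x, x + y]."""
    return sum(1 for n in range(x + 1, x + y + 1) if flags[n])

flags = prime_flags(20000)
for x in (0, 1000, 10000):
    for y in (10, 100, 1000):
        count = primes_in_interval(flags, x, y)
        print(x, y, count, 2 * y / math.log(y))
```

The count is largest for $x = 0$, which reflects the fact that the bound is uniform in $x$ and cannot see where the interval is placed.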

3.3. Sifting more classes

In the previous section we have sifted (eliminated) the integers which are not coprime with an integer $P$, i.e. which are in the class 0 modulo one of the primes dividing $P$. It is possible to modify the argument in order to consider the more general case where for every prime $p|P$ there are more classes which must be avoided. Let us fix our notation. For each prime $p$ dividing the squarefree $P$ let $B(p)$ be a set of classes modulo $p$ which are the 'bad' classes. We look for an upper bound of
\[ S_B(x,y;P) := \sum_{\substack{x<n\le x+y\\ n \bmod p\,\notin B(p)\ \forall p|P}} 1. \]

X X X + X + X SB(x, y; P ) = 1 ≤ λd = λd 1. x

X X X   y   = Λ Λ 1 = Λ Λ b([d, e]) + O(b([d, e])) d e d e [d, e] d|P x

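\[ \sum_{\substack{d|P\\ e|P}} \frac{b([d,e])}{[d,e]} \Lambda_d\Lambda_e = \sum_{\substack{d|P\\ e|P}} \frac{b(d)\Lambda_d}{d}\,\frac{b(e)\Lambda_e}{e}\,\frac{(d,e)}{b((d,e))}, \]
because $[d,e](d,e) = de$ and $b([d,e])\,b((d,e)) = b(d)b(e)$. Now, let $g\colon \mathbb{N} \to \mathbb{Q}$ be the completely multiplicative map such that
\[ g(p) := \frac{b(p)}{p-b(p)}. \]
Then, for $m$ squarefree
\[ \frac{m}{b(m)} = \prod_{p|m} \frac{p}{b(p)} = \prod_{p|m}\Big(1 + \frac{p-b(p)}{b(p)}\Big) = \prod_{p|m}\Big(1 + \frac{1}{g(p)}\Big) = \sum_{f|m} \frac{1}{g(f)}, \]
thus
\begin{align*}
\sum_{\substack{d|P\\ e|P}} \frac{b([d,e])}{[d,e]} \Lambda_d\Lambda_e
&= \sum_{\substack{d|P\\ e|P}} \frac{b(d)\Lambda_d}{d}\,\frac{b(e)\Lambda_e}{e}\,\frac{(d,e)}{b((d,e))} \\
&= \sum_{\substack{d|P\\ e|P}} \frac{b(d)\Lambda_d}{d}\,\frac{b(e)\Lambda_e}{e} \sum_{\substack{f|d\\ f|e}} \frac{1}{g(f)} \\
&= \sum_{f|P} \frac{1}{g(f)} \Big(\sum_{d:\ f|d|P} \frac{b(d)\Lambda_d}{d}\Big)^2 = \sum_{f|P} \frac{y_f^2}{g(f)},
\end{align*}
where
\[ y_f := \sum_{d:\ f|d|P} \frac{b(d)\Lambda_d}{d}. \]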

Once again this linear transformation can be inverted, obtaining
\[ \frac{b(d)\Lambda_d}{d} = \sum_{f:\ d|f|P} y_f\,\mu(f/d), \tag{3.10} \]
so that the condition $\Lambda_1 = 1$ becomes equivalent to the restriction $\sum_{f|P} y_f\mu(f) = 1$, and the cut-off condition $\Lambda_d = 0$ for $d \ge z$ to the cut-off condition $y_f = 0$ for $f \ge z$. Using the general result in Ex. 3.2 we conclude that the minimum for the quadratic form in the $\Lambda_d$ is equal to $1/L_P(z)$, where
\[ L_P(z) := \sum_{\substack{n\le z\\ n|P}} \mu(n)^2 g(n), \]
and is reached when
\[ y_f = \mu(f)g(f)/L_P(z) \ \text{ for } f \le z, \qquad 0 \text{ otherwise.} \tag{3.11} \]
In order to get the result for twin primes we need to analyze in greater detail the situation where $b(2) = 1$ and $b(p) = 2$ for every $p > 2$.

Proposition 3.3 Let $z \ge 1$ and let $P = \prod_{p\le z} p$; suppose $b(2) = 1$ and $b(p) = 2$ for every odd prime $p$ dividing $P$. Then,
\[ S_B(x,y;P) \le 2c\,\frac{y}{\ln^2 z} + O\Big(\frac{y}{\ln^3 z}\Big) + O\Big(\frac{z^2}{\ln^2 z}\Big), \]
where $c := 2\prod_{p>2}\big(1 - \frac{1}{(p-1)^2}\big) = 1.320323\ldots$.

Proof. Our choice for $P$ gives
\[ L_P(z) = L(z) := \sum_{n\le z} \mu(n)^2 g(n). \]
Main term. Our first task is to deduce a lower bound for $L(z)$. The definition of $g(p)$ in our case becomes
\[ g(p) = \frac{b(p)}{p-b(p)} = \begin{cases} 1 & \text{if } p = 2,\\ \frac{2}{p-2} & \text{if } 2 < p < z, \end{cases} \]
but we extend $g(p) = \frac{2}{p-2}$ to all odd primes (hence, also those larger than $z$). This assumption does not affect the number $L_P(z)$, but simplifies the proof of its asymptotic behavior. We consider the Dirichlet series associated with $\mu(n)^2g(n)$; this function being multiplicative, the Dirichlet series has an Euler product:
\[ G(s) := \sum_{n=1}^{\infty} \frac{\mu(n)^2 g(n)}{n^s} = (1 + 2^{-s}) \prod_{p>2}\Big(1 + \frac{2}{p-2}\,\frac{1}{p^s}\Big). \]

The main term in this Euler product has order $2/p^{s+1}$, which is also the main order in $\zeta^2(s+1)$, thus we write $G(s)$ as $\zeta^2(s+1)H(s)$, where $H(s)$ is hence equal to
\[ H(s) = \sum_{n=1}^{\infty} h(n)n^{-s} = (1+2^{-s})(1-2^{-1-s})^2 \prod_{p>2}\Big(1 + \frac{2}{p-2}\,\frac{1}{p^s}\Big)\Big(1 - \frac1p\,\frac{1}{p^s}\Big)^2. \]
Note that the Euler factors in $H(s)$ are equal to
\[ 1 + \frac{4}{p-2}\,\frac{1}{p^{s+1}} - \frac{4}{p-2}\,\frac{1}{p^{2s+1}} + \frac{1}{p^{2s+2}} + \frac{2}{p-2}\,\frac{1}{p^{3s+2}} \]
thus their product converges absolutely for $\mathrm{Re}(s) > -1/2$. The representation as Dirichlet series of $\zeta^2(s+1)$ is $\sum_{n=1}^{\infty} [d(n)/n]/n^s$, where $d(n)$ denotes the divisor function, therefore the identity $G(s) = \zeta^2(s+1)H(s)$ gives
\[ \mu(n)^2 g(n) = \sum_{m|n} \frac{d(m)}{m}\, h\Big(\frac nm\Big). \]
As a consequence
\[ L(z) = \sum_{n\le z} \mu(n)^2 g(n) = \sum_{uv\le z} h(v)\frac{d(u)}{u} = \sum_{v\le z} h(v) \sum_{u\le z/v} \frac{d(u)}{u}. \]
By partial summation we see that
\[ \sum_{u\le m} \frac{d(u)}{u} = \frac{\sum_{u\le m} d(u)}{m} + \int_1^m \frac{\sum_{r\le u} d(r)}{u^2}\,du \]
and recalling the Dirichlet result $\sum_{u\le m} d(u) = m\ln m + O(m)$ (see Prop. 1.15) we get
\[ \sum_{u\le m} \frac{d(u)}{u} = \int_1^m \frac{u\ln u}{u^2}\,du + O(\ln m) = \frac12 \ln^2 m + O(\ln m), \]
therefore
\begin{align*}
L(z) = \sum_{v\le z} h(v) \sum_{u\le z/v} \frac{d(u)}{u} &= \sum_{v\le z} h(v)\Big(\frac12 \ln^2(z/v) + O(\ln(z/v))\Big) \\
&= \sum_{v\le z} h(v)\Big(\frac12(\ln^2 z - 2\ln z\ln v + \ln^2 v) + O(\ln z)\Big).
\end{align*}
Since the Euler product converges absolutely for $\mathrm{Re}(s) > -1/2$, we have
\begin{align*}
&\sum_{v\le z} |h(v)| \ll 1, && \text{since } \sum_{n=1}^{\infty} \frac{|h(n)|}{n^s} \text{ converges at } s = 0,\\
&\sum_{v\le z} |h(v)|\ln v \ll 1, && \text{since it gives the 1st derivative of } \sum_{n=1}^{\infty} \frac{|h(n)|}{n^s} \text{ at } s = 0,\\
&\sum_{v\le z} |h(v)|\ln^2 v \ll 1, && \text{since it gives the 2nd derivative of } \sum_{n=1}^{\infty} \frac{|h(n)|}{n^s} \text{ at } s = 0,\\
&\sum_{v\le z} h(v) = H(0) + O(1/z^{\theta}) && \forall\theta < 1/2,
\end{align*}
so that
\[ L(z) = \frac{H(0)}{2}\ln^2 z + O(\ln z). \]
Note that
\[ H(0) = \sum_{n=1}^{\infty} h(n) = \frac12 \prod_{p>2}\Big(1 + \frac{2}{p-2}\Big)\Big(1 - \frac1p\Big)^2 = \frac12 \prod_{p>2}\Big(1 - \frac{1}{(p-1)^2}\Big)^{-1} = c^{-1}. \]
Error term. According to (3.9) the error term is equal to
\[ \sum_{\substack{d|P\\ e|P}} b([d,e])\,|\Lambda_d\Lambda_e| = \sum_{\substack{d|P\\ e|P}} \frac{b(d)b(e)}{b((d,e))}\,|\Lambda_d\Lambda_e| \le \sum_{\substack{d|P\\ e|P}} b(d)b(e)\,|\Lambda_d\Lambda_e| = \Big(\sum_{d|P} b(d)|\Lambda_d|\Big)^2, \]
where we have used the fact that $b(m) \ge 1$ for every $m$ dividing $P$ to simplify the bound. Recalling the relation (3.10) giving $\Lambda_d$ as a function of $y_f$ and (3.11) giving our choice for $y_f$, we have
\[ \sum_{d|P} b(d)|\Lambda_d| = \sum_{d|P} d\,\Big|\sum_{f:\ d|f|P} y_f\,\mu(f/d)\Big| = \sum_{d|P} d\,\Big|\sum_{\substack{f:\ f\le z\\ d|f|P}} \mu(f)\mu(f/d)g(f)/L(z)\Big| \]
therefore, using the co-primality of $d$ and $f/d$, we have
\begin{align*}
\sum_{d|P} b(d)|\Lambda_d| &\le \frac{1}{L(z)} \sum_{\substack{d\le z\\ d|P}} \mu^2(d)\,d\,g(d) \sum_{f/d\le z/d} \mu^2(f/d)g(f/d) \\
&\le \frac{1}{L(z)} \sum_{d\le z} \mu^2(d)\,d\,g(d) \sum_{m\le z/d} \mu^2(m)g(m) \\
&= \frac{1}{L(z)} \sum_{m\le z} \mu^2(m)g(m) \sum_{d\le z/m} \mu^2(d)\,d\,g(d). \tag{3.12}
\end{align*}
Hence we need an upper bound for $\sum_{d\le X} \mu^2(d)\,d\,g(d)$. This bound can be obtained in the usual way, considering the Dirichlet series associated with the arithmetical function $\mu^2(n)ng(n)$:
\[ G(s) := \sum_{n=1}^{\infty} \frac{\mu^2(n)\,n\,g(n)}{n^s} = (1 + 2^{1-s}) \prod_{p>2}\Big(1 + \frac{2p}{p-2}\,\frac{1}{p^s}\Big). \]
The main term appearing in the Euler product decays as $2p^{-s}$, which is the same decay as for $\zeta^2(s)$, therefore we write $G(s)$ as $\zeta^2(s)H(s)$, where
\[ H(s) := \sum_{n=1}^{\infty} \frac{h'(n)}{n^s} = (1 + 2^{1-s})(1 - 2^{-s})^2 \prod_{p>2}\Big(1 + \frac{2p}{p-2}\,\frac{1}{p^s}\Big)\Big(1 - \frac{1}{p^s}\Big)^2, \]
which converges absolutely for $\mathrm{Re}(s) > 1/2$. Writing $\sum_{d\le X} \mu^2(d)\,d\,g(d)$ in terms of $d(n)$ and $h'(n)$, we have
\begin{align*}
\sum_{d\le X} \mu^2(d)\,d\,g(d) &= \sum_{uv\le X} d(u)h'(v) = \sum_{v\le X} h'(v) \sum_{u\le X/v} d(u) \\
&\ll \sum_{v\le X} |h'(v)|\,\frac Xv \ln(X/v) \ll X\ln X \sum_{v\le X} \frac{|h'(v)|}{v} \ll X\ln X
\end{align*}
where we have used Proposition 1.15 to bound $\sum_{u\le X/v} d(u)$ and the absolute convergence of $H(s)$ at $s = 1$ to conclude that $\sum_{v\le X} \frac{|h'(v)|}{v} \ll 1$.
With this result (3.12) becomes
\[ \sum_{d|P} b(d)|\Lambda_d| \ll \frac{1}{L(z)} \sum_{m\le z} \mu^2(m)g(m)\,\frac zm \ln(z/m) \ll \frac{z\ln z}{L(z)} \sum_{m\le z} \frac{\mu^2(m)g(m)}{m}. \]
The last sum converges to $G(2)$ for $z \to \infty$, and $L(z)$ has order $\ln^2 z$, thus the result follows. $\square$

Recall that two primes $p$, $q$ are called twin primes when $p - q = \pm 2$. At present it is not known whether the set of twin primes is finite or infinite. Indeed, the first modern application of sieve methods was given by Brun precisely in connection with this problem.
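The constant $c = 2\prod_{p>2}\big(1 - \frac{1}{(p-1)^2}\big)$ of Proposition 3.3 can be approximated by truncating the product. The sketch below does this with a plain sieve; the function names and the cut-off $10^5$ are arbitrary illustrative choices, and the truncated product only approximates $c$ from above.

```python
import math

def primes_up_to(limit):
    """List of the primes <= limit, via the sieve of Eratosthenes."""
    flags = [True] * (limit + 1)
    flags[0] = False
    if limit >= 1:
        flags[1] = False
    for i in range(2, math.isqrt(limit) + 1):
        if flags[i]:
            for j in range(i * i, limit + 1, i):
                flags[j] = False
    return [n for n in range(2, limit + 1) if flags[n]]

def twin_constant(limit):
    """Truncation of c = 2 * prod_{2 < p <= limit} (1 - 1/(p-1)^2)."""
    c = 2.0
    for p in primes_up_to(limit):
        if p > 2:
            c *= 1.0 - 1.0 / (p - 1) ** 2
    return c

c_approx = twin_constant(10**5)
print(c_approx)        # close to the value 1.320323... quoted in Proposition 3.3
print(1.0 / c_approx)  # approximates H(0) = 1/c
```

The omitted factors differ from 1 by about $1/p^2$, so the relative truncation error is of order $\sum_{p>10^5} p^{-2}$, which is far below the precision quoted in the text.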

Theorem 3.3 (Brun) Let $S_2(x,y)$ denote the number of twin primes couples in $(x, x+y]$. Then,
\[ S_2(x,y) \le 8c\,\frac{y}{\ln^2 y}\Big(1 + O\Big(\frac{\ln\ln y}{\ln y}\Big)\Big). \]
In particular, let $\pi_2(x)$ be the number of twin primes couples in $(1,x)$. Then
\[ \pi_2(x) \le 8c\,\frac{x}{\ln^2 x}\Big(1 + O\Big(\frac{\ln\ln x}{\ln x}\Big)\Big). \]
Proof. In fact, let $P = \prod_{p\le z} p$ for a $z > 1$. Let $p$, $p+2$ be a twin primes couple in $(x, x+y]$. There are three possibilities:

1. $p \not\equiv 0 \pmod{p_j}$ and $p \not\equiv -2 \pmod{p_j}$ for every $p_j$ dividing $P$,
2. $p \equiv 0 \pmod{p_j}$ for some $p_j$ dividing $P$,
3. $p \equiv -2 \pmod{p_j}$ for some $p_j$ dividing $P$.

The primes in 1 are evidently at most $S_B(x,y;P)$, those in 2 are $\pi(z)$ at most, and those in 3 are $\pi(z)$ at most as well, because $p+2$ is by hypothesis a prime so that $p \equiv -2 \pmod{p_j}$ implies that $p+2 = p_j$. As a consequence
\[ S_2(x,y) \le 2\pi(z) + S_B(x,y;P) \]
so that the claim follows by Proposition 3.3 where we have set $z = \sqrt{y}/\ln y$. The claim for $\pi_2(x)$ follows immediately, because $\pi_2(x) \le S_2(1,x)$. $\square$

As we see, we do not know whether there are infinitely many twin primes, but for sure they constitute a set of primes having zero density with respect to the full set of primes, because
\[ \frac{\pi_2(x)}{\pi(x)} = \frac{\sharp\{p\colon p, p+2 \in \mathcal{P},\ p \le x\}}{\sharp\{p\colon p \in \mathcal{P},\ p \le x\}} \ll \frac{x/\ln^2 x}{x/\ln x} \to 0. \]
Actually, this set is small enough to produce a converging series when the inverses of its elements are summed. In fact
\[ \sum_{\substack{p\colon p,p+2\in\mathcal{P}\\ p\le x}} \frac1p \le \frac{\pi_2(x)}{x} + \int_2^x \frac{\pi_2(u)}{u^2}\,du \ll O(1) + \int_2^x \frac{u}{u^2\ln^2 u}\,du \ll 1. \]
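The following sketch counts twin primes couples up to a few values of $x$, compares the count with the main term $8c\,x/\ln^2 x$ of Theorem 3.3, and accumulates the partial sums of $\sum 1/p$ over the smaller member of each couple. The names are illustrative, and the numerical value of $c$ is the one quoted in Proposition 3.3; the error factor $1 + O(\ln\ln x/\ln x)$ is ignored.

```python
import math

C_TWIN = 1.320323  # value of c quoted in Proposition 3.3

def prime_flags(limit):
    """Sieve of Eratosthenes: flags[n] is True exactly when n is prime."""
    flags = [True] * (limit + 1)
    flags[0] = False
    if limit >= 1:
        flags[1] = False
    for i in range(2, math.isqrt(limit) + 1):
        if flags[i]:
            for j in range(i * i, limit + 1, i):
                flags[j] = False
    return flags

def twin_prime_starts(x):
    """Primes p with p + 2 prime and p + 2 <= x."""
    flags = prime_flags(x)
    return [p for p in range(2, x - 1) if flags[p] and flags[p + 2]]

for x in (10**3, 10**4, 10**5):
    starts = twin_prime_starts(x)
    main_term = 8 * C_TWIN * x / math.log(x) ** 2
    partial_sum = sum(1.0 / p for p in starts)
    print(x, len(starts), round(main_term, 1), round(partial_sum, 4))
```

The partial sums in the last column grow very slowly, as expected from the convergence of the series proved above.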

Let $r$ be a positive, even integer, and let $\mathcal{P}_r$ be the set of primes $p$ such that $p+r$ is a prime, too. In this setting the twin primes correspond to $r = 2$. To what extent do we have to modify our argument in order to get an upper bound also for the density of $\mathcal{P}_r$? We notice that the classes $0 \pmod p$ and $-r \pmod p$ are distinct if and only if $p \nmid r$. This means that $b(p)$ equals 1 (not 2) for those odd primes dividing $r$. Hence, when we repeat the proof of Proposition 3.3 we meet a different function $H_r(s)$ which, in terms of the previous one, is equal to
\[ H_r(s) = H(s) \prod_{\substack{p|r\\ p>2}} \frac{1 + \frac{1}{(p-1)p^s}}{1 + \frac{2}{(p-2)p^s}}. \]
The change involves only a finite set of primes, therefore the argument runs exactly as before, the unique difference being now that
\[ H_r(0) = H(0) \prod_{\substack{p|r\\ p>2}} \frac{p-2}{p-1}. \]
This proves the following result.

Theorem 3.4 Let $r$ be a positive and even integer; let $S_r(x,y)$ denote the number of primes $p$ in $(x, x+y]$ such that also $p+r$ is a prime. Then,
\[ S_r(x,y) \le 8c(r)\,\frac{y}{\ln^2 y}\Big(1 + O\Big(\frac{\ln\ln y}{\ln y}\Big)\Big), \]
where $c(r) := c \prod_{p|r,\ p>2} \frac{p-1}{p-2}$.

Tracing back the proof of this result, we can see that the claim is uniform in $r$, i.e., that it is correct also when $r$ changes its value with $x$ and/or $y$. This is an important remark: for Theorem 3.5 we will need this claim with $r \ll y$, indeed.

3.4. Two sets with positive density

We conclude the chapter with two typical results which will be further analyzed in the next chapter. The first one involves the sum of two primes, a topic which is intimately connected with Goldbach's conjecture: is it true that every integer $N$ can be written as sum of two primes? Since there is only one even prime, the question is interesting only when $N$ is even. Apart from this restriction, there are many good reasons to believe that the answer should be positive and that, besides, the number of representations should be quite large. In fact, there are $N/\ln N$ primes up to $N$, hence there are $(N/\ln N)^2$ couples of primes and $\approx 2N$ integers, therefore we expect that 'in mean' each integer $2N$ can be represented in $2N/\ln^2 N$ different ways. Let
\[ R_{\mathcal{P}+\mathcal{P}}(2N) := \sum_{\substack{p,q\in\mathcal{P}\\ p+q=2N}} 1, \]
then this simple probabilistic model suggests that
\[ \text{(Conjecture)} \qquad R_{\mathcal{P}+\mathcal{P}}(2N) \asymp \frac{2N}{\ln^2 N}. \]
The previous argument assumes that the choices for $p$ and $q$ can be considered as independent, but this is unrealistic, since the equation $p + q = 2N$ shows that if $p$ is randomly taken, then $q$ must belong to a well defined class modulo any prime dividing $2N$. A more refined probabilistic model which takes account of these dependencies shows that
\[ \text{(refined Conjecture)} \qquad R_{\mathcal{P}+\mathcal{P}}(2N) \asymp \prod_{\substack{p|N\\ p\ \mathrm{odd}}}\Big(1 + \frac1p\Big) \frac{2N}{\ln^2 N}, \]
where the implicit constant is independent of $2N$. The known results are quite distant from the statement of the conjecture, but still surprising; for example:

1) In '75 Montgomery and Vaughan proved that the exceptions to the conjecture, i.e. the set of even integers for which the conjecture fails (if it is not empty), contains at most $N^{1-\delta}$ integers up to $N$, where $\delta$ is a small positive constant. Several people worked on this problem producing larger and larger values for $\delta$. Up to now the record goes to Pintz, who proved this claim with $\delta = 0.28$ in 2018.
2) In '73 Chen proved that every even (and large enough) integer can be written as sum of two primes or as sum of a prime and a product of two primes.
3) In '37 Vinogradov proved that the analogous conjecture about the representability of an odd integer as sum of three primes (the so called Goldbach's ternary problem) is true for integers larger than $m_0$, and the number of representations agrees with the conjectured one.
4) The original argument of Vinogradov does not produce an explicit value for $m_0$, but later explicit bounds have been found, up to the spectacular result of Helfgott in 2014 completely solving the problem: one can take $m_0 = 5$, i.e. every odd number larger than 5 may be written as sum of three prime numbers.

Our results are sufficiently strong to prove the following fact.

Theorem 3.5
\[ R_{\mathcal{P}+\mathcal{P}}(2N) \ll \prod_{\substack{p|N\\ p\ \mathrm{odd}}}\Big(1 + \frac1p\Big) \frac{2N}{\ln^2 N}. \]
In other words, we do not know if $2N$ can be written as sum of two primes, but if this is possible, then the number of such representations is at most of the order which is foreseen by the conjecture.

Proof. As in Proposition 3.3 take $P = \prod_{p\le z} p$, with $z = \sqrt{N}/\ln N$. Let $p$ be a prime such that $2N - p$ is a prime as well. There are three cases:
Case 1. $p$ divides $P$;
Case 2. $2N - p$ divides $P$;
Case 3. both $p$ and $2N - p$ are coprime to $P$.
An upper bound for primes in Cases 1 and 2 is $\pi(z)$, while an upper bound for primes in Case 3 is $S_B(1, 2N; P)$.
Proposition 3.3 gives a bound for $S_B(1, 2N; P)$, which we must correct by a factor $\prod_{p|N,\ p>2} \frac{p-1}{p-2}$ to take account (as in the proof of Theorem 3.4) of the fact that in the present case the classes $0 \pmod p$ and $-2N \pmod p$ coincide when $p$ is a divisor of $N$.

The proof concludes by noticing that
\[ \prod\nolimits_1 := \prod_{p|N,\ p\ \mathrm{odd}} \frac{p-1}{p-2} \qquad\text{and}\qquad \prod\nolimits_2 := \prod_{p|N,\ p\ \mathrm{odd}} \Big(1 + \frac1p\Big) \]
are of the same size, i.e. that there exist two positive constants $c_1 < c_2$ independent of $N$ such that
\[ c_1 \prod\nolimits_1 \le \prod\nolimits_2 \le c_2 \prod\nolimits_1. \qquad\square \]

Theorem 3.6 (Schnirelmann) Let $S_{\mathcal{P}+\mathcal{P}}$ be the set of integers which can be written as sum of two primes. Then
\[ \sharp\{S_{\mathcal{P}+\mathcal{P}} \cap (0,x)\} \gg x. \]
Proof. The result follows quite directly from a clever use of the Cauchy-Schwarz inequality and the upper bound for $R_{\mathcal{P}+\mathcal{P}}$ we have found in Theorem 3.5. In fact, let $\{a_n\}_{n=1}^{k}$ be a set of complex numbers, and set $\delta_n = 1$ when $a_n \ne 0$, $\delta_n = 0$ otherwise. Then
\[ \sum_{n=1}^{k} |a_n| = \sum_{n=1}^{k} \delta_n |a_n| \le \Big[\sum_{n=1}^{k} \delta_n^2\Big]^{1/2} \Big[\sum_{n=1}^{k} |a_n|^2\Big]^{1/2} = \big[\sharp\{n\colon a_n \ne 0\}\big]^{1/2} \Big[\sum_{n=1}^{k} |a_n|^2\Big]^{1/2} \]
thus
\[ \sharp\{n\colon a_n \ne 0\} \ge \Big[\sum_{n=1}^{k} |a_n|\Big]^2 \Big/ \Big[\sum_{n=1}^{k} |a_n|^2\Big]. \tag{3.13} \]
This inequality shows that a lower bound for $\sharp\{n\colon a_n \ne 0\}$ comes from a lower bound for the $\ell^1$ norm of the sequence and an upper bound for its $\ell^2$ norm. In the present case we apply the previous idea with $a_n := R_{\mathcal{P}+\mathcal{P}}(n)$, so that the number of $a_n \ne 0$ with $n \le x$ becomes the number of integers below $x$ which can be written as sum of two primes. The number $p + q$ is lower than $X$ whenever $p$, $q$ are both lower than $X/2$. Hence there are $\pi(X/2)^2 \gg X^2/\ln^2 X$ such couples, so that
\[ \sum_{n\le X} R_{\mathcal{P}+\mathcal{P}}(n) \gg \sum_{p,q\le X/2} 1 \gg \pi(X/2)^2 \gg X^2/\ln^2 X. \tag{3.14} \]
On the other hand, using the upper bound in Theorem 3.5 we have:
\[ \sum_{n\le X} R^2_{\mathcal{P}+\mathcal{P}}(n) \ll \sum_{n\le X} \prod_{\substack{p|n\\ p\ \mathrm{odd}}}\Big(1 + \frac1p\Big)^2 \frac{n^2}{\ln^4 n} \ll \frac{X^2}{\ln^4 X} \sum_{n\le X} \prod_{p|n}\Big(1 + \frac1p\Big)^2 \]
(in the last inequality we have used the fact that $x/\ln^2 x$ grows for $x$ large enough). Let $f$ be the arithmetical function which is defined by $f(n) := \prod_{p|n}\big(1 + \frac1p\big)^2$. To get an upper bound for $\sum_{n\le X} f(n)$ we consider the Dirichlet series
\[ \sum_{n=1}^{\infty} \frac{f(n)}{n^s} = \prod_p \Big(1 + \frac{(1+1/p)^2}{p^s - 1}\Big) \]
that we write as $\zeta(s)G(s)$. The Euler product defining $G(s)$ converges absolutely for $\mathrm{Re}(s) > 0$. Let $G(s)$ be written as Dirichlet series, $G(s) = \sum_{n=1}^{\infty} g(n)n^{-s}$, then
\[ \sum_{n\le X} f(n) = \sum_{uv\le X} g(v) = \sum_{v\le X} g(v)\Big\lfloor \frac Xv \Big\rfloor = X \sum_{v\le X} \frac{g(v)}{v} + O\Big(\sum_{v\le X} |g(v)|\Big) \ll X, \]
where the last bound follows by the fact that the series defining $G(s)$ converges absolutely for $\mathrm{Re}(s) > 0$. In this way we have proved that
\[ \sum_{n\le X} R^2_{\mathcal{P}+\mathcal{P}}(n) \ll \frac{X^3}{\ln^4 X}. \tag{3.15} \]
Now we can cast (3.14) and (3.15) into (3.13), to get
\[ \sharp\{S_{\mathcal{P}+\mathcal{P}} \cap (0,x)\} \gg \frac{x^4/\ln^4 x}{x^3/\ln^4 x} \gg x. \qquad\square \]
An analogous argument proves the following result.
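The counting inequality (3.13) is easy to test numerically with $a_n = R_{\mathcal{P}+\mathcal{P}}(n)$. The sketch below computes the number of ordered representations $n = p + q$ for $n \le X$ and compares the number of representable integers with the Cauchy-Schwarz lower bound. The function names and the choice $X = 2000$ are illustrative assumptions.

```python
import math

def prime_list(limit):
    """Primes <= limit via the sieve of Eratosthenes."""
    flags = [True] * (limit + 1)
    flags[0] = False
    if limit >= 1:
        flags[1] = False
    for i in range(2, math.isqrt(limit) + 1):
        if flags[i]:
            for j in range(i * i, limit + 1, i):
                flags[j] = False
    return [n for n in range(2, limit + 1) if flags[n]]

def representation_counts(X):
    """R[n] = number of ordered pairs of primes (p, q) with p + q = n, for n <= X."""
    primes = prime_list(X)
    R = [0] * (X + 1)
    for p in primes:
        for q in primes:
            if p + q > X:
                break
            R[p + q] += 1
    return R

def cauchy_schwarz_lower_bound(a):
    """Right-hand side of (3.13): (sum |a_n|)^2 / (sum |a_n|^2)."""
    s1 = sum(abs(v) for v in a)
    s2 = sum(v * v for v in a)
    return s1 * s1 / s2 if s2 else 0.0

X = 2000
R = representation_counts(X)
support = sum(1 for v in R if v != 0)
print(support, cauchy_schwarz_lower_bound(R))
```

The true count of representable integers is considerably larger than the lower bound, which is expected: the argument only aims at the correct order of magnitude.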

Theorem 3.7 (Romanoff) Let $S_{\mathcal{P}+2^{\mathbb{N}}}$ be the set of integers which can be written as sum of a prime and a power of 2. Then
\[ \sharp\{S_{\mathcal{P}+2^{\mathbb{N}}} \cap (0,x)\} \gg x. \]
In some sense it is still more surprising than Schnirelmann's result: adding to the zero density set of primes an even thinner set (the powers of 2) we still get a set of positive density.

Proof. Again we use the Cauchy-Schwarz inequality to deduce the claim from an upper bound for $\sum_{n\le X} R^2_{\mathcal{P}+2^{\mathbb{N}}}(n)$ and a lower bound for $\sum_{n\le X} R_{\mathcal{P}+2^{\mathbb{N}}}(n)$. The lower bound can be deduced easily: there are $\pi(X/2)$ primes up to $X/2$ and $\gg \ln X$ powers of 2 below $X/2$, therefore
\[ \sum_{n\le X} R_{\mathcal{P}+2^{\mathbb{N}}}(n) \gg \sum_{\substack{p\le X/2\\ 2^k\le X/2}} 1 \gg \pi(X/2)\ln X \gg X. \tag{3.16} \]
Besides, we have
\[ \sum_{n\le X} R^2_{\mathcal{P}+2^{\mathbb{N}}}(n) = \sum_{\substack{p_1,p_2,k,j\\ p_1+2^k\le X\\ p_2+2^j\le X\\ p_1+2^k=p_2+2^j}} 1. \]

The contribution to this sum of the ‘diagonal’ terms, i.e. those terms with k = j (and hence p1 = p2) is  X since there are π(X) ∼ X/ ln X choices for p1 and ln X choices for k. The non-diagonal terms contribute by X X X j k 1 = π2(X, 2 − 2 )

k

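By multiplicativity
\[ \prod_{\substack{p|2^m-1\\ p\ \mathrm{odd}}}\Big(1 + \frac1p\Big) = \sum_{\substack{d|2^m-1\\ d\ \mathrm{odd}}} \frac{\mu(d)^2}{d} \le \sum_{\substack{d|2^m-1\\ d\ \mathrm{odd}}} \frac1d, \]
thus the non-diagonal terms contribute
\[ \ll \frac{X}{\ln X} \sum_{m\le\log_2 X}\ \sum_{\substack{d|2^m-1\\ d\ \mathrm{odd}}} \frac1d. \]
At last, exchanging the order of the sums and recalling the contribution of the diagonal term we get
\[ \sum_{n\le X} R^2_{\mathcal{P}+2^{\mathbb{N}}}(n) \ll \frac{X}{\ln X} \sum_{\substack{d\le X\\ d\ \mathrm{odd}}} \frac1d \sum_{\substack{m\le\log_2 X\\ d|2^m-1}} 1 + O(X). \]
Let $h_2(d)$ be the order of 2 modulo $d$, then $\sum_{m\le\log_2 X,\ d|2^m-1} 1 \ll \ln X/h_2(d)$, therefore
\[ \sum_{n\le X} R^2_{\mathcal{P}+2^{\mathbb{N}}}(n) \ll X \sum_{\substack{d\le X\\ d\ \mathrm{odd}}} \frac{1}{d\,h_2(d)} + O(X). \]
The difficult part of the proof is proving that $\sum_{d\ \mathrm{odd}} \frac{1}{d\,h_2(d)} < +\infty$. The original proof of this fact was quite difficult, but Erdős simplified it in '50, and now it runs as follows. We notice that the definition of $h_2(d)$ implies that $d\,|\,(2^{h_2(d)} - 1)$. Thus, set
\[ a_n := \sum_{\substack{d\ \mathrm{odd}\\ h_2(d)=n}} \frac1d, \]
and let $Q := \prod_{n\le X} (2^n - 1)$. If $h_2(d) \le X$, then $d|Q$, so that
\[ \sum_{n\le X} a_n \le \sum_{m|Q} \frac1m \le \prod_{p|Q}\Big(1 + \frac1p + \frac1{p^2} + \cdots\Big) = \frac{Q}{\varphi(Q)} \ll \ln\ln Q \]
where for the last step we have used the bound $\frac{\varphi(n)}{n} \gg \frac{1}{\ln\ln n}$ which holds as $n$ diverges.$^6$
The definition of $Q$ shows that $Q \le 2^{\sum_{n\le X} n} \ll 2^{X^2}$, therefore
\[ \sum_{n\le X} a_n \ll \ln X. \]
By partial summation we conclude that
\[ \sum_{\substack{d\le X\\ d\ \mathrm{odd}}} \frac{1}{d\,h_2(d)} \le \sum_{n\le X} \frac{a_n}{n} < +\infty. \]
In this way we have proved that
\[ \sum_{n\le X} R^2_{\mathcal{P}+2^{\mathbb{N}}}(n) \ll X. \tag{3.17} \]
By (3.16), (3.17) and the Cauchy-Schwarz trick (3.13), we finally conclude that
\[ \sharp\{S_{\mathcal{P}+2^{\mathbb{N}}} \cap (0,x)\} \gg \frac{x^2}{x} \gg x. \qquad\square \]
Let $\underline{d}$ and $\overline{d}$ be respectively the liminf and the limsup of the quotient
\[ \frac1x\,\sharp\{S_{\mathcal{P}+2^{\mathbb{N}}} \cap (0,x)\} \]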

$^6$ Its proof is the following. We split the set of primes dividing $n$ into two sets, those which are 'small' (i.e. smaller than $\ln n$) and those which are 'large' (i.e. larger than $\ln n$), getting
\[ \frac{\varphi(n)}{n} = \prod_{p|n}\Big(1 - \frac1p\Big) = \prod_{\substack{p|n\\ p<\ln n}}\Big(1 - \frac1p\Big) \cdot \prod_{\substack{p|n\\ p\ge\ln n}}\Big(1 - \frac1p\Big). \]
…

… $> 0$, and the bound $\overline{d} \le 1/2$ is immediate, since the even integers that can be represented in that way are only $\ll x/\ln x$ (for them $p$ either 2, or $k$ is 1). Van der Corput and Erdős proved that $\overline{d} < 1/2$, strictly. Recent results of Pintz$^7$, Habsieger and Roblot$^8$, and Elsholtz and Schlage-Puchta$^9$, proved that $0.107648 < \underline{d} \le \overline{d} < 0.49095$. Based on some numerical evidence and a probabilistic model, Romani conjectured that $\underline{d} = \overline{d} = 0.434\ldots$.

$^7$ J. Pintz, A note on Romanov's constant, Acta Math. Hungar. 112(1-2), 2006, p. 1–14.
$^8$ L. Habsieger and X.-F. Roblot, On integers of the form $p + 2^k$, Acta Arith. 122(1), 2006, p. 45–50.
$^9$ C. Elsholtz and J.-C. Schlage-Puchta, On Romanov's constant, Math. Z. 288(3-4), 2018, p. 713–724.
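Romanoff's theorem can be explored numerically by marking the integers up to $x$ which are of the form $p + 2^k$. In the sketch below the exponent runs over $k \ge 0$, so that $2^0 = 1$ is counted as a power of 2; this convention is an assumption of the example, as are the function names. For small $x$ the observed proportion is not a reliable estimate of the limiting densities, because the even integers of this form are still relatively numerous.

```python
import math

def prime_flags(limit):
    """Sieve of Eratosthenes: flags[n] is True exactly when n is prime."""
    flags = [True] * (limit + 1)
    flags[0] = False
    if limit >= 1:
        flags[1] = False
    for i in range(2, math.isqrt(limit) + 1):
        if flags[i]:
            for j in range(i * i, limit + 1, i):
                flags[j] = False
    return flags

def prime_plus_power_of_two(x):
    """found[n] is True when n <= x can be written as p + 2^k, p prime, k >= 0."""
    flags = prime_flags(x)
    found = [False] * (x + 1)
    power = 1  # 2^0; including k = 0 is an assumption of this sketch
    while power < x:
        for p in range(2, x - power + 1):
            if flags[p]:
                found[p + power] = True
        power *= 2
    return found

x = 10**4
found = prime_plus_power_of_two(x)
print(sum(found) / x)
# Odd integers up to 200 which are not of the form p + 2^k:
print([n for n in range(3, 200, 2) if not found[n]])
```

The second line printed lists the first odd exceptions; their existence in positive proportion is what lies behind the inequality $\overline{d} < 1/2$ mentioned above.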

Exercise. 3.3 The following steps provide the asymptotic behavior for the sum $\sum_{n\le X} \mu(n)^2/\varphi(n)$.

1) Let $c_n := n\mu(n)^2/\varphi(n)$ and let $F(s) := \sum_n c_n/n^s$. Prove that
\[ F(s) = \zeta(s)G(s), \quad\text{with } G(s) := \prod_p \Big(1 + \frac{1}{(p-1)p^s} - \frac{1}{(p-1)p^{2s-1}}\Big), \qquad \sigma > 1. \]
The Euler product defining $G(s)$ converges absolutely in $\mathrm{Re}(s) > 1/2$, therefore the previous equality gives the meromorphic continuation of $F(s)$ in $\mathrm{Re}(s) > 1/2$, with a unique, simple, pole at $s = 1$, coming from the pole of $\zeta$. The residue of $F(s)$ at $s = 1$ is 1, since $G(1) = 1$.

2) Let $g(n)$ denote the Dirichlet coefficients of $G(s)$, so that
\[ c_n = \sum_{k|n} g(k). \]
Deduce that
\[ \sum_{n\le X} c_n = \sum_{n\le X}\sum_{k|n} g(k) = \sum_{k\le X} g(k)\Big\lfloor \frac Xk \Big\rfloor = X \sum_{k\le X} g(k)/k + O\Big(\sum_{k\le X} |g(k)|\Big). \]

3) Prove that:
\begin{align*}
&\sum_{k\le X} g(k)/k \to 1 && \text{as } X \to \infty,\\
&\sum_{k\le X} |g(k)| \le X^{\theta} \sum_{k\le X} |g(k)|k^{-\theta} \ll X^{\theta} && \forall\theta > 1/2,\\
&\sum_{k>X} |g(k)|/k \le X^{\theta-1} \sum_{k>X} |g(k)|/k^{\theta} \ll X^{\theta-1} && \forall\theta > 1/2,
\end{align*}
so that
\[ \sum_{n\le X} c_n = X + O_{\theta}(X^{\theta}), \qquad \forall\theta > 1/2. \]

4) Let $d_n := \mu(n)^2/\varphi(n)$, so that $c_n = nd_n$. The result in Step 3 can be written as
\[ \sum_{n\le X} n(d_n - 1/n) = O_{\theta}(X^{\theta}) \qquad \forall\theta > 1/2. \]
By partial summation deduce that
\[ \sum_{n\le X} (d_n - 1/n) = O(1). \]
Conclude that
\[ \sum_{n\le X} d_n = \ln X + O(1). \]

Exercise. 3.4 The following steps provide a different argument giving the asymptotic behavior for the sum $\sum_{n\le X} \mu(n)^2/\varphi(n)$.

1) Repeat Step 1 in Ex. 3.3, i.e. let $c_n := n\mu(n)^2/\varphi(n)$, let $F(s) := \sum_n c_n/n^s$ and prove that
\[ F(s) = \zeta(s)G(s), \quad\text{with } G(s) := \prod_p \Big(1 + \frac{1}{(p-1)p^s} - \frac{1}{(p-1)p^{2s-1}}\Big), \qquad \sigma > 1. \]
The Euler product defining $G(s)$ converges absolutely in $\mathrm{Re}(s) > 1/2$, therefore the previous equality gives the meromorphic continuation of $F(s)$ in $\mathrm{Re}(s) > 1/2$, with a unique, simple, pole at $s = 1$, coming from the pole of $\zeta$. The residue of $F(s)$ at $s = 1$ is 1, since $G(1) = 1$.

2) Let $\sigma > 0$. Prove that $\frac{2}{2\pi i} \int_{(\sigma)} \frac{X^s}{s(s+1)(s+2)}\,ds = (1 - 1/X)^2$ if $X > 1$, 0 otherwise.

3) Using the previous identity, prove that
\[ H(X) := \frac{2}{2\pi i} \int_{\sigma-i\infty}^{\sigma+i\infty} F(s)\,\frac{X^s}{s(s+1)(s+2)}\,ds = \sum_{n\le X} c_n(1 - n/X)^2, \]
for every $\sigma > 1$.

4) Let $\sigma > 0$. Using the representation $\zeta(s) = \frac{1}{s-1} + \frac12 - s\int_1^{\infty} \frac{B_1(\{x\})}{x^{s+1}}\,dx$ prove that $\zeta(s) - \frac{1}{s-1} \ll (1 + |t|)$ uniformly in $\mathrm{Re}(s) > 1/2$.

5) Let $\sigma > 1$. Write $H(X)$ as $\Sigma_1 + \Sigma_2 + \Sigma_3$, where
\begin{align*}
\Sigma_1 &= \frac{2}{2\pi i} \int_{\sigma-i\infty}^{\sigma+i\infty} \frac{1}{s-1}\,\frac{X^s}{s(s+1)(s+2)}\,ds,\\
\Sigma_2 &= \frac{2}{2\pi i} \int_{\sigma-i\infty}^{\sigma+i\infty} \Big(\zeta(s) - \frac{1}{s-1}\Big)\frac{X^s}{s(s+1)(s+2)}\,ds,\\
\Sigma_3 &= \frac{2}{2\pi i} \int_{\sigma-i\infty}^{\sigma+i\infty} (F(s) - \zeta(s))\,\frac{X^s}{s(s+1)(s+2)}\,ds.
\end{align*}
Note that $\Sigma_1 = X/3 + O(1)$ (move $\sigma \to -\infty$ and collect the residues at $1$, $0$, $-1$ and $-2$). Use Step 4 to prove that $\Sigma_2$ and $\Sigma_3$ both are $O(X^{\theta})$ for every $\theta > 1/2$ (move the integral line to $\mathrm{Re}(s) = \theta$). Therefore,
\[ H(X) = \frac X3 + O_{\theta}(X^{\theta}), \qquad \forall\theta \in (1/2, 1). \]

6) Let $1 < a' < a < 2$ be parameters, and consider the linear combination
\[ K(X) := (1 + b' + b'')^{-1}\big(H(aX) + b'H(a'X) + b''H(X)\big), \]
where
\[ b' := -\frac{a'^2}{a^2}\,\frac{a-1}{a'-1}, \qquad b'' := -\frac1a + \frac{a'}{a^2}\,\frac{a-1}{a'-1}. \]
Note that $1 + b' + b'' = (1 - 1/a)(1 - a'/a)$ thus it is strictly positive. Prove that
\[ \sum_{n\le X} c_n + \sum_{X<\,\ldots} c_n\,\frac{(1 - \frac{n}{aX})^2 + b'(1 - \frac{n}{a'X})^2}{1 + b' + b''}\ \ldots \tag{3.18} \]

7) Note that $c_n \le n/\varphi(n) = \prod_{p|n}(1 - 1/p)^{-1}$, so that (use Mertens)
\[ c_n \le \exp\Big(-\sum_{p|n} \ln(1 - 1/p)\Big) = \exp\Big(\sum_{p|n} 1/p + O(1)\Big) \ll \exp\Big(\sum_{p\le n} 1/p\Big) \ll \ln n. \]

8) Use the fact that $c_n \ge 0$ to deduce from (3.18) that
\[ \sum_{n\le X} c_n + O\Big(\frac{(1 - \frac1a)^2 + |b'|(1 - \frac{1}{a'})^2}{1 + b' + b''} \sum_{X<\,\ldots} c_n\ \ldots \]
… $> 0$, and deduce that
\[ \sum_{n\le X} c_n = X + O(X^{1-\eta}\ln X) + O(X^{\theta+2\eta}). \]
Setting $\theta = 1/2 + \epsilon$ and $\eta = 1/6$, conclude that
\[ \sum_{n\le X} c_n = X + O(X^{\frac56+\epsilon}) \qquad \forall\epsilon > 0. \]

9) Repeat Step 4 of Ex. 3.3 to deduce that
\[ \sum_{n\le X} \mu(n)^2/\varphi(n) = \ln X + O(1). \]
Remarks:
1. This exercise reaches the same conclusion as Ex. 3.3, with a more complicated argument: where is its convenience? The techniques employed here are more sophisticated and could be used to get stronger conclusions. For example we could deduce an explicit representation of the terms which here are represented as $O(1)$ (see [MV], Ex. 2.1.17).
2. The complicated definition of $K(X)$ in Step 6 comes from the necessity to write a linear combination of $H$ computed at multiples of $X$ such that for this sum the coefficient of $c_n$ with $n \le X$ be equal to 1.
3. In Step 3 we have used the kernel $\frac{X^s}{s(s+1)(s+2)}$, which produces the complicated weighted sum $\sum_n c_n(1 - n/X)^2$, because in Step 4 only the very poor bound $\zeta(s) \ll 1 + |t|$ is proved, so that we need a term of order $s^3$ at the denominator of the integral $\int_{\sigma-i\infty}^{\sigma+i\infty}$ in order to ensure the convergence. Actually $\zeta(s)$ grows along the vertical lines in $\sigma > 0$ at most as $|t|^{1/2+\epsilon}$, thus a better kernel $\frac{X^s}{s(s+1)}$ (giving $\sum_n c_n(1 - n/X)$) could be used, but the proof of this stronger result about $\zeta(s)$ involves the Lindelöf theorem and is more difficult.
With a bit of care we could also use the kernel $\frac{X^s}{s}$ (giving directly $\sum_{n\le X} c_n$), but then the existence of the integral $\int_{\sigma-i\infty}^{\sigma+i\infty} \frac{F(s)X^s}{s}\,ds$ is more delicate, because the integrand is no longer absolutely integrable.

Exercise. 3.5 Prove that
\[ \sum_{n\le X} \frac{1}{\varphi(n)} = c\ln X + O(1) \quad\text{with } c := \prod_p \Big(1 + \frac{1}{p(p-1)}\Big) = 1.943596\ldots. \]
Hint: repeat the argument in Ex. 3.3; the value of $c$ has been computed using the argument in Ex. 1.48.

Exercise. 3.6 Recall that $\sigma_1(n) = (1 * I)(n) = \sum_{d|n} d$. Prove that

2 X σ1(n) Y  2p − 1  = cX + O(1) with c := 1 + = 1.075822 .... ϕ(n) p(p + 1)(p − 1)2 n≤X p Hint: repeat the argument in Ex. 3.3. The value of c has been computed using the argument in Ex. 1.48. 102 3.4. TWO SETS WITH POSITIVE DENSITY
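For Exercise 3.5 above, both the constant $c = \prod_p \big(1 + \frac{1}{p(p-1)}\big)$ and the behavior of $\sum_{n\le X} 1/\varphi(n)$ can be checked numerically. The sketch below uses a simple totient sieve; the function names and the cut-offs are illustrative assumptions, and the truncated product only approximates $c$ from below.

```python
import math

def totient_sieve(limit):
    """phi[n] for 0 <= n <= limit, by sieving over the primes."""
    phi = list(range(limit + 1))
    for i in range(2, limit + 1):
        if phi[i] == i:  # i has not been touched yet, hence i is prime
            for j in range(i, limit + 1, i):
                phi[j] -= phi[j] // i
    return phi

def euler_product_constant(limit):
    """Truncation of prod_p (1 + 1/(p(p-1))) over the primes p <= limit."""
    phi = totient_sieve(limit)
    c = 1.0
    for p in range(2, limit + 1):
        if phi[p] == p - 1:  # characterizes the primes
            c *= 1.0 + 1.0 / (p * (p - 1))
    return c

def reciprocal_totient_sum(X):
    """Sum of 1/phi(n) over 1 <= n <= X."""
    phi = totient_sieve(X)
    return sum(1.0 / phi[n] for n in range(1, X + 1))

c_approx = euler_product_constant(10**5)
print(c_approx)
for X in (10**2, 10**3, 10**4):
    print(X, reciprocal_totient_sum(X) - c_approx * math.log(X))
```

The differences printed in the loop remain bounded, which is what the $O(1)$ term in the exercise predicts.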

Exercise. 3.7 Recall that $\sigma_a(n) = (1 * I^a)(n) = \sum_{d|n} d^a$. Let $a_1, \ldots, a_u$ and $b_1, \ldots, b_v$ be positive integers, and let $f(n) := \prod_i \sigma_{a_i}(n)/\prod_j \sigma_{b_j}(n)$. Suppose that $a_1 + a_2 + \cdots + a_u = b_1 + b_2 + \cdots + b_v$. Prove that $\sum_{n\le X} f(n) = cX + O(1)$ with a positive constant $c$
\[ c := \prod_p \Big(1 + \sum_{k=1}^{\infty} \frac{\prod_i (p^{a_i(k+1)} - 1)}{\prod_j (p^{b_j(k+1)} - 1)}\,\frac{1}{p^k}\Big)\Big(1 - \frac1p\Big). \]
Remark: the product defining $c$ converges, since $\sum_i a_i = \sum_j b_j$.
Hint: repeat the argument in Ex. 3.3.

Exercise. 3.8 Recall that an integer $n$ is $r$-power free when $p^r \nmid n$ for every prime $p$. Prove that
\[ \sum_{\substack{n\le X\\ r\text{-power free}}} 1 \sim \frac{X}{\zeta(r)}. \]
Hint: Use the identity in Ex. 1.17 and repeat the argument in Ex. 3.3. Alternatively, imitate the approach suggested for Ex. 1.31.

Exercise. 3.9 Let $a$, $b$ be integer numbers, with $1 < a \le b$. Let
\[ R_{a,b} := \{n \in \mathbb{N}\colon \text{if } p^a|n, \text{ then } p^b|n, \text{ for every prime } p\}. \]
In other words, $R_{a,b}$ is the set of integers which are divisible by $p^b$ whenever they are divisible by $p^a$. Prove that
\[ F(s) := \sum_{n\in R_{a,b}} \frac{1}{n^s} = \zeta(s)G(s), \quad\text{with } G(s) := \prod_p \Big(1 - \frac{1}{p^{as}} + \frac{1}{p^{bs}}\Big), \qquad \sigma > 1. \]
Prove that
\[ \sum_{\substack{n\le X\\ n\in R_{a,b}}} 1 \sim cX, \quad\text{where } c := G(1) = \prod_p \Big(1 - \frac{1}{p^a} + \frac{1}{p^b}\Big). \]
Remark: What happens for $a = b$? And for $b = \infty$?
Hint: repeat the argument in Ex. 3.3.

Exercise. 3.10 Take $b = 2a$ in the previous exercise, so that
\[ c := G(1) = \prod_p \Big(1 - \frac{1}{p^a} + \frac{1}{p^{2a}}\Big) = \prod_p \Big(1 + \frac{\omega}{p^a}\Big)\Big(1 + \frac{\omega^2}{p^a}\Big), \]
where $\omega$ is a non-trivial cubic root of unity. How can you use this equality to quickly compute $c$ with 10 correct digits? Test your computations with $a = 2$.
Hint: Use the Prime zeta function as in Ex. 1.47–1.48; the value for $a = 2$ is $0.66922021803\ldots$.

Chapter 4

Sumsets

Given two sets $A$ and $B$ in $\mathbb{N} \cup \{0\}$, we denote by $A + B$ the set $\{a + b\colon a \in A, b \in B\}$, which is called the sumset of $A$ and $B$. Intuitively, we expect that $A + B$ be in some sense larger than $A$ and $B$ alone. Nevertheless, this is not always true. For example, when $A = B = \mathbb{N}$, then evidently $A + B = \mathbb{N}$ too, and more generally, if $A$ and $B$ are the same arithmetical sequence $\{nq\colon n \in \mathbb{N}\}$ for a fixed $q$, then $A + B = \{nq\colon n \in \mathbb{N}\}$. As we see, the sumset can be not larger than the single sets, but this happens only when in $A$ (and $B$) there is a 'structure' of some kind (for example, in the previous cases $A$ and $B$ are semigroups). This is the subject of intense research nowadays, with astonishing consequences and applications (for example the result of Green and Tao about the existence of arithmetic progressions of arbitrary length in the sequence of prime numbers is an application of some of these ideas, and Bourgain has used these techniques to get amazing consequences about the cancellation in a class of short exponential sums). For the moment we do not pursue the study of these ideas$^1$, but we stress that from the qualitative point of view $A + B$ is 'small' (with respect to the individual $A$ and $B$) only if in $A$ and $B$ there is a structure, a fact which is not in any way 'typical'. In other words, for 'generic' $A$ and $B$ we expect that $A + B$ be large.

There are many ways to measure the 'size' of a set of integers $A$; one of the most useful, although not very intuitive, is Shnirelman's density, which is defined as follows. Let $A(n) := \sharp\{a \in A,\ 1 \le a \le n\}$. Note that $A(n)$ counts the number of positive integers in $A$ but that we do not assume that $0 \notin A$: simply we do not count 0 even in case $0 \in A$. Then

Shnirelman density: $\delta_A := \inf_{n\ge 1} A(n)/n$.

A minor role, for the moment at least, is held by the asymptotic densities:

Lower density: $\underline{d}_A := \liminf_{n\to\infty} A(n)/n$,

Upper density: $\overline{d}_A := \limsup_{n\to\infty} A(n)/n$.

Note that $0 \le \delta_A \le 1$ and that if $1 \le k \notin A$, then $\delta_A \le 1 - 1/k$. In particular $\delta_A = 1$ only if $\mathbb{N} \subseteq A$ and $\delta_A = 0$ if $1 \notin A$. This means that $\delta_A$ is sensitive to the whole set $A$: modifying $A$ even in a single member generally changes its Shnirelman density. This is an important difference with the asymptotic densities.

Lemma 4.1 Suppose that $0 \in A \cap B$ and that $A(n) + B(n) \ge n$ for a given $n$. Then there exist $a \in A$ and $b \in B$, such that $a + b = n$.

$^1$ But the interested reader can consult the paper of Imre Z. Ruzsa: Generalized arithmetical progressions and sumsets, Acta Math. Hungarica 65(4), 379–388, 1994, and the astonishing book of Terence Tao and Van Vu [TV].

Proof. This is an application of the pigeon-hole principle. By hypothesis $0 \in A \cap B$, so that $A$ and $B$ contain $A(n) + 1$ and $B(n) + 1$ integers in $[0, n]$, respectively. Let $0 = a_0 < a_1 < a_2 < \cdots < a_{A(n)} \le n$ and $0 = b_0 < b_1 < b_2 < \cdots < b_{B(n)} \le n$ be the elements of the two sets. The numbers $n - b_j$ are $B(n) + 1$ and are in $[0, n]$. In $[0, n]$ there are $n + 1$ integers and by hypothesis $(A(n) + 1) + (B(n) + 1) \ge n + 2$ so that in $\{a_j\}_{j=0}^{A(n)}$ and $\{n - b_j\}_{j=0}^{B(n)}$ there is at least one common integer, i.e. there are indexes $u$ and $v$ such that $a_u = n - b_v$, which is the claim. $\square$

The lemma has an immediate consequence.

Proposition 4.1 Suppose that 0 ∈ A∩B and that δA +δB ≥ 1, then each integer can be written as a sum of an element in A and an element in B, i.e. A+B = N∪{0}.

Proof. The claim follows by the previous lemma, because A(n) ≥ δAn and B(n) ≥ δBn for every n. Thus, A(n) + B(n) ≥ (δA + δB)n ≥ n for every n.  Proposition 4.2 Suppose that 0 ∈ A ∩ B. Then

δA+B ≥ δA + δB − δAδB.

Proof. The claim is evident if $\delta_A$ or $\delta_B$ is zero, hence we can assume that $\delta_A, \delta_B > 0$; in particular we can assume that $1 \in A \cap B$. Let $1 = a_1 < a_2 < \cdots < a_{A(n)} \le n$ be the sequence of integers in $A \cap [1,n]$. For every index $j < A(n)$ let $g_j := a_{j+1} - a_j - 1$, and let $g_{A(n)} := n - a_{A(n)}$. Each $g_j$ gives the size of the gap between $a_j$ and $a_{j+1}$ in $A$ (and of the gap between $a_{A(n)}$ and $n$, respectively). This gap produces at least $B(g_j)$ numbers in $A + B$, since if $b \in B \cap [1, g_j]$ then $a_j + b \in A + B$. The numbers produced in this way with different $j$ are distinct. We also have $A \subseteq A + B$, because $0 \in B$, and the elements of $A$ are distinct from the previous ones, which lie strictly inside the gaps. Therefore
\[
(A+B)(n) \ge A(n) + \sum_{j=1}^{A(n)} B(g_j) \ge A(n) + \delta_B \sum_{j=1}^{A(n)} g_j = A(n) + \delta_B (n - A(n)) = A(n)(1 - \delta_B) + n\delta_B \ge n(\delta_A(1 - \delta_B) + \delta_B).
\]
The claim follows, because $n$ is arbitrary. □

We notice that the conclusion of the previous proposition can be written also as

\[
1 - \delta_{A+B} \le (1 - \delta_A)(1 - \delta_B).
\]
In this form the claim can be extended by induction to an arbitrary number of sets; in particular it can be extended to the case where several copies of the same set $A$ are considered, thus proving that
\[
1 - \delta_{hA} \le (1 - \delta_A)^h, \quad\text{i.e.}\quad \delta_{hA} \ge 1 - (1 - \delta_A)^h, \qquad \forall h. \tag{4.1}
\]
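The inequality of Proposition 4.2 can be explored numerically. The sketch below is only an illustration under stated assumptions: the helper names `truncated_density` and `sumset` are hypothetical, and the quantity computed is the minimum of $A(n)/n$ over $1 \le n \le N$, which is merely an upper bound for the true Shnirelman density (an infimum over all $n$). The counting argument in the proof applies verbatim to this truncated quantity, so the same inequality should hold for it.

```python
from fractions import Fraction


def truncated_density(A, N):
    """Return min over 1 <= n <= N of A(n)/n, where A(n) = #(A ∩ [1, n]).

    This is a finite stand-in for the Shnirelman density, not the density itself.
    """
    members = set(A)
    count = 0
    best = Fraction(1)
    for n in range(1, N + 1):
        if n in members:
            count += 1
        best = min(best, Fraction(count, n))
    return best


def sumset(A, B, N):
    """Elements of A + B that do not exceed N."""
    return {a + b for a in A for b in B if a + b <= N}


N = 200
A = {0} | set(range(1, N + 1, 3))        # 0 together with 1, 4, 7, ...
B = {0} | {m * m for m in range(1, 15)}  # 0 together with the squares up to 196

d_A = truncated_density(A, N)
d_B = truncated_density(B, N)
d_AB = truncated_density(sumset(A, B, N), N)
print(d_A, d_B, d_AB, d_A + d_B - d_A * d_B)
```

Exact rational arithmetic (`Fraction`) is used so that the comparison between the two sides is not affected by rounding.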

It is customary to call additive basis a set A for which there exists an integer h such that N ⊆ hA, i.e. such that every integer can be written as sum of h elements of A, and the smallest h with N ⊆ hA is called order of the basis.

Theorem 4.1 (Shnirelman) Suppose that $0 \in A$ and that $\delta_A > 0$. Then $A$ is a basis.

Proof. In fact, by (4.1) there exists $\bar h$ such that $\delta_{\bar h A} \ge 1/2$, so that the claim for $2\bar h A = \bar h A + \bar h A$ follows from Lemma 4.1. □

How can we prove that the Shnirelman density of a set $A$ is positive? The following fact is a simple criterion:

$\delta_A > 0$ iff $1 \in A$ and $\underline{d}_A > 0$.

Note that the claim does not state that $\delta_A = \underline{d}_A$. Thus, by applying this criterion we can prove that $\delta_A > 0$, and therefore that every integer is a sum of a fixed number $h$ of integers in $A$, but we have no immediate way to compute $h$ from $\underline{d}_A$.

Exercise. 4.1 Prove that there exists $h$ such that every integer can be written as a sum of $h$ squarefree numbers.

An application, which was devised by Shnirelman himself and was the first indication toward the Goldbach conjecture: the set of primes P has asymptotic density equal to zero, but the set P + P (the set of integers which can be written as sum of two primes) has a positive asymptotic density (this is exactly the claim of Theorem 3.6). Adding 0 and 1 to this set we have a new set having positive Shnirelman’s density, so that we have proved that

Corollary 4.1 (Shnirelman) There exists an integer h such that every integer n can be written as sum of w primes and h − w ones for some w ∈ {0, 1, 2, . . . , h}.

Later we will see that it is possible to write every n as sum of a finite set of primes alone (i.e., without any need of adding ones) if n is large enough (see Corollary 4.2).

The claim in Proposition 4.2 has already served our purposes, but it is not optimal. For example, when $\delta_A \ge 1/2$ we know from Lemma 4.1 that $\delta_{2A} = 1$, while the inequality (4.1) only says that $\delta_{2A} \ge 3/4$. Actually, the following stronger result holds.

Theorem 4.2 (Mann) Let 0 ∈ A ∩ B. Then

\[
\delta_{A+B} \ge \min\{1, \delta_A + \delta_B\}.
\]

This inequality was conjectured by Khinchin, and after some partial results obtained by several people² it was finally proved by Mann³.

2 Khinchin himself proved that the claim holds under the restriction $\delta_A = \delta_B \le 1/2$ (Zur additiven Zahlentheorie, Rec. Math. Soc. Math. Moscou 39(3), 27–34, 1932), Erdős proved the inequality $\delta_{A+B} \ge \delta_A + \frac{1}{2}\delta_B$ (On the asymptotic density of the sum of two sequences, Annals of Math. 42(1), 65–68, 1942) and Besicovitch proved a slightly weaker inequality (On the density of the sum of two sequences of integers, J. London Math. Soc. 10, 246–248, 1935).
3 The original proof was quite complicated; I reproduce here the proof as it is given in [P] and which is due to F. Dyson. For several extensions see [HR].

Suppose we know the following fact.

Lemma 4.2 (Dyson) Let $A$ and $B$ be two collections of integers in $[0,n]$, with $0 \in A \cap B$. Suppose that there exists $c \in (0,1]$ such that
\[
A(m) + B(m) \ge cm, \qquad \forall m \le n, \tag{4.2}
\]
then
\[
(A+B)(m) \ge cm, \qquad \forall m \le n. \tag{4.3}
\]

Then the theorem immediately follows, since under the hypotheses of the theorem we can take $c = \min\{1, \delta_A + \delta_B\}$ in (4.2). Therefore our real task is proving the lemma.

Proof. We proceed by induction on $n$. When $n = 1$ the unique value for $m$ is 1, and (4.2) in this case says that $A(1) + B(1) \ge c$, i.e. that $1 \in A \cup B$. Hence $1 \in A + B$ (because $0 \in A \cap B$), so that $(A+B)(1) = 1 \ge c$, which is (4.3). Now suppose the claim of the lemma holds for every $n' < n$, but that it is false for $n$. Choose a counterexample $A$, $B$ with $B(n)$ as small as possible. Then $B(n) \ge 1$, otherwise $B$ would be equal to $\{0\}$, $(A+B)(m)$ would be equal to $A(m)$ and (4.3) would hold since it coincides with (4.2). We construct two sets $A'$, $B'$ with
i. $A'(m) + B'(m) \ge cm$ for every $m \le n$,
ii. $A' + B' \subseteq A + B$,
iii. $B'(n) < B(n)$.
Together, these facts prove that $A'$, $B'$ are a counterexample to the claim with $n$ (exactly as $A$, $B$) but with $B'(n) < B(n)$, which is impossible since it contradicts the minimality of $B(n)$.

For every $a \in A$, let $B''(a) := \{b \in B : a + b \notin A\}$ and let $a_0$ be the smallest integer for which $B''(a) \ne \emptyset$ (such an $a_0$ exists, since $A + B \not\subseteq A$, for example because $\max\{A\} + \max\{B\} \notin A$, being $B \ne \{0\}$). For the sake of brevity we denote $B''(a_0)$ by $B''$. The minimality of $a_0$ implies that for every $r < a_0$
\[
b + (A \cap [0,r]) \subseteq A, \qquad \forall b \in B, \tag{4.4}
\]
thus
\[
(A + B) \cap [0,r] \subseteq A. \tag{4.5}
\]
Let now
\[
A' := A \cup ((a_0 + B'') \cap [0,n]), \qquad B' := B \setminus B''.
\]
Note that the union defining $A'$ is disjoint, since the definition of $B''$ ensures that $a_0 + b'' \notin A$ for every $b'' \in B''$. With these definitions, $0 \in A \subseteq A'$, and $0 \in B'$ as well, since $0 \in B$ and $0 \notin B''$. Now, $A + B' \subseteq A + B$ and
\[
(a_0 + B'') + B' = \{a_0 + b'' + b' : b'' \in B'',\ b' \in B'\} \subseteq \{(a_0 + b') + b : b \in B,\ b' \in B'\} \subseteq A + B,
\]
because $a_0 + b' \in A$ (because $b' \notin B''$). This proves that $A' + B' \subseteq A + B$, which is ii). Moreover, condition iii) is immediate since $B'' \ne \emptyset$, thus we are left to prove i). Recall that the union defining $A'$ is disjoint, therefore
\[
A'(m) + B'(m) = A(m) + \#\{b'' \in B'' : 1 \le a_0 + b'' \le m\} + B(m) - \#\{b'' \in B'' : 1 \le b'' \le m\}
\]
\[
= A(m) + B(m) - \#\{b'' \in B'' : m - a_0 + 1 \le b'' \le m\}
\]
\[
\ge A(m) + B(m) - \#\{b \in B : m - a_0 + 1 \le b \le m\}. \tag{4.6}
\]

Let $b_0$ be the smallest positive number in $B \cap [m - a_0 + 1, m]$. If this integer does not exist then the last set in the previous bound is empty and the inequality becomes $A'(m) + B'(m) \ge A(m) + B(m)$, and i) follows from (4.2). Hence, suppose that $b_0$ exists. Then (4.6) implies that
\[
A'(m) + B'(m) \ge A(m) + B(b_0 - 1).
\]

Write $m = b_0 + r$, so that $0 \le r < a_0 \le n$. By inductive hypothesis (here we use the fact that $r < n$) $(A+B)(r) \ge cr$. Since $0 \in A \cap B$ and $c \le 1$,
\[
\#((A + B) \cap [0,r]) = 1 + (A+B)(r) \ge 1 + cr \ge c(r+1).
\]

Moreover, since $r < a_0$, this result together with (4.5) shows that $\#(A \cap [0,r]) \ge c(r+1)$. But (4.4) implies that $[b_0, b_0 + r]$ contains at least as many elements of $A$ as $[0,r]$, so that $\#(A \cap [b_0, b_0 + r]) \ge c(r+1)$. Consequently
\[
A'(m) + B'(m) \ge A(m) + B(b_0 - 1) = A(b_0 + r) - A(b_0 - 1) + A(b_0 - 1) + B(b_0 - 1) \ge c(1 + r) + c(b_0 - 1) = c(b_0 + r) = cm.
\]
□

Exercise. 4.2 Let $A \subseteq \mathbb{N} \cup \{0\}$ be an additive basis with $\delta_A > 0$. Using (4.1), deduce that the order $h_A$ of $A$ is bounded by
\[
h_A \le \left\lceil \frac{-2\ln 2}{\ln(1 - \delta_A)} \right\rceil.
\]
The inequality proved by Mann (Th. 4.2) improves this bound to
\[
h_A \le \left\lceil \frac{1}{\delta_A} \right\rceil.
\]

An asymptotic basis is a set $A \subseteq \mathbb{N} \cup \{0\}$ for which there exists an integer $h$ such that every sufficiently large integer can be written as a sum of $h$ elements of $A$. This notion is weaker than that of basis, every basis being also an asymptotic basis. It is very frequent for a set $A$ to be an asymptotic basis without being a basis, and even for sets which are bases in the ordinary sense, their dimension as asymptotic basis can be considerably smaller than their dimension as ordinary basis⁴. This means that there is considerable interest in the weak notion of basis and in its relation with the ordinary notion. Theorem 4.3 here below gives a way to recognize an asymptotic basis, deducing the claim from Shnirelman’s Theorem 4.1 and from the following general result.

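Proposition 4.3 (Schur) Let $0 < a_1 < \cdots < a_k$ be integers with $(a_1, \ldots, a_k) = 1$. Then there exists an integer $n_k$ such that the equation
\[
x_1 a_1 + \cdots + x_k a_k = n
\]
has a solution $(x_1, \ldots, x_k) \in \mathbb{N}^k$ for every $n \ge n_k$.

We provide two independent proofs of this result. The first one is constructive, the second one is shorter. This simple-looking problem is actually very complicated: the exact dependence of $n_k$ on the $a_j$ is known only for $k = 2$.⁵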

4 For example, every integer can be written as a sum of nine cubes, but only finitely many integers require nine terms: it has been proved that every sufficiently large integer can be represented with seven cubes.
5 See A. Nijenhuis and H. Wilf, Representations of integers by linear forms in nonnegative integers, J. Number Theory 4, 98–106, 1972.

Proof. The proof is by induction on $k$, and as a first step we prove that $n_2 = a_1 a_2$ has the desired property for $k = 2$. In fact, let $n \ge a_1 a_2$. The coprimality assumption $(a_1, a_2) = 1$ shows that
\[
x_1 a_1 + x_2 a_2 = n
\]
has a solution with $x_1, x_2 \in \mathbb{Z}$. Evidently it is impossible that $x_1$, $x_2$ are both negative, and the claim is already true if both are nonnegative, thus assume that $x_1 < 0 < x_2$ (the case $x_2 < 0 < x_1$ can be treated with a similar argument). Then $x_2 > n/a_2 \ge a_1$. The transformation $(x_1, x_2) \to (x_1 + a_2, x_2 - a_1)$ produces a new solution, with a positive second component, and a first component which is strictly greater than the previous one. We can iterate this argument until the first component also becomes nonnegative.

Now, let $k \ge 3$ and suppose we have already proved the claim for $k - 1$. Let $D := (a_1, \ldots, a_{k-1})$ and $a'_j := a_j/D$ for $j \le k - 1$. By hypothesis $(D, a_k) = 1$, therefore the equation
\[
xD + x_k a_k = n
\]
has a nonnegative solution if $n \ge D a_k$. Suppose that $x_k \ge 0$ but $x < n'_{k-1}$, where $n'_{k-1}$ denotes the constant $n_{k-1}$ associated with the numbers $a'_1, \ldots, a'_{k-1}$, whose existence is proved by the inductive hypothesis. Then $x_k > (n - D n'_{k-1})/a_k$, which is greater than $D$ whenever $n \ge D(a_k + n'_{k-1})$. Thus, let $n_k$ be this number $D(a_k + n'_{k-1})$; then $x_k$ is greater than $D$ whenever $x$ is lower than $n'_{k-1}$, so that the transformation $(x, x_k) \to (x + a_k, x_k - D)$ gives a new solution with a nonnegative value for $x_k$ and a strictly larger value for $x$. Iterating this process we get a solution with a positive $x_k$ and an $x$ not lower than $n'_{k-1}$. By induction the equation
\[
x_1 a'_1 + \cdots + x_{k-1} a'_{k-1} = x
\]
admits a nonnegative solution, thus we get a nonnegative solution of the original equation, in the form
\[
x_1 a_1 + \cdots + x_k a_k = D(x_1 a'_1 + \cdots + x_{k-1} a'_{k-1}) + x_k a_k = Dx + x_k a_k = n.
\]
□
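The case $k = 2$ of the constructive proof can be sketched in code. This is a minimal illustration, not part of the original notes: the names `ext_gcd` and `represent_two` are hypothetical, and the repeated shift $(x_1, x_2) \to (x_1 + a_2, x_2 - a_1)$ of the proof is collapsed here into a single reduction of $x_1$ modulo $a_2$, which lands on the same solution line.

```python
def ext_gcd(a, b):
    """Return (g, x, y) with a*x + b*y == g == gcd(a, b), for a, b >= 0."""
    if b == 0:
        return a, 1, 0
    g, x, y = ext_gcd(b, a % b)
    return g, y, x - (a // b) * y


def represent_two(a1, a2, n):
    """Nonnegative (x1, x2) with x1*a1 + x2*a2 == n, assuming gcd(a1, a2) == 1
    and n >= a1*a2 (the threshold n_2 used in the proof)."""
    g, u, _ = ext_gcd(a1, a2)
    if g != 1:
        raise ValueError("a1 and a2 must be coprime")
    if n < a1 * a2:
        raise ValueError("n is below the threshold a1*a2")
    # u*n is an integer solution for x1; moving along the solution line by
    # multiples of a2 brings it into [0, a2), where x2 is forced to be >= 0.
    x1 = (u * n) % a2
    x2 = (n - x1 * a1) // a2
    return x1, x2


print(represent_two(5, 7, 35), represent_two(5, 7, 58))
```

Since $0 \le x_1 < a_2$ gives $x_1 a_1 < a_1 a_2 \le n$, the second coefficient is automatically positive, which mirrors the inequality $x_2 > n/a_2 \ge a_1$ used in the proof.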

Exercise. 4.3 Prove that if the integers $a_j$ are pairwise coprime, then in the previous proposition we can take $n_k = \prod_{j \le k} a_j$.

A different proof of Proposition 4.3.

Proof. Let $y_1, \ldots, y_k \in \mathbb{Z}$ be such that $y_1 a_1 + \cdots + y_k a_k = 1$; such a set of integers exists because the hypothesis implies that the ideal generated by $a_1, a_2, \ldots, a_k$ is trivial (i.e. it is the whole of $\mathbb{Z}$). Let $N_k := \sum_{j=1}^{k} |y_j| a_j$ and let $n \ge N_k(N_k - 1)$. Write $n$ as $n = qN_k + r$, with $0 \le r < N_k$. In this representation we have $q \ge N_k - 1$, otherwise we would have
\[
n = qN_k + r \le (N_k - 2)N_k + N_k - 1 = N_k^2 - N_k - 1,
\]
contradicting the assumption about $n$. Then
\[
n = qN_k + r = q\sum_{j=1}^{k} |y_j| a_j + r\sum_{j=1}^{k} y_j a_j = \sum_{j=1}^{k} (q|y_j| + r y_j) a_j
\]
and in this representation each $q|y_j| + r y_j$ is nonnegative, since $q \ge N_k - 1 \ge r$. This argument proves the claim with $n_k := N_k(N_k - 1)$. □
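The second proof is also effective once Bézout coefficients are available. The following sketch follows it step by step; the function names (`bezout`, `schur_bound`, `represent`) are hypothetical, and the coefficients $y_j$ produced by the extended Euclidean algorithm are just one admissible choice, so the resulting threshold $N_k(N_k - 1)$ depends on that choice and is in general far from optimal.

```python
def ext_gcd(a, b):
    """Return (g, x, y) with a*x + b*y == g == gcd(a, b), for a, b >= 0."""
    if b == 0:
        return a, 1, 0
    g, x, y = ext_gcd(b, a % b)
    return g, y, x - (a // b) * y


def bezout(nums):
    """Return (g, ys) with sum(y*a for y, a in zip(ys, nums)) == g == gcd(nums)."""
    g, ys = nums[0], [1]
    for a in nums[1:]:
        g, u, v = ext_gcd(g, a)          # old_g*u + a*v == new g
        ys = [y * u for y in ys] + [v]
    return g, ys


def schur_bound(nums):
    """The threshold N_k*(N_k - 1) of the proof, for the y_j found by bezout()."""
    _, ys = bezout(nums)
    Nk = sum(abs(y) * a for y, a in zip(ys, nums))
    return Nk * (Nk - 1)


def represent(nums, n):
    """Nonnegative coefficients x_j with sum(x_j * a_j) == n, for n >= schur_bound(nums)."""
    g, ys = bezout(nums)
    if g != 1:
        raise ValueError("the numbers must have gcd 1")
    Nk = sum(abs(y) * a for y, a in zip(ys, nums))
    if n < Nk * (Nk - 1):
        raise ValueError("n is below the threshold of the proof")
    q, r = divmod(n, Nk)                 # n = q*Nk + r with 0 <= r < Nk
    return [q * abs(y) + r * y for y in ys]


nums = [6, 10, 15]
n = schur_bound(nums) + 1
print(n, represent(nums, n))
```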

Theorem 4.3 Let $A \subseteq \mathbb{N} \cup \{0\}$, with $0 \in A$. If $\underline{d}_A > 0$ and $\mathrm{GCD}(A) = 1$, then $A$ is an asymptotic basis.

Proof. The $\mathbb{Z}$-ideal generated by $A$ is a principal ideal, generated by $\mathrm{GCD}(A)$. Since $\mathbb{Z}$ is a Noetherian ring, there exists a finite set $a_1, \ldots, a_k \in A$ such that $1 = \mathrm{GCD}(A) = (a_1, \ldots, a_k)$. Let $n_0$ be large enough so that
\[
n_0 = \sum_{j=1}^{k} a_j x_j, \qquad n_0 + 1 = \sum_{j=1}^{k} a_j x'_j
\]
both have solutions in nonnegative integers $x_j$, $x'_j$ (here we use Proposition 4.3). Let $\ell := \max\{\sum_{j=1}^{k} x_j, \sum_{j=1}^{k} x'_j\}$; the previous equalities show that $n_0$ and $n_0 + 1$ belong to $\ell A$ (also in case some $x_j$ or $x'_j$ is zero, because $0 \in A$ by hypothesis). Let $A' := (\ell A - n_0) \cap \mathbb{N}$ (i.e. the set $\{n' \in \mathbb{N} : n' + n_0 = a'_1 + \cdots + a'_\ell,\ a'_j \in A\ \forall j\}$). Then $1 \in A'$ and $d_{A'} = d_{\ell A} \ge d_A > 0$ (because $0 \in A$ implies that $A \subseteq \ell A$). Therefore $A'$ is a basis. Let $h_1$ be the order of $A'$. Then
\[
\mathbb{N} \subseteq h_1 A' \subseteq h_1(\ell A - n_0) = h_1 \ell A - h_1 n_0.
\]
This proves that every integer $\ge h_1 n_0$ is a sum of $h_1 \ell$ integers in $A$. □

As a corollary, from Theorem 3.6 we can finally deduce that

Corollary 4.2 $P$ is an asymptotic basis, i.e. there exists $h$ such that every integer large enough can be written as a sum of at most $h$ primes.

The original version of this result does not quantify explicitly the constant $h$. Later the constant has been estimated by several authors and now it is known that $h \le 4$ (see Table 1 here below).

Exercise. 4.4 1) Let A be an asymptotic basis and let h be its dimension. Prove that

\[
\#(A \cap [0,N]) \gg N^{1/h}, \qquad N \to \infty.
\]

2) Let P be an integer and let AP be the set of integers multiplicatively generated by primes dividing P , i.e.

AP := {n ∈ N: every prime p dividing n is a divisor of P }.

Prove that AP is not an asymptotic basis.

Hint: for the first point, let $R_h(n)$ be the number of representations of $n$ as a sum of $h$ elements of $A$, and prove that $(\#(A \cap [0,N]))^h \ge \sum_{n \le N} R_h(n) \ge N - c$, where $c$ is a constant (independent of $N$). For the second point, prove that $\#(A_P \cap [0,N]) \ll_P \ln^{\omega(P)} N$, where $\omega(P)$ is the number of prime divisors of $P$.

Exercise. 4.5 Let $a$, $b$ be integers, with $1 < a \le b$. The asymptotic density of the set
\[
R_{a,b} := \{n \in \mathbb{N} : \text{if } p^c \mid n \text{ with } c \ge a, \text{ then } p^b \mid n, \text{ for every prime } p\}
\]
has been computed in Ex. 3.9. What can we say about its Shnirelman density? How does the density depend on $a$ and $b$? Is there some monotonicity in these parameters? Is it possible to compute its value?

Remark: This is an exercise only in a broad sense: actually it would be qualified more properly as a Research Project, since some of these questions are not well understood. The case $b = \infty$ (i.e., for $a$-power free integers) has been quite extensively studied, but it is still open in several aspects. See Diananda–Subbarao, On the Shnirelman density of the k-free integers, Proc. Amer. Math. Soc. 62(1), 1976, 7–10, and Erdős–Hardy–Subbarao, On the Shnirelman density of k-free integers, Indian J. Math. 20(1), 1978, 45–56. Some non-trivial lower bounds for the Shnirelman density of $R_{a,b}$ may be deduced from a general result by Siva Rama Prasad–Bhramarambica, On the Shnirelman density of M-free integers, Fibonacci Quart. 27(4), 1989, 366–368.

Table 1. Table of records for Goldbach-type problems. The bounds concern the dimension as basis for integers and the dimension as asymptotic basis (i.e. as basis for integers large enough).

  Bound          Year   Author
  = 3?           1742   Goldbach-Euler (conjecture)
  exists         1933   Shnirelman
  ≤ 4            1937   Vinogradov⁶
  ≤ 6 · 10^9     1969   Klimov⁷
  ≤ 115          1972   Klimov, Pil’tjaĭ and Šeptickaja⁸
  ≤ 55           1975   Klimov⁹
  ≤ 27           1977   Vaughan¹⁰
  ≤ 26           1977   Deshouillers¹¹
  ≤ 19           1983   Riesel and Vaughan¹²
  ≤ 7            1995   Ramaré¹³
  ≤ 6            2012   Tao¹⁴
  ≤ 4            2013   Helfgott¹⁵

6 Representation of an odd number as a sum of three primes, C. R. (Dokl.) Acad. Sci. URSS, n. Ser. 15, 169–172, 1937, and Some theorems concerning the theory of primes, Rec. Math. Moscou, n. Ser. 2, 179–195, 1937.
7 Apropos the computations of Šnirel’man’s constant, Volž. Mat. Sb. Vyp. 7, 32–40, 1969.
8 An estimate of the absolute constant in the Goldbach-Šnirel’man problem, in Studies in number theory, No. 4, 35–51, Izdat. Saratov. Univ., Saratov, 1972.
9 Kuĭbyšev. Gos. Ped. Inst. Naučn. Trudy 158, 14–30, 1975.
10 On the estimation of Schnirelman’s constant, J. Reine Angew. Math. 290, 93–108, 1977.
11 Sur la constante de Šnirel’man, Séminaire Delange-Pisot-Poitou, 17e année: (1975/76), Théorie des nombres: Fasc. 2, Exp. No. G16, Paris, 1977.
12 On sums of primes, Ark. Mat. 21(1), 46–74, 1983.
13 On Šnirel’man’s constant, Ann. Scuola Norm. Sup. Cl. Sci. (4) 22(4), 645–706, 1995.
14 Every odd number greater than 1 is the sum of at most five primes, Math. Comp. 83(286), 997–1038, 2014.
15 The ternary Goldbach problem, preprint 2014, see http://arxiv.org/pdf/1404.2224.pdf.

Chapter 5

Waring’s problem

In a letter to Euler, Waring suggested the possibility that for every positive integer $k$ there exists an integer $h_k$ such that every integer $n$ can be written as a sum of at most $h_k$ $k$-th powers. With the language we have introduced in the previous chapter, Waring’s conjecture says that the set $G_k := \{n^k : n \in \mathbb{N}\}$ is an additive basis. This problem attracted the attention of several authors and some instances were proved with algebraic tools¹. The first complete proof of the conjecture is due to Hilbert; it was nevertheless a pure existence proof, without an effective way to determine (or even bound) the value of $h_k$. Later a considerable effort has been directed towards the determination of $h_k$, but a complete comprehension of the behavior of these constants as a function of $k$ is still lacking. This chapter is devoted to the proof of Waring’s conjecture in its weak formulation:

Theorem 5.1 (Waring’s conjecture) $G_k := \{n^k : n \in \mathbb{N}\}$ is an additive basis for every $k \in \mathbb{N}$.

The proof that we will reproduce here is due to Weyl, Linnik and Newman, and is probably the shortest one. Several steps are tailored to the specific problem, but in its general aspects (Hardy–Littlewood’s circle method, Farey arcs, approximations of exponential sums, etc.) it is a good introduction to general and widely used techniques in Analytic Number Theory. According to Theorem 4.1 we could attack the theorem by trying to prove that the Shnirelman density of $G_k$ is positive, and (since $1 \in G_k$) we could deduce this fact by proving that its lower density is positive. There is a difficulty in this argument: the density of $G_k$ is zero whenever $k \ge 2$! We have already faced a similar problem with the set of primes (Theorem 3.6), thus we know how we can overcome the difficulty: proving that there exists an integer $h$ such that $hG_k$ has a positive lower density. If we are able to prove it then we can conclude that its Shnirelman density is positive too (because 0 is a $k$-th power, therefore $1 \in hG_k$), and Theorem 5.1 follows from Theorem 4.1. Hence our true goal is the proof of the following proposition.

1 Lagrange proved that every integer can be written as a sum of four squares, using an identity (which we now recognize as the multiplicativity of the quaternionic norm) to reduce the problem to the representability of primes, and the structure of the solutions of quadratic equations modulo primes to conclude that every prime is a sum of four squares. Moreover, Hilbert’s first proof of the full Waring’s problem comes from a suitable polynomial identity: for a modern proof of this identity see Nesterenko, On a Hilbert identity, Mat. Zametki, 66(4), 1999, p. 527–532. His original approach is not constructive, but it can be refined, see Pollack, On Hilbert’s solution of Waring’s problem, Cent. Eur. J. Math., 9(2), 2011, p. 294–301. The elementary cases $k = 3, 4, 6, 8$ are also discussed in [HW], Ch. XXI.

Proposition 5.1 Let $\underline{d}_{h,k}$ be the lower density of $hG_k$. For every $k$ there exists $h$ such that $\underline{d}_{h,k} > 0$.

Analogously to our way to prove the positivity of the density for the sets $P + P$ and $P + 2^{\mathbb{N}}$ (i.e. Theorems 3.6 and 3.7), also Proposition 5.1 can be recovered from an upper bound for the number of representations of an integer $n$ as a sum of $k$-th powers. More explicitly, let $R_{h,k}(n)$ be the number of representations of $n$ as a sum of $h$ $k$-th powers; in a formula:
\[
R_{h,k}(n) := \#\{(n_1, \ldots, n_h) \in \mathbb{N}^h : n = n_1^k + \cdots + n_h^k\}.
\]
It is easy to prove that
\[
\sum_{n \le x} R_{h,k}(n) \gg_{h,k} x^{h/k}. \tag{5.1}
\]

Proof. In fact,
\[
\sum_{n \le x} R_{h,k}(n) = \sum_{n \le x} \sum_{\substack{n_1, \ldots, n_h \in \mathbb{N} \\ n_1^k + \cdots + n_h^k = n}} 1 = \sum_{\substack{n_1, \ldots, n_h \in \mathbb{N} \\ n_1^k + \cdots + n_h^k \le x}} 1 \ge \sum_{\substack{n_1, \ldots, n_h \in \mathbb{N} \\ n_j \le (x/h)^{1/k}\ \forall j}} 1 \gg (x/h)^{h/k}.
\]
□

Suppose we know the following bound.

Theorem 5.2 (Linnik) Let $h$, $k$ be fixed with $h \ge 6k2^k$, then $R_{h,k}(n) \ll_{h,k} n^{h/k - 1}$.

Let
\[
S_{h,k}(x) := \#\{n \le x : R_{h,k}(n) \ge 1\} = \#\{n \le x : \exists (n_1, \ldots, n_h) \in \mathbb{N}^h \text{ with } n = n_1^k + \cdots + n_h^k\}.
\]
Then, under the assumption that $h \ge 6k2^k$, we have the lower bound
\[
x^{h/k - 1} S_{h,k}(x) \overset{(1)}{\gg}_{h,k} \sum_{n \le x} n^{h/k - 1}\, \delta_{\{n \text{ is a sum of } h\ k\text{-th powers}\}} \overset{(2)}{\gg}_{h,k} \sum_{n \le x} R_{h,k}(n) \overset{(3)}{\gg}_{h,k} x^{h/k},
\]
where we have used in (1) the definition of $S_{h,k}(x)$ (and the positivity of the exponent $h/k - 1$, to write $x^{h/k-1} \ge n^{h/k-1}$), Theorem 5.2 in (2) and (5.1) in (3). This proves that
\[
S_{h,k}(x) \gg_{h,k} x,
\]
i.e. that $hG_k$ has a positive lower density when $h \ge 6k2^k$. Thus, our goal now is the proof of Theorem 5.2. We split the proof in two major steps: the first one is the proof of a certain cancellation in certain exponential sums (Theorem 5.3 here below), then we use the representation of $R_{h,k}(n)$ as a complex integral over the interval $[0,1)$ and a decomposition of $[0,1)$ in Farey arcs (a typical tool of Analytic Number Theory), to deduce the claim from a bound for the integrand in each arc.
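The quantities $R_{h,k}(n)$ and $S_{h,k}(x)$ can be tabulated directly for small parameters. The sketch below is only an illustration: the function names are hypothetical, and it assumes that the summands range over the nonnegative integers (consistently with the remark that 0 is a $k$-th power) and that representations are counted as ordered $h$-tuples.

```python
def representation_counts(h, k, x):
    """Return a list R with R[n] = number of ordered h-tuples of nonnegative
    integers (n_1, ..., n_h) such that n_1^k + ... + n_h^k == n, for 0 <= n <= x."""
    powers = []
    m = 0
    while m ** k <= x:
        powers.append(m ** k)
        m += 1
    counts = [0] * (x + 1)
    counts[0] = 1                      # the empty sum, before adding any summand
    for _ in range(h):                 # add one k-th power at a time
        new = [0] * (x + 1)
        for n, c in enumerate(counts):
            if c == 0:
                continue
            for p in powers:           # powers is increasing, so we can stop early
                if n + p > x:
                    break
                new[n + p] += c
        counts = new
    return counts


def representable_count(h, k, x):
    """S_{h,k}(x): how many 1 <= n <= x have at least one representation."""
    counts = representation_counts(h, k, x)
    return sum(1 for n in range(1, x + 1) if counts[n] >= 1)


print(representable_count(2, 2, 1000), representable_count(4, 2, 1000))
print(sum(representation_counts(4, 3, 1000)))
```

Comparing `representable_count(h, k, x)` with `x` for growing `x` gives a rough feeling for whether $hG_k$ has positive lower density, although of course no finite computation proves it.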

5.1. First step: cancellation in exponential sums

Let $f(x) = \alpha x^k + \cdots$ be a polynomial with real coefficients, and suppose $\alpha \ne 0$, so that its degree is $k$. Let
\[
S_f(N) := \sum_{n=1}^{N} e(f(n)),
\]
where $e(x) := \exp(2\pi i x)$. The absolute value of every summand is one, hence $|S_f| \le N$, and this holds as an equality when $f$ is a constant. We need a result proving that $S_f$ is actually smaller as soon as the degree of $f$ is positive. The proof will be by induction on the degree, so we first need a result describing what happens for linear polynomials.

Lemma 5.1 Let $N \in \mathbb{N}$, $\alpha, \beta \in \mathbb{R}$. Then
\[
\Big| \sum_{n=1}^{N} e(\alpha n + \beta) \Big| \le \min\{N, \|\alpha\|^{-1}\},
\]
where $\|\alpha\| := \min\{\{\alpha\}, 1 - \{\alpha\}\}$ is the distance of $\alpha$ from the closest integer.

Proof. The bound $\le N$ is trivial and the inequality is meaningful only when $\|\alpha\| \ge 1/N$. In particular, we can assume that $\alpha$ is not an integer. In this case
\[
\sum_{n=1}^{N} e(\alpha n + \beta) = e(\alpha + \beta)\, \frac{e^{2\pi i \alpha N} - 1}{e^{2\pi i \alpha} - 1},
\]
so that
\[
\Big| \sum_{n=1}^{N} e(\alpha n + \beta) \Big| = \frac{|e^{2\pi i \alpha N} - 1|}{|e^{2\pi i \alpha} - 1|} = \frac{|\sin(\pi \alpha N)|}{|\sin(\pi \alpha)|} \le \frac{1}{|\sin(\pi \alpha)|}.
\]
The function $\alpha \mapsto |\sin(\pi\alpha)|$ is 1-periodic and even, therefore $|\sin(\pi\alpha)| = |\sin(\pi\|\alpha\|)| = \sin(\pi\|\alpha\|)$. Note that $\|\alpha\| \in [0, 1/2]$, by its definition; in this range the concavity of the sine map shows that $\sin(\pi\|\alpha\|) \ge 2\|\alpha\|$, so that the inequality in the claim follows. □

Now we can state the next general result.
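Lemma 5.1 is easy to test numerically. The sketch below (with hypothetical helper names) evaluates the linear exponential sum directly and compares it with $\min\{N, 1/(2\|\alpha\|)\}$, which is the slightly stronger bound that the proof actually yields; the comparison is done in floating point, so a small tolerance is needed.

```python
import cmath
import math


def e(x):
    """e(x) = exp(2*pi*i*x)."""
    return cmath.exp(2j * math.pi * x)


def dist_to_nearest_integer(alpha):
    """||alpha||: the distance from alpha to the closest integer."""
    frac = alpha - math.floor(alpha)
    return min(frac, 1 - frac)


def linear_sum(alpha, beta, N):
    """sum_{n=1}^{N} e(alpha*n + beta), computed term by term."""
    return sum(e(alpha * n + beta) for n in range(1, N + 1))


def lemma_bound(alpha, N):
    """min{N, 1/(2*||alpha||)}, the bound given by the proof of Lemma 5.1."""
    d = dist_to_nearest_integer(alpha)
    return N if d == 0 else min(N, 1 / (2 * d))


for alpha in (0.1, 0.25, math.sqrt(2), 0.999):
    print(alpha, abs(linear_sum(alpha, 0.3, 200)), lemma_bound(alpha, 200))
```

The values of `alpha` and `beta` above are arbitrary test inputs; the phase `beta` does not affect the absolute value of the sum.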

Proposition 5.2 Let $f = \alpha x^k + \cdots$ be a polynomial with degree $k$ and real coefficients. Then
\[
S_f(N) \ll_k N \Big( N^{-k} \sum_{\substack{d_1, \ldots, d_{k-1} \\ -N \le d_j \le N}} \min\{N, \|\alpha k!\, d_1 \cdots d_{k-1}\|^{-1}\} \Big)^{1/K}, \tag{5.2}
\]
where $K := 2^{k-1}$ and the implied constant depends only on $k$ (in particular it is independent of all other terms in $f$).

Note that once the proposition is proved for a given $k$, then it can be immediately generalized to the analogous claim for the same $k$ but where the sum in $n$ runs over any set of $N$ consecutive integers: in fact, the shift $n \mapsto n - n_0$, where $n_0$ is any integer, simply changes $f(x)$ to $f(x - n_0)$, which differs from $f(x)$ only in terms of degree strictly lower than $k$. Moreover, the function appearing on the right-hand side is increasing in $N$: this implies that the claim can be further extended to sums over at most $N$ consecutive integers. This remark will be used in the inductive step of the proof.

Proof. When $k = 1$ the claim is intended as $S_f \ll \min\{N, \|\alpha\|^{-1}\}$ and holds by Lemma 5.1. Assume $k > 1$. We have
\[
|S_f(N)|^2 = S_f(N) \cdot \overline{S_f(N)} = \sum_{1 \le n, m \le N} e(f(m) - f(n))
\]
and reordering the terms we deduce that
\[
|S_f|^2 \le \sum_{|d_{k-1}| \le N} \Big| \sum_{\substack{1 \le n \le N \\ 1 \le n + d_{k-1} \le N}} e(f(n + d_{k-1}) - f(n)) \Big|.
\]
The inner term is again an exponential sum over a set of consecutive indices which contains $N$ terms at most, for a new polynomial $f(n + d_{k-1}) - f(n) = \alpha k d_{k-1} n^{k-1} + \cdots$ whose degree is one less than the previous one: we have the opportunity to trigger an induction here (this remark is due to Weyl). By inductive hypothesis we deduce
\[
|S_f|^2 \ll_k \sum_{|d_{k-1}| \le N} N \Big( N^{1-k} \sum_{\substack{d_1, \ldots, d_{k-2} \\ -N \le d_j \le N}} \min\{N, \|\alpha k d_{k-1} (k-1)!\, d_1 \cdots d_{k-2}\|^{-1}\} \Big)^{2/K}
\]
\[
= N \sum_{|d_{k-1}| \le N} \Big( N^{1-k} \sum_{\substack{d_1, \ldots, d_{k-2} \\ -N \le d_j \le N}} \min\{N, \|\alpha k!\, d_1 \cdots d_{k-1}\|^{-1}\} \Big)^{2/K}.
\]
Using the inequality $\sum_{|d| \le N} |a_d|^{2/K} \ll N^{1 - 2/K} \big( \sum_{|d| \le N} |a_d| \big)^{2/K}$ (which is evident for $K = 2$, while for $K > 2$ it follows from the $(p,q)$-Hölder inequality with $p := K/(K-2)$ and $q := K/2$) we get
\[
|S_f|^2 \ll_k N \cdot N^{1 - 2/K} \Big( \sum_{|d_{k-1}| \le N} N^{1-k} \sum_{\substack{d_1, \ldots, d_{k-2} \\ -N \le d_j \le N}} \min\{N, \|\alpha k!\, d_1 \cdots d_{k-1}\|^{-1}\} \Big)^{2/K}
\]
\[
= N^2 \Big( N^{-k} \sum_{\substack{d_1, \ldots, d_{k-1} \\ -N \le d_j \le N}} \min\{N, \|\alpha k!\, d_1 \cdots d_{k-1}\|^{-1}\} \Big)^{2/K},
\]
which gives the claim when a square root is taken. □

Now we are able to prove our first tool, which is due to Weyl.

Theorem 5.3 (Weyl) Let $f(x) = a x^k + \cdots$ be a polynomial of degree $k$ with real coefficients. Let the maximal degree coefficient $a$ be an integer coprime to the positive integer $q$. Then
\[
S_{f/q}(N) := \sum_{n=1}^{N} e\Big( \frac{f(n)}{q} \Big) \ll_{k,\varepsilon} (qN)^{\varepsilon} N \Big( \frac{1}{q} + \frac{1}{N} + \frac{q}{N^k} \Big)^{1/K}, \tag{5.3}
\]
where $K := 2^{k-1}$ and the implied constant depends only on $k$ and $\varepsilon$ (in particular it is independent of $a$ and of all other terms in $f$).

This result shows that the sum is considerably lower than $N$ (in absolute value) when $q$ is large with respect to $N$ but not too large: the cancellation appears when $q = q(N)$ diverges while remaining $o(N^k)$. At last, we notice that for $k \ge 2$ (5.3) may be written also as
\[
S_{f/q}(N) \ll_{k,\varepsilon} (qN)^{\varepsilon} N \cdot
\begin{cases}
q^{-1/K} & \text{if } q \le N, \\
N^{-1/K} & \text{if } N \le q \le N^{k-1}, \\
(q/N^k)^{1/K} & \text{if } N^{k-1} \le q.
\end{cases}
\]

Proof. By Proposition 5.2 we have
\[
S_{f/q}(N) \ll_k N \Big( N^{-k} \sum_{\substack{d_1, \ldots, d_{k-1} \\ -N \le d_j \le N}} \min\Big\{N, \Big\| \frac{a}{q} k!\, d_1 \cdots d_{k-1} \Big\|^{-1} \Big\} \Big)^{1/K}.
\]
The contribution to the sum of the terms with $d_1 \cdots d_{k-1} = 0$ is $\ll_k N^{k-2} \cdot N$ (because one of the $d_j$ is zero, the other $k - 2$ run in a set with $2N + 1$ terms producing $\le k(2N+1)^{k-2}$ elements, each one contributing to the sum by $N$ at most), thus
\[
S_{f/q}(N) \ll_k N \Big( N^{-1} + N^{-k} \sum_{\substack{d_1, \ldots, d_{k-1} \\ 1 \le d_j \le N}} \min\Big\{N, \Big\| \frac{a}{q} k!\, d_1 \cdots d_{k-1} \Big\|^{-1} \Big\} \Big)^{1/K}.
\]

Let
\[
\tau_\ell(m) := \sum_{\substack{d_1, \ldots, d_\ell \\ d_1 \cdots d_\ell = m}} 1,
\]
i.e. the number of ways we can write $m$ as a product of $\ell$ positive integers. Then²
\[
\tau_\ell(m) \ll_{\ell,\varepsilon} m^{\varepsilon}.
\]

Proof. In fact, $\tau_\ell$ can also be recursively defined as: $\tau_1 := 1$, and $\tau_\ell := \tau_{\ell-1} * 1$ when $\ell > 1$, and the claim can be proved by induction on the index $\ell$. The claim is evident for $\tau_1$. Let $m = \prod_j p_j^{\nu_j}$ be the decomposition of $m$ as a product of primes. Then $\tau_2(m) = \sum_{u \mid m} 1 = \prod_j (1 + \nu_j)$ (a fact which can be verified for example using the multiplicativity). Fix $\varepsilon > 0$. Then
\[
\frac{\tau_2(m)}{m^{\varepsilon}} = \prod_j \frac{1 + \nu_j}{p_j^{\varepsilon\nu_j}}.
\]
The factor $\frac{1+\nu}{p^{\varepsilon\nu}}$ is greater than 1 only for a finite set of primes $p$ and exponents $\nu$. In fact, $1 + \nu \ge p^{\varepsilon\nu}$ forces $p^{\varepsilon} \le (1 + \nu)^{1/\nu} \le 2$, and $1 + \nu \ge p^{\varepsilon\nu} \ge 2^{\varepsilon\nu}$ forces $\nu \le \nu_0(\varepsilon)$. Hence the right-hand side in the previous equality is bounded by a constant which depends on $\varepsilon$ but is independent of $m$. This shows that $\tau_2(m)/m^{\varepsilon} \ll_{\varepsilon} 1$, which is the claim for $\tau_2$. For the generic $\tau_\ell$ it is sufficient to remark that
\[
\tau_\ell(m) = \sum_{d \mid m} \tau_{\ell-1}(d) \ll_{\ell,\varepsilon} \sum_{d \mid m} d^{\varepsilon} \ll_{\ell,\varepsilon} m^{\varepsilon} \tau_2(m).
\]
□

Therefore we get
\[
S_{f/q}(N) \ll_{k,\varepsilon} N^{1+\varepsilon} \Big( N^{-1} + N^{-k} \sum_{m \le N^{k-1}} \min\Big\{N, \Big\| \frac{a}{q} k!\, m \Big\|^{-1} \Big\} \Big)^{1/K}.
\]
Extending the range for $m$ to $k! N^{k-1}$ it becomes
\[
S_{f/q}(N) \ll_{k,\varepsilon} N^{1+\varepsilon} \Big( N^{-1} + N^{-k} \sum_{m \le k! N^{k-1}} \min\{N, \|am/q\|^{-1}\} \Big)^{1/K}.
\]
Thus the claim follows immediately from the bound
\[
\sum_{m \le M} \min\{N, \|am/q\|^{-1}\} \ll_{\varepsilon} q^{\varepsilon} (N + q + MN/q + M) \qquad \forall M. \tag{5.4}
\]
We split the sum in blocks of $q$ consecutive integers: there are $\lfloor M/q \rfloor$ such blocks, plus possibly one more which is incomplete. All complete blocks contribute the same to the sum (by $q$-periodicity of the summand), and the contribution of the incomplete block is not larger than that of a full block. As a consequence, it is sufficient to bound the contribution of the first block, corresponding to $m = 1, \ldots, q$: the final bound will be produced multiplying this bound by $1 + \lfloor M/q \rfloor$. The fraction $am/q$ is an integer only for $m = q$, because $a$ and $q$ are coprime,

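and in this case $\min\{N, \|am/q\|^{-1}\} = N$. For every other $m$ we can bound the minimum with $\|am/q\|^{-1}$, thus
\[
\sum_{m=1}^{q} \min\{N, \|am/q\|^{-1}\} \le N + \sum_{\substack{m=1 \\ q \nmid m}}^{q} \|am/q\|^{-1}.
\]
We partition further the sum according to the values of $(m, q)$, the greatest common divisor of $m$ and $q$. Thus, let $u$ be a divisor of $q$; we have $(m, q) = u$ if and only if $m = um'$ with $(m', q/u) = 1$, and $m \le q$ if and only if $m' \le q/u$, thus we have
\[
\sum_{m=1}^{q} \min\{N, \|am/q\|^{-1}\} \le N + \sum_{\substack{u \mid q \\ u < q}} \sum_{\substack{m' \le q/u \\ (m', q/u) = 1}} \|am'/(q/u)\|^{-1}.
\]
For each $u$, as $m'$ runs over the reduced residues modulo $q/u$ the product $am'$ runs over distinct nonzero residues modulo $q/u$, so that the inner sum is at most $\sum_{r=1}^{q/u - 1} \|r/(q/u)\|^{-1} \le 2 \sum_{1 \le r \le q/(2u)} \frac{q/u}{r} \ll \frac{q}{u}(1 + \ln q)$. Summing over the divisors $u$ of $q$ and using $\tau_2(q) \ll_{\varepsilon} q^{\varepsilon}$, the contribution of a full block is $\ll_{\varepsilon} N + q^{1+\varepsilon}$. Multiplying by the number of blocks we get
\[
\sum_{m \le M} \min\{N, \|am/q\|^{-1}\} \ll_{\varepsilon} (1 + M/q)(N + q^{1+\varepsilon}),
\]
which is equivalent to (5.4). □

2 Actually better bounds are known. See [HW] Th. 317 p. 262.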

5.2. Second step: integral representation

Let $\beta \in \mathbb{Z}$. The integral $\int_0^1 e(\alpha\beta)\, d\alpha$ detects when $\beta = 0$, since it is 1 when $\beta = 0$ and 0 otherwise. We can use this remark to give an integral representation of $R_{h,k}(n)$. Let $f(\alpha) := \sum_{m=0}^{N} e(\alpha m^k)$ for a suitable integer $N$, then the previous computation shows that
\[
\int_0^1 f(\alpha)^h e(-\alpha n)\, d\alpha = \sum_{\substack{m_1, \ldots, m_h \\ 0 \le m_j \le N}} \int_0^1 e(\alpha(m_1^k + \cdots + m_h^k - n))\, d\alpha = \#\{(m_1, \ldots, m_h) \in \mathbb{N}^h : m_j \le N\ \forall j,\ m_1^k + \cdots + m_h^k = n\}.
\]
The condition $m_j \ge 0$ for every $j$ forces each $m_j$ satisfying the equation $m_1^k + \cdots + m_h^k = n$ to be at most $n^{1/k}$, therefore the full set of representations of $n$ is computed by the integral as soon as $N \ge n^{1/k}$, i.e.
\[
R_{h,k}(n) = \int_0^1 f(\alpha)^h e(-\alpha n)\, d\alpha
\]
whenever $N \ge n^{1/k}$. As a consequence, Linnik’s result (Theorem 5.2) is equivalent to the claim
\[
\int_0^1 f(\alpha)^h e(-\alpha n)\, d\alpha \ll_{h,k} N^{h-k} \tag{5.5}
\]
for $N = n^{1/k}$. The simple upper bound
\[
\Big| \int_0^1 f(\alpha)^h e(-\alpha n)\, d\alpha \Big| \le \|f^h\|_{\infty} \cdot 1 = \|f\|_{\infty}^h
\]
is not strong enough, since $\|f\|_{\infty} = f(0) = N + 1$, so that it simply gives $\ll N^h$, while we need $\ll N^{h-k}$. We could try to get around this difficulty by splitting $[0,1)$ into two regions: a (small) interval $I$ containing 0 and its complementary set $I^c := [0,1) \setminus I$, estimating the part of the integral which is in $I$ as $N^h \cdot \mu(I)$ (which is small, when the measure of $I$ is small enough), and estimating the part of the integral which is outside $I$ with $\|f\|^h_{\infty, I^c}$, in the hope that the sup of $f$ in $I^c$ is significantly smaller than $N$. Unfortunately this is not true, essentially because the $k$-th powers of integers are unevenly distributed modulo $q$. For example, only 0 and 1 are squares modulo 3, and $m^2$ is 0 (mod 3) when $3 \mid m$, and equals 1 (mod 3) otherwise. Thus if we set $k = 2$ then
\[
f\Big(\frac{1}{3}\Big) = \sum_{m=0}^{N} e(m^2/3) = \frac{N}{3} e(0) + \frac{2N}{3} e(1/3) + O(1) = \frac{iN}{\sqrt{3}} + O(1).
\]
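The size of $f$ at a rational point with small denominator can be checked by direct summation. The following sketch (the helper names are hypothetical) evaluates $f(a/q) = \sum_{m=0}^{N} e(a m^k/q)$, reducing $a m^k$ modulo $q$ exactly before calling the exponential so that floating-point errors do not grow with $m$, and compares the case $k = 2$, $a/q = 1/3$ with the value $iN/\sqrt{3}$ computed above.

```python
import cmath
import math


def e(x):
    """e(x) = exp(2*pi*i*x)."""
    return cmath.exp(2j * math.pi * x)


def power_sum(a, q, k, N):
    """f(a/q) = sum_{m=0}^{N} e(a*m^k/q); a*m^k is first reduced mod q,
    which does not change e(.) because e has period 1."""
    return sum(e(((a * pow(m, k, q)) % q) / q) for m in range(N + 1))


N = 3000
value = power_sum(1, 3, 2, N)
print(value, 1j * N / math.sqrt(3))        # the two differ by O(1)
print(abs(power_sum(1, 7, 3, N)) / N)      # k = 3, q = 7: again of order N
```

The second line of output concerns the example $k = 3$, $q = 7$ suggested in the text: the ratio $|f(1/7)|/N$ stays bounded away from zero, so there is no cancellation near this fraction either.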

The same phenomenon appears for every $q$ for which $k$ and $\varphi(q)$ are not coprime (for example, try with $k = 3$ and $q = 7$). As a consequence, in order to bound the integral in $[0,1)$ we need to split its domain into a convenient set of arcs (called Farey arcs) centered around certain fractions (see Equation (5.6) here below), then we will be able to prove that the integrand is small in size in each arc by Weyl’s result. We start with a second lemma which we need in order to check that the collection of Farey arcs actually covers the full interval $[0,1)$.

Lemma 5.2 (Dirichlet approximation lemma) Let $Q$ be an arbitrary positive integer. For every $\alpha \in [0,1)$ there are integers $p$, $q$ with $0 < q \le Q$ such that
\[
\Big| \alpha - \frac{p}{q} \Big| \le \frac{1}{qQ}.
\]

Proof. Let $A := \{\{j\alpha\} : j = 1, \ldots, Q + 1\}$. The elements in $A$ are in $[0,1)$, by definition of fractional part, and are $Q + 1$ in number³. Hence (by the pigeon-hole principle) two of them are less than $1/Q$ apart, i.e. there are two indices $j_1 < j_2$ such that $|\{j_2\alpha\} - \{j_1\alpha\}| \le 1/Q$. Let $p := \lfloor j_2\alpha \rfloor - \lfloor j_1\alpha \rfloor$ and $q := j_2 - j_1$. Note that $q \le Q$. Then
\[
|q\alpha - p| = |j_2\alpha - j_1\alpha - \lfloor j_2\alpha \rfloor + \lfloor j_1\alpha \rfloor| = |\{j_2\alpha\} - \{j_1\alpha\}| \le \frac{1}{Q},
\]
which is the claim. □

3 Some of them could coincide when $\alpha$ is rational.

Note that the integers $p$ and $q$ appearing in the previous lemma can be taken coprime. In fact, let $p'$, $q'$ be coprime integers with $p'/q' = p/q$. Then evidently $q' \le q \le Q$, so that
\[
\Big| \alpha - \frac{p'}{q'} \Big| = \Big| \alpha - \frac{p}{q} \Big| \le \frac{1}{qQ} \le \frac{1}{q'Q}.
\]

Let $k \ge 2$ and let $\nu$ be a positive ‘small’ parameter that we will set later (our choice will be any $\nu \in (0, 1/3]$, with $\nu = 1/3$ producing our best result). For every pair $0 \le a < q \le N^{k-\nu}$ of coprime integers and for every $j \in \mathbb{N}$ let $I_j(q,a)$ be the set
\[
I_j(q,a) := \Big\{ \alpha \in [0,1) : \Big| \alpha - \frac{a}{q} \Big| \le \frac{1}{qN^{k-\nu}},\ \Big| \alpha - \frac{a}{q} \Big| \in \Big[ \frac{j}{N^k}, \frac{j+1}{N^k} \Big) \Big\}.
\]

Essentially, $I_j(q,a)$ is the set of real numbers which we consider as well approximated by the rational number $a/q$. For fixed coprime $a$ and $q$, we have
\[
\bigcup_j I_j(q,a) = \Big\{ \alpha \in [0,1) : \Big| \alpha - \frac{a}{q} \Big| \le \frac{1}{qN^{k-\nu}} \Big\};
\]
therefore, from the Approximation Lemma 5.2 (with $Q = N^{k-\nu}$) we get that
\[
[0,1) \subseteq \bigcup_{q \le N^{k-\nu}} \bigcup_{\substack{a = 0 \\ (a,q) = 1}}^{q-1} \bigcup_{j=0}^{N^{\nu}/q} I_j(q,a), \tag{5.6}
\]
where $1 \le q \le N^{k-\nu}$, and $j \le N^{\nu}/q$ (otherwise $I_j(q,a)$ is empty). The next three lemmas give a bound for the value of $f(\alpha)$ in $I_j(q,a)$ according to the range of $q$.
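Before turning to those lemmas, the approximation step behind the covering (5.6) can be checked on concrete inputs. The sketch below is an illustration only: the function name `dirichlet_approximation` is hypothetical, and instead of the pigeon-hole argument of the proof it simply scans the denominators $q = 1, \ldots, Q$ (Lemma 5.2 guarantees that the scan succeeds). Exact rational arithmetic is used, so $\alpha$ must be given as a fraction.

```python
from fractions import Fraction
from math import gcd


def dirichlet_approximation(alpha, Q):
    """Return coprime integers (p, q) with 0 < q <= Q and |alpha - p/q| <= 1/(q*Q).

    alpha is converted to a Fraction so that every comparison is exact.
    """
    alpha = Fraction(alpha)
    for q in range(1, Q + 1):
        p = round(q * alpha)             # nearest integer to q*alpha
        if abs(q * alpha - p) <= Fraction(1, Q):
            g = gcd(p, q)                # reduce; the bound survives, as noted above
            return p // g, q // g
    raise AssertionError("unreachable: excluded by Lemma 5.2")


alpha = Fraction(314159, 1000000)
for Q in (5, 50, 1000):
    p, q = dirichlet_approximation(alpha, Q)
    print(Q, (p, q), float(abs(alpha - Fraction(p, q))), 1 / (q * Q))
```

Returning the first admissible denominator is a design choice made for simplicity; it tends to produce small $q$, which is also the regime where the arcs $I_j(q,a)$ are widest.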

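Lemma 5.3 Let $\nu \in (0, 1/3]$. Let $\alpha \in I_j(q, a)$. If $q \le N$, then
\[ f(\alpha) \ll_k \frac{N}{q^{1/2^k}(j + 1)^{1/k}}. \]

Proof. We split the proof into two cases.

Case 1: small $q$. Suppose $q < N^{2\nu}$. Let
\[ A := \frac{1}{q}\sum_{m=1}^{q} e\Big(\frac{a}{q}m^k\Big). \]
Its definition immediately implies that $|A| \le 1$, while from Theorem 5.3 we have
\[ A \ll_{k,\varepsilon} \frac{1}{q}\, q\,(q \cdot q)^{\varepsilon}\, q^{-2^{1-k}}, \]
which for $\varepsilon = 2^{-k-1}$ becomes
\[ A \ll_k q^{-1/2^k}. \tag{5.7} \]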
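To get a feeling for (5.7), the normalized complete sum $A$ can be evaluated numerically for small moduli. The sketch below is only an illustration (the names `e` and `normalized_complete_sum` are chosen here, not taken from the notes); it reduces $m^k$ modulo $q$ before calling $e(\cdot)$, which is harmless because $e$ is 1-periodic. Note that (5.7) holds up to an unspecified constant depending on $k$, so the printed comparison is indicative and not a verification.

```python
import cmath


def e(x: float) -> complex:
    """e(x) = exp(2*pi*i*x)."""
    return cmath.exp(2j * cmath.pi * x)


def normalized_complete_sum(a: int, q: int, k: int) -> complex:
    """A = (1/q) * sum_{m=1}^{q} e(a*m^k/q), with a*m^k reduced modulo q."""
    return sum(e((a * pow(m, k, q)) % q / q) for m in range(1, q + 1)) / q


if __name__ == "__main__":
    k = 3
    for q in (5, 7, 9, 13, 31):
        A = normalized_complete_sum(1, q, k)
        print(f"k={k} q={q:3d}  |A|={abs(A):.4f}  q^(-1/2^k)={q ** (-1 / 2 ** k):.4f}")
```

For a prime $q$ with $k$ coprime to $q - 1$ the map $m \mapsto m^k$ permutes the residues modulo $q$, so $A$ vanishes (for instance $k = 3$, $q = 5$), while for $k = 3$, $q = 7$ it does not.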

We write $f(\alpha)$ as $1 + F_1(\alpha) + AF_2(\alpha)$, with
\[ F_1(\alpha) := \sum_{m=1}^{N}\Big(e\Big(\frac{a}{q}m^k\Big) - A\Big)\cdot e((\alpha - a/q)m^k), \qquad F_2(\alpha) := \sum_{m=1}^{N} e((\alpha - a/q)m^k). \]
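The decomposition is an exact identity, obtained by writing $e(\alpha m^k) = e(am^k/q)\,e((\alpha - a/q)m^k)$ and adding and subtracting $A$. A quick numerical sanity check, assuming (as the leading term $1$ suggests) that $f(\alpha) = \sum_{m=0}^{N} e(\alpha m^k)$; the helper names are illustrative only.

```python
import cmath


def e(x: float) -> complex:
    """e(x) = exp(2*pi*i*x)."""
    return cmath.exp(2j * cmath.pi * x)


def weyl_sum(alpha: float, N: int, k: int) -> complex:
    """f(alpha) = sum_{m=0}^{N} e(alpha*m^k) (assumed definition of f)."""
    return sum(e(alpha * m ** k) for m in range(0, N + 1))


def decomposition(alpha: float, a: int, q: int, N: int, k: int):
    """Return (A, F1, F2) as defined in the text, with beta = alpha - a/q."""
    beta = alpha - a / q

    def phase(m: int) -> complex:
        # e(a*m^k/q), computed from the residue of a*m^k modulo q
        return e((a * pow(m, k, q)) % q / q)

    A = sum(phase(m) for m in range(1, q + 1)) / q
    F1 = sum((phase(m) - A) * e(beta * m ** k) for m in range(1, N + 1))
    F2 = sum(e(beta * m ** k) for m in range(1, N + 1))
    return A, F1, F2


if __name__ == "__main__":
    alpha, a, q, N, k = 2 / 7 + 1e-5, 2, 7, 30, 3
    A, F1, F2 = decomposition(alpha, a, q, N, k)
    print(abs(weyl_sum(alpha, N, k) - (1 + F1 + A * F2)))  # rounding error only
```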

Bound for $F_1$. The term $F_1$ is $\sum_{m=1}^{N} a_m g(m)$ with
\[ a_m := e\Big(\frac{a}{q}m^k\Big) - A, \qquad g(m) := e((\alpha - a/q)m^k). \]
Let $S(x) := \sum_{1 \le m \le x} a_m$, with $S(0) := 0$. The sequence $a_m$ is $q$-periodic and $S(q) = S(2q) = \cdots = 0$ (because $S(q) = \sum_{m=1}^{q} e(am^k/q) - qA = 0$), therefore $S(x)$ is $q$-periodic too. In particular it is bounded by the maximum that it assumes for $x \in [0, q]$, which is $\le 2q$ (since for $x \in [0, q]$ there are at most $q$ terms and each addend has absolute value $\le 2$). Therefore $\|S(\cdot)\|_\infty \le 2q$. By partial summation (Formula (1.4)) we have
\[ F_1 = S(N)g(N) - \int_1^N S(x)g'(x)\,dx, \]
so that
\[ |F_1| \le \|S(\cdot)\|_\infty + \|S(\cdot)\|_\infty\int_1^N |g'(x)|\,dx \le 2q + 2q\int_1^N 2\pi|\alpha - a/q|\,kx^{k-1}\,dx \ll q\Big(1 + N^k\cdot\frac{1}{qN^{k-\nu}}\Big) = q + N^{\nu} \ll N^{2\nu}. \]
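The two facts used above, namely that $S$ vanishes at multiples of $q$ and that $|F_1| \le 2q + 2q\cdot 2\pi|\alpha - a/q|(N^k - 1)$ (the explicit value of the integral of $|g'|$), are easy to test numerically. A small sketch under the same notation; the function names are hypothetical helpers.

```python
import cmath
from math import pi


def e(x: float) -> complex:
    """e(x) = exp(2*pi*i*x)."""
    return cmath.exp(2j * cmath.pi * x)


def coefficients(a: int, q: int, k: int, count: int) -> list:
    """a_m = e(a*m^k/q) - A for m = 1, ..., count (a q-periodic sequence)."""
    phases = [e((a * pow(m, k, q)) % q / q) for m in range(1, q + 1)]
    A = sum(phases) / q
    return [phases[(m - 1) % q] - A for m in range(1, count + 1)]


def partial_sums(coeffs: list) -> list:
    """[S(1), S(2), ...] where S(x) is the sum of a_m over 1 <= m <= x."""
    out, s = [], 0
    for c in coeffs:
        s += c
        out.append(s)
    return out


def F1_and_bound(alpha: float, a: int, q: int, N: int, k: int):
    """Return F1 and the explicit bound 2q + 2q * 2*pi*|beta|*(N^k - 1)."""
    beta = alpha - a / q
    am = coefficients(a, q, k, N)
    F1 = sum(c * e(beta * m ** k) for m, c in enumerate(am, start=1))
    bound = 2 * q + 2 * q * 2 * pi * abs(beta) * (N ** k - 1)
    return F1, bound


if __name__ == "__main__":
    S = partial_sums(coefficients(2, 7, 3, 21))
    print([round(abs(S[i]), 12) for i in (6, 13, 20)])  # S(q), S(2q), S(3q)
    print(F1_and_bound(2 / 7 + 1e-5, 2, 7, 30, 3))
```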

Bound for $F_2$. From the Euler–Maclaurin formula (1.5) it follows that
\[ F_2 = \int_1^N g(x)\,dx + O(1) + O\Big(\int_1^N |g'(x)|\,dx\Big). \]
Setting $v(\beta) := \int_1^N e(\beta x^k)\,dx$ and recalling that above we have proved that $\int_1^N |g'(x)|\,dx \ll N^{\nu}/q \ll N^{\nu}$, we conclude that
\[ F_2 = v(\alpha - a/q) + O(N^{\nu}). \]
Consequently
\[ f(\alpha) = 1 + F_1 + AF_2 = 1 + O(N^{2\nu}) + Av(\alpha - a/q) + O(|A|N^{\nu}) = Av(\alpha - a/q) + O(N^{2\nu}) \]
so that (by (5.7))
\[ |f(\alpha)| \ll_k N^{2\nu} + q^{-1/2^k}|v(\alpha - a/q)|. \tag{5.8} \]
We need a bound for $v(\alpha - a/q)$. Trivially, we see that $|v(\alpha - a/q)| \le N$: this fact already proves that
\[ |f(\alpha)| \ll_k N^{2\nu} + \frac{N}{q^{1/2^k}}. \tag{5.9} \]
We have $N^{2\nu} \ll N/q^{1/2^k}$ whenever $q^{1/2^k} \ll N^{1-2\nu}$. By hypothesis we have $q < N^{2\nu}$, therefore the bound certainly holds if $2\nu \le (1 - 2\nu)2^k$, i.e. if $\nu \le \frac{1}{2}\cdot\frac{2^k}{2^k+1}$. Since we are assuming $k \ge 2$, this is certainly true when $\nu \le 2/5$. Under this assumption (5.9) becomes
\[ |f(\alpha)| \ll_k \frac{N}{q^{1/2^k}}, \]
which is the claim for $j = 0$.

Suppose that $j \ge 1$. Then the change of variable $x = y/|\beta|^{1/k}$ gives
\[ v(\beta) = \int_1^N e(\beta x^k)\,dx = |\beta|^{-1/k}\int_{|\beta|^{1/k}}^{N|\beta|^{1/k}} e(\operatorname{sgn}(\beta)y^k)\,dy. \]
Since $F_\pm(z) := \int_0^z e(\pm y^k)\,dy$ is bounded as a function of $z \in \mathbb{R}$⁴, we conclude that $|v(\beta)| \ll |\beta|^{-1/k}$. Therefore, when $j \ge 1$, from (5.8) we have
\[ |f(\alpha)| \ll N^{2\nu} + q^{-1/2^k}|\alpha - a/q|^{-1/k} \le N^{2\nu} + q^{-1/2^k}\Big(\frac{j}{N^k}\Big)^{-1/k} = N^{2\nu} + \frac{N}{q^{1/2^k}j^{1/k}}. \]
We have already noticed that $j \le N^{\nu}$. Using this restriction for $j$ and recalling that we are assuming $q \le N^{2\nu}$, it is immediate to verify that $N^{2\nu} \ll N/(q^{1/2^k}j^{1/k})$ for $\nu \le 1/3$. Concluding, we have proved that
\[ |f(\alpha)| \ll_k \frac{N}{q^{1/2^k}j^{1/k}} \]
also in this case.

Case 2: large $q$. Suppose $N^{2\nu} \le q \le N$. Then by Theorem 5.3 we have
\[ f\Big(\frac{a}{q}\Big) = 1 + \sum_{m=1}^{N} e\Big(\frac{am^k}{q}\Big) \ll_{k,\varepsilon} N(qN)^{\varepsilon}q^{-2^{1-k}}, \]
which for any $\varepsilon \le \nu 2^{-k}$ and under the assumption $N^{2\nu} \le q \le N$ gives
\[ f\Big(\frac{a}{q}\Big) \ll_k Nq^{-1/2^k}. \tag{5.10} \]
Moreover, $\|f'\|_\infty \ll N^{k+1}$ uniformly in $\mathbb{R}$, therefore
\[ f(\alpha) - f\Big(\frac{a}{q}\Big) \ll \|f'\|_\infty\cdot\Big|\alpha - \frac{a}{q}\Big| \le N^{k+1}\cdot\frac{1}{qN^{k-\nu}} = \frac{N^{1+\nu}}{q}. \tag{5.11} \]
We are assuming that $N^{2\nu} \le q \le N$, therefore the bound proves that
\[ f(\alpha) - f\Big(\frac{a}{q}\Big) \ll Nq^{-1/2^k}. \tag{5.12} \]

(In other words, we are claiming that $N^{1+\nu}/q \ll Nq^{-1/2^k}$. This happens if and only if $N^{\nu} \ll q^{1-1/2^k}$. Since $N^{2\nu} \le q$, this certainly happens as soon as $N^{\nu} \ll N^{2\nu(1-1/2^k)}$, i.e., as soon as $1/2^k \le 1/2$, which is true.) Bounds (5.10) and (5.12) together imply that $f(\alpha) \ll_k Nq^{-1/2^k}$. This concludes the proof of this case, since the assumption $q \ge N^{2\nu}$ and $j \le N^{\nu}/q$ imply that $0$ is the unique possible value for $j$. □

⁴ In fact, the change of variable $y \to z^{1/k}$ and an integration by parts show that
\[ \pm\int e(\pm y^k)\,dy = \pm\int\frac{e(\pm z)}{kz^{1-1/k}}\,dz = \frac{e(\pm z)}{2\pi i k z^{1-1/k}} + \frac{1 - 1/k}{2\pi i k}\int\frac{e(\pm z)}{z^{2-1/k}}\,dz, \]
from which it is easy to deduce that $\lim_{z\to\pm\infty}F_\pm(z)$ exists and is finite whenever $k > 1$.

Lemma 5.4 Let $\nu \in (0, 1/3]$. Let $\alpha \in I_j(q, a)$. If $N < q \le N^{k-1}$, then $j = 0$ and
\[ f(\alpha) \ll_k \frac{N}{N^{1/2^k}}. \]

Proof. $j$ is zero because this happens for every $q > N^{\nu}$. Moreover, in the given range for $q$, by Theorem 5.3 we have
\[ f\Big(\frac{a}{q}\Big) = 1 + \sum_{m=1}^{N} e\Big(\frac{am^k}{q}\Big) \ll_{k,\varepsilon} N(qN)^{\varepsilon}N^{-2/2^k}, \]
which for any $\varepsilon \le 2^{-k}/k$ and under the hypothesis $N < q \le N^{k-1}$ gives
\[ f\Big(\frac{a}{q}\Big) \ll_k N^{1-1/2^k}. \]
We get the claim since $f(\alpha) - f(a/q) \ll N^{1+\nu}/q \ll N/N^{1/2^k}$ (for the first bound recall (5.11); for the second inequality notice that it can be written as $N^{\nu+1/2^k} \ll q$, that $\nu + 1/2^k < 1$ and that $N \le q$ by hypothesis). □

Lemma 5.5 Let $\nu \in (0, 1/3]$. Let $\alpha \in I_j(q, a)$. If $N^{k-1} < q \le N^{k-\nu}$, then $j = 0$ and
\[ f(\alpha) \ll_k N(q/N^k)^{1/2^k}. \]

Proof. $j$ is zero because this happens for every $q > N^{\nu}$. Moreover, in the given range for $q$, by Theorem 5.3 we have
\[ f\Big(\frac{a}{q}\Big) = 1 + \sum_{m=1}^{N} e\Big(\frac{am^k}{q}\Big) \ll_{k,\varepsilon} N(qN)^{\varepsilon}(q/N^k)^{2/2^k}, \]
which for any $\varepsilon \le \frac{\nu}{k+1}2^{-k}$ and under the assumption $N^{k-1} < q \le N^{k-\nu}$ gives
\[ f\Big(\frac{a}{q}\Big) \ll_k N(q/N^k)^{1/2^k}, \]
and the claim follows since $f(\alpha) - f(a/q) \ll N^{1+\nu}/q \ll N(q/N^k)^{1/2^k}$ (for the first bound recall (5.11); for the second inequality notice that it can be written as $N^{\nu+k/2^k} \ll q^{1+1/2^k}$, that $N^{k-1} < q$ by hypothesis, and that $\nu + k/2^k \le (k - 1)(1 + 1/2^k)$ holds for $\nu \le 1/3$ and $k \ge 2$). □
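The three choices of $\varepsilon$ made in Case 2 of Lemma 5.3 and in Lemmas 5.4 and 5.5 all follow the same pattern: the factor $(qN)^{\varepsilon}$ is absorbed by half of the saving coming from Theorem 5.3. Spelled out, as a sketch that uses only the range of $q$ stated in each lemma:

```latex
\begin{align*}
N^{2\nu}\le q\le N,\ \varepsilon\le \nu 2^{-k}:\quad
  & (qN)^{\varepsilon}\le N^{2\varepsilon}\le N^{2\nu/2^{k}}\le q^{1/2^{k}}
  && \Longrightarrow\ N(qN)^{\varepsilon}q^{-2/2^{k}}\le Nq^{-1/2^{k}},\\
N< q\le N^{k-1},\ \varepsilon\le 2^{-k}/k:\quad
  & (qN)^{\varepsilon}\le N^{k\varepsilon}\le N^{1/2^{k}}
  && \Longrightarrow\ N(qN)^{\varepsilon}N^{-2/2^{k}}\le N^{1-1/2^{k}},\\
N^{k-1}< q\le N^{k-\nu},\ \varepsilon\le \tfrac{\nu}{k+1}2^{-k}:\quad
  & (qN)^{\varepsilon}\le N^{(k+1)\varepsilon}\le N^{\nu/2^{k}}\le (N^{k}/q)^{1/2^{k}}
  && \Longrightarrow\ N(qN)^{\varepsilon}(q/N^{k})^{2/2^{k}}\le N(q/N^{k})^{1/2^{k}}.
\end{align*}
```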

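We are now in a position to complete the proof of Linnik's result (Theorem 5.2). The size of $I_j(q, a)$ is at most $2/N^k$, therefore from Lemma 5.3 we get
\[ \int_{I_j(q,a)} f(\alpha)^h e(-n\alpha)\,d\alpha \ll_k \Big(\frac{N}{q^{1/2^k}(j+1)^{1/k}}\Big)^{h}\frac{1}{N^k} \ll_k \frac{N^{h-k}}{q^{h/2^k}(j+1)^{h/k}} \]
when $q \le N$,
\[ \sum_j\int_{I_j(q,a)} f(\alpha)^h e(-n\alpha)\,d\alpha \ll_k \Big(\frac{N}{N^{1/2^k}}\Big)^{h}\frac{1}{N^k} \ll_k \frac{N^{h-k}}{N^{h/2^k}} \]
when $N < q \le N^{k-1}$ from Lemma 5.4, and
\[ \sum_j\int_{I_j(q,a)} f(\alpha)^h e(-n\alpha)\,d\alpha \ll_k \big(N(q/N^k)^{1/2^k}\big)^{h}\frac{1}{N^k} \ll_k \frac{N^{h-k}}{N^{h\nu/2^k}} \]
when $N^{k-1} < q \le N^{k-\nu}$ from Lemma 5.5. As a consequence, using the decomposition (5.6) we conclude that
\[ \int_0^1 f(\alpha)^h e(-n\alpha)\,d\alpha \ll_k \sum_{q \le N}\sum_{a=0}^{q-1}\sum_{j}\frac{N^{h-k}}{q^{h/2^k}(j+1)^{h/k}} + \sum_{N < q \le N^{k-1}}\sum_{a=0}^{q-1}\frac{N^{h-k}}{N^{h/2^k}} + \sum_{N^{k-1} < q \le N^{k-\nu}}\sum_{a=0}^{q-1}\frac{N^{h-k}}{N^{h\nu/2^k}}. \]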

It is customary to denote by $g(k)$ the dimension of $G_k$ as additive basis and by $G(k)$ its dimension as asymptotic additive basis. The value of $g(k)$ is strongly affected by the arithmetic of $k$. In fact, Johann Albrecht Euler (son of Leonhard Euler) noticed that the number $2^k\lfloor(3/2)^k\rfloor - 1$ is smaller than $3^k$, thus in its representation as a sum of $k$-th powers only $1^k$ and $2^k$ may appear. The shortest possible representation needs $\lfloor(3/2)^k\rfloor - 1$ terms of kind $2^k$ and $2^k - 1$ terms of kind $1^k$, and therefore needs $\lfloor(3/2)^k\rfloor + 2^k - 2$ powers, thus proving that
\[ g(k) \ge 2^k + \Big\lfloor\frac{3^k}{2^k}\Big\rfloor - 2. \]
He conjectured that this is the correct value of $g(k)$. The joint work of Dickson, Pillai, Rubugunday and Niven proved that the conjecture holds for $k$ whenever
\[ 2^k\Big\{\frac{3^k}{2^k}\Big\} + \Big\lfloor\frac{3^k}{2^k}\Big\rfloor \le 2^k. \tag{5.13} \]
This condition may be rephrased by noticing that if we write $3^k$ as $q2^k + r$ with $0 \le r < 2^k$, then $r$ is $2^k\{(3/2)^k\}$ and $q$ is $\lfloor(3/2)^k\rfloor$, and the inequality simply says that $q + r \le 2^k$. The numbers $q$ and $r$ may be quickly recovered from the binary representation of $3^k$ (because $r$ is simply the number represented by the $k$ lower binary digits, and $q$ the number represented by the other binary digits), and the condition $q + r \le 2^k$ may be tested directly from their binary representation. Moreover, the binary representation of $q$ contains only $\lceil k\log_2 3\rceil - k$ digits, so it is shorter than that of $r$ (written with $k$ digits) by $2k - \lceil k\log_2 3\rceil$ digits. As a consequence, the claim certainly holds when at least one $0$ appears among the $2k - \lceil k\log_2 3\rceil$ most significant digits of $r$. For example:
\[ 3^{17} = (\underbrace{1111011001}_{q}\ \underbrace{01000010111000011}_{r})_2 \]
thus
\[ q + r = (1111011001)_2 + (01000010111000011)_2 \]
and this is lower than $2^k$ since
\[ r = (\underbrace{0100001}_{\text{zeros here}}\,0111000011)_2. \]
With this procedure the statement may be checked by any machine with a very modest ability to manipulate integers, provided that it has a very large memory to store the binary representation of $3^k$ and is specially tuned to quickly manipulate extremely long binary sequences. In this way the equality $g(k) = \lfloor(3/2)^k\rfloor + 2^k - 2$ has been checked for every $k \le 4\cdot 10^8$ by Kubina and Wunderlich⁵.

In its essence (5.13) is a Diophantine inequality, claiming a property of the distribution of $(3/2)^k$ modulo 1. In this sense it is not surprising that in 1957 Mahler⁶ was able to adapt one of his previous results to the then very recent result of Roth on Diophantine inequalities, and to deduce that (5.13) has at most finitely many exceptions. Unfortunately Mahler's result is not constructive, thus at the moment we know no bound either for the number of possible exceptions or for the largest possible exception.

⁵ Extending Waring's conjecture to 471,600,000, Math. Comp. 55(192), 815–820 (1990).
⁶ On the fractional parts of the powers of a rational number. II, Mathematika 4, 122–124 (1957).

$G(k)$ is in some sense more regular and apparently it should depend only on the size of $k$; the argument we have proposed here may be modified to provide a lower bound for the part of the integral corresponding to $q < N^{2\nu}$ (the so-called major arcs), and in this way one proves that $G(k) \le 6k2^k$. Much stronger upper bounds of the kind $G(k) \ll k^{1+\varepsilon}$ have been proved; in our opinion the completely explicit result of Karatsuba⁷ deserves a special mention, but other and even better bounds are known. A good and complete account of the history of the problem and of the recent results is contained in Vaughan–Wooley, Waring's problem: a survey, in Number theory for the millennium, III (Urbana, IL, 2000), 301–340 (2002).

Lastly, we mention that Linnik in 1943 provided a completely elementary proof of Waring's problem. Actually, his argument is called elementary because no complex analysis, cancellation in exponential sums, or integration methods are involved. However, it is definitely not trivial. Moreover it can be used to treat more general additive problems, for example $\sum_{j=1}^{h} f(x_j) = n$ (where $f$ is any fixed polynomial of degree $k$ assuming positive integer values at positive integers, and we are looking for a solution in $x_1, \dots, x_h \in \mathbb{N}$, for every $n$) or $\sum_{j=1}^{h} f_j(x_j) = n$ where the polynomials $f_j$ share the degree but change with $j$. A 'passionate' exposition of Linnik's method is in Khinchin's book: Three pearls of number theory⁸.
A more recent exposition is in Nathanson’s book: Elementary methods in number theory GTM 195, Springer-Verlag, 2000, Chapters 11 and 12.
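Returning to J. A. Euler's lower bound and to condition (5.13): both are plain integer computations, and the binary-digit shortcut described above is easy to try out. The sketch below uses Python's arbitrary-precision integers and is meant only as an illustration for small $k$ (it is not the specialised procedure of Kubina and Wunderlich); all function names are introduced here.

```python
def euler_lower_bound(k: int) -> int:
    """J. A. Euler's lower bound 2^k + floor((3/2)^k) - 2 for g(k)."""
    return 2 ** k + 3 ** k // 2 ** k - 2


def split_power_of_three(k: int) -> tuple:
    """Write 3^k = q*2^k + r with 0 <= r < 2^k and return (q, r)."""
    return divmod(3 ** k, 2 ** k)


def binary_split(k: int) -> tuple:
    """Binary digits of q and of r (r padded to exactly k digits)."""
    q, r = split_power_of_three(k)
    return format(q, "b"), format(r, f"0{k}b")


def condition_5_13(k: int) -> bool:
    """Condition (5.13) in the form q + r <= 2^k."""
    q, r = split_power_of_three(k)
    return q + r <= 2 ** k


def zero_in_top_digits_of_r(k: int) -> bool:
    """Sufficient test: a 0 among the 2k - ceil(k*log2(3)) top digits of r."""
    # For k >= 1 the bit length of 3^k equals ceil(k*log2(3)).
    t = 2 * k - (3 ** k).bit_length()
    if t <= 0:
        return False
    return "0" in binary_split(k)[1][:t]


if __name__ == "__main__":
    print(binary_split(17))
    print([euler_lower_bound(k) for k in range(2, 7)])
    print(all(condition_5_13(k) for k in range(1, 300)))
```

For $k = 17$ the split reproduces the digits displayed in the example, and for $k = 2, 3, 4$ the lower bound gives $4$, $9$, $19$, the classical values of $g(k)$.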

⁷ See Th. 3.4 in Arkhipov–Chubarikov–Karatsuba, Trigonometric sums in number theory and analysis, de Gruyter Expositions in Mathematics 39, Walter de Gruyter GmbH & Co. KG, Berlin, 2004.
⁸ Khinchin composed the book in Spring 1945 for a friend of his who had asked for some interesting mathematics to study while recovering from a wound received in WWII (the preface of this book is very instructive about the general feeling of that period in the USSR).

Bibliography

Books used for the main parts of the course:
[IK] H. Iwaniec, E. Kowalski: Analytic number theory, American Mathematical Society Colloquium Publications 53, American Mathematical Society, Providence RI, 2004.
[MV] H. L. Montgomery, R. C. Vaughan: Multiplicative number theory. I. Classical theory, Cambridge Studies in Advanced Mathematics 97, Cambridge University Press, Cambridge, 2007.
[P] P. Pollack: Not always buried deep. A second course in elementary number theory, American Mathematical Society, Providence RI, 2009.

Books cited somewhere:
[ACK] G. I. Arkhipov, V. N. Chubarikov and A. A. Karatsuba: Trigonometric sums in number theory and analysis, de Gruyter Expositions in Mathematics 39, Walter de Gruyter GmbH & Co. KG, Berlin, 2004.
[EL] P. Eymard and J.-P. Lafon: The number π, American Mathematical Society, Providence, 2004.
[HR] H. Halberstam and K. F. Roth: Sequences, 2nd ed., Springer-Verlag, New York, 1983.
[H] B. Huppert: Character theory of finite groups, de Gruyter Expositions in Mathematics 25, Berlin, 1998.
[HR] G. H. Hardy and M. Riesz: The general theory of Dirichlet's series, Stechert-Hafner, New York, 1964.
[HW] G. H. Hardy and E. M. Wright: An introduction to the theory of numbers, 5th ed., The Clarendon Press, Oxford University Press, 1979.
[I] I. M. Isaacs: Character theory of finite groups, Dover Publications Inc., New York, 1994.
[T] G. Tenenbaum: Introduction to analytic and probabilistic number theory, Cambridge Studies in Advanced Mathematics 46, Cambridge, 1995.
[Titch] E. C. Titchmarsh: The theory of functions, 2nd ed., Oxford University Press, London, 1975.
[TV] T. Tao and V. Vu: Additive combinatorics, Cambridge Studies in Advanced Mathematics 105, Cambridge University Press, Cambridge, 2006.

Characters of this drama

• Niels Henrik Abel. Frindöe 5-8-1802, Froland (Norway) 6-4-1829.
• Jacob Bernoulli. Basel 27-12-1654, Basel 16-8-1705.
• Enrico Bombieri. Milan 26-11-1940.
• Carlo Emilio Bonferroni. Bergamo 28-1-1892, Florence 18-8-1960.
• Jean Bourgain. Ostend (Belgium) 28-2-1954.
• Viggo Brun. Lier (Norway) 13-10-1885, Drøbak (Norway) 15-8-1978.
• Edmond Darrel Cashwell.
• Augustin Louis Cauchy. Paris 21-8-1789, Sceaux (France) 23-5-1857.
• Ernesto Cesàro. Naples 12-3-1859, Torre Annunziata (Italy) 12-9-1906.
• Pafnuty Lvovich Chebyshev. Okatovo (Russia) 16-5-1821, St. Petersburg 8-12-1894.
• Jingrun Chen. Fuzhou (Fujian Province, China) 22-5-1933, 22-3-1996.
• Harold Davenport. Huncoat (England) 30-10-1907, Cambridge 9-6-1969.
• Julius Richard Dedekind. Braunschweig (now Germany) 6-10-1831, Braunschweig 12-2-1916.
• Jean-Marc Deshouillers.
• Johann Peter Gustav Lejeune Dirichlet. Düren (Germany) 13-2-1805, Göttingen 5-5-1859.
• Leonard Eugene Dickson. Independence (Iowa) 22-1-1874, Harlingen (Texas) 17-1-1954.
• Freeman John Dyson. Crowthorne (England) 15-12-1923.
• Eratosthenes of Cyrene. Cyrene (now Shahhat, Libya) 276 BC, Alexandria (Egypt) 194 BC.
• Paul Erdős. Budapest 26-3-1913, Warsaw 20-9-1996.
• Euclid of Alexandria. About 325 BC, Alexandria (Egypt) about 265 BC.
• Johann Albrecht Euler. St. Petersburg 27-11-1734, St. Petersburg 17-9-1800.
• Leonhard Euler. Basel 15-4-1707, St. Petersburg 18-9-1783.
• Cornelius Joseph Everett.
• Francesco Faà di Bruno. Alessandria (Italy) 29-3-1825, Torino (Italy) 27-3-1888.
• Johann Faulhaber. Ulm (Germany) 5-5-1580, Ulm 10-9-1635.
• John Farey. Woburn (England) 1766, London 6-1-1826.
• Pierre de Fermat. Beaumont-de-Lomagne (France) 17-8-1601, Castres (France) 12-1-1665.
• Guido Fubini. Venezia 19-1-1879, New York 6-6-1943.
• Johann Carl Friedrich Gauss. Brunswick-Lüneburg (Germany) 30-4-1777, Göttingen 23-2-1855.
• Christian Goldbach. Königsberg, Prussia (now Kaliningrad, Russia) 18-3-1690, 20-11-1764.
• Dorian Goldfeld. Marburg (Germany) 21-1-1947.
• Ben Joseph Green. Bristol (England) 27-2-1977.
• Jacques Salomon Hadamard. Versailles 8-12-1865, Paris 17-10-1963.
• Harald Andrés Helfgott. Lima (Peru) 25-11-1977.
• David Hilbert. Königsberg, Prussia (now Kaliningrad, Russia) 23-1-1862, Göttingen 14-2-1943.
• Otto Ludwig Hölder. Stuttgart 22-12-1859, Leipzig 29-8-1937.
• Martin Neil Huxley (England).
• Albert Edward Ingham. Northampton (England) 3-4-1900, Chamonix (France) 6-9-1967.
• Henryk Iwaniec. 9-10-1947 (Poland).
• Anatolii Alexeevitch Karatsuba. Grozny (Russia) 31-1-1937, Moscow 28-9-2008.
• Aleksandr Yakovlevich Khinchin. Kondrovo (Russia) 19-7-1894, Moscow 18-11-1959.
• K. I. Klimov.
• Nikolai Mikhailovich Korobov. 23-11-1917.
• Jeffrey M. Kubina.
• Jeffrey Clark Lagarias. Pittsburgh 11-1949.
• Joseph Louis Lagrange. Turin 25-1-1736, Paris 10-4-1813.
• Edmund Georg Hermann Landau. Berlin 14-2-1877, Berlin 19-2-1938.
• Adrien-Marie Legendre. Paris 18-9-1752, Paris 10-1-1833.
• Ernst Lindelöf. Helsingfors (now Helsinki, Finland) 7-3-1870, Helsinki 4-6-1946.
• Yuri Linnik. Belaya Tserkov (Ukraine) 21-1-1915, Leningrad (now St Petersburg, Russia) 30-6-1972.
• Rudolf Otto Lipschitz. Königsberg, Prussia (now Kaliningrad, Russia) 14-5-1832, Bonn 7-10-1903.
• John Edensor Littlewood. Rochester (England) 9-6-1885, Cambridge (England) 6-9-1977.
• Colin Maclaurin. Kilmodan (Great Britain) 2-1698, Edinburgh 14-6-1746.
• Kurt Mahler. Krefeld (Prussian Rhineland) 26-7-1903, Canberra (Australia) 23-2-1988.
• Henry Berthold Mann. Vienna 27-10-1905, 1-2-2000.
• Lorenzo Mascheroni. Bergamo (Italy) 13-5-1750, Paris 14-7-1800.
• Franz Carl Joseph Mertens. Schroda (Poland) 20-3-1840, Vienna 5-3-1927.
• August Ferdinand Möbius. Schulpforta (Germany) 17-1-1790, Leipzig 26-9-1868.
• Hugh Lowell Montgomery.
• Giacinto Morera. Novara (Italy) 18-7-1856, (Italy) 8-2-1907.
• Maruti Ram Pedaprolu Murty. Guntur (India) 10-16-1953.
• Mohan K. N. Nair.
• Rolf Herman Nevanlinna. Joensuu (Finland, then Russia) 22-10-1895, Helsinki 28-5-1980.
• Maxwell Herman Newman. Chelsea (England) 7-2-1897, Comberton (England) 22-2-1984.
• Ivan Morton Niven. Vancouver (Canada) 25-10-1915, Eugene (Oregon) 9-5-1999.
• Bertil Nyman.
• Oskar Perron. Frankenthal (Germany) 7-5-1880, Munich 22-2-1975.
• Subbayya Sivasankaranarayana Pillai. Nagercoil (Tamil Nadu) 5-4-1901, Cairo (Egypt) 31-8-1950.
• George Pólya. Budapest 13-12-1887, Palo Alto (California) 7-9-1985.
• Alfred Pringsheim. Ohlau (Germany) 2-9-1850, Zurich 25-6-1941.
• Srinivasa Aiyangar Ramanujan. Erode (India) 22-12-1887, Kumbakonam (India) 26-4-1920.
• Olivier Ramaré.
• Georg Friedrich Bernhard Riemann. Breselenz (Germany) 17-9-1826, Selasca (Italy) 20-6-1866.
• Hans Riesel.
• Marcel Riesz. Győr (Hungary) 16-11-1886, Lund (Sweden) 4-9-1969.
• Giancarlo Rota. Vigevano (Italy) 27-4-1932, Cambridge 18-4-1999.
• Klaus Friedrich Roth. Breslau (Germany, now Wrocław, Poland) 29-10-1925.
• Raghunath Krishna Rubugunday. Madras (India) 1918, 2000.
• Imre Z. Ruzsa. Budapest 23-7-1953.
• Issai Schur. Mogilev (Russia, now Belarus) 01-10-1875, Tel Aviv (Palestine, now Israel) 01-10-1941.
• Hermann Amandus Schwarz. Hermsdorf, Silesia (now Poland) 25-1-1843, Berlin 30-11-1921.
• Peter Sarnak. (South Africa) 18-12-1953.
• Atle Selberg. Langesund (Norway) 14-6-1917, Princeton 6-8-2007.
• Lev Genrikhovich Shnirelman. Gomel (Belarus) 2-1-1905, Moscow 24-9-1938.
• James Stirling. Garden (Scotland) 5-1692, Edinburgh 5-12-1770.
• James Joseph Sylvester. London 3-9-1814, London 15-3-1897.
• Brook Taylor. Edmonton (Great Britain) 18-8-1685, London 29-12-1731.
• Charles Jean Baron de la Vallée Poussin. Louvain (Belgium) 14-8-1866, Louvain 2-3-1966.
• N. P. Romanoff.
• Terence Chi-Shen Tao. Adelaide (Australia) 17-7-1975.
• Robert Charles Vaughan. 24-3-1945.
• Ivan Matveevich Vinogradov. Milolyub (Russia) 14-9-1891, Moscow 20-3-1983.
• Hans Carl Friedrich von Mangoldt. Weimar 18-5-1854, Gdansk 27-10-1925.
• Georgy Fedoseevich Voronoi. Zhuravka (Russia, now Ukraine) 20-04-1868, Warsaw 20-11-1908.
• John Wallis. Ashford (England) 23-11-1616, Oxford 28-10-1703.
• Edward Waring. Shrewsbury (England) 1736, Pontesbury (England) 15-8-1798.
• Karl Theodor Wilhelm Weierstrass. Ostenfelde (Germany) 31-10-1815, Berlin 19-2-1897.
• André Weil. Paris 6-5-1906, Princeton 6-8-1998.
• Hermann Klaus Hugo Weyl. Elmshorn (Germany) 9-11-1885, Zurich 9-12-1955.
• Eduard Wirsing.
• Trevor Wooley.
• John William Wrench Jr. Westfield (New York) 13-10-1911, Frederick (Maryland) 27-2-2009.
• Marvin C. Wunderlich.