
AN ELEMENTARY PROOF OF THE PRIME NUMBER THEOREM

ABHIMANYU CHOUDHARY

Abstract. This paper presents an "elementary" proof of the prime number theorem, elementary in the sense that no complex-analytic techniques are used. First proven by Hadamard and de la Vallée Poussin, the prime number theorem states that the number of primes less than or equal to $x$ is asymptotic to $x/\ln x$. Until 1949 the theorem was considered too "deep" to be proven by elementary means; however, Erdős and Selberg succeeded in proving it without the use of complex analysis. This paper closely follows a modified version of their proof given by Norman Levinson in 1969.

Contents
1. Arithmetic Functions
2. Elementary Results
3. Chebyshev's Functions and Asymptotic Formulae
4. Shapiro's Theorem
5. Selberg's Asymptotic Formula
6. Deriving the Prime Number Theorem using Selberg's Identity
Acknowledgments
References

1. Arithmetic Functions

Definition 1.1. The prime counting function $\pi(x)$ denotes the number of primes not greater than $x$:
$$\pi(x) = \sum_{p \le x} 1,$$
where the symbol $p$ runs over the set of primes in increasing order. Using this notation, we state the prime number theorem, first conjectured by Legendre, as:

Theorem 1.2.
$$\lim_{x \to \infty} \frac{\pi(x) \log x}{x} = 1.$$

Note that unless specified otherwise, $\log$ denotes the natural logarithm.

2. Elementary Results

Before proving the main result, we first introduce a number of foundational definitions and results.

Definition 2.1. An arithmetical function, or sequence, is a function whose domain is the set of natural numbers and whose codomain is either the real numbers or the complex numbers.

Definition 2.2. We define the divisor sum of an arithmetical function $a$ to be
$$\sum_{d \mid n} a(d),$$
where the symbol $d$ ranges over the set of positive divisors of $n$.

Definition 2.3. We define the Dirichlet product, or Dirichlet convolution, of two arithmetical functions $f, g$ as
$$(f * g)(n) = \sum_{d \mid n} f(d)\, g\!\left(\frac{n}{d}\right).$$
The Dirichlet product is commutative and associative. Moreover, the set of arithmetical functions has an identity $I$ for this product, and every arithmetical function $f$ with $f(1) \neq 0$ has an inverse $f^{-1}$ such that $f * f^{-1} = I$. It is easy to verify that the identity function $I$ is given by
$$I(n) = \left\lfloor \frac{1}{n} \right\rfloor = \begin{cases} 1 & \text{if } n = 1, \\ 0 & \text{otherwise.} \end{cases}$$

Definition 2.4. We define the Möbius function $\mu$ as
$$\mu(n) = \begin{cases} 1 & \text{if } n = 1, \\ (-1)^k & \text{if } n = p_1 \cdots p_k \text{ for distinct primes } p_1, \dots, p_k, \\ 0 & \text{otherwise.} \end{cases}$$
Thus, the Möbius function records a "parity" of sorts for every squarefree integer, namely the parity of its number of prime factors.

Theorem 2.5. The divisor sum of the Möbius function is given by
$$\sum_{d \mid n} \mu(d) = \left\lfloor \frac{1}{n} \right\rfloor = I(n).$$
This can be verified using the fundamental theorem of arithmetic and the binomial theorem: if $n > 1$ has $k \ge 1$ distinct prime factors, only the squarefree divisors contribute, and the sum equals $\sum_{j=0}^{k} \binom{k}{j}(-1)^j = (1-1)^k = 0$. Note that this divisor sum yields the identity function, an important property we will use momentarily.

Definition 2.6. We define the unit function by $u(n) = 1$ for all natural numbers $n$. The divisor sum in Theorem 2.5 can then be rewritten as
$$\sum_{d \mid n} \mu(d) = \sum_{d \mid n} \mu(d)\, u\!\left(\frac{n}{d}\right) = (\mu * u)(n) = \left\lfloor \frac{1}{n} \right\rfloor.$$
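These definitions are easy to experiment with. The following Python sketch is only an illustration and not part of the proof; the helper names `divisors`, `mobius`, and `dirichlet` are our own. It implements the Dirichlet product and the Möbius function by trial division, and checks Theorem 2.5 in the form $\mu * u = I$ for small $n$.

```python
from math import isqrt

def divisors(n):
    """Positive divisors of n in increasing order, found by trial division up to sqrt(n)."""
    small, large = [], []
    for d in range(1, isqrt(n) + 1):
        if n % d == 0:
            small.append(d)
            if d != n // d:
                large.append(n // d)
    return small + large[::-1]

def mobius(n):
    """Mobius function: 0 if a square divides n, else (-1)^(number of prime factors)."""
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:          # p^2 divided the original n
                return 0
            result = -result
        p += 1
    return -result if n > 1 else result   # a leftover factor n > 1 is prime

def dirichlet(f, g):
    """Dirichlet product: (f * g)(n) = sum over d | n of f(d) g(n/d)."""
    return lambda n: sum(f(d) * g(n // d) for d in divisors(n))

u = lambda n: 1                        # unit function u(n) = 1
I = lambda n: 1 if n == 1 else 0       # identity I(n) = floor(1/n)

mu_star_u = dirichlet(mobius, u)
print([mu_star_u(n) for n in range(1, 11)])   # [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
```

Trial division keeps the sketch dependency-free; for large ranges a sieve would be the natural replacement.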

Thus, the Möbius and unit functions are inverses of each other. We can use this property to derive a powerful formula, known as the Möbius inversion formula.

Theorem 2.7 (Möbius inversion formula). If $f, g$ are arithmetical functions and
$$f(n) = \sum_{d \mid n} g(d),$$
then
$$g(n) = \sum_{d \mid n} f(d)\, \mu\!\left(\frac{n}{d}\right).$$

Proof. By Definition 2.3, the hypothesis says that $g * u = f$. Taking the convolution of both sides with $\mu$, we have
$$\mu * (g * u) = \mu * f.$$
Using associativity and commutativity, the left-hand side equals $g * (u * \mu) = g * I = g$, so
$$g = \mu * f = f * \mu,$$
as required. ∎

Corollary 2.8 (Generalized Möbius inversion). If $f, g$ are functions defined for $x \ge 1$ and
$$g(x) = \sum_{n \le x} f\!\left(\frac{x}{n}\right),$$
then
$$f(x) = \sum_{n \le x} \mu(n)\, g\!\left(\frac{x}{n}\right),$$
where the symbol $n$ ranges over all positive integers not greater than $x$.

We now introduce von Mangoldt's function, denoted by $\Lambda$.

Definition 2.9 (von Mangoldt's function). For every integer $n \ge 1$ we define
$$\Lambda(n) = \begin{cases} \log p & \text{if } n = p^k \text{ for some prime } p \text{ and } k \ge 1, \\ 0 & \text{otherwise.} \end{cases}$$
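Before putting $\Lambda$ to work, here is a small numerical check of Theorem 2.7 and Corollary 2.8. It is a sketch under our own assumptions: the test functions `g` and `F` are arbitrary hypothetical choices, and the real-variable check uses points $x$ for which no $x/n$ is an integer, so floating-point floors are unambiguous.

```python
import math

def divisors(n):
    """Positive divisors of n (a simple O(n) scan, fine for small n)."""
    return [d for d in range(1, n + 1) if n % d == 0]

def mobius(n):
    """Mobius function by trial division."""
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0
            result = -result
        p += 1
    return -result if n > 1 else result

# Theorem 2.7: if f(n) = sum_{d|n} g(d), then g(n) = sum_{d|n} f(d) mu(n/d).
def g(n):                        # hypothetical test function
    return n * n + 3 * n - 7

def f(n):
    return sum(g(d) for d in divisors(n))

def inverted(n):
    return sum(f(d) * mobius(n // d) for d in divisors(n))

# Corollary 2.8: G(x) = sum_{n<=x} F(x/n) implies F(x) = sum_{n<=x} mu(n) G(x/n).
def F(x):                        # hypothetical test function
    return math.sqrt(x)

def G(x):
    return sum(F(x / n) for n in range(1, math.floor(x) + 1))

def inverted_real(x):
    return sum(mobius(n) * G(x / n) for n in range(1, math.floor(x) + 1))

print(all(inverted(n) == g(n) for n in range(1, 60)))   # True
print(abs(inverted_real(37.5) - F(37.5)) < 1e-9)         # True
```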

The above definition is useful because it turns a multiplicative problem (prime factorization) into an additive one through the use of logarithms. It also prevents any prime factor from being "double counted", as we will see in the next theorem.

Theorem 2.10 (Divisor sum of the von Mangoldt function).
$$\sum_{d \mid n} \Lambda(d) = \log n.$$
The proof follows naturally from the fundamental theorem of arithmetic. Write $n = p_1^{a_1} \cdots p_r^{a_r}$. The only divisors $d$ of $n$ with $\Lambda(d) \neq 0$ are the prime powers $p_i^{k}$ with $1 \le k \le a_i$, each contributing $\log p_i$. Hence each prime $p_i$ is counted exactly as many times as it appears in the prime factorization of $n$, and
$$\sum_{d \mid n} \Lambda(d) = \sum_{i=1}^{r} a_i \log p_i = \log n.$$
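Theorem 2.10 can be checked directly. The sketch below (our own helper names, trial division only) computes $\Lambda(n)$ by stripping the smallest prime factor and testing whether anything remains, then compares the divisor sum with $\log n$.

```python
import math

def von_mangoldt(n):
    """Lambda(n) = log p if n is a power p^k (k >= 1) of a prime p, else 0."""
    if n < 2:
        return 0.0
    p = 2
    while p * p <= n:
        if n % p == 0:                 # p is the smallest prime factor of n
            while n % p == 0:
                n //= p
            return math.log(p) if n == 1 else 0.0
        p += 1
    return math.log(n)                 # no factor up to sqrt(n): n is prime

def divisor_sum_of_lambda(n):
    """Left-hand side of Theorem 2.10: the sum of Lambda(d) over d | n."""
    return sum(von_mangoldt(d) for d in range(1, n + 1) if n % d == 0)

print(divisor_sum_of_lambda(360), math.log(360))   # equal up to rounding
```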

3. Chebyshev's Functions and Asymptotic Formulae

Definition 3.1. Let $g$ be positive for $x \ge a$. We say that $f$ is "big-oh of $g$" for $x \ge a$, and write
$$f(x) = O(g(x)),$$
if there exists a constant $M$ such that $|f(x)| \le M g(x)$ for all $x \ge a$.

Corollary 3.2. Let $f, g$ be Riemann-integrable functions such that $f(t) = O(g(t))$ for $t \ge a$. Then
$$\int_a^x f(t)\,dt = O\!\left(\int_a^x g(t)\,dt\right).$$

Definition 3.3. We say that $f$ is asymptotic to $g$, written $f \sim g$, if
$$\lim_{x \to \infty} \frac{f(x)}{g(x)} = 1.$$

Definition 3.4. We define the extension of an arithmetical function $a$ as the map $\mathbb{R}^+ \to \mathbb{R}$ given by
$$a(x) = a(\lfloor x \rfloor) \quad \text{for all } x \in \mathbb{R}^+.$$

We now have a suitable way to extend the domain of arithmetical functions to the positive reals, and with it a new set of tools at our disposal, namely those of calculus. We next supply a powerful summation formula that allows us to approximate the partial sums of arithmetical functions.

Theorem 3.5 (Abel's summation formula). Let $f$ be a real-valued function with a Riemann-integrable derivative for $t \ge 1$. Let $a(n)$ be an arithmetical function and let $A(x) = \sum_{n \le x} a(n)$ be its partial sum up to $x$. Then
$$\sum_{n \le x} a(n) f(n) = A(x) f(x) - \int_1^x A(t) f'(t)\,dt.$$

Proof. Let $N = \lfloor x \rfloor$. Since $A(0) = 0$ is an empty sum, $a(n) = A(n) - A(n-1)$ for every $n \ge 1$ (and indeed $\sum_{1 \le n \le x} [A(n) - A(n-1)]$ telescopes to $A(N)$). Multiplying each term by $f(n)$, we have
$$\sum_{n \le x} a(n) f(n) = \sum_{n \le x} [A(n) - A(n-1)] f(n) = \sum_{n \le x} A(n) f(n) - \sum_{n \le x} A(n-1) f(n).$$
Reindexing the second sum, it follows that
$$\sum_{n \le x} A(n) f(n) - \sum_{n \le x} A(n-1) f(n) = \sum_{n \le N} A(n) f(n) - \sum_{n \le N-1} A(n) f(n+1).$$

Note that
$$\sum_{n \le N} A(n) f(n) = \sum_{n \le N-1} A(n) f(n) + A(N) f(N).$$
Combining, we have
$$\sum_{n \le x} a(n) f(n) = A(N) f(N) - \sum_{n \le N-1} A(n)\big(f(n+1) - f(n)\big).$$
Because $f$ has a Riemann-integrable derivative, we can apply the fundamental theorem of calculus to it and write
$$\sum_{n \le N-1} A(n)\big(f(n+1) - f(n)\big) = \sum_{n \le N-1} A(n) \int_n^{n+1} f'(t)\,dt.$$
Because $A(t)$ is constant on the interval $[n, n+1)$, where it takes the value $A(n)$, we can place it inside the integral:
$$\sum_{n \le N-1} A(n) \int_n^{n+1} f'(t)\,dt = \sum_{n \le N-1} \int_n^{n+1} A(t) f'(t)\,dt = \int_1^N A(t) f'(t)\,dt.$$
Finally, $A(t) = A(N)$ on $[N, x]$, so $A(x) f(x) - A(N) f(N) = \int_N^x A(t) f'(t)\,dt$. Thus, in conclusion,
$$\sum_{n \le x} a(n) f(n) = A(N) f(N) - \int_1^N A(t) f'(t)\,dt = A(x) f(x) - \int_1^x A(t) f'(t)\,dt,$$
as needed. ∎

Corollary 3.6 (Euler's summation formula). Let $f$ have a Riemann-integrable derivative on the interval $[1, x]$. Then
$$\sum_{n \le x} f(n) = \int_1^x f(t)\,dt + \int_1^x (t - \lfloor t \rfloor) f'(t)\,dt + f(x)(\lfloor x \rfloor - x) + f(1).$$

Proof. This result follows from Abel's summation formula with $a(n) = 1$, so that $A(t) = \lfloor t \rfloor$, after integrating $\int_1^x t f'(t)\,dt$ by parts. ∎

We will use these results to derive results about the asymptotic behavior of certain arithmetical functions. We first introduce two important functions of Chebyshev whose asymptotic behavior we will examine.

Definition 3.7. For $x > 0$ we define Chebyshev's $\vartheta$ function by
$$\vartheta(x) = \sum_{p \le x} \log p,$$
where the symbol $p$ runs over all primes not exceeding $x$.

Definition 3.8. For $x > 0$ we define Chebyshev's $\psi$ function by
$$\psi(x) = \sum_{n \le x} \Lambda(n).$$
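Abel's summation formula (Theorem 3.5) can be tested numerically. In this sketch (an illustration under our own assumptions, with a hypothetical sequence `a`), the integral $\int_1^x A(t) f'(t)\,dt$ is split into unit pieces on which $A$ is constant, and each piece is approximated independently by Simpson's rule rather than by the telescoping used in the proof.

```python
import math

def simpson(h, lo, hi, steps=200):
    """Composite Simpson's rule for a smooth integrand h on [lo, hi] (steps must be even)."""
    if hi <= lo:
        return 0.0
    width = (hi - lo) / steps
    total = h(lo) + h(hi)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * h(lo + i * width)
    return total * width / 3

def abel_check(a, f, fprime, x):
    """Return (sum_{n<=x} a(n) f(n), A(x) f(x) - int_1^x A(t) f'(t) dt) for real x >= 1."""
    N = math.floor(x)
    lhs = sum(a(n) * f(n) for n in range(1, N + 1))
    A, integral = 0, 0.0
    for n in range(1, N + 1):
        A += a(n)                                 # A(t) = A(n) on [n, min(n+1, x)]
        integral += A * simpson(fprime, n, min(n + 1, x))
    rhs = A * f(x) - integral                     # after the loop, A = A(N) = A(x)
    return lhs, rhs

a = lambda n: (n % 3) + 1                         # hypothetical test sequence
lhs, rhs = abel_check(a, math.log, lambda t: 1 / t, 50.7)
print(lhs, rhs)                                   # the two sides should agree closely
```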

Lemma 3.9. For $x > 1$ we have
$$0 \le \frac{\psi(x)}{x} - \frac{\vartheta(x)}{x} \le \frac{(\log x)^2}{2\sqrt{x}\,\log 2}.$$
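To get a feel for Definitions 3.7 and 3.8 and for Lemma 3.9, the sketch below (our own code, using a standard sieve of Eratosthenes) computes $\pi(x)$, $\vartheta(x)$, and $\psi(x)$ for integer $x$, checks the bound of Lemma 3.9, and prints the three quotients that appear in Theorem 3.11 below.

```python
import math

def primes_up_to(n):
    """Sieve of Eratosthenes: all primes p <= n (n >= 1)."""
    is_prime = bytearray([1]) * (n + 1)
    is_prime[0:2] = b"\x00\x00"
    for p in range(2, math.isqrt(n) + 1):
        if is_prime[p]:
            is_prime[p * p :: p] = bytes(len(range(p * p, n + 1, p)))
    return [i for i in range(n + 1) if is_prime[i]]

def chebyshev(x):
    """Return (pi(x), theta(x), psi(x)) for an integer x >= 2."""
    ps = primes_up_to(x)
    theta = math.fsum(math.log(p) for p in ps)
    psi = 0.0
    for p in ps:
        pk = p
        while pk <= x:                  # each prime power p^k <= x contributes log p
            psi += math.log(p)
            pk *= p
    return len(ps), theta, psi

for x in (10**3, 10**4, 10**5):
    pi_x, theta, psi = chebyshev(x)
    gap = (psi - theta) / x
    bound = math.log(x) ** 2 / (2 * math.sqrt(x) * math.log(2))   # Lemma 3.9
    print(x, pi_x * math.log(x) / x, theta / x, psi / x, 0 <= gap <= bound)
```

Of the three quotients, $\pi(x)\log x / x$ approaches $1$ noticeably more slowly than the other two.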

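Remark 3.10. By the squeeze theorem, if either quotient $\psi(x)/x$ or $\vartheta(x)/x$ has a limit as $x \to \infty$, then the other quotient has the same limit.

Theorem 3.11. The following four statements are equivalent:
(1) $\displaystyle\lim_{x\to\infty} \frac{\pi(x)\log x}{x} = 1$;
(2) $\displaystyle\lim_{x\to\infty} \frac{\vartheta(x)}{x} = 1$;
(3) $\displaystyle\lim_{x\to\infty} \frac{\psi(x)}{x} = 1$;
(4) $\displaystyle\lim_{x\to\infty} \frac{\psi(x) - x}{x} = 0$.

Remark 3.12. Throughout the rest of this paper, the function $\psi(x) - x$ is denoted by $R(x)$ and is called the remainder function.

We prove that (1) is equivalent to (2). The equivalence of (2) and (3) follows from Remark 3.10, and the equivalence of (3) and (4) is immediate. We first use a corollary of Theorem 3.5, Abel's summation formula:

Corollary 3.13. For $x \ge 2$ we have the formulas
(1) $\displaystyle\vartheta(x) = \pi(x) \log x - \int_2^x \frac{\pi(t)}{t}\,dt$;
(2) $\displaystyle\pi(x) = \frac{\vartheta(x)}{\log x} + \int_2^x \frac{\vartheta(t)}{t \log^2 t}\,dt$.

These results can be derived in a straightforward manner from the summation formula: apply it to the characteristic function of the primes with $f(t) = \log t$ for (1), and to $a(n) = \log n$ for prime $n$ (and $0$ otherwise) with $f(t) = 1/\log t$ for (2); in both cases the partial sums vanish for $t < 2$. We now use them to prove the equivalence of (1) and (2) in Theorem 3.11.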

Proof. We first assume (1) from Theorem 3.11. By Corollary 3.13 we have
$$\frac{\vartheta(x)}{x} = \frac{\pi(x)\log x}{x} - \frac{1}{x}\int_2^x \frac{\pi(t)}{t}\,dt.$$
It then suffices to show that
$$\lim_{x\to\infty} \frac{1}{x}\int_2^x \frac{\pi(t)}{t}\,dt = 0.$$
By our assumption,
$$\frac{\pi(t)}{t} = O\!\left(\frac{1}{\log t}\right)$$
for $t \ge 2$, so by Corollary 3.2,
$$\frac{1}{x}\int_2^x \frac{\pi(t)}{t}\,dt = O\!\left(\frac{1}{x}\int_2^x \frac{dt}{\log t}\right).$$
We can bound this integral by splitting it at $\sqrt{x}$:
$$\int_2^x \frac{dt}{\log t} = \int_2^{\sqrt{x}} \frac{dt}{\log t} + \int_{\sqrt{x}}^x \frac{dt}{\log t} \le \frac{\sqrt{x}}{\log 2} + \frac{x - \sqrt{x}}{\log \sqrt{x}},$$
and after dividing by $x$ this approaches $0$ as $x$ approaches infinity. So by the squeeze theorem the original (nonnegative) expression approaches $0$, and
$$\lim_{x\to\infty} \frac{\vartheta(x)}{x} = 1.$$
To prove the other direction, we now assume that
$$\lim_{x\to\infty} \frac{\vartheta(x)}{x} = 1.$$
Using the second integral formula, we have
$$\frac{\pi(x)\log x}{x} = \frac{\vartheta(x)}{x} + \frac{\log x}{x}\int_2^x \frac{\vartheta(t)}{t\log^2 t}\,dt.$$
As before, we must show that the integral expression on the right-hand side approaches $0$ as $x \to \infty$. Our assumption implies that $\vartheta(t) = O(t)$, so
$$\frac{\log x}{x}\int_2^x \frac{\vartheta(t)}{t\log^2 t}\,dt = O\!\left(\frac{\log x}{x}\int_2^x \frac{dt}{\log^2 t}\right).$$
Bounding this integral in the same manner, we have
$$\int_2^x \frac{dt}{\log^2 t} = \int_2^{\sqrt{x}} \frac{dt}{\log^2 t} + \int_{\sqrt{x}}^x \frac{dt}{\log^2 t} \le \frac{\sqrt{x}}{\log^2 2} + \frac{4(x - \sqrt{x})}{\log^2 x}.$$
After multiplying by $\log x / x$, the right-hand side tends to $0$, so again by the squeeze theorem our limit is $0$, as necessary, and the second direction is complete. ∎

We have proven the equivalence of statements (1) and (2) of Theorem 3.11, and the other equivalences follow from this, as stated before. Knowing this, we now set out to prove the prime number theorem by proving the equivalent form (4) of Theorem 3.11, that is,
$$\lim_{x\to\infty} \frac{R(x)}{x} = 0.$$

We now examine the asymptotic behavior of von Mangoldt's function and other arithmetical functions. The following identities will prove useful later in our analysis of Chebyshev's $\vartheta$ and $\psi$ functions. Before we do this, however, we introduce some results about partial sums of Dirichlet products (see Definition 2.3).

Theorem 3.14. Let $f, g$ be arithmetical functions and denote $h = f * g$.
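Let $H, F, G$ denote the partial sums of $h, f, g$ respectively. Then
$$H(x) = \sum_{n \le x} f(n)\, G\!\left(\frac{x}{n}\right) = \sum_{n \le x} g(n)\, F\!\left(\frac{x}{n}\right).$$
Taking $g = u$, so that $G(x) = \lfloor x \rfloor$ and $h = f * u$ is the divisor sum of $f$, we obtain:

Theorem 3.15. For $F(x) = \sum_{n \le x} f(n)$ we have
$$\sum_{n \le x} \sum_{d \mid n} f(d) = \sum_{n \le x} f(n) \left\lfloor \frac{x}{n} \right\rfloor = \sum_{n \le x} F\!\left(\frac{x}{n}\right).$$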
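The three expressions in Theorem 3.15 count the same pairs $(d, n)$ with $d \mid n \le x$ in different orders. The sketch below computes them independently for an arbitrary integer-valued test function (the choice of `f` is hypothetical), so the agreement is exact.

```python
import math

def f(n):                          # hypothetical integer-valued test function
    return (n % 5) - 2

def F(y):
    """Partial sum F(y) = sum_{n <= y} f(n)."""
    return sum(f(n) for n in range(1, math.floor(y) + 1))

def three_ways(x):
    """The three expressions of Theorem 3.15, each computed independently."""
    N = math.floor(x)
    divisor_sums = sum(f(d) for n in range(1, N + 1) for d in range(1, n + 1) if n % d == 0)
    weighted = sum(f(n) * math.floor(x / n) for n in range(1, N + 1))
    partial = sum(F(x / n) for n in range(1, N + 1))
    return divisor_sums, weighted, partial

print(three_ways(57.3))            # all three entries coincide
```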

We now introduce an identity about the partial sums of the harmonic series:

Theorem 3.16. For $x \ge 1$,
$$\sum_{n \le x} \frac{1}{n} = \log x + \gamma + O\!\left(\frac{1}{x}\right),$$
where $\gamma$ is a constant, hereafter called the Euler-Mascheroni constant.

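Proof. This is a consequence of Corollary 3.6, Euler's summation formula. ∎

Theorem 3.17. For $x \ge 1$,
$$\sum_{n \le x} \Lambda(n) \left\lfloor \frac{x}{n} \right\rfloor = \log \lfloor x \rfloor!.$$

Proof. By Theorem 2.10, $\log \lfloor x \rfloor! = \sum_{n \le x} \log n = \sum_{n \le x} \sum_{d \mid n} \Lambda(d)$, and the result follows from Theorem 3.15 with $f = \Lambda$. ∎

Lemma 3.18 (Legendre). For $x \ge 1$,
$$\lfloor x \rfloor! = \prod_{p \le x} p^{\alpha(p)}, \qquad \text{where} \qquad \alpha(p) = \sum_{m=1}^{\infty} \left\lfloor \frac{x}{p^m} \right\rfloor.$$

Proof. This identity is a consequence of Theorem 3.17: grouping the terms of $\sum_{n \le x} \Lambda(n) \lfloor x/n \rfloor$ by prime powers $n = p^m$ gives $\log \lfloor x \rfloor! = \sum_{p \le x} \alpha(p) \log p$. ∎

Theorem 3.19. For $x \ge 2$, we have
$$\log \lfloor x \rfloor! = x \log x - x + O(\log x).$$

Proof. We see that
$$\log \lfloor x \rfloor! = \sum_{n \le x} \log n.$$
Applying Euler's summation formula with $f(t) = \log t$ (note that $f(1) = 0$), we have
$$\sum_{n \le x} \log n = \int_1^x \log t\,dt + \int_1^x \frac{t - \lfloor t \rfloor}{t}\,dt - (x - \lfloor x \rfloor)\log x = x\log x - x + 1 + \int_1^x \frac{t - \lfloor t \rfloor}{t}\,dt + O(\log x).$$
We know that
$$\frac{t - \lfloor t \rfloor}{t} = O\!\left(\frac{1}{t}\right),$$
so
$$\int_1^x \frac{t - \lfloor t \rfloor}{t}\,dt = O\!\left(\int_1^x \frac{dt}{t}\right) = O(\log x).$$
Thus
$$\sum_{n \le x} \log n = x \log x - x + O(\log x).$$
∎

An immediate corollary follows from this:

Corollary 3.20. For $x \ge 2$,
$$\sum_{n \le x} \Lambda(n) \left\lfloor \frac{x}{n} \right\rfloor = x \log x - x + O(\log x).$$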

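Proof. This follows from Theorems 3.17 and 3.19. ∎

Corollary 3.21. For $x \ge 2$,
$$\sum_{n \le x} \Lambda(n) \left\lfloor \frac{x}{n} \right\rfloor = \sum_{n \le x} \psi\!\left(\frac{x}{n}\right) = x \log x - x + O(\log x).$$

Proof. This follows from Theorem 3.15 and the definition of the $\psi$ function as the partial sums of von Mangoldt's function. ∎

Remark 3.22. We can also use the less precise approximation
$$\sum_{n \le x} \Lambda(n) \left\lfloor \frac{x}{n} \right\rfloor = \sum_{n \le x} \psi\!\left(\frac{x}{n}\right) = x \log x + O(x),$$
as the logarithmic error term is dominated by the linear term. This approximation will be useful later on, when we discuss Shapiro's theorem. The next theorem follows as a consequence of Corollary 3.21.

Theorem 3.23. For $x \ge 2$,
$$\sum_{p \le x} \left\lfloor \frac{x}{p} \right\rfloor \log p = x \log x + O(x).$$

Proof. We know by Corollary 3.21 that
$$\sum_{n \le x} \Lambda(n) \left\lfloor \frac{x}{n} \right\rfloor = x \log x - x + O(\log x).$$
We now reindex this sum so that it is written in terms of the primes less than or equal to $x$. Since $\Lambda$ is nonzero for prime powers and $0$ otherwise, we have
$$\sum_{n \le x} \Lambda(n) \left\lfloor \frac{x}{n} \right\rfloor = \sum_{p \le x} \sum_{m=1}^{\infty} \Lambda(p^m) \left\lfloor \frac{x}{p^m} \right\rfloor.$$
The above "infinite" sum is in fact a finite sum: for sufficiently large $m$ we have $p^m > x$, and thus $\lfloor x/p^m \rfloor = 0$. By the definition of $\Lambda$,
$$\sum_{p \le x} \sum_{m=1}^{\infty} \Lambda(p^m) \left\lfloor \frac{x}{p^m} \right\rfloor = \sum_{p \le x} \sum_{m=1}^{\infty} \left\lfloor \frac{x}{p^m} \right\rfloor \log p.$$
Separating the terms with $m = 1$, we have
$$\sum_{p \le x} \sum_{m=1}^{\infty} \left\lfloor \frac{x}{p^m} \right\rfloor \log p = \sum_{p \le x} \left\lfloor \frac{x}{p} \right\rfloor \log p + \sum_{p \le x} \sum_{m=2}^{\infty} \left\lfloor \frac{x}{p^m} \right\rfloor \log p.$$
We now prove that
$$\sum_{p \le x} \sum_{m=2}^{\infty} \left\lfloor \frac{x}{p^m} \right\rfloor \log p = O(x).$$
By the definition of the floor function,
$$\sum_{p \le x} \sum_{m=2}^{\infty} \left\lfloor \frac{x}{p^m} \right\rfloor \log p \le \sum_{p \le x} \sum_{m=2}^{\infty} \frac{x}{p^m} \log p.$$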

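Summing the geometric series on the right, we have
$$\sum_{p \le x} \sum_{m=2}^{\infty} \frac{x}{p^m} \log p = x \sum_{p \le x} \frac{\log p}{p(p-1)}.$$
Moreover,
$$x \sum_{p \le x} \frac{\log p}{p(p-1)} \le x \sum_{n=2}^{\infty} \frac{\log n}{n(n-1)} = O(x),$$
as the series on the right-hand side converges by comparison. Combining these estimates with Corollary 3.21 gives
$$\sum_{p \le x} \left\lfloor \frac{x}{p} \right\rfloor \log p = x \log x - x + O(\log x) + O(x) = x \log x + O(x).$$
∎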
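Several of the formulas above can be checked numerically at once. The sketch below is our own illustration: it measures the error in Theorem 3.16 (the constant $\gamma$ is hard-coded to double precision), rebuilds $x!$ from Legendre's formula (Lemma 3.18), compares $\sum_p \alpha(p)\log p$ with $\log x!$ computed independently via `math.lgamma` (Theorem 3.17), and prints the normalized error of Theorem 3.19.

```python
import math

EULER_GAMMA = 0.5772156649015329        # Euler-Mascheroni constant to double precision

def primes_up_to(n):
    """Sieve of Eratosthenes: all primes p <= n (n >= 1)."""
    is_prime = bytearray([1]) * (n + 1)
    is_prime[0:2] = b"\x00\x00"
    for p in range(2, math.isqrt(n) + 1):
        if is_prime[p]:
            is_prime[p * p :: p] = bytes(len(range(p * p, n + 1, p)))
    return [i for i in range(n + 1) if is_prime[i]]

def harmonic_error(N):
    """Theorem 3.16: H_N - log N - gamma, which should be O(1/N)."""
    return math.fsum(1.0 / n for n in range(1, N + 1)) - math.log(N) - EULER_GAMMA

def legendre_exponent(x, p):
    """alpha(p) from Lemma 3.18: the sum over m >= 1 of floor(x / p^m)."""
    total, pm = 0, p
    while pm <= x:                       # terms with p^m > x vanish
        total += x // pm
        pm *= p
    return total

def factorial_from_legendre(x):
    """Rebuild x! as the product of p^alpha(p) over primes p <= x."""
    product = 1
    for p in primes_up_to(x):
        product *= p ** legendre_exponent(x, p)
    return product

def log_factorial_via_primes(x):
    """Theorem 3.17 grouped by prime powers: log x! = sum of alpha(p) log p."""
    return math.fsum(legendre_exponent(x, p) * math.log(p) for p in primes_up_to(x))

for x in (10, 100, 10**4):
    lf = math.lgamma(x + 1)              # log x!, computed independently
    print(x, x * harmonic_error(x),
          lf - log_factorial_via_primes(x),
          (lf - (x * math.log(x) - x)) / math.log(x))   # Theorem 3.19: stays bounded
```

The first printed column stays near $1/2$, consistent with the $O(1/x)$ error term in Theorem 3.16.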

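4. Shapiro's Theorem

We now provide a proof of Shapiro's theorem, an important result which relates sums of the form $\sum_{n \le x} a(n) \lfloor x/n \rfloor$ to the often more interesting sums $\sum_{n \le x} a(n)/n$ and $\sum_{n \le x} a(n)$. Specifically, we will use it to derive results about the behavior of the $\psi$ function and its relatives.

Theorem 4.1 (Shapiro's Tauberian theorem). Let $a(n)$ be a nonnegative sequence such that
$$\sum_{n \le x} a(n) \left\lfloor \frac{x}{n} \right\rfloor = x \log x + O(x) \quad \text{for all } x \ge 1.$$
Then the following are true:
(1) For $x \ge 1$ we have
$$\sum_{n \le x} \frac{a(n)}{n} = \log x + O(1).$$
(2) There exists a constant $M > 0$ such that
$$\sum_{n \le x} a(n) \le M x \quad \text{for all } x \ge 1.$$
(3) There exist constants $m > 0$ and $x_0 > 0$ such that
$$\sum_{n \le x} a(n) \ge m x \quad \text{for all } x \ge x_0.$$

Proof. Define functions $S, T$ by
$$S(x) = \sum_{n \le x} a(n), \qquad T(x) = \sum_{n \le x} a(n) \left\lfloor \frac{x}{n} \right\rfloor.$$
We first show that the inequality
$$S(x) - S\!\left(\frac{x}{2}\right) \le T(x) - 2T\!\left(\frac{x}{2}\right)$$
holds. We have
$$T(x) - 2T\!\left(\frac{x}{2}\right) = \sum_{n \le x} a(n) \left\lfloor \frac{x}{n} \right\rfloor - 2\sum_{n \le x/2} a(n) \left\lfloor \frac{x}{2n} \right\rfloor.$$
Splitting the first sum at $x/2$, we have
$$T(x) - 2T\!\left(\frac{x}{2}\right) = \sum_{n \le x/2} a(n) \left( \left\lfloor \frac{x}{n} \right\rfloor - 2\left\lfloor \frac{x}{2n} \right\rfloor \right) + \sum_{x/2 < n \le x} a(n) \left\lfloor \frac{x}{n} \right\rfloor.$$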

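The first sum on the right-hand side is nonnegative, because $\lfloor 2y \rfloor - 2\lfloor y \rfloor \in \{0, 1\}$ for every real $y$ (apply this with $y = x/(2n)$) and our sequence is nonnegative. Moreover, $\lfloor x/n \rfloor = 1$ whenever $x/2 < n \le x$. Thus
$$T(x) - 2T\!\left(\frac{x}{2}\right) \ge \sum_{x/2 < n \le x} a(n) = S(x) - S\!\left(\frac{x}{2}\right).$$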

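Substituting the hypothesis on $T$, we get
$$T(x) - 2T\!\left(\frac{x}{2}\right) = x \log x - x \log \frac{x}{2} + O(x) = O(x),$$
so $S(x) - S(x/2) \le Kx$ for some constant $K$ and all $x \ge 1$. Applying this with $x, x/2, x/4, \dots$ and summing, we get $S(x) \le Kx\left(1 + \tfrac12 + \tfrac14 + \cdots\right) = 2Kx$, which proves (2). For (1), write $\lfloor x/n \rfloor = x/n + O(1)$; since $a(n) \ge 0$,
$$T(x) = x \sum_{n \le x} \frac{a(n)}{n} + O(S(x)) = x \sum_{n \le x} \frac{a(n)}{n} + O(x),$$
and dividing by $x$ gives
$$\sum_{n \le x} \frac{a(n)}{n} = \log x + O(1).$$
Finally, for (3), write $\sum_{n \le x} a(n)/n = \log x + E(x)$ with $|E(x)| \le K'$. For $0 < \delta < 1$ and $\delta x \ge 1$,
$$\sum_{\delta x < n \le x} \frac{a(n)}{n} = -\log \delta + E(x) - E(\delta x) \ge -\log \delta - 2K'.$$
Choosing $\delta$ so that $-\log \delta - 2K' = 1$, and noting that the left-hand side is at most $S(x)/(\delta x)$, we get $S(x) \ge \delta x$ for all $x \ge 1/\delta$. ∎

We can now state a number of corollaries that follow immediately from this result.

Corollary 4.2. The following asymptotic formulae hold:
(1) $\psi(x) = O(x)$;
(2) $\displaystyle\sum_{n \le x} \frac{\Lambda(n)}{n} = \log x + O(1)$;
(3) $\vartheta(x) = O(x)$;
(4) $\displaystyle\sum_{p \le x} \frac{\log p}{p} = \log x + O(1)$.

All of these follow from the asymptotic formulae derived in Section 3: parts (1) and (2) by applying Theorem 4.1 to $a(n) = \Lambda(n)$ together with Remark 3.22, and parts (3) and (4) by applying it to $a(n) = \log n$ for prime $n$ and $a(n) = 0$ otherwise, together with Theorem 3.23.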
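The hypotheses and conclusions of Shapiro's theorem can be observed numerically. The sketch below is our own illustration (the names `sieve_tables`, `theorem_3_23_ratio`, and `corollary_4_2_rows` are hypothetical). It tabulates $\Lambda$ with a sieve, prints the normalized error of Theorem 3.23, and then tracks $\psi(n)/n$, $\sum_{k \le n} \Lambda(k)/k - \log n$, and $\sum_{p \le n} \log p / p - \log n$ from Corollary 4.2. All of these should stay bounded; they do not converge to zero.

```python
import math

def sieve_tables(N):
    """Return (is_prime, lam) for 0..N, where lam[n] = Lambda(n)."""
    is_prime = bytearray([1]) * (N + 1)
    is_prime[0:2] = b"\x00\x00"
    lam = [0.0] * (N + 1)
    for p in range(2, N + 1):
        if is_prime[p]:
            is_prime[p * p :: p] = bytes(len(range(p * p, N + 1, p)))
            pk = p
            while pk <= N:               # each prime power p^k gets weight log p
                lam[pk] = math.log(p)
                pk *= p
    return is_prime, lam

N = 10**4
is_prime, lam = sieve_tables(N)

def theorem_3_23_ratio(x):
    """(sum_{p<=x} floor(x/p) log p - x log x) / x for an integer 2 <= x <= N."""
    s = math.fsum((x // p) * math.log(p) for p in range(2, x + 1) if is_prime[p])
    return (s - x * math.log(x)) / x

def corollary_4_2_rows():
    """Yield (n, psi(n)/n, sum_{k<=n} Lambda(k)/k - log n, sum_{p<=n} log p/p - log n)."""
    psi = mertens_lambda = mertens_prime = 0.0
    for n in range(1, N + 1):
        psi += lam[n]
        mertens_lambda += lam[n] / n
        if is_prime[n]:
            mertens_prime += math.log(n) / n
        yield n, psi / n, mertens_lambda - math.log(n), mertens_prime - math.log(n)

for x in (10, 100, 1000, 10**4):
    print(x, theorem_3_23_ratio(x))
for row in corollary_4_2_rows():
    if row[0] in (10, 100, 1000, 10**4):
        print(row)
```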

5. Selberg's Asymptotic Formula

We now prove a major result, first derived by Selberg in 1949.

Theorem 5.1 (Selberg's asymptotic formula). For $x > 0$ we have
$$\psi(x) \log x + \sum_{n \le x} \Lambda(n)\, \psi\!\left(\frac{x}{n}\right) = 2x \log x + O(x).$$

We first prove a lemma which will help us obtain the final result.

Lemma 5.2 (Tatuzawa-Iseki identity). Let $F$ be a real-valued function defined on $\mathbb{R}^+$ and let $G$ be given by
$$G(x) = \log x \sum_{n \le x} F\!\left(\frac{x}{n}\right).$$
Then, for $x \ge 1$,
$$F(x) \log x + \sum_{n \le x} F\!\left(\frac{x}{n}\right) \Lambda(n) = \sum_{d \le x} \mu(d)\, G\!\left(\frac{x}{d}\right).$$

Proof. We first rewrite $F(x) \log x$ as a sum. Using Theorem 2.5, we have
$$F(x) \log x = \sum_{n \le x} \left\lfloor \frac{1}{n} \right\rfloor F\!\left(\frac{x}{n}\right) \log \frac{x}{n} = \sum_{n \le x} F\!\left(\frac{x}{n}\right) \log \frac{x}{n} \sum_{d \mid n} \mu(d). \tag{5.3}$$
Next, applying the Möbius inversion formula to Theorem 2.10, we have
$$\Lambda(n) = \sum_{d \mid n} \mu(d) \log \frac{n}{d}.$$

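So we have
$$\sum_{n \le x} F\!\left(\frac{x}{n}\right) \Lambda(n) = \sum_{n \le x} F\!\left(\frac{x}{n}\right) \sum_{d \mid n} \mu(d) \log \frac{n}{d}. \tag{5.4}$$
Adding (5.3) and (5.4), we obtain
$$\sum_{n \le x} F\!\left(\frac{x}{n}\right) \sum_{d \mid n} \mu(d) \left[ \log \frac{x}{n} + \log \frac{n}{d} \right],$$
which is equal to
$$\sum_{n \le x} F\!\left(\frac{x}{n}\right) \sum_{d \mid n} \mu(d) \log \frac{x}{d}.$$
So, summarizing our results so far, we have
$$F(x) \log x + \sum_{n \le x} F\!\left(\frac{x}{n}\right) \Lambda(n) = \sum_{n \le x} F\!\left(\frac{x}{n}\right) \sum_{d \mid n} \mu(d) \log \frac{x}{d}. \tag{5.5}$$
Now, taking the right-hand side and writing $n = qd$, we have
$$\sum_{n \le x} F\!\left(\frac{x}{n}\right) \sum_{d \mid n} \mu(d) \log \frac{x}{d} = \sum_{d \le x} \mu(d) \log \frac{x}{d} \sum_{q \le x/d} F\!\left(\frac{x}{qd}\right). \tag{5.6}$$
By our definition of $G$, we can rewrite the right-hand side of (5.6) as
$$\sum_{d \le x} \mu(d)\, G\!\left(\frac{x}{d}\right),$$
as needed. ∎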
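Because the Tatuzawa-Iseki identity is exact, it can be confirmed to rounding error for any test function. The sketch below uses the hypothetical choice $F(t) = \sqrt{t} + 1$ and our own trial-division helpers; it evaluates both sides of Lemma 5.2 at a few points $x$. The identity is also linear in $F$, which is what lets the proof of Theorem 5.1 apply it to a difference of two functions.

```python
import math

def mobius(n):
    """Mobius function by trial division."""
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0
            result = -result
        p += 1
    return -result if n > 1 else result

def von_mangoldt(n):
    """Lambda(n) = log p if n = p^k, else 0."""
    if n < 2:
        return 0.0
    p = 2
    while p * p <= n:
        if n % p == 0:
            while n % p == 0:
                n //= p
            return math.log(p) if n == 1 else 0.0
        p += 1
    return math.log(n)

def F(t):                                   # hypothetical test function
    return math.sqrt(t) + 1.0

def G(y):
    """G(y) = log y * sum_{n <= y} F(y / n), as in Lemma 5.2."""
    return math.log(y) * sum(F(y / n) for n in range(1, math.floor(y) + 1))

def lhs(x):
    return F(x) * math.log(x) + sum(F(x / n) * von_mangoldt(n)
                                    for n in range(1, math.floor(x) + 1))

def rhs(x):
    return sum(mobius(d) * G(x / d) for d in range(1, math.floor(x) + 1))

for x in (30, 47.3, 100):
    print(x, lhs(x), rhs(x))                # equal up to floating-point rounding
```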

We now prove 5.1 using this lemma.

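Proof. We apply Lemma 5.2 to the functions $\psi(x)$ and $x - \gamma - 1$, where $\gamma$ is the Euler-Mascheroni constant. For $\psi$ we denote the associated function by $G_\psi$; by Corollary 3.21,
$$G_\psi(x) = \log x \sum_{n \le x} \psi\!\left(\frac{x}{n}\right) = \log x\,\big(x \log x - x + O(\log x)\big) = x \log^2 x - x \log x + O(\log^2 x).$$
For $x - \gamma - 1$ we denote the associated function by $G_\gamma$, and we have
$$G_\gamma(x) = \log x \sum_{n \le x} \left( \frac{x}{n} - \gamma - 1 \right) = x \log x \sum_{n \le x} \frac{1}{n} - (\gamma + 1) \log x \sum_{n \le x} 1.$$
By Theorem 3.16,
$$x \log x \sum_{n \le x} \frac{1}{n} = x \log x \left( \log x + \gamma + O\!\left(\frac{1}{x}\right) \right).$$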

Since $\sum_{n \le x} 1 = \lfloor x \rfloor = x + O(1)$, we get
$$G_\gamma(x) = x \log x \left( \log x + \gamma + O\!\left(\frac{1}{x}\right) \right) - (\gamma + 1)\big(x + O(1)\big) \log x = x \log^2 x - x \log x + O(\log x).$$

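We see that the difference between $G_\psi$ and $G_\gamma$ is $O(\log^2 x)$; we will use the weaker estimate $O(\sqrt{x})$. Now apply Lemma 5.2 (Tatuzawa-Iseki) to $\psi$ and to $x - \gamma - 1$. For $\psi$ we have
$$\psi(x) \log x + \sum_{n \le x} \psi\!\left(\frac{x}{n}\right) \Lambda(n) = \sum_{d \le x} \mu(d)\, G_\psi\!\left(\frac{x}{d}\right), \tag{5.7}$$
and for $x - \gamma - 1$ we have
$$(x - \gamma - 1) \log x + \sum_{n \le x} \left( \frac{x}{n} - \gamma - 1 \right) \Lambda(n) = \sum_{d \le x} \mu(d)\, G_\gamma\!\left(\frac{x}{d}\right). \tag{5.8}$$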

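Subtracting the right-hand side of (5.8) from that of (5.7), we obtain the term
$$\sum_{d \le x} \mu(d) \left( G_\psi\!\left(\frac{x}{d}\right) - G_\gamma\!\left(\frac{x}{d}\right) \right).$$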

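Applying our $O(\sqrt{x})$ estimate for the difference of $G_\psi$ and $G_\gamma$ with $x/d$ as the argument, and using $|\mu(d)| \le 1$, we have
$$\sum_{d \le x} \mu(d) \left( G_\psi\!\left(\frac{x}{d}\right) - G_\gamma\!\left(\frac{x}{d}\right) \right) = O\!\left( \sum_{d \le x} \sqrt{\frac{x}{d}} \right). \tag{5.9}$$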

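Factoring out $\sqrt{x}$ and comparing the remaining sum with an integral, we see that (5.9) is equal to
$$O\!\left( \sqrt{x} \sum_{d \le x} \frac{1}{\sqrt{d}} \right) = O\!\left( \sqrt{x} \left( 1 + \int_1^x \frac{dt}{\sqrt{t}} \right) \right) = O(x). \tag{5.10}$$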

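Equivalently, applying Lemma 5.2 (Tatuzawa-Iseki) to the auxiliary function $\psi(x) - (x - \gamma - 1)$, whose associated function is $G_\psi - G_\gamma$ because the identity is linear in $F$, we have
$$[\psi(x) - (x - \gamma - 1)] \log x + \sum_{n \le x} \left[ \psi\!\left(\frac{x}{n}\right) - \left(\frac{x}{n} - \gamma - 1\right) \right] \Lambda(n) = \sum_{d \le x} \mu(d) \left( G_\psi\!\left(\frac{x}{d}\right) - G_\gamma\!\left(\frac{x}{d}\right) \right),$$
and we know from (5.10) that the right-hand side is $O(x)$, so
$$[\psi(x) - (x - \gamma - 1)] \log x + \sum_{n \le x} \left[ \psi\!\left(\frac{x}{n}\right) - \left(\frac{x}{n} - \gamma - 1\right) \right] \Lambda(n) = O(x).$$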

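Rearranging terms and simplifying, using Corollary 4.2 ($\sum_{n \le x} \Lambda(n)/n = \log x + O(1)$ and $\psi(x) = O(x)$), we have
$$\begin{aligned}
\psi(x) \log x + \sum_{n \le x} \psi\!\left(\frac{x}{n}\right) \Lambda(n) &= O(x) + (x - \gamma - 1) \log x + \sum_{n \le x} \left( \frac{x}{n} - \gamma - 1 \right) \Lambda(n) \\
&= O(x) + x \log x + O(\log x) + x \sum_{n \le x} \frac{\Lambda(n)}{n} - (\gamma + 1)\,\psi(x) \\
&= O(x) + x \log x + O(\log x) + x \log x + O(x) + O(x) \\
&= 2x \log x + O(x),
\end{aligned}$$
which is Selberg's asymptotic formula. ∎

We now provide two alternate formulations of Selberg's formula. The corollary below follows immediately from Theorem 3.5 (Abel summation) together with $\psi(t) = O(t)$.

Corollary 5.11. For $x \ge 1$,
$$\sum_{n \le x} \Lambda(n) \log n = \psi(x) \log x - \int_1^x \frac{\psi(t)}{t}\,dt = \psi(x) \log x + O(x).$$

Definition 5.12. Define
$$\Lambda_2(n) = \Lambda(n) \log n + \sum_{d \mid n} \Lambda(d)\, \Lambda\!\left(\frac{n}{d}\right) = \Lambda(n) \log n + (\Lambda * \Lambda)(n).$$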

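By Corollary 5.11 and Theorem 3.14 (with $f = g = \Lambda$), the partial sums of $\Lambda_2$ are given by
$$\sum_{n \le x} \Lambda_2(n) = \sum_{n \le x} \Lambda(n) \log n + \sum_{n \le x} (\Lambda * \Lambda)(n) = \psi(x) \log x + O(x) + \sum_{n \le x} \Lambda(n)\, \psi\!\left(\frac{x}{n}\right).$$
So an equivalent restatement of Selberg's formula is
$$\sum_{n \le x} \Lambda_2(n) = 2x \log x + O(x). \tag{5.13}$$
Since $2\sum_{n \le x} \log n = 2x \log x + O(x)$ by Theorem 3.19, moving the $2x \log x$ to the left-hand side gives
$$Q(x) = \sum_{n \le x} \Lambda_2(n) - 2\sum_{n \le x} \log n = \sum_{n \le x} \big(\Lambda_2(n) - 2 \log n\big) = O(x). \tag{5.14}$$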
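Theorem 5.1 and its reformulation (5.14) can be observed numerically. The following sketch is our own and not part of the argument. It tabulates $\Lambda$, $\psi$ at integers (so that $\psi(x/n) = \psi(\lfloor x/n \rfloor)$), and $\Lambda_2$ from Definition 5.12, then prints the normalized errors $\big(\psi(x)\log x + \sum_{n \le x} \Lambda(n)\psi(x/n) - 2x\log x\big)/x$ and $Q(x)/x$. Selberg's formula predicts only that both stay bounded, which is all the tests below assume.

```python
import math

def mangoldt_table(N):
    """lam[n] = Lambda(n) for 0 <= n <= N, built from a prime sieve."""
    lam = [0.0] * (N + 1)
    is_prime = bytearray([1]) * (N + 1)
    for p in range(2, N + 1):
        if is_prime[p]:
            is_prime[p * p :: p] = bytes(len(range(p * p, N + 1, p)))
            pk = p
            while pk <= N:
                lam[pk] = math.log(p)
                pk *= p
    return lam

N = 10**4
lam = mangoldt_table(N)
psi = [0.0] * (N + 1)                       # psi[n] = psi(n), so psi(x/n) = psi[x // n]
for n in range(1, N + 1):
    psi[n] = psi[n - 1] + lam[n]

# Lambda_2(n) = Lambda(n) log n + (Lambda * Lambda)(n)   (Definition 5.12)
lam2 = [0.0] * (N + 1)
for d in range(2, N + 1):
    if lam[d]:
        lam2[d] += lam[d] * math.log(d)
        for e in range(2, N // d + 1):      # ordered pairs (d, e) with d * e <= N
            if lam[e]:
                lam2[d * e] += lam[d] * lam[e]

def selberg_error(x):
    """(psi(x) log x + sum_{n<=x} Lambda(n) psi(x/n) - 2x log x) / x for integer 2 <= x <= N."""
    total = psi[x] * math.log(x) + math.fsum(lam[n] * psi[x // n] for n in range(2, x + 1))
    return (total - 2 * x * math.log(x)) / x

def q_over_x(x):
    """Q(x) / x with Q(x) = sum_{n<=x} (Lambda_2(n) - 2 log n), as in (5.14)."""
    return math.fsum(lam2[n] - 2 * math.log(n) for n in range(1, x + 1)) / x

for x in (10, 100, 1000, 10**4):
    print(x, selberg_error(x), q_over_x(x))   # both columns stay bounded
```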

6. Deriving the Prime Number Theorem using Selberg's Identity

We now derive the prime number theorem from Selberg's formula. We will mainly be working with the function $R(x) = \psi(x) - x$. We first introduce a lemma that restates Selberg's formula in terms of the remainder function, rather than $\psi$.

Lemma 6.1. Selberg's formula can be restated as
$$R(x) \log x + \sum_{n \le x} \Lambda(n)\, R\!\left(\frac{x}{n}\right) = O(x).$$
Indeed, by Corollary 4.2, $x \log x + \sum_{n \le x} \Lambda(n)\,\frac{x}{n} = 2x \log x + O(x)$, and subtracting this from Theorem 5.1 gives the lemma.

Unfortunately, due to the nature of $\psi(x)$, $R(x)$ is particularly temperamental (it jumps at every prime power), and so is the quotient $R(x)/x$. Thus, we will use the smoother function
$$S(x) = \int_2^x \frac{R(t)}{t}\,dt,$$
with the convention $S(x) = 0$ for $x < 2$, as our starting point. If we can show that
$$\lim_{x \to \infty} \frac{S(x)}{x} = 0,$$
then we can deduce the same result for the quotient involving $R$, which is the result we need. We now prove two properties of $S$.

Lemma 6.2. The function $S(x)$ satisfies:
(1) $S(x) = O(x)$;
(2) $S(x)$ is Lipschitz.

Proof. We first show (1). Recall that $\psi(x) = O(x)$ (Corollary 4.2), so there is a constant $C$ such that
$$\psi(x) \le Cx \quad \text{for all } x \ge 2.$$
Subtracting $x$, and using $\psi(x) \ge 0$, we have
$$-x \le R(x) = \psi(x) - x \le (C - 1)x \quad \text{for all } x \ge 2.$$
Dividing by $x$, we see that $|R(x)|/x \le M_1$ for all $x \ge 2$, where $M_1 = \max(1, C - 1)$; that is, $R(x)/x = O(1)$. We then have
$$|S(x)| \le \int_2^x \frac{|R(t)|}{t}\,dt \le \int_2^x M_1\,dt \le M_1 x,$$
so $S(x) = O(x)$, as needed. To show that $S$ is Lipschitz, note that for any $x_1, x_2 \ge 2$,
$$|S(x_1) - S(x_2)| = \left| \int_{x_2}^{x_1} \frac{R(t)}{t}\,dt \right| \le M_1 |x_1 - x_2|,$$
and since $S(2) = 0$, the same bound holds for all $x_1, x_2 > 0$ with our convention. Thus $S$ is Lipschitz with constant $M_1$, as required. ∎

Corollary 6.3. Wherever $R$ is continuous, that is, for $y$ not a prime power,
$$S'(y) = \frac{R(y)}{y} = O(1).$$
This was proven implicitly in the proof of Lemma 6.2; the restriction on $y$ guarantees continuity of the integrand. We now introduce an important corollary:

Corollary 6.4. For $y \ge 1$,
$$S(y) \log y + \sum_{n \le y} \Lambda(n)\, S\!\left(\frac{y}{n}\right) = O(y).$$

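Proof. Dividing the expression in Lemma 6.1 by $x$ and integrating both sides from $2$ to $y$, we have
$$\int_2^y \frac{R(x)}{x} \log x\,dx + \int_2^y \frac{1}{x} \sum_{n \le x} \Lambda(n)\, R\!\left(\frac{x}{n}\right) dx = O(y).$$
Integrating the first integral by parts and using Lemma 6.2, we have
$$\int_2^y \frac{R(x)}{x} \log x\,dx = S(y) \log y - \int_2^y \frac{S(x)}{x}\,dx = S(y) \log y + O(y).$$
Interchanging the sum and the integral in the second term (only $n \ge 2$ contributes, since $\Lambda(1) = 0$), we have
$$\int_2^y \frac{1}{x} \sum_{n \le x} \Lambda(n)\, R\!\left(\frac{x}{n}\right) dx = \sum_{n \le y} \Lambda(n) \int_n^y R\!\left(\frac{x}{n}\right) \frac{dx}{x}.$$
The substitution $u = x/n$ turns the inner integral into $\int_1^{y/n} R(u)\,du/u$. On $[1, 2)$ we have $\psi(u) = 0$, so $|R(u)/u| = 1$ there; hence, using $S(t) = 0$ for $t < 2$, the inner integral equals $S(y/n) + O(1)$. Since $\sum_{n \le y} \Lambda(n) = \psi(y) = O(y)$, this gives
$$\int_2^y \frac{1}{x} \sum_{n \le x} \Lambda(n)\, R\!\left(\frac{x}{n}\right) dx = \sum_{n \le y} \Lambda(n)\, S\!\left(\frac{y}{n}\right) + O(y).$$
Combining, we have
$$S(y) \log y + \sum_{n \le y} \Lambda(n)\, S\!\left(\frac{y}{n}\right) = O(y),$$
and the result follows. ∎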
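Because $\psi$ is a step function, $S(x)$ can be computed exactly for integer $x$, piece by piece. The sketch below (our own helper names) does this, and also computes $J(x) = \int_2^x R(t)\,t^{-2}\,dt$, the integral that appears later as (6.22). It prints $S(x)/x$, which Lemma 6.2 bounds and which the rest of the section shows tends to $0$, together with $J(x)$, which should stay bounded.

```python
import math

def psi_table(N):
    """psi[k] = psi(k) for 0 <= k <= N, using a sieve for Lambda."""
    lam = [0.0] * (N + 1)
    is_prime = bytearray([1]) * (N + 1)
    for p in range(2, N + 1):
        if is_prime[p]:
            is_prime[p * p :: p] = bytes(len(range(p * p, N + 1, p)))
            pk = p
            while pk <= N:
                lam[pk] = math.log(p)
                pk *= p
    psi = [0.0] * (N + 1)
    for k in range(1, N + 1):
        psi[k] = psi[k - 1] + lam[k]
    return psi

def S_and_J(x, psi):
    """Return S(x) = int_2^x R(t)/t dt and J(x) = int_2^x R(t)/t^2 dt for integer 2 <= x <= N.

    psi(t) equals psi[k] on [k, k+1), so each piece is integrated in closed form;
    the '-t' part of R(t) contributes -(x - 2) to S and -log(x/2) to J."""
    S = math.fsum(psi[k] * math.log((k + 1) / k) for k in range(2, x)) - (x - 2)
    J = math.fsum(psi[k] * (1 / k - 1 / (k + 1)) for k in range(2, x)) - math.log(x / 2)
    return S, J

psi = psi_table(10**4)
for x in (10, 100, 1000, 10**4):
    S, J = S_and_J(x, psi)
    print(x, S / x, J)        # S(x)/x should be small for large x; J(x) stays bounded
```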

Lemma 6.5. There exists a constant $Z_1$ such that
$$|S(y)| \log^2 y \le \sum_{m \le y} \Lambda_2(m) \left| S\!\left(\frac{y}{m}\right) \right| + Z_1 y \log y.$$

Proof. We begin with Corollary 6.4:
$$S(y) \log y + \sum_{n \le y} \Lambda(n)\, S\!\left(\frac{y}{n}\right) = O(y).$$
Replacing $y$ by $y/k$ for a positive integer $k \le y$, we have
$$S\!\left(\frac{y}{k}\right) \log \frac{y}{k} + \sum_{n \le y/k} \Lambda(n)\, S\!\left(\frac{y}{kn}\right) = O\!\left(\frac{y}{k}\right).$$
Multiplying both sides by $\Lambda(k)$ and summing over $1 \le k \le y$, we have
$$\sum_{k \le y} \Lambda(k)\, S\!\left(\frac{y}{k}\right) \log \frac{y}{k} + \sum_{k \le y} \sum_{n \le y/k} \Lambda(k) \Lambda(n)\, S\!\left(\frac{y}{kn}\right) = O\!\left( y \sum_{k \le y} \frac{\Lambda(k)}{k} \right) = O(y \log y),$$
where the last step uses Corollary 4.2. We now simplify the left-hand side. First, we split the logarithm as $\log(y/k) = \log y - \log k$. Next, in the double sum we set $m = kn$; because $k \le y$ and $n \le y/k$, we have $m \le y$, and grouping the terms with $kn = m$ gives
$$\sum_{k \le y} \sum_{n \le y/k} \Lambda(k) \Lambda(n)\, S\!\left(\frac{y}{kn}\right) = \sum_{m \le y} S\!\left(\frac{y}{m}\right) \sum_{kn = m} \Lambda(k) \Lambda(n) = \sum_{m \le y} (\Lambda * \Lambda)(m)\, S\!\left(\frac{y}{m}\right).$$
Renaming $k$ as $m$ in the remaining sums (the symbol changes but the value does not), we therefore have
$$\log y \sum_{m \le y} \Lambda(m)\, S\!\left(\frac{y}{m}\right) - \sum_{m \le y} \big( \Lambda(m) \log m - (\Lambda * \Lambda)(m) \big)\, S\!\left(\frac{y}{m}\right) = O(y \log y).$$
By Corollary 6.4, the first term is
$$\log y \,\big( O(y) - S(y) \log y \big).$$
So, in summary, we have
$$S(y) \log^2 y = -\sum_{m \le y} \big( \Lambda(m) \log m - (\Lambda * \Lambda)(m) \big)\, S\!\left(\frac{y}{m}\right) + O(y \log y).$$
Since $\Lambda(m) \log m$ and $(\Lambda * \Lambda)(m)$ are both nonnegative, $|\Lambda(m) \log m - (\Lambda * \Lambda)(m)| \le \Lambda_2(m)$. Taking absolute values and applying the triangle inequality gives the result. ∎

Because (5.14) at the end of Section 5 shows that $\sum_{m \le x} \Lambda_2(m) = \sum_{m \le x} 2 \log m + O(x)$, our next lemma shows that we can use the weights $2 \log m$ in place of $\Lambda_2(m)$, making our sum easier to work with.

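Lemma 6.6. There exists a constant $Z_2$ such that
$$\log^2 y\, |S(y)| \le 2 \sum_{m \le y} \left| S\!\left(\frac{y}{m}\right) \right| \log m + Z_2 y \log y. \tag{6.7}$$

Proof. We define a second remainder function $C(y)$ by
$$C(y) = \sum_{m \le y} \left| S\!\left(\frac{y}{m}\right) \right| \big( \Lambda_2(m) - 2 \log m \big), \tag{6.8}$$
so that, by Lemma 6.5, it suffices to show that $C(y) = O(y \log y)$. Recall the function $Q$ defined by
$$Q(y) = \sum_{m \le y} \big( \Lambda_2(m) - 2 \log m \big). \tag{6.9}$$
We see that
$$\Lambda_2(m) - 2 \log m = Q(m) - Q(m-1).$$
Substituting, we have
$$C(y) = \sum_{m \le y} \left| S\!\left(\frac{y}{m}\right) \right| \big( Q(m) - Q(m-1) \big).$$
Summing by parts (this becomes clear when the terms are written out), and noting that $Q(0) = Q(1) = 0$ and that $S\big(y/(\lfloor y \rfloor + 1)\big) = 0$ because $y/(\lfloor y \rfloor + 1) < 1$, we can rewrite the sum as
$$C(y) = \sum_{2 \le m \le y} \left( \left| S\!\left(\frac{y}{m}\right) \right| - \left| S\!\left(\frac{y}{m+1}\right) \right| \right) Q(m). \tag{6.10}$$
Applying the reverse triangle inequality $\big||a| - |b|\big| \le |a - b|$, we have
$$|C(y)| \le \sum_{2 \le m \le y} \left| S\!\left(\frac{y}{m}\right) - S\!\left(\frac{y}{m+1}\right) \right| |Q(m)|.$$
Because $S$ is Lipschitz with constant $M_1$, we have
$$\sum_{2 \le m \le y} \left| S\!\left(\frac{y}{m}\right) - S\!\left(\frac{y}{m+1}\right) \right| |Q(m)| \le M_1 \sum_{2 \le m \le y} \left( \frac{y}{m} - \frac{y}{m+1} \right) |Q(m)|. \tag{6.11}$$
Moreover, by (5.14) there is a constant $Z$ with $|Q(m)| \le Z m$. So, setting $Z_3 = M_1 Z$ and substituting into (6.11), we have
$$|C(y)| \le Z_3 \sum_{2 \le m \le y} m \left( \frac{y}{m} - \frac{y}{m+1} \right).$$
Simplifying the right-hand side with $m\left(\frac{y}{m} - \frac{y}{m+1}\right) = \frac{y}{m+1}$, we have
$$|C(y)| \le Z_3 y \sum_{2 \le m \le y} \frac{1}{m+1} \le Z_3 y \int_1^y \frac{dt}{t} = Z_3 y \log y.$$
Combining the above with Lemma 6.5 proves the lemma with $Z_2 = Z_1 + Z_3$. ∎

We now further simplify this inequality by replacing the sum with an integral.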

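Lemma 6.12. There is a constant $Z_4$ such that
$$\log^2 y\, |S(y)| \le 2 \int_2^y \left| S\!\left(\frac{y}{u}\right) \right| \log u\,du + Z_4 y \log y.$$

Proof. Since $\log$ is an increasing function, for each integer $m \ge 1$ we have
$$\log m \left| S\!\left(\frac{y}{m}\right) \right| \le \int_m^{m+1} \left| S\!\left(\frac{y}{m}\right) \right| \log u\,du.$$
By the triangle inequality, for $u \in [m, m+1]$,
$$\left| S\!\left(\frac{y}{m}\right) \right| \le \left| S\!\left(\frac{y}{u}\right) \right| + \left| S\!\left(\frac{y}{m}\right) - S\!\left(\frac{y}{u}\right) \right|.$$
Integrating, we have
$$\int_m^{m+1} \left| S\!\left(\frac{y}{m}\right) \right| \log u\,du \le \int_m^{m+1} \left| S\!\left(\frac{y}{u}\right) \right| \log u\,du + J_m, \qquad J_m = \int_m^{m+1} \left| S\!\left(\frac{y}{m}\right) - S\!\left(\frac{y}{u}\right) \right| \log u\,du.$$
Using the Lipschitz property of $S$ with the constant $M_1$ from Lemma 6.2, we have
$$J_m \le M_1 \int_m^{m+1} \left( \frac{y}{m} - \frac{y}{u} \right) \log u\,du \le M_1 \left( \frac{y}{m} - \frac{y}{m+1} \right) \int_m^{m+1} \log u\,du \le M_1 y\, \frac{\log(m+1)}{m(m+1)}.$$
Because $\log(m+1) \le m$, we have
$$J_m \le \frac{M_1 y}{m+1}.$$
Returning to our original expression, we see that
$$\log m \left| S\!\left(\frac{y}{m}\right) \right| \le \int_m^{m+1} \left| S\!\left(\frac{y}{u}\right) \right| \log u\,du + \frac{M_1 y}{m+1}. \tag{6.13}$$
Summing (6.13) over $2 \le m \le y$ (the term $m = 1$ in (6.7) vanishes since $\log 1 = 0$), and noting that $S(y/u) = 0$ for $u > y/2$, we obtain
$$2 \sum_{m \le y} \left| S\!\left(\frac{y}{m}\right) \right| \log m \le 2 \int_2^y \left| S\!\left(\frac{y}{u}\right) \right| \log u\,du + 2 M_1 y \log y.$$
Applying this to (6.7) with $Z_4 = Z_2 + 2M_1$, we have the desired result. ∎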

The inequality of Lemma 6.12 assumes a simpler form after a change of variables. Letting $x = \log y$ and $v = \log(y/u)$, we can rewrite Lemma 6.12 as
$$x^2 |S(e^x)| \le 2 \int_0^{x - \log 2} |S(e^v)|\,(x - v)\, e^{x - v}\,dv + Z_4 x e^x. \tag{6.14}$$
Performing another change of variables by defining the function $W(x) = e^{-x} S(e^x)$, and multiplying (6.14) by $e^{-x}$, we have
$$x^2 |W(x)| \le 2 \int_0^{x - \log 2} |W(v)|\,(x - v)\,dv + Z_4 x. \tag{6.15}$$
Dividing by $x^2$, and extending the range of integration to $[0, x]$ (the integrand is nonnegative), we obtain
$$|W(x)| \le \frac{2}{x^2} \int_0^x |W(v)|\,(x - v)\,dv + \frac{Z_4}{x}. \tag{6.16}$$
The transformations above were done to yield a function $W$ which is essentially dominated by a weighted average of itself. We now prove two lemmas about $|W(x)|$.

Lemma 6.17.
$$\alpha := \limsup_{x \to \infty} |W(x)| \le 1.$$

Proof. By Theorem 5.1, since $\sum_{n \le x} \Lambda(n)\psi(x/n) \ge 0$, we have $\psi(x) \log x \le 2x \log x + O(x)$, so $\limsup_{x\to\infty} \psi(x)/x \le 2$. Since also $\psi(x) \ge 0$, it follows that
$$\limsup_{x \to \infty} \frac{|R(x)|}{x} \le 1.$$
By the definition of $S$, it follows that
$$\limsup_{y \to \infty} \frac{|S(y)|}{y} \le 1.$$

It is clear from this that $\limsup_{x\to\infty} |W(x)| \le 1$. ∎

Lemma 6.18. Let
$$\beta := \limsup_{x \to \infty} \frac{1}{x} \int_0^x |W(t)|\,dt.$$
Then $\alpha \le \beta$.

This key result will be proven using Lemmas 6.5, 6.6, and 6.12, through inequality (6.16).

Proof. Recall that by (6.16),
$$|W(x)| \le \frac{2}{x^2} \int_0^x |W(v)|\,(x - v)\,dv + \frac{Z_4}{x}.$$
We first rewrite the integral on the right-hand side as an iterated integral. We see that (6.16) can be rewritten as
$$|W(x)| \le \frac{2}{x^2} \int_0^x u \left( \frac{1}{u} \int_0^u |W(v)|\,dv \right) du + \frac{Z_4}{x}, \tag{6.19}$$
which can be verified by reversing the order of integration. Note that
$$\frac{2}{x^2} \int_0^x u\,du = 1,$$
so the first term on the right-hand side is a weighted average of the averages
$$\frac{1}{u} \int_0^u |W(v)|\,dv = \frac{1}{u} \int_0^u |e^{-v} S(e^v)|\,dv.$$

Since $|S(y)| \le M_1 y$ by Lemma 6.2, we have $|W(v)| \le M_1$, and hence
$$\frac{1}{u} \int_0^u |W(v)|\,dv \le M_1.$$
Thus, if we fix some $x_1$ and take any $x > x_1$, we have
$$I(x) := \frac{2}{x^2} \int_0^x u \left( \frac{1}{u} \int_0^u |W(v)|\,dv \right) du \le \frac{2M_1}{x^2} \int_0^{x_1} u\,du + \frac{2}{x^2} \int_{x_1}^x u \left( \frac{1}{u} \int_0^u |W(v)|\,dv \right) du.$$

Here we have split the integral at $x_1$. Given an arbitrary $\varepsilon > 0$, if we choose $x_1$ sufficiently large, then by the definition of $\beta$ (as a limit superior) we have
$$\frac{1}{u} \int_0^u |W(v)|\,dv \le \beta + \varepsilon$$
for all $u \ge x_1$. Substituting into our inequality for $I(x)$, we have
$$I(x) \le \frac{M_1 x_1^2}{x^2} + (\beta + \varepsilon)\left(1 - \frac{x_1^2}{x^2}\right).$$
Thus, for large $x$, (6.19) gives
$$|W(x)| \le \frac{M_1 x_1^2}{x^2} + (\beta + \varepsilon)\left(1 - \frac{x_1^2}{x^2}\right) + \frac{Z_4}{x}.$$
Letting $x \to \infty$, we obtain $\alpha \le \beta + \varepsilon$, and since $\varepsilon$ is arbitrary, $\alpha \le \beta$. ∎

We seek to show that $\alpha = 0$. To do this, we will use two more facts about $W$, proving them along the way.
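The quantities $\alpha$ and $\beta$ concern $|W|$ and its running average. The following sketch is our own illustration and has no bearing on the proof, since finite data cannot determine a limit superior. It evaluates $W(x) = e^{-x}S(e^x)$ exactly from the step function $\psi$ for $e^x \le 10^4$, and approximates $\frac{1}{X}\int_0^X |W(t)|\,dt$ with the trapezoid rule. The grid size `steps` is an arbitrary choice.

```python
import math

N = 10**4
lam = [0.0] * (N + 1)
is_prime = bytearray([1]) * (N + 1)
for p in range(2, N + 1):                   # sieve for Lambda
    if is_prime[p]:
        is_prime[p * p :: p] = bytes(len(range(p * p, N + 1, p)))
        pk = p
        while pk <= N:
            lam[pk] = math.log(p)
            pk *= p
psi = [0.0] * (N + 1)
for n in range(1, N + 1):
    psi[n] = psi[n - 1] + lam[n]
cum = [0.0] * (N + 1)                       # cum[k] = int_2^k psi(t)/t dt for integers k >= 2
for k in range(2, N):
    cum[k + 1] = cum[k] + psi[k] * math.log((k + 1) / k)

def S(y):
    """S(y) = int_2^y R(t)/t dt for 0 < y <= N, with S(y) = 0 for y < 2."""
    if y < 2:
        return 0.0
    k = min(math.floor(y), N)
    return cum[k] + psi[k] * math.log(y / k) - (y - 2)

def W(x):
    """W(x) = e^{-x} S(e^x)."""
    return math.exp(-x) * S(math.exp(x))

def running_average_abs_W(X, steps=20000):
    """Trapezoid-rule approximation of (1/X) * int_0^X |W(t)| dt."""
    h = X / steps
    vals = [abs(W(i * h)) for i in range(steps + 1)]
    return h * (sum(vals) - (vals[0] + vals[-1]) / 2) / X

print(W(math.log(100)), W(math.log(N)), running_average_abs_W(math.log(N)))
```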

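Lemma 6.20. Let $k = 2M_1$. Then, for all $x_1, x_2$,
$$|W(x_1) - W(x_2)| \le k|x_1 - x_2|.$$

Proof. Wherever $S'(e^x)$ exists, we see by definition that
$$|W'(x)| = \left| -e^{-x} S(e^x) + S'(e^x) \right| \le e^{-x} |S(e^x)| + |S'(e^x)| \le M_1 + M_1 = 2M_1,$$
using $|S(y)| \le M_1 y$ and Corollary 6.3. Since $S$ is Lipschitz, $W$ is absolutely continuous on bounded intervals, so $W(x_1) - W(x_2) = \int_{x_2}^{x_1} W'(t)\,dt$, and the Lipschitz condition follows as in Lemma 6.2. ∎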

Lemma 6.21. There exists a constant $M_2$ such that, whenever $W(v) \neq 0$ for $v_1 < v < v_2$,
$$\int_{v_1}^{v_2} |W(v)|\,dv \le M_2.$$

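Proof. Applying Abel summation to Corollary 4.2 (2), and using $\psi(x) = O(x)$, we see that
$$\int_2^x \frac{\psi(t)}{t^2}\,dt = \log x + O(1),$$
or, in terms of the remainder function,
$$\int_2^x \frac{R(t)}{t^2}\,dt = O(1). \tag{6.22}$$
Now consider $S(y)/y^2$. Writing $S$ as an integral and reversing the order of integration, we have
$$\int_2^x \frac{S(y)}{y^2}\,dy = \int_2^x \frac{1}{y^2} \int_2^y \frac{R(t)}{t}\,dt\,dy = \int_2^x \frac{R(t)}{t} \int_t^x \frac{dy}{y^2}\,dt.$$
This simplifies to
$$\int_2^x \frac{R(t)}{t^2}\,dt - \frac{1}{x} \int_2^x \frac{R(t)}{t}\,dt = \int_2^x \frac{R(t)}{t^2}\,dt - \frac{S(x)}{x}.$$
Using Lemma 6.2 and (6.22), it follows that $\int_2^x S(y)/y^2\,dy = O(1)$. Performing the change of variables $y = e^u$, $x = e^v$, we have
$$\int_{\log 2}^{v} W(u)\,du = O(1),$$
and since $W(u) = 0$ for $u < \log 2$, this integral is bounded in absolute value by some constant $M_2/2$ for every $v$. Taking $v = v_1$ and $v = v_2$ and subtracting, we see that $\left| \int_{v_1}^{v_2} W(u)\,du \right| \le M_2$. Finally, if $W$ has no zero in $(v_1, v_2)$, then by continuity it has constant sign there, so
$$\int_{v_1}^{v_2} |W(u)|\,du = \left| \int_{v_1}^{v_2} W(u)\,du \right| \le M_2.$$
∎

Lemma 6.23. A function $W(x)$ satisfying Lemmas 6.17, 6.18, 6.20 and 6.21 must have $\alpha = 0$.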

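Proof. Take $\omega > \alpha$. Then, by the definition of $\alpha$, there exists $x_\omega$ such that for all $x \ge x_\omega$,
$$|W(x)| \le \omega. \tag{6.24}$$
If $W(x) \neq 0$ for all sufficiently large $x$, say for $x > x_0$, then by Lemma 6.21 we have $\int_{x_0}^x |W(v)|\,dv \le M_2$ for every $x$, so $\beta = 0$, and thus $\alpha = 0$ by Lemma 6.18. So suppose that $W$ has arbitrarily large zeros, and let $a < b$ be consecutive zeros of $W$ with $a > x_\omega$. We now have three cases.

(1) $b - a \ge \dfrac{2M_2}{\omega}$. By Lemma 6.21, as $W(x) \neq 0$ for $a < x < b$, we have
$$\int_a^b |W(u)|\,du \le M_2 \le \frac{1}{2}(b - a)\omega.$$
Thus, the average of $|W|$ on $(a, b)$ is at most $\frac{1}{2}\omega$.

(2) $b - a \le \dfrac{2\omega}{k}$. Here it follows from Lemma 6.20 (the Lipschitz property of $W$), together with $W(a) = W(b) = 0$, that $|W(x)| \le k \min(x - a, b - x)$. So even if $|W|$ increases as rapidly as possible away from $a$ and $b$, its graph lies below a triangle with base $b - a$ and height $\frac{k(b-a)}{2} \le \omega$, and thus
$$\int_a^b |W(x)|\,dx \le \frac{k(b-a)^2}{4} \le \frac{1}{2}(b - a)\omega.$$

(3) $\dfrac{2\omega}{k} < b - a < \dfrac{2M_2}{\omega}$. Within distance $\omega/k$ of each endpoint we use the same reasoning as in case (2); elsewhere we use (6.24). This gives
$$\int_a^b |W(x)|\,dx \le \frac{\omega^2}{k} + \omega\left(b - a - \frac{2\omega}{k}\right) = (b - a)\omega\left(1 - \frac{\omega}{k(b - a)}\right) \le (b - a)\omega\left(1 - \frac{\omega^2}{2M_2 k}\right).$$

Enlarging $M_2$ if necessary, we may assume that $M_2 k \ge 1$. Since $\alpha < \omega$ and $\alpha \le 1$, we then have $\frac{1}{2} \le 1 - \frac{\alpha^2}{2M_2 k}$ and $1 - \frac{\omega^2}{2M_2 k} < 1 - \frac{\alpha^2}{2M_2 k}$, so in all three cases
$$\int_a^b |W(x)|\,dx \le (b - a)\omega\left(1 - \frac{\alpha^2}{2M_2 k}\right). \tag{6.25}$$

Let $x_1$ be the first zero of $W$ to the right of $x_\omega$ and, for $y > x_1$, let $x$ be the largest zero of $W$ not exceeding $y$. Applying (6.25) between consecutive zeros from $x_1$ to $x$, and Lemma 6.21 on $(x, y)$, we obtain
$$\int_0^y |W(t)|\,dt \le \int_0^{x_1} |W(t)|\,dt + (x - x_1)\omega\left(1 - \frac{\alpha^2}{2M_2 k}\right) + M_2.$$
Dividing by $y$ and noting that $x \le y$, we have
$$\frac{1}{y} \int_0^y |W(t)|\,dt \le \frac{1}{y} \int_0^{x_1} |W(t)|\,dt + \omega\left(1 - \frac{\alpha^2}{2M_2 k}\right) + \frac{M_2}{y}.$$
Letting $y \to \infty$, we see that
$$\beta \le \omega\left(1 - \frac{\alpha^2}{2M_2 k}\right),$$
and because $\alpha \le \beta$ by Lemma 6.18,
$$\alpha \le \omega\left(1 - \frac{\alpha^2}{2M_2 k}\right).$$
Since this inequality holds for every $\omega > \alpha$, letting $\omega \to \alpha$ gives $\alpha \le \alpha\left(1 - \frac{\alpha^2}{2M_2 k}\right)$. Thus $\alpha^3 \le 0$, and since $\alpha \ge 0$ it follows that $\alpha = 0$. ∎

It follows that
$$\lim_{y \to \infty} \frac{S(y)}{y} = 0.$$
Thus, for any given $0 < \varepsilon < 1$ we have, for large $y$,
$$|S(y)| \le \frac{1}{3}\varepsilon^2 y.$$
Hence
$$S(y(1 + \varepsilon)) - S(y) \le \frac{1}{3}\varepsilon^2 \big(y(1 + \varepsilon) + y\big) < \varepsilon^2 y.$$
Expanding $S$, we have
$$\int_y^{y(1+\varepsilon)} \frac{R(t)}{t}\,dt \le \varepsilon^2 y.$$
By the definition of $R$, and because $\psi$ is nondecreasing, $R(t)/t \ge \psi(y)/(y(1+\varepsilon)) - 1$ on this interval, so
$$\frac{\psi(y)}{y(1 + \varepsilon)} \int_y^{y(1+\varepsilon)} dt - \int_y^{y(1+\varepsilon)} dt \le \varepsilon^2 y.$$
Hence $\psi(y)/y \le (1 + \varepsilon)^2$. Similarly, the inequality $S(y) - S(y(1 - \varepsilon)) \ge -\varepsilon^2 y$ for large $y$ leads to $\psi(y)/y \ge (1 - \varepsilon)^2$. Because $\varepsilon$ is arbitrary, it follows that
$$\lim_{x \to \infty} \frac{\psi(x)}{x} = 1,$$
which, by Theorem 3.11, proves Theorem 1.2. ∎

Acknowledgments. It is a pleasure to thank my mentor, Karen Butt, for her assistance in writing this paper, as well as for having the confidence in me to write on a topic that most would consider ambitious. I would also like to thank the program director, Peter May, for organizing this research experience.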

References
[1] Norman Levinson. A Motivated Account of the Elementary Proof of the Prime Number Theorem. https://www.jstor.org/stable/2316361
[2] Tom M. Apostol. Introduction to Analytic Number Theory. http://plouffe.fr/simon/math/IntrodAnalyticNTApostol.pdf