ELEMENTARY PROOF OF DIRICHLET THEOREM

ZIJIAN WANG

Abstract. In this expository paper, we present the Dirichlet Theorem on primes in arithmetic progressions along with an elementary proof. We first show that $L(1, \chi) \neq 0$ for all non-principal characters and then use the basic properties of Dirichlet series and Dirichlet characters to complete our proof of the Dirichlet Theorem.

Contents
1. Introduction
2. Dirichlet Characters
3. Dirichlet series
4. The non-vanishing property of $L(1, \chi)$
5. Elementary proof of the Dirichlet Theorem
Acknowledgments
References

1. Introduction

It is undeniable that complex analysis is an extremely efficient tool in the field of number theory. However, mathematicians never stop pursuing so-called "elementary proofs" of theorems which can be proved using complex analysis, since these elementary proofs can provide new levels of insight and further reveal the beauty of number theory. The famed analyst G. H. Hardy made a strong claim about the prime number theorem: he claimed that an elementary proof could not exist. Hardy believed that the proof of the prime number theorem used complex analysis (in the form of a contour integral) in an indispensable way. However, in 1948, Atle Selberg and Paul Erdős both presented elementary proofs of the prime number theorem. Unfortunately, this set off a major controversy over whose name should be attached to the proof, further underscoring the significance of an elementary proof of the prime number theorem. The beauty of that proof lies in Selberg's key formula
\[ \sum_{p \le x} (\log p)^2 + \sum_{pq \le x} (\log p)(\log q) = 2x \log x + O(x). \]
Similarly, the brilliance of Dirichlet's proof of his theorem on primes in arithmetic progressions lies in his creation of Dirichlet characters, which are used to detect primes in arithmetic progressions.

In this expository paper, we follow Dirichlet's original proof except for his argument for the nonvanishing of $L(1, \chi)$. His original proof of the nonvanishing uses his class number formula. Here we provide a more analytic proof of this result, following Iwaniec and Kowalski. The only algebraic ingredient in our proof is the orthogonality relations derived from Schur's lemma.

Date: August 28, 2017.

2. Dirichlet Characters

Dirichlet characters are functions that act as the coefficients of a Dirichlet series. They not only provide us with the language to describe these series but, with the help of the Schur orthogonality relations, also turn out to be a useful tool. Instead of providing a formal proof of the orthogonality relations, we introduce a few of their applications which are relevant to our discussion of Dirichlet characters.

Definition 2.1. Given a finite abelian group $G$, a character is a homomorphism $\chi : G \to \mathbb{C}^*$.

Example 2.2. Given any cyclic group $G$ of size $k$ with generator $g$, we always have the character $\chi$ defined by $\chi(g^t) = e^{2\pi i t / k}$. In fact, any character defined on this cyclic group must be of the form $\chi(g^t) = e^{2\pi i l t / k}$ for some integer $l$, since a character is a homomorphism and therefore $\chi(g)^k = \chi(g^k) = 1$.

Example 2.3. The principal character $\chi_0$ is the trivial homomorphism such that
\[ \chi_0(t) = 1, \quad \forall\, t \in G. \]

Remark 2.4. We can define at least one character for any group, namely the principal character. However, there is another type of "characters" that is extremely useful to our discussion. We now introduce the Dirichlet characters.

Definition 2.5 (Dirichlet characters). A Dirichlet character $\chi$ is a function from $\mathbb{N}$ to $\mathbb{C}$ that satisfies the following conditions:
1. There exists a positive integer $k$ such that $\chi(n) = \chi(n + k)$ for all $n$.
2. If $(n, k) > 1$ then $\chi(n) = 0$; if $(n, k) = 1$ then $\chi(n) \neq 0$.^1
3. $\chi(mn) = \chi(m)\chi(n)$ for all $m$ and $n$.

Remark 2.6. Dirichlet characters are not actually characters in the sense of Definition 2.1. However, they are deeply related to the characters on certain groups. For every period $k$, we have a canonical "trivial character mod $k$" defined by
\[ \chi(a) = \begin{cases} 1 & (a, k) = 1, \\ 0 & \text{otherwise.} \end{cases} \]
We call this the principal character of period $k$.^2 Moreover, one may notice that every Dirichlet character $\chi$ is built upon a character $\chi'$ on the group $(\mathbb{Z}/k\mathbb{Z})^*$ for some integer $k$ by the following construction:^3
\[ \chi(a) = \begin{cases} \chi'(b) & (a, k) = 1,\ a \equiv b \bmod k, \\ 0 & \text{otherwise.} \end{cases} \]
Since Dirichlet characters are periodic, we usually think of them as characters on $(\mathbb{Z}/k\mathbb{Z})^*$ in order to use the algebraic tools.
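The construction in Remark 2.6 can be made concrete with a short Python sketch. It builds the Dirichlet characters of a prime period $p$ from the characters of the cyclic group $(\mathbb{Z}/p\mathbb{Z})^*$ given in Example 2.2 and extends them by zero. The function name `dirichlet_characters`, the restriction to prime periods, and the sample values $p = 7$, $g = 3$ are assumptions made for illustration only; they do not come from the paper.

```python
import cmath

def dirichlet_characters(p, g):
    """Return the p - 1 Dirichlet characters of prime period p as functions.

    Assumption of this sketch: p is prime and g generates (Z/pZ)^*.
    """
    # Discrete logarithm table: g^t mod p -> t.
    dlog = {pow(g, t, p): t for t in range(p - 1)}
    if len(dlog) != p - 1:
        raise ValueError("g does not generate (Z/pZ)^*")

    def make(index):
        def chi(n):
            b = n % p                      # remainder of n divided by p
            if b == 0:                     # (n, p) > 1, so chi(n) = 0
                return 0j
            angle = 2 * cmath.pi * index * dlog[b] / (p - 1)
            return cmath.exp(1j * angle)   # value of the group character
        return chi

    return [make(index) for index in range(p - 1)]

# Hypothetical example: the six characters of period 7, using the generator 3.
characters = dirichlet_characters(7, 3)
```

With this indexing, `characters[0]` is the principal character of period 7, and each returned function can be checked directly against the three conditions of Definition 2.5.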

^1 $(a, b)$ is the greatest common divisor of the integers $a$ and $b$.
^2 Notice that here the principal character is not a character. It is just a periodic function.
^3 $b$ is the remainder of $a$ divided by $k$.

Definition 2.7. $\bar\chi$ is the conjugate of $\chi$ if for all $g \in G$, $\bar\chi(g) = \overline{\chi(g)}$.

Remark 2.8. The conjugate of a character is also a character since complex conjugation is a field automorphism.

Definition 2.9. The dual group $\hat{G}$ is the set of all the characters of $G$. It is a group under the following group laws:
a. $\chi_0$ is the identity.
b. $\forall g \in G$, $(\chi_1\chi_2)(g) := \chi_1(g)\chi_2(g)$.
c. $\forall g \in G$, $\chi^{-1}(g) := (\chi(g))^{-1}$.

Example 2.10. If $G$ is cyclic, then $G \simeq \hat{G}$. This is a direct result of Example 2.2.

The orthogonality relations are among the most useful algebraic gadgets when we are dealing with Dirichlet characters. We start with the original statement of the Schur orthogonality relations, which works for any finite group.

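Theorem 2.11 (Schur orthogonality relations). Given a group $G$, elements $g, h \in G$ with $C_G^g$ the centralizer of $g$, and $\chi_i, \chi_j, \chi \in \hat{G}$, we have the following:
a. \[ \sum_{g \in G} \chi_i(g)\overline{\chi_j(g)} = \begin{cases} |G| & i = j, \\ 0 & i \neq j. \end{cases} \]
b. \[ \sum_{\chi \in \hat{G}} \chi(g)\overline{\chi(h)} = \begin{cases} |C_G^g| & g \text{ and } h \text{ are conjugate,} \\ 0 & \text{otherwise.} \end{cases} \]

In this paper, $G$ is considered to be abelian, so we can simply replace $|C_G^g|$ with $|G|$. We refine the orthogonality relations by setting $j$ to be $0$ and $h$ to be the identity.^4

Lemma 2.12. Given a group $G$ and its dual $\hat{G}$, we have the following:
a. \[ \sum_{x \in G} \chi(x) = \begin{cases} |G| & \chi = \chi_0, \\ 0 & \text{otherwise.} \end{cases} \]
b. \[ \sum_{\chi \in \hat{G}} \chi(x) = \begin{cases} |\hat{G}| & x = 1, \\ 0 & \text{otherwise.} \end{cases} \]

Note that $|(\mathbb{Z}/k\mathbb{Z})^*| = |\widehat{(\mathbb{Z}/k\mathbb{Z})^*}| = \varphi(k)$. Now we consider $(\mathbb{Z}/k\mathbb{Z})^*$ and the corresponding Dirichlet characters. We have the following lemma.

Lemma 2.13. For any character $\chi$ on $G = (\mathbb{Z}/k\mathbb{Z})^*$, we have:
a. \[ \sum_{l \in G} \chi(l) = \begin{cases} \varphi(k) & \chi = \chi_0, \\ 0 & \text{otherwise.} \end{cases} \]
b. \[ \sum_{\chi \in \hat{G}} \chi(l) = \begin{cases} \varphi(k) & l \equiv 1 \bmod k, \\ 0 & \text{otherwise.} \end{cases} \]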
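Both parts of Lemma 2.13 can be checked numerically for a small modulus. The sketch below does this for $k = 7$, where $\varphi(7) = 6$, using the characters of Example 2.2 on the cyclic group $(\mathbb{Z}/7\mathbb{Z})^*$. The modulus, the generator 3, and all function names are illustrative assumptions rather than notation from the paper.

```python
import cmath

P, G = 7, 3   # illustrative choice: 3 generates (Z/7Z)^*
DLOG = {pow(G, t, P): t for t in range(P - 1)}   # G^t mod P -> t

def chi(index, n):
    """Value at n of the character with the given index (index 0 is principal)."""
    r = n % P
    if r == 0:
        return 0j
    return cmath.exp(2j * cmath.pi * index * DLOG[r] / (P - 1))

def sum_over_group(index):
    """Left side of Lemma 2.13 a: sum of one character over the group."""
    return sum(chi(index, l) for l in range(1, P))

def sum_over_characters(l):
    """Left side of Lemma 2.13 b: sum of all characters at one element l."""
    return sum(chi(index, l) for index in range(P - 1))
```

Up to floating-point rounding, `sum_over_group` returns 6 for index 0 and 0 for every other index, while `sum_over_characters` returns 6 exactly when its argument is congruent to 1 mod 7.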

^4 Here $\chi_0$ denotes the principal character, i.e. the trivial homomorphism from $G$ to $\mathbb{C}^*$.

Recall that every Dirichlet character $\chi$ corresponds to a character $\chi'$ on $(\mathbb{Z}/k\mathbb{Z})^*$. If $\chi \neq \chi_0$, the sum of $\chi$ over any interval of finite length is bounded by a constant, since
\[ \sum_{i=a}^{a+k-1} \chi(i) = \sum_{i=1}^{k} \chi(i) = \sum_{i \in (\mathbb{Z}/k\mathbb{Z})^*} \chi'(i) = 0 \]
by the orthogonality relations for $\chi'$. This result is stated in the following lemma.

Lemma 2.14. Given a non-principal Dirichlet character $\chi$ and $a, b \in \mathbb{N}$ with $a < b$, we have $\sum_{n=a}^{b} \chi(n) = O(1)$.^5

3. Dirichlet series

This section provides a brief introduction to Dirichlet series and Dirichlet L-functions. We will assume certain analytic properties, such as the convergence of the Dirichlet L-function on the right half plane. Our main focus will be the different arithmetic functions that emerge from Dirichlet series, including the Möbius function $\mu$ and the von Mangoldt function $\Lambda$. We will also delve a bit into Dirichlet convolution as well as useful techniques such as Möbius inversion.

Definition 3.1 (Dirichlet series). Given an arithmetic function $f$, $D_f(s) = \sum_{n=1}^{\infty} f(n) n^{-s}$ is a Dirichlet series. The Riemann zeta function is the special case of a Dirichlet series in which all the weights are set to 1:
\[ \zeta(s) = \sum_{n=1}^{\infty} n^{-s}. \]
When $s \in (0, 1)$,
\[ \zeta(s) = \lim_{x \to \infty} \Big( \sum_{n \le x} n^{-s} - \frac{x^{1-s}}{1-s} \Big). \]

Remark 3.2. Generally speaking, $s \in \mathbb{C}$, and $D_f$ converges for any Dirichlet character $f$ when $\mathrm{Re}(s) > 1$.

Proposition 3.3. $L(s, \chi)$ converges for $s > 0$ and any non-principal character $\chi$.

There are some useful asymptotic formulas related to the Riemann zeta function as defined above.

Theorem 3.4.
\[ \sum_{n \le x} n^{-s} = \frac{x^{1-s}}{1-s} + \zeta(s) + O(x^{-s}) \quad (s > 0,\ s \neq 1). \]

This theorem is a direct application of the following lemma.

Lemma 3.5 (Euler summation formula). If $f \in C^1$ on $[a, b]$ with $0 < a < b$, then, writing $[x]$ for the floor of $x$,^6 we have
\[ \sum_{a < n \le b} f(n) = \int_a^b f(x)\,dx + \int_a^b (x - [x]) f'(x)\,dx + f(b)([b] - b) - f(a)([a] - a). \]
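As a sanity check of the Euler summation formula, the following Python sketch compares both sides for the sample function $f(x) = 1/x$. The choice of $f$, the function names, and the piecewise closed-form evaluation of the integral of $(x - [x]) f'(x)$ are assumptions of this illustration, not part of the paper's argument.

```python
import math

def left_side(a, b):
    """Sum of f(n) = 1/n over the integers n with a < n <= b."""
    return sum(1.0 / n for n in range(math.floor(a) + 1, math.floor(b) + 1))

def fractional_part_integral(a, b):
    """Integral over [a, b] of (x - [x]) f'(x) with f'(x) = -1/x^2.

    On a piece where [x] = n the integrand is -1/x + n/x^2, whose
    antiderivative is -log(x) - n/x, so each piece is evaluated exactly.
    """
    total = 0.0
    lo = a
    while lo < b:
        n = math.floor(lo)
        hi = min(n + 1, b)
        total += (-math.log(hi) - n / hi) - (-math.log(lo) - n / lo)
        lo = hi
    return total

def right_side(a, b):
    """Right side of the Euler summation formula for f(x) = 1/x."""
    main_integral = math.log(b / a)            # integral of 1/x over [a, b]
    boundary_b = (math.floor(b) - b) / b       # f(b)([b] - b)
    boundary_a = (math.floor(a) - a) / a       # f(a)([a] - a)
    return main_integral + fractional_part_integral(a, b) + boundary_b - boundary_a
```

For instance, `left_side(1.5, 10.25)` and `right_side(1.5, 10.25)` agree up to floating-point rounding.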

^5 For any two functions $f$ and $g$, $f(x) = O(g(x))$ if and only if $|f(x)| \le M|g(x)|$ for some constant $M$ and all large $x$.
^6 $[\,\cdot\,]$ is the floor function: $[x]$ is the largest integer smaller than or equal to $x$.

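Proof. The floor $[x]$ is hard to deal with in general, but it is constant on each interval between consecutive integers, so we split the range of integration accordingly. At the two ends we have
\[ \int_{[a]}^{a} [x] f'(x)\,dx = [a] \int_{[a]}^{a} f'(x)\,dx = [a]\big(f(a) - f([a])\big), \]
\[ \int_{[b]}^{b} [x] f'(x)\,dx = [b] \int_{[b]}^{b} f'(x)\,dx = [b]\big(f(b) - f([b])\big). \]
More generally, for any integer $n$ such that $[a] \le n - 1 < n \le [b]$, we have
\begin{align*}
\int_{n-1}^{n} [x] f'(x)\,dx &= \int_{n-1}^{n} (n-1) f'(x)\,dx \\
&= (n-1)\big(f(n) - f(n-1)\big) \\
&= n f(n) - (n-1) f(n-1) - f(n).
\end{align*}
We can now calculate $\int_{[a]}^{[b]} [x] f'(x)\,dx$:
\begin{align*}
\int_{[a]}^{[b]} [x] f'(x)\,dx &= \sum_{n=[a]+1}^{[b]} \int_{n-1}^{n} [x] f'(x)\,dx \\
&= \sum_{n=[a]+1}^{[b]} \big( n f(n) - (n-1) f(n-1) - f(n) \big) \\
&= [b] f([b]) - [a] f([a]) - \sum_{a < n \le b} f(n).
\end{align*}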

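Rearranging,
\[ \sum_{a < n \le b} f(n) = -\int_{[a]}^{[b]} [x] f'(x)\,dx + [b] f([b]) - [a] f([a]). \tag{6} \]
Inserting the two boundary integrals computed above turns the right side into $-\int_a^b [x] f'(x)\,dx + [b] f(b) - [a] f(a)$, and integrating $\int_a^b x f'(x)\,dx$ by parts then gives the stated formula. $\square$

Proof of Theorem 3.4. Applying Lemma 3.5 to $f(t) = t^{-s}$ on $[1, x]$ gives
\[ \sum_{n \le x} n^{-s} = \frac{x^{1-s}}{1-s} + C_s + O(x^{-s}), \]
where
\[ C_s = 1 - \frac{1}{1-s} - s \int_1^{\infty} (t - [t])\, t^{-s-1}\,dt. \]
Now we compare $C_s$ with the Riemann zeta function $\zeta(s)$ by letting $x \to \infty$. If $s > 1$,
\[ \zeta(s) = \lim_{x \to \infty} \sum_{n \le x} n^{-s} = 0 + C_s + 0. \]
If $s \in (0, 1)$, by (3.23),
\[ C_s = \lim_{x \to \infty} \Big( \sum_{n \le x} n^{-s} - \frac{x^{1-s}}{1-s} \Big) = \zeta(s). \]
$\square$

Proposition 3.9 (Euler product). If $f$ is multiplicative, we have
\[ D_f(s) = \prod_p \Big( 1 + \sum_{n=1}^{\infty} f(p^n) p^{-ns} \Big), \qquad \zeta(s) = \prod_p (1 - p^{-s})^{-1}. \]
Proof. The Euler product is a direct consequence of the unique prime factorization of integers. $\square$

Definition 3.10 (Dirichlet convolution). Given arithmetic functions $f$ and $g$, the Dirichlet convolution $f * g$ of $f$ and $g$ is defined by
\[ (f * g)(n) = \sum_{d \mid n} f(d)\, g\Big(\frac{n}{d}\Big). \]

Remark 3.11. The set of arithmetic functions forms a ring under pointwise addition and Dirichlet convolution. More specifically, if we multiply two Dirichlet series $D_f$ and $D_g$, we get a new Dirichlet series $D_h$ where the arithmetic function $h$ is the convolution of $f$ and $g$. This result is stated as the following proposition.

Proposition 3.12.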

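\[ D_{f * g} = D_f D_g. \]

Definition 3.13 (Möbius function). The Möbius function $\mu : \mathbb{N}^* \to \{0, -1, 1\}$ is defined by
\[ \mu(m) = \begin{cases} (-1)^t & \text{if } m \text{ is the product of } t \text{ distinct primes,} \\ 0 & \text{otherwise.} \end{cases} \]
In particular $\mu(1) = 1$, the empty product corresponding to $t = 0$.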
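The definition translates directly into code. The sketch below computes $\mu$ by trial division and also evaluates the divisor sum $\sum_{d \mid n} \mu(d)$ that Proposition 3.14 is about. The function names are illustrative, and trial division is chosen only for brevity, not for efficiency.

```python
def mobius(m):
    """Mobius function of a positive integer m, computed by trial division."""
    if m < 1:
        raise ValueError("m must be a positive integer")
    result = 1
    p = 2
    while p * p <= m:
        if m % p == 0:
            m //= p
            if m % p == 0:      # p^2 divides the original m
                return 0
            result = -result    # one more distinct prime factor
        p += 1
    if m > 1:                   # a single prime factor is left over
        result = -result
    return result

def mobius_divisor_sum(n):
    """Sum of mobius(d) over all positive divisors d of n."""
    return sum(mobius(d) for d in range(1, n + 1) if n % d == 0)
```

Calling `mobius_divisor_sum(n)` for a range of `n` gives 1 at `n = 1` and 0 for every larger `n`.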

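Proposition 3.14. For any positive integer $n$,
\[ \sum_{d \mid n} \mu(d) = \begin{cases} 1 & \text{if } n = 1, \\ 0 & \text{if } n > 1. \end{cases} \]
Proof. If $n = 1$, $\sum_{d \mid 1} \mu(d) = \mu(1) = 1$. If $n > 1$, suppose $n = \prod_{i=1}^{r} p_i^{\alpha_i}$. Only squarefree divisors contribute, and exactly $\binom{r}{k}$ of them have $k$ prime factors, so
\[ \sum_{d \mid n} \mu(d) = \sum_{k=0}^{r} \binom{r}{k} (-1)^k = (1 - 1)^r = 0. \]
$\square$

The Möbius function appears as the coefficient of the Dirichlet series obtained by taking the reciprocal of the Riemann zeta function. We now prove this result in the following proposition.

Proposition 3.15. $\zeta^{-1} = D_\mu$.^7

Proof. Using the Euler product of the zeta function, we have
\[ \zeta^{-1}(s) = \Big( \sum_{n=1}^{\infty} n^{-s} \Big)^{-1} = \prod_p (1 - p^{-s}). \]
Expanding the product, we examine the coefficient of $n^{-s}$:
1. If $p^2 \mid n$ for some prime $p$, then $n^{-s}$ cannot occur, because each factor contributes $p^{-s}$ at most once. Therefore, the coefficient of $n^{-s}$ must be 0.
2. If $n$ is the product of $t$ distinct primes, the coefficient of $n^{-s}$ is $(-1)^t$.
Therefore the weight function of the Dirichlet series generated by the reciprocal of the Riemann zeta function is the same as that of $D_\mu$. $\square$

One of the most well-known results on arithmetic functions is Möbius inversion, which can be verified by direct calculation.

Proposition 3.16 (Möbius inversion). Given arithmetic functions $f, g : \mathbb{N} \to \mathbb{C}$, we have
\[ g(n) = \sum_{d \mid n} f(d) \iff f(n) = \sum_{d \mid n} \mu(d)\, g\Big(\frac{n}{d}\Big). \]

Definition 3.17 (von Mangoldt function). The von Mangoldt function $\Lambda$ is defined as
\[ \Lambda(n) = \begin{cases} \log p & \text{if } n = p^t \text{ for a prime } p \text{ and } t \ge 1, \\ 0 & \text{otherwise.} \end{cases} \]

Proposition 3.18. $-\zeta' \zeta^{-1} = D_\Lambda$.^8

Proof. It is equivalent to prove $D_\Lambda \zeta = -\zeta'$. Let $D_h = D_\Lambda \zeta$, so that $h(n) = \sum_{d \mid n} \Lambda(d)$ by Proposition 3.12. Since $-\zeta'(s) = \sum_{n=1}^{\infty} (\log n)\, n^{-s}$, we need to show that $h$ is the same as $\log$. For every $n$ with prime factorization $\prod_{i=1}^{r} p_i^{\alpha_i}$ and for each $i$, we have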

^7 Here $\zeta^{-1}$ is the reciprocal of the Riemann zeta function.
^8 $\zeta'$ is the derivative of the Riemann zeta function.

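\[ n = p_i \Big( \frac{n}{p_i} \Big) = p_i^2 \Big( \frac{n}{p_i^2} \Big) = p_i^3 \Big( \frac{n}{p_i^3} \Big) = \dots = p_i^{\alpha_i} \Big( \frac{n}{p_i^{\alpha_i}} \Big). \]
The divisors $d$ of $n$ with $\Lambda(d) \neq 0$ are therefore exactly the prime powers $p_i^j$ with $1 \le j \le \alpha_i$, and
\begin{align*}
h(n) &= \sum_{i=1}^{r} \sum_{j=1}^{\alpha_i} \Lambda(p_i^j) \\
&= \sum_{i=1}^{r} \alpha_i \log p_i \\
&= \sum_{i=1}^{r} \log p_i^{\alpha_i} \\
&= \log n.
\end{align*}
$\square$

Proposition 3.19.
\[ \log n = \sum_{d \mid n} \Lambda(d). \]
Proof. By the previous proposition we know that $\Lambda = \mu * \log$, in other words,
\[ \Lambda(n) = \sum_{d \mid n} \mu(d) \log \frac{n}{d}. \]
By Möbius inversion, we have $\log n = \sum_{d \mid n} \Lambda(d)$. $\square$

At the end of this section, we introduce the Dirichlet L-function, which will later replace general Dirichlet series.

Definition 3.20 (Dirichlet L-function).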

\[ L(s, \chi) = D_\chi(s), \]
where $\chi$ is a Dirichlet character as defined in Section 2.

Remark 3.21. Complex analysis is not the focus of this paper, so here we only provide a simple and intuitive definition of the L-function instead of a full and rigorous construction. Since $\chi$ is multiplicative, we have the Euler product for the L-function
\[ L(s, \chi) = \prod_p (1 - \chi(p) p^{-s})^{-1}. \]
When $\chi$ is the principal character $\chi_0$ mod $k$, we can factor out the Riemann zeta function and have
\[ L(s, \chi_0) = \zeta(s) \prod_{p \mid k} (1 - p^{-s}). \]
We now prove a useful asymptotic property of the L-function.

Proposition 3.22.
\[ \sum_{k \le x/d} \chi(k) k^{-1} = L(1, \chi) + O\Big( \frac{d}{x} \Big). \]
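The size of the error term can be observed numerically. The sketch below uses the non-principal character of period 4 as a sample character, for which $L(1, \chi) = 1 - 1/3 + 1/5 - \dots = \pi/4$ by the Leibniz series. The character, the names `chi4` and `partial_sum`, and the variable `y` (playing the role of $x/d$) are assumptions of this example.

```python
import math

def chi4(n):
    """The non-principal Dirichlet character of period 4."""
    return (0, 1, 0, -1)[n % 4]

def partial_sum(y):
    """Sum of chi4(k) / k over the integers 1 <= k <= y."""
    return sum(chi4(k) / k for k in range(1, math.floor(y) + 1))

# Value of L(1, chi4), known from the Leibniz series.
L1_CHI4 = math.pi / 4

def truncation_error(y):
    """Distance between the partial sum up to y and L(1, chi4)."""
    return abs(partial_sum(y) - L1_CHI4)
```

For this character the series is alternating, so `truncation_error(y)` stays below `1 / y`, in line with the $O(d/x)$ term of the proposition.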

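Proof. We have
\[ \Big| L(1, \chi) - \sum_{k \le x/d} \chi(k) k^{-1} \Big| = \Big| \sum_{k > x/d} \chi(k) k^{-1} \Big|. \]
Write $S(t) = \sum_{k \le t} \chi(k)$, which is $O(1)$ by Lemma 2.14. Partial summation gives, for any $y \ge 1$,
\[ \sum_{k > y} \chi(k) k^{-1} = -\frac{S(y)}{y} + \int_y^{\infty} S(t)\, t^{-2}\,dt = O\Big( \frac{1}{y} \Big), \]
and taking $y = x/d$ yields the bound $O(d/x)$. $\square$

4. The non-vanishing property of $L(1, \chi)$

In this section we prove that $L(1, \chi) \neq 0$ for all non-principal characters. This turns out to be the key to our proof of the Dirichlet Theorem. We do not follow the original strategy of Dirichlet, which uses the class number formula. Instead, we use an elementary method.^9

Theorem 4.1. $\forall \chi \neq \chi_0$, $L(1, \chi) \neq 0$.

Proposition 4.2. Let $\chi \neq \chi_0$.
a. If $L(1, \chi) \neq 0$, then $\sum_{l \le x} \chi(l) \Lambda(l) l^{-1} = O(1)$.
b. If $L(1, \chi) = 0$, then $\sum_{l \le x} \chi(l) \Lambda(l) l^{-1} = -\log x + O(1)$.

Proof of a. We estimate the sum $\sum_{n \le x} \chi(n) n^{-1} \log n$ using the formulas we derived in previous sections.

Since $\chi$ is multiplicative, for $l \mid n$ we have
\begin{align*}
\chi(n) n^{-1} \Lambda(l) &= \chi\Big(\frac{n}{l}\Big) \chi(l) \Big(\frac{n}{l}\Big)^{-1} l^{-1} \Lambda(l) \\
&= \chi(l) \Lambda(l) l^{-1} \chi\Big(\frac{n}{l}\Big) \Big(\frac{n}{l}\Big)^{-1}.
\end{align*}
Using this and Proposition 3.19, we can deduce that
\begin{align*}
\sum_{n \le x} \chi(n) n^{-1} \log n &= \sum_{n \le x} \Big( \chi(n) n^{-1} \sum_{l \mid n} \Lambda(l) \Big) \\
&= \sum_{l \le x} \Big( \chi(l) \Lambda(l) l^{-1} \sum_{m \le x/l} \chi(m) m^{-1} \Big).
\end{align*}
By Proposition 3.22 we can replace the inner sum $\sum_{m \le x/l} \chi(m) m^{-1}$ by $L(1, \chi) + O(l/x)$. Therefore the right side equals
\begin{align*}
\sum_{l \le x} \chi(l) \Lambda(l) l^{-1} \Big( L(1, \chi) + O\Big(\frac{l}{x}\Big) \Big) &= \sum_{l \le x} \chi(l) \Lambda(l) l^{-1} L(1, \chi) + \sum_{l \le x} \chi(l) \Lambda(l) l^{-1} O\Big(\frac{l}{x}\Big) \\
&= L(1, \chi) \sum_{l \le x} \chi(l) \Lambda(l) l^{-1} + O(1),
\end{align*}
where the last step uses $\sum_{l \le x} \Lambda(l) = O(x)$.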

^9 We follow the idea in the book Analytic Number Theory by Iwaniec and Kowalski [1].

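In all, we have
\[ \sum_{n \le x} \chi(n) n^{-1} \log n = L(1, \chi) \sum_{l \le x} \chi(l) \Lambda(l) l^{-1} + O(1). \]
The left side is bounded, since the series $\sum_n \chi(n) n^{-1} \log n$ converges by Lemma 2.14 and partial summation. Since $L(1, \chi) \neq 0$, dividing by $L(1, \chi)$ gives $\sum_{l \le x} \chi(l) \Lambda(l) l^{-1} = O(1)$. $\square$

Proof of b. Recall that $\Lambda(n) = \sum_{d \mid n} \mu(d) \log \frac{n}{d}$. Hence
\begin{align*}
&\sum_{n \le x} \chi(n) n^{-1} \sum_{d \mid n} \mu(d) \log \frac{x}{d} - \sum_{n \le x} \chi(n) \Lambda(n) n^{-1} \\
&\quad = \sum_{n \le x} \chi(n) n^{-1} \sum_{d \mid n} \mu(d) \log \frac{x}{d} - \sum_{n \le x} \chi(n) n^{-1} \sum_{d \mid n} \mu(d) \log \frac{n}{d} \\
&\quad = \sum_{n \le x} \chi(n) n^{-1} \sum_{d \mid n} \mu(d) \log \frac{x}{n} \\
&\quad = \sum_{n \le x} \chi(n) n^{-1} \sum_{d \mid n} \mu(d) \log x - \sum_{n \le x} \chi(n) n^{-1} \sum_{d \mid n} \mu(d) \log n.
\end{align*}
Using Proposition 3.14, we have
\begin{align*}
&\sum_{n \le x} \chi(n) n^{-1} \sum_{d \mid n} \mu(d) \log x - \sum_{n \le x} \chi(n) n^{-1} \sum_{d \mid n} \mu(d) \log n \\
&\quad = \log x \sum_{n \le x} \chi(n) n^{-1} \sum_{d \mid n} \mu(d) - \sum_{n \le x} \chi(n) n^{-1} \log(n) \sum_{d \mid n} \mu(d) \\
&\quad = \log x \cdot \chi(1) 1^{-1} - \chi(1) 1^{-1} \log(1) \\
&\quad = \log x.
\end{align*}
Therefore,
\begin{align*}
\log x + \sum_{n \le x} \chi(n) \Lambda(n) n^{-1} &= \sum_{n \le x} \chi(n) n^{-1} \sum_{d \mid n} \mu(d) \log \frac{x}{d} \\
&= \sum_{d \le x} \mu(d) \chi(d) d^{-1} \log \frac{x}{d} \sum_{l \le x/d} \chi(l) l^{-1} \\
&= L(1, \chi) \sum_{d \le x} \mu(d) \chi(d) d^{-1} \log \frac{x}{d} + O(1),
\end{align*}
where the last step uses Proposition 3.22 together with $\sum_{d \le x} \log \frac{x}{d} = O(x)$. By our assumption $L(1, \chi) = 0$, we know that
\[ \log x + \sum_{n \le x} \chi(n) \Lambda(n) n^{-1} = O(1). \]
In other words,
\[ \sum_{l \le x} \chi(l) \Lambda(l) l^{-1} = -\log x + O(1). \]
$\square$

Remark 4.3.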

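We can summarize Proposition 4.2 as follows:
\[ \sum_{l \le x} \chi(l) \Lambda(l) l^{-1} = \lambda_\chi \log x + O(1), \]
where the coefficient $\lambda_\chi$ depends on $\chi$:
\[ \lambda_\chi = \begin{cases} -1 & L(1, \chi) = 0 \text{ and } \chi \neq \chi_0, \\ 0 & L(1, \chi) \neq 0 \text{ and } \chi \neq \chi_0, \\ 1 & \chi = \chi_0. \end{cases} \]

Proof of Theorem 4.1. Now we sum the formula of Remark 4.3 over all characters mod $k$. By the orthogonality relations (Lemma 2.13),
\begin{align*}
\mathrm{LHS} &= \sum_{\chi \bmod k} \sum_{l \le x} \chi(l) \Lambda(l) l^{-1} \\
&= \varphi(k) \sum_{l \le x,\ l \equiv 1 \bmod k} \Lambda(l) l^{-1},
\end{align*}
\[ \mathrm{RHS} = \sum_{\chi \bmod k} \lambda_\chi \log x + O(1). \]
We have
\[ \varphi(k) \sum_{l \le x,\ l \equiv 1 \bmod k} \Lambda(l) l^{-1} = \sum_{\chi \bmod k} \lambda_\chi \log x + O(1). \]
Since the left side is nonnegative, $\sum_{\chi \bmod k} \lambda_\chi$ must be nonnegative. Since $\lambda_\chi > 0$ only for the principal character, there is at most one non-principal character for which $L(1, \chi) = 0$. If such a non-principal character exists, then it must be real, because $L(s, \bar\chi) = \overline{L(s, \chi)}$ for real $s$, so that $\bar\chi$ would otherwise be a second such character. Therefore, all we need to prove now is that a real character $\chi$ with $L(1, \chi) = 0$ does not exist. Assume $\chi$ is real. We consider the function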

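\[ F(x) = \sum_{n \le x} f_\chi(n)\, n^{-1/2}, \]
where
\[ f_\chi(n) = \sum_{d \mid n} \chi(d). \]
Since $\chi$ is multiplicative, given $n = \prod_{i=1}^{r} p_i^{\alpha_i}$, we know that
\[ f_\chi(n) = \prod_{i=1}^{r} \sum_{j=0}^{\alpha_i} \chi(p_i^j). \]
Since $\chi$ is real, each $\chi(p_i)$ is $0$, $1$ or $-1$, so every factor is non-negative and $f_\chi(n)$ is always non-negative. Notice also that $f_\chi(m^2) \ge 1$, because every exponent in the factorization of a square is even. We have the following lower bound for $F$:
\[ F(x) \ge \sum_{m^2 \le x} f_\chi(m^2)\, m^{-1} \ge \sum_{m \le \sqrt{x}} m^{-1} > \int_1^{\sqrt{x}} t^{-1}\,dt = \log \sqrt{x} = \frac{\log x}{2}. \tag{4} \]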
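The lower bound (4) can be illustrated numerically for one concrete real character. The sketch below uses the non-principal character of period 4 as an assumed example; the names `chi4`, `f_chi` and `F` mirror the notation above, and the brute-force divisor loop is chosen for clarity only.

```python
import math

def chi4(n):
    """The real non-principal Dirichlet character of period 4."""
    return (0, 1, 0, -1)[n % 4]

def f_chi(n, chi=chi4):
    """f_chi(n): the sum of chi(d) over the positive divisors d of n."""
    return sum(chi(d) for d in range(1, n + 1) if n % d == 0)

def F(x, chi=chi4):
    """F(x): the sum of f_chi(n) / sqrt(n) over the integers 1 <= n <= x."""
    return sum(f_chi(n, chi) / math.sqrt(n) for n in range(1, int(x) + 1))
```

Evaluating `f_chi` on a range of integers shows only non-negative values, with values of at least 1 on perfect squares, and `F(x)` can be compared directly with `math.log(x) / 2`.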

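On the other hand, we can split $F(x)$ into two parts:
\begin{align*}
F(x) &= \sum_{ab \le x} \chi(a) (ab)^{-1/2} \\
&= \sum_{a \le \sqrt{x}} \chi(a)\, a^{-1/2} \sum_{b \le x/a} b^{-1/2} + \sum_{b \le \sqrt{x}} b^{-1/2} \sum_{\sqrt{x} < a \le x/b} \chi(a)\, a^{-1/2}.
\end{align*}
Estimating the inner sums with Theorem 3.4 (for $s = 1/2$) and with Lemma 2.14 and partial summation gives $F(x) = 2\sqrt{x}\, L(1, \chi) + O(1)$ (see [2]). Comparing with (4), we obtain
\[ 2\sqrt{x}\, L(1, \chi) + O(1) > \frac{\log x}{2}. \]
Since $\lim_{x \to \infty} \log x = \infty$, we can safely conclude from the above inequality that $L(1, \chi) > 0$. $\square$

Now we know that $\sum_\chi \lambda_\chi = 1$,^10 so we have the following lemma.

Lemma 4.5. For all real $x$, if $(l, k) = 1$, we have
\[ \sum_{n \le x,\ n \equiv l \bmod k} \Lambda(n)\, n^{-1} = \varphi(k)^{-1} \log x + O(1). \]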
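Lemma 4.5 says that the weighted sum over a residue class differs from $\varphi(k)^{-1} \log x$ by a bounded amount. The Python sketch below lets one observe this difference for small parameters. All function names are illustrative assumptions, and the code is a brute-force experiment, not part of the proof.

```python
import math

def von_mangoldt(n):
    """Lambda(n) = log p if n = p^t with p prime and t >= 1, else 0."""
    if n < 2:
        return 0.0
    p = 2
    while p * p <= n:
        if n % p == 0:
            while n % p == 0:       # strip every factor p
                n //= p
            return math.log(p) if n == 1 else 0.0
        p += 1
    return math.log(n)              # no divisor up to sqrt(n): n is prime

def euler_phi(k):
    """Euler's totient: the number of 1 <= a <= k coprime to k."""
    return sum(1 for a in range(1, k + 1) if math.gcd(a, k) == 1)

def weighted_sum(x, l, k):
    """Sum of Lambda(n) / n over n <= x with n congruent to l mod k."""
    return sum(von_mangoldt(n) / n
               for n in range(1, int(x) + 1) if n % k == l % k)

def deviation(x, l, k):
    """Difference between the weighted sum and log(x) / phi(k)."""
    return weighted_sum(x, l, k) - math.log(x) / euler_phi(k)
```

For example, `deviation(x, 1, 4)` and `deviation(x, 3, 4)` can be tabulated for growing `x` to see that they remain bounded while `math.log(x)` grows.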

5. Elementary proof of the Dirichlet Theorem

Finally we are able to prove the Dirichlet Theorem in the way that Dirichlet proved it, using Dirichlet characters and the Dirichlet L-function. The proof uses the non-vanishing result from the last section and does not involve any complex analysis.

Theorem 5.1 (Dirichlet Theorem). There exist infinitely many primes $p \equiv l \bmod k$ whenever $l$ is coprime to $k$.
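The statement can be explored empirically before turning to the proof. The following Python sketch counts the primes up to a bound in each residue class coprime to $k$, using a basic sieve of Eratosthenes. The function names and the sample parameters are illustrative assumptions, and a finite count is of course only an experiment, not evidence of infinitude.

```python
from math import gcd

def primes_up_to(n):
    """Return all primes <= n using a sieve of Eratosthenes."""
    if n < 2:
        return []
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b"\x00\x00"                      # 0 and 1 are not prime
    for p in range(2, int(n ** 0.5) + 1):
        if sieve[p]:
            # Cross out the multiples of p starting at p^2.
            sieve[p * p :: p] = bytearray(len(range(p * p, n + 1, p)))
    return [i for i in range(n + 1) if sieve[i]]

def count_by_residue(n, k):
    """Count the primes p <= n in each residue class l mod k with (l, k) = 1."""
    counts = {l: 0 for l in range(k) if gcd(l, k) == 1}
    for p in primes_up_to(n):
        if p % k in counts:
            counts[p % k] += 1
    return counts

# Hypothetical example: primes up to 1000 in the classes 1, 3, 7, 9 mod 10.
example = count_by_residue(1000, 10)
```

In such experiments every class coprime to $k$ receives primes, and the counts in the different classes are of comparable size.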

Proof. For simplicity, we define $P_k^l$ to be the set of primes $p \equiv l \bmod k$. Our goal is to show that $P_k^l$ is infinite if $l$ and $k$ are coprime. Throughout, $s > 1$ is real. We first recall the Euler product for the Dirichlet L-function
\[ L(s, \chi) = \prod_p (1 - \chi(p) p^{-s})^{-1}. \]
Using the Taylor expansion for $\log \frac{1}{1-t}$, we have
\[ \log \frac{1}{1 - \chi(p) p^{-s}} = \chi(p) p^{-s} + O\big( (\chi(p) p^{-s})^2 \big). \]
If we take the logarithm of both sides of the Euler product, we have
\[ \log L(s, \chi) = \sum_p \chi(p) p^{-s} + \sum_p O\big( (\chi(p) p^{-s})^2 \big). \]

^10 $\lambda_\chi$ is defined in Remark 4.3.

Notice that $\sum_p (\chi(p) p^{-s})^2 = \sum_p \chi(p)^2 p^{-2s}$ converges and is bounded uniformly for $s \ge 1$. Therefore,
\[ \log L(s, \chi) = \sum_p \chi(p) p^{-s} + O(1). \]
Observe that the nonzero values of $\chi$ lie in $S^1$, where $\chi(n)^{-1} = \overline{\chi(n)}$.^11 By the orthogonality relations, we know that $\sum_{\chi \bmod k} \overline{\chi(l)} \chi(p)$ does not vanish if and only if $p \in P_k^l$, in which case it equals $\varphi(k)$. Hence
\begin{align*}
\varphi(k)^{-1} \sum_{\chi \bmod k} \overline{\chi(l)} \log L(s, \chi) &= \varphi(k)^{-1} \sum_{\chi \bmod k} \overline{\chi(l)} \Big( \sum_p \chi(p) p^{-s} + O(1) \Big) \\
&= \varphi(k)^{-1} \sum_{\chi \bmod k} \overline{\chi(l)} \sum_p \chi(p) p^{-s} + \varphi(k)^{-1} \sum_{\chi \bmod k} \overline{\chi(l)}\, O(1) \\
&= \sum_{p \in P_k^l} p^{-s} + O(1).
\end{align*}
If $\chi$ is the principal character, recall that we can factor out the Riemann zeta function from the L-function:
\[ L(s, \chi_0) = \zeta(s) \prod_{p \mid k} (1 - p^{-s}), \]
where
\[ \zeta(s) = \sum_{n=1}^{\infty} n^{-s} = \int_1^{\infty} t^{-s}\,dt + O(1) = (s - 1)^{-1} + O(1). \]
As a result,
\[ \varphi(k)^{-1}\, \overline{\chi_0(l)} \log L(s, \chi_0) = \varphi(k)^{-1} \log L(s, \chi_0) = \varphi(k)^{-1} \log (s - 1)^{-1} + O(1). \]
Now we need to show that the rest of the sum is bounded in order to prove that our target series $\sum_{p \in P_k^l} p^{-s}$ diverges as $s \to 1^+$. Recall that, for non-principal $\chi$, $\lim_{s \to 1^+} L(s, \chi)$ exists and does not vanish by Theorem 4.1. Therefore $\lim_{s \to 1^+} \log L(s, \chi)$ is finite for non-principal characters; in other words, when $s \to 1^+$,
\[ \sum_{p \in P_k^l} p^{-s} = \varphi(k)^{-1} \log (s - 1)^{-1} + O(1). \]
The series $\sum_{p \in P_k^l} p^{-s}$ therefore diverges as $s \to 1^+$. In other words, there are infinitely many primes of the form $p \equiv l \bmod k$. $\square$

Acknowledgments. It is a pleasure to thank my mentor Owen Barrett for his time and effort in guiding me on the topic and helping me understand the material. I also want to thank Professor May for helping me with my paper writing skills and giving me advice on academic writing in mathematics.

References
[1] Henryk Iwaniec, Emmanuel Kowalski. Analytic Number Theory. American Mathematical Society. 2004.
[2] Tom M. Apostol. Introduction to Analytic Number Theory. Springer. 1976.
[3] Wikipedia topic on Schur Orthogonality Relations. https://en.wikipedia.org/wiki/Schur_orthogonality_relations.

^11 If $z \in S^1$, then $z\bar{z} = 1$.