Research Collection

Doctoral Thesis

Primes of the shape x² + ny²: The distribution on average and prime number races

Author(s): Ditchen, Jakob J.

Publication Date: 2013

Permanent Link: https://doi.org/10.3929/ethz-a-010138958

Rights / License: In Copyright - Non-Commercial Use Permitted


ETH Library Diss. ETH No. 21502

Primes of the shape x² + ny²: The Distribution on Average and Prime Number Races

A dissertation submitted to ETH Zürich for the degree of Doctor of Sciences

Presented by Jakob Johann Ditchen

Dipl.-Math. techn., Universität Karlsruhe (TH)
Certificate of Advanced Study in Mathematics, University of Cambridge

Born on 10 November 1982
Citizen of Germany

Accepted on the recommendation of
Prof. Dr. Emmanuel Kowalski, Examiner
Prof. Dr. Özlem Imamoglu, Co-examiner
Prof. Dr. Philippe Michel, Co-examiner

2013

Abstract

This thesis focuses on uniformities and discrepancies in the distribution of prime numbers represented by positive definite integral binary quadratic forms of various discriminants. We prove results of Bombieri–Vinogradov and Barban–Davenport–Halberstam type on the average distribution of the primes with respect to their representability by these forms. Our results imply that the corresponding prime number theorem holds uniformly and with a non-trivial error term for almost all negative fundamental discriminants in long ranges. Moreover, we investigate a variant of “Chebyshev’s bias” between primes of the shapes x² + ny² and x² + my² for certain distinct positive integers n and m.

German Summary

This dissertation deals with uniformities and discrepancies in the distribution of prime numbers that are representable by positive definite integral binary quadratic forms of various discriminants. We prove variants of the Bombieri–Vinogradov theorem and of the Barban–Davenport–Halberstam theorem and thereby show that the corresponding prime number theorem has, apart from at most “few” exceptions, a uniform and non-trivial remainder term for negative fundamental discriminants in long intervals. Furthermore, for certain pairs (n, m) of positive integers, we investigate a discrepancy between the distributions of primes of the form x² + ny² and those of the form x² + my²; this is a counterpart to a classical observation of Chebyshev concerning primes of the forms 4k + 1 and 4k + 3, which has been studied intensively in recent years.

Binary quadratic forms, that is, homogeneous polynomials of the shape

\[
f(x, y) = ax^2 + bxy + cy^2 \qquad (x, y \in \mathbb{Z})
\]

with integral coefficients a, b and c, are, apart from arithmetic progressions, the simplest polynomials each of which is known to represent infinitely many primes, unless congruence relations among the coefficients obviously stand in the way. The analytic theory of the representation of primes by fixed binary quadratic forms has been explored about as thoroughly as that of primes in fixed arithmetic progressions. By contrast, little is known about how these properties behave on average over several binary quadratic forms of different discriminants, or how they differ when two distinct forms are compared; numerous results of this kind exist, on the other hand, for primes in arithmetic progressions.

For primes in arithmetic progressions, results on the first kind of problem have been obtained by means of the Large Sieve since the 1960s; in applications they often make it possible to dispense with the Generalized Riemann Hypothesis. The best known of these results are the Bombieri–Vinogradov theorem and the Barban–Davenport–Halberstam theorem. They show, first, that the remainder term in the prime number theorem for arithmetic progressions matches on average the one predicted by the Riemann Hypothesis; here the average is taken over the moduli of the arithmetic progressions in the same range in which the Riemann Hypothesis yields non-trivial results. Second, it was shown that the mean square error in the prime number theorem is very small when one averages over both the moduli and their residue classes; the admissible range for the moduli here even exceeds the range controlled by the Riemann Hypothesis.
In this thesis, analogous results are found for positive definite integral binary quadratic forms. Let X be a large positive number. For the number of primes p ≤ X for which, given an integer n, there exist integers x and y such that p = x² + ny², we show in particular that the corresponding prime number theorem holds uniformly in n for the squarefree positive integers n ≡ 1 (mod 4) below about $X^{1/10}$, apart from at most “few” exceptions. More generally, we prove that for all A > 0 there exists a constant B = B(A) such that for all ε > 0 the bound

\[
\sideset{}{'}\sum_{q \geq -Q} \max_{C \in K(q)} \left| \pi(X; q, C) - \frac{\operatorname{li}(X)}{e(C)\,h(q)} \right| \ll_{\varepsilon, A} Q^{1/2} X (\log X)^{-A}
\]

holds if $Q^{10+\varepsilon} \leq X (\log X)^{-B}$. Here π(X; q, C) denotes the number of primes p ≤ X which are representable by the quadratic forms of the form class C in the form class group K(q) of discriminant q, h(q) denotes the class number of this discriminant, li stands for the logarithmic integral, and e(C) is a constant depending on the class; the sum on the left-hand side runs over negative fundamental discriminants q ≥ −Q with q ≢ 0 (mod 8).

Furthermore, we show that the remainder term in the prime number theorem for positive definite binary quadratic forms is small in a larger range in the quadratic mean over both fundamental discriminants and the associated form classes: for all A > 0 there exists a constant B = B(A) such that for all ε > 0 the bound

\[
\sideset{}{'}\sum_{q \geq -Q} \sum_{C \in K(q)} \left| \pi(X; q, C) - \frac{\operatorname{li}(X)}{e(C)\,h(q)} \right|^{2} \ll_{\varepsilon, A} Q^{1/2} X^{2} (\log X)^{-A}
\]

holds if $Q^{3+\varepsilon} \leq X (\log X)^{-B}$.

Neither result reaches the strength of the above-mentioned results for arithmetic progressions. This is due, among other things, to the fact that we only manage to find a weaker version of a large sieve inequality for complex class group characters, which seems indispensable for results of this type.

While the results mentioned so far concern uniformity in the distribution of primes representable by arithmetic progressions or by binary quadratic forms, the question of discrepancies in these distributions is no less interesting. Chebyshev already noticed that the number of primes in the progression 4k + 1 below a given number is usually smaller than the number in the progression 4k + 3. According to the prime number theorem both counts are asymptotically equal, so the causes of this “preference” of the primes for the second progression are not obvious. Only in recent years has this discrepancy been studied thoroughly in general form for arithmetic progressions.

In this thesis we investigate a similar effect which appears for the number of primes of the form x² + ny² and those of the form x² + my² below a given number when the discriminants of the two forms agree in their class number (so that, by the prime number theorem, the asymptotic behaviour of the distributions also coincides) but differ in the number of their odd prime factors.

The great masters such as Fermat, Euler, Gauß and Dirichlet paid at least as much attention to primes of the form x² + ny² as to primes of the form a + nk. And de la Vallée Poussin, in his work on the prime number theorem, still proved it not only in the ordinary form and in the form for arithmetic progressions, but at the same time in the form for positive definite binary quadratic forms.
Although primes that are representable by binary quadratic forms have since appeared prominently time and again (for example, as an important ingredient of certain factorization algorithms), investigations of uniformities and discrepancies in their distribution have received nowhere near the same attention as the corresponding research on primes in arithmetic progressions, which is often simpler. This thesis would like to contribute to closing this gap one day.

Dedicated to my parents

Contents


Preface
Notation

1 Primes represented by positive definite binary quadratic forms
1.1 The composition of binary quadratic forms and form classes
1.2 Algebraic methods for arithmetic objects
1.3 The Chebotarev density theorem and conditional results

2 The average distribution of primes represented by positive definite binary quadratic forms with varying discriminant
2.1 Mean-value results for primes in arithmetic progressions
2.2 A large sieve inequality for complex ideal class group characters
2.3 Results of Bombieri–Vinogradov type
2.4 The mean square distribution
2.5 Applications and open questions

3 Chebyshev’s bias and prime number races for binary quadratic forms
3.1 Bias in the distribution of primes in arithmetic progressions
3.2 Primes represented by different classes of forms with a fixed discriminant
3.3 Prime number races for forms of the shape x² + ny²

Bibliography


Preface

Apart from arithmetic progressions, integral binary quadratic forms, that is, homogeneous polynomials of the shape

\[
f(x, y) = ax^2 + bxy + cy^2 \qquad (x, y \in \mathbb{Z})
\]

with integral coefficients a, b and c, are the simplest polynomials which are known to represent infinitely many prime numbers unless there is an obvious obstacle in the form of a common prime divisor of the coefficients. Analytic questions about prime numbers which are representable by any fixed binary quadratic form have been studied almost as extensively as analytic questions about prime numbers in arithmetic progressions. There is, however, not much known about the average behaviour of these representation properties when averaged over binary quadratic forms of distinct discriminants or about differences in these properties between two distinct binary quadratic forms. In contrast, there exist plenty of comparable results for prime numbers in arithmetic progressions.

Results for questions on “uniformity on average” in the prime number theorem for arithmetic progressions of various moduli have been achieved by means of the Large Sieve since the 1960s. In applications they often make it possible to dispense with the assumption of the Generalized Riemann Hypothesis (GRH). The Bombieri–Vinogradov theorem and the Barban–Davenport–Halberstam theorem are the most famous and important of these results. The first shows that the error term in the prime number theorem for arithmetic progressions is small (as small as predicted by GRH) for all reduced residue classes, “on average” over moduli in about the same range of moduli in which GRH yields non-trivial results. The second shows that the mean square of the error term is small if one averages over both moduli and their reduced residue classes; here the admissible range for the moduli even exceeds the range that may be controlled by GRH. In this dissertation we find analogous results for positive definite integral binary quadratic forms.
For all large positive numbers X, we show that the prime number theorem for primes p ≤ X of the shape p = x² + ny² holds, with at most “few” exceptions, for almost all squarefree positive integers n ≡ 1 (mod 4) up to about $X^{1/10}$ with a uniform, small remainder term. In fact, we prove more generally that for all A > 0 and all ε > 0, there exists a constant B = B(A) such that

\[
\sideset{}{'}\sum_{q \geq -Q} \max_{C \in K(q)} \left| \pi(X; q, C) - \frac{\operatorname{li}(X)}{e(C)\,h(q)} \right| \ll_{\varepsilon, A} Q^{1/2} X (\log X)^{-A}
\]

if $Q^{10+\varepsilon} \leq X (\log X)^{-B}$. Here π(X; q, C) denotes the number of primes p ≤ X which are representable by quadratic forms lying in the form class C of the form class group K(q) of discriminant q; the corresponding class number is h(q), the logarithmic integral is denoted by li, and e(C) is a constant which depends on the form class only; the sum on the left-hand side is over negative fundamental discriminants q ≥ −Q with q ≢ 0 (mod 8).

Furthermore, we show that the mean square of the remainder term in the prime number theorem for positive definite binary quadratic forms is small in a longer range if we average over both the discriminants and the corresponding form classes: For all A > 0 and all ε > 0, there exists a constant B = B(A) such that the bound

\[
\sideset{}{'}\sum_{q \geq -Q} \sum_{C \in K(q)} \left| \pi(X; q, C) - \frac{\operatorname{li}(X)}{e(C)\,h(q)} \right|^{2} \ll_{\varepsilon, A} Q^{1/2} X^{2} (\log X)^{-A}
\]

holds if $Q^{3+\varepsilon} \leq X (\log X)^{-B}$.

Neither statement reaches the strength of the aforementioned theorems for arithmetic progressions. Among other reasons, this shortfall is due to the fact that we only succeed in finding a weaker version of a large sieve inequality for complex ideal class group characters, which seems to be essential for results of this type.

The results that we have mentioned so far have been concerned with the examination of uniformities in the distributions of primes in arithmetic progressions and primes which may be represented by binary quadratic forms. The question of discrepancies in these distributions is scarcely less interesting. Chebyshev already noticed that the number of primes below a given bound and lying in the progression 4k + 1 is usually smaller than the number of primes lying in the progression 4k + 3. The reason for this “bias” is not obvious as, by the corresponding prime number theorem, the cardinalities of both sets are asymptotically equal. Only recently has this discrepancy been analysed in a more general setting. We examine a similar “bias” which shows itself between primes of the shape x² + ny² and primes of the shape x² + my² when the discriminants of these forms have the same class number but a different number of distinct odd prime divisors.
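Chebyshev's observation for the progressions 4k + 1 and 4k + 3 is easy to reproduce numerically. The following Python sketch is an illustration only and plays no role in the arguments of this thesis; the function names `primes_up_to` and `race_mod_4` are hypothetical and chosen for this example. It counts the primes up to a given bound in each progression with a simple sieve.

```python
from math import isqrt


def primes_up_to(limit):
    """Return all primes p <= limit using the sieve of Eratosthenes."""
    if limit < 2:
        return []
    sieve = bytearray([1]) * (limit + 1)
    sieve[0] = sieve[1] = 0
    for i in range(2, isqrt(limit) + 1):
        if sieve[i]:
            # Cross out the multiples i*i, i*i + i, ... of the prime i.
            sieve[i * i :: i] = bytearray(len(range(i * i, limit + 1, i)))
    return [n for n in range(limit + 1) if sieve[n]]


def race_mod_4(limit):
    """Count the primes p <= limit with p = 1 and p = 3 modulo 4."""
    ones = threes = 0
    for p in primes_up_to(limit):
        if p % 4 == 1:
            ones += 1
        elif p % 4 == 3:
            threes += 1
    return ones, threes
```

For instance, `race_mod_4(100)` returns `(11, 13)`: up to 100 there are 11 primes of the form 4k + 1 and 13 primes of the form 4k + 3.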

The old masters such as Fermat, Euler, Gauß and Dirichlet paid at least as much attention to primes of the form x² + ny² as to primes of the form a + nk. Likewise de la Vallée Poussin, in his seminal work on the Prime Number Theorem, proved it not only in the ordinary form for all prime numbers but also, in the same work, for arithmetic progressions and for positive definite binary quadratic forms. Since then, prime numbers that can be represented by specific binary quadratic forms have seen many important applications, for example as ingredients of certain factorization algorithms. However, research on uniformities and discrepancies in the distribution of such prime numbers has received much less attention than the corresponding research on prime numbers in arithmetic progressions, which often proves to be easier. This thesis aims to provide a modest step towards closing this gap.

Outline

Chapter 1 serves as an introduction to the fundamental results of the theory of binary quadratic forms and the representability of primes by such forms of a fixed discriminant. We start the chapter with some historical remarks and give a review of the primary notions of this theory. The basis of the theory of binary quadratic forms was mainly established by Gauß in his seminal Disquisitiones Arithmeticae. In particular, he fleshed out the definition of equivalence classes of binary quadratic forms and the theory of composition of these forms (which both have their origins in the work of Lagrange). In Section 1.1, we give a brief review of these concepts, which led to the first results by Dirichlet, Weber and de la Vallée Poussin on the number of primes that are representable by any given binary quadratic form. In the subsequent Section 1.2, we summarize the relation between binary quadratic forms and ideals in quadratic fields. This relationship has been essential for most analytic investigations on primes and binary quadratic forms. In particular, Landau’s improvement of de la Vallée Poussin’s prime number theorem for binary quadratic forms, which we also state in this section, is built on this connection and our own results in Chapters 2 and 3 will also rely on it. We pick up this topic in Section 1.3, in which we state the Chebotarev density theorem and relate it to the prime number theorem for binary quadratic forms. The emphasis is laid on conditional results that depend on appropriate versions of the Generalized Riemann Hypothesis; they will be used in Chapter 3.

In Chapter 2, the main part of this thesis, we investigate the uniformity of the distribution of primes “in” form classes (i.e., with respect to the representability by these classes) when the discriminant of the classes varies over negative fundamental discriminants q ≢ 0 (mod 8) and we demonstrate that good error terms in the corresponding prime number theorem hold “on average” (however, we do not achieve the above-mentioned conditional error terms). We start with a review of analogous results for primes in arithmetic progressions in Section 2.1. The first original results are proved in Section 2.2, in which we find a large sieve inequality for complex ideal class group characters. This inequality lies at the heart of the subsequent two sections. Our restriction to fundamental discriminants q with q ≢ 0 (mod 8) in this chapter is mainly due to the cumbersome proof that this large sieve inequality would require for more general discriminants. In Section 2.3, we prove results of Bombieri–Vinogradov type for the counting function for primes represented by binary quadratic forms as well as similar results for smooth versions of appropriate Chebyshev functions and for special subsets of negative fundamental discriminants. The results for the latter functions and sets show an interesting feature in that we end up with more than the usual saving of just a power of a logarithm over “trivial” bounds. We also notice that the results may be improved if we assume the Lindelöf Hypothesis for Rankin–Selberg convolutions of holomorphic cusp forms of weight one. The uniformity results of the second type are of Barban–Davenport–Halberstam type; they are the topic of Section 2.4.
We show that general arithmetic functions exhibit an “average behaviour” with respect to the representability of integers by form classes (for most form classes of most discriminants in long ranges) if the functions satisfy Siegel–Walfisz conditions for both arithmetic progressions and form classes (and an additional technical condition). Two applications and an outlook on possible extensions and generalizations are provided in Section 2.5. The first application deals with the question about the size of the least primes that are representable by binary quadratic forms of a given discriminant. The second application is a uniformity result for integers of the form k = x² + ny² which are the product of two primes that are representable by forms of discriminant −4n.

In spite of these uniformity results, frequency discrepancies still occur between primes of the shapes x² + ny² and x² + my² for distinct positive integers n and m, even when their frequencies show the same asymptotic behaviour. These discrepancies are the subject of the original results of Chapter 3, which can be considered as a counterpoint to the results of Chapter 2. Before we come to the new results, the so-called Chebyshev bias (or prime number race) for primes in arithmetic progressions is reviewed in Section 3.1. Many important results on this topic have been found only recently. Previous research has also investigated an analogous bias in the distribution of prime ideals in distinct ideal classes of a fixed imaginary quadratic field; we look at the corresponding results in Section 3.2. Due to the close relation between ideal classes and form classes which we have mentioned above, some of these results may be interpreted as prime number races for primes represented by binary quadratic forms in different form classes of the same discriminant. Finally, in Section 3.3, we demonstrate that there exists a bias in the distribution of primes of the shapes x² + ny² and x² + my² when −4n and −4m are negative fundamental discriminants with a different number of odd prime divisors but the same class number. Similarly to most other recent results in comparative prime number theory, our results are conditional on the Generalized Riemann Hypothesis and a linear independence hypothesis for the zeros of certain class group L-functions and Dirichlet L-functions. In the proofs of the results of this section, unlike the results of Chapter 2, almost no major new difficulties arise that would require a significant deviation from the proofs of results on Chebyshev’s bias for arithmetic progressions; thus, after the initial setting of the scene, we mostly only stress the differences that occur and do not repeat the proofs in full detail.
We close the section with a list of questions and possible extensions that could be investigated in future work.

About this Thesis

The research described in this dissertation was performed in the Department of Mathematics (D-MATH) at ETH Zürich between December 2008 and October 2013, and was supervised by Professor Dr. Emmanuel Kowalski. Chapter 1 as well as Sections 2.1, 3.1 and 3.2 are of an expository nature and do not contain any original results. The work presented in the remaining sections of this dissertation is original. It is influenced primarily by the following earlier works:

• The proof of the large sieve inequality in Section 2.2 is based on a similar result of Duke and Kowalski [DK00].

• The proof of the results in Section 2.3 follows roughly Gallagher’s method of proof for the original Bombieri–Vinogradov theorem as presented in [Bom87].

• The proof of the results in Section 2.4 follows roughly the method of proof for the original Barban–Davenport–Halberstam theorem and its generalizations as described in [IK04], for example.

• The proofs of the results in Section 3.3 largely parallel some of the proofs of results on Chebyshev’s bias in [RS94] and [FM13].

Jakob J. Ditchen Zürich, Autumn 2013

Acknowledgements

I am greatly indebted to Professor Emmanuel Kowalski for suggesting the problems from which this dissertation arose and for patiently guiding me with valuable advice. I would also like to thank Professor Özlem Imamoglu and Professor Philippe Michel for agreeing to examine this thesis. Many colleagues made the time that I spent at the Department of Mathematics at ETH Zürich a very pleasant and inspiring one, for which I thank them. Finally, I am also deeply grateful to my beloved Andrea, who always stands by my side in her wonderful and supportive way, to Martha and Izabela, who are great sisters to me, and to my dear parents, who showed me the way to mathematics and to the art of learning and who have supported me in a unique way through all these years.

Notation

We list here the main notation, symbols and assumptions that will be used throughout this thesis; many of them are standard in (analytic) number theory. The letters m and n denote positive integers, q denotes a negative discriminant, q′ and q″ denote fundamental discriminants, s denotes a complex number, Q and X denote positive real numbers and C denotes a form class when used as parameters or arguments of functions in the definitions below.

Symbol          Meaning
ϕ(n)            the Euler totient function
µ(n)            the Möbius function
π(X)            the number of primes p ≤ X
li(X)           the logarithmic integral, i.e. $\operatorname{li}(X) = \int_2^X \frac{dt}{\log t}$
∫_(c) · ds      the integral on the line Re(s) = c in the complex plane
Γ(s)            the gamma function
Λ(n)            the von Mangoldt function
ψ(X)            the Chebyshev function, i.e. $\psi(X) = \sum_{n \leq X} \Lambda(n)$
τ(n)            the number of positive integer divisors of n
ω(n)            the number of prime divisors of n counted without multiplicity
(m, n)          the greatest common divisor of m and n
[m, n]          the least common multiple of m and n
D               the set of all negative discriminants, i.e. the set of all negative integers q satisfying q ≡ 0 (mod 4) or q ≡ 1 (mod 4)
F               the set of all negative fundamental discriminants; see (1.3)
F               the set of all negative fundamental discriminants q ≢ 0 (mod 8)
F(Q)            the set of negative fundamental discriminants q ∈ F with |q| ≤ Q
F_ex(Q)         the subset of exceptional discriminants in F(Q); see Section 2.3.2
K(q)            the form class group of binary quadratic forms with discriminant q
C_0(q), C_0     the form class which contains the principal form with discriminant q, i.e. the identity element of K(q)
R(q, C)         the set of all positive integers which can be represented by all forms f of the form class C ∈ K(q)
O_q             the ring of integers of Q(√q)
O(q)            the order of discriminant q in an imaginary quadratic field K; it equals O_q if q ∈ F (which we will usually assume; see below)
I(q)            the group of invertible fractional O(q)-ideals, i.e. the group of invertible finitely generated O(q)-submodules of K
P(q)            the subgroup of principal fractional O(q)-ideals
H(q)            the quotient I(q)/P(q), i.e. the ideal class group of the order O(q)
B_q             the bijection K(q) → H(q) given in Lemma 1.4
h(q)            the class number for the discriminant q, i.e. h(q) = |K(q)| = |H(q)|
Z(q)            the set of non-zero integral O(q)-ideals
N(𝔞)            the norm of the ideal 𝔞 ∈ Z(q), i.e. the size of the quotient ring O(q)/𝔞 (the dependence on q is suppressed)
Ĥ(q)            the dual group of H(q), i.e. the set of (ideal) class group characters
χ_0^(q)         the trivial character in Ĥ(q)
λ_χ(n)          the sum $\sum_{\mathfrak{a} \in Z(q),\ N(\mathfrak{a}) = n} \chi(\mathfrak{a})$ for χ ∈ Ĥ(q); here and throughout we set χ(𝔞) = χ(C), where C ∈ H(q) is the ideal class of 𝔞 ∈ I(q)
χ_q′            the Kronecker symbol (q′/·) for the fundamental discriminant q′, i.e. the primitive real Dirichlet character modulo |q′| if q′ ≢ 0 (mod 8)
χ_q′,q″         the real (ideal) class group character arising from the convolution of the Dirichlet characters χ_q′ and χ_q″; see (2.69) and (2.98)
L(s, λ_χ)       the L-function for the class group character χ
L(s, χ)         the L-function for the Dirichlet character χ (it may also denote a class group L-function in the displayed sums in Section 3.3)
π(X; m, n)      the number of primes p ≤ X with p ≡ n (mod m)
π(X; q, C)      the number of primes p ≤ X such that for every form f in C ∈ K(q) there exist x, y ∈ Z satisfying f(x, y) = p
ψ(X; q, C)      the corresponding Chebyshev function
ψ_k(X; q, C)    the smoothed and weighted Chebyshev function which is defined in equation (2.22)
e(C)            the constant which equals 1 if the form class C has order ≥ 3 in K(q) and equals 2 otherwise
κ(q)            the number of form classes C in K(q) with e(C) = 2, i.e. the number of ambiguous classes; see equation (1.2)
w(C, n)         the number of ideals 𝔞 ∈ B_q(C) with N(𝔞) = n; see Remark 1.8
ν               a divisor frequency associated to a subset of F; see Definition 2.12
π_0(X; n)       the number of primes p ≤ X such that there exist x, y ∈ Z satisfying p = x² + ny²

Complex numbers are generally denoted by s = σ + it with σ, t ∈ R. Non-trivial zeros of L-functions are generally denoted by ρ = β + iγ with β, γ ∈ R. When used as a variable, the letter p will always denote a rational prime and the Fraktur letter 𝔭 will always denote a prime ideal (of a ring which will be clear from the context).

Asymptotic notation: For arithmetic functions f and g, we write f(x) = O(g(x)), or equivalently f(x) ≪ g(x), when there is an absolute constant c such that |f(x)| ≤ c g(x) for all values of x under consideration. We usually write f(x) = O_α(g(x)) or f(x) ≪_α g(x) if the constant depends on some parameter α; we may suppress such dependencies if they are sufficiently clear from the context. We write f(x) ∼ g(x) if $\lim_{x \to \infty} f(x)/g(x) = 1$, and f(x) = o(g(x)) when $\lim_{x \to \infty} f(x)/g(x) = 0$.

General assumptions on binary quadratic forms and discriminants: All binary quadratic forms in this thesis are assumed to be integral, primitive and positive definite; in particular, all discriminants of forms, orders and fields are negative integers q which satisfy either q ≡ 0 (mod 4) or q ≡ 1 (mod 4). In addition, all discriminants which will appear after Remark 1.7 (in particular, all discriminants in Chapter 2 and Chapter 3) are assumed to be fundamental discriminants, i.e. each of these discriminants q is assumed to satisfy either

(a) q ≡ 1 (mod 4) with q squarefree, or
(b) q ≡ 0 (mod 4) with q/4 ≡ 2 or 3 (mod 4) and q/4 squarefree.

Moreover, none of the discriminants in Chapter 2 will be a multiple of 8. In Chapter 3, we will only consider forms of the shape x² + ny² for positive integers n; note that the discriminant of x² + ny² is a negative fundamental discriminant if and only if n ≢ 3 (mod 4) and n is squarefree.

Chapter 1

Primes represented by positive definite binary quadratic forms

The history of questions about prime numbers of the shape x² + ny² probably starts with Fermat, who stated the following three assertions in letters to Mersenne in 1640 and to Pascal in 1654: For all odd primes p,

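\[
\begin{aligned}
\exists\, x, y \in \mathbb{Z} : p = x^2 + y^2 \quad &\text{if and only if} \quad p \equiv 1 \pmod 4, \\
\exists\, x, y \in \mathbb{Z} : p = x^2 + 2y^2 \quad &\text{if and only if} \quad p \equiv 1 \text{ or } 3 \pmod 8, \\
\exists\, x, y \in \mathbb{Z} : p = x^2 + 3y^2 \quad &\text{if and only if} \quad p = 3 \text{ or } p \equiv 1 \pmod 3.
\end{aligned}
\tag{1.1}
\]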
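The three assertions in (1.1) can be tested by brute force for small primes. The sketch below is a numerical illustration in Python, not a proof; all function names are hypothetical and chosen for this example. For each odd prime p up to a bound it searches directly for integers x, y with p = x² + ny² and compares the outcome with the congruence condition.

```python
from math import isqrt


def is_prime(n):
    """Trial-division primality test, adequate for small n."""
    if n < 2:
        return False
    for d in range(2, isqrt(n) + 1):
        if n % d == 0:
            return False
    return True


def represented(p, n):
    """Return True if p = x^2 + n*y^2 for some integers x, y."""
    y = 0
    while n * y * y <= p:
        r = p - n * y * y
        x = isqrt(r)
        if x * x == r:
            return True
        y += 1
    return False


def fermat_criterion(p, n):
    """Congruence condition from (1.1) for an odd prime p and n in {1, 2, 3}."""
    if n == 1:
        return p % 4 == 1
    if n == 2:
        return p % 8 in (1, 3)
    if n == 3:
        return p == 3 or p % 3 == 1
    raise ValueError("(1.1) only covers n = 1, 2, 3")


def check_fermat(limit):
    """Compare both sides of (1.1) for every odd prime p <= limit."""
    return all(
        represented(p, n) == fermat_criterion(p, n)
        for p in range(3, limit + 1, 2)
        if is_prime(p)
        for n in (1, 2, 3)
    )
```

A call such as `check_fermat(2000)` returns `True`, in agreement with the proofs mentioned in the text.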

Fermat claimed to have proofs for all these statements, but there is no evidence that this was indeed the case. It took about one hundred years before Euler actually provided complete proofs for Fermat’s assertions (see [Cox97, §1.1] and the references there). Similar statements for primes of the shape x² + ny² for particular values of n > 3 were conjectured by Euler and most of them were later proved by Lagrange and Gauß by means of quadratic reciprocity. However, it slowly became clear that congruence relations like these could not exist for all positive integers n, let alone for arbitrary binary quadratic forms. Only the evolution of the theory of the composition of forms and form classes as well as the development of ideal theory and the ensuing relation between classes of quadratic forms and ideal classes in quadratic fields made it possible to find conditions for the representability of primes by general binary quadratic forms and statements for the frequency of such primes.

Before going into details in the upcoming sections, let us recall the basic definitions and fix certain assumptions concerning binary quadratic forms: An integral binary quadratic form is a homogeneous polynomial of the shape

\[
f(x, y) = ax^2 + bxy + cy^2
\]

in two variables over the ring of rational integers. Such a form is called primitive if the coefficients a, b and c share no common prime divisor; all forms in this thesis will be assumed to be integral and primitive. We say that the binary quadratic form f represents an integer m if there exist integers x and y such that f(x, y) = m. The discriminant of such a binary quadratic form is defined to be q = b² − 4ac; note that this implies q ≡ 0 or 1 (mod 4). If q = 0, then the form can only represent squares of integers. If q > 0, then the form can represent both positive and negative integers; such a form is called indefinite. If q < 0, then the form can represent either only negative integers or only positive integers (depending on the sign of the coefficient a); the form is accordingly called negative definite or positive definite.
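The definitions above translate directly into a few lines of code. The following Python sketch is illustrative only: the function names and the label "zero discriminant" are our own, and `is_fundamental` encodes conditions (a) and (b) from the general assumptions in the Notation section. It can be used, for example, to check numerically that the discriminant −4n of x² + ny² is fundamental exactly when n is squarefree and n ≢ 3 (mod 4).

```python
from math import gcd


def discriminant(a, b, c):
    """Discriminant q = b^2 - 4ac of the form a*x^2 + b*x*y + c*y^2."""
    return b * b - 4 * a * c


def is_primitive(a, b, c):
    """True if the coefficients a, b, c share no common prime divisor."""
    return gcd(gcd(a, b), c) == 1


def classify(a, b, c):
    """Classify a form by the sign of its discriminant and of a."""
    q = discriminant(a, b, c)
    if q > 0:
        return "indefinite"
    if q == 0:
        return "zero discriminant"
    # q < 0 forces a*c > 0, so the sign of a decides the type.
    return "positive definite" if a > 0 else "negative definite"


def is_squarefree(n):
    """True if no square of a prime divides the positive integer n."""
    d = 2
    while d * d <= n:
        if n % (d * d) == 0:
            return False
        d += 1
    return True


def is_fundamental(q):
    """Test conditions (a) and (b) for a negative integer q."""
    if q >= 0:
        return False
    if q % 4 == 1:  # condition (a); Python's % is non-negative here
        return is_squarefree(-q)
    if q % 4 == 0:  # condition (b)
        m = q // 4
        return m % 4 in (2, 3) and is_squarefree(-m)
    return False
```

For instance, `classify(1, 0, 5)` returns `"positive definite"` and `discriminant(1, 0, 5)` returns `-20`, which `is_fundamental` accepts.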

In this thesis we shall only deal with binary quadratic forms of negative discriminant (in fact, only with positive definite forms, since we are interested in the representability of primes) as the theory of these forms is considerably simpler than the theory of forms of positive discriminants. The difference is basically due to the fact that, if the discriminant is negative, then the equation k = ax² + bxy + cy² (for fixed integers k, a, b and c) is the equation of an ellipse, which contains only finitely many lattice points; it is, however, the equation of a hyperbola, which may contain an infinite number of lattice points, if the discriminant is positive. We quote Gauß: Formae vero determinantium positivorum, quae tractationem prorsus peculiarem requirunt, commentationi alteri reservatae manere debebunt¹ (roughly: the forms of positive determinant, which require an entirely separate treatment, will have to remain reserved for another treatise), and we also hope that the questions which are examined in this thesis for forms of negative discriminant will be investigated for forms of positive discriminant in another work. Consequently, we will also state all classical results in this chapter for positive definite binary quadratic forms only, regardless of the possible existence of analogous results for indefinite forms. In order to avoid excessive repetitions of these assumptions, we adopt the following convention:

Convention: Whenever we say form or quadratic form or binary quadratic form in this thesis, we mean a positive definite integral primitive binary quadratic form.
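The finiteness of the number of lattice points on the ellipse k = ax² + bxy + cy² can be made explicit by completing the square: 4a·f(x, y) = (2ax + by)² − q·y², so for q < 0 and a > 0 every solution satisfies |q|·y² ≤ 4ak. The Python sketch below uses this bound to list all representations of k by a positive definite form; it is a minimal illustration under these assumptions, and the function name `representations` is hypothetical.

```python
from math import isqrt


def representations(a, b, c, k):
    """List all integer pairs (x, y) with a*x^2 + b*x*y + c*y^2 == k.

    Assumes a positive definite form, i.e. b^2 - 4ac < 0 and a > 0.
    """
    q = b * b - 4 * a * c
    if not (q < 0 and a > 0):
        raise ValueError("the form must be positive definite")
    if k < 0:
        return []
    solutions = []
    # From 4ak = (2ax + by)^2 + |q| y^2 we get y^2 <= 4ak / |q|.
    y_max = isqrt((4 * a * k) // (-q))
    for y in range(-y_max, y_max + 1):
        r = 4 * a * k + q * y * y  # must equal (2ax + by)^2
        t = isqrt(r)
        if t * t != r:
            continue
        for s in {t, -t}:
            numerator = s - b * y  # equals 2ax
            if numerator % (2 * a) == 0:
                solutions.append((numerator // (2 * a), y))
    return sorted(solutions)
```

For example, `representations(1, 0, 1, 5)` lists the eight lattice points on the circle x² + y² = 5, while `representations(1, 0, 1, 3)` returns an empty list.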

This introductory chapter intends to give a brief account of the basic, mostly classical results in the prime number theory for binary quadratic forms of a fixed discriminant. It is organized as follows: In the next section we start with a review of the theory of composition of form classes of binary quadratic forms, which we basically owe to Dirichlet, but which has its origins already in the works of Lagrange and Gauß, who composed forms instead of form classes. It was exactly this transition that was essential to answer a broad variety of questions on the representability of integers by such forms and made it possible to prove general qualitative as well as the first quantitative statements on the infinitude of primes that are representable by any given primitive binary quadratic form. The convenience of working with classes became even clearer after the introduction of the theory of “ideal numbers” by Kummer and its development by Dedekind. The relation between form classes and ideal classes is exhibited in Section 1.2 and the resulting more precise quantitative prime number theorem is given. Advances in class field theory (in the shape of the Chebotarev density theorem) led to even better results, conditional and unconditional ones, which are covered in Section 1.3.

There exist plenty of excellent books and papers that present many of these topics in a much more detailed and elaborate way than would be possible and appropriate here. The author profited particularly from the books Primes of the form $x^2 + ny^2$ [Cox97], Zetafunktionen und quadratische Körper [Zag81] and The shaping of arithmetic after C. F. Gauss’s Disquisitiones arithmeticae [GSS07] while writing this introductory chapter. We would like to end this opening section by reminding the reader that binary quadratic forms have found important applications in cryptography, which we cannot discuss here. Extensive accounts of the underlying algorithms can be found in [BV07], [Bue89] and [Coh93], for example.

¹ “The forms with positive determinant [discriminant], which require a special treatment, must remain reserved for other studies.”; quoted from the introduction to Gauß’s article De nexu inter multitudinem classium, in quas formae binariae secundi gradus distribuuntur, earumque determinantem, which can be found in the second volume of his collective works as well as in the cumulative German translation Untersuchungen über höhere Arithmetik of his number-theoretic works. Similarly, de la Vallée Poussin stated that “tandis que l’extension se fait naturellement à ces dernières [i.e., forms with negative discriminant], les formes de déterminant positif exigent une analyse beaucoup plus compliquée” in the third part [dlVP96] of his Recherches analytiques sur la théorie des nombres premiers in which he proved the prime number theorem for positive definite binary quadratic forms; he provided the analysis for indefinite forms one year later, in the fourth part [dlVP97] of his work.

1.1 The composition of binary quadratic forms and form classes

Gauß’s Disquisitiones Arithmeticae, published in 1801, are widely regarded as the beginning of modern number theory, and about half of this work is devoted to the theory of binary quadratic forms. Building on former definitions and results from Lagrange’s Recherches d’Arithmétique, Gauß revealed here the importance and depth of the notion of equivalence of binary quadratic forms and of the way in which these forms can be composed. Edwards [GSS07, §II.2.1] notes that one of Gauß’s purposes in presenting the theory of composition in its full generality was to give another proof of quadratic reciprocity – a simpler proof than the one he gave in an earlier part of the Disquisitiones.

However, in order to be able to derive the laws and properties of the composition for arbitrarily given forms, long and complicated computations are necessary and the resulting composition is not even a binary operation. Later, Dirichlet “simplified” Gauß’s composition by forfeiting the capability of composing arbitrary forms and contenting himself with the ability to compose certain forms which are equivalent to the given ones. Thus, his composition of forms is really a composition of equivalence classes of forms. Yet, this kind of composition was sufficient for his questions on the representability of numbers by binary quadratic forms – and therefore it is also sufficient for the questions we will be concerned with in this thesis. We now describe this method of composing equivalence classes of binary quadratic forms.

First of all, we must say what we mean by equivalence of quadratic forms: Two binary quadratic forms f and g are (properly) equivalent if there exists an element
\[ \begin{pmatrix} r & s \\ t & u \end{pmatrix} \in \mathrm{SL}(2, \mathbb{Z}) \]
such that f(x, y) = g(rx + sy, tx + uy) for all x, y ∈ Z. A short calculation shows that equivalent forms have the same discriminant.
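The "short calculation" can also be checked mechanically. The following is a minimal sketch in Python, not part of the original text: the helper names `discriminant` and `substitute` and the sample form $2x^2 + xy + 3y^2$ are illustrative assumptions. It expands g(rx + sy, tx + uy) coefficient by coefficient and compares discriminants.

```python
def discriminant(form):
    """Discriminant b^2 - 4ac of the form ax^2 + bxy + cy^2."""
    a, b, c = form
    return b * b - 4 * a * c


def substitute(form, matrix):
    """Coefficients of f(x, y) = g(r*x + s*y, t*x + u*y) for g = form.

    `matrix` is ((r, s), (t, u)) and must have determinant 1.
    """
    a, b, c = form
    (r, s), (t, u) = matrix
    if r * u - s * t != 1:
        raise ValueError("matrix must lie in SL(2, Z)")
    new_a = a * r * r + b * r * t + c * t * t            # coefficient of x^2
    new_b = 2 * a * r * s + b * (r * u + s * t) + 2 * c * t * u  # of xy
    new_c = a * s * s + b * s * u + c * u * u            # coefficient of y^2
    return (new_a, new_b, new_c)


# Illustrative example (an assumption, not taken from the text):
g = (2, 1, 3)                          # 2x^2 + xy + 3y^2
f = substitute(g, ((1, 1), (0, 1)))    # f(x, y) = g(x + y, y)
print(f, discriminant(f), discriminant(g))
```

Both forms in this example have discriminant −23, as the general statement predicts.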
Moreover, it can be shown that equivalence of binary quadratic forms is indeed an equivalence relation and the number of equivalence classes – which we call form classes – is finite for any given discriminant of forms; see [Cox97, §2], for example. We denote the set of form classes of forms with discriminant q by K(q) and its cardinality by h(q). The main importance of this classification lies in the following fact (see [Zag81, §8], for example): Equivalent forms represent the same numbers. Therefore, we may define the set
\[ R(q, C) = \{\, n \in \mathbb{Z} \mid \forall f \in C \ \exists x, y \in \mathbb{Z} : f(x, y) = n \,\} \]
for every negative integer q ≡ 0, 1 (mod 4) and every form class C ∈ K(q). We proceed to the definition of the composition of equivalence classes: Let

\[ f(x, y) = ax^2 + bxy + cy^2 \quad\text{and}\quad g(x, y) = a'x^2 + b'xy + c'y^2 \]

be two forms of negative discriminant q and assume that the coefficients $a$, $a'$ and $\frac{b+b'}{2}$ have no common prime divisor. Then the (Dirichlet) composition F of f and g is the form

\[ F(x, y) = aa'x^2 + Bxy + \frac{B^2 - q}{4aa'}\,y^2, \]

where B is the unique integer modulo $2aa'$ such that

\[ B \equiv b \pmod{2a}, \qquad B \equiv b' \pmod{2a'} \qquad\text{and}\qquad B^2 \equiv q \pmod{4aa'}. \]

See [Cox97, Lemma 3.2] for a proof of the uniqueness of B. Now, one can show (see the references given in [Cox97, §3]) that this composition of special forms induces a well-defined binary operation on K(q) and turns it into an abelian group – the form class group of discriminant q – with order h(q) and the following identity element and inverses: Given a negative integer q ≡ 0, 1 (mod 4), the principal form of discriminant q is defined by
\[ \begin{cases} x^2 - \dfrac{q}{4}\,y^2 & \text{if } q \equiv 0 \pmod 4, \\[1ex] x^2 + xy + \dfrac{1-q}{4}\,y^2 & \text{if } q \equiv 1 \pmod 4. \end{cases} \]
The form class containing the principal form is the identity element of the class group K(q) and it is called its principal class; we denote it by $C_0(q)$ (or simply by $C_0$ if the corresponding discriminant is clear from the context). The inverse of the class which contains the form $ax^2 + bxy + cy^2$ is the class which contains the form $ax^2 - bxy + cy^2$. We say that a class is an ambiguous class if its order in K(q) is at most 2; any form in an ambiguous class is called an ambiguous form.

Switching the attention from quadratic forms to their equivalence classes led to the advent of the first results on the number of primes that may be represented by any given form. Dirichlet already sketched a proof of the infinitude of primes representable by certain binary quadratic forms in 1840, but it was Weber [Web82] who gave the first complete proof that held for all primitive forms:

Theorem 1.1 (Weber). Every positive definite integral primitive binary quadratic form represents infinitely many primes.

De la Vallée Poussin is best known for his proof of the Prime Number Theorem (which was independently proved by Hadamard at about the same time). This work [dlVP96] is even more remarkable if one recalls that it not only contains the proofs of the ordinary prime number theorem and the corresponding one for primes in arithmetic progressions, but he also proved there:

Theorem 1.2 (De la Vallée Poussin). Let π(X; q, C) be the number of primes p ≤ X that may be represented by forms in the form class C of the form class group of the negative discriminant q. Then
\[ \pi(X; q, C) = \frac{\operatorname{li}(X)}{e(C)h(q)} \cdot \bigl(1 + o(1)\bigr) \quad\text{as } X \to \infty, \]
where e(C) = 2 if C is an ambiguous class and e(C) = 1 otherwise.

The original proofs of these theorems were quite long-winded. In the next section, we will see how the groups K(q), which consist of arithmetic objects, can be linked to groups of algebraic objects, which turn out to be more convenient to work with. This link led to shorter and more precise forms of the above statements and will also be the basis of our results.
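The shape of Theorem 1.2 can be observed numerically. The sketch below is an illustration under assumptions that are not in the text: the discriminant q = −23 (with h(q) = 3, principal form $x^2 + xy + 6y^2$ and non-principal reduced forms $2x^2 \pm xy + 3y^2$), the bound X = 5000, and the helper names. It uses π(X) as a stand-in for li(X), so the printed targets are only heuristic.

```python
from math import isqrt


def primes_up_to(n):
    """Sieve of Eratosthenes; returns the list of primes <= n (n >= 1)."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b"\x00\x00"
    for i in range(2, isqrt(n) + 1):
        if sieve[i]:
            sieve[i * i::i] = bytearray(len(range(i * i, n + 1, i)))
    return [i for i in range(n + 1) if sieve[i]]


def represented_primes(form, X):
    """Primes p <= X of the shape a*x^2 + b*x*y + c*y^2 (positive definite form)."""
    a, b, c = form
    q = b * b - 4 * a * c                  # negative discriminant
    prime_set = set(primes_up_to(X))
    found = set()
    # 4a*f(x, y) = (2ax + by)^2 + |q| y^2 bounds both variables.
    y_max = isqrt(4 * a * X // -q)
    for y in range(0, y_max + 1):          # f(-x, -y) = f(x, y), so y >= 0 suffices
        x_max = (isqrt(4 * a * X) + abs(b) * y) // (2 * a) + 1
        for x in range(-x_max, x_max + 1):
            value = a * x * x + b * x * y + c * y * y
            if value <= X and value in prime_set:
                found.add(value)
    return found


X = 5000
total = len(primes_up_to(X))
principal = represented_primes((1, 1, 6), X)   # ambiguous class, e(C) = 2
other = represented_primes((2, 1, 3), X)       # non-ambiguous class, e(C) = 1
print(len(principal), "vs heuristic target", total / 6)
print(len(other), "vs heuristic target", total / 3)
```

The forms $2x^2 + xy + 3y^2$ and $2x^2 - xy + 3y^2$ represent the same primes, so only one of them is enumerated.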

Remark 1.3. There exists another natural classification of binary quadratic forms, which is also due to Gauß: We say that two forms of discriminant q lie in the same genus if they represent the same values in $(\mathbb{Z}/q\mathbb{Z})^*$. Equivalent forms are always in the same genus, but the converse is usually not true. The most important properties are:

(a) All genera of forms of discriminant q consist of the same number of form classes. If this number is 1 and q = −4n for some positive integer n, then there exists a congruence condition like (1.1) which characterizes the primes of the shape $x^2 + ny^2$.

(b) The genus containing the principal form is called the principal genus. It consists of the squares in the form class group.

(c) Let q be the discriminant of a positive definite form. The number of genera of forms of discriminant q is given by
\[ \kappa(q) = \begin{cases} 2^{\omega(q)-1} & \text{if } q \text{ is odd}, \\ 2^{\omega(q)-2} & \text{if } q \text{ is even and } \frac{q}{4} \equiv 1, 5 \pmod 8, \\ 2^{\omega(q)-1} & \text{if } q \text{ is even and } \frac{q}{4} \equiv 2, 3, 4, 6, 7 \pmod 8, \\ 2^{\omega(q)} & \text{if } q \text{ is even and } \frac{q}{4} \equiv 0 \pmod 8, \end{cases} \tag{1.2} \]

where ω(q) denotes the number of distinct prime divisors of q. Both the number of ambiguous classes in K(q) and the index of the subgroup $(K(q))^2$ in K(q) are also equal to κ(q).

The proofs of these properties can be found in [Cox97, §3]. We will come across genera and the numbers κ(q) in Chapter 3, but for the most part of this thesis form classes will be more important for us.
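As a sanity check of formula (1.2), the following Python sketch compares κ(q) with the number of ambiguous classes for a few discriminants. It rests on assumptions that are not stated in this chapter: form classes are enumerated through reduced forms (|b| ≤ a ≤ c, with b ≥ 0 if |b| = a or a = c), and a reduced form (a, b, c) is taken to lie in a class of order at most 2 exactly when b = 0, a = b or a = c (a standard criterion). The function names are hypothetical.

```python
from math import gcd, isqrt


def omega(n):
    """Number of distinct prime divisors of |n|."""
    n, count, p = abs(n), 0, 2
    while p * p <= n:
        if n % p == 0:
            count += 1
            while n % p == 0:
                n //= p
        p += 1
    return count + (1 if n > 1 else 0)


def kappa(q):
    """Number of genera according to formula (1.2); q is a negative discriminant."""
    w = omega(q)
    if q % 2 == 1:
        return 2 ** (w - 1)
    residue = (q // 4) % 8        # Python's % returns a value in 0..7 for negative q/4
    if residue in (1, 5):
        return 2 ** (w - 2)
    if residue == 0:
        return 2 ** w
    return 2 ** (w - 1)


def reduced_forms(q):
    """All primitive reduced forms (a, b, c) of negative discriminant q."""
    forms = []
    for a in range(1, isqrt(-q // 3) + 1):
        for b in range(-a + 1, a + 1):
            if (b * b - q) % (4 * a):
                continue
            c = (b * b - q) // (4 * a)
            if c < a or (a == c and b < 0):
                continue
            if gcd(gcd(a, b), c) == 1:
                forms.append((a, b, c))
    return forms


def ambiguous_count(q):
    """Number of reduced forms of discriminant q with b = 0, a = b or a = c."""
    return sum(1 for (a, b, c) in reduced_forms(q) if b == 0 or b == a or a == c)


for q in (-20, -23, -84, -96):
    print(q, len(reduced_forms(q)), kappa(q), ambiguous_count(q))
```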

1.2 Algebraic methods for arithmetic objects

Kummer, in his endeavours to find a way to compensate for the lack of unique factorization in the rings of integers of cyclotomic number fields, introduced the notion of “ideal numbers” in 1847. According to [GSS07, §II.2.1], Kummer was led to the definition of equivalence classes of these numbers by the way Gauß had partitioned binary quadratic forms into classes. The intimate relation to binary quadratic forms persisted when Dedekind generalized Kummer’s concept and introduced the language of ideals.

Before we can state this connection explicitly, we have to fix the notation that we will use for certain notions of algebraic number theory: Every quadratic number field K can be written uniquely in the form $\mathbb{Q}(\sqrt{q})$ for a squarefree integer $q \neq 0, 1$. Its discriminant $d_K$ is given by $d_K = q$ if q ≡ 1 (mod 4) and by $d_K = 4q$ otherwise. The union of the set of integers which are discriminants of quadratic fields and {1} is called the set of fundamental discriminants. We will denote the set of all negative fundamental discriminants by F, i.e.
\[ F = \{\, d \in \mathbb{Z} \mid d < 0,\ d \equiv 1 \pmod 4 \text{ and } d \text{ is squarefree} \,\} \cup \{\, d \in \mathbb{Z} \mid d < 0,\ d \equiv 0 \pmod 4 \text{ and } \tfrac{d}{4} \not\equiv 1 \pmod 4 \text{ is squarefree} \,\}. \tag{1.3} \]
Furthermore, the set of all negative integers q ≡ 0, 1 (mod 4), i.e. the set of negative discriminants of quadratic forms, will be denoted by D.

Let q ∈ D. Then there exist a unique positive integer r (which is called the conductor of q) and a fundamental discriminant $q_0 \in F$ such that $q = r^2 q_0$. Moreover, there exists a unique order of discriminant q in $\mathbb{Q}(\sqrt{q_0})$: Recall that an order $\mathcal{O}$ in a quadratic field K is any subring of K containing 1 such that $\mathcal{O}$ is a finitely generated $\mathbb{Z}$-module that contains a $\mathbb{Q}$-basis of K. For example, the ring of integers of K is always an order in K and, in fact, the maximal one. The discriminant of any order $\mathcal{O}$ in K is the product of the square of the index of $\mathcal{O}$ in the ring of integers and the discriminant of the field. For any given discriminant q ∈ D of a binary quadratic form, we will only be interested in the order of discriminant q in $\mathbb{Q}(\sqrt{q_0})$, which we denote by $\mathcal{O}(q)$; the ring of integers of $\mathbb{Q}(\sqrt{q_0})$ will be denoted by $\mathcal{O}_q$. Note that $\mathcal{O}(q)$ equals $\mathcal{O}_q$ if q ∈ F.
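The decomposition $q = r^2 q_0$ into conductor and fundamental discriminant is easy to compute for small q. The following Python sketch is only an illustration; the function names are hypothetical and the trial-division approach is an assumption chosen for brevity, not efficiency. The test for membership in F follows (1.3) literally.

```python
from math import isqrt


def is_squarefree(n):
    """True if no square of a prime divides n (n nonzero)."""
    n = abs(n)
    return all(n % (p * p) for p in range(2, isqrt(n) + 1))


def is_fundamental(d):
    """Membership test for the set F in (1.3), for negative integers d."""
    if d % 4 == 1:
        return is_squarefree(d)
    if d % 4 == 0:
        # d/4 squarefree and not congruent to 1 mod 4 means d/4 = 2 or 3 mod 4
        return (d // 4) % 4 in (2, 3) and is_squarefree(d // 4)
    return False


def decompose(q):
    """Return (r, q0) with q = r^2 * q0, r the conductor and q0 in F."""
    if q >= 0 or q % 4 not in (0, 1):
        raise ValueError("q must be a negative integer congruent to 0 or 1 mod 4")
    for r in range(isqrt(-q), 0, -1):
        if q % (r * r) == 0 and is_fundamental(q // (r * r)):
            return r, q // (r * r)
    raise ArithmeticError("unreachable for valid discriminants")


for q in (-12, -16, -20, -23, -72):
    print(q, decompose(q))
```

For example, −12 has conductor 2 and fundamental discriminant −3, whereas −20 is itself fundamental.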

For all q ∈ D we define:

• I(q), the group of invertible fractional $\mathcal{O}(q)$-ideals, i.e. the group of invertible finitely generated $\mathcal{O}(q)$-submodules of $\mathbb{Q}(\sqrt{q_0})$; note that fractional $\mathcal{O}(q)$-ideals are usually not ring ideals of $\mathcal{O}(q)$;

• P(q), the subgroup of principal fractional $\mathcal{O}(q)$-ideals;

• H(q), the quotient I(q)/P(q), i.e. the ideal class group of the order $\mathcal{O}(q)$;

• Z(q), the set of non-zero integral $\mathcal{O}(q)$-ideals;

• $N(\mathfrak{a})$, the norm of the ideal $\mathfrak{a} \in Z(q)$, i.e. the size of the quotient ring $\mathcal{O}(q)/\mathfrak{a}$ (the dependence on q is suppressed).

The algebraic properties of all these objects are explained in [Cox97, §5 and §7], for example. At this point, we just recall that H(q) is always a finite abelian group and say, analogously to the notion for form classes, that an ideal class K ∈ H(q) is ambiguous if $K = K^{-1}$ in H(q). Binary quadratic forms and ideal classes are linked through the following result, which is due to Dedekind:

Lemma 1.4 (Dedekind). For every negative discriminant q, there exists an isomorphism

\[ B_q : K(q) \to H(q) \]

which is induced by the map that sends the binary quadratic form $f(x, y) = ax^2 + bxy + cy^2$ to the ideal of $\mathcal{O}(q)$ that is generated by $a$ and $\frac{-b+\sqrt{q}}{2}$. In particular, we have h(q) = |K(q)| = |H(q)|.

Moreover, a positive integer m is represented by the positive definite binary quadratic forms in the class C ∈ K(q) if and only if there exists an ideal $\mathfrak{a} \in B_q(C)$ such that $N(\mathfrak{a}) = m$. A proof can be found in [Cox97, Theorem 7.7], for example. This relation helped to drive forward the development of algebraic number theory thanks to the extensive theory Gauß had created on the arithmetic side of this bijection. On the other hand, it turned out that many statements on binary quadratic forms can be proved in a simpler way by using the amenities of the algebraic side.

Remark 1.5. A major drawback of orders is the fact that they are usually not Dedekind domains (i.e., the factorization is not unique at the level of ideals); on the other hand, the ring of integers $\mathcal{O}_q$ is always a Dedekind domain. However, it turns out that this problem is not a severe one for most questions on primes represented by binary quadratic forms. In fact, if q ∈ D with $q = r^2 q_0$, where $q_0 \in F$ and r is a positive integer, we let $I_r$ denote the group of fractional $\mathcal{O}_q$-ideals $\mathfrak{a}$ satisfying $\mathfrak{a} + r\mathcal{O}_q = \mathcal{O}_q$. Moreover, let $P_r$ denote the subgroup of $I_r$ generated by principal ideals of the form $\alpha\mathcal{O}_q$ such that $\alpha \in \mathcal{O}_q$ satisfies $\alpha \equiv a \pmod{r\mathcal{O}_q}$ for some integer a with (a, r) = 1. Then one can show (see [Cox97, §7]) that there exists an isomorphism $\widetilde{B}_q : H(q) \to I_r/P_r$. In particular, it follows with Lemma 1.4 that a positive integer m satisfying (m, r) = 1 is represented by the positive definite binary quadratic forms in C ∈ K(q) if and only if there exists an $\mathcal{O}_q$-ideal $\mathfrak{a} \in \widehat{B}_q(C) := \widetilde{B}_q(B_q(C))$ such that $|\mathcal{O}_q/\mathfrak{a}| = m$.

We state the resulting qualitative information about the representability of integers by binary quadratic forms of a given discriminant in a more explicit way:

Proposition 1.6. Let q be a negative discriminant and let C ∈ K(q) be a form class. Write $q = r^2 q_0$, where r is the conductor of q and $q_0 \in F$.

(a) Let p be a prime which does not divide r. Then there exist x, y ∈ Z and a binary quadratic form f of discriminant q such that f(x, y) = p if and only if either

(i) p ramifies in $\mathcal{O}_q$, i.e. there exists a prime ideal $\mathfrak{p}$ in $\mathcal{O}_q$ with $p\mathcal{O}_q = \mathfrak{p}^2$; in this case p may be represented by forms of the class C if and only if $\mathfrak{p} \in \widehat{B}_q(C)$ and p is then representable by forms of the class C only; or

(ii) p splits in $\mathcal{O}_q$, i.e. there exist distinct prime ideals $\mathfrak{p}_1, \mathfrak{p}_2$ in $\mathcal{O}_q$ with

\[ p\mathcal{O}_q = \mathfrak{p}_1\mathfrak{p}_2. \]

In this case p may be represented by forms of the class C if and only if $\mathfrak{p}_1 \in \widehat{B}_q(C)$ or $\mathfrak{p}_2 \in \widehat{B}_q(C)$; in particular, $\mathfrak{p}_1 \in \widehat{B}_q(C)$ if and only if $\mathfrak{p}_2 \in (\widehat{B}_q(C))^{-1}$, i.e. p is representable exactly by forms of the classes C and $C^{-1}$.

(b) Let n be a positive integer which is coprime to r and let

\[ n = \prod p_i^{\alpha_i} \prod r_j^{\beta_j} \prod s_k^{\gamma_k} \prod t_\ell^{\delta_\ell} \]

be its prime factorization, where the first product is taken over all primes which split in $\mathcal{O}_q$ and are representable by forms of ambiguous classes, the second product is taken over all primes which split in $\mathcal{O}_q$ and are representable by forms of non-ambiguous classes, the third product is taken over all primes which remain prime in $\mathcal{O}_q$ and the fourth product is taken over all primes which ramify in $\mathcal{O}_q$.

Denote the class which represents the prime $p_i$ by $C_{p_i}$, the classes which represent the prime $r_j$ by $C_{r_j}$ and $C_{r_j}^{-1}$, the class which represents the square $s_k^2$ by $C_{s_k}$ (which must always be the principal class) and the class which represents the prime $t_\ell$ by $C_{t_\ell}$ (which must always be an ambiguous class). Then n is representable by forms of discriminant q if and only if all the exponents $\gamma_k$ are even and it is then representable by exactly the classes of the form

\[ \prod C_{p_i}^{\alpha_i} \prod C_{r_j}^{\beta_j - v_j} \prod C_{s_k}^{\gamma_k/2} \prod C_{t_\ell}^{\delta_\ell} = \prod C_{p_i}^{\alpha_i} \prod C_{r_j}^{\beta_j - v_j} \prod C_{t_\ell}^{\delta_\ell} \]

for all tuples $(v_j)$ of integers $v_j \in \{0, 2, \ldots, 2\beta_j\}$.

Remark 1.7. Most of the classical results that we will present in this chapter are known to hold for both non-fundamental and fundamental discriminants. It should also be possible to prove many of the original results in Chapter 2 and Chapter 3 in a general form for both kinds of discriminants – along the lines of the proofs that we will give for fundamental discriminants only. However, we believe that the amount of additional technical details that are usually necessary for general proofs – due to the peculiarities of the square factors of non-fundamental discriminants – would often eclipse the main arguments. From now on, we will therefore restrict our attention to fundamental discriminants. In particular, we may henceforth always assume that $\mathcal{O}(q) = \mathcal{O}_q$.

Remark 1.8. If a negative fundamental discriminant q and a form class C ∈ K(q) are given and we want to estimate a sum of the form

\[ \sum_{\substack{n \le X \\ n \in R(q,C)}} g(n) \]
for some arithmetic function g, then Proposition 1.6 allows us to equivalently estimate the sum
\[ \sum_{n \le X} \sum_{\substack{\mathfrak{a} \in B_q(C) \\ N(\mathfrak{a}) = n}} \frac{g(n)}{v(C, n)}, \]
where the weight function v(C, n) accounts for the fact that, in general, the number of ideals $\mathfrak{a} \in B_q(C)$ with norm n does not equal 1. Thus, if there exists in $B_q(C)$ an ideal $\mathfrak{a}$ with norm n, then v(C, n) is the number of ideals $\mathfrak{a} \in B_q(C)$ with $N(\mathfrak{a}) = n$. For further use, we also set

\[ w(C, n) = \sum_{\substack{\mathfrak{a} \in B_q(C) \\ N(\mathfrak{a}) = n}} 1. \tag{1.4} \]

Note that v(C, n) remains undefined if there is no ideal $\mathfrak{a} \in B_q(C)$ with $N(\mathfrak{a}) = n$, while w(C, n) = 0 in this case; thus, we have w(C, n) = 0 if and only if n ∉ R(q, C), by Lemma 1.4. Using Proposition 1.6, we may also give an expression for w(C, n) which does not use the language of ideals but only the language of form classes: We have

\[ w(C, n) = \Bigl|\Bigl\{ (v_j) : C = \prod C_{p_i}^{\alpha_i} \prod C_{r_j}^{\beta_j - v_j} \prod C_{t_\ell}^{\delta_\ell} \Bigr\}\Bigr| \cdot \prod_i (\alpha_i + 1); \tag{1.5} \]
the second factor arises from the $(\alpha_i + 1)$ possibilities when choosing the prime ideals which lie over each split prime $p_i$ that is representable by forms of an ambiguous class.

Remark 1.9. The arithmetic functions we will be most interested in are the characteristic function for the set of rational primes and (smooth versions of) the von Mangoldt function. For all (positive or negative) fundamental discriminants q, let $\chi_q$ denote the Kronecker symbol $\left(\frac{q}{\cdot}\right)$ (see [IK04, §3.5] or [MV07, §9.3] for the explicit definition); it equals the unique primitive real Dirichlet character modulo |q| if q ≢ 0 (mod 8) (there are two primitive real Dirichlet characters if q ≡ 0 (mod 8)). For each odd rational prime p, the number of solutions m (mod p) to $m^2 \equiv q \pmod p$ equals $1 + \chi_q(p)$. Moreover, for every rational prime p, one can easily show (see [Cox97, Proposition 5.16], for example):

• If $\chi_q(p) = 0$, i.e. if p divides q, then p ramifies in $\mathcal{O}(q)$, i.e. $p\mathcal{O}(q) = \mathfrak{p}^2$ for some prime ideal $\mathfrak{p}$ of $\mathcal{O}(q)$ and $N(\mathfrak{p}) = p$;

• if $\chi_q(p) = 1$, then p splits in $\mathcal{O}(q)$, i.e. $p\mathcal{O}(q) = \mathfrak{p}_1\mathfrak{p}_2$ for two distinct prime ideals $\mathfrak{p}_1, \mathfrak{p}_2$ of $\mathcal{O}(q)$ and $N(\mathfrak{p}_1) = N(\mathfrak{p}_2) = p$;

• if $\chi_q(p) = -1$, then p remains prime in $\mathcal{O}(q)$, i.e. $p\mathcal{O}(q) = \mathfrak{p}$ is a prime ideal in $\mathcal{O}(q)$ and $N(\mathfrak{p}) = p^2$.

Consequently, Proposition 1.6 implies that, if $n = p^\ell$ for a prime p and a positive integer ℓ and if n can be represented by the forms in the class C ∈ K(q), then
\[ w(C, n) = w(C, p^\ell) \begin{cases} \le \ell + 1 & \text{if } \chi_q(p) = 1, \\ = 1 & \text{if } \chi_q(p) = 0, \\ = 1 & \text{if } \chi_q(p) = -1 \text{ (and } \ell \text{ must be even)}. \end{cases} \tag{1.6} \]
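The trichotomy above can be explored with a small Python sketch. It is an illustration under stated assumptions: the function names are hypothetical, the Kronecker symbol is only evaluated at primes (via Euler's criterion for odd p and the usual rule modulo 8 for p = 2), and the comparison with the number of square roots of q is restricted to odd primes, since modulo 2 the congruence $m^2 \equiv q$ always has exactly one solution.

```python
def kronecker_at_prime(q, p):
    """Value of the Kronecker symbol (q/p) at a prime p."""
    if p == 2:
        if q % 2 == 0:
            return 0
        return 1 if q % 8 in (1, 7) else -1
    residue = pow(q % p, (p - 1) // 2, p)   # Euler's criterion: 0, 1 or p - 1
    return -1 if residue == p - 1 else residue


def count_square_roots(q, p):
    """Number of residues m modulo p with m^2 = q (mod p)."""
    return sum(1 for m in range(p) if (m * m - q) % p == 0)


def splitting_type(q, p):
    """Behaviour of p in the quadratic order of fundamental discriminant q."""
    return {0: "ramifies", 1: "splits", -1: "remains prime"}[kronecker_at_prime(q, p)]


q = -23   # illustrative fundamental discriminant (an assumption)
for p in (2, 3, 5, 7, 11, 13, 23):
    print(p, splitting_type(q, p))
```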

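Only a small set of primes ramifies in $\mathcal{O}(q)$. Thus, if the number w(C, p) is positive, it will usually be given by w(C, p) = 2 if C is ambiguous, and w(C, p) = 1 otherwise. For further use, we therefore put
\[ e(C) = \begin{cases} 2 & \text{if } C \text{ is ambiguous}, \\ 1 & \text{if } C \text{ is not ambiguous}. \end{cases} \tag{1.7} \]
Note that we thus have

\[ \sum_{C \in K(q)} \sum_{p \le X} w(C, p) = \sum_{C \in K(q)} e(C) \sum_{\substack{p \le X \\ p \in R(q,C)}} 1 \;-\; \sum_{\substack{p \le X \\ p \mid q}} 1 = \sum_{p \le X} \bigl(1 + \chi_q(p)\bigr). \tag{1.8} \]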

In Chapter 2, we will be interested in questions of uniformity. That is, given a large real number X and a “reasonable” arithmetic function g, we would like to know whether there exists an estimate for
\[ \sum_{\substack{n \le X \\ n \in R(q,C)}} g(n) \tag{1.9} \]
which is uniform in (i.e., independent of) the choice of the form class C ∈ K(q) and the error term of which is also uniform in the choice of the discriminant q. Thus, one would instinctively expect a “reasonable” function to be a function which shows no obvious reason to favour any classes or discriminants; the sum in (1.9) should, for any specific form class, therefore not differ much from the average over all C ∈ K(q) of such sums, for all q in some large range. However, the distinct behaviour of ambiguous and non-ambiguous classes in their capability to represent primes, which is evident from Proposition 1.6 and the remarks above, shows that we usually cannot expect estimates that are independent of the given form class. Nevertheless, if g is the characteristic function for the set of prime numbers, for example, we may still hope that a “uniformity up to the factors e(C)” holds.

Due to the close relation between forms and ideals, we may then ask the same question on uniformity for sums of the type
\[ \sum_{n \le X} \sum_{\substack{\mathfrak{a} \in C \\ N(\mathfrak{a}) = n}} \tilde{g}(n) \tag{1.10} \]
for (fundamental) discriminants q, ideal classes C ∈ H(q) and arithmetic functions $\tilde{g}$. It turns out that there is less reason to expect a significant dependence on the given class here.² Analytic methods also often tend to cooperate better with algebraic objects like ideals than with arithmetic objects like quadratic forms. Hence, chances are better to estimate a sum like (1.10) for a function $\tilde{g}(n)$ which is “usually” close to $\frac{g(n)}{v(C,n)}$ (see Remark 1.8), then translate the result by means of the bijection $B_q$ to an estimate for (1.9) and hope that the term “usually” indeed means “sufficiently often” in order to give an additional error term which is small (and still uniform). We will often benefit from this procedure.

Leaving the uniformity in q aside (which will be the topic of Chapter 2), there exist classical results which give uniformity in C ∈ H(q) in (1.10) for certain functions $\tilde{g}$. For constant functions we have:

² Note, however, that this is basically only true if the inner sum in (1.10) is only over ideals which are products of prime ideals that lie over split primes: By Remark 1.9, prime ideals that lie over ramified primes or over primes that remain prime in $\mathcal{O}(q)$ may only be contained in an ambiguous class or in the principal class, respectively. Conveniently, these prime ideals are rare (or have a relatively large norm) and are therefore negligible for most of our considerations.

Theorem 1.10 (The ideal theorem for ideal classes). Let q ∈ F with |q| > 4 and let C ∈ H(q). Then
\[ \sum_{n \le X} \sum_{\substack{\mathfrak{a} \in C \\ N(\mathfrak{a}) = n}} 1 = \frac{\pi X}{\sqrt{|q|}} + O_q\bigl(X^{1/3}\bigr). \]

This was proved by Landau in 1918; see [Nar04, §7.4.13] and the references there.
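The main term of Theorem 1.10 can be checked numerically in the simplest situation. The sketch below assumes q = −7, for which h(q) = 1, so that every non-zero ideal is principal and corresponds to a generator $x + y\frac{1+\sqrt{-7}}{2}$ up to the two units ±1; the norm of this generator is $x^2 + xy + 2y^2$. The choice of q, the function name and the bound X are illustrative assumptions, not taken from the text.

```python
from math import isqrt, pi, sqrt


def count_ideals_q7(X):
    """Number of non-zero ideals of norm <= X in the maximal order of Q(sqrt(-7))."""
    count = 0
    # 4 * (x^2 + xy + 2y^2) = (2x + y)^2 + 7 y^2 bounds both variables.
    y_max = isqrt(4 * X // 7)
    x_max = isqrt(X) + y_max + 1          # crude but safe bound on |x|
    for y in range(-y_max, y_max + 1):
        for x in range(-x_max, x_max + 1):
            if (x, y) != (0, 0) and x * x + x * y + 2 * y * y <= X:
                count += 1
    return count // 2                     # generators a and -a give the same ideal


X = 10_000
observed = count_ideals_q7(X)
main_term = pi * X / sqrt(7)
print(observed, main_term, observed / main_term)
```

The printed ratio should be close to 1, in line with the size of the error term in the theorem.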

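Of even greater interest to us, when $\tilde{g}$ is the characteristic function for the set of primes, we have
\[ \sum_{\substack{p \le X \\ p \text{ prime}}} \sum_{\substack{\mathfrak{a} \in C \\ N(\mathfrak{a}) = p}} 1 = \frac{\operatorname{li}(X)}{h(q)} + O_q\Bigl(X e^{-c\sqrt{\log X}}\Bigr) \tag{1.11} \]
for all q ∈ F, all C ∈ H(q) and a constant c = c(q) > 0. This is a consequence of: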

Theorem 1.11 (The prime ideal theorem for ideal classes). Let q ∈ F and let C ∈ H(q). Then there exists a constant c = c(q) > 0 such that

\[ \sum_{\substack{\mathfrak{p} \in C \\ N(\mathfrak{p}) \le X}} 1 = \frac{\operatorname{li}(X)}{h(q)} + O_q\Bigl(X e^{-c\sqrt{\log X}}\Bigr), \tag{1.12} \]
where the sum on the left side is over prime ideals of $\mathcal{O}(q)$ only.

Landau proved a general version of this statement with a weaker error term in 1907. The version above was shown by him in 1918 and the error term has been only slightly improved since then; see [Nar04, §7.2 and §7.4.12] and the references there. Note that the left sides of (1.11) and (1.12) may only differ by the prime ideals which lie over rational primes that remain prime in $\mathcal{O}(q)$ (and this may only happen if C is the principal class). By Remark 1.9, the norm of these prime ideals is the square of the respective rational primes. Thus, their contribution is less than $\sqrt{X}$ and therefore negligible in (1.11). From (1.11) and our previous remarks, we easily derive:

Theorem 1.12 (Landau’s prime number theorem for binary quadratic forms). Let q ∈ F and let C ∈ K(q). Then there exists a constant c = c(q) > 0 such that

\[ \pi(X; q, C) := \sum_{\substack{p \le X \text{ prime} \\ p \in R(q,C)}} 1 = \frac{\operatorname{li}(X)}{e(C)h(q)} + O_q\Bigl(X e^{-c\sqrt{\log X}}\Bigr). \]

Indeed, whenever a prime ideal of $\mathcal{O}(q)$ lies over a split prime p in $B_q(C)$, there are, by Proposition 1.6 and Remark 1.9, exactly e(C) prime ideals lying over p in $B_q(C)$. Thus,

\[ e(C)\,\pi(X; q, C) = \sum_{p \le X} \sum_{\substack{\mathfrak{a} \in B_q(C) \\ N(\mathfrak{a}) = p}} 1 + O(\log |q|), \]
where the error term takes into account potential prime ideals which lie over ramified primes, i.e. whose norm divides q; note that there are $\omega(q) \ll \log |q|$ such prime ideals. This error term is, of course, negligible in Theorem 1.12. Therefore, Theorem 1.12 follows from (1.11).³

³ It should be remarked that Landau [Lan14] gave a direct proof of Theorem 1.12 already in 1914 – without using ideals and even with an absolute constant c.

Remark 1.13. As for the distribution of all integers representable by forms in a given form class C ∈ K(q), Bernays [Ber12] proved – without using ideals – that there exists a constant b(q), which does not depend on C, such that

\[ \sum_{\substack{n \le X \\ n \in R(q,C)}} 1 = \frac{b(q)X}{(\log X)^{1/2}} + O_{a,q}\Bigl(\frac{X}{(\log X)^{1/2+a}}\Bigr) \tag{1.13} \]

for every $a < \min\bigl(\frac{1}{h(q)}, \frac{1}{4}\bigr)$. Unless X is much larger than |q| (that is, unless |q| is smaller than (log X)), an easy lattice point counting argument gives a better estimate: If f ∈ C, then

\[ \sum_{\substack{n \le X \\ n \in R(q,C)}} 1 \le \bigl|\{(x, y) \in \mathbb{Z}^2 \mid f(x, y) \le X\}\bigr| \ll \frac{X}{\sqrt{|q|}} + X^{1/2}. \tag{1.14} \]

A proof can be found in [BG06, Lemma 3.1], for example; more precise results for all ranges of X (relative to q) are given in Theorem 6 of the same paper. Note that the corresponding statement for integers in any given reduced residue class a of any given modulus q is completely trivial.

In the last section of this introductory chapter, we will review versions of Theorem 1.12 with an explicit dependence on q as well as conditional results.

1.3 The Chebotarev density theorem and conditional results

For the investigations on discrepancies in the distribution of primes of the shape $x^2 + ny^2$ in Chapter 3, we will need precise information on the size of the error term in Theorem 1.12 when a suitable version of the Generalized Riemann Hypothesis is assumed. Such information exists in the literature in the form of (effective) versions of the Chebotarev density theorem. This theorem, first published in 1923, was one of the milestones of algebraic number theory. It provides quantitative information on the splitting behaviour of prime ideals in Galois extensions of number fields:

Theorem 1.14 (Chebotarev density theorem). Let K be a number field, let L be a Galois extension of K and let C be a conjugacy class in the Galois group G = Gal(L/K). Let P be the set of prime ideals of K which are unramified in L. For each $\mathfrak{p} \in P$, let $\left(\frac{L/K}{\mathfrak{p}}\right)$ be the Artin symbol, i.e. the conjugacy class of Frobenius automorphisms in G corresponding to prime ideals in L which divide $\mathfrak{p}$. Then, as X → ∞,
\[ \tilde{\pi}(X; L/K, C) := \Bigl|\Bigl\{ \mathfrak{p} \in P : \Bigl(\frac{L/K}{\mathfrak{p}}\Bigr) = C,\ N(\mathfrak{p}) \le X \Bigr\}\Bigr| = \Bigl(\frac{|C|}{|G|} + o(1)\Bigr) \frac{X}{\log X}. \]

See [Nar04, Theorem 7.30] and the references in §7.4.15 of that book. The connection to our questions is given by the Artin reciprocity law in class field theory (see [Cox97, §5, §8 and §9]): Let q ∈ D and C ∈ K(q). We have seen in the last section that we can translate questions on primes which may be represented by forms in C to questions on ideals in the ideal class $B_q(C) \in H(q)$. Moreover, one can show (see [Cox97, §9.A]) that there exists an isomorphism $B'_q : H(q) \to \mathrm{Gal}(L/K)$ between H(q) and the Galois group Gal(L/K), where $K = \mathbb{Q}(\sqrt{q})$ and L is the ring class field of the order $\mathcal{O}(q)$. If q is a fundamental discriminant – which we always assume –, then L is called the Hilbert class field of K and is the maximal unramified abelian extension of K. Since Gal(L/K) is abelian, every conjugacy class contains only one element. Thus, Theorem 1.14 yields

\[ \tilde{\pi}\bigl(X; L/K, B'_q(B_q(C))\bigr) = \Bigl(\frac{1}{h(q)} + o(1)\Bigr) \frac{X}{\log X} \]
as X → ∞. With this information and Proposition 1.6, one can therefore recover Theorem 1.2 from Theorem 1.14.

Lagarias and Odlyzko [LO77] gave an explicit error term in Theorem 1.14. Using the connection which we have just described, their result yields:

Theorem 1.15 (Explicit prime number theorem for binary quadratic forms). Let q ∈ F and let C ∈ K(q). Then there exists an absolute constant c > 0 such that

\[ \pi(X; q, C) = \frac{\operatorname{li}(X)}{e(C)h(q)} + \frac{\operatorname{li}(X^\beta)}{e(C)h(q)} + O\Bigl(X e^{-c\sqrt{\log X / h(q)}}\Bigr), \]
where β is a possible Landau–Siegel zero⁴ of the Dedekind zeta-function of the Hilbert class field of $\mathbb{Q}(\sqrt{q})$.

We will not use this theorem, because we will need a much stronger – and therefore conditional – result to achieve useful statements in Chapter 3. Lagarias and Odlyzko proved such a result that is conditional on the Generalized Riemann Hypothesis for the Dedekind zeta-function $\zeta_L$ of the Hilbert class field L of $K = \mathbb{Q}(\sqrt{q})$. We recall that $\zeta_L$ is given by
\[ \zeta_L(s) = \sum_{\mathfrak{a}} \frac{1}{N(\mathfrak{a})^s} \]
for all $s \in \mathbb{C}$ with Re(s) > 1, and this series admits an analytic continuation to $\mathbb{C} \setminus \{1\}$ (see [Neu92, §VII.5], for example); here the sum is over the non-zero ideals of the ring of integers of L, and $N(\mathfrak{a})$ denotes the corresponding absolute norm of the ideal $\mathfrak{a}$.

Serre [Ser81] gave a slightly simplified version of the result in [LO77]: If the Dedekind zeta-function of the Hilbert class field has all its non-trivial zeros on the line $\mathrm{Re}(s) = \frac{1}{2}$, then the conditional prime ideal theorem

\[ \tilde{\pi}(X; L/K, C) = \frac{\operatorname{li}(X)}{h(q)} + O\bigl(X^{1/2} \log(|q|X)\bigr) \tag{1.15} \]
holds for all q ∈ F and all conjugacy classes C ∈ Gal(L/K); compare [Ser81, Théorème 4 and (20R)]. As in the proof of Theorem 1.12, we may conclude the following conditional prime number theorem for binary quadratic forms:

Theorem 1.16 (Explicit prime number theorem for binary quadratic forms under GRH). Let q ∈ F and let C ∈ K(q). Assume the Generalized Riemann Hypothesis for the Dedekind zeta-function of the Hilbert class field of $\mathbb{Q}(\sqrt{q})$. Then

\[ \pi(X; q, C) = \frac{\operatorname{li}(X)}{e(C)h(q)} + O\bigl(X^{1/2} \log(|q|X)\bigr). \tag{1.16} \]

⁴ These hypothetical zeros are similarly defined as the Landau–Siegel zeros for Dirichlet L-functions that we will discuss in Section 2.3.2; see [LO77, Theorem 1.3] for the precise definition and a reference to explicit and effectively computable bounds for β. Thus, this result is indeed an explicit and effective version of Theorem 1.12.

A conditional version of a formula that links sums of the values of ideal class group characters at prime ideal powers with sums over the zeros of the corresponding L-functions up to a certain height will also be necessary; such formulas are known as (approximate) explicit formulas. One can derive a formula of this kind from the results in [LO77, §7]; this has already been done in [LP92] (see the proof of Theorem 3.1 there). Before we state the result, we recall the notions of (ideal) class group characters and the corresponding L-functions:

Remark and Definition 1.17. The results in [LO77, §7] are given for the Artin L-functions which are attached to the characters of $\operatorname{Gal}(L/K)$. However, because $\mathcal{H}(q)$ and $\operatorname{Gal}(L/K)$ are isomorphic, these results also hold for (ideal) class group characters, i.e. for the group homomorphisms from $\mathcal{H}(q)$ to the unit circle in the complex plane. We denote the set of the ideal class group characters for the discriminant $q$ by $\widehat{\mathcal{H}}(q)$, which is therefore the (Pontryagin) dual group of $\mathcal{H}(q)$, and we write $\chi_0^{(q)}$ for the trivial character. Overloading the notation, we define $\chi(\mathfrak{a}) := \chi(C)$ if $C \in \mathcal{H}(q)$ is the ideal class of the non-zero fractional ideal $\mathfrak{a}$. For further use, we also define

\[
\lambda_\chi(n) := \sum_{\substack{\mathfrak{a} \in \mathcal{Z}(q) \\ N(\mathfrak{a}) = n}} \chi(\mathfrak{a})
\]
for all $\chi \in \widehat{\mathcal{H}}(q)$ and all positive integers $n$. The L-functions associated to the characters $\chi \in \widehat{\mathcal{H}}(q)$, the class group L-functions, are given by
\[
L(s, \lambda_\chi) := \sum_{\mathfrak{a} \in \mathcal{Z}(q)} \frac{\chi(\mathfrak{a})}{N(\mathfrak{a})^{s}} = \sum_{n \ge 1} \frac{\lambda_\chi(n)}{n^{s}}
\]
for $\operatorname{Re}(s) > 1$, and each of these series has an analytic continuation to the whole complex plane unless $\chi = \chi_0^{(q)}$, when the continuation is meromorphic with a pole at $s = 1$ (see [Hec17, §4]).

We also note that the Dedekind zeta-function $\zeta_L$ may be written as the product of all Artin L-functions which are attached to the irreducible characters of $\operatorname{Gal}(L/K)$ (see [Neu92, §VII.10]). The isomorphism $B'_q : \mathcal{H}(q) \to \operatorname{Gal}(L/K)$ then yields a product expansion of $\zeta_L(s)$ in terms of the class group L-functions $L(s, \lambda_\chi)$ of all $\chi \in \widehat{\mathcal{H}}(q)$. Since the functions $L(s, \lambda_\chi)$ are entire whenever $\chi$ is non-trivial (and have only a pole at $s = 1$ otherwise), the assumption of the Generalized Riemann Hypothesis for $\zeta_L$ is therefore equivalent to the assumption that all class group L-functions for the discriminant $q$ have no non-trivial zeros off the line $\operatorname{Re}(s) = \tfrac{1}{2}$. By [LO77, §7] and [LP92, §3], a conditional approximate explicit formula for ideal class group characters is therefore given by:

Theorem 1.18. Let $q \in \mathcal{F}$ and let $\chi$ be a non-trivial character in $\widehat{\mathcal{H}}(q)$. Let $\tilde{\Lambda}$ denote the von Mangoldt function for powers of prime ideals in $\mathcal{O}(q)$ (see (2.36)). Let $T \ge 2$. Assume the Generalized Riemann Hypothesis for the Dedekind zeta-function of the Hilbert class field of $\mathbb{Q}(\sqrt{q})$. Then

\[
\tilde{\psi}(X; \chi) := \sum_{\substack{\mathfrak{a} \in \mathcal{Z}(q) \\ N(\mathfrak{a}) \le X}} \chi(\mathfrak{a})\tilde{\Lambda}(\mathfrak{a}) = -\sum_{|\gamma| < T} \frac{X^{1/2 + i\gamma}}{\tfrac{1}{2} + i\gamma} + O\Bigl((\log |q|)(\log X) + \frac{X(\log XT)^{2}}{T}\Bigr),
\]

where the sum $\sum_{\gamma}$ is over the zeros $\tfrac{1}{2} + i\gamma$ with $|\gamma| < T$ of the L-function $L(s, \lambda_\chi)$ that is associated to the class group character $\chi$.
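The coefficients $\lambda_\chi(n)$ of Remark and Definition 1.17 can be made concrete in the simplest case. The sketch below treats only $q = -4$ and the trivial character, under the assumption that $\mathcal{Z}(q)$ denotes the non-zero integral ideals of $\mathcal{O}(q) = \mathbb{Z}[i]$; then $\lambda_{\chi_0}(n)$ is the number of ideals of norm $n$, which is classically the divisor sum of the real character modulo 4. The function names are ours, and the example does not cover the complex characters that matter later (these require class number at least 3).

```python
from math import isqrt

def chi_minus4(d):
    """Primitive real Dirichlet character modulo 4, i.e. the Kronecker symbol (-4/d)."""
    return (0, 1, 0, -1)[d % 4]

def lambda_trivial(n):
    """lambda_{chi_0}(n) for q = -4 as a divisor sum of chi_minus4.

    These are the Dirichlet coefficients of zeta(s) * L(s, chi_{-4}).
    """
    return sum(chi_minus4(d) for d in range(1, n + 1) if n % d == 0)

def ideals_of_norm(n):
    """Count ideals of Z[i] of norm n directly.

    Every ideal is principal with exactly 4 generators, so the count is
    #{(x, y) : x^2 + y^2 = n} / 4.
    """
    r = isqrt(n)
    reps = sum(1 for x in range(-r, r + 1) for y in range(-r, r + 1)
               if x * x + y * y == n)
    return reps // 4

# Both descriptions of the coefficients agree on an initial range.
for n in range(1, 61):
    assert lambda_trivial(n) == ideals_of_norm(n)

print([lambda_trivial(n) for n in range(1, 14)])
```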

Remark. It will not escape the reader’s notice that neither the original Chebotarev density theorem nor its explicit versions by Lagarias–Odlyzko and Serre are needed in their complete generality and depth for our questions on primes represented by binary quadratic forms. It is, of course, just a very special case of their results that we actually use. The proof of Lagarias and Odlyzko is, as they also say, “a direct descendent of de la Vallée Poussin’s proof of the prime number theorem”. It would therefore certainly be possible to deduce the results above by making explicit either Landau’s or de la Vallée Poussin’s proof of the prime number theorems for binary quadratic forms that we have mentioned in the preceding sections.

Remark 1.19. Moreover, we will not even use the explicit dependency on the discriminant that is present in Theorem 1.16 and Theorem 1.18. Nevertheless, we decided to state these results explicitly to make clear that they may be a starting point for an improvement of our Theorem 3.5 on prime number races in terms of a completely explicit error term. After all, the proofs in [LO77] should make it possible to give numerical values for the implicit absolute constants in the theorems above.

Chapter 2

The average distribution of primes represented by positive definite binary quadratic forms with varying discriminant

Among the most important and attractive questions in the theory of primes are those that ask for uniformities in their distribution. The prime number theorem for binary quadratic forms, Theorem 1.12, therefore raises the question whether the error term can be shown to be uniform, that is, independent of the discriminant for discriminants in certain ranges. As we have mentioned before, the prime number theorem for binary quadratic forms (in the form of Theorem 1.2 with non-explicit error term) is as old as the corresponding theorem for primes in arithmetic progressions. The investigation of uniformity in the distribution of primes in arithmetic progressions has formed an area of extensive research in the past century. On the other hand, there exist hardly any uniformity results for binary quadratic forms and so we aim to reduce this deficit in this chapter. Uniformity results – with respect to the discriminants or the moduli – for primes represented by binary quadratic forms or primes in arithmetic progressions can be basically classified into the following four types:

(i) for all discriminants or moduli of a certain range and all corresponding form classes or residue classes;

(ii) for almost all discriminants or moduli of a certain range and all corresponding form classes or residue classes;

(iii) for almost all discriminants or moduli of a certain range and any fixed form class or residue class;

(iv) for almost all discriminants or moduli of a certain range and almost all corresponding form classes or residue classes.

Results of all these types exist for primes in arithmetic progressions. The strongest known theorems of each type are

(i) the Siegel–Walfisz theorem;

(ii) the Bombieri–Vinogradov theorem;

(iii) the Fouvry–Iwaniec / Bombieri–Friedlander–Iwaniec theorems;

(iv) the Barban–Davenport–Halberstam theorem.

The strength of the underlying assumptions decreases from (i) to (iv) and therefore the known admissible ranges for the moduli increase from (i) to (iv). We will review all these results in the next section. The only hitherto known result of this kind for primes represented by binary quadratic forms has been of type (i):

Theorem 2.1 (Blomer, [Blo04a, Lemma 3.1]). For any $A > 0$, there exists a constant $c = c(A) > 0$ such that

\[
\pi(X; q, C) = \frac{\operatorname{li}(X)}{e(C)h(q)} + O\bigl(X e^{-c\sqrt{\log X}}\bigr)
\]

uniformly for all negative integers $q \equiv 0$ or $1 \pmod{4}$ with $|q| \le (\log X)^{A}$ and all $C \in \mathcal{K}(q)$.

We are not aware of any prior results of types (ii)–(iv) for primes represented by binary quadratic forms and will therefore prove results of type (ii) in Section 2.3 and results of type (iv) in Section 2.4. Type-(iii)-results that go beyond the range of the corresponding type-(ii)-results are usually much deeper and will be left aside in this work.

Appropriate large sieve inequalities lie at the heart of results of types (ii)–(iv). Such an inequality for complex class group characters will be proved in Section 2.2. We do not need an analogous inequality for real class group characters, as these can be regarded as convolutions of Dirichlet characters and may be controlled in a simpler way. We close this chapter in Section 2.5 with two easy applications and a list of open problems.

All discriminants in this chapter are assumed to be negative fundamental discriminants which are not integer multiples of 8 (this last restriction will be used to greatly simplify the proof of Lemma 2.8, but it does not appear to be a crucial condition to make the proof work); we denote this set of discriminants by $\mathcal{F}$.

2.1 Mean-value results for primes in arithmetic progressions

Davenport [Dav00] calls Dirichlet’s 1837 memoir, entitled Beweis des Satzes, daß jede unbegrenzte arithmetische Progression, deren erstes Glied und Differenz ganze Zahlen ohne gemeinschaftlichen Factor sind, unendlich viele Primzahlen enthält, the beginning of analytic number theory. In this paper Dirichlet gave the first correct proof of the infinitude of primes in arithmetic progressions whose difference, i.e. the modulus, and the first element, i.e. the reduced residue class, are coprime. In 1896, Hadamard and independently de la Vallée Poussin (in the second part of his comprehensive work [dlVP96] that we already mentioned in Chapter 1) proved that $\pi(X; q, a)$, the number of primes $p \le X$ satisfying $p \equiv a \pmod{q}$, equals $\bigl(\frac{1}{\varphi(q)} + o(1)\bigr)\frac{X}{\log X}$ as $X \to \infty$ if $q$ and $a$ are positive integers with $(a, q) = 1$; that is, the primes are uniformly distributed in reduced residue classes with respect to a fixed modulus. Landau was the first to give an explicit error term and the essentially best known result for $\pi(X; q, a)$ is still due to Page who proved in 1935 that

\[
\pi(X; q, a) = \frac{\operatorname{li}(X)}{\varphi(q)} - \frac{\chi_{\mathrm{ex}}(a)X^{\beta_{\mathrm{ex}}}}{\varphi(q)\beta_{\mathrm{ex}}} + O_q\bigl(X e^{-c\sqrt{\log X}}\bigr)
\]
for all $q$ and $a$ with $(a, q) = 1$; here $c$ is a positive (effectively computable) constant and the second term is only present when there is an exceptional character $\chi_{\mathrm{ex}}$ modulo $q$ and $\beta_{\mathrm{ex}}$ is the corresponding Landau–Siegel zero (see Section 2.3.2 for the definition of such characters and zeros). Shortly after this result of Page, estimates which are uniform in the modulus began to emerge:

1. Walfisz used a result of Siegel on lower bounds for the size of Dirichlet L-functions at the point 1 to get a uniformity result of type (i):¹

Theorem 2.2 (Siegel–Walfisz theorem). Let $A > 0$. There exists a number $c = c(A) > 0$ such that
\[
\pi(X; q, a) = \frac{\operatorname{li}(X)}{\varphi(q)} + O\bigl(X e^{-c\sqrt{\log X}}\bigr),
\]
uniformly for all pairs of positive integers $a$ and $q$ with $(a, q) = 1$ and $q \le (\log X)^{A}$.

Thus, this result is uniform in $q$, though only in a very small range of moduli. Moreover, $c$ is a positive, but not effectively computable constant depending only on $A$. That is, the proof does not enable us to calculate the constant as long as we neither know that there do not exist Landau–Siegel zeros nor know the Dirichlet L-functions which have such an exceptional zero.

Which error terms and therefore which ranges of uniformity can one expect? The Generalized Riemann Hypothesis for Dirichlet characters implies the much better error term $O(X^{1/2}(\log X)^{2})$ and there is reason to believe (see [MV07, Conjecture 13.9]) that the true size of the error term is
\[
O\Bigl(\frac{X^{1/2+\varepsilon}}{q^{1/2}}\Bigr) \tag{2.1}
\]
for $q \le X$. On the other hand, Friedlander and Granville [FG92], building on an idea of Maier, showed that the asymptotic formula
\[
\pi(X; q, a) \sim \frac{\pi(X)}{\varphi(q)},
\]

for any fixed $a$, cannot hold uniformly in the range $q \le X(\log X)^{-A}$ for any fixed $A > 0$.

2. If one gives up the aim of uniformity for both all moduli in a large range and all corresponding residue classes, then one can fill much of the space between the Siegel–Walfisz theorem and the result of Friedlander and Granville: The enhancements of Linnik’s large sieve in the 1960s (see Section 2.2) led to unconditional results which show that these error terms do hold “on average” (in a suitable sense). The most influential result of this kind was the following type-(ii)-result, independently proved by Bombieri [Bom65] and Vinogradov [Vin65, Vin66], which shows that the prime number theorem for arithmetic progressions holds uniformly for almost all arithmetic progressions modulo $q \le Q$ with $Q$ only slightly smaller than $X^{1/2}$.

Theorem 2.3 (Bombieri–Vinogradov theorem). For each $A > 0$, there exists a number $B = B(A)$ such that

\[
\sum_{q \le Q} \max_{\substack{a \le q \\ (a,q)=1}} \Bigl|\pi(X; q, a) - \frac{\operatorname{li}(X)}{\varphi(q)}\Bigr| \ll_A X(\log X)^{-A}
\]

for $Q \le X^{1/2}(\log X)^{-B}$.² The implied constant is not effectively computable.

¹ References to this and the other above-mentioned classical results as well as a more detailed exposition of their history can be found in the chapters 2.1, 5 and 6.2 of [Nar00].

² Vinogradov’s result was actually slightly weaker in that his admissible range was $Q \le X^{1/2-\varepsilon}$ for any arbitrarily small $\varepsilon > 0$. See also [HR11, p. 127] for references to weaker results of this type which had been previously proved by Rényi, Barban and Pan.
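To get a feeling for the quantity bounded in Theorem 2.3, the following minimal sketch evaluates the sum $\sum_{q \le Q} \max_{(a,q)=1} |\pi(X; q, a) - \operatorname{li}(X)/\varphi(q)|$ by brute force for a small $X$ and compares it with the sum of the trivial bounds $X/q + 1$. The choices $X = 20000$ and $Q = \lfloor X^{1/2} \rfloor$, the convention $\operatorname{li}(X) = \int_2^X dt/\log t$ and all function names are assumptions of this sketch; a computation at this scale illustrates the statement but proves nothing about its asymptotic content.

```python
from math import gcd, isqrt, log

def primes_up_to(limit):
    """Sieve of Eratosthenes: all primes p <= limit (limit >= 2)."""
    sieve = bytearray([1]) * (limit + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, isqrt(limit) + 1):
        if sieve[p]:
            sieve[p * p :: p] = bytearray(len(range(p * p, limit + 1, p)))
    return [n for n in range(limit + 1) if sieve[n]]

def li(x, steps=20_000):
    """Approximate the integral of dt / log t from 2 to x (midpoint rule)."""
    h = (x - 2) / steps
    return sum(h / log(2 + (k + 0.5) * h) for k in range(steps))

def euler_phi(q):
    """Euler's totient function by direct counting (fine for small q)."""
    return sum(1 for a in range(1, q + 1) if gcd(a, q) == 1)

def bv_sum(X, Q):
    """sum over q <= Q of max over reduced classes a of |pi(X; q, a) - li(X)/phi(q)|."""
    primes = primes_up_to(X)
    li_X = li(X)
    total = 0.0
    for q in range(1, Q + 1):
        counts = [0] * q                    # counts[r] = #{p <= X : p = r mod q}
        for p in primes:
            counts[p % q] += 1
        expected = li_X / euler_phi(q)
        total += max(abs(counts[a % q] - expected)
                     for a in range(1, q + 1) if gcd(a, q) == 1)
    return total

X = 20_000
Q = isqrt(X)
trivial = sum(X / q + 1 for q in range(1, Q + 1))
print(round(bv_sum(X, Q), 1), round(trivial, 1))
```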

Although this bound is only a power of a logarithm smaller than what the trivial estimate $\pi(X; q, a) \le \frac{X}{q} + 1$ would give, the Bombieri–Vinogradov theorem can replace the Generalized Riemann Hypothesis in many cases. This is the case, for instance, for the Titchmarsh divisor problem, i.e. the question of determining the asymptotic behaviour of the function $\sum_{p \le X} \tau(p + a)$ for any fixed integer $a$, and the proof of the infinitude of primes $p$ such that $p + 2$ has at most three prime factors; see §3.5 and §9 of [HR11]. Lately, it was used to prove the remarkable fact that $\liminf_{n \to \infty} \frac{p_{n+1} - p_n}{\log p_n} = 0$, where $p_n$ denotes the $n$-th prime [GPY09].³

It is worth mentioning that the conjectured error term (2.1) suggests that the Bombieri–Vinogradov theorem should hold up to $Q \le X^{1-\varepsilon}$ for all $\varepsilon > 0$; this is the Elliott–Halberstam conjecture.

Analogues of the Bombieri–Vinogradov theorem have been proved in various contexts. With regard to the connection between classes of binary quadratic forms and ideal classes of quadratic fields that we will exploit in Section 2.3, it is important to note that many results of Bombieri–Vinogradov type have also already been proved for number fields by Wilson (1969), Huxley (1971), Fogels (1972), Johnson (1979) and Hinz (1988)⁴, but all these results have examined cases in which the number field is fixed; this is not useful in our case. The only results that have hitherto been proved for varying number fields are [MM87] and the recent generalization [MP13]; the fields in these works are of the form $K(\zeta_q)$ where $K$ is a fixed number field, $\zeta_q$ is a primitive $q$-th root of unity and $q$ varies. This case is also quite different from what we will prove in Section 2.3.

We will not discuss the proof(s) here, since our variants for binary quadratic forms in Section 2.3 will roughly follow the proofs for arithmetic progressions.
We just remark that the two main ingredients of all known proofs (except for Vinogradov’s proof, which is somewhat different and harder to describe) are the Siegel–Walfisz theorem, which handles the initial range of moduli, and the use of the large sieve inequality, which we will discuss in the next section. The method of proof still changed much over the years: In the original proofs of Bombieri and Vinogradov the theorem was a consequence of zero-density results for Dirichlet L-functions; Gallagher [Gal68] then found a way to omit any discussion of zeros. Nowadays bilinear forms are at the heart of the proofs; see [IK04, §17] or [FI10, §9.8], for example. 3. A systematic use of bilinear forms was also one of the keys to the type-(iii)-results of the 1980s. Bombieri, Deshouillers, Fouvry, Friedlander and Iwaniec (in alphabetical, but not chronological order; moreover, working together in various combinations) were the main figures who introduced new deep combinatorial techniques and the use of Kloosterman sums to problems on the distribution of primes in arithmetic progressions. We just quote two of their type-(iii)-results and refer the reader to [Bom87, §12] and [FI10, §22.2] for an exposition of and references to their results.

Theorem 2.4 (Fouvry–Iwaniec / Bombieri–Friedlander–Iwaniec theorems). Let $a \ne 0$.

(a) There exists an absolute constant $B > 0$ such that

\[
\sum_{\substack{Q \le q < 2Q \\ (a,q)=1}} \Bigl|\pi(X; q, a) - \frac{\operatorname{li}(X)}{\varphi(q)}\Bigr| \ll_a \frac{X}{(\log X)(\log\log X)^{B}}
\]

if $Q \le X^{1/2 + (\log\log X)^{-B}}$.

³ This result has seen an astonishing recent improvement, which we state at the very end of this section. The proof of this advancement does not use the Bombieri–Vinogradov theorem, but it relies instead on deep methods similar to the ones which are used in the proofs of the type-(iii)-results that we mention below.

⁴ See the references given in [Nar04, §7.4.12].

(b) Let $\varepsilon > 0$ and $Q = X^{4/7-\varepsilon}$. Let $\lambda$ be an arithmetic function such that, for any $Q_1, Q_2 \ge 1$ with $Q_1 Q_2 = Q$, there exist two arithmetic functions $\lambda_i$ ($i = 1, 2$) with $\lambda_i(n) = 0$ if $n > Q_i$, $|\lambda_i(n)| \le 1$ for all $n \ge 1$ and $\lambda = \lambda_1 * \lambda_2$ (the Dirichlet convolution).⁵ Then
\[
\sum_{\substack{q \ge 1 \\ (a,q)=1}} \lambda(q)\Bigl(\pi(X; q, a) - \frac{\operatorname{li}(X)}{\varphi(q)}\Bigr) \ll_{a,A,\varepsilon} X(\log X)^{-A}
\]
for all $A > 0$.

Notably, the range of moduli under consideration crosses the square-root barrier and therefore the realm of the Generalized Riemann Hypothesis.

4. Finally, the range of the moduli can be made even longer, and the proof of the following result is the easiest among the theorems that we quoted in this section. We pay for this larger range by giving up some control over the residue classes because only a (square) mean over residue classes is considered, i.e. it is a result of type (iv).

Theorem 2.5 (Barban–Davenport–Halberstam theorem). For each $A > 0$, there exists a number $B = B(A)$ such that

\[
\sum_{q \le Q} \sum_{\substack{a \le q \\ (a,q)=1}} \Bigl(\pi(X; q, a) - \frac{\operatorname{li}(X)}{\varphi(q)}\Bigr)^{2} \ll_A X^{2}(\log X)^{-A}
\]

for $Q \le X(\log X)^{-B}$. The implied constant is not effectively computable.

This was proved by Barban and independently by Davenport and Halberstam in the 1960s. The proof is again based on the Siegel–Walfisz theorem as well as the large sieve inequality and it is similar to the proof of the Bombieri–Vinogradov theorem, but most of the technical difficulties of the latter proof do not arise here. Montgomery (who notably did not use any kind of large sieve inequality but a deep result of Lavrik on twin primes on average) and Hooley replaced the inequality in the theorem by an asymptotic formula.⁶

Although the double-average nature of the Barban–Davenport–Halberstam theorem is usually not as useful in applications as the Bombieri–Vinogradov theorem, variations of it have cameo parts in the proofs of two fascinating results in prime number theory: in the work [FI98] of Friedlander and Iwaniec on the infinitude of primes of the form $x^2 + y^4$ and in Zhang’s work [Zha13] on the infinitude of primes for which the gap to the next prime is bounded by an explicit absolute constant.
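The mean square in Theorem 2.5 can be evaluated by brute force in the same spirit. The sketch below is only an illustration at toy scale: the values of $X$ and $Q$, the convention $\operatorname{li}(X) = \int_2^X dt/\log t$ and the function names are our assumptions, and the printed ratio to $X^2$ is not a statement about the theorem's range $Q \le X(\log X)^{-B}$.

```python
from math import gcd, isqrt, log

def primes_up_to(limit):
    """Sieve of Eratosthenes: all primes p <= limit (limit >= 2)."""
    sieve = bytearray([1]) * (limit + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, isqrt(limit) + 1):
        if sieve[p]:
            sieve[p * p :: p] = bytearray(len(range(p * p, limit + 1, p)))
    return [n for n in range(limit + 1) if sieve[n]]

def li(x, steps=20_000):
    """Approximate the integral of dt / log t from 2 to x (midpoint rule)."""
    h = (x - 2) / steps
    return sum(h / log(2 + (k + 0.5) * h) for k in range(steps))

def bdh_sum(X, Q):
    """sum over q <= Q and reduced classes a mod q of (pi(X; q, a) - li(X)/phi(q))^2."""
    primes = primes_up_to(X)
    li_X = li(X)
    total = 0.0
    for q in range(1, Q + 1):
        counts = [0] * q                    # counts[r] = #{p <= X : p = r mod q}
        for p in primes:
            counts[p % q] += 1
        reduced = [a for a in range(1, q + 1) if gcd(a, q) == 1]
        expected = li_X / len(reduced)      # len(reduced) = phi(q)
        total += sum((counts[a % q] - expected) ** 2 for a in reduced)
    return total

X, Q = 20_000, 500
value = bdh_sum(X, Q)
print(round(value, 1), value / X ** 2)
```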

2.2 A large sieve inequality for complex ideal class group characters

Initiated by Linnik in the early 1940s, the large sieve had its first heyday during the 1960s and early 1970s, being essential in the proofs of seminal results like the already mentioned Bombieri–Vinogradov theorem on the distribution of primes in arithmetic progressions or Chen’s theorems towards the Goldbach conjecture and the Twin Prime conjecture (see Chapter 11 in [HR11]). Bombieri, Friedlander, Iwaniec, Fouvry and others revived its importance in the 1980s with the type-(iii)-results that we have discussed in the previous section.

⁵ Arithmetic functions with this property are said to be well-factorable.

⁶ See the first paper [Hoo75] of Hooley’s “On the Barban–Davenport–Halberstam theorem”-series (the count of which reached XIX by 2007) for the last-mentioned result and references to the earlier ones.

Recently, new fascinating applications to the theory of automorphic forms, but also in arithmetic geometry, ergodic theory and other areas were found; see [IK04, §7] and [Kow08] for these late developments. In addition to the already mentioned books, we would also like to cite the very early excellent account [Bar66] on the applications of the large sieve in analytic number theory.

The general form of large sieve inequalities is as follows: For a given finite set $Y$ of “harmonics” that are defined on a range $\{1, \dots, N\}$ of integers, the large sieve provides a constant $c = c(Y, N) \ge 0$ such that the inequality
\[
\sum_{y \in Y} \Bigl|\sum_{n \le N} a_n y(n)\Bigr|^{2} \le c\,\|a\|^{2}
\]
holds for all complex vectors $a = (a_1, \dots, a_N) \in \mathbb{C}^{N}$. The appropriate harmonics for problems involving primes in arithmetic progressions are the Dirichlet characters. Hence, the results of the preceding section are built on this large sieve inequality for Dirichlet characters:

Lemma 2.6 (Large sieve inequality for Dirichlet characters). For any positive integers $Q$ and $N$ and any complex numbers $(a_n)_{n \le N}$, we have

\[
\sum_{q \le Q} \; \sideset{}{^*}\sum_{\chi \,(\mathrm{mod}\, q)} \Bigl|\sum_{n \le N} a_n \chi(n)\Bigr|^{2} \le (Q^{2} + N) \sum_{n \le N} |a_n|^{2},
\]

where $\sum^{*}$ means that the sum is taken over primitive Dirichlet characters only.
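The inequality of Lemma 2.6 can be checked numerically on a sub-sum of its left-hand side. The sketch below restricts to odd prime moduli $q \le Q$, where every non-principal character is primitive and can be written down from a primitive root; since all omitted terms are non-negative, the restricted sum is still bounded by $(Q^2 + N)\sum |a_n|^2$. The restriction to prime moduli, the random Gaussian coefficients and all function names are simplifying assumptions of this sketch, not part of the lemma.

```python
import cmath
import random
from math import isqrt

def primitive_root(q):
    """Smallest primitive root modulo an odd prime q (brute force)."""
    for g in range(2, q):
        x, order = g, 1
        while x != 1:
            x = x * g % q
            order += 1
        if order == q - 1:
            return g
    raise ValueError("no primitive root found; is q an odd prime?")

def nonprincipal_characters(q):
    """All non-principal Dirichlet characters modulo an odd prime q.

    Each character is returned as the list [chi(0), ..., chi(q-1)].
    For prime q these are exactly the primitive characters modulo q.
    """
    g = primitive_root(q)
    index = {}                              # discrete logarithm: index[g^e mod q] = e
    x = 1
    for e in range(q - 1):
        index[x] = e
        x = x * g % q
    chars = []
    for k in range(1, q - 1):
        values = [0j] * q                   # chi(r) = 0 when q divides r
        for r, e in index.items():
            values[r] = cmath.exp(2j * cmath.pi * e * k / (q - 1))
        chars.append(values)
    return chars

def large_sieve_lhs(coeffs, Q):
    """Sum over odd primes q <= Q and non-principal chi mod q of |sum_n a_n chi(n)|^2.

    coeffs[n - 1] holds a_n for n = 1, ..., N.
    """
    total = 0.0
    for q in range(3, Q + 1):
        if any(q % d == 0 for d in range(2, isqrt(q) + 1)):
            continue                        # skip composite moduli
        for chi in nonprincipal_characters(q):
            s = sum(a * chi[n % q] for n, a in enumerate(coeffs, start=1))
            total += abs(s) ** 2
    return total

random.seed(0)
N, Q = 200, 30
coeffs = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(N)]
lhs = large_sieve_lhs(coeffs, Q)
rhs = (Q * Q + N) * sum(abs(a) ** 2 for a in coeffs)
print(lhs <= rhs, lhs / rhs)
```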

A slightly stronger form of this inequality is proved in [IK04, Theorem 7.13]. Due to the close relationship between form class groups and ideal class groups (see Section 1.2), the ideal class group characters, which we have introduced in Remark 1.17, are the essential harmonics for mean-value estimates for primes represented by binary quadratic forms. Real class group characters arise from convolutions of real Dirichlet characters (compare Section 2.3.4) and will be handled by means of Lemma 2.6. This is not possible for complex class group characters and therefore the following large sieve inequality for such characters will play a major role in the proofs of our variants of Bombieri–Vinogradov and Barban–Davenport–Halberstam type results.

Lemma 2.7. Let $\mathcal{F}(Q)$ denote the set of negative fundamental discriminants $q \not\equiv 0 \pmod{8}$ satisfying $|q| \le Q$. Set $\widehat{\mathcal{H}}_1(q) = \{\chi \in \widehat{\mathcal{H}}(q) \mid \chi^{2} \ne \chi_0^{(q)}\}$ for all $q \in \mathcal{F}(Q)$ and
\[
\widehat{\mathcal{H}}_1([Q]) = \bigcup_{q \in \mathcal{F}(Q)} \widehat{\mathcal{H}}_1(q).
\]

For each $\chi \in \widehat{\mathcal{H}}_1([Q])$ and each positive integer $n$, set
\[
\lambda_\chi(n) = \sum_{\substack{\mathfrak{a} \in \mathcal{Z}(q) \\ N(\mathfrak{a}) = n}} \chi(\mathfrak{a}).
\]

Then
\[
\sum_{\chi \in \widehat{\mathcal{H}}_1([Q])} \Bigl|\sum_{n \le N} a_n \lambda_\chi(n)\Bigr|^{2} \ll_\varepsilon \Bigl(N(\log N)^{3} + N^{1/2}(\log N)Q^{5/2+\varepsilon}\Bigr) \sum_{n \le N} |a_n|^{2} \tag{2.2}
\]
for all complex numbers $(a_n)_{n \le N}$ and all $\varepsilon > 0$, $Q \ge 1$ and $N \ge 3$.

This result could essentially be regarded as a consequence of a mean-value estimate for automorphic representations by Duke and Kowalski [DK00, Theorem 4]: Since the numbers $\lambda_\chi(n)$ are coefficients of holomorphic cusp forms (for complex characters $\chi$; this is not true for real class group characters) one might consider the correspondence between holomorphic cusp forms and automorphic representations (see [Kud03] or [Gel75], for example) and thus set out to deduce our result from theirs. However, we feel that using such a general result would disguise the fact that our result does not require any modern tools from the theory of automorphic representations. Moreover, it seems that a direct deduction from [DK00, Theorem 4] would require the involved modular forms to be of weight at least two while we need a bound for coefficients of weight-one cusp forms. Therefore, although we will in essence follow the proof of Duke and Kowalski, we will give a proof of Lemma 2.7 that uses only “classical” results about holomorphic cusp forms.

Their proof (and therefore also ours) rests upon three principles:

(i) The duality principle: This method basically exploits the fact that the norm of a bounded operator on a Hilbert space is the same as the norm of its adjoint and thus allows us to interchange the sums over $\chi$ and $n$.

(ii) The technique of smoothing sums: When the asymptotic evaluation of sums of the type $\sum_{n \le N} a(n)$ is difficult, it often turns out that we can transform the problem at hand to deal with smoothed sums of the type $\sum_{n \ge 1} a(n)\phi(n)$, where $\phi$ is a smooth function which decays very fast for $n$ larger than $N$.
The smoothed sums can then often be understood by basic properties of harmonic analysis and the last ingredient:

(iii) Rankin–Selberg theory: The underlying Rankin–Selberg method is a tool that allows one to obtain the meromorphic continuation and a functional equation for the Mellin transform of the constant term in the Fourier development of an automorphic function. As a result of this, the meromorphic continuation and a functional equation of $\sum_{n \ge 1} \lambda_\chi(n)\overline{\lambda_{\chi'}(n)}\,n^{-s}$ can be found for all $\chi, \chi' \in \widehat{\mathcal{H}}_1([Q])$.

In our case, the properties of the involved Rankin–Selberg convolutions do not depend on deep facts from the theory of automorphic representations. Instead, our proof will make use of Li’s functional equation for L-functions associated to Rankin–Selberg convolutions of holomorphic cusp forms [Li79].

Proof of Lemma 2.7. Let $\phi$ be a smooth majorant of the characteristic function of the interval $[0, 1]$, i.e. a non-negative $C^{\infty}$ function on $[0, +\infty)$ with compact support, $0 \le \phi \le 1$ and $\phi(x) = 1$ for $0 \le x \le 1$, so that $\phi(n/N) = 1$ for $n \le N$. For all $\chi_1 \in \widehat{\mathcal{H}}_1(q_1)$, $\chi_2 \in \widehat{\mathcal{H}}_1(q_2)$ with $q_1, q_2 \in \mathcal{F}(Q)$, let $\chi_{1,2}$ be the product of the (unique) primitive real Dirichlet characters modulo $|q_1|$ and $|q_2|$; $\chi_{1,2}$ is therefore a real Dirichlet character modulo the least common multiple of $q_1$ and $q_2$. Set
\begin{align*}
S_N(\chi_1, \chi_2) &= \sum_{n \ge 1} \lambda_{\chi_1}(n)\overline{\lambda_{\chi_2}(n)}\,\phi(n/N), \\
L(s; \chi_1, \chi_2) &= \sum_{n \ge 1} \lambda_{\chi_1}(n)\overline{\lambda_{\chi_2}(n)}\,n^{-s}, \\
L_{RS}(s; \chi_1, \chi_2) &= L(2s, \chi_{1,2})\,L(s; \chi_1, \chi_2)
\end{align*}
for all $s \in \mathbb{C}$ with $\operatorname{Re}(s) > 1$. The first L-function is the “naïve” convolution L-series of $\lambda_{\chi_1}(n)$ and $\overline{\lambda_{\chi_2}(n)}$ (which equals $\lambda_{\chi_2}(n)$ for all integers $n$ and is therefore real although $\chi_2$ is not a real character; we do not use this fact in this chapter); the second L-function is known as the Rankin–Selberg convolution L-function.
By the Mellin inversion theorem, we have
\[
S_N(\chi_1, \chi_2) = \frac{1}{2\pi i}\int_{(2)} N^{s}\hat{\phi}(s)L(s; \chi_1, \chi_2)\,ds = \frac{1}{2\pi i}\int_{(2)} N^{s}\hat{\phi}(s)\frac{L_{RS}(s; \chi_1, \chi_2)}{L(2s, \chi_{1,2})}\,ds, \tag{2.3}
\]
where
\[
\hat{\phi}(s) = \int_{0}^{+\infty} \phi(x)x^{s-1}\,dx
\]
denotes the Mellin transform of $\phi$ (see [Kow04, §2.3] for this and the following basic properties of smooth cutoff functions and Mellin transforms). We would like to shift the line of integration on the right-hand side of (2.3) as far to the left as possible. To this end, we need to know the growth behaviour of the functions in this integral: By the choice of $\phi$, its Mellin transform $\hat{\phi}$ decays faster than any polynomial in all vertical strips of the complex plane. Furthermore, we have
\[
\frac{1}{L(2(\sigma + it), \chi_{1,2})} \ll \zeta(2\sigma + 2it) \ll \frac{1}{2\sigma - 1} \tag{2.4}
\]
uniformly in $t \in \mathbb{R}$ if $\sigma > \tfrac{1}{2}$. As for the Rankin–Selberg L-function $L_{RS}(s; \chi_1, \chi_2)$, we consider the functions
\[
f_j(z) = \sum_{n \ge 1} \lambda_{\chi_j}(n)e^{2\pi i z n} \qquad (j = 1, 2) \tag{2.5}
\]
on the complex upper half plane. Since the involved class group characters $\chi_j$ are not real, we know (see [IK04, §14.3] or [BGHZ08, §4.3], for example) that the functions $f_j$ are normalized primitive holomorphic cusp forms of weight one, level $|q_j|$ and nebentypus $\chi_{q_j}$ (i.e., the primitive real Dirichlet character modulo $|q_j|$). Therefore we also know from classical Rankin–Selberg theory (see [Li79, Theorem 3.1]) that $L_{RS}(s; \chi_1, \chi_2)$ is an entire function if $f_1 \ne f_2$ or, equivalently, if $\chi_1 \ne \chi_2$. In this case, it is therefore possible to shift the line of integration to $\operatorname{Re}(s) = \tfrac{1}{2} + \alpha$ with $\alpha = (\log N)^{-1}$. Thus,
\[
S_N(\chi_1, \chi_2) \ll \Bigl|\int_{(1/2+\alpha)} N^{s}\hat{\phi}(s)\frac{L_{RS}(s; \chi_1, \chi_2)}{L(2s, \chi_{1,2})}\,ds\Bigr|.
\]

The Rankin–Selberg L-function satisfies a functional equation which relates $L_{RS}(s; \chi_1, \chi_2)$ with $L_{RS}(1 - s; \chi_1, \chi_2)$ and we may deduce the convexity bound
\[
L_{RS}(1/2 + \alpha + it; \chi_1, \chi_2) \ll_\varepsilon \bigl(q_1 q_2 (1 + |t|)^{2}\bigr)^{1/2 - \alpha + \varepsilon} \tag{2.6}
\]
for every $\varepsilon > 0$ and all $t \in \mathbb{R}$; we postpone the proof to Lemma 2.8, where we prove a slightly sharper bound. By the fast decay of $\hat{\phi}$ and (2.4), we thus get
\[
S_N(\chi_1, \chi_2) \ll_\varepsilon N^{1/2}(\log N)Q^{1+\varepsilon} \tag{2.7}
\]
if $\chi_1 \ne \chi_2$.

Remark. One could try to remove the extraneous $\varepsilon$ from the exponent: Heath-Brown [HB09], using not much more than a modified variant of the classical Jensen formula, recently showed that one can eliminate the $\varepsilon$ from the exponent in the convexity bounds for general (Selberg class) L-functions on the critical line $\operatorname{Re}(s) = \tfrac{1}{2}$. Thus, choosing $\alpha = 0$ above would expunge the $\varepsilon$ in (2.6). This would require us to give a lower bound for $L(1 + it, \chi_{1,2})$, which would reinsert a factor of $(q_1 q_2)^{\varepsilon}$ by Siegel’s theorem whenever $\chi_{1,2}$ is an exceptional character; however, dealing with these exceptional cases separately, similar to Proposition 2.20, it should be possible to show that these are rare events and that they contribute negligibly (but would lead to an ineffective overall result). Since this would only be a cosmetic, but not a substantial improvement of the result at hand, we will not delve into that.

Also note that hitherto existing subconvexity bounds for Rankin–Selberg convolutions either require that one of the two involved cusp forms is fixed [HM06] or that one cusp form has a much smaller level than the other [HM13]. Although one may hope that more general results will be obtained in the future, these will probably only slightly improve our results (due to the saving of probably only a tiny power of the conductor) and will therefore be less important for us than for other applications. The best bound one could hope for in (2.6) is provided by the Lindelöf Hypothesis. We will state the resulting large sieve inequality in Lemma 2.10.
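The step from (2.6) to (2.7) can be unpacked as follows. This is a sketch in our own bookkeeping: the decay exponent 3 for $\hat{\phi}$ is an arbitrary admissible choice (any fixed polynomial decay is available), we use $|q_1 q_2| \le Q^2$, and we rename $2\varepsilon$ to $\varepsilon$ at the end.

```latex
% On the line Re(s) = 1/2 + alpha with alpha = 1/log N and s = 1/2 + alpha + it:
\begin{align*}
|N^{s}| &= N^{1/2+\alpha} = e\,N^{1/2}, \\
\frac{1}{|L(2s,\chi_{1,2})|} &\ll \frac{1}{2\alpha} = \tfrac{1}{2}\log N
  && \text{by (2.4) with } \sigma = \tfrac12 + \alpha, \\
|L_{RS}(s;\chi_1,\chi_2)| &\ll_\varepsilon (q_1 q_2)^{1/2+\varepsilon}(1+|t|)^{1+2\varepsilon}
  \le Q^{1+2\varepsilon}(1+|t|)^{1+2\varepsilon}
  && \text{by (2.6)}, \\
|\hat{\phi}(s)| &\ll (1+|t|)^{-3}
  && \text{(rapid decay in vertical strips)}.
\end{align*}
% Inserting these four estimates into the shifted integral gives, for 0 < epsilon < 1/2,
\[
S_N(\chi_1,\chi_2) \ll_\varepsilon N^{1/2}(\log N)\,Q^{1+2\varepsilon}
  \int_{-\infty}^{\infty} (1+|t|)^{-2+2\varepsilon}\,dt
  \ll_\varepsilon N^{1/2}(\log N)\,Q^{1+2\varepsilon}.
\]
```

The factor $\log N$ in (2.7) thus comes entirely from the choice $\alpha = (\log N)^{-1}$ in the lower bound for $L(2s, \chi_{1,2})$.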

If $\chi_1 = \chi_2 \in \widehat{\mathcal{H}}_1(q)$, we use the bound
\[
|\lambda_\chi(n)| \le \sum_{\substack{\mathfrak{a} \in \mathcal{Z}(q) \\ N(\mathfrak{a}) = n}} 1 \le \prod_{p^{v} \,\|\, n} (v + 1) = \tau(n), \tag{2.8}
\]
where the second inequality is due to the fact that each prime divisor $p$ of $n$ splits into at most two distinct prime ideals in the quadratic field $\mathbb{Q}(\sqrt{q})$. Therefore

\[
S_N(\chi_1, \chi_1) \le \sum_{n \ge 1} \tau(n)^{2}\phi(n/N) \ll N(\log N)^{3}, \tag{2.9}
\]
where the implied constant is absolute (see [MV07, (2.31)], for example).

Now that we have bounded $S_N(\chi_1, \chi_2)$ for all pairs $\chi_1, \chi_2 \in \widehat{\mathcal{H}}_1([Q])$, it remains to use a simple positivity argument and the duality principle in order to get the bound (2.2), which we originally set out to prove: For all complex numbers $b_\chi$, indexed by the characters $\chi \in \widehat{\mathcal{H}}_1([Q])$, the positivity of $\phi$ yields

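\begin{align*}
\sum_{n \le N} \Bigl|\sum_{\chi \in \widehat{\mathcal{H}}_1([Q])} b_\chi \lambda_\chi(n)\Bigr|^{2}
&\le \sum_{n \ge 1} \Bigl|\sum_{\chi \in \widehat{\mathcal{H}}_1([Q])} b_\chi \lambda_\chi(n)\Bigr|^{2}\phi(n/N)
= \sum_{\chi_1, \chi_2 \in \widehat{\mathcal{H}}_1([Q])} b_{\chi_1}\overline{b_{\chi_2}}\,S_N(\chi_1, \chi_2) \\
&\le \sum_{\chi_1, \chi_2 \in \widehat{\mathcal{H}}_1([Q])} |S_N(\chi_1, \chi_2)|\,|b_{\chi_1}|\,|b_{\chi_2}|
\le \sum_{\chi_1, \chi_2 \in \widehat{\mathcal{H}}_1([Q])} |S_N(\chi_1, \chi_2)|\bigl(|b_{\chi_1}|^{2} + |b_{\chi_2}|^{2}\bigr) \\
&\le 2 \max_{\chi_2 \in \widehat{\mathcal{H}}_1([Q])} \Bigl(\sum_{\chi_1 \in \widehat{\mathcal{H}}_1([Q])} |S_N(\chi_1, \chi_2)|\Bigr) \sum_{\chi_2 \in \widehat{\mathcal{H}}_1([Q])} |b_{\chi_2}|^{2}.
\end{align*}
We insert the bounds (2.7) and (2.9) into the right-hand side of this inequality and note that

\[
|\widehat{\mathcal{H}}_1([Q])| \le \sum_{q \in \mathcal{F}(Q)} h(q) \ll \sum_{q \in \mathcal{F}(Q)} |q|^{1/2}(\log |q|) \ll Q^{3/2}(\log Q) \tag{2.10}
\]
by the upper class number bound

\[
h(q) \ll |q|^{1/2}(\log |q|), \tag{2.11}
\]
which follows from the bound $L(1, \chi_q) \ll \log |q|$ (see [MV07, Lemma 10.15], for example) and Dirichlet’s class number formula
\[
h(q) = \frac{w|q|^{1/2}L(1, \chi_q)}{2\pi}, \tag{2.12}
\]
where $w = 6$ if $q = -3$, $w = 4$ if $q = -4$ and $w = 2$ if $q < -4$ (see [Dav00, §6], for example). Thus the bound
\[
\sum_{n \le N} \Bigl|\sum_{\chi \in \widehat{\mathcal{H}}_1([Q])} b_\chi \lambda_\chi(n)\Bigr|^{2} \ll_\varepsilon \Bigl(N(\log N)^{3} + N^{1/2}(\log N)Q^{5/2+\varepsilon}\Bigr) \sum_{\chi \in \widehat{\mathcal{H}}_1([Q])} |b_\chi|^{2}
\]
holds for all tuples $(b_\chi)_{\chi \in \widehat{\mathcal{H}}_1([Q])}$ of complex numbers. By the duality principle (see [IK04, p. 171], for example), this is equivalent to the statement of the lemma.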
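The class number bound (2.11) is easy to observe numerically. The following minimal sketch computes $h(q)$ by counting reduced primitive positive definite forms $ax^2 + bxy + cy^2$ of discriminant $q$, which is the classical way to enumerate the form class group, and prints the ratio $h(q)/(|q|^{1/2}\log|q|)$ for a few fundamental discriminants. The list of discriminants and the function name are our choices; the printed ratios illustrate the bound but say nothing about the implied constant.

```python
from math import gcd, isqrt, log, sqrt

def class_number(q):
    """Number of reduced primitive forms (a, b, c) with b^2 - 4ac = q < 0.

    Reduced means -a < b <= a <= c, with b >= 0 whenever a == c.
    """
    if q >= 0 or q % 4 not in (0, 1):
        raise ValueError("q must be a negative discriminant")
    count = 0
    # A reduced form satisfies 3a^2 <= 4ac - b^2 = |q|, which bounds a.
    for a in range(1, isqrt(-q // 3) + 1):
        for b in range(-a + 1, a + 1):
            if (b * b - q) % (4 * a):
                continue                    # c would not be an integer
            c = (b * b - q) // (4 * a)
            if c < a:
                continue
            if c == a and b < 0:
                continue                    # avoid counting (a, b, a) and (a, -b, a) twice
            if gcd(gcd(a, abs(b)), c) == 1:
                count += 1
    return count

for q in (-3, -4, -7, -15, -20, -23, -47, -71, -163):
    h = class_number(q)
    print(q, h, round(h / (sqrt(-q) * log(-q)), 3))
```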

It remains to prove the bound (2.6) for the Rankin–Selberg L-function $L_{RS}$: Based on the functional equation for Rankin–Selberg L-functions for convolutions of general holomorphic cusp forms (see [Li79] and (2.13)–(2.15) below) and the Phragmén–Lindelöf principle, we show that the following convexity bound holds for values of the analytic continuation of the Rankin–Selberg L-function

\[
L_{RS}(s; \chi_1, \chi_2) = L(2s, \chi_{q_1}\chi_{q_2}) \sum_{n \ge 1} \lambda_{\chi_1}(n)\overline{\lambda_{\chi_2}(n)}\,n^{-s}
\]
inside the critical strip:

Lemma 2.8. Let $q_1, q_2 \not\equiv 0 \pmod{8}$ be two negative fundamental discriminants. Let $\chi_1 \in \widehat{\mathcal{H}}_1(q_1)$ and $\chi_2 \in \widehat{\mathcal{H}}_1(q_2)$ be two distinct complex class group characters. For every $\varepsilon > 0$ we have
\[
L_{RS}(s; \chi_1, \chi_2) \ll_\varepsilon \Bigl(\frac{q_1 q_2}{(q_1, q_2)^{1-\varepsilon}} \cdot (1 + |t|)^{2}\Bigr)^{1 - \sigma + \varepsilon}
\]
for all $s = \sigma + it$ with $0 \le \sigma \le 1$ and $t \in \mathbb{R}$. The implied constant is effectively computable.

Proof. We start the proof by gathering the notation needed to formulate the functional equation [Li79, Theorem 2.2] for the Rankin–Selberg L-function $L_{RS}$ as it shows itself in our situation; although it is not necessary to have a copy of that paper at hand, we suppose that it might make following the proof a little bit easier. We will use the notation from [Li79] whenever there is no clash with the basic notation that we have used so far in this work.

Let $q$ be the least common multiple of $q_1$ and $q_2$.⁷ Let $M$ be the conductor of $\chi_{q_1}\chi_{q_2}$; as this is a real Dirichlet character, we have $M = \frac{[q_1, q_2]}{(q_1, q_2)}$. Write $q = MM'M''$ such that every prime divisor $p$ of $M'$ also divides $M$ and $(M'', MM') = 1$; hence $M' = 1$ and $M'' = (q_1, q_2)$.⁸ Note that $q$ is squarefree if $q \not\equiv 0 \pmod{4}$; otherwise, $2^{2}$ is the only proper prime power dividing $q$ and we then have either $4 \mid M''$ (if 4 divides both $q_1$ and $q_2$) or $4 \mid M$. For each prime divisor $p$ of $q$, let $R(p)$, $R'(p)$, $R_1(p)$ and $R_2(p)$ denote the largest power of $p$ that divides $q$, $M'$, $q_1$ and $q_2$, respectively.⁹ Since no square of an odd prime can divide the levels of the fundamental discriminants $q_1$ and $q_2$, we have $R(p) = p$, $R'(p) = 1$ and
\[
R_j(p) = \begin{cases} p & \text{if } p \text{ divides } q_j, \\ 1 & \text{otherwise,} \end{cases} \qquad (j = 1, 2)
\]
for each odd prime $p$ that divides $q$. Moreover, we set

\[
\theta(s; p, \chi_1, \chi_2) = 1 - \chi_{q_1 q_2}(p)\lambda_{\chi_1}(p)\lambda_{\chi_2}(p)p^{-s}
\]

00 for each odd prime p that divides M and all s ∈ C. Finally, we have to check that the conditions A)–C) on page 141 in [Li79] are satisfied. The conditions B) and C) are both only concerned with prime divisors of M 0 and are therefore trivially satisfied in our case. Condition A) is 00 00 non-trivial if λχ1 (p) = λχ2 (p) = 0 for some prime divisor p of M . But if p divides M , then p √ √ 2 ramifies in both Q( q1) and Q( q2); thus pO(qj) = pj for some pj ∈ Z(qj) for j ∈ {1, 2}. Hence

λχj (p) = χj(pj) 6= 0 for each χj ∈ Hb(qj). Therefore, Condition A) is also trivially satisfied. We now quote the functional equation for LRS(s; χ1, χ2) from [Li79, Theorem 2.2]: We have

Ψ(s; χ1, χ2) = A(s; χ1, χ2)Ψ(1 − s; χ1, χ2), (2.13)

where

\[ \Psi(s; \chi_1, \chi_2) = (2\pi)^{-2s}\, \Gamma^2(s)\, \theta(s; 2, \chi_1, \chi_2) \prod_{\substack{p \mid M'' \\ p \ne 2}} \theta(s; p, \chi_1, \chi_2)^{-1}\, L_{RS}(s; \chi_1, \chi_2) \tag{2.14} \]

and

\[ A(s; \chi_1, \chi_2) = c_2(s; \chi_1, \chi_2) \prod_{\substack{p \mid M'' \\ p \ne 2}} B_1(p, \chi_1, \chi_2)\, p^{1-2s} \prod_{\substack{p \mid M \\ p \ne 2}} B_2(p, \chi_1, \chi_2)\, R(p)^{1-s} R_1(p)^{-s} R_2(p)^{-s}. \tag{2.15} \]

Here $B_1$ and $B_2$ are functions which depend on $p$, $\chi_1$ and $\chi_2$ (but not on $s$) and always have absolute value 1; moreover, $\theta(s; 2, \chi_1, \chi_2)$ is a function that is bounded in the critical strip; and $c_2$ is a function satisfying $c_2(s; \chi_1, \chi_2) = c_2'(\chi_1, \chi_2)\, 4^{1-2s}$ for some function $c_2'$ with absolute value 1 if $q$ is even, and $c_2$ is constantly 1 otherwise.

Remark. Note that Li's general functional equation also contains a product over prime factors of $M'$, which we have omitted since $M' = 1$ here. Moreover, the product over prime factors of $M''$ in (2.15) appears in [Li79] with an exponent "$r(p) - m(p)$", which equals 1 here for all odd primes $p$ that divide $M''$ (this is because no proper power of an odd prime divides $q_1$ and $q_2$); this also explains the definition of $\theta$ above if one compares it with the original definitions of $r(p)$, $m(p)$ and $\theta$ on page 142 of [Li79].¹⁰

Set

\[ \widetilde{L}_{RS}(s; \chi_1, \chi_2) = \prod_{p \mid M''} \theta(s; p, \chi_1, \chi_2)^{-1}\, L_{RS}(s; \chi_1, \chi_2). \]

Since $R_1(p) R_2(p) = p = R(p)$ for all odd primes $p \mid M$, the functional equation for $\widetilde{L}_{RS}$ therefore reads

\[ (2\pi)^{-2s}\, \Gamma^2(s)\, \widetilde{L}_{RS}(s; \chi_1, \chi_2) = \Bigl( (2\pi)^{-2(1-s)}\, \Gamma^2(1-s)\, \widetilde{L}_{RS}(1-s; \chi_1, \chi_2) \Bigr) \cdot \Bigl( |q|^{1-2s} \prod_{p \mid q} B_3(p, \chi_1, \chi_2) \Bigr) \]

with $|B_3(p, \chi_1, \chi_2)| = 1$ for all $p \mid q$. By the definition of the conductor $\widetilde{c}(\chi_1, \chi_2)$ of the L-function $\widetilde{L}_{RS}(s; \chi_1, \chi_2)$, we get

\[ \widetilde{c}(\chi_1, \chi_2)^{1/2-s} = \frac{\widetilde{L}_{RS}(s; \chi_1, \chi_2)\,(2\pi)^{-2s}\, \Gamma^2(s)}{\widetilde{L}_{RS}(1-s; \chi_1, \chi_2)\,(2\pi)^{-2(1-s)}\, \Gamma^2(1-s)} = |q|^{1-2s}; \]

see [IK04, p. 94], for example. Hence $\widetilde{c}(\chi_1, \chi_2) = q^2 = [q_1, q_2]^2 = \frac{(q_1 q_2)^2}{(q_1, q_2)^2}$. By the Phragmén–Lindelöf principle (see [Gol06, Theorem 8.2.3], for example), we therefore have the convexity bound

\[ \widetilde{L}_{RS}(s; \chi_1, \chi_2) \ll_\varepsilon \left( \frac{q_1 q_2}{(q_1, q_2)} \cdot (1 + |t|)^2 \right)^{1-\sigma+\varepsilon} \]

for all $\varepsilon > 0$ and all $s = \sigma + it$ with $0 \le \sigma \le 1$ and $t \in \mathbb{R}$. In order to bound $L_{RS}(s; \chi_1, \chi_2)$ it remains to find an upper bound for the product $\prod_{p \mid M''} \theta(s; p, \chi_1, \chi_2)$. Note that $p \mid M''$ implies both $p \mid q_1$ and $p \mid q_2$, i.e. $p$ ramifies in both $\mathbb{Q}(\sqrt{q_1})$ and $\mathbb{Q}(\sqrt{q_2})$. Thus, $|\lambda_{\chi_1}(p)| = |\lambda_{\chi_2}(p)| = 1$ for all $p \mid M''$. Hence,

\[ \prod_{p \mid M''} \theta(s; p, \chi_1, \chi_2) \ll \prod_{\substack{p \mid M'' \\ p \ne 2}} (1 + p^{-s}) \ll_\varepsilon (M'')^{\varepsilon} = ((q_1, q_2))^{\varepsilon} \]

for all $\varepsilon > 0$ and all $s = \sigma + it$ with $0 \le \sigma \le 1$ and $t \in \mathbb{R}$. This yields the stated bound for $L_{RS}(s; \chi_1, \chi_2)$.

⁷ Note that $q$ corresponds to the variable $N$ in [Li79] although $N$ is defined as the maximum of the levels $q_1$ and $q_2$ at the beginning of section 2 of that paper. We believe that this is a typographical error, which occurs at multiple places in this paper, as $N$ is used throughout the paper as the least common multiple of the levels (see Example 2 on page 146, for example) and another interpretation of $N$ would render the other definitions in section 2 and the proofs in section 5 of [Li79] meaningless.

⁸ Note that this is only true because neither of the two fundamental discriminants is an integer multiple of 8.

⁹ They are called $Q$, $Q'$, $Q_1$, $Q_2$, respectively, in Li's paper.

¹⁰ The complexity of the general functional equation for Rankin–Selberg L-functions for convolutions of holomorphic cusp forms displays the major drawback of considering these L-functions from the classical viewpoint and not using the correspondence to L-functions of automorphic representations, which usually take a more natural form (see [Mic07, §2.3] and the references there). The effort needed to apply this equation when $q_1$ and $q_2$ are not fundamental discriminants seems disproportionate and one would certainly be well-advised to translate the situation to the automorphic setting then.
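The bookkeeping in the proof above (the factorisation of $q$ into $M$ and $M''$, and the resulting conductor $q^2$) can be checked numerically for concrete discriminants. The following Python lines are only a minimal sketch under the assumptions of Lemma 2.8; the function names and the returned dictionary keys are hypothetical and chosen for illustration, and absolute values are used for all levels.

```python
from math import gcd


def is_squarefree(n: int) -> bool:
    """Return True if |n| is a positive squarefree integer."""
    n = abs(n)
    if n == 0:
        return False
    d = 2
    while d * d <= n:
        if n % (d * d) == 0:
            return False
        d += 1
    return True


def is_fundamental(d: int) -> bool:
    """Return True if d is a fundamental discriminant different from 1."""
    if d % 4 == 1:
        return d != 1 and is_squarefree(d)
    if d % 4 == 0:
        m = d // 4
        return m % 4 in (2, 3) and is_squarefree(m)
    return False


def rankin_selberg_levels(q1: int, q2: int) -> dict:
    """Compute |q| = lcm, M, M'' and the conductor q^2 as in the proof of Lemma 2.8.

    Assumes (as in the lemma) that q1 and q2 are negative fundamental
    discriminants which are not divisible by 8, so that M' = 1.
    """
    for q in (q1, q2):
        if q >= 0 or not is_fundamental(q) or q % 8 == 0:
            raise ValueError(f"{q} is not an admissible discriminant")
    g = gcd(q1, q2)                 # M'' = (q1, q2)
    lcm = abs(q1 * q2) // g         # |q| = [q1, q2]
    return {
        "q_abs": lcm,
        "M": lcm // g,              # M = [q1, q2] / (q1, q2)
        "M_double_prime": g,
        "conductor": lcm ** 2,      # q^2 = (q1 q2)^2 / (q1, q2)^2
    }


if __name__ == "__main__":
    print(rankin_selberg_levels(-4, -20))
    print(rankin_selberg_levels(-15, -35))
```

For instance, for $q_1 = -4$ and $q_2 = -20$ the sketch returns $M = 5$, $M'' = 4$ and the conductor $400 = (q_1 q_2)^2 / (q_1, q_2)^2$.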

Remark 2.9. (a) Harcos and Michel [HM06, p. 582] mention that the bounds

\[ \frac{(q_1 q_2)^2}{(q_1, q_2)^4} \le c(\chi_1, \chi_2) \le \frac{(q_1 q_2)^2}{(q_1, q_2)} \]

for the conductor of $L_{RS}(s; \chi_1, \chi_2)$ can be derived using the local Langlands correspondence. This yields basically the same convexity bound as above.

(b) The Lindelöf Hypothesis (see [IK04, Corollary 5.20], for example) yields

\[ L_{RS}(1/2 + it; \chi_1, \chi_2) \ll_\varepsilon c(\chi_1, \chi_2)^{\varepsilon} (1 + |t|)^{2\varepsilon} \ll (q_1 q_2)^{2\varepsilon} (1 + |t|)^{2\varepsilon}. \tag{2.16} \]

This gives $S_N(\chi_1, \chi_2) \ll N^{1/2} Q^{\varepsilon}$ in (2.7) and therefore we have the following conditional large sieve inequality:

Lemma 2.10. If the Lindelöf Hypothesis holds for Rankin–Selberg convolutions of holomorphic cusp forms of weight one, then

\[ \sum_{\chi \in \widehat{H}_1([Q])} \Bigl| \sum_{n \le N} a_n \lambda_\chi(n) \Bigr|^2 \ll_\varepsilon \bigl( N (\log N)^3 + N^{1/2} Q^{3/2+\varepsilon} \bigr) \sum_{n \le N} |a_n|^2 \]

for all complex numbers $(a_n)_{n \le N}$ and all $\varepsilon > 0$, $Q \ge 1$ and $N \ge 3$.

Given the fact that the essentially best-possible large sieve inequality for Dirichlet characters, Lemma 2.6, can be proved unconditionally, there is some reason to hope that it might be possible to improve Lemma 2.7 without employing any kind of subconvexity bounds for the involved L-functions.

Remark. The technique of the proof of Lemma 2.7 does not look promising for real class group characters, which do not give a cusp form in (2.5) but an Eisenstein series. Indeed, we seem to accumulate too many poles of the corresponding Rankin–Selberg L-function: If $\chi$ is a real class group character in $\widehat{H}(q_1)$, then $\lambda_\chi$ is the Dirichlet convolution $\chi_{d_1} * \chi_{d_1'}$ of two Dirichlet characters modulo the absolute values of two fundamental discriminants $d_1$ and $d_1'$ such that $d_1 d_1' = q_1$ (compare Section 2.3.4); denote this character $\chi$ by $\chi_{d_1, d_1'}$. The Rankin–Selberg L-function

\[ L_{RS}(s; \chi_{d_1, d_1'}, \chi_{d_2, d_2'}) = L(s, \chi_{d_1}\chi_{d_2})\, L(s, \chi_{d_1}\chi_{d_2'})\, L(s, \chi_{d_1'}\chi_{d_2})\, L(s, \chi_{d_1'}\chi_{d_2'}) \]

has a pole at $s = 1$ whenever $(d_1 d_1', d_2 d_2') > 1$. Hence, a term of size $Q^{d+\varepsilon} N \sum_{n \le N} |a_n|^2$ accrues for every discriminant $q_1$ that shares a common factor with $Q^d$ other discriminants in $F(Q)$. We will therefore use a different method to cope with real class group characters.

Remark. There exist other large sieve inequalities for algebraic number fields. For instance, Schumer's [Sch86] general inequality with explicit dependence of the constants on the parameters of the underlying fixed field yields

\[ \sum_{\chi \in \widehat{H}_1(q)} \Bigl| \sum_{\substack{\mathfrak{a} \in Z(q) \\ N(\mathfrak{a}) \le N}} c(\mathfrak{a}) \chi(\mathfrak{a}) \Bigr|^2 \ll (\log |q|) \bigl( |q| + |qN|^{1/2} + N \bigr) \sum_{\substack{\mathfrak{a} \in Z(q) \\ N(\mathfrak{a}) \le N}} |c(\mathfrak{a})|^2 \]

for any fixed $q \in F$ and any function $c$ on $Z(q)$. However, the mean-value results of the next sections consider situations where the underlying number fields vary and therefore also require a large sieve inequality which has an extra averaging over the discriminant. To our knowledge, Lemma 2.7 is the first large sieve inequality for varying number fields. Similarly, there also exist other large sieve inequalities for modular forms of a fixed level (see [Mic07, §3.1.3], for example).

In the proof of our variant of the Bombieri–Vinogradov theorem we will need the large sieve inequality for complex class group characters in the following form:

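Corollary 2.11. Let $(a_n)_{n \ge 1}$ be a complex sequence with $\sum_{n \ge 1} |a_n| < \infty$. Let $Q \ge 1$, $k \ge 2$, $c \ge \frac{1}{2}$ and $\varepsilon > 0$. Then

\[ \sum_{\chi \in \widehat{H}_1([Q])} \int_{(c)} \Bigl| \sum_{n \ge 1} a_n \lambda_\chi(n) n^{-s} \Bigr|^2 |s|^{-(k+1)}\, |ds| \ll_\varepsilon Q^{\varepsilon} \sum_{n \ge 1} |a_n|^2 \bigl( n^{1-2c} + n^{1/2-2c} Q^{5/2} \bigr) \bigl( 1 + (\log n)^3 \bigr). \tag{2.17} \]

Moreover, we have

\[ \sum_{\chi \in \widehat{H}_1([Q])} \int_{(c)} \Bigl| \sum_{n \ge 1} a_n \lambda_\chi(n) n^{-s} \Bigr|^2 |s|^{-(k+1)}\, |ds| \ll_\varepsilon Q^{\varepsilon} \sum_{n \ge 1} |a_n|^2 \bigl( n^{1-2c} + n^{1/2-2c} Q^{3/2} \bigr) \bigl( 1 + (\log n)^3 \bigr) \tag{2.18} \]

if the Lindelöf Hypothesis holds.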

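Proof. Let $(b_n)_{n \ge 1}$ be a complex sequence with $\sum_{n \ge 1} |b_n| < \infty$ and $T \ge 1$. Using Lemma 2.7, we deduce

\[ \sum_{\chi \in \widehat{H}_1([Q])} \int_{-T}^{T} \Bigl| \sum_{n \ge 1} b_n \lambda_\chi(n) n^{it} \Bigr|^2 dt \ll_\varepsilon T Q^{\varepsilon} \sum_{n \ge 1} |b_n|^2 \bigl( n + n^{1/2} Q^{5/2} \bigr) \bigl( 1 + (\log n)^3 \bigr) \tag{2.19} \]

as in [Bom87, Théorème 10]. Set $b_n = a_n n^{-c}$ and $T = 1$ in (2.19). Then

\begin{align*}
\sum_{\chi \in \widehat{H}_1([Q])} \int_{c-i}^{c+i} \Bigl| \sum_{n \ge 1} a_n \lambda_\chi(n) n^{-s} \Bigr|^2 |s|^{-(k+1)}\, |ds|
&= \sum_{\chi \in \widehat{H}_1([Q])} \int_{-1}^{1} \Bigl| \sum_{n \ge 1} b_n \lambda_\chi(n) n^{it} \Bigr|^2 |c - it|^{-(k+1)}\, dt \tag{2.20} \\
&\ll_\varepsilon Q^{\varepsilon} \sum_{n \ge 1} |a_n|^2 \bigl( n^{1-2c} + n^{1/2-2c} Q^{5/2} \bigr) \bigl( 1 + (\log n)^3 \bigr).
\end{align*}

Let $m$ be a positive integer. Then, again by (2.19) with $T = m + 1$, we have

\begin{align*}
&\sum_{\chi \in \widehat{H}_1([Q])} \Bigl( \int_{c-(m+1)i}^{c-mi} \Bigl| \sum_{n \ge 1} a_n \lambda_\chi(n) n^{-s} \Bigr|^2 |s|^{-(k+1)}\, |ds| + \int_{c+mi}^{c+(m+1)i} \Bigl| \sum_{n \ge 1} a_n \lambda_\chi(n) n^{-s} \Bigr|^2 |s|^{-(k+1)}\, |ds| \Bigr) \\
&\qquad \ll \frac{1}{m^{k+1}} \sum_{\chi \in \widehat{H}_1([Q])} \Bigl( \int_{-(m+1)}^{-m} \Bigl| \sum_{n \ge 1} b_n \lambda_\chi(n) n^{it} \Bigr|^2 dt + \int_{m}^{m+1} \Bigl| \sum_{n \ge 1} b_n \lambda_\chi(n) n^{it} \Bigr|^2 dt \Bigr) \\
&\qquad \ll \frac{1}{m^{k+1}} \sum_{\chi \in \widehat{H}_1([Q])} \int_{-(m+1)}^{m+1} \Bigl| \sum_{n \ge 1} b_n \lambda_\chi(n) n^{it} \Bigr|^2 dt \\
&\qquad \ll_\varepsilon \frac{Q^{\varepsilon}}{m^{k}} \sum_{n \ge 1} |a_n|^2 \bigl( n^{1-2c} + n^{1/2-2c} Q^{5/2} \bigr) \bigl( 1 + (\log n)^3 \bigr).
\end{align*}

Since $k \ge 2$ we may sum over $m$ from 1 to $\infty$ here. Together with (2.20) this leads to the bound (2.17).

By Lemma 2.10, the exponents which are equal to $\frac{5}{2}$ above may be replaced by $\frac{3}{2}$ if we assume the Lindelöf Hypothesis and this yields the bound (2.18).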

2.3 Results of Bombieri–Vinogradov type

Equipped with the most important ingredient for proofs of average results on the distribution of primes, we can now proceed to primes represented by positive definite binary quadratic forms, for which we will prove that the error term in the corresponding prime number theorem, Theorem 1.2, is small on average over negative fundamental discriminants. The admissible range for the discriminants will, however, be considerably smaller than in the original Bombieri–Vinogradov theorem due to the comparatively weak large sieve result of the preceding section.¹¹

2.3.1 Statement and interpretation of results

In the first two results we will consider smoothed versions of the Chebyshev function for integers represented by positive definite binary quadratic forms. We demonstrate that these functions are “well distributed”¹² with respect to the form classes of almost all negative fundamental discriminants $q \not\equiv 0 \pmod 8$ with $|q|^{10+\varepsilon} \le X(\log X)^{-B}$, where $\varepsilon > 0$ is arbitrarily small and $B$ is some positive number which depends on the power of $(\log X)$ that we wish to save over “trivial” bounds (see Remark 2.18). Moreover, we may even save a positive power of $X$, if we confine ourselves to sets $M(Q)$ of negative fundamental discriminants with absolute value less than $Q$ such that no (positive or negative) fundamental discriminant has many integer multiples in $M(Q)$.

¹¹ After the defence of this thesis, the author noticed two rather simple means to improve this range (and consequently also some of the results of Section 2.5): First, the inequality (2.17) can be improved by noting that the large sieve inequality of Lemma 2.7 is worse than the “trivial” bound for the left-hand side of (2.2) if $N$ is small there. Second, the line $\mathrm{Re}(s) = \frac{1}{2}$ is not an optimal choice for the lines of integration inside the critical strip of the integrals that arise from “Gallagher's identity” (see (2.44)). We refer the reader to [Dit13] for details.

¹² The notion of “well-distribution” is used here as a vague but suggestive description for the results of this section (as suggested by the discussion in Section 1.2). We will discuss and specify this notion in Section 2.4.

Definition 2.12. For any $Q \ge 1$, let $M(Q)$ be a subset of $F(Q)$, the set of all negative fundamental discriminants $q \not\equiv 0 \pmod 8$ with $|q| \le Q$. We say that $\nu \in [0, 1]$ is a divisor frequency of $M(Q)$ if it satisfies the property:

The cardinality of the set $\{q \in M(Q) : q' \mid q\}$ is bounded by $Q^{\nu}$ for each (positive or negative) fundamental discriminant $q'$ with $1 < |q'| \le Q$. (2.21)

For all $X \ge 3$, all $q \in F$, all $C \in K(q)$ and all integers $k \ge 0$, we define

\[ \psi_k(X; q, C) = \frac{1}{k!} \sum_{n \le X} \Lambda(n) \Bigl( \log \frac{X}{n} \Bigr)^{k} w(C, n), \tag{2.22} \]

where $w(C, n)$ is given by (1.4) (see also (1.5) and (1.6)).

Remark. This type of smoothing with the weight factor $\frac{1}{k!} (\log \frac{X}{n})^k$ (so-called Riesz typical means) arises from the inverse Mellin transform of the Dirichlet series $\sum_{n \ge 1} \Lambda(n) w(C, n) n^{-s}$ with the kernel $s^{-(k+1)}$; see [MV07, §5.1], for example. We also recall that the factor $w(C, n)$ accounts for the fact that every positive integer $n$ may usually be represented by forms from different form classes of a given discriminant or by forms from no class at all; see Remark 1.8.

Let $Q \approx X^{1/(15-5\nu)}$ and let $M(Q) \subseteq F(Q)$ be a set with divisor frequency $\nu \in (0, 1]$. We will show that the smoothed and weighted Chebyshev functions $\psi_k(X; q, C)$ are well distributed with respect to the form classes to most discriminants $q \in M(Q)$ and the error term is at most $\frac{Q^{\nu/2}}{|M(Q)|} \cdot X(\log X)^{-A}$ (with $A > 0$ arbitrarily large) for most $q \in M(Q)$:

Theorem 2.13. Let $M(Q) \subseteq F(Q)$ for some $Q \ge 1$ and let $\nu \in (0, 1]$ be a divisor frequency of $M(Q)$. For every integer $k \ge 2$, every (arbitrarily large) real number $A > 0$ and every (arbitrarily small) real number $\varepsilon > 0$, there exists a real number $B = B(A)$ such that

\[ \sum_{q \in M(Q)} \max_{C \in K(q)} \max_{Y \le X} \Bigl| \psi_k(Y; q, C) - \frac{1}{h(q)} \sum_{K \in K(q)} \psi_k(Y; q, K) \Bigr| \ll Q^{\nu/2} X (\log X)^{-A} \tag{2.23} \]

for $Q^{(15-5\nu)+\varepsilon} \le X(\log X)^{-B}$. The implied constant depends on $\varepsilon$, $A$, $k$ and $\nu$; the dependence on $\varepsilon$ is effective, the dependence on $A$, $k$ and $\nu$ is non-effective. The constant $B$ is explicitly computable; in particular, one may choose $B = 10A + 190$.

If the set $M(Q)$ is composed of (odd) negative prime discriminants, then $M(Q)$ has divisor frequency $\nu = 0$. In this case we just fail to achieve (2.23) with $\nu = 0$. Nevertheless, it is worth recording that the proof of Theorem 2.13 yields

Theorem 2.14. Let $Q \ge 1$ and let $\Pi(Q)$ be the set of all odd negative prime discriminants whose absolute value is at most $Q$. For every integer $k \ge 2$ and every (arbitrarily small) real number $\varepsilon > 0$, we may find an absolute constant $B$ such that

\[ \sum_{q \in \Pi(Q)} \max_{C \in K(q)} \max_{Y \le X} \Bigl| \psi_k(Y; q, C) - \frac{1}{h(q)} \sum_{K \in K(q)} \psi_k(Y; q, K) \Bigr| \ll_{\varepsilon, k} X (\log X)^{k+3} \]

for $Q^{15+\varepsilon} \le X(\log X)^{-B}$.

To put this last result into perspective, we consider the form $f_q(x, y) = x^2 + xy + \frac{1-q}{4} y^2$ for each negative fundamental discriminant $q \equiv 1 \pmod 4$. Note that $f_q$ lies in the principal class $C_0(q)$ of $K(q)$ then and consider the function

\[ S_q(X) = \sum_{\substack{p \le X \\ \exists x, y \in \mathbb{Z}:\ f_q(x, y) = p}} \log(p) \Bigl( \log \frac{X}{p} \Bigr)^{2}, \]

which gives a smoothed and weighted count of the primes represented by $f_q$ up to $X$. By Remark 1.9, we have

\[ S_q(X) = \frac{1}{2} \sum_{\substack{n \le X \\ \exists x, y \in \mathbb{Z}:\ f_q(x, y) = n}} \Lambda(n) \Bigl( \log \frac{X}{n} \Bigr)^{2} w(C_0, n) + O\bigl( X^{1/2} (\log X)^3 \bigr) \]

for all negative fundamental discriminants $q \equiv 1 \pmod 4$ with $|q| \le X$. Thus, Theorem 2.14 implies that, for most negative prime discriminants $q$ with $|q| \le Q \approx X^{1/15}$, the function $S_q(X)$ deviates from the (expectable) average function

\[ \frac{1}{2h(q)} \sum_{K \in K(q)} e(K) \sum_{\substack{p \le X \\ p \in R(q, K)}} \log(p) \Bigl( \log \frac{X}{p} \Bigr)^{2}, \]

where the factor $e(K)$ was defined in (1.7), by only a small amount at most, and the sum (over $q \in \Pi(Q)$) of these discrepancies is a positive power of $X$ smaller than “trivial” estimates can guarantee. We will quantify this improvement in Remark 2.18.

Unfortunately, this saving of a power of $X$ is not possible for the analogous smoothed and weighted count of primes represented by the forms of the shape $x^2 + ny^2$ because the divisor $-4$ of the corresponding discriminants $-4n$ occurs too often, i.e. we have to choose $\nu = 1$ (if we do not want $\nu$ to depend on $Q$) in condition (2.21) and therefore only save a power of $(\log X)$.

If $\nu < 1$, then it does not seem to be possible to unsmooth these results, i.e. to take $k = 0$, while keeping the given estimates. This is because the unsmoothing process (see Section 2.3.6) produces a term of size $Q^{1/2} X (\log X)^{-D}$ (where $D$ is an arbitrary positive number). However, for $\nu = 1$, i.e. for arbitrary sets $M(Q)$ of negative fundamental discriminants, these extra terms of size $Q^{1/2} X (\log X)^{-D}$ are not too large and we can obtain from Theorem 2.13 the following result, which has more resemblance to the original Bombieri–Vinogradov theorem:

Theorem 2.15. For all q ∈ F and all C ∈ K(q), define

\[ \psi(X; q, C) = \sum_{\substack{n \le X \\ n \in R(q, C)}} \Lambda(n). \]

Let $F(Q)$ be the set of all negative fundamental discriminants $q \not\equiv 0 \pmod 8$ with $|q| \le Q$. Let $A > 0$ be an arbitrarily large real number, let $\varepsilon > 0$ be an arbitrarily small real number and let $e(C)$ be defined by (1.7). Then there exists a real number $B = B(A)$ such that

\[ \sum_{q \in F(Q)} \max_{C \in K(q)} \max_{Y \le X} \Bigl| \psi(Y; q, C) - \frac{Y}{e(C) h(q)} \Bigr| \ll_{\varepsilon, A} Q^{1/2} X (\log X)^{-A} \]

for $Q^{10+\varepsilon} \le X(\log X)^{-B}$.

By standard methods, the function $\psi(X; q, C)$ can be replaced by the prime counting function

\[ \pi(X; q, C) = |\{ p \le X \text{ prime} \mid \forall f \in C : f(x, y) = p \text{ for some } x, y \in \mathbb{Z} \}|. \]

This lets us interpret Theorem 2.15 as an average result for the error term in the prime number theorem for primes represented by positive definite binary quadratic forms and therefore as an analogue of Theorem 2.3:

Theorem 2.16. Let A > 0 be an arbitrarily large real number, let ε > 0 be an arbitrarily small real number and let e(C) be defined by (1.7). Then there exists a real number B = B(A) such that

\[ \sum_{q \in F(Q)} \max_{C \in K(q)} \Bigl| \pi(X; q, C) - \frac{\mathrm{li}(X)}{e(C) h(q)} \Bigr| \ll Q^{1/2} X (\log X)^{-A} \]

for $Q^{10+\varepsilon} \le X(\log X)^{-B}$. The implied constant depends on $\varepsilon$ and $A$; the dependence on $\varepsilon$ is effective, the dependence on $A$ is non-effective. The constant $B$ is explicitly computable; in particular, one may choose $B = 40A + 220$.

Remark. Recall here that the weight e(C) equals w(C, n) whenever n is a prime that does not divide q and it is therefore also the weight factor in the asymptotic behaviour of π(X; q, C) that is necessary to compensate for the distinct behaviour of ambiguous and non-ambiguous classes in the representability of primes.

Under the assumption of the Lindelöf Hypothesis, we have seen in Section 2.2 that a stronger large sieve inequality holds for complex class group characters. It yields an increased range for the fundamental discriminants in the statements above:

Theorem 2.17. Assume the Lindelöf Hypothesis for Rankin–Selberg convolutions of holomorphic cusp forms of weight one (or more specifically: assume that estimate (2.16) holds for all pairs of distinct complex class group characters). Then

(a) Theorem 2.13 holds with $Q^{6-3\nu+3\varepsilon} \le X(\log X)^{-B}$ if $\nu \ge \frac{1}{2} + \varepsilon$ and with $Q^{7-5\nu+5\varepsilon} \le X(\log X)^{-B}$ if $\nu \le \frac{1}{2} + \varepsilon$;

(b) Theorem 2.14 holds with $Q^{7+\varepsilon} \le X(\log X)^{-B}$;

(c) Theorems 2.15 and 2.16 hold with $Q^{3+\varepsilon} \le X(\log X)^{-B}$.

We will justify these conditional improvements in Remarks 2.26 and 2.27.

Remark 2.18. How do these results compare to “trivial” estimates? There is no upper bound for the number of primes $p \le X$ represented by a given binary quadratic form which is as trivial as the estimate $\pi(X; q, a) \le \frac{X}{q} + 1$ for primes in arithmetic progressions, where the right-hand side of the inequality is simply the number of integers in the given arithmetic progression (or this number plus one). However, the bound

\[ \sum_{\substack{n \le X \\ n \in R(q, C)}} 1 \ll \frac{X}{\sqrt{|q|}} + X^{1/2} \]

(see (1.14)) can be proved by elementary means and may therefore be considered as a suitable substitute for a completely trivial bound. This estimate and (1.6) give the bound

\[ O\Bigl( X (\log X)^{k+1} \sum_{q \in M(Q)} \frac{1}{\sqrt{|q|}} \Bigr) \tag{2.24} \]

for the left-hand side of (2.23) in Theorem 2.13. If $M(Q) = F(Q)$ (note that $|F(Q)| \asymp Q$), we beat this bound by an arbitrary power of $(\log X)$.

As for Theorem 2.15 and Theorem 2.16, it is known that the class number $h(q)$ has the lower bound

\[ |q|^{1/2} (\log |q|)^{-1} \ll h(q) \tag{2.25} \]

if the primitive real Dirichlet character modulo $q$ is not exceptional, and $|q|^{1/2-\varepsilon} \ll_\varepsilon h(q)$ for all $\varepsilon > 0$ if it is exceptional; see Section 2.3.2. Thus, we get the “trivial” bound

\[ O_\varepsilon\bigl( Q^{1/2+\varepsilon} X (\log X) \bigr) \]

in Theorem 2.15. However, since exceptional discriminants are very rare (see Proposition 2.20), it is reasonable to use (2.25) and aim to improve on the “trivial” bounds

\[ O\bigl( Q^{1/2} (\log Q) X (\log X) \bigr) \]

for Theorem 2.15 and $O\bigl( Q^{1/2} (\log Q) X \bigr)$ for Theorem 2.16, which we indeed improve by an arbitrary power of $(\log X)$, just as in the original Bombieri–Vinogradov theorem.

As we have already mentioned above, we do even better if $\nu < 1$ because we then beat the bound

\[ O\Bigl( X (\log X)^{k+1} \cdot \Bigl( \frac{Q}{\log Q} \Bigr)^{1/2} \Bigr), \tag{2.26} \]

which (2.24) yields for $M(Q) = \Pi(Q)$, for example, by a positive power of $Q$ (and therefore by a positive power of $X$): Let $X$ be large, $Q = X^{1/16}$ and $k = 2$; then Theorem 2.14 beats (2.26) by a factor of size $\bigl( \frac{Q}{\log Q} \bigr)^{1/2} (\log X)^{-2} \gg_\varepsilon X^{1/32-\varepsilon}$ for all arbitrarily small $\varepsilon > 0$. This result is unusual as it does not seem to be possible to achieve a saving of a positive power of $X$ over the trivial bound for the corresponding smooth version of the original Bombieri–Vinogradov theorem.

Something that we do not achieve is to prove the conditional error term in Theorem 1.16 “on average”, as the original Bombieri–Vinogradov theorem is capable of doing for primes in arithmetic progressions. Due to the shorter range for $q$ in Theorem 2.16, we may only deduce:

Corollary 2.19. Let $X \ge 3$, $A > 0$ and $\varepsilon > 0$. Let $Q$ satisfy $Q^{10+\varepsilon} = X(\log X)^{-B}$ with $B = B(A)$ as in Theorem 2.16. Set $d = \frac{19+2\varepsilon}{20+2\varepsilon}$. Then there exist constants $D_1 = D_1(A, \varepsilon) > 0$ and $D_2 = D_2(A, \varepsilon) > 0$ such that

\[ \pi(X; q, C) = \frac{\mathrm{li}(X)}{e(C) h(q)} + O_{\varepsilon, A}\bigl( X^{d} (\log X)^{-D_1} \bigr) \]

for all form classes of all negative fundamental discriminants $q \not\equiv 0 \pmod 8$ satisfying $|q| \le Q$, with the possible exception of at most $Q(\log X)^{-D_2}$ discriminants in this range.

Remark. One reason for the comparatively short ranges that are, for now, admissible for the discriminants in our results may be found in the fact that the size of a form class group is much smaller than the corresponding discriminant. This offers therefore less potential for possible cancellation effects than in the case of arithmetic progressions where the number of reduced residue classes of a modulus is usually only slightly smaller than the modulus itself.
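The notion of a divisor frequency from Definition 2.12 can be explored numerically for a fixed value of $Q$. The following Python lines are a minimal sketch, not part of the proofs: the function names are hypothetical, and the quantity computed is merely the smallest $\nu$ for which condition (2.21) holds at the single given $Q$, whereas the definition is meant for sets $M(Q)$ with varying $Q$.

```python
from math import log


def is_squarefree(n: int) -> bool:
    """Return True if |n| is a positive squarefree integer."""
    n = abs(n)
    if n == 0:
        return False
    d = 2
    while d * d <= n:
        if n % (d * d) == 0:
            return False
        d += 1
    return True


def is_fundamental(d: int) -> bool:
    """Return True if d is a fundamental discriminant different from 1."""
    if d % 4 == 1:
        return d != 1 and is_squarefree(d)
    if d % 4 == 0:
        m = d // 4
        return m % 4 in (2, 3) and is_squarefree(m)
    return False


def is_prime(n: int) -> bool:
    """Trial-division primality test, sufficient for small n."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def negative_fundamental_discriminants(Q: int) -> list:
    """F(Q): negative fundamental discriminants q with 8 not dividing q and |q| <= Q."""
    return [q for q in range(-Q, 0) if is_fundamental(q) and q % 8 != 0]


def smallest_divisor_frequency(M: list, Q: int) -> float:
    """Smallest nu with #{q in M : q' | q} <= Q^nu for all fundamental q', 1 < |q'| <= Q."""
    divisors = [d for d in range(-Q, Q + 1) if abs(d) > 1 and is_fundamental(d)]
    worst = max(sum(1 for q in M if q % d == 0) for d in divisors)
    return log(max(worst, 1)) / log(Q)


if __name__ == "__main__":
    Q = 100
    F = negative_fundamental_discriminants(Q)
    Pi = [q for q in F if is_prime(-q)]  # odd negative prime discriminants
    print(smallest_divisor_frequency(Pi, Q), smallest_divisor_frequency(F, Q))
```

For the prime discriminants the sketch returns 0, in line with the observation before Theorem 2.14, while the full set $F(Q)$ gives a strictly positive value because small fundamental discriminants such as $-3$ and $-4$ have many multiples in $F(Q)$.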

In order to prove Theorems 2.13 and 2.14, and in consequence also for the other statements of this section, we will largely follow Gallagher’s proof of the original Bombieri–Vinogradov theorem as presented by Bombieri in [Bom87, §7]. The key ingredients will be:

(i) The large sieve inequality for complex ideal class group characters, which we have found in Section 2.2; see Section 2.3.3.

(ii) The original Bombieri–Vinogradov theorem itself, which we will use to estimate the contribution coming from real ideal class group characters; see Section 2.3.4.

(iii) Landau's theorem on the scarcity of exceptional moduli, that is, the rarity of integers $q$ for which there could possibly exist a Dirichlet character $\chi$ modulo $q$ whose associated L-function has a Landau–Siegel zero; see Proposition 2.20 in the next subsection.

(iv) A result of Siegel–Walfisz type for ideal class group characters by Goldstein; see Lemma 2.22 in the next subsection. The use of this result and the original Siegel–Walfisz theorem account for the ineffectivity of all results in this chapter.

We choose to follow Gallagher's method as it is sufficient for the investigation of primes represented by binary quadratic forms and it is slightly easier (to this author's mind) than the “modern” proof of the original Bombieri–Vinogradov theorem as presented in [IK04, §17] and [Bom87, §12]. The modern proof is characterized by a more systematic use of bilinear forms and it is thus capable of yielding more general results on equidistribution in arithmetic progressions (see [IK04, Theorem 17.4], for example). We will get an impression of such results in Section 2.4, where we prove a result on the mean square distribution (over form classes) for more general arithmetic functions than the prime counting function and (smooth) versions of the Chebyshev function, which we consider in the theorems above. However, a comparison between our Theorem 2.30 and [IK04, Theorem 17.5], for example, shows that our restrictions on these functions are quite severe and an analogous version of [IK04, Theorem 17.4] might therefore be quite inconvenient to use.

Nevertheless, it should be possible to fit the ingredients (i)–(iv) that we mentioned above into a proof of Theorem 2.15 that follows [IK04, §17]. It is less clear whether, by this method, we could also achieve its smooth version, which saves a positive power of $X$ (or $Q$) in certain situations as in Theorem 2.14: In the modern proof one splits certain bilinear sums into small boxes and “excess areas”; the latter are trivially estimated (and no other estimate seems attainable), which prevents at that point the saving of a positive power of $X$.

2.3.2 Preliminaries

Let $A > 0$ (arbitrarily large) and $\varepsilon > 0$ (arbitrarily small; for simplicity we will also assume $\varepsilon \le \frac{1}{4}$ in the end) be real numbers; let $k \ge 2$ be an integer; let $M(Q) \subseteq F(Q)$ be a set of negative fundamental discriminants $q \not\equiv 0 \pmod 8$ with divisor frequency $\nu \in [0, 1]$. These numbers will be considered as fixed parameters which the implied constants in the estimates of this and the subsequent subsections may depend on; the constants will not depend on the positive real numbers $X$ and $Q$, for which we always assume that $Q \le X$.

We have already mentioned in Section 1.2 that questions on primes that are represented by binary quadratic forms are usually best dealt with by looking at a similar problem for ideals in the corresponding quadratic fields. The weights $w(C, n)$ in the definition of the functions $\psi_k(X; q, C)$ already put these functions into the right form for this transition. Indeed, by definition (1.4) of $w(C, n)$, we have

\[ \psi_k(X; q, C) = \frac{1}{k!} \sum_{\substack{\mathfrak{a} \in B_q(C) \cap Z(q) \\ N(\mathfrak{a}) \le X}} \Lambda(N(\mathfrak{a})) \Bigl( \log \frac{X}{N(\mathfrak{a})} \Bigr)^{k} \]

for all $q \in F(Q)$ and all $C \in K(q)$. For ease of notation we set

\[ E_k(X; q) = \max_{C \in K(q)} \max_{Y \le X} \Bigl| \psi_k(Y; q, C) - \frac{1}{h(q)} \sum_{K \in K(q)} \psi_k(Y; q, K) \Bigr|. \tag{2.27} \]

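Thus, if the assumptions of Theorem 2.13 and Theorem 2.14 hold, we have to prove the bounds

\[ \sum_{q \in M(Q)} E_k(X; q) \ll Q^{\nu/2} X (\log X)^{-A} \tag{2.28} \]

if $\nu > 0$ and $Q^{(15-5\nu)+\varepsilon} \le X(\log X)^{-B(A)}$, and

\[ \sum_{q \in \Pi(Q)} E_k(X; q) \ll X (\log X)^{k+3} \tag{2.29} \]

if $Q^{15+\varepsilon} \le X(\log X)^{-B}$.

We start the proof of both (2.28) and (2.29) by using the orthogonality property of the ideal class group characters. We know from Section 1.2 that $H(q)$ is a finite abelian group and so is the group of ideal class group characters $\widehat{H}(q) \simeq H(q)$. Define

\[ \psi_k(Y; q, \chi) = \frac{1}{k!} \sum_{\substack{\mathfrak{a} \in Z(q) \\ N(\mathfrak{a}) \le Y}} \Lambda(N(\mathfrak{a}))\, \chi(\mathfrak{a}) \Bigl( \log \frac{Y}{N(\mathfrak{a})} \Bigr)^{k} \]

for all $Y \le X$, all $q \in F(Q)$, all $\chi \in \widehat{H}(q)$ and all $k \ge 0$. The orthogonality property of the characters of finite abelian groups (see [IK04, §3.1], for example) yields

\begin{align*}
\psi_k(Y; q, C) &= \frac{1}{k!} \sum_{\substack{\mathfrak{a} \in Z(q) \\ N(\mathfrak{a}) \le Y}} \Lambda(N(\mathfrak{a})) \Bigl( \log \frac{Y}{N(\mathfrak{a})} \Bigr)^{k} \Bigl( \frac{1}{h(q)} \sum_{\chi \in \widehat{H}(q)} \overline{\chi}(B_q(C))\, \chi(\mathfrak{a}) \Bigr) \\
&= \frac{1}{h(q)} \sum_{\chi \in \widehat{H}(q)} \overline{\chi}(B_q(C))\, \psi_k(Y; q, \chi) \tag{2.30}
\end{align*}

for all $q \in F(Q)$ and all $C \in K(q)$. Together with the triangle inequality we thus get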

\[ \sum_{q \in M(Q)} E_k(X; q) \le \sum_{q \in M(Q)} \max_{Y \le X} \frac{1}{h(q)} \sum_{\chi \ne \chi_0^{(q)}} |\psi_k(Y; q, \chi)|. \tag{2.31} \]

Remark. Here we have to ignore the cancellation that presumably occurs in the sum on the right side of (2.30). This is a defect which is also characteristic of all proofs of the original Bombieri–Vinogradov theorem and presumably the superficial reason why it is not possible to extend its range beyond the square-root barrier.
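The orthogonality relation behind (2.30) can be illustrated in the simplest case of a cyclic group. The following Python lines are a minimal sketch under that assumption (class groups need not be cyclic); the function names are hypothetical, and group elements are represented by exponents modulo the group order $h$.

```python
import cmath


def cyclic_characters(h: int) -> list:
    """All h characters of the cyclic group Z/hZ, as functions of an exponent a."""
    def make(r: int):
        return lambda a: cmath.exp(2 * cmath.pi * 1j * r * a / h)
    return [make(r) for r in range(h)]


def class_indicator(h: int, target: int, a: int) -> complex:
    """(1/h) * sum over chi of conj(chi(target)) * chi(a).

    By orthogonality this is 1 if a and target agree modulo h and 0 otherwise,
    which is how a sum over one class is rewritten as a sum over characters.
    """
    chars = cyclic_characters(h)
    return sum(chi(target).conjugate() * chi(a) for chi in chars) / h


if __name__ == "__main__":
    print(abs(class_indicator(5, 2, 2)))  # close to 1
    print(abs(class_indicator(5, 2, 3)))  # close to 0
```

Up to floating-point error, the indicator equals 1 exactly when the two arguments lie in the same class, so averaging over all characters picks out a single class as in (2.30).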

As before, for every fundamental discriminant $q \not\equiv 0 \pmod 8$, we let $\chi_q$ denote the unique primitive real Dirichlet character modulo $|q|$. By Siegel's theorem (see [MV07, Theorem 11.14], for example), we have the unconditional, non-effective lower bound $|q|^{-\varepsilon} \ll_\varepsilon L(1, \chi_q)$ for the corresponding Dirichlet L-function. This yields the lower class number bound

\[ |q|^{1/2-\varepsilon} \ll_\varepsilon h(q) \tag{2.32} \]

by Dirichlet's class number formula (2.12). Yet, there exists a better bound for many $q$ and it turns out that the contribution from the other discriminants is often negligible: We know (see [MV07, Theorem 11.3]) that there exists an absolute constant $c_1 > 0$ such that, for any $q \in F$, the Dirichlet L-function $L(s, \chi_q)$ has at most one zero in the set

\[ \Bigl\{ s = \sigma + it \in \mathbb{C} : \sigma \ge 1 - \frac{c_1}{\log(|q|(|t| + 4))} \Bigr\}. \]

The only potential zero in this region (for a fixed admissible value of $c_1$) is called the exceptional zero or Landau–Siegel zero for the modulus $|q|$ and $\chi_q$ is then called an exceptional character; it is conjectured that no such zero and character exist for any modulus. Moreover, there exists $c_2 > 0$ such that $L(1, \chi_q) \ge c_2 (\log |q|)^{-1}$ if $L(s, \chi_q)$ has no exceptional zero (see [MV07, Theorem 11.4]). Thus, by the class number formula (2.12), there exists $c_3 > 0$ such that

\[ |q|^{1/2} (\log |q|)^{-1} \le c_3 h(q) \tag{2.33} \]

holds for all $q \in F$ for which $L(s, \chi_q)$ has no exceptional zero. We fix such a value of $c_3$.

We now give an upper bound for the contribution to the right side of (2.31) coming from the (presumably empty) set $F_{ex}(Q) \subset F(Q)$ of exceptional fundamental discriminants; here we call $q \in F$ exceptional if it fails to satisfy (2.33) for the fixed value of $c_3$ (and therefore $L(s, \chi_q)$ has an exceptional zero then).
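The size of $h(q)$ relative to $|q|^{1/2} (\log |q|)^{-1}$ in (2.33) can be inspected for small discriminants. The following Python lines are a minimal sketch that counts reduced primitive forms $(a, b, c)$ of discriminant $q$, a classical way to compute the class number; the function names are hypothetical, and the printed ratio says nothing about the actual value of the constant $c_3$.

```python
from math import gcd, log, sqrt


def class_number(d: int) -> int:
    """Count reduced primitive forms ax^2 + bxy + cy^2 with b^2 - 4ac = d < 0.

    A form is reduced if -a < b <= a <= c, with b >= 0 whenever a == c.
    """
    if d >= 0 or d % 4 not in (0, 1):
        raise ValueError("d must be a negative discriminant")
    count = 0
    a = 1
    while 3 * a * a <= -d:  # reduced forms satisfy a <= sqrt(|d| / 3)
        for b in range(-a + 1, a + 1):
            if (b - d) % 2:  # b must have the same parity as d
                continue
            numerator = b * b - d
            if numerator % (4 * a):
                continue
            c = numerator // (4 * a)
            if c < a or (a == c and b < 0):
                continue
            if gcd(gcd(a, abs(b)), c) == 1:
                count += 1
        a += 1
    return count


def class_number_ratio(d: int) -> float:
    """The quotient |d|^(1/2) / (log|d| * h(d)) appearing in (2.33)."""
    return sqrt(-d) / (log(-d) * class_number(d))


if __name__ == "__main__":
    for d in (-3, -4, -20, -23, -47, -71, -163):
        print(d, class_number(d), round(class_number_ratio(d), 3))
```

The class numbers obtained in this way are tiny compared with $|q|$, which matches the remark that a form class group is much smaller than the corresponding discriminant.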

Proposition 2.20. Let Mex(Q) = Fex(Q) ∩ M(Q) be the (possibly empty) subset of exceptional fundamental discriminants of M(Q). Then we have

\[ \sum_{q \in M_{ex}(Q)} \max_{Y \le X} \frac{1}{h(q)} \sum_{\chi \in \widehat{H}(q) \setminus \{\chi_0^{(q)}\}} |\psi_k(Y; q, \chi)| \ll X (\log X)^{k+3}. \]

In particular, exceptional discriminants contribute acceptably to the right side of (2.31) if either $\nu > 0$ and $Q \ge (\log X)^{(2A+2k+6)/\nu}$ or $\nu = 0$.

Remark 2.21. The case $Q < (\log X)^{(2A+2k+6)/\nu}$ will be dealt with later on by means of an appropriate Siegel–Walfisz type theorem; see Remark 2.23 below. Moreover, note that if $\nu = 0$, then this contribution would not be negligible in Theorem 2.13, which is why we get the slightly weaker bound in Theorem 2.14.

Proof. Let $q_1$ be an exceptional modulus. By a theorem of Landau (see [MV07, Corollary 11.9]), we know that there cannot exist an exceptional modulus $q$ with $q_1 < q < q_1^2$. Thus, there can be at most $O(\log \log Q)$ exceptional moduli which are smaller than $Q$. Using standard estimates (see (2.8)), we also have

\[ |\psi_k(Y; q, \chi)| \le \sum_{n \le Y} \Lambda(n) \Bigl( \log \frac{Y}{n} \Bigr)^{k} \Bigl| \sum_{\substack{\mathfrak{a} \in Z(q) \\ N(\mathfrak{a}) = n}} \chi(\mathfrak{a}) \Bigr| \le (\log X)^{k} \sum_{n \le X} \log(n)\, \tau(n) \ll X (\log X)^{k+2} \]

for all $q \in M_{ex}(Q)$ and all $\chi \in \widehat{H}(q)$, and the first assertion follows immediately. If $\nu > 0$ and $Q \ge (\log X)^{(2A+2k+6)/\nu}$, then

\[ X (\log X)^{k+3} \le Q^{\nu/2} X (\log X)^{-A}, \]

i.e. the contribution from exceptional discriminants is acceptable for Theorem 2.13.

Therefore it remains to estimate the contribution from non-exceptional discriminants on the right side of (2.31), i.e. we have to bound

\[ \sum_{q \in M'(Q)} \max_{Y \le X} \frac{1}{h(q)} \sum_{\chi \ne \chi_0^{(q)}} |\psi_k(Y; q, \chi)|, \tag{2.34} \]

where

\[ M'(Q) = M(Q) \setminus M_{ex}(Q) \quad \text{or} \quad M'(Q) = \Pi(Q) \setminus M_{ex}(Q), \]

and we will show that it is bounded above by

\[ Q^{\nu/2} X (\log X)^{-A} \tag{2.35} \]

for both $\nu > 0$ and $\nu = 0$.

If Q is very small, a uniform bound for ψ0(X; q, χ) exists, which easily yields the desired bound for (2.34); the following is a special case of Goldstein’s generalization of the Siegel–Walfisz theorem [Gol70]:

Lemma 2.22 (Goldstein). Suppose that $q \in F$ with $|q| \le (\log X)^{D}$ for some positive constant $D$. Then

\[ \psi_0(X; q, \chi) \ll_D X (\log X)^{-2D} \]

for all non-trivial class group characters $\chi \in \widehat{H}(q)$. The implied constant does not depend on $q$ or $\chi$, but is ineffective.

So suppose that $Q = (\log X)^{D}$ for some $D \ge A + k$. We have

\[ \psi_k(Y; q, \chi) = \int_1^Y \psi_{k-1}(t; q, \chi)\, \frac{dt}{t} \ll \max_{y \le Y} |\psi_0(y; q, \chi)| \cdot (\log Y)^{k}. \]

Summing over $q \in M'((\log X)^{D})$, Lemma 2.22 therefore yields the upper bound (2.35) for (2.34) if $Q = (\log X)^{D}$.

Remark 2.23. We have now proved that the bounds in both Theorem 2.13 and Theorem 2.14 hold for $Q \le (\log X)^{D} =: Q_0$ and it remains to bound (2.34) with $M'(Q)$ replaced by

\[ M''(Q) := M'(Q) \cap \{q : |q| > Q_0\} \]

for a value of $D$ that we will choose later; compare (2.67). We already record that, because of Remark 2.21, we must choose $D$ at least as large as $D_1 := (2A + 2k + 6)/\nu$ if $\nu > 0$. If $\nu = 0$, we will have to choose some $D \ge D_1 := A + k$ to guarantee the bound (2.35) for (2.34) (which is more than enough for Theorem 2.14).

We recall that we have defined in Remark 1.17 the numbers λχ(n) and the class group L-functions L(s, λχ) for all class group characters. The expansion of the logarithmic derivative of such an L-function is given by

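\[
\frac{L'}{L}(s, \lambda_\chi) = -\sum_{\mathfrak{a} \in Z(q)} \widetilde{\Lambda}(\mathfrak{a})\,\chi(\mathfrak{a})\,N(\mathfrak{a})^{-s},
\]
where
\[
\widetilde{\Lambda}(\mathfrak{a}) =
\begin{cases}
\log N(\mathfrak{p}) & \text{if } \mathfrak{a} = \mathfrak{p}^m \text{ for some prime ideal } \mathfrak{p} \in Z(q) \text{ and some integer } m, \\
0 & \text{otherwise.}
\end{cases} \tag{2.36}
\]

Similarly to the computation of the inverse Mellin transform of $L'/L$, one can show (see [MV07, (5.22)], for example) that

\[
-\frac{1}{2\pi i} \int_{(c)} \frac{L'}{L}(s, \lambda_\chi)\, Y^{s} s^{-(k+1)}\, ds = \frac{1}{k!} \sum_{\substack{\mathfrak{a} \in Z(q) \\ N(\mathfrak{a}) \le Y}} \widetilde{\Lambda}(\mathfrak{a})\,\chi(\mathfrak{a}) \Big(\log\frac{Y}{N(\mathfrak{a})}\Big)^{k} =: \widetilde{\psi}_k(Y; q, \chi) \tag{2.37}
\]

for all $c > 1$. This does not equal $\psi_k(Y; q, \chi)$, but we miss it only by a negligible margin: Set

\[
c(\mathfrak{a}) = \chi(\mathfrak{a}) \Big(\log\frac{Y}{N(\mathfrak{a})}\Big)^{k}
\]
and note that we have

\begin{align*}
k!\,\widetilde{\psi}_k(Y; q, \chi)
&= \sum_{p \le Y} \sum_{\substack{\mathfrak{p} \in Z(q) \\ N(\mathfrak{p}) = p}} (\log p)\,c(\mathfrak{p})
 + \sum_{p \le Y^{1/2}} \sum_{\substack{\mathfrak{p} \in Z(q) \\ N(\mathfrak{p}) = p^2}} (\log p^2)\,c(\mathfrak{p})
 + \sum_{\ell \ge 2} \sum_{\substack{\mathfrak{p} \in Z(q) \\ N(\mathfrak{p})^{\ell} \le Y}} \log(N(\mathfrak{p}))\,c(\mathfrak{p}^{\ell}) \\
&= \sum_{p \le Y} \sum_{\substack{\mathfrak{p} \in Z(q) \\ N(\mathfrak{p}) = p}} (\log p)\,c(\mathfrak{p}) + O\big(Y^{1/2}(\log Y)^{k+2}\big)
\end{align*}
and

\begin{align*}
k!\,\psi_k(Y; q, \chi)
&= \sum_{\ell \ge 1} \sum_{p \le Y^{1/\ell}} \sum_{\substack{\mathfrak{a} \in Z(q) \\ N(\mathfrak{a}) = p^{\ell}}} (\log p)\,c(\mathfrak{a}) \\
&= \sum_{p \le Y} \sum_{\substack{\mathfrak{p} \in Z(q) \\ N(\mathfrak{p}) = p}} (\log p)\,c(\mathfrak{p})
 + \sum_{\ell \ge 2} \sum_{p \le Y^{1/\ell}} (\log p) \sum_{\substack{\mathfrak{a} \in Z(q) \\ N(\mathfrak{a}) = p^{\ell}}} c(\mathfrak{a}) \\
&= \sum_{p \le Y} \sum_{\substack{\mathfrak{p} \in Z(q) \\ N(\mathfrak{p}) = p}} (\log p)\,c(\mathfrak{p}) + O\big(Y^{1/2}(\log Y)^{k+3}\big).
\end{align*}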

Hence
\[
\psi_k(Y; q, \chi) = \widetilde{\psi}_k(Y; q, \chi) + O\big(Y^{1/2}(\log Y)^{k+3}\big). \tag{2.38}
\]
Summing over $q \in M''(Q)$, the contribution of the remainder terms is $\ll Q X^{1/2}(\log X)^{k+3}$ in (2.34) if we replace $\psi_k(Y; q, \chi)$ by $\widetilde{\psi}_k(Y; q, \chi)$ there. But this is negligible in (2.28) and (2.29). Thus it remains to estimate

\[
\sum_{q \in M''(Q)} \frac{1}{h(q)} \sum_{\substack{\chi \in \widehat{H}(q) \\ \chi \ne \chi_0^{(q)}}} \max_{Y \le X} |\widetilde{\psi}_k(Y; q, \chi)|. \tag{2.39}
\]

Next, we split (2.39) into

\[
\sum_{q \in M''(Q)} \frac{1}{h(q)} \sum_{\substack{\chi \in \widehat{H}(q) \\ \chi^2 \ne \chi_0^{(q)}}} \max_{Y \le X} |\widetilde{\psi}_k(Y; q, \chi)|
+ \sum_{q \in M''(Q)} \frac{1}{h(q)} \sum_{\substack{\chi \in \widehat{H}(q) \setminus \{\chi_0^{(q)}\} \\ \chi^2 = \chi_0^{(q)}}} \max_{Y \le X} |\widetilde{\psi}_k(Y; q, \chi)|
= E'_k(Q, X) + E''_k(Q, X), \tag{2.40}
\]
say, i.e. we split it into sums over complex class group characters and sums over real class group characters. We will estimate both terms separately in the next two subsections and show that they are both bounded above by (2.35). Together with the results for exceptional discriminants (Proposition 2.20) and small discriminants (Remark 2.23) we may then conclude that (2.28) and (2.29) hold.

Remark 2.24. The only other work known to the author that considers a somewhat related average over fundamental discriminants and class group characters is [FI03a]: Let $\varphi$ be a smooth even test function on $\mathbb{R}$ whose Fourier transform $\widehat{\varphi}$ is compactly supported. On the assumption of the Generalized Riemann Hypothesis for class group L-functions, Fouvry and Iwaniec consider
\[
B(q; \varphi) := \frac{1}{h(q)} \sum_{\chi \in \widehat{H}(q)} \sum_{\gamma_\chi} \varphi\Big(\frac{\gamma_\chi}{2\pi} \log|q|\Big),
\]
where the inner sum is over the imaginary parts of the non-trivial zeros $\rho = \tfrac{1}{2} + i\gamma_\chi$ of $L(s, \lambda_\chi)$. They conjecture that
\[
B(q; \varphi) = \int_{-\infty}^{\infty} \varphi(x) \Big(1 - \frac{\sin 2\pi x}{2\pi x}\Big)\, dx + o(1) \tag{2.41}
\]
as $q \to -\infty$ in the set of fundamental discriminants. In fact, they verify the conjecture when $\widehat{\varphi}$ is supported in $[-1, 1]$ and then go on to establish it for almost all negative fundamental discriminants if $\widehat{\varphi}$ is supported in a wider interval, i.e. beyond the discontinuities of the Fourier transform of $1 - \frac{\sin 2\pi x}{2\pi x}$ at $\pm 1$:

Theorem (Fouvry–Iwaniec, [FI03a, Theorem 1.2]). Assume the Generalized Riemann Hypothesis for class group L-functions and Dirichlet L-functions. Let $0 < \vartheta < \tfrac{4}{3}$ and let $\varphi$ be a test function with $\widehat{\varphi}$ compactly supported in $(-\vartheta, \vartheta)$. Let $Q > 3$ and let $M(Q)$ be a set of squarefree negative integers $q \equiv 1 \pmod 4$ with $Q < -q \le 2Q$ and $|M(Q)| > Q^{\vartheta - 1/3}$. Then
\[
\sum_{q \in M(Q)} \Big| B(q; \varphi) - \int_{-\infty}^{\infty} \varphi(x) \Big(1 - \frac{\sin 2\pi x}{2\pi x}\Big)\, dx \Big|^{2} \ll_{\vartheta, \varphi} |M(Q)| \cdot \frac{\log\log Q}{\log Q}.
\]

It is worth noting that the proof of their result draws heavily on the correspondence between the density conjecture (2.41) and the distribution of primes $p$ of the form $4p = x^2 - qy^2$ for negative fundamental discriminants $q$.

2.3.3 Complex class group characters

In this section, we estimate the first term $E'_k(Q, X)$ in (2.40). Using dyadic decomposition and the class number bound (2.33) for the discriminants in $M''(Q)$, we get

\[
E'_k(Q, X) \ll (\log X)^{2} \max_{Y \le X} \max_{Q_0 \le Q_1 \le Q} Q_1^{-1/2} \sum_{q \in M''(Q_1)} \sum_{\substack{\chi \in \widehat{H}(q) \\ \chi^2 \ne \chi_0}} \Big| \int_{(c)} \frac{L'}{L}(s, \lambda_\chi)\, Y^{s} s^{-(k+1)}\, ds \Big| \tag{2.42}
\]
for all $c > 1$. Like in Section 2.2, we set

\[
\widehat{H}_1(q) = \{\chi \in \widehat{H}(q) \mid \chi^2 \ne \chi_0^{(q)}\}
\]

for all $q \in M''(Q_1)$ and
\[
\widehat{H}_1([Q_1]'') = \bigcup_{q \in M''(Q_1)} \widehat{H}_1(q).
\]

Moreover, we let $a_\chi(n)$ denote the coefficients of the L-series of the logarithmic derivative of $L(s, \lambda_\chi)$, i.e.
\[
\frac{L'}{L}(s, \lambda_\chi) = \sum_{n \ge 1} \frac{a_\chi(n)}{n^{s}},
\]
and split it according to Bombieri's modification of Gallagher's identity: For every $1 \le z \le X$, we set
\[
F_z := F_z(s, \lambda_\chi) := \sum_{n \le z} \frac{a_\chi(n)}{n^{s}}, \qquad
G_z := G_z(s, \lambda_\chi) := \sum_{n > z} \frac{a_\chi(n)}{n^{s}}, \qquad
M_z := M_z(s, \lambda_\chi) := \sum_{n \le z} \frac{b_\chi(n)}{n^{s}},
\]

where the coefficients $b_\chi(n)$ are the coefficients of $L(s, \lambda_\chi)^{-1}$. Then

\[
\frac{L'}{L} = G_z(1 - LM_z) + F_z(1 - LM_z) + L'M_z. \tag{2.43}
\]
Thus, for all $c > 1$, we have

\[
\int_{(c)} \frac{L'}{L}(s, \lambda_\chi) \frac{Y^{s}}{s^{k+1}}\, ds = \int_{(c)} G_z(1 - LM_z) \frac{Y^{s}}{s^{k+1}}\, ds + \int_{(c)} \big(F_z(1 - LM_z) + L'M_z\big) \frac{Y^{s}}{s^{k+1}}\, ds.
\]
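As a quick sanity check on (2.43), the identity can be verified in a few lines. The sketch below assumes only that $F_z + G_z = L'/L$ in the region of absolute convergence, which is how the two truncations were defined above.

```latex
\begin{align*}
G_z(1 - LM_z) + F_z(1 - LM_z) + L'M_z
  &= (F_z + G_z)(1 - LM_z) + L'M_z \\
  &= \frac{L'}{L}\,(1 - LM_z) + L'M_z \\
  &= \frac{L'}{L} - L'M_z + L'M_z \;=\; \frac{L'}{L}.
\end{align*}
```

Only the first summand involves the infinite tail $G_z$; this is why the second integral can later be shifted to the line $\operatorname{Re}(s) = 1/2$.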

We may move the line of integration of the second integral into the critical strip because $F_z$ and $M_z$ are Dirichlet polynomials and $L$ and $L'$ are entire functions for all $\chi \in \widehat{H}_1([Q_1]'')$ (see [IK04, Theorem 14.17], for example). Repeatedly using the inequality $2|ab| \le |a|^2 + |b|^2$, we obtain
\begin{align*}
\sum_{\chi \in \widehat{H}_1([Q_1]'')} \max_{Y \le X} \Big| \int_{(c)} \frac{L'}{L}(s, \lambda_\chi)\, Y^{s} s^{-(k+1)}\, ds \Big|
&\ll X^{c} \int_{(c)} \sum_{\chi \in \widehat{H}_1([Q_1]'')} \big(|G_z|^2 + |1 - LM_z|^2\big)\, |s|^{-(k+1)}\, |ds| \\
&\quad + X^{1/2} \int_{(1/2)} \sum_{\chi \in \widehat{H}_1([Q_1]'')} \big(1 + |F_z|^2 + |M_z|^2 + |F_z M_z|^2\big)\, |s|^{-(k+1)}\, |ds| \tag{2.44} \\
&\quad + X^{1/2} \int_{(1/2)} \sum_{\chi \in \widehat{H}_1([Q_1]'')} \big(|L|^2 + |L'|^2\big)\, |s|^{-(k+1)}\, |ds|
\end{align*}
for all $c > 1$. The first and second term on the right-hand side will be evaluated by our large sieve inequality for complex class group characters, in particular by Corollary 2.11. Before we can do this, we have to determine the coefficients $a_\chi(n)$ and $b_\chi(n)$ of $F_z$, $G_z$ and $M_z$. This is slightly more complicated than in the classical case, since if $\chi$ is a class group character in $\widehat{H}_1(q)$, then the product

\[
\lambda_\chi(m)\lambda_\chi(n) = \sum_{d \mid (m, n)} \chi_q(d)\,\lambda_\chi(mnd^{-2}), \tag{2.45}
\]
where $\chi_q$ is the primitive real Dirichlet character modulo $|q|$, is not as simple as the product of two values of a Dirichlet character (see [Iwa97, §6.6], for example; recall that the $\lambda_\chi(n)$ are coefficients of primitive holomorphic cusp forms of weight one, level $q$ and nebentypus $\chi_q$, as we already mentioned in Section 2.2). This product formula yields the Euler product

\[
L(s, \lambda_\chi) = \prod_{p} \big(1 - \lambda_\chi(p)p^{-s} + \chi_q(p)p^{-2s}\big)^{-1}
\]
from which one easily deduces (see [KM97, Lemma 2.1]) that

\[
L(s, \lambda_\chi)^{-1} = \sum_{\ell, m \ge 1} \chi_q(m)\,\mu(\ell)\,|\mu(\ell m)|\,\lambda_\chi(\ell)\,(\ell m^2)^{-s}. \tag{2.46}
\]
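The passage from the Euler product to the expansion (2.46) is a purely formal identity, and it can be checked numerically. The following is a minimal sketch under stated assumptions: the discriminant $q = -23$ is an illustrative choice (for it, $\chi_q(n)$ equals the Legendre symbol $(n/23)$), and `lam_prime` is a hypothetical stand-in for $\lambda_\chi(p)$, not the true Hecke eigenvalues. The check only uses that $\lambda_\chi$ is multiplicative on squarefree arguments.

```python
def prime_factors(n):
    """Return the factorization of n >= 1 as a dict {prime: exponent}."""
    factors = {}
    p = 2
    while p * p <= n:
        while n % p == 0:
            factors[p] = factors.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors


def mobius(n):
    """Moebius function mu(n)."""
    factors = prime_factors(n)
    if any(e > 1 for e in factors.values()):
        return 0
    return -1 if len(factors) % 2 else 1


def chi_q(n):
    """Kronecker symbol (-23/n) for n >= 1, computed as the Legendre symbol (n/23)."""
    r = pow(n, 11, 23)  # Euler's criterion: r is 0, 1 or 22
    return -1 if r == 22 else r


def lam_prime(p):
    """HYPOTHETICAL stand-in for lambda_chi(p); any integer values would do here."""
    return (7 * p) % 5 - 2


def lam_squarefree(l):
    """Multiplicative extension of lam_prime to squarefree l."""
    value = 1
    for p in prime_factors(l):
        value *= lam_prime(p)
    return value


def b_from_euler_product(n):
    """Coefficient of n^{-s} in prod_p (1 - lam(p) p^{-s} + chi_q(p) p^{-2s})."""
    value = 1
    for p, e in prime_factors(n).items():
        if e == 1:
            value *= -lam_prime(p)
        elif e == 2:
            value *= chi_q(p)
        else:
            return 0
    return value


def b_from_formula(n):
    """Coefficient of n^{-s} on the right side of (2.46): sum over n = l * m^2."""
    total = 0
    m = 1
    while m * m <= n:
        if n % (m * m) == 0:
            l = n // (m * m)
            if mobius(l * m) != 0:
                total += chi_q(m) * mobius(l) * lam_squarefree(l)
        m += 1
    return total


mismatches = [n for n in range(1, 500) if b_from_euler_product(n) != b_from_formula(n)]
print("mismatches:", mismatches)
```

The factor $|\mu(\ell m)|$ forces $\ell$ and $m$ to be squarefree and coprime, which is exactly what makes the coefficient vanish at prime powers $p^e$ with $e \ge 3$.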

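We thus get the following expressions for the Dirichlet series $M_z$, $F_z$, $G_z$ and $1 - LM_z$:
\begin{align*}
M_z(s, \lambda_\chi) &= \sum_{\substack{\ell, m \ge 1 \\ \ell m^2 \le z}} \chi_q(m)\,\mu(\ell)\,|\mu(\ell m)|\,\lambda_\chi(\ell)\,(\ell m^2)^{-s}, \tag{2.47} \\
F_z(s, \lambda_\chi) &= -\sum_{\substack{k, \ell, m \ge 1 \\ k\ell m^2 \le z}} (\log k)\,\chi_q(m)\,\mu(\ell)\,|\mu(\ell m)|\,\lambda_\chi(\ell)\lambda_\chi(k)\,(k\ell m^2)^{-s}, \tag{2.48} \\
G_z(s, \lambda_\chi) &= -\sum_{\substack{k, \ell, m \ge 1 \\ k\ell m^2 > z}} (\log k)\,\chi_q(m)\,\mu(\ell)\,|\mu(\ell m)|\,\lambda_\chi(\ell)\lambda_\chi(k)\,(k\ell m^2)^{-s}, \tag{2.49} \\
1 - LM_z(s, \lambda_\chi) &= -\sum_{\substack{k, \ell, m \ge 1 \\ \ell m^2 \le z \\ k\ell m^2 > z}} \chi_q(m)\,\mu(\ell)\,|\mu(\ell m)|\,\lambda_\chi(\ell)\lambda_\chi(k)\,(k\ell m^2)^{-s}. \tag{2.50}
\end{align*}
These series are not yet in the right form for a direct application of Corollary 2.11, but the following (in)equalities will bring them into the right shape: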

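Lemma 2.25. For all positive integers $\ell$ and $m$, let $A(\ell, m)$ be a complex number.

(a) Let $\alpha > 0$. Assume that $\sum_{\ell \ge 1} A(\ell, m)\,\ell^{-(1+\alpha)+it} \ll m$ for all $t \in \mathbb{R}$. Then
\[
\Big| \sum_{\ell, m \ge 1} A(\ell, m)(\ell m^2)^{-(1+\alpha)+it} \Big|^{2} \ll \alpha^{-1} \sum_{m \ge 1} m^{-3-2\alpha} \Big| \sum_{\ell \ge 1} A(\ell, m)\,\ell^{-(1+\alpha)+it} \Big|^{2}. \tag{2.51}
\]

(b) Assume that $\sum_{\ell \ge 1} A(\ell, m)\,\ell^{-1/2+it} < \infty$ for all $m \ge 1$ and all $t \in \mathbb{R}$. Moreover, assume that there exists a real number $M$ such that $A(\ell, m) = 0$ for all $m > M$ and all $\ell \ge 1$. Then
\[
\Big| \sum_{\ell, m \ge 1} A(\ell, m)(\ell m^2)^{-1/2+it} \Big|^{2} \ll (\log M) \sum_{m \le M} m^{-1} \Big| \sum_{\ell \ge 1} A(\ell, m)\,\ell^{-1/2+it} \Big|^{2}. \tag{2.52}
\]

(c) Let $\chi \in \widehat{H}_1(q)$ and $j_1, j_2, j_3 \ge 1$. Then
\[
\sum_{\substack{\ell, m \ge 1 \\ \ell m > j_1}} A(\ell, m)\,\lambda_\chi(\ell)\lambda_\chi(m)\,(\ell m)^{-s} = \sum_{\substack{h, d \ge 1 \\ hd^2 > j_1}} \chi_q(d) \sum_{\substack{v, w \ge 1 \\ vw = h}} A(vd, wd)\,\lambda_\chi(h)\,(hd^2)^{-s} \tag{2.53}
\]
and
\[
\sum_{\ell \le j_2} \sum_{m \le j_3} A(\ell, m)\,\lambda_\chi(\ell)\lambda_\chi(m)\,(\ell m)^{-s} = \sum_{h, d \ge 1} \chi_q(d) \sum_{v \le j_2/d} \sum_{\substack{w \le j_3/d \\ vw = h}} A(vd, wd)\,\lambda_\chi(h)\,(hd^2)^{-s} \tag{2.54}
\]
for all $s \in \mathbb{C}$ for which the series converge.

Proof. By the Cauchy–Schwarz inequality, we have
\[
\Big| \sum_{\ell, m \ge 1} A(\ell, m)(\ell m^2)^{-s} \Big|^{2} \le \Big( \sum_{m \ge 1} m^{2(r - \operatorname{Re}(s))} \Big) \Big( \sum_{m \ge 1} m^{-2(r + \operatorname{Re}(s))} \Big| \sum_{\ell \ge 1} A(\ell, m)\,\ell^{-s} \Big|^{2} \Big) \tag{2.55}
\]
for all real numbers $r$ and all complex numbers $s$ for which the sums on the right side converge. The first bound follows for $r = \tfrac{1}{2}$ and $s = (1 + \alpha) - it$.

As for the second bound, the sums on the right side of (2.55) are then only over $m \le M$; the bound follows for $r = 0$ and $s = \tfrac{1}{2} - it$. The equalities in (c) follow from (2.45).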
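The Cauchy–Schwarz step (2.55) holds for arbitrary coefficients, so it can be tested on random data. Below is a minimal sketch; the array sizes, the random coefficients and the name `cauchy_schwarz_sides` are illustrative assumptions, and the two sample parameter pairs mirror the choices $r = 1/2$ and $r = 0$ made in the proof.

```python
import random


def cauchy_schwarz_sides(A, r, s):
    """Return (lhs, rhs) of (2.55) for a finite array A with A[m-1][l-1] = A(l, m).

    r is a real number, s a Python complex number.
    """
    M = len(A)
    L = len(A[0])
    # inner[m-1] = sum over l of A(l, m) * l^{-s}
    inner = [
        sum(A[m - 1][l - 1] * l ** (-s) for l in range(1, L + 1))
        for m in range(1, M + 1)
    ]
    # (l m^2)^{-s} = l^{-s} * m^{-2s}, so the double sum factors through `inner`
    lhs = abs(sum(inner[m - 1] * m ** (-2 * s) for m in range(1, M + 1))) ** 2
    first = sum(m ** (2 * (r - s.real)) for m in range(1, M + 1))
    second = sum(
        m ** (-2 * (r + s.real)) * abs(inner[m - 1]) ** 2 for m in range(1, M + 1)
    )
    return lhs, first * second


random.seed(0)
A = [
    [complex(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(30)]
    for _ in range(20)
]

# Parameters as in the proof: r = 1/2 with Re(s) = 1 + alpha, and r = 0 with Re(s) = 1/2.
lhs_a, rhs_a = cauchy_schwarz_sides(A, 0.5, complex(1.1, -3.0))
lhs_b, rhs_b = cauchy_schwarz_sides(A, 0.0, complex(0.5, -2.0))
print(lhs_a <= rhs_a, lhs_b <= rhs_b)
```

The weight $m^{r}$ is what is moved between the two factors; choosing $r$ balances the convergence of the first sum against the size of the second.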

Remark. These (in)equalities have been used in [KM97, §7] to prove a zero-density estimate for L-functions associated to certain cusp forms. The first proofs of the Bombieri–Vinogradov theorem relied heavily on zero-density estimates for Dirichlet L-functions; Gallagher's simplification of these proofs then removed any direct appeal to the zeros but still kept the core of the argument. Thus, it is not surprising that Lemma 2.25 plays a role both here and in [KM97].

Now we set $\alpha = (\log X)^{-1}$ and $c = 1 + \alpha$, then apply (2.51) and (2.53) to (2.49) and obtain

\begin{align*}
|G_z(c + it, \lambda_\chi)|^{2}
&\ll (\log X) \sum_{m \ge 1} m^{-3-2\alpha} \Big| \sum_{\substack{k, \ell \ge 1 \\ k\ell > z/m^2}} (\log k)\,\mu(\ell)\,|\mu(\ell m)|\,\lambda_\chi(\ell)\lambda_\chi(k)\,(k\ell)^{-(c+it)} \Big|^{2} \\
&= (\log X) \sum_{m \ge 1} m^{-3-2\alpha} \Big| \sum_{\substack{h, d \ge 1 \\ hd^2 > z/m^2}} \chi_q(d) \sum_{\substack{v, w \ge 1 \\ vw = h}} (\log vd)\,\mu(wd)\,|\mu(wdm)|\,\lambda_\chi(h)\,(hd^2)^{-(c+it)} \Big|^{2}. \tag{2.56}
\end{align*}
Set
\[
a_1(h, d, m) = \sum_{\substack{v, w \ge 1 \\ vw = h}} (\log vd)\,\mu(wd)\,|\mu(wdm)| \tag{2.57}
\]
and apply once again (2.51) to the right side of (2.56). This yields

\[
|G_z(c + it, \lambda_\chi)|^{2} \ll (\log X)^{2} \sum_{m \ge 1} m^{-3-2\alpha} \sum_{d \ge 1} d^{-3-2\alpha} \Big| \sum_{h > z/(m^2 d^2)} a_1(h, d, m)\,\lambda_\chi(h)\,h^{-(c+it)} \Big|^{2},
\]
which now has the right form to apply Corollary 2.11. We get
\[
\int_{(c)} \sum_{\chi \in \widehat{H}_1([Q_1]'')} |G_z(s, \lambda_\chi)|^{2}\, |s|^{-(k+1)}\, |ds|
\ll_\varepsilon Q_1^{\varepsilon} (\log X)^{2} \sum_{m, d \ge 1} (md)^{-3-2\alpha} \sum_{h > z/(m^2 d^2)} |a_1(h, d, m)|^{2} \big(h^{-1-2\alpha} + h^{-3/2-2\alpha} Q_1^{5/2}\big) (\log h)^{3}.
\]

Since $z^{\alpha} \le X^{\alpha} \ll 1$ and
\[
|a_1(h, d, m)|^{2} \le \tau(h)^{2} (\log hd)^{2}
\]
for all $h$, $d$ and $m$, the contribution coming from $|G_z|^{2}$ in (2.44) is bounded by

\[
O_\varepsilon\big(X (\log X)^{K_1} Q_1^{\varepsilon} (1 + Q_1^{5/2} z^{-1/2})\big) \tag{2.58}
\]
for some $K_1 > 0$; in fact, we may choose $K_1 = 11$.

A comparison of (2.49) and (2.50) shows that the analysis of the contribution coming from $|1 - LM_z|^{2}$ in (2.44) can be performed in almost exactly the same way and the same bound is obtained. Thus we record that the whole first term on the right side of (2.44) can be bounded by (2.58).

Moving on to the second line of (2.44), each summand in the integrand is again analysed separately. The contribution coming from the integrand 1 follows directly from (2.10):
\[
X^{1/2} \int_{(1/2)} \sum_{\chi \in \widehat{H}_1([Q_1]'')} 1 \cdot |s|^{-(k+1)}\, |ds| \ll X^{1/2}\, |\widehat{H}_1([Q_1]'')| \ll X^{1/2} Q_1^{3/2} (\log Q_1). \tag{2.59}
\]

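Next, $F_z$ and $M_z$ are bounded in the same way as $G_z$ but with appeal to (2.52) (with $M = z$) instead of (2.51) and (2.54) instead of (2.53). In fact, with $a_1(h, d, m)$ given by (2.57), we find
\begin{align*}
X^{1/2} \int_{(1/2)} \sum_{\chi \in \widehat{H}_1([Q_1]'')} |F_z(s, \lambda_\chi)|^{2}\, |s|^{-(k+1)}\, |ds|
&\ll_\varepsilon X^{1/2} (\log z)^{2} Q_1^{\varepsilon} \sum_{m, d \le z} (md)^{-1} \sum_{h \le z/(m^2 d^2)} |a_1(h, d, m)|^{2} \big(1 + h^{-1/2} Q_1^{5/2}\big) (\log h)^{3} \tag{2.60} \\
&\ll X^{1/2} (\log X)^{K_2} Q_1^{\varepsilon} \big(z + Q_1^{5/2} z^{1/2}\big)
\end{align*}
for some $K_2 > 0$; we may choose $K_2 = 10$. Similarly,
\begin{align*}
X^{1/2} \int_{(1/2)} \sum_{\chi \in \widehat{H}_1([Q_1]'')} |M_z(s, \lambda_\chi)|^{2}\, |s|^{-(k+1)}\, |ds|
&\ll_\varepsilon X^{1/2} Q_1^{\varepsilon} (\log z) \sum_{m \le z} m^{-1} \sum_{\ell \le z/m^2} |\mu(\ell)\mu(\ell m)|^{2} \big(1 + \ell^{-1/2} Q_1^{5/2}\big) (\log \ell)^{3} \tag{2.61} \\
&\ll X^{1/2} (\log X)^{K_2} Q_1^{\varepsilon} \big(z + Q_1^{5/2} z^{1/2}\big).
\end{align*}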

The integrand $|F_z M_z|^{2}$ requires a little bit more work, but the approach is familiar by now: By (2.47), (2.48), (2.52) and (2.54), we have

\[
\big| F_z\big(\tfrac{1}{2} + it, \lambda_\chi\big) M_z\big(\tfrac{1}{2} + it, \lambda_\chi\big) \big|^{2}
\ll (\log z)^{3} \sum_{m, w, d \le z} (mwd)^{-1} \Big| \sum_{v \le z/w^2} \sum_{b \le z/(md)^2} a_2(b, d, m, v, w)\,\lambda_\chi(b)\lambda_\chi(v)\,(bv)^{-(1/2+it)} \Big|^{2},
\]
where $a_2(b, d, m, v, w) = \mu(v)\,|\mu(vw)|\,a_1(b, d, m)$. By (2.54) and (2.52), we then get

\[
\big| F_z\big(\tfrac{1}{2} + it, \lambda_\chi\big) M_z\big(\tfrac{1}{2} + it, \lambda_\chi\big) \big|^{2}
\ll (\log z)^{4} \sum_{m, w, d, r \le z} (mwdr)^{-1} \Big| \sum_{h \ge 1} a_3(h, r, d, m, w)\,\lambda_\chi(h)\,h^{-(1/2+it)} \Big|^{2},
\]
where
\[
a_3(h, r, d, m, w) = \sum_{v' \le z/(w^2 r)} \sum_{\substack{b' \le z/((md)^2 r) \\ v'b' = h}} a_2(b'r, d, m, v'r, w),
\]
whose absolute value is

\[
|a_3(h, r, d, m, w)| \le \sum_{v' \le z/(w^2 r)} \sum_{\substack{b' \le z/((md)^2 r) \\ v'b' = h}} \tau(b'r) (\log b'rd) \ll (\log z)\,\tau(r)\,\tau_3(h),
\]
where $\tau_3(h)$ is the ternary divisor function (i.e., the number of ordered 3-tuples $(b_1, b_2, b_3)$ of positive integers such that $h = b_1 b_2 b_3$). By Corollary 2.11 and the bound

\[
\sum_{h \le z^2/(mwdr)^2} \tau_3(h)^{2} \ll \frac{z^{2} (\log z)^{8}}{(mwdr)^{2}},
\]
which follows by the method of [Kow04, p. 37], for example, we obtain
\[
X^{1/2} \int_{(1/2)} \sum_{\chi \in \widehat{H}_1([Q_1]'')} |F_z(s, \lambda_\chi) M_z(s, \lambda_\chi)|^{2}\, |s|^{-(k+1)}\, |ds| \ll_\varepsilon X^{1/2} (\log X)^{K_3} Q_1^{\varepsilon} \big(z^{2} + Q_1^{5/2} z\big) \tag{2.62}
\]
for some $K_3 > 0$; we may choose $K_3 = 17$. We gather the bounds (2.59), (2.60), (2.61), (2.62) and record that the contribution to the right side of (2.44) coming from the second line is

\[
O_\varepsilon\big(X^{1/2} (\log X)^{K_3} Q_1^{\varepsilon} (z^{2} + Q_1^{5/2} z)\big). \tag{2.63}
\]
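The ternary divisor function $\tau_3$ that enters the bound for $|F_z M_z|^2$ above is easy to compute directly from its definition. The following minimal sketch counts ordered triples by brute force and also evaluates the mean square $\sum_{h \le x} \tau_3(h)^2$; the function names are illustrative, and no attempt is made to reproduce the implied constant in the $(\log z)^8$ bound.

```python
def tau3(h):
    """Number of ordered triples (b1, b2, b3) of positive integers with b1*b2*b3 == h."""
    count = 0
    for b1 in range(1, h + 1):
        if h % b1:
            continue
        rest = h // b1
        # each divisor b2 of rest determines b3 = rest // b2
        for b2 in range(1, rest + 1):
            if rest % b2 == 0:
                count += 1
    return count


def tau3_mean_square(x):
    """Sum of tau3(h)^2 over 1 <= h <= x."""
    return sum(tau3(h) ** 2 for h in range(1, x + 1))


print([tau3(h) for h in range(1, 13)])
print(tau3_mean_square(200))
```

At a prime $p$ one has $\tau_3(p) = 3$, so $\tau_3(p)^2 = 9$; this average value 9 over primes is the heuristic reason for the exponent $8 = 9 - 1$ of the logarithm in the mean square bound.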

It remains to bound the third term on the right side of (2.44). We could proceed as in [Bom87], using the bound $\sum_{n \le N} \lambda_\chi(n) \ll_\varepsilon (|q|^{2} N)^{1/2+\varepsilon}$ that holds for Fourier coefficients of weight-one cusp forms and therefore for our coefficients $\lambda_\chi$ as they arise from complex class group characters here (see Proposition 5 in [HM06], for example). However, in our case it is sufficient and easier to use the convexity bound for the functions $L(s, \lambda_\chi)$: Each of them satisfies a functional equation of the form

\[
\Phi(s, \lambda_\chi) = \Phi(1 - s, \lambda_\chi), \tag{2.64}
\]
where
\[
\Phi(s, \lambda_\chi) = \Big(\frac{\sqrt{|q|}}{2\pi}\Big)^{s} \Gamma(s)\, L(s, \lambda_\chi);
\]
see [IK04, §22.3], for example. Therefore, the convexity principle of Phragmén–Lindelöf (see [Gol06, Theorem 8.2.3], for example) yields

\[
L\big(\tfrac{1}{2} + it, \lambda_\chi\big) \ll_\varepsilon |q|^{1/2} (1 + |t|)^{1/2+\varepsilon}
\]
for all $t \in \mathbb{R}$. Combining the convexity principle for $L(s, \lambda_\chi)$ and Cauchy's inequality for the derivative of analytic functions (consider the disc around $\tfrac{1}{2} + it$ with radius $(\log Q_1)^{-1}$; see [Rud87, Theorem 10.26], for example), we also get

\[
L'(s, \lambda_\chi) \ll_\varepsilon |q|^{1/4+\varepsilon} (1 + |t|)^{1/2+\varepsilon+(\log Q_1)^{-1}} (\log Q_1).
\]

These bounds and (2.10) yield
\[
\int_{(1/2)} \sum_{\chi \in \widehat{H}_1([Q_1]'')} \big(|L(s, \lambda_\chi)|^{2} + |L'(s, \lambda_\chi)|^{2}\big)\, |s|^{-(k+1)}\, |ds| \ll_\varepsilon (\log Q_1)^{3} Q_1^{2+\varepsilon} \tag{2.65}
\]
for $k \ge 2$.

Remark. Duke, Friedlander and Iwaniec [DFI02, Theorem 2.6] proved the first subconvexity bound for the L-functions associated to complex class group characters for all fundamental discriminants (they had previously proved such a bound for special types of discriminants). Subsequently, a simpler proof – and a slightly better bound – was found by Blomer, Harcos and Michel [BHM07, Corollary 1]. As is clear from the theorem numbering of these results, these are only special cases of subconvexity bounds for much more general L-functions (and any such list of results would be incomplete without the general $GL_2$-result by Michel and Venkatesh [MV10]). The convexity bound is more than enough for our needs and any invocation of these deep results would be pretentious here.

Let $K = A + 2 + K_4$ for some $K_4 \ge \max(K_1, K_2, K_3, 3)$; thus, $K = A + 19$ is admissible, for example. We put together the upper bounds (2.58), (2.63) and (2.65) that we have found for the three summands in (2.44), insert them into (2.42) and get

\[
E'_k(Q, X) \ll_\varepsilon (\log X)^{K-A} \max_{Q_0 \le Q_1 \le Q} Q_1^{-1/2+\varepsilon} X^{1/2} \Big( X^{1/2} \big(1 + Q_1^{5/2} z^{-1/2}\big) + \big(z^{2} + Q_1^{5/2} z\big) + Q_1^{2} \Big). \tag{2.66}
\]

Now we set
\[
D := \max(D_1, 4K) \tag{2.67}
\]
in Remark 2.23. We can assume without loss of generality that $\varepsilon \le \tfrac{1}{4}$. If $Q > Q_0 = (\log X)^{D}$, then $Q_1^{1/2-\varepsilon}$ is therefore at least $(\log X)^{K}$ and if we choose $z = Q_1^{4-\nu+2\varepsilon} (\log X)^{2K}$, we get

\[
E'_k(Q, X) \ll_\varepsilon (\log X)^{-A} Q^{\nu/2} \big( X + (\log X)^{5K} X^{1/2} Q^{(15-5\nu)/2+5\varepsilon} \big).
\]

This gives
\[
E'_k(Q, X) \ll_\varepsilon Q^{\nu/2} X (\log X)^{-A}
\]
if $Q^{15-5\nu+10\varepsilon} \le X(\log X)^{-B}$ where $B = 10K$; that is, we may choose

\[
B = 10A + 190. \tag{2.68}
\]
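The effect of the choice of $z$ can be traced term by term. The following is a sketch of that bookkeeping, assuming (2.66) consists of the five terms $X^{1/2}$, $X^{1/2} Q_1^{5/2} z^{-1/2}$, $z^2$, $Q_1^{5/2} z$ and $Q_1^2$ inside the bracket, that $0 \le \nu \le 1$, and that $(\log X)^{K} \le Q_1^{1/2-\varepsilon}$; each line below includes the common prefactor $(\log X)^{K-A} Q_1^{-1/2+\varepsilon} X^{1/2}$.

```latex
% z = Q_1^{4-\nu+2\varepsilon} (\log X)^{2K}
\begin{align*}
X^{1/2}:\quad
  & (\log X)^{K-A}\, Q_1^{-1/2+\varepsilon}\, X \;\le\; X (\log X)^{-A}, \\
X^{1/2} Q_1^{5/2} z^{-1/2}:\quad
  & (\log X)^{K-A}\, Q_1^{2+\varepsilon}\, X \cdot Q_1^{-2+\nu/2-\varepsilon} (\log X)^{-K}
    \;=\; Q_1^{\nu/2}\, X (\log X)^{-A}, \\
z^{2}:\quad
  & (\log X)^{K-A}\, Q_1^{-1/2+\varepsilon}\, X^{1/2} \cdot Q_1^{8-2\nu+4\varepsilon} (\log X)^{4K}
    \;=\; (\log X)^{5K-A}\, Q_1^{\nu/2}\, X^{1/2}\, Q_1^{(15-5\nu)/2+5\varepsilon}, \\
Q_1^{5/2} z:\quad
  & (\log X)^{3K-A}\, X^{1/2}\, Q_1^{6-\nu+3\varepsilon}
    \;\le\; \text{the } z^{2} \text{ term, since } 6-\nu \le \tfrac{15}{2} - 2\nu, \\
Q_1^{2}:\quad
  & (\log X)^{K-A}\, X^{1/2}\, Q_1^{3/2+\varepsilon}
    \;\le\; \text{the } z^{2} \text{ term.}
\end{align*}
```

The $z^2$ term is the bottleneck: it is at most $Q^{\nu/2} X (\log X)^{-A}$ exactly when $Q^{15-5\nu+10\varepsilon} \le X (\log X)^{-10K}$, which is where the exponent $15 - 5\nu$ and the value $B = 10K$ come from.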

Remark 2.26. If we assume the Lindelöf Hypothesis, we may use the conditional large sieve inequality (2.18) instead of (2.17). This leads to the bound

\[
E'_k(Q, X) \ll_\varepsilon (\log X)^{K-A} \max_{Q_0 \le Q_1 \le Q} Q_1^{-1/2+\varepsilon} X^{1/2} \Big( X^{1/2} \big(1 + Q_1^{3/2} z^{-1/2}\big) + \big(z^{2} + Q_1^{3/2} z\big) + Q_1^{2} \Big)
\]
instead of (2.66), and this yields

\[
E'_k(Q, X) \ll_\varepsilon Q^{\nu/2} X (\log X)^{-A}
\]

if either $\nu > \tfrac{1}{2} + 2\varepsilon$ and $Q^{6-3\nu+6\varepsilon} \le X(\log X)^{-6K}$, or $\nu \le \tfrac{1}{2} + 2\varepsilon$ and $Q^{7-5\nu+10\varepsilon} \le X(\log X)^{-10K}$.

2.3.4 Real class group characters

Before approaching the second sum $E''_k(Q, X)$ in (2.40), we note that each of the functions $\widetilde{\psi}(X; q, \chi)$ for real class group characters $\chi$ can be written as the sum of two Chebyshev functions for Dirichlet characters: If $q \in F$ and $\chi \in \widehat{H}(q)$ is a real class group character, it is given by

\[
\chi(\mathfrak{p}) = \chi_{d_1, d_2}(\mathfrak{p}) :=
\begin{cases}
\chi_{d_1}(N(\mathfrak{p})) & \text{if } \mathfrak{p} \nmid d_1, \\
\chi_{d_2}(N(\mathfrak{p})) & \text{if } \mathfrak{p} \nmid d_2,
\end{cases} \tag{2.69}
\]
for all prime ideals $\mathfrak{p} \in Z(q)$ and then defined, by extension, for all non-zero fractional ideals. Here $d_1$ and $d_2$ are two (positive or negative) fundamental discriminants with $d_1 d_2 = q$ and, as before, $\chi_d$ denotes the Kronecker symbol $\big(\tfrac{d}{\cdot}\big)$, which is the unique primitive real Dirichlet character modulo $|d|$ if $d \not\equiv 0 \pmod 8$ (which we assume in this chapter). On the other hand, every such factorization of $q$ gives a real class group character of $H(q)$. Note that the trivial class group character $\chi_0^{(q)}$ corresponds to the trivial factorization $q = 1 \cdot q$ (see [IK04, p. 510]).

By the Kronecker Factorization Formula [Iwa97, Theorem 12.7], the L-function associated to the real class group character $\chi_{d_1, d_2}$ factors as

\[
L(s, \lambda_{\chi_{d_1, d_2}}) = L(s, \chi_{d_1})\, L(s, \chi_{d_2}) \tag{2.70}
\]
into Dirichlet L-functions for all fundamental discriminants $d_1$ and $d_2$. Thus

\[
\frac{L'(s, \lambda_{\chi_{d_1, d_2}})}{L(s, \lambda_{\chi_{d_1, d_2}})} = \frac{L'(s, \chi_{d_1})}{L(s, \chi_{d_1})} + \frac{L'(s, \chi_{d_2})}{L(s, \chi_{d_2})}. \tag{2.71}
\]
Let $F(Q)$ denote the set of all (positive or negative) fundamental discriminants $d \ne 1$ with $|d| \le Q$ and set
\[
\psi_k(X; \chi_d) := \frac{1}{k!} \sum_{n \le X} \chi_d(n)\,\Lambda(n) \Big(\log\frac{X}{n}\Big)^{k} \tag{2.72}
\]
for every $d \in F(Q)$. Then we have

\[
\psi_k(X; \chi_d) = -\frac{1}{2\pi i} \int_{(c)} \frac{L'}{L}(s, \chi_d)\, X^{s} s^{-(k+1)}\, ds \tag{2.73}
\]
for each $c > 1$, by the method of [MV07, §5.1]. Therefore, (2.71) and (2.73) imply

\begin{align*}
E''_k(Q, X) &\le \sum_{d_1 \in F(Q)} \sum_{\substack{d_2 \in F(Q) \\ d_1 d_2 \in M''(Q)}} \frac{1}{h(d_1 d_2)} \big( |\psi_k(X; \chi_{d_1})| + |\psi_k(X; \chi_{d_2})| \big) \\
&= 2 \sum_{d_1 \in F(Q)} \sum_{\substack{d_2 \in F(Q) \\ d_1 d_2 \in M''(Q)}} \frac{1}{h(d_1 d_2)}\, |\psi_k(X; \chi_{d_1})|.
\end{align*}

The class number bound (2.33) for the discriminants in $M''(Q)$ yields
\[
E''_k(Q, X) \ll (\log Q) \sum_{d_1 \in F(Q)} \frac{1}{|d_1|^{1/2}}\, |\psi_k(X; \chi_{d_1})| \sum_{\substack{d_2 \in F(Q) \\ d_1 d_2 \in M''(Q)}} \frac{1}{|d_2|^{1/2}}.
\]

By the assumptions – regarding the divisor frequency $\nu$, which we defined in Definition 2.12 – for $M(Q)$ in Theorem 2.13 and for $\Pi(Q)$ in Theorem 2.14, the sum over $d_2$ has at most $Q^{\nu}$ terms. Hence
\[
E''_k(Q, X) \ll (\log Q) \sum_{d_1 \in F(Q)} \frac{1}{|d_1|^{1/2}}\, |\psi_k(X; \chi_{d_1})| \sum_{d_2 \le \min(Q^{\nu},\, Q/|d_1|)} \frac{1}{d_2^{1/2}},
\]
which implies, by dyadic decomposition,

\begin{align*}
E''_k(Q, X) &\ll (\log Q)^{2} Q^{\nu/2} \max_{Q_1 \le Q^{1-\nu}} Q_1^{-1/2} \sum_{d_1 \in F(Q_1)} |\psi_k(X; \chi_{d_1})| \\
&\quad + (\log Q)^{2} Q^{1/2} \max_{Q^{1-\nu} \le Q_1 \le Q} Q_1^{-1} \sum_{d_1 \in F(Q_1)} |\psi_k(X; \chi_{d_1})| \\
&= E''_{1;k}(Q, X) + E''_{2;k}(Q, X),
\end{align*}
say.

Note that we cannot profit here from the fact that $M''(Q)$ does not contain any small discriminants, which were already handled by means of Goldstein's generalization of the Siegel–Walfisz theorem. Instead, we may use the original Siegel–Walfisz theorem to handle the small discriminant divisors $d_1$ here. In fact, we have now basically reduced the problem to the analogous problem for Dirichlet characters, i.e. we are in a similar position as in the original Bombieri–Vinogradov theorem, the only differences being:

(a) The first term $E''_{1;k}(Q, X)$ above has the factor $Q_1^{-1/2}$ in front of the sum (coming from the class number estimate) instead of $Q_1^{-1}$ (coming from the Euler totient function estimate) in the classical case. This will lead to a smaller admissible $Q$ for $\nu < 1$.

(b) Our sums are only over real primitive characters modulo $|d_1|$ with $|d_1| \le Q$; by positivity, we can, of course, include the non-real primitive Dirichlet characters as well.

We proceed like in Section 2.3.3, but using the large sieve inequality for Dirichlet characters. We skip the explicit calculations as they are the same as in [Bom87] and obtain (compare the inequality at the bottom of page 62 and the top of page 63 in [Bom87])

\begin{align*}
\sum_{d_1 \in F(Q_1)} |\psi_k(X; \chi_{d_1})| &\ll X(\log X)^{4} + X(\log X)^{4} Q_1^{2} z^{-1} + X^{1/2} (\log X)^{6} z^{2} \\
&\quad + X^{1/2} (\log X)^{6} Q_1^{2} + X^{1/2} (\log X)^{2} Q_1^{4} z^{-2} =: G(X, Q_1, z).
\end{align*}

Here the variable $z$ denotes the ordinate at which we truncate the Dirichlet series of $L'/L$ in the corresponding Gallagher identity (compare (2.43)) and it will be chosen in a moment. We obtain

\[
E''_{1;k}(Q, X) \ll Q^{\nu/2} (\log X)^{2} \max_{Q_1 \le Q^{1-\nu}} Q_1^{-1/2}\, G(X, Q_1, z)
\]
and we want to bound the right-hand side with $Q^{\nu/2} X (\log X)^{-A}$. This can be achieved when we set $z = Q_1^{3/2} (\log X)^{6+A}$ if the maximum above is attained for

\[
(\log X)^{12+2A} \le Q_1 \le X^{1/5} (\log X)^{-8-6A/5}.
\]
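The two endpoints of this range can be read off by inserting the choice of $z$ into each term. The sketch below assumes that $G(X, Q_1, z)$ consists of the five terms $X(\log X)^4$, $X(\log X)^4 Q_1^2 z^{-1}$, $X^{1/2}(\log X)^6 z^2$, $X^{1/2}(\log X)^6 Q_1^2$ and $X^{1/2}(\log X)^2 Q_1^4 z^{-2}$, and multiplies each by the prefactor $(\log X)^2 Q_1^{-1/2}$.

```latex
% z = Q_1^{3/2} (\log X)^{6+A}
\begin{align*}
\text{(i)}\quad   & X (\log X)^{6} Q_1^{-1/2} \le X(\log X)^{-A}
                    \iff Q_1 \ge (\log X)^{12+2A}, \\
\text{(ii)}\quad  & X (\log X)^{6} Q_1^{3/2} \cdot Q_1^{-3/2} (\log X)^{-6-A} = X (\log X)^{-A}, \\
\text{(iii)}\quad & X^{1/2} (\log X)^{8} Q_1^{-1/2} \cdot Q_1^{3} (\log X)^{12+2A}
                    = X^{1/2} Q_1^{5/2} (\log X)^{20+2A} \le X (\log X)^{-A} \\
                  & \qquad \iff Q_1 \le X^{1/5} (\log X)^{-8-6A/5}, \\
\text{(iv)}\quad  & X^{1/2} (\log X)^{8} Q_1^{3/2} \le \text{(iii)}, \\
\text{(v)}\quad   & X^{1/2} (\log X)^{4} Q_1^{7/2} \cdot Q_1^{-3} (\log X)^{-12-2A}
                    = X^{1/2} Q_1^{1/2} (\log X)^{-8-2A} \le \text{(iii)}.
\end{align*}
```

Term (i) fixes the lower endpoint, term (iii) the upper endpoint, and $z$ was chosen so that term (ii) is exactly of the target size.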

If the maximum is attained for a smaller $Q_1$, we use the relation
\[
\psi_k(X; \chi) = \int_1^X \psi_{k-1}(t; \chi)\, \frac{dt}{t} \tag{2.74}
\]
and the Siegel–Walfisz theorem 2.2, in the form (see [MV07, Corollary 11.18], for example)
\[
\psi_0(X; \chi) \ll_A X e^{-c\sqrt{\log X}}, \tag{2.75}
\]

which holds with some absolute positive constant $c$ for all $q \le (\log X)^{12+2A}$ and all non-principal Dirichlet characters $\chi$ modulo $q$, to get the desired bound. Altogether, we thus have

\[
E''_{1;k}(Q, X) \ll_A Q^{\nu/2} X (\log X)^{-A} \tag{2.76}
\]

if $Q^{5-5\nu} \le X(\log X)^{-B}$ for some $B = B(A) > 0$. Similarly,

\[
E''_{2;k}(Q, X) \ll Q^{1/2} (\log X)^{2} \max_{Q^{1-\nu} \le Q_1 \le Q} Q_1^{-1}\, G(X, Q_1, z)
\]

is bounded by $Q^{\nu/2} X (\log X)^{-A}$ if we set $z = Q_1 Q^{1/2-\nu/2} (\log X)^{6+A}$ and if the maximum is attained for

\[
(\log X)^{12+2A} \le Q_1 \quad \text{and} \quad Q \le X^{1/(5-3\nu)} (\log X)^{-(40+6A)/(5-3\nu)}.
\]
Together with the Siegel–Walfisz theorem this leads to the bound

\[
E''_{2;k}(Q, X) \ll_A Q^{\nu/2} X (\log X)^{-A} \tag{2.77}
\]

if $Q^{5-3\nu} \le X(\log X)^{-B}$ for some $B = B(A) > 0$. Since this range is shorter than the range for $E''_{1;k}(Q, X)$ in (2.76), we have

\[
E''_k(Q, X) \ll_A Q^{\nu/2} X (\log X)^{-A}
\]

if $Q^{5-3\nu} \le X(\log X)^{-B}$ for some $B = B(A) > 0$. Note that we may choose $B = 40 + 6A$, which is smaller than the $B$-value (2.68) that we have found at the end of Section 2.3.3.

Remark 2.27. It is not possible to achieve a larger range for the above bound on $E''_k(Q, X)$ on the assumption of the Lindelöf Hypothesis for Dirichlet characters because the corresponding large sieve inequality, Lemma 2.6, is already optimal. However, the conditional range for the bound on $E'_k(Q, X)$ (see Remark 2.26) is never larger than the unconditional range for the bound on $E''_k(Q, X)$, which is why the former range yields the ranges given in Theorem 2.17.

Remark 2.28. For $\nu < 1$, we could also employ Heath-Brown's large sieve inequality for real Dirichlet characters [HB95]: Let $(a_n)$ be a sequence of complex numbers and let $\varepsilon > 0$; then

\[
\sum_{\chi} \Big| \sum_{n \le X} a_n \chi(n) \Big|^{2} \ll_\varepsilon (QX)^{\varepsilon} (Q + X) \sum_{n_1 n_2 = \square} |a_{n_1} a_{n_2}|, \tag{2.78}
\]
where the outer sum on the left side is over real primitive Dirichlet characters modulo $q \le Q$ and the sum on the right side is over pairs of integers $n_1, n_2 \le X$ for which $n_1 n_2$ is a square. For $\nu < 1$, this inequality yields a larger range for the discriminants in this section, but it requires a more careful analysis due to the distinct form of the sum on the right side; since we are anyway limited by the much shorter range coming from $E'_k$, this gives no overall gain and therefore we will not delve into this. Note that (2.78) does not seem to be applicable for $\nu = 1$: The term $X^{1+\varepsilon}$ in (2.78) does not permit us to beat the bound (2.24) since our method can only compensate powers of $(\log X)$ when $\nu = 1$, but not a genuine $X^{\varepsilon}$.

2.3.5 Conclusion of the proofs

The proofs of Theorem 2.13 and Theorem 2.14 are now complete, but we recall the main steps as well as the bounds which we have found in the last three subsections and thus tie the loose ends:

We started by noting that, by transition to ideal classes, we have to bound $\sum_{q \in M(Q)} E_k(X; q)$ where $E_k$ is defined by (2.27). In Proposition 2.20 we proved that the contribution of exceptional discriminants is negligible or at most of the desired size

\[
O\big(Q^{\nu/2} X (\log X)^{-A}\big) \ \text{if } \nu > 0 \quad \text{and} \quad O\big(X (\log X)^{k+3}\big) \ \text{if } \nu = 0.
\]

Then we showed, by means of Lemma 2.22, that the contribution of small discriminants $q$ of size $|q| \le (\log Q)^{D}$ is also acceptable. The exponent $D$ was bounded below in Remark 2.23 and then defined in (2.67). Since it depends on the parameters $A$, $k$ and $\nu$, the implied constants in the bounds of Theorem 2.13 and Theorem 2.14 also depend on all these parameters; the dependence is ineffective because of the ineffectivity of Lemma 2.22.

Next, we demonstrated that the functions $\psi_k(Y; q, \chi)$ do not differ much from the $k$-th Riesz typical means $\widetilde{\psi}_k(Y; q, \chi)$ corresponding to the logarithmic derivatives of the L-functions which are associated to the ideal class group characters $\chi$ (see (2.38)) as long as we can guarantee that an additional term of size $Q X^{1/2} (\log X)^{k+3}$ is acceptable – which we can do, because we see in (2.79) below that $Q$ must be much smaller than $X$.

The contribution of large discriminants $q$ of size $|q| > (\log Q)^{D}$ was then estimated for complex and real class group characters separately: In Section 2.3.3 we showed that the contribution coming from complex class group characters is of the desired size if
\[
Q^{15-5\nu+\varepsilon} \le X (\log X)^{-B} \tag{2.79}
\]
for all $\varepsilon > 0$ and all $\nu > 0$; moreover, we may choose

\[
B = 10A + 190. \tag{2.80}
\]

The contribution depends (effectively) on the parameter $\varepsilon$ and therefore the bounds of Theorem 2.13 and Theorem 2.14 also depend on $\varepsilon$. (In Remark 2.26 we also found the admissible range which holds if we assume the Lindelöf Hypothesis.) Finally, in Section 2.3.4, we showed that the contribution coming from real class group characters is of the desired size if

\[
Q^{5-3\nu} \le X (\log X)^{-B}
\]
and we may choose $B = 6A + 40$. Since this range is larger than (2.79) and this value of $B$ is smaller than (2.80), the final admissible range and the final admissible value of $B$ for Theorem 2.13 and Theorem 2.14 are given by (2.79) and (2.80), respectively. This finishes both proofs.

2.3.6 Proofs of the Bombieri–Vinogradov type results

We close this section with the proofs of Theorem 2.15 and Theorem 2.16:

Proof of Theorem 2.15. Fix $A > 0$ and $\varepsilon > 0$. By (1.6), we have

\[
w(C, p^{\ell}) \le \ell + 1
\]
for all form classes $C$, all primes $p$ and all positive integers $\ell$. Moreover, (1.6) and (1.7) also yield
\[
\sum_{p \le Y} (\log p)\, w(C, p) = e(C) \sum_{\substack{p \le Y \\ p \in R(q, C)}} (\log p) + O\big((\log Y)(\log |q|)\big).
\]

Thus, for all $q \in F(Q)$, all $C \in K(q)$ and all $Y \le X$, we have
\[
\Big| \psi(Y; q, C) - \frac{Y}{e(C) h(q)} \Big| \le \Big| \psi_0(Y; q, C) - \frac{Y}{h(q)} \Big| + O\big(Y^{1/2} (\log Y)^{3} + (\log Y)(\log |q|)\big). \tag{2.81}
\]

Summing over $q \in F(Q)$, we see that the remainder term is negligible in Theorem 2.15.

The unsmoothing process from $\psi_k(Y; q, C)$ to $\psi_0(Y; q, C)$ is similar to the one for the original Chebyshev functions (see [Bom87, §7.4]); we include it for the sake of completeness: Since $\psi_k(X; q, C)$ is positive and increasing, the mean value theorem implies that
\[
\alpha^{-1} \int_{e^{-\alpha} X}^{X} \psi_k(t; q, C)\, \frac{dt}{t} \le \psi_k(X; q, C) \le \alpha^{-1} \int_{X}^{e^{\alpha} X} \psi_k(t; q, C)\, \frac{dt}{t}
\]
for every $\alpha > 0$. Combining these inequalities with
\[
\psi_k(X; q, C) = \int_1^X \psi_{k-1}(t; q, C)\, \frac{dt}{t}
\]
and the Taylor expansion of $e^{\alpha}$, we deduce that
\[
\max_{C \in K(q)} \max_{Y \le X} \Big| \psi_k(Y; q, C) - \frac{Y}{h(q)} \Big| \ll \alpha \frac{X}{h(q)} + \alpha^{-1} \max_{C \in K(q)} \max_{Y \le e^{\alpha} X} \Big| \psi_{k+1}(Y; q, C) - \frac{Y}{h(q)} \Big|.
\]
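The sandwich inequality behind the unsmoothing step can be observed numerically in the classical setting. The following is a minimal sketch that replaces the form-class functions $\psi_k(\cdot\,; q, C)$ by the ordinary Chebyshev function and its first Riesz mean (an illustrative substitution, not the functions of this chapter); the values $X = 1000$ and $\alpha = 0.1$ and all function names are assumptions. It uses that the integral of $\psi_0(t)\,dt/t$ over an interval is a difference of two values of $\psi_1$.

```python
from math import exp, factorial, log


def von_mangoldt_table(n_max):
    """Return a list lam with lam[n] equal to the von Mangoldt function at n."""
    lam = [0.0] * (n_max + 1)
    is_prime = [True] * (n_max + 1)
    for p in range(2, n_max + 1):
        if is_prime[p]:
            for multiple in range(2 * p, n_max + 1, p):
                is_prime[multiple] = False
            pk = p
            while pk <= n_max:  # mark all prime powers p, p^2, ...
                lam[pk] = log(p)
                pk *= p
    return lam


def psi(k, x, lam):
    """Classical Riesz mean (1/k!) * sum_{n <= x} Lambda(n) * log(x/n)^k."""
    total = 0.0
    for n in range(2, int(x) + 1):
        if lam[n]:
            total += lam[n] * log(x / n) ** k
    return total / factorial(k)


def sandwich(x, alpha):
    """Return (lower, psi_0(x), upper) for the unsmoothing inequality with k = 0."""
    lam = von_mangoldt_table(int(exp(alpha) * x) + 1)
    # psi_1(b) - psi_1(a) equals the integral of psi_0(t) dt/t over [a, b]
    lower = (psi(1, x, lam) - psi(1, exp(-alpha) * x, lam)) / alpha
    upper = (psi(1, exp(alpha) * x, lam) - psi(1, x, lam)) / alpha
    return lower, psi(0, x, lam), upper


lower, middle, upper = sandwich(1000.0, 0.1)
print(lower, middle, upper)
```

Shrinking `alpha` tightens the sandwich but amplifies any error in $\psi_1$ by the factor $\alpha^{-1}$; this trade-off is what the choice of $\alpha$ in terms of a power of $\log X$ balances.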

If we now set $\alpha = (\log X)^{-(A'+1)/2}$ and if

\[
\sum_{q \in F(Q)} \max_{C \in K(q)} \max_{Y \le X} \Big| \psi_2(Y; q, C) - \frac{Y}{h(q)} \Big| \ll Q^{1/2} X (\log X)^{-A'} \tag{2.82}
\]
holds for some $A' > 0$, then

\[
\sum_{q \in F(Q)} \max_{C \in K(q)} \max_{Y \le X} \Big| \psi_1(Y; q, C) - \frac{Y}{h(q)} \Big| \ll Q^{1/2} X (\log X)^{-(A'-1)/2} + \alpha \sum_{q \in F(Q)} \frac{X}{h(q)}.
\]

Repeating the procedure with $\alpha' = (\log X)^{-((A'-1)/2+1)/2}$ (which is larger than $\alpha$) we get

\[
\sum_{q \in F(Q)} \max_{C \in K(q)} \max_{Y \le X} \Big| \psi_0(Y; q, C) - \frac{Y}{h(q)} \Big| \ll Q^{1/2} X (\log X)^{-(A'-3)/4} + \alpha' \sum_{q \in F(Q)} \frac{X}{h(q)}. \tag{2.83}
\]
The second term on the right side is at most

\begin{align*}
X (\log X)^{-(A'+1)/4} \Big( \sum_{q \in F_{\mathrm{ex}}(Q)} 1 + \sum_{q \in F(Q) \setminus F_{\mathrm{ex}}(Q)} \frac{1}{h(q)} \Big)
&\ll X (\log X)^{-(A'+1)/4} \big( (\log\log Q) + Q^{1/2} (\log Q) \big) \tag{2.84} \\
&\ll Q^{1/2} X (\log X)^{-(A'-3)/4},
\end{align*}

which follows by using the bounds $|F_{\mathrm{ex}}(Q)| \ll \log\log Q$ and $h(q) \gg |q|^{1/2} (\log |q|)^{-1}$ for $q \in F(Q) \setminus F_{\mathrm{ex}}(Q)$, which we have found in Section 2.3.2. Therefore, Theorem 2.15 will follow from (2.81), (2.83) and (2.84) as soon as we prove the bound (2.82) for
\[
Q^{10+\varepsilon} \le X (\log X)^{-B} \tag{2.85}
\]
with $B = B(A') = 10A' + 190$ and then set $A' = 4A + 3$. We split the left side of (2.82) into

\begin{align*}
\sum_{q \in F(Q)} \max_{C \in K(q)} \max_{Y \le X} \Big| \psi_2(Y; q, C) - \frac{Y}{h(q)} \Big|
&\le \sum_{q \in F(Q)} \max_{C \in K(q)} \max_{Y \le X} \Big| \psi_2(Y; q, C) - \frac{1}{h(q)} \sum_{K \in K(q)} \psi_2(Y; q, K) \Big| \tag{2.86} \\
&\quad + \sum_{q \in F(Q)} \max_{Y \le X} \frac{\big| Y - \sum_{K \in K(q)} \psi_2(Y; q, K) \big|}{h(q)}.
\end{align*}

The first term on the right side of (2.86) is $\ll Q^{1/2} X (\log X)^{-A'}$ by Theorem 2.13 if (2.85) holds and $B = 10A' + 190$. As for the second term, we note that equation (1.8) yields

\[
\sum_{K \in K(q)} \sum_{p \le Y} (\log p) \Big(\log\frac{Y}{p}\Big)^{2} w(K, p) = \sum_{p \le Y} (\log p) \Big(\log\frac{Y}{p}\Big)^{2} (1 + \chi_q(p)).
\]

Thus
\begin{align*}
\Big| Y - \sum_{K \in K(q)} \psi_2(Y; q, K) \Big|
&\le \Big| Y - \frac{1}{2} \sum_{K \in K(q)} \sum_{p \le Y} (\log p) \Big(\log\frac{Y}{p}\Big)^{2} w(K, p) \Big| + O\big(Y^{1/2} (\log Y)^{3}\big) \\
&\le |Y - \psi_2(Y)| + |\psi_2(Y; \chi_q)| + O\big(Y^{1/2} (\log Y)^{3}\big),
\end{align*}
where $\psi_2(Y; \chi_q)$ was defined in (2.72) and where we have set $\psi_2(Y) := \psi_2(Y; 1)$. Summing over $q \in F(Q)$, we see that the remainder term is negligible in Theorem 2.15. By (2.74) and the Prime Number Theorem, we have

\[
Y - \psi_2(Y) \ll_D Y (\log Y)^{-D}
\]
for all $D > 0$. Thus, the bound

\[
\max_{Y \le X} \sum_{q \in F(Q)} \frac{|Y - \psi_2(Y)|}{h(q)} \ll Q^{1/2} X (\log X)^{-A'}
\]
follows from splitting the sum into exceptional and non-exceptional discriminants as in (2.84).

As for the term $\psi_2(Y; \chi_q)$ above, we first note that

\[
\max_{Y \le X} \sum_{q \in F_{\mathrm{ex}}(Q)} \frac{|\psi_2(Y; \chi_q)|}{h(q)} \ll (\log\log Q)\, X (\log X)^{2}
\]

is negligible if $Q$ is not too small, i.e. if $Q > (\log X)^{2A'+6}$; but if $Q$ is small, then

\[
\max_{Y \le X} \sum_{q \in F(Q)} \frac{|\psi_2(Y; \chi_q)|}{h(q)}
\]
is negligible by the Siegel–Walfisz theorem (see (2.75)). Thus, it remains to bound the sum over $q \in F(Q) \setminus F_{\mathrm{ex}}(Q)$ and this may be accomplished by means of the original Bombieri–Vinogradov theorem – or rather the underlying average character sum that we have used in Section 2.3.4 (compare the bound (2.77) for $E''_{2;k}$ with $\nu = 1$ and $k = 2$ there). Hence we also get

\[
\max_{Y \le X} \sum_{q \in F(Q)} \frac{|\psi_2(Y; \chi_q)|}{h(q)} \ll Q^{1/2} X (\log X)^{-A'}
\]

if $Q^{2} \le X (\log X)^{-B'}$ for some $B' = B'(A') > 0$ (which may be chosen as small as $B(A')$ above). In summary, the same bound holds for the second term on the right-hand side of (2.86) in the same range, which is larger than the range (2.85) for which we have bounded the first term. This finishes the proof of (2.82) in the range (2.85) with $B = 10A' + 190$ and therefore it also concludes the proof of Theorem 2.15.

Proof of Theorem 2.16. The assertion of this theorem follows easily by partial integration; we skip the proof since the corresponding proof for the original Bombieri–Vinogradov theorem is given in full detail in [Br95, pp. 201/2] and nothing new happens in our case. We just remark that it comes in handy at this point to have Theorem 2.15 with the feature "$\max_{Y \le X}$" – which we got for free in the preceding proofs, but which is a priori not obvious – because the integrals that appear in the transition from the Chebyshev function to the prime counting function may thus be estimated trivially.

2.4 The mean square distribution

In this section we consider the “variance” of the distribution of primes which can be represented by positive definite binary quadratic forms when the discriminant varies over bounded subsets of the set of negative fundamental discriminants and when we additionally average over all form classes in the corresponding form class groups. This can be viewed as an analogue of the Barban–Davenport–Halberstam theorem for primes in arithmetic progressions, Theorem 2.5. In fact, we show a more general mean square distribution result for arithmetic functions that satisfy Siegel–Walfisz conditions for both arithmetic progressions and form classes.

2.4.1 Statement of results

The representability of primes is well distributed over almost all form classes to almost all (negative fundamental) discriminants in long ranges:

Theorem 2.29. Let $F(Q)$ be the set of all negative fundamental discriminants $q \not\equiv 0 \pmod 8$ with $|q| \le Q$, let $A > 0$ be arbitrarily large and let $\varepsilon > 0$ be arbitrarily small. Then
\[
\sum_{q \in F(Q)} \sum_{C \in K(q)} \Big( \pi(X; q, C) - \frac{\operatorname{li}(X)}{e(C) h(q)} \Big)^{2} \ll_{A, \varepsilon} Q^{1/2} X^{2} (\log X)^{-A} \tag{2.87}
\]
if $Q^{3+\varepsilon} \le X (\log X)^{-2A-6}$.

In the Barban–Davenport–Halberstam theorem for arithmetic progressions, the prime counting function can be replaced by many other arithmetic functions $g$. Indeed, it suffices to show that $g$ is well distributed in arithmetic progressions to small moduli in order to prove that $g$ shows a similar behaviour for almost all residue classes to almost all large moduli; see [IK04, §17.4] or [Br95, §5.6], for example.

We encounter some difficulties when we try to generalize Theorem 2.29 accordingly, i.e. when we attempt to show that well-distribution (in a sense) of $g$ with respect to form classes to small discriminants implies that $g$ shows at most a small deviation from the expected "average behaviour" for almost all form classes to almost all large discriminants: First, we recall that every positive integer $n$ may usually be represented by forms from different form classes of a given discriminant (or by no forms at all), whereas $n$ always lies in exactly one residue class of a given modulus. Secondly, and very much related to the first point, we have seen that, for example, non-ambiguous classes represent about twice as many primes as ambiguous classes (the factor $e(C)$ accounts for this fact in (2.87)).

Regarding Proposition 1.6 as well as the subsequent remarks and discussion, it is reasonable to say (and we have already vaguely used that notion in this sense in Section 2.3) that a function $g$ is well distributed with respect to form classes when the weighted difference
\[
\sum_{n \le X} w(C, n) g(n) - \frac{1}{h(q)} \sum_{K \in K(q)} \sum_{n \le X} w(K, n) g(n) \tag{2.88}
\]
is small for all form classes $C$ of some discriminant $q \in F$; recall that the weights $w(C, n)$ are given by (1.4) or equivalently by (1.5). We will assume that this kind of well-distribution with respect to form classes holds uniformly for all "small" negative fundamental discriminants.
Moreover, as seen in the proofs of the preceding section, sums which involve real class group characters may be reduced to sums which involve Dirichlet characters. Thus, we will also require the function g to be well distributed in arithmetic progressions to all small moduli and the sums X X g(n) χ1(k)χ2(m) n6X 1

Theorem 2.30. Let $3 \le Q \le X$, let M(Q) be any subset of F(Q) and let g be an arithmetic function. Assume that

$$D(g; X; q, C) := \sum_{n \le X} w(C, n)\,g(n) - \frac{1}{h(q)} \sum_{K \in K(q)} \sum_{n \le X} w(K, n)\,g(n) \ll_L X^{1/2} (\log X)^{-L} \Bigl( \sum_{n \le X} |g(n)|^2 \Bigr)^{1/2} \tag{2.89}$$

for all L > 0, all q ∈ F(Q) and all form classes C ∈ K(q). Also assume that

!1/2 1 X g(n) − X g(n)  X1/2(log X)−L X |g(n)|2 (2.90) ϕ(q) L n6X n6X n6X n≡a (mod q) (n,q)=1 (n,q)=1 for all L > 0, all q ∈ F(Q) and all integers a with (a, q) = 1. Set 2 X X 1 X X R(g, Q, X) := g(n) χd1 (k)χd2 (m) , h(d1d2) |d1|>1 |d2|>1 n6X 1 0 and all arbitrarily small ε > 0. Remark 2.31. Assumption (2.89) is “non-trivial” only for |q| < (log X)8L+32 since

$$\sum_{n \le X} w(C, n)\,g(n) \ll \Bigl( \frac{X^{1/4}}{|q|^{1/8}} + X^{1/8} \Bigr) X^{1/4} (\log X)^4 \Bigl( \sum_{n \le X} |g(n)|^2 \Bigr)^{1/2}$$

by the Cauchy–Schwarz inequality and the easy bounds (1.14) and (2.8). Similarly, the second assumption (2.90) is non-trivial only for $|q| < (\log X)^{2L}$.

Theorem 2.29 follows easily from Theorem 2.30: Let g be the characteristic function of the primes. Assumption (2.90) holds by Remark 2.31, the Siegel–Walfisz theorem for arithmetic progressions (Theorem 2.2) and the Prime Number Theorem. As for assumption (2.89), we have

$$D(g; X; q, C) = \sum_{p \le X} w(C, p) - \frac{1}{h(q)} \sum_{K \in K(q)} \sum_{p \le X} w(K, p) = \pi(X; q, C)\,e(C) - \frac{1}{h(q)} \sum_{p \le X} (1 + \chi_q(p)) + O(\log |q|)$$

by (1.8). Assumption (2.89) now follows from Remark 2.31, the Siegel–Walfisz theorem in the form (2.75), the Prime Number Theorem and the Siegel–Walfisz theorem for binary quadratic forms (Theorem 2.1). The term R(g, Q, X) vanishes. Thus, from (2.91) we get

$$\sum_{q \in F(Q)} \sum_{C \in K(q)} \Bigl( \pi(X; q, C)\,e(C) - \frac{1}{h(q)} \sum_{p \le X} (1 + \chi_q(p)) \Bigr)^2 \ll_{A,\varepsilon} Q^{1/2} X^2 (\log X)^{-A} \tag{2.92}$$

if $Q^{3+\varepsilon} \le X(\log X)^{-2A-6}$. Similarly to the argument in Section 2.3.2, one shows that the contribution from exceptional discriminants to the left side of (2.92) is negligible (also compare the corresponding argument in the next subsection). Thus, we may assume the class number bound (2.33). Dyadic decomposition and the large sieve inequality for Dirichlet characters (Lemma 2.6) then yield

$$\sum_{q \in F(Q)} \frac{1}{h(q)} \Bigl( \sum_{p \le X} \chi_q(p) \Bigr)^2 \ll (\log Q)^2 (Q^{3/2} X + X^2). \tag{2.93}$$

Therefore, Theorem 2.29 follows from (2.92), (2.93) and the Prime Number Theorem if $Q > (\log X)^{2A+4}$. If Q is smaller, Theorem 2.29 follows directly from Theorem 2.1.

Remarks: (1) The term R(g, Q, X) clearly vanishes if the function g is supported on primes only or if the set M(Q) contains only prime discriminants, for example. Thus, we get a clean well-distribution result in these cases. It would be interesting to find other cases in which R(g, Q, X) is dominated by the first term on the right-hand side of (2.91).

(2) The proof of Theorem 2.30 will be very similar to what was done in the preceding section (and, of course, similar to the proof of the Barban–Davenport–Halberstam theorem for arithmetic progressions). In fact, most of the steps will be much simpler.
It will be clear from the proof that we could have also proved results for sets with divisor frequency ν < 1 as in Theorem 2.13 and Theorem 2.14.

(3) By the easy estimate (1.14) and the class number bound $|q|^{1/2} (\log |q|)^{-1} \ll h(q)$ (for non-exceptional discriminants), we have the “trivial” estimate

$$\sum_{q \in F(Q)} \sum_{C \in K(q)} \Bigl( \pi(X; q, C) - \frac{\operatorname{li}(X)}{e(C)\,h(q)} \Bigr)^2 \ll \sum_{q \in F(Q)} \sum_{C \in K(q)} \Bigl( \frac{X^2}{|q|} + \frac{X^2}{h(q)^2} \Bigr) \ll Q^{1/2} (\log Q) X^2.$$

Thus, we once again save an arbitrary power of log X over this estimate.

Assuming the Lindelöf Hypothesis as in Theorem 2.17, we can increase the admissible range for the fundamental discriminants in Theorem 2.29 almost up to Q ≈ X:

Theorem 2.32. Assume the Lindelöf Hypothesis for Rankin–Selberg convolutions of holomorphic cusp forms of weight one (or only assume the bound (2.16) for all pairs of complex class group characters). Then Theorem 2.29 holds for $Q^{1+\varepsilon} \le X(\log X)^{-2A-6}$.

See Remark 2.33 for the proof of this conditional result.

2.4.2 Preliminaries

First, we consider the contribution coming from the initial range of negative fundamental discriminants. Fix A > 0. Set $Q_0 = (\log X)^{L_0}$ for some $L_0 > 0$, which will be chosen later and which will depend on A only. By assumption (2.89) and the class number bound $h(q) \ll |q|^{1/2} (\log |q|)$, the contribution to the left-hand side of (2.91) coming from discriminants q with $|q| \le Q_0$ is

$$\ll_{L_1} (\log Q_0)\, Q_0^{3/2} X (\log X)^{-L_1} \sum_{n \le X} |g(n)|^2 \ll_{L_1} Q_0^{1/2} X (\log X)^{L_0 - L_1 + 1} \sum_{n \le X} |g(n)|^2$$

for each $L_1 > L_0$. This is dominated by the right-hand side of (2.91) if

$$L_0 - L_1 + 1 \le -A. \tag{2.94}$$

It remains to consider the large discriminants, i.e. all q in

$$M'(Q) := \{ q \in M(Q) : Q_0 < |q| \le Q \}$$

and we may assume from now on that $Q > Q_0$. By the definition (1.4) of the weights w(C, n), we have

$$D(g; X; q, C) = \sum_{\substack{a \in B_q(C) \cap Z(q) \\ N(a) \le X}} g(N(a)) - \frac{1}{h(q)} \sum_{\substack{a \in Z(q) \\ N(a) \le X}} g(N(a)).$$

For every q ∈ F and every $\chi \in \widehat{H}(q)$, we set

$$G(X; \chi, q) := \sum_{\substack{a \in Z(q) \\ N(a) \le X}} g(N(a))\,\chi(a) = \sum_{n \le X} g(n)\,\lambda_\chi(n).$$

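By the orthogonality property of ideal class group characters, we may rewrite D(g; X; q, C) as

$$D(g; X; q, C) = \sum_{\substack{a \in Z(q) \\ N(a) \le X}} g(N(a)) \Bigl( \frac{1}{h(q)} \sum_{\chi \in \widehat{H}(q)} \overline{\chi}(B_q(C))\,\chi(a) \Bigr) - \frac{1}{h(q)} \sum_{\substack{a \in Z(q) \\ N(a) \le X}} g(N(a)) = \frac{1}{h(q)} \sum_{\chi \in \widehat{H}(q) \smallsetminus \{\chi_0^{(q)}\}} \overline{\chi}(B_q(C))\,G(X; \chi, q).$$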

Moreover, orthogonality also yields

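$$\sum_{C \in H(q)} \Bigl| \sum_{\chi \in \widehat{H}(q) \smallsetminus \{\chi_0^{(q)}\}} \overline{\chi}(C)\,G(X; \chi, q) \Bigr|^2 = \sum_{\chi_1, \chi_2 \in \widehat{H}(q) \smallsetminus \{\chi_0^{(q)}\}} G(X; \chi_1, q)\,\overline{G(X; \chi_2, q)} \Bigl( \sum_{C \in H(q)} \overline{\chi_1}(C)\,\chi_2(C) \Bigr) = h(q) \sum_{\chi \in \widehat{H}(q) \smallsetminus \{\chi_0^{(q)}\}} |G(X; \chi, q)|^2.$$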
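The orthogonality identity above is a statement about characters of a finite abelian group and can be checked numerically. The following Python sketch is an illustration only: it assumes, for simplicity, a cyclic class group Z/hZ (class groups need not be cyclic), the function names are ours, and the values G(X; χ, q) are replaced by arbitrary complex numbers.

```python
import cmath
import random

def character(k, c, h):
    """Value at the class c of the k-th character of the cyclic group Z/hZ."""
    return cmath.exp(2 * cmath.pi * 1j * k * c / h)

def both_sides(G, h):
    """Left and right side of the identity for values G[k], k = 1, ..., h - 1.

    G maps the index k of each non-trivial character to a complex number that
    plays the role of G(X; chi, q).
    """
    lhs = sum(
        abs(sum(character(k, c, h).conjugate() * G[k] for k in G)) ** 2
        for c in range(h)
    )
    rhs = h * sum(abs(G[k]) ** 2 for k in G)
    return lhs, rhs

if __name__ == "__main__":
    random.seed(1)
    h = 7
    G = {k: complex(random.uniform(-1, 1), random.uniform(-1, 1)) for k in range(1, h)}
    print(both_sides(G, h))   # the two numbers agree up to rounding
```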

Thus, the contribution from large discriminants to the left-hand side of (2.91) is

$$\sum_{q \in M'(Q)} \sum_{C \in H(q)} \bigl| D(g; X; q, B_q^{-1}(C)) \bigr|^2 = \sum_{q \in M'(Q)} \frac{1}{(h(q))^2} \sum_{C \in H(q)} \Bigl| \sum_{\chi \in \widehat{H}(q) \smallsetminus \{\chi_0^{(q)}\}} \overline{\chi}(C)\,G(X; \chi, q) \Bigr|^2 = \sum_{q \in M'(Q)} \frac{1}{h(q)} \sum_{\chi \in \widehat{H}(q) \smallsetminus \{\chi_0^{(q)}\}} |G(X; \chi, q)|^2.$$

The contribution coming from exceptional discriminants is again negligible if Q is not very small. Indeed, by the bound $|F_{\mathrm{ex}}(Q)| \ll \log\log Q$ (see the proof of Proposition 2.20) for the set of exceptional fundamental discriminants q ∈ F(Q), the Cauchy–Schwarz inequality and the bound (2.8), we have

$$\sum_{q \in F_{\mathrm{ex}}(Q)} \frac{1}{h(q)} \sum_{\chi \in \widehat{H}(q) \smallsetminus \{\chi_0^{(q)}\}} |G(X; \chi, q)|^2 \ll X (\log X)^4 \sum_{n \le X} |g(n)|^2.$$

In particular, the contribution to the left-hand side of (2.91) coming from exceptional discriminants is negligible if $Q > (\log X)^{2A+8}$. This means that we must choose at least

$$L_0 > 2A + 8 \tag{2.95}$$

above. Therefore it remains to estimate the contribution from non-exceptional discriminants, i.e. we have to bound

$$\sum_{q \in M''(Q)} \frac{1}{h(q)} \sum_{\substack{\chi \in \widehat{H}(q) \\ \chi^2 \ne \chi_0^{(q)}}} |G(X; \chi, q)|^2 + \sum_{q \in M''(Q)} \frac{1}{h(q)} \sum_{\substack{\chi \in \widehat{H}(q) \smallsetminus \{\chi_0^{(q)}\} \\ \chi^2 = \chi_0^{(q)}}} |G(X; \chi, q)|^2, \tag{2.96}$$

where $M''(Q) = M'(Q) \smallsetminus F_{\mathrm{ex}}(Q)$.

2.4.3 Complex class group characters

The lower class number bound (2.33), dyadic decomposition and the large sieve inequality for complex class group characters (Lemma 2.7) together imply that the first sum in (2.96) is bounded above by

$$\begin{aligned}
&(\log Q)^2 \max_{Q_0 \le Q_1 \le Q} Q_1^{-1/2} \sum_{q \in M''(Q_1)} \sum_{\substack{\chi \in \widehat{H}(q) \\ \chi^2 \ne \chi_0^{(q)}}} |G(X; \chi, q)|^2 \\
&\quad \ll_\varepsilon (\log Q)^2 \max_{Q_0 \le Q_1 \le Q} Q_1^{-1/2} \Bigl( X (\log X)^3 + X^{1/2} (\log X)\, Q_1^{5/2+\varepsilon} \Bigr) \sum_{n \le X} |g(n)|^2 \\
&\quad \ll Q^{1/2} X^{1/2} \Bigl( X^{1/2} (\log X)^5 Q^{-1/2} Q_0^{-1/2} + (\log X)^3 Q^{3/2+\varepsilon} \Bigr) \sum_{n \le X} |g(n)|^2
\end{aligned} \tag{2.97}$$

for every ε > 0. This is dominated by the right-hand side of (2.91) if $Q > (\log X)^{2A+10-L_0}$, which is certainly satisfied if the above-mentioned condition $L_0 > 2A + 8$ holds.

2.4.4 Real class group characters

As in Section 2.3.4, the second sum in (2.96) is handled by reducing it to a sum over real Dirichlet characters. If q ∈ F and $\chi \in \widehat{H}(q)$ is a real non-trivial class group character, then the Kronecker Factorization Formula (2.70) implies that $\lambda_\chi(n)$ is the Dirichlet convolution

$$\lambda_\chi(n) = (\chi_{d_1} * \chi_{d_2})(n) \tag{2.98}$$

of two primitive real Dirichlet characters modulo the absolute values of non-trivial fundamental discriminants $d_1$ and $d_2$ with $d_1 d_2 = q$. Thus, if $\chi \in \widehat{H}(q)$ is non-trivial and real, then

$$G(X; \chi, q) = \sum_{n \le X} g(n) \sum_{km = n} \chi_{d_1}(k)\,\chi_{d_2}(m)$$

for some fundamental discriminants $d_1$ and $d_2$ with $d_1 d_2 = q$ and $|d_1|, |d_2| > 1$. Moreover, each such pair of discriminants induces one of the non-trivial real class group characters in $\widehat{H}(q)$. Let F (Q) denote the set of all fundamental discriminants d with $1 < |d| \le Q$ (as in Section 2.3.4). The second sum in (2.96) can thus be bounded as follows:

$$\begin{aligned}
\sum_{q \in M''(Q)} \frac{1}{h(q)} \sum_{\substack{\chi \in \widehat{H}(q) \smallsetminus \{\chi_0\} \\ \chi^2 = \chi_0}} |G(X; \chi, q)|^2
&= \sum_{d_1 \in F(Q)} \sum_{\substack{d_2 \in F(Q) \\ d_1 d_2 \in M''(Q)}} \frac{1}{h(d_1 d_2)} \Bigl| \sum_{n \le X} g(n) \sum_{\substack{1 \le k, m \le n \\ km = n}} \chi_{d_1}(k)\,\chi_{d_2}(m) \Bigr|^2 \\
&\ll (\log Q) \sum_{d_1 \in F(Q)} \sum_{\substack{d_2 \in F(Q) \\ d_1 d_2 \in M''(Q)}} \frac{1}{|d_1 d_2|^{1/2}} \Bigl| \sum_{n \le X} g(n) \bigl( \chi_{d_1}(n) + \chi_{d_2}(n) \bigr) \Bigr|^2 + R(g, Q, X) \\
&\ll (\log Q) \sum_{d_1 \in F(Q)} \frac{1}{|d_1|^{1/2}} \Bigl| \sum_{n \le X} g(n)\,\chi_{d_1}(n) \Bigr|^2 \sum_{d_2 \le Q/|d_1|} \frac{1}{d_2^{1/2}} + R(g, Q, X)
\end{aligned} \tag{2.99}$$

 S1(Q, X) + S2(Q, X) + R(g, Q, X), where 2 1/2 X X S1(Q, X) = Q0 (log Q) g(n)χd(n) d∈F (Q0) n6X and 2 1/2 2 −1 X X S2(Q, X) = Q (log Q) max Q1 g(n)χd(n) Q06Q16Q d∈F (Q1) n6X and R(g, Q, X) was defined in Theorem 2.30. By positivity and orthogonality, we have 2 1/2 X X X S1(Q, X) 6 Q0 (log X) g(n)χ(n) 1

$$S_1(Q, X) \ll_{L_2} Q_0^{1/2} X (\log X)^{-L_2 + 1 + 3L_0} \sum_{n \le X} |g(n)|^2$$

for all L2 > L0. Hence, S1(Q, X) is dominated by the right side of (2.91) if

$$-L_2 + 1 + 3L_0 \le -A. \tag{2.100}$$

Finally, we use the large sieve inequality for Dirichlet characters, Lemma 2.6, to bound $S_2(Q, X)$. We get

$$S_2(Q, X) \ll_{L_0} Q^{1/2} (\log X)^2 (Q + X Q_0^{-1}) \sum_{n \le X} |g(n)|^2.$$

This is dominated by the right side of (2.91) if $Q + X Q_0^{-1} \le Q^{3/2+\varepsilon} (\log X) + X (\log X)^{-A-2}$, which is certainly true if the above-mentioned condition $L_0 > 2A + 8$ holds.

By (2.95), (2.94) and (2.100) we also see that all implied constants above that depend on $L_0$, $L_1$ or $L_2$ can be made dependent on A only if we choose $L_0 = 2A + 8$, $L_1 = A + L_0 + 1$ and $L_2 = A + 3L_0 + 1$, for example. This concludes the proof of Theorem 2.30.
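The Kronecker factorization (2.98), on which the treatment of the real characters rests, can be tested numerically for a small discriminant. The sketch below is an illustration under assumptions that are ours and not part of the proof: we take q = −15 = (−3) · 5, whose class group has order 2 with reduced forms x² + xy + 4y² (principal) and 2x² + xy + 2y², we hard-code the two quadratic characters, and we use the standard fact that for q < −4 the number of ideals of norm n in a class is half the number of representations of n by the corresponding form. The function names are hypothetical.

```python
from math import isqrt

def chi_m3(n):
    """Primitive real character modulo 3 (fundamental discriminant -3)."""
    return (0, 1, -1)[n % 3]

def chi_5(n):
    """Primitive real character modulo 5 (fundamental discriminant 5)."""
    return (0, 1, -1, -1, 1)[n % 5]

def convolution(n):
    """(chi_{-3} * chi_5)(n), the sum over km = n of chi_{-3}(k) chi_5(m)."""
    return sum(chi_m3(k) * chi_5(n // k) for k in range(1, n + 1) if n % k == 0)

def representations(form, n):
    """Number of integer pairs (x, y) with a x^2 + b x y + c y^2 = n."""
    a, b, c = form
    abs_d = 4 * a * c - b * b
    xmax = isqrt(4 * c * n // abs_d)   # 4c f = (2cy + bx)^2 + |D| x^2
    ymax = isqrt(4 * a * n // abs_d)   # 4a f = (2ax + by)^2 + |D| y^2
    return sum(
        1
        for x in range(-xmax, xmax + 1)
        for y in range(-ymax, ymax + 1)
        if a * x * x + b * x * y + c * y * y == n
    )

def lambda_chi(n):
    """Ideals of norm n in the principal class minus those in the other class."""
    principal = representations((1, 1, 4), n)
    other = representations((2, 1, 2), n)
    return (principal - other) // 2    # the units +1 and -1 double every count

if __name__ == "__main__":
    for n in range(1, 16):
        print(n, lambda_chi(n), convolution(n))
```

For instance, for n = 2 both columns show −2: the prime 2 splits into two prime ideals of the non-principal class, and χ₋₃(1)χ₅(2) + χ₋₃(2)χ₅(1) = −1 − 1.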

Remark 2.33. If we assume the Lindelöf Hypothesis, we may use the conditional large sieve inequality of Lemma 2.10 instead of Lemma 2.7. Thus, we may then replace the term $Q^{5/2+\varepsilon}$ in the second line of (2.97) by $Q^{3/2+\varepsilon}$; the term $Q^{3/2+\varepsilon}$ in the last line of (2.97) and in (2.91) may therefore be replaced by $Q^{1/2+\varepsilon}$. Thus, (2.87) holds if $Q^{1+\varepsilon} \le X(\log X)^{-2A-6}$, which concludes the proof of Theorem 2.32.

2.5 Applications and open questions

Our results of Bombieri–Vinogradov type and of Barban–Davenport–Halberstam type give equidistribution results for primes represented by binary quadratic forms in smaller ranges than the corresponding original theorems for arithmetic progressions. This will probably somewhat limit the scope of their applications. We only give two different types of consequences which can be derived irrespective of the size of the admissible ranges in the theorems of Sections 2.3 and 2.4 and which are analogues of similar known results for primes in arithmetic progressions. It would be extremely interesting to find applications which genuinely require mean-value results on primes that are represented by binary quadratic forms and which are more than mere adaptations of applications of the mean-value results for primes in arithmetic progressions. We will close this section by mentioning possible extensions and generalizations of our mean-value results.

2.5.1 The least prime represented by binary quadratic forms

The most natural question about primes of a specific shape is certainly whether infinitely many primes of this shape exist; this was answered by Dirichlet’s theorem for primes in arithmetic progressions and by Weber’s theorem for primes represented by binary quadratic forms (Theorem 1.1). However, the question of the size of the least prime of a specific form is probably a close second. Both the Bombieri–Vinogradov theorem and the Barban–Davenport–Halberstam theorem provide information on this question for primes in arithmetic progressions.

Linnik [Lin44] proved that there exist absolute, effectively computable constants c and L such that p(q; a), the least prime p ≡ a (mod q) with (a, q) = 1, is at most $cq^L$. The best known admissible value for the constant L is currently L = 5.18, which is due to Xylouris [Xyl11]. It is easy to show that, conditionally on the Generalized Riemann Hypothesis, L = 2 + ε is admissible for every ε > 0; on the other hand, Friedlander and Iwaniec [FI03b] proved the remarkable fact that one can take L = 1.983 if Landau–Siegel zeros exist (and satisfy certain properties).

Being average estimates, the Bombieri–Vinogradov theorem and the Barban–Davenport–Halberstam theorem are not capable of giving bounds which hold for all arithmetic progressions. However, one can achieve L = 2 + ε for all residue classes in almost all progressions and even L = 1 + ε for most residue classes in almost all progressions. Likewise, our variants of these theorems can provide bounds for the size of the least prime number which can be represented by a given positive definite binary quadratic form, with the bounds holding for all or almost all form classes and almost all fundamental discriminants.

In fact, using Theorem 2.29, we can show that the smallest prime represented by any binary quadratic form in a given class of a given discriminant is, for all ε > 0, at most $|q|^{3+\varepsilon}$ for most classes of forms and most discriminants q:

Theorem 2.34. For each negative fundamental discriminant $q \not\equiv 0 \pmod 8$ and each form class C ∈ K(q), let

p(q; C) = the least prime which is represented by all binary quadratic forms in C.

Then, for each ε > 0, there exists a subset S = S(ε) of the set F of negative fundamental discriminants $q \not\equiv 0 \pmod 8$ such that

$$\lim_{N \to \infty} \frac{|S \cap [-N, -1]|}{|F \cap [-N, -1]|} = 1,$$

i.e. S has asymptotic density 1 in F, and

$$\lim_{n \to \infty} \frac{|\{ C \in K(q_n) \mid p(q_n; C) \le |q_n|^{3+\varepsilon} \}|}{h(q_n)} = 1 \tag{2.101}$$

holds for each sequence $(q_n)$ in S with $|q_n| \to \infty$ as $n \to \infty$.

Remark. Elliott and Halberstam [EH71] proved the analogous theorem for arithmetic progressions (with exponent 1 + ε) and Hinz [Hin81] generalized their result to arbitrary (fixed) number fields.

Proof. Fix ε > 0 and set A = 3 + ε. Let $Q_0 = Q_0(A, \varepsilon)$ be sufficiently large, i.e. such that

$$\Bigl( \log \frac{Q}{2} \Bigr)^{2A+7} \le \Bigl( \frac{Q}{2} \Bigr)^{\varepsilon} \quad \text{and} \quad Q^{3+2\varepsilon} \le \Bigl( \frac{Q}{2} \Bigr)^{3+2\varepsilon} \cdot \frac{(\log (Q/2))^{2A+7}}{(\log Q^4)^{2A+6}} \tag{2.102}$$

for all $Q > Q_0$. For each $Q > Q_0$, we fix X = X(Q, A, ε) with $X(\log X)^{-2A-6} = Q^{3+2\varepsilon}$ and we assume that $Q^4 > X$ (otherwise we increase $Q_0$). Moreover, let T(Q) be the (possibly empty) subset of the fundamental discriminants in the interval [−Q, −Q/2) such that q lies in T(Q) if and only if $p(q; C) > |q|^{3+3\varepsilon}$ holds for at least $h(q)(\log |q|)^{-(A-3)}$ classes C ∈ K(q). Fix $Q > Q_0$. Then

$$p(q; C) > \Bigl( \frac{Q}{2} \Bigr)^{3+3\varepsilon} \ge \Bigl( \frac{Q}{2} \Bigr)^{3+2\varepsilon} \Bigl( \log \frac{Q}{2} \Bigr)^{2A+7} \ge X (\log X)^{-2A-6} (\log Q^4)^{2A+6} > X$$

holds for at least $h(q)(\log |q|)^{-(A-3)}$ classes C ∈ K(q) for all q ∈ T(Q); here we have used condition (2.102) and the choice of X. Thus, we have π(X; q, C) = 0 for at least $h(q)(\log |q|)^{-(A-3)}$ classes, and therefore

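$$\sum_{q \in T(Q)} \frac{h(q)}{(\log |q|)^{A-3}} \cdot \frac{(\operatorname{li}(X))^2}{(h(q))^2} \ll_{\varepsilon,A} \frac{Q^{1/2} X^2}{(\log X)^A}$$

by Theorem 2.29. Hence

$$|T(Q)| \ll_{\varepsilon,A} \frac{Q (\log Q)^{A-4}}{(\log X)^{A-2}} \le \frac{Q}{(\log Q)^2}.$$

Set

$$T = \bigcup_{m :\, Q_0 \le 2^m} T(2^m).$$

Then $|T \cap [-N, -1]| / N \ll (\log N)^{-1} \to 0$ as $N \to \infty$. Therefore, if S is the complement of T in F, then S has asymptotic density 1 and if $(q_n)$ is a sequence in S with $|q_n| \to \infty$ as $n \to \infty$, then

$$h(q_n) \ge |\{ C \in K(q_n) \mid p(q_n; C) \le |q_n|^{3+3\varepsilon} \}| \ge h(q_n) - h(q_n)(\log |q_n|)^{-(A-3)}$$

for all n ∈ N, hence (2.101) holds.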

Our variant of the Bombieri–Vinogradov theorem, Theorem 2.16, gives a weaker bound, but this time it holds for all classes of forms and most discriminants:

Theorem 2.35. For every ε > 0, the upper bound

$$\max_{C \in K(q)} p(q; C) \le |q|^{10+\varepsilon}$$

may only fail for fundamental discriminants q lying in a set V = V(ε) ⊂ F that has asymptotic density 0 in F.

We skip the proof since it is very similar to the one of Theorem 2.34 above.

Remarks: (1) From the Siegel–Walfisz theorem for binary quadratic forms, Theorem 2.1, it follows easily that there exists an absolute constant L such that

$$\max_{C \in K(q)} p(q; C) \ll |q|^{L (\log |q|)}$$

for all q ∈ F. (2) Kowalski and Michel have proved in [KM02] a log-free zero-density estimate for auto- morphic forms on GL(n)/Q and described how this can be used to show the existence of an absolute constant L such that

$$\max_{C \in K(q)} p(q; C) \ll |q|^L \tag{2.103}$$

for all q ∈ F. This bound is also a consequence of earlier results by Fogels [Fog65, Fog67] and Weiss [Wei83]. However, no explicit admissible value for L has yet been published.

(3) The Generalized Riemann Hypothesis for the Dedekind zeta-functions of the Hilbert class fields of the fields $\mathbb{Q}(\sqrt{q})$ implies (see Theorem 1.16) that (2.103) holds for all q ∈ F with L = 1 + ε for all ε > 0.

(4) Assuming the Lindelöf Hypothesis as in Theorem 2.17 and Theorem 2.32, one may replace the exponent 3+ε by 1+ε in Theorem 2.34 and the exponent 10+ε by 3+ε in Theorem 2.35.

Focussing on the primes of the shape $x^2 + ny^2$, which are mentioned in the title of this thesis, that is, on primes represented by the principal class of discriminant −4n, it would be interesting to investigate bounds for the values of $x_{\min}$ and $y_{\min}$ that yield the smallest prime of this form for any given positive integer n. One would naturally assume that $y_{\min}$ is typically very small. Nevertheless, it is somewhat surprising that numerical calculations even suggest that $y_{\min} > 1$ can only occur for an exceedingly small set of values n: Up to at least $n = 10^8$, the smallest prime of the form $x^2 + ny^2$ is actually of the form $x^2 + n$ in all but the eleven cases n ∈ {5, 41, 59, 314, 341, 479, 626, 749, 755, 881, 1784}; in all these exceptional cases we have $y_{\min} = 2$. If we could show that $y_{\min} = 1$ for all n > 1784 (which appears to be formidable) or could at least get a good bound for the number of exceptions, the problem of bounding the least prime of the form $x^2 + ny^2$ would reduce to bounding the smallest prime of the form $x^2 + n$.

Although this polynomial looks simpler than our original one, questions on the prime numbers which it represents are much harder than for binary quadratic forms: there is no integer n > 0 for which it is currently known whether there are infinitely many primes of the form $x^2 + n$. Nevertheless, Baier and Zhao [BZ07] proved that, given A, B > 0, the bound

$$\sum_{\substack{n \le N \\ \mu(n)^2 = 1}} \Bigl| \sum_{x \le X} \Lambda(x^2 + n) - G(n) X \Bigr|^2 \ll \frac{N X^2}{(\log X)^B}, \quad \text{where} \quad G(n) = \prod_{p > 2} \Bigl( 1 - \frac{\chi_n(p)}{p - 1} \Bigr),$$

holds if $X^2 (\log X)^{-A} \le N \le X^2$. Note that G(n) converges and $G(n) \gg (\log n)^{-1} \gg (\log X)^{-1}$. Thus, using a similar argument as in the proof of Theorem 2.34, we can conclude:

Theorem 2.36. For every ε > 0, the smallest prime of the form $x^2 + n$ is less than $n^{2+\varepsilon}$ for all positive squarefree integers n outside a possible exceptional set of asymptotic density 0.

Consequently, conditional on the assumption that $p_0(n)$, the least prime of the form $x^2 + ny^2$, is attained for y = 1 for all positive squarefree integers n in a set of asymptotic density 1, we have $p_0(n) \le n^{2+\varepsilon}$ for all positive squarefree integers n in a set of asymptotic density 1.
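The numerical observation on $y_{\min}$ can be explored on a small scale with the following Python sketch. It is not the computation behind the figures quoted above, only an illustration under our own assumptions: we require x, y ≥ 1 (so that for prime n the trivial value n = 0² + n · 1² is excluded, which is the reading under which n = 5 is exceptional), we test primality by trial division, and the function name `least_prime` is hypothetical.

```python
from math import isqrt

def is_prime(m):
    """Trial division; adequate for the small values occurring here."""
    if m < 2:
        return False
    if m % 2 == 0:
        return m == 2
    for d in range(3, isqrt(m) + 1, 2):
        if m % d == 0:
            return False
    return True

def least_prime(n):
    """Return (p, x, y), where p is the least prime x^2 + n*y^2 with x, y >= 1."""
    bound = 4 * (n + 1)                  # search window, doubled until a prime appears
    while True:
        best = None
        y = 1
        while n * y * y + 1 <= bound:
            for x in range(1, isqrt(bound - n * y * y) + 1):
                v = x * x + n * y * y
                if best is not None and v >= best[0]:
                    break                # cannot improve on the current record
                if is_prime(v):
                    best = (v, x, y)
                    break                # larger x only gives larger values for this y
            y += 1
        if best is not None:
            return best
        bound *= 2

if __name__ == "__main__":
    # Values of n <= 100 for which the least prime is not attained at y = 1.
    print([n for n in range(1, 101) if least_prime(n)[2] > 1])
```

For n = 5, for example, the values x² + 5 with 1 ≤ x ≤ 5 are all composite and the first prime among them is 41, whereas 3² + 5 · 2² = 29 is smaller, so that $y_{\min} = 2$.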

2.5.2 A mean-value estimate for products of two primes

Strictly speaking, the following is not a real application but rather another mean-value result of Bombieri–Vinogradov type, this time for the sequence of numbers that are products of two primes. It can be derived from a combination of the results of Sections 2.3 and 2.4:

Theorem 2.37. Let ε > 0 and A > 0. Let $M_2(Q)$ be the set of positive squarefree integers n ≡ 1 (mod 4) with $n \le Q$. For each $n \in M_2(Q)$, let $\pi_2(X; n)$ denote the number of integers $k \le X$ which can be written in the two shapes

$$k = x^2 + ny^2 \quad \text{and} \quad k = p_1 p_2$$

for some integers x and y as well as some primes $p_1$ and $p_2$ with $p_1, p_2 \le X^{1/2}$ both of which can be represented by positive definite binary quadratic forms of discriminant −4n. Then there exists a constant B = B(A) such that

$$\sum_{n \in M_2(Q)} \Bigl| \pi_2(X; n) - \Bigl( \frac{h(-4n)}{4} + 2^{\omega(n)-3} \Bigr) \cdot \Bigl( \frac{\operatorname{li}(X^{1/2})}{h(-4n)} \Bigr)^2 \Bigr| \ll_{A,\varepsilon} \frac{Q^{1/2} X}{(\log X)^A}$$

if $Q^{20+\varepsilon} \le X(\log X)^{-B}$.

Remark. With more effort it is possible to derive similar statements for integers $k = p_1 \cdots p_\ell$ with any fixed number ℓ of prime factors and $p_1 \le X^{a_1}, \ldots, p_\ell \le X^{a_\ell}$, where $a_1 \le \ldots \le a_\ell$ and $a_1 + \cdots + a_\ell = 1$. The resulting admissible range for the discriminants is then always determined by the smallest exponent $a_1$. This type of result is due to Barban [Bar66, Theorem 3.3] in the case of primes in arithmetic progressions and it was generalized to arbitrary number fields by Hinz [Hin81]. Their results do not use results of Bombieri–Vinogradov type but only appropriate versions of the Barban–Davenport–Halberstam theorem. We also have to use Theorem 2.16 because of the difference (in the factor e(C)) between the asymptotic behaviours of the numbers of primes represented by ambiguous or by non-ambiguous forms.

Proof. Set $Y = X^{1/2}$ and fix a positive squarefree integer n ≡ 1 (mod 4) with $n \le Y$. For all k ∈ N and all C ∈ H(−4n), let δ(C, k) = 1 if there exists an ideal a ∈ C with N(a) = k, and δ(C, k) = 0 otherwise. If k is of the form $k = x^2 + ny^2$, then k can be represented by the principal form of discriminant −4n; hence δ(P(−4n), k) = 1 by Lemma 1.4. Since we also assume that $k = p_1 p_2$, where $p_1$ and $p_2$ are primes with $p_1, p_2 \le Y$ and both can be represented by positive definite binary quadratic forms of discriminant −4n, we must have $\delta(C_1, p_1) = 1$ and $\delta(C_2, p_2) = 1$ for two classes $C_1, C_2 \in H(-4n)$ with $C_1 C_2 = P(-4n)$, i.e. $C_2 = C_1^{-1}$.

Let $\pi_2^{(1)}(X; n)$ be the number of pairs of primes $p_1 \le p_2 \le Y$ with $p_1 p_2 \le X$ such that both $p_1$ and $p_2$ split in $\mathbb{Q}(\sqrt{-n})$ and both are representable by forms of an ambiguous class of discriminant −4n; then

$$\pi_2^{(1)}(X; n) = \frac{1}{2} \sum_{\substack{p_1 \le Y \\ p_1 \text{ splits}}} \sum_{\substack{p_2 \le Y \\ p_2 \text{ splits}}} \sum_{\substack{C \in H(-4n) \\ C^2 = P(-4n)}} \delta(C, p_1)\,\delta(C^{-1}, p_2) + O(Y) = \frac{1}{2} \sum_{\substack{C \in K(-4n) \\ C^2 = C_0(-4n)}} \pi(Y; -4n, C)\,\pi(Y; -4n, C^{-1}) + O(Y + \omega(4n)^2).$$

The error term is a trivial bound for the squares of primes and the contribution from ramified primes in π(Y; −4n, C) and $\pi(Y; -4n, C^{-1})$.

Let $\pi_2^{(2)}(X; n)$ be the number of pairs of primes $p_1 \le p_2 \le Y$ with $p_1 p_2 \le X$ such that both $p_1$ and $p_2$ split in $\mathbb{Q}(\sqrt{-n})$ and both are representable by forms of non-ambiguous classes of discriminant −4n; then

$$\pi_2^{(2)}(X; n) = \frac{1}{2} \cdot \frac{1}{2} \sum_{\substack{p_1 \le Y \\ p_1 \text{ splits}}} \sum_{\substack{p_2 \le Y \\ p_2 \text{ splits}}} \sum_{\substack{C \in H(-4n) \\ C^2 \ne P(-4n)}} \delta(C, p_1)\,\delta(C^{-1}, p_2) + O(Y) = \frac{1}{4} \sum_{\substack{C \in K(-4n) \\ C^2 \ne C_0(-4n)}} \pi(Y; -4n, C)\,\pi(Y; -4n, C^{-1}) + O(Y + \omega(4n)^2). \tag{2.104}$$

The additional factor 1/2 in (2.104) stems from the existence of two distinct classes which contain a prime ideal of norm $p_1$ if the classes are non-ambiguous.

Let $\pi_2^{(3)}(X; n)$ be the number of pairs of primes $p_1 \le p_2 \le Y$ with $p_1 p_2 \le X$ such that at least one of these primes ramifies in $\mathbb{Q}(\sqrt{-n})$ and both are representable by forms of discriminant −4n; then $\pi_2^{(3)}(X; n) \le \omega(4n) Y \le Y (\log Y)$. Thus,

$$\pi_2(X; n) = \sum_{i=1}^{3} \pi_2^{(i)}(X; n) = \sum_{C \in K(-4n)} \frac{\pi(Y; -4n, C)\,\pi(Y; -4n, C^{-1})}{2(3 - e(C))} + O(Y (\log Y)).$$

We have $|\{ C \in K(-4n) \mid C^2 = C_0(-4n) \}| = 2^{\omega(-4n)-1} = 2^{\omega(n)}$ (see Remark 1.3) and therefore

$$\frac{h(-4n)}{4} + 2^{\omega(n)-3} = \frac{3}{8} \sum_{\substack{C \in K(-4n) \\ C^2 = C_0(-4n)}} 1 + \frac{1}{4} \sum_{\substack{C \in K(-4n) \\ C^2 \ne C_0(-4n)}} 1.$$

Thus,

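$$\begin{aligned}
\sum_{n \in M_2(Q)} \Bigl| \pi_2(X; n) - \Bigl( \frac{h(-4n)}{4} + 2^{\omega(n)-3} \Bigr) \Bigl( \frac{\operatorname{li}(Y)}{h(-4n)} \Bigr)^2 \Bigr|
&\le \sum_{n \in M_2(Q)} \sum_{\substack{C \in K(-4n) \\ C^2 = C_0(-4n)}} \Bigl| \frac{\pi(Y; -4n, C)\,\pi(Y; -4n, C^{-1})}{2} - \frac{3}{8} \Bigl( \frac{\operatorname{li}(Y)}{h(-4n)} \Bigr)^2 \Bigr| \\
&\quad + \sum_{n \in M_2(Q)} \sum_{\substack{C \in K(-4n) \\ C^2 \ne C_0(-4n)}} \Bigl| \frac{\pi(Y; -4n, C)\,\pi(Y; -4n, C^{-1})}{4} - \frac{1}{4} \Bigl( \frac{\operatorname{li}(Y)}{h(-4n)} \Bigr)^2 \Bigr| \\
&\quad + O(QY (\log Y)).
\end{aligned} \tag{2.105}$$

By the triangle inequality, the first term on the right side is

$$\le \sum_{n \in M_2(Q)} \sum_{\substack{C \in K(-4n) \\ C^2 = C_0(-4n)}} \Bigl| \pi(Y; -4n, C) - \frac{\operatorname{li}(Y)}{2h(-4n)} \Bigr| \cdot \Bigl| \pi(Y; -4n, C^{-1}) - \frac{\operatorname{li}(Y)}{2h(-4n)} \Bigr| + 2 \operatorname{li}(Y) \sum_{n \in M_2(Q)} \max_{\substack{C \in K(-4n) \\ C^2 = C_0(-4n)}} \Bigl| \pi(Y; -4n, C) - \frac{\operatorname{li}(Y)}{2h(-4n)} \Bigr|. \tag{2.106}$$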

Let $B_1 = B_1(A)$ be the constant from Theorem 2.16 and let $B_2 = 2A + 6$ be the constant from Theorem 2.29. Set $B = \max(B_1, B_2)$. By Theorem 2.16, the second term of (2.106) is

$$\ll Q^{1/2} Y^2 (\log Y)^{-A}$$

if $Q^{10+\varepsilon} \le Y (\log Y)^{-B}$. By the Cauchy–Schwarz inequality and Theorem 2.29, the first term of (2.106) is $\ll Q^{1/2} Y^2 (\log Y)^{-A}$ in the same range.

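The second term on the right side of (2.105) is

$$\le \sum_{n \in M_2(Q)} \sum_{\substack{C \in K(-4n) \\ C^2 \ne C_0(-4n)}} \Bigl| \pi(Y; -4n, C) - \frac{\operatorname{li}(Y)}{h(-4n)} \Bigr| \cdot \Bigl| \pi(Y; -4n, C^{-1}) - \frac{\operatorname{li}(Y)}{h(-4n)} \Bigr| + 2 \operatorname{li}(Y) \sum_{n \in M_2(Q)} \max_{\substack{C \in K(-4n) \\ C^2 \ne C_0(-4n)}} \Bigl| \pi(Y; -4n, C) - \frac{\operatorname{li}(Y)}{h(-4n)} \Bigr|.$$

By Theorem 2.16 and Theorem 2.29 this is again

$$\ll Q^{1/2} Y^2 (\log Y)^{-A}$$

if $Q^{10+\varepsilon} \le Y (\log Y)^{-B}$. It finally remains to note that the error term on the right side of (2.105) is negligible.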

2.5.3 Possible extensions and generalizations

We would like to end this chapter by mentioning some potential extensions of the results which we have proved in this chapter for primitive positive definite binary quadratic forms with fundamental discriminants $q \not\equiv 0 \pmod 8$.

(1) The restriction to fundamental discriminants and the congruence restriction $q \not\equiv 0 \pmod 8$ seem to be the easiest to drop, but this would render certain parts of the proofs more technical. Primarily, this restriction allowed us to not have to worry about odd prime factors of the discriminants appearing more than once. Thus, we were able to find a bound for the conductor of Rankin–Selberg convolutions of class group L-functions in Section 2.2 by a quite easy application of Li’s functional equation; odd square factors (and, to a lesser extent, the factor 8) would have turned this into a rather messy affair or would have required us to use much deeper results on local Langlands correspondence; see the last footnote in the proof of Lemma 2.8. We also profited from the absence of square factors when using the Kronecker Factorization Formula (2.70). However, an analogue of this formula is known to hold for non-fundamental discriminants as well (see [Fog61, Lemma 4]), so that this part of the proof could probably be adapted accordingly without much trouble.

(2) A rather different picture emerges for primitive binary quadratic forms of positive discriminant q. The proof of the infinitude of primes which are represented by any such form is essentially the same as for negative discriminants. But already the corresponding prime number theorem is significantly more complicated: The first proof by de la Vallée Poussin [dlVP97] was 92 pages long, even though he skipped the parts which were the same as in his 35-page proof for negative discriminants (Landau was later able to simplify both proofs but a notable difference in complexity persists). The additional difficulty lies mainly in the presence of non-trivial units. Their number, or rather their “density”, is hard to control, but it determines the regulator, which is a supplementary factor in the class number formula for real quadratic fields. This in turn makes it difficult to obtain good estimates for the class number and would obviously be an obstacle to overcome when trying to find mean-value estimates for the error term in the prime number theorem for such forms. Moreover, by the Cohen–Lenstra heuristics (see [Coh93, §5.10]), these class numbers are conjectured to be usually very small and therefore presumably do not offer much potential for cancellation effects in average results.

(3) In view of the many similarities between the prime number theory for arithmetic progressions and the prime number theory for binary quadratic forms, it is natural to continue by investigating primes lying simultaneously in given (families of) arithmetic progressions and given (families of) binary quadratic forms. Dirichlet stated in 1840 and Meyer proved in 1888 that every positive definite binary quadratic form represents infinitely many primes lying in a prescribed residue class of a given modulus provided that certain compatibility conditions are fulfilled; de la Vallée Poussin proved a quantitative version of this statement in the fifth part of his Recherches analytiques sur la théorie des nombres premiers (see [Nar00, §2.2] for references). One can imagine that it should be feasible to combine these old results for fixed progressions and forms, the original mean-value results for arithmetic progressions and our variants for binary quadratic forms.

(4) Leaving binary quadratic forms behind, a vast number of much more ambitious extensions are conceivable. At the turn of the millennium some binary polynomials of higher degree were proved to represent infinitely many primes. After Friedlander and Iwaniec [FI98] gave a deep proof for the infinitude of primes of the form $x^2 + y^4$, their ideas were picked up and advanced by Heath-Brown and Moroz who proved that each primitive irreducible binary cubic form f(x, y) represents infinitely many primes unless f represents only even numbers; see [HBM02] and the references there. The family of binary cubic forms would therefore be a candidate for new “on average”-results. There exist correspondences between classes of binary cubic forms and classes of cubic rings, the Delone–Faddeev correspondence and the related Davenport–Heilbronn correspondence (see [BST13], for example); maybe they could be used in a similar way as the correspondence between classes of binary quadratic forms and ideal classes that we have used. Moreover, we call to mind that one is not necessarily confined to families of forms or families of polynomials whose members are already known to represent an infinitude of primes (but which are still conjectured to do so) as can be seen from the mean-value results for the family of polynomials $x^2 + n$ with n ∈ N that we mentioned at the end of Section 2.5.1.

(5) Apart from generalizing the forms under consideration, it would also be interesting to strengthen our existing results for positive definite binary quadratic forms. The most effective gain would certainly arise from a stronger large sieve inequality for complex class group characters; see Remark 2.9(b).

Since there exist two essentially different ways to replace the classical Barban–Davenport–Halberstam theorem by an asymptotic formula (see Section 2.1), one with and one without appeal to the large sieve inequality, an analogous asymptotic formula for binary quadratic forms poses an appealing follow-up challenge. Ultimately, it would, of course, be most fascinating to see anything proved towards results of Fouvry–Iwaniec/Bombieri–Friedlander–Iwaniec type (see Theorem 2.4) for primes of the shape $x^2 + ny^2$. Given the depth of these results in their original form, this probably is but a daydream for now.

Chapter 3

Chebyshev’s bias and prime number races for binary quadratic forms

At the beginning of his famous 1837 memoir, in which he proved the existence of infinitely many primes in each arithmetic progression a (mod q) with (a, q) = 1, Dirichlet mentioned the observation that the number of primes p with p ≡ a (mod q) asymptotically equals the number of primes p with p ≡ b (mod q) if (a, q) = 1 = (b, q).¹ However, if all things were equal in all arithmetic progressions, nothing would be prized, to paraphrase Thomas Hobbes; and therefore it was soon noticed that Dirichlet’s observed equality cannot be true in a stronger sense. Chebyshev wrote in 1853 (see [MS99, §29]):

In seeking the limiting expression of the functions which determine the totality of the prime numbers of the form 4n + 1 and of those of the form 4n + 3, taken below a very large limit, I have come to recognize that these two functions differ notably from each other in their second terms, whose value, for the numbers 4n + 3, is larger than that for the numbers 4n + 1; [...].

Chebyshev went on to give more specific assertions on this difference and he claimed to have proofs, but he did not write any of them down. The first of these assertions states that, given positive numbers $X_0$ and δ, there exists $X = X(X_0, \delta) > X_0$ such that

π(X; 4, 1) − π(X; 4, 3)

√ − 1 < δ. X(log X)−1 It was only in 1891 when Phragmén gave the first correct proof of this statement (see [Lan06] and the references there). Note that this result does not specify the direction of the “bias” which Chebyshev mentioned earlier. His second claim states that

e−3c − e−5c + e−7c + e−11c − e−13c − e−17c + e−19c + e−23c − · · · → ∞ (3.1) as c → 0. Landau [Lan18] proved that this statement would imply the non-existence of zeros of L(s, χ4) to the right of the critical line (Hardy and Littlewood proved in the same year the inverse implication).2 Chebyshev’s remark remained a conundrum and it was only in 1959 when Shanks [Sha59] found a way to give an appropriate formulation of the bias. In fact, compared to the enormous

1As we have mentioned in Section 2.1, it took another 59 years before de la Vallée Poussin actually proved that π(X; q, a) ∼ π(X; q, b) as X → ∞. 2Landau aptly notes that this implication “erhöht für Ungläubige die Wahrscheinlichkeit, daß Tschebyschef sich geirrt hat, und für Gläubige den Wunsch, aus seinen Papieren den Beweis von [(3.1)] rekonstruiert zu sehen.” 72 Chebyshev’s bias and prime number races for binary quadratic forms advances in the study of uniformities in the distribution behaviour of primes lying in various residue classes of a given modulus during the first half of the 20th century, the study of the discrepancies in this distribution behaviour received little to no attention until the 1960s when Knapowski and Turán coined the term comparative prime number theory for this field and systematically investigated many of the problems concerning this subject in two series of papers. Yet, the field mostly stagnated in the subsequent years and it was only in the 1990s that it was revived, primarily by the work of Rubinstein and Sarnak [RS94]. Building on their paper, comparative prime number theory has enjoyed a resurgence in popularity in recent years and results in this field now trade under the illustrious names Chebyshev’s bias, Prime Number Races and the Shanks–Rényi race. In the next section, we will give a short review of these recent advances in comparative prime number theory for arithmetic progressions. Then we will move on to primes represented by binary quadratic forms. Ng has shown that a bias exists between certain ideal classes of a given fixed imaginary quadratic field; this may be interpreted as a bias between form classes of a given fixed fundamental discriminant. We will review some of his results in Section 3.2. Finally, the overall theme of this thesis, the distribution of primes of the shape x2 + ny2 for various positive integers n, will be taken up in Section 3.3. 
We will prove that certain even negative fundamental discriminants are put at a disadvantage in their capability to represent primes by their principal form – in a way that is not directly apparent from the corresponding prime number theorem, which we have seen in Chapter 1, and that contrasts with the uniformity results of Chapter 2. Our restriction to fundamental discriminants reduces the amount of technical details in our analysis, but we expect similar results to hold for non-fundamental discriminants as well and therefore for all pairs (n, m) of positive integers for which −4n and −4m have the same class number. We close this chapter by giving a list of possible lines of further research.

3.1 Bias in the distribution of primes in arithmetic progressions

It would hardly be possible to give a better modern-day introduction to the field of comparative prime number theory than the survey articles [GM06] and [FK02]. We will not endeavour to do so, but will only give an overview of some of the most important recent results and especially the definitions and assumptions which our own results in Section 3.3 will also heavily rely on.

A natural generalization of Chebyshev’s observation suggests the following setting: Fix an integer $q \geq 3$ and a pair $(a_1, a_2)$ of distinct positive integers less than $q$ which are both coprime to $q$. One considers the properties of the set

\[ P(q; a_1, a_2) = \{ X \geq 2 \mid \pi(X; q, a_1) > \pi(X; q, a_2) \} \]

and wants to see whether $P(q; a_1, a_2) \cap [2, X]$ is “usually” (in a suitable sense) larger than the set $P(q; a_2, a_1) \cap [2, X]$. That is, one aims to find out if there is a bias in the “race” of the prime counting functions $\pi(X; q, a_1)$ and $\pi(X; q, a_2)$. Thus, Chebyshev’s original remark can be expressed as the observation that

\[ |P(4; 3, 1) \cap [2, X]| > |P(4; 1, 3) \cap [2, X]| \]

for “most” $X \geq 2$. To describe this asymmetry rigorously, one has to fix the right notion of density on $\mathbb{N}$. The asymptotic density is not appropriate for problems of this kind since, subject to the Generalized Riemann Hypothesis, it is known that this density does not even exist for the sets $P(q; a_1, a_2)$ (see the references to the works of Kaczorowski and Sarnak in [GM06]).
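As a small numerical illustration of the sets $P(q; a_1, a_2)$, the following Python sketch counts, for integer values of $X$ only, how often each residue class is strictly ahead. The function names `primes_up_to` and `race_leaders` are hypothetical helpers introduced for this illustration and do not come from the cited papers; restricting to integer $X$ is an assumption made for simplicity.

```python
def primes_up_to(limit):
    """Return all primes <= limit (sieve of Eratosthenes)."""
    sieve = bytearray([1]) * (limit + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, int(limit ** 0.5) + 1):
        if sieve[p]:
            # Cross out the multiples of p starting at p*p.
            sieve[p * p :: p] = bytearray(len(range(p * p, limit + 1, p)))
    return [k for k in range(limit + 1) if sieve[k]]


def race_leaders(q, a1, a2, limit):
    """Count integers X in [2, limit] with pi(X;q,a1) > pi(X;q,a2), and vice versa.

    Hypothetical helper: returns the pair (lead1, lead2); ties count for neither.
    """
    prime_set = set(primes_up_to(limit))
    c1 = c2 = 0          # running values of pi(X;q,a1) and pi(X;q,a2)
    lead1 = lead2 = 0
    for x in range(2, limit + 1):
        if x in prime_set:
            if x % q == a1:
                c1 += 1
            elif x % q == a2:
                c2 += 1
        if c1 > c2:
            lead1 += 1
        elif c2 > c1:
            lead2 += 1
    return lead1, lead2


if __name__ == "__main__":
    # Chebyshev's original race: primes 4k+3 against primes 4k+1.
    print(race_leaders(4, 3, 1, 10_000))
```

Such counts only describe a truncated natural density on the integers; as explained above, the natural density is not the right notion for these races.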

However, Rubinstein and Sarnak [RS94] proved that the logarithmic density

\[ \delta(q; a_1, a_2) = \lim_{X \to \infty} \frac{1}{\log X} \int_{[2,X] \cap P(q; a_1, a_2)} \frac{dt}{t} \]

always exists and that it is always positive³, subject to the following two hypotheses:

• the Generalized Riemann Hypothesis for Dirichlet characters modulo q (GRHq);

• a linear independence hypothesis, which they call the Grand Simplicity Hypothesis (GSH$_q$): The set of all non-negative imaginary parts of zeros of all Dirichlet L-functions $L(s, \chi)$, with $\chi$ running over the primitive Dirichlet characters modulo $q$, is linearly independent over $\mathbb{Q}$.

According to [RS94, §1] and [FK02, §3], the consequences of similar linear independence assumptions were already investigated by Wintner and Ingham in the 1930s and 1940s (for the zeros of the Riemann zeta-function) and by Hooley and Montgomery in the 1970s and 1980s (for the zeros of Dirichlet L-functions). For some results on prime number races one can dispense with the use of the (full) Grand Simplicity Hypothesis, but it seems to be very difficult to obtain results independent of the Generalized Riemann Hypothesis. For instance, in [FLK13] it is shown how certain hypothetical sets of zeros of Dirichlet L-functions lying off the critical line may cause $P(q; a_1, a_2)$ to have asymptotic density 0. This would be a stark contrast to the results of Rubinstein and Sarnak.

Let us come back to their paper: They investigate configurations $(q; a_1, a_2)$ as above for which the logarithmic density deviates from the expected value $\frac{1}{2}$, i.e. configurations for which a bias exists. The superficial reason for the existence of such a bias lies in the difference between the prime counting function $\pi(X; q, a)$ and the corresponding Chebyshev function $\psi(X; q, a)$; the latter function does not exhibit a bias of the kind that the former count does. In particular, the absence of the squares of primes in $\pi(X; q, a)$ may cause a bias: Under the assumption of GRH$_q$ the contribution of the squares of primes is roughly of the same size as the error term in the explicit formula for $\psi(X; q, a)$; this contribution, of course, only exists if $a$ is a square modulo $q$. Indeed, it is shown in [RS94] that the strict inequalities

\[ 0 < \delta(q; a_1, a_2) < \tfrac{1}{2} < \delta(q; a_2, a_1) < 1 \tag{3.2} \]

hold if $a_1$ is a square modulo $q$ and $a_2$ is not a square modulo $q$. In particular, this explains Chebyshev’s original observation. Moreover, one has $\delta(q; a_1, a_2) = \frac{1}{2} = \delta(q; a_2, a_1)$ if both $a_1$ and $a_2$ are quadratic residues modulo $q$ or if both are quadratic non-residues modulo $q$.

As indicated above, the starting point of the proof of (3.2) is the so-called explicit formula for the Chebyshev function $\psi(X; q, a)$, which relates this function with the zeros of the Dirichlet L-functions $L(s, \chi)$ for the Dirichlet characters $\chi$ modulo $q$. This formula simplifies considerably on the assumption of GRH$_q$ and they prove that the vector-valued function

\[ E(X; q; a_1, a_2) = \frac{\log X}{\sqrt{X}} \Big( \varphi(q)\pi(X; q, a_1) - \pi(X),\ \varphi(q)\pi(X; q, a_2) - \pi(X) \Big) \]

has a limiting (logarithmic) distribution on $\mathbb{R}^2$, i.e. there exists a probability measure $\mu_{q;a_1,a_2}$ on $\mathbb{R}^2$ such that

\[ \lim_{X \to \infty} \frac{1}{\log X} \int_2^X g\big(E(t; q; a_1, a_2)\big) \frac{dt}{t} = \int_{\mathbb{R}^2} g(x)\, d\mu_{q;a_1,a_2}(x) \]

for all bounded continuous functions $g$ on $\mathbb{R}^2$. In order to relate $\mu_{q;a_1,a_2}$ and $\delta(q; a_1, a_2)$ the assumption GSH$_q$ is needed; it allows Rubinstein and Sarnak to show that $\mu_{q;a_1,a_2}$ is absolutely continuous and so $\delta(q; a_1, a_2) = \mu_{q;a_1,a_2}\big(\{(x_1, x_2) \in \mathbb{R}^2 \mid x_1 > x_2\}\big)$. Their explicit formulas for $\mu_{q;a_1,a_2}$ and, importantly, the knowledge of the exact position of the zeros of Dirichlet L-functions up to some height then also make it possible to compute numerically the densities $\delta(q; a_1, a_2)$ for some values of $q$, $a_1$ and $a_2$ (in particular, they get $\delta(4; 3, 1) \approx 0.996$, showing that the bias in Chebyshev’s original prime number race is indeed very pronounced). Moreover, they show that the limiting distribution of $E(X; q; a_1, a_2)/\sqrt{\log q}$ converges in measure to the Gaussian $(2\pi e^{x_1^2 + x_2^2})^{-1}\, dx_1 dx_2$ as $q \to \infty$; this leads to the asymptotic behaviour

\[ \max_{a_1, a_2} \delta(q; a_1, a_2) \to \frac{1}{2} \]

as $q \to \infty$, i.e. any bias dissolves as $q$ grows.

³ They actually consider a more general situation: Let $(a_1, \ldots, a_r)$ be an $r$-tuple of distinct positive integers less than $q$ which are all coprime to $q$. They set

\[ P(q; a_1, \ldots, a_r) = \{ X \geq 2 \mid \pi(X; q, a_1) > \cdots > \pi(X; q, a_r) \} \]

and find sets of this type which are usually larger (in the sense of a larger logarithmic density) than other sets $P(q; a_{\sigma(1)}, \ldots, a_{\sigma(r)})$ when $\sigma$ is a permutation of $\{1, \ldots, r\}$. We will ignore these interesting generalizations as they are not relevant for our results in Section 3.3.

We only mention one of the many recent papers which build on the work of Rubinstein and Sarnak as we have chosen some of its results as a template for our own results in Section 3.3: Fiorilli and Martin [FM13] broke new ground by attaining an asymptotic formula for $\delta(q; a_1, a_2)$ which can be exactly evaluated as a finite expression of arithmetic (rather than analytic) information, without explicit use of the position of zeros on the critical line. Therefore, extensive numerical computations can be omitted and most of the results of Rubinstein and Sarnak can be calculated in a more precise and effective way. We will come back to their work in Section 3.3.
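The logarithmic density of this section can be approximated at a finite height by truncating the limit. The following Python sketch does this for a race between two residue classes; `prime_sieve` and `truncated_log_density` are hypothetical names chosen for this illustration, and the value obtained at a finite height is only a rough indication of the true limit $\delta(q; a_1, a_2)$, whose existence is conditional as described above.

```python
import math


def prime_sieve(limit):
    """Return a bytearray s with s[k] == 1 exactly when k is prime (0 <= k <= limit)."""
    sieve = bytearray([1]) * (limit + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, int(limit ** 0.5) + 1):
        if sieve[p]:
            sieve[p * p :: p] = bytearray(len(range(p * p, limit + 1, p)))
    return sieve


def truncated_log_density(q, a1, a2, limit):
    """Approximate (1/log X) * integral of dt/t over [2, X] ∩ P(q; a1, a2) for X = limit.

    Since pi(t; q, a) is constant on [k, k+1), each interval on which
    class a1 leads contributes exactly log((k+1)/k) to the integral.
    """
    flags = prime_sieve(limit)
    c1 = c2 = 0
    mass = 0.0
    for k in range(2, limit):
        if flags[k]:
            r = k % q
            if r == a1:
                c1 += 1
            elif r == a2:
                c2 += 1
        if c1 > c2:
            mass += math.log((k + 1) / k)
    return mass / math.log(limit)


if __name__ == "__main__":
    print(truncated_log_density(4, 3, 1, 100_000))
```

The normalization by $\log X$ (rather than $\log(X/2)$) follows the definition given earlier in this section.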

3.2 Primes represented by different classes of forms with a fixed discriminant

Similar to the extension or generalization of the large sieve inequality, the Bombieri–Vinogradov type results and the Barban–Davenport–Halberstam type results to algebraic number fields, which we have mentioned in Section 2.1, it is natural to extend the notion of Chebyshev’s bias or prime number races to algebraic number fields. This was indicated by Rubinstein and Sarnak in [RS94, §5] and then carried out in great detail by Ng in [Ng00]. In particular, he considered discrepancies in the distribution of unramified prime ideals in distinct conjugacy classes of a fixed Galois group, i.e. discrepancies in the setting of the Chebotarev density theorem: Let $K$ be a fixed number field and let $L/K$ be a normal extension; let $G$ be its Galois group. For each conjugacy class $C$ of $G$, we set

\[ \tilde{\pi}_{\mathrm{norm}}(X; L/K, C) = \frac{|G|}{|C|} \cdot \tilde{\pi}(X; L/K, C), \]

where $\tilde{\pi}(X; L/K, C)$ was defined in the statement of the Chebotarev density theorem, Theorem 1.14. This theorem also yields the asymptotic relation $\tilde{\pi}_{\mathrm{norm}}(X; L/K, C) \sim \frac{X}{\log X}$ as $X \to \infty$. For any two distinct conjugacy classes $C_1$ and $C_2$ of $G$, Ng defines the bias sets

\[ P(L/K; C_1, C_2) = \{ X \geq 2 \mid \tilde{\pi}_{\mathrm{norm}}(X; L/K, C_1) > \tilde{\pi}_{\mathrm{norm}}(X; L/K, C_2) \}, \]

then constructs limiting distributions attached to these sets and computes explicit logarithmic densities for some specific examples of sets $P(L/K; C_1, C_2)$.⁴

⁴ In fact, more general results are obtained for $r$-tuples $(C_1, \ldots, C_r)$ of distinct conjugacy classes of $G$.

It is of particular interest to us that he analyses in detail the case when $K = \mathbb{Q}(\sqrt{q})$, where $q$ is a fixed negative (fundamental) discriminant, and $L$ is the Hilbert class field of $K$; as we have mentioned in Section 1.3, the ideal class group $H(q)$ and the Galois group $\mathrm{Gal}(L/K)$ are isomorphic. To state his results, some additional definitions are necessary: Let $\tilde{\pi}(X; q)$ and $\tilde{\pi}(X; q, C)$ be the numbers of prime ideals $\mathfrak{p}$ with $N(\mathfrak{p}) \leq X$ in $O(q)$ and in $C \in H(q)$, respectively. For any two ideal classes $C_1, C_2 \in H(q)$, we define the normalized functions

\[ E_q(X; C_1, C_2) = \frac{\log X}{\sqrt{X}} \Big( h(q)\tilde{\pi}(X; q, C_1) - \tilde{\pi}(X; q),\ h(q)\tilde{\pi}(X; q, C_2) - \tilde{\pi}(X; q) \Big), \]

\[ E'_q(X; C_1, C_2) = \frac{\log X}{\sqrt{X}} \Big( \tilde{\pi}(X; q, C_1) - \tilde{\pi}(X; q, C_2) \Big). \]

From the conditional explicit formula for the Chebyshev function associated to ideal class group characters (Theorem 1.18), one quickly derives that

\[ E'_q(X; C_1, C_2) = \frac{|\varrho^{-1}(C_2)| - |\varrho^{-1}(C_1)|}{h(q)} - \frac{1}{h(q)} \sum_{\chi \in \widehat{H}(q) \setminus \{\chi_0^{(q)}\}} \big( \chi(C_1) - \chi(C_2) \big) \sum_{|\gamma_\chi| \leq T} \frac{X^{i\gamma_\chi}}{\frac{1}{2} + i\gamma_\chi} + O_q\!\left( \frac{X^{1/2} (\log T)^2}{T} + \frac{1}{\log X} \right), \]

where $\varrho : H(q) \to H(q)$, $C \mapsto C^2$. The first term on the right-hand side accounts for the bias: It is 0 if both $C_1$ and $C_2$ are in the image of $\varrho$ (or if none of them is) and it is

\[ \frac{|\varrho^{-1}(C_2)| - |\varrho^{-1}(C_1)|}{h(q)} = \frac{\kappa(q)}{h(q)} \]

if $C_2$ is in the image of $\varrho$ and $C_1$ is not in the image of $\varrho$; recall that $\kappa(q)$ denotes the number of ambiguous classes in $H(q)$ (see (1.2)). Thus, by Remark 1.3, an ideal class whose corresponding form class (via the correspondence of Lemma 1.4) belongs to the principal genus will be discriminated against in a prime ideal race with any ideal class whose corresponding form class lies in a different genus. Clearly, the presumable bias is the more pronounced the smaller the odd part $H(q)/(H(q))^2$ of the class group is.

Ng also proves, under the assumption of the Generalized Riemann Hypothesis and a suitable linear independence hypothesis, that a limiting distribution $\mu$ of $E_q(X; C_1, C_2)$ exists. Moreover, the probability measure on $\mathbb{R}^2$ whose Fourier transform equals an appropriately normalized form of the Fourier transform of $\mu$ converges, as $q \to -\infty$, in measure to a Gaussian which is independent of the classes $C_1$ and $C_2$. It then follows that biases between different ideal classes disappear as $|q|$ grows, similarly to the above-mentioned behaviour of prime number races for residue classes modulo $q$.

Remark 3.1. Using the relation between prime ideals in imaginary quadratic fields and primes represented by binary quadratic forms, which we described in Section 1.2, one can translate these results into statements on prime number races for primes represented by forms of either of two distinct form classes of the same discriminant. Some caution has to be exercised whenever the principal class $C_0$ enters the race: Both $\tilde{\pi}(X; q, P(q)) = \tilde{\pi}(X; q, B_q(C_0))$ and $\tilde{\pi}(X; q)$ count not only prime ideals which correspond (via Lemma 1.4) to primes represented by the principal form or by any form of discriminant $q$, respectively; they usually also count prime ideals which lie over rational primes that remain prime in $O(q)$. Note that this cannot happen for other form classes; compare Remark 3.7. This disparity distorts the bias results that follow here for primes represented by forms of distinct form classes but the same discriminant. We will not particularize the results as doing so would make too large a digression from the main theme of the thesis. The analysis that would be necessary is, however, very similar to the one that we will perform in the upcoming Section 3.3.
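The bias term in the formula for $E'_q$ in this section depends only on the squaring map $\varrho$ on the class group. The following Python sketch evaluates it for an abstract finite abelian group given as a product of cyclic groups. This is purely illustrative: the group structure is supplied by hand (it is not computed from a discriminant), the group is written additively so that $C \mapsto C^2$ becomes $g \mapsto 2g$, and the names `square_root_counts` and `bias_term` are hypothetical.

```python
from itertools import product


def square_root_counts(orders):
    """For the group Z/o1 x ... x Z/ok, map each element to its number of preimages
    under the doubling map g -> 2g (the additive form of C -> C^2)."""
    elements = list(product(*(range(o) for o in orders)))
    counts = {g: 0 for g in elements}
    for g in elements:
        doubled = tuple((2 * x) % o for x, o in zip(g, orders))
        counts[doubled] += 1
    return counts


def bias_term(orders, c1, c2):
    """Evaluate (|rho^{-1}(c2)| - |rho^{-1}(c1)|) / h for classes c1, c2 given as tuples."""
    counts = square_root_counts(orders)
    h = len(counts)  # the group order plays the role of the class number h(q)
    return (counts[c2] - counts[c1]) / h


if __name__ == "__main__":
    # Cyclic group of order 4: the squares are (0,) and (2,), each with two square roots.
    print(square_root_counts((4,)))
    # Non-square class (1,) racing against the principal class (0,).
    print(bias_term((4,), (1,), (0,)))
```

In this toy model the number of square roots of a class in the image of the squaring map plays the role of $\kappa(q)$, so the printed bias term is $\kappa/h$ as in the display above.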

3.3 Prime number races for forms of the shape $x^2 + ny^2$

After having studied questions on uniformity in the distribution of primes represented by binary quadratic forms with varying discriminants in Chapter 2, we will now throw light on discrepancies in this distribution. The works that we have discussed in the preceding two sections fix the modulus of an arithmetic progression or the discriminant of an imaginary quadratic field before studying the existence of a bias between two different residue classes to this same modulus or between two different ideal classes in this same field (and therefore, to some extent, between two different form classes of the same discriminant). The only results that consider varying moduli or varying discriminants are the ones which show that the maximal bias in the prime number races (between quadratic residues and non-residues or between ideal classes whose corresponding form classes lie inside and outside the principal genus, respectively) decreases as the absolute value of the modulus or the discriminant grows. We will investigate here what happens when we fix the form class and then compare prime number races between this class for two distinct discriminants. In fact, we will concentrate on the principal class as this is the only case in which “fixing the class” seems to be a meaningful notion.

The analogous problem for primes in arithmetic progressions would be the following: Let $q_1$ and $q_2$ be two distinct positive integers with $\varphi(q_1) = \varphi(q_2)$. How is the difference

\[ \pi(X; q_1, 1) - \pi(X; q_2, 1) \]

distributed? It seems that this question has drawn much less direct attention than the problems discussed in Section 3.1. The only work known to the author which mentions this kind of problem explicitly is the second paper [KT64] of the “Further developments in the comparative prime-number theory” series of Knapowski and Turán (a shorter remark also appears in the first paper of the series). In the appendix of this paper, they suggest the investigation of the distribution of the difference $\pi(X; q_1, a_1) - \pi(X; q_2, a_2)$ for integers $q_1$, $q_2$, $a_1$ and $a_2$ with $(a_1, q_1) = 1 = (a_2, q_2)$ and $\varphi(q_1) = \varphi(q_2)$. They attribute this question to G. Lorentz.

In some cases such problems can be directly reduced to prime number races for distinct residue classes of a fixed modulus: If $q_1 = 3$ and $q_2 = 4$ (in particular: $\varphi(q_1) = \varphi(q_2)$) and $a_1 = 2$ and $a_2 = 1$, we have

\[ \pi(X; 3, 2) - \pi(X; 4, 1) = \big|\{ p \leq X \mid p \equiv 2, 5, 8 \text{ or } 11 \ (\mathrm{mod}\ 12) \}\big| - \big|\{ p \leq X \mid p \equiv 1, 5 \text{ or } 9 \ (\mathrm{mod}\ 12) \}\big| = 1 + \pi(X; 12, 11) - \pi(X; 12, 1). \]
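The reduction above is easy to confirm numerically for small $X$. The following Python sketch checks the identity $\pi(X; 3, 2) - \pi(X; 4, 1) = 1 + \pi(X; 12, 11) - \pi(X; 12, 1)$ for every integer $X$ in a range; `prime_sieve` and `check_reduction` are hypothetical helper names introduced only for this check.

```python
def prime_sieve(limit):
    """Return a bytearray s with s[k] == 1 exactly when k is prime (0 <= k <= limit)."""
    sieve = bytearray([1]) * (limit + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, int(limit ** 0.5) + 1):
        if sieve[p]:
            sieve[p * p :: p] = bytearray(len(range(p * p, limit + 1, p)))
    return sieve


def check_reduction(limit):
    """Return True if the identity holds for all integers X with 2 <= X <= limit."""
    flags = prime_sieve(limit)
    c3_2 = c4_1 = c12_11 = c12_1 = 0  # running prime counts in the four classes
    for x in range(2, limit + 1):
        if flags[x]:
            c3_2 += x % 3 == 2
            c4_1 += x % 4 == 1
            c12_11 += x % 12 == 11
            c12_1 += x % 12 == 1
        # The summand 1 on the right-hand side comes from the prime 2.
        if c3_2 - c4_1 != 1 + c12_11 - c12_1:
            return False
    return True


if __name__ == "__main__":
    print(check_reduction(100_000))
```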

Thus, the race between primes of the form 3n + 2 and primes of the form 4n + 1 reduces to the race between primes of the form 12n + 11 (together with a handicap coming from the prime 2) and primes of the form 12n + 1. Since 1 is a quadratic residue and 11 is a quadratic non-residue modulo 12 we expect a bias towards the first arithmetic progression by the results mentioned in Section 3.1 (the resulting bias will only be slightly changed by the extra summand 1). However, if q1 = 7, q2 = 9 and a1 = a2 = 1, for example, then

\[ \pi(X; 7, 1) - \pi(X; 9, 1) = \sum_{a \in \{8, 22, 29, 43, 50\}} \pi(X; 63, a) - \sum_{b \in \{10, 19, 37, 46, 55\}} \pi(X; 63, b), \]

since the residue class 1 modulo 63 is common to both progressions and cancels.

Thus, the union of the residue classes 8, 22, 29, 43 and 50 modulo 63 competes against the union of the residue classes 10, 19, 37, 46 and 55. Note that 22, 37, 43 and 46 are quadratic residues while 8, 10, 19, 29, 50 and 55 are quadratic non-residues modulo 63, so that each side contains two residues and three non-residues. The results of Section 3.1 cannot be directly applied to this “union-problem” (as it is called in the papers of Knapowski and Turán), but it is entirely conceivable that the methods of Rubinstein–Sarnak and Fiorilli–Martin could be used to investigate this problem.

Coming back to primes represented by binary quadratic forms, or specifically to primes of the shape $x^2 + ny^2$ for various positive integers $n$, the purpose of this section will be to study the distribution of

\[ \Delta(X; n, m) := \pi_0(X; n) - \pi_0(X; m) \]

for positive squarefree integers $n$ and $m$ with $n, m \not\equiv 3 \pmod 4$, where

\[ \pi_0(X; n) := \pi(X; -4n, C_0) = \sum_{\substack{p \leq X \\ \exists x, y \in \mathbb{Z}:\ p = x^2 + ny^2}} 1. \]

That is, we want primes of the shape $x^2 + ny^2$ to compete against primes of the shape $x^2 + my^2$ and we assume $h(-4n) = h(-4m)$ to allow for a reasonably fair race: By the prime number theorem for these forms, Theorem 1.2 or Theorem 1.12, this assumption yields the asymptotic relation $\pi_0(X; n) \sim \pi_0(X; m)$ as $X \to \infty$. Thus, a priori there is no reason to expect that $\Delta(X; n, m)$ has a preference for either positive or negative values. Nevertheless, we will see that a difference in the number of odd prime divisors of $n$ and $m$ causes a bias. This phenomenon is visible from the graphs of $\Delta(X; n, m)$, which we have plotted, using PARI/GP and gnuplot, for the exemplary pairs $(n, m) = (17, 21)$, $(33, 34)$, $(1201, 1365)$ and $(14, 17)$ and $X \leq 10^9$.
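For small $X$ the counting function $\pi_0(X; n)$ and the difference $\Delta(X; n, m)$ can be computed by brute force. The following Python sketch is not the PARI/GP code used for the plots; it is a minimal illustration that enumerates the values $x^2 + ny^2$ directly, and the names `primes_of_shape` and `delta` are hypothetical.

```python
def primes_of_shape(n, limit):
    """Return the sorted primes p <= limit with p = x^2 + n*y^2 for some integers x, y."""
    sieve = bytearray([1]) * (limit + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, int(limit ** 0.5) + 1):
        if sieve[p]:
            sieve[p * p :: p] = bytearray(len(range(p * p, limit + 1, p)))
    found = set()
    y = 1  # y = 0 only yields perfect squares, which are never prime
    while n * y * y <= limit:
        x = 0
        while x * x + n * y * y <= limit:
            value = x * x + n * y * y
            if sieve[value]:
                found.add(value)
            x += 1
        y += 1
    return sorted(found)


def delta(limit, n, m):
    """Return Delta(limit; n, m) = pi_0(limit; n) - pi_0(limit; m)."""
    return len(primes_of_shape(n, limit)) - len(primes_of_shape(m, limit))


if __name__ == "__main__":
    for pair in [(17, 21), (33, 34), (14, 17)]:
        print(pair, delta(10**6, *pair))
```

This enumeration needs on the order of $X/\sqrt{n}$ steps per form, so it is only meant for modest heights, far below the range $X \leq 10^9$ of the figures.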

[Plot “Prime number race: $x^2 + 17y^2$ vs. $x^2 + 21y^2$”; horizontal axis: $X$ from 0 to $10 \times 10^8$; vertical axis: from −600 to 1600.]

Figure 3.1: Graph of $X \mapsto \Delta(X; 17, 21) = \pi_0(X; 17) - \pi_0(X; 21)$ for $X \leq X_0 = 10^9$. The discriminants of $x^2 + 17y^2$ and $x^2 + 21y^2$ both have class number 4. We have an apparent bias for 17, the number with fewer (odd) prime factors: The ratio of the truncated logarithmic densities $\frac{1}{\log X_0} \int_{X \in S_i} \frac{dt}{t}$ of the sets $S_1 = \{X \leq X_0 \mid \Delta(X; 17, 21) > 0\}$ and $S_2 = \{X \leq X_0 \mid \Delta(X; 17, 21) < 0\}$ is about 6.6. The ratio of the corresponding truncated natural densities $\frac{|\{X \in S_i \cap \mathbb{Z}\}|}{X_0}$ is about 6.4.

[Plot “Prime number race: $x^2 + 33y^2$ vs. $x^2 + 34y^2$”; horizontal axis: $X$ from 0 to $10 \times 10^8$; vertical axis: from −1600 to 600.]

Figure 3.2: Graph of $X \mapsto \Delta(X; 33, 34) = \pi_0(X; 33) - \pi_0(X; 34)$ for $X \leq X_0 = 10^9$. The discriminants of $x^2 + 33y^2$ and $x^2 + 34y^2$ both have class number 4. We have an apparent bias for 34, the number with fewer odd prime factors: The ratio of the truncated logarithmic densities of $\{X \leq X_0 \mid \Delta(X; 33, 34) > 0\}$ and $\{X \leq X_0 \mid \Delta(X; 33, 34) < 0\}$ is about $\frac{1}{4.2}$. The ratio of the corresponding truncated natural densities is about $\frac{1}{5.7}$.

[Plot “Prime number race: $x^2 + 1201y^2$ vs. $x^2 + 1365y^2$”; horizontal axis: $X$ from 0 to $10 \times 10^8$; vertical axis: from −600 to 1000.]

Figure 3.3: Graph of $X \mapsto \Delta(X; 1201, 1365) = \pi_0(X; 1201) - \pi_0(X; 1365)$ for $X \leq X_0 = 10^9$. The discriminants of $x^2 + 1201y^2$ and $x^2 + 1365y^2$ both have class number 16. We have a (visually slightly less) apparent bias for 1201, the number with fewer (odd) prime factors: The ratio of the truncated logarithmic densities of $\{X \leq X_0 \mid \Delta(X; 1201, 1365) > 0\}$ and $\{X \leq X_0 \mid \Delta(X; 1201, 1365) < 0\}$ is about 9.3. The ratio of the corresponding truncated natural densities is about 2.0.

[Plot “Prime number race: $x^2 + 14y^2$ vs. $x^2 + 17y^2$”; horizontal axis: $X$ from 0 to $10 \times 10^8$; vertical axis: from −1000 to 1500.]

Figure 3.4: Graph of $X \mapsto \Delta(X; 14, 17) = \pi_0(X; 14) - \pi_0(X; 17)$ for $X \leq X_0 = 10^9$. The discriminants of $x^2 + 14y^2$ and $x^2 + 17y^2$ both have class number 4. There is no apparent bias. The ratio of the truncated logarithmic densities of $\{X \leq X_0 \mid \Delta(X; 14, 17) > 0\}$ and $\{X \leq X_0 \mid \Delta(X; 14, 17) < 0\}$ is about $\frac{1}{1.3}$. The ratio of the corresponding truncated natural densities is about 1.06.

It does not seem to be possible to reduce this problem to a “union-problem” for a fixed discriminant like in the analogous race for primes in arithmetic progressions. Nevertheless, we will follow certain parts of the papers of Rubinstein and Sarnak [RS94] and Fiorilli and Martin [FM13] and see how their proofs and results can be used in our context. Besides providing an appealing contrast to the uniformity results of Chapter 2, we hope that this investigation will also spark further interest in the original question by Knapowski, Turán and Lorentz.

Remark. Moree and te Riele [MtR04] considered the following race for all integers (not only primes) represented by the forms $x^2 + y^2$ or $x^2 + 3y^2$: Let

\[ B_n(X) = \big|\{ k \leq X \mid k = x^2 + ny^2 \text{ for some } x, y \in \mathbb{Z} \}\big| \]

for $n = 1$ and $n = 3$. The constant in (1.13) can be made explicit: As $X \to \infty$, we have

\[ B_n(X) \sim b(-4n) \frac{X}{\sqrt{\log X}} \]

with

\[ b(-4) = \frac{1}{\sqrt{2}} \prod_{p \equiv 3 \ (\mathrm{mod}\ 4)} \frac{1}{\sqrt{1 - p^{-2}}} \approx 0.764 \quad \text{and} \quad b(-12) = \frac{1}{\sqrt{2\sqrt{3}}} \prod_{p \equiv 2 \ (\mathrm{mod}\ 3)} \frac{1}{\sqrt{1 - p^{-2}}} \approx 0.639. \]

Thus, this is not a fair race to begin with and it is clear that $B_3(X)$ will gather an increasing deficit in the long run. However, Moree and te Riele prove that $B_1(X)$ actually never relinquishes the leadership after taking the pole position with $2 = 1^2 + 1^2$. Their proof relies, of course, on numerical computations, but involves an interesting underlying analytic toolbox, which they developed in an earlier paper on Chebyshev’s bias for composite numbers with restricted prime divisors.
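The race between $B_1(X)$ and $B_3(X)$ from the remark above can be observed directly for small $X$. The following Python sketch counts represented integers by enumeration; the function name `represented_count` is hypothetical, and restricting to positive integers $k \geq 1$ is an assumption made here (including $k = 0$ would add 1 to both counts and not affect the comparison).

```python
def represented_count(n, limit):
    """Count the integers 1 <= k <= limit with k = x^2 + n*y^2 for some integers x, y."""
    seen = set()
    y = 0
    while n * y * y <= limit:
        x = 0
        while x * x + n * y * y <= limit:
            value = x * x + n * y * y
            if value >= 1:  # assumption: only positive integers are counted
                seen.add(value)
            x += 1
        y += 1
    return len(seen)


if __name__ == "__main__":
    for X in (10, 100, 1000, 10_000):
        print(X, represented_count(1, X), represented_count(3, X))
```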

3.3.1 Definitions and statement of the results

For ease of notation and in order to put ourselves in a good position to use the methods in [RS94] and [FM13] in as straightforward a way as possible, we will consider the function $\breve{\pi}_0(X; n)$ that counts the primes of the shape $x^2 + ny^2$ twice: For all $X \geq 2$ and all positive squarefree integers $n \not\equiv 3 \pmod 4$, we set

\[ \breve{\pi}_0(X; n) = 2\,\pi_0(X; n), \qquad \breve{\pi}(X; n) = 2 \sum_{\substack{p \leq X \\ \chi_{-4n}(p) = 1}} 1; \]

note that, by Proposition 1.6 and Remark 1.9, the second function counts (twice) all primes $p \leq X$ which do not divide $n$ and which can be represented by some binary quadratic form of discriminant $-4n$. Furthermore, if $m \not\equiv 3 \pmod 4$ is also a positive squarefree integer with $h(-4n) = h(-4m)$, we define the bias set

\[ P(n, m) = \{ X \geq 2 \mid \breve{\pi}_0(X; n) > \breve{\pi}_0(X; m) \} = \{ X \geq 2 \mid \pi_0(X; n) > \pi_0(X; m) \}, \]

the corresponding logarithmic density

\[ \delta(n, m) = \lim_{X \to \infty} \frac{1}{\log X} \int_{[2,X] \cap P(n,m)} \frac{dt}{t}, \]

and the normalized bias functions

\[ E(X; n) = \frac{\log X}{\sqrt{X}} \cdot \Big( h(-4n)\,\breve{\pi}_0(X; n) - \breve{\pi}(X; n) \Big), \]

\[ E(X; n, m) = \frac{\log X}{\sqrt{X}} \cdot h(-4n) \cdot \Big( \breve{\pi}_0(X; n) - \breve{\pi}_0(X; m) \Big) = E(X; n) - E(X; m) + \frac{\log X}{\sqrt{X}} \cdot \Big( \breve{\pi}(X; n) - \breve{\pi}(X; m) \Big). \tag{3.3} \]
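The definitions above require $h(-4n) = h(-4m)$. For small discriminants this condition can be checked by counting reduced primitive positive definite forms $(a, b, c)$, i.e. forms with $-a < b \leq a \leq c$ and $b \geq 0$ whenever $a = c$. The following Python sketch does this by brute force; it is an illustration under these standard reduction conditions, not the method of the thesis, and the names `class_number` and `fair_race` are hypothetical.

```python
from math import gcd


def class_number(disc):
    """Count reduced primitive forms a*x^2 + b*x*y + c*y^2 with b^2 - 4ac = disc < 0."""
    if disc >= 0 or disc % 4 not in (0, 1):
        raise ValueError("disc must be a negative integer congruent to 0 or 1 mod 4")
    count = 0
    a = 1
    while 3 * a * a <= -disc:  # reduced forms satisfy 3a^2 <= |disc|
        for b in range(-a + 1, a + 1):  # -a < b <= a
            if (b * b - disc) % (4 * a) != 0:
                continue
            c = (b * b - disc) // (4 * a)
            if c < a:
                continue
            if a == c and b < 0:
                continue  # when a == c only b >= 0 is reduced
            if gcd(gcd(a, abs(b)), c) == 1:
                count += 1
        a += 1
    return count


def fair_race(n, m):
    """Return True if x^2 + n*y^2 and x^2 + m*y^2 have discriminants of equal class number."""
    return class_number(-4 * n) == class_number(-4 * m)


if __name__ == "__main__":
    for n in (14, 17, 21, 33, 34):
        print(n, class_number(-4 * n))
```

For the pairs $(17, 21)$, $(33, 34)$ and $(14, 17)$ this reproduces the class number 4 stated in the captions of Figures 3.1, 3.2 and 3.4.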

Definition 3.2. For any two positive squarefree integers $n$ and $m$ with $n, m \not\equiv 3 \pmod 4$ and $h(-4n) = h(-4m)$, we will need the usual assumptions (see Section 3.1) in the following form:

• Special cases of the Generalized Riemann Hypothesis (GRH$_{n,m}$): We assume that all non-trivial zeros of the Dedekind zeta-functions of the Hilbert class fields of $\mathbb{Q}(\sqrt{-n})$ and $\mathbb{Q}(\sqrt{-m})$ have real part equal to $\frac{1}{2}$. (Note that, by Remark 1.17 and the Kronecker Factorization Formula (2.70), this assumption implies that the Dirichlet L-function $L(s, \chi_d)$ has no non-trivial zeros lying off the critical line if $d$ and $\frac{-4n}{d}$ or $d$ and $\frac{-4m}{d}$ are fundamental discriminants.)

• A linear independence hypothesis (LI$_{n,m}$): For both $k = m$ and $k = n$, let $\widehat{H}'_1(-4k)$ be any maximal set of complex class group characters of $H(-4k)$ such that $\chi \in \widehat{H}'_1(-4k)$ implies $\overline{\chi} \notin \widehat{H}'_1(-4k)$; moreover, let $\widehat{H}'_2(-4k)$ be the set of real class group characters of $H(-4k)$. Let $F_\pm$ denote the set of all (positive or negative) fundamental discriminants. Set

\[ D(n, m) = \Big\{ d \in F_\pm : \tfrac{-4n}{d} \in F_\pm \text{ and } \tfrac{-4m}{d} \in F_\pm \Big\}, \]

\[ \widehat{H}'_3(n, m) = \big\{ \chi_{-4n/d},\ \chi_{-4m/d} : d \in D(n, m) \big\} \setminus \big\{ \chi_d : d \in D(n, m) \big\}, \]

\[ \widehat{H}'_4(n, m) = \big\{ \chi_{d,-4n/d} \in \widehat{H}'_2(-4n),\ \chi_{d,-4m/d} \in \widehat{H}'_2(-4m) : d \in D(n, m) \big\}. \]

We assume that the multiset of the non-negative imaginary parts of all non-trivial zeros of all L-functions associated to the class group characters and Dirichlet characters in

\[ \Big( \widehat{H}'_1(-4n) \cup \widehat{H}'_1(-4m) \cup \widehat{H}'_2(-4n) \cup \widehat{H}'_2(-4m) \cup \widehat{H}'_3(n, m) \Big) \setminus \widehat{H}'_4(n, m) \tag{3.4} \]

is linearly independent over $\mathbb{Q}$; in particular, we assume that all elements of this multiset have multiplicity one.

Remark 3.3. The exclusion of the set $\widehat{H}'_4(n, m)$ of real class group characters in (3.4) is necessary as the Kronecker Factorization Formula shows that the linear independence hypothesis could not hold otherwise. The multiple zeros which arise from $\widehat{H}'_4(n, m)$ will cancel out in the key equation (3.11), which justifies the exclusion; however, we have to add the Dirichlet characters lying in $\widehat{H}'_3(n, m)$ since the zeros of the associated L-functions remain after this cancellation.

Furthermore, we remark that Proposition 1.6 implies $L(s, \lambda_\chi) = L(s, \lambda_{\overline{\chi}})$ for all $s \in \mathbb{C}$ and all $\chi \in \widehat{H}(-4n) \cup \widehat{H}(-4m)$. This makes it necessary to consider the sets $\widehat{H}'_1(-4n)$ and $\widehat{H}'_1(-4m)$, which contain a representative character for each pair of conjugate complex characters, instead of the respective full sets of complex class group characters, in order to have a fighting chance of linear independence. Note that the results below do not depend on the choice of the representatives as we only use information about the corresponding L-functions.

We also note that the assumption LI$_{n,m}$ implies that none of the L-functions that are associated to the class group and Dirichlet characters in (3.4) has a zero at $s = \frac{1}{2}$. Blomer [Blo04b] has proved an unconditional upper bound on the proportion of class group L-functions that can vanish there and Ng [Ng00, §5.1 and §6.2] has investigated the effect of central zeros on the generalized races that we mentioned in Section 3.2.

We will first prove that $E(X; n, m)$ has a limiting distribution.
In fact, we will show that this distribution is related to the distribution of the following type of random variables: For each real number $\gamma$, let $Z_\gamma$ denote a random variable that is uniformly distributed on the unit circle such that the set $\{Z_\gamma\}_{\gamma > 0}$ is independent and $Z_{-\gamma} = \overline{Z_\gamma}$; let $Y_\gamma$ denote the random variable that is given by the real part of $Z_\gamma$.

Theorem 3.4. Let $n$ and $m$ be two distinct positive squarefree integers with $n, m \not\equiv 3 \pmod 4$ and $h(-4n) = h(-4m)$. Assume that GRH$_{n,m}$ holds. For $k \in \{n, m\}$, let $\omega'(k)$ denote the number of odd prime divisors of $k$. For all characters $\chi$ in

\[ \widehat{H}(n, m) := \Big( \widehat{H}(-4n) \cup \widehat{H}(-4m) \cup \widehat{H}'_3(n, m) \Big) \setminus \widehat{H}'_4(n, m) \]

set

\[ m(\chi) = \begin{cases} 2 & \text{if } \chi \text{ is a complex character in } \widehat{H}(-4n) \text{ or } \widehat{H}(-4m), \\ 1 & \text{otherwise.} \end{cases} \]

Define

\[ Y(n, m) = 2^{\omega'(m)} - 2^{\omega'(n)} + 2 \sum_{\chi \in \widehat{H}(n,m)} \sum_{\substack{\gamma > 0 \\ L(1/2 + i\gamma, \chi) = 0}} \frac{Y_\gamma}{\sqrt{\frac{1}{4} + \gamma^2}} \]

and

\[ V(n, m) = \sum_{\chi \in \widehat{H}(n,m)} m(\chi) \sum_{\substack{\gamma \in \mathbb{R} \\ L(1/2 + i\gamma, \chi) = 0}} \frac{1}{\frac{1}{4} + \gamma^2}. \tag{3.5} \]

Then the bias function $E(\,\cdot\,; n, m)$, as defined in (3.3), has a limiting (logarithmic) distribution on $\mathbb{R}$, i.e. there exists a probability measure $\mu_{n,m}$ on $\mathbb{R}$ such that

\[ \lim_{X \to \infty} \frac{1}{\log X} \int_2^X g\big(E(t; n, m)\big) \frac{dt}{t} = \int_{-\infty}^{\infty} g(t)\, d\mu_{n,m}(t) \]

for all bounded continuous functions $g$ on $\mathbb{R}$. Moreover, if we additionally assume LI$_{n,m}$, then $E(\,\cdot\,; n, m)$ has the same limiting distribution as the random variable $Y(n, m)$ and its variance is $V(n, m)$.

This corresponds to Theorem 1.1 in [RS94] combined with Proposition 2.6 and Proposition 2.7 in [FM13]. From Theorem 3.4 we can then derive an explanation for the apparent bias phenomenon in our prime number races as well as an asymptotic expression for the logarithmic density $\delta(n, m)$:

Theorem 3.5. Let $n$ and $m$ be as in Theorem 3.4. Assume that the hypotheses GRH$_{n,m}$ and LI$_{n,m}$ hold. Then the logarithmic density $\delta(n, m)$ exists. It is greater than $\frac{1}{2}$ if and only if $\omega'(n) < \omega'(m)$. Moreover,

\[ \delta(n, m) = \frac{1}{2} + \frac{2^{\omega'(m)} - 2^{\omega'(n)}}{\sqrt{2\pi V(n, m)}} + O\!\left( \frac{\big(2^{\omega'(m)} - 2^{\omega'(n)}\big)^3}{V(n, m)^{3/2}} \right) \tag{3.6} \]

and the asymptotic behaviour of the variance in the denominators is given by

\[ V(n, m) \sim 4h(-4n) \log(nm) \tag{3.7} \]

as $n, m \to \infty$. Thus, the “bias term” in (3.6) is $O_\varepsilon\big((nm)^{-1/8+\varepsilon}\big)$ and the error term is $O_\varepsilon\big((nm)^{-3/8+\varepsilon}\big)$ for all $\varepsilon > 0$.

These results correspond to Remark 1.3, the ensuing symmetry analysis and one part of Proposition 3.1 in [RS94] as well as Theorem 1.1 and Proposition 3.6 in [FM13]. They immediately imply:

Corollary 3.6. For all integers $n$ and $m$ as in Theorem 3.4, with $n$ and $m$ having different numbers of odd prime divisors, there exists, subject to the assumptions GRH$_{n,m}$ and LI$_{n,m}$, a “bias” in the “prime number race”

\[ \big\{ p \leq X \mid \exists x, y \in \mathbb{Z} : p = x^2 + ny^2 \big\} \quad \text{versus} \quad \big\{ p \leq X \mid \exists x, y \in \mathbb{Z} : p = x^2 + my^2 \big\} \]

towards the contestant corresponding to the integer $n$ or $m$ with the smaller number of odd prime divisors. The “bias” dissolves as $n$ and $m$ grow.

This result provides an explanation of the biases that show in Figures 3.1–3.3 as well as the lack of an apparent bias in Figure 3.4. However, a more explicit result would be necessary to quantify the biases by means of explicit values for the respective logarithmic densities.
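The qualitative content of Corollary 3.6 can be checked against the four plotted pairs with a few lines of Python. The sketch below only evaluates the direction of the conditional bias and the mean shift $2^{\omega'(m)} - 2^{\omega'(n)}$ of $Y(n, m)$; it does not compute any logarithmic density, and the names `odd_prime_divisor_count`, `mean_shift` and `predicted_leader` are hypothetical.

```python
def odd_prime_divisor_count(k):
    """Return omega'(k), the number of distinct odd prime divisors of k >= 1."""
    while k % 2 == 0:
        k //= 2
    count = 0
    p = 3
    while p * p <= k:
        if k % p == 0:
            count += 1
            while k % p == 0:
                k //= p
        p += 2
    if k > 1:  # a remaining factor larger than sqrt(k) is an odd prime
        count += 1
    return count


def mean_shift(n, m):
    """Return 2^omega'(m) - 2^omega'(n), the constant term of Y(n, m) in Theorem 3.4."""
    return 2 ** odd_prime_divisor_count(m) - 2 ** odd_prime_divisor_count(n)


def predicted_leader(n, m):
    """Return the integer favoured by Corollary 3.6, or None if no bias is predicted."""
    shift = mean_shift(n, m)
    if shift > 0:
        return n
    if shift < 0:
        return m
    return None


if __name__ == "__main__":
    for pair in [(17, 21), (33, 34), (1201, 1365), (14, 17)]:
        print(pair, mean_shift(*pair), predicted_leader(*pair))
```

The predicted leaders 17, 34 and 1201 and the absence of a prediction for the pair $(14, 17)$ agree with the observations recorded in the captions of Figures 3.1 to 3.4.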

3.3.2 Proofs We will closely follow the corresponding proofs in [RS94] and [FM13]. In fact, we will soon see that, after some initial preparation and adaptation, the proofs in these papers may be used almost verbatim. Thus, we will mostly only point out the differences that arise. Let X > 2 and fix two positive squarefree integers n and m with n, m 6≡ 3 (mod 4) and h(−4n) = h(−4m), i.e. such that −4n and −4m are two negative fundamental discriminants with the same class number. As in Chapter 2, we will again capitalize on the relationship between ideal classes and form classes. For this purpose we define X πe0(X; n) = 1, p∈P (−4n) N(p)6X X πe(X; n) = 1. p∈Z(−4n) N(p)6X 3.3 Prime number races for forms of the shape x2 + ny2 83

Note that we have

\[
\begin{aligned}
\tilde\pi_0(X; n) &= \breve\pi_0(X; n) + \frac12 \sum_{p \le X^{1/2}} \bigl(1 - \chi_{-4n}(p)\bigr) + O(1), \\
\tilde\pi(X; n) &= \breve\pi(X; n) + \frac12 \sum_{p \le X^{1/2}} \bigl(1 - \chi_{-4n}(p)\bigr) + O(1 + \log n).
\end{aligned}
\tag{3.8}
\]

Remark 3.7. We recall that, by Remark 1.9, the second terms on the right-hand sides of (3.8) account for the prime ideals which lie over rational primes that remain prime in O(−4n); their norm is the square of a prime. These ideals are the prime ideals which do not correspond to primes that can be represented by a binary quadratic form of discriminant −4n. Note that these prime ideals cannot lie in any ideal class other than the principal one. The remainder terms in (3.8) arise from prime ideals which lie over ramified primes.

The corresponding normalized bias functions are

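\[
\begin{aligned}
\tilde E(X; n) &= \frac{\log X}{\sqrt{X}} \bigl( h(-4n)\, \tilde\pi_0(X; n) - \tilde\pi(X; n) \bigr), \\
\tilde E(X; n, m) &= \frac{\log X}{\sqrt{X}} \cdot h(-4n) \cdot \bigl( \tilde\pi_0(X; n) - \tilde\pi_0(X; m) \bigr).
\end{aligned}
\]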
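The second terms on the right-hand sides of (3.8) are easy to evaluate for small parameters. The following Python sketch is an illustration only: the names `chi`, `primes_up_to` and `inert_correction` are hypothetical, and it assumes that $\chi_{-4n}$ denotes the Kronecker symbol $(-4n/\cdot)$, so that $\chi_{-4n}(p) = 0$ for $p = 2$ and for $p \mid n$, while Euler's criterion gives the value at all other primes.

```python
from math import isqrt


def chi(n, p):
    """Kronecker symbol (-4n / p) at a prime p (assumed meaning of chi_{-4n})."""
    if p == 2 or n % p == 0:
        return 0  # p divides the discriminant -4n
    # Euler's criterion for the Legendre symbol (-n / p); the factor 4 is a square
    r = pow(-n % p, (p - 1) // 2, p)
    return 1 if r == 1 else -1


def primes_up_to(limit):
    """List of all primes <= limit (sieve of Eratosthenes)."""
    if limit < 2:
        return []
    flags = bytearray([1]) * (limit + 1)
    flags[0] = flags[1] = 0
    for p in range(2, isqrt(limit) + 1):
        if flags[p]:
            for k in range(p * p, limit + 1, p):
                flags[k] = 0
    return [k for k in range(2, limit + 1) if flags[k]]


def inert_correction(n, X):
    """Evaluate (1/2) * sum over primes p <= X^(1/2) of (1 - chi_{-4n}(p))."""
    return sum(1 - chi(n, p) for p in primes_up_to(isqrt(X))) / 2
```

For n = 1 and X = 100 the primes up to 10 are 2, 3, 5, 7; the inert primes 3 and 7 each contribute 1 and the ramified prime 2 contributes 1/2, which is the kind of bounded contribution that the remainder terms in (3.8) absorb.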

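For all class group characters $\chi \in \widehat{H}(-4n)$, we also set

\[
\tilde\psi(X; \chi) = \sum_{\substack{\mathfrak{a} \in Z(-4n) \\ N(\mathfrak{a}) \le X}} \chi(\mathfrak{a})\, \tilde\Lambda(\mathfrak{a}), \qquad
\tilde\psi_0(X; n) = \frac{1}{h(-4n)} \sum_{\chi \in \widehat{H}(-4n)} \tilde\psi(X; \chi),
\]

where $\tilde\Lambda$ is the function that we have defined in (2.36). Thus, we have

\[
\begin{aligned}
\tilde\psi_0(X; n) &= \sum_{\ell \ge 1} \sum_{\substack{\mathfrak{p} \in Z(-4n) \\ \mathfrak{p}^\ell \in P(-4n),\ N(\mathfrak{p})^\ell \le X}} \log N(\mathfrak{p}) \\
&= \sum_{\substack{\mathfrak{p} \in P(-4n) \\ N(\mathfrak{p}) \le X}} \log N(\mathfrak{p})
 + \sum_{\substack{C \in H(-4n) \\ C^2 = P(-4n)}} \sum_{\substack{\mathfrak{p} \in C \\ N(\mathfrak{p}) \le X^{1/2}}} \log N(\mathfrak{p})
 + \sum_{\ell \ge 3} \sum_{\substack{\mathfrak{p} \in Z(-4n) \\ \mathfrak{p}^\ell \in P(-4n),\ N(\mathfrak{p})^\ell \le X}} \log N(\mathfrak{p}).
\end{aligned}
\]

It is the second term on the right-hand side that will account for the bias, similar to the cases in Sections 3.1 and 3.2. Recall that $\kappa(-4n)$ denotes the number of ambiguous classes in $H(-4n)$. Since $n \not\equiv 3 \pmod 4$ and n is squarefree, we see from (1.2) that $\kappa(-4n) = 2^{\omega_0(n)}$, where $\omega_0(n)$ is the number of odd prime divisors of n. Thus, for all $\varepsilon > 0$,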

\[
\tilde\psi_0(X; n) = \sum_{\substack{\mathfrak{p} \in P(-4n) \\ N(\mathfrak{p}) \le X}} \log N(\mathfrak{p}) + 2^{\omega_0(n)} \frac{X^{1/2}}{h(-4n)} + O_\varepsilon\bigl( X^{1/3} (\log X)\, n^{\varepsilon} \bigr) \tag{3.9}
\]

by the conditional prime ideal theorem (1.15) and the bound $2^{\omega_0(n)} \le \tau(n) \ll_\varepsilon n^{\varepsilon}$. Using the conditional explicit formula for $\tilde\psi(X; \chi)$, Theorem 1.18, instead of [RS94, (2.1)] and the general asymptotic formula for the number of zeros of L-functions, Theorem 5.8 in [IK04], instead of [RS94, (2.4)], we deduce (as in [RS94, Lemma 2.1])

\[
\tilde E(X; n) = -2^{\omega_0(n)} + 1 + \sum_{\substack{\chi \in \widehat{H}(-4n) \\ \chi \ne \chi_0^{(-4n)}}} \frac{\tilde\psi(X; \chi)}{\sqrt{X}} + O_\varepsilon\!\left( \frac{n^{\varepsilon}}{\log X} \right)
\]

by partial summation from (3.9). Switching back from prime ideals to prime numbers, this equation and the equations in (3.8) imply that

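\[
E(X; n) = -2^{\omega_0(n)} + 1 + \sum_{\substack{\chi \in \widehat{H}(-4n) \\ \chi \ne \chi_0^{(-4n)}}} \frac{\tilde\psi(X; \chi)}{\sqrt{X}}
 - \frac{(\log X)\bigl(h(-4n) - 1\bigr)}{2\sqrt{X}} \sum_{p \le X^{1/2}} \bigl(1 - \chi_{-4n}(p)\bigr) + O_\varepsilon\!\left( \frac{n^{\varepsilon}}{\log X} \right).
\]

Note that

\[
\frac{(\log X)\bigl(h(-4n) - 1\bigr)}{\sqrt{X}} \sum_{p \le X^{1/2}} \chi_{-4n}(p) \ll \frac{n^{1/2} (\log n)^2 (\log X)^2}{X^{1/4}}
\]

under the Generalized Riemann Hypothesis for Dirichlet characters (see [MV07, Theorem 13.7], for example). Moreover, we have

\[
\breve\pi(X; n) - \breve\pi(X; m) = \sum_{p \le X} \bigl( \chi_{-4n}(p) - \chi_{-4m}(p) \bigr) + O(\log nm).
\]

So

\[
\begin{aligned}
E(X; n, m) = {} & 2^{\omega_0(m)} - 2^{\omega_0(n)} + \sum_{\substack{\chi \in \widehat{H}(-4n) \\ \chi \ne \chi_0^{(-4n)}}} \frac{\tilde\psi(X; \chi)}{\sqrt{X}} - \sum_{\substack{\chi \in \widehat{H}(-4m) \\ \chi \ne \chi_0^{(-4m)}}} \frac{\tilde\psi(X; \chi)}{\sqrt{X}} \\
& + \frac{\log X}{\sqrt{X}} \sum_{p \le X} \bigl( \chi_{-4n}(p) - \chi_{-4m}(p) \bigr) + O_\varepsilon\!\left( \frac{(n + m)^{1/2+\varepsilon}}{\log X} \right).
\end{aligned}
\tag{3.10}
\]

Using once again the explicit formula for $\tilde\psi(X; \chi)$ for class group characters χ as well as the explicit formula for $\psi(X; \chi)$ for $\chi = \chi_{-4n}$ and $\chi = \chi_{-4m}$ (compare [RS94, (2.12)]), we arrive at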

\[
\begin{aligned}
E(X; n, m) = {} & 2^{\omega_0(m)} - 2^{\omega_0(n)} \\
& - \sum_{\chi \in \widehat{H}(-4n),\ [\ldots]} \ \sum_{|\gamma|\ [\ldots]} \frac{X^{i\gamma}}{\frac12 + i\gamma} + \sum_{[\ldots]} \ \sum_{[\ldots]} \frac{X^{i\gamma}}{\frac12 + i\gamma} \\
& [\ldots]
\end{aligned}
\tag{3.11}
\]

for all $T \ge 1$ and all $\varepsilon > 0$. But this is of the same form as “E(X; q, a)” in [RS94, (2.5), (2.6)]; to be precise, it is of the same form that the corresponding difference E(X; q, a) − E(X; q, b) of Rubinstein–Sarnak would take in [RS94, (2.5), (2.6)]. Note that the factor “χ(a)” which appears in their formula is hidden in the sums of the second line of the right-hand side of (3.11): the corresponding factors would be $\chi(P(-4n))$ and $\chi(P(-4m))$, respectively, but these are equal to 1 for all class group characters χ because $P(-4n)$ and $P(-4m)$ are the trivial elements of the respective ideal class groups.

This means that the remaining part of the proof of Theorem 1.1 in [RS94, §2.1] can be applied verbatim here. This finishes the proof of the first assertion of Theorem 3.4, i.e. the proof of the existence of a limiting distribution of E(X; n, m).

Next, we turn to the paper of Fiorilli and Martin [FM13]. The second assertion of our Theorem 3.4 corresponds to their Proposition 2.6. Their proof uses:

(i) Their Proposition 2.3, which establishes the relation between the sums over zeros and the random variables Yγ: The assumption that the random variables Zγ (and therefore also the random variables Yγ) are independent requires in [FM13] the assumption of the Grand Simplicity Hypothesis GSHq, which we have introduced in Section 3.1. Therefore we require the assumption of an appropriate analogue of this linear independence hypothesis here. In fact, our hypothesis LIn,m is an appropriate analogue: Set

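\[
\begin{aligned}
\widehat{H}'_5(-4n) &= \widehat{H}(-4n) \smallsetminus \widehat{H}'_4(n, m), \\
\widehat{H}'_6(-4n) &= \Bigl\{ \chi_{-4n/d} : d \in D(n, m) \text{ and } \tfrac{-4n}{d} \notin D(n, m) \Bigr\},
\end{aligned}
\]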

where $D(n, m)$ and $\widehat{H}'_4(n, m)$ were defined in Definition 3.2. The second and third lines of the right-hand side of (3.11) can then be rewritten as

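\[
- \sum_{[\ldots]} \ \sum_{|\gamma|\ [\ldots]} \frac{X^{i\gamma}}{\frac12 + i\gamma} + \sum_{[\ldots]} \ \sum_{|\gamma|\ [\ldots]} \frac{X^{i\gamma}}{\frac12 + i\gamma} \quad [\ldots]
\]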

because the zeros of the L-functions $L(s, \chi_d)$ for $d \in D(n, m) \smallsetminus \{1\}$ cancel out; see Remark 3.3. There we have also noted that $L(s, \lambda_\chi) = L(s, \lambda_{\overline{\chi}})$. Thus we may rewrite the second and third lines of the right-hand side of (3.11) as

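\[
- 2 \sum_{[\ldots]} \ \sum_{|\gamma|\ [\ldots]} \frac{X^{i\gamma}}{\frac12 + i\gamma} + 2 \sum_{[\ldots]} \ \sum_{|\gamma|\ [\ldots]} \frac{X^{i\gamma}}{\frac12 + i\gamma} \quad [\ldots]
\tag{3.12}
\]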

where

\[
\widehat{H}'_7(-4n) = \bigl( \widehat{H}'_2(-4n) \cup \widehat{H}'_6(-4n) \bigr) \smallsetminus \widehat{H}'_4(n, m).
\]

By our hypothesis $\mathrm{LI}_{n,m}$ in Definition 3.2, all remaining zeros are linearly independent over $\mathbb{Q}$; this assumption is therefore indeed the right analogue of the Grand Simplicity Hypothesis. Thus, the proof of an appropriate analogue of [FM13, Proposition 2.3] goes through verbatim.

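(ii) The analogue in [RS94] of our equation (3.11): Letting $T \to \infty$ in (3.11) with the second and third lines replaced by the rewritten form (3.12), we get

\[
E(X; n, m) = 2^{\omega_0(m)} - 2^{\omega_0(n)} + \sum_{\chi \in \widehat{H}(n, m)} (\delta_{\chi,m} - \delta_{\chi,n}) \sum_{\substack{\gamma \in \mathbb{R} \\ L(1/2 + i\gamma, \chi) = 0}} \frac{X^{i\gamma}}{\frac12 + i\gamma} + O_{n,m}\!\left( \frac{1}{\log X} \right), \tag{3.13}
\]

where

\[
\delta_{\chi,n} =
\begin{cases}
1 & \text{if } \chi \in \widehat{H}'_5(-4n) \cup \widehat{H}'_6(-4n) =: \widehat{H}'_8(-4n), \\
0 & \text{otherwise,}
\end{cases}
\]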

and $\widehat{H}(n, m) = \widehat{H}'_8(-4n) \cup \widehat{H}'_8(-4m)$. Thus, equation (3.13) now has the same form as the second displayed equation after [FM13, (2.3)] and we are therefore in the same situation to finish the proof of the second assertion of Theorem 3.4.

Finally, the proof of our assertion about the variance of E(X; n, m) is basically the same as the proof of [FM13, Proposition 2.7], except that we have to factor in the double appearance of the zeros of the L-functions that are associated to the complex class group characters: since the variance of $2Y_\gamma$ equals four times the variance of $Y_\gamma$, the extra factor m(χ) appears in (3.5). The proof of Theorem 3.4 may therefore be considered complete.

Now we come to the proof of Theorem 3.5. Its initial assertion, which explains the existence or non-existence of a bias in our prime number races, follows along the lines of §3.1 and §3.2 in [RS94]; it is basically a consequence of the existence of the limiting distribution that we have found in Theorem 3.4, the linear independence hypothesis and the symmetry of the Bessel function $J_0$. The double appearance of some zeros does not pose a problem since we know, or rather stipulated, which zeros appear twice, and thus we can adjust the proof in the same way that we have adjusted the linear independence hypothesis $\mathrm{LI}_{n,m}$ and the proof of Theorem 3.4.

In order to prove the remaining statements in Theorem 3.5, we may continue to follow [FM13, §2.2 – §3.2]. In fact, the proof simplifies in our situation: we do not have to deal with imprimitive characters, and whenever they have an expression involving “$|\chi(a) - \chi(b)|$” this is replaced by “$|\delta_{\chi,n} - \delta_{\chi,m}| = 1$” here.
Basically, it usually suffices to replace “$c(q, b) - c(q, a)$” by “$2^{\omega_0(m)} - 2^{\omega_0(n)}$”, “$\chi(a) - \chi(b)$” by “$\delta_{\chi,n} - \delta_{\chi,m}$” and “$\varphi(q)$” by “$2h(-4n) + 2h(-4m) = 4h(-4n)$” (or, to be precise, by “$4|\widehat{H}'_1(-4n)| + 4|\widehat{H}'_1(-4n)| + |\widehat{H}'_7(-4m)| + |\widehat{H}'_7(-4m)|$”, but this is not significant for the asymptotic considerations) in their proofs; note that the factor 2 stems from the weight m(χ) for complex class group characters in (3.5). Along the way we encounter only a few points which demand slight intervention:

(i) In their Proposition 2.15, Fiorilli and Martin use a completely explicit bound for N(T, χ), the number of zeros of $L(\tfrac12 + it, \chi)$ with $|t| \le T$. We may replace this bound with a non-explicit estimate like the one in [IK04, Theorem 5.8] since we do not aim to find an explicit error term in Theorem 3.5.

(ii) A key ingredient of their proof is an integral formula for the logarithmic density, which they quote from a paper of Feuerverger and Martin (see [FM13, Proposition 2.18]). Feuerverger and Martin derive this formula from a much more general result of theirs, but the proof of this special case is also sketched in [RS94] (see equations (4.1) and (4.2) there) and does not need any essential adaptations to satisfy our needs; similarly to the first assertion of Theorem 3.5, the formula follows from the existence of the limiting distribution in Theorem 3.4 and the linear independence hypothesis. A slightly more detailed proof of the integral formula can be found in [Ng00, §5.2 + §5.3.1].

(iii) In order to compute the asymptotic behaviour of the variance, two ingredients are important. First, a formula which relates the sums

\[
\sum_{\substack{\gamma \in \mathbb{R} \\ L(1/2 + i\gamma, \chi) = 0}} \frac{1}{\frac14 + \gamma^2}
\]

to the real parts of the quotients $\frac{L'(1, \chi)}{L(1, \chi)}$ is required. A general formula which also covers class group characters can be found by combining [IK04, Theorem 5.6] and [IK04, Proposition 5.7]. Second, a bound for these quotients is needed and we find such a bound in [IK04, Theorem 5.17]. Thus, we split the variance into multiple parts: the contributions from the sums over characters in $\widehat{H}'_5(-4n)$, in $\widehat{H}'_5(-4m)$, in $\widehat{H}'_6(-4n)$ and in $\widehat{H}'_6(-4m)$. Then we apply these formulas to all parts separately, we note that

\[
\sum_{\chi \in \widehat{H}(n, m)} m(\chi) \sim 4h(-4n)
\]

as $n, m \to \infty$, and thus we get (3.7) as in [FM13, §3.1 + §3.2]. On a side note, we remark that it is not possible to fix either n or m and let the other grow because the condition $h(-4n) = h(-4m)$ and the class number bounds (2.11) and (2.32) prevent this.

Note that the proof is in fact capable of giving a direct analogue of the more general formula [FM13, (1.1)]. Finally, the concluding assertions of Theorem 3.5 are a consequence of (3.7) and the class number bound (2.32). This finishes the proof of Theorem 3.5, and Corollary 3.6 follows at once.
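The quantities that enter (3.6) and (3.7) can be computed directly for small n and m. The following Python sketch is a rough illustration under stated assumptions, not a method taken from the thesis: class numbers are obtained by enumerating reduced primitive forms, ambiguous classes are counted as the reduced forms (a, b, c) with b = 0, a = b or a = c, and `leading_order_density` simply substitutes the asymptotic value (3.7) for V(n, m) and drops the error term of (3.6). All function names are hypothetical, and since (3.7) only holds as n, m → ∞, the returned number is indicative at best for small parameters.

```python
from math import gcd, log, pi, sqrt


def reduced_forms(D):
    """Reduced primitive forms (a, b, c) with b^2 - 4ac = D, for D < 0 and D = 0 or 1 mod 4."""
    forms = []
    b = D % 2
    while 3 * b * b <= -D:
        q = (b * b - D) // 4  # q = a * c
        a = max(b, 1)
        while a * a <= q:
            if q % a == 0:
                c = q // a
                if gcd(gcd(a, b), c) == 1:
                    forms.append((a, b, c))
                    # (a, -b, c) is a second reduced form unless b = 0, a = b or a = c
                    if b != 0 and a != b and a != c:
                        forms.append((a, -b, c))
            a += 1
        b += 2
    return forms


def class_number(D):
    """Class number h(D), counted as the number of reduced primitive forms."""
    return len(reduced_forms(D))


def ambiguous_classes(D):
    """Number of classes of order at most 2 (reduced forms with b = 0, a = b or a = c)."""
    return sum(1 for a, b, c in reduced_forms(D) if b == 0 or a == b or a == c)


def omega_odd(n):
    """Number of distinct odd prime divisors of n."""
    while n % 2 == 0:
        n //= 2
    count, p = 0, 3
    while p * p <= n:
        if n % p == 0:
            count += 1
            while n % p == 0:
                n //= p
        p += 2
    return count + (1 if n > 1 else 0)


def leading_order_density(n, m):
    """1/2 plus the bias term of (3.6), with V(n, m) replaced by 4 h(-4n) log(nm) from (3.7)."""
    h = class_number(-4 * n)
    if h != class_number(-4 * m):
        raise ValueError("the class numbers h(-4n) and h(-4m) must agree")
    variance = 4 * h * log(n * m)  # assumption: asymptotic stand-in for V(n, m)
    bias = 2 ** omega_odd(m) - 2 ** omega_odd(n)
    return 0.5 + bias / sqrt(2 * pi * variance)
```

For the pair n = 14, m = 21 both class numbers equal 4, and the counts of ambiguous classes returned by `ambiguous_classes` agree with $2^{\omega_0(14)} = 2$ and $2^{\omega_0(21)} = 4$, in line with the formula $\kappa(-4n) = 2^{\omega_0(n)}$ used above.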

3.3.3 Possible extensions

The large body of recent research in comparative prime number theory for primes in arithmetic progressions suggests several ways to extend the results of this section:

(1) Prime number races with more than two competing discriminants could be investigated, i.e. one could determine the existence (and size) of any deviation of the logarithmic densities of the sets $\{ X \ge 2 \mid \pi_0(X; n_1) > \cdots > \pi_0(X; n_r) \}$ from the symmetric value $\frac{1}{r!}$ for all r-tuples $(n_1, \ldots, n_r)$ of distinct positive (squarefree) integers $n_i \not\equiv 3 \pmod 4$ with $h(-4n_1) = \cdots = h(-4n_r)$.

(2) A more detailed analysis should make it possible to compute explicitly the logarithmic densities for certain races or to give explicit general bounds for all such races. This would require either the numerical computation of many zeros of class group L-functions (as in [RS94] and [Ng00]) or the calculation of completely explicit estimates for the corresponding numbers of zeros on the critical line up to any given height (as in [FM13]). Such estimates do not seem to exist in the literature, but they could probably be extracted from [LO77] (compare Remark 1.19).

(3) The low-lying zeros of (real) Dirichlet L-functions have a major effect on the bias in prime number races between quadratic residues and non-residues; see [FM13, §3.6], for example. Thus, it would be interesting to see whether the results of Fouvry and Iwaniec that we sketched in Remark 2.24 can give additional information on the prime number races between forms of the shape $x^2 + ny^2$.

(4) Finally, it would also be worthwhile to investigate whether the unproved assumptions are needed. In particular, one could try to find out whether the existence of certain hypothetical zeros off the critical line would distort the densities of the sets P(n, m) in an unusual way.

This last point also raises the question whether the two antithetic types of results in this thesis, those on “uniformity on average” and those on discrepancies in the distribution of prime numbers represented by binary quadratic forms, could be connected with each other. Since the mean-value results of Bombieri–Vinogradov type and Barban–Davenport–Halberstam type have often turned out to be apt substitutes for the Generalized Riemann Hypothesis, one would hope to find a way to introduce them in questions on prime number races. Of course, one cannot reasonably expect them to be of any use when one considers biases for a fixed discriminant or for a fixed residue class as in the first two sections of this chapter, nor for the fixed pairs of discriminants that we have considered in this section. Also, the ranges in the results of the second chapter would probably first have to be improved to reach the status of a “Generalized Riemann Hypothesis on average” to be potentially useful for prime number races of binary quadratic forms. However, it is conceivable that interesting unconditional results could then be attained for pairs of growing discriminants, for example. But this is another story and should be investigated at another time.

Bibliography

[Bar66] Mark B. Barban, The “large sieve” method and its application to number theory, Russian Mathematical Surveys 21 (1966), no. 1, 49–103. Translated by H. J. Godwin (Russian original appeared in Uspehi Mat. Nauk 21 (1), 1966).
[Ber12] Paul Bernays, Über die Darstellung von positiven, ganzen Zahlen durch die primitiven, binären quadratischen Formen einer nicht-quadratischen Diskriminante, Dissertation (Universität Göttingen), 1912.
[BG06] Valentin Blomer and Andrew Granville, Estimates for representation numbers of quadratic forms, Duke Math. J. 135 (2006), no. 2, 261–302.
[BGHZ08] Jan Hendrik Bruinier, Gerard van der Geer, Günter Harder, and Don Zagier, The 1-2-3 of modular forms, Universitext, Springer-Verlag, Berlin, 2008.
[BHM07] Valentin Blomer, Gergely Harcos, and Philippe Michel, Bounds for modular L-functions in the level aspect, Ann. Sci. École Norm. Sup. (4) 40 (2007), no. 5, 697–740.
[Blo04a] Valentin Blomer, Binary quadratic forms with large discriminants and sums of two squareful numbers, J. Reine Angew. Math. 569 (2004), 213–234.
[Blo04b] Valentin Blomer, Non-vanishing of class group L-functions at the central point, Ann. Inst. Fourier (Grenoble) 54 (2004), no. 4, 831–847.
[Bom65] Enrico Bombieri, On the large sieve, Mathematika 12 (1965), 201–225.
[Bom87] Enrico Bombieri, Le grand crible dans la théorie analytique des nombres, Astérisque 18 (1987), 103 pp.
[Br95] Jörg Brüdern, Einführung in die analytische Zahlentheorie, Springer-Verlag, Berlin, 1995.
[BST13] Manjul Bhargava, Arul Shankar, and Jacob Tsimerman, On the Davenport–Heilbronn theorems and second order terms, Invent. Math. 193 (2013), no. 2, 439–499.
[Bue89] Duncan A. Buell, Binary quadratic forms, Springer-Verlag, New York, 1989.
[BV07] Johannes Buchmann and Ulrich Vollmer, Binary quadratic forms, Algorithms and Computation in Mathematics, vol. 20, Springer-Verlag, Berlin, 2007.
[BZ07] Stephan Baier and Liangyi Zhao, Primes in quadratic progressions on average, Math. Ann. 338 (2007), no. 4, 963–982.
[Coh93] Henri Cohen, A course in computational algebraic number theory, Graduate Texts in Mathematics, vol. 138, Springer-Verlag, Berlin, 1993.
[Cox97] David A. Cox, Primes of the form x^2 + ny^2: Fermat, class field theory and complex multiplication, Paperback ed., A Wiley-Interscience Publication, John Wiley & Sons Inc., New York, 1997.
[Dav00] Harold Davenport, Multiplicative number theory, Third ed., Graduate Texts in Mathematics, vol. 74, Springer-Verlag, New York, 2000.
[DFI02] William Duke, John Friedlander, and Henryk Iwaniec, The subconvexity problem for Artin L-functions, Invent. Math. 149 (2002), no. 3, 489–577.
[Dit13] Jakob Ditchen, On the average distribution of primes represented by binary quadratic forms, ArXiv e-print (December 2013), available at http://arxiv.org/abs/1312.1502.
[DK00] William Duke and Emmanuel Kowalski, A problem of Linnik for elliptic curves and mean-value estimates for automorphic representations, Invent. Math. 139 (2000), no. 1, 1–39.
[dlVP96] Charles-Jean de la Vallée Poussin, Recherches analytiques sur la théorie des nombres premiers, Ann. Soc. Sci. Bruxelles 20 (1896), 183–256, 281–362, 363–397.
[dlVP97] Charles-Jean de la Vallée Poussin, Recherches analytiques sur la théorie des nombres premiers, Ann. Soc. Sci. Bruxelles 21 (1897), 251–342, 343–368.

[EH71] Peter D. T. A. Elliott and Heini Halberstam, The least prime in an arithmetic progression, in: Studies in Pure Mathematics (Presented to Richard Rado), Academic Press, 1971, pp. 59–61.
[FG92] John Friedlander and Andrew Granville, Limitations to the equi-distribution of primes. III, Compositio Math. 81 (1992), no. 1, 19–32.
[FI03a] Étienne Fouvry and Henryk Iwaniec, Low-lying zeros of dihedral L-functions, Duke Math. J. 116 (2003), no. 2, 189–217.
[FI03b] John Friedlander and Henryk Iwaniec, Exceptional characters and prime numbers in arithmetic progressions, Int. Math. Res. Not. 37 (2003), 2033–2050.
[FI10] John Friedlander and Henryk Iwaniec, Opera de cribro, American Mathematical Society Colloquium Publications, vol. 57, American Mathematical Society, Providence, RI, 2010.
[FI98] John Friedlander and Henryk Iwaniec, The polynomial X^2 + Y^4 captures its primes, Ann. of Math. (2) 148 (1998), no. 3, 945–1040.
[FK02] Kevin Ford and Sergei Konyagin, Chebyshev’s conjecture and the prime number race, in: IV International Conference “Modern Problems of Number Theory and its Applications” (Tula, 2001), Mosk. Gos. Univ. im. Lomonosova, Mekh.-Mat. Fak., Moscow, 2002, pp. 67–91.
[FLK13] Kevin Ford, Youness Lamzouri, and Sergei Konyagin, The prime number race and zeros of Dirichlet L-functions off the critical line: Part III, Q. J. Math. 64 (2013), no. 4, 1091–1098.
[FM13] Daniel Fiorilli and Greg Martin, Inequities in the Shanks-Rényi Prime Number Race: An asymptotic formula for the densities, J. Reine Angew. Math. 676 (2013), 121–212.
[Fog61] Ernests Fogels, On the distribution of prime ideals, Acta Arith. 7 (1961/1962), 255–269.
[Fog65] Ernests Fogels, On the zeros of L-functions, Acta Arith. 11 (1965), 67–96.
[Fog67] Ernests Fogels, Corrigendum: “On the zeros of L-functions”, Acta Arith. 14 (1967/1968), 435.
[Gal68] P. X. Gallagher, Bombieri’s mean value theorem, Mathematika 15 (1968), 1–6.
[Gel75] Stephen S. Gelbart, Automorphic forms on adèle groups, Annals of Mathematics Studies, No. 83, Princeton University Press, Princeton, N.J., 1975.
[GM06] Andrew Granville and Greg Martin, Prime number races, Amer. Math. Monthly 113 (2006), no. 1, 1–33.
[Gol06] Dorian Goldfeld, Automorphic forms and L-functions for the group GL(n, R), Cambridge Studies in Advanced Mathematics, vol. 99, Cambridge University Press, Cambridge, 2006.
[Gol70] Larry Joel Goldstein, A generalization of the Siegel-Walfisz theorem, Trans. Amer. Math. Soc. 149 (1970), 417–429.
[GPY09] Daniel A. Goldston, János Pintz, and Cem Y. Yıldırım, Primes in tuples. I, Ann. of Math. (2) 170 (2009), no. 2, 819–862.
[GSS07] Catherine Goldstein, Norbert Schappacher, and Joachim Schwermer (eds.), The shaping of arithmetic after C. F. Gauss’s disquisitiones arithmeticae, Springer-Verlag, Berlin, 2007.
[HB09] D. R. Heath-Brown, Convexity bounds for L-functions, Acta Arith. 136 (2009), no. 4, 391–395.
[HB95] D. R. Heath-Brown, A mean value estimate for real character sums, Acta Arith. 72 (1995), no. 3, 235–275.
[HBM02] D. R. Heath-Brown and B. Z. Moroz, Primes represented by binary cubic forms, Proc. London Math. Soc. (3) 84 (2002), no. 2, 257–288.
[Hec17] Erich Hecke, Über die L-Funktionen und den Dirichletschen Primzahlsatz für einen beliebigen Zahlkörper, Nachrichten der K. Gesellschaft der Wissenschaften zu Göttingen, Mathematisch-physikalische Klasse (1917), 299–318.
[Hin81] Jürgen G. Hinz, On the theorem of Barban and Davenport-Halberstam in algebraic number fields, J. Number Theory 13 (1981), no. 4, 463–484.
[HM06] Gergely Harcos and Philippe Michel, The subconvexity problem for Rankin-Selberg L-functions and equidistribution of Heegner points. II, Invent. Math. 163 (2006), no. 3, 581–655.
[HM13] Roman Holowinsky and Ritabrata Munshi, Level aspect subconvexity for Rankin-Selberg L-functions, in: Automorphic Representations and L-functions, Tata Inst. Fund. Res., 2013, pp. 311–334.
[Hoo75] Christopher Hooley, On the Barban-Davenport-Halberstam theorem. I, J. Reine Angew. Math. 274/275 (1975), 206–223.

[HR11] Heini Halberstam and Hans-Egon Richert, Sieve methods, Dover Publications, 2011. Republication of the work originally published in 1974 by Academic Press.
[IK04] Henryk Iwaniec and Emmanuel Kowalski, Analytic number theory, American Mathematical Society Colloquium Publications, vol. 53, American Mathematical Society, Providence, RI, 2004.
[Iwa97] Henryk Iwaniec, Topics in classical automorphic forms, Graduate Studies in Mathematics, vol. 17, American Mathematical Society, Providence, RI, 1997.
[KM02] Emmanuel Kowalski and Philippe Michel, Zeros of families of automorphic L-functions close to 1, Pacific J. Math. 207 (2002), no. 2, 411–431.
[KM97] Emmanuel Kowalski and Philippe Michel, Sur les zéros des fonctions L automorphes de grand niveau, ArXiv e-print (1997), available at http://arxiv.org/abs/math/9707238v1.
[Kow04] Emmanuel Kowalski, Un cours de théorie analytique des nombres, Cours Spécialisés, vol. 13, Société Mathématique de France, Paris, 2004.
[Kow08] Emmanuel Kowalski, The large sieve and its applications, Cambridge Tracts in Mathematics, vol. 175, Cambridge University Press, Cambridge, 2008.
[KT64] S. Knapowski and P. Turán, Further developments in the comparative prime-number theory. II. A modification of Chebyshev’s assertion, Acta Arith. 10 (1964), 293–313.
[Kud03] Stephen S. Kudla, From modular forms to automorphic representations, in: An introduction to the Langlands program (Jerusalem, 2001), Birkhäuser Boston, 2003, pp. 133–151.
[Lan06] Edmund Landau, Über einen Satz von Tschebyschef, Math. Ann. 61 (1906), no. 4, 527–550.
[Lan14] Edmund Landau, Über die Primzahlen in definiten quadratischen Formen und die Zetafunktion reiner kubischer Körper, in: Schwarz-Festschrift, Berlin, 1914, pp. 244–273.
[Lan18] Edmund Landau, Über einige ältere Vermutungen und Behauptungen in der Primzahltheorie, Math. Z. 1 (1918), 1–24.
[Li79] Wen-Ch’ing Winnie Li, L-series of Rankin type and their functional equations, Math. Ann. 244 (1979), no. 2, 135–166.
[Lin44] U. V. Linnik, On the least prime in an arithmetic progression, Rec. Math. [Mat. Sbornik] N.S. 15(57) (1944), 139–178 and 347–368.
[LO77] J. C. Lagarias and A. M. Odlyzko, Effective versions of the Chebotarev density theorem, in: Algebraic number fields: L-functions and Galois properties (Proc. Sympos., Univ. Durham, Durham, 1975), Academic Press, 1977, pp. 409–464.
[LP92] H. W. Lenstra Jr. and Carl Pomerance, A rigorous time bound for factoring integers, J. Amer. Math. Soc. 5 (1992), no. 3, 483–516.
[Mic07] Philippe Michel, Analytic number theory and families of automorphic L-functions, in: Automorphic forms and applications, IAS/Park City Math. Ser., vol. 12, Amer. Math. Soc., 2007, pp. 181–295.
[MM87] M. Ram Murty and V. Kumar Murty, A variant of the Bombieri-Vinogradov theorem, in: Number theory (Montreal, Que., 1985), CMS Conf. Proc., vol. 7, Amer. Math. Soc., 1987, pp. 243–272.
[MP13] M. Ram Murty and Kathleen L. Petersen, A Bombieri-Vinogradov theorem for all number fields, Trans. Amer. Math. Soc. 365 (2013), no. 9, 4987–5032.
[MS99] Andrey Markov and Nikolay Sonin (eds.), Œuvres de P.L. Tchebychef, Commissionaires de l’Académie impériale des sciences, Saint Petersburg, 1899.
[MtR04] Pieter Moree and Herman J. J. te Riele, The hexagonal versus the square lattice, Math. Comp. 73 (2004), no. 245, 451–473.
[MV07] Hugh L. Montgomery and Robert C. Vaughan, Multiplicative number theory. I. Classical theory, Cambridge Studies in Advanced Mathematics, vol. 97, Cambridge University Press, Cambridge, 2007.

[MV10] Philippe Michel and Akshay Venkatesh, The subconvexity problem for GL_2, Publ. Math. Inst. Hautes Études Sci. 111 (2010), 171–271.
[Nar00] Władysław Narkiewicz, The development of prime number theory, Springer Monographs in Mathematics, Springer-Verlag, Berlin, 2000.
[Nar04] Władysław Narkiewicz, Elementary and analytic theory of algebraic numbers, Third ed., Springer Monographs in Mathematics, Springer-Verlag, Berlin, 2004.
[Neu92] Jürgen Neukirch, Algebraische Zahlentheorie, Springer-Verlag, Berlin, 1992.

[Ng00] Nathan Ng, Limiting distributions and zeros of Artin L-functions, Dissertation (University of British Columbia), 2000.
[RS94] Michael Rubinstein and Peter Sarnak, Chebyshev’s bias, Experiment. Math. 3 (1994), no. 3, 173–197.
[Rud87] Walter Rudin, Real and complex analysis, Third ed., McGraw-Hill Book Co., New York, 1987.
[Sch86] P. D. Schumer, On the large sieve inequality in an algebraic number field, Mathematika 33 (1986), no. 1, 31–54.
[Ser81] Jean-Pierre Serre, Quelques applications du théorème de densité de Chebotarev, Inst. Hautes Études Sci. Publ. Math. 54 (1981), 323–401.
[Sha59] Daniel Shanks, Quadratic residues and the distribution of primes, Math. Tables Aids Comput. 13 (1959), 272–284.
[Vin65] Askold I. Vinogradov, On the density hypothesis for Dirichlet L-series (Russian), Izv. Akad. Nauk SSSR Ser. Mat. 29 (1965), 903–934.
[Vin66] Askold I. Vinogradov, Correction to the paper of A. I. Vinogradov “On the density hypothesis for Dirichlet L-series” (Russian), Izv. Akad. Nauk SSSR Ser. Mat. 30 (1966), 719–720.
[Web82] Heinrich Weber, Beweis des Satzes, dass jede eigentlich primitive quadratische Form unendlich viele Primzahlen darzustellen fähig ist, Math. Ann. 20 (1882), no. 3, 301–329.
[Wei83] Alfred Weiss, The least prime ideal, J. Reine Angew. Math. 338 (1983), 56–94.
[Xyl11] Triantafyllos Xylouris, On the least prime in an arithmetic progression and estimates for the zeros of Dirichlet L-functions, Acta Arith. 150 (2011), no. 1, 65–91.
[Zag81] Don Zagier, Zetafunktionen und quadratische Körper, Springer-Verlag, Berlin, 1981.
[Zha13] Yitang Zhang, Bounded gaps between primes, Preprint (2013). To appear in Annals of Math.

Jakob J. Ditchen Zürich, Autumn 2013