
NUMBER THEORY VIA ALGEBRA AND GEOMETRY

DANIEL LARSSON

CONTENTS

1. Introduction
2. Rings
2.1. Definition and examples
3. Basics of rings
3.1. Ideals and subrings
4. Integral domains
4.1. Homomorphisms, quotient rings and the first isomorphism theorem
4.2. UFD's, PID's and Euclidean domains
4.3. The Gaussian integers
4.4. Polynomials
5. Fields
5.1. Definition and examples
6. Basics of fields
6.1. Fields of fractions
7. Field extensions
7.1. Field extensions
7.2. Algebraic extensions, transcendental extensions
7.3. Simple and finitely generated extensions
7.4.
8. Finite fields
8.1. The main theorem
8.2. The Frobenius morphism
9. Number fields
9.1. Algebraic numbers
9.2. Norms, traces and conjugates
9.3. Algebraic integers and rings of integers
9.4. Integral bases
9.5. Computing rings of integers
9.6. Examples
10. Quadratic number fields
10.1. Ring of integers of quadratic number fields
10.2. The Ramanujan–Nagell theorem
11. Dirichlet's unit theorem
11.1. Roots of unity
11.2. Units in number fields
12. Dedekind domains
12.1. A few important remarks
12.2. The main theorem on Dedekind domains
13. Extensions, decomposition and ramification
13.1. Ramification and decomposition
13.2. Consequences for quadratic number fields
13.3. Consequences for some non-quadratic number fields
14. Cyclotomic number fields
14.1. Cyclotomic fields
14.2. Galois theory of number fields
14.3. Gauss sums and Quadratic Reciprocity
14.4. Cubic reciprocity
15. Arithmetic and Geometry
15.1. Affine n-space
15.2. Projective n-space
15.3. Algebraic curves
15.4. Cubic and elliptic curves
15.5. The structure of an elliptic curve
15.6. The group law
15.7. Points of finite order
15.8. The Nagell–Lutz theorem
15.9. Mordell's Theorem and Conjecture
16. Gauss' Class Number Problem and the Riemann hypotheses: A Historical Survey
16.1. Gauss' class number problem
16.2. Quadratic fields and forms
16.3. What does this have to do with the class number problem?
16.4. Zeta functions and the Riemann hypothesis
16.5. Back to quadratic fields
17. Appendix A: Linear algebra
17.1. Vector spaces and bases
17.2. Maps
17.3. Dual spaces
17.4. Operations on maps
17.5. Linear equations and inverses
17.6. Modules
17.7. Vandermonde determinants
18. Appendix B: Chinese remainder theorem

1. INTRODUCTION

These are the lecture notes accompanying my class in Number Theory at Uppsala University, Spring terms 2008 and 2009. The notes begin with some very basic ring and field theory in order to set the stage, and then continue successively to more advanced topics, probably in a rather steep upward slant from an extremely soft and cosy start. I allow myself a few digressions in the text that are not part of the syllabus but that I feel are in a sense part of the required "know-of" for aspiring mathematicians. In this case I am mainly referring to the sections concerning Gauss' class number problem and the Ramanujan–Nagell theorem. These are not formally part of the course and will not be discussed in an exam. But I strongly encourage readers to at least read through these parts to get an idea of the beauty that lies within.
Also, some parts are more abstract and technical, mainly the section on rings of integers. The proofs here are rather difficult, but I felt that if I didn't include them in the course I would be cheating (which I don't like). Therefore, I don't require the students to learn this material, but rather to have it as a fall-back solution if the later results feel a bit hollow and improperly motivated. In the same spirit, I include Appendices with notions from "linear algebra over rings" for easy reference.
As a twist to this course I added a section on elliptic curves, a topic that, without a doubt, will be part of every course on number theory that ever will be given anywhere on the planet, or elsewhere (this is a foretelling on my part). A home assignment will be given where the students are to learn the basics of elliptic curves over finite fields, so that they can immediately understand the basic ideas and quickly learn the techniques of elliptic curve cryptography and (large) integer factorization using elliptic curves. This will surely be a worthwhile effort for every mathematically inclined student.

2. RINGS

I will assume that everyone knows what an abelian group is.

2.1. Definition and examples. We begin with the following definition.

Definition 2.1. Let R be a set with two closed binary operations, '+' (addition) and '∗' (multiplication). Then R = (R, +, ∗) is a ring if addition and multiplication are compatible according to the following axioms:
Rng1: The operation + makes R into an abelian group; in particular, a + b = b + a for all a, b ∈ R.
Rng2: There is an element 1 such that 1 ∗ a = a ∗ 1 = a for all a ∈ R, or in other words, a multiplicative unit, often simply called a one.
Rng3: The multiplication is associative: a ∗ (b ∗ c) = (a ∗ b) ∗ c, ∀ a, b, c ∈ R.
Rng4: The multiplication is distributive: a ∗ (b + c) = a ∗ b + a ∗ c and (b + c) ∗ a = b ∗ a + c ∗ a, ∀ a, b, c ∈ R.

Remark 2.1. Strictly speaking, what we have defined here should be called an associative ring with unity (or associative, unital ring). The most general definition includes only Rng1 and Rng4. There are many examples of structures satisfying only these two axioms. However, for us it is enough to state the definition in the above, more restrictive, way.

Note! From now on we write multiplication in the usual fashion 'a · b' or simply 'ab'.

2.1.1. Examples.

Example 2.1. The easiest and most obvious (and arguably the most important) examples are of course the following:
- Z, the ring of integers;
- Q, the ring of rational numbers;
- R, the ring of real numbers;
- C, the ring of complex numbers.

Convince yourselves that these are indeed rings! In fact, they are even commutative.

Example 2.2. Another, extremely important example of a ring is Z/⟨n⟩, the ring of integers modulo n. Recall that this is the set of congruence classes of integers modulo n, which can be represented by {0, 1, 2, ..., n − 1}, the remainders after division by n. Informally, one writes Z/⟨n⟩ := {0, 1, 2, ..., n − 1}, it being implicit that addition and multiplication are computed modulo n.

Example 2.3. Another example is
Z[√n] := {a + b√n | a, b ∈ Z}
for n a square-free integer. We will see a lot of examples like this in the future, since number-theoretic applications often involve rings of exactly this kind.

So far, every example has been commutative. Let's round off this example-listing with a non-commutative number system.

Example 2.4 (Quaternions). Recall that the complex numbers C were formed by adjoining to R an imaginary number i := √−1. As you no doubt remember,
C := {a + bi | a, b ∈ R}.
The ring C has one further property that we haven't discussed yet (but will, in the next section): every non-zero element has an inverse¹. A question that was asked was if one could add yet another "really" imaginary element (not equal to √−1), j, so that one gets a new ring with the property that every non-zero element has an inverse. This can be shown to be impossible! However, in 1843, W. R. Hamilton realized that if one adds two more elements to C, the element j and another k, then every non-zero element has an inverse! But, alas, one loses one important thing: commutativity. The definition is as follows.

Definition 2.2. The set of all "numbers"
H := {z = a + bi + cj + dk | a, b, c, d ∈ R}, with i² = j² = −1 and ij = −ji = k,
is a non-commutative ring, where every element ≠ 0 has an inverse. An element of H is called a quaternion and H is called the ring of quaternions.

I strongly suggest that you do the following exercise.

Exercise 2.1. Try out a few computations and check that H is indeed a non-commutative ring. (You don’t have to check that it is a ring. This is obvious! Why?) To define an inverse recall how inverses are computed in C and emulate that. In order to do this you will have to define a suitable notion of conjugate quaternion.
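To accompany Exercise 2.1, here is a minimal computational sketch in Python (nothing in it comes from the notes; the class and method names are my own choices). It multiplies quaternions using i² = j² = −1, ij = −ji = k, and inverts a non-zero quaternion via its conjugate, which is exactly the strategy the exercise hints at.

```python
# A sketch of quaternion arithmetic, following Definition 2.2.

class Quaternion:
    def __init__(self, a, b, c, d):
        # z = a + b*i + c*j + d*k with real coefficients
        self.a, self.b, self.c, self.d = a, b, c, d

    def __mul__(self, other):
        # expand (a+bi+cj+dk)(e+fi+gj+hk) using i^2 = j^2 = -1, ij = -ji = k
        a, b, c, d = self.a, self.b, self.c, self.d
        e, f, g, h = other.a, other.b, other.c, other.d
        return Quaternion(a*e - b*f - c*g - d*h,
                          a*f + b*e + c*h - d*g,
                          a*g - b*h + c*e + d*f,
                          a*h + b*g - c*f + d*e)

    def conjugate(self):
        return Quaternion(self.a, -self.b, -self.c, -self.d)

    def norm2(self):
        # z * conj(z) is the real number a^2 + b^2 + c^2 + d^2
        return self.a**2 + self.b**2 + self.c**2 + self.d**2

    def inverse(self):
        n = self.norm2()
        zbar = self.conjugate()
        return Quaternion(zbar.a / n, zbar.b / n, zbar.c / n, zbar.d / n)

    def __repr__(self):
        return f"{self.a} + {self.b}i + {self.c}j + {self.d}k"

i = Quaternion(0, 1, 0, 0)
j = Quaternion(0, 0, 1, 0)
print(i * j)            # 0 + 0i + 0j + 1k   (= k)
print(j * i)            # 0 + 0i + 0j + -1k  (= -k), so H is non-commutative
z = Quaternion(1, 2, 3, 4)
print(z * z.inverse())  # 1 + 0i + 0j + 0k, up to rounding
```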

Notice that R ⊂ C ⊂ H and that we double the dimension in each step: C is two-dimensional as a vector space over R, and H is four-dimensional. The natural follow-up question is: is it possible to add more imaginary elements and get larger number systems? The answer is yes, and this was given only a few months after Hamilton's discovery by J. T. Graves and later by A. Cayley. But adding just one element is not sufficient (just as it was not sufficient to add just one element to C to get H): one has to add four new ones! The result is the eight-dimensional octonions, denoted by O. The problem is now that another property is lost: namely, associativity! (Recall the remark after the definition of a ring.) After the octonions come the sixteen-dimensional sedenions, S. This time there are non-zero elements a and b such that ab = 0. Such elements are called zero-divisors (we will see more on this soon). One can continue this indefinitely, doubling the dimension in each step. The objects in this nested sequence of rings

R ⊂ C ⊂ H ⊂ O ⊂ S ⊂ ···

are called Cayley–Dickson algebras (CDA's). For those finding this fascinating I recommend the Wikipedia article on the Cayley–Dickson construction.

¹One thing, however, that is lost when passing from R to C (that is, by adding i) is order: it is meaningless to ask which one of two complex numbers is the biggest. The only thing one can meaningfully say is which of them has the greatest distance from the origin.

3. BASICS OF RINGS

We begin with the following rather lengthy definition.

Definition 3.1. Let R be a ring (commutative with unity).
- An element 0 ≠ a ∈ R is a zero-divisor if there is a 0 ≠ b ∈ R such that ab = 0.
- An inverse to 0 ≠ a ∈ R is an element a⁻¹ such that aa⁻¹ = a⁻¹a = 1. An element is called invertible if it has an inverse.
- The characteristic, char(R), of R is the least number n ∈ N such that na = a + a + ··· + a (n times) = 0 for all a ∈ R. If no such n exists, we put char(R) = 0.

Example 3.1. In Z/⟨6⟩ = {0, 1, 2, 3, 4, 5}, 2 is a zero-divisor since 2 · 3 = 6 = 0. The characteristic of Z/⟨6⟩ is six. The invertible elements are 1 and 5, which are their own inverses (check!). On the other hand, Z has no zero-divisors and characteristic zero. The only invertible elements in Z are ±1 and they are also their own inverses.

Theorem 3.1. Let R be a ring and a, b, c ∈ R* := R \ {0}. Then,
(i) 0a = a0 = 0,
(ii) the unity of R is unique,
(iii) (−a)b = a(−b) = −(ab), and
(iv) if a multiplicative inverse to a exists, it is unique.

Proof. These are simple:
(i) 0a = (0 + 0)a = 0a + 0a, and adding −(0a) to both sides gives 0a = 0; similarly a0 = 0.
(ii) The proof of this is exactly as for groups: if 1 and 1′ are both unities, then 1 = 1 · 1′ = 1′.
(iii) All you need to observe is that the terms involved are all additive inverses of ab; then the result follows since additive inverses are unique (the additive structure is a group structure, remember). Indeed, ab + (−a)b = (a + (−a))b = (a − a)b = 0b = 0 by (i). The others are exactly the same and are left to you.
(iv) Suppose there are b, c ∈ R such that ab = ac = 1. Then ab − ac = 0 ⟺ a(b − c) = 0, so multiply with one of the inverses to a from the left: ba(b − c) = 0 ⟺ 1(b − c) = 0 ⟺ b = c. □
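The claims of Example 3.1 are easy to verify by brute force. Below is a small Python sketch (my own code, not part of the notes) that lists the zero-divisors and invertible elements of Z/⟨n⟩ and computes the characteristic.

```python
# Brute-force checks of Example 3.1 for Z/<n>.

def zero_divisors(n):
    return [a for a in range(1, n)
            if any(a * b % n == 0 for b in range(1, n))]

def units(n):
    return [a for a in range(1, n)
            if any(a * b % n == 1 for b in range(1, n))]

def characteristic(n):
    # least m >= 1 with m*1 = 0 in Z/<n>; this is of course n itself
    m = 1
    while m * 1 % n != 0:
        m += 1
    return m

print(zero_divisors(6))   # [2, 3, 4]
print(units(6))           # [1, 5]
print(characteristic(6))  # 6
```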

Direct products. There are several ways of constructing new rings out of old ones. Here is one useful example. Let {R_i}, i ∈ I, where I is some index set, be a collection of rings. The direct product of {R_i}, written ∏_{i∈I} R_i, is the set of all sequences

∏_{i∈I} R_i := {(r_i) | r_i ∈ R_i}.

When I is finite (as all our examples will be), one usually writes this as

R_1 × R_2 × ··· × R_n.

In this case the ring structure is given by component-wise addition and multiplication,

(r_1, ..., r_n) + (r′_1, ..., r′_n) := (r_1 + r′_1, ..., r_n + r′_n),
(r_1, ..., r_n)(r′_1, ..., r′_n) := (r_1 r′_1, ..., r_n r′_n),

and the zero element and unity are, respectively, (0, 0, ..., 0) and (1, 1, ..., 1). It is easy to prove (do this!) that R_1 × ··· × R_n is a ring when all the R_i's are rings. On the other hand, it is a little more subtle to show the ring axioms for infinite index sets. The statement is nonetheless true for general index sets.
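As a quick illustration of the finite case, here is a Python sketch of the component-wise operations in Z/⟨2⟩ × Z/⟨3⟩ (a toy example of my own; the helper names are not from the notes).

```python
# Component-wise operations in a finite direct product, e.g. Z/<2> x Z/<3>.

def prod_add(x, y, moduli):
    return tuple((a + b) % m for a, b, m in zip(x, y, moduli))

def prod_mul(x, y, moduli):
    return tuple((a * b) % m for a, b, m in zip(x, y, moduli))

moduli = (2, 3)
print(prod_add((1, 2), (1, 2), moduli))  # (0, 1)
print(prod_mul((1, 2), (1, 2), moduli))  # (1, 1), which is the unity
```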

3.1. Ideals and subrings. From now on we will only deal with commutative rings. This makes life a lot easier since non-commutative rings are often very strange creatures and one has to be very careful when dealing with them so as not to fall into the trap of thinking commutatively. But for basic number theory the commutative theory suffices.

Definition 3.2. Let R be a (commutative) ring.
• An ideal in R is a subgroup i ⊆ R such that ri ⊆ i for all r ∈ R. This means that for all r ∈ R and i ∈ i, ri ∈ i. An ideal is called proper if i ⊊ R and trivial if i = 0.
• A subring is a subgroup S of R such that ab ∈ S for a, b ∈ S and such that 1 ∈ S.
Notice the difference.

Lemma 3.2. If 1 ∈ i for some ideal i then i = R.

Proof. For every r ∈ R, r = r1 ∈ i, so R ⊆ i. The other inclusion is clear from the definition, of course. □

Example 3.2. In Z, the ideals are of the form nZ := {nz | z ∈ Z}, for some n ∈ Z. How do we prove this? First of all, any subgroup S of a cyclic group (like Z) is cyclic, so S = nZ as groups². Clearly nZ is stable under multiplication by Z in the sense that for a ∈ nZ and x ∈ Z, xa ∈ nZ. Hence nZ is an ideal. What are the subrings? A subring has to be a subgroup, so they must also be of the form nZ for some n ∈ Z. But a subring has to include 1 and so n = 1. This means that there is only one subring, namely Z itself.

²If this doesn't ring a bell, here is a direct argument. Let G be a group generated by a ∈ G in the sense that every g ∈ G satisfies g = a^k for some k ∈ Z, and let S be a proper subgroup. Every element s ∈ S can be written as s = a^l for some l ∈ Z. Let d be the smallest positive integer such that a^d ∈ S. Then S is generated by a^d. Notice that if d = 1 then S = G, so by the assumption of properness we have d > 1.

Example 3.3. The only ideals of Q are 0 and Q itself, so there are no proper non-zero ideals. Indeed, let i ⊆ Q be a non-zero ideal. Then for 0 ≠ q ∈ i, we have q⁻¹q = 1 ∈ i and so, by the lemma above, i = Q. This is a general phenomenon for rings where all non-zero elements have inverses. As an exercise, find a subring (there are infinitely many)! You will see examples of this later.

Definition 3.3. Let R be a ring.
- An ideal given as ⟨f⟩ := {rf | r ∈ R} is called a principal ideal.
- A proper ideal p ⊂ R is called a prime ideal if ab ∈ p implies that a ∈ p or b ∈ p.
- A proper ideal m ⊂ R is called a maximal ideal if, for any ideal a such that m ⊆ a ⊊ R, we have a = m.

Example 3.4. In Z all ideals are principal, ⟨n⟩ = nZ. A non-zero ideal ⟨n⟩ in Z is prime if and only if n is a prime, and a non-zero ideal in Z is maximal if and only if it is prime.

3.1.1. Ideal generation. The easiest and by far the most common way of constructing ideals is by generation:

Definition 3.4. The ideal i is generated by {f_i | f_i ∈ i, i ∈ I}, where I is some index set, if every a ∈ i can be written as
a = ∑_{i∈J} r_i f_i, for some finite subset J ⊆ I and r_i ∈ R.
We write this as i = ⟨{f_i | f_i ∈ i, i ∈ I}⟩. Check that this is an ideal! The ideal is called finitely generated if |I| < ∞, that is, if the number of f_i's is finite. Notice that if |I| = 1 then we get a principal ideal.

3.1.2. Operations on ideals. Suppose that i and j are two ideals of R. We make the following definitions:
i + j := {i + j | i ∈ i, j ∈ j},
iR := {∑ ir | i ∈ i, r ∈ R, finite sums},
ij := {∑ ij | i ∈ i, j ∈ j, finite sums}.

Theorem 3.3. The sets i + j, iR and ij are all ideals of R.

Proof. Exercise! 

4. INTEGRAL DOMAINS

For number theorists the following definition is of utmost importance.

Definition 4.1. A ring with no zero-divisors is called an integral domain or simply a domain.

Example 4.1. The ring Z is an integral domain. In fact, Z is the reason for "integral" in the name. It was the first and most natural ring to study. On the other hand, Z/⟨6⟩ is not an integral domain since 2 · 3 = 0.

Theorem 4.1. A ring R is an integral domain if and only if ⟨0⟩ is a prime ideal.

Proof. Exercise! □

Theorem 4.2. A ring R is an integral domain if and only if it satisfies the cancellation properties: if a ≠ 0 then ab = ac ⟹ b = c, and ba = ca ⟹ b = c.

Proof. Suppose first that R is a domain. That ab = ac is equivalent to 0 = ab − ac = a(b − c), and since R does not have any zero-divisors and a ≠ 0, we get b − c = 0. In the same manner one shows the right-handed version. Conversely, if ab = ac ⇒ b = c, suppose that xy = 0 with x ≠ 0. This is equivalent to xy = x0 and hence y = 0. □

Theorem 4.3. If every non-zero element in R has an inverse then R is an integral domain.

Proof. Let ab = 0 with a ≠ 0. Multiply with a⁻¹ from the left and the result follows. □

Theorem 4.4. Every non-zero element in a finite domain is invertible.

Proof. Let R := {0, 1, a_2, ..., a_n}. Pick an element 0 ≠ b ∈ R. We want to find a c ∈ R such that bc = cb = 1. Multiply all elements in R with b from the left. Then we get {0, b, ba_2, ba_3, ..., ba_n}. Every element in R appears in this set exactly once, because ba_i = ba_k ⇒ a_i = a_k since R is an integral domain. This means that 1 has to appear somewhere in this set. Hence ba_j = 1 for some (specific) a_j. The same argument applies to the right-handed case. □

Theorem 4.5. The zero-divisors in Z/⟨n⟩ are exactly the elements m ≠ 0 that are not relatively prime to n, that is, gcd(m, n) = d > 1.

Proof. First, if gcd(m, n) = d > 1, m ≠ 0, then m(n/d) = (m/d)n = 0 in Z/⟨n⟩ since the right-hand side is a multiple of n. This means that m is a zero-divisor in Z/⟨n⟩ since neither m nor n/d is zero. On the other hand, suppose gcd(m, n) = 1, m ≠ 0, and mk = 0 in Z/⟨n⟩, i.e., mk = tn for some t. Therefore n has to divide mk. But since m and n are relatively prime, we have n | k, which is equivalent to k = 0 in Z/⟨n⟩. □

Corollary 4.6. The ring Z/⟨p⟩, where p is a prime, has no zero-divisors.

Corollary 4.7. The ring Z/⟨n⟩ is a domain and every non-zero element is invertible if and only if n is a prime.

Proof. If n is not a prime there are zero-divisors³. On the other hand, if n = p, it follows from Theorem 4.4 that every non-zero element is invertible, and from the previous corollary that Z/⟨p⟩ has no zero-divisors. □

This shows, incidentally, that Z/⟨p⟩ is a field. More on this later.

4.1. Homomorphisms, quotient rings and the first isomorphism theorem. A very important principle in the philosophy of modern mathematics is that mathematical objects are to a very large extent governed by their maps to other objects of the same "category" (e.g., groups or rings). Hence we need to define what is to be meant by a "map between rings".

Definition 4.2. A set-theoretical map of rings φ : R → S is a ring homomorphism or ring morphism if it is a homomorphism of the underlying abelian groups, i.e., φ(a + b) = φ(a) + φ(b), and if φ respects multiplication in the sense that φ(ab) = φ(a)φ(b), in addition to φ(1_R) = 1_S.
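Two of the statements above are easy to test numerically. The sketch below (Python, with function names of my own choosing) checks that reduction mod n respects both operations, as required by Definition 4.2, and exhibits the inverses promised by Corollary 4.7 using the extended Euclidean algorithm.

```python
# Numerical illustrations of Definition 4.2 and Corollary 4.7.

def ext_gcd(a, b):
    # returns (g, x, y) with g = gcd(a, b) and a*x + b*y = g
    if b == 0:
        return a, 1, 0
    g, x, y = ext_gcd(b, a % b)
    return g, y, x - (a // b) * y

def inverse_mod(a, p):
    g, x, _ = ext_gcd(a % p, p)
    assert g == 1, "not invertible"
    return x % p

n = 12
phi = lambda x: x % n     # "reduction mod n"
a, b = 35, 58
print(phi(a + b) == (phi(a) + phi(b)) % n)   # True: phi respects +
print(phi(a * b) == (phi(a) * phi(b)) % n)   # True: phi respects *

p = 13
print(all((a * inverse_mod(a, p)) % p == 1 for a in range(1, p)))  # True
```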

³Also, we note that zero-divisors have no inverses: if a were invertible and ab = 0 with b ≠ 0, then b = a⁻¹ab = 0, a contradiction (and zero itself is not invertible).

Definition 4.3. Let φ : R → S be a ring morphism and for s ∈ S denote

φ⁻¹(s) := {r ∈ R | φ(r) = s}.

This is called the fiber over s or the inverse image of s. Put

ker(φ) := φ⁻¹(0) = {r ∈ R | φ(r) = 0}, the kernel of φ, and im(φ) := {s ∈ S | ∃ r ∈ R, φ(r) = s}, the image of φ.

Notice that, since φ is a homomorphism of the underlying additive groups, we have φ(0) = 0.

Theorem 4.8. For a ring morphism φ : R → S, we have
(i) ker(φ) is an ideal in R;
(ii) im(φ) is a subring of S, but not necessarily an ideal.

Proof. The proof is rather easy:
(i) Let b ∈ ker(φ) ⊆ R and take r ∈ R. We want to show that rb ∈ ker(φ), so that ker(φ) is an ideal. But this is clear since φ(rb) = φ(r)φ(b) = φ(r) · 0 = 0, and so rb ∈ ker(φ).
(ii) That im(φ) is a subring follows immediately from the definition. Think through this! □

Definition 4.4. We can make the following definitions in analogy with the corresponding notions from group theory.
- A ring morphism φ : R → S is an injection (or "one-to-one") if ker φ = {0}; this is written φ : R ↪ S;
- φ is a surjection (or "onto") if im φ = S, written φ : R ↠ S; and
- an isomorphism ("one-to-one, onto") if it is both injective and surjective.
We say that R and S are isomorphic if there is an isomorphism φ : R → S, and it is customary to write this as R ≅ S or R ≈ S.

Example 4.2. Here are some easy examples.
- Reduction mod n is a ring morphism Z → Z/⟨n⟩.
- The inclusion Z ↪ Q is a ring morphism.
- Complex conjugation is a ring morphism C → C. This is in fact an isomorphism. Isomorphisms between a ring and itself are called automorphisms.

The set of all ring morphisms between rings R and S is denoted by Hom(R, S) (to remind you of 'homomorphism'). If S = R then we put End(R) := Hom(R, R). Ring morphisms from R to itself are called endomorphisms.

Example 4.3. The map n· : Z → Z, z ↦ nz, is a group homomorphism, but in general not a ring homomorphism, since it is not multiplicative: n(ab) ≠ (na)(nb) in general. So there are a lot fewer ring morphisms than group morphisms.

4.1.1. Quotient rings. Let i be an ideal of R. Introduce a relation on R as follows:

a ∼i b ⇐⇒ a − b ∈ i, for all a,b ∈ R.

Theorem 4.9. The relation ∼i is an equivalence relation on R. Proof. We need to show three things:

- Reflexivity: a ∼_i a for all a ∈ R. This is clear since 0 ∈ i.
- Symmetry: a ∼_i b ⇒ b ∼_i a. This follows since i is a subgroup: a − b ∈ i implies b − a = −(a − b) ∈ i.
- Transitivity: a ∼_i b and b ∼_i c implies a ∼_i c. This follows from a − b ∈ i and b − c ∈ i which, by adding (once again using that i is a subgroup), gives (a − b) + (b − c) = a − c ∈ i. □

Definition 4.5. The coset (i.e., an equivalence class) of a ∈ R under this equivalence relation is denoted a + i or sometimes [a] or ā. The set of cosets is denoted R/i := {a + i | a ∈ R}.

Theorem 4.10. The operations
(a + i) + (b + i) := (a + b) + i,
(a + i)(b + i) := ab + i,
define a ring structure on R/i, with zero element i and unity 1 + i. There is also a canonical surjective ring morphism π : R → R/i with ker π = i, defined by π(r) = r + i.

Definition 4.6. The ring defined by this theorem is called the quotient (or factor) ring modulo i. What this essentially means is that we set all elements in i to zero, i.e., we are "killing the kernel ker π = i" since, in R/i, the zero element is 0 + i = i.

Proof. We have to check that the definitions are well-defined, i.e., that choosing different representatives from each coset gives the same result. Let us prove this in the case of multiplication, the additive counterpart being completely analogous and therefore left to the reader. So, suppose we are given x + i = a + i and y + i = b + i. This means that x = a + i_x and y = b + i_y, for i_x, i_y ∈ i. Hence,

(x + i)(y + i) = xy + i = (a + i_x)(b + i_y) + i = ab + a i_y + i_x b + i_x i_y + i = ab + i = (a + i)(b + i),
where we have used that a i_y, i_x b, i_x i_y ∈ i. Notice that this uses in an essential way that i is an ideal. The rest of the ring axioms follow since the ring operations are induced from R. For the last statement,
π(r + s) = (r + s) + i = (r + i) + (s + i) = π(r) + π(s),
π(rs) = rs + i = (r + i)(s + i) = π(r)π(s),
and clearly, given r + i ∈ R/i, π(r) = r + i, so π is certainly surjective. □

Example 4.4. The canonical example is the following. Every subset of Z of the form nZ = {nz | z ∈ Z} is an ideal of Z. It is a fact (that can be deduced from the next subsection) that Z/⟨n⟩ ≅ Z/nZ.

We will return to this example after the following theorem, subsequent discussion and proof. 4.1.2. The first isomorphism theorem. The following theorem is arguably the most useful theorem in all ring theory. Theorem 4.11 (First isomorphism theorem). Let φ : R → S be a ring homomorphism. Then the following diagram commutes:

            φ
     R ----------> S
       \          ^
     π  \        /  φ̄
         v      /
        R/ker φ

that is, φ = φ̄ ∘ π, and φ̄ induces an isomorphism R/ker φ ≅ im φ.

Remark 4.1. First of all, what does it mean for a diagram to "commute"? Well, intuitively it means that which path you take from a particular point to any other (following the directions of the arrows) does not matter. In the above example it means that going from R to R/ker φ and then to S is the same as going from R to S directly. Another way of saying this is that φ factorizes through R/ker φ as φ = φ̄ ∘ π.

Proof. The proof of this theorem is surprisingly simple. First of all we define π by π(r) := r + ker φ. Then φ̄ can be defined as φ̄(r + ker φ) := φ(r). Notice that all the maps are now ring morphisms. Clearly, φ = φ̄ ∘ π, but we have to check that φ̄ is well defined. So suppose that r + ker φ = s + ker φ ⟺ r − s = f ∈ ker φ ⟺ r = s + f. Hence,
φ̄(r + ker φ) = φ(r) = φ(s + f) = φ(s) + φ(f) = φ(s) = φ̄(s + ker φ).
To see that we have the isomorphism R/ker φ ≅ im φ, note that φ is a surjection onto its image (by definition!) and since im φ̄ = im φ the same goes for φ̄. Also, ker φ̄ = {0}, so φ̄ is injective, and therefore an isomorphism. □

This theorem gives a more precise meaning to "killing the kernel"!

Example 4.5. Let's continue the example given right before this subsection. We have the ring morphism "reduction mod n", Red_n : Z → Z/⟨n⟩, given by Z ∋ r ↦ r̄, where r̄ is the reduction (remainder) modulo n. The kernel of this map is all the multiples of n: ker(Red_n) = nZ. Therefore, we get the following commutative diagram:

           Red_n
     Z -----------> Z/⟨n⟩
       \             ^
     π  \           /  (map induced by Red_n)
         v         /
          Z/nZ

Clearly, Red_n : Z → Z/⟨n⟩ is a surjection, so im(Red_n) = Z/⟨n⟩ and the isomorphism from the theorem becomes
Z/nZ ≅ im(Red_n) = Z/⟨n⟩.

Example 4.6. The projections pr_R : R × S → R and pr_S : R × S → S are ring homomorphisms (check this!). Consider the morphism pr_R : R × S → R. The kernel of this is clearly {0} × S ≅ S. Hence the theorem tells us
(R × S)/S ≅ R,
and similarly, by using pr_S,
(R × S)/R ≅ S.
So in this sense '/' is really like a division, if '×' is viewed as a multiplication. This is probably the origin of the name "quotient ring" and the notation '/'.

4.1.3. Ideals and ring morphisms. Now we come to another very important question: how do ideals behave under ring morphisms? We make the following definition.

Definition 4.7. Let φ : R → S be a morphism of rings, i an ideal of R and j an ideal of S. Then
i_* := φ(i)S = { ∑ (finite) i′s | i′ ∈ φ(i), s ∈ S }
is the extended ideal of i in S, and
j^* := φ⁻¹(j) = {r ∈ R | φ(r) ∈ j}
is the contracted ideal of j in R. Sometimes these are denoted iS and j ∩ R, respectively. I will probably use the first of these alternate notations but hardly the second.

Theorem 4.12. With notation as above, i_* and j^* are ideals of S and R, respectively.

Proof. We prove the second, leaving the first to you as an exercise (do it!). Suppose j ∈ j^* and r ∈ R. Then φ(rj) = φ(r)φ(j) ∈ j since φ(r) ∈ S, φ(j) ∈ j (by definition) and j is an ideal. In the same manner, j + j′ ∈ j^* when j, j′ ∈ j^*: φ(j + j′) = φ(j) + φ(j′) ∈ j since j is an ideal. □

Theorem 4.13. Let φ : R → S be a ring morphism between two rings with ideals i ⊆ R and j ⊆ S. Then
(4.1) (j^*)_* = φ(φ⁻¹(j)) = im φ ∩ j,
(4.2) (i_*)^* = φ⁻¹(φ(i)) = ker φ + i.

Proof. We divide the proof as in the statement of the theorem.
(4.1) We need to show two inclusions: φ(φ⁻¹(j)) ⊆ im φ ∩ j and φ(φ⁻¹(j)) ⊇ im φ ∩ j. Let's start with the first. Take a ∈ φ(φ⁻¹(j)). Obviously a ∈ im φ. Since φ⁻¹(j) is the set of elements mapping into j, we obviously also have a ∈ j. Hence a ∈ im φ ∩ j. For the other inclusion, assume a ∈ im φ ∩ j. Then φ⁻¹(a) is a non-empty set of elements mapping to a ∈ j under φ, so a ∈ φ(φ⁻¹(j)). Hence (4.1) is proved.
(4.2) As above, we need to prove two inclusions: φ⁻¹(φ(i)) ⊆ ker φ + i and φ⁻¹(φ(i)) ⊇ ker φ + i. Take a ∈ ker φ + i, say a = k + b with k ∈ ker φ and b ∈ i. Then φ(a) = φ(b) ∈ φ(i), and so a ∈ φ⁻¹(φ(i)), since this is the set of elements mapping into φ(i) under φ. To show the other inclusion, take a ∈ φ⁻¹(φ(i)). Then φ(a) ∈ φ(i), so φ(a) = φ(b) for some b ∈ i. This is equivalent to φ(a − b) = 0 and so a − b ∈ ker φ, meaning that a = b + k with k ∈ ker φ. But b ∈ i, so the inclusion is proved, thereby completing the whole proof. □

This theorem has the following remarkable corollaries.

Corollary 4.14. If φ : R ↠ S is a surjection, then (j^*)_* = φ(φ⁻¹(j)) = j. On the other hand, if φ : R ↪ S is an injection, then (i_*)^* = φ⁻¹(φ(i)) = i.

Proof. Obvious. □

Corollary 4.15. Let i be an ideal of R. Then there is a bijective correspondence between ideals of R/i and ideals j of R such that i ⊆ j. The correspondence is given by
i ⊆ j ↦ j/i := π(j) = {j + i | j ∈ j}.

Proof. First of all it is clear that every ideal in R/i must be of the form B/i = {b + i | b ∈ B} for some subset B of R. However, if B/i is an ideal then (r + i)(b + i) should be in B/i for all r + i ∈ R/i and b ∈ B. We have (r + i)(b + i) = rb + i, and so rb must be in B, i.e., B must be an ideal. Furthermore, it is clear that B ⊇ i. Hence every ideal in R/i is of the form j/i for some ideal j ⊇ i. Conversely, given an ideal j ⊆ R we get an ideal j/i ⊆ R/i by projection. Suppose k_1 := j_1/i = j_2/i =: k_2. Contracting along the surjection π and using (4.2) gives (k_1)^* = ker π + j_1 = i + j_1 = j_1 and likewise (k_2)^* = j_2, so j_1 = j_2 and the correspondence is also one-to-one. □

4.2. UFD's, PID's and Euclidean domains. In this subsection all rings are assumed to be integral domains even if it is not explicitly stated.

4.2.1. Prime elements and irreducible elements.

Definition 4.8. Let D be an integral domain and a, b ∈ D.
- We say that b divides a, written (as usual) b|a, if there is a c ∈ D such that a = bc; b is then called a divisor of a.
- A divisor of 1 is called a unit⁴. The set of units in a ring is a multiplicative group (check!) and is denoted U(D).
- Two elements a and b such that a = bu ⟺ au⁻¹ = b, for u ∈ U(D), are called associates. Association is an equivalence relation (check!).
- An element a ∈ D is irreducible if any factorization a = bc implies that either b or c is a unit. Otherwise, a is called reducible.
- An element p ∈ D is called a prime (element) if p|ab implies p|a or p|b.

Remark 4.2. Be sure to separate the similarly named, but distinctly different, notions 'unit' and 'unity'. This is a notorious source of confusion and frustration.

The following simple remark is important enough to earn its way as a full-fledged theorem.

Theorem 4.16. Let D be an integral domain. Then
(i) a|b ⟺ b ∈ ⟨a⟩;
(ii) a|b and b|a ⟺ a = ub for a unit u ∈ U(D) ⟺ ⟨a⟩ = ⟨b⟩.

Proof. Exercise! □

Theorem 4.17. Any prime element is irreducible (but in general, not the other way around) and p ∈ D is prime if and only if ⟨p⟩ is a prime ideal.

⁴Notice that this is equivalent to an invertible element. To be honest, I don't know why there are two names for the same concept. My lingering feeling is that the designation 'unit' is a bit more "number-theoretical". But I could be wrong.

Proof. Let p be a prime and assume that it is reducible, p = ab, where neither a nor b is a unit. Then clearly p | ab, but p ∤ a and p ∤ b (if, for instance, p | a then a = pk for some k, and so p = pkb; since D is a domain we can cancel p to obtain kb = 1, but b was not a unit, a contradiction). This contradicts p being prime, so p is irreducible.
Suppose now that p is a prime and let ab ∈ ⟨p⟩. This is equivalent to ab = pc for some c ∈ D, which in turn is equivalent to p | ab, and since p is prime, p | a or p | b. Now use Theorem 4.16 (i). Conversely, suppose that ⟨p⟩ is a prime ideal and that p | ab, i.e., ab ∈ ⟨p⟩. This implies that a ∈ ⟨p⟩ or b ∈ ⟨p⟩. In either case a = pc or b = pc and we are done. □

This theorem is the reason why prime ideals are called prime ideals.

4.2.2. Unique factorization domains. Definition 4.9. A domain D is called a unique factorization domain, abbreviated UFD, if UFD1. Every element can be written as a product of irreducibles, and UFD2. this factorization is unique up to multiplication by a unit. Let me comment on the last condition. Let

p_1^{n_1} ··· p_r^{n_r} = q_1^{m_1} ··· q_s^{m_s}
be two factorizations of a into irreducibles. Then the condition says that r = s and that p_i = u_{ij} q_j for some i and j and u_{ij} ∈ U(D).

Theorem 4.18 (The fundamental theorem of Arithmetic). The ring of integers Z is a unique factorization domain.

In general it is rather difficult to prove that a given integral domain admits unique factorization. Look up the proof of the above theorem in your elementary algebra books.

Theorem 4.19. If UFD1 in Definition 4.9 holds, then UFD2 is equivalent to the statement "every irreducible element is prime."

Proof. Assume first that factorization is unique up to a unit. So let p be an irreducible element and suppose p|ab. We need to show that p|a or p|b. We begin by assuming that a and b are also irreducible and p ∤ a. We have that pc = ab and p, a, b are irreducible, so c is not a unit, since otherwise p = (c⁻¹a)b, a factorization into irreducibles. This implies that c = ak for some k ∈ D, and so pak = ab ⟺ a(pk − b) = 0, implying that p|b since D is a domain. Let us return to the general case. Let a = α_1 ··· α_n and b = β_1 ··· β_m. Since the factorization is unique (up to units), the factorization of ab into irreducibles must be α_1 ··· α_n β_1 ··· β_m. So if p|ab then p must divide a product of irreducibles and so must divide (at least) one of those. Hence p divides a or b.
Assume now that every irreducible element is prime and let

p_1^{n_1} ··· p_r^{n_r} = q_1^{m_1} ··· q_s^{m_s}
be two factorizations of a into irreducibles. This tells us that q_1 (for instance) divides both sides. Since every irreducible element is prime we see that all the p_i's and q_j's are prime, and q_1 | p_1^{n_1} ··· p_r^{n_r} implies that q_1 | p_i for some i, say p_i = s q_1; but since p_i is irreducible and q_1 is not a unit, s ∈ U(D). Continuing this way inductively the result follows. □

4.2.3. Principal ideal domains.

Definition 4.10. A domain D is called a principal ideal domain, abbreviated PID, if all ideals are principal, i.e., generated by one element.

Theorem 4.20. Let D be a PID. Then p is irreducible ⟺ p is prime. In addition, in a PID every non-zero prime ideal is maximal.

Proof. The implication ⟸ has already been demonstrated (as a general fact). For the reverse implication, let p be irreducible and suppose ab ∈ ⟨p⟩ with b ∉ ⟨p⟩. Since D is a PID we have ⟨p, b⟩ = ⟨d⟩ for some d ∈ D, and p = cd for some c ∈ D. Since p is irreducible, c or d is a unit. If c were a unit, then b ∈ ⟨d⟩ = ⟨p⟩, contrary to assumption; hence d is a unit, so ⟨p, b⟩ = D and we may write 1 = xp + yb with x, y ∈ D. Multiplying by a gives a = xpa + y(ab) ∈ ⟨p⟩, i.e., p | a, so p is prime. For the last statement, observe that if ⟨p⟩ ≠ 0 is prime and ⟨p⟩ ⊆ ⟨q⟩ with q ∉ U(D), then p = cq and, since p is irreducible, c ∈ U(D) and so ⟨p⟩ = ⟨q⟩. Hence the only ideals containing ⟨p⟩ are ⟨p⟩ and D, i.e., ⟨p⟩ is maximal. □

Theorem 4.21. Every PID is a UFD.

Proof. Let D be a PID and S ⊆ D the set of (non-zero, non-unit) elements which cannot be written as a product of irreducibles. Assume that S is non-empty and take a ∈ S. Then a = a_1 b_1 with a_1, b_1 ∉ U(D). This means that ⟨a⟩ ⊂ ⟨a_1⟩ and ⟨a⟩ ⊂ ⟨b_1⟩. If both a_1, b_1 ∉ S then a would be a product of irreducibles, so assume that a_1 ∈ S. The same argument applies with a replaced by a_1 to get another a_2 such that ⟨a⟩ ⊂ ⟨a_1⟩ ⊂ ⟨a_2⟩. Continuing this process we get an infinite sequence of strict ideal inclusions:

⟨a⟩ ⊂ ⟨a_1⟩ ⊂ ⟨a_2⟩ ⊂ ··· ⊂ ⟨a_n⟩ ⊂ ··· .
The infinite union ⋃_{i=1}^∞ ⟨a_i⟩ is an ideal (check this!), and since D is a PID there is an element c ∈ D such that ⟨c⟩ = ⋃_{i=1}^∞ ⟨a_i⟩. Obviously, c ∈ ⋃_{i=1}^∞ ⟨a_i⟩, and so there is a 1 ≤ j < ∞ such that c ∈ ⟨a_j⟩. But this means that
⋃_{i=1}^∞ ⟨a_i⟩ = ⟨c⟩ ⊆ ⟨a_j⟩ ⊂ ⟨a_{j+1}⟩ ⊆ ⟨c⟩ = ⋃_{i=1}^∞ ⟨a_i⟩,
which is a contradiction, and hence S must be empty.
To show that the factorization is unique we use Theorem 4.19. So let p be an irreducible element with p | ab and p ∤ a. Since ⟨p⟩ is maximal by Theorem 4.20, ⟨p, a⟩ = D and so 1 ∈ ⟨p, a⟩. This means that there are α, β ∈ D such that aα + pβ = 1. Multiply this equation by b to get abα + pbβ = b. Since p divides the left hand side, it divides b, and by Theorem 4.19 the factorization is unique. □

The proof of this theorem shows, incidentally, that PID's belong to a large and important class of rings which we now define.

Definition 4.11. A ring R (not necessarily a domain⁵) is called Noetherian⁶ if it satisfies the ascending chain condition on ideals: for any ascending chain of ideals

a_1 ⊆ a_2 ⊆ ··· ⊆ a_n ⊆ ··· ,

there is an N such that for all k ≥ N,

a_k = a_{k+1} = a_{k+2} = ··· .

Theorem 4.22. Any PID is Noetherian.

⁵Or, commutative for that matter, although, in the non-commutative case one has to be a little more precise.
⁶In honor of Emmy Noether, 1882–1935. She was one of the big pioneers of abstract algebra. For more info, see her biography on http://www-groups.dcs.st-and.ac.uk/.

4.2.4. Integral domains that do not have the unique factorization property.

Definition 4.12. Let D be an integral domain. A (weak) norm on D is a function

Nrm : D → N₀ := N ∪ {0}, such that
(i) Nrm(ab) = Nrm(a)Nrm(b), and
(ii) Nrm(a) = 1 ⟺ a ∈ U(D).
The definition implies:

Lemma 4.23. If Nrm is a norm on D, then Nrm(0) = 0.

Proof. Let a ∈ D be arbitrary. Then Nrm(0) = Nrm(a · 0) = Nrm(a)Nrm(0), which is equivalent to Nrm(0)(Nrm(a) − 1) = 0. Since a was arbitrary, and N, as a subset of Z, does not have any zero-divisors, we conclude that Nrm(0) = 0. □

This norm function is a convenient tool to show that certain elements are irreducible.

Example 4.7. Let D = Z[√−5] = {a + b√−5 | a, b ∈ Z}. Let us show that 2, 3 and 1 ± √−5 are irreducible.
First of all we need to define a norm on Z[√−5]. We do this as
Nrm(z) = z z̄ = (a + b√−5)(a − b√−5) = a² + 5b², for z = a + b√−5.
Check for yourselves that this is indeed a norm. If 2 is not irreducible we have 2 = ab, where a, b are not units. This means that Nrm(a)Nrm(b) = Nrm(2) = 4. Since a, b are not units, Nrm(a) = Nrm(b) = 2. Assume a = α + β√−5. Then 2 = Nrm(a) = α² + 5β², which is clearly impossible for integers α and β. Hence 2 is irreducible. A similar argument shows that 3 is irreducible. Suppose now that 1 ± √−5 is reducible, i.e., 1 ± √−5 = uw with u, w ∉ U(Z[√−5]). Then
Nrm(u)Nrm(w) = Nrm(1 ± √−5) = 1 + 5 = 6.
Since 6 = 2 · 3 and u and w are not units, one of u and w would have norm 2 and the other norm 3. But then u or w must be a unit because, as we showed above, it is impossible to find non-units with norm 2 or 3. Hence 1 ± √−5 is irreducible.
If 2 and 3 were associates then 2 = 3u, with u ∈ U(Z[√−5]). This would imply that Nrm(2) = Nrm(3)Nrm(u) = Nrm(3) · 1, and this is clearly false (we also see that associates have the same norm). The same argument applies to the other cases except for 1 + √−5 and 1 − √−5. What are the units of Z[√−5]? An element u ∈ D is a unit if and only if Nrm(u) = 1, which in this case is equivalent to a² + 5b² = 1, for u = a + b√−5. This is clearly only possible if b = 0 and a² = 1, showing that u = ±1. Hence U(Z[√−5]) = {−1, 1}. So 1 − √−5 = ±1 · (1 + √−5) is clearly not an option, and hence 2, 3, 1 ± √−5 are irreducible non-associates. Now we see that
6 = 2 · 3 = (1 − √−5)(1 + √−5),
and so 6 has two different (i.e., they don't differ by a unit) factorizations into irreducibles! This means that Z[√−5] is not a UFD, and hence not a PID either.
Ok, you might say, what on earth is so interesting about (strange) rings such as Z[√−5]? If you also say that it is uninteresting to find integer solutions to equations of the type y² + a = x³, a ∈ Z (the above case being a = 5), then, fine, I agree; in that case studying rings such as Z[√−5] could be seen as a bit artificial. But, on the other hand, then your interest in number theory may be, at best, superficial. The above example indicates, for instance, that the equation y² + 5 = x³ might not have any non-trivial integer solutions⁷. Also, as we will very soon see, these kinds of rings are very important in number theory. Clearly, unique factorization in number theory would be something extremely desirable. But as this example shows, this is not always (or even most often) possible to achieve. However, and here is where things become beautiful, even though unique factorization of elements is not always possible, unique factorization of ideals into prime ideals is! Unfortunately, I don't have time to go into this in detail, but we will come back to it briefly.

Theorem 4.24. If Nrm(a) = p, where p is a prime, then a is irreducible.

Proof. Suppose a is reducible, a = bc, with b, c ∉ U(D). Then p = Nrm(a) = Nrm(bc) = Nrm(b)Nrm(c). Since p is a prime, Nrm(b) (say) must be p. But this implies that Nrm(c) = 1, and so c is a unit, contrary to assumption. Hence a is irreducible. □
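The norm computations in Example 4.7 can be checked by machine. The following Python sketch (my own representation of a + b√−5 as a pair (a, b); not from the notes) verifies the two factorizations of 6 and searches in vain for elements of norm 2 or 3.

```python
# A numerical check of Example 4.7 in Z[sqrt(-5)].

def mul(x, y):
    a, b = x
    c, d = y
    # (a + b*s)(c + d*s) = (ac - 5bd) + (ad + bc)*s, where s = sqrt(-5)
    return (a * c - 5 * b * d, a * d + b * c)

def norm(x):
    a, b = x
    return a * a + 5 * b * b

print(mul((2, 0), (3, 0)))      # (6, 0)
print(mul((1, -1), (1, 1)))     # (6, 0): the second factorization of 6
print(norm((2, 0)), norm((3, 0)), norm((1, 1)))  # 4 9 6

# no element of small height has norm 2 or 3, in line with the argument above
hits = [(a, b) for a in range(-10, 11) for b in range(-10, 11)
        if norm((a, b)) in (2, 3)]
print(hits)  # []
```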

4.2.5. Euclidean domains. Now we will introduce yet another type of norm, or rather a valuation.

Definition 4.13. Let E be a domain. Then a function v : E → N₀ is called a Euclidean valuation if the following are satisfied.
Eu1. There is a division algorithm with respect to v. That is, for each pair a, b ∈ E, b ≠ 0, there are q, r ∈ E such that a = bq + r, with v(r) < v(b) or r = 0.
Eu2. v(a) ≤ v(ab) for all a, b ∈ E with a, b ≠ 0.
A domain E with a Euclidean valuation is called a Euclidean domain.

Example 4.8. The following are examples of Euclidean domains.
- Z, with v(a) := |a|;
- Z[√±2] = {a + b√±2 | a, b ∈ Z}, with v(a + b√±2) = |a² ∓ 2b²|;
- the Gaussian integers, Z[√−1] := {a + b√−1 | a, b ∈ Z}, with v(a + b√−1) = a² + b²;
- Z[√14] = {a + b√14 | a, b ∈ Z}. This example is highly non-trivial, proved in 2004.
It is not true that any Z[√n] is Euclidean, even if n is prime (see below). So what is so special about Euclidean domains? Well, we have:

Theorem 4.25. Euclidean domain ⟹ PID ⟹ UFD.

Proof. Only the first implication needs a proof since we have already proved the second one. For the first implication, let i ≠ 0 be an ideal of a Euclidean domain (the zero ideal is trivially principal). Let s be a smallest non-zero element of i with respect to v. Such an element exists since im v ⊆ N₀. Then i = ⟨s⟩ (check!). □

Hence we have unique factorization in Euclidean domains, something desirable in number theory.
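For the Gaussian integers the division algorithm of Eu1 is completely explicit: divide in Q(√−1) and round each coordinate to the nearest integer. Here is a Python sketch of one division step (my own code and names; only an illustration of the idea, not a proof that it always works).

```python
# One division step in Z[sqrt(-1)], with v(a + b*sqrt(-1)) = a^2 + b^2.

def norm(z):
    a, b = z
    return a * a + b * b

def mul(z, w):
    a, b = z
    c, d = w
    return (a * c - b * d, a * d + b * c)

def divmod_gauss(z, w):
    # z/w has coordinates (ac + bd)/n and (bc - ad)/n where n = norm(w);
    # round each to the nearest integer to get q, then r = z - q*w.
    a, b = z
    c, d = w
    n = norm(w)
    q = (round((a * c + b * d) / n), round((b * c - a * d) / n))
    r = (z[0] - mul(q, w)[0], z[1] - mul(q, w)[1])
    return q, r

z, w = (27, 23), (8, 1)
q, r = divmod_gauss(z, w)
print(q, r, norm(r) < norm(w))  # remainder strictly smaller in norm
```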

⁷Fix x and factor the left-hand side to see what I mean.

Example 4.9. At the moment of writing, the following facts are known (according to Wikipedia, which I kind of trust on this issue) concerning Z[√d]:
• it is Euclidean for d = −1, −2, −3, −7, −11, and hence a PID;
• it is not Euclidean for d = −19, −43, −67, −163, but is in fact a PID for these d's.
Quite an amazing result!

Example 4.10. Notice that the above theorem shows that Z[√−5] is neither a Euclidean domain, nor a PID, since it is not a UFD. It is natural to wonder (somewhat stunned, I hope) "why are all these 'nearby' cases so different!?". Frankly, I don't think anyone knows. But after all, this is number theory, the science of simple questions and hard (at best) answers.

4.3. The Gaussian integers. In order to fully motivate the abstract theory that I have covered so far, and that will be taken to even greater apparently nonsensical heights in a while, I thought it prudent to insert an example to show how all this actually has something to do with number theory, the de facto topic of this course.

Theorem 4.26. Any odd prime number p (i.e., p ≠ 2) is the sum of two squares if and only if p ≡ 1 (mod 4), that is,
p = a² + b² ⟺ p ≡ 1 (mod 4).

Proof. A very convenient way to prove this is by using the Gaussian integers Z[√−1]. Recall that
Z[√−1] = {a + b√−1 | a, b ∈ Z}.
We have already observed that Z[√−1] is Euclidean (see Example 4.8), and so, consequently, both a PID and a UFD. Moreover, there is a norm Nrm : Z[√−1] → Z defined by
Nrm(z) := z z̄ = a² + b², where z = a + b√−1 and z̄ = a − b√−1.
So p = a² + b², when considered in Z[√−1], factorizes as
p = (a + b√−1)(a − b√−1).
Since we will be dealing both with primes in Z and primes in Z[√−1], we follow tradition and refer to primes in Z as rational primes.
Suppose a rational prime p, considered as an element of Z[√−1], factors as p = xy, where x, y ∉ U(Z[√−1]), that is, x and y are not units (see the Remark below). Then Nrm(p) = Nrm(x)Nrm(y), and
Nrm(p) = p² ⟹ Nrm(x)Nrm(y) = p².
An element x in a normed ring is a unit if and only if Nrm(x) = 1, and since we assumed that x and y are not units we must have that Nrm(x) = Nrm(y) = p. This means in particular that, for x = a + b√−1, Nrm(x) = a² + b² = p.
Now, assume that p ≡ 1 (mod 4), i.e., p = 4n + 1 for some n ∈ Z. By quadratic reciprocity the congruence z² ≡ −1 (mod p) has a solution for p = 4n + 1 (check this!). This means that there is a t ∈ Z such that p | (t² + 1). But in Z[√−1],
t² + 1 = (t + √−1)(t − √−1),
so p would divide t + √−1 or t − √−1 if it were a prime. However, neither quotient is an element of Z[√−1]: for instance,
t/p + √−1/p ∉ Z[√−1].
Hence p cannot be a prime in Z[√−1], and so by the preceding paragraph p = a² + b².

The other implication, namely p = a² + b² ⇒ p ≡ 1 (mod 4), follows by elementary reasoning. Indeed, not both a and b can be even, because then p would be even, contradicting the assumption. Similarly, if a and b are both odd then p would once again be even. So we must have that one is odd and one is even, and this implies that p ≡ 1 (mod 4) (check it!). □
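Theorem 4.26 is also pleasant to see numerically. The brute-force Python sketch below (my own code, not from the notes) finds a representation p = a² + b² whenever p ≡ 1 (mod 4) and finds none when p ≡ 3 (mod 4).

```python
# Brute-force illustration of Theorem 4.26.

from math import isqrt

def two_squares(p):
    # return (a, b) with a^2 + b^2 = p if such integers exist, else None
    for a in range(isqrt(p) + 1):
        b2 = p - a * a
        b = isqrt(b2)
        if b * b == b2:
            return (a, b)
    return None

for p in [5, 13, 29, 37, 41]:      # p = 1 (mod 4)
    print(p, two_squares(p))        # e.g. 13 -> (2, 3)
for p in [3, 7, 11, 19, 23]:       # p = 3 (mod 4)
    print(p, two_squares(p))        # always None
```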

Remark 4.3. A very important remark here is that even though a rational prime p is irreducible in Z, it can become reducible in an extension of Z. This is one of the basic realizations of the number theorists of the 19th century. In fact, this can always be given as general good advice: if something can't be solved, can it be solved in some larger context?

Lemma 4.27. The units of Z[√−1] are {1, −1, √−1, −√−1}.

Proof. Use the characterization: z ∈ U(R) ⟺ Nrm(z) = 1. □

Theorem 4.28. The primes of Z[√−1], modulo multiplication by units, are exactly the elements of the following forms:
(i) 1 + √−1;
(ii) a + b√−1 with a² + b² = p, where p is a rational prime ≡ 1 (mod 4) (so p ≠ 2);
(iii) the rational primes p such that p ≡ 3 (mod 4).

Recall that, since Z[√−1] is a PID, the irreducible elements are exactly the prime elements.

Proof. First we prove that the elements listed are indeed prime. An element π of the form (i) or (ii) has Nrm(π) = p, a rational prime (p = 2 in case (i)). If π factored as π = xy we would have p = Nrm(π) = Nrm(x)Nrm(y), which immediately implies that Nrm(x) (for instance) is 1, and hence x is a unit. Therefore, the elements in (i) and (ii) are indeed primes. For the third case, note that a factorization p = xy into non-units would imply p² = Nrm(x)Nrm(y), and so
p = Nrm(x) = Nrm(y) = a² + b².

But this is equivalent to p ≡ 1 (mod 4) and contradicts the assumption of (iii).
We still need to prove that any prime π ∈ Z[√−1] is associated to one (and only one) of the forms (i)–(iii). To do this, factor Nrm(π) into rational primes:

Nrm(π) = π π̄ = p_1 ··· p_n,
with the p_i rational primes. From this we see that π | p_1 ··· p_n in Z[√−1], and so π | p_i for some 1 ≤ i ≤ n. This means that
Nrm(π) | Nrm(p_i) ⟺ Nrm(π) | p_i², so either Nrm(π) = p_i or Nrm(π) = p_i².
In the first case we get that π = a + b√−1 with a² + b² = p_i, so π is of the form (ii), or, if p_i = 2, π is associated with 1 + √−1, i.e., (i). In the second case, since π | p_i ⟺ p_i = πa for some a, we get that
p_i² = Nrm(p_i) = Nrm(π)Nrm(a), and so Nrm(a) = 1 ⇒ a ∈ U(Z[√−1]).

This would then mean that p_i ≡ 3 (mod 4), since otherwise p_i = 2 or p_i ≡ 1 (mod 4), which would mean that p_i = (a + b√−1)(a − b√−1) with both factors non-units, and this is impossible when p_i is (associated to) a prime of Z[√−1]. So we must have that p_i ≡ 3 (mod 4), i.e., π is of the form (iii). □

FIGURE 1. The geometry of Z[√−1]

Corollary 4.29. Every rational prime p ∈ Z decomposes in Z[√−1] into primes as
p = (a + b√−1)(a − b√−1) if p ≡ 1 (mod 4),
or remains prime in Z[√−1] if p ≡ 3 (mod 4). If p = 2, then we see that 2 = (1 + √−1)²/√−1, so 2 is associated to the square of (1 + √−1). See Figure 1 for the geometric interpretation of this result.

4.4. Polynomials. We could define rings of polynomials on any level of abstraction. It can be well argued that the more abstract the more general, but for the purpose of number theory this is a bit of shooting over the target. For this reason, we will take a more hands-on approach, sacrificing absolute rigor and utmost generality in favor of real mathematics. So we make the following definition.

Definition 4.14. Let A be a ring. Then a polynomial over A is a formal sum
a_0 + a_1 z + a_2 z² + ··· + a_n zⁿ, with a_i ∈ A, 0 ≤ n < ∞.
The set of all polynomials in an indeterminate z is denoted by A[z].

We want to make this into a ring. This we do with the following definitions. Let p(z), q(z) ∈ A[z]. Then we define

p(z) + q(z) = (p_0 + p_1 z + ··· + p_n zⁿ) + (q_0 + q_1 z + ··· + q_m z^m) = (p_0 + q_0) + (p_1 + q_1)z + ···,

and

p(z)q(z) = (p_0 + p_1 z + ··· + p_n zⁿ)(q_0 + q_1 z + ··· + q_m z^m) = ∑_{k=0}^{n+m} ( ∑_{i+j=k} p_i q_j ) z^k = ∑_{k=0}^{n+m} ( ∑_{j=0}^{k} p_{k−j} q_j ) z^k.

Notice that, if we want A[z] to be a (commutative) ring, distributivity forces us to have the above multiplication. We also make the convention that z⁰ = 1_{A[z]} = 1_A. In this way A becomes a subring of A[z].

Theorem 4.30. The above definitions endow A[z] with the structure of a commutative ring with unity, and A can in a natural way be considered a subring of A[z].

Proof. Exercise! □
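The multiplication rule above is just a convolution of coefficient lists, which the following Python sketch makes concrete (my own code, with integer coefficients standing in for a general ring A).

```python
# The convolution formula for p(z)q(z) from Section 4.4.

def poly_mul(p, q):
    # p, q are coefficient lists [p0, p1, ..., pn]; the k-th output
    # coefficient is sum over i + j = k of p_i * q_j
    out = [0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out

# (1 + z)(1 - z + z^2) = 1 + z^3
print(poly_mul([1, 1], [1, -1, 1]))  # [1, 0, 0, 1]
```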

It would be a serious negligence of duty if I didn't mention the ring of formal power series over A, denoted A[[z]], at this point (although we will probably not need it). This is defined to be the set of elements of the form
∑_{i=0}^∞ a_i z^i = a_0 + a_1 z + a_2 z² + ··· + a_n zⁿ + ···, with a_i ∈ A,
with the same addition and multiplication as above. However, we need to observe that the sum giving the coefficient of z^i is always a finite sum, so the definition makes sense. Also, we disregard all questions of convergence. This is indeed a formal construction.

4.4.1. Iterated polynomial rings. Since A was only supposed to be a ring and A[z] was a ring we can consider the iterate A[z][w]. This is obviously then the ring of polynomials

z_0 + z_1 w + z_2 w² + ··· + z_n wⁿ, with z_i ∈ A[z], 0 ≤ n < ∞.
Convince yourselves that we have the equality A[z][w] = A[z, w], that is, any p(w) ∈ A[z][w] can be written as
∑_{i,j} p_{ij} z^i w^j, with p_{ij} ∈ A, 0 ≤ i, j < ∞.

In this way we can continue to iteratively construct polynomial rings A[z] := A[z_1, z_2, ..., z_m] in several indeterminates, z := {z_1, z_2, ..., z_m} (think through this!). From now on we will, however, focus our attention on the one-indeterminate case.

4.4.2. The degree function on A[z]. There is a degree function deg : A[z] → N₀ defined by
deg p(z) = deg(p_0 + p_1 z + ··· + p_n zⁿ) := n (with p_n ≠ 0), and deg(0) := ∞.
Observe that some authors define deg(0) = −∞. The difference is a matter of taste more than anything. We note one immediate consequence of deg:

Theorem 4.31. If D is a domain then deg(pq) = deg p + degq and the ring D[z] is also a domain.

Proof. The first statement follows from the definition of the product of two polynomials, observing that the highest coefficient is not zero, since p_{deg p} and q_{deg q} are non-zero (and not zero-divisors). Assume 0 ≠ p, q ∈ D[z] with pq = 0. Then

∞ = deg(0) = deg(pq) = deg p + degq.

For this last sum to be ∞ either deg p or degq (or both) must be ∞ and the only polynomial satisfying this is the zero polynomial. Hence p or q (or both) is zero. 

A polynomial f is called irreducible if f = gh implies that either g or h is a unit. If it is not irreducible, it is said to be reducible. We have the following useful criterion:

4.4.3. The Eisenstein criterion.

Theorem 4.32. Let f(z) ∈ Z[z], f(z) = f_0 + f_1 z + ··· + f_n zⁿ. Suppose there is a prime p ∈ Z such that p ∤ f_n, p | f_i for 0 ≤ i ≤ n − 1, and p² ∤ f_0. Then f(z) is irreducible over Q.
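The hypothesis of the Eisenstein criterion is straightforward to test for a given prime p. A small Python sketch (the function name and interface are my own; the examples are mine as well):

```python
# Direct check of the Eisenstein condition of Theorem 4.32 at a given prime p.

def eisenstein(coeffs, p):
    # coeffs = [f0, f1, ..., fn]; test that p does not divide fn,
    # p divides fi for all i < n, and p^2 does not divide f0
    f0, fn = coeffs[0], coeffs[-1]
    return (fn % p != 0
            and all(c % p == 0 for c in coeffs[:-1])
            and f0 % (p * p) != 0)

# z^5 + 6z + 3 is irreducible over Q by Eisenstein at p = 3
print(eisenstein([3, 6, 0, 0, 0, 1], 3))   # True
# z^2 - 1 = (z - 1)(z + 1) satisfies no Eisenstein condition at p = 2
print(eisenstein([-1, 0, 1], 2))           # False
```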

4.4.4. D[z] is Euclidean. The following theorem is well-known to you, at least when formulated differently.

Theorem 4.33. The ring F[z], for F a field (i.e., a domain where all non-zero elements are invertible), is a Euclidean domain with deg as Euclidean valuation.

Proof. We know from high school that there is a division algorithm on F[z] (at least when F is the field of real numbers R). That deg(pq) ≥ deg(p) is obvious for p, q ≠ 0. □

This result holds for more general coefficient rings (other than fields) but then the degree function is not the right choice for a Euclidean valuation. As a result we have the first statement of the following theorem.

Theorem 4.34. Let D[z] be a polynomial ring over a domain D. Then,
(i) if D is a field, D[z] is a PID and hence a UFD;
(ii) if D is a UFD, D[z] is a UFD.

Proof. The proof of the first statement follows from Theorems 4.33 and 4.25. For the proof of the second I refer you to more specialized abstract algebra literature. □

4.4.5. D[z, w] is not Euclidean. The reason is simply that it is not a PID. There are ideals in D[z, w] which cannot be generated by one element. One example: i := ⟨z, w⟩.

4.4.6. Reduction modulo ideals in A. We have seen how to define quotient rings A/i modulo an ideal i. Given a polynomial p ∈ A[z] there is a ring homomorphism A[z] → (A/i)[z], "reduction mod i", where we reduce the coefficients modulo i.

Example 4.11. Let 3 + 5z − 6z² + z³ ∈ Z[z]. Reducing this polynomial mod ⟨5⟩ ⊆ Z yields the polynomial 3 + 4z² + z³ ∈ (Z/⟨5⟩)[z]. This is a useful technique in number theory. In fact we have the following useful result:

Theorem 4.35. Let f(z) ∈ Z[z]. If the reduction mod n, f̄(z) ∈ (Z/⟨n⟩)[z], of f is irreducible, where deg(f̄) = deg(f), then f is irreducible over Z.

Proof. Suppose f is reducible: f = gh. Then the ring morphism Z → Z/⟨n⟩ yields a ring morphism Z[z] → (Z/⟨n⟩)[z] mapping f ↦ f̄. Therefore, f̄ = ḡ h̄, and since deg(f̄) = deg(f) we get deg(ḡ) = deg(g) and deg(h̄) = deg(h), and so f̄ is also reducible. □
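Example 4.11 and Theorem 4.35 combine into a practical irreducibility test in low degrees, since a polynomial of degree 2 or 3 over F_p is irreducible exactly when it has no root in F_p. The Python sketch below (my own code) carries this out for the polynomial of Example 4.11.

```python
# Reduce coefficients mod p and, for degree 2 or 3, test irreducibility
# over F_p by looking for roots. The root test is only valid in degrees 2, 3.

def reduce_mod(coeffs, p):
    return [c % p for c in coeffs]

def has_root_mod_p(coeffs, p):
    return any(sum(c * pow(x, i, p) for i, c in enumerate(coeffs)) % p == 0
               for x in range(p))

f = [3, 5, -6, 1]               # 3 + 5z - 6z^2 + z^3
fbar = reduce_mod(f, 5)
print(fbar)                     # [3, 0, 4, 1], i.e. 3 + 4z^2 + z^3 in F_5[z]
print(has_root_mod_p(fbar, 5))  # False: fbar is irreducible over F_5,
                                # hence f is irreducible over Z (Theorem 4.35)
```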

5. FIELDS

Now we begin to approach the heart of the subject, namely algebraic number fields. But first some more generalities.

5.1. Definition and examples.

Definition 5.1. A field F is an integral domain where every non-zero element is invertible.

We will often denote fields by 'blackboard' letters, F, E, J, L, etc. The one exception, and the example most important to us, is algebraic field extensions. Traditionally, these are written in an ordinary roman (italic) font.

Example 5.1. The following are examples of fields:
- Q, R and C;
- F_p := Z/⟨p⟩, where p is prime;
- Q(√d) := {a + b√d | a, b ∈ Q}, for d a square-free integer, e.g., d = −1, ±2. Convince yourselves that this is indeed a field. Why is C(√d) not an interesting creature? Also, why do we insist that d is square-free?
- F(z) := {f/g | f, g ∈ F[z], g ≠ 0}, the field of rational functions over F.
We will see more examples in a little while.

6. BASICS OF FIELDS

Let R be a ring. Then there is a ring morphism Z → R sending n ↦ n1 = 1 + 1 + ··· + 1 (n times). The kernel of this morphism is an ideal of Z, and hence of the form nZ, where we allow for the possibility n = 0. We can also assume that n is not negative (why?). Then there is an injective ring morphism Z/nZ ↪ R. If n = 0, we interpret Z/nZ as Z. This means that there is an isomorphism between Z/nZ and a subring of R. This number n is the characteristic of R, as defined before.
We will from now on only consider fields, so we replace R with a field F. Notice that if n = 0 then Z is a subring of F. But every 0 ≠ z ∈ Z maps to z1 ∈ F, and since F is a field, z1 is invertible. Hence F includes Q as a subfield. Suppose n ≠ 0. Let ab ∈ ⟨n⟩ = nZ = ker(Z → F). Then (ab)1 = (a1)(b1) = 0. Since F is a field, and hence an integral domain, either a1 = 0 or b1 = 0. But this means that a ∈ ⟨n⟩ or b ∈ ⟨n⟩, and so ⟨n⟩ is a prime ideal, which is equivalent to n being a prime. Hence we have shown,

Theorem 6.1. For every field F there are two (mutually exclusive) possibilities: (i) either F includes the field of rational numbers Q (the case of zero characteristic), or (ii) F includes the field F_p for a prime p (the case of positive characteristic).

Theorem 6.2. Let E and F be two fields.
(i) There is only one proper ideal of any field, namely (0).
(ii) If φ : F → E is a ring morphism, then it is injective.

Proof. The first statement follows since 1 belongs to every non-zero ideal, and hence every non-zero ideal is the whole field. The second statement follows from the first, since ker φ ⊆ F is an ideal: it must either be the whole field, which is impossible since φ(1) = 1 ≠ 0, or ker φ = (0), and this is, by definition, the same as saying that φ is injective. □

Notice that this makes F into a subfield of E; conversely, any subfield of a field is given by an injection, namely the inclusion morphism.

Theorem 6.3. Let R be a ring (not necessarily a domain). Then
(i) the ideal p is prime if and only if R/p is an integral domain;
(ii) the ideal m is maximal if and only if R/m is a field.

Proof. We prove the statements separately.
(i) Suppose p is a prime ideal, and let a, b ∈ R be such that the product of their classes is zero in R/p, i.e., ab ∈ p. We want to show that the class of a or the class of b is zero, i.e., that a ∈ p or b ∈ p. But this follows immediately since p is a prime ideal. Conversely, suppose R/p is a domain. Then ab ∈ p means that the product of the classes of a and b is zero in R/p, so one of the classes is zero, i.e., either a ∈ p or b ∈ p.

(ii) Suppose m is maximal. Since any maximal ideal is prime, R/m is an integral domain by (i). Take 0 ≠ a ∈ R/m, represented by a ∈ R. We want to show that a has an inverse. That a ≠ 0 in R/m is equivalent to a ∉ m. This means that ⟨a⟩ + m = R, so there are α, β ∈ R such that αa + βm = 1 for some m ∈ m. But modulo m (i.e., in the reduction R/m), βm is zero. Hence the class of α is the required inverse to a. For the other implication, suppose that m ⊂ a ⊆ R (ideal inclusion). Then there is an a ∈ a with a ∉ m. Hence m ⊆ ⟨a⟩ + m ⊆ a ⊆ R. Since R/m is a field there is b ∈ R such that ab = 1 modulo m, or equivalently, ab + cm = 1 for some c ∈ R and m ∈ m. This element ab + cm is in ⟨a⟩ + m, and so 1 ∈ ⟨a⟩ + m, showing that ⟨a⟩ + m = a = R. □

6.1. Fields of fractions. To any domain D one can associate a field, called the field of fractions of D. This is done by formally inverting all non-zero elements of D.

Theorem 6.4. Let D be an integral domain. Then there is a field Frac(D) and a ring injection D ↪ Frac(D) making D into a subring of Frac(D).

Proof. The proof of the theorem is constructive. Let D* be the multiplicatively closed (i.e., 1 ∈ D* and a, b ∈ D* ⇒ ab ∈ D*) set of non-zero elements of D and consider the direct product
D × D* = {(a, s) | a ∈ D, s ∈ D*}.
We equip this set with the following relation:
(a, s) ∼ (b, t) ⟺ at = bs.
This is an equivalence relation. Indeed, the only non-trivial point is transitivity: (a, s) ∼ (b, t) and (b, t) ∼ (c, u) mean at = bs and bu = ct. Multiplying the first equation by u and the second by s leads to atu = bus and bus = cts. Putting these together leads to atu = cts, i.e., (au − cs)t = 0. Since D is a domain and t ∈ D*, we see that au = cs and so (a, s) ∼ (c, u). Notice that (a, s) ∼ (at, st) for all t ∈ D*.
To show that Frac(D) := (D × D*)/∼ is a field we need suitable definitions of addition and multiplication. So define
(a, s) + (b, t) := (at + bs, st), and (a, s)(b, t) := (ab, st).
We have to show that these indeed define a ring structure on Frac(D) in such a way that it is a domain and every non-zero element is invertible.
Group structure. We first show that Frac(D) is an abelian group with the above defined addition.
(Unit) Define the zero element as (0, 1). We immediately see that (0, 1) + (a, s) = (a, s) = (a, s) + (0, 1).
(Ass.) This follows from the following computation:
(a, s) + ((b, t) + (c, u)) = (a, s) + (bu + ct, tu) = (atu + sbu + sct, stu) = (at + sb, st) + (c, u) = ((a, s) + (b, t)) + (c, u).
(Inv.) The inverse to (a, s) is (−a, s). It is obvious that this satisfies the required properties.
Hence Frac(D) is an abelian group under addition.
Ring structure. Similarly we have to check the multiplication.
(One) The unity is defined by (1, 1). Once again it is immediate that this is indeed a unity.
(Ass.) This follows from the following trivial observation:
(a, s)((b, t)(c, u)) = (a, s)(bc, tu) = (abc, stu) = (ab, st)(c, u) = ((a, s)(b, t))(c, u).

(Dist.) Finally, this follows from the following,
(a,s)((b,t) + (c,u)) = (a,s)(bu + ct, tu) = (abu + act, stu) ∼ (absu + acst, s²tu) = (ab, st) + (ac, su) = (a,s)(b,t) + (a,s)(c,u),
and similarly we show the right-handed version.
Suppose (a,s) is not equivalent to (0,1), i.e., a ≠ 0. Then (a,s)⁻¹ = (s,a); moreover, (0,1) ∼ (a,s)(b,u) = (ab, su) implies that either a or b is zero, since D is a domain. Hence, we finally conclude that Frac(D) is a field.
To end the proof we need to show that D can be considered a subring of Frac(D). Define ι : D → Frac(D) by ι(a) := (a,1). This is a ring morphism: ι(a + b) = (a + b, 1) = (a·1 + b·1, 1·1) = (a,1) + (b,1) = ι(a) + ι(b) and ι(ab) = (ab,1) = (a,1)(b,1) = ι(a)ι(b). In addition ι is injective since ker ι = ⟨0⟩. This completes the proof. □

Definition 6.1. Write (a,s) as a/s (a fraction) or as⁻¹. We have just shown that Frac(D), with structure given by (in the new notation)
a/s + b/t := (at + bs)/(st), and (a/s)(b/t) := ab/(st),
is a field, called the field of fractions (sometimes field of quotients) of D.
The above construction is actually just a special case of a more general theory called localization. However, for our purposes the above is more than sufficient.
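For instance, Frac(Z) = Q: the class of (a,s) is nothing but the ordinary fraction a/s. Similarly, for a field F, Frac(F[z]) = F(z), the field of rational functions in z, and if D is already a field then Frac(D) = D.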

7. FIELDEXTENSIONS This section is the technical heart of this short expose´ on fields.

7.1. Field extensions.

Definition 7.1. An inclusion of fields F ⊆ E is called a field extension, and denoted E/F. Notice that E is a vector space over F. The dimension dimF(E) is called the degree of the extension, and is denoted [E/F]. The extension is finite if [E/F] < ∞. The definition implies that every z ∈ E can be written uniquely as a sum

z = α1e1 + α2e2 + ··· + αnen + ··· , αi ∈ F, ei ∈ E, and where all the ei’s are linearly independent over F. We will primarily be concerned with finite extensions so in this case every z can be written as a finite sum

z = α1e1 + α2e2 + ··· + αnen, αi ∈ F, ei ∈ E.
Despite the conflicting notation with quotient rings (and groups) there is in practice never any risk of confusion: a field has only the trivial ideals ⟨0⟩ and the field itself, so one never forms quotients of a field by a subfield.

7.1.1. The tower law. One often encounters a sequence of field extensions F ⊆ E ⊆ J of a field F. We then have the following theorem.

Theorem 7.1. Suppose F ⊆ E ⊆ J is a sequence of finite field extensions of F. Then
[J/F] = [J/E][E/F].
In fact, if {ei} is a basis for E/F and {e′j} is a basis for J/E, then {ei e′j} is a basis for J/F.

Proof. The proof of this theorem is rather simple. Take z ∈ J, so that
z = ∑j αj e′j, αj ∈ E.
By assumption, αj = ∑i βji ei, for βji ∈ F. This means that z = ∑i,j βji ei e′j. Now we only need to show that the ei e′j are linearly independent. For this assume that
∑i,j γji ei e′j = ∑j (∑i γji ei) e′j = 0, for some γji ∈ F.
Since {e′j} is a basis for J/E, we must have ∑i γji ei = 0, and similarly, since {ei} is a basis for E/F, we get γji = 0. □

7.2. Algebraic extensions, transcendental extensions.

Definition 7.2. An element α ∈ E/F is called algebraic over F if there is a polynomial F ∈ F[z] such that F(α) = 0.

We can define a ring morphism evα : F[z] → E by evα(F) := F(α). Two possibilities occur:
- If α is algebraic, the kernel is a non-trivial, proper, ideal of F[z]. Since F[z] is a PID, this ideal is generated by a single polynomial irr, i.e., ker(evα) = ⟨irr⟩ (the reason for the weird notation will be apparent soon).
- On the other hand, if α is not algebraic, the kernel is zero, in which case α is called transcendental over F.

Example 7.1. The real number √2 is algebraic over Q since √2 is a solution to the equation z² − 2 = 0. Notice that √2 ∉ Q. The real number π is transcendental over Q since there is no non-zero polynomial with coefficients in Q which has π as a zero (this was proved by Lindemann in 1882).

Note. The concepts of algebraic and transcendental numbers are very much dependent on the field over which we work. For instance, e is transcendental over Q (proved by Hermite in 1873) but it is algebraic over R since it is a zero of the polynomial z − e ∈ R[z].

Let α be an algebraic element of E/F. By the first isomorphism theorem we have the following isomorphism:

F[z]/⟨P⟩ ≃ F[α] = im(evα), for some P ∈ F[z].
Since F[α] is an integral domain, ⟨P⟩ must be a prime ideal, and so, since F[z] is a PID, P is an irreducible polynomial. Normalizing P by multiplying it by the inverse of the leading coefficient we get an irreducible monic (i.e., having leading coefficient 1) polynomial irr, and it is clear that irr is the unique monic polynomial of least degree generating ⟨P⟩. In this respect, the monic irr is uniquely given by α, and is called the minimal polynomial associated with α. We denote this as irr(α,F), or simply irrα or irr if F, respectively both α and F, are obvious from the context.
In addition, irr(α,F) divides every polynomial having α as a zero. Indeed, suppose L(α) = 0. Then by assumption on irr, deg(L) ≥ deg(irr) and so, by the division algorithm for polynomials, L(z) = q(z)irr(z) + r(z) for some r ∈ F[z], deg(r) < deg(irr). But since L(α) = irr(α) = 0, we must have r(α) = 0, which, by assumption on irr, implies that r(z) = 0, and so irr | L. We have proved the following:

Theorem 7.2. The minimal polynomial irr(α,F) is the unique monic polynomial of minimal degree having α as a zero. Any other polynomial with α as a zero is a multiple of irr(α,F).

Definition 7.3. An extension E/F is called algebraic if all e ∈ E are algebraic over F.

Theorem 7.3. Every finite extension E/F is algebraic.

Proof. Take 0 ≠ α ∈ E. Since the extension is finite, the powers 1, α, α², ... cannot all be linearly independent over F, so there is a non-trivial relation ∑ aᵢαⁱ = 0 with aᵢ ∈ F. This gives a non-zero polynomial in F[z] with α as a zero, so α is algebraic. □

7.3. Simple and finitely generated extensions.

Theorem 7.4. Let α ∈ E/F be algebraic. Then F[α], the polynomial ring generated by α over F, is a subfield of E. Denote this subfield by F(α). The set {1, α, α², ..., αⁿ⁻¹}, n := deg(irr(α,F)), is a basis for F(α) over F.

Proof. We know that F[α] ≃ F[z]/⟨irr⟩, where irr = irr(α,F) is the minimal polynomial associated to α. That F[α] is a subring of E is clear. Since F[z] is a PID, every non-zero prime ideal is maximal, and ⟨irr⟩ is such an ideal, so F[α] ≃ F[z]/⟨irr⟩ is a field. We need to show that every a ∈ F(α) can be written uniquely in the form

a = a₀1 + a₁α + a₂α² + ··· + aₙ₋₁αⁿ⁻¹, with aᵢ ∈ F.

That every a ∈ F(α) can be written on the above form follows since a, by the initial defi- nition, can be represented by a polynomial F in α. If deg(F) ≥ deg(irr) we can reduce it modulo irr by the division algorithm and represent F by the remainder. This remainder is unique and has degree less than n and so the theorem is proved. 
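As a concrete illustration of the theorem, take F = Q and α = ∛2, so that irr(α,Q) = z³ − 2 and n = 3. Every element of Q(∛2) can be written a₀ + a₁α + a₂α² with aᵢ ∈ Q, and inverses are computed using the relation α³ = 2; for instance (1 + α)(1 − α + α²) = 1 + α³ = 3, so that (1 + α)⁻¹ = (1 − α + α²)/3.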

Combining Theorems 7.3 and 7.4 we get

Theorem 7.5. An element α ∈ E/F is algebraic if and only if F(α)/F is a finite (algebraic) extension.

Theorem 7.6. If α,β ∈ E/F have the same minimal polynomial, then F(α) ' F(β).

Proof. This actually follows from the isomorphism F[α] ' F[z]/hirri, but it can be nice to have an explicitly given isomorphism. So, we define a morphism by

φ : a₀1 + a₁α + ··· + aₙ₋₁αⁿ⁻¹ ↦ a₀1 + a₁β + ··· + aₙ₋₁βⁿ⁻¹.

Clearly, φ is one-to-one and onto, and a morphism of groups, i.e., φ(x + y) = φ(x) + φ(y) (check this!). We need to show that it is also multiplicative: φ(xy) = φ(x)φ(y). This follows since x, y and xy can be written as polynomials in α, and multiplication in F[α] and in F[β] is in both cases given by multiplying polynomials and reducing modulo the same minimal polynomial. □

Now, take for simplicity (this is not necessary) F = Q, and note that this construction can be iterated, adding more algebraic elements to a field. Indeed, suppose Q(α)/Q is a simple extension and that Q(α) is a subfield of some larger field (it is always a subfield of C in any case so this is not a problem). Take an algebraic element β ∈ C over Q(α). Then Q(α,β) := Q(α)[β] (polynomials in β over Q(α)) is a field by the same reasoning as before. This is called the field obtained by adjoining β to Q(α) and similarly, Q(α) is the field obtained by adjoining α to Q.
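For instance, Q(√2, √3) = Q(√2)(√3) is obtained by adjoining √3 to Q(√2). By the tower law, [Q(√2,√3)/Q] = [Q(√2,√3)/Q(√2)][Q(√2)/Q] = 2 · 2 = 4, and {1, √2, √3, √6} is a basis over Q. (In fact Q(√2,√3) = Q(√2 + √3), an instance of the primitive element theorem proved in Section 9.)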

Definition 7.4. The field extension Q(α1,...,αn)/Q is called the finitely generated field extension generated by α1,...,αn ∈ C. 28 D. LARSSON

7.4. Algebraic closure.

Definition 7.5. A field F is algebraically closed if every F ∈ F[z], deg(F) ≥ 1, has a zero α ∈ F. An equivalent definition is: The field F is algebraically closed if for any algebraic ex- tension E/F, we have E = F. This is not entirely obvious since one needs to know that every polynomial over a field has zeros in some field extension (which is true).

Example 7.2. The field C is algebraically closed. This follows from the fundamental theorem of algebra: every non-zero polynomial over C has a zero. Definition 7.6. Let F be a field. Then a field E such that F ⊆ E is called an algebraic closure of F if E/F is algebraic and E algebraically closed. Theorem 7.7. Every field F has an algebraic closure and any two such are isomorphic. This theorem enables us to speak of the algebraic closure of F. This is usually denoted F¯ or Falg. The proof of this theorem is rather involved and deep, using in an essential way the axiom of choice or Zorn’s lemma. I refer to more advanced algebra texts for this.

Example 7.3. The field of rationals Q has an algebraic closure Qalg. This is the set of all possible complex zeros of polynomials in Q[z]. It is a fact (that I won't prove) that Qalg ⊊ C. In fact, Qalg is countable whereas C is uncountable⁸. The extension Qalg/Q is one of the most (maybe the most) complicated objects in mathematics and much of the research done in number theory, algebra and geometry, directly or indirectly, aims at understanding the properties of this extension.

Example 7.4. The set Zalg ⊆ Qalg of elements which are complex roots of monic integer polynomials, i.e., monic polynomials in Z[z], is called the set of algebraic integers of Qalg. This will be generalized soon to more general number-theoretic situations.

Theorem 7.8. The algebraic closure Qalg of Q is a field.

Proof. Let α, β ∈ Qalg. By the tower law and Theorem 7.5,
[Q(α,β)/Q] = [Q(α,β)/Q(α)][Q(α)/Q] < ∞
and so Q(α,β)/Q is an algebraic extension. This implies that Q(α + β)/Q, Q(αβ)/Q, Q(−α)/Q and Q(α⁻¹)/Q, α ≠ 0, are all algebraic extensions, since α + β, αβ, −α and α⁻¹ all lie in Q(α,β). This clearly implies that Qalg is a field. □

Notice that [Qalg/Q] = ∞, but Qalg is (by definition) algebraic over Q. This shows that the implication “algebraic ⇒ finite” is not true in general. It is, however, true for simple extensions as we have seen (and more generally, finitely generated extensions).

8. FINITE FIELDS Let f (z) be a polynomial over a field K (not necessarily finite). Then L ⊇ K is a splitting field for f (z) if f (z) splits into linear factors over L. In fact, a splitting field is gotten by adjoining to K all zeros of f (z) in some algebraic closure Kalg. Splitting fields are unique up to isomorphism (i.e., any two splitting fields are isomorphic).

⁸If this means nothing to you, don't worry. You only need to know that C is “infinitely bigger” than Qalg.

8.1. The main theorem. We know that Fp := Z/pZ are fields with p elements for p ∈ Spec(Z). Let F be a finite field. Since F is a ring there is a canonical ring morphism Z → F sending 1Z 7→ 1F. Clearly, this morphism cannot be injective since Z is infinte. Therefore it has a non-zero kernel which has to be a maximal ideal in Z generated by a prime p (recall that Z is a PID). From this follows (by the first isomorphism theorem) that there is an injection Fp = Z/pZ ,→ F, so F ⊇ Fp is a finite field extension of finite fields. Hence, n F is finite-dimensional as a vector space over Fp and so #F = p for some n ≥ 1. We have proven the first part of the following theorem. Theorem 8.1 (Main theorem on finite fields). The number of elements in any finite field is a power of a prime p. Every finite field F is the splitting field of the polynomial pn f (z) := z − z ∈ Fp[z] and the elements of F are exactly the zeros of f (z). Therefore, there is only one finite field of order pn up to isomorphism. × Proof. The multiplicative group F of F is cyclic with q − 1 elements where we have put n × q := p . Therefore, by Fermat’s little theorem for instance, every a ∈ F is a solution to the q− q equation z 1 − 1 = 0, implying that every element in F is a solution to f (z) := z − z = 0. Hence f (z) has q distinct zeros (the elements of F) and so F is a splitting field for f (z) = ∏α∈F(z − α). So, assuming that a finite field of pn elements exists, it is the splitting field of the poly- n nomial f (z) = zp − z, and since splitting fields are unique up to isomorphism, every two finite fields of pn elements are isomorphic. n It remains to construct a finite field of p elements for every p ∈ Spec(Z) and n ≥ 1. We will do this by showing that the set of zeros of f (z) actually is a field (and so has n p elements in some algebraic closure of Fp). Take zeros α,β of f (z). Then, for p odd, α ±β is also a zero of f (z). This follows from the characteristic-p-version of the binomial theorem (a ± b)p = ap ± bp (check this!). If p = 2, then −α = α so this poses no problem either. Clearly, 0 and 1 are zeros; αβ is also a zero: n n n (αβ)p − αβ = α p β p − αβ = αβ − αβ = 0. Similarly, α−1 is a zero: n n (α−1)p − α−1 = (α p )−1 − α−1 = α−1 − α−1 = 0. n Since, f 0(z) = pnzp −1 − 1 = −1, the polynomial f (z) can have no multiple zeros. Hence, n there are p distinct zeros and these form a field.  Let me remark that in constructing the finite field we could use any irreducible poly- nomial over Fp, not necessarily f (z) as defined in the theorem (although f (z) is actually not irreducible). Indeed, choose an irreducible polynomial g(z) and let n := deg(g). No- 2 tice that Fp is a subring of Fp[z] via a 7→ a + 0z + 0z + ··· (this is true for any ring R in R[z]). We know that g(z) generates a maximal ideal in Fp[z] since Fp[z] is a PID. Therefore Fp[z]/hg(z)i is a field and we have sequence of morphisms

Fp / Fp[z] / / Fp[z]/hg(z)i .

The composition Fp → Fp[z]/⟨g(z)⟩ is a morphism of fields so is injective; hence, we get a finite extension of Fp of dimension n, that is, a field with pⁿ elements. Now, since finite fields are unique up to isomorphism, this must be the same (isomorphic) field as the one constructed via f(z) in the theorem. The difference is simply that the multiplication rule has a different appearance

(given in terms of f (z) in one case, and g(z) in the other), but the resulting fields are actually isomorphic.
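For instance, take p = 2 and g(z) = z² + z + 1, which is irreducible over F₂ since it has no zero in F₂. Then F₄ := F₂[z]/⟨z² + z + 1⟩ = {0, 1, ω, ω + 1}, where ω denotes the class of z and ω² = ω + 1. A direct check shows that every element satisfies a⁴ = a, in accordance with Theorem 8.1.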

Corollary 8.2. Any finite extension of finite fields Fpn /Fp is Galois. n 8.2. The Frobenius morphism. Put q := p , for some p ∈ Spec(Z) and n ≥ 1. Then Fq/Fp is a finite extension of finite fields. Then there is a morphism p Frobp : Fq → Fq, a 7→ a . This is easily checked to be a ring morphism and since Fq is a field it is injective, and since Fq is finite, it is surjective. Hence Frobp is an automorphism. This automorphism is called the Frobenius morphism of Fq/Fp, or simply, ”the Frobenius”. Also, Frobp |Fp = id, so Frobp ∈ Gal(Fq/Fp).

Theorem 8.3. Let Fq/Fp be a finite extension of finite fields. Then Gal(Fq/Fp) = hFrobpi. In particular, Fq/Fp is a cyclic Galois extension.

Proof. Let G := ⟨Frobp⟩ ⊆ Gal(Fq/Fp). We have
Frobp^n(α) = α^{p^n} = α, for all α ∈ Fq,
so Frobp^n = id. Let 1 ≤ d ≤ n be the smallest integer such that Frobp^d = id. Then
Frobp^d(α) = α^{p^d} = α for all α ∈ Fq.
Hence every α ∈ Fq is a zero of z^{p^d} − z. This polynomial has at most p^d zeros, and all q = pⁿ elements of Fq are zeros, so d ≥ n, from which we see that d = n. Therefore, G is cyclic of order n. But the number of automorphisms of a field extension is less than or equal to the degree, so we must have G = Gal(Fq/Fp). □
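To make the construction concrete on a computer, here is a minimal Python sketch of mine (the bit-pattern encoding and the helper names mul, power and frob are my own choices, not anything from the text), realizing F₈ as F₂[z]/⟨z³ + z + 1⟩ and checking the two statements above: every element is a zero of z⁸ − z (Theorem 8.1), and the Frobenius has order 3 = [F₈/F₂] (Theorem 8.3).

    # F_8 realized as F_2[z]/<z^3 + z + 1>; elements are 3-bit integers,
    # bit i = coefficient of z^i.
    IRR = 0b1011  # z^3 + z + 1, irreducible over F_2

    def mul(a, b):
        """Multiply in F_8: carry-less product, then reduce modulo IRR."""
        prod = 0
        for i in range(3):
            if (b >> i) & 1:
                prod ^= a << i
        for shift in (1, 0):          # clear the degree-4 and degree-3 terms
            if prod & (1 << (3 + shift)):
                prod ^= IRR << shift
        return prod

    def power(a, n):
        result = 1
        for _ in range(n):
            result = mul(result, a)
        return result

    def frob(a):
        return mul(a, a)              # the Frobenius a -> a^2

    # Every element of F_8 is a zero of z^8 - z (Theorem 8.1):
    print(all(power(a, 8) == a for a in range(8)))          # True

    # The Frobenius generates Gal(F_8/F_2) and has order 3 (Theorem 8.3):
    print(any(frob(a) != a for a in range(8)))              # True: Frob != id
    print(all(frob(frob(frob(a))) == a for a in range(8)))  # True: Frob^3 = id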

9. ALGEBRAIC NUMBER FIELDS 9.1. Algebraic numbers. Fact 9.1. Every irreducible polynomial P(z) over a subfield of C splits as

P(z) = (z − α1)(z − α2)···(z − αn), where all αi ∈ C are distinct. This fact is a special case of something called ’separability’: Definition 9.1. An element α ∈ E/F is called separable if α is not a multiple zero of irr(α,F). The E/F is called separable if all elements are separable. Theorem 9.2. Every algebraic field extension of characteristic zero is separable. Also, every finite extension of a finite field is separable.

Proof. See any respectable book in Galois theory.  From this theorem the above fact follows since any subfield of C is of characteristic zero. Definition 9.2. An algebraic number field is a finite (and hence algebraic) extension K/Q in Qalg. Elements of K/Q are called algebraic numbers. We will often simply write K for the extension K/Q as we, from now on, mainly will consider extensions over Q. In cases where the field over which the extension takes place is not Q this will be explicitly stated. 31

9.1.1. Primitive element theorem.

Theorem 9.3. If K/Q is an algebraic number field, i.e., a field extension of the form Q(α1,...,αn)/Q, then there is an algebraic element δ ∈ Qalg such that K = Q(δ). The element δ is called a primitive element for K/Q. Hence, as a result we may restrict our attention to simple extensions.

Proof. By induction it is clearly sufficient to consider only the case when K = Q(α,β), where α and β are algebraic over Q. Let irrα := irr(α,Q) and irrβ := irr(β,Q) be the minimal polynomials of α and β. By the above fact, irrα and irrβ factorize into distinct linear factors as

irrα (z) = (z − α)(z − α2)···(z − αn), and irrβ (z) = (z − β)(z − β2)···(z − βm).

We make the convention that α1 := α and β1 := β. Since all the αi’s and βi’s are distinct there is at most one d ∈ Q such that

αi + dβ j = α1 + dβ1, for every 2 ≤ i ≤ n, 2 ≤ j ≤ m.

(Check this!) This means that we can choose d such that αi + dβ j 6= α1 + dβ1 since in this case there are only finitely many d’s for which the equality holds. Put

δ := α + dβ = α1 + dβ1. Obviously Q(δ) ⊆ Q(α,β), and so we only need to show the reverse inclusion. In fact, it suffices to show that β ∈ Q(δ) because then α = δ − dβ ∈ Q(δ). We have that

irrα(δ − dβ) = irrα(α) = 0, so putting L(z) := irrα(δ − dz), we see that irrβ(β) = L(β) = 0. By the choice of d, these two polynomials have only one zero in common. Indeed, suppose ε is such that L(ε) = irrβ(ε) = 0. Then ε must be one of the βi's and δ − dε one of the αi's, and so, by the choice of d, ε = β. Since irr(β,Q(δ)) (the minimal polynomial of β over Q(δ)) divides both L and irrβ, and these two only have one zero in common, deg(irr(β,Q(δ))) = 1, say,
irr(β,Q(δ))(z) = z − β′, for β′ ∈ Q(δ).
Hence,
0 = irr(β,Q(δ))(β) = β − β′ ∈ Q(δ),
so β = β′ ∈ Q(δ) and the proof is finished. □

9.2. Norms, traces and conjugates.

9.2.1. Field morphisms. Any morphism of algebraic number fields φ : K → L, restricted to Q, is the identity, φ|Q = idQ. This follows from the commutativity of the diagram

Q ↪ K (via ιK), Q ↪ L (via ιL), φ : K → L, with ιL = φ ∘ ιK.

Definition 9.3. A field morphism on K is a morphism of fields ϕ : K → C such that ϕ|Q = idQ. We denote the set of all field morphisms K → C by Mor(K).

Theorem 9.4. Let K = Q(α) be an algebraic number field with irr(α,Q) the minimal polynomial of α. Then the number of field morphisms ϕ : K → C is [K/Q] = deg(irr(α,Q)). Moreover, ϕi(α) = αi, 2 ≤ i ≤ [K/Q], where the αi are the other zeros of irr(α,Q). We put α1 := α.

Proof. Let αi be the zeros of the minimal polynomial of α, with α1 = α. Then ϕi(α) = αi alg defines field morphisms Q(α) ,→ Q ⊂ C, and 1 ≤ i ≤ [Q(α)/Q]. We have that Q(αi) ' Q(α) for all i by Theorem 7.6. Since all the αi’s are distinct the number of such morphisms is [Q(α)/Q]. Conversely, if ϕ is a field morphism ϕ : Q(α) ,→ C, then

0 = ϕ(irrα (α)) = irrα (ϕ(α)) so ϕ(α) must be one of the zeros to irrα , i.e., one of the αi’s and so ϕ is one of the ϕi’s. 

Corollary 9.5. With notation as in the theorem, irr(α,Q) splits in C as
irr(α,Q) = ∏_{ϕi ∈ Mor(K)} (z − ϕi(α)).

Definition 9.4. An algebraic number field K/Q = Q(α)/Q such that
- im ϕ ⊆ R for all ϕ ∈ Mor(K), is called totally real;
- im ϕ ⊄ R for all ϕ ∈ Mor(K), is called totally imaginary;
- K is a totally imaginary quadratic extension of a totally real field, is called a CM-field or complex multiplication field⁹.
Examples of these notions will follow when we come to quadratic and cyclotomic number fields.

Proposition 9.6. The set of field morphisms can be decomposed (disjointly) into
Mor(K)ℜ := {ϕ ∈ Mor(K) | im ϕ ⊆ R}, r1 := #Mor(K)ℜ,
Mor(K)ℑ := {ϕ ∈ Mor(K) | im ϕ ⊄ R}, 2r2 := #Mor(K)ℑ,
and r1 + 2r2 = n := [K/Q]. The pair (r1, r2) is called the signature of K/Q. Notice that for ϕ ∈ Mor(K)ℑ we might have im ϕ ∩ R ≠ ∅.

Proof. The only thing needing a proof is that the number of imaginary field morphisms is even. But this follows since for every ϕ ∈ Mor(K)ℑ we get a unique other one by composing with complex conjugation. This means that there are r2 pairs (ϕ, ϕ̄). □
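For example, Q(√2) is totally real, with signature (2,0), and Q(√−1) is totally imaginary, with signature (0,1). The field Q(∛2) has signature (1,1): the minimal polynomial z³ − 2 has one real zero and one pair of complex conjugate zeros, so exactly one of the three field morphisms has image contained in R.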

Definition 9.5. The field polynomial of β ∈ K = Q(α) is defined by

Ψβ(z) := ∏_{i=1}^{[Q(α)/Q]} (z − ϕi(β)).

The elements ϕi(β) are called the conjugates of β. An extension K/Q is called normal if all conjugates to α, i.e., all zeros of irrα , are elements in Q(α)/Q; a normal extension of Q is called a Galois extension (over Q). Definition 9.6. For fixed α ∈ E/F, we get a natural F-linear map α· : E → E, α ·v := αv. The norm of α is the determinant of α· and denoted NrmE/F(α) := det(α·). Similarly, the trace of α is the trace of α·, TrE/F(α) := Tr(α·).
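For example (anticipating Section 10), let E = Q(√d) and F = Q with d a square-free integer. In the basis {1, √d} the map (a + b√d)· sends 1 ↦ a + b√d and √d ↦ db + a√d, so its matrix has determinant a² − db² and trace 2a; hence Nrm_{Q(√d)/Q}(a + b√d) = a² − db² and Tr_{Q(√d)/Q}(a + b√d) = 2a.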

⁹The reason for this strange notion is historical and it is still extremely important to this day. It traces back to something called “Kronecker's Jugendtraum” and concerns how one can construct “abelian extensions” of number fields using elliptic curves (or elliptic functions in the days of Kronecker). It would take me too far afield, totally disregarding that my competence is severely lacking here, to explain this. But I encourage you to do a Wikipedia search.

Remark 9.1. Authors tend to simplify notation whenever possible to avoid cumbersome and heavy notation, sometimes at the cost of absolute rigor, as you will no doubt find time and time again (if you haven’t already). I am no exception to this rule. As a consequence, I will often be rather careless not distinguishing between α as an element of E and its associated operator α· acting on E/F. Theorem 9.7. We have Tr(α + β) = Tr(α) + Tr(β) and Nrm(αβ) = Nrm(α)Nrm(β), so we get maps

TrE/F : E → F and NrmE/F : E \{0} → F \{0}, the first being linear and the second multiplicative.

Proof. This follows immediately from the definitions. □

Theorem 9.8. Let K/Q = Q(α)/Q be an algebraic field extension. Then
- The field polynomial Ψβ(z) of β ∈ K/Q is equal to the characteristic polynomial Pβ(z) of the map β·. In fact,
Ψβ(z) = Pβ(z) = irr(β,Q)^{[K/Q(β)]}.

- TrK/Q(β) = ∑ϕi∈Mor(K) ϕi(β). - NrmK/Q(β) = ∏ϕi∈Mor(K) ϕi(β). In particular, Ψβ (z) ∈ Q[z]. Proof. We put for simplicity n := [K/Q(β)] (recall that K = Q(α)) and m := [Q(β)/Q]. Let m m−1 irr(β,Q) := z + km−1z + ··· + k1z + k0 m− be the minimal polynomial of β over Q. Hence, {1,β,β 2,...,β 1} is a basis for Q(β)/Q, so that, by the tower law, 2 m−1 2 m−1 e := {e1,...,enm} := { f11, f1β, f1β ,..., f1β ;...; fn1, fnβ, fnβ ,..., fnβ } is a basis for K/Q, given that f := { f1,..., fn} is a basis for K/Q(β). The matrix of β· in m−1 each block of basis vectors { f j1, f jβ,... f jβ } is given by a matrix   0 0 0 ··· 0 −k0 1 0 0 ··· 0 −k1    0 1 0 ··· 0 −k2    0 0 1 ··· 0 −k3    ......  ......  0 0 0 ··· 1 −km−1 as is easily checked (do this!). The characteristic polynomial of each block is given by   −z 0 0 ··· 0 −k0  1 −z 0 ··· 0 −k1     0 1 −z ··· 0 −k2  det   0 0 1 ··· 0 −k3     ......   ......  0 0 0 ··· 1 −z − km−1 which expanded (at the last column, for instance) gives m m−1 z + km−1z + ··· + k1z + k0 = irr(β,Q), 34 D. LARSSON

n so Pβ (z) = irr(β,Q) since we have n such blocks. By the tower law, [Q(α)/Q] = [Q(α)/Q(β)][Q(β)/Q] = nm, we see that there are exactly n morphisms in Mor(Q(α)) restricting to the same morphism in Mor(Q(β)). This means that  [Q(β)/Q] n [Q(α)/Q] Pβ (z) = ∏ (z − ϕi(β)) = ∏ (z − ϕi(β)) = ∏ (z − ϕi(β)) = Ψβ (z), i=1 ϕi∈Mor(Q(α)) i=1 where the first equality follows by Corollary (9.5). Now, from Pβ (z) = Ψβ (z) and the relation between the trace, determinant and characteristic polynomial, follows TrK/Q(β) = ∑ϕi∈Mor(K) ϕi(β) and NrmK/Q(β) = ∏ϕi∈Mor(K) ϕi(β). The proof is finished.  9.3. Algebraic integers and rings of integers. We now want to prove properties of the set of algebraic integers Zalg. For one thing we want to prove that it is a subring of the field of algebraic numbers Qalg. But number theorists are maybe more interested, in a first instance at least, to better understand other (smaller) number fields. This is the reason for introducing and working with the following definitions instead. Definition 9.7. Let D be a domain properly contained in a field F (for instance its field of fractions). Then α ∈ F is said to be integral over D if there is a monic polynomial (not necessarily irreducible!) with coefficients in D, having α as a zero. We denote by FD the set of elements of F integral over D. Notice that, since every a ∈ D is the solution of z−a = 0, D ⊆ FD. Hence FD is certainly non-empty.

Theorem 9.9. Let D be a subring of a field F (and so, in particular, a domain). Then FD is also a subring (domain) of F. We will prove this theorem in two steps. The first step is the following proposition.

Proposition 9.10. An element α ∈ F belongs to FD if and only if there is a finitely gener- ated D-submodule M of F such that αM ⊆ M. n−1 Proof. Suppose that α ∈ F is integral over D. Then a01 + a1α + ··· + an−1α = 0 n−1 for some ai ∈ D. The D-submodule M generated by {1,α,...,α } satisfies αM ⊆ M. Conversely, suppose that αM ⊆ M for some finitely generated D-submodule of F. Let m1,m2,...,mn be the generators. Then αmi = ∑ j ωi jm j, or re-written in matrix-form:      (ω11 − α) ω12 ··· ω1n m1 0  ω21 (ω22 − α) ··· ω2n m2 0    =  .  . .. .  .  .  . . .  .  . ωn1 ··· (ωnn − α) mn 0 Let Ω be the matrix on the coefficient matrix on the left. By Cramer’s rule we see that det(Ω)m j = 0 for all j. Since F is a field, and in particular a domain, and not all m j = 0, we must have det(Ω) = 0. Expanding this determinant yields a polynomial in α and so α is thus integral over D.  The second step towards Theorem 9.9 is the following proposition. Proposition 9.11. Let α and β be two elements of F integral over D and M and N two finitely generated D-submodules of F such that αM ⊆ M and βN ⊆ N. Then, the set M · N = {mn | m ∈ M, n ∈ N} is a finitely generated D-submodule of F, invariant under multiplication by αβ and α ± β. 35

Proof. Clearly M · N is a D-submodule of F. Further, if e is a generating set for M and f a generating set for N, then e · f := {e1 f1,e1 f2,...,ei f j,...,en fm} is a generating set for M · N. Equally clear is it that (αβ)M · N ⊆ M · N and (α ± β)M · N ⊆ M · N by the corresponding properties of M and N.  Proof of Theorem 9.9. Apply Proposition 9.10 to elements from Proposition 9.11.  Integral closure.

Definition 9.8. Let D be a domain contained in a field F. Then the set of elements FD is called the integral closure of D in F. The integral closure of Z in a number field K/Q is called the ring of integers of K/Q and is denoted oK.

Definition 9.9. The domain D is called integrally closed if (Frac(D))D = D, that is, every element of Frac(D) integral over D, is already in D. We have the following nice result and eye-candy for a proof. Theorem 9.12. Every UFD is integrally closed. Proof. Let a/b ∈ Frac(D) be integral over D. Hence n n−1 (a/b) + pn−1(a/b) + ··· + p1(a/b) + p0 = 0, for pi ∈ D. Multiply this relation with bn, to get n n−1 n−2 2 n−1 (9.1) a + pn−1a b + pn−2a b + ··· + p1ab + p0b = 0. If b were a unit then a/b would already be in D, so suppose it is not. There is an irreducible element p being a factor in b but not in a (why?). This means that, p is a factor in every term of (9.1) except the first. However, this shows that p|an, and so p is a factor in a, contrary to assumption.  Proposition 9.13. Let α ∈ F/Frac(D) be algebraic, where D is an integral domain. Then there is a d ∈ D such that dα ∈ FD. Proof. Since α is algebraic over Frac(D), we have that n n−1 α + an−1α + ··· + a1α + a0 = 0, with ai ∈ Frac(D).

This means that every ai is on the form ai = bi/ci, for bi,ci ∈ D, 1 ≤ i ≤ n − 1. Put n d := c0c1 ···cn−1, and multiply the above equation by d . This gives n 0 n−1 0 0 0 (αd) + bn−1(αd) + ··· + b1(αd) + b0 = 0, with bi := bic0 ···cbi ···cn−1 ∈ D, where cbi means that ci is omitted. The result follows.  Theorem 9.14. Let E/Frac(D) be a finite (algebraic) extension of the field of fractions of D and assume that D is integrally closed. An element α ∈ E is integral over D if and only if irr(α,Frac(D)) ∈ D[z].

Proof. That irrα := irr(α,Frac(D)) ∈ D[z] ⇒ α integral over D, is obvious, so let us show the other implication, assuming that α is integral over D. Hence n n−1 α + pn−1α + ··· + p1α + p0 = 0, pi ∈ D.

Let β be any other zero of irrα. Then there is a Frac(D)-isomorphism
L : Frac(D)(α) → Frac(D)(β)
such that L(α) = β. Applying L to the equation above shows that β is also integral over D, and so this is indeed the case for every Frac(D)-conjugate of α (i.e., every zero of irrα). By a famous theorem of Newton, every coefficient of a monic polynomial P is itself a (symmetric) polynomial in the zeros of P. Hence, all the coefficients of irrα are integral over D by Theorem 9.9, and since D is integrally closed, these coefficients are in D. □

Proposition 9.15. Let D be integrally closed and assume that E is a finite extension of Frac(D). If a ∈ E is integral over D, then TrE/Frac(D)(a),NrmE/Frac(D)(a) ∈ D.

Proof. If a is integral, then so are all of its conjugates (as in the proof of Theorem 9.14). Now apply Theorem 9.8. □

We have the following lemma:

Lemma 9.16. Let M be a free Z-module of rank n with basis e := {e1,...,en} and suppose ω := (ωi j) is a matrix with integer entries. Then f := ωe is a basis for M if and only if det(ω) = ±1.

Proof. Suppose that e and f = ωe are both basis sets. This implies that ω is invertible over Z. So, det(ω)det(ω⁻¹) = 1, and since both determinants are integers we must have det(ω) = ±1 by (17.1). Conversely, suppose that det(ω) = ±1. Then det(ω) is invertible in Z, and by (17.1) once again, ω⁻¹ has only integer entries. From the fact that det(ω) ≠ 0, we see that the elements of f are linearly independent (check this!) and so f = ωe is also a basis, since f has the same cardinality as e. □

The following theorem is a generalization of the theorem from group theory stating that a subgroup of a cyclic group is also cyclic. The proof is a bit tricky but since the result is important it is well worth the effort of understanding the details.

Theorem 9.17. Let M be a free Z-module and m ⊆ M a submodule. Then m is free of rank rk(m) ≤ rk(M). Furthermore, there is a basis e of M such that the basis of m can be choosen as {α1e1,...,αrk(m)erk(m)}, for αi ∈ N, where {e1,...,erk(M)} is a basis for M. Proof. The proof is by induction, the case rk(M) = 1 being the case of cyclic groups. Assume then that f is a basis for M. For every m ∈ m we have

T m = (α1,...,αn)f = α1 f1 + ··· + αn fn, where n := rk(M) and T denotes the transpose. If m = 0 the result is trivial so assume otherwise. This means that there is at least one m ∈ m such that at least one αi is different from zero. Let

S(m,f) := {α ∈ N | α is a coordinate for some m ∈ m}, and put ξ := ξm(f) := minf(S(m,f)). In words, ξ is the least positive integer, taken over all basis sets for M, such that ξ is a coordinate for an element of m. Now, choosing the basis e such that ξ is minimal, form the element

a := ξe1 + α2e2 + ··· + αnen is an element of m. By the division algorithm we have αi = qiξ + ri, 0 ≤ ri < ξ and 2 ≤ i ≤ n. Form the element

b := e1 + q2e2 + ··· + qnen 37 and consider the set g := {b,e2,...,en}. The matrix   1 q2 q3 ··· qn  .  0 1 0 .    ω := . .  . ..     1 0  0 ··· 0 1 has determinant 1 and g = ωe so g is also a basis by the above Lemma 9.16. In the basis g the element a becomes a = ξb + r2e2 + ··· + rnen, but by the minimality over all basis sets of ξ and the fact that ri < ξ for all i, leads to ri = 0 and so a = ξb. Put hai := Za, the cyclic Z-submodule of M generated by a, and T n := {(0,β2,...,βn)g }.

Obviously, hai∩n = {0}. Take m ∈ m. We know that m = ε1b+ε2e2 +···+εnen. Reducing 0 0 0 ε1 modulo ξ via the division algorithm we get ε1 = q ξ + r , r < ξ. So, since a = ξb, we get 0 0 0 m − q ξb = m − q a = r b + ε2e2 + ··· + εnen. By the minimiality of ξ once more, we get r0 = 0 and so m − q0a ∈ n which implies that m ∈ n + hai. Hence m = n ⊕ hai. By the induction hypothesis n is a free Z-submodule of M, with rk(n) ≤ n − 1 and so m is a free Z-submodule with rk(m) ≤ n rank less than n.  Theorem 9.18. Suppose M is a free Z-module with rank m, and m ⊆ M a submodule of rank n. Then M/m is finite if and only if m = n, in which case (M : m) = |M/m| = det(ω), where f = ∑ωe, for e a basis for M and f a basis for m. Proof. From the previous Theorem we see that m is free of rk(m) ≤ rk(M) and the first part of the theorem follows. If rk(m) = rk(M) then there are a1,...,an ∈ N such that 0 0 0 0 f = diag(a1,...,an)e for some basis sets f and e of m and M, respectively. Hence, we 0 0 0 0 get e = ωee, f = ω f f with det(ωe) = det(ω f ) = ±1 since e, e , f and f are all basis sets. Clearly, det(diag(a1,...,an)) = a1 ···an and this is the number of elements in M/m. Also, ω = ω f diag(a1,...,an)ωe, so det(ω) = a1 ···an and the proof is finished.  Discriminants. We now need to be a little more general than previously. Recall how field extensions were defined. We now do the same construction for rings. Indeed, we could define a ring extension simply as a ring injection S ,→ R. Normally this is way to general to be of much use, and so also for us. Therefore, we define a ring extension as a ring injection S ,→ R such that R is a free A-module of rank r. This means that there is a basis e := {e1,...er} for R such that r R = Se1 ⊕ Se2 ⊕ ··· ⊕ Ser, that is, ∀r ∈ R, r = ∑ siei, si ∈ S. i=1 That R is an extension of S is denoted as R/S. Notice that, as in the case of field extensions, this notation can hardly be confused with quotient rings since S is (in general) only a subring of R, not an ideal. Now, we define a symmetric, non-degenerate, bilinear S-form (see 17.2.3) by

T(·,·) : R × R → S, (x, y) ↦ Tr_{R/S}(xy).

Definition 9.10. Let R/S be a ring extension as defined above. Then the discriminant of e is defined as Disc(e) := det(T(ei,e j)) = det(TrR/S(eie j)).

Continue to let e be a basis for R/S and form f := σ ·e, where σ = (σi j) is a matrix with n entries in S, that is, f j = ∑i=1 σ jiei. This changes the discriminant as 2 (9.2) Disc(f) = det((σi j)) Disc(e). Hence, Disc(e) is only unique up to multiplication of the square of a unit in S. On the other hand, the ideal that it generates, Disc(R/S) := hDisc(e)i, is well-defined. This is called the discriminant of R/S. Notice that (9.2) shows that if f is not a basis, then Disc(f) = 0. This discussion almost proves the following proposition.

Proposition 9.19. Let notation be as above. If Disc(R/S) 6= h0i then {g1,...,gr} is a basis for R/S as an S-module if and only if

Disc(R/S) = hDisc({g1,...,gr})i.

Proof. This follows from (9.2) and the fact that g := {g1,...,gr} is a basis if and only if det(ω) is a unit, where g = ωe.  If S = Z, which will be our only concern here, then the discriminant is fully unique since the only square of a unit in Z is 1.

Corollary 9.20. Let S = Z. Then f = { f1,..., fm} generates a Z-submodule r ⊆ R of finite index if and only if Disc(f) 6= 0. In that case, 2 Disc(f) = (R : r) Disc(R/Z). Proof. Follows from the above Proposition and Theorem 9.18.  Proposition 9.21. Let E/F be an extension of number fields (i.e., both E and F are exten- sions of Q and E is an extension of F) with F-basis f := { f1,..., fn}. Then 2 Disc({ f1,..., fn}) = det(ϕk( fi)) 6= 0, where ϕk ∈ Mor(E). Proof. The first part follows by direct computation:

Disc({ f1,..., fn}) = det(TrE/F( fi f j)) = det(∑ϕk( fi f j)) = det(∑ϕk( fi)ϕk( f j)) k k 2 = det(ϕk( fi))det(ϕk( f j)) = det(ϕk( fi)) .

If Disc(f) = 0, then the matrix ϕk( fi) is not-invertible. It is not hard to see that this is impossible when f is a basis.  9.4. Integral bases.

Definition 9.11. Let K/Q be an algebraic number field. Then a set {ω1,...,ωm} ⊂ K is called an integral basis for K/Q (or oK) if oK is a free Z-module of rank m with basis {ω1,...,ωm}, i.e., oK = Zω1 ⊕ Zω2 ⊕ ···Zωm, as a Z-module.

This definition means in particular that every a ∈ oK can be written uniquely as

a = a1ω1 + a2ω2 + ··· + amωm, with ai ∈ Z. 39

Rings of integers are finitely generated.

Theorem 9.22. Every algebraic number field K/Q has an integral basis, i.e., the ring of integers oK is a free Z-module. Furthermore, rk(oK) = [K/Q].

Proof. The idea of the proof is to wedge oK between two Z-modules of rank [K/Q]. Let ω := {ω1,...,ωk} be a basis for K/Q. By a previous theorem we can assume that ωi ∈ oK since otherwise we can replace it with dωi for some d ∈ Z. The bilinear map T(x,y) := TrK/Q(xy) is symmetric and non-degenerate. Therefore, by ∨ ∨ ∨ ∨ 17.3.1, there is a T-dual basis ω := {ω1 ,...,ωk } such that ωi (ω j) = δi j. We will show that ∨ ∨ ∨ Zω1 ⊕ Zω2 ⊕ ··· ⊕ Zωk ⊆ oK ⊆ Zω1 ⊕ Zω2 ⊕ ··· ⊕ Zωk . The first inclusion is obvious, so we only need to prove the second. ∨ ∨ Every a ∈ oK can be written as a = a1ω1 + ··· + akωk , with ai ∈ Q. We want to show that, in fact, ai ∈ Z. Since a,ωi ∈ oK, 1 ≤ i ≤ k, we have a · ωi ∈ oK as well. Therefore, T(a,ωi) ∈ Z as Z is integrally closed by Proposition 9.15. But, k ∨ T(a,ωi) = TrK/Q(a · ωi) = TrK/Q(∑ a jω j ωi) j=1 k k ∨ = ∑ a j TrK/Q(ω j ωi) = ∑ a j TrK/Q(δi j) = a j, j=1 j=1 implying that a j ∈ Z. The theorem now follows since we have that oK is a Z-submodule of a free Z-module of rank k and oK also includes a free Z-module of rank k.  9.5. Computing rings of integers. In general, it is quite difficult to compute the rings of integers of a given algebraic number field. There are a number of algorithms of various complexity implemented in several computer algebra programs, but going into that would take us too far. Hence we will have to be satisfied with resorting to some general tricks and folklore guesses. The main tool for us will be computing discriminants. Since discriminants are defined using determinants and determinants are notoriously hard to compute for larger matrices, we need some other means. We start this section with a result that is often helpful in this endeavor. 9.5.1. Computing discriminants.

Theorem 9.23. Let K/Q = Q(α)/Q be an algebraic number field, where α has minimal polynomial irr = irr(α,Q) of degree n. Then
Disc({1, α, ..., αⁿ⁻¹}) = (−1)^{n(n−1)/2} Nrm_{Q(α)/Q}(∂irr(α)),
where ∂irr(z) := d irr/dz.

Definition 9.12. Let f(z) be an arbitrary polynomial over a field F. Then the discriminant of f in α is
Disc(f)(α) := (−1)^{n(n−1)/2} Nrm_{E/F}(∂f(α)), for α ∈ E/F, n := [E/F] < ∞.
If E = F(α) then Disc(f) := Disc(f)(α) is simply called the discriminant of f. Notice that the discriminant can be viewed as a polynomial, evaluated at α. This polynomial can also be called the discriminant.

Proof. Recall (see Proposition 9.21) that the discriminant of a set of elements {ω1,...,ωn} is given by j Disc({ω1,...,ωn}) := det(ϕ j(ωi)) = det(ωi ), j where ϕ j ∈ Mor(K) and ωi := ϕ j(ωi). From 17.7 follows n−1 i 2 i 2 2 (9.3) Disc({1,α,...,α }) = det(ϕ j(α )) = det(α j) = ∏(αi − α j) , i< j i i i where α j := ϕ j(α ) = (ϕ j(α)) . The minimal polynomial of α factorizes (in C, for in- n stance) as irr(z) = ∏ j=1(z − α j) and so differentiating gives n n n−1 ∂irr(z) = ∑ ∏(z − α j) =⇒ ∂irr(αi) = ∏(αi − α j). i=1 j=1 i=1 i6= j i6= j Therefore, n n n ∏∂irr(αi) = ∏∏(αi − α j). i=1 i=1 j=1 i6= j

In the left-hand side we recognize NrmQ(α)/Q(∂irr(α)); in the right-hand side each factor appears twice: once as (αi − α j) and once as (α j − αi). Hence, n n l 2 ∏∏(αi − α j) = (−1) ∏(αi − α j) . i=1 j=1 i< j i6= j n Lastly, a simple combinatorial argument shows that l = 2 . The result follows from this together with (9.3).  n Theorem 9.24. Let the minimal polynomial over Q of α be irr(α,Q) = z + az + b. Then n−1 (n) n n−1 n−1 n−1 n Disc({1,α,...,α }) = Disc(irrα ) = (−1) 2 n b + (−1) (n − 1) a . n−1 n −1 Proof. Put θ := ∂irrα (α) = nα + a. Multiplying α + aα + b = 0 by nα we get nαn−1 = −na − nbα−1 and so −nb θ = −(n − 1)a − nbα−1 =⇒ α = . θ + (n − 1)a

From this we see that Q(α) = Q(θ) so deg(irr(θ,Q)) = deg(irrα ) = n. Writing  −nb  irr = g(z)/h(z), α z + (n − 1)a we see that g(θ)/h(θ) = irr(α) = 0 and so g(θ) = 0. I claim that this is the minimal n polynomial of θ. Indeed, using that irrα (z) = z + az + b we get that g(z) = (z + (n − 1)a)n − na(z + (n − 1)a)n−1 + (−1)nnnbn−1. Since this polynomial has degree n and is monic it has to be the minimal polynomial of θ. Therefore, the norm of θ is (−1)n times the degree-zero term in g(z), so

9.23 n n−1 (2) Disc({1,α,...,α }) = (−1) NrmQ(α)/Q(∂irrα (α)) = n n (2) (2) n n−1 n n n n−1 = (−1) NrmQ(α)/Q(θ) = (−1) (−1) (−(n − 1) a + (−1) n b ) = n n n− n− n− n = (−1)(2)(n b 1 + (−1) 1(n − 1) 1a ), 41 proving the theorem.  9.5.2. Discriminants and bases.

Theorem 9.25. Let K/Q be an algebraic number field with ring of integers oK. Suppose that m ⊆ oK is a Z-submodule of oK with Z-basis {f1,...,fk}, k := [K/Q]. Then |oK/m|² is a divisor of Disc({f1,...,fk}).
Proof. This is a direct consequence of Theorem 9.17 and Theorem 9.18 with M = oK. □

Corollary 9.26. With notation as in the previous theorem, if Disc({ f1,..., fk}) is square- free, then { f1,..., fk} is a basis for oK. Theorem 9.27. Let K/Q be a number field with ring of integers oK. Suppose that [K/Q] M m = Z fi ( oK, (notice the strict inclusion), i=1 where rk(m) = rk(oK) = [K/Q]. Then there is an element d ∈ oK given by −1 (9.4) d := p (d1 f1 + d2 f2 + ··· + dk fk), where k := [K/Q], 0 ≤ di ≤ p−1, are rational integers (i.e., di ∈ Z for all i) and p a rational 2 prime, such that p |Disc({ f1,..., fk}). Proof. The assumption m ( oK implies that |oK/m| > 1. By Theorem 9.18 follows that r1 rk |oK/m| = p1 ··· pk and so there is a prime p := pi (1 ≤ i ≤ k) dividing |oK/m|. By the the- orem of Cauchy, or Sylow’s first theorem, there is a subgroup of oK/m of order p generated by one element d ∈ oK/m. Therefore, pd ∈ m. Let { f1,..., fk} be the basis for m. Then pd = ε1 f1 + ··· + εk fk. To finish the proof we need to show that εi can be chosen ≤ p − 1. By Theorem 9.17 a basis for m can be chosen such that mi = αiei, for αi ∈ N, 1 ≤ i ≤ k, and where {e1,...,ek} is a basis for oK. The element pd can be written, on the one hand, as pd = pid1e1 + ··· + pidkek, and on the other as, pd = ε1α1e1 + ··· + εkαkek. Suppose ε j ≥ p for some 1 ≤ j ≤ k. Reducing this we get ε j = q j p + r j for 0 ≤ r j < p. Then

ε1α1e1 + ··· + ε jα je j + ··· + εkαkek =

= ε1α1e1 + ··· + q j pα je j + r jα je j + ··· + εkαkek, and so

pd1e1 + ··· + pd je j + ··· + pdkek =

= ε1α1e1 + ··· + q j pα je j + r jα je j + ··· + εkαkek ∈ m which is equivalent to 0 pd := pd1e1 + ··· + p(d j − q jα j)e j + ··· + pdkek =

= ε1α1e1 + ··· + r jα je j + ··· + εkαkek ∈ m, finishing the proof.  9.5.3. An algorithm of sorts. The following is an informal algorithm to compute the ring of integers in a number field.

• Guess the basis B := {β1,...,βn};
• Compute Disc(B);
• ∀p ∈ Spec(Z) : p²|Disc(B), is (9.4) an algebraic integer?;
• If 'yes' add this to B;
• Repeat until no more integers are found.
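To illustrate the "Compute Disc(B)" step for a power basis B = {1, α, ..., αⁿ⁻¹}, here is a small Python sketch of mine (the helper name power_basis_disc is hypothetical, and it works with floating-point zeros of the minimal polynomial, so it is only a rough numerical aid and not the exact routines found in computer algebra systems). It evaluates formula (9.3), i.e., the product of (αi − αj)² over i < j.

    import numpy as np
    from itertools import combinations

    def power_basis_disc(coeffs):
        """Approximate Disc({1, a, ..., a^{n-1}}) for a zero a of the monic
        integer polynomial with the given coefficients (highest degree first),
        using the product of (a_i - a_j)^2 over i < j, rounded to an integer."""
        zeros = np.roots(coeffs)
        disc = 1.0
        for r, s in combinations(zeros, 2):
            disc *= (r - s) ** 2
        return round(disc.real)

    print(power_basis_disc([1, 0, -1, -1]))  # z^3 - z - 1 -> -23  (cf. Example 9.1 below)
    print(power_basis_disc([1, 0, 0, -2]))   # z^3 - 2     -> -108 (cf. Example 9.2 below)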

9.5.4. Stickelberger’s theorem.

Theorem 9.28. Let K/Q be an algebraic number field with ring of integers oK. Then Disc(oK/Z) ≡ 0 or 1 (mod 4).
The only proof I know of this theorem, although quite simple, involves Galois theory so I won't include it here. Those who know some Galois theory might find the proof instructive and are encouraged to look it up in the literature.

9.6. Examples. I will show two examples here on how one can compute rings of integers. The examples are not mine10, so don’t be too impressed by the apparent ingenuity, it is certainly not mine (if I ever possessed it).

Example 9.1. Let Q(α)/Q be an algebraic field extension with α a zero (any) of the polynomial f (z) = z3 − z − 1. This polynomial is monic and irreducible over Q because if it factored we would have f (z) = (z − a)g(z), deg(g) = 2 and a ∈ Q. Any rational zero of f would have to divide 1 so a = ±1. But neither of these is a zero of f . The discriminant is given by Theorem 9.24 and is computed as

Disc(f) = Disc({1, α, α²}) = (−1)^{3(3−1)/2}(3³·(−1)^{3−1} + (−1)^{3−1}(3 − 1)^{3−1}(−1)³) = −23.
Since this is square-free we get that {1, α, α²} is a basis for o_{Q(α)}, i.e.,
o_{Q(α)} = Z1 ⊕ Zα ⊕ Zα².
Notice that this is true for whichever zero of f we take.

Example 9.2. Let K/Q be Q(∛2)/Q. We want to determine o := o_{Q(∛2)}. The minimal polynomial of α := ∛2 over Q is irrα := z³ − 2. This shows that there are three field morphisms

ϕ1 := id : α 7→ α

ϕ2 : α 7→ α2 := ωα 2 ϕ3 : α 7→ α3 := ω α, √ where ω := e2π −1/3. Recall that the field morphisms map the algebraic element to its conjugates, i.e., the other zeros of its minimial polynomial. We make an initial guess that the Z-basis for o is {1,α,α2}. Compute the discriminant: 2  1 α α2  2 2 2 2 Disc({1,α,α }) = det(ϕi(α j)) = det1 ωα ω α  = 1 ω2α ωα2  1 1 1 2 6 2 2 2 2 3 3 = α det1 ω ω2 = 2 · 3 (ω − ω) = −2 · 3 (remember ω = 1). 1 ω2 ω Hence, by Theorem 9.27 we need to consider the possibilities: are 1 2 • θ := 2 (a11 + a2α + a3α ), 0 ≤ ai ≤ 1, or 0 1 2 • θ := 3 (b11 + b2α + b3α ), 0 ≤ bi ≤ 2,

¹⁰They are standard examples, appearing more or less in every book on the subject.

3 Tr(θ) = ∑ fi(θ) = f1(θ) + f2(θ) + f3(θ) = i=1 1 1 = θ + (a 1 + a ωα + a ω2α2) + (a 1 + a ω2α + a ω4α2) = 2 1 2 3 2 1 2 3 3 1 1 3 = a 1 + a (1 + ω + ω2)α + a (1 + ω2 + ω)α2 = a 1. 2 1 2 2 2 3 2 1 3 Recall that if θ is an algebraic integer, then Tr(θ),Nrm(θ) ∈ Z. Hence, the trace 2 a1 has to be in Z so a1 ∈ 2Z if θ is to be an algebraic integer. 1 2 In that case, θ1 := 2 (a2α + a3α ) also have to be an algebraic integer since θ1 = θ − 1 2 a1. Now, take the norm of θ1: 3 3 3 3 2 Nrm(θ1) = 2 ∏ fi(θ1) = 2 f1(θ1) f2(θ1) f3(θ1) = i=1 2 2 2 2 = θ1(a2ωα + a3ω α )(a2ω α + a3ωα ) = 2 2 2 2 4 2 = (a2α + a3α )(a2ωα + a3ω α )(a2ω ω α ) = 3 3 2 = ω α (a2 + a3α)(a2 + a3ωα)(a2 + a3ω α) = 3 3 2 2 2 2 2 3 3 3 = α (a2 + a2a3(ω + ω + 1)α + a2a3ω(ω + ω + 1)α + a3ω α ) = 3 2 3 = (αa2) + (α a3) , and since α3 = 2, we get 1 a3 + 2a3 Nrm(θ ) = (2a3 + 4a3) = 2 3 ∈ . 1 23 2 3 4 Z

However, we demanded that 0 ≤ a2,a3 ≤ 1, so the above condition is clearly impossible unless a2 = a3 = 0. Therefore, there are no algebraic integers on the first form θ. 0 0 For θ we get Tr(θ ) = b11 and this is clearly an algebraic integer (since b1 ∈ Z). How- 0 0 0 ever, this doesn’t help us much since the difference θ1 := θ − b1 is no better than θ (in fact it is slightly worse). So we compute the norm of θ 0 directly:

1 Nrm(θ 0) = (b + b α + b α2)(b + b ωα + b ω2α2)(b + b ω2α + ω4α2) = 33 1 2 3 1 2 3 1 2 = [after a lot of calculating] = 1 1 = (b3 + b3α3 + b3α6) = (b3 + 2b3 + 4b3). 33 1 2 3 33 1 2 3 Contemplating this for a while, or doing some brute-force calculations, one realizes that this cannot be an integer for 0 ≤ b1,b2,b3 ≤ 2 unless b1 = b2 = b3 = 0. Therefore, we have shown that no algebraic integers exists that cannot be written in the basis {1,α,α2} and so 2 o √3 = Z1 ⊕ Zα ⊕ Zα . Q( 2) Notice that we could have simplified the computation of the discriminant considerably by using Theorem 9.24, but I wanted to show you the hard-core version also. 44 D. LARSSON

10. QUADRATIC NUMBER FIELDS

Let d be a square-free integer, that is, if d = p1^{r1} ··· pk^{rk} then we must have r1 = ··· = rk = 1. The algebraic number field Q(√d) is called a quadratic number field. Clearly α := √d has minimal polynomial irrα(z) = z² − d and so the extension Q(α)/Q is of degree two. Any element of Q(α) = Q(√d) can be written as a + b√d = a + bα, for a, b ∈ Q.
Despite their innocent appearance, quadratic number fields have a deep theory and still harbor a lot of secrets. One could easily build a whole semester long course on quadratic number fields and still not get very far. Let me also, here in the beginning, mention that quadratic number fields have found applications outside mathematics, in cryptography (factorizing large integers for instance), information technology etc.
The first natural question is: what are the rings of integers?

10.1. Ring of integers of quadratic number fields.

Theorem 10.1. Let K/Q := Q(√d)/Q be a quadratic number field. Then
oK = Z1 ⊕ Z√d, if d ≡ 2, 3 (mod 4);
oK = Z1 ⊕ Z(1 + √d)/2, if d ≡ 1 (mod 4).
(Notice that d ≡ 0 (mod 4) is not allowed since then d wouldn't be square-free.)

Proof. We have that Disc(z² − d) = 4d by Theorem 9.24. Suppose first that d ≡ 2, 3 (mod 4). By Stickelberger's theorem we have that Disc(oK/Z) is either congruent to zero or one modulo four. By Corollary 9.20, we must have
Disc({1, √d}) = Disc(irrα) = (oK : Z[√d])² Disc(oK/Z),
where Z[√d] denotes the Z-submodule of oK generated by the basis {1, √d}. Now, if Disc(oK/Z) ≡ 1 (mod 4) then Disc(oK/Z) = 4k + 1 for some k ∈ Z. Hence,
4d = Disc({1, √d}) = (oK : Z[√d])² (4k + 1).
Since d is square-free and d ≡ 2, 3 (mod 4), it is easy to see that we get a contradiction. Therefore, Disc(oK/Z) ≡ 0 (mod 4) and so (oK : Z[√d])² = 1. Clearly, if the index is one, oK = Z[√d], and the first case is taken care of.
For the other case, d ≡ 1 (mod 4), note first that {1, √d} cannot be an integral basis: Q(√d) = Q((1 + √d)/2) and (1 + √d)/2 is integral, since it is a zero of the polynomial z² − z + (1 − d)/4, but it does not lie in Z1 ⊕ Z√d. We have Disc({1, (1 + √d)/2}) = d. Write d = 4n + 1 and put c := (oK : Z[(1 + √d)/2]), so that d = c² Disc(oK/Z) by Corollary 9.20. If Disc(oK/Z) ≡ 0 (mod 4) (Stickelberger) then 4n + 1 = 4k·c² for some k, which is impossible. On the other hand, if Disc(oK/Z) ≡ 1 (mod 4) then 4n + 1 = (4k + 1)c², which, d being square-free, is possible only if c = 1. □

10.2. The Ramanujan–Nagell theorem. The topic of this subsection is a fine illustration of the dogma that “elementary ≠ easy”. I am going to prove the Ramanujan–Nagell theorem, a result that was conjectured by Srinivasa Ramanujan (1887–1920, self-taught math genius) and proved by Trygve Nagell (1895–1988) in 1948, using in an ingenious way the unique factorization of the ring of integers of Q(√−7). Nagell was Norwegian but professor at Uppsala University from 1931 to his retirement 1962¹¹.

11See his obituary by famous British number-theorist John Cassels (born 1922) at http://www.numbertheory.org/obituaries/AA/nagell/ . 45

Theorem 10.2 (Ramanujan–Nagell). The only solutions to the equation ± z = 1, 3, 5, 11, 181 z2 + 7 = 2n in , are Z n = 3, 4, 5, 7, 15. Proof. Clearly, z must be odd, so assume this in addition to z > 0. If n is even we have: 2 n n n z + 7 = 2 ⇐⇒ (2 2 − z)(2 2 + z) = 7. n n This implies that 2 2 + z = 7 (or the other one, since 7 is a prime) and 2 2 − z = 1. Hence, n 2 · 2 2 = 8 =⇒ n = 4 and so, z = 3. Therefore, from now on we assume that n is odd and n > 3. Now, we have the following factorization into irreducibles √ √ 1 + −71 − −7 (10.1) 2 = . 2 2 Since z = 2k + 1, z2 + 7 is divisible by 4 and so we can re-write the original equation as z2 + 7 (10.2) = 2n−2. 4 Put m := n − 2. Then (10.1) and (10.2) can be combined to the factorization √ √ √ √ z + −7z − −7 1 + −7m1 − −7m (10.3) = . 2 2 2 2

Clearly, the√ right-hand side√ is a factorization into irreducibles (from (10.1)). Any common z+ −7 z− −7 √ factor of 2 and 2 must divide their difference, which is −7. Suppose √ √ 1 ± −7 √ √ 1 ± −7 −7 ⇐⇒ −7 = q. 2 2 Taking the norm of this, yields √ 1 ± −7  1 1  1 + 7 7 = Nrm Nrm(q) = + 7 Nrm(q) = Nrm(q) = 2Nrm(q), 2 22 22 4 √ 1± −7 √ which is obviously impossible. Therefore, 2 6 −7. Since neither does this, we deduce that there can be no common factors. Fact 10.3. The only units in o √ are ±1. Q( −7) √ √ √ √ 1+ −7 z− −7 z− −7  1+ −7  If 2 2 then 2 = 2 k for some k ∈ Z. But this is equivalent to √ √ √ z − −7 = k + k −7 ⇐⇒ z = k + (k + 1) −7 which is a rational integer only if k = −1, leading to z = −1, contrary to assumption. Hence, √ √ 1 + −7 z + −7 =⇒ 2 2 √ √ √ √ 1 ± −7m z ± −7 z ± −7 1 ± −7m ⇐⇒ = , 2 2 2 2 the last equivalence following from the above Fact 10.3. 46 D. LARSSON √ √ 1+ −7 1− −7 Put α := 2 and β := 2 . From the last equivalence above we get √ √ √ 1 + −7m 1 − −7m (10.4) ± −7 = − ⇐⇒ ±(α − β) = αm − β m. 2 2 We have α2 ≡ (1 − β)2 ≡ 1 (mod β 2) which follows since α + β = 1. From this we get m 2 m−1 2 α = α(α ) 2 ≡ α (mod β ) and so, assuming the sign positive in (10.4), α − β = αm − β m ≡ α − β m (mod β 2), implying that β ≡ β m (mod β 2) which is not the case. Hence the sign must be negative in (10.4). By expanding (10.4) with the binomial formula we deduce:           m−1 m m m 2 m 3 m m−1 (10.5) −2 = − 7 + 7 − 7 + ··· ± 7 2 . 1 3 5 7 m Since all the terms on the right except the first one is divisible by 7, we have m (10.6) −2m−1 ≡ = m (mod 7). 1 Using that, 26 ≡ 1 (mod 7) it is quite easy to see that (by trying every possibility if neces- sary), modulo 42, the only solutions to (10.6) are m = 3, 5, 13 (mod 42). We shall show that m = 3, 5, 13 are the only remaining possibilities for solving the original problem. That 3, 5, 13 are solutions follows by construction or by simply checking. Suppose first that m ≡ 3 (mod 42). We will show that this is impossible unless m actually is 3. So, (10.7) m ≡ 42 ⇐⇒ m − 3 = 7q · 6 · k, for some k ∈ Z, 7 6 |k. We have,  m  m(m − 1)(m − 2)(m − 3) m − 4 7` = 7`. 2` + 1 (2` + 1)(2`)(2` − 1)(2` − 2) 2` − 3 Since 7`−1 > 2` − 1, we have that this is divisible by 7q+1, for ` > 1. Therefore, from (10.5) follows that 7 (10.8) −2m−1 ≡ m − m(m − 1)(m − 2)(mod 7q+1) 6 and so, −2m−1 ≡ −4 ≡ m − 7 (mod 7q+1) where the last congruence comes from combining (10.7) and (10.8). But this implies that m − 3 ≡ 0 (mod 7q+1), thereby contradicting the assumption on k. Hence m = 3. The next case m ≡ 5 (mod 42) follows with a similar method. Now,  m  m(m − 1)(m − 2)(m − 3)(m − 4)(m − 5) m − 6 7` = 7`, 2` + 1 (2` + 1)(2`)(2` − 1)(2` − 2)(2` − 3)(2` − 4) 2` − 5 leading to −25−1 ≡ −16 ≡ m − 70 + 49 (mod 7q+1), 47 implying that m − 5 ≡ 0 (mod 7q+1), a contradiction. q A little messier is the last case. Suppose m − 13 = 7 · 6 · k, for some k ∈ Z, 7 6 |k. Here,  m  m(m − 1)(m − 2)···(m − 13) m − 14 7` = 7`, 2` + 1 (2` + 1)(2`)(2` − 1)···(2` − 12) 2` − 13 and so, for ` > 6, since 7`−2 > 2` + 1, this is divisible by 7q+1· Hence,

− 213−1 ≡ −4096 ≡ m m m m  m   m  ≡ m − 7 + 72 − 73 + 74 − 75 + 76 (mod 7q+1). 3 5 7 9 11 13 Replacing m in this by 13 + 7q · 6 · k in all terms divisible by 7 gives, −4096 ≡ m − 4109 (mod 7q+1) ⇐⇒ m − 13 ≡ 0 (mod 7q+1), a contradiction, thus finishing the proof. 
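As a quick numerical sanity check of Theorem 10.2 (a finite search only, of course, and no substitute for the argument above), the following Python sketch looks for the non-negative solutions of z² + 7 = 2ⁿ with n up to 200.

    from math import isqrt

    for n in range(1, 201):
        m = 2 ** n - 7
        if m > 0:
            z = isqrt(m)
            if z * z == m:
                print(f"z = {z}, n = {n}")
    # Prints exactly (z, n) = (1, 3), (3, 4), (5, 5), (11, 7), (181, 15).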

11. DIRICHLET’SUNITTHEOREM 11.1. Roots of unity. Let K be a field and throughout below, n denotes an integer rel- atively prime to the characteristic (if this is different from zero). We recall some basic facts: (i) Any polynomial f ∈ K[z] of degree n has at most n zeros; n (ii) a cyclic group C is a group generated by one element, i.e., C = {a | n ∈ Z}; the n order of C is the least n ∈ Z>0 such that a = e (where e is the identity); every cyclic group is abelian, i.e., ab = ba ∈ C; (iii) a product of cyclic groups of relatively prime order is cyclic, i.e.,

Cn × Cm =∼ Cmn, gcd(m,n) = 1; Proposition 11.1. Every finite multiplicative group of a field K is cyclic. Proof. Let G ⊂ K× be a finite subgroup of K× = U(K). Clearly G is abelian. Choose r r a ∈ G with maximal period, i.e., ap = 1 and ap −i 6= 1 for all i < pr (it is easy to see that this maximal period has to be a power of a rational prime; indeed, it follows from r the Chinese remainder theorem), and consider the equation zp − 1 = 0. Every element of G is a solution to this equation. Consider the cyclic subgroup of G generated by a. This r has to be the whole G since otherwise the equation zp − 1 = 0 would have more than pr solutions.  An mth root of unity is an element ζ ∈ K such that ζ m = 1. The set of m-roots of unity in K is denoted (K) or simply if the field is obvious. Since is an mth root of unity µm µm ζ if and only if it satisfies the equation zm − 1 = 0, the cardinality of (K) is finite, there µm being only finitely many zeros of a polynomial. Put [ (K) := := (K). µ µ µm m≥2 Proposition 11.2. The set ⊂ K× is a finite cyclic group; (K) is an abelian. A gener- µm µ ator for is called a primitive mth root of unity. µm Proof. Let x,y ∈ . Then (xy)m = xmym = 1, so is a subgroup of K×. The rest µm µm follows immediately by the above proposition. If x ∈ ⊆ (K) and y ∈ ⊆ (K), then µn µ µm µ xy ∈ ⊆ (K) so (K) is also an group (not finite in general). µn+m µ µ  48 D. LARSSON

It is in this generality impossible to tell the cardinality of µm(K).
Notice that µm(K) ⊆ oK since every a ∈ µm(K) is a zero of the monic integral polynomial zᵐ − 1. This obviously implies that µ(K) ⊆ oK also.

Theorem 11.3. Let K/Q be an algebraic number field. Then the set of units is a finitely generated abelian group

× ∼ r1+r2−1 oK := U(oK) = µ(K) × Z , ℜ ℑ where r1 = #Mor(K) and r2 = #Mor(K) . Proof. The proof of this goes beyond the scope of the course. However, since the classical proof is rather interesting I encourage you to look it up in the literature.  × 11.2.2. Consequences. The theorem says that there are u1,...,ur1+r2−1 ∈ oK such that any × unit u ∈ oK can be written as n u = ζ · un1 un2 ···u r1+r2−1 , ζ ∈ µ(K). 1 2 r1+r2−1

The elements u1,...,ur1+r2−1 are called the fundamental units of K and it is a formidable problem to compute these in general. However, if the field is real, i.e., if K ⊆ R, then µ(K) = {±1} (this is true for many (most?) fields) so the only problem is finding explicit generators for the ”Betti-part” r +r − Z 1 2 1. Recall that any finitely generated abelian group G can be decomposed as betti G = Gtors × G , where every element in the Betti-part is of infinite order12. An element a in a group is a torsion element if an = e for some n > 1. The set of torsion elements of a group is a subgroup, the torsion subgroup, Gtors. This holds for every group, abelian or not. But when G is abelian the above decomposition can be made. In fact, it is possible to prove that ∼ betti ∼ b Gtors = Z r1 × ··· × Zprs , and G = Z , b,s ≥ 0. p1 s If b = 0 then G is called a torsion group and if s = 0 then G is called a free group of rank b. In the case of units in number fields, we see that the torsion elements are exactly the subgroup µ(K) and the ”free” elements are the elements of the Betti-part.

Proposition 11.4. An element a ∈ oK is a unit if and only if NrmK/Q(a) = ±1.

Proof. If a is a unit then there is a b ∈ oK such that ab = 1 and so taking norms

1 = NrmK/Q(ab) = NrmK/Q(a)NrmK/Q(b) =⇒ NrmK/Q(a) = ±1. Conversely, recall that the norm can be computed as

NrmK/Q(a) = ∏ ϕ(a) = a · ∏ ϕ(a). ϕ∈Mor(K) ϕ∈Mor(K) ϕ6=id

12Enrico Betti (1823–1892) was an Italian mathematician specializing in algebra and topology. The name ”Betti-number” in the present case is actually borrowed from topology obviously named in Betti’s honor, where this is the rank of certain abelian groups appearing as groups attached to topological spaces. 49

Taking b := ±∏ϕ6=id ϕ(a) gives the desired result. 

12. DEDEKINDDOMAINS Recall the following notions: - A ring R is noetherian (see Definition 4.11) if every ascending ideal chain

··· ⊆ ii−1 ⊆ ii ⊆ ii+1 ⊆ ··· stabilizes, i.e., such that there is an N ∈ Z such that

iN = iN+1 = iN+2 = ··· ; - a domain D is integrally closed if every a ∈ Frac(D) satisfying P(a) = 0 with P ∈ D[z] monic, implies that a ∈ D. Recall also that every maximal ideal is prime. We now make the following definition: Definition 12.1. An integral domain is called a Dedekind domain if it is noetherian, integrally closed and every prime ideal is maximal. √ We already know one example, namely, Z. Another is the Gaussian integers Z[ −1]. In fact,

Theorem 12.1. The ring of integers oK to an algebraic number field K is a Dedekind domain.

Proof. That oK is noetherian follows since oK is a finitely generated Z-module and Z is a noetherian ring (this is a standard fact from abstract algebra). Since oK is the integral closure of Z in K, oK is integrally closed. Hence we only need to show that every prime ∗ ideal is maximal. Let p be a prime ideal of oK. Then p is a non-zero prime ideal hpi of Z. It is clear that it is prime. Now, every a ∈ oK is the zero of a monic polynomial Pa with coefficients in Z. That it is non-zero follows since if a ∈ p then, n n−1 a + αn−1a + ··· + α0 = 0, for αi ∈ Z, α0 6= 0; ∗ hence, α0 ∈ p . Reducing oK modulo p is equivalent to reducing every a ∈ oK modulo p ∗ [ and so reducing all the Pa’s modulo p . Hence, since Z/hpi is a field, and a ∈ oK/p is algebraic over Z/hpi, oK/p must be a field and so p must be a maximal ideal.  12.1. A few important remarks. Before coming to the main theorem in this section, let me spare a few breaths on how ideals behave under morphisms. Some of this is simply reminders on topics from previous section, while others are new. First a notation. It is convenient and suggestive to sometimes use the notation Fp for oK/p. We will not use this in the case of powers of p, i.e., we will not use Fpe as standing e for oK/p . In this section, let A and B be arbitrary (commutative, associative) rings (with unity) and let f : A → B be a ring morphism13. We let a and b be ideals of A and B, respectively. When f is an injection B is called a ring extension of A, denoted as B/A or B | A. Recall the following notation b∗ := f −1(b) := b ∩ A := {a ∈ A | f (a) ∈ b},

a∗ := hai := f (a)B := aB := { ∑ aibi | ai ∈ f (a),bi ∈ B}. finite

f 13A very common way in all of algebra and number theory of saying that there is a morphism A −→ B, is that B is an A-algebra, or an algebra over A. 50 D. LARSSON

In the first case here one often says that ”b lies over a”. Notice that if f is an inclusion then this is equivalent to b ∩ A = a in the ordinary set-theoretical sense, hence the general notation14. Remember that there is a bijection:

a⊆a07→a0/a {ideals of A which includes a} ←→ {ideals of A/a}. Marvel at the suggestive notation a | a0 for a0 ⊆ a. Therefore, the ideals of a | a0 are exactly the ones giving the ideals of A/a0; or the ideals of A/a0 are exactly the ideals lying over a0; hence, by slightly abusing good mathematical decorum, passing to the quotient A/a0 kills ideals lying ”under” a0 (i.e., the ideals that are not over a0). Every morphism f : A → B induces a morphism b∗ = f −1(b) → b (by definition). There is a commutative diagram

f πb A K 9/ B / B/b O KK πb∗ M KKK K% % A/b∗ 0 8 rrr rrr b∗ / b ∗ The kernel of the composition πb ◦ f : A → B/b, is clearly b . Therefore, by the first isomorphism theorem, the dotted arrow exists and is an injection. From this follows that p ∈ Spec(B) =⇒ p∗ ∈ Spec(A). Indeed, since p is prime, B/p is an integral domain. We have that A/p∗ ,→ B/p and so A/p∗ is a subring of B/p and thus must be an integral domain also, implying that p∗ is prime. The implication p ∈ Spec(A) ⇒ p∗ ∈ Spec(B) is not true! In fact, this is one (the?) reason why decomposition of primes in Dedekind ring exten- sions is needed, because if primes were to stay primes under extensions the decomposition problem would be an empty problem. However, this non-implication is also a blessing since the theory becomes so much richer and more beautiful now! This will hopefully be apparent in what follows.

12.2. The main theorem on Dedekind domains. We know that in general, elements in domains cannot be decomposed into a unique (up to associates even) product of irre- ducibles in general. That is, factorization is not unique. The amazing thing is now that factorization is unique if we consider, not the elements themselves, but the ideals they generate! Therefore, connecting to the last paragraph of the previous section, the extended primes do not in general stay prime but they can be decomposed into unique primes! I want to emphasize in no small manner that this is in general only possible for Dedekind ring extensions. In what follows, oK is a Dedekind domain and K its field of fractions K := Frac(oK).

14Frankly, though, I’m not so hot on the notation b ∩ A in general, unless it is true set-theoretically, but the notation abounds in the literature so you better get used to it. 51

Theorem 12.2. Every non-zero, proper, ideal i in a Dedekind domain oK can be decom- posed uniquely (up to a re-ordering of factors) into prime ideals, i.e.,

ep i = ∏ p , where ep ≥ 0, ep = 0, for all but a finite number of p. p∈Spec(oK ) For the proof of this we need two lemmas:

Lemma 12.3. For every ideal a ⊆ oK there are non-zero prime ideals p1,...,pr such that p1 ···pr ⊆ a.

Lemma 12.4. Let p be a prime ideal in oK. Define −1 p := {a ∈ K | a · p ⊆ oK}. −1 −1 −1 Then a · p 6= a for every non-zero ideal a in oK. Notice that a ⊆ ap since 1 ∈ p . Proof of Lemma 12.3. Let S be the set of all ideals such that the statement of the lemma does not hold, and assume that S is non-empty. Since oK is noetherian the set S must have a maximal element, a. Furthermore, a cannot be a prime ideal so there are b,c ∈ oK such that bc ∈ a but b 6∈ a, c 6∈ a. Clearly, a ⊂ a + hbi, a ⊂ a + hci and (a + hbi)(a + hci) ⊆ a. Since a is maximal with respect to not containing a product of prime ideals, a + hbi and a+hci do. But this together with (a+hbi)(a+hci) ⊆ a implies that a also does, a contra- diction. 

Proof of Lemma 12.4. Let a ∈ p, a 6= 0. Then by the previous lemma there are primes p1,...,pr such that p1 ···pr ⊆ hai ⊆ p. We can assume that r is the smallest possible such that this is true. Then one of the pi’s, say p1, is contained in p since if not, then we could choose a j ∈ p \ p j with a1 ···ar ∈ p; but since p is prime, a j ∈ p, for some j, a contradiction. This implies that p1 = p since p1 is maximal. We have that p2 ···pr 6⊆ hai, −1 so there is a b ∈ p2 ···pr such that b 6∈ aoK, i.e., a b 6∈ oK. However, we have that bp ⊆ hai −1 −1 −1 −1 so a bp ⊆ oK, implying that a b ∈ p , so p 6= oK. Let a be a non-zero ideal with generators a1,...,an (since oK is noetherian every ideal is finitely generated, another standard fact of noetherian rings). Assume that p−1a = a. Then for every b ∈ p−1 we have

bai = ∑Ai ja j, where Ai j ∈ oK. j This is equivalent to    b − A11 −A12 ... −A1n a1  −A21 b − A22 ... −A2n a2    = 0¯.  . . .. .  .   . . . .  .  −An1 −An2 ... b − Ann an Denote the square-matrix by W. By Cramer’s rule (see the Appendix) we get

det(W)a1 = det(W)a2 = ··· = det(W)an = 0, implying that det(W) = 0.

Hence b is integral over oK (expand det(W)); so b ∈ oK since oK is integrally closed, and −1 −1 thus p = oK, a contradiction. Therefore, p a 6= a and the proof is finished.  Now we can prove Theorem 12.2.

Proof of Theorem 12.2. We begin by showing existence. Let S be the set of proper non-zero ideals that cannot be decomposed into prime ideals. The same argument as in the proof 52 D. LARSSON of Lemma 12.3 shows that there is a maximal element a ∈ S. This ideal is not prime so is included in a prime (maximal) ideal15 p. We get −1 −1 a ⊆ ap ⊆ pp ⊆ oK. −1 −1 However, Lemma 12.4 shows that a ⊂ ap and p ⊂ pp ⊆ oK strictly. Since p is maximal (notice that ap−1 is an ideal for all non-zero ideals a) we must have that −1 pp = oK.

Clearly, a 6= p implies that ap 6= oK, hence, taking into account the maximality of a in S and a ⊂ ap−1, the ideal ap−1 admits a prime decomposition −1 −1 ap = p1 ···pn and then so does a = ap p = pp1 ···pn, a contradiction. To show uniqueness assume that a can be decomposed as

a = p1p2 ···pn = q1q2 ···qm. The definition of prime ideals can be re-phrased as ab ⊆ p ⇒ a ⊆ p or b ⊆ p ⇐⇒ p | ab ⇒ p | a or p | b.

Now, p1p2 ···pn = q1q2 ···qm implies that p1|qi for some 1 ≤ i ≤ m. Since p1 is maximal, −1 −1 p1 = qi. Hence, multiplying with p1 and using that p1p1 = oK, we can cancel p1 = qi. Continuing like this shows that n = m and exactly one of the q j’s correspond to a given pi. The proof is finished.  Definition 12.2. Let K/Q be an algebraic number field. Then a of K is a non-zero, finitely generated oK-submodule i ⊆ K. We denote the set of fractional ideals in K by J(K).

To distinguish between the different notions of ideals we will call ideals in oK integral ideals. Notice that the definition of fractional ideal is equivalent to: i is a fractional ideal if ∗ there is an α ∈ oK such that αi ⊆ oK is an integral ideal. Theorem 12.5. Let J(K) denote the set of fractional ideals in the algebraic number field K/Q. Then J(K) is an abelian group under the multiplication defined above, with identity element h1i = oK and inverses defined by −1 a := {x ∈ K | xa ⊆ h1i = oK}. Proof. The only thing that is not trivial to prove is that every fractional ideal has an inverse. Consider first an integral ideal a. By Theorem 12.2 this can be decomposed as a = p1 ···pn. −1 −1 −1 −1 This gives the inverse as a := p1 ···pn since, by the proof of Lemma 12.4, p ⊂ pp −1 −1 and so, because p is maximal, pp = oK. Also, if ba = oK, then b ⊆ a ; if ba ⊆ oK then −1 bab ⊆ b and so, since ab = oK, b ∈ b. Therefore, b = a . For the case of fractional ideals, ∗ recall that a is fractional if and only if there is a d ∈ oK such that da ⊆ oK is an integral −1 −1 −1 ideal. Then d a is an inverse to da, whence aa = oK.  Corollary 12.6. Any fractional ideal a can be decomposed as

ep a = ∏ p , with ep ∈ Z, ep = 0, for all but a finite number of p. p∈Spec(oK )

This means that J(K) is the free abelian group on the set Spec(oK).

15This is a fact from ring theory (following from Zorn’s lemma): every ideal is contained in a maximal ideal. 53

The set of principal fractional ideals, i.e., fractional ideals on the form

∗ a = hai = a · oK, where a ∈ K , form a subgroup of J(K) denoted P(K).

Definition 12.3. The quotient group

Cls(K) := J(K)P(K), is called the ideal class group of K. The class of a ⊂ oK is called the ideal class of a. The number

hK := #Cls(K) is called the class number of K.

Notice that when hK = 1, it follows from definition, that every ideal in oK is principal, and thus oK is a unique factorization domain. This implication does not hold for fields of class number two, i.e., such that hK = 2. Hence, Cls(K) measures the deviation of K/Q to have unique factorization of elements.

Theorem 12.7. When K/Q is a number field and oK its ring of integers, the class number hK is finite. Proof. I know of no proof of this theorem that doesn’t use rather advanced ideas (though, classical), therefore I will leave this important fact un-proven, much to my dissatisfaction. 

Remark 12.1. This theorem is not valid for general Dedekind domains.

We have the following happy consequence for Dedekind domains:

Theorem 12.8. Let oK be the ring of integers in a number field K/Q. Then

oK is a PID ⇐⇒ oK is a UFD, i.e., hK = 1 if and only if oK is a unique factorization domain.

Proof. 

Let i be an ideal in oK for K/Q a number field. We define the absolute norm of i as + N(i) := (oK : i), N : Id(oK) → Z , where Id(oK) denotes the set of all ideals in oK. This can be extended to a homomorphism of groups n N : J(K) → + := { | n,m ∈ +}, Q m Z and satisfies

N(a · b) = N(a)N(b), and N(hai) = |NrmK/Q(a)|.

Notice that, by the above multiplicative property, we have that N(oK) = 1 since oK is the unit element in J(K). From this follows that we can define N(a−1) = N(a)−1. 54 D. LARSSON

13. EXTENSIONS, DECOMPOSITION AND RAMIFICATION From now on I will use terminology and facts from the theory of finite fields. Consult the section entitled ”Finite fields” for an introduction to this topic. The following theorem will be left un-proven.

Theorem 13.1. Let oK be a Dedekind domain with fraction field K and let L/K be a finite extension of K. Then the integral closure oL := oK of oK in L is also a Dedekind domain.

Now, let p be a prime ideal of K. Then the extension p∗ = poL of p in oL is an ideal but not necessarily prime. However, since oL is a Dedekind domain p∗ decomposes as a product of primes of L:

eP p∗ = ∏ P , where, eP ≥ 0, eP = 0, for all but a finite number of P. P∈Spec(oL)

As is customary, we will simply write the extension of p in L as p and not p∗. Lemma 13.2. Let L/K be a finite field extension.

(i) If K = Q, i.e., if L/Q is an algebraic field extension, and p ∈ Spec(oL). Then oL/p is a finite field. ∗ (ii) In general, the extension FP/Fp = (oL/P)/(oK/p), where p = P , is a finite exten- sion of finite fields, called the residue field extension (associated to P and p).

Proof. (i) Since p is maximal it is clear that oL/p is a field. Every prime ideal ∗ p ∈ Spec(oL) lies over a non-zero proper prime ideal in Z, i.e., p = (p), via the canonical injection Z ,→ oL. Reducing oL modulo p automatically reduces Z modulo (p). Hence there is an injection Fp ,→ Fp, and so Fp/Fp is a finite field ex- tension of a finite field (recall that oK is a Z-module of finite rank equal to [K/Q]), implying that Fp = oL/p is itself finite. (ii) This follows from (i) and the tower law. 

e1 e2 er Definition 13.1. Let p decompose in L as p = P1 P2 ···Pr . Then - p is called unramified in L if e1 = e2 = ··· = er = 1 and if FP/Fp is a (this is always the case for us); otherwise, p is ramified (in L) and P j ramifies over p if e j > 1 and is unramified if e j = 1. - p is totally split in L if r = [L/K] and non-split if r = 1; - p is called inert if it stays prime in the extension, i.e., if p∗ is prime; - the number f j := [FP/Fp]

is called the jth inertia degree of p (in L) and the number e j is called the jth ramification degree; - p (or Pi) is tamely ramified if ei > 1 and gcd(ei,char(Fp)) = 1; - if char(Fp) | ei, p (or P) is called wildly ramified.

e1 er Theorem 13.3. Let L/K be a finite extension. For every p ∈ Spec(oK) with p = P1 ···Pr we have the so-called fundamental identity r ∑ e j f j = [L/K]. j=1 55

For a proof see any book on algebraic number theory. Notice that we can write the sum as ∑v(P | p) f (P | p) P|p which is often done in the literature. This is to be interpreted as summing over all primes P lying over the given p. Observe the suggestive notation P | p. 13.1. Ramification and decomposition. 13.1.1. Which primes ramify? The treatment that follows is a (very) slight adaption of arguments I learned from Keith Conrad.

Theorem 13.4. Let K/Q be an algebraic number field. Then the primes of Q that ramify in K are the prime divisors of the discriminant, i.e.,

e1 e2 er p | Disc(oK/Z) ⇔ p = p1 p2 ···pr , where at least oneei > 0. To prove this theorem we need a lemma.

Lemma 13.5. Let oK be the ring of integers of K/Q and let M and N be two Z-modules. Then

(i) DiscZ(oK)(mod p) = DiscZ/hpi(oK/hpi); (ii) DiscZ(M × N) = DiscZ(M)DiscZ(N). [,p Proof. Note first that if f := { f1,..., f[K/Q]} is a basis for oK over Z then f is a basis of oK/hpi over Z/hpi. Hence oK/hpi is a Fp-vector space of dimension [K/Q]. Recall the definition of the discriminant of oK with basis f: Disc(oK) = Disc(f) = det(Tr( fi f j)). If mx is the matrix of the linear mapping associated with multiplication of x ∈ oK, it is clear that mx[,p is the matrix of the reduced map oK/hpi → oK/hpi. Hence,

Tr(o /hpi)/ (m [ [ ) = Tro / (m fi f j )(mod p). K Fp fi f j K Z Now taking determinants gives (i). Notice that if f and f0 are basis sets of M and N, 0 0 0 0 0 respectively, then f ∪ f = { f1, f2,..., fm, f1, f2 ..., fn} is a basis for M × N and fi f j = 0 for all i and j. From this follows that the matrix to take determinant of to get Disc(M × N) is the matrix  ¯   ¯  Tr(M×N)/Z( fi f j) 0 TrM/Z( fi f j) 0 ¯ 0 0 = ¯ 0 0 . 0 Tr(M×N)/Z( fi f j) 0 TrN/Z( fi f j) Therefore, taking determinants gives (ii). 

e1 er Proof of Theorem 13.4. Since hpi = p1 ···pr , pi 6= p j, i 6= j, the ring-theoretical version of the Chinese remainder theorem (see the Appendix), gives that ∼ e1 e2 er oK/hpi = oK/p1 × oK/p2 × ··· × oK/pr . ei ei We have that hpi ⊆ pi so each oK/pi is an Fp-vector space. Using the above lemma (ii) gives ei DiscFp (oK/hpi) = ∏DiscFp (oK/pi ). By part (i) of the lemma,

DiscZ(oK)(mod p) = DiscFp (oK/hpi) so p|DiscZ(oK) if and only if DiscFp (oK/hpi) = 0 in Fp. Hence we need to show that for every pe|hpi, v DiscFp (oK/p ) = 0 ⇐⇒ e > 1. 56 D. LARSSON

Since the vanishing of discriminants is independent on the chosen basis this will work in any basis. Assume that e > 1 and pick a non-zero x ∈ p \ pe. Extend the reduction of x modulo pe, [ [ e [ [ e x , to a basis {x1 := x ,x2,...,xn} of oK/p . Notice that x is nilpotent, i.e., (x ) = 0. The [ [ first column of the matrix Tr(xix j) is Tr(x x j). The linear mapping associated with x x j is [ nilpotent so has eigenvalues all equal zero (in some basis). So Tr(x x j) = 0. Hence the determinant of the trace matrix is zero and so then is also the discriminant. Assume now that e = 1. We need to show that p - DiscZ(oK) in this case. Then Fp := a oK/p is a finite field extension of Fp. In fact [Fp/Fp] = p , for some a > 1. If DiscFp (Fp) = det(Tr(eie j)) = 0 then Tr : Fp → Fp must be the zero function. However, for every x ∈ Fpa 2 a−1 (for every finite field) Tr(x) = x+xp +xp +···+xp which can never be the zero function a since this is a polynomial of degree less than [Fp/Fp] = p . Hence DiscZ(oK) 6= 0, and the proof is finished.  Corollary 13.6. There are only finitely many primes that ramify in any given finite Q- extension. Also, there are no finite, unramified extensions of Q. Hence, once we have computed the discriminant, it is easy16 to determine which primes ramify in the given extension. Note also that even though there are no unramified algebraic number fields one can ask for number fields unramified outside a specified set of primes. This is a useful technique (and important problem) in number theory. Don’t, however, be misled into thinking that there are no unramified extensions of Q; the point is that there are no finite extensions. Also, extending a number field K/Q to L ⊇ K can give an unramified extension L/K (al- though, of course, L/Q is ramified). In addition, the so-called local fields, which for our concern can be thought as the finite field extensions of the p-adic numbers. 13.1.2. Dedekind’s theorem. There is also a beautiful result, apparently due to Dedekind, on how to compute the decomposition of a prime in an extension. Recall that the index, (M : m), of a submodule m ⊆ M is the number of left (or right) cosets of m, i.e., (M : m) := #(M/m) ∈ N ∪ {∞}. 2 [Q(θ)/Q] Let Z[θ] denote the submodule of oQ(θ) spanned over Z by {1,θ,θ ,...,θ }. No- tice that oK does not have to have the same generators. To simplify we write F(z) := irr(θ,Q)(z).

Theorem 13.7 (Dedekind). Let K/Q be a finite field extension and let θ ∈ oK such that θ is a primitive element for K, i.e., such that K = Q(θ). Let p be a rational prime not dividing the index (oK : Z[θ]). Factor the reduction of the minimal polynomial of θ modulo p into irreducibles as

[,p [,p e1 [,p er [,p F (z) = F1 (z) ···Fr (z) , Fj(z) ∈ Z[z], Fj (z) ∈ (Z/hpi)[z], [,p where (·) : Z[z] → (Z/hpi)[z] denotes the reduction modulo p. Then

(13.1) p j := hpi∗ + hFj(θ)i∗ = hp,Fj(θ)i∗ are the different prime ideals in oQ(θ) over hpi. Furthermore,

[,p  e1 er f j = deg Fj (z) , and hpi = p1 ···pr ,

16Easy in principle, at least; factoring large integers is an extremely hard problem even for the fastest com- puters, a fact that is utilized in cryptography. 57

[,p i.e., the e j are the ramification indices and the degrees of Fj (z) the inertia degrees. When no confusion can arise, we write (·)[ for (·)[,p.

Proof. First put s := (oK : Z[θ]), and note that hsi ⊆ Z[θ] ⊆ oK. There is a natural morphism Z[θ]/hpi → oK/hpi induced by the injection Z[θ] ,→ oK. The fact that p - s implies that hpi + hsi = oK, which in turn, since Z[θ] ⊇ hsi, implies that hpi + Z[θ] = oK. From this follows that Z[θ] → oK/hpi is onto. The kernel of this morphism is hpi∩Z[θ]. Clearly, p(Z[θ]) ⊆ hpi∩Z[θ] since Z[θ] ⊆ oK. On the other hand, we also have,  hpi ∩ Z[θ] ⊆ p hpi + hsi ∩ Z[θ] ⊆ p(Z[θ]). Therefore, ∼ Z[θ]/hpi = oK/hpi. We also have that Z[z]/hF(z)i =∼ Z[θ], by f (z)(modF(z)) 7→ f (θ). So, ∼ ∼ [ Z[θ]/hpi = Z[z]/hF(z)i = Fp[z]/hF (z)i. Hence, ∼ [ (13.2a) oK/hpi = Fp[z]/hF (z)i [ since both oK/hpi and Fp[z]/hF (z)i are isomorphic to Z[θ]/hpi. We can expliticize the isomorphism in the steps:

[ z7→θ (13.2b) Fp[z]/hF (z)i −→ Z[z]/hp,F(z)i −−→ Z[θ]/hpi → oK/hpi. The first map in (13.2b) is given by the composition [ [ Fp[z]/hF (z)i → Z[z] → Z[z]/hp,F (z)i : f (z)(modhF[(z)i) 7→ F(z) 7→ F(z)(modhp,F[(z)i), where F(z) is a polynomial such that f (z) = F(z)(modhF[(z)i). This composition is a well-defined morphism. The last morphism doesn’t have to be specified since we are already inside oK/hpi. [ [ [ The maximal ideals m of Fp[z]/hF (z)i are ideals on the form m/hF (z)i, where m is a maximal ideal of Fp[z] and hence, since Fp[z] is a PID, on the form m = hπi for some [ irreducible polynomial π ∈ Fp[z], in particular, π | F (z). Therefore, the maximal ideals [ [ of Fp[z]/hF (z)i are in bijection with the irreducible factors of F (z). Using the isomor- phism17 (R/i)(j/i) =∼ R/j, where j and i are ideals of R, we get [ . [ ∼ Fp[z]/hF (z)i hπi/hF (z)i = Fp[z]/hπi.

deg(π) The number of elements in Fp[z]/hπi is p and so the number of elements of Fp[z]/hπi is exactly the degree of the corresponding irreducible factor of F[(z). Since hπik = hπki (check!) the kth power of m[ = hπi/hF[(z)i is (hπki + hF[(z)i)hF[(z)i (check!).

17This is sometimes called the second isomorphism theorem. 58 D. LARSSON

The ideal hπki + hF[(z)i is in fact the ideal generated by the greatest common divisor hgcd(πk,F[(z))i and this is ( hπki if 1 ≤ k < v hgcd(πk,F[(z))i = hπvi if k ≥ v, where v is the power of π in F[(z). Now we have clarified the connection between the [ [ decomposition shape of F (z) and the maximal ideals of Fp[z]/hF (z)i. A word by word analysis can be applied to oK/hpi so by the isomorphism (13.2a) the same decomposition shape holds for maximal ideals in oK/hpi. [ To finish the proof, we use the isomorphism (13.2b) to transfer the ideals from Fp[z]/hF i to oK/hpi, yielding the explicit expression (13.1).  Corollary 13.8. If p - Disc(Z[θ]) then Theorem 13.7 applies to p. 2 Proof. This follows since Disc(Z[θ]) = (oK : Z[θ]) Disc(oK).  This theorem can be generalized to finite simple extensions of any separable field K.

Also, the condition that p - (oQ(θ) : Z[θ]) is unnecessarily strong. In fact, let us define the conductor of L/K = K(θ)/K as follows

f(L/K) := {a ∈ oL | hai ⊆ oK[θ]}, where K is the field of fractions of K and oL the integral closure of oK in L. Then the above theorem remains true, with p now a prime of K, if we replace the condition that p should not divide (oK(θ) : oK[θ]) by the condition ’p∗ + f(L/K) = oL’. Restricting to our case where K = Q, the condition ’p - (oK(θ) : oK[θ])’ actually implies ’p∗ + f(L/K) = oL’ but I was not able to prove the other direction so I don’t think we have equivalence in general (that is why I said that ’p - (oK(θ) : oK[θ])’ was unnecessarily strong). 13.2. Consequences for quadratic number fields. We will now discuss in detail how the above allows us to investigate decompositions and ramification of primes in quadratic number fields. √ 13.2.1. Recollections. Let K = Q( d) be a quadratic number field, where d is a square- free integer. This means that √ √ Q( d) = {a + b d | a,b ∈ Q}. √ √ 1+ d If d ≡ 2,3 (mod4), then we put ε := d and if d ≡ 1 (mod4) we put ε := 2 . In either 2 2 1−d case, oK = Z[ε]. This means that ε is either a zero of z − d or z − z + 4 . Therefore, the splitting of these polynomials modulo p determines the decomposition of p in oK. Now let’s do some examples. √ Example√ 13.1. Let K = Q( 3). This is a totally real extension. Since 3 ≡ 3 (mod4), ε = d with minimal polynomial irr(z) = z2 − d and so Z[ε] is the ring of integers of K/Q. This means that the index (oK : Z[ε]) = 1 implying that Dedekind’s theorem applies to all primes in this case. Notice that for quadratic extensions, we only have two possibilities for the index, namely, 1 and 2. This follows from Theorem 9.18 since the determinant of the matrix giving the basis of the submodule in terms of the basis of the ”big” module, is either 1 or 2 (check this!). In any case, Dedekind’s theorem applies to any odd prime in the quadratic case. This is not true in general as we will see later. 59

We can use Theorem 9.24 to compute the discriminant of oK = Z[ε] and we get 2 Disc(Z[ε]) = Disc({1,ε}) = 12 = 2 · 3. Hence the only primes ramifying are 2 and 3; all other are unramified. Table 1 shows the ramification behaviour for the first few primes. Let’s begin by looking at the behaviour at √ TABLE 1. Decomposition of primes in Q( 3):

hpi irr(z)(mod p) ppq√p 2 2 h2i (z + 1) h2,√3 + 1i h3i z2 h3, 3i2 = h3i h5i z2 + 2 h5i 2 h7i z + 4 h7i √ √ h11i (z + 5)(z + 6) h11,√3 + 5ih11,√3 + 6i h13i (z + 4)(z + 9) h13, 3 + 4ih11, 3 + 9i h17i z2 + 14 h17i 2 h19i z + 16 h19i√ √ h23i (z + 7)(z + 16) h23, 3 + 7ih23, 3 + 16i h29i z2 + 26 h29i 2 h31i z + 28 h31i√ √ h37i (z + 15)(z + 22) h37, 3 + 15ih37, 3 + 22i h2i. Reducing z2 − 3 modulo 2 gives z2 − 1 = (z − 1)(z + 1) = (z + 1)(z + 1) = (z + 1)2 (modulo 2). Therefore we see that √ (a) the prime h2i is ramified in Q( 3), which we already knew,√ and (b) that the prime ideal above h2i in the decomposition is h2, 3 + 1i. Notice that since the extension is quadratic h2i is in fact totally ramified (i.e., the ramifica- tion degree is the same as the degree of the extension). Notice also that the ramification is wild at h2i since char(F2) = e in the present case. √ √ The same reasoning applies to h3i. But, notice that h3, 3i2 = h3i, so h3, 3i2 is actu- ally principal. However, this does not make h3i inert since it can be decomposed so it is not prime in o √ . Q( 3) 2 For the case h5i, we see that z√− 3 stays irreducible modulo 5. This means that h5i stays prime when extended to Q( 3), i.e., the prime h5i is inert. We also see that h7i is 2 inert. For the prime h11i, however, z − 3 is reducible into the factors z√+ 5 and z +√6. This tells us that√ the prime ideal factors of the decomposition of h11i in Q( 3) is h11, 3 + 5i and h11, 3 + 6i. Continuing this way we get Table 1. Let me point out that the representation of the prime√ ideal factors are√not unique. For instance, since z + 5 = z − 6 modulo 11, we see that h11, 3 + 5i = h11, 3 − 6i. Indeed, √ √ √ h11, 3 + 5i = {11a + b( 3 + 5) | a,b ∈ Z} = {11a + b( 3 + 5 − 11 + 11) | a,b ∈ Z} √ √ = {11(a + b) + b( 3 − 6) | a,b ∈ Z} = h11, 3 − 6i. Hence, ideal-wise they are the same. Be sure that you understand what Table 1 says and what you can deduce from it in regards to arithmetic information. 60 D. LARSSON √ Example 13.2. Now, consider√ the totally imaginary extension Q( −5). We have −5 ≡ 3 (mod4) so oK = Z[ε], ε = −5. Hence, (oK : Z[ε]) = 1 and so Dedekind’s theorem applies to all primes. The discriminant is computed to −20 = −22 · 5, so the ramified primes are 2 and 5. We now get Table 2. So we see explicitly that h2i and h5i are (totally) √ TABLE 2. Decomposition of primes in Q( −5):

hpi irr(z)(mod p) ppq√p 2 2 h2i (z + 1) h2,√−5 + 1i √ h3i (z + 1)(z + 2) h3,√−5 + 1ih3, −5 + 2i 2 2 h5i z h5,√−5i = h5i√ h7i (z + 3)(z + 4) h7, −5 + 3ih7, −5 + 4i h11i z2 + 5 h11i h13i z2 + 5 h13i h17i z2 + 5 h17i 2 h19i z + 5 h19i√ √ h23i (z + 8)(z + 15) h23,√−5 + 8ih23, √−5 + 15i h29i (z + 13)(z + 16) h29, −5 + 13ih29, −5 + 16i h31i z2 + 5 h31i h37i z2 + 5 h37i ramified, with h2i wild and h5i tame. √ Example 13.3. Here we take Q( 10). Check that we get the results given in Table 3. Make sure you do the ramification analysis for yourself. √ TABLE 3. Decomposition of primes in Q( 10):

hpi irr(z)(mod p) ppq√p 2 2 h2i z h2,√10i = h2i√ h3i (z + 1)(z + 2) h3,√10 + 1ih3, 10 + 2i h5i z2 h5, 10i2 = h5i h7i z2 + 4 h7i 2 h11i z + 1 h11i√ √ h13i (z + 6)(z + 7) h13, 10 + 6ih13, 10 + 7i h17i z2 + 7 h17i h19i z2 + 9 h19i h23i z2 + 13 h23i 2 h29i z + 19 h29i√ √ h31i (z + 14)(z + 17) h31,√10 + 14ih31,√10 + 17i h37i (z + 11)(z + 26) h37, 10 + 11ih37, 10 + 26i

Example 13.4. √One last example in the quadratic case: take the totally imaginary exten- sion K/Q = Q( −15)/Q. This example is interesting for many√ reasons but the most apparent for us is that −15√ ≡ 1 (mod4) so oK = Z[ε], with ε = (1 + d)/2, implying (and check this!) that (oK : Z[ −15]) = 2. Therefore, Dedekind’s theorem applies to all primes 61 except 2. However, in in many cases like this there is a clever trick (that I learned from Keith Conrad) that allows us to solve this problem18. Before we start let us note that this quadratic field is the first imaginary field with class number two! √ The minimal polynomial for −15 is z2 + 15, and using this we can compute the dis- criminant to −15; the ramified primes are 3 and 5. Notice that we can still say that 2 is unramified since we use Theorem 13.4 for this deduction. We now compute the Table 4 in the same manner as before: Notice that h2i is undecided as yet. In fact, notice that √ TABLE 4. Decomposition of primes in Q( −15) v.1:

hpi irr(z)(mod p) ppqp 2 h2i (z + 1) ? √ 2 2 h3i z h3,√−15i h5i z2 h5, −15i2 h7i z2 + 1 h7i h11i z2 + 4 h11i 2 h13i z + 2 h13i√ √ h17i (z + 6)(z + 11) h17,√−15 + 6ih17,√−15 + 11i h19i (z + 2)(z + 17) h19,√−15 + 2ih19, √−15 + 17i h23i (z + 10)(z + 13) h23, −15 + 10ih23, −15 + 13i 2 h29i z + 15 h29i√ √ h31i (z + 4)(z + 27) h31, −15 + 4ih31, −15 + 27i h37i z2 + 15 h37i according to the factorization of z2 + 15 modulo 2, the ideal h2i should be ramified, which we know that it cannot be! Ok, what now? Well the salvation is the statement of the theorem. The theorem doesn’t say that θ has to be the given primitive element, it can be any such. So, notice that √ √ √ ε := 1+ −15 is a primitive element here: ( −15) = ( 1+ −15 ). This follows since obvi- 2 √ √ Q Q 2 ously ( 1+ −15 ) ⊆ ( −15) and the degree of (ε) is two. A way of seeing this more Q 2 Q √ √ Q concretely is to note that −15 = −1 + 2 1+ −15 , so we can express the natural basis of √ √ 2 ( −15) in terms of ( 1+ −15 ) and vice versa. Q Q 2 √ Now, since oK = Z[ε] the index (oK : Z[ε]) = 1. Therefore Z[ε] instead of Z[ −15] allows us to use Dedekind’s theorem on all primes. Doing the computations with the minimal polynomial for ε, which is z2 − z + 4, we get Table 5. Now, you might worry that the decompositions for the odd primes in Tables 4 and 5 are different. But actually they are not. They are just expressed with different generators; the elements in the ideals are exactly the same (there is nothing unique√ about ideal generators in general).√ So, for instance, in the case h3i above, the generator −15 in Table 4 can be expressed as −15 = √ √ 3+ −15 3+ −15 −3 + 2 · 2 ∈ h3, 2 i so the ideals are really the same (since you can go back and forth between the generators). The same applies to all the other ideals. Notice that, apart from the actual structure on the generators, all ramification data are in agreement in both tables as it should. A prime cannot be inert in one representation19 and non-inert in another, for instance. We could say this differently as: ramification behaviour

18There are number fields for which this trick doesn’t work, for instance, the so-called ”Dedekind field”, 3 2 Q(α)/Q with α a zero of z − z − 2z − 8 = 0. 19By a ”representation” I mean, expressing the field extension with a specific primitive element. 62 D. LARSSON √ TABLE 5. Decomposition of primes in Q( −15) v.2:

hpi irr(z)(mod p) ppqp h2i z(z + 1) h2,εih2,ε + 1i h3i (z + 1)2 h3,ε + 1i2 h5i (z + 2)2 h5,ε + 2i2 h7i z2 + 6z + 4 h7i h11i z2 + 10z + 4 h11i h13i z2 + 12z + 4 h13i h17i (z + 5)(z + 11) h17,ε + 5ih17,ε + 11i h19i (z + 8)(z + 10) h19,ε + 8ih19,ε + 10i h23i (z + 6)(z + 16) h23,ε + 6ih23,ε + 16i h29i z2 + 28z + 4 h29i h31i (z + 13)(z + 17) h31,ε + 13ih31,ε + 17i h37i z2 + 36z + 4 h37i is an invariant under change of primitive elements. The only actual difference is that we could handle the decompostion of h2i in the last table. √ One more thing. Clearly, an objection here is possible: −15 and ε have different min- imal polynomials so they should generate non-isomorphic field extensions! Or? No! Think through this. There are two reasons why this is not so. The first has something to do with primitive elements and that the zeros of one minimal polynomial are included in the other field extension (associated with the other polynomial). The second point has something to do with structurally different primitive elements coming from conjugate elements in C. 13.2.2. Splitting behaviour and Quadratic reciprocity. Let f (z) ∈ Z[z] be the polynomial z2 − q, for q a rational prime. We will denote the reduction of f (z) modulo a prime p by fp(z). Clearly, f (z) is irreducible. Since f (z) is quadratic there are three possibilites after reduction:

(a) fp(z) is irreducible; (b) fp(z) = (z − s)(z −t); 2 (c) fp(z) = (z − s) . Case (c) occurs only for p = q and p = 2 so let us disregard these cases here. By Dedekind’s theorem we know that the splitting behaviour of f (z) modulo p has √ something to do with the decomposition of p in Q( q). We will now relate this to the Law of Quadratic Reciprocity (LQR): Theorem 13.9 (Law of Quadratic reciprocity). Let p and q be odd distinct primes. Then

 p  p−1 q−1  q  = (−1) 2 2 , q p or equivalently,   q   p   if q ≡ 1 (mod4) then =  p q   p   if p ≡ 1 (mod4)  q   q if q ≡ 3 (mod4) then =    p p  − if p ≡ 3 (mod4).   q 63

Furthermore,

 1   −1  p−1  2  p2−1 = 1, = (−1) 2 and = (−1) 8 . p p p

2 2 Now, if z ≡ q (mod p) has a solution, or equivalently, if z = q has a solution in Fp, then f (z) = (z − a)(z + a) for some a ∈ Z; if no solution exist then f (z) remains irreducible. Hence,  q   q  f (z) = (z − a)(z + a) ⇐⇒ = 1 and f (z) = z2 − q ⇐⇒ = −1. p p p p Therefore, √ Theorem 13.10. Let p and q be distinct odd primes. Then q is totally split in Q( q) if  q   q  and only if = 1 and q is inert if and only if = −1. p p Notice that the only ramified primes here are p and/or 2. Example 13.5. Let p = 17. Then one can compute that the squares modulo 17 are 1,2,4,8,9,13,15 and 16. This means that  q  = 1 ⇐⇒ q ≡ 1,2,4,8,9,13,15,16 (mod17) 17  q   q   17  and = −1 otherwise. Since 17 ≡ 1 (mod4), the LQR gives that = . 17 17 q This implies that √ q is totally split in Q( 17) ⇐⇒ q ≡ 1,2,4,8,9,13,15,16 (mod17). Make your own table and check that this is correct.  q  Example 13.6. Let p = 11. Here the squares are 1,3,4,5 and 9, so = 1 if and only 11 if q ≡ 1,3,4,5,9 (mod11). We now need to solve the following system of congruences q ≡ a (mod11), q ≡ b (mod4), for a,b = 1,3,4,5,9. By the Chinese Remainder Theorem this is equivalent to solving a single congruence mod- ulo 44. Therefore, solving for all a,b = 1,3,4,5,9 we get √ q is totally split in Q( 11) ⇐⇒ q ≡ 1,5,7,9,19,25,35,37,39,43 (mod44). Once more, it is instructive to make your own table to check this. There is nothing specific in the above computations so we can actually state a theorem on the basis of the examples. Denote by Spl(K/Q) the set of all totally split primes in K. √ Theorem 13.11. Let p be an odd prime. Then Spl(Q( p)/Q) can be described by (i) congruence relations modulo p if p ≡ 1 (mod4), or (ii) congruence relations modulo 4p if p ≡ 3 (mod4). 64 D. LARSSON

This theorem is actually a ””. What I mean by this is the following. As we have remarked the values of Legendre symbols is equivalent to the splitting behaviour of √ primes in quadratic extensions on the form Q( p)/Q. The Law of Quadratic Reciprocity then, gives conditions in the guise of congruence relations on the primes p and q such that √ 20 q splits in Q( p). This is exactly what the above theorem also does , and this inspires to investigate other reciprocity laws in this way. We will in the next section give a ”Cyclotomic Reciprocity Law”. 13.3. Consequences for some non-quadratic number fields. We will here discuss some higher-order examples. Example 13.7. We begin with the field extension Q(α)/Q where α is any zero of the 3 polynomial f1(z) := z − z − 1. The actual zeros are very complicated as you may check for yourselves (there are solvers available on the web); but we don’t need to know the explicit expressions and since conjugate elements generate isomorphic extensions we can take any zero. 2 Recall that we computed the ring of integers of this to be oK = Z1 ⊕ ZαZα (see Ex- ample ??) and that Disc(oK/Z) = −23. This means that Dedekind’s theorem applies to all primes and that 23 is the only ramified prime. See Table 6. The first totally split prime

TABLE 6. Decomposition of primes in Q(α), α = RootOf( f1):

hpi irr(z)(mod p) Decomposition h2i z3 + z + 1 h2i h3i z3 + 2z + 2 h3i h5i (z + 3)(z2 + 2z + 3) h5,α + 3ih5,α2 + 2α + 3i h7i (z + 2)(z2 + 5z + 3) h7,α + 2ih7,α2 + 5α + 3i h11i (z + 5)(z2 + 6z + 2) h11,α + 5ih11,α + 6α + 2i h13i z3 + 12z + 12 h13i h17i (z + 12)(z2 + 5z + 7) h17,α + 12ih17,α2 + 5α + 7i h19i (z + 13)(z2 + 6z + 16) h19,α + 13ih19,α2 + 6α + 16i h23i (z + 20)(z + 13)2 h23,α + 20ih23,α + 13i2 h29i z3 + 28z + 28 h29i h31i z3 + 30z + 30 h31i h37i (z + 24)(z2 + 13z + 20) h37,α + 24ih37,α2 + 13α + 20i appears at p = 59 and this splits as h59,α + 17ih59,α + 46ih59,α + 55i. √ 3 3 Example 13.8. Now, we take Q( 2)/Q, f2(z) := z − 2, for which we computed the discriminant and the ring of integers in Example ??. Remember: Disc(oK/Z) = −108 = 2 3 2 −2 ·3 and oK = Z1⊕Zα ⊕Zα , implying that 2 and 3 ramifies and that Dedekind applies to all primes. Table 7 illustrates the splitting behaviour here. 3 2 Example 13.9. Let α be any zero of the polynomial f3(z) := z − z + 4z − 9 (check that this is irreducible over Q). 2 It is possible to compute that the discriminant is −1815 = −3 · 5 · 11 and oK = Z[α] = Z1 ⊕ Zα ⊕ Zα2. Hence, the ramified primes are 3,5 and 11, and Dedekind’s theorem applies to all primes. Now we get Table 8.

20In fact, this theorem and LQR are equivalent. 65

TABLE 7. Decomposition of primes in Q(α), α = RootOf( f2):

hpi irr(z)(mod p) Decomposition h2i z3 h2,αi3 h3i (z + 1)3 h3,α + 1i3 h5i (z + 2)(z2 + 3z + 4) h5,α + 2ih5,α2 + 3α + 4i h7i z3 + 5 h7i h11i (z + 4)(z2 + 7z + 5) h11,α + 4ih11,α + 7α + 5i h13i z3 + 11 h13i h17i (z + 9)(z2 + 8z + 13) h17,α + 9ih17,α2 + 8α + 13i h19i z3 + 17 h19i h23i (z + 7)(z2 + 16z + 3) h23,α + 7ih23,α2 + 16α + 3i h29i (z + 3)(z2 + 26z + 9) h29,α + 3ih29,α2 + 26α + 9i h31i (z + 11)(z + 24)(z + 27) h31,α + 11ih31,α + 24ih31,α + 27i h37i z3 + 35 h37i

TABLE 8. Decomposition of primes in Q(α), α = RootOf( f3):

hpi irr(z)(mod p) Decomposition h2i z3 + z2 + 1 h2i h3i z(z + 1)2 h3,αih3,α + 1i2 h5i (z + 1)(z + 4)2 h5,α + 1ih5,α + 4i2 h7i (z + 4)(z2 + 2z + 3) h7,α + 4ih7,α2 + 2α + 3i h11i (z + 7)3 h11,α + 7i3 h13i (z + 7)(z2 + 5z + 8) h13,α + 7ih13,α2 + 5α + 8i h17i z3 + 16z2 + 4z + 8 h17i h19i (z + 3)(z + 6)(z + 9) h19,α + 3ih19,α + 6ih19,α + 9i h23i z3 + 22z2 + 4z + 14 h23i h29i (z + 2)(z2 + 26z + 10) h29,α + 2ih29,α2 + 26α + 10i h31i z3 + 30z2 + 4z + 22 h31i h37i (z + 32)(z2 + 4z + 24) h37,α + 32ih37,α2 + 4α + 24i

6 5 4 3 Example 13.10. Now, consider a zero of the polynomial: f4(z) = z + 3z + 2z − z − 3z2 − 2z + 1. We compute the discriminant to −218491 = −75 · 13, so 7 and 13 ramifies and once more Dedekind tells us the decompostion of all primes. The first totally split prime is p = 97 and splits as h97i = h97,α + 8ih97,α + 9ih97,α + 24ih97,α + 74ih97,α + 89ih97,α + 90i. See Table 9 for details.

14. CYCLOTOMIC NUMBER FIELDS Cyclotomic number fields were introduced by Ernst Kummer (1810–1893) although Gauss at least had some ideas along the cyclotomic lines before that. Contrary to what many suspect, it was not a possible proof of Fermat’s last theorem that instigated Kummer to invent and investigate cyclotomic fields, but rather his pursuit of higher reciprocity laws.

14.1. Cyclotomic fields. 66 D. LARSSON

TABLE 9. Decomposition of primes in Q(α), α = RootOf( f4):

hpi irr(z)(mod p) Decomposition h2i z6 + z5 + z3 + z2 + 1 h2i h3i z6 + 2z4 + 2z3 + z + 1 h3i h5i (z3 + 3z2 + 4)(z3 + 2z + 4) h5,α3 + 3α2 + 4ih5,α3 + 2α + 4i h7i (z + 4)6 h7,α + 4i6 h11i z6 + 3z5 + 2z4 + 10z3 + 8z2 + 9z + 1 h11i (z + 3),(z + 7)2,(z + 11), h13,α + 3i,h13,α + 7i2, h13i (z2 + z + 8) h13,α + 11ih13,α2 + α + 8i h17i z6 + 3z5 + 2z4 + 16z3 + 14z2 + 15z + 1 h17i (z3 + 11z2 + 7z + 3), h19,α3 + 11α2 + 7α + 3i, h19i (z3 + 22z2 + 8z + 13) h19,α3 + 22α2 + 8α + 13i (z3 + 4z2 + 17z + 2), h23,α3 + 4α2 + 17α + 2i, h23i (z3 + 22z2 + 12z + 12) h23,α3 + 22α2 + 12α + 12i (z + 13),(z + 17), h29,α + 13i,h29,α + 17i, h29i (z2 + z + 3), h29,α2 + α + 3i, (z2 + z + 7) h29,α2 + α + 7i (z3 + 13z2 + 23), h31,α3 + 13α2 + 23i, h31i (z3 + 21z2 + 8z + 27) h31,α3 + 21α2 + 8α + 27i h37i z6 + 3z5 + 2z4 + 36z3 + 34z2 + 35z + 1 h37i

14.1.1. Cyclotomic polynomials. Let n be a positive integer and put

Primn := {ζ ∈ C | ζ is a primitive nth root of unity}. Notice that Prim generates the cyclic group of nth roots of unity, . n µn Definition 14.1. The nth cyclotomic polynomial is the following integral polynomial

Φn(z) := ∏ (z − ζ). ζ∈Primn Notice that the number of factors is φ(n) (Euler’s totient function) since this counts all the elements prime to n. Immediately from definition follows that 2 p−1 Φp(z) = 1 + z + z + ··· + z , for p prime. It is also possible to prove the very useful identity: n (14.1) z − 1 = ∏Φd(z). d|n

14.1.2. Cyclotomic field extensions.

Definition 14.2. A cyclotomic extension (of Q) is an extension of the form Q(ζ)/Q with ζ ∈ Primn. That is, Q(ζ) is the smallest extension of Q containing both Q and ζ. An equivalent definition is as ( ). Q µn So far we know nothing of the minimal polynomial but it will turn out to be Φn(z) (surprise). It is however not obvious that Φn(z) is irreducible and of minimal degree, although, clearly, Φn(ζ) = 0. But all this will follow from the next theorem. 67

First note that that every primitive nth root of unity is a zero of Φn(z). Therefore, we can define field morphisms i ϕi : Q(ζ) → C, ζ 7→ ζ . It will follow from part (a) of the theorem below that these are all the field morphisms of Q(ζ)/Q. Theorem 14.1. Let ζ be a primitive prth root of unity, where p is a (rational) prime and r > 0. Then r r r− (a) [Q(ζ)/Q] = φ(p ) (recall φ(p ) = p 1(p − 1)); (b) oQ(ζ) = Z[ζ]; r (c) the element π := 1 − ζ is a prime element and hpi = hπiφ(p ); pr−1(pr−r−1) (d) Disc(oQ(ζ)/Z) = ±p Notice that (d) implies that p is the only ramified prime and it is totally ramified by (a) and (d). Also, (b) implies that Dedekind’s theorem can be applied to all primes here.

n Proof. Put K := Q(ζ). Clearly, ζ is integral over Z since it is a zero of z −1 (for instance). r So Z[ζ] ⊆ oK. If α is another primitive p th root of unity then, since elements of Primpr r t s generates the p th roots of unity, there are s,t ∈ Z, p - s,t, such that α = ζ and ζ = α. Hence Z[ζ] = Z[α] implying that Q(ζ) = Q(α). We have that s 1 − α 1 − ζ 2 s−1 = = 1 + ζ + ζ + ··· + ζ ∈ Z[ζ], 1 − ζ 1 − ζ and similarly t 1 − ζ 1 − α 2 t−1 s 2s s(t−1) = = 1 + α + α + ··· + α = 1 + ζ + ζ + ··· + ζ ∈ Z[ζ], 1 − α 1 − α

1−α so 1−ζ is invertible in Z[ζ] and so also in oK. We have that

pr p z − 1 u − 1 2 p−1 pr−1 Φpr (z) = = = 1 + y + y + ··· + y , where y := z zpr−1 − 1 u − 1 from which it follows that Φpr (1) = p. Therefore, we get

1 − α  Φpr (1) = (1 − α) = (1 − ζ) ∏ ∏ 1 − ζ α∈Primpr α∈Primpr r 1 − α = u · (1 − ζ)φ(p ), where u := ∈ o×. ∏ 1 − ζ K

The fundamental identity [K/Q] = ∑i ei fi (see Theorem 13.3) then implies that [Q(ζ)/Q] ≥ r r φ(p ). But we clearly also have φ(p ) ≥ [Q(ζ)/Q] since the minimal polynomial can have r degree no bigger than deg(Φpr (z)) = φ(p ) (remember Φpr (ζ) = 0 so irr(ζ,Q) | Φpr ). r Hence [Q(ζ)/Q] = φ(p ). This shows (a) and (c) if we note that π := 1−ζ must generate a prime ideal. Otherwise, hpi would decompose into too many prime factor (we are in the Dedekind domain oK so the decomposition can be done). Indeed, if hπi was not prime, φ(pr) then hπi = ∏i pi and so hpi = ∏i p contradicting the fact that (by the fundamental identity) there must be exactly φ(pr) prime factors. 68 D. LARSSON

This also shows that f (hπi | hpi) = 1, since, also by the fundamental identity, and the fact that hπi is the only prime above hpi,

r r [Q(ζ)/Q] = φ(p ) = ∑ e(P | hpi) f (P | hpi) = ∑ φ(p ) f (P | hpi) ⇒ f (P | hpi) = 1. P|hpi P|hpi Hence

(14.2) Z/hpi → oK/hπi must be an isomorphism. We shall now compute Disc(Z[ζ]) via Theorem 9.23: 0 Disc(Z[ζ]/Z) = ±NrmK/Q(Φpr (ζ)). The actual signs won’t be important in what follows. Notice that what we have already proven, Φpr (z) is the minimal polynomial for ζ. pr−1 pr Differentiating the identity Φpr (z)(z − 1) = z − 1 gives

r pr−1 r−1 pr−1−1 0 p z − Φpr (z)p z Φ r (z) = , p zpr−1 − 1 and evaluating at z = ζ, taking into account Φpr (ζ) = 0, gives

r pr−1 0 p ζ Φpr (ζ) = . ζ pr−1 − 1 We also have

r rφ(pr) NrmK/Q(ζ) = ±1, NrmK/Q(p ) = p , NrmK/Q(π) = ±p,

ps where the sign depends on r. To compute NrmK/Q(1 − ζ ), 0 ≤ s < p, we note that

ps ps ζ ∈ Prim r−s so Nrm ps (1 − ζ ) = ±p p Q(ζ )/Q by the same arguments as for π (notice the field extension here). Now, we use a result which can be found in books in abstract algebra, namely, let H ⊆ K ⊆ L be a tower of fields, then

NrmL/K = NrmK/H ◦ NrmL/K (notice the order). (It is not hard, but not particularly easy either, to figure the proof out.) By definition, [K/H] pr NrmK/H (x) = x , for x ∈ H. Using this on the tower Q ⊆ Q(ζ ) ⊆ Q(ζ) = K we get

ps ps  Nrm (1 − ζ ) = Nrm ps Nrm ps (1 − ζ ) K/Q Q(ζ )/Q K/Q(ζ ) s ps p [K/Q(ζ )] = Nrm ps (1 − ζ ) Q(ζ )/Q ps  s [K/Q(ζ )] ps p [K/Q(ζ )] = Nrm ps (1 − ζ ) = ±p . Q(ζ )/Q By the tower law we have

ps r  r−s ps [K/Q(ζ )] = [Q(ζ)/Q] = φ(p ) φ(p ) = p . 69

Therefore,  r pr−1  0 p ζ Disc(Z[ζ]/Z) = ±NrmK/ (Φpr (ζ)) = ±NrmK/ r− Q Q ζ p 1−1 r pr−1 pr−1 −1 = ±NrmK/Q(p )NrmK/Q(ζ )NrmK/Q(ζ − 1) r r−1 r−1 = ±prφ(p ) · p−p = ±pp (rp−r−1), where the last equality follows since φ(pr) = pr−1(p − 1). =∼ Recall that we have an isomorphism Z/hpi −→ oQ(ζ)/hπi. This means in particular that

oQ(ζ) = Z + hπi =⇒ oQ(ζ) = Z[ζ] + hπi. Multiplying this equation with π gives 2 2 2 hπi = πZ[ζ] + hπ i =⇒ oQ(ζ) = Z[ζ] + πZ[ζ] + hπ i = Z[ζ] + hπ i.

Notice that hπi denotes the ideal in oQ(ζ) while πZ[ζ] is the ideal generated by π in Z[ζ]. Clearly we can continue like this indefinitely to obtain m oQ(ζ) = Z[ζ] + hπ i. (pr) m m Since p = uπφ , choosing M ∈ Z big enough we see that hp i = hπ i, for all m ≥ M. M We have, by possibly choosing a larger M, p oQ(ζ) ⊆ Z[ζ]. This shows, finally, that M M oQ(ζ) = Z[ζ] + hπ i = Z[ζ] + hp i = Z[ζ], and this completes the proof.  With a little more work, using some technical field theory, one can extend the previous theorem and prove the following generalization. Theorem 14.2. Let ζ be an nth root of unity. Then, (a) [Q(ζ)/Q] = φ(n); (b) oQ(ζ) = Z[ζ]; (c) the discrimiant can be computed with the following formula: nφ(n) Disc( [ζ]/ ) = (−1)φ(n)/2 , Z Z φ(n)/(p−1) ∏p|n p where φ(n) is the totient function; r (d) if n = p · m, p - m, then φ(pr) hpi = (p1p2 ···ps) , with all the pi’s distinct in Spec(oQ(ζ)). r r In particular, unless φ(p ) = 1, p ramifies in Q(ζ). The case φ(p ) = 1 is only satisfied for pr = 2, implying that p = 2, r = 1, so n = 2 · (odd). √ Example 14.1. Let ζ := ζ7 := exp(2π −1/7) be a primitive 7th root of unity. Then Theorem 14.1 tells us: - [Q(ζ)/Q] = 6 and so {1,ζ,ζ 2,ζ 3,ζ 4,ζ 5} is a basis for Q(ζ) over Q; - the ring of integers is: 2 3 4 5 oQ(ζ) = Z1 ⊕ Zζ ⊕ Zζ ⊕ Zζ ⊕ Zζ ⊕ Zζ ; 5 - Disc(oQ(ζ)/Z) = ±7 = ±16807, and 7 is the totally ramified prime with h7i = h1 − ζi6; - all other primes are unramified. 70 D. LARSSON

We will now use Dedekind’s theorem to find the ramification behaviour of the first few primes. Recall that, since (oQ(ζ) : Z[ζ]) = 1, Dedekind’s theorem applies to all primes with ζ as the primitive element. 6 5 4 3 2 The minimal polynomial of ζ7 is Φ7(z) = z + z + z + z + z + z + 1. From this we get Table 10: We see explicitly that the only ramified prime is 7; all other primes are

TABLE 10. Decomposition of primes in Q(ζ7):

hpi irr(z)(mod p) Decomposition (z3 + z + 1), h2,ζ 3 + ζ + 1i, h2i (z3 + z2 + 1) h2,ζ 3 + ζ 2 + 1i h3i z6 + z5 + z4 + z3 + z2 + z + 1 h3i h5i z6 + z5 + z4 + z3 + z2 + z + 1 h5i h7i (z + 6)6 h7,ζ + 6i6 (z3 + 5z2 + 4z + 10), h11,ζ 3 + 5ζ 2 + 4ζ + 10i, h11i (z3 + 7z2 + 6z + 10) h11,ζ 3 + 7ζ 2 + 6ζ + 10i (z2 + 3z + 1), h13,ζ 2 + 3ζ + 1i, h13i (z2 + 6z + 1), h13,ζ 2 + 6ζ + 1i, (z2 + 5z + 1) h13,ζ 2 + 5ζ + 1i h17i z6 + z5 + z4 + z3 + z2 + z + 1 h17i h19i z6 + z5 + z4 + z3 + z2 + z + 1 h19i (z3 + 10z2 + 9z + 22), h23,ζ 3 + 10ζ 2 + 9ζ + 22i, h23i (z3 + 14z2 + 13z + 22) h23,ζ 3 + 14ζ 2 + 13ζ + 22i (z + 4),(z + 5), h29,ζ + 4ih29,ζ + 5i, h29i (z + 6),(z + 9), h29,ζ + 6ih29,ζ + 9i, (z + 13),(z + 22) h29,ζ + 13ih29,ζ + 22i h31i z6 + z5 + z4 + z3 + z2 + z + 1 h31i (z3 + 9z2 + 8z + 36), h37,ζ 3 + 9ζ 2 + 8ζ + 36i, h37i (z3 + 29z2 + 28z + 36) h37,ζ 3 + 29ζ 2 + 28ζ + 36i unramified. Since gcd(6,7) = 1, we see that 7 is tamely ramified (the characteristic of Fhζ+6i = 7). The only totally split prime in the table is h29i. In general, as we will see in a while, the split primes in Q(ζn) are exactly the primes p such that p ≡ 1 (modn).

Example 14.2. Now, let ζ := ζ9 be a 9th root of unity. We compute that the discrimi- 9 6 3 nant is ±3 = ±19683. Also, the minimal polynomial is Φ9(z) = z + z + 1. Therefore, [Q(ζ)/Q] = 6 and oQ(ζ) = Z[ζ]. Since oQ(ζ) = Z[ζ], Dedekind’s theorem is applicable to all primes. The decomposition table becomes as in Table 11. It is interesting to compare this table with the previous one. For one thing, the degrees are the same but the arithmetic is completely different! For example, in the present case we have two totally split primes in the range we are considering, namely, p = 19,37. Also, comparing the splitting behaviour for the primes in Table 10 and Table 11 shows that, for instance, 17 is inert in the first and non-inert in the second. We have one more typical example to consider, namely, the non-prime power case.

Example 14.3. Let us take ζ := ζ15 a 15th root of unity. The minimal polynomial here is 8 7 5 4 3 4 6 Φ15(z) = z − z + z − z + z − z + 1 and the discriminant is 1265625 = 3 · 5 . Hence, here the ramified primes are 3 and 5. Dedekind’s theorem still applies to all primes. We compute Table 12. The primes 3 and 5 are tamely ramified. The interesting thing here is 71

TABLE 11. Decomposition of primes in Q(ζ9):

hpi irr(z)(mod p) Decomposition h2i z6 + z3 + 1 h2i h3i (z + 2)6 h3,ζ + 2i6 h5i z6 + z3 + 1 h5i h7i (z3 + 3)(z3 + 5) h7,ζ 3 + 3ih7,ζ 3 + 5i h11i z6 + z3 + 1 h11i h13i (z3 + 4)(z3 + 10) h13,ζ 3 + 4ih13,ζ 3 + 10i (z2 + 3z + 1), h17,ζ 2 + 3ζ + 1i, h17i (z2 + 4z + 1), h17,ζ 2 + 4ζ + 1i, (z2 + 10z + 1) h17,ζ 2 + 10ζ + 1i (z + 2),(z + 3), h19,ζ + 2i,h19,ζ + 3i, h19i (z + 10),(z + 13), h19,ζ + 10i,h19,ζ + 13i, (z + 14),(z + 15) h19,ζ + 14i,h19,ζ + 15i h23i z6 + z3 + 1 h23i h29i z6 + z3 + 1 h29i h31i (z3 + 6)(z3 + 26) h31,ζ 3 + 6ih31,ζ 3 + 26i (z + 3),(z + 4), h37,ζ + 3i,h37,ζ + 4i, h37i (z + 21),(z + 25), h37,ζ + 21i,h37,ζ + 25i, (z + 28),(z + 30) h37,ζ + 28i,h37,ζ + 30i that there are no inert primes. This is a general fact for cyclotomic extensions generated by an nth root of unity where n is not a prime power. Hence, every prime hpi decomposes in Q(ζ15)! Notice that 31 is totally split. − 14.1.3. CM-fields and cyclotomy. Let ζ be a primitive nth root of unity, 2 - n. Then ζ 1 + − is also a primitive nth root of unity. Consider the subfield Q(ζ) := Q(ζ + ζ 1) ⊂ Q(ζ). s Notice that this is strict inclusion. Indeed, since Q(ζ) is imaginary unless n = 2 (in which case it can be either imaginary or real) there is a non-trivial automorphism induced by complex conjugation. We have,

−1 = ¯| |−2 = ¯ = , (since ¯ = 1 if ∈ ) ζ ζ ζ ζ ζ ζζ ζ µn − − + − so ζ + ζ −1 = ζ +ζ 1. This means that ζ +ζ 1 ∈ R so Q(ζ) = Q(ζ +ζ 1) ⊂ R; clearly + then, Q(ζ) (Q(ζ). On the other hand, adjoining a suitable element ζˆ := ζ 2 + ζ −2 − 2 (which is a zero of − + z2 − (ζ + ζ 1)z + 1) to Q(ζ) will give us the whole cyclotomic extension Q(ζ), i.e., + ˆ Q(ζ) = Q(ζ) (ζ). + This means that the extension Q(ζ)/Q(ζ) has degree two. Hence, Q(ζ) is a CM-field, being a totally imaginary quadratic extension of a totally real field21. Unfortunately, we won’t be able to investigate the deeper implications of this fact, but I will at least state a + proposition connecting units in cycloctomic extensions and the subfield Q(ζ) to soften the blow.

21 + An alternative argument can be given as follows. Notice that Q(ζ) is the fixed field of complex conjuga- 2 + tion on Q(ζ), which is an automorphism of degree two (i.e., σ = id) so Q(ζ)/Q(ζ) has degree two by Galois theory (see below). 72 D. LARSSON

TABLE 12. Decomposition of primes in Q(ζ15):

hpi irr(z)(mod p) Decomposition h2i (z4 + z + 1)(z4 + z3 + 1) h2,ζ 4 + ζ + 1ih2,ζ 4 + ζ 3 + 1i h3i (z4 + z3 + z2 + z + 1)2 h3,ζ 4 + ζ 3 + ζ 2 + ζ + 1i2 h5i (z2 + z + 1)4 h5,ζ 2 + ζ + 1i4 (z4 + 2z3 + 4z2 + z + 2), h7,ζ 4 + 2ζ 3 + 4ζ 2 + ζ + 2i h7i (z4 + 4z3 + 2z2 + z + 4) h7,ζ 4 + 4ζ 3 + 2ζ 2 + ζ + 4i (z2 + 3z + 9),(z2 + 4z + 5), h11,ζ 2 + 3ζ + 9i,h7,ζ 2 + 4ζ + 5i h11i (z2 + 5z + 3),(z2 + 9z + 4) h11,ζ 2 + 5ζ + 3ih7,ζ 2 + 9ζ + 4i (z4 + 9z3 + 3z2 + z + 9), h13,ζ 4 + 9ζ 3 + 3ζ 2 + ζ + 9i h13i (z4 + 3z3 + 9z2 + z + 3) h13,ζ 4 + 3ζ 3 + 9ζ 2 + ζ + 3i (z4 + 11z3 + 15z2 + 5z + 1), h17,ζ 4 + 11ζ 3 + 15ζ 2 + 5ζ + 1i h17i (z4 + 5z3 + 15z2 + 11z + 1) h17,ζ 4 + 5ζ 3 + 15ζ 2 + 11ζ + 1i (z2 + 10z + 11),(z2 + 13z + 7), h19,ζ 2 + 10ζ + 11i,h19,ζ 2 + 13ζ + 7i h19i (z2 + 16z + 11),(z2 + 17z + 7) h19,ζ 2 + 16ζ + 11ih19,ζ 2 + 17ζ + 7i (z4 + 6z3 + 21z2 + 16z + 1), h23,ζ 4 + 6ζ 3 + 21ζ 2 + 16ζ + 1i h23i (z4 + 16z3 + 21z2 + 6z + 1) h23,ζ 4 + 16ζ 3 + 21ζ 2 + 6ζ + 1i (z2 + 8z + 1),(z2 + 9z + 1), h29,ζ 2 + 8ζ + 1i,h29,ζ 2 + 9ζ + 1i h29i (z2 + 15z + 1),(z2 + 25z + 1) h29,ζ 2 + 15ζ + 1ih29,ζ 2 + 25ζ + 1i (z + 3),(z + 11), h31,ζ + 3i,h31,ζ + 11i, (z + 12),(z + 13), h31,ζ + 12i,h31,ζ + 13i h31i (z + 17),(z + 21), h31,ζ + 17ih31,ζ + 21i, (z + 22),(z + 24) h31,ζ + 22i,h31,ζ + 24i (z4 + 10z3 + 26z2 + z + 10), h37,ζ 4 + 10ζ 3 + 26ζ 2 + ζ + 10i h37i (z4 + 26z3 + 10z2 + z + 26) h37,ζ 4 + 26ζ 3 + 10ζ 2 + ζ + 26i

r Proposition 14.3. Let Q(ζ)/Q be a cyclotomic field extension with ζ an p th root of unity × + with p an odd prime. Assume that a ∈ Q(ζ) . Then there is a b ∈ Q(ζ) and s ≥ 0 such that a = bζ s. + It is also a fact that Q(ζ)/Q(ζ) is unramified. We will now begin the journey towards explaining the splitting behaviour in cyclotomic and quadratic extensions. 14.2. Galois theory of number fields. To discuss the next topic we need to define and state the main theorems in Galois theory as applied to number fields. I tried in vain to avoid this, but there are in any case considerable advantages in actually spending a few moments on this topic. Not only for the next section but since many results in number theory actually depends on Galois theory. So this section will in principle only be a list of definitions and theorems. No proofs will be given.

Definition 14.3. Let L/K be a finite extension of a number field K/Q given by a primitive element α ∈ K. Then L/K is a Galois number field if every conjugate of θ is in L. The of L/K is the group of automorphisms of L fixing K, i.e.,

Gal(K/Q) := {σ ∈ Aut(L) | σ(a) = a, for all a ∈ K} = {σ ∈ Aut(L) | σ|K = idK, for all a ∈ K}. 73

Notice that if K = Q, then Gal(L/Q) = Mor(L). Also, every σ ∈ Gal(L/Q) descends to an automorphism of oK. Theorem 14.4. Any quadratic number field is a Galois number field with √ Gal(Q( d)/Q) =∼ {id,σ} =∼ Z/2Z, √ √ where σ is the ”conjugation” σ(a + b d) := a − b d. The following is the main theorem of Galois theory for number fields. Theorem 14.5. Let L/K be a Galois number field. Then there is a bijection between the subfields L ⊇ k ⊇ K and subgroups of Gal(L/K) given by the association H ⊆ Gal(L/K) ←→ LH := {a ∈ L | σ(a) = a, for all σ ∈ H}. Moreover, if H is normal, LH/K is a Galois number field with Galois group Gal(L/K)/H; in particular, this holds if Gal(L/K) is abelian. We also recall a result concerning cyclic groups. Proposition 14.6. Let G be a cyclic group of order n. Then, for every d | n there is a unique subgroup of G of order d.

Theorem 14.7. The cyclotomic extension Q(ζ)/Q is Galois with Galois group × Gal(Q(ζ)/Q) = (Z/nZ) , where ζ is a primitive nth root of unity. In particular, if n = p is a prime, Gal(K/Q) is the × cyclic group (Z/pZ) =∼ Z/(p − 1)Z. Theorem 14.8. Let L/K be a Galois number field and let p be a prime in K, i.e,, p ∈ Spec(oK). Then for P,Q | p, with P,Q ∈ Spec(oL), there is a σ ∈ Gal(L/K) such that σ(P) = Q. It is easy to see that σ(P) | p. Indeed, σ(P)∗ = σ(P∗) = σ(p) = p. Definition 14.4. Let L/K be Galois and let P | p.

- The subgroup GP ⊆ Gal(L/K) defined by

GP := {σ ∈ Gal(L/K) | σ(P) = P} is called the decomposition group of P. - The fixed field

ZP := {a ∈ K | σ(a) = a, for all σ ∈ Gp} is called the decomposition field of P.

Notice that the number of primes over p is exactly the index (Gal(L/K) : GP) for all P lying over p. This follows since Gal(L/K)/GP acts on the primes over p without fixed points. In particular, we have

GP = {1} ⇐⇒ ZP = L ⇐⇒ p is totally split, GP = Gal(L/K) ⇐⇒ ZP = K ⇐⇒ p is non-split, and 0 0 −1 0 σ ∈ Gσ(P) ⇐⇒ σ ◦ σ(P) = σ(P) ⇐⇒ σ ◦ σ ◦ σ(P) = P (14.3) −1 0 0 −1 ⇐⇒ σ ◦ σ ◦ σ ∈ GP ⇐⇒ σ ∈ σGPσ . 74 D. LARSSON

This shows that the subgroups Gσ(P) and GP are conjugate. ∧ ∧ Let p := {P1,P2,...,Pr} be the set of prime ideals over p. Then for each pair Pi,P j ∈ p there is a σ such that P j = σ(Pi). Since σ induces an automorphism of oL we get an isomorphism of fields =∼ oL/Pi −→ oL/P j. This implies that all the inertia degrees over p are the same. We also have σ(p) = p so Pe | p ⇐⇒ σ(Pe) | σ(p) ⇐⇒ (σ(P))e | p. Hence all the ramification indices are also the same and p = ∏ σ(P)e, for all P | p. σ∈Gal(L/K)

Proposition 14.9. Let L/K be a Galois number field and let PZ be P restricted to the decomposition subfield ZP. Then the ramification index and inertia degree of PZ over K both equal 1 and the fundamental identity Theorem 13.3 reduces to

[L/K] = #Gal(L/K) = e · f · (Gal(L/K) : GP). This is what we need to know from Galois theory to be able to digest the next topic, namely Gauss sums and quadratic reciprocity.

14.3. Gauss sums and Quadratic Reciprocity. From now on we assume that ζ is a pth root of unity for p a (rational) odd prime.

14.3.1. Quadratic Gauss sums. Consider the sum  a  T := ζ a ∑ p a∈(Z/pZ)× inside Q(ζ). This is a special case of so-called Gauss sums. We compute:  ab  T 2 = ζ a+b. ∑ p a,b∈(Z/pZ)× × Fixing b (say), ab ranges over (Z/pZ) as a does. Therefore, we can replace a by ab in the above equation:

 ab   ab2   a  T 2 = ∑ ζ a+b = ∑ ζ b(a+1) = ∑ ζ b(a+1) a,b p a,b p a,b p  −1   a  = + ζ b(a+1), ∑ p ∑ p b∈(Z/pZ)× a6=−1,b × a+ where unspecified summation is over (Z/pZ) . Putting τ := ζ 1 we get  a   a   a  ∑ ζ b(a+1) = ∑ ∑ζ b(a+1) = ∑ ∑τb. a6=−1,b p a6=−1 p b a6=−1 p b Using that τ + τ2 + ··· + τ p−1 = −1, we get  −1   a  T 2 = − . ∑ p ∑ p b∈(Z/pZ)× a6=−1 75

 a   c  By multiplying ∑ by a symbol = −1 we find a p p  a   a  c   ac   a  −∑ = ∑ = ∑ = ∑ , a p a p p a p a p  a  which implies that ∑ = 0. Hence, a p  −1   a   −1   a   −1   −1  T 2 = ∑ − ∑ = (p − 1) − ∑ + = p. b p a6=−1 p p a p p p

∗ p−1 Introducing the notation p := (−1) 2 p, we have proved ∗ √ Theorem 14.10. The element p is a square in Q(ζ) and so Q( p∗) is a subfield of Q(ζ). We will use this to prove the Law of Quadratic Reciprocity. First we need the following proposition. Proposition 14.11. Let p and q be odd distinct primes and let ζ be a pth root of unity. Then p q ∈ Spl(Q( p∗)) ⇐⇒ q splits into an even number of prime factors in Q(ζ). √ √ ∗ ∗ Proof. If q ∈ Spl(Q( p )) then hqi = p1p2 in Q( p ). The primes p1 and p2 splits in Q(ζ) into the same number of prime factors (why?), so hqi splits into an even number of prime factors in Q(ζ). Conversely, assume this. Then for Q | q we have that 2 | [ZQ/Q]. Since Q(ζ) is a cyclic Galois extension of even degree, the subextension ZQ/Q√is also, and ∗ √so there is a unique quadratic subextension over Q inside ZQ. This must be Q( p ) since ∗ p ∈ ZQ. The prime Q restricted√ to ZQ has inertia degree one over Q and so this must also ∗ be the√ case for Q restricted to Q( p ). Hence, by the fundamental identity, q is totally split in Q( p∗).  Notice that the Law of Quadratic Reciprocity is equivalent to  q   p∗  = . p q

 −1  p−1 Indeed, since = (−1) 2 we have p p−1 ∗  q   p   −1  2  p  p−1 q−1  p  = = = (−1) 2 2 . p q q q q Reversing the argument gives the other direction.  p∗  √ We know that = 1 if and only if q is totally split in ( p∗) and if and only if q Q hqi splits into an even number of prime factors in Q(ζ), where ζ is a primitive pth root of unity. Assume this. The fundamental identity gives

[Q(ζ)/Q] = v · f · (Gal(Q(ζ)/Q) : GQ) ⇐⇒ p − 1 = 1 · f · (Gal(Q(ζ)/Q) : GQ) for any Q | q. Recall that (Gal(Q(ζ)/Q) : GQ) = [ZQ/Q] is the number of prime ideals over p−1 × × q. This implies that p − 1 = 2x f and so f | 2 . Consider q modulo p in Fp . Since Fp is a cyclic group there is a least a ≤ p − 1 such that qa = 1. This a must be f by the relation p−1 × × p − 1 = 2x f . Therefore we have q 2 ≡ 1 (mod p). But any y ∈ Fp having order in Fp 76 D. LARSSON

p−1 ×2 dividing 2 must belong to Fp (and conversely of course), i.e., must be a square. This  q   p∗  implies that = 1 = and LQR is proved. p q 14.3.2. The Cyclotomic Reciprocity Law. [,p Lemma 14.12. Suppose that Φn(z) decomposes into linear factors. Then p - n.

Proof. Let ζ be a primitive nth root of unity and assume that p | n. Then Φn(ζ) = 0 and d Φd(ζ) 6= 0 for d < n since ζ 6= 1. Differentiating the relation n z − 1 = ∏ Φd(z) · Φn(z) = F(z) · Φn(z) d|n d

[,p Conversely, by Lemma 14.12, p cannot divide n if Φn(z) splits into distinct linear factors. This means that (zn − 1)[,p also have distinct factors since otherwise (nzn−1)[,p and (zn − 1)[,p would have a common zero which is impossible unless p | n. Assume that [,p n d a ∈ Fp is a zero of Φn(z) . Then a = 1. If d is the smallest divisor of n such that a = 1, [,p n then Φd(a) = 0 by Lemma 14.13. If d 6= n then z − 1 = ∏d|n Φd(z) shows that a is at least a double zero of zn − 1, which is a contradiction. Therefore, d = n and a generates a × cyclic subgroup of Fp of order n, so n | p − 1  We can actually give and prove a more precise statement without too much difficulty, namely:

Theorem 14.15 (Cyclotomic Reciprocity Law). Suppose p - n and let Q(ζ)/Q be a cy- clotomic extension with ζ a primitive nth root of unity. Furthermore, let f be the smallest f integer such that p ≡ 1 (modn). Then p splits into φ(n)/ f distinct factors in Q(ζ), each of residue class degree f . In particular, p is totally split if and only if p ≡ 1 (modn).

Proof. We note first that if p - n then, for any p | p in Q(ζn), we have that all nth roots [ × × of unity are distinct modulo p. Indeed, ζn generates a subgroup of Fp := (oQ(ζn)/p) of order less than or equal to n (notice that n | q − 1). If the order is strictly less than n, then i j i j−i ζn ≡ ζn (modp) for i 6= j < n, implying that ζn(1 − ζn ) ≡ 0 (modp) (assume i < j). j−i Since p is prime we have that 1 − ζn ≡ 0 (modp). Now, consider the equation n−1 l n = Φn(1) = ∏(1 − ζn). l=1 Reducing this modulo p yields something non-zero since p - n. But somewhere in this j−i product is a factor 1 − ζn which, modulo p, is zero, and so we have a contradiction. p The element Frobp(ζn) is a root of unity modulo p so by the above Frobp(ζn) = ζn . Let f f := [Fp/Fp], we get Frobp = id, so f f p f f Frobp = id ⇐⇒ Frobp(ζn) = ζn ⇐⇒ ζn = ζn ⇐⇒ p ≡ 1 (modn). From the fundamental identity we have φ(n) = e ft, where t is the number of primes above p. Since Q(ζn)/Q is Galois, e = 1, and so the result follows.  14.3.3. Residue symbols in number fields. Let K/Q be a number field containing a primi- tive nth root of unity ζ := ζn. For p ∈ Spec(oK), recall that r N(p) := (oK : p) = #(oK/p) = p , for some r ≥ 1.

Let a ∈ oK be an element relatively prime to p, i.e., hai + p = oK ⇐⇒ a ∈/ p. Then we have the following generalization of Fermat’s theorem: aN(p)−1 ≡ 1 (modp). × This follows easily since (oK/p) is a finite group with N(p) − 1 elements. Put q := N(p) and assume that hni ⊂ oK is relatively prime to p. (This is equivalent to × gcd(q,n) = 1.) Reducing ζ modulo p gives a generator for a subgroup of (oK/p) with n elements. Thus n | q − 1. We now make the following definition: for a ∈ oK, we define  K/ ;a  Q to be such that p n   q−1 K/Q;a (14.4) a n ≡ (modp). p n 78 D. LARSSON

This is well-defined and is called the nth power residue symbol in K. Notice that (14.4) is formally the same as Euler’s theorem on Legendre symbols. The slightly awkward notation is one of many traditional ones for nth power residues and the reason for introducing it (and its variants) will hopefully be clear in a while. It follows immediately from definition that  K/ ;ab   K/ ;a   K/ ;b  (14.5) Q = Q Q , p n p n p n since     q−1 q−1 q−1 K/Q;a K/Q;b (ab) n = a n b n ≡ (modp), p n p n and we extend this to non-prime ideals a as  K/ ;a   K/ ;a  Q = ∏ Q . a n p|a p n  / ;·  Notice that the case Q Q is the ordinary Legendre symbol. The above extension · 2 to non-prime ”denominators” for quadratic residue symbols is called the Jacobi symbol. Theorem 14.16. Under the above assumptions we have the following properties:  K/ ;a   K/ ;b  (a) if a ≡ b (modp) then Q = Q ; p n p n  K/ ;a  (b) Q = 1 is equivalent to the existence of a non-zero solution to zn ≡ p n a (modp). (c) z ∈ Z is an nth power residue modulo a prime p ≡ 1 (modn) if and only if  (ζ )/ ;z  Q n Q = 1 for any prime p | p. p n n Proof. (a) is clear. To prove (b) assume that a ≡ x (modp) for x ∈ oK \ p (hence x is not q−1 q−1 zero). Then a n ≡ (xn) n = xq−1 ≡ 1 (modp) so the result follows. Assume now that   K/Q;a × = 1. Since a ∈ oK \ p, modulo p, a is non-zero, so, since (oK/p) is cyclic, p n j there is a primitive nth root of unity y ∈ oK such that a ≡ y (modp) for some j ≥ 0. Hence, q−1 q−1 1 ≡ a n ≡ y j n (modp). This implies, because y is a primitive nth root of unity, that n | j, and so (b) follows with z = y j/n. The last item (c) follows from (b) and the Cyclotomic Reciprocity Law.  14.3.4. General Gauss sums and Jacobi sums. Let us now consider more general Gauss sums. The reason for this is that the shortest path towards higher reciprocity laws goes via Gauss sums. × × A multiplicative is a group homomorphism ϑ : Fq → C . To simplify the terminology we will often omit the specifier ”multiplicative”. By 1 we denote the unit character defined by 1(y) = 1 for all y ∈ G. We begin by noting some useful properties: (i) ϑ(1) = 1 ∈ C (by definition); (ii) ϑ(−1) = ±1, since 1 = ϑ(1) = ϑ((−1)(−1)) = ϑ(−1)2; (iii) im( ) ⊆ , for some n > 1, since × is finite of order; ϑ µn Fq − (iv) by (iii) we have ϑ(a) = ϑ(a) 1, since |ϑ(a)|2 = 1 ∈ C. 79

 K/ ; ·  Example 14.4. Notice that equation (14.5) shows that Q is a multiplicative char- p n acter for all primes p ∈ Spec(oK), at least in a suitable sense. In fact, generalizing slightly,  K/ ;a   K/ ;b  we see that if a = b modulo some prime p then Q = Q so the nth p n p n × power residue symbol is a multplicative character on the group (oK/p) , i.e.,     K/Q; · × × K/Q;a : (oK/p) → C , a (modp) 7→ . p n p n This is clearly the a multiplicative character as defined above when K = Q. It is also a deep  K/ ; ·  fact know as Artin’s reciprocity law that Q is also a group homomorphism in · n the ”denominator” (interpreted correctly). We will take a short peek in this direction later.

Let ζ be a primitive pth root of unity. We will be interested in the trace Trq : Fq → Fp. p Since Gal(Fq/Fp) is cyclic with generator the Frobenius morphism Frobp(a) = a the trace is q−1 q−1 i pi Trq(a) = ∑ σ(a) = ∑ Frobp(a) = ∑ a ∈ Fp. σ∈Gal(Fq/Fp) i=0 i=0 (a) Interpreting the value of this as an element in Z we can consider ζ Trq . Extend ϑ 6= 1 to × the whole Fq by ϑ(0) := 0. This will simplify computations since we can sum over the × whole Fq and not just over Fq . However, we put 1(0) := 1. Let Gˆ denote the set of characters of G. We define a group structure on Gˆ as  ϑ1(a)ϑ2(a) if a 6= 0;  −1 (14.6) (ϑ1 · ϑ2)(a) := 0 if a = 0 and ϑ1 6= ϑ2 ;  −1 1 if a = 0 and ϑ1 = ϑ2 . This implies that ϑ −1(a) = ϑ(a)−1. By definition we have that ϑ 2(a) = (ϑ · ϑ)(a) = 2 q−1 q−1 × ϑ(a)ϑ(a) = ϑ(a ), so by induction, ϑ (a) = ϑ(a ) = ϑ(1) = 1, for all a ∈ Fq . Therefore, ϑ q−1 is the unit character 1 and so ϑ has order dividing q − 1, i.e., ϑ n = 1 where n | q − 1. Since q = pr, we see that gcd(n, p) > 1. From now on, let n denote the smallest integer such that n = 1. Hence, im( ) ⊆ . In particular, im( ) ⊆ ( ), ϑ ϑ µn ϑ Q ζn where ζn is a primitive nth root of unity. × Definition 14.5. The Gauss sum associated with w ∈ Fq and ϑ is the sum defined as

Trq(wy) Gw(ϑ) := ∑ ϑ(y)ζ . × y∈Fq

We define the Jacobi sum of ϑ1 and ϑ2 as J(ϑ1,ϑ2) := ∑ ϑ1(t)ϑ2(1 −t). t∈Fq (This is a discrete version of the convolution product.) We note that G1(1) = −1. Let me remark that to be perfectly accurate one should define the sum as

Trq(wy) Gw(ϑ) := − ∑ ϑ(y)ζ × y∈Fq 80 D. LARSSON but for simplicity, and to be consistent with the definition of quadratic Gauss sums, we disregard this minus sign. In addition, by the convention that ϑ(0) = 0, we can, unless ϑ = 1, extend the sum to the whole Fq. The above definition looks very strange but noticing that the Legendre symbol is a multiplicative character gives us hope and motivation for considering the general creature defined here:

Example 14.5. To see that this generalizes quadratic Gauss sums we take  ·   y  ϑ := , and w := 1, so T = G (ϑ ) = ζ y. p p 1 p ∑ p y∈Fp This will soon be generalized to nth power residue symbols, this being the reason for defin- ing Gauss sums via characters (nth power residue symbols are multiplicative characters).

−1 Theorem 14.17. Let ϑ,ϑ1,ϑ2 be characters and assume that ϑ1 6= ϑ2 . Then the Gauss and Jacobi sums satisfy the following properties:

(a) Gw(ϑ) ∈ Z[ζp,ζn] = Z[ζpn] = oQ(ζpn) and, in addition, J(ϑ1,ϑ2) ∈ Z[ζn] = oQ(ζn); −1 (b) Gw(ϑ) = ϑ(w) G(ϑ); (c) G(ϑ1)G(ϑ2) = G(ϑ1ϑ2)J(ϑ1,ϑ2); (d) G(ϑ) = ϑ(−1)G(ϑ −1); (e) G(ϑ)G(ϑ −1) = ϑ(−1)q; (f) G(ϑ)G(ϑ) = q.

Writing ζ without subscript will always mean ζp.

Proof. Since im(ϑ) ⊆ Z[ζn] ⊂ Q(ζn), Gw(ϑ) ∈ Z[ζp,ζn] and since gcd(n, p) = 1 we have Z[ζp,ζn] = Z[ζpn]. Hence (a) is clear. Now, observe that

Tr(wy) −1 Trq(wy) −1 Gw(ϑ) = ∑ϑ(y)ζ = ϑ(w) ∑ϑ(w)ζ = ϑ(w) G1(ϑ), proving (b). This means that we can restrict our attention to the case when w = 1, the general case being easily deduced from this special case. Also, (14.7) ∑ ϑ(a) = 0, ϑ 6= 1, a∈Fq which can be seen by an argument similar to the corresponding for quadratic Gauss sums. Indeed, assume ϑ(x) 6= 0,1. Then ϑ(x)∑ϑ(a) = ∑ϑ(xa) = ∑ϑ(a) =⇒ (ϑ(x) − 1)∑ϑ(a) = 0 =⇒ ∑ϑ(a) = 0. a a a a a

Now, assume that ϑ1,ϑ2 6= 1 and use the notation G(ϑ) := G1(ϑ). Then

Trq(x+y) Trq(z) G(ϑ1)G(ϑ2) = ∑ ϑ1(x)ϑ2(y)ζ = ∑ϑ1(x)ϑ2(z − x)ζ , x,y∈Fq x,z where we have put z = x + y. We split the last sum into two:

Trq(z) Trq(z) ∑ϑ1(x)ϑ2(z − x)ζ = ∑ ϑ1(x)ϑ2(z − x)ζ + ∑ϑ1(x)ϑ2(−x). x,z x,(z6=0) x 81

In the first of these sum we introduce t := xz−1 and get

Trq(z) Trq(z) ∑ ϑ1(x)ϑ2(z − x)ζ = ∑ ϑ1(t)ϑ1(z)ϑ2(z)ϑ2(1 −t)ζ x,(z6=0) t,(z6=0)

Trq(z) = ∑ ϑ1(z)ϑ2(z)ζ ∑ϑ1(t)ϑ2(1 −t) = G(ϑ1ϑ2)J(ϑ1,ϑ2). z6=0 t −1 As for the sum ∑x ϑ1(x)ϑ2(−x), we first assume that ϑ1 6= ϑ2 : ∑ϑ1(x)ϑ2(−x) = ϑ2(−1)∑ϑ1(x)ϑ2(x) = ϑ2(−1)∑(ϑ1 · ϑ2)(x) = 0, x x x −1 where the last equality follows by (14.7). This proves (c). When ϑ1 = ϑ2 we instead get −1 ∑ϑ1(x)ϑ2(−x) = ϑ2(−1)∑ϑ1(x)ϑ1 (x) = (q − 1)ϑ(−1). x x

The map Fq \{1} → Fq \ {−1} : t 7→ t/(1 −t) is a set-bijection, so J(ϑ,ϑ −1) = ∑ ϑ(a)ϑ(1 − a)−1 = ∑ ϑ(a/(1 − a)) = ∑ ϑ(b) = −ϑ(−1). a∈Fq a6=1 b6=−1 −1 −1 This means that if ϑ1 = ϑ2 , then G(1)J(ϑ,ϑ ) = ϑ(−1) since G1(1) = −1. Hence G(ϑ)G(ϑ −1) = qϑ(−1), and this proves (e). Also, G(ϑ) = ∑ϑ(y)ζ¯Trq(wy) = ∑ϑ(y)−1ζ −Trq(wy) = ∑ϑ(y)−1ζ Trq(w(−y)) y u y = ∑ϑ(−1)−1ϑ(−y)−1ζ Trq(w(−y)) = ϑ(−1)−1G(ϑ −1). y Since ϑ(−1) = ϑ(−1)−1 = ±1, (d) follows. Combining (d) and (e) gives (f), and the proof is finished.  Using property (c) above we get: Corollary 14.18. If n is the smallest integer such that ϑ n = 1, then ` ` 2 `−1 G(ϑ) = G(ϑ )J(ϑ,ϑ)J(ϑ,ϑ )···J(ϑ,ϑ) ∈ Z[ζp,ζn], for 1 ≤ ` ≤ n − 1, and, for ` = n, we get n 2 n−2 G(ϑ) = qϑ(−1)J(ϑ,ϑ)J(ϑ,ϑ )···J(ϑ,ϑ) ∈ Z[ζn]. Notice the difference in rings. 14.4. Cubic reciprocity. We will now state and prove Eisenstein’s law of cubic reci- procity. The right context to do this is in the cyclotomic extension Q(ζ3)/Q√, where ζ3 is a primitive third root of unity. As ζ3, is a third root of unity, it is √(−1 ± −3)/2 so we could, in fact√ simply consider this in the quadratic extension Q( −3)/Q. Indeed, Q(ζ3)/Q = Q( −3)/Q. However, the correct context for more general reciprocity laws more often than not involves cyclotomic extensions (at least from the historical point of view, e.g., Kummer) so using the language of cyclotomy seems more appropriate.

From now on in this section we will simply write ζ for ζ3. The ring of integers in oQ(ζ) is Z[ζ] = Z1 ⊕ Zζ ⊕ Zζ 2 with relation ζ 2 + ζ + 1 = 0. Here we write Eis for Z[ζ]. This is called the Eisenstein ring. 82 D. LARSSON

Proposition 14.19. The norm Nrm := NrmQ(ζ) on Q(ζ) is Nrm(a + bζ) = a2 − ab + b2. If a + bζ is a prime element (i.e., generating a prime ideal) then N(ha + bζi) = Nrm(a + bζ). The Eisenstein ring with the induced norm is a Euclidean domain and hence a PID and a UFD. The units in the Eisenstein ring Eis are {±1,±ζ,±ζ 2}.

Proof.  We introduce the cubic residue symbol as follows:   N(p)−1 Q(ζ)/Q;a a 3 ≡ (modp) for p - a. p 3 We will simplify notation and write  (ζ)/ ;a  bape := Q Q . p 3 As before we have Fermat’s little theorem: aN(p)−1 ≡ 1 (modp).

Proposition 14.20. For p - a, N(p) 6= 3 we have that  N(p)−1 i ba pe ≡ a 3 ≡ ζ (modp), a ∈ o for some unique i (depending on a, of course).

Proof. This follows since Q(ζ) includes all third roots of unity (by definition) and so the N(p)−1 equation z3 − 1 = 0 have three distinct solutions. The element a 3 is one of these so

N(p)−1 N(p)−1 N(p)−1 N(p)−1 2 a − 1 = (a 3 − 1)(a 3 − ζ)(a 3 − ζ ). This implies, since p is prime, that p divides exactly one of these factors, and so the result follows.  We also have the following propostion. Proposition 14.21. The cubic residue symbol satisfies the identities: (i) bape = bape2 = ba2pe; (ii) bape = ba¯p¯e Except for the second equality in (i), these relations are only valid in Eis. Proof. (i) follows since bape ∈ {1,ζ,ζ 2} and each element in this set is squared equal to N(p)−1 its conjugate. For the case (ii) notice that taking the conjugate of the congruence a 3 ≡  N(p)−1  N(p¯)−1 N(p)−1 ba pe (modp), we geta ¯ 3 ≡ ba pe (modp ¯). On the other hand,a ¯ 3 = a¯ 3 ≡      ba¯ p¯e (modp ¯), so ba¯ p¯e ≡ ba pe (modp ¯), which shows that ba¯ p¯e = ba pe.  Proposition 14.22. For every rational prime q ≡ 2 (mod3) we have ba2qe = ba¯qe and  bx qe = 1 for all x ∈ Z. Proof. Sinceq ¯ = q we get from the above propostion that ba¯qe = ba¯q¯e = baqe = 2    2   ba qe. Also, bx qe = bx qe = bx qe so, since bx qe= 6 0, we have bx qe = 1.  83

N(q)−1 Notice that N(q) = q2 so if q ≡ 2 (mod3) then q2 ≡ 1 (mod3), hence x 3 is well- defined. Every prime element π has five associates, namely, −π,±πζ,±ζ 2. Therefore we need a way to pick out precisely one element in this set. To this end we introduce the following terminology.

Definition 14.6. A prime element π ∈ Eis is called primary if π ≡ 2 (mod(1 − ζ)2).

Proposition 14.23. Suppose that π ∈ Eis is a prime element with N(π) = p ≡ 1 (mod3), then there is exactly one primary element in the set {±π,±πζ,±πζ 2}.

In this form the proposition is a bit technical to prove. However, by strengthening the concept of primary primes as follows the proposition becomes an easy check. The prime element π is primary if and only if π ≡ 2 (mod3). This means that if π = a + bζ then π is primary if and only if 3 | a − 2 and 3 | b. If you like you can assume in what follows that primary means this stronger condition.

Theorem 14.24. Let ϑ be a character of order three on Fp, where p ≡ 1 (mod3) is prime, i.e., ϑ 3 = 1. Then

J(ϑ,ϑ) = a + bζ, and G(ϑ)3 = p · J(ϑ,ϑ), where p = a2 −ab+b2, a ≡ −1 (mod3) and b ≡ 0 (mod3). In particular, if ϑ = b·pe for some p = hπi ∈ Spec(Eis) with π non-primary, then

J(ϑ,ϑ) = π, and G(ϑ)3 = π2π¯. Eis = o This theorem tells us how the Gauss and Jacobi sums decompose in Q(ζ3).

Proof. From Theorem 14.17 and its corollary we know that G(ϑ)G(ϑ) = p and G(ϑ)3 = √ ϑ(−1) · p · J(ϑ,ϑ). This shows that |J(ϑ,ϑ)| = p−1|G(ϑ)|3 = p. Note that ϑ(−1) = ϑ((−1)3) = ϑ(−1)3 = 1(−1) = 1 and G(ϑ)3 ≡ ∑ϑ 3(a)ζ 3a = ∑ ζ a = −1 (mod3). a a6=0 Therefore, since p ≡ 1 (mod3) we get

J(ϑ,ϑ) ≡ p · J(ϑ,ϑ) = ϑ(−1)p · J(ϑ,ϑ) = G(ϑ)3 ≡ −1 (mod3).

Now, consider the sum p−1 k Rk = ∑ a . a=1 Multiplying this with (p − 1)k we get

p−1 p−1 k k k (p − 1) Rk = ∑ (a(p − 1)) ≡ ∑ a = Rk (mod p), a=1 a=1  hence modulo p, Rk = 0. Assume now that ϑ = b· πe. From definition we have  N(π)−1 ba πe ≡ a 3 (modπ) 84 D. LARSSON and so     N(π)−1 N(π)−1 J(b· πe,b· πe) = ∑ ba πeb1 − a πe ≡ ∑ a 3 (1 − a) 3 a∈Fp a∈Fp N(π)−1 = ∑ (a(1 − a)) 3 ≡ 0 (modπ). a∈Fp The rest follows easily thus finishing the proof.  14.4.1. More on Galois theory of number fields. Let L/K be a finite extension of a number field K/Q and let p ∈ Spec(oK). Pick P ∈ Spec(oL) such that P | p. Recall the notation Fp := oK/p. Then FP/Fp is a cyclic Galois extension. Put ` := #Fp. The Frobenius is ` Frob`(a) := a , a ∈ FP, and this generates Gal(FP/Fp). There is a surjection GP  Gal(FP/Fp) whose kernel IP is called the inertia group of P. This group is the set of all elements in GP mapping to the identity in Gal(FP/Fp). ` Given σ ∈ Gal(FP/Fp) such that σ(a) ≡ a (modP) we can form the non-trivial coset σIP in GP. Notice that σ 6∈ IP since, modulo P, σ is not the identity. Any element of this coset induces the Frobenius modulo P, so we refer to this entire coset as the Frobenius  L/K   L/K  symbol and denote it . If I = {1}, then G =∼ Gal( / ) so is unique P P P FP Fp P in GP. −1 Given another prime Q | p we know that GQ = αGPα for some α ∈ Gal(L/K). The same applies to IQ and so  L/K   L/K  = α α−1. P P We note p is totally split ⇐⇒ GP = {1} for all P | p.  L/K  Since G /I =∼ Gal( / ), this being a cyclic group and maps to the gener- P P FP Fp P  L/K   L/K  ator of Gal( / ), is a generator for G /I . So = 1 if and only if p is FP Fp P P P P totally split. If L/K is abelian and IP = {1} all this is independent on the primes above p,  L/K  so we can form the unique Artin symbol, . What happens for quadratic fields? p

15. ARITHMETICAND GEOMETRY This will be a very brief and sketchy introduction to geometric aspects of number theory, and in particular elliptic curves. Throughout, K will denote any field. 15.1. Affine n-space. In algebraic geometry the space Kn = K × K × ··· × K (n times) is n n called the affine n-space over K and is denoted AK or simply A if the field K is under- stood. The reason for this (seemingly redundant) notation, An rather than Kn, has to do with the fact that algebraic geometers don’t consider the affine space with the Euclidean topol- ogy. Instead they use the larger (i.e., larger opens) and less “precise” (non-Hausdorff22)

22A topological space (i.e., a set with a collection of “open” subsets), is said to be Hausdorff (after Felix Hausdorff (1868–1942)) if any two points can be separated with open neighborhoods, in the sense that there is an open set around each point such that these sets has empty intersection. 85

(a) F(x,y,z) = x3y + y3z + z3x (b) F(x,y,z) = x4y + xz3 + y2xz2 + 8

FIGURE 2. Algebraic surfaces. topology called Zariski topology (after Oscar Zariski, 1899–1986), or maybe the etale´ topology which is in a sense more suited for number theory (but much more complicated n to define). Anyway, points in A are given in the normal way p = (p1,..., pn), pi ∈ K. The subject of algebraic geometry has one of its roots in number theory (Diophantine equations) where wanting to solve (systems of) polynomial equations

F1(z) = F2(z) = ··· = Fm(z) = 0, z := (z1,...zn), one is led to consider the solutions as geometric objects. Let us first consider the case of one equation F(z) = 0. The set of solutions, or zeros, to this equation is then n Z(F) := {p = (p1,..., pn) ∈ AKalg | F(p1,..., pn) = 0}. n F( ) = n a z a ∈ K This defines a hypersurface in AKalg . A special case is when z ∑i=1 i i, i , i.e., a linear form over K, in which case we get a hyperplane. This you all know from linear algebra. The case of general hypersurfaces in A3 is encountered in multivariable calculus or beginning differential geometry. n Notice that we require that the set of solutions have coordinates in AKalg . The reason for this can be guessed from the Fundamental Theorem of Algebra: “any polynomial equation have a solution”. The fact that this holds for multivariate polynomials is a consequence of the classical Hilbert’s Nullstellensatz (Hilbert’s zeroset theorem)23. But this theorem holds only for algebraically closed fields like C (which is the algebraic closure of R). However, in number theory (among other areas of mathematics) it is desirable to have solutions in subfields (or rings) of Kalg so this should somehow be included in the picture. Fear not, it is: the name is K-rational points (see below). I should be very careful and explicitly indicate, at every possible moment, that geometry is always24 done in (Kalg)n (see Remark 15.1); number theory, on the other hand, is not so interesting in algebraically closed fields. For number-theoretical purposes, algebraic closure is only a technical tool, not a thing to study in itself.

23David Hilbert (1862–1943). 24’Aaaahhhh’, I hear someone say.... Is this really true? No. 86 D. LARSSON

Now, what if we consider a system of equations F := {Fi(z)} = 0 as above? Well, since every Fi gives a hypersurface and we want the simultaneous zeros we simply take the intersection \ Z(F) = Z(Fi). i Hence the zero set is the intersection of a number of hypersurfaces. Zero sets of systems of polynomials is a special case of something called an algebraic (reducible) variety, which in turn is a special case of something called a scheme, which is a special case of something called a stack, which is a special case of... 25

15.2. Projective n-space. We now want to generalize affine space by adding “points at infinity”. The idea is that every pair of lines in affine space should intersect somewhere. Since parallel lines doesn’t one introduces a “point at infinity” for every direction (e.g., parallel lines have the same direction) where parallel lines intersect. There are two ways of doing this, one algebraic and one geometric. We give both since they complement each other very nicely.

n+1 Algebraic definition. Define an equivalence relation ’∼’ on AK by ∗ (p0,..., pn) ∼ (q0,...,qn) ⇐⇒ ∃α ∈ K , such that (p0,..., pn) = (αq0,...,αqn). We now define the projective n-space over K as the following set n n+1  PK := AK \{0} ∼ (notice the superscripts, indicating the dimensions). This means that two points in An+1 are equivalent if they lie on the same line through the origin, so Pn could be said to be the set of lines in An+1 through the origin. Of special interest for us are P1 = {lines through (0,0) in A2} and P2 = {lines through (0,0,0) in A3}. n+1 The equivalence class of (p0, p1,..., pn) ∈ A is denoted [p0 : p1 : ··· : pn], and are called homogeneous coordinates, to keep separate the notations for points in An+1 and Pn. Hence,

[p0 : p1 : ··· : pn] = [α p0 : α p1 : ··· : α pn], α 6= 0. It is not so easy to see from this definition what the “points at infinity” are. But this will become clear with the geometric definition.

Geometric definition. Observe from the above definition that P1 actually is the set of di- rections in A2. Similarly, Pn is the set of directions in An+1. The “geometric definition” is recursive in nature and goes like: Pn := An ∪ Pn−1. The points Pn−1 are then the “points at infinity”, one point for each direction in An. In particular, P2 = A2 ∪ P1 = {(a,b) ∈ A2} ∪ {the set of all lines through (0,0) in A2}. Passing from one description to the other is done according to:

25Actually this is far from a saturated chain of generalizations, but I wouldn’t want to spoil the fun of revealing everything, not leaving anything for future teachers. 87

Algebra Geometry 2 a b 2 P 3 [a : b : c], c 6= 0 −→ ( c , c ) ∈ A , P2 3 [a : b : c], c = 0 −→ [a : b] ∈ P1, P2 3 [a : b : 1] ←− (a,b) ∈ A2, P2 3 [a : b : 0] ←− [a : b] ∈ P1 So, in the above table, the case when c = 0 correspond to points at infinity. Clearly there is nothing special of our choice of c as being zero or non-zero, we could equally well have taken a or b. These choices correspond to different “affine coverings” of P2. We will see the some examples on how to handle projective geometry in the next section. 15.3. Algebraic curves. An algebraic curve is a one-dimensional algebraic variety, most often considered embedded (i.e., as being a subset of) some projective space Pn. However, we are only going to consider the “plane algebraic curves”, i.e., curves in P2 (or A2). We start with the affine case. Affine curves. The definition is simple and follows here: Definition 15.1. A (plane) affine algebraic curve over K is the zero set of a binary poly- nomial F(x,y) ∈ K[x,y], i.e., 2 Z(F) = {(a,b) ∈ AKalg | F(a,b) = 0}. We denote an algebraic curve by C or CF if we want to show off the polynomial also. To indicate that the defining polynomial have coefficients in K, we write C /K or CF /K, and say that the curve is defined over K. If only C appear, it means that the ground fields is irrelevant. (See Figure 3.) I want to be clear, once again, that for doing geometry we need to take the algebraic closure Kalg of K, although the coefficients of the defining polynomial belong to K. (See Remark 15.1) I now list a series of definitions and notions that are important to be aware of (the items marked ’t’, means that you can skip this while not impairing your understanding of the subsequent material one bit). • The degree of C is the degree of F as a polynomial. It is also very often the case that curves have other special names refering to their degree: – A quadric is an algebraic curve of degree two; – a cubic is an algebraic curve of degree three; – a quartic is an algebraic curve of degree four, and – then we have quintics, sextics, septics, octics, nonics,... etc. • A point p ∈ C is called singular if ∂F ∂F (p) = (p) = 0. ∂x ∂y If p is not singular it is called smooth or regular. The tangent space (or tangent line as actually is the case for curves), TpC , of C at a smooth p = (a,b) is the line ∂F ∂F TpC := (p)(x − a) + (p)(y − b) = 0. ∂x ∂y In Figure 3, curves (b,c,d,e) are singular. • If every point on C is smooth, C is called smooth or non-singular. • For a smooth algebraic curve, the genus, g(C ) is defined as (d − 1)(d − 2) g(C ) := , where d := degree of C . 2 88 D. LARSSON

(a) F(x,y) = x3 + y3 − 1 (Fermat curve) (b) F(x,y) = x3 − y4 − 2x2 + y2

(c) F(x,y) = x13 + y2 − x11 − x9 − x2 − x − 4 (d) F(x,y) = x5 + y3x2 − y2 + 7xy

(e) F(x,y) = y2 − x7 + x3 + xy (f) F(x,y) = x3 + y3 − x2 − x − y

FIGURE 3. The real part of algebraic curves C /Q. 89

It is significantly harder to compute the genus for singular curves26. • The curve is called irreducible if it only has one component, otherwise it is re- ducible. In Figure 3, the real part decomposes into irreducible components for curves (b,e,f). t The ideal I(C ) := { f ∈ K[x,y] | f (a,b) = 0, for all (a,b) ∈ C } is the vanishing ideal of C . If C is irreducible, then I(C ) = hFi, i.e., I(C ) is the ideal generated by the defining polynomial of C (which in this case is equivalent to F being irreducible implying that hFi, and thus also I(C ), is a prime ideal). The ring, K[C ] := K[x,y]/I(C ) = K[x,y]/hFi is called the coordinate ring of C . You can view it as the ring of functions on C . t A smooth point p ∈ C is called an inflection point or flex if the intersection of the tangent line at p and C has multiplicity strictly higher than two. The geomet- ric picture is that the tangent and the curve has “more contact” than an ordinary tangent (think of the curve y = x3 at the origin). • A point p = (a,b) ∈ C is called rational if a,b ∈ Q and integral if a,b ∈ Z. More generally, for a general field L ⊇ K, the set of points (a,b) ∈ L × L lying on the curve C , i.e., solving F(x,y) = 0, is denoted C (L), and called the L-rational points on L or simply L-points.

Projective curves. For the case of projective curves one has to be a little bit more careful (but not much). Recall that a point p ∈ P2 is given as [a : b : c] with abc 6= 0. But this is only determined up to a multiple so [αa : αb : αc] = [a : b : c] for α 6= 0. So if we want to speak about zeros of polynomial we have to take this into account somehow. The way to do this is to only consider homogeneous polynomials, i.e., polynomials in which each monomial term has the same degree. For example, (and we now write equations with capital letters as is customary in projective geometry): F(X,Y,Z) = Z3X + X2Y 2 + X4 − 2XY 3, is a homogeneous polynomial of degree four. In general a homogeneous polynomial of degree n can be written: i j k F(X,Y,Z) = ∑ fi jkX Y Z , fi jk ∈ K. i+ j+k=n Suppose that [a : b : c] is a zero of this polynomial, i.e., F(a,b,c) = ∑ aib jck = 0. i+ j+k=n However, the ambiguity of the point [a : b : c] (since it represents a whole line) forces [αa : αb : αc], α 6= 0, to be a zero also. Here then comes the reason (or one of them) for

26The geometric motivation for the genus is that it measures how many “holes” the curve has. It is hard to get a feeling for this from the above. To really understand this one has to represent the algebebraic curve analytically as a compact Riemann surface and then the genus is a topological invariant on this surface corresponding to the number of holes it has (i.e., how many adjoined donuts it consists of). 90 D. LARSSON using homogeneous coordinates:

i j k i+ j+k i j k F(αa,αb,αc) = ∑ fi jk(αa) (αb) (αc) = ∑ fi jkα a b c = i+ j+k=n i+ j+k=n n i j k n = α ∑ fi jka b c = α F(a,b,c) = 0. i+ j+k Now we can define: Definition 15.2. A projective (plane) algebraic curve over K is the zero set (which we now know is well-defined) of a homogeneous polynomial. Spelled out, suppose F(X,Y,Z) ∈ K[X,Y,Z] is homogeneous, then Z(F) = {[a : b : c] ∈ P2 | F(a,b,c) = 0} ⊂ P2. Notice that the number of variables is three even though we speak of a plane curve which intuititively should be dependent on two variables. In fact, normally when one speaks about “algebraic curves” it is most often implied to mean projective algebraic curves. There is a way to move between projective and affine curves called “homogenization” and “dehomogenization” depending on the direction. Hopefully the following table will clarify things. Affine/Dehomogenized Projective/Homogenized n i j n i j n−i− j ∑i, j fi jx y −−−−→ ∑i, j fi jX Y Z mult. Z

i j i j k ∑i+ j+k=n fi jkx y ←−− ∑i+ j+k=n fi jkX Y Z Z=1 The homogenized version C¯ of an affine curve C is often called the projective closure of C . The reason is that C¯ is C with added points at infinity, making the curve “compact” in a topological sense. The above definitions apply equally to projective curves with one exception: singulari- ties. So we define: Definition 15.3. A point p = [a : b : c] ∈ C ⊂ P2 is called singular if, after dehomogeniz- ing, the point is singular in the affine sense. The point is called smooth or regular if it is not singular. The set of all singular points (and this applies to affine curves also) is called the singular locus of C , denoted Sing(C ). It is a theorem that the Sing(C ) is finite for all curves. At this point it is maybe nice to refer back to Figure 3 with some additional remarks. So, look back at Figure 3 and compare with the table of data below (that I computed using Maple): C g(C ) Sing(C ) Singular data (a) 1 - (-,-,-) (b) 2 [0 : 0 : 1] (2,1,2) (c) 6 [0 : 1 : 0] (11,60,1) (d) 4 [0 : 0 : 1], [0 : 1 : 0] (2,1,2), (2,1,1) (e) 2 [0 : 0 : 1], [0 : 1 : 0] (2,1,2), (5,12,1) (f) 1 - (-,-,-) The singular data appearing above are invariants associated with the singular point: (m,δ,B), where m is the multiplicity, δ is the so-called δ-value (which is kind of hard to explain), 91 and B is the number of branches at the singular point (i.e., then number of “arms” of the curve that cross in that point). You can check that the above numbers agree with the for- mula for the genus in the relavant cases. Notice that (c) has a single singularity at infinity, while (d) and (e) has one (affine) sin- gularity at origo and one at infinity. Curve (b) has only one singularity (at origo). Observe also that the data (2,1,2) means that the singularity is a double point. Curves (c,e), the point at infinity is an inflection point. Looking at the curves it would appear as if (d,e) has inflection points at origo, but this is only a result of fooled by a drawing.

15.4. Cubic and elliptic curves. A cubic (plane, as always) curve is simply a curve of degree three. Let us first treat the affine case. So, by an affine cubic curve we mean a curve (over K) with defining equation 3 2 2 2 2 3 a1x + a2x + a3x + a4xy + a5x y + a6y x + a7y + a8y + a9y + a10 = 0, ai ∈ K. The amazing thing is now that every such curve can be written as (or in algebraic-geometric lingo, is “birationally equivalent” to) a curve with definining equation (15.1a) C(x,y) := y2 − x3 − ax2 − bx − c = 0, a,b,c ∈ K, or even 2 3 (15.1b) CW (x,y) := y − 4x + g2x + g3 = 0, g2,g3 ∈ K. Either of these is said to be put in Weierstrass normal form27 (although, in honesty, I think only the second one could shamelessly be referred to as a Weierstrass normal form). Over the algebraic closure Kalg of K, the polynomial f (x) := x3 + ax2 + bx + c can be completely factorized as

f (x) = (x − α1)(x − α2)(x − α3). 28 When the αi’s are all distinct the cubic curve (15.1a) is called an elliptic curve . (See Figures 3(a,f), and 4.) We denote elliptic curves by E and arbitrary cubics by C . That the zeros of f (x) are distinct is equivalent to E being non-singular. This can be seen as follows. Recall that a point p on an arbitrary algebraic curve is singular if and only if ∂xF(p) = 2 ∂yF(p) = 0, where F is the defining polynomial of the curve. In our case F(x,y) = y − f (x). So for p ∈ E to be singular we need that 0 ∂xF = − f (x) = 0 and ∂yF = 2y = 0. This implies immediately that y = 0 and so any singular point is situated on the x-axis. If there were an a such that f (a) = f 0(a) = 0, this would mean that f (x) had a as a multiple zero, and so the zeros can not be distinct in this case. Conversly, if f (x) has multiple zeros 0 we get that f (a) = f (a) = 0 for some a and so y = 0 implying that ∂yF(a,0) = 0. So this was the affine case. The projective case is easily deduced from this by homogenizing. Therefore, a projective cubic curve is the zero set of a homogeneous polynomial: (15.2) C(X,Y,Z) := Y 2Z − X3 − aX2Z − bXZ2 − cZ3, a,b,c ∈ K. This defines a projective elliptic curve if its dehomogenized version (with respect to Z) is an affine elliptic curve.

27Karl Theodor Wilhelm Weierstrass (1815–1897). 28The name stems from the fact that these curves appear (via “elliptic functions”) when one tries to compute the arc-length of ellipses. Elliptic curves are not ellipses themselves. 92 D. LARSSON

(a) F(x,y) = y2 − x3 (b) F(x,y) = y2 − x3 − x2 + x

(c) F(x,y) = y2 − x3 + x2 (d) F(x,y) = y2 − x3 + 3x − 3

FIGURE 4. Cubic curves.

Remark 15.1. At this point I will (as I did before) admit to one fault: I am being a little bit sloppy concerning the ground field as some observant reader might have seen. Even though the equation of the curve is defined over K, the geometry takes place over the algebraic closure Kalg of K. Otherwise it could happen that the curve is the empty set! This is the reason for introducing the notion of K-rational points, to still be able to query whether a curve has points over K. 15.5. The group structure of an elliptic curve.

Pell’s equation again. Recall the Pell equation, i.e., what are the integer solutions to x2 − dy2 = 1, d square-free.

Now we see that we have come almost full circle: finding√ integer solutions to this equation is the same as finding√ those elements in the subring Z[ d] (recall, not necessarily the ring of integers) of Q( d) having norm one. But even more concerning this equation was true: the integer solutions formed a group! The rule was given by

(x1,y1) ∗ (x2,y2) = (x1x2 + dy1y2,x1y2 + y1x2). 93

(a) Addition of P + Q (b) Addition of P with itself: P + P

FIGURE 5. Group law (the grey area at the top is meant to symbolize the point at infinity, O).

2 2 Recall that the proof√ relied√ on the fact that√ we could factor x − dy (non-uniquely in general) as (x − dy)(x + dy) in the ring [ d] ⊆ o √ . Z Q( d) This means that we, given a point with integer coordinates on the curve defined by 2 2 CPell : x − dy − 1, could find all other such points via the above group structure. The fact that an integer solution exists at all is due to Lagrange (recall Lagrange’s theorem on quadratic forms!), with a simpler proof given by Dirichlet in the 1840’s. Notice also that the curve CPell defined by the Pell equation is a quadric over Z, so the arithmetic associated with the integer points on CPell is structured as an infinite cyclic (abelian) group, that is, isomorphic as a group to Z. We will now show that the same is true for elliptic curves, although the underlying group is significantly more complicated (and much is still unknown).

15.6. The group law. We will begin by showing the geometry behind the group law, i.e., how the group law is defined geometrically. Then we will derive actual formulas. Consider the generic elliptic curve in P2: Y 2Z = X3 + aX2Z + bXZ2 + cZ3. Putting Z = 0 yields the points at infinity, i.e., X3 = 0. Hence, the points at infinity for an elliptic curve is actually a point at infinity, namely [0 : 1 : 0] with multiplicity three. Thinking affinely, this can be thought of as a triple point at infinity along the y-axis. Put O := [0 : 1 : 0]. This will be the identity element for the group law. This implies that every line in P2 will intersect E in exactly three points: for instance, any line parallel to the y-axis will intersect E at infinity three times since this is a triple point. Assume now that we have two rational points, P and Q. We define their (group) sum as follows. Let L be the line in P2 going through P and Q. Recall that for our purposes here, P2 ∩ E can simply be thought of as the affine curve together with O. The line L intersect E in three points: P, Q and a third which we denote P∗Q. Any elliptic curve is symmetric around the x-axis which is obvious since there is only one appearance of y and this is y2. This means that we can reflect P ∗ Q with respect to the x-axis to get a point P + Q, which we define to be the (group) sum of P and Q. If we don’t want to rely on the geometric 94 D. LARSSON picture so much the above reflection can be given as taking the line L0 connecting P ∗ Q and O, and then the third intersection point is P + Q.

Theorem 15.1. The curve E /Q is endowed with an abelian group structure, E (C), with the above defined addition and O the identity element. Furthermore, the rational points (or Q-points), E (Q), form a subgroup of E (C). In fact, restricting to any field extension K/Q yields a group structure on E (K). The proof of this theorem, i.e., the proof that this is actually a group, is too complicated to give here, the hard part being the associativity. It is possible to “prove” this geometrically but it would take up to much of my precious space29. However, that O is the identity and what the inverses are, are rather simple to address, so this I will do. First, P + O = P: draw a line between P and O. The third intersection point is PO; connecting this with a line through O clearly gives back P. As for the inverse, we use the following construction: take the tangent line L at O and let the third intersection point (L has two intersections at O) be T. Then the third intersection of the line connecting PT is defined to be −P. By construction then P+(−P) = O since L meets O twice. Notice that these constructions do not actually use that O is a point at infinity. This is no accident but to go into it would take us to far. Let us now derive explicit formulas for the group law. Explicit formulas. This derivation is surprisingly simple. Suppose that P = (p1, p2) and Q = (q1,q2) and that P 6= Q. We want to find P + Q. Observe that it is sufficient to find PQ = (l1,l2) since P + Q = (l1,−l2). Now, a line L 2 through (p1, p2), (q1,q2) in A is given by the equation (as you all know from somewhere way back in your dark ages) q2 − p2 q2 − p2 q2 − p2 y − p2 = (x − p1), or y = x + p2 − p1. q1 − p1 q1 − p1 q1 − p1 Plugging this into the affine equation for E in Weierstrass form, where we put k := q2−p2 q1−p1 q2−p2 and m := p1, we get q1−p1 y2 = (kx + m)2 = x3 + ax2 + bx + c. Expanding this we get, x3 + (a − k2)x2 + (b − 2km)x + (c − m2) = 0. This is a cubic equation in one variable so it gives us three zeros. These zeros are the x-coordinates of the intersection of the line L and E . So, 2 2 a − k = −(α1 + α2 + α3), b − 2km = (α1α2 + α1α3 + α2α3), c − m = −α1α2α3, 3 2 2 2 where α1,α2,α3 are the zeros of the polynomial x + (a − k )x + (b − 2km)x + (c − m ). Since α1 and α2 (say) are known (these are the already given x-coordinates) we find the last, α3, as (for example) 2 2 α3 = k − a − p1 − q1, and so y = k(k − a − 2p1 − q1) + p2.

29See the book by Joe Silverman and , Rational points on cubic curves, Springer-Verlag, 1994, for instance (I follow their presentation to some extent). This is a very good book (although, as many American books, a bit talkative, you know, avoiding coming to the point until it is impossible to come up with anything ¨ more to say `). 95

Finally, then

 2 2   q2 − p2 (15.3) P + Q = k − a − p1 − q1,−k k − a − q1 + p2 , where k = . q1 − p1 So what happens if P = Q? Clearly the above formulas don’t work in this case since we get a zero denominator. Therefore, you need to take the tangent at P, this giving a line through dy f 0(x) P and P. In this case the k-value is dx = 2y and from this we continue as before. Let us introduce the convenient notation x(P),y(P) for the x, y-coordinates, respectively, of P. Then, from the above we get the following useful formula, sometimes called the duplica- tion formula, for x(2P): 1 x(P)4 − 2bx(P)2 − 8cx(P) + b2 − 4ac (15.4) x(2P) = . 4 x(P)3 + ax(P)2 + bx(P) + c We now use the above to investigate some group-theoretical, and thus arithmetical, conse- quences of what has just been done.

15.7. Points of finite order. Let G be a group. Recall that an element g ∈ G is said to have order n if the cyclic group it generates has order n, or (with abelian notation): ng = g + g + ··· + g = O. | {z } n times We now want to investigate when a point P ∈ E has finite order. Let us start with low orders first and treat the general case in the next section. The case of order two is simple. Suppsose P is such that 2P = O, or equivalently, P = −P. In coordinate form this amounts to −(p1, p2) = (p1,−p2), implying that p2 = 0. Hence there are exactly three non-zero points of order two on any elliptic curve, namely

P1(2) = (α1,0), P2(2) = (α2,0), and P3(2) = (α3,0), 3 2 where α1,α2,α3 are the zeros of the polynomial x + ax + bx + c (recall that for elliptic curves this polynomial has distinct zeros), and where the notation Pi(2) is meant to indicate that the order is two. Clearly, 2O = O so this also has order two, giving the complete set of order two elements in E (C) as 2 := {O,P1(2),P2(2),P3(2)}. As the set of all elements of a given order in a group is a subgroup, the set 2 is a subgroup of E (K). Which group it is dependson which field K we are looking for points in; namely, if all the αi’s are distinct or not in this fiels K. In the case E (C) it is easy to see that 2 = Z/2Z × Z/2Z, since adding two non-zeros points gives the third. Let us now turn to the case of order three. This is more involved. If 3P = O then 2P = −P and so x(2P) = x(−P) = x(P). On the other hand, if x(2P) = x(P), P 6= O, then 2P = ±P, implying that either P = O or 3P = O. Therefore, the points on E of order three are exactly those satisfying x(2P) = x(P). We now have the following theorem, part one of which we have already proven. Theorem 15.2. Let E be non-singular elliptic curve on the Weierstrass normal form. Then, (a) A point P ∈ E (K) has order two if and only if y(P) = 0. Moreover, E has exactly four points of order dividing two, 2 = {O,P1(2),P2(2),P3(2)}' Z/2Z × Z/2Z (the isomorphism holds only if E (C)), with O the only element of order less than two. 96 D. LARSSON

(b) A point P = (x,y) ∈ E has order three if and only if x is a zero of h(x) = 3x4 + 4ax3 + 6bx2 + 12cx + (4ac − b2). Furthermore, E has exactly nine points of order dividing 3, namely n p p p p o 3 := O,(α1,± f (α1)),(α2,± f (α2)),(α3,± f (α3)),(α4,± f (α4)) ,

where αi are the distinct complex zeros of h(x). (The point at infinity is the only point with order one.) In addition, when E (C), we have 3 ' Z/3Z × Z/3Z. Proof. We have proved part (a) above. For part (b) we use the duplication formula (15.4) of the last section to get the polynomial h(x). In fact, we have seen that we have to have x(2P) = x(P), so (15.4) gives 1 x(P)4 − 2bx(P)2 − 8cx(P) + b2 − 4ac = x(P). 4 x(P)3 + ax(P)2 + bx(P) + c Rewriting this gives h(x). f 0(x)2 Notice that x(2P) = 4 f (x) − a − 2x and so, since x(2P) = x(P), and using the fact that f 00(x) = 6x + 2a, we can rewrite h(x) as h(x) = f 0(x)2 − 2 f (x) f 00(x). In this form it is easy to see that h(x) has four distinct zeros. Indeed, differentiating h(x) yields h0(x) = 2 f 0(x) f 00(x) − 2 f 0(x) f 00(x) − 2 f (x) f (3)(x) = −2 f (x) f (3)(x). However, this is equal to −12 f (x). This shows that if α were a common zero of h and h0 then it would also be a common zero of −12 f (x) and f 0(x)2 − 2 f (x) f 00(x) which would imply that α was a common zero of f (x) and f 0(x), in contradiction with the assumption that E is an elliptic (non-singular) curve. Hence h(x) has four distinct complex zeros, α1,α2,α3,α4. Therefore, the set n p p p p o O,(α1,± f (α1)),(α2,± f (α2)),(α3,± f (α3)),(α4,± f (α4)) must be the set of all points of order three and f (αi) 6= 0, since that would imply that p (αi, f (αi) had order two, which contradicts the assumption that it has order three. The only other point with order dividing three is obviously O. There are only two non-isomorphic abelian groups of order nine: Z/9Z and Z/3Z × Z/3Z. However, 3 is not cyclic so we can rule out Z/9Z. The proof is finished.  15.8. The Nagell–Lutz theorem. We now come to the Nagell–Lutz theorem. Let me first point out that we will not be able to prove this in its full detail since the proof is rather long, albeit elementary (i.e., it does not use any sophisticated techniques beyond what you have already seen here). Before I can state it properly I need to define something that you have already seen but not in this generality, namely the discriminant of a polynomial. Without going into detail on the general definition, let me simply announce that, for a cubic curve in Weierstrass normal form, the discriminant (equivalently, the discriminant of f (x)) is given by 2 2 3 3 2 ∆C := Disc(C ) := a b + 18abc − 4a c − 4b − 27c . We have 97

Theorem 15.3. The discriminant is zero if and only if Disc(C ) is singular (or if and only if f (x) has multiple zeros).

This means that if Disc(C ) 6= 0 then C is an elliptic curve with the definition we have given. Notice that we work almost exclusively with curve having a = 0. Then the discrim- inant becomes the much simpler expression

3 2 ∆C = −4b − 27c . Theorem 15.4 (Nagell–Lutz). Let E be an elliptic curve given in Weierstrass normal form, and let P = (u,v) ∈ E be a rational point of finite order, i.e., P ∈ E (Q) and nP = O for some n > 0. Then P is in fact integral, i.e., u,v ∈ Z, and v is either zero (and so P has order two), or divides the discriminant, v|Disc(E ). This theorem was proved by Trygve Nagell in 1935 and, independently, by Elisabeth´ Lutz in 193730.

Proof. As I said the proof of this is going to be very sketchy. With a bit of ardour and determination, you should be able to fill in most of the missing details. 

15.9. Mordell’s Theorem and Conjecture.

Theorem 15.5 (Mordell). Let E /Q be an elliptic curve. Then the group E (Q) of Q- rational points is a finitely generated abelian group. This theorem was proved in 1922 by Louis Mordell (1888–1972) and was subsequently generalized by Andre´ Weil (1906–1998) to so called abelian varieties over number fields. However, an explicit (finite) algorithm that works in general, designed to compute the actual group, is still a subject of much research. On the other hand, with some guile one 31 can compute E (Q) in some special cases. But this is beyond this presentation . Mordell is famous for one more thing, highly relevant to all this, namely the so-called Mordell conjecture. This was proved by Gerd Faltings (born 1954) in 1983 using ex- tremely sophisticated methods from algebraic geometry and number theory. The conjec- ture (now Falting’s theorem) states:

Theorem 15.6. Let C be a smooth algebraic curve. Then, either: • g(C ) = 0 with C (Q) = /0,or |C (Q)| < ∞, or • g(C ) = 1 with C (Q) = /0,or C is an elliptic curve, C (Q) is a finite abelian group (see Barry Mazur’s (born 1937) theorem below) or, by Mordell’s theorem, |C (Q)| = ∞, in which case the group is finitely generated and abelian, or • if g(C ) > 1 then there are at most finitely many rational solutions.

Theorem 15.7 (Mazur). Let E /Q be a curve over Q, and assume that E has a point of finite order. Then the subgroup of E (Q) of elements of finite order is either • isomorphic to Z/nZ, for 1 ≤ n ≤ 10 or n = 12, or • isomorphic to Z/2Z × Z/nZ, for 1 ≤ n ≤ 4. This is a remarkble and deep theorem! Think about what it means.

30Elisabeth´ Lutz (1914–2008). 31See the book by Silverman–Tate from a previous footnote. 98 D. LARSSON

16. GAUSS’CLASS NUMBER PROBLEMANDTHE RIEMANNHYPOTHESES: AHISTORICAL SURVEY 16.1. Gauss’ class number problem. This and the following few sections contains an overview concerning class numbers, quadratic forms and quadratic fields. It turns out (which I didn’t know) that there is a subtle connection with the (generalized) Riemann hypotheses! Gauss32 was a remarkable mathematician33. One indication of this was his extraordi- nary ability to do (hard) computations. For instance, he computed a long list of class num- bers for different negative discriminants of quadratic forms (i.e., positive definite forms). Doing this Gauss saw that for a finite number of the discriminants ∆ the computed class number h(∆) was one. In all other cases h(∆) > 1. He made the conjecture that his list of discriminants, and thus of positive definite quadratic forms, was complete. Gauss was unable to prove this (or didn’t try) and it was not until more than hundred years later that some progress was made. I’ll come back to that in a while.

A few recollections. Let Q : Z × Z → Z be a quadratic form in X, Y given as Q(X,Y) = aX² + bXY + cY². Then the discriminant of Q is defined by Disc(Q) := b² − 4ac. Put Ω := (1 − (−1)^b)/2; then (Disc(Q) − Ω)/4 is an integer. If we make a change of variables
$$(X',Y')^{T} = A\,(X,Y)^{T},\qquad A := \begin{pmatrix}\alpha & \beta\\ \gamma & \delta\end{pmatrix},$$
the discriminant changes to Disc(Q') = det(A)² Disc(Q). Recall that the special linear group SL₂(Z) is defined as
$$\mathrm{SL}_2(\mathbf{Z}) := \left\{\begin{pmatrix}\alpha & \beta\\ \gamma & \delta\end{pmatrix} \;\middle|\; \alpha,\beta,\gamma,\delta \in \mathbf{Z},\ \alpha\delta - \beta\gamma = 1\right\}.$$

Geometrically, this is the set of all invertible integral matrices acting on the integral lattice Z² without changing the area. There is an equivalence relation on the set of all binary quadratic forms given as:
$$Q(X,Y) \simeq Q'(X,Y) \iff \exists A \in \mathrm{SL}_2(\mathbf{Z}) \text{ such that } Q \xrightarrow{\;A\;} Q',$$
meaning that Q transforms to Q' with the indicated change of variables. One can easily show that equivalent forms have the same discriminant and that
if Q ≃ Q', then {Q(x,y) | x, y ∈ Z} = {Q'(x,y) | x, y ∈ Z}.
A quadratic form Q(X,Y) is said to be positive definite if Q(x,y) > 0 for all (x,y) ∈ Z² with (x,y) ≠ (0,0). Note that this means that Disc(Q) < 0. Any positive definite quadratic form Q⁺(X,Y) is equivalent to a unique, reduced, quadratic form, i.e., to a Q(X,Y) = aX² + bXY + cY² such that either
c > a and −a < b ≤ a,   or   c = a and 0 ≤ b ≤ a.
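The reduction conditions make a finite enumeration possible. The Python sketch below (my own illustration; it uses the standard normalization −a < b ≤ a < c, or 0 ≤ b ≤ a = c, which pins down exactly one representative per class) lists the reduced forms of a given negative discriminant; the length of the list is precisely the class number h(∆) introduced next.

# Sketch (illustration only): enumerate reduced positive definite forms aX^2 + bXY + cY^2
# of discriminant disc < 0. Reduction forces b^2 <= a^2 and a <= c, hence 3a^2 <= |disc|,
# so the search is finite.

def reduced_forms(disc):
    assert disc < 0 and disc % 4 in (0, 1)
    forms = []
    a = 1
    while 3 * a * a <= -disc:
        for b in range(-a + 1, a + 1):          # -a < b <= a
            if (b * b - disc) % (4 * a) == 0:   # 4ac = b^2 - disc
                c = (b * b - disc) // (4 * a)
                if c >= a and not (a == c and b < 0):
                    forms.append((a, b, c))
        a += 1
    return forms

for disc in (-3, -4, -20, -23, -163):
    print(disc, len(reduced_forms(disc)), reduced_forms(disc))
# -3, -4 and -163 each give one class; -20 gives 2 classes; -23 gives 3 classes.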

Let ∆ ∈ Z. The class number h_Q(∆) of ∆ is the number of equivalence classes of positive definite quadratic forms of discriminant ∆. We put 'Q' as a subscript in h_Q(∆) to indicate

^32 Johann Carl Friedrich Gauss (1777–1855). ^33 I urge you to look up all the mathematicians I mention, either at Wikipedia or http://www-groups.dcs.st-and.ac.uk/.

that it is the class number associated to quadratic forms. There is (at least) one more which we will see in a few moments. A natural question is then:

Gauss’s class number problem: For which ∆ < 0 is hQ(∆) finite? Or how many equiva- lence classes of quadratic forms are there with a given class number? Gauss himself conjectured that the hQ(∆) is finite for all ∆ < 0 and made a list of those ∆’s with hQ(∆) = 1. He conjectured further that this list was complete for hQ(∆) = 1. We will discuss the “class number one problem” here which found a solution in the 1950’s. To do this we need to change venue so to speak and discuss class numbers of (quadratic) number fields and show that this is somehow equivalent to the class number one problem of Gauss. 16.2. Quadratic fields and forms. Some generalities. Before we can even begin to answer the Gauss class number one prob- lem we have to wallow through some general ideas applicable to all number fields. Un- proved claims will have to be taken on faith here. Proofs can be found in almost any standard textbook on algebraic number theory. Definition 16.1. Let D be an integral domain with field of fractions Frac(D). Then a fractional ideal of Frac(D) is a non-zero, finitely generated D-submodule i ⊆ Frac(D). We denote the set of fractional ideals in Frac(D) by J(Frac(D)). Remark 16.1. This definition is equivalent to the following. A D-submodule a of Frac(D) is a fractional ideal if a = α · j := {x ∈ K | x = αa, a ∈ j} where j is a non-zero ideal of D and α ∈ Frac(D)∗ = Frac(D) \{0}. That the former definition implies the latter is clear. For the other direction, note that it is possible to take a −1 −1 −1 generating set for the fractional ideal as defined above as {a1b ,a2b ,...,anb }, where −1 ai,b ∈ D. Now take α = b and then {a1,a2,...,an} is a generating set for an ideal of D. The element b is a common denominator. Remember that the multiplication of two ideals is defined as i · j := { ∑ i j | i ∈ i, j ∈ j}. finite The same rule applies to fractional ideals, and we have the following proposition. Proposition 16.1. Put K := Frac(D) and let J(K) denote the set of fractional ideals in K = Frac(D). Then J(K) is an abelian group under the multiplication defined above, with identity element h1i = D and inverses defined by a−1 := {x ∈ Frac(D) | xa ⊆ h1i = D}.

Two fractional ideals a and b of an arbitrary algebraic number field K/Q are said to be equivalent, a ∼ b, if there are principal ideals ⟨g₁⟩, ⟨g₂⟩, with g₁, g₂ ∈ o_K, such that

a⟨g₁⟩ = b⟨g₂⟩.

The quotient group J(K)/∼, defined by this equivalence relation, is called the ideal class group of K/Q, denoted Cls(K), and the class of a ⊂ o_K is called the ideal class of a. The number of elements (the order) of Cls(K) is called the class number of K/Q, denoted h_K(∆).

Notice that when h_K(∆) = 1 it follows from the definition that every ideal in o_K is principal.

Recall that for any Z-module M (not necessarily free) and Z-submodule m ⊂ M, we denote by (M : m) the index of m in M, i.e., the number of (left) cosets of m. We will however be interested in the case when M is a free Z-module, implying that m is also free of rank rk(m) ≤ rk(M). In addition, if rk(m) = rk(M) and if f = ωe is a basis for m, where e is a basis for M, then (M : m) = |det(ω)|. The following identity holds for Z-modules n ⊆ m ⊆ M:
(M : n) = (M : m)(m : n)   ("Tower law").
Notice that it is only by visual analogy that I refer to the above as a tower law. The proof of this identity is a simple counting argument using Lagrange's^34 theorem. It holds for all finite groups (see any textbook in abstract algebra).

Let i be an ideal in o_K for K/Q a number field. We define the absolute norm of i as
N(i) := (o_K : i),   N : Id(o_K) → Z⁺,
where Id(o_K) denotes the set of all ideals in o_K. This can be extended to a homomorphism of groups
N : J(K) → Q⁺ := { n/m | n, m ∈ Z⁺ },
and it satisfies

N(a · b) = N(a)N(b),   and   N(⟨a⟩) = |Nrm_{K/Q}(a)|.

Notice that, by the above multiplicative property, we have that N(o_K) = 1 since o_K is the unity in J(K). From this it follows that we can define N(a⁻¹) = N(a)⁻¹. We don't really need to know how to compute N on fractional ideals; that would take us into factorization issues in Dedekind^35 domains.

An element a ∈ K/Q is called totally positive if σ(a) > 0 for all real imbeddings σ : K → R. The set of all totally positive elements forms a subgroup K⁺ ⊆ K* = K \ {0}. A principal (fractional) ideal ⟨a⟩ is called totally positive if a ∈ K⁺. The set of all totally positive principal ideals forms a subgroup P_K⁺ ⊆ P_K of all fractional principal ideals of K/Q. We define
Cls⁺(K) := J(K)/P_K⁺,
the narrow ideal class group. Finally, we notice that the identity Disc({a₁,...,a_n}) = Disc(K/Q)(o_K : a)², where {a₁,...,a_n} is a basis for the fractional ideal a, can be written
Disc({a₁,...,a_n}) = Disc(K/Q) N(a)².

Ideals and forms. Recall that
$$\mathfrak{o}_{\mathbf{Q}(\sqrt d)} = \begin{cases} \mathbf{Z}\cdot 1 \oplus \mathbf{Z}\sqrt d, & \text{if } d \equiv 2,3 \pmod 4;\\ \mathbf{Z}\cdot 1 \oplus \mathbf{Z}\,\tfrac{1+\sqrt d}{2}, & \text{if } d \equiv 1 \pmod 4.\end{cases}$$
Suppose the discriminant of Q(√d) is ∆. Then the above two cases can be expressed uniformly as:
$$\mathfrak{o}_{\mathbf{Q}(\sqrt d)} = \mathbf{Z}\cdot 1 \oplus \mathbf{Z}\,\frac{\Delta + \sqrt{\Delta}}{2}.$$
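As a quick sanity check of this uniform description (the sample values of d and the use of sympy are my own, not part of the notes), one can verify that (∆ + √∆)/2 is an algebraic integer, i.e., that its minimal polynomial has integer coefficients, and that this polynomial has discriminant ∆:

from sympy import sqrt, Rational, Symbol, minimal_polynomial, discriminant

# Illustration with sample squarefree d: omega = (Delta + sqrt(Delta))/2 has an integer
# minimal polynomial whose discriminant is the field discriminant Delta.
z = Symbol('z')
for d in (-5, -1, 2, 5, 13):
    Delta = d if d % 4 == 1 else 4 * d            # discriminant of Q(sqrt(d))
    omega = Rational(Delta, 2) + sqrt(Delta) / 2
    p = minimal_polynomial(omega, z)
    print(d, Delta, p, discriminant(p, z))        # the last entry equals Delta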

^34 Joseph-Louis Lagrange (1736–1813). ^35 Julius Wilhelm Richard Dedekind (1831–1916).

Indeed, since (∆ + √∆)/2 has minimal polynomial z² − ∆z + (∆² − ∆)/4, the discriminant of this polynomial, being the discriminant of Q(√d) = Q((∆ + √∆)/2), is ∆ in both cases. We put L := Q(√d) = Q((∆ + √∆)/2) and observe
L = Q(√d) = Q((∆ + √∆)/2) ≅ Q(√∆).
When writing √∆ we always mean the positive square-root if ∆ > 0 and the one with positive imaginary part if ∆ < 0, e.g., √(−a) = √a · √(−1), a > 0. We know that since [L : Q] = 2 there is only one non-trivial imbedding of L into C, namely
σ : L ↪ C,   a + b√d ↦ (a + b√d)^σ = a − b√d ∈ L ⊂ C.
(Clearly σ has to be evaluated differently when using √∆, but this is easy to determine.) The trivial imbedding is the identity a + b√d ↦ a + b√d. Notice that both of these imbeddings are automorphisms, i.e., isomorphisms L → L. We will find it convenient to number these automorphisms to simplify formulas. For this we choose σ₁ = id and σ₂ = σ ≠ id. Sometimes we will also use the notation a^σ for σ(a).

We fix a fractional ideal a. Suppose a has a basis B_a := {a₁, a₂}. We note that
det(σ_i(a_j))² = ∆ N(a)².
We will only consider ordered basis sets, so B_a = {a₁, a₂} and B_a^¬ = {a₂, a₁} are considered different. An ordered basis B_a := {a₁, a₂} is called normalized if
det(σ_i(a_j)) = √∆ N(a)   (equivalently, det(σ_i(a_j))/√∆ > 0).

By the properties of the determinant, either B_a or B_a^¬ is normalized. Given a normalized basis B_a := {a₁, a₂}, we define

Q_{B_a}(X,Y) := N(a)⁻¹ Nrm_{L/Q}(Xa₁ + Ya₂).

To simplify notation we will denote Q_{B_a} by Q_a. When different basis sets are used this will be explicitly part of the notation as above. By definition, for X, Y ∈ Z, Xa₁ + Ya₂ ∈ a, and by the "Tower law" we have that

a′ ⊆ a ⊆ o_L ⇒ N(a) | N(a′),

and so, taking a′ := ⟨a⟩ for a ∈ a, we get N(a) | Nrm_{L/Q}(a). This holds also for fractional ideals, but to see this would once again require us to venture into factorization of (fractional) ideals in Dedekind domains, and so we only state this as a fact and use it without further thought.

That N(a) | Nrm_{L/Q}(a) implies that Q_a(X,Y) is an integral quadratic form. Recalling that a^σ = σ(a) as a matter of convention, we have that

$$\mathrm{Nrm}_{L/\mathbf{Q}}(Xa_1 + Ya_2) = (Xa_1 + Ya_2)(Xa_1^{\sigma} + Ya_2^{\sigma}) = a_1a_1^{\sigma}X^2 + (a_1a_2^{\sigma} + a_1^{\sigma}a_2)XY + a_2a_2^{\sigma}Y^2,$$
and so
$$Q_{\mathfrak a}(X,Y) = \frac{a_1a_1^{\sigma}X^2 + (a_1a_2^{\sigma} + a_1^{\sigma}a_2)XY + a_2a_2^{\sigma}Y^2}{N(\mathfrak a)}.$$

From this we compute the discriminant:

$$\mathrm{Disc}(Q_{\mathfrak a}) = N(\mathfrak a)^{-2}\bigl((a_1a_2^{\sigma} + a_2a_1^{\sigma})^2 - 4a_1a_1^{\sigma}a_2a_2^{\sigma}\bigr)$$
$$= N(\mathfrak a)^{-2}\bigl((a_1a_2^{\sigma})^2 + 2a_1a_2^{\sigma}a_2a_1^{\sigma} + (a_2a_1^{\sigma})^2 - 4a_1a_1^{\sigma}a_2a_2^{\sigma}\bigr) = N(\mathfrak a)^{-2}(a_1a_2^{\sigma} - a_1^{\sigma}a_2)^2 = N(\mathfrak a)^{-2}\det(\sigma_i(a_j))^2 = \Delta.$$
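For the simplest possible example (the choice of field and basis is mine), take L = Q(√−5) and a = o_L with basis {1, √−5} (ordering aside), so N(a) = 1. The norm form is then X² + 5Y², and its discriminant is −20 = Disc(Q(√−5)), exactly as the calculation above predicts. A two-line sympy check:

from sympy import sqrt, symbols, expand

# Illustration (example mine): L = Q(sqrt(-5)), a = o_L with basis {1, sqrt(-5)}, N(a) = 1.
X, Y = symbols('X Y')
Q = expand((X + Y * sqrt(-5)) * (X - Y * sqrt(-5)))   # Nrm_{L/Q}(X*1 + Y*sqrt(-5))
print(Q)                       # X**2 + 5*Y**2, i.e. (a, b, c) = (1, 0, 5)
print(0 ** 2 - 4 * 1 * 5)      # Disc(Q_a) = -20 = Disc(Q(sqrt(-5)))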

Clearly, the coefficient of X² in Q_a is N(a)⁻¹ Nrm_{L/Q}(a₁), so if ∆_L < 0, then Q_a is positive definite.

We will now study what happens to Q_a(X,Y) when we change basis. So let B_a = {a₁, a₂} be the old basis and B_a′ := {a₁′, a₂′} be a new one. They are related by

(16.1)  $$\begin{pmatrix} a_1' \\ a_2' \end{pmatrix} = \begin{pmatrix} \alpha & \beta \\ \gamma & \delta \end{pmatrix}\begin{pmatrix} a_1 \\ a_2 \end{pmatrix} \iff \begin{cases} a_1' = \alpha a_1 + \beta a_2, \\ a_2' = \gamma a_1 + \delta a_2. \end{cases}$$

If u = X′a₁′ + Y′a₂′ in the new basis then it becomes u = Xa₁ + Ya₂ in the old, with X = X′α + Y′γ and Y = X′β + Y′δ. Applying σ to the system of equations (16.1) we get the following matrix equation:

$$\begin{pmatrix} a_1' & a_2' \\ (a_1')^{\sigma} & (a_2')^{\sigma} \end{pmatrix} = \begin{pmatrix} a_1 & a_2 \\ a_1^{\sigma} & a_2^{\sigma} \end{pmatrix}\begin{pmatrix} \alpha & \gamma \\ \beta & \delta \end{pmatrix}.$$

Taking determinants on both sides shows that the matrix with entries α, β, γ, δ lies in SL₂(Z), since B_a and B_a′ are both normalized. Hence Q_{B_a}(X,Y) and Q_{B_a′}(X,Y) are equivalent.

Now, take k ∈ L⁺, i.e., a totally positive element of L. Then
$$\det(\sigma_i(k a_j)) = \det\begin{pmatrix} k a_1 & k a_2 \\ k^{\sigma} a_1^{\sigma} & k^{\sigma} a_2^{\sigma} \end{pmatrix} = k k^{\sigma}\sqrt{\Delta}\,N(\mathfrak a) = \mathrm{Nrm}_{L/\mathbf{Q}}(k)\sqrt{\Delta}\,N(\mathfrak a),$$
so kB_a = {ka₁, ka₂} is also a normalized basis. This means in particular that Q_{kB_a}(X,Y) is a well-defined quadratic form. Since ka = ⟨k⟩a, where ka = {ka | a ∈ a}, and this certainly holds for a = o_L, we have

N(ko_L) = N(⟨k⟩o_L) = N(⟨k⟩)N(o_L) = Nrm_{L/Q}(k). This leads to
$$Q_{kB_{\mathfrak a}}(X,Y) = N(k\mathfrak a)^{-1}\mathrm{Nrm}_{L/\mathbf{Q}}(Xka_1 + Yka_2) = N(\mathfrak a)^{-1}\mathrm{Nrm}_{L/\mathbf{Q}}(k)^{-1}\,\mathrm{Nrm}_{L/\mathbf{Q}}(k)\,\mathrm{Nrm}_{L/\mathbf{Q}}(Xa_1 + Ya_2) = Q_{\mathfrak a}(X,Y).$$
Hence, two quadratic forms associated to ideals in the same narrow class are equivalent. Let Q_∆ denote the set of all equivalence classes of quadratic forms with discriminant ∆ (i.e., the discriminant of L/Q). What we have done so far is to construct a map
π_∆ : Cls⁺(L) → Q_∆,
associating to every narrow class of fractional ideals an equivalence class of quadratic forms of discriminant ∆.

We can also go in the other direction. Remember that if L = Q(√∆) we can take as a basis for o_L the set {1, (∆ + √∆)/2}, regardless of any congruence conditions.

Pick a quadratic form Q(X,Y) = aX² + bXY + cY². We define an o_L-module (fractional ideal) m by taking as basis B_m := {a, (b − √∆)/2}. We need to check that this indeed defines an o_L-module, i.e., that multiplying any element in m by any element from o_L results in an element of m. It is clearly sufficient to do this for the basis sets of o_L and m. That
$$1\cdot\Bigl(\mathbf{Z}a \oplus \mathbf{Z}\,\frac{b - \sqrt{\Delta}}{2}\Bigr) \subseteq \mathfrak m$$
is obvious, so we only need to verify
(16.2)  $$\frac{\Delta + \sqrt{\Delta}}{2}\cdot a \in \mathbf{Z}a \oplus \mathbf{Z}\,\frac{b - \sqrt{\Delta}}{2},$$
(16.3)  $$\frac{\Delta + \sqrt{\Delta}}{2}\cdot\frac{b - \sqrt{\Delta}}{2} \in \mathbf{Z}a \oplus \mathbf{Z}\,\frac{b - \sqrt{\Delta}}{2}.$$
For (16.2) observe that
$$\frac{\Delta + \sqrt{\Delta}}{2}\cdot a = \frac{a\Delta + a\sqrt{\Delta}}{2} = \frac{ab - ab + a\Delta + a\sqrt{\Delta}}{2} = -a\,\frac{b - \sqrt{\Delta}}{2} + a\,\frac{b + \Delta}{2}.$$
Since ∆ = Disc(Q) = b² − 4ac we see that b² ≡ ∆ (mod 4) and so b ≡ ∆ (mod 2)^36. Hence (b + ∆)/2 ∈ Z, so (16.2) is proved. To prove (16.3) we use the following trick:
$$\frac{\Delta + \sqrt{\Delta}}{2}\cdot\frac{b - \sqrt{\Delta}}{2} = \frac{(\Delta + \sqrt{\Delta})(b - \sqrt{\Delta})}{4} = \frac{\Delta b - \Delta\sqrt{\Delta} + b\sqrt{\Delta} - \Delta}{4} = \frac{\Delta(b - \sqrt{\Delta}) + b\sqrt{\Delta} - \Delta}{4}$$
$$= \frac{\Delta}{2}\,\frac{b - \sqrt{\Delta}}{2} + \frac{b\sqrt{\Delta} - \Delta}{4} = \frac{\Delta}{2}\,\frac{b - \sqrt{\Delta}}{2} + \frac{b\sqrt{\Delta} - b^2 + 4ac}{4} = \frac{\Delta}{2}\,\frac{b - \sqrt{\Delta}}{2} - \frac{b(b - \sqrt{\Delta})}{4} + ac = \frac{\Delta - b}{2}\,\frac{b - \sqrt{\Delta}}{2} + ac,$$
and once again b ≡ ∆ (mod 2), so (16.3) follows.

Hence we have constructed a fractional ideal from Q(X,Y), proving a part of the following theorem:

Theorem 16.2. The map π_∆ : Cls⁺(L) → Q_∆ is a set bijection.

What remains to prove is that we actually get a class in Cls⁺(L), and this would finish the surjection part, in addition to showing that π_∆ is injective.

Remark 16.2. This theorem implies, since Cls⁺(L) is a finite group, that the number of equivalence classes of quadratic forms is finite. In addition, the class numbers h_Q(∆) and h_L⁺(∆) := |Cls⁺(L)| are the same.

16.3. What does this have to do with the class number problem? What we have shown is that there is a bijection between quadratic forms with discriminant ∆ and narrow classes of fractional ideals in quadratic number fields of discriminant ∆:

{narrow classes of fractional ideals in quadratic fields of discriminant ∆} ≃ {equivalence classes of quadratic forms with discriminant ∆}

So instead of considering quadratic forms we can study fractional ideals. Or, put more correctly:

^36 This follows since if t is a solution to the congruence equation f(z) ≡ 0 (mod m), for f ∈ Z[z], and d | m, then t is also a solution to f(z) ≡ 0 (mod d).

Reformulation of the class number problem: It is equivalent, on the one hand, to study narrow classes of fractional ideals in quadratic fields of discriminant ∆ and, on the other, to study equivalence classes of quadratic forms of discriminant ∆.

Hence if we can determine which quadratic fields (i.e., their discriminants) have a given class number, then we also know the class number for quadratic forms of that discriminant. I will now present the ideas behind the solution to the class number one problem, i.e., the problem of determining the discriminants of the quadratic fields having h_L⁺(∆) = 1. This is based on the following observation:

Theorem 16.3. Let o_K be the ring of integers in a number field (not necessarily quadratic) K/Q. Then
o_K is a PID ⟺ o_K is a UFD,
i.e., h_K(∆) = 1 if and only if o_K is a unique factorization domain.

The implication '⇒' is true for any ring, but the implication in the other direction is false in general. To prove the reverse implication, certain properties that o_K enjoys, and that only a specific class of rings satisfies (Dedekind domains), are needed. Hence, going back to quadratic number fields, determining which o_L are PID's, i.e., h_L(∆) = 1, is equivalent to determining which o_L have unique factorization into irreducibles. However, before I continue on this, I will build up a little suspense, a cliffhanger if you will, and digress (seemingly, and once again only seemingly) to something different.

16.4. Zeta functions and the Riemann hypothesis.

Euler, Dirichlet and primes. The main motivation for Euler^37 in this context was to get a feel for the distribution of primes on the Z-axis. Indeed, he wanted to estimate the "prime counting function" π, which returns the number of primes less than or equal to the input,
π(n) = #{primes less than or equal to n}.
Euler introduced the following function:
$$\zeta(s) := \sum_{n=1}^{\infty} n^{-s}, \quad \text{for } s \in \mathbf{R},$$
nowadays called the Euler ζ-function. One can prove (I don't think Euler did this though) that ζ(s) converges uniformly for s ∈ [a, ∞), a > 1. He did however prove that
$$\zeta(s) = \prod_{p \text{ prime}} \frac{1}{1 - p^{-s}}, \quad \text{for } s > 1,$$
a remarkable result. Taking the logarithm of both sides of the above equation yields
$$\log\zeta(s) = -\sum_{p \text{ prime}} \log(1 - p^{-s}).$$
As s → 1⁺, log ζ(s) → ∞, so ζ has a pole at s = 1 (in modern terminology). Also in modern parlance, Euler proved that
$$\sum_{\substack{p \le N \\ p \text{ prime}}} \frac{1}{p} = \log\log N + O(1),$$

^37 Leonhard Euler (1707–1783).

showing not only that the number of primes is infinite^38, but also giving a rather accurate estimate for the number of primes, since it is possible to argue from this result that
$$\pi(n) \sim \frac{n}{\log n},$$
which we recognize as the prime number theorem. I don't know exactly what Euler did or did not prove or know or suspect concerning this, but this is nonetheless the gist of Euler's contribution to this area which will interest us here.

Dirichlet wanted to generalize this by considering the number of primes in given arithmetic progressions. This means the following: pick a number n. Question: how many primes are there congruent to a modulo n, with gcd(a, n) = 1? He suspected that there were infinitely many. In other words, he conjectured that
#{p ∈ Spec(Z) | p ≡ a (mod n)} = ∞ ⟺ #{p prime | p ∈ {a + kn | k ∈ Z}} = ∞.
He thought he could prove this by proving that
$$\sum_{\substack{p \equiv a \ (\mathrm{mod}\ n) \\ p \text{ prime}}} \frac{1}{p}$$
is divergent. This was exactly Euler's method, but summing not over all primes, only over those in the progression {a + kn | k ∈ Z}. In his attempt at doing this he introduced a class of ζ-functions which are now called Dirichlet's L-functions, these being a special case of more general L-functions introduced by Dedekind, Hecke and Artin (among others).

To define these L-functions in Dirichlet's sense we need an idea from group theory, namely group characters.

Definition 16.2. Let G be an (abelian, for simplicity) group. Then a group character is a group homomorphism χ : G → C* := C \ {0}. In our case, we restrict attention to the following special case, which we can call Dirichlet characters: fix n ∈ Z and take G = (Z/nZ)*, where (Z/nZ)* is the group of invertible elements of Z/nZ. Then a Dirichlet character is a group morphism χ : (Z/nZ)* → S¹(C) := {z ∈ C | |z| = 1}.

Now, Dirichlet defined the L-function or L-series as
$$L(s, \chi) := \sum_{n=1}^{\infty} \chi(n) n^{-s} = \prod_{p \text{ prime}} \frac{1}{1 - \chi(p)p^{-s}}.$$
This is a function of a complex variable s with ℜ(s) > 1, but it can be analytically continued to a meromorphic function on the whole of C. Using this L-function Dirichlet managed to show that there are indeed infinitely many primes in arithmetic progressions, by an argument similar to Euler's.

Enter Riemann. Bernhard Riemann (1826–1866) wrote one paper in number theory (in 1859) during his short career. In this seven-page paper he managed to squeeze in, more like an afterthought, what has arguably become the most famous unsolved mathematical conjecture, now that Fermat's last theorem has been cracked. He wrote:

"One would of course like to have a rigorous proof of this, but I have put aside the search for such a proof after some fleeting vain attempts because it is not necessary for the immediate objective of my investigation"^39
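Before following Riemann, here is a tiny numerical illustration of Dirichlet's construction (the script and the choice n = 4 are mine, not from the notes). The non-trivial character mod 4 has χ(1) = 1, χ(3) = −1 and χ = 0 on even residues, and the classical value L(1, χ) = 1 − 1/3 + 1/5 − 1/7 + ⋯ = π/4 is already visible from a partial sum:

import math

def chi(m):                        # the non-trivial Dirichlet character mod 4
    return {1: 1, 3: -1}.get(m % 4, 0)

partial = sum(chi(m) / m for m in range(1, 200001))
print(partial, math.pi / 4)        # both approximately 0.785398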

^38 Since the summation is over all primes and letting N → ∞ the right-hand side becomes infinite. ^39 Translated by Harold M. Edwards in The Riemann zeta function, Academic Press, New York, 1974.

What his paper was actually about was outlining a program for proving the prime number theorem, which by this time had been conjectured in a more definitive form by Gauss. In doing this Riemann derived certain functional relations that ζ(s) satisfies. He did, however, make one important adjustment to the definition: he proved that ζ(s) is meromorphic on the whole complex plane, with a simple pole at s = 1. This means that ζ(s) can be extended to s ∈ C without restrictions. So what is this conjecture about? The classical Riemann Hypothesis says:

Conjecture (Classical Riemann Hypothesis). The complex, non-trivial, zeros of
$$\zeta(s) = \sum_{n=1}^{\infty} n^{-s}$$
all lie on the so-called critical line ℜ(s) = 1/2.

The trivial zeros are the zeros along the negative x-axis, namely {−2, −4, −6, ...}. By doing many long and hard computations, Riemann knew that the first 300 or so zeros all lie on ℜ(s) = 1/2, as was realized (long) after his death by decoding his many notebooks. He also knew that all non-trivial zeros, if they exist, must lie symmetrically in the critical strip 0 < ℜ(s) < 1 with symmetry axis ℜ(s) = 1/2. In fact, proving that no zeros lie on ℜ(s) = 1 was a major part of the work of Hadamard^40 and de la Vallée-Poussin^41 when they (independently) proved the prime number theorem in 1896.

A major indication that the strange and crazy hypothesis could actually be true was presented in 1916 when Godfrey Harold Hardy (1877–1947) proved that there are infinitely many zeros of ζ(s) on ℜ(s) = 1/2. Also, the Swedish mathematician Helge von Koch (1870–1924)^42 proved in 1901 that the Riemann Hypothesis is equivalent to a stronger form of the prime number theorem, namely that for every ε > 0,
$$\pi(n) - \int_0^n \frac{dt}{\ln(t)} = O(n^{1/2+\varepsilon}).$$
Up to now (2004 at least) the first ten trillion zeros have been checked (according to Wikipedia) and the hypothesis still stands. There is other, non-numerical, support for the Riemann hypothesis, but this would take us too far afield. Suffice it to say that, if the hypothesis turned out to be false, certain other mathematical objects would behave very badly, and this in turn would be very unexpected. Some people have also made the bold suggestion that the Riemann hypothesis is one of the true statements in arithmetic that cannot be proved (within the appropriate system of axioms), as given by the famous Gödel theorem.
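A small numerical look at the quantities in von Koch's theorem (the script is mine; sympy's primepi and li are used for convenience): π(n), the cruder estimate n/log n, and the logarithmic integral.

from sympy import primepi, li, log, N

for n in (10**3, 10**4, 10**5, 10**6):
    print(n, primepi(n), N(n / log(n)), N(li(n)))
# The logarithmic integral tracks pi(n) far more closely than n/log(n),
# in line with the error term O(n^(1/2+eps)) predicted under RH.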

Generalized Riemann Hypothesis. Let me also briefly mention here that there is a generalized version of the Riemann Hypothesis that actually is, in a sense, the "real" hypothesis. This concerns a large class of L-functions, the so-called global L-functions. To go into this would constitute a whole book, totally ignoring the fact that I'm not competent to write it, so I leave the subject swiftly by uttering a few remarks.

Global L-functions come mainly from two sources: geometric objects and automorphic forms (which are certain meromorphic functions that are invariant "up to a function" under a group action). It is conjectured, or hoped, that these two sources of L-functions

^40 Jacques Hadamard (1865–1963). ^41 Charles Jean Gustave Nicolas Baron de la Vallée Poussin (1866–1962). ^42 He is the one with the famous fractal snowflake.

are actually incarnations of a hidden "super L-function". This is actually a vast generalization of the Taniyama–Shimura conjecture, a special case of which Andrew Wiles and Richard Taylor proved to get to the proof of Fermat's last theorem. In fact, it was proved several years before that this special case of the Taniyama–Shimura conjecture would, together with the so-called ε-conjecture (proved by Kenneth Ribet in 1990), imply Fermat's last theorem. Since then the full conjecture has been proved, in 2001, by Christophe Breuil, Brian Conrad, Fred Diamond and Richard Taylor.

Therefore, the Generalized Riemann Hypothesis would be a great leap forward in understanding the very deep connections between geometry, number theory and complex analysis.

16.5. Back to quadratic fields. Let us now continue our story about quadratic fields and answer the question what all this has to do with the Riemann Hypothesis. Around 1918 Erich Hecke (1887–1947) proved the following theorem (given here in simplified form):

Theorem 16.4 (Hecke). If the Generalized Riemann Hypothesis is true then Gauss' class number conjecture (problem) is also true.

On the other hand, in 1933, the following strange result was announced and proved by Max Deuring (1907–1984), and in the form given here by Louis Mordell (1888–1972):

Theorem 16.5 (Deuring–Mordell). If the classical Riemann Hypothesis is false, then h(∆) → ∞ as ∆ → −∞ (but h(∆) is finite for every ∆).

A year later, in 1934, Hans Heilbronn (1908–1975) proved

Theorem 16.6 (Heilbronn). Suppose the Generalized Riemann Hypothesis is false; then h(∆) → ∞ as ∆ → −∞.

Corollary 16.7 (Hecke–Deuring–Mordell–Heilbronn). Gauss' conjecture is true!

Proof. This follows since if the Riemann Hypothesis is true, then Gauss' conjecture is true, and if the (Generalized/classical) Riemann Hypothesis is false, then Gauss' conjecture is still true^43.

Later in 1934 (these were apparently two eventful years), Heilbronn–Linfoot (Edward Linfoot, 1905–1982, was actually an astronomer) proved:

Theorem 16.8. There are at most 10 negative discriminants with class number one:
h(∆) = 1 for ∆ = −3, −4, −7, −8, −11, −19, −43, −67, −163, X?
(all these were presumably known to Gauss). So the list of discriminants with one equivalence class of quadratic forms was complete, with one possible exception.

One outcome of this was that if a tenth discriminant ∆ with h(∆) = 1 existed, then the Generalized Riemann Hypothesis had to be false! Put differently, if there were ten imaginary quadratic number fields with class number one, then the Generalized Riemann Hypothesis couldn't possibly be true. Notice, however, that this says nothing about the classical Riemann Hypothesis; this could still be true regardless.

^43 Some logicians might object to this reasoning.

Now, a feverish activity was initiated to determine whether this tenth imaginary quadratic number field existed or not. One could show that if the tenth field did exist, its discriminant ∆ would have to be very small, i.e., |∆| very large.

In 1952, a German school teacher, Kurt Heegner (1893–1965), published a paper where he proved that no such tenth imaginary field could exist! The method he used was remarkable and not many understood its finer details. In fact, the paper was received by the mathematical community with a rather large dose of skepticism and was generally believed to be wrong. Then, in 1966/1967, Alan Baker (born 1939) and Harold Stark (also born in 1939) independently showed that there is no tenth imaginary quadratic number field with class number one. Only then, one year after Heegner's death, was it realized that his proof was essentially correct (a few mistakes were found but these were easily corrected).

In 1971, Baker and Stark (once again independently) proved that there are exactly eighteen quadratic number fields with class number two. In fact,

Theorem 16.9. The class number h(∆) is two for exactly:
∆ = −15, −20, −24, −35, −40, −51, −52, −88, −91, −115, −123, −148, −187, −232, −235, −267, −403, −427.

Notice that everything here is concerned with imaginary quadratic number fields, i.e., where ∆ < 0. Significantly less is known in the real case, i.e., when ∆ > 0. It is conjectured (by Gauss?) that there are infinitely many real quadratic number fields of class number one. But this is still an open problem.

Although the issue of the class number being finite for every discriminant was solved, the question of actually determining the class number was open for some more years. Then in 1985, Dorian Goldfeld (born 1947) proved, using L-functions associated to elliptic curves (so-called Hasse–Weil L-functions^44) together with recent results (1983) of Benedict Gross (born 1950) and Don Zagier (born 1951), that the Gauss class number problem can be reduced to a finite computation. This means that there is a finite algorithm (i.e., an algorithm that produces a result within a finite time) that can, given a class number, find all negative discriminants of that class number.

Theorem 16.10. By computation it is possible to show that unique factorization fails in o_K, K = Q(√D), D < 0, for (at least):
D = −5, −6, −10, −13, −14, −15, −17, −21, −22, −23, −26, −29, −30.

Notice that these are not the discriminants. Those are given as
$$\Delta = \begin{cases} 4D & \text{if } D \not\equiv 1 \pmod 4,\\ D & \text{if } D \equiv 1 \pmod 4.\end{cases}$$

For positive D we have the following theorem:

Theorem 16.11. For 0 < D ≤ 100, factorization is unique precisely for the following D's:
D = 2, 3, 5, 6, 7, 11, 13, 14, 17, 19, 21, 22, 23, 29, 31, 33, 37, 38, 41, 43, 46, 47, 53, 57, 59, 61, 62, 67, 69, 71, 73, 77, 83, 86, 89, 93, 94, 97.
(For all other D, o_{Q(√D)} is not a UFD.)
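To see concretely why D = −5 appears in the first list (the verification below is my own illustration, not from the notes): in Z[√−5] one has 6 = 2·3 = (1 + √−5)(1 − √−5), and none of the four factors can split further, since that would require an element of norm 2 or 3, i.e., an integer solution of x² + 5y² ∈ {2, 3}.

# Illustration (example mine) of the failure of unique factorization for D = -5.
def norm(x, y):                     # Nrm(x + y*sqrt(-5)) = x^2 + 5y^2
    return x * x + 5 * y * y

print(norm(2, 0), norm(3, 0), norm(1, 1), norm(1, -1))   # 4 9 6 6: consistent with 2*3 = (1+sqrt(-5))(1-sqrt(-5))
print([(x, y) for x in range(-3, 4) for y in range(-2, 3)
       if norm(x, y) in (2, 3)])                          # []  ->  no elements of norm 2 or 3 exist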

^44 After Helmut Hasse (1898–1979) and André Weil (1906–1998).

17. APPENDIX A: LINEAR ALGEBRA

17.1. Vector spaces and bases.

17.1.1. Vector spaces. Let F be a field. A vector space over F is an abelian group V (of 'vectors') under addition such that, for v, w ∈ V and α ∈ F,
αv ∈ V,   α(v + w) = αv + αw.

17.1.2. Bases. A set B of vectors is called a linearly independent set if

$$\sum_{b \in B} \alpha_b b = 0 \implies F \ni \alpha_b = 0, \text{ for all } b \in B.$$
The set B is a basis for V if B is a linearly independent set and every v can be written uniquely as a linear combination

$$v = \sum_{b \in B} \alpha_b b, \qquad \alpha_b \in F.$$
One can show that every vector space has a basis and that all basis sets have the same cardinality (number of elements); this number (including ∞) is called the dimension of V.

17.2.1. Linear maps. Now, let V and W be two vector spaces over a field F with n := dim_F(V) and m := dim_F(W). A linear map F : V → W is a map such that F(αv + βv′) = αF(v) + βF(v′), for v, v′ ∈ V and α, β ∈ F (or, in high-flung language, F is a morphism of F-modules). The set of linear maps from V to W is denoted in algebraic contexts as Hom_F(V, W), or Hom(V, W) if the field is understood^45. We will be almost exclusively interested in the case when V = W. Choose a basis e := {e_i}_{i=1}^n for V. Then it is possible to express a linear map as a matrix. Indeed, F(e_i) = ∑_{j=1}^n a_{ji} e_j, and so we can take A := (a_{ji}) as the matrix representing F in the basis e. Let ker(F) = {v ∈ V | F(v) = 0} and im(F) = {w ∈ W | ∃v ∈ V, F(v) = w}. Notice that this is exactly the same definition as for rings or groups. Recall that the rank of F is the dimension of im(F) as an F-vector space, or equivalently, the number of linearly independent columns (or rows) in a matrix representing F.

17.2.2. Multilinear maps. A multilinear map F is a mapping F : V^{×k} := V × V × ··· × V → W, linear in each argument, that is,
F(v₁, ..., αv_i + βv_i′, ..., v_k) = αF(v₁, ..., v_i, ..., v_k) + βF(v₁, ..., v_i′, ..., v_k)
for all 1 ≤ i ≤ k. Notice that the case k = 1 is the case of linear maps. A multilinear map is called alternating or skew-symmetric if
F(v₁, ..., v_i, v_{i+1}, ..., v_k) = −F(v₁, ..., v_{i+1}, v_i, ..., v_k),   for all 1 ≤ i ≤ k − 1.
This is equivalent to F(v₁, ..., v_i, ..., v_i, ..., v_k) = 0 for all 1 ≤ i ≤ k. On the other hand, if
F(v₁, ..., v_i, v_{i+1}, ..., v_k) = F(v₁, ..., v_{i+1}, v_i, ..., v_k),   for all 1 ≤ i ≤ k − 1,
then F is called symmetric. A multilinear map F : V^{×k} → F is called a (multilinear) form.

^45 Analysts often use different notations such as Lin(V, W). It is simply a matter of taste and tradition. They all mean basically the same thing.

17.2.3. Bilinear maps. Bilinear maps are multilinear maps F : V × V → W, i.e., with k = 2. The set of bilinear maps is often denoted Bil(V, W). A bilinear map F is called non-degenerate if

{v ∈ V | F(v,x) = 0, for all x ∈ V} = {v ∈ V | F(x,v) = 0, for all x ∈ V} = 0.

17.3. Dual spaces. Associated to every vector space V there is a dual space V^∨ of linear maps V → F, that is, V^∨ := Hom(V, F). If e is a basis for V then there is a unique basis e^∨ of V^∨ such that
$$e_j^{\vee}(e_i) = \delta_{ij} := \begin{cases} 0 & \text{if } i \ne j,\\ 1 & \text{if } i = j,\end{cases}$$

(where δ_{ij} is called the Kronecker delta), for e_j^∨ ∈ e^∨ and e_i ∈ e. If we think of elements of V as column vectors, we can think of elements of V^∨ as row vectors, and e_j^∨(e_i) then becomes (e_j^∨)^T · e_i (matrix multiplication).

17.3.1. Dual spaces and bilinear maps. Given a non-degenerate bilinear map F : V × V → F we can view the map v ↦ F(v, ·) as a morphism
(·)_F : V → V^∨,   (v)_F := F(v, ·).
This can easily be shown to be an isomorphism of vector spaces (i.e., it is linear and bijective). This shows that we can find another basis f of V such that F(f_j, e_i) = δ_{ij}, for f_j ∈ f and e_i ∈ e. We say that f is F-dual to e. In matrix language, we have (f_j)^T · (a_{ji}) · e_i = δ_{ij}, where (a_{ji}) is the matrix of F in the basis e.

17.4. Operations on maps.

17.4.1. Trace of a linear map. The trace of F is defined as the sum of all the diagonal elements in a matrix representing F, that is, Tr(F) = Tr(A) = ∑_i a_{ii}. This definition is independent of the choice of basis. It is easy to show that if F and G are two linear maps with matrices A and B respectively, then Tr(AB) = Tr(BA).

17.4.2. Determinant of a linear map. The determinant is a map det : Mat_n(F) → F, multilinear with respect to columns (or rows), such that det(1_n) = 1 ∈ F, where Mat_n(F) is the set of all n × n-matrices with entries in F and 1_n is the n × n identity matrix (i.e., the matrix with 1 along the diagonal and zero elsewhere). By representing a linear map F as a matrix (with respect to a certain basis) we can speak of the determinant of F. We shall jump between these designations freely, treating them as being equivalent. One can show that there is exactly one such map det, and all the usual operations, such as column-row reduction, apply.

17.4.3. Characteristic polynomial. Recall also that the characteristic polynomial of a linear map F is defined as P_F(z) := det((a_{ij}) − z·1_n), where (a_{ij}) is the matrix associated with F in some basis. Expanding this determinant (and discarding an overall sign (−1)^n) gives

$$P_F(z) = z^n - p_{n-1}z^{n-1} + \cdots + (-1)^n p_0,$$
and one can readily show that p_{n−1} = Tr(F) and p_0 = det(F).
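A quick numerical check of these two coefficients (the example matrix is mine) using numpy, whose np.poly returns the monic coefficients of the characteristic polynomial of a square matrix:

import numpy as np

A = np.array([[2., 1., 0.],
              [0., 3., 4.],
              [1., 0., 5.]])
n = A.shape[0]
coeffs = np.poly(A)                     # approx [1, -10, 31, -34], i.e. z^3 - 10z^2 + 31z - 34
print(-coeffs[1], np.trace(A))          # p_{n-1} = Tr(F) = 10
print((-1) ** n * coeffs[-1], np.linalg.det(A))   # p_0 = det(F) = 34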

17.4.4. Discriminants. Let F : V × V → F be a bilinear form. Then the discriminant of F is defined as det(F(e_i, e_j)), denoted Disc(F, e), where e = {e_i}_{i=1}^n is a basis for V. Changing basis, f = ωe, yields the relation
Disc(F, f) = det(F(f_i, f_j)) = [check this step] = det(ω)² det(F(e_i, e_j)) = det(ω)² Disc(F, e).
If the basis is understood, we simply write Disc(F).

17.5. Linear equations and inverses.

17.5.1. Cramer's rule. Suppose we are given a system of equations
∑_j ω_{ij} x_j = y_i,   ω := (ω_{ij}).
Then we have
det(ω) x_j = det(ω(j)),
where ω(j) is the matrix obtained from ω by replacing the j-th column by y = (y₁, ..., y_n)^T.

Proof. The result follows by expanding the determinant
$$\det\begin{pmatrix} \omega_{11} & \cdots & \sum_j \omega_{1j}x_j & \cdots & \omega_{1n}\\ \vdots & & \vdots & & \vdots\\ \omega_{n1} & \cdots & \sum_j \omega_{nj}x_j & \cdots & \omega_{nn}\end{pmatrix}$$
using standard properties of determinants.

17.5.2. The adjugate formula. Another useful formula from linear algebra is
(17.1)   A⁻¹ = det(A)⁻¹ adj(A),
where A is an invertible matrix and adj(A) is the adjugate of A. This is a matrix, if you remember, formed by certain determinants of submatrices of A, namely the minors of A (times certain powers of −1). The exact definition is not important; we note only that if A is an integral matrix (i.e., all entries are integers) then all minors are also integers (being determinants) and hence adj(A) is also integral.
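Both Cramer's rule and the adjugate formula are easy to test symbolically; the sketch below (example matrix and right-hand side are mine) uses sympy, whose Matrix class has det, inv and adjugate built in:

from sympy import Matrix

omega = Matrix([[2, 1, 0],
                [1, 3, 1],
                [0, 1, 4]])
y = Matrix([1, 2, 3])

# Cramer's rule: det(omega) * x_j = det(omega with j-th column replaced by y).
x = omega.solve(y)
for j in range(3):
    omega_j = omega.copy()
    omega_j[:, j] = y
    print(omega.det() * x[j], omega_j.det())          # equal for each j

# The adjugate formula (17.1): A^{-1} = det(A)^{-1} adj(A); note adj(omega) is integral.
print(omega.inv() == omega.adjugate() / omega.det())  # True
print(omega.adjugate())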

17.5.3. Determinants over rings. Now suppose that ω = (ω_{ij}) is a matrix with entries in a (commutative) ring R with unity. We have the following criterion:
ω is invertible ⟺ det(ω) is a unit in R.
Indeed, suppose ω is invertible; then
1 = det(1_n) = det(ωω⁻¹) = det(ω) det(ω⁻¹).
On the other hand, if det(ω) is a unit, we have ω⁻¹ = det(ω)⁻¹ adj(ω), by 17.5.2.

17.6. Modules.

17.6.1. Definition. Let R be a ring. Then an R-module M is an abelian group such that
• rm = mr ∈ M for all r ∈ R and m ∈ M;
• r(m₁ + m₂) = rm₁ + rm₂ for r ∈ R and m₁, m₂ ∈ M;
• (r₁ + r₂)m = r₁m + r₂m for r₁, r₂ ∈ R and m ∈ M;
• 1m = m, for all m ∈ M.

17.6.2. Direct sums of modules. We define the direct sum of a collection {M_i}_{i∈I} of modules (I is an index set), written ⊕_{i∈I} M_i, by
⊕_{i∈I} M_i := { m_{i_1} + m_{i_2} + ··· + m_{i_n} | i_1, i_2, ... ∈ I, n < ∞, m_{i_k} ∈ M_{i_k} }.
Notice that every element in ⊕_{i∈I} M_i is a finite sum of elements from the M_i's. As for the direct product (of rings) we will encounter only the case when I is a finite set. This will thus boil down to

M₁ ⊕ M₂ ⊕ ··· ⊕ M_n := { m₁ + m₂ + ··· + m_n | m_i ∈ M_i },

and this becomes an R-module with term-wise addition and scalar multiplication extended distributively:
(m₁ + ··· + m_n) + (m₁′ + ··· + m_n′) := (m₁ + m₁′) + (m₂ + m₂′) + ··· + (m_n + m_n′),
r(m₁ + ··· + m_n) := rm₁ + ··· + rm_n.

17.6.3. Examples and some properties.

Example 17.1. We have the following examples.
• If R = F for F a field, then an R-module is a vector space.
• A Z-module is an abelian group.
• The set R^n := R ⊕ R ⊕ ··· ⊕ R (n times) is an R-module. Modules of this type are called free modules. The number n is called the rank of R^n.
• A free F-module (where F is a field) is once again the same as a vector space.
• The free Z-module Z^n is called an n-dimensional (integral) lattice. Notice that the case n = 1 gives us the case of abelian groups from above.

Modules over commutative rings behave in very much the same way as modules over fields (i.e., vector spaces). For instance,
• Every free module has a linearly independent basis, and the number of basis vectors is the same in any two basis sets.
• Linear mappings between free modules can be given in terms of matrices.
• The definitions of matrices and determinants are exactly the same, and the same methods, e.g., Gaussian elimination and row-column expansions of determinants, apply in the case of modules also.

17.7. Vandermonde determinants. Form the polynomial ring A := F[t₁, ..., t_k]. Consider the matrix
$$V := \begin{pmatrix} 1 & 1 & \dots & 1\\ t_1 & t_2 & \dots & t_k\\ t_1^2 & t_2^2 & \dots & t_k^2\\ \vdots & \vdots & & \vdots\\ t_1^{k-1} & t_2^{k-1} & \dots & t_k^{k-1}\end{pmatrix}.$$
A matrix of this form is called a Vandermonde matrix^46 and we want to compute its determinant. This is classical theory; matrices and determinants of Vandermonde type have numerous applications throughout mathematics.

^46 Alexandre-Théophile Vandermonde (1735–1796) was, strangely enough, a musician and chemist and not a mathematician.

First of all, notice that if t_i = t_j for some i ≠ j we get that P := det(V) ∈ A is zero, so (t_i − t_j) is a factor of P, i.e., P = (t_i − t_j)P₁ for some P₁ ∈ A. If t_i = t_l for i ≠ l ≠ j, we also get zero, so (t_i − t_l) must be a factor of P₁. Continuing this way we see that
$$P = \prod_{i<j}(t_i - t_j) + R, \quad \text{for some } R \in A.$$

By a degree argument, comparing ∏_{i<j}(t_i − t_j) and det(V), one realizes that R must be identically zero and so
$$P = \prod_{i<j}(t_i - t_j).$$
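A symbolic spot-check (the script is mine) for k = 4, where the product as written above agrees with det(V) on the nose; for other k the two sides may differ by the sign (−1)^{k(k−1)/2} coming from reordering the factors.

from sympy import symbols, Matrix, simplify

k = 4
t = symbols('t1:5')                                    # (t1, t2, t3, t4)
V = Matrix([[ti ** r for ti in t] for r in range(k)])  # rows 1, t, t^2, t^3
P = 1
for i in range(k):
    for j in range(i + 1, k):
        P *= (t[i] - t[j])
print(simplify(V.det() - P))                           # 0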

18. APPENDIX B: CHINESE REMAINDER THEOREM

By the ring-theoretic version of the Chinese remainder theorem we have
$$\mathbf{F}_p[z]/\langle F^{\flat}\rangle \cong \prod_i \mathbf{F}_p[z]\big/\langle (F_i^{\flat})^{v_i}\rangle.$$
Via this isomorphism the prime ideals of F_p[z]/⟨F^♭⟩ correspond to the prime ideals of F_p[z] modulo ⟨(F_i^♭)^{v_i}⟩. Recall that this isomorphism comes from

f : A → A/i₁ × A/i₂ × ··· × A/i_r,   a ↦ (a (mod i₁), a (mod i₂), ..., a (mod i_r)),
where i_i + i_j = A for all i ≠ j. That this map is surjective is the actual statement of the Chinese remainder theorem. The kernel is ∩_i i_i, which under the assumption i_i + i_j = A equals i₁i₂···i_r, so
$$A\Big/\prod_{i=1}^{r}\mathfrak i_i \;\cong\; A/\mathfrak i_1 \times A/\mathfrak i_2 \times \cdots \times A/\mathfrak i_r.$$
Any ideal of ∏_i A/i_i comes from an ideal i of A such that i ⊇ ∏_i i_i. What this discussion boils down to is that every prime ideal p in F_p[z]/⟨F^♭⟩ comes from one, and only one, prime ideal p₁ × p₂ × ··· × p_r ⊆ ∏_i F_p[z]/⟨(F_i^♭)^{v_i}⟩, and each p_i satisfies p_i ⊇ ⟨(F_i^♭)^{v_i}⟩.
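The surjectivity statement is easiest to see over A = Z; the toy check below (the moduli and residues are my own choice, using sympy's crt helper) finds the unique preimage of a given residue tuple:

from sympy.ntheory.modular import crt

# Toy illustration over A = Z (values mine): with pairwise coprime moduli,
# i.e. i_i + i_j = A for i != j, every residue tuple has a unique preimage mod the product.
moduli, residues = [4, 9, 25], [3, 7, 11]
a, m = crt(moduli, residues)
print(a, m)                        # a single preimage a modulo m = 900
print([a % n for n in moduli])     # [3, 7, 11]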

ROOM 14261 E-mail address: [email protected] URL: http://www.math.uu.se/staff/pages/?uname=larsson