
MATH 404: ARITHMETIC

S. PAUL SMITH

1. Divisibility and factorization

After learning to count, add, and subtract, children learn to multiply and divide. The notion of division makes sense in any ring, and much of the initial impetus for the development of abstract algebra arose from problems of division and factorization, especially in rings closely related to the integers such as Z[√d]. In the next several chapters we will follow this historical development by studying arithmetic in these and similar rings.

To see that there are some serious issues to be addressed, notice that in Z[√−5] the number 6 factorizes as a product of irreducibles in two different ways, namely

6 = 2·3 = (1 + √−5)(1 − √−5).

It isn't hard to show these factors are irreducible but we postpone that for now¹. The key point for the moment is that this implies that Z[√−5] is not a unique factorization domain. This is in stark contrast to the integers. Since our teenage years we have taken for granted the fact that every integer can be written as a product of primes, which in Z are the same things as irreducibles, in a unique way.

To emphasize the fact that there are questions to be understood, let us mention that Z[i] is a UFD. Even so, there is the question: which primes in Z remain prime in Z[i]? This is a question with a long history: Fermat proved in ??? that an odd prime p in Z remains prime in Z[i] if and only if it is ≡ 3 (mod 4). It is then natural to ask, for each integer d, which primes in Z remain prime in Z[√d]? These and closely related questions lie behind much of the material we cover in the next 30 or so pages.

Definition 1.1. Let a and b be elements of a ring R. We say that a divides b in R if b = ar for some r ∈ R. We then call b a multiple of a and write a|b. ♦

Every element divides zero. Zero divides no elements other than itself. At the other end of the spectrum, 1 divides every element. But 1 is not the only element with this property; a unit u divides every element because b = u(u⁻¹b).
Conversely, if u divides every element of R it is a unit.

1.1. Greatest common divisors. Let R be a domain. A greatest common divisor of two elements a, b ∈ R is an element d ∈ R such that (1) d|a and d|b, and (2) if e|a and e|b, then e|d.

¹ See page 10.

We write d = gcd(a, b), or just d = (a, b). We say that greatest common divisors exist in R if every pair of elements in R has a greatest common divisor in R. The greatest common divisor is not unique. For example, in the integers, both 2 and −2 are greatest common divisors of 6 and 10. Similarly, in Z[i] both 2 and 2i are greatest common divisors of 4 and 6.

Lemma 1.2. Let R be a domain. If d and d′ are greatest common divisors of a and b, then each is a unit multiple of the other.

Proof. Because d and d′ divide each other, we have d′ = du and d = d′v for some elements u and v. Hence duv = d. If d = 0 then d′ = 0 and there is nothing to prove; otherwise, because R is a domain and d is non-zero, we may cancel to get uv = 1. □

To obtain uniqueness of a greatest common divisor we need some additional structure on R. For example, in Z, if we also insist that the greatest common divisor be positive, then it becomes unique. Actually, we haven't even shown that greatest common divisors exist in Z or Z[√d]. There is something to do here.

We can define the greatest common divisor of any collection of elements by saying that d is a greatest common divisor of a_1, ..., a_n if it divides each a_i, and if e is any element of R dividing all of them, then e necessarily divides d.

Notation. Sometimes we write (a, b) for the greatest common divisor of two integers a and b. This notation is also used to denote the ideal generated by a and b. For some rings there is an equality of ideals, (a, b) = (d), when d is a greatest common divisor of a and b.

1.2. Primes and Irreducibles. First we introduce some terminology:
• A factorization a = bc is proper if neither b nor c is a unit.
• Elements x and y are associates if each is a unit multiple of the other.

Elements x and y in a domain R are associates if and only if xR = yR. This is not true when R is not a domain: x and xy generate the same ideal in the ring R = C[x, y, z]/(x(yz − 1)) but xy ≠ xu for any unit u (check!).

Definition 1.3. Let R be a commutative ring. A non-zero non-unit a ∈ R is
• irreducible if it has no proper factorization;
• prime if whenever a|bc either a|b or a|c. ♦

If a ∈ R and u is a unit, then au is irreducible (resp., prime) if and only if a is. Check this.

Lemma 1.4. In a commutative domain every prime is irreducible.

Proof. Let p be prime. If p = bc then p divides b or c; perhaps after relabelling the factors, p|b, so b = pu and p = puc. We can cancel in a domain, so 1 = uc, whence c is a unit. □

The converse of this lemma is not always true: an irreducible need not be prime. For example, we shall later see that 2 is irreducible in Z[√−5] but not prime because it divides 6 = (1 + √−5)(1 − √−5) but does not divide either factor. Notice, too, that being prime or irreducible depends on the ring: for example, 2 is prime in Z but not in Z[√−5] or in Z[i].

2. Arithmetic in Z

2.1. Ordering. Here are some basic properties of the set of integers:
• Z is the disjoint union of the positive integers, the negative integers, and 0;
• we say that m < n if n − m is a positive integer, and m ≤ n if either m < n or m = n;
• this is a total ordering, meaning that for any pair of integers a and b either a ≤ b or b ≤ a;
• a subset S is bounded below if there is N ∈ Z such that s ≥ N for all s ∈ S;
• every non-empty set of integers that is bounded below has a least member;
• every non-empty set of integers that is bounded above has a maximal member.

2.2. The Euclidean algorithm. The Euclidean algorithm is the name given to the procedure that is used to compute the greatest common divisor of two integers. It is, essentially, the repeated application of the following fundamental result.

Theorem 2.1. Let a and b be integers and suppose that b ≠ 0. Then there are unique integers q and r such that a = bq + r and 0 ≤ r < |b|.

Proof. First suppose b > 0. Then b(n − 1) < bn for all integers n. It follows that the set {n | bn ≤ a} is non-empty and bounded above, so it has a largest element. Let q be that element. Then bq ≤ a < b(q + 1), so 0 ≤ a − bq < b. Setting r := a − bq gives the result.

Now suppose b < 0. Then, by the case just treated, we can write a = (−b)q + s where 0 ≤ s < −b. Hence a = b(−q) + s.

To see the uniqueness of q and r, suppose that a = bq_1 + r_1 = bq_2 + r_2 with 0 ≤ r_1, r_2 < |b|. Then r_1 − r_2 = b(q_2 − q_1). But −|b| < r_1 − r_2 < |b|, so b(q_2 − q_1) = 0, whence q_1 = q_2 and r_1 = r_2. □

Notice that the key to the proof is the fact that if a non-empty set of integers is bounded above, then it has a largest element.
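Theorem 2.1 can be tried out numerically. The following is a minimal Python sketch, not part of the original notes; the function name `divide_with_remainder` is chosen here purely for illustration. It returns the pair (q, r) of the theorem, taking care of the case b < 0, where Python's built-in `%` would return a negative remainder.

```python
def divide_with_remainder(a: int, b: int) -> tuple[int, int]:
    """Return (q, r) with a == b*q + r and 0 <= r < abs(b), as in Theorem 2.1."""
    if b == 0:
        raise ValueError("b must be non-zero")
    # Python's % takes the sign of the divisor, so for b < 0 it can be negative.
    # Reducing modulo abs(b) always gives a remainder in [0, |b|).
    r = a % abs(b)
    # b divides a - r exactly, so floor division is exact here.
    q = (a - r) // b
    return q, r


if __name__ == "__main__":
    for a, b in [(7, 2), (7, -2), (-7, 2), (-7, -2)]:
        q, r = divide_with_remainder(a, b)
        print(f"{a} = ({b})*({q}) + {r}")
```

The remainder is computed first and the quotient is then forced by a = bq + r, which mirrors the uniqueness part of the proof.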

The Algorithm. Let a and b be integers and suppose that b ≠ 0. By repeatedly using Theorem 2.1 we may write

a = bq_1 + r_1 with 0 ≤ r_1 < |b|,

b = r_1q_2 + r_2 with 0 ≤ r_2 < r_1,

r_1 = r_2q_3 + r_3 with 0 ≤ r_3 < r_2, ⋯

Since |b| > r_1 > r_2 > ⋯ ≥ 0 this process must stop, say at r_{t+2} = 0. Setting r_{−1} = a and r_0 = b, the general equation becomes

(2-1) r_i = r_{i+1}q_{i+2} + r_{i+2} with 0 ≤ r_{i+2} < r_{i+1},

and the last equation becomes

r_t = r_{t+1}q_{t+2}.

Claim: r_{t+1} = gcd(a, b). Proof: Since r_{t+1} divides r_t, it follows from (2-1) that r_{t+1} also divides r_{t−1}. By descending induction, (2-1) implies that r_{t+1} divides all r_i, i ≥ −1. In particular, r_{t+1} divides a and b. On the other hand, if e divides both a and b, then it divides r_1. If e divides r_i and r_{i+1}, then it follows from (2-1) that it also divides r_{i+2}. By induction, e divides r_{t+1}. Hence r_{t+1} is a greatest common divisor of a and b. ♦

Theorem 2.2. Greatest common divisors exist in Z and, if d = gcd{a, b}, there are integers u and v such that d = au + bv. Furthermore, aZ + bZ = dZ.

Theorem 2.3. Every ideal in Z is of the form dZ for some d ∈ Z.

Proof. The zero ideal is 0Z. Let I be a non-zero ideal and d the smallest positive element in I. If a ∈ I, then a = dq + r for some q, r ∈ Z with 0 ≤ r < d. Then r = a − dq is a difference of elements in I so is in I. But r < d and d is the smallest positive element of I, so r = 0. Hence a ∈ dZ. □
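The integers u and v of Theorem 2.2 can be found by carrying two extra coefficients through the Euclidean algorithm. The sketch below is an illustration in Python rather than anything taken from the notes; the name `extended_gcd` is an assumption, and it uses Python's floor division for each division step instead of insisting on a non-negative remainder, which still forces the remainders to shrink in absolute value.

```python
def extended_gcd(a: int, b: int) -> tuple[int, int, int]:
    """Return (d, u, v) with d = gcd(a, b) >= 0 and d == a*u + b*v."""
    # Loop invariant: r0 == a*u0 + b*v0 and r1 == a*u1 + b*v1.
    r0, u0, v0 = a, 1, 0
    r1, u1, v1 = b, 0, 1
    while r1 != 0:
        q = r0 // r1                      # one division step, r0 = r1*q + remainder
        r0, r1 = r1, r0 - q * r1
        u0, u1 = u1, u0 - q * u1
        v0, v1 = v1, v0 - q * v1
    if r0 < 0:                            # normalise to the non-negative gcd
        r0, u0, v0 = -r0, -u0, -v0
    return r0, u0, v0


if __name__ == "__main__":
    d, u, v = extended_gcd(6, 10)
    print(d, u, v, 6 * u + 10 * v)        # the last value equals d
```

The last non-zero remainder is the greatest common divisor, exactly as in the Claim, and the invariant in the comment is what makes d = au + bv hold at the end.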

2.3. Prime factorization in Z. An integer p ∉ {−1, 0, 1} is prime if the only numbers dividing it are ±1 and ±p.

Lemma 2.4. Let n be an integer ≥ 2. Then n has a positive prime divisor.

Proof. Let Φ = {m > 1 | m|n}. Then Φ is non-empty, because it contains n, and Φ ⊂ {2, ..., n}, so it has a smallest element, say p. Certainly p divides n. If p were not prime it would be divisible by an integer q satisfying 1 < q < p; but q would then divide n so would belong to Φ and, since it is smaller than p, this would contradict the choice of p. We conclude that p is prime. □

Theorem 2.5. There are infinitely many primes in Z.

Proof. If there were only finitely many positive primes p_1, ..., p_t, the number m := 1 + p_1 ⋯ p_t would not be divisible by any p_i, and this would contradict the previous lemma. □

Remark 2.6. It follows that there are either infinitely many primes ≡ 1 (mod 4) or infinitely many primes ≡ 3 (mod 4). Is it possible to use Euclid's argument to show that there are infinitely many primes of both kinds? This is a classical question and so too is its extension: if a and b are relatively prime integers, does the arithmetic sequence (a + bn)_{n≥0} contain infinitely many primes? I encourage you to try the first question and see that Euclid's argument extends to one of the congruence classes but not the other. ♦

Lemma 2.7. Let p ∈ Z\{−1, 0, 1}. Then p is prime if and only if it has the following property: whenever p|ab, p|a or p|b.

Proof. Since an integer p is prime if and only if −p is prime we can assume p is positive.

(⇒) Suppose p divides ab but not a. The greatest common divisor, d = (p, a), is a positive integer dividing p so is either 1 or p. But a is not divisible by p so d = 1. By Theorem 2.2, there are integers u, v such that 1 = pu + av. Hence b = pbu + abv. Since p divides pbu and ab, it divides pbu + abv = b.

(⇐) Suppose p has the property that whenever it divides a product it divides one of the factors. To see that p is prime, suppose that d divides p. Write p = dc. Then p divides either c or d so one of these has absolute value ≥ p. It follows that the absolute value of the other factor is ≤ 1, hence equal to 1, which implies that that factor is ±1. Hence p is prime. □

Theorem 2.8. [The Fundamental Theorem of Arithmetic] Let n be an integer ≥ 2. Then n is a product of primes in a unique way: if

n = p_1p_2 ⋯ p_r = q_1q_2 ⋯ q_s

with all the p_i and q_j positive primes, and p_1 ≤ ... ≤ p_r and q_1 ≤ ... ≤ q_s, then r = s and p_i = q_i for all i.

Proof. By Lemma 2.4, there is a smallest positive prime dividing n, say p_1. Write n = p_1n_1. Since p_1 ≥ 2, n_1 ≤ n/2. If n_1 > 1, applying Lemma 2.4 to n_1 we can write n = p_1p_2n_2 with p_2 a prime, 2 ≤ p_1 ≤ p_2, and n_2 ≤ n/2². Continuing in this way, we get

n = p_1p_2 ⋯ p_tn_t and n_t ≤ n/2^t.

This process will stop once 2^t > n. Thus n is a product of primes.

Now suppose that n = p_1p_2 ⋯ p_r = q_1q_2 ⋯ q_s as in the statement of the theorem. Since p_1 is prime and divides q_1q_2 ⋯ q_s, it must equal some q_i. But q_1 ≤ q_i = p_1 and p_1 is the smallest positive prime dividing n, so p_1 = q_1. Hence

n/p_1 = p_2 ⋯ p_r = q_2 ⋯ q_s

and repeating the argument we get p_2 = q_2, et cetera. □
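The existence half of the proof is an algorithm: keep splitting off the smallest prime divisor. Here is a short Python sketch of that procedure, offered only as an illustration; the function name `prime_factors` is hypothetical and trial division is used because it matches the proof, not because it is efficient.

```python
def prime_factors(n: int) -> list[int]:
    """Return the primes p_1 <= p_2 <= ... <= p_r with n = p_1 * ... * p_r."""
    if n < 2:
        raise ValueError("n must be an integer >= 2")
    factors = []
    p = 2
    while n > 1:
        if p * p > n:
            # No divisor of n lies in [p, sqrt(n)], so what is left is prime.
            factors.append(n)
            break
        if n % p == 0:
            # p is the smallest divisor > 1 of the current n, hence prime
            # (this is the argument of Lemma 2.4).
            factors.append(p)
            n //= p
        else:
            p += 1
    return factors


if __name__ == "__main__":
    print(prime_factors(360))
    print(prime_factors(97))
```

Because the candidate p never decreases, the factors come out in the non-decreasing order used in the statement of Theorem 2.8.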

2.4. Unique factorization. The Fundamental Theorem of Arithmetic (2.8) is a model for what we would like to do in other rings. This leads to two questions: first, for which rings can every element be written as a product of irreducibles and, second, when is that product unique?

Definition 2.9. A commutative domain R is a unique factorization domain, or UFD, if every non-zero non-unit element of R can be written as a product of irreducible elements, and the irreducibles that occur in the factorization are unique up to order and multiplication by units. ♦

To see what "uniqueness" means in this definition, consider the factorizations

6 = 2·3 = (−3)·(−2) = (−3)·(−1)·(2)·(−1)·(−1)

in Z. Uniqueness means this: if an irreducible element x divides y, then a unit multiple of x must appear in every factorization of y as a product of irreducibles.

Lemma 2.10. In a unique factorization domain, primes and irreducibles are the same.

Proof. We observed in Lemma 1.4 that a prime is irreducible. We now prove the converse. Suppose p is an irreducible dividing bc. Then bc = py for some y. Write each of b, c, and y as a product of irreducibles. Doing so gives two factorizations of bc as a product of irreducibles. By the uniqueness of such a factorization, at least one of the irreducibles in the factorizations of b and c must be an associate of p. But that implies that p divides either b or c. We conclude that p is prime. □

Proposition 2.11. Let R be a domain in which every non-zero non-unit can be written as a product of irreducibles. Then R is a UFD if and only if primes and irreducibles are the same things.

Proof. If R is a UFD, then primes and irreducibles are the same by Lemma 2.10. Conversely, suppose that primes and irreducibles are the same. To show R is a UFD we need only establish the uniqueness of factorization. Suppose that p_1p_2 ⋯ p_m = q_1q_2 ⋯ q_n where each p_i and each q_j is irreducible. Without loss of generality, we assume that m ≤ n. Since p_1 is prime and divides the product of the q_j it must divide one of them, say q_1. But q_1 is irreducible so has no proper factorizations; hence q_1 = p_1u_1 for some unit u_1. If m = 1, it now follows that 1 = u_1q_2 ⋯ q_n, so n = 1 also. Now suppose that m ≥ 2. Then p_2 ⋯ p_m = (u_1q_2)q_3 ⋯ q_n and u_1q_2 is irreducible. The result now follows by induction on m. □

Proposition 2.12. Let R be a domain. The following conditions are equivalent:
(1) for every non-zero non-unit in R, the process of factoring it, with proper factorizations, terminates in a finite number of steps and when it terminates it produces a factorization as a product of irreducibles;
(2) every sequence a_1R ⊂ a_2R ⊂ ⋯ of principal ideals is eventually constant, i.e., a_nR = a_{n+1}R = ⋯.

Proof. (1) ⇒ (2) Suppose we have such a sequence of ideals. Since a_i ∈ a_{i+1}R, we can write a_i = b_{i+1}a_{i+1}, so

a_1 = b_2a_2 = b_2b_3a_3 = b_2b_3b_4a_4 = ⋯.

By hypothesis, only a finite number of these factorizations are proper, so at some stage the b_i become units; say b_i is a unit for all i ≥ n. Then a_{n−1}, a_n, ... are associates of one another so a_{n−1}R = a_nR = ⋯.

(2) ⇒ (1) For the purposes of this proof we will say that a non-zero non-unit in R is bad if the process of factoring it via proper factorizations does not terminate. Suppose that (1) fails; i.e., there is a bad element a. Certainly a is not irreducible, so there is a proper factorization a = a_1b_1. Since a is bad it follows that either a_1 or b_1 is bad. Relabelling, if necessary, we can assume a_1 is bad. Hence there is a proper factorization a_1 = a_2b_2 with either a_2 or b_2 bad. Relabelling, if necessary, we can assume a_2 is bad. Continuing in this way, we obtain a sequence a_1, a_2, ... of bad elements and proper factorizations a_i = a_{i+1}b_{i+1}, hence a chain

Ra ⊂ Ra_1 ⊂ Ra_2 ⊂ ⋯

of ideals, each strictly contained in the next because the factorizations are proper, contradicting the hypothesis that (2) holds. We conclude that R has no bad elements. When the process of properly factoring an element a terminates, that termination will produce an expression for a as a product of irreducibles. □

Example 2.13. The process of factoring need not terminate, even in a domain in which primes are the same as irreducibles. The ring R = k[t, t^{1/2}, t^{1/4}, ⋯] is a domain in which prime and irreducible elements are the same but it is not a UFD. It fails to be a UFD because some elements, t for example, cannot be written as a product of irreducibles. To see that every irreducible is prime, suppose that x is irreducible and that x|yz. There is a suitably large n such that x, y, z, and yz/x all belong to k[t, t^{1/2}, ⋯, t^{1/2^n}]; this subring is equal to k[t^{1/2^n}], which is a polynomial ring in one variable (so a UFD); since x is still irreducible as an element of k[t^{1/2^n}] it is prime in k[t^{1/2^n}], so must divide either y or z; hence x is prime in R.

The same argument can be used with the ring Z[√2, ⁴√2, ⁸√2, ...]. ♦

3. Principal ideal domains

3.1. An ideal of the form (r) in a ring R is said to be principal.

Definition 3.1. A principal ideal domain is a domain in which every ideal is principal, i.e., every ideal consists of multiples of a single element. ♦

Using the Euclidean algorithm is the standard method to show that a ring is a principal ideal domain: the argument in Theorem 2.3 is typical and, as we will see below, can be used to show other rings are PIDs (see Proposition 3.7). We shall see lots of PIDs in chapter 4.

Proposition 3.2. Let R be a principal ideal domain. Then
(1) greatest common divisors exist in R;
(2) if d = gcd(a, b), then d = ax + by for some x, y ∈ R;
(3) every irreducible in R is prime.

Proof. (1) and (2). The ideal aR + bR is principal, so is equal to dR for some d ∈ R. Clearly, d = ax + by for some x, y ∈ R. We will now show that d is a greatest common divisor of a and b. First, since a and b belong to dR, they are both divisible by d. Second, if e divides both a and b, then aR + bR is contained in eR, so d is a multiple of e. Hence d is a greatest common divisor of a and b.

(3) Let a be irreducible, and suppose that a|bc. To show that a is prime, we must show it divides either b or c. Let d = ax + by = gcd(a, b). Since d divides a, either d is a unit or a = du with u a unit. But d|b, so the second alternative implies that a|b. Now suppose that d is a unit; since a divides bc it also divides

acxd⁻¹ + bcyd⁻¹ = c(ax + by)d⁻¹ = c.

Hence a is prime. □

Proposition 3.3. Let R be a PID and p a non-zero non-unit in R. The following are equivalent:
(1) p is prime;
(2) p is irreducible;
(3) (p) is maximal;
(4) R/(p) is a field.

Proof. (1) ⇒ (2) A prime is always irreducible (Lemma 1.4).

(2) ⇒ (3) If (x) is an ideal that is strictly larger than (p), then p is a multiple of x but x is not a multiple of p. Thus p = xy but y is not a unit. Since p is irreducible, x is a unit and it follows that (x) = R. Hence (p) is maximal.

(3) ⇒ (4) There is a bijection between the ideals in R/(p) and the ideals in R that contain (p). Hence (p) is maximal if and only if for all 0 ≠ ū ∈ R̄ := R/(p), R̄ū = R̄, i.e., ū is a unit. But fields are precisely those commutative rings in which every non-zero element is a unit.

(4) ⇒ (1) Suppose that p divides ab. Then in R̄ := R/(p), āb̄ = 0, whence either ā or b̄ is zero because a field is a domain. Thus either a or b is divisible by p. Hence p is prime. □

Lemma 3.4. In any ring, the union of an ascending chain of ideals I1 ⊂ I2 ⊂ · · · is an ideal.

Proof. If a and b are elements in the union there is an I_n that contains both of them, whence a ± b belongs to I_n. Furthermore, ax ∈ I_n, so in the union, for all x ∈ R. □

Theorem 3.5. Every PID is a UFD.

Proof. We will show that the second condition in Proposition 2.12 holds. To that end, let

(a_1) ⊂ (a_2) ⊂ ⋯

be an ascending chain of ideals. Their union is an ideal so is equal to some (b). It follows that there are elements c_i, i ≥ 1, such that a_i = bc_i. However, b is in the union of the ideals (a_i) so belongs to one of them, say to (a_n). Hence b = a_nx = bc_nx for some x. If b = 0 the chain is constant; otherwise we can cancel because we are in a domain, so c_nx = 1 and c_n and x are units. Hence b and a_n are associates so (b) = (a_n) and it follows that (b) = (a_i) for all i ≥ n. Hence the first condition in Proposition 2.12 holds. So, every non-zero non-unit can be written as a product of irreducibles. Since irreducibles are prime in a PID (Proposition 3.2), Proposition 2.11 shows that the ring is a UFD. □

3.2. Euclidean domains.

Definition 3.6. A commutative domain R is called a Euclidean domain if there is a function N : R − {0} → Z_{≥0} such that for any two elements a, b ∈ R with b ≠ 0, there exist elements q, r ∈ R such that a = bq + r with r = 0 or N(r) < N(b). We call N a norm. (The q and r are not required to be unique.) ♦

The proof that Z is a PID (Theorem 2.3) extends easily to Euclidean domains.

Proposition 3.7. Every Euclidean domain is a principal ideal domain.

Proof. Let I be a non-zero ideal and choose 0 ≠ d ∈ I such that N(d) is as small as possible. If a ∈ I, then a = dq + r for some q, r with r = 0 or N(r) < N(d). But r = a − dq is a difference of elements in I so is in I. If r were not zero N(r) would be strictly less than N(d); but that can't happen because of the way d was chosen, so r = 0 and hence a ∈ (d). □

The standard examples of a Euclidean domain are Z, k[x], and Z[i]. Let's check this for Z[i]. We define, for x ∈ Z[i],

N(x) = |x|² = xx̄

where x̄ is the usual complex conjugate. Visualize the elements of Z[i] as the points of a square lattice in the complex plane. Now suppose we are given a, b ∈ Z[i] with b ≠ 0. Multiplication by i rotates a point by an angle of 90° in the counterclockwise direction. Multiplication by a positive real number r > 0 stretches the point by a factor of r. In this way one sees that the points of the ideal (b) themselves form a square lattice Zb + iZb with b and ib as a "basis" for the lattice. The length of a side of a "small" square in the lattice (b) is |b|. Now a is in or on the boundary of one of the "small" squares of the lattice (b), so at least one corner of the square is at distance ≤ |b|/√2 < |b| from a. More formally, there is a q ∈ Z[i] such that |a − bq| < |b|. Setting r = a − qb and squaring the inequality gives 0 ≤ N(r) < N(b). This completes the proof that Z[i] is a Euclidean domain.

Corollary 3.8. Z[i] is a principal ideal domain.
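The lattice argument for Z[i] amounts to rounding the exact quotient a/b to a nearest lattice point. The following Python sketch illustrates this under assumptions of its own: Gaussian integers are stored as pairs (x, y) standing for x + yi, and the names `norm`, `nearest_int` and `gauss_divmod` are hypothetical helpers, not notation from the notes. Only integer arithmetic is used, so there are no rounding errors.

```python
Gauss = tuple[int, int]  # (x, y) stands for the Gaussian integer x + y*i


def norm(z: Gauss) -> int:
    """N(x + yi) = x^2 + y^2."""
    return z[0] * z[0] + z[1] * z[1]


def nearest_int(num: int, den: int) -> int:
    """Nearest integer to num/den for den > 0 (ties are rounded up)."""
    return (2 * num + den) // (2 * den)


def gauss_divmod(a: Gauss, b: Gauss) -> tuple[Gauss, Gauss]:
    """Return (q, r) with a = b*q + r and N(r) < N(b)."""
    n = norm(b)
    if n == 0:
        raise ZeroDivisionError("b must be non-zero")
    # a/b = a*conj(b)/N(b); these are the real and imaginary numerators.
    re = a[0] * b[0] + a[1] * b[1]
    im = a[1] * b[0] - a[0] * b[1]
    q = (nearest_int(re, n), nearest_int(im, n))
    # r = a - b*q, computed with (x + yi)(s + ti) = (xs - yt) + (xt + ys)i.
    r = (a[0] - (b[0] * q[0] - b[1] * q[1]),
         a[1] - (b[0] * q[1] + b[1] * q[0]))
    return q, r


if __name__ == "__main__":
    q, r = gauss_divmod((27, 23), (8, 1))
    print(q, r, norm(r), norm((8, 1)))
```

Each coordinate of the rounded quotient is off by at most 1/2, so N(r) ≤ N(b)/2, which is the bound |b|/√2 from the picture with the small squares.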

Thus Z[i] is a unique factorization domain and primes are the same as irreducibles. It is a nice exercise to find all the primes in Z[i] in terms of the primes in Z. We turn to this problem in the next section. But first notice that primes in Z need not be prime in Z[i]. For example, 2 = (1 + i)(1 − i) is a proper factorization.

3.3. Historical remarks. The fact that the greatest common divisor d of integers a and b can be expressed as ax + by = d for suitable integers x and y has been called Bézout's identity. An integral domain in which such a formula holds is called a Bézout domain. More succinctly, a Bézout domain is a domain in which every finitely generated ideal is principal. The ring k[x^{p/q} | p, q ∈ N] is a Bézout domain but is not a principal ideal domain.

Although attributed to Bézout (1730-1783), some say that Bézout's identity is due to Bachet (1581-1638). Perhaps Bachet's greatest claim to fame is that he translated Diophantus's Arithmeticorum from Greek to Latin, so providing Fermat (1601-1665) with the stimulus to write in the margin of his own copy of Bachet's translation the fact that x^n + y^n = z^n has no non-trivial rational solutions when n ≥ 3.

Bézout is better known today for Bézout's Theorem, a result in algebraic geometry, that he did not prove.

4. Quadratic extensions of the integers 4.1. Squares in Z and Q. Proposition 4.1. Let x ∈ Q. Then there are relatively prime integers a and b, unique up to sign, such that x = a/b.

Proof. This follows from unique factorization in Z: write x as a/b for an arbitrary pair of integers a and b, then write a and b as products of irreducibles; cancel any common factors, i.e., if an irreducible factor of a has an associate appearing in the factorization of b, then cancel. Et cetera. □

Proposition 4.2. An integer is a square in Z if and only if it is a square in Q.

Proof. It is clear that a square in Z is a square in Q. Now suppose that d ∈ Z is a square in Q, i.e., there are integers a and b such that d = (a/b)². Without loss of generality we may assume that (a, b) = 1. Hence 1 = au + bv for some u, v ∈ Z and

a = a²u + abv = b²du + abv = b(bdu + av).

Hence b divides a. But (a, b) = 1, so b = ±1 and d = a², as required. □

4.2. The norm on Z[√d]. Let d be a non-square integer. The ring

Z[√d] := {a + b√d | a, b ∈ Z}

is called a quadratic extension of the integers. Rings such as these have played, and continue to play, a central role in number theory. If d > 0 the elements in Z[√d] lie on the real axis. If d < 0 the elements in Z[√d] form a rectangular lattice in the complex plane. The structure of Z[√d] is very different depending on whether d > 0 or d < 0.

Factorization and divisibility questions in Z[√d] are tackled by making use of what one knows about factorization and divisibility in Z, and this is done by defining a norm on Z[√d].

The norm of x = a + b√d ∈ Z[√d] is

N(x) := a² − b²d.

If d is negative, then, as for Z[i], the norm of x ∈ Z[√d] is equal to xx̄ = |x|² where x̄ is its complex conjugate.

Lemma 4.3. Let d ≠ 1 be a square-free integer. Let x, y, z ∈ Z[√d]. Then
(1) N(x) = 0 ⇔ x = 0;
(2) N(xy) = N(x)N(y);
(3) if x divides z in Z[√d], then N(x) divides N(z) in Z;
(4) x is a unit if and only if N(x) = ±1.

Proof. (1) Since d is not a square in Z it is not a square in Q. Thus, if a, b ∈ Z and 0 = a² − b²d = N(a + b√d), then a = b = 0.

(2) Compute.

(3) This follows at once from (2).

(4) (⇒) If x is a unit it divides 1, so its norm divides N(1) = 1, whence N(x) is 1 or −1.

(⇐) Suppose N(a + b√d) = ±1. Then (a − b√d)(a + b√d) = ±1, so either a − b√d or its negative is the inverse of a + b√d. □

Lemma 4.4. Let d be a negative square-free integer.
(1) The element x = a + b√d is a unit in Z[√d] if and only if N(x) = 1.
(2) The units in Z[i] are {±1, ±i}.
(3) If d ≠ −1, the units in Z[√d] are {±1}.

Proof. (1) Since d < 0, N(x) ≥ 0. It now follows from part (4) of Lemma 4.3 that x is a unit if and only if N(x) = 1.

(2)+(3) Since d is negative the only way a² − b²d can equal 1 is if a² = 1 and b = 0, leading to the units ±1, or if a = 0, d = −1 and b² = 1, leading to the units ±i in Z[i]. □

Warning. In general, Z[√d] is not a UFD so not a PID and not a Euclidean domain. To see that Z[√−5], for example, is not a UFD it suffices to exhibit an irreducible that is not prime. First, notice that 2 is not prime because although it divides neither 1 + √−5 nor 1 − √−5, it divides their product:

(4-1) (1 + √−5)(1 − √−5) = 6 = 2·3.

To see that 2 is irreducible, suppose 2 = bc where b, c ∈ Z[√−5]. Then

4 = N(2) = N(b)N(c).

If b = x + y√−5, then N(b) = x² + 5y², and the only way N(b) could divide 4 is if y = 0 and x = ±2 or if N(b) = 1; hence one of the factors of 2 has norm 1 and is therefore a unit. We conclude that 2 is irreducible.

In fact, (4-1) gives two different factorizations of 6 into irreducibles. We have just seen that 2 is irreducible and a similar argument shows that 3 is irreducible. To see that 1 + √−5 is irreducible, write 1 + √−5 = ab and suppose a is not a unit. Then 6 = N(1 + √−5) = N(a)N(b). The computation above shows there are no elements in Z[√−5] having norm 2 or 3, so it must be that N(a) ∈ {1, 6}. But a is not a unit, so N(a) ≠ 1; it follows that N(a) = 6, and hence that N(b) = 1, so b is a unit. Thus 1 + √−5 is irreducible, and a similar argument shows that 1 − √−5 is irreducible too. ♦

Determining the units in Z[√d] is more complicated when d is a positive integer. If d > 0, then Z[√d] has infinitely many units.
As a warning, you should know that units can be large: for example, 1151 + 240√23 is a unit in Z[√23]. It is the square of 24 + 5√23, which is, in a suitable sense, the smallest unit in Z[√23] apart from ±1. Of course it is sometimes not so hard to find some units. For example, 1 + √2 and 1 − √2 are units in Z[√2], and 2 ± √5 are units in Z[√5].
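The norm computations in the warning and in the unit examples above are easy to check by machine. The Python sketch below is only an illustration; an element a + b√d is stored as the pair (a, b), and `norm`, `mul` and `elements_of_norm` are names invented here. The brute-force search is only claimed to be exhaustive for d < 0, where the norm a² + |d|b² bounds both coordinates.

```python
def norm(a: int, b: int, d: int) -> int:
    """N(a + b*sqrt(d)) = a^2 - d*b^2."""
    return a * a - d * b * b


def mul(x: tuple[int, int], y: tuple[int, int], d: int) -> tuple[int, int]:
    """Product of a + b*sqrt(d) and c + e*sqrt(d), as a pair."""
    (a, b), (c, e) = x, y
    return (a * c + d * b * e, a * e + b * c)


def elements_of_norm(n: int, d: int, bound: int) -> list[tuple[int, int]]:
    """All (a, b) with |a|, |b| <= bound and N(a + b*sqrt(d)) == n."""
    rng = range(-bound, bound + 1)
    return [(a, b) for a in rng for b in rng if norm(a, b, d) == n]


if __name__ == "__main__":
    # No element of Z[sqrt(-5)] has norm 2 or 3; a bound of 3 is enough
    # because a^2 + 5*b^2 <= 3 forces |a| <= 1 and b = 0.
    print(elements_of_norm(2, -5, 3), elements_of_norm(3, -5, 3))
    # Units have norm +1 or -1.
    print(norm(24, 5, 23), norm(1151, 240, 23), norm(1, 1, 2), norm(2, 1, 5))
    # (24 + 5*sqrt(23))^2 = 1151 + 240*sqrt(23)
    print(mul((24, 5), (24, 5), 23))
```

The same functions confirm the two factorizations of 6: `mul((1, 1), (1, -1), -5)` returns the pair representing 6.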

4.3. Primes in Z[i]. A prime in Z[i] is called a Gauss prime.

If m and n are integers, then m divides n in Z if and only if it divides n in Z[i]². Thus, when we work in Z[i] and say that one integer divides another we do not need to say in which ring the division occurs: if there is divisibility in one of Z and Z[i] there is divisibility in both.

Theorem 4.5.
(1) A prime p ∈ Z is either a Gauss prime or the product of two conjugate Gauss primes, p = ππ̄.
(2) If π is a Gauss prime, then ππ̄ is either a prime in Z or the square of a prime in Z.
(3) The primes in Z that remain prime in Z[i] are those ≡ 3 (mod 4).
(4) If p is a prime in Z, the following conditions are equivalent:
(a) p is a product of two complex conjugate Gauss primes;
(b) p = m² + n² for some m, n ∈ Z;
(c) −1 is a square in F_p;
(d) p = 2 or p ≡ 1 (mod 4).
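Part (4) of Theorem 4.5 can be checked numerically for small primes before reading the proof. The following Python sketch is an illustration only: the helper names `is_prime`, `two_squares`, `minus_one_is_square` and `check` are made up here, and the searches are brute force.

```python
from math import isqrt


def is_prime(n: int) -> bool:
    """Trial division; fine for the small range used here."""
    return n >= 2 and all(n % q for q in range(2, isqrt(n) + 1))


def two_squares(p: int):
    """Return (m, n) with 0 <= m <= n and m^2 + n^2 == p, or None."""
    for m in range(isqrt(p) + 1):
        n = isqrt(p - m * m)
        if m <= n and m * m + n * n == p:
            return (m, n)
    return None


def minus_one_is_square(p: int) -> bool:
    """True if x^2 = -1 has a solution in F_p."""
    return any((x * x + 1) % p == 0 for x in range(p))


def check(limit: int) -> bool:
    """Conditions (b), (c), (d) of Theorem 4.5(4) agree for all primes < limit."""
    return all(
        (two_squares(p) is not None)
        == minus_one_is_square(p)
        == (p == 2 or p % 4 == 1)
        for p in range(2, limit) if is_prime(p)
    )


if __name__ == "__main__":
    print(two_squares(13), two_squares(7), check(500))
```

A pair (m, n) returned by `two_squares(p)` gives the factorization p = (m + in)(m − in) of condition (a).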

Proof. (1) Let p ∈ Z. Suppose p is prime in Z but not in Z[i]. Since p is neither a unit nor a prime in Z[i] there is a proper factorization p = πτ in Z[i] and there is no loss of generality in assuming that π is prime. Write π = a + bi and τ = c + di. Also, p = p̄ = π̄τ̄. Hence

p² = ππ̄ττ̄ = (a² + b²)(c² + d²).

This is a proper factorization of p² in Z because neither π nor τ has norm ±1. It follows that p = a² + b² = c² + d². Hence p = ππ̄.

(2) Let π be a Gauss prime. Then ππ̄ is a positive integer and as such is a product of positive primes in Z. This is also a factorization in Z[i] so, since π is prime in Z[i], at least one of those factors, say p, is divisible by π. Hence ππ̄ divides pp̄ = p². But ππ̄ is an integer and is not ±1, so ππ̄ is either p or p².

(3) This follows from (1) together with (4).

(4) Let p ∈ Z be a prime.

(a)⇔(b) This is because p = m² + n² if and only if p = (m + in)(m − in).

(a)⇔(c) There are ring isomorphisms

Z[i]/(p) ≅ Z[x]/(p, x² + 1) ≅ F_p[x]/(x² + 1)

so one of these is a field if and only if all are fields. However, Z[i]/(p) is a field if and only if p is prime in Z[i], and F_p[x]/(x² + 1) is a field if and only if x² + 1 has no root in F_p. Thus, p fails to be prime in Z[i], and is therefore a product of two conjugate Gaussian primes, if and only if the equation x² = −1 has a solution in F_p.

² This is because n/m is rational and the only rational numbers in Z[i] are the integers.

(c)⇒(d) Suppose −1 is a square in F_p. If p = 2 there is nothing to prove, so suppose p is odd. If a ∈ F_p and a² = −1, then a has order 4 as an element of the group F_p^×. Hence the order of this group, which is p − 1, is divisible by 4.

(d)⇒(c) Since 2 = (1 + i)(1 − i) it suffices to show that if p ≡ 1 (mod 4) then p is not prime in Z[i]. So, suppose p − 1 is a multiple of 4. Then the order of the multiplicative group F_p^× is a multiple of four. We showed last quarter that the group of units in a finite field is always cyclic. Hence F_p^× ≅ Z_{4n}. This group has an element of order 4, say a ∈ F_p; hence (a²)² = 1; but a² ≠ 1 so a² = −1. □

Corollary 4.6. Every positive integer that is not divisible by any prime congruent to 3 (mod 4) is a sum of two squares.

Proof. This is a consequence of the fact that if a and b are sums of two squares so is ab. I leave you to prove that. □

4.4. Writing a number as a sum of two squares. It is natural to ask: if an integer n is a sum of two squares, how many ways can it be written as such a sum?

Lemma 4.7. Let f(n) denote the number of ways a positive integer n can be written as a sum of two squares, i.e., the number of pairs (x, y) ∈ Z² with x² + y² = n. Then

f(n) = Σ_{de=n} s(d) f′(e)

where s(d) = 1 if d is a square, s(d) = 0 if d is not a square, and f′(n) is the cardinality of the set

A′(n) := {(x, y) | x² + y² = n, gcd{x, y} = 1}.

Proof. For each pair of positive integers (d, e) define

A(d, e) := {(u, x, y) | u² = d, x² + y² = e, gcd{x, y} = 1}.

If d is a square there are two choices for u, so |A(d, e)| = 2s(d)f′(e), and 2 Σ_{de=n} s(d)f′(e) is the cardinality of the disjoint union

⊔_{de=n} A(d, e).

Define

A(n) := {(ε, x, y) | ε = ±1, x² + y² = n}.

The cardinality of A(n) is 2f(n). Define maps

Φ : ⊔_{de=n} A(d, e) → A(n) and Ψ : A(n) → ⊔_{de=n} A(d, e)

by

Φ(u, x, y) := (sgn(u), ux, uy),
Ψ(ε, x, y) := (εd, εx/d, εy/d) where d = gcd{x, y} > 0.

Then Ψ and Φ are mutually inverse to each other, so the two sets have the same cardinality. □
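The bijection in this proof can be tested by listing both sets for small n. The Python sketch below is an illustration under stated assumptions: it takes Φ(u, x, y) = (sgn(u), ux, uy) and Ψ(ε, x, y) = (εd, εx/d, εy/d) with d = gcd{x, y} > 0, which is the sign convention assumed here in order to make the two maps inverse to each other, and the function names are invented for this example.

```python
from math import gcd, isqrt


def pairs(n: int) -> list[tuple[int, int]]:
    """All (x, y) in Z^2 with x^2 + y^2 == n."""
    r = isqrt(n)
    rng = range(-r, r + 1)
    return [(x, y) for x in rng for y in rng if x * x + y * y == n]


def disjoint_union(n: int) -> list[tuple[int, int, int]]:
    """All (u, x, y) with u^2 = d, x^2 + y^2 = e, gcd(x, y) = 1 and d*e = n."""
    out = []
    for u in range(1, isqrt(n) + 1):
        if n % (u * u) == 0:
            e = n // (u * u)
            for (x, y) in pairs(e):
                if gcd(x, y) == 1:
                    out += [(u, x, y), (-u, x, y)]   # both square roots of d
    return out


def A(n: int) -> list[tuple[int, int, int]]:
    """All (eps, x, y) with eps = +1 or -1 and x^2 + y^2 == n."""
    return [(eps, x, y) for eps in (1, -1) for (x, y) in pairs(n)]


def phi(t: tuple[int, int, int]) -> tuple[int, int, int]:
    u, x, y = t
    return (1 if u > 0 else -1, u * x, u * y)


def psi(t: tuple[int, int, int]) -> tuple[int, int, int]:
    eps, x, y = t
    d = gcd(x, y)                     # positive, since (x, y) != (0, 0)
    return (eps * d, eps * x // d, eps * y // d)


if __name__ == "__main__":
    n = 25
    U = disjoint_union(n)
    print(len(U), len(A(n)))
    print(all(psi(phi(t)) == t for t in U), all(phi(psi(t)) == t for t in A(n)))
```

Running this for several values of n shows that the two sets always have the same number of elements, which is the counting statement the proof relies on.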

4.5. Historical remarks. In a letter to Mersenne dated 25 December 1640, Fermat (1601-1665) proved that a prime is a sum of two squares if and only if it is 2 or congruent to 1 modulo 4. The proof of this in Theorem 4.5 is due to Dedekind (1831-1916).

In 1863 Dedekind published Vorlesungen über Zahlentheorie ("Lectures on Number Theory") based on the lectures of Dirichlet (1805-1859). In the third edition, dated 1876, Dedekind introduced the notion of an ideal. He confined the idea to ideals in the ring of integers in a number field. The notion of an ideal has its roots in the work of Kummer (1810-1893). In 1843, Kummer realized that the main difficulty in proving Fermat's statement that x^n + y^n = z^n had no non-trivial solutions when n ≥ 3 was that the unique factorization of the integers could fail in various subrings of C. He tried to restore that uniqueness by introducing the notion of ideal numbers. The Paris Academy of Sciences awarded Kummer the Grand Prize in 1857 for this work. The prize of 3000 francs had been offered for a solution to Fermat's Last Theorem but when no solution was submitted, even after extending the deadline, the Prize was given to Kummer though he had not submitted an entry for the Prize.

It is interesting to note that Lamé () had announced to the Paris Academy on March 1, 1847, that he had proved Fermat's Last Theorem. He sketched a proof which involved factoring x^n + y^n as a product of linear terms. Lamé acknowledged that this idea had been suggested to him by Liouville but, addressing the meeting after Lamé, Liouville suggested that Lamé's approach depended on uniqueness of factorization in rings other than Z and he doubted that this was true. In subsequent weeks several mathematicians tried to settle the matter.
Finally on May 24, Liouville read to the Académie a letter from Kummer that enclosed an off-print of an 1844 paper that proved that unique factorization did fail in certain rings but could be recovered through the use of ideal numbers, which he had done in 1846. Kummer then used his new theory to prove Fermat's Last Theorem for regular primes. Kummer also mentioned that he thought that 37 was not a regular prime.

Fermat had proved that x⁴ + y⁴ = z⁴ has no non-trivial rational solutions. It therefore suffices to prove Fermat's Theorem for odd primes because if x^p + y^p = z^p has no non-trivial rational solutions then the same is true of (x^m)^p + (y^m)^p = (z^m)^p.

An odd prime $p \in \mathbb{Z}$ is regular if it does not divide the class number of the field $\mathbb{Q}(\sqrt[p]{1})$, the field generated over $\mathbb{Q}$ by the $p$-th roots of unity. The class number of a number field $K$ is the order of the class group of its ring of integers (see page 25). In 1847, Kummer wrote a paper giving a useful and simple characterization of regular primes: an odd prime $p$ is regular if and only if it does not divide the numerator of any of the Bernoulli numbers $B_2, B_4, \ldots, B_{p-3}$. The first few irregular primes are 37, 59, 67, 101, 103, 131, and 149. There are infinitely many irregular primes, but it is still not known if there are infinitely many regular primes. The Bernoulli numbers are defined by
$$\frac{x}{e^x - 1} = \sum_{n=0}^{\infty} B_n \frac{x^n}{n!}.$$

In that paper Kummer observed that 37 divides the numerator of $B_{32}$, so 37 is not a regular prime.
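Kummer's criterion is easy to test by machine for small primes. The following is a minimal sketch, assuming the convention $x/(e^x - 1) = \sum_n B_n x^n/n!$ (so $B_1 = -1/2$) and the standard recurrence $\sum_{k=0}^{n} \binom{n+1}{k} B_k = 0$ for $n \geq 1$; the function names are illustrative and are not part of these notes.

```python
from fractions import Fraction
from math import comb


def bernoulli_numbers(n_max):
    """Return [B_0, ..., B_{n_max}] for the convention x/(e^x - 1) = sum B_n x^n / n!."""
    B = [Fraction(1)]
    for n in range(1, n_max + 1):
        # Recurrence: sum_{k=0}^{n} C(n+1, k) * B_k = 0 for n >= 1, solved for B_n.
        partial = sum(comb(n + 1, k) * B[k] for k in range(n))
        B.append(-partial / (n + 1))
    return B


def is_prime(p):
    """Trial division; adequate for the small primes considered here."""
    return p >= 2 and all(p % q for q in range(2, int(p ** 0.5) + 1))


def irregular_primes(bound):
    """Odd primes p < bound dividing the numerator of some B_2, B_4, ..., B_{p-3}."""
    B = bernoulli_numbers(bound)
    return [
        p
        for p in range(3, bound)
        if is_prime(p)
        and any(B[k].numerator % p == 0 for k in range(2, p - 2, 2))
    ]


if __name__ == "__main__":
    print(bernoulli_numbers(32)[32])   # the numerator is divisible by 37
    print(irregular_primes(150))
```

Exact rational arithmetic (`Fraction`) is used because the criterion concerns the numerators in lowest terms; floating point would be useless here.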

Euler claimed in a letter to Goldbach on 4 August 1753 that he could prove Fermat's Theorem for $n = 3$ but his first published proof, in his Algebra (1770), contained an error. It is interesting to consider his error. He needed to consider a cube of the form $m^2 + 3n^2$. He showed that if

$$m = a^3 - 9ab^2 \quad \text{and} \quad n = 3(a^2b - b^3), \tag{4-2}$$
then $m^2 + 3n^2 = (a^2 + 3b^2)^3$. This is correct but he then tried to show that if $m^2 + 3n^2$ is a cube there must be an $a$ and $b$ such that $m$ and $n$ are of the form (4-2). He tried to do this by using numbers of the form $c + d\sqrt{-3}$, but he did not appreciate the fact that $\mathbb{Z}[\sqrt{-3}]$ is not a unique factorization domain.

Kummer's idea was to work with ideals rather than numbers: i.e., one would now ask whether the ideal $(6)$ in $\mathbb{Z}[\sqrt{-5}]$ can be written as a product of prime ideals$^3$ in a unique way.$^4$

$^3$ An ideal $\mathfrak{p}$ in a commutative ring $R$ is prime if $R/\mathfrak{p}$ is a domain. For example, the prime ideals in $\mathbb{Z}$ are the principal ideals generated by the prime numbers and the ideal 0. An integer $p$ is prime if and only if $\mathbb{Z}/(p)$ is a field. In $\mathbb{Z}[\sqrt{d}]$, and others like it, an ideal $\mathfrak{p}$ is prime if and only if it is either zero or $\mathbb{Z}[\sqrt{d}]/\mathfrak{p}$ is a field.

$^4$ This is typical of how mathematics develops. One has a result of great utility, here the Fundamental Theorem of Arithmetic, that one would like to hold in other similar situations. Unfortunately it does not, so one modifies the original idea in a clever way (here by introducing ideals rather than numbers, and prime ideals rather than prime elements) so that a modified version of the original result is true in the new situation. Mathematicians are tricky: they want something to be true, so they introduce new concepts and ideas so that an appropriately modified version of what they originally wanted is true.

Notice that $(2)$ is NOT a prime ideal in $\mathbb{Z}[\sqrt{-5}]$. To see this, observe that neither $a = 1 + \sqrt{-5}$ nor $b = 1 - \sqrt{-5}$ is in $(2)$, i.e., neither $a$ nor $b$ is divisible by 2 in $\mathbb{Z}[\sqrt{-5}]$. Hence their images $\bar{a}$ and $\bar{b}$ in $\mathbb{Z}[\sqrt{-5}]/(2)$ are non-zero. However, $ab = 6$ is divisible by 2, so $ab \in (2)$ and this translates into the fact that in $\mathbb{Z}[\sqrt{-5}]/(2)$, $\bar{a}\bar{b} = 0$. Since $\mathbb{Z}[\sqrt{-5}]/(2)$ is not a domain it is not a field, so $(2)$ is not prime.

However, $(2, 1 + \sqrt{-5})$ is a prime ideal in $\mathbb{Z}[\sqrt{-5}]$, and so is $(2, 1 - \sqrt{-5})$ (these two ideals in fact coincide, because $1 - \sqrt{-5} = 2 - (1 + \sqrt{-5})$). Similarly, the ideals $(3, 1 + \sqrt{-5})$ and $(3, 1 - \sqrt{-5})$ are prime, and we have the following factorization of $(6)$ as a product of prime ideals:
$$(6) = (2, 1 + \sqrt{-5})^2 \cdot (3, 1 + \sqrt{-5}) \cdot (3, 1 - \sqrt{-5}).$$
In fact, every non-zero ideal in $\mathbb{Z}[\sqrt{-5}]$ can be written as a product of prime ideals in a unique way. This result (which actually extends to many other rings) is a very good replacement for the Fundamental Theorem of Arithmetic.

Fermat's result classifying the integers that can be written as a sum of two squares prompts the following question: given integers $a, b, c$, which integers can be written as $ax^2 + bxy + cy^2$ for suitable integers $x$ and $y$? This question was taken up by Lagrange in 1770 and was later of great interest to Gauss (1777-1855). The expression $ax^2 + bxy + cy^2$ is called a binary quadratic form. Its discriminant is the number $b^2 - 4ac$. Two forms $ax^2 + bxy + cy^2$ and $a'x^2 + b'xy + c'y^2$ are equivalent if there is an element
$$\begin{pmatrix} \alpha & \beta \\ \gamma & \delta \end{pmatrix}$$
in $GL(2, \mathbb{Z})$, the group of invertible $2 \times 2$ matrices with integer entries, such that making the substitutions
$$x' = \alpha x + \beta y, \qquad y' = \gamma x + \delta y$$
into the first form yields the second. It is an easy matter to see that if two forms are equivalent they represent the same integers.

Lagrange proved that all forms of discriminant $-4$ are equivalent. Hence, to prove that every prime $p \equiv 1 \pmod 4$ is of the form $x^2 + y^2$ it suffices to show that $p$ can be represented by some form of discriminant $-4$. To do this it suffices to show there is an integer $m$ such that $p$ divides $m^2 + 1$ (i.e., $-1$ is a square in $\mathbb{F}_p$) because then the form
$$px^2 + 2mxy + \frac{m^2 + 1}{p} y^2$$
has discriminant $4m^2 - 4(m^2 + 1) = -4$ and represents $p$ (take $x = 1$ and $y = 0$).

Lagrange's result can be restated in the following way. The group $GL(2, \mathbb{Z})$ acts (via substitution) on the set of quadratic forms and the forms of discriminant $-4$ form a single orbit.

5. Units in $\mathbb{Z}[\sqrt{d}]$ when $d > 0$

We saw in the previous section that it is simple to find the units in $\mathbb{Z}[\sqrt{d}]$ when $d$ is negative and square-free: there are only two units except in $\mathbb{Z}[i]$ which has four units.

Now suppose that $d$ is a positive square-free integer. We will show that $\mathbb{Z}[\sqrt{d}]$ has infinitely many units and that it is hard to determine them all.

We begin with a simple observation. By part (4) of Lemma 4.3, an element in $\mathbb{Z}[\sqrt{d}]$ is a unit if and only if its norm is $\pm 1$. Thus, to show that $\mathbb{Z}[\sqrt{d}]$ has infinitely many units it suffices to show that the equation $x^2 - dy^2 = 1$ has infinitely many integer solutions. If we only sought rational solutions the problem would be easy.

Proposition 5.1. Let $d$ be a positive square-free integer. There are infinitely many points $(x, y) \in \mathbb{Q}^2$ such that $x^2 - dy^2 = 1$.

Proof. Notice that $(1, 0)$ is a solution. Let's write $C$ for the curve in $\mathbb{R}^2$ that consists of all solutions in $\mathbb{R}^2$ to the equation $x^2 - dy^2 = 1$. This curve is a hyperbola. Consider a line $L$ through $(1, 0)$ with slope $m \in \mathbb{Q}$. The equation of $L$ is $y = m(x - 1)$. We will show that $L$ meets $C$ at $(1, 0)$ and at one more point and that other point must belong to $\mathbb{Q}^2$. The $x$-coordinates of the points where $L$ meets $C$ are the solutions to the equation $x^2 - dm^2(x - 1)^2 = 1$ or, equivalently, $x^2 - 1 - dm^2(x - 1)^2 = 0$. But this equation is
$$(x - 1)(x + 1) - dm^2(x - 1)^2 = 0.$$
Since this quadratic polynomial in $\mathbb{Q}[x]$ has one zero in $\mathbb{Q}$ it has a linear factor and is therefore the product of two linear factors. This shows it has two factors of the form $x - c$ with $c \in \mathbb{Q}$, thus giving two points on $C \cap L$ that belong to $\mathbb{Q}^2$. Explicitly, cancelling the factor $x - 1$ gives $x + 1 = dm^2(x - 1)$, so when $dm^2 \neq 1$ the second point is
$$(x, y) = \Big( \frac{dm^2 + 1}{dm^2 - 1}, \ \frac{2m}{dm^2 - 1} \Big).$$
Since $m$ can take infinitely many values there are infinitely many points on $C$ that lie in $\mathbb{Q}^2$. □

Lemma 5.2. Let $d$ be a positive integer. Given any positive integer $n$ there exist positive integers $x$ and $y$ such that

$$|x - y\sqrt{d}| < \frac{1}{n} \quad \text{and} \quad 0 < y \leq n.$$

Proof. We will use the pigeonhole principle, the pigeonholes being the $n$ intervals of length $\frac{1}{n}$ in the half-open unit interval $[0, 1)$ given by
$$\Big[ \frac{i}{n}, \frac{i + 1}{n} \Big), \qquad 0 \leq i \leq n - 1.$$
For each integer $j$ between 0 and $n$ we write
$$j\sqrt{d} = a_j + b_j$$
where $a_j$ is an integer and $0 \leq b_j < 1$. If we now insert the $b_j$s, of which there are $n + 1$, into the $n$ pigeonholes some pigeonhole must receive two pigeons. In other words, because there are $n + 1$ of the $b_j$s and $n$ subintervals of $[0, 1)$ there is some interval containing two $b_j$s. Hence there are distinct integers $i$ and $j$ between 0 and $n$ such that $|b_i - b_j| < \frac{1}{n}$. In other words,
$$\big| i\sqrt{d} - a_i - j\sqrt{d} + a_j \big| < \frac{1}{n}.$$
Without loss of generality, $i < j$. Now set $x = a_j - a_i$ and $y = j - i$. Then $|x - y\sqrt{d}| < \frac{1}{n}$ and $0 < y \leq n$. Since $y\sqrt{d} \geq 1$ and $|x - y\sqrt{d}| < 1$, the integer $x$ is positive. □

Lemma 5.2 is really a result about approximating an irrational number by a rational number. The inequality in the conclusion of the lemma can be written as
$$\Big| \frac{x}{y} - \sqrt{d} \Big| < \frac{1}{ny} \leq \frac{1}{y^2}.$$

Proposition 5.3. Given a positive real number $\lambda$ there are infinitely many positive integers $x$ and $y$ such that
$$|x - y\lambda| < \frac{1}{y}.$$

Proof. The proof of Lemma 5.2 did not use the fact that $\sqrt{d}$ is a square root, simply that it is a real number. Thus, that lemma says that for every $n$ there is a pair with $|x - y\lambda| < \frac{1}{n} \leq \frac{1}{y}$, so we can make $|x - y\lambda|$ as small as we wish. When $\lambda$ is irrational, $|x - y\lambda|$ is never zero, so a given pair can serve for only finitely many $n$. Hence there must be infinitely many pairs of positive integers $x$ and $y$ such that $|x - y\lambda| < 1/y$. □
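The pigeonhole argument in the proof of Lemma 5.2 is constructive, and can be sketched as follows. This is only an illustration: the function names are hypothetical, and exact integer arithmetic (`math.isqrt`) is used in place of real numbers, on the assumption that $\lfloor j\sqrt{d} \rfloor$ is computed as `isqrt(j*j*d)`. The pair returned satisfies $0 < y \leq n$.

```python
from math import isqrt


def dirichlet_pair(d, n):
    """Return positive integers (x, y) with |x - y*sqrt(d)| < 1/n and 0 < y <= n."""
    seen = {}  # pigeonhole index -> (j, a_j)
    for j in range(n + 1):
        a_j = isqrt(j * j * d)  # integer part of j*sqrt(d)
        # Pigeonhole containing b_j = j*sqrt(d) - a_j, namely floor(n * b_j).
        hole = isqrt(n * n * j * j * d) - n * a_j
        if hole in seen:
            i, a_i = seen[hole]
            return a_j - a_i, j - i
        seen[hole] = (j, a_j)
    raise AssertionError("unreachable: n + 1 pigeons cannot fit in n holes")


def is_close(d, n, x, y):
    """Exact test of |x - y*sqrt(d)| < 1/n for x >= 1, obtained by squaring."""
    t = n * n * y * y * d
    return x >= 1 and (n * x - 1) ** 2 < t < (n * x + 1) ** 2


if __name__ == "__main__":
    x, y = dirichlet_pair(2, 10)
    print(x, y, is_close(2, 10, x, y))  # 7 5 True, since |7 - 5*sqrt(2)| is about 0.071
```

Avoiding floating point means the inequality is verified exactly, which matters once $n$ is large and the differences involved fall below machine precision.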

Theorem 5.4. Let $d$ be a positive square-free integer. Then there are infinitely many units in $\mathbb{Z}[\sqrt{d}]$.

Proof. Let $x$ and $y$ be positive integers such that $|x - y\sqrt{d}| < 1/y$. It follows that
$$x < y\sqrt{d} + \frac{1}{y}$$
and therefore
$$|x^2 - dy^2| = |x - y\sqrt{d}| \cdot (x + y\sqrt{d}) < \frac{1}{y} \Big( y\sqrt{d} + \frac{1}{y} + y\sqrt{d} \Big) < \frac{1}{y} \cdot 3y\sqrt{d} = 3\sqrt{d}.$$
By Proposition 5.3 there are infinitely many pairs $x$ and $y$ for which this holds.

Let $M$ be the smallest integer such that $M > 3\sqrt{d}$. By the previous paragraph there are infinitely many choices of $x$ and $y$ such that $x^2 - dy^2 \in [-M, M]$. Hence there is an integer between $-M$ and $M$, say $N$, such that the equation $x^2 - y^2d = N$ has infinitely many solutions $(x, y) \in \mathbb{Z}^2$. Note that $N \neq 0$ because $d$ is not a square. The ring
$$\frac{\mathbb{Z}[\sqrt{d}]}{(N)}$$
is finite so has only a finite number of ideals. Hence only finitely many ideals in $\mathbb{Z}[\sqrt{d}]$ contain $N$. Among this finite set of ideals are the ideals $(x + y\sqrt{d})$ where $x^2 - y^2d = N$; this is because $N = (x - y\sqrt{d})(x + y\sqrt{d})$. Hence infinitely many of the ideals $(x + y\sqrt{d})$ must be the same. However, if $a$ and $b$ generate the same ideal in a domain, then $a$ and $b$ are unit multiples of each other. It follows that $\mathbb{Z}[\sqrt{d}]$ has infinitely many units. □

Here is a simple argument showing that if $d > 0$ and $\mathbb{Z}[\sqrt{d}]$ has a unit $\neq \pm 1$, then it has infinitely many. Suppose $u$ is a unit in $\mathbb{Z}[\sqrt{d}]$ that is not $\pm 1$. Then all powers of $u$ are units. If two of these powers were the same there would be a positive integer $r$ such that $u^r = 1$. However, $\mathbb{Z}[\sqrt{d}] \subset \mathbb{R}$ so its only roots of unity are $\pm 1$ and $u$ is neither of these by assumption. Hence all the powers of $u$ are distinct and we deduce that $\mathbb{Z}[\sqrt{d}]$ has infinitely many units. (Hence the group of units in $\mathbb{Z}[\sqrt{d}]$ contains a copy of $\mathbb{Z}$ if it contains a unit other than $\pm 1$. What is the structure of the group of units?)

Notice that although we have shown there are infinitely many solutions we have not shown how to produce even one solution beyond the obvious ones $x = \pm 1$ and $y = 0$.
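For small $d$ one can at least find a non-trivial solution of $x^2 - dy^2 = 1$ by direct search, and then produce infinitely many more by taking powers of the corresponding unit, as in the argument above. The sketch below is a naive illustration, not a method developed in these notes: the function names and the search bound `y_max` are arbitrary choices, and the search is hopeless for values of $d$ whose smallest solution is large.

```python
from math import isqrt


def smallest_pell_solution(d, y_max=10**6):
    """Least y > 0 (up to y_max) with d*y^2 + 1 a perfect square, returned as (x, y)."""
    for y in range(1, y_max + 1):
        t = d * y * y + 1
        x = isqrt(t)
        if x * x == t:
            return x, y
    return None  # nothing found below the (arbitrary) search bound


def unit_powers(d, x1, y1, count):
    """First `count` powers of u = x1 + y1*sqrt(d), each recorded as a pair (x, y)."""
    powers = []
    x, y = x1, y1
    for _ in range(count):
        powers.append((x, y))
        # Multiply x + y*sqrt(d) by x1 + y1*sqrt(d).
        x, y = x * x1 + d * y * y1, x * y1 + y * x1
    return powers


if __name__ == "__main__":
    print(smallest_pell_solution(2))   # (3, 2)
    print(unit_powers(2, 3, 2, 3))     # [(3, 2), (17, 12), (99, 70)]
```

Each pair in the output of `unit_powers` again satisfies $x^2 - dy^2 = 1$, because the norm is multiplicative.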

5.1. Historical remarks. The equation $x^2 - dy^2 = 1$ is commonly known as Pell's equation. This is a complete misnomer, initiated by Euler (1707-1783) who, in his book Introduction to Algebra, written around 1770, said that Pell (1611-1685) had discovered a general method for solving the equation. As early as 1730 Euler already attributed the solution to Pell. Euler had by that time read Wallis's book Algebra, published in 1685, in which Pell's name appears frequently, though never in connection with this equation. Indeed, Wallis (1616-1703) attributes the general method for solving the equation to Viscount William Brouncker, the first President of the Royal Society (Pell was the first Vice-President). Wallis wrote letters on 16 December 1657 and 30 January 1658 communicating Brouncker's solution. The interest of Brouncker and Wallis in the equation arose in response to a February 1657 letter Fermat (1601-1665) sent Frénicle in which he challenged Frénicle to solve the equation $x^2 - 61y^2 = 1$. The smallest solution is given by $x = 1766319049$ and $y = 226153980$. Fermat knew that Pell's equation had infinitely many solutions.

Pell, apparently, is the inventor of the division sign, ÷.

It is interesting to note that Wallis devoted seven of the 100 chapters in his Algebra to Pell's work but fully 26 to the work of Thomas Harriot (1560-1621). There is now evidence that some of what was attributed to Harriot was really due to Pell and that Wallis had minimized Pell's contributions to his book at Pell's request. Pell had an obsessive desire for anonymity, not just in Wallis's book, but in many other works written at that time. Even his own publications tended to be made anonymously.

In 1585-86 Harriot was sent to Virginia by Sir Walter Raleigh, serving as cartographer, historian, and surveyor. In 1588 his book A Briefe and True Report of the New Found Land of Virginia appeared, a book in which he recommends smoking tobacco, something he had learned to do in Virginia: "[smoking] purgeth superfluous fleame & other grosse humors, openeth all the pores & passages of the body." Smokers, he said, rarely get sick.$^5$ Harriot's book, the first in English about the new world, is considered the cornerstone of the study of North American natural history, ethnology, and anthropology.

After his return to England Harriot pursued scientific studies and, among other important discoveries, in July 1601 discovered the sine law of refraction. This is now called Snell's law although Snell did not discover it until 1621. Neither Harriot nor Snell published their results and the sine law was first published by Descartes (1596-1650) in 1637.

Raleigh had been a great favorite of Queen Elizabeth I and after she died in 1603 her successor, James I, accused Raleigh of treason. Raleigh was subsequently tried and condemned to death, although this sentence was not carried out until 1618. In the judgement at Raleigh's trial, Harriot, who had been closely associated with Raleigh, and even in his employ as ship designer, accountant, and navigation instructor prior to his trip to Virginia, was singled out as an atheist and evil influence. Harriot was arrested in 1604 on suspicion of being involved with Guy Fawkes's plot to blow up the Houses of Parliament but was soon released when no evidence was found to support those suspicions.

Harriot returned to his scientific work on optics but, after being influenced by his observation of a comet on 17 September 1607, turned his attention to astronomy. Kepler, who had earlier initiated a correspondence with Harriot upon hearing of his work on optics, had seen the same comet six days earlier. The comet would later be named after Edmund Halley (1656-1742).
The measurements of Harriot and his student William Lower would later be used by Bessel (1784-1846) to compute the comet's orbit. In August 1609, Harriot used a telescope to look at the moon, magnified six times. Not until four months later did Galileo, in Padua, Italy, study the moon with his telescope. Then, in December 1610, Harriot became the first Westerner to observe sunspots through a telescope (naked-eye observations in China beat Harriot by about 1,800 years).

On 12 December 1591, Harriot wrote a manuscript in response to a question Raleigh had asked him about stacking cannonballs. Harriot's wide-ranging curiosity led him to generalize Raleigh's question and consider its implications for the atomic theory of matter, which he believed in. Later, when writing to Kepler regarding the atomic theory, Harriot posed a question about the densest packing of spheres. Kepler believed the densest packing would be obtained by placing the center of each sphere above the center of the hole in the lower layer. This conjecture was finally proved by Thomas Hales of the University of Michigan in 1998.

Harriot published nothing about his scientific work, and the full extent of his discoveries is still the subject of intense scholarly interest.

$^5$ Harriot died from cancer of the nose.

Unbeknownst to Fermat, or other Europeans of his time, the Hindu mathematician Brahmagupta (598-670) had given a solution to the equation $x^2 - 61y^2 = 1$ in his astronomy book Brahma Sphuta Siddhanta written in 628, over a thousand years before European mathematicians considered Pell's equation. Brahmagupta did not know how to solve the general equation $x^2 - dy^2 = 1$ but developed an algorithm that sometimes produced a solution. What Brahmagupta did have was a method to produce new solutions from old ones. This latter method of Brahmagupta is essentially the result that if $u$ and $v$ are units in $\mathbb{Z}[\sqrt{d}]$ so is $uv$. In more detail, if $x_1^2 - dy_1^2 = 1$ and $x_2^2 - dy_2^2 = 1$, i.e., if $u = x_1 + y_1\sqrt{d}$ and $v = x_2 + y_2\sqrt{d}$ are units in $\mathbb{Z}[\sqrt{d}]$, so is
$$uv = (x_1x_2 + dy_1y_2) + (x_1y_2 + y_1x_2)\sqrt{d}$$
which then implies that
$$(x_1x_2 + dy_1y_2)^2 - d(x_1y_2 + y_1x_2)^2 = 1,$$
thus producing another solution. More generally, if $(x_1, y_1)$ is a solution to $x^2 - dy^2 = k_1$ and $(x_2, y_2)$ is a solution to $x^2 - dy^2 = k_2$, Brahmagupta showed that $(x_1x_2 + dy_1y_2, \ x_1y_2 + y_1x_2)$ is a solution to $x^2 - dy^2 = k_1k_2$. This is equivalent to the statement that $N(a)N(b) = N(ab)$.

Among his other accomplishments, Brahmagupta defined zero to be the result obtained by subtracting a number from itself and gave rules for arithmetical operations among negative numbers (debts) and positive numbers (property). He also gave a formula for the area of a quadrilateral whose vertices lie on a circle, and for the length of its diagonals in terms of the length of its sides. He also gave a valuable interpolation formula for computing sines. Brahmagupta was also aware of the identity
$$(a^2 + b^2)(c^2 + d^2) = (ac - bd)^2 + (ad + bc)^2 \tag{5-1}$$
which implies that if each of two numbers is a sum of two squares so is their product. Hence, since 2 and every prime congruent to 1 modulo 4 is of this form, so are all positive integers that are not divisible by a prime congruent to 3 modulo 4. We can only wonder, and marvel, at Brahmagupta's achievements.
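Brahmagupta's composition rule and the two-squares identity are both instances of the multiplicativity of a norm, and are easy to check numerically. The following is a small sketch with hypothetical function names; it also confirms that the pair $x = 1766319049$, $y = 226153980$ quoted above solves $x^2 - 61y^2 = 1$.

```python
def compose(d, s1, s2):
    """Brahmagupta's composition: from x1^2 - d*y1^2 = k1 and x2^2 - d*y2^2 = k2,
    return (x, y) with x^2 - d*y^2 = k1*k2."""
    (x1, y1), (x2, y2) = s1, s2
    return x1 * x2 + d * y1 * y2, x1 * y2 + y1 * x2


def norm(d, s):
    """The value x^2 - d*y^2 attached to the pair s = (x, y)."""
    x, y = s
    return x * x - d * y * y


def two_squares_product(a, b, c, d):
    """Return (p, q) with p^2 + q^2 = (a^2 + b^2)(c^2 + d^2),
    using p = ac - bd and q = ad + bc."""
    return a * c - b * d, a * d + b * c


if __name__ == "__main__":
    s1, s2 = (1, 1), (3, 1)            # norms -1 and 7 when d = 2
    s = compose(2, s1, s2)
    print(s, norm(2, s))               # (5, 4) -7

    fermat = (1766319049, 226153980)
    print(norm(61, fermat))            # 1
    print(norm(61, compose(61, fermat, fermat)))  # 1 again: a new, larger solution

    p, q = two_squares_product(1, 2, 2, 3)
    print(p * p + q * q)               # 65 = 5 * 13
```

Python integers have arbitrary precision, so the 19-digit squares in the $d = 61$ check are computed exactly.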
Five hundred years later the Indian mathematician Bhaskaracharya, or Bhaskara The Learned (1114-1185), improved Brahmagupta's work by showing how one can produce a solution from an approximate solution. In Lilavati, or The Beautiful, and Bijaganita, or Seed Counting, Bhaskara filled many of the gaps in Brahmagupta's work, in particular giving a complete solution to Pell's equation. In his works, Bhaskara used the decimal system and the modern convention of signs, minus × ± = ∓. He had some understanding of division by zero, saying that 3/0 is an infinite quantity, but also giving the fallacious equality $a \times 0/0 = a$. Bhaskara used letters to represent unknown quantities and solved linear and quadratic equations. He investigated regular polygons up to those having 384 sides, thus obtaining a reasonable approximate value of $\pi$ as 3.141666.

The achievements of Brahmagupta were conveyed to the Arab world when, in 770, the second Abbasid Caliph Abu Ja'far Abdallah ibn Muhammad al-Mansur (712-775), the founder of Baghdad, invited to his court a scholar by the name of Kanka from Ujjain, where Brahmagupta spent most of his life. At the request of the Caliph, Al Fazaii translated Brahmagupta's Brahma Sphuta Siddhanta into Arabic. This had a major impact on Arabic science and mathematics. In the early 800's the Caliph al-Ma'mun founded the Bayt al-Hikma, or House of Wisdom, an institute of higher learning and scholarship, in Baghdad. There the works of the Greeks and Indians in natural philosophy, mathematics and astronomy were translated into Arabic. By this time Baghdad was an important centre for Islamic mathematics. The House of Wisdom predates the earliest European universities by several centuries: the University of Bologna was established in 1088, Paris in 1150, and Oxford in 1167. One of the earliest scholars at the House of Wisdom, al-Khwarizmi (?780-850?), wrote several books, drawing together Babylonian, Greek, and Hindu mathematics. His Arithmetic introduced the Indian decimal-place-value method for writing numbers and his book Al-kitab al-muhtasar fi hisab al-jabr w'al-muqabala, or The Condensed Book of Calculation by Restoration and Comparison, has given the subject of Algebra its name. The Latinized version of al-Khwarizmi has been preserved in the word "algorithm".

By the late 12th century the mathematical knowledge in the Islamic world began to spread from the southern Mediterranean shores to the north, particularly into Sicily and Spain. For example, Leonardo of Pisa, or Fibonacci, who had travelled to Sicily, Egypt, and Syria in the course of trade, learned that the Arabs were using the more efficient Hindu system of writing numbers. He wrote a book called Liber Abaci (1202, Book of the Abacus) to publicize this.

On February 10, 1258, Baghdad was overrun by the Mongols under Hulagu Khan. The city was utterly destroyed. Over 200,000 residents were killed, often in the most cruel manner. The city's mosques, palaces, libraries, and hospitals, splendid buildings representing the best work of many generations, were burned to the ground. The Grand Library of Baghdad, at that time housing one of the world's greatest collections of historical documents and books on subjects covering the gamut of human knowledge, medicine, science, astronomy, and mathematics, was destroyed. Survivors said the Tigris ran black with ink. The city's intricate system of dikes and canals, built up over thousands of years, was destroyed. The sack of Baghdad was a psychological blow from which the Islamic world has yet to recover.

6. Algebraic numbers and algebraic integers

6.1. Algebraic elements. Let $K$ be a field and $k$ a subfield of $K$. We call $K$ an extension field, or simply an extension, of $k$. The vector space dimension $\dim_k K$ is called the degree of the extension and is denoted by $[K : k]$. If $[K : k] < \infty$ we call $K$ a finite extension of $k$.

The smallest non-trivial extensions are those of degree two; such extensions are called quadratic. For example, $\mathbb{C}$ is a quadratic extension of $\mathbb{R}$.

Notation. Let $K$ be an extension of $k$ and $\alpha \in K$. We write

$k(\alpha)$ := the smallest subfield of $K$ that contains $k$ and $\alpha$

$k[\alpha]$ := the smallest subring of $K$ that contains $k$ and $\alpha$
$$= k + k\alpha + k\alpha^2 + \cdots = \{\lambda_0 + \lambda_1\alpha + \lambda_2\alpha^2 + \cdots + \lambda_n\alpha^n \mid n \geq 0, \ \lambda_0, \ldots, \lambda_n \in k\}.$$

Remarks. 1. Given any subset $S$ of a field $K$ there is a smallest subring (resp., subfield) containing $S$, namely the intersection of all subrings (resp., subfields) that contain $S$. An intersection of subrings (resp., subfields) is a subring (resp., subfield). Hence $k[\alpha]$ (resp., $k(\alpha)$) exists: it is the intersection of all the subrings (resp., subfields) of $K$ that contain $k$ and $\alpha$.

2. $k[\alpha] \subset k(\alpha)$.

3. $k[\alpha]$ is the image of the evaluation homomorphism $\varepsilon : k[x] \to k[\alpha]$ given by $\varepsilon(f) := f(\alpha)$.

An element $\alpha \in K$ is
• algebraic over $k$ if it is a zero of a non-zero polynomial with coefficients in $k$, i.e., if
$$\lambda_n\alpha^n + \lambda_{n-1}\alpha^{n-1} + \cdots + \lambda_1\alpha + \lambda_0 = 0$$
for some $\lambda_0, \ldots, \lambda_n \in k$, not all zero.
• transcendental over $k$ if it is not algebraic over $k$.

Proposition 6.1. Let $K$ be an extension of $k$ and $\alpha \in K$. The following are equivalent:
(1) $\alpha$ is algebraic over $k$;
(2) the map $\varepsilon : k[x] \to K$ defined by $\varepsilon(f) := f(\alpha)$ is not injective;
(3) $\dim_k k[\alpha] < \infty$;
(4) $k[\alpha]$ is a field;
(5) $k[\alpha] = k(\alpha)$;
(6) $k(\alpha)$ is a finite extension of $k$.

Proof. (1) ⇔ (2) This is the definition of what it means to be algebraic over $k$.
(2) ⇔ (3) We have $k[\alpha] \cong k[x]/\ker \varepsilon$, and the dimension of a quotient ring $k[x]/I$ is finite if and only if $I \neq 0$.
(3) ⇒ (4) A finite dimensional domain is a field (multiplication by a non-zero element of that domain is an injective linear map, so must also be surjective, whence 1 is in the image).
(4) ⇒ (2) Since $k[\alpha] \cong k[x]/\ker \varepsilon$ and $k[x]$ is not a field, $\ker \varepsilon \neq 0$.
(4) ⇒ (5) If $k[\alpha]$ is a field, then $k(\alpha) \subset k[\alpha]$; but the reverse inclusion holds for any $\alpha$ so there is equality.
(5) ⇒ (4) Obvious.
We have now shown that the first five conditions are equivalent.
(6) ⇒ (3) This is clear because $k[\alpha] \subset k(\alpha)$.
(5) + (3) ⇒ (6) Obvious. □

When $\alpha$ is algebraic over $k$ we call the lowest degree monic polynomial $f \in k[x]$ such that $f(\alpha) = 0$ the minimal polynomial of $\alpha$. Equivalently, $f$ is the monic generator of the ideal $\ker \varepsilon$.

Remark. When $\alpha$ is algebraic over $k$ we will make frequent use of the fact that $k[\alpha] = k(\alpha)$ and use both notations to denote $k(\alpha)$.

6.2. Algebraically closed fields. We say that $k$ is algebraically closed if the only elements algebraic over $k$ (whatever $K$ may be) are the elements of $k$ itself.

Proposition 6.2. Let $k$ be a field. The following are equivalent:
(1) $k$ is algebraically closed;
(2) the only irreducible polynomials in $k[x]$ are the degree one polynomials;
(3) every polynomial in $k[x]$ of positive degree has a zero in $k$.

Remark. Every field has an algebraic closure and it is unique up to isomorphism. We therefore sometimes speak of the algebraic closure of a field; we write $\bar{k}$ for the algebraic closure of $k$. If $K$ is an algebraic extension of $k$, then $K$ is isomorphic to a subfield of $\bar{k}$.

6.3. The field of algebraic numbers. A complex number $\alpha$ is called an algebraic number if it is algebraic over $\mathbb{Q}$.

Theorem 6.3. The algebraic numbers form a subfield of $\mathbb{C}$.

Proof. Write $R$ for the set of algebraic numbers. It certainly contains $\mathbb{Q}$ so contains 0 and 1. Now let $\alpha \in R$ and suppose its minimal polynomial is
$$f(x) = \sum_{i=0}^{n} \lambda_i x^i$$
where $\lambda_n = 1$. Then $-\alpha$ is a zero of $f(-x)$ so is algebraic over $\mathbb{Q}$. Now suppose $\alpha \neq 0$. Since $f$ is minimal it is not divisible by $x$, whence $\lambda_0 \neq 0$. Define the monic polynomial
$$g(x) := \lambda_0^{-1} \sum_{i=0}^{n} \lambda_{n-i} x^i.$$
A calculation shows that $g(\alpha^{-1}) = \lambda_0^{-1}\alpha^{-n} f(\alpha) = 0$, so $\alpha^{-1} \in R$.

To complete the proof it suffices to show that $\alpha + \beta$ and $\alpha\beta$ belong to $R$ whenever $\alpha$ and $\beta$ do. So, suppose $\alpha, \beta \in R$ and define $\psi : \mathbb{Q}[x, y] \to \mathbb{C}$ by $\psi(x) = \alpha$ and $\psi(y) = \beta$. Write $I = \ker \psi$. The elements $\{x^iy^j \mid i, j \geq 0\}$ form a basis for $\mathbb{Q}[x, y]$. By hypothesis, $I$ contains a non-zero element $f(x)$ of degree $m \geq 1$ and a non-zero element $g(y)$ of degree $n \geq 1$. Hence, modulo $I$, $x^m$ and $y^n$ belong to the linear span of
$$S := \{x^iy^j \mid 0 \leq i \leq m - 1, \ 0 \leq j \leq n - 1\}.$$
It follows at once that the image of $S$ spans $\mathbb{Q}[x, y]/I$, whence $\dim_{\mathbb{Q}} \mathbb{Q}[x, y]/I < \infty$. But $\mathbb{Q}[x, y]/I \cong \operatorname{im} \psi$ so $\dim_{\mathbb{Q}}(\operatorname{im} \psi) < \infty$. Since $\alpha + \beta$ and $\alpha\beta$ belong to $\operatorname{im} \psi$, $\mathbb{Q}[\alpha + \beta]$ and $\mathbb{Q}[\alpha\beta]$ have finite dimension over $\mathbb{Q}$. It now follows from Proposition 6.1 that $\alpha + \beta$ and $\alpha\beta$ are algebraic over $\mathbb{Q}$ and hence in $R$. This completes the proof that $R$ is a field. □
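To make the dimension argument concrete, take $\alpha = \sqrt{2}$ and $\beta = \sqrt{3}$ (an example chosen here for illustration, not taken from the notes). Then $m = n = 2$ and the image of $\psi$ is spanned by $1, \sqrt{2}, \sqrt{3}, \sqrt{6}$, so the five powers $1, \gamma, \ldots, \gamma^4$ of $\gamma = \alpha + \beta$ must be linearly dependent. The sketch below does exact arithmetic in that 4-dimensional space and checks one such dependence, $\gamma^4 - 10\gamma^2 + 1 = 0$; the helper names are hypothetical.

```python
# A 4-tuple (c0, c1, c2, c3) stands for c0 + c1*sqrt(2) + c2*sqrt(3) + c3*sqrt(6).


def mul(u, v):
    """Multiply two elements of Q[sqrt(2), sqrt(3)] given in the basis 1, r2, r3, r6."""
    a0, a1, a2, a3 = u
    b0, b1, b2, b3 = v
    return (
        a0 * b0 + 2 * a1 * b1 + 3 * a2 * b2 + 6 * a3 * b3,  # rational part
        a0 * b1 + a1 * b0 + 3 * (a2 * b3 + a3 * b2),        # sqrt(3)*sqrt(6) = 3*sqrt(2)
        a0 * b2 + a2 * b0 + 2 * (a1 * b3 + a3 * b1),        # sqrt(2)*sqrt(6) = 2*sqrt(3)
        a0 * b3 + a3 * b0 + a1 * b2 + a2 * b1,              # sqrt(2)*sqrt(3) = sqrt(6)
    )


def evaluate(coeffs, g):
    """Evaluate sum(coeffs[i] * g**i) by Horner's rule; coeffs are rational numbers."""
    result = (0, 0, 0, 0)
    for c in reversed(coeffs):
        result = mul(result, g)
        result = (result[0] + c,) + result[1:]
    return result


if __name__ == "__main__":
    gamma = (0, 1, 1, 0)                          # sqrt(2) + sqrt(3)
    print(mul(gamma, gamma))                      # (5, 0, 0, 2), i.e. 5 + 2*sqrt(6)
    print(evaluate([1, 0, -10, 0, 1], gamma))     # (0, 0, 0, 0)
    print(evaluate([-6, 0, 1], (0, 0, 0, 1)))     # alpha*beta = sqrt(6) is a zero of x^2 - 6
```

The proof of Theorem 6.3 only guarantees that some dependence exists; finding it in general amounts to solving a linear system in the coordinates used here.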

6.4. Quadratic extensions of $\mathbb{Q}$. A subfield of the complex numbers contains 1, hence $\mathbb{Z}$, hence $\mathbb{Q}$. A number field is a subfield $K \subset \mathbb{C}$ such that $\dim_{\mathbb{Q}} K < \infty$.

If $d \in \mathbb{Z}$ is not a square, then $\mathbb{Q}[\sqrt{d}]$ is a quadratic extension of $\mathbb{Q}$ because it is strictly larger than $\mathbb{Q}$ and is spanned by 1 and $\sqrt{d}$. The next result shows that every quadratic extension of $\mathbb{Q}$ is of this form. Of course, if $d$ is a square, then $\mathbb{Q}[\sqrt{d}] = \mathbb{Q}$.

Proposition 6.4. If $K$ is a quadratic extension of $\mathbb{Q}$, then
(1) $K = \mathbb{Q}(a)$ for some $a \in \mathbb{C}$;
(2) the minimal polynomial of $a$ over $\mathbb{Q}$ has degree two;
(3) $K \cong \mathbb{Q}[\sqrt{d}]$ for some square-free integer $d$.

Proof. (1) Let $a \in K - \mathbb{Q}$. Then $\{1, a\}$ is linearly independent over $\mathbb{Q}$ so is a basis for $K$, i.e., $K = \mathbb{Q} \oplus \mathbb{Q}a$. It is immediate that $K = \mathbb{Q}(a)$.
(2) Since $a^2 = \lambda_0 + \lambda_1 a$ for some $\lambda_0, \lambda_1 \in \mathbb{Q}$, $a$ is a zero of the monic polynomial $x^2 - \lambda_1 x - \lambda_0$. Since $a \notin \mathbb{Q}$ this is the minimal polynomial of $a$.

(3) From the quadratic formula we see that
$$a = \frac{\lambda_1 \pm \sqrt{\lambda_1^2 + 4\lambda_0}}{2}.$$
Hence $a \in \mathbb{Q}(\sqrt{d})$ where $d = \lambda_1^2 + 4\lambda_0$. Notice that $d$ is not a square in $\mathbb{Q}$ because $a \notin \mathbb{Q}$. By rewriting this formula, we also see that $\sqrt{d} \in \mathbb{Q} + \mathbb{Q}a \subset K$. Hence $1, \sqrt{d}$ form a basis for $K$ over $\mathbb{Q}$. It follows that $K = \mathbb{Q}(\sqrt{d})$.

At this stage $d$ need not be an integer; however, if $n$ is any non-zero integer, then 1 and $n\sqrt{d}$ form a basis for $\mathbb{Q}[\sqrt{d}]$ so $\mathbb{Q}[\sqrt{d}] = \mathbb{Q}[n\sqrt{d}]$. Using an appropriate $n$ to cancel the denominator in $d$, we can replace $d$ by the integer $n^2d$. So, $K = \mathbb{Q}[\sqrt{d}]$ for some $d \in \mathbb{Z}$. Furthermore, if $d$ is divisible by the integer $m^2$, then $\mathbb{Q}[\sqrt{d}] = \mathbb{Q}[\sqrt{d/m^2}]$ so we may replace $d$ by an appropriate square-free integer $d/m^2$. □

6.5. Algebraic integers.

Definition 6.5. A complex number $a$ is called an algebraic integer if it is the zero of a monic polynomial in $\mathbb{Z}[x]$. ♦

For example,
$$\omega = \frac{1}{2}\big(-1 + \sqrt{-3}\big)$$
is an algebraic integer because it is a zero of $x^2 + x + 1$. Notice that $\omega^3 = 1$ because $x^2 + x + 1$ divides $x^3 - 1$. The only units in $\mathbb{Z}[\sqrt{-3}]$ are $\pm 1$ but $\mathbb{Z}[\omega]$ has the additional units $\pm\omega$ and $\pm\omega^2$. It is not hard to show these are all the units in $\mathbb{Z}[\omega]$. In other words, the group of units in $\mathbb{Z}[\omega]$ is the cyclic group consisting of the sixth roots of unity.

Proposition 6.6. The only algebraic integers in $\mathbb{Q}$ are the elements of $\mathbb{Z}$.

Proof. An ordinary integer $m$ is an algebraic integer because it is a zero of $x - m$. Now suppose $\frac{a}{b}$ is an algebraic integer belonging to $\mathbb{Q}$. We may suppose that $\gcd\{a, b\} = 1$. Suppose that $\frac{a}{b}$ is a zero of
$$x^n + c_{n-1}x^{n-1} + \cdots + c_1x + c_0 \in \mathbb{Z}[x].$$
Then
$$a^n + c_{n-1}a^{n-1}b + \cdots + c_1ab^{n-1} + c_0b^n = 0.$$
If $p$ is a prime dividing $b$, then it follows from this equation that $p$ divides $a^n$ and hence $a$. This contradicts the assumption that $\gcd\{a, b\} = 1$. Hence no prime divides $b$, so $b = \pm 1$ and $\frac{a}{b} \in \mathbb{Z}$. □

Definition 6.7. A degree $n$ polynomial $a_0 + a_1x + \cdots + a_nx^n \in \mathbb{Z}[x]$ is primitive if $a_n > 0$ and the greatest common divisor of $a_0, \ldots, a_n$ is 1. ♦

An algebraic number is an element $a \in \mathbb{C}$ that is a zero of a non-zero polynomial $f \in \mathbb{Q}[x]$. We can assume such an $f$ is irreducible. Of course, $a$ is also a zero of every integer multiple of $f$ so is a zero of a polynomial in $\mathbb{Z}[x]$. Taking such a polynomial in $\mathbb{Z}[x]$ and dividing it by the greatest common divisor of its coefficients we see that $a$ is a zero of a primitive polynomial, say $g(x)$, in $\mathbb{Z}[x]$. Furthermore, $g(x)$ is irreducible in $\mathbb{Z}[x]$. However, $g(x)$ need not be monic. For example, 2/3 is a zero of $3x - 2$.

Warning. A polynomial in $\mathbb{Z}[x]$ that is irreducible in $\mathbb{Q}[x]$ need not be irreducible in $\mathbb{Z}[x]$; for example, $2x$. However, a primitive polynomial in $\mathbb{Z}[x]$ is irreducible in $\mathbb{Z}[x]$ if and only if it is irreducible in $\mathbb{Q}[x]$.
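The passage from a polynomial in $\mathbb{Q}[x]$ to a primitive polynomial in $\mathbb{Z}[x]$ described above is purely mechanical: clear denominators, divide by the greatest common divisor of the coefficients, and fix the sign of the leading coefficient. Here is a minimal sketch, with a hypothetical function name and with the polynomial represented (by assumption) as a list whose $i$-th entry is the coefficient of $x^i$.

```python
from fractions import Fraction
from functools import reduce
from math import gcd


def primitive_part(coeffs):
    """Given the rational coefficients of a non-zero polynomial (coeffs[i] is the
    coefficient of x^i, and coeffs[-1] != 0), return the integer coefficients of
    its primitive rational multiple in the sense of Definition 6.7."""
    coeffs = [Fraction(c) for c in coeffs]
    # Least common multiple of the denominators clears all fractions.
    denom = reduce(lambda a, b: a * b // gcd(a, b), (c.denominator for c in coeffs), 1)
    ints = [int(c * denom) for c in coeffs]
    g = reduce(gcd, ints)
    if ints[-1] < 0:
        g = -g  # make the leading coefficient positive
    return [c // g for c in ints]


if __name__ == "__main__":
    print(primitive_part([Fraction(-2, 3), 1]))  # x - 2/3  ->  [-2, 3], i.e. 3x - 2
    print(primitive_part([0, 2]))                # 2x       ->  [0, 1],  i.e. x
```

The first example recovers the polynomial $3x - 2$ mentioned above as the primitive polynomial of which 2/3 is a zero.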

Lemma 6.8. Let $a$ be an algebraic number. The kernel of the homomorphism $\varepsilon : \mathbb{Z}[x] \to \mathbb{Q}(a)$ defined by $\varepsilon(f) = f(a)$ is generated by a unique primitive polynomial $g \in \mathbb{Z}[x]$. Furthermore, $g(x)$ is an integer multiple of the minimal polynomial of $a$ in $\mathbb{Q}[x]$.

Proof. Let $q(x) \in \mathbb{Q}[x]$ be the minimal polynomial of $a$. We have just seen that there is a rational number $r$ such that $rq(x) \in \mathbb{Z}[x]$ is primitive. Of course, $rq(x)$ is irreducible in $\mathbb{Q}[x]$ and hence in $\mathbb{Z}[x]$. Define $g(x) = rq(x)$. Then $g(x)$ is a primitive irreducible polynomial in $\mathbb{Z}[x]$ such that $g(a) = 0$. Since $q$ is monic, $r$ is the leading coefficient of $g$, so $r$ is an integer. Now suppose that $h \in \mathbb{Z}[x]$ and $h(a) = 0$. Then $g$ divides $h$ in $\mathbb{Q}[x]$, and hence in $\mathbb{Z}[x]$ (see Artin, Ch. 11, Prop. 3.4). Thus $g$ generates the kernel. Moreover, if $h \neq g$ then $h$ is either reducible in $\mathbb{Z}[x]$ or not primitive, so $g$ is unique. □

We call the polynomial $g$ in Lemma 6.8 the primitive irreducible polynomial of $a$. The proof of Lemma 6.8 shows that $g$ is a multiple of the minimal polynomial of $a$ in $\mathbb{Q}[x]$.

Proposition 6.9. Let $a$ be an algebraic number. The following are equivalent:
(1) $a$ is an algebraic integer;
(2) the primitive irreducible polynomial of $a$ is monic;
(3) the minimal polynomial of $a$ in $\mathbb{Q}[x]$ has integer coefficients.

Proof. Let $f \in \mathbb{Q}[x]$ and $g \in \mathbb{Z}[x]$ be the minimal, and the primitive irreducible, polynomials of $a$ respectively.
(3) ⇒ (1) This is obvious.
(1) ⇒ (2) Since $a$ is an algebraic integer it is a zero of a monic polynomial in $\mathbb{Z}[x]$ and, by Lemma 6.8, that polynomial is $gh$ for some $h \in \mathbb{Z}[x]$. Because $gh$ is monic and the leading coefficient of $g$ is positive, $g$ is monic. Hence (2) holds.
(2) ⇒ (3) By Lemma 6.8, $g = nf$ for some $n \in \mathbb{Z}$. Since $g$ and $f$ are monic, $n = 1$, and $f \in \mathbb{Z}[x]$. □

7. Arithmetic in imaginary quadratic number fields

An imaginary quadratic number field is a field of the form $\mathbb{Q}(\sqrt{d})$ where $d$ is a negative square-free integer. We use the word quadratic because $\mathbb{Q}(\sqrt{d})$ is a degree two extension of $\mathbb{Q}$, i.e., it is a 2-dimensional vector space over $\mathbb{Q}$.$^6$ The word quadratic is also applicable because the minimal polynomial of every non-rational element of $\mathbb{Q}(\sqrt{d})$ has degree two. The word imaginary is used because $\mathbb{Q}(\sqrt{d}) \not\subset \mathbb{R}$.$^7$ Finally, a number field is any finite degree extension field of $\mathbb{Q}$.

Until further notice, $d$ will denote a negative square-free integer.

Arithmetic sounds easy. But it is not, at least when interpreted in a broad sense. Certainly, the arithmetic in $\mathbb{Q}(\sqrt{d})$ is very rich and has been of continuing interest to mathematicians for several hundred years, even though the ancients did not always know that what they were studying was the arithmetic of these rings (see, e.g., the earlier remarks on Brahmagupta).

⁶ With basis 1 and √d.
⁷ We have already seen that the case d > 0, which implies Q(√d) ⊂ ℝ, is very different.

7.1. Algebraic integers in Q[√d].

Definition 7.1. The ring of integers in a number field K is

O_K := {a ∈ K | a is an algebraic integer}.

It is by no means clear that O_K is a ring. It is, but I will omit the proof because we will only be interested in these rings when K is a quadratic extension of Q, and in that case it will be clear from the explicit description of O_K in Theorem 7.2 that it is a ring.

Remark. Before proving the next result it is useful to notice that

\[ \mathbb{Z}\Big[\tfrac{1+\sqrt d}{2}\Big] = \mathbb{Z} \oplus \mathbb{Z}\,\tfrac{1+\sqrt d}{2} \;\supset\; \mathbb{Z} \oplus \mathbb{Z}\sqrt d = \mathbb{Z}[\sqrt d]. \]

The inclusion and the last equality hold for all integers d; the first equality requires d ≡ 1 (mod 4), which is the only case in which we use it.

Theorem 7.2. Let d be a square-free integer and let K = Q(√d). The ring of integers in K is

\[ \mathcal{O}_K = \begin{cases} \mathbb{Z} \oplus \mathbb{Z}\sqrt d & \text{if } d \equiv 2, 3 \pmod 4, \\ \mathbb{Z} \oplus \mathbb{Z}\,\dfrac{1+\sqrt d}{2} & \text{if } d \equiv 1 \pmod 4. \end{cases} \tag{7-1} \]

Proof. It is easy to verify that algebraic numbers of the form (7-1) are algebraic integers. For example, suppose d ≡ 1 (mod 4). If a, b ∈ Z and

\[ x = a + b\,\frac{1+\sqrt d}{2}, \]

then 2(x − a) − b = b√d, so 4(x − a)² − 4b(x − a) + b² − b²d = 0, whence

\[ (x-a)^2 - b(x-a) + \tfrac14 b^2(1-d) = 0. \]

This is a monic polynomial in x with coefficients in Z because d ≡ 1 (mod 4). Hence x is an algebraic integer.

To prove the converse, suppose r = a + b√d, with a, b ∈ Q, is an algebraic integer belonging to Q[√d]. We will show r is of the form (7-1). If b = 0, then r ∈ Q so, by Proposition 6.6, r ∈ Z. Now suppose b ≠ 0. Then r ∉ Q so its minimal polynomial has degree 2. A calculation shows that r is a zero of x² − 2ax + a² − b²d. This is the minimal polynomial of r in Q[x] so, by Proposition 6.9, its coefficients belong to Z. Hence 2a ∈ Z and a² − b²d ∈ Z. There are two cases: either a ∈ Z or a = c/2 with c an odd integer.

Suppose a ∈ Z. Then b²d ∈ Z. Write b = m/n with m and n relatively prime integers. Then m²d/n² ∈ Z. Since (m, n) = 1, d/n² must belong to Z, whence d ∈ n²Z. But d is square-free so n = ±1 and we conclude that b = m/n ∈ Z. Thus a, b ∈ Z so r ∈ Z ⊕ Z√d.

Now suppose a = m/2 with m an odd integer. Then 4b²d ∈ Z but b²d ∉ Z. Hence b = n/2 for some odd integer n. Thus a² − b²d = (1/4)(m² − n²d), so we conclude that m² − n²d is divisible by 4. However, m and n are odd, so m² ≡ n² ≡ 1 (mod 4). This implies 1 − d is divisible by 4, i.e., d ≡ 1 (mod 4). Furthermore,

\[ r = a + b\sqrt d = \frac m2 + \frac n2\sqrt d = \frac{m-n}{2} + n\,\frac{1+\sqrt d}{2}, \]

so r is in Z ⊕ Z(1 + √d)/2, as required. □

The ring O_K fills in the following diagram and plays the same role for Q(√d) that Z plays for Q:

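\[ \begin{array}{ccc} \mathcal{O}_K & \subset & \mathbb{Q}(\sqrt d) \\ \cup & & \cup \\ \mathbb{Z} & \subset & \mathbb{Q} \end{array} \]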
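Theorem 7.2 can be sanity-checked by machine. The sketch below is only an illustration, not part of the proof: it tests whether r = a + b√d is an algebraic integer using the criterion from the proof (the minimal polynomial x² − 2ax + a² − b²d must lie in Z[x]) and compares the answer with the explicit description (7-1). The function names are hypothetical, and a, b are assumed to be rational numbers and d a square-free integer other than 1.

```python
from fractions import Fraction

def is_algebraic_integer(a, b, d):
    """Test r = a + b*sqrt(d) via the coefficients of its minimal polynomial."""
    a, b = Fraction(a), Fraction(b)
    if b == 0:
        return a.denominator == 1            # a rational algebraic integer lies in Z
    trace = 2 * a                            # minimal polynomial: x^2 - trace*x + norm
    norm = a * a - b * b * d
    return trace.denominator == 1 and norm.denominator == 1

def in_explicit_ring(a, b, d):
    """Membership of a + b*sqrt(d) in the ring described by (7-1)."""
    a, b = Fraction(a), Fraction(b)
    if d % 4 == 1:
        # u + v*(1 + sqrt d)/2 = (u + v/2) + (v/2)*sqrt d, so v = 2b and u = a - b
        return (2 * b).denominator == 1 and (a - b).denominator == 1
    return a.denominator == 1 and b.denominator == 1

# Compare the two descriptions on a grid of quarter-integers.
grid = [Fraction(k, 4) for k in range(-8, 9)]
agree = all(is_algebraic_integer(a, b, d) == in_explicit_ring(a, b, d)
            for d in (-1, -2, -3, -5, -7, -11, 2, 3, 5)
            for a in grid for b in grid)
print(agree)
```

Python's `%` returns a non-negative remainder for negative d, so `d % 4 == 1` is the condition d ≡ 1 (mod 4) even when d < 0.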

Proposition 7.3. The only negative square-free integer d ≡ 3 (mod 4) for which the ring of integers in Q[√d] is a UFD is d = −1, i.e., the ring of Gaussian integers is the only UFD among such Z[√d]s.

Proof. We have shown that Z[i] is a Euclidean domain, hence a PID, hence a UFD, so let's assume d ≠ −1. Write R for the ring of integers. Notice that

\[ (1+\sqrt d)(1-\sqrt d) = 1 - d = 2\cdot\frac{1-d}{2}; \]

since d is odd, (1/2)(1 − d) belongs to R so this gives two factorizations of 1 − d. Let's see that 2 is irreducible. Suppose to the contrary that 2 = ab and neither a nor b is a unit; then 4 = N(2) = N(a)N(b) so N(a) = N(b) = 2. Write a = x + y√d with x, y ∈ Z. Then N(a) = x² − dy², and the only solutions to the equation x² − dy² = 2 with d a negative square-free integer are (x, y, d) = (±1, ±1, −1) and (0, ±1, −2). But we are assuming that d ≠ −1 and d ≡ 3 (mod 4) so there is no such a in R. Hence 2 is irreducible.

If R is a UFD, then 2 must divide either 1 + √d or 1 − √d; but neither (1/2)(1 + √d) nor (1/2)(1 − √d) is in R so we conclude that R is not a UFD. □

Notice that the last step of the proof fails if d ≡ 1 (mod 4).

The following result is very deep and its proof requires ideas that go beyond the material in this course. It was proved in 1966, and verifies a conjecture Gauss made some 150 years earlier after he had proved that the ring of integers is a UFD in the listed cases.

Theorem 7.4. Let d be a negative square-free integer. The ring of integers in Q[√d] is a UFD if and only if d is one of the integers −1, −2, −3, −7, −11, −19, −43, −67, −163.
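The norm computation in the proof of Proposition 7.3 is easy to confirm by brute force. The following sketch (helper names are hypothetical) searches for integer solutions of x² − dy² = 2 over negative square-free d. Since x² − dy² = x² + |d|y² for d < 0, any solution has |x| ≤ 1 and |y| ≤ 1, so the small search window below is already exhaustive.

```python
def is_squarefree(n):
    """True if no square p*p with p >= 2 divides n."""
    n = abs(n)
    return all(n % (p * p) != 0 for p in range(2, int(n ** 0.5) + 1))

def norm_two_solutions(d, bound=3):
    """Integer pairs (x, y) with x^2 - d*y^2 == 2 and |x|, |y| <= bound."""
    return [(x, y)
            for x in range(-bound, bound + 1)
            for y in range(-bound, bound + 1)
            if x * x - d * y * y == 2]

# Negative square-free d for which some element x + y*sqrt(d) has norm 2.
solvable = {d: norm_two_solutions(d)
            for d in range(-1, -60, -1)
            if is_squarefree(d) and norm_two_solutions(d)}
print(sorted(solvable))   # [-2, -1]
```

Only d = −1 and d = −2 occur, and of these only d = −1 satisfies d ≡ 3 (mod 4), which is the case excluded at the start of the proof.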

7.2. Lattices in ℝ². A lattice in ℝ² is a subset of the form L = Zα ⊕ Zβ where α, β ∈ ℝ² are linearly independent over ℝ, i.e., β does not lie on the line ℝα (through 0 and α). The points of L are the corners of a set of parallelograms that tile⁸ the plane ℝ². These parallelograms are translations⁹ of the fundamental parallelogram, which is defined as that with vertices 0, α, β, α + β.

If d is a negative square-free integer and R the ring of integers in Q[√d], then R is a lattice. It follows that every non-zero principal ideal (b) = bR is a lattice too. In Proposition 7.7 we show that every non-zero ideal of R is a lattice.

Theorem 7.5. Let L be a subgroup of (ℝ², +) and suppose there is a positive real number ε such that {a ∈ L | |a| ≤ ε} = {0}. Then one of the following holds:
(1) L = 0;
(2) L = Zα for some 0 ≠ α ∈ ℝ²;
(3) L is a lattice.

Proof. Claim: A bounded subset of ℝ² contains only finitely many points of L.

Proof: Let B ⊂ ℝ² be bounded. There is an r ∈ ℝ such that B is contained in a disk of radius r centered at the origin. Let D be the disk of radius r + ε centered at the origin. If a ∈ L ∩ B, then the disk of radius ε/2 centered at a lies in D. If the disks of radius ε/2 centered at a, b ∈ L ∩ B intersected, their centers would be distance at most ε apart, i.e., |a − b| ≤ ε; but a − b ∈ L so the initial hypothesis implies a = b. Hence these disks are disjoint, and the area of D is ≥ nπε²/4 where n is the number of points in L ∩ B. This gives an upper bound on n so B ∩ L is finite. ♦

Suppose L ≠ 0. Let's assume that L is contained in a line, say ℝv through zero. It follows from the previous paragraph that there is an element 0 ≠ α ∈ L ∩ ℝv such that |α| is minimal. We will now show that L = Zα. If not, there is an element β ∈ L ∩ ℝv − Zα. But such an element lies between nα and (n + 1)α for some n ∈ Z so |nα − β| < |α|, contradicting the choice of α. Hence L = Zα.

Now suppose L is not contained in a line. We can still pick an element α ∈ L − {0} such that |α| is minimal. By the previous paragraph L ∩ ℝα = Zα. Pick β ∈ L − ℝα such that |β| is minimal. Notice that ℝ² = ℝα ⊕ ℝβ. Let P denote the parallelogram with corners 0, α, β, and α + β.

Claim: L ∩ P = {0, α, β, α + β}.

Proof: Let p ∈ L ∩ P. Then p = rα + sβ for some r, s ∈ [0, 1]. If r + s < 1, then |p| ≤ r|α| + s|β| < |β| so, by choice of β, p ∈ ℝα ∩ L = Zα. Hence p ∈ {0, α}. If r + s > 1, then α + β − p is in L and is equal to (1 − r)α + (1 − s)β with (1 − r) + (1 − s) < 1, so the analysis in the previous sentence gives α + β − p ∈ {0, α}, i.e., p ∈ {α + β, β}. Now suppose r + s = 1. If r = 0 then p = β so suppose r ≠ 0; hence s < 1 and

|α − p| = |(1 − r)α − sβ| ≤ s|α| + s|β| < |β|

so α − p ∈ ℝα; thus p ∈ ℝα so s = 0 and p = α. ♦

Let δ ∈ L, and write δ = rα + sβ with r, s ∈ ℝ. We can write r = r′ + r″ and s = s′ + s″ where r′, s′ ∈ Z and r″, s″ ∈ [0, 1). Now δ − r′α − s′β = r″α + s″β belongs to L ∩ P which, by the previous paragraph, is a subset of Zα ⊕ Zβ. But Zα ⊕ Zβ contains r′α + s′β too, so contains δ. □

⁸ To be precise, a tiling of ℝ² is a decomposition of ℝ² as a disjoint union of (compact) tiles (which are usually all assumed to have the same shape).
⁹ A translation by v of a subset S of ℝ² is the set v + S := {v + s | s ∈ S}.

Theorem 7.5 has the following consequence. Suppose L = Zα + Zβ is a lattice in ℝ² and let γ ∈ ℝ². Consider the subgroup L′ = Zα + Zβ + Zγ. If γ ∈ Qα + Qβ then L′ is a lattice. If γ ∉ Qα + Qβ, then the sum Zα + Zβ + Zγ is direct and L′ is not a lattice. Hence, for every ε > 0, the disk of radius ε centered at 0 contains infinitely many points of L′. It follows that L′ is a dense subset of ℝ².

Proposition 7.6. If L′ ⊂ L are lattices in ℝ², then L/L′ is finite and there are only finitely many lattices A such that L′ ⊂ A ⊂ L.

Proof. Suppose L′ = Zα ⊕ Zβ and consider the parallelogram P with corners 0, α, β, α + β. By the proof of Theorem 7.5, P ∩ L is finite. Let a ∈ L. Since the L′-translates of P tile the plane, a lies in ℓ′ + P for some ℓ′ ∈ L′. In other words, if φ : L → L/L′ is the obvious group homomorphism, then φ(a) ∈ φ(P ∩ L). Thus L/L′ = φ(P ∩ L). It follows that L/L′ is finite, so has only finitely many subgroups. Pulling this back to L, there are only finitely many subgroups of L that contain L′. □
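The counting argument in this proof can be made concrete. As a minimal sketch, take L = Z² and a sublattice L′ = Zα ⊕ Zβ with integer vectors α, β (an assumption made purely for illustration). Replacing P by its half-open version {rα + sβ | 0 ≤ r, s < 1} picks out exactly one point of L from each coset of L′, so counting those points computes |L/L′|. The function name is hypothetical.

```python
from fractions import Fraction

def coset_representatives(alpha, beta):
    """Points of Z^2 in the half-open parallelogram {r*alpha + s*beta : 0 <= r, s < 1}."""
    (a1, a2), (b1, b2) = alpha, beta
    det = a1 * b2 - a2 * b1
    if det == 0:
        raise ValueError("alpha and beta must be linearly independent")
    xs = [0, a1, b1, a1 + b1]                 # corner coordinates bound the search box
    ys = [0, a2, b2, a2 + b2]
    reps = []
    for x in range(min(xs), max(xs) + 1):
        for y in range(min(ys), max(ys) + 1):
            # Solve (x, y) = r*alpha + s*beta exactly by Cramer's rule.
            r = Fraction(x * b2 - y * b1, det)
            s = Fraction(a1 * y - a2 * x, det)
            if 0 <= r < 1 and 0 <= s < 1:
                reps.append((x, y))
    return reps

# The principal ideal (2 + i) of Z[i] has Z-basis 2 + i and i(2 + i) = -1 + 2i.
print(len(coset_representatives((2, 1), (-1, 2))))   # 5
```

In this example the count 5 equals both |det| and the norm N(2 + i) = 2² + 1².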

7.3. Integers in the imaginary quadratic case. Let d be a square-free negative integer and write R for the ring of integers in Q[√d]. The study of R is enhanced by picturing it as a lattice in the complex plane. We may write R = Z ⊕ Zη.

7.3.1. Suppose d ≡ 2 or 3 modulo 4. Then R = Z ⊕ Z√d and because √d lies on the imaginary axis the elements of R form a rectangular lattice in C:

··· • √d • ···

··· • 0 1 ···

··· • • • ···

The fundamental rectangle has its corners at 0, 1, √d, and 1 + √d. The other rectangles are formed by translating¹⁰ the fundamental rectangle.

7.3.2. Suppose d ≡ 1 modulo 4. The points of R form an isosceles triangular lattice in the plane. To see this, notice that R contains the rectangular lattice Z ⊕ Z√d and the "centers" of these rectangles because η := (1/2)(1 + √d) is the intersection point of the two diagonals of the fundamental rectangle; of course η is the midpoint of the diagonal from 0 to 1 + √d = 2η of the fundamental rectangle. Drawing the diagonals of all the translations of the fundamental rectangle produces a division of the complex plane into isosceles triangles (when d = −3 these are equilateral triangles).

¹⁰ If S is a subset of C and z ∈ C, we call z + S := {z + s | s ∈ S} the translation of S by z.

··· • √d • ···
··· • • η • ···
··· • 0 1 ···

Proposition 7.7. Let I be a non-zero ideal in R. Then
(1) I is a lattice in C;
(2) as an ideal, I is generated by ≤ 2 elements;
(3) R/I is finite.

Proof. (1) Let I be a non-zero ideal in R = Z ⊕ Zη. Then I is a subgroup of (C, +) ≅ (ℝ², +). Since R is a lattice in C, the main hypothesis of Theorem 7.5 applies to R and hence to I. If 0 ≠ a ∈ I, then I contains a and aη, which are linearly independent over ℝ because 1 and η are; hence I is not contained in a line. Therefore condition (2) in Theorem 7.5 does not apply to I. The result now follows from Theorem 7.5(3).
(2) Since I is a lattice there are elements β, γ ∈ I such that I = Zβ ⊕ Zγ. Since Z ⊂ R, I ⊂ Rβ + Rγ; but I is an ideal so Rβ + Rγ ⊂ I also. Hence I = Rβ + Rγ.
(3) This follows from Proposition 7.6. □

If m is a maximal ideal of R then R/m is a finite field.

7.4. Factorization of Ideals. Throughout this section let d be a negative square-free integer and write R = Z ⊕ Zη for the ring of integers in Q[√d]. After some preliminary results, we will prove the following theorem which is one of the high points of 19th century mathematics (and of this course!).

Theorem 7.8. Every non-zero ideal in R is a product of prime ideals in a unique way.

Linguistically, this is analogous to the result that every integer is a product of primes in a unique way. The resemblance is more than linguistic: if a = p_1^{i_1} ··· p_n^{i_n} there is an equality of ideals:

\[ (a) = (p_1)^{i_1} \cdots (p_n)^{i_n}. \]

Recall that a product of ideals I and J is defined by

\[ IJ := \Big\{ \sum a_i b_i \;\Big|\; a_i \in I,\ b_i \in J \Big\}. \]

We make this definition because the set of products {ab | a ∈ I, b ∈ J} is not an ideal: it is not closed under +. An ideal p is prime if R/p is a domain.

Lemma 7.9. Let p be a prime ideal. If p contains a product IJ of ideals then either I or J is contained in p.

Proof. If I is not contained in p, we can choose an element x ∈ I − p. Hence the image of x in R/p is non-zero. However, if y ∈ J, the image of xy is zero in the domain R/p so the image of y is zero, i.e., y must be in p. That is, J ⊂ p. □

Proposition 7.10. Let R be the ring of integers in Q[√d]. Then the non-zero prime ideals in R are exactly the maximal ideals.

Proof. Certainly every maximal ideal is prime. Now suppose that p is a non-zero prime ideal. Then R/p is a domain and is finite by Corollary ?? (compare Proposition 7.7(3)). But a finite domain is a field, so p is maximal. □

You already know an important special case of the next result: if a complex number a is a zero of a polynomial with real coefficients, so is its complex conjugate.

Proposition 7.11. Let K be an extension field of k and θ an automorphism of K that is the identity on k. If a ∈ K is a zero of a polynomial f ∈ k[x], so is θ(a).

Proof. Write f(x) = λ_0 + λ_1 x + ··· + λ_n x^n. Since θ(a^i) = θ(a)^i,

\[ 0 = \theta(0) = \theta(\lambda_0 + \lambda_1 a + \cdots + \lambda_n a^n) = \lambda_0 + \lambda_1\theta(a) + \cdots + \lambda_n\theta(a)^n = f(\theta(a)). \]

This shows that θ(a) is a zero of f. □

The classical case mentioned a moment ago occurs when θ is complex conjugation: a ↦ ā is an automorphism of C that is the identity on ℝ. We will make use of the following consequence of this fact.

Corollary 7.12. A complex number a is an algebraic integer if and only if ā is.¹¹

Applying this to R, the ring of integers in Q[√d], we see that a ↦ ā sends R to R. Hence, if I is an ideal in R so is Ī := {ā | a ∈ I}.

Proposition 7.13. If I is an ideal of R, then IĪ is a principal ideal generated by an element of Z.

¹¹ Here is a pictorial proof of this result: the conjugation map a ↦ ā is the reflection of the complex plane in the real axis, but the lattices we have just drawn for Z ⊕ Zη are symmetric about the real axis.

Proof. Since I is an ideal it is a lattice, say I = Zα ⊕ Zβ. Hence I = Rα + Rβ and IĪ = (αᾱ, αβ̄, βᾱ, ββ̄). Since αᾱ is a real number it is in R ∩ ℝ = Z. Hence IĪ ∩ Z is a non-zero ideal of Z so generated by some n ∈ Z. Certainly nR ⊂ IĪ. The elements αᾱ, ββ̄, αβ̄ + ᾱβ of IĪ are equal to their own conjugates so are in R ∩ ℝ = Z and are therefore integer multiples of n. Hence

\[ x^2 - \frac{\alpha\bar\beta + \bar\alpha\beta}{n}\,x + \frac{\alpha\bar\alpha\,\beta\bar\beta}{n^2} \]

is a monic polynomial with integer coefficients. But (αβ̄)/n and (ᾱβ)/n are zeroes of this polynomial so are algebraic integers; they belong to Q(√d) so are in R; i.e., αβ̄, ᾱβ ∈ nR. It follows that IĪ ⊂ nR and hence there is equality. □

Proposition 7.14. Let I, J, K be non-zero ideals of R. Then
(1) IJ ⊂ IK ⇐⇒ J ⊂ K;
(2) IJ = IK ⇐⇒ J = K;
(3) if I ⊂ J there is an ideal A such that I = JA.

Proof. (1) By Proposition 7.13, there is an ordinary integer n such that IĪ = nR. Thus, if IJ ⊂ IK, then nJ = ĪIJ ⊂ ĪIK = nK and since we are working in the domain C we may cancel the n to get J ⊂ K. Part (2) follows at once.
(3) By Proposition 7.13, there is m ∈ Z such that JJ̄ = mR. Now IJ̄ ⊂ JJ̄ = mR, so A := (1/m)IJ̄ is a subset of R and hence an ideal of R. Also AJ = (1/m)IJ̄J = (1/m)ImR = I. □

Proof of Theorem 7.8. Let I be a non-zero ideal in R. First we show that I is a product of maximal ideals by using an induction argument on the length of R/I; the length of R/I is the maximal number n that can occur when R = I_0 ⊃ I_1 ⊃ I_2 ⊃ ··· ⊃ I_n = I and the I_j are distinct ideals of R. If length(R/I) = 1, then I is maximal and there is nothing to prove. If length(R/I) = n + 1 ≥ 2, pick a maximal ideal m containing I. By Proposition 7.14(3), I = mJ for some ideal J. Notice that I ≠ J, otherwise by Proposition 7.14(2) we could cancel to conclude that R = m! Hence I ⊊ J and length(R/J) < length(R/I). So, by the induction hypothesis, J is a product of maximal ideals. This proves that every ideal in R is a product of maximal (= prime) ideals.
To prove uniqueness, suppose that p_1 ··· p_n = q_1 ··· q_m. Because p_n is prime and contains the product of the q_i it must contain one of them, say q_m. But q_m is a non-zero prime ideal so is maximal, whence q_m = p_n. By Proposition 7.14, we can then cancel to deduce that p_1 ··· p_{n−1} = q_1 ··· q_{m−1}. An obvious induction argument completes the proof. □

Important remark. The reason for introducing the ring of integers in Q(√d) is that the Factorization Theorem does not hold for Z[√d] when d ≡ 1 (mod 4). In other words, if d is a negative square-free integer ≡ 1 (mod 4), there is a non-zero ideal in Z[√d] that is not a product of prime ideals. We will illustrate this in the special case of Z[√−3]. The idea for finding such an ideal is this.

Claim: If there is a maximal ideal m and an ideal I such that m² ⊊ I ⊊ m, then I is not a product of prime ideals.

Proof: Suppose to the contrary that I = p_1 ··· p_n with each p_i prime. Since m² is contained in I, and hence in each p_i, the fact that p_i is prime implies that m ⊂ p_i. But m is maximal so p_i = m. Hence I is a power of m; but this is impossible because m² is strictly smaller than I and m is strictly larger than I. ♦

So, we seek such I and m in Z[√−3]. Define

m = (2, 1 + √−3) and I = (2).

Then m² is generated by 4, 2(1 + √−3), and (1 + √−3)² = −2 + 2√−3 = 2(1 + √−3) − 4; hence m² = (4, 2(1 + √−3)). It is clear that m² ⊂ I ⊂ m. Notice that m is maximal because

\[ \frac{\mathbb{Z}[\sqrt{-3}]}{m} \cong \frac{\mathbb{Z}[x]}{(x^2+3,\ 2,\ 1+x)} \cong \frac{\mathbb{Z}}{(2)} \cong \mathbb{F}_2. \]

Finally, I ≠ m because 1 + √−3 ∉ (2) and m² ≠ I because 2 ∉ (4, 2(1 + √−3)); do a calculation if you need to.

The inequality I ≠ m rests on the fact that (1/2)(1 + √−3) ∉ Z[√−3]; however, (1/2)(1 + √−3) is in the ring of integers Z ⊕ Z(1 + √−3)/2, so our non-factorization argument would not work in the ring of integers. It is this that explains why the ring of integers is better than Z[√d].
In fancy-schmancy language, Z[√−3] is not integrally closed; or, m is a singular point of Spec Z[√−3].
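The "do a calculation if you need to" step for Z[√−3] can be delegated to a short script. The sketch below is an illustration under stated assumptions, not the author's method: it identifies x + y√−3 with the integer vector (x, y), so that each ideal becomes a subgroup of Z², and it uses the standard fact that the index in Z² of a rank-2 subgroup is the gcd of the 2×2 minors of any spanning set. All function names are hypothetical.

```python
from itertools import combinations
from math import gcd

D = -3  # we work in Z[sqrt(-3)]; the pair (x, y) stands for x + y*sqrt(-3)

def times(u, v):
    """Product of two elements of Z[sqrt(D)]."""
    (x1, y1), (x2, y2) = u, v
    return (x1 * x2 + D * y1 * y2, x1 * y2 + y1 * x2)

def ideal_lattice(gens):
    """Vectors spanning, over Z, the ideal generated by gens (g and g*sqrt(D) for each g)."""
    return [w for g in gens for w in (g, times(g, (0, 1)))]

def index(vectors):
    """Index in Z^2 of the rank-2 subgroup spanned by vectors: gcd of all 2x2 minors."""
    result = 0
    for (a, b), (c, e) in combinations(vectors, 2):
        result = gcd(result, abs(a * e - b * c))
    return result

def contains(big, small):
    """True if the subgroup spanned by big contains the one spanned by small."""
    return index(big + small) == index(big)

m_gens = [(2, 0), (1, 1)]                       # m = (2, 1 + sqrt(-3))
m = ideal_lattice(m_gens)
I = ideal_lattice([(2, 0)])                     # I = (2)
m2 = ideal_lattice([times(g, h) for g in m_gens for h in m_gens])

print(index(m), index(I), index(m2))            # 2 4 8
print(contains(m, I), contains(I, m2))          # True True
```

The indices 2, 4, 8 are distinct, so the inclusions m² ⊂ I ⊂ m are both strict, as the claim requires.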

7.5. Dedekind domains. A domain R is called a Dedekind domain if it has the following properties:
(1) it is not a field;
(2) it is integrally closed in its field of fractions;
(3) length(R/I) < ∞ for all non-zero ideals I.

Theorem 7.15. If R is a Dedekind domain then every non-zero ideal of R is a product of prime ideals in a unique way.

We need to explain the terms in the definition. The notion of length is defined in the proof of Theorem 7.8. To define the term "integrally closed" we must introduce the field of fractions of a domain.

7.5.1. The field of fractions of a domain. Let R be a domain. Its field of fractions, which we will shortly construct, is a field containing R whose elements are of the form ab⁻¹ with a, b ∈ R and b ≠ 0. Every domain has a field of fractions that can be constructed in the same way as Q is constructed from Z. First define an equivalence relation on

R × R^× := {(a, b) | a, b ∈ R, b ≠ 0}

by (a, b) ∼ (c, d) if ad = bc. Write a/b for the equivalence class of (a, b) and define an addition and multiplication on the set of equivalence classes by

a/b + c/d := (ad + bc)/bd,
(a/b).(c/d) := ac/bd.

It is tedious to see that the equivalence classes form a commutative ring. We will denote it by Fract R. It is not hard to see it is a field. There is a map R → Fract R, r ↦ r/1. It is an injective ring homomorphism. We identify R with its image in Fract R. Every non-zero element of R has an inverse in Fract R, namely r⁻¹ = 1/r, and every element in Fract R can be written as ab⁻¹ for some a, b ∈ R with b ≠ 0.
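For readers who like to experiment, the construction above can be sketched for R = Z with pairs of integers. The function names are hypothetical, and spot checks of this kind do not replace the (tedious) verification that the operations are well defined on equivalence classes.

```python
def equivalent(p, q):
    """(a, b) ~ (c, d) if and only if ad = bc."""
    (a, b), (c, d) = p, q
    return a * d == b * c

def add(p, q):
    """a/b + c/d := (ad + bc)/bd."""
    (a, b), (c, d) = p, q
    return (a * d + b * c, b * d)

def mul(p, q):
    """(a/b).(c/d) := ac/bd."""
    (a, b), (c, d) = p, q
    return (a * c, b * d)

# Well-definedness spot check: replace each argument by an equivalent pair.
half, half_again = (1, 2), (2, 4)
third, third_again = (1, 3), (2, 6)
print(equivalent(add(half, third), add(half_again, third_again)))   # True
print(equivalent(mul(half, third), mul(half_again, third_again)))   # True
```

Python's built-in `fractions.Fraction` type implements the same field, with each class represented by its pair in lowest terms.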

7.5.2. Integral closure. Let R be a subring of a domain S. We say that an element a ∈ S is integral over R if it is a zero of a monic polynomial

x^n + r_{n−1}x^{n−1} + ··· + r_1 x + r_0

with coefficients in R. The integral closure of R in S is the set of all elements that are integral over R. We say R is integrally closed in S if it is its own integral closure, i.e., if the only elements integral over R are the elements of R itself.

There are two sources of Dedekind domains: number theory and algebraic curves.

Theorem 7.16. The ring of integers in a number field is a Dedekind domain.

Theorem 7.17. The coordinate ring O(X) of a smooth affine algebraic curve X is a Dedekind domain.

The recognition in the late 1800s that the coordinate rings of curves behaved like rings of integers is a very striking example of the unity of mathematics.

A typical example of the coordinate ring of a smooth curve is ℝ[x, y]/(y² − x³ + x). Think of the elements in ℝ[x, y] as functions ℝ² → ℝ. Write X for the locus of points in ℝ² where y² = x³ − x. By restricting a polynomial to the points of X each polynomial gives a function X → ℝ. More formally, the rule f ↦ f|_X is a ring homomorphism

ℝ[x, y] → {the ring of ℝ-valued functions on X}.

Of course, the kernel of this map contains y² − x³ + x and all its multiples. It is easy to show this is the entire kernel. So O(X) = ℝ[x, y]/(y² − x³ + x).

7.6. The class group. Let R be a Dedekind domain and K its field of fractions. For example, R might be the ring of integers in Q(√d) with d a negative square-free integer. The class group of R is a finite abelian group that measures how close R is to being a principal ideal domain. There are several ways to define it.

If I and J are non-zero ideals of R we write I ∼ J if there are non-zero elements a, b ∈ R such that aI = bJ. It is easy to see this is an equivalence relation. Equivalence classes are called ideal classes. There is one class if and only if R is a principal ideal domain. Ideal classes can be multiplied: if [I] denotes the equivalence class of I we define [I].[J] := [IJ]. Check that this is well-defined, associative, commutative, and that [R] is an identity. It is more difficult to show that inverses exist: one needs to show that given I, there is a J such that IJ is principal.

Proposition 7.14, which we proved for the ring of integers R = Z ⊕ Zη, holds for every Dedekind domain.¹² Part (3) of it tells us that if 0 ≠ a ∈ I, then there is an ideal J such that IJ = aR. Hence [I][J] = [R]. This completes the proof that the ideal classes do form a group. We denote it by Cl(R).

7.6.1. An alternative definition. If I is a non-zero ideal of R and 0 ≠ q ∈ K we call qI := {qa | a ∈ I} a fractional ideal. Every ideal in the usual sense is a fractional ideal, and these are the only fractional ideals contained in R, but there are many other fractional ideals, for example, sets of the form a⁻¹R for every non-zero a in R. The set of fractional ideals is a group under multiplication

\[ IJ := \Big\{ \sum a_i b_i \;\Big|\; a_i \in I,\ b_i \in J \Big\}. \]

If I and J are fractional ideals, so is IJ because if qI ⊂ R and q′J ⊂ R, then qq′IJ ⊂ R. It is easy to see that multiplication is associative and that R is the identity. The inverse of a fractional ideal I is I⁻¹ := {q ∈ K | qI ⊂ R}. The group of fractional ideals is never finite¹³ but the set of principal fractional ideals, i.e., those of the form qR where 0 ≠ q ∈ K, is a subgroup and the quotient by this is isomorphic to the ideal class group. It is easy to see that the class group is trivial if and only if R is a principal ideal domain. Some work is required to show that the class group is finite.

8. Galois Theory

Let k be a field of characteristic zero.

This hypothesis simplifies the theory. The main ideas are the same but the positive characteristic theory is more intricate.

The characteristic zero hypothesis allows us to use the fact that an irreducible polynomial in k[x] has no multiple zeroes in any extension field K/k. To see this, suppose f(x) = α_0 + α_1 x + ··· + α_{n−1}x^{n−1} + x^n ∈ k[x] is irreducible and that c ∈ K is a multiple zero of f(x). Because f is irreducible, it is the minimal polynomial of c. However, because (x − c)² divides f in K[x], c is a zero of the derivative, f′(x). But f′(x) is in k[x], has degree less than deg f, and is non-zero because char k = 0; this contradicts the fact that f is the minimal polynomial of c.

A reminder. We will write [K : k] for the degree of a field extension K/k; by definition it is dim_k K, the dimension of K as a vector space over k. If we have fields k ⊂ F ⊂ K, then [K : k] = [K : F][F : k].
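The multiple-zero argument above says that f and f′ have a common zero only when they have a common factor, so it can be tested by computing gcd(f, f′) in k[x]. Here is a minimal sketch for k = Q using exact rational arithmetic. Polynomials are coefficient lists with the constant term first, and the helper names are hypothetical.

```python
from fractions import Fraction

def trim(p):
    """Drop trailing zero coefficients (the zero polynomial becomes [])."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p

def derivative(p):
    """Formal derivative of p."""
    return trim([i * c for i, c in enumerate(p)][1:])

def poly_mod(p, q):
    """Remainder of p on division by the non-zero polynomial q."""
    p, q = trim(p), trim(q)
    while len(p) >= len(q):
        factor = p[-1] / q[-1]
        shift = len(p) - len(q)
        for i, c in enumerate(q):
            p[i + shift] -= factor * c       # cancels the leading term of p
        p = trim(p)
    return p

def poly_gcd(p, q):
    """Monic gcd in Q[x] by the Euclidean algorithm (p assumed non-zero)."""
    p = trim(Fraction(c) for c in p)
    q = trim(Fraction(c) for c in q)
    while q:
        p, q = q, poly_mod(p, q)
    return [c / p[-1] for c in p]

f = [-2, 0, 1]       # x^2 - 2, irreducible over Q
g = [2, -3, 0, 1]    # x^3 - 3x + 2 = (x - 1)^2 (x + 2), which has a double zero
print(poly_gcd(f, derivative(f)) == [1])       # True: no multiple zeroes
print(poly_gcd(g, derivative(g)) == [-1, 1])   # True: the gcd is x - 1
```

Over a field of characteristic p the derivative can vanish identically (for example for x^p − t over F_p(t)), which is where the argument breaks down.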

¹² Exercise! Hint: use the factorization theorem.
¹³ Figure out what it is for Z.

8.1. Splitting fields. Let f ∈ k[x] and K/k an extension field. We say that f splits over K if f, when viewed as a polynomial in K[x], is a product of linear polynomials. We call K/k a splitting field for f (over k) if f splits over K but not over any strictly smaller subfield of K. A splitting field always exists. We showed last quarter that there is an extension field L/k in which f splits as a product of linear factors, so we may take K to be the subfield of L generated by k and the zeroes of f. In other words,

K = k(α_1, …, α_n) where f = λ(x − α_1) ··· (x − α_n) ∈ K[x].

We will speak of "the" splitting field of a polynomial; the justification is a consequence of the next result.

Theorem 8.1. Let K/k be a splitting field for f ∈ k[x]. Let φ : k → k′ be an isomorphism of fields and let K′/k′ be a splitting field for φ(f) over k′.
(1) There is an isomorphism Φ : K → K′ such that the diagram

\[ \begin{array}{ccc} k & \longrightarrow & K \\ {\scriptstyle\phi}\downarrow & & \downarrow{\scriptstyle\Phi} \\ k' & \longrightarrow & K' \end{array} \]

commutes.
(2) If char k = 0, there are [K : k] different extensions Φ.

Proof. (1) We argue by induction on [K : k]. If [K : k] = 1, f is a product of linear polynomials in k[x], so K = k; φ(f) is clearly a product of linear polynomials in k′[x] too, so K′ = k′ and we can take Φ = φ.

Now suppose [K : k] > 1 and let c ∈ K − k be a zero of f. Let g(x) ∈ k[x] be the minimal polynomial of c. Since f(c) = 0, g divides f. Hence φ(g) divides φ(f). Since φ(f) splits in K′, so does φ(g); let c′ ∈ K′ be a zero of φ(g). Then

\[ k(c) \cong \frac{k[x]}{(g)} \xrightarrow[\ \phi\ ]{\ \sim\ } \frac{k'[x]}{(\phi(g))} \cong k'(c') \]

so φ extends to an isomorphism φ̃ : k(c) → k′(c′). We therefore have a diagram

\[ \begin{array}{ccccc} k & \longrightarrow & k(c) & \longrightarrow & K \\ {\scriptstyle\phi}\downarrow & & \downarrow{\scriptstyle\tilde\phi} & & \\ k' & \longrightarrow & k'(c') & \longrightarrow & K' \end{array} \]

Since K is a splitting field for f over k it is also a splitting field for f over k(c). Similarly, K′ is a splitting field for φ(f) over k′(c′). Now [K : k] = [K : k(c)][k(c) : k] so [K : k(c)] < [K : k]. By the induction hypothesis, φ̃ extends to an isomorphism Φ : K → K′.

(2) We argue by induction on [K : k]. If [K : k] = 1, the result is clear so suppose that [K : k] > 1. Let g ∈ k[x] be the highest degree monic irreducible factor of f, say d = deg g, and write f = gh. Since [K : k] > 1, f does not split in k[x], so d > 1.

Let c ∈ K be a zero of g. Since char k = 0, g and φ(g) have d distinct zeroes. For each of the d zeroes c_1, …, c_d of φ(g) in K′, there is an extension φ̃_i : k(c) → k′(c_i). We now have

\[ \begin{array}{ccccc} k & \longrightarrow & k(c) & \longrightarrow & K \\ {\scriptstyle\phi}\downarrow & & \downarrow{\scriptstyle\tilde\phi_i} & & \\ k' & \longrightarrow & k'(c_i) & \longrightarrow & K' \end{array} \]

By the induction hypothesis, there are [K : k(c)] = (1/d)[K : k] extensions of φ̃_i to K. Hence we get [K : k] different extensions of φ to K. These are all possible extensions of φ because if Φ : K → K′ is an extension of φ, Φ(c) must be a zero of φ(g), so is one of the c_i, and then Φ is an extension of φ̃_i so has been counted. □

Corollary 8.2. Any two splitting fields of a polynomial are isomorphic; i.e., if K/k and K′/k are splitting fields over k for f ∈ k[x], then there is an isomorphism Φ : K → K′ such that Φ|_k = id_k.

Corollary 8.3. Let p be a prime. Then any two fields with p^n elements are isomorphic.

Proof. Let q = p^n. If K is a field with q elements, then the group K^× = K − {0} has q − 1 elements so, by Lagrange's theorem, a^{q−1} = 1 for all a ∈ K^×. Hence a^q = a for all a ∈ K. Thus K contains q distinct zeroes of the polynomial f = x^q − x ∈ F_p[x]. Therefore K is a splitting field for f over F_p. Now apply the theorem. □

An extension K/k is a Galois extension if K is the splitting field for some f ∈ k[x].

Remark. Suppose K/k is a finite extension such that every irreducible f ∈ k[x] having a zero in K splits in K[x]. Then K/k is a splitting field/Galois extension. To see this, write K = k(α_1, …, α_n), let f_i ∈ k[x] be the minimal polynomial of α_i and set f = f_1 ··· f_n; by hypothesis, each f_i splits in K[x] so f splits in K[x]; since K is generated by the zeroes of f it is the splitting field of f over k.
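The identity a^q = a used in the proof of Corollary 8.3 can be watched in action in a field with 9 elements. The sketch below models that field as F_3[x]/(x² + 1); this is an assumption made for illustration, relying on the fact that x² + 1 has no zero in F_3 and is therefore irreducible there. The pair (a, b) stands for a + bi with i² = −1, and the function names are hypothetical.

```python
P = 3  # characteristic; elements of the field are pairs (a, b) = a + b*i with i*i = -1

def mul(u, v):
    """Multiply a + b*i and c + d*i, reducing coefficients modulo P."""
    (a, b), (c, d) = u, v
    return ((a * c - b * d) % P, (a * d + b * c) % P)

def power(u, n):
    """u multiplied by itself n times (n >= 0)."""
    result = (1, 0)
    for _ in range(n):
        result = mul(result, u)
    return result

elements = [(a, b) for a in range(P) for b in range(P)]
q = len(elements)                                   # q = 3^2 = 9

# Every element is a zero of x^q - x ...
print(all(power(u, q) == u for u in elements))      # True
# ... and every non-zero element satisfies a^(q-1) = 1 (Lagrange).
print(all(power(u, q - 1) == (1, 0) for u in elements if u != (0, 0)))   # True
```

So the nine elements are exactly the nine zeroes of x⁹ − x, which is the sense in which the field is a splitting field of that polynomial over F_3.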

8.2. The Galois group of an extension. The Galois group of an extension K/k is defined to be

Gal(K/k) := {automorphisms σ : K → K such that σ|_k = id_k, i.e., σ(a) = a for all a ∈ k}.

This is a group under composition. If K/k is a splitting field of f ∈ k[x] we call Gal(K/k) the Galois group of f and denote it by Gal(f/k). Theorem 8.4. If K/k is a Galois extension and char k = 0, then | Gal(K/k)| = [K : k].

Proof. Apply Theorem 8.1 with k′ = k, K′ = K, and φ = id_k. □

Lemma 8.5. Let L/k be an extension, c ∈ L, and σ ∈ Gal(L/k). If f ∈ k[x] and f(c) = 0, then f(σ(c)) = 0. In particular, the minimal polynomials of c and σ(c) are the same.

Proof. Write f = x^n + α_{n−1}x^{n−1} + ··· + α_1 x + α_0. Then

\[ f(\sigma(c)) = \sigma(c)^n + \alpha_{n-1}\sigma(c)^{n-1} + \cdots + \alpha_1\sigma(c) + \alpha_0 = \sigma(c^n + \alpha_{n-1}c^{n-1} + \cdots + \alpha_1 c + \alpha_0) = \sigma(0) = 0. \]

Now suppose f is the minimal polynomial of c. Since f(σ(c)) = 0, the minimal polynomial of σ(c) divides f. But a minimal polynomial is always irreducible, so the minimal polynomial of σ(c) is f. □

Proposition 8.6. If K/k is the splitting field of a degree n polynomial, then Gal(K/k) is isomorphic to a subgroup of the symmetric group S_n.

Proof. Suppose K/k is the splitting field of f ∈ k[x] and let α_1, …, α_n ∈ K be the zeroes of f. If σ ∈ Gal(K/k), then σ(α_i) is also a zero of f (Lemma 8.5). Hence σ permutes the elements of {α_1, …, α_n}. Restricting the action of σ to this set of zeroes gives a map Φ : Gal(K/k) → S_n. It is clearly a group homomorphism and is injective because K = k(α_1, …, α_n); if σ(α_i) = α_i for all i, then σ acts as the identity on all of K. □

Notation. If G is a group of automorphisms of a field K we write

K^G := {a ∈ K | σ(a) = a for all σ ∈ G}.

Corollary 8.7. Let K/k be a Galois extension and write G = Gal(K/k). Then K^G = k.

Proof. It is easy to see that K^G is a subfield of K containing k. Since K is the splitting field of some f ∈ k[x] it is also the splitting field over K^G of f ∈ K^G[x]. Hence K/K^G is a Galois extension. But Gal(K/K^G) ⊂ Gal(K/k) so G is also the Galois group of K/K^G. Hence [K : k] = |G| = |Gal(K/K^G)| = [K : K^G]. Thus K^G = k. □

Corollary 8.8. Let K/k be a Galois extension and write G = Gal(K/k).

(1) If β ∈ K and G.β = {β = β_1, …, β_m}, then the minimal polynomial of β over k is (x − β_1) ··· (x − β_m).
(2) If an irreducible polynomial g ∈ k[x] has a root in K it splits in K.

Proof. (1) Let g ∈ k[x] be the minimal polynomial of β. Then σ(β) is a zero of g for every σ ∈ G, so g is divisible by each x − β_i. The coefficients of (x − β_1) ··· (x − β_m) are symmetric functions in the β_i so are stable under the action of every σ ∈ G; those coefficients are therefore in K^G = k; thus (x − β_1) ··· (x − β_m) ∈ k[x]. Since this polynomial has β as a zero it is divisible by g, and since g is divisible by each of its distinct linear factors the two polynomials are equal.
(2) Let β ∈ K be a root of g. Since g is irreducible it is, up to a scalar, the minimal polynomial of β. However, by (1), that minimal polynomial is (x − β_1) ··· (x − β_m) where the β_i are the distinct elements in G.β. Hence g splits in K. □

Proposition 8.9. Let L/k be a finite extension. Then there is an extension K/L such that K/k is a Galois extension.

Proof. Suppose L = k(α_1, …, α_n). Let f_i ∈ k[x] be the minimal polynomial of α_i and define f = f_1 f_2 ··· f_n. Let K be the splitting field of f over L. Then K is also the splitting field of f over k, so K/k is Galois. □
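Corollary 8.8(1) can be illustrated numerically. The sketch below assumes, without proof here, that for K = Q(√2, √3) the group Gal(K/Q) consists of the four automorphisms sending √2 ↦ ±√2 and √3 ↦ ±√3. Under that assumption the orbit of β = √2 + √3 is {±√2 ± √3}, and multiplying out the factors (x − β_i) in floating point should give a polynomial with rational coefficients. The helper name is hypothetical.

```python
from itertools import product
from math import sqrt

def poly_mul(p, q):
    """Product of two polynomials given as coefficient lists, constant term first."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

# Orbit of beta = sqrt(2) + sqrt(3) under the four sign changes.
orbit = [s * sqrt(2) + t * sqrt(3) for s, t in product((1, -1), repeat=2)]

g = [1.0]
for beta in orbit:
    g = poly_mul(g, [-beta, 1.0])          # multiply by (x - beta)

coefficients = [round(c) for c in g]
print(coefficients)                        # [1, 0, -10, 0, 1], i.e. x^4 - 10x^2 + 1
```

The irrational parts cancel, leaving x⁴ − 10x² + 1 ∈ Q[x], in line with the corollary's claim that the coefficients lie in K^G = k.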

Corollary 8.10. An extension L/k is Galois if and only if |Gal(L/k)| = [L : k].

Proof. If L/k is Galois the result is true by Theorem 8.4. To prove the converse, assume that |Gal(L/k)| = [L : k]. Let K/k be a Galois extension that contains L. If σ ∈ Gal(L/k), there are [K : L] distinct σ_i in Gal(K/k) such that σ_i|_L = σ (Theorem 8.1). Hence,

[K : k] = |Gal(K/k)| ≥ [K : L]·|Gal(L/k)| = [K : L][L : k] = [K : k].

It follows that every η in Gal(K/k) is the extension of some σ ∈ Gal(L/k) and hence that η(L) ⊂ L for all η ∈ Gal(K/k).

Suppose L = k(α_1, …, α_n). Let f_i ∈ k[x] be the minimal polynomial of α_i and define f = f_1 f_2 ··· f_n. Certainly f splits in K and, if β ∈ K is a zero of f, then β is a zero of some f_i so β = σ(α_i) for some σ ∈ Gal(K/k) by Corollary 8.8. Hence β ∈ σ(L) = L and we deduce that f splits in L. Hence L is the splitting field for f over k. □

Corollary 8.11. A Galois extension of a Galois extension is a Galois extension.

Proof. Let k ⊂ F ⊂ K be extension fields such that K/F and F/k are Galois. Each σ ∈ Gal(F/k) can be extended to K in [K : F] ways, so

|Gal(K/k)| ≥ [K : F]·|Gal(F/k)| = [K : F]·[F : k] = [K : k].

Let L/k be a Galois extension containing K. Each σ ∈ Gal(K/k) can be extended to L in [L : K] different ways, so |Gal(L/k)| ≥ [L : K]·|Gal(K/k)|. Thus

\[ |\mathrm{Gal}(K/k)| \le \frac{[L:k]}{[L:K]} = [K:k]. \]

Combining the two displayed inequalities gives |Gal(K/k)| = [K : k], so K/k is a Galois extension. □

8.3. The Galois correspondence.

Theorem 8.12. Consider fields k ⊂ F ⊂ K where K/k is a Galois extension. Then
(1) Gal(K/F) is a subgroup of Gal(K/k);
(2) F/k is a Galois extension if and only if Gal(K/F) is a normal subgroup of Gal(K/k) and
(3) in that case

\[ \frac{\mathrm{Gal}(K/k)}{\mathrm{Gal}(K/F)} \cong \mathrm{Gal}(F/k). \]

Proof. (1) This is clear: if σ : K → K is the identity on F it is the identity on k. Note that K/F is a Galois extension because if K/k is the splitting field of f ∈ k[x], then K/F is the splitting field of f ∈ F[x].

(2) (⇒) Suppose F/k is a Galois extension. Let c ∈ F and σ ∈ Gal(K/k). Let f ∈ k[x] be the minimal polynomial of c. Since f has a zero in F it splits in F. But σ(c) is also a zero of f so is in F. Hence σ(F) ⊂ F. It follows that there is a well defined map

\[ \Phi : \operatorname{Gal}(K/k) \to \operatorname{Gal}(F/k), \qquad \Phi(\sigma) := \sigma|_F. \]
This is obviously a group homomorphism and its kernel is Gal(K/F), so Gal(K/F) is a normal subgroup.

(3) If F/k is Galois, then we have the homomorphism Φ and the number of elements in its image is
\[ \frac{|\operatorname{Gal}(K/k)|}{|\operatorname{Gal}(K/F)|} = \frac{[K : k]}{[K : F]} = [F : k] = |\operatorname{Gal}(F/k)|. \]
Hence Φ is surjective and we obtain the desired isomorphism.

(2) (⇐) Suppose Gal(K/F) is a normal subgroup of Gal(K/k). To show that F/k is Galois it suffices to show that every irreducible polynomial in k[x] having a zero in F splits in F[x]. Let f ∈ k[x] be irreducible and let α ∈ F be a zero of f. The minimal polynomial of α is $(x - \alpha_1) \cdots (x - \alpha_m)$ where $\{\alpha_1, \ldots, \alpha_m\}$ is the orbit of α under the action of Gal(K/k), so it suffices to show that σ(α) ∈ F for all σ ∈ Gal(K/k). If η ∈ Gal(K/F), then $\sigma^{-1}\eta\sigma \in \operatorname{Gal}(K/F)$, so $\alpha = \sigma^{-1}\eta\sigma(\alpha)$ and σ(α) = η(σ(α)). Hence $\sigma(\alpha) \in K^{\operatorname{Gal}(K/F)}$. But K/F is a Galois extension so, by Corollary 9.10, $K^{\operatorname{Gal}(K/F)} = F$ and σ(α) ∈ F. □

Theorem 8.13 (The Galois correspondence). Suppose K/k is a Galois extension. There is an order-reversing bijection
\[ \{\text{intermediate fields } F,\ k \subset F \subset K\} \longleftrightarrow \{\text{subgroups of } \operatorname{Gal}(K/k)\} \]
implemented by $F \mapsto \operatorname{Gal}(K/F)$ and $K^H \leftarrow H$.

Proof. To prove these maps are mutual inverses we must show that $F = K^{\operatorname{Gal}(K/F)}$ and $H = \operatorname{Gal}(K/K^H)$. The first equality is given by Theorem ??. If H is a subgroup of Gal(K/k), then $K/K^H$ is a Galois extension and $H \subset \operatorname{Gal}(K/K^H)$, so $|H| = [K : K^H] = |\operatorname{Gal}(K/K^H)|$. Hence $H = \operatorname{Gal}(K/K^H)$.

“Order-reversing” means that if $H \subset H'$, then $K^H \supset K^{H'}$. Actually, the lattices involved in the bijection are anti-isomorphic. □

Remarks. Consider a Galois extension K/k and an intermediate extension k ⊂ F ⊂ K.
1. We will always think of Gal(K/F) as the subgroup of Gal(K/k) consisting of automorphisms of K/k that are the identity on F.
2. Notice that [F : k] = |Gal(K/k)|/|Gal(K/F)|.
3. Since Gal(K/k) is finite it has only a finite number of subgroups. Hence there are only a finite number of intermediate extensions F. This is not a priori obvious because one has intermediate extensions k(α) for all α ∈ K and there are infinitely many choices for α if k is infinite. For example, it is not at all obvious why only finitely many different extensions of Q appear as Q(aα + bβ + cγ) as a, b, c ∈ Z vary, where α, β, γ are fixed radical expressions built from the integers 2, 7, −11, and 17!

9. Solutions to polynomial equations

9.1. The big picture. The roots of the polynomial $x^3 + px + q$ are given by the formula
\[ \sqrt[3]{-\frac{q}{2} + \sqrt{\left(\frac{q}{2}\right)^2 + \left(\frac{p}{3}\right)^3}} \;-\; \sqrt[3]{\frac{q}{2} + \sqrt{\left(\frac{q}{2}\right)^2 + \left(\frac{p}{3}\right)^3}} \]
where the cube roots are chosen so that their product is $\frac{1}{3}p$.

Galois theory is about solutions to polynomial equations. The driving question is this: is there a formula for the roots of a polynomial in terms of its coefficients? By “formula” we mean an expression involving +, −, ×, ÷, and $\sqrt[r]{\phantom{x}}$. Let’s suppose there is such a formula and analyse what that means. We first do some calculations in the field k that contains the coefficients, using only the operations +, −, ×, ÷, to obtain an element a ∈ k. We then take an rth root of a, thus moving into a new field $K_1 = k(\sqrt[r]{a})$. We call an extension obtained by adjoining an rth root a radical extension. We then repeat the process, calculating inside $K_1$, then taking a root. After repeating this process a finite number of times we obtain a field
\[ K = k(\sqrt[r]{a})(\sqrt[s]{b}) \cdots (\sqrt[t]{c}) \]
that contains the roots of f. In other words, we obtain a sequence of fields

(9-1) $k = K_0 \subset K_1 \subset \cdots \subset K_n = K$

in which each field is a radical extension of the previous field. For the cubic $x^3 + px + q$,
\[ K_1 = k(\sqrt{a}) \ \text{ where } a = \left(\frac{q}{2}\right)^2 + \left(\frac{p}{3}\right)^3, \qquad K_2 = K_1(\sqrt[3]{b}) \ \text{ where } b = \frac{q}{2} + \sqrt{a}. \]

Definition 9.1. A polynomial f ∈ k[x] is solvable by radicals if all its roots belong to a field K that is obtained from k by a sequence of radical extensions (9-1). ♦

Whether or not f is solvable by radicals is completely determined by the structure of a certain finite group, the Galois group of f: f is solvable by radicals if and only if its Galois group is a solvable group. We will postpone the definition of what it means to be solvable. But here are some facts:
• Every abelian group is solvable.
• Subgroups and quotients of solvable groups are solvable.
• The Galois group of a degree n polynomial is a subgroup of the symmetric group $S_n$.
• If n ≤ 4, then every subgroup of $S_n$ is solvable.
• Every polynomial of degree ≤ 4 is solvable by radicals.
• Every group of order < 60 is solvable.
• The alternating group $A_5$ and the symmetric group $S_5$ are not solvable.
• There is a degree 5 polynomial having $S_5$ as its Galois group.
• There is a degree 5 polynomial having $A_5$ as its Galois group.
• There is a degree 5 polynomial that is not solvable by radicals.
• A non-abelian simple group¹⁴ is not solvable.
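The cubic formula of 9.1 can be checked numerically. The following is a minimal sketch, not part of the original notes: the function name `cardano_root` and the tolerance are our own choices. It writes the root as a sum u + v, where u is a cube root of −q/2 + √a and v = −p/(3u) is the negative of the second cube root in the formula, so that the two summands have product −p/3.

```python
import cmath

def cardano_root(p, q):
    """Return one root of x^3 + p*x + q using the cubic formula.

    Hypothetical helper for illustration; works with complex arithmetic
    so that the case of a negative radicand is covered as well.
    """
    a = (q / 2) ** 2 + (p / 3) ** 3       # the element a of k
    sqrt_a = cmath.sqrt(a)                # generates K1 = k(sqrt(a))
    u_cubed = -q / 2 + sqrt_a
    if abs(u_cubed) < 1e-14:              # avoid dividing by zero below
        u_cubed = -q / 2 - sqrt_a
    if abs(u_cubed) < 1e-14:              # then p = q = 0 and the root is 0
        return 0j
    u = u_cubed ** (1 / 3)                # any cube root will do
    v = -p / (3 * u)                      # forced by the condition u*v = -p/3
    return u + v

root = cardano_root(-6, -9)               # x^3 - 6x - 9 has the root x = 3
print(abs(root - 3) < 1e-9)
```

Choosing v from u, rather than taking two independent cube roots, is what enforces the condition on the product of the cube roots.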

14 A group is simple if its only normal subgroups are itself and the trivial group.

9.2. Two technical simplifications.

9.2.1. A factorization of an integer r allows us to split a radical extension $k(\sqrt[r]{a})$ into a sequence of smaller radical extensions: if r = mn, then
\[ k(\sqrt[r]{a}) = k(\sqrt[m]{a})(\sqrt[n]{a'}) \]
where $a' = \sqrt[m]{a}$. We can therefore split a radical extension into a sequence of extensions by pth roots where p is prime.

9.2.2. Let K/k be a Galois extension and F an intermediate field, k ⊂ F ⊂ K. Suppose that p is prime and c ∈ K − F is such that $c^p = a \in F$. Then
\[ F \subset F(\xi)(\sqrt[p]{a}) \subset K \]
where ξ is a primitive pth root of unity.

9.3. Solvable groups. A group G is solvable if there is a chain of subgroups

\[ G = G_0 \supset G_1 \supset \cdots \supset G_n = \{1\} \]
such that each $G_i$ is a normal subgroup of $G_{i-1}$ and $G_{i-1}/G_i$ is abelian. When G is finite, we can replace the word “abelian” by “cyclic” or even by “cyclic of prime order” without affecting the definition.

Proposition 9.2. If G is solvable so is every subgroup and every quotient of it.

Proposition 9.3. Let N be a normal subgroup of G. Then G is solvable if and only if N and G/N are.

Proposition 9.4. Let k ⊂ F ⊂ K be fields such that K/F and F/k are Galois extensions with solvable Galois groups. Then Gal(K/k) is solvable.

Proof. By Corollary 8.11, a Galois extension of a Galois extension is Galois, so K/k is Galois. Hence
\[ \frac{\operatorname{Gal}(K/k)}{\operatorname{Gal}(K/F)} \cong \operatorname{Gal}(F/k) \]
and it follows from Proposition 9.3 that Gal(K/k) is solvable. □
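For small permutation groups solvability can be tested by machine. The sketch below is not the method of these notes: it uses the standard fact that a finite group is solvable if and only if its derived series (repeatedly passing to the subgroup generated by commutators) reaches the trivial group. Permutations are stored as tuples acting on 0, ..., n−1, and all function names are our own.

```python
from itertools import permutations

def compose(s, t):
    """Return the permutation 'apply t, then s'."""
    return tuple(s[t[i]] for i in range(len(t)))

def inverse(s):
    inv = [0] * len(s)
    for i, image in enumerate(s):
        inv[image] = i
    return tuple(inv)

def generated_subgroup(gens, n):
    """Subgroup generated by gens; closure under products suffices in a finite group."""
    identity = tuple(range(n))
    elements = {identity}
    frontier = [identity]
    while frontier:
        g = frontier.pop()
        for s in gens:
            h = compose(s, g)
            if h not in elements:
                elements.add(h)
                frontier.append(h)
    return elements

def derived_subgroup(group, n):
    """Subgroup generated by all commutators g h g^-1 h^-1."""
    commutators = {compose(compose(g, h), compose(inverse(g), inverse(h)))
                   for g in group for h in group}
    return generated_subgroup(commutators, n)

def is_solvable(group, n):
    while len(group) > 1:
        derived = derived_subgroup(group, n)
        if len(derived) == len(group):    # the series has stalled above {1}
            return False
        group = derived
    return True

def symmetric_group(n):
    return set(permutations(range(n)))

print(is_solvable(symmetric_group(4), 4), is_solvable(symmetric_group(5), 5))
```

For $S_4$ the series passes through $A_4$ and the Klein four-group; for $S_5$ it stalls at $A_5$.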

9.4. Corresponding to the chain $k = K_0 \subset K_1 \subset \cdots \subset K_n = K$ there is a chain of subgroups
\[ \operatorname{Gal}(K/k) = G_0 \supset G_1 \supset \cdots \supset G_n = \{1\} \]
where $G_i = \operatorname{Gal}(K/K_i)$. The next result examines one step in this process.

Theorem 9.5. Let K/k be a Galois extension. Let p be a prime and suppose c ∈ K − k is such that $c^p \in k$. Set F = k(c). If k contains a primitive pth root of unity, then
(1) σ(F) ⊂ F for all σ ∈ Gal(K/k);
(2) $x^p - c^p$ is the minimal polynomial of c;
(3) [F : k] = p;
(4) F/k is Galois and Gal(F/k) ≅ Z/p;
(5) Gal(K/F) is a normal subgroup of Gal(K/k) and
\[ \frac{\operatorname{Gal}(K/k)}{\operatorname{Gal}(K/F)} \cong \operatorname{Gal}(F/k). \]

Proof. (1) Write $a = c^p$. Since c is a zero of $x^p - a \in k[x]$, so is σ(c). But the zeroes of $x^p - a$ are $c, \xi c, \xi^2 c, \ldots, \xi^{p-1} c$ where ξ is a primitive pth root of unity in k. By Lemma 9.5, $\sigma(c) = \xi^i c$ for some i. Thus σ(c) ∈ F and hence σ(F) ⊂ F.

(2) It suffices to show that the minimal polynomial of c has degree p. Let
\[ f(x) = \alpha_0 + \alpha_1 x + \cdots + \alpha_{d-1} x^{d-1} + x^d \]
be the minimal polynomial of c. Since c ∉ k, d ≥ 2. But every zero of f(x) is a zero of $x^p - a$ so is of the form $\xi^i c$ for some i. By the remarks at the beginning of this chapter, c is not a multiple zero of f(x). Hence f has at least two distinct zeroes, c and $\xi^i c$ where $\xi^i \ne 1$. Therefore

\[ 0 = \alpha_0 + \alpha_1 c + \cdots + \alpha_{d-1} c^{d-1} + c^d \]
\[ 0 = \alpha_0 + \alpha_1 \xi^i c + \cdots + \alpha_{d-1} \xi^{i(d-1)} c^{d-1} + \xi^{id} c^d. \]
By taking the difference and cancelling a factor of c, we see that c is a zero of
\[ g(x) = \alpha_1(1 - \xi^i) + \alpha_2(1 - \xi^{2i})x + \cdots + \alpha_{d-1}(1 - \xi^{(d-1)i})x^{d-2} + (1 - \xi^{di})x^{d-1}. \]
Since the minimal polynomial of c has degree d, all the coefficients of g(x) are zero. In particular, $1 = \xi^{di}$; but $\xi^i \ne 1$ so $\xi^i$ is a primitive pth root of unity. Hence p divides d and, since d ≤ p, d = p.

(3) Since $x^p - a$ is the minimal polynomial of c, $k(c) = k[c] \cong k[x]/(x^p - a)$.

(4) Certainly F is a splitting field for $x^p - a$ so F/k is a Galois extension. An automorphism of F = k(c) is completely determined by its effect on c. The map $\bar\sigma : k[x] \to k[x]$ defined by $\bar\sigma(x) = \xi x$ is an automorphism of k[x] and $\bar\sigma(x^p - a) = x^p - a$ so there is a map $\sigma : k[x]/(x^p - a) \to k[x]/(x^p - a)$ such that
\[ \begin{array}{ccc} k[x] & \xrightarrow{\ \bar\sigma\ } & k[x] \\ \downarrow & & \downarrow \\ k[x]/(x^p - a) & \xrightarrow{\ \sigma\ } & k[x]/(x^p - a) \end{array} \]
commutes. Hence there is an automorphism of k(c) given by σ(c) = ξc and $\sigma|_k = \mathrm{id}_k$. For every integer n > 0, $\sigma^n$ is also an automorphism of k(c). But $\sigma^n(c) = \xi^n c$, so ⟨σ⟩ ≅ Z/p.

(5) Define Φ : Gal(K/k) → Gal(F/k) by $\Phi(\sigma) = \sigma|_F$. This is a well-defined map by (1). It is obviously a group homomorphism and its kernel is Gal(K/F) so we need only prove surjectivity. Let η ∈ Gal(F/k). We must show there is an automorphism σ of K such that the following diagram commutes:
\[ \begin{array}{ccc} F & \longrightarrow & K \\ {\scriptstyle \eta}\downarrow & & \downarrow{\scriptstyle \sigma} \\ F & \longrightarrow & K \end{array} \]
But K/F is the splitting field of $x^p - a \in F[x]$ so, by Theorem 9.1, such a σ exists. □

Proposition 9.6. Let p be a prime and ξ a primitive pth root of unity. Then Q(ξ) is a Galois extension of Q and its Galois group is isomorphic to $\mathbb{Z}_{p-1}$.

Proof. The multiplicative group $F_p^\times$ of the field with p elements is cyclic of order p − 1, so it suffices to show there is an isomorphism
\[ \Phi : \operatorname{Gal}(\mathbb{Q}(\xi)/\mathbb{Q}) \to F_p^\times. \]
By Eisenstein’s criterion (applied to f(x + 1)), $f(x) = x^{p-1} + \cdots + x + 1$ is irreducible. It is therefore the minimal polynomial of ξ, and Q(ξ) ≅ Q[x]/I where I is the ideal generated by f(x). If σ ∈ G = Gal(Q(ξ)/Q), then σ(ξ) is again a pth root of unity so is equal to $\xi^i$ for some i. Since i is only well-defined modulo p, there is a well-defined map
\[ \Phi : G \to F_p^\times, \qquad \Phi(\sigma) = i \ \text{ if } \sigma(\xi) = \xi^i. \]
This is a group homomorphism because Φ(id) = 1 and if Φ(σ) = i and Φ(τ) = j, then $\sigma\tau(\xi) = \sigma(\xi^j) = \sigma(\xi)^j = \xi^{ij}$ so Φ(στ) = ij.

It is clear that Φ is injective since σ(ξ) = ξ implies $\sigma = \mathrm{id}_{\mathbb{Q}(\xi)}$. To show that Φ is surjective it suffices to show that |G| ≥ p − 1. To this end, fix an integer r ≢ 0 (mod p) and consider the ring homomorphism θ : Q[x] → Q[x] given by $\theta(x) = x^r$.

Claim: θ(I) ⊂ I. Proof: It suffices to show that θ(f) ∈ I. But I is a prime ideal and $x^r - 1 \notin I$ so it suffices to show that $(x^r - 1)\theta(f) \in I$. But
\[ (x^r - 1)\theta(f) = (x^r - 1) f(x^r) = x^{rp} - 1 = (x^p - 1)(x^{p(r-1)} + \cdots + x^p + 1) \]
which is clearly in I. ♦

Since θ(I) ⊂ I, I is contained in the kernel of the composition

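\[ \mathbb{Q}[x] \xrightarrow{\ \theta\ } \mathbb{Q}[x] \longrightarrow \mathbb{Q}[x]/I. \]
Hence there is a well-defined map σ making the following diagram commute:
\[ \begin{array}{ccc} \mathbb{Q}[x] & \xrightarrow{\ \theta\ } & \mathbb{Q}[x] \\ \downarrow & & \downarrow \\ \mathbb{Q}[x]/I & \xrightarrow{\ \sigma\ } & \mathbb{Q}[x]/I \end{array} \]
However, the isomorphism Q[x]/I ≅ Q(ξ) is induced by the map x ↦ ξ, so transferring σ to Q(ξ) gives a map $\sigma(\xi) = \xi^r$. □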

Corollary 9.7. If Q ⊂ F ⊂ C and ξ is a primitive pth root of unity, then F (ξ)/F is a Galois extension and Gal(F (ξ)/F ) is cyclic.

Proof. Write H = Gal(F (ξ)/F ) and G = Gal(Q(ξ)/Q). If σ ∈ H, then σ(ξ) is a pth root of unity, so σ(Q(ξ)) ⊂ Q(ξ). Hence there is a group homomorphism Ψ: H → G. It is clear that Ψ is injective, so H is a subgroup of G. But G is cyclic, so H is cyclic too. 
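Proposition 9.6 identifies the automorphism ξ ↦ $\xi^r$ with the residue r in $F_p^\times$, so a generator of the cyclic group Gal(Q(ξ)/Q) corresponds to a residue of multiplicative order p − 1. The sketch below finds all such residues by brute force; the function names are our own, and p is assumed to be prime.

```python
def multiplicative_order(r, p):
    """Order of r in the group of units modulo the prime p (r not divisible by p)."""
    order, power = 1, r % p
    while power != 1:
        power = power * r % p
        order += 1
    return order

def generators_mod(p):
    """Residues r such that xi -> xi^r generates the Galois group of Q(xi)/Q."""
    return [r for r in range(1, p) if multiplicative_order(r, p) == p - 1]

print(generators_mod(7))   # -> [3, 5]
```

For p = 7 the automorphism ξ ↦ $\xi^3$ therefore has order 6 and generates the whole Galois group.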

Theorem 9.8. Let f ∈ Q[x]. If f is solvable by radicals, then Gal(f/Q) is solvable.

Proof. Suppose f is solvable by radicals. Then there is an extension L/Q in which f splits and L is obtained from Q by a finite number of radical extensions, say
\[ \mathbb{Q} \subset L_1 = \mathbb{Q}(\sqrt[r]{a}) \subset L_1(\sqrt[s]{b}) \subset \cdots \subset L_{n-1} \subset L_{n-1}(\sqrt[t]{c}) = L. \]
Let m be the product of the distinct primes that divide r, s, . . . , t, and let ξ be a primitive mth root of 1. Note that Q(ξ) can be obtained from Q by adjoining pth roots of unity for various primes p. By successive applications of Corollaries 9.7 and 8.11, Q(ξ) is a Galois extension of Q. By repeated application of Theorem 9.5, all the extensions in the chain
\[ \mathbb{Q}(\xi) \subset K_1 = \mathbb{Q}(\xi)(\sqrt[r]{a}) \subset K_2 = K_1(\sqrt[s]{b}) \subset \cdots \subset K_{n-1} \subset K_n = K_{n-1}(\sqrt[t]{c}) \]
are Galois extensions, with solvable Galois groups. Since f splits in $K_n$, $K_n$ contains a splitting field for f over Q, say F, and Gal(F/Q) is a quotient of Gal($K_n$/Q). Hence Gal(f/Q) is solvable. □

Remark. Theorem 9.8 holds for any field k of characteristic zero. The same proof will work.

9.5. A quintic polynomial that is not solvable by radicals. To produce a polynomial in Q[x] that is not solvable by radicals we must produce a polynomial whose Galois group is not solvable. The symmetric group S5 is not solvable, so it suffices to find a degree 5 polynomial f ∈ Q[x] such that Gal(f/Q) contains elements σ and τ that generate S5. Remember, that Gal(f/Q) permutes the zeroes of f so is a subgroup of S5. It is a fact that S5 is generated by a 5-cycle and a transposition.

Theorem 9.9. Let f ∈ Q[x] be an irreducible polynomial of prime degree, say p. If f has exactly two non-real zeroes, then Gal(f/Q) ≅ $S_p$. In particular, if p ≥ 5, then f is not solvable by radicals.

Proof. By the Fundamental Theorem of Algebra f splits in C. Let K be the splitting field for f over Q, and set G = Gal(K/Q) = Gal(f/Q). The action of G on the zeroes of f gives an injective group homomorphism G → $S_p$. If c ∈ K is a zero of f, then f is the minimal polynomial of c so [Q(c) : Q] = deg f = p. Hence p divides [K : Q] = |G| so, by Cauchy’s Theorem, G has an element σ of order p. This must be a p-cycle, say σ = (1 2 . . . p). Complex conjugation is a Q-automorphism of C, so restricts to a Q-automorphism of K. But conjugation fixes the p − 2 real zeroes of f, and permutes the two non-real zeroes so is a 2-cycle. Let’s assume this transposition is (1 n). Notice that $\sigma^{n-1}$ is again a p-cycle, because p is prime, and $\sigma^{n-1} = (1 \ n \ 2n-1 \ \cdots)$, so after relabelling we can assume that G contains (1 2 . . . p) and (1 2). But these two elements generate all of $S_p$ so G = $S_p$. □

Exercise. In S4, (13) and (1234) do not generate S4, so check the assumption in the last sentence of the proof—you need to use the fact that p is prime.
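The generation claims in the proof and in the exercise can be confirmed by brute force for small cases. This is only a sanity check, not a proof; permutations are written as tuples on 0, ..., n−1, while the helper `cycle` takes the usual labels 1, ..., n. All names are our own.

```python
def compose(s, t):
    """Return the permutation 'apply t, then s'."""
    return tuple(s[t[i]] for i in range(len(t)))

def cycle(n, *points):
    """The permutation of {1, ..., n} given by a single cycle, stored 0-based."""
    perm = list(range(n))
    for a, b in zip(points, points[1:] + points[:1]):
        perm[a - 1] = b - 1
    return tuple(perm)

def generated_subgroup(gens):
    """Subgroup generated by gens; closure under products suffices in a finite group."""
    identity = tuple(range(len(gens[0])))
    elements = {identity}
    frontier = [identity]
    while frontier:
        g = frontier.pop()
        for s in gens:
            h = compose(s, g)
            if h not in elements:
                elements.add(h)
                frontier.append(h)
    return elements

# A 5-cycle and a transposition generate all 120 elements of S_5 ...
print(len(generated_subgroup([cycle(5, 1, 2, 3, 4, 5), cycle(5, 1, 2)])))   # -> 120
# ... but (1 2 3 4) and (1 3) only generate a subgroup of order 8 in S_4.
print(len(generated_subgroup([cycle(4, 1, 2, 3, 4), cycle(4, 1, 3)])))      # -> 8
```

With p = 5 every transposition (1 n), not only (1 2), generates $S_5$ together with the 5-cycle, which is the point at which primality enters.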

Example 9.10. The polynomial $f = x^5 - 6x + 3 \in \mathbb{Q}[x]$ is not solvable by radicals. By Eisenstein’s criterion f is irreducible. Since $f'(x) = 5x^4 - 6$ has only two real zeroes, say ±α, f has at most three real zeroes by Rolle’s Theorem. Since f(−α) > 0 > f(α) and f(x) → ±∞ as x → ±∞, f has three real zeroes. Hence Gal(f/Q) ≅ $S_5$. ♦

9.6.
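The two hypotheses used in Example 9.10 are easy to check by hand or by machine. The sketch below tests Eisenstein’s criterion at the prime 3 and counts sign changes of f at a few integer points, which exhibits at least three real zeroes; together with the bound from Rolle’s Theorem this gives exactly three. The function names and sample points are our own.

```python
def eisenstein_applies(coeffs, p):
    """coeffs = [a_0, a_1, ..., a_n]; True if Eisenstein's criterion holds at p."""
    *lower, leading = coeffs
    return (leading % p != 0
            and all(c % p == 0 for c in lower)
            and lower[0] % (p * p) != 0)

def f(x):
    return x**5 - 6*x + 3

def sign_changes(points):
    """Number of consecutive sample points between which f changes sign."""
    values = [f(x) for x in points]
    return sum(1 for a, b in zip(values, values[1:]) if a * b < 0)

print(eisenstein_applies([3, -6, 0, 0, 0, 1], 3))   # -> True
print(sign_changes([-2, -1, 0, 1, 2]))              # -> 3
```

The values of f at −2, −1, 0, 1, 2 are −17, 8, 3, −2, 23, so the zeroes lie in the intervals (−2, −1), (0, 1), and (1, 2).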

Theorem 9.11. Let k be a subfield of C containing a primitive pth root of unity. Let K/k be a Galois extension such that Gal(K/k) ≅ Z/p. Then there is an a ∈ k such that K is the splitting field for $x^p - a$ over k.

Proof. Let σ be a generator for Gal(K/k) and ξ ∈ k a primitive pth root of unity. Since σ is a k-linear map K → K it can be represented by a matrix, say A. Since $A^p = \mathrm{id}_K$, the kernel of the ring homomorphism
\[ \Psi : k[x] \to M_n(k), \qquad \Psi(x^r) := A^r, \]
contains
\[ x^p - 1 = (x - 1)(x - \xi)(x - \xi^2) \cdots (x - \xi^{p-1}). \]
For each i = 0, 1, . . . , p − 1, let
\[ V_i = \{ v \in K \mid \sigma(v) = \xi^i v \}. \]
Because $x^p - 1$ has p distinct zeroes, all lying in k, K is the direct sum of the eigenspaces $V_i$. Since $\sigma \ne \mathrm{id}_K$, there is an eigenvector v ∈ K having eigenvalue $\xi^i$ for some 1 ≤ i ≤ p − 1. Now $\sigma(v^p) = \sigma(v)^p = (\xi^i v)^p = v^p$. Hence $v^p \in k$. Now set $a = v^p$. Thus $x^p - a \in k[x]$ and it splits in K because it has p distinct zeroes in K, namely $\{\xi^i v \mid 0 \le i \le p - 1\}$. Hence K contains a splitting field for $x^p - a$ over k, namely k(v). The argument in Theorem 9.5(2) shows that $x^p - a$ is irreducible. Hence $x^p - a$ is the minimal polynomial of v and $k(v) \cong k[x]/(x^p - a)$. Thus k(v) has dimension p over k. But |Gal(K/k)| = [K : k]. Hence [K : k] = p = [k(v) : k], and it follows that K = k(v). □

10. Quaternions

The quaternions were discovered by Sir William Rowan Hamilton on October 16, 1843. We will use them to give a proof of Lagrange’s Theorem¹⁵ (1770) that every integer is a sum of four squares. In section 4.3 we showed that every prime that is congruent to 1 or 2 modulo 4 is a sum of two squares. Our proof made use of the Gaussian integers Z[i]. Our proof that every integer is a sum of four squares will be somewhat analogous but instead of using Z[i] the proof will use the ring of Hurwitz, or integer, quaternions. The most striking difference is that the ring of integer quaternions is not commutative.

10.1. The definition. The ring of quaternions is non-commutative. It is a 4- dimensional vector space over R and its basis is usually written as 1, i, j, k. In Hamilton’s honour we usually denote them by the letter H,

H = R + Ri + Rj + Rk. The multiplication is determined by the following rules:

\[ i^2 = j^2 = k^2 = ijk = -1 \]
and, if r is any real number, ri = ir, rj = jr, and rk = kr. It follows from these multiplication rules that
\[ ij = -k^{-1} = k, \quad jk = -i^{-1} = i, \quad ki = j, \]
\[ ji = kii = -k = -ij, \quad kj = kki = -i = -jk, \quad ik = iij = -j = -ki. \]

Every quaternion can be written as a + bi + cj + dk for real numbers a, b, c, d. We call bi + cj + dk a pure quaternion.

15 Fermat had asserted this in 1659 but, as usual, did not leave us a proof.

10.2. Conjugate and norm. The conjugate of a quaternion α = a + bi + cj + dk is
\[ \bar\alpha := a - bi - cj - dk. \]
Note that $\bar\alpha\bar\beta = \overline{\beta\alpha}$. A calculation gives
\[ \alpha\bar\alpha = a^2 + b^2 + c^2 + d^2. \]
We call this the norm of α and denote it by N(α). It is a non-negative real number and, in analogy with the complex numbers, is zero if and only if α = 0. It follows that every non-zero element of H has an inverse, namely
\[ \alpha^{-1} = \frac{\bar\alpha}{N(\alpha)}. \]
This makes sense because we can multiply the quaternion $\bar\alpha$ by the real number $N(\alpha)^{-1}$. The norm is multiplicative:
\[ N(\alpha\beta) = \alpha\beta\,\overline{\alpha\beta} = \alpha\beta\bar\beta\bar\alpha = \alpha N(\beta)\bar\alpha = \alpha\bar\alpha N(\beta) = N(\alpha)N(\beta). \]
Hence H is like a field except it is not commutative. A non-commutative ring in which every non-zero element has an inverse is called a division ring or skew field. The quaternions provided the first example of a division ring. Hamilton, who had an extremely distinguished career as a scientist, considered their construction and subsequent study his finest work. Before discovering them he had spent the best part of 15 years searching for a field that has dimension 3 over R. Nowadays it is easy to prove there is no such field.
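The multiplication rules, the conjugate, and the norm translate directly into a few lines of code. The following is a minimal sketch with integer coordinates; the class name `Quaternion` and its methods are our own, not notation from the notes.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Quaternion:
    """The quaternion a + b*i + c*j + d*k."""
    a: int
    b: int
    c: int
    d: int

    def __mul__(self, other):
        a, b, c, d = self.a, self.b, self.c, self.d
        A, B, C, D = other.a, other.b, other.c, other.d
        # Expand the product using i^2 = j^2 = k^2 = ijk = -1.
        return Quaternion(a*A - b*B - c*C - d*D,
                          a*B + b*A + c*D - d*C,
                          a*C - b*D + c*A + d*B,
                          a*D + b*C - c*B + d*A)

    def conjugate(self):
        return Quaternion(self.a, -self.b, -self.c, -self.d)

    def norm(self):
        return self.a**2 + self.b**2 + self.c**2 + self.d**2

i, j, k = Quaternion(0, 1, 0, 0), Quaternion(0, 0, 1, 0), Quaternion(0, 0, 0, 1)
print(i * j == k, j * i == k)   # -> True False
```

Because the coordinates are integers, the identity N(αβ) = N(α)N(β) can be tested exactly, with no rounding error.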

10.3. The relation between H and C. Let’s write C for the subring R ⊕ Ri of H. Then every element in H can be written in a unique way as w + jz where w, z ∈ C. The conjugation map in H is compatible with conjugation in C in the sense that if z = a + bi with a, b ∈ R then the conjugate $\bar z$ in C is the same as the conjugate $\bar z$ in H; both are equal to a − bi. If a, b ∈ R, then (a + bi)j = aj + bk whereas j(a + bi) = aj − bk; hence (a + bi)j = j(a − bi); thus $zj = j\bar z$ for all z ∈ C. Notice too that if w, z ∈ C, then
\[ \overline{w + zj} = \bar w + \bar j \bar z = \bar w - j\bar z = \bar w - zj \]
where the left-hand side is the conjugation in H and the other conjugations denote conjugation in C. It follows that

(10-1) $N(w + zj) = (w + zj)\overline{(w + zj)} = (w + zj)(\bar w - zj) = w\bar w + z\bar z.$

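Proposition 10.1. There is an isomorphism φ from H to a subring of $M_2(\mathbb{C})$ given by
\[ w + zj \mapsto \begin{pmatrix} w & -z \\ \bar z & \bar w \end{pmatrix} \]
for w, z ∈ R + Ri. The image of this map is the R-linear span of
\[ \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \quad I := \begin{pmatrix} i & 0 \\ 0 & -i \end{pmatrix}, \quad J := \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}, \quad K := \begin{pmatrix} 0 & -i \\ -i & 0 \end{pmatrix}. \]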

Notice that N(q) = det φ(q). In the proposition I, J, and K denote the images of i, j, and k, respectively. They span the space of trace zero matrices so form a basis for the Lie algebra sl(2, C). The matrices
\[ \sigma_x := \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} = iK, \quad \sigma_y := \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix} = iJ, \quad \sigma_z := \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} = -iI \]
are known to physicists as the Pauli spin matrices. One can now proceed to say quite a lot about the ways in which the quaternions are used to formulate some physics.
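Proposition 10.1 and the identity N(q) = det φ(q) can be spot-checked numerically. The sketch below assumes the convention q = w + zj with w = a + bi and z = c + di, which is the one under which φ(i), φ(j), φ(k) are the matrices I, J, K listed above; quaternions are 4-tuples (a, b, c, d) and every function name is our own.

```python
def qmul(x, y):
    """Product of the quaternions x = (a, b, c, d) and y = (A, B, C, D)."""
    a, b, c, d = x
    A, B, C, D = y
    return (a*A - b*B - c*C - d*D,
            a*B + b*A + c*D - d*C,
            a*C - b*D + c*A + d*B,
            a*D + b*C - c*B + d*A)

def phi(q):
    """Matrix of q = w + z*j, with w = a + b*i and z = c + d*i."""
    a, b, c, d = q
    w, z = complex(a, b), complex(c, d)
    return ((w, -z), (z.conjugate(), w.conjugate()))

def matmul(m, n):
    """Product of two 2x2 matrices given as nested tuples."""
    return tuple(tuple(sum(m[r][t] * n[t][s] for t in range(2)) for s in range(2))
                 for r in range(2))

def det(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

x, y = (1, 2, 3, 4), (5, -6, 7, 8)
print(phi(qmul(x, y)) == matmul(phi(x), phi(y)))   # -> True
print(det(phi(x)) == 1 + 4 + 9 + 16)               # -> True
```

Small integer entries are represented exactly as complex floating-point numbers, so the equality tests here are exact rather than approximate.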

10.4. The group of non-zero quaternions. The non-zero quaternions form a group under multiplication. The quaternions of norm one form a 3-sphere in $\mathbb{R}^4$, and it is elementary to see that these form a group under multiplication. This is an analog of the fact that the complex numbers of length one form a group that is, as a topological space, the circle. The only spheres of positive dimension having a (topological) group structure are the circle and the 3-sphere. The 7-sphere, which arises as the octonions of norm one, also has a multiplication, but it is not associative, so the 7-sphere is not a group. The octonions was the name given to a non-associative division ring discovered by Graves (and slightly later, and independently, by Cayley) shortly after Hamilton’s discovery of the quaternions.

There are many subrings of H isomorphic to the complex numbers. The three obvious ones are R + Ri, R + Rj, and R + Rk. However, if q is any non-zero quaternion, the square of $qiq^{-1}$ is −1, so $\mathbb{R} + \mathbb{R}qiq^{-1} \cong \mathbb{C}$ also (you need to note that $qiq^{-1} \notin \mathbb{R}$). If 0 ≠ r ∈ R, then $(rq)i(rq)^{-1} = qiq^{-1}$, so to understand all the elements $qiq^{-1}$ it suffices to take q of norm one: the elements $\{qiq^{-1} \mid 0 \ne q \in H\}$ form a single orbit under the conjugation action of the group $S^3$ of unit quaternions. The stabilizer in $S^3$ of i consists of the elements of $S^3$ that commute with i, i.e., $S^3 \cap \bigl((\mathbb{R} + \mathbb{R}i) - \{0\}\bigr) \cong U(1) \cong S^1$. Hence, the subrings of H isomorphic to C are in natural bijection with $S^3/S^1$.

10.5. The ring of Hurwitz quaternions. Define
\[ \xi := \tfrac{1}{2}(1 + i + j + k) \]
and note that N(ξ) = 1. The ring of Hurwitz quaternions is the subring

\[ \mathcal{H} := \mathbb{Z} \oplus \mathbb{Z}i \oplus \mathbb{Z}j \oplus \mathbb{Z}\xi \]
of H. The ring $\mathcal{H}$ behaves better than Z ⊕ Zi ⊕ Zj ⊕ Zk because it is the appropriate analogue of the subring of algebraic integers in a number field: for example, the minimal polynomial of ξ is $x^2 - x + 1$ so it behaves like an algebraic integer. Notice, too, that the disease of non-unique factorization¹⁶
\[ (1 + i)(1 - i) = 2 = (1 + j)(1 - j) \]

16 Strictly speaking we only defined “unique factorization” in a commutative ring but there is an analogous notion for non-commutative rings which we don’t need to go into now.

is “cured” in $\mathcal{H}$ because there is a unit $u \in \mathcal{H}$, namely $u = \frac{1}{2}(1 - i + j - k)$, such that 1 + j = (1 + i)u and $1 - j = u^{-1}(1 - i)$. The fancy schmancy reason why $\mathcal{H}$ is better is that it is a maximal order but the smaller ring is not. Being a maximal order is a non-commutative analogue of being integrally closed. Furthermore, the densest lattice sphere-packing in $\mathbb{R}^4$ is achieved by placing the centers of the spheres at the lattice points given by the elements of $\mathcal{H}$.

The points of the lattice formed by $\mathcal{H}$ are the vertices of a tiling of $\mathbb{R}^4$ by 4-dimensional cells. Each cell is a translate of the fundamental cell having the sixteen vertices r + si + tj + uξ where r, s, t, u ∈ {0, 1}. This cell consists of the points r + si + tj + uξ such that r, s, t, u ∈ [0, 1]. The 32 edges of the fundamental cell have length one. Every point in this cell has distance < 1 from some vertex. Hence every point in $\mathbb{R}^4$ is of distance < 1 from some point of $\mathcal{H}$.

This leads at once to a version of the Euclidean algorithm for $\mathcal{H}$. Suppose $a, b \in \mathcal{H}$ and b ≠ 0. Then $b^{-1}a \in H$ so there is a $q \in \mathcal{H}$ such that $N(b^{-1}a - q) < 1$, and therefore N(a − bq) < N(b). Hence we can write a = bq + r with $q, r \in \mathcal{H}$ and N(r) < N(b). Likewise, there are $q', r' \in \mathcal{H}$ such that $a = q'b + r'$ with $N(r') < N(b)$.

Proposition 10.2. Every right ideal in $\mathcal{H}$ is of the form $b\mathcal{H}$ for some $b \in \mathcal{H}$.

Proof. Just follow the proof for a Euclidean domain. 
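The division step a = bq + r can be made explicit. The sketch below is one possible implementation, not the method of the notes: it stores a quaternion as a 4-tuple of coordinates with respect to 1, i, j, k (so a Hurwitz quaternion has either four integers or four half-odd-integers), and it finds a nearby Hurwitz quaternion by comparing the nearest point with integer coordinates against a nearby point with half-odd coordinates. All names are our own.

```python
from fractions import Fraction
from math import floor

def qmul(x, y):
    a, b, c, d = x
    A, B, C, D = y
    return (a*A - b*B - c*C - d*D,
            a*B + b*A + c*D - d*C,
            a*C - b*D + c*A + d*B,
            a*D + b*C - c*B + d*A)

def conj(x):
    return (x[0], -x[1], -x[2], -x[3])

def norm(x):
    return sum(t * t for t in x)

def nearest_hurwitz(x):
    """A Hurwitz quaternion q with N(x - q) <= 1/2, for rational x."""
    integral = tuple(Fraction(round(t)) for t in x)
    half_odd = tuple(Fraction(floor(t)) + Fraction(1, 2) for t in x)
    distance = lambda q: norm(tuple(s - t for s, t in zip(x, q)))
    return min(integral, half_odd, key=distance)

def divide_left(a, b):
    """Return (q, r) with a = b*q + r and N(r) < N(b), for b != 0."""
    n = norm(b)
    x = tuple(Fraction(t) / n for t in qmul(conj(b), a))   # x = b^{-1} a
    q = nearest_hurwitz(x)
    r = tuple(s - t for s, t in zip(a, qmul(b, q)))
    return q, r

a, b = (7, 3, -2, 5), (2, 1, 0, -1)
q, r = divide_left(a, b)
print(norm(r) < norm(b))   # -> True
```

Exact rational arithmetic via `Fraction` keeps the inequality N(r) < N(b) free of rounding issues; swapping the order of the products gives the other version, a = q′b + r′.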

10.6. Units in $\mathcal{H}$. The group of units in Z ⊕ Zi ⊕ Zj ⊕ Zk is {±1, ±i, ±j, ±k}. It is called the quaternion group. It and the dihedral group $D_4$ are the only non-abelian groups of order 8. The group of units in $\mathcal{H}$ has order 24: it consists of the quaternion group and the sixteen additional elements
\[ \tfrac{1}{2}(\pm 1 \pm i \pm j \pm k). \]
It is isomorphic to the binary tetrahedral group (the double cover of the symmetry group of the tetrahedron) and to SL(2, $F_3$), the group of 2 × 2 matrices with entries in $F_3$ having determinant one. The units are exactly the elements in $\mathcal{H}$ having norm one. These 24 points in $\mathbb{R}^4$ are the vertices of an important combinatorial geometric object called the 24-cell. Its symmetry group is the Weyl group of type $F_4$, another important object in mathematics. One reason for the importance of the 24-cell is that there is no good analogue of it in $\mathbb{R}^3$: all other regular polytopes in $\mathbb{R}^4$ are nice analogues of the platonic solids in $\mathbb{R}^3$.
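The 24 units can be listed by brute force. To stay within integer arithmetic the sketch below stores a Hurwitz quaternion by its doubled coordinates, so (a, b, c, d) stands for (a + bi + cj + dk)/2 with a, b, c, d all even or all odd; this encoding and the function names are our own choices.

```python
from itertools import product

def qmul(x, y):
    a, b, c, d = x
    A, B, C, D = y
    return (a*A - b*B - c*C - d*D,
            a*B + b*A + c*D - d*C,
            a*C - b*D + c*A + d*B,
            a*D + b*C - c*B + d*A)

def hurwitz_units():
    """Norm-one Hurwitz quaternions, in doubled coordinates (norm 1 means squares sum to 4)."""
    return {v for v in product(range(-2, 3), repeat=4)
            if len({t % 2 for t in v}) == 1 and sum(t * t for t in v) == 4}

def product_of_units(x, y):
    """Product of two units, again in doubled coordinates."""
    doubled_twice = qmul(x, y)            # equals 4 * (product of the two units)
    return tuple(t // 2 for t in doubled_twice)

units = hurwitz_units()
print(len(units))                          # -> 24
print(all(product_of_units(x, y) in units for x in units for y in units))   # -> True
```

Eight of the 24 have even doubled coordinates (the quaternion group) and sixteen have odd ones, matching the description above.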

10.7. Lagrange’s four squares theorem. When classifying the primes in Z[i] in terms of those in Z we proved that every prime congruent to 1 (mod 4) is a sum of two squares. We will use the Hurwitz quaternions in an analogous way to prove that every prime in Z, and hence every integer, is a sum of four squares.

Proposition 10.3. The norm of every element in Z + Zi + Zj + Zk is a sum of four squares in Z.

Proof. This follows from the definition of the norm.  Compare the next result to Theorem ?? where it was shown that for a prime integer p ≡ 1(mod 4), −1 is a square in Fp.

Proposition 10.4. Let p ∈ Z be prime. Then there are elements u, v ∈ Fp such that u2 + v2 = −1.

Proof. This is clear if p = 2 so suppose p is odd. Let $S = \{x^2 \mid x \in F_p\}$. If a ∈ S − {0} there are two elements of $F_p$ having square a: if $x^2 = a$, then $(-x)^2 = a$ too, and x ≠ −x because p is odd. Hence S − {0} has $\frac{1}{2}(p - 1)$ elements. Thus S and −1 − S both have $\frac{1}{2}(p + 1)$ elements. But $F_p$ has only p elements so S ∩ (−1 − S) ≠ ∅. Hence there are $u, v \in F_p$ such that $u^2 = -1 - v^2$. □

Now let p ∈ Z be prime. Since $p\mathcal{H} \ne \mathcal{H}$ it is contained in a maximal right ideal of $\mathcal{H}$, say $\pi\mathcal{H}$. Write p = πτ. Then $p^2 = N(\pi\tau) = \pi\bar\pi\tau\bar\tau$. If $\tau\bar\tau = 1$, then $p\mathcal{H} = \pi\mathcal{H}$ so $p\mathcal{H}$ is already maximal. Otherwise, $\pi\bar\pi = p = \tau\bar\tau$. However, π = r + si + tj + uξ for some integers r, s, t, u so, if u is even, π ∈ Z + Zi + Zj + Zk and N(π) = p so p is a sum of four squares. Now suppose u is odd. Then 2π = (2r + u) + (2s + u)i + (2t + u)j + uk.
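Proposition 10.4 is easy to test for small primes. The sketch below mirrors the counting argument: it tabulates the set S of squares in $F_p$ and looks for an element of −1 − S that is again a square. The function name is our own, and p is assumed to be prime.

```python
def two_squares_summing_to_minus_one(p):
    """Return (u, v) with u*u + v*v + 1 divisible by the prime p."""
    square_roots = {x * x % p: x for x in range(p)}   # the set S, with one root of each square
    for v in range(p):
        target = (-1 - v * v) % p                      # an element of -1 - S
        if target in square_roots:                     # it also lies in S
            return square_roots[target], v
    raise ValueError("no solution found; is p prime?")

u, v = two_squares_summing_to_minus_one(7)
print((u * u + v * v + 1) % 7 == 0)   # -> True
```

The dictionary has (p + 1)/2 keys when p is odd, which is the count used in the proof.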

Let p be a prime. Then there are integers u, v such that $u^2 + v^2 + 1$ is divisible by p. Let’s write $d = -u^2 - v^2$. Consider the integer quaternion a = 1 + ui + vj. Then
\[ a^2 = 1 + 2(ui + vj) + (ui + vj)^2 = 2(1 + ui + vj) - 1 - u^2 - v^2 = 2a - 1 + d. \]
It follows that R := Z ⊕ Za is a subring of $\mathcal{H}$. It is commutative! It is clear that R is also equal to Z ⊕ Z(a − 1) and $(a - 1)^2 = d$. The kernel of the surjective ring homomorphism
\[ \varphi : \mathbb{Z}[x] \to R, \qquad \varphi(x) := a - 1 \]
contains $x^2 - d$; but $\mathbb{Z}[x]/(x^2 - d) \cong \mathbb{Z}[\sqrt{d}]$ so φ induces a surjective ring homomorphism $\bar\varphi : \mathbb{Z}[\sqrt{d}] \to R$. We have shown $\mathbb{Z}[\sqrt{d}]/I$ is finite if I is a non-zero ideal of $\mathbb{Z}[\sqrt{d}]$ so $\ker\bar\varphi$ must be zero. Hence $R \cong \mathbb{Z}[\sqrt{d}]$.

It follows that $R/(q) \cong \mathbb{Z}[\sqrt{d}]/(q)$. In $\mathbb{Z}[\sqrt{d}]$,
\[ (1 + \sqrt{d})(1 - \sqrt{d}) = 1 - d = 1 + u^2 + v^2 \]
and this integer is divisible by p. If it equals p, then p is a sum of three squares. Suppose it is not equal to p. Then p is not prime in $\mathbb{Z}[\sqrt{d}]$ because it divides neither $1 + \sqrt{d}$ nor $1 - \sqrt{d}$.

10.8. Some history. The quaternion norm is multiplicative, i.e., N(αβ) = N(α)N(β). Writing this out explicitly gives the identity
\[ (a^2 + b^2 + c^2 + d^2)(A^2 + B^2 + C^2 + D^2) = (aA - bB - cC - dD)^2 + (aB + bA + cD - dC)^2 + (aC - bD + cA + dB)^2 + (aD + bC - cB + dA)^2, \]
sometimes called Euler’s identity because he wrote it down in 1748 when trying to prove Fermat’s 1659 assertion that every positive integer is a sum of four squares. It is analogous to Brahmagupta’s identity (5-1). This identity was used by Lagrange (1736–1813) in his 1770 proof that every integer is a sum of four squares. Lagrange was an Italian, born as Giuseppe Luigi Lagrangia in Turin, and changed his name after he moved to Paris in 1787 and took French citizenship. Lagrange proved this result, known then as Bachet’s conjecture, in 1770.

In the late 1800s there were strong feelings and conflicting opinions about the quaternions. To James Clerk Maxwell, he of Maxwell’s Equations, “The invention of [...] quaternions is a step towards the knowledge of quantities related to space which can only be compared, for its importance, with the invention of triple coordinates by Descartes.” But Lord Kelvin, in an 1892 letter to Hayward, said that “Quaternions came from Hamilton after his really good work had been done; and though beautifully ingenious, have been an unmixed evil to those who have touched them in any way, including Clerk Maxwell.” To Thomas Hill, President of Harvard University from 1862 until 1868, “[...] in the great mathematical birth of 1843, the Quaternions of Hamilton, there is as much real promise of benefit to mankind as in any event of Victoria’s reign.” C.S. Peirce said his brother James¹⁷ “remained to his dying day a superstitious worshipper of two hostile gods, Hamilton and the scalar $\sqrt{-1}$.”

Quaternions became something of a fashion in the late 1800s and there was even an International Association for Promoting the Study of Quaternions and Allied Systems of Mathematics but, as with all fashions, their day passed and by the early twentieth century it was referred to as The Society for the Prevention of Cruelty to Quaternions.

Department of Mathematics, Box 354350, Univ. Washington, Seattle, WA 98195
E-mail address: smith@math.washington.edu

17Both, like their father, were mathematicians.