
Number Theory: Course notes for MA 341, Spring 2018

Jared Weinstein

May 2, 2018

Contents

1 Basic properties of the integers
  1.1 Definitions: Z and Q
  1.2 The well-ordering principle
  1.3 The division algorithm
  1.4 Running times
  1.5 The Euclidean algorithm
  1.6 The extended Euclidean algorithm
  1.7 Exercises due February 2

2 The unique factorization theorem
  2.1 Factorization into primes
  2.2 The proof that prime factorization is unique
  2.3 Valuations
  2.4 The rational root theorem
  2.5 Pythagorean triples
  2.6 Exercises due February 9

3 Congruences
  3.1 Definition and basic properties
  3.2 Solving Linear Congruences
  3.3 The Chinese Remainder Theorem
  3.4 Modular Exponentiation
  3.5 Exercises due February 16

4 Units modulo m: Fermat's theorem and Euler's theorem
  4.1 Units
  4.2 Powers modulo m
  4.3 Fermat's theorem
  4.4 The φ function
  4.5 Euler's theorem
  4.6 Exercises due February 23

5 Orders and primitive elements
  5.1 Basic properties of the function ordm
  5.2 Primitive roots
  5.3 The discrete logarithm
  5.4 Existence of primitive roots for a prime modulus
  5.5 Exercises due March 2

6 Some cryptographic applications
  6.1 The basic problem of cryptography
  6.2 Ciphers, keys, and one-time pads
  6.3 Diffie-Hellman key exchange
  6.4 RSA

7 Quadratic Residues
  7.1 Which numbers are squares?
  7.2 Euler's criterion
  7.3 Exercises due March 16

8 Quadratic reciprocity
  8.1 The Legendre symbol
  8.2 Some reciprocity laws
  8.3 The main quadratic reciprocity law
  8.4 The Jacobi symbol
  8.5 Exercises due March 23

9 The Gaussian integers
  9.1 Motivation and definitions
  9.2 The division algorithm and the gcd
  9.3 Unique factorization in Z[i]
  9.4 The factorization of rational primes in Z[i]
  9.5 Exercises due March 30

10 Unique factorization and its applications
  10.1 Pythagorean triples, revisited
  10.2 A cubic Diophantine equation
  10.3 The system Z[√−2]
  10.4 Examples of the failure of unique factorization
  10.5 The Eisenstein integers
  10.6 Exercises due April 13

11 Some analytic number theory
  11.1 Σ_p 1/p diverges
  11.2 Classes of primes, and their infinitude
  11.3 Σ_{p ≡ ±1 (mod 4)} 1/p diverges
  11.4 Exercises due April 20

12 Continued fractions and Pell's equation
  12.1 A closer look at the Euclidean algorithm
  12.2 Continued fractions in the large
  12.3 Real quadratic irrationals and their continued fractions
  12.4 Pell's equation and Z[√d]
  12.5 The fundamental unit
  12.6 The question of unique factorization for Z[√d]
  12.7 Exercises due April 27

13 Lagrange's four square theorem
  13.1 Hamiltonian quaternions
  13.2 The Lipschitz quaternions
  13.3 The Hurwitz quaternions
  13.4 Hurwitz primes
  13.5 The end of the proof

1 Basic properties of the integers

1.1 Definitions: Z and Q

Number theory is the study of the integers: ..., −3, −2, −1, 0, 1, 2, 3, ... We use the symbol Z to stand for the set of integers. (Z stands for German Zahl, meaning number.) Now might be a good time to review some set-theoretic notations: 3 ∈ Z is a true statement, meaning that 3 is a member of the integers, whereas √7 ∉ Z.

We observe that integers can be added, subtracted, and multiplied to produce other integers, but the same cannot be said for division. When we divide integers we create rational numbers, such as 3/7 and −2/3. We write the set of rational numbers as Q, for quotient. The failure of integers to divide each other evenly is so important that we have special notation to express it: for integers a and b, we write a|b to mean that b/a is an integer. In other words, a|b means that there exists c ∈ Z such that b = ac. In this case we say that a is a divisor of b, and that b is a multiple of a.

Example 1.1.1. The divisors of 12 are 1, 2, 3, 4, 6, 12 and their negatives. A divisor of a positive integer n is proper if it's positive and not equal to n itself. Thus the proper divisors of 12 are just 1, 2, 3, 4, 6.

Example 1.1.2. 1 is a divisor of every integer, as is −1. Also, every integer divides 0, since 0 = 0 · a for every a. However, the only multiple of 0 is 0 itself.

Proposition 1.1.3. Suppose that a, b, c ∈ Z. If a|b and b|c, then a|c.

Proof. There exist integers m, n such that b = am and c = bn. Then c = amn, so a|c.

The above proposition says that the relation a|b is transitive.

Proposition 1.1.4. Suppose a, b, d, x, y ∈ Z. If d|a and d|b, then d|ax + by.

We remark that ax + by is called a linear combination of a and b.

Proof. Write a = dm and b = dn; then ax + by = d(mx + ny), so d|ax + by.

A positive integer is prime if it has no proper divisors other than 1. By convention, 1 is not counted as prime.

Theorem 1.1.5 (Euclid). There are infinitely many primes.

Proof. If there were finitely many primes, then we could list all of them as p1, . . . , pn. The number N = p1 ··· pn + 1 is divisible by some prime1, which must be one of our enumerated primes, say pi. Then pi|N but also pi|p1 ··· pn. Thus pi|(N − p1 ··· pn) = 1, which is absurd.

1Strictly speaking, we don’t know this fact yet, but for now we’ll take it for granted.

Therefore we are guaranteed to never run out of primes. As of January 2018 the largest known prime is

2^77,232,917 − 1.

This is a Mersenne prime, meaning a prime which is one less than a power of two. It is not known if there are infinitely many Mersenne primes.

1.2 The well-ordering principle

How do we know that every integer n > 1 is divisible by a prime? An argument might go this way: if n isn't itself prime, then it has a proper divisor n1 > 1. If n1 isn't prime, then it has a proper divisor n2 > 1, and so on. The result is that we get a strictly decreasing sequence of positive integers n > n1 > n2 > . . . , which cannot go on indefinitely. This fact, obvious as it may be, is quite important. We give it a name: the well-ordering principle.

Axiom 1.2.1 (The well-ordering principle).2 A strictly decreasing sequence of positive integers cannot go on indefinitely.

Rather than attempt to prove this statement, we take it as an axiom of the system of integers.

1.3 The division algorithm

We noted before that the integers are not closed under division. But there is a familiar operation among integers: you can divide one by another to obtain a quotient and a remainder. For instance, when 39 is divided by 5, the quotient is 7 and the remainder is 4. We can check this by verifying that 39 = 5 · 7 + 4. When this is done, the remainder must be less than the number you divided by. It would be incorrect to say that 5 goes into 39 with a quotient of 6 and a remainder of 9, even though 39 = 5 · 6 + 9 is also true.

Theorem 1.3.1 (The division algorithm). Let a, b ∈ Z, with b > 0. There exists a unique pair of integers q, r ∈ Z such that a = bq + r and that 0 ≤ r < b.

Of course, if the remainder r is 0, then a = bq and therefore b|a.

2There is another formulation: every nonempty subset of the positive integers has a least element. The two formulations are equivalent.

Proof. We'll assume that a is positive; the other cases are similar. Consider the sequence a, a − b, a − 2b, a − 3b, . . . . By the well-ordering principle, these cannot all be nonnegative integers. So there is a least one which is nonnegative; call it r = a − bq. If r ≥ b, then a − b(q + 1) = r − b ≥ 0, which contradicts our assumption that r was the least nonnegative element of our sequence. Therefore r < b. That handles the existence part of the theorem. For uniqueness: if there were another pair q′, r′ such that a = bq + r = bq′ + r′, then r − r′ = b(q′ − q) would be a multiple of b, but since 0 ≤ r, r′ < b, this can only happen if r = r′, which implies q = q′ as well.

This proof gives a hint to the "algorithm" part of the division algorithm: to divide 5 into 39, keep subtracting 5 from 39 to get 34, 29, 24, 19, 14, 9, 4, at which point we cannot subtract anymore and 4 is the remainder. One says that just as multiplication is repeated addition, division is repeated subtraction. I want to introduce an important piece of notation: if r is the remainder when b is divided into a, we sometimes write a mod b = r, especially if the remainder is all we care about. You already do this with time: 17 hours after 2 o'clock is 19 mod 12 = 7 o'clock. (Or substitute 24 for 12 if you use that system.) We say that r is the residue of a modulo b. It is always between 0 and b − 1 inclusive.
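If you want to experiment with the division algorithm, here is one way it might look in Python (a rough sketch; the function name divide is just for illustration). It finds q and r by repeated subtraction, exactly as in the proof, and compares the result with Python's built-in divmod.

def divide(a, b):
    """Return (q, r) with a = b*q + r and 0 <= r < b, assuming a >= 0 and b > 0."""
    q, r = 0, a
    while r >= b:              # keep subtracting b until the remainder is less than b
        r -= b
        q += 1
    return q, r

assert divide(39, 5) == (7, 4)         # 39 = 5 * 7 + 4
assert divide(39, 5) == divmod(39, 5)  # agrees with the built-in quotient/remainder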

1.4 Running times

Of course in practice when you want to divide larger numbers, like 114 into 395623945, you don't subtract repeatedly at all. Instead you perform an

algorithm known as long division, which looks like this:

      3470385
114 ) 395623945
      342000000
       53623945
       45600000
        8023945
        7980000
          43945
          34200
           9745
           9120
            625
            570
             55

Thus the quotient is 3470385 and the remainder is 55. This may look laborious, but you could probably do it by hand in just a few minutes. Contrast this with the repeated subtraction method. You would have had to subtract 114 from 395623945 a total of 3470385 times – even if you could do one subtraction every second, it would take 40 days!

In our applications to cryptography, it will be important to keep track of how long it takes for a person (or a computer) to run a particular algorithm, in terms of how many basic operations are performed as a function of how long the inputs are. In the case of our long division problem, there were 3 + 9 = 12 inputs (the total number of digits in 114 and 395623945). If a basic operation means adding, subtracting, or multiplying individual digits, then the long division algorithm took dozens of operations, while the repeated subtraction algorithm took millions of operations. One says that long division is a polynomial time algorithm, but repeated subtraction is exponential time.

Behind any abstract theorem in number theory there is often an algorithmic question. For instance, we just saw that every integer n > 1 has a prime divisor. Is there a fast algorithm to find one? One simple method is to try dividing 2, 3, 4, . . . , n − 1, n into n to see if any of these are divisors; the first one that divides n evenly will be prime (why?). Such an algorithm could require about n steps. When n has hundreds of digits, this is completely impractical. We can save some time by noting that if we reach √n without finding any factors, then n must be prime, which limits the number of steps to

about √n. That seems like it should help a lot, until you figure that if n has 200 digits, then √n has about 100. Computers these days are fast, but no computer out there can execute 10^100 steps in any reasonable amount of time.
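To make the running-time comparison concrete, here is a small Python sketch of trial division up to √n (the function name smallest_prime_factor is mine, chosen for illustration). It performs about √n divisions in the worst case, which is exactly the bottleneck described above.

def smallest_prime_factor(n):
    """Smallest prime divisor of an integer n > 1, found by trial division up to sqrt(n)."""
    d = 2
    while d * d <= n:          # roughly sqrt(n) iterations in the worst case
        if n % d == 0:
            return d           # the first divisor found is necessarily prime
        d += 1
    return n                   # no divisor up to sqrt(n), so n itself is prime

assert smallest_prime_factor(91) == 7
assert smallest_prime_factor(97) == 97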

1.5 The Euclidean algorithm

Given positive integers a and b, a common divisor is an integer d such that d|a and d|b. The greatest common divisor (gcd) is of course the greatest of these. This comes up in simplifying fractions: to reduce 18/12 you have to divide both numerator and denominator by their gcd, which is 6, to get 3/2. If gcd(a, b) = 1, we say that a and b are relatively prime or coprime. If a and b are large numbers, how do we compute gcd(a, b)? One way would be to count down from the smaller of the two numbers, and stop at the first one which divides them both. But if the smaller number has 100 digits, then this process will take about 10^100 steps, which is far too long. The Euclidean algorithm is a very efficient way to compute gcd(a, b) without having to factor either number. It rests on repeated application of the division algorithm (which we already noted runs in polynomial time). It's best illustrated by example. Suppose we want gcd(119, 259). We calculate:

259 = 2 · 119 + 21
119 = 5 · 21 + 14
21 = 1 · 14 + 7
14 = 2 · 7 + 0.

Note that in each iteration, the divisor and remainder become the dividend and divisor in the next step. The last non-zero remainder is 7, which is the gcd we wanted! The algorithm works because of the following lemma:

Lemma 1.5.1. For integers a, b, q, r with a = bq + r, we have gcd(a, b) = gcd(b, r).

Proof. Let d = gcd(a, b) and e = gcd(b, r). We'll show that d ≤ e and e ≤ d, which will do the trick. First let's show that d ≤ e. Since d divides a and b, it divides r = a − bq, which is a linear combination of a and b. Thus d is a common divisor of b and r. Therefore it cannot exceed the greatest common divisor of b and r, which is e.

Now let's show that e ≤ d. Since e divides b and r, it divides a = bq + r, which is a linear combination of b and r. Thus e is a common divisor of a and b. Therefore it cannot exceed the greatest common divisor of a and b, which is d.

Thus in the example, gcd(259, 119) = gcd(119, 21) = gcd(21, 14) = gcd(14, 7) = gcd(7, 0) = 7. I should note here that as long as the remainder is nonzero, the algorithm can continue to produce a smaller remainder. By the well-ordering principle, the remainders cannot decrease forever, and so eventually one arrives at a remainder of 0. Finally, note that gcd(r, 0) = r for any nonzero r. It turns out that Euclid’s algorithm runs in polynomial time. Computers can easily compute gcd(a, b) even if a and b have hundreds of digits. To get a sense of why Euclid’s algorithm runs quickly, let us examine the following worst case scenario, in which we compute gcd(55, 34):

55 = 1 · 34 + 21
34 = 1 · 21 + 13
21 = 1 · 13 + 8
13 = 1 · 8 + 5
8 = 1 · 5 + 3
5 = 1 · 3 + 2
3 = 1 · 2 + 1
2 = 2 · 1 + 0

We computed gcd(55, 34) = 1 in 8 iterations, whereas gcd(259, 119) = 7 took only 4. Notice that the quotient was 1 each time we divided (except the last one), which means that the remainders go down as slowly as possible. We got this result because we used consecutive numbers in the Fibonacci sequence 1, 1, 2, 3, 5, 8, . . . , in which each number is the sum of the two previous numbers. As a result, computing gcd(a, b) can be done in at most n iterations, where the nth number in the Fibonacci sequence is larger than a and b.
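For comparison with the hand computation, here is a minimal Python version of the Euclidean algorithm (a sketch, not part of the original text): each pass replaces (a, b) by (b, a mod b), just as in the displayed equations.

def gcd(a, b):
    """Greatest common divisor of nonnegative integers, by the Euclidean algorithm."""
    while b != 0:
        a, b = b, a % b        # (a, b) becomes (b, r), where a = qb + r
    return a

assert gcd(259, 119) == 7
assert gcd(55, 34) == 1        # the Fibonacci worst case above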

1.6 The extended Euclidean algorithm

The integers 49 and 40 are relatively prime, so it's no surprise that the Euclidean algorithm produces 1:

49 = 1 · 40 + 9
40 = 4 · 9 + 4
9 = 2 · 4 + 1
4 = 4 · 1 + 0

Now look at the sequence of quotients: 1, 4, 2, 4. It turns out that this sequence "encodes" the numbers we started with. Place them in the top row of a table like so:

       1    4    2    4
1  0
0  1

Proceeding from left to right, we fill in the blanks as follows. The first number of the top row is 1. Use the two numbers in the second row immediately preceding this column to make a number like this: 1 · 0 + 1 = 1. Then 4 · 1 + 0 = 4, so we put that in the next spot. Filling out everything like this gives us

       1    4    2    4
1  0   1    4    9   40
0  1   1    5   11   49

The final column has 40, 49, which of course are the numbers we started with. The second-to-last column has 9, 11. Observe that

49 · 9 − 40 · 11 = 1.

This method, called the extended Euclidean algorithm, gives a practical means of finding a solution to the equation

ax + by = 1

when gcd(a, b) = 1. Now let's try a = 259 and b = 119, like in our previous example. The sequence of quotients is 2, 5, 1, 2 and the gcd is 7. The extended Euclidean algorithm gives us

       2    5    1    2
1  0   1    5    6   17
0  1   2   11   13   37

The numbers in the last column are 17 = 119/7 and 37 = 259/7. That is, we got the numbers we started with, divided out by their gcd. The second-to-last column has 6 and 13, and

37 · 6 − 17 · 13 = 1, and multiplying both sides by 7 gives

259 · 6 − 119 · 13 = 7.

Theorem 1.6.1 (Bezout's identity). Let a and b be positive integers. There exist integers x, y such that ax + by = gcd(a, b).

Proof. If you believe that the extended Euclidean algorithm works, you may be satisfied already. But here is an independent proof: Among all positive linear combinations ax + by, there is a smallest one, say ax + by = d. Certainly gcd(a, b)|d. Let's perform the division algorithm with a and d: a = dq + r, with 0 ≤ r < d. Then

r = a − dq = a − (ax + by)q = a(1 − xq) − bqy is also a linear combination of a and b. Since d was assumed least among all positive linear combinations, and r < d, the only way this is possible is if r = 0. Thus d|a. Similarly d|b, which means d ≤ gcd(a, b). Combining this with gcd(a, b)|d gives d = gcd(a, b).
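The table method can also be phrased recursively. The following Python sketch (my own arrangement of the same idea, not code from the notes) returns gcd(a, b) together with integers x, y witnessing Bezout's identity.

def extended_gcd(a, b):
    """Return (g, x, y) with g = gcd(a, b) and a*x + b*y = g."""
    if b == 0:
        return a, 1, 0
    g, x, y = extended_gcd(b, a % b)    # b*x + (a % b)*y = g
    return g, y, x - (a // b) * y       # rearranged: a*y + b*(x - (a//b)*y) = g

g, x, y = extended_gcd(49, 40)
assert 49 * x + 40 * y == 1
g, x, y = extended_gcd(259, 119)
assert g == 7 and 259 * x + 119 * y == 7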

1.7 Exercises due February 2

1. The proper divisors of 6 are 1,2,3. We have 1 + 2 + 3 = 6, meaning that 6 is a perfect number. Verify that 28 and 496 are also perfect.

2. The ancient Greeks divided integers n into perfect (sum of proper divisors is n), abundant (sum of divisors is > n), and deficient (sum of divisors is < n). Classify each of the numbers 2, 3,..., 20 into one of these three classes.

3. Suppose that p = 2^n − 1 is a Mersenne prime. Prove that 2^(n−1) p is a perfect number.

4. Prove that if a, b, c, d ∈ Z and a|b and c|d, then ac|bd.

5. Let p1, . . . , pn be distinct primes. How many positive divisors does p1 ··· pn have?

6. True or false: the rational numbers Q obey the well-ordering principle. Explain your reasoning.

7. What is the remainder when 2^100 is divided by 5? (Find a pattern in the first few powers of 2.)

8. Use the Euclidean algorithm to compute gcd(527, 408) and gcd(1001, 121).

9. Use the extended Euclidean algorithm to find integers x and y such that 527x + 408y = gcd(527, 408).

10. Let a and b be integers. Show that any common divisor of a and b must divide gcd(a, b).

2 The unique factorization theorem

2.1 Factorization into primes

Lemma 2.1.1. Every positive integer can be expressed as a product of primes.

(Even 1 is a product of primes: it is the empty product, so to speak. And 17 is a product of primes too, but just one of them. So one must interpret the lemma to mean “every positive integer can be expressed as a product of zero or more primes.”)

Proof. Let n ∈ Z be positive. If n = 1, we're done. Otherwise we can find a prime divisor p1|n. Write n = p1n1, where n1 < n. If n1 = 1, we're done. Otherwise we can find a prime divisor p2|n1; write n1 = p2n2, with n2 < n1. Continuing, we get a sequence of descending positive integers n > n1 > n2 > . . . , which cannot go on forever. Thus there exists t for which nt = 1, and then n = p1p2 ··· pt.

The proof even suggests a sort of algorithm for factoring a number into primes: keep dividing out prime factors until you've completely factored the number. For instance,

72 = 2 · 36 = 2 · 2 · 18 = 2 · 2 · 2 · 9 = 2 · 2 · 2 · 3 · 3 = 2^3 · 3^2.

The process produces the same result no matter how we factor the number. Here's another way:

72 = 3 · 24 = 3 · 3 · 8 = 3 · 3 · 2 · 4 = 3 · 3 · 2 · 2 · 2 = 2^3 · 3^2.

Perhaps this isn’t so surprising. But how do we really know that you get the same prime factorization no matter what? Could there be a particular num- ber n, possibly with hundreds of digits, which has two prime n = p1p2 = q1q2, with all four primes p1, p2, q1, q2 distinct?

2.2 The proof that prime factorization is unique

All will rest upon the following lemma.

Lemma 2.2.1. Let a, b, c ∈ Z, with a|bc and (a, b) = 1. Then a|c.

Proof. Crucially, we use Bezout’s identity (Theorem 1.6.1). There exist x, y ∈ Z with ax + by = 1. Multiplying by c, we get acx + bcy = c. We have a|bc, so that a|bcy. Obviously a|acx, so a|acx + bcy = c.

Corollary 2.2.2. Let a, b ∈ Z. If p is a prime number and p|ab, then p|a or p|b.

Proof. We will show that if p ∤ a then p|b. If p ∤ a, then gcd(p, a) = 1, in which case the preceding lemma shows that p|b.

From this it is easy to see that if p divides an arbitrary product then p must divide one of the factors.

Theorem 2.2.3 (Unique Factorization Theorem). Every positive integer can be written as a product of primes in a unique way, up to ordering.

Proof. If p1 ··· pt = q1 ··· qs for primes p1, . . . , pt, q1, . . . , qs, then pt divides the product q1 ··· qs, so that it must divide one of the factors. Without loss of generality, pt|qs. But these are primes, so we must have pt = qs. Removing this factor gives p1 ··· pt−1 = q1 ··· qs−1. Continuing, we are able to match up each p with a q until no further factors remain.

2.3 Valuations

The Unique Factorization Theorem shows that every n ≥ 1 can be written

n = ∏_p p^(ap),

where p runs over primes and ap is a nonnegative integer. It must be the case that ap = 0 for all but finitely many primes, so that the product can make sense. Since prime factorization is unique, the ap are uniquely determined by n, and so it makes sense to define

valp(n) = ap,

the valuation of n at p. For instance, 75 = 3 · 5^2, so val3(75) = 1 and val5(75) = 2, whereas valp(75) = 0 for every other prime p. You can extend this definition to include negative n as well: valp(−n) = valp(n). You can even extend it to include 0. We set valp(0) = ∞. (Why is this the right definition?) The function valp obeys the following rules:

valp(mn) = valp(m) + valp(n)
valp(m^k) = k valp(m),

which makes it similar to the logarithm to base p. Here are some basic facts about valp:

Theorem 2.3.1. Let a, b ∈ Z.

1. a|b if and only if, for all primes p, valp(a) ≤ valp(b).

2. valp(gcd(a, b)) = min {valp(a), valp(b)} .

3. valp(lcm(a, b)) = max {valp(a), valp(b)}.

4. If a > 0, then a is a perfect kth power if and only if, for all primes p, k | valp(a).

I encourage you to think about why these facts are true, and to work with some examples. For instance, the gcd of 2^5 · 3 · 5^4 and 3^2 · 5^3 is 3 · 5^3. A consequence of (2) is that gcd(a, b) = 1 if and only if, for all primes p, either valp(a) or valp(b) is 0.

Theorem 2.3.2. For a, b ∈ Z positive, gcd(a, b) lcm(a, b) = ab.

Proof. The valp of the left hand side is min {valp(a), valp(b)} + max {valp(a), valp(b)} = valp(a) + valp(b) (why?), which is the same as valp(ab).

Theorem 2.3.3. Let a and b be coprime positive integers. If ab is a perfect square, then so are a and b.

Proof. Since ab is a perfect square, valp(ab) = valp(a) + valp(b) is even for all p. Then since one of valp(a) and valp(b) has to be 0, both must be even. This shows by point (4) above that a and b are perfect squares.
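The valuation is easy to compute directly. Here is a Python sketch (the function name val is mine), using the convention valp(0) = ∞ mentioned above:

import math

def val(p, n):
    """The exponent of the prime p in the factorization of n; val(p, 0) is infinity."""
    if n == 0:
        return math.inf
    n = abs(n)                 # valp(-n) = valp(n)
    count = 0
    while n % p == 0:
        n //= p
        count += 1
    return count

assert val(3, 75) == 1 and val(5, 75) == 2 and val(2, 75) == 0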

2.4 The rational root theorem

This is a classic example of proof by contradiction.

Theorem 2.4.1. √2 is irrational.

Proof. Assume that √2 is rational. Then √2 = p/q for positive p, q ∈ Z. Then p^2 = 2q^2. Since 2|p^2, Corollary 2.2.2 shows that 2|p; i.e. p is even. Write p = 2p0; then q^2 = 2p0^2. The same reasoning shows that q is even. Write q = 2q0, and then p0^2 = 2q0^2. But this is the original equation! Repeating the process gives a descending sequence of positive integers p > p0 > p1 > . . . , which is impossible.

It may have occurred to you to avoid the use of the well-ordering principle in this proof by arguing as follows: express p/q in lowest terms, show that p and q are both even, and then draw a contradiction. To do this, though, we need to know that it is possible to express a fraction in lowest terms in the first place! This is the point of the following theorem:

Theorem 2.4.2. If gcd(p, q) = d, then gcd(p/d, q/d) = 1.

Then if p/q is a rational number, we can let d = gcd(p, q), and then after writing p = dp0 and q = dq0, then gcd(p0, q0) = 1, and p0/q0 is in lowest terms.

Proof. We can write px + qy = d for some integers x and y, and then p0x + q0y = 1, which shows that gcd(p0, q0) = 1.

But let’s return to the subject√ of irrationality. A variation of the above proof can be used to show that 3 and 71/3 are irrational too. These are examples of algebraic numbers, a class of complex numbers which include √ √ p √ combinations like 2 + 3, 3 + 7 − 2. A number is algebraic if it is the root of a polynomial with integer coefficients. Theorem 2.4.3 (Rational Root Theorem). Suppose the polynomial

n n−1 f(x) = anx + an−1x + ··· + a0 has coefficients ai ∈ Z. If p/q is a in lowest terms which is a root of f(x), then q|an and p|a0. Proof. The fact that p/q is a root of f(x) means that f(p/q) = 0. After clearing away denominators, this becomes

n n−1 n−1 n anp + an−1p q + ··· + a1pq + a0q = 0.

15 Since p divides all terms other than the last one, it divides the last one as n well: p|a0q . But by Theorem 2.2.2, p|a0 (remember that gcd(p, q) = 1). The proof that q|an is similar. The Rational Root Theorem gives a method for finding all rational roots p/q of a polynomial with integer coefficients, since the possibilities for p√and q are limited. We can also√ use the Rational Root Theorem√ to show 2 is irrational in another way. 2 is a root of x2 − 2. If 2 = p/q in lowest terms, then p|2 and q|1, which implies that p/q = ±2. But this is nonsense, √ √ since 2 6= ±2! The same proof can be used to show that n is irrational whenever n is not a perfect square.
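Since the candidates p and q are limited to divisors of a0 and an, a computer can simply test them all. Here is a Python sketch along those lines (the function name rational_roots is mine; it assumes the constant term a0 is nonzero):

from fractions import Fraction

def rational_roots(coeffs):
    """Rational roots of a_n x^n + ... + a_0; coeffs listed from a_n down to a_0, with a_0 != 0."""
    a_n, a_0 = coeffs[0], coeffs[-1]
    roots = set()
    for p in range(1, abs(a_0) + 1):
        if a_0 % p != 0:
            continue                       # p must divide the constant term
        for q in range(1, abs(a_n) + 1):
            if a_n % q != 0:
                continue                   # q must divide the leading coefficient
            for r in (Fraction(p, q), Fraction(-p, q)):
                if sum(c * r ** k for k, c in enumerate(reversed(coeffs))) == 0:
                    roots.add(r)
    return roots

assert rational_roots([2, -1, -1]) == {Fraction(1), Fraction(-1, 2)}   # 2x^2 - x - 1
assert rational_roots([1, 0, -2]) == set()                             # x^2 - 2 has no rational roots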

2.5 Pythagorean triples

A Pythagorean triple is a list (a, b, c) of integers which satisfy

a^2 + b^2 = c^2,

so that a, b, c could be the lengths of sides of a right triangle. This is an example of a Diophantine equation: a polynomial equation meant to be solved for integer variables. This particular Diophantine equation is truly old, the solution (3, 4, 5) being known to the ancient Egyptians. Other familiar solutions are (5, 12, 13) and (6, 8, 10). The point of this discussion is to find all the Pythagorean triples. Note that if a prime p divides two of the three numbers, then it divides the third (Theorem 2.2.2 again). Let's call a triple primitive if gcd(a, b, c) = 1. Then in a primitive triple, all pairs (a, b), (a, c), (b, c) are coprime as well. It suffices to find all the primitive triples, because any other triple is just a multiple of a primitive one.

Suppose (a, b, c) is primitive. Then a and b can't both be even. But they can't both be odd either: if a = 2m + 1 and b = 2n + 1 are odd, then c = 2c0 is even, and substituting gives

4m^2 + 4m + 1 + 4n^2 + 4n + 1 = 4c0^2,

or

2(m^2 + m + n^2 + n) + 1 = 2c0^2,

which is impossible. So a and b have opposite parities. Without loss of generality, say a is odd and b is even. We have

a^2 = c^2 − b^2 = (c + b)(c − b).

Since gcd(b, c) = 1, gcd(c + b, c − b) is 1 or 2 (Exercise 3). But we can rule out 2, since (c + b)(c − b) = a^2 is odd. So in fact gcd(c + b, c − b) = 1. Now by Theorem 2.3.3, c + b = p^2 and c − b = q^2 for positive integers p, q. These have to be odd and relatively prime. Solving, we get c = (p^2 + q^2)/2, b = (p^2 − q^2)/2, and a = pq.

Theorem 2.5.1. As p and q run through pairs of odd coprime integers with p > q, the triple (pq, (p^2 − q^2)/2, (p^2 + q^2)/2) runs through all primitive Pythagorean triples (up to switching the a and b coordinates).
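Theorem 2.5.1 can be turned into a little generator of primitive triples. A Python sketch (the bound parameter and the restriction p > q are mine, added so that the list is finite and b is positive):

from math import gcd

def primitive_triples(bound):
    """Primitive Pythagorean triples (pq, (p^2-q^2)/2, (p^2+q^2)/2) for odd coprime p > q, p <= bound."""
    triples = []
    for p in range(3, bound + 1, 2):
        for q in range(1, p, 2):
            if gcd(p, q) == 1:
                triples.append((p * q, (p * p - q * q) // 2, (p * p + q * q) // 2))
    return triples

assert (3, 4, 5) in primitive_triples(5)       # p = 3, q = 1
assert all(a * a + b * b == c * c for (a, b, c) in primitive_triples(15))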

2.6 Exercises due February 9

1. How many (positive) divisors does the number 2^5 · 3^7 · 5 · 11^6 have?

2. Prove that if a, b, c ∈ Z, then gcd(ab, ac) = a gcd(b, c).

3. Prove that if a, b ∈ Z are coprime then gcd(a + b, a − b) is either 1 or 2.

4. Let a, b, c ∈ Z. Prove that if gcd(a, b) = 1, a|c, and b|c, then ab|c.

5. Prove that if ab is a perfect cube and gcd(a, b) = 1, then a and b are both perfect cubes.

6. Find all rational roots of 3x^3 + x^2 + x − 2.

7. Prove that √2 + √3 is irrational.

8. Show that if a and b are integers and a^n|b^n, then a|b. (There are multiple ways to do this. One quick way is to use the rational root theorem!)

9. When the number 30! is written out in base 10, how many zeros are at the end?

10. Is it possible to write 50 as the difference between two perfect squares?

3 Congruences

3.1 Definition and basic properties

Definition 3.1.1. For integers a, b, m, we write a ≡ b (mod m) (pronounced: a is congruent to b modulo m) if m|a − b.

The notation here suggests that somehow a and b are equal in a funny way. Indeed you probably already have a notion of taking a number modulo 12 (or 24) when you think about the clock: The clock looks the same when 100 hours pass as when 4 hours pass, because 100 ≡ 4 (mod 12). Or if you think about numbers as being even or odd: a ≡ b (mod 2) means that a and b have the same parity (they are either both odd or both even). The notion that a ≡ b (mod m) is a sort of equality can be formalized by checking the following properties:

1. (Reflexivity) a ≡ a (mod m).

2. (Symmetry) If a ≡ b (mod m) then b ≡ a (mod m).

3. (Transitivity) If a ≡ b (mod m) and b ≡ c (mod m) then a ≡ c (mod m).

4. If a ≡ b (mod m) then:

a + c ≡ b + c (mod m)
a − c ≡ b − c (mod m)
ac ≡ bc (mod m)

The first three properties express the fact that ≡ is an equivalence relation. This means that you can treat the ≡ symbol much like the = symbol, at least when it comes to substituting equals for equals. The fourth property means that when it comes to congruences you can add, subtract or multiply by c on both sides and the congruence will remain true. You should be able to come up with short proofs of the above properties. For instance, here's a proof of 4(a): If a ≡ b (mod m) it means that m|(a − b) = (a + c) − (b + c), so a + c ≡ b + c (mod m).

3.2 Solving Linear Congruences

The rules we outlined above enable us to solve for x in congruences like

x + 3 ≡ 1 (mod 10).

Namely, you can subtract 3 from both sides to get x ≡ −2 (mod 10), which is the same as x ≡ 8 (mod 10). But if the equation is

3x ≡ 2 (mod 10),

we cannot "divide by 3" on both sides just yet because "1/3" doesn't have any meaning modulo 10 (at least until we give it meaning). We can try plugging in x = 0, 1, . . . , 9 to see that there is just one solution x ≡ 4 (mod 10). Here's another example: 2x ≡ 4 (mod 10). There's the obvious solution x ≡ 2 (mod 10), but then there's also x ≡ 7 (mod 10). Those are the only solutions modulo 10. You can also say that the complete solution is x ≡ 2 (mod 5). Finally, look at 2x ≡ 3 (mod 10). This time there are no solutions at all! Thus a linear congruence can have zero, one, or more than one solution.

Theorem 3.2.1. The congruence ax ≡ b (mod m) has a solution if and only if gcd(a, m)|b. If a solution exists, then it is unique modulo m/ gcd(a, m). In particular if gcd(a, m) = 1 then a solution always exists and is unique modulo m.

Proof. Let’s begin with the case that gcd(a, m) = 1. Then there exist x, y ∈ Z with aX +mY = 1. But then m|mY = aX −1, so that aX ≡ 1 (mod m). We can multiply this by b to get a(bX) ≡ b (mod m). Therefore x = bX is a solution. If x0 is another solution, then ax ≡ ax0 (mod m), so m|a(x−x0). Since gcd(a, m) = 1, m|x − x0 and so x ≡ x0 (mod m). We have shown that the solution is unique in this case. In the general case, let d = gcd(a, m). The congruence ax ≡ b (mod m) means that m|ax − b. Since d|m and |a, we also have d|b. Thus shows that if there is a solution we must have d|b. Supposing then that d|b, let a = da0, b = db0 and m = dm0. The statement m|ax − b is equivalent to m0|a0x − b0, or a0x ≡ b0 (mod m0). But now gcd(a0, m0), so this new congruence has a unique solution modulo m0.

3.3 The Chinese Remainder Theorem

This section is concerned with solving simultaneous congruences such as

x ≡ 2 (mod 7)
x ≡ 5 (mod 6),

where x needs to satisfy both congruences at the same time. We might proceed by listing the solutions to the first congruence: 2, 9, 16, 23, . . . and stopping at the first one that satisfies the second, which is 23. Here's a different one:

x ≡ 2 (mod 8)
x ≡ 3 (mod 10).

This one does not have any solutions, since those x which satisfy the first congruence are even, and those satisfying the second congruence must be odd. First we’ll handle the situation that m and n are coprime.

Theorem 3.3.1. Let m and n be coprime integers. Then the system of congruences

x ≡ a (mod m)
x ≡ b (mod n)

has a unique solution modulo mn.

Proof. First we'll show that a solution exists, and then we'll show it's unique mod mn. Since m and n are coprime, there exist integers y and z such that my + nz = 1. Then my ≡ 1 (mod n) and nz ≡ 1 (mod m). So

x = anz + bmy satisfies x ≡ a (mod m) and x ≡ b (mod n). For uniqueness: if x0 is another solution, then x − x0 ≡ 0 (mod m) and x − x0 ≡ 0 (mod n). That is, x − x0 is divisible by m and n. Since m and n are relatively prime, x−x0 is divisible by mn, so that x ≡ x0 (mod mn).

The proof suggests a practical solution to the system of congruences: use the Extended Euclidean Algorithm to find y and z such that my + nz = 1, and then use the formula for x above. If m and n are not necessarily relatively prime, say d = gcd(m, n), then the simultaneous congruence cannot have a solution unless d|a − b.
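Here is the same recipe in Python (a sketch; it assumes gcd(m, n) = 1 and reuses the extended Euclidean helper from Section 1.6):

def crt(a, m, b, n):
    """The unique x modulo m*n with x ≡ a (mod m) and x ≡ b (mod n), for coprime m and n."""
    def extended_gcd(a, b):
        if b == 0:
            return a, 1, 0
        g, x, y = extended_gcd(b, a % b)
        return g, y, x - (a // b) * y

    g, y, z = extended_gcd(m, n)               # m*y + n*z = 1 because gcd(m, n) = 1
    assert g == 1, "m and n must be coprime"
    return (a * n * z + b * m * y) % (m * n)   # the formula x = anz + bmy from the proof

x = crt(2, 7, 5, 6)
assert x == 23 and x % 7 == 2 and x % 6 == 5   # the example solved by listing above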

3.4 Modular Exponentiation

We have already remarked that the division algorithm runs very fast. The operation a mod m can be computed in polynomial time, so that it is reasonable to compute even if a and m have hundreds of digits. The same is true for modular exponentiation, meaning the computation of a^n (mod m). We demonstrate with the example of 3^165 (mod 100). That is, we want the last two digits of 3^165. Certainly we could compute 3^165 and simply write down the last two digits, but this is impractical when the exponent is very large. Instead, we write the exponent in binary:

165 = 2^7 + 2^5 + 2^2 + 1.

Now the idea is to square the base repeatedly:

3 ≡ 3 (mod 100)
3^2 ≡ 9
3^(2^2) ≡ 81
3^(2^3) ≡ 61
3^(2^4) ≡ 21
3^(2^5) ≡ 41
3^(2^6) ≡ 81
3^(2^7) ≡ 61

Then

3^165 = 3^(2^7) · 3^(2^5) · 3^(2^2) · 3 ≡ 61 · 41 · 81 · 3 ≡ 43 (mod 100).

The number of times you have to square the base is at most the number of binary digits of the exponent, which is proportional to the number of decimal digits. Thus this method can handle exponents which have hundreds of digits. This fact is important for cryptography: it is much easier to exponentiate than it is to do the reverse (extract a root).
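In Python the method looks like this (a sketch of repeated squaring; the built-in three-argument pow already does the same thing):

def power_mod(base, exponent, modulus):
    """Compute base**exponent modulo modulus by repeated squaring."""
    result = 1
    base %= modulus
    while exponent > 0:
        if exponent & 1:                       # if the current binary digit is 1,
            result = (result * base) % modulus # multiply this power of the base in
        base = (base * base) % modulus         # square, moving to the next binary digit
        exponent >>= 1
    return result

assert power_mod(3, 165, 100) == 43            # the worked example above
assert power_mod(3, 165, 100) == pow(3, 165, 100)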

3.5 Exercises due February 16

For 1–4, if it's true, prove it, and if it's false, give a counterexample.

1. True or False: If a ≡ b (mod m) and c ≡ d (mod n) then ac ≡ bd (mod mn).

2. True or False: If a ≡ b (mod m) and c ≡ d (mod m) then ac ≡ bd (mod m).

3. True or False: the only solutions to x2 ≡ 1 (mod n) are x ≡ ±1.

4. True or False: if b ≡ c (mod m), then ab ≡ ac (mod m).

5. The multiplicative inverse of a (mod m) is an integer b such that ab ≡ 1 (mod m). Prove that the multiplicative inverse, if it exists, is unique modulo m.

6. Solve 15x ≡ 4 (mod 79).

7. Solve the system of congruences:

z ≡ 1 (mod 50)
z ≡ −1 (mod 71)

8. Compute 3^301 (mod 501).

9. Let n ≥ 0 be an integer, and let m = 2^n + 1. Show that 2^(2n) ≡ 1 (mod m).

10. Let (a, b, c) be a Pythagorean triple. Show that 60|abc.

4 Units modulo m: Fermat’s theorem and Euler’s theorem

4.1 Units

For integers a, b and m, we say that b is a (multiplicative) inverse to a modulo m if ab ≡ 1 (mod m). Of course the relation is mutual: if b is an inverse to a, then a is an inverse to b. You have already seen that an inverse is unique if it exists.

Theorem 4.1.1. a has a multiplicative inverse modulo m if and only if gcd(a, m) = 1.

Proof. This is just a special case of a prior theorem: ax ≡ 1 (mod m) has a solution if and only if gcd(a, m)|1, which is to say gcd(a, m) = 1.

An integer a that has a multiplicative inverse modulo m is called a unit modulo m. The most important thing about units is that they can be canceled from both sides of a congruence. That is, if a is a unit modulo m, and ax ≡ ay (mod m), then we can multiply both sides by the inverse of a to get x ≡ y (mod m).

Theorem 4.1.2. The set of units modulo m is closed under multiplication.

Proof. If a and b have inverses c and d, then ab is also a unit, since (ab)(cd) = (ac)(bd) ≡ 1 (mod m).

Let Um be the set of units modulo m. (This set is also written (Z/mZ)^×.) The above theorem means we can create multiplication tables modulo m, like this one for m = 10:

      1   3   7   9
  1   1   3   7   9
  3   3   9   1   7
  7   7   1   9   3
  9   9   7   3   1

Observe that every row and every column contains every unit exactly once. (Sometimes I call this the "sudoku property".) This reflects the fact that if a is a unit mod m, then the linear equation ax ≡ b (mod m) has a unique solution modulo m. Notice also that the table is symmetric about its diagonal: this reflects the fact that ab = ba (multiplication is commutative). In abstract algebra we call this sort of structure an abelian group. Easy and important exercise: Construct a table like this for m = 5, m = 7 and m = 12. Take note of any patterns you observe.

4.2 Powers modulo m

Let a be an integer considered modulo m, and consider the sequence of powers a, a^2, a^3, . . . (mod m). For instance, here are the powers of 2 modulo m for three values of m:

 m   2^1  2^2  2^3  2^4  2^5  2^6  2^7  2^8  2^9  2^10
15    2    4    8    1    2    4    8    1    2    4
16    2    4    8    0    0    0    0    0    0    0
17    2    4    8   16   15   13    9    1    2    4

The first thing we can prove about this is that since there are only finitely many residues modulo m, and infinitely many possible powers, we can find N > n with a^N ≡ a^n (mod m). But then, multiplying by a gives a^(N+1) ≡ a^(n+1) as well, and so on; we infer that the sequence a^n, a^(n+1), . . . , a^(N−1) (mod m) is the same as the sequence a^N, a^(N+1), . . . , a^(2N−n−1). In conclusion, the sequence of powers of a modulo m must eventually enter a repeating cycle.

A special case occurs when a is a unit modulo m. Then we can cancel the excess powers in a^N ≡ a^n to get a^(N−n) ≡ 1 (mod m). Thus at some point in the sequence of powers, 1 appears.

Definition 4.2.1. Let a be a unit modulo m. The order of a modulo m, written ordm(a), is the smallest power n such that a^n ≡ 1 (mod m).

Looking at the table above, ord15(2) = 4 and ord17(2) = 8. We’ll resume the study of this ord function a bit later.

4.3 Fermat’s theorem

When p is a prime number, Up is the set of all nonzero residues 1, 2, . . . , p − 1. Consider the following table listing a^n modulo 7:

 n   1^n  2^n  3^n  4^n  5^n  6^n
 1    1    2    3    4    5    6
 2    1    4    2    2    4    1
 3    1    1    6    1    6    6
 4    1    2    4    4    2    1
 5    1    4    5    2    3    6
 6    1    1    1    1    1    1

Strikingly, row 6 has only 1s.

Theorem 4.3.1 (Fermat's (little) theorem). Let p be a prime number, and let a be a unit modulo p. Then a^(p−1) ≡ 1 (mod p).

Sometimes the theorem is stated a slightly different way: a^p ≡ a (mod p) for all integers a (not just units). The only non-unit modulo p is 0, and of course 0^p ≡ 0, so the two forms are equivalent. We'll give two proofs of Fermat's theorem.

#1. This proof is based on the sudoku property of the multiplication table modulo p. For a unit a, the ath row of the table reads a, 2a, 3a, . . . , (p − 1)a (mod p). But by the sudoku property, this list of residues is just a reordering of 1, 2, 3, . . . , (p − 1). This means the product of these two lists is the same:

a · 2a · 3a ··· (p − 1)a ≡ 1 · 2 · 3 ··· (p − 1) (mod p)

The residues 1, 2, 3, . . . , (p − 1) are all units, so we can cancel them; what's left over is a^(p−1) ≡ 1 (mod p).

#2. We're going to prove a^p ≡ a (mod p) for all a = 1, 2, . . . by induction3. The base case 1^p ≡ 1 (mod p) is trivial. Now, assuming n^p ≡ n, we use the binomial theorem:

(n + 1)^p = n^p + (p choose 1) n^(p−1) + (p choose 2) n^(p−2) + ··· + (p choose p−1) n + 1.

The binomial coefficients are

(p choose k) = p!/(k!(p − k)!) ∈ Z.

If k = 1, . . . , p − 1, then neither k! nor (p − k)! is divisible by p (by Theorem 2.2.2!), but p does divide p! = (p choose k) · k!(p − k)!, so (Theorem 2.2.2 again!) p | (p choose k). Therefore (n + 1)^p ≡ n^p + 1 (mod p), so that by the inductive hypothesis (n + 1)^p ≡ n + 1. We win by induction.

4.4 The φ function

Definition 4.4.1. For an integer m, φ(m) is the number of units modulo m. In other words, it is the number of integers among 1, 2, . . . , m which are relatively prime with m. This function is sometimes called Euler's totient function.

The first few values of φ(m) are

3The principle of mathematical induction is a way of proving a proposition P(n) for all n = 1, 2, . . . . It says that if P(1) is true, and if the implication P(n) =⇒ P(n + 1) is true for any n ≥ 1, then P(n) is true for all n. But we don't need to assume this as an axiom; it follows from the well-ordering principle! Indeed, if there were some n for which P(n) were false, then by hypothesis n ≠ 1. Also P(n − 1) could not be true, since it implies P(n). Again by hypothesis, n − 1 ≠ 1. Continuing, we find a sequence of positive integers which descends indefinitely, contradiction.

 m   φ(m)
 1    1
 2    1
 3    2
 4    2
 5    4
 6    2
 7    6
 8    4
 9    6
10    4

The first thing I notice is that φ(m) appears to be even for m ≥ 3. (This follows from the fact that the units come in pairs a and −a.) But of course we might want a formula for φ(m). One easy special case is that when p is a prime number, φ(p) = p − 1, since the units are exactly 1, 2, . . . , p − 1. Another case is a prime power p^n: among the numbers 1, 2, . . . , p^n, the only non-units modulo p^n are those numbers divisible by p, so that φ(p^n) = p^n − p^(n−1).

Theorem 4.4.2. For m and n relatively prime, φ(mn) = φ(m)φ(n).

Proof. (This is just a sketch.) We apply the Chinese remainder theorem. Each unit a modulo mn can be reduced modulo m and then modulo n, to create a function Umn → Um × Un. The Chinese remainder theorem shows that this function is one-to-one and onto, so that φ(mn) = φ(m)φ(n).

By combining together what we know so far about φ, we get the following formula.

Theorem 4.4.3. If p1^a1 ··· pr^ar is the prime factorization of n, then

φ(n) = ∏_i (pi^ai − pi^(ai−1)).

Note that this requires knowing the prime factorization of n. As far as we know there is no shortcut to finding φ(n) without knowing the prime factorization. Therefore if n has hundreds of digits, φ(n) is very difficult to compute.
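Here is the formula in Python (a sketch; it factors n by trial division, so it is only practical for moderately sized n):

def phi(n):
    """Euler's totient function, computed from the prime factorization of n."""
    result = 1
    d = 2
    while d * d <= n:
        if n % d == 0:
            count = 0
            while n % d == 0:
                n //= d
                count += 1
            result *= d ** count - d ** (count - 1)   # the factor p^a - p^(a-1)
        d += 1
    if n > 1:
        result *= n - 1                               # a leftover prime p contributes p - 1
    return result

assert phi(10) == 4 and phi(9) == 6 and phi(72) == 24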

4.5 Euler’s theorem Fermat’s theorem has an extension to general moduli m. In fact we can just adapt proof #1 of Fermat’s theorem to obtain Euler’s theorem:

Theorem 4.5.1. Let a be a unit modulo m. Then a^φ(m) ≡ 1 (mod m).

4.6 Exercises due February 23

1. Compute 23506 (mod 101).

2. Compute 23111 (mod 47).

3. Compute φ(75000).

4. Compute 5^1000 (mod 18).

5. Prove that if p is prime, and x^2 ≡ 1 (mod p), then x ≡ ±1 (mod p).

6. Prove that if p is an odd prime, and a is a unit mod p, then a^((p−1)/2) ≡ ±1 (mod p).

7. How many solutions are there to x^2 ≡ 1 (mod n), where n is a product of r distinct primes?

8. Prove Wilson’s theorem: If p is prime, then (p − 1)! ≡ −1 (mod p). Strategy: each a = 1, . . . , p−1 has a multiplicative inverse b, and then a and b are distinct unless a = ±1.

9. Fermat’s theorem suggests the following test for primality: if a is a unit mod m, and am−1 6≡ 1 (mod m), then m cannot be prime. Compute 2118 (mod 119), and use this method to show that 119 is composite.

10. Unfortunately, this method is not foolproof. The number 561 is composite: 561 = 3 · 11 · 17. Nevertheless, show that for all units a modulo 561, a^560 ≡ 1 (mod 561).

5 Orders and primitive elements

5.1 Basic properties of the function ordm

Let a be a unit modulo m. Recall that a^(ordm(a)) ≡ 1 (mod m), and a^n ≢ 1 (mod m) for any integer 1 ≤ n < ordm(a). Thus if we do find a positive integer n with a^n ≡ 1 (mod m), we can conclude that ordm(a) ≤ n. In fact a little more is true:

Theorem 5.1.1. Suppose that a^n ≡ 1 (mod m). Then ordm(a)|n.

Proof. By the division algorithm, we can write n = q ordm(a) + r, where 0 ≤ r < ordm(a). Then

1 ≡ a^n ≡ (a^(ordm(a)))^q a^r ≡ 1^q a^r ≡ a^r (mod m).

If r ≠ 0, we get a contradiction, since then a^r ≡ 1 (mod m) with 0 < r < ordm(a). Thus r = 0 and n = q ordm(a).

Here's an important corollary. By Euler's theorem, a^φ(m) ≡ 1 (mod m), and therefore

ordm(a)|φ(m).     (5.1.1)

This is a strong restriction on what ordm(a) could possibly be. It means that if we are interested in finding ordm(a), we don't need to compute all the powers a, a^2, . . . modulo m, stopping when we reach 1. Instead, we can compute a^n for all divisors n of φ(m). The order ordm(a) is the least divisor n for which a^n ≡ 1 (mod m).
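This observation gives a reasonable way to compute orders by machine. A Python sketch (the function name order is mine; φ(m) is computed by brute force just to keep the example self-contained):

from math import gcd

def order(a, m):
    """ordm(a): the least n >= 1 with a**n ≡ 1 (mod m), searched among the divisors of phi(m)."""
    assert gcd(a, m) == 1, "a must be a unit modulo m"
    phi_m = sum(1 for k in range(1, m + 1) if gcd(k, m) == 1)
    for n in range(1, phi_m + 1):
        if phi_m % n == 0 and pow(a, n, m) == 1:   # only divisors of phi(m) need checking
            return n

assert order(2, 15) == 4 and order(2, 17) == 8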

Theorem 5.1.2. For an integer n, ordm(a^n) = ordm(a)/gcd(n, ordm(a)).

Proof. We have

(a^n)^(ordm(a)/gcd(n, ordm(a))) = (a^(ordm(a)))^(n/gcd(n, ordm(a))) ≡ 1^(n/gcd(n, ordm(a))) ≡ 1 (mod m),

so that ordm(a^n) ≤ ordm(a)/gcd(n, ordm(a)). On the other hand, we have

a^(n · ordm(a^n)) = (a^n)^(ordm(a^n)) ≡ 1 (mod m).

Therefore by the previous theorem ordm(a)|n · ordm(a^n), so that

ordm(a)/gcd(n, ordm(a)) divides (n/gcd(n, ordm(a))) · ordm(a^n).

By Lemma 2.2.1, ordm(a)/gcd(n, ordm(a)) | ordm(a^n).

5.2 Primitive roots

We have seen that ordm(a)|φ(m) for every unit a modulo m. Sometimes it happens that ordm(a) = φ(m). This happens for instance with 3 modulo 7. The powers of 3 modulo 7 are 1, 3, 2, 6, 4, 5, 1,... . Notice that all units modulo 7 appear in this sequence.

Definition 5.2.1. A unit a is a primitive root modulo m if ordm(a) = φ(m).

To determine whether a is a primitive root, you can calculate a^(φ(m)/p) (mod m) for every prime p which divides φ(m). If none of these residues is 1, then a is a primitive root. Here is a chart of the first few positive integers m and their primitive roots.

 m   prim. roots mod m
 1   1
 2   1
 3   2
 4   3
 5   2, 3
 6   5
 7   3, 5
 8   none
 9   2, 5
10   3, 7
11   2, 6, 7, 8
12   none
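For a prime modulus the test described above is easy to code. Here is a Python sketch (restricted to prime p, so that φ(p) = p − 1; the function name is mine):

def is_primitive_root(a, p):
    """Test whether a is a primitive root modulo a prime p."""
    phi = p - 1
    # collect the distinct prime factors of phi by trial division
    prime_factors, n, d = [], phi, 2
    while d * d <= n:
        if n % d == 0:
            prime_factors.append(d)
            while n % d == 0:
                n //= d
        d += 1
    if n > 1:
        prime_factors.append(n)
    # a is a primitive root exactly when a^(phi/q) is never 1 modulo p
    return all(pow(a, phi // q, p) != 1 for q in prime_factors)

assert [a for a in range(1, 11) if is_primitive_root(a, 11)] == [2, 6, 7, 8]   # matches the chart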

Later we’ll tackle the question of which m have primitive roots. It turns out that a primitive root exists whenever m is prime. The following theorem explains the term “primitive root”.

Theorem 5.2.2. Let a be a primitive root modulo m. Then for every unit u modulo m, there exists n ∈ Z such that u ≡ a^n (mod m). Furthermore, n is unique modulo φ(m).

Thus, every unit can be generated from a primitive root.

Proof. We claim that the residues

1, a, a^2, . . . , a^(φ(m)−1)

are all distinct modulo m. Indeed if two of them were the same, say a^i ≡ a^j (mod m) for 0 ≤ i < j < φ(m), then a^(j−i) ≡ 1 (mod m), which is a contradiction because 0 < j − i < φ(m). Also, all of these powers are units. But this list contains φ(m) elements, and that is exactly how many units there are. So the list must contain every unit exactly once.

For uniqueness: if a^n ≡ a^(n′) (mod m), then a^(n′−n) ≡ 1 (mod m), so that by Theorem 5.1.1 ordm(a) = φ(m)|n′ − n, meaning that n′ ≡ n (mod φ(m)).

Theorem 5.2.3. Suppose a is a primitive root modulo m. Then the full set of primitive roots modulo m is

{ a^n : 1 ≤ n ≤ φ(m), gcd(n, φ(m)) = 1 }.

Thus the number of primitive roots modulo m is φ(φ(m)).

Proof. By Theorem 5.2.2, it suffices to say when a^n is a primitive root. By Theorem 5.1.2, ordm(a^n) = φ(m)/gcd(n, φ(m)). Thus a^n is a primitive root if and only if gcd(n, φ(m)) = 1.

5.3 The discrete logarithm

Let m be an integer, and let b be a primitive root modulo m. By Theorem 5.2.2, every unit a is a power of b:

a ≡ b^k (mod m).

Here the integer k may be considered modulo φ(m). We set

k = logb(a),

and call this the discrete logarithm of a to the base b. For instance, 2 is a primitive root modulo 11, and 2^4 ≡ 5 (mod 11), so log2(5) = 4. (You have to deduce from context that we are referring to the discrete logarithm here, and not the usual one.) The discrete logarithm obeys some of the usual rules that logarithms do, only modulo φ(m):

logb(xy) ≡ logb(x) + logb(y) (mod φ(m))
logb(x^n) ≡ n logb(x) (mod φ(m))

Unlike the case of usual logarithms, discrete logarithms are not easy to compute. If m has hundreds of digits, one knows that there exists a k that makes b^k ≡ a (mod m) true, but finding this k is not at all straightforward. There are algorithms to do so, but none that we know so far runs in polynomial time. Thus, the discrete logarithm is hard to compute.

5.4 Existence of primitive roots for a prime modulus

Here we will address the question of the existence of primitive roots modulo a prime. The proof is a little involved, so we'll demonstrate the main idea with an example. Suppose we want to show that there exists a primitive root modulo 59. This means finding a unit of order 58. By (5.1.1), the possible orders of units all divide 58, so they must be 1, 2, 29 or 58. The only element of order 1 is 1, and the only element of order 2 is −1. (This is proved in your exercises from last week – it's here we use the fact that 59 is prime.) But there are more than 2 units! Therefore there exists an element of order 29 or 58. If there's an element of order 58, great; that's a primitive root. Otherwise, suppose x is an element of order 29. What is the order of −x? It must be 29 or 58, since x ≢ ±1 (mod 59). But (−x)^29 = −x^29 ≡ −1 (mod 59), so that −x must be a primitive root.

In order for the above proof to work, it was important to know that x^2 ≡ 1 (mod 59) could have only two solutions, namely ±1. This is a special case of the following theorem:

Theorem 5.4.1. Let f(x) = x^n + a(n−1)x^(n−1) + ··· + a0 be a polynomial with integer coefficients, and let p be a prime. Then f(x) ≡ 0 (mod p) can have no more than n distinct solutions modulo p.

Proof. The proof will follow from the following fact which is familiar from algebra: If f(r) ≡ 0 (mod p), then we can write

f(x) ≡ (x − r)g(x) (mod p)

for some polynomial g(x), whose degree is n − 1. (This is a congruence between polynomials – it means that corresponding coefficients on either side are congruent.) This is easy to see when r = 0, because if f(0) ≡ 0 (mod p) it means that a0 ≡ 0 (mod p), so that f(x) (mod p) is divisible by x. In general, we can substitute: f(x + r) has 0 as a root, so f(x + r) ≡ xh(x), and so (substituting back) f(x) ≡ (x − r)h(x − r). Now suppose f(x) has n distinct roots r1, . . . , rn modulo p. Then f(x) ≡ (x − r1)f2(x). Plugging in x = r2, we get 0 ≡ f(r2) ≡ (r2 − r1)f2(r2). But since r2 ≢ r1, we can use Corollary 2.2.2 to get f2(r2) ≡ 0 (mod p). Thus (x − r2) can be factored out of f2(x): f(x) ≡ (x − r1)(x − r2)f3(x). Continuing, we get

f(x) ≡ (x − r1) ··· (x − rn) (mod p).

(There can be nothing left over, because both sides are degree n with unit leading coefficients.) Again by Corollary 2.2.2, there cannot be a root of this other than r1, . . . , rn.

Lemma 5.4.2. Suppose m and n are relatively prime. If ordp(x) = m and ordp(y) = n, then ordp(xy) = mn.

Proof. Let d = ordp(xy). On the one hand, (xy)^(mn) = (x^m)^n (y^n)^m ≡ 1 (mod p), so that d|mn. On the other hand, 1 ≡ (xy)^(md) ≡ y^(md), so that by Theorem 5.1.1, n|md, and so (Lemma 2.2.1) n|d. Similarly m|d, and so (since m and n are coprime) mn|d.

Now we return to the problem of finding a primitive root modulo a prime p. Suppose φ(p) = p − 1 factors as ℓ1^n1 ··· ℓt^nt. That is, valℓi(p − 1) = ni for i = 1, . . . , t. We first claim that for each i there exists a unit u with valℓi(ordp(u)) = ni. Assume otherwise: this would mean that u^((p−1)/ℓi) ≡ 1 (mod p) for every unit u. But this contradicts Theorem 5.4.1, because it would mean that the polynomial x^((p−1)/ℓi) − 1 has p − 1 roots modulo p.

Therefore there exists, for each i, a unit ui with valℓi(ordp(ui)) = ni. Let vi = ui^(ordp(ui)/ℓi^ni); then by Theorem 5.1.2 we have ordp(vi) = ℓi^ni. Let v = v1 ··· vt. By Lemma 5.4.2, ordp(v) = ℓ1^n1 ··· ℓt^nt = p − 1, so that v is a primitive root. We have proved:

Theorem 5.4.3. Let p be a prime. There exists a primitive root modulo p.

Note that the above proof is not constructive! That is, it doesn't give us an algorithm to find a primitive root modulo p. If p is large, we don't have a great way of finding a primitive root. I will say however that if we happen to know all the prime factors of p − 1, then we can quickly check if a given unit u is primitive (by testing u^((p−1)/ℓ) ≢ 1 for all primes ℓ dividing p − 1), so one might simply test units 2, 3, . . . until one finds a primitive root.

5.5 Exercises due March 2

These exercises constitute your midterm. You may refer to the notes, but not to any outside sources, and you must work on your own4.

1. Find integers x, y, z such that

55x + 35y + 77z = 1.

Please show your method.

2. Let n be an integer. Show that n^13 − n is divisible by 2730.

3. True or false: for units a and b modulo m, ordm(ab) = ordm(a) ordm(b). (If true, prove it, if false, give a counterexample.)

4Added Monday Feb. 26: I shouldn’t have to say this, but there are some very real consequences for handing in work that is not your own on an exam. I won’t hesitate to report plagiarism or copying to the Dean.

4. True or false: If a is a unit modulo m, and a^r ≡ a^s ≡ 1 (mod m), then a^(gcd(r,s)) ≡ 1 (mod m). (If true, prove it, if false, give a counterexample.)

5. True or false: if p is a prime, and a^3 ≡ 1 (mod p), then a ≡ 1 (mod p). (If true, prove it, if false, give a counterexample.)

6. Find all primitive roots modulo 17.

7. The decimal expansion of 1/7 is .142857. It repeats with period 6. Find all other integers n such that 1/n has period 6. (You may assume that n is coprime with 10.)

8. The number p = 2^16 + 1 is prime. Find ordp(2).

9. Suppose p is a prime, such that p ≡ 1 (mod 4). Let b be a primitive root modulo p, and let x = b^((p−1)/4). Show that x^2 ≡ −1 (mod p).

10. Suppose p is a prime, such that p ≡ 3 (mod 4). Show that x^2 ≡ −1 (mod p) has no solutions. (Hint: Raise both sides to the power of (p − 1)/2.)

6 Some cryptographic applications

6.1 The basic problem of cryptography

Cryptography is the art of sending messages securely. Cryptographers speak of fictional characters Alice, Bob and Eve. Alice and Bob are far apart, and Alice wants to send Bob a private message. (For instance, Alice could be a customer sending her credit card information to Bob's online store.) If she sends the message directly (via snail mail, courier, wire or e-mail: the medium doesn't matter!), then Eve the eavesdropper could intercept it, which would be a disaster. So Alice should encrypt her message in some way and send the coded message, so that Eve would not be able to understand it. But then how is Bob supposed to understand it? It almost sounds logically impossible for this to work, but in fact it can be done using some basic number theory.

6.2 Ciphers, keys, and one-time pads

Since we're going to use mathematics, it makes sense to agree upon a way to turn the message into a number. This can be accomplished with a simple

substitution (01 for A, 02 for B, etc.), or something more sophisticated (like ASCII). We are going to assume that this substitution is known to all parties (Alice, Bob, and also Eve). Thus Alice wants to send a large number M (perhaps in the hundreds of digits) to Bob. A natural way to do this is a simple substitution cipher: 0 can be replaced with 5, 1 with 3, 4 with 7, etc. (Or perhaps the cipher can be a little more complicated, with a rule for pairs or triples of digits.) Perhaps Alice and Bob have met earlier to agree on which cipher to use. But such a cipher is relatively easy for Eve to crack: the regularities of language make it easy to guess which letter corresponds to which number. (Indeed, sometimes there are puzzles in the newspaper which ask to solve such a cipher.)

Another idea is to use a key K. This is a random number with approximately the same size as M, which is known to Alice and Bob and no one else. To send a secure message, Alice can send C = M + K to Bob, who can then compute C − K = M. This has the advantage of being virtually unbreakable: since K is random, Eve has no way of guessing it and breaking the code. But it has some major disadvantages too: Alice and Bob would have had to meet in advance to agree on the key K (this is impractical if Alice is a customer at Bob's online store!), and they both need to keep K secure as they travel. Not only that, but the key should only be used once: if Alice wants to send another message M′, she sends C′ = M′ + K. Then Eve, who has intercepted both C and C′, can compute C − C′ = M − M′, the difference between the two messages – too risky.

This last problem can be overcome if Alice and Bob share a one-time pad: a whole collection of keys K1, K2, . . . , all random and unrelated to one another, so that Alice can send Bob as many messages as there are keys. But this still has the problem that Alice and Bob need to agree on these keys in a secure location, which is often impractical.

6.3 Diffie-Hellman key exchange

Remarkably, there is a way for Alice and Bob to agree on a key K without ever meeting, in such a way that Eve cannot determine K even if she intercepts all communications. As a warm-up, here's a riddle: Suppose Alice is sending a suitcase to Bob containing sensitive material. Both Alice and Bob own padlocks that can lock the suitcase, but the padlocks have different keys. How can Alice securely send Bob the suitcase? Here's the solution: Alice locks the suitcase with her lock and sends it

to Bob. Bob receives it and places his own padlock on it, and sends it back to Alice with both locks. Alice then removes her own lock and sends it a third time to Bob, who removes his own padlock and opens the suitcase.

In Diffie-Hellman key exchange, the idea of the riddle is combined with number theory. Alice chooses a large prime p, at least in the hundreds of digits and certainly larger than her message M. By Theorem 5.4.3, there exists a primitive root g modulo p. Alice finds one and makes both g and p public. (There is the good question of how quickly one can find a primitive root; we won't be so concerned with this. If the factorization of p − 1 is known, it is easy to check that a particular unit is a primitive root; so one can guess and check until a primitive root is found.) Alice and Bob both choose secret numbers a and b, respectively. These should be very large but still less than p. They should also be relatively prime to p − 1. Alice calculates A = g^a (mod p), and Bob calculates B = g^b (mod p) (remember that modular exponentiation runs in polynomial time, so this is reasonable for them to do). The next steps are:

1. Alice sends A to Bob.
2. Bob sends B to Alice.
3. Alice computes B^a (mod p).
4. Bob computes A^b (mod p).

In fact Alice and Bob have computed the same quantity, since B^a ≡ (g^b)^a ≡ (g^a)^b ≡ A^b (mod p). Call this common value K. Then K is the key that Alice and Bob can use to encode messages between each other. The whole process is called Diffie-Hellman key exchange.

Why is it secure? Let's say Eve wants to spy on Alice and Bob. She knows the prime p and its primitive root g, because these are public. She intercepts A and B. Can she use them to compute K in a reasonable amount of time? It is believed that the answer is no. The Diffie-Hellman problem is: Given g^a and g^b modulo p, compute g^(ab) modulo p. This is what Eve has to solve to get the private key K. Note the relationship with the problem of computing discrete logarithms. If Eve has a magical discrete-log calculator, she can compute a = logg A and b = logg B and then easily get g^(ab) (mod p). But as far as we know there is no rapid way to compute discrete logarithms, and also no way to solve the Diffie-Hellman problem without them.
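A toy version of the exchange can be written in a few lines of Python. The sketch below uses a small prime p = 2579 with primitive root 2 (example values of mine, nothing like the hundreds of digits needed in practice), and it does not bother with the coprimality condition on the secret exponents.

import secrets

p = 2579                      # a small prime, for illustration only
g = 2                         # a primitive root modulo 2579

a = secrets.randbelow(p - 2) + 2    # Alice's secret exponent
b = secrets.randbelow(p - 2) + 2    # Bob's secret exponent

A = pow(g, a, p)              # Alice publishes A = g^a mod p
B = pow(g, b, p)              # Bob publishes B = g^b mod p

K_alice = pow(B, a, p)        # Alice computes B^a
K_bob = pow(A, b, p)          # Bob computes A^b
assert K_alice == K_bob       # both hold the shared key K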

6.4 RSA

The RSA algorithm is another number-theory based encryption method. It allows Alice to directly encrypt her message to Bob. Its security is based on the difficulty of factoring large integers. Bob is the intended recipient of secure messages. He chooses two large primes p and q, and computes N = pq. Bob publishes N but keeps its factorization secret. Bob has access to φ(N) = (p − 1)(q − 1). We remark that knowledge of φ(N) is equivalent to knowledge of p and q. Indeed, if you know φ(N) = N − (p + q) + 1, then you know p + q and pq = N, from which you can solve for p and q. Bob also chooses a private decryption key d. The number d can be small, but it should not be 1. It should also be relatively prime to φ(N). Secretly, Bob computes the inverse of d modulo φ(N). That is, he finds an integer e such that de ≡ 1 (mod φ(N)). This is the public encryption key. Bob publishes e. Alice would like to use RSA to send a secure message to Bob. Her message takes the form of an integer M which is less than N. (If her message is longer than N, she can break it up into smaller chunks. Also, if her message is particularly short, she should use a simple “padding” process to make sure that M is almost as large as N.) Since the encryption key e is public, Alice can use it to compute C = M^e (mod N). This is the encrypted message. Alice sends it to Bob. To decrypt the message, Bob computes C^d (mod N). This works because

C^d ≡ (M^e)^d ≡ M^(ed) ≡ M (mod N).

Why is the last congruence true? If M is relatively prime to N, it fol- lows from Euler’s theorem: Since ed ≡ 1 (mod φ(N)), we have M ed ≡ M (mod N). (It’s still true even in the unlikely event that M is divisible by p or q–you should figure this out for yourself.) Now suppose Eve overhears everything. She knows N, e and C = M e (mod N). To figure out M, she needs to extract an eth root of C modulo N. This is known as the RSA problem. If Eve can factor N, she can compute φ(N) and then use Euclid’s algorithm to compute d (the inverse of e modulo φ(N)), and then compute M the same way that Bob did. It is believed that solving the RSA problem is very difficult. But there is no proof that it can’t be done efficiently. For all we know, a criminal mastermind has already solved the problem and therefore can break RSA- based cryptosystems. The only evidence to the contrary is that very smart people have tried and failed to solve the RSA problem.
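The following sketch runs through the whole exchange with toy primes; the particular numbers are illustrative only (they are not from the text), and real keys use primes of hundreds of digits.

```python
# A sketch of RSA with toy primes.
from math import gcd

p, q = 61, 53                # Bob's secret primes
N = p * q                    # public modulus
phi = (p - 1) * (q - 1)      # phi(N) = (p-1)(q-1), known only to Bob

d = 2753                     # Bob's private decryption key, coprime to phi(N)
assert gcd(d, phi) == 1
e = pow(d, -1, phi)          # public encryption key: inverse of d mod phi(N) (Python 3.8+)

M = 65                       # Alice's message, with 0 <= M < N
C = pow(M, e, N)             # Alice encrypts using the public key e
assert pow(C, d, N) == M     # Bob decrypts using his private key d
```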

7 Quadratic Residues

7.1 Which numbers are squares? Which numbers are perfect squares? In other words, given n, when does √ n make sense? The answer depends very much on what sort of number system we are working with:

• In the real numbers R, the squares are the nonnegative numbers.

• In the complex numbers C, every number is a square.

• In the integers Z, it is easy to decide whether n is a square. If n < 0 it is certainly not. If n > 0, we can use a calculator to compute the √ n; if anything appears past the decimal point, n is not a square. Thus, deciding whether n is a perfect square is a polynomial- time algorithm.

• In the rational numbers Q, a positive reduced fraction p/q is a square if and only if both p and q are.

Much less obvious is the question of perfect squares in Z/mZ. That is, given an integer a, we would like to know whether there is a solution to

x2 ≡ a (mod m).

(This is the natural progression of things: we have already solved linear congruences modulo m, and now we are moving on to degree 2 equations.) If a solution exists, we call a a quadratic residue modulo m; otherwise it is a quadratic nonresidue. (These terms are due to Gauss.) For instance, 10 is a square modulo 13 because 7^2 ≡ 10 (mod 13). Is 2 a square modulo 13? We can answer the question using a chart like this:

x     x^2 (mod 13)
0     0
1     1
2     4
3     9
4     3
5     12
6     10
7     10
8     12
9     3
10    9
11    4
12    1

Since 2 does not appear in the second column, it is a quadratic nonresidue modulo 13. Note that the second column is palindromic (ignoring the initial zero), because (−x)^2 = x^2. So to answer the question of whether 2 was a quadratic residue, it was only really necessary to compute the squares of 0, 1, . . . , 6. This method is horribly inefficient for large values of m. It takes m/2 steps to decide if a is a quadratic residue modulo m this way, which is unacceptable.
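As a sanity check, the chart can be reproduced by brute force. This sketch is fine for tiny moduli, but it takes on the order of m steps, which is exactly the inefficiency just noted.

```python
# Brute-force list of quadratic residues modulo m, as in the chart above.
def quadratic_residues(m):
    return sorted({x * x % m for x in range(m)})

print(quadratic_residues(13))   # [0, 1, 3, 4, 9, 10, 12]; note that 2 is absent
```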

7.2 Euler’s criterion If the modulus is a prime number p, there is a far better way to decide if a is a quadratic residue. Theorem 7.2.1 (Euler’s criterion). Let p be an odd prime. Suppose that a is a unit modulo p. Then a is a quadratic residue if and only if

a(p−1)/2 ≡ 1 (mod p).

Proof. If a ≡ x2 (mod p), then

a(p−1)/2 ≡ (x2)(p−1)/2 ≡ xp−1 ≡ 1 (mod p) by Fermat’s theorem. Conversely, suppose a(p−1)/2 ≡ 1 (mod p). By Theorem 5.4.3, there exists a primitive root g modulo p; let us write a ≡ gk (mod p). Then

1 ≡ a(p−1)/2 ≡ gk(p−1)/2 (mod p).

38 Since ordp(g) = p − 1, Theorem 5.1.1 implies that (p − 1)|k(p − 1)/2. Can- celling the integer (p − 1)/2 from both sides gives us 2|k, so that k = 2`. Therefore a ≡ gk ≡ (g`)2 (mod p) is a quadratic residue.

Theorem 7.2.2. Let p be an odd prime. There are exactly (p + 1)/2 quadratic residues modulo p. (Since 0 is obviously a quadratic residue, this is the same as saying that there are exactly (p − 1)/2 quadratic residues which are units.) Proof. We have already observed that the complete list of unit quadratic residues is 12, 22,..., ((p − 1)/2)2 (mod p). We are done if we can show that the members of this list are distinct. Suppose 1 ≤ x, y ≤ (p − 1)/2 and x2 ≡ y2 (mod p). Then p|x2 − y2 = (x − y)(x + y), so that (Lemma 2.2.1) p|(x − y) or p|(x + y), which is to say x ≡ ±y (mod p). Since x, y belong to the range 1,..., (p − 1)/2, x ≡ −y is impossible, so that x ≡ y (mod p).

Euler's criterion gives a polynomial time algorithm for deciding whether a unit a is a quadratic residue modulo an odd prime p. However, Euler's criterion does not tell us how to find a solution to x^2 ≡ a (mod p). This is a harder problem. The following theorem is another interpretation of the problem in terms of discrete logarithms.

Theorem 7.2.3. Let p be an odd prime and let a be a unit modulo p. Let g be a primitive root modulo p. Then a is a quadratic residue modulo p if and only if log_g(a) is even.

Proof. Let k = log_g(a), so that a ≡ g^k (mod p). If k is even, then a is obviously a quadratic residue. Conversely if a ≡ x^2, then log_g(a) ≡ log_g(x^2) ≡ 2 log_g(x) (mod p − 1). Since p − 1 is even, this implies that log_g(a) is even as well.
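The polynomial-time test mentioned above is a one-liner with fast modular exponentiation; here is a sketch (Python's built-in three-argument pow performs the modular exponentiation).

```python
# Quadratic residue test via Euler's criterion, for an odd prime p and a unit a.
def is_qr(a, p):
    return pow(a, (p - 1) // 2, p) == 1

print(is_qr(10, 13))   # True:  10 = 7^2 (mod 13)
print(is_qr(2, 13))    # False: 2 is a nonresidue modulo 13
```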

We remark that the quadratic residues modulo p are 0 together with

1, g2, g4, . . . , gp−3.

A special case is a = −1. When is −1 a quadratic residue modulo p? Informally, we are asking whether the imaginary unit i exists modulo p.

Theorem 7.2.4. Let p be an odd prime. Then −1 is a quadratic residue modulo p if and only if p ≡ 1 (mod 4).

39 Proof. This follows right away from Euler’s criterion, since (−1)(p−1)/2 is 1 if and only if p ≡ 1 (mod 4).

7.3 Exercises due March 16 (This is assignment #5.)

1. List the quadratic residues modulo 13.

2. How many quadratic residues are there modulo 9, 25, 27? Formulate a conjecture about the number of squares modulo pn, where p is an odd prime and n ≥ 1.

3. The number p = 2^8 + 1 is prime. Decide if 2 is a quadratic residue modulo p. Do the same for p = 2^16 + 1.

4. Let m = p1 ··· pn be a product of distinct odd primes pi. How many units modulo m are squares?

5. (2 pts) Let m = p1 ··· pn be a product of distinct odd primes pi. Under what conditions does x2 ≡ −1 (mod m) have a solution? How many solutions are there?

6. (2 pts) Let p be an odd prime, and let a be an integer. Prove that there exists a solution to x2 + y2 ≡ a (mod p).

7. (2 pts.) Let p be an odd prime, and let x = [(p − 1)/2]!. Prove that

x2 ≡ (−1)(p+1)/2 (mod p).

(You will need Wilson’s theorem, (p − 1)! ≡ −1 (mod p).) This gives another proof that if p ≡ 1 (mod 4), then x2 ≡ −1 (mod p) has a solution.

8 Quadratic Reciprocity

8.1 The Legendre symbol

In the real numbers R, the nonzero squares are exactly the positive numbers, and the nonsquares are exactly the negative numbers. From this we deduce that the product of two nonsquares is a square. This is not at all true in Z, since for instance 2 · 3 = 6 is not a square. But this property is recovered in Z/pZ for an odd prime p:

40 Theorem 8.1.1. Let p be an odd prime. Then in Z/pZ:

1. The product of two nonzero quadratic residues is again a nonzero quadratic residue.

2. The product of a nonzero residue and a nonresidue is a nonresidue.

3. The product of two nonresidues is a residue.

Proof. Suppose x and y are two units modulo p. Let g be a primitive root modulo p. Then logg(xy) ≡ logg(x)+logg(y) (mod p−1). By Theorem 7.2.3, a unit is a residue if and only if its logg is even. Therefore the theorem is reduced to the observation that even plus even is even, even plus odd is odd, and odd plus odd is even.

Definition 8.1.2. Let p be an odd prime, and let a be an integer. The Legendre symbol (a/p) is defined as

   (a/p) = 1 if a is a unit residue modulo p,
   (a/p) = −1 if a is a nonresidue modulo p,
   (a/p) = 0 if p|a.

(Often this symbol is pronounced “a on p”.) Theorem 8.1.1 can now be restated elegantly as follows: for integers a and b,

   (ab/p) = (a/p)(b/p).

Furthermore, by Euler's criterion we have

   a^((p−1)/2) ≡ (a/p) (mod p).

8.2 Some reciprocity laws

Let us look for some patterns in the Legendre symbol. The patterns will take this form: we would like to predict what (a/p) is, based on what p is modulo some other number. Such a rule is called a reciprocity law. The simplest case is when a = −1, where we have Theorem 7.2.4. This says that

   (−1/p) = (−1)^((p−1)/2) = 1 if p ≡ 1 (mod 4), and −1 if p ≡ −1 (mod 4).

The next case to examine is a = 2. It turns out that the correct reciprocity law is

   (2/p) = (−1)^((p^2−1)/8) = 1 if p ≡ ±1 (mod 8), and −1 if p ≡ ±3 (mod 8).

We will not prove this law in its entirety right now; instead we will offer the following partial result.

Theorem 8.2.1. If p ≡ 1 (mod 8), then (2/p) = 1.

Our proof will be based on the following observation about complex numbers (!). Let z = e^(2πi/8). This is a primitive 8th root of 1, because z^8 = e^(2πi) = 1, but z^k ≠ 1 for 1 ≤ k < 8. Using Euler's formula e^(iθ) = cos θ + i sin θ, we find z = (1 + i)/√2 and z^(−1) = (1 − i)/√2. Therefore z + z^(−1) = √2.

Proof. Let g be a primitive root modulo p. Since p ≡ 1 (mod 8), we may set z = g^((p−1)/8); by Theorem 5.1.2, ord_p(z) = 8 and ord_p(z^4) = 2; the latter relation tells us that z^4 ≡ −1 and therefore z^2 ≡ −z^(−2) (mod p). Let α = z + z^(−1). Then α^2 = (z + z^(−1))^2 = z^2 + z^(−2) + 2 ≡ 2 (mod p). Therefore 2 is a quadratic residue modulo p.

The same reasoning can be used to prove the following reciprocity law:

Theorem 8.2.2. If p ≡ 1 (mod 3), then (−3/p) = 1.

For this, one is inspired by the equation ω − ω^(−1) = √−3, where ω = e^(2πi/3). The reader is invited to check the details.

8.3 The main quadratic reciprocity law Theorem 8.3.1. Let p and q be distinct odd positive primes. Then

   (p/q)(q/p) = (−1)^(((p−1)/2)·((q−1)/2)).

The symmetry between p and q is the reason Theorem 8.3.1 is called a reciprocity law. The right side of the equation is −1 if p ≡ q ≡ 3 (mod 4), and 1 in all other cases. Thus a restatement of Theorem 8.3.1 is the following:

   (p/q) = (q/p) if p ≡ 1 (mod 4) or q ≡ 1 (mod 4),
   (p/q) = −(q/p) if p ≡ q ≡ 3 (mod 4).

As an example, since 5 ≡ 1 (mod 4), Theorem 8.3.1 predicts that (5/p) = (p/5) for all positive odd primes p ≠ 5. We confirm this for p = 11: (5/11) = 1 (since 5 ≡ 4^2 (mod 11)), and indeed (11/5) = (1/5) = 1. Theorem 8.3.1 is a truly deep result. It was first proved by Gauss around 1797. Gauss (and others) would go on to publish many proofs. Later on in this course, we will present one of Gauss' proofs. Theorem 8.3.1 provides a strategy for computing the Legendre symbol. For instance, let's compute (91/101). The first step is to factor the “numerator”: 91 = 7 · 13. Therefore

   (91/101) = (7/101)(13/101)
            = (101/7)(101/13)
            = (3/7)(10/13)
            = (3/7)(2/13)(5/13)
            = −(3/7)(5/13)
            = (7/3)(13/5)
            = (1/3)(3/5)
            = (3/5) = (5/3) = (2/3) = −1.

Notice the steps involved: factor the numerator(s), apply quadratic reci- procity, reduce the numerator(s) modulo the denominator(s), and then re- peat. If a and p are very large, then this method is actually impractical, because of the factoring step.

8.4 The Jacobi symbol

In the example of (91/101) above, suppose we didn't know that 91 was composite. We would then proceed to apply quadratic reciprocity directly:

   (91/101) = (101/91)
            = (10/91)
            = (2/91)(5/91)
            = −(5/91)
            = −(91/5) = −(1/5) = −1.

We arrived at the correct answer regardless! In fact we can justify the above manipulations using an extension of the Legendre symbol which allows composite (but odd) numbers in the denomi- nator. For a positive odd number P which is the product of primes p1 ··· pt, we define the Jacobi symbol

   (a/P) = ∏_{i=1}^{t} (a/p_i).

Then the Jacobi symbol is multiplicative in both its numerator and denominator. Another important observation is that (a/P) = (b/P) whenever a ≡ b (mod P). It turns out that the Jacobi symbol obeys much the same reciprocity laws as the Legendre symbol.

Theorem 8.4.1. Let P be a positive odd number. The Jacobi symbol has the following properties:

1. (−1/P) = (−1)^((P−1)/2).

2. (2/P) = (−1)^((P^2−1)/8).

3. For another positive odd number Q which is coprime to P, we have

   (P/Q)(Q/P) = (−1)^(((P−1)/2)·((Q−1)/2)).

We warn the reader that the Jacobi symbol does not predict whether a is a quadratic residue modulo P. For instance, (−1/21) = (−1/3)(−1/7) = (−1)(−1) = 1, but −1 is not a square modulo 21. The only use of the Jacobi symbol for us is as an intermediate step in calculations for the Legendre symbol. If we use the Jacobi symbol, we no longer have to factor any numbers (with the exception of factoring out powers of 2, which is easy). Executing this algorithm for computing (a/p) is on par with running the Euclidean algorithm for a and p, which is to say it is very fast indeed.
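Here is a sketch of that algorithm: reduce the numerator, pull out factors of 2 using property 2 of Theorem 8.4.1, and flip the symbol using property 3, exactly as in the worked example above. The function below is an illustration, not code from the text.

```python
# Legendre/Jacobi symbol (a/P) for odd P > 0, computed with the rules of
# Theorem 8.4.1: strip factors of 2 from the numerator, then flip the symbol
# by reciprocity, reducing modulo the new denominator at each step.
def jacobi(a, P):
    assert P > 0 and P % 2 == 1
    a %= P
    result = 1
    while a != 0:
        while a % 2 == 0:
            a //= 2
            if P % 8 in (3, 5):       # (2/P) = -1 exactly when P = +-3 (mod 8)
                result = -result
        a, P = P, a                   # reciprocity: replace (a/P) by (P/a) ...
        if a % 4 == 3 and P % 4 == 3:
            result = -result          # ... with a sign change if both are 3 mod 4
        a %= P
    return result if P == 1 else 0    # the symbol is 0 when gcd(a, P) > 1

print(jacobi(91, 101))   # -1, agreeing with the computation above
print(jacobi(-1, 21))    # +1, even though -1 is not a square modulo 21
```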

8.5 Exercises due March 23 (This is assignment #6.)

1. Evaluate the Legendre symbol (38/79).

2. Evaluate the Legendre symbol (31/103).

3. Let p be a prime such that q = 2p + 1 is also prime. Let a be a unit modulo q other than ±1. Show that if (a/q) = −1, then a must be a primitive root modulo q.

4. Let p = 2^n + 1 be a Fermat prime with n ≥ 2. (In fact n itself must be a power of 2.) Prove that 3 is a primitive root modulo p.

5. Use the law of quadratic reciprocity to show that for an odd prime p ≠ 5:

   (5/p) = 1 if p ≡ ±1 (mod 5), and −1 if p ≡ ±2 (mod 5).

6. Use the law of quadratic reciprocity to show that for an odd prime p ≠ 3:

   (3/p) = 1 if p ≡ ±1 (mod 12), and −1 if p ≡ ±5 (mod 12).

7. Let p be an odd prime, and let a be a unit modulo pn which is a quadratic residue. Show that a is also a quadratic residue modulo pn+1. (Therefore by induction, if a is a nonzero square modulo p, then it is a square modulo all powers of p.)

8. Find a solution to the congruence x2 ≡ 14 (mod 53).

45 9. Let p ≡ 1 (mod 3) be a prime. Show that a unit a modulo p is a perfect cube if and only if a(p−1)/3 ≡ 1 (mod p).

10. Let p ≡ 2 (mod 3) be a prime. Show that a unit a modulo p is always a perfect cube.

9 The Gaussian integers

9.1 Motivation and definitions Here is a list of properties enjoyed by the integers Z:

• They are closed under addition, subtraction, and multiplication, but not division.

• There is a division algorithm, which leads to a Euclidean algorithm, which computes the gcd.

• If a and b are coprime then ax + by = 1 has a solution.

• If a|bc and gcd(a, b) = 1 then a|c.

• In particular, if p is prime and p|ab then p|a or p|b.

• Every nonzero element can be expressed as a product of primes, which is unique up to rearranging and units.

In this section we will explore an extension of Z to the complex numbers, which turns out to satisfy all of these properties.

Definition 9.1.1. A Gaussian integer is a complex number of the form a + bi, where a, b ∈ Z. The set of Gaussian integers is denoted Z[i].

Thus, elements of Z[i] lie on a square lattice (where all the squares have side length 1) in the complex plane. To avoid confusion, we can say that elements of Z are called rational integers. It is easy to check that Z[i] ⊂ C is closed under the operations of addition, subtraction, and multiplication. For instance, the relation

(a + bi)(c + di) = (ac − bd) + (ad + bc)i shows that Z[i] is closed under multiplication.

All the same, we see that Z[i] is not closed under division. For instance,

   1/(1 + 2i) = (1 − 2i)/((1 + 2i)(1 − 2i)) = 1/5 − (2/5)i.

For α, β ∈ Z[i], let us write α|β if there exists γ ∈ Z[i] with β = αγ. One major difference between Z and Z[i] is that elements of Z can be compared with the relation <, whereas it is nonsense to say that α < β for Gaussian integers α and β. The relation < among integers is quite important, since we need it to apply the well-ordering Principle. To remedy this problem, we introduce the norm

N(a + bi) = |a + bi|2 = a2 + b2

Then if α ∈ Z[i], the norm N(α) is a non-negative integer. Crucially, the norm is multiplicative: N(αβ) = N(α)N(β). Thus if α|β, then N(α)|N(β). An element α ∈ Z[i] is a unit if α|1, which is to say that the multiplicative inverse of α lies in Z[i].

Theorem 9.1.2. The units of Z[i] are 1, −1, i, −i.

Proof. Suppose α = a + bi is a unit. If αβ = 1, then N(α)N(β) = 1. Since N(α) is a non-negative integer, this is only possible if N(α) = a2 + b2 = 1, which forces α to be one of 1, −1, i or −i.

Definition 9.1.3. Two Gaussian integers α and β are associates if there exists a unit u such that β = uα.

This is the same as saying that α|β and β|α.

Definition 9.1.4. Let α, β ∈ Z[i]. A common divisor of α and β is a Gaussian integer δ with δ|α and δ|β. We write δ = gcd(α, β) if N(δ) ≥ N(δ0) for any other common divisor δ0.

Somewhat confusingly, gcd(α, β) isn’t quite unique. If δ is a gcd of α and β, then so is any associate of δ.

Definition 9.1.5. A nonzero Gaussian integer π is prime if (a) it is not a unit and (b) it is not equal to a product of non-units. Such π are called Gaussian primes.

Again, to avoid confusion, we will refer to a prime in Z as a rational prime. Let’s observe that a rational prime isn’t necessarily a Gaussian prime: 5 is a rational prime, but 5 = (1 + 2i)(1 − 2i).

47 How would one verify that a given Gaussian integer is prime? For in- stance, let π = 2 + 3i. If π is not prime, then it factors as π = βγ for non-units β, γ ∈ Z[i], then N(β)N(γ) = N(π) = 13. Since 13 is a rational prime, this is only possible if N(β) = 1 or N(γ) = 1, which contradicts the fact that β and γ are non-units. Generalizing: if N(π) is a rational prime, then π is a Gaussian prime. As another example, let π = 7. If 7 = βγ for non-units β and γ, then N(7) = 49 = N(β)N(γ), which is only possible if N(β) = N(γ) = 7. If β = a + bi, we get a2 + b2 = 7, which has no solutions in rational integers a, b. Therefore 7 is a Gaussian prime. Generalizing: if p is a rational prime which is not the sum of two perfect squares, then p is also a Gaussian prime. We would like to give a classification of Gaussian primes, and also answer the question of which primes are expressible as a2 + b2, but this will have to wait.

9.2 The division algorithm and the gcd Theorem 9.2.1. Let α, β ∈ Z[i] be Gaussian integers with β 6= 0. There exist γ, δ ∈ Z[i] such that α = βγ + δ and N(δ) < N(β). Proof. Consider the complex number α/β. It falls somewhere within a unit square whose vertices are Gaussian integers.√ The farthest a point in the square can be from one of the vertices is 1/ 2 (that is, the distance from the√ center to any vertex). Thus there exists γ ∈ Z[i] such that |α/β − γ| ≤ 1/ 2. Squaring both sides and rearranging gives N(α − βγ) ≤ N(β)/2 < N(β). Now we can let δ = α − βγ. Note that the quotient and remainder are not necessarily unique! Theorem 9.2.2. Let α, β be nonzero Gaussian integers. Any common di- visor of α and β divides gcd(α, β). Furthermore, there exist x, y ∈ Z[i] such that αx + βy = gcd(α, β). Proof. Choose x and y so that δ = αx + βy has the least nonzero norm. Using Theorem 9.2.1, there exists q, r ∈ Z[i] such that α = δq + r and N(r) < N(δ). But then r = α − δq = α(1 − xq) − βqy is also a linear combination of α and β. This is a contradiction unless r = 0, so that in fact δ|α. Similarly, δ|β, so that δ is a common divisor of α and β. If δ0 is another common divisor, then δ0 divides the linear combination αx + βy = δ. Thus N(δ0)|N(δ), and in particular N(δ0) ≤ N(δ). We conclude that δ = gcd(α, β). This theorem implies that gcd(α, β) is unique up to associates.
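Theorems 9.2.1 and 9.2.2 are effective: rounding α/β to the nearest Gaussian integer produces the quotient, and repeating the division gives a Euclidean algorithm. Here is a sketch; the representation of a + bi as an integer pair (a, b) is just a convenience of this illustration.

```python
# Division algorithm and Euclidean algorithm in Z[i], following Theorem 9.2.1.
# A Gaussian integer a + bi is represented as the pair (a, b).

def norm(z):
    a, b = z
    return a * a + b * b                      # N(a + bi) = a^2 + b^2

def mul(z, w):
    a, b = z
    c, d = w
    return (a * c - b * d, a * d + b * c)

def nearest(num, den):
    return (2 * num + den) // (2 * den)       # nearest integer to num/den (den > 0)

def gauss_divmod(alpha, beta):
    n = norm(beta)
    t = mul(alpha, (beta[0], -beta[1]))       # alpha * conj(beta), so alpha/beta = t/n
    gamma = (nearest(t[0], n), nearest(t[1], n))
    bg = mul(beta, gamma)
    delta = (alpha[0] - bg[0], alpha[1] - bg[1])
    return gamma, delta                       # N(delta) <= N(beta)/2 < N(beta)

def gauss_gcd(alpha, beta):
    while beta != (0, 0):
        alpha, beta = beta, gauss_divmod(alpha, beta)[1]
    return alpha                              # well-defined only up to 1, -1, i, -i

print(gauss_divmod((11, 7), (3, 2)))          # a quotient and a small remainder
print(gauss_gcd((5, 0), (1, 2)))              # an associate of 1 + 2i, since 5 = (1+2i)(1-2i)
```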

48 9.3 Unique factorization in Z[i] Theorem 9.3.1. Suppose α, β, γ ∈ Z[i] satisfy gcd(α, β) = 1 and α|βγ. Then α|γ.

Proof. The proof is very similar to the proof of Theorem 2.2.1. By Theorem 9.2.2, there exist x, y ∈ Z[i] with αx + βy = 1. Multiplying by γ, we get αγx + βγy = γ. Since α divides both terms on the left, it divides γ as well.

Corollary 9.3.2. If π is a Gaussian prime, and π|αβ, then π|α or π|β.

Theorem 9.3.3. Let α ∈ Z[i] be nonzero. Then we may factor α as

α = uπ1 ··· πn for a unit u and Gaussian primes πi. This factorization is unique up to reordering the πi and replacing them by associates. Proof. This is quite the same proof as in Theorem 2.2.3 (but you should still check the details!)

9.4 The factorization of rational primes in Z[i] We can now tackle the problem of when a rational prime stays prime in Z[i], and when it factors.

Theorem 9.4.1. Let p be a positive rational prime. Then the factorization of p in Z[i] is as follows:

1. If p = 2, then 2 = −i(1 + i)2.

2. If p ≡ 1 (mod 4), then p = ππ for a Gaussian prime π. In particular p is the sum of two perfect squares.

3. If p ≡ 3 (mod 4) then p is a Gaussian prime.

Proof. The claim about 2 can be checked directly.  −1  Let p ≡ 1 (mod 4). Then since p = 1, there exists an integer x such that x2 ≡ −1 (mod p). This means that p|(x2 + 1) = (x + i)(x − i). If p were a Gaussian prime, then Corollary 9.3.2 would apply, so that p|x + i or p|x − i. But neither can be true, because (x ± i)/p 6∈ Z[i]. Thus p is not a Gaussian prime, and so p = ππ0 for non-units π, π0. Taking norms, we get

p^2 = N(π)N(π′), so that N(π) = p; this implies that π is a Gaussian prime and π′ = π̄, the conjugate of π. Finally, if p ≡ 3 (mod 4), then p is not the sum of two squares, because a sum of two squares is never ≡ 3 (mod 4). Therefore (as we noted before) p is a Gaussian prime.
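The proof of Theorem 9.4.1 can be made into an algorithm: for p ≡ 1 (mod 4), find x with x^2 ≡ −1 (mod p) (for example x = a^((p−1)/4) for a nonresidue a) and take a Gaussian gcd of p with x + i. This recipe is not spelled out in the text, so treat the sketch below, which reuses gauss_gcd from the previous sketch, as an illustration.

```python
# Writing a prime p = 1 (mod 4) as a sum of two squares, following the proof of
# Theorem 9.4.1: find x with x^2 = -1 (mod p); then gcd(p, x + i) in Z[i] is a
# Gaussian prime of norm p.  Reuses gauss_gcd from the sketch above.

def sum_of_two_squares(p):
    assert p % 4 == 1
    for a in range(2, p):
        if pow(a, (p - 1) // 2, p) == p - 1:    # a is a quadratic nonresidue
            x = pow(a, (p - 1) // 4, p)         # then x^2 = -1 (mod p)
            break
    return gauss_gcd((p, 0), (x, 1))

print(sum_of_two_squares(13))    # an associate of 3 + 2i:   13 = 3^2 + 2^2
print(sum_of_two_squares(101))   # an associate of 10 + i:  101 = 10^2 + 1^2
```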

9.5 Exercises due March 30 (This is assignment #7.) 1. Let α = 23 − 9i, β = 3 + 2i. Find Gaussian integers γ, δ such that α = βγ + δ and N(δ) < N(β). 2. Let α = 2 + 3i, β = 4 + i. Find Gaussian integers x, y such that αx + βy = 1. 3. Factor into Gaussian primes: 29, 39, 7 + 9i. 4. True or false: if α, β ∈ Z[i] and N(α)|N(β), then α|β. 5. Let a, b ∈ Z, and let α = a + bi. Show that 1 + i|α if and only if a and b are either both even or both odd. 6. Let a, b ∈ Z be coprime, and let α = a+bi ∈ Z[i]. Show that gcd(α, α) is either 1 or 1 + i. 7. Prove Fermat’s little theorem for Gaussian primes: For a Gaussian prime π and a Gaussian integer α, we have αNπ ≡ α (mod π).

8. Let π be a Gaussian prime. Show that the only solutions to x4 ≡ 1 (mod π) are 1, −1, i, −i. 9. Let n = (10002 + 1)(20002 + 1). Express n as the sum of two perfect squares. Then do it in a different way, using entirely different squares. 10. Let p be an odd prime. Note that (1 + i)2 = 2i. Also remember that p divides the middle binomial coefficients, so that (1 + i)p ≡ 1 + ip (mod p). Combined these facts to show that in Z[i] we have the following congruence:   p 2 p−1 1 + i ≡ i 2 (1 + i) (mod p). p

 2  This can be used to deduce the reciprocity law for p .

50 10 Unique factorization and its applications

Theorem 2.3.3 has the following analogue in Z[i].

Theorem 10.0.1. Let α, β ∈ Z[i] be relatively prime. If αβ is an nth power in Z[i], then α = uγn for some γ ∈ Z[i] and some unit u (and similarly for β).

We present two applications of Theorem 10.0.1 to Diophantine equations.

10.1 Pythagorean triples, revisited We can use unique factorization in Z[i] to come up with a formula which generates Pythagorean triples in a different way than in 2.5. Let a, b, c be integers satisfying a2 + b2 = c2, with gcd(a, b) = 1. As we already observed, a and b cannot both be even (since then they wouldn’t be coprime) and they cannot both be odd (there would arise a contradition modulo 4). Without loss of generality we may assume that a is positive and odd, and b is positive and even. Let α = a + bi ∈ Z[i], so that αα = c2. By Exercises #4 and #5 from 9.5, gcd(α, α) = 1. Therefore by Theorem 10.0.1, α = uγ2 for a unit u. Let’s write γ = p + qi, so that γ2 = p2 − q2 + 2pqi. If the unit u is 1, we can equate real and imaginary parts to get

a = p^2 − q^2
b = 2pq
c = p^2 + q^2

(Other choices of units simply result in permuting a and b and changing their signs.) As p and q run through positive relatively prime integers of opposite parity with p > q, the formulas above produce all primitive Pythagorean triples (a, b, c) with a positive and odd and b positive and even.
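The parametrization just derived is easy to turn into a generator of primitive triples; here is a short illustrative sketch.

```python
# Primitive Pythagorean triples (a, b, c) = (p^2 - q^2, 2pq, p^2 + q^2),
# with p > q > 0 coprime and of opposite parity, as derived above.
from math import gcd

def primitive_triples(bound):
    triples = []
    for p in range(2, bound):
        for q in range(1, p):
            if (p - q) % 2 == 1 and gcd(p, q) == 1:
                triples.append((p * p - q * q, 2 * p * q, p * p + q * q))
    return triples

print(primitive_triples(5))
# [(3, 4, 5), (5, 12, 13), (15, 8, 17), (7, 24, 25)]
```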

10.2 A cubic Diophantine equation

In this example, we use unique factorization in Z[i] to solve a Diophantine equation in two variables of degree 3. (The graphs of such equations are called elliptic curves; it is a rich and interesting problem to find all integral or rational points on an elliptic curve.)

Theorem 10.2.1. The only solution to the Diophantine equation y^2 = x^3 − 1 is (x, y) = (1, 0).

51 Proof. Let x and y be integers with y2 = x3 − 1. First we examine the parities of x and y. If x is even, then x3 ≡ 0 (mod 4), but then y is odd, so that y2 ≡ 1 (mod 4). This leads to a contradiction. Thus x is odd and y is even. We write the equation as y2 + 1 = x3 and factor the left side in Z[i]:

(y + i)(y − i) = x3.

Since y is even, y ± i is not divisible by 1 + i (9.5 #5), and then we must have gcd(y + i, y − i) = 1 (9.5 #6). Therefore we can apply Theorem 10.0.1 to get y + i = uz3 for some z ∈ Z[i]. In fact, since every unit in Z[i] is a perfect cube, we can in fact write y + i = z3. Letting z = a + bi, we get

y + i = (a + bi)3 = a3 − 3ab2 + (3a2b − b3)i.

Therefore

y = a^3 − 3ab^2
1 = 3a^2 b − b^3 = b(3a^2 − b^2).

The only way for the second equation to be true is if b = 3a2 − b2 = ±1. By inspection we get b = −1 and a = 0, which leads to y = 0 and x = 1. √ 10.3 The system Z[ −2] The reader might wonder if this method can be used to solve other Dio- phantine equations, such as the elliptic curve

y2 = x3 − 2. (10.3.1)

This time, x and y must have the same parity. If they are both even, so that x = 2z and y = 2w, then 2z2 = 4w3 − 1, which is impossible. Therefore x and y are both odd. 2 3 After rewriting this as y √+ 2 = x , we observe that the left side factors not in Z[i], but rather√ in Z[ −2], this being the set of complex numbers of√ the form a + b −2, where a, b ∈ Z. So let us turn our attention to Z[ −2]. It is easy to observe that it is closed under addition, subtraction and multiplication.√ We can once again define α|β to mean that there exists γ√∈ Z[ −2] with β = αγ. There is once again the norm function N(a + 2 2 b −2) = a√+ 2b , and the same logic as in Theorem 9.1.2 shows that the units of Z[ −2] are just ±1.

52 Returning to the Diophantine equation above, we can rewrite it as √ √ (y + −2)(y − −2) = x3. √ √ If d is a common factor√ of y + −2 and y − −2, then d divides their difference, which is 2 −2; it follows that N(d)|8, so that N(d) is a power of 2. But d must also must divide x3, which means that N(d)|x6; since x is odd,√ the only possibility√ is that N(d) = 1, so that d is a unit. Therefore y + −2 and y − −2 are coprime. Can we apply the same technique we used in (10.2)? There, we relied on Theorem 10.0.1, which in turn relied on the unique factorization property for Z[i] in Theorem 9.3.3, which in turn relied on the division algorithm√ for Z[i] in Theorem 9.2.1. So if we can prove the division algorithm for Z[ −2], then√ the same chain of reasoning√ could apply, and we√ could conclude that y + −2 = z3 for some z ∈ Z[ −2]. Letting z = a + b −2, we get √ √ y + −2 = (a + b −2)3 √ = a3 − 6ab2 + (3a2b − 2b3) −2

Therefore

y = a^3 − 6ab^2
1 = 3a^2 b − 2b^3 = b(3a^2 − 2b^2).

Once again, b = 3a2 − 2b2 = ±1. The case b = −1 leads to an absurdity. Thus b = 1, 3a2 − 2 = 1 and a = ±1, which leads to the solutions (3, ±5) to (10.3.1). We have thus proved that these are the only solutions. As for the division algorithm,√ we have seen that this can be proved using geometry. The elements of Z[ −2] comprise√ the corners of a tiling of the plane by rectangles,√ with side lengths 1 and 2. The center of such a rect- angle is distance 3/2 from the√ corners, and obviously this is the maximum possible√ such distance. Since 3/2 < 1, the division algorithm holds in Z[ −2], and therefore this system has the property of unique factorization into primes.

10.4 Examples of the failure of unique factorization √ It is easy to see how this circle√ of ideas might break down. Consider Z[ −3]. This time, the elements√ of Z[ −3] are the corners of a plane by rectangles with side lengths 1 and 3. The center of such a rectangle lies at a distance exactly 1 from the corners. This leads to the following cascade of failures:

53 √ • The division algorithm fails: If we try to divide 2 into√ 1 + −3, what should the quotient and remainder be? Since (1 + −3)/2 lies in the center of one of these rectangles, the remainder cannot have norm less than N(2). √ • Bezout’s identity fails: The elements 2 and 1 + −3 share no common factors. (Any non-unit common factor would have norm 2, which is 2 2 impossible because√ 2 = a + 3b has no solutions.)√ Thus if Theorem 1.6.1 held for Z[ −3], we√ would expect 2x + (1 + −3)y = 1 to have a solution with√ x, y ∈ Z[ −3]. In fact such a solution does not exist! Let β = 1 + −3, so that N(β) = 4. The norm of 2x + βy is

(2x + βy)(2x + βy) = 4N(x) + 2(xβy + xβy) + 4N(y),

which is always even. √ √ √ • Lemma 2.2.1 fails: In Z[ −3], we have 2|4 = (1 + −3)(1 − −3), yet 2 is coprime with each factor. √ • Unique factorization fails: The elements 2 and 1 ± −3 are prime (in the sense that they√ are nonzero√ elements with no non-unit proper divisors), but 4 = (1 + −3)(1 − −3) = 2 · 2 exhibits 4 as a product of primes in two truly distinct ways. √ √ • Theorem 2.3.3 fails: The elements 1 + −3 and 1 − −3 share no common factors. Their product 4 is a square, but neither factor is a square (even up to units).

√It isn’t hard to come up with further examples of such failures: In Z[ −5], for instance, we have the non-unique factorization √ √ 6 = 2 · 3 = (1 + −5)(1 − −5).

10.5 The Eisenstein integers

As we saw, the failure of Z[√−3] to have a division algorithm had to do with geometry: there are elements of C which are at distance 1 from an element of Z[√−3], namely the very centers of the rectangles. This deficit can be resolved simply by adding those centers into the system Z[√−3] itself, producing a larger one. Define the complex number ω = e^(2πi/3). Using some basic facts about complex numbers, we gather some data on ω:

54 √ • ω = (−1 + −3)/2.

• ω3 = e2πi = 1.

• ω2 = ω = −1 − ω.

The final equation can be derived efficiently as follows: since ω3 − 1 = 0, we can factor to get (ω − 1)(ω2 + ω + 1) = 0. Since ω 6= 1, the second factor must be 0.

Definition 10.5.1. The Eisenstein integers Z[ω] are the subset of complex numbers of the form a + bω, with a, b ∈ Z.

It is clear that Z[ω] is closed under addition and subtraction. It is closed under multiplication too, because

(a + bω)(c + dω) = ac + (ad + bc)ω + bdω^2 = (ac − bd) + (ad + bc − bd)ω.

The complex plane can now be tiled by rhombuses whose corners are the elements of Z[ω]. The norm of a + bω is

N(a + bω) = (a + bω)(a + bω2) = a2 − ab + b2.

The units in Z[ω] are exactly those elements of norm 1, which are ±1, ±ω, ±ω2. Note that they form a regular hexagon in the complex plane. The division algorithm holds in Z[ω], since the farthest a complex num- ber can be from an element of Z[ω] is < 1. (In fact this maximum distance is the distance√ of the center of a unit equilateral triangle to one of its corners, which is 1/ 3.) Therefore:

Theorem 10.5.2. Z[ω] has unique factorization.

Note that in our prior example with Z[√−3], we exhibited two distinct factorizations of 4: 2 · 2 = (1 + √−3)(1 − √−3). In Z[ω], these factorizations are the same up to units, because 1 + √−3 = −2ω^2 and 1 − √−3 = −2ω. We close this section by proving a case of quadratic reciprocity, made possible by the Eisenstein integers. Let p ≠ 2, 3 be prime. Since ω − ω^2 = √−3, we have on the one hand

   (ω − ω^2)^p = (√−3)^p = (−3)^((p−1)/2) √−3 ≡ (−3/p) √−3 (mod p).

On the other hand, since p divides the interior binomial coefficients,

   (ω − ω^2)^p ≡ ω^p − ω^(2p) ≡ √−3 (mod p) if p ≡ 1 (mod 3), and ≡ −√−3 (mod p) if p ≡ −1 (mod 3).

Therefore

   (−3/p) = (p/3) = 1 if p ≡ 1 (mod 3), and −1 if p ≡ −1 (mod 3).

(Why are we able to cancel √−3 from both sides of the congruence? Since p and 3 are relatively prime, we can solve 3x + py = 1 in integers x, y; this shows that −√−3·x is an inverse of √−3 modulo p. Another small point is that we passed from a congruence between Legendre symbols modulo p to an equality; this is because p does not divide 2 in Z[ω].)

10.6 Exercises due April 13 (This is assignment #8.)

1. The system of congruence classes of Gaussian integers modulo 3 is written as Z[i]/3Z[i]. It has nine elements, eight of which are units: ±1, ±i, ±1±i. Find the orders of all eight units. How many primitive roots are there?

2. Let π = a + bi be a Gaussian prime such that Nπ = p is a rational prime. Show that every α ∈ Z[i] is congruent modulo π to exactly one of 0, 1, . . . , p − 1 modulo π. Thus Z[i]/πZ[i] has p elements. (One says that Z[i]/πZ[i] and Z/pZ are isomorphic).

3. Let n = p_1 · · · p_r, where the p_i are distinct primes with p_i ≡ 1 (mod 4). How many ways can we write n = a^2 + b^2 for integers a and b? Let us only count n = a^2 + b^2 and n = c^2 + d^2 as different if |a| ≠ |c| and |a| ≠ |d|.

4. For n ≥ 2, the integer 2^n is always a sum of two squares. If n = 2k is even, then 2^n = (2^k)^2 + 0^2, and if n = 2k + 1 is odd, then 2^n = (2^k)^2 + (2^k)^2. Are there any other ways to write 2^n as a^2 + b^2?

5. For an odd prime p, show that

   (−2/p) = 1 if p ≡ 1, 3 (mod 8), and −1 if p ≡ 5, 7 (mod 8).

56 6. (2 pts.) For an odd prime p, prove that p = a2 + 2b2 has a solution in integers a, b if and only if p ≡ 1, 3 (mod 8).

7. (2 pts.) For a prime p 6= 2, 3, prove that p = a2 +ab+b2 has a solution in integers a, b if and only if p ≡ 1 (mod 3).

 −5  8. True or false: for a prime p satisfying p = 1, the equation p = a2 + 5b2 has a solution in integers.

11 Some

Analytic number theory is the marriage of number theory to calculus, with the aim of answering quantitative questions about the former. For instance, we have already seen that there are infinitely many primes, but just how infinite are they? Plenty of sequences are infinite, such as the odd numbers, the numbers which are 7 (mod 10), the square numbers, the powers of two, etc. But odd numbers are more common than numbers which are 7 (mod 10), which are more common than square numbers, which are much more common than powers of two. Where do the primes rank among this list, and how do we even make such a question precise? One way of answering such a question is with a counting function. For a positive real number x, let π(x) be the number of primes p ≤ x. Thus π(10) = 4 because there are four primes less than 10. Here are the values of π(x) for the first few powers of 10:

x       π(x)    π(x)/x
10      4       .4
10^2    25      .25
10^3    168     .168
10^4    1229    .1229
10^5    9592    .09592

We have listed π(x)/x, which is the ratio of primes among positive integers ≤ x. It seems that this ratio decreases, but only very slowly. Contrast this with the similar ratio for odd numbers (roughly 1/2), for numbers which are 7 (mod 10) (1/10), for squares (1/√x), and for powers of 2 (log_2(x)/x). The first two ratios don't decrease at all, but the second two decrease to 0 fairly quickly. Mathematicians have studied π(x) for centuries. The following theorem was conjectured by Gauss in 1793 (when he was a teenager) and proved by

Hadamard and de la Vallée Poussin in 1896 using (of all things) complex analysis.

Theorem 11.0.1 (The Prime Number Theorem). We have π(x) ∼ x/log x. That is,

   lim_{x→∞} π(x)/(x/log x) = 1.

Unfortunately we will not be developing the tools to prove this here, but we can develop some interesting results nonetheless.
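A quick numerical experiment illustrates the theorem. The sketch below sieves the primes up to 10^6 (an arbitrary small cutoff chosen for illustration) and compares π(x) with x/log x.

```python
# Compare pi(x) with x/log(x) for small powers of 10, using a simple sieve.
from math import log

def sieve(n):
    flags = [True] * (n + 1)
    flags[0] = flags[1] = False
    for i in range(2, int(n ** 0.5) + 1):
        if flags[i]:
            for j in range(i * i, n + 1, i):
                flags[j] = False
    return flags

flags = sieve(10 ** 6)
for k in range(1, 7):
    x = 10 ** k
    pi_x = sum(flags[: x + 1])
    print(x, pi_x, round(pi_x / (x / log(x)), 4))
# pi(10^6) = 78498; the ratio pi(x)/(x/log x) decreases toward 1, but only slowly
```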

11.1 ∑_p 1/p diverges

Here's a crude means for testing whether a sequence is “dense” or “sparse”. The harmonic series

   1 + 1/2 + 1/3 + · · ·

diverges (it can be compared with ∫_1^∞ dx/x, for instance). So if we are given a subset S ⊂ Z≥1, we can ask whether ∑_{n∈S} 1/n converges or diverges; if it converges, we can conclude that there “aren't that many” elements of S. (Pedantic note: if S is infinite, then its cardinality is the same as that of Z; i.e. it is countable. But this isn't the sort of comparison we are looking for; instead we want to know how spread out S is among the positive integers.) Here are some examples:

• If S is the set of odd numbers, then the sum in question is ∑_{k=0}^{∞} 1/(2k + 1), which diverges.

• If S is the set of numbers which are 7 (mod 10), then the sum is ∑_{k=0}^{∞} 1/(10k + 7), which diverges.

• If S is the set of square numbers, the sum is ∑_{n=1}^{∞} 1/n^2, which converges (in comparison with ∫_1^∞ dx/x^2).

• If S is the set of powers of 2, the sum is ∑_{n=0}^{∞} 1/2^n, which converges quite rapidly (to 2, in fact).

The set of primes, it turns out, falls in the “dense” column. Here and elsewhere, we use the notation ∑_p to mean a sum over primes p.

Theorem 11.1.1. ∑_p 1/p diverges.

Proof. For a real number s, let ζ(s) denote the sum

   ζ(s) = ∑_{n=1}^{∞} 1/n^s.

Then ζ(s) converges for s > 1 but diverges for s ≤ 1. This function is known as the Riemann zeta function, and is very important for analytic number theory. This is because of its Euler factorization, valid for s > 1:

   ζ(s) = (1 + 1/2^s + 1/2^(2s) + · · ·)
        × (1 + 1/3^s + 1/3^(2s) + · · ·)
        × (1 + 1/5^s + 1/5^(2s) + · · ·) · · ·
        = ∏_p (1 − 1/p^s)^(−1).

The first equality is no more and no less than the theorem of unique factorization into primes: each term 1/n^s is uniquely the product of factors 1/p^(ks), each of which appears somewhere in the product. The second equality comes from the geometric series,

1 + x + x^2 + · · · = (1 − x)^(−1), valid for |x| < 1. In order to turn the product into a sum, we take logarithms:

   log ζ(s) = ∑_p log (1 − 1/p^s)^(−1).

We now apply the Taylor series for log(1 − x)^(−1):

   log(1 − x)^(−1) = x + x^2/2 + x^3/3 + · · · ,

again valid for |x| < 1. Thus

   log ζ(s) = ∑_p (1/p^s + 1/(2p^(2s)) + 1/(3p^(3s)) + · · ·).

Now let's think about what happens as s → 1 from the right. It turns out that

   ∑_p (1/(2p^2) + 1/(3p^3) + · · ·)

converges (left as an exercise). But log ζ(s) → ∞ as s → 1. Thus we can conclude that ∑_p 1/p^s → ∞ as s → 1, which is to say that ∑_p 1/p diverges.

11.2 Classes of primes, and their infinitude There are infinitely many primes among the integers. But we can also ask, given an infinite subset S of the positive integers (i.e., a sequence), are there infinitely many primes in S? For instance, are there infinitely many primes p among the following sets?

• The set of integers of the form 4n + 1.

• The set of integers of the form 4n − 1.

• The set of integers of the form 2^n − 1.

• The set of integers of the form 2^n + 1.

• The set of integers of the form n^2 + 1.

Every prime other than 2 falls into one of the first two categories, so we might guess that there are infinitely many primes of either sort. The third class refers to the Mersenne primes, of which we have discovered quite a few, and the fourth class refers to the Fermat primes, of which we have discovered five. Unfortunately, it is not known whether there exist infinitely many primes of the form 2^n − 1, 2^n + 1, n^2 + 1, or indeed almost any formula involving one variable, unless it happens to be a linear formula, such as the first two examples given here.

Theorem 11.2.1. There are infinitely many primes of the form 4n + 1 and 4n − 1.

Proof. We can vary the original method of Euclid’s proof for both cases. First we do the case of primes of the form 4n − 1. Suppose there are finitely many of these, say p1, ··· , pt. Then N = 4p1 ··· pt − 1 must be a product of odd primes, which cannot all be of the form 4n + 1 (since N ≡ 3 (mod 4)). Thus there exists a prime dividing N which is 3 (mod 4), which

60 has to be one of the pi. But none of the pi divide N, since they clearly divide N + 1. This is a contradiction. Now consider the case of primes of the form 4n + 1. Suppose there are 2 2 finitely many of these, say p1, ··· , pt. Then N = 4p1 ··· pt + 1 is divisible by some odd prime, say q. Then x2 ≡ −1 (mod q) has a solution, namely  −1  x = 2p1 ··· pt. Therefore q = 1, which implies that q ≡ 1 (mod 4), which means that q = pi for some i. But then q|N − 1 and q|N, which is a contradiction.

This theorem raises the question of whether there are infinitely many primes which are a (mod m), where a and m are integers. The answer will be no when a and m share a common factor, but otherwise it will be yes:

Theorem 11.2.2 (Dirichlet’s theorem on primes in arithmetic progres- sions). Let a, m ∈ Z with gcd(a, m) = 1. There are infinitely many primes which are a (mod m).

There isn’t a Euclidean method to prove this theorem. Instead, Dirichlet used analytic means. In the next section we will get a taste of the sort of method he used, although we won’t quite prove the whole theorem.

11.3 ∑_{p≡±1 (mod 4)} 1/p diverges

Let

   L(s) = ∑_{n odd} (−1)^((n−1)/2)/n^s = 1 − 1/3^s + 1/5^s − 1/7^s + · · · .

As an alternating series, L(s) converges as long as its terms are strictly decreasing, which is true for s > 0. Remarkably, L(s) has an Euler factorization:

   L(s) = ∏_{p odd} (1 − (−1)^((p−1)/2)/p^s)^(−1).

This is essentially because (−1)^((n−1)/2) = (−1/n) (Jacobi symbol) is multiplicative in n. We would now like to take logarithms of both sides, but we must be mindful that we are not taking the logarithm of zero or a negative number. At s = 1, for instance, we have

   1 − 1/3 + 1/5 − 1/7 + · · · = π/4

(this is obtained by plugging x = 1 into the Maclaurin series for tan^(−1)(x)). Therefore L(1) ≠ 0, and so there is no problem defining log L(s) for values of s near 1. For such values we have

   log L(s) = ∑_{p odd} log (1 − (−1)^((p−1)/2)/p^s)^(−1)
            = ∑_{p odd} ∑_{n=1}^{∞} (−1)^(n(p−1)/2)/(n p^(ns)).

For i = 1, 3, let P_{i,4}(s) = ∑_{p≡i (mod 4)} 1/p^s. Then we can rewrite the above as

   log L(s) = P_{1,4}(s) − P_{3,4}(s) + ∑_{p odd} ∑_{n=2}^{∞} (−1)^(n(p−1)/2)/(n p^(ns)).

The infinite sum converges for s = 1, because it converges absolutely (this was in the exercises). Therefore:

   lim_{s→1+} (P_{1,4}(s) − P_{3,4}(s)) exists.

On the other hand,

   lim_{s→1+} (P_{1,4}(s) + P_{3,4}(s)) does not exist,

because this is just the sum of 1/p^s over all odd primes, and we already know ∑_p 1/p diverges. Now if lim_{s→1+} P_{1,4}(s) existed, then so would lim_{s→1+} P_{3,4}(s), in which case the limit of the sum would exist, which it doesn't. A similar argument applies to P_{3,4}(s). As a result:

Theorem 11.3.1. ∑_{p≡i (mod 4)} 1/p diverges for i = 1, 3.

Actually these methods tell us a little bit more. Let P(s) = ∑_p 1/p^s, and let Q(s) = P_{1,4}(s) − P_{3,4}(s). We have proved that P(s) → ∞ as s → 1+, but lim_{s→1+} Q(s) exists. On the other hand we can express P_{1,4} and P_{3,4} in terms of these quantities: P_{1,4}(s) = (1/2)(P(s) + Q(s) − 2^(−s)) and P_{3,4}(s) = (1/2)(P(s) − Q(s) − 2^(−s)). We have

   lim_{s→1+} P_{1,4}(s)/P(s) = lim_{s→1+} (P(s) + Q(s) − 2^(−s))/(2P(s)) = 1/2

and

   lim_{s→1+} P_{3,4}(s)/P(s) = lim_{s→1+} (P(s) − Q(s) − 2^(−s))/(2P(s)) = 1/2.

For a subset S of the set of primes, we may define the function P_S(s) = ∑_{p∈S} 1/p^s, which is convergent (at least) for s > 1. Then the Dirichlet density of S is defined as the limit lim_{s→1+} P_S(s)/P(s), if this exists. Therefore the set of all primes has density 1, while any finite set of primes has density 0. We have shown that the Dirichlet densities of the primes which are 1 (mod 4) and 3 (mod 4) are 1/2 each. Dirichlet's original theorem is that for coprime integers a and m, the Dirichlet density of primes which are a (mod m) is 1/φ(m). This is what you would expect, if you figured that nature has no “bias” in distributing the primes among the classes of units modulo m. There are other notions of density as well. We can define a function π_S(x) to be the number of primes in S which are ≤ x, and then the natural density of S is the limit lim_{x→∞} π_S(x)/π(x), if this exists. One can show that if the natural density exists, then so does the Dirichlet density, and these are the same. It is known that the natural density of primes which are a (mod m) exists (and so equals 1/φ(m)).

11.4 Exercises due April 20 (This is assignment #9.)

1. The number of primes below 10^26 is π(10^26) = 1,699,246,750,872,437,141,327,603. The prime number theorem gives the estimate π(x) ≈ x/log x. What is the percentage error of this approximation for x = 10^26? (The percentage error of an approximation is the difference between the true value and the approximation, divided by the true value and expressed as a percentage.)

2. Using a Euclidean argument, show that there are infinitely many primes which are 2 (mod 3).

3. (2 pts.) Using a Euclidean argument, show that there are infinitely many primes which are 1 (mod 3).

4. Let f(n) be a function whose inputs are positive integers and whose outputs are complex numbers. Then we can form the Dirichlet series ∑_{n≥1} f(n)/n^s. If f(mn) = f(m)f(n), we say that f is multiplicative. Show that if f is multiplicative, then the Dirichlet series has an Euler factorization:

   ∑_{n≥1} f(n)/n^s = ∏_p (1 − f(p)p^(−s))^(−1).

(Ignore questions of convergence in this problem.)

5. We can multiply together Dirichlet series to produce new ones. For instance,

   ζ(s)^2 = ∑_{n=1}^{∞} d(n)/n^s,

for some function d. What is the function d?

6. Similarly,

   ζ(s − 1)/ζ(s) = ∑_{n=1}^{∞} f(n)/n^s

for some function f. What is the function f?

7. Show that ζ(n) − 1 ≤ 1/(n − 1) for n = 2, 3,... , by comparing the sum to an integral.

8. (2 pts.) Use the result of the previous exercise to show that

   ∑_p ∑_{n=2}^{∞} 1/(n p^n)

converges. You can replace the sum over primes with a sum over integers k ≥ 2, and show that the new sum (which is larger) still converges. This is the technical result that allows us to prove Theorem 11.1.1.

12 Continued fractions and Pell’s equation

12.1 A closer look at the Euclidean algorithm Let’s examine what is actually going on in the extended Euclidean algorithm. Let’s start with the inputs 33 and 26:

33 = 1 · 26 + 7
26 = 3 · 7 + 5
7 = 1 · 5 + 2
5 = 2 · 2 + 1
2 = 2 · 1 + 0.

The quotients 1, 3, 1, 2, 2 then go into the table:

      1   3   1   2    2
0  1  1   4   5   14   33
1  0  1   3   4   11   26

(The rows in this table are reversed from the way we usually set things up – ultimately it doesn’t matter.) In doing so, we can find a solution to Bezout’s identity 33x + 26y = 1, namely x = −11, y = 14. But what do the other entries of the table mean? Let’s interpret them as fractions and write them in decimal to six places:

1/1 = 1.000000
4/3 = 1.333333
5/4 = 1.250000
14/11 = 1.272727
33/26 = 1.269231

It seems like the fractions are converging on the final fraction, 33/26, with each fraction being a better approximation than the last. Not only that, but the odd-numbered fractions are less than 33/26, while the even-numbered fractions are greater than it. To see what is happening, we return to the results of the Euclidean algorithm and reinterpret them in terms of the fraction 33/26. Since 33 = 1 · 26 + 7, we have

   33/26 = 1 + 7/26 = 1 + 1/(26/7).

Then we substitute 26 = 3 · 7 + 5 in much the same way, and continue:

   33/26 = 1 + 1/(3 + 5/7)
         = 1 + 1/(3 + 1/(7/5))
         = 1 + 1/(3 + 1/(1 + 2/5))
         = 1 + 1/(3 + 1/(1 + 1/(5/2)))
         = 1 + 1/(3 + 1/(1 + 1/(2 + 1/2))).

This is a continued fraction. Note that in the final result, all numerators are 1, so the only data that matter are the sequence of denominators 1, 3, 1, 2, 2, which are exactly the quotients appearing in the Euclidean algorithm. To save space we can write this as

   33/26 = [1, 3, 1, 2, 2].

What's more, the approximants to 33/26 we found occur when we truncate the continued fraction:

1 = [1]
4/3 = [1, 3]
5/4 = [1, 3, 1]
14/11 = [1, 3, 1, 2]
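The whole computation is easy to automate. The sketch below recovers [1, 3, 1, 2, 2] from 33/26 and rebuilds the convergents with the same two-term recursion that the table encodes.

```python
# Continued fraction of a rational number via the Euclidean algorithm,
# and its convergents p_n/q_n built by the usual two-term recursion.
from fractions import Fraction

def contfrac(num, den):
    cf = []
    while den != 0:
        cf.append(num // den)
        num, den = den, num % den
    return cf

def convergents(cf):
    p_prev, p = 1, cf[0]
    q_prev, q = 0, 1
    out = [Fraction(p, q)]
    for a in cf[1:]:
        p, p_prev = a * p + p_prev, p
        q, q_prev = a * q + q_prev, q
        out.append(Fraction(p, q))
    return out

print(contfrac(33, 26))                  # [1, 3, 1, 2, 2]
print(convergents(contfrac(33, 26)))     # 1, 4/3, 5/4, 14/11, 33/26
```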

12.2 Continued fractions in the large

x = [a0, a1, ··· , an−1, an, xn+1] where a0, ··· , an ∈ Z. We can place the ans in the usual table, to produce two lists pn and qn, defined recursively by

pn = anpn−1 + pn−2

qn = anqn−1 + qn−2 with initial conditions p−1 = q−2 = 1, p−2 = q−1 = 0. Then

   p_n/q_n = [a_0, a_1, · · · , a_n].

(We invite the reader to prove this result by induction.) If x is rational, then this algorithm is identical to the Euclidean algorithm; it halts after finitely many steps, producing a finite continued fraction x = [a_0, a_1, . . . , a_n]. But if x is irrational, the algorithm will never halt, since each [a_0, a_1, · · · , a_n] is clearly a rational number. If 0 < x < 1, then x still has a continued fraction expansion; by convention we let a_0 = 0 for such numbers.

Theorem 12.2.1. Let x > 0 be a real number, and let the numbers a_n, p_n and q_n be defined as above.

1. We have p_0/q_0 < p_2/q_2 < · · · < x < · · · < p_3/q_3 < p_1/q_1.

2. p_n q_{n−1} − p_{n−1} q_n = (−1)^n.

3. |x − p_n/q_n| < 1/q_n^2.

4. If x is irrational, so that the pn and qn are well-defined for all n, then limn→∞ pn/qn = x.

Thus if x is irrational, it makes sense to say that

x = [a0, a1, ··· ] is the (infinite) continued fraction expansion for x. Proof. Part 1 follows from the following observation: for positive numbers y1, y2, ··· , yn, we have that [y1, ··· , yn] is larger than [y1, ··· , yn−1] if n is odd, and is smaller otherwise. (Think about why this is true!) Part 2 can be proved by routine induction. Part 3 follows from parts 1 and 2:

   |x − p_n/q_n| ≤ |p_n/q_n − p_{n+1}/q_{n+1}| = 1/(q_n q_{n+1}) < 1/q_n^2,

where in the last step we used the fact that q_{n+1} > q_n. Finally, part 4 follows from part 3, since q_n → ∞ as n → ∞.

In fact we can give an estimate of how well the p_n/q_n approximate x. If F_n is the nth Fibonacci number, then q_n ≥ F_n (induction once again!). On the other hand F_n ∼ φ^n, where φ = 1.618... is the golden ratio. Thus part 3 of the theorem shows that the accuracy of p_n/q_n as an approximation to x is exponential, meaning that the number of correct digits grows linearly with n. For instance, the continued fraction expansion of π is

   π = [3, 7, 15, 1, 292, 1, 1, 1, 2, 1, 3, 1, . . . ].

The approximation 22/7 = [3, 7] is a well-known approximation to π which dates back at least to Archimedes. The better approximation 355/113 = [3, 7, 15, 1] is correct to six places after the decimal point:

   355/113 = 3.14159292 . . . .

This approximation was known to Chinese mathematicians in the 5th century.

12.3 Real quadratic irrationals and their continued fractions

Let's find the continued fraction expansion for √2. We start by observing that ⌊√2⌋ = 1, and continue as follows:

   √2 = 1 + (−1 + √2)
   (−1 + √2)^(−1) = 1 + √2 = 2 + (−1 + √2)
   (−1 + √2)^(−1) = 1 + √2 = 2 + (−1 + √2)

and then we immediately notice we are trapped in a loop! The continued fraction expansion for √2 is [1, 2, 2, . . . ], which we abbreviate as [1, 2̄]. We can use our usual table to come up with good rational approximations to √2:

      1   2   2   2    ···
0  1  1   3   7   17   ···
1  0  1   2   5   12   ···

(Some of these approximations were known to the ancient Babylonians and Indians.) Interestingly, if p/q is the nth approximation obtained this way, then p^2 − 2q^2 = (−1)^n. Let's do the example of √7 to convince ourselves that this isn't a fluke. We proceed:

   √7 = 2 + (−2 + √7)
   (−2 + √7)^(−1) = (2 + √7)/3 = 1 + (−1 + √7)/3
   3/(−1 + √7) = (1 + √7)/2 = 1 + (−1 + √7)/2
   2/(−1 + √7) = (1 + √7)/3 = 1 + (−2 + √7)/3
   3/(−2 + √7) = 2 + √7 = 4 + (−2 + √7)

Since this remainder has appeared before, we can deduce that the continued fraction expansion for √7 is periodic as well: √7 = [2, 1, 1, 1, 4], with the block 1, 1, 1, 4 repeating.
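The computation above can be carried out exactly (no floating point is needed), because every remainder has the form (m + √n)/d with integers m and d. The sketch below uses the standard recurrence for these quantities; the recurrence itself is not derived in the text, but the output reproduces the expansions of √2 and √7 just computed.

```python
# Continued fraction of sqrt(n) for n not a perfect square, using the exact
# recurrence m' = d*a - m, d' = (n - m'^2)/d, a' = floor((a0 + m')/d') for
# the remainders (m + sqrt(n))/d.
from math import isqrt

def sqrt_contfrac(n, terms=12):
    a0 = isqrt(n)
    m, d, a = 0, 1, a0
    cf = [a0]
    while len(cf) < terms:
        m = d * a - m
        d = (n - m * m) // d
        a = (a0 + m) // d
        cf.append(a)
    return cf

print(sqrt_contfrac(2))    # [1, 2, 2, 2, ...]
print(sqrt_contfrac(7))    # [2, 1, 1, 1, 4, 1, 1, 1, 4, ...]
print(sqrt_contfrac(61))   # the long period quoted in section 12.4
```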

Let's make another table, this time including the value of p^2 − 7q^2:

                 2   1   1   1   4    1    1
p          0  1  2   3   5   8   37   45   82   ···
q          1  0  1   1   2   3   14   17   31   ···
p^2 − 7q^2   -7  1  -3   2  -3   1   -3    2   -3

We observe here that the value of p^2 − 7q^2 seems rather small relative to p and q. Some of these observations can be summarized in the following theorem, which is due to Lagrange:

Theorem 12.3.1. Let α > 0 be a real number. The following statements are equivalent:

1. The continued fraction expansion of α is (eventually) periodic, that is:

α = [a0, a1, ··· , am, b1, b2, ··· , bn].

69 2. There√ exist rational numbers r, s, and d with d > 0 such that α = r + s d. √ 12.4 Pell’s equation and Z[ d] √ Let d > 0 be an integer which is not a perfect square. We let Z[ d] in the same√ way we did when d was negative: it is the set of numbers of the form a + b d, where a, b ∈ Z. This√ is closed under addition, subtraction and multiplication. Of course, Z[ d] consists of real numbers. This√ makes it difficult to picture it in a nice way: if we plotted elements of Z[ d] as dots on the real number line, the dots would “accumulate” and crowd each other out, rather than appearing as an orderly lattice in the complex plane. Nonetheless, we can copy some of the methods we used√ in the complex case. To wit, we can define the norm of an element of Z[ d] by √ √ √ N(a + b d) = (a + b d)(a − b d) = a2 − db2. Note that this may well be negative! It is still the case that N(αβ) = N(α)N(β), and so the norm function retains its importance for studying units, factorization, and primes. √ Lemma 12.4.1. An element α ∈ Z[ d] is a unit if and only if N(α) = ±1. Proof. If αβ = 1, then N(α)N(β) = 1, so of course N(α) = ±1. √ Thus α = x + y d (with x, y ∈ Z) is a unit if and only if x2 − dy2 = ±1. We remark here that the Diophantine equation a2 − db2 = 1 is called the Pell’s equation. √ In the case d = 2, it is easy to see that u =√ 1+ 2 is a unit. Therefore so are its powers u2, u3,... ; we conclude that Z[ 2] has infinitely many units. The first few powers of u are √ u = 1 + 2 √ u2 = 3 + 2 2 √ u3 = 7 + 5 2 √ u4 = 17 + 12 2.

Interestingly, the coefficients appearing here√ are exactly the numbers ap- pearing in our table of approximations√ for 2! Does there always exist a unit in Z[ d] other than ±1? Examination of some small values of d seems to indicate that it does:

d    unit in Z[√d]
2    1 + √2
3    2 + √3
5    2 + √5
6    5 + 2√6
7    8 + 3√7

(Here we have listed the unit a + b√d with the smallest positive value of b.) However, if we continued far enough we might find some erratic behavior. Z[√60] has 31 + 4√60, which is simple enough, but the entry in our table for d = 61 would be 29718 + 3805√61, which is rather too large to obtain by hand. Our observations so far suggest that units in Z[√d] are strongly related to the approximations to √d coming from its continued fraction expansion. For instance, the large size of the unit in Z[√61] is “explained” by the length of the periodic part of the continued fraction expansion

12.5 The fundamental unit The following theorem ties everything together.

Theorem 12.5.1. Let d > 0 √be an integer which is not a perfect square. Then there exists a unit u ∈ Z[ d] which is not ±√1. Furthermore, this unit can be chosen in such a way that every unit in Z[ d] is of the form ±un for some n ∈ Z. √ The unit u appearing in the theorem is the fundamental unit in Z[ d].√ In the language of abstract algebra, we would say that the unit group of Z[ d] is isomorphic to Z×Z/2Z. Note that this theorem shows that Pell’s equation x2 − dy2 = 1 always has infinitely many solutions: even if N(u) = −1, every even power of u has norm 1. (It is not the case that x2 − dy2 = −1 always has solutions, though; this is a subtle problem.)

Proof. There are two parts to this theorem: the existence of a unit u 6= ±1, and then the statement that every unit arises from a fundamental unit. We start with Theorem 12.2.1, which gives us an infinite supply of pairs √ 2 of positive integers (p, q) with d − p/q < 1/q . For such a pair we have

71 √ p/q < d + 1, and so √ √ 2 2 2 p p − dq = q − d (p/q + d) q 1 √ √ ≤ q2 (2 d + 1) = 2 d + 1 q2

Thus p2−dq2 takes on only finitely many values. By the pigeonhole principle, at least one value gets repeated infinitely often. That is, we can find an integer m and infinitely many pairs (p, q) with

p2 − dq2 = m.

In fact we need to apply the pigeonhole principle one more time. Each pair (p, q), when considered modulo m, falls into one of m2 possible cases. Therefore there exist residues p0 and q0 modulo m, and infinitely many 2 2 pairs (p, q) of positive integers satisfying p − dq = m, p ≡ p0 (mod m), 0 0 and q ≡ q√0 (mod m). Let (p,√ q) and (p , q ) be two distinct such pairs. Let α = p + q d and α0 = p0 + q0 d. Then let α 1 √ √ u = = (p + q d)(p0 − q0 d) α0 m 1 1 √ = (pp0 − qq0d) + (pq0 − p0q) d m m 0 0 2 2 0 0 Since pp −qq d ≡ p −dq√ ≡ 0 (mod m) and pq −p q ≡ pq−pq ≡ 0 (mod m), the element u lies in Z[ d]. Furthermore, N(u) = N(α)/N(α0) = m/m = 1, so that u is a unit. Since (p, q) 6= (p0, q0), u 6= 1, and since p, p0, q, q0 > 0, u > 0 and so u 6= −1. √ We have therefore shown that there exists a unit u = p + q d with p, q > 0. Let u be the least such unit. (This exists by the well-ordering principle!) Now if v is another unit, we claim that v = ±un for some integer n. After replacing v with −v, we may assume that v > 0. Then there exists n n+1 n n ∈ Z such that u ≤ v < u : this is essentially because limn→−∞ u = 0 n −n and limn→∞ u = ∞. Let w√= u v, so that 1 ≤ w < u. √ −1 Let us write w = a + b d. If N(w) = 1, then w √= a − b d, so that 2a = w + w−1 ≥ 2 and therefore a > 0. Similarly 2b d = w − w−1 ≥ 0 (since w ≥ 1), so that b ≥ 0. But u was assumed to be the least unit > 1 with positive coefficients; therefore w = 1. −1 The√ argument in the case that N(w√) = −1 is similar: we have w = −a + b d, so that 2a = w − w−1 and 2b d = w + w−1; in any case both of these are nonnegative.

12.6 The question of unique factorization for Z[√d]

We briefly touch upon the subtle topic of unique factorization in Z[√d], where d > 0 is not a square. We first observe that the existence of many units makes it harder to tell factors apart. For instance, 7 has the following two factorizations in Z[√2]:

7 = (3 + √2)(3 − √2) = (−1 + 2√2)(1 + 2√2).

But 3 + √2 is associate with −1 + 2√2, since

(3 + √2)/(−1 + 2√2) = 1 + √2

is a unit.

In fact Z[√2] has the property of unique factorization into primes, and we can prove it the same way as usual: it has a division algorithm.

Theorem 12.6.1. Let α, β ∈ Z[√2] with β ≠ 0. There exist γ, δ ∈ Z[√2] such that α = βγ + δ and |N(δ)| < |N(β)|.

(Note the presence of the absolute value signs: these are important, because norms in Z[√2] can be negative.)

Proof. Write α/β = a₁ + a₂√2, where a₁, a₂ ∈ Q. Let q₁ and q₂ be the integers nearest to a₁ and a₂ respectively. Then if rᵢ = aᵢ − qᵢ, we have |rᵢ| ≤ 1/2. Let γ = q₁ + q₂√2 and δ = α − βγ = β(r₁ + r₂√2). We have

|N(δ)/N(β)| = |r₁² − 2r₂²| ≤ r₁² + 2r₂² ≤ 3/4 < 1,

so that |N(δ)| < |N(β)|.

There is also a division algorithm for Z[√3], but not for Z[√5]. In fact Z[√5] lacks unique factorization into primes; a counterexample is

4 = 2 · 2 = −(1 + √5)(1 − √5).

(You should confirm that 2 and 1 ± √5 cannot be factored into nonunits.) This time, 2 really does not divide 1 ± √5, simply because the quotients (1 ± √5)/2 do not lie in Z[√5]. In this particular case the problem can be remedied simply by throwing in the element φ = (1 + √5)/2 to form the new system Z[φ], the set of real numbers of the form a + bφ, with a, b ∈ Z. This is closed under multiplication, since φ² = 1 + φ. Then in fact elements of Z[φ] can be factored into primes, uniquely up to units; the positive units are exactly the powers of φ.
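The proof of Theorem 12.6.1 is completely explicit, and it is easy to carry out by machine. The following sketch (the function names divmod_sqrt2 and norm2 are ours) computes α/β exactly as a pair of rationals, rounds each coordinate to the nearest integer to get γ, and returns δ = α − βγ; by the estimate above, |N(δ)| < |N(β)| is guaranteed.

    from fractions import Fraction

    def norm2(x, y):
        """N(x + y*sqrt(2)) = x^2 - 2*y^2."""
        return x * x - 2 * y * y

    def divmod_sqrt2(alpha, beta):
        """Division with remainder in Z[sqrt(2)]: return (gamma, delta) with
        alpha = beta*gamma + delta and |N(delta)| < |N(beta)|.  Elements are
        coefficient pairs (x, y) standing for x + y*sqrt(2)."""
        a1, a2 = alpha
        b1, b2 = beta
        nb = norm2(b1, b2)
        # alpha/beta = alpha * conj(beta) / N(beta), computed exactly
        c1 = Fraction(a1 * b1 - 2 * a2 * b2, nb)
        c2 = Fraction(a2 * b1 - a1 * b2, nb)
        q1, q2 = round(c1), round(c2)            # nearest integers
        d1 = a1 - (b1 * q1 + 2 * b2 * q2)        # delta = alpha - beta*gamma
        d2 = a2 - (b1 * q2 + b2 * q1)
        return (q1, q2), (d1, d2)

    alpha, beta = (7, 3), (2, 1)                 # 7 + 3*sqrt(2), 2 + sqrt(2)
    gamma, delta = divmod_sqrt2(alpha, beta)
    print(gamma, delta, abs(norm2(*delta)) < abs(norm2(*beta)))

Repeating the division step gives a Euclidean algorithm, and with it greatest common divisors in Z[√2], exactly as in the integer and Gaussian integer cases.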

12.7 Exercises due April 27

(This is assignment #10.)

1. Express 103/71 as a continued fraction.

2. Evaluate [2, 1, 8, 1] as a rational number.

3. Express √11 as a continued fraction.

4. Find three solutions to x² − 11y² = 1 in positive integers x, y.

5. Write [3, 4, 1] in closed form.

6. Z[√2] has the property of unique factorization into primes. Using this, show that an odd prime p can be written as a² − 2b² if and only if (2/p) = 1.

7. Factor 23 + 10√2 into primes in Z[√2].

8. Z[√10] does not have the property of unique factorization. Prove this as follows: show that 2 cannot be factored as the product of two non-units, so it is a prime in Z[√10]. From this we see that 10 = 2 · 5 = √10 · √10 is a factorization of 10 in two different ways.

9. True or false: a prime p ≠ 2, 5 can be written as a² − 10b² if and only if (10/p) = 1.

10. Does x² − 21y² = −1 have a solution in integers x, y?

13 Lagrange’s four square theorem

We have fully answered the question of which integers are expressible as the sum of two perfect squares. So the next natural question is to replace two squares by three, or four, or more. A little arithmetic reveals that some integers (like 7 or 15) are not expressible as the sum of three squares, but they all seem to be expressible as the sum of four squares. Here are some small integers n which are not expressible as the sum of two squares, but which are nonetheless sums of four:

n     a² + b² + c² + d²
3     1² + 1² + 1² + 0²
6     2² + 1² + 1² + 0²
7     2² + 1² + 1² + 1²
11    3² + 1² + 1² + 0²
12    2² + 2² + 2² + 0²
15    3² + 2² + 1² + 1²
19    4² + 1² + 1² + 1²

The goal of this final section is to prove the following theorem.

Theorem 13.0.1 (Lagrange, 1770). Every positive integer can be written as the sum of four perfect squares.

There are a few standard approaches to proving this theorem, all of which are quite interesting:

1. Lagrange’s original proof by descent,

2. Jacobi’s proof using modular forms,

3. Minkowski’s geometry of numbers,

4. Hurwitz’s system of integral quaternions.

We will follow the last approach, since it is most in line with themes we have encountered so far.

13.1 Hamiltonian quaternions

We are all familiar with the real numbers R and the complex numbers C. These are fields, meaning that they are systems of numbers admitting laws of addition, subtraction, multiplication and division. The addition and multiplication laws are required to be commutative: a + b = b + a and ab = ba. When the commutativity constraint on multiplication is lifted, it turns out one can form a further extension of the complex numbers, known as the quaternions.

Definition 13.1.1. A (Hamiltonian) quaternion is a formal sum of the form a + bi + cj + dk, where a, b, c, d ∈ R. The set of all quaternions is denoted H. Addition in H is componentwise, and multiplication in H is determined by the distributive law, the associative law, and by the rules

ij = k = −ji,   jk = i = −kj,   ki = j = −ik;   i² = j² = k² = −1.

The first three equations work the same way as unit vectors i, j, k under the cross product in R³. But unlike R³, we are allowed to add together scalars and vectors to produce results like 2 + 3i − 4j. You should be able to follow along with calculations like

(2 + 3i − 4j)(1 − j) = −2 + 3i − 6j − 3k,

and if you like you can carefully note where the distributive and associative laws are used. The first observation we make is that multiplication in H is not commutative; indeed ij = k but ji = −k. This means we must be very careful about order when we multiply quaternions. However, a real number a (interpreted as a + 0i + 0j + 0k) does commute with every quaternion.

Much like the complex numbers, the quaternions are equipped with a conjugation operation, which turns α = a + bi + cj + dk into ᾱ = a − bi − cj − dk, as well as a norm N(α) = a² + b² + c² + d².
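If you would like to check such computations mechanically, the following bare-bones Python class (the name Quat and the layout are ours, not part of the notes) implements exactly the multiplication rules above, together with conjugation and the norm.

    class Quat:
        """Quaternion a + b*i + c*j + d*k, with i^2 = j^2 = k^2 = -1,
        ij = k = -ji, jk = i = -kj, ki = j = -ik."""
        def __init__(self, a, b, c, d):
            self.a, self.b, self.c, self.d = a, b, c, d

        def __mul__(self, other):
            a, b, c, d = self.a, self.b, self.c, self.d
            e, f, g, h = other.a, other.b, other.c, other.d
            return Quat(a*e - b*f - c*g - d*h,
                        a*f + b*e + c*h - d*g,
                        a*g - b*h + c*e + d*f,
                        a*h + b*g - c*f + d*e)

        def conj(self):
            return Quat(self.a, -self.b, -self.c, -self.d)

        def norm(self):
            return self.a**2 + self.b**2 + self.c**2 + self.d**2

        def __repr__(self):
            return f"({self.a}, {self.b}, {self.c}, {self.d})"

    i, j, k = Quat(0, 1, 0, 0), Quat(0, 0, 1, 0), Quat(0, 0, 0, 1)
    print(i * j, j * i)                            # (0,0,0,1) and (0,0,0,-1): ij = k, ji = -k
    print(Quat(2, 3, -4, 0) * Quat(1, 0, -1, 0))   # (-2, 3, -6, -3), as computed above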

Lemma 13.1.2. The following statements are true in H.

1. For all α, β ∈ H, (αβ)‾ = β̄ ᾱ.

2. α ᾱ = ᾱ α = a² + b² + c² + d².

3. If α ≠ 0, then N(α) ≠ 0. If we let α⁻¹ = N(α)⁻¹ ᾱ, then α α⁻¹ = α⁻¹ α = 1.

4. N(αβ) = N(α)N(β).

Proof. 1 and 2 are routine calculations, and 3 follows from

α α⁻¹ = α N(α)⁻¹ ᾱ = N(α)⁻¹ α ᾱ = N(α)⁻¹ N(α) = 1

(and similarly for the other order of multiplication); note that N(α) is a scalar and so it commutes with everything. Part 4 comes from

N(αβ) = (αβ)(αβ)‾ = αβ β̄ ᾱ = α N(β) ᾱ = N(β) α ᾱ = N(α)N(β).

The third part of the lemma tells us that nonzero quaternions have multiplicative inverses, and so we can divide by them. However one must be careful about order. It is not advised to write α/β, since this is ambiguous. Instead, write β⁻¹α or αβ⁻¹.

The fourth part of the lemma is quite interesting. It says that the product of two sums of four squares is another sum of four squares:

(a² + b² + c² + d²)(e² + f² + g² + h²) = w² + x² + y² + z²,     (13.1.1)

where

w = ae − bf − cg − dh
x = af + be + ch − dg
y = ag − bh + ce + df
z = ah + bg − cf + de.
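As a quick sanity check (nothing more; the quaternion computation above is the actual proof), the identity (13.1.1) with these formulas for w, x, y, z can be verified numerically on random integers:

    from random import randint

    def four_square_product(a, b, c, d, e, f, g, h):
        """The (w, x, y, z) of identity (13.1.1): the coordinates of the
        quaternion product (a+bi+cj+dk)(e+fi+gj+hk)."""
        return (a*e - b*f - c*g - d*h,
                a*f + b*e + c*h - d*g,
                a*g - b*h + c*e + d*f,
                a*h + b*g - c*f + d*e)

    for _ in range(1000):
        a, b, c, d, e, f, g, h = (randint(-50, 50) for _ in range(8))
        w, x, y, z = four_square_product(a, b, c, d, e, f, g, h)
        assert (a*a + b*b + c*c + d*d) * (e*e + f*f + g*g + h*h) == w*w + x*x + y*y + z*z
    print("identity (13.1.1) verified on 1000 random inputs")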

(We have also a similar formula for the product of two sums of two squares, coming from multiplication in C. Is there a formula like this expressing the product of two sums of n squares as another sum of n squares? It turns out the answer is yes for n = 1, 2, 4, 8, but false otherwise! This is one reason why an analysis of sums of four squares is tractable for this course, but the same analysis for sums of three squares is much harder.)

The quaternions were introduced in 1843 by Hamilton, who later developed applications to analysis and physics. They are especially useful for describing arbitrary rotations of a sphere. But for us, the main application will be to number theory.

13.2 The Lipschitz quaternions

Definition 13.2.1. Let L be the set of Lipschitz quaternions; these are the quaternions a + bi + cj + dk with a, b, c, d ∈ Z.

Then L is closed under addition, subtraction and multiplication. It is easy to see why we might be interested in L: an integer n is the sum of four squares exactly when n = N(α) for some α ∈ L. The multiplicativity of the norm tells us that in order to prove Lagrange's theorem, it is enough to show that every prime number p is a norm from L.

A unit in L would be a nonzero element u ∈ L such that u⁻¹ is also in L. Since N(u)N(u⁻¹) = 1, we can deduce that the units in L are exactly those elements of norm 1, namely the eight elements ±1, ±i, ±j, ±k.

We might want to proceed as we have for Z[i], when we showed that every prime p ≡ 1 (mod 4) is the norm of a Gaussian integer. Thus our first instinct might be to see whether L has a division algorithm. That is, given α, β ∈ L with β ≠ 0, is it possible to write α = βq + r, where q, r ∈ L and N(r) < N(β)? This is the same as asking whether β⁻¹α ∈ H

can be approximated by an element q ∈ L which is close enough so that N(β⁻¹α − q) < 1.

We can try to think about this geometrically, although admittedly it is difficult to think in four dimensions! Let us imagine H as a four-dimensional space, with the distance between quaternions α and β given by the formula √N(α − β). The points of L determine a collection of four-dimensional “hypercubes”. The distance from one corner of such a hypercube to the opposite corner is √(1² + 1² + 1² + 1²) = 2. Thus the distance from the center of the hypercube to each corner is 1.

This is bad news for us. If β⁻¹α happens to land exactly in the center of a hypercube, then the nearest points q ∈ L (there are sixteen of them!) are a full distance 1 away. This won't quite work: we need the remainder N(r) to be strictly less than N(β), in order for the Euclidean algorithm to be guaranteed to halt.
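To see the worst case concretely: if β⁻¹α = (1 + i + j + k)/2, the center of a hypercube, then all sixteen nearest points of L sit at squared distance exactly 1, so no q ∈ L achieves N(β⁻¹α − q) < 1. A quick check (illustration only):

    from itertools import product
    from fractions import Fraction

    center = [Fraction(1, 2)] * 4              # the point (1 + i + j + k)/2
    corners = product((0, 1), repeat=4)        # its 16 nearest Lipschitz points
    print({sum((c - x) ** 2 for c, x in zip(center, q)) for q in corners})   # {Fraction(1, 1)}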

13.3 The Hurwitz quaternions

We were in a similar situation before: Z[√−3] formed a rectangular lattice in C where the distance from the center of a rectangle to the corners was 1. By adding in the centers of the rectangles, we arrived at an enlargement Z[ω], in which the division algorithm was saved. We can do something similar to L.

Definition 13.3.1. Let H be the set of Hurwitz quaternions: these are the quaternions of the form a + bi + cj + dk, where the a, b, c, d are either all integers, or all half of an odd integer.

Thus 1 + i and (3 − i + 5j − 9k)/2 lie in H, but 1 + 3i/2 does not.

Lemma 13.3.2. H is closed under addition, subtraction and multiplication. If α ∈ H, then N(α) is a nonnegative integer.

Proof. The claims about closure can be proved by inspection, though the details might be tedious. Here's a shortcut: every Hurwitz quaternion is either in L or else it equals α + ω, where α ∈ L and

ω = (−1 + i + j + k)/2.

Now ω + ω̄ = −1 and ω ω̄ = N(ω) = 1, so that ω² = ω(−1 − ω̄) = −ω − 1; that is, ω² + ω + 1 = 0. This implies that ω³ = 1. In this regard it is like the element ω belonging to the Eisenstein integers Z[ω]. Note the following interactions:

iω = ω − i − j,   ωi = ω − i − k,
jω = ω − j − k,   ωj = ω − i − j,
kω = ω − i − k,   ωk = ω − j − k.

These show that when ω is

multiplied on the right or left by any element of L, the result still belongs to H. Now one only has to check that if we have two Hurwitz quaternions of the form α, β + ω or α + ω, β + ω (with α, β ∈ L), then the sum and product of those elements belong to H again. This is now quite easy; for instance we have

(α + ω)(β + ω) = (αβ − 1) + αω + ωβ − ω,

which by our observations still belongs to H. For the final claim one just has to notice that if a, b, c, d are odd then a² + b² + c² + d² is divisible by 4.

The units in H are exactly those elements of norm 1. There are 24 of these: 8 are the units from L, and the other 16 are of the form (±1 ± i ± j ± k)/2.

Theorem 13.3.3. Given α, β ∈ H with β ≠ 0, there exist q, r ∈ H such that α = βq + r and N(r) < N(β).

Proof. Let x = β⁻¹α. Then the distance from x to the nearest element of H must be less than 1: it was already at distance at most 1 from the nearest element of L, and if the distance is exactly 1, then x lies in the center of its hypercube, which means it already lies in H. Thus there exists q ∈ H such that N(β⁻¹α − q) < 1. Multiply through by N(β) to obtain N(α − βq) < N(β).
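The proof of Theorem 13.3.3 translates directly into a division procedure: compute x = β⁻¹α exactly, then round to the nearest Hurwitz quaternion, which is either the nearest point of L or the nearest point with all coordinates half of an odd integer. The sketch below uses our own naming and representation (quaternions as coordinate 4-tuples); it is one way to realize the theorem, not notation from the notes.

    from fractions import Fraction

    def qmul(p, q):
        a, b, c, d = p
        e, f, g, h = q
        return (a*e - b*f - c*g - d*h, a*f + b*e + c*h - d*g,
                a*g - b*h + c*e + d*f, a*h + b*g - c*f + d*e)

    def conj(p):
        return (p[0], -p[1], -p[2], -p[3])

    def norm(p):
        return sum(x * x for x in p)

    def hurwitz_divmod(alpha, beta):
        """Return (q, r) with alpha = beta*q + r and N(r) < N(beta), as in Theorem 13.3.3."""
        nb = norm(beta)
        x = tuple(Fraction(c, nb) for c in qmul(conj(beta), alpha))   # beta^{-1} * alpha, exactly
        lip = tuple(round(c) for c in x)                              # nearest point of L
        half = tuple(Fraction(2 * round(c - Fraction(1, 2)) + 1, 2) for c in x)  # all half-odd
        q = min((lip, half), key=lambda t: norm(tuple(a - b for a, b in zip(x, t))))
        r = tuple(a - b for a, b in zip(alpha, qmul(beta, q)))
        return q, r

    alpha, beta = (7, 3, -2, 5), (1, 1, 1, 1)
    q, r = hurwitz_divmod(alpha, beta)
    print(q, r, norm(r) < norm(beta))          # remainder of norm 1 < N(beta) = 4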

We can now proceed to investigate the arithmetic of H as we have done before, but we have to be extra careful about the order of multiplication. For instance, it is ambiguous to write α|β: does this mean there exists γ with β = αγ, or β = γα? In the first case we say that α is a left divisor of β, and in the second case we say that α is a right divisor of β. However, if n is a rational (i.e., scalar) integer, then n|β is unambiguous: it means that there exists γ ∈ H with β = γn = nγ. The set of left divisors and the set of right divisors of a given quaternion might be different! However, note that units u are both left and right divisors of every element, since α = (αu⁻¹)u and α = u(u⁻¹α).

Theorem 13.3.4. Suppose α and β have no common left divisors other than units. Then there exist x, y ∈ H such that αx + βy = 1.

Proof. Consider the set S of all αx + βy, where x, y ∈ H. This set contains some nonzero elements (it contains α and β, and if these were both zero, the condition on common divisors would not be satisfied). So there is an

element d = αx + βy in S of least nonzero norm. Now apply the division algorithm: there exist q, r ∈ H with α = dq + r and N(r) < N(d). But r = α − dq = α(1 − xq) − βyq also lies in S. The minimality of N(d) shows that r must be 0, so that in fact d is a left divisor of α. Similarly, it is a left divisor of β, which implies that d is a unit. Now we can take αx + βy = d and multiply through on the right by d⁻¹ to get the desired equation.

Theorem 13.3.5. Let a be a rational integer, and let β, γ ∈ H. Suppose that a and β have no common left divisors other than units. Also suppose that a|γβ. Then a|γ.

Proof. By Theorem 13.3.4, there exist x, y ∈ H such that ax + βy = 1. Since a|γβ, we can write γβ = az, with z ∈ H. Multiplying Bezout's identity on the left by γ gives γax + γβy = γ, so that γ = a(γx + zy) is divisible by a (recall that the scalar a commutes with γ).

13.4 Hurwitz primes

Definition 13.4.1. An element π ∈ H is a Hurwitz prime if it is a non-unit which cannot be factored as αβ for non-units α, β.

Lemma 13.4.2. If π ∈ H and N(π) = p is a prime number, then π is a Hurwitz prime.

Proof. If π = αβ, then p = N(α)N(β). This implies that N(α) or N(β) must be 1, so that either α or β must be a unit.

Theorem 13.4.3. Let p be a rational prime. Then p = N(α) for some α ∈ H.

Proof. The trick is to use a result from Exercise 6 from Assignment #5. This said that the congruence x² + y² ≡ a (mod p) always has a solution, no matter the value of a. Therefore we can find x, y ∈ Z with x² + y² ≡ −1 (mod p), so that p | x² + y² + 1. But this last expression, being a sum of (at most) four squares, is a norm from L: it is N(α), where α = 1 + xi + yj. This means that p | α ᾱ. Note that p does not divide α, because α/p does not lie in H. (This argument is valid even if p = 2: the k-component of α/2 is 0, which is an integer, not a half-odd.)

Assume that p is a Hurwitz prime. Then up to right associates its only right divisors would be 1 and p. This means that p cannot share any non-unit right divisors with α; equivalently, taking conjugates, p and ᾱ share no non-unit left divisors. Now Theorem 13.3.5 applies (with a = p, γ = α, and the β of that theorem equal to ᾱ) to give p|α, which is (for the same reasoning as before) absurd. Thus our assumption is false, and p cannot be a Hurwitz prime. Thus we can write p = αβ for non-units α and β; taking norms gives p² = N(α)N(β), so that N(α) = p.
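The first step of this proof is easy to carry out in practice: for a given prime p one can find x and y with x² + y² ≡ −1 (mod p) by a direct search, and then α = 1 + xi + yj has norm divisible by p. A small sketch (brute force, so only sensible for modest p; the function name is ours):

    def solve_minus_one(p):
        """Find x, y with x^2 + y^2 + 1 divisible by p (exists for every prime p)."""
        squares = {x * x % p: x for x in range(p)}
        for y in range(p):
            t = (-1 - y * y) % p
            if t in squares:
                return squares[t], y
        raise ValueError("no solution found")

    for p in (2, 3, 7, 11, 101):
        x, y = solve_minus_one(p)
        n = 1 + x * x + y * y          # N(1 + x*i + y*j)
        assert n % p == 0
        print(f"p = {p}: x = {x}, y = {y}, N(alpha) = {n} = {p} * {n // p}")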

13.5 The end of the proof

We have only proved that every prime p is the norm of a Hurwitz integer α. This is the same as saying that either p is the sum of four squares, or else 4p is the sum of four odd squares. Not quite good enough! But the solution is not far off. If α does not belong to L, we can hope that there exists a unit u ∈ H such that αu ∈ L. Then p = N(αu) is the norm of a Lipschitz quaternion, which is what we want.

Lemma 13.5.1. Given α ∈ H, there exists a unit u ∈ H such that αu ∈ L.

Proof. Of course we might as well assume that α ∉ L. Then α = (a + bi + cj + dk)/2, where a, b, c, d are all odd. Now, each of a, b, c, d must be congruent to ±1 modulo 4. This means that for some choice of signs we have α = (±1 ± i ± j ± k)/2 + 2λ, where λ ∈ L. But u = (±1 ± i ± j ± k)/2 (with that same choice of signs) is a unit (it has norm 1), and αu⁻¹ = 1 + 2λu⁻¹. Finally, 2u⁻¹ ∈ 2H ⊂ L, so that 2λu⁻¹ ∈ L and therefore αu⁻¹ ∈ L; thus u⁻¹ is the unit required by the lemma.

Finally we can complete the proof of Theorem 13.0.1. By the identity in (13.1.1), it suffices to show that every prime p is the sum of four squares, or equivalently that it is a norm from L. By Theorem 13.4.3, p = N(α) for some α ∈ H. By the preceding lemma, αu ∈ L for some unit u, and then p = N(αu) is the sum of four squares.
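Theorem 13.0.1 can of course also be checked directly for any particular integer; the brute-force search below is offered only as an illustration of the statement, not as the procedure hidden in the proof (which would instead factor n into primes, write each prime as a norm from L, and multiply using the identity (13.1.1)).

    from math import isqrt

    def four_squares(n):
        """Return (a, b, c, d) with a^2 + b^2 + c^2 + d^2 = n, by exhaustive search."""
        for a in range(isqrt(n), -1, -1):
            for b in range(isqrt(n - a * a), -1, -1):
                for c in range(isqrt(n - a * a - b * b), -1, -1):
                    rest = n - a * a - b * b - c * c
                    d = isqrt(rest)
                    if d * d == rest:
                        return a, b, c, d
        return None  # never reached, by Theorem 13.0.1

    for n in (3, 6, 7, 11, 12, 15, 19, 2018):
        print(n, four_squares(n))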
