MATH 537 Class Notes

Ed Belk

Fall, 2014

1 Week One

1.1 Lecture One

Instructor: Greg Martin, Office Math 212
Text: Niven, Zuckerman & Montgomery

Conventions: N will denote the set of positive integers, and N0 the set of nonnegative integers. Unless otherwise stated, all variables are assumed to be elements of N.

§1.2 – Divisibility

Definition: Let a, b ∈ Z with a ≠ 0. Then a is said to divide b, denoted a|b, if there exists some c ∈ Z such that ac = b. If in addition a ∈ N, then a is called a divisor of b.

Properties of Divisibility: For all a, b, c ∈ Z with a ≠ 0, one has:
• If a|b then ±a|±b
• 1|b, b|b, a|0
• If a|b and b|a then a = ±b
• If a|b and a|c, then a|(bx + cy) for any x, y ∈ Z
If we assume that a and b are positive, we also have
• If a|b then a ≤ b

The Division Algorithm: Let a, b ∈ N. Then there exist unique nonnegative integers q and r such that:
1. b = aq + r, and
2. 0 ≤ r < a.
Proof: We prove existence first; consider the set

R = {b − an : n ∈ N0} ∩ N0.

By the well-ordering axiom, R has a least element r (note that R is nonempty, since b ∈ R), and we define q to be the nonnegative integer such that b − aq = r. Then b = aq + r and r ≥ 0; moreover, if r ≥ a then one has

0 ≤ r − a = (b − aq) − a = b − a(q + 1) < b − aq = r,

so that r − a ∈ R, contradicting the minimality of r ∈ R, and we are done.

Now, suppose q′ and r′ are such that we have

b = aq + r = aq′ + r′, with 0 ≤ r′ < a.

Without loss of generality we may assume that r ≥ r′. Then

r − r′ = (b − aq) − (b − aq′) = a(q′ − q) ⇒ a|(r − r′);

but 0 ≤ r − r′ ≤ r < a, and so the above equation is a contradiction unless r − r′ = 0, in which case q = q′ as well, and the result is immediate. □

Definition: Given any two integers a and b not both equal to zero, we define their greatest common divisor (commonly abbreviated gcd) to be the largest d ∈ N such that d|a and d|b; we write d = (a, b). Note that because a and b do not both vanish, they have only finitely many common divisors, so the gcd is always well-defined.

Theorem 1.1.1 Let a, b ∈ Z, not both equal to zero. Then:
1. (a, b) = min S, where S = ({ax + by : x, y ∈ Z} ∩ N), and
2. For any c ∈ Z such that c|a and c|b, we have c|(a, b).
The existence of integers x, y so that ax + by = (a, b) as in part (1) is known as Bézout's identity.
Proof: 1. Let m = min S, with u and v such that m = au + bv, and let g = (a, b); note that m ≤ |a| whenever a ≠ 0. Since g|a and g|b, we know from the properties of divisibility that g|m and so g ≤ m. Now, if m ∤ a then by the division algorithm we may write a = mq + r with 0 < r < m, and thus

r = a − mq = a − q(au + bv) = a(1 − qu) + b(−qv) ∈ S, and we deduce that r ≥ m = min S, a contradiction; thus m|a. In the same fashion we show m|b, and so by definition m ≤ (a, b) = g, and we are done.

2. If c|a and c|b, then we know c|(ax + by) for every x, y ∈ Z, and in particular for those u, v such that (a, b) = au + bv, whose existence is guaranteed by part 1. 

1.2 Lecture Two

Recall: Bézout's identity states that (a, b) is the smallest positive integer that may be written ax + by, where x, y ∈ Z.

Proposition 1.2.1 For a, b, m ∈ N, one has (ma, mb) = m(a, b).

Corollary 1: If d|a, d|b, then (a/d, b/d) = (1/d)(a, b); in particular, (a/(a, b), b/(a, b)) = 1.

Proof (of Proposition 1.2.1): Set g = (a, b), so that we may write ax + by = g, for some x, y ∈ Z. Then mg = (ma)x + (mb)y, thus mg ≥ (ma, mb) by Bézout's identity. Furthermore, g|a and so mg|ma; similarly mg|mb, thus mg ≤ (ma, mb), and we are done. □

Definition: Two integers a and b are called relatively prime (or coprime) if (a, b) = 1.

nb. We observe that (a, b) = 1 if and only if there exist x, y ∈ Z such that ax + by = 1. The corresponding statement with (a, b) = k > 1 is not, in general, true; however, it is the case that

ax + by = k ⇒ (a, b)|k.

Proposition 1.2.2 If (a, n) = (b, n) = 1, then (ab, n) = 1. Proof: Suppose we have u, v, x, y so that au + nv = bx + ny = 1; then we have

1 = 1 · 1 = (au + nv)(bx + ny) = ab(ux) + n(auy + bvx + nvy),

and the result is immediate. □

[Aside: Compare with the analogous result in commutative algebra. If R is a commutative, unital ring and I, J, K ⊂ R are ideals such that I + K = J + K = R, then IJ + K = R.]

Proposition 1.2.3 If a|c, b|c, and (a, b) = 1, then ab|c. (Note that this is not, in general, true for (a, b) > 1, e.g. a = b = c = 2.)
Proof: Choose m, n, x, y so that c = am = bn and ax + by = 1. Then

c = cax + cby = (bn)ax + (am)by = ab(nx + my),

and we deduce that ab|c. □

Theorem 1.2.4 (Theorem 1.10, Niven) If d|ab and (b, d) = 1, then d|a.
Proof: Exercise.

nb. If d|a, d|b, then d|(b + ax) for any x ∈ Z. Conversely, if d|a and d|(b + ax), then d|b, as b = (b + ax) − x(a). Hence a and b have the same common divisors as a and b + ax, and in particular (a, b) = (a, b + ax).

The Euclidean Algorithm: How can we find the gcd of two integers, for example 537 and 105? By the division algorithm, we have 537 = 5 · 105 + 12, and so by the above note we know (537, 105) = (105, 12). Repeating this process, we see

105 = 8 · 12 + 9 ⇒ (105, 12) = (12, 9);
12 = 1 · 9 + 3 ⇒ (12, 9) = (9, 3);
9 = 3 · 3 + 0 ⇒ (9, 3) = (3, 0) = 3.

Thus (537, 105) = 3.

Notation: The least common multiple of a and b is denoted lcm(a, b) or, more commonly, [a, b].
Exercise: Show that (a, b)[a, b] = ab.

§1.3 – Primes

Definition: A natural number n is called prime if it has exactly two divisors. n is called composite if there exists some d with 1 < d < n such that d|n. The integer n = 1 is neither prime nor composite.
Notation: Unless otherwise stated, p will denote a prime number.

Lemma 1.2.5 (Euclid's lemma) If p|ab, then p|a or p|b.

Proof: Suppose p ∤ b. Then (p, b) = 1, and so by theorem 1.2.4 we know that p|a. □

Theorem 1.2.6 (The Fundamental Theorem of Arithmetic) Every n ∈ N, n ≥ 2 may be written as a product of primes; moreover this expression is unique up to reordering of the factors.
Proof: (existence) We use strong induction. The case n = 2 is trivial from the definition of a prime, therefore suppose n > 2. If n is prime we have the trivial factorization n = n; otherwise we may write n = ab, with 1 < a < n and 1 < b < n. By the inductive hypothesis we may write a = p1p2 ··· pk, b = q1q2 ··· ql, with each pi, qj prime, and the result is immediate.
(uniqueness) Let n ∈ N and suppose we have

n = p1p2 ··· pk = q1q2 ··· ql, each pi, qj prime.

Since p1|q1q2 ··· ql we have by lemma 1.2.5 that p1|q1 or p1|q2 ··· ql. Repeating this process as many times as necessary, we find qt such that p1|qt, and by relabelling the qj if necessary we will assume t = 1. Since p1 ≠ 1 this implies that p1 = q1, as the only divisors of the prime q1 are 1 and q1. We then cancel p1 = q1 on both sides of the equation and we have p2p3 ··· pk = q2q3 ··· ql.

We apply the same argument to this expression to obtain p2 = q2, p3 = q3, and so on; it follows that k = l, and we are done. 

2 Week Two

2.1 Lecture Three

Doing a linear algebra problem backwards. Consider the augmented matrix

    [ 1  0 | 537 ]
    [ 0  1 | 105 ];

this system clearly has solution (x, y) = (537, 105). Moreover, from basic linear algebra we know that the application of elementary row operations to this augmented system will not change the solution; therefore, with R1, R2 respectively denoting the first and second row of the matrix, we observe that (x, y) = (537, 105) is also a solution to the augmented matrices

    [   1   −5 |  12 ]
    [   0    1 | 105 ]    (R1 → R1 − 5R2),

    [   1   −5 |  12 ]
    [  −8   41 |   9 ]    (R2 → R2 − 8R1),

    [   9  −46 |   3 ]
    [  −8   41 |   9 ]    (R1 → R1 − R2),

    [   9  −46 |   3 ]
    [ −35  179 |   0 ]    (R2 → R2 − 3R1).

Thus we have the matrix equation

    [   9  −46 ] [ 537 ]   [ 3 ]
    [ −35  179 ] [ 105 ] = [ 0 ].

The first entry of this equation indicates that 9(537) + (−46)(105) = 3 = (537, 105), while the entries in the second row of the matrix are −35 = −105/(537, 105) and 179 = 537/(537, 105). This operation is known as the extended Euclidean algorithm.
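The row reduction above can be sketched in a few lines of Python. This is only an illustrative sketch, not part of the lecture: the function name `extended_gcd` and the variable names are assumptions, and the inputs are assumed to be nonnegative integers, not both zero. Each triple (x, y, r) plays the role of one row of the augmented matrix and satisfies a·x + b·y = r throughout.

```python
def extended_gcd(a, b):
    """Return (g, x, y) with a*x + b*y == g == gcd(a, b).

    Hypothetical helper for illustration; assumes a, b >= 0, not both zero.
    """
    # Row 0 and row 1 of the augmented matrix: a*x + b*y = r holds for each.
    x0, y0, r0 = 1, 0, a
    x1, y1, r1 = 0, 1, b
    while r1 != 0:
        q = r0 // r1  # quotient from the division algorithm
        # Row operation: (old row 0) - q * (old row 1) becomes the new row 1.
        x0, y0, r0, x1, y1, r1 = x1, y1, r1, x0 - q * x1, y0 - q * y1, r0 - q * r1
    return r0, x0, y0


g, x, y = extended_gcd(537, 105)
print(g, x, y)  # 3 9 -46, matching 9(537) + (-46)(105) = 3
```

When the loop stops, the discarded row is (−35, 179, 0), the same second row as in the final matrix above.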

Lemma 2.1.1 Let a, b ∈ N and use the division algorithm to write b = aq + r with 0 ≤ r < a. Then a|b if and only if r = 0.
Proof: If r = 0 then b = aq and we are done. Conversely, if a|b then a|(b − ax) for every x ∈ Z, so in particular a|(b − aq) = r; since 0 ≤ r < a, we must have r = 0. □

Theorem 2.1.2 (Euclid's theorem) There are infinitely many prime numbers.

Proof: It suffices to show that every finite list of primes excludes at least one prime number. Let {p1, p2, . . . , pk} be a set of finitely many primes and let N = p1p2 ··· pk + 1. Then N ≥ 2 and so by the fundamental theorem of arithmetic N is the product of primes, so there exists some prime p such that p|N. Applying the division algorithm with N and any pj yields

N = p_j (p_1 ··· p_{j−1} p_{j+1} ··· p_k) + 1,

which (since 1 < p_j) by lemma 2.1.1 implies that p_j ∤ N for any j. Thus we deduce that p ≠ p_j for any j = 1, 2, . . . , k, and therefore that the set of primes {p1, p2, . . . , pk} is not exhaustive. □

§2.1 – Congruences

Definition: Let m ∈ Z, m ≠ 0. Given a, b ∈ Z, we say that a is congruent to b modulo m, written a ≡ b mod m, if m|(b − a). For example, we have

53 ≡ 7 mod 23, but 5 ≢ 37 mod 23.

Lemma 2.1.3 For fixed m ≠ 0, "congruence modulo m" is an equivalence relation.
Proof: Clearly a ≡ a mod m because m|0 = a − a, which proves reflexivity. Symmetry is an immediate consequence of the fact that m|(b − a) ⇔ m|(a − b), and to prove transitivity we observe that

a ≡ b mod m, b ≡ c mod m ⇒ m|(b − a), m|(c − b) ⇒ m|(c − b) + (b − a) = (c − a), and we are done.  Thus in particular, congruence modulo m (as every equivalence relation) partitions Z into equivalence classes, called residue classes modulo m. For example, one residue class modulo 23 is the set

{..., −39, −16, 7, 30, 53,...}.

In general, a residue class modulo m is of the form {a + km : k ∈ Z}. Note in particular that a ≡ b mod m if and only if a and b have the same remainder when divided by m.

Lemma 2.1.4 Suppose a ≡ b mod m, c ≡ d mod m. Then:
1. If e|m then a ≡ b mod e,
2. a + c ≡ b + d mod m,
3. ac ≡ bd mod m.
Proof: We prove only (3), as the others are clear from the definitions: since m|(b − a), m|(d − c), we must have that m divides c(b − a) + b(d − c) = bd − ac, and the result follows. □

The last two parts of lemma 2.1.4 imply further that a − c ≡ b − d mod m, and more generally, if f(X) ∈ Z[X], then f(a) ≡ f(b) mod m whenever a ≡ b mod m. In particular, we have that a^k ≡ b^k mod m for any k ∈ N.

Question: If j ≡ k mod m, do we have a^j ≡ a^k mod m? In general, no: some counterexamples include a = 2, m = 3 or a = 2, m = 4.

We have seen that the operations of addition, subtraction, and multiplication behave well with respect to congruence modulo m; does division? Again, in general the answer is no:

18 ≡ 28 mod 10, but 9 ≢ 14 mod 10, as we might expect if we were allowed to "divide by 2."

Theorem 2.1.5 (Theorem 2.3, Niven) We have ax ≡ ay mod m if and only if x ≡ y mod m/(a, m). In particular, if (a, m) = 1 then ax ≡ ay mod m ⇔ x ≡ y mod m.

Proof: Suppose ax ≡ ay mod m so that m|a(y − x); then we have m/(a, m) | (a/(a, m))(y − x), and since (m/(a, m), a/(a, m)) = 1 we know by theorem 1.2.4 that m/(a, m) | (y − x), hence x ≡ y mod m/(a, m). Now, suppose x ≡ y mod m/(a, m) so that m/(a, m) | (y − x). Then we certainly have a · m/(a, m) | a(y − x), hence (a/(a, m)) · m | a(y − x) and so in particular m|a(y − x), and we are done. □

Definition: Given m ∈ Z, m ≠ 0, a complete residue system modulo m is a set containing exactly one element from each residue class modulo m. For example, with m = 5 we may take any of the sets

{0, 1, 2, 3, 4}, {1, 2, 3, 4, 5}, {−2, −1, 0, 1, 2}, or {−17, 60, 101, 12, −111}.

A reduced residue system is a set of representatives from all residue classes relatively prime to m; continuing in the same example, we may take

{1, 2, 3, 4} or {537, −7, 1, 99999929}.

2.2 Lecture Four

Recall: A reduced residue system modulo m is a set consisting of exactly one element from each residue class modulo m whose elements are relatively prime to m; these are called reduced residue classes. Equivalently, we may take any complete residue system modulo m, and discard all elements d such that (d, m) > 1.

Example: If m = 10, a complete residue system is given by {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}; by discarding all elements not relatively prime to 10, we obtain the reduced residue system {1, 3, 7, 9}. If m is prime, a reduced residue system is given by {1, 2, . . . , m − 1}.

Definition: The Euler φ-function (or Euler totient function) is the function which assigns to m ∈ N the cardinality of a reduced residue system modulo m; that is,

φ(m) = #{1 ≤ i ≤ m :(i, m) = 1}.

For example, φ(10) = 4, and φ(p) = p − 1 for any prime p.
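The definition can be checked directly by brute force. The following is a minimal sketch, assuming m ≥ 1; the names `reduced_residue_system` and `phi` are illustrative and not taken from the notes. It simply discards from the complete residue system {1, ..., m} every element sharing a factor with m.

```python
from math import gcd


def reduced_residue_system(m):
    """List the representatives 1 <= i <= m with gcd(i, m) == 1."""
    return [i for i in range(1, m + 1) if gcd(i, m) == 1]


def phi(m):
    """Euler phi-function, computed straight from the definition."""
    return len(reduced_residue_system(m))


print(reduced_residue_system(10))  # [1, 3, 7, 9]
print(phi(10), phi(7))             # 4 6
```

This counting approach takes on the order of m gcd computations, so it is meant only for small examples.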

Lemma 2.2.1 Let {r1, r2, . . . , rφ(m)} be a reduced residue system modulo m and let a ∈ Z with (a, m) = 1. Then {ar1, ar2, . . . , arφ(m)} is also a reduced residue system modulo m. For example, with m = 10, a = 13, we see that {13, 39, 91, 117} = {13 · 1, 13 · 3, 13 · 7, 13 · 9} is a reduced residue system modulo 10.

Proof: By assumption a and each rj are relatively prime to m, and so each arj is also relatively prime to m. Moreover, if ari, arj lie in the same residue class, then one has

ari ≡ arj mod m.

By theorem 2.1.5, we may cancel a (which is relatively prime to the modulus) to yield the congruence

ri ≡ rj mod m, and hence (since we began with a reduced residue system) we know that i = j. Thus the φ(m) elements arj lie in distinct reduced residue classes, and the result is immediate. □

Theorem 2.2.2 (Euler's theorem) If (a, m) = 1, then a^φ(m) ≡ 1 mod m.

Proof: Let {r1, r2, . . . , rφ(m)} be a reduced residue system modulo m. Then by lemma 2.2.1, the elements ar1, ar2, . . . , arφ(m) are congruent (in some order) to the elements r1, r2, . . . , rφ(m), and therefore

r1r2 ··· rφ(m) ≡ (ar1)(ar2) ··· (arφ(m)) mod m
≡ a^φ(m) r1r2 ··· rφ(m) mod m.

Since (r1r2 ··· rφ(m), m) = 1, we may cancel it, and the result follows. □

Corollary 1: (Fermat's little theorem) If p is prime and p ∤ a, then a^(p−1) ≡ 1 mod p, and for all a ∈ Z one has a^p ≡ a mod p.

Corollary 2: Let (a, m) = 1. If e ≡ f mod φ(m), then a^e ≡ a^f mod m. For example, 537 ≡ 1 mod 4, and since 4 = φ(10) we have that 3^537 ≡ 3^1 mod 10.

Proof: Suppose without loss of generality that f ≥ e and write f = e + kφ(m). We have

a^f = a^(e+kφ(m)) = a^e (a^φ(m))^k ≡ a^e (1)^k mod m ≡ a^e mod m,

as claimed. □

Definition: Given a, m ∈ Z with m ≠ 0, we call x ∈ Z a (multiplicative) inverse of a modulo m if ax ≡ 1 mod m.

Theorem 2.2.3 (Theorem 2.9, Niven) If (a, m) > 1, then a has no inverse modulo m. If (a, m) = 1, then there exists a unique reduced residue class modulo m which contains all inverses of a. We denote any such inverse as ā or a^(−1).

Note that the notation a^(−1) is justified, as for example if we define a^(−k) to be (a^(−1))^k mod m, then we indeed have (a^k)^(−1) ≡ (a^(−1))^k mod m.

Proof: Let g = (a, m); note that if ax ≡ 1 mod m then ax ≡ 1 mod g, and since g|a this congruence becomes 0x ≡ 1 mod g, a contradiction unless g = 1. Thus with the assumption that g = 1, we first prove uniqueness: if ax ≡ 1 mod m and ay ≡ 1 mod m, then ax ≡ ay mod m, hence (since (a, m) = 1) x ≡ y mod m, as claimed. To show existence, we give two short proofs:
(1) By Euler's theorem, we have 1 ≡ a^φ(m) mod m ≡ a · a^(φ(m)−1) mod m, so we may take a^(−1) = a^(φ(m)−1).
(2) Since (a, m) = 1, there exist integers u, v such that au + mv = 1. Taking this equation modulo m yields the congruence au ≡ 1 mod m, and so we may take a^(−1) = u. □

2.3 Lecture Five

Calculating inverses: Suppose we want to calculate the (multiplicative) inverse of 9 modulo 20; note that this calculation is well-defined, as (9, 20) = 1. We perform the Euclidean algorithm:

20 = 9 · 2 + 2; 9 = 2 · 4 + 1

⇒ 1 = 9 − 2 · 4 = 9 − 4 · (20 − 2 · 9) = 9 · 9 − 4 · 20.

Taking this last equation modulo 20, we see that 9^2 ≡ 1 mod 20, so 9^(−1) ≡ 9 mod 20. The same equation also tells us that 20^(−1) ≡ −4 ≡ 5 mod 9. One clearly has

20^(−1) ≡ 1 mod 19, 19^(−1) ≡ −1 mod 20,

19^(−1) ≡ 1 mod 9, 9^(−1) ≡ −2 mod 19.
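The back-substitution used above can be mechanised. Below is a minimal sketch, assuming m ≥ 2; the function name `inverse_mod` is an illustrative choice. It runs the Euclidean algorithm on a and m while tracking only the coefficient of a, which is all that survives once the Bézout identity is read modulo m.

```python
def inverse_mod(a, m):
    """Return the inverse of a modulo m as a representative in 0..m-1.

    Illustrative helper; assumes m >= 2 and raises ValueError when
    gcd(a, m) > 1, in which case no inverse exists.
    """
    r0, r1 = a % m, m
    x0, x1 = 1, 0  # invariant: a*x0 ≡ r0 and a*x1 ≡ r1 (mod m)
    while r1 != 0:
        q = r0 // r1
        r0, r1 = r1, r0 - q * r1
        x0, x1 = x1, x0 - q * x1
    if r0 != 1:
        raise ValueError("no inverse: gcd(a, m) > 1")
    return x0 % m


print(inverse_mod(9, 20))   # 9
print(inverse_mod(20, 9))   # 5, i.e. -4 mod 9
print(inverse_mod(9, 19))   # 17, i.e. -2 mod 19
```

In Python 3.8 and later the built-in call `pow(a, -1, m)` computes the same inverse.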

Definition: A collection of integers m1, m2, . . . , mr is called pairwise coprime (or pairwise relatively prime) if (mi, mj) = 1 for all i ≠ j. Note that this is stronger than the statement that (m1, m2, . . . , mr) = 1. For example, (6, 10, 15) = 1, but (6, 10) = 2, (6, 15) = 3, (10, 15) = 5.

Theorem 2.3.1 (Theorem 2.18, Niven; the Chinese remainder theorem) Let m1, m2, . . . , mr be pairwise coprime, and let {a1, a2, . . . , ar} be any set of integers. Then there exists a solution x to the system of congruences

x ≡ a1 mod m1,
x ≡ a2 mod m2,
⋮
x ≡ ar mod mr,

and moreover the set of all solutions is exactly the residue class of x modulo M = m1m2 ··· mr.

Proof: For j = 1, 2, . . . , r, let Nj = m1m2 ··· mr/mj = M/mj, and note that (mj, Nj) = 1. Therefore we may define bj to be the inverse of Nj modulo mj, so Njbj ≡ 1 mod mj. Set

x0 = Σ_{j=1}^{r} Njbjaj;

we claim that x0 solves our system. Indeed, each Ni with i ≠ j is congruent to 0 modulo mj, and so x0 ≡ (Njbj)aj mod mj ≡ aj mod mj, as claimed. Now, if x ≡ x0 mod M, then in particular for each j we have x ≡ x0 mod mj ≡ aj mod mj, so x is also a solution. Finally, if y is any solution to our system, then y ≡ aj mod mj ≡ x0 mod mj for every j, so mj|(y − x0). Since the mj are pairwise coprime, we have (by propositions 1.2.2 and 1.2.3) m1m2|(y − x0), m1m2m3|(y − x0), and so on, until we obtain M|(y − x0), and we are done. □

Remark: If m1, m2, . . . , mr are not pairwise coprime, then there may be no solution, or there may be one residue class of solutions modulo [m1, m2, . . . , mr]. For example, the system

x ≡ 0 mod 6,

x ≡ 1 mod 4,

has no solution, while the system x ≡ 0 mod 6, x ≡ 2 mod 4 has as its solution the residue class of 6 modulo 12.

Example: Greg steals B boxes of 20 Timbits each. There are an equal number of each of the 9 flavours, and one extra to fill the last box. In class, he divides the Timbits equally among the 19 students, with 4 left over for himself. What is the smallest possible value of B?

Solution: Let t be the total number of Timbits; we have

t ≡ 0 mod 20, t ≡ 1 mod 9, t ≡ 4 mod 19.

Set m1 = 20, m2 = 9, m3 = 19; then

N1 = 171,N2 = 380,N3 = 180.

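We need b1 ≡ N1^(−1) mod m1 ≡ (9 · 19)^(−1) mod 20 ≡ (9)^(−1)(19)^(−1) mod 20 ≡ (9)(−1) mod 20 ≡ 11 mod 20, from our previous work. Similarly, b2 ≡ 5 mod 9, b3 ≡ −2 mod 19. Hence

x0 = N1b1a1 + N2b2a2 + N3b3a3 = (171)(11)(0) + (380)(5)(1) + (180)(−2)(4) = 460.

Every solution therefore satisfies t ≡ 460 mod 3420; the smallest positive one is t = 460, which corresponds to B = 460/20 = 23 boxes.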
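The construction in the proof of the Chinese remainder theorem translates almost line for line into code. The sketch below is illustrative only: the name `crt` is an assumption, the moduli are assumed pairwise coprime and at least 2, and the inverse is taken with the built-in `pow(N, -1, m)`, which requires Python 3.8 or later.

```python
from math import prod


def crt(residues, moduli):
    """Return (x0, M) with x0 ≡ residues[j] mod moduli[j] and 0 <= x0 < M.

    Illustrative helper; assumes the moduli are pairwise coprime and >= 2.
    """
    M = prod(moduli)
    x0 = 0
    for a_j, m_j in zip(residues, moduli):
        N_j = M // m_j             # product of the other moduli
        b_j = pow(N_j, -1, m_j)    # inverse of N_j modulo m_j
        x0 += N_j * b_j * a_j
    return x0 % M, M


t, M = crt([0, 1, 4], [20, 9, 19])
print(t, M)     # 460 3420
print(t // 20)  # 23 boxes in the Timbits example
```

Here `pow` returns the positive representative 17 of b3 rather than −2, so the intermediate sum differs from the hand computation by a multiple of 3420, but the reduced answer is the same.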

Structural comments: Let Zm = Z/mZ be the set of residue classes modulo m. If d|m, then there is a well-defined projection map πd : Zm → Zd given by

πd(a mod m) = a mod d.

Note that this map is not well-defined if d ∤ m. Now, let m1, m2, . . . , mr be pairwise coprime. We have a map

π : Z_{m1m2···mr} −→ Zm1 × Zm2 × ··· × Zmr,

given in each component Zmi by πmi. The Chinese remainder theorem gives a map

ρ : Zm1 × Zm2 × ··· × Zmr −→ Z_{m1m2···mr}

such that π ◦ ρ = id. Since each set is finite, we know that π and ρ are bijections. One can check that:
1. π and ρ respect coprimality, and
2. π and ρ respect multiplication and addition.
Hence, π and ρ are ring isomorphisms. In particular, if (Zm)^× is the set of reduced residue classes modulo m, then

π : (Z_{m1m2···mr})^× −→ (Zm1)^× × (Zm2)^× × ··· × (Zmr)^×

is an isomorphism of multiplicative groups. It follows from this, and the formula for the Euler φ-function, that φ(m1m2 ··· mr) = φ(m1)φ(m2) ··· φ(mr).

3 Week Three

3.1 Lecture Six

Suppose n ∈ N has prime factorization

n = p1^α1 p2^α2 ··· pr^αr,

with αi > 0 and pi ≠ pj for all i ≠ j. Then as discussed last time, we have maps

π : Z_{m1m2···mr} −→ Zm1 × Zm2 × ··· × Zmr,

ρ : Zm1 × Zm2 × ··· × Zmr −→ Z_{m1m2···mr},

where π = π_{p1^α1} × π_{p2^α2} × ··· × π_{pr^αr} (so that mi = pi^αi) and ρ is the map given by the Chinese remainder theorem. These maps are mutual inverses, and moreover are ring isomorphisms. In particular, these maps respect coprimality, and so their restrictions to their respective multiplicative groups of units yield mutually inverse isomorphisms

π̃ : (Z_{m1m2···mr})^× −→ (Zm1)^× × (Zm2)^× × ··· × (Zmr)^×,

ρ̃ : (Zm1)^× × (Zm2)^× × ··· × (Zmr)^× −→ (Z_{m1m2···mr})^×.

By definition, (Zn)^× has cardinality φ(n), and so it follows that

φ(m1m2 ··· mr) = φ(m1)φ(m2) ··· φ(mr).

Thus we are led to compute φ(p^α) for prime p; but since any 1 ≤ k ≤ p^α with (p^α, k) > 1 must have p|k, we deduce that exactly the multiples of p are not relatively prime to p^α, hence φ(p^α) = p^α − p^(α−1) = p^α(1 − 1/p). It follows that

φ(n) = n ∏_{p|n} (1 − 1/p),

with the product running over all prime divisors p of n.
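As a sanity check on the product formula, here is a short sketch that evaluates φ(n) from the prime divisors of n, found by trial division. The name `phi_product` is an illustrative assumption, and trial division is chosen only for simplicity; it is not an efficient factoring method.

```python
def phi_product(n):
    """Euler phi-function via phi(n) = n * prod(1 - 1/p) over primes p | n.

    Illustrative helper; assumes n >= 1 and factors n by trial division.
    """
    result = n
    remaining = n
    p = 2
    while p * p <= remaining:
        if remaining % p == 0:
            while remaining % p == 0:
                remaining //= p
            result -= result // p  # multiply result by (1 - 1/p), exactly
        p += 1
    if remaining > 1:              # one prime factor larger than sqrt is left
        result -= result // remaining
    return result


print(phi_product(10), phi_product(49), phi_product(360))  # 4 42 96
print(phi_product(4) * phi_product(5) == phi_product(20))  # True
```

The update `result -= result // p` stays in integer arithmetic because p divides `result` at the moment it is applied.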

Lemma 3.1.1 Fix m ∈ N, and consider the following statements:
1. x^2 ≡ 1 mod m
2. x^(−1) ≡ x mod m
3. x ≡ ±1 mod m
For any m, one has (1) if and only if (2), and (3) implies (1). If m is prime, then all three are equivalent.
Proof: The first statement is clear, as is the statement that (3) implies (1). Thus we will assume m is prime; then one has (1) if and only if m|x^2 − 1 = (x + 1)(x − 1). Thus by Euclid's lemma we have m|x + 1 or m|x − 1, that is, x ≡ ±1 mod m, and the result is immediate. □

We saw in the last lecture that 9^(−1) ≡ 9 mod 20, but clearly 9 ≢ ±1 mod 20. The same is true for 11 ≡ −9 mod 20.

Theorem 3.1.2 (Wilson's theorem) If p is prime, then (p − 1)! ≡ −1 mod p.
Proof: The cases p = 2, p = 3 are clear by computation. For p > 3, we pair off the numbers {2, 3, . . . , p − 2} as {a1, b1, a2, b2, . . . , ak, bk}, where k = (p − 3)/2 and aibi ≡ 1 mod p. We know that this is well-defined by lemma 3.1.1 (no element of this set is its own inverse), and the fact that inverses modulo p are unique. One then has

(p − 1)! = 1 · 2 ··· (p − 1) = 1 · (p − 1) · a1b1 ··· akbk ≡ 1 · (p − 1) · 1 · 1 ··· 1 mod p ≡ −1 mod p,

as claimed. □

§2.2 – Solutions of congruences

How many solutions has X^4 + 2X^3 + X + 1 ≡ 0 mod 5? As integers, we have solutions x ∈ {··· , −14, −13, −9, −8, −4, −3, 1, 2, 6, 7, 11, 12, ···}. As residue classes modulo 5, we have only x ≡ 1 mod 5 and x ≡ 2 mod 5; we say that our congruence has only 2 solutions modulo 5.

Definition: Given a polynomial f(X) ∈ Z[X], the number of solutions of f(X) ≡ 0 mod m, denoted σf (m), is the number of residue classes modulo m which satisfy the congruence; equivalently,

σf (m) = #{1 ≤ x ≤ m : f(x) ≡ 0 mod m}.
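For small moduli, σf(m) can be computed by direct enumeration. The sketch below is illustrative: the name `count_solutions` and the convention that `coeffs[i]` holds the coefficient of X^i are assumptions made for this example.

```python
def count_solutions(coeffs, m):
    """Count x in 1..m with f(x) ≡ 0 mod m, where f = sum(coeffs[i] * X**i).

    Illustrative helper; assumes m >= 1 and integer coefficients.
    """
    def f_mod(x):
        value = 0
        for c in reversed(coeffs):  # Horner's rule, reduced modulo m
            value = (value * x + c) % m
        return value

    return sum(1 for x in range(1, m + 1) if f_mod(x) == 0)


# X^4 + 2X^3 + X + 1 has coefficient list [1, 1, 0, 2, 1] (constant term first).
print(count_solutions([1, 1, 0, 2, 1], 5))  # 2
# X^2 - 1 modulo 20 has the four roots 1, 9, 11, 19.
print(count_solutions([-1, 0, 1], 20))      # 4
```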

Example: Let f(X) = X^2 − 1. We saw that σf(20) ≥ 4, while by lemma 3.1.1 we know that if p is an odd prime then σf(p) = 2, while σf(2) = 1.

We begin our investigation by studying linear congruences of the form ax ≡ b mod m.

Theorem 3.1.3 (Theorem 2.17, Niven) Let m ∈ N and set f(X) = aX − b, a, b ∈ Z. Set g = (a, m). Then σf(m) = 0 unless g|b, in which case σf(m) = g.
Proof: If ax ≡ b mod m, then ax ≡ b mod g, i.e. 0x ≡ b mod g, since g|a, and hence we must have g|b. Now, suppose g|b and write a = αg, b = βg, m = µg. Then ax ≡ b mod m ⇔ αx ≡ β mod µ, by theorem 2.1.5. But (α, µ) = 1 by construction, so α^(−1) modulo µ exists, and we have the unique solution modulo µ given by x ≡ α^(−1)β mod µ. This yields g = m/µ solutions modulo m, as claimed. □

Example: Let m = 100 and g = 5, so that µ = 20. Then x ≡ 14 mod 20 if and only if x ≡ 14, 34, 54, 74, or 94 modulo 100.
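The proof above is constructive, and a minimal sketch of it follows. The function name `solve_linear_congruence` is an illustrative assumption, as is the particular congruence 15x ≡ 10 mod 100, chosen here only because it has g = 5, µ = 20 and reduces to x ≡ 14 mod 20 as in the example. The inverse uses the built-in `pow(alpha, -1, mu)` from Python 3.8 or later.

```python
from math import gcd


def solve_linear_congruence(a, b, m):
    """Return all x in 0..m-1 with a*x ≡ b mod m (possibly an empty list).

    Illustrative helper; assumes m >= 1.
    """
    g = gcd(a, m)
    if b % g != 0:
        return []                       # no solutions unless g | b
    alpha, beta, mu = a // g, b // g, m // g
    if mu == 1:
        x0 = 0                          # every class is a solution modulo 1
    else:
        x0 = (pow(alpha, -1, mu) * beta) % mu
    return [x0 + k * mu for k in range(g)]  # the g lifts of x0 modulo m


print(solve_linear_congruence(15, 10, 100))  # [14, 34, 54, 74, 94]
print(solve_linear_congruence(2, 1, 4))      # []
```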

Let m have prime factorization m = p1^e1 p2^e2 ··· pr^er. By the Chinese remainder theorem, the congruence f(x) ≡ 0 mod m is equivalent to the system of congruences

f(x) ≡ 0 mod p1^e1,
f(x) ≡ 0 mod p2^e2,
⋮
f(x) ≡ 0 mod pr^er.

In particular, this implies that

σf(m) = ∏_{i=1}^{r} σf(pi^ei),

and thus it suffices to study polynomial congruences modulo prime powers; this will be the focus of our next lecture.

3.2 Lecture Seven

Exercise: Prove that the product of any k consecutive integers is a multiple of k!.
Solution: The pigeonhole principle implies that among any k consecutive integers there must be a multiple of 1, of 2, and so on up to k, but this is not quite enough, since these numbers need not be pairwise coprime. Instead, we may prove it one prime at a time, from which the general case follows. On the other hand, we may simply use the identity

j(j − 1) ··· (j − k + 1)/k! = j!/(k!(j − k)!) = C(j, k) ∈ Z,

where C(j, k) denotes the binomial coefficient "j choose k", from which the fact is apparent; granted, the last method is a deus ex machina.

§2.6 – Prime power moduli

Lemma 3.2.1 Let f(X) ∈ C[X] have degree d. Then for any a ∈ C, we have

f(a + h) = f(a) + h f′(a) + h^2 f″(a)/2! + ··· + h^d f^(d)(a)/d!.

Proof: Fix a; both expressions above are polynomials in h of degree at most d, and their zeroth derivatives (with respect to h) agree at h = 0, as do their first derivatives, second, and so on up to the dth derivatives. Thus their difference, which is a polynomial in h of degree at most d, is divisible by h^(d+1), which implies that they must, in fact, be equal. □

nb. Since the analytic notion of a derivative is not defined here, we instead will use the formal derivative of a polynomial or power series, i.e.

if f(X) = Σ_{n=0}^{m} a_n X^n, then f′(X) = Σ_{n=0}^{m} n a_n X^(n−1), m ∈ N0 ∪ {∞}.

Lemma 3.2.2 If f(X) ∈ Z[X], then for any a ∈ Z, k ∈ N, we have that f^(k)(a)/k! is an integer.
Proof: Write f(X) = Σ_{n=0}^{d} a_n X^n, a_n ∈ Z. Then

f^(k)(a)/k! = Σ_{n=k}^{d} [n(n − 1) ··· (n − k + 1)/k!] a_n a^(n−k),

and by the exercise we know that n(n − 1) ··· (n − k + 1)/k! ∈ Z. □

Theorem 3.2.3 (Hensel's lemma) Let f(X) ∈ Z[X] and let p^j be a prime power. Suppose there exists a ∈ Z so that f(a) ≡ 0 mod p^j and f′(a) ≢ 0 mod p. Then there exists a unique integer t, 0 ≤ t < p, such that f(a + tp^j) ≡ 0 mod p^(j+1).

Example: Take f(X) = X^2 − 2, a = 4, p^j = 7^1. Then

f(4) = 16 − 2 ≡ 0 mod 7, f′(4) = 2(4) ≢ 0 mod 7.

It follows that exactly one element of {4, 11, 18, 25, 32, 39, 46} is a root of f(X) modulo 7^2; it turns out to be 39.

Note that the residue class a modulo p^j is the union of the p residue classes a + tp^j modulo p^(j+1), 0 ≤ t < p. The one which is a root modulo p^(j+1) is called a lift of a.

Proof of Hensel's lemma: By lemma 3.2.1, with d = deg f, we may write

f(a + tp^j) = f(a) + tp^j f′(a) + (tp^j)^2 f″(a)/2! + ··· + (tp^j)^d f^(d)(a)/d!.

By lemma 3.2.2 each f^(k)(a)/k! is an integer, and p^(2j) is divisible by p^(j+1), so taking this expression modulo p^(j+1) yields

f(a + tp^j) ≡ f(a) + tp^j f′(a) mod p^(j+1).

Since f(a) ≡ 0 mod p^j, the right-hand side is congruent to 0 modulo p^(j+1) if and only if

f(a)/p^j ≡ −t f′(a) mod p.

Since f′(a) ≢ 0 mod p, we have that f′(a) is a unit modulo p, and so we find the unique class t to be given by

t ≡ −(f′(a))^(−1) · f(a)/p^j mod p,

as can be easily verified. □

Example: Using the same example from before, we calculate f(a)/p^j = 14/7 = 2, f′(a) = 8 ≡ 1 mod 7, so we ought to take t ≡ −(1)^(−1)(2) ≡ 5 mod 7, and indeed

f(4 + 5 · 7) = f(39) = 1519 = 31 · 7^2 ≡ 0 mod 7^2.
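One lifting step of Hensel's lemma can be sketched as follows. The helper names (`poly_eval`, `poly_derivative`, `hensel_lift`) and the coefficient-list convention (constant term first) are assumptions made for illustration; the inverse of f′(a) modulo p is taken with the built-in `pow(x, -1, p)`, available in Python 3.8 and later.

```python
def poly_eval(coeffs, x):
    """Evaluate sum(coeffs[i] * x**i) by Horner's rule."""
    value = 0
    for c in reversed(coeffs):
        value = value * x + c
    return value


def poly_derivative(coeffs):
    """Formal derivative: the coefficient list of sum(n * a_n * X**(n-1))."""
    return [n * c for n, c in enumerate(coeffs)][1:]


def hensel_lift(coeffs, a, p, j):
    """Lift a root a of f modulo p**j to a root modulo p**(j+1).

    Illustrative helper; requires f(a) ≡ 0 mod p**j and f'(a) not ≡ 0 mod p.
    """
    pj = p ** j
    fa = poly_eval(coeffs, a)
    dfa = poly_eval(poly_derivative(coeffs), a)
    if fa % pj != 0 or dfa % p == 0:
        raise ValueError("a must be a nonsingular root modulo p**j")
    t = (-pow(dfa, -1, p) * (fa // pj)) % p  # t ≡ -f'(a)^(-1) * f(a)/p^j mod p
    return a + t * pj


f = [-2, 0, 1]                   # X^2 - 2
a2 = hensel_lift(f, 4, 7, 1)
print(a2)                        # 39
a3 = hensel_lift(f, a2, 7, 2)
print(a3, (a3 * a3 - 2) % 343)   # 235 0
```

Applying the step repeatedly, as in the last two lines, produces a root modulo each successive power of 7.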

Corollary 1: Given f(X) ∈ Z[X], a prime p, and a ∈ Z with f(a) ≡ 0 mod p and f′(a) ≢ 0 mod p, then for every j ≥ 2 there exists a unique lift of a to a root of f modulo p^j; that is, a unique residue class aj mod p^j such that f(aj) ≡ 0 mod p^j and aj ≡ a mod p.

Proof: Exercise. (hint: use induction and Hensel’s lemma)

Remark: The aj of the corollary are given recursively by a1 = a and, for j ≥ 1,

a_(j+1) ≡ aj − f′(aj)^(−1) f(aj) mod p^(j+1),

where f′(aj)^(−1) denotes an inverse of f′(aj) modulo p.

nb. The condition f′(a) ≢ 0 mod p is the condition that a is a nonsingular root of f(X) modulo p. As written, this formula fails for singular roots: consider f(X) = X^2. Then a = 0 is a root modulo p, and every lift of a is a root of f modulo p^2. Similarly, for g(X) = X^2 − p, a = 0 is a root modulo p, but no lifts of a are roots modulo p^2. There is a more general version of Hensel's lemma (theorem 2.24 of Niven) which accommodates such roots.

Fact: There exist polynomials, such as

(X^2 − 2)(X^2 − 17)(X^2 − 34), or 3X^3 + 4Y^3 + 5Z^3,

which have roots modulo m for every m ∈ N, but have no roots over the rationals (in the case of the second polynomial, no roots other than the trivial one X = Y = Z = 0).

3.3 Lecture Eight

§2.7 – Prime modulus

Definition: Let f(X) = Σ a_j X^j, g(X) = Σ b_j X^j ∈ Z[X]. We will say that f(X) is congruent to g(X) modulo m, written f(X) ≡ g(X) mod m, if a_j ≡ b_j mod m for every j. In other words, f(X) ≡ g(X) mod m if and only if f(X) and g(X) have the same image in (Z[X])/(m) ≅ (Z/mZ)[X].

Example: Suppose f(X) = 15X^2 + 3X + 8 ∈ Z[X]. We note that deg f = 2 over Z, but deg f = 1 over Z5, and deg f = 0 over Z3.

Lemma 3.3.1 Let p be prime, a an integer, and f(X) ∈ Z[X]. If f(a) ≡ 0 mod p, then there exists g(X) ∈ Z[X] with deg g = deg f − 1 such that

f(X) ≡ (X − a)g(X) mod p.

Proof: We saw in our last lecture that (with d = deg f)

f(a + h) = f(a) + h f′(a) + h^2 f″(a)/2! + ··· + h^d f^(d)(a)/d!.

We set

g(X) = Σ_{j=1}^{d} (X − a)^(j−1) f^(j)(a)/j!,

and, substituting h = X − a, we have that

f(X) = f(a) + (X − a)g(X) ≡ (X − a)g(X) mod p.

Note that the leading coefficient of f(X) is f^(d)(a)/d! and that deg g = d − 1. □

Observe that the primality condition is necessary; indeed, if f(X) = X^2 − 1, then f has roots ±1, but modulo 24 (for instance) we may also factor f(X) ≡ (X − 5)(X + 5).

Theorem 3.3.2 (Theorem 2.26, Niven) Let f(X) ∈ Z[X], deg f = d modulo p, with p prime. Then f has at most d roots modulo p. Proof: We induct on deg f. For deg f = 0 the result is clear, so suppose deg f = d > 0. If f has no roots modulo p we are done; otherwise, write

f(X) ≡ (X − a)g(X) mod p,

where f(a) ≡ 0 mod p and deg g = d − 1, as guaranteed by lemma 3.3.1. Since p is prime, any root of f(X) modulo p is a root of X − a or g(X). By the inductive hypothesis, g has at most d − 1 roots modulo p, and X − a has a single root modulo p, from which we deduce the result. □

Example: Consider f(X) = X^p − X with p prime. By Fermat's little theorem, every residue class modulo p is a root of f, and by lemma 3.3.1 it follows that

f(X) ≡ X(X − 1)(X − 2) ··· (X − p + 1) mod p.

Comparing coefficients yields some interesting congruences, among which we have in the coefficient of X^(p−1)

0 + 1 + 2 + ··· + (p − 1) ≡ 0 mod p, p > 2,

in the coefficient of X^(p−2)

Σ_{0 ≤ j < k ≤ p − 1} jk ≡ 0 mod p, p > 3,

and in the coefficient of X

(p − 1)! ≡ −1 mod p.

Remark: This example implies that if f(X), g(X) ∈ Z[X] are such that f(a) ≡ g(a) mod p for every a ∈ Z, then f(X) − g(X) ≡ h(X)(X^p − X) mod p for some h(X) ∈ Z[X]. In fact, this condition is also sufficient.

Proposition 3.3.3 Let F (X) be any function (i.e. set map) from Zp to Zp. Then there exists a unique polynomial g(X) modulo p of degree at most p − 1 such that

F (a) ≡ g(a) mod p for every a ∈ Z.

Proof: We show uniqueness first. If g(X), h(X) both satisfy the condition, then from our remark above we have that

g(X) − h(X) ≡ q(X)(X^p − X) mod p, for some q(X) ∈ Z[X].

Comparing degrees (the left-hand side has degree at most p − 1), we see that we must have g ≡ h mod p. For existence, we give two proofs. First of all, if we set

g(X) = Σ_{a=0}^{p−1} (1 − (X − a)^(p−1)) F(a),

then by Fermat's little theorem every term with a ≢ a′ mod p vanishes at X = a′, and we see that

g(a′) ≡ (1 − 0)F(a′) mod p ≡ F(a′) mod p.

Alternatively, we observe that there are exactly p^p functions Zp → Zp, and there are exactly p^p polynomials over Zp of degree at most p − 1. No two of these polynomials give the same function, and it follows that the two sets must coincide. □

Corollary 1: (Corollary 2.30, Niven) Let p be prime and suppose that d|(p − 1). Then X^d − 1 has exactly d roots modulo p.
Proof: By theorem 3.3.2 there are at most d roots, so we need only show there are at least d roots. Note that X^(p−1) − 1 ≡ (X − 1)(X − 2) ··· (X − p + 1) mod p has exactly p − 1 roots modulo p. Since d|(p − 1), we have

X^(p−1) − 1 = (X^d − 1)(X^(p−1−d) + X^(p−1−2d) + ··· + X^(2d) + X^d + 1).

The second factor has at most p − 1 − d roots modulo p, and so by the pigeonhole principle X^d − 1 must have at least d roots modulo p, as claimed. □

§2.8 – Primitive roots and power residues

Consider the congruence X^n ≡ 1 mod m; note that any solution a must satisfy (a, m) = 1.

Definition: Given a with (a, m) = 1, the multiplicative order of a modulo m (often called simply the order of a) is the least positive integer k such that a^k ≡ 1 mod m. One sometimes says that a belongs to the exponent k modulo m.

Example: Let m = 11, a = 3. We have

3^1 ≡ 3 mod 11, 3^2 ≡ 9 mod 11, 3^3 ≡ 5 mod 11, 3^4 ≡ 4 mod 11, 3^5 ≡ 1 mod 11,

and we see that the order of 3 modulo 11 is 5.

Fact: The order of a modulo m always divides φ(m).
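The order can be found by computing successive powers, exactly as in the example. This is a minimal sketch under the assumptions m ≥ 2 and (a, m) = 1; the name `multiplicative_order` is illustrative.

```python
from math import gcd


def multiplicative_order(a, m):
    """Return the least k >= 1 with a**k ≡ 1 mod m.

    Illustrative helper; assumes m >= 2 and raises ValueError if gcd(a, m) > 1.
    """
    if m < 2 or gcd(a, m) != 1:
        raise ValueError("need m >= 2 and gcd(a, m) == 1")
    k, power = 1, a % m
    while power != 1:
        power = (power * a) % m  # power now holds a**(k+1) mod m
        k += 1
    return k


print(multiplicative_order(3, 11))                          # 5
print([multiplicative_order(a, 5) for a in (1, 2, 3, 4)])   # [1, 4, 4, 2]
```

The loop terminates because, by Euler's theorem, some power of a is congruent to 1 whenever (a, m) = 1.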

4 Week Four

4.1 Lecture Nine

Lemma 4.1.1 (Lemma 2.31, Niven) a^k ≡ 1 mod m if and only if the order of a modulo m divides k.
Proof: Let h be the order of a modulo m. If h|k, we have k = hq for some q, hence

a^k = a^(hq) = (a^h)^q ≡ 1^q mod m ≡ 1 mod m.

Conversely, if a^k ≡ 1 mod m, we may use the division algorithm to write k = hq + r, 0 ≤ r < h. One then has

1 ≡ a^k mod m ≡ (a^h)^q a^r mod m ≡ a^r mod m.

Since h is the minimal positive integer such that a^h ≡ 1 mod m, it follows that r = 0, and we are done. □

In particular, by Euler's theorem, if (a, m) = 1 then the order of a modulo m divides φ(m).

Lemma 4.1.2 (Lemma 2.33, Niven) If a has order h modulo m, then a^k has order h/(h, k) modulo m. For example, the order of a^2 modulo m is h/2 if h is even, and h if h is odd.
Proof: The following statements about positive integers j are equivalent:
1. (a^k)^j ≡ 1 mod m
2. h|(kj)
3. h/(h, k) | (k/(h, k)) j
4. h/(h, k) | j
It follows that the least positive j satisfying (4), and hence (1), is exactly j = h/(h, k). □

Remark: The subgroup of (Zm)^× generated by a is a cyclic group of order h. The same proof shows that the smallest positive integer y such that ky ≡ 0 mod h is y = h/(h, k).

Lemma 4.1.3 Let a have order r modulo m, and let b have order s modulo m. Then the order of ab modulo m divides rs/(r, s) = [r, s], and moreover is a multiple of rs/(r, s)^2 = [r, s]/(r, s). In particular (Lemma 2.34, Niven), if (r, s) = 1, then the order of ab modulo m is exactly rs.
Proof: Let t be the order of ab modulo m. Then

$$(ab)^{rs/(r,s)} = (a^r)^{s/(r,s)} (b^s)^{r/(r,s)} \equiv (1)(1) \equiv 1 \pmod m,$$

and it follows that $t \mid \frac{rs}{(r,s)}$. We also have

$$a^{st} \equiv a^{st}(b^s)^t \equiv ((ab)^t)^s \equiv 1 \pmod m,$$

hence $r \mid st$, so $\frac{r}{(r,s)} \mid \frac{s}{(r,s)}\, t \Rightarrow \frac{r}{(r,s)} \mid t$. By a symmetric argument we may show that $\frac{s}{(r,s)} \mid t$, and since $\left(\frac{r}{(r,s)}, \frac{s}{(r,s)}\right) = 1$ it follows that $\frac{rs}{(r,s)^2} \mid t$. □

Definition: An integer $a$ is called a primitive root modulo $m$ if it has order $\varphi(m)$ modulo $m$. In this case, $\mathbb{Z}_m^\times$ is the cyclic group of order $\varphi(m)$.

Proposition 4.1.4 If $m$ has a primitive root, then it has exactly $\varphi(\varphi(m))$ primitive roots.

Proof: Let $g$ be a primitive root modulo $m$. Then we have a reduced residue system modulo $m$ given by $\{g, g^2, \ldots, g^{\varphi(m)}\}$. By lemma 4.1.2, the order of $g^j$ modulo $m$ is exactly $\frac{\varphi(m)}{(j, \varphi(m))}$, which equals $\varphi(m)$ exactly when $(j, \varphi(m)) = 1$. There are exactly $\varphi(\varphi(m))$ such residue classes, and we are done. □

Lemma 4.1.5 (Lemma 2.35, Niven) Let $p, q$ be primes and let $r \in \mathbb{N}$ be such that $q^r \mid (p - 1)$. Then there are $q^r - q^{r-1}$ residue classes of order $q^r$ modulo $p$.

Proof: The order of $a$ modulo $p$ divides $q^r$ if and only if $a^{q^r} \equiv 1 \bmod p$. This congruence has exactly $q^r$ solutions by corollary 1 of proposition 3.3.3. The order of $a$ modulo $p$ divides $q^{r-1}$ if and only if $a^{q^{r-1}} \equiv 1 \bmod p$, which has exactly $q^{r-1}$ solutions. The result is now immediate. □

Theorem 4.1.6 (Theorem 2.36, Niven) Every prime $p$ has a primitive root.

Proof: If $p = 2$ the result is immediate, so assume $p$ is odd and write $p - 1$ in its prime factorization

$$p - 1 = q_1^{r_1} q_2^{r_2} \cdots q_k^{r_k}.$$

For each $1 \le j \le k$, let $a_j$ be some integer of order $q_j^{r_j}$ modulo $p$, whose existence is guaranteed by lemma 4.1.5. Since $(q_i^{r_i}, q_j^{r_j}) = 1$ for all $i \ne j$, we have by lemma 2.34 of Niven that $a_1 a_2$ has order $q_1^{r_1} q_2^{r_2}$ modulo $p$, that $a_1 a_2 a_3$ has order $q_1^{r_1} q_2^{r_2} q_3^{r_3}$ modulo $p$, and continuing in this fashion, we eventually see that $a_1 a_2 \cdots a_k$ has order $p - 1$ modulo $p$, as claimed. □

4.2 Lecture Ten

Example: Modulo 5, the reduced residue classes are 1, 2, 3, and 4, with respective orders 1, 4, 4, and 2; we see that 2 and 3 are the φ(φ(5)) primitive roots modulo 5. What are the primitive roots modulo 25? Exactly

{2, 3, 8, 12, 13, 17, 22, 23}.
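The list above can be reproduced by brute force. This is a small sketch under the definitions of the previous lecture; the helper names `order`, `phi` and `primitive_roots` are hypothetical and chosen only for this illustration.

```python
from math import gcd


def order(a: int, m: int) -> int:
    """Multiplicative order of a modulo m (a must be coprime to m)."""
    k, power = 1, a % m
    while power != 1 % m:
        power = (power * a) % m
        k += 1
    return k


def phi(m: int) -> int:
    """Euler's totient function by direct counting."""
    return sum(1 for x in range(1, m + 1) if gcd(x, m) == 1)


def primitive_roots(m: int) -> list[int]:
    """All residues in 1..m-1 whose order modulo m equals phi(m)."""
    target = phi(m)
    return [a for a in range(1, m) if gcd(a, m) == 1 and order(a, m) == target]


print(primitive_roots(5))
print(primitive_roots(25))
print(len(primitive_roots(25)) == phi(phi(25)))
```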

Note that there are $8 = \varphi(\varphi(25))$ of them, and that all are also primitive roots modulo 5. In fact, we may lift any primitive root modulo $p$ to $p - 1$ primitive roots modulo $p^2$, and for $j \ge 2$, any primitive root modulo $p^j$ lifts to exactly $p$ primitive roots modulo $p^{j+1}$.

Proposition 4.2.1 For $n \ge 1$, we have

$$\sum_{d \mid n} \varphi(d) = n.$$

Proof: The fractions $\left\{\frac{1}{n}, \frac{2}{n}, \ldots, \frac{n}{n}\right\}$ are not all in lowest terms; when we reduce them, we may consider their denominators. For every divisor $d$ of $n$, exactly $\varphi(d)$ of these fractions have denominator $d$; indeed, these fractions are exactly

$$\left\{ \frac{k(n/d)}{n} : 1 \le k \le d,\ (k, d) = 1 \right\}.$$

Since there are exactly $n$ fractions in our original set, the result follows. □

Alternative proof of the existence of primitive roots modulo $p$: We use strong induction to find the number of elements of order $k$ modulo $p$, namely $\varphi(k)$ if $k \mid (p - 1)$, and 0 if $k \nmid (p - 1)$. The case $k = 1$ is trivial. For $k > 1$, $k \mid (p - 1)$, we first note that

$$\varphi(k) + \sum_{d \mid k,\ d < k} \varphi(d) = \sum_{d \mid k} \varphi(d) = k.$$

Since $p$ is prime, there are exactly $k$ solutions to the congruence $x^k \equiv 1 \bmod p$, which are exactly those $x$ modulo $p$ with order dividing $k$. This, again, is exactly the sum

$$\#\{x : \mathrm{ord}_p(x) = k\} + \sum_{d \mid k,\ d < k} \#\{x : \mathrm{ord}_p(x) = d\},$$

$$g^{p^{r-2}(p-1)} \not\equiv 1 \pmod{p^r}.$$

Moreover, the converse holds if $g$ is a primitive root modulo $p^{r-1}$.

Proof: If $g$ is a primitive root modulo $p^r$, then

$$\mathrm{ord}_{p^r}(g) = \varphi(p^r) = p^{r-1}(p - 1) > p^{r-2}(p - 1),$$

from which it follows that

$$g^{p^{r-2}(p-1)} \not\equiv 1 \pmod{p^r}.$$

Now, suppose that $g$ is a primitive root modulo $p^{r-1}$ and that

$$g^{p^{r-2}(p-1)} \not\equiv 1 \pmod{p^r}.$$

The order of $g$ modulo $p^r$ divides $\varphi(p^r) = p^{r-1}(p - 1)$, and by lemma 4.2.2 must be a multiple of $p^{r-2}(p - 1)$. Since $\mathrm{ord}_{p^r}(g) \ne p^{r-2}(p - 1)$ by assumption, we deduce the result. □

Theorem 4.2.4 Primitive roots exist modulo $p^2$ for any prime $p$.

Proof: Let $g$ be a primitive root modulo $p$ and consider the lifts $g + tp$ modulo $p^2$, $0 \le t \le p - 1$. We claim that all but one of these lifts are primitive roots modulo $p^2$. Indeed, by proposition 4.2.3 it suffices to show that exactly one lift satisfies

$$(g + tp)^{p-1} \equiv 1 \pmod{p^2}.$$

Let $f(X) = X^{p-1} - 1$. Then $g$ is a root of $f(X)$ modulo $p$, and

$$f'(g) = (p - 1) g^{p-2} \not\equiv 0 \pmod p.$$

Thus $g$ is a nonsingular root of $f$ modulo $p$, and so by Hensel's lemma exactly one lift of $g$ is a root of $f$ modulo $p^2$; every other such lift must then yield a primitive root. □

Lemma 4.2.5 If $g$ is a primitive root modulo $p^2$, then it is also a primitive root modulo $p$.

Proof: If $a^k \equiv 1 \bmod p$, then

$$a^{pk} - 1 = (a^k - 1)\left((a^k)^{p-1} + (a^k)^{p-2} + \cdots + a^k + 1\right).$$

Both factors are multiples of $p$, so it follows that $a^{pk} \equiv 1 \bmod p^2$. In particular, if $g$ is a primitive root modulo $p^2$, then $g^{pk} \not\equiv 1 \bmod p^2$ for $k = 1, 2, \ldots, p - 2$. Hence $g^k \not\equiv 1 \bmod p$ for $1 \le k \le p - 2$, and it follows that the order of $g$ modulo $p$ is $p - 1$. □

Next, we will consider primitive roots modulo $p^r$ for $r \ge 3$. No more degenerate cases arise here, except when $p = 2$. In this case, there are no primitive roots modulo $2^r$ for any $r \ge 3$.
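The lifting argument of theorem 4.2.4 is easy to observe numerically: among the $p$ lifts $g + tp$ of a primitive root $g$ modulo $p$, exactly one satisfies $(g + tp)^{p-1} \equiv 1 \bmod p^2$, and the rest are primitive roots modulo $p^2$. The sketch below uses hypothetical helper names and small primes only.

```python
def order(a: int, m: int) -> int:
    """Multiplicative order of a modulo m (a must be coprime to m)."""
    k, power = 1, a % m
    while power != 1:
        power = (power * a) % m
        k += 1
    return k


def non_primitive_lifts(g: int, p: int) -> list[int]:
    """Lifts g + t*p (0 <= t < p) whose (p-1)-th power is 1 modulo p**2."""
    return [g + t * p for t in range(p) if pow(g + t * p, p - 1, p * p) == 1]


def primitive_lifts(g: int, p: int) -> list[int]:
    """Lifts g + t*p (0 <= t < p) that have order p*(p-1) modulo p**2."""
    return [g + t * p for t in range(p) if order(g + t * p, p * p) == p * (p - 1)]


# g = 2 is a primitive root modulo 5; its lifts modulo 25 are 2, 7, 12, 17, 22.
print(non_primitive_lifts(2, 5))
print(primitive_lifts(2, 5))
```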

4.3 Lecture Eleven

Theorem 4.3.1 Let $p$ be an odd prime and let $r \ge 2$. Then any primitive root modulo $p^2$ is a primitive root modulo $p^r$.

Proof: We induct on $r$. The case $r = 2$ is trivial, so for $r \ge 2$ assume $g$ is a primitive root modulo $p^r$; we will show that $g$ is a primitive root modulo $p^{r+1}$. Indeed, by proposition 4.2.3 we have that

$$g^{p^{r-2}(p-1)} \not\equiv 1 \pmod{p^r},$$

and so by the same proposition it suffices to show that $g^{p^{r-1}(p-1)} \not\equiv 1 \bmod p^{r+1}$. By Euler's theorem we have that $g^{p^{r-2}(p-1)} \equiv 1 \bmod p^{r-1}$, so we can write $g^{p^{r-2}(p-1)} = 1 + np^{r-1}$ for some $n \not\equiv 0 \bmod p$. By the binomial theorem we have that

$$g^{p^{r-1}(p-1)} = (1 + np^{r-1})^p = \sum_{k=0}^{p} \binom{p}{k} (np^{r-1})^k,$$

and since $p \mid \binom{p}{k}$ for $2 \le k \le p - 1$, we see that $p^{r+1} \mid \binom{p}{k} (np^{r-1})^k$. In fact we also have this divisibility when $k = p$, and so

$$g^{p^{r-1}(p-1)} \equiv 1 + np^r \not\equiv 1 \pmod{p^{r+1}},$$

and we are done. □

nb. We only use the fact that $p$ is odd in the cancellation of $\binom{p}{2} n^2 p^{2r-2}$.

Lemma 4.3.2 If $r \ge 3$, then the order of every odd integer modulo $2^r$ divides $2^{r-2} = \frac{1}{2}\varphi(2^r)$. In particular, there are no primitive roots modulo $2^r$.

Proof: Again we induct on $r$. We did the case $r = 3$ in the last lecture, and so assuming the claim is true for some $r$ with $r \ge 3$, we have $a^{2^{r-2}} \equiv 1 \bmod 2^r$ for every odd $a$. Then $2^r \mid (a^{2^{r-2}} - 1)$ and $2 \mid (a^{2^{r-2}} + 1)$ by parity, hence

$$2^{r+1} \mid (a^{2^{r-2}} - 1)(a^{2^{r-2}} + 1) = a^{2^{r-1}} - 1,$$

whence $a^{2^{r-1}} \equiv 1 \bmod 2^{r+1}$, as claimed. □

nb. The same proof shows that if $a \equiv 5 \bmod 8$, then $2^{\alpha+2} \,\|\, (a^{2^\alpha} - 1)$, where $p^k \,\|\, n$ if and only if $p^k \mid n$ and $p^{k+1} \nmid n$.

Theorem 4.3.3 (Theorem 2.43, Niven) Let $r \ge 3$; then the set $\{\pm 5, \pm 5^2, \ldots, \pm 5^{2^{r-2}}\}$ is a reduced residue system modulo $2^r$. In particular, 5 has order $2^{r-2}$ modulo $2^r$, and the abelian group homomorphism

$$f : \mathbb{Z}_{2^{r-2}} \times \mathbb{Z}_2 \longrightarrow \mathbb{Z}_{2^r}^\times \quad \text{given by} \quad f(x, y) = 5^x (-1)^y$$

is an isomorphism.

By way of comparison, note that if $p$ is odd, the map

$$f : \mathbb{Z}_{p^{r-1}(p-1)} \longrightarrow \mathbb{Z}_{p^r}^\times \quad \text{given by} \quad f(x) = g^x$$

is an isomorphism for any primitive root $g$ modulo $p^r$.

Proof: The order of 5 modulo $2^r$ divides $2^{r-2}$ by lemma 4.3.2, and so if $2^{r-2}$ is not the order, then the order divides $2^{r-3}$, hence $5^{2^{r-3}} \equiv 1 \bmod 2^r$. But then $2^r \mid 5^{2^{r-3}} - 1$, contradicting our previous remark with $\alpha = r - 3$, which gives $2^{r-1} \,\|\, (5^{2^{r-3}} - 1)$. Thus 5 has order $2^{r-2}$ modulo $2^r$, and so the residue classes

$$\{5, 5^2, \ldots, 5^{2^{r-2}}\}$$

are distinct modulo $2^r$, as are the residue classes

$$\{-5, -5^2, \ldots, -5^{2^{r-2}}\}.$$

Finally, $5^k \equiv 1 \bmod 4$, while $-5^k \equiv 3 \bmod 4$, so the two sets above are disjoint, and we are done. □
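Theorem 4.3.3 can be sanity-checked for small $r$ by listing the residues $\pm 5^k$ modulo $2^r$ and comparing them with the odd residues. This is only a numerical illustration; the function name `signed_powers_of_five` is an assumption made for the example.

```python
def signed_powers_of_five(r: int) -> list[int]:
    """Sorted residues of +5**k and -5**k modulo 2**r for 1 <= k <= 2**(r-2)."""
    m = 2 ** r
    residues = set()
    for k in range(1, 2 ** (r - 2) + 1):
        power = pow(5, k, m)
        residues.add(power)          # the class of 5**k
        residues.add((-power) % m)   # the class of -5**k
    return sorted(residues)


# For r = 4 the modulus is 16 and the odd residues are 1, 3, ..., 15.
print(signed_powers_of_five(4))
print(signed_powers_of_five(4) == list(range(1, 16, 2)))
```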

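We now know the group structure of $\mathbb{Z}_n^\times$ for every $n$. If $n$ has prime factorization $n = p_1^{e_1} p_2^{e_2} \cdots p_r^{e_r}$, then by the Chinese remainder theorem

$$\mathbb{Z}_n^\times \cong \mathbb{Z}_{p_1^{e_1}}^\times \times \mathbb{Z}_{p_2^{e_2}}^\times \times \cdots \times \mathbb{Z}_{p_r^{e_r}}^\times.$$

If $p_i$ is odd, then

$$\mathbb{Z}_{p_i^{e_i}}^\times \cong \mathbb{Z}_{p_i^{e_i - 1}(p_i - 1)},$$

and for $p = 2$ we have

$$\mathbb{Z}_{2^r}^\times \cong \begin{cases} \mathbb{Z}_1 & \text{if } r = 1, \\ \mathbb{Z}_2 & \text{if } r = 2, \text{ and} \\ \mathbb{Z}_{2^{r-2}} \times \mathbb{Z}_2 & \text{if } r \ge 3. \end{cases}$$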

Primitive roots modulo non-prime powers

Note that $\varphi(n)$ is even for every $n \ge 3$. If we can write $n = cd$ with $(c, d) = 1$ and $c, d \ge 3$, then the order of any $a$ modulo $n$ must divide $\frac{1}{2}\varphi(n) = \frac{1}{2}\varphi(c)\varphi(d)$, as we have

$$a^{\varphi(n)/2} = (a^{\varphi(c)})^{\varphi(d)/2} \equiv 1^{\varphi(d)/2} \equiv 1 \pmod c,$$

and similarly

$$a^{\varphi(n)/2} = (a^{\varphi(d)})^{\varphi(c)/2} \equiv 1^{\varphi(c)/2} \equiv 1 \pmod d,$$

since by our assumption $2 \mid \varphi(c)$, $2 \mid \varphi(d)$. Our claim then follows by the Chinese remainder theorem. The only integers $n$ which do not have such a factorization are powers of 2, or are of the form $n = p^r$ or $n = 2p^r$, where $p$ is an odd prime and $r \ge 1$. Numbers of this form are the only ones which could possibly have primitive roots.

Theorem 4.3.4 (Theorem 2.41, Niven) The moduli that have primitive roots are exactly 1, 2, 4, $p^r$, and $2p^r$, where $p$ is an odd prime and $r \ge 1$.

Proof: Next lecture.
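The statement of theorem 4.3.4 can be compared against a brute-force search over small moduli. The sketch below is illustrative only; `has_primitive_root` and its helpers are hypothetical names, and the search starts at $m = 2$ to avoid the trivial modulus.

```python
from math import gcd


def order(a: int, m: int) -> int:
    """Multiplicative order of a modulo m (a must be coprime to m, m >= 2)."""
    k, power = 1, a % m
    while power != 1:
        power = (power * a) % m
        k += 1
    return k


def has_primitive_root(m: int) -> bool:
    """True if some unit modulo m has order phi(m)."""
    units = [a for a in range(1, m) if gcd(a, m) == 1]
    return any(order(a, m) == len(units) for a in units)


# Moduli between 2 and 20 that have a primitive root.
print([m for m in range(2, 21) if has_primitive_root(m)])
```

The moduli missing from the printed list (8, 12, 15, 16, 20) are exactly those not of the form 2, 4, $p^r$ or $2p^r$ in this range.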

5 Week Five

5.1 Lecture Twelve

Fun fact! If $S(x)$ denotes the set of squarefree numbers $s$ with $s \le x$, then one has

$$\lim_{x \to \infty} \frac{\#S(x)}{x} = \frac{6}{\pi^2}.$$
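The density of squarefree numbers can be observed numerically. The following sketch counts squarefree integers by trial division, which is adequate for small $x$; the function names are assumptions made for this illustration.

```python
from math import pi


def is_squarefree(n: int) -> bool:
    """True if no square d*d with d >= 2 divides n."""
    d = 2
    while d * d <= n:
        if n % (d * d) == 0:
            return False
        d += 1
    return True


def squarefree_count(x: int) -> int:
    """The number of squarefree integers s with 1 <= s <= x."""
    return sum(1 for s in range(1, x + 1) if is_squarefree(s))


x = 10_000
print(squarefree_count(x) / x)   # observed density up to x
print(6 / pi ** 2)               # limiting density
```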

Recall theorem 4.3.4 from last lecture, and let PR denote the set of moduli which have primitive roots. For example, modulo 18, we have $\varphi(18) = 6$, and indeed a reduced residue system is given by $\{1, 5, 7, 11, 13, 17\}$, which have respective orders 1, 6, 3, 6, 3, and 2. Thus 5 and 11 are primitive roots modulo 18, and as expected we find there are $2 = \varphi(\varphi(18))$ of them. Similarly, modulo 9 a reduced residue system is given by $\{1, 2, 4, 5, 7, 8\}$ with respective orders 1, 6, 3, 6, 3, and 2 (note the similarity with $\mathbb{Z}_{18}^\times$), and we have the same result with the primitive roots 2 and 5.

Proof: (of theorem 4.3.4) We need only check that $m = 2p^r$ has primitive roots, the other claims having already been proven. If $\{a_1, a_2, \ldots, a_{\varphi(p^r)}\}$ is a reduced residue system modulo $p^r$, then we claim that

$$\{a_j : 2 \nmid a_j\} \cup \{a_j + p^r : 2 \mid a_j\}$$

is a reduced residue system modulo $2p^r$. Indeed, we see that we have exactly $\varphi(2p^r) = \varphi(2)\varphi(p^r) = \varphi(p^r)$ residue classes, that all are distinct, and since $(a_j, p) = 1$ we have $u, v$ so that $a_j u + pv = 1$; thus writing $x = u$ and $y = v - p^{r-1}u$, we have

$$1 = a_j x + p(y + p^{r-1}x) = (a_j + p^r)x + py \Rightarrow (a_j + p^r, p) = 1,$$

and hence (since $p$ is assumed odd) $a_j + p^r$ is indeed a unit modulo $2p^r$, by the Chinese remainder theorem. Furthermore, the orders of the elements of the latter set (the lifts of the even $a_j$) do not change, as for $0 < k < \mathrm{ord}_{p^r}(a_j)$ we have

$$(a_j + p^r)^k = \sum_{n=0}^{k} \binom{k}{n} a_j^n p^{r(k-n)} \equiv a_j^k \pmod{p^r},$$

which is not congruent to 1 by assumption, thus $(a_j + p^r)^k \not\equiv 1 \bmod 2p^r$. The same argument holds for the odd $a_j$, and we see that one of the elements in our reduced residue system must have order $\varphi(p^r) = \varphi(2p^r)$, which completes the proof. □

Remark: When $m$ is odd, we have an isomorphism of groups $\pi : \mathbb{Z}_m^\times \xrightarrow{\ \sim\ } \mathbb{Z}_{2m}^\times$.

Corollary 1: (Corollary 2.42, Niven) Let $m \in$ PR and let $(a, m) = 1$. The congruence $x^n \equiv a \bmod m$ has $d$ solutions if $a^{\varphi(m)/d} \equiv 1 \bmod m$ where $d = (n, \varphi(m))$, and zero solutions otherwise.

Remark: The analogue for $m = 2^r$, $r \ge 3$, is corollary 2.44 in Niven.

Proof: Let $g$ be a primitive root modulo $m$. Choose $j$, $1 \le j \le \varphi(m)$, so that $g^j \equiv a \bmod m$, and note that if $x^n \equiv a \bmod m$ then one must have $(x, m) = 1$. For every such $x$, there exists $k$ so that $g^k \equiv x \bmod m$, and thus it suffices to solve the congruence $(g^k)^n \equiv g^j \bmod m$ for $k$. Since the order of $g$ is $\varphi(m)$, this congruence holds if and only if $kn \equiv j \bmod \varphi(m)$. For fixed $j$, theorem 3.1.3 tells us that there are $d = (n, \varphi(m))$ solutions if $d \mid j$, and none otherwise. But $d \mid j$ if and only if $j = dl$ for some $l \ge 1$, if and only if $a \equiv g^{dl} \bmod m$.

Finally, this is equivalent to the statement that $a^{\varphi(m)/d} \equiv g^{\varphi(m)l} \bmod m$ (it is a sufficient condition because $g^{di} \not\equiv 1 \bmod m$ for $1 \le i \le l - 1$); but $g^{\varphi(m)l} \equiv 1 \bmod m$, and we are done. □

Corollary 2: (Corollary 2.38, Niven; Euler's criterion): Let $p$ be an odd prime. The congruence $X^2 \equiv a \bmod p$ has two solutions if $a^{\frac{p-1}{2}} \equiv 1 \bmod p$, and no solutions otherwise. There is one solution if $p \mid a$.

Definition: The Carmichael lambda function, denoted $\lambda(m)$, is the smallest exponent $e \in \mathbb{N}$ such that $a^e \equiv 1 \bmod m$ for every $a$ with $(a, m) = 1$.

Remark: We know $\lambda(m) \mid \varphi(m)$, and $\lambda(m) = \varphi(m)$ if and only if $m \in$ PR. Moreover, as seen last week, if $m \notin$ PR then $\lambda(m) \le \frac{\varphi(m)}{2}$. By the Chinese remainder theorem,

$$\lambda(p_1^{e_1} p_2^{e_2} \cdots p_r^{e_r}) = [\lambda(p_1^{e_1}), \lambda(p_2^{e_2}), \ldots, \lambda(p_r^{e_r})].$$

For odd primes, we have $\lambda(p^r) = p^{r-1}(p - 1)$, which also holds for $p = 2$ and $r \le 2$. For $r \ge 3$, one has instead $\lambda(2^r) = 2^{r-2}$. Group theoretically, $\lambda(m)$ is the exponent of the group $\mathbb{Z}_m^\times$.

Definition: A base-$b$ pseudoprime is a composite number $m$ such that $b^{m-1} \equiv 1 \bmod m$. For example, we may take $b = 2$, $m = 341$; then

$$2^{10} = 1024 = 3 \cdot 341 + 1,$$

and so

$$2^{341-1} = (2^{10})^{34} \equiv 1^{34} \equiv 1 \pmod{341}.$$

Thus 341 is a base-2 pseudoprime. This notion gives rise to the Fermat test for primality: if $b^{m-1} \not\equiv 1 \bmod m$ for some $b$ with $(b, m) = 1$, then $m$ is composite. For example, with $m = 341$, $b = 3$, we have $3^{341-1} \equiv 56 \not\equiv 1 \bmod 341$, and it follows that 341 is not prime.
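The Fermat test amounts to a single modular exponentiation. A minimal sketch using Python's built-in three-argument `pow`; the function name `fermat_witness` is a hypothetical choice.

```python
def fermat_witness(m: int, b: int) -> bool:
    """True if base b proves m composite, i.e. b**(m-1) is not 1 modulo m."""
    return pow(b, m - 1, m) != 1


print(pow(2, 340, 341))        # 341 passes the test to base 2 ...
print(pow(3, 340, 341))        # ... but fails it to base 3
print(fermat_witness(341, 3))
```

Note that a failed witness (a return value of `False`) proves nothing: 341 is composite even though base 2 does not detect it.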

5.2 Lecture Thirteen

Recall: Fermat's test for primality.

Definition: Let $m$ be composite. Then $m$ is called a Carmichael number if $b^{m-1} \equiv 1 \bmod m$ for all $b$ with $(b, m) = 1$. For example, we might take $m = 561 = 3 \cdot 11 \cdot 17$. If $(b, m) = 1$, then we have by Euler's theorem

$$b^{561-1} \equiv \begin{cases} (b^2)^{280} \equiv 1 \pmod 3, \\ (b^{10})^{56} \equiv 1 \pmod{11}, \\ (b^{16})^{35} \equiv 1 \pmod{17}. \end{cases}$$

The Chinese remainder theorem then implies that b560 ≡ 1 mod m. In 1994, Alford, Granville, and Pomerance showed that there are infinitely many Carmichael numbers, in the paper of the same name.
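The definition of a Carmichael number can be tested directly for small $m$ by trying every base coprime to $m$. This brute-force sketch is only meant as an illustration; `is_prime` and `is_carmichael` are names introduced here, not part of the notes.

```python
from math import gcd


def is_prime(n: int) -> bool:
    """Trial-division primality check, adequate for small n."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def is_carmichael(m: int) -> bool:
    """True if m is composite and b**(m-1) ≡ 1 (mod m) for every b coprime to m."""
    if m < 4 or is_prime(m):
        return False
    return all(pow(b, m - 1, m) == 1 for b in range(2, m) if gcd(b, m) == 1)


print(is_carmichael(561))
print([m for m in range(2, 1200) if is_carmichael(m)])
```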

In fact, if $6k + 1$, $12k + 1$, and $18k + 1$ are all prime for some $k \in \mathbb{N}$, then their product is a Carmichael number. For example with $k = 1$ we get that $1729 = 7 \cdot 13 \cdot 19$ is a Carmichael number.

§3.1 – Quadratic residues

Most generally, we will investigate congruences of the form $aX^2 + bX + c \equiv 0 \bmod p$, where $p$ is an odd prime and $p \nmid a$. Completing the square gives

$$4a^2X^2 + 4abX + 4ac \equiv 0 \pmod p \Rightarrow (2aX + b)^2 \equiv b^2 - 4ac \pmod p.$$

Thus we are led to ask when $y^2 \equiv \Delta \bmod p$ (where $\Delta = b^2 - 4ac$ is the discriminant of our polynomial) has a solution. If so, then

$$2aX + b \equiv y \pmod p \Leftrightarrow X \equiv (y - b)(2a)^{-1} \pmod p.$$

We note the obvious analogue of the quadratic formula. Thus it suffices to investigate when $X^2 \equiv a \bmod p$ can be solved. By Euler's criterion, this occurs exactly when

$$a^{\frac{p-1}{2}} \equiv 1 \pmod p, \quad \text{if } p \nmid a.$$

Example: We investigate such congruences modulo 7, where $\frac{p-1}{2} = 3$.

  a    ord_7(a)    a^3 mod 7    Solutions of x^2 ≡ a mod 7
  0    –            0           x ≡ 0 mod 7
  1    1            1           x ≡ 1, 6 mod 7
  2    3            1           x ≡ 3, 4 mod 7
  3    6           −1           none
  4    3            1           x ≡ 2, 5 mod 7
  5    6           −1           none
  6    2           −1           none
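The table can be regenerated from Euler's criterion. In the sketch below, `euler_criterion` returns $a^{(p-1)/2} \bmod p$ normalised to the values $1$, $-1$, $0$, and `square_roots` finds solutions by exhaustive search; both names are assumptions made for this example.

```python
def euler_criterion(a: int, p: int) -> int:
    """a**((p-1)/2) modulo the odd prime p, reported as 1, -1 or 0."""
    r = pow(a, (p - 1) // 2, p)
    return -1 if r == p - 1 else r


def square_roots(a: int, p: int) -> list[int]:
    """All x in 0..p-1 with x*x ≡ a (mod p), found by brute force."""
    return [x for x in range(p) if (x * x - a) % p == 0]


for a in range(7):
    print(a, euler_criterion(a, 7), square_roots(a, 7))
```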

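Definition: If $(a, m) = 1$, then $a$ is called a quadratic residue modulo $m$ if $X^2 \equiv a \bmod m$ has a solution, and a quadratic nonresidue otherwise.

Definition: If $p$ is an odd prime, define the Legendre symbol $\left(\frac{a}{p}\right)$ via

$$\left(\frac{a}{p}\right) = \begin{cases} 1 & \text{if } a \text{ is a quadratic residue modulo } p, \\ -1 & \text{if } a \text{ is a quadratic nonresidue modulo } p, \\ 0 & \text{if } p \mid a. \end{cases}$$

Remark: If $a \equiv b \bmod p$, then $\left(\frac{a}{p}\right) = \left(\frac{b}{p}\right)$. Moreover, the number of solutions of $X^2 \equiv a \bmod p$ is exactly $\left(\frac{a}{p}\right) + 1$.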

Theorem 5.2.1 (Theorem 3.1, Niven) If $p$ is an odd prime and $(a, p) = 1$, then $\left(\frac{a}{p}\right) \equiv a^{\frac{p-1}{2}} \bmod p$.

Proof: We give two proofs. In the first, we simply use Euler's criterion (this is left as an exercise). For the second, we observe that if $a$ is a quadratic residue modulo $p$, then we can choose some $z$ such that $z^2 \equiv (-z)^2 \equiv a \bmod p$. We then pair the reduced residue classes modulo $p$ apart from $\pm z$ as $(x_i, y_i)$, with $x_i y_i \equiv a \bmod p$. There are $\frac{p-3}{2}$ such pairs, and by Wilson's theorem

$$-1 \equiv (p - 1)! \equiv z(-z) \prod_{i=1}^{\frac{p-3}{2}} x_i y_i \equiv -a \cdot a^{\frac{p-3}{2}} \equiv -a^{\frac{p-1}{2}} \pmod p,$$

and the result follows. If $a$ is a nonresidue, we repeat the above construction, this time pairing all residue classes as $x_i y_i \equiv a \bmod p$, $i = 1, 2, \ldots, \frac{p-1}{2}$, and we are done. □

Corollary 1: For any integers $a, b$, we have $\left(\frac{ab}{p}\right) = \left(\frac{a}{p}\right)\left(\frac{b}{p}\right)$; in particular, if $(a, p) = 1$ we have $\left(\frac{a^2}{p}\right) = 1$. In other words, the product of two quadratic residues is a quadratic residue, as is the product of two quadratic nonresidues. The product of a residue and a nonresidue is a nonresidue; compare this behaviour with that of the positive and negative integers.

5.3 Lecture Fourteen

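Recall: The Legendre symbol for $p \nmid a$ is defined

$$\left(\frac{a}{p}\right) = \begin{cases} 1 & \text{if } x^2 \equiv a \bmod p \text{ has a solution,} \\ -1 & \text{otherwise.} \end{cases}$$

By Euler's criterion, we showed that $a^{\frac{p-1}{2}} \equiv \left(\frac{a}{p}\right) \bmod p$.

Example: When $a = -1$ and $p$ is odd, we have that

$$\left(\frac{-1}{p}\right) \equiv (-1)^{\frac{p-1}{2}} \bmod p = \begin{cases} 1 & \text{if } p \equiv 1 \bmod 4, \\ -1 & \text{if } p \equiv 3 \bmod 4. \end{cases}$$

So $X^2 \equiv -1 \bmod p$ has two solutions if $p \equiv 1 \bmod 4$, and no solutions if $p \equiv 3 \bmod 4$.

nb. For odd primes $p$, we have

$$\prod_{i=\frac{p+1}{2}}^{p-1} i \equiv (-1)^{\frac{p-1}{2}} \prod_{j=1}^{\frac{p-1}{2}} j \equiv (-1)^{\frac{p-1}{2}} \left(\frac{p-1}{2}\right)! \pmod p. \qquad (1)$$

In particular, if $p \equiv 1 \bmod 4$ we get

$$\left(\left(\frac{p-1}{2}\right)!\right)^2 \equiv \left(\frac{p-1}{2}\right)! \, (-1)^{\frac{p-1}{2}} \prod_{i=\frac{p+1}{2}}^{p-1} i \equiv (p - 1)! \equiv -1 \pmod p,$$

and hence $x = \left(\frac{p-1}{2}\right)!$ solves $x^2 \equiv -1 \bmod p$.

Theorem 5.3.1 (The Law of Quadratic Reciprocity) Let $p \ne q$ be odd primes; then

$$\left(\frac{p}{q}\right)\left(\frac{q}{p}\right) = (-1)^{\frac{p-1}{2} \cdot \frac{q-1}{2}}.$$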

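In other words, $\left(\frac{p}{q}\right) = \left(\frac{q}{p}\right)$ if $p$ or $q \equiv 1 \bmod 4$, and $\left(\frac{p}{q}\right) = -\left(\frac{q}{p}\right)$ if $p \equiv q \equiv 3 \bmod 4$. Knowing whether or not $X^2 \equiv p \bmod q$ has solutions is the same as knowing whether or not $X^2 \equiv q \bmod p$ has solutions.

Proof: (due to Rousseau, 1991) First, some background. Let $\alpha = \frac{p-1}{2}$, $\beta = \frac{q-1}{2}$. Let

$$F = \left\{ 1 \le k < \frac{pq}{2} : (k, pq) = 1 \right\}$$

be the "first half" of $\mathbb{Z}_{pq}^\times$ and let

$$L = \left\{ (i, j) \in \mathbb{Z}_p^\times \times \mathbb{Z}_q^\times : 1 \le i \le p - 1,\ 1 \le j < \frac{q}{2} \right\}$$

be the "left half" of $\mathbb{Z}_p^\times \times \mathbb{Z}_q^\times$, and let $\pi : \mathbb{Z}_{pq} \to \mathbb{Z}_p \times \mathbb{Z}_q$ be the map given by the Chinese remainder theorem. One can see that for every $k \in \mathbb{Z}_{pq}^\times$, one has $\pi(k) \in L$ or $-\pi(k) \in L$ (we will write $\pi(k) \in -L$). For each $k \in F$, choose $\epsilon_k \in \{\pm 1\}$, $i_k \in \{1, 2, \ldots, p - 1\}$, $j_k \in \{1, 2, \ldots, \beta\}$ such that

$$\pi(k) = \epsilon_k (i_k, j_k).$$

In particular, if $k \ne k' \in F$, then $\pi(k) \ne \pi(k')$ and $\pi(k) \ne -\pi(k')$. Thus each ordered pair $(i_k, j_k)$ is distinct, and we obtain

$$\prod_{k \in F} (k, k) \equiv \prod_{k \in F} \pi(k) \equiv \prod_{k \in F} \epsilon_k (i_k, j_k) \equiv \left( \prod_{k \in F} \epsilon_k \right) \left( \prod_{(i,j) \in L} (i, j) \right), \qquad (2)$$

the calculation taking place in $\mathbb{Z}_p^\times \times \mathbb{Z}_q^\times$ and the congruences taken $(\bmod\ p, \bmod\ q)$. Now, consider the right-hand side of (2): we have (with the same notation convention)

$$\prod_{(i,j) \in L} (i, j) \equiv \prod_{i=1}^{p-1} \prod_{j=1}^{\beta} (i, j) \equiv \left( ((p - 1)!)^\beta, (\beta!)^{p-1} \right).$$

From (1), we have that

$$\prod_{i=\beta+1}^{q-1} i \equiv (-1)^\beta \beta! \pmod q,$$

hence $(\bmod\ p, \bmod\ q)$ we have

$$\prod_{(i,j) \in L} (i, j) \equiv \left( ((p - 1)!)^\beta, \left( \beta! \cdot (-1)^\beta \prod_{i=\beta+1}^{q-1} i \right)^{\alpha} \right) \equiv \left( ((p - 1)!)^\beta, (-1)^{\alpha\beta} ((q - 1)!)^\alpha \right),$$

and finally by Wilson's theorem we obtain

$$\prod_{(i,j) \in L} (i, j) \equiv \left( (-1)^\beta, (-1)^{\alpha\beta} (-1)^\alpha \right).$$

Thus with $\epsilon = \prod_{k \in F} \epsilon_k$, the right-hand side of (2) becomes $\epsilon \left( (-1)^\beta, (-1)^{\alpha\beta} (-1)^\alpha \right)$. Now, on the left-hand side, we look at the first co-ordinate modulo $p$:

$$\prod_{k \in F} k \equiv \prod_{\substack{1 \le k < \frac{pq}{2}, \\ (pq, k) = 1}} k \equiv \left( \prod_{\substack{1 \le k < \frac{pq}{2}, \\ p \nmid k}} k \right) \left( \prod_{\substack{1 \le k < \frac{pq}{2}, \\ q \mid k}} k \right)^{-1}. \qquad (3)$$

The first factor in (3) splits into intervals of length $p - 1$, with one exception, namely the interval ending at $\left\lfloor \frac{pq}{2} \right\rfloor$. Thus modulo $p$ we see

$$\prod_{\substack{1 \le k < \frac{pq}{2}, \\ p \nmid k}} k = \left( \prod_{1 \le k \le p-1} k \right) \left( \prod_{p+1 \le k \le 2p-1} k \right) \cdots \left( \prod_{(\beta-1)p+1 \le k \le \beta p - 1} k \right) \left( \prod_{\beta p + 1 \le k \le \beta p + \alpha} k \right);$$

but $\beta p + \alpha = \left\lfloor \frac{pq}{2} \right\rfloor$, so we see that

$$\prod_{\substack{1 \le k < \frac{pq}{2}, \\ p \nmid k}} k \equiv ((p - 1)!)^\beta \alpha! \pmod p.$$

The second factor of (3) is the inverse of

$$\prod_{\substack{1 \le k < \frac{pq}{2}, \\ q \mid k}} k \equiv q \cdot 2q \cdots \alpha q \equiv q^\alpha \alpha! \equiv \left(\frac{q}{p}\right) \alpha! \pmod p,$$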

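with the last congruence following by Euler's criterion. Thus (3) becomes

$$\prod_{k \in F} k \equiv ((p - 1)!)^\beta \alpha! \left( \left(\frac{q}{p}\right) \alpha! \right)^{-1} \pmod p,$$

which by Wilson's theorem is congruent modulo $p$ to $(-1)^\beta \left(\frac{q}{p}\right)$. The same proof shows

$$\prod_{k \in F} k \equiv (-1)^\alpha \left(\frac{p}{q}\right) \pmod q,$$

and so (2) becomes

$$\left( (-1)^\beta \left(\frac{q}{p}\right), (-1)^\alpha \left(\frac{p}{q}\right) \right) \equiv \epsilon \left( (-1)^\beta, (-1)^{\alpha\beta} (-1)^\alpha \right) \quad (\bmod\ p, \bmod\ q).$$

The first co-ordinate tells us that $\left(\frac{q}{p}\right) \equiv \epsilon \bmod p$, and the second that $\left(\frac{p}{q}\right) = (-1)^{\alpha\beta} \epsilon = (-1)^{\alpha\beta} \left(\frac{q}{p}\right)$ (where we have equality rather than congruence, as $\left(\frac{q}{p}\right) \in \{\pm 1\}$ and $p$ is odd), hence

$$\left(\frac{p}{q}\right)\left(\frac{q}{p}\right) = (-1)^{\alpha\beta},$$

as claimed. □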
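The law of quadratic reciprocity can be confirmed numerically for small primes by computing each Legendre symbol with Euler's criterion. This is a sanity check rather than a proof, and the helper names are hypothetical.

```python
def legendre(a: int, p: int) -> int:
    """Legendre symbol (a/p) for an odd prime p, via Euler's criterion."""
    r = pow(a, (p - 1) // 2, p)
    return -1 if r == p - 1 else r


def odd_primes_below(n: int) -> list[int]:
    """Odd primes less than n, by trial division."""
    return [p for p in range(3, n, 2) if all(p % d for d in range(3, int(p ** 0.5) + 1, 2))]


def reciprocity_holds(p: int, q: int) -> bool:
    """Check (p/q)(q/p) == (-1)**(((p-1)/2) * ((q-1)/2)) for distinct odd primes."""
    return legendre(p, q) * legendre(q, p) == (-1) ** (((p - 1) // 2) * ((q - 1) // 2))


primes = odd_primes_below(60)
print(all(reciprocity_holds(p, q) for p in primes for q in primes if p != q))
```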

6 Week Six

6.1 Lecture Fifteen

Recall: Last week, we saw that Euler's criterion implies that $\left(\frac{-1}{p}\right) = (-1)^{\frac{p-1}{2}}$ for any odd prime $p$. In other words, $x^2 \equiv -1 \bmod p$ has 2 solutions if $p \equiv 1 \bmod 4$, and no solutions if $p \equiv 3 \bmod 4$. There is a single solution if $p = 2$. Consequently, we see that, for every integer $x$, all of the prime factors of $x^2 + 1$ (other than 2) must be congruent to 1 modulo 4. Similarly, for any $x, k \in \mathbb{Z}$ we have that all prime factors $p$ of $x^2 + k^2$ satisfy

$$p \mid 2k \quad \text{or} \quad p \equiv 1 \bmod 4,$$

since if $p \nmid k$ then $x^2 + k^2 \equiv 0 \bmod p$ implies that $x^2 \equiv -k^2 \bmod p$, hence $(xk^{-1})^2 \equiv -1 \bmod p$ and so $p = 2$ or $p \equiv 1 \bmod 4$. Note that in the first case with $p$ odd, we must have $(x, k) > 1$.

Example: We use quadratic reciprocity to answer the question: Does $x^2 \equiv 55 \bmod 367$ have a solution? Note that 367 is a prime congruent to 3 modulo 4. To answer this question we compute the Legendre symbol $\left(\frac{55}{367}\right)$: by multiplicativity we have

$$\left(\frac{55}{367}\right) = \left(\frac{5}{367}\right)\left(\frac{11}{367}\right).$$

The law of quadratic reciprocity then implies that

$$\left(\frac{5}{367}\right) = \left(\frac{367}{5}\right) = \left(\frac{2}{5}\right) = -1,$$

since the quadratic residues modulo 5 are 1 and 4, and similarly

$$\left(\frac{11}{367}\right) = -\left(\frac{367}{11}\right) = -\left(\frac{4}{11}\right) = -\left(\frac{2}{11}\right)^2 = -1.$$

Thus $\left(\frac{55}{367}\right) = (-1)(-1) = 1$, and we see that 55 is a quadratic residue modulo 367. The theorem is non-constructive, but one may check that $(\pm 34)^2 \equiv 55 \bmod 367$. We see from this example that one algorithm for calculating $\left(\frac{a}{p}\right)$ is given by:

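1. Factor $a$ completely, $a = p_1^{e_1} p_2^{e_2} \cdots p_k^{e_k}$.
2. Use multiplicativity and periodicity:
$$\left(\frac{a}{p}\right) = \left(\frac{p_1^{e_1}}{p}\right) \left(\frac{p_2^{e_2}}{p}\right) \cdots \left(\frac{p_k^{e_k}}{p}\right).$$
3. Use the law of quadratic reciprocity.
4. If not finished, return to 1.

Theorem 6.1.1 (Theorem 3.3, Niven) If $p$ is an odd prime, then

$$\left(\frac{2}{p}\right) = (-1)^{\frac{p^2-1}{8}};$$

that is,

$$\left(\frac{2}{p}\right) = \begin{cases} 1 & \text{if } p \equiv \pm 1 \bmod 8, \\ -1 & \text{if } p \equiv \pm 3 \bmod 8. \end{cases}$$

The proof is not given here.

§3.3 – The Jacobi symbol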

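Let $p_1, p_2, \ldots, p_k$ be odd primes (not necessarily distinct), and let $Q$ be their product. The Jacobi symbol $\left(\frac{a}{Q}\right)$ is defined

$$\left(\frac{a}{Q}\right) = \prod_{j=1}^{k} \left(\frac{a}{p_j}\right),$$

where the symbols on the right are Legendre symbols.

Example: We compute the Jacobi symbol $\left(\frac{8}{15}\right)$. We have

$$\left(\frac{8}{15}\right) = \left(\frac{8}{3}\right)\left(\frac{8}{5}\right) = \left(\frac{2}{3}\right)\left(\frac{2}{5}\right) = (-1)(-1) = 1.$$

Note that although the Jacobi symbol $\left(\frac{8}{15}\right)$ is 1, the congruence $x^2 \equiv 8 \bmod 15$ has no solution, as $x^2 \equiv 2 \bmod 3$ hasn't any. However, we can say that, if $\left(\frac{a}{Q}\right) = -1$, then $x^2 \equiv a \bmod Q$ has no solutions. Our example shows that the converse is false; why, then, define the Jacobi symbol at all? There are several reasons, chief among which are
1. It agrees with the Legendre symbol when $Q$ is prime, and
2. It is easy to compute without factoring any integers.
The first of these assertions is clear, but the second is not yet.

Properties of the Jacobi symbol
• It is totally multiplicative in both arguments; that is, if $Q$ and $R$ are odd positive integers, then for any $a, b$ we have
$$\left(\frac{ab}{Q}\right) = \left(\frac{a}{Q}\right)\left(\frac{b}{Q}\right), \qquad \left(\frac{a}{QR}\right) = \left(\frac{a}{Q}\right)\left(\frac{a}{R}\right).$$
• It is periodic in the top argument with period $Q$, i.e. if $a \equiv b \bmod Q$ then $\left(\frac{a}{Q}\right) = \left(\frac{b}{Q}\right)$.

The second property is immediate if $Q$ is squarefree, and if not then we write $Q = Q_0 S$ with $Q_0$ squarefree and $S$ a perfect square, and we have that

$$\left(\frac{a}{Q}\right) = \left(\frac{a}{Q_0}\right)\left(\frac{a}{S}\right) = \left(\frac{a}{Q_0}\right)\left(\frac{a}{\sqrt{S}}\right)^2 = \left(\frac{a}{Q_0}\right) \quad \text{provided } (a, S) = 1.$$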

Before proceeding, we first record the following

Lemma 6.1.2 If $b_1, b_2, \ldots, b_k$ are odd, then

$$\sum_{j=1}^{k} \frac{b_j - 1}{2} \equiv \frac{b_1 b_2 \cdots b_k - 1}{2} \pmod 2.$$

Proof: If $k = 2$, then

$$\frac{b_1 b_2 - 1}{2} - \left( \frac{b_1 - 1}{2} + \frac{b_2 - 1}{2} \right) = \frac{(b_1 - 1)(b_2 - 1)}{2} \equiv 0 \pmod 2,$$

and the general case follows by induction (exercise). □

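Theorem 6.1.3 (Theorem 3.7, Niven) If $Q > 0$ is odd, then the Jacobi symbol $\left(\frac{-1}{Q}\right)$ equals

$$(-1)^{\frac{Q-1}{2}} = \begin{cases} 1 & \text{if } Q \equiv 1 \bmod 4, \\ -1 & \text{if } Q \equiv 3 \bmod 4. \end{cases}$$

Proof: Since square factors of $Q$ do not affect the Jacobi symbol (as illustrated above), we may assume without loss of generality that $Q = p_1 p_2 \cdots p_k$ is squarefree. Then by lemma 6.1.2 we have that

$$\frac{Q - 1}{2} \equiv \frac{p_1 - 1}{2} + \frac{p_2 - 1}{2} + \cdots + \frac{p_k - 1}{2} \pmod 2,$$

hence

$$\left(\frac{-1}{Q}\right) = \left(\frac{-1}{p_1}\right)\left(\frac{-1}{p_2}\right) \cdots \left(\frac{-1}{p_k}\right) = (-1)^{\frac{p_1-1}{2}} (-1)^{\frac{p_2-1}{2}} \cdots (-1)^{\frac{p_k-1}{2}} = (-1)^{\frac{Q-1}{2}},$$

as claimed. □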

6.2 Lecture Sixteen

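Theorem 6.2.1 (Theorem 3.8, Niven; the law of quadratic reciprocity for Jacobi symbols) Let $P, Q \in \mathbb{N}$ be odd with $(P, Q) = 1$. Then

$$\left(\frac{P}{Q}\right)\left(\frac{Q}{P}\right) = (-1)^{\frac{P-1}{2} \cdot \frac{Q-1}{2}} = \begin{cases} -1 & \text{if } P \equiv Q \equiv 3 \bmod 4, \\ 1 & \text{otherwise.} \end{cases}$$

Note that if $(P, Q) > 1$, we must have $\left(\frac{P}{Q}\right) = 0$.

Proof: Write $P = p_1 p_2 \cdots p_k$, $Q = q_1 q_2 \cdots q_l$, where the $p_i$ and $q_j$ are odd (not necessarily distinct) primes. By multiplicativity, we have

$$\left(\frac{P}{Q}\right) = \prod_{i=1}^{k} \left(\frac{p_i}{Q}\right) = \prod_{i=1}^{k} \prod_{j=1}^{l} \left(\frac{p_i}{q_j}\right),$$

where the factors in the last product are Legendre symbols. The law of quadratic reciprocity (for Legendre symbols) then implies that

$$\left(\frac{P}{Q}\right) = \prod_{i=1}^{k} \prod_{j=1}^{l} \left(\frac{q_j}{p_i}\right) (-1)^{\frac{p_i-1}{2} \cdot \frac{q_j-1}{2}} = \left(\frac{Q}{P}\right) (-1)^{\sum_{i=1}^{k} \sum_{j=1}^{l} \frac{p_i-1}{2} \cdot \frac{q_j-1}{2}}.$$

By lemma 6.1.2 from our last lecture, the exponent of $-1$ satisfies

$$\sum_{i=1}^{k} \sum_{j=1}^{l} \frac{p_i - 1}{2} \cdot \frac{q_j - 1}{2} \equiv \frac{P - 1}{2} \cdot \frac{Q - 1}{2} \pmod 2,$$

hence

$$\left(\frac{P}{Q}\right)\left(\frac{Q}{P}\right) = (-1)^{\frac{P-1}{2} \cdot \frac{Q-1}{2}},$$

as claimed. □

Application: We calculate the Legendre symbol $\left(\frac{2}{p}\right)$, where $p$ is an odd prime; rather, we will show that the Jacobi symbol $\left(\frac{2}{Q}\right)$ obeys the formula from last lecture, namely

$$\left(\frac{2}{Q}\right) = (-1)^{\frac{Q^2-1}{8}} = \begin{cases} 1 & \text{if } Q \equiv \pm 1 \bmod 8, \\ -1 & \text{if } Q \equiv \pm 3 \bmod 8, \end{cases}$$

from which the special case of the Legendre symbol follows. By periodicity in the top argument, we have that

$$\left(\frac{2}{Q}\right) = \left(\frac{2 - Q}{Q}\right) = \left(\frac{-1}{Q}\right)\left(\frac{Q - 2}{Q}\right) = (-1)^{\frac{Q-1}{2}} \left(\frac{Q - 2}{Q}\right).$$

Since $Q$ is odd and positive, we must have that $(Q, Q - 2) = 1$, and so by quadratic reciprocity we see that

$$\left(\frac{2}{Q}\right) = (-1)^{\frac{Q-1}{2}} \left(\frac{Q}{Q - 2}\right) (-1)^{\frac{Q-1}{2} \cdot \frac{Q-3}{2}};$$

again, since one of $Q - 1$ and $Q - 3$ must be divisible by 4, we cancel the last factor and obtain

$$\left(\frac{2}{Q}\right) = (-1)^{\frac{Q-1}{2}} \left(\frac{Q}{Q - 2}\right) = (-1)^{\frac{Q-1}{2}} \left(\frac{2}{Q - 2}\right).$$

By descent, we obtain

$$\left(\frac{2}{Q}\right) = (-1)^{\frac{Q-1}{2}} (-1)^{\frac{Q-3}{2}} \cdots (-1)^{3} (-1)^{2} \left(\frac{2}{3}\right),$$

and finally since 2 is a quadratic nonresidue modulo 3 we have

$$\left(\frac{2}{Q}\right) = (-1)^{1 + 2 + \cdots + \frac{Q-1}{2}} = (-1)^{\frac{1}{2} \cdot \frac{Q-1}{2} \cdot \frac{Q+1}{2}} = (-1)^{\frac{Q^2-1}{8}},$$

and we are done. □

We can turn this into a general algorithm for computing the Jacobi symbol. Indeed, to compute $\left(\frac{a}{Q}\right)$, we may apply the following steps:
1. Factor $-1$ and any powers of 2 from $a$, leaving $\left(\frac{P}{Q}\right)$ with $P$ an odd positive number.
2. Use quadratic reciprocity and periodicity.
3. If not finished, return to 1.
Note, in particular, that this algorithm doesn't require us to factor any integers.

Example: 53681 is prime and congruent to 1 modulo 4. Is 1311 a quadratic residue modulo 53681? It suffices to compute the Jacobi symbol, which in the case that $Q$ is an odd prime is exactly the Legendre symbol. Using the algorithm outlined above, we find

$$\left(\frac{1311}{53681}\right) = \left(\frac{53681}{1311}\right) = \left(\frac{-70}{1311}\right) = \left(\frac{-1}{1311}\right)\left(\frac{2}{1311}\right)\left(\frac{35}{1311}\right)$$

$$= (-1)(1)\left(\frac{35}{1311}\right) = -(-1)\left(\frac{1311}{35}\right) = \left(\frac{16}{35}\right) = \left(\frac{4}{35}\right)^2 = 1.$$

So 1311 is indeed a square modulo 53681.

Here we will give an outline of a more "traditional" proof of the law of quadratic reciprocity, nearer to the proof given in Niven. We start with a preliminary result.

Lemma 6.2.2 (Gauss's lemma) Let $p$ be an odd prime and let

$$F = \left\{ 1, 2, \ldots, \frac{p-1}{2} \right\}, \qquad -F = \left\{ \frac{p+1}{2}, \frac{p+3}{2}, \ldots, p - 1 \right\}.$$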

Given $a$ with $(a, p) = 1$, let $n = \#\{k \in F : ak \bmod p \in -F\}$. Then $\left(\frac{a}{p}\right) = (-1)^n$.
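Gauss's lemma translates directly into a counting procedure. The sketch below counts the $k \in F$ for which $ak \bmod p$ lands in the upper half $-F$ and compares the result with Euler's criterion; the function names are assumptions made for this illustration.

```python
def gauss_lemma_symbol(a: int, p: int) -> int:
    """(-1)**n, where n counts k in 1..(p-1)/2 with (a*k mod p) > (p-1)/2."""
    half = (p - 1) // 2
    n = sum(1 for k in range(1, half + 1) if (a * k) % p > half)
    return (-1) ** n


def euler_symbol(a: int, p: int) -> int:
    """Legendre symbol via Euler's criterion, for p an odd prime not dividing a."""
    return 1 if pow(a, (p - 1) // 2, p) == 1 else -1


# Compare the two computations for every unit modulo 23.
print(all(gauss_lemma_symbol(a, 23) == euler_symbol(a, 23) for a in range(1, 23)))
```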

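Note that from this we can immediately compute $\left(\frac{2}{p}\right)$, since in this case $n = \#\{k : \frac{p}{4} < k < \frac{p}{2}\}$. Next, we show that for odd $a$

$$n \equiv \sum_{j=1}^{\frac{p-1}{2}} \left\lfloor \frac{aj}{p} \right\rfloor \pmod 2,$$

and we also use the fact that, taking $a = q$ for an odd prime $q \ne p$,

$$\sum_{j=1}^{\frac{p-1}{2}} \left\lfloor \frac{qj}{p} \right\rfloor + \sum_{k=1}^{\frac{q-1}{2}} \left\lfloor \frac{kp}{q} \right\rfloor = \frac{p - 1}{2} \cdot \frac{q - 1}{2}.$$

One proof of this fact counts lattice points in the rectangle $R$ in the first quadrant, whose vertices are at $(0, 0)$, $(0, q)$, $(p, 0)$ and $(p, q)$; specifically, those lying above and below the line segment joining the origin to $(p, q)$, but this is all the detail we give here.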
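The Jacobi-symbol algorithm described earlier in this lecture (strip factors of 2, apply reciprocity, reduce by periodicity, repeat) can be sketched as follows. This is one common way to organise the steps, not necessarily the arrangement used in lecture: it reduces the top argument modulo the bottom one instead of factoring out $-1$ explicitly, and the name `jacobi` is an assumption.

```python
def jacobi(a: int, q: int) -> int:
    """Jacobi symbol (a/q) for odd q > 0, using reciprocity and periodicity only."""
    if q <= 0 or q % 2 == 0:
        raise ValueError("q must be a positive odd integer")
    a %= q                               # periodicity in the top argument
    sign = 1
    while a != 0:
        while a % 2 == 0:                # pull out factors of 2 using (2/q)
            a //= 2
            if q % 8 in (3, 5):
                sign = -sign
        a, q = q, a                      # reciprocity: swap the arguments ...
        if a % 4 == 3 and q % 4 == 3:    # ... flipping sign when both are 3 mod 4
            sign = -sign
        a %= q
    return sign if q == 1 else 0         # q > 1 here means gcd(a, q) > 1


print(jacobi(1311, 53681))   # the worked example from the notes
print(jacobi(8, 15))
print(jacobi(55, 367))
```

No integer is ever factored beyond removing powers of 2, which is the point made in the notes.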

With this machinery, we can show that there are infinitely many primes congruent to 1 modulo 4. Indeed, if $p_1, p_2, \ldots, p_k$ is any finite list of such primes, let

$$N = (2 p_1 p_2 \cdots p_k)^2 + 1.$$

Then $p_i \nmid N$ for $i = 1, 2, \ldots, k$. But since $N$ is one more than a square and odd, we know that all of its prime factors must be congruent to 1 modulo 4; in particular, there must be such a prime which is not on the list.

6.3 Lecture Seventeen

Final exam date: Friday, December 8, at noon.

Definition: A degree-$d$ form (or homogeneous polynomial) is a polynomial, each of whose monomials has degree $d$. For example, $X^3 + 2Y^3 + 3Y^2Z - 4XYZ$ is a degree-3 form. A binary form is a form in two variables, and a quadratic form is a degree-2 form. We will focus on binary quadratic forms.

Example: One binary quadratic form is $f(X, Y) = X^2 + Y^2$; another is $g(X, Y) = 53X^2 + 152XY + 109Y^2$. Among the questions we might ask about binary quadratic forms $f(X, Y)$, two important ones are:

1. Which $m \in \mathbb{Z}$ are represented by $f$? That is, for which $m \in \mathbb{Z}$ do we have $x, y \in \mathbb{Z}$ with $f(x, y) = m$?
2. Which $m \in \mathbb{Z}$ can be properly represented by $f$? That is, when is $m$ represented as $m = f(x, y)$ with $(x, y) = 1$?

One motivation for the second question is the observation that for any binary quadratic form $f$, we have $f(dx, dy) = d^2 f(x, y)$. We first investigate the form $f(X, Y) = X^2 + Y^2$, and ask when $f$ represents a prime $p$. We observe that $2 = 1^2 + 1^2$, and from now on will restrict our attention to odd primes $p$.

Lemma 6.3.1 If $p \equiv 3 \bmod 4$ and $p \mid (x^2 + y^2)$, then $p \mid x$ and $p \mid y$.

Proof: Since $p \mid (x^2 + y^2)$, we have that $x^2 \equiv -y^2 \bmod p$. If $p \nmid y$, then $y$ is a unit modulo $p$ and we have the equivalent congruence $(xy^{-1})^2 \equiv -1 \bmod p$, or $p \mid ((xy^{-1})^2 + 1)$, contradicting our result from the end of the last lecture that $p \mid ((2n)^2 + 1)$ implies $p \equiv 1 \bmod 4$. Thus $p \mid y$, from which we immediately see $p \mid x$. □

In particular, if $p \equiv 3 \bmod 4$, then there is no way to express $p$ as the sum of two squares.

Proposition 6.3.2 If $p \equiv 1 \bmod 4$, then there exist $x, y \in \mathbb{Z}$ such that $x^2 + y^2 = p$ and $(x, y) = 1$.

Proof: Fix some $z$ so that $z^2 \equiv -1 \bmod p$, and consider the set

$$S = \{u + zv : 0 \le u < \sqrt{p},\ 0 \le v < \sqrt{p}\}.$$

It is not difficult to see that the number of pairs $(u, v)$ is $(1 + \lfloor \sqrt{p} \rfloor)^2$, and that

$$(1 + \lfloor \sqrt{p} \rfloor)^2 = \lceil \sqrt{p} \rceil^2 > p,$$

where $\lceil x \rceil$ denotes the ceiling function. Thus by the pigeonhole principle there must be two distinct pairs $(u, v)$, $(u', v')$ (i.e. with not both $u = u'$ and $v = v'$) for which $u + zv$ and $u' + zv'$ are congruent modulo $p$. Define

$$x = u - u', \qquad y = v' - v.$$

Then since $u - u' \equiv z(v' - v) \bmod p$, we see that $x^2 \equiv -y^2 \bmod p$, and so $p \mid (x^2 + y^2)$. Moreover, we see that $|x^2 + y^2| \le |x|^2 + |y|^2 < 2p$, and since we do not have $x = y = 0$ by our earlier remarks, it follows that $x^2 + y^2 = p$. Furthermore, if $d = (x, y)$, then it follows that $d^2 \mid p$ and hence $d = 1$. □

Theorem 6.3.3 (due to Fermat) An integer $n$ is properly represented by $X^2 + Y^2$ if and only if $4 \nmid n$ and no prime $p \equiv 3 \bmod 4$ has $p \mid n$.

Proof: Suppose first that $n = x^2 + y^2$ with $(x, y) = 1$, and let $p \equiv 3 \bmod 4$ be prime. If $p \mid (x^2 + y^2)$, then by lemma 6.3.1 $p \mid x$ and $p \mid y$, thus $(x, y) > 1$, a contradiction. Conversely suppose that no prime factor $p$ of $n$ has $p \equiv 3 \bmod 4$. Since we know each prime factor is properly represented, it suffices to prove that the product $mn$ of any numbers $m, n$ properly represented by $X^2 + Y^2$ is itself properly represented. Write $m = w^2 + z^2$ and $n = x^2 + y^2$ with $(w, z) = (x, y) = 1$. Then

$$mn = (wx)^2 + (wy)^2 + (xz)^2 + (yz)^2 = (wx - yz)^2 + (wy + xz)^2,$$

and it suffices to check coprimality. [Here we encounter an error in the proof, the rest of which has been omitted.] In the next lecture, we will prove the following, also due to Fermat.

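Theorem 6.3.4 Given $n \in \mathbb{N}$, write $n$ in its prime factorization as

$$n = 2^\alpha \prod_{i=1}^{k} p_i^{\beta_i} \prod_{j=1}^{l} q_j^{\gamma_j},$$

where every $p_i$ has $p_i \equiv 1 \bmod 4$ and every $q_j$ has $q_j \equiv 3 \bmod 4$. Then $n$ is represented by $X^2 + Y^2$ if and only if every $\gamma_j$ is even; in other words, if and only if we can write $n = ab^2$, where for primes $p$

$$p \mid a \Rightarrow p \not\equiv 3 \bmod 4 \quad \text{and} \quad p \mid b \Rightarrow p \equiv 3 \bmod 4.$$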
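The criterion of theorem 6.3.4 can be compared with a brute-force search for representations. In this sketch both function names are hypothetical; the first searches for $x, y$ directly and the second checks the exponent condition by trial division.

```python
from math import isqrt


def is_sum_of_two_squares(n: int) -> bool:
    """True if n = x*x + y*y for some integers x, y >= 0 (exhaustive search)."""
    for x in range(isqrt(n) + 1):
        rest = n - x * x
        if isqrt(rest) ** 2 == rest:
            return True
    return False


def every_3_mod_4_prime_has_even_exponent(n: int) -> bool:
    """True if each prime q ≡ 3 (mod 4) divides n to an even power."""
    d = 2
    while d * d <= n:
        exponent = 0
        while n % d == 0:            # only primes can divide n at this point
            n //= d
            exponent += 1
        if d % 4 == 3 and exponent % 2 == 1:
            return False
        d += 1
    # Whatever remains is 1 or a single prime occurring to the first power.
    return n % 4 != 3


print(all(is_sum_of_two_squares(n) == every_3_mod_4_prime_has_even_exponent(n)
          for n in range(1, 2000)))
```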

7 Week Seven

7.1 Lecture Eighteen

Recall: Theorem 6.3.4. Proof: Lemma 6.3.1 showed that if q|(x2 + y2) and q ≡ 3 mod 4 is prime, then q|x and q|y, thus q2|(x2 + y2). Conversely, proposition 6.3.2 showed the converse statement for p ≡ 1 mod 4, and theorem 6.3.3 for 2 and for q2, q ≡ 3 mod 4, and since (a2 + b2)(c2 + d2) = (ac − bd)2 + (ad + bc)2 we see that representability by X2 + Y 2 is multiplicative, which completes the proof.  2 2 Fact: A positive integer n can be properly represented by X + Y if and only if each γj = 0; that is, if and only if no prime congruent to 3 modulo 4 divides n. The proof of one implication was attempted at the end of the last lecture; today, we develop machinery to prove more general statements. [Aside: Lagrange’s Four-Square theorem asserts that any nonnegative integer can be written as the sum of at most four squares. One proves this first for primes, then by showing multiplicative closure of representability by W 2 + X2 + Y 2 + Z2. We may draw an analogy between the corresponding observation in the proof of theorem 6.3.4 and multiplicativity of the complex norm |a + ib|2 = a2 + b2, and that of the norm in the ring of quaternions, |a + ib + jc + kd|2 = a2 + b2 + c2 + d2.

Moreover, let $f(X_1, X_2, \dots, X_n)$ be any positive definite quadratic form with integer matrix. If $f$ represents every integer in the set $\{1, 2, \dots, 15\}$, then $f$ represents every positive integer. This is known as the Fifteen Theorem.]

§3.4 – Binary quadratic forms

Notation: For the remainder of this lecture, $f(X,Y) = aX^2 + bXY + cY^2$ will denote an arbitrary quadratic form of discriminant $d = b^2 - 4ac$.

When does $f(x, y) = 0$ for $x, y$ not both 0? Suppose $d$ is a perfect square. If $a \neq 0$ then we may factor $f$ over $\mathbb{Q}$ via
\[ f(x, y) = a \left( x + \frac{b - \sqrt{d}}{2a}\, y \right) \left( x + \frac{b + \sqrt{d}}{2a}\, y \right), \]
and so by proposition 6.2.2 we see that $f$ also factors over $\mathbb{Z}$. In this case, there are many ways to represent 0, as we need only make one of the factors equal zero. If $a = 0$ then $f(X,Y) = Y(bX + cY)$ and we have the same observation.

In the case $d = 0$, we can write $f(X,Y) = e(gX + hY)^2$ for some integers $e, g, h$. If $e > 0$ then $f$ is positive semidefinite; that is, $f(x, y) \geq 0$ for any $x, y \in \mathbb{Z}$. Similarly if $e < 0$ then $f(x, y) \leq 0$ for all $x, y \in \mathbb{Z}$, and $f$ is said to be negative semidefinite. If furthermore $f(x, y) = 0$ implies that $x = y = 0$, then $f$ is said to be positive definite (resp. negative definite).

Now, suppose $d$ is not a perfect square; then $f$ is irreducible over $\mathbb{Q}$. In particular, $ac \neq 0$, else $d = b^2$, which is not the case.

Theorem 7.1.1 (Theorem 3.10, Niven) Suppose that a binary quadratic form $f(X,Y)$ has discriminant $d < 0$; then $f$ is definite (i.e. positive definite or negative definite).

Proof: Suppose $f(m, n) = 0$ and suppose $n \neq 0$. The identity

\[ 4af(x, y) = (2ax + by)^2 - dy^2 \]

implies that
\[ (2am + bn)^2 - dn^2 = 0 \iff dn^2 = (2am + bn)^2 \iff d = \left( 2a\,\frac{m}{n} + b \right)^2, \]
so $d < 0$ is the square of a rational number, which is a contradiction. A symmetric argument with the assumption $m \neq 0$ completes the proof. □

We might ask: when is $f$ positive? negative?

Theorem 7.1.2 (Theorem 3.11, Niven) Let $f$ be a binary quadratic form of discriminant $d$. If $d > 0$ then $f$ is indefinite, that is, $f$ represents both positive and negative values. If $d < 0$ and $a > 0$, then $f$ is positive definite. If $d < 0$ and $a < 0$, then $f$ is negative definite.

Proof: Suppose $d > 0$. Then if $a \neq 0$ we have that $f(1, 0) = a$ and $f(b, -2a) = -ad$, and since $d > 0$ we know that $a$ and $-ad$ have opposite signs, so $f$ is indefinite. The same argument works if we assume $c \neq 0$, using $f(0, 1) = c$, $f(-2c, b) = -cd$. Finally if $a = c = 0$ then $f(1, 1) = b$, $f(-1, 1) = -b$, and since $f \neq 0$ by assumption this exhausts all cases.

Suppose now that $d < 0$, so that in particular $d$ is not a perfect square. Then we know $a \neq 0$ and so by our identity we have that $4af(x, y) = (2ax + by)^2 + |d|y^2 \geq 0$, from which it follows that $a$ must have the same sign as $f(x, y)$ whenever $f(x, y) \neq 0$. The same equation shows that if $f(x, y) = 0$ then $y = 0$, thus $x = 0$, and we are done. □

7.2 Lecture Nineteen

Theorem 7.2.1 (Theorem 3.12, Niven) Let $d \in \mathbb{Z}$; then there exists a binary quadratic form of discriminant $d$ if and only if $d \equiv 0$ or $1 \bmod 4$.

Proof: Suppose $f(X,Y) = aX^2 + bXY + cY^2$ has discriminant $d$; then
\[ d = b^2 - 4ac \equiv b^2 \bmod 4, \]
and since the squares modulo 4 are 0 and 1 the result is clear. Conversely, if $d \equiv 0 \bmod 4$ we may take $f(X,Y) = X^2 - \frac{d}{4} Y^2$, which has discriminant $d$, and if $d \equiv 1 \bmod 4$ we instead take $f(X,Y) = X^2 + XY - \frac{d-1}{4} Y^2$ with the same result. □

Theorem 7.2.2 (Theorem 3.13, Niven) Let $d, n \in \mathbb{Z}$ with $n \neq 0$. There exists a binary quadratic form of discriminant $d$ that properly represents $n$ if and only if the congruence $x^2 \equiv d \bmod 4n$ has a solution.

Remark: This theorem guarantees the existence of some binary quadratic form of discriminant $d$, but representability by a specific form is a much harder question.

Example: Take $n = -3$. There is a binary quadratic form of discriminant $d$ representing $-3$ if and only if $x^2 \equiv d \bmod 12$ has a solution (the modulus $4n = -12$ may be replaced by its absolute value). The squares modulo 12 are 0, 1, 4, and 9, and so we see that the only binary quadratic forms representing $-3$ have discriminant $d$ lying in one of these residue classes modulo 12.

Proof: Suppose $u^2 \equiv d \bmod 4n$, and write $u^2 - d = 4nv$ for some integer $v$. Then with

\[ f(X,Y) = nX^2 + uXY + vY^2, \]
we see that the discriminant of $f$ is $u^2 - 4nv = d$ and that $f(1, 0) = n$. Conversely, suppose that $as^2 + bst + ct^2 = n$ with $(s, t) = 1$ and $b^2 - 4ac = d$. Choose $m_1, m_2 \in \mathbb{Z}$ such that $(m_1, m_2) = 1$, $m_1 m_2 = 4n$, and also $(m_1, t) = (m_2, s) = 1$. Note that we can always choose such $m_1, m_2$: for example,

\[ m_1 = \prod_{p \mid s} p^{\operatorname{ord}_p(4n)}, \qquad m_2 = \frac{4n}{m_1}. \]

Recall from last lecture the identity $4af(x, y) = (2ax + by)^2 - dy^2$. Since $4af(s, t) = 4an \equiv 0 \bmod m_1$, we obtain

\[ (2as + bt)^2 - dt^2 \equiv 0 \bmod m_1 \iff d \equiv (2ast^{-1} + b)^2 \bmod m_1, \]

since $(t, m_1) = 1$. A symmetric argument shows that $d \equiv (2cts^{-1} + b)^2 \bmod m_2$, and since $(m_1, m_2) = 1$ the Chinese remainder theorem implies that we have a solution to the congruence $x^2 \equiv d \bmod m_1 m_2$, that is, $x^2 \equiv d \bmod 4n$, and we are done. □

Corollary 1: Let $d \equiv 0$ or $1 \bmod 4$, and let $p$ be an odd prime. There exists a binary quadratic form of discriminant $d$ representing $p$ if and only if $\left(\frac{d}{p}\right) = 0$ or $1$.

Proof: By Theorem 7.2.2 it suffices to show that $x^2 \equiv d \bmod 4p$ has a solution if and only if $\left(\frac{d}{p}\right) = 0$ or $1$. Suppose $x^2 \equiv d \bmod 4p$, so that $x^2 \equiv d \bmod p$; it follows that $\left(\frac{d}{p}\right) = 0$ or $1$. Conversely, if $\left(\frac{d}{p}\right) = 0$ or $1$, then we may write $x^2 \equiv d \bmod p$, and since $d$ is a square modulo 4 by assumption we have $y^2 \equiv d \bmod 4$, and the Chinese remainder theorem completes the proof. □

Thus we are led to investigate the set of all binary quadratic forms of a given discriminant.
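The constructive half of theorem 7.2.2 is easy to try out: search for a square root $u$ of $d$ modulo $4|n|$ and, if one exists, read off the form $nX^2 + uXY + vY^2$. This is a minimal sketch; the helper name `form_representing` is hypothetical and the linear search over residues is chosen only for clarity.

```python
def form_representing(d: int, n: int):
    """Return (a, b, c) with b^2 - 4ac = d and f(1, 0) = n,
    or None if x^2 = d (mod 4|n|) has no solution."""
    m = 4 * abs(n)
    for u in range(m):
        if (u * u - d) % m == 0:
            v = (u * u - d) // (4 * n)   # exact, since 4n divides u^2 - d
            return (n, u, v)
    return None

print(form_representing(-7, 2))   # a form of discriminant -7 taking the value 2
print(form_representing(-4, 3))   # no form of discriminant -4 properly represents 3
```

The pair $(1, 0)$ is coprime, so the value $n = f(1, 0)$ is properly represented, as the theorem requires.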

Example: Determine all integers represented by $f(X,Y) = 53X^2 + 152XY + 109Y^2$. If we set $x = -3u + 10v$, $y = 2u - 7v$, then a calculation shows that $f(x, y) = u^2 + v^2$, and thus if $n$ is represented by $f$, it is also represented by $X^2 + Y^2$. Conversely if $n$ is represented by this latter form, then $n = u^2 + v^2 = f(-3u + 10v, 2u - 7v)$, and we see that both forms represent exactly the same set of integers.

We can associate to any binary quadratic form $f(X,Y) = aX^2 + bXY + cY^2$ the $2 \times 2$ symmetric matrix
\[ F = \begin{pmatrix} a & b/2 \\ b/2 & c \end{pmatrix}, \]
which has the property that
\[ \vec{x}^T F \vec{x} = f(x, y), \qquad \vec{x} = \begin{pmatrix} x \\ y \end{pmatrix}, \]
where $A^T$ denotes the matrix transpose. In our above example, $F = \begin{pmatrix} 53 & 76 \\ 76 & 109 \end{pmatrix}$ is associated to $f(X,Y) = 53X^2 + 152XY + 109Y^2$, and $G = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$ is associated to $g(X,Y) = X^2 + Y^2$. With this in mind, we write our change of variables from our example above as
\[ \vec{x} = \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} -3 & 10 \\ 2 & -7 \end{pmatrix} \begin{pmatrix} u \\ v \end{pmatrix} =: M\vec{u}, \]
hence $f(x, y) = \vec{x}^T F \vec{x} = (M\vec{u})^T F (M\vec{u}) = \vec{u}^T (M^T F M) \vec{u}$, and indeed, $M^T F M = G$.
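The matrix identity $M^T F M = G$ for this example can be confirmed mechanically. The sketch below uses plain nested lists; the helpers `matmul` and `transpose` are illustrative and not part of the notes.

```python
def matmul(A, B):
    """Product of two matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(row) for row in zip(*A)]

F = [[53, 76], [76, 109]]      # matrix of f = 53X^2 + 152XY + 109Y^2
M = [[-3, 10], [2, -7]]        # change of variables x = -3u + 10v, y = 2u - 7v

G = matmul(matmul(transpose(M), F), M)
print(G)   # [[1, 0], [0, 1]]

def f(x, y):
    return 53 * x * x + 152 * x * y + 109 * y * y

# Spot check of f(-3u + 10v, 2u - 7v) = u^2 + v^2 on a small grid.
ok = all(f(-3 * u + 10 * v, 2 * u - 7 * v) == u * u + v * v
         for u in range(-5, 6) for v in range(-5, 6))
print(ok)
```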

8 Week Eight

8.1 Lecture Twenty

Recall from last lecture the binary quadratic forms

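\[ f(X,Y) = 53X^2 + 152XY + 109Y^2, \qquad g(X,Y) = X^2 + Y^2, \]
with their associated matrices
\[ F = \begin{pmatrix} 53 & 76 \\ 76 & 109 \end{pmatrix} \quad \text{and} \quad G = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \]
respectively. We saw that $M^T F M = G$, where $M = \begin{pmatrix} -3 & 10 \\ 2 & -7 \end{pmatrix}$. Recall that if $A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$, then

\[ A^{-1} = \frac{1}{\det A} \begin{pmatrix} d & -b \\ -c & a \end{pmatrix} = \frac{1}{ad - bc} \begin{pmatrix} d & -b \\ -c & a \end{pmatrix}. \]

In our case, $\det M = 1$ and so $M^{-1} = \begin{pmatrix} -7 & -10 \\ -2 & -3 \end{pmatrix}$; moreover, we observe that if $M \begin{pmatrix} u \\ v \end{pmatrix} = \begin{pmatrix} x \\ y \end{pmatrix}$, then

\[ \begin{pmatrix} u \\ v \end{pmatrix} = M^{-1} \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} -7x - 10y \\ -2x - 3y \end{pmatrix}. \]

Since $f(-u, -v) = f(u, v)$ for any binary quadratic form, the negative signs in this matrix are of no concern. Thus we obtain $F = (M^{-1})^T G M^{-1}$, which combined with our previous relation $G = M^T F M$ implies that $f$ and $g$ represent exactly the same integers.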

Definition: The modular group $\Gamma$ is the set of all $2 \times 2$ matrices over $\mathbb{Z}$ with determinant 1, with the group operation being multiplication.

Also used to denote $\Gamma$ are $SL_2(\mathbb{Z})$ and $SL(2, \mathbb{Z})$. Since $\Gamma$ is a group we have that $M \in \Gamma \iff M^{-1} \in \Gamma$.

Definition: Two binary quadratic forms $f$ and $g$ are called equivalent, denoted $f \sim g$, if there exists some $M \in \Gamma$ such that $M^T F M = G$, where $F$ and $G$ are the associated matrices of $f$ and $g$, respectively.

It is easy to see that if $f \sim g$ with $M^T F M = G$, $M = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$, then $f(ax + by, cx + dy) = g(x, y)$. In our previous example, we showed that $53X^2 + 152XY + 109Y^2 \sim X^2 + Y^2$.

Remark: If $M^T F M = G$, then $(-M)^T F (-M) = G$. Thus we may take $M$ or $-M$ as we see fit, or equivalently choose a representative from $PSL_2(\mathbb{Z}) = \Gamma / \{\pm I\}$.

Theorem 8.1.1 (Theorem 3.16, Niven) $\sim$ is an equivalence relation.

Proof: Reflexivity is clear, as $F = I^T F I$, as is symmetry by our remarks above, so it suffices to prove transitivity. Suppose $f \sim g$, $g \sim h$, and let $M, N \in \Gamma$ be such that $M^T F M = G$, $N^T G N = H$. Then $MN \in \Gamma$ and $(MN)^T F (MN) = H$, so $f \sim h$, and we are done. □

Note that if $f(X,Y) = aX^2 + bXY + cY^2$ has associated matrix $F$, then $\det F = ac - \frac{b^2}{4} = -\frac{d}{4}$, where $d$ is the discriminant of $f$. In particular, this means that if $f \sim g$ then their discriminants are equal, since $\det G = (\det M)^2 \det F = \det F$. Indeed, in our perennial example $f(X,Y) = 53X^2 + 152XY + 109Y^2$, $g(X,Y) = X^2 + Y^2$, it is not difficult to see that the discriminant of $f$ is $152^2 - 4 \cdot 53 \cdot 109 = -4$, as is the discriminant of $g$.

Theorem 8.1.2 (Theorem 3.17, Niven) Let f ∼ g be binary quadratic forms, and let n ∈ Z. Then:

1. The representations of $n$ by $f$ are in one-to-one correspondence with the representations of $n$ by $g$.
2. The proper representations of $n$ by $f$ are in one-to-one correspondence with the proper representations of $n$ by $g$.

Proof: 1. If $f(x, y) = n$, then $\vec{x}^T F \vec{x} = (n)$, and so with $M^T F M = G$ we have $(M^{-1}\vec{x})^T G (M^{-1}\vec{x}) = \vec{x}^T F \vec{x} = (n)$. This process is invertible, whence we deduce the result.
2. In the calculation in the proof of the first statement, if $m \mid x$ and $m \mid y$ then $m$ divides both entries of $M^{-1}\vec{x}$, and conversely. □

We seek to understand the structure of the equivalence classes into which $\sim$ partitions the binary quadratic forms of discriminant $d$. We begin by showing that every equivalence class contains a "nice" form; that is, roughly speaking, one in which $b$ is the smallest coefficient in absolute value and $c$ the largest.

Definition: Let $f(X,Y) = aX^2 + bXY + cY^2$ be a binary quadratic form. Then $f$ is said to be reduced if one of the following conditions holds:
1. $-|a| < b \leq |a| < |c|$.
2. $0 \leq b \leq |a| = |c|$.

8.2 Lecture Twenty-One

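Recall from last time the notion of a reduced binary quadratic form; there is an algorithm for converting any given binary quadratic form $f$ into an equivalent, reduced binary quadratic form.

Example: We will reduce $f = f_0(X,Y) = 53X^2 + 152XY + 109Y^2$, which corresponds to the matrix $F_0 = \begin{pmatrix} 53 & 76 \\ 76 & 109 \end{pmatrix}$. For $n \in \mathbb{Z}$, let
\[ T_n = \begin{pmatrix} 1 & n \\ 0 & 1 \end{pmatrix}, \qquad S = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}. \]

We define $F_1$ via

\[ F_1 = T_{-1}^T F_0 T_{-1} = \begin{pmatrix} 1 & -1 \\ 0 & 1 \end{pmatrix}^T \begin{pmatrix} 53 & 76 \\ 76 & 109 \end{pmatrix} \begin{pmatrix} 1 & -1 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} 53 & 23 \\ 23 & 10 \end{pmatrix}, \]

which corresponds to the form $f_1(X,Y) = 53X^2 + 46XY + 10Y^2$. Next, we set

\[ F_2 = S^T F_1 S = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}^T \begin{pmatrix} 53 & 23 \\ 23 & 10 \end{pmatrix} \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix} = \begin{pmatrix} 10 & -23 \\ -23 & 53 \end{pmatrix}, \]

so that $f_2(X,Y) = 10X^2 - 46XY + 53Y^2$. Continuing in this way, we set

\[ F_3 = T_2^T F_2 T_2 = \begin{pmatrix} 1 & 2 \\ 0 & 1 \end{pmatrix}^T \begin{pmatrix} 10 & -23 \\ -23 & 53 \end{pmatrix} \begin{pmatrix} 1 & 2 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} 10 & -3 \\ -3 & 1 \end{pmatrix}, \]

\[ F_4 = S^T F_3 S = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}^T \begin{pmatrix} 10 & -3 \\ -3 & 1 \end{pmatrix} \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix} = \begin{pmatrix} 1 & 3 \\ 3 & 10 \end{pmatrix}, \]

\[ F_5 = T_{-3}^T F_4 T_{-3} = \begin{pmatrix} 1 & -3 \\ 0 & 1 \end{pmatrix}^T \begin{pmatrix} 1 & 3 \\ 3 & 10 \end{pmatrix} \begin{pmatrix} 1 & -3 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}. \]

We see that $f_0 \sim f_5$ and that $f_5(X,Y) = X^2 + Y^2$ is reduced. Thus, if $M = T_{-1} S T_2 S T_{-3} = \begin{pmatrix} -3 & 10 \\ 2 & -7 \end{pmatrix}$, then we have that $M^T F_0 M = F_5$.

Theorem 8.2.1 (Theorem 3.18, Niven) Let $d \equiv 0$ or $1 \bmod 4$, with $d$ not a perfect square. Then every equivalence class of binary quadratic forms of discriminant $d$ contains a reduced form.

Proof: Let $f_0(X,Y) = a_0X^2 + b_0XY + c_0Y^2$ have discriminant $d$, and for $s \geq 0$ let $F_s = \begin{pmatrix} a_s & b_s/2 \\ b_s/2 & c_s \end{pmatrix}$, with $T_n$ and $S$ as above. Define an algorithm via:

(A) If $|c_s| < |a_s|$, set $F_{s+1} = S^T F_s S$ so that $a_{s+1} = c_s$, $c_{s+1} = a_s$, $b_{s+1} = -b_s$.

(B) If $|a_s| \leq |c_s|$ but $b_s \notin (-|a_s|, |a_s|]$, then choose $n \in \mathbb{Z}$ so that $2a_sn + b_s \in (-|a_s|, |a_s|]$. Indeed, this choice is unique by the division algorithm, writing

\[ |a_s| - b_s = (2a_s)q + r \quad \text{with } 0 \leq r < 2|a_s|; \quad \text{set } n = q. \]

Then set $F_{s+1} = T_n^T F_s T_n$, so that

\[ a_{s+1} = a_s, \qquad b_{s+1} = 2a_sn + b_s, \qquad c_{s+1} = a_sn^2 + b_sn + c_s = f_s(n, 1). \]

(C) If $|a_s| = |c_s|$ but $b_s < 0$, then set $F_{s+1} = S^T F_s S$.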

We observe that if a binary quadratic form does not satisfy the premises of (A), (B), or (C), then it is reduced; thus it suffices to show that the algorithm terminates.

Since $d$ is assumed not to be a perfect square we know that $a_s \neq 0$ for any $s$. We see that (A) is never followed by (A), nor (B) by (B), nor (C) by (C), and moreover since the output of (C) is reduced by construction it remains only to show that we cannot have an infinite loop (A) followed by (B) followed by (A), and so on. But this is clear, since every time we apply step (A), $|a_s|$ decreases, and so the well-ordering axiom implies that the algorithm terminates. □
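Steps (A), (B), and (C) translate almost line by line into code acting on the coefficient triple $(a, b, c)$. The following is a sketch under the assumption that the discriminant is not a perfect square (so $a$ never becomes 0); the function name `reduce_form` is hypothetical, and the integer $n$ in step (B) is computed from the quotient of $|a| - b$ by $2|a|$, adjusted for the sign of $a$.

```python
def reduce_form(a: int, b: int, c: int):
    """Return a reduced form equivalent to aX^2 + bXY + cY^2.
    Assumes b^2 - 4ac is not a perfect square, so a is never 0."""
    while True:
        if abs(c) < abs(a):
            # Step (A): apply S, which swaps a and c and negates b.
            a, b, c = c, -b, a
        elif not (-abs(a) < b <= abs(a)):
            # Step (B): apply T_n with 2an + b in (-|a|, |a|].
            q = (abs(a) - b) // (2 * abs(a))
            n = q if a > 0 else -q
            a, b, c = a, 2 * a * n + b, a * n * n + b * n + c
        elif abs(a) == abs(c) and b < 0:
            # Step (C): apply S once more to make b nonnegative.
            a, b, c = c, -b, a
        else:
            return (a, b, c)

print(reduce_form(53, 152, 109))   # (1, 0, 1)
```

On the worked example the loop passes through $(53, 46, 10)$, $(10, -46, 53)$, $(10, -6, 1)$, $(1, 6, 10)$ and stops at $(1, 0, 1)$, matching $F_1, \dots, F_5$.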

Note that if $d$ is a perfect square, then applying the above algorithm may produce $a_s = 0$, meaning that none of the steps (A), (B), or (C) is triggered unless $a_s = b_s = c_s = 0$.

Theorem 8.2.2 (Theorem 3.19, Niven) Let $d \in \mathbb{Z}$ with $d$ not a perfect square, and let $f(X,Y) = aX^2 + bXY + cY^2$ be a reduced binary quadratic form of discriminant $d$. Then:

1. If $d > 0$ then $ac < 0$ and $0 < |a| < \sqrt{\frac{d}{2}}$.

2. If $d < 0$ then $ac > 0$ and $0 < |a| \leq \sqrt{\frac{|d|}{3}}$.

It is an immediate consequence of this theorem that there are only finitely many equivalence classes of binary quadratic forms of discriminant $d$, as there are only finitely many such reduced forms: indeed, we must have
\[ 0 \leq |b| \leq |a| \leq \sqrt{|d|}, \qquad c = \frac{b^2 - d}{4a}. \]

The proof will be given in the next lecture; today, we end with the following definition.

Definition: Let d ∈ Z with d not a perfect square. The number of equivalence classes of binary quadratic forms of discriminant d is called the class number of d and is denoted H(d).

8.3 Lecture Twenty-Two

Recall theorem 8.2.2 from last time. Today, we prove the second assertion of the theorem.

Proof: (of Theorem 8.2.2, part (2)) Since $d < 0$ we know that $ac > 0$, as $b^2 - 4ac < 0$, so in particular $|a| > 0$. Then $|d| = -d = 4ac - b^2 = 4|ac| - b^2$. Since $f$ is reduced, we have that $|b| \leq |a| \leq |c|$, and so

\[ |d| = 4|ac| - b^2 \geq 4a^2 - a^2 = 3a^2, \]

and we have that $|a| \leq \sqrt{\frac{|d|}{3}}$, as claimed. □

Recall also the definition of the class number $H(d)$ of $d$.

Example: We compute $H(-7)$. We proceed by listing all reduced binary quadratic forms of discriminant $-7$ and then checking whether any are equivalent. Theorem 8.2.2 shows that if $f(X,Y) = aX^2 + bXY + cY^2$ is reduced of discriminant $-7$, then $0 < |a| \leq \sqrt{7/3} < 2$, hence $a = \pm 1$. If $|a| < |c|$ then we have $-1 < b \leq 1$, and if $|a| = |c| = 1$ we have $0 \leq b \leq 1$; that is, in both cases $b \in \{0, 1\}$. Calculating the possibilities for $c = \frac{b^2 - d}{4a}$ yields the following table:

  a     b     c       valid?
  1     0     7/4     no
  1     1     2       yes
 -1     0    -7/4     no
 -1     1    -2       yes

(where the last column indicates whether or not $aX^2 + bXY + cY^2$ is a valid binary quadratic form, that is, whether $c$ is an integer). It follows from this that $H(-7) \leq 2$. Since the discriminant is negative, it follows that both of the binary quadratic forms $f(X,Y) = X^2 + XY + 2Y^2$, $g(X,Y) = -X^2 + XY - 2Y^2$ are (positive or negative) definite, and a calculation shows that $f(1, 1) = 4 > 0$, $g(1, 1) = -2 < 0$. Thus $f$ is positive definite, $g$ is negative definite, and so in particular $f \not\sim g$ and we have that $H(-7) = 2$.

Note that for any binary quadratic form of discriminant $d$, we have that $d = b^2 - 4ac \equiv b^2 \bmod 2$, so $b$ must have the same parity as $d$.

Example: Which primes are represented by the reduced form $f$ found in our example above? By theorem 7.2.2 we have that $n$ is properly represented by some binary quadratic form of discriminant $-7$ if and only if there exists a solution to the congruence $x^2 \equiv -7 \bmod 4|n|$. If $n > 0$, then solvability of $x^2 \equiv -7 \bmod 4n$ implies that $n$ is properly represented by $f$, since $f$ is the only positive definite reduced binary quadratic form of discriminant $-7$. Furthermore, if $n = p$ is prime, then every representation of $p$ is proper. For $p = 2$, take $(x, y) = (0, 1)$ so that $f(x, y) = 2$. For odd $p$, we see that $f$ represents $p$ if and only if $x^2 \equiv -7 \bmod p$ has a solution, by the Chinese remainder theorem. If $p = 7$ this is clear; otherwise,
• If $p \equiv 1 \bmod 4$ then $\left(\frac{-7}{p}\right) = \left(\frac{-1}{p}\right)\left(\frac{7}{p}\right) = \left(\frac{p}{7}\right)$.
• If $p \equiv 3 \bmod 4$ then $\left(\frac{-7}{p}\right) = \left(\frac{-1}{p}\right)\left(\frac{7}{p}\right) = \left(\frac{p}{7}\right)$.
The quadratic residues modulo 7 are 1, 2, and 4; thus $p$ is represented by $f$ if and only if $p \equiv 0, 1, 2$ or $4 \bmod 7$.
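The congruence description of the primes represented by $X^2 + XY + 2Y^2$ can be checked numerically. This sketch bounds the search with the identity $4n = (2x + y)^2 + 7y^2$; the function name `represented` and the bound 500 are illustrative choices.

```python
from math import isqrt

def represented(n: int) -> bool:
    """Is n = x^2 + x*y + 2*y^2 for some integers x, y?"""
    bound = isqrt(4 * n // 7)              # 7*y^2 <= 4n
    for y in range(-bound, bound + 1):
        t2 = 4 * n - 7 * y * y             # must equal (2x + y)^2
        t = isqrt(t2)
        if t * t == t2 and (t - y) % 2 == 0:
            return True
    return False

primes = [p for p in range(2, 500)
          if all(p % q for q in range(2, isqrt(p) + 1))]

# Compare with the residue classes 0, 1, 2, 4 modulo 7.
agrees = all(represented(p) == (p % 7 in (0, 1, 2, 4)) for p in primes)
print(agrees)
```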

Theorem 8.3.1 (Theorem 3.25, Niven) Let $f(X,Y) = aX^2 + bXY + cY^2$, $g(X,Y) = a'X^2 + b'XY + c'Y^2$ be reduced, positive definite binary quadratic forms. If $f \sim g$, then $f = g$.

Proof: Exercise.

Consequently, if $d < 0$ then $H(d)$ equals the number of reduced binary quadratic forms of discriminant $d$, which is twice the number of such positive definite forms.

[Aside: there is also the notion of the class number of a number field; when $d < 0$, the class number of $\mathbb{Q}(\sqrt{-|d|})$ equals $\frac{1}{2} H(d)$.]
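For $d < 0$, the remark that $H(d)$ counts reduced forms gives a direct way to compute it: enumerate $a$ with $0 < |a| \leq \sqrt{|d|/3}$, then $b \in (-|a|, |a|]$, and keep the triples for which $c$ is an integer and the form is reduced. A minimal sketch (the name `reduced_forms` is hypothetical):

```python
from math import isqrt

def reduced_forms(d: int):
    """All reduced (a, b, c) with b^2 - 4ac = d, for d < 0."""
    assert d < 0 and d % 4 in (0, 1)
    forms = []
    bound = isqrt(-d // 3)                    # |a| <= sqrt(|d|/3)
    for a in [s * k for k in range(1, bound + 1) for s in (1, -1)]:
        for b in range(-abs(a) + 1, abs(a) + 1):   # -|a| < b <= |a|
            if (b * b - d) % (4 * a) != 0:
                continue                      # c would not be an integer
            c = (b * b - d) // (4 * a)
            if abs(a) < abs(c) or (abs(a) == abs(c) and b >= 0):
                forms.append((a, b, c))
    return forms

print(reduced_forms(-7))         # [(1, 1, 2), (-1, 1, -2)]
print(len(reduced_forms(-20)))   # 4
```

The count follows the convention of these notes, in which positive definite and negative definite classes are both included.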

9 Week Nine

9.1 Lecture Twenty-Three

Recall: Theorem 8.3.1.

Can we “compose” two binary quadratic forms? We can generalize the multiplication formula

\[ (a^2 + b^2)(c^2 + d^2) = (ac - bd)^2 + (ad + bc)^2. \]

Note that if $z = a + ib$, $w = c + id$ are complex numbers, then the above formula states exactly that $|z|^2|w|^2 = |zw|^2$. Thus, the binary quadratic form $f(X,Y) = X^2 + Y^2$ has a “composition law” given by

\[ f(a, b)f(c, d) = f(ac - bd, ad + bc); \]
in particular, this implies that the set of numbers represented by $f$ is multiplicatively closed. Can we generalize this idea to arbitrary binary quadratic forms?

Example: Let $d = -7$. We saw last week that the single equivalence class of positive definite binary quadratic forms of discriminant $-7$ is represented by the reduced form $f(X,Y) = X^2 + XY + 2Y^2$. We factor over the complex numbers, using the quadratic formula:
\[ f(a, b) = \left( a + \frac{1 + i\sqrt{7}}{2}\, b \right) \left( a + \frac{1 - i\sqrt{7}}{2}\, b \right). \]

Thus we are led to compute
\[ \left( a + \frac{1 + i\sqrt{7}}{2}\, b \right) \left( c + \frac{1 + i\sqrt{7}}{2}\, d \right) = (ac - 2bd) + \frac{1 + i\sqrt{7}}{2}\,(ad + bc + bd), \]
which implies $f(a, b)f(c, d) = f(ac - 2bd, ad + bc + bd)$, and again we see that the set of represented values is multiplicatively closed.

Example: Suppose $d = -20$. In assignment 4, we verify that there are exactly two positive definite reduced binary quadratic forms of discriminant $-20$, namely

\[ f_+(X,Y) = X^2 + 5Y^2 \quad \text{and} \quad f_-(X,Y) = 2X^2 + 2XY + 3Y^2. \]

Observe that the set of values represented by $f_-$ is not multiplicatively closed, as indeed

\[ f_-(1, 0) = 2, \quad f_-(0, 1) = 3, \quad \text{but } f_-(x, y) \neq 6 \text{ for any } x, y \in \mathbb{Z}. \]

Indeed, we have the identity $4af_-(x, y) = (2ax + by)^2 - dy^2$, hence

\[ 8f_-(x, y) = (4x + 2y)^2 + 20y^2 \iff 2f_-(x, y) = (2x + y)^2 + 5y^2, \]

and thus $f_-(x, y) = 6$ implies that $(2x + y)^2 + 5y^2 = 12$, which is never satisfied, as can easily be verified by checking possible values of $x$ and $y$. In particular, this means that there is no multiplicative formula (or “composition law”) for $f_-$ as there were for our previous examples.

Does such a formula exist for $f_+$? The identity
\[ (a + i\sqrt{5}\,b)(c + i\sqrt{5}\,d) = (ac - 5bd) + i\sqrt{5}\,(ad + bc) \]
implies $f_+(a, b)f_+(c, d) = f_+(ac - 5bd, ad + bc)$.

We see that if we factor $f_-$ using the quadratic formula, we obtain
\[ f_-(a, b) = 2 \left( a + \frac{1 + i\sqrt{5}}{2}\, b \right) \left( a + \frac{1 - i\sqrt{5}}{2}\, b \right) = \left( \sqrt{2}\,a + \frac{1 + i\sqrt{5}}{\sqrt{2}}\, b \right) \left( \sqrt{2}\,a + \frac{1 - i\sqrt{5}}{\sqrt{2}}\, b \right). \]

Calculating as before, we obtain
\[ \left( \sqrt{2}\,a + \frac{1 + i\sqrt{5}}{\sqrt{2}}\, b \right) \left( \sqrt{2}\,c + \frac{1 + i\sqrt{5}}{\sqrt{2}}\, d \right) = (2ac + ad + bc - 2bd) + i\sqrt{5}\,(ad + bc + bd), \]
which implies $f_-(a, b)f_-(c, d) = f_+(2ac + ad + bc - 2bd, ad + bc + bd)$.

What happens if we consider the product $f_+(a, b)f_-(c, d)$? The relevant calculation is
\[ \left( a + i\sqrt{5}\,b \right) \left( \sqrt{2}\,c + \frac{1 + i\sqrt{5}}{\sqrt{2}}\, d \right) = \sqrt{2}\,(ac - bc - 3bd) + \frac{1 + i\sqrt{5}}{\sqrt{2}}\,(ad + 2bc + bd), \]
hence $f_+(a, b)f_-(c, d) = f_-(ac - bc - 3bd, ad + 2bc + bd)$. Thus we have obtained the following “multiplication table”:

         f_+    f_-
  f_+    f_+    f_-
  f_-    f_-    f_+

The entries are understood to mean, for example, that the product of two numbers represented by f+ may also be represented by f+. In fact, this relation holds on the level of equivalence classes; that is, if f ∼ f+, g ∼ f−, then f(a, b)g(c, d) = h(x, y) for some x, y linear combinations of a, b, c, d, and h ∼ f−. In general, the set of equivalence classes of positive definite binary quadratic forms of negative discriminant is a group under the operation of “multiplication” alluded to above. This is known as the class group. This ends our discussion of binary quadratic forms; next, we will discuss arithmetic functions; that is, complex-valued functions whose domain is N.
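The three composition identities behind this table can be tested on a grid of small integers. In the sketch below the arguments on the right-hand sides are the ones obtained by expanding the complex products directly; the names `f_plus`, `f_minus`, and `check` are illustrative.

```python
from itertools import product

def f_plus(x, y):
    return x * x + 5 * y * y

def f_minus(x, y):
    return 2 * x * x + 2 * x * y + 3 * y * y

def check(limit: int) -> bool:
    """Test the composition identities for all |a|, |b|, |c|, |d| <= limit."""
    rng = range(-limit, limit + 1)
    for a, b, c, d in product(rng, repeat=4):
        if f_plus(a, b) * f_plus(c, d) != f_plus(a*c - 5*b*d, a*d + b*c):
            return False
        if f_minus(a, b) * f_minus(c, d) != f_plus(
                2*a*c + a*d + b*c - 2*b*d, a*d + b*c + b*d):
            return False
        if f_plus(a, b) * f_minus(c, d) != f_minus(
                a*c - b*c - 3*b*d, a*d + 2*b*c + b*d):
            return False
    return True

print(check(4))

# 6 = f_minus(1, 0) * f_minus(0, 1) is not itself a value of f_minus.
six_is_value = any(f_minus(x, y) == 6
                   for x in range(-5, 6) for y in range(-5, 6))
print(six_is_value)
```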

9.2 Lecture Twenty-Four

§4.2 – Arithmetic functions Notation: Let τ(n) denote the number of positive divisors of n (also used is the notation d(n)).

Lemma 9.2.1 Let $n$ have prime factorization $n = p_1^{e_1} p_2^{e_2} \cdots p_k^{e_k}$. A positive integer $d$ divides $n$ if and only if $d = p_1^{s_1} p_2^{s_2} \cdots p_k^{s_k}$, with $0 \leq s_j \leq e_j$ for every $j$.

Proof: Clearly, with $n$ and $d$ as above we see that $n = d\,(p_1^{e_1 - s_1} p_2^{e_2 - s_2} \cdots p_k^{e_k - s_k})$. Conversely, suppose $d \mid n$. If $p$ is a prime different from every $p_j$ with $p \mid d$, then $p \mid n$, a contradiction since $p \nmid n$. Finally, if $p_j^{s_j} \mid d$ with $s_j > e_j$, then $p_j \mid \frac{d}{p_j^{e_j}}$ but $p_j \nmid \frac{n}{p_j^{e_j}}$; hence $\frac{d}{p_j^{e_j}} \nmid \frac{n}{p_j^{e_j}}$, that is, $d \nmid n$, a contradiction, and we are done. □

One consequence of this lemma is that if $n = p_1^{e_1} p_2^{e_2} \cdots p_k^{e_k}$, then

\[ \tau(n) = \#\{(s_1, s_2, \dots, s_k) : 0 \leq s_j \leq e_j\} = (1 + e_1)(1 + e_2) \cdots (1 + e_k), \]
or more succinctly written,
\[ \tau(n) = \prod_{p^{\alpha} \| n} (\alpha + 1). \]

Proposition 9.2.2 If $(m, n) = 1$, then $\tau(mn) = \tau(m)\tau(n)$.

This statement is false if $(m, n) > 1$; for example, $\tau(8) = 4 \neq 6 = \tau(2)\tau(4)$.

Proof: We give two sketches, left as exercises.
1. The assertion follows from the multiplicative formula found above.
2. When $(m, n) = 1$, divisors $d$ of $mn$ are in one-to-one correspondence with pairs of integers $(b, c)$ where $b \mid m$, $c \mid n$, and $bc = d$. □

Definition: An arithmetic function $f : \mathbb{N} \to \mathbb{C}$ which is not identically zero is called multiplicative if, whenever $(m, n) = 1$, we have $f(mn) = f(m)f(n)$.

Proposition 9.2.2 shows that $\tau(n)$ is multiplicative, and from previous work we know that $\varphi(n)$ is also multiplicative. Indeed, we used this property to prove the formula

\[ \varphi(n) = n \prod_{p \mid n} \left( 1 - \frac{1}{p} \right). \]

A similar example is given by the function

\[ \sigma_f(n) = \#\{x \bmod n : f(x) \equiv 0 \bmod n\}, \]
where $f(X) \in \mathbb{Z}[X]$. The Chinese remainder theorem tells us that $\sigma_f(n)$ is multiplicative, and indeed we observe that

\[ \varphi(n) = \sigma_{X^{\varphi(n)} - 1}(n). \]

Properties of multiplicative functions: Suppose $f$ is a multiplicative function.
• For every $n$, we have the formula
\[ f(n) = \prod_{p^{\alpha} \| n} f(p^{\alpha}). \]
In particular, $f$ is determined by its values on prime powers. Conversely, any set map
\[ f : \{p^k : p \text{ prime}, k \in \mathbb{N}\} \to \mathbb{C} \]
induces a multiplicative function.
• $f(1) = 1$. Indeed, since there must be some $n$ with $f(n) \neq 0$, we have $f(n) = f(1 \cdot n) = f(1)f(n)$.

Definition: If an arithmetic function $f$, not identically zero, satisfies $f(mn) = f(m)f(n)$ for every pair of numbers $m, n$, then $f$ is said to be totally multiplicative (or completely multiplicative). Clearly, any totally multiplicative function is also multiplicative.

Example: For any $\lambda \in \mathbb{R}$, the function $f_\lambda(n) = n^\lambda$ is totally multiplicative. In particular, when $\lambda = 0$ we have $f_\lambda(n) = 1$ for all $n$, and for $\lambda = 1$ we have $f_\lambda(n) = \mathrm{id}(n) = n$ for every $n$.

Example: The iota function $\iota(n)$, defined
\[ \iota(n) = \begin{cases} 1 & \text{if } n = 1, \\ 0 & \text{if } n \neq 1, \end{cases} \]
is totally multiplicative.

Example: Let $f(n) = (-1)^{n-1}$, so that $f(n) = 1$ if $n$ is odd and $-1$ if $n$ is even. Then $f$ is not totally multiplicative, as for example $f(8) = -1 \neq 1 = f(2)f(4)$; however, $f(n)$ is multiplicative, and indeed $f$ is induced by the map
\[ f(p^{\alpha}) = \begin{cases} 1 & \text{if } p \text{ is odd}, \\ -1 & \text{if } p = 2. \end{cases} \]

Example: The function $f(n) = (-1)^n$ is not multiplicative, since $f(1) = -1 \neq 1$, and so in particular is not totally multiplicative.

Theorem 9.2.3 (Theorem 4.4, Niven) Let $f(n)$ be a multiplicative function and let
\[ F(n) = \sum_{d \mid n} f(d). \]
Then $F(n)$ is also multiplicative.

Proof: As alluded to in the proof of proposition 9.2.2, when $(m, n) = 1$ the divisors $d$ of $mn$ are in one-to-one correspondence with ordered pairs $(b, c)$, with $bc = d$, $b \mid m$, $c \mid n$. Thus, if $(m, n) = 1$, we have
\[ F(mn) = \sum_{d \mid mn} f(d) = \sum_{b \mid m} \sum_{c \mid n} f(bc) = \sum_{b \mid m} \sum_{c \mid n} f(b)f(c) = \left( \sum_{b \mid m} f(b) \right) \left( \sum_{c \mid n} f(c) \right) = F(m)F(n), \]
and we are done. □

Example: Let $f(n) = n^0 = 1$. Then
\[ F(n) = \sum_{d \mid n} f(d) = \tau(n), \]
giving another proof of the fact that $\tau$ is multiplicative. Note that $f$ is totally multiplicative, while $F(n)$ is not.
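The product formula for $\tau$ and its multiplicativity on coprime arguments are simple to confirm by machine. A small sketch, with `factorize`, `tau`, and `tau_brute` as illustrative helper names:

```python
from math import gcd

def factorize(n: int) -> dict:
    """Prime factorization of n as a dict {p: e}."""
    factors = {}
    p = 2
    while p * p <= n:
        while n % p == 0:
            factors[p] = factors.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors

def tau(n: int) -> int:
    """tau(n) as the product of (e + 1) over prime powers p^e || n."""
    result = 1
    for e in factorize(n).values():
        result *= e + 1
    return result

def tau_brute(n: int) -> int:
    return sum(1 for d in range(1, n + 1) if n % d == 0)

formula_ok = all(tau(n) == tau_brute(n) for n in range(1, 300))
multiplicative_ok = all(tau(m * n) == tau(m) * tau(n)
                        for m in range(1, 40) for n in range(1, 40)
                        if gcd(m, n) == 1)
print(formula_ok, multiplicative_ok)
print(tau(8), tau(2) * tau(4))   # 4 6
```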

9.3 Lecture Twenty-Five

Recall: Theorem 9.2.3.

Motivating questions:
• Is the converse of theorem 9.2.3 true? That is, if $F(n) = \sum_{d \mid n} f(d)$ is multiplicative, must $f(n)$ also be multiplicative?
• Given $F(n)$, how can we get information about $f(n)$?

Remark: Given any arithmetic function $F$, there is exactly one function $f$ so that $F(n) = \sum_{d \mid n} f(d)$. Indeed, we set $f(1) = F(1)$ and recursively define the other values via
\[ f(n) = F(n) - \sum_{d \mid n,\ d < n} f(d). \]

Example: We find the function $f(n)$ satisfying
\[ \sum_{d \mid n} f(d) = \iota(n) = \begin{cases} 1 & \text{if } n = 1, \\ 0 & \text{if } n > 1. \end{cases} \]
Write $F = \iota$. We calculate the first couple of values:

\[ f(1) = 1, \qquad f(2) = F(2) - f(1) = 0 - 1 = -1. \]

Clearly, for any prime $p$ we have

\[ f(p) = F(p) - f(1) = -1, \qquad f(p^2) = F(p^2) - f(p) - f(1) = 0, \]
and indeed $f(p^k) = 0$ for $k > 1$. For composite numbers of the form $pq$ where $p, q$ are distinct primes, we have
\[ f(pq) = F(pq) - f(p) - f(q) - f(1) = 0 - (-1) - (-1) - 1 = 1 = f(p)f(q), \]
while for $n = p^2q$ we have

\[ f(p^2q) = F(p^2q) - f(p) - f(p^2) - f(q) - f(pq) - f(1) = 0 = f(p^2)f(q). \]

The above calculations suggest that $f$ is multiplicative, which motivates the following definition.

Definition: The Möbius function $\mu(n)$ is the multiplicative function satisfying, for every prime $p$,
\[ \mu(p^{\alpha}) = \begin{cases} -1 & \text{if } \alpha = 1, \\ 0 & \text{if } \alpha > 1. \end{cases} \]

Equivalently: if $n$ is not squarefree, then $\mu(n) = 0$. Otherwise, writing $n = p_1p_2 \cdots p_k$ with $p_j$ distinct primes, one has $\mu(n) = (-1)^k$.

Notation: Denote by $\omega(n)$ the number of distinct prime divisors of $n$, and by $\Omega(n)$ the number of prime factors of $n$ counted with multiplicity. For example, with $n = 720 = 2^4 \cdot 3^2 \cdot 5$, we have $\omega(n) = 3$, $\Omega(n) = 4 + 2 + 1 = 7$. With this notation, we may define
\[ \mu(n) = \begin{cases} (-1)^{\omega(n)} & \text{if } n \text{ is squarefree}, \\ 0 & \text{otherwise}. \end{cases} \]
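The recursion in the remark can be run directly with $F = \iota$ and compared against the closed-form definition of $\mu$. This is a sketch only; `summatory_inverse` and `mobius` are hypothetical names, and the divisor search is deliberately naive.

```python
def divisors(n: int):
    return [d for d in range(1, n + 1) if n % d == 0]

def summatory_inverse(F, N: int) -> dict:
    """Return {n: f(n)} for 1 <= n <= N, where F(n) = sum of f(d) over d | n."""
    f = {}
    for n in range(1, N + 1):
        # For n = 1 the sum is empty, so f(1) = F(1).
        f[n] = F(n) - sum(f[d] for d in divisors(n) if d < n)
    return f

def iota(n: int) -> int:
    return 1 if n == 1 else 0

def mobius(n: int) -> int:
    """mu(n): 0 if a square divides n, else (-1)^omega(n)."""
    k = 0
    p = 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0
            k += 1
        p += 1
    if n > 1:
        k += 1
    return -1 if k % 2 else 1

f = summatory_inverse(iota, 200)
print(all(f[n] == mobius(n) for n in range(1, 201)))
```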

Theorem 9.3.1 (Theorem 4.7, Niven) One has
\[ \sum_{d \mid n} \mu(d) = \iota(n). \]

This theorem is much more widely invoked than is the definition of $\mu(n)$.

Proof: We give two proofs.
1. Both sides of the equation are multiplicative by theorem 9.2.3, and we already know that both sides agree when $n$ is a prime power, from which we deduce the result.
2. By definition,
\[ \sum_{d \mid n} \mu(d) = \sum_{d \mid n,\ d \text{ squarefree}} (-1)^{\omega(d)}, \]
and so if $\omega(n) = k$ then there are exactly $\binom{k}{j}$ squarefree divisors $d$ of $n$ with $\omega(d) = j$. Thus

\[ \sum_{d \mid n} \mu(d) = \sum_{j=0}^{k} \binom{k}{j} (-1)^j = (1 - 1)^k = \begin{cases} 1 & \text{if } n = 1, \\ 0 & \text{if } n > 1, \end{cases} \]

and we are done. □

Theorem 9.3.2 (Theorem 4.8, Niven; the Möbius inversion formula) Let $f(n)$ be an arithmetic function and let $F(n) = \sum_{d \mid n} f(d)$. Then
\[ f(n) = \sum_{d \mid n} \mu(d) F\!\left( \frac{n}{d} \right). \]

For example, for any arithmetic function $f(n)$, we have $f(12) = F(12) - F(6) - F(4) + F(2)$.

Proof: The right-hand side of the equation is
\[ \sum_{d \mid n} \mu(d) F\!\left( \frac{n}{d} \right) = \sum_{d \mid n} \mu(d) \sum_{\delta \mid \frac{n}{d}} f(\delta) = \sum_{d\delta \mid n} \mu(d) f(\delta) = \sum_{\delta \mid n} f(\delta) \sum_{d \mid \frac{n}{\delta}} \mu(d) = \sum_{\delta \mid n} f(\delta)\, \iota\!\left( \frac{n}{\delta} \right) = f(n), \]
where we have used the result of theorem 9.3.1, and the result follows. □

10 Week Ten

10.1 Lecture Twenty-Six

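Recall: The Möbius inversion formula.

Example: We have proven the identity
\[ n = \mathrm{id}(n) = \sum_{d \mid n} \varphi(d), \]
and so Möbius inversion implies that

\[ \varphi(n) = \sum_{d \mid n} \mu(d)\, \mathrm{id}\!\left( \frac{n}{d} \right) = \sum_{d \mid n} \frac{\mu(d)\, n}{d}; \]
that is,
\[ \frac{\varphi(n)}{n} = \sum_{d \mid n} \frac{\mu(d)}{d}. \]

Note that $\frac{\mu(d)}{d}$ is multiplicative, thus by theorem 9.2.3 we know that $\frac{\varphi(n)}{n}$ is multiplicative. Indeed, checking on prime powers, we see for $\alpha \geq 1$ that

\[ \frac{\varphi(p^{\alpha})}{p^{\alpha}} = \frac{p^{\alpha - 1}(p - 1)}{p^{\alpha}} = \frac{p - 1}{p} = 1 - \frac{1}{p}, \]
and similarly

\[ \sum_{d \mid p^{\alpha}} \frac{\mu(d)}{d} = \frac{\mu(1)}{1} + \frac{\mu(p)}{p} + \frac{\mu(p^2)}{p^2} + \cdots + \frac{\mu(p^{\alpha})}{p^{\alpha}} = 1 + \frac{(-1)}{p} + 0 + \cdots + 0 = 1 - \frac{1}{p}. \]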

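Theorem 10.1.1 (Theorem 4.9, Niven) Let $F(n)$ be an arithmetic function and define
\[ f(n) = \sum_{d \mid n} \mu(d) F\!\left( \frac{n}{d} \right). \]

Then
\[ F(n) = \sum_{d \mid n} f(d). \]

Proof: We have
\[ \sum_{d \mid n} f(d) = \sum_{d \mid n} \left( \sum_{\delta \mid d} \mu(\delta) F\!\left( \frac{d}{\delta} \right) \right). \]
With $d$ fixed, as $\delta$ ranges over the divisors of $d$, so does $\frac{d}{\delta}$. Thus
\[ \sum_{d \mid n} f(d) = \sum_{d \mid n} \sum_{\delta \mid d} \mu\!\left( \frac{d}{\delta} \right) F(\delta) = \sum_{\delta \mid n} \sum_{\substack{d \mid n \\ \delta \mid d}} \mu\!\left( \frac{d}{\delta} \right) F(\delta). \]

Writing $d = \delta \cdot \frac{d}{\delta}$, we have
\[ \sum_{d \mid n} f(d) = \sum_{\delta \mid n} F(\delta) \sum_{\frac{d}{\delta} \mid \frac{n}{\delta}} \mu\!\left( \frac{d}{\delta} \right) = \sum_{\delta \mid n} F(\delta)\, \iota\!\left( \frac{n}{\delta} \right) = F(n), \]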

and we are done. □

Definition: Let $f(n)$, $g(n)$ be two arithmetic functions. Their Dirichlet convolution, denoted $f * g$, is defined
\[ (f * g)(n) = \sum_{d \mid n} f(d)\, g\!\left( \frac{n}{d} \right). \]

Note that Dirichlet convolution is commutative, as
\[ (g * f)(n) = \sum_{d \mid n} g(d)\, f\!\left( \frac{n}{d} \right) = \sum_{d \mid n} g\!\left( \frac{n}{d} \right) f(d) = (f * g)(n). \]

Example: If $g(n) = 1$ for every $n$, then
\[ (f * g)(n) = \sum_{d \mid n} f(d). \]
(The function $g$ is sometimes written $1$.) In particular, this means that $\mathrm{id} = \varphi * 1$, $\iota = \mu * 1$, and $\tau = 1 * 1$. With this notation, we may restate the Möbius inversion formula as: $F = f * 1$ if and only if $f = F * \mu$.

Theorem 10.1.2 If $f$ and $g$ are multiplicative functions, then $f * g$ is multiplicative.

Note that this theorem is a generalization of theorem 9.2.3.

Proof: If $(m, n) = 1$, then
\[ (f * g)(mn) = \sum_{d \mid mn} f(d)\, g\!\left( \frac{mn}{d} \right). \]

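For each divisor $d$ of $mn$, we may uniquely factor $d = d_1d_2$ with $d_1 \mid m$ and $d_2 \mid n$. Thus

\[ (f * g)(mn) = \sum_{d_1 \mid m} \sum_{d_2 \mid n} f(d_1d_2)\, g\!\left( \frac{mn}{d_1d_2} \right) = \sum_{d_1 \mid m} \sum_{d_2 \mid n} f(d_1)\, g\!\left( \frac{m}{d_1} \right) f(d_2)\, g\!\left( \frac{n}{d_2} \right) \]
\[ = \left( \sum_{d_1 \mid m} f(d_1)\, g\!\left( \frac{m}{d_1} \right) \right) \left( \sum_{d_2 \mid n} f(d_2)\, g\!\left( \frac{n}{d_2} \right) \right) = (f * g)(m)\,(f * g)(n), \]
as claimed. □

[Structural remarks: Let $A = \{f : \mathbb{N} \to \mathbb{C}\}$ be the set of arithmetic functions and let $A^{\times} = \{f \in A : f(1) \neq 0\}$; then $(A^{\times}, *)$ forms an abelian group. In this group, $\iota$ is the identity and $1^{-1} = \mu$, which yields yet another statement of the Möbius inversion formula:

\[ F = f * 1 \iff \mu * F = \mu * (f * 1) = f * (\mu * 1) = f * \iota = f. \]

Moreover, by theorem 10.1.2, the set of multiplicative functions forms a subgroup.]

Example: Let
\[ s(n) = \begin{cases} 1 & \text{if } n \text{ is a perfect square}, \\ 0 & \text{otherwise}; \end{cases} \]
we will identify $s * (\mu^2)$.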

Note that $s$ is multiplicative, and is characterized by
\[ s(p^{\alpha}) = \begin{cases} 1 & \text{if } 2 \mid \alpha, \\ 0 & \text{if } 2 \nmid \alpha. \end{cases} \]

Moreover, $\mu^2$ is multiplicative, as the product of two multiplicative functions; hence $f = s * (\mu^2)$ is also multiplicative. We compute:

\[ f(p^{\alpha}) = \sum_{d \mid p^{\alpha}} s\!\left( \frac{p^{\alpha}}{d} \right) \mu^2(d) = s(p^{\alpha})\mu^2(1) + s(p^{\alpha - 1})\mu^2(p) + \cdots + s(1)\mu^2(p^{\alpha}) = s(p^{\alpha}) + s(p^{\alpha - 1}) = 1. \]

So $f(p^{\alpha}) = 1$ for every $\alpha \geq 1$, and it follows that $s * (\mu^2) = 1$. Note that $\mu^2$ is the characteristic function of squarefree numbers, and indeed we see

\[ (s * \mu^2)(n) = \sum_{ab = n} s(a)\mu^2(b) = \#\{(a, b) \in \mathbb{N}^2 : ab = n,\ a = s^2 \text{ for some } s,\ b \text{ squarefree}\} = 1. \]

Thus there is a unique way to factor any $n \in \mathbb{N}$ as $n = n's^2$ where $n'$ is squarefree. For example, if $n = 2 \cdot 3^2 \cdot 5^3 \cdot 7^4$, we have $n = (2 \cdot 5)(3 \cdot 5 \cdot 7^2)^2$.
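Dirichlet convolution is short to implement, which makes the identities of this lecture easy to spot-check. The sketch below represents arithmetic functions as Python callables; `convolve` and the other helper names are illustrative, and the test function $n \mapsto n^2 + 1$ is an arbitrary non-multiplicative choice.

```python
from math import gcd, isqrt

def divisors(n):
    return [d for d in range(1, n + 1) if n % d == 0]

def convolve(f, g):
    """Dirichlet convolution (f * g)(n) = sum over d | n of f(d) g(n/d)."""
    return lambda n: sum(f(d) * g(n // d) for d in divisors(n))

def one(n):  return 1
def iota(n): return 1 if n == 1 else 0
def phi(n):  return sum(1 for k in range(1, n + 1) if gcd(k, n) == 1)
def s(n):    return 1 if isqrt(n) ** 2 == n else 0

def mu(n):
    k, p = 0, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0          # a square divides the original n
            k += 1
        p += 1
    if n > 1:
        k += 1
    return -1 if k % 2 else 1

def mu2(n):  return mu(n) ** 2
def h(n):    return n * n + 1     # arbitrary test function

N = range(1, 150)
checks = {
    "id = phi * 1":   all(convolve(phi, one)(n) == n for n in N),
    "iota = mu * 1":  all(convolve(mu, one)(n) == iota(n) for n in N),
    "s * mu^2 = 1":   all(convolve(s, mu2)(n) == 1 for n in N),
    "(h * 1) * mu = h": all(convolve(convolve(h, one), mu)(n) == h(n) for n in N),
}
print(checks)
```

The last entry is Möbius inversion applied to a function that is not multiplicative, illustrating that the formula needs no such hypothesis.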

10.2 Lecture Twenty-Seven

Properties of Möbius inversion:
• We do not assume multiplicativity of the functions; that is, the inversion formula holds for any arithmetic functions.
• If $F(n) = \sum_{d \mid n} f(d)$ and $F(n)$ is multiplicative, then so is $f(n)$, as $f = F * \mu$.

Recall: Dirichlet convolution. When $n = p^{\alpha}$ is a prime power, then
\[ (f * g)(p^{\alpha}) = \sum_{d \mid p^{\alpha}} f(d)\, g\!\left( \frac{p^{\alpha}}{d} \right) = f(1)g(p^{\alpha}) + f(p)g(p^{\alpha - 1}) + \cdots + f(p^{\alpha})g(1). \]

Let us assign names to these values, so that $f(1) = a_0$, $f(p) = a_1$, $f(p^2) = a_2, \dots$, and similarly $g(1) = b_0$, $g(p) = b_1$, $g(p^2) = b_2, \dots$ We obtain the following table:

  α    f(p^α)    g(p^α)    (f ∗ g)(p^α)
  0    a_0       b_0       a_0 b_0
  1    a_1       b_1       a_0 b_1 + a_1 b_0
  2    a_2       b_2       a_0 b_2 + a_1 b_1 + a_2 b_0
  3    a_3       b_3       a_0 b_3 + a_1 b_2 + a_2 b_1 + a_3 b_0

We observe the similarity with the coefficients of the product of power series:

\[ \left( \sum_{\alpha = 0}^{\infty} f(p^{\alpha}) X^{\alpha} \right) \left( \sum_{\alpha = 0}^{\infty} g(p^{\alpha}) X^{\alpha} \right) = \sum_{\alpha = 0}^{\infty} (f * g)(p^{\alpha}) X^{\alpha}. \]

Example: Find an arithmetic function $f$ such that

\[ \frac{\varphi(n)}{n} = \sum_{d \mid n} f(d), \]
forgetting that we found it in the previous lecture.

Let $F(n) = \frac{\varphi(n)}{n}$, so that $F = f * 1$. By Möbius inversion we know that $f = F * \mu$ and that $f$ is multiplicative, since $F$ is. Thus we have a table as before:

  α    F(p^α)     μ(p^α)    f(p^α)
  0    1          1         1
  1    1 − 1/p    −1        −1/p
  2    1 − 1/p    0         0
  3    1 − 1/p    0         0

We see that $f$ is the multiplicative function generated by
\[ f(p^{\alpha}) = \begin{cases} -\frac{1}{p} & \text{if } \alpha = 1, \\ 0 & \text{if } \alpha > 1. \end{cases} \]

That is, $f(n) = \frac{\mu(n)}{n}$, as before.

Example: Define a multiplicative function $r$ via
\[ r(p^{\alpha}) = \begin{cases} 2 & \text{if } p \equiv 1 \bmod 4, \\ 0 & \text{if } p \equiv 3 \bmod 4, \\ 1 & \text{if } p = 2 \text{ and } \alpha = 1, \\ 0 & \text{if } p = 2 \text{ and } \alpha > 1. \end{cases} \]

Now, define $R = r * s$, where $s$ is the indicator function of the perfect squares from lecture twenty-six; note that $R$ is multiplicative. Determine the values of $R(p^{\alpha})$.

[Aside: Theorem 3.2.2 of Niven tells us that the number of proper representations of $n$ by the binary quadratic form $X^2 + Y^2$ equals $4r(n)$. In the statement of theorem 6.3.3 originally given, there was an error, in that we forgot the necessary condition that $4 \nmid n$.

Note also that any representation $x^2 + y^2 = n$ corresponds to a proper representation $\left( \frac{x}{d} \right)^2 + \left( \frac{y}{d} \right)^2 = \frac{n}{d^2}$, where $d = (x, y)$. Thus if $S_n$ denotes the set of representations of $n$ by $X^2 + Y^2$, and $S_n^p \subset S_n$ denotes the subset of proper representations, then

\[ \#S_n = \sum_{g^2 \mid n} \#S^p_{n/g^2} = \sum_{g^2 \mid n} 4r\!\left( \frac{n}{g^2} \right) = 4 \sum_{d \mid n} r\!\left( \frac{n}{d} \right) s(d) = 4(r * s)(n) = 4R(n). \]

Note in particular that Niven's functions $R$ and $r$ correspond to our $4R$ and $4r$, respectively.]

First, we assume that $p \equiv 1 \bmod 4$. We get the table

α r(pα) s(pα) R(pα) 0 1 1 1 1 2 0 2 2 2 1 3 3 2 0 4 4 2 1 5 5 2 0 6

In fact, we can prove that R(pα) = α + 1 for any p ≡ 1 mod 4: if α is even then

α X X X α R(pα) = r(pj)s(pα−j) = r(1)s(pα) + r(pj) = 1 + 2 = 1 + 2 = α + 1. 2 j=0 1≤j≤α, 1≤j≤α, α even α even A similar proof works for α odd, and is left as an exercise. Now, suppose p ≡ 3 mod 4; we obtain

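\[ \begin{array}{c|ccc}
\alpha & r(p^\alpha) & s(p^\alpha) & R(p^\alpha) \\ \hline
0 & 1 & 1 & 1 \\
1 & 0 & 0 & 0 \\
2 & 0 & 1 & 1 \\
3 & 0 & 0 & 0 \\
4 & 0 & 1 & 1 \\
5 & 0 & 0 & 0
\end{array} \]

On these prime powers, r agrees with the identity for Dirichlet convolution ($r(1) = 1$ and $r(p^\alpha) = 0$ for $\alpha \ge 1$), so the restriction of $r * s$ to powers of primes congruent to 3 modulo 4 is simply s. Finally, suppose $p = 2$; the table this time is

\[ \begin{array}{c|ccc}
\alpha & r(p^\alpha) & s(p^\alpha) & R(p^\alpha) \\ \hline
0 & 1 & 1 & 1 \\
1 & 1 & 0 & 1 \\
2 & 0 & 1 & 1 \\
3 & 0 & 0 & 1 \\
4 & 0 & 1 & 1 \\
5 & 0 & 0 & 1
\end{array} \]

On these prime powers, r acts like $\mu^2$, so R acts like $\mu^2 * s = 1$. Thus we conclude that R is the multiplicative function generated by

\[ R(p^\alpha) = \begin{cases} \alpha + 1 & \text{if } p \equiv 1 \bmod 4, \\ 1 & \text{if } p \equiv 3 \bmod 4 \text{ and } \alpha \text{ is even}, \\ 0 & \text{if } p \equiv 3 \bmod 4 \text{ and } \alpha \text{ is odd}, \\ 1 & \text{if } p = 2. \end{cases} \]

One consequence of this fact is that either $R(n) = 0$, or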

R(n) = #{d : d|n and p|d ⇒ p ≡ 1 mod 4}.

10.3 Lecture Twenty-Eight

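Example: Let R(n) be the multiplicative function from the last lecture, generated by

\[ R(p^\alpha) = \begin{cases} \alpha + 1 & \text{if } p \equiv 1 \bmod 4, \\ 1 & \text{if } p \equiv 3 \bmod 4 \text{ and } \alpha \text{ is even}, \\ 0 & \text{if } p \equiv 3 \bmod 4 \text{ and } \alpha \text{ is odd}, \\ 1 & \text{if } p = 2. \end{cases} \]

Find a function g such that $R(n) = \sum_{d \mid n} g(d)$.

nb. We defined

\[ R(n) = \sum_{g^2 \mid n} r\Bigl(\frac{n}{g^2}\Bigr) = \sum_{d \mid n} r\Bigl(\frac{n}{d}\Bigr)s(d). \]

Note that, since $R = g * 1$, the Möbius inversion formula implies that $g = R * \mu$, and since R and µ are both multiplicative, we know that g is as well. We observe that, for $\alpha \ge 1$,

\[ g(p^\alpha) = \sum_{d \mid p^\alpha} R\Bigl(\frac{p^\alpha}{d}\Bigr)\mu(d) = R(p^\alpha)\mu(1) + R(p^{\alpha-1})\mu(p) + \cdots + R(1)\mu(p^\alpha) = R(p^\alpha) - R(p^{\alpha-1}). \]

Thus:
• If $p \equiv 1 \bmod 4$ then $g(p^\alpha) = (\alpha + 1) - \alpha = 1$.
• If $p \equiv 3 \bmod 4$ then $g(p^\alpha) = 1 - 0 = 1$ if α is even, and $g(p^\alpha) = 0 - 1 = -1$ if α is odd.
• If $p = 2$ then $g(p^\alpha) = 1 - 1 = 0$.

Remarks:
• Since $g(p^\alpha) = g(p)^\alpha$ for every prime p and positive integer α, it follows that g is totally multiplicative.
• On odd primes, g(p) equals the Legendre symbol $\left(\frac{-1}{p}\right)$, and hence on odd n, g(n) equals the Jacobi symbol $\left(\frac{-1}{n}\right)$. Thus, for odd n, $g(n) = (-1)^{(n-1)/2}$, while $g(n) = 0$ for even n. Consequently,

\[ R(n) = \sum_{d \mid n} g(d) = \#\{d \mid n : d \equiv 1 \bmod 4\} - \#\{d \mid n : d \equiv 3 \bmod 4\}. \]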
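The divisor-count formula for R(n), combined with the earlier aside that the number of representations of n by $X^2 + Y^2$ is 4R(n), can be tested numerically. The following is a small brute-force sketch under our own naming (`count_representations` and `R` are illustrative helpers, not from the text); it counts integer pairs directly and compares with the difference of divisor counts.

```python
def count_representations(n):
    """Number of integer pairs (x, y), signs and order counted, with x^2 + y^2 = n."""
    bound = int(n ** 0.5) + 1
    count = 0
    for x in range(-bound, bound + 1):
        for y in range(-bound, bound + 1):
            if x * x + y * y == n:
                count += 1
    return count


def R(n):
    """#{d | n : d = 1 mod 4} - #{d | n : d = 3 mod 4}."""
    ones = sum(1 for d in range(1, n + 1) if n % d == 0 and d % 4 == 1)
    threes = sum(1 for d in range(1, n + 1) if n % d == 0 and d % 4 == 3)
    return ones - threes


if __name__ == "__main__":
    for n in (1, 2, 3, 5, 9, 25, 45, 65):
        print(n, count_representations(n), 4 * R(n))
```

For instance n = 25 has divisors 1, 5, 25, all congruent to 1 modulo 4, and the twelve representations are (±5, 0), (0, ±5), (±3, ±4), (±4, ±3).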

Some miscellany: Recall that $\sigma(n) = \sum_{d \mid n} d$, that is, $\sigma = 1 * \mathrm{id}$. The Greeks defined a perfect number to be a number n whose proper divisors sum to n itself; that is, a number satisfying

n = σ(n) − n ⇔ σ(n) = 2n.

For example, 6 is perfect, as 6 = 1 + 2 + 3, as is 28 = 1 + 2 + 4 + 7 + 14. The next perfect number is 496, then 8128. Note that σ(n) is multiplicative, and that

\[ \sigma(p^\alpha) = 1 + p + p^2 + \cdots + p^\alpha = \frac{p^{\alpha+1} - 1}{p - 1}. \]

We see equivalently that n is a perfect number if and only if

\[ 2 = \frac{\sigma(n)}{n} = \prod_{p^\alpha \,\|\, n} \frac{p^{\alpha+1} - 1}{p^\alpha(p - 1)}. \]

Let us factor the first three perfect numbers:

\[ 6 = 2 \cdot 3 = 2^1(2^2 - 1), \qquad 28 = 2^2 \cdot 7 = 2^2(2^3 - 1), \qquad 496 = 2^4 \cdot 31 = 2^4(2^5 - 1). \]

This motivates our next result.

Theorem 10.3.1 If $q = 2^p - 1$ is prime, then $n = 2^{p-1}q$ is a perfect number.

Recall from a homework problem that if $2^k - 1$ is prime, then k must be prime, although this is not a sufficient condition, as e.g. $2^{11} - 1 = 2047 = 23 \cdot 89$.

Proof: We give two.
(1) By multiplicativity,

\[ \sigma(2^{p-1}q) = \sigma(2^{p-1})\sigma(q) = (2^p - 1)(q + 1) = 2^p(2^p - 1) = 2(2^{p-1})(2^p - 1) = 2(2^{p-1}q), \]

and we are done.
(2) We simply verify that the divisors of $2^{p-1}q$, namely

\[ 1, 2, 2^2, \ldots, 2^{p-1}, q, 2q, 2^2q, \ldots, 2^{p-1}q, \]

sum to $2(2^{p-1}q)$. □

We know exactly 48 numbers of this form (as of 2014), and note that all such numbers are even by construction. The following theorem gives the converse statement.

Theorem 10.3.2 If n is an even perfect number, then $n = 2^{p-1}(2^p - 1)$, where both p and $2^p - 1$ are prime.

Proof: Write $n = 2^{k-1}m$ where $k \ge 2$ and m is odd. If n is perfect, then

\[ 2^k m = 2n = \sigma(n) = \sigma(2^{k-1})\sigma(m) = (2^k - 1)\sigma(m). \]

Hence $(2^k - 1) \mid 2^k m$, so by Euclid's lemma (since $2^k - 1$ and $2^k$ are coprime) we have that $(2^k - 1) \mid m$. Writing $m = (2^k - 1)l$, we have $2^k l = \sigma(m)$; but l and m are both divisors of m, so

\[ \sigma(m) \ge m + l = (2^k - 1)l + l = 2^k l. \]

Thus we have the equality

\[ \sigma(m) = \frac{2^k m}{2^k - 1} = 2^k l = (2^k - 1)l + l = m + l, \]

so m has exactly two divisors m and l, which are distinct because $k \ge 2$, and we must have $l = 1$. It follows that $m = 2^k - 1$ is prime, and hence, by the homework problem recalled above, that k is prime. □

Some open conjectures:
1. There are infinitely many Mersenne primes (that is, primes of the form $2^p - 1$ with p prime), and hence infinitely many even perfect numbers.
2. There are no odd perfect numbers.
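To see Theorems 10.3.1 and 10.3.2 in action, here is a short Python sketch (our own illustration; `sigma` and `is_prime` are naive helper functions written for this purpose). It builds even perfect numbers from the Mersenne primes $2^p - 1$ with $p < 14$ and compares them against a brute-force search for solutions of $\sigma(n) = 2n$ below 10000.

```python
def sigma(n):
    """Sum of all positive divisors of n, pairing d with n // d."""
    total = 0
    d = 1
    while d * d <= n:
        if n % d == 0:
            total += d
            if d != n // d:
                total += n // d
        d += 1
    return total


def is_prime(n):
    """Trial-division primality test (adequate for small n)."""
    if n < 2:
        return False
    p = 2
    while p * p <= n:
        if n % p == 0:
            return False
        p += 1
    return True


# Euclid's construction: 2^(p-1) * (2^p - 1) whenever 2^p - 1 is prime.
from_mersenne = [2 ** (p - 1) * (2 ** p - 1)
                 for p in range(2, 14) if is_prime(2 ** p - 1)]

# Brute force: every n below 10000 with sigma(n) = 2n.
brute_force = [n for n in range(2, 10000) if sigma(n) == 2 * n]

if __name__ == "__main__":
    print(from_mersenne)
    print(brute_force)
```

The brute-force list finds no perfect numbers below 10000 other than those produced by Euclid's construction, consistent with the fact that no odd perfect number is known.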

11 Week Eleven

11.1 Lecture Twenty-Nine

Diophantine approximation is the technique of finding rational numbers near given real numbers. One fundamental fact of Diophantine approximation that we will use frequently is that, if $n \in \mathbb{Z}$ and $n \ne 0$, then $|n| \ge 1$.

Example: Define

\[ e = \sum_{n=0}^{\infty} \frac{1}{n!}; \]

we will prove that e is irrational. Indeed, assume not, and choose $a, b \in \mathbb{Z}$, $b > 0$ such that $e = a/b$. Then $be \in \mathbb{Z}$ and so in particular $b!\,e \in \mathbb{Z}$. Thus we define

\[ m = b!\,e - \sum_{n=0}^{b} \frac{b!}{n!} = b! \sum_{n=b+1}^{\infty} \frac{1}{n!} \in \mathbb{Z}. \]

Clearly $m > 0$, and moreover in the last sum we see that every term is less than half the previous term, thus

\[ m = b! \sum_{n=b+1}^{\infty} \frac{1}{n!} < b! \sum_{n=b+1}^{\infty} \frac{1}{(b+1)!} \cdot \frac{1}{2^{n-(b+1)}} = \frac{2\,b!}{(b+1)!} = \frac{2}{b+1} \le 1. \]

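That is, $m \in \mathbb{Z}$ and $0 < m < 1$, which is a contradiction. Thus $e \notin \mathbb{Q}$. □

Lemma 11.1.1 If $\frac{a}{b}, \frac{c}{d}$ are distinct rational numbers, then $\left|\frac{a}{b} - \frac{c}{d}\right| \ge \frac{1}{|bd|}$.

Proof: This follows from the basic rules of arithmetic: since $ad - bc$ is a nonzero integer,

\[ \left|\frac{a}{b} - \frac{c}{d}\right| = \frac{|ad - bc|}{|bd|} \ge \frac{1}{|bd|}. \]

□

Theorem 11.1.2 (Theorem 6.8, Niven; Dirichlet's theorem on Diophantine approximation) Let $x \in \mathbb{R}$, $n \in \mathbb{N}$. Then there exists $\frac{a}{b} \in \mathbb{Q}$ with $1 \le b \le n$ and $\left|x - \frac{a}{b}\right| \le \frac{1}{b(n+1)}$.

nb. It is slightly easier to prove the bound $\left|x - \frac{a}{b}\right| < \frac{1}{bn}$ or $\frac{1}{b(n-1)}$, but the inequality in the theorem statement is the best possible result; indeed, we attain equality with $x = \frac{c}{n+1}$, $(c, n+1) = 1$.

Proof: Define the fractional part of y to be $\{y\} = y - \lfloor y \rfloor \in [0, 1)$. Consider the n real numbers $\{x\}, \{2x\}, \ldots, \{nx\}$ and the $n + 1$ subintervals

\[ \Bigl[0, \frac{1}{n+1}\Bigr), \ \Bigl[\frac{1}{n+1}, \frac{2}{n+1}\Bigr), \ \ldots, \ \Bigl[\frac{n}{n+1}, 1\Bigr), \]

whose disjoint union is $[0, 1)$. If some $\{jx\} \in [0, \frac{1}{n+1})$, then let $\frac{a}{b} = \frac{\lfloor jx \rfloor}{j}$; we have

\[ \left|x - \frac{a}{b}\right| = \left|\frac{jx}{j} - \frac{\lfloor jx \rfloor}{j}\right| = \frac{\{jx\}}{j} < \frac{1}{j(n+1)} = \frac{1}{b(n+1)}. \]

Similarly, if some $\{jx\} \in [\frac{n}{n+1}, 1)$ then we may take $\frac{a}{b} = \frac{\lfloor jx \rfloor + 1}{j}$, and we have

\[ \left|\frac{a}{b} - x\right| = \left|\frac{\lfloor jx \rfloor + 1}{j} - \frac{jx}{j}\right| = \frac{1 - \{jx\}}{j} \le \frac{1/(n+1)}{j} = \frac{1}{b(n+1)}. \]

Finally, if neither of these cases occurs, then the n numbers lie in the $n - 1$ remaining subintervals, so by the pigeonhole principle there exists some subinterval containing $\{jx\}$ and $\{kx\}$ with $j < k$ (say), so that $|\{jx\} - \{kx\}| < \frac{1}{n+1}$. Then, with $a = \lfloor kx \rfloor - \lfloor jx \rfloor$, $b = k - j$, we have

\[ \left|x - \frac{a}{b}\right| = \left|\frac{(k-j)x}{b} - \frac{\lfloor kx \rfloor - \lfloor jx \rfloor}{b}\right| = \frac{|\{kx\} - \{jx\}|}{b} < \frac{1/(n+1)}{b} = \frac{1}{b(n+1)}, \]

and we are done. □

Corollary 1: If $x \in \mathbb{R} \setminus \mathbb{Q}$, then there exist infinitely many $\frac{a}{b} \in \mathbb{Q}$ such that $\left|x - \frac{a}{b}\right| < \frac{1}{b^2}$.

Proof: Theorem 11.1.2 gives, for every $n \in \mathbb{N}$, a rational number $\frac{a_n}{b_n}$ with $1 \le b_n \le n$ and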

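\[ 0 < \left|x - \frac{a_n}{b_n}\right| \le \frac{1}{b_n(n+1)} < \frac{1}{b_n^2}. \]

Since $x \notin \mathbb{Q}$, we know that $\left|x - \frac{a_n}{b_n}\right| \ne 0$, so any given $\frac{a}{b}$ can equal only finitely many of the terms $\frac{a_n}{b_n}$, since

\[ \lim_{n \to \infty} \left|x - \frac{a_n}{b_n}\right| = 0. \]

□

We may generalize lemma 11.1.1 as follows:

Lemma 11.1.3 Let $p(X) \in \mathbb{Z}[X]$ have degree d and let $\frac{a}{b} \in \mathbb{Q}$ with $b > 0$. If $p\left(\frac{a}{b}\right) \ne 0$, then $\left|p\left(\frac{a}{b}\right)\right| \ge \frac{1}{b^d}$.

Proof: If $p(X) = c_dX^d + c_{d-1}X^{d-1} + \cdots + c_1X + c_0$, where $c_i \in \mathbb{Z}$, $c_d \ne 0$, then

\[ b^d\,p\Bigl(\frac{a}{b}\Bigr) = c_da^d + c_{d-1}a^{d-1}b + \cdots + c_1ab^{d-1} + c_0b^d \in \mathbb{Z}. \]

Hence if $p\left(\frac{a}{b}\right) \ne 0$, then $\left|b^d\,p\left(\frac{a}{b}\right)\right| \ge 1$, and the result is immediate. □

Definition: Let $\alpha \in \mathbb{R}$. We say that α is algebraic of degree d if there exists an irreducible polynomial $p(X) \in \mathbb{Z}[X]$ of degree d such that $p(\alpha) = 0$. If α is not algebraic, then α is said to be transcendental.

For example, $\sqrt{2}$ is algebraic of degree 2, as it is a root of $X^2 - 2$. Furthermore, α is algebraic of degree 1 if and only if $\alpha \in \mathbb{Q}$.

Theorem 11.1.4 (Liouville's theorem on Diophantine approximation) Let α be algebraic of degree d. Then there exists some constant $C = C(\alpha) > 0$ such that, for any $\frac{a}{b} \in \mathbb{Q}$, $\frac{a}{b} \ne \alpha$, we have

\[ \left|\alpha - \frac{a}{b}\right| \ge \frac{C(\alpha)}{b^d}. \]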

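Proof: By taking $C(\alpha) \le 1$ we may assume that $\frac{a}{b}$ satisfies $\left|\alpha - \frac{a}{b}\right| \le 1$. Choose $p(X) \in \mathbb{Z}[X]$ to be irreducible of degree d and such that $p(\alpha) = 0$. Then we must have $p\left(\frac{a}{b}\right) \ne 0$ and so by lemma 11.1.3 that $\left|p\left(\frac{a}{b}\right)\right| \ge \frac{1}{b^d}$. But

\[ p\Bigl(\frac{a}{b}\Bigr) = p\Bigl(\frac{a}{b}\Bigr) - p(\alpha) = \Bigl(\frac{a}{b} - \alpha\Bigr)p'(t), \]

for some t between α and $\frac{a}{b}$, by the mean value theorem. Thus, taking

\[ C(\alpha) = \min\Bigl\{1, \ \frac{1}{\max\{|p'(t)| : t \in [\alpha - 1, \alpha + 1]\}}\Bigr\}, \]

we obtain

\[ \frac{1}{b^d} \le \left|p\Bigl(\frac{a}{b}\Bigr)\right| = \left|\frac{a}{b} - \alpha\right| |p'(t)| \le \left|\frac{a}{b} - \alpha\right| \cdot \frac{1}{C(\alpha)}, \]

and we are done. □

It was using this theorem that Liouville first demonstrated (1844) the existence of transcendental numbers. This work preceded by several decades Cantor's investigation of uncountable sets, which yields a simpler albeit non-constructive proof of the existence of transcendental numbers.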

11.2 Lecture Thirty

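Recall: Theorem 11.1.4 (Liouville's theorem). It is a trivial consequence of this theorem that the number

\[ \alpha = \sum_{n=1}^{\infty} 10^{-n!} = 0.11000100\ldots \]

is transcendental. Indeed, define

\[ \frac{a_k}{b_k} = \sum_{n=1}^{k} 10^{-n!}, \]

so that $b_k = 10^{k!}$ and thus

\[ \alpha - \frac{a_k}{b_k} = \sum_{n=k+1}^{\infty} 10^{-n!}. \]

We note that each summand is at most half the previous one, thus

\[ \left|\alpha - \frac{a_k}{b_k}\right| = \sum_{n=k+1}^{\infty} 10^{-n!} \le \sum_{n=k+1}^{\infty} 10^{-(k+1)!} \cdot \frac{1}{2^{n-(k+1)}} = \frac{2}{10^{(k+1)!}}. \]

If α were algebraic of degree d, then for some constant $C(\alpha) > 0$ we would have

\[ \frac{C(\alpha)}{b_k^d} \le \left|\alpha - \frac{a_k}{b_k}\right| \le \frac{2}{b_k^{k+1}}, \]

and thus $b_k^{k+1-d} \le \frac{2}{C(\alpha)}$. Taking $k \to \infty$ yields a contradiction, and so we see that α cannot be algebraic.

Recall: Last lecture we showed that for all $\alpha \in \mathbb{R} \setminus \mathbb{Q}$ there are infinitely many $\frac{a}{b} \in \mathbb{Q}$ such that $\left|\alpha - \frac{a}{b}\right| < \frac{1}{b^2}$.

Theorem 11.2.1 (Roth's theorem) If α is algebraic, then for any $\varepsilon > 0$ there exists some constant $C = C(\alpha, \varepsilon) > 0$ such that

\[ \left|\alpha - \frac{a}{b}\right| \ge \frac{C(\alpha, \varepsilon)}{b^{2+\varepsilon}}, \quad \text{for all } \frac{a}{b} \in \mathbb{Q} \text{ with } \frac{a}{b} \ne \alpha. \]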

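§6.1 – Farey sequences

Given $n \in \mathbb{N}$, the Farey fractions of order n are those $\frac{a}{b} \in \mathbb{Q}$ such that $1 \le b \le n$ and $0 \le a \le b$; that is,

\[ F_n = \Bigl\{\frac{a}{b} : 1 \le b \le n, \ 0 \le a \le b\Bigr\} \subset \mathbb{Q} \cap [0, 1]. \]

Usually the set is thought of as being totally ordered. For example,

\[ F_5 = \Bigl\{\frac{0}{1}, \frac{1}{5}, \frac{1}{4}, \frac{1}{3}, \frac{2}{5}, \frac{1}{2}, \frac{3}{5}, \frac{2}{3}, \frac{3}{4}, \frac{4}{5}, 1\Bigr\}. \]

If we know the first few elements of $F_n$, how can we compute the next?

Proposition 11.2.2 Let $\frac{a}{b} \in F_n$ be in lowest terms with $a \ne b$. The next element of $F_n$ after $\frac{a}{b}$ is $\frac{x}{y}$, where $y \equiv -a^{-1} \bmod b$, $n - b < y \le n$, and $x = \frac{ay+1}{b}$.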

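Proof: Since $ay + 1 \equiv a(-a^{-1}) + 1 \equiv 0 \bmod b$, we know that $x \in \mathbb{Z}$. Moreover since $y \le n$ and $1 \le y(b - a)$, we know

\[ \frac{x}{y} = \frac{ay + 1}{by} \le \frac{by}{by} = 1, \]

and thus $\frac{x}{y} \in F_n$. Now, suppose $\frac{c}{d} \in F_n$ with $\frac{a}{b} < \frac{c}{d} < \frac{x}{y}$. Then

\[ \Bigl(\frac{x}{y} - \frac{c}{d}\Bigr) + \Bigl(\frac{c}{d} - \frac{a}{b}\Bigr) = \frac{bx - ay}{yb} = \frac{1}{yb}. \]

But by lemma 11.1.1, we know that

\[ \Bigl(\frac{x}{y} - \frac{c}{d}\Bigr) + \Bigl(\frac{c}{d} - \frac{a}{b}\Bigr) \ge \frac{1}{yd} + \frac{1}{db} = \frac{y + b}{ybd} \ge \frac{n + 1}{ybd} \ge \frac{n+1}{n} \cdot \frac{1}{yb} > \frac{1}{yb}, \]

which is a contradiction, and we are done. □

Corollary 1: If $\frac{a}{b} < \frac{x}{y}$ are consecutive Farey fractions (for any fixed n), then $xb - ay = 1$.

Corollary 2: If $\frac{a}{b} < \frac{c}{d} < \frac{x}{y}$ are consecutive Farey fractions, then $\frac{c}{d} = \frac{a+x}{b+y}$.

For example,

\[ F_4 = \Bigl\{\frac{0}{1}, \frac{1}{4}, \frac{1}{3}, \frac{1}{2}, \frac{2}{3}, \frac{3}{4}, 1\Bigr\}. \]

The fractions of $F_5 \setminus F_4$ are exactly

\[ \frac{1}{5} = \frac{0+1}{1+4}, \quad \frac{2}{5} = \frac{1+1}{3+2}, \quad \frac{3}{5} = \frac{1+2}{2+3}, \quad \frac{4}{5} = \frac{3+1}{4+1}, \]

which are seen to lie in the respective intervals

\[ \Bigl(\frac{0}{1}, \frac{1}{4}\Bigr), \quad \Bigl(\frac{1}{3}, \frac{1}{2}\Bigr), \quad \Bigl(\frac{1}{2}, \frac{2}{3}\Bigr), \quad \Bigl(\frac{3}{4}, \frac{1}{1}\Bigr). \]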

Next lecture, we will use the Farey fractions to give an alternate proof of Dirichlet’s theorem.
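The recipe of Proposition 11.2.2 for stepping from one Farey fraction to the next translates directly into code. The sketch below is our own illustration (the names `farey_next` and `farey` are hypothetical); rather than computing a modular inverse, it searches the window $n - b < y \le n$ for the unique y with $ay + 1 \equiv 0 \bmod b$, which is the same congruence condition.

```python
def farey_next(a, b, n):
    """Successor of a/b in F_n, assuming a/b is in lowest terms and a < b.

    Searches the b consecutive values n - b < y <= n for the one satisfying
    a*y + 1 = 0 (mod b), i.e. y = -a^{-1} (mod b), then sets x = (a*y + 1) / b.
    """
    for y in range(n - b + 1, n + 1):
        if (a * y + 1) % b == 0:
            return (a * y + 1) // b, y
    raise ValueError("a/b must be in lowest terms")


def farey(n):
    """The Farey sequence F_n as a list of (numerator, denominator) pairs."""
    seq = [(0, 1)]
    while seq[-1] != (1, 1):
        a, b = seq[-1]
        seq.append(farey_next(a, b, n))
    return seq


if __name__ == "__main__":
    print(farey(5))
```

Consecutive outputs (a, b), (x, y) can also be checked against Corollary 1, which says $xb - ay = 1$.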

11.3 Lecture Thirty-One

In the Farey fractions $F_n$ of order n, we have that if $\frac{b}{r} < \frac{c}{s}$ are consecutive, then

\[ rc - sb = 1 \quad \text{and} \quad \frac{b}{r} < \frac{b+c}{r+s} < \frac{c}{s} \quad \text{with } r + s \ge n + 1. \]

Indeed, the condition $r + s \ge n + 1$ is necessary for our second result, otherwise the middle fraction is itself a Farey fraction of order n, a contradiction.

Recall: Dirichlet's theorem on Diophantine approximation (theorem 11.1.2), which states that if $x \in \mathbb{R}$, $n \in \mathbb{N}$, then there exists $\frac{a}{q} \in \mathbb{Q}$ with $1 \le q \le n$ and $\left|x - \frac{a}{q}\right| \le \frac{1}{q(n+1)}$.

Proof: Replacing x with $\{x\}$ if necessary, we may assume $x \in [0, 1)$. If $x \in F_n$, then take $\frac{a}{q} = x$. Otherwise, choose $\frac{b}{r} < \frac{c}{s}$ to be consecutive in $F_n$ such that

\[ \frac{b}{r} < x < \frac{c}{s}. \]

We now have two cases.
1. Suppose

\[ \frac{b}{r} < x \le \frac{b+c}{r+s}, \]

and take $\frac{a}{q} = \frac{b}{r}$. We have

\[ \left|x - \frac{b}{r}\right| \le \frac{b+c}{r+s} - \frac{b}{r} = \frac{cr - bs}{r(r+s)} = \frac{1}{r(r+s)} \le \frac{1}{r(n+1)}, \]

and by assumption $1 \le r \le n$.
2. If instead we have

\[ \frac{b+c}{r+s} \le x < \frac{c}{s}, \]

we instead take $\frac{a}{q} = \frac{c}{s}$, and the proof unfolds in the same way. □

§7.1 – The Euclidean algorithm

We can think of continued fractions as a consequence of the Euclidean algorithm.

Example: We find (73, 26). Simple calculation shows

73 = 2 · 26 + 21, 26 = 1 · 21 + 5, 21 = 4 · 5 + 1, 5 = 5 · 1 + 0.

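Note also that

\[ \frac{73}{26} = 2 + \frac{21}{26} = 2 + \frac{1}{26/21} = 2 + \cfrac{1}{1 + \cfrac{5}{21}}. \]

Continuing in this fashion, we have

\[ \frac{73}{26} = 2 + \cfrac{1}{1 + \cfrac{1}{21/5}} = 2 + \cfrac{1}{1 + \cfrac{1}{4 + \cfrac{1}{5}}}. \]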

This is an example of the type of expression we will now study.

Definition: A continued fraction is an expression of the form

\[ x_0 + \cfrac{1}{x_1 + \cfrac{1}{x_2 + \cfrac{1}{\ddots + \cfrac{1}{x_j}}}}, \]

where $x_i \in \mathbb{R}$ and $x_1, \ldots, x_j > 0$; we will mostly be interested in the situation when $x_i \in \mathbb{Z}$ for every i. We have the shorthand notation $\langle x_0; x_1, x_2, \ldots, x_j \rangle$. For example,

\[ \frac{73}{26} = \Bigl\langle 2; \frac{26}{21} \Bigr\rangle = \Bigl\langle 2; 1, \frac{21}{5} \Bigr\rangle = \langle 2; 1, 4, 5 \rangle. \]

Example: Find a simple expression for $\langle 1; 3, 1, 5, x \rangle$ as a function of $x > 0$. We have

\[ \langle 1; 3, 1, 5, x \rangle = 1 + \cfrac{1}{3 + \cfrac{1}{1 + \cfrac{1}{5 + \cfrac{1}{x}}}} = 1 + \cfrac{1}{3 + \cfrac{1}{1 + \cfrac{x}{5x+1}}} = 1 + \cfrac{1}{3 + \cfrac{5x+1}{6x+1}} = 1 + \frac{6x+1}{23x+4} = \frac{29x+5}{23x+4}. \]

We may write the above calculation more compactly as

\[ \langle 1; 3, 1, 5, x \rangle = \Bigl\langle 1; 3, 1, \frac{5x+1}{x} \Bigr\rangle = \Bigl\langle 1; 3, \frac{6x+1}{5x+1} \Bigr\rangle = \Bigl\langle 1; \frac{23x+4}{6x+1} \Bigr\rangle = \frac{29x+5}{23x+4}. \]

Some useful identities:
• $\langle x_0; x_1, x_2, \ldots, x_j \rangle = x_0 + \dfrac{1}{\langle x_1; x_2, x_3, \ldots, x_j \rangle}$.
• $\langle x_0; x_1, x_2, \ldots, x_j \rangle = \Bigl\langle x_0; x_1, x_2, \ldots, x_{j-2}, x_{j-1} + \dfrac{1}{x_j} \Bigr\rangle$.

Example: We find a fraction between $\frac{14}{5} = 2.8$ and $\frac{73}{26} = 2.8076923\ldots$ with minimal denominator. Note that $\frac{14}{5} = \langle 2; 1, 4 \rangle$ and $\frac{73}{26} = \langle 2; 1, 4, 5 \rangle$. The function $f(x) = \langle 2; 1, 4, x \rangle$ for $x > 0$ is a decreasing function of x and satisfies

\[ f(5) = \frac{73}{26}, \qquad \lim_{x \to \infty} f(x) = \frac{14}{5}. \]

Thus taking $x = 6$ we have

\[ f(6) = \langle 2; 1, 4, 6 \rangle = \frac{87}{31} = 2.8064\ldots \]

It is no coincidence that this is the Farey mediant $\frac{14+73}{5+26}$ of $\frac{14}{5}$ and $\frac{73}{26}$ in $F_{31}$.

It is not difficult to see that $f(x_0, x_1, \ldots, x_k) = \langle x_0; x_1, x_2, \ldots, x_k \rangle$ is an increasing function of $x_j$ for every even j and a decreasing function of $x_j$ for every odd j. Thus if $a_i, b_i \in \mathbb{Z}$, we have that $\langle a_0; a_1, a_2, \ldots, a_k \rangle < \langle b_0; b_1, b_2, \ldots, b_k \rangle$ if and only if

• $a_0 < b_0$, or

• $a_0 = b_0$ and $a_1 > b_1$, or

• $a_0 = b_0$ and $a_1 = b_1$ and $a_2 < b_2$, or ...

Thus we have an alternating lexicographic ordering on the integral continued fractions. To compare $\langle a_0; a_1, a_2, \ldots, a_k \rangle$ to $\langle a_0; a_1, a_2, \ldots, a_l \rangle$ with $k < l$, we write, formally,

\[ \langle a_0; a_1, a_2, \ldots, a_k \rangle = \langle a_0; a_1, a_2, \ldots, a_k, \infty \rangle. \]

Finally since we may always write, for example,

\[ 4 = 3 + \frac{1}{1} \ \Rightarrow \ \langle 2; 1, 4 \rangle = \langle 2; 1, 3, 1 \rangle, \]

we remark on the special case

\[ \langle a_0; a_1, a_2, \ldots, a_k \rangle = \langle a_0; a_1, a_2, \ldots, a_k - 1, 1 \rangle. \]

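Notation: For the Euclidean algorithm applied to the pair $(u_0, u_1)$, we write

\begin{align*}
u_0 &= u_1a_0 + u_2, & 0 &< u_2 < u_1, \\
u_1 &= u_2a_1 + u_3, & 0 &< u_3 < u_2, \\
&\ \ \vdots \\
u_{k-1} &= u_ka_{k-1} + u_{k+1}, & 0 &< u_{k+1} < u_k, \\
u_k &= u_{k+1}a_k + u_{k+2}, & 0 &= u_{k+2} < u_{k+1}.
\end{align*}

We call the coefficients $a_i$ partial quotients. We have equivalently

\begin{align*}
\frac{u_0}{u_1} &= a_0 + \frac{1}{u_1/u_2}, & a_0 &= \Bigl\lfloor \frac{u_0}{u_1} \Bigr\rfloor, \\
\frac{u_1}{u_2} &= a_1 + \frac{1}{u_2/u_3}, & a_1 &= \Bigl\lfloor \frac{u_1}{u_2} \Bigr\rfloor, \\
&\ \ \vdots \\
\frac{u_k}{u_{k+1}} &= a_k, & a_k &= \Bigl\lfloor \frac{u_k}{u_{k+1}} \Bigr\rfloor = \frac{u_k}{u_{k+1}}.
\end{align*}

Similarly, we have for example

\[ \frac{u_1}{u_2} = \frac{1}{\{u_0/u_1\}} = \frac{1}{\frac{u_0}{u_1} - a_0}. \]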
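The correspondence between the Euclidean algorithm and continued fractions can be sketched in a few lines of Python. This is an illustration under our own naming (`continued_fraction` and `evaluate` are not from the text): the first function records the partial quotients produced by repeated division, and the second evaluates $\langle a_0; a_1, \ldots, a_k \rangle$ exactly, working from the innermost term outward.

```python
from fractions import Fraction


def continued_fraction(u0, u1):
    """Partial quotients a_0, a_1, ..., a_k of u0/u1 via the Euclidean algorithm."""
    quotients = []
    while u1 != 0:
        a, remainder = divmod(u0, u1)   # u0 = u1 * a + remainder
        quotients.append(a)
        u0, u1 = u1, remainder
    return quotients


def evaluate(quotients):
    """Exact value of <a_0; a_1, ..., a_k> as a Fraction."""
    value = Fraction(quotients[-1])
    for a in reversed(quotients[:-1]):
        value = a + 1 / value
    return value


if __name__ == "__main__":
    print(continued_fraction(73, 26))      # partial quotients of 73/26
    print(evaluate([2, 1, 4, 6]))          # the mediant example
```

Applied to 73/26 this reproduces the quotients 2, 1, 4, 5 from the worked example, and evaluating ⟨2; 1, 4, 6⟩ gives 87/31.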

12 Week Twelve

12.1 Lecture Thirty-Two

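The Process: Given $\xi \in \mathbb{R}$, define $\xi_0 = \xi$ and set

\begin{align*}
a_0 &= \lfloor \xi_0 \rfloor, & \xi_1 &= \frac{1}{\xi_0 - a_0} = \frac{1}{\{\xi_0\}}, \\
a_1 &= \lfloor \xi_1 \rfloor, & \xi_2 &= \frac{1}{\xi_1 - a_1} = \frac{1}{\{\xi_1\}},
\end{align*}

and so on. We saw in our last lecture that if $\xi = \frac{m}{n}$, then The Process is exactly the Euclidean algorithm applied to find (m, n); in particular, The Process eventually terminates. Conversely, if $\xi \in \mathbb{R} \setminus \mathbb{Q}$, then The Process never terminates. Furthermore, we see that

\[ \xi = \langle \xi \rangle = \langle a_0; \xi_1 \rangle = \langle a_0; a_1, \xi_2 \rangle = \cdots \]

The numbers $a_j$ are called the partial quotients of ξ.

Example: Let $\xi = \sqrt[3]{2} = 1.25992\ldots$ We have $\xi_0 = \xi$, and

\begin{align*}
a_0 &= \lfloor \sqrt[3]{2} \rfloor = 1, & \xi_1 &= \frac{1}{\xi_0 - 1} = 3.84732\ldots \\
a_1 &= \lfloor \xi_1 \rfloor = 3, & \xi_2 &= \frac{1}{\xi_1 - 3} = 1.18019\ldots \\
a_2 &= \lfloor \xi_2 \rfloor = 1, & \xi_3 &= \frac{1}{\xi_2 - 1} = 5.54974\ldots \\
a_3 &= \lfloor \xi_3 \rfloor = 5, & \xi_4 &= \frac{1}{\xi_3 - 5} = 1.81905\ldots
\end{align*}

We have that

\[ \sqrt[3]{2} = \langle 1; 3, 1, 5, \xi_4 \rangle = \frac{29\xi_4 + 5}{23\xi_4 + 4}; \]

solving this expression for $\xi_4$, we obtain

\[ \xi_4 = \frac{4\sqrt[3]{2} - 5}{-23\sqrt[3]{2} + 29}. \]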

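Definition: Given $a_0 \in \mathbb{Z}$, $a_1, a_2, \ldots \in \mathbb{N}$, define recursively the sequences

\[ h_{-2} = 0, \quad h_{-1} = 1, \quad h_j = a_jh_{j-1} + h_{j-2} \ \text{ for } j \ge 0, \]

\[ k_{-2} = 1, \quad k_{-1} = 0, \quad k_j = a_jk_{j-1} + k_{j-2} \ \text{ for } j \ge 0. \]

Furthermore for $j \ge 0$ define $r_j = \frac{h_j}{k_j}$; if the coefficients $a_j$ are those found in The Process applied to $\xi \in \mathbb{R}$, then $r_j$ is called the jth convergent to ξ. Continuing from our last example, the partial quotients of $\sqrt[3]{2}$ are 1, 3, 1, 5, ... We have the following table:

\[ \begin{array}{c|cccc}
j & a_j & h_j & k_j & r_j \\ \hline
-2 & & 0 & 1 & 0 \\
-1 & & 1 & 0 & \infty \\
0 & 1 & 1 & 1 & 1 \\
1 & 3 & 4 & 3 & \frac{4}{3} \\
2 & 1 & 5 & 4 & \frac{5}{4} \\
3 & 5 & 29 & 23 & \frac{29}{23}
\end{array} \]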

Note that $r_0 = 1$, $r_1 = 1.3333\ldots$, $r_2 = 1.25$, $r_3 = 1.26087\ldots$, so that the convergents are indeed good rational approximations to $\sqrt[3]{2} = 1.25992\ldots$.

Theorem 12.1.1 (Theorem 7.3, Niven) For any $x > 0$, we have that

\[ \langle a_0; a_1, a_2, \ldots, a_{j-1}, x \rangle = \frac{xh_{j-1} + h_{j-2}}{xk_{j-1} + k_{j-2}}. \]

In particular,

\[ \langle a_0; a_1, a_2, \ldots, a_{j-1}, a_j \rangle = \frac{a_jh_{j-1} + h_{j-2}}{a_jk_{j-1} + k_{j-2}} = \frac{h_j}{k_j}. \]

Proof: We use induction. In the $j = 0$ case we have that $\langle x \rangle = \frac{x \cdot 1 + 0}{x \cdot 0 + 1}$, which is clearly so, and thus we may assume the claim holds up to j. We have

\begin{align*}
\langle a_0; a_1, a_2, \ldots, a_j, x \rangle &= \Bigl\langle a_0; a_1, a_2, \ldots, a_{j-1}, a_j + \frac{1}{x} \Bigr\rangle = \frac{(a_j + \frac{1}{x})h_{j-1} + h_{j-2}}{(a_j + \frac{1}{x})k_{j-1} + k_{j-2}} \\
&= \frac{(a_jh_{j-1} + h_{j-2})x + h_{j-1}}{(a_jk_{j-1} + k_{j-2})x + k_{j-1}} = \frac{xh_j + h_{j-1}}{xk_j + k_{j-1}}.
\end{align*}

□

Example: Suppose $a_j = 1$ for all $j \ge 0$. Then $h_j = F_{j+2}$, $k_j = F_{j+1}$, where $F_n$ are the Fibonacci numbers $F_n = F_{n-1} + F_{n-2}$ normalized so that $F_0 = 0$, $F_1 = 1$. In particular,

\[ \underbrace{\langle 1; 1, 1, \ldots, 1 \rangle}_{j \text{ copies}} = \frac{F_{j+1}}{F_j} \xrightarrow{\ j \to \infty\ } \varphi, \]

where $\varphi = \frac{1 + \sqrt{5}}{2} = 1.618033\ldots$ is the golden ratio.

Theorem 12.1.2 (Theorem 7.5, Niven) For $j \ge -1$ one has $h_jk_{j-1} - k_jh_{j-1} = (-1)^{j-1}$. In particular, this means that $(h_j, k_j) = 1$ for every j and that

\[ r_j - r_{j-1} = \frac{(-1)^{j-1}}{k_jk_{j-1}}. \]

Proof: Exercise. (hint: use induction)

From the last equation, we know that rj > rj−1 if and only if j is odd.
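The recurrences for $h_j$ and $k_j$ are easy to run by machine. Below is a minimal Python sketch (the function name `convergents` is our own choice) that starts from the seeds $h_{-2} = 0$, $h_{-1} = 1$, $k_{-2} = 1$, $k_{-1} = 0$ and returns the pairs $(h_j, k_j)$ for a given list of partial quotients.

```python
def convergents(partial_quotients):
    """Pairs (h_j, k_j) for j = 0, 1, ..., from h_j = a_j h_{j-1} + h_{j-2} etc."""
    h_prev, h = 0, 1    # h_{-2}, h_{-1}
    k_prev, k = 1, 0    # k_{-2}, k_{-1}
    result = []
    for a in partial_quotients:
        h_prev, h = h, a * h + h_prev
        k_prev, k = k, a * k + k_prev
        result.append((h, k))
    return result


if __name__ == "__main__":
    # Partial quotients 1, 3, 1, 5 of the cube root of 2, as in the table above.
    for j, (h, k) in enumerate(convergents([1, 3, 1, 5])):
        print(j, h, k, h / k)
```

On the partial quotients 1, 3, 1, 5 this reproduces the table entries 1/1, 4/3, 5/4, 29/23, and consecutive pairs satisfy the determinant identity of Theorem 12.1.2.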

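Theorem 12.1.3 (Convergence of convergents) Let $\xi \in \mathbb{R}$ and let $a_0, a_1, a_2, \ldots$ be its partial quotients, with $\xi_j, h_j, k_j, r_j$ defined as above. Then

\[ \xi - r_j = \frac{(-1)^j}{k_j(\xi_{j+1}k_j + k_{j-1})}, \]

and in particular $\lim_{j \to \infty} r_j = \xi$.

Proof: We apply theorems 12.1.1 and 12.1.2 to obtain

\begin{align*}
\xi - r_j &= \langle a_0; a_1, a_2, \ldots, a_j, \xi_{j+1} \rangle - r_j = \frac{\xi_{j+1}h_j + h_{j-1}}{\xi_{j+1}k_j + k_{j-1}} - \frac{h_j}{k_j} \\
&= \frac{h_{j-1}k_j - h_jk_{j-1}}{k_j(\xi_{j+1}k_j + k_{j-1})} = \frac{(-1)^j}{k_j(\xi_{j+1}k_j + k_{j-1})},
\end{align*}

and we are done. □

Note that $a_{j+1} \le \xi_{j+1} < a_{j+1} + 1$, so that $\xi_{j+1}k_j + k_{j-1} \ge a_{j+1}k_j + k_{j-1} = k_{j+1}$. Given $n \in \mathbb{N}$, choosing j so that $k_j \le n < k_{j+1}$, we can therefore show that

\[ \left|\xi - \frac{h_j}{k_j}\right| \le \frac{1}{k_jk_{j+1}} \le \frac{1}{k_j(n+1)}. \]

Thus every convergent $r_j$ confirms Dirichlet's theorem on Diophantine approximation. We may also restate the theorem thus:

\[ \left|\xi - \frac{h_j}{k_j}\right| = \frac{1}{k_j^2} \cdot \frac{1}{\xi_{j+1} + k_{j-1}/k_j}, \quad \text{where } a_{j+1} \le \xi_{j+1} + \frac{k_{j-1}}{k_j} \le a_{j+1} + 2. \]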

Hence, the greater aj+1, the better the approximation rj = ha0; a1, a2, . . . , aji is to ξ.

12.2 Lecture Thirty-Three

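Recall: Theorem 12.1.1 tells us that

\[ \xi = \frac{\xi_jh_{j-1} + h_{j-2}}{\xi_jk_{j-1} + k_{j-2}}, \]

from which it follows that

\[ \xi_j = \frac{\xi k_{j-2} - h_{j-2}}{-\xi k_{j-1} + h_{j-1}}. \]

Example: Let $\xi = \xi_0 = \sqrt{41} = 6.40312\ldots$ We see that

\begin{align*}
a_0 &= \lfloor \sqrt{41} \rfloor = 6, & \xi_1 &= \frac{1}{\xi_0 - 6} = 2.48062\ldots \\
a_1 &= \lfloor \xi_1 \rfloor = 2, & \xi_2 &= \frac{1}{\xi_1 - 2} = 2.08062\ldots \\
a_2 &= \lfloor \xi_2 \rfloor = 2, & \xi_3 &= \frac{1}{\xi_2 - 2} = 12.40312\ldots
\end{align*}

We have the table:

\[ \begin{array}{c|ccc}
j & a_j & h_j & k_j \\ \hline
-2 & & 0 & 1 \\
-1 & & 1 & 0 \\
0 & 6 & 6 & 1 \\
1 & 2 & 13 & 2 \\
2 & 2 & 32 & 5
\end{array} \]

Thus

\begin{align*}
\xi_1 &= \frac{\xi k_{-1} - h_{-1}}{-\xi k_0 + h_0} = \frac{1}{\sqrt{41} - 6}, \\
\xi_2 &= \frac{\xi k_0 - h_0}{-\xi k_1 + h_1} = \frac{\sqrt{41} - 6}{-2\sqrt{41} + 13}, \\
\xi_3 &= \frac{\xi k_1 - h_1}{-\xi k_2 + h_2} = \frac{2\sqrt{41} - 13}{-5\sqrt{41} + 32}.
\end{align*}

Rationalizing denominators, we obtain

\begin{align*}
\xi_1 &= \frac{1}{\sqrt{41} - 6} \cdot \frac{\sqrt{41} + 6}{\sqrt{41} + 6} = \frac{\sqrt{41} + 6}{5}, \\
\xi_2 &= \frac{\sqrt{41} - 6}{-2\sqrt{41} + 13} \cdot \frac{2\sqrt{41} + 13}{2\sqrt{41} + 13} = \frac{4 + \sqrt{41}}{5}, \\
\xi_3 &= \frac{2\sqrt{41} - 13}{-5\sqrt{41} + 32} \cdot \frac{5\sqrt{41} + 32}{5\sqrt{41} + 32} = 6 + \sqrt{41}.
\end{align*}

We see that $\sqrt{41} = \langle 6; 2, 2, 6 + \sqrt{41} \rangle$, hence

\[ 6 + \sqrt{41} = \langle 12; 2, 2, 6 + \sqrt{41} \rangle = \langle 12; 2, 2, 12, 2, 2, 6 + \sqrt{41} \rangle = \cdots \]

Thus $\sqrt{41} = \langle 6; \overline{2, 2, 12} \rangle$; that is, $\sqrt{41}$ has a periodic continued fraction.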

Lemma 12.2.1 If the continued fraction of ξ ∈ R is eventually periodic, then ξ is a quadratic irrational, i.e. it is the root of some quadratic polynomial with integer coefficients.

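Proof: For simplicity we will assume that the continued fraction is purely periodic, although the stronger claim is true; that is, assume $\xi = \langle \overline{a_0; a_1, a_2, \ldots, a_{j-1}} \rangle$. Then

\[ \xi = \langle a_0; a_1, a_2, \ldots, a_{j-1}, \xi \rangle = \frac{\xi h_{j-1} + h_{j-2}}{\xi k_{j-1} + k_{j-2}}, \]

hence $\xi(\xi k_{j-1} + k_{j-2}) = \xi h_{j-1} + h_{j-2}$, and so

\[ k_{j-1}\xi^2 + (k_{j-2} - h_{j-1})\xi - h_{j-2} = 0. \]

□

Lemma 12.2.2 Every real quadratic irrational $r + s\sqrt{c}$, where $r, s \in \mathbb{Q}$ and $c \in \mathbb{N}$ is not a perfect square (written $c \in \mathbb{N} \setminus \mathbb{N}^2$), can be written $\frac{m + \sqrt{d}}{q}$, where $m, q \in \mathbb{Z}$, $d \in \mathbb{N} \setminus \mathbb{N}^2$, and $q \mid (d - m^2)$.

Proof: Taking a common denominator for r and s, we may write (say with $b, e > 0$; the other sign cases are similar)

\[ r + s\sqrt{c} = \frac{a + b\sqrt{c}}{e} = \frac{a + \sqrt{cb^2}}{e} = \frac{ae + \sqrt{cb^2e^2}}{e^2}, \]

and the claim is now immediate. □

The Quadratic Irrational Process: Let $\xi = \xi_0 = \frac{m_0 + \sqrt{d}}{q_0}$, where d, $m_0$, and $q_0$ satisfy the conditions of lemma 12.2.2. For $j \ge 0$, define

\[ a_j = \lfloor \xi_j \rfloor, \quad m_{j+1} = a_jq_j - m_j, \quad q_{j+1} = \frac{d - m_{j+1}^2}{q_j}, \quad \xi_{j+1} = \frac{m_{j+1} + \sqrt{d}}{q_{j+1}}. \]

The $a_j$ and $\xi_j$ so produced are the same as those produced in The Process.

Example: $\xi = \xi_0 = \sqrt{41}$, so that $m_0 = 0$, $d = 41$, $q_0 = 1$.

\begin{align*}
j = 0: \quad & a_0 = \lfloor \sqrt{41} \rfloor = 6, & m_1 &= 6 \cdot 1 - 0 = 6, & q_1 &= \frac{41 - 6^2}{1} = 5, & \xi_1 &= \frac{6 + \sqrt{41}}{5}. \\
j = 1: \quad & a_1 = \Bigl\lfloor \frac{6 + \sqrt{41}}{5} \Bigr\rfloor = 2, & m_2 &= 2 \cdot 5 - 6 = 4, & q_2 &= \frac{41 - 4^2}{5} = 5, & \xi_2 &= \frac{4 + \sqrt{41}}{5}. \\
j = 2: \quad & a_2 = \Bigl\lfloor \frac{4 + \sqrt{41}}{5} \Bigr\rfloor = 2, & m_3 &= 2 \cdot 5 - 4 = 6, & q_3 &= \frac{41 - 6^2}{5} = 1, & \xi_3 &= \frac{6 + \sqrt{41}}{1}.
\end{align*}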
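The Quadratic Irrational Process uses only integer arithmetic, which makes it convenient to code. The sketch below is our own illustration (`quadratic_process` is a hypothetical name); it assumes $q_j > 0$ throughout, which holds for $\xi_0 = \sqrt{d}$, so that $\lfloor (m + \sqrt{d})/q \rfloor$ can be computed exactly as $(m + \lfloor \sqrt{d} \rfloor)$ floor-divided by q.

```python
from math import isqrt


def quadratic_process(m, d, q, steps):
    """Rows (a_j, m_j, q_j) for xi_0 = (m + sqrt(d)) / q.

    Assumes d is not a perfect square, q divides d - m^2, and every q_j
    encountered is positive (true when starting from sqrt(d): m = 0, q = 1).
    """
    root = isqrt(d)                 # floor(sqrt(d)), computed exactly
    rows = []
    for _ in range(steps):
        a = (m + root) // q         # floor((m + sqrt(d)) / q), valid for q > 0
        rows.append((a, m, q))
        m = a * q - m               # m_{j+1} = a_j q_j - m_j
        q = (d - m * m) // q        # q_{j+1} = (d - m_{j+1}^2) / q_j
    return rows


if __name__ == "__main__":
    for row in quadratic_process(0, 41, 1, 7):
        print(row)
```

For d = 41 the rows cycle through (2, 6, 5), (2, 4, 5), (12, 6, 1) after the initial step, matching the hand computation.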

Theorem 12.2.3 (Theorem 7.19, Niven) Given a quadratic irrational ξ0, we have:

1. The qj from The Quadratic Irrational Process are integers which are eventually positive.

2. The qj and the mj are bounded.

3. The continued fraction for $\xi_0$ is eventually periodic.

Example: The quadratic irrational $-\frac{1}{2} - \frac{3}{4}\sqrt{5}$ has continued fraction $\langle -3; 1, 4, \overline{1, 1, 1, 5, 3, 5} \rangle$.

Proof: (sketch) (1) ⇒ (2): Since $q_j > 0$ for all j sufficiently large, and $q_{j+1}q_j + m_{j+1}^2 = d$, we see that there are only finitely many choices for the $q_j$, $m_j$.

(2) ⇒ (3) There are only finitely many pairs (mj, qj), and so by the pigeonhole principle there must eventually occur a duplicate. The pair (mj, qj) determines the values for the next step of The Quadratic Irrational Process.

(3) ⇒ (1) Highly nontrivial, and omitted. □

Theorem 12.2.4 (Theorem 7.21, Niven) Let $d \in \mathbb{N} \setminus \mathbb{N}^2$ and set $c = \sqrt{d}$. Then $\lfloor c \rfloor + c$ has a purely periodic continued fraction $\langle \overline{a_0; a_1, a_2, \ldots, a_{r-1}} \rangle$ with $a_0 = 2\lfloor c \rfloor$. Hence $c = \langle \lfloor c \rfloor; \overline{a_1, a_2, \ldots, a_r} \rangle$ where $a_r = 2\lfloor c \rfloor$.

We refer to our earlier example, where we found that $6 + \sqrt{41}$ has a purely periodic continued fraction.

Proof: (omitted)

Facts: If $\xi = \sqrt{d}$ and $q_j$ are defined as above, then:

• For every j we have $q_j \ne -1$.

• If r is the period of the continued fraction of ξ, then $q_j = 1$ if and only if $r \mid j$.

12.3 Lecture Thirty-Four

Notation: Throughout this lecture, d denotes a positive integer that is not a perfect square. The symbols $a_j, h_j, k_j$ denote the terms from The Process applied to $\sqrt{d}$, and similarly for $m_j, q_j$.

Pell's equation: We are interested in integer solutions to the equation $x^2 - dy^2 = N$ for some fixed $N \in \mathbb{Z}$; in particular, we seek solutions where both x and y are positive.

Theorem 12.3.1 (Theorem 7.24, Niven) If $|N| < \sqrt{d}$, then for any positive solution (x, y) to Pell's equation we must have that $\frac{x}{y}$ is a convergent to $\sqrt{d}$. In particular, if $(x, y) = 1$ then we must have that $x = h_j$ and $y = k_j$ for some j.

Proof: (omitted)

Example: Every solution of $x^2 - 41y^2 = -1$ must come from a convergent of $\sqrt{41}$. We saw in our last lecture that in this case $h_2 = 32$, $k_2 = 5$, and indeed

\[ (32)^2 - 41(5)^2 = 1024 - 1025 = -1. \]

Theorem 7.22 of Niven gives us the following key identity: for $j \ge -1$, one has $h_j^2 - dk_j^2 = (-1)^{j+1}q_{j+1}$. At the end of our last lecture we saw that $q_j = 1$ if and only if $r \mid j$, where r is the period of the continued fraction of $\sqrt{d}$. It is a corollary (Corollary 7.23) that, for every $l \ge 0$, we have

\[ h_{lr-1}^2 - dk_{lr-1}^2 = (-1)^{lr}. \]

Example: We solve Pell's equation for $d = 45$. We have

\[ \sqrt{45} = \langle 6; \overline{1, 2, 2, 2, 1, 12} \rangle, \]

so $r = 6$. Then with $l = 1$, we have by corollary 7.23 that

\[ h_5 = 161, \quad k_5 = 24, \quad \text{hence } 161^2 - 45(24)^2 = q_6 = 1. \]

So a solution to $x^2 - 45y^2 = 1$ is $x = 161$, $y = 24$. Note that

\[ \frac{h_5}{k_5} = r_5 = \langle 6; 1, 2, 2, 2, 1 \rangle. \]

Another solution is given by $l = 2$; we have

\[ \frac{h_{11}}{k_{11}} = r_{11} = \langle 6; 1, 2, 2, 2, 1, 12, 1, 2, 2, 2, 1 \rangle = \frac{51841}{7728}, \]

and indeed we have that $51841^2 - 45(7728)^2 = 1$.

Theorem 12.3.2 (Theorem 7.25, Niven) All positive solutions to $x^2 - dy^2 = \pm 1$ are of the form $x = h_{lr-1}$, $y = k_{lr-1}$, where $l \ge 1$ and r is the period of the continued fraction of $\sqrt{d}$. Furthermore:
• If r is even, then there are no positive solutions to $x^2 - dy^2 = -1$, and the positive solutions to $x^2 - dy^2 = 1$ are exactly $x = h_{lr-1}$, $y = k_{lr-1}$ with $l \ge 1$.
• If r is odd, then the positive solutions to $x^2 - dy^2 = -1$ are exactly $x = h_{lr-1}$, $y = k_{lr-1}$ where l is odd and positive, and the positive solutions to $x^2 - dy^2 = 1$ are exactly $x = h_{lr-1}$, $y = k_{lr-1}$ where l is even and positive.
In every case, $y = y(l)$ is a strictly increasing function of l.

This is the main result of our foregoing work.

Remark: Suppose $s^2 - dt^2 = A$, $u^2 - dv^2 = B$. Factoring over the reals gives

\[ A = (s - t\sqrt{d})(s + t\sqrt{d}), \qquad B = (u - v\sqrt{d})(u + v\sqrt{d}), \]

from which it follows that

\[ AB = \bigl((su + dtv) - \sqrt{d}(sv + tu)\bigr)\bigl((su + dtv) + \sqrt{d}(sv + tu)\bigr) = (su + dtv)^2 - d(sv + tu)^2. \]

In particular, if $A = 1$, then we get new solutions to the equation $x^2 - dy^2 = A$ by considering $(s + t\sqrt{d})^l$ with $l \ge 2$.

Example: Suppose $d = 45$. Set $s = 161$, $t = 24$ so that $s^2 - dt^2 = 1$. We have

\[ (161 + 24\sqrt{45})^2 = 51841 + 7728\sqrt{45}, \qquad (161 + 24\sqrt{45})^3 = 16{,}692{,}641 + 2{,}488{,}392\sqrt{45}, \]

and indeed

\[ 16{,}692{,}641^2 - 45 \cdot 2{,}488{,}392^2 = 1, \qquad h_{17} = 16{,}692{,}641, \quad k_{17} = 2{,}488{,}392. \]

Proof of theorem 12.3.2: (omitted)

Theorem 12.3.3 (Theorem 7.26, Niven) Set $x_1 = h_{r-1}$, $y_1 = k_{r-1}$, where r is the period of the continued fraction of $\sqrt{d}$. Define $x_l, y_l$ recursively via

\[ x_l + y_l\sqrt{d} = (x_1 + y_1\sqrt{d})^l. \]

Then $x_l = h_{lr-1}$ and $y_l = k_{lr-1}$.

Proof: (omitted)

Theorems 12.3.2 and 12.3.3 together tell us that the smallest (in terms of y) positive solution to $x^2 - dy^2 = \pm 1$ is given by $x_1 = h_{r-1}$, $y_1 = k_{r-1}$, and moreover that all positive solutions may be found by taking powers of $x_1 + y_1\sqrt{d}$.

Example: Suppose $d = 41$; then the smallest positive solution to $x^2 - 41y^2 = -1$ is $x_1 = h_2 = 32$, $y_1 = k_2 = 5$. Thus

\[ x_2 + y_2\sqrt{41} = (32 + 5\sqrt{41})^2 = 2049 + 320\sqrt{41}. \]

By theorem 12.3.3, (2049, 320) is the smallest positive solution to $x^2 - 41y^2 = 1$.
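The results of this lecture suggest a simple procedure for Pell's equation: run the continued fraction of $\sqrt{d}$, track the convergents, and stop at the first one with $h^2 - dk^2 = 1$. Here is a minimal Python sketch of that idea (the names `pell_fundamental` and `compose` are our own, and this is an illustration rather than an optimized solver); `compose` implements the product rule $(s + t\sqrt{d})(u + v\sqrt{d})$ from the remark above.

```python
from math import isqrt


def pell_fundamental(d):
    """Smallest positive (x, y) with x^2 - d y^2 = 1, found among the
    convergents h_j / k_j of sqrt(d). Requires d not a perfect square."""
    root = isqrt(d)
    if root * root == d:
        raise ValueError("d must not be a perfect square")
    m, q = 0, 1                     # xi_0 = (0 + sqrt(d)) / 1
    h_prev, h = 0, 1                # h_{-2}, h_{-1}
    k_prev, k = 1, 0                # k_{-2}, k_{-1}
    while True:
        a = (m + root) // q         # next partial quotient
        h_prev, h = h, a * h + h_prev
        k_prev, k = k, a * k + k_prev
        if h * h - d * k * k == 1:
            return h, k
        m = a * q - m
        q = (d - m * m) // q


def compose(p1, p2, d):
    """(s, t), (u, v) -> (su + d t v, sv + tu), the product in Z[sqrt(d)]."""
    s, t = p1
    u, v = p2
    return s * u + d * t * v, s * v + t * u


if __name__ == "__main__":
    print(pell_fundamental(45))
    print(pell_fundamental(41))
    print(compose((161, 24), (161, 24), 45))
```

For d = 45 and d = 41 this returns the solutions (161, 24) and (2049, 320) found by hand above, and composing (161, 24) with itself gives (51841, 7728).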

13 Week Thirteen

13.1 Lecture Thirty-Five

Miscellany about continued fractions: Given an arbitrary continued fraction, must it correspond to a real number? Let $a_0 \in \mathbb{Z}$, $a_1, a_2, \ldots \in \mathbb{N}$, and define

\[ L = \langle a_0; a_1, a_2, \ldots \rangle = \lim_{n \to \infty} \langle a_0; a_1, a_2, \ldots, a_n \rangle. \]

Theorem 13.1.1 The limit L always exists and is irrational. Moreover, the partial quotients of L are exactly $a_0, a_1, a_2, \ldots$

Recall: If $r_n$ denotes the nth convergent of L, we have $r_n = \langle a_0; a_1, \ldots, a_n \rangle$ and moreover

\[ r_n - r_{n-1} = \frac{(-1)^{n-1}}{k_nk_{n-1}}. \]

This implies that the convergents oscillate around L. Indeed, define $\alpha_n = \frac{1}{k_nk_{n-1}}$ so that

\[ r_n = a_0 + \sum_{j=1}^{n} (-1)^{j-1}\alpha_j; \]

as an alternating series whose terms decrease to zero, we know that this series converges and thus that the convergents also converge.

Example: Define $x = \langle 1; 1, 1, \ldots \rangle$ so that $x = 1 + \frac{1}{x}$. This yields the quadratic equation $x^2 - x - 1 = 0$ and since $x > 0$ we deduce that $x = \frac{1 + \sqrt{5}}{2} = \varphi$, as introduced in lecture thirty-two. With the Fibonacci numbers as defined there, we have

\[ F_n = \frac{1}{\sqrt{5}}\bigl(\varphi^n - (-\varphi)^{-n}\bigr), \quad \text{and} \quad m \mid n \Rightarrow F_m \mid F_n. \]

Definition: A real number is called simply normal in base-10 if, for every $i \in \{0, 1, \ldots, 9\}$, the digit i occurs in its decimal expansion with asymptotic frequency 0.1. There is an analogous definition for simple normality in base-b. A real number is normal base-b if it is simply normal base-b, base-$b^2$, base-$b^3$, and so on. For example, $0.\overline{0123456789}$ is simply normal base-10, but not normal.

Theorem 13.1.2 Almost all real numbers are normal base-10.

Champernowne's number: Let $c = 0.12345678910111213\ldots$ D.G. Champernowne showed in 1933 that c is normal base-10.

It is conjectured that the following numbers are normal: π, e, log 2, and any algebraic number $q \in \overline{\mathbb{Q}}$ of degree at least 3.

It is an easy consequence of theorem 13.1.2 (applied in each base b, since a countable union of null sets is null) that almost all real numbers are normal in every base simultaneously.

Back to continued fractions: given ξ ∈ R, define

\[ \delta_k(\xi) = \lim_{x \to \infty} \frac{\#\{n \le x : a_n = k\}}{x}. \]

Aleksandr Khinchin showed that, for almost all $\xi \in \mathbb{R}$, $\delta_k(\xi)$ exists and equals $\log_2\bigl(1 + \frac{1}{k(k+2)}\bigr)$, thus

δ1 ≈ 0.415, δ2 ≈ 0.170, δ3 ≈ 0.093,...
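The quoted densities are quick to reproduce. The following Python sketch (the function name `delta` is ours) evaluates $\log_2\bigl(1 + \frac{1}{k(k+2)}\bigr)$ and also illustrates that the densities sum to 1: the partial sum up to K telescopes to $\log_2\frac{2(K+1)}{K+2}$.

```python
from math import log2


def delta(k):
    """Limiting frequency of the partial quotient k for almost all real numbers."""
    return log2(1 + 1 / (k * (k + 2)))


if __name__ == "__main__":
    for k in range(1, 6):
        print(k, round(delta(k), 3))
    # Partial sums approach 1, so the frequencies form a probability distribution.
    print(sum(delta(k) for k in range(1, 10001)))
```

Since $1 + \frac{1}{k(k+2)} = \frac{(k+1)^2}{k(k+2)}$, the product of these factors over $k \le K$ collapses to $\frac{2(K+1)}{K+2}$, which is where the telescoping comes from.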

One number which fails this test is

\[ e = \langle 2; 1, 2, 1, 1, 4, 1, 1, 6, 1, 1, 8, 1, 1, 10, 1, \ldots \rangle. \]

Furthermore, any number of the form $\frac{me+n}{re+s}$ (with $m, n, r, s \in \mathbb{Z}$ and $ms \ne nr$) also fails Khinchin's theorem. It is conjectured that the following numbers satisfy Khinchin's theorem: π, log 2, and any algebraic number $q \in \overline{\mathbb{Q}}$ of degree at least 3. Khinchin also proved (1934) that, for almost all $\xi \in \mathbb{R}$, one has

\[ \lim_{n \to \infty} (a_1a_2 \cdots a_n)^{1/n} = \prod_{k=1}^{\infty} \Bigl(1 + \frac{1}{k(k+2)}\Bigr)^{\log_2 k} = 2.6854520010\ldots \]

Theorem 13.1.3 (Theorem 7.17, Niven) For all $\xi \in \mathbb{R} \setminus \mathbb{Q}$, there exist infinitely many $\frac{a}{b} \in \mathbb{Q}$ such that $\left|\xi - \frac{a}{b}\right| < \frac{1}{\sqrt{5}\,b^2}$, and moreover $\sqrt{5}$ is the best possible such constant.

By discarding the (countable) set of real numbers ξ for which the constant $\sqrt{5}$ is necessary, we may improve the constant to $\sqrt{8}$; repeating this process we obtain constants of $\frac{\sqrt{221}}{5}, \frac{\sqrt{1517}}{13}, \ldots$ These numbers arise naturally in the study of the Markov spectrum.

Theorem 13.1.4 (Theorem 7.14, Niven) If $\left|\xi - \frac{a}{b}\right| < \frac{1}{2b^2}$, then $\frac{a}{b}$ is a convergent to ξ.

13.2 Lecture Thirty-Six

Numerical examples of continued fractions

Let $y = 365.242199\ldots$ be the number of solar days in a year; it has been a challenge for centuries to construct a calendar which takes into account this lack of integrality. Numa Pompilius devised a calendar (ca. 713 BCE) in which occasional and irregular leap months would be added into the middle of February. Julius Caesar (48 BCE) devised the Julian calendar, in which every year has 365 days, except for every fourth year which has 366. While divergence from the true count is slow in the Julian calendar (amounting to about 11 days over 1800 years) it is noticeable; in 1582, Pope Gregory XIII introduced the Gregorian calendar as a replacement. In this calendar, every year divisible by 4 is a leap year, except years divisible by 100 and not 400. This is the most widely-used calendar in contemporary Western society; it averages 365.2425 days per year, and so diverges by about 3 days every 10,000 years.

The continued fraction of y is $\langle 365; 4, 7, 1, 3, 5, 20, \ldots \rangle$, and the convergents to $y - 365$ are

\[ \frac{1}{4}, \ \frac{7}{29}, \ \frac{8}{33}, \ \frac{31}{128}, \ \frac{163}{673}, \ \ldots \]

To get a good rational approximation, we need to truncate before a large partial quotient. Using the convergent $\frac{31}{128}$, we might say that we have a leap year every year which is divisible by 4, except years that are divisible by 128. In hexadecimal: a year is a leap year if it ends in 0, 4, 8, or C, unless it ends in 00 or 80. This diverges by about one day every 87,000 years, and we have

\[ 365\tfrac{31}{128} = 365.2421875. \]

Now, let $m = 29.53059\ldots$ be the number of days in a lunar month (that is, from one new moon to the next), so that we have $\frac{y}{m} = 12.3683\ldots$ Taking the continued fraction,

\[ \frac{y}{m} = \langle 12; 2, 1, 2, 1, 1, 17, \ldots \rangle, \]

and the convergents of $\frac{y}{m} - 12$ are

\[ \frac{1}{2}, \ \frac{1}{3}, \ \frac{3}{8}, \ \frac{4}{11}, \ \frac{7}{19}, \ \ldots \]

Modern lunisolar calendars have 7 leap months every 19 years, diverging by one month every 6800 years.

In modern western music, the A above middle C is assigned the frequency 440Hz. By doubling this frequency, we obtain a note one octave higher; tripling it, we obtain a perfect fifth between 880Hz and 1320Hz. Unfortunately, much like the alignment of months and years, the alignment of octaves and fifths is out of sync; indeed,

\[ \frac{(3/2)^{12}}{2^7} \approx 1.014. \]

However, an equally-tempered tuning divides each octave into 12 equal segments, so each semitone is an increase by a factor of $2^{1/12}$; in this case we see $2^{7/12} \approx 1.498$. We take the continued fraction:

\[ \log_2(3/2) = \frac{\log(3/2)}{\log 2} = 0.58496\ldots = \langle 0; 1, 1, 2, 2, 3, 1, 5, \ldots \rangle, \]

with convergents

\[ 1, \ \frac{1}{2}, \ \frac{3}{5}, \ \frac{7}{12}, \ \frac{24}{41}, \ \ldots \]

83 So if we wanted to divide the octaves into x notes so that an interval of y of them make a perfect fifth, we would be better to take x = 41, y = 24. Pythagorean triplets: What are all positive integer solutions to the equation x2 + y2 = z2?A primitive triplet is a solution to this equation in which (x, y) = 1. Theorem 13.2.1 (Theorem 5.5, Niven) The positive, primitive Pythagorean triplets (with y even) are param- eterized by: x = r2 − s2, y = 2rs, z = r2 + s2, where r > s > 0, (r, s) = 1, and r and s have opposite parity. nb. For any primitive (x, y, z), exactly one of x and y is even. Proof: We give two sketches. 1. We may factor y2 = (z − x)(z + x), hence

y 2 z + x z − x = · , with ( x+z , x−z ) = 1. 2 2 2 2 2

z+x 2 z−x 2 By Euclid’s lemma, we must have that 2 = r , 2 = s . x 2 y 2 2. We have z + z = 1, and so we seek to find the rational points q of the unit circle. The line joining any rational point q to (−1, 0) has rational slope; conversely, any line through (−1, 0) with rational slope intersects the circle in a rational point:

\[ y = m(x + 1),\ m \in \mathbb{Q} \ \Rightarrow\ x^2 + (m(x + 1))^2 = 1 \ \Leftrightarrow\ (x + 1)\big((m^2 + 1)x + (m^2 - 1)\big) = 0. \]

So, all rational points on the circle other than $(-1, 0)$ have the form

\[ \left( \frac{1 - m^2}{1 + m^2},\ \frac{2m}{1 + m^2} \right), \quad m \in \mathbb{Q}. \]

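The slope construction in sketch (2) can be tried out numerically. The following Python sketch is an illustration only; the helper names `circle_point` and `triple_from_slope` are hypothetical and not from the lecture. It computes the second intersection point for a rational slope $m$ and clears denominators to obtain an integer solution of $x^2 + y^2 = z^2$.

```python
from fractions import Fraction
from math import gcd


def circle_point(m):
    """Second intersection of the line y = m(x + 1) with the circle x^2 + y^2 = 1."""
    m = Fraction(m)
    x = (1 - m * m) / (1 + m * m)
    y = 2 * m / (1 + m * m)
    return x, y


def triple_from_slope(m):
    """Clear denominators of the rational point to get integers (x, y, z) with x^2 + y^2 = z^2."""
    x, y = circle_point(m)
    # least common multiple of the two denominators
    z = x.denominator * y.denominator // gcd(x.denominator, y.denominator)
    return int(x * z), int(y * z), z


if __name__ == "__main__":
    for m in (Fraction(1, 2), Fraction(2, 3), Fraction(1, 4)):
        print(m, circle_point(m), triple_from_slope(m))
```

Writing $m = s/r$ in lowest terms turns the point into $\left(\frac{r^2 - s^2}{r^2 + s^2}, \frac{2rs}{r^2 + s^2}\right)$, which is how this sketch connects back to the parametrization in Theorem 13.2.1.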
 The approach of proof (2) generalizes to arbitrary conic sections.

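Returning to Theorem 13.2.1, the parametrization can be used directly to list primitive triplets. The following is a minimal Python sketch under the theorem's hypotheses ($r > s > 0$, $(r, s) = 1$, opposite parity); the function name `primitive_triples` and the bound `max_z` are illustrative assumptions.

```python
from math import gcd


def primitive_triples(max_z):
    """List primitive triplets (x, y, z) with y even and z <= max_z, sorted by z."""
    triples = []
    r = 2
    while r * r + 1 <= max_z:              # smallest z for this r is r^2 + 1
        for s in range(1, r):
            # opposite parity and coprimality, as required by the theorem
            if (r - s) % 2 == 1 and gcd(r, s) == 1:
                x, y, z = r * r - s * s, 2 * r * s, r * r + s * s
                if z <= max_z:
                    triples.append((x, y, z))
        r += 1
    return sorted(triples, key=lambda t: t[2])


if __name__ == "__main__":
    for triple in primitive_triples(30):
        print(triple)
```

Each triplet produced can be checked against $x^2 + y^2 = z^2$ and $(x, y) = 1$, which gives a quick sanity test of the statement for small values.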
13.3 Lecture Thirty-Seven

Final exam review

At least half of the problems on the final will be taken from homework problems. No calculators are permitted. Below is a brief overview of the important topics covered.

Chapter One – Divisibility
• The Euclidean algorithm: calculating the gcd, Bézout's identity, calculating inverses modulo $m$.
• The fundamental theorem of arithmetic.
• Euclid's theorem.

Chapter Two – Congruences
• The Chinese remainder theorem.
• Euler's theorem; Fermat's little theorem.
• The Euler $\varphi$-function.
• Primitive roots; the structure of $\mathbb{Z}_n^\times$.
• Hensel's lemma.
• Solving linear congruences $ax \equiv b \bmod m$.
• The number of solutions of $x^n \equiv a \bmod p$.

Example problems: Find all $n \in \mathbb{Z}$ such that $3^n \equiv n \bmod 7$. Show that $a^{\varphi(n)} \equiv a^{2\varphi(n)} \bmod n$ for all $a \in \mathbb{Z}$, $n \in \mathbb{N}$. Prove that a squarefree integer $n$ is a Carmichael number if and only if $(p - 1) \mid (n - 1)$ for every $p \mid n$.

Chapter Three – Quadratic Reciprocity and Quadratic Forms
• Sums of two squares.
• The law of quadratic reciprocity.
• Jacobi symbols, Legendre symbols; special known values of the same.
• Quadratic residues and nonresidues.
• Euler's criterion.
• Binary quadratic forms.

Example problem: In $\mathbb{Z}_n^\times$, prove that at most half of the elements are quadratic residues, and that exactly half of them are quadratic residues if and only if $n$ has a primitive root.

Chapter Four – Some Functions of Number Theory
• Multiplicative functions, totally multiplicative functions.
• Dirichlet convolution.
• Möbius inversion.

Chapters Six and Seven – Farey Fractions and Irrational Numbers; Simple Continued Fractions
• Dirichlet's theorem on Diophantine approximation.
• Farey fractions.
• Diophantine approximations to rational and algebraic numbers.
• Continued fractions.
• Pell's equation.

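As a study aid for the Chapter One item on the Euclidean algorithm, the steps one carries out by hand (gcd, Bézout coefficients, inverse modulo $m$) can be sketched in Python. This is an illustration under the usual assumptions ($m > 0$, $(a, m) = 1$ for the inverse to exist); the names `extended_gcd` and `mod_inverse` are hypothetical helpers, not part of the course material.

```python
def extended_gcd(a, b):
    """Return (g, x, y) with g = gcd(a, b) and a*x + b*y = g (Bezout's identity)."""
    old_r, r = a, b
    old_x, x = 1, 0
    old_y, y = 0, 1
    while r != 0:
        q = old_r // r
        # each row of the Euclidean algorithm updates remainder and coefficients together
        old_r, r = r, old_r - q * r
        old_x, x = x, old_x - q * x
        old_y, y = y, old_y - q * y
    return old_r, old_x, old_y


def mod_inverse(a, m):
    """Inverse of a modulo m, assuming gcd(a, m) = 1."""
    g, x, _ = extended_gcd(a % m, m)
    if g != 1:
        raise ValueError("a is not invertible modulo m")
    return x % m


if __name__ == "__main__":
    g, x, y = extended_gcd(240, 46)
    print(g, x, y, 240 * x + 46 * y)
    print(mod_inverse(3, 7))
```

The loop keeps the invariant that each remainder is an integer combination of the two inputs, which is the same bookkeeping used when back-substituting by hand.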