
DIOPHANTINE GEOMETRY – TAUGHT BY JOE SILVERMAN

NOTES BY SHAMIL ASGARLI

1. Diophantine Geometry

Objective. Solve polynomial equations using integers (or rationals).

Example. Linear equation: solve $ax + by = c$ where $a, b, c \in \mathbb{Z}$. It is a classical fact from elementary number theory that this equation has a solution in integers $x, y$ if and only if $\gcd(a, b) \mid c$.

Example. Quadratic equation: $x^2 + y^2 = z^2$. We are interested in non-zero solutions $(x, y, z) \in \mathbb{Z}^3$, i.e. $(x, y, z) \neq (0, 0, 0)$. Since the equation is homogeneous, it is enough to understand the solutions of $X^2 + Y^2 = 1$ where $X, Y \in \mathbb{Q}$ (rational points on the unit circle). The complete solution is known for this problem. WLOG $\gcd(x, y, z) = 1$, $x$ is odd and $y$ is even. Then, up to sign, the solutions are given by
$$x = s^2 - t^2, \qquad y = 2st, \qquad z = s^2 + t^2$$
for coprime integers $s, t$ of opposite parity. Analogously, the solutions $(X, Y) \in \mathbb{Q}^2$ of $X^2 + Y^2 = 1$ are parametrized by
$$X = \frac{s^2 - t^2}{s^2 + t^2}, \qquad Y = \frac{2st}{s^2 + t^2}.$$

Example. Consider the equation $y^2 = x^3 - 2$. It turns out that $(3, \pm 5)$ are the only solutions in $\mathbb{Z}^2$, while there are infinitely many solutions $(x, y) \in \mathbb{Q}^2$.

Goal. Given $f_1, f_2, \ldots, f_k \in \mathbb{Z}[X_1, \ldots, X_n]$ and $R = \mathbb{Z}$, $\mathbb{Q}$, or any other ring, let
$$V(R) = \{(x_1, x_2, \ldots, x_n) \in R^n : f_i(x_1, \ldots, x_n) = 0 \text{ for all } i\}.$$
Describe $V(R)$. Two questions naturally arise.
1) Is $V(R) = \emptyset$? This is undecidable for $R = \mathbb{Z}$.
2) Is $V(R)$ finite?

2 variables, 1 equation. Let $C$ be a curve given by $f(x, y) = 0$ where $f(x, y) \in \mathbb{Z}[x, y]$. The goal is to describe the solutions $(x, y)$ in $\mathbb{Z}^2$, $\mathbb{Q}^2$, $\mathbb{R}^2$ or $\mathbb{C}^2$. As the ring gets bigger and bigger, the task progressively becomes easier. In other words, we are concerned with the solution set $C(R) = \{(x, y) \in R^2 : f(x, y) = 0\}$.

Example. $C : x^3 + y^3 = 1$. Theorem: $C(\mathbb{Q}) = C(\mathbb{Z}) = \{(1, 0), (0, 1)\}$.

Example. $C : x^n + y^n = 1$ with $n \geq 3$. FLT implies $C(\mathbb{Q}) \subseteq \{(\pm 1, 0), (0, \pm 1)\}$, where FLT stands for Fermat's Last Theorem (proved by Wiles).

Idea. If $\deg f(x, y)$ is big, does that necessarily mean fewer solutions? Not necessarily, e.g. $y = x^d$ still has plenty of solutions.

Guiding Principle. Geometry (solutions to polynomial equations over an algebraically closed field) determines the arithmetic (i.e. solutions over the integers or over non-algebraically closed fields).

Consider the curve $C : f(x, y) = 0$. There are some extra points "at infinity". Let $\bar{C} = C \cup \{\text{points at infinity}\}$. Sometimes $\bar{C}$ is a nice curve (smooth). Not so nice are the ones with singularities (think of a cuspidal or nodal cubic curve). We can blow up these curves at their singular points to make them smooth.

Assuming $\bar{C}$ is nice, the set $\bar{C}(\mathbb{C})$ is a nonsingular compact 1-dimensional complex manifold, i.e. a Riemann surface of genus $g$. Intuitively, $g$ counts the number of holes. So $g = 0$ corresponds to the 2-sphere, while $g = 1$ corresponds to the usual torus, etc. So here, the genus $g$ is the "geometry" side (see the Guiding Principle above).

Theorem. Consider the plane curve $C : f(x, y) = 0$ for $f \in \mathbb{Q}[x, y]$. Suppose there are no singularities, so $\bar{C}(\mathbb{C})$ is a $g$-holed torus, where $g$ is the genus of $C$. There are three cases to consider.

Case 1. $g = 0$. Then $C(\mathbb{Q}) = \emptyset$ or $C(\mathbb{Q}) = \mathbb{Q} \cup \{\infty\}$. There exists an algorithm to determine which of the two occurs.

Case 2. $g = 1$. Then $C(\mathbb{Q}) = \emptyset$ or $C(\mathbb{Q})$ is a finitely generated abelian group. This is the Mordell-Weil Theorem. In the latter case, we know that any finitely generated abelian group is of the form
$$(\text{finite abelian group}) \times \mathbb{Z}^r,$$
where the finite factor is the torsion part. The non-negative integer $r$ is called the rank. It is a theorem of Mazur that the torsion part has order at most 16. Furthermore, there exists an algorithm to determine the torsion part. It is not known if the rank $r$ can be unbounded or not. Current record for an example with high rank is $r = 28$ due to Elkies. There is no known algorithm to determine the rank in general.

Case 3. $g \geq 2$. Then $C(\mathbb{Q})$ is finite. This is a theorem of Faltings (this result was previously known as Mordell's Conjecture). There is no algorithm in general to find the solution set.

Goals for the class. We will prove the Mordell-Weil Theorem and Faltings' Theorem (but not by Faltings' original proof).

Key Tools.
(1) Diophantine Approximation: how closely can a rational quantity approximate an irrational quantity? We will learn about results of Roth, Baker and others.
(2) Height Functions: measuring complexity of objects.
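The two opening examples of this section can be tried out on a computer. The following is a minimal sketch, not part of the lectures: the helper names `solve_linear` and `circle_point` are made up for illustration. It finds one integer solution of $ax + by = c$ with the extended Euclidean algorithm (or reports that none exists) and produces rational points on the unit circle from the parametrization in $s, t$.

```python
from fractions import Fraction

def solve_linear(a, b, c):
    """Return one integer solution (x, y) of a*x + b*y = c, or None if there is none."""
    # Extended Euclid: keep the invariants r0 = a*x0 + b*y0 and r1 = a*x1 + b*y1.
    r0, x0, y0 = a, 1, 0
    r1, x1, y1 = b, 0, 1
    while r1 != 0:
        q = r0 // r1
        r0, x0, y0, r1, x1, y1 = r1, x1, y1, r0 - q * r1, x0 - q * x1, y0 - q * y1
    g = r0  # equals gcd(a, b) up to sign
    if g == 0:
        # a = b = 0: solvable only when c = 0
        return (0, 0) if c == 0 else None
    if c % g != 0:
        return None  # gcd(a, b) does not divide c
    k = c // g
    return x0 * k, y0 * k

def circle_point(s, t):
    """Rational point (X, Y) on X^2 + Y^2 = 1 attached to integers (s, t) != (0, 0)."""
    n = s * s + t * t
    return Fraction(s * s - t * t, n), Fraction(2 * s * t, n)
```

For instance, `solve_linear(6, 10, 7)` returns `None` because $\gcd(6, 10) = 2$ does not divide 7, and `circle_point(2, 1)` gives the point coming from the triple $(3, 4, 5)$.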

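2. Diophantine Approximation

Let us say a few words about Diophantine Approximation. First, since $\mathbb{Q}$ is dense in $\mathbb{R}$, it is true that
$$\inf_{a/b \in \mathbb{Q}} \left| \frac{a}{b} - \sqrt{2} \right| = 0.$$
However, we are interested in approximating $\sqrt{2}$ with rationals whose denominators are not so large (relatively speaking). For example, here are two facts that are easy to prove:

(1) There are infinitely many $\frac{a}{b} \in \mathbb{Q}$ with $\gcd(a, b) = 1$ satisfying $\left| \frac{a}{b} - \sqrt{2} \right| < \frac{1}{b^2}$.
(2) There are only finitely many $\frac{a}{b} \in \mathbb{Q}$ with $\gcd(a, b) = 1$ satisfying $\left| \frac{a}{b} - \sqrt{2} \right| < \frac{1}{b^3}$.

In fact, let's prove a more general result that implies the second statement.

Theorem. (Liouville) Let $\varepsilon > 0$. If $x \in \mathbb{R}$ satisfies a degree $n$ polynomial with coefficients in $\mathbb{Q}$, then
$$\left| \frac{a}{b} - x \right| < \frac{1}{b^{n+\varepsilon}}$$
has only finitely many solutions $\frac{a}{b}$ with $\gcd(a, b) = 1$.

Remark. The following proof was communicated to me by Ming Hao Quek.

Proof. Consider the set $S$ defined by
$$S := \left\{ \frac{a}{b} \in \mathbb{Q} : \left| \frac{a}{b} - x \right| < \frac{1}{b^{n+\varepsilon}} \right\}.$$
Assume, to the contrary, that $S$ is infinite. Say $x$ satisfies a monic polynomial $P(X) \in \mathbb{Q}[X]$ of degree $n$. Let $P(X) = \prod_{i=1}^{n} (X - x_i)$ where $x_i \in \mathbb{C}$ and $x_1 = x$. Given $\frac{a}{b} \in S$, $P\left(\frac{a}{b}\right)$ is a rational number with denominator at most $D b^n$ for some fixed $D > 0$. Since $P(X)$ has only finitely many roots, and $S$ is infinite, the subset
$$S' := \left\{ \frac{a}{b} \in S : P\left(\frac{a}{b}\right) \neq 0 \right\}$$
must be infinite as well. For all $\frac{a}{b} \in S'$, we have
$$\left| P\left(\frac{a}{b}\right) \right| \geq \frac{1}{D b^n}.$$
On the other hand, for all $\frac{a}{b} \in S'$,
$$\left| P\left(\frac{a}{b}\right) \right| = \left| \frac{a}{b} - x \right| \left| \frac{a}{b} - x_2 \right| \cdots \left| \frac{a}{b} - x_n \right| \leq \left| \frac{a}{b} - x \right| \prod_{i=2}^{n} \left( \left| \frac{a}{b} - x \right| + |x - x_i| \right) \leq \left| \frac{a}{b} - x \right| (1 + \delta)^{n-1} = \Delta \left| \frac{a}{b} - x \right| \leq \frac{\Delta}{b^{n+\varepsilon}},$$
where we used $\left| \frac{a}{b} - x \right| \leq 1$ and $|x - x_i| \leq \delta$. Here $\delta$ is any upper bound for the differences of the roots of $P(X)$, and $\Delta := (1 + \delta)^{n-1}$ only depends on $x$. Combining the upper and lower bounds, we get
$$\frac{1}{D b^n} \leq \frac{\Delta}{b^{n+\varepsilon}} \quad \Longrightarrow \quad b^{n+\varepsilon} \leq D \Delta b^n, \text{ i.e. } b^{\varepsilon} \leq D\Delta,$$
for all $\frac{a}{b} \in S'$. Since $S'$ is infinite, we can choose $\frac{a}{b} \in S'$ where $b$ is arbitrarily large, and this leads to a contradiction.

Using the above theorem, we easily see that there are only finitely many $\frac{a}{b} \in \mathbb{Q}$ with $\gcd(a, b) = 1$ satisfying $\left| \frac{a}{b} - \sqrt{2} \right| < \frac{1}{b^3}$. Similarly, for any $\varepsilon > 0$, there are only finitely many $\frac{a}{b} \in \mathbb{Q}$ with $\gcd(a, b) = 1$ satisfying $\left| \frac{a}{b} - \sqrt[3]{2} \right| < \frac{1}{b^{3+\varepsilon}}$.

Fact. It is also true, but much harder to prove, that there are only finitely many $\frac{a}{b} \in \mathbb{Q}$ with $\gcd(a, b) = 1$ satisfying $\left| \frac{a}{b} - \sqrt[3]{2} \right| < \frac{1}{b^{2.5}}$. This would instantly follow from Roth's celebrated theorem.

Theorem. (Roth) Let $\varepsilon > 0$. If $x$ is an algebraic number, then there are only finitely many $\frac{a}{b} \in \mathbb{Q}$ with $\gcd(a, b) = 1$ satisfying
$$\left| \frac{a}{b} - x \right| < \frac{1}{b^{2+\varepsilon}}.$$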
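Facts (1) and (2) about $\sqrt{2}$ can be observed numerically. The sketch below is an illustration only and proves nothing; the function names are made up here. It uses the continued-fraction convergents $a/b$ of $\sqrt{2}$, which satisfy the recurrence $(a, b) \mapsto (a + 2b, a + b)$, and tests the inequality $|a/b - \sqrt{2}| < 1/b^k$ exactly, by comparing squares of rationals with 2 instead of using floating point.

```python
from fractions import Fraction

def closer_than(a, b, k):
    """Exact test of |a/b - sqrt(2)| < 1/b**k for integers a and b > 0."""
    lo = Fraction(a, b) - Fraction(1, b**k)
    hi = Fraction(a, b) + Fraction(1, b**k)
    # The inequality says lo < sqrt(2) < hi; decide each side by squaring.
    below = lo < 0 or lo * lo < 2
    above = hi > 0 and hi * hi > 2
    return below and above

def sqrt2_convergents(count):
    """First `count` continued-fraction convergents (a, b) of sqrt(2): 1/1, 3/2, 7/5, ..."""
    a, b, out = 1, 1, []
    for _ in range(count):
        out.append((a, b))
        a, b = a + 2 * b, a + b
    return out

convs = sqrt2_convergents(20)
good_k2 = [(a, b) for a, b in convs if closer_than(a, b, 2)]  # exponent 2
good_k3 = [(a, b) for a, b in convs if closer_than(a, b, 3)]  # exponent 3
```

Every convergent passes the test with exponent 2, in line with (1), while only the first two convergents pass it with exponent 3, in line with (2).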

3. Background

We have the standard definitions of affine and projective spaces.
$$\mathbb{A}^n(K) = \{(x_1, \ldots, x_n) : x_i \in K\}$$
$$\mathbb{P}^n(K) = \left(\mathbb{A}^{n+1}(K) \setminus \{0\}\right)/K^* = \{[x_0 : \cdots : x_n] = [\lambda x_0 : \cdots : \lambda x_n] \text{ for } \lambda \in K^*\}$$

Let $K$ be a number field, and fix an algebraic closure $\bar{K}$. The Galois group $G_K = \mathrm{Gal}(\bar{K}/K)$ acts on $\mathbb{A}^n(\bar{K})$ by $\sigma(P) = (\sigma(x_1), \ldots, \sigma(x_n))$ for $P = (x_1, \ldots, x_n)$. Then $\mathbb{A}^n(\bar{K})^{G_K}$, the set of fixed points of $\mathbb{A}^n(\bar{K})$, equals $\mathbb{A}^n(K)$. Likewise, $G_K$ acts on $\mathbb{P}^n(\bar{K})$ by $\sigma(P) = [\sigma(x_0) : \cdots : \sigma(x_n)]$ for $P = [x_0 : \cdots : x_n]$.

Proposition. $\mathbb{P}^n(\bar{K})^{G_K} = \mathbb{P}^n(K)$.

Proof. This is an application of Hilbert's Theorem 90.

Definition. We say that $f(x_0, x_1, \ldots, x_n)$ is a homogeneous polynomial of degree $d$ if
$$f = \sum_{\substack{I = (i_0, \ldots, i_n) \\ i_0 + \cdots + i_n = d}} a_I \, x_0^{i_0} \cdots x_n^{i_n},$$
or equivalently, $f(\lambda x_0, \ldots, \lambda x_n) = \lambda^d f(x_0, \ldots, x_n)$ in the ring $K[\lambda, x_0, \ldots, x_n]$. For $P \in \mathbb{P}^n(K)$, the value $f(P)$ is not well-defined, but the set $\{P : f(P) = 0\}$ is well-defined.

Definition. A rational map $f : \mathbb{P}^n \dashrightarrow \mathbb{P}^m$ is $f = [f_0, \ldots, f_m]$ with $f_0, \ldots, f_m \in \bar{K}[x_0, \ldots, x_n]$ homogeneous of the same degree $d$. To be more pedantic, we could have written $f : \mathbb{P}^n(\bar{K}) \dashrightarrow \mathbb{P}^m(\bar{K})$. Then $f(P)$ is (almost) well-defined: if $P = [a_0, \ldots, a_n]$, then $f(P) = [f_0(a_0, \ldots, a_n), \ldots, f_m(a_0, \ldots, a_n)]$ is well-defined when $f_i(a_0, \ldots, a_n) \neq 0$ for some $i$. Since $\bar{K}[x_0, \ldots, x_n]$ is a UFD, we can assume that $f_0, f_1, \ldots, f_m$ have no common irreducible factor, that is, $\gcd(f_0, \ldots, f_m) = 1$, in which case we define the degree of $f$ to be $d$. For a rational map $f : \mathbb{P}^n \dashrightarrow \mathbb{P}^m$, its indeterminacy locus is defined by
$$I_f = \{P \in \mathbb{P}^n(\bar{K}) : f_0(P) = \cdots = f_m(P) = 0\}.$$
So $f$ gives a function $f : \mathbb{P}^n(\bar{K}) \setminus I_f \to \mathbb{P}^m(\bar{K})$.

Definition. A rational map $f : \mathbb{P}^n \dashrightarrow \mathbb{P}^m$ is called a morphism if $I_f = \emptyset$.

Example. Consider the rational map $f : \mathbb{P}^2 \dashrightarrow \mathbb{P}^2$ given by $[x, y, z] \mapsto [x^2, xy, z^2]$. Then $I_f = \{[0, 1, 0]\}$ and so $f$ is not a morphism.

Nullstellensatz. Given $F_1, F_2, \ldots, F_N \in \bar{K}[x_0, \ldots, x_n]$ homogeneous (not necessarily of the same degree), let
$$V(F) := V(F_1, \ldots, F_N) := \{P \in \mathbb{P}^n(\bar{K}) : F_1(P) = \cdots = F_N(P) = 0\}.$$
Suppose $V(F) = \emptyset$. Then the Nullstellensatz says that the radical $\sqrt{\langle F_1, \ldots, F_N \rangle}$ contains $(x_0, \ldots, x_n)$. In other words, for each $0 \leq i \leq n$, there exist $k_i \in \mathbb{N}$ and polynomials $G_{i1}, \ldots, G_{iN}$ such that
$$x_i^{k_i} = \sum_{j=1}^{N} G_{ij} F_j.$$

4. Height Functions

Moral: "height(object) = complexity".

Given $P = [x_0, \ldots, x_n] \in \mathbb{P}^n(\mathbb{Q})$, assume WLOG $x_i \in \mathbb{Z}$. Since $\mathbb{Z}$ is a UFD, we can assume WLOG $\gcd(x_0, \ldots, x_n) = 1$, in which case we say the coordinates are normalized (normalization in this case is unique up to $\pm 1$). To describe $P$ takes roughly $\sum_{i=0}^{n} (\log_2 |x_i| + 1)$ bits.

Definition. Given $P = [x_0, \ldots, x_n] \in \mathbb{P}^n(\mathbb{Q})$ with normalized coordinates:
(1) The (logarithmic) height of $P$ is $h(P) = \log \left( \max_{0 \leq i \leq n} |x_i| \right)$.
(2) The (multiplicative) height of $P$ is $H(P) = \max_{0 \leq i \leq n} |x_i|$.

Theorem. For any fixed $B > 0$ and $n \in \mathbb{N}$,
$$\#\{P \in \mathbb{P}^n(\mathbb{Q}) : h(P) \leq B\} < \infty.$$

Proof. Write $P = [x_0, \ldots, x_n] \in \mathbb{P}^n(\mathbb{Q})$ in a normalized form. Note that $H(P) = \max_{0 \leq i \leq n} |x_i| \leq B'$ where $B' = e^B$. The number of choices for each $x_i$ is at most $2B' + 1$, namely the number of integers in the interval $[-B', B']$. Thus,
$$\#\{P \in \mathbb{P}^n(\mathbb{Q}) : h(P) \leq B\} = \#\{P \in \mathbb{P}^n(\mathbb{Q}) : H(P) \leq B'\} \leq (2B' + 1)^{n+1} < \infty$$
as desired.

In fact, the size of $\{P \in \mathbb{P}^n(\mathbb{Q}) : H(P) \leq B'\}$ is asymptotic to $k_n (B')^{n+1}$ where $k_n = \frac{2^n}{\zeta(n+1)}$. In symbols,
$$\lim_{B \to \infty} \frac{\#\{P \in \mathbb{P}^n(\mathbb{Q}) : H(P) \leq B\}}{k_n B^{n+1}} = 1.$$
It would be an instructive exercise to check this for $n = 1$.
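The exercise for $n = 1$ can at least be explored numerically. The sketch below is a sanity check rather than a proof, and the function names are chosen here for illustration. It counts the points of $\mathbb{P}^1(\mathbb{Q})$ with $H(P) \leq B$ by brute force and compares the count with the predicted main term $\frac{2}{\zeta(2)} B^2 = \frac{12}{\pi^2} B^2$.

```python
from math import gcd, pi

def count_points_P1(B):
    """Number of points of P^1(Q) with multiplicative height H(P) <= B (B a positive integer)."""
    pairs = 0
    for x0 in range(-B, B + 1):
        for x1 in range(-B, B + 1):
            # gcd == 1 selects normalized coordinates and excludes (0, 0), since gcd(0, 0) = 0.
            if gcd(x0, x1) == 1:
                pairs += 1
    # (x0, x1) and (-x0, -x1) represent the same projective point.
    return pairs // 2

def schanuel_ratio(B):
    """Exact count divided by the predicted main term 12 * B^2 / pi^2."""
    return count_points_P1(B) / (12 * B * B / pi**2)
```

For small bounds the count is easy to confirm by hand: for $B = 1$ the points are $[1:0]$, $[0:1]$, $[1:1]$ and $[1:-1]$. The ratio returned by `schanuel_ratio` should approach 1 as $B$ grows.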

5. Absolute Values

Recall that
$$M_{\mathbb{Q}} := \{\text{(normalized) absolute values on } \mathbb{Q}\} = \{|\cdot|_\infty\} \cup \{|\cdot|_p : p \in \mathbb{Z} \text{ is prime}\}.$$
For a given $a \in \mathbb{Q}$, we have $|a|_\infty = \max(a, -a)$ and, for $a \neq 0$, $|a|_p = p^{-\mathrm{ord}_p(a)}$, where $\mathrm{ord}_p(a) = r$ is defined to be the unique integer such that $a = p^r \frac{b}{c}$ with $p \nmid b$ and $p \nmid c$.

Product Formula. For any $a \in \mathbb{Q} \setminus \{0\}$, we have
$$\prod_{v \in M_{\mathbb{Q}}} |a|_v = 1.$$

In general, for any number field $K$, we have
$$M_K := \{\text{absolute values on } K \text{ extending those in } M_{\mathbb{Q}}\}.$$
We usually decompose $M_K = M_K^\infty \cup M_K^0$, where $M_K^\infty$ consists of the archimedean places and $M_K^0$ consists of the non-archimedean places. Given a tower of field extensions $L/K/\mathbb{Q}$ and places $w \in M_L$, $v \in M_K$, we say that $w \mid v$ if $|\cdot|_w = |\cdot|_v$ on $K$.

Notation. We denote by $K_v$ the completion of $K$ at $v$. If $v \in M_K^\infty$, then $K_v = \mathbb{R}$ or $\mathbb{C}$.

Let $n_v$ be the local degree of $v$, which is defined to be $n_v := [K_v : \mathbb{Q}_v]$. If $v \in M_K^\infty$, then $n_v = 1$ or $2$. If $v \in M_K^0$, where $v$ corresponds to a prime $\mathfrak{p}$ lying over a prime $p \in \mathbb{Z}$, then
$$n_v = e(\mathfrak{p}/p) f(\mathfrak{p}/p),$$
where $e$ and $f$ denote the ramification index and the inertia degree, respectively.

Definition. $\|a\|_v = |a|_v^{n_v}$ (a slightly different normalization).

Product Formula. For $a \in K^*$, we have
$$\prod_{v \in M_K} \|a\|_v = 1.$$
The proof of this result uses
$$\prod_{v \in M_K^\infty} \|a\|_v = |N_{K/\mathbb{Q}}\, a| \quad \text{and} \quad \prod_{v \in M_K^0} \|a\|_v = \prod_{\mathfrak{p} \mid (a)} N\mathfrak{p}^{-\mathrm{ord}_{\mathfrak{p}}(a)} = |N_{K/\mathbb{Q}}\, a|^{-1}.$$

Recall that $R_K$ denotes the ring of integers in a number field $K$. We can interpret the ring of integers in terms of places:
$$R_K = \{a \in K : |a|_v \leq 1 \text{ for all } v \in M_K^0\} = \{a \in K : |a|_v \leq 1 \text{ for all } v \notin M_K^\infty\}.$$
More generally, let $S$ be a finite set such that $M_K^\infty \subseteq S \subset M_K$.

Definition. The ring of $S$-integers is defined by
$$R_S := \{a \in K : |a|_v \leq 1 \text{ for all } v \notin S\}.$$
In other words, we allow the finite set $S$ of primes to occur in the denominator.

Theorem. (Dirichlet's Unit Theorem)
$$R_S^* \cong \mu_K \times \mathbb{Z}^{\#S - 1} = \mu_K \times \mathbb{Z}^{r_1 + r_2 + \#(S \cap M_K^0) - 1},$$
where $\mu_K$ denotes the roots of unity in $K$, and $r_1, r_2$ denote the number of real and complex (counted in pairs with its complex conjugate) embeddings of $K$, i.e. $[K : \mathbb{Q}] = r_1 + 2 r_2$.

6. Height Functions on Number Fields

Let $K/\mathbb{Q}$ be a number field. Given a point $P = [x_0, \ldots, x_n] \in \mathbb{P}^n(K)$, the (relative multiplicative) height of $P$ is
$$H_K(P) = \prod_{v \in M_K} \max_{0 \leq i \leq n} \|x_i\|_v.$$
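Over $K = \mathbb{Q}$ the absolute values of Section 5 and the product formula are concrete enough to compute. The following sketch uses exact rational arithmetic; the helper names `ord_p`, `abs_p`, `prime_factors` and `product_of_absolute_values` are chosen here for illustration. It multiplies $|a|_\infty$ with $|a|_p$ over the primes dividing the numerator or denominator of $a$ (all other primes contribute 1).

```python
from fractions import Fraction

def ord_p(a, p):
    """p-adic order of a nonzero rational a."""
    a = Fraction(a)
    num, den, r = a.numerator, a.denominator, 0
    while num % p == 0:
        num //= p
        r += 1
    while den % p == 0:
        den //= p
        r -= 1
    return r

def abs_p(a, p):
    """Normalized p-adic absolute value |a|_p = p^(-ord_p(a)) of a nonzero rational."""
    return Fraction(p) ** (-ord_p(a, p))

def prime_factors(n):
    """Set of primes dividing the positive integer n (trial division)."""
    factors, d = set(), 2
    while d * d <= n:
        while n % d == 0:
            factors.add(d)
            n //= d
        d += 1
    if n > 1:
        factors.add(n)
    return factors

def product_of_absolute_values(a):
    """Product of |a|_v over all places v of Q, for a nonzero rational a."""
    a = Fraction(a)
    primes = prime_factors(abs(a.numerator)) | prime_factors(a.denominator)
    total = abs(a)  # the archimedean factor |a|_infinity
    for p in primes:
        total *= abs_p(a, p)
    return total
```

For example, `product_of_absolute_values(Fraction(-360, 77))` evaluates to 1, as the product formula predicts.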

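The (relative logarithmic) height of $P$ is $h_K(P) := \log H_K(P)$. The word "relative" here means relative to $K$.

Proposition.
(1) $H_K(P)$ is well-defined.
(2) $H_K(P) \geq 1$.
(3) For a field extension $L/K$, $H_L(P) = H_K(P)^{[L:K]}$.

Proof. See the book [HS00]. For (2), a different way to proceed is to note that $x_j \neq 0$ for some $j$, so
$$H_K(P) = \prod_{v \in M_K} \max_{0 \leq i \leq n} \|x_i\|_v \geq \prod_{v \in M_K} \|x_j\|_v = 1,$$
where the last equality follows from the product formula.

Definition. (Northcott) The (absolute multiplicative) height of $P = [x_0, \ldots, x_n] \in \mathbb{P}^n(\bar{\mathbb{Q}})$ is defined as follows:
• Let $K/\mathbb{Q}$ be a number field such that $P \in \mathbb{P}^n(K)$.
• Define $H(P) := H_K(P)^{1/[K:\mathbb{Q}]} \geq 1$.

Part (3) of the proposition implies that $H(P)$ is independent of the choice of $K/\mathbb{Q}$ such that $P \in \mathbb{P}^n(K)$. Indeed, suppose $L/K/\mathbb{Q}$ is such that $P \in \mathbb{P}^n(K)$, so that $P \in \mathbb{P}^n(L)$ as well. To show independence of $H(P)$ on $K$ or $L$, observe that
$$H_L(P)^{1/[L:\mathbb{Q}]} = \left( H_K(P)^{[L:K]} \right)^{1/[L:\mathbb{Q}]} = H_K(P)^{[L:K]/[L:\mathbb{Q}]} = H_K(P)^{1/[K:\mathbb{Q}]}.$$
(Two arbitrary fields over which $P$ is defined can be compared through their compositum.)

The (absolute logarithmic) height of $P$ is
$$h(P) = \log H(P) = \frac{1}{[K : \mathbb{Q}]} h_K(P),$$
where $K/\mathbb{Q}$ is any number field such that $P \in \mathbb{P}^n(K)$.

Notation. For $a \in \bar{\mathbb{Q}}$, we can consider $[a, 1] \in \mathbb{P}^1(\bar{\mathbb{Q}})$. We set
$$H(a) := H([a, 1]) = \left( \prod_{v \in M_K} \max(\|a\|_v, 1) \right)^{1/[K:\mathbb{Q}]}$$
and $h(a) := h([a, 1])$.

Proposition. Given $P \in \mathbb{P}^n(\bar{\mathbb{Q}})$ and $\sigma \in G_{\mathbb{Q}} = \mathrm{Gal}(\bar{\mathbb{Q}}/\mathbb{Q})$, we have $H(\sigma(P)) = H(P)$.

Proof. See the book [HS00].

Theorem. Let $K$ be a number field. The set
$$\{P \in \mathbb{P}^n(K) : H_K(P) \leq B\}$$
is finite.

Definition. The field of definition of $P = [x_0, \ldots, x_n] \in \mathbb{P}^n(\bar{\mathbb{Q}})$ is
$$\mathbb{Q}(P) = \mathbb{Q}\left( \frac{x_0}{x_j}, \frac{x_1}{x_j}, \ldots, \frac{x_n}{x_j} \right)$$
for some $j$ such that $x_j \neq 0$. Alternatively, let $G_P = \{\sigma \in G_{\mathbb{Q}} : \sigma(P) = P\}$. Then $\mathbb{Q}(P) = \bar{\mathbb{Q}}^{G_P}$ is called the field of moduli for $P$.

Theorem. For any fixed $B, D > 0$, the set
$$\{P \in \mathbb{P}^n(\bar{\mathbb{Q}}) : H(P) \leq B \text{ and } [\mathbb{Q}(P) : \mathbb{Q}] \leq D\}$$
is finite.

Notation. The set above is denoted $\mathbb{P}^n(\bar{\mathbb{Q}})_{B,D}$, and we put $D(P) := [\mathbb{Q}(P) : \mathbb{Q}]$.

Proof. Let $P = [x_0, \ldots, x_n] \in \mathbb{P}^n(\bar{\mathbb{Q}})_{B,D}$, and let $K = \mathbb{Q}(P)$. WLOG assume that some $x_j = 1$. So, for any $k$,
$$\max_{0 \leq i \leq n} \|x_i\|_v \geq \max(\|x_k\|_v, 1).$$
So, writing $D_P := D(P) = [K : \mathbb{Q}]$,
$$(1) \qquad \left( \prod_{v \in M_K} \max_{0 \leq i \leq n} \|x_i\|_v \right)^{1/D_P} \geq \left( \prod_{v \in M_K} \max(\|x_k\|_v, 1) \right)^{1/D_P} = H(x_k)$$
for each $k$, where the left-hand side equals $H(P) \leq B$. Also, $\mathbb{Q}(x_k) \subseteq \mathbb{Q}(x_0, \ldots, x_n) = K$, which gives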

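$$(2) \qquad D(x_k) \leq D(P).$$
Combine inequalities (1) and (2) to get
$$\#\mathbb{P}^n(\bar{\mathbb{Q}})_{B,D} \leq \left( \#\mathbb{P}^1(\bar{\mathbb{Q}})_{B,D} \right)^{n+1}.$$
In view of the last inequality, it is enough to show that
$$\#\{\alpha \in \bar{\mathbb{Q}} : H(\alpha) \leq B, \ D(\alpha) = d\} < \infty$$
for all $1 \leq d \leq D$. Let $F_\alpha(X)$ be the minimal polynomial of $\alpha$ over $\mathbb{Q}$. We can write
$$F_\alpha(X) = \prod_{i=1}^{d} (X - \alpha_i) = \sum_{j=0}^{d} (-1)^j \sigma_j(\alpha_1, \ldots, \alpha_d) X^{d-j},$$
where $\alpha = \alpha_1, \alpha_2, \ldots, \alpha_d$ are the conjugates of $\alpha$ and $\sigma_j$ is the $j$-th elementary symmetric polynomial. In the analysis that follows below, we will use the notation
$$N_v = \begin{cases} N & \text{if } v \text{ is archimedean} \\ 1 & \text{if } v \text{ is non-archimedean} \end{cases}$$
for any positive integer $N$. Let $K$ be a number field containing all the $\alpha_i$. For $v \in M_K$,
$$|\sigma_j(\alpha_1, \ldots, \alpha_d)|_v = \left| \sum_{1 \leq i_1 < \cdots < i_j \leq d} \alpha_{i_1} \alpha_{i_2} \cdots \alpha_{i_j} \right|_v \leq \binom{d}{j}_v \max_{i_1 < \cdots < i_j} |\alpha_{i_1} \cdots \alpha_{i_j}|_v \leq 2_v^d \prod_{i=1}^{d} \max(|\alpha_i|_v, 1)^d.$$
Applying the usual procedure of raising the expression to the $n_v$-th power, multiplying over all $v \in M_K$ and taking $[K : \mathbb{Q}]$-th roots, one obtains
$$H([\sigma_0(\alpha), \ldots, \sigma_d(\alpha)]) \leq \left( \prod_{v \in M_K} 2_v^{d n_v} \right)^{1/[K:\mathbb{Q}]} \prod_{i=1}^{d} H(\alpha_i)^d = 2^d H(\alpha)^{d^2},$$
where for the factor $2^d$ we used $\sum_{v \in M_K^\infty} n_v = [K : \mathbb{Q}]$, and at the end we used that conjugate elements have the same height. We have a map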

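$$\{\alpha \in \bar{\mathbb{Q}} : H(\alpha) \leq B, \ D(\alpha) = d\} \longrightarrow \{X^d + a_1 X^{d-1} + \cdots + a_d : a_i \in \mathbb{Q}, \ H([1, a_1, \ldots, a_d]) \leq C\}, \qquad C := 2^d B^{d^2},$$
sending $\alpha$ to its minimal polynomial, which is $d$-to-1. But we saw earlier that
$$\#\{X^d + a_1 X^{d-1} + \cdots + a_d : a_i \in \mathbb{Q}, \ H([1, a_1, \ldots, a_d]) \leq C\} \leq \#\{P \in \mathbb{P}^d(\mathbb{Q}) : H(P) \leq C\} < \infty,$$
which completes the proof.

Exercise. Show that $\#\mathbb{P}^n(\bar{\mathbb{Q}})_{B,D} \leq c_1(n, D) \cdot B^{c_2(n, D)}$.

Corollary. (Kronecker's Theorem) Let $\alpha \in \bar{\mathbb{Q}}^*$. Then $h(\alpha) = 0$ if and only if $\alpha$ is a root of unity.

Proof. First, we need a preliminary result, namely that $h(\alpha^n) = n h(\alpha)$ or equivalently, $H_K(\alpha^n) = H_K(\alpha)^n$. Indeed,
$$H_K(\alpha^n) = \prod_{v \in M_K} \max(\|\alpha^n\|_v, 1) = \left( \prod_{v \in M_K} \max(\|\alpha\|_v, 1) \right)^n = H_K(\alpha)^n.$$
We can proceed to the proof of the corollary.

($\Leftarrow$) This direction is clear: if $\alpha^n = 1$, then $n h(\alpha) = h(\alpha^n) = h(1) = 0$ because $H(1) = 1$. We get $h(\alpha) = 0$.

($\Rightarrow$) Suppose that $h(\alpha) = 0$. Then $h(\alpha^n) = n h(\alpha) = 0$ for all $n$. Hence,
$$\{1, \alpha, \alpha^2, \alpha^3, \ldots\} \subseteq \{\beta \in \bar{\mathbb{Q}} : H(\beta) \leq 1 \text{ and } D(\beta) \leq D(\alpha)\},$$
and the set on the right is finite. Thus, $\alpha^m = \alpha^n$ for some $m > n$, which gives $\alpha^{m-n} = 1$, i.e. $\alpha$ is a root of unity.

Lehmer's Conjecture. There exists $c > 0$ such that for all $\alpha \in \bar{\mathbb{Q}}^*$ with $\alpha$ not a root of unity,
$$h(\alpha) \geq \frac{c}{D(\alpha)}.$$

Theorem. (Dobrowolski) [Dob79]. Let $\varepsilon > 0$. There exists a constant $c_\varepsilon > 0$ such that for all $\alpha \in \bar{\mathbb{Q}}^*$ with $\alpha$ not a root of unity,
$$h(\alpha) \geq \frac{c_\varepsilon}{D(\alpha)} \left( \frac{\log \log D(\alpha)}{\log D(\alpha)} \right)^{6+\varepsilon}.$$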
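For $\alpha \in \mathbb{Q}$ the height is simply $H(a/b) = \max(|a|, |b|)$ in lowest terms, so the identity $H(\alpha^n) = H(\alpha)^n$ used in the proof of Kronecker's theorem, and the theorem itself, can be checked by hand or by machine. The sketch below covers only this rational special case; the function names are illustrative.

```python
from fractions import Fraction

def H(a):
    """Multiplicative height H([a, 1]) of a rational a: max(|numerator|, denominator) in lowest terms."""
    a = Fraction(a)
    return max(abs(a.numerator), a.denominator)

def nonzero_rationals_of_height_one(bound):
    """Nonzero rationals p/q with |p|, q <= bound whose height equals 1."""
    found = set()
    for p in range(-bound, bound + 1):
        for q in range(1, bound + 1):
            a = Fraction(p, q)
            if a != 0 and H(a) == 1:
                found.add(a)
    return found
```

The only nonzero rationals of height 1 (logarithmic height 0) are $\pm 1$, which are exactly the roots of unity in $\mathbb{Q}$. Note that $H(0) = 1$ as well, which is why the corollary is stated for $\alpha \neq 0$.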

7. Comparing heights of P and f(P)

Recall that we have defined the absolute logarithmic height $h : \mathbb{P}^n(\bar{\mathbb{Q}}) \to [0, \infty)$. Let $f : \mathbb{P}^n \dashrightarrow \mathbb{P}^m$ be a rational map of degree $d \geq 1$. In other words, $f$ is given by $[f_0, \ldots, f_m]$ where the $f_i \in \bar{\mathbb{Q}}[x_0, \ldots, x_n]$ are homogeneous of degree $d$. Intuitively, we would expect that $H(f(P)) \approx H(P)^d$ so that $h(f(P)) \approx d \cdot h(P)$.

Example. Consider the map $f : \mathbb{P}^n \dashrightarrow \mathbb{P}^n$ given by $f[x_0, \ldots, x_n] = [x_0^d, x_1^d, \ldots, x_n^d]$. In this case, $H(f(P)) = H(P)^d$ holds exactly.

The next theorem explains how the heights of $P$ and $f(P)$ compare.

Theorem. (a) There is a constant $c_1(f)$ so that for all $P \in \mathbb{P}^n(\bar{\mathbb{Q}})$ at which $f$ is defined,
$$h(f(P)) \leq d \cdot h(P) + c_1(f).$$
(b) Assume that $f$ is a morphism (i.e. the indeterminacy locus $I_f = \emptyset$). There is a constant $c_2(f)$ so that for all $P \in \mathbb{P}^n(\bar{\mathbb{Q}})$,
$$h(f(P)) \geq d \cdot h(P) - c_2(f).$$

Remark. The hypothesis that $f$ is a morphism is necessary for the conclusion in part (b) to hold. Indeed, consider the rational map $f : \mathbb{P}^2 \dashrightarrow \mathbb{P}^2$ given by $f([x, y, z]) = [x^2, xy, z^2]$. Note that $f([a, b, a]) = [a^2, ab, a^2] = [a, b, a]$ for every $[a, b] \in \mathbb{P}^1(\mathbb{Q})$ with $a \neq 0$. Thus, $h(f([a, b, a])) = h([a, b, a])$, and since these heights are unbounded, the lower bound $h(f(P)) \geq 2 \cdot h(P) - c_2(f)$ cannot possibly hold for any constant $c_2(f)$. The problem is that $f$ is not a morphism. In fact, $I_f = \{[0, 1, 0]\}$.

Proof. (a) Let $P = [x_0, \ldots, x_n] \in \mathbb{P}^n(\bar{\mathbb{Q}})$ and $f = [f_0, \ldots, f_m]$. We can write each $f_i$ as

$$f_i = \sum_{j_0 + \cdots + j_n = d} a_{i, j_0, \ldots, j_n} \, x_0^{j_0} x_1^{j_1} \cdots x_n^{j_n}.$$
By enlarging the field, if necessary, we can assume that there is a number field $K$ such that $x_i \in K$ and $a_{i, \vec{j}} \in K$ for all $i$ and $\vec{j} = (j_0, \ldots, j_n)$. Let $v \in M_K$ be a place. Then
$$|f_i(P)|_v = \left| \sum_{j_0 + \cdots + j_n = d} a_{i, \vec{j}} \, x_0^{j_0} \cdots x_n^{j_n} \right|_v \leq \binom{n+d}{d}_v \cdot \max_{\vec{j}} |a_{i, \vec{j}}|_v \cdot \max_{\vec{j}} |x_0^{j_0} \cdots x_n^{j_n}|_v \leq 2_v^{n+d} \cdot |f_i|_v \cdot \max_{0 \leq k \leq n} |x_k|_v^d.$$
In the last step, we simply defined $|f_i|_v$ to be $\max_{\vec{j}} |a_{i, \vec{j}}|_v$, i.e. it measures the largest absolute value among the coefficients of $f_i$. Since $f = [f_0, \ldots, f_m]$, we can define $|f|_v = \max_{0 \leq i \leq m} |f_i|_v$. We get
$$|f_i(P)|_v \leq 2_v^{n+d} \cdot |f|_v \cdot \max_{0 \leq k \leq n} |x_k|_v^d$$
for each $0 \leq i \leq m$. The right hand side is independent of $i$, so
$$\max_{0 \leq i \leq m} |f_i(P)|_v \leq 2_v^{n+d} \cdot |f|_v \cdot \max_{0 \leq k \leq n} |x_k|_v^d.$$
Now we do the usual procedure: we raise the inequality to the $n_v$-th power (local degrees), multiply over all $v \in M_K$, and then take the $[K : \mathbb{Q}]$-th root to get the absolute heights. Finally, we will take the log of both sides to obtain the absolute logarithmic heights. It is worth mentioning that for any integer $N \in \mathbb{N}$, we have
$$\prod_{v \in M_K} N_v^{n_v} = \prod_{v \in M_K^\infty} N^{n_v} = \mathrm{Norm}_{K/\mathbb{Q}}(N) = N^{[K:\mathbb{Q}]}.$$
After taking $[K : \mathbb{Q}]$-th roots, $\left( \prod_{v \in M_K} N_v^{n_v} \right)^{1/[K:\mathbb{Q}]} = N$. Combining these observations, the previous displayed inequality translates to
$$H(f(P)) \leq 2^{n+d} H(f) H(P)^d,$$
where $H(f)$ stands for $\left( \prod_{v \in M_K} |f|_v^{n_v} \right)^{1/[K:\mathbb{Q}]}$. There is a way to interpret $H(f)$ as an actual height. Indeed, arrange all the coefficients of all the component functions $f_i$ of $f$ into some big vector and view the large string as an element of some big projective space. Then $H(f)$ is precisely the absolute height of this vector. Similarly, we define $h(f) := \log H(f)$. Taking logs, we arrive at
$$h(f(P)) \leq d \cdot h(P) + h(f) + (n + d) \log(2).$$
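The explicit upper bound just obtained can be tested on the rational map $f = [x^2, xy, z^2]$ from the Remark. Here $n = d = 2$ and all coefficients are 0 or 1, so $H(f) = 1$ and the bound reads $H(f(P)) \leq 16 \, H(P)^2$. The sketch below works only with points of $\mathbb{P}^2(\mathbb{Q})$ given by integer coordinates; the helper names are chosen for illustration. It also exhibits the points $[a, b, a]$ on which no lower bound of the shape $H(f(P)) \geq c \cdot H(P)^2$ can hold.

```python
from math import gcd

def normalize(P):
    """Divide integer projective coordinates (not all zero) by their gcd."""
    g = 0
    for x in P:
        g = gcd(g, x)
    return tuple(x // g for x in P)

def H(P):
    """Multiplicative height of a point of P^n(Q) given by integer coordinates."""
    return max(abs(x) for x in normalize(P))

def f(P):
    """The rational map [x, y, z] -> [x^2, x*y, z^2]; undefined at [0, 1, 0]."""
    x, y, z = P
    return (x * x, x * y, z * z)

def upper_bound_holds(P):
    """Check H(f(P)) <= 2^(n+d) * H(f) * H(P)^d with n = d = 2 and H(f) = 1."""
    return H(f(P)) <= 16 * H(P) ** 2
```

For $P = [3, 1000, 3]$ one finds $f(P) = [9, 3000, 9] = [3, 1000, 3]$, so $H(f(P)) = H(P) = 1000$ while $H(P)^2 = 10^6$.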

We can take $c_1(f) := h(f) + (n + d) \log(2)$, and this completes the proof of part (a).

(b) The lower bound is more subtle, and requires the Nullstellensatz. We are assuming that $f$ is a morphism, i.e. $I_f = \emptyset$. This means $f_0, \ldots, f_m$ have no common roots except for $(0, \ldots, 0)$, which is not a point of projective space. The projective version of the Nullstellensatz says that for each $0 \leq i \leq n$, there is an exponent $e_i$ such that $x_i^{e_i} \in \langle f_0, \ldots, f_m \rangle$. By taking the largest of the $e_i$'s, there is a single exponent $e$ (independent of $i$) such that $x_i^e \in \langle f_0, \ldots, f_m \rangle$.

For each $i$, there are polynomials $g_{ij}(x_0, \ldots, x_n)$ in $\bar{K}[x_0, \ldots, x_n]$ such that
$$x_i^e = \sum_{j=0}^{m} g_{ij}(x_0, \ldots, x_n) f_j(x_0, \ldots, x_n).$$
By enlarging the field again, we may as well assume that $g_{ij}(x_0, \ldots, x_n) \in K[x_0, \ldots, x_n]$. Without loss of generality, we can assume that the $g_{ij}$ are homogeneous of degree $e - d$. Thus,
$$|x_i^e|_v \leq (m + 1)_v \cdot \max_{0 \leq j \leq m} |g_{ij}(\vec{x})|_v \cdot \max_{0 \leq j \leq m} |f_j(\vec{x})|_v \leq (m + 1)_v \cdot \max_{0 \leq j \leq m} \left( 2_v^{n+e-d} \cdot |g_{ij}|_v \cdot \max_{k} |x_k|_v^{e-d} \right) \cdot \max_{0 \leq j \leq m} |f_j(\vec{x})|_v.$$
Here we applied the result of part (a) to the functions $g_{ij}$. Now, taking a maximum over all $0 \leq i \leq n$, we get
$$\max_{0 \leq i \leq n} |x_i|_v^e \leq (m + 1)_v \, 2_v^{n+e-d} \cdot \max_{i,j} |g_{ij}|_v \cdot \max_{0 \leq k \leq n} |x_k|_v^{e-d} \cdot \max_{0 \leq j \leq m} |f_j(\vec{x})|_v.$$
Dividing both sides by $\max_{0 \leq k \leq n} |x_k|_v^{e-d}$, we obtain
$$\max_{0 \leq i \leq n} |x_i|_v^d \leq (m + 1)_v \, 2_v^{n+e-d} \cdot \max_{i,j} |g_{ij}|_v \cdot \max_{0 \leq j \leq m} |f_j(\vec{x})|_v.$$
Now we do the usual procedure (see part (a) for details) to get
$$H(P)^d \leq (m + 1) \, 2^{n+e-d} H(g) H(f(P)),$$
where $H(g)$ stands for the height of the point obtained by stringing together all the coefficients of all the $g_{ij}$. After taking logs, and rearranging, we have
$$h(f(P)) \geq d \cdot h(P) - h(g) - \log(m + 1) - (n + e - d) \log(2).$$

We can take $c_2(f) := h(g) + \log(m + 1) + (n + e - d) \log(2)$, and this completes the proof of part (b).

8. Application: Northcott's Theorem for preperiodic points

Suppose $f : \mathbb{P}^n \dashrightarrow \mathbb{P}^n$ is a rational map (so here $m = n$). We can iterate $f$ by composing $f$ with itself. So we consider $f^2 := f \circ f$, $f^3 := f \circ f \circ f$, and in general $f^r := f \circ f \circ \cdots \circ f$ ($r$ times). Given a point $P \in \mathbb{P}^n$, its $f$-orbit is defined to be the set $O_f(P) := \{f^r(P) : r \geq 0\}$.

Definition. A point $P$ is called preperiodic if $O_f(P)$ is finite. A point $P$ is called periodic if $f^m(P) = P$ for some $m \geq 1$.

It is clear that any periodic point is preperiodic. Similarly, some iterate of a preperiodic point must be periodic.

In $\mathbb{P}^n(\mathbb{C})$, the set $\mathrm{PrePer}(f)$ is big. Indeed, for any pair of integers $k > j$, the polynomial equation $f^k(P) = f^j(P)$ will always have solutions in $\mathbb{C}$, leading to an abundance of preperiodic points. Of course, the same argument applies to any algebraically closed field.

Theorem. (Northcott) Suppose $f : \mathbb{P}^n \to \mathbb{P}^n$ is a morphism defined over $\bar{\mathbb{Q}}$ with degree $d \geq 2$. The set
$$\{P \in \mathbb{P}^n(\bar{\mathbb{Q}}) : P \text{ is preperiodic for } f\}$$
is a set of bounded height.

Since there are only finitely many points with bounded height and bounded degree, we immediately deduce the following corollary.

Corollary. Suppose $f : \mathbb{P}^n \to \mathbb{P}^n$ is a morphism defined over $\bar{\mathbb{Q}}$ with degree $d \geq 2$. For any number field $K$, the set
$$\{P \in \mathbb{P}^n(K) : P \text{ is preperiodic for } f\}$$
is finite.

Proof of Northcott's Theorem. We are interested in the sets of periodic and preperiodic points for $f$:
$$\mathrm{Per}(f, \mathbb{P}^n(\bar{\mathbb{Q}})) := \{P \in \mathbb{P}^n(\bar{\mathbb{Q}}) : f^i(P) = P \text{ for some } i \geq 1\}$$
$$\mathrm{PrePer}(f, \mathbb{P}^n(\bar{\mathbb{Q}})) := \{P \in \mathbb{P}^n(\bar{\mathbb{Q}}) : f^i(P) = f^j(P) \text{ for some } i > j \geq 0\}$$
We first show that $\mathrm{Per}(f, \mathbb{P}^n(\bar{\mathbb{Q}}))$ is a set of bounded height. In the previous section, we have proved that for a morphism $f$ of degree $d$,
$$h(f(Q)) \geq d \cdot h(Q) - C$$
holds for all points $Q$. Here $C = C(f)$ is a constant that only depends on $f$. Suppose that $P \in \mathrm{Per}(f, \mathbb{P}^n(\bar{\mathbb{Q}}))$, i.e. $f^k(P) = P$ for some $k \geq 1$. Then
$$\begin{aligned}
h(P) = h(f^k(P)) = h(f(f^{k-1}(P))) &\geq d \cdot h(f^{k-1}(P)) - C \\
&\geq d \left( d \cdot h(f^{k-2}(P)) - C \right) - C = d^2 \cdot h(f^{k-2}(P)) - (d + 1) C \\
&\geq \cdots \geq d^k h(P) - (d^{k-1} + d^{k-2} + \cdots + d + 1) C \\
&= d^k h(P) - \frac{d^k - 1}{d - 1} C.
\end{aligned}$$
We have shown that
$$h(P) \geq d^k h(P) - \frac{d^k - 1}{d - 1} C.$$
Rearranging this inequality, one obtains
$$\frac{d^k - 1}{d - 1} C \geq (d^k - 1) h(P) \quad \Longrightarrow \quad h(P) \leq \frac{C}{d - 1}.$$
Therefore,
$$\mathrm{Per}(f, \mathbb{P}^n(\bar{\mathbb{Q}})) \subseteq \left\{ P \in \mathbb{P}^n(\bar{\mathbb{Q}}) : h(P) \leq \frac{C}{d - 1} \right\},$$
which proves Northcott's theorem in the periodic case.

Now suppose that $P \in \mathrm{PrePer}(f, \mathbb{P}^n(\bar{\mathbb{Q}}))$, i.e. $f^{k+i}(P) = f^i(P)$ for some $k \geq 1$, $i \geq 0$. Note that we may take $i = 0$ if and only if $P$ is periodic for $f$. Since $f^k(f^i(P)) = f^i(P)$, it follows that $f^i(P) \in \mathrm{Per}(f, \mathbb{P}^n(\bar{\mathbb{Q}}))$. By the previous computation,
$$h(f^i(P)) \leq \frac{C}{d - 1}.$$
Using the same computation as in the periodic case, we have
$$h(f^i(P)) \geq d^i \cdot h(P) - \frac{d^i - 1}{d - 1} C.$$
Combining the previous two inequalities, we get
$$d^i \cdot h(P) - \frac{d^i - 1}{d - 1} C \leq \frac{C}{d - 1}.$$
Rearranging this inequality, we have
$$d^i \cdot h(P) \leq \left( \frac{1}{d - 1} + \frac{d^i - 1}{d - 1} \right) C = \frac{d^i}{d - 1} C.$$
Cancelling $d^i$ from both sides, we arrive at
$$h(P) \leq \frac{C}{d - 1},$$
which is amazingly the exact same bound as in the periodic case. We conclude that
$$\mathrm{PrePer}(f, \mathbb{P}^n(\bar{\mathbb{Q}})) \subseteq \left\{ P \in \mathbb{P}^n(\bar{\mathbb{Q}}) : h(P) \leq \frac{C}{d - 1} \right\}.$$
This finishes the proof of Northcott's theorem.

As a corollary of Northcott's theorem, $\mathrm{PrePer}(f, \mathbb{P}^n(K))$ must be finite for any number field $K$. So the natural question is, how big can the set $\mathrm{PrePer}(f, \mathbb{P}^n(K))$ be? It is clear that the size of $\mathrm{PrePer}(f, \mathbb{P}^n(K))$ can tend to infinity as $[K : \mathbb{Q}] \to \infty$, or $n \to \infty$, or $\deg(f) \to \infty$. The following conjecture predicts that these 3 factors together govern how big $\mathrm{PrePer}(f, \mathbb{P}^n(K))$ can get.

Uniform Boundedness Conjecture.
$$\#\mathrm{PrePer}(f, \mathbb{P}^n(K)) \leq C([K : \mathbb{Q}], \deg(f), n),$$
where the constant $C$ only depends on $[K : \mathbb{Q}]$, $\deg(f)$ and $n$.

The simplest example is the case of a quadratic polynomial, i.e. $f_c(x) = x^2 + c$ where $c \in \mathbb{Q}$. By Northcott's theorem, the set of periodic points $\mathrm{Per}(f_c, \mathbb{Q})$ is finite. For $c = 0$, the point $x = 0$ has period 1. For $c = -1$, $x = 0$ is a point of period 2: indeed, $f_{-1}(x) = x^2 - 1$ and $f_{-1}(0) = -1$ and $f_{-1}(-1) = 0$. There is a specific value $c_0$ such that $f_{c_0}$ has a rational point of period 3. However, Morton showed that there is no value of $c \in \mathbb{Q}$ such that $f_c$ has a rational point of period 4. The same conclusion holds for points of period 5. Conditional on the Birch–Swinnerton-Dyer conjecture, Stoll showed that there is no value of $c \in \mathbb{Q}$ such that $f_c$ has a rational point of period 6. Nothing is currently known for period 7 or higher.
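Northcott's theorem suggests a simple experiment for $f_c(x) = x^2 + c$ over $\mathbb{Q}$: iterate a starting point exactly and stop as soon as either the orbit repeats or the height of an iterate becomes too large to be preperiodic. The sketch below is heuristic. The cutoff `height_cap` is an assumption standing in for the bound $e^{C/(d-1)}$ from the proof, which is not computed here, and the function names are illustrative. The value $c = -29/16$ used in the usage note is one example (not taken from these notes) for which a rational 3-cycle can be verified by direct arithmetic.

```python
from fractions import Fraction

def H(x):
    """Multiplicative height of a rational number in lowest terms."""
    return max(abs(x.numerator), x.denominator)

def is_preperiodic(x, c, height_cap=10**6):
    """Heuristic test whether x is preperiodic for f_c(x) = x^2 + c over Q.

    Assumes height_cap exceeds the height of every rational preperiodic point of f_c.
    """
    x, c = Fraction(x), Fraction(c)
    seen = set()
    while H(x) <= height_cap:
        if x in seen:
            return True  # the orbit has closed up, so it is finite
        seen.add(x)
        x = x * x + c
    return False  # an iterate left the assumed height bound

def preperiodic_points(c, box=10):
    """Rationals p/q with |p|, q <= box that pass the heuristic test for f_c."""
    return {Fraction(p, q)
            for p in range(-box, box + 1)
            for q in range(1, box + 1)
            if is_preperiodic(Fraction(p, q), c)}
```

For $c = -1$ the search returns $\{-1, 0, 1\}$: the 2-cycle $0 \mapsto -1 \mapsto 0$ together with the point 1, which maps to 0. For $c = -29/16$ the orbit $-\tfrac14 \mapsto -\tfrac74 \mapsto \tfrac54 \mapsto -\tfrac14$ is a cycle of period 3.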

9. Heights on Varieties

Recall that an algebraic set $V \subseteq \mathbb{P}^n$ is defined by a collection of polynomial equations $f_1 = \cdots = f_r = 0$ where each $f_i$ is a homogeneous polynomial. In other words, $V = \{P \in \mathbb{P}^n : f_1(P) = \cdots = f_r(P) = 0\}$. Given an algebraic set $V$, we can consider $I(V)$, which is the ideal generated by the set of homogeneous polynomials vanishing on $V$, i.e. by $\{\text{homogeneous } f \text{ such that } f(P) = 0 \text{ for all } P \in V\}$. We say that an algebraic set $V$ is a variety if $I(V)$ is prime. Geometrically, a variety is an algebraic set which cannot be written as a union of two proper algebraic subsets. What we call "variety" here is sometimes called an "irreducible variety" in some textbooks.

Given a variety $V$, an irreducible divisor of $V$ is a subvariety $W \subseteq V$ of codimension 1.

Definition. The group of divisors of $V$ is
$$\mathrm{Div}(V) = \left\{ \sum_{W \text{ irred. divisor}} n_W W : n_W \in \mathbb{Z}, \text{ all but finitely many of the } n_W \text{ are zero} \right\}.$$
More formally, $\mathrm{Div}(V)$ is the free abelian group generated by the symbols $W$ where $W$ ranges over all irreducible divisors of $V$.

Example. Suppose that $V = C$ is a curve. Then an irreducible divisor is just a point on the curve. So, any divisor $D$ on $C$ is of the form $\sum_{P \in C} n_P P$. We can use the concept of a divisor to keep track of zeros and poles of functions on curves.

Example. Let $V = \mathbb{P}^n$. It turns out that if $W \subseteq V$ is an irreducible divisor, then $W = \{f = 0\}$ for some homogeneous irreducible polynomial $f$ in $k[x_0, \ldots, x_n]$.

Given a variety $V$ over a field $k$, the field $k(V)$ consisting of all rational functions on $V$ is called the function field of $V$. Notice that $k(V)$ is built from fractions in the coordinate ring $k[V] = k[x_0, \ldots, x_n]/I(V)$, which is an integral domain precisely because $I(V)$ is a prime ideal (by definition of a variety). Given $f \in k(V)$, we can view $f : V \dashrightarrow \mathbb{P}^1$. It is customary to associate $0$ with $[0 : 1]$ and $\infty$ with $[1 : 0]$ on $\mathbb{P}^1$. This correspondence is explained via $[a, b] \longleftrightarrow a/b$.

Let's assume from now on that $V$ is smooth. Given any irreducible divisor $W \subset V$, it is possible to define an integer $\mathrm{ord}_W(f)$, which is the order of vanishing of $f$ along $W$. This quantity $\mathrm{ord}_W(f)$ will be negative if $f$ has a pole along $W$. It turns out that $\mathrm{ord}_W : k(V)^* \to \mathbb{Z}$ is a valuation. For a given $f \in k(V)^*$, we define
$$\mathrm{div}(f) := \sum_{W \text{ irred. divisor}} \mathrm{ord}_W(f) \, W.$$
Note that $\mathrm{ord}_W(f) > 0$ if and only if $W \subseteq f^{-1}([0 : 1])$, and $\mathrm{ord}_W(f) < 0$ if and only if $W \subseteq f^{-1}([1 : 0])$. The reason that all of this works rests on the fact that the local ring $\mathcal{O}_{V,W}$ is a DVR (because it is a 1-dimensional regular local ring).

Definition. We say that $D_1 \sim D_2$ if $D_2 - D_1 = \mathrm{div}(f)$ for some $f \in k(V)^*$. This is indeed an equivalence relation.

Definition. The Picard group of $V$ is the quotient group
$$\mathrm{Pic}(V) = \frac{\mathrm{Div}(V)}{\sim} = \frac{\mathrm{Div}(V)}{\mathrm{div}(k(V)^*)}.$$

Example. Let's prove that $\mathrm{Pic}(\mathbb{P}^n) = \mathbb{Z}$. Suppose $D \in \mathrm{Div}(\mathbb{P}^n)$. Then
$$D = \sum_W n_W W = \sum_W n_W \{f_W = 0\},$$
where $f_W$ is a homogeneous polynomial of degree $d_W$. The quotient
$$F := \left( \prod_W f_W^{n_W} \right) \cdot x_0^{-\sum_W n_W d_W}$$
has degree 0, so $F$ is an element of $k(\mathbb{P}^n)$. Note that
$$\mathrm{div}(F) = \sum_W n_W \, \mathrm{div}(f_W) - \left( \sum_W n_W d_W \right) \{x_0 = 0\}, \qquad \sum_W n_W d_W \in \mathbb{Z}.$$
Thus, $D \sim \left( \sum_W n_W d_W \right) \{x_0 = 0\}$. So we have shown that any divisor $D$ is linearly equivalent to $mH$ where $m$ is an integer and $H = \{x_0 = 0\}$ is a fixed hyperplane.

In fact, we have constructed a degree map $\deg : \mathrm{Pic}(\mathbb{P}^n) \to \mathbb{Z}$ defined by $D \mapsto \sum_W n_W d_W$. The map is well-defined because $\deg(\mathrm{div}(f)) = 0$ for any $f \in k(\mathbb{P}^n)^*$. Since $\{x_0 = 0\} \mapsto 1$, the degree map is surjective. The map $\deg : \mathrm{Pic}(\mathbb{P}^n) \to \mathbb{Z}$ is also injective: if $\deg(D) = 0$ and $D = \sum_W n_W W$, then $D$ is defined by $f := \prod_W f_W^{n_W}$, which is a rational function of degree 0 as $\sum_W n_W d_W = \deg(D) = 0$. So $f \in k(\mathbb{P}^n)^*$ and $D = \mathrm{div}(f)$, which is the zero element in $\mathrm{Pic}(\mathbb{P}^n)$.

Theorem. Let $V$ be a smooth variety. There is an exact sequence of abelian groups:
$$0 \longrightarrow \mathrm{Pic}^0(V) \longrightarrow \mathrm{Pic}(V) \longrightarrow \mathrm{NS}(V) \longrightarrow 0,$$
where $\mathrm{NS}(V)$ is the Neron-Severi group. Moreover, $\mathrm{NS}(V)$ is a finitely generated abelian group.

It turns out that $\mathrm{Pic}^0(V)$ is naturally a projective variety, also known as an abelian variety. An abelian variety is an abelian group in the category of projective varieties. One interesting feature is that abelian varieties are always smooth. This is because an abelian variety has at least one smooth point, and the group law allows one to translate a smooth point everywhere else.

In general, $\mathrm{Pic}(V)$ has an alternative description via 1) line bundles or 2) invertible sheaves. In this course, we will mostly use the divisor language.

If $f \in k(V)^*$, then $\mathrm{div}(f) = \text{zeros} - \text{poles}$. We can turn this around: given poles and zeros, we can try to find a function $f$ that fits such a portfolio.

Example. Find functions on $\mathbb{P}^1$ that vanish to order at least 2 at $[1 : 3]$, and have a pole of order at most 3 at $[5 : 2]$, and no other poles. For any $[a : b] \in \mathbb{P}^1$, we can consider the following function:
$$\frac{(3x - y)^2 (ax + by)}{(2x - 5y)^3},$$
where $x, y$ are the homogeneous coordinates on $\mathbb{P}^1$. It is clear that this function does the job, and conversely, any function on $\mathbb{P}^1$ that vanishes to order at least 2 at $[1 : 3]$, and has a pole of order at most 3 at $[5 : 2]$ and no other poles, must be of the form above for some choice of $[a : b] \in \mathbb{P}^1$.

There is a convenient way to compare divisors. We say that $D \geq 0$ if $D = \sum_W n_W W$ where each $n_W \geq 0$. For any two divisors $D_1$ and $D_2$, we say that $D_1 \geq D_2$ if $D_1 - D_2 \geq 0$. Given $D \in \mathrm{Div}(V)$, we define the space
$$L(D) = \{f \in k(V)^* : \mathrm{div}(f) \geq -D\} \cup \{0\}.$$
It turns out that $L(D)$ is a finite-dimensional vector space. To see that $L(D)$ is indeed a vector space, one needs to use the fact that $\mathrm{ord}_W : k(V)^* \to \mathbb{Z}$ is a valuation. The finite-dimensionality part of the assertion is much subtler.

Given $V \subset \mathbb{P}^n$, how shall we define a height on $V$? A naive approach is to restrict the height $h : \mathbb{P}^n(\bar{\mathbb{Q}}) \to [0, \infty)$ to points of $V$ to get $h_V : V(\bar{\mathbb{Q}}) \to [0, \infty)$.

Problem / Issue. There are lots of ways to embed $V$ inside a projective space, and the above definition of a height would depend on such an embedding. So we need to study how to map $V$ into a projective space. Given a divisor $D$, we can choose a basis $f_0, \ldots, f_r$ for $L(D)$, and we can use this to define $[f_0, \ldots, f_r] : V \dashrightarrow \mathbb{P}^r$. We will get different height functions, one for each divisor $D$. We will study the variation in heights obtained this way.

We denote $\ell(D) := \dim L(D)$. Given a basis $f_0, \ldots, f_r$ for $L(D)$ where $r = \ell(D) - 1$, we get a map $\varphi_D = [f_0, \ldots, f_r] : V \dashrightarrow \mathbb{P}^r$. Assuming that $\varphi_D$ is an embedding (in particular, a morphism), we can define the height of a point $P \in V$ by $h_D(P) = h(\varphi_D(P))$.

Definition. A divisor $D$ is said to be very ample if $\varphi_D$ is an embedding.

Definition. A divisor $D$ is said to be ample if there exists $m \geq 1$ such that $mD$ is very ample.

Example. On a smooth curve $C$, $D$ is very ample if $\deg(D) \geq 2g + 1$ where $g$ is the genus of $C$. This is an application of the Riemann-Roch Theorem.

Example. On a smooth curve $C$, $D$ is ample if $\deg(D) \geq 1$.

Theorem. (Serre) Every divisor $D \in \mathrm{Div}(V)$ can be written as $D = D_1 - D_2$ where $D_1$ and $D_2$ are very ample.

Idea behind the Proof. Let $H$ be very ample (say a hyperplane section of $V$). Then we can set $D_1 = mH + D$ and $D_2 = mH$. For large enough $m \in \mathbb{N}$, both $D_1$ and $D_2$ are very ample, and clearly $D = D_1 - D_2$.

For every $D \in \mathrm{Div}(V)$, choose very ample divisors $D_1$ and $D_2$ such that $D = D_1 - D_2$. Choose bases for $L(D_1)$ and $L(D_2)$ to get embeddings $\varphi_{D_1} : V \hookrightarrow \mathbb{P}^{\ell(D_1) - 1}$ and $\varphi_{D_2} : V \hookrightarrow \mathbb{P}^{\ell(D_2) - 1}$.

We can "define" $h_D(P) := h_{D_1}(P) - h_{D_2}(P) = h(\varphi_{D_1}(P)) - h(\varphi_{D_2}(P))$. This depends on a lot of choices: it depends on the way we decomposed D as $D = D_1 - D_2$, and it depends on the bases chosen for $L(D_1)$ and $L(D_2)$. We will soon see that these choices can only change the height function by a bounded amount.

Proposition. a) Suppose that D is very ample. Pick a basis $f_0, f_1, \dots, f_r$ for L(D), where $r = \ell(D) - 1$. Now pick another basis $f'_0, f'_1, \dots, f'_r$ for L(D). Let $\varphi_D$ and $\varphi'_D$ be the corresponding embeddings $V \hookrightarrow \mathbb{P}^r$. Then

$$h(\varphi_D(P)) = h(\varphi'_D(P)) + O(1)$$

for all $P \in V(\bar{\mathbb{Q}})$.

b) Suppose that D and D' are very ample. Then D + D' is very ample, and $h_{D+D'} = h_D + h_{D'} + O(1)$.

Proof. a) We have $\varphi_D = [f_0, \dots, f_r] : V \hookrightarrow \mathbb{P}^r$ and $\varphi'_D = [f'_0, \dots, f'_r] : V \hookrightarrow \mathbb{P}^r$. There exists a change of basis matrix $A = (a_{ij}) \in GL_{r+1}(\bar{\mathbb{Q}})$ such that $f_i = \sum_j a_{ij} f'_j$. Note that $A : \mathbb{P}^r \to \mathbb{P}^r$ defines a morphism and deg(A) = 1. Since $\varphi_D = A \varphi'_D$, we obtain

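$$h(\varphi_D(P)) = h(A(\varphi'_D(P))) = h(\varphi'_D(P)) + O(1)$$
where O(1) is a constant that does not depend on the point P.

b) Fix bases $f_0, \dots, f_r$ for L(D) and $f'_0, \dots, f'_s$ for L(D'). We get $\varphi_D = [f_0, \dots, f_r] : V \hookrightarrow \mathbb{P}^r$ and $\varphi_{D'} = [f'_0, \dots, f'_s] : V \hookrightarrow \mathbb{P}^s$. A basis for L(D + D') is given by $\{f_i f'_j : 0 \le i \le r, 0 \le j \le s\}$. Then $\varphi_{D+D'} : V \to \mathbb{P}^{(r+1)(s+1)-1} = \mathbb{P}^{rs+r+s}$ is given by $[\dots, f_i f'_j, \dots]$. The following commutative diagram summarizes all the relevant maps:

$$V \xrightarrow{\ \varphi_D \times \varphi_{D'}\ } \mathbb{P}^r \times \mathbb{P}^s \xrightarrow{\ \text{Segre}\ } \mathbb{P}^{rs+r+s}, \qquad \varphi_{D+D'} = \text{Segre} \circ (\varphi_D \times \varphi_{D'})$$

As a composition of embeddings, $\varphi_{D+D'} : V \hookrightarrow \mathbb{P}^{rs+r+s}$ is also an embedding, i.e. D + D' is very ample. Then
$$h(\varphi_{D+D'}(P)) = h([\dots, f_i(P) f'_j(P), \dots])$$
Once we fix a number field K, note that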

$$H_K([\dots, f_i(P) f'_j(P), \dots]) = \prod_{v \in M_K} \max_{i,j} \|f_i(P) f'_j(P)\|_v = \prod_{v \in M_K} \Big(\max_i \|f_i(P)\|_v\Big) \cdot \Big(\max_j \|f'_j(P)\|_v\Big) = H_K([\dots, f_i(P), \dots]) \cdot H_K([\dots, f'_j(P), \dots])$$

Taking the logarithm, we get $h_{D+D'}(P) = h_D(P) + h_{D'}(P) + O(1)$.

Corollary. $h_D$ is well-defined up to O(1).

Proof. Suppose $D = D_1 - D_2 = D'_1 - D'_2$ are two different decompositions of D, where $D_1, D_2, D'_1, D'_2$ are all very ample. We have $D_1 + D'_2 = D'_1 + D_2$. Thus,

$$h_{D_1 + D'_2} = h_{D'_1 + D_2} + O(1)$$

Since $h_{D_1 + D'_2} = h_{D_1} + h_{D'_2} + O(1)$ and $h_{D'_1 + D_2} = h_{D'_1} + h_{D_2} + O(1)$, we get

$$h_{D_1} + h_{D'_2} = h_{D'_1} + h_{D_2} + O(1)$$
or equivalently,
$$h_{D_1} - h_{D_2} = h_{D'_1} - h_{D'_2} + O(1)$$
as desired. □

Consequence. We have constructed a well-defined function
$$h : \mathrm{Div}(V) \to \frac{\{\text{functions } V(\bar{\mathbb{Q}}) \to \mathbb{R}\}}{\{\text{bounded functions}\}}$$
that sends $D \mapsto (h_D : V(\bar{\mathbb{Q}}) \to \mathbb{R})$. In fact, this is a group homomorphism, because

$$h_{D_1 + D_2} = h_{D_1} + h_{D_2} + O(1)$$
as we proved above.
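The additivity of heights rests on the fact that the multiplicative height of a Segre product is the product of the heights. Over Q this can be checked directly. The following is only an illustrative sketch: the helper names (`primitive`, `H`, `segre`) and the sample points are assumptions made for this example, and it relies on the standard description of the height on P^n(Q) as the largest absolute value among coprime integer coordinates.

```python
from fractions import Fraction
from functools import reduce
from itertools import product
from math import gcd


def primitive(coords):
    """Rescale rational homogeneous coordinates to coprime integers."""
    fracs = [Fraction(c) for c in coords]
    # Clear denominators with their least common multiple.
    lcm = reduce(lambda a, b: a * b // gcd(a, b), (f.denominator for f in fracs), 1)
    ints = [int(f * lcm) for f in fracs]
    # Divide out the common factor (assumes not all coordinates are zero).
    g = reduce(gcd, ints, 0)
    return [n // g for n in ints]


def H(coords):
    """Multiplicative height on P^n(Q): max |x_i| over coprime integer coordinates."""
    return max(abs(n) for n in primitive(coords))


def segre(x, y):
    """Segre map: all pairwise products x_i * y_j."""
    return [xi * yj for xi, yj in product(x, y)]


x = [Fraction(3, 2), 5, -7]      # hypothetical point of P^2(Q)
y = [4, Fraction(-9, 5)]         # hypothetical point of P^1(Q)
print(H(x), H(y), H(segre(x, y)))  # the third value equals the product of the first two
```

Taking logarithms turns this product formula into h(Segre(x, y)) = h(x) + h(y), which is the rational-point shadow of the computation with $H_K$ above.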

10. Weil Height Machine

Let V be any non-singular variety. We want to further investigate the map
$$h : \mathrm{Div}(V) \to \frac{\{\text{functions } V(\bar{\mathbb{Q}}) \to \mathbb{R}\}}{\{\text{bounded functions}\}}, \qquad D \mapsto h_{V,D}$$
where $h_{V,D}$ is a shorthand for the function $h_D : V(\bar{\mathbb{Q}}) \to \mathbb{R}$ constructed in the previous section.

Theorem. (Weil Height Machine)

(a) Normalization. Let $\varphi : V \hookrightarrow \mathbb{P}^n$ be an embedding, and $H \subseteq \mathbb{P}^n$ be a hyperplane. Then

hV,ϕ∗H (P ) = h(ϕ(P )) + O(1) Here ϕ∗(H) is H ∩ V , and the height on the right hand side is the usual height on Pn. (b) Functoriality. Suppose ϕ : V → W is a morphism, and let D ∈ Div(W ). Then

hV,ϕ∗D(P ) = hW,D(ϕ(P )) + O(1) (c) Additivity. If D,E ∈ Div(V ), then

hV,D+E = hV,D + hV,E + O(1) (d) Linear Equivalence. Let D,E ∈ Div(V ) such that D ∼ E (linearly equivalent divisors). Then

$$h_{V,D} = h_{V,E} + O(1)$$
(e) Positivity. Assume $D \ge 0$, i.e. $D = \sum n_i W_i$ where each $W_i$ is an irreducible subvariety of V and $n_i \ge 0$. Then

hV,D(P ) ≥ O(1) for all P not in the base locus of D. In other words, there exists a constant C ≥ 0 such that hV,D(P ) ≥ −C for all such points P . (f) Boundedness for Ample Divisors. If D is ample, then for any fixed B1,B2 > 0, the set

$$\{P \in V(\bar{\mathbb{Q}}) : h_{V,D}(P) \le B_1 \text{ and } [\mathbb{Q}(P) : \mathbb{Q}] \le B_2\}$$
is finite.

Base Point Freeness. Before we delve into the proof, we say a few words about the base locus of a linear system, as it will come up in the proof a few times (for example, already in the proof of part (b) below). If D is very ample, then by definition $[f_0, \dots, f_n] : V \hookrightarrow \mathbb{P}^n$ is an embedding, where $f_0, \dots, f_n$ is a basis for L(D). For a general divisor D, if we pick a basis $f_0, \dots, f_r$ for L(D), the associated rational map $\varphi_D = [f_0, \dots, f_r] : V \dashrightarrow \mathbb{P}^r$ might fail to be an embedding for several reasons. First, there might not be enough sections to begin with, e.g. nothing prevents the case L(D) = {0}. Second, the associated rational map is not necessarily a morphism, i.e. $f_0, \dots, f_r$ may vanish at a common point. This second reason for failure is captured by the notion of a base locus.

Given a divisor D, the base locus of D is the indeterminacy locus $I_{\varphi_D}$. Another definition is as follows. We first define
$$|D| = \{D' \in \mathrm{Div}(V) : D' \sim D \text{ and } D' \ge 0\}$$
Then the base locus of D is
$$B_D = \bigcap_{D' \in |D|} \mathrm{Support}(D')$$
Here, $\mathrm{Support}(D') = \bigcup W_i$ where $D' = \sum n_i W_i$ with $n_i \ne 0$. We say that D is base point free if the base locus $B_D$ is empty, i.e. the associated map $\varphi_D = [f_0, \dots, f_r] : V \to \mathbb{P}^r$ is a morphism. It turns out that one can define the associated height function for any base point free divisor by taking a basis for its global sections (it doesn't have to be very ample!). See page 186 of the book [HS00] for more details.

Proof. (a) Write $H = \{a_0 x_0 + \dots + a_n x_n = 0\}$ and denote $A(\vec{x}) := a_0 x_0 + \dots + a_n x_n$. A basis for L(H) is given by
$$\frac{x_0}{A(\vec{x})}, \ \frac{x_1}{A(\vec{x})}, \ \dots, \ \frac{x_n}{A(\vec{x})}$$
These elements restricted to V give a basis for $L(\varphi^*H) = L(V \cap H)$. Thus,
$$h_{V,\varphi^*H}(P) = h\left(\left[\frac{x_0}{A(\vec{x})}(P), \frac{x_1}{A(\vec{x})}(P), \dots, \frac{x_n}{A(\vec{x})}(P)\right]\right)$$
We cancel out $A(\vec{x})$ in the projective coordinates, which gives

hV,ϕ∗H (P ) = h([x0(P ), ..., xn(P )]) = h(P ) + O(1)

(b) We write D = D1 − D2 where D1,D2 are very ample. If we can show the same claim for D1,D2, i.e. if we can show that

$$h_{V,\varphi^*D_1}(P) = h_{W,D_1}(\varphi(P)) + O(1)$$

$$h_{V,\varphi^*D_2}(P) = h_{W,D_2}(\varphi(P)) + O(1)$$
then we would be done. Indeed, after subtracting the second equation from the first, one obtains

$$h_{V,\varphi^*D_1}(P) - h_{V,\varphi^*D_2}(P) = h_{W,D_1}(\varphi(P)) - h_{W,D_2}(\varphi(P)) + O(1)$$
or equivalently,

$$h_{V,\varphi^*D_1 - \varphi^*D_2}(P) = h_{W,D_1 - D_2}(\varphi(P)) + O(1)$$
Since $D = D_1 - D_2$ and $\varphi^*(D_1) - \varphi^*(D_2) = \varphi^*(D_1 - D_2)$, the last displayed equation becomes

$$h_{V,\varphi^*D}(P) = h_{W,D}(\varphi(P)) + O(1)$$
as desired. This reduction shows that it suffices to prove part (b) for $D_1$ and $D_2$, which are very ample. Therefore, we can assume, without loss of generality, that D is very ample.

Assume D is very ample, i.e. it induces an embedding $\varphi_D : W \hookrightarrow \mathbb{P}^r$. It is not necessarily true that $\varphi^*(D)$ is very ample, but it is at least base point free. The divisor $\varphi^*(D)$ induces the map $\varphi_D \circ \varphi : V \to \mathbb{P}^r$. Using part (a) twice, we have

hV,ϕ∗(D)(P ) = h ◦ ϕD ◦ ϕ(P ) + O(1)

= h ◦ ϕD(ϕ(P )) + O(1)

= hW,D(ϕ(P )) + O(1) (c) We proved this property last time. The key was to use the Segre embedding. (d) Suppose D ∼ E. Write D = D1 − D2 and E = E1 − E2 where D1,D2,E1,E2 are very ample. The hypothesis D1 − D2 ∼ E1 − E2 implies that D1 + E2 ∼ E1 + D2. Notice that D1 + E2 and E1 + D2 are both very ample (as they are sum of two very ample divisors). If we can show the same claim for D1 + E2 ∼ E1 + D2, i.e. if we can show that

hD1+E2 = hE1+D2 + O(1), then we would be done, as the last equation can be rearranged to

hD1−D2 = hE1−E2 + O(1). So we may assume, without loss of generality, that D and E are both very ample divisors. Let f0, ..., fn be a basis for L(D), so that

$$h_D(P) = h([f_0(P), \dots, f_n(P)])$$
By definition of $D \sim E$, we have $D = E + \mathrm{div}(g)$ for some $g \in k(V)^*$. Note the following chain of equivalences:
$$f \in L(D) \iff \mathrm{div}(f) + D \ge 0 \iff \mathrm{div}(f) + \mathrm{div}(g) + E \ge 0 \iff \mathrm{div}(fg) + E \ge 0 \iff fg \in L(E)$$

Since $f_0, \dots, f_n$ is a basis for L(D), the elements $f_0 g, \dots, f_n g$ (products, not compositions) form a basis for L(E). Indeed, they span L(E) by the equivalences above, and multiplying any linear dependence relation among the $f_i g$ by $g^{-1}$, which is allowed since $g \in k(V)^*$, gives a linear dependence relation among the $f_i$. Thus,

hE(P ) = h ([f0g(P ), ..., fng(P )]) + O(1)

= h ([f0(P )g(P ), ..., fn(P )g(P )]) + O(1)

= h ([f0(P ), ..., fn(P )]) + O(1)

= h_D(P) + O(1)

Note that the points P where g vanishes or has a pole could be problematic. In this case, we can use another function g' to cover these problematic points. To cover the problematic points of g', we can use yet another function g'', and so on. The idea is to use different charts to cover all of V.

(e) Write $D = D_1 - D_2$ where $D_1, D_2$ are very ample. Given $D \ge 0$, let $f_0, \dots, f_n$ be a basis for $L(D_2)$. Then

hD2 (P ) = h([f0(P ), ..., fn(P )])

We know D1 − D2 ≥ 0. Also, div(fi) + D2 ≥ 0 for each i. Adding these two inequalities, div(fi) + D1 ≥ 0. So fi ∈ L(D1) for each i = 0, 1, ..., n. Note that {f0, ..., fn} is still linearly independent when viewed in L(D1) but not necessarily a basis for L(D1). In any case, we can further extend it to a basis of L(D1) by adding extra elements fn+1, ..., fm, i.e. {f0, f1, ..., fm} is a basis of L(D1). By definition,

hD1 (P ) = h ([f0(P ), ..., fm(P )]) So

hD(P ) = hD1 (P ) − hD2 (P )

= h([f0(P ), ..., fm(P )]) − h([f0(P ), ..., fn(P )]) + O(1)

Since the entries [f0(P ), ..., fm(P )] include the entries of [f0(P ), ..., fn(P )], it is clear that h([f0(P ), ..., fm(P )]) ≥ h([f0(P ), ..., fn(P )]) and so the preceding inequality implies that

$$h_D(P) \ge O(1)$$
To cover all points P of $V - B_D$, we repeat the argument for each $D' \in |D|$. Indeed, given any point $P \in V - B_D$, there exists some $D' \in |D|$ such that P is not in the support of D'. When we write $D' = D_1 - D_2$, we can also assume that P is not in the support of $D_1$ and $D_2$. Consequently, the basis elements $f_0, \dots, f_n \in L(D_1)$ do not all vanish at P, because if they all did, then all elements of $L(D_1)$ would vanish at P, which would mean that P is in the support of D.

(f) Given that D is ample, mD is very ample for some $m \in \mathbb{N}$. After choosing a basis $f_0, \dots, f_n$ for L(mD), we get an embedding
$$\varphi_{mD} = [f_0, \dots, f_n] : V \hookrightarrow \mathbb{P}^n$$
Since $mD \sim \varphi_{mD}^* H$ for a hyperplane H, applying parts (d) and (a),

$$h_{V,mD}(P) = h_{V,\varphi_{mD}^* H}(P) + O(1) = h(\varphi_{mD}(P)) + O(1)$$

On the other hand hV,mD(P ) = mhV,D(P ) + O(1), so combining these two equations,

mhV,D(P ) + O(1) = h(ϕmD(P )) + O(1)

Thus, looking at the set of points P with bounded degree and a bounded value of hV,D(P ) is the same as looking at the set of points P with bounded degree and a bounded value of h(ϕmD(P )). The latter set has already been shown to be finite before. In essence, we used functoriality to reduce the problem from V to the case of the projective space which has been dealt before.
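The finiteness statement for projective space that the argument reduces to can be made concrete for P^1(Q). The sketch below is an illustration under stated assumptions rather than part of the notes: the function name `points_of_bounded_height` is hypothetical, points are restricted to Q-rational ones, and the multiplicative height of [a : b] with coprime integers a, b is taken to be max(|a|, |b|).

```python
from math import gcd


def points_of_bounded_height(B):
    """All points [a : b] of P^1(Q) with multiplicative height at most B.

    Each point is represented once, by coprime integers (a, b) whose first
    nonzero coordinate is positive.
    """
    points = set()
    for a in range(0, B + 1):            # sign normalization: a >= 0
        for b in range(-B, B + 1):
            if gcd(a, b) != 1:           # also excludes (0, 0)
                continue
            if a == 0 and b < 0:         # [0 : -1] is the same point as [0 : 1]
                continue
            points.add((a, b))
    return points


for B in (1, 2, 5, 10):
    print(B, len(points_of_bounded_height(B)))
```

For every bound B the loop visits only finitely many integer pairs, so the set is visibly finite; the theorem asserts the same for points of bounded degree over Q on any variety with an ample divisor.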

11. Canonical Height Suppose that f : PN → PN is a morphism of degree d. Then

h(f(P )) = d · h(P ) + Of (1)
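The relation h(f(P)) = d · h(P) + O(1), and the effect of dividing by d^n along an orbit, can be observed numerically on P^1(Q). The following sketch is illustrative only: the two sample maps (z ↦ z² − 1 and the squaring map, both of degree 2), the function names, and the sample points are assumptions chosen for this example, and the height of [x : y] with coprime integers is taken to be log max(|x|, |y|).

```python
from math import gcd, log


def normalize(x, y):
    """Coprime integer coordinates for the point [x : y] of P^1(Q)."""
    g = gcd(x, y)
    return x // g, y // g


def height(P):
    """Logarithmic height of a point of P^1(Q)."""
    x, y = normalize(*P)
    return log(max(abs(x), abs(y)))


def z2_minus_1(P):
    """The degree-2 morphism [x : y] -> [x^2 - y^2 : y^2], i.e. z -> z^2 - 1."""
    x, y = P
    return normalize(x * x - y * y, y * y)


def squaring(P):
    """The degree-2 power map [x : y] -> [x^2 : y^2]."""
    x, y = P
    return normalize(x * x, y * y)


def canonical_height_approximations(F, P, d, steps):
    """Return the list of h(F^n(P)) / d^n for n = 0, 1, ..., steps."""
    values = []
    Q = normalize(*P)
    for n in range(steps + 1):
        values.append(height(Q) / d ** n)
        Q = F(Q)
    return values


for n, value in enumerate(canonical_height_approximations(z2_minus_1, (2, 1), 2, 8)):
    print(n, round(value, 6))
```

For the squaring map the normalized values are constant, while for z ↦ z² − 1 they settle down quickly; starting from the point [0 : 1], whose orbit 0 → −1 → 0 is periodic, every value is 0.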

The error term $O_f(1)$ can be annoying, so we would like to get rid of it. If we iterate f and repeatedly use the above formula, we arrive at
$$h(f^n(P)) = d^n \cdot h(P) + O_{f^n}(1)$$
At this point, one is tempted to divide both sides by $d^n$ and let $n \to \infty$. The problem is that $O_{f^n}(1)$ can grow as a function of d and n. However, if one carries out the iteration carefully, one gets
$$h(f^n(P)) = d^n \cdot h(P) + O_f(d^n)$$
If we try the same trick of dividing both sides by $d^n$, we get
$$\frac{h(f^n(P))}{d^n} = h(P) + \frac{O_f(d^n)}{d^n}$$
where the last term is bounded. This looks more promising. However, it is still not clear what happens when we let $n \to \infty$: just because the error term is bounded, it does not mean that it converges.

Theorem. Let V be a non-singular variety, and $f : V \to V$ a morphism. Assume that $D \in \mathrm{Div}(V)$ satisfies $f^*(D) \sim \lambda D$ for some $\lambda > 1$. Then

(a) The limit
$$\hat{h}_{f,D}(P) := \lim_{n \to \infty} \frac{1}{\lambda^n} h_D(f^n(P))$$
exists. We say that $\hat{h}_{f,D}(P)$ is the canonical height associated to f and D.

(b) $\hat{h}_{f,D} = h_D + O(1)$

(c) $\hat{h}_{f,D}(f(P)) = \lambda \cdot \hat{h}_{f,D}(P)$

Moreover, properties (b) and (c) determine $\hat{h}_{f,D}$ uniquely.

Proof. Since f is a morphism, by the functoriality part of the Weil Height Machine,

hD(f(P )) = hf ∗D(P ) + O(1) Since f ∗D ∼ λD,

hD(f(P )) = hλD(P ) + O(1)

= λ · h_D(P) + O(1)
by linearity in the last step. We will use the identity $h_D(f(P)) = \lambda \cdot h_D(P) + O(1)$ to show that the sequence $\frac{1}{\lambda^n} h_D(f^n(P))$ is Cauchy, hence converges. Replacing P by $f^{i-1}(P)$, we have $h_D(f^i(P)) - \lambda h_D(f^{i-1}(P)) = O(1)$ for each i. Thus, for n > m,

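$$\left| \frac{1}{\lambda^n} h_D(f^n(P)) - \frac{1}{\lambda^m} h_D(f^m(P)) \right| = \left| \sum_{i=m+1}^{n} \left( \frac{1}{\lambda^i} h_D(f^i(P)) - \frac{1}{\lambda^{i-1}} h_D(f^{i-1}(P)) \right) \right|$$
$$= \left| \sum_{i=m+1}^{n} \frac{1}{\lambda^i} \left( h_D(f^i(P)) - \lambda \cdot h_D(f^{i-1}(P)) \right) \right| \le \sum_{i=m+1}^{n} \frac{1}{\lambda^i} O(1) \le O(1) \sum_{i=m+1}^{\infty} \frac{1}{\lambda^i}$$
$$= O(1) \cdot \frac{1/\lambda^{m+1}}{1 - 1/\lambda} = \frac{1}{\lambda^m} \cdot \frac{1}{\lambda - 1} \cdot O(1) \longrightarrow 0 \quad \text{as } n \ge m \to \infty$$
Here the first equality is the telescoping sum of the consecutive differences from i = m + 1 to i = n. We conclude that $\hat{h}_{f,D}(P)$ is well-defined.

(b) Take m = 0 in the above computation. Then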

$$\left| \frac{1}{\lambda^n} h_D(f^n(P)) - h_D(P) \right| \le \frac{1}{\lambda - 1} \cdot O(1)$$

Letting $n \to \infty$, we get $|\hat{h}_{f,D}(P) - h_D(P)| \le \frac{1}{\lambda - 1} \cdot O(1)$, so in particular $\hat{h}_{f,D}(P) = h_D(P) + O(1)$.

(c) By definition,

$$\hat{h}_{f,D}(f(P)) = \lim_{n \to \infty} \frac{1}{\lambda^n} h_D(f^n(f(P))) = \lim_{n \to \infty} \frac{\lambda}{\lambda^{n+1}} h_D(f^{n+1}(P)) = \lambda \lim_{n \to \infty} \frac{1}{\lambda^{n+1}} h_D(f^{n+1}(P)) = \lambda \cdot \hat{h}_{f,D}(P)$$

(d) Let $\hat{h}'_{f,D}$ be a function satisfying (b) and (c). Define $g(P) := \hat{h}_{f,D}(P) - \hat{h}'_{f,D}(P)$. Part (b) implies that g = O(1). Part (c) says that $g(f(P)) = \lambda \cdot g(P)$, hence $g(f^n(P)) = \lambda^n g(P)$. The left side is bounded while $\lambda^n \to \infty$, which forces g(P) = 0. So g = 0, and $\hat{h}'_{f,D} = \hat{h}_{f,D}$, which proves the desired uniqueness. □

Example. Consider a morphism $f : \mathbb{P}^n \to \mathbb{P}^n$ of degree $d = \deg(f) \ge 2$. Then $f^*H \sim dH$, and we can associate the canonical height $\hat{h}_{f,H}$. In particular, consider the case of the d-th power map, that is, $f : \mathbb{P}^n \to \mathbb{P}^n$ given by $f([x_0, \dots, x_n]) = [x_0^d, \dots, x_n^d]$. In this case, the canonical height $\hat{h}_{f,H}$ ends up agreeing with the usual Weil height:
$$\hat{h}_{f,H}(P) = h(P) = \frac{1}{[K : \mathbb{Q}]} \sum_{v \in M_K} \log \max_i \|x_i\|_v$$

Recall the following conjecture:

Lehmer's Conjecture. There exists an absolute constant C > 0 such that if $\alpha \in \bar{\mathbb{Q}}^*$ is not a root of unity, then
$$h(\alpha) \ge \frac{C}{[\mathbb{Q}(\alpha) : \mathbb{Q}]}$$
As context for the conjecture, recall that if α is a root of unity, then h(α) = 0. In fact, the converse is also true; its proof appeared earlier in the notes. We will now prove a generalization of this result (letting f be the d-th power map $x \mapsto x^d$ in the statement recovers the results about roots of unity):

Proposition. Let $f : V \to V$ be a morphism and $D \in \mathrm{Div}(V)$ such that $f^*D \sim \lambda D$ where $\lambda > 1$. Suppose that D is ample. Then $\hat{h}_{f,D}(P) = 0 \iff P$ is preperiodic for f.

Proof. (⇐) Suppose P is preperiodic for f. Then the set $\{f^n(P)\}_{n \in \mathbb{N}}$ is finite, and so $\{h_D(f^n(P))\}_{n \in \mathbb{N}}$ is finite, which tells us that $\hat{h}_{f,D}(P) = \lim_{n \to \infty} \frac{1}{\lambda^n} h_D(f^n(P)) = 0$.

(⇒) If $\hat{h}_{f,D}(P) = 0$, then the property $\hat{h}_{f,D}(f(P)) = \lambda \hat{h}_{f,D}(P)$ implies that $\hat{h}_{f,D}(f^n(P)) = 0$ for each $n \in \mathbb{N}$. Since $\hat{h}_{f,D} = h_D + O(1)$, the set $\{h_D(f^n(P))\}_{n \in \mathbb{N}}$ is bounded. Now, D is ample and the points $f^n(P)$ all live in some fixed number field. By the boundedness property of the Weil Height Machine, $\{f^n(P)\}_{n \in \mathbb{N}}$ is finite.
Therefore, P is preperiodic for f. □

Note that Northcott's theorem is a corollary of the proposition above in the special case of a morphism $f : \mathbb{P}^n \to \mathbb{P}^n$ of degree $d \ge 2$. Indeed, for a number field K, the preperiodic points P of f are the points where $\hat{h}_{f,D}(P) = 0$, so the values of $h_D(P)$ on them are bounded. But there are only finitely many K-rational points of bounded height, so f has only finitely many preperiodic points defined over K.

12. Abelian Varieties

Definition. An abelian variety is a group in the category of projective varieties (where the maps are regular morphisms, as opposed to just rational maps). In other words, $A \subseteq \mathbb{P}^n$ is an abelian variety if there exist morphisms $\mu : A \times A \to A$, $\iota : A \to A$ and an identity point $e \in A$ satisfying the group axioms, i.e.
$$\mu(e, x) = \mu(x, e) = x, \qquad \mu(x, \iota(x)) = \mu(\iota(x), x) = e, \qquad \mu(\mu(x, y), z) = \mu(x, \mu(y, z))$$
for all $x, y, z \in A$. It turns out that the group structure is always abelian. We will see the proof of this result soon.

Example. An elliptic curve (a smooth genus 1 curve) is an example of an abelian variety of dimension 1. Assuming char(k) ≠ 2, 3, one can realize the elliptic curve as a plane curve $A \subseteq \mathbb{P}^2$ given by the equation $y^2 z = x^3 + A x z^2 + B z^3$ for some $A, B \in k$ satisfying $4A^3 + 27B^2 \ne 0$.

Remark. It turns out that every abelian variety A of dimension 1 must be an elliptic curve. Indeed, if dim(A) = 1 then A is a smooth curve of genus $g \ge 0$. For $g \ge 2$, it is a non-trivial result that the curve A has only finitely many automorphisms, and hence cannot be an abelian variety; indeed, in an abelian variety A, each point $P \in A$ gives rise to a translation-by-P map $A \to A$ given by $Q \mapsto P + Q$, which is clearly an automorphism. As we are working over an algebraically closed field, there are infinitely many k-points P on A, which implies that every abelian variety of positive dimension admits infinitely many automorphisms. To rule out the case g = 0, observe that in this case $A = \mathbb{P}^1$ and suppose that $\mu : \mathbb{P}^1 \times \mathbb{P}^1 \to \mathbb{P}^1$ is the multiplication map of some group structure.
One can show that µ must necessarily factor 1 1 1 through one of the projections, i.e. say µ = f ◦ π1 where π1 : P × P → P is projection onto the first coordinate, and f : P1 → P1 is some morphism. But then for each point P , we have P = µ(e, P ) = f(e) which is a contradiction, as f(e) is a single point, while P can be chosen to be any point. Thus, g = 1 is only permissible value of the genus for an abelian variety of dimension 1. The key result that is needed to prove that abelian varieties are abelian groups is the so-called rigidity lemma. Rigidity Lemma. Let X,Y,Z be varieties such that X is projective. Let f : X ×Y → Z be a morphism. Suppose that there exists some y0 ∈ Y such that f(X × {y0}) is a single point. Then f(X × {y}) is a single point for all y ∈ Y . Proof. Since X is projective, X is proper so that the projection map p : X × Y → Y is a closed map. By hypothesis, f(X × {y0}) = {z0} for some z0 ∈ Z. Let z0 ∈ U ⊆ Z where U is an open affine neighborhood of z. Look at W = p(f −1(Z \ U)) which is a closed subset −1 of Y . We claim that y0 ∈/ W (so that W 6= Y ). Indeed, if y0 ∈ W , then (x, y0) ∈ f (Z \ U) for some x ∈ X, so that f(x, y0) ∈ Z \ U which contradicts f(x, y0) = z0 ∈ U. Thus, Y \ W is an open dense subset of Y . Note that for each y ∈ Y \ W , we must have f(X × {y}) ⊆ U. Indeed, if f(x, y) ∈ Z \ U for some x ∈ X, then (x, y) ∈ f −1(Z \ U) so that y ∈ p(f −1(Z \ U)) = W , contradicting the choice of y. We conclude that f(X × {y}) ⊆ U holds for each y ∈ Y \ W . But U is affine, while f(X × {y}) is projective (because X × {y} is projective, and f is a morphism). This forces f(X × {y}) to be a single point for each y ∈ Y \ W . 24 Now fix x0 ∈ X and define the map g : X × Y → Z

$(x, y) \mapsto f(x_0, y)$

As $f(X \times \{y\})$ is a single point for each $y \in Y \setminus W$, it follows that $f(x, y) = f(x_0, y) = g(x, y)$ for each $x \in X$ and $y \in Y \setminus W$. So f and g agree on $X \times (Y \setminus W)$. Since $X \times (Y \setminus W)$ is an open dense subset of $X \times Y$, and Z is separated, it follows that f and g agree on $X \times Y$. Thus, for each $y \in Y$, we have $f(x, y) = f(x_0, y)$ for each $x \in X$, so that $f(X \times \{y\}) = \{f(x_0, y)\}$ is a single point. □

Note: The rigidity lemma is true in all characteristics.

Corollary. 1) An abelian variety is an abelian group. 2) A morphism of abelian varieties is a composition of a homomorphism and a translation.

Proof. 1) Let ∗ denote the group operation. Look at $f : A \times A \to A$ given by $f(x, y) = x * y * x^{-1} * y^{-1}$. For each $x \in A$, we have $f(x, e) = x * e * x^{-1} * e^{-1} = e$. So $f(A \times \{e\}) = \{e\}$ is a single point. By the rigidity lemma, $f(A \times \{y\})$ is a single point for each $y \in A$. Since $f(e, y) = e$, this single point must be e again, so $f(A \times \{y\}) = \{e\}$ for each $y \in A$, i.e. for all $x, y \in A$ we have $x * y * x^{-1} * y^{-1} = e$, or equivalently, $x * y = y * x$.

2) The proof is similar, and also uses the rigidity lemma. See the book for a proof. □

Theorem. Let A be an abelian variety over $\mathbb{C}$, and denote g = dim(A). There is a holomorphic surjective homomorphism $\mathbb{C}^g \twoheadrightarrow A(\mathbb{C})$ whose kernel is a lattice $L \subset \mathbb{C}^g$, i.e. $L \cong \mathbb{Z}^{2g}$ as a group.

The space $\mathbb{C}^g$ is the universal covering space. The theorem above produces a complex analytic uniformization $\varphi : \mathbb{C}^g / L \xrightarrow{\sim} A(\mathbb{C})$.

Note. Not every $\mathbb{C}^g / L$ is an abelian variety (but yes for g = 1, i.e. elliptic curves).

Notation. Let A be an abelian variety defined over K. For each $m \in \mathbb{N}$, we define the map $[m] : A \to A$ given by $P \mapsto mP$. This is a homomorphism. We also define $A[m] := \ker[m] = \{P \in A(\bar{K}) : [m]P = 0\}$.

Proposition. $G_K = \mathrm{Gal}(\bar{K}/K)$ acts on A[m].

Proof. Since A is defined over K, we have $\sigma(P_1 + P_2) = \sigma(P_1) + \sigma(P_2)$ and $\sigma(-Q) = -\sigma(Q)$ for every $\sigma \in G_K$. By induction, $\sigma([m]P) = [m]\sigma(P)$. So if $\sigma \in G_K$ and $P \in A[m]$, then $[m]\sigma(P) = \sigma([m]P) = \sigma(0) = 0$, so $\sigma(P) \in A[m]$. □

Analogy.
It is useful to consider various analogies between an abelian variety (a projective algebraic group) and $\mathbb{G}_m$ (an affine algebraic group). The analog of A[m] is $\mathbb{G}_m[m] = \{\alpha \in \bar{K}^* : \alpha^m = 1\} = \mu_m$, i.e. the m-th roots of 1. This object is well-studied. For example, the extension $\mathbb{Q}(\mu_m)/\mathbb{Q}$ is abelian, has degree $\varphi(m)$, and is ramified at p if and only if $p \mid m$ (with the usual exception of p = 2 when $m \equiv 2 \pmod 4$).

Theorem. Assume char(K) = 0 and g = dim(A). Then $A[m] \cong (\mathbb{Z}/m\mathbb{Z})^{2g}$ as abstract groups.

Proof. (for $K \subseteq \mathbb{C}$). We have $A(\bar{K}) \subseteq A(\mathbb{C})$. So
$$A[m] = \{P \in A(\mathbb{C}) : [m]P = 0\} = \text{points of order dividing } m \text{ in } \mathbb{C}^g / L \cong (\mathbb{R}^{2g}/\mathbb{Z}^{2g})[m] \cong \left(\tfrac{1}{m}\mathbb{Z}/\mathbb{Z}\right)^{2g} \cong (\mathbb{Z}/m\mathbb{Z})^{2g}$$

Remark. If char(k) = p > 0, then $A[p] \cong (\mathbb{Z}/p\mathbb{Z})^k$ for some $0 \le k \le g$, with k = g being the most common case.

Representation of a Galois group. After choosing a basis, we get a non-canonical isomorphism $A[m] \cong (\mathbb{Z}/m\mathbb{Z})^{2g}$. The action of $G_K$ on A[m] gives a representation
$$\rho_m : G_K \to \mathrm{Aut}(A[m]) \cong GL_{2g}(\mathbb{Z}/m\mathbb{Z})$$
Let K(A[m]) be the field generated by the coordinates of all the points $P \in A[m]$. Consider the subgroup $H = \mathrm{Gal}(\bar{K}/K(A[m]))$ of $G_K = \mathrm{Gal}(\bar{K}/K)$. For each $\sigma \in H$, by definition $\sigma(P) = P$ for every $P \in A[m]$, so $\rho_m(\sigma)$ is the identity map in Aut(A[m]), i.e. σ is in the kernel of $\rho_m$. Thus, $H \subseteq \ker(\rho_m)$, meaning that $\rho_m$ factors through $G_K \to G_K/H$. By the Fundamental Theorem of Galois Theory, we have an isomorphism $G_K/H \cong \mathrm{Gal}(K(A[m])/K)$. The situation can be described by a commutative diagram

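$$G_K \longrightarrow \mathrm{Gal}(K(A[m])/K) \longrightarrow GL_{2g}(\mathbb{Z}/m\mathbb{Z}), \qquad \text{with composite } \rho_m : G_K \to GL_{2g}(\mathbb{Z}/m\mathbb{Z})$$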
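For the multiplicative group, the analogous representation can be written down completely, which may help when reading the diagram above. The sketch below rests on a standard fact that is assumed here rather than taken from the notes: after identifying $\mu_m$ with Z/mZ via $\zeta^k \leftrightarrow k$, the automorphism $\sigma_a$ of $\mathbb{Q}(\mu_m)/\mathbb{Q}$ with $\sigma_a(\zeta) = \zeta^a$ acts as multiplication by the unit a, so the image of Galois is $GL_1(\mathbb{Z}/m\mathbb{Z})$. The function names are hypothetical.

```python
from math import gcd


def units(m):
    """The units of Z/mZ, i.e. the elements of GL_1(Z/mZ)."""
    return [a for a in range(1, m + 1) if gcd(a, m) == 1]


def acts_as_automorphism(a, m):
    """Check that k -> a*k is a bijective group homomorphism of Z/mZ.

    Under the identification zeta^k <-> k, this is the action zeta -> zeta^a
    on the group of m-th roots of unity.
    """
    image = [(a * k) % m for k in range(m)]
    bijective = sorted(image) == list(range(m))
    additive = all(
        image[(j + k) % m] == (image[j] + image[k]) % m
        for j in range(m)
        for k in range(m)
    )
    return bijective and additive


m = 12
print(units(m), all(acts_as_automorphism(a, m) for a in units(m)))
```

The number of units is the Euler function $\varphi(m)$, matching the degree of $\mathbb{Q}(\mu_m)/\mathbb{Q}$ quoted in the analogy; for an abelian variety the one-dimensional group $GL_1$ is replaced by $GL_{2g}$.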

13. Mordell-Weil Theorem Let K be a number field, and let A be an abelian variety. We can consider the set A(K) which consists of all the K-points of A. This is naturally a group. Mordell-Weil Theorem. A(K) is a finitely generated abelian group. The theorem is proved by first proving a weaker version. Weak Mordell-Weil Theorem. For each m ≥ 2, the quotient A(K)/mA(K) is finite. Note that weak Mordell-Weil does not immediately imply the full Mordell-Weil. For example, A = Q/Z is an example of an abelian group such that A/mA = 0 for each m ≥ 2, but A is not finitely generated. The following lemma shows that it is enough to prove Weak Mordell-Weil theorem after a finite extension of fields. Lemma. Let L/K be a finite extension. Then the kernel of

A(K)/mA(K) → A(L)/mA(L)

is finite.

Proof. Let Φ be the kernel of this map, so
$$\Phi = \frac{A(K) \cap mA(L)}{mA(K)}$$

Let $\bar{P} = P \pmod{mA(K)} \in \Phi$ with $P \in A(K)$. Then $P = mQ_P$ for some $Q_P \in A(L)$. By replacing L by its Galois closure, we can assume that L/K is Galois. Define a map

fP : GL/K → A(L)

σ 7→ σ(QP ) − QP

where we use $G_{L/K}$ as a shorthand for Gal(L/K). Note that $\sigma(Q_P) - Q_P$ is killed by m. Indeed, $[m](\sigma(Q_P) - Q_P) = [m]\sigma(Q_P) - [m]Q_P = \sigma([m]Q_P) - [m]Q_P = \sigma(P) - P = 0$, as P is defined over K. Thus, the target of $f_P$ can be replaced with A[m], so we can view $f_P$ as a set map $G_{L/K} \to A[m]$. This association leads to a map

$$A(K) \cap mA(L) \to \mathrm{Hom}_{\mathrm{Set}}(G_{L/K}, A[m]), \qquad P \mapsto f_P$$
where the target is finite, as $G_{L/K}$ and A[m] are finite.

Claim. If $f_P = f_{P'}$, then $P - P' \in mA(K)$.

Note that the claim implies the result, since it gives an injective map
$$\frac{A(K) \cap mA(L)}{mA(K)} \hookrightarrow \text{finite set}$$

Proof of the Claim. If $f_P = f_{P'}$, then $\sigma(Q_P) - Q_P = \sigma(Q_{P'}) - Q_{P'}$ for every $\sigma \in G_{L/K}$. So $\sigma(Q_P - Q_{P'}) = Q_P - Q_{P'}$ for every $\sigma \in G_{L/K}$, i.e. $Q_P - Q_{P'} \in A(K)$. Then $P - P' = m(Q_P - Q_{P'}) \in mA(K)$. This completes the proof of the lemma. □

Therefore, to prove weak Mordell-Weil, we may assume that $A[m] \subset A(K)$ and $\mu_m \subseteq K$. Let's recall the Kummer sequence as an analogy:
$$1 \longrightarrow \mu_m \longrightarrow \bar{K}^* \xrightarrow{\ x \mapsto x^m\ } \bar{K}^* \longrightarrow 1$$
Taking group cohomology, we get
$$1 \longrightarrow \mu_m \cap K^* \longrightarrow K^* \xrightarrow{\ x \mapsto x^m\ } K^* \longrightarrow H^1(G_K, \mu_m) \longrightarrow H^1(G_K, \bar{K}^*) = 0$$
where the last term is zero by Hilbert's Theorem 90. Thus, we have an isomorphism $K^*/(K^*)^m \cong H^1(G_K, \mu_m)$. This isomorphism is achieved by sending $a \mapsto \left(\sigma \mapsto \frac{\sigma(\alpha)}{\alpha}\right)$ where $\alpha = \sqrt[m]{a}$. The point is that often $\alpha \notin K$, so the cocycle $\left(\sigma \mapsto \frac{\sigma(\alpha)}{\alpha}\right)$ is not a coboundary in general. If every element of $K^*$ had an m-th root in K, then each such cocycle would be trivial, in which case $H^1(G_K, \mu_m)$ would be zero. Similarly, we consider
$$0 \longrightarrow A[m] \longrightarrow A(\bar{K}) \xrightarrow{\ m\ } A(\bar{K}) \longrightarrow 0$$
Taking group cohomology, we obtain

$$0 \to A[m] \cap A(K) \longrightarrow A(K) \xrightarrow{\ m\ } A(K) \longrightarrow H^1(G_K, A[m]) \longrightarrow H^1(G_K, A(\bar{K})) \longrightarrow \cdots$$
Unfortunately, we don't have a version of Hilbert's Theorem 90 for abelian varieties, so $H^1(G_K, A(\bar{K}))$ is not necessarily zero. Since we are assuming $A[m] \subset A(K)$, the connecting homomorphism $A(K) \to H^1(G_K, A[m])$ induces an injection
$$\delta : A(K)/mA(K) \hookrightarrow H^1(G_K, A[m]) = \mathrm{Hom}(G_K, A[m])$$

The reason for the last equality is that $A[m] \subset A(K)$, and so $G_K$ acts trivially on A[m]. It is well-known that if G acts trivially on A, then the first group cohomology is just the set of homomorphisms, i.e. we get usual homomorphisms instead of twisted ones (and the coboundaries vanish for a trivial action):
$$H^1(G, A) := \{f : G \to A : f(gh) = g f(h) + f(g) \ \forall g, h \in G\} = \{f : G \to A : f(gh) = f(h) + f(g) \ \forall g, h \in G\} = \mathrm{Hom}(G, A)$$
whenever G acts trivially on A. Let's write down the map δ explicitly. It is analogous to the Kummer sequence.

δ : A(K)/mA(K) ,→ Hom(GK ,A[m])

(P mod mA(K)) 7→ (σ 7→ σ(QP ) − QP where [m]QP = P )

Now, the issue is that Hom(GK ,A[m]) is not finite in general. The trick is to pass to an appropriate subextension K ⊆ L ⊆ K, namely Y L := K([m]−1A(K)) = K(Q) Q∈A(K) mQ∈A(K) where product symbol is used to indicate that we are taking the compositum over all such −1 fields K(Q). The point is that δ(P )(σ) = σ(QP ) − QP where QP ∈ [m] P (”m”-th roots of P ). If σ fixes all of L, then δ(P )(σ) = 0. In other words, Gal(K/L) is contained in the kernel ∼ of δ(P ): GK → A[m]. Since GK / Gal(K/L) = Gal(K/K)/ Gal(K/L) = Gal(L/K), we see that δ(P ) factors through GL/K := Gal(L/K), i.e. δ(P ): GL/K → A[m]. Consequently, we obtain an injection A(K)/mA(K) ,→ Hom(GL/K ,A[m])

The advantage is that $\mathrm{Hom}(G_{L/K}, A[m])$ is a lot smaller than $\mathrm{Hom}(G_K, A[m])$. Our goal is to show that $\mathrm{Hom}(G_{L/K}, A[m])$ is finite. It suffices to establish that L/K is a finite extension.

Goal. We want to prove that $[L : K] < \infty$, in which case $\#G_{L/K} < \infty$ so that $\#\mathrm{Hom}(G_{L/K}, A[m]) < \infty$, which would imply that A(K)/mA(K) is finite, thus proving the Weak Mordell-Weil Theorem. We will proceed by showing the following steps.

Step 1. L/K is abelian of exponent m.
Step 2. There are only finitely many primes of K which ramify in L.
Step 3. Step 1 + Step 2 ⇒ L/K is finite.

13.1. Step 1. We have a pairing:

A(K) × GK → A[m]

(P, σ) 7→ σ(QP ) − QP −1 where QP ∈ [m] P . We first need to check the map is well-defined, i.e. does not depend 0 −1 on the choice of QP . Indeed, suppose that QP and QP are both elements of [m] P , i.e. 0 0 0 [m]QP = P and [m]QP = P . We need to show that σ(QP ) − QP = σ(QP ) − QP . Note that 0 0 [m](QP − QP ) = 0 so that QP − QP ∈ A[m] ⊂ A(K) by assumption. We have 0 0 0 0 σ(QP ) − QP − (σ(QP ) − QP ) = σ(QP − QP ) − (QP − QP ) = 0 as desired. Note that the assignment given here is “bilinear” (a group homomorphism once 0 0 you fix either of the entries). Given P,P , we have [m](QP + QP 0 ) = P + P , so that QP +P 0 = QP + QP 0 . Thus, 0 0 (P + P , σ) 7→ σ(QP +P 0 ) − QP +P 0 = σ(QP ) + σ(QP 0 ) − QP − QP 0 ← (P, σ) + (P , σ) [ On the other hand, given σ, τ ∈ GK , we have

(P, σ ◦ τ) 7→ σ(τ(QP )) − QP = σ(τ(QP )) − τ(QP ) + τ(QP ) − QP

= σ(QP ) − QP + τ(QP ) − QP ← (P, σ) + (P, τ) [ 28 We should justify the last equality: σ(τ(QP )) − τ(QP ) = σ(QP ) − QP . This is equivalent to showing that σ(τ(QP ))−σ(QP ) = τ(QP )−QP , which is true because τ(QP )−QP ∈ A[m] ⊂ A(K), so σ must fix this element, i.e. σ(τ(QP ) − QP ) = τ(QP ) − QP . Let’s try to understand what we need to quotient out so that the map A(K) × G(K) → A[m] becomes non-degenerate. First, let’s look for the points P ∈ A(K) such that σ 7→ σ(QP ) − QP is the zero map,

$$\{P \in A(K) : \sigma(Q_P) = Q_P \ \forall \sigma \in G_K, \text{ where } mQ_P = P\} = \{P : Q_P \in A(K), \text{ where } mQ_P = P\} = mA(K)$$
So, to get non-degeneracy in the first component, we need to replace A(K) with A(K)/mA(K). Similarly, let's look for the elements $\sigma \in G_K$ such that $P \mapsto \sigma(Q_P) - Q_P$ is the zero map,

{σ ∈ GK : σ(QP ) = QP ∀P ∈ A(K)} = {σ ∈ GK : σ fixes L} = Gal(K/L) −1 because L was defined precisely to be the field generated by elements QP from [m] (P ) as P varies in A(K). To get non-degeneracy in the second component, we need to replace ∼ GK = Gal(K/K) with the quotient Gal(K/K)/ Gal(K/L) = Gal(L/K). We conclude that the pairing A(K)/mA(K) × Gal(L/K) → A[m] is non-degenerate. In particular, Gal(L/K) ,→ Hom(A(K)/mA(K),A[m]) Since A[m] is an abelian group, and has exponent m, it follows that Gal(L/K) is also abelian of exponent m, completing Step 1.

13.2. Step 2. Given a number field K, and a prime ideal ℘ inside the ring of integers OK , we can consider the reduction map

N mod ℘ N P (K) −−−−−→ P (F℘) [a0 : a1 : ··· : aN ] 7→ [ea0 : ea1 : ··· : eaN ] where eam = ai mod ℘. For this map to be well-defined (i.e. to avoid all the entires to be 0 mod ℘), we can choose projective coordinates so that

(1) every ai is ℘-integral, i.e. ord℘(ai) ≥ 0. (2) some ord℘(aj) = 0. For an abelian variety A ⊆ N , one can form A mod ℘ = A ⊂ N . This is called the P e e℘ PF℘ reduction of A modulo ℘. We have a natural map A(K) → A℘(F℘). Example. Let E be an elliptic curve given by y2 = x3 + Ax + B where A, B ∈ Z such 3 2 3 2 that 4A + 27B 6= 0. Then Eep is non-singular if and only if p - 2(4A + 27B ). One needs the extra factor of 2 in front of the because the equation is singular over F2. Theorem. Ae℘ is non-singular for all but finitely many primes ℘, and in fact it is an abelian variety (whenever it is non-singular). The second part of the statement deserves a remark. Just because Ae℘ is non-singular, it is not a priori clear that it is an abelian variety. Indeed, it is not clear that when one reduces the multiplication map µ : A × A → A mod ℘, the resulting map µ℘ : A℘ × A℘ → A℘ stays a morphism (in general it could be a rational map). 29 When Aewp is non-singular, we say that ℘ is a prime of good reduction for A. Otherwise, ℘ is a prime of bad reduction. Thus, we get a homomorphism of abelian varieties,

A(K) → Ae℘(F℘) for all primes ℘∈ / SA/K , where SA/K is the finite set of bad reduction primes. The following is a key input which goes into the proof of the Weak Mordell-Weil theorem. Key Fact. If A has a good reduction at ℘ and ℘ - m, then

A(K)[m] ,→ Ae℘(F℘)
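The Key Fact can be seen in a small example. The sketch below is illustrative and makes assumptions that are not in the notes: it uses the elliptic curve $y^2 = x^3 + 1$ over Q, whose rational torsion subgroup is the standard example of a cyclic group of order 6 consisting of the point at infinity together with the five affine points listed in `TORSION`. Here $4A^3 + 27B^2 = 27$, so by the criterion stated earlier every prime $p \ge 5$ is a prime of good reduction not dividing 6.

```python
# Affine rational torsion points of y^2 = x^3 + 1 (the sixth torsion point is
# the point at infinity, which reduces to the point at infinity).
TORSION = [(-1, 0), (0, 1), (0, -1), (2, 3), (2, -3)]


def reduce_point(P, p):
    """Reduce an affine point with integer coordinates modulo p."""
    x, y = P
    return (x % p, y % p)


def count_points(p):
    """Number of points of y^2 = x^3 + 1 over F_p, including the point at infinity."""
    affine = sum(
        1
        for x in range(p)
        for y in range(p)
        if (y * y - x ** 3 - 1) % p == 0
    )
    return affine + 1


for p in (5, 7, 11, 13):
    reduced = {reduce_point(P, p) for P in TORSION}
    # Injectivity on torsion: the five affine points stay distinct mod p,
    # so the group order over F_p must be divisible by 6.
    print(p, len(reduced) == len(TORSION), count_points(p) % 6 == 0)
```

Both checks come out true for each of these primes: distinct torsion points remain distinct after reduction, so the reduced curve contains a subgroup of order 6.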

In general, the kernel of A(K) → Ae℘(F℘) can be quite big. For example, if A(K) is infinite, then the kernel is necessarily infinite as the target Ae℘(F℘) is finite. Nevertheless, the key fact says that the m-torsion points will not be in the kernel. Analog for the multiplicative group. If p - m, then µm ,→ Fp. In this case, µm = Gm[m]. Note that the multiplicative group Gm = {xy − 1 = 0} is always non-singular when reduced mod p. The injection µm ,→ Fp says that different m-th roots do not coincide in Fp provided that p - m. Proposition. The Key Fact ⇒ L/K is unramified for ℘∈ / SA/K,m, where

SA/K,m = {℘ : A has a bad reduction at ℘} ∪ {℘ : ℘ | m}

Recall that the field L is defined by
$$L := K([m]^{-1}A(K)) = \prod_{\substack{Q \in A(\bar{K}) \\ mQ \in A(K)}} K(Q)$$

Proof. Let ℘∈ / SA/K,m. We want to show the unramifiedness. Let I℘ ⊂ GL/K be the inertia group at ℘. Since GL/K is abelian, the inertial group is “well-behaved”. Since e = #I℘ is equal to the ramification index at ℘, it suffices to prove that I℘ is the trivial group (being unramified means that e = 1). Let p be a prime in L lying above ℘, i.e. p | ℘. Recall the definition of the inertia group:  I℘ = ker GL/K → Gp/℘

Let σ ∈ I℘. So σ fixes “L-things” mod p. Let Q ∈ A(L) with [m]Q ∈ A(K). Then (1) σ(Q) ≡ Q mod p. (2)[ m](σ(Q) − Q) = σ([m]Q) − [m]Q = 0 as [m]Q ∈ A(K).

So σ(Q) − Q ∈ A[m]. Thus, $\sigma(Q) - Q \in \ker(A(L)[m] \to \tilde{A}_{\mathfrak{p}}(\mathbb{F}_{\mathfrak{p}})) = 0$ by the key fact, so σ(Q) = Q. Therefore, $\sigma \in I_\wp$ fixes every Q with [m]Q ∈ A(K), which are the elements used to generate L as a field extension over K. This implies that σ fixes L pointwise, i.e. σ = id in $G_{L/K}$. Since $\sigma \in I_\wp$ was arbitrary, we conclude that the inertia group $I_\wp$ is trivial, and that ℘ is unramified in L. □

This completes Step 2, assuming the key fact above (which will be proved later).

13.3. Step 3. We want to prove the following general result about field extensions.

Proposition. Suppose that L/K is an abelian extension of exponent m which is unramified at all $\wp \notin S$ for some finite set S of primes. Then L/K is a finite extension.

Proof. Without loss of generality, we can make K bigger, and so we may assume that $\mu_m \subset K$. Indeed, replacing K with a finite extension K′ will replace Gal(L/K) with its subgroup Gal(L/K′), which is still abelian of exponent m, and L/K is finite if and only if L/K′ is finite. By Kummer theory, there is an isomorphism $K^*/(K^*)^m \cong \mathrm{Hom}(\mathrm{Gal}(\overline{K}/K), \mu_m)$ given by $b \mapsto (\sigma \mapsto \sigma(\beta)/\beta)$, where $\beta^m = b$. This isomorphism comes from applying group cohomology to the Kummer exact sequence, using Hilbert's Theorem 90, which says $H^1(\mathrm{Gal}(\overline{K}/K), \overline{K}^*) = 0$, and the identification $H^1(\mathrm{Gal}(\overline{K}/K), \mu_m) = \mathrm{Hom}(\mathrm{Gal}(\overline{K}/K), \mu_m)$. The latter holds because $\mathrm{Gal}(\overline{K}/K)$ acts trivially on $\mu_m \subset K$, so the twisted homomorphisms are just the usual homomorphisms.

Next, note that L is generated by finite extensions K′ of K. Since $\mathrm{Gal}(L/K) \twoheadrightarrow \mathrm{Gal}(K'/K)$, it follows that Gal(K′/K) is a finite abelian group of exponent m, so we can further decompose K′ into smaller extensions of K whose Galois groups are subgroups of Z/mZ. Thus, L is generated by those finite subextensions K′ of K such that Gal(K′/K) ⊆ Z/mZ, i.e.
$$L = \prod_{\substack{K'/K,\ K' \subset L \\ G_{K'/K} \subseteq \mathbb{Z}/m\mathbb{Z}}} K'$$
where each such K′ is obtained as $K' = K(\sqrt[m]{b})$ for some $b \in K^*$. Thus,

$$L = K\left(\sqrt[m]{b_i} : i \in I\right)$$

where the $b_i$ are chosen from equivalence classes in $K^*/(K^*)^m$. By enlarging the finite set S, if necessary, we may assume that S contains {℘ : ℘ | m}. The discriminant of $x^m - b$ is $\pm m^m b^{m-1}$. So, for all primes $\wp \notin S$, it follows that $K(\sqrt[m]{b})/K$ is unramified at ℘ if and only if $\mathrm{ord}_\wp(b) \equiv 0 \pmod{m}$. We have an exact sequence

$$0 \to B \longrightarrow \langle b_i \in K^*/(K^*)^m : i \in I \rangle \longrightarrow \prod_{\wp \in S} \mathbb{Z}/m\mathbb{Z}$$

$$b \mapsto (\mathrm{ord}_\wp(b) \bmod m)_{\wp \in S}$$
where B is defined to be the kernel of the given map. We claim that B is finite. Note that
$$B \subset \{b \in K^*/(K^*)^m : \mathrm{ord}_\wp(b) \equiv 0 \pmod{m} \ \ \forall \wp\}$$

For each b ∈ B, we can factorize the ideal $(b) = (\wp_1^{e_1} \wp_2^{e_2} \cdots \wp_r^{e_r})^m$. By making S bigger, we can assume that the ring of S-integers $R_S$ is a PID, where $R = O_K$ is the ring of integers of the base field. Indeed, consider the class group $H_K = \{\mathfrak{a}_1, ..., \mathfrak{a}_h\}$ of K, and add to the set S all the primes ℘ with $\mathrm{ord}_\wp(\mathfrak{a}_i) \neq 0$. Then $\mathfrak{a}_1, \mathfrak{a}_2, ..., \mathfrak{a}_h$ become the unit ideal in $R_S$. Since $R_S$ is a PID, in the decomposition $(\wp_1^{e_1} \wp_2^{e_2} \cdots \wp_r^{e_r})^m$ we can write $\wp_i = (\pi_i)$, so $(b) = (\beta^m)$, which means that $b = u_b \beta^m$ for some unit $u_b$ in $R_S^*$. By Dirichlet's unit theorem, the set $R_S^*/(R_S^*)^m$ is finite, so we just need to pick finitely many representatives $u_1, u_2, ..., u_t$, which shows that B is finite.

Combining the fact that B is finite and $\prod_{\wp \in S} \mathbb{Z}/m\mathbb{Z}$ is finite (because the set S is finite, and we see how this was a key hypothesis), we conclude from the above displayed exact sequence that $\langle b_i \in K^*/(K^*)^m : i \in I \rangle$ is finite. As this is a generating set for L over K, it follows that L is a finite extension of K. □

13.4. The proof of the Key Fact. We have proved the Weak Mordell-Weil Theorem, that is, finiteness of A(K)/mA(K) for every m ≥ 2, modulo a key fact stated before:

Key Fact. If A has good reduction at ℘ and $\wp \nmid m$, then

$$A(K)[m] \hookrightarrow \tilde{A}_\wp(\mathbb{F}_\wp)$$
There are three different approaches:
1) Using formal groups – see the book [HS00],
2) The Chevalley-Weil Theorem regarding unramified morphisms V → W – see exercise C.7 in [HS00], and
3) Hensel's Lemma, plus some facts from algebraic geometry.
We will follow approach 3) for our proof.

Proof. We will need the following theorem, which we will assume without proof:

Theorem. Let k be an algebraically closed field, A/k an abelian variety, and g := dim(A). Then
$$A(k)[p^t] \cong \begin{cases} (\mathbb{Z}/p^t\mathbb{Z})^{2g} & \text{if } p \neq 0 \text{ in } k \\ (\mathbb{Z}/p^t\mathbb{Z})^{i} & \text{if } p = 0 \text{ in } k, \end{cases}$$
where 0 ≤ i ≤ g, for each t ≥ 1.

We also need Hensel's Lemma, which we will state in two versions. The first version is the classical version, while the second one is a geometric version for varieties. Note that the uniqueness part in version 1 is missing in version 2.

Hensel's Lemma. Let $K_\wp$ be the completion of K at a prime ℘, let $R_\wp$ denote its ring of integers, and write $\wp = \wp R_\wp$ for the corresponding maximal ideal. Then the following statements hold.

Version 1. Let $f(x) \in R_\wp[x]$ and $\alpha \in R_\wp$ be such that $f(\alpha) \equiv 0 \pmod{\wp}$ and $f'(\alpha) \not\equiv 0 \pmod{\wp}$. Then there exists a unique $\beta \in R_\wp$ such that $\beta \equiv \alpha \pmod{\wp}$ and f(β) = 0.

Version 2. Let $V/K_\wp \subseteq \mathbb{P}^n_{K_\wp}$ be a variety. Reduce mod ℘ to get $\tilde{V}_\wp/\mathbb{F}_\wp \subset \mathbb{P}^n_{\mathbb{F}_\wp}$. Let $\tilde{Q} \in \tilde{V}_\wp(\mathbb{F}_\wp)$. Assume that $\tilde{Q}$ is a nonsingular point of $\tilde{V}_\wp$. Then there exists some $Q \in V(K_\wp)$ such that $Q \equiv \tilde{Q} \pmod{\wp}$.

Recall the definition of a non-singular point. We can choose local coordinates $x_1, ..., x_n$ such that V is locally given by $\{f_1 = ... = f_s = 0\}$, where we can choose $f_i \in R_\wp[x_1, ..., x_n]$. A point $\tilde{Q}$ is a nonsingular point of $\tilde{V}$ if $\mathrm{rank}\left(\frac{\partial f_i}{\partial x_j}(\tilde{Q})\right) = n - \dim(V)$.

We can now prove the key fact above. Suppose that ℘ is a prime such that A has good reduction at ℘ and $\wp \nmid m$. By Version 2 of Hensel's Lemma, the natural map

$A(K_\wp) \twoheadrightarrow \tilde{A}_\wp(\mathbb{F}_\wp)$ is surjective. By enlarging the field K if necessary, we can assume that A[m] ⊆ A(K). Let's view A[m] as a variety (rather than a set of closed points). Since $A \subset \mathbb{P}^N$ is defined by some polynomial equations, $A = \{F_1 = \cdots = F_r = 0\}$, the same holds true for A[m], i.e. $A[m] = \{F_1 = \cdots = F_r = G_1 = \cdots = G_s = 0\}$ where the polynomials $G_1, G_2, ..., G_s$ arise from analyzing the equation [m]P = 0.

Claim. The variety A[m] is non-singular.

Proof. It suffices to show that the multiplication map [m]: A → A is unramified, in fact étale. This will imply the desired result as A[m] is the fiber of this map above 0. This can be checked by passing to C. Indeed, [m]: A(C) → A(C) gets identified with the map $\mathbb{C}^g/L \to \mathbb{C}^g/L$ given by z ↦ mz, i.e. multiplying each coordinate by m. This latter map is étale, so the former map must be étale as well. Thus, A[m] is non-singular in characteristic 0, and so it is non-singular over $\mathbb{F}_\wp$ for all but finitely many primes ℘. □

Hence, for all but finitely many primes ℘, we can apply Hensel's lemma to A[m] to get a surjective map $A[m](K_\wp) \twoheadrightarrow \tilde{A}_\wp[m](\mathbb{F}_\wp)$. Since both sides are isomorphic to $(\mathbb{Z}/m\mathbb{Z})^{2g}$, they both have the same cardinality $m^{2g}$, and thus, the map $A[m](K_\wp) \twoheadrightarrow \tilde{A}_\wp[m](\mathbb{F}_\wp)$ is also injective. □

14. Mordell-Weil Theorem

Our goal now is to prove the Mordell-Weil Theorem: A(K) is a finitely-generated abelian group for an abelian variety A and a number field K. We have already proved the Weak Mordell-Weil Theorem, which states that A(K)/mA(K) is finite for each m ≥ 2. To obtain the full Mordell-Weil from the weak Mordell-Weil, we need to study the height machine on abelian varieties.

14.1. Divisors on Abelian Varieties. For an abelian variety A, consider the set
End(A) = {algebraic homomorphisms A → A}
This is called the endomorphism ring of A. The ring structure on End(A) is the following: for f, g ∈ End(A), and a point P ∈ A, define (f + g)(P) = f(P) + g(P), and (f · g)(P) = f(g(P)). There are some conditions to be checked here, such as distributivity.

Theorem.
(Theorem of the Cube). Given f, g, h ∈ End(A), and D ∈ Div(A),
$$(f + g + h)^*D - (f + g)^*D - (f + h)^*D - (g + h)^*D + f^*D + g^*D + h^*D \sim 0$$

Remark. It is not true in general that $(f + g)^*D = f^*D + g^*D$. If this were true, then the theorem of the cube would be trivial.

Let's apply the theorem of the cube in the particular case when f = [m], g = [1], and h = [−1]. So f(P) = [m]P, g(P) = P and h(P) = −P for each P ∈ A. We get
$$[m]^*D - [m + 1]^*D - [m - 1]^*D - [0]^*D + [m]^*D + [1]^*D + [-1]^*D \sim 0$$
Note that $[0]^*D = 0$ and $[1]^*D = D$. After rearranging, we get
$$[m + 1]^*D + [m - 1]^*D \sim 2[m]^*D + D + [-1]^*D$$
This gives a 2-step recurrence relation which can be solved, the two base cases being $[0]^*D = 0$ and $[1]^*D = D$. We state the final result.

Corollary. For D ∈ Div(A) and m ≥ 1, we have
$$[m]^*D \sim \frac{m^2 + m}{2} D + \frac{m^2 - m}{2} [-1]^*D$$
Proof. Now that we know the answer, the proof can be obtained by a simple induction. □

Definition. We say that a divisor D ∈ Div(A) is symmetric if $[-1]^*D \sim D$. Similarly, we say that D is anti-symmetric if $[-1]^*D \sim -D$.

The above corollary simplifies in the special case of a symmetric or anti-symmetric divisor.

Corollary. Let D ∈ Div(A) and m ≥ 1. If D is symmetric, then $[m]^*D \sim m^2 D$. If D is anti-symmetric, then $[m]^*D \sim mD$.

Recall the context of canonical heights. If f is a morphism A → A such that $f^*D \sim \lambda D$ for λ > 1, then we constructed a height function $\hat{h}_{f,D}$ which satisfies some nice properties.

Theorem. Let K/Q be a number field, and A an abelian variety over K. Let D ∈ Div(A) be a symmetric divisor, so that $[-1]^*D \sim D$. Then
(1) $\hat{h}_D(P) = \lim_{n \to \infty} \frac{1}{4^n} h_D([2^n]P)$ exists and satisfies $\hat{h}_D(P) = h_D(P) + O(1)$.
(2) For each m ≥ 2, we have $\hat{h}_D([m]P) = m^2 \hat{h}_D(P)$.

Proof. (1) By definition, $\hat{h}_{f,D}(P) = \lim_{n \to \infty} \frac{1}{\lambda^n} h_D(f^n(P))$. In this case, f = [2] and $[2]^*D \sim 4D$, so λ = 4. Thus, $\hat{h}_D := \hat{h}_{[2],D} = \lim_{n \to \infty} \frac{1}{4^n} h_D([2^n]P)$. We have previously proved the property $\hat{h}_D(P) = h_D(P) + O(1)$ in general.
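(2) The key is that the multiplication maps [2] and [m] commute. For simplicity, we will denote [m]P by mP in the computation below.
$$\begin{aligned}
\hat{h}_D(mP) &= \lim_{n \to \infty} 4^{-n} h_D(2^n(mP)) \\
&= \lim_{n \to \infty} 4^{-n} h_D(m(2^n P)) \\
&= \lim_{n \to \infty} 4^{-n} \left( h_{[m]^*D}(2^n P) + O(1) \right) \\
&= \lim_{n \to \infty} 4^{-n} \left( h_{m^2 D}(2^n P) + O(1) \right) \\
&= \lim_{n \to \infty} 4^{-n} \left( m^2 h_D(2^n P) + O(1) \right) \\
&= \lim_{n \to \infty} 4^{-n} m^2 h_D(2^n P) \\
&= m^2 \lim_{n \to \infty} 4^{-n} h_D(2^n P) = m^2 \hat{h}_D(P)
\end{aligned}$$
as desired. □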
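Returning to the two-step recurrence obtained from the theorem of the cube, the closed formula in the corollary can be sanity-checked mechanically. The sketch below is an illustration rather than a proof. It stores the class of [m]^*D as a pair of integers (a, b), meaning a·D + b·[−1]^*D, and iterates the recurrence from the two base cases. The function name is hypothetical.

```python
def pullback_coefficients(m_max):
    """Iterate [m+1]^*D + [m-1]^*D ~ 2[m]^*D + D + [-1]^*D.

    The class [m]^*D is stored as (a, b), meaning a*D + b*[-1]^*D.
    """
    coeffs = {0: (0, 0), 1: (1, 0)}  # base cases [0]^*D = 0 and [1]^*D = D
    for m in range(1, m_max):
        a, b = coeffs[m]
        a_prev, b_prev = coeffs[m - 1]
        # [m+1]^*D ~ 2[m]^*D - [m-1]^*D + D + [-1]^*D
        coeffs[m + 1] = (2 * a - a_prev + 1, 2 * b - b_prev + 1)
    return coeffs


coeffs = pullback_coefficients(50)
for m, (a, b) in coeffs.items():
    # closed form from the corollary
    assert a == (m * m + m) // 2 and b == (m * m - m) // 2
    # symmetric case gives m^2 * D, anti-symmetric case gives m * D
    assert a + b == m * m and a - b == m
```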

14.2. More height inequalities. Fix a point Q ∈ A(K). Define a translation map

$$T_Q : A \to A, \qquad P \mapsto P + Q$$
Let's use the theorem of the cube for a symmetric divisor D,
$$(f + g + h)^*D - (f + g)^*D - (f + h)^*D - (g + h)^*D + f^*D + g^*D + h^*D \sim 0$$

with $f = T_Q$, $g = T_{-Q}$ and h = [−1]. We first compute all the relevant maps:
(f + g + h)(P) = P + Q + P − Q − P = P
(f + g)(P) = P + Q + P − Q = 2P
(f + h)(P) = P + Q − P = Q
(g + h)(P) = P − Q − P = −Q
Thus, f + g + h = [1], f + g = [2], and f + h = Q and g + h = −Q are the constant maps with values Q and −Q, respectively. Going back to the theorem of the cube, and using the fact that the pullback of D under a constant map is 0, we obtain

$$[1]^*D - [2]^*D - 0 - 0 + T_Q^*D + T_{-Q}^*D + [-1]^*D \sim 0$$
Now $[1]^*D = D$ and $[2]^*D \sim 4D$. Also $[-1]^*D \sim D$ as D is assumed to be symmetric. Rearranging the terms,
$$T_Q^*D + T_{-Q}^*D \sim 2D$$
for every fixed Q ∈ A(K). We will convert this statement about linear equivalence of divisors (geometry) to an assertion about the height machine (arithmetic):

$$h_{T_Q^*D}(P) + h_{T_{-Q}^*D}(P) = h_{2D}(P) + O(1)$$

Using functoriality, $h_{T_Q^*D}(P) = h_D(T_Q(P)) + O(1) = h_D(P + Q) + O(1)$. Similarly, $h_{T_{-Q}^*D}(P) = h_D(P - Q) + O(1)$. Using linearity, $h_{2D}(P) = 2h_D(P) + O(1)$. Thus,

$$h_D(P + Q) + h_D(P - Q) = 2h_D(P) + O_{A,D,Q}(1)$$
where the subscript on the O(1) is written to emphasize which variables it depends on. We will shortly see a more precise version of the bound (using the theorem of the square) which will explain how the bound depends on Q. In particular, if D is ample, then $h_D(P + Q) \ge O(1)$. Consequently,

$$2h_D(P) + O(1) = h_D(P + Q) + h_D(P - Q) \ge O(1) + h_D(P - Q)$$
or equivalently,

$$h_D(P - Q) \le 2h_D(P) + O(1)$$
This is the key height inequality that is used in the proof of Mordell-Weil. We will refer to this inequality as the descent inequality.

14.3. Descent. We are finally ready to prove the Mordell-Weil theorem. We will assume the Weak Mordell-Weil, which we have already proved. Theorem. (Mordell-Weil) For an abelian variety A over a number field K, the abelian group A(K) is finitely-generated. Proof. Fix m ≥ 2. By the Weak Mordell-Weil Theorem, the quotient A(K)/mA(K) is finite. Choose coset representatives

$$\{Q_1, Q_2, ..., Q_r\} \longleftrightarrow A(K)/mA(K)$$
Fix an ample symmetric divisor D. This can be obtained by taking an ample divisor H, and considering $D = H + [-1]^*H$, which is both ample and symmetric. For each point P, we have

(3) $h_D(mP) \ge m^2 h_D(P) - C_1(A, D, m)$

(4) $h_D(P - Q) \le 2h_D(P) + C_2(A, D, Q)$

for some constants $C_1, C_2$. The first inequality relies on $\hat{h}_D(mP) = m^2 \hat{h}_D(P)$, while the second inequality is the descent inequality. Applying (4) for each coset representative $Q_i$, we can get a single constant $C_2(A, D, Q_1, ..., Q_r)$ satisfying

(5) $h_D(P - Q_i) \le 2h_D(P) + C_2(A, D, Q_1, ..., Q_r)$ for every 1 ≤ i ≤ r.

Take any P0 ∈ A(K). There exists some i1 such that

$$P_0 \equiv Q_{i_1} \pmod{mA(K)}$$
that is, $P_0 = mP_1 + Q_{i_1}$ for some $P_1 \in A(K)$. Similarly, there is some $i_2$ such that $P_1 = mP_2 + Q_{i_2}$ for some $P_2 \in A(K)$. Continuing in this way n times, we construct a sequence of points $P_0, P_1, ..., P_n \in A(K)$ such that

$$\begin{aligned}
P_0 &= mP_1 + Q_{i_1} \\
P_1 &= mP_2 + Q_{i_2} \\
&\ \ \vdots \\
P_{n-1} &= mP_n + Q_{i_n}
\end{aligned}$$

The idea of the descent is to show that the heights of the points $P_i$ must be getting smaller as i increases. The system of equations implies that
$$P_0 = m^n P_n + (\mathbb{Z}\text{-linear combination of } Q_1, ..., Q_r)$$

For each 1 ≤ j ≤ n, we obtain $P_{j-1} - Q_{i_j} = mP_j$, so $h_D(P_{j-1} - Q_{i_j}) = h_D(mP_j)$. Applying (3) and (5),
$$2h_D(P_{j-1}) + C_2 \ge h_D(P_{j-1} - Q_{i_j}) = h_D(mP_j) \ge m^2 h_D(P_j) - C_1$$
Thus,
$$m^2 h_D(P_j) - C_1 \le 2h_D(P_{j-1}) + C_2$$
or equivalently,
$$h_D(P_j) \le \frac{2}{m^2} h_D(P_{j-1}) + \frac{C_1 + C_2}{m^2}$$
Apply this repeatedly to get
$$h_D(P_n) \le \left(\frac{2}{m^2}\right)^n h_D(P_0) + \frac{C_1 + C_2}{m^2} \left( 1 + \frac{2}{m^2} + \left(\frac{2}{m^2}\right)^2 + \cdots + \left(\frac{2}{m^2}\right)^{n-1} \right)$$
Note that
$$1 + \frac{2}{m^2} + \left(\frac{2}{m^2}\right)^2 + \cdots + \left(\frac{2}{m^2}\right)^{n-1} \le \sum_{i=0}^{\infty} \left(\frac{2}{m^2}\right)^i = \frac{1}{1 - \frac{2}{m^2}} = \frac{m^2}{m^2 - 2}$$
Substituting this bound into the previous one, we obtain
$$h_D(P_n) \le \left(\frac{2}{m^2}\right)^n h_D(P_0) + \frac{C_1 + C_2}{m^2 - 2}$$
As m ≥ 2, we can get an even rougher upper bound by replacing each m with 2,
$$h_D(P_n) \le \left(\frac{1}{2}\right)^n h_D(P_0) + \frac{C_1 + C_2}{2}$$

This explains why the height of $P_n$ decreases as n increases. In particular, we can find n (which depends only on the initial point $P_0$) such that $\left(\frac{1}{2}\right)^n h_D(P_0) \le 1$, so that
$$h_D(P_n) \le 1 + \frac{C_1 + C_2}{2}$$
Recall that
$$P_0 = m^n P_n + (\mathbb{Z}\text{-linear combination of } Q_1, ..., Q_r)$$
Since $P_0$ was an arbitrary point in A(K), we conclude that
$$A(K) \subset \mathrm{Span}_{\mathbb{Z}} \left( \{Q_1, ..., Q_r\} \cup \left\{ P \in A(K) : h_D(P) \le 1 + \frac{C_1 + C_2}{2} \right\} \right)$$

The first set {Q1, ..., Qr} is finite (because of Weak Mordell-Weil), and the second set is also finite because D is ample. This finally completes the proof of the Mordell-Weil Theorem. 
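The geometric-series bookkeeping in the descent can be checked numerically. The sketch below is only an illustration with made-up numbers: the starting height and the constants C1, C2 are hypothetical and are not derived from any actual abelian variety. It iterates the worst case of the recursion h_j ≤ (2 h_{j−1} + C1 + C2)/m^2 and compares it against both closed-form bounds from the proof.

```python
def descent_heights(h0, m, c1, c2, steps):
    """Iterate the worst case h_j = (2*h_{j-1} + c1 + c2) / m^2 of the descent inequality."""
    heights = [h0]
    for _ in range(steps):
        heights.append((2 * heights[-1] + c1 + c2) / m ** 2)
    return heights


def closed_form_bound(h0, m, c1, c2, n):
    """The bound (2/m^2)^n * h0 + (c1 + c2) / (m^2 - 2) obtained from the geometric series."""
    return (2 / m ** 2) ** n * h0 + (c1 + c2) / (m ** 2 - 2)


# Hypothetical numbers chosen only for illustration.
h0, m, c1, c2 = 1000.0, 2, 3.0, 5.0
heights = descent_heights(h0, m, c1, c2, 20)
for n, h in enumerate(heights):
    assert h <= closed_form_bound(h0, m, c1, c2, n) + 1e-9
    # rougher bound with every m replaced by 2
    assert h <= 0.5 ** n * h0 + (c1 + c2) / 2 + 1e-9
print(heights[-1])
```

After enough steps the iterates fall below 1 + (C1 + C2)/2, which is the height bound defining the finite set in the last display of the proof.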

15. Theorem of the Square

Recall the canonical heights for abelian varieties. Let D be a symmetric divisor on an abelian variety A, i.e. $D \sim [-1]^*D$. We get a canonical height $\hat{h}_D : A(K) \to \mathbb{R}$ satisfying
(1) $\hat{h}_D(mP) = m^2 \hat{h}_D(P)$.
(2) $\hat{h}_D(P) = h_D(P) + O(1)$.

Theorem. (Theorem of the Square). Consider the following four maps σ, δ, π1, π2 : A × A → A defined by
$$\sigma(P, Q) = P + Q, \quad \delta(P, Q) = P - Q, \quad \pi_1(P, Q) = P, \quad \pi_2(P, Q) = Q$$
Then
$$\sigma^*D + \delta^*D \sim 2\pi_1^*D + 2\pi_2^*D$$
Remark. There is an equivalent version of the theorem of the square which states that the map

$$\varphi_D : A \to \mathrm{Pic}(A), \qquad P \mapsto T_{-P}^*D - D$$

is a homomorphism of abelian groups. Here T−P is the translation by −P . Another fact: ker(φD) is finite if and only if D is ample. As usual, we can convert linear equivalence between divisors into a statement about the height machine. Theorem. For each P,Q ∈ A, we have

$$h_D(P + Q) + h_D(P - Q) = 2h_D(P) + 2h_D(Q) + O_{A,D}(1)$$
This is a more precise statement than the descent inequality used in the proof of the Mordell-Weil theorem. By replacing P with $2^n P$ and Q with $2^n Q$ in the theorem above, multiplying both sides by $\frac{1}{4^n}$ and letting n → ∞, we obtain

(6) $\hat{h}_D(P + Q) + \hat{h}_D(P - Q) = 2\hat{h}_D(P) + 2\hat{h}_D(Q)$

The advantage of using canonical heights is that there is no more O(1) term. The equation (6) is called the parallelogram formula.

Exercise. Use the parallelogram formula to deduce that
$$A(K) \times A(K) \to \mathbb{R}, \qquad (P, Q) \mapsto \langle P, Q \rangle_D := \frac{1}{2} \left( \hat{h}_D(P + Q) - \hat{h}_D(P) - \hat{h}_D(Q) \right)$$
is a bilinear form. Note that the factor of $\frac{1}{2}$ is needed to ensure that $\langle P, P \rangle_D = \hat{h}_D(P)$.

16. Quadratic form on A(K)/A(K)tors

Using the theorem of the square, we have proved the parallelogram formula,
$$\hat{h}_D(P + Q) + \hat{h}_D(P - Q) = 2\hat{h}_D(P) + 2\hat{h}_D(Q)$$
for any symmetric divisor D ∈ Div(A) on an abelian variety. We claim that this implies $\hat{h}_D$ is a quadratic form. The proof is a formal consequence of the parallelogram law, so we record the most general version. The proof given below appears in [HS00, Lemma B.5.2].

Lemma. Let A be an abelian group, and f : A → R be any function satisfying the parallelogram law,
$$f(P + Q) + f(P - Q) = 2f(P) + 2f(Q)$$
for every P, Q ∈ A. Then f is a quadratic form on A.

Proof. When P = Q = 0, the parallelogram law yields f(0) = 0. Letting P = 0, and keeping Q as a variable, we get f(Q) + f(−Q) = 2f(Q), so that f(−Q) = f(Q). Thus, f is an even function with no constant term. Next, we apply the parallelogram law four times, namely for the pairs (P + R, Q), (P + Q, R), (P, R − Q), and (R, Q) (the last one multiplied by 2).

(7) f(P + R + Q) + f(P + R − Q) − 2f(P + R) − 2f(Q) = 0
(8) f(P + Q + R) + f(P + Q − R) − 2f(P + Q) − 2f(R) = 0
(9) f(P + R − Q) + f(P − R + Q) − 2f(P) − 2f(R − Q) = 0
(10) 2f(R + Q) + 2f(R − Q) − 4f(R) − 4f(Q) = 0

Adding (7), (8), and subtracting (9), (10), and dividing both sides by 2, we obtain

(11) f(P + Q + R) + f(P) + f(Q) + f(R) = f(P + Q) + f(Q + R) + f(R + P)

or equivalently,
f(P + Q + R) − f(P) − f(Q + R) = [f(P + Q) − f(P) − f(Q)] + [f(P + R) − f(P) − f(R)]

Define $\langle P, Q \rangle = \frac{1}{2} (f(P + Q) - f(P) - f(Q))$. So (11) is equivalent to
$$\langle P, Q + R \rangle = \langle P, Q \rangle + \langle P, R \rangle$$
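The algebra behind identity (11) can be spot-checked on a concrete function. The sketch below takes a sample quadratic form on Z^2, an arbitrary choice made only for illustration, and verifies the parallelogram law together with additivity of the pairing in the second variable on a small box of lattice points. It is a sanity check of the identities in one example, not a substitute for the proof. To stay in integer arithmetic, the code works with twice the pairing.

```python
import itertools


def f(v):
    """Sample quadratic form on Z^2 (an arbitrary choice): f(x, y) = 2x^2 + 3xy + 5y^2."""
    x, y = v
    return 2 * x * x + 3 * x * y + 5 * y * y


def add(u, v):
    return (u[0] + v[0], u[1] + v[1])


def sub(u, v):
    return (u[0] - v[0], u[1] - v[1])


def twice_pairing(p, q):
    """Twice the pairing <P, Q>, i.e. f(P + Q) - f(P) - f(Q), kept integral."""
    return f(add(p, q)) - f(p) - f(q)


points = list(itertools.product(range(-2, 3), repeat=2))
for p, q in itertools.product(points, repeat=2):
    # parallelogram law
    assert f(add(p, q)) + f(sub(p, q)) == 2 * f(p) + 2 * f(q)
for p, q, r in itertools.product(points, repeat=3):
    # additivity in the second variable, which is the content of identity (11)
    assert twice_pairing(p, add(q, r)) == twice_pairing(p, q) + twice_pairing(p, r)
```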

Thus, ⟨P, Q⟩ is a bilinear form, and f is a quadratic form on A. □

Theorem. Let D be an ample symmetric divisor on an abelian variety A. Then:
(1) $\hat{h}_D$ is a quadratic form.
(2) $\hat{h}_D(P) = 0 \Leftrightarrow P \in A(K)_{tors}$.
(3) $\hat{h}_D$ is positive definite on $A(K)/A(K)_{tors} \cong \mathbb{Z}^{\mathrm{rank}\, A(K)}$.
(4) $\hat{h}_D$ induces a positive definite quadratic form on $A(K) \otimes_{\mathbb{Z}} \mathbb{R} \cong \mathbb{R}^{\mathrm{rank}\, A(K)}$.

Proof. We have already proved (1) above. We do not need ampleness here.

(2) (⇐) If $P \in A(K)_{tors}$, then mP = 0 for some m ≥ 1, so $m^2 \hat{h}_D(P) = \hat{h}_D(mP) = \hat{h}_D(0) = 0$, which implies that $\hat{h}_D(P) = 0$.

(2) (⇒) If $\hat{h}_D(P) = 0$, then $\hat{h}_D(mP) = m^2 \hat{h}_D(P) = 0$ for each m ≥ 1. Since $\hat{h}_D = h_D + O(1)$, we conclude that $\{h_D(mP)\}_{m \ge 1}$ is a bounded set. Since D is ample, the set $\{mP\}_{m \ge 1}$ must be finite, i.e. nP = kP for some n > k, so that (n − k)P = 0 and $P \in A(K)_{tors}$.

(3) First, we should explain why $\hat{h}_D$ induces a well-defined function on $A(K)/A(K)_{tors}$. We have already seen that $\hat{h}_D(T) = 0$ if and only if $T \in A(K)_{tors}$. In particular, if P ∈ A(K) and $T \in A(K)_{tors}$ with mT = 0, then
$$\hat{h}_D(P + T) = \frac{1}{m^2} \hat{h}_D(m(P + T)) = \frac{1}{m^2} \hat{h}_D(mP + mT) = \frac{1}{m^2} \hat{h}_D(mP) = \hat{h}_D(P)$$
Since $\hat{h}_D(P + T) = \hat{h}_D(P)$ for each P ∈ A(K) and $T \in A(K)_{tors}$, the function $\hat{h}_D$ is well-defined on $A(K)/A(K)_{tors}$. Since D is ample, $\hat{h}_D : A(K)/A(K)_{tors} \to [0, \infty)$. Using (2) again, we know that $\hat{h}_D(P) = 0$ if and only if P = 0 for $P \in A(K)/A(K)_{tors}$. Thus, $\hat{h}_D$ is positive definite on $A(K)/A(K)_{tors}$.

(4) So far we have a lattice $L = A(K)/A(K)_{tors} \cong \mathbb{Z}^r$ where r = rank A(K), and a certain quadratic form q : L → [0, ∞). We should first explain how to define q on $L \otimes_{\mathbb{Z}} \mathbb{R}$. Given $v \in \mathbb{R} \otimes_{\mathbb{Z}} L$, we can write $v = \sum_{i=1}^n \alpha_i \otimes v_i$ with $\alpha_i \in \mathbb{R}$ and $v_i \in L$. The key is to use the bilinear property. We have ⟨·, ·⟩ : L × L → R defined by $\langle a, b \rangle = \frac{1}{2}(q(a + b) - q(a) - q(b))$, so that q(v) = ⟨v, v⟩ for v ∈ L. We can define q(v) as follows:
$$q(v) = q\left( \sum_{i=1}^n \alpha_i v_i \right) = \left\langle \sum_{i=1}^n \alpha_i v_i, \sum_{j=1}^n \alpha_j v_j \right\rangle$$

$$= \sum_{i,j} \langle \alpha_i v_i, \alpha_j v_j \rangle = \sum_{i,j} \alpha_i \alpha_j \langle v_i, v_j \rangle = \sum_{i,j} \alpha_i \alpha_j \, \frac{q(v_i + v_j) - q(v_i) - q(v_j)}{2}$$
For $q = \hat{h}_D$ and $L = A(K)/A(K)_{tors} \cong \mathbb{Z}^r$, first note that L = Image(A(K) → A(K) ⊗ R), where the map A(K) → A(K) ⊗ R is v ↦ v ⊗ 1. Let $V = A(K) \otimes \mathbb{R} \cong \mathbb{R}^r$, so that q induces a quadratic form on $\mathbb{R}^r$. By a standard result in linear algebra, q can be diagonalized on V with its associated matrix having a certain number of 1s, −1s, and 0s on the diagonal:
$$q \longleftrightarrow \begin{pmatrix} 1_s & & \\ & -1_t & \\ & & 0_{r-s-t} \end{pmatrix}$$
Let $\{v_1, ..., v_r\}$ be a basis for $\mathbb{R}^r$ where the quadratic form has this diagonal representation:
$$q(v) = q\left( \sum_{i=1}^r c_i v_i \right) = \sum_{i=1}^s c_i^2 - \sum_{i=s+1}^{s+t} c_i^2$$
With this notation, q is positive definite on V ⇔ s = r and t = 0. Assume, to the contrary, that $q = \hat{h}_D$ is not positive definite, so that s < r. For all ε > 0 and B > 0, look at the set
$$C_{\varepsilon,B} = \left\{ \sum_{i=1}^r c_i v_i : \sum_{i=1}^s c_i^2 \le \varepsilon \text{ and } \sum_{i=1}^r c_i^2 \le B \right\}$$
Observe that $C_{\varepsilon,B}$ is compact, convex, and symmetric, that is, if $x \in C_{\varepsilon,B}$ then $-x \in C_{\varepsilon,B}$. For a given ε > 0,
$$\lim_{B \to \infty} \mathrm{Vol}(C_{\varepsilon,B}) = \infty$$
This is where the hypothesis s < r was crucially used. Indeed, if s = r, then $C_{\varepsilon,B}$ is contained in the ball $\sum_{i=1}^r c_i^2 \le \varepsilon$, whose volume is bounded independently of B. Recall that Minkowski's Theorem guarantees that a compact, convex and symmetric region $E \subset \mathbb{R}^r$ must contain a non-zero point of a lattice L provided that $\mathrm{Vol}(E) > 2^r \, \mathrm{Vol}(\mathbb{R}^r/L)$. In particular, for any given ε > 0, we

can choose the corresponding $B_\varepsilon > 0$ to be large enough so that there is a point $0 \neq P_\varepsilon \in L \cap C_{\varepsilon,B_\varepsilon}$, where $L = A(K)/A(K)_{tors}$. By definition,
$$\hat{h}_D(P_\varepsilon) = q(P_\varepsilon) = q\left( \sum_{i=1}^r c_i(P_\varepsilon) v_i \right) = \sum_{i=1}^s c_i(P_\varepsilon)^2 - \sum_{i=s+1}^{s+t} c_i(P_\varepsilon)^2 \le \varepsilon$$
Since $\hat{h}_D(P_\varepsilon) > 0$ by part (3), letting ε → 0 produces infinitely many distinct points. This implies that the set $\{P \in A(K) : \hat{h}_D(P) \le 1\}$ is infinite, which is a contradiction as D is ample. □

Remark. It is worth emphasizing that part (4) did not formally follow from part (3), and we had to use special properties of the height function, namely boundedness. Here is an example of a lattice L with a quadratic form q such that q is positive definite on L, but q does not induce a positive definite quadratic form on $L \otimes_{\mathbb{Z}} \mathbb{R}$. Consider $L = \mathbb{Z} + \mathbb{Z}\sqrt{2}$ as a subgroup of R, and let q : L → R be given by $q(x) = |x|^2$. More explicitly, $q(a + b\sqrt{2}) = a^2 + 2b^2 + 2ab\sqrt{2}$. Then q is positive definite on L because $\sqrt{2}$ is irrational. On the other hand, $L \otimes_{\mathbb{Z}} \mathbb{R} \cong \mathbb{R} \oplus \mathbb{R}$ and $q(a + b\sqrt{2}) = 0$ for $(a, b) = (\sqrt{2}, -1)$, so q is not positive definite on $L \otimes_{\mathbb{Z}} \mathbb{R}$. The problem arises from the fact that the image set q(L) has an accumulation point in R. By contrast, the boundedness property for ample divisors is precisely the statement that the set $\{h_D(P)\}_{P \in A(K)}$ is a discrete subset of R, so everything works out in this case.

17. Path towards Faltings: Mumford's Theorem.

17.1. Abelian Regulator. Let A be an abelian variety over a number field K. Fix a symmetric ample divisor D ∈ Div(A). Let $P_1, P_2, ..., P_r \in A(K)$ generate $A(K)/A(K)_{tors} \cong \mathbb{Z}^r$ where r = rank A(K). We define the abelian regulator of A(K) with respect to D as follows:
$\mathrm{Reg}_D(A(K))$ = covolume of $A(K)/A(K)_{tors}$ in A(K) ⊗ R relative to $\hat{h}_D$

$= \det\left( \langle P_i, P_j \rangle \right)_{1 \le i, j \le r} \neq 0$

Note that the determinant is not zero because $\hat{h}_D$ is positive definite.

17.2. Counting Functions. Given a variety V over a number field K with a height function $h_D$, and B > 0, consider the following function

$$N(V(K), h_D, B) := \#\{P \in V(K) : h_D(P) \le B\}$$
It can be shown that
$$N(\mathbb{P}^N(K), h, B) \sim C_{N,K}^{B}$$
for some constant $C_{N,K}$ depending on N and K.

Theorem. (Néron) If A is an abelian variety, then
$$N(A(K), \hat{h}_D, B) \sim \#A(K)_{tors} \cdot \frac{\text{volume of } r\text{-dimensional ball}}{\mathrm{Reg}_D(A(K))} \cdot B^{\mathrm{rank}\, A(K)}$$
Let C be a smooth curve of genus g ≥ 2.

Question. How big is $N(C(K), h_D, B)$?

Faltings' Theorem asserts that $N(C(K), h_D, B) = O(1)$, so there are finitely many K-rational points on C. Before Faltings' celebrated result, Mumford had already shown (in the 1960s) the following weaker result

$$N(C(K), h_D, B) \ll \log B$$
We will work towards understanding Mumford's proof.
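To get a feel for the counting function in the simplest case, the sketch below counts points of P^1(Q) by brute force. It is an illustration with a hypothetical function name, and it uses the multiplicative height H([a : b]) = max(|a|, |b|) on coprime integer coordinates, so that the logarithmic height is h = log H and the condition h ≤ B corresponds to H ≤ e^B. The classical asymptotic for this count is (12/π^2) X^2 as the bound X on H grows, which is exponential in B.

```python
from math import gcd, pi


def count_p1_points(bound):
    """Count points [a : b] of P^1(Q) with max(|a|, |b|) <= bound.

    Each point has a unique representative with gcd(a, b) = 1 and either
    b > 0, or b = 0 and a = 1.
    """
    count = 1  # the point [1 : 0]
    for b in range(1, bound + 1):
        for a in range(-bound, bound + 1):
            if gcd(a, b) == 1:
                count += 1
    return count


for X in (1, 2, 10, 100):
    # exact count next to the asymptotic main term 12/pi^2 * X^2
    print(X, count_p1_points(X), 12 / pi ** 2 * X ** 2)
```

By contrast, Mumford's bound says that on a curve of genus at least 2 the analogous count grows at most logarithmically in B.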

18. The Jacobian of a curve

Our eventual goal is to prove Faltings' theorem, which states that if C is a smooth curve of genus g ≥ 2 defined over a number field K, then C(K) is finite. We will soon define the Jacobian variety J associated to C, which will come with an embedding $C \hookrightarrow J$. The Jacobian variety is an abelian variety, and so $C(K) \hookrightarrow J(K) \cong \text{finite group} \times \mathbb{Z}^r$ by the Mordell-Weil theorem. Thus, Faltings' theorem would follow if one can show that C ∩ J(K) is a finite set. This latter statement is generalized by the following.

Lang-Mordell Conjecture (proven by Faltings). Let A/C be an abelian variety, and Γ ⊂ A(C) be a finitely generated subgroup. Let X ⊂ A be a subvariety. Assume that X does not contain a translate of an abelian subvariety. Then X ∩ Γ is finite.

Let C be a smooth projective curve of genus g, so C(C) is a Riemann surface. Topologically, C(C) is a g-holed torus. We have $H_1(C(\mathbb{C}), \mathbb{Z}) \cong \mathbb{Z}^{2g}$. By the definition of the genus, $H^0(C(\mathbb{C}), \Omega^1_C) \cong \mathbb{C}^g$.

Example. Let $C_f$ be a curve defined by $y^2 = x^{2g+1} + a_1 x^{2g} + \cdots + a_{2g+1} = f(x)$ where disc(f) ≠ 0. Then $g(C_f) = g$. Indeed, one can show that $H^0(\Omega^1_{C_f}) = \mathrm{span}\left\{ \frac{x^i \, dx}{y} : 0 \le i < g \right\}$. Note that $2y \, dy = f'(x) \, dx \Rightarrow \frac{x^i \, dx}{y} = \frac{2 x^i \, dy}{f'(x)}$, so these proposed spanning elements are holomorphic on the (x, y)-frame.

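18.1. Jacobian of curves over C. Let $P_0 \in C(\mathbb{C})$, and consider
$$H^0(C, \Omega^1) = \mathrm{span}\{\omega_1, \omega_2, ..., \omega_g\}$$
Given $\omega \in H^0(C, \Omega^1)$, consider the map
$$C(\mathbb{C}) \to \mathbb{C}, \qquad P \mapsto \int_{P_0}^{P} \omega$$
As written, this is not a well-defined map because the output depends on the path between $P_0$ and P. However, the ambiguity is precisely captured by the span $\left\{ \int_\gamma \omega : \gamma \text{ a closed loop} \right\}$. This last set can be identified with $\left\{ \int_\gamma \omega : \gamma \in H_1(C(\mathbb{C}), \mathbb{Z}) \right\}$. This motivates us to consider the set
$$L_C = \left\{ \left( \int_\gamma \omega_1, \int_\gamma \omega_2, ..., \int_\gamma \omega_g \right) : \gamma \in H_1(C(\mathbb{C}), \mathbb{Z}) \right\}$$
and the map
$$\varphi_C : C(\mathbb{C}) \to \mathbb{C}^g/L_C, \qquad P \mapsto \left( \int_{P_0}^{P} \omega_1, \int_{P_0}^{P} \omega_2, ..., \int_{P_0}^{P} \omega_g \right)$$
Abel-Jacobi Theorem.

(1) $L_C$ is a lattice.
(2) $\varphi_C$ is an embedding.
(3) $\mathbb{C}^g/L_C$ is an abelian variety.

In particular, (3) tells us that there exists a holomorphic map $\mathbb{C}^g/L_C \hookrightarrow \mathbb{P}^N_{\mathbb{C}}$ for some N.

Definition. The abelian variety $\mathbb{C}^g/L_C$ is called the Jacobian of C, and often denoted by $J_C$ or $\mathrm{Jac}_C$.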

18.2. Algebraic construction of JC . We have an exact sequence

$$0 \longrightarrow \mathrm{Div}^0(C) \longrightarrow \mathrm{Div}(C) \xrightarrow{\ \deg\ } \mathbb{Z} \longrightarrow 0$$
We can define $\mathrm{Pic}^0(C) = \mathrm{Div}^0(C) / (\text{linear equivalence})$. Without loss of generality, assume that C(K) ≠ ∅, and pick $P_0 \in C(K)$. Then $\mathrm{Pic}^0(C)$ will be the Jacobian of the curve C. We have a map $j : C \to \mathrm{Pic}^0(C)$ given by $j(P) = [(P) - (P_0)]$, where the square brackets mean modulo linear equivalence. The map j is injective if and only if g ≥ 1 (otherwise there exists a global meromorphic function with exactly one zero and one pole, which gives an isomorphism between C and $\mathbb{P}^1$). We can extend the map j to n-tuples of points on C, namely
$$j : C^n \to \mathrm{Pic}^0(C), \qquad j(P_1, ..., P_n) = j(P_1) + \cdots + j(P_n) = \left[ \sum_{i=1}^n (P_i) - n(P_0) \right]$$
Fact. $n \ge g \Rightarrow j(C^n) = \mathrm{Pic}^0(C)$. This fact uses Riemann-Roch.

18.3. Application of Riemann-Roch. Let C be a smooth projective curve of genus g, and let D ∈ Div(C). We define the space
$$L(D) = \{f \in k(C)^* : \mathrm{div}(f) + D \ge 0\} \cup \{0\}$$
and let ℓ(D) = dim L(D). The Riemann-Roch Theorem tells us when we can find a meromorphic function with prescribed zeros and poles.

Weak version. If deg(D) ≥ 2g + 1, then ℓ(D) = deg(D) − g + 1.

Strong Version. Let $K_C$ be a canonical divisor on C. Then

$$\ell(D) - \ell(K_C - D) = \deg(D) - g + 1$$
Let's recall the definition of the canonical divisor. Take any non-zero differential 1-form ω on C. At each point P ∈ C, let $x_P$ be a uniformizer, i.e. $\mathrm{ord}_P(x_P) = 1$. Write $\omega = f_P \, dx_P$ for some $f_P \in k(C)$. This is possible as {meromorphic 1-forms} is 1-dimensional over k(C). Now define $\mathrm{ord}_P(\omega) = \mathrm{ord}_P(f_P)$, and set $K_C := \mathrm{div}(\omega) = \sum_{P \in C} \mathrm{ord}_P(\omega) P$. Note that given another non-zero differential ω′, we have ω′ = gω for some g ∈ k(C). So div(ω′) = div(g) + div(ω), i.e. $[K_C] = [\mathrm{div}(\omega)]$ is well-defined.

Corollary 1. When D = 0, Riemann-Roch gives $\ell(0) - \ell(K_C) = 0 - g + 1$. Since ℓ(0) = 1 (constant functions), this gives $\ell(K_C) = g$. So, dim({holomorphic 1-forms}) = g.

Corollary 2. When $D = K_C$, Riemann-Roch gives $\ell(K_C) - \ell(0) = \deg(K_C) - g + 1 \Rightarrow g - 1 = \deg(K_C) - g + 1 \Rightarrow \deg(K_C) = 2g - 2$.

18.4. More on $\mathrm{Pic}^0(C)$. Let C/K be a genus g ≥ 1 curve, and $P_0 \in C(K)$. We have defined $\mathrm{Pic}^0(C)$ as
$$\mathrm{Pic}^0(C) = \mathrm{Div}^0(C)/\sim$$
where ∼ stands for linear equivalence. Note that $\mathrm{Pic}^0(C)$, as defined, is just an abelian group. We will soon see that $\mathrm{Pic}^0(C)$ is also a variety. There is a natural map $j : C \to \mathrm{Pic}^0(C)$ given by $P \mapsto [(P) - (P_0)]$. As we briefly saw before, we can use j to map multiple copies of C into $\mathrm{Pic}^0(C)$. Indeed, $j : C^n \to \mathrm{Pic}^0(C)$ can be defined by $(P_1, ..., P_n) \mapsto [(P_1) + ... + (P_n) - n(P_0)]$. Clearly, the image does not depend on the order of the points, as the addition in Div(C) is commutative. Thus, we get a well-defined map $C^{(n)} \to \mathrm{Pic}^0(C)$ where $C^{(n)} := C^n/S_n$. It turns out that $j(C^{(g)}) = \mathrm{Pic}^0(C)$, which will be proved by Riemann-Roch. In other words, every divisor of degree 0 on C is linearly equivalent to one of the form $(P_1) + ... + (P_g) - g(P_0)$ for some points $P_1, ..., P_g$. Since $C^{(g)}$ is naturally a variety, the equality $j(C^{(g)}) = \mathrm{Pic}^0(C)$ allows one to equip $\mathrm{Pic}^0(C)$ with the structure of a variety. For this last assertion to make perfect sense, we also need some sort of injectivity of the map j. This is in fact true: there exists an open set $U \subset C^{(g)}$ such that $j|_U : U \hookrightarrow \mathrm{Pic}^0(C)$. This turns "most of" $\mathrm{Pic}^0(C)$ into an algebraic variety, and then there is a general theorem of Weil that allows one to extend to get an abelian variety structure on all of $\mathrm{Pic}^0(C)$. Let's summarize our observations in a theorem.

Theorem. (Three key facts about $\mathrm{Pic}^0$).
(1) $j(C^{(n)}) = \mathrm{Pic}^0(C)$ for n ≥ g.
(2) There is a non-empty open set $U \subset C^{(g)}$ such that $j|_U : U \hookrightarrow \mathrm{Pic}^0(C)$.
(3) There exists an abelian variety $J = J_C = \mathrm{Jac}(C)$ such that $C^{(g)} \xrightarrow[\text{birational}]{\ \sim\ } J \xrightarrow{\ \sim\ } \mathrm{Pic}^0(C)$.

Proof. (1) Let $[D] \in \mathrm{Pic}^0(C)$. Consider the divisor $g(P_0) + D$. By Riemann-Roch,

$$\ell(g(P_0) + D) \ge \deg(g(P_0) + D) - g + 1 = g - g + 1 = 1$$

Therefore, there is some f ∈ K(C) such that $\mathrm{div}(f) + g(P_0) + D \ge 0$. Let $E = \mathrm{div}(f) + g(P_0) + D$. Note that $D \sim E - g(P_0)$, that E ≥ 0 and that deg(E) = g, so E is a sum of g points and [D] lies in $j(C^{(g)})$. See the book [HS00] for the proofs of parts (2) and (3). □

Upshot. There exists an abelian variety J such that $j : C^g/S_g \xrightarrow[\text{birational}]{\ \sim\ } J$.

Let $\Theta := j(C^{g-1}) \in \mathrm{Div}(J)$ be the theta divisor. This is an irreducible divisor, as C is irreducible.

Theorem. (Properties of the Theta divisor).
(1) $[-1]^*\Theta \sim \Theta_\kappa$ where $\Theta_\kappa = \Theta - \kappa$, or more formally, $\Theta_\kappa = T_\kappa^*\Theta$. In fact, we will see in the proof that $\kappa = j(K_C)$. We will also use the notation $\Theta^- := [-1]^*\Theta$.
(2) $j^*\Theta = \Theta|_C \sim g(P_0) + \kappa$, and $j^*(\Theta_z^-) \sim g(P_0) - z$ for z ∈ J. Moral: $\Theta|_C$ has degree g.
(3) Recall that σ : J × J → J is the map (P, Q) ↦ P + Q, while π1 : J × J → J, π2 : J × J → J are the usual projections. Then
$$(j \times j)^* \underbrace{[\sigma^*\Theta - \pi_1^*\Theta - \pi_2^*\Theta]}_{\in \mathrm{Div}(J \times J)} \sim \underbrace{-\Delta + \underbrace{\{P_0\} \times C}_{p_1^*(P_0)} + \underbrace{C \times \{P_0\}}_{p_2^*(P_0)}}_{\in \mathrm{Div}(C \times C)}$$

where p1 : C × C → C and p2 : C × C → C are the usual projections.

Remark. We should explain the notation Θ − κ. Note that $\kappa = j(K_C)$ in our application. Assuming g ≥ 1, we have that $\ell(K_C) = g \ge 1$, so there exists an effective divisor which is linearly equivalent to $K_C$. In other words, $K_C \sim \sum_{i=1}^{2g-2} (P_i)$ for some 2g − 2 points on C. So we may define $j(K_C)$ as an image under the map $C^{2g-2} \to J$. Now that we have viewed $\kappa = j(K_C)$ as a point in J, we can rigorously define Θ − κ as the translation of Θ pointwise by κ. In other words, Θ − κ = {Q − κ : Q ∈ Θ}.

Proof. (1) By definition,
$$\Theta = j(C^{g-1}) = \{[(P_1) + ... + (P_{g-1}) - (g - 1)(P_0)]\}$$

Take some D ≥ 0, where deg(D) = g − 1. By Riemann-Roch, $\ell(D) - \ell(K_C - D) = \deg(D) - g + 1$, so

$$\ell(K_C - D) = \ell(D) - \deg(D) + g - 1$$
Note that ℓ(D) ≥ 1 as D ≥ 0, and −deg(D) + g − 1 = 0 by assumption, so we obtain $\ell(K_C - D) \ge 1$, i.e. there exists f ∈ K(C) such that $\mathrm{div}(f) + K_C - D \ge 0$. Let $E = \mathrm{div}(f) + K_C - D$. Then E ≥ 0, and deg(E) = (2g − 2) − (g − 1) = g − 1. By definition,

−D ∼ E − KC As a result,

$$\underbrace{-(D - (g - 1)(P_0))}_{\in \Theta^-} \sim \underbrace{\underbrace{(E - (g - 1)(P_0))}_{\in \Theta} - (K_C - (2g - 2)(P_0))}_{\in \Theta \text{ translated by } -j(K_C)}$$
Since E is effective, and deg(E) = g − 1, we know that E is a sum of g − 1 points. Thus, $E - (g - 1)(P_0)$ comes from $j(C^{g-1})$.

(2) Sketch: We have a surjection $C^{(g)} \twoheadrightarrow J = \mathrm{Pic}^0(C)$. There exists an open set $U \subseteq C^{(g)}$,

$$U = \{(P_1, ..., P_g) : P_i \text{ are distinct and } \ell((P_1) + (P_2) + ... + (P_g)) = 1\}$$

It is an exercise to show that the condition $\ell((P_1) + (P_2) + \cdots + (P_g)) = 1$ is equivalent to showing that j is injective on U. Consider the special case −z ∈ j(U) where $-z = [(Q_1) + \cdots + (Q_g) - g(P_0)] \in j(U)$. Note that $P \in \mathrm{Support}(j^*(\Theta_z^-))$ if and only if $j(P) \in \Theta_z^-$ if and only if

(P ) − (P0) ∼ [(g − 1)(P0) − ((P1) + ... + (Pg−1))] − z

for some P1, ...., Pg−1, i.e.

(P ) + (P1) + ... + (Pg−1) ∼ (Q1) + ... + (Qg)

Since −z ∈ j(U), the points $Q_1, ..., Q_g$ are all distinct and $\ell((Q_1) + ... + (Q_g)) = 1$, so the linear equivalence is an equality of divisors,

$$(P) + (P_1) + ... + (P_{g-1}) = (Q_1) + ... + (Q_g)$$
and so $P \in \{Q_1, ..., Q_g\}$. So $\mathrm{Support}(j^*(\Theta_z^-)) \subseteq \{Q_1, ..., Q_g\}$. Conversely, each $(Q_j)$ appears on the left-hand side as some $(P_i)$. Interchanging the roles of P and $P_i$, and running the argument backwards, we get that $Q_j = P_i \in \mathrm{Support}(j^*(\Theta_z^-))$. Thus, $\mathrm{Support}(j^*(\Theta_z^-)) = \{Q_1, ..., Q_g\}$. Then
$$j^*(\Theta_z^-) = (Q_1) + ... + (Q_g) \sim g(P_0) - z$$
Next, $\Theta_z = \Theta^-_{z - j(K_C)}$ by (1), and so
$$j^*(\Theta_z) = j^*(\Theta^-_{z - j(K_C)}) \sim g(P_0) - z + j(K_C) = g(P_0) - z + \kappa$$
for all z ∈ j(U). For general z ∈ J, see the book [HS00].

19. The Seesaw Principle

Let C be a smooth curve of genus g ≥ 1, and $P_0 \in C(k)$ be a point. We have talked about the map $j : C \hookrightarrow J \cong \mathrm{Pic}^0(C)$ given by $P \mapsto [(P) - (P_0)]$. We have also defined the theta divisor:
$$\Theta := \underbrace{j(C) + j(C) + \cdots + j(C)}_{g-1 \text{ times}} \in \mathrm{Div}(J)$$
The goal of this section is to prove the following linear equivalence:

(12) $(j \times j)^*(\sigma^*\Theta - \pi_1^*\Theta - \pi_2^*\Theta) \sim -\Delta + (C \times \{P_0\}) + (\{P_0\} \times C)$

Here σ : J × J → J is the addition map, and j × j : C × C → J × J is the natural map (P, Q) ↦ (j(P), j(Q)). The proof needs the following result from the theory of abelian varieties.

Seesaw Principle. Let D ∈ Div(X × Y) be a divisor. Suppose that:

• For every $x_1 \in X$, $D|_{\{x_1\} \times Y} \sim 0$ in $\mathrm{Div}(Y)$.

• There exists $y_1 \in Y$ such that $D|_{X \times \{y_1\}} \sim 0$ in $\mathrm{Div}(X)$.

Then $D \sim 0$ in $\mathrm{Div}(X \times Y)$.

Proof idea. For every $x_1 \in X$, there exists some function $f_{x_1} \in k(Y)$ such that $\mathrm{div}(f_{x_1}) = D|_{\{x_1\} \times Y}$. Look at the map
\[ X \times Y \to \mathbb{P}^1, \qquad (x, y) \mapsto f_x(y). \]
By letting $f(x, y) := f_x(y)$, we have $f(x, y) \in k(X \times Y)$. Here is a fact:
\[ \mathrm{div}_{X \times Y}(f(x, y)) = D + E \times Y \]
for some $E \in \mathrm{Div}(X)$. By restricting this identity to the slice $X \times \{y_1\}$,
\[ \mathrm{div}_{X \times Y}(f(x, y))|_{X \times \{y_1\}} = D|_{X \times \{y_1\}} + (E \times Y)|_{X \times \{y_1\}}. \]
By hypothesis, $D|_{X \times \{y_1\}} \sim 0$ and so we get
\[ \mathrm{div}_{X \times Y}(f(x, y))|_{X \times \{y_1\}} \sim (E \times Y)|_{X \times \{y_1\}}, \]
or equivalently,
\[ \mathrm{div}_X(f(x, y_1)) \sim E. \]
This means that $E \sim 0$, and so $E = \mathrm{div}(g(x))$ for some $g(x) \in k(X)$. Substituting this back, we obtain
\[ \mathrm{div}_{X \times Y}(f(x, y)) = D + \mathrm{div}_{X \times Y}(g(x)). \]
Thus, $D = \mathrm{div}_{X \times Y}(f(x, y) g(x)^{-1}) \sim 0$. □

We proceed to prove the key linear equivalence (12). We will show that the difference of the two sides is $\sim 0$ when restricted to $C \times \{P_1\}$ for every $P_1 \in C$. By symmetry, the difference will also be $\sim 0$ when restricted to $\{P_1\} \times C$. The Seesaw Principle will then complete the proof.

Fix a point $P_1$ on the curve $C$. Define $\iota : C \to C \times C$ by $P \mapsto (P, P_1)$. Note that for any divisor $D \in \mathrm{Div}(C \times C)$, we have $\iota^* D \sim D|_{C \times \{P_1\}}$. So, we want to check
\[ \iota^*(j \times j)^*\sigma^*\Theta - \iota^*(j \times j)^*\pi_1^*\Theta - \iota^*(j \times j)^*\pi_2^*\Theta \sim -\iota^*\Delta + \iota^*(C \times \{P_0\}) + \iota^*(\{P_0\} \times C). \]
Using functoriality of the pullback, this is equivalent to checking:

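\[ (\sigma \circ (j \times j) \circ \iota)^*\Theta - (\pi_1 \circ (j \times j) \circ \iota)^*\Theta - (\pi_2 \circ (j \times j) \circ \iota)^*\Theta \sim -\iota^*\Delta + \iota^*(C \times \{P_0\}) + \iota^*(\{P_0\} \times C). \]
We will first focus on the left hand side. Note that
\[ \sigma \circ (j \times j) \circ \iota(P) = \sigma \circ (j \times j)(P, P_1) = j(P) + j(P_1) = T_{j(P_1)} \circ j(P), \]
\[ \pi_1 \circ (j \times j) \circ \iota(P) = \pi_1 \circ (j \times j)(P, P_1) = j(P), \]
\[ \pi_2 \circ (j \times j) \circ \iota(P) = \pi_2 \circ (j \times j)(P, P_1) = j(P_1), \]
where $T_{j(P_1)}$ denotes translation by $j(P_1)$. The third composition is a constant map, so the corresponding pullback of $\Theta$ is $\sim 0$. So the left hand side becomes
\[ j^* T_{j(P_1)}^*(\Theta) - j^*(\Theta) - 0 \sim j^*(\Theta - j(P_1)) - j^*(\Theta) \]
\[ \sim \big(g(P_0) - j(P_1) + \kappa\big) - \big(g(P_0) + \kappa\big) \]
\[ \sim -j(P_1) = -[(P_1) - (P_0)] = [(P_0) - (P_1)]. \]
The right hand side is the same:
\[ -\iota^*\Delta + \iota^*(C \times \{P_0\}) + \iota^*(\{P_0\} \times C) \sim -(P_1) + 0 + (P_0) = [(P_0) - (P_1)], \]
as desired.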

20. Mumford's Theorem

The goal is to prove the following result due to Mumford:
\[ \#\{P \in C(K) : h(P) \le B\} \le c \log(B) \]
for some constant $c > 0$. It is worthwhile to compare this with the corresponding result for the Jacobian variety:
\[ \#\{P \in J(K) : h(P) \le B\} \approx B^{\frac{1}{2}\,\mathrm{rank}(J(K))}. \]
We have the canonical height function associated to the divisor $\Theta$, namely $\hat{h}_{J,\Theta} : J(K) \to \mathbb{R}$.

Theorem. The theta divisor $\Theta$ is ample.

For $z \in J(K)$, let $\|z\| := \sqrt{\hat{h}_{J,\Theta}(z)}$. We also have the inner product:
\[ \langle z_1, z_2 \rangle := \tfrac{1}{2}\left(\|z_1 + z_2\|^2 - \|z_1\|^2 - \|z_2\|^2\right). \]

Here we really ought to be thinking of $z \in J(K) \otimes_{\mathbb{Z}} \mathbb{R}$. During the course of the proof, we will make a simplification which is actually false (i.e. it almost never holds). The advantage is that the analysis becomes cleaner, and one can then go back and fix the argument in a straightforward manner. Basically, we would rather see the big picture (under the false, but morally right, assumption) than get lost in the technical details from the very beginning.

Simplification. $\Theta^- \sim \Theta$ (false).

The key linear equivalence (12) converts into a statement about height functions:
\[ \hat{h}_{J,\Theta}(j(P_1) + j(P_2)) - \hat{h}_{J,\Theta}(j(P_1)) - \hat{h}_{J,\Theta}(j(P_2)) = -h_\Delta(P_1, P_2) + h_{(P_0)}(P_1) + h_{(P_0)}(P_2) + O(1). \]
If $P_1 \ne P_2$, then $h_\Delta(P_1, P_2) \ge O(1)$, because $\Delta$ is an effective divisor and $(P_1, P_2)$ lies outside its support (the base-locus property of heights). This lower bound can fail if $P_1 = P_2$. Using our assumption, $j^*(\Theta) = j^*(\Theta^-) \sim g(P_0)$, we know that $g\, h_{(P_0)}(P) = \hat{h}_{J,\Theta}(j(P)) + O(1)$. Thus,
\[ \|j(P_1) + j(P_2)\|^2 - \|j(P_1)\|^2 - \|j(P_2)\|^2 \le \frac{1}{g}\|j(P_1)\|^2 + \frac{1}{g}\|j(P_2)\|^2 + O(1). \]
For simplicity, we will use $\|P\| := \|j(P)\|$. Let $V := J(K) \otimes \mathbb{R} \cong \mathbb{R}^r$ where $r = \mathrm{rank}\, J(K)$. Note that $\ker(J(K) \to V) = J(K)_{\mathrm{tors}}$. Let $S$ be the image of $C(K)$ in $V$. We might as well focus on bounding the set $S$. We have shown that for $P, Q \in S$ with $P \ne Q$, we have
\[ \|P + Q\|^2 - \|P\|^2 - \|Q\|^2 \le \frac{1}{g}\left(\|P\|^2 + \|Q\|^2\right), \]
where we are ignoring the constant $O(1)$ at the end for simplicity. Using the parallelogram law, we get an equivalent formulation:
\[ \|P\|^2 + \|Q\|^2 - \|P - Q\|^2 \le \frac{1}{g}\left(\|P\|^2 + \|Q\|^2\right). \]
Consequently,
\[ \|P - Q\|^2 \ge \left(1 - \frac{1}{g}\right)\left(\|P\|^2 + \|Q\|^2\right). \]
This inequality is only useful if the points $P$ and $Q$ have similar heights. In this case, it forces the points $P$ and $Q$ to be far apart. We will prove a more general result that holds for any lattice in a finite-dimensional vector space. Our application will be the special case when $L = J(K)$ and $S = C(K)$.

Proposition. Let $(V, \|\cdot\|)$ be a finite-dimensional real vector space equipped with the Euclidean norm $\|\cdot\|$.
Let $S \subset L \subset V$ where $L$ is a lattice in $V$. Suppose there exists $\alpha > 0$ such that for all $x, y \in S$ with $x \ne y$,
\[ \|x - y\|^2 \ge \alpha\left(\|x\|^2 + \|y\|^2\right). \]
Then
\[ \#\{x \in S : \|x\| \le R\} \le c \log R \]
for some constant $c > 0$. Compare this with:
\[ \#\{x \in L : \|x\| \le R\} \sim c_L \cdot R^r \]
where $r = \dim(V)$ and $c_L$ depends on the fundamental domain of the lattice $L$. The idea behind the proof is to count the points in $S$ carefully by considering points in an annulus.

Proof. For $0 < u \le v$, let
\[ S(u, v) = \{x \in S : u < \|x\| \le v\} = S \cap (B_0(v) \setminus B_0(u)), \]
where $0$ is the zero vector of $V$. Here we are employing the notation:
\[ B_{x_0}(R) = \{x \in V : \|x - x_0\| \le R\} \]
for a closed ball of radius $R$ centered at $x_0$. If $x, y \in S(u, v)$ and $x \ne y$, then $\|x - y\|^2 \ge \alpha(\|x\|^2 + \|y\|^2) > 2\alpha u^2$, which implies $\|x - y\| > \sqrt{2\alpha}\, u$. As a result,
\[ B_x(\beta u) \cap B_y(\beta u) = \emptyset \]
where $\beta = \frac{1}{2}\sqrt{2\alpha} = \sqrt{\alpha/2}$. Using $x \in B_0(v)$, we get $B_x(\beta u) \subseteq B_0(v + \beta u)$. Thus,
\[ B_0(v + \beta u) \supseteq \bigcup_{x \in S(u, v)} B_x(\beta u), \]
where the union on the right is in fact a disjoint union because $B_x(\beta u) \cap B_y(\beta u) = \emptyset$ for any $x, y \in S(u, v)$ with $x \ne y$. By taking the volume of both sides,
\[ \mathrm{vol}(B_0(v + \beta u)) \ge \sum_{x \in S(u, v)} \mathrm{vol}(B_x(\beta u)) \]
\[ \Rightarrow \ \mathrm{vol}(B_0(1))\,(v + \beta u)^r \ge \#S(u, v) \cdot \mathrm{vol}(B_0(1))\,(\beta u)^r \]
\[ \Rightarrow \ (v + \beta u)^r \ge \#S(u, v)\,(\beta u)^r. \]

We deduce that
\[ \#S(u, v) \le \left(\frac{v + \beta u}{\beta u}\right)^r = \left(1 + \frac{v}{\beta u}\right)^r. \]
This is a very strong condition. The bound depends only on the ratio of the radii of the annulus. We now use a dyadic trick (assume $R \ge 1$):
\[ \#\{x \in S : \|x\| \le R\} \le \left(\sum_{k=0}^{\lfloor \log_2 R \rfloor} \#S(2^k, 2^{k+1})\right) + \#\{x \in S : \|x\| \le 1\}. \]
Note that the second piece is at most $\#\{x \in L : \|x\| \le 1\}$, which is finite because $L$ is a lattice and hence discrete. We will focus on the first piece:
\[ \sum_{k=0}^{\lfloor \log_2 R \rfloor} \#S(2^k, 2^{k+1}) \le \sum_{k=0}^{\lfloor \log_2 R \rfloor} \left(1 + \frac{2^{k+1}}{\beta 2^k}\right)^r \le (\log_2 R + 1) \cdot \left(1 + \frac{2}{\beta}\right)^r. \]
As $\beta = \sqrt{\alpha/2}$ and $r$ are both constants, this finishes the proof of the proposition. □

Let us address a few steps where we were not accurate, and how to fix them.

Fudge.
1) Rather than $\|x - y\|^2 \ge \alpha(\|x\|^2 + \|y\|^2)$, one should use $\|x - y\|^2 \ge \alpha(\|x\|^2 + \|y\|^2) + O(1)$. This is not a huge problem, since we can just replace $\alpha$ with something else, i.e. we can instead use $\|x - y\|^2 \ge \tilde{\alpha}(\|x\|^2 + \|y\|^2)$ for a suitably chosen $\tilde{\alpha} < \alpha$.
2) In reality $\Theta \ne \Theta^-$, so $\Theta$ is not a symmetric divisor. So we should instead apply the argument with the divisor $D = \Theta + \Theta^- = \Theta + \Theta_k$, so the correct inequality should be
\[ \|x - y\|^2 \ge \left(1 - \frac{1}{g}\right)\left(\|x\|^2 + \|y\|^2\right) - C_1\left(\|x\| + \|y\| + 1\right). \]
In our analysis we did not have the linear term $\|x\| + \|y\| + 1$. Using $\|x + k\|^2 = \|x\|^2 + 2\langle x, k \rangle + \|k\|^2$, and completing the square, we can convert this into a version $\|x - y\|^2 \ge \alpha'(\|x\|^2 + \|y\|^2)$ for some $\alpha' > 0$.

We have finally proven the theorem due to Mumford:

Theorem (Mumford). If $C$ is a smooth curve of genus $g \ge 2$ over a number field $K$, then there is a constant $c > 0$ such that
\[ \#\{P \in C(K) : h_D(P) \le R\} \le c \log R, \]
where $D$ is any ample divisor on $C$.

How do different height functions on a curve relate to each other?

Proposition. Let $D_1, D_2 \in \mathrm{Div}(C)$ where $D_2$ is ample. Then
\[ \lim_{h_{D_2}(P) \to \infty} \frac{h_{D_1}(P)}{h_{D_2}(P)} = \frac{\deg(D_1)}{\deg(D_2)}, \]
where $P$ ranges over $C(K)$.

Proof. First, we will prove the special case when $E = D_1$ satisfies $\deg(E) = 0$. By Riemann-Roch, if $D \in \mathrm{Div}(C)$ satisfies $\deg(D) \ge 2g + 1$, then $D$ is very ample. It follows that $D \in \mathrm{Div}(C)$ is ample if and only if $\deg(D) \ge 1$. We also know that

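\[ D \text{ is ample} \ \Rightarrow \ h_D(P) \ge O(1). \]
Fix an ample divisor $D$ and take any integer $n \ge 1$. Then
\[ \deg(D + nE) = \deg(D) \ge 1 \quad \text{and} \quad \deg(D - nE) = \deg(D) \ge 1. \]
So $D + nE$ and $D - nE$ are ample, and therefore,
\[ h_{D+nE}(P) \ge O(1) \quad \text{and} \quad h_{D-nE}(P) \ge O(1). \]
Using functoriality, we obtain
\[ h_D(P) \ge -n\, h_E(P) + O(1), \]
\[ h_D(P) \ge n\, h_E(P) + O(1), \]
where the $O(1)$ terms depend on $n$ but not on $P$. Dividing both sides by $-n\, h_D(P)$ in the first inequality, and by $n\, h_D(P)$ in the second (for $h_D(P) > 0$), we get
\[ -\frac{1}{n} \le \frac{h_E(P)}{h_D(P)} + O\!\left(\frac{1}{h_D(P)}\right), \]
\[ \frac{1}{n} \ge \frac{h_E(P)}{h_D(P)} + O\!\left(\frac{1}{h_D(P)}\right). \]
We have shown that:
\[ -\frac{1}{n} \le \frac{h_E(P)}{h_D(P)} + O\!\left(\frac{1}{h_D(P)}\right) \le \frac{1}{n}. \]
Letting $h_D(P) \to \infty$,
\[ -\frac{1}{n} \le \liminf_{h_D(P) \to \infty} \frac{h_E(P)}{h_D(P)} \le \limsup_{h_D(P) \to \infty} \frac{h_E(P)}{h_D(P)} \le \frac{1}{n}. \]
Since $n \ge 1$ was arbitrary, we deduce that
\[ \lim_{h_D(P) \to \infty} \frac{h_E(P)}{h_D(P)} = 0, \]
which proves the special case when $\deg(E) = 0$.

General Case. Let $d_1 = \deg(D_1)$ and $d_2 = \deg(D_2)$. We will apply the special case with $E = d_2 D_1 - d_1 D_2$ and $D = D_2$. Note that $\deg(E) = 0$. By the conclusion of the special case,
\[ \lim_{h_{D_2}(P) \to \infty} \frac{d_2 h_{D_1}(P) - d_1 h_{D_2}(P)}{h_{D_2}(P)} = 0. \]
This translates into:
\[ \lim_{h_{D_2}(P) \to \infty} \left(\frac{d_2 h_{D_1}(P)}{h_{D_2}(P)} - d_1\right) = 0 \ \Rightarrow \ \lim_{h_{D_2}(P) \to \infty} \frac{h_{D_1}(P)}{h_{D_2}(P)} = \frac{d_1}{d_2}, \]
and we are done. □

In fact, a stronger statement is true. One can apparently show that $|h_E(P)| \le c\sqrt{h_D(P)}$ when $E$ is a degree 0 divisor and $D$ is an ample divisor.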
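The packing bound from the lattice proposition in this section can be checked numerically. The following is a minimal sketch, not part of the lecture: it takes the lattice $\mathbb{Z}^2$ as hypothetical test data, greedily builds a set $S$ satisfying the separation hypothesis with an arbitrarily chosen $\alpha = 0.5$, and compares the number of points in each dyadic annulus with $(1 + v/(\beta u))^r$. The function names `separated_subset`, `annulus_count` and `packing_bound` are illustrative assumptions.

```python
import math

def separated_subset(alpha, radius):
    """Greedily pick points of Z^2 (hypothetical test lattice) so that
    ||x - y||^2 >= alpha * (||x||^2 + ||y||^2) for all distinct chosen x, y."""
    pts = [(a, b)
           for a in range(-radius, radius + 1)
           for b in range(-radius, radius + 1)
           if 0 < a * a + b * b <= radius * radius]
    pts.sort(key=lambda p: p[0] ** 2 + p[1] ** 2)  # shortest vectors first
    chosen = []
    for p in pts:
        norm_p = p[0] ** 2 + p[1] ** 2
        if all((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2
               >= alpha * (norm_p + q[0] ** 2 + q[1] ** 2)
               for q in chosen):
            chosen.append(p)
    return chosen

def annulus_count(points, u, v):
    """#S(u, v): number of points x with u < ||x|| <= v."""
    return sum(1 for (a, b) in points if u * u < a * a + b * b <= v * v)

def packing_bound(alpha, dim, u, v):
    """The bound (1 + v / (beta * u))^dim with beta = sqrt(alpha / 2)."""
    beta = math.sqrt(alpha / 2)
    return (1 + v / (beta * u)) ** dim

alpha, radius = 0.5, 64
S = separated_subset(alpha, radius)
for k in range(int(math.log2(radius))):
    u, v = 2 ** k, 2 ** (k + 1)
    # annulus index, actual count, and the upper bound from the proposition
    print(k, annulus_count(S, u, v), packing_bound(alpha, 2, u, v))
```

In this sketch the bound is the same number, $(1 + 2/\beta)^2$, for every dyadic annulus, which is the feature the dyadic trick exploits: the total count grows at most linearly in the number of annuli, i.e. logarithmically in $R$.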

21. Diophantine Approximation

Motivation. Let's say that we want to solve $x^n - y^n = A$ where $n \ge 2$ and $A \in \mathbb{Z}$ with $A \ne 0$. So, we want to find all pairs $(x, y) \in \mathbb{Z}^2$ which satisfy this equation. One idea is to use factorization:
\[ (x - y)(x^{n-1} + x^{n-2}y + \cdots + y^{n-1}) = A. \]
Let $B = x - y$ and $C = x^{n-1} + \cdots + y^{n-1}$ so that $A = BC$. The set $\{(B, C) : A = BC\}$ is finite. For each potential pair $(B, C)$, one can substitute $x = y + B$ into $x^{n-1} + \cdots + y^{n-1}$ to get
\[ (y + B)^{n-1} + \cdots + y^{n-1} = C, \]
which has at most $n - 1$ solutions $y$. So, in this example the solution set can be checked in finite time (in theory). In contrast, $x^2 - 2y^2 = A$ often has infinitely many solutions. Perhaps more surprisingly, it turns out that $x^3 - 2y^3 = A$ has finitely many solutions. This example explains some of the motivation for diophantine approximation.

Here is the idea. Let's assume $A > 0$ for simplicity. We can still factorize:
\[ (x - \sqrt[3]{2}\,y)(x^2 + \sqrt[3]{2}\,xy + \sqrt[3]{4}\,y^2) = A. \]
The second factor can be completed to a square:
\[ x^2 + \sqrt[3]{2}\,xy + \sqrt[3]{4}\,y^2 = \left(x + \tfrac{1}{2}\sqrt[3]{2}\,y\right)^2 + \tfrac{3}{4}\sqrt[3]{4}\,y^2. \]
Let $C := \tfrac{3}{4}\sqrt[3]{4}$. It follows that
\[ x^2 + \sqrt[3]{2}\,xy + \sqrt[3]{4}\,y^2 \ge C y^2, \]
and hence, for $y \ne 0$,
\[ |x - \sqrt[3]{2}\,y| = \frac{A}{|x^2 + \sqrt[3]{2}\,xy + \sqrt[3]{4}\,y^2|} \le \frac{A}{C y^2}. \]
Consequently,
\[ \left|\frac{x}{y} - \sqrt[3]{2}\right| \le \frac{A}{C |y|^3}. \]
This last inequality says that $x/y \in \mathbb{Q}$ is very close to $\sqrt[3]{2}$. According to the theorem of Axel Thue, there are only finitely many $x/y \in \mathbb{Q}$ with $\gcd(x, y) = 1$ satisfying this inequality. From now on, when we write $x/y \in \mathbb{Q}$ we will implicitly assume that $\gcd(x, y) = 1$. More generally, Thue was able to show that if $\alpha$ is an irrational algebraic number with degree $[\mathbb{Q}(\alpha) : \mathbb{Q}] = d$, then for any $\varepsilon > 0$, the inequality
\[ \left|\frac{x}{y} - \alpha\right| \le \frac{1}{y^{d/2 + 1 + \varepsilon}} \]
is satisfied for only finitely many rational $x/y \in \mathbb{Q}$ with $y > 0$. In our application above, $d = 3$, and so $d/2 + 1 + \varepsilon = 2.5 + \varepsilon$.

Diophantine approximation is the theory of how closely rational quantities approximate irrational quantities.

Theorem. (Dirichlet) Let $\alpha \in \mathbb{R} \setminus \mathbb{Q}$. There are infinitely many rational numbers $x/y \in \mathbb{Q}$ with
\[ \left|\frac{x}{y} - \alpha\right| \le \frac{1}{y^2}. \]
Proof. We will apply the pigeonhole principle. Let $\{t\} := t - \lfloor t \rfloor$. Choose $B \in \mathbb{N}$. Look at $\{0\}, \{\alpha\}, \{2\alpha\}, \dots, \{B\alpha\}$. These will be the pigeons. Since there are $B + 1$ pigeons, we need $B$ pigeonholes. Look at the intervals:
\[ \left[0, \tfrac{1}{B}\right), \ \left[\tfrac{1}{B}, \tfrac{2}{B}\right), \ \dots, \ \left[\tfrac{B-1}{B}, 1\right). \]

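These are the pigeonholes. By the pigeonhole principle, there exist $0 \le x_1 < x_2 \le B$ and $0 \le i < B$ such that
\[ \{x_1\alpha\}, \{x_2\alpha\} \in \left[\tfrac{i}{B}, \tfrac{i+1}{B}\right). \]
Consequently,
\[ |\{x_2\alpha\} - \{x_1\alpha\}| \le \frac{1}{B}, \]
which translates into:
\[ |x_2\alpha - \lfloor x_2\alpha \rfloor - x_1\alpha + \lfloor x_1\alpha \rfloor| \le \frac{1}{B}, \]
\[ |(x_2 - x_1)\alpha - (\lfloor x_2\alpha \rfloor - \lfloor x_1\alpha \rfloor)| \le \frac{1}{B}. \]
Letting $y = x_2 - x_1$ and $x = \lfloor x_2\alpha \rfloor - \lfloor x_1\alpha \rfloor$, we get $|y\alpha - x| \le \frac{1}{B}$. Dividing both sides by $|y|$, we obtain
\[ \left|\alpha - \frac{x}{y}\right| \le \frac{1}{B|y|} \le \frac{1}{y^2} \]
since $|y| = |x_2 - x_1| \le B$. Because $\alpha$ is irrational, $|y\alpha - x| > 0$, so repeating the argument with $B > 1/|y\alpha - x|$ produces a new fraction; hence there are infinitely many. □

Remark. One of the improvements of Dirichlet's theorem is a result due to Hurwitz, which states that for $\alpha \in \mathbb{R} \setminus \mathbb{Q}$, there are infinitely many solutions to:
\[ \left|\frac{x}{y} - \alpha\right| \le \frac{1}{\sqrt{5}\, y^2}. \]
The constant $\sqrt{5}$ cannot be replaced by any bigger constant, so the result is optimal in a certain sense. This $\sqrt{5}$ comes from attempting to approximate the golden ratio $\alpha = \frac{1 + \sqrt{5}}{2}$. In some sense, the golden ratio is the real number that is hardest to approximate by a rational number.

Theorem. (Liouville) Let $\alpha \in \overline{\mathbb{Q}} \setminus \mathbb{Q}$ and $d = [\mathbb{Q}(\alpha) : \mathbb{Q}]$. There exists a constant $C(\alpha) > 0$, which depends on $\alpha$, such that for all $x/y \in \mathbb{Q}$ with $y > 0$,
\[ \left|\frac{x}{y} - \alpha\right| \ge \frac{C(\alpha)}{y^d}. \]
A consequence is the statement that, for every $\varepsilon > 0$, there are only finitely many rational numbers $x/y \in \mathbb{Q}$ satisfying $|x/y - \alpha| < 1/y^{d + \varepsilon}$.

Proof. We may assume that $\alpha \in \overline{\mathbb{Q}} \cap \mathbb{R}$. Let $f(T) = a_0 T^d + a_1 T^{d-1} + \cdots + a_d \in \mathbb{Z}[T]$ be the minimal polynomial of $\alpha$. Note that $f(\alpha) = 0$ but $f'(\alpha) \ne 0$. We can write $f(T) = (T - \alpha) g(T)$ where $g(T) \in \mathbb{R}[T]$ and $g(\alpha) \ne 0$. For any $x/y \in \mathbb{Q}$,
\[ f\!\left(\frac{x}{y}\right) = \left(\frac{x}{y} - \alpha\right) \cdot g\!\left(\frac{x}{y}\right). \]
So we have
\[ 0 \ne \left|f\!\left(\frac{x}{y}\right)\right| = \frac{|a_0 x^d + a_1 x^{d-1} y + \cdots + a_d y^d|}{|y|^d} \ge \frac{1}{|y|^d}, \]
because the numerator is a non-zero integer. Consequently,
\[ \left|\frac{x}{y} - \alpha\right| \cdot \left|g\!\left(\frac{x}{y}\right)\right| \ge \frac{1}{|y|^d} \ \Rightarrow \ \left|\frac{x}{y} - \alpha\right| \ge \frac{1}{|y|^d} \cdot \frac{1}{|g(x/y)|}. \]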

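So we just need an upper bound for $|g(x/y)|$. Either $\left|\frac{x}{y} - \alpha\right| > 1$, in which case we are done.

So, we may assume that $\left|\frac{x}{y} - \alpha\right| \le 1$, and
\[ \left|g\!\left(\frac{x}{y}\right)\right| \le \sup_{|t - \alpha| \le 1} |g(t)| = C(\alpha)^{-1} \]
for some $C(\alpha) > 0$. Here we used the fact that $g(\alpha) \ne 0$ (so the supremum is positive), and the fact that $g$ is a continuous function, and so $|g|$ achieves a maximum on the closed bounded set $\{t \in \mathbb{R} : |t - \alpha| \le 1\}$. Combining the inequalities, we obtain
\[ \left|\frac{x}{y} - \alpha\right| \ge \frac{1}{|y|^d} \cdot \frac{1}{|g(x/y)|} \ge \frac{C(\alpha)}{|y|^d} \]
as desired. □

Corollary. The number $\beta = \sum_{n=0}^{\infty} \frac{1}{10^{n!}}$ is in $\mathbb{R} \setminus \overline{\mathbb{Q}}$, i.e. it is a transcendental number.

Proof sketch. Let $\frac{x_N}{y_N} = \sum_{n=0}^{N} \frac{1}{10^{n!}}$ be the partial sums, so that $y_N = 10^{N!}$. If $\beta$ were algebraic with degree $d = [\mathbb{Q}(\beta) : \mathbb{Q}]$, it would satisfy Liouville's theorem. So,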

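\[ \frac{C(\beta)}{y_N^d} \le \left|\frac{x_N}{y_N} - \beta\right| \le \frac{\text{constant}}{y_N^{f(N)}} \]
where $f$ is some function that satisfies $f(N) \to \infty$ as $N \to \infty$ (for instance $f(N) = N + 1$, since the tail of the series is at most $2 \cdot 10^{-(N+1)!}$). This contradicts the inequality above for values of $N$ that are much bigger than $d$. □

As we mentioned above, Thue improved Liouville's theorem by showing that for every $\alpha \in \overline{\mathbb{Q}} \setminus \mathbb{Q}$ with degree $d = [\mathbb{Q}(\alpha) : \mathbb{Q}]$, and $\varepsilon > 0$, there exists a constant $C(\alpha, \varepsilon) > 0$ depending only on $\alpha$ and $\varepsilon$ such that
\[ \left|\frac{x}{y} - \alpha\right| \ge \frac{C(\alpha, \varepsilon)}{|y|^{d/2 + 1 + \varepsilon}} \]
for all rational numbers $x/y \in \mathbb{Q}$. The bound was later improved by Siegel to
\[ \left|\frac{x}{y} - \alpha\right| \ge \frac{C(\alpha, \varepsilon)}{|y|^{2\sqrt{d} + 1 + \varepsilon}}. \]
Dyson and Gelfond independently proved a slightly stronger statement:
\[ \left|\frac{x}{y} - \alpha\right| \ge \frac{C(\alpha, \varepsilon)}{|y|^{\sqrt{2d} + 1 + \varepsilon}}. \]
In his paper, Dyson suggested that maybe $\sqrt{2d}$ can be replaced with $\sqrt[k]{kd}$ for any $k \ge 1$. By letting $k \to \infty$, we would get $\sqrt[k]{kd} \to 1$. This goal was achieved by Roth. Thus, Roth's theorem is the inequality:
\[ \left|\frac{x}{y} - \alpha\right| \ge \frac{C(\alpha, \varepsilon)}{|y|^{2 + \varepsilon}}. \]
The exponent $2 + \varepsilon$ is the best possible in view of Dirichlet's theorem. The results of Thue, Siegel, Dyson, Gelfond and Roth are not effective, i.e. the proofs do not furnish any explicit value for the constant $C(\alpha, \varepsilon)$. An effective improvement of Liouville's theorem was proved by Baker, who showed that there are constants $\delta(\alpha) > 0$ and $C(\alpha) > 0$ such that:
\[ \left|\frac{x}{y} - \alpha\right| \ge \frac{C(\alpha)}{|y|^{d - \delta(\alpha)}} \]
where both $\delta(\alpha)$ and $C(\alpha)$ are effective. We can apply Baker's theorem to the equation $x^3 - 2y^3 = A$. As we mentioned above, any solution $(x, y) \in \mathbb{Z}^2$ with $y \ne 0$ must satisfy:
\[ \frac{C(\sqrt[3]{2})}{|y|^{3 - \delta(\sqrt[3]{2})}} \le \left|\frac{x}{y} - \sqrt[3]{2}\right| \le \frac{4A}{3\sqrt[3]{4}\,|y|^3}. \]
The left-hand inequality comes from Baker's theorem. Rearranging, we get $|y|^{\delta(\sqrt[3]{2})} \le \text{constant}$. Since $\delta(\sqrt[3]{2}) > 0$ is effective, there is a finite search space for the potential values of $y$. In theory, depending on how large the constant $\delta(\sqrt[3]{2}) > 0$ is, this allows us to enumerate all integral solutions to $x^3 - 2y^3 = A$.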
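Two computations from this section are concrete enough to sketch in Python: the pigeonhole construction from the proof of Dirichlet's theorem, and the finite search for solutions of $x^3 - 2y^3 = A$ once a bound on $|y|$ is available. This is an illustrative sketch under stated assumptions, not part of the lecture. The names `dirichlet_approx`, `exact_cube_root` and `cubic_thue_solutions` are hypothetical, the irrational number is represented by a floating-point value, and the parameter `y_max` merely stands in for the effective bound coming from Baker's theorem, which is not computed here.

```python
from math import floor

def dirichlet_approx(alpha, B):
    """Pigeonhole construction from the proof of Dirichlet's theorem:
    return (x, y) with 1 <= y <= B and |y*alpha - x| < 1/B.
    alpha is a float, so this is only reliable for moderate B."""
    box_of = {}  # pigeonhole i -> multiplier k with {k*alpha} in [i/B, (i+1)/B)
    for k in range(B + 1):
        frac = k * alpha - floor(k * alpha)
        i = min(int(frac * B), B - 1)
        if i in box_of:
            k1 = box_of[i]
            return floor(k * alpha) - floor(k1 * alpha), k - k1
        box_of[i] = k
    raise AssertionError("unreachable: B + 1 pigeons in B pigeonholes")

def exact_cube_root(n):
    """Return the integer r with r**3 == n, or None if n is not a perfect cube."""
    if n < 0:
        r = exact_cube_root(-n)
        return None if r is None else -r
    r = round(n ** (1.0 / 3.0))
    for c in (r - 1, r, r + 1):  # guard against floating-point rounding
        if c >= 0 and c ** 3 == n:
            return c
    return None

def cubic_thue_solutions(A, y_max):
    """All integer pairs (x, y) with x^3 - 2*y^3 = A and |y| <= y_max.
    y_max is a hypothetical stand-in for an effective bound on |y|."""
    solutions = []
    for y in range(-y_max, y_max + 1):
        x = exact_cube_root(A + 2 * y ** 3)
        if x is not None:
            solutions.append((x, y))
    return solutions

alpha = 2 ** (1.0 / 3.0)
x, y = dirichlet_approx(alpha, 10)
print(x, y, abs(alpha - x / y), 1 / y ** 2)  # approximation error vs. the 1/y^2 bound
print(cubic_thue_solutions(1, 200))
```

The search is brute force on purpose: the point of the effective bound is that it reduces the equation to finitely many values of $y$, each of which is settled by a perfect-cube test.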

References

[Dob79] E. Dobrowolski, On a question of Lehmer and the number of irreducible factors of a polynomial, Acta Arith. 34 (1979), no. 4, 391–401, DOI 10.4064/aa-34-4-391-401.

[HS00] Marc Hindry and Joseph H. Silverman, Diophantine geometry: An introduction, Graduate Texts in Mathematics, vol. 201, Springer-Verlag, New York, 2000, xiv+558, DOI 10.1007/978-1-4612-1210-2.
