

arXiv:2004.13108v2 [math.NT] 29 Apr 2020

PROBABILISTIC SZPIRO, BABY SZPIRO, AND EXPLICIT SZPIRO FROM MOCHIZUKI'S COROLLARY 3.12

TAYLOR DUPUY AND ANTON HILADO

Date: April 30, 2020.

Abstract. In [DH20b] we gave some explicit formulas for the "indeterminacies" ind1, ind2, ind3 in Mochizuki's Inequality as well as a new presentation of initial theta data. In the present paper we use these explicit formulas, together with our probabilistic formulation of [Moc15a, Corollary 3.12], to derive variants of Szpiro's inequality (in the spirit of [Moc15b]). In particular, for an elliptic curve in initial theta data we show how to derive uniform Szpiro (with explicit numerical constants). The inequalities we get will be strictly weaker than [Moc15b, Theorem 1.10] but the proofs are more transparent, modifiable, and user friendly. All of these inequalities are derived from a probabilistic version of [Moc15a, Corollary 3.12] formulated in [DH20b] based on the notion of random measurable sets.

Contents
1. Introduction
2. Explicit Computations in Tensor Packets
3. Conductors, Minimal Discriminants, and Ramification
4. Estimates on p-adic Logarithms
5. Archimedean Logarithms
6. Upper Bounds on Hulls
7. Probabilistic Versions of the Mochizuki and Szpiro Inequalities
8. Deriving Explicit Constants For Szpiro's Inequality From Mochizuki's Inequality
References

1. Introduction
In [DH20b] we gave a probabilistic interpretation of Mochizuki's Inequality (Corollary 3.12 of [Moc15a]). In the present paper we perform some explicit computations using this inequality to derive three inequalities which we will call "Probabilistic Szpiro", "Baby Szpiro", and "Explicit Szpiro". All of these inequalities depend on an assumed behavior at the archimedean place stated in Claim 5.0.1, on [Moc15a, Corollary 3.12], and on the hypothesis of the elliptic curve being in "initial theta data built from the field of moduli".

In order to state our results we need to talk about initial theta data. As formulated in [DH20b], initial theta data is a tuple

\[ (\overline{F}/F, E_F, l, M, V, V^{\mathrm{bad}}_{\mathrm{mod}}, \epsilon) \]

surrounding an elliptic curve $E = E_F$ over a field $F$ satisfying various conditions which are not important for the purposes of the introduction (the curious reader should consult [DH20b, §5]). What is important are the data types of the tuple: the entry $l$ is a fixed prime and, in the present paper, the choices of $M$ and $\epsilon$ will be irrelevant. We will discuss the sets of places $V$ and $V^{\mathrm{bad}}_{\mathrm{mod}}$ momentarily. This requires some set up.

In order to define the sets of places $V^{\mathrm{bad}}_{\mathrm{mod}}$ and $V$ we need to introduce the fields $F_0$ and $K$ to which they belong. The field $F_0$ is the field of moduli of the elliptic curve defined by

\[ F_0 := \mathbb{Q}(j_E); \]

in Mochizuki's notation this is $F_{\mathrm{mod}}$. The field $K$ is the $l$-division field of $F$ given by

\[ K := F(E[l]), \tag{1.1} \]

obtained by adjoining the $l$-torsion of $E(\overline{F})$ to $F$. In this paper, for any field $L$ we will let $V(L)$ denote the collection of places of $L$, and for any non-archimedean place $v \in V(L)$ we will let $\kappa(v)$ denote the residue field and $L_v$ denote the completion of $L$ at $v$.

We now come to the definitions of $V^{\mathrm{bad}}_{\mathrm{mod}}$ and $V$. First, $V^{\mathrm{bad}}_{\mathrm{mod}} \subset V(F_0)$ is a non-empty set of bad multiplicative places over the field of moduli: for every elliptic curve $E_0$ over $F_0$ such that $E \cong E_0 \otimes_{F_0} F$, if $v \in V^{\mathrm{bad}}_{\mathrm{mod}}$, then $E_0$ has multiplicative reduction at $v$. Next, the set $V \subset V(K)$ is a set that maps bijectively to $V(F_0)$ under the natural map $V(K) \to V(F_0)$.

We will be using these quantities momentarily but first we need to describe a special type of initial theta data that will be used in the course of this manuscript. For computational purposes one can always take an elliptic curve over its field of moduli (satisfying some mild conditions) and base change this field to a larger field to obtain some curve that can be put in initial theta data. We call such theta data "built from the field of moduli". The precise definition is below.

Definition 1.0.1. Let $E/F$ be an elliptic curve inside initial theta data

\[ (\overline{F}/F, E_F, l, M, V, V^{\mathrm{bad}}_{\mathrm{mod}}, \epsilon). \]

We will say that such a tuple is built from the field of moduli provided $E \cong E_0 \otimes_{F_0} F$ where $F_0 = \mathbb{Q}(j_E)$ is the field of moduli of $E$, $E_0$ is a model of $E$ over $F_0$, $F := F_0(\sqrt{-1}, E_0[30])$, and $V^{\mathrm{bad}}_{\mathrm{mod}} \subset V(F_0)$ is the full set of places of multiplicative reduction.^1

We will often use the notation

\[ d_0 := [F_0 : \mathbb{Q}]. \]

The constants in our Szpiro-like inequalities will depend on this degree. In stating our results we recall from [DH20b] that for a rational prime $p$ the set $V(F_0)_p$ is given the structure of a probability space where $\Pr : V(F_0)_p \to [0,1]$ is defined by

\[ \Pr(v) := \frac{[F_{0,v} : \mathbb{Q}_p]}{[F_0 : \mathbb{Q}]}. \]
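The probability space on $V(F_0)_p$ can be illustrated with a minimal sketch. It assumes that each place $v \mid p$ is described by a hypothetical pair $(e, f)$ of ramification and inertia degrees, so that $[F_{0,v} : \mathbb{Q}_p] = ef$ and the local degrees sum to $[F_0 : \mathbb{Q}]$; the function name and the sample data are illustrative only.

```python
from fractions import Fraction

def place_probabilities(local_data):
    """local_data: list of (e, f) pairs, one per place v | p of F_0 (assumed input).

    Returns Pr(v) = [F_{0,v} : Q_p] / [F_0 : Q] for each place, as exact fractions.
    """
    local_degrees = [e * f for e, f in local_data]  # [F_{0,v} : Q_p] = e(v/p) * f(v/p)
    d0 = sum(local_degrees)                         # local degrees add up to [F_0 : Q]
    return [Fraction(n, d0) for n in local_degrees]

# Hypothetical degree-4 field with one ramified place (e=2, f=1) and one inert-type place (e=1, f=2).
probs = place_probabilities([(2, 1), (1, 2)])
```

Working with `Fraction` keeps the probabilities exact, which makes it easy to confirm that they sum to 1.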

^1 The definition of initial theta data in §5 of [DH20b] precludes bad places from having residue characteristic two. Also the set of bad places needs to be nonempty.

Using $V$ one can define some interesting probabilistic quantities which appear in our Probabilistic Szpiro inequality and give a good sense of the types of things that Mochizuki's inequality "knows about".

Definition 1.0.2.
(1) The probability that $v \in V_p$ is unramified is
\[ P_{\mathrm{unr},p} = \sum_{w \in \{v \in V(F_0)_p \,:\, e(v/p) = 1\}} \Pr(w). \tag{1.2} \]
(2) The average ramification degree of $v \in V_p$ is defined to be
\[ e_p = E(e(v/p)). \tag{1.3} \]
(3) The average different of $V/\mathbb{Q}$ is defined to be
\[ \mathrm{Diff}(V/\mathbb{Q}) = \prod_p p^{\mathrm{diff}_p}. \tag{1.4} \]
In (1.4) we have $\mathrm{diff}_p = \log_p(E(p^{\mathrm{diff}(v/p)}))$ and $\mathrm{diff}(v/p) = \mathrm{ord}_p(\mathrm{Diff}(K_v/\mathbb{Q}_p))$; $\mathrm{Diff}(K_v/\mathbb{Q}_p)$ is the different of $K_v$ over $\mathbb{Q}_p$.

Using these quantities we can now state the Probabilistic Szpiro.

Theorem 1.0.3 (Probabilistic Szpiro). Assume [Moc15a, Corollary 3.12] and Claim 5.0.1. For any elliptic curve $E/F$ in initial theta data $(\overline{F}/F, E_F, l, M, V, V^{\mathrm{bad}}_{\mathrm{mod}}, \epsilon)$ built from the field of moduli we have

\[ \frac{1}{6+\varepsilon_l} \frac{\ln|\Delta^{\min}_{E/F}|}{[F:\mathbb{Q}]} \le \ln \mathrm{Diff}(V/\mathbb{Q}) + \sum_p \ln(e_p) + A_{l,V} \tag{1.5} \]

where

\[ A_{l,V} = \ln(\pi) + \sum_p \left(1 - P_{\mathrm{unr},p}^{\,l+1/2}\right)\left( \ln(b_p) + \frac{5}{l+4} \right), \]

and $b_p = 1/\exp(1)\ln(p)$, and $\varepsilon_l = 24(l+3)/(l^2+l-12)$.

From the Probabilistic Szpiro Inequality we can derive a "Baby Szpiro" Inequality. This inequality only depends on the discriminant and degree of the division field $K$. It can be derived quickly, dispensing with a discussion of ramification of the mod $l$ Galois representation and its relation to the conductor (which is reviewed in §3). In what follows, for an ideal $I$ in a ring of integers $R$ we let $|I|$ denote the absolute norm.

Theorem 1.0.4 (Baby Szpiro). Assume [Moc15a, Corollary 3.12] and Claim 5.0.1. Then for any elliptic curve $E/F$ in initial theta data $(\overline{F}/F, E_F, l, M, V, V^{\mathrm{bad}}_{\mathrm{mod}}, \epsilon)$ built from the field of moduli we have

\[ \frac{1}{6+\varepsilon_l} \frac{\ln|\Delta^{\min}_{E/F}|}{[F:\mathbb{Q}]} \le \ln([K:\mathbb{Q}]^{5/4}) \ln(|\mathrm{Disc}(K/\mathbb{Q})|^{5/4}) + \ln(\pi). \tag{1.6} \]

In the above formula $\varepsilon_l = (24l+72)/(l^2+l-12)$ and $K = \mathbb{Q}(j_E, E[30l], \sqrt{-1})$.

In §8 we are more careful with our field invariants and delicately apply the theory of [ST68]. The result of this delicate work is our "Explicit Szpiro".

Theorem 1.0.5 (Explicit Szpiro). Assume [Moc15a, Corollary 3.12] and Claim 5.0.1. If $E/F$ is an elliptic curve in initial theta data $(\overline{F}/F, E_F, l, M, V, V^{\mathrm{bad}}_{\mathrm{mod}}, \epsilon)$ built from the field of moduli then

\[ |\Delta^{\min}_{E/F}| \le e^{A_0 d_0^2 l^4 + B_0 d_0} \left( |\mathrm{Cond}(E/F)| \cdot |\mathrm{Disc}(F/\mathbb{Q})| \right)^{24+\varepsilon_l}, \tag{1.7} \]

where

\[ A_0 = 84372107405, \qquad B_0 = 316495, \qquad \varepsilon_l = 96(l+3)/(l^2+l-12). \]

All of these computations follow the same pattern. First, one computes an upper bound for the so-called hull of the multiradial representation (cf. [DH20b, §4]); then one tries to be as clever as possible in order to obtain further bounds on the radii of these bounding polydiscs. The initial bounding is performed in §6 and the secondary bounds come in subsequent sections. All of the material before §7 is built for this imminent application. A lot of interesting comes into play at both stages of these computations and interestingly, as our treatment will show, it seems that there is a lot of room for improvement. We have put in a great deal of effort in an attempt to highlight some of these avenues for improvement which we hope readers will take an interest in (see for example Remark 6.2.1).
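To get a feel for the size of the constants in Theorems 1.0.3 to 1.0.5, the following sketch evaluates $\varepsilon_l$ and the natural logarithm of the right-hand side of (1.7). The function names, and the choice to pass $\ln|\mathrm{Cond}(E/F)|$ and $\ln|\mathrm{Disc}(F/\mathbb{Q})|$ as plain floats, are assumptions made for illustration; only the formulas come from the statements above.

```python
from fractions import Fraction

A0 = 84372107405  # constant A_0 from Theorem 1.0.5
B0 = 316495       # constant B_0 from Theorem 1.0.5

def epsilon_l(l, numerator=24):
    """epsilon_l = numerator * (l + 3) / (l^2 + l - 12).

    numerator is 24 in Theorems 1.0.3 and 1.0.4 and 96 in Theorem 1.0.5.
    The denominator (l + 4)(l - 3) vanishes at l = 3, so l = 3 is excluded.
    """
    return Fraction(numerator * (l + 3), l * l + l - 12)

def log_explicit_szpiro_bound(l, d0, log_cond, log_disc):
    """Natural log of the right-hand side of (1.7).

    log_cond = ln|Cond(E/F)| and log_disc = ln|Disc(F/Q)| are assumed inputs.
    """
    exponent = 24 + epsilon_l(l, numerator=96)
    return A0 * d0**2 * l**4 + B0 * d0 + float(exponent) * (log_cond + log_disc)
```

For example, `epsilon_l(5)` is `32/3`, and the additive term $A_0 d_0^2 l^4 + B_0 d_0$ already dominates the bound for small conductor and discriminant.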

Acknowledgements. This article is very much indebted to many previous expositions of IUT including (but not limited to) [Fes15, Hos18, Ked15, Hos15, Sti15, Mok15, Moc17, Yam17, Hos17, Tan18, SS17]. The first author also greatly benefitted from conversations with many other mathematicians and would especially like to thank Yuichiro Hoshi for helpful discussions regarding Kummer theory and his patience during discussions of the theta link and Mochizuki's comparison; Kirti Joshi for discussions on deformation theory in the context of IUT; for productive discussions on Frobenioids, tempered fundamental groups, and global aspects of IUT; Emmanuel Lepage for helpful discussions on the p-adic logarithm, initial theta data, aut holomorphic spaces, the log-Kummer correspondence, theta functions and their functional equations, tempered fundamental groups, log-structures, cyclotomic synchronization, reconstruction of fundamental groups, reconstruction of decomposition groups, the "multiradial representation of the theta pilot object", the third indeterminacy, the second indeterminacy, discussions on Hodge theaters, labels, and kappa coric functions, and discussions on local class field theory; for his patience in clarifying many aspects of his theory, including discussions regarding the relationship between IUT and Hodge Arakelov theory (especially the role of "global multiplicative subspaces" in IUT), discussions on technical hypotheses in initial theta data, discussions on Theorem 3.11 and "(abc)-modules", discussions on mono-theta environments and the interior and exterior cyclotomes, discussions of the behavior of various objects with respect to automorphisms and providing comments on treatment of log-links and the use of polyisomorphisms, discussions on indeterminacies and the multiradial representation, discussions of the theta link, discussions on various incarnations of Arakelov divisors, and discussions on cyclotomic synchronization; Chung Pang Mok for productive discussions on the p-adic logarithm, anabelian evaluation, indeterminacies, the theta link, and Hodge theaters; Thomas Scanlon for discussions regarding interpretations and infinitary logic as applied to IUT and anabelian geometry. We apologize if we have forgotten anybody.

The authors also benefitted from the existence of the following workshops: the 2015 Oxford workshop funded by the Clay Mathematics Institute and the EPSRC programme grant Symmetries and Correspondences; the 2017 Kyoto IUT Summit workshop funded by RIMS and EPSRC; the Vermont workshop in 2017 funded by the NSF DMS-1519977 and Symmetries and Correspondences entitled Kummer Classes and Anabelian Geometry; the 2018 Vermont Workshop on Witt Vectors, Deformations and Absolute Geometry funded by NSF DMS-1801012.

The first author was partially supported by the European Research Council under the European Union's Seventh Framework Programme (FP7/2007-2013) / ERC Grant agreement no. 291111/ MODAG while working on this project.

The research discussed in the present paper profited enormously from the generous support of the International Joint Usage/Research Center (iJU/RC) located at Kyoto University's Research Institute for Mathematical Sciences (RIMS) as well as the Preparatory Center for Research in Next-Generation Geometry located at RIMS.

2. Explicit Computations in Tensor Packets
The entire purpose of this section is Theorem 2.8.1, and the entire purpose of Theorem 2.8.1 is the hull computation in §6. At the end of the day the differents appearing in these sections are what give rise to the conductor term in Szpiro inequalities (in conjunction with the material in §3).

Fix $K_1, \ldots, K_m$ finite extensions of $\mathbb{Q}_p$. Let $L = K_1 \otimes \cdots \otimes K_m$. We are interested in the difference between the $\mathbb{Z}_p$-lattices^2 $\mathcal{O}_{K_1} \otimes \cdots \otimes \mathcal{O}_{K_m} \subset \mathcal{O}_L$. It turns out that the index of $\mathcal{O}_{K_1} \otimes \cdots \otimes \mathcal{O}_{K_m}$ in $\mathcal{O}_L$ is related to the differents of $K_i/\mathbb{Q}_p$, which we describe in the subsequent subsections. Again, this is needed for the hull computations.^3

2.1. Integral Closures. For a reduced ring $T$, the total ring of fractions is defined by $\kappa(T) = S_{\max}^{-1} T$ where $S_{\max}$ is the multiplicatively closed set of non-zero divisors. We let $\mathrm{Int}_{\kappa(T)}(T)$ denote the integral closure of $T$ in $\kappa(T)$. Let $K_1, \ldots, K_m$ be finite extensions of $\mathbb{Q}_p$ and consider the special case $T = \mathcal{O}_{K_1} \otimes \cdots \otimes \mathcal{O}_{K_m}$ (where tensor products are taken over $\mathbb{Z}_p$). It turns out that $\kappa(T) = K_1 \otimes \cdots \otimes K_m$ (where tensor products are over $\mathbb{Q}_p$). Let $L = \kappa(T)$. We can use the Chinese Remainder Theorem to write $L \cong \bigoplus_{j=1}^r L_j$ where each $L_j$ is a finite extension of $\mathbb{Q}_p$. Define $\mathcal{O}_L = \bigoplus_{j=1}^r \mathcal{O}_{L_j}$. It also turns out that $\mathrm{Int}_L(T) = \mathcal{O}_L$.

^2 A lattice of a $\mathbb{Q}_p$-vector space $V$ is a $\mathbb{Z}_p$-submodule $V_0 \subset V$ which is free of rank $\dim(V)$ and whose $\mathbb{Q}_p$-span is all of $V$.
^3 And in explicit cases one can actually proceed differently, say, using conductors in the sense of Commutative Algebra.

2.2. Differents and Discriminants. For a discussion on differents and discriminants of fields we refer the reader to [Neu99, III.2] or [Sut15] or [Con]. A very comprehensive review of differents in great generality can be found in [Aut19, 0DW4] and the references therein. For $A \subset B$ a finite extension of rings we define the different ideal to be

\[ \mathrm{Diff}(B/A) := \mathrm{ann}_B(\Omega_{B/A}). \]

Here $\Omega_{B/A}$ is the module of Kähler differentials. When $L/L_0$ is an extension of fields we use the notation $\mathrm{Diff}(L/L_0) := \mathrm{Diff}(\mathcal{O}_L/\mathcal{O}_{L_0})$. For an extension of number fields $L/L_0$ the different ideal can be computed "as a product" of local differents (see [Neu99] for details).

The different also behaves well in towers. If $K$ is a finite extension of $\mathbb{Q}_p$ with residue field $k$ then $\mathcal{O}_K$ can be written as

\[ \mathcal{O}_K = W(k)[x]/(f(x)), \]

where $f(x)$ is an Eisenstein polynomial of degree $e$ (also $e$ is the ramification degree of $K/\mathbb{Q}_p$) and $W(k)$ is the full ring of $p$-typical Witt vectors of $k$. As $\mathrm{Diff}(W(k)/\mathbb{Z}_p) = 1$ (this extension is unramified), to compute $\mathrm{Diff}(\mathcal{O}_K/\mathbb{Z}_p)$ it remains to compute $\mathrm{Diff}(\mathcal{O}_K/W(k))$. From the formula $\Omega_{\mathcal{O}_K/W(k)} = (\mathcal{O}_K \cdot dx)/(\mathcal{O}_K \cdot df)$ and $d(f(x)) = f'(x)\,dx$ we find that

\[ \mathrm{Diff}(\mathcal{O}_K/\mathbb{Z}_p) = (f'(\pi)). \tag{2.1} \]

We can do some more computations to get some useful information. We find $f'(\pi) = e\pi^{e-1} + \cdots$ where all of the terms have distinct valuations and the leading term of $f'(\pi)$ has minimal valuation (this is due to the Eisenstein-ness hypothesis).^4 This gives the formula $\mathrm{Diff}(K/\mathbb{Q}_p) = (e\pi^{e-1})$ from which we compute

\[ \mathrm{ord}_p(\mathrm{Diff}(K/\mathbb{Q}_p)) = \mathrm{ord}_p(e) + (e-1)\,\mathrm{ord}_p(\pi) = \mathrm{ord}_p(e) + (e-1)\frac{1}{e}. \]
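The closed formula $\mathrm{ord}_p(e) + (e-1)/e$ displayed above is easy to tabulate. The sketch below simply evaluates that expression with exact rational arithmetic; the helper names are hypothetical. The sample values focus on tamely ramified cases, where $p$ does not divide $e$ and the expression reduces to $(e-1)/e$.

```python
from fractions import Fraction

def ord_p_int(n, p):
    """p-adic valuation of a nonzero integer n."""
    k = 0
    while n % p == 0:
        n //= p
        k += 1
    return k

def ord_p_different(e, p):
    """Value of ord_p(e) + (e - 1)/e, the expression displayed above.

    e is the ramification degree of K/Q_p (assumed input); the result is
    normalized so that ord_p(p) = 1.
    """
    return ord_p_int(e, p) + Fraction(e - 1, e)

# Unramified: e = 1 gives valuation 0, matching Diff(W(k)/Z_p) = 1.
unramified = ord_p_different(1, 5)
# Tame quadratic ramification at p = 3: valuation 1/2.
tame = ord_p_different(2, 3)
```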

The discriminant of an extension of fields $L/L_0$ is then defined to be the ideal-norm of the different:

\[ \mathrm{Disc}(L/L_0) = N_{L/L_0}(\mathrm{Diff}(L/L_0)) \lhd \mathcal{O}_{L_0}. \]

We remark that $\prod_p p^{\mathrm{ord}_p \mathrm{Diff}(L/\mathbb{Q})} = |\mathrm{Disc}(L/\mathbb{Q})|^{1/[L:\mathbb{Q}]}$. This is helpful when thinking about (say) (1.5). In later sections we will make use of the notation $\mathrm{diff}(K/\mathbb{Q}_p) = \mathrm{ord}_p \mathrm{Diff}(K/\mathbb{Q}_p)$.
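The remark relating $\prod_p p^{\mathrm{ord}_p \mathrm{Diff}(L/\mathbb{Q})}$ to the root discriminant can be checked numerically in small cases. The sketch below uses two quadratic fields in which every ramified prime is tamely ramified with a single prime above it, so that $\mathrm{ord}_p \mathrm{Diff}(L/\mathbb{Q}) = 1/2$ at each ramified $p$; the function names and the way the valuations are supplied by hand are assumptions for illustration.

```python
import math

def root_discriminant_from_differents(diff_valuations):
    """diff_valuations maps each ramified prime p to ord_p Diff(L/Q) (assumed input)."""
    return math.prod(p ** v for p, v in diff_valuations.items())

def root_discriminant(disc, degree):
    """|Disc(L/Q)|^(1/[L:Q])."""
    return abs(disc) ** (1.0 / degree)

# Q(sqrt(5)):   Disc = 5,   only p = 5 ramifies, ord_5 Diff = 1/2.
# Q(sqrt(-15)): Disc = -15, p = 3 and p = 5 ramify, each with valuation 1/2.
examples = [({5: 0.5}, 5, 2), ({3: 0.5, 5: 0.5}, -15, 2)]
for valuations, disc, degree in examples:
    assert math.isclose(root_discriminant_from_differents(valuations),
                        root_discriminant(disc, degree))
```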

2.3. Explicit Chinese Remainder Formulas. Fix a field $K_0$ and an algebraic closure $\overline{K_0}$. Let $K_1, \ldots, K_m$ be finite extensions of $K_0$ sitting inside the common algebraic closure. The isomorphism of rings

\[ \varphi : K_1 \otimes \cdots \otimes K_m \longrightarrow \bigoplus_{\psi \in \Phi} L_\psi \tag{2.2} \]

will play an important role for us. We describe its ingredients:

• $\Phi \subset \prod_{i=1}^m \mathrm{Hom}(K_i, \overline{K_0})$ is a complete system of representatives under the equivalence relation
\[ (\psi_1, \ldots, \psi_m) \sim (\sigma\psi_1, \ldots, \sigma\psi_m) \]
for $\sigma \in G(\overline{K_0}/K_0)$.
• For $\psi = (\psi_1, \ldots, \psi_m) \in \Phi$ we let $L_\psi$ be the compositum
\[ L_\psi = \psi_1(K_1) \cdots \psi_m(K_m) \subset \overline{K_0}. \]
• The isomorphism $\varphi$ is defined via extending linearly the map
\[ \varphi(a_1 \otimes \cdots \otimes a_m) = (\varphi_\psi(a_1 \otimes \cdots \otimes a_m))_{\psi \in \Phi}, \]
where $\varphi_\psi(a_1 \otimes \cdots \otimes a_m) = \psi_1(a_1) \cdots \psi_m(a_m)$.

We prove (2.2) is an isomorphism. To see this we first note that $\mathrm{Spec}(K_1 \otimes \cdots \otimes K_m) = \prod_{i=1}^m \mathrm{Spec}(K_i)$, so the scheme is zero dimensional (and the spectrum of a product of fields). Each maximal ideal in the tensor product is the kernel of some map $\varphi : K_1 \otimes \cdots \otimes K_m \to \overline{K_0}$. Two such maps have the same kernel if and only if they differ by an automorphism of $\overline{K_0}$. This explains the bijection between maximal ideals of the tensor product and $(\prod_i \mathrm{Hom}(K_i, \overline{K_0}))/\sim$. Also, using the composition

\[ K_i \to K_1 \otimes \cdots \otimes K_m \to \overline{K_0} \]

we see that any $\varphi$ induces $\psi_i : K_i \to \overline{K_0}$ and we witness $\varphi$ as having the special form

\[ \varphi(a_1 \otimes \cdots \otimes a_m) = \psi_1(a_1) \cdots \psi_m(a_m). \]

^4 See [Neu99].

2.4. Embeddings vs Choices of Roots. Let $K/K_0$ be a finite field extension. Write this as a primitive extension with $K = K_0(\alpha)$ and let $f(x)$ be the minimal polynomial of $\alpha$. Using this notation we can write down a bijection

\[ \mathrm{Hom}_{K_0}(K, \overline{K_0}) \xrightarrow{\ \sim\ } \{\beta \in \overline{K_0} : f(\beta) = 0\}, \qquad \psi_0 \mapsto \psi_0(\alpha). \]

Now, let $\Phi \subset \prod_{i=1}^m \mathrm{Hom}_{K_0}(K_i, \overline{K_0})$ be a complete system of representatives for the equivalence relation $\sim$. Let $K_i = K_0(\alpha_i)$ where $\alpha_i$ has minimal polynomial $f_i(x)$. We can modify any $\psi = (\psi_1, \ldots, \psi_m)$ by some $\sigma \in G(\overline{K_0}/K_0)$ with $\sigma\psi_1 = \mathrm{id}_{K_1}$ so that

\[ (\psi_1, \psi_2, \ldots, \psi_m) \sim (\mathrm{id}_{K_1}, \psi_2', \ldots, \psi_m'). \]

Such choices of $\psi$ will be called normalized (for $K_1$) and a collection of embeddings $\Phi$ will be called normalized if each element is normalized. Note that normalized $\Phi$ are in bijection with tuples of roots of the corresponding minimal polynomials:

\[ \Phi \xrightarrow{\ \sim\ } \{\vec{\alpha}' = (\alpha_2', \ldots, \alpha_m') : f_2(\alpha_2') = 0, \ldots, f_m(\alpha_m') = 0\}, \]
\[ (\mathrm{id}_{K_1}, \psi_2, \ldots, \psi_m) = (\psi_1, \psi_2, \ldots, \psi_m) \mapsto (\psi_2(\alpha_2), \ldots, \psi_m(\alpha_m)). \]

We record that the inverse map is given by

\[ \vec{\alpha}' \mapsto \psi_{\vec{\alpha}'}, \]

where the components of $\psi_{\vec{\alpha}'}$ are the field embeddings uniquely determined by where they send the specified primitive element. We will make use of this correspondence frequently.

2.5. Notation for Quotients. For a quotient of a polynomial ring $R[x_1, \ldots, x_n]/I$ we will sometimes use the notation $\bar{x}_1, \ldots, \bar{x}_n$ to denote the images of $x_1, \ldots, x_n$ in the quotient.

2.6. Decomposition Comparisons. Given fields $K_i = K_0(\alpha_i)$ with minimal polynomials $f_i(x)$ for $i = 1, \ldots, m$ and $\Phi$ a $K_1$-normalized system of embeddings, we are interested in a description of the isomorphism (2.2) under the image of the base-change functor $\overline{K_0} \otimes_{K_1} -$. This description will be used later in relating the tensor product of rings of integers to the ring of integers of tensor products.

First, we observe that $K_1 \otimes \cdots \otimes K_m \cong K_1[x_2, \ldots, x_m]/(f_2, \ldots, f_m)$. This gives

\[ \overline{K_0} \otimes_{K_1} (K_1 \otimes \cdots \otimes K_m) \cong \overline{K_0} \otimes_{K_1} K_1[x_2, \ldots, x_m]/(f_2, \ldots, f_m) \]
\[ \cong \overline{K_0}[x_2, \ldots, x_m]/(f_2, \ldots, f_m) \]
\[ \cong \bigoplus_{\vec{\alpha}'} \overline{K_0}[x_2, \ldots, x_m]/(x_2 - \alpha_2', \ldots, x_m - \alpha_m'). \]

Hence the isomorphism

\[ \overline{K_0} \otimes_{K_1} \varphi : \bigoplus_{\vec{\alpha}'} \overline{K_0}[x_2, \ldots, x_m]/(x_2 - \alpha_2', \ldots, x_m - \alpha_m') \to \bigoplus_{\psi \in \Phi} \overline{K_0} \otimes_{K_1} L_\psi \]

is now seen to be given by

\[ (f(\bar{x}_2, \ldots, \bar{x}_m))_{\vec{\alpha}'} \mapsto (f(\alpha_2', \ldots, \alpha_m'))_{\psi_{\vec{\alpha}'}}. \]

The point here is that base changing to the algebraic closure splits fields and this allows us to work with roots of polynomials.

2.7. Idempotents and Differents. Let $K_1, \ldots, K_m$ be finite extensions of a field $K_0$ with $K_i = K_0(\alpha_i)$ and minimal polynomials $f_i$. If $K_1$ contains the Galois closures of $K_2, \ldots, K_m$ then the idempotents of $K_1 \otimes \cdots \otimes K_m$ have the form

\[ g_{j_2, \ldots, j_m} = \prod_{i=2}^m \frac{f_i(\bar{x}_i)}{(\bar{x}_i - \alpha_{i,j_i})} \frac{1}{f_i'(\alpha_{i,j_i})}. \tag{2.3} \]

Here $K_1 \otimes \cdots \otimes K_m = K_1[x_2, \ldots, x_m]/(f_2, \ldots, f_m)$ and

\[ f_i(x) = (x - \alpha_{i,1})(x - \alpha_{i,2}) \cdots (x - \alpha_{i,n_i}). \]

Alternatively, we can fix some $\psi := (\psi_1, \ldots, \psi_m)$, a tuple of embeddings $\psi_i : K_i \to \overline{K_0}$, and write

\[ g_\psi = \prod_{i=2}^m \frac{f_i(\bar{x}_i)}{(\bar{x}_i - \psi_i(\alpha_i))} \frac{1}{f_i'(\psi_i(\alpha_i))}. \]

Proof. We know that

\[ K_1[x_2, \ldots, x_m]/(f_2, \ldots, f_m) = \bigoplus_{\vec{\alpha}' = (\alpha_2', \ldots, \alpha_m')} K_1[x_2, \ldots, x_m]/(x_2 - \alpha_2', \ldots, x_m - \alpha_m'), \]

so finding the idempotents in this decomposition is the same as solving for $g_{\vec{\alpha}'}$ such that

\[ g_{\vec{\alpha}'} \equiv 1 \pmod{(x_2 - \alpha_2', \ldots, x_m - \alpha_m')}, \]
\[ g_{\vec{\alpha}'} \equiv 0 \pmod{(x_2 - \beta_2', \ldots, x_m - \beta_m')}, \qquad \vec{\beta}' \neq \vec{\alpha}'. \]

Since $f_i(x)/(x - \alpha_i') \to f_i'(\alpha_i')$ as $x \to \alpha_i'$ by L'Hôpital's rule (which by universality of the computation holds algebraically), the element

\[ \tilde{g}_i(x) = \frac{f_i(x)}{(x - \alpha_i')} \frac{1}{f_i'(\alpha_i')} \]

has $\tilde{g}_i(\alpha_i') = 1$ and $\tilde{g}_i(\beta_i') = 0$ for $\beta_i' \neq \alpha_i'$. To obtain our result we just take the product of the $\tilde{g}_i$ as in the statement of the result. □

The relation between idempotents and differents now appears clear via formulas (2.1) in §2.2 and (2.3).

2.8. Rings of Integers of Tensor Products vs Tensor Products of Rings of Integers [Moc15b, Theorem 1.1]. We now give the comparison of $T = \bigotimes_{i=1}^m \mathcal{O}_{K_i}$ and $\mathcal{O}_L$. Here $\mathcal{O}_L = \bigoplus_{\psi \in \Phi} \mathcal{O}_{L_\psi}$ where $L = K_1 \otimes \cdots \otimes K_m = \bigoplus_{\psi \in \Phi} L_\psi$. We remind ourselves that $\mathcal{O}_L$ is a $T$-algebra. Here $\varphi : T \to \mathcal{O}_L$ is given by (extending linearly)

\[ \varphi(a_1 \otimes \cdots \otimes a_m) = (\psi_1(a_1) \cdots \psi_m(a_m))_{\psi \in \Phi}. \]

For future reference we will let $\varphi_\psi$ denote the component of $\varphi$ in the $\psi$th factor. Explicitly, $\varphi_\psi(a_1 \otimes \cdots \otimes a_m) = \psi_1(a_1) \cdots \psi_m(a_m)$ if $\psi = (\psi_1, \ldots, \psi_m)$.

Theorem 2.8.1. Let $K_1, \ldots, K_m$ be finite extensions of $\mathbb{Q}_p$ sitting in a fixed algebraic closure. Let $T = \mathcal{O}_{K_1} \otimes \cdots \otimes \mathcal{O}_{K_m}$. Let $L = \kappa(T) = K_1 \otimes \cdots \otimes K_m$. Let $k_i$ for $i = 1, \ldots, m$ denote the respective residue fields of $K_i$. If $\beta = 1 \otimes f_2'(\alpha_2) \otimes \cdots \otimes f_m'(\alpha_m)$ where $\mathcal{O}_{K_i} = W(k_i)[\alpha_i]$ with Eisenstein polynomial $f_i(x) \in W(k_i)[x]$ then

\[ \beta \in (T :_L \mathcal{O}_L). \]

That is, $\beta \cdot \mathcal{O}_L \subset T$.

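Proof. In what follows we let $\overline{\mathbb{Z}}_p$ denote the integral closure of $\mathbb{Z}_p$ in $\overline{\mathbb{Q}}_p$. In view of faithful flatness [AM69, Chapter 3, exercises 16, 17] it is enough to show

\[ \overline{\mathbb{Z}}_p \otimes_{\mathcal{O}_{K_1}} (\beta \cdot \mathcal{O}_L) \subset \overline{\mathbb{Z}}_p \otimes_{\mathcal{O}_{K_1}} T. \]

We will use the notation $\overline{\mathcal{O}}_L := \overline{\mathbb{Z}}_p \otimes_{\mathcal{O}_{K_1}} \mathcal{O}_L$ and $\overline{T} := \overline{\mathbb{Z}}_p \otimes_{\mathcal{O}_{K_1}} T$. Using our embedding decomposition we have $\overline{\mathbb{Z}}_p \otimes_{\mathcal{O}_{K_1}} (\beta \mathcal{O}_L) = \overline{\beta} \cdot \overline{\mathcal{O}}_L$ where $\overline{\beta} = \sum_{\psi \in \Phi} \varphi_\psi(\beta) g_\psi$. Here we note that

\[ \varphi_\psi(\beta) = \varphi_\psi(1 \otimes f_2'(\alpha_2) \otimes \cdots \otimes f_m'(\alpha_m)) = f_2'(\psi_2(\alpha_2)) \cdots f_m'(\psi_m(\alpha_m)), \]

for $\psi = (\psi_1, \ldots, \psi_m) \in \Phi$ (here we take $\Phi$ to be $K_1$-normalized).

We now use that the idempotents are given by

\[ g_\psi = \prod_{i=2}^m \frac{f_i(\bar{x}_i)}{(\bar{x}_i - \psi_i(\alpha_i))} \frac{1}{f_i'(\psi_i(\alpha_i))} \in \frac{1}{\varphi_\psi(\beta)} \overline{\mathbb{Z}}_p[\bar{x}_2, \ldots, \bar{x}_m] = \frac{1}{\varphi_\psi(\beta)} \overline{T}. \]

Now we just check: if $x \in \overline{\mathcal{O}}_L$ it has the form $x = \sum_{\psi \in \Phi} x_\psi g_\psi$ for some $x_\psi \in \overline{\mathbb{Z}}_p$. We have

\[ \overline{\beta} \cdot x = \Big( \sum_{\psi \in \Phi} \varphi_\psi(\beta) g_\psi \Big) \Big( \sum_{\xi \in \Phi} x_\xi g_\xi \Big) = \sum_{\psi} \varphi_\psi(\beta) x_\psi g_\psi \in \overline{T}. \]

The second equality follows from orthogonality of idempotents and the last membership statement follows from the fact that $\varphi_\psi(\beta) g_\psi \in \overline{T}$. □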

Remark 2.8.2. The proof of Theorem 2.8.1 has nothing to do with $K_1$. We can choose some $K_i$ which makes the inclusion tightest.
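The idempotents (2.3) are, factor by factor, the classical Lagrange basis polynomials $f(x)/((x - \alpha_j) f'(\alpha_j))$. The sketch below checks the two defining properties in a toy situation: a single factor whose polynomial splits with distinct rational roots. That splitting assumption, the sample roots, and the function name are illustrative choices, not part of the text.

```python
from fractions import Fraction

def idempotent(roots, j):
    """Return g_j as a function x -> f(x) / ((x - roots[j]) * f'(roots[j])),
    where f(x) = prod (x - r) over the distinct roots r (toy split case)."""
    others = [r for k, r in enumerate(roots) if k != j]
    deriv = Fraction(1)
    for r in others:              # f'(alpha_j) = prod_{k != j} (alpha_j - alpha_k)
        deriv *= roots[j] - r

    def g(x):
        value = Fraction(1)
        for r in others:          # f(x)/(x - alpha_j) = prod_{k != j} (x - alpha_k)
            value *= x - r
        return value / deriv

    return g

roots = [Fraction(1), Fraction(2), Fraction(4)]
gs = [idempotent(roots, j) for j in range(len(roots))]

# g_j is 1 at its own root and 0 at the others, i.e. it is the idempotent
# of the j-th factor in Q[x]/(f) = Q x Q x Q.
table = [[g(r) for r in roots] for g in gs]
```

Because each $g_j$ takes only the values 0 and 1 on the roots, $g_j^2 \equiv g_j$ and $g_j g_k \equiv 0$ modulo $f$, and the $g_j$ sum to 1; the denominators $f'(\alpha_j)$ are exactly what the element $\beta$ of Theorem 2.8.1 is designed to clear.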

3. Conductors, Minimal Discriminants, and Ramification
This section contains definitions and facts about bad reduction, minimal discriminants, and Galois theory necessary for our applications. Readers just interested in Probabilistic Szpiro or Baby Szpiro (§7) may skip the last two subsections and proceed directly to §6. For a quick reading, readers may wish to skip to §6 and come back to this section as needed in the course of reading §7 or §8.

3.1. Inertia/Decomposition Sequences. Recall that for a finite extension $K$ of $\mathbb{Q}_p$ we have an extension of topological groups

\[ 1 \to I_K \to G_K \to G_k \to 1, \]

where $I_K$ is the inertia group and $k$ is the residue field. If $L$ is a global field, $v \in V(L)$ is non-archimedean, and $\bar{v} \mid v$ is a place of $\overline{L}$, we have

\[ 1 \to I(\bar{v}/v) \to D(\bar{v}/v) \to G(\kappa(\bar{v})/\kappa(v)) \to 1 \]

where $D(\bar{v}/v) = \mathrm{Stab}_{G_L}(\bar{v}) \cong G_{L_v}$ is the decomposition group.

3.2. Unramified and Ramified Representations. Let $L$ be a finite extension of $\mathbb{Q}_p$. If $X$ is an object in a category, a representation $\rho : G_L \to \mathrm{Aut}(X)$ is unramified if and only if $\rho(I_L) = 1$. We may speak of $X$ being unramified, where the representation is understood (usually torsion points of an elliptic curve).

Let $L$ be a global field. Given $\rho : G_L \to \mathrm{Aut}(X)$ we say that $\rho$ is unramified at $v$ if and only if $\rho|_{G_v}$ is unramified. In either of these cases, if a representation is not unramified it is called ramified.

3.3. Good and Bad Reduction. Let $K$ be a finite extension of $\mathbb{Q}_p$. Let $R = \mathcal{O}_K$ be its ring of integers and let $k$ be its residue field. Let $A_K$ be an abelian variety over $K$. We recall that $A_K$ has good reduction if and only if there exists an abelian scheme $\mathcal{A}$ over $R$ whose generic fiber is isomorphic to $A_K$. This is equivalent to the special fiber of the Néron model being an abelian variety.

Given an abelian variety $A$ over a global field $L$ we say that $A_L$ has good reduction at $v$ if and only if $A_{L_v}$ does.

3.4. Division Fields and Galois Representations. Let $A$ be an abelian variety over a field $L$. Let $m$ be an integer. We will abuse notation and let $A[m]$ denote both the group scheme of $m$-torsion points and the $G_L$-module given by taking $\overline{L}$-points of this group scheme.

Assume now that $L$ is a number field. We will let $L_l = L(A[l])$. We remark that $L_l$ may be defined by literally adjoining the coordinates of torsion points in some model and that this field extension is independent of the model. If we fix an algebraic closure $\overline{L}$ we also have $L_l \cong \overline{L}^{\ker \rho_l}$. We also note that $G(L_l/L) \cong \mathrm{im}(\rho_l : G_L \to \mathrm{Aut}(A[l]))$.

Consider now the Tate module $T_l A = \varprojlim A[l^n]$ in the category of Galois modules. We let $\rho_{l^\infty} : G_L \to \mathrm{Aut}(T_l A)$ denote the action in the underlying representation. When it is necessary to specify the abelian variety we use $\rho_{l^\infty, A}$. Serre's surjectivity theorem says that for an elliptic curve without complex multiplication, $\rho_l$ is surjective for all but finitely many $l$. This implies that for $l$ sufficiently large^5

\[ \mathrm{im}(\rho_l) \cong \mathrm{GL}_2(\mathbb{F}_l). \]

We make this remark because the initial theta data hypotheses of [DH20b, §5] require $\rho_l(G_F) \supset \mathrm{SL}_2(\mathbb{F}_l)$; Serre's Conjecture says this is generically true.

3.5. Minimal Discriminants and Tate Parameters. We suppose that $E$ is an elliptic curve over a number field $F$ sitting in initial theta data. We will assume that it is semi-stable (all bad places are places with multiplicative reduction). Note that if it is not semi-stable one can make a finite change of base such that all places of the new field above a place of additive reduction in the old field are places of good reduction. Under any base change, places in the new field over places of multiplicative reduction in the old field still have multiplicative reduction (hence the word semi-stable).

In the case that $E$ is an elliptic curve over $L$, a finite extension of $\mathbb{Q}_p$, by [Sil13, Ch V, Lemma 5.1] if $|j_E|_p > 1$ (which is equivalent to $E$ having multiplicative reduction) there exists a Tate parameter $q = q_E \in L$ and an isomorphism of elliptic curves $u : E \to E_q$ defined over $L$. Here $E_q$ is the Tate curve which admits a Tate uniformization. Note that this implies that all elliptic curves without potential good reduction have a unique Tate parameter at bad places. In fact: $q_E \in \mathbb{Q}_p(j_E)$ if $L$ is a finite extension of $\mathbb{Q}_p$.

The following describes the relationship between the minimal discriminant and the Tate parameter.

^5 It is conjectured by Serre that for every number field $L$ there exists some $l_{\max}$ such that for every elliptic curve and all $l \ge l_{\max}$ we have $\mathrm{im}(\rho_{E,l}) = \mathrm{GL}_2(\mathbb{F}_l)$. In the case $L = \mathbb{Q}$ it is further conjectured that $l_{\max} = 37$.

Lemma 3.5.1. If $E$ is an elliptic curve over $L$, a complete discretely valued field with valuation $v$, then
(1) If $E$ has multiplicative reduction then $\mathrm{ord}_v(\Delta^{\min}) = \mathrm{ord}_v(q_E)$, where $\Delta^{\min}$ is the minimal discriminant of $E/L$.
(2) All Tate curves $E_q$ are minimal Weierstrass models.

Proof. The proof of the first assertion follows from Ogg's Formula. This formula states

\[ c = \mathrm{ord}(\Delta^{\min}) + 1 - m. \]

Here $c$ is the local conductor exponent, $\Delta^{\min}$ is the minimal discriminant and $m$ is the number of irreducible components of the special fiber of the Néron model $\mathcal{E}$ of $E$ for $R = \mathcal{O}_L$. Since our elliptic curve has multiplicative reduction this implies $c = 1$, which implies $m = \mathrm{ord}_v(\Delta^{\min})$. Now we have

\[ \mathcal{E}_s/\mathcal{E}_s^0 = E_q(L)/(E_q)_0(L) \cong (L^\times/q^{\mathbb{Z}})/(R^\times/q^{\mathbb{Z}}) \xrightarrow[\ \sim\ ]{\ \mathrm{ord}_v\ } \mathbb{Z}/\mathrm{ord}_v(q). \]

The first equality is the Kodaira-Néron Theorem, where $\mathcal{E}_s$ denotes the special fiber of the Néron model and the superscript zero denotes the connected component of the identity. Also $(E_q)_0(L)$ is the kernel of specialization. The second equality follows from Tate uniformization and the last equality follows from taking valuations. From this the equality follows (see [Sil09, Appendix C]).

We now prove $\Delta_{E_q}$ is minimal. We know that

\[ \Delta_{E_q} = q \prod_{n \ge 1} (1 - q^n)^{24}. \]

This shows $\mathrm{ord}_v(\Delta_{E_q}) = \mathrm{ord}_v(q)$, and since $\mathrm{ord}_v(q) = \mathrm{ord}_v(\Delta^{\min}_{E_q})$ from the first assertion of the lemma we are done. □
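The step $\mathrm{ord}_v(\Delta_{E_q}) = \mathrm{ord}_v(q)$ in the proof above rests on the fact that $q \prod_{n \ge 1}(1 - q^n)^{24}$ is a power series in $q$ with integer coefficients and leading term $q$, so when $\mathrm{ord}_v(q) > 0$ every later term has strictly larger valuation. The following sketch expands the product as a truncated formal power series; the function name and truncation length are illustrative choices.

```python
def discriminant_series(num_terms):
    """Coefficients [c_0, ..., c_{num_terms-1}] of q * prod_{n>=1} (1 - q^n)^24,
    truncated after the q^(num_terms-1) term."""
    series = [0] * num_terms
    series[0] = 1                                  # start from the constant series 1
    for n in range(1, num_terms):                  # factors with n >= num_terms do not matter
        for _ in range(24):                        # multiply by (1 - q^n), 24 times
            for k in range(num_terms - 1, n - 1, -1):
                series[k] -= series[k - n]         # descending k keeps old values intact
    return [0] + series[:num_terms - 1]            # final multiplication by q

coeffs = discriminant_series(6)
```

The output begins `0, 1, -24, 252, ...`: the coefficient of $q$ is 1 and all other coefficients are integers, which is all the valuation argument needs.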

3.6. Minimal Discriminants and Base Change. The following describes how minimal discriminants behave under base change.

Lemma 3.6.1. Let $K/F$ be a finite extension of number fields. If $E/F$ is a semi-stable elliptic curve then

\[ [K:F] \ln|\Delta^{\min}_{E/F}| = \ln|\Delta^{\min}_{E_K/K}|. \]

Proof. The proof is a computation:

\[ \ln|\Delta^{\min}_{E_K/K}| = \sum_{w \in V(K)} \mathrm{ord}_w(\Delta^{\min}_{E_K/K}) f(w/p_w) \ln(p_w) \]
\[ = \sum_{w \in V(K)} \mathrm{ord}_w(q_w) f(w/p_w) \ln(p_w) \]
\[ = \sum_{v \in V(F)} \sum_{w \in V(K)_v} e(w/v)\, \mathrm{ord}_v(q_v) f(w/v) f(v/p_v) \ln(p_v) \]
\[ = \sum_{v \in V(F)} \Big( \sum_{w \mid v} [K_w : F_v] \Big) \mathrm{ord}_v(q_v) f(v/p_v) \ln(p_v) \]
\[ = [K:F] \sum_{v \in V(F)} \mathrm{ord}_v(q_v) f(v/p_v) \ln(p_v) = [K:F] \ln|\Delta^{\min}_{E/F}|. \]
□

3.7. Normalized Arakelov Degrees. For a number field $L$ and an Arakelov divisor $D \in \widehat{\mathrm{Div}}(L)$ the normalized Arakelov degree is defined by

\[ \underline{\widehat{\deg}}_L(D) = \frac{\widehat{\deg}_L(D)}{[L:\mathbb{Q}]}. \]

For $v \in V(L)_0$ with $[v] \in \widehat{\mathrm{Div}}(L)$, degrees are normalized so that $\widehat{\deg}([v]) = \ln|Nv| = f_v \ln(p_v)$ where $p_v$ is the characteristic of $\kappa(v)$ and $f_v$ is the inertia degree. We use the property that the normalized degree is invariant under pullback: if $f : V(L) \to V(L_0)$ is the natural map associated to an extension of number fields $L_0 \subset L$ and $D \in \widehat{\mathrm{Div}}(L_0)$ then

\[ \underline{\widehat{\deg}}_{L_0}(D) = \underline{\widehat{\deg}}_L(f^*D). \]

We record that $f^*[v_0] = \sum_{v \mid v_0} e(v/v_0)[v]$.
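Invariance of the normalized degree under pullback comes down to the identity $\sum_{v \mid v_0} e(v/v_0) f(v/v_0) = [L : L_0]$. The sketch below illustrates this for non-archimedean places only, recording a degree as a dictionary mapping each prime $p$ to the coefficient of $\ln(p)$. The data layout (tuples of coefficient, inertia degree over $p$, and $p$) and the function names are assumptions made for the example.

```python
from fractions import Fraction

def normalized_degree(divisor, field_degree):
    """divisor: list of (coefficient, f_v, p_v) with f_v the inertia degree over p_v.
    field_degree: [L : Q]. Returns {p: coefficient of ln(p)} for the normalized degree."""
    out = {}
    for coeff, f_v, p_v in divisor:
        out[p_v] = out.get(p_v, 0) + Fraction(coeff * f_v, field_degree)
    return out

def pullback(place, places_above):
    """f^*[v0] = sum_{v | v0} e(v/v0) [v].
    place = (coefficient, f(v0/p), p); places_above = list of (e(v/v0), f(v/v0))."""
    coeff0, f0, p = place
    return [(coeff0 * e, f * f0, p) for e, f in places_above]

# L0 = Q and L = Q(i): 5 splits, 3 is inert, 2 ramifies.
splitting = {5: [(1, 1), (1, 1)], 3: [(1, 2)], 2: [(2, 1)]}
for p, above in splitting.items():
    v0 = (1, 1, p)
    assert normalized_degree([v0], 1) == normalized_degree(pullback(v0, above), 2)
```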

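3.8. q and Theta pilots. Fix initial theta data $(\overline{F}/F, E_F, l, M, V, V^{\mathrm{bad}}_{\mathrm{mod}}, \epsilon)$. Furthermore, suppose it is built from the field of moduli so that $V^{\mathrm{bad}}_{\mathrm{mod}} \subset V(F_0)$ contains all the semi-stable places of bad reduction.

Definition 3.8.1. The q-pilot divisor of this data is then

\[ P_q = \sum_{v \in V^{\mathrm{bad}}_{\mathrm{mod}}} \mathrm{ord}_v(q_v^{1/2l})[v] \in \widehat{\mathrm{Div}}(F_0)_{\mathbb{Q}}. \tag{3.1} \]

The q-pilot is related to the minimal discriminant of an elliptic curve by the following formula:

\[ \underline{\widehat{\deg}}_{F_0}(P_q) = \frac{1}{2l} \frac{\ln|\Delta^{\min}_{E/F}|}{[F:\mathbb{Q}]}. \tag{3.2} \]

To see this we perform a simple computation:

\[ \underline{\widehat{\deg}}_{F_0}\Big( \sum_{v \in V^{\mathrm{bad}}_{\mathrm{mod}}} \mathrm{ord}_v(q_v)[v] \Big) = \underline{\widehat{\deg}}_K\Big( \sum_{v \in V^{\mathrm{bad}}_{\mathrm{mod}}} \mathrm{ord}_v(q_v) \sum_{w \in V(K)_v} e(w/v)[w] \Big) \]
\[ = \underline{\widehat{\deg}}_K\Big( \sum_{v \in V^{\mathrm{bad}}_{\mathrm{mod}}} \sum_{w \in V(K)_v} \mathrm{ord}_w(q_v)[w] \Big) \]
\[ = \underline{\widehat{\deg}}_K\Big( \sum_{w \text{ bad}} \mathrm{ord}_w(q_w)[w] \Big) \]
\[ = \underline{\widehat{\deg}}_K\Big( \sum_{w \text{ bad}} \mathrm{ord}_w(\Delta^{\min}_{E_K/K})[w] \Big) \]
\[ = \frac{\ln|\Delta^{\min}_{E_K/K}|}{[K:\mathbb{Q}]} = \frac{\ln|\Delta^{\min}_{E/F}|}{[F:\mathbb{Q}]}. \]

We now discuss Theta pilots.

Definition 3.8.2. The theta pilot divisor is a tuple $P_\Theta = (P_{\Theta,j})_{j=1}^{(l-1)/2} \in \widehat{\mathrm{Div}}_{\mathrm{lgp}}(F_0)_{\mathbb{Q}}^{(l-1)/2}$ where

\[ P_{\Theta,j} = \sum_{v \in V(F_0)^{\mathrm{bad}}} \mathrm{ord}_v(q_v^{j^2/2l})[v] \in \widehat{\mathrm{Div}}(F_0)_{\mathbb{Q}}. \]

The relationship between the theta and q-pilots is given by

\[ \underline{\widehat{\deg}}_{\mathrm{lgp},F_0}(P_\Theta) = \frac{l(l+1)}{12}\, \underline{\widehat{\deg}}_{F_0}(P_q). \tag{3.3} \]

This formula is derived by a simple computation:

\[ \underline{\widehat{\deg}}_{\mathrm{lgp},F_0}(P_\Theta) = \frac{2}{l-1} \sum_{j=1}^{(l-1)/2} \underline{\widehat{\deg}}_{F_0}(P_{\Theta,j}) = \frac{2}{l-1} \sum_{j=1}^{(l-1)/2} j^2\, \underline{\widehat{\deg}}_{F_0}(P_q) = \frac{l(l+1)}{12}\, \underline{\widehat{\deg}}_{F_0}(P_q). \]

Remark 3.8.3.
(1) The assertion of [SS17, pg 10] is that (3.3) is the only relation between the q-pilot and Θ-pilot degrees. The assertion of [Moc18, C14] is that [SS17, pg 10] is not what occurs in [Moc15a]. The reasoning of [SS17, pg 10] is something like what follows:
(a) The $\Theta^{\times\mu}_{\mathrm{LGP}}$-link in [Moc15a] is a polyisomorphism between $\mathcal{F}^{\blacktriangleright\times\mu}$-strips, ${}^{0,0}\mathcal{F}^{\blacktriangleright\times\mu}_{\mathrm{LGP}}$ and ${}^{1,0}\mathcal{F}^{\blacktriangleright\times\mu}_{\Delta}$.
(b) Within these objects there are two global realified Frobenioids ${}^{0,0}\mathcal{C}_{\mathrm{LGP}}$ and ${}^{1,0}\mathcal{C}_{\Delta}$. Also there exist objects ${}^{0,0}P_\Theta \in {}^{0,0}\mathcal{C}_{\mathrm{LGP}}$ and ${}^{1,0}P_q \in {}^{1,0}\mathcal{C}_{\Delta}$ called the (0,0) theta pilot object and (1,0) q pilot object respectively, and the theta link $\Theta^{\times\mu}_{\mathrm{LGP}}$ is such that $\Theta^{\times\mu}_{\mathrm{LGP}}({}^{0,0}P_\Theta) = {}^{1,0}P_q$.
(c) To each such global realified Frobenioid $\mathcal{C}$ we can interpret a one dimensional real vector space $\mathrm{Pic}(\mathcal{C})$. Also, to any object $P \in \mathcal{C}$ there is an associated degree $\deg_{\mathcal{C}}(P) \in \mathrm{Pic}(\mathcal{C})$.
(d) Any isomorphism between ${}^{0,0}\mathcal{C}_{\mathrm{LGP}}$ and ${}^{1,0}\mathcal{C}_{\Delta}$ induces an isomorphism between $\mathrm{Pic}({}^{0,0}\mathcal{C}_{\mathrm{LGP}})$ and $\mathrm{Pic}({}^{1,0}\mathcal{C}_{\Delta})$.
(e) An identification one can make is to fix isomorphisms
\[ \alpha : \mathrm{Pic}({}^{0,0}\mathcal{C}_{\mathrm{LGP}}) \to \mathbb{R}, \qquad \beta : \mathrm{Pic}({}^{1,0}\mathcal{C}_{\Delta}) \to \mathbb{R} \]
specified by extending linearly
\[ \alpha(\deg_{{}^{0,0}\mathcal{C}_{\mathrm{LGP}}}({}^{0,0}P_\Theta)) = \underline{\widehat{\deg}}_{\mathrm{lgp}}(P_\Theta), \qquad \beta(\deg_{{}^{1,0}\mathcal{C}_{\Delta}}({}^{1,0}P_q)) = \underline{\widehat{\deg}}(P_q), \]
where the degrees $\underline{\widehat{\deg}}_{\mathrm{lgp}}$ and $\underline{\widehat{\deg}}$ are as in the present subsection ([SS17, 2.1.6] calls this the canonical trivialization).
(f) The authors of the present article, Scholze-Stix, and Mochizuki all agree that the items above lead to a contradiction. Stripping away the abstraction, these assertions are tautologically equivalent to $\underline{\widehat{\deg}}_{\mathrm{lgp},F_0}(P_\Theta) = (l(l+1)/12) \cdot \underline{\widehat{\deg}}_{F_0}(P_q)$ and $\underline{\widehat{\deg}}_{\mathrm{lgp},F_0}(P_\Theta) = \underline{\widehat{\deg}}_{F_0}(P_q)$. This clearly gives a contradiction.
(g) It is our understanding that no such map $\alpha$ is specified in IUT, meaning that commutativity of the diagram consisting of the map induced by $\Theta^{\times\mu}_{\mathrm{LGP}}$, $\alpha$, and $\beta$ is not asserted.
(2) We would like to point out that the diagram on page 10 of [SS17] is very similar to the diagram on §8.4 part 7, page 76 of the unpublished manuscript [Tan18] which Scholze and Stix were reading while preparing [SS17].
(3) As of August 1st 2019, the documents above can be found at http://www.kurims.kyoto-u.ac.jp/~motizuki/IUTch-discussions-2018-03.html. We note that there is also the review [Rob 3] which some may find interesting.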
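The coefficient $l(l+1)/12$ in (3.3) is just the average of $j^2$ over $j = 1, \ldots, (l-1)/2$, taken with the normalization factor $2/(l-1)$. A quick exact check, with a hypothetical function name, is sketched below for a few odd primes $l$.

```python
from fractions import Fraction

def theta_to_q_ratio(l):
    """(2/(l-1)) * sum_{j=1}^{(l-1)/2} j^2, the factor relating the two degrees in (3.3)."""
    half = (l - 1) // 2
    return Fraction(2, l - 1) * sum(j * j for j in range(1, half + 1))

ratios = {l: theta_to_q_ratio(l) for l in (5, 7, 11, 13)}
```

Since $\sum_{j=1}^{n} j^2 = n(n+1)(2n+1)/6$ with $n = (l-1)/2$, the ratio simplifies to $l(l+1)/12$, which is strictly larger than 1 for every $l \ge 5$; this is why the two identities in item (f) of the remark cannot hold simultaneously for a nonzero degree.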

3.9. Néron-Ogg-Shafarevich: Conductors and Good Reduction. The following theorem of Serre and Tate, which they call the Néron-Ogg-Shafarevich Criterion, tells us how ramification of an l-power Tate module is related to the reduction geometry of the Néron model of the corresponding abelian variety.

Theorem 3.9.1 ([ST68]). Let $A$ be an abelian variety over a local field $L$ of residue characteristic $p$. The following are equivalent.
(1) For all $m \in \mathbb{N}$ with $(m,p) = 1$, $A[m]$ is unramified.
(2) There exists a rational prime $l$ such that $l \neq p$ and $T_l(A)$ is unramified.
(3) There exist infinitely many $m$ with $(m,p) = 1$ such that $A[m]$ is unramified.
(4) $A$ has good reduction.

We apply this in subsequent sections to get information about the behavior of ramification degrees in our computations. We can apply this theorem to get a criterion relating the conductor of an abelian variety to the discriminant of an associated l-division field.

Theorem 3.9.2. Let A be an abelian variety over a number field L. Let l be a rational prime.

Let $L_l = L(A[l])$. Let $w$ be a non-archimedean place of $L_l$ coprime to $l$ and $\mathrm{char}(\kappa(w)) = p$. The following holds:
\[
e(w/p) > 1 \iff w \mid l \ \text{ or } \ w \mid \mathrm{Cond}(A/L) \ \text{ or } \ w \mid \mathrm{Diff}(L/\mathbb{Q}).
\]

Proof. Suppose that $e(w/p) > 1$. Since $e(w/p) = e(w/v)e(v/p)$, where $v \in V(L)$ is the image of $w \in V(L_l)$ under the natural map $V(L_l) \to V(L)$, we must have $e(w/v) > 1$ or $e(v/p) > 1$. If $e(v/p) > 1$ then $v \mid \mathrm{Diff}(L/\mathbb{Q})$, which implies $w \mid \mathrm{Diff}(L/\mathbb{Q})$. If $e(w/v) > 1$ then $I_{w/v} \neq 1$ since $\#I_{w/v} = e(w/v)$. Since $\rho_l : G(L_l/L) \to \mathrm{Aut}(A[l])$ is injective and $T_l$ has $A[l]$ as a quotient, we know that $\rho_l(I_{w/v}) \neq 1$ and hence $T_l A$ is ramified. This implies $v \mid \mathrm{Cond}(A/L)$. The final option is $w \mid l$.

Conversely suppose $w \mid l$ or $w \mid \mathrm{Cond}(A/L)$ or $w \mid \mathrm{Diff}(L/\mathbb{Q})$. If $w \mid l$ then since $L_l \supset \mathbb{Q}(\zeta_l)$ we have $e(w/l) > l - 1$. If $w \mid \mathrm{Diff}(L/\mathbb{Q})$ then by definition $e(v/p) > 1$. If $w \mid \mathrm{Cond}(A/L)$ then $v \mid \mathrm{Cond}(A/L)$ since $L_l/L$ is Galois. We know that
\[
v \mid \mathrm{Cond}(A/L) \iff c_v \neq 0 \iff v \text{ is ramified} \iff I_{w/v} \neq 1.
\]
This proves the result. Above, $c_v = \mathrm{ord}_v(\mathrm{Cond}(A/L))$.

4. Estimates on p-adic Logarithms

The material in this section is applied in §6 in order to obtain a bound on the smallest polydisc containing tensor products of log-shells. In what follows we will let $\log$ denote the p-adic logarithm, $\ln$ denote the real valued natural logarithm, and $\log_p$ denote the real valued base $p$ logarithm. We refer the reader to [Rob00] for a quick review of elementary properties of the p-adic logarithm. See also [DH20a, §2].

4.1. Notation.

4.1.1. We let $\mathbb{C}_p$ be the p-adic completion of $\overline{\mathbb{Q}}_p$ and let $\mathrm{ord}_p$ be the unique extension of the valuation on $\mathbb{Q}_p$ to $\mathbb{C}_p$ with $\mathrm{ord}_p(p) = 1$. We normalize the p-adic absolute value by
\[
|x|_p = p^{-\mathrm{ord}_p(x)}.
\]
If $K$ is a local field with uniformizer $\pi_K$ we let $\mathrm{ord}_K$ denote the valuation normalized by $\mathrm{ord}_K(\pi_K) = 1$. In the case that $L$ is a global field and $v \in V(L)$ is a non-archimedean place, we let $\mathrm{ord}_v = \mathrm{ord}_{L_v}$ denote the normalized valuation on $L_v$.

4.1.2. Let K/K0 be a finite extension of non-archimedean fields of residue characteristic p. We will let e(K/K0) denote the ramification degree of the extension. We will say e(K/K0) is small provided e(K/K )

4.1.3. For a p-adic field $K$, $a \in K$ and $r \ge 0$ a real number we will denote the closed disc of radius $r$ by
\[
D_K(a, r) = \{x \in K : |x - a|_p \le r\}.
\]
Similarly if $L = \bigoplus_{j=1}^m L_j$ is a finite direct sum of p-adic fields, $\vec{a} = (a_1,\dots,a_m) \in L$ and $\vec{r} = (r_1,\dots,r_m)$ is a vector of non-negative real numbers then we will denote the polydisc of polyradius $\vec{r}$ by
\[
D_L(\vec{a}, \vec{r}) = \{(x_1,\dots,x_m) \in L : |x_1 - a_1|_p \le r_1 \text{ and } \cdots \text{ and } |x_m - a_m|_p \le r_m\}.
\]
When writing $D_L(0, R)$ where $R \in \mathbb{R}$ we will understand this to mean $D_L(0, (R, R, \dots, R))$.

4.2. Estimates on The Size of The p-Adic Logarithm. We begin by estimating the size of the p-adic logarithm (c.f. [Moc15b, Prop 1.2]).

Lemma 4.2.1 (Crude Estimate). Let $a \in \mathbb{C}_p$, with $\mathrm{ord}_p(a) > 1$. We have
\[
|\log(1 + a)|_p < \frac{c_p}{\mathrm{ord}_p(a)},
\]
where $c_p = (\exp(1)\ln(p))^{-1}$, and $\exp(1) = 2.71828182\ldots$ is the base of the natural logarithm.

Proof. To get an upper bound on $|\log(1 - a)|_p = |-\sum_{n \ge 1} a^n/n|_p$ for $|a|_p < 1$ it suffices to compute $\max |a^n/n|_p$. Equivalently, we can compute the minimum of $\mathrm{ord}_p(a^n/n)$. We find these lower bounds by using
\[
\mathrm{ord}_p(a^n/n) = n\,\mathrm{ord}_p(a) - \mathrm{ord}_p(n) \ge n\,\mathrm{ord}_p(a) - \log_p(n),
\]
and minimizing the function
\[
f(x) = xc - \log_p(x), \qquad c = \mathrm{ord}_p(a).
\]
The function has a global minimum at $x_0 = 1/(c\ln(p))$, which gives
\[
f(x) \ge f(x_0) = \frac{1}{\ln(p)} + \log_p(c\ln(p)).
\]
Converting this lower bound on the order to an upper bound on the p-adic absolute value gives our result.$^{6}$
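The minimization in the proof above can be checked numerically. The sketch below is illustrative only: the helper names are hypothetical, `c` stands for $\mathrm{ord}_p(a)$, and floating point replaces exact arithmetic. It evaluates $f$ at $x_0 = 1/(c\ln p)$, compares with the closed form, and confirms that $p^{-f(x_0)}$ equals $c_p/\mathrm{ord}_p(a)$ with $c_p = 1/(\exp(1)\ln p)$.

```python
import math


def f(x: float, c: float, p: int) -> float:
    """The function f(x) = x*c - log_p(x) minimized in the proof."""
    return x * c - math.log(x, p)


def minimal_order(c: float, p: int) -> float:
    """Value of f at its critical point x0 = 1/(c ln p)."""
    x0 = 1.0 / (c * math.log(p))
    return f(x0, c, p)


def crude_estimate(c: float, p: int) -> float:
    """The bound c_p / ord_p(a) with c_p = 1/(e ln p)."""
    return 1.0 / (math.e * math.log(p) * c)


c, p = 2.0, 3  # sample values, chosen only for illustration
closed_form = 1.0 / math.log(p) + math.log(c * math.log(p), p)
assert math.isclose(minimal_order(c, p), closed_form)
assert math.isclose(p ** (-minimal_order(c, p)), crude_estimate(c, p))
```

Sampling $f$ on a grid around $x_0$ is a quick way to confirm that the critical point is a minimum rather than a maximum.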

x Remark 4.2.2. One can also minimize the function f(x) = p c x giving log(1 + a) p |a|p 1 − | | ≤ bp , where bp = 2 . This is not of any use to us. ordp(a) ln(p)eln(p) The application of Lemma 4.2.1 gives an upper bound on the smallest radius r such that log( × ) D (0,r) where K is a finite extension of Q . With knowledge that e(K/Q ) is OK ⊂ K p p small we can do much better. We state these results and omit the proofs.

Lemma 4.2.3. Let $K/\mathbb{Q}_p$ be a finite extension.
(1) With no assumptions on the ramification of $K/\mathbb{Q}_p$ we have
\[
\log(\mathcal{O}_K^\times) \subset D_K\Big(0, \frac{e(K/\mathbb{Q}_p)}{\ln(p)\exp(1)}\Big).
\]
(2) If $e(K/\mathbb{Q}_p)$

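Definition 4.3.1. Let $K/\mathbb{Q}_p$ be a finite extension. The log-shell of $K$ is the $\mathbb{Z}_p$-submodule of $K$ defined by
\[
\mathcal{I}_K = \frac{1}{2p}\log(\mathcal{O}_K^\times).
\]

Lemma 4.3.2 (Upper Semi-Compatibility). $\mathcal{I}_K$ contains both $\mathcal{O}_K$ and $\log(\mathcal{O}_K^\times)$.

$^{6}$ One could have also used $\mathrm{ord}_p(a^n/n) = p^m\,\mathrm{ord}_p(a) - m$ along the sequence $n = p^m$. This will give different, less useful bounds. See the remark below.

Proof. It is clear that $\log(\mathcal{O}_K^\times) \subset \mathcal{I}_K$. Conversely, since $\mathrm{ord}_p(2p) > 1/(p-1)$, the logarithm maps $1 + 2p\mathcal{O}_K$ onto $2p\mathcal{O}_K$. Hence
\[
\mathcal{I}_K \supset \frac{1}{2p}\log(1 + 2p\mathcal{O}_K) = \frac{1}{2p}(2p\mathcal{O}_K) = \mathcal{O}_K.
\]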

Remark 4.3.3 (Module Structures on $\log(\mathcal{O}_K^\times)$). For $K/\mathbb{Q}_p$ a finite extension we note that $\log(\mathcal{O}_K^\times)$ has the structure of an $\mathcal{O}_K$-module very rarely. In order for $\log(\mathcal{O}_K^\times)$ to be an $\mathcal{O}_K$-module we need
\[
a\log(b) = \log(b^a)
\]
for $a \in \mathcal{O}_K$ and $b \in \mathcal{O}_K^\times$. This in turn depends on the convergence of
\[
b^a = \sum_{n=0}^{\infty} \frac{a(a-1)\cdots(a-n+1)}{n!}(b-1)^n.
\]
We will not pursue this here, as estimates will not be needed. On the other hand we do observe that $\log(\mathcal{O}_K^\times)$ is always a $\mathbb{Z}_p$-module for exactly the same reason.

5. Archimedean Logarithms

In order for our estimates to be complete we require definitions and estimates for $\mathrm{hull}(U_\Theta)$ at the Archimedean factor $L_\infty$. For $\vec{v} = (v_0,\dots,v_j) \in V(F_0)_\infty^{j+1}$ we will let $H_{\vec{v}}$ denote the component of $\mathrm{hull}(U_\Theta)$ in $K_{v_0} \otimes\cdots\otimes K_{v_j}$ (since $\sqrt{-1} \in K$ we know that $K_v \cong \mathbb{C}$ for each $v \in V_\infty$).

Claim 5.0.1. If $\vec{v} \in V(F_0)_\infty^{j+1}$ then $H_{\vec{v}} \subset D_{L_{\vec{v}}}(0; R_{\vec{v}})$ where $\ln(R_{\vec{v}}) = (j+1)\ln(\pi)$.

We do not develop the theory necessary to discuss this bound as this requires an Archimedean theory parallel to the p-adic theory in [?]. A full anabelian treatment requires so-called aut-holomorphic spaces. The starting place is [Moc15a, Definition 1.1]. The claim above can be found in [Moc15b, Proposition 1.5, Proof of Theorem 1.10, step vii].

6. Upper Bounds on Hulls

We now come to the section of the paper which contains the first major computation. Fix initial theta data $(\overline{F}/F, E_F, l, M, V, V_{\mathrm{mod}}^{\mathrm{bad}}, \epsilon)$. In this section our goal is to find, for each prime $p$ and each $\vec{v} \in \coprod_{j=1}^{(l-1)/2} V(F_0)_p^{j+1}$, the smallest polydisc $D_{L_{\vec{v}}}(0, R_{\vec{v}})$ such that the component of the multiradial representation at $\vec{v}$ is contained in this polydisc. The smallest possible polydisc here is called the hull.

6.1. Hulls. If $L = \bigoplus_{j=1}^m L_j$ is a finite direct sum of p-adic fields and $\Omega \subset L$ then we define
\[
R_i(\Omega) = \max\{|x_i|_p : (x_1,\dots,x_m) \in \Omega\},
\]
and define the poly-radius of $\Omega$ to be
\[
\vec{R}(\Omega) = (R_1(\Omega),\dots,R_m(\Omega)).
\]
Define the hull of $\Omega$ to be the smallest polydisc containing $\Omega$:
\[
\mathrm{hull}(\Omega) = D_L(0, \vec{R}(\Omega)).
\]
It is easy to check that if $\alpha = (\alpha_1,\dots,\alpha_m) \in L$ then
\[
\vec{R}(\alpha\cdot\Omega) = (|\alpha_1|_p R_1(\Omega),\dots,|\alpha_m|_p R_m(\Omega)).
\]
Also, given a collection of compact regions $\Omega_i \subset L$ where $i = 1, 2, \dots$, then for each $j$ where $1 \le j \le m$ we have
\[
R_j\Big(\bigcup_{i=1}^{\infty}\Omega_i\Big) = \sup\{R_j(\Omega_i) : i \ge 1\}.
\]
Note that the right hand side of the above equality is possibly infinite.

We now state some basic properties of hulls. For $A, B \subset L$ we will write
\[
A \underset{\sim}{\subset} B \iff \mathrm{hull}(A) \subset \mathrm{hull}(B).
\]
Note that $A \underset{\sim}{\subset} B$ if there exists some $\mathbb{Q}_p$-linear transformation $T : L \to L$ with $|\det(T)|_p = 1$ and $T(A) \subset B$ (such a $T$ could be multiplication by a unit of $L$ for example). Also if $\Omega \subset L$ and $a \in K_m$ (which we view as acting on $L$ via multiplication on the $m$th tensor factor) then $a^{\mathbb{N}}\cdot\Omega \underset{\sim}{\subset} a\cdot\Omega$. To see that $\mathrm{hull}(a^{\mathbb{N}}\cdot\Omega) \subset \mathrm{hull}(a\cdot\Omega)$, we observe that $a \in K_m$ acts on each direct summand of $L$ by $\psi_j(a)$, where we have written $L = \bigoplus L_{\psi_j}$ using the Chinese Remainder formulas developed in §2.3. This gives
\[
R_j(a^{\mathbb{N}}\cdot\Omega) = \sup\{R_j(a^n\cdot\Omega) : n \ge 1\} = |a|_p R_j(\Omega).
\]
This implies $R_j(a^{\mathbb{N}}\cdot\Omega) \le R_j(a\cdot\Omega)$ and hence $\mathrm{hull}(a^{\mathbb{N}}\cdot\Omega) \subset \mathrm{hull}(a\cdot\Omega)$.

6.2. Worst Case Scenario. We now give a toy version of the computation of the hull bound associated to a tuple $\vec{v} \in V(F_0)^{j+1}$. Here we make assumptions on ramification of our fields.

Let $K_1,\dots,K_m$ be finite extensions of $\mathbb{Q}_p$ (in our actual application $m$ will be $j+1$). Let $a \in K_m$ with $|a|_p < 1$. Let $L = \bigotimes_{i=1}^m K_i \cong \bigoplus_{j=1}^r L_j$ where the factors of the right hand side come from the Chinese Remainder Theorem as in §2.3.

In what follows we will let $\mathcal{I} = \bigotimes_{i=1}^m \mathcal{I}_{K_i}$ be the tensor product of log-shells and $\mathrm{Aut}(L : \mathcal{I})$ denote the collection of $\mathbb{Q}_p$-vector space automorphisms of $L$ obtained by extending $\mathbb{Q}_p$-linearly $\mathbb{Z}_p$-lattice automorphisms of $\mathcal{I}$. These automorphisms are stand-ins for ind1 and ind2 in our actual applications (see [DH20b, §4] for definitions).$^{7}$

This subsection gives a bound on the hull of the “multiradial representation”$^{8}$
\[
U = \mathrm{hull}\Big(\mathrm{Aut}_{\mathbb{Q}_p}\Big(L : \bigotimes_{i=1}^m \log(\mathcal{O}_{K_i}^\times)\Big)\cdot(\mathcal{O}_L^{\mathrm{ind3}(a)})\Big).
\]
This region is a stand-in for the random measurable set $U^{(j)} \subset A_{V,p}^{\otimes j+1}$ of the hull of the coarse multiradial representation of the Theta pilot region (see [DH20b, §4] and the next section). We prove
\[
\mathrm{hull}\big(\mathrm{Aut}(L : \mathcal{I})\cdot(\mathcal{O}_L^{\mathrm{ind3}(a)})\big) \subset D_L(0; R) \tag{6.1}
\]
where the radius $R$ is given by
\[
\ln(R) = -\lfloor \mathrm{ord}_p(a) + \|\mathrm{diff}\|_\infty - \|\mathrm{diff}\|_1 \rfloor \ln(p) + m\ln(c_p) + \sum_{i=1}^m \ln(e(K_i/\mathbb{Q}_p)). \tag{6.2}
\]

$^{7}$ The only reason this subsection can’t directly be applied is because the actual ind1 has some permutations among different tensor product factors of $A_{V,p}^{\otimes j+1}$. The permutation of these factors does not appear in this example.

$^{8}$ In [DH20b] we used the notation U for what we are now denoting U.

The constant $c_p \in \mathbb{R}$ and the vector $\mathrm{diff} \in \mathbb{R}^m$ are given by
\[
c_p = 1/(\exp(1)\ln(p)), \qquad \mathrm{diff} = (\mathrm{diff}(K_1/\mathbb{Q}_p),\dots,\mathrm{diff}(K_m/\mathbb{Q}_p)).
\]
To obtain this radius we compute. We have labeled each line in the computation below and give the justification for each step in the itemized list following the displayed equations.

\[
\mathrm{Aut}(L : \mathcal{I})\Big(a^{\mathbb{N}}\cdot\mathcal{O}_L \cup \Big(\bigotimes_{i=1}^m \log(\mathcal{O}_{K_i}^\times)\Big)\Big) \tag{6.3}
\]
\[
\underset{\sim}{\subset}\ \mathrm{Aut}(L : \mathcal{I})\Big(a\cdot\beta^{-1}\bigotimes_{i=1}^m \mathcal{O}_{K_i} \cup \Big(\bigotimes_{i=1}^m \log(\mathcal{O}_{K_i}^\times)\Big)\Big) \tag{6.4}
\]
\[
\underset{\sim}{\subset}\ \mathrm{Aut}(L : \mathcal{I})\big(a\beta^{-1}\mathcal{I}\big) \tag{6.5}
\]
\[
\underset{\sim}{\subset}\ \mathrm{Aut}(L : \mathcal{I})\big(p^{\lfloor \mathrm{ord}_p(a) - \mathrm{ord}_p(\beta)\rfloor}\mathcal{I}\big) \tag{6.6}
\]
\[
= p^{\lfloor \mathrm{ord}_p(a) - \mathrm{ord}_p(\beta)\rfloor}\mathcal{I} \tag{6.7}
\]
\[
\underset{\sim}{\subset}\ p^{\lfloor \mathrm{ord}_p(a) - \mathrm{ord}_p(\beta)\rfloor} D_L\Big(0, \Big(\frac{2p}{\exp(1)\ln(p)}\Big)^m \prod_{i=1}^m e(K_i/\mathbb{Q}_p)\Big) \tag{6.8}
\]
Since $\mathrm{hull}(D_L(0, R)) = D_L(0, R)$ for all radii $R > 0$ we have
\[
\mathrm{hull}(U) \subset D_L\Big(0, \frac{e(K_1/\mathbb{Q}_p)\cdots e(K_m/\mathbb{Q}_p)}{|2p|_p}\, p^{-\lfloor \mathrm{ord}_p(a) + \|\mathrm{diff}\|_\infty - \|\mathrm{diff}\|_1\rfloor}\Big).
\]
Here are the justifications for each step:
• (6.1) to (6.3): Uses the main result concerning ind3 in [?].
• (6.4): Uses the theory of §2. In particular there exists some $\beta = (\beta_1,\dots,\beta_r) \in L = \bigoplus_{j=1}^r L_j$ such that $\beta\mathcal{O}_L = \bigoplus_{j=1}^r \beta_j\mathcal{O}_{L_j} \subset \bigotimes_{i=1}^m \mathcal{O}_{K_i}$, where $\mathrm{ord}_p(\beta_j) = \|\mathrm{diff}\|_1 - \|\mathrm{diff}\|_\infty$ for each $j$ with $1 \le j \le r$.
• (6.5): First we are using the “upper semi-compatibility” of $\mathcal{I}_K$, namely that for a finite extension $K$ of $\mathbb{Q}_p$ we have $\log(\mathcal{O}_K^\times), \mathcal{O}_K \subset \mathcal{I}_K$. We use this fact tensor factor by tensor factor. Also, since the factors of $\beta$ all have large order, multiplication by $\beta^{-1}$ will increase the size of the hull.
• (6.6): We are using the general fact that if $A$ is a region and $|a_1|_p < |a_2|_p$ then $a_1 A \underset{\sim}{\subset} a_2 A$.
• (6.7): This uses that $\mathrm{Aut}(L : \mathcal{I})$ is by definition $\mathbb{Q}_p$-linear and fixes $\mathcal{I}$ as a set.
• (6.8): We are applying the results of Lemma 4.2.3(1).

Remark 6.2.1. One can break this inclusion down in some alternative ways. Here we highlight some areas for improvement. We do not pursue these here.
(1) Alternative to (6.3): For bounding $\mathrm{Aut}(L : \mathcal{I})\big(a\cdot\mathcal{O}_L \cup (\bigotimes_{i=1}^m \log(\mathcal{O}_{K_i}^\times))\big)$ one could write out an explicit $\mathbb{Z}_p$-basis for $a\cdot\mathcal{O}_L$ and explicitly compute the action by $\mathrm{Aut}(L : \mathcal{I})$.
(2) Alternative to (6.4): One could attempt to compute the index of $\bigotimes_{i=1}^m \mathcal{O}_{K_i}$ in $\mathcal{O}_L$. This seems practical to do in specific toy cases, but the size of the division fields may make it impractical in actual applications. It seems conceivable that other invariants around this inclusion can be used to write down more precise results.
(3) (6.5): One could attempt to find a smaller region here containing the two sets. Are log-shells optimal? Maybe, maybe not.
(4) (6.8): We can go beyond the worst case scenario and make additional considerations about the ramification of the fields to improve bounds on $\mathcal{I}$. This includes applying the second part of Lemma 4.2.3 (which is applicable most of the time). In fact, for all but finitely many places $v \in V(F_0)$ we have $\mathcal{I}_v = \mathcal{O}_{K_v}$.

6.3. Actual Scenario. Fix initial theta data $(\overline{F}/F, E_F, l, M, V, V_{\mathrm{mod}}^{\mathrm{bad}}, \epsilon)$ built from the field of moduli. In what follows $A_V = \prod_{v \in V(F_0)} K_v$ denotes the “fake adeles” from [DH20b, §3.1–§3.3]. We seek to bound the sets $U_p^{(j)} \subset A_{V,p}^{\otimes j+1} =: L_p^{(j)}$ where $U_p^{(j)}$ is of the form

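\[
U_p^{(j)} = \mathrm{ind2}\big(\mathrm{ind1}\big(\mathcal{O}_{L_p^{(j)}}^{\mathrm{ind3}(\vec{a}_j)}\big)\big).
\]
Here we have made the following notational conventions:
\[
\mathcal{O}_{L_p^{(j)}} = \bigoplus_{v \mid p} \mathcal{O}_{L_v^{(j)}}, \qquad
\mathcal{O}_{L_v^{(j)}} = \mathrm{Peel}_v^j\big(\mathcal{O}_{L_v^{(j)}}\big), \qquad
\mathcal{O}_{L_p^{(j)}}^{\mathrm{ind3}(\vec{a}_j)} = \bigoplus_{v \mid p} \mathcal{O}_{L_v^{(j)}}^{\mathrm{ind3}(a_{j,v})},
\]
and we have let $\vec{a}_j = (a_{j,v})_{v \in V(F_0)}$ where
\[
a_{j,v} = \begin{cases} q_v^{j^2/2l}, & v \text{ bad multiplicative}, \\ 1, & \text{else}. \end{cases}
\]
All of this of course depends on a choice of initial theta data. The peel decomposition $\mathrm{Peel}_v^j(\mathcal{O}_{L_v^{(j)}})$ is described in [DH20b, §3.3.7].

Following Mochizuki we improve the toy bounds of §6.2 using Remark 6.2.1(4) (that is, considering what happens when ramification is small). Let $\vec{v} = (v_0,\dots,v_j) \in V_p^{j+1}$. We say that $\vec{v}$ is small if every $e(v_i/p)$ is small for $0 \le i \le j$. Similarly we say that $\vec{v}$ is unramified if $v_i$ is unramified for each $i$ where $0 \le i \le j$. We will also let $L_{\vec{v}} = K_{v_0} \otimes\cdots\otimes K_{v_j}$, where the tensor products are over $\mathbb{Q}_p$.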

Lemma 6.3.1. In the notation of this subsection, we have
\[
\mathrm{hull}(U_p^{(j)}) \subset \prod_{\vec{v} \in V_p^{j+1}} D_{L_{\vec{v}}}(0, R_{\vec{v}})
\]
where
\[
\ln(R_{\vec{v}}) = \begin{cases}
0, & \vec{v} \text{ unramified and } p \nmid \infty, \\
-\lfloor \mathrm{ord}_p(a_{j,v}) - \mathrm{ord}_p(\beta_{\vec{v}})\rfloor \ln(p), & \vec{v} \text{ small and } p \nmid \infty, \\
-\lfloor \mathrm{ord}_p(a_{j,v}) - \mathrm{ord}_p(\beta_{\vec{v}})\rfloor \ln(p) + (j+1)\ln(b_p) + \sum_{i=0}^{j} \ln(e(v_i/p)), & p \nmid \infty \text{ and } \vec{v} \text{ general}, \\
(j+1)\ln(\pi), & p \mid \infty.
\end{cases}
\]

Proof. There are three points of departure from the computation in §6.2: the specialization of $\beta$ and $a$, improvement of log-bounds, and the inclusion of the archimedean place. In the case that $\vec{v}$ is unramified we know that $a_{j,v_j} = 1$ by Néron-Ogg-Shafarevich. In the case that $\vec{v}$ is small, we apply the bounds from Lemma 4.2.3. In the archimedean case we apply Claim 5.0.1.

7. Probabilistic Versions of the Mochizuki and Szpiro Inequalities

Throughout this section we fix initial theta data $(\overline{F}/F, E_F, l, M, V, V_{\mathrm{mod}}^{\mathrm{bad}}, \epsilon)$ built from the field of moduli $F_0 = \mathbb{Q}(j_E)$.

7.1. Probability Spaces. Fix a rational prime $p$. Recall that, as in the introduction, we give $\coprod_{j=1}^{(l-1)/2} V(F_0)_p^{j+1}$ the structure of a finite probability space where $(v_0, v_1,\dots,v_j) \in \coprod_{j=1}^{(l-1)/2} V(F_0)_p^{j+1}$ is assigned probability
\[
\Pr((v_0, v_1,\dots,v_j)) = \frac{2}{l-1}\cdot\frac{[K_{v_0}:\mathbb{Q}_p][K_{v_1}:\mathbb{Q}_p]\cdots[K_{v_j}:\mathbb{Q}_p]}{[F_0:\mathbb{Q}]^{j+1}}.
\]
The space $\coprod_{j=1}^{(l-1)/2} V(F_0)_p^{j+1}$ can be viewed as a uniform independent disjoint union of probability spaces $V(F_0)^{j+1}$. For a random variable $X(\vec{v})$ that depends on $\vec{v} = (v_0, v_1,\dots,v_j) \in \coprod_{j=1}^{(l-1)/2} V(F_0)^{j+1}$ we can view the expectation of $X$ as an “iterated expectation”, by first computing the expectation as we vary over $(v_0,\dots,v_j) \in V(F_0)_p^{j+1}$ for a fixed $j$ and then computing the expectation of these expectations as we vary uniformly over $j$. In what follows $E_p^2$ will denote this iterated expectation:
\[
E_p^2(X(\vec{v})) = E\big(E(X(\vec{v}) : \vec{v} \in V(F_0)^{j+1}) : 1 \le j \le (l-1)/2\big).
\]
Note that the colons here do not denote conditional probabilities.
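To see that the weights above really define a probability measure, one can sum them over all tuples. The sketch below does this for toy data in exact arithmetic. It assumes, as an illustration only, that the local degrees above $p$ are given as a list of positive integers whose sum plays the role of $[F_0:\mathbb{Q}]$; the function names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def tuple_probability(local_degrees, global_degree, l):
    """Weight 2/(l-1) * prod_i d_i / D^(j+1) for a tuple with local degrees d_0..d_j."""
    prob = Fraction(2, l - 1)
    for d in local_degrees:
        prob *= Fraction(d, global_degree)
    return prob


def total_probability(degrees_above_p, l):
    """Sum the weights over all tuples of length j+1, for 1 <= j <= (l-1)/2."""
    global_degree = sum(degrees_above_p)  # assumption: local degrees sum to [F_0:Q]
    total = Fraction(0)
    for j in range(1, (l - 1) // 2 + 1):
        for degrees in product(degrees_above_p, repeat=j + 1):
            total += tuple_probability(degrees, global_degree, l)
    return total


# Toy example: three places above p with local degrees 1, 2, 3 and l = 5.
assert total_probability([1, 2, 3], 5) == 1
```

For each fixed $j$ the inner sum factors as $(\sum_v d_v/D)^{j+1} = 1$, and the outer factor $2/(l-1)$ averages uniformly over the $(l-1)/2$ values of $j$.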

7.2. Jensen’s Inequality. Jensen’s inequality states that for a convex function $g(x)$ and a random variable $X$ we have
\[
g(E(X)) \le E(g(X)).
\]
The inequality goes the other way for concave functions, and one can test for convexity using the second derivative test: a twice-differentiable function $g(x)$ of a real variable is convex if and only if $g''(x) \ge 0$. In particular $g(x) = \exp(x)$ is a convex function and $g(x) = \ln(x)$ is concave. This allows us to say that
\[
\exp(E(\ln(X))) \le E(X) \le \ln(E(\exp(X))). \tag{7.1}
\]

7.3. Random Variables Pulled-back from a Projection. Let $S$ be a discrete probability space. Let $(X_1,\dots,X_n)$ be a random variable on $S^n$. If $f(X_1,\dots,X_n)$ only depends on $X_n$ (i.e. $f(X_1,\dots,X_n) = g(X_n)$ for some function $g$ of a single variable) then the expected value of $f(X_1,\dots,X_n)$ can be computed by just varying over what the function depends on. In symbols:
\[
E(f(X_1,\dots,X_n)) = E(g(X)).
\]
It is also elementary to check that
\[
E(g(X_1)g(X_2)\cdots g(X_n)) = E(g(X))^n.
\]

7.4. Measures. For $L$ a direct sum of p-adic fields, we will often make use of the formula
\[
\ln\mu_L(D_L(0, R)) \le \ln(R).
\]
Here, for a finite dimensional vector space $V$ and a measurable set $A \subset V$ we define $\ln\mu_V(A) = \ln(\mu_V(A))/\dim(V)$.
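Inequality (7.1) and the product formula of §7.3 are used repeatedly in what follows, so a small numerical illustration may be helpful. The sketch below uses a hypothetical three-point distribution (the values and weights are ours, not taken from the paper) and checks both statements in floating point.

```python
import math
from itertools import product


def expectation(values, probs, g=lambda x: x):
    """E(g(X)) for a finite distribution given by values and probabilities."""
    return sum(p * g(x) for x, p in zip(values, probs))


def product_expectation(values, probs, g, n):
    """E(g(X_1) ... g(X_n)) for n independent copies of X, by brute-force enumeration."""
    total = 0.0
    for indices in product(range(len(values)), repeat=n):
        weight = math.prod(probs[i] for i in indices)
        total += weight * math.prod(g(values[i]) for i in indices)
    return total


values = [1.0, 2.0, 4.0]   # hypothetical sample values
probs = [0.5, 0.25, 0.25]  # hypothetical weights summing to 1

geometric_mean = math.exp(expectation(values, probs, math.log))
mean = expectation(values, probs)
log_mgf = math.log(expectation(values, probs, math.exp))

# (7.1): exp(E(ln X)) <= E(X) <= ln(E(exp X))
assert geometric_mean <= mean <= log_mgf
```

The product formula can be tested with, for example, `product_expectation(values, probs, math.sqrt, 3)`, which should agree with `expectation(values, probs, math.sqrt) ** 3` up to rounding.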

7.5. Probabilistic Mochizuki. Using the probabilistic formalism developed in [DH20b, §3.1.2], we can state [Moc15a, Corollary 3.12] in the following way:

Theorem 7.5.1 (Tautological Probabilistic Inequality). For $\vec{v} \in V_p^{j+1}$ let
\[
R^\circ_{\vec{v}} = \inf\{R \in \mathbb{R} : U_{\vec{v}} \subset D_{L_{\vec{v}}}(0, R)\}, \tag{7.2}
\]
where $U_{\vec{v}}$ is the component of the multiradial representation in $L_{\vec{v}}$. Assuming [Moc15a, Corollary 3.12] we have
\[
-\widehat{\deg}_{F_0}(P_q) \le \sum_{p \in V(\mathbb{Q})} E_p^2(\ln R^\circ_{\vec{v}}). \tag{7.3}
\]

The radius $R_{\vec{v}}$ in Lemma 6.3.1 gives an estimate on $R^\circ_{\vec{v}}$ in (7.3), giving
\[
-\widehat{\deg}_{F_0}(P_q) \le \sum_{p \in V(\mathbb{Q})} E_p^2(\ln R_{\vec{v}}). \tag{7.4}
\]
The rest of this subsection is devoted to estimating $\ln R_{\vec{v}}$ (so we will be deriving, in effect, estimates of estimates).

Remark 7.5.2. The computation of $R_{\vec{v}}$ is not optimal. It can be improved upon by readers in general or in special cases. It is unclear how far off $R_{\vec{v}}$ is from $R^\circ_{\vec{v}}$. It would be interesting to develop a table of $R^\circ_{\vec{v}}$ in some numerical examples (if the computations involving the division fields are not prohibitively hard).

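The reader should compare what follows to [Moc15b, Proof of Theorem 1.10]. Fix $p \in V(\mathbb{Q})$. We have
\[
E_p^2(\ln R_{\vec{v}}) \le \mathrm{I}_p + \mathrm{II}_p + \mathrm{III}_p + \mathrm{IV}_p + \mathrm{V}_p \tag{7.5}
\]
where
\[
\begin{aligned}
\mathrm{I}_p &= -E_p^2\big(\mathrm{ord}_p(q_{v_j}^{j^2/2l})\ln(p)\big), \\
\mathrm{II}_p &= E_p^2\big((\|\mathrm{diff}_{\vec{v}}\|_1 - \|\mathrm{diff}_{\vec{v}}\|_\infty)\ln(p)\big), \\
\mathrm{III}_p &= E_p^2\big(1_{\mathrm{ram}}(\vec{v})\big), \\
\mathrm{IV}_p &= E_p^2\big((j+1)\ln(b_p)1_{\mathrm{ram}}(\vec{v})\big), \\
\mathrm{V}_p &= E_p^2\Big(\sum_{i=0}^{j}\ln(e(v_i/p))\Big).
\end{aligned}
\]
In the above formulas for $\mathrm{III}_p$ and $\mathrm{IV}_p$ the function $1_{\mathrm{ram}}(\vec{v})$ is the function which is 0 if $\vec{v}$ is unramified and 1 if $\vec{v}$ is ramified.$^{9}$ We will denote the sums over $p$ of $\mathrm{I}_p$, $\mathrm{II}_p$, $\mathrm{III}_p$, $\mathrm{IV}_p$, and $\mathrm{V}_p$ by I, II, III, IV, and V respectively.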

Remark 7.5.3. At this stage we can already see Mochizuki’s inequality as stated in [Fes15, §2.12]: we combine the inequalities $-\widehat{\deg}(P_q) \le \log\nu_L(\mathrm{hull}(U))$ and $\log\nu_L(\mathrm{hull}(U)) \le a(l) - b(l)\widehat{\deg}(P_q)$ to get $(b(l) - 1)\widehat{\deg}(P_q) \le a(l)$, which gives
\[
\widehat{\deg}(P_q) \le \frac{a(l)}{b(l) - 1}.
\]
[SS17, Claim 5] follows this style. Further approximate computations can be found at [Hos17, slide 17] (adapted in [SS17, §1.3]).

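7.6. Computation of $\mathrm{I}_p$. We have
\[
\begin{aligned}
E\big(\mathrm{ord}_p(q_{v_j}^{j^2}) : \vec{v} \in V(F_0)^{j+1}\big)
&= E\big(\mathrm{ord}_p(q_v^{j^2}) : v \in V(F_0)\big) \\
&= \sum_{v \mid p} \frac{[F_{0,v}:\mathbb{Q}_p]}{[F_0:\mathbb{Q}]}\,\mathrm{ord}_p(q_v^{j^2}) \\
&= \frac{1}{[F_0:\mathbb{Q}]}\sum_{v \in V(F_0)_p} e(v/p)\,\mathrm{ord}_p(q_v^{j^2})\,f(v/p)\ln(p) \\
&= \widehat{\deg}\Big(\sum_{v \in V(F_0)_p^{\mathrm{bad}}} \mathrm{ord}_v(q_v^{j^2})[v]\Big).
\end{aligned}
\]
Hence
\[
\sum_p \mathrm{I}_p = \sum_p E_p^2\big(\mathrm{ord}_p(q_{v_j}^{j^2})\big) = E\Big(\widehat{\deg}\Big(\sum_{v \in V(F_0)_p^{\mathrm{bad}}} \mathrm{ord}_v(q_v^{j^2})[v]\Big)\Big) = \widehat{\deg}_{lgp,F_0}(P_\Theta). \tag{7.6}
\]

$^{9}$ We say a tuple $(v_0,\dots,v_j)$ is ramified if there exists some $i$ with $0 \le i \le j$ such that $e(v_i/p) > 1$. If a tuple is not ramified it is called unramified.

7.7. Computation of $\mathrm{II}_p$. In what follows we make use of the average different order of $V$ over $p$, defined to be the quantity
\[
\mathrm{diff}_p := \log_p\big(E(p^{\mathrm{diff}(v/p)})\big). \tag{7.7}
\]
We will prove
\[
\mathrm{II}_p \le \frac{l+1}{4}\,\mathrm{diff}_p\ln(p). \tag{7.8}
\]
Note that if we define the average different for $V$ by $\mathrm{Diff}(V/\mathbb{Q}) = \prod_p p^{\mathrm{diff}_p}$, then summing (7.8) over $p$ gives
\[
\mathrm{II} \le \frac{l+1}{4}\ln\mathrm{Diff}(V/\mathbb{Q}). \tag{7.9}
\]
Before establishing (7.8) it is convenient to state the following lemma.

Lemma 7.7.1. For $\vec{v} \in V(F_0)_p^n$ let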

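\[
\mathrm{diff}_{\vec{v}} = \mathrm{diff}_{(v_1,\dots,v_n)} = (\mathrm{diff}(v_1/p),\dots,\mathrm{diff}(v_n/p)).
\]
For $\vec{v} \in V(F_0)_p^n$ the following inequalities hold.
(1) $\|\mathrm{diff}_{\vec{v}}\|_1 - \|\mathrm{diff}_{\vec{v}}\|_\infty \le \frac{n-1}{n}\|\mathrm{diff}_{\vec{v}}\|_1$.
(2) $E(\|\mathrm{diff}_{(v_1,\dots,v_n)}\|_1) \le n\,\mathrm{diff}_p$.
The subscripts $1$ and $\infty$ denote the usual $l^1$ and $l^\infty$ norms for vectors in $\mathbb{R}^n$.

Proof. (1) The proof is a fortiori. For positive real numbers $a_1,\dots,a_n$ we have
\[
n\Big(\sum_{i=1}^n a_i - \max_{1\le i\le n} a_i\Big) = n\Big(\sum_{i=1}^n a_i\Big) - n\max_{1\le i\le n} a_i \le n\Big(\sum_{i=1}^n a_i\Big) - \sum_{i=1}^n a_i = (n-1)\sum_{i=1}^n a_i.
\]
This proves $\|\vec{a}\|_1 - \|\vec{a}\|_\infty \le \frac{n-1}{n}\|\vec{a}\|_1$, if we let $\vec{a} = (a_1,\dots,a_n)$.

(2) We will apply Jensen’s inequality to turn an expectation of a sum $E(\mathrm{diff}(v_1/p) + \cdots + \mathrm{diff}(v_n/p))$ into (the log of) an expectation of a product $E(p^{\mathrm{diff}(v_1/p)}\cdots p^{\mathrm{diff}(v_n/p)})$. Now that this is a product of independent random variables the expectation factors, namely, $E(p^{\mathrm{diff}(v_1/p)}\cdots p^{\mathrm{diff}(v_n/p)}) = E(p^{\mathrm{diff}(v/p)})^n$. This shows $E(\|\mathrm{diff}_{(v_1,\dots,v_n)}\|_1) \le n\log_p E(p^{\mathrm{diff}(v/p)})$, which is our desired result.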
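Part (1) of the lemma is a statement about arbitrary vectors of non-negative reals, so it can be spot-checked directly. The following sketch (function names are ours) compares both sides in exact arithmetic on a few sample vectors; equality occurs when all entries coincide.

```python
from fractions import Fraction


def norm_gap(a):
    """||a||_1 - ||a||_inf for a vector of non-negative rationals."""
    return sum(a) - max(a)


def lemma_bound(a):
    """The right hand side (n-1)/n * ||a||_1 of Lemma 7.7.1(1)."""
    n = len(a)
    return Fraction(n - 1, n) * sum(a)


samples = [
    [1, 1, 1],                         # equality case
    [0, 0, 5],                         # one dominant entry
    [Fraction(1, 2), Fraction(2, 3)],  # fractional different orders
]
for a in samples:
    assert norm_gap(a) <= lemma_bound(a)
```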

We now prove our desired formulas:
\[
\begin{aligned}
E\big(\|\mathrm{diff}_{(v_0,\dots,v_j)}\|_1 - \|\mathrm{diff}_{(v_0,\dots,v_j)}\|_\infty\big)
&\le \frac{j}{j+1}E\big(\|\mathrm{diff}_{(v_0,\dots,v_j)}\|_1\big) \\
&\le \frac{j}{j+1}\big((j+1)\mathrm{diff}_p\big) \\
&= j\,\mathrm{diff}_p.
\end{aligned}
\]
The first line follows from Lemma 7.7.1(1) and the second line follows from Lemma 7.7.1(2) (which is an application of Jensen’s inequality together with the way expectations of products of random variables behave). It remains to compute the expectation of these over $j \in \{1,\dots,(l-1)/2\}$. We have
\[
E^2\big(\|\mathrm{diff}_{(v_0,\dots,v_j)}\|_1 - \|\mathrm{diff}_{(v_0,\dots,v_j)}\|_\infty\big) \le E(j\,\mathrm{diff}_p) = \frac{2}{l-1}\Big(\sum_{j=1}^{(l-1)/2} j\Big)\mathrm{diff}_p = \frac{l+1}{4}\,\mathrm{diff}_p,
\]
which gives our result.
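The constants $(l+1)/4$ here and $(l+5)/4$ in the later subsections are uniform averages of $j$ and of $j+1$ over $1 \le j \le (l-1)/2$. The short sketch below (the helper `uniform_average` is hypothetical) confirms both closed forms exactly.

```python
from fractions import Fraction


def uniform_average(l, g):
    """Uniform average of g(j) over j = 1, ..., (l-1)/2 for an odd integer l >= 3."""
    n = (l - 1) // 2
    return Fraction(2, l - 1) * sum(g(j) for j in range(1, n + 1))


for l in (5, 7, 11, 13, 101):
    assert uniform_average(l, lambda j: j) == Fraction(l + 1, 4)
    assert uniform_average(l, lambda j: j + 1) == Fraction(l + 5, 4)
```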

7.8. Computation of $\mathrm{III}_p$. In what follows we will make use of the probability of a place $v \in V_p$ to be unramified. In formulas this probability is defined by
\[
P_{\mathrm{unr},p} = 1 - E(1_{\mathrm{ram}}(v) : v \in V(F_0)_p). \tag{7.10}
\]
Also recall that a tuple $\vec{v} = (v_0,\dots,v_j) \in V_p^{j+1}$ is unramified if and only if each $v_i$ is unramified for $0 \le i \le j$; this means that $E(1_{\mathrm{ram}}(v_0,\dots,v_j)) = 1 - P_{\mathrm{unr},p}^{j+1}$. This then gives
\[
\mathrm{III}_p = E_p^2(1_{\mathrm{ram}}(v_0,\dots,v_j)) = \frac{2}{l-1}\sum_{j=1}^{(l-1)/2}\big(1 - P_{\mathrm{unr},p}^{j+1}\big) = 1 - \frac{2}{l-1}\sum_{j=1}^{(l-1)/2} P_{\mathrm{unr},p}^{j+1}.
\]
As the smallest of the $P_{\mathrm{unr},p}^{j+1}$ is $P_{\mathrm{unr},p}^{(l+1)/2}$ we get the following inequality:
\[
\mathrm{III}_p \le 1 - P_{\mathrm{unr},p}^{(l+1)/2}. \tag{7.11}
\]
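The step from the exact expression for $\mathrm{III}_p$ to the bound (7.11) only uses $0 \le P_{\mathrm{unr},p} \le 1$. As an illustration, the sketch below evaluates both sides in exact arithmetic for a few hypothetical values of $P_{\mathrm{unr},p}$ and $l$; the function names are ours.

```python
from fractions import Fraction


def iii_exact(P, l):
    """1 - (2/(l-1)) * sum_{j=1}^{(l-1)/2} P^(j+1), the exact value of III_p."""
    n = (l - 1) // 2
    return 1 - Fraction(2, l - 1) * sum(P ** (j + 1) for j in range(1, n + 1))


def iii_bound(P, l):
    """The upper bound 1 - P^((l+1)/2) from (7.11)."""
    return 1 - P ** ((l + 1) // 2)


for l in (5, 7, 13):
    for P in (Fraction(0), Fraction(1, 3), Fraction(1, 2), Fraction(9, 10), Fraction(1)):
        assert iii_exact(P, l) <= iii_bound(P, l)
```

Both sides vanish when every place above $p$ is unramified ($P = 1$), which is the case for all but finitely many primes.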

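7.9. Computation of $\mathrm{IV}_p$. We will prove
\[
\mathrm{IV}_p \le \frac{l+5}{4}\ln(b_p)\big(1 - P_{\mathrm{unr},p}^{(l+1)/2}\big). \tag{7.12}
\]
Using identical reasoning to §7.8, the first average is $E((j+1)\ln(b_p)1_{\mathrm{ram}}(\vec{v})) = (j+1)\ln(b_p)(1 - P_{\mathrm{unr},p}^{j+1})$. This gives
\[
\mathrm{IV}_p = \ln(b_p)\Big(\frac{2}{l-1}\sum_{j=1}^{(l-1)/2}(j+1)\big(1 - P_{\mathrm{unr},p}^{j+1}\big)\Big) \le \ln(b_p)\big(1 - P_{\mathrm{unr},p}^{(l+1)/2}\big)\frac{l+5}{4}.
\]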

7.10. Computation of $\mathrm{V}_p$. It will be convenient to define $e_p$, the average ramification index of $V$ over $p$. In notation it is defined by
\[
e_p := E(e(v/p) : v \in V(F_0)_p). \tag{7.13}
\]
We now compute $\mathrm{V}_p$: we have
\[
\begin{aligned}
E\Big(\sum_{i=0}^{j}\ln(e(v_i/p))\Big) &= E\Big(\ln\Big(\prod_{i=0}^{j} e(v_i/p)\Big)\Big) \\
&\le \ln\Big(E\Big(\prod_{i=0}^{j} e(v_i/p)\Big)\Big) \\
&\le \ln\big(E(e(v/p))^{j+1}\big) = (j+1)\ln(e_p).
\end{aligned}
\]
The first to second line is an application of Jensen’s inequality, and the second to third line uses that, for independent random variables, the expectation of the product is the product of the expectations. We then can compute the second expectation by computing the uniform average of $j+1$ over $\{1,\dots,(l-1)/2\}$. This gives
\[
\mathrm{V}_p \le \frac{l+5}{4}\ln(e_p). \tag{7.14}
\]

7.11. Archimedean Contribution. Here we only have to deal with log-shells. Applying Claim 5.0.1 we have
\[
E_\infty^2 = \frac{2}{l-1}\sum_{j=1}^{(l-1)/2}(j+1)\ln(\pi) = \frac{l+5}{4}\ln(\pi).
\]

7.12. Probabilistic Szpiro. We now combine the results of the previous subsections. The verification of the following identities requires some careful bookkeeping.

Theorem 7.12.1 (Probabilistic Szpiro). Assume [Moc15a, Corollary 3.12] and Claim 5.0.1. Then for any elliptic curve $E/F$ in initial theta data $(\overline{F}/F, E_F, l, M, V, V_{\mathrm{mod}}^{\mathrm{bad}}, \epsilon)$ built over the field of moduli we have
\[
\frac{1}{6+\varepsilon_l}\cdot\frac{\ln|\Delta^{\min}_{E/F}|}{[F:\mathbb{Q}]} \le \ln\mathrm{Diff}(V) + \sum_p \ln(e_p) + A_{l,V} \tag{7.15}
\]
where
\[
A_{l,V} = \ln(\pi) + \sum_p \big(1 - P_{\mathrm{unr},p}^{(l+1)/2}\big)\Big(\ln(b_p) + \frac{5}{l+4}\Big),
\]
$b_p = 1/(\exp(1)\ln(p))$, and $\varepsilon_l = 24(l+3)/(l^2+l-12)$.

Proof. For the most part, this is just a combination of the bounds on I, II, III, IV and V given by equations (7.6), (7.8), (7.11), (7.12), and (7.14). The most interesting aspect of this computation is the appearance of the $6 + \varepsilon_l$.

From the Tautological Probabilistic Inequality we get
\[
\begin{aligned}
-\widehat{\deg}_{F_0}(P_q) \le{} & -\widehat{\deg}_{lgp,F_0}(P_\Theta) + \frac{l+1}{4}\ln\mathrm{Diff} \\
& + \sum_p \big(1 - P_{\mathrm{unr},p}^{(l+1)/2}\big)\Big(1 + \frac{l+5}{4}\ln(b_p)\Big) + \frac{l+5}{4}\sum_p \ln(e_p) \\
& + \frac{l+5}{4}\ln(\pi).
\end{aligned}
\]
Using that $\widehat{\deg}_{lgp,F_0}(P_\Theta) = ((l+1)l/12)\,\widehat{\deg}(P_q)$ and $\widehat{\deg}(P_q) = \ln|\Delta^{\min}_{E/F}|/(2l[F:\mathbb{Q}])$ we get
\[
\Big(\frac{(l+1)l}{12} - 1\Big)\frac{1}{2l}\cdot\frac{\ln|\Delta^{\min}_{E/F}|}{[F:\mathbb{Q}]} \le \frac{l+1}{4}\ln\mathrm{Diff} + \sum_p \big(1 - P_{\mathrm{unr},p}^{(l+1)/2}\big)\Big(1 + \frac{l+5}{4}\ln(b_p)\Big) + \frac{l+5}{4}\sum_p \ln(e_p) + \frac{l+5}{4}\ln(\pi).
\]
We now divide both sides by $(l+5)/4$ and massage the algebra to get our result. A simple computation shows that
\[
\Big(\frac{l(l+1)}{12} - 1\Big)\Big(\frac{1}{2l}\Big)\Big(\frac{4}{l+5}\Big) = \frac{1}{6+\varepsilon_l}
\]
where
\[
\varepsilon_l = \frac{24l + 72}{l^2 + l - 12}.
\]
This proves the assertion that $\varepsilon_l = O(1/l)$ as $l \to \infty$. Finally, putting everything together we get
\[
\frac{1}{6+\varepsilon_l}\cdot\frac{\ln|\Delta^{\min}_{E/F}|}{[F:\mathbb{Q}]} \le \ln\mathrm{Diff} + \sum_p \ln(e_p) + A_{l,V}. \tag{7.16}
\]
Here $A_{l,V}$ is as described in the statement of the theorem.
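The “simple computation” producing $6 + \varepsilon_l$ in the proof above is a rational-function identity in $l$, which the following sketch verifies exactly for several values of $l$. The function names are ours; $l = 3$ is excluded because $l^2 + l - 12$ vanishes there.

```python
from fractions import Fraction


def epsilon(l):
    """epsilon_l = (24 l + 72) / (l^2 + l - 12)."""
    return Fraction(24 * l + 72, l * l + l - 12)


def lhs_coefficient(l):
    """(l(l+1)/12 - 1) * (1/(2l)) * (4/(l+5))."""
    return (Fraction(l * (l + 1), 12) - 1) * Fraction(1, 2 * l) * Fraction(4, l + 5)


for l in (5, 7, 11, 13, 101, 1009):
    assert lhs_coefficient(l) == 1 / (6 + epsilon(l))
```

Both sides reduce to $(l^2 + l - 12)/(6l(l+5))$. Since $l\,\varepsilon_l \to 24$ as $l \to \infty$, this also illustrates the statement $\varepsilon_l = O(1/l)$.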

7.13. Baby Szpiro. To demonstrate the utility of Probabilistic Szpiro we give a “Baby” Szpiro inequality.

Theorem 7.13.1 (Baby Szpiro). Assume [Moc15a, Corollary 3.12] and Claim 5.0.1. For an elliptic curve $E$ over a field $F$ sitting in initial theta data $(\overline{F}/F, E_F, l, M, V, V_{\mathrm{mod}}^{\mathrm{bad}}, \epsilon)$ built from the field of moduli we have
\[
\frac{1}{6+\varepsilon_l}\cdot\frac{\ln|\Delta^{\min}_{E/F}|}{[F:\mathbb{Q}]} \le \ln\big([K:\mathbb{Q}]^{5/4}\big)\ln\big(|\mathrm{Disc}(K/\mathbb{Q})|^{5/4}\big) + \ln(\pi), \tag{7.17}
\]
where $\varepsilon_l = (24l + 72)/(l^2 + l - 12)$.

Proof of Baby Szpiro. This is just a simple application of the Probabilistic Szpiro for theta data (Theorem 7.12.1) using elementary bounds for the right hand side. We use
\[
\ln(\mathrm{Diff}) \le \ln\big(\mathrm{rad}|\mathrm{Disc}(K/\mathbb{Q})|\cdot[K:\mathbb{Q}]\big), \tag{7.18}
\]
\[
\sum_p \ln e_p \le \ln([K:\mathbb{Q}])\,\omega(|\mathrm{Disc}(K/\mathbb{Q})|), \tag{7.19}
\]
\[
\sum_p \big(1 - P_{\mathrm{unr},p}^{(l+1)/4}\big)\Big(\ln(b_p) + \frac{4}{l+1}\Big) \le \ln|\mathrm{Disc}(K/\mathbb{Q})|, \tag{7.20}
\]
where $\mathrm{rad}(N) = \prod_{p \mid N} p$ and $\omega(N) = \sum_{d \mid n} 1$ is the “number of divisors” function.$^{10}$ These together with the bounds $\omega(N) \le \ln(N)/\ln_2(N)$ give that the right hand side of the second probabilistic Szpiro is less than $\ln(Dd) + \ln(D)\ln(d) + \ln(D)$, where $D = |\mathrm{Disc}(K/\mathbb{Q})|$ and $d = [K:\mathbb{Q}]$. This simplifies to
\[
\ln(Dd) + \ln(D)\ln(d) + \ln(D) \le (\ln(D) + 2)(\ln(d) + 2).
\]
Since $D, d \ge \#\mathrm{SL}_2(\mathbb{F}_l) \ge 6840$, we will find an $a \in \mathbb{Q}$ such that $\ln(x^a) \ge \ln(x) + 2$. Solving the inequality gives $a \ge 2/\ln(x) + 1$, and since
\[
\frac{2}{\ln(x)} + 1 \le \frac{2}{\ln(6840)} + 1 \le \frac{5}{4}
\]
we get that $(\ln(D) + 2)(\ln(d) + 2) \le \ln(D^{5/4})\ln(d^{5/4})$, which proves the result.

8. Deriving Explicit Constants For Szpiro’s Inequality From Mochizuki’s Inequality

In order to get strong uniform versions of Szpiro’s inequality from Mochizuki’s inequality one needs to do some careful ramification analysis based on the Néron-Ogg-Shafarevich Criterion (§3). This process is a sort of “separation of variables”, writing an upper bound for $|\mathrm{Disc}(K/\mathbb{Q})|$ in terms of $d_0$, $l$, $|\mathrm{Disc}(F/\mathbb{Q})|$.

Here are the questions we need to answer: What makes a place $w \in V(K)_p$ ramify? What is the maximum possible ramification index $e(w/p)$ as we vary over $w \in V(K)_p$? Does the size of $p$ matter? We answer all of these questions in the subsequent section and apply these results to get our version of uniform Szpiro with exponent 24.
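Returning to the last step in the proof of Baby Szpiro: the threshold $6840$ and the exponent $5/4$ can be checked numerically. The sketch below rests on our own observation that $\#\mathrm{SL}_2(\mathbb{F}_l) = l(l^2-1)$ equals $6840$ at $l = 19$; the function names are hypothetical and floating point is used throughout.

```python
import math


def sl2_order(q):
    """Order of SL_2(F_q) for a prime power q: q (q^2 - 1)."""
    return q * (q * q - 1)


def exponent_needed(x):
    """Smallest a with ln(x^a) >= ln(x) + 2, namely a = 2/ln(x) + 1."""
    return 2.0 / math.log(x) + 1.0


def baby_bound_holds(D, d):
    """Check (ln D + 2)(ln d + 2) <= ln(D^(5/4)) ln(d^(5/4))."""
    lhs = (math.log(D) + 2.0) * (math.log(d) + 2.0)
    rhs = (1.25 * math.log(D)) * (1.25 * math.log(d))
    return lhs <= rhs


assert sl2_order(19) == 6840
assert exponent_needed(6840) <= 1.25
assert baby_bound_holds(6840, 6840)
```

Because $2/\ln(x) + 1$ decreases in $x$, the exponent $5/4$ works for every $x \ge 6840$.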

$^{10}$ The inequality (7.18) is an application of the bounds on the different order given in §2.2. We have
\[
p^{\mathrm{diff}_p} = E(p^{\mathrm{diff}(v/p)}) = \sum_{v \mid p}\frac{[F_{0,v}:\mathbb{Q}_p]}{[F_0:\mathbb{Q}]}\,p^{\mathrm{diff}(v/p)} \le \sum_{v \mid p}\frac{[F_{0,v}:\mathbb{Q}_p]}{[F_0:\mathbb{Q}]}\,p^{1 - \frac{1}{e(v/p)} + \mathrm{ord}_p e(v/p)} \le p\cdot p^{\mathrm{ord}_p[K:\mathbb{Q}]}.
\]
We then have
\[
\mathrm{Diff}(V/\mathbb{Q}) \le \prod_p p\cdot p^{\mathrm{ord}_p[K:\mathbb{Q}]} = \mathrm{rad}(|\mathrm{Disc}(K/\mathbb{Q})|)[K:\mathbb{Q}],
\]
which gives the result.

8.1. Ramification Analysis. The following Lemma answers the question about the maximal ramification index.

Lemma 8.1.1. For every place $w \in V(K)$ we have
\[
e(w/p) \le B_{l,d_0}
\]
where $B_{l,d_0} = 276480\,l^4 d_0$.

Proof. Fix $w \in V(K)$. We consider the successive extensions$^{11}$
\[
K \supset F = F_2'(E[15]) \supset F_2' = F_0(E[2], \sqrt{-1}) \supset F_0 \supset \mathbb{Q}.
\]
We label the various images of $w$ under the induced map on places as follows:
\[
V(K) \to V(F) \to V(F_2') \to V(F_0) \to V(\mathbb{Q}), \qquad w \mapsto v \mapsto v_2' \mapsto v_0 \mapsto p.
\]
In this notation we have
\[
e(w/p) = e(w/v)\,e(v/v_2')\,e(v_2'/v_0)\,e(v_0/p) \le l^4\cdot 23040\cdot 12\cdot[F_0:\mathbb{Q}] = 276480\,l^4 d_0 =: B_{l,d_0}.
\]
We explain these inequalities: each of the extensions (other than $F_0 \supset \mathbb{Q}$) is Galois and we have $G(K/F) \subset \mathrm{GL}_2(\mathbb{F}_l)$, $G(F/F_2') \subset \mathrm{GL}_2(\mathbb{Z}/15)$, and $\#G(F_2'/F_0) \mid 12$. Knowing that $\#\mathrm{GL}_2(\mathbb{F}_q) = q(q-1)(q^2-1)$ and plugging in explicit values gives the result.

As a Corollary we get the following.

Lemma 8.1.2. If $p > B$ then $e(w/p)$

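Figure 1. A breakdown of the ramification of a tuple $\vec{v} = (v_0, v_1,\dots,v_j) \in V_p^{j+1}$. [Axis: primes $p$, with marks at $l+1$ and $B_{l,d_0} = 276480\,l^4 d_0$; the three regions are labeled “$e(w/p)$ wild”, “$e(w/p)$ tame” (for $l+1 < p < B_{l,d_0}$), and “$e(w/p)$ small” (for $p > B_{l,d_0}$).]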
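The numerical constant in Lemma 8.1.1 comes from the orders of small matrix groups. The sketch below recomputes it, using the formula $\#\mathrm{GL}_2(\mathbb{F}_q) = q(q-1)(q^2-1)$ quoted in the proof and the Chinese Remainder decomposition $\mathrm{GL}_2(\mathbb{Z}/15) \cong \mathrm{GL}_2(\mathbb{F}_3)\times\mathrm{GL}_2(\mathbb{F}_5)$. The function names are ours and are not taken from the paper.

```python
def gl2_order(q):
    """Order of GL_2(F_q) for a prime q: q (q - 1) (q^2 - 1)."""
    return q * (q - 1) * (q * q - 1)


def ramification_bound(l, d0):
    """The constant B_{l,d0} = 276480 l^4 d0 from Lemma 8.1.1."""
    return 276480 * l**4 * d0


# #GL_2(Z/15) = #GL_2(F_3) * #GL_2(F_5) by the Chinese Remainder Theorem.
gl2_mod_15 = gl2_order(3) * gl2_order(5)
assert gl2_mod_15 == 23040
assert 12 * gl2_mod_15 == 276480

# The trivial estimate #GL_2(F_l) <= l^4 used for the top layer K/F.
for l in (5, 7, 11, 13):
    assert gl2_order(l) <= l**4
```

The same constant appears if one writes $2\cdot\#\mathrm{GL}_2(\mathbb{F}_2)\cdot\#\mathrm{GL}_2(\mathbb{F}_3)\cdot\#\mathrm{GL}_2(\mathbb{F}_5) = 2\cdot 6\cdot 48\cdot 480$.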

$^{11}$ These come from the hypotheses of “initial theta data built from the field of moduli”.

8.2. Explicit Szpiro. In the remainder of the paper we derive the following version of Szpiro’s inequality from (7.4).

Theorem 8.2.1. Assume [Moc15a, Corollary 3.12] and Claim 5.0.1. If $E/F$ is an elliptic curve in initial theta data $(\overline{F}/F, E_F, l, M, V, V_{\mathrm{mod}}^{\mathrm{bad}}, \epsilon)$ built from the field of moduli then
\[
|\Delta^{\min}_{E/F}| \le e^{A_0 d_0^2 l^4 + B_0 d_0}\big(|\mathrm{Cond}(E/F)|\cdot|\mathrm{Disc}(F/\mathbb{Q})|\big)^{24+\varepsilon_l}, \tag{8.1}
\]
where $A_0 = 84372107405$, $B_0 = 316495$ and $\varepsilon_l = (96(l+3))/(l^2+l-12)$.

Let $B = B_{l,d_0} = 2d_0\#\mathrm{GL}_2(\mathbb{Z}/30l)$. In what follows let $E_p^2$ be the expected value of
\[
\ln\mu_{\vec{v}}(\mathcal{I}_{\vec{v}}) + \|\mathrm{diff}_{\vec{v}}\|_1 - \|\mathrm{diff}_{\vec{v}}\|_\infty + 1_{\mathrm{ram}}(\vec{v}) \tag{8.2}
\]
over $\vec{v} = (v_0,\dots,v_j) \in \coprod_{j=1}^{(l-1)/2} V(F_0)_p^{j+1}$. Above, $\mathcal{I}_{\vec{v}}$ denotes the hull of the tensor product of log-shells for $\vec{v} \in \coprod_{j=1}^{(l-1)/2} V_p^{j+1}$. We compute $E_p^2$ for a given $p$ by breaking $p$ into the cases
• infinite: $p = \infty$
• large: $p > B$
• small: $p \le B$
Also, within each case we break (8.2) into three subcomputations:
\[
\underbrace{\ln\mu_{v}(\mathcal{I}_{v})}_{\mathrm{I}} + \underbrace{\|\mathrm{diff}_{\vec{v}}\|_1 - \|\mathrm{diff}_{\vec{v}}\|_\infty}_{\mathrm{II}} + \underbrace{1_{\mathrm{ram}}(\vec{v})}_{\mathrm{III}}.
\]
We then put these estimates together to get our results.

8.3. Computation at Infinite Places. Over the infinite places we have
\[
E_\infty^2 \le \frac{l+5}{4}\ln(\pi). \tag{8.3}
\]

Proof. At the infinite prime $\mathrm{II}_\infty = \mathrm{III}_\infty = 0$. The number $\ln(\pi)(l+5)/4$ is just the uniform average of $(j+1)\ln(\pi)$ over $j$, which comes from Claim 5.0.1.

8.4. Computation at Large Places. Over the large places we have
\[
\sum_{p > B,\ p \ne \infty} E_p^2 \le \frac{l+5}{4}\sum_{p > B,\ p \mid |D_{K,\mathbb{Q}}|} \ln(p), \tag{8.4}
\]
which we can further estimate using
\[
\sum_{p > B,\ p \mid |\mathrm{Disc}(K/\mathbb{Q})|} \ln(p) \le 2\Big(\frac{\ln|\mathrm{Disc}(F/\mathbb{Q})|}{[F:\mathbb{Q}]} + \frac{\ln|\mathrm{Cond}(E/F)|}{[F:\mathbb{Q}]}\Big).
\]
We give a proof of these two claims.

Proof. By the results of §8.1 we know that for $p > B_{l,d_0}$, and every place $v \in V_p$ we have $e(v/p)$

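For the different term $\mathrm{II}_p$ we have
\[
E_p^2\big(\|\mathrm{diff}_{\vec{v}}\|_1 - \|\mathrm{diff}_{\vec{v}}\|_\infty\big) \le \frac{l+1}{4}\,\mathrm{diff}_p\ln(p)
\]
where $\mathrm{diff}_p = \log_p(E(p^{\mathrm{diff}(v/p)} : v \in V_p))$. Using tameness, we have that $\mathrm{diff}(v/p) = 1 - 1/e(v/p) \le 1$, which implies $E(p^{\mathrm{diff}(v/p)}) \le E(p) = p$. This gives
\[
\mathrm{diff}_p \le 1_{\mathrm{ram}}(p) = \begin{cases} 0, & \forall v \mid p,\ e(v/p) = 1, \\ 1, & \exists v \mid p,\ e(v/p) > 1. \end{cases}
\]
Hence
\[
E_p^2\big(\|\mathrm{diff}_{\vec{v}}\|_1 - \|\mathrm{diff}_{\vec{v}}\|_\infty\big) \le \frac{l+1}{4}\,1_{\mathrm{ram}}(p)\ln(p).
\]

Finally we estimate the third term $\mathrm{III}_p$:
\[
E_p^2(1_{\mathrm{ram}}(\vec{v})) \le \big(1 - P_{\mathrm{unr},p}^{(l+1)/4}\big) \le 1_{\mathrm{ram}}(p).
\]

Putting the estimates for $\mathrm{I}_p$, $\mathrm{II}_p$, and $\mathrm{III}_p$ together in the case that $p > B$ we get
\[
\begin{aligned}
E_p^2 &\le \frac{l+5}{4}\ln(p)1_{\mathrm{ram}}(p) + \frac{l+1}{4}\ln(p)1_{\mathrm{ram}}(p) + 1_{\mathrm{ram}}(p) \\
&\le \Big(\frac{l+5}{4} + \frac{l+5}{4}\Big)\ln(p)1_{\mathrm{ram}}(p) \\
&= \frac{l+5}{2}\ln(p)1_{\mathrm{ram}}(p).
\end{aligned}
\]
To finish our result we use the Lemma just outside this proof environment.$^{12}$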

Lemma 8.4.1.
\[
\sum_{p \mid |\mathrm{Disc}(K/\mathbb{Q})|,\ p > B} \ln(p) \le 2\Big(\frac{\ln|\mathrm{Disc}(F/\mathbb{Q})|}{[F:\mathbb{Q}]} + \frac{\ln|\mathrm{Cond}(E/F)|}{[F:\mathbb{Q}]}\Big) \tag{8.5}
\]

$^{12}$ We have decided to label this theorem because it is a critical juncture where discriminants for $K$ meet conductors using Néron-Ogg-Shafarevich. This seems to be the critical step in relating the two.

Proof. The hard part of this formula is not getting too greedy, it seems. For $p > B$ we know that
\[
p \mid |\mathrm{Disc}(K/\mathbb{Q})| \iff p \mid |\mathrm{Disc}(F/\mathbb{Q})| \ \text{ or } \ p \mid |\mathrm{Cond}(E/F)|.
\]
It is enough to show that for each prime $p$ with $p > B$ and $p \mid |\mathrm{Disc}(F/\mathbb{Q})|$ we have
\[
\ln(p) \le 2\Big(\frac{-\ln|\mathrm{Disc}(F/\mathbb{Q})|_p - \ln|\mathrm{Cond}(E/F)|_p}{[F:\mathbb{Q}]}\Big).
\]
Note that we are using p-adic absolute values to take the p-parts of these integers. We observe that
\[
-\ln|\mathrm{Disc}(F/\mathbb{Q})|_p = \sum_{w \in V(F)_p} f(w/p)\,d(w/p)\ln(p), \qquad
-\ln|\mathrm{Cond}(E/F)|_p = \sum_{w \in V(F)_p} f(w/p)\,c_E(w)\ln(p),
\]
where we have used
\[
\mathrm{Cond}(E/F) = \prod_w P_w^{c_E(w)}, \qquad \mathrm{Disc}(F/\mathbb{Q}) = \prod_w P_w^{d(w/p_w)}.
\]
From §2.2 we know that $d(w/p_w) = e(w/p_w) - 1$ since $p_w > B$. Hence, it is enough to show that for each $p \mid |\mathrm{Disc}(K/\mathbb{Q})|$ we have
\[
\frac{2\sum_{w \mid p}\big(f(w/p)(e(w/p) - 1) + f(w/p)c_E(w)\big)}{[F:\mathbb{Q}]} \ge 1. \tag{8.6}
\]
Using that $p > B$ and $2(e(w/p) - 1) \ge e(w/p)$, together with the fact that $\sum_{w \in V(F)_p} f(w/p)e(w/p) = [F:\mathbb{Q}]$, we get
\[
\text{LHS of (8.6)} \ge \frac{[F:\mathbb{Q}] + \sum_{w \in V(F)_p} 2f(w/p)c_E(w)}{[F:\mathbb{Q}]} = 1 + 2\,\frac{\sum_{w \in V(F)_p} f(w/p)c_E(w)}{[F:\mathbb{Q}]}.
\]
This proves the result. We note that it is strictly greater than one since the initial theta data hypothesis says that there is a non-empty set of primes in $V_{\mathrm{mod}}^{\mathrm{bad}}$ of bad reduction.

8.5. Computation at Small Places. Over the small places we have$^{13}$
\[
\sum_{p \le B} E_p^2 \le (l+3)\ln(B)\pi(B) \asymp l^5 d_0. \tag{8.7}
\]

Proof. In the situation where $p \le B_{l,d_0}$ we have worse bounds for $E_p^2$. We will not care so much about these bounds as they turn into the constant which appears in Szpiro’s inequality.$^{14}$

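¹³ For $f(x)$ and $g(x)$ positive functions of a single real variable we write $f(x) \asymp g(x)$ as $x \to \infty$ if and only if $f(x) = O(g(x))$ and $g(x) = O(f(x))$ as $x \to \infty$.

¹⁴ On some level, of course, we do care because we would like better constants. This is secondary to achieving some Szpiro inequality, though.

In the first term $I_p$ we use
$$I_{\vec v} \subset D_{\vec v}\left( 0;\ p^{j+1} \prod_{i=0}^{j} \frac{e(v_i/p)}{\ln(p) \exp(1)} \right) \subset D_{\vec v}\left(0;\ p^{j+1} B^{j+1}\right).$$
This gives
$$\ln \mu_{\vec v}(I_{\vec v}) \le \log_p\left(p^{j+1} B^{j+1}\right) \ln(p) = (j+1) \ln(pB),$$
which in turn gives (for $p$ ramified)
$$E_p^2\left(\ln \mu_{\vec v}(I_{\vec v})\right) \le E\left((j+1) \ln(pB)\right) = \frac{l+5}{4} \ln(pB) \le \frac{l+5}{2} \ln(B),$$
where the last inequality used $p \le B$.

For term $II_p$ involving differents, we have
$$E_p^2\left( \|\operatorname{diff}_{\vec v}\|_1 - \|\operatorname{diff}_{\vec v}\|_\infty \right) = \frac{l+1}{4} \operatorname{diff}_p \ln(p).$$
Since $\operatorname{diff}(v/p) \le 1 - 1/e(v/p) + \operatorname{ord}_p(e(v/p))$ we get $\operatorname{diff}(v/p) \le 1 + \operatorname{ord}_p[K:\mathbb{Q}]$, which proves
$$\operatorname{diff}_p \le \log_p\left(E\left(p^{1 + \operatorname{ord}_p[K:\mathbb{Q}]}\right)\right) = \log_p\left(p^{1 + \operatorname{ord}_p[K:\mathbb{Q}]}\right) = 1 + \operatorname{ord}_p([K:\mathbb{Q}]).$$
Hence we have
$$E_p^2\left( \|\operatorname{diff}_{\vec v}\|_1 - \|\operatorname{diff}_{\vec v}\|_\infty \right) \le \frac{l+1}{4} \left(1 + \operatorname{ord}_p[K:\mathbb{Q}]\right) \ln(p) \le \frac{l+1}{2} \ln(B).$$

Finally in term $III_p$ we have
$$E_p^2\left(1_{\mathrm{ram}}(\vec v)\right) \le \left(1 - P_{\mathrm{unr},p}^{\frac{l+1}{4}}\right) \le 1_{\mathrm{ram}}(p).$$
Putting the estimates for $I_p$, $II_p$, and $III_p$ together we get
$$\begin{aligned}
\sum_{p \le B} E_p^2 &\le \sum_{p \le B} \left( \frac{l+5}{2} \ln(B) + \frac{l+1}{2} \ln(B) + 1 \right) \\
&\le \left((l+3) \ln(B) + 1\right) \pi(B) \\
&\le (l+3) \ln(B)\, \pi(B).
\end{aligned}$$
This gives our main result. The asymptotic is then derived by using bounds on the prime counting function $\pi(x)$. One such bound is Dusart's bound [Dus18], which states that for $x > 1$ we have
$$\pi(x) \le \frac{x}{\ln(x)} \left( 1 + \frac{1.3}{\ln(x)} \right). \tag{8.8}$$
This then shows, using $B = 276480\, l^4 d_0$, that
$$(l+3) \ln(B)\, \pi(B) \le (l+3) B \left( 1 + \frac{1.3}{\ln(B)} \right) \asymp l^5 d_0 \quad \text{as } l \to \infty.$$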
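The bound (8.8) can be checked numerically on a small range. The sketch below sieves the primes up to $10^5$ and compares $\pi(x)$ with the right-hand side of (8.8) at every integer $2 \le x \le 10^5$; this is only a sanity check on a range far below $B$, not a proof. The helper `small_place_bound`, which evaluates $(l+3)B(1 + 1.3/\ln B)$ with $B = 276480\, l^4 d_0$, is a hypothetical name introduced here for illustration.

```python
import math


def prime_counts(n):
    """Return a list pi with pi[x] = number of primes <= x, for 0 <= x <= n."""
    is_prime = bytearray([1]) * (n + 1)
    is_prime[0:2] = b"\x00\x00"
    for i in range(2, math.isqrt(n) + 1):
        if is_prime[i]:
            # Zero out the multiples of i starting at i*i.
            is_prime[i * i :: i] = bytearray(len(range(i * i, n + 1, i)))
    counts, running = [0] * (n + 1), 0
    for x in range(n + 1):
        running += is_prime[x]
        counts[x] = running
    return counts


def dusart_upper(x):
    """Right-hand side of (8.8), with the rounded constant 1.3."""
    return x / math.log(x) * (1 + 1.3 / math.log(x))


def small_place_bound(l, d0):
    """(l + 3) * B * (1 + 1.3 / ln B) with B = 276480 * l^4 * d0."""
    B = 276480 * l**4 * d0
    return (l + 3) * B * (1 + 1.3 / math.log(B))


N = 100_000
pi = prime_counts(N)
assert all(pi[x] <= dusart_upper(x) for x in range(2, N + 1))

# The ratio to l^5 * d0 stays bounded, consistent with the stated asymptotic.
for l in (5, 11, 101, 10007):
    print(l, small_place_bound(l, 1) / l**5)
```

The printed ratios stay within a bounded range and drift slowly toward 276480, since the correction factor $1 + 1.3/\ln(B)$ decays only logarithmically.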

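Remark 8.5.1. Using a slightly better form of Dusart's bound gives a $1/\ln(B)^2$ correction term.

8.6. Proof of Explicit Szpiro. We work from
$$\widehat{\deg}_{lgp}(P_\Theta) - \widehat{\deg}(P_q) \le \sum_p E_p^2. \tag{8.9}$$
The left hand side of (8.9) becomes
$$\left( \frac{l(l+1)}{12} - 1 \right) \frac{1}{2l}\, \frac{\ln|\Delta^{\min}_{E/F}|}{[F:\mathbb{Q}]},$$
and the right hand side of (8.9) becomes
$$\begin{aligned}
\sum_p E_p^2 &= \sum_{p \le B} E_p^2 + \sum_{p > B} E_p^2 + E_\infty^2 \\
&\le (l+3) \ln(B)\, \pi(B) + \frac{l+5}{2} \cdot 2\left( \frac{\ln|\operatorname{Disc}(F/\mathbb{Q})|}{[F:\mathbb{Q}]} + \frac{\ln|\operatorname{Cond}(E/F)|}{[F:\mathbb{Q}]} \right) + \frac{l+5}{4} \ln(\pi).
\end{aligned}$$
We now divide both sides by $(l+5)$. The coefficient of the left hand side becomes
$$\left( \frac{l(l+1)}{12} - 1 \right) \frac{1}{2l}\, \frac{1}{l+5} = \frac{l^2 + l - 12}{24\, l (l+5)} =: \frac{1}{24 + \varepsilon_l},$$
where solving for $\varepsilon_l$ gives
$$\varepsilon_l = \frac{96(l+3)}{l^2 + l - 12}.$$
After multiplying through by $[F:\mathbb{Q}]$ and bounding $\frac{l+3}{l+5} \le 1$ and $\frac{1}{4} \le 1$, we now have
$$\frac{1}{24 + \varepsilon_l} \ln|\Delta^{\min}_{E/F}| \le \left[ \ln(B)\, \pi(B) + \ln(\pi) \right] [F:\mathbb{Q}] + \ln|\operatorname{Disc}(F/\mathbb{Q})| + \ln|\operatorname{Cond}(E/F)|. \tag{8.10}$$
Finally, using $d_1 = 276480$ (the upper bound on $[F:F_0]$) so that $B = l^4 d_1 d_0$, we get
$$\left[ \ln(B)\, \pi(B) + \ln(\pi) \right] [F:\mathbb{Q}] \le \left( l^4 d_0 d_1 \left( 1 + \frac{1.3}{\ln(d_1)} \right) + \ln(\pi) \right) d_1 d_0 \le A_0 d_0^2 l^4 + B_0 d_0,$$
where $A_0 = 84372107405$ and $B_0 = 316495$. This gives our result after rewriting (8.10) multiplicatively with new bounds.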
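The arithmetic in this subsection can be reproduced mechanically. The sketch below checks the coefficient identity and the formula for $\varepsilon_l$ with exact rationals, and then recomputes the two numerical constants. It assumes that $A_0$ and $B_0$ are obtained by rounding up $d_1^2 (1 + 1.3/\ln d_1)$ and $\ln(\pi)\, d_1$ respectively; that reading is ours and is not stated explicitly in the text. The function names are hypothetical.

```python
import math
from fractions import Fraction

D1 = 276480  # upper bound on [F : F_0] quoted in the text


def lhs_coefficient(l):
    """(l(l+1)/12 - 1) * 1/(2l) * 1/(l+5) as an exact rational."""
    return (Fraction(l * (l + 1), 12) - 1) * Fraction(1, 2 * l) * Fraction(1, l + 5)


def epsilon(l):
    """epsilon_l = 96 (l + 3) / (l^2 + l - 12); requires l != 3."""
    return Fraction(96 * (l + 3), l * l + l - 12)


for l in (5, 7, 11, 13, 1009):
    # Closed form of the coefficient after dividing by (l + 5).
    assert lhs_coefficient(l) == Fraction(l * l + l - 12, 24 * l * (l + 5))
    # The definition 1/(24 + epsilon_l) recovers the same number.
    assert lhs_coefficient(l) == 1 / (24 + epsilon(l))

# Assumed derivation of the constants (rounded up to integers).
a0_raw = D1**2 * (1 + 1.3 / math.log(D1))
b0_raw = math.log(math.pi) * D1
print(math.ceil(a0_raw), math.ceil(b0_raw))
```

Since $\varepsilon_l \to 0$ as $l \to \infty$, the coefficient $1/(24 + \varepsilon_l)$ approaches $1/24$; for example, `float(epsilon(1009))` is already below $0.1$.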

References

[AM69] Michael Atiyah and Ian Macdonald, Introduction to commutative algebra, Addison Wesley, 1969. 2.8
[Aut19] Stacks Project Authors, Stacks project, 2019. 2.2
[Con] Keith Conrad, Differents, Notes of course, available on-line. 2.2
[DH20a] Taylor Dupuy and Anton Hilado, Log-Kummer Correspondences and Mochizuki's Third Indeterminacy, pre-print (2020). 4
[DH20b] Taylor Dupuy and Anton Hilado, The Statement of Mochizuki's Corollary 3.12, Initial Theta Data, and the First Two Indeterminacies. (document), 1, 1, 1, 1, 3.4, 6.2, 8, 6.3, 7.5
[Dus18] Pierre Dusart, Explicit estimates of some functions over primes, Ramanujan J. 45 (2018), no. 1, 227–251. MR 3745073 8.5
[Fes15] Ivan Fesenko, Arithmetic deformation theory via arithmetic fundamental groups and nonarchimedean theta-functions, notes on the work of Shinichi Mochizuki, Eur. J. Math. 1 (2015), no. 3, 405–440. MR 3401899 1, 7.5.3
[Hos15] Yuichiro Hoshi, IUT Hodge-Arakelov-theoretic evaluation, 2015. 1
[Hos17] Yuichiro Hoshi, [IUTchIII-IV] from the point of view of mono-anabelian transport, 2017. 1, 7.5.3
[Hos18] Yuichiro Hoshi, Introduction to mono-anabelian geometry, 2018. 1
[Ked15] Kiran Kedlaya, Etale theta function, 2015. 1
[Moc15a] Shinichi Mochizuki, Inter-universal Teichmüller theory III: Canonical splittings of the log-theta-lattice, RIMS preprint (2015). (document), 1, 1.0.3, 1.0.4, 1.0.5, 1, 1a, 5, 7.5, 7.5.1, 7.12.1, 7.13.1, 8.2.1
[Moc15b] Shinichi Mochizuki, Inter-universal Teichmüller theory IV: log-volume computations and set-theoretic foundations, RIMS preprint 1 (2015). (document), 2.8, 4.2, 4.2.4, 5, 7.5
[Moc15c] Shinichi Mochizuki, Topics in absolute anabelian geometry III: global reconstruction algorithms, J. Math. Sci. Univ. Tokyo 22 (2015), no. 4, 939–1156. MR 3445958 4.3
[Moc17] Shinichi Mochizuki, The mathematics of mutually alien copies: From Gaussian integrals to inter-universal Teichmüller theory, 2017. 1
[Moc18] Shinichi Mochizuki, Comments on the manuscript (2018-08 version) by Scholze-Stix concerning inter-universal Teichmüller theory (iutch), 2018. 1
[Mok15] Chung Pang Mok, Notes on Hodge theaters (for the 2015 Oxford workshop), Handwritten Notes, 2015. 1
[Neu99] Jürgen Neukirch, Algebraic number theory, Grundlehren der Mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences], vol. 322, Springer-Verlag, Berlin, 1999, Translated from the 1992 German original and with a note by Norbert Schappacher, With a foreword by G. Harder. MR 1697859 2.2, 4
[Rob00] Alain M. Robert, A course in p-adic analysis, Graduate Texts in Mathematics, vol. 198, Springer-Verlag, New York, 2000. MR 1760253 4
[Rob 3] David Roberts, A crisis of identification, Inference Review 4 (2019), no. 3. 3
[Sil09] Joseph H. Silverman, The arithmetic of elliptic curves, vol. 106, Springer Science & Business Media, 2009. 3.5
[Sil13] Joseph H. Silverman, Advanced topics in the arithmetic of elliptic curves, vol. 151, Springer Science & Business Media, 2013. 3.5
[SS17] Peter Scholze and Jakob Stix, Why abc is still a conjecture, 2017. 1, 1, 1e, 2, 7.5.3
[ST68] Jean-Pierre Serre and John Tate, Good reduction of abelian varieties, Ann. of Math. (2) 88 (1968), 492–517. MR 0236190 1, 3.9.1
[Sti15] Jakob Stix, Reconstruction of fields using Belyi cuspidalization, 2015. 1
[Sut15] Drew Sutherland, Notes for 18.785 - number theory I, MIT course notes (2015). 2.2
[Tan18] Fucheng Tan, Note on IUT, 2018. 1, 2
[Yam17] Go Yamashita, A proof of the abc conjecture after Mochizuki, RIMS preprint (2017). 1
