Matijasevič’s Theorem: Diophantine descriptions of recursively enumerable sets
Bachelor’s thesis
S.R. Groen ∗
First supervisor: prof. dr. J. Top Second supervisor: dr. A.E. Sterk
2017
Abstract
In 1970, Yuri Matijasevič finished the proof that all recursively enumerable sets are Diophantine, rendering Hilbert’s tenth problem unsolvable. He did so by showing that exponential Diophantine sets are Diophantine, which complemented earlier work done by Martin Davis, Hilary Putnam and Julia Robinson. In this thesis, we analyze, explore and apply this result. We reconstruct a known way to create a Diophantine description of exponentiation: using the unit group of $\mathbb{Z}[\sqrt{d}]$. This provides a mechanism with which we can create a Diophantine description of any recursively enumerable set. We apply this to find Diophantine descriptions of some specific sets of integers. We also study the complexity of such Diophantine descriptions. Furthermore, we try to create a new method of creating a Diophantine description of exponentiation, using the unit group of $\mathbb{Z}[\sqrt[3]{d}]$, whose structure is similar to that of the unit group of $\mathbb{Z}[\sqrt{d}]$. It turns out that such a similar method does not work, as the desired divisibility sequences do not exist.
Keywords: Hilbert’s tenth problem, Diophantine sets, Matijasevič’s theorem, number rings, algebraic number theory.
∗Faculty of mathematics and natural sciences, Rijksuniversiteit Groningen, Nijenborgh 4, 9747 AG Groningen, The Netherlands, e-mail: [email protected]
Contents
1 Introduction
1.1 Hilbert’s tenth problem
1.2 Diophantine sets
1.3 Recursively enumerable sets
1.4 Matijasevič’s Theorem
1.5 The DPR-theorem
1.6 From exponential Diophantine to Diophantine
1.7 The aim of this thesis
2 A Diophantine description of exponentiation using $\mathbb{Z}[\sqrt{d}]$
2.1 $\mathbb{Z}[\sqrt{d}]$ and its unit group
2.2 The Pell equation
2.3 Cyclicity
2.4 A suitable choice for d
2.5 Behavior of $x_n(a)$ and $y_n(a)$
2.6 Finding the solution number using divisibility properties
2.7 A Diophantine description of $x_n(a)$ and $y_n(a)$
2.8 A Diophantine description of exponentiation
3 Expanding the language of Diophantine descriptions
3.1 Diophantine descriptions of important functions
3.2 The Bounded Universal Quantifier Theorem
3.3 The Sequence Number Theorem
3.4 Putnam’s trick
4 Application
4.1 The set of primes
4.1.1 The straightforward definition
4.1.2 Wilson’s Theorem
4.2 The divisor number function
4.3 The divisor sum function
4.4 Euler’s φ function
4.5 Gödel’s incompleteness theorems
5 The complexity of Diophantine descriptions
5.1 Degree
5.2 Dimension
6 A Diophantine description of exponentiation using $\mathbb{Z}[\sqrt[3]{d}]$
6.1 $\mathbb{Z}[\sqrt[3]{d}]$ and its unit group
6.2 Application of Dirichlet’s theorem
6.3 A suitable choice for d
6.4 The three-dimensional Pell equation
6.5 Behavior of $x_n(a)$, $y_n(a)$ and $z_n(a)$
6.6 Finding the solution number using divisibility properties
6.7 Are $x_n(a)$, $y_n(a)$ and $z_n(a)$ Diophantine?
6.8 Comparison to two-dimensional case
7 Conclusion and outlook
1 Introduction
1.1 Hilbert’s tenth problem
In 1900, David Hilbert posed 23 then unsolved problems in mathematics that he encouraged mathematicians to solve in the twentieth century. Some of these problems, such as the Riemann hypothesis, are still unsolved. Hilbert’s tenth problem plays an important role in this thesis. Through work by Martin Davis, Julia Robinson, Hilary Putnam and Yuri Matijasevič, this problem has been shown to be unsolvable. The problem was posed in 1900 as follows.
Hilbert’s tenth problem: Devise an algorithm that, given any polynomial equation with integer coefficients as input, gives as output whether this polynomial has any roots over the integers. [Dav73]
In 1970, Matijasevič completed the proof that such an algorithm does not exist and that Hilbert’s tenth problem is thus impossible to solve. [Mat70] In this thesis, we will explore and apply the methods used in the proof of Matijasevič’s Theorem. A key notion in the theory applicable to this problem is that of Diophantine sets. This will be a vital concept throughout this thesis.
1.2 Diophantine sets
Definition 1.1. A set $S \subset \mathbb{Z}^n$ is Diophantine if there exists an $m \geq n$ and a $p \in \mathbb{Z}[X_1, X_2, \dots, X_m]$ for which the following holds:
\[ S = \{ s \in \mathbb{Z}^n \mid \exists t \in \mathbb{Z}^{m-n} \text{ s.t. } p(s, t) = 0 \} \]
A Diophantine set $S \subset \mathbb{Z}^n$ is thus the projection of the set of zeros in $\mathbb{Z}^m$ of the polynomial $p \in \mathbb{Z}[X_1, X_2, \dots, X_m]$ onto the first n coordinates. An equation of the form $p(X_1, X_2, \dots, X_m) = 0$ is also called a Diophantine equation, and a Diophantine description of S. We will call p the polynomial corresponding to S. In the following examples, the set of numbers $X_1$ for which the polynomial p has integer roots is Diophantine:
• The even numbers, with the corresponding polynomial $p = X_1 - 2X_2$.
• The squares (of integers), with the corresponding polynomial $p = X_1 - X_2^2$.
• The non-negative integers, with the corresponding polynomial $p = X_1 - X_2^2 - X_3^2 - X_4^2 - X_5^2$. Here we use Lagrange’s result that every non-negative integer is the sum of four squares (and obviously negative integers cannot have that property).
• The Pythagorean hypotenuse integers, with the corresponding polynomial $p = X_1^2 - X_2^2 - X_3^2$.
Another example of a Diophantine set is the set of composite numbers. At first sight, $p = X_1 - X_2 X_3 = 0$ might seem like a suitable Diophantine description of this set, but it allows one (or both) of the factors $X_2$ and $X_3$ of $X_1$ to be equal to 1. We therefore also need both factors to be larger than 1, in order to find the positive composite numbers. That results in either of the following equivalent Diophantine descriptions of the set of composite numbers:
\[ p = (X_1 - X_2 X_3)^2 + (X_2 - X_4^2 - X_5^2 - X_6^2 - X_7^2 - 2)^2 + (X_3 - X_8^2 - X_9^2 - X_{10}^2 - X_{11}^2 - 2)^2 = 0 \]
\[ p = X_1 - (X_2^2 + X_4^2 + X_5^2 + X_6^2 + 2)(X_3^2 + X_7^2 + X_8^2 + X_9^2 + 2) = 0 \]
If we also wanted to find the negative composite numbers, this would be equivalent to also allowing $X_3$ to be smaller than −1 instead of greater than 1. We would then have the following polynomial:
\[ p = (X_1 - X_2 X_3)^2 + (X_2 - X_4^2 - X_5^2 - X_6^2 - X_7^2 - 2)^2 + (X_3 - X_8^2 - X_9^2 - X_{10}^2 - X_{11}^2 - 2)^2 \, (X_3 + X_{12}^2 + X_{13}^2 + X_{14}^2 + X_{15}^2 + 2)^2 = 0 \]
We will later see that the set of primes, which is essentially the complement of the set of composite numbers, is also Diophantine. However, this is far from trivial and may feel counterintuitive at this moment.
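The second description of the positive composite numbers can be checked by brute force for small inputs. The following is only a sketch under stated assumptions: the function names are hypothetical, the variables are restricted to non-negative values (they only occur squared), and the search bound isqrt(n) is chosen because a factor of n never exceeds n.

```python
from itertools import product
from math import isqrt

def four_square_values(bound):
    """All values w^2 + x^2 + y^2 + z^2 + 2 with 0 <= w, x, y, z <= bound."""
    return {w * w + x * x + y * y + z * z + 2
            for w, x, y, z in product(range(bound + 1), repeat=4)}

def has_zero(n):
    """True if n - f * g = 0 for two factors f, g of the four-square form.

    Assumes n >= 0. Coordinates up to isqrt(n) suffice, because every
    factor of n is at most n.
    """
    values = four_square_values(isqrt(n))
    return any(n % f == 0 and n // f in values for f in values)

def is_composite(n):
    """Reference definition: n has a divisor k with 1 < k < n."""
    return any(n % k == 0 for k in range(2, n))
```

For every n in a small range, has_zero(n) and is_composite(n) should agree, which illustrates that the polynomial has a zero with $X_1 = n$ exactly when n is composite.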
As the example of the set of composite numbers shows, using the four squares theorem so many times is quite a hassle. We therefore assume from now on that every variable ranges over the non-negative integers only. This is no loss of generality, as we can always introduce a minus sign to let a number be negative. We conclude that, for any set S, the following three are equivalent:
1. There exists a Diophantine description of S in the integers.
2. There exists a Diophantine description of S in the non-negative integers.
3. There exists a Diophantine description of S in the positive integers.
This is because we can always introduce a minus sign or use Lagrange’s four square theorem.
It is straightforward to prove that the set of Diophantine sets is closed under union: if we have $S_1$ and $S_2$, with corresponding polynomials $p_1$ and $p_2$, then the set $S_1 \cup S_2$ has corresponding polynomial $p_1 \cdot p_2$. This polynomial is zero if and only if at least one of the polynomials $p_1$ and $p_2$ is zero, which means we are dealing with an element of $S_1$ or $S_2$. Similarly, the set $S_1 \cap S_2$ has corresponding polynomial $p_1^2 + p_2^2$ (with the auxiliary variables of $p_1$ and $p_2$ renamed so that they are distinct). We have already applied this technique to our Diophantine description of composite numbers. Not every Diophantine set has a Diophantine complement, but this is not trivial to see either.
We can now see what the algorithm Hilbert asked for should do precisely: it should be able to decide within a finite amount of time, given any polynomial as input, whether the corresponding Diophantine set is empty or non-empty.
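The closure under union and intersection can be tried on a small example. The sketch below combines the polynomials for the even numbers and the squares; the function names and the bounded search are assumptions made for illustration, and the two auxiliary variables t and u are deliberately kept distinct.

```python
def p_even(s, t):
    return s - 2 * t          # has a zero in t iff s is even

def p_square(s, u):
    return s - u * u          # has a zero in u iff s is a square

def p_union(s, t, u):
    # A product vanishes iff at least one factor vanishes.
    return p_even(s, t) * p_square(s, u)

def p_intersection(s, t, u):
    # A sum of squares vanishes iff both terms vanish.
    return p_even(s, t) ** 2 + p_square(s, u) ** 2

def in_set(p, s, bound):
    """Bounded search for a zero (t, u) of p(s, t, u).

    This is only a demonstration on a finite box, not a decision procedure.
    """
    rng = range(-bound, bound + 1)
    return any(p(s, t, u) == 0 for t in rng for u in rng)
```

With bound 30, for instance, in_set(p_union, s, 30) holds for the even numbers and the squares below 30, while in_set(p_intersection, s, 30) only holds for the even squares 0, 4 and 16.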
1.3 Recursively enumerable sets
Another important notion will be the notion of recursively enumerable sets.
Definition 1.2. A set S is recursively enumerable if there exists an algorithm (formally, a Turing machine) that enumerates S. Equivalently: an algorithm exists that halts precisely when its input is an element of S.
Examples of such sets are the following:
• Any finite set: $S = \{s_1, s_2, \dots, s_n\} = \{s \mid s = s_1 \lor s = s_2 \lor \dots \lor s = s_n\}$
• The positive numbers: $S = \{s \mid s > 0\}$
• The even numbers: $S = \{s \mid \exists y \; s = 2y\}$
• The set of powers of 2: $S = \{s \mid \exists y \; s = 2^y\}$
• The set of prime numbers: $S = \{s \mid s > 1 \land \neg(\exists y)_{1 < y < s} \; y \mid s\}$

1.4 Matijasevič’s Theorem
As said, Hilbert’s tenth problem boils down to devising an algorithm that decides membership of any given Diophantine set, within a finite amount of time. Already in 1936, Alonzo Church showed that an algorithm that decides membership of any given recursively enumerable set cannot exist. [Chu36] It was only in 1970 that Matijasevič completed the work of proving Matijasevič’s Theorem, also known as the MRDP-theorem or DPRM-theorem (in credit to the others that had contributed).
Theorem 1.3. Matijasevič’s Theorem. A set is Diophantine if and only if it is recursively enumerable. [Mat70]
Then, since there does not exist an algorithm that can determine, within a finite amount of time, for any recursively enumerable set whether it is non-empty, neither can we find an algorithm that does this for Diophantine sets. But this algorithm was exactly what Hilbert had asked for. Thus Matijasevič’s result implies that Hilbert’s tenth problem is unsolvable.
The proof of Matijasevič’s Theorem is a main subject of this thesis. We can make a good start by proving the following lemma:
Lemma 1.4. If a set is Diophantine, then it is recursively enumerable.
Proof. Let $S \subset \mathbb{Z}^n$ be a Diophantine set, with the corresponding polynomial $p \in \mathbb{Z}[X_1, X_2, \dots, X_m]$ and $n \leq m$. Let $s \in \mathbb{Z}^n$ be arbitrary. Then an algorithm that systematically checks all elements of $\mathbb{Z}^{m-n}$ (for instance by ordering them by the sum of the absolute values of their coordinates, so that only finitely many elements precede any given one) will suffice. If s is in S, then, by definition of S, $p(s, t) = 0$ for some $t \in \mathbb{Z}^{m-n}$. We know that this t will be found by our algorithm after a finite amount of time, and therefore the algorithm will halt. On the other hand, when s is not in S, no such $t \in \mathbb{Z}^{m-n}$ can be found, and thus our algorithm will run forever. As this algorithm halts precisely for elements in S, we conclude that S is recursively enumerable. □
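The search procedure from the proof of Lemma 1.4 can be sketched as follows. This is an illustration rather than a definitive implementation: the names witnesses and semi_decide are hypothetical, the candidates are ordered by their largest absolute coordinate (any ordering in which every tuple is reached after finitely many steps would do), at least one auxiliary variable is assumed (k ≥ 1), and the max_steps parameter is only a safety cap so that the demonstration terminates.

```python
from itertools import count, product

def witnesses(k):
    """Enumerate all of Z^k (k >= 1), shell by shell.

    Shell r contains the tuples whose largest absolute coordinate is r,
    so every tuple is reached after finitely many steps.
    """
    for r in count(0):
        for t in product(range(-r, r + 1), repeat=k):
            if max(map(abs, t)) == r:
                yield t

def semi_decide(p, s, k, max_steps=None):
    """Return a witness t with p(*s, *t) == 0 if one exists.

    Without max_steps the loop runs forever when s is not in the set,
    exactly as in the proof; max_steps is only a cap for experiments.
    """
    for step, t in enumerate(witnesses(k)):
        if max_steps is not None and step >= max_steps:
            return None
        if p(*s, *t) == 0:
            return t
```

For example, with p(x, a, b) = x − a² − b² the call semi_decide(p, (5,), 2) halts with a witness, whereas for x = 3 the search only stops because of the cap.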
Moreover, S was chosen arbitrarily, so we conclude that every Diophantine set is recursively enumerable.
The algorithm described here is not the algorithm Hilbert asked for, as this algorithm is not able to conclude within a finite amount of time that a Diophantine equation does not have any integer solutions.
Lemma 1.4 has a short proof. The inclusion the other way around, which is that all recursively enumerable sets are Diophantine, is a much more complicated and surprising result. It means that any recursively enumerable set of integers, e.g. the set of primes, corresponds to a polynomial that has a zero in the integers precisely when the first coordinate is an element of our set. In other words, every property of integers that can be found algorithmically is expressible by a Diophantine equation.
Before Matijasevič started his work, a lot of work had already been done on the connection between recursively enumerable sets and Diophantine sets. The most important result of that research is described in the following section.

1.5 The DPR-theorem
In 1961, Martin Davis, Julia Robinson and Hilary Putnam proved the Davis-Putnam-Robinson-theorem, or in short DPR-theorem, which was a very important step towards Matijasevič’s Theorem. In order to state this theorem, we must first define what an exponential Diophantine set is.
Definition 1.5. A set is exponential Diophantine if there exists an $m \geq 0$, a base b and a polynomial $p \in \mathbb{Z}[X_1, X_2, \dots, X_{n+2m}]$ such that the following holds:
\[ S = \{ s \in \mathbb{Z}^n \mid \exists (t_1, t_2, \dots, t_m) \in \mathbb{Z}^m \text{ s.t. } p(s, t_1, t_2, \dots, t_m, b^{t_1}, b^{t_2}, \dots, b^{t_m}) = 0 \} \]
We can now state the DPR-theorem.
Theorem 1.6. DPR-theorem. Every recursively enumerable set is exponential Diophantine. [DPR61]
The proof of this theorem is intricate and contains quite a lot of analysis of algorithms and the Turing machine. It will therefore not be included in this thesis. An outline of the proof can be found in [DPR61] or in secondary literature, for instance [Kui10].
1.6 From exponential Diophantine to Diophantine
Now that we take the DPR-theorem for granted, the step towards Diophantine sets has become smaller. All we have to prove now, and what Matijasevič proved in 1970, is that all exponential Diophantine sets are Diophantine. This is equivalent to proving that sets of the form
\[ S = \{ (a, b, c) \in \mathbb{Z}^3 \mid a = b^c \} \]
are Diophantine. Put otherwise, exponentiation is a Diophantine function.
There are several strategies for finding a Diophantine description of such sets. The strategy first used by Matijasevič in his original proof was based on the Fibonacci numbers. [Mat70] In slightly different words, it was based on the unit group (the elements with a multiplicative inverse) of the number ring $\mathbb{Z}[\frac{1+\sqrt{5}}{2}]$. After Matijasevič’s proof, similar proofs were given using $\mathbb{Z}[\sqrt{d}]$ for a non-square d. Although Matijasevič’s construction sufficed to show that all exponential Diophantine sets are Diophantine, the second approach is more useful for the systematic construction of Diophantine descriptions of recursively enumerable sets.
The use of unit groups of number rings can briefly be explained as follows. We will see that these unit groups have a cyclic subgroup, for which there is a Diophantine description. If every element of this subgroup is a power of some fundamental unit, the units behave exponentially with respect to the exponent to which the fundamental unit is raised. Roughly speaking, if we can relate a unit to the exponent using Diophantine equations, a Diophantine description of exponentiation follows. Divisibility sequences, such as the Fibonacci sequence, play a central role in relating a certain unit to the exponent.
The proof by Matijasevič is constructive: it contains a recipe that can turn any recursively enumerable set into a Diophantine set. Specifically, it can turn an algorithm that enumerates a set into a polynomial corresponding to that set.
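To make the notion of a divisibility sequence concrete, the sketch below checks the well-known property of the Fibonacci numbers that $\gcd(F_m, F_n) = F_{\gcd(m, n)}$, which in particular means that $F_m$ divides $F_n$ whenever m divides n. The helper name fib and the indexing $F_0 = 0$, $F_1 = 1$ are choices made for this illustration.

```python
from math import gcd

def fib(n):
    """n-th Fibonacci number with F_0 = 0 and F_1 = 1."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

def is_strong_divisibility(limit):
    """Check gcd(F_m, F_n) == F_gcd(m, n) for all 1 <= m, n < limit."""
    return all(gcd(fib(m), fib(n)) == fib(gcd(m, n))
               for m in range(1, limit)
               for n in range(1, limit))
```

It is this kind of link between divisibility of the terms and divisibility of the indices that allows the index, and hence the exponent, to be recovered from the terms.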
1.7 The aim of this thesis
This thesis will focus on the construction of Diophantine descriptions of recursively enumerable sets. Firstly, in section 2, we will reconstruct the proof that exponential Diophantine sets are Diophantine, using $\mathbb{Z}[\sqrt{d}]$. Subsequently, in section 3, we give some new tools for Diophantine descriptions, using our Diophantine description of exponentiation. We then determine, in section 4, Diophantine descriptions corresponding to concrete sets, such as the following:
• The set of primes.
• The set of perfect numbers (numbers that are equal to the sum of their proper divisors).
• Highly composite numbers (numbers that have more divisors than all smaller numbers).
As said, Matijasevič’s proof is constructive, so Diophantine descriptions of these sets of integers can be made explicit in a systematic way. Finding these Diophantine descriptions will provide insight into how the recipe and all the theory involved work precisely.
In section 5 we will also look at the complexity of Diophantine descriptions in terms of dimension (the number of variables in the Diophantine equation) and degree (the degree of the Diophantine equation). We examine to what extent both can be minimized.
Furthermore, in section 6 we will discuss a different strategy for proving that exponential Diophantine sets are Diophantine. Instead of $\mathbb{Z}[\sqrt{d}]$, we will look at the number ring $\mathbb{Z}[\sqrt[3]{d}]$. In this ring, everything becomes a bit more complicated, and we work in a three-dimensional system instead of a two-dimensional one. However, its unit group has the same useful properties. If our approach using $\mathbb{Z}[\sqrt[3]{d}]$ works, this entails a new proof of Matijasevič’s Theorem. As it turns out, many lemmas from section 2 have a three-dimensional analog in section 6, but the new approach ultimately does not work. It cannot similarly lead to a new Diophantine description of exponentiation.
This is because the required divisibility sequences do not exist, which had already been shown in 1936 (in a paper not related to Matijasevič’s Theorem). [Hal36] This is an unexpected twist, as we thought a divisibility sequence would follow naturally. It makes Matijasevič’s Theorem and the two-dimensional case even more subtle and special.
Treating the three-dimensional case also provides lots of knowledge of the theory of unit groups of number rings, as the proofs in the three-dimensional case are more intricate and need more advanced algebra. This knowledge also gives us a deeper insight into why the two-dimensional case works as well as it does, and into Matijasevič’s Theorem in general.

2 A Diophantine description of exponentiation using $\mathbb{Z}[\sqrt{d}]$
A common approach in finding a Diophantine description of exponentiation is to use the cyclicity of a subgroup of the unit group of $\mathbb{Z}[\sqrt{d}]$, for some non-square d. Elements of this subgroup correspond to solutions to the Pell equation. We first need to prove some lemmas about these solutions. We can eventually apply these to find a Diophantine description of exponentiation, which means that $\{(a, b, c) \mid a = b^c\}$ is a Diophantine set, the result that Matijasevič obtained in 1970. This section provides a construction of this Diophantine description of exponentiation. Our construction is similar to the construction in [Dav73].

2.1 $\mathbb{Z}[\sqrt{d}]$ and its unit group
A ring that is useful for our purpose is the following:
\[ \mathbb{Z}[\sqrt{d}] = \{ x + y\sqrt{d} \mid x, y \in \mathbb{Z} \} \]
in which d is a positive integer, but not the square of an integer. This ring is an example of a number ring:
Definition 2.1. A ring K is called a number ring if its field of fractions is a number field.
In our case, the field of fractions of $\mathbb{Z}[\sqrt{d}]$ is $\mathbb{Q}(\sqrt{d})$. The latter is an algebraic field extension of $\mathbb{Q}$ and hence a number field. We wish to determine which of the elements of this ring have a multiplicative inverse. That is, we are studying the unit group of $\mathbb{Z}[\sqrt{d}]$.
\[ \mathbb{Z}[\sqrt{d}]^\times = \{ \alpha \in \mathbb{Z}[\sqrt{d}] \mid \exists \beta \in \mathbb{Z}[\sqrt{d}] \text{ s.t. } \alpha \cdot \beta = 1 \} \]
It can straightforwardly be seen that this set is in fact a group under multiplication. Firstly, 1 is obviously a unit. Furthermore, the product of two units and the inverse of a unit are again units, which proves that the set of units is a group. In order to find out whether some element is a unit, we compute the norm of that element.
Definition 2.2. Let $\alpha = x + y\sqrt{d}$ be an element of $\mathbb{Z}[\sqrt{d}]$. We define the norm of α as follows:
\[ N(\alpha) = x^2 - dy^2 = (x + y\sqrt{d})(x - y\sqrt{d}) \tag{2.1} \]
The origin of this norm can be explained using a little module theory. The ring $\mathbb{Z}[\sqrt{d}]$ is a module over $\mathbb{Z}$, with the basis $\{1, \sqrt{d}\}$. Then, for every element of $\mathbb{Z}[\sqrt{d}]$ there exists a matrix in $\mathbb{Z}^{2 \times 2}$ that corresponds to multiplication by that element. In order to determine the columns of this matrix $M_\alpha$, we check what multiplying with α does with the elements of our basis:
\[ (x + y\sqrt{d}) \cdot 1 = x + y\sqrt{d} \]
\[ (x + y\sqrt{d}) \cdot \sqrt{d} = yd + x\sqrt{d} \]
This gives us the following matrix:
\[ M_\alpha = \begin{pmatrix} x & yd \\ y & x \end{pmatrix} \]
We then compute the determinant of this matrix.
\[ \det(M_\alpha) = \begin{vmatrix} x & yd \\ y & x \end{vmatrix} = x^2 - dy^2 = (x + y\sqrt{d})(x - y\sqrt{d}) \tag{2.2} \]
This norm resembles the usual norm for complex numbers or the Euclidean norm.
Lemma 2.3. $\alpha \in \mathbb{Z}[\sqrt{d}]$ is a unit if and only if $N(\alpha) = \pm 1$.
Proof. Only if: Suppose α is a unit. Since x and y are integers, $N(\alpha)$ will also be an integer. Furthermore, the norm is multiplicative: if $\alpha = x + y\sqrt{d}$ and $\beta = \chi + \psi\sqrt{d}$, then we have
\[ N(\alpha\beta) = N\big((x + y\sqrt{d})(\chi + \psi\sqrt{d})\big) \]
\[ = N\big((x\chi + y\psi d) + (x\psi + y\chi)\sqrt{d}\big) \]
\[ = (x\chi + y\psi d)^2 - d(x\psi + y\chi)^2 \]
\[ = x^2\chi^2 + 2dxy\chi\psi + d^2y^2\psi^2 - dx^2\psi^2 - 2dxy\chi\psi - dy^2\chi^2 \]
\[ = (x^2 - dy^2)(\chi^2 - d\psi^2) \]
\[ = N(\alpha)N(\beta) \]
Because of equation (2.2), this multiplicativity also follows directly from the multiplicativity of the determinant. Now, if α and β are each other’s inverse, then
\[ N(\alpha)N(\beta) = N(\alpha\beta) = N(1) = 1 \]
It follows that $N(\alpha)$ and $N(\beta)$ are each other’s inverse in $\mathbb{Z}$. The only units in $\mathbb{Z}$ are −1 and 1, so we conclude
\[ N(\alpha) = N(\beta) = \pm 1 \]
If: Suppose $\alpha = x + y\sqrt{d}$ is such that $N(\alpha) = \pm 1$.
Then it follows from equation (2.1) that
\[ (x + y\sqrt{d})(x - y\sqrt{d}) = x^2 - dy^2 = \pm 1 \]
We have thus found the multiplicative inverse of α, namely $\beta = \pm(x - y\sqrt{d})$, and we conclude that α is a unit. □
Lemma 2.3 provides a different notation for the unit group of $\mathbb{Z}[\sqrt{d}]$:
\[ x + y\sqrt{d} = \alpha \in \mathbb{Z}[\sqrt{d}]^\times \Leftrightarrow N(\alpha) = x^2 - dy^2 = \pm 1 \]
\[ \Rightarrow \mathbb{Z}[\sqrt{d}]^\times = \{ \alpha \in \mathbb{Z}[\sqrt{d}] \mid N(\alpha) = \pm 1 \} \]
This unit group will be of great importance in our construction of a Diophantine description of exponential sets.

2.2 The Pell equation
We now study the following set:
\[ \mathbb{Z}[\sqrt{d}]^\times \supset G_d = \{ \alpha \in \mathbb{Z}[\sqrt{d}] \mid N(\alpha) = 1, \; \alpha > 0 \} \tag{2.3} \]
Lemma 2.4. $G_d$ is a subgroup of $\mathbb{Z}[\sqrt{d}]^\times$.
Proof.
1. First of all, we observe that $1 \in G_d$.
2. If α and β are positive, so is αβ. Furthermore, $N(\alpha\beta) = N(\alpha)N(\beta) = 1 \cdot 1 = 1$, so $G_d$ is closed under multiplication.
3. If α is in $G_d$, then, since $\alpha\alpha^{-1} = 1$, $\alpha^{-1}$ is also positive. Finally, $N(\alpha)N(\alpha^{-1}) = 1 \cdot N(\alpha^{-1}) = 1$, so $N(\alpha^{-1}) = 1$, and hence $\alpha^{-1} \in G_d$. $G_d$ is thus closed under inverses.
We conclude that $G_d$ is indeed a subgroup of $\mathbb{Z}[\sqrt{d}]^\times$. □
Lemma 2.5. A necessary and sufficient condition for any $\alpha = x + y\sqrt{d}$ to be an element of $G_d$ is that x is positive and (x, y) is an integer solution to the Pell equation:
\[ x^2 - dy^2 = 1 \tag{2.4} \]
Proof. Necessity: Suppose $\alpha = x + y\sqrt{d}$ is an element of $G_d$. This means $\alpha > 0$ and $N(\alpha) = x^2 - dy^2 = 1$. It follows that (x, y) is a solution to (2.4). Now, as $G_d$ is closed under inverses, it follows that $\alpha^{-1} = x - y\sqrt{d}$ is also in $G_d$, and hence positive:
\[ x + y\sqrt{d} > 0 \]
\[ x - y\sqrt{d} > 0 \]
Adding these inequalities gives $x > 0$. We conclude that the condition is satisfied.
Sufficiency: Suppose x is positive and (x, y) is a solution to equation (2.4). We define $\alpha = x + y\sqrt{d}$. Equation (2.4) implies that $N(\alpha) = x^2 - dy^2 = 1$. Furthermore, as x is positive, at least one of $\alpha = x + y\sqrt{d}$ and $\alpha^{-1} = x - y\sqrt{d}$ is positive, since their sum is 2x. As $\alpha\alpha^{-1} = 1$, they have the same sign. It follows that both must be positive, and thus $\alpha > 0$. We conclude that α is an element of $G_d$. □
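The arithmetic of this section is easy to experiment with. In the sketch below an element $x + y\sqrt{d}$ is stored as the integer pair (x, y); this representation and the function names are assumptions made for illustration. The inverse follows the proof of Lemma 2.3, and the membership test for $G_d$ follows Lemma 2.5.

```python
def mul(alpha, beta, d):
    """Product of x + y*sqrt(d) and c + s*sqrt(d), as a coefficient pair."""
    (x, y), (c, s) = alpha, beta
    return (x * c + d * y * s, x * s + y * c)

def norm(alpha, d):
    """N(alpha) = x^2 - d*y^2, equation (2.1)."""
    x, y = alpha
    return x * x - d * y * y

def inverse(alpha, d):
    """Inverse of a unit: beta = N(alpha) * (x - y*sqrt(d)), with N(alpha) = +1 or -1."""
    n = norm(alpha, d)
    if n not in (1, -1):
        raise ValueError("not a unit")
    x, y = alpha
    return (n * x, -n * y)

def in_G_d(alpha, d):
    """Membership test from Lemma 2.5: Pell equation plus x > 0."""
    return norm(alpha, d) == 1 and alpha[0] > 0
```

For d = 2, the element $1 + \sqrt{2}$ has norm −1 and is therefore a unit outside $G_2$, while $3 + 2\sqrt{2}$ and its inverse $3 - 2\sqrt{2}$ both lie in $G_2$.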
We furthermore observe that the Pell equation is a Diophantine equation and positivity is a Diophantine property, such that the set
\[ \{ (x, y) \in \mathbb{Z}^2 \mid x + y\sqrt{d} \in G_d \} = \{ (x, y) \in \mathbb{Z}^2 \mid x^2 - dy^2 = 1, \; x > 0 \} \]
is a Diophantine set. This is a crucial part in the construction of a Diophantine description of exponentiation.

2.3 Cyclicity
To find the structure of $\mathbb{Z}[\sqrt{d}]^\times$ and $G_d$, we use a general and powerful unit theorem by Johann Dirichlet. We first need to define what an order of a number field is.
Definition 2.6. An order $\mathcal{O}$ of a number field K is a subring of K that is free of rank $n = [K : \mathbb{Q}]$ as a $\mathbb{Z}$-module.
Theorem 2.7. Dirichlet (1846): Let K be a number field with $r_1$ real embeddings and $r_2$ pairs of complex conjugate embeddings (so $2r_2$ complex embeddings in total). Then the unit group of any order in K is finitely generated with $r_1 + r_2 - 1$ independent generators of infinite order.
More precisely, letting $r = r_1 + r_2 - 1$, any order $\mathcal{O}$ in K contains multiplicatively independent units $u_1, \dots, u_r$ of infinite order such that every unit in $\mathcal{O}$ can be written uniquely in the form
\[ \zeta u_1^{m_1} \cdots u_r^{m_r} \]
where ζ is a root of unity and every $m_i$ is an integer. Abstractly, $\mathcal{O}^\times \cong \mu(\mathcal{O}) \times \mathbb{Z}^r$, where $\mu(\mathcal{O})$ is the finite group of roots of unity in $\mathcal{O}$. [Con]
A proof of Dirichlet’s Unit Theorem, which can be found in [Con], is too intricate to include in this thesis. Nevertheless, it is a very powerful tool for us to use.
We can readily apply it to our case. We are working with the number field $K = \mathbb{Q}(\sqrt{d})$. As $\{1, \sqrt{d}\}$ forms a basis for the vector space $\mathbb{Q}(\sqrt{d})$ over $\mathbb{Q}$, we have $[\mathbb{Q}(\sqrt{d}) : \mathbb{Q}] = 2$. We also have $\mathbb{Z}[\sqrt{d}]$ as a subring of $\mathbb{Q}(\sqrt{d})$, and $\mathbb{Z}[\sqrt{d}]$ is free of rank 2, such that $\mathbb{Z}[\sqrt{d}]$ is an order in $\mathbb{Q}(\sqrt{d})$. We can hence apply Dirichlet’s Theorem 2.7. We observe that the roots of unity of $\mathbb{Z}[\sqrt{d}]$ are just ±1, which means that $\mu(\mathbb{Z}[\sqrt{d}]) = \{\pm 1\}$.
Furthermore, we can embed $\mathbb{Q}(\sqrt{d})$ in the real numbers by the following ring homomorphism:
\[ f : \mathbb{Q}(\sqrt{d}) \hookrightarrow \mathbb{R}, \qquad x + y\sqrt{d} \mapsto x - y\sqrt{d} \]
Of course, as $\mathbb{Q}(\sqrt{d})$ is contained in $\mathbb{R}$, the identity is also an embedding of $\mathbb{Q}(\sqrt{d})$ in $\mathbb{R}$. That these are the only two embeddings follows from field theory: the minimum polynomial of $\sqrt{d}$ is $X^2 - d$. These embeddings permute the zeros of this polynomial, and are the identity on $\mathbb{Q}$. There can only be two such embeddings, as this polynomial only has two zeros. From this we can determine r: the number of real embeddings $r_1$ is 2, whereas the number of pairs of complex conjugate embeddings $r_2$ is 0. This gives us $r = r_1 + r_2 - 1 = 2 + 0 - 1 = 1$. We thus only have one generator of infinite order, which we call u, the fundamental unit of $\mathbb{Z}[\sqrt{d}]$. It follows that the unit group is of the following form:
\[ \mathbb{Z}[\sqrt{d}]^\times = \{ \pm u^n \mid n \in \mathbb{Z} \} \]
This unit u could be the smallest unit greater than 1, or its inverse, the greatest unit smaller than 1. We define it to be the smallest unit greater than 1. The group $\mathbb{Z}[\sqrt{d}]^\times$ has 2 generators, namely −1 and u. Note that −1 has order 2 and u has infinite order.
Let us now look at the set of positive units:
\[ \{ \alpha \in \mathbb{Z}[\sqrt{d}]^\times \mid \alpha > 0 \} = \{ u^n \mid n \in \mathbb{Z} \} \]
This group has only one generator: we have eliminated the generator −1. We have thus ended up with a cyclic group. We can now return to the group $G_d$ from (2.3). This is a subgroup of the group of positive units, and is therefore itself cyclic. It is hence of the following form:
\[ G_d = \{ \alpha \in \mathbb{Z}[\sqrt{d}] \mid N(\alpha) = 1, \; \alpha > 0 \} = \{ u_1^n \mid n \in \mathbb{Z} \} \]
where $u_1$ is the smallest element of $\mathbb{Z}[\sqrt{d}]$ greater than 1 with norm 1. Now we have also excluded the units with norm −1. The unit $u_1$ is the fundamental unit of $G_d$. It need not be identical to u, the fundamental unit of $\mathbb{Z}[\sqrt{d}]$, as the latter could have norm −1.
The fact that the group $G_d$ is cyclic can also be proven without Dirichlet’s Unit Theorem, using a more elementary arithmetical proof.
[Dav73] However, we wish to find a deeper, more general reason for this fact, as this will be useful in section 6. This reason is Dirichlet’s Unit Theorem.

2.4 A suitable choice for d
Finding the fundamental unit of the group $G_d$ given some non-square d is not always easy. On top of that, such a unit can be quite large. For instance, if d is equal to 1141, the fundamental unit is
\[ 1036782394157223963237125215 + 30693385322765657197397208\sqrt{1141}. \]
The case d = 1000099 is even worse. In that case, the smallest positive value of y that is part of a solution to the Pell equation has 115 decimal digits. [Ste12] Another infamous example of such a huge solution to a Pell equation is Archimedes’ cattle problem, a relatively simple problem posed by the ancient Greek mathematician Archimedes. Solving the cattle problem eventually comes down to solving a Pell equation, for which the smallest solution has 206545 decimal digits. [Len02] Clever algorithms are needed to find such units, as simply trying values for x and y will take too long.
However, if we choose $d = a^2 - 1$ for some integer $a > 1$, finding the fundamental unit is easier. We obtain the following Pell equation:
\[ x^2 - (a^2 - 1)y^2 = 1 \tag{2.5} \]
We can straightforwardly check that (x, y) = (a, 1) is a solution to this Pell equation.
\[ x^2 - (a^2 - 1)y^2 = a^2 - (a^2 - 1) = 1 \]
Throughout the rest of this section, we will use the letter d as an abbreviation for the expression $a^2 - 1$, with $a > 1$. We can now prove that we have found the fundamental unit of $G_d$.
Lemma 2.8. The fundamental unit of $G_d$ is $u_1 = a + \sqrt{d}$.
Proof. The proof is by contradiction. Assume that $u_1$ is not the fundamental unit of $G_d$, such that $G_d$ has an element larger than 1, but smaller than $u_1$. That is, there exist x and y such that $x^2 - dy^2 = 1$ and $1 < x + y\sqrt{d} < a + \sqrt{d}$. By
\[ (x + y\sqrt{d})(x - y\sqrt{d}) = 1 = (a + \sqrt{d})(a - \sqrt{d}), \]
it follows that $1 > x - y\sqrt{d} > a - \sqrt{d}$, and hence $-1 < -x + y\sqrt{d} < -a + \sqrt{d}$. Adding this to the assumed inequality gives $0 < 2y\sqrt{d} < 2\sqrt{d}$.
No integer y can fulfill this last inequality. This is a contradiction, from which we conclude that $u_1$ is the fundamental unit of $G_d$. □
We now know what the positive solutions $(x_n(a), y_n(a))$ to equation (2.5) look like precisely:
\[ x_n(a) + y_n(a)\sqrt{d} = (a + \sqrt{d})^n \tag{2.6} \]
in which n ranges over the integers. As seen, the sequences $x_n$ and $y_n$ are functions of a.

2.5 Behavior of $x_n(a)$ and $y_n(a)$
Because of Dirichlet’s Unit Theorem 2.7, every positive solution (x, y) to the Pell equation (2.5) is equal to $(x_n(a), y_n(a))$ for some n, as given by equation (2.6). We can now prove some lemmas about the arithmetical behavior of $x_n(a)$ and $y_n(a)$. From now on, we will drop the dependence on a, and just write $x_n$, $y_n$.
Lemma 2.9. If $(x_m, y_m)$ and $(x_n, y_n)$ are solutions to equation (2.5), then we have
\[ x_{m \pm n} = x_m x_n \pm d y_m y_n \]
\[ y_{m \pm n} = x_n y_m \pm x_m y_n \]
Proof. Let $(x_m, y_m)$ and $(x_n, y_n)$ be solutions to the Pell equation (2.5). Then we compute
\[ x_{m+n} + y_{m+n}\sqrt{d} = (a + \sqrt{d})^{m+n} \]
\[ = (a + \sqrt{d})^m (a + \sqrt{d})^n \]
\[ = (x_m + y_m\sqrt{d})(x_n + y_n\sqrt{d}) \]
\[ = (x_m x_n + d y_m y_n) + (x_n y_m + x_m y_n)\sqrt{d} \]
So, $x_{m+n} = x_m x_n + d y_m y_n$ and $y_{m+n} = x_n y_m + x_m y_n$. Similarly, we compute
\[ x_{m-n} + y_{m-n}\sqrt{d} = (a + \sqrt{d})^{m-n} \]
\[ = (a + \sqrt{d})^m (a + \sqrt{d})^{-n} \]
\[ = (x_m + y_m\sqrt{d})(x_n + y_n\sqrt{d})^{-1} \]
\[ = (x_m + y_m\sqrt{d})(x_n - y_n\sqrt{d}) \]
\[ = (x_m x_n - d y_m y_n) + (x_n y_m - x_m y_n)\sqrt{d} \]
which proves the lemma. □
Substituting $n = \pm 1$ in Lemma 2.9, or simply working out $x_{m \pm 1} + y_{m \pm 1}\sqrt{d} = (x_m + y_m\sqrt{d})(a \pm \sqrt{d})$, gives us the following relations:
\[ x_{m \pm 1} = a x_m \pm d y_m \tag{2.7} \]
\[ y_{m \pm 1} = a y_m \pm x_m \tag{2.8} \]
We can now state the recursive relations for the sequences $x_n$ and $y_n$, which make them Lucas sequences. [JSWW76]
Lemma 2.10. $x_{n+1} = 2a x_n - x_{n-1}$ and $y_{n+1} = 2a y_n - y_{n-1}$.
Proof. From equation (2.7) it follows that
\[ x_{n+1} = a x_n + d y_n \]
\[ x_{n-1} = a x_n - d y_n \]
Adding these two equations gives
\[ x_{n+1} + x_{n-1} = 2a x_n \Rightarrow x_{n+1} = 2a x_n - x_{n-1} \]
Similarly, from equation (2.8) it follows that
\[ y_{n+1} = a y_n + x_n \]
\[ y_{n-1} = a y_n - x_n \]
Adding these two equations gives
\[ y_{n+1} + y_{n-1} = 2a y_n \Rightarrow y_{n+1} = 2a y_n - y_{n-1} \] □
The proof given above is the elementary arithmetical proof given by Martin Davis in [Dav73].
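The relations of Lemmas 2.9 and 2.10 can be checked numerically for small values of a. The following sketch generates the solutions with equations (2.7) and (2.8), taking the plus sign; the function names pell_solutions and check_lemmas are hypothetical and only serve this illustration.

```python
def pell_solutions(a, count):
    """First `count` pairs (x_n, y_n), n = 0, 1, ..., for d = a^2 - 1."""
    d = a * a - 1
    x, y = 1, 0                            # (x_0, y_0)
    out = []
    for _ in range(count):
        out.append((x, y))
        x, y = a * x + d * y, a * y + x    # (2.7) and (2.8), plus sign
    return out

def check_lemmas(a, count):
    """Check the Pell equation, Lemma 2.9 and Lemma 2.10 on the first terms."""
    d = a * a - 1
    sol = pell_solutions(a, count)
    pell = all(x * x - d * y * y == 1 for x, y in sol)
    addition = all(
        sol[m + n] == (sol[m][0] * sol[n][0] + d * sol[m][1] * sol[n][1],
                       sol[n][0] * sol[m][1] + sol[m][0] * sol[n][1])
        for m in range(count) for n in range(count - m))
    subtraction = all(
        sol[m - n] == (sol[m][0] * sol[n][0] - d * sol[m][1] * sol[n][1],
                       sol[n][0] * sol[m][1] - sol[m][0] * sol[n][1])
        for m in range(count) for n in range(m + 1))
    recurrence = all(
        sol[n + 1] == (2 * a * sol[n][0] - sol[n - 1][0],
                       2 * a * sol[n][1] - sol[n - 1][1])
        for n in range(1, count - 1))
    return pell and addition and subtraction and recurrence
```

For a = 3, so d = 8, the first solutions are (1, 0), (3, 1), (17, 6) and (99, 35); indeed $99^2 - 8 \cdot 35^2 = 9801 - 9800 = 1$.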
It is instructive to also give a different proof that is a bit more involved, but gives more insight into why this lemma holds. The following proof uses matrix notation and the Cayley-Hamilton Theorem to prove Lemma 2.10.
Proof. We denote a solution to the Pell equation (2.5) by the column vector
\[ \begin{pmatrix} x_n \\ y_n \end{pmatrix} \]
such that $x_n$ and $y_n$ are the coordinates of a unit in the ordered basis $\{1, \sqrt{d}\}$. We now have to find the matrix corresponding to multiplication with our fundamental unit $a + \sqrt{d}$. We find the columns of this matrix by checking how this multiplication acts on our basis vectors, 1 and $\sqrt{d}$.
\[ (a + \sqrt{d}) \cdot 1 = a + \sqrt{d} \]
\[ (a + \sqrt{d}) \cdot \sqrt{d} = d + a\sqrt{d} \]
This gives us the following matrix:
\[ A = M_{u_1} = \begin{pmatrix} a & d \\ 1 & a \end{pmatrix} \]
It follows that solutions are of the following form:
\[ \begin{pmatrix} x_n \\ y_n \end{pmatrix} = \begin{pmatrix} a & d \\ 1 & a \end{pmatrix}^n \begin{pmatrix} x_0 \\ y_0 \end{pmatrix} = A^n \begin{pmatrix} 1 \\ 0 \end{pmatrix} \tag{2.9} \]
We now compute the characteristic polynomial of A and use the definition $d = a^2 - 1$.
\[ p_A(\lambda) = \det(A - \lambda I) = \begin{vmatrix} a - \lambda & d \\ 1 & a - \lambda \end{vmatrix} = (a - \lambda)^2 - d \cdot 1 = a^2 - 2a\lambda + \lambda^2 - a^2 + 1 = \lambda^2 - 2a\lambda + 1 \]
From the Cayley-Hamilton Theorem it then follows that
\[ A^2 - 2aA + I = 0 \Rightarrow A^2 = 2aA - I \]
Substituting this into equation (2.9) gives the following:
\[ \begin{pmatrix} x_{n+1} \\ y_{n+1} \end{pmatrix} = A^{n+1} \begin{pmatrix} 1 \\ 0 \end{pmatrix} = A^2 A^{n-1} \begin{pmatrix} 1 \\ 0 \end{pmatrix} = A^2 \begin{pmatrix} x_{n-1} \\ y_{n-1} \end{pmatrix} = (2aA - I) \begin{pmatrix} x_{n-1} \\ y_{n-1} \end{pmatrix} \]
\[ = 2aA \begin{pmatrix} x_{n-1} \\ y_{n-1} \end{pmatrix} - \begin{pmatrix} x_{n-1} \\ y_{n-1} \end{pmatrix} = 2a \begin{pmatrix} x_n \\ y_n \end{pmatrix} - \begin{pmatrix} x_{n-1} \\ y_{n-1} \end{pmatrix} \]
which is what we wanted to prove. □
Matrix notation can also be used to derive Lemma 2.9 and equations (2.7) and (2.8), but this does not save us any effort or provide us with any additional insight.
The recursive relations provided by Lemma 2.10 allow us to prove properties of $x_n$ and $y_n$ by induction, using our first two solutions $(x_0, y_0) = (1, 0)$ and $(x_1, y_1) = (a, 1)$. Specifically, we can now prove some lemmas about the growth of $x_n$ and $y_n$, which resembles exponential growth.
Lemma 2.11. For every non-negative n, $x_{n+1} > x_n > n$ and $y_{n+1} > y_n \geq n$.
Proof. This can be shown by induction. It is straightforwardly seen that $x_1 > x_0 > 0$ and $y_1 > y_0 \geq 0$. Furthermore, suppose the lemma holds up to n = k.
Then, we have:

\begin{align*}
x_{k+2} &= 2a x_{k+1} - x_k > a x_{k+1} + x_{k+1} - x_k > a x_{k+1} > x_{k+1} > k + 1 &&\Rightarrow\ x_{k+2} > x_{k+1} \\
x_{k+1} &> x_k > k &&\Rightarrow\ x_{k+1} > k + 1 \\
y_{k+2} &= 2a y_{k+1} - y_k > a y_{k+1} + y_{k+1} - y_k > a y_{k+1} > y_{k+1} &&\Rightarrow\ y_{k+2} > y_{k+1} \\
y_{k+1} &> y_k \geq k &&\Rightarrow\ y_{k+1} \geq k + 1
\end{align*}

which proves our induction step.

Lemma 2.12. For every non-negative $n$, $a^n \leq x_n \leq (2a)^n$.

Proof. Again, we use induction. Firstly, $a^0 \leq x_0 \leq (2a)^0$ and $a^1 \leq x_1 \leq (2a)^1$. Now our induction step is as follows:

\begin{align*}
x_{n+1} &= 2a x_n - x_{n-1} \leq 2a x_n \leq 2a (2a)^n = (2a)^{n+1} \\
x_{n+1} &= 2a x_n - x_{n-1} \geq 2a x_n - x_n \geq a x_n + (a - 1) x_n \geq a x_n \geq a \cdot a^n = a^{n+1}
\end{align*}

which proves our lemma. In this induction step, we have used Lemma 2.11.

Lemma 2.13. Let $p$ be any positive number. Then we have

\[ x_n + (p - a) y_n \equiv p^n \mod (2ap - p^2 - 1) \]

Proof. This again can be proven inductively. We first observe:

\begin{align*}
x_0 + (p - a) y_0 &= 1 + (p - a) \cdot 0 = 1 \\
x_1 + (p - a) y_1 &= a + (p - a) \cdot 1 = p
\end{align*}

We now suppose the lemma holds up to $n = k$. We then use the following induction step:

\begin{align*}
x_{k+1} + (p - a) y_{k+1} &= 2a x_k - x_{k-1} + (p - a)(2a y_k - y_{k-1}) \\
&= 2a \left(x_k + (p - a) y_k\right) - \left(x_{k-1} + (p - a) y_{k-1}\right) \\
&\equiv 2a p^k - p^{k-1} \mod (2ap - p^2 - 1) \\
&= (2ap - 1) p^{k-1} \mod (2ap - p^2 - 1) \\
&\equiv p^2 p^{k-1} \mod (2ap - p^2 - 1) \\
&= p^{k+1} \mod (2ap - p^2 - 1)
\end{align*}

in which we have used Lemma 2.10. This induction step completes the proof of the lemma.

The factors placed in front of $x_n$ and $y_n$ may seem arbitrary at first, but they are the solution $c_1, c_2$ to the following system:

\[ \begin{pmatrix} 1 & 0 \\ a & 1 \end{pmatrix} \begin{pmatrix} c_1 \\ c_2 \end{pmatrix} = \begin{pmatrix} 1 \\ p \end{pmatrix} \]

These factors are not relevant in the induction step.

Furthermore, note that the modulus $2ap - p^2 - 1$ is just minus the characteristic polynomial of $A$ evaluated at $\lambda = p$; this polynomial is also the origin of the coefficients in the recurrence relations of Lemma 2.10. In this lemma, it can be used to create a higher power of $p$.

We have now seen that a Diophantine description of exponentiation could follow from a Diophantine description of $x_n(a)$ and $y_n(a)$. This is no surprise: the $n$-th unit larger than 1 is just the fundamental unit raised to the power $n$. We now need to finish our Diophantine description of $x_n(a)$ and $y_n(a)$.
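Since Lemma 2.13 is the step that ties the Pell sequences to powers of $p$, it may help to see it hold numerically. The Python sketch below builds $x_n$ and $y_n$ from the recurrence of Lemma 2.10 and checks the bounds of Lemma 2.12 and the congruence of Lemma 2.13. The helper name `pell_pairs` and the sample pairs $(a, p)$ are assumptions made for illustration, chosen so that the modulus $2ap - p^2 - 1$ is positive.

```python
def pell_pairs(a, count):
    """First `count` pairs (x_n, y_n), built with the recurrence of Lemma 2.10."""
    xs, ys = [1, a], [0, 1]
    while len(xs) < count:
        xs.append(2 * a * xs[-1] - xs[-2])
        ys.append(2 * a * ys[-1] - ys[-2])
    return list(zip(xs, ys))[:count]


for a, p in [(5, 2), (7, 3), (10, 4)]:      # illustrative sample values
    modulus = 2 * a * p - p * p - 1
    assert modulus > 0
    for n, (x, y) in enumerate(pell_pairs(a, 15)):
        # Lemma 2.12: a^n <= x_n <= (2a)^n.
        assert a ** n <= x <= (2 * a) ** n
        # Lemma 2.13: x_n + (p - a)*y_n is congruent to p^n modulo the modulus.
        assert (x + (p - a) * y - p ** n) % modulus == 0
```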
All we have left to do is find the solution number given a solution to the Pell equation. That is, given some solution $(x, y)$ to equation (2.5), find $n$ such that $(x, y) = (x_n, y_n)$. We must, of course, do this using only Diophantine equations. For that goal, we use some divisibility properties of $x_n$ and $y_n$.

2.6 Finding the solution number using divisibility properties

Divisibility properties are very useful in finding what the solution number $n$ is given a solution $(x, y)$ of equation (2.5). The technique essentially comes down to proving that two numbers have the same residue modulo some modulus, and that both are smaller than the modulus, such that they must be equal.

Lemma 2.14. For every $n$, $\gcd(x_n, y_n) = 1$.

Proof. Any divisor of both $x_n$ and $y_n$ must also divide the left hand side of the Pell equation (2.5), and thus its right hand side, which is 1. It thus follows that this divisor must equal 1.

Lemma 2.15. For every positive $n$ and $k$ we have $y_n \mid y_{nk}$.

Proof. We prove by induction on $k$. For $k = 1$, we have identity and hence division. Now suppose the lemma holds up to $k = m$. Then by Lemma 2.9 it follows that

\[ y_{n(m+1)} = x_n y_{nm} + x_{nm} y_n \equiv x_n y_{nm} \mod y_n \]

Our induction hypothesis provides that $y_n \mid y_{nm}$, so it follows that $y_n \mid y_{n(m+1)}$, which completes our induction step. We conclude that $y_n \mid y_{nk}$ for every positive $n$ and $k$.

The property of the sequence $y_0, y_1, \cdots$ expressed by Lemma 2.15 makes that sequence a divisibility sequence.

Definition 2.16. A sequence $u_1, u_2, \cdots$ that is constructed along the recurrence relation

\[ u_{n+k} = a_1 u_{n+k-1} + \cdots + a_k u_n \]

is called a divisibility sequence of $k$-th order if $n \mid m$ implies $u_n \mid u_m$.

Definition 2.17. The characteristic polynomial corresponding to this sequence is given by

\[ f(x) = x^k - a_1 x^{k-1} - \cdots - a_k \]

In our case, this is just the characteristic polynomial of $A$. This polynomial immediately determines the coefficients of our recurrence relation by the Cayley-Hamilton Theorem, as seen in Lemma 2.10.

Definition 2.18.
A divisibility sequence $u_0, u_1, \cdots$ is called normal if $u_0 = 0$.

In fact, all sequences of the form $u_{n+2} = P u_{n+1} - Q u_n$ with $u_0 = 0$ and $u_1 = 1$, with $P$ and $Q$ integers, are second order divisibility sequences [Smy10]. By definition, these sequences are normal. We will pay extra attention to the importance of the fact that $y_0, y_1, \cdots$ is a divisibility sequence, because this will be relevant in section 6.6.

It is useful that the converse of Lemma 2.15 also holds.

Lemma 2.19. For every $n$ and $t$, $y_n \mid y_t$ if and only if $n \mid t$.

Proof. If: Suppose $n \mid t$. Then Lemma 2.15 provides $y_n \mid y_t$.

Only if: Suppose $y_n \mid y_t$. We write $t = nq + r$, with $0 \leq r < n$, and observe:

\[ y_t = x_r y_{nq} + x_{nq} y_r \]

Since $y_n$ divides both $y_t$ (our assumption) and $y_{nq}$ (by Lemma 2.15), it must divide $x_{nq} y_r$ as well. Now we use Lemma 2.14: $x_{nq}$ and $y_{nq}$ are coprime. As $y_n \mid y_{nq}$, $y_n$ and $x_{nq}$ are coprime as well. It must then be that $y_n \mid y_r$. However, since $r < n$, we have $y_r < y_n$ by Lemma 2.11. But $y_n$ can't divide a positive number smaller than itself. It follows that $y_r = 0$ and hence $r = 0$. We conclude that $n \mid t$.

Lemma 2.20. $y_n^2 \mid y_t$ if and only if $n y_n \mid t$.

Proof. Only if: Suppose $y_n^2 \mid y_t$. Then $y_n \mid y_t$, and hence by Lemma 2.19 $n \mid t$, so $t = nk$. We work out:

\begin{align*}
x_t + y_t\sqrt{d} &= \left(a + \sqrt{d}\right)^t \\
&= \left(x_n + y_n\sqrt{d}\right)^k \\
&= \sum_{j=0}^{k} \binom{k}{j} x_n^{k-j} y_n^{j} \left(\sqrt{d}\right)^{j}
\end{align*}

Collecting the terms with odd $j$, which are the ones that contribute to the coefficient of $\sqrt{d}$, gives

\begin{align*}
y_t &= \sum_{\substack{j=0 \\ 2 \nmid j}}^{k} \binom{k}{j} x_n^{k-j} y_n^{j} d^{(j-1)/2} \\
&\equiv \binom{k}{1} x_n^{k-1} y_n \mod y_n^2 \\
&= k x_n^{k-1} y_n \mod y_n^2 \\
&\equiv 0 \mod y_n^2
\end{align*}

Here we have used that the terms of the sum in which $j$ exceeds 1 contain higher powers of $y_n$ and are therefore divisible by $y_n^2$. The last congruence holds by our supposition that $y_n^2 \mid y_t$. It then follows that $y_n^2 \mid k x_n^{k-1} y_n$ and hence $y_n \mid k x_n^{k-1}$. However, by Lemma 2.14 $x_n$ and $y_n$ are coprime, and hence so are $x_n^{k-1}$ and $y_n$. It follows that $y_n \mid k$, and thus $n y_n \mid nk = t$, which is what we wanted to prove.

If: Suppose $n y_n \mid t$. We first set $k = y_n$ to find:

\begin{align*}
y_{n y_n} &= \sum_{\substack{j=0 \\ 2 \nmid j}}^{y_n} \binom{y_n}{j} x_n^{y_n - j} y_n^{j} d^{(j-1)/2} \\
&\equiv \binom{y_n}{1} x_n^{y_n - 1} y_n \mod y_n^2 \\
&= y_n^2 x_n^{y_n - 1} \mod y_n^2 \\
&\equiv 0 \mod y_n^2
\end{align*}

We conclude that $y_n^2 \mid y_{n y_n}$. Since $n y_n \mid t$, Lemma 2.19 gives $y_{n y_n} \mid y_t$, and it follows that $y_n^2 \mid y_t$.
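Lemma 2.19 and Lemma 2.20 can be checked by brute force over a small range of indices. The Python sketch below does so for the hypothetical sample value $a = 2$; the helper name `y_sequence` and the chosen index ranges are assumptions made for illustration. Such a finite check only supports the statements and does not replace the proofs.

```python
def y_sequence(a, count):
    """y_0, ..., y_{count-1} via y_{n+1} = 2a*y_n - y_{n-1} (Lemma 2.10)."""
    ys = [0, 1]
    while len(ys) < count:
        ys.append(2 * a * ys[-1] - ys[-2])
    return ys[:count]


a = 2                      # illustrative choice; gives y = 0, 1, 4, 15, 56, ...
ys = y_sequence(a, 61)

for n in range(1, 6):
    for t in range(1, 61):
        # Lemma 2.19: y_n divides y_t exactly when n divides t.
        assert (ys[t] % ys[n] == 0) == (t % n == 0)
        # Lemma 2.20: y_n^2 divides y_t exactly when n*y_n divides t.
        assert (ys[t] % ys[n] ** 2 == 0) == (t % (n * ys[n]) == 0)
```

For $n = 2$ this gives $y_2 = 4$ and $n y_n = 8$, so within the tested range $16 \mid y_t$ holds exactly for the multiples of 8.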
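Note that Lemma 2.20 is stronger than the lemmas '$y_n^2 \mid y_{n y_n}$' and '$y_n^2 \mid y_t$ implies $y_n \mid t$', which are presented in [Dav73]. It forms a more general and compact statement. Nevertheless, the lemmas in [Dav73] are also sufficient for the construction of a Diophantine description of exponentiation.

Lemma 2.21. For any non-negative $n$, we have:

\begin{align*}
x_{2n} &\equiv -1 \mod x_n & x_{2n} &\equiv 1 \mod y_n \\
y_{2n} &\equiv 0 \mod x_n & y_{2n} &\equiv 0 \mod y_n
\end{align*}

Proof. This can be seen by substituting $m = n$ in Lemma 2.9, and using the fact that we are dealing with a solution to the Pell equation (2.5):

\begin{align*}
x_{2n} &= x_n^2 + d y_n^2 = x_n^2 + (x_n^2 - 1) = (d y_n^2 + 1) + d y_n^2 \\
&\Rightarrow\ x_{2n} \equiv -1 \mod x_n \ \text{ and } \ x_{2n} \equiv 1 \mod y_n \\
y_{2n} &= 2 x_n y_n \\
&\Rightarrow\ y_{2n} \equiv 0 \mod x_n \ \text{ and } \ y_{2n} \equiv 0 \mod y_n
\end{align*}

Note that the last congruence also follows from the fact that $y_0, y_1, \cdots$ is a divisibility sequence (Lemma 2.15).

Lemma 2.22. For any non-negative $n$, we have:

\begin{align*}
x_{4n} &\equiv 1 \mod x_n & x_{4n} &\equiv 1 \mod y_n \\
y_{4n} &\equiv 0 \mod x_n & y_{4n} &\equiv 0 \mod y_n
\end{align*}

Proof. Using Lemma 2.21, and a similar construction, we find:

\begin{align*}
x_{4n} &= x_{2n}^2 + d y_{2n}^2 = x_{2n}^2 + (x_{2n}^2 - 1) = (d y_{2n}^2 + 1) + d y_{2n}^2 \\
&\Rightarrow\ x_{4n} \equiv (-1)^2 + \left((-1)^2 - 1\right) \mod x_n = 1 \mod x_n \ \text{ and } \ x_{4n} \equiv (0 + 1) + 0 \mod y_n = 1 \mod y_n \\
y_{4n} &= 2 x_{2n} y_{2n} \\
&\Rightarrow\ y_{4n} \equiv 0 \mod x_n \ \text{ and } \ y_{4n} \equiv 0 \mod y_n
\end{align*}

Again, the last congruence also follows from the fact that $y_0, y_1, \cdots$ is a divisibility sequence.

Lemma 2.21 and Lemma 2.22 can also be shown using matrix notation, which might give some more insight into why the lemmas hold.

Proof. We consider the ring homomorphism 'mod $x_n$':

\begin{align*}
f : \mathbb{Z}^{2 \times 2} &\to \mathbb{Z}^{2 \times 2} \Big/ \left( \begin{pmatrix} x_n & 0 \\ 0 & 0 \end{pmatrix}, \begin{pmatrix} 0 & x_n \\ 0 & 0 \end{pmatrix}, \begin{pmatrix} 0 & 0 \\ x_n & 0 \end{pmatrix}, \begin{pmatrix} 0 & 0 \\ 0 & x_n \end{pmatrix} \right) \\
\begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} &\mapsto \begin{pmatrix} a_{11} \bmod x_n & a_{12} \bmod x_n \\ a_{21} \bmod x_n & a_{22} \bmod x_n \end{pmatrix}
\end{align*}

We compute $f(A^n)$:

\[ f(A^n) = f\left( \begin{pmatrix} a & d \\ 1 & a \end{pmatrix}^{n} \right) = f \begin{pmatrix} x_n & d y_n \\ y_n & x_n \end{pmatrix} = \begin{pmatrix} 0 & d y_n \\ y_n & 0 \end{pmatrix} \]

We then use the fact that $f$ is a homomorphism, together with $d y_n^2 = x_n^2 - 1 \equiv -1 \mod x_n$, to find:

\[ f\left((A^n)^2\right) = f(A^n)^2 = \begin{pmatrix} 0 & d y_n \\ y_n & 0 \end{pmatrix}^{2} = \begin{pmatrix} d y_n^2 & 0 \\ 0 & d y_n^2 \end{pmatrix} = \begin{pmatrix} -1 & 0 \\ 0 & -1 \end{pmatrix} = -I \]

such that

\[ \begin{pmatrix} x_{2n} \bmod x_n \\ y_{2n} \bmod x_n \end{pmatrix} \equiv (A^n)^2 \begin{pmatrix} x_0 \\ y_0 \end{pmatrix} = \begin{pmatrix} -1 & 0 \\ 0 & -1 \end{pmatrix} \begin{pmatrix} 1 \\ 0 \end{pmatrix} = \begin{pmatrix} -1 \\ 0 \end{pmatrix} \]

which proves the first and the third congruence of Lemma 2.21.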
We then find $f\left((A^n)^4\right)$:

\[ f\left((A^n)^4\right) = f\left((A^n)^2\right)^2 = (-I)^2 = I \]

We conclude that $A^n$ is of order 4 modulo $x_n$. It then similarly follows that

\[ \begin{pmatrix} x_{4n} \bmod x_n \\ y_{4n} \bmod x_n \end{pmatrix} \equiv I \begin{pmatrix} 1 \\ 0 \end{pmatrix} = \begin{pmatrix} 1 \\ 0 \end{pmatrix} \]

which shows the first and third congruence of Lemma 2.22.

A completely analogous computation shows that $A^n$ has order 2 modulo $y_n$, which provides the other congruences.

This method can also be utilized to prove Lemma 2.23 and Lemma 2.24, but this does not provide much additional insight.

Lemma 2.23. For any integer $j$ (hence allowing $j$ to be negative) we have

\[ x_{j+2n} \equiv -x_j \mod x_n \]

Proof. First, note that $x_{-j} = x_j$, as

\[ \left(x_j + y_j\sqrt{d}\right)^{-1} = x_j - y_j\sqrt{d} \]

Then, we apply Lemma 2.9 and Lemma 2.21 to obtain the following:

\[ x_{j+2n} = x_j x_{2n} + d y_j y_{2n} \equiv x_j \cdot (-1) + d y_j \cdot 0 \mod x_n = -x_j \mod x_n \]

Lemma 2.24. For any integer $j$, we have

\[ x_{j+4n} \equiv x_j \mod x_n \]

Proof. This follows similarly from Lemma 2.22:

\[ x_{j+4n} = x_j x_{4n} + d y_j y_{4n} \equiv x_j \cdot 1 + d y_j \cdot 0 \mod x_n = x_j \mod x_n \]

Lemma 2.25. Suppose $x_i \equiv x_j \mod x_n$, with $n > 0$ and $0 \leq i \leq j \leq 2n$. Then either $i = j$ or we have the exceptional case: $a = 2$, $n = 1$, $i = 0$ and $j = 2$.

Proof. We split the proof into two parts: either $x_n$ is even or $x_n$ is odd. We will show, in both cases, that $x_0, x_1, \cdots, x_{2n}$ are all different modulo $x_n$, such that the result follows.

1. First, we treat the case in which $x_n$ is odd. We define $q = \frac{x_n - 1}{2}$. Then we consider the following set:

\[ x_0, \cdots, x_{n-1} \]

It follows from Lemma 2.11 that

\[ x_0 < \cdots < x_{n-1} \]

Moreover, Lemma 2.10 implies (as in the proof of Lemma 2.12) that $x_{n-1} \leq \frac{x_n}{a} \leq \frac{x_n}{2}$, and hence $x_{n-1} \leq q$, because $x_{n-1}$ is an integer and $x_n$ is odd. Thus these are all distinct positive residues modulo $x_n$ that are at most $q$. We then consider the set

\[ x_{n+1}, \cdots, x_{2n} \]

By Lemma 2.23, they are congruent modulo $x_n$, respectively, to:

\[ -x_{n-1}, \cdots, -x_0 \]

Similarly, these are all distinct negative residues modulo $x_n$ that are at least $-q$. Together with $x_n \equiv 0$, all residues modulo $x_n$ of the set $x_0, x_1, \cdots, x_{2n}$ are thus distinct numbers between $-q$ and $q$, a range that contains exactly $2q + 1 = x_n$ integers. We conclude that they are mutually incongruent modulo $x_n$.
From this follows our result: if $x_i$ and $x_j$ have the same residue modulo $x_n$, then $i = j$, as no two distinct terms share a residue.

2. Now suppose $x_n$ is even. We then define $q = \frac{x_n}{2}$. The result follows similarly, unless $x_{n-1} = q$. In that case, we will have, by Lemma 2.23, that

\[ x_{n+1} \equiv -q \mod x_n = q \mod x_n = x_{n-1} \mod x_n \]

This is precisely the case when

\[ x_n = a x_{n-1} + d y_{n-1} = 2 x_{n-1} \]

As $a \geq 2$ and $d y_{n-1} \geq 0$, this is the case when $a = 2$ and $y_{n-1} = 0$. This in turn implies that $n = 1$, $i = 0$ and $j = 2$, which completes the exceptional case.

Lemma 2.26. Suppose $x_i \equiv x_j \mod x_n$, with $n > 0$, $0 < i \leq n$ and $0 \leq j < 4n$. Then either $j = i$ or $j = 4n - i$.

Proof. We will split the proof up into two cases. Either $j \leq 2n$ or $j > 2n$. In the first case, Lemma 2.25 implies that $j = i$. The exceptional case is excluded: in that case, $n = 1$ and $i = 0$ or $i = 2$. This contradicts $0 < i \leq n$.

In the second case, Lemma 2.24 implies

\[ x_{4n-j} = x_{j-4n} \equiv x_j \mod x_n = x_i \mod x_n \]

Then similarly by Lemma 2.25, it follows that $i = 4n - j$, and hence $j = 4n - i$. The exceptional case is ruled out because neither $i$ nor $4n - j$ can be zero.

Lemma 2.27. If $n > 0$, $0 < i \leq n$, $j$ is any integer and we have $x_i \equiv x_j \mod x_n$, then it follows that $j \equiv \pm i \mod 4n$.

Proof. We can write $j = 4nq + r$, where $q$ is an integer and $0 \leq r < 4n$. Then Lemma 2.24 implies

\[ x_r \equiv x_j \mod x_n = x_i \mod x_n \]

Then from Lemma 2.26 it follows that $i = r$ or $i = 4n - r$, and thus $j \equiv r \mod 4n = \pm i \mod 4n$.

The following two lemmas have an elementary, arithmetical proof using induction and a more general proof, using ring theory, that can give more insight into why the lemmas hold. I will give both proofs, roughly as presented in [Dav73] and [Kui10], respectively.

Lemma 2.28. $x_n \equiv 1 \mod (a - 1)$ and $y_n \equiv n \mod (a - 1)$.

Proof. We show this by induction. For $n = 0, 1$, it is straightforwardly seen. Then suppose it holds up to $n = k$. Since $a \equiv 1 \mod (a - 1)$, we then use the following induction step:

\begin{align*}
x_{k+1} &= 2a x_k - x_{k-1} \equiv 2 x_k - x_{k-1} \mod (a - 1) = 2 - 1 \mod (a - 1) = 1 \mod (a - 1) \\
y_{k+1} &= 2a y_k - y_{k-1} \equiv 2 y_k - y_{k-1} \mod (a - 1) = 2k - (k - 1) \mod (a - 1) = k + 1 \mod (a - 1)
\end{align*}

Proof.
The ring $\mathbb{Z}[\sqrt{d}]$ is isomorphic to the quotient ring $\mathbb{Z}[t]/(t^2 - d)$. Any element of this ring looks like $x + yt$, and in this ring $t^2 - d = 0$ holds. The variable $t$ could be seen as a replacement for $\sqrt{d}$. With our choice of $d$, the ring becomes $\mathbb{Z}[t]/(t^2 - a^2 + 1)$. We naturally have

\[ \left(\mathbb{Z}[t]/(t^2 - a^2 + 1)\right)^{\times} \cong \mathbb{Z}[\sqrt{d}]^{\times} \]

From equation (2.6), we know that the positive units of $\mathbb{Z}[t]/(t^2 - a^2 + 1)$ are of the form

\[ x_n(a) + y_n(a)\, t = (a + t)^n \]

In this lemma, we are looking at the ring homomorphism 'mod $(a - 1)$':

\begin{align*}
\mathbb{Z}[t]/(t^2 - a^2 + 1) &\to \left(\mathbb{Z}/(a - 1)\mathbb{Z}\right)[t]/(t^2) \\
x + yt &\mapsto \left(x \bmod (a - 1)\right) + t \cdot \left(y \bmod (a - 1)\right)
\end{align*}

We have used that $a^2 - 1 = (a - 1)(a + 1)$. This induces a group homomorphism: