Analysis of fast versions of the Euclid Algorithm

Eda Cesaratto*, Julien Clément†, Benoît Daireaux‡, Loïck Lhote§, Véronique Maume-Deschamps¶, Brigitte Vallée‖

To appear in the ANALCO'07 Proceedings

*Facultad de Ingeniería, Universidad de Buenos Aires, Argentina, and GREYC, Université de Caen, France
†GREYC, Université de Caen, F-14032 Caen, France
‡Iris Research Center, Stavanger, Norway
§GREYC, Université de Caen, F-14032 Caen, France
¶IMB, Université de Bourgogne, F-21078 Dijon Cedex, France
‖GREYC, Université de Caen, F-14032 Caen, France

Abstract

There exist fast variants of the gcd algorithm which are all based on principles due to Knuth and Schönhage. On inputs of size n, these algorithms use a Divide and Conquer approach, perform FFT multiplications and stop the recursion at a depth slightly smaller than lg n. A rough estimate of the worst-case complexity of these fast versions provides the bound O(n (log n)² log log n). However, this estimate is based on some heuristics and is not actually proven. Here, we provide a precise probabilistic analysis of some of these fast variants, and we prove that their average bit-complexity on random inputs of size n is Θ(n (log n)² log log n), with a precise remainder term. We view such a fast algorithm as a sequence of what we call interrupted algorithms, and we obtain three results about the (plain) Euclid Algorithm which may be of independent interest. We precisely describe the evolution of the distribution during the execution of the (plain) Euclid Algorithm; we obtain a sharp estimate for the probability that all the quotients produced by the (plain) Euclid Algorithm are small enough; and we exhibit a strong regularity phenomenon, which proves that these interrupted algorithms are locally "similar" to the total algorithm. This finally leads to the precise evaluation of the average bit-complexity of these fast algorithms. This work uses various tools, and is based on a precise study of generalised transfer operators related to the dynamical system underlying the Euclid Algorithm.

1 Introduction

Gcd computation is a widely used routine in computations on long integers. It is omnipresent in rational computations, public key cryptography and computer algebra. Many gcd algorithms have been designed since Euclid. Most of them compute a sequence of remainders by successive divisions, which leads to algorithms with a quadratic bit-complexity (in the worst case and in the average case). Using Lehmer's ideas [20] (which replace large divisions by large multiplications and small divisions), computations can be sped up by a constant factor, but the asymptotic complexity remains quadratic. Major improvements in this area are due to Knuth [19], who designed the first subquadratic algorithm in 1970, and to Schönhage [24], who subsequently improved it the same year. They use both Lehmer's ideas and Divide and Conquer techniques to compute in a recursive way the quotient sequence (whose total size is O(n)). Moreover, if a fast multiplication with subquadratic complexity (FFT, Karatsuba...) is performed, then one obtains a subquadratic gcd algorithm (in the worst case). Such a methodology has been recently used by Stehlé and Zimmermann [25] to design a Least-Significant-Bit version of the Knuth-Schönhage algorithm. According to experiments due to [5] or [22], these algorithms (with an FFT multiplication) become efficient only for integers of size larger than 10000 words, whereas, with the Karatsuba multiplication, they are efficient for smaller integers (around 100 words). A precise description of the Knuth-Schönhage algorithm can be found in [29, 22] for instance.

1.1 Previous results. The average-case behaviour of the quadratic gcd algorithms is now well understood. First results are due to Heilbronn and Dixon in the seventies, who studied for the first time the mean number of iterations of the Euclid Algorithm; then Brent analysed the Binary algorithm [4], and Hensley [15] provided the first distributional analysis for the number of steps of the Euclid Algorithm.
Since the 90's, the Caen Group [26, 28, 27] has performed an average-case analysis of various parameters of a large class of Euclidean algorithms. More recently, distributional results have also been obtained for the Euclid algorithm and some of its variants: first, Baladi and Vallée proved that a whole class of so-called additive costs of moderate growth follows an asymptotic gaussian law [2] (for instance, the number of iterations, the number of occurrences of a given digit, etc.). This year, Lhote and Vallée [21] showed that a more general class of parameters also follows an asymptotic gaussian law. This class contains the length of a remainder at a fraction of the execution, and the bit-complexity. To the best of our knowledge, there are as yet few results on "efficient" gcd algorithms. In [9], the authors perform an average-case analysis of Lehmer's algorithm, and exhibit the average speed-up obtained using these techniques. But, as far as we know, there does not exist any probabilistic analysis of subquadratic gcd algorithms. The goal of this paper is to perform such a study.

1.2 Our results. There are two algorithms to be analyzed, the HG algorithm and the G algorithm. The G algorithm computes the gcd, and the HG algorithm (for Half-gcd Algorithm) only simulates the "first half" of the G algorithm. We first show that these algorithms can be viewed as a sequence of the so-called Interrupted Euclidean algorithms. An Interrupted Euclidean algorithm is a subsequence formed by successive iterations of the algorithm. On an input (A, B), the plain Euclid algorithm builds a sequence of remainders Ai, a sequence of quotients Qi, and a sequence of matrices Mi [see Section 2.1]. On an input (A, B) of binary size n, the Interrupted algorithm E[δ,δ+γ] starts at the index k of the execution of the Euclid Algorithm, as soon as the remainder Ak has already lost δn bits (with respect to the initial A, which has n bits), and stops at index k+i as soon as the remainder Ak+i has lost γn supplementary bits (with respect to the remainder Ak). The HG algorithm is just an algorithm which simulates the interrupted algorithm E[0,1/2]. A quite natural question is: how many iterations are necessary to lose these γn bits?
Of course, it is natural to expect that this subsequence of the Euclidean algorithm is just "regular" and locally similar to the "total" Euclidean Algorithm; in this case, the number of iterations would be close to γP (where P is the number of iterations of the "total" Euclid algorithm). We prove in Theorem 1 that this is indeed the case.

For a probabilistic study of these fast variants, a precise description of the evolution of the distribution during the execution of the plain Euclid Algorithm is of crucial interest. For real inputs, we know that the continued fraction algorithm does not terminate (except for rationals...). Moreover, as the continued fraction algorithm is executed, the distribution of reals tends to the distribution relative to the Gauss density ϕ, defined as

(1.1)  ϕ(x) = (1/log 2) · 1/(1+x).

For rational inputs, one begins with some distribution on the set of the inputs x := A1/A0 of size n, and we consider the rationals xk := Ak+1/Ak. We focus on the first index k where the binary size of xk is less than (1-δ)n, and we denote the corresponding rational xk by x⟨δ⟩. What is the distribution of the rational x⟨δ⟩? The evolution of this distribution is clearly more intricate than in the real case, since at the end of the Algorithm (when δ = 1) the distribution is the Dirac measure at x = 0. We obtain here a precise description of this distribution (see Theorem 2 and Figure 1), which involves another density

(1.2)  ψ(x) := (12/π²) Σ_{m≥1} log(m+x) / ((m+x)(m+x+1)).

[Figure 1: Density distribution of x⟨δ⟩ in the case δ = 1/2. This corresponds to the density distribution of the rational xk := Ak+1/Ak obtained as soon as ℓ(Ak) is smaller than (1/2)ℓ(A0). The empirical density is plotted against its theoretical counterpart ψ(x). Here we consider Ωn for n = 48 (48 bits); the interval [0, 1] is subdivided into equal subintervals of length 1/50 to estimate the density, and 3 537 944 rationals are drawn from Ωn according to the initial density distribution ϕ by Monte Carlo simulation.]

We also need precise results on the distribution of some truncations of these remainders. This is done in Theorem 3. Then, the choice of parameters in the fast algorithms must take this evolution of distribution into account. This is why we are led to design some variants of the classical algorithms, denoted by HG-bar and G-bar, for which the precise analysis can be performed.

The fast versions also deal with other functions, which are called the Adjust functions. Such functions perform a few steps of the (plain) Euclid Algorithm. However, the bit-complexity of the Adjust functions depends on the size of the quotients which are computed during these steps. Even for estimating the worst-case complexity of the fast variants, the Adjust functions are not precisely analyzed. The usual argument is "the size of a quotient is O(1)". Of course, this assertion is false, since it is only true on average. Since the Adjust functions are related to some precise steps, the size of the quotients computed at these precise steps may be large. We are then led to study the probability that the Euclid Algorithm only produces "small" quotients. This is done in Theorem 4. And of course, we also need this result for our truncated data. This is the goal of Theorem 5.

Finally, we obtain the exact average-case complexity of our versions of the two main algorithms of interest, the HG algorithm and the G algorithm itself. We prove the following results [Theorems 6 and 7] about the average bit-complexities B, G of both algorithms, on the set of random inputs of size n:

En[B] = Θ(n (log n)² log log n),   En[G] = Θ(n (log n)² log log n).
Furthermore, we obtain some precise information about the constants which are involved in the Θ-terms. Our proven average bit-complexity of the HG-bar and G-bar algorithms is thus of the same order as the usual heuristic bound on the worst-case complexity of the HG and G algorithms.

1.3 Methods. All our main conclusions obtained here are "expected", and certainly do not surprise the reader. However, the irruption of the density ψ is not expected, and an actual proof of this phenomenon is not straightforward. This is due to the fact that there is a correlation between successive steps of the Euclid Algorithm. Hence the tools which are usual in analysis of algorithms [13], such as generating functions, are not well adapted to the study of this algorithm. All the analyses which will be described here are instances of the so-called dynamical analysis, where one proceeds in three main steps. First, the (discrete) algorithm is extended into a continuous process, which can be defined in terms of the dynamical system related to the Gauss map. Then, the transfer operator Hs, defined as

Hs[f](x) := Σ_{m≥1} (1/(m+x)^{2s}) · f(1/(m+x)),

explains how the distribution evolves, but only in the continuous world. The executions of the gcd algorithm are now described by particular trajectories (i.e., trajectories of "rational" points), and a transfer "from continuous to discrete" must finally be performed.

The present paper mainly uses two previous works, and can be viewed as an extension of them: first, the average-case analysis of the Lehmer-Euclid algorithm performed in [9]; second, the distributional methods described in [2, 21]. First, we again use the general framework that Daireaux and Vallée have defined for the analysis of the Lehmer-Euclid Algorithm, which explains how the Lehmer-Euclid algorithm can be viewed as a sequence of Interrupted Euclidean algorithms E[δ,δ+γ]. However, in [9], we only used some "easy" properties of the transfer operator Hs. We now need to prove that properties which were already crucial in previous distributional analyses [2, 1, 21], namely the US Property for the quasi-inverse (I - Hs)^{-1} of the transfer operator, also hold in our context. The US property can be summarized as follows:

For any ξ, there exists a vertical strip S of the form |ℜs - 1| ≤ σ for which the following holds:
(i) the quasi-inverse has a unique pole in S;
(ii) on the left line ℜs = 1 - σ, one has (I - Hs)^{-1}[1] = O(|ℑs|^ξ).

Here, our various theorems lead to the study of parameters of various types, whose generating functions involve various operators Gs,w which depend on two variables s, w. However, for small w's, all these operators can be viewed as a perturbation of the quasi-inverse (I - Hs)^{-1}, and we have to prove that the US Property extends to these perturbed quasi-inverses. In particular, the existence of a strip S where the US property holds uniformly with respect to w is crucial in the analysis: the main choices of the parameters in our versions HG-bar and G-bar of the fast algorithms depend on the width σ of this strip (see Theorem 3).

Plan and notations. Section 2 describes the main algorithms HG and G. Section 3 is devoted to explaining the main steps to be done for obtaining a proven analysis; we then state our main results of general interest. In Section 4, we describe the versions HG-bar and G-bar to be analyzed, and we prove the two main results about their average bit-complexity. The last Section is devoted to the proof of the main results of Section 3. We denote the logarithm in base 2 by lg x, and ℓ(x) denotes the binary size of x, namely ℓ(x) := ⌊lg x⌋ + 1.
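Before moving to the algorithms, here is a small numerical sanity check (ours, not part of the paper) of the transfer operator introduced in Section 1.3: at s = 1, Hs is the density transformer of the Gauss map, and the Gauss density ϕ of (1.1) is its fixed point. The truncation depth of the series is an arbitrary choice.

```python
# Check numerically that H_1[phi] = phi, where phi(x) = (1/log 2) / (1+x)
# and H_1[f](x) = sum_{m>=1} (m+x)^(-2) * f(1/(m+x)).
import math

def phi(x):
    return 1.0 / (math.log(2.0) * (1.0 + x))

def H1(f, x, terms=200000):
    # Truncated series; the neglected tail is O(1/terms).
    return sum(f(1.0 / (m + x)) / (m + x) ** 2 for m in range(1, terms))

for x in (0.0, 0.3, 0.7, 1.0):
    print(x, H1(phi, x), phi(x))   # the two columns agree to ~1e-5
```

The check succeeds because of the telescoping identity Σ_{m≥1} 1/((m+x)(m+x+1)) = 1/(1+x), which is exactly the invariance of ϕ.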
2 Fast and Interrupted Euclidean algorithms

We present in this section the main algorithms studied in this paper. We first describe the general structure of the Knuth-Schönhage algorithm. We explain how the HG algorithm can be seen as a sequence of interrupted Euclidean algorithms, where the sequence of divisions is stopped as soon as the integers have lost a fraction of their number of bits.

The Lehmer Algorithm [20, 18] replaces large divisions by large multiplications and small divisions. The fast HG algorithm applies recursively the principles of Lehmer and, using fast FFT multiplications of complexity Θ(µ(n)) (with µ(n) = n log n log log n), replaces the costly computation of the remainder sequence Ai (which requires O(n²) bit operations) by a sequence of matrix products: it divides the total Euclidean Algorithm into interrupted Euclidean algorithms of the form E(i,j), and computes matrices of the form M(i,j), defined in (2.5). The recursion, based on Divide and Conquer techniques, is stopped when the integers are small enough, and, at this moment, the algorithm uses small divisions. One finally obtains a subquadratic gcd algorithm.

2.1 Euclid's algorithm. Let (A1, A0) be a pair of positive integers with A1 ≤ A0. On input (A1, A0), the Euclid algorithm computes the remainder sequence (Ak) with a succession of divisions of the form

(2.3)  Ak = Qk+1 Ak+1 + Ak+2,   with Qk+1 = ⌊Ak/Ak+1⌋,

and stops when Ap+1 = 0. The integer Qk is the k-th quotient, and the successive divisions are written in matrix form as Ak = Qk+1 Ak+1, with

Ak := (Ak+1, Ak)^T   and   Qk := [[0, 1], [1, Qk]],

so that

(2.4)  A0 = M(i) Ai   with   M(i) := Q1 Q2 ··· Qi.

In the following, we consider a "slice" of the plain Euclidean Algorithm E, between index i and index j, namely the interrupted algorithm E(i,j), which begins with the pair Ai as its input and computes the sequence of divisions (2.3) with i ≤ k ≤ j-1. Its output is the pair Aj together with the matrix

(2.5)  M(i,j) = Π_{k=i+1}^{j} Qk,   M(1,i) = M(i),

with matrix M(i) defined in (2.4). We define the size of a matrix M as the maximum of the binary sizes of its coefficients. The size ℓ(i,j) of the matrix M(i,j) satisfies

(2.6)  ℓ(i,j) ≤ 2(j-i) + Σ_{k=i+1}^{j} ℓ(Qk).

The (naive) bit-complexity C(i,j) of the algorithm E(i,j),

C(i,j) := Σ_{k=i+1}^{j} ℓ(Ak) · ℓ(Qk),

satisfies

(2.7)  C(i,j) ≤ ℓ(Ai+1) · Σ_{k=i+1}^{j} ℓ(Qk).

2.2 How to replace large divisions by small divisions? Lehmer remarked that, when two pairs (A, B) and (a, b) are sufficiently close (i.e., the rationals A/B and a/b are close enough), the Euclid algorithm on (A, B) or (a, b) produces (at least at the beginning) the same quotient sequence (Qi). This is why the following definition is introduced:

Definition. Consider a pair (A, B) with A ≤ B and an integer b of length ℓ(b) ≤ ℓ(B). We denote by π[b](A, B) any pair (a, b) which satisfies

|A/B - a/b| ≤ 1/b.

And the criterion (due to Lehmer and made precise by Jebelean) is:

Lemma 1. [Lehmer, Jebelean] For a pair (A, B) with A ≤ B and n := ℓ(B), consider, for m ≤ n, the small pair (a, b) = π[b](A, B) of length ℓ(b) = m, and the sequence of remainders (ai) of the Euclid Algorithm on the small input (a, b). Denote by k the first integer for which ak satisfies ℓ(ak) ≤ ⌈m/2⌉. Then the sequence of quotients qi of the Euclid Algorithm on the small input (a, b) coincides with the sequence of quotients Qi of the Euclid Algorithm on the large input (A, B) for i ≤ k - 3.

Usually, this criterion is used with a particular pair π[b](A, B) where the integer b is obtained by the m-truncation of B, i.e., the suppression of its (n-m) least significant bits. Then a is easy to compute, since it may be chosen itself as the m-truncation of A. In this case, the π[b] function corresponds to truncation of both A and B, and is denoted by Tm(A, B). However, the Jebelean criterion holds for any choice of (a, b) = π[b](A, B), even if the integer a is less easy to compute in the general case: the integer a can be chosen as the integer part of the rational (A·b)/B, and its computation needs a product and a division.
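The following Python sketch (ours; the sample integers are arbitrary) makes the bookkeeping of Section 2.1 concrete: it records the remainder and quotient sequences, accumulates the matrices Qk, and checks identity (2.4). The last lines illustrate the Lehmer phenomenon behind Lemma 1: a pair truncated to its high-order bits reproduces a prefix of the quotient sequence of the large pair.

```python
def euclid_trace(a1, a0):
    """Return remainders [A0, A1, A2, ...] and quotients [Q1, Q2, ...]."""
    rem, quo = [a0, a1], []
    while rem[-1] != 0:
        q, r = divmod(rem[-2], rem[-1])
        quo.append(q)
        rem.append(r)
    return rem, quo

def mat_mul(M, N):
    return [[M[0][0]*N[0][0] + M[0][1]*N[1][0], M[0][0]*N[0][1] + M[0][1]*N[1][1]],
            [M[1][0]*N[0][0] + M[1][1]*N[1][0], M[1][0]*N[0][1] + M[1][1]*N[1][1]]]

rem, quo = euclid_trace(8191, 28657)
M = [[1, 0], [0, 1]]
for i, q in enumerate(quo, start=1):
    M = mat_mul(M, [[0, 1], [1, q]])              # M = Q_1 Q_2 ... Q_i = M_(i)
    # identity (2.4): M_(i) * (A_{i+1}, A_i)^T = (A_1, A_0)^T
    assert M[0][0]*rem[i+1] + M[0][1]*rem[i] == rem[1]
    assert M[1][0]*rem[i+1] + M[1][1]*rem[i] == rem[0]

# Lehmer/Jebelean flavour: keep only the m high-order bits of both entries
# (the T_m truncation) and count how many leading quotients coincide.
A, B = 271828182845904523, 314159265358979323
m = 21
a, b = A >> (B.bit_length() - m), B >> (B.bit_length() - m)
qa, qb = euclid_trace(A, B)[1], euclid_trace(a, b)[1]
agree = 0
while agree < min(len(qa), len(qb)) and qa[agree] == qb[agree]:
    agree += 1
print("matching quotient prefix:", agree)
```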

2.3 Interrupted Algorithms. In Jebelean's property (Lemma 1), the Euclid Algorithm on the small pair (a, b) of binary size m is stopped as soon as the remainder ak has lost ⌈m/2⌉ bits. This is a particular case of the so-called Interrupted Euclidean Algorithm of parameter δ (with 0 < δ < 1), which stops as soon as the current remainder has lost δm bits (with respect to the input, which has m bits). This (general) interrupted algorithm, denoted by Eδ and described in Figure 2, is defined as follows: on the input (A, B) of Ωn, it begins at the beginning of the Euclid Algorithm, and stops as soon as the remainder Ai has lost δn bits (with respect to the input B). Then, with the notations defined in Section 2.1, one has Eδ = E(1,Pδ), with

(2.8)  Pδ := min{ k ; ℓ(Ak) ≤ ⌊(1-δ)n⌋ }.

    Algorithm Eδ(A, B)
      n := ℓ(B); i := 1
      A1 := A; A0 := B; M0 := I
      While ℓ(Ai) > (1-δ)·n do
        Qi := ⌊Ai-1/Ai⌋
        Ai+1 := Ai-1 - Qi·Ai
        Mi := Mi-1 · Qi
        i := i+1
      Return (Ai-1, Ai, Mi-1)

    Algorithm Êδ(A, B)
      n := ℓ(B); i := 1
      A1 := A; A0 := B; M0 := I
      While ℓ(Ai) > (1-δ)·n do
        Qi := ⌊Ai-1/Ai⌋
        Ai+1 := Ai-1 - Qi·Ai
        Mi := Mi-1 · Qi
        i := i+1
      Return (Ai-3, Ai-2, Mi-3)

Figure 2: The Eδ Algorithm, and the Êδ Algorithm, which is a slight modification of the Eδ Algorithm.
Figure 2 also describes the Êδ Algorithm, which is just a slight modification of the Eδ Algorithm, where the last two steps are suppressed (in view of applications of Lemma 1), and P̂δ denotes the variable Pδ - 2. Then Pδ and P̂δ are just the numbers of iterations of the Eδ and Êδ algorithms, and P1 = P is just the number of iterations of the Euclid Algorithm.

In the following, it will be convenient to consider more general interrupted algorithms, of the form E[δ,δ+γ]. The E[δ,δ+γ] Algorithm is defined as follows: on the input (A, B) of Ωn, this algorithm begins at the Pδ-th iteration of the Euclid Algorithm, as soon as the remainder has lost ⌊δn⌋ bits (with respect to the input B), and stops when the remainder has lost ⌊γn⌋ supplementary bits. Then E[0,δ] = Eδ = E(0,Pδ) and E[δ,δ+γ] = E(Pδ,Pδ+γ), where Pδ is defined in (2.8). Of course, we can also design the variants with a hat, where the last two steps are suppressed.
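A direct Python transcription (ours) of the Eδ algorithm of Figure 2. Bit sizes are compared against a real threshold, so the exact floor convention of (2.8) is glossed over; the step counter equals Pδ up to this convention.

```python
def bitsize(x):                   # l(x) = floor(lg x) + 1
    return x.bit_length()

def interrupted_euclid(A, B, delta):
    """E_delta on (A, B) with A <= B: divide until the current remainder
    has lost delta * l(B) bits; return the pair, the matrix and P_delta."""
    n = bitsize(B)
    M = [[1, 0], [0, 1]]                          # accumulates Q_1 ... Q_i
    prev, cur = B, A
    steps = 0
    while cur and bitsize(cur) > (1 - delta) * n:
        q, r = divmod(prev, cur)
        M = [[M[0][1], M[0][0] + q * M[0][1]],    # M := M * [[0,1],[1,q]]
             [M[1][1], M[1][0] + q * M[1][1]]]
        prev, cur = cur, r
        steps += 1
    return prev, cur, M, steps
```

Calling interrupted_euclid(A, B, 0.5) performs exactly the work that the HG algorithm below is designed to simulate faster.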
2.4 Description of the HG Algorithm. The general principles. The Ê algorithm is the one used in Jebelean's Lemma. This lemma is a main tool to compute (in a recursive way) a function HG [for Half-gcd]. On an input (A, B) of binary size n, this function returns exactly the same result as Ê1/2, but runs faster. With the algorithm HG, it is possible to design a fast algorithm denoted G which computes the gcd itself. Let us explain the main principles of the HG algorithm.

Suppose that the Euclid Algorithm, on an input (A, B) of length n, has already performed Pδ iterations. The current pair, denoted by (A′, B′), has a binary size close to (1-δ)n. We may use the Jebelean Property to continue. Then, we choose a length m for truncating, an integer b of length m, and consider the small pair (a, b) = π[b](A′, B′). The HG algorithm on this pair (a, b) will produce a matrix M which would have been produced by the Euclid algorithm on the pair (A′, B′). Then, the pair (C, D) computed as (C, D)^T = M^{-1} (A′, B′)^T is a remainder pair of the Euclid algorithm on the input (A, B). The size of the matrix M is approximately m/2, but smaller than m/2 (due to the two backward steps of Lemma 1), and thus of the form (m/2) - r(a, b), where r(a, b) is the number of bits which are "lost" for the matrix M during the backward steps. Then, with (2.6),

r(a, b) ≤ 2 max{ ℓ(qi) + 1 ; 1 ≤ i ≤ p(a, b) },

where the qi are the quotients that occur in Ê(a, b), and p(a, b) is the number of steps of Ê(a, b). If the truncation length m is chosen as a linear function of the input size n, of the form m = 2γn, then the size of the pair (C, D) is approximately equal to [1-δ-γ]n, but slightly larger. If we wish to obtain a remainder pair (C′, D′) of length [1-δ-γ]n, we have to perform, from the pair (C, D), a certain number of steps of the Euclid Algorithm, in order to cancel the loss due to the backward steps. This is the goal of the Adjust function, whose cost will be ≈ (1-δ)n · r(a, b) [see (2.7)].

Finally, we have designed an algorithm which produces the same result as the interrupted algorithm E[δ,δ+γ], and can be described as follows:

(i) it truncates the input (A′, B′), with a truncation length m = 2γn, and obtains a small pair (a, b) of length m;
(ii) it performs the HG algorithm on the pair (a, b): this produces a pair (c, d) and a matrix M;
(iii) it performs the product (C, D)^T := M^{-1} (A′, B′)^T;
(iv) it uses the Adjust function, which performs some steps of the Euclid Algorithm from the pair (C, D) and stops as soon as the current remainder pair (C′, D′) has a size equal to ⌊(1-δ-γ)n⌋.

2.5 The usual designs for the HG and G algorithms. How can this idea be used to compute (in a recursive way) the HG Algorithm? The usual choice for γ is γ = 1/4, more precisely m = ⌈n/2⌉. Then, the previous description provides a method to obtain E[0,1/4] (with a first choice δ = 0), then E[1/4,1/2] (with a second choice δ = 1/4). Since Ê[0,1/2] = E[0,1/4] · Ê[1/4,1/2], we are done. Remark that using the "hat" algorithm in the second step leads to modifying the Adjust function for this step. We use (for this second step) a so-called "hat Adjust" function, which may also perform some backward steps in the Euclid Algorithm on the large inputs.

    Algorithm HG(A, B)
      n := ℓ(B)
      S := log² n
      Return HG(A, B, S)

    Algorithm HG(A, B, S)
      1  n := ℓ(B)
      2  If n ≤ S then return Ê1/2(A, B)
      3  M := I
      4  m := ⌈n/2⌉
      5  For i := 1 to 2 do
      6    (ai, bi) := Tm(A, B)
      7    (ci, di, Mi) := HG(ai, bi, S)
      8    (Ci, Di)^T := Mi^{-1} (A, B)^T
      9    Adjusti(Ci, Di, Mi)
      10   (A, B) := (Ci, Di)
      11   M := M · Mi
      12 Return (A, B, M)

    Algorithm G(A, B)
      1  n := ℓ(B)
      2  T := √n · log n
      3  While ℓ(A) ≥ T do
      4    (A, B, M1) := HG(A, B)
      5  Return gcd(A, B)

Figure 3: General structure of the classical algorithms HG and G.

The general structure of the HG algorithm is described in Figure 3. The recursion is stopped when the naive algorithm Ê1/2 becomes competitive. This defines a threshold for the binary size, denoted by S (remark that S = S(n) is a function of the input size n). With this HG algorithm, we can obtain an algorithm named G which computes the gcd. The idea for designing such an algorithm is to decompose the total Euclid Algorithm into interrupted algorithms, as

E[0,1] = E[0,1/2] · E[1/2,3/4] · ... · E[1-(1/2)^k, 1-(1/2)^{k+1}] · ...

Then, the HG algorithm, when running on the inputs of size n/2^k produced by the E[0,1-(1/2)^k] algorithm, can easily simulate the E[1-(1/2)^k, 1-(1/2)^{k+1}] algorithm. This decomposition also stops when the naive gcd algorithm becomes competitive. This defines a threshold for the length, denoted by T (remark that T = T(n) is also a function of the input size n).

We now consider the HG Algorithm, where all the products use an FFT multiplication of order µ(n) = Θ(n log n log log n). In this case, we choose the recursion depth H so that the main cost is the "internal" cost, of order Θ(µ(n)) log n, since the cost due to the leaves (where the naive Ê1/2 is performed) will be of asymptotically smaller order. Then H satisfies the relation

2^H · (n/2^H)² ≈ µ(n) log n,   so that S(n) = n/2^H ≈ log² n,   H ≈ lg n - 2 lg lg n.

This is the "classical" version of the Knuth-Schönhage algorithm. Clearly, the cost of this HG algorithm comes from three types of operations:

(i) the two recursive calls of line 7;
(ii) the products done at lines 8 and 11: with a clever implementation, it is possible to use in line 8 the pair (c, d) just computed in line 7. If all the matrices and integer pairs have, on average, the expected size, the total expected cost due to the products is [12 + 8 + 8] µ(n/4) = 28 µ(n/4);
(iii) the two Adjust functions performed at line 9, whose total average cost is R(n).
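To fix the control flow of Figure 3, here is a deliberately naive sketch (ours): the recursive HG is replaced by a plain Euclidean loop that halves the bit size, so this code is quadratic and only the outer structure of G is faithful; the 32-bit cutoff stands in for the threshold T(n).

```python
import math

def naive_hg(A, B):
    """From (A, B) with A <= B, perform divisions (at least one) until the
    smaller entry has at most half as many bits as B had on entry."""
    n = B.bit_length()
    prev, cur = B, A
    while cur and (cur.bit_length() > n // 2 or prev == B):
        prev, cur = cur, prev % cur
    return cur, prev                    # new pair, still (smaller, larger)

def g(A, B):
    """Figure 3's G: repeated half-gcd steps, then a naive finish."""
    while A and B.bit_length() > 32:    # gcd is invariant at each step
        A, B = naive_hg(A, B)
    return math.gcd(A, B)

print(g(2**200 + 3**50, 2**300 + 7))    # arbitrary large inputs
```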

We consider as the set of possible inputs of the HG algorithm the set

Ωn := { (u, v) ; 0 ≤ u ≤ v, ℓ(v) = n },

endowed with some probability Pn. We denote by B(n) the average number of bit operations performed by the HG algorithm on Ωn. Since each of the two recursive calls is made on data of size n/2, it can be "expected" that B(n) asymptotically satisfies

(2.9)  B(n) ≈ 2 B(n/2) + 28 µ(n/4) + R(n)   for n > S.
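A quick numerical unrolling (ours) of the heuristic recurrence (2.9), dropping the Adjust cost R(n) and seeding the leaves with an arbitrary quadratic cost: the printed ratio B(n)/(µ(n) lg n) grows ever more slowly and levels off toward a constant, which is the heuristic Θ(µ(n) log n) bound discussed below.

```python
import math

def mu(n):
    return n * math.log(n) * math.log(math.log(n))

B = {16: 16.0 ** 2}                        # arbitrary quadratic seed at the leaves
n = 16
while n < 2 ** 24:
    B[2 * n] = 2 * B[n] + 28 * mu(n / 2)   # (2.9) with R(n) dropped; n/2 = (2n)/4
    n *= 2
for n in (2 ** 10, 2 ** 16, 2 ** 24):
    print(n, round(B[n] / (mu(n) * math.log2(n)), 3))
```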

Moreover, the average cost R(n) can be "expected" to be negligible with respect to the multiplication cost µ(n). If the FFT multiplication is used, with µ(n) = Θ(n log n log log n), the total average bit-cost is "expected" to be

B(n) ≈ Θ(µ(n) log n) = Θ(n (log n)² log log n).

With this (heuristic) analysis of the HG algorithm, it is easy to obtain the (heuristic) average bit-complexity of the G algorithm, which makes a recursive use of the HG algorithm and stops as soon as the naive algorithm becomes competitive. It then stops at a recursion depth M, when

(n/2^M)² ≈ µ(n) log n,   so that T(n) = √n · log n,   M ≈ (1/2) lg n - lg lg n.

The average bit-cost G(n) of the G algorithm on data of size n satisfies

G(n) ≈ Σ_{i=0}^{M-1} B(n/2^i),   so that G(n) ≈ Θ(B(n)).

3 The main steps towards a proven analysis.

The analysis is based on the Divide and Conquer equation (2.9), which is not a "true" equality. It is not clear why a "true" equality should hold, since each of the two recursive calls is done on data which do not possess the same distribution as the input data. And, of course, the same problem arises at each depth of the recursion. We have to make precise the evolution of the distribution during the Euclid Algorithm, and also the distribution of the truncated data. Moreover, we have to make precise the average bit-complexity R(n) of the Adjust function. Even for estimating the worst-case complexity of the fast variants, the Adjust functions are not precisely analyzed. The usual argument is "the size of a quotient is O(1)". Of course, this assertion is false, since it is only true on average. Since the Adjust functions are related to some precise steps, the size of the quotients computed at these precise steps may be large. We are then led to study the probability that the Euclid Algorithm only produces "small" quotients.

3.1 Evolution of densities. Consider a density f on the interval I = [0, 1]. Any set Ωn is seen as a set of rationals

Ωn := { x ∈ Q ∩ ]0, 1] ; x = u/v, ℓ(v) = n }

and is endowed with the restriction of f to Ωn: for any x0 ∈ Ωn,

Pn,f(x0) := f(x0) / Σ_{x∈Ωn} f(x).

The evolution of the density during the execution of the Euclid Algorithm is of crucial interest. We begin with all the sets Ωn endowed with the probability Pn,f. For x ∈ Ω, we recall that Pδ(x) is the smallest integer k for which the binary size of xk is less than (1-δ)ℓ(x). We are interested in describing the density of the rational x⟨δ⟩ defined as

x⟨δ⟩ := xk   when Pδ(x) = k.

This rational is the input for all interrupted algorithms with a beginning parameter δ. It is then essential to study the random variable Pδ. Since the rational x loses ℓ(x) bits during P(x) iterations, it can be expected that it loses δℓ(x) bits during δP(x) iterations, which would imply that Pδ(x) is sufficiently close to δP(x). This is what we call the regularity of the algorithm. With techniques close to renewal methods, we prove a quasi-powers expression for the moment generating function of Pδ, from which we deduce two results. The first one is interesting per se, and provides an extension of the result of Baladi-Vallée [2], which exhibits an asymptotic gaussian law for P := P1.

Theorem 1. Consider the set Ωn endowed with a probability Pn,f relative to a strictly positive function f of class C¹. Then, for any δ ∈ ]0, 1], the random variable Pδ is asymptotically gaussian on Ωn [with a speed of convergence of order O(1/√n)]. Moreover,

En,f[Pδ] ∼ (2 log 2 / |Λ′(1)|) · δn,   Vn,f[Pδ] ∼ (2 log 2)² · (Λ″(1) / |Λ′(1)|³) · δn.
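A Monte Carlo sanity check (ours) of the mean estimate in Theorem 1, in the uniform case f ≡ 1. With |Λ′(1)| = π²/(6 log 2) (the entropy of the Euclidean dynamical system), the predicted slope is (2 log 2/|Λ′(1)|)·δ = (12 (log 2)²/π²)·δ ≈ 0.584·δ; the size and trial count are arbitrary, and the off-by-one in the step index is immaterial at this scale.

```python
import math, random

def P_delta(A, B, delta):
    """Number of divisions until the remainder has lost delta*l(B) bits."""
    n = B.bit_length()
    steps = 0
    while A and A.bit_length() > (1 - delta) * n:
        A, B = B % A, A
        steps += 1
    return steps

n, delta, trials = 256, 0.5, 2000
acc = 0
for _ in range(trials):
    B = random.getrandbits(n) | (1 << (n - 1))      # force l(B) = n
    A = random.randrange(1, B)
    acc += P_delta(A, B, delta)
print(acc / trials / n, 12 * math.log(2) ** 2 / math.pi ** 2 * delta)
```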

Theorem 2. There exist n0 and a constant σ > 0 such that, for any n ≥ n0, for any real δ with 0 < δ < 1, for any function ε : [0, 1] → [0, 1] such that ε(x) ≤ x, for any interval J ⊂ I, and for any strictly positive density f of class C¹, the probability that the rational x⟨δ⟩ computed by the Euclid Algorithm belongs to the interval J satisfies

Pn,f[x⟨δ⟩ ∈ J] = ( ∫_J ψ(t) dt ) · [1 + β′n(δ, J, ε)].

Here, the density ψ is described in (1.2), the real σ is the width of the US strip, and

β′n(δ, J, ε) = O(ε(|J|)/|J|) + (1/ε(|J|)) · O(2^{-σ(1-δ)n}) + (1/|J|) · O(2^{-σδn/2}).

The constants in the O-terms only depend on the function f.
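A miniature version (ours) of the experiment reported in Figure 1, with a uniform initial density instead of ϕ (Theorem 2 predicts the same limit density ψ for every positive C¹ density f). The bin count, sample sizes and the series truncation inside psi are arbitrary choices.

```python
import math, random

def psi(x, terms=5000):
    # (1.2), with the series truncated; the tail is below ~1e-2 here.
    s = sum(math.log(m + x) / ((m + x) * (m + x + 1)) for m in range(1, terms))
    return 12 / math.pi ** 2 * s

def x_delta(A, B, delta=0.5):
    """The rational x_<delta> = A_{k+1}/A_k at the first k with
    l(A_k) <= (1-delta) * l(B)."""
    n = B.bit_length()
    while A and A.bit_length() > (1 - delta) * n:
        A, B = B % A, A
    return (B % A) / A if A else 0.0

n, trials, bins = 48, 200000, 20
hist = [0] * bins
for _ in range(trials):
    B = random.getrandbits(n) | (1 << (n - 1))
    A = random.randrange(1, B)
    hist[min(bins - 1, int(x_delta(A, B) * bins))] += 1
for j in range(bins):
    mid = (j + 0.5) / bins
    print(f"{mid:.3f}  empirical {hist[j] * bins / trials:.3f}  psi {psi(mid):.3f}")
```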

3.2 Probabilistic truncations. Finally, we are also interested in the distribution of the truncated rationals. The truncated rationals classically used are obtained via truncations of the numerator A and the denominator B of the rational A/B. It is not clear how to reach the distribution of such truncated rationals. This is why we define a probabilistic truncation, which leads to more regular distributions, and also allows us to apply Jebelean's Property (Lemma 1).

For x = A/B ∈ Ωn and m ≤ n, we define πm(x) as follows:

(1) Choose a denominator b in the set of integers of binary size m, with a probability proportional to b. More precisely, we choose a denominator b according to the law

Pm[b = b0] = b0/θm,   with θm = Σ_{b; ℓ(b)=m} b.

(2) Compute the integer a which is the integer part of x·b. This computation involves the product A·b, then the division of the integer A·b by B.

(3) Define πm(A/B) as the rational a/b, and remark that the set πm^{-1}(a/b) is the interval

J(a/b) := [a/b, a/b + 1/b],   with |J(a/b)| = 1/b = Θ(2^{-m}).

This is sufficient for applying Jebelean's criterion (Lemma 1). However, we will see that using this probabilistic truncation does not change the order of the average complexity of the HG algorithm.

We start with a strictly positive density f of class C¹ on [0, 1]. The function gm = gm(f), defined on Ωm as

gm(f)(y0) = (1/|J(y0)|) ∫_{J(y0)} f(t) dt,

satisfies Pn,f[x ; πm(x) = y] = Pm,gm(f)[y]. Furthermore, the function gm(f) is a smoothed version of the initial function f, and

gm(f)(x) = f(x) + O(|J| · ‖f‖₁),   so that   Pm,gm(f) = Pm,f · [1 + O(2^{-m})].

Since f is a density on [0, 1], the cumulative sum of gm on Ωm satisfies Σ_{x∈Ωm} gm(f)(x) = θm. This allows a comparison between two probabilities:

Lemma 2. Consider a strictly positive density f of class C¹ on I. For any n, for any m ≤ n, and for any y ∈ Ωm, one has

Pn,f[x ; πm(x) = y] = Pm,f[y] · [1 + O(2^{-m})],

where the constant in the O-term only depends on f.

3.3 Truncations and evolution of densities. With Theorem 2, the previous comparison of densities done in Lemma 2, and the optimal choice ε(x) = x², we obtain the following result, which will be a central tool in our analysis.

Theorem 3. Consider the constant σ and the density ψ of Theorem 2. There exists n0 such that, for any n ≥ n0, for any real δ with 0 < δ < 1, for any pair of strictly positive constants σ1, σ2 with σ1 < (2/3)σ, and for a truncation m which satisfies σ2·δn ≤ m ≤ σ1·(1-δ)n, one has

Pn,ψ[x ; πm(x⟨δ⟩) = y0] = Pm,ψ[y0] · [1 + O(βn)],   with βn = 2^{-σ3(1-δ)n} + 2^{-σ4δn},

and σ3 = σ - (3/2)σ1, σ4 := inf(σ/2, σ2). Remark that Lemma 2 is designed to deal with the case δ = 0.
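A direct sampler (ours) for the probabilistic truncation πm: rejection sampling realizes the size-biased choice of the denominator b in step (1), and step (2) is the announced product and division.

```python
import random

def pi_m(A, B, m):
    """Probabilistic truncation of x = A/B at size m (Section 3.2)."""
    lo, hi = 1 << (m - 1), 1 << m             # m-bit integers: [2^(m-1), 2^m)
    while True:                               # rejection sampling for P[b] ~ b
        b = random.randrange(lo, hi)
        if random.random() < b / hi:
            break
    a = (A * b) // B                          # one product and one division
    return a, b
```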

  

3.4 Analysis of the Adjust function: the role of small quotients. In the design of fast versions of the Euclid Algorithm, the role of the Adjust function is important. This is due to the fact that Jebelean's Property does not produce rationals of the exact size. We return to the notations of Section 2.4. Consider a (large) input (A, B) of size n. During the two backward steps that are necessary for the Euclid algorithm on the small input (a, b), the matrix M gets smaller, so that the pair (C, D) has a larger size. It is then necessary to perform some steps of the Euclid algorithm on (C, D), and the cost of these steps [see Section 2.4] is less than

2n · max{ ℓ(qi) + 1 ; 1 ≤ i ≤ p(a, b) },

where the qi are the quotients that occur in Ê(a, b), and p(a, b) is the number of steps of Ê(a, b). If we wish this cost to be "small", it is then crucial to study the probability that the Euclid Algorithm produces small quotients, and the set of interest gathers the rationals x for which all the quotients qi(x) which occur in the Continued Fraction Expansion of x are "small". Of course, it is natural to relate the size of the quotients to the size of the rational itself, and we are led to introduce the subset Un of Ωn defined as

(3.10)  Un := { x ∈ Ωn ; ℓ(qi(x)) ≤ log n · (log log n)^{1/2} for 1 ≤ i ≤ p(x) }.

Remark that, by construction, when x = a/b belongs to Um, the cost of the Adjust function for the large input (A, B) of size n will be less than n log m (log log m)^{1/2}: it will be negligible with respect to µ(n). More generally, for any sequence n ↦ M(n), we may define [...]

Theorem 4. [7] There exist n0 and a real σ > 0 such that, for any strictly positive density f of class C¹, for [...]

We also need the same type of result for our variables πm(x⟨δ⟩).

Theorem 5. Consider the general framework of Theorem 3. The probability that the rational πm(x⟨δ⟩) belongs to the set Um of small quotients satisfies

Pn,ψ[x ; πm(x⟨δ⟩) ∈ Um] = 1 - O(m^{1-(log log m)^{1/2}}).

Due to Lemma 2, this is also true for the case δ = 0.
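An empirical glance (ours) at how typical the set Un of (3.10) is. The reading of the bound, lg n · (log log n)^{1/2} with a natural inner logarithm, is our assumption, as are the sizes.

```python
import math, random

def max_quotient_bits(A, B):
    """Largest l(q_i) along the Euclid Algorithm on (A, B), A <= B."""
    worst = 0
    while A:
        worst = max(worst, (B // A).bit_length())
        A, B = B % A, A
    return worst

n, trials = 1024, 2000
bound = math.log2(n) * math.sqrt(math.log(math.log(n)))
inside = 0
for _ in range(trials):
    B = random.getrandbits(n) | (1 << (n - 1))
    A = random.randrange(1, B)
    inside += max_quotient_bits(A, B) <= bound
print(f"fraction in U_n: {inside / trials:.3f}  (bound = {bound:.1f} bits)")
```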

4 The algorithms to be analyzed.

There are two main differences between the usual HG and G algorithms and our versions to be analyzed, which are denoted HG-bar and G-bar. See Figure 4.

(i) The truncations Tm defined in Section 2.2 are replaced by probabilistic truncations πm, as defined in Section 3.2.

(ii) For the HG-bar algorithm, the number of recursive calls L and the degree of truncation (i.e., the ratio m/n) are not the same as in the HG Algorithm. The HG-bar algorithm is also built as a Divide and Conquer algorithm; however, the relations between the truncation m and the parameter δ, crucial for applying Theorems 3 and 5, lead to a recursive algorithm HG-bar with L recursive calls, where L depends on the width of the US strip and can be greater than 2.

(iii) The study is done when the initial density equals ψ. This choice makes the study of the various recursions easier. The constants which appear in Theorems 6 and 7 are relative to this particular case. Since the same analysis extends to any strictly positive density, with other constants which depend on f.

[...] The mean cost of the i-th Adjust function satisfies

En,ψ[Ai] = Θ(L) · m log m (log log m)^{1/2} + Θ(L) · O(m²) · O(m^{1-(log log m)^{1/2}}) = Θ(L) · µ(n/L) · (1/log log(n/L))^{1/2}.

The total mean cost of the Adjust functions is then

(4.14)  Θ(L²) · µ(n/L) · (1/log log(n/L))^{1/2}.

Total mean cost Cn. The mean cost Cn is the sum of the mean costs of all the Adjust functions and of all the products, at every node of the tree of recursive calls. Relations (4.12) and (4.14) entail the equality

Cn = Θ(L) Σ_{h=1}^{H} L^h µ(n/L^h) + Θ(L) Σ_{h=1}^{H} L^h µ(n/L^h) · (1/log log(n/L^h))^{1/2}
   = Θ((L/log L) · µ(n) log n) · [1 + O((1/log log log n)^{1/2})].

Finally, we have proven the following:

Theorem 6. Consider the HG-bar algorithm defined in Figure 4, relative to a parameter L which satisfies L ≥ max(⌈4/σ⌉, 2), where σ is the width of the US strip. The average bit-complexity BL of this algorithm on inputs of size n satisfies

En,ψ[BL] = Θ((L/log L) · µ(n) log n) · [1 + O((1/log log log n)^{1/2})],

where the constants in the O- and Θ-terms are uniform with respect to L.

4.2 The k-th recursive call. The k-th recursive call of G-bar to HG-bar is made on integers of size nk = n (1/2)^{k-1}. It deals with values of δ which belong to the interval [1-(1/2)^{k-1}, 1-(1/2)^k]. We consider a truncation mk of the form mk = nk/Lk with an integer Lk ≥ 2, and the constraints due to Theorem 3 are now

(1/2^{k-1}) δn ≤ mk = nk/Lk ≤ (1/Lk)(1-δ)n.

We can choose Lk equal to L, and then Theorems 3 and 5 can be applied. Then, in the same vein as previously,

(4.15)  En,ψ[Bk,L] = (1 + O(2^{-nk})) · Θ((L/log L) · µ(nk) log nk) · [1 + O((1/log log log nk)^{1/2})],

where the first factor is due to Theorem 3, and the other factors are due to Theorem 6.

4.3 End of the recursion. We stop calling the HG-bar algorithm inside the G-bar algorithm when the naive gcd algorithm becomes competitive, namely when the bound T(n) = nM (see Figure 4) satisfies

nM² = Θ((L/log L) · µ(n) log n),   so that nM = √n · log n,   M = Θ(log n).

Then the total cost G of the G-bar Algorithm satisfies

En,ψ[G] = Σ_{k=1}^{M} En,ψ[Bk,L] = Θ((L/log L) · n log² n · log log n) · [1 + ε′(n)],

where the error term ε′(n) satisfies

1 + ε′(n) = (1 + O(2^{-√n})) · [1 + O((1/log log log n)^{1/2})],   so that ε′(n) = O((1/log log log n)^{1/2}).

Finally, we have proven the following:

Theorem 7. Consider the G-bar algorithm defined in Figure 4, relative to a parameter L which satisfies L = max(⌈4/σ⌉, 2), where σ is the width of the US strip. The average bit-complexity G of this algorithm on inputs of size n satisfies

En,ψ[G] = Θ((L/log L) · n log² n · log log n) · [1 + O((1/log log log n)^{1/2})],

where the constants in the O- and Θ-terms are uniform with respect to L.

5 Dynamical Analysis: Proofs of the main Theorems 1, 2, 4.

Here, we explain how to prove these theorems. We present the general framework of the so-called dynamical analysis, which uses various tools that come from analysis of algorithms (generating functions, here of Dirichlet type, described in 5.3) or from dynamical systems theory (mainly transfer operators Hs, described in 5.4). We introduce the main costs C of interest (in 5.2), and their related Dirichlet series, for which we provide an alternative expression with the transfer operator (in 5.5). For obtaining the asymptotic estimates of Theorems 1, 2, 3, 5, we extract coefficients from these Dirichlet series in a "uniform way". Then, Property US (already described in 1.3) is crucial here for applying the Perron Formula with success, as in previous results of Baladi and Vallée [2], where this Property is proven to hold for the quasi-inverse (I - Hs)^{-1}. However, the true quasi-inverse is replaced here by some of its perturbed versions, for which we have to prove that Property US also holds. A sketch of this proof is done in 5.7.

5.1 The Euclidean Dynamical system. When computing the gcd of the integer pair (a0, a1), Euclid's algorithm performs a sequence of divisions. A division v = q·u + r replaces the pair (u, v) with the new pair (r, u). If we now consider rationals instead of integer pairs, there exists a map S which replaces the (old) rational u/v by the (new) rational r/u, defined as

S(x) = 1/x - ⌊1/x⌋,   S(0) = 0.

When extended to the real interval I = [0, 1], the pair (I, S) defines the so-called dynamical system relative to the Euclid algorithm. We denote by H the set of the inverse branches of S,

H = { h[q] : x ↦ 1/(q + x) ; q ≥ 1 },

and by H^p the set of inverse branches of depth p (i.e., the set of inverse branches of S^p), namely H^p = { h = h1 ∘ ··· ∘ hp ; hi ∈ H, ∀i }. The set H* := ∪p H^p is the set of all the possible inverse branches of any depth. Then, the sequence (2.3) builds a continued fraction

(5.16)  u/v = h(0)   with h = h1 ∘ h2 ∘ ... ∘ hp ∈ H^p.

One then associates to each execution of the algorithm a LFT (linear fractional transformation) which is relative to the matrix M(i) of Section 2.1. The CF-expansion (5.16) of a1/a0, when split at depth i, creates two LFT's, bi := h1 ∘ h2 ∘ ··· ∘ hi and ei := hi+1 ∘ ··· ∘ hp, each defining a rational number: the "beginning" rational bi(0), and the "ending" rational ei(0). The "ending" rational ei(0) can be expressed with the remainder sequence (ai),

ei(0) := hi+1 ∘ hi+2 ∘ ··· ∘ hp(0) = ai+1/ai,

while the "beginning" rational bi(0) can be expressed with the two co-sequences (ui), (vi) related to the coefficients of the matrix M(i):

bi(0) := h1 ∘ h2 ∘ ··· ∘ hi(0) = |ui|/|vi|.

The main parameters of interest of the Euclid Algorithm involve the denominator sequences ai, vi, which are called the continuants. The continuants are closely related to derivatives of LFT's: for any LFT h, the derivative h′(x) can be expressed with the denominator function D, defined by D[g](x) = cx + d for g(x) = (ax + b)/(cx + d) with gcd(a, b, c, d) = 1, as

(5.17)  h′(x) = det h / D[h](x)².

And, finally, since any LFT h ∈ H* has a determinant of absolute value equal to 1, one has

(5.18)  vi = |b′i(0)|^{-1/2},   ai = |e′i(0)|^{-1/2}.
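A small exact check (ours) of the splitting of Section 5.1, using consecutive Fibonacci numbers (so the input pair is coprime): the tail LFT ei satisfies ei(0) = ai+1/ai and, as in (5.17)-(5.18), its derivative at 0 recovers the continuant, ai = |e′i(0)|^{-1/2}.

```python
from fractions import Fraction

def cf_quotients(a1, a0):
    """Quotients and remainders of Euclid on (a1, a0) with a1 <= a0."""
    q, rem = [], [a0, a1]
    while rem[-1]:
        q.append(rem[-2] // rem[-1])
        rem.append(rem[-2] % rem[-1])
    return q, rem

a1, a0 = 2584, 4181                      # consecutive Fibonacci numbers
q, rem = cf_quotients(a1, a0)
for i in range(len(q)):
    # evaluate e_i(0) and e_i'(0) exactly, composing h_q(x) = 1/(q+x)
    # from the inside out; h_q'(x) = -1/(q+x)^2, so the chain rule gives dx
    x, dx = Fraction(0), Fraction(1)
    for qq in reversed(q[i:]):
        dx = dx / (qq + x) ** 2
        x = 1 / (qq + x)
    assert x == Fraction(rem[i + 1], rem[i])       # e_i(0) = a_{i+1}/a_i
    assert abs(dx) == Fraction(1, rem[i] ** 2)     # a_i = |e_i'(0)|^(-1/2)
```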

5.2 Costs of interest. We now describe the main costs that intervene in this paper. For Theorem 1, we consider the cost C1 := Pδ for δ ∈ [0, 1], defined by relation (2.8). This means that Pδ(x) = k iff

ℓ(xk) ≤ ⌊(1-δ)ℓ(x)⌋ < ℓ(xk-1).

For Theorem 2, we consider the cost C2 (which depends on the interval J),

C2(x) = 1{x⟨δ⟩ ∈ J}.

For Theorem 4, we consider the cost C3 (which depends on an integer M),

C3(x) = 1{x ∈ Ω[<M]}. [...]

Each of these costs is studied through a Dirichlet series F(s) := Σ_{n≥1} an n^{-s}, from which the Perron Formula (of order two) extracts coefficients: for a vertical line ℜs = L > 0 inside the convergence domain of F,

Ψ(T) := Σ_{n≤T} an (T - n) = (1/2iπ) ∫_{L-i∞}^{L+i∞} F(s) · T^{s+1}/(s(s+1)) ds.

It is next natural to modify the integration contour ℜs = L into a contour which contains a unique pole of F(s), and it is thus useful to know that the Property US [Uniform Estimates on Strips] holds [see Section 1.3]. From works of Dolgopyat [11] and Baladi-Vallée [2], we know that (I - Hs)^{-1} satisfies the US Property, with a strip of width σ. The real σ mentioned in Theorems 2, 3 and 5 is precisely the one associated to this US Property.

Here, the Banach space on which the operator acts is C¹(I), and we recall now the main properties of the operator Hs when acting on this functional space. For ℜ(s) > 1/2, the operator Hs acts on C¹(I) and the map s ↦ Hs is analytic. For s = 1, the operator is quasi-compact: there exists a spectral gap between the unique dominant eigenvalue (which equals 1, since the operator is a density transformer) and the remainder of the spectrum. By perturbation theory, these facts (the existence of a dominant eigenvalue λ(s) and of a spectral gap) remain true in a complex neighborhood V of s = 1. There, the operator splits into two parts: the part relative to the dominant eigensubspace, denoted Ps, and the part relative to the remainder of the spectrum, denoted Ns, whose spectral radius is strictly less than η·λ(s) (with η < 1). This leads to the following spectral decomposition:

Hs[f](x) = λ(s) Ps[f](x) + Ns[f](x),

which extends to the powers H^n_s of the operator,

(5.21)  H^n_s[f](x) = λ^n(s) Ps[f](x) + N^n_s[f](x),

and finally to the quasi-inverse (I - Hs)^{-1}:

(5.22)  (I - Hs)^{-1}[f](x) = (λ(s)/(1 - λ(s))) Ps[f](x) + (I - Ns)^{-1}[f](x).

For Theorem 2, the Dirichlet series (1/w) F2(s, w, J) defined in Section 5.5 can be viewed as a perturbation of

F2(s, J) := (I - Hs)^{-1}[ 1J · H′s ∘ (I - Hs)^{-1}[f] ](0).

This Dirichlet series F2(s, J) involves the operator H′s := (d/ds)Hs; it has a pole of order 2 at s = 1, and it satisfies, for s close to 1,

F2(s, J) ∼ (1/(s-1)²) · (1/|λ′(1)|) · ϕ(0) · ∫_J H′[ϕ](t) dt,

where ϕ is the Gauss density defined in (1.1). This explains why the density ψ introduced in (1.2), which is proportional to H′[ϕ], plays a central role in our analyses.

For Theorem 4, Cesaratto and Vallée studied in [7] the constrained operator H[<M],s, where the sum which defines Hs is restricted to the indices m < M. The dominant eigenvalue λ(s) is [...]