Analysis of fast versions of the Euclid algorithm
Eda Cesaratto, Julien Clément, Benoît Daireaux, Loïck Lhote, Véronique Maume-Deschamps, Brigitte Vallée
To cite this version:
Eda Cesaratto, Julien Clément, Benoît Daireaux, Loïck Lhote, Véronique Maume-Deschamps, et al.. Analysis of fast versions of the Euclid algorithm. 2008. hal-00211424
HAL Id: hal-00211424 https://hal.archives-ouvertes.fr/hal-00211424 Preprint submitted on 21 Jan 2008
Analysis of fast versions of the Euclid Algorithm
Eda Cesaratto* Julien Clément† Benoît Daireaux‡ Loïck Lhote§ Véronique Maume-Deschamps¶ Brigitte Vallée‖
To appear in the ANALCO'07 Proceedings
Abstract

There exist fast variants of the gcd algorithm which are all based on principles due to Knuth and Schönhage. On inputs of size n, these algorithms use a Divide and Conquer approach, perform FFT multiplications and stop the recursion at a depth slightly smaller than lg n. A rough estimate of the worst-case complexity of these fast versions provides the bound O(n (log n)^2 log log n). However, this estimate is based on some heuristics and is not actually proven. Here, we provide a precise probabilistic analysis of some of these fast variants, and we prove that their average bit-complexity on random inputs of size n is Θ(n (log n)^2 log log n), with a precise remainder term. We view such a fast algorithm as a sequence of what we call interrupted algorithms, and we obtain three results about the (plain) Euclid Algorithm which may be of independent interest. We precisely describe the evolution of the distribution during the execution of the (plain) Euclid Algorithm; we obtain a sharp estimate for the probability that all the quotients produced by the (plain) Euclid Algorithm are small enough; we also exhibit a strong regularity phenomenon, which proves that these interrupted algorithms are locally "similar" to the total algorithm. This finally leads to the precise evaluation of the average bit-complexity of these fast algorithms. This work uses various tools, and is based on a precise study of generalised transfer operators related to the dynamical system underlying the Euclid Algorithm.

1 Introduction

Gcd computation is a widely used routine in computations on long integers. It is omnipresent in rational computations, public key cryptography or computer algebra. Many gcd algorithms have been designed since Euclid. Most of them compute a sequence of remainders by successive divisions, which leads to algorithms with a quadratic bit-complexity (in the worst-case and in the average-case). Using Lehmer's ideas [20] (which replace large divisions by large multiplications and small divisions), computations can be sped up by a constant factor, but the asymptotic complexity remains quadratic. Major improvements in this area are due to Knuth [19], who designed the first subquadratic algorithm in 1970, and to Schönhage [24], who subsequently improved it the same year. They use both Lehmer's ideas and Divide and Conquer techniques to compute in a recursive way the quotient sequence (whose total size is O(n)). Moreover, if a fast multiplication with subquadratic complexity (FFT, Karatsuba...) is performed, then one obtains a subquadratic gcd algorithm (in the worst-case). Such a methodology has been recently used by Stehlé and Zimmermann [25] to design a Least-Significant-Bit version of the Knuth-Schönhage algorithm. According to experiments due to [5] or [22], these algorithms (with a FFT multiplication) become efficient only for integers of size larger than 10000 words, whereas, with the Karatsuba multiplication, they are efficient for smaller integers (around 100 words). A precise description of the Knuth-Schönhage algorithm can be found in [29, 22] for instance.

1.1 Previous results. The average-case behaviour of the quadratic gcd algorithms is now well understood. First results are due to Heilbronn and Dixon in the seventies, who studied for the first time the mean number of iterations of the Euclid Algorithm; then Brent analysed the Binary algorithm [4], and Hensley [15] provided the first distributional analysis for the number of steps of the Euclid Algorithm.
[Affiliations: * Facultad de Ingeniería, Universidad de Buenos Aires, Argentina, and GREYC, Université de Caen, France. † GREYC, Université de Caen, F-14032 Caen, France. ‡ Iris Research Center, Stavanger, Norway. § GREYC, Université de Caen, F-14032 Caen, France. ¶ IMB, Université de Bourgogne, F-21078 Dijon Cedex, France. ‖ GREYC, Université de Caen, F-14032 Caen, France.]

Since the 90s, the Caen Group [26, 28, 27] has performed an average-case analysis of various parameters of a large class of Euclidean algorithms. More recently, distributional results have also been obtained for the Euclid algorithm and some of its variants: first, Baladi and Vallée proved that a whole class of so-called additive costs of moderate growth follows an asymptotic gaussian law [2] (for instance, the number of iterations, the number of occurrences of a given digit, etc.). This year, Lhote and Vallée [21] showed that a more general class of parameters also follows an asymptotic gaussian law; this class contains the length of a remainder at a fraction of the execution, and the bit-complexity. To the best of our knowledge, there are as yet few results on "efficient" gcd algorithms. In [9], the authors perform an average-case analysis of Lehmer's algorithm, and exhibit the average speed-up obtained using these techniques. But, as far as we know, there does not exist any probabilistic analysis of subquadratic gcd algorithms. The goal of this paper is to perform such a study.

1.2 Our results. There are two algorithms to be analyzed, the HG algorithm and the G algorithm. The G algorithm computes the gcd, and the HG algorithm (for Half-gcd Algorithm) only simulates the "first half" of the G algorithm.

We first show that these algorithms can be viewed as a sequence of the so-called Interrupted Euclidean algorithms. An Interrupted Euclidean algorithm is a subsequence formed by successive iterations of the algorithm. On an input (A, B), the plain Euclid algorithm builds a sequence of remainders Ai, a sequence of quotients Qi, and a sequence of matrices Mi [see Section 2.1]. On an input (A, B) of binary size n, the Interrupted Euclidean algorithm E[δ,δ+γ] starts at the index k of the execution of the Euclid Algorithm, as soon as the remainder Ak has already lost δn bits (with respect to the initial A which has n bits), and stops at index k + i as soon as the remainder Ak+i has lost γn supplementary bits (with respect to the remainder Ak). The HG algorithm is just an algorithm which simulates the interrupted algorithm E[0,1/2]. A quite natural question is: How many iterations are necessary to lose these γn bits? Of course, it is natural to expect that this subsequence of the Euclidean algorithm is just "regular" and locally similar to the "total" Euclidean Algorithm; in this case, the number of iterations would be close to γP (where P is the number of iterations of the "total" Euclid algorithm). We prove in Theorem 1 that this is the case.

For a probabilistic study of these fast variants, a precise description of the evolution of the distribution during the execution of the plain Euclid Algorithm is of crucial interest. For real inputs, we know that the continued fraction algorithm does not terminate (except for rationals ...). Moreover, as the continued fraction algorithm is executed, the distribution of reals tends to the distribution relative to the Gauss density ϕ, defined as

(1.1)  ϕ(x) = (1/log 2) · 1/(1 + x).

For rational inputs, one begins with some distribution on the set of the inputs x := A1/A0 of size n, and we consider the rationals xk := Ak+1/Ak. We focus on the first index k where the binary size of xk is less than (1 − δ)n, and we denote the corresponding rational xk by x⟨δ⟩. What is the distribution of the rational x⟨δ⟩? The evolution of this distribution is clearly more intricate than in the real case, since at the end of the Algorithm (when δ = 1) the distribution is the Dirac measure at x = 0. We obtain here a precise description of this distribution (see Theorem 2 and Figure 1), which involves another density

(1.2)  ψ(x) := (12/π²) · Σ_{m≥1} log(m + x) / ((m + x)(m + x + 1)).

Figure 1: Density distribution of x⟨δ⟩ in the case δ = 1/2. This corresponds to the density distribution of the rational xk := Ak+1/Ak obtained as soon as ℓ(Ak) is smaller than (1/2)·ℓ(A0). The empirical density ("Experimental") is plotted against its theoretical counterpart ψ(x) ("Theoretical"). Here we consider Ωn for n = 48 (48 bits); the interval [0, 1] is subdivided into equal subintervals of length 1/50 to estimate the density, and 3 537 944 rationals are drawn from Ωn according to the initial density ϕ by Monte Carlo simulation.

We also need precise results on the distribution of some truncatures of these remainders. This is done in Theorem 3. Then, the choice of parameters in the fast algorithms must take into account this evolution of distribution. This is why we are led to design some variants of the classical algorithms, denoted by HG and G, for which the precise analysis can be performed.

The fast versions also deal with other functions, which are called the Adjust functions. Such functions perform a few steps of the (plain) Euclid Algorithm. However, the bit-complexity of the Adjust functions depends on the size of the quotients which are computed during these steps. Even for estimating the worst-case complexity of the fast variants, the Adjust functions are not precisely analyzed. The usual argument is "The size of a quotient is O(1)".
Of course, this assertion is false, since it is only true on average. Since the Adjust functions are related to some precise steps, the size of the quotients computed at these precise steps may be large. We are then led to study the probability that the Euclid Algorithm only produces "small" quotients. This is done in Theorem 4. And of course, we also need this result for our truncated data. This is the goal of Theorem 5.

Finally, we obtain the exact average-case complexity of our versions of the two main algorithms of interest, the HG algorithm and the G algorithm itself. We prove the following results [Theorems 6 and 7] about the average bit-complexities B, G of both algorithms, on the set of random inputs of size n:

En[B] = Θ(n (log n)^2 log log n),  En[G] = Θ(n (log n)^2 log log n).

Furthermore, we obtain some precise information about the constants which are involved in the Θ-terms. Then, our proven average bit-complexity of the HG, G algorithms is of the same order as the usual heuristic bound on the worst-case complexity of the HG, G algorithms.

1.3 Methods. All our main conclusions obtained here are "expected", and certainly do not surprise the reader. However, the irruption of the density ψ is not expected, and an actual proof of this phenomenon is not straightforward. This is due to the fact that there is a correlation between successive steps of the Euclid Algorithm. Then, the tools which are usual in analysis of algorithms [13], such as generating functions, are not well-adapted to the study of this algorithm. All the analyses which will be described here are instances of the so-called dynamical analysis, where one proceeds in three main steps: First, the (discrete) algorithm is extended into a continuous process, which can be defined in terms of the dynamical system related to the Gauss map. Then, the transfer operator Hs, defined as

Hs[f](x) := Σ_{m≥1} (1/(m + x)^{2s}) · f(1/(m + x)),

explains how the distribution evolves, but only in the continuous world. The executions of the gcd algorithm are now described by particular trajectories (i.e., trajectories of "rational" points), and a transfer "from continuous to discrete" must finally be performed.

The present paper mainly uses two previous works, and can be viewed as an extension of them: first, the average-case analysis of the Lehmer-Euclid algorithm performed in [9]; second, the distributional methods described in [2, 21]. First, we again use the general framework that Daireaux and Vallée have defined for the analysis of the Lehmer-Euclid Algorithm, which explains how the Lehmer-Euclid algorithm can be viewed as a sequence of Interrupted Euclidean algorithms E[δ,δ+γ]. However, in [9], we only used some "easy" properties of the transfer operator Hs. We now need to prove that properties which were already crucial in previous distributional analyses [2, 1, 21] (namely, the US Property for the quasi-inverse (I − Hs)^{-1} of the transfer operator) also hold in our context. The US property can be summarized as follows: For any ξ, there exists a vertical strip S of the form |ℜs − 1| ≤ σ for which the following holds:
(i) the quasi-inverse has a unique pole in S;
(ii) on the left line ℜs = 1 − σ, one has (I − Hs)^{-1}[1] = O(|ℑs|^ξ).
Here, our various theorems lead to the study of parameters of various types, whose generating functions involve various operators Hs,w which depend on two variables s, w. However, for small w's, all these operators can be viewed as a perturbation of the quasi-inverse (I − Hs)^{-1}, and we have to prove that the US Property extends to these perturbed quasi-inverses. In particular, the existence of a strip S where the US property holds uniformly with respect to w is crucial in the analysis: the main choices of the parameters in our versions HG and G of the fast algorithms depend on the width σ of this strip (see Theorem 3).

Plan and notations. Section 2 describes the main algorithms HG and G. Section 3 is devoted to explaining the main steps to be done for obtaining a proven analysis. We then state our main results of general interest. In Section 4, we describe the versions HG and G to be analyzed, and we prove the two main results about their average bit-complexity. The last Section is devoted to the proof of the main results of Section 3. We denote the logarithm in base 2 by lg x, and ℓ(x) denotes the binary size of the integer x, namely ℓ(x) := ⌊lg x⌋ + 1.
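As a quick numerical sanity check (a sketch added here for illustration, not part of the paper's argument), both densities above can be verified in a few lines of Python: at s = 1, the Gauss density ϕ of (1.1) is a fixed point of the transfer operator Hs, since (m + x)^{-2}·ϕ(1/(m + x)) = (1/log 2)·(1/(m + x) − 1/(m + x + 1)) telescopes to ϕ(x); and ψ of (1.2) has total mass 1 on [0, 1], since the substitution u = m + x turns its integral into (12/π²)·∫₁^∞ log u/(u(u + 1)) du = (12/π²)·(π²/12).

```python
import math

def phi(x):
    # Gauss density (1.1): phi(x) = 1 / (log 2 * (1 + x)).
    return 1.0 / (math.log(2.0) * (1.0 + x))

def H(f, x, s=1.0, M=100000):
    # Truncated transfer operator: H_s[f](x) = sum_{m>=1} (m+x)^(-2s) f(1/(m+x)).
    return sum(f(1.0 / (m + x)) / (m + x) ** (2 * s) for m in range(1, M + 1))

# At s = 1, phi is a fixed point of H_s (the sum telescopes).
for x in (0.0, 0.3, 0.7, 1.0):
    assert abs(H(phi, x) - phi(x)) < 1e-4

def psi(x, M=4000):
    # Density (1.2): truncated sum plus an integral estimate of the tail.
    s = sum(math.log(m + x) / ((m + x) * (m + x + 1)) for m in range(1, M + 1))
    u = M + 0.5 + x
    s += (math.log(u) + 1.0) / u        # ~ integral of log(t)/t^2 from u to infinity
    return (12.0 / math.pi ** 2) * s

# psi is a probability density on [0, 1] (trapezoidal rule).
N = 400
vals = [psi(i / N) for i in range(N + 1)]
mass = sum((vals[i] + vals[i + 1]) / (2 * N) for i in range(N))
assert abs(mass - 1.0) < 1e-3
```

Truncation errors in both checks are O(1/M), so the tolerances above leave comfortable margins.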
2 Fast and Interrupted Euclidean algorithms

We present in this section the main algorithms studied in this paper. We first describe the general structure of the Knuth-Schönhage algorithm. We explain how the HG algorithm can be seen as a sequence of interrupted Euclidean algorithms, where the sequence of divisions is stopped as soon as the integers have lost a fraction of their number of bits.

The Lehmer Algorithm [20, 18] replaces large divisions by large multiplications and small divisions. The fast algorithm applies recursively the principles of Lehmer, and, using fast FFT multiplications of complexity Θ(µ(n)) (with µ(n) = n log n log log n), replaces the costly computation of the remainder sequence (Ai) (which requires O(n^2) bit operations) by a sequence of matrix products: it divides the total Euclidean Algorithm into interrupted Euclidean algorithms of the form E(i,j), and computes matrices of the form ℳ(i,j), defined in (2.5). The recursion, based on Divide and Conquer techniques, is stopped when the integers are small enough, and, at this moment, the algorithm uses small divisions. One finally obtains a subquadratic gcd algorithm.

2.1 Euclid's algorithm. Let (A1, A0) be a pair of positive integers with A1 ≤ A0. On input (A1, A0), the Euclid algorithm computes the remainder sequence (Ak) with a succession of divisions of the form

(2.3)  Ak = Qk+1 · Ak+1 + Ak+2,  with Qk+1 = ⌊Ak / Ak+1⌋,

and stops when Ap+1 = 0. The integer Qk is the k-th quotient, and the successive divisions are written as 𝒜k = 𝒬k+1 · 𝒜k+1, with

𝒜k := (Ak+1, Ak)^t  and  𝒬k := [[0, 1], [1, Qk]],

so that

(2.4)  𝒜0 = ℳ(i) · 𝒜i  with  ℳ(i) := 𝒬1 · 𝒬2 ··· 𝒬i.

In the following, we consider a "slice" of the plain Euclidean Algorithm E, between index i and index j, namely the interrupted algorithm E(i,j) which begins with the pair 𝒜i as its input and computes the sequence of divisions (2.3) with i ≤ k ≤ j − 1. Its output is the pair 𝒜j together with the matrix

(2.5)  ℳ(i,j) = Π_{k=i+1}^{j} 𝒬k,  ℳ(1,i) = ℳ(i),

with matrix ℳ(i) defined in (2.4). We define the size of a matrix ℳ as the maximum of the binary sizes of its coefficients. The size ℓ(i,j) of the matrix ℳ(i,j) satisfies

(2.6)  ℓ(i,j) ≤ 2(j − i) + Σ_{k=i+1}^{j} ℓ(Qk).

The (naive) bit-complexity C(i,j) of the algorithm E(i,j), C(i,j) := Σ_{k=i+1}^{j} ℓ(Ak) · ℓ(Qk), satisfies

(2.7)  C(i,j) ≤ ℓ(Ai+1) · Σ_{k=i+1}^{j} ℓ(Qk).

2.2 How to replace large divisions by small divisions? Lehmer remarked that, when two pairs (A, B) and (a, b) are sufficiently close (i.e., the rationals A/B and a/b are close enough), the Euclid algorithm on (A, B) or (a, b) produces (at least at the beginning) the same quotient sequence (Qi). This is why the following definition is introduced:

Definition. Consider a pair (A, B) with A ≤ B and an integer b of length ℓ(b) ≤ ℓ(B). We denote by π[b](A, B) any pair (a, b) which satisfies |A/B − a/b| ≤ 1/b.

And the criterion (due to Lehmer and made precise by Jebelean) is:

Lemma 1. [Lehmer, Jebelean] For a pair (A, B) with A ≤ B and n := ℓ(B), consider, for m ≤ n, the small pair (a, b) = π[b](A, B) of length ℓ(b) = m, and the sequence of the remainders (ai) of the Euclid Algorithm on the small input (a, b). Denote by k the first integer for which ak satisfies ℓ(ak) ≤ ⌈m/2⌉. Then the sequence of the quotients qi of the Euclid Algorithm on the small input (a, b) coincides with the sequence of the quotients Qi of the Euclid Algorithm on the large input (A, B) for i ≤ k − 3.

Usually, this criterion is used with a particular pair π[b](A, B) where the integer b is obtained by the m-truncation of B, i.e., the suppression of its (n − m) least significant bits. Then a is easy to compute, since it may be chosen itself as the m-truncation of A. In this case, the π[b] function corresponds to truncation of both A and B and is denoted by Tm(A, B). However, the Jebelean criterion holds for any choice of (a, b) = π[b](A, B), even if the integer a is less easy to compute in the general case: the integer a can be chosen as the integer part of the rational (A·b)/B, and its computation needs a product and a division.

2.3 Interrupted Algorithms. In Jebelean's property (Lemma 1), the Euclid Algorithm on the small pair (a, b) of binary size m is stopped as soon as the remainder ak has lost ⌈m/2⌉ bits. This is a particular case of the so-called Interrupted Euclidean Algorithm of parameter δ (with 0 < δ < 1), which stops as soon as the current remainder has lost δm bits (with respect to the input which has m bits). This (general) interrupted Algorithm, denoted by Eδ and described in Figure 2, is defined as follows: on the input (A, B) of Ωn, this algorithm begins at the beginning of the Euclid Algorithm, and stops as soon as the remainder Ai has lost ⌊δn⌋ bits (with respect to the input B). Then, with the notations defined in Section 2.1, one has Eδ = E(1,Pδ), with

(2.8)  Pδ := min{k; ℓ(Ak) ≤ ⌊(1 − δ)n⌋}.

Algorithm Eδ(A, B)
    n := ℓ(B); i := 1
    A1 := A; A0 := B; ℳ0 := I
    While ℓ(Ai) > (1 − δ)·n do
        Qi := ⌊Ai−1 / Ai⌋
        Ai+1 := Ai−1 − Qi·Ai
        ℳi := ℳi−1 · 𝒬i
        i := i + 1
    Return (Ai−1, Ai, ℳi−1)

Algorithm Êδ(A, B)
    (same computations as Eδ, with the last two steps suppressed)
    Return (Ai−3, Ai−2, ℳi−3)

Figure 2 also describes the Êδ Algorithm, which is just a slight modification of the Eδ Algorithm, where the last two steps are suppressed (in view of applications of Lemma 1), and P̂δ denotes the variable Pδ − 2. Then, Pδ and P̂δ are just the numbers of iterations of the Eδ, Êδ algorithms, and P1 = P is just the number of iterations of the Euclid Algorithm.
Figure 2: The Eδ Algorithm, and the Êδ algorithm, which is a slight modification of the Eδ Algorithm.

In the following, it will be convenient to consider more general interrupted algorithms, of the form E[δ,δ+γ]. The E[δ,δ+γ] Algorithm is defined as follows: on the input (A, B) of Ωn, this algorithm begins at the Pδ-th iteration of the Euclid Algorithm, as soon as the remainder Ak has lost ⌊δn⌋ bits (with respect to the input B), and stops when the remainder Ai has lost ⌊γn⌋ supplementary bits (with respect to the input B). Then E[0,δ] = Eδ = E(0,Pδ) and E[δ,γ+δ] = E(Pδ, Pδ+γ), where Pδ is defined in (2.8). Of course, we can also design the variants with a hat, where the last two steps are suppressed.

2.4 Description of the HG Algorithm. The general principles. It is the Ê1/2 algorithm which is used in Jebelean's Lemma. This lemma is a main tool to compute (in a recursive way) a function HG [for Half-gcd]. On an input (A, B) of binary size n, this function returns exactly the same result as Ê1/2, but runs faster. With the algorithm HG, it is possible to design a fast algorithm, denoted G, which computes the gcd itself. Let us explain the main principles of the HG algorithm. Suppose that the Euclid Algorithm, on an input (A, B) of length n, has already performed Pδ iterations. Now, the current pair, denoted by (A′, B′), has a binary size close to (1 − δ)n. We may use the Jebelean Property to continue. Then, we choose a length m for truncating, an integer b of length m, and consider the small pair (a, b) = π[b](A′, B′). The HG algorithm on this pair (a, b) will produce a matrix ℳ which is a matrix which would have been produced by the Euclid algorithm on the pair (A′, B′). Then, the pair (C, D), computed as (C, D)^t = ℳ^{-1}·(A′, B′)^t, is a remainder pair of the Euclid algorithm on the input (A, B). The size of the matrix ℳ is approximately m/2, but smaller than m/2 (due to the two backward steps of Lemma 1), and thus of the form (m/2) − r(a, b), where r(a, b) is the number of bits which are "lost" by the matrix ℳ during the backward steps. Then, with (2.6),

r(a, b) ≤ 2·max{ℓ(qi) + 1; 1 ≤ i ≤ p(a, b)},

where the qi are the quotients that occur in Ê(a, b), and p(a, b) is the number of steps of Ê(a, b). If the truncature length m is chosen as a linear function of the input size n, of the form m = 2γn, then the size of the pair (C, D) is approximately equal to [1 − δ − γ]n, but slightly larger. If we wish to obtain a remainder pair (C′, D′) of length [1 − δ − γ]n, we have to perform, from the pair (C, D), a certain number of steps of the Euclid Algorithm, in order to cancel the loss due to the backward steps. This is the goal of the Adjust function, whose cost will be ≈ (1 − δ − γ)n · r(a, b) [see (2.7)].

Finally, we have designed an algorithm which produces the same result as the interrupted algorithm E[δ,δ+γ], and can be described as follows:
(i) it truncates the input (A′, B′), with a truncation length m = 2γn, and obtains a small pair (a, b) of length m;
(ii) it performs the HG algorithm on the pair (a, b): this produces a pair (c, d) and a matrix ℳ;
(iii) it performs the product (C, D)^t := ℳ^{-1}·(A′, B′)^t;
(iv) it uses the Adjust function, which performs some steps of the Euclid Algorithm from the pair (C, D) and stops as soon as the current remainder pair (C′, D′) has a size equal to ⌊(1 − δ − γ)n⌋.

2.5 The usual designs for the HG and G algorithms. How to use this idea for computing (in a recursive way) the HG Algorithm? The usual choice for γ is γ = 1/4, more precisely m = ⌈n/2⌉. Then, the previous description provides a method to obtain E[0,1/4] (with a first choice δ = 0), then Ê[1/4,1/2] (with a second choice δ = 1/4). Since Ê[0,1/2] = E[0,1/4] · Ê[1/4,1/2], we are done. Remark that using the "hat" algorithm in the second step leads to modifying the Adjust function for this step. We use (for this second step) a so-called "hat Adjust" function, which may also perform some backward steps in the Euclid Algorithm on the large inputs. The general structure of the HG algorithm is described in Figure 3:

Algorithm HG(A, B)
    n := ℓ(B)
    S := log² n
    Return HG(A, B, S)

Algorithm HG(A, B, S)
    1   n := ℓ(B)
    2   If n ≤ S then return Ê1/2(A, B)
    3   ℳ := I
    4   m := ⌊n/2⌋
    5   For i := 1 to 2 do
    6       (ai, bi) := Tm(A, B)
    7       (ci, di, ℳi) := HG(ai, bi, S)
    8       (Ci, Di)^t := ℳi^{-1} · (A, B)^t
    9       Adjusti(Ci, Di, ℳi)
    10      (A, B) := (Ci, Di)
    11      ℳ := ℳ · ℳi
    12  Return (A, B, ℳ)

Algorithm G(A, B)
    1   n := ℓ(B)
    2   T := √n · log n
    3   While ℓ(A) ≥ T do
    4       (A, B, ℳ1) := HG(A, B)
    5   Return gcd(A, B)

Figure 3: General structure of the classical algorithms HG and G.

The recursion is stopped when the naive algorithm Ê1/2 becomes competitive. This defines a threshold for the binary size, denoted by S (remark that S = S(n) is a function of the input size n). With this HG algorithm, we can obtain an algorithm named G which computes the gcd. The idea for designing such an algorithm is to decompose the total Euclid Algorithm into interrupted algorithms, as

E[0,1] = E[0,1/2] · E[1/2,3/4] · ... · E[1−(1/2)^k, 1−(1/2)^{k+1}] · ...

Then, the HG algorithm, when running on inputs of size n/2^k produced by the E[0,1−(1/2)^k] algorithm, can easily simulate the E[1−(1/2)^k, 1−(1/2)^{k+1}] algorithm. This decomposition also stops when the naive gcd algorithm becomes competitive. When a fast multiplication is used, the recursion depth H is chosen so that the cost due to the leaves (where the naive Ê1/2 is performed) is of asymptotically smaller order than the "internal" cost Θ(µ(n) log n); then H satisfies the relation 2^H · (n/2^H)² ≈ µ(n) · log n, so that S(n) = n/2^H ≈ log² n and H ≈ lg n − 2 lg lg n. This is the "classical" version of the Knuth-Schönhage algorithm.
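The control flow of Figure 3 can be mimicked by a toy model in which the recursive HG is replaced by a naive "first half" routine made of plain divisions (no truncation, no Adjust): the driver loops on half-gcd steps down to a threshold T ≈ √n·log n, then finishes with a naive gcd, as in the G listing. This is only a sketch of the structure; it has the same interface and output as the fast version, but of course not its subquadratic complexity.

```python
import math

def half_gcd(A, B):
    # Toy stand-in for the HG routine: perform plain divisions until the
    # remainder has lost half of the bits of B (the contract of E_{1/2}).
    # The real HG of Figure 3 obtains the same pair recursively, from
    # truncated inputs, matrix products and Adjust steps. Assumes A <= B.
    n = B.bit_length()
    prev, cur = B, A
    while cur != 0 and cur.bit_length() > n // 2:
        prev, cur = cur, prev % cur
    return cur, prev          # a remainder pair (A', B') with A' <= B'

def G(A, B):
    # Driver with the structure of the G listing in Figure 3: loop on
    # half-gcd steps while the numbers are large, finish naively.
    n = B.bit_length()
    T = max(2, math.isqrt(n) * int(math.log2(n)))   # threshold ~ sqrt(n) * log n
    while A != 0 and A.bit_length() >= T:
        nA, nB = half_gcd(A, B)
        if (nA, nB) == (A, B):     # no progress at this size: stop early
            break
        A, B = nA, nB
    return math.gcd(A, B)

assert G(2**89 - 1, 2**107 - 1) == 1
assert G(12 * 10**40, 18 * 10**40) == 6 * 10**40
```

Each pass through the loop halves the bit size of the pair, which is exactly the decomposition E[0,1] = E[0,1/2] · E[1/2,3/4] · ... described above.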
Clearly, the cost of this HG algorithm comes from three types of operations:
(i) the two recursive calls of line 7;
(ii) the products done at lines 8 and 11: with a clever implementation, it is possible to use in line 8 the pair (c, d) just computed in line 7. If all the matrices and integer pairs have, on average, the expected size, the total expected cost due to the products is [12 + 8 + 8] · µ(n/4) = 28 · µ(n/4);
(iii) the two functions Adjust performed at line 9, whose total average cost is R(n).

We consider as the set of possible inputs of the HG algorithm the set Ωn := {(u, v); 0 ≤ u ≤ v, ℓ(v) = n}, endowed with some probability Pn. We denote by B(n) the average number of bit operations performed by the HG algorithm on Ωn. Since each of the two recursive calls is made on data with size n/2, it can be "expected" that B(n) asymptotically satisfies

(2.9)  B(n) ≈ 2B(n/2) + 28µ(n/4) + R(n)  for n > S.

Moreover, the average cost R(n) can be "expected" to be negligible with respect to the multiplication cost µ(n). If the FFT multiplication is used, with µ(n) = Θ(n log n log log n), the total average bit-cost is "expected" to be

B(n) ≈ Θ(µ(n) log n) = Θ(n (log n)^2 log log n).

With this (heuristic) analysis of the HG algorithm, it is easy to obtain the (heuristic) average bit-complexity of the G algorithm, which makes a recursive use of the HG algorithm and stops as soon as the naive algorithm becomes competitive. This defines a threshold for the length, denoted by T (remark that T = T(n) is also a function of the input size n). The recursion then stops at a depth M for which (n/2^M)² ≈ µ(n) · log n, so that T(n) = n/2^M ≈ √n · log n and M ≈ (1/2) lg n − lg lg n. The average bit-cost G(n) of the G algorithm on data of size n satisfies

G(n) ≈ Σ_{i=0}^{M−1} B(n/2^i),  so that G(n) = Θ(B(n)).

3 The main steps towards a proven analysis.

The analysis is based on the Divide and Conquer equation (2.9), which is not a "true" equality. It is not clear why a "true" equality should hold, since each of the two recursive calls is done on data which do not possess the same distribution as the input data. And, of course, the same problem arises at each depth of the recursion. We have to make precise the evolution of the distribution during the Euclid Algorithm, but also the distribution of the truncated data. Moreover, we have also to make precise the average bit-complexity R(n) of the Adjust function. Even for estimating the worst-case complexity of the fast variants, the Adjust functions are not precisely analyzed. The usual argument is "The size of a quotient is O(1)". Of course, this assertion is false, since it is only true on average. Since the Adjust functions are related to some precise steps, the size of the quotients computed at these precise steps may be large. We are then led to study the probability that the Euclid Algorithm only produces "small" quotients.

3.1 Evolution of densities. Consider a density f on the unit interval I = [0, 1]. Any set Ωn is seen as a set of rationals

Ωn := {x ∈ Q ∩ ]0, 1]; x = u/v, ℓ(v) = n}

and is endowed with the restriction of f to Ωn: for any x0 ∈ Ωn,

Pn,f(x0) := f(x0) / Σ_{x∈Ωn} f(x).

The evolution of the density during the execution of the Euclid Algorithm is of crucial interest. We begin with all the sets Ωn endowed with the probability Pn,f. For x ∈ Ω, we recall that Pδ(x) is the smallest integer k where the binary size of xk is less than (1 − δ)ℓ(x). We are interested in describing the density of the rational x⟨δ⟩ defined as

x⟨δ⟩ := xk when Pδ(x) = k.

This rational is the input for all interrupted algorithms with a beginning parameter δ. It is then essential to study the random variable Pδ. Since the rational x loses ℓ(x) bits during P(x) iterations, it can be expected that it loses δℓ(x) bits during δP(x) iterations, which would imply that Pδ(x) is sufficiently close to δP(x). This is what we call the regularity of the algorithm. With techniques close to the renewal methods, we prove a quasi-powers expression for the moment generating function of Pδ, from which we deduce two results. The first one is interesting per se, and provides an extension of the result of Baladi-Vallée [2], which exhibits an asymptotic gaussian law for P := P1.

Theorem 1. Consider the set Ωn endowed with a probability Pn,f relative to a strictly positive function f of class C¹. Then, for any δ ∈ ]0, 1], the random variable Pδ is asymptotically gaussian on Ωn [with a speed of convergence of order O(1/√n)]. Moreover,

En,f[Pδ] ∼ (2 log 2 / |Λ′(1)|) · δn,
Vn,f[Pδ] ∼ (2 log 2)² · (Λ′′(1) / |Λ′(1)|³) · δn.
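The regularity asserted by Theorem 1 is easy to observe empirically. Assuming the standard identification of |Λ′(1)| with the entropy π²/(6 log 2) of the underlying dynamical system (an identification not spelled out in this excerpt), the constant 2 log 2/|Λ′(1)| equals 12 (log 2)²/π² ≈ 0.5843, so En[P] ≈ 0.5843·n and En[Pδ] ≈ δ·En[P]. A rough Monte Carlo check (our sketch, not the paper's proof technique):

```python
import math, random

random.seed(42)

def euclid_counts(u, v, delta):
    # returns (P, P_delta): the total number of iterations, and the number
    # of iterations needed for the remainder to lose delta * n bits, cf. (2.8).
    n = v.bit_length()
    bound = math.floor((1 - delta) * n)
    a_prev, a_cur, steps, p_delta = v, u, 0, None
    while a_cur != 0:
        a_prev, a_cur = a_cur, a_prev % a_cur
        steps += 1
        if p_delta is None and a_cur.bit_length() <= bound:
            p_delta = steps
    return steps, p_delta

n, delta, N = 256, 0.5, 300
tot_P = tot_Pd = 0
for _ in range(N):
    v = random.getrandbits(n) | (1 << (n - 1))   # ell(v) = n
    u = random.randrange(1, v)
    P, Pd = euclid_counts(u, v, delta)
    tot_P += P
    tot_Pd += Pd
mean_P, mean_Pd = tot_P / N, tot_Pd / N

# E[P] ~ 0.5843 * n, and E[P_delta] ~ delta * E[P] (the regularity).
assert abs(mean_P / n - 0.5843) < 0.05
assert abs(mean_Pd / (delta * mean_P) - 1.0) < 0.1
```

With n = 256 bits the gaussian fluctuations of order O(√n) are already small relative to the mean, so the two ratios sit close to their limits.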
Theorem 2. There exist n0 and a constant σ > 0 such that, for any n ≥ n0, for any real δ with 0 < δ < 1, for any function ε : [0, 1] → [0, 1] such that ε(x) ≤ x, for any interval J ⊂ I, and for any strictly positive density f of class C¹, the probability that the rational x⟨δ⟩ computed by the Euclid Algorithm belongs to the interval J satisfies

Pn,f[x⟨δ⟩ ∈ J] = ∫_J ψ(t) dt · [1 + β′n(δ, J, ε)].

Here, the density ψ is described in (1.2), the real σ is the width of the US strip, and

β′n(δ, J, ε) = O(ε(|J|)/|J|) + O((1/ε(|J|)) · 2^{−σ(1−δ)n}) + O((1/|J|) · 2^{−(σ/2)δn}).

The constants in the O-terms only depend on the function f.

3.2 Probabilistic truncations. Finally, we are also interested in the distribution of the truncated rationals. We recall that the truncated rationals classically used are obtained with truncations of the numerator A and the denominator B of the rational A/B. It is not clear how to reach the distribution of such truncated rationals. This is why we define a probabilistic truncation, which leads to more regular distributions, and also allows us to apply Jebelean's Property (Lemma 1). For x = A/B ∈ Ωn, and m ≤ n, we define πm(x) as follows:

(1) Choose a denominator b in the set of integers of binary size m, with a probability proportional to b. More precisely, we choose a denominator b according to the law

Pm[b = b0] = (1/θm) · b0,  with θm = Σ_{b; ℓ(b)=m} b.

(2) Compute the integer a which is the integer part of x · b. This computation involves the product A · b, then the division of the integer A · b by B.

(3) Define πm(A/B) as the rational a/b, and remark that the set πm^{-1}(a/b) is the interval

J(a/b) := [a/b, a/b + 1/b],  with |J(a/b)| = 1/b = Θ(2^{−m}).

This is sufficient for applying Jebelean's criterion (Lemma 1). However, we will see that using this probabilistic truncation does not change the order of the average complexity of the HG algorithm.

We start with a strictly positive density f of class C¹ on [0, 1]. The function gm = gm(f), defined on Ωm as

gm(f)(y0) = (1/|J(y0)|) ∫_{J(y0)} f(t) dt,

satisfies Pn,f[x; πm(x) = y] = Pm,gm(f)[y]. Furthermore, the function gm(f) is a smoothed version of the initial function f, and

gm(f)(x) = f(x) + O(|J| · ‖f‖1),  so that Pm,gm(f)/Pm,f = 1 + O(2^{−m}).

Since f is a density on [0, 1], the cumulative sum of gm on Ωm satisfies Σ_{x∈Ωm} gm(f)(x) = θm. This allows a comparison between two probabilities:

Lemma 2. Consider a strictly positive density f of class C¹ on I. For any n, for any m ≤ n, for any y ∈ Ωm, one has

Pn,f[x; πm(x) = y] = Pm,f[y] · [1 + O(2^{−m})],

where the constant in the O-term only depends on f.

3.3 Truncations and evolution of densities. With Theorem 2, the previous comparison of densities done in Lemma 2, and the optimal choice ε(x) = x², we obtain the following result, which will be a central tool in our analysis.

Theorem 3. Consider the constant σ and the density ψ of Theorem 2. There exists n0 such that, for any n ≥ n0, for any real δ with 0 < δ < 1, and for any pair of strictly positive constants σ1, σ2 with σ1 < (2/3)σ, for a truncation m which satisfies m ≥ σ2 · δn,

Pn,ψ[x; πm(x⟨δ⟩) = y0] = Pm,ψ[y0] · [1 + O(βn)],

with βn = 2^{−σ3(1−δ)n} + 2^{−σ4δn}, σ3 = σ − (3/2)σ1, and σ4 := inf(σ/2, σ2).

Remark that Lemma 2 is designed to deal with the case δ = 0.
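The probabilistic truncation πm of Section 3.2 is easy to implement (a sketch; the rejection-sampling step is our choice for drawing b with probability proportional to b). Any output (a, b) automatically satisfies the defining inequality |A/B − a/b| ≤ 1/b of π[b], since a = ⌊A·b/B⌋ gives 0 ≤ A/B − a/b < 1/b.

```python
import random
from fractions import Fraction

random.seed(1)

def pi_m(A, B, m):
    # Probabilistic truncation of Section 3.2:
    # (1) draw an m-bit denominator b with probability proportional to b,
    # (2) set a = floor(A*b / B), (3) return the pair (a, b).
    lo, hi = 1 << (m - 1), (1 << m) - 1
    while True:                       # rejection sampling: P[b] proportional to b
        b = random.randint(lo, hi)
        if random.random() < b / hi:
            break
    a = (A * b) // B
    return a, b

A, B, m = 123456789123456789, 987654321987654321, 20
for _ in range(50):
    a, b = pi_m(A, B, m)
    assert b.bit_length() == m
    # defining property of pi_[b]: |A/B - a/b| <= 1/b
    assert abs(Fraction(A, B) - Fraction(a, b)) <= Fraction(1, b)
```

Because the acceptance probability b/hi is at least 1/2, the rejection loop terminates after two draws on average.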