Progress on LLL and Reduction

Claus Peter Schnorr

Abstract We surview variants and extensions of the LLL-algorithm of Lenstra, Lenstra Lovasz´ , extensions to quadratic indefinite forms and to faster and stronger reduction algorithms. The LLL-algorithm with Householder orthogonalisation in floating-point arithmetic is very efficient and highly accurate. We surview approx- imations of the shortest lattice vector by feasible lattice reduction, in particular by block reduction, primal-dual reduction and random sampling reduction. Segment re- duction performs LLL-reduction in high dimension mostly working with a few local coordinates.

Key words: LLL-reduction, Householder orthogonalisation, floating-point arith- metic, block reduction, segment reduction, primal-dual reduction, sampling reduction, reduction of indefinite quadratic forms.

1 Introduction

A lattice of dimension n consists of n linearly independent real vectors b1, ..., bn m ∈ R . The basis B = [b1, ..., bn] generates the lattice L consisting of all integer linear combinations of the basis vectors. Lattice reduction transforms a given basis of L into a basis with short and nearly orthogonal vectors. Unifying two traditions. The LLL-algorithm of Lenstra, Lenstra Lovasz´ [47] provides in polynomial time a reduced basis of proven quality. Its inventors focused on Gram-Schmidt orthogonalisation (GSO) in exact integer arithmetic which is only affordable for small dimension n. With floating-point arithmetic (fpa) available it is faster to orthogonalise in fpa. In numerical mathematics the orthogonalisation of

Claus Peter Schnorr Fachbereich Informatik und Mathematik, Universit¨at Frankfurt, PSF 111932, D-60054 Frankfurt am Main, Germany. [email protected] – http: www.mi.informatik.uni-frankfurt.de

1 a lattice basis B is usually done by QR-factorization B = QR using Householder reflections [76]. The LLL-community has neglected this approach as it requires square roots and does not allow exact integer arithmetic. We are going to unify these two separate traditions. Practical experience with the GSO in fpa led to the LLL-algorithm of [66] imple- mented by Euchner that tackles the most obvious problems in limiting fpa-errors by heuristic methods. It is easy to see that the heuristics works and short lattice vectors are found. The LLL of [66] with some additional accuracy measures performs well in double fpa up to dimension 250. But how to proceed in higher dimensions ? [65] gives a proven LLL in approximate rational arithmetic. It uses an iterative method by Schulz to recover accuracy and is not adapted to fpa. Its time bound in bit operations is a polynomial of degree 7, resp., 6+ε using school-, resp. FFT-multiplication while the degree is 9, resp., 7 + ε for the original LLL. In 1996 Rossner¨ implemented in his thesis a continued fraction algorithm applying [31] and Householder orthogonalization (HO) instead of GSO. This algorithm turned out to be very stable in high dimension [63]. Our experience has well confirmed the higher accuracy obtained with HO, e.g., by attacks of May and Koy in 1999 on NTRU- and GGH-cryptosystems. The accuracy and stability of the QR-factorization of a basis B with HO has been well analysed for backwards accuracy [26], [34]. These results do not directly provide bounds for worst case forward errors. As fully proven fpa-error bounds are in practice far to pessimistic we will use strong heuristics. Outline. Section 2 presents the LLL in ideal arithmetic and discusses various extensions of the LLL by deep insertions that are more powerful than LLL swaps. Section 3 extends the LLL to arbitrary quadratic forms, indefinite forms included.

Section 4 presents LLLH , an LLL with HO together with an analysis of forward errors. Our heuristics assumes that small error vectors have a negligible impact on the vector length as correct and error vectors are in high dimension very unlikely near parallel. It is important to weaken size-reduction under fpa such that the reduction becomes independent of fpa-errors. This also prevents infinite cycling of the algorithm. Fortunately, the weakened size-reduction has a negligible impact on the quality of the 2 reduced basis. LLLH of [70] and the L of [56] are adapted to fpa, they improve the time bounds of the theoretical LLL of [65] and are well analysed. L2 provides provable correctness, it is quadratic in the bit length of the basis. However it looses about half of the accuracy compared to LLLH and it takes more time. Sections 5–7 survey practical improvements of the LLL that strengthen LLL- reduction to find shorter lattice vectors and better approximations of the shortest lattice vector. We revisit block reduction from [64] extend it to Koy’s primal-dual reduction and combine it with random sampling reduction of [69].

Sections 8, 9 survey recent results of [70], they speed up LLLH for large dimen- sion n to LLL-type segment reduction that goes back to an idea of Schonhage¨ [71]. This reduces the time bound of LLLH to a polynomial of degree 6, resp., 5+ε and preserves the quality of the reduced bases. Iterating this method by iterated subseg-

2 ments performs LLL-reduction in O(n3+ε) arithmetic steps using large integers and fpa-numbers. It is still open how to turn this into a practical algorithm. For general background on lattices and lattice reduction see [13], [16], [52], [54], [59]. Here is a small selection of applications in number theory [8], [47], [14], [72] computational theory [11], [27], [31], [45], [49], [53] cryptography [1], [10],[9], [17], [54], [58], [61] and complexity theory [2], [12], [20], [25],[36], [54]. Standard program packages are LIDIA, Magma and NTL.

Notation, GSO and GNF. Let Rm denote the real vector space of dimension m t m 1 with inner product hx, yi = x y. A vector b ∈ R has length kbk = hb, bi 2 .A m sequence of linearly independent vectors b1, ..., bn ∈ R is a basis, written as ma- m×n trix B = [b1, ..., bn] ∈ R with columns bi. The basis B generates the lattice n Pn m L = L(B) = {Bx | x ∈ Z } = i=1 biZ ⊂ R . It has dimension dim L = n. Let qi denote the orthogonal projection of bi in ⊥ m span(b1, ..., bi−1) , q1 = b1. The orthogonal vectors q1, ..., qn ∈ R and the Gram- Schmidt coefficients µj,i, 1 ≤ i, j ≤ n of the basis b1, ..., bn satisfy for j = 1, ..., n: Pj bj = i=1 µj,iqi, µj,j = 1, µj,i = 0 for i > j, µj,i = hbj, qii/hqi, qii, hqj, qii = 0 for j 6= i. The basis B ∈ Rm×n has a unique QR-decomposition B = QR, where Q ∈ Rm×n is isometric (i.e., Q preserves the inner product, hx, yi = hQx,Qyi, Q can be ex- 0 m×m n×n tended to an orthogonal matrix Q ∈ R ) and R = [ri,j] ∈ R is upper- triangular (ri,j = 0 for i > j) with positive diagonal entries r1,1, ..., rn,n > 0. Hence 2 Pi 2 Q = [ q1/kq1k, ...., qn/kqnk ], µj,i = ri,j/ri,i, kqik = ri,i and kbik = j=1 rj,i. Two bases B = QR, B0 = Q0R0 are isometric iff R = R0, or equivalently iff BtB = B0tB0. We call R the geometric normal form (GNF) of the basis, GNF(B) := R. The GNF is preserved under isometric transforms Q, i.e., GNF(QB) = GNF(B).

The successive minima. The j-th successive minimum λj(L) of a lattice L, 1 ≤ j ≤ dim L, is the minimal real number ρ for which there exist j linearly in- dependent lattice vectors of length ≤ ρ ; λ1 is the length of the shortest nonzero lattice vector.

n×n Further notation. GLn(Z) = {T ∈ Z | det T = ±1}, R−t = (R−1)t = (Rt)−1 is the inverse transpose of R ∈ Rn×n, t t 1/2 1/2 di = det([b1, ..., bi] [b1, ..., bi]), d0 = 1, det L(B) = det(B B) = dn , m ⊥ πi : R → span(b1, ..., bi−1) is the orthogonal projection, qi = πi(bi), 2 2/n γn = maxL λ1(L)/ det L over all lattices L of dim L = n is the Hermite constant,   1 k×k Uk = · ∈ Z ; B := BUn / B := UmB reverses the order of columns/rows of 1 m×n B ∈ R , we write matrices A = [ai,j] with capital letters A and small letter entries n×n ai,j, In ∈ Z denotes the unit matrix, let ε ∈ R, 0 ≤ ε ≈ 0.

3 Details of floating point arithmetic. We use the fpa model of Wilkinson [76]. 0 0 e Pt i A fpa number with t = 2t + 1 precision bits is of the form ±2 i=−t0 bi2 , where s bi ∈ {0, 1} and e ∈ Z. It has bit length t + s + 2 for |e| < 2 , two signs included. We denote the set of these numbers by FLt. Standard double length fpa has t = 53 2s 2s precision bits, t + s + 2 = 64. Let fl : R ⊃ [−2 , 2 ] 3 r 7→ FLt approximate real numbers by fpa numbers. A step c := a ◦ b for a, b, c ∈ R and a binary operation ◦ ∈ {+, −, ·,/} translates under fpa intoa ¯ := fl(a), ¯b := fl(b),c ¯ := fl(¯a ◦ ¯b), resp. √ intoa ¯ := fl(◦(¯a)) for unary operations ◦ ∈ {d c, }. Each fpa operation induces a normalized relative error bounded in magnitude by 2−t: |fl(¯a◦¯b)−a¯ ◦¯b|/|a¯ ◦¯b| ≤ 2−t. s s If |a¯ ◦ ¯b| > 22 or |a¯ ◦ ¯b| < 2−2 then fl(¯a ◦ ¯b) is undefined due to an overflow, resp. underflow. s 2 Usually one requires that 2 ≤ t and thus s ≤ 2 log2 t, for brevity we identify the bit length of fpa-numbers with t, neglecting the minor (s+2)-part. We use approximate ¯ m m vectors hl, ¯rl ∈ FLt for HO under fpa and exact basis vectors bl ∈ Z .

2 LLL-Reduction and Deep Insertions

n×n We describe reduction of the basis B = QR in terms of the GNF R = [ri,j] ∈ R . Standard reductions. A lattice basis B = QR ∈ Rm×n is size-reduced (for ε) if 1 |ri,j|/ri,i ≤ 2 + ε for all j > i.(occasionally we neglect ε) 2 1 B = QR is LLL-reduced (or an LLL-basis) for δ ∈ (η , 1], η = 2 + ε, if B is size- reduced and 2 2 2 δri,i ≤ ri,i+1 + ri+1,i+1 for i = 1, ..., n − 1. A basis B = QR is HKZ-reduced (or an HKZ-basis) if B is size-reduced, and each n×n diagonal coefficient ri,i of the GNF R = [ri,j] ∈ R is minimal under all transforms in GLn(Z) that preserve b1, ..., bi−1.

2 2 2 LLL-bases satisfy ri,i ≤ α ri+1,i+1 for α := 1/(δ−η ). This yields Theorem 1. A.K. Lenstra, H.W. Lenstra, Jr. and L. Lovasz´ [47] introduced LLL-bases focusing on δ = 3/4,  = 0 and α = 2. HKZ-bases are due to Hermite [33] and Korkine-Zolotareff [38]. LLL / HKZ- bases B = QR are preserved under isometry. LLL / HKZ-reducedness is a property of the GNF R.

Theorem 1. [47] An LLL-basis B ∈ Rm×n of lattice L satisfies for α = 1/(δ − η2) 2 n−1 2/n 2 n−1 2 1. kb1k ≤ α 2 (det L) , 2. kb1k ≤ α λ1 2 i−1 2 −i+1 2 −2 n−1 3. kbik ≤ α ri,i, 4. α ≤ kbik λi ≤ α for i = 1, ..., n. 2 2 If an LLL-basis satisfies the stronger inequalities ri,i ≤ αr¯ i+1,i+1 for all i and some 1 < α¯ < α, then the inequalities 1-4 hold with αn−1 replaced byα ¯n−1 and with αi−1 in 3, 4 replaced byα ¯i−1/(4¯α − 4).

4 Sections 5–9 survey variants of LLL- and HKZ-bases that either allow faster re- duction for large dimension n or provide a rather short vector b1. For these bases we either modify clause 1 or clause 2 of Theorem 1. Either clause is sufficient due to an observation of Lovasz´ [49] pp.24 ff, that the following problems are polynomial time equivalent for all lattices L(B) given B: O(1) 1. Find b ∈ L, b 6= 0 with kbk ≤ n λ1(L). 2. Find b ∈ L, b 6= 0 with kbk ≤ nO(1)(det L)1/n.

Theorem 2. [48] An HKZ-basis B ∈ Rm×n of lattice L satisfies 2 −2 4/(i + 3) ≤ kbik λi ≤ (i + 3)/4 for i = 1, ..., n. The algorithms for HKZ-reduction [35], [21], exhaustively enumerate, for various l, all lattice vectors b such that kπl(b)k ≤ kπl(bl)k for the current bl. Their theoretical n +o(n) analysis has been stepwise improved [35], [32] to a proven mn 2e time bound for HKZ-reduction [28]. The enumeration ENUM of [66], [67] is particularly fast in practice, see [1] for a survey and [28], [60] for heuristic and experimental results.

Alg. 1: LLL in ideal arithmetic m 1 INPUT b1, ..., bn ∈ Z a basis with M0 = max{kb1k, ..., kbnk }, δ with 4 < δ < 1 1. l := 1, # at stage l b1, ..., bmax(l−1,1) is an LLL-basis with given GNF. 2. WHILE l ≤ n DO

2.1 #compute rl = col(l, R): Pi−1 FOR i = 1, ..., l − 1 DO [ ri,l := (hbi, bli − k=1 rk,irk,l)/ri,i ], 2 Pl−1 2 1/2 t n rl,l := | kblk − k=1 rk,l | , rl := (r1,l, ..., rl,l, 0, ...0) ∈ R . 1 2.2 #size-reduce bl and rl, drc = dr− 2 e denotes the nearest integer to r ∈ R: FOR i = l − 1, ..., 1 DO bl := bl − dri,l/ri,icbi, rl := rl − dri,l/ri,icri. 2 2 2 2.3 IF l > 1 and δ rl−1,l−1 > rl−1,l + rl,l THEN swap bl−1, bl, l := l − 1 ELSE l := l + 1. 3. OUTPUT LLL-basis B = [b1, ..., bn], R = [r1, ..., rn] for δ.

Comments. LLL performs simultaneous column operations on R and B, it swaps columns rl−1, rl and bl−1, bl if this shortens the length of the first column of the h rl−1,l−1 rl−1,l i √ submatrix of R by the factor δ. To enable a swap the entry rl−1,l 0 rl,l 1 is first reduced to |rl−1,l| ≤ 2 |rl−1,l−1| by transforming rl := rl −drl−1,l/rl−1,l−1crl−1. At stage l we get rl = col(l, R) of R = GNF(B), and we have rl−1 from a previous stage. The equation GNF([b1, ...bl]) = [r1, ...rl] is preserved during simultaneous size- reduction of rl and bl. (1) Qn−1 (1) Each swap in step 2.3 decreases D := i=1 di by the factor δ. As initially D ≤ n2 (1) 2 M0 and D remains integer LLL performs ≤ n log1/δ M0 rounds, denoting M0 =

5 max{kb1k, ..., kbnk} for the input basis. Each round performs O(nm) arithmetic steps, 3 LLL runs in O(n m log1/δ M0) arithmetic steps. 2 Time bound in exact rational/integer arithmetic. The rationals rl,l, µl,i = ri,l/ri,i can easily be obtained within LLL in exact rational/integer arithmetic. Moreover, the integer µj,idi has bit length O(n log2 M0) throughout LLL-computations. This yields

5 3 4+ε 2+ε Theorem 3. LLL runs in O(n m(log1/δ M0) ), resp. O(n m(log1/δ M0) ) bit operations under school-, resp. FFT-multiplication.

The degree of this polynomial time bound (in n, m, log M0) is 9 / 7 + ε under school-/FFT-multiplication. The degree reduces to 7 / 6 + ε by computing the GNF, resp., the GSO in floating-point arithmetic (fpa). This is done by LLLH [70] of section 2 4 and the L of [56], both use fpa numbers of bit length O(n + log2 M0). LLL-type segment reduction SLLL of [70] in section 9 further decreases the time bound degree to 6 / 5 + ε. The factor m in the time bounds can be reduced to m by performing the reduction under random projections [6]. The theoretical, less practical algorithms of [65], [73] approach the time bound of LLLH by approximate rational, resp. very long integer arithmetic. A canonical order of the input basis vectors. While the LLL depends heavily on the order of the basis vectors this order can be made nearly canonical by swapping before 2 2 step 2.3 bl and bj for some j ≥ l that minimizes the value rl−1,l + rl,l resulting from the swap. This facilitates a subsequent swap of bl−1, bl by step 2.3. Moreover, this tends to decrease the number of rounds of the LLL and anticipates subsequent deep insertions into the LLL-basis b1, ...., bl−1 via the following new step 2.3.

New step 2.3, deep insertion at (j, l) [SE91], (the old step 2.3 is restricted to depth l − j = 1): 2 Pl 2 IF ∃j, 0 < j < l such that δrj,j > i=j ri,l THEN for the smallest such j do (bj , ..., bl) := (bl, bj , bj+1, ..., bl−1), l := j ELSE l := l + 1. LLL with deep insertions is efficient in practice, and runs with some restrictions 2 2 in polynomial time. Deep insertion at (j, l) decreases rj,j and dj = rj,j ··· rn,n and new Pl 2 2 provides rj+k,j+k close to rj+k−1,j+k−1 for j + k < l. This is because i=j ri,l < δrj,j and the hπj(bl), bj+k−1i are quite small.

Deep insertion at (j, l) can be strengthened by decreasing kπj(bl)k through ad- 2 Pl 2 2 ditional, more elaborate size-reduction. Before testing δrj,j > i=j ri,l = kπj(rl)k shorten πj(bl) as follows

2.2.b additional size-reduction of πj(bl): 0 Ph Ph 2 0 1 WHILE ∃h : j ≤ h < l such that µj,h := i=j ri,lri,h/ i=j ri,h satisfies δ|µj,h| ≥ 2 0 0 DO bl := bl − dµj,hcbh, rl := rl − dµj,hcrh. Obviously, deep insertion with additional size-reduction remains polynomial time per round. This makes the improved deep insertion quite attractive and more efficient than the algorithms of sections 5–7. In addition one can perform deep insertion of

6 bl and of several combinations of bl with other lattice vectors. This leads to random sampling reduction of [69] to be studied in section 7. n−1 2 Decreasing α by deep insertion of depth ≤ 2. The bound kb1k ≤ α 2 (det L) n of n×n (−i+1)/2 Theorem 1 is sharp for the LLL-GNF R = [ri,j] ∈ R : ri,i = α , ri,i+1 = 1 2 ri,i, ri,j = 0 for j ≥ i+2. (Note that deep insertion at (1, n) results in r1,1 = λ1 right 2 2 2 2 away.) While this GNF satisfies ri,i/ri+1,i+1 = α the maximum of ri,i/ri+2,i+2 under 3 3 2 16 LLL-reduction with deep insertion of depth ≤ 2 for δ = 1 is 2 , 2 < α = 9 , since the 2 2 3 HKZ-reduced GNF of the critical lattice of dimension 3 satisfies r1,1/r3,3 = 2 . This shows that deep insertion of depth ≤ 2 decreases α in clauses 1, 2 of Theorem 1 from 4 3 1/2 3 to ( 2 ) . LLL-reduction of bases B with large entries. Similar to Lehmer’s version of Euclid’s algorithm for large numbers [37] section 4.5.2, p. 342, most of the LLL-work can be done in single precision arithmetic: 0 43 0 Pick a random integer ρ ∈R [1, 2 ] and truncate B = [bi,j] into the matrix B 0 0 τ 0 τ 0 0 0 with entries bi,j := dbi,jρ /2 c such that |bi,jρ /2 − bi,j| ≤ 1/2 and |bi,j| ≤ ρ + 1/2, where τ := maxi,jdlog2 |bi,j|e  53. LLL-reduce B0 into B0T 0 working mostly in single precision, e.g. with 53 precision bits, and transform B into BT 0. Iterate this process as long as it shortens B. Note that (2τ /ρ0)B0 and (2τ /ρ0)n det B0 well approximate B and det B. In par- ticular, | det B|  (2τ /ρ0)n implies that det B0 = 0, and thus LLL-reduction of B0 produces zero vectors of B0T 0 and highly shortened vectors of BT 0.

3 LLL-Reduction of Quadratic and Indefinite Forms

We present extensions of the LLL-algorithm to other domains than Z and general quadratic forms. Rational, algebraic and real numbers. It is essential for the LLL-algorithm that the input basis B ∈ Rm×n has entries in an euclidean domain R with efficient, exact arithmetic such as R = Z. Using rational arithmetic the LLL-algorithm directly extends from Z to the field of rational numbers Q and to rings R of algebraic numbers. However, the bit length of the numbers occuring within the reduction requires further care. If the rational basis matrix B has an integer multiple dB ∈ Zm×n, d ∈ Z of moderate size then the LLL-algorithm applied to the integer matrix dB ∈ Zm×n yields an LLL-basis of L(dB) ∈ Zm which is the d-multiple of an LLL-basis of the lattice L(B). Gram-matrices, symmetric matrices, quadratic forms. The LLL-algorithm directly extends to real bases B ∈ Rm×n of rank n that have an integer Gram-matrix BtB ∈

7 Zn×n. It transforms A := BtB into A0 = T tAT such that the GNF R0 of A0 = R0tR0 is LLL-reduced. t n×n We identify symmetric matrices A = A = [ai,j]1≤i,j≤n ∈ R with n-ary t 0 0 t quadratic forms x Ax ∈ R[x1, ..., xn]. The forms A, A are equivalent if A = T AT holds for some T ∈ GLn(Z). The form A ∈ Rn×n with det(A) 6= 0 is indefinite if xtAx takes positive and nega- tive values, otherwise A is either positive or negative (definite). The form A is regular t if det(A) 6= 0. We call A = A = [ai,j] strongly regular (s.r.) if det([ai,j]1≤i,j≤`) 6= 0 n×n for ` = 1, ..., n. Let Dσ ∈ {0, ±1} denote the diagonal matrix with diagonal n σ = (σ1, ..., σn) ∈ {±1} . An easy proof shows

Proposition 1. Every s.r. form A = At ∈ Rn×n has a unique decomposition A = t n×n n×n R DσR ∈ R with a GNF R ∈ R and a diagonal matrix Dσ with diagonal σ ∈ {±1}n.

We call R the geometric normal form (GNF) of A, R = GNF(A). The signature t #{i | σi = 1} is invariant under equivalence. The form A = R DσR is positive, resp., negative if all entries σi of σ are +1, resp., −1. The form is indefinite if σ has −1 and +1 entries.

Definition 1. A s.r. form A = RDσR is an LLL-form if its GNF R is LLL-reduced.

t The LLL-form A = [ai,j] = B B of an LLL-basis B = (b1, ..., bn) satisfies the 2 2 n−1 2 classical bound a1,1 = kb1k ≤ α 2 | det(A)| n of Theorem 1. If B ∈ Rm×n generates a lattice L = {Bx | x ∈ Z} of dimension n0 ≤ n the LLL- 0 0 algorithm transforms the input basis B into an output basis [0, ..., 0, b1, ..., bn0 ] ∈ m×n 0 0 R such that [b1, ..., bn0 ] is an LLL-basis. This generalizes to Corollary 1. An adjusted LLL transforms an input form A ∈ n×n of rank n0 into 0 0 Z an equivalent regular form A0 ∈ Zn ×n with det(A0) 6= 0.

LLL-reduction easily extends to s.r. forms A ∈ Zn×n: compute R = GNF(A), LLL- reduce R into RT and transform A into the equivalent form A0 = T tAT . By Lemma 1 the LLL-reduction of non s.r. indefinite forms reduces to LLL-reduction in dimension n − 2, see also Simon [72].

Lemma 1. An adjusted LLL-algorithm transforms a non s.r. input form A ∈ Zn×n 0 0 0 0 in polynomial time into an equivalent A = [ai,j] such that a1,i = a2,j = 0 for all  0  0 0 0 0 a1,2 0 i 6= 2, j ≥ 3 and 0 ≤ a2,2 ≤ 2a1,2. Such A is a direct sum 0 0 ⊕ [ai,j]3≤i,j≤n. a1,2 a2,2 0 Moreover a1,2 = 1 if det(A) 6= 0 is square-free.

Proof. If det([ai,j]1≤i,j≤`) = 0 for some ` ≤ n then the LLL-algorithm achieves in 0 polynomial time that a1,1 = 0. The corresponding A can be transformed into A such 0 0 0 0 0 that a1,2 = gcd(a1,2, ..., a1,n), a1,3 = ··· = a1,n = 0 = a2,3 = ··· = a2,n. Moreover 0 0 a2,2 can be reduced modulo 2a1,2. Doing all transforms symmetrically on rows and

8 0 t 0 0 0 2 columns the transformed A = T AT is symmetric a1,2 = a2,1, and thus (a1,2) divides 0 0 det(A). If det(A) is square-free then |a1,2| = 1 and a1,2 = 1 is easy to achieve.  Lemma 1 shows that the LLL-algorithm can be adjusted to satisfy

Theorem 4. An adjusted LLL-algorithm transforms a given form A ∈ Zn×n in poly- k (i) (k) nomial time into a direct sum ⊕i=1A of an LLL-form A and binary forms (i) h 0 ai i A = for i = 1, ..., k − 1, where 0 ≤ bi < 2ai and ai = 1 if det(A) 6= 0 is ai bi square-free. If A is positive definite and det A 6= 0 then k = 1.

LLL-reduction of indefinite forms is used in cryptographic schemes based hard problems of indefinite forms. In particular, the equivalence problem is NP-hard for ternary indefinite forms [29]. [30] presents public key identification and signatures based on the equivalence problem of quadratic forms.

4 LLL Using Householder Orthogonalization (HO)

In practice GNF(B) = R is computed in fpa within LLL whereas B is transformed in exact integer arithmetic. For better accuracy we replace steps 2.1, 2.2 of LLL by the procedure TriColl below, denoting the resulting LLL-algorithm by LLLH . TriColl performs HO via reflections, these isometric transforms preserve the length of vectors and of error vectors. LLLH improves the accuracy of the LLL of [66]. All subsequent reduction algorithms are based on LLLH . We streamline the LLLH -analysis of [70]. Computing the GNF of B. Numerical algorithms for computing the GNF R = n×n [ri,j] ∈ R of B = QR have been well analyzed, see [34] chapter 19. It is known for quite some time that HO is very stable. Wilkinson showed that the computation of a Householder vector and the transform of a given matrix by a reflection are both normwise stable in the sense that the computed Householder vector is very close to the exact one and the computed transform is the update of a tiny normwise perturbation of the original matrix. Wilkinson also showed that the QR factorization algorithmn is normwise backwards stable [76] pp.153–162, p. 236. For a componentwise and normwise error analysis see [34] ch. 19.3.

To simplify the analysis let all basis vectors bj start with n zero entries bj = n 0 n m−n 0 0 (0 , bj) ∈ 0 Z . The bases b1, ..., bn and b1, ..., bn have the same GNF. The 4 padding with initial zeros increases TriCol’s number of steps by the factor 3 , it is not required in practice. We compute an orthogonal matrix Q0 ∈ Rm×m that extends Q ∈ Rm×n by m − n columns and a matrix R0 ∈ Rm×n that extends R ∈ Rn×n by final zero rows. In ideal arithmetic we get R0 by a sequence of Householder reflections m×m Qj ∈ R as 0 0 0 R0 := B, Rj := QjRj−1 for j = 1, ..., n, 0 0 0 t t R := Rn, Q := Q1 ··· Qn = Q1 ··· Qn,

9 −2 t m where Qj := Im − 2khjk hjhj is orthogonal and symmetric and hj ∈ R . 0 0 The transform Rj 7→ QjRj−1 zeroes the entries in positions j + 1 through m of 0 t 0 0 m×n col(j, Rj−1), it triangulates r := (r1, ..., rm) := col(j, Rj−1) so that Rj ∈ R is upper-triangular for the first j columns. The reflection Qj reflects about the hyper- ⊥ plane span(hj) : Qjhj = −hj,Qjx = x for hhj, xi = 0. t n−j+1 n m−n Note that (rj, ..., rn) = 0 due to b1, ..., bj ∈ 0 Z . Pm 2 1/2 j−1 n−j t We set rj,j := ( i=j ri ) , hj := (0 , −rj,j, 0 , rn+1, ..., rm) . −2 2 2 Correctness of hj. We have 2hhj, rikhjk = 1 and khjk = 2rj,j and thus m−j t m Qjr = r − hj = (r1, ..., rj−1, rj,j, 0 ) ∈ R . Hence Qjr is correctly triangulated and the Householder vector hj is well chosen.

TriCol (b1, ..., bl, h1, ..., hl−1, r1, ..., rl−1)(TriColl for short) # TriColl computes hl and rl := col(l, R) and size-reduces bl, rl. 2 1. r0,l := bl, FOR j = 1, ..., l − 1 DO rj,l := rj−1,l − hhj , rj−1,lihj /rj,j . t Pm 2 1/2 2. (r1, ..., rm) := rl−1,l, ζ := maxi |ri|, rl,l := ζ( i=n+1(ri/ζ) ) , # ζ prevents under/overflow; ri = 0 holds for i = l, ..., n l−1 n−l t 2 2 3. hl := (0 , −rl,l, 0 , rn+1, ..., rm) ,# note that khlk = 2rl,l. m−l t m 4. rl := (r1, ..., rl−1, rl,l, 0 ) ∈ R . 5. # size-reduce bl and rl: FOR i = l − 1, ..., 1 DO 1 IF |ri,l/ri,i| ≤ 2 + ε THEN µi := 0 ELSE µi := dri,l/ri,ic, bl := bl−µibi, rl := rl−µiri. Pl−1 6. IF i=1 |µi| 6= 0 THEN GO TO 1 ELSE output bl, rl, hl.

1 TriColl under fpa with t precision bits. Zeroing µi in case |ri,l/ri,i| <≤ 2 + ε in step 5 cancels a size-reduction step and prevents cycling through steps 1-6. In TriColl’s last round size-reduction is void. Zeroing µi, the use of ζ and the loop through steps 1– 6 are designed for fpa. The µi in steps 5, 6 consist of the leading Θ(t) bits of the µl,i. TriColl replaces in the Schnorr-Euchner-LLL [66] classical Gram-Schmidt by an economic, modified Gram-Schmidt, where col(l, R) gets merely transformed by the l − 1 actual reflections.

Accurate floating point summation. Compute the scalar product hhj, rj−1,li = hx, yi in step 1 by summing up positive and negative terms xiyi separately, both in increasing order to P and P . The increasing order minimizes the apriori >0 <0 P P forward error [34] chapter 4, see also [19]. If both >0 and − <0 are larger than −t/2 P P −t/2 P P 2 and nearly opposite, | >0 + <0 | < 2 ( >0 − <0), then compute hx, yi exactly. This provision is far more economic than that of [66] to compute hx, yi ex- actly if |hx, yi| < 2t/2kxkkyk. Prop. 2 and Thm. 5 do not require occasional exact computations of hx, yi. 3 2 Efficiency. TriColl performs 4ml + 2 l + O(l) arithmetic steps and one sqrt per 3 2 round, the 2 l steps cover step 1. TriColl is more economic than the full modified GSO of [34] as only the reflections of the current Householder vectors h1, ..., hl−1 are applied to bl. The contribution of step 1 to the overall costs is negligible for reasonably

10 long input bases since on average l ≤ n/2 and there are no long integer steps. TriColl t t−1 performs at most log2(2kblk/2 ) rounds, each round shortens bl by a factor ≤ 1/2 .

fpa-Heuristics. There is room for heuristics in speeding up the LLL since the cor- rectness of the output can most likely be efficiently verified [75]. We want to catch the typical behaviour of fpa-errors knowing that worst case error bounds are to pes- simistic. Note that error vectors x − x¯ are in high dimension rarely near parallel to x. Small errors kx − x¯k ≤ εkxk with expected value E[hx, x¯ − xi] = 0 satisfy E[ kx¯k2] = E[ kx + (¯x − x)k2] = E[ kxk2 + kx¯ − xk2] ≤ (1 + ε2)E[ kxk2]. Hence the relative error | kx¯k − kxk |/kxk is for ε  1 on average smaller than ε2 and can be neglected. 0 m m−n We let the projection πn : R → R remove the first n coordinates so 0 0 n 0 that πn(bj) = bj for bj = (0 , bj). Recall that TriColl computes in step 2 Ql−1 ¯ rl−1,l = j=1 Qjbl, as ¯r0,l := bl, ¯rj,l := fl(Qj¯rj−1,l) for 1 ≤ j < l and ¯ ¯ −2 ¯ ¯t Qj = Im − khjk hjhj.

n m−n Proposition 2. [fpa-Heur.] TriColl applied to an LLL-basis b1, ..., bl ∈ 0 Z approximates the GNF R = [ri,j] = [r1, ..., rl] such that for j = 0, ..., l − 1 0 3 j−1 −t 1. |r¯j+1,l − rj+1,l| ≤ kπn(¯rj,l − rj,l)k ≤ 10 m ( 2 ) max1≤i≤l ri,i 2 , 0 5 (j−1)/2 −t 2. E[kπn(¯rj,l − rj,l)k] ≤ 10 m( 4 ) max1≤i≤l ri,i 2 holds for random fpa-error vectors.

2 l−i 2 l−1 LLL-bases b1, ..., bl satisfy ri,i ≤ α rl,l and kblk ≤ α 2 rl,l.Clause 1. of Prop. 2 shows for j = i − 1 that TriColl achieves for i = 1, ..., l l−1 3 i−2 rl,l −t |µ¯l,i − µl,i| ≈ |r¯i,l − ri,l|/ri,i ≤ 10 m α 2 ( ) 2 . (1) 2 ri,i This bound can easily be adjusted to the case that merely b1, ..., bl−1 is LLL-reduced. Thus it covers for i = l − 1 the critical situation of swapping b , b within LLL . √ l−1 l √ H t l−1 3 √ It guarantees correct swapping of bl−1, bl if 2 ≥ 100 m 3 , since 2 α ≈ 3 holds 4 for α ≈ 3 . On average the error bounds are much smaller for random fpa-error vec- 3 5 1/2 tors. Clause 2. of Prop.2 reduces 2 in (1) on average to ( 4 ) ≈ 1.12. Moreover, the constant α in (1) reduces considerably by stronger lattice reduction. Sections 3-7 show that α can be decreased by feasible lattice reduction to about 1.025. This re- √ t l−1 duces our correctness condition 2 ≥ 100 m 3 , for LLL-swapping of bl−1, bl to 2t ≥ 100 m 1.132l−1.

Proof of Prop. 2. Induction on j; j = 0: Consider the last round of TriColl where size-reduction in step 5 is void. So let bl and rl be size-reduced. We have that 2 n−1 0 r0,l = bl, r1,l = bl − hh1, blih1/r1,1, h1 = (−r1,1, 0 , b1), r1,1 = kb1k. A lengthy d −t −2t proof shows k¯r1,l − r1,lk/kr1,lk ≤ ( 2 + 3)2 + O(2 ), see [44] pp. 84, 85, (15.21). We disregard all O(2−2t)-terms. The claim holds for j = 0 and arbitrary l since the P 2 1/2 size-reduced bl satisfies kblk = kr1,lk ≤ ( 1≤i≤l ri,i) . The constant factor 10 in the claim is a crude upper bound.

11 2 2 0 Induction step j − 1 → j: Clearly rj,l = Qjrj−1,l, khjk = 2rj,j, kπn(rj−1,j)k = 0 0 rj,j, πn(hj) = πn(rj−1,j), the j-th entries of hj, rj,l are −rj,j, rj,l. Hence 0 rj,l = hhj, πn(rj−1,l)i/rj,j, (2) 2 rj,l − rj−1,l = −hhj, πn(rj−1,l)i hj/rj,j = −rj,l hj/rj,j, 0 0 rj,l π (rj,l) = π (rj−1,l − rj−1,j). (3) n n rj,j 0 Consider the part of kπn(¯rj,l − rj,l)k induced via (5) from |r¯j,l − rj,l|. We neglect fpa-errors from ¯rj−1,l 7→ fl(Q¯j¯rj−1,l) = ¯rj,l as they are minor. (2) shows that ¯ 0 0 r¯j,l − rj,lrj,j = hhj, πn(¯rj−1,l)i − hhj, πn(rj−1,l)i ¯ 0 ¯ 0 = hhj − hj, πn(rj−1,l)i + hhj, πn(¯rj−1,l − rj−1,l)i. ¯ 0 0 The contribution of hhj, πn(¯rj−1,l − rj−1,l)i viar ¯j,l − rj,l to kπn(¯rj,l − rj,l)k is part ¯ 0 ¯ of Qjπn(rj−1,l − ¯rj−1,l). As the orthogonal Qj preserves the length of error vectors 0 that contribution is covered by kπn(¯rj−1,l − rj−1,l)k. We neglect the contribution of ¯ 0 0 ¯ hhj − hj, πn(rj−1,l)i to kπn(¯rj,l − rj,l)k assuming that the error hj − hj is random. Therefore, (3) shows up to minor errors that

0 0 rj,l 0 π (¯rj,l − rj,l) ≈ π (¯rj−1,l − rj−1,l) + π (¯rj−1,j − rj−1,j). (4) n n rj,j n 1 Applying the induction hypothesis for j − 1 and size-reducedness, |rj,l/rj,j| ≤ 2 , we get the second part of the induction claim: 0 3 j−2 3 −t kπn(¯rj,l − rj,l)k ≤ 10 m( 2 ) ( 2 ) max1≤i≤l ri,i 2 . 0 The first part |r¯j+1,l − rj+1,l| ≤ kπn(¯rj,l − rj,lk holds since rj+1,l is an entry of rj+1,l = Qjrj,l. On the average the error bounds are much smaller since independent random error vectors are with high probability nearly orthogonal. Thus (4) shows 0 2 0 2 rj,l 2 0 2 E[kπ (¯rj,l − rj,l)k ] ≈ E[kπ (¯rj−1,l − rj−1,l)k ] + E[ | | kπ (¯rj−1,j − rj−1,j)k ]. n n rj,j n This proves by induction on j clause 2 of Prop. 2.  √ 3 √ We set δ := 0.98, δ− := 0.97, δ+ := 0.99, α := 1/0.73 < 1.37, ρ := 2 α ≈ 3, 2 3 2 1/2 ε := 0.01 and αε := (1 + ε α)/( 4 − 4ε − ε /4 − (1 + 2ε)εα ) < 1.44. Recall that LLLH is obtained by replacing steps 2.1, .2.2 of LLL by TriColl; let n M = max(d1, ..., dn, 2 ) for the input basis and M0 = max(kb1k, ..., kbnk).

Theorem 5. [Thm. 2 of [70] using fpa-Heur.] LLLH transforms with fpa of pre- t 10 n m cision 2 ≥ 2 m ρ a basis b1, ..., bn ∈ Z into an approximate LLL-basis satisfying j−i 1 2 1. |µj,i| < 2 + εαε rj,j/ri,i + ε for 1 ≤ i < j ≤ n, 2 2 2 2 2. δ− ri,i ≤ µi+1,i ri,i + ri+1,i+1 for i = 1, . . . , n − 1. 3. Clauses 1.-3. of Theorem 1 hold with α replaced by αε. 2 LLLH runs in O(n m log1/δ M) arithmetic steps using n + log2 M0 bit integers and fpa numbers.

We call a basis size-reduced under fpa if it satisfies clause 1 of Theorem 5. The j−i term εαε 2 rj,j/ri,i ≥ ε covers the error |µ¯j,i − µj,i| from (1).

12 O(n) 5 In particular LLLH runs for M0 = 2 , m = O(n) in O(n ) arithmetic steps and in O(n6)/ O(n5+ε) bit operations under school-/ FFT-multiplication. 2 Similar to L of [56] LLLH should be quadratic in log2 M0 since size-reduction in step 5 of TriColl is done in fpa using the t most significant bits of ri,l, rl,l, µl,i. Thus at least one argument of multiplication/division has bit length t. The maximal bit length of the µl,i decreases in practice by Θ(t) per round of TriColl, e.g. by about 30 for t = 53, see Fig. 1 [41]. In our experience LLLH is most likely correct up to dimension n = 350 under fpa with t = 53 precision bits for arbitrary M0, and not just for t ≥ Ω(n + log2 M0) as shown in Theorem 5. LLLH computes minimal, near zero errors for the LLL-bases given in [56], where the fpa-LLL-code of NTL fails in dim. 55, NTL orthogonalizes the Gram-matrix by classical GSO and LLLH by HO.

Givens rotations zero a single entry of col(j, Rj−1) and provide a slightly better fpa-error bound than HO [34] chapter 18.5. Givens rotations have been used in parallel LLL-algorithms of Heckler, Thiele and Joux. The L2 of Nguyen, Stehle´ [56] uses GSO in fpa for the Gram matrix BtB. √ √ l−1 Theorem 2 of [56] essentially replaces 3 by 3 in our condition 2t ≥ 100 m 3 for correct LLL-swapping. LLLH performs correct LLL-swaps in about twice the dimension compared to the L2 of [56]. This is because replacing the basis B by the Gram matrix BtB squares the matrix condition number K(B) = kBkkB−1k which characterizes the sensitivity of Q, R to small perturbations of B, see [34] chapter 18.8. As fpa-errors during the QR-factorization act similar to perturbations of B the squaring K(BtB) = O(K(B)2) essentially halves the accurate bits for R,Q and the dimension for which L2 is correct. The squaring also doubles the bit length of B and more than doubles the running time. While [74] reports that L2 is most likely correct with 53 precision bits up to dimension 170, LLLH is most likely correct up to dimension 350. The strength of L2 is its proven accuracy, moreover L2 is quadratic in the bit length log2 M0. While the proven accuracy bound of [56] is rather modest for 53 precision bits the Magma-code of L2 uses intermediary heuristics and provides proven accuracy of the output via multiprecision fpa [74]. Scaled LLL-reduction. Scaling is a useful concept of numerical analysis for reducing fpa-errors. Scaled LLL-reduction of [41] associates with a given lattice basis a scaled 1 2 2 basis of a sublattice of the given lattice. The scaled basis satisfies 2 ≤ |r1,1/rj,j| ≤ 2 for all j. Scaled LLL-reduction performs a relaxed size-reduction, reducing relative to an associated scaled basis. The relaxed size-reduction is very accurate, independent of fpa-errors, and its relaxation is negligible. Scaled LLL-reduction is useful in dimension n > 350 where LLLH becomes inaccurate. That way we reduced in 2002 a lattice bases of dimension 1000 consisting of integers of bit length 400 in 10 hours on a 800 MHz PC. Comparison with [65] and the modular LLL of [73]. Theorem 7 improves the time bound and the accuracy of the theoretic method of [S88] which uses approximate rational arithmetic.

13 The modular LLL [73] performs O(nm log1/δ M) arithmetic steps on integers of bit length log2(M0M) using standard matrix multiplication, where M denotes n Ω(n) max(d1, ..., dn, 2 ). If M0 = 2 , the LLL’s of [65], [73] match asymptotically the bound for the number of bit operations of LLLH . Neither of these LLL’s is quadratic in M0. The practicability of LLLH rests on the use of small inte- gers of bit length 1.11 n + log2 M0 whereas [73] uses long integers of bit length log2(M0M) = O(n log M0).

5 Semi Block 2k-Reduction Revisited

Survey, background and perspectives of sections 5–7. We survey feasible basis reduc- tion algorithms that decrease α in clauses 1, 2 of Theorem 1 toα ¯ < α for n  2. n−1 n−1 The factors α 2 , α of Theorem 1 decrease within polynomial time reduction to 2 2O((n log log n) / log n) [64] and combined with [4] to 2O(n log log n/ log n). In this survey we focus on reductions of α achievable in feasible lattice reduction time. Some reductions are proven by heuristics to be feasible on the average.

For the rest of the paper let δ ≈ 1 so that α ≈ 4/3. LLL-bases approximate λ1 n−1 n up to a factor α 2 . 1.155 . They approximate λ1 much better for lattices of high 2 2/n n−1 √ n density where λ1 ≈ γn(det L) , namely up to a factor . α 4 / γn . 1.075 due to part 1 of Theorem 1. Moreover, [57] reports that α decreases on average to about 1.024 ≈ 1.08 for the random lattices of [57]. The constant α can be further decreased within polynomial reduction time by blockwise basis reduction. We compare Schnorr’s algorithm for semi block 2k- reduction [64] and Koy’s primal-dual reduction[39] with blocksize 2k. Both algorithms perform HKZ-reductions in dimension 2k and have similar polynomial time bounds. They are feasible for 2k ≤ 50. Koy’s algorithm guarantees within the same time bound under known proofs better approximations of the shortest lattice vector. Under rea- sonable heuristics both algorithms are equally strong and much better than proven in worst-case. We combine primal-dual reduction with Schnorr’s random sampling reduction (RSR) to a highly parallel reduction algorithm that is on the average more 4 n/2 efficient than previous algorithms. It reduces the approximation factor ( 3 ) guar- anteed by the LLL-algorithm on average to 1.025n/2 using feasible lattice reduction.

Semi block 2k-reduced bases of [64] satisfy by Theorem 6 the inequalities of Theo- 1/k 1/k rem 1 with α replaced by (βk/δ) , for a lattice constant βk such that limk→∞ βk = k 2 ln 2+1/k 1. The best known bounds on βk are k/12 < βk < (1 + 2 ) [23]. Primal-dual 2 1/2k reduction (Alg. 3) replaces α by (αγ2k) (Theorem 7). The second bound out- performs the first, unless βk is close to its lower bound k/12. Primal-dual reduction for blocks of length 48 replaces α in Theorem 1 within feasible reduction time by 2 1/48 (αγ48) ≈ 1.084. The algorithms Alg. 2, Alg. 3 for semi block 2k reduction and

14 primal-dual reduction are equally powerful in approximating λ1 under the GSA- heuristic of [69]. They perform under GSA much better than proven in worst case. Section 7 surveys some basis reduction algorithms that are effcient on average but not proven polynomial time. BKZ-reduction of [66] runs in practice for blocksize 10 in less than twice the LLL-time. The LLL with the deep insertion step of [66] seems to be polynomial time on the average and greatly improves the approximational power of the LLL. Based on experiments [57] reports that LLL with deep insertions decreases α for random lattices on average to 1.0124 ≈ 1.05 ≈ α1/6. In section 7 we replace HKZ-reduction within primal-dual reduction by random sampling reduction (RSR) of [69], a parallel extension of the deep insertion step of [66]. RSR is nearly feasible up to blocksize k = 80. The new algorithm, primal-dual RSR (Alg. 4) replaces under the worst-case GSA-heuristics α in Theorem 1 by (80/11)1/80 ≈ 1.025. Alg. 4 is highly parallel and polynomial time on the average but not proven polynomial time.

For Table 1 we assume that the densest known lattice packings P48p,P48q in di- mension 48 [16] table 1.3, have nearly maximal density, then γ48 ≈ 6.01. For the assumptions GSA, RA see section 6. GSA is a worst case heuristics in the sense 2 2 that bases B = QR having a large spread of the values ri+1,i+1/ri,i are in general easier to reduce.

1. Semi block 2k-reduction [64], k = 24α ¯ 1/24 proven [23] (β24/δ) < 1.165 1/47 by heuristic, GSA γ47 ≈ 1.039 2. Primal-dual reduction, Koy 2004, k = 48 2 1/48 proven (αγ48) ≈ 1.084 1/47 by heuristic, GSA γ48 ≈ 1.039 3. Primal-dual RSR, k = 80, under GSA, RA 1.025 4. LLL on the average for random lattices, experimental [57]: 1.08 5. LLL with deep insertion [66] on the average for random lattices, experimental [57], [7]: 1.0124 ≈ 1.05

4 Table 1: Reductions α¯ of α ≈ 3 in Thm. 1 under feasible lattice basis reduction for n  2.

m×n Notation. For a basis B = QR ∈ R , R = [ri,j]1≤i,j≤n with n = hk let k×k R` := [ri,j]k`−k

15 denote the principal submatrices of the GNF R corresponding to the segments B` = [bk`−k+1, ..., bk`] and [B`,B`+1] of B. We denote 2 D` =def (det R`) = dk`/dk`−k, (1) Qh−1 Qh−1 h−` Dk = D =def `=1 dk` = `=1 D` .

1/k The lattice constant βk. Let βk =def max(det R1/ det R2) maximized over all h 0 i R1 R 2k×2k HKZ-reduced GNF’s R = R1,2 = 1 ∈ R . OR2 r r 2 2 h 1,1 1,2 i 2×2 Note that β1 = max r /r over all GNF’s R = ∈ satisfying 1,1 2,2 0 r2,2 R 2 2 2 4 1 |r1,2| ≤ r1,1/2, r1,1 ≤ r1,2 + r2,2 and thus β1 = 3 = α holds for δ = 1, η = 2 . Note k 2 ln 2+1/k that βk ≤ (1 + 2 ) [23].

Definition 2. [64] A basis B = QR ∈ Rm×n, n = hk, is semi block 2k-reduced for 2 2 δ ∈ (η , 1] and α = 1/(δ − η ) if the GNF R = [ri,j] satisfies

1. R1, ..., Rh ⊂ R are HKZ-reduced, 2 2 2. rk`,k` ≤ α rk`+1,k`+1 for ` = 1, ..., h − 1, k k 3. δ D` ≤ βk D`+1 for ` = 1, ..., h − 1.

k 3 In [64] α in clause 2 has been set to 2 and δ in clause 3 has been set to 4 . For 2 2 k = 1, clause 3 means that r`,` ≤ αr`+1,`+1. LLL-bases for δ are semi block 2k-reduced for k = 1.

Theorem 6. [64]. A semi block 2k-reduced basis B = QR ∈ Rm×n, n = hk, satisfies 2 n/k−1 2/n kb1k ≤ γk(βk/δ) 2 (det L(B)) .

2 1/k 2/k Proof. We have that kb1k ≤ γkD1 = γk(det R1) since R1 is HKZ-reduced. 1/k 1/k Clause 3 of Def. 3 shows D` ≤ (βk/δ)D`+1 and yields 2 1/k `−1 1/k kb1k ≤ γkD1 ≤ γk(βk/δ) D` for ` = 1, ..., h = n/k. Multiplying these inequalities and taking h-th roots yields the claim. 

2 −2 ln k+2 n/k−2 Moreover, kb1k λ1 ≤ k (βk/δ) holds for k ≥ 3 [64] Thm 3.1, Cor. 3.5. Ajtai [3] proved these bounds to be optimal up to a constant factor in the expo- nent. There exist semi block 2k-reduced bases of arbitrary dimension n satisfying 2 Ω(n/k) 2/n kb1k ≥ γk(βk/δ) (det L(B)) .

Alg. 2 rephrases the algorithm of [64] without using the unknown constant βk. For k = 1 Alg. 2 essentially coincides with LLL-reduction.

The transforms T of R`,`+1 that perform LLL-, resp. HKZ-reduction are only trans- k/2 ported to the basis B if this decreases D` by the factor δ, resp. δ .

16 Alg. 2: Semi block 2k-reduction m×n 2 1 INPUT basis B = [b1,..., bn] ∈ Z , δ ∈ [η , 1), η = 2 + ε, n = hk. OUTPUT semi block 2k-reduced basis B. n×n 1. LLL-reduce B, HKZ-reduce [B1,B2] = [b1, ..., b2k], compute R = [ri,j ] ∈ R , ` := 2. 0 0 0 2. HKZ-reduce R`+1 into R`+1 T for some T ∈ GLk(Z), B`+1 := B`+1 T , LLL-reduce R`,`+1 into R`,`+1T .

If an LLL-swap bridging R` and R`+1 occured THEN

[B`,B`+1] := [B`,B`+1] T , ` := max(` − 1, 1) GO TO 2

HKZ-reduce R`,`+1 into R`,`+1T for some T ∈ GL2k(Z). " new 0 new # new new 2 R` R` 3. Compute D` := (det R` ) for new := GNF(R`,`+1T ), OR`+1 new k/2 IF D` ≤ δ D` THEN [B`,B`+1] := [B`,B`+1] T , recomputr R`,R`+1, ` := ` − 1 ELSE ` := ` + 1. 4. IF ` = 1 THEN GO TO 1, IF 1 < ` < h THEN GO TO 2 ELSE terminate.

Correctness. Induction over the rounds of the algorithm shows that the basis b1, ..., bk` is always semi block 2k-reduced for the current `. We show that clauses 2, 3 of Def. 2 hold, clause 1 obviously holds.

Clause 2 : LLL-reduction of R`,`+1 in step 2 guarantees clause 2. In particular if an LLL-swap bridging R`, R`+1 occured the blocks B`, B`+1 get transformed. new k new Clause 3 : After HKZ-reduction of R`,`+1 we have that D` ≤ βk D`+1 . Before new k/2 new −k/2 increasing ` we have D` > δ D` and thus D`+1 < δ D`+1 resulting in k/2 new k new −k/2 k k k δ D` < D` ≤ βk D`+1 < δ βk D`+1, and thus δ D` < βk D`+1. 

Lemma 2. Semi block 2k-reduction performs at most h − 1 + 2n(h − 1) log1/δ M0 rounds, i.e., passes of step 2.

Proof. Let k ≥ 2. Semi block 2k-reduction iteratively decreases D` either by LLL- reduction or by HKZ-reduction of R`,`+1 Each pass of steps 2, 3 either decreases Qh−1 h−` k/2 D` and D = `=1 D` by the factor δ, resp. δ or else increments `. Since ini- 2k h Qh−1 (2) n(h−1) tially D = `=1 dk` ≤ M0 = M0 the integer D can be decreased at most 2n(h − 1) log1/δ M0 times by the factor δ. Hence there are at most n(h − 1) log1/δ M0 passes of steps 2, 3 that decrease D by the factor δ, resp. δk/2 and at most h − 1 + n(h − 1) log1/δ M0 passes of step 2 that do not change D but increment ` in step 3.  2 The proof of [64] Theorem 3.2 shows that a HKZ-reduction of R`,`+1 performs O(n k+ 4 k+o(k) k log M0) + (2k) arithmetic steps using integers of bit length O(n log M0). Fol- 4 2 k+o(k) lowing [S87], semi block 2k-reduction performs O((n +n (2k) ) log1/δ M0) arith- metic steps.

17 6 Primal-Dual Reduction

2 Koy’s primal-dual reduction [39] decreases D` = (det R`) as follows. It maximizes rk`,k` over the GNF’s of R` T` and minimizes rk`+1,k`+1 over the GNF’s of R`+1 T`+1 for all T`, T`+1 ∈ GLk(Z) and then swaps bk`, bk`+1 if this decreases rk`,k` and D`. Primal-dual√ reduction with double blocksize 2k replaces the constant βk/δ in Theo- rem 6 by α γ2k which is better understood than βk/δ since γk = Θ(k).

Dual lattice and dual basis. The dual of lattice L = L(QR) is the lattice L∗ = {z ∈ span(L) | zty ∈ Z for all y ∈ L}. ∗ −t −t t −1 t −1 −t −t L = L(QR ) holds because (QR ) QR = R Q QR = R R = In. QR = B holds for m = n. −t −t R is a lower triangular matrix and UnR Un is upper-triangular with positive di- ∗ −t −t agonal entries. Clearly L = L(QR Un), the basis QR Un has QR-decomposition −t −t −t QR Un = (QUn)(UnR Un) because QUn is isometric and UnR Un is upper- ∗ −t triangular. B := QR Un is the (reversed) dual basis of B = QR. Note that ∗ ∗ ∗ ∗ −t (B ) = B. B has the dual GNF R := UnR Un. The (reversed) dual basis ∗ ∗ ∗ B = [b1,..., bn] of B = [b1,..., bn] is characterized by ∗ ∗ hbi , bn−j+1i = δi,j = hbi, bn−j+1i, ∗ ∗ ∗ ∗ −t where δi,j ∈ {0, 1} is 1 iff i = j. The dual basis B satisfies B = [b1, ..., bn] = B ∗ for m = n. (The bi denote the dual basis vectors and not the orthogonal vectors ∗ ∗ qi = πi(bi) as in [47]. The diagonal entries of R = [ri,j] and R = [ri,j] satisfy ∗ ri,i = 1/rn−i+1,n−i+1 for i = 1, . . . , n. (5) ∗ ∗ ∗ ∗ HKZ-reduction of R minimizes r1,1 = kb1k and maximizes rn,n = 1/r1,1.

m×n Notation. For a basis B = QR ∈ R we letr ¯k`,k` for k` ≤ n denote the maximum of rek`,k` over the GNF’s [rei,j]k`−k

Definition 3. A basis B = QR ∈ Rm×n, n = hk is a primal-dual basis for k and 2 1 δ ∈ (η , 1], α = 1/(δ − 4 ) if its GNF R = [ri,j] satisfies 1. R1, ..., Rh ⊂ R are HKZ-reduced, 2 2 2.r ¯k`,k` ≤ α rk`+1,k`+1 for ` = 1, ..., h − 1.

We see from (5) that clause 2 of Def. 3 also holds for the dual B∗ of a primal-dual basis B. Therefore, such B∗ can be transformed into a primal-dual basis by HKZ- ∗ ∗ ∗ ∗ reducing B` = [bk`+1, ..., bk`+k] into B` T` for ` = 1, ..., h. Moreover, clauses 2 and 3 of Def. 1 are preserved under duality, they hold for the dual of a semi block 2k-reduced 2 1/k basis. Theorem 7 replaces α in Theorem 1 by (αγk) .

18 Theorem 7. [39], [23]. A primal-dual basis B = QR ∈ Rm×n, n = hk of the lattice 2 2 h−1 2/n 2 2 h−1 2 L satisfies 1. kb1k ≤ γk(αγk) 2 (det L) , 2. kb1k ≤ (αγk) λ1. 2 2 Proof. 1. The maximumr ¯k`,k` of rk`,k` over the R`T satisfies by clause 2 of Def. 3 2 2 r¯k`,k` ≤ α rk`+1,k`+1. 1/k 2 2 ∗ Moreover we have D` ≤ γkr¯k`,k` = γk/λ1(L(R` )) 2 ∗ sincer ¯k`,k` is computed by HKZ-reduction of R` , and 2 2 1/k λ1(L(R`+1)) = rk`+1,k`+1 ≤ γkD`+1 since R`+1 is HKZ-reduced. Combining these inequalities we get 1/k 2 2 2 1/k D` ≤ γkr¯k`,k` ≤ αγkrk`+1,k`+1 ≤ αγkD`+1. (6) Since R1 is HKZ-reduced this yields : 2 1/k 2 `−1 1/k kb1k ≤ γkD1 ≤ γk(αγk) D` for ` = 1, ..., h. Multiplying these h inequalities and taking h-th roots yields the claim. ∗ ∗ 1/k 2. Note that the inequality (6) also holds for the dual basis B , i.e., (D` ) ≤ 2 ∗ 1/k ∗ ∗ 2 −1 ∗ αγk(D`+1) holds for D` = (det R` ) = Dh−`+1. Hence the dual 1 . of part 1. of Thm 7 also holds: −1 −h+1 2 2 2 2/n 1*. r¯n,n ≥ γk (αγk) (det L) . ∗ 2 2 2 h−1 2 1. and 1. yield kb1k ≤ γk(αγk) r¯n,n. By clause 2 of Def. 3 we get 2 2 2 `−1 2 2 ` 2 kb1k ≤ γk(αγk) r¯k`,k` ≤ (αγk) rk`+1,k`+1 for ` = 0, ..., h − 1. Therefore rk`+1,k`+1 ≤ λ1 yields the claim. In fact rk`+1,k`+1 ≤ kπk`+1(b)k ≤ λ1 Pn holds if a shortest lattice vector b = j=1 rjbj 6= 0 satisfies k` < µ ≤ k` + k for µ := max{j | rj 6= 0}, because R`+1 is HKZ-reduced. 

Alg. 3: Koy’s algorithm for primal-dual reduction m×n 1 2 INPUT basis [b1, ..., bn] = B = QR ∈ Z , δ ∈ (( 2 + ε) , 1), n = hk. OUTPUT primal-dual reduced basis B for k, δ. 1. LLL-reduce B, HKZ-reduce B1 = [b1, ..., bk] and compute R = GNF(B), ` := 1. 2. #reduce R`,`+1 by primal and dual HKZ-reduction of blocksize k: 0 0 HKZ-reduce R`+1 into R`+1T , B`+1 := B`+1T . » −t – −t HKZ-reduce R∗ = U R U into R∗ T , set T¯ := UkT` O . ` k ` k ` ` OI ¯ k LLL-reduce R`,`+1T into R`,`+1T with δ. 3. IF an LLL-swap bridging R` and R`+1 occured THEN [B`,B`+1] := [B`,B`+1] T , ` := max(` − 1, 1) ELSE ` := ` + 1. 4. IF ` < h THEN GO TO 2 ELSE output B and terminate.

Correctness. Induction over the rounds of the algorithm shows that the basis b1, ..., bk` is always a primal-dual basis for the current `. The algorithm deviates from semi block 2k-reduction in the steps 2, 3. Step 2 maxi- ∗ mizes rk`,k` within R` by HKZ-reduction of R` and minimizes rk`+1,k`+1 within R`+1 by HKZ-reduction of R`+1, and then LLL-reduces R`,`+1. If no LLL-swap bridging R` and R`+1 occured in step 2 then clause 2 of Def. 2 was previously satisfied for `.

19 Lemma 3. Primal-dual reduction performs at most h−1+2n(h−1) log1/δ M0 passes of step 2. Qh−1 n(h−1) Proof. Initially we have D = `=1 d`k ≤ M0 . Steps 2, 3 either decrease D` by a factor δ or else increments `. Thus the proof of Lemma 2 applies. 

The number of passes of step 2 is in worst case at most h − 1 + 2n(h − 1) log1/δ M0. The actual number of passes may be smaller but on the average it should be propor- tional to h − 1 + 2n(h − 1) log1/δ M0. The dual clause 2 of Def. 3 has been strengthened in [24] for arbitrary small ε > 0 to + 2 . r¯kl+1,kl+1 ≤ (1 + ε)rkl+1,kl+1 for l = 1, ..., h − 1 0 0 denotingr ¯kl+1,kl+1 := maxT rkl+1,kl+1 of [ri,j] := GNF([ri,j]kl−k+2≤i,j≤kl+1T ) over all T ∈ GLk(Z). 2 This improves the inequalities of Theorem 7 to hold with αγk replaced by 2k  k−1 (1 + ε)γk [24]: n−k n−k 2  k−1 2/n  k−1 1. kb1k ≤ γk (1 + ε)γk (det L) , 2. kb1k ≤ (1 + ε)γk λ1.

+ In adjusting Alg. 3 to the new clause 2 it is crucial that increasing rkl+1,kl+1 by + 2 −2 a factor 1 + ε to satisfy clause 2 decreases Dl = det Rl by the factor (1 + ε) and preserves DlDl+1 and all Di for i 6= l, l + 1. The adjusted Alg. 3 performs at most h + 2nh log1+ε M0 HKZ-reductions in dimension k.

Alg. 2 and Alg. 3 (for blocksize 2k) have by Lemma 2 and 3 the same time bound, both algorithms do HKZ-reductions in dimension 2k. For double blocksize 2k Theorem 7 shows 2 2 n/4k−1/2 2/n 2 2 n/2k−1 2 1. kb1k ≤ γ2k(αγ2k) (det L) , 2. kb1k ≤ (αγ2k) λ1.

These bounds are better than the bounds√ of Theorem 6 unless βk is close to the lower bound βk > k/12 of [23] so that βk/δ ≤ α γ2k. Due to the unknown values βk semi block 2k-reduction can still be competitive.

Theorems 6 and 7 under the GSA-heuristics. We associate the quotients qi := 2 2 n×n ri+1,i+1/ri,i with the GNF R = [ri,j] ∈ R . In practice the qi are all very close for typical reduced bases. For simplicity we assume the geometric series assumption (GSA) of [69]:

GSA q =def q1 = q2 = ··· = qn−1. GSA is a worst-case property. Bases that do not satisfy GSA are easier to reduce, see [69]. The worst case bases of Ajtai [3] also satisfy GSA. We next show that the bounds of Theorem 6 improve under GSA to those of Theorem 7 while the bounds of Theorem 7 are mainly preserved under GSA. 2/n 2 Under GSA Theorem 1 holds with α replaced by 1/q. Note that (det L) /r1,1 = n 1 n−1 1−n (2) n 2 2/n q = q 2 holds under GSA and thus r1,1 = q 2 (det L) . Hence part 1.

20 of Theorem 1 holds with α replaced by 1/q. Part 2. also holds due to the duality argument used in the proof of Theorem 7.

HKZ-bases R` satisfy under GSA 2 k 1 k−1 2 2 2 (2) k 2 kb1k = λ1(L(R`)) ≤ γk det L(R`) k = γkkb1k q = γkkb1k q 2 . 2 k−1 This shows that 1/q ≤ γk holds under GSA. Replacing in Theorem 1 α by 1/q ≤ 2 k−1 γk we get

Corollary 2. Primal-dual bases of blocksize k and n = hk satisfy under GSA the n−1 2 k−1 2 k−1 2/n bounds of Theorem 1 with α replaced by γ , in particular kb1k ≤ γk (det L) , 2 n−1 2 k−1 2 kb1k ≤ γk λ1.

The bounds of Cor. 2 and those of Thm. 7 nearly coincide. Cor. 2 eliminates α n n−1 from Thm. 7 and replaces h = k by the slightly larger value k−1 . Interestingly , the bounds of Cor. 2 coincide for k = 2 with clauses 1, 2 of Theorem 1 for LLL-bases with 4 2 4 δ = 1, α = 3 because γ2 = 3 . 2 2k−1 Similarly, 1/q ≤ γ2k holds under GSA for any HKZ-reduced GNF R = R1,2 ∈ R2k×2k. Hence

Corollary 3. Semi block 2k-reduced bases with n = hk satisfy under GSA the in- 2 n−1 2k−1 2 2k−1 2/n equalities of Theorem 1 with α replaced by γ2k , in particular kb1k ≤ γ2k (det L) , 2 n−1 2 2k−1 2 kb1k ≤ γ2k λ1. Primal-dual bases of blocksize 2k and semi block 2k-reduced bases are by Cor. 2, Cor. 3 nearly equally strong under GSA. This suggests that Alg. 2 for semi block 2k-reduction can be strengthened by working towards GSA (similar to Alg. 4) so that GSA approximately holds for the output basis.

2 −2 Practical bounds for kb1k λ1 .We compare the bounds of Thm. 6 for 2k = 48 to those of Thm. 7 for double blocksize 48. Note that γ24 = 4 [15]. Assuming that the densest known lattice packings P48p,P48q in dimension 48 [16] table 1.3, is nearly maximal we have that γ48 ≈ 6.01. HKZ-reduction in dimension 48 is nearly feasible. Let δ = 0.99.

2 ln 2+1/24 Semi block 2k-reduction for k = 24. Using β24 ≤ 13 Theorem 6 2 2/n n/48−1/2 n/2 proves kb1k /(det L) ≤ γ24(β24/δ) < γ24 1.165 . Moreover 1 n−1 2 2/n 47 2 kb1k /(det L) ≤ γ48 1/47 holds under GSA. This replaces α in Theorem 1 by γ48 < 1.039.

Primal-dual bases of blocksize 48 satisfy by Theorem 7 2 2/n 2 n/48−1 n/2 √ kb1k /(det L) < γ48(αγ48) 2 . 1.075 / α. 2 1/48 This replaces α in Theorem 1 by (αγ48) ≈ 1.084. Moreover,

21 1 n−1 2 2/n 47 2 kb1k /(det L) ≤ γ48 1/47 holds under GSA which replaces α in Theorem 1 by γ48 < 1.039.

While γ48 is relatively large Alg. 2 and 3 perform better when γk is relatively small. This suggests to choose k in practice clearly apart from multiples of 24 since γk is relatively large for k = 0 mod 24.

7 Primal-Dual Random Sampling Reduction

We replace in primal-dual reduction HKZ-reduction by random sampling reduction (RSR) of [69], a method that shortens the first basis vector. RSR extends the deep insertion step of [66] to a highly parallel algorithm. We use local RSR in a way to approximate the GSA-property which has been used in the analysis of [69]. Moreover, global RSR breaks the worst-case bases of [Aj03] against semi-block k-reduction and those of [24] against primal-dual reduction. These worst-case bases B = QR satisfy ri,j = 0 for j ≥ i + k which results in a very short last vector bn. Primal-dual RSR (ALG. 4) runs for k = 80 in expected feasible time under reasonable heuristics. It reduces α in Theorem 1 to less than (80/11)1/80 ≈ 1.025 which so far is the smallest feasible reduction of α proven under heuristics.

m×n Notation.We associate with a basis B = QR = [b1, ...., bn] ∈ R , R = [ri,j]1≤i,j≤n the submatrix Rν,k := [ri,j]ν

RSR of Rν,k. Let Rν,k = [ri,j]ν

Analysis of RSR on Rν,k. Under the assumptions 1 1 0 RA rν+j,ν+j0 ∈R [− 2 , 2 ] is random for j > j,

22 GSA ri+1,i+1/ri,i = qν for all i, ν < i < ν + k, ∗ [S03, Thm 1, Remark 2] shows that RSR of Rν,k and Rν,k succeeds in steps 3, 4 as 1/k long as qν < (11/k) . Primal-dual RSR uses all indices ν = 1, ..., n − k in a uniform way, this helps to approximate the GSA-property.

Theorem 8. [69] RA, GSA. Primal-dual RSR transforms a basis B ∈ Rm×n of L = L(B) such that 2 n−1 2/n 2 n−1 2 1. kb1k ≤ (k/11) 2k (det L) , 2. kb1k ≤ (k/11) k λ1.

Theorem 8 replaces α in Theorem 1 by (k/11)1/k with (80/11)1/80 ≈ 1.025 for k = 80.

Alg. 4. Primal-dual RSR m×n 2 1 INPUT basis B = QR ∈ Z , δ ∈ [η , 1), η = 2 + ε, k OUTPUT reduced basis B. 1. LLL-reduce B with δ and compute the GNF R of B, ` := 0. 2. IF ` = 0 mod bn/kc THEN [ BKZ-reduce B into BT with δ and blocksize 20, compute the new GNF R, ` := ` + 1. ] # this approximates the GSA-property. 3. Primal RSR-step. Randomly select 0 ≤ ν ≤ n − k that nearly maximizes 1/k rν+1,ν+1/| det Rν,k| . Try to decrease rν+1,ν+1 by the factor δ through RSR of Rν,k ⊂ R, i.e., compute some Ta,k in GLk(Z) such that the GNF Reν,k = [rei,j ] of Rν,kTa,k satisfies reν+1,ν+1 ≤ δ rν+1,ν+1. Transform Bν,k := Bν,kTa,k. Recompute Rν,k. 4. Dual RSR-step. Randomly select 0 ≤ ν ≤ n − k that nearly minimizes 1/k rν+k,ν+k/| det Rν,k| . Try to increase rν+k,ν+k by the factor 1/δ through RSR of the ∗ −t ∗ ∗ dual GNF Rν,k = UkRν,kUk, i.e., compute by RSR of Rν,k = [ri,j ] some Ta,k ∈ GLk(Z) ∗ ∗ ∗ ∗ such that the GNF Reν,k = [rei,j ] of Rν,kTa,k satisfies reν+1,ν+1 ≤ δrν+1,ν+1.# Hence −t the GNF Reν,k = [rei,j ] of Rν,kUkTa,k satisfies reν+k,ν+k ≥ rν+k,ν+k/δ. Transform −t Bν,k := Bν,kUkTa,k and recompute Rν,k. 5. Global RSR-step. Try to decrease r1,1 by the factor1/δ through RSR of R, i.e., compute some Ta,n in GLn(Z) such that the GNF Re = [rei,j ] of RTa,n satisfies re1,1 ≤ δ r1,1. Transform B := BTa,n and recompute R. 6. IF either of steps 3,4,5 succeeds, or ` 6= 0 mod bn/kc THEN GOTO 2 ELSE output B and terminate.

Primal-dual RSR time bound. RSR succeeds under RA, GSA in steps 3 and 4 using (k/11)^{k/4+o(k)} arithmetic steps provided that q_ν < (11/k)^{1/k} [69, Thm. 1, ff]. For RA see [57, Fig. 4, 5]. (Randomness of r_{i,i+1} is irrelevant; r_{i,i+1} is in practice nearly random in [−1/2, 1/2] under the condition that r_{i,i}² ≈ r_{i,i+1}² + r_{i+1,i+1}², and this improves by deep insertion.) GSA is a worst-case assumption; in practice GSA is approximately satisfied. Ludwig [51] analyses an approximate version of GSA.
On the average one round of Alg. 4 decreases the integer D^{(1)} := ∏_{i=1}^{n−1} d_i by the factor δ². This bounds the average number of rounds by about (1/2) n² log_{1/δ} M_0 since initially D^{(1)} ≤ M_0^{n(n−1)}. In the worst case, however, D^{(1)} can even increase per round and Alg. 4 need not terminate.
Comparing Alg. 3 and Alg. 4 for k = 80. We assume that γ_80 ≈ 4 · 2^{40.14/40} ≈ 8.02, i.e., that the Mordell-Weil lattice MW_n in dimension n = 80 has near maximal density [16, Table 1.3]; similarly we assume γ_400 ≈ 24.
Alg. 3 would under GSA reduce α in Theorem 1 for k = 80 to γ_80^{1/79} ≈ 1.027, but Alg. 3 does not work towards GSA. Moreover, primal-dual reduction with full HKZ-reduction is infeasible in dimension k = 80, requiring 80^{40+o(1)} steps, whereas RSR is nearly feasible.
Alg. 4 for primal-dual RSR reduces α in Theorem 1 to (80/11)^{1/80} ≈ 1.025 (by Theorem 8) and thus achieves ‖b_1‖/(det L)^{1/n} ≲ 1.025^{(n−1)/4}.
For lattices of high density, λ_1² ≈ γ_n (det L)^{2/n}, and for n = 400, k = 80 this yields ‖b_1‖/λ_1 ≲ 1.025^{99.75}/√γ_400 ≲ 2.4. If this bound further decreases in the average case this might endanger the NTRU schemes for parameters N ≤ 200. The secret key is a sufficiently short lattice vector of a lattice of dimension 2N. Ludwig [51] reports on attacks on NTRU via RSR.
Doing HKZ-reduction via the sieve algorithm of [4] reduces all asymptotic time bounds at the expense of super-polynomial space. [62], [60] improve the AKS-algorithm and its analysis. The experimental comparison in [60] of AKS with the Schnorr-Euchner version [66] of [64] shows that the SE-algorithm with BKZ of blocksize 20 outperforms the improved AKS for dimension n ≤ 50.
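The numerical claim for n = 400, k = 80 is reproduced by a few lines (our own check of the arithmetic above):

```python
import math

n, k = 400, 80
alpha_eff = (k / 11) ** (1 / k)                      # ~1.025 by Theorem 8
ratio = alpha_eff ** ((n - 1) / 4) / math.sqrt(24)   # gamma_400 ~ 24 assumed
print(f"||b_1||/lambda_1 <~ {ratio:.2f}")            # ~2.4
```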

8 Basic Segment LLL

Segment LLL uses an idea of Schönhage [71] to do most of the LLL-reduction locally in segments of low dimension k, using k local coordinates. It guarantees that the determinants of segments do not decrease too fast, see Def. 4. Here we present the basic algorithm SLLL0. Theorem 10 bounds the number of local LLL-reductions within SLLL0. Lemma 4 and Corollary 4 bound the norm of, and the fpa-errors induced by, local LLL-transforms. The algorithm SLLL0 is faster by a factor n in the number of arithmetic steps compared to LLL_H but uses longer integers and fpa numbers, a drawback that will be repaired by SLLL.

Segments and local coordinates. Let the basis B = [b_1, ..., b_n] ∈ Z^{m×n} have dimension n = kh and GNF R ∈ R^{n×n}. We partition B into h segments B_{l,k} = [b_{lk−k+1}, ..., b_{lk}] for l = 1, ..., h. Local LLL-reduction of two consecutive segments B_{l,k}, B_{l+1,k} is done in local coordinates of the principal submatrix R_{l,k} := [r_{lk+i,lk+j}]_{−k<i,j≤k} ∈ R^{2k×2k} of R. We economize the calls, and minimize the number, of local LLL-reductions of the R_{l,k} by means of the local squared determinants of B_{l,k},

D_{l,k} =def ‖q_{lk−k+1}‖² · · · ‖q_{lk}‖².

We have that d_{lk} = ‖q_1‖² · · · ‖q_{lk}‖² = D_{1,k} · · · D_{l,k}. Moreover, we will use

D^{(k)} =def ∏_{l=1}^{h−1} d_{lk} = ∏_{l=1}^{h−1} D_{l,k}^{h−l},   M_{l,k} =def max_{lk−k<i≤j≤lk+k} ‖q_i‖ / ‖q_j‖.

For the input basis B = QR we denote M_1 := max_{1≤i≤j≤n} ‖q_i‖/‖q_j‖. M_{l,k} is the M_1-value of R_{l,k} when calling locLLL(R_{l,k}); obviously M_{l,k} ≤ M_1. Recall that M = max(d_1, ..., d_n, 2^n).
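The quantities D_{l,k}, d_{lk} and M_{l,k} can be read off from the GNF; a small helper of our own (assuming numpy and n = kh) for experiments:

```python
import numpy as np

def segment_data(B, k):
    """Return the local squared determinants D_{1,k}, ..., D_{h,k} and the
    values M_{1,k}, ..., M_{h-1,k} of a basis B with n = k*h columns."""
    _, R = np.linalg.qr(B)
    q = np.abs(np.diag(R))                    # ||q_1||, ..., ||q_n||
    n = len(q)
    h = n // k
    D = [float(np.prod(q[l * k:(l + 1) * k] ** 2)) for l in range(h)]
    M = [max(q[i] / q[j]
             for i in range(l * k - k, l * k + k)
             for j in range(i, l * k + k))
         for l in range(1, h)]                # over the 2k-window of R_{l,k}
    return D, M
```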

Definition 4. A basis b_1, ..., b_n ∈ Z^m, n = kh, is an SLLL0-basis (or SLLL0-reduced) for given k, δ ≥ η², α = 1/(δ − 1/4), if it is size-reduced and
1. δ ‖q_i‖² ≤ μ²_{i+1,i} ‖q_i‖² + ‖q_{i+1}‖² for i ∈ [1, n − 1] \ kZ,
2. D_{l,k} ≤ (α/δ)^{k²} D_{l+1,k} for l = 1, ..., h − 1.

Size-reducedness under fpa is defined by clause 1 of Theorem 5. Segment B_{l,k} of an SLLL0-basis is LLL-reduced in the sense that the k×k-submatrix [r_{lk+i,lk+j}]_{−k<i,j≤0} of R is LLL-reduced.
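Definition 4 translates directly into a test; the following checker is our own restatement for experiments with small bases (size-reduction itself is not verified):

```python
import numpy as np

def is_slll0(B, k, delta):
    """Check clauses 1 and 2 of Definition 4 for a basis B with n = k*h
    columns; alpha = 1/(delta - 1/4)."""
    _, R = np.linalg.qr(B)
    R = R * np.sign(np.diag(R))[:, None]
    n = R.shape[1]
    q2 = np.diag(R) ** 2                                         # ||q_i||^2
    mu = R[np.arange(n - 1), np.arange(1, n)] / np.diag(R)[:-1]  # mu_{i+1,i}
    for i in range(n - 1):                                       # clause 1
        if (i + 1) % k != 0 and delta * q2[i] > mu[i] ** 2 * q2[i] + q2[i + 1]:
            return False
    alpha = 1.0 / (delta - 0.25)
    D = [np.prod(q2[l * k:(l + 1) * k]) for l in range(n // k)]
    return all(D[l] <= (alpha / delta) ** (k * k) * D[l + 1]
               for l in range(n // k - 1))                       # clause 2
```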

Theorem 9 (Thm. 3 of [70]). ‖b_1‖ ≤ (α/δ)^{(n−1)/4} (det L)^{1/n} holds for all SLLL0-bases b_1, ..., b_n.

The dual of Theorem 9. Clause 2 of Def. 4 is preserved under duality. If it holds for a basis b_1, ..., b_n it also holds for the dual basis b_1*, ..., b_n* of the dual lattice L*. We have that ‖b_1*‖ = ‖q_n‖^{−1} and det(L*) = (det L)^{−1}. Hence, Theorem 9 implies that every SLLL0-basis satisfies ‖q_n‖ ≥ (δ/α)^{(n−1)/4} (det L)^{1/n}.

Local LLL-reduction. The procedure locLLL(R_{l,k}) of [70] locally LLL-reduces R_{l,k} ⊂ R given H_{l,k} ⊂ H. Initially it produces a copy [b′_1, ..., b′_{2k}] of R_{l,k}. It LLL-reduces the local basis [b′_1, ..., b′_{2k}] consisting of fpa-vectors. It updates and stores the local transform T_{l,k} ∈ Z^{2k×2k} so that [b′_1, ..., b′_{2k}] = R_{l,k}T_{l,k} always holds for the current local basis [b′_1, ..., b′_{2k}] and the initial R_{l,k}. E.g., it does col(l′, T_{l,k}) := col(l′, T_{l,k}) − μ col(i, T_{l,k}) along with b′_{l′} := b′_{l′} − μ b′_i within TriCol_{l′}. It freshly computes b′_{l′} from the updated T_{l,k}. Using a correct T_{l,k}, this correction of b′_{l′} limits fpa-errors of the local basis, see Cor. 4. Local LLL-reduction of R_{l,k} is done in local coordinates of dimension 2k. A local LLL-swap merely requires O(k²) arithmetic steps, update of R_{l,k}, local triangulation and size-reduction via TriCol_{l′} included, compared to O(nm) arithmetic steps for an LLL-swap in global coordinates.
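The error-limiting device just described, exact integer updates of T combined with refreshing the fpa copy of the local basis from T, can be sketched as follows (lll_step is a hypothetical oracle emitting the next elementary LLL operation; the code is ours, not the literal procedure of [70]):

```python
import numpy as np

def loc_lll_sketch(R_loc, lll_step):
    """LLL-reduce a copy of R_loc (2k x 2k, fpa) while tracking the exact
    integer transform T with local_basis = R_loc @ T at all times."""
    m = R_loc.shape[0]
    T = np.eye(m, dtype=np.int64)
    Bp = R_loc.copy()                          # local fpa basis b'_1 ... b'_2k
    op = lll_step(Bp)
    while op is not None:
        if op[0] == 'size':                    # b'_l := b'_l - mu * b'_i
            _, l, i, mu = op
            T[:, l] -= mu * T[:, i]            # col(l,T) := col(l,T) - mu col(i,T)
            Bp[:, l] = R_loc @ T[:, l].astype(float)   # refresh b'_l from T
        else:                                  # ('swap', l): swap b'_{l-1}, b'_l
            _, l = op
            T[:, [l - 1, l]] = T[:, [l, l - 1]]
            Bp[:, [l - 1, l]] = Bp[:, [l, l - 1]]
        op = lll_step(Bp)
    return T
```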

SLLL0-algorithm. SLLL0 transforms a given basis into an SLLL0-basis. It iterates locLLL(R_{l,k}) for submatrices R_{l,k} ⊂ R, followed by a global update that transports T_{l,k} to B and triangulates B_{l,k}, B_{l+1,k} via TriSeg_{l,k}. Transporting T_{l,k} to B, R, T_{1,n/2} and so on means to multiply the submatrix consisting of the 2k columns of B, R, T_{1,n/2} corresponding to R_{l,k} from the right by T_{l,k}.

SLLL0
INPUT b_1, ..., b_n ∈ Z^m (a basis with M_0, M_1, M), k, h, δ
OUTPUT b_1, ..., b_n SLLL0-basis for k, δ
WHILE ∃ l, 1 ≤ l < h, such that either D_{l,k} > (α/δ)^{k²} D_{l+1,k} or TriSeg_{l,k} has not yet been executed
DO for the minimal such l: TriSeg_{l,k}, locLLL(R_{l,k}),
# global update: [B_{l,k}, B_{l+1,k}] := [B_{l,k}, B_{l+1,k}] T_{l,k}, TriSeg_{l,k}.
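Stated as code, the outer loop of SLLL0 is a search for the minimal violating segment pair; a sketch with hypothetical helpers tri_seg and loc_lll standing in for TriSeg_{l,k} and locLLL(R_{l,k}):

```python
def slll0(B, k, h, delta, tri_seg, loc_lll):
    """Driver loop of SLLL0.  tri_seg(B, l) triangulates and size-reduces
    segments l, l+1 and returns (D_{l,k}, D_{l+1,k}); loc_lll(B, l) locally
    LLL-reduces R_{l,k} and applies the transform to B (global update)."""
    alpha = 1.0 / (delta - 0.25)
    bound = (alpha / delta) ** (k * k)
    done = set()                               # l with a first locLLL done
    while True:
        for l in range(1, h):                  # minimal such l first
            D_l, D_l1 = tri_seg(B, l)
            if l not in done or D_l > bound * D_l1:
                loc_lll(B, l)
                tri_seg(B, l)                  # global update
                done.add(l)
                break
        else:
            return B
```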

The procedure TriSeg_{l,k} triangulates and size-reduces two adjacent segments B_{l,k}, B_{l+1,k}. Given B_{l,k}, B_{l+1,k} and h_1, ..., h_{lk−k}, it computes [r_{lk−k+1}, ..., r_{lk+k}] ⊂ R and [h_{lk−k+1}, ..., h_{lk+k}] ⊂ H.

TriSeg_{l,k}
1. FOR l′ = lk − k + 1, ..., lk + k DO TriCol_{l′} (including updates of T_{l,k}).
2. D_{j,k} := ∏_{i=0}^{k−1} r²_{kj−i,kj−i} for j = l, l + 1.

Correctness in ideal arithmetic. All inequalities D_{l,k} ≤ (α/δ)^{k²} D_{l+1,k} hold upon termination of SLLL0. All segments B_{l,k} are locally LLL-reduced and globally size-reduced, and thus the terminal basis is SLLL0-reduced.

The number of rounds of SLLL0. Let #k denote the number of locLLL(R_{l,k})-executions due to D_{l,k} > (α/δ)^{k²} D_{l+1,k}, for all l. The first locLLL(R_{l,k})-execution for each l is possibly not counted in #k; this yields at most n/k − 1 additional rounds. #k can be bounded by the Lovász volume argument.

Theorem 10 (Thm. 4 of [70]). #k ≤ 2 n k^{−3} log_{1/δ} M.

All intermediate M_{l,k}-values within SLLL0 are bounded by the M_1-value of the input basis of SLLL0. Consider the local transform T_{l,k} ∈ Z^{2k×2k} within locLLL(R_{l,k}). Let ‖T_{l,k}‖_1 denote the maximal ‖·‖_1-norm of the columns of T_{l,k}.

Lemma 4 ([70]). Within locLLL(R_{l,k}) we have that ‖T_{l,k}‖_1 ≤ 6k (3/2)^{2k} M_{l,k}.

Next consider locLLL(R_{l,k}) under fpa based on the iterative fpa-version of TriCol. Let ‖[r_{i,j}]‖_F = (Σ_{i,j} r²_{i,j})^{1/2} denote the Frobenius norm. [70] shows:

Corollary 4 (fpa-Heur.).
1. Within locLLL(R_{l,k}) the current R′_{l,k} := R_{l,k}T_{l,k} and its approximation R̄′_{l,k} satisfy ‖R̄′_{l,k} − R′_{l,k}‖_F ≤ ‖R̄_{l,k} − R_{l,k}‖_F 2^{2k} M_{l,k} + 7n ‖R_{l,k}‖_F 2^{−t}.
2. Let TriSeg_{l,k} and locLLL use fpa with precision 2^t ≥ 2^{10} m ρ^n M_1². If R̄_{l,k} is computed by TriSeg_{l,k} then locLLL(R̄_{l,k}) computes a correct T_{l,k} so that R_{l,k}T_{l,k} is LLL-reduced with δ−.

Theorem 11 (Thm. 5 of [70], using fpa-Heur.). Let k = Θ(√n). Given a basis with M_0, M_1, M, SLLL0 computes under fpa with precision 2^t ≥ 2^{10} m ρ^n M_1² an SLLL0-basis for δ−. It runs in O(nm log_{1/δ} M) arithmetic steps using 2n + log₂(M_0 M_1²)-bit integers.

SLLL0 saves a factor n in the number of arithmetic steps compared to LLL_H but uses longer integers and fpa numbers. SLLL0 runs for M_0 = 2^{O(n)}, and thus for M = 2^{O(n²)}, in O(n³m) arithmetic steps using O(n²)-bit integers. Algorithm SLLL of Section 9 reduces the bit length O(n²) to O(n).

9 Gradual SLLL Using Short fpa-Numbers

SLLL reduces the required precision and the bit length of integers and fpa numbers compared to SLLL0. This results from limiting the norm of local transforms to O(2^n). Theorem 12 shows that SLLL-bases are as strong as LLL-bases. For input bases of length 2^{O(n)} and m = O(n), SLLL performs O(n^{5+o(1)}) bit operations compared to O(n^{6+o(1)}) bit operations for LLL_H, SLLL0 and the LLL-algorithms of [65], [73]. The advantage of SLLL is the use of small integers, which is crucial in practice.
The use of small integers and short intermediate bases within SLLL rests on a gradual LLL-type reduction so that all local LLL-transforms T_{l,2^σ} of R_{l,2^σ} have norm O(2^n). For this we work with segments of all sizes 2^σ and perform LLL-reduction on R_{l,2^σ} with a measured strength, i.e., SLLL-reduction according to Definition 5. If the submatrices R_{2l,2^{σ−1}}, R_{2l+1,2^{σ−1}} ⊂ R_{l,2^σ} are already SLLL-reduced then locLLL(R_{l,2^σ}) performs a transform T_{l,2^σ} bounded as ‖T_{l,2^σ}‖_F = O(2^n). This is the core of fpa-correctness of SLLL.
Comparison with Schönhage's semi-reduction [71]. Semi-reduction also uses segments but proceeds without adjusting LLL-reduction according to Def. 4 and does not satisfy Theorems 9 and 10. While SLLL achieves length defect ‖b_1‖/λ_1 ≤ (4/3 + ε)^{n/2}, semi-reduction achieves merely ‖b_1‖/λ_1 ≤ 2^n. SLLL is practical even for small n; all O-constants and n_0-values are small.
We let n be a power of 2. We set s := ⌈(1/2) log₂ n⌉ so that √n ≤ 2^s < 2√n.

Definition 5. A basis b_1, ..., b_n ∈ R^m is an SLLL-basis (or SLLL-reduced) for δ ≥ 1/2 if it satisfies, for σ = 0, ..., s = ⌈(1/2) log₂ n⌉ and all l, 1 ≤ l < n/2^σ:
D_{l,2^σ} ≤ α^{4^σ} δ^{−n} D_{l+1,2^σ}.

If the inequalities of Def. 5 hold for a basis they also hold for the dual basis. Thus the dual of an SLLL-basis is again an SLLL-basis. To preserve SLLL-reducedness under duality we do not require SLLL-bases to be size-reduced.
The inequalities of Def. 5 for σ = 0 mean that ‖q_l‖² ≤ α δ^{−n} ‖q_{l+1}‖² holds for all l. The inequalities of Def. 5 are merely required for 2^σ ≤ 2√n. Therefore, SLLL locally LLL-reduces R_{l,2^σ} via locLLL(R_{l,2^σ}) merely for segment sizes 2^σ < 2√n, where size-reduction of a vector requires O(2^{2σ}) = O(n) arithmetic steps.
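Definition 5 is again directly checkable; the following test of its inequalities over all levels σ is our own helper for small experiments:

```python
import math
import numpy as np

def is_slll(B, delta):
    """Check D_{l,2^sigma} <= alpha^(4^sigma) * delta^(-n) * D_{l+1,2^sigma}
    for sigma = 0, ..., ceil(log2(n)/2) and all l (Definition 5)."""
    _, R = np.linalg.qr(B)
    q2 = np.diag(R) ** 2
    n = len(q2)
    alpha = 1.0 / (delta - 0.25)
    s = math.ceil(0.5 * math.log2(n))
    for sigma in range(s + 1):
        w = 2 ** sigma
        D = [np.prod(q2[l * w:(l + 1) * w]) for l in range(n // w)]
        c = alpha ** (4 ** sigma) * delta ** (-n)
        if any(D[l] > c * D[l + 1] for l in range(len(D) - 1)):
            return False
    return True
```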

The inequalities of Def. 5 and the inequalities D_{l,k} ≤ (α/δ)^{k²} D_{l+1,k} of Def. 4 coincide for k = 2^σ when setting δ := δ_σ in Def. 4, with δ_σ := δ^{n 4^{−σ}} for the δ of Def. 5. Note that δ_σ can be arbitrarily small, e.g. δ_σ ≪ 1/4; δ_σ decreases with σ. In particular, for 2^σ = k ≥ √n we have that α^{4^σ} δ^{−n} ≤ (α/δ)^{k²}, and thus the inequalities of Def. 5 are stronger than the ones of Def. 4. Thm. 12 shows that the vectors of SLLL-bases approximate the successive minima in nearly the same way as for LLL-bases.
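In logarithms the comparison is elementary: 4^σ ln α − n ln δ ≤ k² (ln α − ln δ) holds for k = 2^σ exactly when k² = 4^σ ≥ n, which a short check of our own confirms:

```python
import math

n, delta = 256, 0.95
alpha = 1.0 / (delta - 0.25)
for sigma in range(int(math.log2(n)) + 1):
    k = 2 ** sigma
    lhs = k * k * math.log(alpha) - n * math.log(delta)   # log of Def. 5 factor
    rhs = k * k * (math.log(alpha) - math.log(delta))     # log of Def. 4 factor
    print(k, k * k >= n, lhs <= rhs)                      # last two columns agree
```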

Theorem 12 (Thm. 6 of [70]). Every size-reduced SLLL-basis satisfies
1. λ_j² ≤ α^{j−1} δ^{−7n} r²_{j,j} for j = 1, ..., n,
2. ‖b_l‖² ≤ α^{j−1} δ^{−7n} r²_{j,j} for l ≤ j,
3. ‖b_j‖² ≤ α^{n−1} δ^{−7n} λ_j² for j = 1, ..., n.

LLLSeg_{l,1}
# Given R_{l,1}, b_1, ..., b_{l+1}, h_1, ..., h_l, r_1, ..., r_l, LLLSeg_{l,1} LLL-reduces R_{l,1}.
1. IF r_{l,l}/r_{l+1,l+1} > 2^{n+1} THEN
[ R′_{l,1} := R_{l,1}, row(2, R′_{l,1}) := row(2, R′_{l,1}) · 2^{−n−1} r_{l,l}/r_{l+1,l+1}, locLLL(R′_{l,1}),
# global update: [b_l, b_{l+1}] := [b_l, b_{l+1}] T_{l,1}, TriCol_l, TriCol_{l+1} ]
2. locLLL(R_{l,1}).

SLLL uses the procedure LLLSeg_{l,1} that breaks locLLL(R_{l,1}) up into parts, each with a bounded transform ‖T_{l,1}‖_1 ≤ 9 · 2^{n+1}. This keeps intermediate bases of length O(4^n M_0) and limits fpa-errors within LLLSeg_{l,1}. LLLSeg_{l,1} LLL-reduces the basis
R_{l,1} = [ r_{l,l}  r_{l,l+1} ; 0  r_{l+1,l+1} ] ⊂ R
after dilating row(2, R_{l,1}) so that r_{l,l}/r_{l+1,l+1} ≤ 2^{n+1}. After the LLL-reduction of the dilated R_{l,1} we undo the dilation by transporting the local transform T_{l,1} ∈ Z^{2×2} to B. LLLSeg_{l,1} includes global updates between local rounds.

LLLSeg_{l,1} performs O(nm) arithmetic steps [70, Lemma 3]. An effectual step 1 decreases D^{(1)} by a factor 2^{−n/2} via a transform T_{l,1} satisfying ‖T_{l,1}‖_1 ≤ 9 · 2^{n+1}.
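The dilation of step 1 is a one-line rescaling; a sketch (our own helper, not the literal procedure of [70]) that caps the diagonal ratio of the 2×2 block before the local LLL call:

```python
import numpy as np

def dilate(R2, n):
    """Step 1 of LLLSeg_{l,1} (sketch): if r_{l,l}/r_{l+1,l+1} > 2^(n+1),
    scale row 2 so that the ratio becomes exactly 2^(n+1).  Only the integer
    transform T_{l,1} of the subsequent local LLL is transported back to B,
    which undoes the dilation."""
    ratio = R2[0, 0] / R2[1, 1]
    if ratio > 2.0 ** (n + 1):
        Rp = R2.copy()
        Rp[1, :] *= ratio * 2.0 ** (-n - 1)   # row(2) dilation
        return Rp
    return R2
```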

SLLL
INPUT b_1, ..., b_n ∈ Z^m (a basis with M_0, M_1, M), δ, α, ε
OUTPUT b_1, ..., b_n size-reduced SLLL-basis for δ, ε
1. TriCol_1, TriCol_2, l′ := 2, s := ⌈(1/2) log₂ n⌉
# TriCol_{l′} has always been executed for the current l′
2. WHILE ∃ σ ≤ s, l, 2^σ(l + 1) ≤ l′ such that D_{l,2^σ} > α^{4^σ} δ^{−n} D_{l+1,2^σ}
# note that r_{1,1}, ..., r_{l′,l′} and thus D_{l,2^σ}, D_{l+1,2^σ} are given
DO for the minimal such σ and the minimal l:
IF σ = 0 THEN LLLSeg_{l,1} ELSE locLLL(R_{l,2^σ})
# global update: transport T_{l,2^σ} to B, TriSeg_{l,2^σ}
3. IF l′ < n THEN [ l′ := l′ + 1, TriCol_{l′}, GOTO 2 ].

Correctness in ideal arithmetic. All inequalities D_{l,2^σ} ≤ α^{4^σ} δ^{−n} D_{l+1,2^σ} hold upon termination of SLLL. As TriSeg_{l,2^σ} results in size-reduced segments B_{l,2^σ}, B_{l+1,2^σ}, the terminal basis is size-reduced.

Theorem 13 (Thm. 7 of [70], using fpa-Heur.). Given a basis with M_0, M, SLLL finds under fpa of precision t = 3n + O(log m) an SLLL-basis for δ−. It runs in O(nm log₂ n log_{1/δ} M) arithmetic steps using integers of bit length 2n + log₂ M_0.

For M_0 = 2^{O(n)} and m = O(n), SLLL runs in O(n⁴ log n) arithmetic steps, and in O(n^{6+ε}) / O(n^{5+ε}) bit operations under school- / FFT-multiplication.
SLLL-bases versus LLL-bases. LLL-bases with δ satisfy the inequalities of Theorem 12 with δ replaced by 1. Thus ‖b_j‖ approximates λ_j to within a factor α^{(n−1)/2} for LLL-bases, resp. within a factor (α/δ⁷)^{(n−1)/2} for SLLL-bases. But SLLL-bases for δ′ = δ^{1/8} are "better" than LLL-bases for δ, in the sense that they guarantee a smaller length defect, because
α′/δ′⁷ = 1/(δ′⁸ − δ′⁷/4) = 1/(δ − δ′⁷/4) < 1/(δ − 1/4) = α.
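The final inequality is easy to verify numerically, e.g. for δ = 0.99 (a plain check of the formula above, nothing else assumed):

```python
delta = 0.99
dp = delta ** (1 / 8)                         # delta' = delta^(1/8)
lll_factor = 1 / (delta - 0.25)               # alpha
slll_factor = 1 / (dp - 0.25) / dp ** 7       # alpha'/delta'^7
print(slll_factor, "<", lll_factor, slll_factor < lll_factor)   # True
```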

Dependence of time bounds on δ. The time bounds contain a factor log_{1/δ} 2, and
log_{1/δ} 2 = log₂(e)/ln(1/δ) ≤ log₂(e) · δ/(1 − δ),
since ln(1/δ) ≥ 1/δ − 1. We see that replacing δ by √δ essentially halves 1 − δ and doubles the SLLL-time bound. Hence, replacing δ by δ^{1/8} increases the SLLL-time bound at most by a factor 2³ = 8. In practice, the LLL-time may increase more slowly than by the factor δ/(1 − δ) as δ approaches 1, see [41, Fig. 3].
Reducing a generator system. There is an algorithm that, given a generator matrix B ∈ Z^{m×n} of arbitrary rank ≤ n, transforms B, with the performance of SLLL, into an SLLL-basis for δ− of the lattice generated by the columns of B.
SLLL-reduction via iterated subsegments. SLLL+ of [70] is a variant of SLLL that extends LLL-operations stepwise to larger and larger submatrices R_{l,2^σ} ⊂ R by transporting local transforms from level σ − 1 to level σ recursively for σ = 1, ..., s = log₂ n. Local LLL-reduction and the transport of local LLL-transforms is done by a local procedure locSLLL(R_{l,2^σ}) that recursively executes locSLLL(R_{l′,2^{σ−1}}) for l′ = 2l − 1, 2l, 2l + 1. SLLL+ does not iterate the global procedure TriSeg but a faster local one.

Definition 6. A basis b_1, ..., b_n ∈ Z^m with n = 2^s is an SLLL+-basis (or SLLL+-reduced) for δ if it satisfies, for σ = 0, ..., s = log₂ n,
D_{l,2^σ} ≤ (α/δ)^{4^σ} D_{l+1,2^σ} for odd l ∈ [1, n/2^σ[.   (7)
Unlike in Def. 4 and Def. 5, the inequalities (7) are not required for even l; this opens new efficiencies for SLLL+-reduction. The inequalities (7) hold for each σ and odd l locally in double segments [B_{l,2^σ}, B_{l+1,2^σ}]; they do not bridge these pairwise disjoint double segments. For σ = 0 the inequalities (7) mean that ‖q_l‖² ≤ (α/δ) ‖q_{l+1}‖² holds for odd l.
The inequalities (7) are preserved under duality. If b_1, ..., b_n is an SLLL+-basis then so is the dual basis b_1*, ..., b_n*. Theorem 14 extends Theorem 9 and shows that the first vector of an SLLL+-basis is almost as short relative to (det L)^{1/n} as for LLL-bases.

Theorem 14 (Thm. 8 of [70]). Every SLLL+-basis b_1, ..., b_n, where n is a power of 2, satisfies
1. ‖b_1‖ ≤ (α/δ)^{(n−1)/4} (det L)^{1/n},
2. ‖q_n‖ ≥ (δ/α)^{(n−1)/4} (det L)^{1/n}.

Theorem 15 (Thm. 9 of [70]). In ideal arithmetic, algorithm SLLL+ of [70] computes a size-reduced SLLL+-basis for δ and runs in O(n²m + n log₂ n log_{1/δ} M) arithmetic steps.

SLLL+ requires t = O(log(M_0M_1)) = O(n log M_0) precision bits to cover the fpa-errors that get accumulated by the initial TriSeg and by iterating locTri. For M_0 = 2^{O(n)} and m = O(n), SLLL+ saves a factor n in the number of arithmetic steps compared to SLLL but requires n-times longer fpa-numbers.

Acknowledgement. I would like to thank P. Nguyen and D. Stehlé for useful comments.

References

1. E. Agrell, T. Eriksson, A. Vardy and K. Zeger, Closest Point Search in Lattices. IEEE Trans. on Inform. Theory, 48(8), pp. 2201–2214, 2002.
2. M. Ajtai, The Shortest Vector Problem in L2 is NP-hard for Randomized Reductions. In Proc. 30th STOC, ACM, pp. 10–19, 1998.
3. M. Ajtai, The Worst-case Behavior of Schnorr's Algorithm Approximating the Shortest Nonzero Vector in a Lattice. In Proc. 35th STOC, ACM, pp. 396–406, 2003.
4. M. Ajtai, R. Kumar and D. Sivakumar, A Sieve Algorithm for the Shortest Lattice Vector Problem. In Proc. 33rd STOC, ACM, pp. 601–610, 2001.
5. A. Akhavi, Worst Case Complexity of the Optimal LLL. In Proc. of LATIN 2000, LNCS 1776, Springer-Verlag, Berlin New York, 2000.
6. A. Akhavi and D. Stehlé, Speeding-up Lattice Reduction with Random Projections. In Proc. of LATIN 2008, to be published in LNCS by Springer-Verlag, Berlin New York, 2008.

7. W. Backes and S. Wetzel, Heuristics on Lattice Reduction in Practice. Journal of Experimental Algorithmics, ACM, 7(1), 2002.
8. G. Bergman, Notes on Ferguson and Forcade's Generalized Euclidean Algorithm. TR, Dep. of Mathematics, University of Berkeley, CA, 1980.
9. D. Bleichenbacher and A. May, New Attacks on RSA with Small Secret CRT-Exponents. In Proc. PKC 2006, LNCS 3958, Springer-Verlag, Berlin New York, pp. 1–13, 2006.
10. J. Blömer and A. May, New Partial Key Exposure Attacks on RSA. In Proc. Crypto 2003, LNCS 2729, Springer-Verlag, Berlin New York, pp. 27–43, 2003.
11. J. Blömer and J.P. Seifert, On the Complexity of Computing Short Linearly Independent Vectors and Short Bases in a Lattice. In Proc. 31st STOC, ACM, pp. 711–720, 1999.
12. J. Cai, The Complexity of some Lattice Problems. In Proc. Algorithmic Number Theory, LNCS 1838, Springer-Verlag, Berlin New York, pp. 1–32, 2000.
13. J.W.S. Cassels, Rational Quadratic Forms. London Mathematical Society Monographs 13, Academic Press Inc., London, 1978.
14. H. Cohen, A Course in Computational Number Theory, second edition, Springer-Verlag, Berlin New York, 2001.
15. H. Cohn and A. Kumar, Optimality and Uniqueness of the Leech Lattice Among Lattices. arXiv:math.MG/0403263v1, 16 Mar 2004.
16. J.H. Conway and N.J.A. Sloane, Sphere Packings, Lattices and Groups. Third edition, Springer-Verlag, Berlin New York, 1998.
17. D. Coppersmith, Small Solutions to Polynomial Equations, and Low Exponent RSA Vulnerabilities. J. Cryptology, 10, pp. 233–260, 1997.
18. D. Coppersmith, Finding Small Solutions to Small Degree Polynomials. In Proc. CaLC 2001, LNCS 2146, Springer-Verlag, Berlin New York, pp. 20–31, 2001.
19. J. Demmel and Y. Hida, Accurate floating point summation. TR, University of Berkeley, 2002, http://www.cs.berkeley.edu/~demmel/AccurateSummation.ps.
20. P. van Emde Boas, Another NP-Complete Partition Problem and the Complexity of Computing Short Vectors in a Lattice. Mathematics Department, University of Amsterdam, TR 81-04, 1981.
21. U. Fincke and M. Pohst, A Procedure for Determining Algebraic Integers of a Given Norm. In Proc. EUROCAL, LNCS 162, Springer-Verlag, Berlin New York, pp. 194–202, 1983.
22. C.F. Gauss, Disquisitiones Arithmeticae. 1801; English transl., Yale Univ. Press, New Haven, Conn., 1966.
23. N. Gama, N. Howgrave-Graham, H. Koy and P. Nguyen, Rankin's Constant and Blockwise Lattice Reduction. In Proc. CRYPTO 2006, LNCS 4117, Springer-Verlag, Berlin New York, pp. 112–139, 2006.
24. N. Gama and P.Q. Nguyen, Finding short lattice vectors within Mordell's inequality. In Proc. of the 2008 ACM Symposium on Theory of Computing, pp. 207–216, 2008.
25. O. Goldreich and S. Goldwasser, On the Limits of Nonapproximability of Lattice Problems. J. of Comp. and System Sciences, 60(3), pp. 540–563, 2000. Preliminary version in STOC '98.
26. G. Golub and C. van Loan, Matrix Computations. Johns Hopkins University Press, 1996.
27. M. Grötschel, L. Lovász and A. Schrijver, Geometric Algorithms and Combinatorial Optimization. Springer-Verlag, Berlin New York, 1988.
28. G. Hanrot and D. Stehlé, Improved Analysis of Kannan's Shortest Lattice Vector Algorithm. In Proc. CRYPTO 2007, LNCS 4622, Springer-Verlag, Berlin New York, pp. 170–186, 2007.
29. R.J. Hartung and C.P. Schnorr, Public Key Identification Based on the Equivalence of Quadratic Forms. In Proc. of Math. Found. of Comp. Sci., Aug. 26–Aug. 31, Český Krumlov, Czech Republic, LNCS 4708, Springer-Verlag, Berlin New York, pp. 333–345, 2007.
30. R.J. Hartung and C.P. Schnorr, Identification and Signatures Based on NP-hard Problems of Indefinite Quadratic Forms. J. of Math. Crypt., 2, pp. 327–341, 2008. Preprint University Frankfurt, 2007, //www.mi.informatik.uni-frankfurt.de/research/papers.html.
31. J. Håstad, B. Just, J.C. Lagarias and C.P. Schnorr, Polynomial Time Algorithms for Finding Integer Relations Among Real Numbers. SIAM Journal on Computing, 18, pp. 859–881, 1989.

32. B. Helfrich, Algorithms to Construct Minkowski Reduced and Hermite Reduced Bases. Theor. Comput. Sci., 41, pp. 125–139, 1985.
33. C. Hermite, Extraits de lettres de M. Ch. Hermite à M. Jacobi sur différents objets de la théorie des nombres, deuxième lettre. J. Reine Angewandte Mathematik, 40, pp. 279–290, 1850.
34. N.J. Higham, Accuracy and Stability of Numerical Algorithms. SIAM, Philadelphia, 2nd edition, 2002.
35. R. Kannan, Minkowski's Convex Body Theorem and Integer Programming. Math. Oper. Res., 12, pp. 415–440, 1987.
36. S. Khot, Hardness of Approximating the Shortest Vector Problem in Lattices. In Proc. FOCS, 2004.
37. D.E. Knuth, The Art of Computer Programming, Vol. 2, Addison-Wesley, Boston, third edition, 1997.
38. A. Korkine and G. Zolotareff, Sur les Formes Quadratiques. Mathematische Annalen, 6, pp. 366–389, 1873.
39. H. Koy, Primal/duale Segment-Reduktion von Gitterbasen. Lecture, Universität Frankfurt, 2000, files from May 2004. //www.mi.informatik.uni-frankfurt.de/research/papers.html
40. H. Koy and C.P. Schnorr, Segment LLL-Reduction. In Proc. CaLC 2001, LNCS 2146, Springer-Verlag, Berlin New York, pp. 67–80, 2001. //www.mi.informatik.uni-frankfurt.de/research/papers.html
41. H. Koy and C.P. Schnorr, Segment LLL-Reduction with Floating Point Orthogonalization. In Proc. CaLC 2001, LNCS 2146, Springer-Verlag, Berlin New York, pp. 81–96, 2001. //www.mi.informatik.uni-frankfurt.de/research/papers.html
42. H. Koy and C.P. Schnorr, Segment and Strong Segment LLL-Reduction of Lattice Bases. TR, Universität Frankfurt, April 2002, //www.mi.informatik.uni-frankfurt.de/research/papers.html
43. J.L. Lagrange, Recherches d'arithmétique. Nouveaux Mémoires de l'Académie de Berlin, 1773.
44. C.L. Lawson and R.J. Hanson, Solving Least Squares Problems. SIAM, Philadelphia, 1995.
45. H.W. Lenstra, Jr., Integer Programming With a Fixed Number of Variables. Math. Oper. Res., 8, pp. 538–548, 1983.
46. LIDIA, a C++ Library for computational number theory. The LIDIA Group, http://www.informatik.tu-darmstadt.de/TI/LiDIA/
47. A.K. Lenstra, H.W. Lenstra, Jr. and L. Lovász, Factoring Polynomials with Rational Coefficients. Math. Ann., 261, pp. 515–534, 1982.
48. J.C. Lagarias, H.W. Lenstra, Jr. and C.P. Schnorr, Korkin-Zolotarev Bases and Successive Minima of a Lattice and its Reciprocal Lattice. Combinatorica, 10, pp. 333–348, 1990.
49. L. Lovász, An Algorithmic Theory of Numbers, Graphs and Convexity. CBMS-NSF Regional Conference Series in Applied Mathematics, 50, SIAM Publications, Philadelphia, 1986.
50. L. Lovász and H. Scarf, The Generalized Basis Reduction Algorithm. Math. of Oper. Research, 17(3), pp. 754–764, 1992.
51. C. Ludwig, Practical Lattice Basis Reduction. Dissertation, TU-Darmstadt, December 2005, http://elib.tu-darmstadt.de/diss/000640 and http://deposit.ddb.de/cgi-bin/dokserv?idn=978166493.
52. J. Martinet, Perfect Lattices in Euclidean Spaces. Springer-Verlag, Berlin New York, 2002.
53. A. May, Computing the RSA Secret Key is Deterministic Polynomial Time Equivalent to Factoring. In Proc. CRYPTO 2004, LNCS 3152, Springer-Verlag, Berlin New York, pp. 213–219, 2004.
54. D. Micciancio and S. Goldwasser, Complexity of Lattice Problems, A Cryptographic Perspective. Kluwer Academic Publishers, London, 2002.
55. V. Shoup, Number Theory Library. http://www.shoup.net/ntl/
56. P.Q. Nguyen and D. Stehlé, Floating-Point LLL Revisited. In Proc. Eurocrypt '05, LNCS 3494, Springer-Verlag, Berlin New York, pp. 215–233, 2005.
57. P.Q. Nguyen and D. Stehlé, LLL on the Average. In Proc. ANTS-VII, LNCS 4076, Springer-Verlag, Berlin New York, pp. 238–256, 2006.

58. P.Q. Nguyen and J. Stern, Lattice Reduction in Cryptology, An Update. Algorithmic Number Theory, LNCS 1838, Springer-Verlag, Berlin New York, pp. 85–112, 2000. Full version: http://www.di.ens.fr/~pnguyen,stern/
59. P.Q. Nguyen and J. Stern, The Two Faces of Lattices in Cryptology. In Proc. CaLC '01, LNCS 2139, Springer-Verlag, Berlin New York, pp. 260–274, 2001.
60. P.Q. Nguyen and T. Vidick, Sieve Algorithms for the Shortest Vector Problem are Practical. Preprint, 2007.
61. A.M. Odlyzko, The Rise and the Fall of Knapsack Cryptosystems. In Proc. of Cryptology and Computational Number Theory, vol. 42 of Proc. of Symposia in Applied Mathematics, pp. 75–88, 1989.
62. O. Regev, Lecture Notes of Lattices in Computer Science, taught at the Computer Science Department, Tel Aviv University, 2004; available at http://www.cs.tau.ac.il/~odedr.
63. C. Rössner and C.P. Schnorr, An Optimal, Stable Continued Fraction Algorithm for Arbitrary Dimension. In Proc. 5th IPCO 1996, LNCS 1084, Springer, Berlin New York, pp. 31–43, 1996.
64. C.P. Schnorr, A Hierarchy of Polynomial Time Lattice Basis Reduction Algorithms. Theoret. Comput. Sci., 53, pp. 201–224, 1987.
65. C.P. Schnorr, A More Efficient Algorithm for Lattice Reduction. J. of Algor., 9, pp. 47–62, 1988.
66. C.P. Schnorr and M. Euchner, Lattice Basis Reduction and Solving Subset Sum Problems. In Proc. Fundamentals of Comput. Theory, LNCS 591, Springer-Verlag, Berlin New York, pp. 68–85, 1991. Complete paper in Math. Programming Studies, 66A(2), pp. 181–199, 1994.
67. C.P. Schnorr, Block Reduced Lattice Bases and Successive Minima. Combin. Probab. and Comput., 3, pp. 507–522, 1994.
68. C.P. Schnorr and H.H. Hörner, Attacking the Chor-Rivest cryptosystem by improved lattice reduction. In Proc. Eurocrypt 1995, LNCS 921, Springer-Verlag, Berlin New York, pp. 1–12, 1995.
69. C.P. Schnorr, Lattice Reduction by Random Sampling and Birthday Methods. In Proc. STACS 2003, Eds. H. Alt and M. Habib, LNCS 2607, Springer-Verlag, Berlin New York, pp. 145–156, 2003.
70. C.P. Schnorr, Fast LLL-type lattice reduction. Information and Computation, 204, pp. 1–25, 2006. //www.mi.informatik.uni-frankfurt.de
71. A. Schönhage, Factorization of Univariate Integer Polynomials by Diophantine Approximation and Improved Lattice Basis Reduction Algorithm. In Proc. ICALP 1984, LNCS 172, Springer-Verlag, Berlin New York, pp. 436–447, 1984.
72. D. Simon, Solving Quadratic Equations Using Reduced Unimodular Quadratic Forms. Math. of Comp., 74(251), pp. 1531–1543, 2005.
73. A. Storjohann, Faster Algorithms for Integer Lattice Basis Reduction. TR 249, Swiss Federal Institute of Technology, ETH-Zurich, July 1996. //www.inf.ethz.ch/research/publications/html
74. D. Stehlé, Floating Point LLL: Theoretical and Practical Aspects. This proceedings.
75. G. Villard, Certification of the QR Factor R, and of Lattice Basis Reducedness. LIP Research Report RR2007-03, ENS de Lyon, 2007.
76. J.H. Wilkinson, The Algebraic Eigenvalue Problem. Oxford University Press, 1965, reprinted 1999.
