Fast Fourier Orthogonalization

Fast Fourier Orthogonalization ∗ y Léo Ducas Thomas Prest CWI, Cryptology Group Thales Communications & Security Amsterdam, The Netherlands Gennevilliers, France [email protected] [email protected] ABSTRACT when the matrix is circulant. This is achieved by exploiting the isomorphism between the ring of d × d circulant matrices The classical fast Fourier transform (FFT) allows to compute d in quasi-linear time the product of two polynomials, in the and the circular convolution ring Rd = R[x]=(x − 1). d A widely studied and more involved question is matrix circular convolution ring R[x]=(x − 1) | a task that naively requires quadratic time. Equivalently, it allows to accelerate decomposition [20] for structured matrices, in particular matrix-vector products when the matrix is circulant. Gram-Schmidt orthgonalization (GSO). In this work, we are In this work, we discover that the ideas of the FFT can be interested in the GSO of circulant matrices, and more gener- applied to speed up the orthogonalization process of matrices ally of matrices with circulant blocks. Our main motivation with circulant blocks of size d × d. We show that, when d is is to accelerate lattice algorithms for lattices that admit a composite, it is possible to proceed to the orthogonalization basis with circulant blocks. This use case allows a helpful in an inductive way |up to an appropriate re-indexation of extra degree of freedom: one may permute rows and columns rows and columns. This leads to a structured Gram-Schmidt of the lattice basis since this leaves the generated lattice decomposition. In turn, this structured Gram-Schmidt de- unchanged |up to an isometry. composition accelerates a cornerstone lattice algorithm: the As we will show, a proper re-indexation of these matri- nearest plane algorithm. The complexity of both algorithms ces highlights an inductive structure, with a fast Fourier may be brought down to Θ(d log d). flavor. This leads to accelerations of the orthogonalization Our results easily extend to cyclotomic rings, and can be process |and of the related nearest plane algorithm| down adapted to Gaussian samplers. This finds applications in to quasilinear time and space. lattice-based cryptography, improving the performances of trapdoor functions. The Nearest Plane Algorithm, Lattices and Cryptography Keywords. Fast Fourier transform, Gram-Schmidt orthogonalization, nearest plane algorithm, lattice algorithms, The nearest plane algorithm [1] is a central algorithm over lattice trapdoor functions. lattices. It allows, after precomputation of the GSO and using a quadratic number of real operations, to find a rel- atively close point in a lattice to an arbitrary target. It is 1. INTRODUCTION a core subroutine of LLL [13], and can be used for error When operations involving linear algebra are to be per- correction over analogical noisy channels. It has also found formed over structured matrices, a natural problem is to applications in lattice-based cryptography as a decryption al- accelerate them by exploiting the structure. The most classi- gorithm, and a randomized variant (called discrete Gaussian cal example is the fast Fourier transform [2, 8], which allows sampling) [11, 5] provides secure trapdoor functions based to perform matrix-vector multiplication in quasilinear time on lattice problems. This leads to cryptosystems (attribute- based encryption) with fine-grained access control, as [18, 6] ∗ This work has been supported by a grant from CWI from budget to name a few. d for public-private-partnerships and in part by a grant from NXP Given a basis B of a lattice L ⊂ R and a target vector c, Semiconductors. Part of this work was also supported by an NWO the nearest plane algorithm finds a lattice point somewhat Free Competition Grant. close to c. The result belongs to a fundamental domain y ´ Part of this work was done when the author was at the Ecole centered in c, whose shape is the cuboid defined by B~ , the Normale Supérieure, Paris. This work was partially funded by the 2 European Union H2020 SAFEcrypto project (grant no. 644729). GSO of B (see Figure 1). This algorithm performs Θ(d ) real operations. The GSO itself is required as a precomputation. Structured lattices in cryptography. When it comes to practical lattice-based cryptography, a quadratic cost in the dimension is rather prohibitive consid- ering the lattices at hand have dimensions ranging in the hundreds, or even thousands. For efficiency purposes, many cryptosystems (such as [10, 15, 16] to name a few) chose to Full version of the article to appear at the 41st International Symposium rely on lattices with some algebraic structure, improving time on Symbolic and Algebraic Computation (ISSAC 2016). and memory requirements to quasilinear in the dimension. This is sometimes referred as lattice-based cryptography in must follow the tower of rings R ⊂ Rd1 ⊂ ··· ⊂ Rdi−1 ⊂ Rd, the ring setting. Technically, the chosen rings typically are for some chain of divisors 1jd1j ::: jdi−1jd. cyclotomic rings, but those are closely related to the convo- Such a representation is obtained by applying the (mixed- lution rings discussed so far. The core of this optimization radix) digit-reversal order to the indexation of the rows and is the fast Fourier transform (FFT) [2, 4, 8, 19] allowing column of the circulant blocks, as pictured in Figure 2. fast multiplication of polynomials. But this improvement We show that this alternative indexation allows to repre- did not apply in the case of the nearest plane algorithm or sent the matrix L of the GSO in a factorized form: a product its randomized variant [11, 5]: na¨ıve GSO do not take the of Θ(log d) (sparse) structured matrices, each of which can algebraic structure into account. be stored in space O(d). An example is given in Figure 3. One possible work-around [9, 22] consist of using the round- Once this hidden structure is unveiled (Theorem 1), the al- off algorithm [1] instead of the nearest-plane algorithm. How- gorithmic implications follow quite naturally: one first easily ever, this simpler algorithm outputs further vectors, both in derive an algorithm in time O(n log2 n) |matching previous the average and worst cases, weakening those cryptosystems. and more general results [21]| but noting that the splitting step may be performed directly in the Fourier domain allows us to save another log n factor. For easier algorithmic manipulations, the factorization of L is represented using a c c tree. 2 3 2 0 1 2 3 4 5 6 7 3 0 2 4 6 1 3 5 7 7 0 1 2 3 4 5 6 6 6 0 2 4 7 1 3 5 7 Round-off Nearest plane 6 7 6 7 6 6 7 0 1 2 3 4 5 7 6 4 6 0 2 5 7 1 3 7 6 7 6 2 4 6 0 3 5 7 1 7 Figure 1: Round-off and nearest plane algorithms, 6 5 6 7 0 1 2 3 4 7 6 7 6 7 ) 6 7 and their associated fundamental domains. 6 4 5 6 7 0 1 2 3 7 6 7 1 3 5 0 2 4 6 7 6 7 6 7 6 3 4 5 6 7 0 1 2 7 6 5 7 1 3 6 0 2 4 7 6 7 6 7 Our contribution 4 2 3 4 5 6 7 0 1 5 4 3 5 7 1 4 6 0 2 5 1 2 3 4 5 6 7 0 1 3 5 7 2 4 6 0 In this work, we discover new algorithms, obtained by cross- C(a) C(M (a)) ing Cooley-Tukey's [2] fast Fourier transform algorithm to- 8=4 gether with the orthogonalization and nearest plane algo- 2 0 4 2 6 1 5 3 7 3 ? rithms (not exactly the GSO, but the closely related LDL de- 6 4 0 6 2 5 1 7 3 7 6 7 composition). Precisely, we show that, up to a re-indexation 6 6 2 0 4 7 3 1 5 7 6 7 of rows and columns, the orthogonalization of matrices com- 6 2 6 4 0 3 7 5 1 7 posed of d × d-circulant blocks can be done in time Θ(d log d) ) 6 7 6 7 3 1 5 0 4 2 6 7 6 7 when the prime factors of d are bounded. Our algorithm pro- 6 3 7 5 1 4 0 6 2 7 ? 6 7 duces the LDL decomposition in a special compact format, 6 7 requiring Θ(d log d) complex numbers to represent. 4 5 1 7 3 6 2 0 4 5 From this compact representation, the nearest plane algo- 1 5 3 7 2 6 4 0 1 rithm can also be performed using Θ(d log d) real operations. C(M8=2(a)) = M8=1(a) As a demonstration of the simplicity of our algorithms, we propose an implementation in python for d × d-circulant Figure 2: Re-indexing the transformation matrix of 2 7 matrices, when d is a power of 2. f 2 R8 7! fx (a = 0 + x + 2x + ··· + 7x 2 R8). Computational model and Parallelism. Our algorithms and their complexity are described for 2 1 3 2 1 3 Random Access Machines with unitary arithmetic operations 6 1 7 6 1 7 on real numbers, i.e. we do not study the issue related to 6 7 6 7 6 1 7 6 e f 1 7 floating-point approximations and numerical stability. 6 1 7 6 f e 1 7 ? L = 6 7 · 6 7 Regarding parallelism, we note that our Fast-Fourier LDL 6 a c 1 7 6 1 7 6 b d 7 6 7 algorithm is almost fully parallelizable: viewed as an arith- 6 b a d c 1 7 6 1 7 6 7 6 7 metic circuit on real numbers, it has depth O(log(n)) and 4 d c a b 1 5 4 g h 1 5 width O(n).

Fast Fourier Orthogonalization

Algorithm Construction of Hli Hash Function

SWIFFT: a Modest Proposal for FFT Hashing*

Truly Fast NTRU Using NTT∗

Fundamental Data Structures Contents

Post-Quantum Authenticated Key Exchange from Ideal Lattices

Automatic Library Generation and Performance Tuning for Modular Polynomial Multiplication a Thesis Submitted to the Faculty of D

Efficient Lattice-Based Inner-Product Functional Encryption

SWIFFTX: a Proposal for the SHA-3 Standard

Lattice-Based Cryptography

The Hidden Subgroup Problem

SWIFFT: a Modest Proposal for FFT Hashing*

Fast Fourier Orthogonalization