Towards Efficient Arithmetic for Lattice-Based Cryptography On

Towards Efficient Arithmetic for Lattice-Based Cryptography on Reconfigurable Hardware Thomas Pöppelmann and Tim Güneysu∗ Horst GörtzInstitute for IT-Security, Ruhr-University Bochum, Germany Abstract. In recent years lattice-based cryptography has emerged as quantum secure and theoretically elegant alternative to classical cryp- tographic schemes (like ECC or RSA). In addition to that, lattices are a versatile tool and play an important role in the development of efficient fully or somewhat homomorphic encryption (SHE/FHE) schemes. n In practice, ideal lattices defined in the polynomial ring Zp[x]=hx + 1i allow the reduction of the generally very large key sizes of lattice constructions. Another advantage of ideal lattices is that polynomial multiplication is a basic operation that has, in theory, only quasi-linear time n complexity of O(n log n) in Zp[x]=hx + 1i. However, few is known about the practical performance of the FFT in this specific application domain and whether it is really an alternative. In this work we make a first step towards efficient FFT-based arithmetic for lattice-based cryptography and show that the FFT can be implemented efficiently on reconfigurable hardware. We give instantiations of recently proposed parameter sets for homomorphic and public-key encryption. In a generic setting we are able to multiply polynomials with up to 4096 coefficients and a 17-bit prime in less than 0.5 milliseconds. For a parameter set of a SHE scheme (n=1024,p=1061093377) our implementation performs 9063 polynomial multiplications per second on a mid-range Spartan-6. Keywords: Lattice-Based Cryptography, Ideal Lattices, FFT, NTT, FPGA Implementation 1 Introduction Currently used cryptosystems rely on similar number theoretical problems such as the hardness of factoring (RSA) or the elliptic curve discrete logarithm problem (ECC). Unfortunately, these do not hold any more in case it becomes feasible to build a large enough quantum computer [52]. Even without quantum computers it may occur that improvements of cryptanalytic algorithms heavily impact the security of the currently used schemes or raise their required security parameters beyond practicability. Therefore, it is necessary to investigate practical and efficient alternative cryptosystems on relevant architectures (hardware/software) that can withstand quantum attacks [12]. ∗ This work was partially supported by European Commission through the ICT pro- gramme under contract ICT-2007-216676 ECRYPT II. Promising candidates to satisfy this need are lattice-based cryptosystems. They rely on well-studied problems for which no efficient quantum algorithms are known and also enable elegant security proofs and worst-case to average-case reductions. However, for long lattice-based cryptography has been considered secure only for large parameter sets beyond practicability (e.g., a key size around one megabyte [42]) and required complex operations like multiplication of large matrices. This has changed since the introduction of cyclic [41] and ideal lattices [35] which enable the construction of a great variety of theoretically sound and also efficient cryptosystems. Besides being considered as a replacement for classical schemes, (ideal) lattices are also employed for new primitives like fully or somewhat homomorphic encryption (SHE/FHE) that have the potential to secure cloud computing. The first fully homomorphic encryption system has been proposed by Gentry in 2009 [24] and with his work, research in SHE/FHE gathered momentum due to the great variety of applications and new ideas. However, efficiency is still far from being practical and a lot of research focuses on performance improvements for SHE/FHE [25, 11, 43]. When considering the runtime performance of proposed ideal lattice-based cryptosystems, the most common and also most expensive operation is poly- n nomial multiplication in Zp[x]=hx + 1i. This is also one reason for the often claimed asymptotic speed advantage of lattice-based schemes over number theoretical constructions, as polynomial multiplication is equivalent to the negacyclic or negative-wrapped convolution product which can be efficiently computed in quasi-linear time O(n log n)1 using the Fast Fourier Transform (FFT). However, this notation omits constants and terms of lower order and in practice it is not al- ways clear if simpler algorithms like schoolbook multiplication or Karatsuba [32] perform better on a specific platform. In case of software implementations it is usually easy to determine which polynomial multiplication algorithm yields the best performance for a target architecture by running a few experiments with the desired parameter sets. Es- pecially, as the simple algorithms are easy to implement and the more evolved ones are already available and heavily optimized in the NTL [53] library of Vic- tor Shoup. However, for hardware this is different as only very few results are available [29, 26] which consume lots of FPGA resources and do not target larger parameter sets needed for SHE/FHE. Some examples of previous implementations of polynomial multiplication not designed for lattice-based cryptography are [56, 14, 7] but they mostly deal with Galois Fields GF (2n). Other implementations are only optimized for the com- putation of approximate results due to the complex floating-point arithmetic. This lack of reference implementations may prevent the usage of lattice-based cryptography in hardware. We therefore present a versatile implementation of arithmetic for lattice-based cryptography and hope that this will allow designers to focus more on the specific nature of a scheme than on the tedious implementation of the underlying FFT arithmetic. 1 All logarithms in this paper are of base two. Our Contribution. In this work we present an adaptable and extensible FPGA implementation of polynomial multiplication based on the FFT in Zp. n All arithmetic operations are specifically optimized for the ring Zp[x]=hx + 1i which is heavily used in ideal lattice-based cryptography, especially homomorphic encryption. Our implementation supports a broad range of parameters and we give optimized variants for previously proposed parameter sets for public key encryption and SHE. We show that the FFT seems to be the method of choice for future implementations and that homomorphic encryption can benefit from FPGA hardware accelerators. Our design is scalable, has small area consumption on a low-cost FPGA and offers decent performance compared to general purpose computers. We will make our source files, test-cases and documentation available in order to allow other researchers to use and evaluate our implementation.2 Outline. This work is structured as follows. At first we give a short introduction to lattice-based cryptography and introduce different methods for polynomial multiplication and outline their theoretical complexity in the next section. We then discuss several design options in Section 4 and detail our implementations of these algorithms on reconfigurable hardware in Section 5. We finally give considerations on the expected runtime and present performance results for a broad range of parameters in Section 6. We finish this paper with future work and a conclusion. 2 Ideal Lattice-Based Cryptography Lattice-based cryptography has become more and more important in the last years starting with the seminal result by Ajtai [2] who proved a worst-case to average-case reduction between several lattice problems. In addition to that, no efficient quantum algorithms solving lattice problems are currently known. The two most commonly used average-case lattice problems are the short integer solution (SIS) [2] and the learning with errors (LWE) [48] problem. Both can be shown to be as hard as solving other lattice problems in the worst-case, e.g., the (gap) shortest vector problem (SVP). However, while schemes based on SIS or LWE admit very appealing theoretical properties they are not efficient enough for practical usage, especially as in most schemes the key-size grows (at least) quadratically in O(n2) for the security parameter n [42]. This problem can be mitigated by introducing algebraic structure into the previously completely randomly generated lattice. This led to the definition of cyclic [41] and the more generalized ideal lattices [35] which correspond to ideals in the ring Z[x]=hfi for some irreducible polynomial f of degree n. They proved to be a valuable tool for the construction of signature schemes [37, 38], collision resistant hash functions [39, 3], public key encryption schemes [36, 54], identification schemes [34] as well as FHE [24] and SHE [43]. While certain properties can be established for various rings, in most cases the n n ring R = Zp[x]=hx + 1i is used with f(x) = x + 1, n being a power of 2 and p being a prime such that p ≡ 1 mod 2n (e.g., [37, 39, 43, 54]) 2 See our web page at http://www.sha.rub.de/research/projects/lattice/ In the ring variant of the previously mentioned decisional LWE problem the attacker has to decide whether the samples (a1; t1);:::; (am; tm) 2 R × R are chosen completely random or whether each ti = ais+ei with s; e1; : : : ; em having small coefficients from a Gaussian distribution. The pseudo-randomness of the ring-LWE distribution can then directly be applied to construct a semantically n secure public-key cryptosystem. When defining this cryptosystem in Zp[x]=hx + 1i it is also possible to compute the polynomial product of the noisy coefficients of the ring-LWE problem efficiently by using the Fast Fourier Transform (FFT) in the field Zp. This gives lattice-based cryptosystems the advantage that for most cases the runtime is quasi-linear in O(n log n) and an increase of the security parameter affects the efficiency far less than in other proposals [36]. For practical applications theoretical proofs of security are an important foundation but the choice of parameters is much more relevant.

Towards Efficient Arithmetic for Lattice-Based Cryptography On

Algorithm Construction of Hli Hash Function

SWIFFT: a Modest Proposal for FFT Hashing*

Truly Fast NTRU Using NTT∗

Fundamental Data Structures Contents

Post-Quantum Authenticated Key Exchange from Ideal Lattices

Automatic Library Generation and Performance Tuning for Modular Polynomial Multiplication a Thesis Submitted to the Faculty of D

Efficient Lattice-Based Inner-Product Functional Encryption

Fast Fourier Orthogonalization

SWIFFTX: a Proposal for the SHA-3 Standard

Lattice-Based Cryptography

The Hidden Subgroup Problem

SWIFFT: a Modest Proposal for FFT Hashing*