Cryptography for Next Generation TLS
Total Page:16
File Type:pdf, Size:1020Kb
Cryptography for Next Generation TLS: Implementing the RFC 7748 Elliptic Curve448 Cryptosystem in Hardware Pascal Sasdrich Tim Güneysu Horst Görtz Institute for IT Security, University of Bremen & DFKI Ruhr-Universität Bochum [email protected] [email protected] ABSTRACT has to be implemented on various physical and embedded With RFC 7748 the two elliptic curves Curve25519 and devices. But since both curves were primarily designed for Curve448 were proposed for the next generation of TLS. software platforms (x86/x64) and without explicit evalua- Both curves were designed and optimized purely for software tion of their resistance against low-level physical such as implementation; their implementation in hardware or phys- side-channel attacks, we still need to investigate the hard- ical protection against side-channel attacks were not con- ware implementation of both curves in combination with sidered in the design phase. Recently, it has been shown suitable countermeasures. that for Curve25519 an efficient implementations in hard- ware along with side-channel protection is feasible { yet re- Contribution. In this work, we present an efficient hard- sults for the high-security Curve448 are missing. In this ware implementation of Curve448 for modern Xilinx Field- work we demonstrate that Curve448 can indeed be efficiently Programmable Gate Array (FPGA) devices. Our work fills and securely implemented in hardware. We present a novel the gap left from the previous work on Curve25519 [14, 15]. architecture for Curve448 that can compute more than 1000 Despite of the similarity of Curve448 and Curve25519, we point multiplications per second with 1580 logic slices and were faced with completely revising the design rationales 33 DSP units of a Xilinx XC7Z020 FPGA. due to specially-crafted parameters and the significantly in- creased field size of 448 bits. Interestingly, we can demon- Keywords strate that even in light of the asymptotically growing com- plexity in the field size, Curve448 still can provide similarly Curve448, RFC 7748, FPGA, high-performance, implemen- high throughput on a mid-range Xilinx XC7Z020 FPGA, tation, side-channel, countermeasure providing a performance of more than 1000 point multipli- cations per second at moderate resource costs. 1. INTRODUCTION Elliptic Curve Cryptography (ECC) has become the most Outline. This work is organized as follows: In Section 2 predominant scheme in the area of asymmetric cryptogra- we summarize and discuss previous but related work of rel- phy during the last decade. However, recent revelations on evance before we give a brief introduction into the topic of manipulations and backdoors have undermined the confi- ECC in Section 3, focusing on Curve448 in particular. Be- dence in existing schemes and led to discussion among re- fore presenting the final implementation details in Section 5, searchers and standardization bodies on the selection of new we discuss optimization strategies and justify our final de- curves and the rigidity of existing curve generation pro- sign choices in Section 4. Eventually, implementation and cesses. Within these discussions, the Internet Engineering performance results are summarized and presented in Sec- Task Force (IETF) Transport Layer Security (TLS) working tion 6 before we conclude in Section 7. group requested new recommendations on elliptic curves for the next generation of TLS on which the Internet Research 2. PREVIOUS WORK Task Force (IRTF) Crypto Forum Research Group (CFRG) ECC has been proposed independently by Neal Koblitz [7] has selected two candidates for the RFC 7748: Curve25519 and Victor Miller [10] in 1985. Since then, it has established and Curve448. Simultaneously, the IETF curdle group pro- as a viable field of research in public-key cryptography and posed both curves for application in future protocols such as has been implemented in a great number of industrial prod- DNSSEC. With further emergence of, e.g., self-driving cars, ucts. Up to date, there exists a plethora of publications on high-performance TLS (including Curve25519 and Curve448) hardware implementations that we will not cover or discuss Permission to make digital or hard copies of all or part of this work for personal or within the scope of this work. Instead we restrict ourselves classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation to briefly list the most relevant ones. However, for a de- on the first page. Copyrights for components of this work owned by others than the tailed history and list of references we would like to refer author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or the interested reader to [2]. republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]. As one of the first ECC architectures implemented on a DAC ’17, June 18 - 22, 2017, Austin, TX, USA FPGA, in 2001, Orlando and Paar [12] presented a design using pre-computations and Montgomery multiplications for c 2017 Copyright held by the owner/author(s). Publication rights licensed to ACM. ISBN 978-1-4503-4927-7/17/06. $15.00 improved performance results. Their work was followed by DOI: http://dx.doi.org/10.1145/3061639.3062222 many more, including designs based on dedicated multi- pliers [13] and integrated Digital Signal Processor (DSP) Due to its golden ratio φ the Solinas prime allows fast and units embedded into modern FPGAs [4]. In 2014, Sasdrich efficient Karatsuba-based multiplication of two operands A = and Guneysu¨ [14, 15] presented an FPGA architecture for (a0 + a1φ) and B = (b0 + b1φ), such that: Curve25519, the first candidate of RFC 7748. Recently, C = A · B = (a + a φ) · (b + b φ) = J¨arvinen et al. [6] presented a high-performance architecture 0 1 0 1 for FourQ, a novel elliptic curve with about 128 bit security (a0b0 + a1b1) + ((a0 + a1) · (b0 + b1) − a0b0)φ (2) that supports highly-efficient point multiplications. Eventually, modular inversion can be tweaked compared to modular inversions based on Fermat's Little Theorem 3. BACKGROUND (FLT) using that, if p ≡ 3 mod 4, the Inverse Square Root (p−3) p1 4 In this section we will briefly cover basic aspects of ECC. (ISR) can be computed as ± x = x , which directly Note, however, that we will only focus on ECC over prime leads to: fields. Further, we give a short introduction to Curve448 −1 1 2 2 (p−3) 2 p−2 and its specification. x = x · ( p ) = x · [(x ) 4 ] = x (3) ± x2 3.1 Elliptic Curve Cryptography 3.2.2 Group Arithmetic Given an elliptic curve E over the Galois Field GF (p) According to RFC 7748, the function X448 performs a (with p > 3), defined by the Edwards equation variable scalar-point multiplication on the Montgomery curve 2 2 2 2 (Curve448) using the Montgomery ladder algorithm [11] that Ed : y + x ≡ 1 + dx y (mod p); (1) enables the computation of a point addition and a point dou- we can define tuples of affine coordinates (xi; yi) as points bling in a single combined step. Hence, given two points Pi 2 E on the elliptic curve. Considering two points Pi; Pj , Q1; Q2 2 E and their difference Q3 = Q1 − Q2, a single step we can define the basic group operations as point addition of the Montgomery ladder algorithm computes two points if i 6= j and point doubling if i = j. Moving from affine Q4; Q5 2 E such that Q4 = 2 ·Q1 = (x4; z4) is the point coordinate to projective coordinate representations, hence doubling with: defining points on E as tuples (xi; yi; zi), curve equations can 2 2 be relaxed in order to reduce the number of costly operations x4 = (x1 − z1) · (x1 + z1) (4) 2 2 (i.e., modular inversions). z4 = 4x1z1 · (x1 + dx1z1 + z1 ) (5) Applying elliptic curves for cryptographic purposes relies on the Elliptic Curve Discrete Logarithm Problem (ECDLP) and Q5 = Q1 + Q2 = (x5; z5) is the point addition with: 2 which states that, given two points P; Q 2 E, it is hard to x5 = z3((x1 − z1) · (x2 + z2) + (x1 + z1) · (x2 − z2)) (6) find a scalar k such that Q = k ·P holds, whereas the point 2 multiplication is defined as a k-fold point addition of P. z5 = x3((x1 − z1) · (x2 + z2) − (x1 + z1) · (x2 − z2)) (7) Eventually, the ECDLP is the fundamental problem that Fortunately, these equations are independent of the y- allows building cryptographic protocols and schemes based coordinate which can be omitted during the intermediate on elliptic curves. Common examples include the Elliptic computation and only has to be restored for the final result. Curve Diffie-Hellman (ECDH) key exchange, the Elliptic In total, a single step of the Montgomery ladder algorithm Curve Digital Signature Algorithm (ECDSA) and the El- involves 6 multiplications, 4 squarings, 4 additions, and 4 Gamal encryption scheme. subtractions over GF (p). 3.2 Specification of Curve448 3.2.3 Point Multiplication Ed448-Goldilocks, in RFC 7748 denoted as Curve448, was Given a public point P 2 E and a secret 448-bit scalar proposed and designed by Mike Hamburg [5] in order to pro- k, the point multiplication routine computes another point vide a high-security alternative (i.e., a field size between 384 Q = k ·P using a sequence of consecutive Montgomery lad- and 521 bits) to existing NIST [9] and Brainpool curves [8]. der steps. For this purpose, the scalar is scanned bitwise The design process of Curve448 strictly heeded the Safe- (starting from the most significant bit) and depending on Curves policies and criteria [1] which should enable simple the value of the currently processed bit, inputs to a sin- but secure implementations of an elliptic curve that meets gle step of the Montgomery ladder algorithm are swapped.