<<

Comparison of Multi-Purpose Cores of Keccak and AES Panasayya Yalla, Ekawat Homsirikamol, Jens-Peter Kaps Department of Electrical and Computer Engineering, George Mason University, Fairfax, Virginia 22030, USA

Abstract Design Decisions Comparison of AES vs. Keccak

Most widely used security protocols, Internet Protocol Security (IPSec), Results of AES and Keccak Implementations I One high speed (HS) and one low-area (LA) all-in-one design each. Secure Socket Layer (SSL), and (TLS), provide Mode Design Area Freq. TP TP/Area All-in-one supports Hash, MAC, AEAD, and PRNG. several cryptographic services which in turn require multiple dedicated I (Slices) (MHz) (Gbps) (Mbps/Slices cryptographic algorithms. A single cryptographic primitive for all secret I One HS and one LA dedicated AES-GCM and Keyak design each. High-Speed Designs on Xilinx Virtex-7 FPGA Multi-AES 3061 188.18 3.212 1.049 functions utilizing different mode of operations can overcome this I HS design of Keccak uses full width datapath of 1600 bits. Hash HS design of AES uses 2 cores of AES-128/256 that can be combined Multi-Keccak 2495 206.70 9.370 3.756 constraint. This paper investigates the possibility of using AES and I Multi-AES 3061 188.18 2.190 0.715 MAC Keccak as the underlying primitives for high-speed and resource to a single Rijndael with 256 block size. Multi-Keccak 2495 206.70 9.370 3.756 constrained applications. I LA design AES 32-bit datapath (width of MixColumns). Multi-AES 3061 188.18 4.380 1.431 AEAD Multi-Keccak 2495 206.70 23.150 9.279 I LA design Keccak 64-bit (width of a word in Keccak). AES-PRNG 3061 188.18 3.212 1.049 All is performed in hardware. PRNG Introduction and Motivation I Multi-Keccak 2495 206.70 23.150 9.279 Dedicated AES-GCM 1455 352.98 4.107 2.823 AEAD Keyak 2444 258.40 28.941 11.841 1. Integrity is provided through a Hash Function. High-Speed Implementations Low-Area Designs on Xilinx Artix-7 FPGA 2. Authentication and integrity are provided by a Message Multi-AES 629 82.83 0.166 0.263 Hash Authentication Code (MAC). AES Keccak Multi-Keccak 264 152.23 0.125 0.474 3. Confidentiality, integrity, and authentication are provided Multi-AES 629 82.83 0.189 0.301 I Contains two AES-128 cores that I Based on single round iterative MAC simultaneously by Authenticated (AE). Multi-Keccak 264 152.23 0.119 0.451 can support key sizes of 128- and architecture. Multi-AES 629 82.83 0.074 0.117 4. Pseudo Random Numbers required for secret keys and nonces are AEAD 256-bit. I DuplexPad enables Multi-Keccak 264 152.23 0.274 1.037 generated using Pseudo Random Number Generators (PRNG). Multi-AES 629 82.83 0.379 0.602 I These cores also have the injection of input into the state PRNG capability to form a single via an XOR. Multi-Keccak 264 152.23 0.280 1.060 Hash MAC Dedicated AES-GCM 548 71.09 0.630 0.115 Rijndael-256 core for AES-Hash. I SinglePad provides padding to AEAD Keyak 260 177.87 0.136 1.231 Primitive I Two 128x16 Galois field injected input. AES/Keccak multipliers are used in AES-GCM I Multi-Keccak implementations always outperform multi-AES I Due to non-uniform block sizes implementations. In some cases up to 14 times. mode. for Keccak modes, the controller PRNG AEAD I High-speed Keccak datapath width is 12.5 times wider than AES. I Dedicated AES-GCM design for loading module is complex. I Low-area Keccak datapath width is 2 times wider than AES.

I A single secret key algorithm such as AES could provide several of these contains only one AES-128 bit. I This increase in width is not translated into increase in area.

I Adding mode to Keccak requires minimal additional resources whereas AES modes 0 Hash services through application of various modes of operation. Note: IV||CTR1||IV||CTR2 Buses are 128−bit 0 1 unless specified otherwise 0 256 0 have vastly different underlying characteristics. 256 256 256 SpongePad DuplexPad I Another interesting option is using Keccak’s f-permutation. CMAC Hash di 0 1 4 Keyak outperforms AES-GCM in a similar way as multi-Keccak 0 1 2 3 128 I

2 1 0 256 1344 10 Byte 256 0 2 GCM 252 Sel outperforms multi-AES. AES−ECB key

1 ENC 0 0 1344 1344 SKey2 SKey1 0 1 I Transforming Keccak into Keyak requires only minimal additional hardware. 256 256 Hkey1 Hkey2 di

Cryptographic Algorithms 256 0 256 1344

0 0

0 Transforming AES into AES-GCM adds costly GCM multipliers and increases

256 State I

0 0 1 1 1 do 1 1 1 Ek0 256

RB 256 0 Len 128x16 128x16 0 1344 number of cycles. 256 Byte Keccak 1 >>1 GCM MULT GCM MULT Sel Round

4 3 2 1 0 4 3 2 1 0 256 0 1344 I The performance of Keyak relative to AES-GCM is higher than AES Keccak-p[1600,n ] 0 r 256 1 1344 0 multi-Keccak in AEAD mode relative to multi-AES in the same more 256 1 do NIST standard for block . Permutation based on Keccak, 2 256 I I 256 256 3 for low-area designs.

I Based on Rijndael block . winner of competition for next I The is not true for high-speed design as multi-AES employs two cores but not

I 128-bit block size. Secure Hash Algorithm (SHA-3). AES-GCM. 1600-bit state size. Low-Area Implementations I 128/192/256-bit . I Normalized TP/Area 14 Hash AES Keccak 12 CMAC AES Modes AEAD I Two 32-bit wide dual-port RAMs I Two 64-bit wide dual-port RAMs 10 PRNG are used to store inputs and state are used to store state variables. 8

Hash→AES-Hash AEAD→GCM s variables. e I Additionally a single-port RAM is m 6 i t

(Galois/Counter Mode) X I Based on Davies-Meyer. It takes 4 clock cycles for one for storing Key, IV, and Seed. It is a combination of counter I 4 I The message enters on the input I round of AES. It takes 58 clock cycles for one mode with Galois field I 2 for the key. Multiplications in AES-GCM round of Keccak. multiplication. I 0 I Uses a block size of 256-bit mode are performed using a LA HS LA HS LA HS LA HS LA HS LA HS LA HS I Only 6-bits of each round Virtex-5 Spartan-6 Virtex-6 Artix-7 Virtex-7 Cyclone-IV Stratix-IV (Rijndael-256). I Recommended mode by NIST. 128x2 multiplier. constant are stored to conserve Device Parallelizable. I Not parallelizable. I Dedicated AES-GCM design is a space. Performance improvement of multi-Keccak over multi-AES for specific modes of I operation I Not a NIST standardized mode. I Widely used for its efficiency and reduced version of multi-purpose I Dedicated Keyak is derived from performance. core. the multi-purpose Keccak with MAC→CMAC Normalized TP/Area minimal changes. I Equivalent to One-Key PRNG→Fortuna 10

M1b Multi-purpose

I It is counter mode with 32-bit 1 0 CBC-MAC (OMAC1). Key 0 3 2 1 0 M1a Dedicated RAM1 1 8 32 RAM1 0 SF1 counter. (M, S,IV) cnt 2 0 I Recommended mode of 1 1 R1 0 0 AES−Rnd 2 6

s 3 0 operation by NIST. I Cryptographically secure PRNG. 3 2 1 e 0 m

16 di 31 i 0 t 4 1 RAM2 SF2

I Not a NIST standardized mode. 32 16 1 X I Uses two additional subkeys Reg 2 others 15 2 Min

0 0 2 generated from original key. I The is processed as key. 3 2 1 16 <<<1 RAM2 64 0

(A,|A|,|C|, 32 others SF3 0 0 I Not parallelizable. I Parallelizable. H,Jo) 1 do 3 2 1 2 LA HS LA HS LA HS LA HS LA HS LA HS LA HS 128 0 Mout1 0 1 R2 2 Mul−H 1 Mout Virtex-5 Spartan-6 Virtex-6 Artix-7 Virtex-7 Cyclone-IV Stratix-IV 2 0 F Device 3 1 chi 2 M4 32 3 IB Keccak Modes 0 31 do 0 1 0 32 di 0 16 15 1 1 Performance improvement of dedicated and multi- purpose Keccak over corresponding 1 0 Mcon 2 1 0 2 1 2 3 3 2 RAM3 AES cores for AEAD Sponge Mode Duplex Mode 80 3 σ σ σ M Z 0 Z 0 1 Z 1 2 Z 2

. l pad . l . l . l pad 0 pad 1pad 2 Implementation Results and Comparisons Conclusions r 0 r 0 f f f f f f f f Overall, our Multi-Keccak design outper- 2.5 c 0 Comparison of our designs with other implementations on Xilinx Virtex-5 I c 0 2.0 Multi−Purpose absorbing squeezing (TW = This Work) forms our Multi-AES design by a factor of duplexing duplexing duplexing 1.5 Hash→Keccak Mode Design Area Freq. TP TP/Area 4 on average across all functions and 4x AEAD → Keyak 1.0 (Slices) (MHz) (Gbps) (Mbps/Slices) FPGAs in terms of throughput over area.

I r=1088, c=512, 24 rounds. TP/Area r=1348, c=252, 12 rounds. 0.5 I Multi-AES[TW] 2871 203.29 3.470 1.208 Keccak in AE-mode (Keyak) achieves a I |P (M)|= n · 1088 I AES Keccak MS Hash Multi-Keccak[TW] 2805 163.92 7.431 2.649 0.0 I CAESAR candidate. TP of 23.2 Gbps on Xilinx Virtex-7 and MAC→Sponge |P (M k3)| = 1348, ∀ i =6 n − 1; Keccak[1] 1395 281.84 12.777 9.16 I MK i Multi-AES[TW] 2871 203.29 2.366 0.824 28.7 Gbps on Altera Stratix-IV. 5.0 I r=1088, c=512, 24 rounds. |P (M k1)| = 1348 MK n−1 MAC Multi-Keccak[TW] 2805 163.92 7.431 2.649 I Dedicated Keyak outperforms AES-GCM 4.0 Dedicated KeyPack is used to encode the GMAC[2] 9405 120.17 15.382 1.636 I PRNG→Duplex by a factor of 6 on average across all 3.0 AES-GCM[TW] 1089 283.53 3.299 3.030 secret key in a uniform way. devices. 2.0 6x I r=1344, c=252, 12 rounds. High-Speed AES-GCM[3] 678 335.00 2.250 3.319 TP/Area I |PMS(M)|= n · 1088 Dedicated 1.0 I Random bits are generated using AES-CCM[4] 490 274.00 1.525 3.112 ⇒Keccak is more flexible than AES. AEAD AES−GCM Keyak I |PMS(KeyPackkIV)|= 1088 Grøestl/AES[5] 3102 233.00 3.848 1.240 0.0 the seed as input. 45 34 Keyak[TW] 2357 243.96 27.324 11.593 Padding Additional random bits are gene- AES Keccak I Multi-AES[TW] 478 131.23 0.262 0.549 40 33 I Adding modes to P : Padding for message in Sponge Mode MS rated using empty seed as input. Multi-Keccak[TW] 318 257.00 0.211 0.665 35 32 AES-GCM increases the Hash PMK: Message padding for Keyak 1.9x I |PSD(Seed)| = |PSD(0)| = 1348 Keccak[6] 275 251.25 0.118 0.430 30 31 1.1x area by a factor of 1.9 and PSD: Padding for seed in PRNG Mode Keccak[7] 393 159.0 0.864 2.198 25 30 for Keyak only by 1.1.

AES-GCM[TW] 351 130.87 0.116 0.331 Area (Slices)x100 AES−GCM Multi Area (Slices)x100 Keyak Multi 20 29

Low-Area Dedicated AES-GCM[3] 247 393.00 0.230 0.931 Modes of Operation Summary AEAD AES-CCM[4] 214 272.00 0.363 1.696 Keyak[TW] 259 281.29 0.506 1.954 References AES / Rijndael* and Keccak Modes (Rd. = Number of rounds) I All comparisons are made with dedicated design from literature. Operation Mode Block Key Rd. ρ Inputs Outputs [1] E. Homsirikamol, M. Rogawski, and K. Gaj, “Throughput vs area trade-offs architectures of five round Our high-speed multi-Keccak performance degrades by more than 1/3 Hash* AES-Hash 256 N/A 14 |M|, M H I 3 SHA-3 candidates implemented using Xilinx and Altera FPGAs,” in CHES, ser. LNCS, B. Preneel and compared to dedicated Keccak [1] due to additional hardware for modes T. Takagi, Eds., vol. 6917. Springer, Sep 2011, pp. 491–506. MAC CMAC 128 128 10 |M|, M, K, IV T and padding. [2] Y. Lu, G. Shou, Y. Hu, and Z. Guo, “The research and efficient fpga implementation of ghash core for AEAD GCM 128 128 10 |M|, M, K, IV , T , C gmac,” in E-Business and Information System Security, 2009. EBISS ’09, 2009, pp. 1–5. I High-speed multi-AES trails behind GMAC[2], however multi-Keccak

AES |AD|, AD can compete in MAC mode. [3] AES-GCM Core family for Xilinx FPGA, Helion Technology, Fulbourn, Cambridge CB21 5DQ, England, PRNG Fortuna 128 N/A 14 S R 2011, http://www.heliontech.com/downloads/aes_gcm_8bit_xilinx_datasheet.pdf. Hash Sponge 1600 N/A 24 1088 |M|, M H I Our dedicated high-speed AES-GCM TP/Area closely matches those of [4] AES-CCM Core family for Xilinx FPGA, Helion Technology Limited, Fulbourn, Cambridge CB21 5DQ, single message AEAD designs [3] and [4]. England, 2011, MAC Sponge 1600 128 24 1088 |M|, M, K, IV T http://www.heliontech.com/downloads/Helion_PB_-_AES-CCM_8-bit_FPGA.pdf. I Keyak outperforms [5] by a factor of 9 and other designs by almost a AEAD Duplex 1600 128 12 1344 |M|, M, K, IV , T , C [5] M. Rogawski, “Development and Benchmarking of New Hardware Architectures for Emerging |AD|, AD factor of 4 due to larger block size with similar number of rounds. Cryptographic Transformations,” Ph.D. dissertation, George Mason University, July 2013.

Keccak PRNG Duplex 1600 N/A 12 1344 S R I Even our low-area Keccak suffers performance degradation due to [6] J.-P. Kaps, P. Yalla, K. K. Surapathi, B. Habib, S. Vadlamudi, and S. Gurung, “Lightweight additional hardware for modes. implementations of SHA-3 finalists on FPGAs,” Mar 2012, third SHA-3 Candidate Conference. M–Message, K–Key, AD–Associated Data, S–Seed, IV –Initialization Value, H–Hash, T –Tag, [7] B. Jungk and J. Apfelbeck, “Area-efficient FPGA implementations of the SHA-3 finalists,” in I Implementation details of [3] and [4] are unknown. C–Cipher-text, R–Random Number, |X|–Length of X International Conference on ReConfigurable Computing and FPGAs. IEEE: ReConfig’11, DEC 2011.

Cryptographic Engineering Research Group (CERG) Department of Electrical and Computer Engineering George Mason University http://cryptography.gmu.edu