SoK: A Performance Evaluation of Cryptographic Instruction Sets on Modern Architectures

A. Faz-Hernández, Julio López, Ana Karina D. S. de Oliveira
Institute of Computing, University of Campinas, Brazil
June 4, 2018. Incheon, Republic of Korea.
5th ACM ASIA Public-Key Cryptography Workshop, AsiaCCS/Asia PKC 2018

Instruction Sets for Cryptography

Processors have extensions to the Instruction Set Architecture (ISA) that aid the execution of cryptographic algorithms.

Extension          Microarchitecture
SHA-NI             Zen
ADX & RDSEED       Kaby Lake
MULX & AVX2        Haswell
AES-NI & CLMUL     Westmere
CRC32              Nehalem
SSE2 x64           Pentium 4

Faz, López, de Oliveira (IC-UNICAMP) Performance Eval Crypto ISA Modern Arch AsiaCCS/Asia-PKC 2018

Motivation

Goals of this work:
• Performance evaluation of algorithms based on SHA-256 and AES.
• Look for optimizations in the use of the SHA New Instructions.

Optimized implementations:
• Multiple-Message Hashing.
• XMSS Digital Signature.
• AES Modes of Operation.
• AEGIS Authenticated Encryption.

Deliverables:
• Source code available. /armfazh/flo-shani-aesni

Outline

1. SHA New Instructions
   • Performance Comparison
2. Multiple-Message Hashing
   • SIMD Instructions
   • Pipelining SHA-NI
3. Hash-Based Digital Signatures
4. AES Modes of Operation
5. Final Remarks

SHA New Instructions

SHA New Instructions (SHA-NI)

In 2013, Intel [1] released the specification of the SHA New Instructions (SHA-NI), which is composed of:

SHA1:
• SHA1MSG1
• SHA1MSG2
• SHA1NEXTE
• SHA1RNDS4

SHA-256:
• SHA256MSG1
• SHA256MSG2
• SHA256RNDS2

Processors that support SHA-NI:
• 2016: Intel Goldmont, a low power consumption micro-architecture.
• 2017: AMD Zen, a middle- and high-end micro-architecture.
The SHA-256 Hashing Algorithm

1. Initialize the state.

$$S_0 = \begin{bmatrix} \texttt{0x6a09e667} & \texttt{0xbb67ae85} & \texttt{0x3c6ef372} & \texttt{0xa54ff53a} \\ \texttt{0x510e527f} & \texttt{0x9b05688c} & \texttt{0x1f83d9ab} & \texttt{0x5be0cd19} \end{bmatrix}$$

2. Pad the message $M$ and split it into $n$ 512-bit blocks: $\mathrm{Pad}(M) = m_1 \,\|\, \cdots \,\|\, m_n$.
3. For each block, process the state using the Update function $U$: $S_i = U(S_{i-1}, m_i)$ for $i = 1, \ldots, n$.
4. The digest of $M$ is $\mathrm{SHA2}(M) = S_n$.

SHA256 Update Function

The Update function consists of two phases:
1. Message Schedule.
2. State Update.

Update Phase 1: Message Schedule

• Split the message block into sixteen 32-bit words $w_0, \ldots, w_{15}$ and calculate $w_{16}$:

$$w_{16} = w_0 + \sigma_0(w_1) + w_9 + \sigma_1(w_{14})$$

• Rename the words, sliding the sixteen-word window by one position, to calculate $w_{17}$:

$$w_{17} = w_1 + \sigma_0(w_2) + w_{10} + \sigma_1(w_{15})$$

• Repeat this procedure to calculate the words $w_{16}, \ldots, w_{63}$.

where rot is a 32-bit right rotation, shr is a right shift, additions are modulo $2^{32}$, and

$$\sigma_0(x) = \mathrm{rot}(x, 7) \oplus \mathrm{rot}(x, 18) \oplus \mathrm{shr}(x, 3)$$
$$\sigma_1(x) = \mathrm{rot}(x, 17) \oplus \mathrm{rot}(x, 19) \oplus \mathrm{shr}(x, 10)$$
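The message schedule recurrence can be sketched in a few lines of Python. This is an illustrative scalar model, not the authors' optimized code: the names `rotr`, `sigma0`, `sigma1`, and `message_schedule` are chosen here for the example, and the sample block is the standard padding of the three-byte message "abc".

```python
MASK32 = 0xFFFFFFFF  # all additions are taken modulo 2**32


def rotr(x, n):
    """Rotate the 32-bit word x right by n bits."""
    return ((x >> n) | (x << (32 - n))) & MASK32


def sigma0(x):
    return rotr(x, 7) ^ rotr(x, 18) ^ (x >> 3)


def sigma1(x):
    return rotr(x, 17) ^ rotr(x, 19) ^ (x >> 10)


def message_schedule(block):
    """Expand one 64-byte block into the 64 schedule words w0..w63."""
    if len(block) != 64:
        raise ValueError("a SHA-256 block is exactly 64 bytes")
    # w0..w15 are the block read as big-endian 32-bit words.
    w = [int.from_bytes(block[4 * i:4 * i + 4], "big") for i in range(16)]
    # Sliding the window by one word per step gives, in absolute indices,
    # w[t] = w[t-16] + sigma0(w[t-15]) + w[t-7] + sigma1(w[t-2]).
    for t in range(16, 64):
        w.append((w[t - 16] + sigma0(w[t - 15]) + w[t - 7] + sigma1(w[t - 2])) & MASK32)
    return w


# Example input (an assumption of this sketch): the single padded block of "abc",
# i.e. the message, a 0x80 byte, zero bytes, and the 64-bit bit length (24).
block = b"abc" + b"\x80" + bytes(52) + (24).to_bytes(8, "big")
w = message_schedule(block)
print(hex(w[16]), hex(w[17]))
```

Writing the recurrence with absolute indices avoids physically renaming the words at each step; `w[t - 16]`, `w[t - 15]`, `w[t - 7]`, and `w[t - 2]` play the roles of w0, w1, w9, and w14 in the window.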
Update Phase 1: SHA-NI Implementation

Sequential implementation:

$$w_{16} = w_0 + \sigma_0(w_1) + w_9 + \sigma_1(w_{14})$$
$$w_{17} = w_1 + \sigma_0(w_2) + w_{10} + \sigma_1(w_{15})$$
$$w_{18} = w_2 + \sigma_0(w_3) + w_{11} + \sigma_1(w_{16})$$
$$w_{19} = w_3 + \sigma_0(w_4) + w_{12} + \sigma_1(w_{17})$$

SHA-NI implementation, computing four words at a time:

$$\begin{bmatrix} w_{16} \\ w_{17} \\ w_{18} \\ w_{19} \end{bmatrix} = \underbrace{\underbrace{\underbrace{\begin{bmatrix} w_0 \\ w_1 \\ w_2 \\ w_3 \end{bmatrix} + \sigma_0 \begin{bmatrix} w_1 \\ w_2 \\ w_3 \\ w_4 \end{bmatrix}}_{\texttt{SHA256MSG1}} + \underbrace{\begin{bmatrix} w_9 \\ w_{10} \\ w_{11} \\ w_{12} \end{bmatrix}}_{\texttt{PALIGNR}}}_{\texttt{PADD}} + \sigma_1 \begin{bmatrix} w_{14} \\ w_{15} \\ w_{16} \\ w_{17} \end{bmatrix}}_{\texttt{SHA256MSG2}}$$

SHA256MSG2 takes the register holding $(w_{12}, w_{13}, w_{14}, w_{15})$ as its second operand: it reads $w_{14}$ and $w_{15}$ from it and derives $w_{16}$ and $w_{17}$ internally for the last two lanes.

Update Phase 2: State Update

The state is split into eight 32-bit words $(a, b, c, d, e, f, g, h)$ and is processed for $i$ from 0 to 63.

[Figure: one iteration of the state update, mapping $(a_i, \ldots, h_i)$ to $(a_{i+1}, \ldots, h_{i+1})$ through the temporaries $T1_i$ and $T2_i$, with inputs $k_i$ and $w_i$.]

Update Phase 2: Two Iterations

Some values are not modified across two consecutive iterations.
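The two-iteration observation can be checked with a small Python model of one state-update iteration. This is a sketch under stated assumptions: the round functions (`big_sigma0`, `big_sigma1`, `ch`, `maj`) and the constants `K0` and `K1` are taken from the SHA-256 specification rather than from the slides, the function names are hypothetical, and `W0`, `W1` are the first two message words of the padded "abc" block.

```python
MASK32 = 0xFFFFFFFF  # all additions are taken modulo 2**32


def rotr(x, n):
    """Rotate the 32-bit word x right by n bits."""
    return ((x >> n) | (x << (32 - n))) & MASK32


def big_sigma0(x):
    return rotr(x, 2) ^ rotr(x, 13) ^ rotr(x, 22)


def big_sigma1(x):
    return rotr(x, 6) ^ rotr(x, 11) ^ rotr(x, 25)


def ch(e, f, g):
    # Bitwise choice: take the bit of f where e is 1, the bit of g otherwise.
    return (e & f) ^ (~e & g)


def maj(a, b, c):
    # Bitwise majority of the three inputs.
    return (a & b) ^ (a & c) ^ (b & c)


def sha256_round(state, k, w):
    """One state-update iteration with round constant k and schedule word w."""
    a, b, c, d, e, f, g, h = state
    t1 = (h + big_sigma1(e) + ch(e, f, g) + k + w) & MASK32
    t2 = (big_sigma0(a) + maj(a, b, c)) & MASK32
    # Only a and e receive newly computed values; the other six words shift.
    return ((t1 + t2) & MASK32, a, b, c, (d + t1) & MASK32, e, f, g)


# Initial state S0 as given on the slide.
S0 = (0x6A09E667, 0xBB67AE85, 0x3C6EF372, 0xA54FF53A,
      0x510E527F, 0x9B05688C, 0x1F83D9AB, 0x5BE0CD19)
K0, K1 = 0x428A2F98, 0x71374491  # first two SHA-256 round constants
W0, W1 = 0x61626380, 0x00000000  # first two words of the padded "abc" block

s1 = sha256_round(S0, K0, W0)
s2 = sha256_round(s1, K1, W1)

# After two iterations, (a, b, e, f) of the input reappear as (c, d, g, h).
print(s2[2:4] == S0[0:2], s2[6:8] == S0[4:6])
```

Because each iteration only shifts six of the eight words, two consecutive iterations leave half of the state unchanged apart from its position, so only four new words have to be computed per pair of iterations.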
