SIMD Instruction Set Extensions for KECCAK with Applications to SHA-3, Keyak and Ketje

Hemendra K. Rawat and Patrick Schaumont
Virginia Tech, Blacksburg, USA
{hrawat, schaum}@vt.edu

Motivation
• Secure systems and protocols rely on a suite of cryptographic applications
  - Hashing, MAC, encryption, PRNG, AEAD, etc.
• Traditionally, different algorithms: SHA-1, SHA-2, AES, GMAC, HMAC
• Different instruction set extensions: AES-NI, SHA and carry-less multiplication
  - Vector extensions in Intel, ARM, SPARC and PowerPC
• KECCAK is SHA-3
  - Successor of SHA-2
• SHA-3 is a subset of the cryptographic primitive family KECCAK sponge
• Not just a hash
• A flexible sponge for all symmetric cryptography
"A flexible SIMD instruction set, for flexible KECCAK sponge"

Outline
• KECCAK Background
• Novelty Claims
• Design Principles
• Proposed Instruction Set Extensions
• Results
• Performance
• Hardware Overhead
• Conclusion

KECCAK HASH (SHA-3)
[Figure: sponge construction. A variable-length input is padded and absorbed r bits at a time into a cryptographic state that starts at 0 and consists of a rate part (r bits) and a capacity part (c bits). The permutation f is applied after each block. The output digest is then squeezed out r bits at a time. Phases: absorbing, squeezing.]
Hashing using KECCAK Sponge Construction

KECCAK-f in Nutshell
• KECCAK-f[b] permutation on state[0 : b], with b = (r + c) bits
• b = 1600, 800, 400, 200, 100, 50, 25 bits

    def keccakf[b]:
        for numRounds in range(0, maxRounds):
            theta(state)   # Add column parities
            rho(state)     # Rotate lanes
            pi(state)      # Transpose lanes
            chi(state)     # Add non-linearity
            iota(state)    # L00 ⊕ Kround

[Figure: the state as a 5 x 5 array of lanes Lxy (x = 0..4, y = 0..4) along axes x, y and z, showing a lane, a plane and a slice.]

KECCAK Applications
[Figure: four constructions built from the permutation f on a state with rate r and capacity c, initialized to 0. (a) Message in, Digest out. (b) Key and Message in, MAC out. (c) Key and IV in, Keystream out. (d) Key and Message in, Keystream and MAC out.]
Constructions with sponge: (a) Hash, (b) Message Authentication Code, (c) Keystream Generation. Construction with duplex: (d) Authenticated Encryption

Application Stack
Layer                            Contents
Application Layer                Hash, MAC, PRNG, AEAD
Cryptographic Construction API   Sponge, Duplex
Primitives                       KECCAK-f[1600]   KECCAK-p[800,12]   KECCAK-f[400]   KECCAK-f[200]
Custom Instructions              rl1x             rl1x               rl1x            rl1x
                                 kxorr64          xorr               xorr            xorr
                                 chi1             chi1               chi1            chi1
                                 chi2             chi3               chi3            chi3

Novelty Claims
Previous work
• Custom-instruction designs for a 16-bit microcontroller for SHA-3 finalists, Constantin et al.
• Integration of a 64-bit KECCAK (SHA-3) data path into a 32-bit LEON3, Wang et al.
This work
• Six new custom instructions for a 128-bit SIMD unit
• Multipurpose KECCAK: 1600, 800, 400 and 200 bits
• Five KECCAK algorithms: SHA3-512 hash, and the LakeKEYAK, RiverKEYAK, KetjeSR and KetjeJR authenticated ciphers
• Compatible with the ARM NEON instruction set

Design Principles
Portability
  - Easy integration into different processor architectures
  - RISC-like instruction format (2 input operands, 1 output operand)
  - No non-standard architectural features
Flexibility
  - Support for multiple symmetric cryptographic applications: hashing, MACing, stream ciphers, AEAD, PRNG
Simplicity
  - Small set of instructions
  - Low hardware overhead
  - Simple operations like XOR, AND, NOT and rotations
  - Short critical path

Design Principles (continued)
Performance
  - Optimize the computation-intensive part: the KECCAK-{f,p} primitives
  - Partition the dataflow graph into SIMD-like instruction patterns
  - 2 x 128-bit input and 128-bit output
  - Minimize the schedule length of the graph
  - Optimize instruction shapes and functionality for high ILP (instruction-level parallelism), minimum register spills, minimum MOV/VEXT for register rearrangements, and in-place round computation

Mapping KECCAK to Instructions
NEON register naming convention
  - Qi = quad word (128 bit)
  - Di = double word (64 bit)
  - Si = single word (32 bit)
NEON instruction specifier
  - VADD.I32 q1, q2, q3 specifies that q1, q2 and q3 each hold 4 x 32-bit integers
[Figure: data dependencies of the θ-effect. The five lanes of each column of the state are XORed into the column parities C0..C4. Each D0..D4 is then formed by XORing one column parity with another column parity rotated left by 1.]
Instruction rl1x (inputs d0/s0 and d1/s1, rotate left by 1 and XOR, output d2/s2):
    rl1x.u64 d2, d0, d1
    rl1x.u32 s2, s0, s1
    rl1x.u16 s2, s0, s1
    rl1x.u8 s2, s0, s1

KECCAK Custom Instructions
Instruction  Step             Description             Target primitive     Syntax
rl1x         theta            Rotate left 1 and XOR   1600, 800, 400, 200  rl1x.u64 d2, d0, d1
                                                                           rl1x.u32 s2, s0, s1
                                                                           rl1x.u16 s2, s0, s1
                                                                           rl1x.u8 s2, s0, s1
kxorr64      theta, rho & pi  XOR, rotate & assign    1600                 kxorr64 d2, d0, d1, #i
xorr         theta, rho & pi  XOR, rotate & assign    800, 400, 200        xorr.u32 s2, s0, s1, #i
                                                                           xorr.u16 s2, s0, s1, #i
                                                                           xorr.u8 s2, s0, s1, #i
chi1         chi              chi step                1600, 800, 400, 200  chi1.u32 q2, q0, q1
                                                                           chi1.u64 q2, q0, q1
chi2         chi              chi step (last lane)    1600                 chi2.u64 d4, q0, q1
chi3         chi              chi step (last lane)    800, 400, 200        chi3.u32 s4, d0, d1

Implementation & Validation
• Reference implementations
  - KeccakCodePackage: http://keccak.noekeon.org/files.html
• Hand-optimized custom-instruction implementations
  - KECCAK-f,p {1600, 800, 400, 200}
• Cross-compiled GCC with KECCAK instruction support
• Timing simple CPU model in GEM5
• Instructions in GEM5's ISA description language
• Simulation
  - Single-core ARM CPU @ 1 GHz
  - 32KB L1 I and D cache
  - L2 cache of 2MB
[Figure: tool flow. Crypto applications (SHA3, KetjeJR, KetjeSR, LakeKeyak, RiverKeyak) pass through the GNU compiler framework (compiler, assembler, linker) extended with the KECCAK instructions to produce ARM native binaries. These run on the GEM5 architectural simulator: front end, CPU models (Atomic Simple, Timing Simple CPU, InOrder, O3), and ISA (decoder, TLB, faults, interrupts, KECCAK instructions).]

Results: Performance
• Performance gain between 1.4x and 2.6x over hand-optimized assembly on ARMv7
[Figure: bar chart of instructions/byte (axis 0 to 350) comparing Optimized C, 32-bit ASM, NEON, and this work (CI). Modes on the x axis: SHA3 (c=1024), Lakekeyak (E), Lakekeyak (D), Riverkeyak (E), Riverkeyak (D), KetjeSR (E), KetjeSR (D), KetjeJR (E), KetjeJR (D), grouped as Hash, AEAD (Encryption+MAC) and Lightweight AEAD. Annotated primitives: KECCAK-f[1600], KECCAK-f[800,nr], KECCAK-f[400,nr], KECCAK-p[200,nr].]
Performance in instructions/byte for various KECCAK modes

Results: Hardware Cost
• Gate-equivalent estimates with UMC 90nm (4658 GE in total)
[Figure: stacked bar, 0% to 100%, of gate equivalents per instruction: rl1x 310, kxorr64 1869, xorr 1221, chi1 894, chi2 242, chi3 122.]
[Figure: pie chart. The KECCAK instructions add 0.1% overhead to a Cortex-A9 core.]

Conclusion
• Analysis of the instruction set design space for KECCAK primitives
• Six custom instructions based on the NEON instruction set in ARMv7
• Five different KECCAK applications: SHA3, LakeKEYAK, RiverKEYAK, KetjeSR and KetjeJR
• Performance gain between 1.4x and 2.6x over hand-optimized assembly on ARMv7, at a hardware overhead of 4658 GEs
• Portability aspects (Intel AVX, generic 64/32-bit architectures) are covered in the paper

Thank you. Questions?

Appendix

KECCAK (SHA-3) Instance Definition
SHA3-224(M)      Keccak[448](M||01, 224)
SHA3-256(M)      Keccak[512](M||01, 256)
SHA3-384(M)      Keccak[768](M||01, 384)
SHA3-512(M)      Keccak[1024](M||01, 512)
SHAKE128(M, d)   Keccak[256](M||1111, d)
SHAKE256(M, d)   Keccak[512](M||1111, d)
[Figure: a hash function H maps a message M of arbitrary length to a hash value of fixed length.]

Keccak
Source: http://keccak.noekeon.org/

KECCAK-f Permutation (2)
[Figure: the θ step, ρ step, π step, χ step and ι step (L00 ⊕ KRound).]
R = ι ∘ χ ∘ π ∘ ρ ∘ θ (θ is applied first)
Source: http://keccak.noekeon.org/

Proposed Instruction Set Extensions
[Figure: data dependencies of θ, ρ, π, χ, ι (one plane). Lanes L00, L11, L22, L33, L44 are XORed with D0..D4 in the θ and π steps to give the intermediate values B0..B4. The ρ step rotates them to give the intermediate values P0..P4. The χ step then produces L00, L10, L20, L30, L40.]
Instruction kxorr64/xorr (inputs d0/s0 and d1/s1, XOR followed by rotate left by i, output d2/s2):
    kxorr64 d2, d0, d1, #i
    xorr.u32 s2, s0, s1, #i
    xorr.u16 s2, s0, s1, #i
    xorr.u8 s2, s0, s1, #i

Proposed Instruction Set Extensions (continued)
[Figure: data dependencies of the χ step (one plane), built from NOT, AND and XOR. (a) chi1 reads q0 and q1 and writes q2. (b) chi2 reads q0 and q1 and writes d4. (c) chi3 reads d0 and d1 and writes s4.]
    chi1.u32 q2, q0, q1
    chi1.u64 q2, q0, q1
    chi2.u64 d4, q0, q1
    chi3.u32 s4, d0, d1
Instructions chi1, chi2 and chi3
Source: http://keccak.noekeon.org/

Portability Aspects
• Based on instruction format, register width and the instruction encoding width.
Table 2: Feasibility of the proposed instructions on different platforms
Instruction  Intel AVX  64-bit Arch  32-bit Arch
rl1x         ✓          ✓*           ✓*
kxorr64      ✓          ✓            ✗
xorr         ✓          ✓            ✓
chi1         ✓          ✗            ✗
chi2         ✓          ✗            ✗
chi3         ✓          ✓            ✗
* Not all variants supported

Results: Hardware Cost
Table 1: Area, GE and transistor count estimates for the proposed custom instructions
Instruction  Area (µm²)  Gate equivalent  Transistor equivalent
rl1x         1238        310              1238
kxorr64      7474        1869             7474
xorr         4884        1221             4884
chi1         3576        894              3576
chi2         966         242              966
chi3         486         122              486
Total        18624       4658             18624
• Typical Cortex-A9 CPU: 3.8 million gates
• KECCAK instructions have 0.1% overhead

Motivation
Cryptographic instruction-set extensions   Intel AES-NI, SHA-1,2, carry-less multiplication
                                           ARMv8 Crypto Instructions
Processor architecture                     ARM (Soft IP)
                                           RISC-V (open architecture from UC Berkeley)
Symmetric cryptography applications        Hash, MAC, Encryption/Decryption, AEAD, PRNG
Universal sponge construction              Hash: SHA-3 (winner of the NIST SHA-3 competition)
                                           AEAD: LakeKeyak, Ketje AEAD
                                           PRNG, stream ciphers, etc.
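
The pseudocode on the "KECCAK-f in Nutshell" slide and the SHA3-256 row of the "KECCAK (SHA-3) Instance Definition" slide can be tied together in a short software model. The sketch below is not the authors' implementation and uses no custom instructions. It is a plain-Python illustration, modeled on the structure of the public compact reference code, of the five round steps and of the sponge with rate 1088 bits and capacity 512 bits. The function names (`rol64`, `keccak_f1600`, `permute`, `sha3_256`) are chosen here for illustration only.

```python
import hashlib  # used only to cross-check the sketch against a known implementation

MASK64 = (1 << 64) - 1


def rol64(value, shift):
    """Rotate a 64-bit lane left by shift bits."""
    shift %= 64
    return ((value << shift) | (value >> (64 - shift))) & MASK64


def keccak_f1600(lanes):
    """Apply the 24 rounds of KECCAK-f[1600] to lanes[x][y] (5 x 5 integers)."""
    lfsr = 1  # generates the bits of the iota round constants
    for _ in range(24):
        # theta: column parities C, then D[x] = C[x-1] ^ ROL(C[x+1], 1)
        c = [lanes[x][0] ^ lanes[x][1] ^ lanes[x][2] ^ lanes[x][3] ^ lanes[x][4]
             for x in range(5)]
        d = [c[(x + 4) % 5] ^ rol64(c[(x + 1) % 5], 1) for x in range(5)]
        lanes = [[lanes[x][y] ^ d[x] for y in range(5)] for x in range(5)]
        # rho and pi: rotate each lane and move it to its new position
        x, y = 1, 0
        current = lanes[x][y]
        for t in range(24):
            x, y = y, (2 * x + 3 * y) % 5
            current, lanes[x][y] = lanes[x][y], rol64(current, (t + 1) * (t + 2) // 2)
        # chi: the only non-linear step, applied to each plane
        for y in range(5):
            row = [lanes[x][y] for x in range(5)]
            for x in range(5):
                lanes[x][y] = row[x] ^ (~row[(x + 1) % 5] & row[(x + 2) % 5])
        # iota: XOR the round constant into lane L00
        for j in range(7):
            lfsr = ((lfsr << 1) ^ ((lfsr >> 7) * 0x71)) % 256
            if lfsr & 2:
                lanes[0][0] ^= 1 << ((1 << j) - 1)
    return lanes


def permute(state):
    """Run KECCAK-f[1600] on a 200-byte state stored as little-endian lanes."""
    lanes = [[int.from_bytes(state[8 * (x + 5 * y):8 * (x + 5 * y) + 8], "little")
              for y in range(5)] for x in range(5)]
    lanes = keccak_f1600(lanes)
    out = bytearray(200)
    for x in range(5):
        for y in range(5):
            out[8 * (x + 5 * y):8 * (x + 5 * y) + 8] = lanes[x][y].to_bytes(8, "little")
    return out


def sha3_256(message):
    """SHA3-256(M) = Keccak[512](M || 01, 256): rate 1088 bits, capacity 512 bits."""
    rate = 136  # bytes, (1600 - 512) / 8
    state = bytearray(200)
    # Append the domain bits 01 and the pad10*1 padding (bytes 0x06 ... 0x80).
    padded = bytearray(message) + b"\x06"
    padded += b"\x00" * (-len(padded) % rate)
    padded[-1] ^= 0x80
    # Absorbing phase: XOR each block into the rate part, then permute.
    for offset in range(0, len(padded), rate):
        for i in range(rate):
            state[i] ^= padded[offset + i]
        state = permute(state)
    # Squeezing phase: 32 output bytes fit in a single rate block.
    return bytes(state[:32])
```

Comparing `sha3_256(b"abc")` with `hashlib.sha3_256(b"abc").digest()` is a quick way to check the model. The custom instructions in the deck target exactly the theta, rho/pi and chi loops inside `keccak_f1600`.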
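
The "KECCAK Custom Instructions" table describes rl1x as "Rotate Left 1 and XOR", kxorr64/xorr as "XOR Rotate & Assign", and chi1/chi2/chi3 as the chi step. A lane-level Python model of those descriptions might look like the sketch below. It rests on several assumptions that the slides do not confirm: the operand order (first source XORed with the rotated second source for rl1x), the treatment of one lane per call, and the helper names. How lanes are packed into NEON q, d and s registers, and how chi1, chi2 and chi3 divide the five lanes of a plane between them, is not modeled.

```python
def rol(value, shift, width):
    """Rotate a width-bit lane left by shift bits."""
    mask = (1 << width) - 1
    shift %= width
    value &= mask
    return ((value << shift) | (value >> (width - shift))) & mask


def rl1x(a, b, width=64):
    """theta helper: a XOR ROL(b, 1). The operand order is an assumption."""
    return (a ^ rol(b, 1, width)) & ((1 << width) - 1)


def xorr(a, b, shift, width=32):
    """theta/rho helper: ROL(a XOR b, shift). width=64 stands in for kxorr64."""
    return rol(a ^ b, shift, width)


def chi_lane(a, b, c, width=64):
    """One chi output lane: a XOR ((NOT b) AND c)."""
    return (a ^ (~b & c)) & ((1 << width) - 1)


def theta_effect(column_parities, width=64):
    """D[x] = C[x-1] XOR ROL(C[x+1], 1), computed from the column parities C."""
    c = column_parities
    return [rl1x(c[(x - 1) % 5], c[(x + 1) % 5], width) for x in range(5)]


def chi_plane(lanes, width=64):
    """Apply chi to the five lanes of one plane."""
    return [chi_lane(lanes[x], lanes[(x + 1) % 5], lanes[(x + 2) % 5], width)
            for x in range(5)]
```

The `width` parameter mirrors the `.u64`, `.u32`, `.u16` and `.u8` suffixes in the syntax column, which select the lane size for KECCAK-f[1600], [800], [400] and [200] respectively.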