Accelerating AES with Vector Permute Instructions
Total Page:16
File Type:pdf, Size:1020Kb

Load more
Recommended publications
-
RESOURCE-EFFICIENT CRYPTOGRAPHY for UBIQUITOUS COMPUTING Lightweight Cryptographic Primitives from a Hardware & Software Perspective
RESOURCE-EFFICIENT CRYPTOGRAPHY FOR UBIQUITOUS COMPUTING Lightweight Cryptographic Primitives from a Hardware & Software Perspective DISSERTATION for the degree of Doktor-Ingenieur of the Faculty of Electrical Engineering and Information Technology at the Ruhr University Bochum, Germany by Elif Bilge Kavun Bochum, December 2014 Copyright © 2014 by Elif Bilge Kavun. All rights reserved. Printed in Germany. Anneme ve babama... Elif Bilge Kavun Place of birth: Izmir,˙ Turkey Author’s contact information: [email protected] www.emsec.rub.de/chair/ staff/elif bilge kavun Thesis Advisor: Prof. Dr.-Ing. Christof Paar Ruhr-Universit¨atBochum, Germany Secondary Referee: Prof. Christian Rechberger Danmarks Tekniske Universitet, Denmark Thesis submitted: December 19, 2014 Thesis defense: February 6, 2015 v Abstract Technological advancements in the semiconductor industry over the last few decades made the mass production of very small-scale computing devices possible. Thanks to the compactness and mobility of these devices, they can be deployed “pervasively”, in other words, everywhere and anywhere – such as in smart homes, logistics, e-commerce, and medical technology. Em- bedding the small-scale devices into everyday objects pervasively also indicates the realization of the foreseen “ubiquitous computing” concept. However, ubiquitous computing and the mass deployment of the pervasive devices in turn brought some concerns – especially, security and privacy. Many people criticize the security and privacy management in the ubiquitous context. It is even believed that an inadequate level of security may be the greatest barrier to the long-term success of ubiquitous computing. For ubiquitous computing, the adversary model and the se- curity level is not the same as in traditional applications due to limited resources in pervasive devices – area, power, and energy are actually harsh constraints for such devices. -
RISC-V Vector Extension Webinar II
RISC-V Vector Extension Webinar II August 3th, 2021 Thang Tran, Ph.D. Principal Engineer Webinar II - Agenda • Andes overview • Vector technology background – SIMD/vector concept – Vector processor basic • RISC-V V extension ISA – Basic – CSR • RISC-V V extension ISA – Memory operations – Compute instructions • Sample codes – Matrix multiplication – Loads with RVV versions 0.8 and 1.0 • AndesCore™ NX27V • Summary Copyright© 2020 Andes Technology Corp. 2 Terminology • ACE: Andes Custom Extension • ISA: Instruction Set Architecture • CSR: Control and Status Register • GOPS: Giga Operations Per Second • SEW: Element Width (8-64) • GFLOPS: Giga Floating-Point OPS • ELEN: Largest Element Width (32 or 64) • XRF: Integer register file • XLEN: Scalar register length in bits (64) • FRF: Floating-point register file • FLEN: FP register length in bits (16-64) • VRF: Vector register file • VLEN: Vector register length in bits (128-512) • SIMD: Single Instruction Multiple Data • LMUL: Register grouping multiple (1/8-8) • MMX: Multi Media Extension • EMUL: Effective LMUL • SSE: Streaming SIMD Extension • VLMAX/MVL: Vector Length Max • AVX: Advanced Vector Extension • AVL/VL: Application Vector Length • Configurable: parameters are fixed at built time, i.e. cache size • Extensible: added instructions to ISA includes custom instructions to be added by customer • Standard extension: the reserved codes in the ISA for special purposes, i.e. FP, DSP, … • Programmable: parameters can be dynamically changed in the program Copyright© 2020 Andes Technology Corp. 3 RISC-V V Extension ISA Basic Copyright© 2020 Andes Technology Corp. 4 Vector Register ISA • Vector-Register ISA Definition: − All vector operations are between vector registers (except for load and store). -
Fair and Comprehensive Methodology for Comparing Hardware Performance of Fourteen Round Two SHA-3 Candidates Using Fpgas
Fair and Comprehensive Methodology for Comparing Hardware Performance of Fourteen Round Two SHA-3 Candidates Using FPGAs Kris Gaj, Ekawat Homsirikamol, and Marcin Rogawski ECE Department, George Mason University, Fairfax, VA 22030, U.S.A. {kgaj,ehomsiri,mrogawsk}@gmu.edu http://cryptography.gmu.edu Abstract. Performance in hardware has been demonstrated to be an important factor in the evaluation of candidates for cryptographic stan- dards. Up to now, no consensus exists on how such an evaluation should be performed in order to make it fair, transparent, practical, and ac- ceptable for the majority of the cryptographic community. In this pa- per, we formulate a proposal for a fair and comprehensive evaluation methodology, and apply it to the comparison of hardware performance of 14 Round 2 SHA-3 candidates. The most important aspects of our methodology include the definition of clear performance metrics, the de- velopment of a uniform and practical interface, generation of multiple sets of results for several representative FPGA families from two major vendors, and the application of a simple procedure to convert multiple sets of results into a single ranking. Keywords: benchmarking, hash functions, SHA-3, FPGA. 1 Introduction and Motivation Starting from the Advanced Encryption Standard (AES) contest organized by NIST in 1997-2000 [1], open contests have become a method of choice for select- ing cryptographic standards in the U.S. and over the world. The AES contest in the U.S. was followed by the NESSIE competition in Europe [2], CRYPTREC in Japan, and eSTREAM in Europe [3]. Four typical criteria taken into account in the evaluation of candidates are: security, performance in software, performance in hardware, and flexibility. -
SIMD Extensions
SIMD Extensions PDF generated using the open source mwlib toolkit. See http://code.pediapress.com/ for more information. PDF generated at: Sat, 12 May 2012 17:14:46 UTC Contents Articles SIMD 1 MMX (instruction set) 6 3DNow! 8 Streaming SIMD Extensions 12 SSE2 16 SSE3 18 SSSE3 20 SSE4 22 SSE5 26 Advanced Vector Extensions 28 CVT16 instruction set 31 XOP instruction set 31 References Article Sources and Contributors 33 Image Sources, Licenses and Contributors 34 Article Licenses License 35 SIMD 1 SIMD Single instruction Multiple instruction Single data SISD MISD Multiple data SIMD MIMD Single instruction, multiple data (SIMD), is a class of parallel computers in Flynn's taxonomy. It describes computers with multiple processing elements that perform the same operation on multiple data simultaneously. Thus, such machines exploit data level parallelism. History The first use of SIMD instructions was in vector supercomputers of the early 1970s such as the CDC Star-100 and the Texas Instruments ASC, which could operate on a vector of data with a single instruction. Vector processing was especially popularized by Cray in the 1970s and 1980s. Vector-processing architectures are now considered separate from SIMD machines, based on the fact that vector machines processed the vectors one word at a time through pipelined processors (though still based on a single instruction), whereas modern SIMD machines process all elements of the vector simultaneously.[1] The first era of modern SIMD machines was characterized by massively parallel processing-style supercomputers such as the Thinking Machines CM-1 and CM-2. These machines had many limited-functionality processors that would work in parallel. -
Sharing Resources Between AES and the SHA-3 Second Round
Sharing Resources Between AES and the SHA-3 Second Round Candidates Fugue and Grøstl Kimmo Järvinen Department of Information and Computer Science Aalto University, School of Science and Technology Espoo, Finland AES-inspired SHA-3 Candidates I Design strongly influenced by AES: Share the structure and have significant similarity in transformations, or even use AES as a subroutine I ECHO, Fugue, Grøstl, and SHAvite-3 I Benadjila et al. (ASIACRYPT 2009) studied useability of Intel’s AES instructions for AES-inspired candidates Conclusion: only ECHO and SHAvite-3, which use AES as a subroutine, benefit from the instructions I This is the first study of combining AES with the SHA-3 candidates on hardware (FPGA) The 2nd SHA-3 Candidate Conference Santa Barbara, CA, USA August 23–24, 2010 Research Topics and Motivation Research Questions I What modifications are required to embed AES into the data path of the hash algorithm (or vice versa)? I How much resources can be shared (logic, registers, memory, . )? I What are the costs (area, delay, throughput, power consumption, . )? Applications I Any applications that require dedicated hardware implementations of a hash algorithm and a block cipher would benefit from reduced costs I Particularly important if resources are very limited The 2nd SHA-3 Candidate Conference Santa Barbara, CA, USA August 23–24, 2010 Advanced Encryption Standard AES with a 128-bit key (AES-128) 8 I State: 4 × 4 bytes; each byte is an element of GF(2 ) I 10 rounds with four transformations Transformations I SubBytes: Bytes mapped -
RISC-V Vector Extension Webinar I
RISC-V Vector Extension Webinar I July 13th, 2021 Thang Tran, Ph.D. Principal Engineer Who WeAndes Are Technology Corporation CPU Pure-play RISC-V Founding Major Open-Source CPU IP Vendor Premier Member Contributor/Maintainer RISC-V Ambassador 16-year-old Running Task Groups Public Company TSC Vice Chair Director of the Board Quick Facts + NL 100 years 80% FR BJ KR USA JP IL SH CPU Experience in Silicon Valley R&D SZ TW (HQ) 200+ 20K+ 7B+ Licensees AndeSight IDE Total shipment of Andes- installations Embedded™ SoC Confidential Taking RISC-V® Mainstream 2 Andes Technology Corporation Overview Andes Highlights •Founded in March 2005 in Hsinchu Science Park, Taiwan, ROC. •World class 32/64-bit RISC-V CPU IP public company •Over 200 people; 80% are engineers; R&D team consisting of Silicon Valley veterans •TSMC OIP Award “Partner of the Year” for New IP (2015) •A Premier founding member of RISC-V Foundation •2018 MCU Innovation Award by China Electronic News: AndesCore™ N25F/NX25F •ASPENCORE WEAA 2020 Outstanding Product Performance of the Year: AndesCore™ NX27V •2020 HsinChu Science Park Innovation Award: AndesCore™ NX27V Andes Mission • Trusted Computing Expert and World No.1 RISC-V IP Provider Emerging Opportunities • AIoT, 5G/Networking, Storage and Cloud computing 3 V5 Adoptions: From MCU to Datacenters • Edge to Cloud: − ADAS − Datacenter AI accelerators − AIoT − SSD: enterprise (& consumer) − Blockchain − 5G macro/small cells − FPGA − MCU − Multimedia − Security − Wireless (BT/WiFi) 5G Macro • 1 to 1000+ core • 40nm to 5nm • Many in AI Copyright© 2020 Andes Technology Corp. 4 Webinar I - Agenda • Andes overview • Vector technology background – SIMD/vector concept – Vector processor basic • RISC-V V extension ISA – Basic – CSR – Memory operations – Compute instructions • Sample codes – Matrix multiplication – Loads with RVV versions 0.8 and 1.0 • AndesCore™ NX27V introduction • Summary Copyright© 2020 Andes Technology Corp. -
Optimizing Packed String Matching on AVX2 Platform
Optimizing Packed String Matching on AVX2 Platform M. Akif Aydo˘gmu¸s1,2 and M.O˘guzhan Külekci1 1 Informatics Institute, Istanbul Technical University, Istanbul, Turkey [email protected], [email protected] 2 TUBITAK UME, Gebze-Kocaeli, Turkey Abstract. Exact string matching, searching for all occurrences of given pattern P on a text T , is a fundamental issue in computer science with many applica- tions in natural language processing, speech processing, computational biology, information retrieval, intrusion detection systems, data compression, and etc. Speeding up the pattern matching operations benefiting from the SIMD par- allelism has received attention in the recent literature, where the empirical results on previous studies revealed that SIMD parallelism significantly helps, while the performance may even be expected to get automatically enhanced with the ever increasing size of the SIMD registers. In this paper, we provide variants of the previously proposed EPSM and SSEF algorithms, which are orig- inally implemented on Intel SSE4.2 (Streaming SIMD Extensions 4.2 version with 128-bit registers). We tune the new algorithms according to Intel AVX2 platform (Advanced Vector Extensions 2 with 256-bit registers) and analyze the gain in performance with respect to the increasing length of the SIMD registers. Profiling the new algorithms by using the Intel Vtune Amplifier for detecting performance bottlenecks led us to consider the cache friendliness and shared-memory access issues in the AVX2 platform. We applied cache op- timization techniques to overcome the problems particularly addressing the search algorithms based on filtering. Experimental comparison of the new solutions with the previously known-to- be-fast algorithms on small, medium, and large alphabet text files with diverse pattern lengths showed that the algorithms on AVX2 platform optimized cache obliviously outperforms the previous solutions. -
NASM – the Netwide Assembler
NASM – The Netwide Assembler version 2.14rc7 © 1996−2017 The NASM Development Team — All Rights Reserved This document is redistributable under the license given in the file "LICENSE" distributed in the NASM archive. Contents Chapter 1: Introduction . 17 1.1 What Is NASM?. 17 1.1.1 License Conditions . 17 Chapter 2: Running NASM . 19 2.1 NASM Command−Line Syntax . 19 2.1.1 The −o Option: Specifying the Output File Name . 19 2.1.2 The −f Option: Specifying the Output File Format . 20 2.1.3 The −l Option: Generating a Listing File . 20 2.1.4 The −M Option: Generate Makefile Dependencies. 20 2.1.5 The −MG Option: Generate Makefile Dependencies . 20 2.1.6 The −MF Option: Set Makefile Dependency File. 20 2.1.7 The −MD Option: Assemble and Generate Dependencies . 20 2.1.8 The −MT Option: Dependency Target Name . 21 2.1.9 The −MQ Option: Dependency Target Name (Quoted) . 21 2.1.10 The −MP Option: Emit phony targets . 21 2.1.11 The −MW Option: Watcom Make quoting style . 21 2.1.12 The −F Option: Selecting a Debug Information Format . 21 2.1.13 The −g Option: Enabling Debug Information. 21 2.1.14 The −X Option: Selecting an Error Reporting Format . 21 2.1.15 The −Z Option: Send Errors to a File. 22 2.1.16 The −s Option: Send Errors to stdout ..........................22 2.1.17 The −i Option: Include File Search Directories . 22 2.1.18 The −p Option: Pre−Include a File . 22 2.1.19 The −d Option: Pre−Define a Macro . -
Rebound Attack
Rebound Attack Florian Mendel Institute for Applied Information Processing and Communications (IAIK) Graz University of Technology Inffeldgasse 16a, A-8010 Graz, Austria http://www.iaik.tugraz.at/ Outline 1 Motivation 2 Whirlpool Hash Function 3 Application of the Rebound Attack 4 Summary SHA-3 competition Abacus ECHO Lesamnta SHAMATA ARIRANG ECOH Luffa SHAvite-3 AURORA Edon-R LUX SIMD BLAKE EnRUPT Maraca Skein Blender ESSENCE MCSSHA-3 Spectral Hash Blue Midnight Wish FSB MD6 StreamHash Boole Fugue MeshHash SWIFFTX Cheetah Grøstl NaSHA Tangle CHI Hamsi NKS2D TIB3 CRUNCH HASH 2X Ponic Twister CubeHash JH SANDstorm Vortex DCH Keccak Sarmal WaMM Dynamic SHA Khichidi-1 Sgàil Waterfall Dynamic SHA2 LANE Shabal ZK-Crypt SHA-3 competition Abacus ECHO Lesamnta SHAMATA ARIRANG ECOH Luffa SHAvite-3 AURORA Edon-R LUX SIMD BLAKE EnRUPT Maraca Skein Blender ESSENCE MCSSHA-3 Spectral Hash Blue Midnight Wish FSB MD6 StreamHash Boole Fugue MeshHash SWIFFTX Cheetah Grøstl NaSHA Tangle CHI Hamsi NKS2D TIB3 CRUNCH HASH 2X Ponic Twister CubeHash JH SANDstorm Vortex DCH Keccak Sarmal WaMM Dynamic SHA Khichidi-1 Sgàil Waterfall Dynamic SHA2 LANE Shabal ZK-Crypt The Rebound Attack [MRST09] Tool in the differential cryptanalysis of hash functions Invented during the design of Grøstl AES-based designs allow a simple application of the idea Has been applied to a wide range of hash functions Echo, Grøstl, JH, Lane, Luffa, Maelstrom, Skein, Twister, Whirlpool, ... The Rebound Attack Ebw Ein Efw inbound outbound outbound Applies to block cipher and permutation based -
A Short Note on AES-Inspired Hashes
A Short Note on AES-Inspired Hashes Tor E. Bjørstad University of Bergen, Norway Email : [email protected] 1 Introduction This paper gives a rough classification of various AES-based hash algorithms that have been accepted for Round 1 of the NIST Hash Function Competition. We compare the different algorithms based on the number of \AES-like" rounds performed by a single invocation of the compression function, versus the size of the input block. This is not particularly rigorous or scientific way of comparing hash algorithms, but as we shall see it may still yield somewhat interesting results. With the exception of Sarmal and Whirlpool, all the listed algorithms use the AES S-box as their only source of nonlinearity. How they use it is a different matter. What all the algorithms do have in common, is that they use the 8 bit S-box together with a MDS matrix, in a construction where SubBytes() and MixColumns() are applied after one another on some internal data structure. The actual MDS matrices used are frequently different from the one in AES; for instance, Fugue uses a 16 × 16 matrix. For the sake of comparison, the number of of rounds has been normalised to correspond to those of AES-128. This is not entirely fair, but it would be hard to make any comparison between the internals of such wildly different hash algorithms without making some (over)simplifications. The reader is invited to take note of 3 main observations. First, there is a very wide spread between the number of \rounds" used by the different algorithms { but a strong tendency of clustering. -
AMD's Bulldozer Architecture
AMD's Bulldozer Architecture Chris Ziemba Jonathan Lunt Overview • AMD's Roadmap • Instruction Set • Architecture • Performance • Later Iterations o Piledriver o Steamroller o Excavator Slide 2 1 Changed this section, bulldozer is covered in architecture so it makes sense to not reiterate with later slides Chris Ziemba, 鳬o AMD's Roadmap • October 2011 o First iteration, Bulldozer released • June 2013 o Piledriver, implemented in 2nd gen FX-CPUs • 2013 o Steamroller, implemented in 3rd gen FX-CPUs • 2014 o Excavator, implemented in 4th gen Fusion APUs • 2015 o Revised Excavator adopted in 2015 for FX-CPUs and beyond Instruction Set: Overview • Type: CISC • Instruction Set: x86-64 (AMD64) o Includes Old x86 Registers o Extends Registers and adds new ones o Two Operating Modes: Long Mode & Legacy Mode • Integer Size: 64 bits • Virtual Address Space: 64 bits o 16 EB of Address Space (17,179,869,184 GB) • Physical Address Space: 48 bits (Current Versions) o Saves space/transistors/etc o 256TB of Address Space Instruction Set: ISA Registers Instruction Set: Operating Modes Instruction Set: Extensions • Intel x86 Extensions o SSE4 : Streaming SIMD (Single Instruction, Multiple Data) Extension 4. Mainly for DSP and Graphics Processing. o AES-NI: Advanced Encryption Standard (AES) Instructions o AVX: Advanced Vector Extensions. 256 bit registers for computationally complex floating point operations such as image/video processing, simulation, etc. • AMD x86 Extensions o XOP: AMD specified SSE5 Revision o FMA4: Fused multiply-add (MAC) instructions -
NISTIR 7620 Status Report on the First Round of the SHA-3
NISTIR 7620 Status Report on the First Round of the SHA-3 Cryptographic Hash Algorithm Competition Andrew Regenscheid Ray Perlner Shu-jen Chang John Kelsey Mridul Nandi Souradyuti Paul NISTIR 7620 Status Report on the First Round of the SHA-3 Cryptographic Hash Algorithm Competition Andrew Regenscheid Ray Perlner Shu-jen Chang John Kelsey Mridul Nandi Souradyuti Paul Information Technology Laboratory National Institute of Standards and Technology Gaithersburg, MD 20899-8930 September 2009 U.S. Department of Commerce Gary Locke, Secretary National Institute of Standards and Technology Patrick D. Gallagher, Deputy Director NISTIR 7620: Status Report on the First Round of the SHA-3 Cryptographic Hash Algorithm Competition Abstract The National Institute of Standards and Technology is in the process of selecting a new cryptographic hash algorithm through a public competition. The new hash algorithm will be referred to as “SHA-3” and will complement the SHA-2 hash algorithms currently specified in FIPS 180-3, Secure Hash Standard. In October, 2008, 64 candidate algorithms were submitted to NIST for consideration. Among these, 51 met the minimum acceptance criteria and were accepted as First-Round Candidates on Dec. 10, 2008, marking the beginning of the First Round of the SHA-3 cryptographic hash algorithm competition. This report describes the evaluation criteria and selection process, based on public feedback and internal review of the first-round candidates, and summarizes the 14 candidate algorithms announced on July 24, 2009 for moving forward to the second round of the competition. The 14 Second-Round Candidates are BLAKE, BLUE MIDNIGHT WISH, CubeHash, ECHO, Fugue, Grøstl, Hamsi, JH, Keccak, Luffa, Shabal, SHAvite-3, SIMD, and Skein.