


Session: Card-based Protocol, Implementation, and Authentication for IoT — APKC'18, June 4, 2018, Incheon, Republic of Korea

SoK: A Performance Evaluation of Cryptographic Instruction Sets on Modern Architectures

Armando Faz-Hernández, Institute of Computing, University of Campinas, Campinas, São Paulo, Brazil — [email protected]
Julio López, Institute of Computing, University of Campinas, Campinas, São Paulo, Brazil — [email protected]
Ana Karina D. S. de Oliveira, Federal University of Mato Grosso do Sul (FACOM-UFMS), Campo Grande, MS, Brazil — [email protected]

ABSTRACT

The latest processors include extensions to the instruction set architecture tailored to speed up the execution of cryptographic algorithms. Like the AES New Instructions (AES-NI), which target the AES encryption algorithm, the release of the SHA New Instructions (SHA-NI), designed to support the SHA-256 hash function, introduces a new scenario for optimizing cryptographic software. In this work, we present a performance evaluation of several cryptographic algorithms, hash-based signatures and data encryption, on platforms that support AES-NI and/or SHA-NI. In particular, we revisited several optimization techniques targeting multiple-message hashing, and as a result, we reduced the running time of this task by 21% by means of a pipelined SHA-NI implementation. In public-key cryptography, multiple-message hashing is one of the critical operations of the XMSS and XMSS^MT post-quantum hash-based digital signatures. Using SHA-NI extensions, signatures are computed 4× faster; moreover, our pipelined SHA-NI implementation increases this speedup factor to 4.3×. For symmetric cryptography, we revisited the implementation of the AES modes of operation and reduced the running time of CBC decryption and CTR encryption by 12% and 7%, respectively.

CCS CONCEPTS

• Security and privacy → Digital signatures; Hash functions and message authentication codes; • Computer systems organization → Pipeline computing; Single instruction, multiple data.

KEYWORDS

AES-NI; SHA-NI; Vector Instructions; Hash-based Digital Signatures; Data Encryption

ACM Reference Format:
Armando Faz-Hernández, Julio López, and Ana Karina D. S. de Oliveira. 2018. SoK: A Performance Evaluation of Cryptographic Instruction Sets on Modern Architectures. In APKC'18: 5th ACM ASIA Public-Key Cryptography Workshop, June 4, 2018, Incheon, Republic of Korea. ACM, New York, NY, USA, 10 pages. https://doi.org/10.1145/3197507.3197511

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]. © 2018 Copyright held by the owner/author(s). Publication rights licensed to the Association for Computing Machinery. ACM ISBN 978-1-4503-5756-2/18/06. https://doi.org/10.1145/3197507.3197511

1 INTRODUCTION

The omnipresence of cryptographic services has influenced modern processor designs. One proof of that is the support given to the Advanced Encryption Standard (AES) [29] through extensions to the instruction set architecture known as the AES New Instructions (AES-NI) [15]. Other examples in the same vein are the CRC32 instructions, which aid in error-detection codes, and the carry-less multiplier (CLMUL), which is used to accelerate the AES-GCM authenticated encryption algorithm [16]. All of these extensions enhance the performance of cryptographic implementations.

The latest processors have also added instructions that support cryptographic hash functions. In 2013, Intel [21] announced the specification of the SHA New Instructions (SHA-NI), a set of new instructions for speeding up the execution of the SHA1 and SHA2 hash functions [31]. However, SHA-NI-ready processors became available only several years later. In 2016, Intel released the Goldmont micro-architecture, which targets low power-consumption devices; and in 2017, AMD introduced Zen, a micro-architecture that supports both AES-NI and SHA-NI.

Single Instruction Multiple Data (SIMD) parallel processing is an optimization strategy increasingly used to develop efficient software implementations. The Advanced Vector eXtensions 2 (AVX2) [26] are SIMD instructions that operate over 256-bit vector registers. The latest generations of Intel processors, such as Haswell, Broadwell, Skylake, and Kaby Lake, support AVX2; meanwhile, Zen is the first AMD architecture that implements AVX2.

The main contribution of this work is to provide a performance benchmark report of cryptographic algorithms accelerated through SHA-NI and AES-NI. We focus on mainstream processors, like the ones found in laptops and desktops, and also those used in cloud computing environments. To that end, we revisited known optimization techniques targeting the capabilities (and limitations) of four micro-architectures: Haswell (an Intel Core i7-4770 processor), Skylake (an Intel Core i7-6700K processor), Kaby Lake (an Intel Core i5-7400 processor), and Zen (an AMD Ryzen 7 1800X processor).

As part of our contributions, we improve the performance of multiple-message hashing, which refers to the task of hashing several messages of the same length. We followed two approaches. First, we developed a pipelined implementation using SHA-NI that saves around 18-21% of the running time. Second, we review SIMD vectorization techniques and report timings for the parallel hashing of four and eight messages.
In public-key cryptography, multiple-message hashing serves as a building block in the construction of hash-based digital signatures, such as XMSS [8] and XMSS^MT [24]. These signature schemes are strong candidates to be part of a new cryptographic portfolio of quantum-resistant algorithms [23]. For this reason, we evaluate the impact of optimizations of multiple-message hashing on the performance of these signature schemes.

Finally, we revisited software implementation techniques for the AES modes of operation. As a result, we show optimizations for Zen that reduce the running time of AES-CBC decryption and AES-CTR encryption by 12% and 9%, respectively. As a side result, we also improve the implementation of AEGIS, an AES-based algorithm classified in the final round of the authenticated-encryption competition CAESAR [4], reducing its running time by 11%.

All the source code derived from this research work is released under a permissive software license and can be retrieved from [ http://github.com/armfazh/FLO-shani-aesni ].

Algorithm 1 Definition of the Update SHA-256 function.
Domain Parameters: The functions σ0, σ1, Σ0, Σ1, Ch, Maj, and the values k0,...,k63 are defined in Appendix A.
Input: S is the current state, and M is a 512-bit message block.
Output: S = Update(S, M).
Message Schedule Phase
 1: (w0,...,w15) ← M                  // Split block into sixteen 32-bit words.
 2: for t ← 16 to 63 do
 3:     w_t ← σ0(w_{t-15}) + σ1(w_{t-2}) + w_{t-7} + w_{t-16}
 4: end for
Update State Phase
 5: (a,b,c,d,e,f,g,h) ← S             // Split state into eight 32-bit words.
 6: for i ← 0 to 63 do
 7:     t1 ← h + Σ1(e) + Ch(e,f,g) + k_i + w_i
 8:     t2 ← Σ0(a) + Maj(a,b,c)
 9:     h ← g, g ← f, f ← e, e ← d + t1
10:     d ← c, c ← b, b ← a, a ← t1 + t2
11: end for
12: (a',b',c',d',e',f',g',h') ← S     // Re-read the input state.
13: return S ← (a+a', b+b', c+c', d+d', e+e', f+f', g+g', h+h')

Algorithm 2 The Update SHA-256 function using the SHA-NI set.
Input: S is the current state, and M is a 512-bit message block.
Output: S = Update(S, M).
Message Schedule Phase
 1: Load M into 128-bit vector registers: W0, W4, W8, and W12.
 2: for i ← 0 to 11 do
 3:     X ← SHA256MSG1(W_{4i}, W_{4i+4})
 4:     W_{4i+9} ← PALIGNR(W_{4i+12}, W_{4i+8}, 4)
 5:     Y ← PADDD(X, W_{4i+9})
 6:     W_{4i+16} ← SHA256MSG2(Y, W_{4i+12})
 7: end for
Update State Phase
 8: (a,b,c,d,e,f,g,h) ← S, A ← {a,b,e,f}, and C ← {c,d,g,h}.
 9: for i ← 0 to 15 do
10:     X ← PADDD(W_{4i}, K_{4i})
11:     Y ← PSRLDQ(X, 8)
12:     C ← SHA256RNDS2(C, A, X)
13:     A ← SHA256RNDS2(A, C, Y)
14: end for
15: (a',b',c',d',e',f',g',h') ← S, A' ← {a',b',e',f'}, and C' ← {c',d',g',h'}.
16: {a,b,e,f} ← PADDD(A, A')
17: {c,d,g,h} ← PADDD(C, C')
18: return S ← (a,b,c,d,e,f,g,h)

The instructions SHA256MSG1, SHA256MSG2, and SHA256RNDS2 aid in the calculation of the Update function of the SHA-256 algorithm. In this section, we describe the implementation of the Update function using the SHA-NI set (see Algorithm 2). First of all, it is required that the message block (the values w0,...,w15) be stored into four 128-bit vector registers W0, W4, W8, and W12, as follows: Wi = [wi, wi+1, wi+2, wi+3]. After that, the SHA256MSG1 and SHA256MSG2 instructions help in the message schedule phase to compute the values w16,...,w63.