Efficient Cryptography on the RISC-V Architecture
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
Lab 7: Floating-Point Addition 0.0
Lab 7: Floating-Point Addition 0.0 Introduction In this lab, you will write a MIPS assembly language function that performs floating-point addition. You will then run your program using PCSpim (just as you did in Lab 6). For testing, you are provided a program that calls your function to compute the value of the mathematical constant e. For those with no assembly language experience, this will be a long lab, so plan your time accordingly. Background You should be familiar with the IEEE 754 Floating-Point Standard, which is described in Section 3.6 of your book. (Hopefully you have read that section carefully!) Here we will be dealing only with single precision floating- point values, which are formatted as follows (this is also described in the “Floating-Point Representation” subsec- tion of Section 3.6 in your book): Sign Exponent (8 bits) Significand (23 bits) 31 30 29 ... 24 23 22 21 ... 0 Remember that the exponent is biased by 127, which means that an exponent of zero is represented by 127 (01111111). The exponent is not encoded using 2s-complement. The significand is always positive, and the sign bit is kept separately. Note that the actual significand is 24 bits long; the first bit is always a 1 and thus does not need to be stored explicitly. This will be important to remember when you write your function! There are several details of IEEE 754 that you will not have to worry about in this lab. For example, the expo- nents 00000000 and 11111111 are reserved for special purposes that are described in your book (representing zero, denormalized numbers and NaNs). -
Improved Cryptanalysis of the Reduced Grøstl Compression Function, ECHO Permutation and AES Block Cipher
Improved Cryptanalysis of the Reduced Grøstl Compression Function, ECHO Permutation and AES Block Cipher Florian Mendel1, Thomas Peyrin2, Christian Rechberger1, and Martin Schl¨affer1 1 IAIK, Graz University of Technology, Austria 2 Ingenico, France [email protected],[email protected] Abstract. In this paper, we propose two new ways to mount attacks on the SHA-3 candidates Grøstl, and ECHO, and apply these attacks also to the AES. Our results improve upon and extend the rebound attack. Using the new techniques, we are able to extend the number of rounds in which available degrees of freedom can be used. As a result, we present the first attack on 7 rounds for the Grøstl-256 output transformation3 and improve the semi-free-start collision attack on 6 rounds. Further, we present an improved known-key distinguisher for 7 rounds of the AES block cipher and the internal permutation used in ECHO. Keywords: hash function, block cipher, cryptanalysis, semi-free-start collision, known-key distinguisher 1 Introduction Recently, a new wave of hash function proposals appeared, following a call for submissions to the SHA-3 contest organized by NIST [26]. In order to analyze these proposals, the toolbox which is at the cryptanalysts' disposal needs to be extended. Meet-in-the-middle and differential attacks are commonly used. A recent extension of differential cryptanalysis to hash functions is the rebound attack [22] originally applied to reduced (7.5 rounds) Whirlpool (standardized since 2000 by ISO/IEC 10118-3:2004) and a reduced version (6 rounds) of the SHA-3 candidate Grøstl-256 [14], which both have 10 rounds in total. -
Apunet: Revitalizing GPU As Packet Processing Accelerator
APUNet: Revitalizing GPU as Packet Processing Accelerator Younghwan Go, Muhammad Asim Jamshed, YoungGyoun Moon, Changho Hwang, and KyoungSoo Park School of Electrical Engineering, KAIST GPU-accelerated Networked Systems • Execute same/similar operations on each packet in parallel • High parallelization power • Large memory bandwidth CPU GPU Packet Packet Packet Packet Packet Packet • Improvements shown in number of research works • PacketShader [SIGCOMM’10], SSLShader [NSDI’11], Kargus [CCS’12], NBA [EuroSys’15], MIDeA [CCS’11], DoubleClick [APSys’12], … 2 Source of GPU Benefits • GPU acceleration mainly comes from memory access latency hiding • Memory I/O switch to other thread for continuous execution GPU Quick GPU Thread 1 Thread 2 Context Thread 1 Thread 2 Switch … … … … a = b + c; a = b + c; d =Inactivee * f; Inactive d = e * f; … … … … v = mem[a].val; … v = mem[a].val; … Memory Prefetch in I/O background 3 Memory Access Hiding in CPU vs. GPU • Re-order CPU code to mask memory access (G-Opt)* • Group prefetching, software pipelining Questions: Can CPU code optimization be generalized to all network applications? Which processor is more beneficial in packet processing? *Borrowed from G-Opt slides *Raising the Bar for Using GPUs in Software Packet Processing [NSDI’15] 4 Anuj Kalia, Dong Zhu, Michael Kaminsky, and David G. Anderson Contributions • Demystify processor-level effectiveness on packet processing algorithms • CPU optimization benefits light-weight memory-bound workloads • CPU optimization often does not help large memory -
The Hexadecimal Number System and Memory Addressing
C5537_App C_1107_03/16/2005 APPENDIX C The Hexadecimal Number System and Memory Addressing nderstanding the number system and the coding system that computers use to U store data and communicate with each other is fundamental to understanding how computers work. Early attempts to invent an electronic computing device met with disappointing results as long as inventors tried to use the decimal number sys- tem, with the digits 0–9. Then John Atanasoff proposed using a coding system that expressed everything in terms of different sequences of only two numerals: one repre- sented by the presence of a charge and one represented by the absence of a charge. The numbering system that can be supported by the expression of only two numerals is called base 2, or binary; it was invented by Ada Lovelace many years before, using the numerals 0 and 1. Under Atanasoff’s design, all numbers and other characters would be converted to this binary number system, and all storage, comparisons, and arithmetic would be done using it. Even today, this is one of the basic principles of computers. Every character or number entered into a computer is first converted into a series of 0s and 1s. Many coding schemes and techniques have been invented to manipulate these 0s and 1s, called bits for binary digits. The most widespread binary coding scheme for microcomputers, which is recog- nized as the microcomputer standard, is called ASCII (American Standard Code for Information Interchange). (Appendix B lists the binary code for the basic 127- character set.) In ASCII, each character is assigned an 8-bit code called a byte. -
1 Assembly Language Programming Status Flags the Status Flags Reflect the Outcomes of Arithmetic and Logical Operations Performe
Assembly Language Programming Status Flags The status flags reflect the outcomes of arithmetic and logical operations performed by the CPU. • The carry flag (CF) is set when the result of an unsigned arithmetic operation is too large to fit into the destination. • The overflow flag (OF) is set when the result of a signed arithmetic operation is too large or too small to fit into the destination. • The sign flag (SF) is set when the result of an arithmetic or logical operation generates a negative result. • The zero flag (ZF) is set when the result of an arithmetic or logical operation generates a result of zero. Assembly Programs We are going to run assembly programs from (http://www.kipirvine.com/asm/) using Visual Studio. I have downloaded all of the example programs and placed them in CS430 Pub. Copy them onto your local machine and start up Visual Studio. The first program we are going to run is below. Copy this into the Project_Sample project in the examples folder. Run the program. Let’s talk about what this program does. TITLE Add and Subtract ; This program ; Last update: 06/01/2006 INCLUDE Irvine32.inc .code main PROC mov eax,10000h add eax,40000h sub eax,20000h call DumpRegs exit main ENDP END main 1 What’s the difference between the previous program and this one: TITLE Add and Subtract, Version 2 (AddSub2.asm) ; This program adds and subtracts 32-bit integers ; and stores the sum in a variable. ; Last update: 06/01/2006 INCLUDE Irvine32.inc .data val1 dword 10000h val2 dword 40000h val3 dword 20000h finalVal dword ? .code main PROC mov eax,val1 ; start with 10000h add eax,val2 ; add 40000h sub eax,val3 ; subtract 20000h mov finalVal,eax ; store the result (30000h) call DumpRegs ; display the registers exit main ENDP END main Data Transfer Instructions The MOV instruction copies from a source operand to a destination operand. -
ARM Instruction Set
4 ARM Instruction Set This chapter describes the ARM instruction set. 4.1 Instruction Set Summary 4-2 4.2 The Condition Field 4-5 4.3 Branch and Exchange (BX) 4-6 4.4 Branch and Branch with Link (B, BL) 4-8 4.5 Data Processing 4-10 4.6 PSR Transfer (MRS, MSR) 4-17 4.7 Multiply and Multiply-Accumulate (MUL, MLA) 4-22 4.8 Multiply Long and Multiply-Accumulate Long (MULL,MLAL) 4-24 4.9 Single Data Transfer (LDR, STR) 4-26 4.10 Halfword and Signed Data Transfer 4-32 4.11 Block Data Transfer (LDM, STM) 4-37 4.12 Single Data Swap (SWP) 4-43 4.13 Software Interrupt (SWI) 4-45 4.14 Coprocessor Data Operations (CDP) 4-47 4.15 Coprocessor Data Transfers (LDC, STC) 4-49 4.16 Coprocessor Register Transfers (MRC, MCR) 4-53 4.17 Undefined Instruction 4-55 4.18 Instruction Set Examples 4-56 ARM7TDMI-S Data Sheet 4-1 ARM DDI 0084D Final - Open Access ARM Instruction Set 4.1 Instruction Set Summary 4.1.1 Format summary The ARM instruction set formats are shown below. 3 3 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1 9876543210 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 Cond 0 0 I Opcode S Rn Rd Operand 2 Data Processing / PSR Transfer Cond 0 0 0 0 0 0 A S Rd Rn Rs 1 0 0 1 Rm Multiply Cond 0 0 0 0 1 U A S RdHi RdLo Rn 1 0 0 1 Rm Multiply Long Cond 0 0 0 1 0 B 0 0 Rn Rd 0 0 0 0 1 0 0 1 Rm Single Data Swap Cond 0 0 0 1 0 0 1 0 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 1 Rn Branch and Exchange Cond 0 0 0 P U 0 W L Rn Rd 0 0 0 0 1 S H 1 Rm Halfword Data Transfer: register offset Cond 0 0 0 P U 1 W L Rn Rd Offset 1 S H 1 Offset Halfword Data Transfer: immediate offset Cond 0 -
Compact Implementation of KECCAK SHA3-1024 Hash Algorithm
International Journal of Emerging Engineering Research and Technology Volume 3, Issue 8, August 2015, PP 41-48 ISSN 2349-4395 (Print) & ISSN 2349-4409 (Online) Compact Implementation of KECCAK SHA3-1024 Hash Algorithm Bonagiri Hemanthkumar1, T. Krishnarjuna Rao2, Dr. D. Subba Rao3 1Department of ECE, Siddhartha Institute of Engineering and Technology, Hyderabad, India (PG Scholar) 2Department of ECE, Siddhartha Institute of Engineering and Technology, Hyderabad, India (Associate Professor) 3Department of ECE, Siddhartha Institute of Engineering and Technology, Hyderabad, India (Head of the Department) ABSTRACT Five people with five different approaches has proposed SHA3 algorithm. NIST (National Institute of Standards and Technology) has selected an approach, which was proposed by Keccak. The proposed design is logically optimized for area efficiency by merging Rho, Pi and Chi steps of algorithm into single step, by logically merging these three steps we save 16 % logical resources for overall implementation. It in turn reduced latency and enhanced maximum operating frequency of design, our design shows the best throughput per slice (TPS) ratio, in this paper we are implementing SHA3 1024 variant using Xilinx 13.2 Keywords: theta, rho, pi, chi, iota. INTRODUCTION MD5 is one in a series of message digest algorithms designed by Professor Ronald Rivest of MIT (Rivest, 1992). When analytic work indicated that MD5's predecessor MD4 was likely to be insecure, Rivest designed MD5 in 1991 as a secure replacement. (Hans Dobbertin did indeed later find weaknesses in MD4.)In 1993, Den Boer and Baseliners gave an early, although limited, result of finding a "pseudo-collision" of the MD5 compression function; that is, two different initialization vectors which produce an identical digest. -
POINTER (IN C/C++) What Is a Pointer?
POINTER (IN C/C++) What is a pointer? Variable in a program is something with a name, the value of which can vary. The way the compiler and linker handles this is that it assigns a specific block of memory within the computer to hold the value of that variable. • The left side is the value in memory. • The right side is the address of that memory Dereferencing: • int bar = *foo_ptr; • *foo_ptr = 42; // set foo to 42 which is also effect bar = 42 • To dereference ted, go to memory address of 1776, the value contain in that is 25 which is what we need. Differences between & and * & is the reference operator and can be read as "address of“ * is the dereference operator and can be read as "value pointed by" A variable referenced with & can be dereferenced with *. • Andy = 25; • Ted = &andy; All expressions below are true: • andy == 25 // true • &andy == 1776 // true • ted == 1776 // true • *ted == 25 // true How to declare pointer? • Type + “*” + name of variable. • Example: int * number; • char * c; • • number or c is a variable is called a pointer variable How to use pointer? • int foo; • int *foo_ptr = &foo; • foo_ptr is declared as a pointer to int. We have initialized it to point to foo. • foo occupies some memory. Its location in memory is called its address. &foo is the address of foo Assignment and pointer: • int *foo_pr = 5; // wrong • int foo = 5; • int *foo_pr = &foo; // correct way Change the pointer to the next memory block: • int foo = 5; • int *foo_pr = &foo; • foo_pr ++; Pointer arithmetics • char *mychar; // sizeof 1 byte • short *myshort; // sizeof 2 bytes • long *mylong; // sizeof 4 byts • mychar++; // increase by 1 byte • myshort++; // increase by 2 bytes • mylong++; // increase by 4 bytes Increase pointer is different from increase the dereference • *P++; // unary operation: go to the address of the pointer then increase its address and return a value • (*P)++; // get the value from the address of p then increase the value by 1 Arrays: • int array[] = {45,46,47}; • we can call the first element in the array by saying: *array or array[0]. -
Agilio® SSL and SSH Visibility TRANSPARENT SSL/SSH PROXY AS SMARTNIC OR APPLIANCE
PRODUCT BRIEF Agilio® SSL and SSH Visibility TRANSPARENT SSL/SSH PROXY AS SMARTNIC OR APPLIANCE Pervasive Adoption of SSL and TLS SSL has become the dominant stream-oriented encryption protocol and now con- stitutes a significant and growing percentage of the traffic in the enterprise LAN and WAN, as well as throughout service provider networks. It has proven popular as it is easily deployed by software vendors, while offering privacy and integrity protection. The privacy benefits provided by SSL can quickly be overshadowed by the risks it KEY FEATURES brings to enterprises. Network-based threats, such as spam, spyware and viruses Decrypts SSH, SSL 3, - not to mention phishing, identity theft, accidental or intentional leakage of confi- TLS 1.0, 1.1, 1.2 and 1.3 dential information and other forms of cyber crime - have become commonplace. Unmodified attached software Network security appliances, though, are often blind to the payloads of SSL-en- and appliances gain visibility crypted communications and cannot inspect this traffic, leaving a hole in any enter- into SSL traffic prise security architecture. Known key and certificate re- signing modes Existing methods to control SSL include limiting or preventing its use, preventing its use entirely, deploying host-based IPS systems or installing proxy SSL solutions that Offloads and accelerates SSL significantly reduce network performance. processing Delivers traffic to software via Agilio SSL and SSH Visibility Technology kernel netdev, DPDK, PCAP or Netronome’s SSL and SSH Visibility technology enables standard unmodified soft- netmap interfaces ware and appliances to inspect the contents of SSL while not compromising the use Delivers traffic to physical of SSL or reducing performance. -
Subtyping Recursive Types
ACM Transactions on Programming Languages and Systems, 15(4), pp. 575-631, 1993. Subtyping Recursive Types Roberto M. Amadio1 Luca Cardelli CNRS-CRIN, Nancy DEC, Systems Research Center Abstract We investigate the interactions of subtyping and recursive types, in a simply typed λ-calculus. The two fundamental questions here are whether two (recursive) types are in the subtype relation, and whether a term has a type. To address the first question, we relate various definitions of type equivalence and subtyping that are induced by a model, an ordering on infinite trees, an algorithm, and a set of type rules. We show soundness and completeness between the rules, the algorithm, and the tree semantics. We also prove soundness and a restricted form of completeness for the model. To address the second question, we show that to every pair of types in the subtype relation we can associate a term whose denotation is the uniquely determined coercion map between the two types. Moreover, we derive an algorithm that, when given a term with implicit coercions, can infer its least type whenever possible. 1This author's work has been supported in part by Digital Equipment Corporation and in part by the Stanford-CNR Collaboration Project. Page 1 Contents 1. Introduction 1.1 Types 1.2 Subtypes 1.3 Equality of Recursive Types 1.4 Subtyping of Recursive Types 1.5 Algorithm outline 1.6 Formal development 2. A Simply Typed λ-calculus with Recursive Types 2.1 Types 2.2 Terms 2.3 Equations 3. Tree Ordering 3.1 Subtyping Non-recursive Types 3.2 Folding and Unfolding 3.3 Tree Expansion 3.4 Finite Approximations 4. -
A Variable Precision Hardware Acceleration for Scientific Computing Andrea Bocco
A variable precision hardware acceleration for scientific computing Andrea Bocco To cite this version: Andrea Bocco. A variable precision hardware acceleration for scientific computing. Discrete Mathe- matics [cs.DM]. Université de Lyon, 2020. English. NNT : 2020LYSEI065. tel-03102749 HAL Id: tel-03102749 https://tel.archives-ouvertes.fr/tel-03102749 Submitted on 7 Jan 2021 HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. N°d’ordre NNT : 2020LYSEI065 THÈSE de DOCTORAT DE L’UNIVERSITÉ DE LYON Opérée au sein de : CEA Grenoble Ecole Doctorale InfoMaths EDA N17° 512 (Informatique Mathématique) Spécialité de doctorat :Informatique Soutenue publiquement le 29/07/2020, par : Andrea Bocco A variable precision hardware acceleration for scientific computing Devant le jury composé de : Frédéric Pétrot Président et Rapporteur Professeur des Universités, TIMA, Grenoble, France Marc Dumas Rapporteur Professeur des Universités, École Normale Supérieure de Lyon, France Nathalie Revol Examinatrice Docteure, École Normale Supérieure de Lyon, France Fabrizio Ferrandi Examinateur Professeur associé, Politecnico di Milano, Italie Florent de Dinechin Directeur de thèse Professeur des Universités, INSA Lyon, France Yves Durand Co-directeur de thèse Docteur, CEA Grenoble, France Cette thèse est accessible à l'adresse : http://theses.insa-lyon.fr/publication/2020LYSEI065/these.pdf © [A. -
Authenticated Encryption
Authen'cated Encryp'on Andrey Bogdanov Technical University of Denmark June 2, 2014 Scope • Main focus on modes of operaon for block ciphers • Permutaon-based designs briefly men'oned Outline • Block ciphers • Basic modes of operaon • AE and AEAD • Nonce-based AE modes and features • Nonce-based AE: Implementaon proper'es • Nonce-free AE modes and features • Nonce-free AE: Implementaon proper'es • Permutaon-based AE Outline • Block ciphers • Basic modes of operaon • AE and AEAD • Nonce-based AE modes and features • Nonce-based AE: Implementaon proper'es • Nonce-free AE modes and features • Nonce-free AE: Implementaon proper'es • Permutaon-based AE Block ciphers plaintext ciphertext block cipher n bits n bits key k bits Block cipher 2n! A block cipher with n-bit block and k-bit key is a subset of 2k permutaons among all 2n! permuta9ons on n bits. Subset: 2k Some standard block ciphers plaintext ciphertext f f … f n bits 1 2 r n bits AES PRESENT Visualizaon of a round transform Why block ciphers? • Most basic security primi've in nearly all security solu'ons, e.g. used for construc'ng – stream ciphers, – hash func'ons, – message authen'caon codes, – authencated encrypon algorithms, – entropy extractors, … • Probably the best understood cryptographic primi'ves • U.S. symmetric-key encryp'on standards and recommendaons have block ciphers at their core: DES, AES Modes of operaon • The block cipher itself only encrypts one block of data – Standard and efficient block ciphers such as AES • To encrypt data that is not exactly one block – Switch a