Apunet: Revitalizing GPU As Packet Processing Accelerator

Total Page:16

File Type:pdf, Size:1020Kb

Apunet: Revitalizing GPU As Packet Processing Accelerator APUNet: Revitalizing GPU as Packet Processing Accelerator Younghwan Go, Muhammad Asim Jamshed, YoungGyoun Moon, Changho Hwang, and KyoungSoo Park School of Electrical Engineering, KAIST GPU-accelerated Networked Systems • Execute same/similar operations on each packet in parallel • High parallelization power • Large memory bandwidth CPU GPU Packet Packet Packet Packet Packet Packet • Improvements shown in number of research works • PacketShader [SIGCOMM’10], SSLShader [NSDI’11], Kargus [CCS’12], NBA [EuroSys’15], MIDeA [CCS’11], DoubleClick [APSys’12], … 2 Source of GPU Benefits • GPU acceleration mainly comes from memory access latency hiding • Memory I/O switch to other thread for continuous execution GPU Quick GPU Thread 1 Thread 2 Context Thread 1 Thread 2 Switch … … … … a = b + c; a = b + c; d =Inactivee * f; Inactive d = e * f; … … … … v = mem[a].val; … v = mem[a].val; … Memory Prefetch in I/O background 3 Memory Access Hiding in CPU vs. GPU • Re-order CPU code to mask memory access (G-Opt)* • Group prefetching, software pipelining Questions: Can CPU code optimization be generalized to all network applications? Which processor is more beneficial in packet processing? *Borrowed from G-Opt slides *Raising the Bar for Using GPUs in Software Packet Processing [NSDI’15] 4 Anuj Kalia, Dong Zhu, Michael Kaminsky, and David G. Anderson Contributions • Demystify processor-level effectiveness on packet processing algorithms • CPU optimization benefits light-weight memory-bound workloads • CPU optimization often does not help large memory workloads • GPU is more beneficial for compute-bound workloads • GPU’s data transfer overhead is the main bottleneck, not its capacity • Packet processing system with integrated GPU w/o DMA overhead • Addresses GPU kernel setup / data sync overhead, and memory contention • Up to 4x performance over CPU-only approaches! 5 Discrete GPU • Peripheral device communicating with CPU via a PCIe lane Host GPU PCIe Lanes CPU DRAM L2 Cache Streaming Multiprocessor (SM) High computation power Graphics Processing x N Scheduler Registers High memory bandwidth Cluster Fast inst./data access Fast context switch Instruction Cache GDDR Device Memory L1 Cache Shared Memory Require CPU-GPU DMA transfer! 6 Integrated GPU • Place GPU into same die as CPU share DRAM • AMD Accelerated Processing Unit (APU), Intel HD Graphics Host CPU Unified Northbridge DRAM No DMA transfer! GPU High computation power Graphics Northbridge L2 Cache Fast inst./data access Compute Compute Unit Fast context switch Unit Low power & cost Scheduler Registers L1 x N Cache APU 7 CPU vs. GPU: Cost Efficiency Analysis • Performance-per-dollar on 8 popular packet processing algorithms • Memory- or compute-intensive • IPv4, IPv6, Aho-Corasick pattern match, ChaCha20, Poly1305, SHA-1, SHA-2, RSA • Test platform • CPU-baseline, G-Opt (optimized CPU), dGPU w/ copy, dGPU w/o copy, iGPU CPU / Discrete GPU APU / Integrated GPU CPU Intel Xeon E5-2650 v2 (8 @ 2.6 GHz) CPU AMD RX-421BD (4 @ 3.4 GHz) GPU NVIDIA GTX980 (2048 @ 1.2 GHz) GPU AMD R7 Graphics (512 @ 800 MHz) RAM 64 GB (DIMM DDR3 @ 1333 MHz) RAM 16 GB (DIMM DDR3 @ 2133 MHz) Cost CPU: $1143.9 dGPU: $840 Cost iGPU: $67.5 8 Cost Effectiveness of CPU-based Optimization • G-Opt helps memory-intensive, but not compute-intensive algorithms • Computation capacity as bottleneck with more computations IPv6 table lookup AC pattern matching SHA-2 2 1.8 2.5 4 3.5 Detailed analysis on CPU-based optimization2.0 in the paper 1.5 2 3 1.5 1.0 2 1 1.0 1 1.0 1.0 1 0.5 0.5 0 Normalized perf per dollar Normalized perf per dollar Normalized perf per dollar Normalized perf per dollar 0 0 CPU G-Opt dGPU CPU G-Opt CPU G-Opt w/ copy 9 Cost Effectiveness of Discrete/Integrated GPUs • Discrete GPU suffers from DMA transfer overhead • Integrated GPU is most cost efficient! IPv4 table lookup ChaCha20 2048-bit RSA decryption 3.5 20 16 14.4 3.08 Our approach: 17.0 3 12 2.5 Use integrated GPU15 to accelerate packet processing! 2 10 8 1.5 1.00 1.08 4.8 1 5 4.3 4 0.5 1.0 1.0 0 0 0 Normalized perf perNormalized dollar perf per dollar Normalized perf per dollar G-Opt dGPU dGPU Normalized perf per dollar G-Opt dGPU iGPU G-Opt dGPU iGPU w/ copy w/o copy w/o copy w/o copy 10 Contents • Introduction and motivation • Background on GPU • CPU vs. GPU: cost efficiency analysis • Research Challenges • APUNet design • Evaluation • Conclusion 11 Research Challenges Set input • Frequent GPU kernel setup overhead Redundant Launch kernel • Overhead exposed w/o DMA transfer overhead! Teardown kernel Retrieve result • High data synchronization overhead APU • CPU-GPU cache coherency CPU Bottleneck! Explicit DRAM GPU Sync! BW: 10x↓ • More contention on shared DRAM Cache • Reduced effective memory bandwidth APUNet: a high-performance APU-accelerated network packet processor 12 Persistent Thread Execution Architecture • Persistently run GPU threads without kernel teardown • Master passes packet pointer addresses to GPU threads CPU Shared Virtual Memory (SVM) GPU Master Thread 0 Thread 1 Workers Packet Thread 2 Packet I/O Packet Thread 3 Packet NIC Packet Pool Pointer Array Persistent Threads Packet 13 Data Synchronization Overhead • Synchronization point for GPU threads: L2 cache • Require explicit synchronization to main memory Shared Virtual Memory P P P P P Master CPU GPU Thread 0 Thread 1 Graphics L2 Thread 2 Thread 3 Update result? Northbridge Cache Thread 4 Thread 5 Need explicit Can process one Thread 6 Thread 7 sync! request at a time! 14 Solution: Group Synchronization • Implicitly synchronize group of packet memory GPU threads processed • Exploit LRU cache replacement policy Shared Virtual Memory Master P P P P P Group 1 Group 2 32 0 For more details andVerify tuning/optimizations, please refer to our CPUpaper correctness! GPU GPU Thread Group 1 GPU Thread Group 2 GPU Thread 0 Thread 31 Thread 0 L2 Cache Poll Poll Poll DP DP DP Process Process Dummy Mem I/O DP DP DP Barrier Barrier Barrier 15 Zero-copy Based Packet Processing Option 1. Traditional method with discrete GPU Option 2. Zero-copy between CPU-GPU NIC CPU COPY GPU NIC COPY CPU GPU Standard (e.g., mmap) GDDR (e.g., cudaMalloc) Standard (e.g., mmap) Shared (e.g., clSVMAlloc) High overhead! High overhead! • Integrate memory allocation for NIC, CPU, GPU NIC CPU GPU No copy overhead! Shared (e.g., clSVMAlloc) 16 Evaluation APUNet (AMD Carrizo APU) 40 Gbps NIC Client (packet/flow generator) RX-421BD (4 cores @ 3.4 GHz) Mellanox ConnectX-4 Xeon E3-1285 v4 (8 cores @ 3.5 GHz) R7 Graphics (512 cores @ 800 MHz) 32GB DRAM 16GB DRAM • How well does APUNet reduce latency and improve throughputs? • How practical is APUNet in real-world network applications? 17 Benefits of APUNet Design • Workload: IPsec (128-bit AES-CBC + HMAC-SHA1) Packet Processing Latency Synchronization Throughput GPU-Copy GPU-ZC GPU-ZC-PERSIST (64B Packet) 6 20 5.31 5 16 4 12 3 5.7x 8 2 5.4x Throughput (Gbps) 0.93 Packet (us) latency Packet 4 1 1.5x 0 0 64 128 256 512 1024 1451 Atomics Group sync Packet size (bytes) 18 Real-world Network Applications • 5 real-world network applications • IPv4/IPv6 packet forwarding, IPsec gateway, SSL proxy, network IDS IPsec Gateway SSL Proxy CPU Baseline G-Opt APUNet CPU Baseline G-Opt APUNet 20 5000 16.4 4241 3583 15 4000 2x 3000 1801 1540 2.75x 10 7.7 8.2 1791 5.3 2000 1539 5 2.6 2.8 1000 HTTP trans/sec HTTP Throughput (Gbps) 0 0 64 1451 256 8192 Packet size (bytes) Number of concurrent connections 19 Real-world Network Applications • Snort-based Network IDS Network IDS • Aho-Corasick pattern matching CPU Baseline G-Opt APUNet DFC 12 9.710.4 • No benefit from CPU optimization! 10 • Access many data structures 8 6 4x • Eviction of already cached data 3.6 3.6 4.2 4 2.6 2.3 2.4 2 • DFC* outperforms AC-APUNet Throughput (Gbps) 0 • CPU-based algorithm 64 1514 • Cache-friendly & reduces memory access Packet size (bytes) *DFC: Accelerating String Pattern Matching for Network Applications [NSDI’16] Byungkwon Choi, Jongwook Chae, Muhammad Jamshed, KyoungSoo Park, and Dongsu Han 20 Conclusion • Re-examine the efficacy of GPU-based packet processor • GPU is bottlenecked by PCIe data transfer overhead • Integrated GPU is the most cost effective processor • APUNet: APU-accelerated networked system • Persistent thread execution: eliminate kernel setup overhead • Group synchronization: minimize data synchronization overhead • Zero-copy packet processing: reduce memory contention • Up to 4x performance improvement over CPU baseline & G-Opt APUNet High-performance, cost-effective platform for real-world network applications 21 Thank you. Q & A.
Recommended publications
  • Improved Cryptanalysis of the Reduced Grøstl Compression Function, ECHO Permutation and AES Block Cipher
    Improved Cryptanalysis of the Reduced Grøstl Compression Function, ECHO Permutation and AES Block Cipher Florian Mendel1, Thomas Peyrin2, Christian Rechberger1, and Martin Schl¨affer1 1 IAIK, Graz University of Technology, Austria 2 Ingenico, France [email protected],[email protected] Abstract. In this paper, we propose two new ways to mount attacks on the SHA-3 candidates Grøstl, and ECHO, and apply these attacks also to the AES. Our results improve upon and extend the rebound attack. Using the new techniques, we are able to extend the number of rounds in which available degrees of freedom can be used. As a result, we present the first attack on 7 rounds for the Grøstl-256 output transformation3 and improve the semi-free-start collision attack on 6 rounds. Further, we present an improved known-key distinguisher for 7 rounds of the AES block cipher and the internal permutation used in ECHO. Keywords: hash function, block cipher, cryptanalysis, semi-free-start collision, known-key distinguisher 1 Introduction Recently, a new wave of hash function proposals appeared, following a call for submissions to the SHA-3 contest organized by NIST [26]. In order to analyze these proposals, the toolbox which is at the cryptanalysts' disposal needs to be extended. Meet-in-the-middle and differential attacks are commonly used. A recent extension of differential cryptanalysis to hash functions is the rebound attack [22] originally applied to reduced (7.5 rounds) Whirlpool (standardized since 2000 by ISO/IEC 10118-3:2004) and a reduced version (6 rounds) of the SHA-3 candidate Grøstl-256 [14], which both have 10 rounds in total.
    [Show full text]
  • Compact Implementation of KECCAK SHA3-1024 Hash Algorithm
    International Journal of Emerging Engineering Research and Technology Volume 3, Issue 8, August 2015, PP 41-48 ISSN 2349-4395 (Print) & ISSN 2349-4409 (Online) Compact Implementation of KECCAK SHA3-1024 Hash Algorithm Bonagiri Hemanthkumar1, T. Krishnarjuna Rao2, Dr. D. Subba Rao3 1Department of ECE, Siddhartha Institute of Engineering and Technology, Hyderabad, India (PG Scholar) 2Department of ECE, Siddhartha Institute of Engineering and Technology, Hyderabad, India (Associate Professor) 3Department of ECE, Siddhartha Institute of Engineering and Technology, Hyderabad, India (Head of the Department) ABSTRACT Five people with five different approaches has proposed SHA3 algorithm. NIST (National Institute of Standards and Technology) has selected an approach, which was proposed by Keccak. The proposed design is logically optimized for area efficiency by merging Rho, Pi and Chi steps of algorithm into single step, by logically merging these three steps we save 16 % logical resources for overall implementation. It in turn reduced latency and enhanced maximum operating frequency of design, our design shows the best throughput per slice (TPS) ratio, in this paper we are implementing SHA3 1024 variant using Xilinx 13.2 Keywords: theta, rho, pi, chi, iota. INTRODUCTION MD5 is one in a series of message digest algorithms designed by Professor Ronald Rivest of MIT (Rivest, 1992). When analytic work indicated that MD5's predecessor MD4 was likely to be insecure, Rivest designed MD5 in 1991 as a secure replacement. (Hans Dobbertin did indeed later find weaknesses in MD4.)In 1993, Den Boer and Baseliners gave an early, although limited, result of finding a "pseudo-collision" of the MD5 compression function; that is, two different initialization vectors which produce an identical digest.
    [Show full text]
  • Agilio® SSL and SSH Visibility TRANSPARENT SSL/SSH PROXY AS SMARTNIC OR APPLIANCE
    PRODUCT BRIEF Agilio® SSL and SSH Visibility TRANSPARENT SSL/SSH PROXY AS SMARTNIC OR APPLIANCE Pervasive Adoption of SSL and TLS SSL has become the dominant stream-oriented encryption protocol and now con- stitutes a significant and growing percentage of the traffic in the enterprise LAN and WAN, as well as throughout service provider networks. It has proven popular as it is easily deployed by software vendors, while offering privacy and integrity protection. The privacy benefits provided by SSL can quickly be overshadowed by the risks it KEY FEATURES brings to enterprises. Network-based threats, such as spam, spyware and viruses Decrypts SSH, SSL 3, - not to mention phishing, identity theft, accidental or intentional leakage of confi- TLS 1.0, 1.1, 1.2 and 1.3 dential information and other forms of cyber crime - have become commonplace. Unmodified attached software Network security appliances, though, are often blind to the payloads of SSL-en- and appliances gain visibility crypted communications and cannot inspect this traffic, leaving a hole in any enter- into SSL traffic prise security architecture. Known key and certificate re- signing modes Existing methods to control SSL include limiting or preventing its use, preventing its use entirely, deploying host-based IPS systems or installing proxy SSL solutions that Offloads and accelerates SSL significantly reduce network performance. processing Delivers traffic to software via Agilio SSL and SSH Visibility Technology kernel netdev, DPDK, PCAP or Netronome’s SSL and SSH Visibility technology enables standard unmodified soft- netmap interfaces ware and appliances to inspect the contents of SSL while not compromising the use Delivers traffic to physical of SSL or reducing performance.
    [Show full text]
  • Authenticated Encryption
    Authen'cated Encryp'on Andrey Bogdanov Technical University of Denmark June 2, 2014 Scope • Main focus on modes of operaon for block ciphers • Permutaon-based designs briefly men'oned Outline • Block ciphers • Basic modes of operaon • AE and AEAD • Nonce-based AE modes and features • Nonce-based AE: Implementaon proper'es • Nonce-free AE modes and features • Nonce-free AE: Implementaon proper'es • Permutaon-based AE Outline • Block ciphers • Basic modes of operaon • AE and AEAD • Nonce-based AE modes and features • Nonce-based AE: Implementaon proper'es • Nonce-free AE modes and features • Nonce-free AE: Implementaon proper'es • Permutaon-based AE Block ciphers plaintext ciphertext block cipher n bits n bits key k bits Block cipher 2n! A block cipher with n-bit block and k-bit key is a subset of 2k permutaons among all 2n! permuta9ons on n bits. Subset: 2k Some standard block ciphers plaintext ciphertext f f … f n bits 1 2 r n bits AES PRESENT Visualizaon of a round transform Why block ciphers? • Most basic security primi've in nearly all security solu'ons, e.g. used for construc'ng – stream ciphers, – hash func'ons, – message authen'caon codes, – authencated encrypon algorithms, – entropy extractors, … • Probably the best understood cryptographic primi'ves • U.S. symmetric-key encryp'on standards and recommendaons have block ciphers at their core: DES, AES Modes of operaon • The block cipher itself only encrypts one block of data – Standard and efficient block ciphers such as AES • To encrypt data that is not exactly one block – Switch a
    [Show full text]
  • Tocubehash, Grøstl, Lane, Toshabal and Spectral Hash
    FPGA Implementations of SHA-3 Candidates: CubeHash, Grøstl, Lane, Shabal and Spectral Hash Brian Baldwin, Andrew Byrne, Mark Hamilton, Neil Hanley, Robert P. McEvoy, Weibo Pan and William P. Marnane Claude Shannon Institute for Discrete Mathematics, Coding and Cryptography & Department of Electrical & Electronic Engineering, University College Cork, Ireland. Hash Functions The SHA-3 Contest Hash Function Implementations Results Conclusions Overview Hash Function Description Introduction Background Operation UCC Cryptography Group, 2009 The Claude Shannon Workshop On Coding and Cryptography Hash Functions The SHA-3 Contest Hash Function Implementations Results Conclusions Overview Hash Function Description Introduction Background Operation The SHA-3 Contest UCC Cryptography Group, 2009 The Claude Shannon Workshop On Coding and Cryptography Hash Functions The SHA-3 Contest Hash Function Implementations Results Conclusions Overview Hash Function Description Introduction Background Operation The SHA-3 Contest Overview of the Hash Function Architectures UCC Cryptography Group, 2009 The Claude Shannon Workshop On Coding and Cryptography Hash Functions The SHA-3 Contest Hash Function Implementations Results Conclusions Overview Hash Function Description Introduction Background Operation The SHA-3 Contest Overview of the Hash Function Architectures Hash Function Implementations CubeHash Grøstl Lane Shabal Spectral Hash UCC Cryptography Group, 2009 The Claude Shannon Workshop On Coding and Cryptography Hash Functions The SHA-3 Contest Hash Function
    [Show full text]
  • Improved Cryptanalysis of the Reduced Grøstl Compression Function, ECHO Permutation and AES Block Cipher
    Improved Cryptanalysis of the Reduced Grøstl Compression Function, ECHO Permutation and AES Block Cipher Florian Mendel1,ThomasPeyrin2,ChristianRechberger1, and Martin Schl¨affer1 1 IAIK, Graz University of Technology, Austria 2 Ingenico, France [email protected], [email protected] Abstract. In this paper, we propose two new ways to mount attacks on the SHA-3 candidates Grøstl,andECHO, and apply these attacks also to the AES. Our results improve upon and extend the rebound attack. Using the new techniques, we are able to extend the number of rounds in which available degrees of freedom can be used. As a result, we present the first attack on 7 rounds for the Grøstl-256 output transformation1 and improve the semi-free-start collision attack on 6 rounds. Further, we present an improved known-key distinguisher for 7 rounds of the AES block cipher and the internal permutation used in ECHO. Keywords: hash function, block cipher, cryptanalysis, semi-free-start collision, known-key distinguisher. 1 Introduction Recently, a new wave of hash function proposals appeared, following a call for submissions to the SHA-3 contest organized by NIST [26]. In order to analyze these proposals, the toolbox which is at the cryptanalysts’ disposal needs to be extended. Meet-in-the-middle and differential attacks are commonly used. A recent extension of differential cryptanalysis to hash functions is the rebound attack [22] originally applied to reduced (7.5 rounds) Whirlpool (standardized since 2000 by ISO/IEC 10118-3:2004) and a reduced version (6 rounds) of the SHA-3 candidate Grøstl-256 [14], which both have 10 rounds in total.
    [Show full text]
  • Finding Bugs in Cryptographic Hash Function Implementations Nicky Mouha, Mohammad S Raunak, D
    1 Finding Bugs in Cryptographic Hash Function Implementations Nicky Mouha, Mohammad S Raunak, D. Richard Kuhn, and Raghu Kacker Abstract—Cryptographic hash functions are security-critical on the SHA-2 family, these hash functions are in the same algorithms with many practical applications, notably in digital general family, and could potentially be attacked with similar signatures. Developing an approach to test them can be par- techniques. ticularly diffcult, and bugs can remain unnoticed for many years. We revisit the NIST hash function competition, which was In 2007, the National Institute of Standards and Technology used to develop the SHA-3 standard, and apply a new testing (NIST) released a Call for Submissions [4] to develop the new strategy to all available reference implementations. Motivated SHA-3 standard through a public competition. The intention by the cryptographic properties that a hash function should was to specify an unclassifed, publicly disclosed algorithm, to satisfy, we develop four tests. The Bit-Contribution Test checks be available worldwide without royalties or other intellectual if changes in the message affect the hash value, and the Bit- Exclusion Test checks that changes beyond the last message bit property restrictions. To allow the direct substitution of the leave the hash value unchanged. We develop the Update Test SHA-2 family of algorithms, the SHA-3 submissions were to verify that messages are processed correctly in chunks, and required to provide the same four message digest lengths. then use combinatorial testing methods to reduce the test set size Chosen through a rigorous open process that spanned eight by several orders of magnitude while retaining the same fault- years, SHA-3 became the frst hash function standard that detection capability.
    [Show full text]
  • Grøstl – a SHA-3 Candidate
    Cryptographic hash functions NIST SHA-3 Competition Grøstl Grøstl – a SHA-3 candidate Krystian Matusiewicz Wroclaw University of Technology CECC 2010, June 12, 2010 Krystian Matusiewicz Grøstl – a SHA-3 candidate 1 / 26 Cryptographic hash functions NIST SHA-3 Competition Grøstl Talk outline ◮ Cryptographic hash functions ◮ NIST SHA-3 Competition ◮ Grøstl Krystian Matusiewicz Grøstl – a SHA-3 candidate 2 / 26 Cryptographic hash functions NIST SHA-3 Competition Grøstl Cryptographic hash functions Krystian Matusiewicz Grøstl – a SHA-3 candidate 3 / 26 Cryptographic hash functions NIST SHA-3 Competition Grøstl Cryptographic hash functions: why? ◮ We want to have a short, fixed length “fingerprint” of any piece of data ◮ Different fingerprints – certainly different data ◮ Identical fingerprints – most likely the same data ◮ No one can get any information about the data from the fingerprint Krystian Matusiewicz Grøstl – a SHA-3 candidate 4 / 26 Cryptographic hash functions NIST SHA-3 Competition Grøstl Random Oracle Construction: ◮ Box with memory ◮ On a new query: pick randomly and uniformly the answer, remember it and return the result ◮ On a repeating query, repeat the answer (function) Krystian Matusiewicz Grøstl – a SHA-3 candidate 5 / 26 Cryptographic hash functions NIST SHA-3 Competition Grøstl Random Oracle Construction: ◮ Box with memory ◮ On a new query: pick randomly and uniformly the answer, remember it and return the result ◮ On a repeating query, repeat the answer (function) Properties: ◮ No information about the data ◮ To find a preimage:
    [Show full text]
  • Media Access Control (MAC) Security Amendment: Galois
    P802.1AEbn/D0.3 DRAFT Amendment to IEEE Std 802.1AE–2006 Draft Standard for Local and metropolitan area networks— Media Access Control (MAC) Security Amendment: Galois Counter Mode– Advanced Encryption Standard–256 (GCM-AES-256) Cipher Suite Sponsor LAN/MAN Standards Committee of the IEEE Computer Society Prepared by the Security Task Group of IEEE 802.1 This initial draft has been prepared by the Task Group Chair and reviewed by the task group as part of the process of discussing the scope and purpose of a proposed P802.1AEbn PAR . The Institute of Electrical and Electronics Engineers, Inc. 3 Park Avenue, New York, NY 10016-5997, USA Copyright © 2010 by the Institute of Electrical and Electronics Engineers, Inc. All rights reserved. Published 5 February 2010. Printed in the United States of America IEEE and 802 are registered trademarks in the U.S. Patent & Trademark Office, owned by The Institute of Electrical and Electronics Engineers, Incorporated. IEEE prohibits discrimination, harassment, and bullying. For more information, visit http://www.ieee.org/web/aboutus/ whatis/policies/p9-26.html. No part of this publication may be reproduced in any form, in an electronic retrieval system or otherwise, without the prior written permission of the publisher. P802.1AEa/D0.1 June 4, 2010 1 IEEE Standards documents are developed within the IEEE Societies and the Standards Coordinating Committees of the 2 IEEE Standards Association (IEEE-SA) Standards Board. The IEEE develops its standards through a consensus develop- 3 ment process, approved by the American National Standards Institute, which brings together volunteers representing varied 4 viewpoints and interests to achieve the final product.
    [Show full text]
  • Master Thesis Secure Updates in Automotive Systems
    Master Thesis Secure updates in automotive systems Author: Internal supervisor: Remy Spaan - s4156889 Peter Schwabe [email protected] [email protected] Second reader: External supervisor: Lejla Batina Sjoerd Verheijden [email protected] [email protected] May 30, 2016 2 Abstract Modern cars are increasingly connected to the Internet, providing a variety of new features that are beneficial towards both drivers and car manufacturers. With all these new features comes more leisure, although it also introduces an entire new set of security issues. It is generally known among car manufacturers and security researchers that the current state of car security is weak. There are no real standards regarding car security on a physical or wireless level. In the recent years, several studies have been conducted on modern vehicles, with a security perspective in mind. This master thesis identifies the current shortcomings regarding automotive security by taking a closer look at these studies. Additionally, this thesis provides a model and a proof-of-concept implementation to secure a part of the update system of a widely used electronic control unit (ECU) in car systems. This proof-of-concept system provides aspects like confidentiality, authenticity and integrity of a supplied update, while preventing common security pitfalls. It uses implementations of cryptographic primi- tives designed for high speed and takes into account the constraints the ECUs are bound to. While this thesis does not cover all aspects of the update process, it takes a step towards the direction of making over-the-air firmware updates for car systems more secure. Keywords.
    [Show full text]
  • Minalpher V1.1
    Minalpher v1.1 Designers Yu Sasaki, Yosuke Todo, Kazumaro Aoki, Yusuke Naito, Takeshi Sugawara, Yumiko Murakami, Mitsuru Matsui, Shoichi Hirose Submitters NTT Secure Platform Laboratories, Mitsubishi Electric Corporation, University of Fukui Contact [email protected] Version 1.1: 29 August 2015 Changes From v1.0 to v1.1 { Clarified the handling of illegal input length, specially, when a ciphertext length is not a multiple of a block. { Updated the numbers for software implementation on x86 64. { Added references which are found after the submission of v1.0. { Corrected typos. 2 1 Specification Section 1.1 specifies the mode of operation of Minalpher. Section 1.2 shows the exact parameters in our design. Section 1.3 defines the maximum input message length to Minalpher. Section 1.4 specifies the underlying primitive in Minalpher. 1.1 Mode of Operation Minalpher supports two functionalities: authenticated encryption with associated data (AEAD) and mes- sage authentication code (MAC). In this section, we specify these modes of operation. Subsection 1.1.1 gives notations used in the modes of operation. Subsection 1.1.2 specifies a padding rule used in the modes of operation. Subsection 1.1.3 specifies a primitive which we call tweakable Even- Mansour. The construction dates back to Kurosawa [27, 28], who builds a tweakable block-cipher from a permutation-based block-cipher proposed by Even and Mansour [18]. Tweakable block-ciphers are originated by Liskov, Rivest and Wagner [33]. Subsection 1.1.4 specifies the AEAD mode of operation. Subsection 1.1.5 specifies the MAC mode of operation.
    [Show full text]
  • A Lightweight Implementation of Keccak Hash Function for Radio-Frequency Identification Applications
    A Lightweight Implementation of Keccak Hash Function for Radio-Frequency Identification Applications Elif Bilge Kavun and Tolga Yalcin Department of Cryptography Institute of Applied Mathematics, METU Ankara, Turkey {e168522,tyalcin}@metu.edu.tr Abstract. In this paper, we present a lightweight implementation of the permutation Keccak-f[200] and Keccak-f[400] of the SHA-3 candidate hash function Keccak. Our design is well suited for radio-frequency identification (RFID) applications that have limited resources and demand lightweight cryptographic hardware. Besides its low-area and low-power, our design gives a decent throughput. To the best of our knowledge, it is also the first lightweight implementation of a sponge function, which differentiates it from the previous works. By implementing the new hash algorithm Keccak, we have utilized unique advantages of the sponge construction. Although the implementation is targeted for Application Specific Integrated Circuit (ASIC) platforms, it is also suitable for Field Programmable Gate Arrays (FPGA). To obtain a compact design, serialized data processing principles are exploited together with algorithm-specific optimizations. The design requires only 2.52K gates with a throughput of 8 Kbps at 100 KHz system clock based on 0.13-µm CMOS standard cell library. Keywords: RFID, Keccak, SHA-3, sponge function, serialized processing, low- area, low-power, high throughput. 1 Introduction In recent years, the developments on digital wireless technology have improved many areas such as the mobile systems. Mobile and embedded devices will be everywhere in times to come, making it possible to use communication services and other applications anytime and anywhere. Among these mobile devices, radio-frequency identification (RFID) tags offer low-cost, long battery life and unprecedented mobility [1-2].
    [Show full text]