Analysis of Transient-Execution Attacks on the Out-Of-Order CHERI-RISC-V Microprocessor Toooba


DEGREE PROJECT IN COMPUTER SCIENCE AND ENGINEERING, SECOND CYCLE, 30 CREDITS
STOCKHOLM, SWEDEN 2021

FRANZ ANTON FUCHS

KTH ROYAL INSTITUTE OF TECHNOLOGY
SCHOOL OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCE

Master in Computer Science
Date: January 27, 2021
Supervisor: Roberto Guanciale
Examiner: Mads Dam
School of Electrical Engineering and Computer Science
Host Organisation: University of Cambridge Department of Computer Science and Technology
Swedish title: Analys av transient-execution attacker på out-of-order CHERI-RISC-V mikroprocessorn Toooba

Copyright © 2021 by Franz Anton Fuchs. All rights reserved. No part of this work may be reproduced or used in any manner without written permission of the copyright owner except for the use of quotations.

Abstract

Transient-execution attacks have been deemed a large threat for microarchitectures through research in recent years. In this work, I reproduce and develop transient-execution attacks against RISC-V and CHERI-RISC-V microarchitectures. CHERI is an instruction set architecture (ISA) security extension that provides fine-grained memory protection and compartmentalisation. I conduct transient-execution experiments for this work on Toooba, a superscalar out-of-order processor implementing CHERI-RISC-V. I present a new subclass of transient-execution attacks dubbed Meltdown-CF (Capability Forgery). Furthermore, I reproduced all four major Spectre-style attacks and important Meltdown-style attacks.
This work analyses all attacks and explains the outcome of the respective experiments based on architectural and microarchitectural decisions made by the developers of the evaluated cores. While all four Spectre-style attacks could be successfully reproduced, the cores do not appear to be vulnerable to prior Meltdown-style attacks. I find that Spectre-BTB and Spectre-RSB, as well as the newly developed transient-execution attack subclass Meltdown-CF, pose a large threat to CHERI systems. However, all four major Spectre-style attacks and all attacks of the Meltdown-CF subclass violate CHERI's security model and therefore require security mechanisms to be put in place.

Sammanfattning (Summary)

Transient-execution attacks have emerged as a major threat to microarchitectures in the research of recent years. In this thesis, I reproduce and develop transient-execution attacks against RISC-V and CHERI-RISC-V microarchitectures. CHERI is an instruction set architecture (ISA) security extension that provides fine-grained memory protection and compartmentalisation. In the thesis, I conduct transient-execution experiments on Toooba, a superscalar out-of-order processor that implements CHERI-RISC-V. I present a new kind of transient-execution attack called Meltdown-CF (Capability Forgery). In addition, I have reproduced the four major Spectre-style attacks and important Meltdown-style attacks. In the thesis, I analyse these attacks and explain the results of the experiments based on the architectural and microarchitectural decisions made by the respective developers. While the four Spectre-style attacks could be reproduced successfully, the processor cores do not appear to be vulnerable to earlier Meltdown-style attacks. I conclude that Spectre-BTB and Spectre-RSB, as well as the new kind of transient-execution attack Meltdown-CF, pose a major threat to CHERI systems.
However, the four major Spectre-style attacks and all attacks of the Meltdown-CF type violate CHERI's threat model and therefore require security mechanisms to be put in place.

Acknowledgements

I would like to thank:

• Simon W. Moore, my supervisor at Cambridge, who, even though the circumstances were not in our favour, believed in me and gave me the opportunity to conduct my work remotely. Furthermore, he provided lots of feedback throughout close and regular supervision sessions.
• Jonathan Woodruff, my advisor, who spent many hours explaining various concepts to me, was always happy to discuss my ideas, and provided feedback and inspiration that heavily impacted my work.
• Peter Rugg, Alexandre Joannou, Jessica Clarke, Marno van der Maas, and others who assisted me in solving a wide range of problems and made me rethink my approaches and ideas.
• Robert N. M. Watson and the entire CHERI team who warmly welcomed me into the team and created a helpful and encouraging atmosphere.
• Roberto Guanciale, my supervisor at KTH, who made it possible to conduct this thesis work within the CHERI group and supported me through the entire process by providing important high-level feedback.

Contents

1 Introduction
  1.1 Research Question and Scope
  1.2 Contributions
  1.3 Figures and Permissions
  1.4 Outline
2 Background
  2.1 Microarchitectural Background
    2.1.1 RISC-V
    2.1.2 Caches and Memory
    2.1.3 Out-of-order Execution
    2.1.4 Speculative Execution
    2.1.5 Memory Disambiguation
  2.2 Transient-Execution Attacks
    2.2.1 Spectre Attacks
    2.2.2 Meltdown Attacks
    2.2.3 Timing Side Channels
  2.3 Security Mechanisms
    2.3.1 Tagging Microarchitectural State
    2.3.2 Special Instructions
  2.4 CHERI
    2.4.1 CHERI Abstract Model
    2.4.2 CHERI-RISC-V
    2.4.3 CHERI-RISC-V Hardware
    2.4.4 CHERI Software Stack
    2.4.5 CHERI Security Model
  2.5 Related Work
3 Methods
  3.1 Toooba
  3.2 Research Methodology
  3.3 Common Mechanisms
    3.3.1 Flushing Caches
    3.3.2 Timing Measurements
4 RISC-V Results
  4.1 Spectre Attacks
    4.1.1 Spectre-PHT
    4.1.2 Spectre-PHT-Write
    4.1.3 Spectre-BTB
    4.1.4 Spectre-RSB
    4.1.5 Spectre-STL
  4.2 Meltdown Attacks
    4.2.1 Meltdown-US
    4.2.2 Meltdown-GP
5 CHERI-RISC-V Results
  5.1 Spectre Attacks
    5.1.1 Spectre-PHT
    5.1.2 Spectre-PHT-CHERI-Write
    5.1.3 Spectre-BTB on CHERI-Sandboxes
    5.1.4 Priv-Mode Attacks
    5.1.5 Spectre-RSB
    5.1.6 Spectre-STL
  5.2 Meltdown Attacks
    5.2.1 Meltdown-US-CHERI
    5.2.2 Meltdown-GP-CHERI
    5.2.3 Meltdown-CF
6 Discussion
  6.1 SinglePCC
    6.1.1 Mechanism
    6.1.2 Testing SinglePCC
    6.1.3 Hardening SinglePCC
    6.1.4 Spectre-BTB in Kernel Code
  6.2 Preventing Meltdown-CF
  6.3 Ethics and Sustainability
  6.4 Future Work
7 Conclusions
Bibliography
A Full C Attack
B Full CHERI-RISC-V Attack

Acronyms

ABI Application Binary Interface
ALU Arithmetic Logic Unit
ASID Address Space Identifier
ASR Access System Registers
BHT Branch History Table
BOOM Berkeley Out-of-Order Machine
BTB Branch Target Buffer
CHERI Capability Hardware Enhanced RISC Instructions
CID CHERI Compartment Identifier
CISC Complex Instruction Set Computing
CSR Control and Status Register
DDC Default Data Capability
FPGA Field Programmable Gate Array
FPU Floating Point Unit
HDL Hardware Description Language
ILP Instruction-Level Parallelism
IR Intermediate Representation
ISA Instruction-Set Architecture
LFB Line Fill Buffer
LLC Last Level Cache
LSB Least Significant Bit
LSQ Load-Store Queue
MMU Memory Management Unit
MSB Most Significant Bit
PCC Program Counter Capability
PHT Pattern History Table
PTE Page Table Entry
RAS Return Address Stack
RIDL Rogue In-Flight Data Load
RISC Reduced Instruction Set Computing
ROB Reorder Buffer
ROP Return-Oriented Programming
RSB Return Stack Buffer
SCR Special Capability Register
STL Store-To-Load
SUM Supervisor User Memory
TLB Translation Lookaside Buffer

Chapter 1

Introduction

Memory safety in general has been one of the most difficult security problems in the secure computing world. The Heartbleed bug gives a good example of the severity of memory safety problems and illustrates the need for strong memory safety [1]. One approach to mitigate these kinds of attacks is Cyclone, a dialect of C that aims to achieve memory safety [2]. Similar approaches are CCured [3], which aims to enhance type-safety of C programs, and Checked C [4], which helps to guarantee spatial memory safety for C programs.

Another approach to implement memory safety is in-memory capability systems, which require memory accesses to go through capabilities in place of integer addresses. The idea of capability systems is not new; it has existed for more than forty years, e.g., in the CAP Computer [7] or Ackerman's architecture [8]. However, capability systems have never been commercially successful.

The CHERI project, which started in 2010, revived the idea of capability systems and had a large impact on the field. The main idea of CHERI is to effectively ensure spatial and temporal memory safety. CHERI systems can mitigate attacks targeting spatial or temporal memory safety vulnerabilities. However, in January 2018, a new class of attacks called transient-execution attacks was published. These kinds of attacks had a major impact on the processor industry and pose a large threat to CHERI systems as they can circumvent the security mechanisms in place. Transient-execution attacks have only partly been evaluated on RISC-V systems and not at all on CHERI-RISC-V systems. Therefore, the question remains whether these attacks are also possible on RISC-V and CHERI-RISC-V systems, which this thesis aims to answer.
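To make the idea of accessing memory through capabilities in place of integer addresses more concrete, the following is a minimal software sketch in C. It is an illustration only and not CHERI's actual capability format or instruction semantics: the type cap_t and the functions cap_for_buffer and cap_load_u8 are hypothetical names introduced here, and real CHERI capabilities are checked by the hardware rather than by library code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical software model of a capability (an assumption for
 * illustration): a pointer together with its bounds and a validity tag. */
typedef struct {
    const uint8_t *base;   /* lowest address the holder may read */
    size_t length;         /* number of readable bytes starting at base */
    bool tag;              /* true only for a valid capability */
} cap_t;

/* Create a capability that covers exactly len bytes of buf. */
cap_t cap_for_buffer(const uint8_t *buf, size_t len) {
    assert(buf != NULL);
    cap_t c = { buf, len, true };
    return c;
}

/* Bounds-checked byte load. Returns false instead of reading when the
 * capability is untagged or the offset lies outside its bounds; a
 * capability architecture would raise an exception at this point. */
bool cap_load_u8(cap_t c, size_t offset, uint8_t *out) {
    if (!c.tag || offset >= c.length) {
        return false;
    }
    *out = c.base[offset];
    return true;
}
```

With a capability created over the first 4 bytes of an 8-byte buffer, a load at offset 3 succeeds while a load at offset 6 is refused, which is the kind of spatial over-read that a Heartbleed-style bug relies on. The sketch models only the architectural check and says nothing about what a processor may do transiently.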
1.1 Research Question and Scope

The main research question evaluated throughout the course of this thesis is: Is the out-of-order CHERI-RISC-V processor Toooba vulnerable to transient-execution attacks? In order to answer that question, I will attempt to reproduce all major transient-execution attacks in both RISC-V and CHERI-RISC-V.