"LISARM: Embedded ARM Platform Design and Optimization" Thesis


POLITECNICO DI TORINO
III Faculty of Engineering
Degree Course in Electronic Engineering

Degree Thesis

LISARM: embedded ARM platform design and optimization

Supervisors: Prof. Guido Masera, Ing. Maurizio Martina, Ing. Fabrizio Vacca
Candidate: Carlo Ceriani

April 2007

To my mother, to my father, and to those who had faith in me

Acknowledgements

My first and greatest thanks go to my mother, for the fundamental support she gave me during these long years of study, for never letting me lack her trust, and for knowing how to give me the right encouragement, especially in the most difficult moments. In these lines I cannot fail to remember my father, in particular for having taught me that, by rolling up one's sleeves and trusting in one's own abilities, one can always push further and broaden one's horizons.

I thank my supervisor, Prof. Guido Masera, and my co-supervisors, Maurizio Martina and Fabrizio Vacca, for their essential advice, for guiding me at the crucial junctures of my work, and for making available the resources I needed. I thank the other members of the VLSILab, with whom I had the pleasure of sharing this experience, for always being willing to help solve the many everyday problems that arose. Special thanks go to Federico Quaglio for the help he gave me both in the research and development phase of the project and in the writing of this thesis.

Since this is the concluding act of a long course of study, but also and above all to mark an important stretch of my life, I thank all those who along this path have enriched my life with knowledge, with experience, and also simply with pleasant moments of leisure.
Summary

The diffusion of electronic devices into many aspects of everyday life has deeply changed not only the constraints on industrial production but also the technologies underlying the applications the market requires. Although System-on-Chip technology allows heterogeneous components to be placed on the same die, the development time of hardwired solutions and the significant constraints imposed by the need for economic return have led designers to seek new approaches. Hardware-software partitioning is one of the most widely applied techniques; it divides the complexity of the target application between two levels: the design of a powerful and flexible programmable system, and the implementation of the complex algorithms needed to satisfy market demand. Development must be coordinated between the design groups so that this approach can shorten product implementation time. Other constraints, on power consumption and occupied area, are also important, particularly for mobile devices, which must provide long battery life and higher performance than preceding products, as customers require. In this technology branch, microprocessor-based platforms are the most widespread, and the ARM7TDMI processor is a successful product thanks to its noteworthy performance and low-power characteristics. Using an existing embedded processor is not the only solution: although the architectures available on the market provide many of the characteristics requested by manufacturers, they are sometimes not tailored to critical applications, or their structure is too complex, with dramatic effects on power consumption and area occupation. A different solution is represented by ASIPs (Application-Specific Instruction-set Processors), i.e. processors specifically designed for target applications, which provide a dedicated instruction set built around the software algorithms to be executed on them.
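As a rough illustration of what a dedicated instruction set can buy, the following toy sketch counts the instructions executed by a dot-product kernel on two hypothetical machines: one that needs a separate multiply and add per element, and one extended with a fused multiply-accumulate. The instruction names and the one-instruction-per-step cost model are assumptions made for this example only; they are not taken from the thesis and do not describe any real processor.

```python
# Toy illustration only: the instruction names (MUL, ADD, MAC) and the
# one-instruction-per-step cost model are assumptions for this sketch.

def dot_product_generic(a, b):
    """Dot product on a hypothetical ISA with separate MUL and ADD."""
    acc, executed = 0, 0
    for x, y in zip(a, b):
        prod = x * y   # counted as one MUL
        executed += 1
        acc += prod    # counted as one ADD
        executed += 1
    return acc, executed


def dot_product_asip(a, b):
    """Same kernel on a hypothetical ASIP with a fused multiply-accumulate."""
    acc, executed = 0, 0
    for x, y in zip(a, b):
        acc += x * y   # counted as one MAC
        executed += 1
    return acc, executed


if __name__ == "__main__":
    a, b = [1, 2, 3], [4, 5, 6]
    print(dot_product_generic(a, b))
    print(dot_product_asip(a, b))
```

Both functions return the same result; only the instruction count differs. A real design trade-off would also have to weigh the area and power cost of the extra hardware, which this sketch ignores.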
Programmable architectures are designed with dedicated software environments that allow the instruction set to be described flexibly and that enable code reuse by writing the description in an Architecture Description Language such as LISA 2.0. The LISATek toolsuite and the Language for Instruction-Set Architecture (LISA) allow the processor behavior to be described in all its aspects, including timing, integrating techniques such as pipelining and caching, and make it possible to obtain a hardware description in HDL, a powerful simulator, and all the dedicated tools for software development. The aim of this thesis is to explore the possibilities offered by this software environment in the development of a programmable platform based on the ARM7 processor, whose documentation, widely available thanks to its many applications, allows an in-depth analysis of the characteristics to be transferred to the model.

Chapter 1 contains a brief review of all the topics treated in this thesis and an extended summary in Italian, as required by the university rules for theses written in a foreign language.

Chapter 2 introduces some concepts of computer architecture and gives a historical outline of the evolution of computers and microprocessors.

Chapter 3 describes the ARM7TDMI processor, from its programmer's model to an in-depth analysis of the architecture, covering its instruction set and the interfacing of the core with external systems.

Chapter 4 introduces the LISATek toolsuite, a powerful software environment for ASIP modeling and the principal instrument used for the development and verification of LISARM.

Chapter 5 describes the LISARM processor model, reporting the guidelines followed in developing its various parts and the architectural solutions adopted to obtain a coherent ARM7 model in both behavior and internal structure.
Chapter 6 describes the tools obtained from the model description using the LISATek automatic generation tools, together with some external solutions to compatibility issues, such as memory wrapping and toolchain adaptation.

Chapter 7 contains some concluding remarks on the thesis work and outlines some hypotheses about future applications of the material produced.

Contents

Acknowledgements
Summary
1 Extended summary in Italian (Sintesi)
  1.1 Introduction
  1.2 The architecture of RISC processors
  1.3 Architecture of the ARM7 microprocessor
  1.4 The LISATek development environment
  1.5 The LISA model of the ARM7
  1.6 Development tools for ARM7
  1.7 Conclusions and future developments
2 The RISC microprocessor architecture
  2.1 The Von Neumann architecture
  2.2 Harvard architecture
  2.3 The increased processor complexity
  2.4 The RISC architecture
  2.5 Pipelining and cache technology
  2.6 RISC vs CISC architecture
3 The ARM microprocessor architecture
  3.1 The ARM processor family
  3.2 The Thumb concept
  3.3 The programmer model
    3.3.1 Operating states and state switching
    3.3.2 Memory formats and data types
    3.3.3 Operating modes
    3.3.4 Processor resources
    3.3.5 The Processor Status Registers (PSRs)
  3.4 The exception handling
    3.4.1 Processor reset
    3.4.2 Interrupt and fast interrupt requests
    3.4.3 Abort conditions
    3.4.4 Software interrupts and supervisor mode
    3.4.5 Undefined instruction
    3.4.6 Exception priorities
  3.5 ARM instruction set
    3.5.1 Conditional execution
    3.5.2 Branch and exchange (BX)
    3.5.3 Branch and branch with link (B-BL)
    3.5.4 Data processing instructions
    3.5.5 PSR transfer instructions
    3.5.6 Multiply and multiply and accumulate (MUL-MLA)
    3.5.7 Multiply and multiply and accumulate long (MULL-MLAL)
    3.5.8 Single data transfer operations (LDR-STR)
    3.5.9 Halfword and signed data transfer operations
    3.5.10 Block data transfer operations (LDM-STM)
    3.5.11 Single data swap (SWP)
    3.5.12 Software interrupt
    3.5.13 Coprocessor instructions
    3.5.14 Undefined instruction
  3.6 Thumb instruction set
  3.7 The memory interface
  3.8 The coprocessor interface
  3.9 The debugging system
4 LISATek toolsuite
  4.1 The ASIP design flow
  4.2 Architecture exploration
  4.3 The architecture description: the LISA language
    4.3.1 Memory model
    4.3.2 Resource model
    4.3.3 Instruction-set model
    4.3.4 Behavioral model
    4.3.5 Timing model
    4.3.6 Microarchitecture model
  4.4 The LISATek model development tools
    4.4.1 The Processor Designer
    4.4.2 The Instruction-set Designer
    4.4.3 The Syntax Debugger
  4.5 The architecture implementation
  4.6 The application software design
    4.6.1 Assembler and linker
    4.6.2 Disassembler
    4.6.3 Simulator: the "Processor Debugger"
    4.6.4 The C-Compiler
  4.7 The system integration and verification
5 The LISARM model
  5.1 The model structure
    5.1.1 Processor resources, interface, internal units
    5.1.2 The main LISA operation
    5.1.3 The coding tree and the decoding mechanism
  5.2 The processor datapath
    5.2.1 The barrel shifter unit
    5.2.2 The arithmetic logic unit
    5.2.3 The 32x8 bit multiplier
  5.3 Other LISA operations
  5.4 The branch instructions
  5.5 Data processing instructions
  5.6 PSR transfer instructions
  5.7 Multiplication instructions
  5.8 Single data transfer instructions
  5.9 Block data transfer instructions
  5.10 The data swap instruction
  5.11 Software interrupt and undefined instructions
6 LISARM support tools
  6.1 The ARM LISA simulator
  6.2 The memory wrapping
  6.3 ARM commercial toolchains
  6.4 ARM model toolchain adaption
  6.5 HDL generation and tests
7 Conclusions and possible future applications
  7.1 Conclusions
  7.2 Possible future applications
A Model LISA operations summary
Bibliography

Chapter 1
Extended summary in Italian (Sintesi)

1.1 Introduction

The spread of electronic devices into many aspects of everyday life has profoundly changed both the structure of industrial production and the technologies underlying the applications that the market demands.