Constant-Time Foundations for the New Spectre Era

Sunjay Cauligi, Craig Disselkoen, Klaus v. Gleissenthall, Dean Tullsen, Deian Stefan (UC San Diego, USA); Tamara Rezk (INRIA Sophia Antipolis, France); Gilles Barthe (MPI for Security and Privacy, Germany; IMDEA Software Institute, Spain)

Abstract

The constant-time discipline is a software-based countermeasure used for protecting high assurance cryptographic implementations against timing side-channel attacks. Constant-time is effective (it protects against many known attacks), rigorous (it can be formalized using program semantics), and amenable to automated verification. Yet, the advent of microarchitectural attacks makes constant-time as it exists today far less useful.

This paper lays foundations for constant-time programming in the presence of speculative and out-of-order execution. We present an operational semantics and a formal definition of constant-time programs in this extended setting. Our semantics eschews formalization of microarchitectural features (that are instead assumed under adversary control), and yields a notion of constant-time that retains the elegance and tractability of the usual notion. We demonstrate the relevance of our semantics in two ways: First, by contrasting existing Spectre-like attacks with our definition of constant-time. Second, by implementing a static analysis tool, Pitchfork, which detects violations of our extended constant-time property in real world cryptographic libraries.

CCS Concepts: • Security and privacy → Formal security models; Side-channel analysis and countermeasures.

Keywords: Spectre; speculative execution; semantics; static analysis

ACM Reference Format: Sunjay Cauligi, Craig Disselkoen, Klaus v. Gleissenthall, Dean Tullsen, Deian Stefan, Tamara Rezk, and Gilles Barthe. 2020. Constant-Time Foundations for the New Spectre Era. In Proceedings of the 41st ACM SIGPLAN International Conference on Programming Language Design and Implementation (PLDI '20), June 15–20, 2020, London, UK. ACM, New York, NY, USA, 20 pages. https://doi.org/10.1145/3385412.3385970

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s). © 2020 Copyright held by the owner/author(s). ACM ISBN 978-1-4503-7613-6/20/06.

1 Introduction

Protecting secrets in software is hard. Security and cryptography engineers must write programs that protect secrets, both at the source level and when they execute on real hardware. Unfortunately, hardware too easily divulges information about a program's execution via timing side-channels; e.g., an attacker can learn secrets by simply observing (via timing) the effects of a program on the hardware cache [16].

The most robust way to deal with timing side-channels is via constant-time programming, the paradigm used to implement almost all modern cryptography [2, 11, 12, 26, 27]. Constant-time programs can neither branch on secrets nor access memory based on secret data. (More generally, constant-time programs cannot use secret data as input to any variable-time operation, e.g., floating-point multiplication.) These restrictions ensure that programs do not leak secrets via timing side-channels on hardware without microarchitectural features.

Unfortunately, these guarantees are moot for most modern hardware: Spectre [20], Meltdown [22], ZombieLoad [29], RIDL [32], and Fallout [5] are all dramatic examples of attacks that exploit microarchitectural features. These attacks reveal that code that is deemed constant-time in the usual sense may, in fact, leak information on processors with microarchitectural features. The decade-old constant-time recipes are no longer enough. (OpenSSL found this situation so hopeless that they recently updated their security model to explicitly exclude "physical system side channels" [25].)

In this work, we lay the foundations for constant-time in the presence of microarchitectural features that have been exploited in recent attacks: out-of-order and speculative execution. We focus on constant-time for two key reasons. First, impact: constant-time programming is largely used in real-world crypto libraries (and high-assurance code), where developers already go to great lengths to eliminate leaks via side-channels. Second, foundations: constant-time programming is already rooted in foundations, with well-defined semantics [4, 8]. These semantics consider very powerful attackers; e.g., attackers in [4] have control over the cache and the scheduler. An advantage of considering powerful attackers is that the semantics can overlook many hardware details (e.g., since the cache is adversarially controlled, there is no point in modeling it precisely), making constant-time amenable to automated verification and enforcement.

Contributions. We first define a semantics for an abstract, three-stage (fetch, execute, and retire) machine. Our machine supports out-of-order and speculative execution by modeling reorder buffers and transient instructions, respectively. We assume that attackers have complete control over microarchitectural features (e.g., the branch target predictor) when executing a victim program and model the attacker's control over predictors using directives. This keeps our semantics simple yet powerful: our semantics abstracts over all predictors when proving security (assuming, of course, that predictors themselves do not leak secrets). We further show how our semantics can be extended to capture new predictors, e.g., a hypothetical memory aliasing predictor.

We then define speculative constant-time, an extension of constant-time for machines with out-of-order and speculative execution. This definition allows us to discover microarchitectural side channels in a principled way: all four classes of Spectre attacks as classified by Canella et al. [6], for example, manifest as violations of our constant-time property.

We further use our semantics as the basis for a prototype analysis tool, Pitchfork, built on top of the angr symbolic execution engine [30]. Like other symbolic analysis tools, Pitchfork suffers from path explosion, which limits the depth of speculation we can analyze. Nevertheless, we are able to use Pitchfork to detect multiple Spectre bugs in real code. We use Pitchfork to detect leaks in the well-known Kocher test cases [19] for Spectre v1, as well as our more extensive test suite which includes Spectre v1.1 variants. More significantly, we use Pitchfork to analyze, and find leaks in, real cryptographic code from the libsodium, OpenSSL, and curve25519-donna libraries.

Open source. Pitchfork and our test suites are open source and available at https://pitchfork.programming.systems.

2 Motivating Examples

In this section, we show why classical constant-time programming is insufficient when attackers can exploit microarchitectural features. We do this via two example attacks and show how these attacks are captured by our semantics.

Classical constant time is not enough. Our first example consists of 3 lines of code, shown in Figure 1 (top right). The program, a variant of the classical Spectre v1 attack [20], branches on the value of register r_a (line 1). If r_a's value is smaller than 4, the program jumps to program location […]

[…] before evaluating the branch condition. In that case, the processor guesses which branch will be taken. For example, the processor may erroneously guess that the branch condition at line 1 evaluates to true, even though r_a contains value 9. It will therefore continue down the "true" branch speculatively. In hardware, such guesses are made by a branch prediction unit, which may have been mistrained by an adversary.

These guesses, as well as additional choices such as execution order, are directly supplied by the adversary in our semantics. We model this through a series of directives, as shown on the bottom left of Figure 1. The directive fetch: true instructs our model to speculatively follow the true branch and to place the fetched instruction at index 1 in the reorder buffer. Similarly, the two following fetch directives place the loads at indices 2 and 3 in the buffer. The instructions in the reorder buffer, called transient instructions, do not necessarily match the original instructions, but can contain additional information (see Table 1). For instance, the transient version of the branch instruction records which branch has been speculatively taken.

In our example, the attacker next instructs the model to execute the first load, using the directive execute 2. Because the bounds check has not yet been executed, the load reads from the secret element Key[1], placing the value in r_b. The attacker then issues directive execute 3 to execute the following load; this load's address is calculated as 44 + Key[1]. Accessing this address affects externally visible cache state, allowing the attacker to recover Key[1] through a cache side-channel attack [16]. This is encoded by the leakage observation shown in red on the bottom right. Though this secret leakage cannot happen under sequential execution, our semantics clearly highlights the possible leak when we account for microarchitectural features.

Modeling hypothetical attacks. Next, we give an example of a hypothetical class of Spectre attack captured by our extended semantics. The attack is based on a microarchitectural feature which would allow processors to speculate whether a store and load pair might operate on the same address, and forward values between them [18, 28].

We demonstrate this attack in Figure 2. The reorder buffer, after all instructions have been fetched, is shown in the top right.
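
The Spectre v1 walk-through in Section 2 (fetch: true, two further fetches, then execute 2 and execute 3) can be replayed with a small toy model. The sketch below is an illustration only and is not the paper's formal semantics: the memory layout, the secret bytes, the base address 40 of the first load, and the names `run`, `PROGRAM`, `MEMORY`, and `KEY` are all assumptions introduced here. Only the bounds check against 4, the register value 9, and the second load address 44 + Key[1] come from the text. Branch resolution, rollback, and retirement are deliberately left out.

```python
# Toy replay of the Spectre v1 walk-through. NOT the paper's formal semantics.
# Assumed (hypothetical) layout: public arrays at 40..47, secret Key at 48..51.
KEY = [0x11, 0x03, 0x22, 0x33]                   # made-up secret bytes
MEMORY = {addr: 0 for addr in range(40, 48)}     # public data, all zero
MEMORY.update({48 + i: byte for i, byte in enumerate(KEY)})

# Lines 2 and 3 of the program: (opcode, base address, index register, destination).
# Line 1 is the bounds check "r_a < 4", handled directly in run().
PROGRAM = {
    2: ("load", 40, "ra", "rb"),
    3: ("load", 44, "rb", "rc"),
}


def run(directives, regs):
    """Apply attacker-chosen directives and return the observations they leak."""
    regs = dict(regs)   # scratch copy; transient results are written here directly
    rob = {}            # reorder buffer: index -> transient instruction
    pc = 1
    leaks = []
    for directive in directives:
        if directive[0] == "fetch":
            if pc == 1:
                guess = directive[1]          # the attacker supplies the prediction
                rob[1] = ("br", guess)        # transient branch records the guess
                pc = 2 if guess else 4
            elif pc in PROGRAM:
                rob[pc] = PROGRAM[pc]
                pc += 1
        elif directive[0] == "execute":
            # Only loads are modeled; executing the branch is out of scope here.
            _, base, index_reg, dest = rob[directive[1]]
            addr = base + regs[index_reg]
            regs[dest] = MEMORY.get(addr, 0)
            leaks.append(("read", addr))      # the address is attacker-visible
    return leaks


attack = [("fetch", True), ("fetch",), ("fetch",), ("execute", 2), ("execute", 3)]
observed = run(attack, {"ra": 9})
recovered = observed[1][1] - 44               # second address is 44 + Key[1]
print(observed, recovered)

# With a correct prediction the loads are never fetched, so nothing leaks.
print(run([("fetch", False)], {"ra": 9}))
```

Under these assumptions the first load touches address 49 (the slot holding Key[1]) and the second touches 44 + Key[1], so subtracting 44 from the second observed address yields the secret byte. Supplying the correct prediction instead leaves the observation list empty, which mirrors the point that the leak does not arise under sequential execution.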