Branchscope: a New Side-Channel Attack on Directional Branch Predictor

Total Page:16

File Type:pdf, Size:1020Kb

Branchscope: a New Side-Channel Attack on Directional Branch Predictor Session 8A: Security and Protection ASPLOS’18, March 24–28, 2018, Williamsburg, VA, USA BranchScope: A New Side-Channel Attack on Directional Branch Predictor Dmitry Evtyushkin Ryan Riley College of William and Mary Carnegie Mellon University in Qatar [email protected] [email protected] Nael Abu-Ghazaleh Dmitry Ponomarev University of California Riverside Binghamton University [email protected] [email protected] Abstract Support for Programming Languages and Operating Systems (AS- We present BranchScope — a new side-channel attack where PLOS’18). ACM, New York, NY, USA, 15 pages. https://doi.org/10. the attacker infers the direction of an arbitrary conditional 1145/3173162.3173204 branch instruction in a victim program by manipulating the shared directional branch predictor. The directional compo- nent of the branch predictor stores the prediction on a given 1 Introduction branch (taken or not-taken) and is a different component Modern microprocessors rely on branch prediction units from the branch target buffer (BTB) attacked by previous (BPUs) to sustain uninterrupted instruction delivery to the work. BranchScope is the first fine-grained attack on the di- execution pipeline across conditional branches. When multi- rectional branch predictor, expanding our understanding of ple processes execute on the same physical core, they share the side channel vulnerability of the branch prediction unit. a single BPU. While attractive from utilization and complex- Our attack targets complex hybrid branch predictors with ity considerations, the sharing potentially opens the door unknown organization. We demonstrate how an attacker an attacker to manipulate the shared BPU state, create a can force these predictors to switch to a simple 1-level mode side-channel, and derive a direction or target of a branch to simplify the direction recovery. We carry out BranchScope instruction executed by a victim process. Such leakage can on several recent Intel CPUs and also demonstrate the attack compromise sensitive data. For example, when a branch in- against an SGX enclave. struction is conditioned on a bit of a secret key, the key CCS Concepts • Security and privacy → Side-channel bits are leaked directly. This occurs in implementations of analysis and countermeasures; Hardware reverse en- exponentiation algorithms [13, 32] and other key mathemat- gineering; ical operations [3] of modern cryptographic schemes. The attacker may also change the predictor state, changing its Keywords Branch Predictor, Attack, Side-channel, SGX, behavior in the victim. Microarchitecture Security, Timing Attacks, Performance On modern microprocessors, the BPU is composed of two Counters structures: the branch target buffer (BTB) and the directional predictor. Previous work has specifically targeted the BTB to ACM Reference Format: create side channels [1, 3, 21, 35]. In the BTB, the target of a Dmitry Evtyushkin, Ryan Riley, Nael Abu-Ghazaleh, and Dmitry conditional branch is updated only when the branch is taken; Ponomarev. 2018. BranchScope: A New Side-Channel Attack on this can be exploited to detect whether or not a particular Directional Branch Predictor. In Proceedings of 2018 Architectural victim branch is taken. The first attack in this area proposed several BTB-based attacks that are based on filling the BTB Permission to make digital or hard copies of all or part of this work for by the attacker, causing the eviction of entries belonging to personal or classroom use is granted without fee provided that copies the victim. By observing the timing of future accesses [3], are not made or distributed for profit or commercial advantage and that the attacker can infer new branches executed by the victim. copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must We describe those attacks and their limitations in the related be honored. Abstracting with credit is permitted. To copy otherwise, or work section. In other work [21], we recently proposed a republish, to post on servers or to redistribute to lists, requires prior specific side-channel attack on the BTB that creates BTB collisions permission and/or a fee. Request permissions from [email protected]. between the victim and the attacker processes, thus allowing ASPLOS’18, March 24–28, 2018, Williamsburg, VA, USA the attacker to discover the location of a particular victim’s © 2018 Copyright held by the owner/author(s). Publication rights licensed branch instruction in the address space, bypassing address to the Association for Computing Machinery. ACM ISBN ISBN 978-1-4503-4911-6/18/03...$15.00 space layout randomization. Lee et al. [35] built on that work https://doi.org/10.1145/3173162.3173204 by exploiting the BTB collisions to also discover the direction 693 Session 8A: Security and Protection ASPLOS’18, March 24–28, 2018, Williamsburg, VA, USA of the victim’s branch instructions. They demonstrated the community considers defenses against these attacks, the vul- attack in kernel space against Intel SGX enclaves. nerability outlined in BranchScope must also be addressed. In this paper, we propose a new micro-architectural side- In summary, the main contributions and the key results channel attack, which we call BranchScope, that targets the of this paper are: directional predictor as the source of information leakage. To the best of our knowledge, BranchScope is the first at- • We propose BranchScope — the first side-channel at- tack exploiting the directional predictor structure, showing tack explicitly targeted at extracting sensitive infor- that BPUs can be vulnerable even if the BTB is protected. mation through the directional branch predictor (as BranchScope works by forcing collisions between the attacker opposed to existing work targeting the Branch Target and selected victim branches and exploiting these collisions Buffer). BranchScope is not affected by defenses against to infer information about the victim branch. This attack BTB-based attacks. has new challenges not present in a BTB attack. In order to • We demonstrate that BranchScope works reliably and achieve collisions, we must overcome the unpredictability of efficiently from user space across three generations of the complex hybrid prediction mechanisms used in modern Intel processors in the presence of system noise, with CPUs. BranchScope overcomes this by generating branch an error rate of less than 1%. patterns that force the branch predictor to select the local • We show that BranchScope can be naturally extended one-level prediction even when complex multi-level predic- to attack SGX enclaves with even lower error rates tors are present in the processor. Second, after collisions than in traditional systems. are reliably created, the victim’s branch direction can be ro- • We describe both hardware and software countermea- bustly disclosed by an attacker executing a pair of branches sures to mitigate BranchScope, providing branch pre- with predefined outcomes, measuring the prediction accu- diction units that are secure to side channel attacks. racy of these branches, and correlating this information to the predictor state and thus to the direction of the victim’s branch. 2 Background: Branch Predictor Unit We demonstrate BranchScope on three recent Intel x86_64 Modern branch predictors [15, 31, 41, 43, 50] are typically processors — Sandy Bridge, Haswell and Skylake. To perform implemented as a composition of a simple one-level bimodal BranchScope, the attacker does not need to reverse-engineer predictor indexed directly by the program counter (we refer the details of the branch predictor operation, and only needs to it as the 1-level predictor [49]), and a gshare-style 2-level to perform simple manipulations with the prediction state predictor [57]. The gshare-like predictor exploits the obser- machines from the user space. We also demonstrate how vation that the branch outcome depends on the results of BranchScope can be extended to attack SGX enclaves even if recent branches, and not only on the address of the branch. A recently-proposed protections are implemented. We show selector table indexed by the branch address identifies which that BranchScope can be performed across hyperthreaded predictor is likely to perform better for a particular branch cores, advancing previously demonstrated BTB-based attacks based on the previous behavior of the predictors. This design which leaked information only between processes scheduled combines the best features of both component predictors. on the same virtual core [21]. This capability relaxes the Figure 1 illustrates one possible design of such a hybrid attacker’s process scheduling constraints, allowing a more predictor. The 1-level predictor stores its history in the form flexible attack. Finally, we describe countermeasures topre- of a 2-bit saturating counter in a pattern history table (PHT). vent the BranchScope attack in future systems. The gshare predictor has a more complex indexing scheme The recent Meltdown [36] and Spectre [34] attacks demon- that combines the program counter with the global history strated the vulnerability of speculative execution to side- register (GHR). The GHR records the outcomes of the last sev- channel attacks, directly impacting the security of current eral branches executed by the program. The branch history systems and leading to data exfiltration. Branch predictors information is also stored in the PHT using a 2-bit saturating are critical to these attacks
Recommended publications
  • Branch Prediction Side Channel Attacks
    Predicting Secret Keys via Branch Prediction Onur Ac³i»cmez1, Jean-Pierre Seifert2;3, and C»etin Kaya Ko»c1;4 1 Oregon State University School of Electrical Engineering and Computer Science Corvallis, OR 97331, USA 2 Applied Security Research Group The Center for Computational Mathematics and Scienti¯c Computation Faculty of Science and Science Education University of Haifa Haifa 31905, Israel 3 Institute for Computer Science University of Innsbruck 6020 Innsbruck, Austria 4 Information Security Research Center Istanbul Commerce University EminÄonÄu,Istanbul 34112, Turkey [email protected], [email protected], [email protected] Abstract. This paper presents a new software side-channel attack | enabled by the branch prediction capability common to all modern high-performance CPUs. The penalty payed (extra clock cycles) for a mispredicted branch can be used for cryptanalysis of cryptographic primitives that employ a data-dependent program flow. Analogous to the recently described cache-based side-channel attacks our attacks also allow an unprivileged process to attack other processes running in parallel on the same processor, despite sophisticated partitioning methods such as memory protection, sandboxing or even virtualization. We will discuss in detail several such attacks for the example of RSA, and experimentally show their applicability to real systems, such as OpenSSL and Linux. More speci¯cally, we will present four di®erent types of attacks, which are all derived from the basic idea underlying our novel side-channel attack. Moreover, we also demonstrate the strength of the branch prediction side-channel attack by rendering the obvious countermeasure in this context (Montgomery Multiplication with dummy-reduction) as useless.
    [Show full text]
  • BRANCH PREDICTORS Mahdi Nazm Bojnordi Assistant Professor School of Computing University of Utah
    BRANCH PREDICTORS Mahdi Nazm Bojnordi Assistant Professor School of Computing University of Utah CS/ECE 6810: Computer Architecture Overview ¨ Announcements ¤ Homework 2 release: Sept. 26th ¨ This lecture ¤ Dynamic branch prediction ¤ Counter based branch predictor ¤ Correlating branch predictor ¤ Global vs. local branch predictors Big Picture: Why Branch Prediction? ¨ Problem: performance is mainly limited by the number of instructions fetched per second ¨ Solution: deeper and wider frontend ¨ Challenge: handling branch instructions Big Picture: How to Predict Branch? ¨ Static prediction (based on direction or profile) ¨ Always not-taken ¨ Target = next PC ¨ Always taken ¨ Target = unknown clk direction target ¨ Dynamic prediction clk PC + ¨ Special hardware using PC NPC 4 Inst. Memory Instruction Recall: Dynamic Branch Prediction ¨ Hardware unit capable of learning at runtime ¤ 1. Prediction logic n Direction (taken or not-taken) n Target address (where to fetch next) ¤ 2. Outcome validation and training n Outcome is computed regardless of prediction ¤ 3. Recovery from misprediction n Nullify the effect of instructions on the wrong path Branch Prediction ¨ Goal: avoiding stall cycles caused by branches ¨ Solution: static or dynamic branch predictor ¤ 1. prediction ¤ 2. validation and training ¤ 3. recovery from misprediction ¨ Performance is influenced by the frequency of branches (b), prediction accuracy (a), and misprediction cost (c) Branch Prediction ¨ Goal: avoiding stall cycles caused by branches ¨ Solution: static or dynamic branch predictor ¤ 1. prediction ¤ 2. validation and training ¤ 3. recovery from misprediction ¨ Performance is influenced by the frequency of branches (b), prediction accuracy (a), and misprediction cost (c) ��� ���� ��� 1 + �� ������� = = 234 = ��� ���� ���567 1 + 1 − � �� Problem ¨ A pipelined processor requires 3 stall cycles to compute the outcome of every branch before fetching next instruction; due to perfect forwarding/bypassing, no stall cycles are required for data/structural hazards; every 5th instruction is a branch.
    [Show full text]
  • 18-741 Advanced Computer Architecture Lecture 1: Intro And
    Computer Architecture Lecture 10: Branch Prediction Prof. Onur Mutlu ETH Zürich Fall 2017 25 October 2017 Mid-Semester Exam November 30 In class Questions similar to homework questions 2 High-Level Summary of Last Week SIMD Processing Array Processors Vector Processors SIMD Extensions Graphics Processing Units GPU Architecture GPU Programming 3 Agenda for Today & Tomorrow Control Dependence Handling Problem Six solutions Branch Prediction Other Methods of Control Dependence Handling 4 Required Readings McFarling, “Combining Branch Predictors,” DEC WRL Technical Report, 1993. Required T. Yeh and Y. Patt, “Two-Level Adaptive Training Branch Prediction,” Intl. Symposium on Microarchitecture, November 1991. MICRO Test of Time Award Winner (after 24 years) Required 5 Recommended Readings Smith and Sohi, “The Microarchitecture of Superscalar Processors,” Proceedings of the IEEE, 1995 More advanced pipelining Interrupt and exception handling Out-of-order and superscalar execution concepts Recommended Kessler, “The Alpha 21264 Microprocessor,” IEEE Micro 1999. Recommended 6 Control Dependence Handling 7 Control Dependence Question: What should the fetch PC be in the next cycle? Answer: The address of the next instruction All instructions are control dependent on previous ones. Why? If the fetched instruction is a non-control-flow instruction: Next Fetch PC is the address of the next-sequential instruction Easy to determine if we know the size of the fetched instruction If the instruction that is fetched is a control-flow
    [Show full text]
  • State of the Art Regarding Both Compiler Optimizations for Instruction Fetch, and the Fetch Architectures for Which We Try to Optimize Our Applications
    ¢¡¤£¥¡§¦ ¨ © ¡§ ¦ £ ¡ In this work we are trying to increase fetch performance using both software and hardware tech- niques, combining them to obtain maximum performance at the minimum cost and complexity. In this chapter we introduce the state of the art regarding both compiler optimizations for instruction fetch, and the fetch architectures for which we try to optimize our applications. The first section introduces code layout optimizations. The compiler can influence instruction fetch performance by selecting the layout of instructions in memory. This determines both the conflict rate of the instruction cache, and the behavior (taken or not taken) of conditional branches. We present an overview of the different layout algorithms proposed in the literature, and how our own algorithm relates to them. The second section shows how instruction fetch architectures have evolved from pipelined processors to the more aggressive wide issue superscalars. This implies increasing fetch perfor- mance from a single instruction every few cycles, to one instruction per cycle, to a full basic block per cycle, and finally to multiple basic blocks per cycle. We emphasize the description of the trace cache, as it will be the reference architecture in most of our work. 2.1 Code layout optimizations The mapping of instruction to memory is determined by the compiler. This mapping determines not only the code page where an instruction is found, but also the cache line (or which set in a set associative cache) it will map to. Furthermore, a branch will be taken or not taken depending on the placement of the successor basic blocks. By mapping instructions in a different order, the compiler has a direct impact on the fetch engine performance.
    [Show full text]
  • Branch Prediction for Network Processors
    BRANCH PREDICTION FOR NETWORK PROCESSORS by David Bermingham, B.Eng Submitted in partial fulfilment of the requirements for the Degree of Doctor of Philosophy Dublin City University School of School of Electronic Engineering Supervisors: Dr. Xiaojun Wang Prof. Lingling Sun September 2010 I hereby certify that this material, which I now submit for assessment on the programme of study leading to the award of Doctor of Philosophy is entirely my own work, that I have exercised reasonable care to ensure that the work is original, and does not to the best of my knowledge breach any law of copy- right, and has not been taken from the work of others save and to the extent that such work has been cited and acknowledged within the text of my work. Signed: David Bermingham (Candidate) ID: Date: TABLE OF CONTENTS Abstract vii List of Figures ix List of Tables xii List of Acronyms xiv List of Peer-Reviewed Publications xvi Acknowledgements xviii 1 Introduction 1 1.1 Network Processors . 1 1.2 Trends Within Networks . 2 1.2.1 Bandwidth Growth . 2 1.2.2 Network Technologies . 3 1.2.3 Application and Service Demands . 3 1.3 Network Trends and Network Processors . 6 1.3.1 The Motivation for This Thesis . 7 1.4 Research Objectives . 9 1.5 Thesis Structure . 10 iii 2 Technical Background 12 2.1 Overview . 12 2.2 Networks . 13 2.2.1 Network Protocols . 13 2.2.2 Network Technologies . 14 2.2.3 Router Architecture . 17 2.3 Network Processors . 19 2.3.1 Intel IXP-28XX Network Processor .
    [Show full text]
  • Trends in Processor Architecture
    A. González Trends in Processor Architecture Trends in Processor Architecture Antonio González Universitat Politècnica de Catalunya, Barcelona, Spain 1. Past Trends Processors have undergone a tremendous evolution throughout their history. A key milestone in this evolution was the introduction of the microprocessor, term that refers to a processor that is implemented in a single chip. The first microprocessor was introduced by Intel under the name of Intel 4004 in 1971. It contained about 2,300 transistors, was clocked at 740 KHz and delivered 92,000 instructions per second while dissipating around 0.5 watts. Since then, practically every year we have witnessed the launch of a new microprocessor, delivering significant performance improvements over previous ones. Some studies have estimated this growth to be exponential, in the order of about 50% per year, which results in a cumulative growth of over three orders of magnitude in a time span of two decades [12]. These improvements have been fueled by advances in the manufacturing process and innovations in processor architecture. According to several studies [4][6], both aspects contributed in a similar amount to the global gains. The manufacturing process technology has tried to follow the scaling recipe laid down by Robert N. Dennard in the early 1970s [7]. The basics of this technology scaling consists of reducing transistor dimensions by a factor of 30% every generation (typically 2 years) while keeping electric fields constant. The 30% scaling in the dimensions results in doubling the transistor density (doubling transistor density every two years was predicted in 1975 by Gordon Moore and is normally referred to as Moore’s Law [21][22]).
    [Show full text]
  • 5 Microprocessors
    Color profile: Disabled Composite Default screen BaseTech / Mike Meyers’ CompTIA A+ Guide to Managing and Troubleshooting PCs / Mike Meyers / 380-8 / Chapter 5 5 Microprocessors “MEGAHERTZ: This is a really, really big hertz.” —DAVE BARRY In this chapter, you will learn or all practical purposes, the terms microprocessor and central processing how to Funit (CPU) mean the same thing: it’s that big chip inside your computer ■ Identify the core components of a that many people often describe as the brain of the system. You know that CPU CPU makers name their microprocessors in a fashion similar to the automobile ■ Describe the relationship of CPUs and memory industry: CPU names get a make and a model, such as Intel Core i7 or AMD ■ Explain the varieties of modern Phenom II X4. But what’s happening inside the CPU to make it able to do the CPUs amazing things asked of it every time you step up to the keyboard? ■ Install and upgrade CPUs 124 P:\010Comp\BaseTech\380-8\ch05.vp Friday, December 18, 2009 4:59:24 PM Color profile: Disabled Composite Default screen BaseTech / Mike Meyers’ CompTIA A+ Guide to Managing and Troubleshooting PCs / Mike Meyers / 380-8 / Chapter 5 Historical/Conceptual ■ CPU Core Components Although the computer might seem to act quite intelligently, comparing the CPU to a human brain hugely overstates its capabilities. A CPU functions more like a very powerful calculator than like a brain—but, oh, what a cal- culator! Today’s CPUs add, subtract, multiply, divide, and move billions of numbers per second.
    [Show full text]
  • Opinion Ten Reasons Why HP’S Itanium-Based Servers Have Reached the Point-Of-No-Return
    Opinion Ten Reasons Why HP’s Itanium-based Servers Have Reached the Point-of-No-Return Executive Summary Hewlett-Packard (HP) refers to its HP Integrity and Integrity NonStop Itanium-based servers as “business critical systems”. In Q4, 2007, HP sold over $1 billion of these business critical systems. But, since then, due to a number of factors, Itanium-based server sales have declined significantly. Over the past year, business critical systems sales have hovered in the $400 million range per quarter, an almost 60% decline as compared with the 2007 high-mark. From our perspective, HP’s Itanium-based servers have now achieved a form of stasis (a medical term that refers to an inactive state). We expect a rise in Itanium business this quarter (due to pent-up demand for the new Itanium 9500), but we also expect that – within in a few quarters – underlying, dogging problems will again drive Itanium business downward. These problems include HP’s financial woes; increased competition (particularly from Intel x86-based servers); market factors (such as the market move toward Linux and a market preference for x86 architecture); a broken ecosystem (where HP has actually had to take legal action to get a business partner to keep supporting its independent software on Itanium-based platforms); an ill-founded Itanium recovery plan known as “converged infrastructure”; and more (in fact, we list a total of ten reasons why we believe HP’s Itanium-based servers have reached the point-of-no- return on page 2 of this Opinion)… In this Opinion , Clabby Analytics describes why we believe that HP’s business critical Integrity servers have now reached the point-of-no-return.
    [Show full text]
  • Energy Efficient Branch Prediction
    View metadata, citation and similar papers at core.ac.uk brought to you by CORE provided by University of Hertfordshire Research Archive Energy Efficient Branch Prediction Michael Andrew Hicks A thesis submitted in partial fulfilment of the requirements of the University of Hertfordshire for the degree of Doctor of Philosophy December 2007 To my family and friends. Contents 1 Introduction 1 1.1 Thesis Statement . 1 1.2 Motivation and Energy Efficiency . 1 1.3 Branch Prediction . 3 1.4 Contributions . 4 1.5 Dissertation Structure . 5 2 Energy Efficiency in Modern Processor Design 7 2.1 Transistor Level Power Dissipation . 7 2.1.1 Static Dissipation . 8 2.1.2 Dynamic Dissipation . 9 2.1.3 Energy Efficiency Metrics . 9 2.2 Transistor Level Energy Efficiency Techniques . 10 2.2.1 Clock Gating and Vdd Gating . 10 2.2.2 Technology Scaling . 11 2.2.3 Voltage Scaling . 11 2.2.4 Logic Optimisation . 11 2.3 Architecture & Software Level Efficiency Techniques . 11 2.3.1 Activity Factor Reduction . 12 2.3.2 Delay Reduction . 12 2.3.3 Low Power Scheduling . 12 2.3.4 Frequency Scaling . 13 2.4 Branch Prediction . 13 2.4.1 The Branch Problem . 13 2.4.2 Dynamic and Static Prediction . 14 2.4.3 Dynamic Predictors . 15 2.4.4 Power Consumption . 18 2.5 Summary . 18 3 Related Techniques 20 3.1 The Prediction Probe Detector (Hardware) . 20 3.1.1 Implementation . 20 3.1.2 Pipeline Gating . 22 i 3.2 Software Based Approaches . 23 3.2.1 Hinting and Hint Instructions .
    [Show full text]
  • T13: Advanced Processors – Branch Prediction
    ECE 4750 Computer Architecture, Fall 2020 T13 Advanced Processors: Branch Prediction School of Electrical and Computer Engineering Cornell University revision: 2020-11-11-15-47 1 Branch Prediction Overview 3 2 Software-Based Branch Prediction 4 2.1. Static Software Hints . .5 2.2. Branch Delay Slots . .6 2.3. Predication . .7 3 Hardware-Based Branch Prediction 8 3.1. Fixed Branch Predictor . .8 3.2. Branch History Table (BHT) Predictor . 10 3.3. Two-Level Predictor For Temporal Correlation . 14 3.4. Two-Level Predictor For Spatial Correlation . 15 3.5. Generalized Two-Level Predictors . 16 3.6. Tournament Predictors . 17 3.7. Branch Target Buffers (BTBs) Predictor . 18 Copyright © 2018 Christopher Batten, Christina Delimitrou. All rights reserved. This hand- out was originally prepared by Prof. Christopher Batten at Cornell University for ECE 4750 / CS 4420. It has since been updated by Prof. Christina Delimitrou. Download and use of this handout is permitted for individual educational non-commercial purposes only. 1 Redistribution either in part or in whole via both commercial or non-commercial means requires written permission. 2 1. Branch Prediction Overview 1. Branch Prediction Overview Assume incorrect branch prediction in dual-issue I2OL processor. bne opA opB opC opD opE opF opG opTARG Assume correct branch prediction in dual-issue I2OL processor. bne opA opTARG opX opY opZ Three critical pieces of information we need to predict control flow: • (1) Is this instruction a control flow instruction? • (2) What is the target of this control flow instruction? • (3) Do we redirect control flow to the target or next instr? 2 2.
    [Show full text]
  • The IA-32 Processor Architecture
    The IA-32 processor architecture Nicholas FitzRoy-Dale Document Revision: 1 Date: 2006/05/30 22:31:24 [email protected] http://www.cse.unsw.edu.au/∼disy/ Operating Systems and Distributed Systems Group School of Computer Science and Engineering The University of New South Wales UNSW Sydney 2052, Australia 1 Introduction This report discusses the most common instruction set architecture for desktop microprocessors: IA- 32. From a programmer’s perspective, IA-32 has not changed changed significantly since its introduc- tion with the Intel 80386 processor in 1985. IA-32 implementations, however, have undergone dra- matic changes in order to stay competitive with more modern architectures, particularly in the area of instruction-level parallelism. This report discusses the history of IA-32, and then the architectural features of recent IA-32 im- plementations, with particular regard to caching, multiprocessing, and instruction-level parallelism. An archtectural description is not particularly useful in isolation. Therefore, to provide context, each as- pect is compared with analogous features of other architectures, with particular attention paid to the RISC-style ARM processor and the VLIW-inspired Itanium. 2 A brief history of IA-32 IA-32 first appeared with the 80386 processor, but the architecture was by no means completely new. IA-32’s 8-bit predecessor first appeared in the Datapoint 2200 programmable terminal, released in 1971. Under contract to produce a single-chip version of the terminal’s multiple-chip TTL design, Intel’s im- plementation, the 8008, was not included in the terminal. Intel released the chip in 1972.
    [Show full text]
  • Bias-Free Branch Predictor
    Bias-Free Branch Predictor Dibakar Gope and Mikko H. Lipasti Department of Electrical and Computer Engineering University of Wisconsin - Madison [email protected], [email protected] Abstract—Prior research in neurally-inspired perceptron are separated by a function call containing many branches, predictors and Geometric History Length-based TAGE predic- a longer history is likely to capture the correlated branch tors has shown significant improvements in branch prediction that appeared in the dynamic execution stream prior to the accuracy by exploiting correlations in long branch histories. However, not all branches in the long branch history provide function call. useful context. Biased branches resolve as either taken or Prediction accuracy of the recently proposed ISL-TAGE not-taken virtually every time. Including them in the branch predictor [4] further confirms that looking at much longer predictor’s history does not directly contribute any useful histories (of the order of 2000 branches) can provide useful information, but all existing history-based predictors include information for prediction. However, scaling a state-of-the- them anyway. In this work, we propose Bias-Free branch predictors that art perceptron-based predictor [1] from 64KB to 1MB to are structured to learn correlations only with non-biased track distant branch correlations results in long computa- conditional branches, aka. branches whose dynamic behavior tional latency and high energy consumption in the large varies during a program’s execution. This, combined with a storage structures, which may prohibit the incorporation recency-stack-like management policy for the global history of such a branch predictor into a commercial processor.
    [Show full text]