Exploiting Speculative Execution
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
Computer Science 246 Computer Architecture Spring 2010 Harvard University
Computer Science 246 Computer Architecture Spring 2010 Harvard University Instructor: Prof. David Brooks [email protected] Dynamic Branch Prediction, Speculation, and Multiple Issue Computer Science 246 David Brooks Lecture Outline • Tomasulo’s Algorithm Review (3.1-3.3) • Pointer-Based Renaming (MIPS R10000) • Dynamic Branch Prediction (3.4) • Other Front-end Optimizations (3.5) – Branch Target Buffers/Return Address Stack Computer Science 246 David Brooks Tomasulo Review • Reservation Stations – Distribute RAW hazard detection – Renaming eliminates WAW hazards – Buffering values in Reservation Stations removes WARs – Tag match in CDB requires many associative compares • Common Data Bus – Achilles heal of Tomasulo – Multiple writebacks (multiple CDBs) expensive • Load/Store reordering – Load address compared with store address in store buffer Computer Science 246 David Brooks Tomasulo Organization From Mem FP Op FP Registers Queue Load Buffers Load1 Load2 Load3 Load4 Load5 Store Load6 Buffers Add1 Add2 Mult1 Add3 Mult2 Reservation To Mem Stations FP adders FP multipliers Common Data Bus (CDB) Tomasulo Review 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 LD F0, 0(R1) Iss M1 M2 M3 M4 M5 M6 M7 M8 Wb MUL F4, F0, F2 Iss Iss Iss Iss Iss Iss Iss Iss Iss Ex Ex Ex Ex Wb SD 0(R1), F0 Iss Iss Iss Iss Iss Iss Iss Iss Iss Iss Iss Iss Iss M1 M2 M3 Wb SUBI R1, R1, 8 Iss Ex Wb BNEZ R1, Loop Iss Ex Wb LD F0, 0(R1) Iss Iss Iss Iss M Wb MUL F4, F0, F2 Iss Iss Iss Iss Iss Ex Ex Ex Ex Wb SD 0(R1), F0 Iss Iss Iss Iss Iss Iss Iss Iss Iss M1 M2 -
Branch Prediction Side Channel Attacks
Predicting Secret Keys via Branch Prediction Onur Ac³i»cmez1, Jean-Pierre Seifert2;3, and C»etin Kaya Ko»c1;4 1 Oregon State University School of Electrical Engineering and Computer Science Corvallis, OR 97331, USA 2 Applied Security Research Group The Center for Computational Mathematics and Scienti¯c Computation Faculty of Science and Science Education University of Haifa Haifa 31905, Israel 3 Institute for Computer Science University of Innsbruck 6020 Innsbruck, Austria 4 Information Security Research Center Istanbul Commerce University EminÄonÄu,Istanbul 34112, Turkey [email protected], [email protected], [email protected] Abstract. This paper presents a new software side-channel attack | enabled by the branch prediction capability common to all modern high-performance CPUs. The penalty payed (extra clock cycles) for a mispredicted branch can be used for cryptanalysis of cryptographic primitives that employ a data-dependent program flow. Analogous to the recently described cache-based side-channel attacks our attacks also allow an unprivileged process to attack other processes running in parallel on the same processor, despite sophisticated partitioning methods such as memory protection, sandboxing or even virtualization. We will discuss in detail several such attacks for the example of RSA, and experimentally show their applicability to real systems, such as OpenSSL and Linux. More speci¯cally, we will present four di®erent types of attacks, which are all derived from the basic idea underlying our novel side-channel attack. Moreover, we also demonstrate the strength of the branch prediction side-channel attack by rendering the obvious countermeasure in this context (Montgomery Multiplication with dummy-reduction) as useless. -
BRANCH PREDICTORS Mahdi Nazm Bojnordi Assistant Professor School of Computing University of Utah
BRANCH PREDICTORS Mahdi Nazm Bojnordi Assistant Professor School of Computing University of Utah CS/ECE 6810: Computer Architecture Overview ¨ Announcements ¤ Homework 2 release: Sept. 26th ¨ This lecture ¤ Dynamic branch prediction ¤ Counter based branch predictor ¤ Correlating branch predictor ¤ Global vs. local branch predictors Big Picture: Why Branch Prediction? ¨ Problem: performance is mainly limited by the number of instructions fetched per second ¨ Solution: deeper and wider frontend ¨ Challenge: handling branch instructions Big Picture: How to Predict Branch? ¨ Static prediction (based on direction or profile) ¨ Always not-taken ¨ Target = next PC ¨ Always taken ¨ Target = unknown clk direction target ¨ Dynamic prediction clk PC + ¨ Special hardware using PC NPC 4 Inst. Memory Instruction Recall: Dynamic Branch Prediction ¨ Hardware unit capable of learning at runtime ¤ 1. Prediction logic n Direction (taken or not-taken) n Target address (where to fetch next) ¤ 2. Outcome validation and training n Outcome is computed regardless of prediction ¤ 3. Recovery from misprediction n Nullify the effect of instructions on the wrong path Branch Prediction ¨ Goal: avoiding stall cycles caused by branches ¨ Solution: static or dynamic branch predictor ¤ 1. prediction ¤ 2. validation and training ¤ 3. recovery from misprediction ¨ Performance is influenced by the frequency of branches (b), prediction accuracy (a), and misprediction cost (c) Branch Prediction ¨ Goal: avoiding stall cycles caused by branches ¨ Solution: static or dynamic branch predictor ¤ 1. prediction ¤ 2. validation and training ¤ 3. recovery from misprediction ¨ Performance is influenced by the frequency of branches (b), prediction accuracy (a), and misprediction cost (c) ��� ���� ��� 1 + �� ������� = = 234 = ��� ���� ���567 1 + 1 − � �� Problem ¨ A pipelined processor requires 3 stall cycles to compute the outcome of every branch before fetching next instruction; due to perfect forwarding/bypassing, no stall cycles are required for data/structural hazards; every 5th instruction is a branch. -
The James Bond Quiz Eye Spy...Which Bond? 1
THE JAMES BOND QUIZ EYE SPY...WHICH BOND? 1. 3. 2. 4. EYE SPY...WHICH BOND? 5. 6. WHO’S WHO? 1. Who plays Kara Milovy in The Living Daylights? 2. Who makes his final appearance as M in Moonraker? 3. Which Bond character has diamonds embedded in his face? 4. In For Your Eyes Only, which recurring character does not appear for the first time in the series? 5. Who plays Solitaire in Live And Let Die? 6. Which character is painted gold in Goldfinger? 7. In Casino Royale, who is Solange married to? 8. In Skyfall, which character is told to “Think on your sins”? 9. Who plays Q in On Her Majesty’s Secret Service? 10. Name the character who is the head of the Japanese Secret Intelligence Service in You Only Live Twice? EMOJI FILM TITLES 1. 6. 2. 7. ∞ 3. 8. 4. 9. 5. 10. GUESS THE LOCATION 1. Who works here in Spectre? 3. Who lives on this island? 2. Which country is this lake in, as seen in Quantum Of Solace? 4. Patrice dies here in Skyfall. Name the city. GUESS THE LOCATION 5. Which iconic landmark is this? 7. Which country is this volcano situated in? 6. Where is James Bond’s family home? GUESS THE LOCATION 10. In which European country was this iconic 8. Bond and Anya first meet here, but which country is it? scene filmed? 9. In GoldenEye, Bond and Xenia Onatopp race their cars on the way to where? GENERAL KNOWLEDGE 1. In which Bond film did the iconic Aston Martin DB5 first appear? 2. -
Första Bilderna Från Inspelningen Av SPECTRE
Första bilderna från inspelningen av SPECTRE SPECTRE - THE 24TH JAMES BOND ADVENTURE WATCH FIRST FOOTAGE FROM THE SET OF SPECTRE ON www.007.com FIRST LOOK IMAGE OF DANIEL CRAIG IN SPECTRE AVAILABLE ON www.sffilm.se At 07:00 on Thursday 12th February, watch the first exciting footage of SPECTRE from Austria on www.007.com. Featuring behind the scenes action with Daniel Craig, Léa Seydoux, Dave Bautista and Director Sam Mendes. Associate Producer, Gregg Wilson says "We have to deliver an amazing sequence and this is going to be one of the major action sequences of the movie, a jewel in the crown so to speak. It's going to be spectacular and Austria seemed to offer everything that we needed to pull it off." Production Designer Dennis Gassner adds, "The thing that Sam and I talked about was how we are going to top SKYFALL, it's going to be SPECTRE and so far it's a great start. I think that we are going to continue the history of the Bond films, making things that are exciting for the audience to look at and what could be more exciting than to be on top of the world." SYNOPSIS: A cryptic message from Bond's past sends him on a trail to uncover a sinister organisation. While M battles political forces to keep the secret service alive, Bond peels back the layers of deceit to reveal the terrible truth behind SPECTRE. SPECTRE will be released on November 6, 2015 Also available on the official James Bond social channels facebook.com/JamesBond007 and @007 About Albert R. -
Spectre, Connoting a Denied That This Was a Reference to the Earlier Films
Key Terms and Consider INTERTEXTUALITY Consider NARRATIVE conventions The white tuxedo intertextually references earlier Bond Behind Bond, image of a man wearing a skeleton mask and films (previous Bonds, including Roger Moore, have worn bone design on his jacket. Skeleton has connotations of Central image, protag- the white tuxedo, however this poster specifically refer- death and danger and the mask is covering up someone’s onist, hero, villain, title, ences Sean Connery in Goldfinger), providing a sense of identity, someone who wishes to remain hidden, someone star appeal, credit block, familiarity, nostalgia and pleasure to fans who recognise lurking in the shadows. It is quite easy to guess that this char- frame, enigma codes, the link. acter would be Propp’s villain and his mask that is reminis- signify, Long shot, facial Bond films have often deliberately referenced earlier films cent of such holidays as Halloween or Day of the Dead means expression, body lan- in the franchise, for example the ‘Bond girl’ emerging he is Bond’s antagonist and no doubt wants to kill him. This guage, colour, enigma from the sea (Ursula Andress in Dr No and Halle Berry in acts as an enigma code for theaudience as we want to find codes. Die Another Day). Daniel Craig also emerged from the sea out who this character is and why he wants Bond. The skele- in Casino Royale, his first outing as Bond, however it was ton also references the title of the film, Spectre, connoting a denied that this was a reference to the earlier films. ghostly, haunting presence from Bond’s past. -
18-741 Advanced Computer Architecture Lecture 1: Intro And
Computer Architecture Lecture 10: Branch Prediction Prof. Onur Mutlu ETH Zürich Fall 2017 25 October 2017 Mid-Semester Exam November 30 In class Questions similar to homework questions 2 High-Level Summary of Last Week SIMD Processing Array Processors Vector Processors SIMD Extensions Graphics Processing Units GPU Architecture GPU Programming 3 Agenda for Today & Tomorrow Control Dependence Handling Problem Six solutions Branch Prediction Other Methods of Control Dependence Handling 4 Required Readings McFarling, “Combining Branch Predictors,” DEC WRL Technical Report, 1993. Required T. Yeh and Y. Patt, “Two-Level Adaptive Training Branch Prediction,” Intl. Symposium on Microarchitecture, November 1991. MICRO Test of Time Award Winner (after 24 years) Required 5 Recommended Readings Smith and Sohi, “The Microarchitecture of Superscalar Processors,” Proceedings of the IEEE, 1995 More advanced pipelining Interrupt and exception handling Out-of-order and superscalar execution concepts Recommended Kessler, “The Alpha 21264 Microprocessor,” IEEE Micro 1999. Recommended 6 Control Dependence Handling 7 Control Dependence Question: What should the fetch PC be in the next cycle? Answer: The address of the next instruction All instructions are control dependent on previous ones. Why? If the fetched instruction is a non-control-flow instruction: Next Fetch PC is the address of the next-sequential instruction Easy to determine if we know the size of the fetched instruction If the instruction that is fetched is a control-flow -
Privacy and Security
Privacy and Security Sekar Kulandaivel, Jennifer Xiao - April 21, 2020 Understanding Contention-Based Channels and Using Them for Agenda Defense Spectre Attacks: Exploiting Speculative Execution Understanding Contention-Based Channels and Using Them for Defense (HPCA ‘15) Distrustful tenants living within a neutral cloud provider ● Shared hardware can be exploited to leak information ○ e.g. CPU usage vs. operation can expose secret key ● Two bodies of solutions: ○ HW-based: state-of-the-art is either limited in scope or requires impractical architecture changes ○ SW-based: HomeAlone forgoes shared hardware and permits only friendly co-residency, but still vulnerable to an intelligent attacker Threat model of a co-resident attacker ● Distrustful tenants violate confidentiality or compromise availability ● Goal: infer info about victim VM via microarchitectural structures e.g. cache and memory controllers ● Side-channel: victim inadvertently (oops!) leaks data inferred by attacker ● Covert channel: privileged malicious process on victim deliberately leaks data to attacker Known side-channels to transmit a ‘0’ or a ‘1’ (alt. exec.) ● Alternative execution attacks ○ Timing-driven: measure time to access memory portion ○ Access-driven: measure time to access specific cache misses Known side-channels to transmit a ‘0’ or a ‘1’ (parallel exec.) ● Parallel execution attacks ○ No time sharing required ○ E.g. Receiver monitors latency of memory fetch, sender either issues more instructions or idles Formal model of covert channels ● Detection failure (undetectable flow) = same rate of false positives and false negatives for both legitimate and covert traffic ● Network vs. microarchitectural channels: ○ Network receivers read silently ○ Microarch. receivers read destructively (overwrites when reading) ● Main insight: network channels are provably undetectable whereas microarch. -
OS and Compiler Considerations in the Design of the IA-64 Architecture
OS and Compiler Considerations in the Design of the IA-64 Architecture Rumi Zahir (Intel Corporation) Dale Morris, Jonathan Ross (Hewlett-Packard Company) Drew Hess (Lucasfilm Ltd.) This is an electronic reproduction of “OS and Compiler Considerations in the Design of the IA-64 Architecture” originally published in ASPLOS-IX (the Ninth International Conference on Architectural Support for Programming Languages and Operating Systems) held in Cambridge, MA in November 2000. Copyright © A.C.M. 2000 1-58113-317-0/00/0011...$5.00 Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full cita- tion on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permis- sion and/or a fee. Request permissions from Publications Dept, ACM Inc., fax +1 (212) 869-0481, or [email protected]. ASPLOS 2000 Cambridge, MA Nov. 12-15 , 2000 212 OS and Compiler Considerations in the Design of the IA-64 Architecture Rumi Zahir Jonathan Ross Dale Morris Drew Hess Intel Corporation Hewlett-Packard Company Lucas Digital Ltd. 2200 Mission College Blvd. 19447 Pruneridge Ave. P.O. Box 2459 Santa Clara, CA 95054 Cupertino, CA 95014 San Rafael, CA 94912 [email protected] [email protected] [email protected] [email protected] ABSTRACT system collaborate to deliver higher-performance systems. -
Selective Eager Execution on the Polypath Architecture
Selective Eager Execution on the PolyPath Architecture Artur Klauser, Abhijit Paithankar, Dirk Grunwald [klauser,pvega,grunwald]@cs.colorado.edu University of Colorado, Department of Computer Science Boulder, CO 80309-0430 Abstract average misprediction latency is the sum of the architected latency (pipeline depth) and a variable latency, which depends on data de- Control-flow misprediction penalties are a major impediment pendencies of the branch instruction. The overall cycles lost due to high performance in wide-issue superscalar processors. In this to branch mispredictions are the product of the total number of paper we present Selective Eager Execution (SEE), an execution mispredictions incurred during program execution and the average model to overcome mis-speculation penalties by executing both misprediction latency. Thus, reduction of either component helps paths after diffident branches. We present the micro-architecture decrease performance loss due to control mis-speculation. of the PolyPath processor, which is an extension of an aggressive In this paper we propose Selective Eager Execution (SEE) and superscalar, out-of-order architecture. The PolyPath architecture the PolyPath architecture model, which help to overcome branch uses a novel instruction tagging and register renaming mechanism misprediction latency. SEE recognizes the fact that some branches to execute instructions from multiple paths simultaneously in the are predicted more accurately than others, where it draws from re- same processor pipeline, while retaining maximum resource avail- search in branch confidence estimation [4, 6]. SEE behaves like ability for single-path code sequences. a normal monopath speculative execution architecture for highly Results of our execution-driven, pipeline-level simulations predictable branches; it predicts the most likely successor path of a show that SEE can improve performance by as much as 36% for branch, and evaluates instructions only along this path. -
State of the Art Regarding Both Compiler Optimizations for Instruction Fetch, and the Fetch Architectures for Which We Try to Optimize Our Applications
¢¡¤£¥¡§¦ ¨ © ¡§ ¦ £ ¡ In this work we are trying to increase fetch performance using both software and hardware tech- niques, combining them to obtain maximum performance at the minimum cost and complexity. In this chapter we introduce the state of the art regarding both compiler optimizations for instruction fetch, and the fetch architectures for which we try to optimize our applications. The first section introduces code layout optimizations. The compiler can influence instruction fetch performance by selecting the layout of instructions in memory. This determines both the conflict rate of the instruction cache, and the behavior (taken or not taken) of conditional branches. We present an overview of the different layout algorithms proposed in the literature, and how our own algorithm relates to them. The second section shows how instruction fetch architectures have evolved from pipelined processors to the more aggressive wide issue superscalars. This implies increasing fetch perfor- mance from a single instruction every few cycles, to one instruction per cycle, to a full basic block per cycle, and finally to multiple basic blocks per cycle. We emphasize the description of the trace cache, as it will be the reference architecture in most of our work. 2.1 Code layout optimizations The mapping of instruction to memory is determined by the compiler. This mapping determines not only the code page where an instruction is found, but also the cache line (or which set in a set associative cache) it will map to. Furthermore, a branch will be taken or not taken depending on the placement of the successor basic blocks. By mapping instructions in a different order, the compiler has a direct impact on the fetch engine performance. -
Igpu: Exception Support and Speculative Execution on Gpus
Appears in the 39th International Symposium on Computer Architecture, 2012 iGPU: Exception Support and Speculative Execution on GPUs Jaikrishnan Menon, Marc de Kruijf, Karthikeyan Sankaralingam Department of Computer Sciences University of Wisconsin-Madison {menon, dekruijf, karu}@cs.wisc.edu Abstract the evolution of traditional CPUs to illuminate why this pro- gression appears natural and imminent. Since the introduction of fully programmable vertex Just as CPU programmers were forced to explicitly man- shader hardware, GPU computing has made tremendous age CPU memories in the days before virtual memory, for advances. Exception support and speculative execution are almost a decade, GPU programmers directly and explicitly the next steps to expand the scope and improve the usabil- managed the GPU memory hierarchy. The recent release ity of GPUs. However, traditional mechanisms to support of NVIDIA’s Fermi architecture and AMD’s Fusion archi- exceptions and speculative execution are highly intrusive to tecture, however, has brought GPUs to an inflection point: GPU hardware design. This paper builds on two related in- both architectures implement a unified address space that sights to provide a unified lightweight mechanism for sup- eliminates the need for explicit memory movement to and porting exceptions and speculation on GPUs. from GPU memory structures. Yet, without demand pag- First, we observe that GPU programs can be broken into ing, something taken for granted in the CPU space, pro- code regions that contain little or no live register state at grammers must still explicitly reason about available mem- their entry point. We then also recognize that it is simple to ory. The drawbacks of exposing physical memory size to generate these regions in such a way that they are idempo- programmers are well known.