Exploitable Hardware Features and Vulnerabilities Enhanced Side-Channel Attacks on Intel SGX and Their Countermeasures


Dissertation

Presented in Partial Fulfillment of the Requirements for the Degree Doctor of Philosophy in the Graduate School of The Ohio State University

By Guoxing Chen, B.S., M.S.

Graduate Program in Computer Science and Engineering

The Ohio State University, 2019

Dissertation Committee:
Dr. Ten H. Lai, Advisor
Dr. Yinqian Zhang, Co-Advisor
Dr. Radu Teodorescu
Dr. Zhiqiang Lin

© Copyright by Guoxing Chen 2019

Abstract

Intel Software Guard eXtensions (SGX) provides software applications with shielded execution environments to run private code and operate on sensitive data, where both the code and data are isolated from the rest of the software system. Despite its security promises, today's SGX design has been demonstrated to be vulnerable to various side-channel attacks, and countermeasures have been proposed to mitigate these attacks. However, the current understanding of the attack vectors and the corresponding countermeasures is insufficient. This dissertation explores new attacks in which the adversary exploits hardware features, such as Hyper-Threading and speculative execution, and aims to design comprehensive defense mechanisms that address existing threats. Specifically, we first demonstrate how to abuse Hyper-Threading to launch attacks that bypass existing AEX-based mitigations. Then, we introduce SGXPECTRE Attacks, the SGX variants of the recently disclosed Spectre attacks, which exploit speculative execution vulnerabilities to subvert the confidentiality of SGX enclaves. On the defense side, we first design and implement HYPERRACE, an LLVM-based tool for instrumenting SGX enclave programs to eradicate all side-channel threats due to Hyper-Threading. Then, to address the limitations of existing mitigations, we extend the idea of HYPERRACE and propose the concept of verifiable execution contracts, which request the privileged software to provide a benign execution environment for enclaves, within which launching attacks becomes infeasible.

Dedication

To my father, Yizai Chen, my mother, Linyan Yang, and my sisters, Fangfang Chen and Xiaofang Chen, who love and support me unconditionally in pursuing my dreams.

Acknowledgments

I would like to express my heartfelt gratitude to my advisors, Dr. Ten H. Lai and Dr. Yinqian Zhang, for their patient and careful supervision. Dr. Lai took me under his wing, offering me complete freedom in pursuing my own research interests and sharing with me his infectious optimism about research and life. Dr. Zhang led me to explore cutting-edge areas of research and patiently taught me to tackle research problems with his extensive knowledge and expertise. His incredible energy and passion for research have inspired me greatly. I feel doubly lucky to have both of them as my advisors.

I also want to thank my collaborators and mentors. In particular, I would like to thank Dr. Dong Xuan, who taught me a lot over the years. I truly enjoyed the moments when we worked together to build various amazing systems. Beyond research, Dr. Xuan is also a great friend, who gave me many valuable suggestions when I encountered difficulties and unexpected situations. I am also grateful to Dr. Michael Reiter, Dr. XiaoFeng Wang, and Dr. Zhiqiang Lin for their extensive advice and dedication to our collaborative research projects.
I feel so honored to have worked with all of them.

Vita

May 14, 1988: Born, Wenzhou, China.
2010: B.S. Information Engineering, Shanghai Jiao Tong University, Shanghai, China.
2013: M.S. Information and Communication Engineering, Shanghai Jiao Tong University, Shanghai, China.
2013-present: Ph.D. Candidate, Computer Science and Engineering, The Ohio State University, USA.

Publications

Research Publications

Guoxing Chen, Sanchuan Chen, Yuan Xiao, Yinqian Zhang, Zhiqiang Lin, and Ten H. Lai. SgxPectre Attacks: Stealing Intel Secrets from SGX Enclaves via Speculative Execution. In Proceedings of the IEEE European Symposium on Security and Privacy (EuroS&P), 2019.

Guoxing Chen*, Wenhao Wang* (*co-first authors), Tianyu Chen, Sanchuan Chen, Yinqian Zhang, XiaoFeng Wang, Ten H. Lai, and Dongdai Lin. Racing in Hyperspace: Closing Hyper-Threading Side Channels on SGX with Contrived Data Races. In Proceedings of the IEEE Symposium on Security and Privacy (S&P), 2018.

Guoxing Chen, Ten H. Lai, Michael Reiter, and Yinqian Zhang. Differentially Private Access Patterns for Searchable Symmetric Encryption. In Proceedings of the IEEE International Conference on Computer Communications (INFOCOM), 2018.

Wenhao Wang, Guoxing Chen, Xiaorui Pan, Yinqian Zhang, XiaoFeng Wang, Vincent Bindschaedler, Haixu Tang, and Carl A. Gunter. Leaky Cauldron on the Dark Land: Understanding Memory Side-Channel Hazards in SGX. In Proceedings of the ACM SIGSAC Conference on Computer and Communications Security (CCS), 2017.

Gang Li, Fan Yang, Guoxing Chen, Qiang Zhai, Xinfeng Li, Jin Teng, Junda Zhu, Dong Xuan, Biao Chen, and Wei Zhao. EV-Matching: Bridging Large Visual Data and Electronic Data for Efficient Surveillance. In Proceedings of the IEEE International Conference on Distributed Computing Systems (ICDCS), 2017.

Fan Yang, Qiang Zhai, Guoxing Chen, Adam C. Champion, Junda Zhu, and Dong Xuan. FlashLoc: Flashing Mobile Phones for Accurate Indoor Localization. In Proceedings of the IEEE International Conference on Computer Communications (INFOCOM), 2016.

Jihun Hamm, Adam Champion, Guoxing Chen, Mikhail Belkin, and Dong Xuan. Crowd-ML: A Privacy-Preserving Learning Framework for a Crowd of Smart Devices. In Proceedings of the IEEE International Conference on Distributed Computing Systems (ICDCS), 2015.

Wenjie Lin, Guoxing Chen, Ten H. Lai, and David Lee. Detecting the Vulnerability of Multi-Party Authorization Protocols to Name Matching Attacks. In Proceedings of the International Conference on Security and Management (SAM), 2014.

Guoxing Chen, Zhengzheng Xiang, Changqing Xu, and Meixia Tao. On Degrees of Freedom of Cognitive Networks with User Cooperation. In IEEE Wireless Communications Letters, 2012.

Fields of Study

Major Field: Computer Science and Engineering

Table of Contents

Abstract
Dedication
Acknowledgments
Vita
List of Tables
List of Figures

1. Introduction
   1.1 Overview
   1.2 HT-SPM: Hyper-Threading Assisted Sneaky Page Monitoring Attacks
   1.3 SGXPECTRE: Speculative Execution Enabled Side-Channel Attacks
   1.4 HYPERRACE: Hyper-Threading Side-Channel Mitigation
   1.5 Securing TEEs with Verifiable Execution Contracts

2. Background and Threat Model
   2.1 Intel SGX
   2.2 Intel Processor Internals
       2.2.1 Cache and Memory Hierarchy
       2.2.2 Hardware Extensions of Intel Processors
       2.2.3 Out-of-order and Speculative Execution
   2.3 Threat Model
   2.4 Existing Threats to SGX
   2.5 Effectiveness of Existing Defenses

3. HT-SPM: Hyper-Threading Assisted Sneaky Page Monitoring Attacks
   3.1 Overview
   3.2 Design
   3.3 Evaluation

4. SGXPECTRE: Speculative Execution Enabled Side-Channel Attacks
   4.1 SGXPECTRE Attacks
       4.1.1 A Simple Example
       4.1.2 Injecting Branch Targets into Enclaves
       4.1.3 Controlling Registers in Enclaves
       4.1.4 Leaking Secrets via Side Channels
       4.1.5 Winning a Race Condition
   4.2 Attack Gadgets Identification
       4.2.1 Types of Gadgets
       4.2.2 Symbolically Executing SGX Code
       4.2.3 Gadget Identification
       4.2.4 Experimental Results of Gadget Detection
   4.3 Stealing Enclave Secrets
       4.3.1 Reading Register Values from Arbitrary Enclaves
       4.3.2 Stealing Intel Secrets
   4.4 Evaluating Existing Countermeasures
   4.5 Is SGX Broken?
       4.5.1 Intel's Secrets
       4.5.2 Defense via Centralized Attestation Services
   4.6 Summary

5. HYPERRACE: Hyper-Threading Side-Channel Mitigation
   5.1 Overview
       5.1.1 Motivation
       5.1.2 Design Summary
   5.2 Physical-core Co-Location Tests
       5.2.1 Straw-man Solutions
       5.2.2 Co-Location Test via Data Race Probability
   5.3 Security Analysis of Co-location Tests
       5.3.1 Security Model
       5.3.2 Security Analysis
       5.3.3 Empirical Security Evaluation
   5.4 Protecting Enclave Programs with HYPERRACE
       5.4.1 Safeguarding Enclave Programs
       5.4.2 Implementation of HYPERRACE
   5.5 Performance Evaluation
       5.5.1 nbench
       5.5.2 Cryptographic Libraries
   5.6 Summary

6. Securing TEEs with Verifiable Execution Contracts
   6.1 Overview
       6.1.1 Limitations of Existing Defenses
       6.1.2 Verifiable Execution Contracts as Defense
   6.2 Execution Contracts
       6.2.1 Construction of Execution Contracts
       6.2.2 Security Guarantees
       6.2.3 Remaining Challenges
   6.3 Verifiability
       6.3.1 Available Signals
       6.3.2 Verifiability Models
       6.3.3 Verification of Proposed Contracts
   6.4 Implementation
       6.4.1 Enforcing Execution Contracts
       6.4.2 Verifying Execution Contracts
   6.5 Evaluation
       6.5.1 Security Evaluation
       6.5.2 Performance Evaluation
   6.6 Execution Contracts without Memory Confidentiality
       6.6.1 Threat Analysis
       6.6.2 Defeating Memory Leaks with Execution Contracts
       6.6.3 Microcode-Level Mitigation
       6.6.4 Preventing Replay Attacks
   6.7 Discussion
   6.8 Summary

7. Conclusion

Bibliography

List of Tables

2.1 MESI cache line states
2.2 Existing threats to SGX
3.1 Configuration of the testbed, available per logical core when Hyper-Threading is enabled
4.1 SGXPECTRE Attack Type-I gadgets in.