GILES-DOCUMENT-2019.Pdf

Total Page:16

File Type:pdf, Size:1020Kb

GILES-DOCUMENT-2019.Pdf ABSTRACT Hardware Transactional Persistent Memory by Ellis Giles Recent years have witnessed a sharp shift towards real-time data-driven and high- throughput applications, impelled by pervasive multi-core architectures and paral- lel programming models. This shift has spurred a broad adoption of in-memory databases and massively-parallel transaction processing across scientific, business, and industrial application domains. However, these applications are severely hand- icapped by the difficulties in maintaining persistence on typical durable media like hard-disk drives (HDDs) and solid-state drives (SSDs) without sacrificing either per- formance or reliability. Two emerging hardware developments hold enormous promise for transformative gains in both speed and scalability of concurrent data-intensive applications. The first is the arrival of Persistent Memory, or PM, a generic term for byte-addressable non- volatile memories, such as Intel's 3D XPointTM technology. The second is the avail- ability of CPU-based transaction support known as Hardware Transactional Mem- ory, or HTM, which makes it easier for applications to exploit multi-core concurrency without the need for expensive lock-based software. This thesis introduces Hardware Transactional Persistent Memory, the first union of HTM with PM without any changes to known processor designs or protocols, al- lowing for high-performance, concurrent, and durable transactions. The techniques presented are supported on three pillars: handling uncontrolled cache evictions from the processor cache hierarchy, logging to resist failure during persistent memory up- dates, and transaction ordering to permit consistent recovery from a machine crash. We develop pure software solutions that work with existing processor architectures as well as software-assisted solutions that exploit external memory controller hardware support. The thesis also introduces the notion of relaxed versus strict durability, allowing individual applications to tradeoff performance against robustness, while guaranteeing recovery to a consistent system state. Dedication This thesis is dedicated in memory of my father Vick Forrest Giles and as an inspi- ration to my children Alden Samuel Giles and Meredith Elise Giles. Acknowledgments First and foremost, I would like to thank my advisor Dr. Peter Varman for the many meetings and countless hours helping me develop research ideas and pointing out interesting areas for more exploration. Only with his guidance and support has this research, including this thesis and the many papers, been possible. I would also like to thank Dr. Kshitij Doshi from Intel Corporation for all of the guidance, insight, and assistance into the area of research along with the hours of help spent with papers and research ideas. Additionally, I would like to thank Dr. Joseph Cavallaro, Dr. Chris Jermaine, and Dr. Eugene Ng for providing valuable feedback and taking the time to serve on my thesis committee. Finally, I would like to thank all of my family. Without the continuing support from the entire family, I would not have been able to attend graduate school or immerse myself in research. My wife Mary-Helen was invaluable at providing support and helping with the children when graduate school made demands on my time. May this thesis inspire my children Alden Samuel Giles and Meredith Elise Giles to pursue higher education. My father Vick F. Giles also provided daily support for the children and guidance on my education. He would have wished to have seen the completion of the work and attended graduation. My mother and stepfather, Linda and Douglas Rowlett, also provided invaluable help and guidance along the way as well as inspiration with their Ph.D.'s from Rice University. I would also like to acknowledge the financial support from the National Science Foundation and the Software and Services Group at Intel Corporation. Contents Abstract ii Dedication iv Acknowledgments v List of Illustrations xi List of Tables xvii List of Algorithms xviii List of Theorems xix 1 Introduction and Overview 1 1.1 Introduction . .1 1.2 In-Memory Data Management Challenges . .2 1.3 Persistent Memory Overview . .4 1.3.1 Persistent Memory and Big Data . .6 1.4 Hardware Transactional Memory Overview . .7 1.5 HTM + PM Overview . 10 1.6 System Architecture Overview . 12 1.7 Contributions . 15 1.8 Thesis Organization . 18 2 Persistent Memory Transactions 20 2.1 Persistent Memory Access . 20 2.1.1 Persistent Memory Access Transaction Properties . 22 2.2 Durability for Persistent Memory . 24 2.2.1 Persistent Memory Models . 26 vii 2.3 Atomicity for Persistent Memory Transactions . 27 2.3.1 Undo Logging . 29 2.3.2 Block Copy-on-Write . 31 2.3.3 Redo Logging . 31 2.3.4 Versioning . 33 2.3.5 Summary . 34 2.4 Isolation for Concurrent Persistent Memory Transactions . 35 2.4.1 Two-Phase Locking . 36 2.4.2 Software Transactional Memory . 37 2.4.3 Hardware Transactional Memory . 39 2.4.4 Checkpoints . 41 2.4.5 Summary . 42 2.5 Related Persistent Memory Transaction Work . 43 2.5.1 Hardware Supported Transaction Work . 43 2.5.2 File System Work . 48 2.5.3 Software and Concurrent Transaction Work . 49 2.6 Summary . 53 3 Hardware Transactional Memory + Persistent Memory 54 3.1 HTM + PM Basics . 54 3.1.1 Fallback Paths . 56 3.2 Challenges Persisting HTM Transactions . 57 3.2.1 Logging Challenges . 60 3.2.2 Ordering Challenges . 64 3.3 Related Work . 66 3.4 Hardware Transactional Persistent Memory Model . 69 3.4.1 Ordering HTM Transactions . 74 3.4.2 System Checkpointing Model . 75 viii 3.4.3 Transaction Durability Model . 78 3.4.4 Transactional Durability Extensions . 79 3.4.5 Logging and Recovery . 80 3.4.6 Composability . 81 3.4.7 Example . 82 3.4.8 HTM Performance Tuning for Durability . 84 3.5 Summary . 92 4 Continuous Checkpointing HTM for PM 93 4.1 Background on Variable Aliasing for PM Transactions . 93 4.2 Continuous Checkpointing HTM for PM Approach . 95 4.2.1 Alias Table Model . 99 4.3 Algorithm and Implementation . 102 4.3.1 Blue and Red Queues . 103 4.3.2 Logging and Recovery . 104 4.3.3 Alias Table Implementation . 105 4.3.4 Lock Free Algorithm . 107 4.3.5 Data Structure Properties . 113 4.4 Evaluation . 114 4.4.1 Benchmarks . 116 4.4.2 Hash Table . 121 4.4.3 Red-Black Tree . 125 4.5 Summary . 128 5 Hardware Transactional Persistent Memory Controller 130 5.1 Background on WrAP Controller . 130 5.2 Persistent Memory Controller Approach . 132 5.2.1 Transaction Lifecycle . 133 5.2.2 Persistent Memory Controller . 136 ix 5.2.3 Recovery . 139 5.2.4 Protocol Properties . 140 5.3 Algorithm and Implementation . 141 5.3.1 Software Library . 142 5.3.2 PM Controller Implementation . 145 5.3.3 Example . 149 5.4 Evaluation . 150 5.4.1 Benchmarks . 152 5.4.2 Hash Table . 156 5.4.3 Red-Black Tree . 159 5.4.4 Persistent Memory Controller Analysis . 160 5.5 Summary . 165 6 Future Work and Conclusions 166 6.1 Future Work . 166 6.1.1 Future Software Work . 166 6.1.2 Future Persistent Memory Controller Work . 169 6.2 Conclusions . 170 A Persisting In-Memory Databases Using PM 173 A.1 Overview . 173 A.2 Redis Enhancements . 175 A.2.1 Local Alias Table Batched . 177 A.3 Evaluation . 179 A.4 Summary . 181 B Persistent Memory Containers 182 B.1 Virtualization Overview . 182 B.2 Containerized Persistent Memory Approach . 185 x B.3 Evaluation . 188 B.4 Summary . 190 C Evaluation on Intel Optane DC PM DIMMs 192 C.1 Persistent Memory Configuration . 192 C.2 Evaluation . 194 C.2.1 Benchmarks . 196 C.2.2 Hash Table . 200 C.2.3 Red-Black Tree . 205 C.3 Summary . 207 Bibliography 208 Illustrations 1.1 Persistent Memory provides a large quantity of fast, byte-addressable persistent memory alongside DRAM on the main memory bus. .5 1.2 Logical layout of an Intel E5 Series High-Core Count processor that supports HTM transactions with 18 cores and two memory controllers with both DRAM and PM . 13 1.3 Logical layout of two Intel Xeon processors on a motherboard with a total of four memory controllers accessing both DRAM and PM with HTM support. Processors and cores have shared and cache-coherent access to all memory locations. 14 2.1 Persistent Memory can be accessed directly using normal loads and stores, bypassing the file system interface and page caches. 21 2.2 Writes to persistent memory from a core require moving through multiple buffers, complicating persistence. 24 2.3 Writes to PM may be evicted out of the cache hierarchy in any order. 29 2.4 Undo Logging approach for transactional atomicity of persistent memory . 30 2.5 Redo Logging approach for transactional atomicity of persistent memory . ..
Recommended publications
  • The Case for Hardware Transactional Memory
    Lecture 20: Transactional Memory CMU 15-418: Parallel Computer Architecture and Programming (Spring 2012) Giving credit ▪ Many of the slides in today’s talk are the work of Professor Christos Kozyrakis (Stanford University) (CMU 15-418, Spring 2012) Raising level of abstraction for synchronization ▪ Machine-level synchronization prims: - fetch-and-op, test-and-set, compare-and-swap ▪ We used these primitives to construct higher level, but still quite basic, synchronization prims: - lock, unlock, barrier ▪ Today: - transactional memory: higher level synchronization (CMU 15-418, Spring 2012) What you should know ▪ What a transaction is ▪ The difference between the atomic construct and locks ▪ Design space of transactional memory implementations - data versioning policy - con"ict detection policy - granularity of detection ▪ Understand HW implementation of transaction memory (consider how it relates to coherence protocol implementations we’ve discussed in the past) (CMU 15-418, Spring 2012) Example void deposit(account, amount) { lock(account); int t = bank.get(account); t = t + amount; bank.put(account, t); unlock(account); } ▪ Deposit is a read-modify-write operation: want “deposit” to be atomic with respect to other bank operations on this account. ▪ Lock/unlock pair is one mechanism to ensure atomicity (ensures mutual exclusion on the account) (CMU 15-418, Spring 2012) Programming with transactional memory void deposit(account, amount){ void deposit(account, amount){ lock(account); atomic { int t = bank.get(account); int t = bank.get(account); t = t + amount; t = t + amount; bank.put(account, t); bank.put(account, t); unlock(account); } } } nDeclarative synchronization nProgrammers says what but not how nNo explicit declaration or management of locks nSystem implements synchronization nTypically with optimistic concurrency nSlow down only on true con"icts (R-W or W-W) (CMU 15-418, Spring 2012) Declarative vs.
    [Show full text]
  • Robust Architectural Support for Transactional Memory in the Power Architecture
    Robust Architectural Support for Transactional Memory in the Power Architecture Harold W. Cain∗ Brad Frey Derek Williams IBM Research IBM STG IBM STG Yorktown Heights, NY, USA Austin, TX, USA Austin, TX, USA [email protected] [email protected] [email protected] Maged M. Michael Cathy May Hung Le IBM Research IBM Research (retired) IBM STG Yorktown Heights, NY, USA Yorktown Heights, NY, USA Austin, TX, USA [email protected] [email protected] [email protected] ABSTRACT in current p795 systems, with 8 TB of DRAM), as well as On the twentieth anniversary of the original publication [10], strengths in RAS that differentiate it in the market, adding following ten years of intense activity in the research lit- TM must not compromise any of these virtues. A robust erature, hardware support for transactional memory (TM) system is one that is sturdy in construction, a trait that has finally become a commercial reality, with HTM-enabled does not usually come to mind in respect to HTM systems. chips currently or soon-to-be available from many hardware We structured TM to work in harmony with features that vendors. In this paper we describe architectural support for support the architecture's scalability. Our goal has been to TM provide a comprehensive programming environment includ- TM added to a future version of the Power ISA . Two im- ing support for simple system calls and debug aids, while peratives drove the development: the desire to complement providing a robust (in the sense of "no surprises") execu- our weakly-consistent memory model with a more friendly tion environment with reasonably consistent performance interface to simplify the development and porting of multi- and without unexpected transaction failures.2 TM must be threaded applications, and the need for robustness beyond usable throughout the system stack: in hypervisors, oper- that of some early implementations.
    [Show full text]
  • Leveraging Modern Multi-Core Processor Features to Efficiently
    LEVERAGING MODERN MULTI-CORE PROCESSORS FEATURES TO EFFICIENTLY DEAL WITH SILENT ERRORS, AUGUST 2017 1 Leveraging Modern Multi-core Processor Features to Efficiently Deal with Silent Errors Diego Perez´ ∗, Thomas Roparsz, Esteban Meneses∗y ∗School of Computing, Costa Rica Institute of Technology yAdvanced Computing Laboratory, Costa Rica National High Technology Center zUniv. Grenoble Alpes, CNRS, Grenoble INP ' , LIG, F-38000 Grenoble France Abstract—Since current multi-core processors are more com- about 200% of resource overhead [3]. Recent contributions [1] plex systems on a chip than previous generations, some transient [4] tend to show that checkpointing techniques provide the errors may happen, go undetected by the hardware and can best compromise between transparency for the developer and potentially corrupt the result of an expensive calculation. Because of that, techniques such as Instruction Level Redundancy or efficient resource usage. In theses cases only one extra copy checkpointing are utilized to detect and correct these soft errors; is executed, which is enough to detect the error. But, still the however these mechanisms are highly expensive, adding a lot technique remains expensive with about 100% of overhead. of resource overhead. Hardware Transactional Memory (HTM) To reduce this overhead, some features provided by recent exposes a very convenient and efficient way to revert the state of processors, such as hardware-managed transactions, Hyper- a core’s cache, which can be utilized as a recovery technique. An experimental prototype has been created that uses such feature Threading and memory protection extensions, could be great to recover the previous state of the calculation when a soft error opportunities to implement efficient error detection and recov- has been found.
    [Show full text]
  • Uebayasi's Simple Template
    eXecute In Place support in NetBSD Masao “uebs” Uebayashi <[email protected]> BSDCan 2010 2010.5.13 Who am I NetBSD developer Japanese Living in Yokohama Self-employed Since Dec. 15 2008 Tombi Inc. Agenda Demonstration Introduction Program execution VM Virtual memory management Physical memory management Fault handler, pager : Design of XIP Demonstration XIP on NetBSD/arm (i.MX35) Introduction What is XIP? Execute programs directly from devices No memory copy Only about userland programs (Kernel XIP is another story) Introduction Who needs XIP? Embedded devices Memory saving for less power consumption Boot time Mainframes (Linux) Memory saving for virtualized instances : “Nothing in between” Introduction How to achieve XIP? Don't copy programs to memory when executing it “Execute” == mmap() : : : : : What does that *actually* mean? Goals No hacks Keep code cleanliness Keep abstraction Including device handling : : Performance Latency Memory efficiency Program execution execve(2) → sys_execve() Prepare Read program header using I/O Map sections Set program entry point Execute Page fault is triggered Load pages using VM Execution is resumed Program execution I/O part needs no changes If block device interface (d_strategy()) is provided VM part needs changes!!! Virtual memory management http://en.wikipedia.org/wiki/Virtual_memory Virtual memory is a computer system technique which gives an application program the impression that it has contiguous working memory (an address space), while in fact it may be physically fragmented and may even overflow on to disk storage. Developed for multitasking kernels, virtual memory provides two primary functions: Each process has its own address space, thereby not required to be relocated nor required to use relative addressing mode.
    [Show full text]
  • Lecture 21: Transactional Memory
    Lecture 21: Transactional Memory • Topics: Hardware TM basics, different implementations 1 Transactions • New paradigm to simplify programming . instead of lock-unlock, use transaction begin-end . locks are blocking, transactions execute speculatively in the hope that there will be no conflicts • Can yield better performance; Eliminates deadlocks • Programmer can freely encapsulate code sections within transactions and not worry about the impact on performance and correctness (for the most part) • Programmer specifies the code sections they’d like to see execute atomically – the hardware takes care of the rest (provides illusion of atomicity) 2 Transactions • Transactional semantics: . when a transaction executes, it is as if the rest of the system is suspended and the transaction is in isolation . the reads and writes of a transaction happen as if they are all a single atomic operation . if the above conditions are not met, the transaction fails to commit (abort) and tries again transaction begin read shared variables arithmetic write shared variables transaction end 3 Example Producer-consumer relationships – producers place tasks at the tail of a work-queue and consumers pull tasks out of the head Enqueue Dequeue transaction begin transaction begin if (tail == NULL) if (head->next == NULL) update head and tail update head and tail else else update tail update head transaction end transaction end With locks, neither thread can proceed in parallel since head/tail may be updated – with transactions, enqueue and dequeue can proceed in parallel
    [Show full text]
  • NOVA: a Log-Structured File System for Hybrid Volatile/Non
    NOVA: A Log-structured File System for Hybrid Volatile/Non-volatile Main Memories Jian Xu and Steven Swanson, University of California, San Diego https://www.usenix.org/conference/fast16/technical-sessions/presentation/xu This paper is included in the Proceedings of the 14th USENIX Conference on File and Storage Technologies (FAST ’16). February 22–25, 2016 • Santa Clara, CA, USA ISBN 978-1-931971-28-7 Open access to the Proceedings of the 14th USENIX Conference on File and Storage Technologies is sponsored by USENIX NOVA: A Log-structured File System for Hybrid Volatile/Non-volatile Main Memories Jian Xu Steven Swanson University of California, San Diego Abstract Hybrid DRAM/NVMM storage systems present a host of opportunities and challenges for system designers. These sys- Fast non-volatile memories (NVMs) will soon appear on tems need to minimize software overhead if they are to fully the processor memory bus alongside DRAM. The result- exploit NVMM’s high performance and efficiently support ing hybrid memory systems will provide software with sub- more flexible access patterns, and at the same time they must microsecond, high-bandwidth access to persistent data, but provide the strong consistency guarantees that applications managing, accessing, and maintaining consistency for data require and respect the limitations of emerging memories stored in NVM raises a host of challenges. Existing file sys- (e.g., limited program cycles). tems built for spinning or solid-state disks introduce software Conventional file systems are not suitable for hybrid mem- overheads that would obscure the performance that NVMs ory systems because they are built for the performance char- should provide, but proposed file systems for NVMs either in- acteristics of disks (spinning or solid state) and rely on disks’ cur similar overheads or fail to provide the strong consistency consistency guarantees (e.g., that sector updates are atomic) guarantees that applications require.
    [Show full text]
  • Container-Based Virtualization for Byte-Addressable NVM Data Storage
    2016 IEEE International Conference on Big Data (Big Data) Container-Based Virtualization for Byte-Addressable NVM Data Storage Ellis R. Giles Rice University Houston, Texas [email protected] Abstract—Container based virtualization is rapidly growing Storage Class Memory, or SCM, is an exciting new in popularity for cloud deployments and applications as a memory technology with the potential of replacing hard virtualization alternative due to the ease of deployment cou- drives and SSDs as it offers high-speed, byte-addressable pled with high-performance. Emerging byte-addressable, non- volatile memories, commonly called Storage Class Memory or persistence on the main memory bus. Several technologies SCM, technologies are promising both byte-addressability and are currently under research and development, each with dif- persistence near DRAM speeds operating on the main memory ferent performance, durability, and capacity characteristics. bus. These new memory alternatives open up a new realm of These include a ReRAM by Micron and Sony, a slower, but applications that no longer have to rely on slow, block-based very large capacity Phase Change Memory or PCM by Mi- persistence, but can rather operate directly on persistent data using ordinary loads and stores through the cache hierarchy cron and others, and a fast, smaller spin-torque ST-MRAM coupled with transaction techniques. by Everspin. High-speed, byte-addressable persistence will However, SCM presents a new challenge for container-based give rise to new applications that no longer have to rely on applications, which typically access persistent data through slow, block based storage devices and to serialize data for layers of block based file isolation.
    [Show full text]
  • Practical Enclave Malware with Intel SGX
    Practical Enclave Malware with Intel SGX Michael Schwarz, Samuel Weiser, Daniel Gruss Graz University of Technology Abstract. Modern CPU architectures offer strong isolation guarantees towards user applications in the form of enclaves. However, Intel's threat model for SGX assumes fully trusted enclaves and there doubt about how realistic this is. In particular, it is unclear to what extent enclave malware could harm a system. In this work, we practically demonstrate the first enclave malware which fully and stealthily impersonates its host application. Together with poorly-deployed application isolation on per- sonal computers, such malware can not only steal or encrypt documents for extortion but also act on the user's behalf, e.g., send phishing emails or mount denial-of-service attacks. Our SGX-ROP attack uses new TSX- based memory-disclosure primitive and a write-anything-anywhere prim- itive to construct a code-reuse attack from within an enclave which is then inadvertently executed by the host application. With SGX-ROP, we bypass ASLR, stack canaries, and address sanitizer. We demonstrate that instead of protecting users from harm, SGX currently poses a se- curity threat, facilitating so-called super-malware with ready-to-hit ex- ploits. With our results, we demystify the enclave malware threat and lay ground for future research on defenses against enclave malware. Keywords: Intel SGX, Trusted Execution Environments, Malware 1 Introduction Software isolation is a long-standing challenge in system security, especially if parts of the system are considered vulnerable, compromised, or malicious [24]. Recent isolated-execution technology such as Intel SGX [23] can shield software modules via hardware protected enclaves even from privileged kernel malware.
    [Show full text]
  • Taming Transactions: Towards Hardware-Assisted Control Flow Integrity Using Transactional Memory
    Taming Transactions: Towards Hardware-Assisted Control Flow Integrity Using Transactional Memory B Marius Muench1( ), Fabio Pagani1, Yan Shoshitaishvili2, Christopher Kruegel2, Giovanni Vigna2, and Davide Balzarotti1 1 Eurecom, Sophia Antipolis, France {marius.muench,fabio.pagani,davide.balzarotti}@eurecom.fr 2 University of California, Santa Barbara, USA {yans,chris,vigna}@cs.ucsb.edu Abstract. Control Flow Integrity (CFI) is a promising defense tech- nique against code-reuse attacks. While proposals to use hardware fea- tures to support CFI already exist, there is still a growing demand for an architectural CFI support on commodity hardware. To tackle this problem, in this paper we demonstrate that the Transactional Synchro- nization Extensions (TSX) recently introduced by Intel in the x86-64 instruction set can be used to support CFI. The main idea of our approach is to map control flow transitions into transactions. This way, violations of the intended control flow graphs would then trigger transactional aborts, which constitutes the core of our TSX-based CFI solution. To prove the feasibility of our technique, we designed and implemented two coarse-grained CFI proof-of-concept implementations using the new TSX features. In particular, we show how hardware-supported transactions can be used to enforce both loose CFI (which does not need to extract the control flow graph in advance) and strict CFI (which requires pre-computed labels to achieve a better pre- cision). All solutions are based on a compile-time instrumentation. We evaluate the effectiveness and overhead of our implementations to demonstrate that a TSX-based implementation contains useful concepts for architectural control flow integrity support.
    [Show full text]
  • Storage Solutions for Embedded Applications
    White Paper Storage Solutions Brian Skerry Sr. Software Architect Intel Corporation for Embedded Applications December 2008 1 321054 Storage Solutions for Embedded Applications Executive Summary Any embedded system needs reliable access to storage. This may be provided by a hard disk drive or access to a remote storage device. Alternatively there are many flash solutions available on the market today. When considering flash, there are a number of important criteria to consider with capacity, cost, and reliability being foremost. This paper considers hardware, software, and other considerations in choosing a storage solution. Wear leveling is an important factor affecting the expected lifetime of any flash solution, and it can be implemented in a number of ways. Depending on the choices made, software changes may be necessary. Solid state drives offer the most straight forward replacement option for Hard disk drives, but may not be cost-effective for some applications. The Intel® X-25M Mainstream SATA Solid State Drive is one solution suitable for a high performance environment. For smaller storage requirements, CompactFlash* and USB flash are very attractive. Downward pressure continues to be applied to flash solutions, and there are a number of new technologies on the horizon. As a result of reading this paper, the reader will be able to take into consideration all the relevant factors in choosing a storage solution for an embedded system. Intel® architecture can benefit the embedded system designer as they can be assured of widespread
    [Show full text]
  • A Survey of Published Attacks on Intel
    1 A Survey of Published Attacks on Intel SGX Alexander Nilsson∗y, Pegah Nikbakht Bideh∗, Joakim Brorsson∗zx falexander.nilsson,pegah.nikbakht_bideh,[email protected] ∗Lund University, Department of Electrical and Information Technology, Sweden yAdvenica AB, Sweden zCombitech AB, Sweden xHyker Security AB, Sweden Abstract—Intel Software Guard Extensions (SGX) provides a Unfortunately, a relatively large number of flaws and attacks trusted execution environment (TEE) to run code and operate against SGX have been published by researchers over the last sensitive data. SGX provides runtime hardware protection where few years. both code and data are protected even if other code components are malicious. However, recently many attacks targeting SGX have been identified and introduced that can thwart the hardware A. Contribution defence provided by SGX. In this paper we present a survey of all attacks specifically targeting Intel SGX that are known In this paper, we present the first comprehensive review to the authors, to date. We categorized the attacks based on that includes all known attacks specific to SGX, including their implementation details into 7 different categories. We also controlled channel attacks, cache-attacks, speculative execu- look into the available defence mechanisms against identified tion attacks, branch prediction attacks, rogue data cache loads, attacks and categorize the available types of mitigations for each presented attack. microarchitectural data sampling and software-based fault in- jection attacks. For most of the presented attacks, there are countermeasures and mitigations that have been deployed as microcode patches by Intel or that can be employed by the I. INTRODUCTION application developer herself to make the attack more difficult (or impossible) to exploit.
    [Show full text]
  • Experiments Using IBM's STM Compiler
    Barna L. Bihari1, James Cownie2, Hansang Bae2, Ulrike M. Yang1 and Bronis R. de Supinski1 1Lawrence Livermore National Laboratory, Livermore, California 2Intel Corporation, Santa Clara, California OpenMPCon Developers Conference Nara, Japan, October 3-4, 2016 Technical: Trent D’Hooge (LLNL) Tzanio Kolev (LLNL) Supervisory: Lori Diachin (LLNL) Borrowed example and mesh figures: A.H. Baker, R.D. Falgout, Tz.V. Kolev, and U.M. Yang, Multigrid Smoothers for Ultraparallel Computing, SIAM J. Sci. Comput., 33 (2011), pp. 2864-2887. LLNL-JRNL-473191. Financial support: Dept. of Energy (DOE), Office of Science, ASCR DOE, under contract DE-AC52-07NA27344 INFORMATION IN THIS DOCUMENT IS PROVIDED “AS IS”. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO THIS INFORMATION INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT. Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. Intel, Xeon, and the Intel logo are trademarks of Intel Corporation in the U.S. and other countries. Optimization Notice Intel’s compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors.
    [Show full text]