Memory-Based Side-Channel Attacks and Countermeasures

A Dissertation Presented by

Zhen Hang Jiang

to

The Department of Electrical and Computer Engineering

in partial fulfillment of the requirements for the degree of

Doctor of Philosophy

in

Computer Engineering

Northeastern University Boston, Massachusetts

July 2019

To my parents, wife, brother, and sister.

Contents

Acknowledgments

Abstract of the Dissertation

1 Introduction
  1.1 Motivation
  1.2 Existing Memory-Based Side-Channel Attacks and Countermeasures
  1.3 Dissertation Overview
  1.4 Dissertation Contribution

2 Information Leakage in Memory Coalescing Unit
  2.1 Introduction
  2.2 Related Work
  2.3 Background
    2.3.1 GPU Memory Architecture
    2.3.2 AES GPU Implementation
  2.4 Correlation Timing Attack
    2.4.1 SIMT Architecture Leakage
    2.4.2 AES Encryption Leakage
    2.4.3 Correlation Timing Attack on GPU AES Implementation
    2.4.4 Attack on Highly Occupied GPU
    2.4.5 Discussion
  2.5 Countermeasures
  2.6 Summary

3 Information Leakage in Shared Memory Banks
  3.1 Introduction
  3.2 Background
    3.2.1 AES Encryption
    3.2.2 Nvidia GPU
    3.2.3 Single Instruction Multiple Threads Execution Model
  3.3 Threat Model
  3.4 Bank Conflicts-Based Side-Channel Timing Channel
  3.5 Differential Timing Attack

    3.5.1 Mapping Between the AES Lookup Tables and GPU Shared Memory Banks
    3.5.2 Collecting Data
    3.5.3 Calculating the Shared Memory Bank Index
    3.5.4 Recovering Key Bytes
    3.5.5 More Realistic Attack Scenarios
  3.6 Timing Analysis on Other Architectures
  3.7 Discussions and Countermeasures
    3.7.1 Multi-Key Implementation As Countermeasure
  3.8 Summary

4 Information Leakage in L1 Cache Banks
  4.1 Introduction
  4.2 Background
    4.2.1 AES Encryption
    4.2.2 Intel Cache Architecture
    4.2.3 Cache Timing Attacks
    4.2.4 Countermeasures against Cache Timing Attacks
    4.2.5 L1 Cache Bank and CacheBleed Attack
  4.3 Cache Bank Timing
    4.3.1 Threat Model
    4.3.2 The Cache Bank Timing Channel
    4.3.3 Attacking AES Encryption
  4.4 Countermeasures
  4.5 Summary

5 The Countermeasure - MemPoline
  5.1 Introduction
  5.2 Background and Related Work
    5.2.1 Microarchitecture of the Memory Hierarchy
    5.2.2 Data Memory Access Footprint
    5.2.3 Vulnerable Ciphers
  5.3 Threat Model
  5.4 Our Countermeasure - MemPoline
    5.4.1 Design Overview
    5.4.2 Define the Data Structures
    5.4.3 Initialization - Loading Original Sensitive Data
    5.4.4 Epochs of Permuting
    5.4.5 Security Analysis
    5.4.6 Operations Analysis
    5.4.7 Implementation - API
  5.5 Evaluation
    5.5.1 Experimental Setup
    5.5.2 Security Evaluation of AES
    5.5.3 Performance Evaluation
  5.6 Summary

6 Conclusion

Bibliography

Acknowledgments

I would like to express my deepest gratitude to my advisor, Professor Yunsi Fei, and my dissertation committee members, Professors David Kaeli, Adam Ding, and Thomas Wahl, for their invaluable advice and continual support throughout my PhD study at Northeastern University. Finally, my sincere appreciation goes to my wife, for her encouragement and being the consummate partner in all aspects of life, and my parents, brother, and sister, for their unconditional and constant love and support.

Abstract of the Dissertation

Memory-Based Side-Channel Attacks and Countermeasures

by Zhen Hang Jiang

Doctor of Philosophy in Computer Engineering
Northeastern University, July 2019

Dr. Yunsi Fei, Advisor

Recent years have seen various side-channel timing attacks demonstrated on both CPUs and GPUs, in diverse settings such as desktops, clouds, and mobile systems. These attacks observe events on shared resources in the memory hierarchy through timing information, infer the secret-dependent memory access pattern from those events, and finally retrieve the secret through statistical analysis. We generalize these attacks as memory-based side-channel attacks.

In this dissertation, we identify several side-channel vulnerabilities in memory resources on both GPU and CPU platforms, and propose novel side-channel attacks that exploit these vulnerabilities for secret retrieval. Specifically, we examine the memory coalescing unit and the shared memory unit on GPU platforms, and the L1 cache bank on CPU platforms. These microarchitectural resources, indispensable for performance optimization, inadvertently leak an application's memory access pattern. We craft memory-based side-channel attacks to capture such leakage and exploit it to successfully recover the entire 16-byte key of the Advanced Encryption Standard (AES).

As memory-based side-channel attacks are very powerful and many common microarchitectural resources on various systems are vulnerable, defenses against them should be actively sought. Based on the insight that all existing memory-based side-channel attacks (including our proposed ones) exploit the fixed mapping between content and memory resources, we propose a novel software countermeasure, MemPoline, against memory-based side-channel attacks.

MemPoline hides the secret-dependent memory access pattern by moving sensitive data around randomly within a memory space. Although an adversary may still observe events on microarchitectural resources, the randomness prevents her from retrieving useful secret information. We implement efficient permutations directed by parameters, significantly lighter weight than the prior Oblivious RAM technology, yet achieving similar security. The countermeasure only requires changes in the source code, and has the advantages of being general (algorithm-agnostic), portable (independent of the underlying architecture), and compatible (a user-space approach that works with any operating system or hypervisor).

The contributions of this dissertation include the identification of several new memory-based side channels on CPUs and GPUs, which are weaker than the traditional CPU cache side channel but reside on different microarchitectural resources and are therefore orthogonal to cache side-channel countermeasures. The proposed software countermeasure addresses the root cause of memory-based side-channel attacks and effectively protects cryptographic implementations on both CPUs and GPUs against all these memory-based attacks with a minimal performance impact.

Chapter 1

Introduction

This dissertation focuses on memory-based side-channel attacks, which exploit the memory access footprint inferred from observable microarchitectural events, and on countermeasures that prevent these attacks. In this chapter, we start with the motivation for investigating memory-based side-channel attacks beyond the existing work, and then give an overview of the attacks and countermeasures proposed in this dissertation. Finally, we summarize the contributions of this dissertation.

1.1 Motivation

Cryptography plays a crucial role in providing three fundamental security properties, confidentiality, integrity, and authenticity, through various cryptographic functions including encryption, hashing, signing, and authentication. Rather than relying on "security by obscurity," information security relies only on the keys being secret, while the algorithms and even the implementations are open and standardized. Hence, adequately protecting the secret key is critical to delivering the security guarantee. Since the very first successful key-recovery demonstration of Differential Power Analysis (DPA) [1] by Kocher et al., side-channel attacks have changed the notion of "security" for cryptographic algorithms despite their mathematically proven security. Various side channels, including the most common ones of power consumption and electromagnetic (EM) emanation, have been leveraged to break cryptographic engines, such as the Advanced Encryption Standard (AES) and RSA, on many platforms, such as FPGAs [2], ASICs [3], and GPUs [4]. While this type of attack requires physical access to a targeted system to obtain the physical side-channel information, memory-based side-channel attacks can be mounted remotely, presenting a serious cyber threat to cryptographic software, servers, and cloud services.

Memory-based side-channel attacks, which exploit the memory access footprint inferred from observable microarchitectural events, have gained popularity in the side-channel security community and become a serious cyber threat not only to cryptographic implementations but also to general software bearing secrets. For example, researchers have successfully demonstrated recovering a full encryption key [5, 6, 7] and logging keyboard events [8, 9, 10] using memory-based side-channel attacks. Most memory-based side-channel attacks target one particular memory resource, the cache structure, and exploit the significant difference between cache hit and cache miss access times.

With the introduction of programmable shader cores and high-level programming frameworks [11, 12], GPUs have been integrated into complex heterogeneous computer systems for accelerating applications. Given their ability to provide high throughput and efficiency, GPUs are now being leveraged to offload cryptographic workloads from CPUs [13, 14, 15, 16, 17, 18]. This move to the GPU allows cryptographic processing to achieve up to 28X higher throughput [13]. While an increasing number of security systems are deploying GPUs, the security of GPU execution has not been well studied. In this dissertation, we take the first step and thoroughly analyze two memory resources on GPUs, the memory coalescing unit and the banked shared memory unit, discover side-channel timing leakage in both, and devise two memory-based side-channel attacks that successfully break 16-byte AES encryption on a GPU.

Similar to the GPU's banked shared memory unit, the L1 cache of modern complex processors is also banked, in order to provide high bandwidth for superscalar processors and to reduce power consumption. Rather than being a monolithic microarchitectural module, the L1 cache is composed of multiple cache banks, which allow multiple concurrent accesses to different banks at one time. However, when two or more accesses target the same bank, a bank conflict arises and the accesses are processed in a serialized manner. The subtle timing difference between parallel and serial cache bank accesses can be exploited to leak sensitive information. Based on this timing difference, we design another memory-based side-channel attack to recover the 16-byte AES encryption key.

Despite numerous countermeasures [19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30], none of them can prevent all existing memory-based side-channel attacks. Protecting vulnerable applications against different memory-based side-channel attacks is challenging and can be costly, thus calling for more general countermeasures that work across architectures and applications. We propose a software countermeasure, MemPoline, to provide a just-in-need security level to defend against memory-based side-channel attacks. Specifically, we use a parameter-based permutation function to shuffle the memory space progressively. Results show that our countermeasure can effectively mitigate all these known memory-based side-channel attacks with significantly low performance degradation.


Timing Channel | Cache Miss/Hit                          | L1 Cache Bank Conflict     | Shared Memory Bank Conflict | Memory Coalescing
Platform       | CPU                                     | CPU                        | GPU                         | GPU
Attacks        | Bernstein Timing Attack [31],           | CacheBleed [34],           | Jiang, et al.               | Jiang, et al.
               | Prime+Probe [32, 33], Evict+Time [32],  | Jiang, et al. [ICCAD 2017] | [GLSVLSI 2017]              | [HPCA 2016]
               | Flush+Reload [6], Flush+Flush [9], ...  |                            |                             |

Table 1.1: Memory-Based Side-Channel Attacks

1.2 Existing Memory-Based Side-Channel Attacks and Countermeasures

Attack. The cache is a critical structure for performance that reduces the speed gap between the main memory storage and the computation (on CPU or GPU cores) by utilizing the spatial and temporal locality exhibited in program code and data. As caches store only a portion of memory content, a memory request can be served directly by the cache in the case of a cache hit, and otherwise by the off-chip memory (a cache miss). The timing difference between a cache hit and a miss can be hundreds of cycles, and hence it forms a strong timing side channel that many memory-based side-channel attacks exploit. However, as the memory subsystem becomes more complex, there exist many other vulnerable memory resources. We classify the existing memory-based side-channel attacks and our three proposed attacks according to the memory resource that each utilizes, and present them in Table 1.1. In this dissertation, we identify and explore memory-based side channels other than the common and strong one that exploits the timing difference between a cache hit and a miss. Memory-based side-channel attacks can be classified into access-driven and time-driven attacks.


For a time-driven attack [32, 35, 31], the adversary observes the total execution time of the victim under different inputs and uses statistical methods over a large number of samples to infer the secret. For an access-driven attack [32, 33, 34, 9], the adversary intentionally creates contention with the victim on certain shared resources to infer the victim's memory access footprint. Such an attack consists of three steps: 1. preset - the adversary sets the shared resource to a certain state; 2. execution - the victim runs; 3. measurement - the adversary checks the state of the resource using timing information.

Countermeasure. While the number of memory-based side-channel attacks continues to grow, various countermeasures have been proposed. Hardware-based countermeasures modify the cache architecture and policies, and can be efficient [19, 20, 21, 29, 28]. However, they are invasive, require hardware redesign, and oftentimes only address a specific attack. Software countermeasures [23, 24, 25, 36] require no hardware modification and make changes at different levels of the software stack, e.g., the source code, the binary code, or the operating system. They are favorable for existing computer systems and have the potential to be general, portable, and compatible.

The software implementation of the Oblivious RAM (ORAM) scheme shown in prior work [37] has been demonstrated to be successful in mitigating cache side-channel attacks. The ORAM scheme [38, 39] was originally designed to hide a client's data access pattern to remote storage from an untrusted server by repeatedly shuffling and encrypting data. Raccoon [37] repurposes ORAM to prevent the memory access pattern from leaking through cache side channels. The Path-ORAM scheme [39] uses a small client-side private storage to store a position map for tracking the real locations of the data, and assumes the server cannot monitor the access pattern. However, in side-channel attacks, all access patterns can be monitored, and indexing into a position map is itself insecure against memory-based side-channel attacks. Instead of indexing, Raccoon [37], which focuses on control-flow obfuscation and uses ORAM for storing data, streams in the position map to look for the real data location, so that it provides a strong security guarantee. However, since it relies on ORAM for storing data, its memory access runtime is O(N) given N data elements, and the ORAM-related operations can incur more than 100x performance overhead. We propose a software countermeasure, MemPoline, to address the side-channel security issue of Path-ORAM [39] and the performance issue of both prior works [39, 37]. MemPoline adopts the ORAM idea of shuffling, but implements a much more efficient permutation scheme. In our scheme, the permutation is directed by a parameter; thus, we only need to keep the parameter value private (instead of a position map) to track the real dynamic locations of data. For our countermeasure, the memory access runtime is O(1), significantly lower than the O(log(N)) of Path-ORAM [39] and the O(N) of Raccoon [37].
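To make the parameter-directed permutation idea concrete, the following is a minimal C++ sketch of one possible instantiation, assuming a power-of-two table size and a simple XOR-based permutation; all names here are illustrative only, and the actual MemPoline design is presented in Chapter 5.

#include <cstdint>
#include <utility>

// Sketch only: logical index i is stored at physical slot (i XOR r), so a
// lookup is O(1) and only the parameter r must be kept secret.
struct PermutedTable {
    uint32_t *slots;   // physical storage; size is a power of two
    uint32_t mask;     // size - 1
    uint32_t r;        // secret permutation parameter

    uint32_t read(uint32_t i) const { return slots[(i ^ r) & mask]; }

    // Start a new epoch: migrate every entry from parameter r to newR.
    void repermute(uint32_t newR) {
        uint32_t d = r ^ newR;              // how old and new slots differ
        for (uint32_t i = 0; d != 0 && i <= mask; i++)
            if (i < (i ^ d))                // visit each swap pair once
                std::swap(slots[i], slots[i ^ d]);
        r = newR;
    }
};

Because only r is secret, a lookup consults no per-element position map, which is what keeps each access O(1) and free of secret-dependent indexing into auxiliary structures.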


1.3 Dissertation Overview

The dissertation consists of two major parts. The first part explores vulnerable memory resources other than the cache structure that leaks side-channel information via cache hit and miss accesses, and the second part proposes a software-based countermeasure to mitigate memory-based side-channel attacks for applications running on systems with vulnerable memory resources.

In Chapters 2 and 3, we examine vulnerable memory resources on GPU platforms. Specifically, we thoroughly analyze two memory resources on GPUs: the memory coalescing unit in Chapter 2 and the banked shared memory unit in Chapter 3. We discover side-channel timing leakage in these two resources, and devise two memory-based side-channel attacks to successfully break 16-byte AES encryption on various GPU platforms.

In Chapter 4, we analyze the banked L1 cache of modern complex CPUs. We derive a memory-based side-channel attack that exploits the subtle timing difference between parallel and serial cache bank accesses to recover the 16-byte AES encryption key.

In Chapter 5, we propose a software countermeasure, MemPoline, to provide a just-in-need security level to defend against memory-based side-channel attacks. Specifically, we use a parameter-based permutation function to shuffle the memory space progressively and obfuscate memory accesses. We apply MemPoline to both the T-table implementation of AES and the sliding-window implementation of RSA, and evaluate it against various memory-based attacks on both CPU and GPU platforms. Results show that the countermeasure can effectively mitigate known memory-based side-channel attacks with significantly less performance degradation than other ORAM-based countermeasures. We conclude the dissertation in Chapter 6.

1.4 Dissertation Contribution

In this dissertation, we propose a number of new memory-based side-channel attacks and a general countermeasure. We thoroughly examine several microarchitectural units in terms of their timing leakage, reverse-engineer the partial structure and behavior of these units, and identify their vulnerability to side-channel attacks. Our work significantly raises the awareness of side-channel security in the broader computer architecture community across various platforms. The contributions of the dissertation to the areas of computer architecture and side-channel security include:

1. Memory Coalescing Unit: We discover the very first memory resource on GPUs that can leak the memory access footprint of an application. We overcome the challenges for memory-based side-channel attacks introduced by the GPU's architectural features and design an effective memory coalescing side-channel attack against AES encryption. This attack is time-driven, non-invasive, and non-interfering, and only measures the total execution time of the GPU under different data inputs. We demonstrate that even a slight timing difference can render memory resources on a GPU vulnerable to memory-based side-channel attacks.

2. Shared Memory Banks: We discover another memory resource on GPUs that can leak the memory access footprint of an application: the shared memory banks. We design another effective time-driven memory-based attack that exploits only the interaction among parallel threads through the shared memory banks. No prior work has investigated this memory resource on GPUs.

3. L1 Cache Bank: There is a very subtle timing side channel in the L1 cache banks of CPUs, caused by the small stalling delay due to conflicts between concurrent access requests to the same bank. We design an access-driven cache bank attack with a spy process and a concurrent victim process, supported by Hyper-Threading. Observing the total execution time of the spy process allows malicious users to infer the memory access pattern of the victim process through their contention on the L1 cache banks. Since all the existing countermeasures target cache side-channel attacks that rely on the cache miss penalty, none of them can prevent our new cache bank attack, as it is orthogonal to other cache attacks and yields a different side-channel granularity: cache bank versus cache line.

4. MemPoline: We propose a software-based countermeasure to mitigate existing memory-based side-channel attacks across different memory resources and platforms. The countermeasure is built on top of a novel, efficient, and effective technique that randomizes a memory space at runtime so as to obfuscate a program's memory access pattern. We apply the countermeasure to multiple ciphers on different platforms (CPUs and GPUs), evaluate its resilience, and demonstrate that it can defeat all known memory-based side-channel attacks, both empirically and theoretically.

Chapter 2

Information Leakage in Memory Coalescing Unit

2.1 Introduction

With the introduction of programmable shader cores and high-level programming frameworks [12, 11], GPUs have become fully programmable parallel computing devices. Compared to modern multi-core CPUs, a GPU can deliver significantly higher throughput by executing workloads in parallel over thousands of cores. As a result, GPUs have quickly become the accelerator of choice for a large number of applications, including physics simulation, biomedical analytics, and signal processing. Given their ability to provide high throughput and efficiency, GPUs are now being leveraged to offload cryptographic workloads from CPUs [13, 14, 15, 16, 17, 18]. This move to the GPU allows cryptographic processing to achieve up to 28X higher throughput [13]. While an increasing number of security systems are deploying GPUs, the security of GPU execution has not been well studied. Pietro et al. identified that information leakage can occur throughout the memory hierarchy due to the lack of memory-zeroing operations on a GPU [40]. Previous work has also identified vulnerabilities of GPUs using software methods [41, 42]. While there has been a large number of studies on side-channel security on other platforms, such as CPUs and FPGAs, little attention has been paid to the side-channel vulnerability of GPU devices.

Timing attacks have been demonstrated to be one of the most powerful classes of side-channel attacks [31, 35, 5, 6, 33, 7]. Timing attacks exploit the relationship between input data and the time (i.e., number of cycles) for the system to process/access the data. For example, in a cache collision attack [35], the attacker exploits the difference in CPU cycles needed to serve a cache miss versus a hit, and considers the cache locality produced by a unique input data set.

There is no prior work evaluating timing attacks on GPUs. To the best of our knowledge, our work is the first to consider timing attacks deployed at the architecture level on a GPU. The GPU's Single Instruction Multiple Threads (SIMT) execution model prevents us from simply applying prior timing attack methods developed for CPUs to GPUs. A GPU can perform multiple encryptions concurrently, and each encryption competes for hardware resources with other threads, providing the attacker with confusing timing information. Also, under SIMT, the attacker is not able to time-stamp each encryption individually; the timing information the attacker obtains is dominated by the longest running encryption. Given these challenges in GPU architectures, most existing timing attack methods become infeasible.

In this chapter, we demonstrate that information leakage can be extracted from execution on a SIMT-based GPU to fully recover the encryption secret key. Specifically, we first observe that the kernel execution time is linearly proportional to the number of unique cache line requests generated during kernel execution. In the L1 cache memory controller of a GPU, memory requests are queued and processed in First-In-First-Out (FIFO) order, so the time to process all memory requests depends on the number of memory requests. As AES encryption generates memory requests to load its S-box/T-table entries, the addresses of these memory requests depend on the input data and the encryption key. Thus, the execution time of an encryption kernel is correlated with the key. By leveraging this relationship, we can recover all 16 AES secret key bytes on an Nvidia Kepler GPU. Although we demonstrate this attack on a specific Nvidia GPU, other GPUs also have the same exploitable leakage.

We have set up the client-server infrastructure shown in Figure 2.1. In this setting, the attacker (client) sends messages to the victim (encryption server) through the internet; the server employs its GPU for encryption and sends the encrypted messages back to the attacker. For each message, the encrypted ciphertext is known to the attacker, as well as the timing information. If the timing data measured is clean (mostly attributable to the GPU kernel computation), we are able to recover all 16 key bytes using one million timing samples. In a more practical attack setting where there is CPU noise in our timing data, we are still able to fully recover all the key bytes by collecting a larger number of samples and filtering out the noise. Our attack results show that modern SIMT-based GPU architectures are vulnerable to timing side-channel attacks.

The rest of the chapter is organized as follows. In Section 2.2, we discuss related work. In Section 2.3, we provide an overview of our target GPU memory architecture and our AES GPU implementation. In Section 2.4, the architecture leakage model is first presented, followed by our attack method that exploits the leakage for complete AES key recovery. We discuss potential countermeasures in Section 2.5. Finally, the chapter is summarized in Section 2.6.


Figure 2.1: The attack environment

2.2 Related Work

Timing attacks utilize the relationship between data and the time taken by a system to access/process the data. Multiple attacks have been demonstrated successfully by exploiting cache access latencies, which leak secrets through either cache contention or cache reuse [35, 5, 6]. In order to create cache contention or cache reuse, attackers need to have their own process (a spy process) coexisting with the targeted process (the victim process) on the same physical machine. This way, the spy process can evict or reuse cache contents created by the victim process to introduce different cache access latencies. We refer to this class of attacks as offensive attacks. Another kind, the non-offensive attack, has also been demonstrated successfully by Bernstein [31]. Unlike offensive attacks, Bernstein's timing attack does not interfere with the victim process. His attack exploits the relationship between the time for an array lookup and the array index.

The attack strategy commonly deployed in CPU-based timing attack methods consists of producing one block of ciphertext at a time and profiling the associated time to process that block. However, on a GPU it would be highly inefficient to perform only one block encryption and produce one block of ciphertext at a time, given the GPU's massive computational resources.


In a real-world scenario, the encryption workload would contain multi-block messages, and for each data sample the GPU timing attack would produce many blocks of ciphertext. The key difference is that the GPU scenario will only collect a single timing value for the multiple blocks. Although many successful attack methods have been demonstrated on CPU platforms, these methods cannot be directly applied to the GPU platform due to a lack of accurate timing information and nondeterminism in thread scheduling. Our timing attack method targets a GPU and is non-offensive like Bernstein's. We exploit the inherent parallelism present on the GPU, as well as its memory behavior, in order to recover the secret key.

2.3 Background

In this section, we discuss the memory hierarchy and memory handling features of Nvidia's Kepler GPU architecture [43]. Note that not all details of the Kepler memory system are publicly available; we leverage information that has been provided by Nvidia, as well as many details of the microarchitecture that we have been able to reverse engineer. We also describe the AES implementation we have evaluated on the Kepler and the configuration of the target hardware platform used in this work.

2.3.1 GPU Memory Architecture

2.3.1.1 Target Hardware Platform

Our encryption server is equipped with an Nvidia Tesla K40 GPU. This Kepler-family device includes 15 streaming multiprocessors (SMXs). Each SMX has 192 single-precision CUDA cores, 64 double-precision units, 32 special function units, and 32 load/store units (LSUs). In CUDA terminology, a group of 32 threads is called a warp. Each SMX has four warp schedulers and eight instruction dispatch units, which means four warps can be scheduled and executed concurrently, and each warp can issue two independent instructions in one GPU cycle [43].
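These device attributes can be confirmed at runtime with the standard CUDA runtime API; the small host program below is our own sanity check, not part of the dissertation's server code.

#include <cstdio>
#include <cuda_runtime.h>

int main() {
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);   // device 0: the Tesla K40 in our setup
    std::printf("%s: %d SMXs, warp size %d, L2 size %d bytes\n",
                prop.name, prop.multiProcessorCount, prop.warpSize,
                prop.l2CacheSize);
    return 0;
}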

2.3.1.2 GPU Memory Hierarchy

Kepler provides an integrated off-chip DRAM memory, called device memory. The CPU transfers data to and from the device memory before and after it launches the kernel. Global memory, texture memory, and constant memory reside in the device memory.

Data residing in the device memory is shared among all of the SMXs. Each SMX has L1/shared, texture, and constant caches, which are used to cache data from global memory, texture memory, and constant memory, respectively. These caches are placed very close to the physical cores, so they have much lower latency than the corresponding memories. The texture memory and cache are optimized for spatial memory access patterns. The constant memory and cache are designed for broadcasting a single constant value to all threads in a warp. Global memory, together with the L1 and L2 caches and the coalescing units (units that coalesce multiple global memory accesses from a warp into memory transactions), provides fast general-purpose memory accesses. The hierarchy of the L1/L2 caches and global memory is similar to that found on modern multi-core CPUs. The L1 and L2 caches on a GPU are much smaller than those found on a CPU; however, GPU caches have much higher bandwidth, which is needed to support a large number of cores.

2.3.1.3 Memory Request Handling

On a Kepler GPU, a global memory load instruction for a warp generates 32 memory requests if none of the threads is masked. All 32 memory requests are sent to the coalescing units, which reorder pending memory accesses, trying to reduce the memory requests down to a number of unique cache line requests. These cache line requests are issued to the L1 cache controller, one per cycle, a process referred to as memory issue serialization [44]. If the requested data is present in the L1 cache, the data is loaded into the specified register and the cache line request is resolved in one GPU cycle by the LSU. On a miss, the request is queued in a Miss Status Holding Register (MSHR), one per cycle. If any incoming cache line request matches an outstanding cache line miss queued in the MSHR, the request is merged into a single MSHR entry. All requests queued in the MSHR are processed in FIFO order and forwarded to the next-level memory controllers (L2 or device memory). Upon receiving the requested data, the LSU needs to load this data into the register file and release the MSHR entry, one per cycle. This process is referred to as writeback serialization [44].
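The timing consequence of this design is easiest to see in a simple software model. The sketch below is our own illustration (not Nvidia's documented implementation) of how a warp's 32 addresses reduce to unique cache line requests, each of which then costs one serialized issue (and, on a miss, one serialized writeback) cycle.

#include <cstdint>
#include <set>

// Model: count the unique cache line requests generated by one warp-wide load.
int uniqueCacheLines(const uint64_t addr[32], uint64_t lineBytes = 64) {
    std::set<uint64_t> lines;
    for (int t = 0; t < 32; t++)
        lines.insert(addr[t] / lineBytes);  // drop the offset within the line
    return (int)lines.size();               // requests are issued one per cycle
}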

2.3.2 AES GPU Implementation

In this chapter, we evaluate 128-bit Electronic Codebook (ECB) mode AES encryption based on T-tables, which operates on 16-byte data blocks using a 16-byte secret key. The encryption implementation we use was ported from the OpenSSL 0.9.7 library into CUDA. We transformed an entire block encryption into a single GPU kernel, so that each thread in the GPU can process one block encryption independently, as shown in Figure 2.2.


Figure 2.2: The GPU AES implementation used in this work.

The encryption key scheduling step expands the 16-byte secret key into 160 bytes of round keys for the ten rounds of operation. In the initial round, the 16-byte plaintext is XORed with the first round key to generate the initial state. In the T-table version of AES, the SubByte, ShiftRow, and MixColumn operations are integrated into lookups on four T-tables. Rounds 1-9 simply perform T-table lookups and add round keys. In the last round, a special T-table, T4, integrates only the SubByte with the ShiftRow and does not involve a MixColumn operation. Our attack focuses on the last round of AES, whose operations are shown in Equation (2.1), where each T4 table lookup returns a 4-byte value that is indexed by a one-byte value (the trailing subscript _0 to _3 selects one byte of the 4-byte result), c0-c15 are the output ciphertext bytes, and {t0, t1, ..., t15} are the input bytes to the last round:

c0 = T4[t3]_0 ⊕ k0
c1 = T4[t6]_1 ⊕ k1
c2 = T4[t9]_2 ⊕ k2
c3 = T4[t12]_3 ⊕ k3
c4 = T4[t7]_0 ⊕ k4
c5 = T4[t10]_1 ⊕ k5
c6 = T4[t13]_2 ⊕ k6
c7 = T4[t0]_3 ⊕ k7
c8 = T4[t11]_0 ⊕ k8                (2.1)
c9 = T4[t14]_1 ⊕ k9
c10 = T4[t1]_2 ⊕ k10
c11 = T4[t4]_3 ⊕ k11
c12 = T4[t15]_0 ⊕ k12
c13 = T4[t2]_1 ⊕ k13
c14 = T4[t5]_2 ⊕ k14
c15 = T4[t8]_3 ⊕ k15

The generation of each ciphertext byte involves a table lookup (which returns a 4-byte value), byte positioning (taking one byte out of the four by byte masking and shifting), and an add-key. When implemented on Nvidia GPUs, the generation of each ciphertext byte is realized in CUDA with load and store instructions, in addition to logic instructions. Although the order of table lookups for the cipher bytes is {0, 1, 2, ..., 15}, the order of the CUDA load and store instructions for the cipher bytes may differ depending on how the program is compiled. For example, the CUDA compiler, nvcc, by default enables -O3 optimization, which reorganizes the CUDA instructions to avoid data dependency stalls and thus can hide some of the latency of memory access instructions. When optimization is disabled with the -O0 flag, the table lookup for each byte is directly translated into CUDA instructions and the order is preserved.

In this GPU-based AES implementation, one GPU thread performs AES encryption on one block of data. For a 32-block message, one warp of 32 threads can launch 32 encryptions in parallel. As the number of blocks per message increases, the GPU throughput increases.
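For illustration, a single last-round output byte from Equation (2.1) might be computed by a CUDA device function like the one below; the function and parameter names are ours, and the byte-selection order is an assumption rather than the dissertation's exact code.

#include <cstdint>

// One last-round byte: T4 lookup, byte positioning, then add-key.
__device__ uint8_t lastRoundByte(const uint32_t *T4, uint8_t t,
                                 uint8_t k, int pos) {
    uint32_t v = T4[t];                         // 4-byte global memory load
    uint8_t b = (v >> (8 * (3 - pos))) & 0xff;  // byte masking and shifting (assumed order)
    return b ^ k;                               // XOR with the round key byte
}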


Our server system is dedicated to performing AES encryptions, so we have configured the GPU to achieve high throughput. As the constant cache stores data from constant memory that can be shared by the threads in a warp, we use this space to store the round keys for AES encryption. All threads in a warp access the same round key at the same time, which allows the constant cache to broadcast the value to all of the threads in the executing warp. Although the T-tables are also constant values, they would not benefit as much from constant memory, because each thread generates different memory accesses and the constant cache would have to return them sequentially, wasting valuable resources. Therefore, we chose to place the T-tables in global memory, reducing the number of memory requests by leveraging the coalescing units. Also, the T-table data can be shared across SMXs through the L2 cache and across warps in the same SMX through the L1 caches.

The L1 cache and the shared memory share 64KB of physical memory, and Nvidia allows developers to choose the division on a per-kernel basis. There are three available options: 16KB shared memory and 48KB L1 cache, 32KB shared memory and 32KB L1 cache, and 48KB shared memory and 16KB L1 cache. For our AES encryption kernel, the threads in a warp do not share any data, so the shared memory is not used during encryption. Thus, we want to minimize the size of the shared memory and maximize the use of the L1 cache; the best configuration is 16KB shared memory and 48KB L1 cache.

During server initialization, the five T-tables are copied into global memory, and the round keys are copied into constant memory. Constant data remains in the device memory until the application exits. During encryption, all memory requests to global memory access the L2 cache. However, by default, global memory load/store operations bypass the L1 cache without being cached. We found that enabling L1 caching can increase encryption performance, so in this work we always configure the server program with L1 caching enabled.
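For reference, this configuration could be requested as in the sketch below, using the standard CUDA runtime API; the kernel name is hypothetical and the snippet is ours, not the dissertation's exact code.

#include <cstdint>
#include <cuda_runtime.h>

__global__ void aesEncryptKernel(const uint8_t *in, uint8_t *out) { /* ... */ }

void configureGPU() {
    // Request the 48KB L1 / 16KB shared memory split for the AES kernel.
    cudaFuncSetCacheConfig(aesEncryptKernel, cudaFuncCachePreferL1);
}
// On Kepler, global loads are cached in L1 only when the kernel is compiled
// with the nvcc flag -Xptxas -dlcm=ca.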

2.4 Correlation Timing Attack

We design a correlation timing attack that exploits the relationship between the kernel execution time and the number of unique cache line requests generated during kernel execution. The attack uses one ciphertext byte and one key byte guess to compute the number of unique cache line requests that would be generated for the targeted ciphertext byte during its table lookup. In the attack, we encrypt many messages and collect the timing information for each message (referred to as a trace, or data sample). We then correlate the calculated number of cache line requests with the timing samples. If we guess the right key byte, we should expect to find a strong correlation between the timing and the correct number of unique cache line requests; otherwise, if we guess the wrong key byte, the resulting correlation should be low.

In this section, we first explore the architecture leakage present on an Nvidia Kepler-family K40 device. To assess timing leakage vulnerabilities, we evaluate the success rates of correlation timing attacks using both clean measurements and noisy measurements. For clean measurements, the attacker is able to measure the warp execution time within a kernel, so the only sources of inaccuracy are the GPU's internal hardware. With noisy measurements, the attacker is only able to measure when a message is received and returned by the server; in this case, the noise sources also include processing on the server CPU, which introduces some non-deterministic delay into our measurements. We consider the quality of the timing data we collect to better understand how noisy measurements can impact key recovery.

2.4.1 SIMT Architecture Leakage

With SIMT execution on a GPU, when a warp issues a load instruction, 32 memory requests from the 32 threads are generated and sent to the coalescing units (assuming all of the threads are active). These memory requests are translated into unique cache line requests, which are merged with existing cache line requests in the MSHR. The time taken to serve all 32 requests from a warp is proportional to the number of unique cache line requests sent to the L1 cache controller, due to both memory issue serialization and writeback serialization.

To determine whether there is a linear relationship between a SIMT load instruction's execution time and the number of unique cache lines accessed, we develop the test kernel shown in Kernel 1. In this kernel, we measure the execution time for a warp of 32 threads to perform the load and store instructions. Each thread is assigned an index from the indices array, uses the index to load a 4-byte element from a big array A, and stores the element into the result variable. With SIMT execution of 32 threads, the contents of the indices array determine the total number of unique cache lines referenced during the kernel execution, which essentially samples the data array A. For example, if all the indexes are the same, Kernel 1 will only request one unique cache line.

The indices array is created using Algorithm 2. Given a specified number of unique cache lines needed (e.g., 6), Algorithm 2 generates the first six indexes with a stride of the cache line size, accessing six distinct cache lines, and the remaining 26 indexes are all the same as the sixth one. With Algorithm 2, we can sweep the number of unique cache lines from 1 to 25 and generate the corresponding indices arrays to use in Kernel 1.


Kernel 1 The kernel to measure memory access time

index ← indices[tid]
time ← CLOCK()
result[tid] ← A[index]
time ← CLOCK() − time

Algorithm 2 Generating memory access indices that result in a preset number of cache line accesses for Kernel 1

numCacheLine ← userInput
indices ← []
curCacheLineIdx ← 0
for i = 1:32 do
    indices[i] ← curCacheLineIdx ∗ stride
    if curCacheLineIdx < numCacheLine − 1 then
        curCacheLineIdx ← curCacheLineIdx + 1
    end if
end for
SHUFFLE(indices)
return indices
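For concreteness, a runnable CUDA rendering of Kernel 1 might look as follows; this is our reconstruction with names of our choosing, not the exact kernel used in the experiments.

__global__ void timedLoad(const int *indices, const int *A, int *result,
                          long long *elapsed) {
    int tid = threadIdx.x;            // one warp: threads 0..31
    int index = indices[tid];
    long long start = clock64();      // per-SM cycle counter
    result[tid] = A[index];           // the coalesced load under test
    long long stop = clock64();
    if (tid == 0)
        *elapsed = stop - start;      // warp-level timing, as in Kernel 1
}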


Given the GPU's serialized memory request handling, we expect the execution time to be linearly proportional to the number of unique cache line requests. In Figure 2.3, we plot timing data for memory accesses while varying the number of cache line requests, under three strides: 32, 64, and 128 bytes. We can see that they are linearly proportional, and the slope of the lines indicates how much execution time is consumed per unique cache line access. A similar result was also reported in prior work [44]. The lines for stride sizes 64 and 128 bytes are exactly the same, implying that the effective cache line size is 64 bytes. According to the Nvidia online literature [11], the cache line size of the Kepler L1 data cache is 128 bytes; we confirmed this with Nvidia. What we also learned is that there are microarchitectural features in the L1 cache that are responsible for the observed behavior. As a result, we have elected to use 64 bytes as the cache line size in our attacks on the K40 device.


Figure 2.3: Nvidia GPU: Timing for 1 to 25 cache line requests, under stride 32, 64, and 128 bytes.

The Pearson correlation value [45] between the execution time and the number of unique cache lines is found to be around 0.96. This strong correlation suggests that the execution time can leak information about the array indices used in Kernel 1.



Figure 2.4: AMD GPU: Timing for 1 to 25 cache line requests.

Not only does the Nvidia Kepler GPU exhibit this type of leakage; AMD GPUs also show the same leakage. We performed the same timing analysis, running programs written in OpenCL on an AMD R9-290X GPU, with the stride set to 64 bytes (the AMD GPU L1 cache line size); the result is shown in Figure 2.4. We find its correlation value to be around 0.93.

Since SIMT and memory coalescing are crucial features for high-performance GPUs, this kind of correlation will persist in various GPUs, including the Nvidia and AMD GPUs that we have experimented with; disabling either feature would significantly degrade performance. We therefore expect to find this correlation on GPUs from other manufacturers as well. Inspecting both Figure 2.3 and Figure 2.4, it is very clear that the execution time is directly proportional to the number of cache line requests on both families of GPUs. Given a fixed kernel, the key information (which determines the number of cache line requests) can therefore be leaked by the execution time of the kernel. This observation inspires us to carry out a correlation timing attack on an AES implementation running on state-of-the-art GPUs.


2.4.2 AES Encryption Leakage

As shown in Figure 2.1, the attacker and victim computers are connected via a network. This setup is the same as the one described by Bernstein [31], except that the encryption is performed on the GPU. The goal of the attacker is to recover the 16-byte secret key used by the encryption server, using the known ciphertexts and the collected timing information. Noise in this setup can be minimized if the execution time of the kernels themselves is measured. However, the measurement noise (i.e., inaccuracies) produced in a more practical setting will not inhibit the attack: as shown by Brumley et al. [46], the attacker can simply collect a larger number of traces and average out the noise.

Suppose that the attacker sends a 32-block message to the server, and the server launches a warp of 32 threads to encrypt the received data. After some time, the attacker receives the 32-block encrypted message, along with the timing information for the warp execution, which is stored as one timing trace (sample) as shown below:

{c0-15^1, c0-15^2, ..., c0-15^32, T^1}

There are ten rounds in AES encryption, and each round performs 16 table lookups for each block of data. The index of each table lookup determines which cache line will be loaded. Thus, the entire encryption time depends on the indices of the 160 table lookups. We collected one million 32-block messages and their associated timings, and recorded all indices used for the 160 table lookups during each block encryption. From all the indices used in the warp, we compute the number of unique cache line requests. We plot the average execution time against the number of unique cache line requests in Figure 2.5, as well as the sample counts used to calculate the average time in Figure 2.6. Although the line in Figure 2.5 does not appear as linear as the one shown in Figure 2.3, it is clear that as the number of cache line requests increases, the average time also increases. The correlation between the number of cache line requests and the recorded execution time is 0.0596.

In a real attack, it is impossible to compute all of the indices used during one encryption without knowing the entire 16-byte key, due to the strong cryptographic confusion and diffusion functions, and it is computationally infeasible to enumerate the entire key space (2^128 ≈ 3.4 × 10^38). However, in the last round, each lookup table index can be computed from one byte of the key and the corresponding byte of ciphertext, independently from the other ciphertext bytes. Thus, we can examine how much leakage can be observed in one byte.


Figure 2.5: The average recorded time versus the total number of cache line requests in a message encryption, with one million samples.

From Equation (2.1), we can write each byte of ciphertext as follows (byte positioning is ignored for simplicity):

cj = T4[ti] ⊕ kj        (2.2)

Using an inverse lookup table, we can find the ith byte of the input state to the last round, ti, if we know the true round key byte, kj:

ti = T4^-1[cj ⊕ kj]        (2.3)

Figure 2.6: Sample counts versus the number of cache line requests, with one million samples.



Given the GPU's SIMT execution model, for a 32-block message we have 32 threads running simultaneously, and therefore:

ti^1 = T4^-1[cj^1 ⊕ kj]
ti^2 = T4^-1[cj^2 ⊕ kj]
...
ti^32 = T4^-1[cj^32 ⊕ kj]

The values of the table lookup indexes ti^1, ti^2, ..., ti^32 determine the number of unique cache lines that will be requested. Since each element in the T4 table is 4 bytes and the size of a cache line is 64 bytes, there are 16 T4 table elements in one cache line (assuming the T4 table is aligned in memory). Therefore, the memory access requests can be turned into cache line requests by dropping the lowest 4 bits of ti^1, ti^2, ..., ti^32, and so we have the following cache line requests:

⟨ti^1⟩ = ti^1 >> 4
⟨ti^2⟩ = ti^2 >> 4
...
⟨ti^32⟩ = ti^32 >> 4

The number of unique cache lines is the number of unique values among ⟨ti^1⟩, ⟨ti^2⟩, ..., ⟨ti^32⟩. This process of calculating the number of unique cache lines accessed from the ciphertext bytes is implemented in Algorithm 3.

Algorithm 3 Calculating the number of unique cache line requests in the last round for a given key byte guess.

kj ← guess
cache_line_cnt ← 0
holder ← [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
for i = 0:31 do        %% i is the thread id
    holder[T4^-1[cipher[i][j] ⊕ kj] >> 4] ++
end for
for i = 0:15 do
    if holder[i] != 0 then
        cache_line_cnt ++
    end if
end for
return cache_line_cnt

We generated one million 32-block messages and received the one million encrypted messages, along with their associated timings. For each 32-block encrypted message, we used Algorithm 3 to calculate the number of unique cache line requests generated by the T4[t3] table lookups, assuming we know the value of k3. Figure 2.7 shows the timing distribution over the number of cache line requests. We find the Pearson correlation value to be 0.0443. We also fit a line to the timing distribution, with a slope of 14 cycles and an offset of 26,503 cycles, where the slope is taken as the signal of the leaking timing channel.
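The slope and offset quoted above come from an ordinary least-squares fit; a minimal C++ helper of our own (not code from the dissertation) is sketched below.

#include <vector>

// Least-squares fit y = slope * x + offset, e.g., time vs. cache line count.
void fitLine(const std::vector<double> &x, const std::vector<double> &y,
             double &slope, double &offset) {
    double n = (double)x.size(), sx = 0, sy = 0, sxy = 0, sxx = 0;
    for (size_t i = 0; i < x.size(); i++) {
        sx += x[i]; sy += y[i];
        sxy += x[i] * y[i]; sxx += x[i] * x[i];
    }
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx);
    offset = (sy - slope * sx) / n;
}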


Figure 2.7: Timing distribution over the number of cache line requests, calculated for one million encrypted messages, using the true value of the 3rd key byte.

Since we use only one table lookup out of all 160 table lookups in one block encryption, we should expect the Pearson correlation to be bounded by the previously calculated correlation value for all 160 table lookups in Figure 2.5. Although the correlation value is small, it is still significantly higher than the correlation value obtained when using a wrong value for the 3rd key byte: if we assume the 3rd key byte to be 0, we find the correlation to be 0.0012, which is 36.9 times lower than the correlation value calculated using the right key. Although the correlation value is small, the linear relationship between the number of unique cache line requests and the encryption execution time suggests that the encryption time is leaking information about the individual 9th-round cipher state. Since the 9th-round cipher state can be computed using the ciphertext (known to the attacker) and key bytes, the encryption time ultimately leaks individual key bytes.


2.4.3 Correlation Timing Attack on GPU AES Implementation

As we can see from Equation (2.1), the table lookup for each ciphertext byte is independent of the others, and each key byte is used exclusively for its corresponding ciphertext byte. This allows us to attack one key byte at a time in a divide-and-conquer manner. For each possible value of a key byte, we use Algorithm 3 to calculate the number of cache line requests for each 32-block message. Since the timing is linearly proportional to the number of cache line requests in the last round, we compute the Pearson correlation of the timing versus the number of cache line requests. When we guess the correct key byte, we have the correct number of unique cache line requests, and the resulting correlation should be the highest; when we guess a wrong key byte, the resulting correlation should be low. Therefore, the key byte guess with the highest correlation value among all possible values should be the correct key byte. In this section, we test our correlation timing attack on the targeted Nvidia Kepler GPU. All of the experiments discussed in this section use one million traces, which can be collected within 30 minutes.
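Putting Algorithm 3 and the correlation test together, the per-byte key search can be sketched in host-side C++ as below; all helper names are ours, and T4inv stands in for the inverse last-round table that the attacker is assumed to have.

#include <array>
#include <cmath>
#include <cstdint>
#include <vector>

extern const uint8_t T4inv[256];  // inverse last-round table (assumed available)

using Trace = std::array<std::array<uint8_t, 16>, 32>;  // 32 ciphertext blocks

// Algorithm 3: unique cache line count for ciphertext byte j under one guess.
int countUniqueCacheLines(const Trace &c, int j, uint8_t guess) {
    int holder[16] = {0};
    for (int i = 0; i < 32; i++)                  // i is the thread id
        holder[T4inv[c[i][j] ^ guess] >> 4]++;
    int cnt = 0;
    for (int b = 0; b < 16; b++)
        if (holder[b] != 0) cnt++;
    return cnt;
}

double pearson(const std::vector<double> &x, const std::vector<double> &y) {
    double n = (double)x.size(), sx = 0, sy = 0, sxy = 0, sxx = 0, syy = 0;
    for (size_t i = 0; i < x.size(); i++) {
        sx += x[i]; sy += y[i]; sxy += x[i] * y[i];
        sxx += x[i] * x[i]; syy += y[i] * y[i];
    }
    return (n * sxy - sx * sy) /
           (std::sqrt(n * sxx - sx * sx) * std::sqrt(n * syy - sy * sy));
}

// Return the guess whose predicted counts correlate best with the timings.
int recoverKeyByte(int j, const std::vector<Trace> &traces,
                   const std::vector<double> &times) {
    int best = 0;
    double bestCorr = -1.0;
    for (int g = 0; g < 256; g++) {
        std::vector<double> counts;
        for (const Trace &t : traces)
            counts.push_back(countUniqueCacheLines(t, j, (uint8_t)g));
        double corr = pearson(counts, times);
        if (corr > bestCorr) { bestCorr = corr; best = g; }
    }
    return best;
}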

2.4.3.1 Attack Using Clean Measurements

In this experiment, we first demonstrate the feasibility of our attack. We therefore minimize the noise by time-stamping the kernel start and end as part of the AES kernel, which provides us with clean timing traces. The result of our attack is shown in Figure 2.8, where the correct value for each key byte is circled. The correct key bytes stand out in the plots when compared to the other 255 possible key values, meaning we have successfully recovered all 16 bytes of the last round key.

We also analyze the success rate for k5 to see the number of traces needed to reach a given success probability (success rate). The result is shown in Figure 2.9, which includes both the measured and predicted success rates. Each point on the measured success rate curve is the average of 100 timing attack trials using different timing traces. The predicted success rate is calculated using the methodology presented by Fei et al. [47], in which the Signal-to-Noise Ratio (SNR) is obtained from real measurements (Figure 2.7) and used to predict the probability of recovering the correct key value. The predicted success rate tracks our measured results precisely. Since computing the predicted success rate takes less than 30 minutes while computing the measured success rate over 100 trials takes around 1,500 minutes, we will use the predicted success rate hereafter. From Figure 2.9, both the measured and predicted success rates reach 50% using as few as 20,000 traces; with 70,000 traces, the predicted success rate converges to 1.



Figure 2.8: Correlation attack result under clean measurements for 32-block messages.

Since the success rate directly depends on the SNR, we show the SNR value for each key byte in Table 2.1. As validated in Figure 2.8, the correlation value for the correct guess varies from key byte to key byte. Thus, the SNR value for each key byte also differs, as shown in Table 2.1, because there exists a linear relationship between the SNR and the correlation when the correlation value is small [47]. Although the same lookup operation and the same attack analysis are applied to each key byte, some key bytes, such as k5, k8, k12, and k13, have much smaller SNR values than the others. This observation leads us to investigate the effect of optimization during GPU compilation of the program on the timing attack.



Figure 2.9: Success rate of k5 using clean measurements.

k0  0.01      k4  0.0399    k8   0.0034    k12  0.0050
k1  0.01      k5  0.0064    k9   0.0105    k13  0.0082
k2  0.0168    k6  0.0305    k10  0.0190    k14  0.0214
k3  0.0395    k7  0.0379    k11  0.0399    k15  0.0392

Table 2.1: The Signal-to-Noise Ratio (SNR) for each key byte.

To explore this issue deeper, we first examined how the server program is compiled for the GPU. We compiled our server program with the highest level of optimization (-O3, the default in nvcc), which reorganizes some of the CUDA instructions in the kernel to avoid data dependency stalls. Before the table lookups for c5, c8, c12, and c13, other stalled load and store instructions may be congesting the GPU hardware resources, creating variable wait times for these table lookups. We inspected the executable Shader ASSembly (SASS) code using the Nvidia disassembler.

The code is shown in Listing 2.1 and includes part of the last round operation.

Line 2a38 performs the table lookup for c5, but the loaded value is not used until line 2a78, so the GPU stalls at line 2a78 only if the requested data is still unavailable at that point. In addition, there are multiple load and store instructions before line 2a38, which may congest the memory system and make the duration of the load instruction for c5 nondeterministic. Thus, we see very little correlation and a low SNR for those key bytes. If we disable optimization during compilation, the CUDA instructions are not reordered, each table lookup stalls on its own data dependency, and the timing becomes more deterministic and predictable. This is shown in Listing 2.2: line 9ef0 is the load instruction for c5, and its requested data is needed immediately by the following instruction. The same holds for c4 at line 9fa0. Overall, the non-optimized program runs much slower, with many stalls: 120,000 GPU cycles versus 27,000 GPU cycles for the optimized program. Without optimization, the loads of the individual ciphertext bytes do not interfere with each other, and we observe a timing variance of 2,354 GPU cycles squared, versus 128,000 GPU cycles squared with optimization. The execution time of the load instructions is then directly proportional to the number of unique cache lines accessed. Therefore, performing the same attack on the unoptimized server code, we expect nearly the same correlation value for each key byte, and the correlation is higher (around 0.06). The result is shown in Figure 2.10.

Listing 2.1: Optimized SASS Code

/*2a00*/ ST.E.U8 [R6+0x3], R18;
/*2a08*/ ST.E.U8 [R6+0x2], R14;
/*2a10*/ IADD.X R13, RZ, c[0xe][0x24];
/*2a18*/ IMAD.U32.U32 R16.CC, R17, R0, c[0xe][0x20];
/*2a20*/ LD.E R11, [R10];
/*2a28*/ LD.E R2, [R2];
/*2a30*/ IMAD.U32.U32.HI.X R17, R17, R0, c[0xe][0x24];
/*2a38*/ LD.E R12, [R12];
/*2a40*/ LD.E.64 R14, [R8+0x8];
/*2a48*/ LD.E R16, [R16];
/*2a50*/ LOP.AND R19, R22, 0xff;
/*2a58*/ LOP.AND R22, R26, 0xff;
/*2a60*/ LOP32I.AND R3, R11, 0xff000000;
/*2a68*/ LOP32I.AND R2, R2, 0xff0000;
/*2a70*/ SHR.U32 R11, R28, 0x15;


/*2a78*/ LOP.AND R10, R12, 0xff00;
...

Listing 2.2: Non-Optimized SASS Code

...
/*9ef0*/ LD.E.64 R4, [R6];
/*9ef8*/ LOP.AND R4, R4, 0xff00;
/*9f08*/ LOP.AND R5, R5, RZ;
/*9f10*/ LOP.XOR R4, R22, R4;
/*9f18*/ LOP.XOR R5, R23, R5;
/*9f20*/ MOV32I R6, 0x0;
/*9f28*/ MOV32I R7, 0x0;
/*9f30*/ MOV R6, R6;
/*9f38*/ MOV R7, R7;
/*9f48*/ LOP.AND R8, R38, 0xff;
/*9f50*/ LOP.AND R9, R39, RZ;
/*9f58*/ SHF.L.U64 R3, R8, 0x3, R9;
/*9f60*/ SHL R0, R8, 0x3;
/*9f68*/ MOV R8, R0;
/*9f70*/ MOV R9, R3;
/*9f78*/ IADD R6.CC, R6, R8;
/*9f88*/ IADD.X R7, R7, R9;
/*9f90*/ MOV R8, R6;
/*9f98*/ MOV R9, R7;
/*9fa0*/ LD.E.64 R6, [R8];
/*9fa8*/ LOP.AND R6, R6, 0xff;
...

Although we see much more consistent and higher correlation values in the attack result when the server code is not optimized, it is unlikely that a high-performance encryption engine would use unoptimized code. Therefore, we focus on optimized server code to test our attack. Running against the optimized server code, we are still able to recover all of the key bytes, and we gain a better understanding of how optimization can begin to thwart timing attacks through interference between loads.



Figure 2.10: No optimization: Correlation attack result for 32-block messages.

2.4.3.2 Attack Using Noisy Measurement

In practice, it is more common for a server CPU to time-stamp the incoming and outgoing messages than to time-stamp within the GPU kernel. With the same number of traces (one million) but this timing collection method, we are able to recover 10 out of the 16 key bytes, as shown in Figure 2.11. Moving from clean to noisy measurements, the variance in the timing data increases from 128 thousand to 5.8 million GPU cycles squared, i.e., a large amount of noise is introduced. The correlation for each key byte is reduced by more than 3X compared to our previous results with more accurate timing measurements. Although the added noise hampers the attack, it does not thwart the attack.



Figure 2.11: Correlation attack result with noisy measurements for 32-block messages.

With a large number of traces, we can still achieve a 100% success rate, at 3 million traces. The attacker can also use filtering to clean up the timing information and reduce the number of traces needed to reach a 100% success rate. We applied a Percentile Filter, as described by Crosby et al. [48], to our timing data: the attacker sends the same 32-block message 100 times, obtaining 100 timing samples along with one 32-block encrypted message. Through experiments, we found that using the 40th-percentile time among the 100 timing samples produces the best attack result, as shown in Figure 2.13. By applying this simple noise reduction method, we obtain even better results than those from clean measurements; a sketch of the filter is shown below.
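The following is a minimal C sketch of such a percentile filter; the function names are ours, and the 0.4 percentile is simply the value that worked best in our experiments.

#include <stdlib.h>

/* Compare doubles for qsort. */
static int cmp_double(const void *a, const void *b) {
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Sort n repeated timings of one message and keep the p-th percentile
 * (e.g., n = 100 and p = 0.4 for the 40th-percentile time). */
static double percentile_filter(double *times, int n, double p) {
    qsort(times, n, sizeof(double), cmp_double);
    return times[(int)(p * (n - 1))];
}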



Figure 2.12: Predicted success rate for the 0th key byte using filtered vs. unfiltered data.

The improvement arises because even the clean measurements suffer from GPU-internal noise sources, such as uncertainty in the warp scheduler. With noise filtering, most of these noise sources are removed, resulting in much cleaner timing information. The filtered success rate shown in Figure 2.12 converges to 1 at 40,000 traces, much earlier than the unfiltered case, which requires 3 million traces. This simple filtering method significantly improves the attack's effectiveness.

2.4.4 Attack on Highly Occupied GPU

Our experimental results suggest that GPU architectures with SIMT processing and a coalescing unit produce a linear relationship between the number of unique cache line requests and the execution time. This relationship makes the GPU highly vulnerable to correlation timing attacks. Adding noise would confuse attackers, but it does not fully thwart a timing attack: by collecting a large number of traces, attackers are still able to recover the secret information. (A sketch of the exploited timing model is given below.)
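To make the exploited model concrete, the following is a hedged host-side sketch that counts the unique cache lines touched by a warp's 32 table lookups; the 128-byte line size (32 four-byte table entries per line) and the function name are our illustrative assumptions.

/* Count the unique cache lines requested by 32 table-lookup indices of
 * a warp; the kernel execution time grows linearly with this count.
 * Assumes 128-byte cache lines, i.e., 32 four-byte entries per line. */
static int unique_cache_lines(const unsigned char idx[32]) {
    int seen[8] = {0};          /* 256 entries / 32 per line = 8 lines */
    int unique = 0;
    for (int t = 0; t < 32; t++) {
        int line = idx[t] >> 5; /* which 128-byte line is touched */
        if (!seen[line]) { seen[line] = 1; unique++; }
    }
    return unique;
}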



Figure 2.13: Correlation attack result using filtered timing information for 32-block messages.

Extracting the secret information from larger messages (e.g., 1024 blocks) is critical, because larger messages better utilize the high throughput of the GPU. However, larger messages can be unfavorable to the attacker. Unlike threads within a warp, threads in different warps are not synchronized, so some warps may finish before others, and the longest warp execution time dominates the attacker's time measurement. Although the attacker does not know which 32-block encrypted message is dominant, she can divide a 1024-block message into 32 32-block messages and treat them as 32 traces sharing the same time value.

One of the 32 traces, produced by the dominant warp, carries the true timing; the other 31 may be wrong, since the warps that produced them finished their encryptions earlier than the dominant warp. The attacker therefore treats the other 31 traces as noise added to the calculation (a sketch of this trace expansion is shown below). We collected one million traces using the filtering method discussed above; the results are shown in Figure 2.14. Most key bytes are still recoverable, but key bytes with weaker correlation, such as k12, are completely buried. With more traces, k12 will also be recovered.
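The trace-expansion step can be sketched as follows; the structure layout and names are illustrative, not taken from the attack code.

#include <string.h>

typedef struct {
    unsigned char cipher[32][16]; /* the 32 ciphertext blocks of one warp */
    double time;                  /* kernel time, shared by all 32 blocks */
} WarpTrace;

/* Split one timed 1024-block sample into 32 warp-sized traces; only the
 * dominant warp's trace carries the true timing, the rest act as noise. */
static void expand_traces(const unsigned char big[1024][16], double t,
                          WarpTrace out[32]) {
    for (int w = 0; w < 32; w++) {
        memcpy(out[w].cipher, big[w * 32], 32 * 16);
        out[w].time = t;
    }
}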


Figure 2.14: Correlation attack result using filtered timing information for 1024-block messages.

Since we treat the other 31 traces as noise during the calculation, we expect the success rate to approach 100% at 15 million traces, as shown in Figure 2.15.



Figure 2.15: Predicted success rate for the 0th key byte using 1024-block data vs. 32-block data.

Although increasing the number of blocks in each message weakens the signal for each key byte, with a larger number of traces we can still recover all 16 key bytes.

2.4.5 Discussion

Moving from clean measurements to noisy measurements introduces a large amount of noise into our timing data, and the correlation values suffer accordingly. In many real-world situations, attackers cannot even obtain a timestamp on the server; they must time-stamp their own packets as the packets are sent and received over the network. Such timing information can be much less accurate than our noisy measurements. In the network setting, we observe a timing variance of 1.233e11 CPU cycles squared, compared to 1.464e8 for the noisy measurements and 3.20e6 for the clean measurements. As discussed in prior work [48], network noise can be filtered to make a remote timing attack possible.


2.5 Countermeasures

A large number of defense techniques have been proposed to prevent timing attacks on CPU platforms [19, 20, 29, 49, 50, 51, 52, 53]. Given the lack of study of side-channel vulnerabilities on GPU devices, there has been no prior work on GPU countermeasures. In this section, we discuss several potential mitigation methods. Our attack exploits knowledge of the deterministic behavior of load instructions on the SIMT architecture. One method to prevent the attack is to eliminate table lookup operations from the AES implementation, as suggested by Osvik et al. [5]. We could also map the lookup tables into the GPU register file, since the register file is large enough to hold a 256-byte Sbox table. Our attack is possible because attackers can map a table lookup index to a cache line, so the attack becomes infeasible if we randomize the mapping between table lookup indices and cache lines. A similar idea is presented in prior work [50], in which memory-to-cache mapping randomization is used on a CPU platform. With this technique, given 32 table lookup indices, attackers cannot map them to cache lines, and thus cannot calculate the number of unique cache line requests. One possible implementation, sketched below, is to randomize the entries of the sensitive data (T4) in memory and create a new index lookup table that maps an access index to the randomized index in memory. Without knowing the mapping in the index lookup table, attackers would not be able to map an index to a cache line.
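A minimal CUDA sketch of this idea follows; the names (perm, T4_rand, lookup_T4) are ours, and the code illustrates the scheme rather than an evaluated implementation.

/* T4 is stored in a permuted order, and a secret index table maps a
 * logical lookup index to its randomized location. Without knowing
 * perm, an attacker cannot map a lookup index to a cache line. */
__device__ unsigned char perm[256];     /* secret permutation, per run  */
__device__ unsigned int  T4_rand[256];  /* T4 entries in permuted order */

__device__ unsigned int lookup_T4(unsigned char i) {
    return T4_rand[perm[i]];
}

The permutation would need to be refreshed periodically so that an attacker cannot learn the mapping over time.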

2.6 Summary

On modern GPU architectures, the execution time of a kernel is linearly proportional to the number of unique cache line requests generated during the kernel execution. This property can be exploited to extract secret information such as encryption keys. In this chapter, we exploited this property on an Nvidia GPU platform and successfully recovered all 16 key bytes of AES-128. Although we performed the attacks on an Nvidia GPU platform, they can be carried out on other GPU platforms, given that SIMT execution and coalescing units commonly exist on GPUs.

Chapter 3

Information Leakage in Shared Memory Banks

3.1 Introduction

In Chapter 2, we identified information leakage in the memory coalescing unit of a GPU and derived a memory-based side-channel attack, using AES encryption as the target. In this chapter, we introduce a new class of timing side channels based on Shared Memory bank conflicts on the GPU, and we develop a differential timing attack that exploits this timing side channel. To demonstrate the attack, we again use an AES implementation (with T-tables stored in the Shared Memory unit) on a GPU and successfully recover all of the key bytes. The GPU on-chip Shared Memory is an important hardware unit for alleviating heavy traffic to the off-chip device memory. It is designed to store data that are shared and frequently accessed by many running threads. To support SIMT execution and deliver high memory throughput in modern GPUs, the Shared Memory is divided into multiple memory banks (versus a monolithic bank), allowing multiple concurrent paths into the Shared Memory. With memory requests for different memory banks serviced in parallel, the Shared Memory bandwidth is significantly increased. However, when multiple memory requests compete for the same bank, they must be serviced serially, as each memory bank provides a single access port. We refer to such a case of multiple accesses competing for a single Shared Memory bank port as a bank conflict. Additional requests that try to access data in the same memory bank are queued and delayed.


This scenario results in a detectable delay compared to multiple memory accesses that resolve to different banks with no bank conflicts. Beyond GPUs, modern high-performance CPUs (e.g., Intel's Sandy Bridge and ARM's Cortex-A) are also designed with multi-banked L1 and L2 caches. Yarom et al. [34] and Jiang et al. [54] investigate how sensitive information can be leaked when a cryptographic application runs on a CPU with multi-banked caches. A GPU generates a much more complex access pattern to the Shared Memory banks; we identify the memory bank conflict-based timing channel and exploit it for a successful timing attack. The contributions in this chapter include:

1. We identify a new memory resource that can leak the memory access pattern of an application.

2. We propose a differential timing attack methodology and successfully recover all AES key bytes.

3. We quantify the effectiveness of our attack methodology using the success rate as a metric.

4. We extend our timing analysis onto other Nvidia GPU architectures: Maxwell, Pascal, Turing, and Volta. We explore how non-blocking execution can hide timing leakage in the Shared Memory and be used to prevent our attack.

5. We propose a multi-key protection mechanism and evaluate its effectiveness in mitigating side-channel leakage and performance overhead.

This chapter is organized as follows: in Section 3.2, we provide background on the Advanced Encryption Standard (AES) algorithm, as well as the GPU memory hierarchy and execution model. In Section 3.3, we discuss our threat model. In Section 3.4, we explore the timing variation due to Shared Memory bank conflicts, i.e., we characterize the memory bank timing channel. In Section 3.5, we describe our differential timing attack targeting table-based cryptographic algorithms and attack an AES encryption running on an Nvidia Kepler GPU. In Section 3.5.5, we apply our attack in more realistic settings. In Section 3.6, we extend our timing analysis to other GPU architectures and explore how their non-blocking execution mode can hide timing leakage in the Shared Memory. In Section 3.7, we discuss feasible countermeasures to prevent the attack, focusing on a multi-key implementation of AES encryption. Finally, we summarize the chapter in Section 3.8.


3.2 Background

We begin by describing the AES implementation evaluated on the targeted GPU platform, as well as the memory hierarchy and execution model of Nvidia Kepler GPUs, a widely used and energy-efficient GPU microarchitecture [11].

3.2.1 AES Encryption

In this chapter, we evaluate the timing leakage vulnerability of a table-based cryptographic algorithm on a GPU. We use the same example as in Chapter 2, 128-bit ECB-mode AES encryption, for the attack demonstration. The proposed attack strategy also applies to other table-based cryptographic algorithms such as Blowfish [55]. The performance of AES is critical in the era of big data, where confidentiality is needed for storing and transmitting large amounts of data, so high data throughput is desired. Performing AES encryption on a GPU can deliver an order of magnitude higher throughput than on CPUs [56], since AES encryption is easily parallelized and GPUs can exploit high degrees of execution parallelism. To demonstrate the generality of the attack, we port the AES implementation of a standard and widely used library, OpenSSL 0.9.7, into CUDA code. Note that the ported implementation of AES is similar to the ones evaluated for performance in many other studies [56, 57]. We discuss alternative implementations of AES that are immune to our attack but incur performance degradation in Section 3.7. To port the implementation, we need to decide where to store the T-tables in the GPU memory hierarchy and how to assign encryption jobs to GPU threads. In Chapter 2 and our prior work [58], we stored the T-tables in the Global Memory unit, but that implementation is vulnerable to coalescing attacks. Since the T-tables are constant data shared by all threads, they are a good candidate for the Shared Memory unit. Multiple studies on GPU implementations of AES have demonstrated the advantages of storing T-tables in Shared Memory [59, 60, 56, 61, 57, 62, 63], and our work adopts this implementation. To assign encryption jobs to GPU threads, we transform the AES encryption procedure into a single GPU kernel, where each GPU thread independently encrypts one 16-byte block. A sketch of this organization is shown below.
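The following is a minimal sketch of this organization, not the exact evaluated kernel; the flat table array sT, the device function aes_encrypt_block, and the argument layout are our illustrative assumptions.

/* Each thread encrypts one 16-byte block; the T-tables are first staged
 * cooperatively into Shared Memory by the thread block. */
__global__ void aes_kernel(const unsigned int *in, unsigned int *out,
                           const unsigned int *round_keys,
                           const unsigned int *gT, int nblocks) {
    __shared__ unsigned int sT[5 * 256];      /* T0..T3 and T4 */
    for (int i = threadIdx.x; i < 5 * 256; i += blockDim.x)
        sT[i] = gT[i];                        /* cooperative load */
    __syncthreads();

    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid < nblocks)                        /* one block per thread */
        aes_encrypt_block(&in[4 * tid], &out[4 * tid], round_keys, sT);
}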

The AES algorithm is composed of nine rounds of SubByte, ShiftRows, MixColumn, and AddRoundKey operations, followed by a last round with only three operations (omitting MixColumn). For faster processing, the first three operations are integrated into T-table lookups in the first nine rounds. In the last round, a special T-table (T4) is referenced, followed by byte masking. Each encryption round requires one 16-byte round key. The ten round keys are generated by the key scheduler from one 16-byte user-specified master key; knowing any round key, an attacker can compute the original 16-byte master key. Our attack strategy targets the last-round key. A code snippet of the last round operations generating the first four bytes of ciphertext is shown in Listing 3.1.

Listing 3.1: AES Last Round Code Snippet

O0 = (T4[(In0 >> 24) & 0xff] & 0xff000000) ^
     (T4[(In1 >> 16) & 0xff] & 0x00ff0000) ^
     (T4[(In2 >>  8) & 0xff] & 0x0000ff00) ^
     (T4[(In3      ) & 0xff] & 0x000000ff) ^ k0;

Variable O0 holds the first four bytes of the 16-byte ciphertext. Each variable In0 to In3 contains four bytes of the input state for the last round. A selected byte of each variable indexes into the T-table to obtain a four-byte output, of which only one byte contributes to the final ciphertext. k0 is the first four bytes of the last round key. From the original algorithm, the last round can be simplified into byte-wise operations, as shown below:

c_j = SBox[s_i] ⊕ rk_j    (3.1)

where the input byte position i for the SBox operation differs from the output ciphertext byte position j due to the ShiftRows operation. Each byte of the last round input state, s_i, can be calculated once the corresponding cipher and key bytes are known:

s_i = SBox^{-1}[c_j ⊕ rk_j]    (3.2)
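In C, and assuming the standard AES S-box and inverse S-box tables are available as sbox and inv_sbox, the two relations read as follows; the function names are ours.

#include <stdint.h>

extern const uint8_t sbox[256], inv_sbox[256]; /* standard AES tables */

/* Equation (3.1): last round output byte from state byte and key byte. */
uint8_t last_round_byte(uint8_t s_i, uint8_t rk_j) {
    return sbox[s_i] ^ rk_j;
}

/* Equation (3.2): recover the state byte from a known ciphertext byte. */
uint8_t recover_state_byte(uint8_t c_j, uint8_t rk_j) {
    return inv_sbox[c_j ^ rk_j];
}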

3.2.2 Nvidia GPU Memory Hierarchy

In this chapter, we describe our attack on an Nvidia Kepler K40 GPU in detail, though we extend our analysis to other Nvidia GPUs with different architectures, demonstrating the broad applicability of our approach. The GPU devices used in this chapter are listed in Table 3.1. These Nvidia GPUs have similar memory hierarchies, except that some of them have a dedicated Shared Memory unit, whereas the Nvidia Kepler GPU does not. We describe the major differences in these memory architectures in Section 3.6 and discuss how the differences impact the effectiveness of our attack.


Architecture    Kepler        Maxwell     Pascal     Turing         Volta
Device Model    Tesla K40c    GTX 950m    TITAN X    GTX 1660 Ti    Tesla V100 PCIE

Table 3.1: List of tested Nvidia GPUs

In this section, we focus on the memory hierarchy, using the Nvidia Kepler GPU as an example, as shown in Figure 3.1.

Figure 3.1: Nvidia Kepler GPU Memory Hierarchy

On the Nvidia Kepler GPU, there is an off-chip DRAM memory (device memory) that is partitioned into global, texture, and constant memory regions. Data in those memories are shared among all threads running on all 15 Streaming Multiprocessors (SMXs). Each SMX (with 192 single-precision floating-point cores) also has L1, texture, and constant caches; data in those caches are private to the threads running on that SMX. In addition, each SMX has a Shared Memory, and only the block of threads that allocated specific data in the Shared Memory can access that data. Each GPU thread also owns an exclusive set of 255 registers to store the current thread state. On the Nvidia Kepler GPU, the Shared Memory and the L1 cache reside in the same physical memory storage, with a total size of 64 KB. The individual sizes of the Shared Memory and L1 cache are configurable; in our case, we allocate 48 KB as the Shared Memory and 16 KB as the L1 cache. Note that the size configuration affects neither the attack nor the results presented in this chapter. The Shared Memory is divided into 32 banks and has a configurable bank line size (annotated as bank size in the Nvidia documentation [11]): four bytes or eight bytes. Since using a bank size of eight bytes leads to fewer bank conflicts and improves an application's performance,

we set the bank size to eight bytes for faster AES encryption. The memory address breakdown for the Shared Memory is shown in Figure 3.2: the three least-significant bits form the bank offset, and the next five bits (bits 3-7) form the bank index, which selects the bank from which a line is retrieved for kernel computation.
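Both configuration choices can be requested from the host with standard CUDA runtime calls, as in the sketch below (error checking omitted); this reflects the configuration described above rather than our exact code.

#include <cuda_runtime.h>

/* Request the 48 KB Shared Memory / 16 KB L1 split and eight-byte
 * Shared Memory bank lines on the Kepler GPU. */
void configure_shared_memory(void) {
    cudaDeviceSetCacheConfig(cudaFuncCachePreferShared);
    cudaDeviceSetSharedMemConfig(cudaSharedMemBankSizeEightByte);
}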

Figure 3.2: Memory address to Shared Memory bank mapping

When multiple memory requests address different Shared Memory banks (i.e., bits 3-7 are different), they can be serviced in a single GPU cycle, providing much higher memory bandwidth than that of a monolithic cache bank design. However, a bank conflict occurs whenever multiple memory requests access the same bank (with the same bank index, but different tag values). Thus, there will be a noticeable timing difference between memory requests with and without bank conflicts.

3.2.3 Single Instruction Multiple Threads Execution Model

With the SIMT execution model, one GPU instruction is executed by at least a warp of 32 threads, and each thread has its own set of registers. All threads within a warp must be synchronized at an instruction boundary: no thread in a warp can execute the next instruction until all threads complete the current one. For memory instructions, each thread generates a memory request, so a warp of threads generates 32 memory requests for one memory instruction. Under the SIMT model, the execution time of this memory instruction is determined by the Shared Memory bank that receives the highest number of bank conflicts (i.e., the largest number of requests resolving to the same bank). In other words, the execution time of a GPU memory access instruction is highly dependent on the memory addresses issued and on whether those addresses result in bank conflicts with other accesses. An attacker can exploit this dependency to recover the secret key of cryptographic operations running on the GPU.


3.3 Threat Model

Our threat model assumes co-residence of the adversary and the victim on one physical machine, and we use this threat model to evaluate our attack. However, we do not anticipate any issues with the attack working in a cloud environment. The threat model assumes that the adversary is a regular user without root-level privileges and that the underlying operating system is not compromised. The adversary can measure the execution time of a GPU encryption kernel directly or indirectly. For a direct measurement, the victim may expose the timestamps when a GPU kernel is launched and when it ends. For an indirect measurement, the adversary can use non-privileged APIs to query the status of the GPU and infer the start and stop timestamps of the GPU kernel; a similar technique is described by Naghibijouybari et al. [64]. For the purposes of our evaluation, we assume the victim exposes the timestamps when a GPU kernel is launched and when it finishes, providing direct measurements. The threat model also assumes that the adversary can observe the ciphertexts.

3.4 Bank Conflicts-Based Side-Channel Timing Channel

In this section, we conduct experiments to examine the impact of bank conflicts on GPU program execution time, i.e., to characterize the timing side channel. We develop a kernel that uses a warp of threads to issue loads to the Shared Memory. Depending on the address of the data that each thread accesses, some number of bank conflicts will occur, resulting in different execution times for the load operations. We perform the timing analysis on an Nvidia Kepler K40 GPU; all of the micro-benchmarks presented here are designed specifically for the microarchitecture of the Kepler memory system. Later, we apply the same timing analysis to other architectures (Maxwell, Pascal, Volta, and Turing), which feature a range of memory hierarchies that differ from the Kepler architecture. We develop a memory access pattern for a warp of threads that generates a specific number of bank conflicts, produced by selecting the address that each thread accesses. Using a high-resolution, cycle-accurate time-stamping mechanism, we study the impact of bank conflicts on the kernel execution time. We developed Microbenchmark 1, shown in Listing 3.2.

Listing 3.2: Microbenchmark 1

1 register uint32_t tmp, tmp2, offset = 64;
2 __shared__ uint32_t share_data[1024 * 4];


3 ...
4 int tid = blockDim.x * blockIdx.x + threadIdx.x;
5 tmp = clock();
6 tmp2 = share_data[tid * stride + 0 * offset];
7 tmp2 += share_data[tid * stride + 1 * offset];
8 ...
9 tmp2 += share_data[tid * stride + 39 * offset];
10 times[tid] = clock() - tmp;
11 in[tid] = tmp2;

The purpose of this microbenchmark is to run 32 concurrent threads in a warp, with each thread generating a sequence of memory accesses, and to measure the execution time of the warp. In Listing 3.2, the variable share_data points to a contiguous 16 KB region of Shared Memory, where each element of the array is one word (4 bytes). In Line 4, the thread ID is obtained. In Lines 6-9, each thread accesses 40 memory locations in sequence, with an offset of 64 words between two adjacent memory addresses (offset). Note that the memory address distance between two threads is the stride, which can be tuned to produce different numbers of bank conflicts among a warp of threads. Inspecting Listing 3.2 and Figure 3.2, two adjacent memory addresses within a thread have the same bank index and bank offset (64 words = 2^8 bytes), so all memory addresses requested by a single thread access the same bank. Each thread accesses a single memory region, and the distance between the memory regions accessed by different threads is one or more strides. By selecting the value of the stride, we can create bank conflicts among the threads in a warp. We run this kernel 320,000 times and collect 320,000 timing samples (10,000 timing samples for each stride value, ranging from 1 to 32). Based on our experiments, 10,000 timing samples are enough to produce a timing distribution, although these distributions can shift depending on the system load during the experiments. The timing distribution for these samples is shown in Figure 3.3. We observe only five distinct timing distributions for the 32 stride values. Clearly, some stride values have the same timing behavior, and we suspect that those stride values result in the same number of bank conflicts. We next calculate the number of bank conflicts for each stride value. Recall that the memory address breakdown of our testing platform is shown in Figure 3.2. Given a word index into the shared data array, we calculate the bank index by dropping the least-significant bit and then performing a modulo-32 operation, as described by the formula below:


idx_B = mod(idx_M >> 1, 32)    (3.3)

where idx_B is the bank index and idx_M is the array index. The right-shift operator, >>, drops the least-significant bit, and mod is the modulo operation. As an example, assume a stride value of 16. For a memory access instruction issued across a warp of 32 threads, we generate the following 32 memory indices: {0, 16, 32, 48, ..., 480, 496}. Using Equation (3.3), we obtain the following bank access indices for the warp: {0, 8, 16, 24, 0, 8, 16, 24, ..., 0, 8, 16, 24}. These requests target four banks, and each bank receives eight concurrent requests, i.e., eight bank conflicts are produced when the stride is 16. Similarly, we calculate the number of bank conflicts for each stride value in the range of 1 to 32 words; the strides fall into five groups, as shown in Figure 3.3, where each group corresponds to a different number of bank conflicts and associated average execution time. We also plot the average execution time of each group (for selected stride values) versus the number of Shared Memory bank conflicts in Figure 3.3, and we can easily identify a linear relationship. The slope of the line is 392 GPU cycles, with an offset of 1002 GPU cycles. Since we perform 40 sequential Shared Memory loads, this implies an average penalty of 9.8 GPU cycles per bank conflict, which is also the strength of the timing channel signal in the Shared Memory banks. Although the penalty for a GPU Shared Memory bank conflict is not as large as the CPU cache miss penalty (another well-studied cache side channel, with the difference between a hit and a miss around 100 cycles [6, 33, 9]), it can still be a source of information leakage, and countermeasures resistant to the original cache timing attacks may not work for the bank conflict timing channel. Next, we demonstrate the feasibility of exploiting this fine-grained timing channel for key retrieval through statistical methods. (A small helper that performs this bank-conflict calculation is sketched below.)
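The bank mapping of Equation (3.3) and the resulting conflict degree can be captured in a small host-side helper; the function name is ours.

/* Map the 32 word indices issued by a warp to banks via Equation (3.3)
 * and return the worst-case number of requests to any single bank
 * (1 means conflict-free; the stride-16 example above yields 8). */
static int warp_bank_conflicts(const unsigned idxM[32]) {
    int requests[32] = {0};
    int worst = 0;
    for (int t = 0; t < 32; t++) {
        unsigned idxB = (idxM[t] >> 1) % 32;   /* Equation (3.3) */
        if (++requests[idxB] > worst)
            worst = requests[idxB];
    }
    return worst;
}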

3.5 Differential Timing Attack

In this section, we devise a differential timing attack to exploit the timing channel in the Shared Memory banks. We start by attacking an AES algorithm, because its table lookup operations are key-dependent memory accesses. In our AES implementation, the lookup table is word-aligned, similar to the shared data array used in Microbenchmark 1. Therefore, we expect the execution time of one table lookup operation of a warp of threads to be linearly dependent on the number of bank conflicts generated by the threads.


Figure 3.3: The number of bank conflicts vs. the associated timing for 32 stride values.

The execution time of one entire encryption is likewise dependent on the number of bank conflicts created by the table lookup operations. Since the index of a table lookup operation is related to the round key, with the correct key guess we can predict the number of bank conflicts that occur during one round of AES encryption across a warp using Equation (3.3). Over many different blocks of plaintext, the correlation between the average encryption timing and the number of Shared Memory bank conflicts should be high for the correct key guess and tend to be lower for incorrect key guesses. This is the basic principle of a differential timing attack, similar to the traditional differential power attack (DPA) [1]. Next, we present the details of our attack methodology on AES. We examine the mapping between the AES lookup tables and the Shared Memory banks, collect data, and recover the last round AES encryption key.


3.5.1 Mapping Between the AES Lookup Tables and GPU Shared Memory Banks

As described in Section 3.2, since we attack the last round of the AES encryption, we only need to examine the mapping of the T4 lookup table in the Shared Memory. Note that attacking more rounds (more than three) becomes infeasible due to the algorithm-inherent statistical confusion and diffusion properties. There are 256 4-byte elements in the T4 lookup table. Equation (3.3) is used to calculate the Shared Memory bank index from a T-table lookup index, where idx_B is the bank index and idx_M is the T-table lookup index. Because the bank size is 8 bytes on our Nvidia Kepler K40 GPU, we apply the right-shift operator, >>, to drop the least-significant bit.

3.5.2 Collecting Data

The data collection procedure is similar to the experiments performed in Section 3.4, except that instead of issuing 40 memory load instructions, each thread performs an actual AES encryption on a random input data block. We record both the encryption time and the ciphertexts for a warp of 32 threads. Each data sample is composed of 32 16-byte ciphertexts and a timing value, in the following format:

[{C_0, C_1, ..., C_31}, t]

where each C_i is a 16-byte ciphertext produced by thread i, consisting of 16 bytes {c_0^i, c_1^i, ..., c_15^i}, and t is the total encryption time for the warp.

We consider the encryption time measured from the GPU side and from the CPU side, respectively. The encryption time measured on the GPU side contains far fewer noise sources than that measured on the CPU side, due to the non-deterministic data transfer time between the GPU and CPU, as well as other initialization procedures required to run a kernel on the GPU device. However, the CPU runs at a much higher frequency than the GPU, so the CPU-side encryption time offers finer-grained measurements. Moreover, measurements on the CPU side represent a more realistic scenario, as the adversary is a passive observer and normally does not have access to the GPU timer.

3.5.3 Calculating the Shared Memory Bank Index

For the last round of AES, with the output ciphertext known, the input state byte can be calculated using Equation (3.2). For a warp of 32 threads, 32 such table lookups run concurrently, and therefore we have:


{s_i^0, s_i^1, ..., s_i^31} = SBox^{-1}[{c_j^0, c_j^1, ..., c_j^31} ⊕ rk_j]    (3.4)

where c_j^0 is the cipher byte produced by thread 0, s_i^0 is the lookup table index for thread 0, and rk_j is the j-th last round key byte, which is common to all the threads. These lookup table indices, {s_i^0, s_i^1, ..., s_i^31}, are exactly the Shared Memory indices. We use Equation (3.3) to further calculate the bank indices used by all threads in the warp for this table lookup instruction, and then derive the number of bank conflicts.

Figure 3.4: Calculation from ciphertexts to number of bank conflicts

Consider the example shown in Figure 3.4: assume we have encrypted 32 16-byte plaintexts using 32 threads (i.e., a warp), obtained 32 16-byte ciphertexts, and are targeting the 0th last round key byte. First (A), we select the 0th cipher byte from each of the 32 ciphertexts. Second (B), we convert the selected cipher bytes into last round states using a guessed key byte value. Lastly (C), we calculate the accessed Shared Memory bank indices and the number of bank conflicts that occurred. Note that if the key guess is incorrect, the calculated number of bank conflicts will be incorrect and will not correlate with the observed timing. This calculation is sketched in code below.
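A hedged C sketch of steps A through C for one key-byte guess follows; inv_sbox is the standard AES inverse S-box (table omitted for brevity), and the remaining names are ours.

#include <stdint.h>

extern const uint8_t inv_sbox[256];   /* standard AES inverse S-box */

/* cipher_bytes holds the j-th ciphertext byte from each of the 32
 * threads (step A). Returns the worst-case number of requests to a
 * single bank under the given key guess. */
static int conflicts_for_guess(const uint8_t cipher_bytes[32],
                               uint8_t key_guess) {
    int requests[32] = {0}, worst = 0;
    for (int t = 0; t < 32; t++) {
        uint8_t s = inv_sbox[cipher_bytes[t] ^ key_guess]; /* step B, Eq. (3.2) */
        int bank = (s >> 1) % 32;                          /* step C, Eq. (3.3) */
        if (++requests[bank] > worst)
            worst = requests[bank];
    }
    return worst;
}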

3.5.4 Recovering Key Bytes

Using the collected data, we can launch a correlation timing attack. As shown in Listing 3.1, each T-table lookup in the last round of AES uses one byte of the 16-byte state, so each round key byte can be attacked independently. For each data sample we collected, we calculate the number of bank conflicts for the table lookup instruction that uses the j-th last round key byte, as in the example of Figure 3.4. For each guessed key byte value (ranging from 0 to 255), we can then calculate the correlation between the average timing and the number of bank conflicts, and use the correlation value to differentiate the correct key byte from the incorrect key guesses.

For the data collected, the number of bank conflicts among the 32 threads falls in the range [2, 4]. The power of a correlation timing attack lies in the linearity of the timing model: the total execution time should consist of a deterministic component, linearly dependent on the number of bank conflicts, plus an independent Gaussian random variable contributed by the other nine rounds. During an actual AES execution, the timing distribution does not conform to this ideal model, so a correlation timing attack may not be more effective than a differential timing attack, which considers only two values of the number of bank conflicts. We therefore adopt a differential timing attack approach and calculate two average timing values: one for the group of data samples that generate two bank conflicts, and one for the group that generates four bank conflicts. The Difference-of-Means (DoM) between these two groups should be about twice the bank conflict penalty, i.e., around 19 cycles. Thus, for each sample we collected, we first calculate the number of bank conflicts as shown in Figure 3.4; second, we classify its timing into one of the two groups based on the number of bank conflicts; finally, we compute the DoM between the two groups. If the correct key value is used, we should see a DoM of around 19 cycles; otherwise, the DoM should be close to zero. A sketch of this differential step is shown below.
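In code, the differential step might look like the following sketch, reusing conflicts_for_guess from above; the sample layout is illustrative.

/* Difference-of-Means between the four-conflict and two-conflict groups:
 * roughly 19 GPU cycles for the correct key guess, near zero otherwise. */
typedef struct {
    uint8_t cipher_bytes[32];  /* j-th ciphertext byte of each thread */
    double  time;              /* measured warp encryption time       */
} Sample;

static double difference_of_means(const Sample *smp, int n, uint8_t guess) {
    double sum2 = 0, sum4 = 0;
    int n2 = 0, n4 = 0;
    for (int i = 0; i < n; i++) {
        int c = conflicts_for_guess(smp[i].cipher_bytes, guess);
        if (c == 2)      { sum2 += smp[i].time; n2++; }
        else if (c == 4) { sum4 += smp[i].time; n4++; }
    }
    return (n4 ? sum4 / n4 : 0.0) - (n2 ? sum2 / n2 : 0.0);
}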

Figure 3.5: 15th key byte recovery using GPU timing information.


We first apply the attack method to the 15th key byte; the result is shown in Figure 3.5. The upper plot uses one hundred thousand samples, and the lower plot uses 1 million samples. The correct key byte value (198) is highlighted in red. In the upper plot, the DoM for the true key value is 15.99 GPU cycles, about 4 GPU cycles less than the predicted signal of 19.6 GPU cycles, while the DoMs for the other values are much smaller, between -2.3 and 2.3 GPU cycles. Increasing the sample size to 1 million leaves the DoM for the true value about the same, while the DoMs for wrong values shrink to a range between -0.8 and 0.8 GPU cycles. We apply this attack methodology to recover the other key bytes; the result is shown in Figure 3.6. All 16 true key byte values clearly stand out in the plots, which means we have successfully recovered all key bytes. Although the same attack runs on all key bytes, some key bytes, such as k0 and k1, show much smaller peak timing differences than others. We observe that key bytes used closer to the end of the encryption tend to have larger and more distinct timing differences, e.g., k15. We speculate that the reduced signal for k0 is caused by instruction-level parallelism. Although the architectural details of the Nvidia Kepler GPU are not public, we suspect that it can continue issuing independent instructions each cycle until all resources are consumed, thereby hiding the latency of Shared Memory bank conflicts behind the issue and execution of other independent instructions. However, if the GPU stalls due to Shared Memory bank conflicts, the penalty is exposed in the total execution time, and we observe a stronger signal (e.g., key byte 15). To verify this speculation, we slightly modify Microbenchmark 1, as shown in Listing 3.3, removing the accumulation instruction after each load instruction so that all the load instructions are independent of each other and can be scheduled in a non-blocking fashion.

Listing 3.3: Microbenchmark 2

...
tmp2 = share_data[tid * stride + 0 * offset];
tmp3 = share_data[tid * stride + 1 * offset];
tmp4 = share_data[tid * stride + 2 * offset];
...
tmp41 = share_data[tid * stride + 39 * offset];
times[tid] = clock() - tmp;
in[tid] = tmp2 + tmp3 + tmp4 + ... + tmp41;



Figure 3.6: Recovery of all 16 key bytes using 10 million samples.

We run Microbenchmark 2 10,000 times for each stride value in the range [1, 32] and collect the timing information. Calculating the average timing for each stride value against the number of bank conflicts, we obtain a linear relationship similar to that in Figure 3.3. However, the slope becomes 177 GPU cycles per 40 load instructions, so the per-conflict penalty drops from 9.8 GPU cycles for Microbenchmark 1 to 4.4 GPU cycles for Microbenchmark 2. In the modified kernel, each of the 40 load instructions loads data into a different register, and the additions are performed at the very end. The execution of Microbenchmark 2 is non-blocking, while Microbenchmark 1 executes in a blocking mode due to tight data dependencies. We show the SASS code (the assembly code for the GPU kernel) for the two microbenchmarks in Listing 3.4 and Listing 3.5, respectively. In Microbenchmark 1, there is a strong data dependence between instructions: the loaded data (Line 1 of Listing 3.4) is used by a later operation (Line 4) that is three instructions away from the load operation. This read-after-write (RAW) data dependency between Line 4 and Line 1 introduces blocking, and Instruction 4 cannot proceed until Instruction 1 is completed, exposing any delay that Instruction 1 experiences in the total execution time.

This is seen in Figure 3.7(a), where the total execution time is the execution time of Instructions 1 and 4. In Microbenchmark 2, we generate a sequence of independent loads, as shown in Listing 3.5, and the instructions can execute in a non-blocking fashion. Therefore, the delay caused by Instruction 1 can be hidden by the execution of later instructions, as seen in Figure 3.7(b), where the total execution time no longer depends on Instruction 1.

Listing 3.4: Original SASS Code for Microbenchmark 1

1 LDS R4, [R3+0x1b00];
2 IADD R7, R8, R5;
3 LDS R6, [R3+0x1c00];
4 IADD R7, R7, R4;

Listing 3.5: Modified SASS Code for Microbenchmark 2

1 LDS R15, [R38+0x1b00];
2 LDS R14, [R38+0x1c00];
3 LDS R13, [R38+0x1d00];

Figure 3.7: Blocking Mode vs. Non-Blocking Mode

In our AES implementation, each of the 16 lookup operations in the last round is independent of the others. Since k0 is used in the first lookup operation, its delay due to bank conflicts is obscured by the execution of the other independent lookup operations, and we see a weaker signal for key byte k0. By contrast, k15 is the last one processed, and its bank-conflict delay is fully exposed in the execution time, giving k15 the strongest signal. The results in Figure 3.6 show that the signals (the average penalty for one bank conflict) for the key bytes range from 0.8 to 8.9 GPU cycles, depending on how much of the conflict penalty is hidden by the non-blocking execution mode.


3.5.5 More Realistic Attack Scenarios

So far, we have demonstrated a successful attack using timing information taken from the GPU side, which contains much less noise than timing measured from the CPU side, and we have evaluated our attack when AES encryption runs with only 32 parallel threads. It is important to understand the scalability of the attack, i.e., how the number of traces needed for a successful attack grows as the number of threads increases. In this section, we evaluate the effectiveness of our attack using CPU timing information, and we evaluate the impact of measurement noise when the encryption runs with a larger number of threads. These changes more accurately reflect a typical execution environment for a timing side-channel attack on a GPU.

3.5.5.1 Using CPU Timing Information

To collect timing data on the CPU side, we record the kernel execution time using the x86 rdtscp instruction, as sketched below. Since the CPU runs at a higher frequency than the GPU, CPU timings have higher resolution (4.8 times higher), but the CPU timing information is also much noisier than the GPU timing information. Using the CPU timestamping mechanism, we record the time from when a GPU kernel is launched until it exits; this measurement inevitably includes extra timing noise from the kernel launch overhead, including memory transfers between the CPU and the GPU. With a GPU timestamping mechanism, by contrast, we can record timestamps after the kernel launch completes, capturing only when the AES encryption begins and ends inside the GPU kernel and avoiding the extra launch noise. We perform the same differential timing attack against our 32-thread implementation and can still recover all key bytes using 1 million samples, but with a weaker signal, as shown in Figure 3.8: the difference of means between the two bins for the correct key guess is around 94 CPU cycles (20 GPU cycles). We also quantify the effectiveness of our attack by introducing a metric, the success rate, which captures the probability that an attack succeeds given a number of traces. With the success rate metric, we can compare the effectiveness of the attack across different platforms and different side-channel attacks. Other metrics exist [65, 66, 67], but they quantify the information leakage of the side channel rather than the effectiveness of the attack. We performed a number of attacks to obtain the empirical success rates of recovering the 15th key byte using the CPU timing information and using the GPU timing information; the results are presented in Figure 3.9. The GPU timing data yields a stronger side-channel timing signal, so only 120,000 samples are needed to achieve a 100% success rate, while 900,000 samples are needed with the CPU timing.
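A minimal sketch of this CPU-side measurement follows, assuming an x86 host with the __rdtscp intrinsic; the kernel name, launch configuration, and arguments are placeholders.

#include <x86intrin.h>
#include <cuda_runtime.h>

/* Bracket the launch-to-exit window of the encryption kernel with the
 * CPU time-stamp counter. */
unsigned long long time_kernel(void) {
    unsigned aux;
    unsigned long long t0 = __rdtscp(&aux);
    aes_kernel<<<blocks, threads>>>(/* ... */);
    cudaDeviceSynchronize();   /* wait until the kernel exits */
    unsigned long long t1 = __rdtscp(&aux);
    return t1 - t0;            /* CPU cycles, including launch overhead */
}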



Figure 3.8: 15th key byte recovery using the CPU timing information.

3.5.5.2 Increasing the Number of Threads

Next, we evaluate the scalability of our attack using an 8192-thread AES implementation with 16 blocks of 512 threads (each block contains 16 warps), which is several times more than the maximum number (2880) of threads that can run in parallel on the Nvidia Kepler K40 GPU. In realistic scenarios, we want to keep the entire device busy with encryptions/decryptions, producing much higher throughput than using only 32 threads. Note that in such a real attack scenario, the attacker cannot manipulate the kernel code running on the GPU (they cannot easily insert timestamps before and after the encryption); they must rely on timing information measured on the CPU side. When a large number of threads are running, the measured execution time of the kernel is dominated by the slowest SMX (and the corresponding blocks running on it).



Figure 3.9: Success rate for the 15th key byte: CPU timing vs. GPU timing.

However, the details of the GPU scheduler are not public, so we do not know exactly how blocks and warps are distributed and scheduled across the GPU SMXs. We choose one warp to attack and use the entire kernel execution time, which represents a real black-box, passive attack scenario. We anticipate much higher noise under this attack model: the selected warp may not run on the slowest SMX, and even if it does, other warps compete for the same SMX resources, so attributing all variation in the kernel execution time to this warp is not accurate. This non-deterministic process adds a large amount of noise to our data samples, and other activities, such as saving a thread's register state, can also contribute to the timing noise. Using the same attack methodology described earlier, we collect 1 million samples of the 8192-thread AES encryption and apply our differential timing attack to recover the 15th key byte. The result is shown in Figure 3.10.

The timing difference for the true value of the 15th key byte is 67.8 CPU cycles, which barely stands out among the other, wrong key values. This indicates a high degree of noise in the data samples we collect, resulting in a significantly lower signal-to-noise ratio (SNR).


Figure 3.10: 15th key byte recovery using 1 million samples with full GPU capacity.

Through both scenarios, we demonstrate that our attack can withstand noise: using the success rate metric, we show that increasing the number of samples allows the attack to overcome a higher degree of noise.

3.6 Timing Analysis on Other Architectures

So far in this chapter, we have focused our analysis on the Kepler architecture. We now extend our timing analysis to other architectures: Maxwell, Pascal, Volta, and Turing. The memory hierarchies of Maxwell and Pascal are quite different from that of the Kepler architecture: both have a dedicated Shared Memory unit, whereas on Kepler the Shared Memory and L1 cache share the same physical unit.

On Volta and Turing, however, the Shared Memory and L1 cache share the same physical unit, similar to the Kepler memory hierarchy. All of these newer architectures have the same number of Shared Memory banks as Kepler but fix the bank size to 4 bytes. Based on our initial timing analysis on these four architectures, the Shared Memory bank conflict penalty is only 2 GPU cycles, significantly lower than the 9.8 GPU cycles found on Kepler. In theory, we should still be able to leverage the timing leakage as before. However, to our surprise, we could recover key bytes on both the Volta and Turing architectures but failed to recover any key bytes on either Maxwell or Pascal. We summarize the attack results in Table 3.2. We speculate that the non-blocking execution mode completely hides the latency due to Shared Memory bank conflicts, as discussed in Section 3.5.4, where we observed a significant reduction in the signal-to-noise ratio. Many of the intermediate states in the AES implementation (such as register spills) are stored in Global Memory. On Maxwell and Pascal, the Shared Memory and L1 cache no longer share the same physical memory space, so Global Memory and Shared Memory loads/stores can be processed concurrently, which is not the case on Kepler. Since the latency of an off-chip Global Memory access (load/store) is longer than that of an on-chip Shared Memory access, issuing a Global Memory access in parallel with a Shared Memory access completely hides the bank conflict penalty behind the Global Memory latency, regardless of the number of bank conflicts the Shared Memory access induces, as shown in Figure 3.11. On Volta and Turing, however, the Shared Memory and L1 are unified again, similar to Kepler, and therefore we were able to recover key bytes on both architectures.

Figure 3.11: Non-Blocking Mode in Maxwell and Pascal


To test our reasoning, we design several benchmarks that probe the effect of a Global Memory load (L1-cached) on the timing characteristics of a Shared Memory load. We first develop a kernel that launches a warp of threads, selecting the memory addresses so that the warp induces a pre-determined number of bank conflicts. The memory microarchitecture in all of the newer architectures has 32 Shared Memory banks, each 4 bytes wide, so accessing two memory addresses with the same values in address bits 2-7 causes a Shared Memory bank conflict. We provide the C code in Listing 3.6.

Listing 3.6: C Code for Generating Memory Addresses

for (j = 0; j < num_conflicts + 1; j++)
    input[j] = j * 128;
for (; j < warp_size; j++)
    input[j] = j * 4;

The first loop in Listing 3.6 generates memory addresses for some threads in a warp, resulting in num_conflicts bank conflicts. The second loop generates memory addresses for the remaining threads in the warp, which cause no bank conflicts. To determine the Shared Memory bank conflict penalty, we simply time a Shared Memory load instruction that uses this set of memory addresses, as shown in Listing 3.7.

Listing 3.7: Benchmark for Measuring the Bank Conflict Penalty

t0 = clock();
__threadfence_block();
shared_memory_array[input[tid]];
__threadfence_block();
t1 = clock();

shared_memory_array is a 256-entry, one-byte-wide array allocated in the Shared Memory. tid is the thread ID, and each thread uses its ID to specify an index into input. The __threadfence_block() function is a blocking instruction, which prevents threads from proceeding before all prior instructions have completed. We first run these benchmarks on the Pascal architecture and collect 10,000 timing samples for each number of bank conflicts in the range [0, 7]. We show the timing distribution for each bank conflict value in Figure 3.12(a). We include a linear regression line for the number of bank conflicts versus the mean of its timing distribution. As we can see in Figure 3.12(a), the slope of the regression line is

approximately 2 GPU cycles per conflict. We also calculate the correlation coefficient between the number of bank conflicts and the recorded time. The correlation coefficient is 0.95, which indicates a strong linear relationship between the number of bank conflicts and the execution time. However, if we insert a Global Memory load instruction before the Shared Memory load instruction (as shown in Listing 3.8), the linear relationship disappears (the correlation drops to 0.06), as shown in Figure 3.12(b). Although we have added a Global Memory load instruction, the average time of each distribution increases by only 12 GPU cycles.

Listing 3.8: Benchmark for Measuring the Bank Conflict Penalty with a Global Memory Load Inserted Before a Shared Memory Load

t0 = clock();
__threadfence_block();
global_memory_array[tid];
shared_memory_array[input[tid]];
__threadfence_block();
t1 = clock();

Furthermore, the effect is the same when the Global Memory load instruction is inserted after the Shared Memory load instruction, as shown in Figure 3.12(c). The execution time of the Global Memory load instruction dominates the total execution time, so regardless of the instruction order, as long as the two loads are issued in parallel, the execution time of the Global Memory load can completely hide the latency due to Shared Memory bank conflicts. By inserting a __threadfence_block() function call between the Global Memory and the Shared Memory load instructions, we can again reveal the bank conflict penalty, as seen in Figure 3.12(d). With the fence function call, the total execution time is simply the sum of the execution times of the two load instructions. We run these benchmarks on all five architectures: Kepler, Maxwell, Pascal, Volta, and Turing, and summarize the benchmark results in Table 3.2. The results show that a unified L1 and Shared Memory unit is more vulnerable to our attack, while separate L1 and Shared Memory units can hide the timing leakage due to a Shared Memory bank conflict. Our experimental results suggest that we could potentially exploit the non-blocking execution mode to seal the timing leakage in the Shared Memory while improving the overall performance of an application. We have observed that there are at most seven bank conflicts in our AES implementation. As shown in Figure 3.11, by replacing the Global Memory load instruction with three instructions of 5 or more cycles each, such as add or sub instructions, and issuing them together with the leaky Shared Memory load instruction, we could potentially hide the latency due to its bank conflicts.


Architecture   Unified L1 +     # Key Bytes   Penalty per Conflict   G+S Load   S+G Load   G+F+S Load
               Shared Memory    Recovered     (GPU Cycles)           Corr       Corr       Corr
Kepler         Yes              16            9.8                    0.95       0.95       0.94
Maxwell        No               0             2                      0.00       -0.01      0.98
Pascal         No               0             2                      0.06       0.05       0.99
Turing         Yes              16            2                      0.99       0.99       1
Volta          Yes              10            2                      1          1          1

Table 3.2: Benchmarking results. Number of key bytes recovered using 1 million samples.

3.7 Discussions and Countermeasures

Although the penalty for each bank conflict is small, we are still able to exploit this fine-grained timing channel to recover confidential information on the Kepler, Volta, and Turing architectures. We choose to attack the AES encryption algorithm to demonstrate the feasibility and scalability of our attack. The attack should apply to other table-based cryptographic algorithms, or even public-key ciphers, on GPUs. Although a large number of countermeasures have been proposed to prevent timing attacks on CPUs [26, 27, 19, 20, 23, 24, 25, 51, 29, 22, 30], none of them can defend against our attack, because our attack exploits a very different timing channel than the common cache timing channel [6, 33, 68, 32]. More recently, RCoal [28] was proposed to defeat the timing attack that uses the memory coalescing unit on a GPU [75], but that countermeasure addresses the timing leakage only in the memory coalescing unit. Hence, it does not apply to our attack. To thwart the attack described in this chapter, the countermeasure has to specifically target the Shared Memory banks. We could avoid using Shared Memory altogether to prevent the attack, but that would incur high performance degradation for some applications. However, avoiding Shared Memory bank conflicts can both improve an application's performance and mitigate a differential timing attack. This can be done by reducing the Shared Memory data usage such that the size of the data is no larger than 256 bytes (bank size × number of banks, 8 × 32). In this way, no bank conflicts would occur.


[Four-panel plot: execution time in GPU cycles versus number of bank conflicts (0-7). (a) Baseline, correlation coefficient 0.95; (b) Global+Shared, 0.06; (c) Shared+Global, 0.05; (d) Fencing, 0.9.]

Figure 3.12: Timing analysis of the Shared Memory load instruction

For the AES encryption algorithm, we can change the implementation to use the SBox version. This implementation uses only a 256-byte table, spanning all 32 banks, so there will be no bank conflicts among the 32 threads of a warp. However, an SBox-based implementation may not be as efficient as a T-table-based implementation, and it is not compatible with many existing cryptographic software libraries either. The security and performance of this implementation are demonstrated by Lin et al. [69]. As shown in the timing analysis on the Maxwell and Pascal architectures in Section 3.6, we can leverage the non-blocking execution mode to seal the timing leakage in Shared Memory. This is feasible because the penalty for each bank conflict is only 2 GPU cycles. However, it cannot be done easily on the Kepler architecture, as it would require the Kepler to support independent and parallel

instruction execution capable of hiding more than 40 GPU cycles of delay caused by bank conflicts in the Shared Memory. Another implementation technique, called scatter-gather, has been examined by Lin et al. [69] to prevent AES information leakage through the Shared Memory unit; it could also be applied to other table-based cryptographic algorithms. The technique modifies the table lookup procedure such that every Shared Memory access of a thread touches all Shared Memory banks, and therefore every Shared Memory access results in a constant number of bank conflicts. This modification ultimately comes with some performance overhead. In the following, we discuss a multi-key implementation as an alternative approach that can effectively prevent our attack and does not require modifying the original implementation.
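Before turning to the multi-key approach, we sketch the scatter-gather idea just described. This is our own illustrative rendering, not the exact code of Lin et al. [69]: every thread reads one word from each of the 32 banks and keeps, branchlessly, only the word it needs, so the bank conflict count is constant regardless of the secret-dependent index.

__device__ unsigned int sg_lookup(const unsigned int *table, unsigned int index)
{
    unsigned int row  = index & ~31u;  // fix the word row; sweep the bank bits
    unsigned int bank = index & 31u;   // bank = word index mod 32 (4-byte banks)
    unsigned int result = 0;
    for (unsigned int b = 0; b < 32; b++) {
        unsigned int v = table[row | b];                 // every thread hits bank b here
        result |= v & (0u - (unsigned int)(b == bank));  // keep only the needed word
    }
    return result;
}

Each iteration makes all 32 threads of a warp touch the same bank, so the per-lookup conflict count is fixed by construction, which uniformizes the timing at the cost of 32 loads per lookup.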

3.7.1 Multi-Key Implementation As Countermeasure

An alternative approach is to introduce multiple keys on the GPU. Using multiple keys during encryption has been deployed in commercial systems. By recovering one key, the adversary still cannot break the entire encryption system without knowing the other keys. However, leveraging multiple keys during encryption incurs a much higher cost for secure key storage and extra key management (mapping the keys to threads), and thus also incurs performance degradation. So far in all our experiments, all threads in our encryption kernel have used the same key. This assumption significantly simplifies the attack strategy, as the attacker needs to guess only one key byte value to compute the Shared Memory bank conflicts among all 32 threads. In this section, we evaluate the effectiveness of our attack against GPU AES implementations when multiple keys are used. Specifically, we evaluate our attack when two, three, and four different keys are used in a block of threads. We consider two scenarios. In the first scenario, the attacker knows the mapping between encryption keys and GPU threads, while in the second scenario, the attacker does not know the key mapping. Under both scenarios, we demonstrate that an implementation using a sufficient number of different keys is secure against our attack. All experiments are run on the Nvidia Kepler K40 GPU.

3.7.1.1 Encryption Key Mapping

Key mapping describes the way multiple keys are distributed to different blocks of input data. In our AES implementations, each GPU thread performs encryption on one block of data, and the block-to-GPU-thread mapping is fixed and known. Thus, our job is to associate keys with threads.


Note that we are working with symmetric block ciphers (e.g., AES), so the same key must be used to decrypt the ciphertext on the receiver side, which increases the complexity of key management between the sender and receiver. To map a key to a thread, we take the thread ID (within the block of threads) modulo the total number of keys, and use the result as an index into an array of keys. For example, when the implementation uses two different keys, threads with an even ID are assigned the first key and threads with an odd ID are assigned the second key.
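A minimal sketch of this modulo mapping follows; the names (aes_round_keys, key_bytes) and the flat key-array layout are our own assumptions for illustration.

// Each thread selects its key by thread ID modulo the number of keys.
__device__ const unsigned char *select_key(const unsigned char *aes_round_keys,
                                           int num_keys, int key_bytes)
{
    int tid = threadIdx.x;                       // thread ID within the block
    int key_idx = tid % num_keys;                // modulo key mapping
    return &aes_round_keys[key_idx * key_bytes]; // this thread's key material
}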

3.7.1.2 Unknown Key Mapping

In these experiments, we assume the adversary knows neither the key mapping nor the number of different keys used during encryption. He/she can only apply the attack methodology as before, as if all threads were using the same key. The result is shown in Figure 3.13.


Figure 3.13: 15th key byte recovery using 10 million samples with GPU timing information.

From Figure 3.13, we can see that, when two different keys are used, without modifying

the attack methodology, these two key values are both recovered in the 15th key byte. However, the DoMs (differences of means) for both key values are reduced from 17 GPU cycles (in the original single-key implementation) to 3 GPU cycles. When attacking the 2-key implementation, we assume all of the threads are using the same key. Thus, we use the wrong key value to compute the accessed bank indices for 16 out of the 32 threads. When the number of bank conflicts is computed using the 16 correctly-computed bank indices, it contributes to finding the correct key value; otherwise, the incorrect values just contribute more noise. Thus, we see a significant drop in the DoMs when attacking the 2-key implementation. As the total number of keys used in the implementation increases, the noise increases and the DoM value for the true key value decreases. When attacking the 4-key implementation, we cannot recover any key with 10 million samples, as shown in Figure 3.13. We can always increase the number of samples to compensate for the noise, but it becomes impossible to attack the 32-key implementation. Recall that the number of bank conflicts falls into the range [2, 4]. To correctly compute two bank conflicts, we need to know the memory accesses of at least three threads. However, in the 32-key implementation, each thread has its own key, and we can only obtain those memory accesses by correctly guessing the three key values assigned to those threads. By assuming all threads are using the same key, we cannot obtain the correct memory accesses. Therefore, we cannot correctly compute the number of bank conflicts in the 32-key implementation.

3.7.1.3 Known Key Mapping

In this experiment, we assume the adversary knows the total number of keys deployed in the implementation as well as the key mapping. This assumption allows us to guess multiple key byte values instead of one, so we can correctly compute the bank conflicts among all 32 threads, although the attack complexity increases as we have to guess multiple bytes. When attacking a 2-key implementation, we are able to retrieve the two key values with a DoM value of 16 GPU cycles using 1 million samples. The result is what we expected, since guessing two keys at the same time allows us to compute the correct number of bank conflicts, just as if we were attacking the 1-key implementation. However, guessing multiple key bytes at the same time increases the computational complexity exponentially. The complexity of attacking a 2-key implementation is 2^16. This approach becomes computationally infeasible once the number of different keys used in the implementation reaches 5. As we demonstrate in the two scenarios, when 32 threads are using 32 different keys, it becomes impossible for our attack to succeed. Thus, deploying multiple different keys in encryption

can help to prevent side-channel timing leakage.
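For reference, the joint-guess cost quoted above follows directly from guessing one byte per key: with k distinct keys, the attacker must test all combinations of k byte values, so (in a sketch of the arithmetic)

\mathrm{complexity}(k) = (2^{8})^{k} = 2^{8k},
\qquad k=2 \Rightarrow 2^{16}, \quad k=4 \Rightarrow 2^{32}, \quad k=5 \Rightarrow 2^{40}.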

3.7.1.4 Multi-Key Performance

While we show that multi-key implementations can resist our attack, they come with a performance penalty. When all threads are using a single key, the key can be loaded from memory in a single request and broadcast to all threads. When multiple keys are used, multiple memory requests are needed to load these keys. Thus, multi-key implementations incur some performance degradation. We evaluate the performance overhead of implementations varying the number of keys from 1 to 32. We use the performance of our 1-key implementation as the baseline and compare the others to it. The result is shown in Figure 3.14.

[Plot: performance degradation relative to the 1-key implementation (percent, 0-60) versus the number of different keys used (0-30).]

Figure 3.14: Performance degradation when using multiple keys.

As the number of keys increases, the performance decreases. For the 32-key implementation, we see a performance degradation of 58% on the Kepler architecture and 48% on the Volta

architecture. Although the multi-key implementation can be used to prevent our attack, we also need to consider the resulting performance impact.

3.8 Summary

In this chapter, we explore a new class of microarchitectural side-channel vulnerabilities in GPUs. Shared Memory banking is used to improve the performance of memory accesses, but this same feature introduces a timing channel. We have developed a novel differential timing attack to exploit bank conflict-based timing channels, and for evaluation, we successfully recovered the encryption key of several GPU AES implementations. We anticipate our attack is applicable to other table-based cryptographic algorithms, such as Blowfish. We investigate more realistic attack scenarios and quantify the effectiveness of our attack, considering attacks on Nvidia Kepler, Maxwell, Pascal, Volta, and Turing devices. We also evaluate the effectiveness of multi-key implementations as a countermeasure and their associated execution time overhead.

Chapter 4

Information Leakage in L1 Cache Banks

4.1 Introduction

In Chapter 3, we found information leakage in the Shared Memory unit due to bank conflicts. In this chapter, we exploit similar information leakage in the L1 cache on CPU platforms. We demonstrate a new cache bank timing attack against cryptographic software running on processors that have on-chip cache banks. As an example, we successfully attack 128-bit AES encryption by exploiting a very fine-grained L1 cache bank timing channel. Cache banks were introduced to address the high bandwidth demand for accessing the L1 cache in superscalar processors, as well as to reduce power consumption. They have also been adopted in embedded processors for mobile devices and in general-purpose GPUs. Rather than being a monolithic microarchitectural module, the L1 cache is composed of multiple cache banks, which allow multiple concurrent accesses to different banks at one time. This microarchitectural support for parallel cache accesses improves performance significantly. However, when two or more accesses target the same bank, a bank conflict arises and the accesses are processed in a serialized manner. The subtle timing difference between parallel and serial cache bank accesses can be exploited to leak sensitive information. Bernstein et al. [31] first mentioned the potential information leakage through the cache bank timing channel, but without a real exploit. Only recently were Yarom et al. [34] able to recover partial bits of each multiplier in a scatter-gather implementation of 4096-bit RSA encryption via the cache bank timing channel. However, due to the high amount of noise present in this timing channel, Yarom et al. rely on the Flush+Reload attack for aligning timing traces and synchronizing the trace collection procedure. In addition, they collect 1,000 timing traces of the same-ciphertext


RSA decryptions to average out the noise, which increases the attack complexity considerably. Unlike the public-key cipher RSA, symmetric block ciphers such as AES are very fast, which makes it challenging to synchronize the victim and spy processes. Relying on the Flush+Reload attack to capture timing traces of an AES encryption run seems impossible. We therefore design a timing trace capturing method very different from that of CacheBleed [34]. We make the spy process long enough that its span completely covers one AES run, and we only measure the execution time of the spy to infer cache bank conflicts. The chapter is organized as follows. In Section 4.2, we provide background on AES encryption and Intel cache architectures, as well as discuss existing related work. In Section 4.3, we describe the attack scenario and propose several new key recovery methods. In Section 4.4, we discuss countermeasures against known cache side-channel attacks and mitigations against our attacks. Finally, we summarize in Section 4.5.

4.2 Background

4.2.1 AES Encryption

Similar to the other two attacks presented in this dissertation, in this chapter we take 128-bit ECB-mode AES encryption as an example. We use the implementation from the latest version of OpenSSL, 1.1.0-pre6, operating on blocks of 16-byte data using a 16-byte secret key. The AES algorithm consists of ten rounds of SubByte, ShiftRows, MixColumn, and AddRoundKey operations, with the last round omitting the MixColumn operation. The OpenSSL implementation is based on four T-tables, where the first three operations are integrated into table lookups, followed by the AddRoundKey operation. Starting from the secret key, ten round keys are generated by the key scheduling process. Knowing any round key, one can compute the original 16-byte secret key. Thus, our attack targets retrieving the last-round key. The code snippet of the last-round operations generating the first four bytes of ciphertext is shown in Listing 4.1.

Listing 4.1: AES Last Round Code Snippet

s0 =
    (Te2[(t0 >> 24)       ] & 0xff000000) ^
    (Te3[(t1 >> 16) & 0xff] & 0x00ff0000) ^
    (Te0[(t2 >>  8) & 0xff] & 0x0000ff00) ^
    (Te1[(t3      ) & 0xff] & 0x000000ff) ^
    rk0;

The vector variable s0 holds the first four bytes of the 16-byte ciphertext. Each of the variables t0, t1, t2, and t3 contains 4 bytes of the output state after the first nine rounds of iterations. A particular byte of each variable is used to look up a corresponding T-table and obtain a four-byte output, of which only one byte contributes to the final ciphertext, because there is no MixColumn operation in the last round. rk0 is the first four bytes of the last-round key. From the original algorithm, the last-round operation can be simplified as follows:

c_j = SBox[s_i^9] ⊕ rk_j    (4.1)

where the input byte position (i) for the SBox lookup operation differs from the output ciphertext byte position (j), due to the ShiftRows operation. Each byte of the 9th-round output state, s_i^9, is the index for the T-table lookup, and can be calculated once the corresponding ciphertext and key bytes are known:

s_i^9 = SBox^{-1}[c_j ⊕ rk_j]    (4.2)
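Equation 4.2 is the workhorse of the key recovery that follows: given an observed ciphertext byte and a guess for the corresponding last-round key byte, it yields the table index the encryption must have used. A minimal sketch in C (inv_sbox denotes the standard AES inverse SBox, assumed available):

/* Sketch of Equation (4.2): recover the 9th-round state byte (the
 * T-table lookup index) from a ciphertext byte and a last-round
 * key-byte guess. */
extern const unsigned char inv_sbox[256];

unsigned char ninth_round_index(unsigned char c_j, unsigned char rk_guess)
{
    return inv_sbox[c_j ^ rk_guess];   /* s_i^9 = SBox^{-1}[c_j XOR rk_j] */
}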

4.2.2 Intel Cache Architecture

On-chip cache memory is widely used to keep up with the CPU computation speed. It exploits both spatial and temporal locality in program code and data to deliver high throughput. Modern CPUs also feature multiple levels of caches to balance storage size and access latency. On our target platform, an Intel Sandy Bridge CPU, the highest-level L1 cache is the fastest, closest to the core, and also the smallest (64 KB). Both the L1 and L2 caches are private to each core. The lowest-level L3 cache has 8 MB of storage and is shared among all cores. Each cache is organized into multiple addressable sets, with each set consisting of multiple cache lines (called ways) for associativity. The smallest unit in the cache is one fixed-size cache line of 64 bytes, which corresponds to one 64-byte-aligned data block in main memory. The ways within a cache set are not individually addressed, i.e., the set is fully associative. The cache set index consists of partial (lower) bits of the physical memory block address, with the remaining upper bits serving as the tag. During a memory access, the execution core tries to locate the needed cache line in the L1 cache. If the cache line is found there (an L1 cache hit), the data is fetched and the memory request is served. Otherwise (an L1 cache miss), the request is forwarded to the next lower level, L2, L3, or main memory, until the needed cache line is found. A cache miss at a different level of the cache hierarchy introduces a different latency, for example, 4 CPU cycles for an L1 cache miss (L3 cache hit) and


200 CPU cycles for an L3 cache miss (memory access). Thus, by measuring the time of a memory access instruction, we can determine whether the needed cache line is in the cache hierarchy and at which level. When a new cache line is brought from main memory or a lower-level cache to a higher-level cache, the destination cache has to evict one cache line from the cache set to accommodate the new cache line, according to a certain replacement policy. By manipulating the eviction behavior in the memory hierarchy, attackers can control the cache state. Before we dive into discussing existing cache timing attacks, we need to understand how T-tables are stored in the cache. On our target platform, the size of a cache line is 64 bytes. Each T-table element is 4 bytes, so each cache line can hold 16 T-table elements. There are 256 elements in each T-table, so each T-table is stored in 16 cache lines. To maximize spatial locality, these cache lines are stored in consecutive cache sets, as shown in Figure 4.1. Note that in each cache set, only one way is occupied by T-table entries.

Set 0   T0[0-15]      Set 16  T1[0-15]      Set 32  T2[0-15]      Set 48  T3[0-15]
Set 1   T0[16-31]     Set 17  T1[16-31]     Set 33  T2[16-31]     Set 49  T3[16-31]
...                   ...                   ...                   ...
Set 15  T0[240-255]   Set 31  T1[240-255]   Set 47  T2[240-255]   Set 63  T3[240-255]

Figure 4.1: T-tables Cache Layout

The goal of many cache timing attacks is to identify the cache set indices that AES used in its last-round table lookup operations. Once a cache set index is identified, the attacker can combine it with the corresponding ciphertext to recover the last-round key using Equation 4.1.

4.2.3 Cache Timing Attacks

In this section, we review related cache timing attacks that exploit the time variation due to resource contention or reuse [20] in the cache hierarchy. Prime+Probe Attacks: This attack exploits resource contention. Its goal is to identify whether a cache set has been used during the execution of the victim process. It involves two processes: victim and spy. The spy process first primes the entire cache by accessing its own dummy data (to fill all the ways in each cache set). Then it allows the victim process to run. After the victim process is

done, the spy probes (accesses and measures the time of) certain primed cache lines. If data used by the victim maps to the monitored cache set, one of the previously primed cache lines will have been evicted, and therefore the probing time will be longer than when the victim did not use any data that maps to the monitored cache set. Thus, the attack can reveal the cache set index being used by the victim. The attack has been demonstrated in both the L1 [5] and L3 caches [70] to successfully recover the AES encryption key. Cache Collision Attack [35]: This attack exploits resource reuse within the cryptographic process. The attacker only needs to record each encryption time and its associated ciphertext. If c_i ⊕ c_j == k_i ⊕ k_j, there is a cache hit for the later table lookup and therefore the execution time is shorter. By building a histogram of the collected timing samples based on the value of c_i ⊕ c_j, this attack can reveal the value of k_i ⊕ k_j.
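As a concrete illustration of the cache collision statistic (our own sketch, with hypothetical array names): bucket the measured encryption times by c_i ⊕ c_j for one byte pair and look for the bucket with the lowest mean, which indicates a collision and hence reveals k_i ⊕ k_j.

/* Sketch: average encryption time bucketed by c_i XOR c_j for a fixed
 * byte pair (i, j); cts[n] is the 16-byte ciphertext and times[n] the
 * measured time of the n-th encryption. */
void collision_profile(const unsigned char (*cts)[16], const double *times,
                       long n_samples, int i, int j, double mean_out[256])
{
    double sum[256] = {0};
    long   cnt[256] = {0};
    for (long n = 0; n < n_samples; n++) {
        int d = cts[n][i] ^ cts[n][j];   /* collision when d == k_i ^ k_j */
        sum[d] += times[n];
        cnt[d]++;
    }
    for (int d = 0; d < 256; d++)        /* lowest mean -> k_i ^ k_j candidate */
        mean_out[d] = cnt[d] ? sum[d] / cnt[d] : 0.0;
}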

4.2.4 Countermeasures against Cache Timing Attacks

To prevent cache timing attacks against AES encryption, Intel added the AES-NI extension to the x86 instruction set architecture, which provides hardware-assisted AES encryption and eliminates key-dependent memory accesses. However, other table-based algorithms, and microarchitectures without AES-NI (e.g., Nehalem), are still vulnerable to the attacks described above. Many countermeasures against cache timing attacks have been proposed in recent years. These countermeasures mainly implement one of three principles: cache partitioning, cache line pinning, and memory-to-cache mapping randomization. Partitioning & Pinning: The idea of cache partitioning and line pinning is to eliminate the cache contention effect by dedicating a small part of the cache to one security domain and restricting state manipulation of those resources from other security domains. There are system-level [33, 71] and architecture-level [19] approaches that completely prevent contention-based cache timing attacks, though they cannot prevent cache collision attacks. Randomization: Randomizing the mapping from memory address to cache set index can also prevent contention-based cache timing attacks and mitigate reuse-based ones. The randomization introduces uncertainty into the attacker's observations. Architectural countermeasures such as the prior work [20, 50] have been demonstrated to completely prevent contention-based cache timing attacks and significantly mitigate reuse-based ones. However, as we will show in the following sections, timing attacks that exploit cache bank conflicts are orthogonal to the above-mentioned existing countermeasures and still apply

effectively to the AES encryption, presenting a new realistic threat.

4.2.5 L1 Cache Bank and CacheBleed Attack

Modern CPUs further divide the L1 cache into cache banks to increase the cache bandwidth, in order to support high-performance features such as Hyper-Threading and out-of-order execution. In Intel L1 cache banks, each cache line is divided into multiple fixed, equal-size parts, distributed among the banks. Each cache bank can serve one memory request at a time. When two memory requests are issued to the same bank, one of them is served first and the other is stalled, which is called a cache bank conflict [72]. There is a noticeable timing difference when a memory access results in a cache bank conflict with another concurrent memory access. Intel does not disclose the number of cache banks in the Sandy Bridge architecture. However, the Intel manual [72] states that addresses differing in bits 2-5 do not experience a cache bank conflict, which suggests that these bits are used to index the cache banks. Through our experiments, we hypothesize that there are 16 cache banks in the Sandy Bridge architecture, and our attack assumes this. T-table Cache Bank Layout. Each entry in a cache bank is 4 bytes, which is equal to the size of one T-table entry. Thus, the content of one cache line (16 consecutive T-table entries) is distributed across the 16 cache banks, and each cache bank holds T-table entries with a stride of 16, as shown in Figure 4.2. The cache bank index comes from the original cache line offset, independent of the cache set index bits.
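Under this 16-bank hypothesis, the bank index is simply address bits 2-5; equivalently, for 4-byte T-table entries, it is the table index modulo 16. An illustrative helper (our own, not from the attack code):

/* Bank index under the assumed Sandy Bridge layout: 16 banks of 4 bytes,
 * selected by address bits 2-5. For a 4-byte-entry T-table this equals
 * (table index mod 16). */
#define CACHE_BANK(addr)    ((((unsigned long)(addr)) >> 2) & 0xF)
#define TABLE_ENTRY_BANK(i) ((i) & 0xF)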

         Bank 0    Bank 1    Bank 2    ...   Bank 15
Set 0    T0[0]     T0[1]     T0[2]     ...   T0[15]
Set 1    T0[16]    T0[17]    T0[18]    ...   T0[31]
...      ...       ...       ...       ...   ...
Set 63   T3[240]   T3[241]   T3[242]   ...   T3[255]

Figure 4.2: T-tables Cache Bank Layout


The goal of our attack is to reveal the cache bank index used in the table lookup operations of the last round. Each cache bank contains entries from all four T-tables. This presents a challenge for the attack: when the adversary monitors one cache bank (a column in Figure 4.2), he cannot attribute accesses to the monitored cache bank to particular operations, as each cache bank contains entries from all the tables. In previous cache timing attacks, the adversary monitors one cache set (a row), and can attribute the accesses to a few lookup operations, because each cache set belongs to only one specific table. As we will show later, we can still effectively retrieve the sensitive key byte value from the identified cache bank index. CacheBleed Attack: Yarom et al.'s recent CacheBleed attack [34] demonstrates the feasibility of exploiting cache bank conflicts and launches a successful attack on the OpenSSL implementation of RSA. In their attack, the attacker measures the execution time of accessing all the entries of a selected cache bank. If the victim process accesses the same cache bank, the attacker will notice some delay in the measured execution time. Since each 4096-bit RSA decryption is very long (it can take 10 ms), the adversary can continuously collect a sequence of timing samples to capture the cache bank access pattern of the decryption. Due to the high noise present in this timing channel, the authors collect 1,000 runs of each decryption and average them, with the aid of the Flush+Reload method to align each sequence. Challenges for Applying CacheBleed to AES. Unlike an RSA decryption, a 128-bit AES encryption is very fast (around 500 CPU cycles). Taking a sequence of timing samples to capture the cache bank access pattern of the encryption becomes almost infeasible. Moreover, the Flush+Reload attack for aligning each sequence of timing samples requires the assumption that the OpenSSL library is shared between the victim and spy. If countermeasures are applied, Flush+Reload no longer works, and neither does the attack built on top of it. We design our attack on AES quite differently from CacheBleed: it is simple and does not require the Flush+Reload technique. The spy process runs much longer than the encryption, such that the execution time of the spy reflects a penalty that is correlated with the number of cache bank conflicts due to the encryption. Thus, the execution time of the spy process tends to be lower when the table lookup operations in the last round do not use any entry in the monitored bank than when they do.


4.3 Cache Bank Timing

In this section, we will first discuss our threat model, and then analyze the cache bank timing channel, followed by three attack methods on AES encryption.

4.3.1 Threat Model

We assume both the spy and victim processes are running on the same machine. The attacker is allowed to run multiple instances of the spy process, each on a different physical core, so that all cores are covered by the attacker. Although we have no control over which core the victim process runs on, with one spy process monitoring each core, the victim process will be running on the same core as one spy process, but in a different hardware thread. In other words, these two processes share the same L1 cache. We also assume no library is shared between the spy and victim processes, and that the attacker can see every ciphertext produced by the victim process.

4.3.2 The Cache Bank Timing Channel

We first develop some test programs to analyze the cache bank timing channel. We write the spy process code similar to the CacheBleed attack [34], shown in Listing 4.2. The spy process accesses the 64 entries of one cache bank in sequence. The stride of the address offset is set so that each memory read instruction accesses a different cache set. This is different from the traditional Prime+Probe attack, where the spy process primes all the ways in one cache set; here only one way of each cache set in the L1 cache (one bank) is accessed by the spy. Because the time for accessing 64 entries is much shorter than one encryption, we repeat this code 20 times to prolong the spy process. The two rdtscp instructions are used to measure the execution time.

Listing 4.2: Spy Code

rdtscp
...
addl 0x000(%r8), %eax
addl 0x040(%r8), %ecx
...
addl 0xf80(%r8), %edx
addl 0xfc0(%r8), %edi
...
rdtscp

We run the same spy code on our target platform under four scenarios, where the victim process varies per scenario. The distribution of the measured spy process time under each scenario is shown in Figure 4.3, using 1 million timing samples.

Idle. Under this scenario, only the spy process is running on the core, without any victim process. The timing measurement shows the execution time distribution of the spy process without interference from other processes.

Pure Computations. Under this scenario, both the spy and victim processes are running on the same core but in two different hardware threads. The victim process performs computation involving only registers, with no memory accesses. The timing distribution moves slightly to the right of the Idle one, which shows the penalty due to some on-core execution units being shared between two hardware threads.

Encryption. The victim process runs regular AES encryption: generating plaintexts, performing encryptions, and saving the ciphertexts to a file. We see the timing distribution shift further to the right.

Pure Conflicts. Under this scenario, the victim process runs the same spy code, so the spy process experiences a cache bank conflict on every memory access. Therefore, the mean of the timing distribution is twice the mean of the Idle one. The variance of the distribution is also much higher, showing more uncertainty in cache bank conflicts. This timing distribution provides the upper bound for the timing measurements.

Figure 4.3 shows two clearly distinguishable timing distributions when the spy process runs alongside an encryption process versus when the victim process is idle or performing pure computations. In addition, from the Idle distribution and the Pure Conflicts distribution, we can see the strength of the observable timing channel is roughly 1 cycle per conflict (a 1,200-cycle difference for 1,280 cache bank conflicts). Thus, even though the attacker does not know which core the victim is running on, one of the spy processes will capture the cache bank conflicts due to the encryption in its timing measurements.

4.3.3 Attacking AES Encryption

In this section, we proceed to attack the AES encryption while the spy process monitors cache bank index 8. Recall from Section 4.2.5 that each cache line is divided into multiple parts, so the entries of all T-tables are distributed over the 16 cache banks in sequential order. Each bank contains 64 cache sets, covering a portion of every T-table.


Figure 4.3: Timing distributions of the spy process under the four scenarios

Since the spy process only monitors one cache bank, when the victim accesses any of the 64 T-table entries mapped to the monitored bank, the execution time of the spy process is extended by one cache bank conflict. Timing Channel Visibility. To better understand the visibility of the bank access pattern of the encryption in the spy timing measurement (the timing channel), we examine the relationship between the number of accesses by the victim to a cache bank and the spy process timing measurement. We extract all indices used in the 160 T-table lookup operations and convert them into cache bank access indices. Thus, we can calculate the frequency of each cache bank access. We average the timing measurements that have the same number of accesses to a bank, and we do this for all banks. The result is shown in Figure 4.4. For the non-monitored banks, the average time stays roughly the same as the number of accesses to those banks by the victim process increases. This indicates that even though the victim process accesses those banks, the accesses happen in parallel with the spy process accessing another bank, i.e., there is no cache bank conflict. However, for the monitored cache bank (index 8 here), as the number of accesses to it by the victim process increases, the average time of the spy process increases. The result shows that the timing measurement of the spy process is correlated with the number of cache bank conflicts caused by the encryption.


[Plot: average spy time (CPU cycles, ~1328-1340) versus number of cache bank accesses (9-18), for the monitored bank and the non-monitored banks.]

Figure 4.4: Average encryption time distribution of each bank

[Plot: PolyFit slope (CPU cycles, -0.4 to 0.8) versus bank index (0-15).]

Figure 4.5: PolyFit slopes for each bank


To quantify the visibility of the timing channel, we perform a polynomial fit (PolyFit) on our timing measurements for each bank and show the resulting slopes in Figure 4.5. The figure indicates that, on average, we see a penalty of approximately 0.6 CPU cycles per cache bank conflict due to the encryption.

4.3.3.1 Key Recovery - Difference of Mean Method

Since the timing measurement is correlated with the number of cache bank conflicts due to the encryption, the most effective method would be to predict the number of accesses to the monitored cache bank by the victim process under each key guess and calculate the correlation between the predictions and the timing measurements. However, with the complex confusion and diffusion properties of AES, it is impossible to predict the total number of memory accesses during the entire encryption without knowing all 16 key bytes. Therefore, we simply target the last round and treat activity in other rounds as noise. In the last round, there are 16 lookup operations on the four T-tables. For each individual table lookup operation, if it accesses the cache bank being monitored by the spy process (called an access), there is a bank conflict and the spy process execution time may be longer. Otherwise (a non-access), the spy process execution time is shorter. This is the principle of the simplest attack method, difference-of-mean. We attack the key bytes one by one by attacking each last-round table lookup operation. For a selected operation, we calculate the accessed cache bank index under a key guess (inv_sbox[K_guess^i ⊕ C^i] mod 16). Based on the bank index, we classify each timing sample into the access bin or the non-access bin. The value of K_guess^i that produces the largest difference between the average time of the access bin and that of the non-access bin is taken as the correct key byte value. However, the timing channel is very weak, i.e., the cache bank conflict penalty is only 0.6 CPU cycles, so a large number of samples is required to combat the noise. In our experiments, we can recover only some of the key bytes with more than 100 million samples.
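A minimal sketch of this difference-of-mean test for one key-byte guess follows; the array names and the guard against empty bins are our own, while the monitored bank index 8 and the index-mod-16 bank mapping come from the discussion above.

/* Difference of means for one key-byte guess: classify each timing
 * sample by whether the implied last-round table index falls in the
 * monitored bank (index 8), then subtract the bin averages. */
extern const unsigned char inv_sbox[256];

double dom_for_guess(const unsigned char *c_bytes, const double *times,
                     long n, unsigned char k_guess)
{
    double sum_acc = 0.0, sum_non = 0.0;
    long   n_acc = 0, n_non = 0;
    for (long s = 0; s < n; s++) {
        int bank = inv_sbox[c_bytes[s] ^ k_guess] & 0xF; /* index mod 16 */
        if (bank == 8) { sum_acc += times[s]; n_acc++; } /* access bin   */
        else           { sum_non += times[s]; n_non++; } /* non-access   */
    }
    if (n_acc == 0 || n_non == 0)
        return 0.0;
    return sum_acc / n_acc - sum_non / n_non; /* largest DoM wins */
}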

4.3.3.2 Key Recovery - Frequency Method

The problem with the difference-of-mean method is that the visibility of our timing channel is too small (the signal level of the timing side channel is low), and the difference in the average number of cache bank conflicts between the access and non-access bins is only one. We design an attack method, called the frequency method, to improve the effectiveness of the attack by increasing the average number of bank conflicts separating the access bin from the non-access bin.


Without loss of generality, we show the method for recovering the 16th key byte. Figure 4.6 shows the timing distribution of the spy process while the victim is encrypting different messages. We can envision it as a combination of many small timing distributions: one for no cache bank conflict, one for one conflict, one for two conflicts, and so on. For the attack, we can further envision it as consisting of two distributions, one for non-access (no cache bank conflict on the monitored bank) and the other for access (one or more bank conflicts happened). We would like to use the timing samples of the non-access distribution for the attack. Since there is no way to know the exact distributions, we can only use a threshold to divide the samples. Based on theoretical analysis and experiments, we split the timing samples at the 30% quantile into two portions, shown as the vertical line in Figure 4.6. We perform two processing steps on the lower portion. First, we identify a set of candidate cipher byte values that correspond to accesses to the monitored cache bank. Second, we use the candidate cipher byte values to retrieve the correct key byte value. Step 1 - Recover the monitored cipher values: The monitored cipher values are equivalent to the monitored table indices. As shown in Figure 4.2, the monitored cache bank 8 contains

T-table indices {8, 24, 40, ..., 248} of each table. Because c_i = SBox[s_j] ⊕ k_i, where s_j is the table index and k_i is a constant key byte, we can convert the monitored table indices to the monitored cipher values for the ith cipher byte. For example, for the 16th cipher byte, the monitored cipher values are {1, 7, 45, ..., 241, 245} when the true key byte value is 197. As we are using the lower portion of the timing distribution for the attack (the majority of which should be non-access samples), the monitored ciphertext byte values should appear less frequently there. We classify all the timing samples into 256 bins according to the value of the 16th cipher byte and count the number of timing samples in each bin. The result is shown in Figure 4.7 (the possible values of one byte are 0 to 255). We circle the true monitored cipher values for the 16th byte in the figure. Among the 16 cipher values with the lowest frequency, 15 are monitored cipher values. The reason we can identify the monitored cipher values is that there is a linear relationship between the number of times the victim accesses the monitored cache bank and the spy process timing measurement. By splitting the timing distribution at the 0.3 quantile, each encryption in the lower portion has at least 19 out of the total 160 table lookups that do not access the monitored cache bank, since the probability that 19 lookups all miss the monitored cache bank 8 is (15/16)^19 = 0.29. Across the 160 table lookups of an encryption, there are 10 accesses to each cache bank on average; by splitting the timing distribution at the 30% quantile, encryptions in the lower portion contain 9 fewer cache bank accesses to the monitored cache bank than this average. Thus, the frequency of the monitored cipher values is lower than that of other cipher values in the biased lower portion of the timing distribution.


[Histogram: frequency (×10^5, 0-2) versus spy process timing (CPU cycles, ~1260-1460), with the 0.3 quantile marked by a vertical line.]

Figure 4.6: The spy process timing distribution

[Plot: frequency (~800-1050) versus cipher byte value (0-255).]

Figure 4.7: Recovering monitored cipher values for the 16th byte

In fact, splitting the timing distribution at different quantiles yields results of different quality. It may seem that splitting at a smaller quantile would be better, but the number of samples in the lower portion also decreases as the splitting quantile becomes smaller. We perform experiments to measure the quality of the result using different splitting quantiles, and the result is shown in Figure 4.8. We define the quality of the result as the number of monitored cipher values that are among the 16 cipher values with the lowest frequency.
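Step 1 amounts to a simple conditional histogram; the sketch below is our own illustration, with the 0.3-quantile threshold assumed to be precomputed from the collected spy times.

/* Step 1 sketch: histogram of the 16th ciphertext byte over samples in
 * the lower portion of the timing distribution. The 16 least frequent
 * values become the candidate monitored cipher values. */
void step1_histogram(const unsigned char *c16, const double *times,
                     long n, double threshold /* 0.3 quantile */,
                     long hist[256])
{
    for (int v = 0; v < 256; v++)
        hist[v] = 0;
    for (long s = 0; s < n; s++)
        if (times[s] < threshold)  /* keep only the lower portion */
            hist[c16[s]]++;        /* bin by the cipher byte value */
}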

[Plot: number of monitored cipher values among the lowest 16 (1-7) versus splitting quantile (0-0.8).]

Figure 4.8: Number of monitored cipher values among the 16 lowest-frequency cipher values

As shown in Figure 4.8, the quality of the result is best in the middle range of the splitting quantile, and decreases toward both ends of the range [0, 1]. Step 2 - Recover the key byte value: The attacker simply uses the 16 cipher values with the lowest frequency, as shown in Figure 4.7. For each of those 16 cipher values, we generate a set of 16 key candidates based on k_i = SBox(s_j) ⊕ c_i and the known monitored table indices (s_j). If a cipher value is one of the true monitored cipher values, its set must contain the true key value. In the end, we have 16 sets of key candidates, each containing 16 key candidates. If those 16 cipher values were all true monitored cipher values, the true key value would

appear in every set. Thus, the true key value is the most frequently appearing one across those sets. We show the key recovery result for the 16th key byte in Figure 4.9, where the true key value appears 13 times. We apply this method and successfully recover all key bytes using 1 million samples. The robustness of this attack method lies in the fact that we do not need to recover all 16 monitored cipher values in order to recover the key: 5 true monitored cipher values are sufficient to recover the true key byte value.
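Step 2 is a straightforward voting procedure; the following is a hedged sketch (sbox denotes the standard AES SBox, and the 16 candidate cipher values come from Step 1).

/* Step 2 sketch: each candidate cipher value votes for the 16 key
 * candidates k = SBox[s] ^ c over the monitored table indices
 * s in {8, 24, ..., 248}; the most-voted key value is returned. */
extern const unsigned char sbox[256];

unsigned char step2_vote(const unsigned char cand_ciphers[16])
{
    int votes[256] = {0};
    for (int i = 0; i < 16; i++)
        for (int s = 8; s < 256; s += 16)      /* monitored table indices */
            votes[sbox[s] ^ cand_ciphers[i]]++;
    int best = 0;
    for (int k = 1; k < 256; k++)
        if (votes[k] > votes[best])
            best = k;
    return (unsigned char)best;
}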

[Plot: frequency (0-14) versus key value (0-255).]

Figure 4.9: 16th key byte recovery

Success Rate: We evaluate the effectiveness of our attack by computing its success rate in recovering one key byte given a number of samples. The result is shown in Figure 4.10, labeled Method 1. Each success rate is an average over 1,000 attack runs. With 150,000 samples, we reach approximately a 50% success rate, and with 500,000 samples, we easily achieve a 100% success rate.

4.3.3.3 Key Recovery - Non-Access Statistic

Although intuitive, the previous attack method requires two steps, which may introduce noise and make the attack less effective. In this section, we propose another attack that uses a statistic in just a single step, and we compare the two attacks.


[Plot: success rate (0-1) versus number of samples (0-10 ×10^5), for Method 1 and Method 2.]

Figure 4.10: 16th key byte success rate

Similar to the frequency method, we split the timing distribution at the 30% quantile and use the samples in the lower portion. We target the 16th byte and use a key guess to compute the corresponding last-round table lookup index, based on s_4^9 = inv_sbox[c_16 ⊕ k_16^guess], where s_4^9 is the table lookup index. If the table lookup index is among the monitored table indices, we increment the counter for that key guess by 1. We perform the same procedure for all possible key values (256 in total) for each encryption. Since the monitored cipher values appear less frequently than other cipher values, the counter for the correct key value should be the lowest one after we process all encryptions in the lower portion of the timing distribution. The result is shown in Figure 4.11. With 500,000 samples, we can clearly see the correct key value for the 16th key byte. We apply the same method to all other key bytes, and the result is shown in Figure 4.12. Success Rate: We also calculate the success rate for this attack and show it (Method 2) in Figure 4.10. We achieve a 50% success rate using 75,000 samples and a 100% success rate using 200,000 samples. The performance of this attack is twice as good as that of the frequency method.
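The one-step statistic can be sketched as follows (our own illustration; the monitored bank is index 8, so the monitored table indices are those congruent to 8 mod 16).

/* Non-access statistic sketch: for each lower-portion sample, every key
 * guess whose implied table index lands in the monitored bank gets its
 * counter incremented; the guess with the lowest counter wins. */
extern const unsigned char inv_sbox[256];

unsigned char non_access_attack(const unsigned char *c16, const double *times,
                                long n, double threshold /* 0.3 quantile */)
{
    long counter[256] = {0};
    for (long s = 0; s < n; s++) {
        if (times[s] >= threshold)
            continue;                              /* lower portion only */
        for (int k = 0; k < 256; k++)
            if ((inv_sbox[c16[s] ^ k] & 0xF) == 8) /* monitored bank 8   */
                counter[k]++;
    }
    int best = 0;
    for (int k = 1; k < 256; k++)
        if (counter[k] < counter[best])
            best = k;
    return (unsigned char)best;
}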

[Plot: frequency versus key value (0-255).]

Figure 4.11: 16th key byte recovery

4.4 Countermeasures

We have demonstrated the feasibility of attacking table-based AES encryption by exploiting the cache bank timing channel. Implementations using Intel AES-NI instructions are not susceptible to our attack; however, many existing platforms lack AES-NI instructions. Bit-slicing implementations can also avoid key-dependent memory accesses, and our attacks would not work on them either, but bit-slicing introduces high execution overhead. Existing countermeasures against cache timing attacks do not prevent our attack. The granularity of the cache partitioning, pinning, and randomization countermeasures is the cache line level, i.e., the cache set index of each horizontal row in Figure 4.2. These countermeasures can prevent sharing sensitive cache lines with malicious processes. However, our attack monitors a cache bank (a column). A cache bank will always contain entries from both the victim's data and the spy's data, so our attack works despite the previous countermeasures. Several countermeasures could be explored against our attack. One is to set a quota for accesses to each bank within a time period; once the quota is reached, further accesses to the cache bank by the process are delayed. Another is to prevent processes from different security domains from being scheduled onto the same core, so that they do not share the L1 cache hardware resource at all.


[Grid of plots, one per key byte: frequency (~6000-8000) versus key value (0-255).]

Figure 4.12: All key bytes recovery

4.5 Summary

In this chapter, we demonstrate the feasibility of attacking table-based AES encryption by exploiting a new cache bank timing channel. Although the timing signal from the cache bank side channel is very small, our attack methods still work and successfully recover the key. This is a realistic threat to any processor with cache banks, including embedded processors for mobile devices [73] and GPUs [11], since cache banking is a crucial feature for low power consumption and high performance.

Chapter 5

The Countermeasure - MemPoline

5.1 Introduction

As we have seen in Chapters 2, 3, and 4, the same algorithm implemented on different architectures can be vulnerable to different memory-based side-channel attacks. Protecting applications against different memory-based side-channel attacks is challenging and can be costly, thus calling for more general countermeasures that work across architectures. Hardware countermeasures modify the cache architecture and policies, and can be efficient [19, 20, 21, 29, 28]. However, they are invasive, require hardware redesign, and oftentimes only address a specific attack. Software countermeasures [23, 24, 25, 36] require no hardware modification and make changes at different levels of the software stack, e.g., the source code, binary code, compiler, or operating system. They are favorable for existing computer systems, with the potential to be general, portable, and compatible. The software implementation of an Oblivious RAM (ORAM) scheme shown in prior work [37] has been demonstrated to successfully mitigate cache side-channel attacks. The ORAM scheme [38, 39] was originally designed to hide a client's data access pattern to remote storage from an untrusted server by repeatedly shuffling and encrypting data. Raccoon [37] repurposes ORAM to prevent the memory access pattern from leaking through cache side channels. The Path-ORAM scheme [39] uses a small client-side private storage to store a position map for tracking the real locations of the data, and assumes the server cannot monitor the access pattern. However, in side-channel attacks, all access patterns can be monitored, and indexing into a position map is itself insecure against memory-based side-channel attacks. Instead of indexing, Raccoon [37], which focuses on control flow obfuscation and uses ORAM for storing data, streams in the position

map to look for the real data location, so that it provides a strong security guarantee. However, since it relies on ORAM for storing data, its memory access runtime is O(N) for N data elements, and the ORAM-related operations can incur more than 100x performance overhead. We propose a software countermeasure, MemPoline, to address the side-channel security issue of Path-ORAM [39] and the performance issue of both prior works [39, 37]. MemPoline adopts the ORAM idea of shuffling, but implements a much more efficient permutation scheme to provide a just-in-need security level for defending against memory-based side-channel attacks. Specifically, we use a parameter-based permutation function to shuffle the memory space progressively. We only need to keep the parameter value private (instead of a position map) to track the real, dynamic locations of data; a minimal sketch of this idea follows the contribution list below. Thus, in our scheme, the memory access runtime is O(1), significantly lower than the O(log(N)) of Path-ORAM [39] and the O(N) of Raccoon [37]. We apply our countermeasure MemPoline to both the T-table implementation of AES and the sliding-window implementation of RSA. We evaluate our countermeasure against various memory-based attacks, including Flush+Reload [6], Evict+Time [32], Cache Collision [35], and CacheBank attacks [34, 74] on CPUs, and the memory coalescing [75] and shared memory [76] attacks on GPUs. Results show that our countermeasure can effectively mitigate all these known memory-based side-channel attacks with significantly less performance degradation than other ORAM-based countermeasures. The contributions in this chapter include:

• We propose a novel, efficient, and effective technique to randomize a protected memory space at run time.

• Based on the technique, we propose a software countermeasure against memory-based side-channel attacks to obfuscate a program's memory access pattern.

• We apply our countermeasure to multiple ciphers on different platforms (CPUs and GPUs) and evaluate the resilience against many known memory-based side-channel attacks, both empirically and theoretically.
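The following is our own minimal illustration of the parameter-based permuted indexing mentioned above, not MemPoline's actual permutation function (which is defined in Section 5.4): a private parameter r acts as the key of a bijection over the index space, so a lookup remains O(1) and only r, rather than a position map, must be kept secret. XOR is used here purely as an example of such a keyed bijection.

/* Illustrative permuted access: idx ^ perm_param is a bijection over
 * [0, n) when n is a power of two, so each logical index maps to one
 * physical slot determined by the private parameter. */
static unsigned int perm_param;   /* private permutation parameter r */

unsigned int perm_read(const unsigned int *buf, unsigned int idx,
                       unsigned int n_mask /* n - 1, n a power of two */)
{
    return buf[(idx ^ perm_param) & n_mask];   /* O(1) permuted lookup */
}

Re-drawing r and moving elements accordingly, one step at a time, gives the progressive, epoch-based re-shuffling that the chapter develops.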

The rest of the chapter is organized as follows. In Section 5.2, we describe existing memory-based side-channel attacks and countermeasures. In Section 5.3, we present our threat model. In Section 5.4, we illustrate our approach, MemPoline, a sophisticated and efficient shuffling technique, which effectively resists memory-based side-channel attacks. In Section 5.5, we evaluate

both the security and performance impact of our countermeasure. The chapter is summarized in Section 5.6.

5.2 Background and Related Work

When the memory access footprint of an application depends on a secret (e.g., a key), side-channel leakage of the footprint can be exploited to retrieve the secret. In this section, we give background on the microarchitecture of the memory hierarchy. We discuss existing memory-based side-channel attacks and how they infer the memory access pattern from various side channels exploiting different resources. We classify countermeasures into different categories. We also describe two well-known cryptographic algorithms, AES and RSA, which will be our targets for applying the countermeasure.

5.2.1 Microarchitecture of the Memory Hierarchy

Computer systems rely on off-chip main memory for storage and CPU or GPU cores for computation. However, there is a speed gap between storage and computation. A cache, a critical on-chip fast memory, is deployed to reduce this gap by utilizing the spatial and temporal locality exhibited in program code and data. Modern CPUs are often equipped with multiple levels of caches to balance storage size and access latency. As caches store only a portion of memory content, a memory request can be served directly by the cache hierarchy in case of a cache hit, otherwise by the off-chip memory (a cache miss). The timing difference between a cache hit and a miss forms a timing channel that can be exploited by the adversary to leak secrets. The structure of a cache is similar to a two-dimensional table, with multiple sets (rows), each set consisting of multiple ways (columns). A cache line (a table cell) is the basic unit of data transfer between memory and cache, with a fixed size, normally 64 bytes in modern CPUs. Each cache line corresponds to one memory block. When the CPU requests data at a given memory address, the cache is looked up for the corresponding memory block: the middle field of the memory address locates the cache set (row) first, and the upper field is used as a tag to compare against all the cache lines in the set to identify a hit or a miss. With highly parallel computing resources such as GPUs and multi-threaded CPUs, modern computer architectures split some on-chip storage into multiple banks, allowing concurrent accesses to these banks to increase the data access bandwidth.

For example, in modern Intel processors, the L1 cache includes multiple banks, and each cache line is divided into multiple equal-sized parts distributed among the banks. The on-chip shared memory of many GPUs is also banked. With massive parallelism on GPUs, a microarchitectural structure, the memory coalescing unit, commonly exists on various GPUs. A memory coalescing unit groups concurrent global memory access requests (e.g., in a warp of 32 threads under the single-instruction-multiple-thread execution model on Nvidia Kepler) into distinct memory block transactions, so as to reduce memory traffic and improve performance. However, as shown in Chapter 2, it can also leak the memory access pattern of a running application.
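To make the address-to-cache mapping concrete, the following minimal sketch (our illustration, assuming a 64-byte cache line and a 64-set L1 cache, typical values rather than ones taken from this chapter) extracts the offset, set-index, and tag fields of an address:

#include <stdint.h>
#include <stdio.h>

#define LINE_BITS 6  /* log2 of the 64-byte cache line size */
#define SET_BITS  6  /* log2 of a 64-set cache              */

int main(void) {
    uintptr_t addr = 0x7ffd12345678u;                              /* example address */
    uintptr_t offset = addr & ((1u << LINE_BITS) - 1);             /* byte within the line   */
    uintptr_t set = (addr >> LINE_BITS) & ((1u << SET_BITS) - 1);  /* selects the set (row)  */
    uintptr_t tag = addr >> (LINE_BITS + SET_BITS);                /* compared against ways  */
    printf("offset=%llx set=%llx tag=%llx\n",
           (unsigned long long)offset, (unsigned long long)set,
           (unsigned long long)tag);
    return 0;
}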

5.2.2 Data Memory Access Footprint

Program data is stored in memory, and memory addresses are used to reference it. If the content-to-memory mapping is fixed and a secret determines which data is used, then by learning the memory access footprint through various side channels, the adversary can infer the secret. Different microarchitectural resources in the memory hierarchy use different portions/fields of the memory address to index themselves, for example, different levels of caches (L1, L2, and LLC) and cache banks. When the victim's access events on these different resources are observed to infer memory accesses, the retrieved memory access footprint also has different levels of granularity. We focus on memory-based side-channel attacks that exploit the sensitive data memory access footprint to retrieve the secret. Examples of sensitive data include the SBox tables of block ciphers such as AES, DES, and Blowfish, and the lookup table of multipliers in RSA. As these microarchitectural resources are shared, the adversary does not need root privilege to access them and can infer the victim's memory access footprint by creating contention on the resources. In view of this attack fundamental, countermeasures are proposed to prevent the adversary from learning the memory access footprint. In Figure 5.1, we classify typical existing memory-based side-channel attacks and countermeasures according to the level of mapping they leverage and address, respectively. Attack. Memory-based side-channel attacks can be classified into access-driven and time-driven. For a time-driven attack, the adversary observes the total execution time of the victim under different inputs and uses statistical methods over a large number of samples to infer the secret. For an access-driven attack, the adversary intentionally creates contention on certain shared resources with the victim to infer the victim's memory access footprint. It consists of three steps: 1. preset - the adversary sets the shared resource to a certain state; 2. execution - the victim runs; 3. measurement -

the adversary checks the state of the resource using timing information.

Figure 5.1: Overview of memory-based side-channel attacks and countermeasures. Countermeasures: 1 Raccoon, MemPoline; 2 Cloak; 3 CATalyst, StealthMem; 4 RFill, NoMo; 5 RCoal. Attacks: A Flush+Reload, Flush+Flush; B Prime+Probe, Evict+Time; C CacheBleed, etc.; D Coalescing Attack; E CacheCollision Attack; F Shared Memory Attack. Resources: GPU memory coalescing unit, GPU shared memory, L1 cache bank, L1 cache line, L3 cache line.

Figure 5.1 lists five vulnerable microarchitectural resources, three on CPUs - the L1 cache line, L3 cache line, and L1 cache bank - and two on GPUs - the memory coalescing unit and shared memory - together with the various attacks utilizing them. The GPU memory coalescing attack [75] and shared memory attack [76], Evict+Time [32], and CacheCollision [35] are time-driven. All other attacks, including Flush+Reload [6], Flush+Flush [9], Prime+Probe [32, 33], and CacheBleed [34], are access-driven. They differ in how they preset the shared resource and how they use timing information to infer the victim's data accesses. Countermeasure. Existing countermeasures are built on three principles to prevent information leakage: partitioning, pinning, and randomization. Partitioning techniques [29, 19, 23, 24, 25], including StealthMem [24] and NoMo [29], split a resource among multiple software entities (processes), so that one process does not share the same microarchitectural resource with another and no side channel can be formed. Pinning techniques [19, 22, 26, 27], including CATalyst [22] and Cloak [26], preload and lock one entity's security-sensitive data in the resource prior to computation, so that any key-dependent memory access to the locked data results in a constant access time. Randomization techniques, such as RFill [20], RCoal [28], and Raccoon [37], randomize the behavior of memory subsystem resources so that the adversary cannot correlate the memory access footprint with the content used in the computation. Hardware countermeasures [20, 28] randomize the mapping between the memory address and on-chip microarchitectural resources; for example, RFill [20] targets the L1 cache, and RCoal [28] targets the memory coalescing unit and randomizes its grouping behavior. Our approach, MemPoline, is in the same category as software ORAM [37, 39], which randomizes the content-to-memory-address mapping.
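As a concrete illustration of the preset and measurement steps, the sketch below shows the core probe of a Flush+Reload spy on x86 (our hedged example, not code from the attacks cited above); the target address and the hit threshold are assumptions that must be calibrated per machine:

#include <stdint.h>
#include <x86intrin.h>

#define HIT_THRESHOLD 120  /* assumed cycle threshold; calibrate per machine */

/* returns 1 if the victim touched `target` since the last flush */
static int probe(const void *target) {
    unsigned aux;
    uint64_t t0 = __rdtscp(&aux);        /* serialized timestamp          */
    *(volatile const char *)target;      /* measurement: reload the line  */
    uint64_t t1 = __rdtscp(&aux);
    _mm_clflush(target);                 /* preset: flush for next round  */
    return (t1 - t0) < HIT_THRESHOLD;    /* fast reload => cached => used */
}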


Compared to the prior ORAM work, Path-ORAM [39] and Raccoon [37], our countermeasure, MemPoline, is much more efficient while still achieving high security. We modify and relax the ORAM constraint for mitigating memory-based side-channel attacks. One key insight is that these memory-based attacks have limited bandwidth and resolution compared to probing the memory bus, so we do not need to shuffle the entire data set on every memory access. For example, an adversary monitoring a cache bank cannot monitor multiple cache banks at the same time and cannot differentiate data accessed within the same cache bank. Thus, the adversary needs many samples before the actual memory access pattern can be recovered. With the relaxed ORAM constraint, we achieve O(1) performance overhead per memory access while still preventing memory-based side-channel attacks. For performance comparison, we apply our countermeasure to the Histogram program, one of the performance benchmarking programs used in Raccoon, and observe over 7000x better performance than the ORAM used in Raccoon.

5.2.3 Vulnerable Ciphers

In this chapter, we apply our countermeasure to two vulnerable ciphers.

5.2.3.1 AES

AES is the standard encryption algorithm, and we use the T-table implementation in OpenSSL 1.0.2n for evaluation. It supports multiple key lengths and different modes of operation. The most common Electronic Codebook (ECB) mode AES with a 16-byte key consists of nine rounds of SubByte, ShiftRow, MixColumn, and AddRoundKey operations, and a last round without the MixColumn operation. In the T-table-based implementation, the last round can be described by $c_i = T_k[s_j] \oplus rk_i$, where $c_i$ is the $i$-th byte of the output ciphertext, $rk_i$ is the $i$-th byte of the last round key, $s_j$ is the $j$-th byte of the last round input state ($j$ differs from $i$ due to the ShiftRow operation), and $T_k$ is the corresponding publicly known T-table for $c_i$. The master key can be recovered once the last round key is known. Memory-based side-channel attacks can reverse-engineer the last round key by inferring the victim's memory access pattern to the publicly known T-tables, with $s_j$ inferred and $c_i$ known as the output.
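This key-recovery relation can be made concrete with a short voting sketch (ours, with hypothetical names): every $s_j$ value that the monitored cache line or bank may hold yields one last-round key guess by inverting the relation above, and the correct byte accumulates votes fastest over many samples.

#include <stdint.h>

/* c_i: observed ciphertext byte; sj_in_line: the s_j values held by the
 * monitored line/bank (n of them); sbox: the standard AES S-box;
 * candidates: vote counters for the 256 possible rk_i values. */
void vote_last_round_key(uint8_t c_i, const uint8_t *sj_in_line, int n,
                         const uint8_t sbox[256], unsigned candidates[256]) {
    for (int k = 0; k < n; k++) {
        uint8_t rk_guess = c_i ^ sbox[sj_in_line[k]];  /* invert c_i = sbox[s_j] XOR rk_i */
        candidates[rk_guess]++;  /* the correct rk_i stands out statistically */
    }
}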

5.2.3.2 RSA

RSA is an asymmetric cipher with two keys, one public and one private. The major computation is modular exponentiation, $r = b^e \bmod m$. In decryption, the exponent $e$ is

the private key. For the sliding-window implementation of the RSA algorithm in GnuPG-1.4.18, the exponent is broken into a series of zero and non-zero windows. The algorithm processes the windows one by one from the most significant one. For each exponent window, a squaring operation is performed first (a multiplication of two identical inputs). If the window exponent is non-zero, another multiplication routine is executed with a pre-calculated multiplier selected by the value of the current window. For an n-bit window, there are 2^(n-1) multiplier values, as only odd values are used in the conditional multiplications. Tracking which multiplier has been used leads to recovery of the window's exponent value, and the table of the 2^(n-1) multipliers is the vulnerable sensitive data.
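The window processing just described can be sketched as follows; this is a toy, fully self-contained stand-in with small fixed-width integers, not GnuPG's actual multi-precision code. The secret-indexed table lookup in the non-zero-window branch is the access an attacker tracks.

#include <stdint.h>
#include <stdio.h>

/* toy modular multiply; assumes m < 2^32 so the 64-bit product cannot overflow */
static uint64_t mod_mul(uint64_t a, uint64_t b, uint64_t m) { return a * b % m; }

/* left-to-right sliding-window exponentiation, 3-bit windows over a 32-bit
 * exponent; only the 2^(3-1) = 4 odd multipliers b^1, b^3, b^5, b^7 are kept */
static uint64_t slide_exp(uint64_t b, uint32_t e, uint64_t m) {
    uint64_t tbl[4], b2 = mod_mul(b, b, m);
    tbl[0] = b % m;
    for (int k = 1; k < 4; k++) tbl[k] = mod_mul(tbl[k - 1], b2, m);
    uint64_t r = 1;
    for (int i = 31; i >= 0; ) {
        if (!((e >> i) & 1)) {                    /* zero window: square only */
            r = mod_mul(r, r, m);
            i--;
        } else {
            int l = (i >= 2) ? i - 2 : 0;
            while (!((e >> l) & 1)) l++;          /* window must end in a 1 bit (odd value) */
            uint32_t w = (e >> l) & ((1u << (i - l + 1)) - 1);
            for (int k = i; k >= l; k--) r = mod_mul(r, r, m);
            r = mod_mul(r, tbl[(w - 1) / 2], m);  /* secret-indexed lookup: the leak */
            i = l - 1;
        }
    }
    return r;
}

int main(void) {
    printf("%llu\n", (unsigned long long)slide_exp(7, 65537u, 1000000007ull));
    return 0;
}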

5.3 Threat Model

Our threat model assumes co-residence of the adversary and the victim on one physical machine. We use this threat model for both the attack implementations and the evaluation of our countermeasure, and we do not anticipate any issue for the countermeasure to work in a cloud environment. The adversarial goal is to recover the secret key of a cryptographic algorithm using memory-based side-channel attacks. The threat model assumes the adversary is a regular user without root privilege, and the underlying operating system is not compromised. The adversary cannot read or modify the victim's memory, but the victim's binary code is publicly known (the common case for ciphers). The adversary can interact with the victim application; for example, the adversary can provide messages for the victim to encrypt, receive the ciphertext, and time the encryption. However, no confidential information is disclosed through any direct channel, e.g., the secret key of the encryption algorithm or intermediate computation results. In this work, we focus on protecting secret-dependent data memory accesses. Instruction memory is also vulnerable to side-channel timing attacks, which we leave to future work.

5.4 Our Countermeasure - MemPoline

5.4.1 Design Overview

The high-level idea of our countermeasure, MemPoline, is to progressively change the organization of sensitive data in memory from one state to another, directed by an efficient parameter-based permutation function, so that it decorrelates the microarchitectural events the adversary

observes and the actual data used by the program. Here, sensitive data refers to data whose access pattern should be protected, rather than the data values themselves. To obfuscate memory accesses, the data layout in memory undergoes randomization through permutation. However, the frequency of permuting and the implementation method have a significant impact on both the security and the performance of the countermeasure. We implement permutation gradually through successive swaps instead of all at once - only bouncing the data to be accessed around before the access (load or store). Once the data layout reaches a permuted state, we update the parameter and continue migrating the layout to the next permuted state. This procedure slowly de-associates any memory address from the actual data content. Thus, the countermeasure provides a security level sufficient to defend against memory-based side-channel attacks with a significant performance gain over ORAM-based countermeasures. The insight enabling such efficient permutation is that the granularity of cache data that a memory-based side-channel attack can observe is limited, which can be leveraged to reduce the frequency of permuting to be just-in-need, lowering the performance degradation.

Figure 5.2: Actions in MemPoline

The countermeasure consists of two major actions at the user level: a one-time initialization and a subsequent swap on each data access (between the accessed data and another data unit selected by the random parameter), as shown in Figure 5.2. During initialization, the original data is permuted and copied into a dynamically allocated memory region (SMem). The permuted state is labeled by one parameter, a random number r, which is used for bookkeeping and for tracking the real memory address of each data access. For example, the data element pointed to by index i in the original data structure is referred to by a different index in the permuted state, j = f_perm(i, r) in SMem, where r is a random value and f_perm is an explicit permutation function. The memory access pattern in SMem can be obfuscated by changing the value of r. If the value of r were fixed, the memory access pattern would also be fixed. This would only

increase the attack complexity, as the adversary would need to recover the combination of r and the key value instead of just the key value; the side-channel information leakage would remain the same, and therefore so would the number of traces needed for a successful attack. On the other hand, if the value of r were updated every time a data element is accessed, the memory access pattern would be truly random. Such an updating frequency could provide the same security guarantee as ORAM [38, 39], but would also inherit its excessive performance degradation. Our countermeasure sets the frequency of changing the value of r at a level that balances security and performance, and implements permutation through successive swaps rather than a one-time action. This way, the security level needed to defend against memory-based side-channel attacks is attained with much better performance than ORAM. Next, we define the data structures of SMem in view of the memory hierarchy and set up auxiliary data structures. Then we illustrate the two actions of our countermeasure.

5.4.2 Define the Data Structures

SMem is a contiguous memory space allocated dynamically. We define its basic element for permutation as a limb, with size equal to that of a cache bank, commonly 4 bytes in modern processors. We thus treat SMem as a 4-byte-addressable memory space. Considering the cache mapping of SMem, we can view SMem as a two-dimensional table, where rows are cache lines, columns are banks, and each cell is a limb. As the observation granularity of memory-based side-channel timing attacks is either a cache line or a cache bank, when we move a limb around, both the row index and the column index should change, to increase the entropy of the memory access obfuscation. We divide limbs into multiple equal-sized groups, and permutations take place within each group independently. To prevent information leakage through monitoring cache lines or cache banks, groups should be uniformly distributed across rows and columns, i.e., each row (or column) should contain an equal number of limbs from each group. Figure 5.2 shows an example SMem, where the number of groups equals the number of columns, groups are formed diagonally, and the number of limbs in a group equals the number of rows. With this well-balanced grouping, when a limb moves around within its group as directed by the parameter-based permutation function, it can appear in any cache line or cache bank, obfuscating the memory access and therefore mitigating information leakage. Note that in modern computer systems, the cache line size is the same throughout the memory hierarchy: RAM, Last-Level-Cache (LLC), L2, L1, and even the memory coalescing unit. Therefore, we can mitigate information leakage at different

memory hierarchy levels simultaneously.

In SMem, the initialization sets each group in a permuted state, described by r1.

During program execution, as the permuted state gradually updates toward r2, the group is at any time in a mixed state: some limbs are in the r1 state and others in the r2 state. Once the entire group reaches the r2 state, r1 is obsolete and is updated with r2, and a new random number is generated for r2. Along the temporal horizon, we define the progression from a starting permuted state r1 to another permuted state r2 as an epoch. For a limb originally indexed by i, its location in SMem is f_perm(i, r1) if it is in the r1 state, and f_perm(i, r2) otherwise. To keep track of which permuted state a limb i is in, a bitmap is allocated during initialization and kept updated: when bitmap[f_perm(i, r1)] is 1, limb i is in the r1 permuted state; otherwise, it is in the r2 permuted state.
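A minimal sketch of this state tracking (our illustration; group handling and the swap itself are omitted, and all names are hypothetical):

#include <stdint.h>

typedef struct {
    uint32_t *limbs;   /* SMem, viewed as 4-byte limbs                     */
    uint8_t  *bitmap;  /* bitmap[j] == 1: the limb at j is in the r1 state */
    uint32_t  r1, r2;  /* current and next permutation parameters          */
} smem_t;

/* where does original index i currently live? */
static uint32_t locate(const smem_t *d, uint32_t i) {
    uint32_t j1 = i ^ d->r1;                  /* f_perm(i, r1)      */
    return d->bitmap[j1] ? j1 : (i ^ d->r2);  /* else f_perm(i, r2) */
}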

5.4.3 Initialization - Loading Original Sensitive Data

We load the original sensitive data into SMem for two reasons: compatibility and security. The original sensitive data in a vulnerable program may be statically or dynamically allocated. To make our countermeasure compatible with both situations, we load the original data into a dynamically allocated region, SMem; this only incurs overhead for statically allocated data. Dynamically allocated memory also has a security advantage. Given that the layout of the original data is publicly known, an adversary able to monitor the cache activities during initialization could track each permutation step and recover the final permuted state. If the original data is statically allocated and defined, its offset within the binary file is fixed, which in turn maps to a fixed cache region. In contrast, the address of dynamically allocated memory is determined at runtime, so identifying the cache region belonging to it requires the adversary to scan the entire cache and observe multiple related cache accesses. Using dynamically allocated memory thus protects the initial permutation from being monitored by the adversary. The original sensitive data in memory is byte addressable. For program data accesses, the unit can be multi-byte, and it should be aligned with the limb size (determined by the cache bank size). For example, for T-table based AES, the data unit size is four bytes, fitting in one limb; for an SBox-based implementation, the unit is one byte, and three bytes are padded to make one limb. Therefore, each data unit occupies one or more contiguous limbs. The overall size of SMem depends on the size of the original sensitive data. For a static

type of data, each data unit has the same length. For a dynamic type of data structure, each data unit may have variable length, such as an array of pointers pointing to memory regions of different sizes. To optimize the bookkeeping overhead, all data units occupy the same number of limbs in SMem; thus, the size of SMem is the number of data units multiplied by the size of the largest data unit. To map a data unit indexed by i to a location in SMem, we need its coordinate in SMem, i.e., the row and column, from which the group ID can be derived. Note that, unlike previous ORAM approaches, MemPoline does not rely on an auxiliary mapping table to determine a location for i, as such a mapping table is itself side-channel vulnerable. Instead, we develop functions that associate i with a memory address through private random numbers. For simplicity, we assume each data unit occupies one limb in SMem; we later extend the approach to general cases where a data unit occupies two or more limbs, e.g., the table of multipliers in the sliding-window implementation of RSA. We start by filling SMem row by row, in the same manner as a contiguous data structure is mapped to memory, shown as the white table in Figure 5.2, where the data unit index i directly translates to the limb memory address. In each cell, the number in the middle is the original data index and the number at the top-right corner is the SMem address. When permuting, the content moves around in SMem. For the example in Figure 5.2, the 32 limbs (eight rows and four columns) are divided into four diagonal groups. In each group, a specific random number r1 is chosen to perform the permutation. The permutation function is exclusive OR, satisfying i1 ⊕ r1 = j1 and i1 ⊕ r2 = j2; the contents at addresses j1 and j2 are swapped. For each group of eight limbs, as shown in Figure 5.2, four swaps are performed directly by its corresponding initial r1, after which the entire SMem is in the r1 permuted state. To handle the case where a data unit occupies multiple limbs, we treat the data unit i as a structure consisting of n limbs. The loading and initial permutation operations are still performed at the granularity of limbs, and one data access now translates into n limb accesses. After permutation, these limbs are scattered in SMem and are not necessarily contiguous; the individual limbs are located and gathered to form the data unit requested by the program execution.
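The initial permutation of one group can be sketched as below (our simplification: indices are positions within a single group's index space, the group size is a power of two, and r1 < size, so the diagonal row/column grouping of Figure 5.2 is abstracted away):

#include <stdint.h>

static void init_permute_group(uint32_t *group, uint32_t size, uint32_t r1) {
    for (uint32_t p = 0; p < size; p++) {
        uint32_t q = p ^ r1;
        if (p < q) {                /* visit each {p, p^r1} pair once */
            uint32_t t = group[p];  /* after the loop, the limb       */
            group[p] = group[q];    /* originally at index i sits at  */
            group[q] = t;           /* f_perm(i, r1) = i ^ r1         */
        }
    }
}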

5.4.4 Epochs of Permuting

After initialization, the program execution is accompanied by epochs of permutation of SMem, distributed across data accesses. For each data access, given the index in the original data


structure, we locate the limbs in SMem and move some data units from the r1 permuted state to r2. The procedure is shown in Listing 5.1.

Listing 5.1: Locating the original index i in SMem

mp_locate_and_swap(i):
    j1 = r1_index(i)
    j2 = r2_index(i)
    // 3rd argument == false: fake swap
    // 3rd argument == true:  real swap
    oblivious_swap(j1, j2, bitmap[j1] == 1)
    random_perm(group_index(i))
    j2 = r2_index(i)
    return address at j2

Locating Data Element. The data unit indexed by i in the original data structure exists in

SMem in one of two possible states: either in the r1 permuted state at j1 = r1_index(i) = f_perm(i, r1), or in the r2 permuted state at j2 = r2_index(i) = f_perm(i, r2), depending on the value of bitmap[j1]: bitmap[j1] = 1 indicates that i = f_perm^-1(j1, r1) is in the r1 permuted state, and bitmap[j1] = 0 indicates that i = f_perm^-1(j1, r2) is in the r2 permuted state.

Permuting. Once the data element is located, if bitmap[j1] is 1 we swap its content with the content at j2 = r2_index(i) in SMem; the swap disassociates cache events related to accessing j1 from i. If bitmap[j1] is 0, we perform a fake swap procedure to disguise the fact that i is at j2. This conditional swap is the oblivious_swap in Listing 5.1. In addition to swapping the accessed element, we perform one more random permutation pair by swapping j3 and j4 in the same group, as shown in Figure 5.2. This procedure, random_perm in Listing 5.1, serves two purposes: first, it guarantees that at least one data unit moves to the r2 permuted state per memory access; second, it adds noise to the memory access pattern. Specifically, we randomly select a data unit indexed by u such that bitmap[r1_index(u)] is 1. If there is no such index in the group, all data units in the group must be in the r2 permuted state; we then assign the value of r2 to r1 and generate a new random value for r2. Once u is selected, we swap the contents at j3 = r1_index(u) and j4 = r2_index(u).
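One way to realize oblivious_swap in constant time is a branchless masked swap, sketched below (our illustration, not necessarily the chapter's exact implementation): both locations are read and written whether the swap is real or fake, so the two cases produce the same access pattern.

#include <stdint.h>

static void oblivious_swap(uint32_t *a, uint32_t *b, int do_swap) {
    uint32_t mask = (uint32_t)-(uint32_t)(do_swap != 0);  /* all-ones or zero   */
    uint32_t d = (*a ^ *b) & mask;  /* zero when faking, so values persist      */
    *a ^= d;                        /* both limbs are touched in either case,   */
    *b ^= d;                        /* hiding whether the swap was real or fake */
}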


5.4.4.1 Parameter-Based Permutation Function

We use the xor function (⊕) as the parameter-based permutation function, moving two data elements from the r1 permuted state to the r2 permuted state at a time while leaving all other data elements untouched.

At the beginning of an epoch, all data units are in the permuted state r1. When an access request for data unit i1 arrives, we first identify its location in SMem as j1 = i1 ⊕ r1. As it is now being requested, it is time for it to be updated to the r2 permuted state and relocated to j2 = i1 ⊕ r2. The data unit currently residing at j2 is still in the r1 state, and its original index satisfies i2 ⊕ r1 = j2 = i1 ⊕ r2.

By swapping the contents at j1 and j2 in SMem, both data units i1 and i2 are moved to the r2 permuted state, located at i1 ⊕ r2 and i2 ⊕ r2, respectively. In the following, we prove why this swap implements the permutation without affecting other data units.

Let $r_1$ and $r_2$ be random numbers of the same size (bit length), and let $i_1$ and $i_2$ be indices into the original data structure $d$. Suppose $i_1$ and $i_2$ are located at $j_1 = i_1 \oplus r_1$ and $j_2 = i_2 \oplus r_1$ in SMem (denoted $D$), respectively. That is,

\[ D[i_1 \oplus r_1] = d[i_1], \qquad D[i_2 \oplus r_1] = d[i_2]. \]

With the swap operation, we move $i_1$ to $j_2 = i_1 \oplus r_2$ and $i_2$ to $j_1 = i_1 \oplus r_1$. Therefore,

\[ i_1 \oplus r_2 = i_2 \oplus r_1. \tag{5.1} \]

XORing both sides of Equation 5.1 with $(r_1 \oplus r_2)$, we have

\[ i_1 \oplus r_2 \oplus (r_1 \oplus r_2) = i_2 \oplus r_1 \oplus (r_1 \oplus r_2), \tag{5.2} \]
\[ i_1 \oplus r_1 = i_2 \oplus r_2. \tag{5.3} \]

After the swap operation,

\[ D[i_1 \oplus r_1] = d[i_2], \qquad D[i_2 \oplus r_1] = D[i_1 \oplus r_2] = d[i_1]. \]

By Equation 5.3, we have

\[ D[i_1 \oplus r_1] = D[i_2 \oplus r_2] = d[i_2], \]

i.e., $i_2$ now also resides at its $r_2$-state location.
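The property can also be checked by brute force; the toy program below (ours, for illustration only) verifies it over an 8-element group for all combinations of r1, r2, and i1:

#include <assert.h>
#include <stdint.h>

int main(void) {
    enum { N = 8 };  /* toy group of 8 limbs */
    for (uint32_t r1 = 0; r1 < N; r1++)
        for (uint32_t r2 = 0; r2 < N; r2++)
            for (uint32_t i1 = 0; i1 < N; i1++) {
                uint32_t D[N];
                for (uint32_t i = 0; i < N; i++)
                    D[i ^ r1] = i;                            /* r1 state: D[i^r1] = d[i] = i */
                uint32_t j1 = i1 ^ r1, j2 = i1 ^ r2;
                uint32_t t = D[j1]; D[j1] = D[j2]; D[j2] = t; /* the swap                     */
                uint32_t i2 = i1 ^ r1 ^ r2;                   /* partner index, from Eq. 5.3  */
                assert(D[i1 ^ r2] == i1);                     /* i1 is now in the r2 state    */
                assert(D[i2 ^ r2] == i2);                     /* and so is its partner i2     */
            }
    return 0;
}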


5.4.5 Security Analysis

Since our countermeasure uses a parameter-based permutation function, the range of the parameter value determines the total number of permuted states of SMem. If we changed the parameter value on every memory access, we could prove the security of SMem to be as strong as that of Path-ORAM [39] against memory-based side-channel attacks. When a victim performs a load/store on a data element indexed by i in the original data structure, an adversary can observe the corresponding cache line (or bank), line_j, being accessed. However, if the data element is remapped to a new random cache line line_k, observing line_k is statistically independent of observing line_j: line_k can be any cache line with uniform probability 1/L, where L is the number of cache lines, guaranteed by our balanced grouping. Thus, the adversary cannot associate the observed cache line line_k with the data element. If the layout of the sensitive data in memory is described by a parameter-directed permutation state, changing the parameter value on every data access means that all data elements are shuffled even though most of them are not used by that access; this operation would take O(N) time. Given the limited granularity of the side-channel information observed by the adversary, we in fact do not need to change the parameter value on every data access. For example, when one cache line contains multiple data elements, an access to any of them lets the adversary observe an access to the cache line, but the adversary cannot determine which data element was used. Memory-based side-channel attacks therefore require multiple observations to statistically identify the accessed data element; for instance, even the most accurate implementation of the Flush+Reload technique needs more than a few thousand observations to statistically identify the 16 T-table elements accessed in AES. As long as we can move all data elements from one permuted state to the next before they can be statistically identified, we hide the access pattern from leaking through the side channel. As shown in our empirical results, no data element is identifiable by any of the memory-based side-channel attacks we evaluated when our countermeasure is applied.

5.4.6 Operations Analysis

Table 5.1 gives an overview of all major operations in MemPoline. In the initialization step, a memory space is allocated and the original data is loaded into it. The data layout then progressively migrates from one permuted state to the next upon every memory access; this step incurs the major overhead. Locating a limb requires two extra memory reads of the bitmap. Every permute/swap operation requires three extra memory writes: two to update the data in SMem and one to update the bitmap. For all limbs within a group to migrate to the new permuted state, the number of bitmap-updating writes equals half of the group size. The bitmap access complexity is O(1), and since the data index i is protected, no information leaks when the bitmap is looked up.

User Action      Operation                                  Calling Frequency             Memory Accesses
Init             1. Allocate memory                         One time                      n writes
                 2. Move data to SMem with initial          One time                      n reads + n writes
                    permutation
Memory Access    1. Locate element                          Per access                    2 reads
(Read/Write)     2. Permute                                 Per access                    3 writes
                 3. Generate new random value               Per (group size)/2 accesses   (group size)/2 writes

Table 5.1: Operations in MemPoline

API                                                                       Description
struct mp* mp_init(uint16_t max_elm_size, uint16_t n_elm)                 allocate the data structure for SMem
void mp_save(uint16_t i, void* elm_ptr, uint16_t elm_len, struct mp* d)   store data element i to SMem
uint32_t* mp_locate_and_swap(uint16_t i, struct mp* d)                    locate and permute the data element i
void mp_free(struct mp* d)                                                free SMem

Table 5.2: APIs for MemPoline

5.4.7 Implementation - API

Application source code has to be changed to store data in SMem. MemPoline provides developers four simple APIs for initializing, loading, accessing (locating and swapping), and releasing SMem, as shown in Table 5.2. First, developers define and allocate SMem using mp_init. Second, developers copy the sensitive data structure to be protected, such as an SBox or multiplier lookup table, into the allocated memory space using mp_save. Developers locate data elements and perform swapping using mp_locate_and_swap. Finally, developers release the allocated memory space using mp_free. In this work, we apply these APIs to AES and RSA to protect the T-tables and the multiplier lookup table, respectively, and evaluate the security and performance impact of our approach.
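Putting the four APIs together, a typical usage looks like the sketch below (ours), here protecting a 256-entry table of 4-byte words; the prototypes follow Table 5.2, while the table name and index are illustrative:

#include <stdint.h>

struct mp;
struct mp *mp_init(uint16_t max_elm_size, uint16_t n_elm);
void mp_save(uint16_t i, void *elm_ptr, uint16_t elm_len, struct mp *d);
uint32_t *mp_locate_and_swap(uint16_t i, struct mp *d);
void mp_free(struct mp *d);

extern const uint32_t table[256];  /* sensitive data to protect */

void example(void) {
    struct mp *d = mp_init(sizeof(uint32_t), 256);        /* allocate SMem   */
    for (uint16_t i = 0; i < 256; i++)                    /* load + permute  */
        mp_save(i, (void *)&table[i], sizeof(uint32_t), d);
    uint32_t v = *mp_locate_and_swap(42, d);              /* shuffled lookup */
    (void)v;
    mp_free(d);                                           /* release SMem    */
}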


5.4.7.1 Source Code Transformation for AES

We add a constructor and a destructor to allocate and deallocate SMem using mp_init and mp_free, respectively. Because the T-tables are statically allocated, we copy their data into SMem inside the constructor. We also replace the original T-table lookup operation with an mp_locate_and_swap call, as shown in Listing 5.2, where Te0 is the original T-table and STe0 is of type struct mp and contains all the data of Te0. With the modified code, the assembly code size increases by 11.6%.

Listing 5.2: Transforming AES T-table lookup operation to secure one

Te0[(s0 >> 24)]                          // original T-table lookup
*mp_locate_and_swap((s0 >> 24), STe0)    // secure replacement
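The constructor/destructor wiring mentioned above could look like the following sketch (ours, using GCC's constructor attribute; STe0 follows the text and the mp_* prototypes follow Table 5.2, while the function names are illustrative):

#include <stdint.h>

struct mp;
struct mp *mp_init(uint16_t max_elm_size, uint16_t n_elm);
void mp_save(uint16_t i, void *elm_ptr, uint16_t elm_len, struct mp *d);
void mp_free(struct mp *d);

extern const uint32_t Te0[256];  /* OpenSSL's static T-table   */
static struct mp *STe0;          /* its protected copy in SMem */

__attribute__((constructor))
static void ste0_ctor(void) {    /* runs before main()         */
    STe0 = mp_init(sizeof(uint32_t), 256);
    for (uint16_t i = 0; i < 256; i++)
        mp_save(i, (void *)&Te0[i], sizeof(uint32_t), STe0);
}

__attribute__((destructor))
static void ste0_dtor(void) { mp_free(STe0); }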

5.4.7.2 Source Code Transformation for RSA - Sliding Window Implementation

Unlike AES, the multiplier lookup table is dynamically created, so no constructor or destructor is needed. Instead, we replace the allocation and initialization with mp_init, the loading of pre-computed multipliers with mp_save, the multiplier lookup operation with mp_locate_and_swap, and the deallocation with mp_free, as shown in Listing 5.3. With the modified code, the assembly code size increases by only 0.4%.

Listing 5.3: Transforming RSA to secure one

mpi_ptr_t b_2i3[SIZE_B_2I3];                                    // original
pdata *b2i3s = mp_init(sizeof(mpi_limb_t)*n_limbs, n_elems);    // replacement

MPN_COPY(b_2i3[i], rp, rsize);                                  // original
mp_save(i, rp, sizeof(mpi_limb_t)*rsize, b2i3s);                // replacement

base_u = b_2i3[e0 - 1];                                         // original
base_u = mp_locate_and_swap(e0 - 1, b2i3s);                     // replacement

mpi_free_limb_space(b_2i3[i]);                                  // original
mp_free(b2i3s);                                                 // replacement


5.5 Evaluation

In this section, we first perform a case study on AES with the countermeasure MemPoline applied. We evaluate both the security of the countermeasure against various memory-based side-channel attacks and its performance impact. We then study applying the countermeasure to RSA.

5.5.1 Experimental Setup

As our countermeasure is general, defending against various attacks on different platforms, we conduct experiments on both CPUs and GPUs. The CPU system is a workstation equipped with an Intel i7 Sandy Bridge CPU with three levels of caches (L1, L2, and L3, with sizes of 64KB, 256KB, and 8MB, respectively) and 16GB of DRAM; hyperthreading is enabled. We evaluate standard cipher implementations from two crypto libraries, OpenSSL 1.0.2n and GnuPG-1.4.18. We focus on AES key recovery from the OpenSSL implementation, and use RSA in GnuPG-1.4.18 to demonstrate that our countermeasure can easily be applied to other algorithms with security-sensitive data. The GPU platform is a server equipped with an Nvidia Kepler K40 GPU. We adopt the standard CUDA port of the OpenSSL AES implementation, as used in prior work [75, 77].

5.5.2 Security Evaluation of AES

We evaluate the security of our countermeasure by applying it to T-table based AES on both CPU and GPU platforms. Here, security refers to the side-channel resilience of MemPoline against various attacks, compared to the original unprotected ciphers. We expect MemPoline to address information leakage across different microarchitectural resources. Specifically, we have evaluated six memory-based side-channel attacks, targeting the L1 cache line, L3 cache line, and L1 cache bank on CPUs, and the memory coalescing and shared memory units on GPUs. We evaluate security at two levels, each with and without the countermeasure. First, we use the Kolmogorov-Smirnov null-test [78] to quantify the essential side-channel information leakage observable with each attack technique, from the evaluator's point of view - assuming the correct key is known. Second, we perform an empirical security evaluation by launching all these attacks and analyzing a large number of samples, from the attacker's point of view.


5.5.2.1 Essential Leakage Quantification

Memory-based side-channel attacks on AES monitor the access pattern to a portion (one cache line/bank) of the T-tables during the last round. For the original implementation, where the mapping of the T-tables to memory addresses and cache is fixed, adversaries know which values the monitored cache line/bank contains. When adversaries detect an access by the victim to the monitored cache line/bank in the last round, the resulting ciphertext must use one of the values, a set of $s_j$, in the monitored cache line/bank. With the ciphertext bytes $\{c_i \mid 0 \leq i \leq 15\}$ known to the adversary, there is information leakage about the last round key, $\{rk_i \mid 0 \leq i \leq 15\}$, through the relationship shown below:

\[ rk_i = c_i \oplus sbox[s_j] \tag{5.4} \]

Flush+Reload (F+R): This is an access-driven attack consisting of three steps. The state of the shared cache is first set by flushing one cache line from the cache. The victim, AES, then runs. Finally, the spy process reloads the flushed cache line and times it; a shorter reload time indicates AES has accessed the cache line. If there is information leakage at the L3 cache line, the attack can correctly classify the ciphertexts/samples that accessed the monitored cache line based on the observed reload timing. In other words, using the correct key byte we can compute all cache lines accessed by the 40 T-table lookup operations of one AES run, and classify the observed reload timings based on whether or not the monitored cache line was accessed by that run. If the two timing distributions are distinguishable, the attack observes the information leakage. We collect 100K samples and show the result in Figure 5.3, where the x-axis is the observed reload timing in CPU cycles and the y-axis is the cumulative distribution function (CDF). For the original implementation, shown in Figure 5.3(a), the access and non-access distributions are visually distinguishable. For the secure implementation with MemPoline applied, the two distributions are not distinguishable, as shown in Figure 5.3(b), meaning no information leakage is observed by the Flush+Reload attack when our countermeasure is applied. The distinguishability between two distributions can be measured by the Kolmogorov-Smirnov (KS) null-test [78]: if the null hypothesis test result, the p-value, is below a significance level (e.g., 0.05), the distributions are distinguishable. Using the stats package in Python, we compute the p-values for the non-secure and secure implementations against an F+R attack, which are 0 and 0.27, respectively, indicating no leakage in the secure implementation. We analyze the other known memory-based side-channel timing attacks and use the KS null-test for them as well. The attacks differ in type (access-driven vs. time-driven), observing granularity, and the distributions of timing observations being compared.
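For reference, the two-sample KS statistic used throughout this evaluation can be computed as in the sketch below (ours; the text's p-values were obtained with Python's stats package, and this C sketch only computes the statistic D, the maximum distance between the two empirical CDFs):

#include <math.h>
#include <stdlib.h>

static int cmp_dbl(const void *a, const void *b) {
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

/* two-sample Kolmogorov-Smirnov statistic over timing samples s1, s2 */
double ks_statistic(double *s1, size_t n1, double *s2, size_t n2) {
    qsort(s1, n1, sizeof *s1, cmp_dbl);
    qsort(s2, n2, sizeof *s2, cmp_dbl);
    size_t i = 0, j = 0;
    double d = 0.0;
    while (i < n1 && j < n2) {
        double x = (s1[i] <= s2[j]) ? s1[i] : s2[j];
        while (i < n1 && s1[i] == x) i++;   /* consume ties together so  */
        while (j < n2 && s2[j] == x) j++;   /* both CDFs step at value x */
        double diff = fabs((double)i / n1 - (double)j / n2);
        if (diff > d) d = diff;
    }
    return d;
}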


Figure 5.3: Information leakage under Flush+Reload - CDFs of the observed reload timing (CPU cycles) for access and non-access samples: (a) original (non-secure) implementation; (b) secure implementation.

Evict+Time (E+T): This attack is time-driven and can observe information leakage at both L1 and L3 cache lines. The adversary evicts one of the T-table cache lines with its own data and runs the victim (AES), measuring the victim's execution time. The two distributions are thus the measured encryption timings, split by whether or not the evicted cache line was accessed by the AES run. Cache Collision (CC): This attack is also time-driven and can observe information leakage at both L1 and L3 cache lines. It is based on the fact that if two different cipher bytes use the same cache line in the last round, the total execution time is statistically shorter. The distributions are the measured encryption times, split by whether or not the two cache lines accessed by two cipher bytes (e.g., the 2nd and 14th) in the last round are the same. L1 Cache Bank (CB): This attack is access-driven and has a different observing granularity: the cache bank. The measured time is the adversary's bank access time while the victim is running AES encryption. The distributions are the measured bank access times, split by whether or not AES accesses the same bank as the adversary in the last round. Memory Coalescing Unit Attack (CU): This attack is on a GPU, utilizing the on-chip memory coalescing unit that consolidates concurrent memory requests before sending them to the cache hierarchy. The attack can observe information leakage if there is a linear relationship between the number of coalesced cache lines and the total execution time. Because the KS null-test only takes two distributions, we run it on any two points of the linear relationship (corresponding to two numbers of coalesced cache lines), or on the two segments divided by the median number. Shared Memory Attack (SM): This attack is also on a GPU, utilizing the banked shared memory unit.


Similar to CU, the foundation of this attack is exploiting a linear relationship between the number of bank conflicts and the total execution time; we compute the KS p-value similarly by choosing any two distributions. We show the KS null-test p-values for these attacks on AES in Table 5.3. The p-values for the non-secure implementations are all close to zero (below the significance level), while those for the secure implementations are all above the significance level. The result demonstrates that our countermeasure MemPoline successfully obfuscates memory accesses, leaving no observable information leakage.

Attack   Implementation   p-value      Attack   Implementation   p-value
F+R      Secure           0.27         CB       Secure           0.53
         Non-Secure       0                     Non-Secure       0
E+T      Secure           0.96         CU       Secure           0.76
         Non-Secure       0                     Non-Secure       0
CC       Secure           0.48         SM       Secure           0.95
         Non-Secure       0.01                  Non-Secure       0

Table 5.3: Kolmogorov-Smirnov null-test p-values

5.5.2.2 Empirical Attacks

We perform attacks to recover the key. Given the leakage quantification results in Section 5.5.2.1, we expect that we cannot recover the key from the secure implementations, while the original implementations should be vulnerable. For all attacks on the secure implementations, we cannot recover the key even with 2^32 samples (about 256GB of timing and ciphertext data). Attack failure with this many samples demonstrates that the implementations with the countermeasure on are secure. For the F+R attack on the original non-secure implementation, we can reliably recover the key using fewer than 10K samples, as shown in Figure 5.4(a), which uses the appearance frequency of the correct key value as the distinguisher. For comparison across attack trials that use different numbers of samples, we normalize the appearance frequency of each key value by its mean value. Figure 5.4(b) shows that the attack does not work on the secure implementation even with 4 billion samples. We summarize the key recovery results for all attacks on the original implementations in Table 5.4, where the different effectiveness of the attacks reflects their observation granularity and attack resolution.


For example, F+R is more effective than E+T, CC, and CB on CPUs, while the GPU attacks (CU and SM) are much less effective, requiring millions of samples.

Attack   # Samples used for key recovery   Observing granularity
F+R      10K                               Cache Line (L1, L3)
E+T      130K                              Cache Line (L1, L3)
CC       1.5M                              Cache Line (L1)
CB       2M                                Cache Bank
CU       1M                                Cache Line (GPU)
SM       2M                                Cache Bank (GPU)

Table 5.4: AES key recovery on original implementations

Figure 5.4: Flush+Reload attack result - normalized appearance frequency of key guesses versus the number of samples: (a) non-secure; (b) secure.

Security When the r1 Parameter is Fixed. Our countermeasure features epochs of permuting, which involve random number regeneration and swaps, incurring overhead. If SMem were shuffled only once, i.e., with a single unknown r1, the performance would be much better; however, such a design leaves the implementation vulnerable to memory-based side-channel attacks.

In this section, we analyze the situation with only one r1. In this case, all data units stay in the r1 permuted state, and each data unit is tied to a unique combination of its original index and the r1 value. With the r1 value fixed, this unique combination eventually leaks through the side channel.


                            Algorithmic Mem Accesses   Permuting   Generating Random Value
RSA (1 Decryption)          6048                       8754        265
AES (100 Encryptions,       4000                       4000        456
per T-table)

Table 5.5: Summary of operation overhead for AES and RSA

We run the Flush+Reload attack to monitor one cache line in SMem with the r1 value fixed.

The attack result is shown in Figure 5.5. The adversary has to guess r1 in addition to the key value. Since the r1 value ranges from 0 to 255, the adversary can try all r1 values and recover the key, as shown in Figure 5.5(b); the attack does not succeed when the r1 guess is wrong, as shown in Figure 5.5(a). The attack complexity increases, but the effectiveness (the number of samples needed to recover the key) is similar to that in Figure 5.4(a).

Figure 5.5: Flush+Reload attack result when the r1 value is fixed - normalized appearance frequency of key guesses versus the number of samples: (a) r1 = 0; (b) r1 = 18.

5.5.2.3 Application to Other Algorithms

We also evaluate a patched sliding-window implementation of the RSA algorithm against the F+R attack. For the purpose of security evaluation (rather than attack), we share the dynamically allocated memory used by the multipliers with the adversary.


With this shared memory, we can use the F+R technique to monitor the usage of one multiplier (Multiplier 1). Otherwise, the attacker would need the Prime+Probe technique to monitor the multiplier's usage, which would still work but contains more noise than F+R [33]. We follow a victim model similar to prior work [33, 34]. Specifically, we repeatedly run the RSA decryption of a small message encrypted with a 3,072-bit ElGamal public key. The attack records the reload time of the monitored multiplier and the actual multiplier (calculated from the algorithm) accessed after every multiplication operation. If the attack can observe any leakage, it should be able to differentiate samples that access the monitored multiplier (one distribution) from those that do not (the other distribution) based on the observed reload time. We use the KS null-test [78] to verify the leakage. The p-values for the original implementation and the secure implementation are 0 and 0.77, respectively: once the countermeasure is applied, the two timing distributions are indistinguishable.

5.5.3 Performance Evaluation

Our countermeasure is at the software level and involves an initialization and run-time shuffling, incurring performance degradation. However, unlike other software-based countermeasures [23, 24, 25], which affect performance system-wide, the impact of our approach is limited to the patched application. The computation overhead strongly depends on the memory access pattern of the program. Besides the initialization step, the major source of runtime overhead is the mp_locate_and_swap function call, which performs two major actions: permuting limbs and generating a new random value. Table 5.5 summarizes how frequently these two actions are performed in AES and RSA. The calling frequency is determined by the number of algorithmic access requests to the sensitive data (T-tables for AES and multipliers for RSA), which translates into additional execution time. Function Runtime. We repeatedly run the mp_locate_and_swap function with random inputs; the function takes 669 CPU cycles on average. The limb-locating action takes 22 CPU cycles, and generating a new random value takes 78 CPU cycles. The permuting action consists of two operations, swap and random permute: the swap operation takes 22 cycles, and the random permute operation takes 567 cycles. Each memory access thus results in an overhead of 669 CPU cycles. Considering Amdahl's law, with other computation (without data accesses) and cache hits, the overall slowdown of the program can be much less significant.
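As a back-of-envelope illustration (ours, not a measurement from this section): let T be the baseline runtime in cycles and A the number of protected-table accesses, each paying roughly C = 669 extra cycles; then

\[ \text{slowdown} \;\approx\; \frac{T + A \cdot C}{T} \;=\; 1 + \frac{A \cdot C}{T}, \]

so access-light code such as the RSA routine (A small relative to T) stays near 1x, while access-heavy code such as T-table AES pays an order-of-magnitude penalty.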


RSA Runtime. We measure the runtime performance impact for the RSA algorithm, which has fewer memory accesses but heavy logical computation. We run the RSA decryption of a 1KB file 10,000 times. The mean execution time is 0.0190 seconds for the original code and 0.0197 seconds for the patched code - only a 4% performance degradation. The sliding-window implementation of RSA makes an insignificant number of accesses to the protected memory in comparison to its other computations. AES Runtime. We also measure the runtime overhead for AES by encrypting a 16MB file 10,000 times; we use a larger file because AES encryption is much faster than RSA. The mean execution time is 0.132 seconds for the original code and 1.584 seconds for the patched code, a 12x performance slowdown. Unlike RSA, memory accesses to the sensitive data constitute a major portion of the AES code; any additional operation on such inherent memory accesses introduces a significant penalty, especially since the T-table implementation of AES is highly efficient. Comparison to other work. Our performance is significantly better than that of other ORAM-based countermeasures. The countermeasure proposed by Maas et al. [79], which uses a hardware implementation of ORAM, imposes a 14.7x performance overhead. Raccoon [37] is a software-level countermeasure that adopts a software implementation of ORAM for storing data; in some of its benchmarks, it experiences more than 100x overhead due to the ORAM operations alone. For example, the Histogram program shows a 144x slowdown on 1K input data elements. Applying our countermeasure to the Histogram program, we observe a 1.4% slowdown with 1K input data elements.

5.6 Summary

Any application with secret-dependent memory accesses can be vulnerable to memory-based side-channel attacks. An ORAM scheme can completely hide the memory access footprint, as shown by the software ORAM-based countermeasure [37]; however, the ORAM-related operations can impose more than 100x performance overhead. Our countermeasure pursues just-in-need security against memory-based side-channel attacks with significantly better performance than other ORAM-based countermeasures. Our software-based countermeasure progressively shuffles data within a memory region and randomizes the secret-dependent data memory access footprint. We apply the countermeasure to the AES and RSA algorithms on both CPUs and GPUs. Both empirical and theoretical results show no information leakage when the countermeasure is enabled, under all known memory-based side-channel attacks.

We see a 12x performance slowdown for AES and a 4% performance slowdown for RSA.

Chapter 6

Conclusion

In Chapter 2, for the first time in the literature, we demonstrate how GPUs are also vulnerable to memory-based side-channel attacks. We analyze the memory coalescing unit in a GPU and identify a linear relationship between the number of unique cache lines accessed and the execution time of the global memory load instruction, which constitutes the timing leakage. We show that not only the cache structure can leak the memory access pattern, but also other memory resources, even with much subtler timing leakage. Ultimately, we demonstrate a full AES key recovery through the vulnerable memory coalescing unit. In Chapter 3, we identify another vulnerable memory resource on GPUs, the shared memory unit. Similarly, we showcase a full AES key recovery by exploiting the timing leakage, namely the linear relationship between the number of shared memory bank conflicts and the execution time of a shared memory load instruction. In Chapter 4, we revisit the cache structure. Instead of exploiting the timing difference between a cache miss and a hit, we demonstrate that the subtle timing leakage due to bank conflicts can also be exploited to leak applications' memory access patterns, and we derive several attack methodologies that utilize this timing leakage to break AES encryption. In Chapter 5, we propose a software-level countermeasure, MemPoline, to protect table-based cryptographic implementations against memory-based side-channel attacks on different platforms. Specifically, we adopt an efficient and effective parameter-based permutation function to shuffle the memory space distributively, and hence obfuscate the memory accesses of an application. We evaluate our countermeasure by applying it to both the T-table implementation of AES and the sliding-window implementation of RSA, and validate its resilience against known memory-based side-channel attacks on both CPU and GPU platforms. The performance impact is highly correlated with the frequency with which the patched application accesses the shuffled memory space.

Overall, we see a 12x performance slowdown for AES and a 4% performance slowdown for RSA. Memory-based side-channel attacks are increasingly powerful and are considered a serious cyber threat. They have been demonstrated to break the security guarantees of cryptographic systems and to violate confidentiality and users' privacy. In this dissertation, we demonstrate that not only the cache structure can be exploited, but also other memory resources on different platforms, with much subtler timing leakage, can be utilized to leak an application's confidential memory access pattern. We recognize the need for countermeasures that work across different platforms and are independent of the underlying architecture. We propose a software-level countermeasure that mitigates memory-based side-channel attacks by efficiently obfuscating secret-dependent memory accesses.

Bibliography

[1] P. Kocher, J. Jaffe, and B. Jun, “Differential power analysis,” in Annual Int. Cryptology Conference. Springer, 1999, pp. 388–397.

[2] S. B. Örs, E. Oswald, and B. Preneel, “Power-analysis attacks on an fpga – first experimental results,” in International Workshop on Cryptographic Hardware and Embedded Systems. Springer, 2003, pp. 35–50.

[3] S. B. Ors, F. Gurkaynak, E. Oswald, and B. Preneel, “Power-analysis attack on an asic aes implementation,” in International Conference on Information Technology: Coding and Computing, 2004. Proceedings. ITCC 2004., vol. 2. IEEE, 2004, pp. 546–552.

[4] C. Luo, Y. Fei, P. Luo, S. Mukherjee, and D. Kaeli, “Side-channel power analysis of a gpu aes implementation,” in 2015 33rd IEEE International Conference on Computer Design (ICCD), Oct 2015, pp. 281–288.

[5] D. A. Osvik, A. Shamir, and E. Tromer, “Cache attacks and countermeasures: the case of aes,” in The RSA Conference. Springer, 2006, pp. 1–20.

[6] Y. Yarom and K. Falkner, “Flush+reload: a high resolution, low noise, l3 cache side-channel attack,” in USENIX Security Symp., 2014, pp. 719–732.

[7] C. Percival, “Cache missing for fun and profit,” 2005.

[8] M. Lipp, D. Gruss, R. Spreitzer, C. Maurice, and S. Mangard, “Armageddon: Cache attacks on mobile devices,” in USENIX Security Symp., 2016.

[9] D. Gruss, C. Maurice, K. Wagner, and S. Mangard, “Flush+flush: a fast and stealthy cache attack,” in Int. Conf. on Detection of Intrusions and Malware, and Vulnerability Assessment. Springer, 2016, pp. 279–299.


[10] D. Gruss, R. Spreitzer, and S. Mangard, “Cache template attacks: Automating attacks on inclusive last-level caches.” in USENIX Security Symposium, 2015, pp. 897–912.

[11] Nvidia, “Nvidia cuda toolkit v7.0 documentation,” 2015. [Online]. Available: http://docs.nvidia.com/cuda/index.html

[12] B. Gaster, L. Howes, D. R. Kaeli, P. Mistry, and D. Schaa, Heterogeneous Computing with OpenCL: Revised OpenCL 1. Newnes, 2012.

[13] K. Iwai, T. Kurokawa, and N. Nisikawa, “Aes encryption implementation on cuda gpu and its analysis,” in Int. Con. on Networking & Computing, 2010, pp. 209–214.

[14] S. Manavski et al., “Cuda compatible gpu as an efficient hardware accelerator for aes cryptography,” in IEEE Int. Conf. on Signal Processing & Communications, 2007, pp. 65–68.

[15] A. E. Cohen and K. K. Parhi, “Gpu accelerated elliptic curve cryptography in gf(2^m),” in IEEE Int. Midwest Symp. on Circuits & Systems, 2010, pp. 57–60.

[16] R. Szerwinski and T. Güneysu, “Exploiting the power of gpus for asymmetric cryptography,” in Cryptographic Hardware & Embedded Systems, 2008, pp. 79–99.

[17] D. Le, J. Chang, X. Gou, A. Zhang, and C. Lu, “Parallel aes algorithm for fast data encryption on gpu,” in Int. Conf. on Computer Engineering & Technology, vol. 6, 2010, pp. V6–1.

[18] A. di Biagio, A. Barenghi, G. Agosta, and G. Pelosi, “Design of a parallel aes for graphics hardware using the CUDA framework,” in IEEE Int. Symp. on Parallel Distributed Processing, May 2009, pp. 1–8.

[19] Z. Wang and R. B. Lee, “New cache designs for thwarting software cache-based side channel attacks,” ACM SIGARCH Computer Architecture News, vol. 35, no. 2, pp. 494–505, 2007.

[20] F. Liu and R. B. Lee, “Random fill cache architecture,” in IEEE/ACM Int. Symp. on Microarchitecture, 2014, pp. 203–215.

[21] F. Liu, H. Wu, K. Mai, and R. B. Lee, “Newcache: Secure cache architecture thwarting cache side-channel attacks,” IEEE Micro, vol. 36, no. 5, pp. 8–16, 2016.

[22] F. Liu, Q. Ge, Y. Yarom, F. Mckeen, C. Rozas, G. Heiser, and R. B. Lee, "CATalyst: Defeating last-level cache side channel attacks in cloud computing," in IEEE Int. Symp. on High Performance Computer Architecture. IEEE, 2016, pp. 406–418.

[23] H. Raj, R. Nathuji, A. Singh, and P. England, “Resource management for isolation enhanced cloud services,” in Proceedings of the 2009 ACM workshop on Cloud computing security. ACM, 2009, pp. 77–84.

[24] T. Kim, M. Peinado, and G. Mainar-Ruiz, "StealthMem: System-level protection against cache-based side channel attacks in the cloud," in USENIX Security Symposium, 2012, pp. 189–204.

[25] Z. Zhou, M. K. Reiter, and Y. Zhang, “A software approach to defeating side channels in last-level caches,” in Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security. ACM, 2016, pp. 871–882.

[26] D. Gruss, J. Lettner, F. Schuster, O. Ohrimenko, I. Haller, and M. Costa, “Strong and efficient cache side-channel protection using hardware transactional memory,” in USENIX Security Symposium, 2017.

[27] S. Chen, F. Liu, Z. Mi, Y. Zhang, R. B. Lee, H. Chen, and X. Wang, “Leveraging hardware transactional memory for cache side-channel defenses,” in Proceedings of the 2018 on Asia Conference on Computer and Communications Security. ACM, 2018, pp. 601–608.

[28] G. Kadam, D. Zhang, and A. Jog, "RCoal: mitigating GPU timing attack via subwarp-based randomized coalescing techniques," in 2018 IEEE International Symposium on High Performance Computer Architecture (HPCA). IEEE, 2018, pp. 156–167.

[29] L. Domnitser, A. Jaleel, J. Loew, N. Abu-Ghazaleh, and D. Ponomarev, “Non-monopolizable caches: Low-complexity mitigation of cache side channel attacks,” ACM Transactions on Architecture and Code Optimization (TACO), vol. 8, no. 4, p. 35, 2012.

[30] V. Kiriansky, I. Lebedev, S. Amarasinghe, S. Devadas, and J. Emer, "DAWG: A defense against cache timing attacks in speculative execution processors," in IEEE/ACM Int. Symp. on Microarchitecture (MICRO), 2018.

[31] D. J. Bernstein, “Cache-timing attacks on AES,” University of Illinois at Chicago, Tech. Rep., 2005.

[32] E. Tromer, D. A. Osvik, and A. Shamir, "Efficient cache attacks on AES, and countermeasures," Journal of Cryptology, vol. 23, no. 1, pp. 37–71, 2010.

[33] F. Liu, Y. Yarom, Q. Ge, G. Heiser, and R. B. Lee, “Last-level cache side-channel attacks are practical,” in IEEE Symp. on Security & Privacy, 2015.

[34] Y. Yarom, D. Genkin, and N. Heninger, “Cachebleed: A timing attack on OpenSSL constant time RSA,” in Cryptographic Hardware & Embedded Systems, Aug. 2016.

[35] J. Bonneau and I. Mironov, “Cache-collision timing attacks against AES,” in Cryptographic Hardware and Embedded Systems, 2006, pp. 201–215.

[36] E. Biham, "A fast new DES implementation in software," in International Workshop on Fast Software Encryption. Springer, 1997, pp. 260–272.

[37] A. Rane, C. Lin, and M. Tiwari, "Raccoon: Closing digital side-channels through obfuscated execution," in 24th USENIX Security Symposium (USENIX Security 15), 2015, pp. 431–446.

[38] O. Goldreich and R. Ostrovsky, “Software protection and simulation on oblivious rams,” J. ACM, vol. 43, no. 3, pp. 431–473, May 1996. [Online]. Available: http://doi.acm.org/10.1145/233551.233553

[39] E. Stefanov, M. Van Dijk, E. Shi, C. Fletcher, L. Ren, X. Yu, and S. Devadas, "Path ORAM: an extremely simple oblivious RAM protocol," in Proceedings of the 2013 ACM SIGSAC Conference on Computer & Communications Security. ACM, 2013, pp. 299–310.

[40] R. Di Pietro, F. Lombardi, and A. Villani, "CUDA leaks: information leakage in GPU architectures," arXiv preprint arXiv:1305.7383, 2013.

[41] M. J. Patterson, "Vulnerability analysis of GPU computing," Ph.D. dissertation, Iowa State University, 2013.

[42] J. Danisevskis, M. Piekarska, and J.-P. Seifert, "Dark side of the shader: Mobile GPU-aided malware delivery," in Information Security and Cryptology, 2014, pp. 483–495.

[43] Nvidia, "Whitepaper: NVIDIA's next generation CUDA compute architecture: Kepler GK110," 2015. [Online]. Available: http://www.nvidia.com/content/PDF/kepler/NVIDIA-Kepler-GK110-Architecture-Whitepaper.pdf

[44] A. Lashgar, E. Salehi, and A. Baniasadi, "Understanding outstanding memory request handling resources in GPGPUs," in Proceedings of the Sixth International Symposium on Highly Efficient Accelerators and Reconfigurable Technologies (HEART 2015), 2015.

[45] K. Pearson, “Notes on regression and inheritance in the case of two parents,” in Proc. the Royal Society of London, vol. 58, June 1895, pp. 240–242.

[46] D. Brumley and D. Boneh, “Remote timing attacks are practical,” in Proc. Int. USENIX Security Symp., 2003, pp. 1–1.

[47] Y. Fei, A. A. Ding, J. Lao, and L. Zhang, “A statistics-based fundamental model for side-channel attack analysis.” IACR Cryptology ePrint Archive, vol. 2014, p. 152, 2014.

[48] S. A. Crosby, D. S. Wallach, and R. H. Riedi, “Opportunities and limits of remote timing attacks,” ACM Transactions on Information & System Security, vol. 12, no. 3, p. 17, 2009.

[49] D. Page, “Partitioned cache architecture as a side-channel defense mechanism.” IACR Cryptol- ogy ePrint Archive, vol. 2005, p. 280, 2005.

[50] Z. Wang and R. B. Lee, “A novel cache architecture with enhanced performance and security,” in IEEE/ACM Int. Symp. on Microarchitecture, 2008, pp. 83–93.

[51] J. Kong, O. Acıiçmez, J.-P. Seifert, and H. Zhou, "Hardware-software integrated approaches to defend against software cache-based side channel attacks," in IEEE Int. Symp. on High Performance Computer Architecture, 2009, pp. 393–404.

[52] Y. Wang, A. Ferraiuolo, and G. E. Suh, "Timing channel protection for a shared memory controller," in IEEE Int. Symp. on High Performance Computer Architecture, 2014, pp. 225–236.

[53] Y. Wang and G. E. Suh, “Efficient timing channel protection for on-chip networks,” in IEEE/ACM Int. Symp. on Networks on Chip, 2012, pp. 142–151.

[54] Z. H. Jiang and Y. Fei, “A novel cache bank timing attack,” in 2017 IEEE/ACM Int. Conf. on Computer-Aided Design (ICCAD), Nov 2017, pp. 139–146.

[55] B. Schneier, “Description of a new variable-length key, 64-bit block cipher (blowfish),” in International Workshop on Fast Software Encryption. Springer, 1993, pp. 191–204.

[56] N. Nishikawa, K. Iwai, and T. Kurokawa, "High-performance symmetric block ciphers on multicore CPU and GPUs," International Journal of Networking and Computing, vol. 2, no. 2, pp. 251–268, 2012.

[57] A. A. Abdelrahman, M. M. Fouad, H. Dahshan, and A. M. Mousa, "High performance CUDA AES implementation: A quantitative performance analysis approach," in 2017 Computing Conference. IEEE, 2017, pp. 1077–1085.

[58] E. Karimi, Z. H. Jiang, Y. Fei, and D. Kaeli, "A timing side-channel attack on a mobile GPU," in 2018 IEEE Int. Conf. on Computer Design (ICCD). IEEE, 2018, pp. 67–74.

[59] C. Mei, H. Jiang, and J. Jenness, "CUDA-based AES parallelization with fine-tuned GPU memory utilization," in 2010 IEEE International Symposium on Parallel & Distributed Processing, Workshops and PhD Forum (IPDPSW), April 2010, pp. 1–7.

[60] D. A. Osvik, J. W. Bos, D. Stefan, and D. Canright, "Fast software AES encryption," in International Workshop on Fast Software Encryption. Springer, 2010, pp. 75–93.

[61] Q. Li, C. Zhong, K. Zhao, X. Mei, and X. Chu, "Implementation and analysis of AES encryption on GPU," in 2012 IEEE 14th International Conference on High Performance Computing and Communication & 2012 IEEE 9th International Conference on Embedded Software and Systems. IEEE, 2012, pp. 843–848.

[62] N. Nishikawa, K. Iwai, H. Tanaka, and T. Kurokawa, "Throughput and power efficiency evaluation of block ciphers on Kepler and GCN GPUs using micro-benchmark analysis," IEICE Transactions on Information and Systems, vol. 97, no. 6, pp. 1506–1515, 2014.

[63] J. Gilger, J. Barnickel, and U. Meyer, "GPU-acceleration of block ciphers in the OpenSSL cryptographic library," in International Conference on Information Security. Springer, 2012, pp. 338–353.

[64] H. Naghibijouybari, A. Neupane, Z. Qian, and N. Abu-Ghazaleh, "Rendered insecure: GPU side channel attacks are practical," in Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security. ACM, 2018, pp. 2139–2153.

[65] H. Eldib, C. Wang, M. Taha, and P. Schaumont, "QMS: Evaluating the side-channel resistance of masked software from source code," in Proceedings of the 51st Annual Design Automation Conference, 2014, pp. 209:1–209:6.

[66] B. Gierlichs, L. Batina, P. Tuyls, and B. Preneel, "Mutual information analysis," in Cryptographic Hardware and Embedded Systems, 2008, pp. 426–442.

[67] M. Rivain, “On the exact success rate of side channel analysis in the gaussian model,” in Selected Areas in Cryptography, 2009, pp. 165–183.

[68] B. Gülmezoğlu, M. S. Inci, G. Irazoqui, T. Eisenbarth, and B. Sunar, "A faster and more realistic Flush+Reload attack on AES," in International Workshop on Constructive Side-Channel Analysis and Secure Design. Springer, 2015, pp. 111–126.

[69] Z. Lin, U. Mathur, and H. Zhou, "Scatter-and-gather revisited: High-performance side-channel-resistant AES on GPUs," in Proceedings of the 12th Workshop on General Purpose Processing Using GPUs. ACM, 2019, pp. 2–11.

[70] G. Irazoqui, T. Eisenbarth, and B. Sunar, "S$A: A shared cache attack that works across cores and defies VM sandboxing – and its application to AES," in 2015 IEEE Symposium on Security and Privacy (SP). IEEE, 2015, pp. 591–604.

[71] Z. Zhou, M. K. Reiter, and Y. Zhang, "A software approach to defeating side channels in last-level caches," in Proc. of the 2016 ACM SIGSAC Conf. on Computer and Communications Security. ACM, 2016, pp. 871–882.

[72] Intel, "Intel 64 and IA-32 architectures optimization reference manual," 2016. [Online]. Available: http://www.intel.com/content/www/us/en/architecture-and-technology/64-ia-32-architectures-optimization-manual.html

[73] ARM, "Cortex-A15 MPCore technical reference manual," 2016. [Online]. Available: http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0438h/BABEFEFH.html

[74] Z. H. Jiang and Y. Fei, “A novel cache bank timing attack,” in Proceedings of the 36th International Conference on Computer-Aided Design. IEEE Press, 2017, pp. 139–146.

[75] Z. H. Jiang, Y. Fei, and D. Kaeli, “A complete key recovery timing attack on a gpu,” in IEEE Int. Symp. on High Performance Computer Architecture (HPCA), March 2016.

[76] Z. H. Jiang, Y. Fei, and D. R. Kaeli, "A novel side-channel timing attack on GPUs," in ACM Great Lakes Symp. on VLSI. IEEE Press, 2017, pp. 167–172.

[77] E. Karimi, Z. H. Jiang, Y. Fei, and D. Kaeli, “A timing side-channel attack on a mobile gpu,” in 2018 IEEE 36th International Conference on Computer Design (ICCD), Oct 2018, pp. 67–74.

[78] A. Kolmogorov, "Sulla determinazione empirica di una legge di distribuzione," Inst. Ital. Attuari, Giorn., vol. 4, pp. 83–91, 1933.

[79] M. Maas, E. Love, E. Stefanov, M. Tiwari, E. Shi, K. Asanovic, J. Kubiatowicz, and D. Song, "PHANTOM: Practical oblivious computation in a secure processor," in Proceedings of the 2013 ACM SIGSAC Conference on Computer & Communications Security. ACM, 2013, pp. 311–324.
