Are We Witnessing the Spectre of an HPC Meltdown?
Verónica G. Vergara Larrea, Michael J. Brim, Wayne Joubert, Swen Boehm, Matthew Baker, Oscar Hernandez, Sarp Oral, James Simmons, and Don Maxwell
Oak Ridge Leadership Computing Facility
Oak Ridge National Laboratory, Oak Ridge, TN 37830
Email: vergaravg, brimmj, joubert, boehms, bakermb, oscar, oralhs, simmonsja, [email protected]

Abstract: We measure and analyze the performance observed when running applications and benchmarks before and after the Meltdown and Spectre fixes have been applied to the Cray supercomputers and supporting systems at the Oak Ridge Leadership Computing Facility (OLCF). Of particular interest is the effect of these fixes on applications selected from the OLCF portfolio when running at scale. This comprehensive study presents results from experiments run on Titan, Eos, Cumulus, and Percival supercomputers at the OLCF. The results from this study are useful for HPC users running on Cray supercomputers, and serve to better understand the impact that these two vulnerabilities have on diverse HPC workloads at scale.

Keywords: high performance computing; performance analysis; security vulnerabilities; Spectre; Meltdown

I. INTRODUCTION

In early January 2018, two major security vulnerabilities present in the majority of modern processors were announced. The Spectre [1] and Meltdown [2] vulnerabilities take advantage of hardware speculative execution to expose kernel or application data stored in memory or caches to other applications running on the same system, bypassing security privileges and permissions. Fixes have started to be released from the different vendors as microcode and firmware updates. Mitigations have also been released in the form of operating system kernel patches.

Current reports indicate that processors from Intel, ARM, IBM, Qualcomm, and AMD are impacted to various degrees. Beyond posing a serious security risk, initial reports show that available fixes for Meltdown can impact performance anywhere from just a few percent to nearly 50% depending on the application. The highest impacts are expected in applications that perform a large number of system calls (e.g., I/O operations). Furthermore, the performance impacts are expected to vary depending on the parallel file system used (e.g., GPFS vs Lustre).

In this work, performance results obtained from user facing systems, including Cray supercomputers at the Oak Ridge Leadership Computing Facility (OLCF) are presented. This study includes experiments conducted on Titan (Cray XK7), Eos (Intel Sandy Bridge Cray XC30), Cumulus (Intel Broadwell Cray XC40), and Percival (Intel Knights Landing Cray XC40) supercomputers, as well as supporting systems. The applications and benchmarks used for these experiments are selected from the acceptance test plans for each system. In order to provide a comprehensive view of the performance impact on HPC workloads, the test set was expanded to include applications representative of the current OLCF portfolio. We measure and analyze the performance observed when running applications and benchmarks before and after the Meltdown and Spectre mitigations have been applied. Of particular interest to this work is the effect of these fixes on applications running at scale.

If confirmed, a significant performance hit would have enormous impact to the high performance computing (HPC) community. Since the majority of OLCF's users are focused on scientific simulation and modeling, a performance degradation equates to reduced scientific output for a given allocation of computing time. As a result, users may not reach their planned research milestones. Furthermore, a widespread performance impact may justify reexamining community measures of performance, such as the well known Top500 list [3] which ranks supercomputers all over the world, after patches are applied.

Notice of Copyright. This manuscript has been authored by UT-Battelle, LLC under Contract No.
DE-AC05-00OR22725 with the U.S. Department of Energy. The United States Government retains and the publisher, by accepting the article for publication, acknowledges that the United States Government retains a non-exclusive, paid-up, irrevocable, world-wide license to publish or reproduce the published form of this manuscript, or allow others to do so, for United States Government purposes. The Department of Energy will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan).

CCPE Special Issue. This paper has been accepted and will be published as an article in a special issue of Concurrency and Computation Practice and Experience on the Cray User Group 2018.

II. BACKGROUND

A. Understanding Spectre and Meltdown

Processors can at times be natural procrastinators. Over the years, techniques were developed to limit the impact when the processor stalls. All processors are governed by a clock which dictates how much of an instruction can be completed. Since most instructions cannot be completed in a single clock cycle, each instruction is broken into independent stages that can each be executed in one clock cycle. This is referred to as instruction pipelining.

To maximize the processor's performance, several microinstructions are handled in parallel. The depth of the pipeline tells us how many microinstructions are handled in parallel. Typically, x86 processors can have pipelines that are 32 deep. This approach has a great performance benefit for the processor as long as each instruction is completed before the next one needs its result. When this condition is not met, it is referred to as a hazard. Speculative execution is one technique for avoiding the stalls that such hazards cause. The exploits covered in this paper take advantage of overlooked design details of this optimization.

Spectre and Meltdown are different variants of a speculative execution vulnerability that impacts the majority of modern computer processors on the market [4]. Speculation occurs when the information needed for the next instruction is not known, but the processor makes an educated guess. For example, branch speculation occurs when the processor guesses the result of a branch condition and speculatively executes future instructions at the predicted branch target. Another form of branch speculation is when the processor guesses the target address of an indirect branch. If the target is correctly predicted, performance can be greatly improved by speculative execution. When an incorrect prediction (i.e., a mispredict) happens, the processor needs to roll back the execution state. The processor discards the speculated program state to prevent user-visible effects of the speculative execution. Vulnerabilities occur when speculated program state is not properly discarded. Exploit vectors include any processor hardware that is not fully and securely restored to its preprediction state, with typical targets being, for example, L3 caches (cache pollution) and I/O ports.

There are three documented vulnerability variants, each having been assigned a separate Common Vulnerabilities and Exposures (CVE) number. CVE-2017-5753 (V1) and CVE-2017-5715 (V2) are the two variants known as Spectre, and CVE-2017-5754 (V3) is the variant known as Meltdown. In January 2018, Kocher et al. described Spectre [1], a new type of microarchitecture attack that exploits speculative execution by forcing the processor to execute instructions outside the correct path of execution of a program. Meltdown [2] does not rely on branch prediction, but exploits some modern processors that perform multiple memory references without page table permission while executing

Meltdown and Spectre exploit the fact that L3 cache is not correctly cleaned up after misprediction. Using L3 cache as an exploit vector was popularized by Yuval Yarom and Katrina Falkner [6]. Spectre and Meltdown do not rely on misbehavior of the L3 cache. Rather, they use it as a side channel to exfiltrate data. They leverage the fact that data may be loaded into L3 cache based on bad predictions and not cleared from the cache upon misprediction. Thus, a separate process may use processor-specific instructions to access the cache contents directly to find which memory values are present and in turn deduce the values used in the bad speculation.

B. Spectre V1: Condition misprediction

Spectre V1 relies on mispredicting a conditional branch. The example given in the Spectre paper uses an array boundary check. In normal execution, the array boundary will not be exceeded, thus the processor will speculate that the branch doing the boundary checking will be true.

    /* In this example array_a is an array that is cache hot.
     * array_a does not contain the data targeted for leaking, but
     * index+array_a will point to the memory the attacker wishes to exfiltrate.
     * The variable index and array_b are attacker controlled;
     * array_a_size and all indexes of array_b are flushed from cache;
     */
    if (index < array_a_size) {
        array_value = array_b[array_a[index] * 256];
    }

When executing the above code with the cache in the state as described, the processor will stall, waiting for array_a_size. The processor will then speculatively fetch array_a[index] on the assumption that index < array_a_size evaluates to true. Here, index is an offset large enough to point to another array, one with secret data (e.g., an encryption private key). The speculative execution engine will then load into L3 cache a value from array_b based on a calculation using values from array_a.

After speculative results are discarded, an attacker cannot directly inspect the results of the access to array_b. However, array_b had a memory load performed, making its value present in L3 cache. The attacker can then scan array_b, time the access of each index, and reconstruct the value from the out-of-bounds memory access to array_a based on whether the access to array_b was from main memory (slow) or L3 cache (fast). Example code is available in the Spectre paper [1].

C.