Dolma: Securing Speculation with the Principle of Transient

DOLMA: Securing Speculation with the Principle of Transient Non-Observability Kevin Loughlin, Ian Neal, Jiacheng Ma, Elisa Tsai, Ofir Weisse, Satish Narayanasamy, and Baris Kasikci, University of Michigan https://www.usenix.org/conference/usenixsecurity21/presentation/loughlin This paper is included in the Proceedings of the 30th USENIX Security Symposium. August 11–13, 2021 978-1-939133-24-3 Open access to the Proceedings of the 30th USENIX Security Symposium is sponsored by USENIX. DOLMA: Securing Speculation with the Principle of Transient Non-Observability Kevin Loughlin, Ian Neal, Jiacheng Ma, Elisa Tsai, Ofir Weisse, Satish Narayanasamy, Baris Kasikci University of Michigan Abstract can be transmitted through covert timing channels. Thankfully, all known Meltdown-type attacks can be thwarted by handling Modern processors allow attackers to leak data during tran- potential exceptions earlier in the pipeline, such that transient sient (i.e., mis-speculated) execution through microarchitec- reads do not propagate data to dependent micro-ops [38, 76]. tural covert timing channels. While initial defenses were channel-specific, recent solutions employ speculative informa- The second class of attacks do not rely on delayed tion flow control in an attempt to automatically mitigate attacks exception handling, and instead solely exploit hardware via any channel. However, we demonstrate that the current mispredictions to leak data (e.g., Spectre [32] and similar state-of-the-art defense fails to mitigate attacks using specula- attacks [5, 13, 31, 32, 35, 40–42, 47, 60]). For instance, Spectre tive stores, still allowing arbitrary data leakage during transient v1 [32] shows that an attacker in one security domain can mis- execution. Furthermore, we show that the state of the art does train the branch predictor to transiently bypass a bounds check not scale to protect data in registers, incurring 30:8–63:4% in a victim domain, thereby allowing micro-ops following a overhead on SPEC 2017, depending on the threat model. branch to leak victim data. Contrary to Meltdown-type attacks, there is no known comprehensive solution for Spectre-type We then present DOLMA, the first defense to automatically attacks, apart from disabling speculation. provide comprehensive protection against all known transient execution attacks. DOLMA combines a lightweight specu- Because the majority of transient execution attacks use the D- lative information flow control scheme with a set of secure cache as the covert channel [10,13,31,32,35,38,40,41,47,48, performance optimizations. By enforcing a novel principle of 50,57,59,61,64,66,68,69,71 –73,77,81], initial defenses such transient non-observability, DOLMA ensures that a time slice as InvisiSpec [83] and others [1,29,30,36,54 –56] have focused on a core provides a unit of isolation in the context of existing on protecting the D-cache. However, these solutions do not attacks. Accordingly, DOLMA can allow speculative TLB/L1 prevent numerous other covert channels [5,9,42,53,60,70,82] cache accesses and variable-time arithmetic without loss of from leaking data during transient execution. security. On SPEC 2017, DOLMA achieves comprehensive Recent solutions [3,17, 34,58,76, 86,88] acknowledge the protection of data in memory at 10:2–29:7% overhead, adding shortcomings of cache-centric mitigations, and instead employ protection for data in registers at 22:6–42:2% overhead (8:2– speculative information flow control to prevent secrets from 21:2% less than the state of the art, with greater security). entering any covert timing channel until speculation resolves. Unfortunately, current defenses are not comprehensive. For 1 Introduction example, manual defenses [17, 58, 86] require error-prone Speculative execution is a crucial performance optimization annotations of secrets to limit performance overhead. for modern processors. Unfortunately, the ongoing deluge of On the other hand, existing automatic defenses [3, 76, 88] transient execution attacks [5, 10,13, 31,32, 35,38, 40 –42, 47, suffer from high overhead. As such, they focus on the pro- 48,50,53,57,59 –61,64,66,68,69,71 –73,77,81] demonstrates tection of speculatively-accessed data (e.g., data in memory that the implementation of speculative execution in commodity at the beginning of the speculation window) and fail to processors allows attackers to leak data during transient (i.e., comprehensively protect non-speculatively-accessed data (to mis-speculated or wrong-path) execution. Specifically, attack- a first approximation, data in registers at the beginning of the ers exploit transient micro-ops whose operands are leaked via speculation window). For example, NDA [76] conservatively covert timing channels—e.g., hardware structures like the data prohibits speculative micro-ops from propagating their results cache (D-cache), which exhibit operand-dependent timing. to any of their dependent micro-ops until speculation resolves. Transient execution attacks can be classified into two pri- Thus, NDA eschews knowledge of the microarchitecture to mary categories [9]. The first class of attacks rely on delayed achieve channel-agnostic protection, resulting in high over- handling of microarchitectural exception-like conditions— heads. NDA incurs 22:3% overhead to protect data in memory henceforth referred to as exceptions—to leak data (e.g., against Spectre-type attacks, and 100% overhead to supple- Meltdown [38] and similar attacks [10,48,50,53,59,61,64,68, mentally protect against Meltdown-type attacks on SPEC 69, 71, 72, 77]). In certain commodity processors, speculative 2017. To provide even partial protection for data in registers, reads can access data in spite of—or because of—exceptions. NDA’s performance overheads rise to 45–125%, respectively. The exception is not handled until the associated micro-op The current state-of-the-art defense, STT [86], uses reaches commit, offering attackers a window in which data speculative taint tracking to only delay dependent micro-ops USENIX Association 30th USENIX Security Symposium 1397 that affect processor backend timing (e.g., during execution) • We present a novel variant of Spectre [32] that uses or frontend timing (e.g., during fetch) as a function of their a speculative store to transmit data through the TLB, operands. Thus, STT is able to significantly improve upon demonstrating that the state-of-the-art defense (STT [88]) the overheads of channel-agnostic solutions such as NDA is still vulnerable to arbitrary data leakages. and variants of SpecShield [3]. Nonetheless, according to • We define and enforce the principle of transient non- our evaluation, the overhead of protecting data in memory observability, enabling secure speculative access to select with STT is still 8:7–44:5%, with those figures rising to core-local resources. 30:8–63:4% if one extends STT to protect data in registers. • We introduce DOLMA, the first defense to provide automatic More importantly, we demonstrate that STT still allows comprehensive protection against existing transient arbitrary data leakages during transient execution. Despite execution attacks for data in both memory and registers. documented transient execution attacks exploiting speculative • We improve both state-of-the-art security and performance, stores [10, 49, 66, 73], STT assumes stores in isolation are mitigating all existing transient execution attacks on data in safe unless the processor permits speculative cache line memory at 10:2–29:7% overhead, as well as those on data invalidations [66, 88]. However, even without speculative in registers at 22:6–42:2% on SPEC 2017 [63]. invalidations, stores can still leak information. In §3.3, Our implementation and evaluation infrastructure is open- we demonstrate a novel variant of Spectre [32] that uses a source [39], including our gem5-compatible transient speculative store to transmit data through the TLB, despite execution attack suite used for penetration testing. STT’s protections being enabled. Thus, STT does not yield 2 Background the comprehensive protection it claims to offer; an attacker can still leak arbitrary data under both its Spectre-type threat We first give background on speculative execution in modern model and Meltdown-type threat model. out-of-order processors. We then describe how transient (i.e., In this paper, we present DOLMA, the first defense to mis-speculated) execution can be exploited to leak secrets. automatically provide comprehensive protection against all 2.1 Speculative, Out-of-Order Processors existing transient execution attacks. DOLMA combines a speculative information flow control scheme with a set of A modern out-of-order (OoO) processor fetches instructions secure performance optimizations, allowing it to protect data in in program order and decodes them into micro-ops. OoO both memory and—optionally—registers at tenable overhead. processors keep track of program order via a circular queue At a high level, DOLMA extends the microarchitecture to called the re-order buffer (ROB). Micro-ops enter at the tail of track speculative control and data dependencies, restricting the ROB in-order upon dispatch, and exit from the head of the execution as needed to prevent transient operand values from ROB in-order upon commit. However, rather than waiting for affecting processor timing. all elder micro-ops to retire, micro-ops in the ROB issue (i.e., DOLMA’s key innovation is ensuring that a time slice on a begin executing) as soon as their operands become ready— core provides a unit of isolation in the context of known tran- potentially out of program order. Thus, OoO processors avoid sient execution attacks. By enforcing a novel principle of tran- idle execution units, exploiting instruction-level parallelism sient non-observability, DOLMA can allow secure speculative to improve efficiency over in-order processors. access to select core-local resources (e.g., the TLB, L1 cache, To further improve efficiency, processors implement and variable-time functional units) without loss of security. control-flow and data-flow speculation to avoid pipeline stalls.

Load more