SPECBOX: A Label-Based Transparent Speculation Scheme Against Transient Execution Attacks

Bowen Tang1,2 , Chenggang Wu1,2, Zhe Wang1,2, Lichen Jia1,2, Pen-Chung Yew3, Yueqiang Cheng4, Yinqian Zhang5, Chenxi Wang6, Guoqing Harry Xu6 1State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, 2University of the Chinese Academy of Sciences, 3University of Minnesota at Twin Cities, 4NIO Security Research, 5Southern University of Science and Technology, 6University of California, Los Angeles 1{tangbowen, wucg, wangzhe12, jialichen}@ict.ac.cn, [email protected], [email protected], [email protected], 6{wangchenxi, harryxu}@cs.ucla.edu

Abstract—Speculative execution techniques have been a cornerstone of modern processors to improve instruction-level parallelism. However, recent studies showed that such techniques can be exploited by attackers to leak secret data via transient execution attacks, such as Spectre. Many defenses have been proposed to address this problem, but they all face various challenges: (1) tracking data flow in the instruction pipeline can comprehensively address this problem, but it may cause pipeline stalls and incur high performance overhead; (2) making the side effects of speculative execution imperceptible to attackers avoids such stalls, but it often needs additional storage components and complicated data movement operations. In this paper, we propose a label-based transparent speculation scheme called SPECBOX. It dynamically partitions the cache system to isolate speculative data and non-speculative data, which prevents transient execution from being observed by subsequent execution. Moreover, it uses thread ownership semaphores to prevent speculative data from being accessed across cores. In addition, SPECBOX also hardens the auxiliary components in the cache system, such as the hardware prefetcher, against transient execution attacks. Our security analysis shows that SPECBOX is secure, and the performance evaluation shows that its overhead on the SPEC CPU 2006 and PARSEC-3.0 benchmarks is small.

arXiv:2107.08367v1 [cs.CR] 18 Jul 2021

I. INTRODUCTION

Five decades of exponential growth in processor performance has led to today's ubiquitous computation infrastructure. At the heart of such rapid growth are many optimizations employed in today's processors. Among them, using speculative execution to alleviate pipeline stalls caused by control flow transfers and memory accesses is one of the most effective optimizations in modern processors. However, recent studies have shown that this kind of technique can introduce security vulnerabilities and be exploited by attackers via transient execution attacks (TEAs) such as the well-known Spectre [28]. These vulnerabilities are widely present in billions of processors produced by mainstream manufacturers such as Intel, AMD and ARM.

Listing 1 is a proof-of-concept (PoC) code of the Spectre attack. First, the attacker steers the index value to be less than SIZE, thereby training the branch predictor to choose the fall-through branch in Line 5. After that, the attacker steers the index value to be greater than SIZE, which leads to an out-of-bound access to the secret during the speculative execution. Then, the attacker uses the secret to index the dummy array, which loads the dummy[val*64] item into the cache. Eventually, when the processor determines that the prediction in Line 5 was wrong, it will squash the instructions of Lines 6-7 and follow the correct path. In the current design, the processor will not clean up the side effects in the cache (i.e., the data brought in during the mis-speculation). The attacker can scan the dummy array in the cache and measure the access time of each array element. According to the access latency, the attacker can decide which item has been loaded into the cache, and thereby infer the secret value.

 1 #define SIZE 10
 2 uint8 array[SIZE];
 3 uint8 dummy[256*64];
 4 uint8 victim(int index) {
 5   if (index < SIZE) {
 6     uint8 val = array[index];
 7     return dummy[val*64];
 8   } else return 0;
 9 }
10 victim(8);       // Step1: train the branch
11 flush_cache();   // Step2: prepare cache layout
12 victim(100);     // Step3: access the secret
13 timing_dummy();  // Step4: measure cache layout

Listing 1: A proof-of-concept (PoC) code of the Spectre attack.

To mitigate TEAs, several approaches have been proposed in recent years. The first category of approaches is to delay the use of the data until the instructions that produce the data are no longer speculative, i.e., the data is no longer "tainted" [73] and is "safe" to use. For the example in Listing 1, the out-of-bound index value in Line 6 cannot be used to access the dummy array until the branch condition in Line 5 is resolved. The representative schemes are STT [73], SDO [72], NDA [67], ConditionalSpec [31], Delay-on-Miss [47] and SpecShield [4]. These defenses comprehensively prevent the execution of instructions that may cause information leakage. However, delaying data propagation can cause pipeline stalls, which may incur high performance overhead [4], [47], [67], [73].

The second category is to keep speculative execution non-blocking but make it "invisible" to the subsequent instructions if it fails. In Listing 1, the array elements of dummy loaded into the cache during the speculative execution will be removed and cannot be accessed in timing_dummy(). From the perspective of microarchitecture designers, this approach is preferable because of its compatibility with other optimization mechanisms that are critical to processor performance [1]. The representative works are SafeSpec [23], InvisiSpec [69], CleanupSpec [46] and MuonTrap [1].

To achieve the invisible speculative execution, InvisiSpec, SafeSpec, and MuonTrap choose to add extra storage to keep the speculatively installed data. When the speculative instructions are committed, the speculatively installed data will be re-installed into the cache hierarchy from the memory system or the newly added storage to make it visible. In contrast, CleanupSpec allows the data to be installed in the cache hierarchy during the speculative execution, and the replaced data is stored in a newly added storage. The replaced data will be re-installed into the cache hierarchy to roll back the cache state only when speculation fails. Since most speculation will succeed, CleanupSpec has a higher performance in general [46]. But the re-install operations are required no matter whether the speculation succeeds or fails. Such additional data movement on the cache hierarchy can reduce its benefit and degrade its performance.

In this paper, we re-examine the invisible speculative execution strategy along with the cache design, and propose a new transparent speculation scheme, called SPECBOX. It modifies the cache to support invisible speculative access efficiently. The speculative data and the non-speculative data are distinguished in the cache, so the extra storage and data movement are no longer needed. To achieve this, SPECBOX divides each cache set into two domains. Each cache line in the set is distinguished by a 1-bit label to indicate which domain it is in. The temporary domain contains the speculative data and the persistent domain contains the non-speculative data. When the speculation fails, the speculatively installed cache lines in the temporary domain will be invalidated. When the speculation succeeds, SPECBOX only needs to flip the bits to switch the corresponding cache lines from the temporary domain to the persistent domain. Thus, it totally avoids the data movement required in other schemes, and hence improves the performance.

In addition to isolating the speculative data within a single core, we also extend the label-based method to the multi-core environment and make the speculative data invisible across cores. Physically isolating hardware threads' speculative data from each other can also achieve this, but it limits the scalability and has low resource utilization. The core idea in SPECBOX is to dynamically mark the thread ownership of each shared cache line in the temporary domain, and emulate a behavior that is similar to accessing a thread-private cache. To achieve that, SPECBOX attaches to each cache line an N-bit label, with each bit bound to a hardware thread (HT). If the bit corresponding to a HT is set, it means this HT owns this cache line. When a HT accesses a cache line that is not owned by this HT in the temporary domain, SPECBOX will simulate a latency equivalent to a miss and then set the corresponding bit, even if the line has been speculatively installed by other threads. When a HT evicts a cache line owned by other HTs in the temporary domain, SPECBOX will just reset its corresponding bit. The set/reset actions are similar to the P/V semaphore operations in parallel programming, so we call the N-bit label the thread-owner semaphore (TOS).

To summarize, we make the following contributions:
1) We propose a new label-based transparent speculation scheme, called SPECBOX, against transient execution attacks. It leverages a cache partitioning approach on speculative data, which eliminates the need for data movement during the switch between speculative and non-speculative execution.
2) We propose a thread-ownership semaphore to logically isolate shared data among threads to prevent side channels from being formed in the shared cache.
3) The security analysis shows that the proposed label-based transparent speculation scheme is secure, and the performance evaluation shows that the overhead of SPECBOX is substantially lower than that of STT, InvisiSpec and CleanupSpec on the SPEC CPU 2006 and PARSEC-3.0 benchmarks.

II. BACKGROUND

A. Speculative Execution

Speculative execution techniques, such as branch prediction [70] and memory disambiguation [16], are commonly used on modern out-of-order processors to improve performance. To avoid pipeline stalls, the processor continues to speculatively execute instructions beyond a branch instruction along a predicted path before the branch condition is resolved. If the prediction fails, the mis-speculated instructions will be squashed. A reorder buffer (ROB) is used to maintain correctness after the out-of-order execution. When an instruction reaches the head of the ROB and has completed its execution, it updates the machine state and releases its held resources; this process is called commit. An instruction not yet committed is called an in-flight instruction.

On modern processors, the side effects caused by the mis-speculated instructions, such as the data brought into the cache during the speculative execution, are not cleaned up. They do not affect the correctness of the program execution, but they can impact the timing of subsequent instructions and can be used by attackers to create covert channels.

B. Timing-Based Covert Channel

The core logic of a transient execution attack can be divided into three steps: a) accessing the secret through speculative execution; b) encoding the secret through side effects that affect the subsequent execution timing; c) decoding the secret by executing some operations and timing their execution [9]. The first two steps are collectively referred to as the sender, and the third step is referred to as the receiver. The transmission channel between the sender and the receiver is known as a timing-based covert channel [13].

According to the length of the information carrier's life span, timing-based covert channels can be grouped into two types: the persistent and the volatile [68]. The information carrier in a persistent covert channel is usually the layout of a storage unit,

such as the cache [55], translation lookaside buffer (TLB) [57], and branch target buffer (BTB) [70], or a state change such as the on/off state of the high bits in the AVX2 vector registers [51]. In a persistent covert channel, the side effects encoded by the sender will exist for a relatively long period of time to allow the receiver to extract the secret value. In contrast, the information carrier of a volatile covert channel is usually a shared resource between threads, such as the floating-point unit (FPU) [54], an execution unit port [2] or the memory bus [10]. The side effect exploited by the sender is to delay the execution time of the receiver by competing for such resources. Because the resource competition is transient, the receiver must measure the timing while the sender is accessing the shared resources. In this case, the sender and the receiver must be synchronized, which increases the difficulty for the attacker.

C. Transient Execution Attacks

Transient execution attacks (TEAs) can be divided into two categories: the Spectre-type and the Meltdown-type attacks [9]. Attackers can directly access unauthorized memory locations or registers during the out-of-order execution in a Meltdown-type attack. Unlike the Spectre-type attacks, which primarily exploit speculative execution, the Meltdown-type vulnerabilities are mostly due to unintended hardware bugs. They can thus be fixed in the hardware. For Spectre-type attacks, the attacker will first train the prediction units, such as the pattern history table (PHT) [53], the BTB [70] and the return stack buffer (RSB) [14]. After that, the attacker will bypass the protection code to execute a wrong speculative path, such as a bounds check [27], a data cleanup [48] or a stack pointer switch [36], and then access the secret. Finally, the attacker will transmit the secret through a timing-based covert channel.

D. Existing Defenses Against TEAs

For TEAs, a common defense is to prevent sensitive data from being transmitted to covert channels. SpecShield [4] checks whether there is an unresolved branch or an instruction that triggers an exception before a load instruction, and then determines whether the loaded data can be passed to subsequent instructions. ConditionalSpec [31] puts forward the concept of "safe dependence" and a method to detect it, and proposes an S-pattern filtering strategy to improve the detection efficiency according to the characteristics of TEAs. NDA [67] and STT [73] use dataflow tracking similar to taint propagation, and track those instructions that may cause information leakage. Those instructions are forced to delay until their dependent instructions become safe. To reduce the overhead caused by the delay and the tracking analysis, SDO [72] adds a safe-value prediction mechanism based on STT. But the performance overhead is still high due to the pipeline stalls that cannot be avoided.

Making speculative execution "invisible" by cleaning up all side effects after speculative execution can avoid the pipeline stalls and prevent the side effects from being used by attackers to transmit the secret data. InvisiSpec [69] chooses to add a speculative buffer to keep the speculatively installed data. Similarly, SafeSpec [23] and MuonTrap [1] use a shadow cache and a non-inclusive L0 cache, respectively. If speculation succeeds, the speculatively installed data will be re-installed into the cache hierarchy from the newly introduced storage; if speculation fails, the speculatively installed data in these storages will be cleaned up. Since most speculation will succeed, these methods introduce non-trivial overhead. Hence, CleanupSpec [46] was proposed to allow the data to be speculatively installed into the original cache, and it only rolls back the cache state by re-installing the replaced data when speculation fails.

Actually, the re-install operations are required no matter whether the speculation succeeds or fails. Such data movement triggered by the re-install operations will degrade performance. It motivates us to develop a scheme that can perform "invisible" speculative execution inside the cache system without requiring any "data movement" during the switch between speculative and non-speculative execution.

III. THREAT MODEL

In this paper, we mainly focus on the cache system that is vulnerable to transient execution attacks. Here, we assume attackers have the following abilities:
• Ability to train the control flow prediction units and memory disambiguation units in order to exploit all Spectre- and Meltdown-like vulnerabilities.
• Ability to find/execute the gadgets and to access/encode secret data. The secret data here refers to various protected memory locations and registers.
• Ability to know the cache indexing method and its replacement strategy, which means the attacker can install or evict a cache line at any location in the cache.
• Ability to launch multiple threads located on an SMT core and/or different cores, and to control their interleavings in order to transmit the stolen secret data via the cache.

Our goal is to defeat TEAs that use persistent covert channels. TEAs using volatile covert channels, such as port contention [2], [6], can be mitigated by turning off SMT, using security-sensitive thread scheduling [45], or using time-division multiple access (TDMA) on shared resources [58]. These methods are orthogonal to our approach.

Also, we do not consider physical covert channels, such as electromagnetic signals [38] and power consumption [24], [63]. These channels are generally noisier, requiring a longer time for attackers to observe the effects. At present, these types of covert channels cannot be used in transient execution attacks.

IV. PROBLEM ANALYSIS

A. Encoding Methods of Cache Covert Channels

As mentioned in Subsection II-B, the sender illegally accesses the secret and encodes it into the cache layout; the receiver infers the secret via timing accesses to the cache. The encoding schemes can be of the following two types.

• Acceleration-based encoding: The sender first evicts an item from the cache, then decides whether to re-install this item or not according to the speculatively accessed secret

value. Taking a 1-bit secret as an example, if the secret value is 1, the item will be re-installed. Therefore, when the receiver accesses this item again, the timing will be short due to the cache hit, and the receiver can conclude that the secret value is 1. Otherwise, it is 0.

• Deceleration-based encoding: The sender first installs an item into the cache, then decides whether to replace this item with another item or not according to the secret value. Again taking a 1-bit secret as an example, if the secret value is 1, the new item will replace the original item. Therefore, when the receiver accesses the original item again, the timing will be long due to the cache miss, and the receiver can conclude that the secret value is 1. Otherwise, it is 0.

B. Attack Modes

In the most common attack scenarios, the above two encoding methods are performed in a serialized manner; that is, the receiver's decoding will not begin until the sender's encoding has been completed. However, in multi-threaded applications, the encoding can be conducted in a concurrent manner. The attacker can launch two concurrent threads located on different cores or on an SMT core, with one thread acting as the sender and the other as the receiver. Using a browser as an example, multiple threads of a malicious plug-in can occupy different cores. Their targets can be the user's passwords or cookies, which are protected by the sandbox in the browser. The malicious plug-in cannot access them directly, but it can exploit a Spectre-like vulnerability to access them in a wrong-path speculative execution, and leverage the shared cache as a covert channel to transmit them.

Figure 1 shows how the two encoding methods can be implemented in a concurrent mode. For the acceleration-based method, the Treceiver first evicts the item, and then the Tsender speculatively accesses the secret and re-installs the item according to the 1-bit secret value (assuming the value is 1). Immediately afterwards, the receiver accesses the item again and measures the access time. At that time, the speculation of Tsender has not been committed yet, which means other defenses, such as cleanup or rollback, have not yet been carried out. The Treceiver's revisit will be a hit. Similarly, for the deceleration-based method, after the Tsender has replaced the item primed by the speculative install of Treceiver, and before the process of Tsender is committed, the Treceiver will experience a cache miss when the replaced item is accessed again.

Fig. 1: The Encoding Methods in Concurrent Mode Attacks. (a) Acceleration-based encoding in concurrent mode; (b) deceleration-based encoding in concurrent mode.

V. OUR SOLUTION

To defeat the above attacks in both serial and concurrent modes while keeping the speculative execution efficient, we propose a label-based transparent speculation scheme, called SPECBOX. In SPECBOX, all speculative (i.e., in-flight) operations that may affect the cache system, such as loads/stores and instruction fetches, are regarded as unsafe until they reach the head of the ROB and are committed. As mentioned earlier, the core ideas of SPECBOX consist of two major components.

A. Domain Partition

1) Partitioning Scheme: As shown in Figure 2, SPECBOX partitions each cache set into two domains: (1) the temporary domain that contains the data installed by the in-flight operations, and (2) the persistent domain that contains the data installed by the committed operations. The partition is implemented by adding a 1-bit label, called the T/P (Temporary/Persistent) flag, in each cache line's tag area. If the cache line's T/P flag is 0, it belongs to the temporary domain; otherwise, it belongs to the persistent domain. By modifying the T/P flag, SPECBOX can switch the domain the cache line belongs to. The domains need not be contiguous in physical space, but each needs a pre-determined capacity, so that they will not compete against each other. When installing a new cache line or switching a cache line's domain from the other domain, SPECBOX will check whether the current domain has reached its capacity. If it has, it will wake up the replacement module to select a cache line within the domain and replace it.

Fig. 2: An example of the state migration on one cache set (P-Domain: persistent domain cache line; T-Domain: temporary domain cache line).

2) Access Control: Figure 3 shows the access flow on a cache with the proposed partitioning scheme. The access flow starts from the yellow box marked as "Request". ① If the

request is an in-flight operation and is a cache hit, regardless of whether it hits the temporary domain or the persistent domain, the operation can directly access the data without modifying the metadata for cache replacement. ② If it is a cache miss and the temporary domain is full, the cache will select one victim cache line from the temporary domain and replace it. ③ If the temporary domain is not full, it will install the new cache line in the temporary domain. ④ When an in-flight access operation is squashed, the cache will receive a squash request from the CPU. It checks whether the data installed by the operation is still in the temporary domain, and evicts the entry if so. ⑤ If the data installed is not in the temporary domain, it just ignores the request. ⑥ When an in-flight operation is committed, the cache will receive a commit request from the CPU. It will check whether the data installed by the operation is in the temporary domain; if so, the cache converts the cache line to the persistent domain by setting the T/P flag. Meanwhile, to keep the capacity constant, the cache will evict one victim cache line from the persistent domain and switch it to the temporary domain. If the data installed by the operation is already in the persistent domain, it follows case ⑤ and ignores the request. ⑦ If the data is neither in the persistent domain nor in the temporary domain, the cache will re-install it in the persistent domain.

Fig. 3: Data access flow in temporary/persistent domains.

Figure 2 depicts an example of the state transitions (i.e., the domain switches) on one cache set during a sequence of requests. This set is 8-way associative, and the capacity ratio of the temporary and persistent domains is 2:6. First, the cache receives two consecutive load requests, i.e., Load A (①) and Load B (②) in the figure. Since A and B are missing in the cache, these requests will trigger cache-miss handling events. After A and B are fetched from the next-level cache or memory, they are installed into the temporary domain. Then, the cache receives the Load C request (③), and C is also missing. But there is no free cache line left in the temporary domain of the set, so the cache chooses to replace A with C in accordance with the replacement algorithm. After that, the in-flight operation to which the Load A request belongs is committed (④), and the cache receives a Commit A request. But A has been evicted from the set. The cache thus selects a victim cache line from the persistent domain and re-installs A. Next, the cache receives a Commit B request (⑤). It first selects a victim cache line from the persistent domain to evict and switches it to the temporary domain, and then switches B to the persistent domain. Finally, the cache receives a Squash C request (⑥), and it evicts the cache line that contains C.

3) Notifier: We add a notifier in the commit stage of the pipeline (shown in Figure 4). It notifies the cache system to switch domains or clean up the corresponding cache line when an in-flight operation is committed or squashed. The notifier performs different actions according to the type of operation.

Fig. 4: The enhanced pipeline and cache system with the notifier.

For Load/Store. In order to know which level of the cache hierarchy has been accessed by a load/store instruction, SPECBOX adds some bits, named dhit_mask, in each entry of the Load/Store Unit (LSU). They record whether the access to each cache level was a hit or not. When the ROB commits or squashes a load/store instruction, it signals the notifier with the sequence_number (sn) of the ROB. Subsequently, the notifier queries the LSQ with the sn for the memory_address and dhit_mask, and then generates the notification request.

For IFetch. Although the CPU does not track the instruction fetching process directly, we can still infer the process by analyzing the effects of instruction fetching. One portion of the instructions have been decoded and dispatched to the ROB, and the other portion have not yet been decoded in the Fetch Queue (FQ). Similar to the LSU, SPECBOX adds the ihit_mask bits in the entries of the FQ and the ROB. When the Execution Unit (EU) resolves a branch and accepts/rejects the prediction of the branch, it signals the notifier with the resolved_branch (br) to obtain the PC_address and ihit_mask of all the fetched instructions located in that branch from the FQ and the ROB, and then generates the notification requests.

In order not to disturb the instruction fetching and load/store operations, the notification requests are first sent to the L1 I-cache and L1 D-cache through the Notifier Bus. Then the L1 cache controllers filter these requests and forward them to the lower-level caches through the interconnect. Moreover, in our study, we find that there is a significant number of notification requests for the same cache line caused by contiguous cache accesses (mostly due to instruction fetching).

5 Cache State Migration Therefore, SPECBOX adds a Notification Fill Buffer (NFB) in Attacker Workflow (shared cache between Tsender and Treceiver) Notifier Bus with 16 entries (i.e. the size of the cache line) to Tsender Treceiver way 0 way 1 way 2 way 3 way 4 way 5 way 6 way 7 merge the redundant requests to the same cache line. 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ① Load A Tsender install block A and

B. Thread Ownership Semaphore (TOS)

Inter-thread isolation by allocating a private memory region to each thread could make speculation invisible across cores, but it would lower resource utilization and limit scalability. In this paper, we propose a label-based solution called the thread ownership semaphore (TOS). It is based on two insights: 1) it is not necessary to keep the data installed by committed operations invisible; 2) the data installed by the in-flight operations of one thread need not be forever invisible to other threads. If another thread has also accessed the data during the in-flight operations, it should be safe to share the data between the two threads. We therefore conclude that the problem of maintaining invisibility between threads is equivalent to the problem of tracking the ownership of each thread and emulating its "private" temporary domain.

1) Access Control: TOS is an N-bit label added to each cache line's tag area. Each bit of the TOS is bound to a hardware thread (HT); a HT owns a cache line only if the corresponding bit is set. Ownership means that the cache line can be accessed by the in-flight operations of that HT. When a HT accesses a cache line it does not own, SPECBOX will emulate a latency equivalent to a cache miss and then set its corresponding TOS bit. On the other hand, if a HT evicts a cache line owned by other HTs, SPECBOX will simply reset the TOS bit instead of evicting the cache line. In summary, for each cache line in the temporary domain, SPECBOX performs different actions according to the situation:
• A thread can directly access a cache line it owns.
• When a thread accesses a cache line that it does not own, SPECBOX checks whether it is owned by other threads: a) if it is not owned by any other thread, which means it is invalid or unused, the thread can install the requested data in it; b) if it is owned by another thread, a cache miss will be emulated by SPECBOX to hide the true access latency. In either case, after the above process, the cache line will be marked as being owned by this thread.
• When a cache line needs to be replaced or squashed in a thread's in-flight operation, SPECBOX marks the thread as no longer owning this cache line, and checks whether other threads own it. If they do, SPECBOX does nothing; otherwise, SPECBOX replaces or squashes this cache line.
• When a cache line needs to be committed in a thread's in-flight operation, this cache line is directly switched to the persistent domain.

Fig. 5: The workflow of the TOS scheme under two encoding attacks: (a) how TOS thwarts concurrent acceleration-based encoding; (b) how TOS thwarts concurrent deceleration-based encoding.

Figure 5 shows how the TOS mechanism thwarts the two encoding methods in the concurrent mode. For acceleration-based encoding, a Load A operation of Tsender installs cache line A in the temporary domain and sets its TOS bit. Then, Treceiver also executes a Load A operation. Because Treceiver does not own cache line A, it will experience a latency emulated by SPECBOX, and regard it as a cache miss.

For deceleration-based encoding, as shown in Figure 5, Tsender first installs cache line B in the temporary domain by a Load B operation. Subsequently, Treceiver performs another Load B operation. Once Treceiver completes, the TOS bits of the two threads are both set. Then Tsender performs a Load A operation, which tries to replace cache line B. At that time, SPECBOX just resets the TOS bit of Tsender instead of evicting the cache line, and suspends the Load A request until it is committed or Treceiver also resets its TOS bit. Therefore, Treceiver cannot detect the deceleration effect caused by Tsender.

2) Implementation of TOS on different systems: The TOS mechanism can be easily implemented by adding labels in the tag area and modifying the access controller of the cache. (1) For a hierarchical storage system that does not support SMT, the only component with concurrency issues is the shared cache (i.e., the LLC). The TOS can be efficiently implemented with the cooperation of the cache-coherence directory and the T/P flag. When the T/P flag is P, the cache line can be accessed as usual because the TOS mechanism only targets speculative entries. When the T/P flag is T, the cache controller will process access requests according to the above workflow using the ownership information recorded in the directory. (2) For a hierarchical storage system that supports SMT, the private L1/L2 caches are also vulnerable. We need to add N bits to each cache line for the TOS, where N is the number of SMT threads; each bit indicates the ownership of the corresponding thread. For the LLC, we also need to extend the directory and allow it to distinguish different SMT threads on each core. Therefore, for N SMT threads per core on an M-core system, the LLC needs an additional (N − 1) · M bits for each entry.

C. Other Enhanced Components

In addition, SPECBOX also blocks other potential attack surfaces from the auxiliary components in the cache system during speculative execution, such as coherence states, cache management instructions, and the hardware prefetcher.
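The per-line access-control rules of the TOS mechanism above can be condensed into a short sketch. This is illustrative C, not the hardware implementation: the struct layout, names, and single-line granularity are our own simplifications, and the emulated-miss latency is abstracted into a return code.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical model of one cache line under TOS (N <= 8 threads here). */
typedef struct {
    uint8_t tos;        /* N-bit ownership bitmap, one bit per HT */
    bool    valid;      /* line holds data */
    bool    temporary;  /* T/P flag: true = temporary domain */
} cache_line_t;

typedef enum { HIT, EMULATED_MISS, INSTALL } access_result_t;

/* In-flight access by hardware thread `ht` (0-based). */
access_result_t tos_access(cache_line_t *l, int ht) {
    uint8_t bit = (uint8_t)(1u << ht);
    access_result_t r;
    if (l->valid && (l->tos & bit)) {
        r = HIT;                     /* thread already owns the line */
    } else if (!l->valid || l->tos == 0) {
        l->valid = true;             /* invalid or unused: install data */
        l->temporary = true;
        r = INSTALL;
    } else {
        r = EMULATED_MISS;           /* owned by others: hide true latency */
    }
    l->tos |= bit;                   /* in every case, mark ownership */
    return r;
}

/* Thread `ht` replaces or squashes the line in an in-flight operation. */
void tos_evict(cache_line_t *l, int ht) {
    l->tos &= (uint8_t)~(1u << ht);  /* drop this thread's ownership */
    if (l->tos == 0)                 /* no other owner: really evict */
        l->valid = false;            /* otherwise keep the line in place */
}

/* Commit: the line is switched to the persistent domain. */
void tos_commit(cache_line_t *l) {
    l->temporary = false;
    l->tos = 0;
}
```

In this toy model, the acceleration scenario of Figure 5(a) falls out directly: the receiver's first access to a line owned only by the sender returns EMULATED_MISS, and the deceleration scenario of Figure 5(b) is blocked because an eviction by one owner leaves the line valid while another owner's bit is still set.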

Coherence States. The above-mentioned attacks use the cache layout as a medium for encoding. However, on a multi-core processor, cache coherence states may also be an attack surface. With the help of the Single-Writer-Multiple-Reader (SWMR) property of the MESI protocol, the attacker can complete deceleration-based encoding attacks via the following two approaches [59], as Figure 6 shows: (a) Tsender speculatively executes a load operation, which may change the state of other cores' copies from Exclusive to Shared, so Treceiver will be slowed down if it exclusively accesses these copies again. (b) Tsender speculatively executes a store operation, which may invalidate other cores' copies, so that Treceiver's subsequent access will miss. SPECBOX leverages the scheme used in CleanupSpec [46] to address this problem: it delays the potentially malicious speculative state transactions, i.e., it delays the completion of the speculative load/store operations that may cause the above state changes in the MESI protocol. According to the evaluation of CleanupSpec [46] on some multi-threaded workloads, the memory operations that may trigger the above two transactions are infrequent (about 2.4%), so this only introduces a small performance overhead.

Fig. 6: The coherence-state attacks: (a) speculative load attack; (b) speculative store attack.

Cache Management Instructions. In addition to speculative load/store instructions, an attacker can leverage some instructions for cache management during speculative execution to encode the secret, such as prefetch, clflush and INVD. For these rare but potentially vulnerable instructions, SPECBOX stalls their execution if they are issued during speculative execution, and waits until they are committed, to keep them from being used in TEAs.

Hardware Prefetcher. SPECBOX also supports hardware prefetching because it is critical to program performance. To ensure security, SPECBOX defers using speculative accesses to train the hardware prefetcher until they are committed, and the prefetched data are directly installed into the persistent domain.

TLB and Paging Cache. TLB misses and paging-cache misses may occur during address translation for speculative memory accesses. Attackers can thus treat them as covert channels in TEAs. Considering the low frequency of TLB misses, SPECBOX simply delays the execution of an in-flight operation triggering a TLB miss until it is committed.

VI. SECURITY ANALYSIS

After analyzing various existing TEAs, such as Spectre-PHT/BTB/RSB/STL [27], [29], [30], [37], [48], Lazy FP [54], MDS [8], [50], [61], LVI [60] and CacheOut [62], we abstract the transmission process of a persistent covert channel as a three-stage model: Sprepare, Ssend, Sreceive. Sprepare represents the stage of preparing the covert-channel component. Ssend represents the stage of accessing the secret through speculative execution and encoding it into the covert channel. Sreceive represents the stage of extracting and decoding the information from the covert channel.

Based on the above model, Table I enumerates all possible actions of each stage (i.e., an evict or install operation performed on either domain) and the expected states of Sreceive (i.e., slow or fast execution time), and shows how SPECBOX can block them.

TABLE I: Defense principles against various abstract attack scenarios. "A" and "B" represent thread A and other threads that share the cache with thread A, respectively; "t" represents the temporary domain; "p" represents the persistent domain; "evict" means a thread evicts a cache line; "install" means a thread installs a cache line; "access" means a thread accesses a cache line. Assuming the secret is a 1-bit value, the expected measured result for an attacker can be either "fast" or "slow".

Line | Mode | Sprepare | Ssend | Sreceive | Defense Principle
1 | Serialized | A evicts | A installs (t) | A accesses (t/p): fast | The cache line installed in Ssend will be cleaned up before Sreceive.
2 | Serialized | A installs (p) | A evicts (t) | A accesses (t/p): slow | The cache line in the persistent domain cannot be evicted (replaced) in Ssend.
3 | Serialized | A installs (t) | A evicts (t) | A accesses (t/p): slow | The cache line installed into the temporary domain by an in-flight operation and evicted later will be reinstalled into the persistent domain when that in-flight operation needs to be committed.
4 | Concurrent | B evicts | A installs (t) | B accesses (t/p): fast | When thread B first accesses the cache line installed into the temporary domain by thread A in Sreceive, a cache miss will be raised.
5 | Concurrent | B installs (t) | A evicts (t) | B accesses (t/p): slow | The cache line installed into the temporary domain by thread B cannot be evicted by thread A in Ssend.
6 | Concurrent | B installs (p) | A evicts (t) | B accesses (t/p): slow | The cache line in the persistent domain cannot be evicted by thread A in Ssend.

In SPECBOX, we assume that the attacker knows all protection schemes and can manipulate any item in any domain during Sprepare. However, because the speculative execution of Ssend is illegal and will eventually be squashed, the attacker can ONLY manipulate the data in the temporary domain. For serialized-mode attacks, the reliable timing method of Sreceive requires that it MUST begin after Sprepare and Ssend complete (i.e., are committed in the ROB). But for concurrent-mode attacks, the ordering constraint between Ssend and Sprepare can be relaxed. In the following, we elaborate on the process in each attack mode and SPECBOX's response via a representative example.

Serialized-Mode Attacks. Take the third attack (Line 3 in Table I) as an example. In the Sprepare stage, thread A installs a cache line into the temporary domain through an in-flight operation. In the Ssend stage, thread A evicts this cache line via replacement.
In the Sreceive stage, thread A times the access to this cache line. With SPECBOX, when the in-flight operation of Sprepare is committed, if the corresponding cache line is not present, SPECBOX will reinstall it. So the slowdown of the access time cannot be measured in Sreceive.

Concurrent-Mode Attacks. Take the sixth attack (Line 6 in the table) as an example. In the Sprepare stage, thread B installs a cache line. In the Ssend stage, thread A performs an in-flight operation to evict this cache line. In the Sreceive stage, thread B times the access to this cache line. With SPECBOX, the cache line will not be evicted in Ssend because thread B owns it. The access to this cache line by thread B in Sreceive will hit the cache, and the slowdown of the access time cannot be measured.

As mentioned before, besides normal data, attackers can also exploit some metadata as the covert channel, such as replacement trace bits or coherence state. Since SPECBOX prevents unsafe speculative modification of such metadata like other works, and carefully extends the cache without introducing a new attack surface, SPECBOX can also be used to defend against metadata attacks, such as Speculative Interference [5].

VII. EVALUATION

A. Experimental Setup

We implemented and simulated SPECBOX based on an out-of-order (O3) processor and the Ruby cache system using Gem5 [7]. The parameter settings of each component are shown in Table II, and are consistent with the settings in other studies. We evaluated the performance using 27 benchmarks from SPEC CPU 2006 and 12 benchmarks from PARSEC-3.0. Some benchmarks, such as dealII and tonto, are not included because they cannot be simulated correctly by the version fe187de9bd of Gem5 we used, even without SPECBOX. For the SPEC benchmarks, we use the ref input set, skip the first 10 billion instructions in fast-forward mode, and then perform cycle-level simulation on the next 1 billion instructions. For the PARSEC benchmarks, under the full system with 8 cores, we use the simmedium input set and simulate the instructions in the region of interest (ROI) at the cycle level.

TABLE II: Parameters of simulated micro-architecture.

Parameter | Value
Core | 8-issue, out-of-order, 2GHz
Pipeline | 64-entry IQ, 192-entry ROB, 32-entry LQ, 32-entry SQ, 256 Int / 256 FP registers
BPU | Tournament branch predictor, 4096 BTB
Private L1-I Cache | 32KB, 64B line, 4-way, 1 cycle RT latency, 4 MSHRs
Private L1-D Cache | 64KB, 64B line, 8-way, 1 cycle RT latency, 4 MSHRs
Shared L2 Cache | 2MB bank, 64B line, 16-way, 8 cycles RT local latency, 16 cycles RT remote latency, 16 MSHRs
Coherence Protocol | inclusive, directory-based MESI protocol
Network | 4×2 mesh, 128b link width, 1 cycle latency per hop
DRAM | RT latency: 50 ns after L2

B. Defense Against Spectre PoC Code

To evaluate the mitigation effectiveness of SPECBOX, we chose the Spectre-PHT attack [27], as shown in Listing 1, which uses the data cache as the covert channel. In the attack, we used a mis-predicted overflow access to obtain the secret, whose value is 79, and then used the secret to index a 256-item auxiliary array. Finally, we measured the access latency of each array element 100 times. The access times are shown in Figure 7. We can see that when an access hits, the latency is below 50 cycles; in contrast, the latency exceeds 150 cycles if it misses. On the baseline processor without any protection, the attacker can clearly distinguish the access time of item 79 from the other items in the array. However, with SPECBOX, the item is evicted from the data cache when the transient instruction is squashed. Therefore, the attacker can no longer distinguish the difference in access time between item 79 and the others.

Fig. 7: Access latency of the auxiliary array using SPECBOX (Baseline vs. SPECBOX; only on the baseline does the secret value 79 stand out).

C. Determine Domain Capacity

For SPECBOX, the capacity ratio of the two domains is crucial to the performance because it determines the maximum numbers of speculative and non-speculative cache lines. Two factors need to be considered when determining the domain capacity. The first is that the temporary domain should be sufficiently large to contain the data installed by the in-flight operations in any speculative window. These data will be frequently referenced within the current window and for a period of time after being committed. Figure 8 shows the number of speculative cache lines in each cache set, sampled in the original cache system (i.e., without partitioning). We can see that reserving 2 ways in the L1-DCache and L1-ICache, and 3 ways in the unified L2-Cache shared by multiple cores, is sufficient for most single- and multi-threaded programs.

Fig. 8: We sampled the number of speculative cache lines in each cache set per kilo-instructions, and then calculated their mean as the representative value for each cache set (denoted as Smean). The figure shows the percentage of cache sets with different Smean, for (a) L1-DCache, (b) L1-ICache, and (c) L2-Cache. We found no cache set whose Smean is larger than 3. We selected some benchmarks with distinguishable distributions as examples; the remaining benchmarks are similar to these examples.

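The receiver's decode step in the PoC experiment above reduces to picking the single fast element among the probed latencies. A toy decoder using the thresholds observed in Figure 7 (hits below 50 cycles, misses above 150 cycles); the function name and the fixed threshold are our illustration, not part of SPECBOX:

```c
#include <stdint.h>

#define HIT_THRESHOLD 50u  /* cycles; from the measurements in Fig. 7 */

/* Return the index of the one clearly cached element (the leaked byte),
   or -1 if nothing probes as a hit, which is the outcome under SPECBOX
   once the transient line has been evicted at squash time. */
int recover_secret(const uint32_t latency[256]) {
    int best = -1;
    uint32_t best_lat = UINT32_MAX;
    for (int i = 0; i < 256; i++) {
        if (latency[i] < best_lat) {
            best_lat = latency[i];
            best = i;
        }
    }
    return (best_lat < HIT_THRESHOLD) ? best : -1;
}
```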
Fig. 9: The performance overhead of SPECBOX, SPECBOX with no i-cache protection (denoted as SPECBOX-NI), CleanupSpec [46], InvisiSpec [69] and STT [73] on SPEC CPU 2006 benchmarks. For CleanupSpec, three benchmarks (i.e., zeusmp, xalancbmk, and GemsFDTD) hang and failed in our simulation.
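The overhead axis in Figure 9 is the relative slowdown against the unprotected baseline. As a small sketch of how such a number falls out of the simulator's IPC reports (our own helper, assuming a fixed instruction count so that runtime scales with 1/IPC):

```c
/* Percent performance overhead of a defense: the runtime ratio of the
   defended run to the baseline run, minus one, times 100. */
double overhead_pct(double ipc_baseline, double ipc_defended) {
    return (ipc_baseline / ipc_defended - 1.0) * 100.0;
}
```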

The second is that the temporary domain should not be so large as to deprive the persistent domain of capacity, because the persistent domain contains all of the non-speculative and committed data in the cache. Its capacity is more sensitive for programs with more access patterns that have long reuse distances, such as mcf and GemsFDTD in SPEC. Therefore, we counted the number of accesses to the different ways in each cache set, and calculated the cumulative distribution in the original (non-partitioned) cache system. The access order is decided by the replacement policy (e.g., LRU in SPECBOX). As Figure 10 shows, for most programs, over 90% of accesses hit the first three ways in the L1-DCache and L2-Cache, and the first way in the L1-ICache. It thus will not impact cache utilization much if we take away the last 2 or 3 ways for the temporary domain. This observation is consistent with other approaches that adopt way-partitioning schemes [34].

Fig. 10: We counted the distribution of hits in all cache sets and calculated the arithmetic mean as the representative distribution for each benchmark. The horizontal axis represents the cache lines in one cache set, arranged in LRU order from left to right, and the vertical axis is the cumulative distribution of hit counts. We selected some benchmarks with distinguishable distributions as examples; the remaining benchmarks are similar to these examples.

D. Performance Overhead

We compared SPECBOX with other open-sourced hardware-based defenses against TEAs. The first is STT [73], which represents the scheme that delays the instructions dependent on transient instructions. The second is InvisiSpec [69], which represents the scheme that cleans up the side effects of transient instructions. The third is CleanupSpec [46], which represents the scheme that rolls back the cache layout when speculation fails. For a fair comparison with SPECBOX, for both InvisiSpec and STT we chose the futuristic model that can resist all Meltdown-type and Spectre-type attacks; for CleanupSpec, we chose the rollback strategy for the L1-DCache and L2-Cache. Because InvisiSpec and CleanupSpec are currently not applied to the instruction cache, we also implemented a version of SPECBOX only on the data cache and not on the instruction cache (i.e., SPECBOX-NI) to compare with the other schemes.

Fig. 11: The performance overhead of InvisiSpec, STT, SpecBox and SpecBox-NI on PARSEC 3.0 benchmarks. CleanupSpec currently does not support simulating multi-threaded applications in Gem5.

In Figure 9 and Figure 11, we can see that the overall performance overheads of SPECBOX for SPEC and PARSEC are 2.17% and 5.61%, respectively, which are much lower than 20.97% and 32.27% with STT and 21.29% and 11.62% with InvisiSpec, and also better than CleanupSpec (4.77% for SPEC). The largest overheads of SPECBOX are 8.96% for GemsFDTD in SPEC and 14.96% for dedup in PARSEC, which are also better than InvisiSpec and STT. When we disable the protection on the instruction cache (i.e., the L1-ICache and the corresponding regions in the L2-Cache), the performance overheads of SPECBOX are reduced to 1.76% and 3.85%, respectively.

SPECBOX versus STT. The main overhead of STT comes from pipeline stalls caused by delayed execution. Therefore, for benchmarks with more dependent instructions that can cause pipeline stalls, such as lbm, omnetpp and gcc in SPEC, and swaptions, ferret and bodytrack in PARSEC, the performance of STT is inferior to that of InvisiSpec, CleanupSpec and SPECBOX. On the other hand, STT has the advantage that, for an instruction such as a load whose dependent instructions have become safe, it can simply lift the protection and allow the instruction to modify the states for replacement and coherence. It thus has much less overhead for programs with less dependence but a large number of cache accesses, such as sphinx3 and GemsFDTD in SPEC and streamcluster in PARSEC.

SPECBOX versus InvisiSpec. The main cost of InvisiSpec comes from reload operations when a speculative instruction is committed. The main intention of the reload operation is to prevent subsequent replacement of the committed data in the Speculative Buffer (SB) from being exploited by attackers. Another consideration is that the SB is private to each core and, hence, will not receive invalidation requests from other cores, which may lead to violations of the memory consistency model. InvisiSpec needs to reload the data from the cache hierarchy when the in-flight load instruction is committed, and validate the current value against the value used during speculative execution. If the validation fails, it must squash and re-execute the instructions following that load. From Figure 9, we can see that for memory-intensive applications, such as leslie3d, GemsFDTD and bzip2, the performance differences between InvisiSpec and the other schemes are particularly distinct. The situation is more serious in some PARSEC benchmarks such as canneal and facesim, because more invalidation requests are generated in those benchmarks due to cache-coherence transactions in a multi-core system; hence, validation failures and re-executions occur more frequently.

SPECBOX versus CleanupSpec. CleanupSpec addresses the problem in InvisiSpec by allowing the replacement caused by a speculative instruction and restoring the layout if it is squashed. This scheme works well for most programs, such as libquantum and omnetpp in the SPEC benchmarks. But for some programs with higher misprediction rates, such as astar, bzip2 and gobmk, it incurs a certain amount of performance overhead (10%-20%), which is larger than that of InvisiSpec and SPECBOX. And for some cases with low misprediction rates that still need to process subsequent squashed instructions, like calculix and sphinx3, the stall from rollback operations also degrades the performance.

E. Main Cost Analysis

1) Squashes due to wrong-path execution: The essence of caching is to exploit the spatial and temporal locality in programs to overcome the memory wall. The unchanged cache-line size guarantees that spatial locality is not broken by SPECBOX. The first change brought by SPECBOX is eliminating the temporal locality of the data installed by wrong-path execution, which is also the case in InvisiSpec and CleanupSpec. Previous studies have shown that wrong-path execution has the following two side effects [40].

Prefetching Effect. The data loaded during a wrong-path reference will be re-referenced by future correct-path execution. Figure 12 gives a simplified example of this scenario. The code in the example scans an array to find its maximum element. In an out-of-order execution, the statements in Lines 4-6 from several iterations of the loop will be executed in the same speculative window. However, because the array elements are unordered, the branch prediction on Line 5 will fail frequently, so the subsequent 4(2), 4(3), . . . , 4(n) instructions will be squashed. In SPECBOX, we call such a sequence Reference-Squash-Rereference (RSR).

Static code snippet:
    /* get the max element in an array whose order is random */
    1: int array[10000];
    2: int max = array[0];
    3: for (int i = 1; i < 10000; i++) {
    4:   int tmp = array[i]; // RSR
    5:   if (tmp > max)
    6:     max = tmp;
    7: }
Dynamic execution sequence: 4(1), 5(1), 6(1); 4(2), 5(2), 6(2); . . . ; 4(n), 5(n), 6(n).

Fig. 12: A simplified code snippet with a RSR access pattern.

Figure 13 shows the incremental ratio of cache misses from the committed memory instructions at each cache level when we do not limit the domain capacity and only perform cleanup operations for the temporary domain. From the results, we can see that RSR access patterns exist in most workloads. Especially when they occur frequently in the L1-DCache, they cause a certain degree of performance overhead, as shown by xalancbmk and soplex in SPEC.

Fig. 13: The incremental ratio of cache misses (L1-DCache, L1-ICache, and L2-Cache) from the committed memory operations in the original cache system and those in SPECBOX w/o domain capacity limitation.

Pollution Effect. The data loaded by wrong-path accesses will never be re-referenced but will replace other useful data. Furthermore, these wrong-path accesses will trigger the hardware prefetcher, which may pollute the cache indirectly. This situation rarely occurs in general out-of-order processors, and usually arises only under more aggressive optimizations, such as run-ahead execution [41]. Moreover, in SPECBOX, the domain-isolation scheme and the secure prefetcher scheme alleviate the pollution effect caused by wrong-path accesses.

2) TOS for multi-core systems: In multi-core systems with SPECBOX, one important source of overhead is the emulated latency needed to support the TOS scheme when different cores perform speculative accesses to a shared cache line simultaneously. The latency is not added to all simultaneous accesses but only to the first access from a core. Figure 14 shows the number of such accesses to the L2-Cache in systems with different numbers of cores. We can see that as the number of cores increases, the number of such accesses gradually increases, and in cases such as canneal and freqmine, the TOS scheme does have some impact.

In addition, as the number of cores increases, due to the limited temporary-domain capacity, the competition among multiple cores for the same cache set will increase. This is because TOS does not allow a cache line in the speculative state to be unilaterally evicted by a single owner. Figure 15
shows that the performance of most PARSEC benchmarks gradually decreases as the number of cores increases (note: facesim and dedup crashed or hung, and are thus not included). For some benchmarks, such as bodytrack, ferret and vips, when the number of cores exceeds 8, there is a significant drop in performance.

Fig. 14: The number of simultaneous speculative accesses to the L2-Cache in PARSEC on different numbers of cores, where some core accesses the cache line for the first time.

Fig. 15: The normalized IPC of the multi-core system with SPECBOX.

F. Hardware Cost and Power Consumption

In order to implement the domain-partitioning mechanism, SPECBOX adds 1 bit to each cache line in the tag array. To implement the thread ownership semaphores (TOS) for the L1-ICache and L1-DCache on a core with two physical SMT threads, SPECBOX adds an additional 2 bits for each cache line in the tag array. For the shared L2 cache, SPECBOX adds 2, 4, and 8 bits, respectively, to implement TOS on 2-core, 4-core, and 8-core systems. We used CACTI-6.5 [39] to evaluate the hardware cost and power consumption of the additional storage requirements. The results are shown in Table III. As we can see, SPECBOX introduces modest hardware and power-consumption overhead.

TABLE III: The hardware and power overhead of SPECBOX.

Metric | L1-ICache | L1-DCache | L2-Cache (2/4/8-core)
Area | 0.18% | 0.38% | 0.61% / 0.61% / 0.92%
Access time | 0.15% | 0.01% | 3.03% / 3.03% / 3.31%
Dynamic read energy | 0.77% | 0.63% | 0.05% / 0.05% / 0.10%
Dynamic write energy | 4.72% | 2.21% | 0.01% / 0.01% / 0.14%
Leakage power | 1.52% | 0.93% | 2.09% / 2.09% / 2.47%

VIII. DISCUSSION

Domain Capacity Configuration. As shown in our evaluation (see VII-C), a pre-determined capacity ratio can satisfy the requirements of most programs with today's cache sizes and associativity. However, for some performance-sensitive applications, dynamic adjustment of the domain capacity can provide more flexibility. SPECBOX uses a set of privileged registers, domain_cap, for capacity re-configuration at each cache level. The registers can only be controlled by special serialized instructions, to avoid being controlled by illegal speculative execution. When the capacity of the temporary domain becomes zero, the system gives up all security protection: the cache will treat all accesses as non-speculative, and the notifier will suspend squash/commit requests to the cache.

Extending to other components. In addition to the cache system, SPECBOX can also be extended to other storage components that can be exploited as persistent covert channels in TEAs. To reduce power consumption, Intel processors will turn off the upper half of an execution unit if no AVX2 instruction is executed for a long time, and turn it back on when one is executed. The turn-on operation takes a long time, resulting in a difference in execution time that can be exploited by attackers [51]. In this case, we can also add a TOS to the AVX2 units. When a thread turns on the upper half of the AVX2 unit in an in-flight operation, the corresponding bit in the TOS is set to 1. At this time, if another thread also needs to execute an AVX2 instruction and its corresponding TOS bit is 0, it will wait for a latency equivalent to the duration of turning the unit on. When a thread has not used AVX2 instructions for a while, the CPU will first clear the corresponding TOS bit, and only when all N bits of the TOS are 0 can it turn off the upper half.

Defense against TSA. To defend against serialized-mode attacks, SPECBOX prevents the receiver from detecting any side effect after the sender exits its speculative window. However, if the receiver and sender are active in the same speculative window and the length of the speculative window can be measured, the attacker may still be able to gather side effects of the sender's execution. SpectreRewind [12] has shown that such attacks are feasible in practice by leveraging contention on an execution unit as a covert channel. However, its variant that leverages contention on the cache (which is the focus of SPECBOX) has yet to be shown feasible. Such a potential vulnerability also exists in other existing defenses against cache covert channels. SafeSpec [23] discusses this vulnerability (called the Transient Speculative Attack (TSA)) and proposes two possible solutions. One solution is to make the speculative storage fully associative and increase its capacity to avoid contention. The other is to partition the speculative storage to accommodate sender and receiver branches separately. Both solutions will incur some hardware overhead. The second solution can be extended in SPECBOX, which will be our future work.

IX. RELATED WORK

In addition to the delaying and invisible-speculation approaches mentioned in subsection II-D, there are other defenses achieved through other software or hardware means.

Preventing Speculative Execution on Sensitive Code. The most direct way to prevent TEAs is to insert Fence [19] or SSBB [3] instructions in security-sensitive code. Moreover, Spectre-type attacks can be prevented by preventing the control-flow prediction unit from being trained by attackers. IBRS/STIBP/IBPB [17], [18], [20] and Retpoline [22] prevent branch-prediction state of different privilege levels and different threads from interfering with each other. CSF [56] modifies the decode stage in the pipeline to automatically inject fence instructions before branch instructions. These defenses have a simple defense principle but may cause serious performance degradation compared with SPECBOX.

Preventing Illegal Speculative Accesses. Another type of mitigation is to prevent unauthorized instructions from obtaining the secret's value during speculative execution. KPTI [33] separates the page-table entries and TLB entries of user space from those of kernel space. The Chrome and WebKit browsers [44] prevent cross-site transient accesses through index masking and pointer poisoning. OISA [71] ensures that accesses to sensitive data must use special instructions from a customized instruction subset, and these instructions cannot be executed out of order. ConTExT [49] marks the protected memory pages and registers to prevent their data from being obtained in the out-of-order execution state. Compared with these defenses, SPECBOX can protect all secret data from Spectre attacks instead of only the data specially designated by software.

Partition for Software Domains. Partitioning based on processes or other software domains has been used to block existing side-channel attacks. Some prior work [25], [32], [52], [64] uses cache-set partitioning to enhance security via page coloring on physical-page allocation, which may lead to high memory overhead. Catalyst [34] adopts Intel's CAT [21], a hardware-supported way-partitioning scheme on the last-level cache, to protect the Xen hypervisor with low memory overhead. SecDCP [65] dynamically adjusts domain sizes according to the number of incurred cache misses to improve the performance of CAT partitioning. DAWG [26] improves cache way-partitioning to enhance the isolation of hits, misses and metadata across domains. It also provides an optimized software solution for secure domain time-multiplexing, cache-synonym avoidance and efficient cross-domain data transfer. However, compared with SPECBOX, all these defenses require programmers to annotate the secrets for protection and cannot thwart transient-execution attacks (TEAs) within a domain.

Randomization and Other Approaches. Randomization and other noise-injection schemes are another effective approach to defending against side-channel attacks. For example, CEASER [42] and RPcache [66] can randomize the mapping of addresses to cache sets or TLB sets, which prevents attackers from preparing the data layout in the target cache set. RFillCache [35] selects random victims during cache-line replacement to hide the side effects of cache-replacement policies. Secure-TLB [11] adopts a similar randomized replacement strategy to defeat TLBleed attacks [15]. FTM [43] targets cross-core Flush+Reload attacks by delaying the first access to the last-level cache. All these defenses can only be applied to a limited set of covert channels, whereas SPECBOX is a more general defense approach for all types of persistent covert channels in TEAs.

X. CONCLUSION

This paper presented a label-based transparent speculation scheme, called SPECBOX, to defend against transient execution attacks. It partitions each cache level into a temporary and a persistent domain by attaching a 1-bit label to each item, isolating the side effects of in-flight and committed operations. An in-flight operation can only affect the temporary domain, and the affected items are switched to the persistent domain when the operation needs to be committed. To avoid changes in the temporary domain being observed by other concurrently executing threads, SPECBOX introduces thread ownership semaphores, which dynamically mark the thread ownership of each (shared) item in the temporary domain and emulate thread-private storage. Analysis and extensive experiments have shown that SPECBOX is not only secure, but also practical and efficient.

ACKNOWLEDGEMENTS

This research was supported by the National Natural Science Foundation of China (NSFC) under grants 61902374 and U1736208. Pen-Chung Yew is supported by the NSF under grant CNS-1514444. Zhe Wang is the corresponding author ([email protected]). Any opinions, findings, conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the NSF.

REFERENCES

[1] S. Ainsworth and T. M. Jones, "Muontrap: Preventing cross-domain spectre-like attacks by capturing speculative state," in 2020 ACM/IEEE 47th Annual International Symposium on Computer Architecture (ISCA), 2020, pp. 132–144.
[2] A. C. Aldaya, B. B. Brumley, S. ul Hassan, C. P. García, and N. Tuveri, "Port contention for fun and profit," in 2019 IEEE Symposium on Security and Privacy (SP). IEEE, 2019, pp. 870–887.
[3] ARM, "Arm A32/T32 instruction set architecture: Armv8, for Armv8-A architecture profile documentation," https://developer.arm.com/docs/ddi0597/h/base-instructions-alphabetic-order/ssbb-speculative-store-bypass-barrier.
[4] K. Barber, A. Bacha, L. Zhou, Y. Zhang, and R. Teodorescu, "Specshield: Shielding speculative data from microarchitectural covert channels," in 2019 28th International Conference on Parallel Architectures and Compilation Techniques (PACT), 2019, pp. 151–164.
[5] M. Behnia, P. Sahu, R. Paccagnella, J. Yu, Z. Zhao, X. Zou, T. Unterluggauer, J. Torrellas, C. Rozas, A. Morrison et al., "Speculative interference attacks: Breaking invisible speculation schemes," arXiv preprint arXiv:2007.11818, 2020.
[6] A. Bhattacharyya, A. Sandulescu, M. Neugschwandtner, A. Sorniotti, B. Falsafi, M. Payer, and A. Kurmus, "Smotherspectre: exploiting speculative execution through port contention," in Proceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security, 2019, pp. 785–800.
[7] N. Binkert, B. Beckmann, G. Black, S. K. Reinhardt, A. Saidi, A. Basu, J. Hestness, D. R. Hower, T. Krishna, S. Sardashti, R. Sen, K. Sewell, M. Shoaib, N. Vaish, M. D. Hill, and D. A. Wood, "The gem5 simulator," SIGARCH Comput. Archit. News, vol. 39, no. 2, pp. 1–7, Aug. 2011.
[8] C. Canella, D. Genkin, L. Giner, D. Gruss, M. Lipp, M. Minkin, D. Moghimi, F. Piessens, M. Schwarz, B. Sunar et al., "Fallout: Leaking data on meltdown-resistant cpus," in Proceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security, 2019, pp. 769–784.
[9] C. Canella, J. Van Bulck, M. Schwarz, M. Lipp, B. Von Berg, P. Ortner, F. Piessens, D. Evtyushkin, and D. Gruss, "A systematic evaluation of transient execution attacks and defenses," in 28th USENIX Security Symposium (USENIX Security 19), 2019, pp. 249–266.

[10] C. Cardenas and R. V. Boppana, "Detection and mitigation of performance attacks in multi-tenant cloud computing," in 1st International IBM Cloud Academy Conference, Research Triangle Park, NC, US, 2012, p. 48.
[11] S. Deng, W. Xiong, and J. Szefer, "Secure tlbs," in 2019 ACM/IEEE 46th Annual International Symposium on Computer Architecture (ISCA). IEEE, 2019, pp. 346–359.
[12] J. Fustos, M. Bechtel, and H. Yun, "Spectrerewind: Leaking secrets to past instructions," in Proceedings of the 4th ACM Workshop on Attacks and Solutions in Hardware Security, 2020, pp. 117–126.
[13] Q. Ge, Y. Yarom, D. Cock, and G. Heiser, "A survey of microarchitectural timing attacks and countermeasures on contemporary hardware," J. Cryptogr. Eng., vol. 8, no. 1, pp. 1–27, 2018.
[14] S. Gochman, N. Kacevas, and F. Jubran, "Method and apparatus for implementing a speculative return stack buffer," Oct. 12 1999, US Patent 5,964,868.
[15] B. Gras, K. Razavi, H. Bos, and C. Giuffrida, "Translation leak-aside buffer: Defeating cache side-channel protections with TLB attacks," in 27th USENIX Security Symposium (USENIX Security 18), 2018, pp. 955–972.
[16] A. S. Huang, G. Slavenburg, and J. P. Shen, "Speculative disambiguation: A compilation technique for dynamic memory disambiguation," ACM SIGARCH Computer Architecture News, vol. 22, no. 2, pp. 200–210, 1994.
[17] Intel, "Indirect branch predictor barrier, https://software.intel.com/security-software-guidance/deep-dives/deep-dive-indirect-branch-predictor-barrier."
[18] Intel, "Indirect branch restricted speculation, https://software.intel.com/security-software-guidance/deep-dives/deep-dive-indirect-branch-restricted-speculation."
[19] Intel, "Intel intrinsics guide, https://software.intel.com/sites/landingpage/intrinsicsguide/text=fence."
[20] Intel, "Single thread indirect branch predictors, https://software.intel.com/security-software-guidance/deep-dives/deep-dive-single-thread-indirect-branch-predictors."
[21] Intel, "Introduction to cache allocation technology in the Intel® Xeon® processor E5 v4 family, https://software.intel.com/content/www/us/en/develop/articles/introduction-to-cache-allocation-technology.html," 2016.
[22] M. F. A. Kadir, J. K. Wong, F. Ab Wahab, A. F. A. A. Bharun, M. A. Mohamed, and A. H. Zakaria, "Retpoline technique for mitigating spectre attack," in 2019 6th International Conference on Electrical and Electronics Engineering (ICEEE). IEEE, 2019, pp. 96–101.
[23] K. N. Khasawneh, E. M. Koruyeh, C. Song, D. Evtyushkin, D. Ponomarev, and N. Abu-Ghazaleh, "Safespec: Banishing the spectre of a meltdown with leakage-free speculation," in Proceedings of the 56th Annual Design Automation Conference 2019, ser. DAC '19. New York, NY, USA: Association for Computing Machinery, 2019. [Online]. Available: https://doi.org/10.1145/3316781.3317903
[24] S. K. Khatamifard, L. Wang, A. Das, S. Kose, and U. R. Karpuzcu, "Powert channels: A novel class of covert communication exploiting power management vulnerabilities," in 2019 IEEE International Symposium on High Performance Computer Architecture (HPCA). IEEE, 2019, pp. 291–303.
[25] T. Kim, M. Peinado, and G. Mainar-Ruiz, "STEALTHMEM: System-level protection against cache-based side channel attacks in the cloud," in Presented as part of the 21st USENIX Security Symposium (USENIX Security 12), 2012, pp. 189–204.
[26] V. Kiriansky, I. Lebedev, S. Amarasinghe, S. Devadas, and J. Emer, "Dawg: A defense against cache timing attacks in speculative execution processors," in 2018 51st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO). IEEE, 2018, pp. 974–987.
[27] V. Kiriansky and C. Waldspurger, "Speculative buffer overflows: Attacks and defenses," arXiv preprint arXiv:1807.03757, 2018.
[28] P. Kocher, J. Horn, A. Fogh, D. Genkin, D. Gruss, W. Haas, M. Hamburg, M. Lipp, S. Mangard, T. Prescher, M. Schwarz, and Y. Yarom, "Spectre attacks: Exploiting speculative execution," in 40th IEEE Symposium on Security and Privacy (S&P'19), 2019.
[29] P. Kocher, J. Horn, A. Fogh, D. Genkin, D. Gruss, W. Haas, M. Hamburg, M. Lipp, S. Mangard, T. Prescher et al., "Spectre attacks: Exploiting speculative execution," in 2019 IEEE Symposium on Security and Privacy (SP). IEEE, 2019, pp. 1–19.
[30] E. M. Koruyeh, K. N. Khasawneh, C. Song, and N. Abu-Ghazaleh, "Spectre returns! speculation attacks using the return stack buffer," in 12th USENIX Workshop on Offensive Technologies (WOOT 18), 2018.
[31] P. Li, L. Zhao, R. Hou, L. Zhang, and D. Meng, "Conditional speculation: An effective approach to safeguard out-of-order execution against spectre attacks," in 25th IEEE International Symposium on High Performance Computer Architecture, HPCA 2019, Washington, DC, USA, February 16-20, 2019. IEEE, 2019, pp. 264–276. [Online]. Available: https://doi.org/10.1109/HPCA.2019.00043
[32] J. Lin, Q. Lu, X. Ding, Z. Zhang, X. Zhang, and P. Sadayappan, "Gaining insights into multicore cache partitioning: Bridging the gap between simulation and real systems," in 2008 IEEE 14th International Symposium on High Performance Computer Architecture. IEEE, 2008, pp. 367–378.
[33] Linux, "The current state of kernel page-table isolation, https://lwn.net/articles/741878/."
[34] F. Liu, Q. Ge, Y. Yarom, F. Mckeen, C. Rozas, G. Heiser, and R. B. Lee, "Catalyst: Defeating last-level cache side channel attacks in cloud computing," in 2016 IEEE International Symposium on High Performance Computer Architecture (HPCA). IEEE, 2016, pp. 406–418.
[35] F. Liu and R. B. Lee, "Random fill cache architecture," in 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture. IEEE, 2014, pp. 203–215.
[36] A. Luțaș and D. Luțaș, "Bypassing kpti using the speculative behavior of the swapgs instruction," 2019.
[37] G. Maisuradze and C. Rossow, "ret2spec: Speculative execution using return stack buffers," in Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, 2018, pp. 2109–2122.
[38] Z. Martinasek, V. Zeman, and K. Trasy, "Simple electromagnetic analysis in cryptography," International Journal of Advances in Telecommunications, Electrotechnics, Signals and Systems, vol. 1, no. 1, pp. 13–19, 2012.
[39] N. Muralimanohar, R. Balasubramonian, and N. Jouppi, "Optimizing nuca organizations and wiring alternatives for large caches with cacti 6.0," in 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2007). IEEE, 2007, pp. 3–14.
[40] O. Mutlu, H. Kim, D. N. Armstrong, and Y. N. Patt, "Understanding the effects of wrong-path memory references on processor performance," in Proceedings of the 3rd Workshop on Memory Performance Issues: in conjunction with the 31st International Symposium on Computer Architecture, 2004, pp. 56–64.
[41] O. Mutlu, J. Stark, C. Wilkerson, and Y. N. Patt, "Runahead execution: An alternative to very large instruction windows for out-of-order processors," in The Ninth International Symposium on High-Performance Computer Architecture, 2003. HPCA-9 2003. Proceedings. IEEE, 2003, pp. 129–140.
[42] M. K. Qureshi, "Ceaser: Mitigating conflict-based cache attacks via encrypted-address and remapping," in 2018 51st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO). IEEE, 2018, pp. 775–787.
[43] K. Ramkrishnan, S. McCamant, P. C. Yew, and A. Zhai, "First time miss: Low overhead mitigation for shared memory cache side channels," in 49th International Conference on Parallel Processing - ICPP, ser. ICPP '20. New York, NY, USA: Association for Computing Machinery, 2020. [Online]. Available: https://doi.org/10.1145/3404397.3404434
[44] C. Reis, A. Moshchuk, and N. Oskov, "Site isolation: Process separation for web sites within the browser," in 28th USENIX Security Symposium (USENIX Security 19), 2019, pp. 1661–1678.
[45] A. Russo and A. Sabelfeld, "Securing interaction between threads and the scheduler," in 19th IEEE Computer Security Foundations Workshop (CSFW'06). IEEE, 2006, 13 pp.
[46] G. Saileshwar and M. K. Qureshi, "Cleanupspec: An "undo" approach to safe speculation," in Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture, ser. MICRO '52. New York, NY, USA: Association for Computing Machinery, 2019, pp. 73–86. [Online]. Available: https://doi.org/10.1145/3352460.3358314
[47] C. Sakalis, S. Kaxiras, A. Ros, A. Jimborean, and M. Själander, "Efficient invisible speculative execution through selective delay and value prediction," in Proceedings of the 46th International Symposium on Computer Architecture, ser. ISCA '19. New York, NY, USA: Association for Computing Machinery, 2019, pp. 723–735.
[48] M. Schwarz, C. Canella, L. Giner, and D. Gruss, "Store-to-leak forwarding: Leaking data on meltdown-resistant cpus," arXiv preprint arXiv:1905.05725, 2019.
[49] M. Schwarz, M. Lipp, C. Canella, R. Schilling, F. Kargl, and D. Gruss, "Context: A generic approach for mitigating spectre," in NDSS, 2020.

[50] M. Schwarz, M. Lipp, D. Moghimi, J. Van Bulck, J. Stecklina, T. Prescher, and D. Gruss, "Zombieload: Cross-privilege-boundary data sampling," in Proceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security, 2019, pp. 753–768.
[51] M. Schwarz, M. Schwarzl, M. Lipp, J. Masters, and D. Gruss, "Netspectre: Read arbitrary memory over network," in European Symposium on Research in Computer Security. Springer, 2019, pp. 279–299.
[52] T. Sherwood, B. Calder, and J. Emer, "Reducing cache misses using hardware and software page placement," in Proceedings of the 13th International Conference on Supercomputing, 1999, pp. 155–164.
[53] J. H. Shiell and G. Z. Cai, "Multiple global pattern history tables for branch prediction in a microprocessor," Aug. 10 1999, US Patent 5,935,241.
[54] J. Stecklina and T. Prescher, "Lazyfp: Leaking fpu register state using microarchitectural side-channels," arXiv preprint arXiv:1806.07480, 2018.
[55] C.-L. Su and A. M. Despain, "Cache design trade-offs for power and performance optimization: A case study," in Proceedings of the 1995 International Symposium on Low Power Design, 1995, pp. 63–68.
[56] M. Taram, A. Venkat, and D. Tullsen, "Context-sensitive fencing: Securing speculative execution via microcode customization," in Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, 2019, pp. 395–410.
[57] G. Taylor, P. Davies, and M. Farmwald, "The tlb slice—a low-cost high-speed address translation mechanism," in Proceedings of the 17th Annual International Symposium on Computer Architecture, 1990, pp. 355–363.
[58] D. Townley and D. Ponomarev, "Smt-cop: Defeating side-channel attacks on execution units in smt processors," in 2019 28th International Conference on Parallel Architectures and Compilation Techniques (PACT). IEEE, 2019, pp. 43–54.
[59] C. Trippel, D. Lustig, and M. Martonosi, "Meltdownprime and spectreprime: Automatically-synthesized attacks exploiting invalidation-based coherence protocols," arXiv preprint arXiv:1802.03802, 2018.
[60] J. Van Bulck, D. Moghimi, M. Schwarz, M. Lipp, M. Minkin, D. Genkin, Y. Yarom, B. Sunar, D. Gruss, and F. Piessens, "LVI: Hijacking transient execution through microarchitectural load value injection," in 41st IEEE Symposium on Security and Privacy (S&P'20), 2020.
[61] S. van Schaik, A. Milburn, S. Österlund, P. Frigo, G. Maisuradze, K. Razavi, H. Bos, and C. Giuffrida, "RIDL: Rogue in-flight data load," in S&P, May 2019.
[62] S. van Schaik, M. Minkin, A. Kwong, D. Genkin, and Y. Yarom, "Cacheout: Leaking data on intel cpus via cache evictions," cacheoutattack.com, p. 16, 2020.
[63] A. Velinov, A. Mileva, and D. Stojanov, "Power consumption analysis of the new covert channels in coap," International Journal on Advances in Security, vol. 12, no. 1 & 2, pp. 42–52, 2019.
[64] N. Waldin, M. Le Muzic, M. Waldner, E. Gröller, D. Goodsell, A. Ludovic, and I. Viola, "Chameleon: Dynamic color mapping for multi-scale structural biology models," in Eurographics Workshop on Visual Computing for Biomedicine, vol. 2016. NIH Public Access, 2016.
[65] Y. Wang, A. Ferraiuolo, D. Zhang, A. C. Myers, and G. E. Suh, "Secdcp: Secure dynamic cache partitioning for efficient timing channel protection," in Proceedings of the 53rd Annual Design Automation Conference, 2016, pp. 1–6.
[66] Z. Wang and R. B. Lee, "New cache designs for thwarting software cache-based side channel attacks," SIGARCH Comput. Archit. News, vol. 35, no. 2, pp. 494–505, Jun. 2007. [Online]. Available: https://doi.org/10.1145/1273440.1250723
[67] O. Weisse, I. Neal, K. Loughlin, T. F. Wenisch, and B. Kasikci, "Nda: Preventing speculative execution attacks at their source," in Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture, ser. MICRO '52. New York, NY, USA: Association for Computing Machinery, 2019, pp. 572–586.
[68] W. Xiong and J. Szefer, "Survey of transient execution attacks," arXiv preprint arXiv:2005.13435, 2020.
[69] M. Yan, J. Choi, D. Skarlatos, A. Morrison, C. Fletcher, and J. Torrellas, "Invisispec: Making speculative execution invisible in the cache hierarchy," in 2018 51st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), 2018, pp. 428–441.
[70] T.-Y. Yeh and Y. N. Patt, "Alternative implementations of two-level adaptive branch prediction," ACM SIGARCH Computer Architecture News, vol. 20, no. 2, pp. 124–134, 1992.
[71] J. Yu, L. Hsiung, M. El'Hajj, and C. W. Fletcher, "Data oblivious isa extensions for side channel-resistant and high performance computing," in The Network and Distributed System Security Symposium (NDSS), 2019.
[72] J. Yu, N. Mantri, J. Torrellas, A. Morrison, and C. W. Fletcher, "Speculative data-oblivious execution: Mobilizing safe prediction for safe and efficient speculative execution," in Proceedings of the ACM/IEEE 47th Annual International Symposium on Computer Architecture (ISCA), 2020, pp. 707–720.
[73] J. Yu, M. Yan, A. Khyzha, A. Morrison, J. Torrellas, and C. W. Fletcher, "Speculative taint tracking (stt): A comprehensive protection for speculatively accessed data," in Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture, 2019.
