Specusym: Speculative Symbolic Execution for Cache Timing Leak Detection

2020 IEEE/ACM 42nd International Conference on Software Engineering (ICSE) SpecuSym: Speculative Symbolic Execution for Cache Timing Leak Detection Shengjian Guo∗ Yueqi Chen∗ Peng Li, Yueqiang Cheng Baidu Security Penn State University Baidu Security [email protected] [email protected] {lipeng28,chengyueqiang}@baidu.com Huibo Wang Meng Wu Zhiqiang Zuo Baidu Security Ant Financial Services Group State Key Lab. for Novel Software [email protected] [email protected] Technology, Nanjing University [email protected] ABSTRACT ACM Reference Format: CPU cache is a limited but crucial storage component in modern Shengjian Guo, Yueqi Chen, Peng Li, Yueqiang Cheng, Huibo Wang, Meng processors, whereas the cache timing side-channel may inadver- Wu, and Zhiqiang Zuo. 2020. SpecuSym: Speculative Symbolic Execution for Cache Timing Leak Detection. In 42nd International Conference on Software tently leak information through the physically measurable timing Engineering (ICSE ’20), May 23–29, 2020, Seoul, Republic of Korea. ACM, New variance. Speculative execution, an essential processor optimiza- York, NY, USA, 13 pages. https://doi.org/10.1145/3377811.3380428 tion, and a source of such variances, can cause severe detriment on deliberate branch mispredictions. Despite static analysis could 1 INTRODUCTION qualitatively verify the timing-leakage-free property under speculative execution, it is incapable of producing endorsements including CPU cache is a limited but crucial storage area on modern processor inputs and speculated flows to diagnose leaks in depth. This work chips. It primarily relieves the speed disparity between the rapid proposes a new symbolic execution based method, SpecuSym, for processors and the slow main memory, by buffering recently used precisely detecting cache timing leaks introduced by speculative data for faster reuse. Cache timing side-channel attacks [26, 47] execution. Given a program (leakage-free in non-speculative exe- leverage the distinct cache physical symptoms, i.e., the cache visit- cution), SpecuSym systematically explores the program state space, ing latencies of various program executions, to penetrate the con- models speculative behavior at conditional branches, and accumu- fidentiality of the victims. On exploiting the vulnerable software lates the cache side effects along with subsequent path explorations. implementations, adversaries can extract the application secrets During the dynamic execution, SpecuSym constructs leak predi- [16, 36, 60, 69], infer the neural network structure [30, 40, 41, 79], cates for memory visits according to the specified cache model or even dump the kernel data [42, 46, 52, 75]. and conducts a constraint-solving based cache behavior analysis to A timing side-channel generally serves as the intermediate car- inspect the new cache behaviors. We have implemented SpecuSym rier through which private data could inadvertently disclose to atop KLEE and evaluated it against 15 open-source benchmarks. observers who can elaborately measure the timing information Experimental results show that SpecuSym successfully detected of certain operations. One particular instance is the cache timing from 2 to 61 leaks in 6 programs under 3 different cache settings side-channel, which leaks data by the variance of the cache visit- and identified false positives in 2 programs reported by recent work. ing latency. State-of-the-art program repair method [77] mitigates cache timing leaks by enforcing constant execution time for all CCS CONCEPTS secret-relevant operations. However, this strong mitigation may still get compromised by the thread-level concurrency [38] or the • Security and privacy → Cryptanalysis and other attacks; instruction-level parallelism like speculative execution [44]. • Software and its engineering → Software verification and Speculative execution [44] is a microarchitectural optimization validation. in modern processors. It primarily increases the CPU instruction pipeline throughput by beforehand scheduling instructions under KEYWORDS predicted branches, which prevents control hazards from stalling Speculative execution, cache, timing, side-channel leak, symbolic the pipeline. Despite its essential importance, the cache side ef- execution fects caused by prediction errors could engender severe detriment through the cache timing side-channel [15, 43, 46, 75]. ∗Both authors contributed equally to this research. Yueqi Chen worked on this project Program analysis for speculative execution is by no means a new while he interned at Baidu USA. research domain. Previous efforts mainly researched safe and effi- cient execution [22, 35, 64], worst-case execution time estimation Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed [50, 51], concurrency bug prediction [21, 49], and Spectre vulner- for profit or commercial advantage and that copies bear this notice and the full citation ability detection [35, 58, 71]. Wu et al. [78] recently proposed a on the first page. Copyrights for components of this work owned by others than ACM dedicated static analysis of timing-leakage-free property under must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a speculative execution. However, this abstract interpretation based fee. Request permissions from [email protected]. method [78] qualitatively answers the yes or no question — it is ICSE ’20, May 23–29, 2020, Seoul, Republic of Korea incapable of generating input and speculative flows to diagnose © 2020 Association for Computing Machinery. ACM ISBN 978-1-4503-7121-6/20/05...$15.00 leaks in depth. Moreover, the over-approximation nature inevitably https://doi.org/10.1145/3377811.3380428 results in false positives, which desires a more precise method. 1235 ICSE ’20, May 23–29, 2020, Seoul, Republic of Korea Guo et al. 1 • The implementation of SpecuSym, which addresses three Speculative Symbolic Execution Modeling major challenges and supports cache timing leak detection Sensitive Input under speculative execution. 3 4 • The evaluation of SpecuSym on 15 open-source benchmarks Leakage-free Symbolic Cache New Timing to demonstrate its effectiveness through revealing from 2 to Program P Execution Behavior Analysis Leak Witnesses 61 leaks under 3 different cache settings. Insensitive 2 The remainder of this paper is organized as follows. Section 2 mo- Plain Input Cache State Modeling tivates our work, and Section 3 reviews the background knowledge. Section 4 states the core contributions and optimizations. Then we conduct experiments in Section 5 and discuss the related work in Figure 1: Overall flow of SpecuSym. Section 6. Finally, we conclude our work in Section 7. 2 MOTIVATION To this end, we propose a new symbolic execution based method, SpecuSym, for detecting cache timing leaks caused by speculative This section motivates our work with an example. By studying its execution. Figure 1 displays the overall flow of SpecuSym. Given a leakage-free cache behavior under non-speculative execution and program P, which is timing-leakage-free in non-speculative execu- the new leaks caused by speculative execution, we position how tion, the sensitive input presented in symbol, and the insensitive SpecuSym should facilitate leak detection. input, SpecuSym leverages symbolic execution to explore P’s state space systematically. Meanwhile, it models speculative execution 2.1 Program P and the Cache Mapping at conditional branches (cf. 1 ) and accumulates cache side effects Figure 2(a) shows a program snippet P whose execution time re- along with subsequent executions (cf. 2 ). Based on a cache model, mains stationary in non-speculative execution but varies in terms SpecuSym constructs leak predicates for memory visits and con- of the sensitive input when running under speculative execution. ducts a constraint-solving based cache behavior analysis (cf. 3 ) to Listed at line 2, P has 4 local variables as S, x, v1, and v2. Op- generate the leak witnesses (cf. 4 ). erating these variables, e.g., the implicit memory read of x (line Our new method has three significant challenges. The first chal- 6) and the explicit store to v1 (line 7), may lead to memory access. lenge comes from the modeling of speculative behaviors. Classic The remaining variable i (line 3) is a register variable that incurs symbolic executors [17, 62] neither support speculative execution no memory access. Also, variable x is the sensitive input, and any nor are cache-aware since they primarily concentrate on the func- form of revealing its value turns to be a leak. tional correctness rather than reasoning the implicit program prop- We use a fully associative cache C for the analysis purpose of erties. The second challenge derives from the cache state mainte- P, as shown in Figure 2(b). It is an extreme case of the N-way nance. Due to the symbolic nature, a symbolic memory address associative cache where the memory address of a variable in P may correspond to multiple concrete addresses. Updating the cache may map to any cache line of C, subjecting to the line availability status after each memory operation unquestionably leads to an and the replacement policy. Here we assume C adopts the Least explosive number of different cache states. The last challenge stems Recently Used (LRU) policy, which always evicts the least used line from the analysis

Load more