
DATA – Differential Address Trace Analysis: Finding Address-based Side-Channels in Binaries Samuel Weiser, Graz University of Technology; Andreas Zankl, Fraunhofer AISEC; Raphael Spreitzer, Graz University of Technology; Katja Miller, Fraunhofer AISEC; Stefan Mangard, Graz University of Technology; Georg Sigl, Fraunhofer AISEC; Technical University of Munich https://www.usenix.org/conference/usenixsecurity18/presentation/weiser

This paper is included in the Proceedings of the 27th USENIX Security Symposium. August 15–17, 2018 • Baltimore, MD, USA ISBN 978-1-939133-04-5

Open access to the Proceedings of the 27th USENIX Security Symposium is sponsored by USENIX.

DATA – Differential Address Trace Analysis: Finding Address-based Side-Channels in Binaries

Samuel Weiser1, Andreas Zankl2, Raphael Spreitzer1, Katja Miller2, Stefan Mangard1, and Georg Sigl2,3

1Graz University of Technology, 2Fraunhofer AISEC, 3Technical University of Munich

Abstract

Cryptographic implementations are a valuable target for address-based side-channel attacks and should, thus, be protected against them. Countermeasures, however, are often incorrectly deployed or completely omitted in practice. Moreover, existing tools that identify information leaks in programs either suffer from imprecise abstraction or only cover a subset of possible leaks. We systematically address these limitations and propose a new methodology to test software for information leaks.

In this work, we present DATA, a differential address trace analysis framework that detects address-based side-channel leaks in program binaries. This accounts for attacks exploiting caches, DRAM, branch prediction, controlled channels, and likewise. DATA works in three phases. First, the program under test is executed to record several address traces. These traces are analyzed using a novel algorithm that dynamically re-aligns traces to increase detection accuracy. Second, a generic leakage test filters differences caused by statistically independent program behavior, e.g., randomization, and reveals true information leaks. The third phase classifies these leaks according to the information that can be obtained from them. This provides further insight to security analysts about the risk they pose in practice.

We use DATA to analyze OpenSSL and PyCrypto in a fully automated way. Among several expected leaks in symmetric ciphers, DATA also reveals known and previously unknown leaks in asymmetric primitives (RSA, DSA, ECDSA), and DATA identifies erroneous bug fixes of supposedly fixed constant-time vulnerabilities.

1 Introduction

Side-channel attacks infer sensitive information, such as cryptographic keys or private user data, by monitoring inadvertent information leaks of computing devices. Cryptographic implementations are a valuable target for various side-channel attacks [11, 45, 77], as a successful attack undermines cryptographic security guarantees. Especially software-based microarchitectural attacks (e.g., cache attacks, DRAM attacks, branch-prediction attacks, and controlled-channel attacks) are particularly dangerous since they can be launched from software and, thus, without the need for physical access. Many of these software-based attacks exploit address-based information leakage to recover cryptographic keys of symmetric [6, 36] or asymmetric [28, 87] primitives.

Various countermeasures against address-based information leakage have been proposed on an architectural level [52, 62, 81]. However, these require changing the hardware, which prohibits fast and wide adoption. A more promising line of defense is software countermeasures, which remove address-based information leaks by eliminating key-dependent memory accesses to data and code. For example, data leakage can be thwarted by means of bit-slicing [43, 47, 66], and control-flow leakage by unifying the control flow [21]. Even though software countermeasures are already well studied, in practice their adoption to crypto libraries is often partial, error-prone, or non-transparent, as demonstrated by recent attacks on OpenSSL [27, 28, 88].

To address these issues, leakage detection tools have been developed that allow developers and security analysts to identify address-based side-channel vulnerabilities. Most of these tools, however, primarily focus on cache attacks and can be classified into static and dynamic approaches. Many static analysis methods use abstract interpretation [24, 25, 48, 57] to give upper leakage bounds, ideally proving the absence of information leaks in already secured implementations, e.g., the evaluation of Salsa20 [24]. However, these approaches struggle to accurately describe and pinpoint information leaks due to over-approximation [24, page 443], rendering leakage bounds meaningless in the worst case. Moreover, their approximations of the program's data plane fundamentally prohibit the analysis of interpreted code.

In contrast, dynamic approaches [41, 84, 89] focus on concrete program executions to reduce false positives. Contrary to static analysis, dynamic analysis cannot prove the absence of leakage without exhaustive input search, which is infeasible for large input spaces. However, in the case of cryptographic algorithms, testing a subset of inputs is enough to encounter information leaks with a high probability, because crypto primitives heavily diffuse the secret input during processing. Thus, there is a fundamental trade-off between static analysis (minimizing false negatives) and dynamic analysis (minimizing false positives).

We aim for a pragmatic approach towards minimizing false positives, allowing developers to identify information leaks in real-world applications. Thus, we focus on dynamic analysis and tackle the limitations of existing tools. In particular, existing tools either focus on control-flow leaks or data leaks, but not both at the same time [80, 89]; they consider the strongest adversary to observe cache-line accesses only [41], which is too coarse-grained in light of recent attacks (CacheBleed [88]); many of them lack the capability to properly filter program activity that is statistically independent of secret input [50, 80, 84]; and most do not provide any means to further assess the severity of information leaks, i.e., the risk they bring and the urgency with which they must be fixed. Based on these shortcomings, we argue that tools designed to identify address-based information leaks must tackle the following four challenges:

1. Leakage origin: Detect the exact location of data and control-flow leaks in programs on byte-address granularity instead of cache-line granularity.
2. Detection accuracy: Minimize false positives, e.g., caused by non-determinism that is statistically independent of the secret input, and provide reasonable strategies to also reduce false negatives.
3. Leakage classification: Provide means to classify leaks with respect to the information gained by an adversary.
4. Practicality: Report information leaks (i) fully automated, i.e., without requiring manual intervention, (ii) using only the program binary, i.e., without requiring the source code, and (iii) efficiently in terms of performance.

In this work, we tackle these challenges with differential address trace analysis (DATA), a methodology and tool to identify address-based information leaks in application binaries. DATA is intended to be a companion during testing and verification of security-critical software.[1] It targets programs processing secret input, e.g., keys or passwords, and reveals dependencies between the secret and the program execution. Every leak that DATA identifies in a program is potentially exposed to side-channel attacks. DATA works in three phases.

Difference Detection: The first phase generates noiseless address traces by executing the target program with binary instrumentation. It identifies differences in these traces on a byte-address granularity. This accounts for all address-based side-channel attacks such as cache attacks [61, 64, 87], DRAM attacks [65], branch-prediction attacks [1], controlled-channel attacks [86], and many blackbox timing attacks [11].

Leakage Detection: The second phase tests data and control-flow differences for dependencies on the secret input. A generic leakage test compares the address traces of (i) a fixed secret input and (ii) random secret inputs. If the traces differ significantly, the corresponding data or control-flow differences are labeled as secret-dependent leaks. This minimizes false positives and explicitly addresses non-deterministic program behavior introduced by blinding or probabilistic constructions, for example.

Leakage Classification: The third phase classifies the information leakage of secret-dependent data and control-flow differences. This is achieved with specific leakage tests that find linear and non-linear relations between the secret input and the address traces. These leakage tests are a valuable tool for security analysts to determine the severity and exploitability of a leak.

We implement DATA in a fully automated evaluation tool that allows analyzing large software stacks, including initialization operations, such as key loading and parsing, as well as cryptographic operations. We use DATA to analyze OpenSSL and PyCrypto, confirming existing and identifying new vulnerabilities. Among several expected leaks in symmetric ciphers (AES, Blowfish, Camellia, CAST, Triple DES, ARC4), DATA also reveals known and previously unknown leaks in asymmetric primitives (RSA, DSA, ECDSA) and identifies erroneous bug fixes of supposedly resolved vulnerabilities.

Outline. The remainder of this paper is organized as follows. In Section 2, we discuss background information and related work. In Section 3, we present DATA on a high level. In Sections 4–6 we describe the three phases of DATA. In Section 7, we give implementation details. In Section 8, we evaluate DATA on OpenSSL and PyCrypto. In Section 9, we discuss possible leakage mitigation techniques. Finally, we conclude in Section 10.

[1] DATA is open-source and can be retrieved from https://github.com/Fraunhofer-AISEC/DATA.

2 Background and Related Work

2.1 Microarchitectural Attacks

Microarchitectural side-channel attacks rely on the exploitation of information leaks resulting from contention for shared hardware resources. Especially microarchitectural components such as the CPU cache, the DRAM,

and the branch prediction unit, where contention is based on memory addresses, enable powerful attacks that can be conducted from software only. For instance, attacks exploiting the different memory access times to CPU caches (aka cache attacks) range from timing-based attacks [11] to more fine-grained attacks that infer accesses to specific memory locations [61, 64, 87]. Likewise, DRAM row buffers have been used to launch side-channel attacks [65] by exploiting row buffer conflicts of different memory addresses. Also, the branch prediction unit has been exploited to attack OpenSSL RSA implementations [1]. Xu et al. [86] demonstrated a new class of attacks on shielded execution environments like Intel SGX, called controlled-channel attacks. They enable noise-free observations of memory access patterns on a page granularity. For a detailed overview on microarchitectural attacks we refer to recent survey papers [29, 76].

2.2 Detection of Information Leaks

2.2.1 Terminology

We consider a program secure if it does not contain address-based information leaks. We distinguish between data and control-flow leakage. Data leakage occurs if accessed memory locations depend on secret inputs. Control-flow leakage occurs if code execution depends on secret inputs. We further distinguish between deterministic and non-deterministic programs. The latter include any kind of non-determinism such as randomization of intermediates (blinding) or results (probabilistic constructions). A false positive denotes an identified information leak that is in fact none. A false negative denotes an information leak which was not identified.

2.2.2 Blackbox Timing Leakage Detection

These techniques measure the execution time of implementations for different classes of inputs and rely on statistical tests to infer whether or not the implementation leaks information [23]. Reparaz et al. [67] use Welch's t-test [83] to identify vulnerable cryptographic implementations. More advanced approaches use symbolic execution to give upper leakage bounds [63]. However, these approaches fall short for more fine-grained address-based attacks such as cache attacks.

2.2.3 Address-based Leakage Detection

We distinguish between static and dynamic approaches.

Static Approaches. Well-established static approaches are CacheAudit [24, 48] and follow-up works [25, 57], which symbolically evaluate all program paths. Rather than pinpointing the leakage origin, CacheAudit accumulates potential leakage into a single metric, which represents an upper bound on the maximum leakage possible. While a zero leakage bound guarantees absence of address-based side channels, a non-zero leakage bound could become rather imprecise (false positives) due to abstractions made on the data of the program. Abstraction also fundamentally prohibits analysis of interpreted code as it is encoded in the data plane of the interpreter.

Dynamic Approaches. Dynamic analysis relies on concrete program executions, which possibly introduce false negatives. Ctgrind [50] propagates secret memory throughout the program execution to detect its usage in conditional branches or memory accesses. However, ctgrind suffers from false positives as well as false negatives [4]. In contrast, Stacco [84] records address traces and analyzes them with respect to Bleichenbacher attacks [15], for which finding a single control-flow leak suffices. Stacco does not consider data leakage, and they do not consider reducing false negatives, i.e., finding multiple control-flow leaks within the traces. If they did, they would suffer from false positives due to improper trace alignment (they use the Linux diff tool). None of the above approaches supports specific leakage models to further assess the information leak. Zankl et al. [89] analyze modular exponentiations under the Hamming weight model, but they do not consider other leakage models and only detect control-flow leaks.

Combined Approaches. CacheD [80] combines dynamic trace recording with static analysis, introducing both false negatives and false positives. They symbolically execute only instructions that might be influenced by the secret key. Since they only analyze a single execution, they miss leakage in other execution paths. Moreover, they do not model control-flow leaks.

Attack-based Approaches. These are dynamic approaches that conduct specific attacks but do not generalize to other attacks. For instance, Brumley and Hakala [19] as well as Gruss et al. [36] suggested to detect implementations vulnerable to cache attacks by relying on template attacks. Irazoqui et al. [41] use cache observations and a mutual information metric to identify control-flow and data leaks. Basu et al. [9] and Chattopadhyay et al. [20] quantify the information leakage in cache attacks.

Orthogonal Work. Other approaches analyze source code [14], which does not account for compiler-introduced information leaks or platform-specific behavior (cf. [4]). Yet others demand source-code annotations [4, 5, 7] or specify entirely new languages [16]. While they can prove absence of leakage for already secured code, they struggle to pinpoint leaks in vulnerable code. In contrast, DATA is designed to find and pinpoint leakage in insecure, unannotated programs. After mitigating leakage found by DATA, absence of leakage could be proven using [4, 5, 7, 16, 25].
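Before turning to DATA itself, the fixed-vs-random idea underlying its generic leakage test (phase two, cf. Section 1) can be sketched in a few lines of Python. This is a deliberate simplification: a set comparison stands in for a proper statistical test, and the toy program, addresses, and helper names are illustrative assumptions, not part of DATA.

```python
# Hypothetical sketch of a fixed-vs-random leakage test: a difference is
# kept only if the addresses observed at its instruction (ip) vary with
# the secret. Real tools use statistical tests; a set comparison suffices
# for this toy example.
def addresses_at(traces, ip):
    """All data addresses the instruction at `ip` accessed."""
    return {d for t in traces for (i, d) in t if i == ip}

def secret_dependent(fixed_traces, random_traces, ip):
    # Fixed secret: the address set collapses to a single value.
    # Random secrets: a secret-dependent access spreads over many values.
    return addresses_at(random_traces, ip) != addresses_at(fixed_traces, ip)

def run(key):
    """Toy target: ip 17 indexes a table with the key; ip 20 does not."""
    return [(17, 0x100 + key % 16), (20, 0x200)]

fixed = [run(7) for _ in range(10)]      # traces for one fixed secret input
rnd = [run(k) for k in (3, 7, 12, 14)]   # traces for varying secret inputs
print(secret_dependent(fixed, rnd, 17))  # → True  (key-dependent access)
print(secret_dependent(fixed, rnd, 20))  # → False (key-independent access)
```

The key-independent access at ip 20 is filtered out even though it appears in every trace, which is the point of the test.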

Table 1: Comparison of leakage detection tools.

Tool               | Approach                | Finest granularity | Output        | Key dependency    | Source code required
CacheAudit [24]    | Static analysis         | Cache line         | Leakage bound |                   | no
CacheAudit 2 [25]  | Static analysis         | Byte address       | Leakage bound |                   | no
CacheD [80]        | Combined                | Cache line         | Leak origin   |                   | no
ctgrind [50]       | Dynamic                 | Byte address       | Leak origin   |                   | yes
Stacco [84] (a)    | Dynamic (trace-based)   | Byte address       | Leak origin   |                   | no
MI-Tool [41]       | Dynamic (attack-based)  | Cache line         | Leak origin   | generic           | yes
Zankl et al. [89]  | Dynamic (trace-based)   | Byte address       | Leak origin   | HW                | no
DATA               | Dynamic (trace-based)   | Byte address       | Leak origin   | generic, HW, etc. | no

(a) Only the first control-flow leak is reliably identified. Reporting multiple leaks could cause false positives.

2.3 Improvement Over Existing Tools

By addressing the identified challenges in Section 1, DATA overcomes several shortcomings of existing approaches, as shown in Table 1.

Leakage Origin. DATA follows a dynamic trace-based approach to identify both control-flow and data leakage on byte-address granularity. This avoids wrong assumptions about attackers, e.g., only observing memory accesses at cache-line granularity [24, 41, 80], which were disproved by more advanced attacks [1, 88]. Nevertheless, identifying information leaks on a byte granularity still allows mapping them to more coarse-grained attacks.

Detection Accuracy. Static approaches like CacheAudit suffer from false positives. In contrast, DATA filters key-independent differences with a high probability, thereby reducing false positives even for non-deterministic program behavior. However, as with all dynamic approaches, DATA could theoretically miss leakage that is not triggered during execution. Nevertheless, we found that in practice few traces already suffice, e.g., ≤ 10 for asymmetric algorithms and ≤ 3 for symmetric algorithms, due to the high diffusion provided by these algorithms. Although without formal guarantee, this gives evidence that DATA reduces false negatives successfully. Compared to others, we take multiple measures to reduce false negatives in DATA. In contrast to CacheD and ctgrind, we analyze several execution paths. Compared to Stacco, which has improper trace alignment, we report all leaks visible in the address traces. Contrary to MI-Tool, we do not only focus on a specific attack technique (e.g., cache attacks). In contrast to Zankl et al. [89], we can detect generic key dependencies. This advantage is indicated in Table 1.

Leakage Classification. While Zankl et al. [89] use the Hamming weight (HW) model only, DATA allows testing for various leakage models as well as defining new ones. Besides pinpointing the information leaks, this represents valuable information to determine key dependencies in the identified information leaks.

Practicality. DATA analyzes information leaks fully automatically. It does so on the program binary without the need for source code, allowing analysis of proprietary software. As will be outlined in our evaluation, we achieve competitive performance, support analysis of large software stacks and even interpreted code (PyCrypto and CPython), and DATA is open source.

3 Differential Address Trace Analysis

DATA is a methodology and a tool to identify address-based information leaks in program binaries.

Threat Model. To cover a wide variety of possible attacks, we consider a powerful adversary who attempts to recover secret information from side-channel observations. In practice, attackers will likely face noisy observations because side channels typically stem from shared resources affected by noise from system load. Also, practical attacks only monitor a limited number of addresses or memory blocks. For DATA, we assume that the attacker can accurately observe full, noise-free address traces. More precisely, the attacker does not only learn the sequence of instruction pointers [59], i.e., the addresses of instructions, but also the addresses of the operands that are accessed by each instruction. This is a strong attacker model that covers many side-channel attacks targeting the processor (e.g., branch prediction) and the memory hierarchy (e.g., various CPU caches, prefetching, DRAM). A strong model is preferable here to detect as many vulnerabilities as possible. In line with [35], we consider defenses, such as address space layout randomization (ASLR) and code obfuscation, as ineffective against powerful attackers.

Limitations. DATA covers software side channels of components that operate on address information only, e.g., cache prefetching and replacement, and branch prediction. In contrast, the recent Spectre [44] and Meltdown [51] bugs exploit not only address information but actual data which is speculatively processed but insufficiently isolated across different execution contexts. In these attacks, sensitive data spills over to the address bus. These hardware bugs cannot be detected by analyzing software binaries with tools listed in Table 1.
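The leakage-model tests mentioned under Leakage Classification can be illustrated with a short sketch: it checks for a linear relation between the Hamming weight (HW) of the secret and the address observed at a single leak. The synthetic "leak" addresses and all helper names are our illustration, not DATA's actual tests or API.

```python
# Illustrative sketch of a leakage-model test: correlate a model of the
# secret (here: Hamming weight) with the address observed at one leak.
def hamming_weight(x):
    return bin(x).count("1")

def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

keys = list(range(1, 64))
model = [hamming_weight(k) for k in keys]
# A leak whose address is linear in HW(k): correlation is 1.
addr_hw = [0x400 + 4 * hamming_weight(k) for k in keys]
# Accesses essentially unrelated to the model: correlation stays low.
addr_other = [0x800 + (k % 2 == 0) for k in keys]
print(round(pearson(model, addr_hw), 3))        # → 1.0
print(abs(pearson(model, addr_other)) < 0.5)    # → True
```

A high correlation under a leakage model tells the analyst what an attacker learns (here, the HW of the key) from observing the leaking addresses; non-linear relations would require other specific tests.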

Figure 1: Overview of differential address trace analysis (DATA).

While software-only defenses exist for specific CPU models [22, 78], a generic solution should fix the hardware.

Methodology. DATA consists of three phases, the difference detection phase, the leakage detection phase, and the leakage classification phase, as depicted in Figure 1.

In the difference detection phase, we execute the target program multiple times with varying secret inputs and record all accessed addresses with dynamic binary instrumentation in so-called address traces. Thereby, we ensure to capture both control-flow and data leakage at their exact origin. The recorded address traces are then compared and address differences are reported.

The leakage detection phase verifies whether reported address differences are actually secret-dependent and filters all that are statistically independent. For this step, the program is repeatedly executed with one fixed secret input and a set of varying (random) secret inputs. In contrast to the previous phase, only the initially reported differences need to be monitored. The address traces belonging to the fixed input are then compared to those of the random inputs using a generic leakage test. Statistical differences are reported as true information leaks.

The leakage classification phase helps security analysts to assess the severity of previously confirmed leaks. This is done with specific leakage tests that find linear or non-linear relations between a given secret input and the previously recorded address traces. Such relations are formulated as so-called leakage models, e.g., the Hamming weight model. If a relation is found, the corresponding leakage model defines the information an attacker can learn about the secret input by observing memory accesses to the identified addresses. All detected relations are included in the final leakage report.

Relation to Similar Concepts. The idea of DATA is similar to differential power analysis (DPA) [46], which works on power traces. However, power traces are often noisy due to measurement uncertainty and the underlying physics. Hence, DPA often requires several thousand measurements, and non-constant-time implementations demand heavy pre-processing to correctly align power traces [55]. In contrast, address traces are noise-free, which minimizes the number of required measurements and allows perfect re-alignment for non-constant-time traces (due to control-flow leaks).

DATA is also related to differential computation analysis (DCA) [17]. DCA relies on software execution traces to attack white-box crypto implementations. While DCA is conceptually similar to DATA, DCA attacks (white-box model) consider a much stronger adversary who can read the actual content of accessed memory locations.

4 Difference Detection Phase

We now introduce address-based information leaks and discuss the steps to identify them, namely recording of address traces and finding differences within the traces.

Notation. DATA analyzes a program binary P with respect to address leakage of secret input k. Let P(k) denote the execution of a program with controllable secret input k. We write t = trace(P(k)) to record a trace of accessed addresses during program execution. We define an address trace t = [a0, a1, a2, a3, ...] as a sequence of executed instructions, augmented with memory addresses. For instructions operating on CPU registers, ai = ip holds the current instruction pointer ip. In case of memory operations, ai = (ip, d) also holds the accessed memory address d. Information leaks appear as differences in address traces. We develop an algorithm diff(t1, t2) that, given a pair of traces (t1, t2), identifies all differences. If the traces are equal, diff(t1, t2) = ∅. A deterministic program P is leakage free if and only if no differences show up for any pair of secret inputs (ki, kj):

    ∀ki, kj : diff(trace(P(ki)), trace(P(kj))) = ∅    (1)

4.1 Address-based Information Leakage

Data leakage is characterized by one and the same instruction (ip) accessing different memory locations (d). Consider the code snippet in Listing 1, assuming line numbers equal code addresses. Execution with two different keys keyA = [10, 11, 12] and keyB = [16, 17, 18] yields two address traces tA = trace(P(keyA)) and tB = trace(P(keyB)), with differences marked bold:

    tA = [0, 18, 19, (17,1), 20, (17,11), 21, (17,12), 22, (17,13), 23]
    tB = [0, 18, 19, (17,1), 20, (17,01), 21, (17,02), 22, (17,03), 23]

The function 'transform' leaks the argument kval, which is used as index into the array LUT (line 17).

 0 program entry: call process with user-input
 1 unsigned char LUT[16] = { 0x52,
 2    0x19, ...
16    0x37 };
17 int transform(int kval) { return LUT[kval%16]; }
18 int process(int key[3]) {
19    int val = transform(0);
20    val += transform(key[0]);
21    val += transform(key[1]);
22    val += transform(key[2]);
23    return val;
   }

Listing 1: Table look-up causing data leak.

Since the base address of LUT is 1, this operation leaks memory address kval + 1. The first call to transform (line 19) with kval = 0 results in a1 = (17,1). Subsequent calls (lines 20–22) leak sensitive key bytes. The differences in the traces (marked bold) reveal key dependencies.

To accurately report data leakage and to distinguish non-leaking cases (line 19) from leaking cases (lines 20–22), we take the call stack into account. We formalize data leaks as tuples (ip, cs, ev) of the leaking instruction ip, its call stack cs, and the evidence ev. The call stack is a list of caller addresses leading to the leaking function. For example, the first leak has the call stack cs = [0, 20]. The evidence is a set of leaking data addresses d. The larger the evidence set, the more information leaks. For example, ev = {11, 01} for the first leak, ev = {12, 02} for the second one, etc. Our diff algorithm would report:

    diff(tA, tB) = {(17, [0,20], {11,01}),
                    (17, [0,21], {12,02}),
                    (17, [0,22], {13,03})}

 0 program entry: call exp with user-input
 1 function exp(key, *p) {
 2    foreach (bit b in key)
 3       if (b)
 4          mul(r, p);
         else
 5          mul(t, p);
 6    return r;
   }
   function mul(*a, *b) {
 7    tmpA = *a;
 8    tmpB = *b;
      // calculate res = tmpA * tmpB
 9    *a = res;
   }

Listing 2: Branch causing control-flow leak.

Control-flow leakage is caused by key-dependent branches. Consider the exponentiation in Listing 2, executed with two keys kA = 4 = 100b and kB = 7 = 111b. This yields the following address traces, where R, P, and T denote the data addresses of the variables r, p, and t:

    trace(P(kA)) = tA = [0,1,2,3,4,(7,R),(8,P),(9,R),
                         2,3,5,(7,T),(8,P),(9,T),
                         2,3,5,(7,T),(8,P),(9,T),2,6]
    trace(P(kB)) = tB = [0,1,2,3,4,(7,R),(8,P),(9,R),
                         2,3,4,(7,R),(8,P),(9,R),
                         2,3,4,(7,R),(8,P),(9,R),2,6]

There are two differences in the traces, both marked bold. The differences occur due to the if in line 3, which branches to line 4 or 5, depending on the key bit b, and causes the operations in lines 7 and 9 to be done either on the intermediate variable r or a temporary variable t.

A control-flow leak is characterized by its branch point, where the control flow diverges, and its merge point, where the branches coalesce again. In this example, the branch point is at line 3 and the merge point at line 2, when the next loop iteration starts. We model control-flow leaks as tuples (ip, cs, ev) of branch point ip, call stack cs, and evidence ev. For example, both differences occur at the same call stack cs = [0]. Hence, they are reported as the same leak. The evidence is a set of sub-traces corresponding to the two branches. Our diff algorithm would report:

    diff(tA, tB) = {(3, [0], {[4,(7,R),(8,P),(9,R)],
                              [5,(7,T),(8,P),(9,T)]})}

4.2 Recording Address Traces

We execute the program on a dynamic binary instrumentation (DBI) framework, namely Intel Pin [54], and store the accessed code and data addresses in an address trace. To execute the program in a clean and noise-free environment, we disable ASLR and keep public inputs (e.g., command line arguments, environment variables) to the program fixed. As shown in Figure 1, we repeat this multiple times with varying inputs, causing address leaks to show up as differences in the address traces.

The concept of DATA is agnostic to concrete trace recording tools and, hence, could also rely on other tools [71] or hardware acceleration like Intel Processor Trace (IPT) [39]. Since the recording time is small compared to trace analysis, we did not investigate other tools.

4.3 Finding Trace Differences

The trace comparison algorithm (diff) in Algorithm 1 sequentially scans a pair of traces (tA, tB) for address differences, while continuously re-aligning the traces in the same pass. Whenever ip values match but data addresses (d) do not, a data difference is detected (lines 4–6).

608 27th USENIX Security Symposium USENIX Association Algorithm 1: Identifying address trace differences (diff). > 0 (lines 7–8). The functions isCall(a) and isRet(a) re-

Algorithm 1: diff
  input : tA, tB ... the two traces
  output: rep ... the report of all differences
   1  rep = ∅, i = 0, j = 0
   2  while i < |tA| ∧ j < |tB| do
   3      a = tA[i], b = tB[j]
   4      if a.ip = b.ip then
   5          if a.d ≠ b.d then
   6              rep = rep ∪ report_data_diff(tA, tB, i, j)
   7          end
   8          i++, j++
   9      else
  10          rep = rep ∪ report_cf_diff(tA, tB, i, j)
  11          (i, j) = find_merge_point(tA, tB, i, j)
  12      end
  13  end
  14  return rep

Algorithm 2: find_merge_point
  input : tA, tB ... the two traces
  input : i, j ... the trace indices of the branch point
  output: k, l ... the indices of the merge point
   1  k = i, l = j, CA = 0, CB = 0, SA = ∅, SB = ∅
   2  while k < |tA| ∧ l < |tB| do
   3      if isCall(tA[k]) then CA++
   4      if isRet(tA[k])  then CA--
   5      if isCall(tB[l]) then CB++
   6      if isRet(tB[l])  then CB--
   7      if CA ≤ 0 then SA = SA ∪ tA[k].ip
   8      if CB ≤ 0 then SB = SB ∪ tB[l].ip
   9      M = SA ∩ SB
  10      if M ≠ ∅ then
  11          k = find(tA[i...k], M)
  12          l = find(tB[j...l], M)
  13          return (k, l)
  14      end
  15      if CA ≥ 0 then k++
  16      if CB ≥ 0 then l++
  17  end
  18  error "No merge point found"

Control-flow differences occur when ip differs (lines 9–11). Differences are reported via report_data_diff and report_cf_diff in the format specified in Section 4.1.

Trace Alignment. For control-flow differences, it is crucial to determine the correct merge points, as done by Algorithm 2. Starting from the branch point, it sequentially scans both traces, extending two sets SA and SB (lines 7–8) with the scanned instructions. If their intersection M becomes non-empty (lines 9–10), M holds the merge point's ip. We then determine the first occurrence of M in both branches using find (lines 11–12) and realign the traces before proceeding (Algorithm 1, line 11).

Context-Sensitivity. Since control-flow leaks could incorporate additional function calls (e.g., function mul in Listing 2), we need to exclude those from the merge point search. Therefore, we maintain the current calling depth in counters CA and CB (lines 3–6) and skip calling depths larger than zero when extending SA and SB (lines 7–8). The functions isCall and isRet return true iff the assembler instruction at address a.ip is a function call or return, respectively. If the calling depth drops below zero, the trace returned to the function's call-site. We stop scanning this trace (lines 15–17) and wait for the other trace to hit a merge point.

Our context-sensitive alignment also works for techniques like retpoline [78] that aim to prevent Spectre attacks, since they just add additional call/ret layers. Code directly manipulating the stack pointer (return stack refill [78], setjmp/longjmp, exceptions, etc.) could be supported by detecting such stack pointer manipulations alongside calls and rets.

Comparison to Related Work. Trace alignment has been studied before as the problem of correspondence between different execution points. Several approaches for identifying execution points exist [74]. Instruction counter based approaches [58] uniquely identify points in one execution but fail to establish a correspondence between different executions. Using calling contexts as a correspondence metric could introduce temporal ambiguity in distinguishing loop iterations [75]. Xin et al. [85] formalize the problem of relating execution points across different executions as execution indexing (EI). They propose structural EI (SEI), which uses taken program paths for indexing but could lose comprehensiveness by mismatching execution points that should correspond [74]. Other approaches combine call stacks with loop counting to avoid problems of ambiguity and comprehensiveness [74]. Many demand recompilation [74, 75, 85], which prohibits their usage in our setting. Specifically, EI requires knowledge of post-dominators, typically extracted from control-flow graphs (CFGs) [30], which are not necessarily available (e.g., obfuscated binaries or dynamic code generation). Using EI, Johnson et al. [42] align traces in order to propagate differences back to their originating input. We use a similar intuition as Johnson et al. in processing and aligning traces in a single pass, however, without the need to make program execution indices explicit. By constantly re-aligning traces, we inherently maintain correspondence of execution points. Our set-based approach does not require CFG or post-dominator information. In contrast to EI, we do not explicitly recover loops. This could cause imprecision when merging control-flow leaks embedded within loops. If the two branches are significantly asymmetric in length, we might match multiple shorter loop iterations against one longer iteration, thus introducing an artificial control-flow leak (false positive) when one branch leaves the loop while the other does not. Should such leaks occur, they would be dismissed as key-independent in phase two. Note that correspondence (correct alignment) would be automatically restored as soon as both branches leave the loop.
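As an illustration, Algorithms 1 and 2 can be sketched in Python. The trace-entry layout (instruction pointer, accessed data address, call/ret marker) and the report tuples are assumptions made for this sketch, not DATA's actual implementation, which operates on recorded Pin traces.

```python
from collections import namedtuple

# Illustrative trace entry: instruction pointer, accessed data address
# (None if not a memory access), and kind ("call", "ret", or "").
Entry = namedtuple("Entry", ["ip", "data", "kind"])

def find_merge_point(tA, tB, i, j):
    """Algorithm 2: scan forward from the branch point at (i, j) until both
    traces share an instruction address at the original calling depth."""
    k, l, cA, cB = i, j, 0, 0
    sA, sB = set(), set()
    advA = advB = True
    while k < len(tA) and l < len(tB):
        if advA:                                    # track depth, extend SA
            cA += {"call": 1, "ret": -1}.get(tA[k].kind, 0)
            if cA <= 0: sA.add(tA[k].ip)
        if advB:                                    # track depth, extend SB
            cB += {"call": 1, "ret": -1}.get(tB[l].kind, 0)
            if cB <= 0: sB.add(tB[l].ip)
        m = sA & sB                                 # merge point found?
        if m:  # realign both traces at the first occurrence of the merge point
            k = next(x for x in range(i, k + 1) if tA[x].ip in m)
            l = next(x for x in range(j, l + 1) if tB[x].ip in m)
            return k, l
        advA, advB = cA >= 0, cB >= 0               # freeze a trace that has
        if advA: k += 1                             # returned past the branch
        if advB: l += 1                             # point's call site
        if not (advA or advB):
            break
    raise RuntimeError("no merge point found")

def diff(tA, tB):
    """Algorithm 1: report data and control-flow differences of two traces."""
    rep, i, j = [], 0, 0
    while i < len(tA) and j < len(tB):
        a, b = tA[i], tB[j]
        if a.ip == b.ip:
            if a.data != b.data:                    # same instruction, other address
                rep.append(("data", a.ip, a.data, b.data))
            i, j = i + 1, j + 1
        else:                                       # diverging control flow
            rep.append(("cf", a.ip, b.ip))
            i, j = find_merge_point(tA, tB, i, j)
    return rep

# Two traces that diverge at ip 2 vs. 3 and re-join at ip 5:
tA = [Entry(1, None, ""), Entry(2, 0x10, ""), Entry(5, None, ""), Entry(6, None, "")]
tB = [Entry(1, None, ""), Entry(3, None, ""), Entry(4, None, ""),
      Entry(5, None, ""), Entry(6, None, "")]
print(diff(tA, tB))   # [('cf', 2, 3)]
```

A single control-flow difference is reported at the branch point, and comparison resumes at the merge point, mirroring the re-alignment of Algorithm 1, line 11.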

USENIX Association 27th USENIX Security Symposium 609

This is not a fundamental limitation of DATA, as other trace alignment methods could be implemented as well.

Combining Results. We run our diff algorithm pairwise on all recorded traces and accumulate the results in an intermediate report. Testing multiple traces helps capture nested leakage, that is, leakage which appears conditionally, depending on which branches are taken in a superordinate control-flow leak. Nested leakage would remain hidden when testing trace pairs which either take the wrong superordinate branch or exercise both branches.

5 Leakage Detection Phase

We implement a generic leakage test to reduce the number of false positives in case of (randomized) program behavior and events that are statistically independent of the secret input. The program is repeatedly executed with one fixed secret input and a set of random secret inputs. If the distributions of accessed addresses in these two sets can be distinguished, the corresponding address differences are marked as secret-dependent. A challenge that arises during this generic leakage test is that false negatives might occur if the fixed input is particularly similar to the average random case. We address this challenge by repeating the generic leakage test with multiple distinct fixed inputs and merging the results in the end. We introduce an appropriate leakage-evidence representation to compare distributions of accessed addresses.

5.1 Evidence Representation

We unify the representation of both data and control-flow evidences in so-called evidence traces. These traces hold a time-ordered sequence of memory addresses that a particular instruction accesses during one program execution. Note the difference to evidence sets used in Section 4.1, which are computed over multiple program executions. Evidence traces contain all essential information exploited in practical attacks, such as how often an address is accessed [11, 45] and also when, i.e., at which position an address is accessed in the trace [87].

Recording. Similar to the difference detection phase, the target program is executed to gather address traces. This time, however, we only monitor the previously detected differences, which significantly reduces trace sizes and instrumentation time. For each instruction that caused address differences in the first phase, we gather individual evidence traces. Addresses accessed in case of data differences are written to the trace in chronological order. For control-flow differences, the branch target addresses taken at the branch points are written to the evidence trace, again in chronological order.

Building Histograms. As we execute the target program with multiple inputs, we accumulate the evidence traces of the same instruction in a two-dimensional histogram, as depicted in Figure 2. The y-axis contains the addresses accessed by the instruction, r0 and r1 in this case. The x-axis specifies their positions in the trace. A single evidence trace, e.g., [r0,r1,r1,r0,r0,r1], would appear as dots in the x-y plane. When aggregating multiple traces, the z-axis accumulates all dots into bars, specifying the overall number of accesses for each address and position.

[Figure 2: Histogram Hfull over evidence traces. (Axes: Positions × Addresses (r0, r1) × Accesses.)]

This histogram, named Hfull, fully captures the characteristics of evidence traces, namely when and how often addresses are accessed. The downside of Hfull is that a large number of traces is required to accurately estimate it. This would prolong the leakage detection phase and increase storage requirements. We therefore use two simplified histograms, each of which captures one characteristic of Hfull. The first one, Haddr, tracks the total number of accesses per address, thus collapsing the x-axis and omitting time information. The second one collapses the y-axis and counts the total number of accesses per position. This omits address information and is comparable to counting the length of evidence traces. Observe that counting the length of evidence traces equals the (negative) difference between consecutive positions. We therefore define Hpos as counting the length of evidence traces. We illustrate how Haddr and Hpos are compiled with the following example of three evidence traces:

    ev0 = [r1, r2],   ev1 = [r3, r3, r2, r3, r1],   ev2 = [r2, r1, r2]

Haddr contains one entry per address, counting how often each address occurs in the traces. Thus, Haddr = [3,4,3] for addresses [r1,r2,r3]. Hpos records the lengths of the traces, which yields Hpos = [0,1,1,0,1] for lengths 1 to 5. For illustration purposes, counting the number of accesses per position would yield [3,3,2,1,1,0] for positions 1 to 6. The (negative) differences between the positions are [0,1,1,0,1], which is exactly Hpos.

Implications. While the use of Haddr and Hpos reduces the measurement effort, we might miss leaks that only show up in Hfull.
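The compilation of Haddr and Hpos just described can be sketched as follows, using the example traces ev0–ev2 with addresses encoded as strings (a minimal illustration, not DATA's implementation):

```python
from collections import Counter

# Evidence traces ev0-ev2 from the example above.
traces = [["r1", "r2"],
          ["r3", "r3", "r2", "r3", "r1"],
          ["r2", "r1", "r2"]]

# Haddr: total number of accesses per address (collapses the time axis).
haddr = Counter(addr for ev in traces for addr in ev)
print([haddr[a] for a in ["r1", "r2", "r3"]])   # [3, 4, 3]

# Hpos: number of evidence traces of each length (lengths 1 to 5).
hpos = [sum(len(ev) == n for ev in traces) for n in range(1, 6)]
print(hpos)                                     # [0, 1, 1, 0, 1]
```

Both histograms deliberately discard one axis of Hfull: Haddr keeps only address counts, Hpos keeps only trace lengths.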

Such a leak would occur if the secret permutes the addresses in the evidence traces, e.g., [r1,r2] and [r2,r1], while the length of the evidence traces as well as the number of accesses per address remains the same. These special cases can still be detected with a multi-dimensional generic leakage test using Hfull.

5.2 Generic Leakage Test

We compile the evidence traces into two histograms, namely Hfix_addr and Hfix_pos for fixed secret inputs, and Hrnd_addr and Hrnd_pos for random inputs. If these histograms can be distinguished, the corresponding address difference constitutes a true information leak. In side-channel literature [33, 67], this fixed-vs-random input testing is typically done by applying Welch's t-test [83] to distributions of power consumption, electromagnetic emanation, or execution time measurements. For DATA, we cannot use the t-test, because it assumes normal distributions and evidence trace distributions are not necessarily normal. Instead, we use the more generic Kuiper's test [49], which does not make this assumption. The test essentially determines whether two probability distributions stem from the same base distribution or not. It is closely related to the Kolmogorov-Smirnov (KS) test but performs better when distributions differ in the tails instead of around the median. Since we do not assume anything about the tested distributions, we choose the increased sensitivity of Kuiper's test over the KS test at almost identical computational cost.

In preparation for Kuiper's test, we normalize our previously compiled histograms to obtain probability distributions. For the explanation of the test, assume two random variables X and Y, for which nX and nY samples are observed. The first step of the test is to derive the empirical distribution functions FX(x) and FY(x) as

    F_X(x) = \frac{1}{n_X} \sum_{i=1}^{n_X} I_{[X_i,\infty)}(x)    (2)

I is the indicator function, which is 1 if X_i ≤ x, and 0 otherwise. FY(x) is calculated accordingly. The Kuiper statistic V is then computed as

    V = \sup_x \left[ F_X(x) - F_Y(x) \right] + \sup_x \left[ F_Y(x) - F_X(x) \right]    (3)

The deviation of both distributions is significant if the Kuiper statistic V exceeds the significance threshold:

    V_{st} = \frac{Q_{st}^{-1}(1-\alpha)}{C_{st}(n_X, n_Y)}    (4)

Cst relates the threshold to the number of samples each empirical distribution is based on. This is important, as a larger number of samples increases the sensitivity of the Kuiper statistic. It is approximated as

    C_{st}(n_X, n_Y) = \sqrt{\tfrac{n_X n_Y}{n_X + n_Y}} + 0.155 + \frac{0.24}{\sqrt{\tfrac{n_X n_Y}{n_X + n_Y}}}    (5)

Qst is derived from the asymptotic distribution of the Kuiper statistic. It links the test statistic to a certain confidence level and is defined as

    Q_{st}(\lambda) = 2 \sum_{i=1}^{\infty} \left( 4 i^2 \lambda^2 - 1 \right) e^{-2 i^2 \lambda^2}    (6)

Its inverse, Q_{st}^{-1}, is calculated numerically. The value (1 − α) determines the probability with which Kuiper's test produces false positives. For all tests performed in this work, this probability is set to 0.0001. If Kuiper's test statistic is significant, the corresponding data or control-flow difference is flagged as an information leak.

Accuracy. The probability of reporting false positives is sufficiently minimized by the choice of (1 − α). False negatives can occur if the histograms Haddr and Hpos are insufficient estimations of the underlying evidence distributions. This happens if the number of program executions for fixed and random inputs is too small. It is, however, a common problem of unspecific leakage testing to determine a required minimum number [55, 72]. Analysts using DATA should therefore add traces until the test results stabilize and no new leaks are detected.

6 Leakage Classification Phase

The leakage classification phase is based on a specific leakage test, which tests for linear and non-linear relations between the secret input and the evidences of information leaks. Finding these relations requires appropriate representations for both input and evidence traces, which are described in the following two sections.

6.1 Evidence Representation

Similar to the leakage detection phase, we collect evidence traces for multiple random secret inputs. Unlike before, however, we do not merge evidence traces into histograms, since this would dismiss information about which input belongs to which evidence trace. Instead, we aggregate evidence traces into evidence matrices, where each column represents a unique trace (and unique secret input). Since evidence traces might differ both in length and accessed addresses, we cannot store them directly in a matrix. Instead, we capture the characteristics of the evidence traces in two separate matrices, Mev_addr and Mev_pos. The rows in both matrices correspond to the possible addresses in the traces. Mev_addr stores the number of accesses per address.
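Stepping back briefly, the generic leakage test of Section 5.2 (Equations 2–6) can be sketched as follows. The bisection-based inversion of Qst and the truncation of its series to 100 terms are implementation choices of this sketch, not DATA's actual test code:

```python
import math
from bisect import bisect_right

def kuiper_v(x, y):
    """Two-sample Kuiper statistic V = sup(FX-FY) + sup(FY-FX), Eqs. (2)-(3)."""
    sx, sy = sorted(x), sorted(y)
    fx = lambda t: bisect_right(sx, t) / len(sx)   # empirical CDF of X
    fy = lambda t: bisect_right(sy, t) / len(sy)   # empirical CDF of Y
    pts = sx + sy
    return max(fx(t) - fy(t) for t in pts) + max(fy(t) - fx(t) for t in pts)

def q_st(lam, terms=100):
    """Asymptotic tail distribution of the Kuiper statistic, Eq. (6)."""
    return 2.0 * sum((4 * i * i * lam * lam - 1) * math.exp(-2 * i * i * lam * lam)
                     for i in range(1, terms + 1))

def kuiper_threshold(n_x, n_y, p_false=0.0001):
    """Significance threshold V_st of Eq. (4), with C_st from Eq. (5)
    and Q_st inverted numerically by bisection."""
    ne = math.sqrt(n_x * n_y / (n_x + n_y))
    c_st = ne + 0.155 + 0.24 / ne
    lo, hi = 0.5, 5.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        lo, hi = (mid, hi) if q_st(mid) > p_false else (lo, mid)
    return lo / c_st

# A clearly distinguishable fixed-vs-random pair is flagged; identical
# distributions are not.
fixed, random_ = [0.0] * 50, [1.0] * 50
v_st = kuiper_threshold(50, 50)
print(kuiper_v(fixed, random_) > v_st)   # True
print(kuiper_v(fixed, fixed) > v_st)     # False
```

Unlike Welch's t-test, nothing here assumes normality; only the empirical distribution functions of the two sample sets enter the statistic.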

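The evidence matrices of the leakage classification phase can likewise be sketched. The example below rebuilds Mev_addr and Mev_pos for the traces ev0–ev2 from Section 5.1, using −1 for addresses absent from a trace and the rounded median position for repeated ones (a sketch for illustration, not DATA's implementation):

```python
from statistics import median

# Example evidence traces ev0-ev2 (Section 5.1), one column per trace.
traces = [["r1", "r2"],
          ["r3", "r3", "r2", "r3", "r1"],
          ["r2", "r1", "r2"]]
addresses = ["r1", "r2", "r3"]          # one row per address

m_addr, m_pos = [], []
for a in addresses:
    counts, positions = [], []
    for ev in traces:
        occ = [p for p, addr in enumerate(ev) if addr == a]
        counts.append(len(occ))
        # -1 labels an absent address; repeated ones get the rounded median
        positions.append(round(median(occ)) if occ else -1)
    m_addr.append(counts)
    m_pos.append(positions)

print(m_addr)   # [[1, 1, 1], [1, 1, 2], [0, 3, 0]]
print(m_pos)    # [[0, 4, 1], [1, 2, 1], [-1, 1, -1]]
```

Each column corresponds to one trace (and thus one secret input), which preserves the input-to-evidence association that the histograms of Section 5 discard.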
If an address does not occur in a trace, the corresponding entry in Mev_addr is set to zero. Mev_pos stores the position of each address in the evidence trace. If an address does not occur in a trace, the matrix entry is set to −1. This labels an absent address and has no negative impact on the statistical test. Any other negative value works as well, because all valid positions are non-negative integers. If an address occurs more than once in a trace, the matrix entry is set to the rounded median of the trace positions. The median adequately determines around which position in the evidence trace an address is accessed most frequently.

The following example illustrates how evidence matrices are compiled. We reuse the evidence traces ev0 to ev2 from Section 5.1 and insert one column for each trace. For each of the addresses r1 to r3, we insert one row. After adding the data, we obtain:

    M^{ev}_{addr} = \begin{pmatrix} 1 & 1 & 1 \\ 1 & 1 & 2 \\ 0 & 3 & 0 \end{pmatrix}, \qquad M^{ev}_{pos} = \begin{pmatrix} 0 & 4 & 1 \\ 1 & 2 & 1 \\ -1 & 1 & -1 \end{pmatrix}

6.2 Leakage Model

The transformation of the input is called leakage model. It defines which property or part of the secret input is compared to the evidence representations stored in Mev_addr and Mev_pos. This serves two purposes. First, it confines the scope of the statistical test. This is important because the complete input space of a secret is often too large to handle in practice, e.g., > 2^128 for strong cryptographic keys. Second, this confinement implicitly quantifies the information an adversary could gain from observing evidences. A well-known leakage model is the Hamming weight model [55], which reduces a secret input to the number of its 1-bits. In [89], the Hamming weight model is used to find leaks in asymmetric implementations. Another popular approach is slicing the secret input into smaller chunks [46], e.g., bytes or bits. While input slices are a good fit for byte- and bit-wise operations in symmetric ciphers, they might not be the best fit for big-integer operations in asymmetric ciphers. Clearly, the choice of an appropriate leakage model is important, but ultimately depends on the target program. It requires some degree of domain knowledge, which we assume that analysts have. Our framework is designed to support a variety of leakage models, including Hamming weight and input slicing.

6.3 Specific Leakage Test

For the specific leakage test, the target binary is executed n times with random secret inputs. Instead of gathering new measurements, we reuse the (random input) traces from the leakage detection phase. In preparation for the test, we derive Mev_addr and Mev_pos from the traces. We also transform the secret inputs according to the chosen leakage model L and store the results in the input matrix Min_L. Similar to the evidence matrices, every input gets assigned a column in Min_L. The number of rows is defined by the model, e.g., the Hamming weight of the entire input requires one row. All rows in Min_L are then compared to all rows in Mev_addr and Mev_pos. For these comparisons, the selected rows are interpreted as pairwise observations of two random variables, X and Y, with length nX = nY = n. We then use the Randomized Dependence Coefficient (RDC) [53] to determine the relation between the observations. The RDC detects linear and non-linear relations between random variables; its test statistic R is defined between 0 and 1, with R = 1 showing perfect dependency and R = 0 stating statistical independence. The parameters of the RDC are set to the values proposed in [53]: k = 20 and s = 1/6. In contrast to mutual information estimators and similar metrics [68], which are also used in side-channel literature [31], the RDC can be calculated efficiently, especially for large sample sizes (n > 100). We precompute the significance threshold Rst for a given confidence level α by generating a sufficiently large number (≥ 10^4) of statistically independent sequences of length n (the same length as the rows in Mev_addr, Mev_pos, and Min_L) and estimating the distribution of R. Since the resulting distribution is approximately normal, we estimate the mean µ and the standard deviation σ. The significance threshold is then derived from Φ^{-1}(x), the inverse cumulative distribution function of the standard normal distribution, as follows:

    R_{st} = \mu + \sigma \cdot \Phi^{-1}(\alpha)    (7)

The value (1 − α) determines the probability with which the RDC produces false positives. For all tests performed in this work, it is set to 0.0001. If R exceeds Rst, the tested rows exhibit a significant statistical relation. This means that an adversary is able to infer, from side-channel observations, the values and properties of the secret input that are defined by the leakage model.

Accuracy. The probability of reporting false positives is sufficiently minimized by the choice of (1 − α). False negatives can occur if the number of observations n is too small. Similar to the discussion in Section 5, it is not possible to determine a required minimum number of observations that holds for arbitrary target programs. Naturally, simple and direct relations will be discovered with far fewer observations than faint and indirect ones.

7 Implementation and Optimizations

While the concept of DATA is platform independent, we implement trace recording on top of the Intel Pin framework [38] for analyzing x86 binaries. We record address traces in separate trace files. To reduce their size, we only monitor instructions with branching behavior and their branch targets as well as instructions performing memory operations. This suffices to detect control-flow and data leakage. To speed up recording of evidence traces in the second phase, we only record those instructions flagged as potential leaks in the first phase.

We implement the difference detection as well as the generic and the specific leakage tests in Python scripts, which condense all findings into human-readable leakage reports in XML format, structuring information leaks by libraries, functions, and call stacks.

Tracking Heap Allocations. Depending on the utilization of the heap, identical memory objects could get assigned different addresses by the memory allocator. During trace analysis, this could cause the same objects to be interpreted as different ones. We encountered such behavior for OpenSSL, which dynamically allocates big numbers on the heap and resizes them on demand. This causes frequent re-allocations and big numbers hopping between different heap addresses for different program executions. Our Pintool can therefore be configured to detect heap objects and replace their virtual addresses with relative address offsets. Currently, our analysis treats all heap objects equally, making the results more readable. More elaborate approaches like [73] are left as future work.

8 Evaluation and Results

We used Pin version 3.2-81205 for instrumentation and compiled glibc 2.24 as well as OpenSSL 1.1.0f (specifically, we tested commit 7477c83e15) in a default configuration with additional debug information, using GCC version 6.3.0. Although debug symbols are not required by DATA, it incorporates available debug symbols in the final report. This allows mapping detected leaks to the responsible functions and data symbols.

8.1 Analysis Results

Table 2 shows the results of the three phases of DATA, namely address differences, generic and specific leaks.

Table 2: Leakage summary of algorithms.

                                                 Generic       Specific
              Algorithm          Differences Dismissed   CF  Data  Byte/Bit   HW
    OpenSSL   AES-NI                   0 (2)         0    0  0 (2)    0 (2)    -
              AES-VP                       0         0    0      0        0    -
              AES bit-sliced              4         0    0      4        4    -
              AES T-table                20         0    0     20       20    -
              Blowfish                  194         0    0    194      171    -
              Camellia                   82         0    0     82       55    -
              CAST                      202         0    0    202      133    -
              DES                       138         0    0    138       63    -
              Triple DES                410         0    0    410      292    -
              ECDSA (secp256k1)         515       487    1     27        3    1
              DSA                       781       354  160    267       19   33
              RSA                      2248      1510  278    460       11  139
    PyCrypto  AES                        96         0    0     96       96    -
              ARC4                        5         0    0      5        5    -
              Blowfish                  384         0    0    384      384    -
              CAST                      284         0    0    284      216    -
              Triple DES                108         0   12     96      101    -

OpenSSL (Symmetric Primitives). As summarized in the upper part of Table 2, AES-NI (AES new instructions [37]) as well as AES-VP (vector permutations based on SSSE3 extensions) do not leak. However, when using AES-NI (and other ciphers) via the OpenSSL command-line tool, the key parsing yields two data leaks, as indicated in brackets. Calling the AES-NI implementation without this command-line tool, as also done for the other three AES implementations, does not trigger these two data leaks. Besides, we identified four data leaks in the bit-sliced AES. While OpenSSL uses the protected implementation by Käsper and Schwabe [43] for the actual encryption, it uses the same unprotected key expansion as used in T-table implementations.

All other tested symmetric implementations yield a significant number of data leaks since they rely on lookup tables with key-dependent memory accesses, which makes them vulnerable to cache attacks [11, 77]. These leaks have also been confirmed by the byte leakage model test. Figure 3 shows statistical test results of the vulnerable AES T-table implementation for the first five rounds, averaged over the 16 table lookups in each round. Phase two finds generic key dependencies, regardless of the round (values well above Vst), confirming its accuracy. The chosen byte leakage model detects linear dependencies to the first round state (s1), which allows known-plaintext attacks [11]. For intermediate rounds, for which the chosen byte leakage model is not applicable, the test output is well below the threshold Rst. By adapting the leakage model to the last round state, one could also test for ciphertext-only attacks [60]. Moreover, one can see that the Hamming weight model on key bytes detects the same leakage but with a lower confidence, since it loses information about the key. This emphasizes the importance of choosing appropriate leakage models. We summarize results in Appendix A.

[Figure 3: OpenSSL AES T-table leakage classification. (Test statistics per AES round 1–5; thresholds Vst and Rst.)]

OpenSSL (Asymmetric Primitives). The asymmetric primitives show significant non-deterministic behavior, which is dismissed in the leakage detection phase. For example, OpenSSL uses RSA base blinding with a random blinding value. From 2248 differences in RSA, 1510 are dismissed, leaving 278 control-flow and 460 data leaks with key dependency. Among those, we found

two constant-time vulnerabilities in RSA and DSA, respectively, which bypass constant-time implementations in favor of vulnerable implementations. This could allow key recovery attacks similar to [3, 82]. Moreover, DATA reconfirms address differences in the ECDSA wNAF implementation, as exploited in [10, 26, 79].

For asymmetric ciphers, we applied the Hamming weight (HW) model as well as the key bit model. The majority of leaks reported by the HW model indicate that the length of the key or of intermediate values leaks (as the HW usually correlates with the length). For example, we detect leaks in functions that determine the length of big numbers, reconfirming the findings of [80]. Also, OpenSSL uses lazy heap allocation to resize objects on demand. This can cause different heap addresses for different key lengths, which will show up as data leakage. In contrast to the HW, the key bit model is more fine-grained and thus targets very specific leaks only, e.g., it reveals leaks that occur when the private key is parsed. This constitutes an insecure usage of the private key, and a very subtle bug to find. Details about leaking functions are given in Appendix A.

Python. We tested PyCrypto 2.6.1 running on CPython 2.7.13. The lower part of Table 2 summarizes our results. PyCrypto incorporates native shared libraries for certain cryptographic operations. From a side-channel perspective, this is desirable since those native libraries could be tightened against side-channel attacks, independently of the used interpreter. However, we found that all ciphers leak key bytes via unprotected lookup table implementations within those shared libraries, as indicated by the byte leakage model. We list the leaks in Appendix A.

Leakage-free Crypto. We analyzed Curve25519 in NaCl [13] as well as the corresponding Diffie-Hellman variant of OpenSSL (X25519) and found no address-based information leakage (apart from OpenSSL's key parsing), approving their side-channel security.

8.2 Discussion

Detection Accuracy. For symmetric algorithms in OpenSSL, we recorded up to 10 traces in the difference detection phase. We found that 3 traces are sufficient, as more traces did not uncover additional differences. The low number of traces results from the high diffusion and the regular design of symmetric ciphers, which yields a high probability of quickly hitting all variations in the program execution. This suggests that the difference detection phase achieves good accuracy for symmetric ciphers. Symmetric ciphers are typically deterministic, thus all differences are key-dependent. Indeed, Table 2 shows that the leakage detection phase confirms all differences as leaks.

[Figure 4: Dismissed non-deterministic differences and discovered leaks for OpenSSL RSA as stacked plot. (No. discovered leaks/differences over no. traces; series: non-determinism, RSA data leaks, RSA control-flow leaks.)]

To evaluate DATA's accuracy on non-deterministic programs, we tested OpenSSL asymmetric ciphers and collected up to 30 traces, as shown in Figure 4. While the address differences found in the difference detection phase do not settle within 30 traces (introducing false negatives), an important finding is that the majority of these differences are due to statistically independent program activity, e.g., RSA base blinding. These differences are characterized as key-independent and successfully filtered in the leakage detection phase. The number of actual data and control-flow leaks with key dependencies already settles at 4 traces. The few leaks observable with more traces are due to heap cleanup (these leaks were already discovered at heap allocation), leakage of the heap object's size, and exploring more paths of already discovered programming bugs. For example, DATA discovered the aforementioned RSA constant-time vulnerability, which was missed by other solutions, with only two traces. Analyzing more traces identifies more information leaks caused by the same programming bug. Hence, we recommend ≤ 10 traces for asymmetric primitives as a conservative choice. We observed similar behavior for DSA and ECDSA, but omit the details for brevity.

Performance. We ran our experiments on a Xeon E5-2630v3 with 386 GB RAM. DATA achieves good performance, adapting its runtime to the number of discovered leaks. Analysis of the leakage-free AES-NI and AES-VP took around 6 s, as only the first phase is needed. Finding leaks in the OpenSSL AES T-table implementation took 5 CPU minutes. Leakage classification took 8 CPU min. Asymmetric algorithms require more traces

and yield significantly more differences. Hence, the first phases took between 29.8 (for DSA) and 79.8 CPU minutes (for ECDSA). Running all three phases on RSA takes 233.8 CPU minutes with a RAM utilization of less than 4.5 GB (single core). By exploiting parallelism, the actual execution time can be significantly reduced, e.g., from 55 min to approximately 250 s for the first phase of RSA. Analyzing PyCrypto yields large address traces due to the interpreter (1 GB and more); nevertheless, DATA handles such large traces without hassle: the first phase discards all non-leaking instructions, stripping down trace sizes of the subsequent phases to kilobytes (see Appendix B).

Summary. The adoption of side-channel countermeasures is often partial, error-prone, and non-transparent in practice. Even though countermeasures have been known for over a decade [66], most OpenSSL symmetric ciphers as well as PyCrypto do not rely on protected implementations like bit-slicing. Also, the bit-sliced AES adopted by OpenSSL leaks during the key schedule, as the developers integrated it only partially [43] since practical attacks have not been shown yet. Moreover, we discovered two new vulnerabilities, bypassing OpenSSL's constant-time implementations for RSA and DSA initialization. Considering incomplete bug fixes of similar vulnerabilities identified by Garcia et al. [27, 28], this sums up to four implementation bugs related to the same countermeasure. This clearly shows that the tedious and error-prone task of implementing countermeasures should be backed by appropriate tools such as DATA to detect and appropriately fix vulnerabilities as early as possible.

We found issues in loading and parsing cryptographic keys as well as in initialization routines. Finding these issues demands analysis of the full program execution, from program start to exit, which is out of reach for many existing tools. Also, analysis often neglects these information leaks because an attacker typically has no way to trigger key loading and other single events in practice. However, when using OpenSSL inside SGX enclaves (cf. Intel's SGX SSL library [40]), the attacker can trigger arbitrarily many program executions, making single-event leakage practically relevant, as demonstrated by the RSA key recovery attack in [82].

Responsible Disclosure. We informed the library developers as well as Intel of our findings. In response, OpenSSL merged our proposed patches upstream.

Security Implications. A leak found by DATA does not necessarily constitute an exploitable vulnerability. The leakage classification phase helps in rating its severity; however, an accurate judgment often demands significant effort in assembling and improving concrete attacks [12]. We argue that, unless good counter-arguments are given, any leak should be considered serious.

9 Mitigating Address-based Leaks

After using DATA to identify address-based information leaks in cryptographic software implementations, the following approaches could be applied as mitigation.

Software-based Mitigations. Coppens et al. [21] proposed compiler transformations to eliminate key-dependent control-flow dependencies. Similar approaches are followed by other program transformations [2, 56] and transactional branching [8]. Data leaks of lookup table implementations can be mitigated by bit-slicing [43, 47, 66]. Scatter-gather prevents data leaks in RSA exponentiation by interleaving data in memory such that cache lines are accessed irrespective of the used index. However, scatter-gather must be implemented correctly to prevent more sophisticated attacks [88]. Oblivious RAM [32, 77, 91] has been proposed as a generic countermeasure against data leaks by hiding memory access patterns. Hardened software implementations could then be proven leakage-free using [4, 5, 7, 16, 25].

Mitigations on Architectural/OS Level. Cache coloring [69] and similar cache isolation mechanisms [52] have been proposed to mitigate cache attacks. Others [90] proposed OS-level defenses against last-level cache attacks by controlling page sharing via a copy-on-access mechanism. Hardware transactional memory can be used to mitigate cache attacks by keeping all sensitive data in the cache during the computation [34]. Compiler-based tools aim to protect SGX enclaves against cache attacks [18] or controlled-channel attacks [70].

10 Conclusion

In this work, we proposed differential address trace analysis (DATA) to identify address-based information leaks. We use statistical tests to filter non-deterministic program behavior, thus improving detection accuracy. DATA is efficient enough to analyze real-world software – from program start to exit. Thereby, we include key loading and parsing in the analysis and found leakage which has been missed before. Based on DATA, we confirmed existing and identified several unknown information leaks as well as already (supposedly) fixed vulnerabilities in OpenSSL. In addition, we showed that DATA is capable of analyzing interpreted code (PyCrypto) including the underlying interpreter, which is conceptually impossible with current static methods. This shows the practical relevance of DATA in assisting security analysts to identify information leaks as well as developers in the tedious task of correctly implementing countermeasures.

Outlook. The generic design of DATA also allows detecting other types of leakage, such as variable-time floating point instructions, by including the instruction operands in the recorded address traces. DATA also

paves the way for analyzing other interpreted languages and quantifying the effects of interpretation and just-in-time compilation on side-channel security. Moreover, DATA could be extended to analyze multi-threaded programs by recording and analyzing individual traces per execution thread.

    1 int BN_MONT_CTX_set(BN_MONT_CTX *mont,
    2                     BIGNUM *mod, BN_CTX *ctx) {
    3   ...
    4   BN_copy(&(mont->N), mod);
    5   ...
    6   BN_mod_inverse(Ri, R, &mont->N, ctx);
    7   ...
    8 }

    Listing 3: OpenSSL RSA vulnerability.

Acknowledgments

We would like to thank the anonymous reviewers, as well as Mario Werner and our shepherd Stephen McCamant, for their valuable feedback and insightful discussions that helped improve this work.

This work was partially supported by the TU Graz LEAD project "Dependable Internet of Things in Adverse Environments", by the Austrian Research Promotion Agency (FFG) via the K-project DeSSnet, which is funded in the context of COMET – Competence Centers for Excellent Technologies by BMVIT, BMWFW, Styria and Carinthia, as well as by the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme (grant agreement No 681402).

A Leaking Functions

OpenSSL (Symmetric Primitives). To analyze AES, we implemented a wrapper that calls the algorithm directly. For other algorithms, we used the openssl enc command-line tool with keys in hex format. DATA identified information leaks in the code that parses these keys. In particular, the leaks occur in function set_hex, which uses stdlib's isxdigit function that performs leaking table lookups. Besides, OPENSSL_hexchar2int uses a switch-case to convert key characters to integers. Although symmetric keys are usually stored in binary format, one should be aware of such leaks.

The bit-sliced AES implementation uses the vulnerable x86_64_AES_set_encrypt_key function for the key schedule. In addition, the unprotected AES leaks in function x86_64_AES_encrypt_compact. Blowfish leaks at BF_encrypt, Camellia leaks the Camellia SBOX at Camellia_Ekeygen and x86_64_Camellia_encrypt, CAST leaks the CAST_S_table0 to CAST_S_table7 at CAST_set_key as well as CAST_encrypt, and DES leaks the des_skb at DES_set_key_unchecked as well as DES_SPtrans at DES_encrypt2.

OpenSSL (Asymmetric Primitives). For the analysis of asymmetric ciphers, we use OpenSSL to generate keys in PEM format and then invoke the openssl pkeyutl command-line tool to create signatures with those keys. Similar to symmetric ciphers, asymmetric implementations leak during key loading and parsing. We found leaks in EVP_DecodeUpdate, in EVP_DecodeBlock via the lookup table data_ascii2bin, in c2i_ASN1_INTEGER, which uses c2i_ibuf, and in BN_bin2bn. Although the key is typically loaded only once at program startup, this has direct implications on applications using Intel SGX SSL.

DATA discovered two new vulnerabilities regarding OpenSSL's handling of constant-time implementations. The first one leaks during the initialization of Montgomery constants for the secret RSA primes p and q. This is a programming bug: the so-called constant-time flag is set for p and q in function rsa_ossl_mod_exp but not propagated to temporary working copies inside BN_MONT_CTX_set, as shown in Listing 3, since the function BN_copy in line 4 does not propagate the consttime flag from mod to mont->N. This causes the inversion in line 6 to fall back to non-constant-time implementations (int_bn_mod_inverse and BN_div). The second vulnerability is a missing constant-time flag for the DSA private key inside dsa_priv_decode. This causes the DSA key loading to use the unprotected exponentiation function BN_mod_exp_mont. Moreover, DATA confirms that ECDSA still uses the vulnerable point multiplication in ec_wNAF_mul, which was exploited in [10, 26, 79].

Finally, we found that the majority of information leaks reported for OpenSSL leak the length of the key or of intermediate variables. For example, we reconfirm the leak in BN_num_bits_word [80], which leaks the number of bits of the upper word of big numbers. There are several examples where the key length in bytes is leaked, e.g., via ASN1_STRING_set, BN_bin2bn, strlen of glibc, as well as via heap allocation.

PyCrypto. PyCrypto symmetric ciphers leak during encryption, mostly via lookup tables. AES leaks the tables Te0 to Te4 and Td0 to Td3 in functions ALGnew, rijndaelKeySetupEnc, and rijndaelEncrypt. Blowfish leaks in functions ALGnew and Blowfish_encrypt. CAST leaks the tables S1 to S4 in function block_encrypt and the tables S5 to S8 in schedulekeys_half. Triple DES leaks the table des_ip in function desfunc as well as in deskey. ARC4 leaks in function ALGnew.
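The two constant-time-flag bugs above reduce to flag propagation: OpenSSL's BN_copy copies the value of a big number but not its flags, so BN_FLG_CONSTTIME is silently lost on temporary copies unless the call site re-applies it. The following self-contained C model (simplified stand-in types, not OpenSSL's real BIGNUM API) sketches the leaky copy and the kind of call-site fix that addresses it:

```c
#include <assert.h>

/* Model of the consttime-flag bug behind Listing 3. The types and
 * functions are simplified stand-ins, not OpenSSL's real API. */

#define FLG_CONSTTIME 0x04           /* stands in for BN_FLG_CONSTTIME */

typedef struct {
    unsigned long value;             /* stands in for the BIGNUM value */
    int flags;                       /* security-relevant flags        */
} Bn;

/* Models BN_copy: the value is copied, the flags are NOT. */
static void bn_copy(Bn *dst, const Bn *src)
{
    dst->value = src->value;
}

/* Models the fixed call site: after copying, re-apply the
 * constant-time flag so later operations (e.g., the modular
 * inversion in Listing 3) stay on the protected code path. */
static void bn_copy_consttime(Bn *dst, const Bn *src)
{
    bn_copy(dst, src);
    dst->flags |= (src->flags & FLG_CONSTTIME);  /* the missing step */
}
```

With bn_copy alone, a copy of a flagged modulus ends up unflagged, which models why the inversion in Listing 3 falls back to int_bn_mod_inverse and BN_div; bn_copy_consttime preserves the flag on the working copy.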

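Most of the leaks listed in Appendix A, from stdlib's isxdigit to the AES Te/Td tables, share a single pattern: an array is indexed with secret data, so the accessed address, and thus the touched cache line, depends on the secret. A minimal sketch of that pattern, together with a branchless full-table scan whose access sequence is independent of the secret (illustrative function names, not library code):

```c
#include <stdint.h>
#include <stddef.h>
#include <assert.h>

/* Leaky pattern: the load address of table[secret] depends on the
 * secret index, which cache, DRAM, and similar address-based side
 * channels can observe. */
static uint8_t lookup_leaky(const uint8_t table[256], uint8_t secret)
{
    return table[secret];
}

/* Mitigated pattern: touch every entry and select the wanted one
 * with a mask, so the sequence of accessed addresses is the same
 * for every secret. (Compiled output still needs inspection to
 * rule out a secret-dependent branch in the comparison.) */
static uint8_t lookup_scanned(const uint8_t table[256], uint8_t secret)
{
    uint8_t result = 0;
    for (size_t i = 0; i < 256; i++) {
        /* mask is 0xFF iff i == secret, computed without a branch */
        uint8_t mask = (uint8_t)-(uint8_t)(i == (size_t)secret);
        result |= (uint8_t)(table[i] & mask);
    }
    return result;
}
```

Scanning trades a factor of 256 in loads for a secret-independent address trace; fully protected implementations such as bit-sliced AES avoid secret-indexed tables altogether.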
B Performance

Table 3 summarizes the performance figures of DATA for each phase.³ Unless stated otherwise, all timings reflect the runtime in CPU minutes (single-core) and thus represent a fair and conservative metric. If tasks are parallelized, the actual runtime can be significantly reduced.

Difference Detection Phase. For OpenSSL, the trace size is < 30 MB for symmetric and < 55 MB for asymmetric ciphers. For PyCrypto, each trace has approximately 1 GB, because the execution of the interpreter is included. Regarding runtime, OpenSSL symmetric ciphers require less than a minute. PyCrypto ciphers finish in 5 minutes or less, despite large trace sizes. OpenSSL asymmetric ciphers need between 29.8 and 79.8 CPU minutes for two reasons. First, they require more traces. As we compare traces pairwise in the first phase, the runtime grows quadratically in the number of traces. Second, asymmetric ciphers yield significantly more differences that need to be analyzed. Especially control-flow differences demand costly re-alignment of traces. Yet, these results are quite encouraging, especially since the automated analysis of large real-world software stacks is out of reach for many existing tools. Also, we see possible improvements in further speeding up analysis times.

Leakage Detection Phase. We analyze three fixed and one random set à 60 traces, yielding 240 traces in total. Since this phase only analyzes address differences reported by the previous phase, the sizes of the recorded traces are significantly smaller. From several MB to over 1 GB in phase one, the traces are now several KB to around 1.3 MB for RSA. This makes recording and analyzing an even larger number of traces, e.g., more than 240, efficient. For example, the analysis of OpenSSL bit-sliced AES takes less than 5 CPU minutes. As expected, analyzing PyCrypto takes longer due to the instrumentation of the Python interpreter. Also, analysis of RSA is slower due to the high number of address differences to analyze. For example, RSA generates traces of up to 1343.9 KB to be analyzed. Nevertheless, phase two completes within less than 61 CPU minutes.

Leakage Classification Phase. The last phase records and analyzes 200 traces with random keys. To speed up recording, we reuse traces from the random input set of the previous phase. We benchmarked symmetric ciphers with the byte leakage model. Analysis times vary heavily between ciphers, because the performance critically depends on the number of reported address leaks and the size of the evidences, which need to be classified. For instance, most ciphers complete in less than 80 minutes, and AES bit-sliced even in 3.2 minutes. In contrast, PyCrypto Blowfish took almost 9 CPU hours because of a much larger number of evidences compared to PyCrypto AES, as can be seen from their trace sizes (271.8 kB for Blowfish versus 13.6 kB for AES). In general, testing the HW model is faster than the bit model because the HW model cumulates all key bits into a single metric, while for the bit model we need to analyze multiple key bits independently. Table 2 shows that the cumulative runtime over both models is between 55 and 95 min. Also, the classification phase is generally slower than the leakage detection phase. This is because, first, DATA performs more specific leakage tests than generic ones (H^ev_addr/pos vs. M_addr/pos), and second, the RDC is more costly to compute than Kuiper's test. We believe significant performance savings are possible by pruning large evidence lists and by optimizing the RDC implementation.

Summary. The last two columns illustrate that the overall performance of DATA adapts to the amount of discovered leakage, which is desirable. Leakage-free implementations finish within 6 s, while leaky ones take up to 580 CPU minutes. In any of the phases, analysis requires less than 4.5 GB of RAM when executing on a single core. This is within the range of desktop computers and commodity laptops. When multi-core environments are available, one can exploit parallelism to greatly speed up analysis times. In fact, we parallelized phase one and reduced its runtime for RSA from 55 CPU minutes to approximately 250 real seconds. Similar optimizations could be implemented for phases two and three. Moreover, when doing frequent testing, software developers could not only omit the leakage classification phase intended for security analysts but also skip the leakage detection phase in case of deterministic algorithms.

³ The overall performance might be higher than the sum of all phases because it includes the generation of final reports.

References

[1] ACIIÇMEZ, O., KOÇ, Ç. K., AND SEIFERT, J. Predicting Secret
Keys Via Branch Prediction. In Topics in Cryptology – CT-RSA 2007 (2007), vol. 4377 of LNCS, Springer, pp. 225–242.
[2] AGAT, J. Transforming Out Timing Leaks. In Principles of Programming Languages – POPL 2000 (2000), ACM, pp. 40–53.
[3] ALDAYA, A. C., GARCÍA, C. P., TAPIA, L. M. A., AND BRUMLEY, B. B. Cache-Timing Attacks on RSA Key Generation. IACR Cryptology ePrint Archive 2018 (2018), 367.
[4] ALMEIDA, J. B., BARBOSA, M., BARTHE, G., DUPRESSOIR, F., AND EMMI, M. Verifying Constant-Time Implementations. In USENIX Security Symposium 2016 (2016), USENIX Association, pp. 53–70.
[5] ALMEIDA, J. B., BARBOSA, M., PINTO, J. S., AND VIEIRA, B. Formal Verification of Side-Channel Countermeasures Using Self-Composition. Sci. Comput. Program. 78 (2013), 796–812.
[6] APECECHEA, G. I., EISENBARTH, T., AND SUNAR, B. S$A: A Shared Cache Attack That Works across Cores and Defies VM Sandboxing – and Its Application to AES. In IEEE Symposium on Security and Privacy – S&P 2015 (2015), IEEE Computer Society, pp. 591–604.
[7] BARTHE, G., BETARTE, G., CAMPO, J. D., LUNA, C. D., AND PICHARDIE, D. System-level Non-interference for Constant-time Cryptography. In Conference on Computer and Communications Security – CCS 2014 (2014), ACM, pp. 1267–1279.

Table 3: Performance of DATA during the analysis of OpenSSL (top) and PyCrypto (bottom). Sizes are per trace. Time is in CPU minutes. Trace sizes for Classification are identical to Leakage Detection.

                       Difference Detection     Leakage Detection      Classification      Total
Algorithm             Traces   Size    Time    Traces   Size    Time   Traces   Time     Time      RAM
                              (MB)    (min.)          (kB)    (min.)          (min.)   (min.)    (MB)
OpenSSL:
AES-NI                   3     0.5     0.1       -       -       -       -       -        0.1     72.0
AES-VP                   3     0.5     0.1       -       -       -       -       -        0.1     72.2
AES bit-sliced           3     0.5     0.4      240     0.2     4.6     200     3.2       8.4     77.1
AES T-table              3     0.5     0.4      240     1.8     4.6     200     8.0      13.2    101.4
Blowfish                 3    28.2     0.8      240   264.8    13.7     200    79.1      96.0    717.8
Camellia                 3    27.3     0.6      240     2.5     9.0     200    17.5      27.3    146.8
CAST                     3    27.3     0.6      240     5.4     9.2     200    36.3      46.4    247.5
DES                      3    27.3     0.6      240     3.9     9.1     200     9.9      19.9    139.5
Triple DES               3    27.3     0.7      240    13.9    10.5     200    49.2      60.9    351.7
ECDSA (secp256k1)       10    54.1    79.8      240   387.9    18.3     200    55.3     161.2  1,316.3
DSA                     10    35.6    29.8      240   195.4    14.7     200    56.9     106.1  1,054.6
RSA                     10    44.2    55.0      240 1,343.9    60.9     200    94.3     233.8  4,414.0
PyCrypto:
AES                      3  1081.6     4.0      240    13.6    43.6     200    88.2     136.2  1,223.0
ARC4                     3  1081.5     3.9      240     6.4    43.1     200    60.3     107.6  1,222.7
Blowfish                 3  1082.3     5.0      240   271.8    47.9     200   526.5     582.2  2,302.6
CAST                     3  1081.6     4.0      240    11.8    44.0     200    76.7     125.1  1,223.0
Triple DES               3  1082.4     4.2      240    65.8    45.0     200    63.3     113.3  1,223.8

[8] BARTHE, G., REZK, T., AND WARNIER, M. Preventing Timing Leaks Through Transactional Branching Instructions. Electr. Notes Theor. Comput. Sci. 153 (2006), 33–55.
[9] BASU, T., AND CHATTOPADHYAY, S. Testing Cache Side-Channel Leakage. In International Conference on Software Testing, Verification and Validation Workshops – ICST Workshops (2017), IEEE Computer Society, pp. 51–60.
[10] BENGER, N., VAN DE POL, J., SMART, N. P., AND YAROM, Y. "Ooh Aah... Just a Little Bit": A Small Amount of Side Channel Can Go a Long Way. In Cryptographic Hardware and Embedded Systems – CHES 2014 (2014), vol. 8731 of LNCS, Springer, pp. 75–92.
[11] BERNSTEIN, D. J. Cache-Timing Attacks on AES, 2004. Technical report: https://cr.yp.to/antiforgery/cachetiming-20050414.pdf. Accessed: 2018-05-29.
[12] BERNSTEIN, D. J., BREITNER, J., GENKIN, D., BRUINDERINK, L. G., HENINGER, N., LANGE, T., VAN VREDENDAAL, C., AND YAROM, Y. Sliding Right into Disaster: Left-to-Right Sliding Windows Leak. In Cryptographic Hardware and Embedded Systems – CHES 2017 (2017), vol. 10529 of LNCS, Springer, pp. 555–576.
[13] BERNSTEIN, D. J., LANGE, T., AND SCHWABE, P. NaCl: Networking and Cryptography library. https://nacl.cr.yp.to/. Accessed: 2018-05-29.
[14] BLAZY, S., PICHARDIE, D., AND TRIEU, A. Verifying Constant-Time Implementations by Abstract Interpretation. In European Symposium on Research in Computer Security – ESORICS 2017 (2017), vol. 10492 of LNCS, Springer, pp. 260–277.
[15] BLEICHENBACHER, D. Chosen Ciphertext Attacks Against Protocols Based on the RSA Encryption Standard PKCS #1. In Advances in Cryptology – CRYPTO 1998 (1998), vol. 1462 of LNCS, Springer, pp. 1–12.
[16] BOND, B., HAWBLITZEL, C., KAPRITSOS, M., LEINO, K. R. M., LORCH, J. R., PARNO, B., RANE, A., SETTY, S. T. V., AND THOMPSON, L. Vale: Verifying High-Performance Cryptographic Assembly Code. In USENIX Security Symposium 2017 (2017), USENIX Association, pp. 917–934.
[17] BOS, J. W., HUBAIN, C., MICHIELS, W., AND TEUWEN, P. Differential Computation Analysis: Hiding Your White-Box Designs is Not Enough. In Cryptographic Hardware and Embedded Systems – CHES 2016 (2016), vol. 9813 of LNCS, Springer, pp. 215–236.
[18] BRASSER, F., CAPKUN, S., DMITRIENKO, A., FRASSETTO, T., KOSTIAINEN, K., MÜLLER, U., AND SADEGHI, A. DR.SGX: Hardening SGX Enclaves against Cache Attacks with Data Location Randomization. CoRR abs/1709.09917 (2017).
[19] BRUMLEY, B. B., AND HAKALA, R. M. Cache-Timing Template Attacks. In Advances in Cryptology – ASIACRYPT 2009 (2009), vol. 5912 of LNCS, Springer, pp. 667–684.
[20] CHATTOPADHYAY, S., BECK, M., REZINE, A., AND ZELLER, A. Quantifying the Information Leak in Cache Attacks via Symbolic Execution. In International Conference on Formal Methods and Models for System Design – MEMOCODE 2017 (2017), ACM, pp. 25–35.
[21] COPPENS, B., VERBAUWHEDE, I., BOSSCHERE, K. D., AND SUTTER, B. D. Practical Mitigations for Timing-Based Side-Channel Attacks on Modern x86 Processors. In IEEE Symposium on Security and Privacy – S&P 2009 (2009), IEEE Computer Society, pp. 45–60.
[22] CORBET, J. The Current State of Kernel Page-Table Isolation, 2018. https://lwn.net/Articles/741878/. Accessed: 2018-05-29.
[23] CORON, J., KOCHER, P. C., AND NACCACHE, D. Statistics and Secret Leakage. In Financial Cryptography – FC 2000 (2000), vol. 1962 of LNCS, Springer, pp. 157–173.
[24] DOYCHEV, G., FELD, D., KÖPF, B., MAUBORGNE, L., AND REINEKE, J. CacheAudit: A Tool for the Static Analysis of Cache Side Channels. In USENIX Security Symposium 2013 (2013), USENIX Association, pp. 431–446.

[25] DOYCHEV, G., AND KÖPF, B. Rigorous Analysis of Software Countermeasures Against Cache Attacks. In Programming Language Design and Implementation – PLDI 2017 (2017), ACM, pp. 406–421.
[26] FAN, S., WANG, W., AND CHENG, Q. Attacking OpenSSL Implementation of ECDSA with a Few Signatures. In Conference on Computer and Communications Security – CCS 2016 (2016), ACM, pp. 1505–1515.
[27] GARCÍA, C. P., AND BRUMLEY, B. B. Constant-Time Callees with Variable-Time Callers. In USENIX Security Symposium 2017 (2017), USENIX Association, pp. 83–98.
[28] GARCÍA, C. P., BRUMLEY, B. B., AND YAROM, Y. "Make Sure DSA Signing Exponentiations Really are Constant-Time". In Conference on Computer and Communications Security – CCS 2016 (2016), ACM, pp. 1639–1650.
[29] GE, Q., YAROM, Y., COCK, D., AND HEISER, G. A Survey of Microarchitectural Timing Attacks and Countermeasures on Contemporary Hardware. J. Cryptographic Engineering 8 (2018), 1–27.
[30] GEORGIADIS, L., WERNECK, R. F. F., TARJAN, R. E., TRIANTAFYLLIS, S., AND AUGUST, D. I. Finding Dominators in Practice. In European Symposium on Algorithms – ESA 2004 (2004), vol. 3221 of LNCS, Springer, pp. 677–688.
[31] GIERLICHS, B., BATINA, L., TUYLS, P., AND PRENEEL, B. Mutual Information Analysis. In Cryptographic Hardware and Embedded Systems – CHES 2008 (2008), vol. 5154 of LNCS, Springer, pp. 426–442.
[32] GOLDREICH, O., AND OSTROVSKY, R. Software Protection and Simulation on Oblivious RAMs. J. ACM 43 (1996), 431–473.
[33] GOODWILL, G., JUN, B., JAFFE, J., AND ROHATGI, P. A Testing Methodology for Side Channel Resistance Validation, 2011. http://csrc.nist.gov/news_events/non-invasive-attack-testing-workshop/papers/08_Goodwill.pdf. Accessed: 2018-05-29.
[34] GRUSS, D., LETTNER, J., SCHUSTER, F., OHRIMENKO, O., HALLER, I., AND COSTA, M. Strong and Efficient Cache Side-Channel Protection using Hardware Transactional Memory. In USENIX Security Symposium 2017 (2017), USENIX Association, pp. 217–233.
[35] GRUSS, D., MAURICE, C., FOGH, A., LIPP, M., AND MANGARD, S. Prefetch Side-Channel Attacks: Bypassing SMAP and Kernel ASLR. In Conference on Computer and Communications Security – CCS 2016 (2016), ACM, pp. 368–379.
[36] GRUSS, D., SPREITZER, R., AND MANGARD, S. Cache Template Attacks: Automating Attacks on Inclusive Last-Level Caches. In USENIX Security Symposium 2015 (2015), USENIX Association, pp. 897–912.
[37] GUERON, S. White Paper: Intel Advanced Encryption Standard (AES) Instructions Set, 2010. https://software.intel.com/file/24917. Accessed: 2018-05-29.
[38] INTEL. Pin – A Dynamic Binary Instrumentation Tool. https://software.intel.com/en-us/articles/pintool/. Accessed: 2018-05-29.
[39] INTEL. Intel 64 and IA-32 Architectures Software Developers Manual, 2016. Reference no. 325462-061US.
[40] INTEL. Intel SgxSSL Library User Guide, 2018. Rev. 1.2.1. https://software.intel.com/sites/default/files/managed/3b/05/Intel-SgxSSL-Library-User-Guide.pdf. Accessed: 2018-05-29.
[41] IRAZOQUI, G., CONG, K., GUO, X., KHATTRI, H., KANUPARTHI, A. K., EISENBARTH, T., AND SUNAR, B. Did we Learn from LLC Side Channel Attacks? A Cache Leakage Detection Tool for Crypto Libraries. CoRR abs/1709.01552 (2017).
[42] JOHNSON, N. M., CABALLERO, J., CHEN, K. Z., MCCAMANT, S., POOSANKAM, P., REYNAUD, D., AND SONG, D. Differential Slicing: Identifying Causal Execution Differences for Security Applications. In IEEE Symposium on Security and Privacy – S&P 2011 (2011), IEEE Computer Society, pp. 347–362.
[43] KÄSPER, E., AND SCHWABE, P. Faster and Timing-Attack Resistant AES-GCM. In Cryptographic Hardware and Embedded Systems – CHES 2009 (2009), vol. 5747 of LNCS, Springer, pp. 1–17.
[44] KOCHER, P., GENKIN, D., GRUSS, D., HAAS, W., HAMBURG, M., LIPP, M., MANGARD, S., PRESCHER, T., SCHWARZ, M., AND YAROM, Y. Spectre Attacks: Exploiting Speculative Execution. meltdownattack.com (2018).
[45] KOCHER, P. C. Timing Attacks on Implementations of Diffie-Hellman, RSA, DSS, and Other Systems. In Advances in Cryptology – CRYPTO 1996 (1996), vol. 1109 of LNCS, Springer, pp. 104–113.
[46] KOCHER, P. C., JAFFE, J., AND JUN, B. Differential Power Analysis. In Advances in Cryptology – CRYPTO 1999 (1999), vol. 1666 of LNCS, Springer, pp. 388–397.
[47] KÖNIGHOFER, R. A Fast and Cache-Timing Resistant Implementation of the AES. In Topics in Cryptology – CT-RSA 2008 (2008), vol. 4964 of LNCS, Springer, pp. 187–202.
[48] KÖPF, B., MAUBORGNE, L., AND OCHOA, M. Automatic Quantification of Cache Side-Channels. In Computer Aided Verification – CAV 2012 (2012), vol. 7358 of LNCS, Springer, pp. 564–580.
[49] KUIPER, N. H. Tests Concerning Random Points on a Circle. Indagationes Mathematicae (Proceedings) 63, Supplement C (1960), 38–47.
[50] LANGLEY, A. ctgrind: Checking that Functions are Constant Time with Valgrind. https://github.com/agl/ctgrind. Accessed: 2018-05-29.
[51] LIPP, M., SCHWARZ, M., GRUSS, D., PRESCHER, T., HAAS, W., MANGARD, S., KOCHER, P., GENKIN, D., YAROM, Y., AND HAMBURG, M. Meltdown. meltdownattack.com (2018).
[52] LIU, F., GE, Q., YAROM, Y., MCKEEN, F., ROZAS, C. V., HEISER, G., AND LEE, R. B. CATalyst: Defeating Last-Level Cache Side Channel Attacks in Cloud Computing. In High Performance Computer Architecture – HPCA 2016 (2016), IEEE Computer Society, pp. 406–418.
[53] LÓPEZ-PAZ, D., HENNIG, P., AND SCHÖLKOPF, B. The Randomized Dependence Coefficient. In Neural Information Processing Systems – NIPS 2013 (2013), pp. 1–9.
[54] LUK, C., COHN, R. S., MUTH, R., PATIL, H., KLAUSER, A., LOWNEY, P. G., WALLACE, S., REDDI, V. J., AND HAZELWOOD, K. M. Pin: Building Customized Program Analysis Tools with Dynamic Instrumentation. In Programming Language Design and Implementation – PLDI 2005 (2005), ACM, pp. 190–200.
[55] MANGARD, S., OSWALD, E., AND POPP, T. Power Analysis Attacks – Revealing the Secrets of Smart Cards. Springer, 2007.
[56] MANTEL, H., AND STAROSTIN, A. Transforming Out Timing Leaks, More or Less. In European Symposium on Research in Computer Security – ESORICS 2015 (2015), vol. 9326 of LNCS, Springer, pp. 447–467.
[57] MANTEL, H., WEBER, A., AND KÖPF, B. A Systematic Study of Cache Side Channels Across AES Implementations. In Engineering Secure Software and Systems – ESSoS 2017 (2017), vol. 10379 of LNCS, Springer, pp. 213–230.

[58] MELLOR-CRUMMEY, J. M., AND LEBLANC, T. J. A Software Instruction Counter. In Architectural Support for Programming Languages and Operating Systems – ASPLOS 1989 (1989), ACM Press, pp. 78–86.
[59] MOLNAR, D., PIOTROWSKI, M., SCHULTZ, D., AND WAGNER, D. A. The Program Counter Security Model: Automatic Detection and Removal of Control-Flow Side Channel Attacks. In Information Security and Cryptology – ICISC 2005 (2005), vol. 3935 of LNCS, Springer, pp. 156–168.
[60] NEVE, M., AND SEIFERT, J. Advances on Access-Driven Cache Attacks on AES. In Selected Areas in Cryptography – SAC 2006 (2006), vol. 4356 of LNCS, Springer, pp. 147–162.
[61] OSVIK, D. A., SHAMIR, A., AND TROMER, E. Cache Attacks and Countermeasures: The Case of AES. In Topics in Cryptology – CT-RSA 2006 (2006), vol. 3860 of LNCS, Springer, pp. 1–20.
[62] PAGE, D. Partitioned Cache Architecture as a Side-Channel Defence Mechanism. IACR Cryptology ePrint Archive 2005 (2005), 280.
[63] PASAREANU, C. S., PHAN, Q., AND MALACARIA, P. Multi-run Side-Channel Analysis Using Symbolic Execution and Max-SMT. In Computer Security Foundations – CSF 2016 (2016), IEEE Computer Society, pp. 387–400.
[64] PERCIVAL, C. Cache Missing for Fun and Profit, 2005. Technical report: http://www.daemonology.net/hyperthreading-considered-harmful/. Accessed: 2018-05-29.
[65] PESSL, P., GRUSS, D., MAURICE, C., SCHWARZ, M., AND MANGARD, S. DRAMA: Exploiting DRAM Addressing for Cross-CPU Attacks. In USENIX Security Symposium 2016 (2016), USENIX Association, pp. 565–581.
[66] REBEIRO, C., SELVAKUMAR, A. D., AND DEVI, A. S. L. Bitslice Implementation of AES. In Cryptology and Network Security – CANS 2006 (2006), vol. 4301 of LNCS, Springer, pp. 203–212.
[67] REPARAZ, O., BALASCH, J., AND VERBAUWHEDE, I. Dude, is my code constant time? In Design, Automation & Test in Europe – DATE 2017 (2017), IEEE, pp. 1697–1702.
[68] RESHEF, D. N., RESHEF, Y. A., SABETI, P. C., AND MITZENMACHER, M. M. An Empirical Study of Leading Measures of Dependence. CoRR abs/1505.02214 (2015).
[69] SHI, J., SONG, X., CHEN, H., AND ZANG, B. Limiting Cache-based Side-channel in Multi-tenant Cloud using Dynamic Page Coloring. In Dependable Systems and Networks Workshops – DSNW (2011), IEEE, pp. 194–199.
[70] SHIH, M., LEE, S., KIM, T., AND PEINADO, M. T-SGX: Eradicating Controlled-Channel Attacks Against Enclave Programs. In Network and Distributed System Security Symposium – NDSS 2017 (2017), The Internet Society.
[71] SONG, D. X., BRUMLEY, D., YIN, H., CABALLERO, J., JAGER, I., KANG, M. G., LIANG, Z., NEWSOME, J., POOSANKAM, P., AND SAXENA, P. BitBlaze: A New Approach to Computer Security via Binary Analysis. In International Conference on Information Systems Security – ICISS 2008 (2008), vol. 5352 of LNCS, Springer, pp. 1–25.
[72] STANDAERT, F. How (not) to Use Welch's T-test in Side-Channel Security Evaluations. IACR Cryptology ePrint Archive 2017 (2017), 138.
[73] SUMNER, W. N., AND ZHANG, X. Memory Indexing: Canonicalizing Addresses Across Executions. In Foundations of Software Engineering – FSE 2010 (2010), ACM, pp. 217–226.
[74] SUMNER, W. N., AND ZHANG, X. Identifying Execution Points for Dynamic Analyses. In Automated Software Engineering – ASE 2013 (2013), IEEE, pp. 81–91.
[75] SUMNER, W. N., ZHENG, Y., WEERATUNGE, D., AND ZHANG, X. Precise Calling Context Encoding. In International Conference on Software Engineering – ICSE 2010 (2010), ACM, pp. 525–534.
[76] SZEFER, J. Survey of Microarchitectural Side and Covert Channels, Attacks, and Defenses. IACR Cryptology ePrint Archive 2016 (2016), 479.
[77] TROMER, E., OSVIK, D. A., AND SHAMIR, A. Efficient Cache Attacks on AES, and Countermeasures. J. Cryptology 23 (2010), 37–71.
[78] TURNER, P. Retpoline: A Software Construct for Preventing Branch-Target-Injection, 2018. https://support.google.com/faqs/answer/7625886. Accessed: 2018-05-29.
[79] VAN DE POL, J., SMART, N. P., AND YAROM, Y. Just a Little Bit More. In Topics in Cryptology – CT-RSA 2015 (2015), vol. 9048 of LNCS, Springer, pp. 3–21.
[80] WANG, S., WANG, P., LIU, X., ZHANG, D., AND WU, D. CacheD: Identifying Cache-Based Timing Channels in Production Software. In USENIX Security Symposium 2017 (2017), USENIX Association, pp. 235–252.
[81] WANG, Z., AND LEE, R. B. New Cache Designs for Thwarting Software Cache-Based Side Channel Attacks. In International Symposium on Computer Architecture – ISCA 2007 (2007), ACM, pp. 494–505.
[82] WEISER, S., SPREITZER, R., AND BODNER, L. Single Trace Attack Against RSA Key Generation in Intel SGX SSL. In ASIA Conference on Information, Computer and Communications Security – AsiaCCS 2018 (2018), ACM.
[83] WELCH, B. L. The Generalization of Student's Problem when Several Different Population Variances are Involved. Biometrika 34, 1-2 (1947), 28–35.
[84] XIAO, Y., LI, M., CHEN, S., AND ZHANG, Y. STACCO: Differentially Analyzing Side-Channel Traces for Detecting SSL/TLS Vulnerabilities in Secure Enclaves. In Conference on Computer and Communications Security – CCS 2017 (2017), ACM, pp. 859–874.
[85] XIN, B., SUMNER, W. N., AND ZHANG, X. Efficient Program Execution Indexing. In Programming Language Design and Implementation – PLDI 2008 (2008), ACM, pp. 238–248.
[86] XU, Y., CUI, W., AND PEINADO, M. Controlled-Channel Attacks: Deterministic Side Channels for Untrusted Operating Systems. In IEEE Symposium on Security and Privacy – S&P 2015 (2015), IEEE Computer Society, pp. 640–656.
[87] YAROM, Y., AND FALKNER, K. FLUSH+RELOAD: A High Resolution, Low Noise, L3 Cache Side-Channel Attack. In USENIX Security Symposium 2014 (2014), USENIX Association, pp. 719–732.
[88] YAROM, Y., GENKIN, D., AND HENINGER, N. CacheBleed: A Timing Attack on OpenSSL Constant-time RSA. J. Cryptographic Engineering 7 (2017), 99–112.
[89] ZANKL, A., HEYSZL, J., AND SIGL, G. Automated Detection of Instruction Cache Leaks in Modular Exponentiation Software. In Smart Card Research and Advanced Applications – CARDIS 2016 (2016), vol. 10146 of LNCS, Springer, pp. 228–244.
[90] ZHOU, Z., REITER, M. K., AND ZHANG, Y. A Software Approach to Defeating Side Channels in Last-Level Caches. In Conference on Computer and Communications Security – CCS 2016 (2016), ACM, pp. 871–882.
[91] ZHUANG, X., ZHANG, T., AND PANDE, S. HIDE: An Infrastructure for Efficiently Protecting Information Leakage on the Address Bus. In Architectural Support for Programming Languages and Operating Systems – ASPLOS 2004 (2004), ACM, pp. 72–84.
