1

Symbolic Verification of Cache Side-channel Freedom Sudipta Chattopadhyay, Abhik Roychoudhury

Abstract—Cache timing attacks allow third-party observers reasoning over multiple execution traces. To this end, we pro- to retrieve sensitive information from program executions. But, pose a symbolic verification technique within our CACHEFIX is it possible to automatically check the vulnerability of a framework. Concretely, we capture the cache behaviour of a program against cache timing attacks and then, automatically shield program executions against these attacks? For a given program via symbolic constraints over program inputs. Then, program, a cache configuration and an attack model, our we leverage recent advances on satisfiability modulo theory CACHEFIX framework either verifies the cache side-channel (SMT) and constraint solving to (dis)prove the cache side- freedom of the program or synthesizes a series of patches to channel freedom of a program. ensure cache side-channel freedom during program execution. An appealing feature of our CACHEFIX approach is to At the core of our framework is a novel symbolic verification technique based on automated abstraction refinement of cache employ symbolic reasoning over the real counterexample semantics. The power of such a framework is to allow symbolic traces. To this end, we systematically explore real counterex- reasoning over counterexample traces and to combine it with ample traces and apply such symbolic reasoning to synthesize runtime monitoring for eliminating cache side channels during patches. Each synthesized patch captures a symbolic condition program execution. Our evaluation with routines from OpenSSL, ν on input variables and a sequence of actions that needs libfixedtimefixedpoint, GDK and FourQlib libraries to be applied when the program is processed with inputs reveals that our CACHEFIX approach (dis)proves cache side- channel freedom within an average of 75 seconds. Besides, in satisfying ν. The application of a patch is guaranteed to all except one case, CACHEFIX synthesizes all patches within 20 reduce the channel capacity of the program under inspection. minutes to ensure cache side-channel freedom of the respective Moreover, if our checker terminates, then our CACHEFIX routines during execution. approach guarantees to synthesize all patches that completely shields the program against cache timing attacks [13], [7]. I.INTRODUCTION Intuitively, our CACHEFIX approach can start with a program P vulnerable to cache timing attack. Then, it leverages a Cache timing attacks [24], [23] are among the most critical systematic combination of symbolic verification and runtime side-channel attacks [25] that retrieve sensitive information monitoring to execute P with cache side-channel freedom. from program executions. Recent cache attacks [31] further It is the precision and the novel mechanism implemented show that cache side-channel attacks are practical even in within CACHEFIX that set us apart from the state of the commodity embedded processors, such as in ARM-based art. Existing works on analyzing cache side channels [22], embedded platforms [31]. The basic idea of a cache timing [14], [30] are incapable to automatically build and refine attack is to observe the timing of cache hits and misses for a abstractions for cache semantics. Besides, these works are program execution. Subsequently, the attacker use such timing not directly applicable when the underlying program does to guess the sensitive input via which the respective program not satisfy cache side-channel freedom. Given an arbitrary was activated. program, our CACHEFIX approach generates proofs of its Given the practical relevance, it is crucial to verify whether a cache side-channel freedom or generates input(s) that manifest given program (e.g. an encryption routine) satisfies cache side- the violation of cache side-channel freedom. Moreover, our channel freedom, meaning the program is not vulnerable to symbolic reasoning framework provides capabilities to sys- cache timing attacks. However, verification of such a property tematically synthesize patches and completely eliminate cache is challenging for several reasons. Firstly, the verification of side channels during program execution. cache side-channel freedom requires a systematic integration We organize the remainder of the paper as follows. After of cache semantics within the program semantics. This, in turn, providing an overview of CACHEFIX (Section II), we make is based on the derivation of a suitable abstraction of cache the following contributions: semantics. Our proposed CACHEFIX approach automatically 1) We present CACHEFIX, a novel symbolic verification builds such an abstraction and systematically refines it until framework to check the cache side-channel freedom of a proof of cache side-channel freedom is obtained or a an arbitrary program. To the best of our knowledge, real (i.e. non-spurious) counterexample is produced. Secondly, this is the first application of automated abstraction proving cache side-channel freedom of a program requires refinement and symbolic verification to check the cache S. Chattopadhyay is with the Information Systems Technology and De- behaviour of a program. sign, Singapore University of Technology and Design, Singapore. e-mail: 2) We instantiate our CACHEFIX approach with direct- (sudipta [email protected]). A. Roychoudhury is with the School of Computing, National University of mapped caches, as well as with set-associative caches Singapore, Singapore. e-mail: ([email protected]). with least recently used (LRU) and first-in-first-out 2

(FIFO) policy (Section IV-). In Section IV-D, we show The style of encoding, as shown in Equation 2, facilitates the generalization of our CACHEFIX approach over the usage of state-of-the-art solvers for verifying cache side- timing-based attacks [13] and trace-based attacks [7]. channel freedom. 3) We discuss a systematic exploration of counterexamples In general, we note that the cache behaviour of the program to synthesize patches and to shield program executions in Figure 1(a), i.e., the cache behaviours of r1, . . . , r4; can against cache timing attacks (Section V). We provide be formulated accurately via the following set of predicates theoretical guarantees that such patch synthesis con- related to cache semantics: verges towards completely eliminating cache side chan- tag Pred = {ρset ∪ ρ | 1 ≤ j < i ≤ 4} (3) nels during execution. cache ji ji 4) We provide an implementation of CACHEFIX and The size of Pred cache depends on the number of memory- evaluate it with 25 routines from OpenSSL, GDK, related instructions. However, |Pred cache| does not vary with FourQlib and libfixedtimefixedpoint li- the cache size. braries. Our evaluation reveals that CACHEFIX can es- Key insight in abstraction refinement: If the attacker tablish proof or generate non-spurious counterexamples observes the number of cache misses, then the cache side- within 75 seconds on average. Besides, in most cases, channel freedom holds for the program in Figure 1(a) when CACHEFIX generated all patches within 20 minutes to all feasible traces exhibit the same number of cache misses. ensure cache side-channel freedom during execution. Hence, such a property ϕ can be formulated as the non- Our implementation and all experimental data are pub- existence of two traces tr1 and tr2 as follows: licly available. 4 4 ! X (tr1) X (tr2) ϕ ≡6 ∃tr1, 6 ∃tr2 s.t. missi 6= missi (4) i=1 i=1 II.OVERVIEW where miss(tr) captures the valuation of miss in trace tr. In this section, we demonstrate the general insight behind i i Our key insight is that to establish a proof of ϕ (or its lack our approach through examples. We consider the simple code thereof), it is not necessary to accurately track the values of fragments in Figure 1(a)-(b) where key is a sensitive input. In all predicates in Pred (cf. Equation 3). In other words, this example, we will assume a direct-mapped cache having cache even if some predicates in Pred have unknown values, it a size of 512 bytes. For the sake of brevity, we also assume cache might be possible to (dis)prove ϕ. This phenomenon occurs that key is stored in a register and accessing key does not due to the inherent design principle of caches and we exploit involve the cache. The mapping of different program variables this in our abstraction refinement process. into the cache appears in Figure 1(c). Finally, we assume the To realize our hypothesis, we first start with an ini- presence of an attacker who observes the number of cache tial set of predicates (possibly empty) whose values are misses in the victim program. For such an attacker, examples accurately tracked during verification. In this example, let in Figure 1(a)-(b) satisfy cache side-channel freedom if and us assume that we start with an initial set of predicates only if the number of cache misses suffered is independent of Pred = {ρset, ρtag, ρset, ρtag}. The rest of the predicates key. init 13 13 23 23 in Pred \ Pred are set to unknown value. With Why symbolic verification?: Cache side-channel freedom cache init this configuration at hand, CACHEFIX returns counterexample of a program critically depends on how it interacts with the traces tr and tr (cf. Equation 4) to reflect that ϕ does not cache. We make an observation that the program cache be- 1 2 hold for the program in Figure 1(a). In particular, the following haviour can be formulated via a well-defined set of predicates. traces are returned: To this end, let us assume set(ri) captures the cache set tr1 ≡ hmiss1 = miss3 = 1, miss2 = miss4 = 0i accessed by instruction ri and tag(ri) captures the accessed cache tag by the same instruction. Consider the instruction tr2 ≡ hmiss1 = 0, miss2 = miss3 = miss4 = 1i in Figure 1(a). We introduce a symbolic variable miss , 3 3 Given tr1 and tr2, we check whether any of them are which we intend to set to one if r3 suffers a cache miss and spurious. To this end, we reconstruct the logical condition to set to zero otherwise. We observe that miss depends on 3 (cf. Equation 2) that led to the specific valuations of missi the following logical condition: variables in a trace. For instance in trace tr2, such a logical V set tag condition is captured via ¬Γ(r1) ∧ Γ(ri). It turns Γ(r3) ≡ ¬ 0 ≤ key ≤ 127 ∧ ρ13 ∧ ¬ρ13 i∈[2,4] (1) out that ¬Γ(r ) ∧ V Γ(r ) is unsatisfiable, making tr ∧¬ key ≥ 128 ∧ ρset ∧ ¬ρtag 1 i∈[2,4] i 2 23 23 spurious. This happened due to the incompleteness in tracking tag set the predicates Pred . where ρ ≡ (tag(rj) 6= tag(ri)) and ρ ≡ cache ji ji To systematically augment the set of predicates and rerun (set(rj) = set(ri)). Intuitively, Γ(r3) checks whether our verification process, we extract the unsatisfiable core from both r1 and r2, if executed, load different memory blocks V ¬Γ(r1) ∧ Γ(ri). Specifically, we get the following than the one accessed by r3. Therefore, if Γ(r3) is evaluated i∈[2,4] unsatisfiable core: to true, then miss3 = 1 (i.e. r3 suffers a cache miss) and miss = 0 (i.e. r is a cache hit), otherwise. Formally, we set tag 3 3 U ≡ ¬ρ34 ∨ ρ34 (5) set the cache behaviour of r3 as follows: Intuitively, with the initial abstraction Pred init, our checker Γ(r3) ⇔ (miss3 = 1) ; ¬Γ(r3) ⇔ (miss3 = 0) (2) CACHEFIX failed to observe that r3 and r4 access the same 3

/* key is sensitive input */ /* key is sensitive input */ a[0], b[255] /* key=255 */ char a[256]; char a[256]; a[1] /* inject cache miss */ unsigned char key; unsigned char key; => load reg’, fresh[0] 256 bytes char b[256]; char b[256]; r2 -> r3 -> c1 -> r4 a[255] if (key <= 127) if (key <= 127) /* 128<=key<=254 */ r1: load reg2, b[255] r1: load reg2, b[255] key else else /* do nothing */ b[0] r2 -> r3 -> c1 -> r4 r2: load reg2, b[128] r2: load reg2, a[255] b[1] 256 bytes r3: load reg1, a[key] r3: load reg1, a[key] /* 0<=key<=127 */ c1: add reg1, reg2 c1: add reg1, reg2 /* do nothing */ b[254] Cache r4: store reg1, a[key] r4: store reg1, a[key] r1 -> r3 -> c1 -> r4 (a) (b) (c) (d)

Fig. 1. A code fragment (a) satisfying cache side-channel freedom, (b) violating cache side-channel freedom. (c) Mapping of variables into the cache. (d) Runtime actions and the execution order for the program in Figure 1(b) to ensure cache side-channel freedom.

Extract Monitor memory block, hence, U is unsatisfiable. We then augment our Real + Counterexample initial set of predicates with the predicates in U and therefore, Synthesize Patch refining the abstraction as follows: Program ( ) ♽ set tag set tag set tag P Pred cur = {ρ13 , ρ13 , ρ23 , ρ23 , ρ34 , ρ34 } Initial Abstraction Side-channel Verify ( ) Freedom Predinit ✅ CACHEFIX successfully verifies the cache side-channel Cache (C) freedom of the program in Figure 1(a) with the set of pred- ♽ Attack Model (O) icates Pred cur. We note that the predicates in Pred cache \ Refine Abstraction Spurious ( ) Counterexample Pred cur 6= φ. In particular, we still have unknown values Predcur assigned to the following set of predicates: set tag set tag set tag Fig. 2. Workflow of our symbolic verification and patching Pred unknown = {ρ12 , ρ12 , ρ14 , ρ14 , ρ24 , ρ24 } Therefore, it was possible to verify ϕ by tracking only half of set tag and ¬Γ(r1) ∧ Γ(r2) ∧ ¬Γ(r3) ∧ ¬Γ(r4), respectively. Both the predicates in Pred cache. Intuitively, ρ12 and ρ12 were not needed to be tracked as r and r cannot appear in a single the formulas are satisfiable for the example in Figure 1(b). 1 2 r trace, as captured via the program semantics. In contrast, the Intuitively, this happens due to 2, which loads the same memory block as accessed by r3 only if key = 255. rest of the predicates in Pred unknown were not required for We observe that tr2 will be equivalent to tr1 if a cache miss the verification process, as neither r1 nor r2 influences the is inserted in the beginning of tr2. To this end, we need to cache behaviour of r4 – it is influenced completely by r3. Key insight in fixing: In general, the state-of-the-art in know all inputs that lead to tr2. Thanks to the symbolic nature fixing cache side-channel is to revert to constant-time pro- of our analysis, we obtain the exact symbolic condition, i.e., gramming style [8]. Constant-time programming style imposes ¬Γ(r1) ∧ Γ(r2) ∧ ¬Γ(r3) ∧ ¬Γ(r4) that manifests the trace heavy burden on a programmer to follow certain programming tr2. Therefore, if the program in Figure 1(b) is executed with patterns, such as to ensure the absence of input-dependent any input satisfying ¬Γ(r1) ∧ Γ(r2) ∧ ¬Γ(r3) ∧ ¬Γ(r4), then branches and input-dependent memory accesses. Yet, most a cache miss is injected as shown in Figure 1(d). This ensures programs do not exhibit constant-time behaviour. Besides, the the cache side-channel freedom during program execution, as example in Figure 1(a) shows that an application can still all traces exhibit the same number of cache misses. have constant cache-timing, despite not following the constant- Our proposed fixing mechanism is novel that it does not rely on any specific programming style. Moreover, as we generate time programming style. Using our CACHEFIX approach, we observe that it is not necessary to always write constant- the fixes by directly leveraging the verification results, we can time programs. Instead, the executions of such programs provide strong security guarantees during program execution. can be manipulated to exhibit constant time behaviour. We Overall workflow of CACHEFIX: Figure 2 outlines the accomplish this by leveraging our verification results. overall workflow of CACHEFIX. The abstraction refinement We consider the example in Figure 1(b) and let us as- process is guaranteed to converge towards the most precise abstraction of cache semantics to (dis)prove the cache side- sume that we start with the initial abstraction Pred init = set tag set tag channel freedom. Moreover, as observed in Figure 2, our cache {ρ13 , ρ13 , ρ23 , ρ23 }.CACHEFIX returns the following coun- terexample while verifying ϕ (cf. Equation 4): side channel fixing is guided by program verification output, enabling us to give cache side channel freedom guarantees tr1 ≡ hmiss1 = miss3 = 1, miss2 = miss4 = 0i about the fixed program. tr2 ≡ hmiss2 = 1, miss1 = miss3 = miss4 = 0i If we reconstruct the logical condition that led to the specific III.THREAT AND SYSTEM MODEL valuations of miss1, . . . , miss4 in tr1 and tr2, then we get In the following, we discuss the threat models and the type the symbolic formulas Γ(r1) ∧ ¬Γ(r2) ∧ Γ(r3) ∧ ¬Γ(r4) of processors targeted by CACHEFIX. 4

A. Threat Model A. Initial abstract domain We assume that an attacker makes observations on the We assume that a routine may start execution with any execution traces of victim program P and the implementation initial cache state, but it does not access memory blocks of P is known to the attacker. Besides, there does not exist any within the initial state during execution [22]. Hence, for a error in the observations made by the attacker. We also assume given instruction ri, its cache behaviour might be affected by that an attacker can execute arbitrary user-level code on the all instructions executing prior to ri. Concretely, the cache processor that runs the victim program. This, in turn, allows behaviour of ri can be accurately predicted based on the set the attacker to flush the cache (e.g. via accessing a large array) of logical predicates Pred set and Pred tag as follows: before the victim routine starts execution. We, however, do not Pred i = {set(r ) = set(r ) | 1 ≤ j < i} assume that the attacker can access the address space of the set j i i (6) victim program P. We believe the aforementioned assumptions Pred tag = {tag(rj) 6= tag(ri) | 1 ≤ j < i} on the attacker are justified, as we aim to verify the cache side- i channel freedom of programs against strong attacker models. Intuitively, Pred set captures the set of predicates checking whether any instruction prior to ri accesses the same cache set We capture an execution trace via a sequence of hits (h) i and misses (m). Hence, formally we model an attacker as the as ri. Similarly, Pred tag checks whether any instruction prior to r has a different cache tag than tag(r ). Based on this in- mapping O : {h, m}∗ → X, where X is a countable set. For i i ∗ tuition, the following set of predicates are sufficient to predict tr1, tr2 ∈ {h, m} , an attacker can distinguish tr1 from tr2 the cache behaviours of N memory-related instructions. if and only if O(tr1) 6= O(tr2). In this paper, we instantiate our checker for the following realistic attack models: N N [ i [ i ∗ Pred set = Pred ; Pred tag = Pred (7) • Otime : {h, m} → N. Otime maps each execution trace set tag to the number of cache misses suffered by the same. This i=1 i=1 attack model imitates cache timing attacks [13]. For the sake of efficiency, however, we launch verification with ∗ ∗ • Otrace : {h, m} → {0, 1} . Otrace maps each execution a smaller set of predicates Pred init ⊆ Pred tag ∪ Pred set as trace to a bitvector (h is mapped to 0 and m is mapped follows: to 1). This attack model imitates trace-based attacks [7]. N [ Pred init = {p | p ∈ Pred tag ∪ Pred set ∧ |σ(ri)| = 1 ∧ i=1 B. Processor model ∀k ∈ [1, i). |σ(rk)| = 1 ∧ guardk ⇒ true} (8) We assume an ARM-style processor with one or more cache levels. However, we consider timing attacks only due to first- σ(ri) captures the set of memory blocks accessed by in- level instruction or data caches [13], [7]. We currently do struction ri and guardk captures the control condition under not handle more advanced attacks on shared caches [32]. which rk is executed. In general, our CACHEFIX approach First-level caches can either be partitioned (instruction vs. works even if Pred init = φ. However, to accelerate the data) or unified. We assume that set-associative caches have convergence of CACHEFIX, we start with the predicates whose either LRU or FIFO replacement policy. Other deterministic values can be statically determined (i.e. independent of inputs). replacement policies can easily be integrated within CACHE- Intuitively, we take this approach for two reasons: firstly, the FIX via additional symbolic constraints. Finally, our timing set Pred init can be computed efficiently during symbolic ex- model only takes into account the effect of caches. Timing ecution. Secondly, as the predicates in Pred init have constant effects due to other micro-architectural features (e.g. pipeline valuation, they reduce the size of the formula to be discharged and branch prediction) are currently not handled. For the sake to the SMT solver. We note that guardk depends on the of brevity, we discuss the timing effects due to memory-related program semantics. The abstraction of program semantics is instructions. It is straightforward to integrate the timing effects an orthogonal problem and for the sake of brevity, we skip its of computation instructions (e.g. add) into CACHEFIX. discussion here.

IV. ABSTRACTION REFINEMENT B. Abstract domain refinement We use the mapping Γ: {r , r , . . . , r } → {true, false} Notations: We represent cache via a triple h2S , 2B, Ai 1 2 N to capture the conditions under which r was a cache hit (i.e. where 2S , 2B and A capture the number of cache sets, cache i miss = 0) or a cache miss (i.e. miss = 1). In particular, line size and cache associativity, respectively. We use set(r ) i i i the following holds: and tag(ri) to capture the cache set and cache tag, respec- r tively, accessed by instruction i. Additionally, we introduce Γ(ri) ⇔ (missi = 1) ; ¬Γ(ri) ⇔ (missi = 0) (9) a symbolic variable missi to capture whether ri was a miss i i (missi = 1) or a hit (missi = 0). For instructions ri and rj, Γ(ri) depends on predicates in Pred set ∪ Pred tag and the we have j < i if and only if rj was (symbolically) executed cache configuration. We show the formulation of Γ(ri) in before ri. Section IV-C. 5

Algorithm 1 Abstraction Refinement Algorithm Procedure 2 Symbolically Tracking Program and Cache Input: Program P, cache configuration C, attack model O States Output: Successful verification or a concrete counterexample 1: /* symbolically execute P with cache configuration C*/ 2: procedure EXECUTESYMBOLIC(P, C) 1: /* Ψ is a formula representation of P */ 3: i := 1; Ψ := true; Pred set := Pred tag := φ 2: /* Pred is cache-semantics-related predicates */ 4: ri := GETNEXTINSTRUCTION(P) 3: /* Γ determines cache behaviour of all instructions */ 5: while ri 6= exit do 4: (Ψ, Pred, Γ) := EXECUTESYMBOLIC(P, C) 6: if ri is memory-related instruction then 5: 7: /* Collect predicates for cache semantics */ /* Formulate initial abstraction (cf. Equation 8) */ i i 8: Pred ∪ = Pred ; Pred ∪ = Pred 6: Pred cur:=Pred init := GETINITIALABSTRACTION(Pred) set set tag tag 7: /* Rewrite Ψ with initial abstraction */ 9: /* Γ(ri) determines cache behaviour of ri */ 10: Formulate Γ(r ) /* see Section IV-C */ 8: REWRITE(Ψ, Pred init) i 9: /* Formulate cache side-channel freedom property */ 11: Γ ∪ = {Γ(ri)} 10: ϕ := GETPROPERTY(O) 12: /* Integrate cache semantics within Ψ */ 11: /* Invoke symbolic verification to check Ψ ∧ ¬ϕ */ 13: Ψ := CONVERT(Ψ, Γ(ri) ⇔ (missi = 1)) 14: Ψ := CONVERT(Ψ, ¬Γ(r ) ⇔ (miss = 0)) 12: (res, tr1, tr2) := VERIFY(Ψ, ϕ) i i 15: end if 13: while (res=false) ∧ (tr1 or tr2 is spurious) do 16: /* Integrate program semantics of r within Ψ */ 14: /* Extract unsatisfiable core from tr1 and/or tr2 */ i 17: /* ϕ(r ) is a predicate capturing r semantics */ 15: U := UNSATCORE(tr1, tr2, Γ) i i 16: /* Refine abstractions and repeat verification */ 18: Ψ := CONVERT(Ψ, ϕ(ri)) 19: i := i + 1 17: Pred cur := REFINE(Pred init, U, Pred) 20: r := GETNEXTINSTRUCTION(P) 18: REWRITE(Ψ, Pred cur) i 21: end while 19: (res, tr1, tr2) := VERIFY(Ψ, ϕ) 22: return (Ψ, Pred ∪ Pred , Γ) 20: Pred init := Pred cur set tag 21: end while 23: end procedure 22: return res

GETINITIALABSTRACTION: We start our verification with an initial abstraction of cache semantics (cf. Equa- EXECUTESYMBOLIC: Algorithm 1 captures the overall tion 8). Such an initial abstraction contains a partial set verification process based on our systematic abstraction refine- of logical predicates Pred init ⊆ Pred set ∪ Pred tag. Based ment. The symbolic verification engine computes a formula on Pred init, we rewrite Ψ via the procedure REWRITE as representation Ψ of the program P. This is accomplished follows: We walk through Ψ and look for occurrences of via a symbolic execution on program P (cf. procedure EXE- − any predicate p ∈ (Pred set ∪ Pred tag) \ Pred init. For any CUTESYMBOLIC) and systematically translating the cache and p− discovered in Ψ, we replace p− with a fresh symbolic program semantics of each instruction into a set of constraints variable Vp− . Intuitively, this means that during the verification (cf. procedure CONVERT). process, we assume any truth value for the predicates in CONVERT: During the symbolic execution, a set of sym- (Pred set ∪ Pred tag) \ Pred init. This, in turn, substantially bolic states, each capturing a unique execution path reaching reduces the size of the symbolic formula Ψ and simplifies an instruction ri, is maintained. This set of symbolic states can the verification process. be viewed as a disjunction Ψ(ri) ≡ ψ1 ∨ ψ2 ∨ ... ∨ ψj−1 ∨ ψj, VERIFY and GETPROPERTY: The procedure VERIFY where Ψ(ri) ⇒ Ψ and each ψi symbolically captures a unique invokes the solver to check the cache side-channel freedom of execution path leading to instruction ri. At each instruction P with respect to attack model O. The property ϕ, capturing ri, the procedure CONVERT translates Ψ(ri) in such a fashion the cache side-channel freedom, is computed via GETPROP- that Ψ(ri) integrates both the cache semantics (cf. lines 13- ERTY. For example, in timing-based attacks, ϕ is captured 14) and program semantics (cf. lines 18) of ri. For instance, via the non-existence of any two traces tr1 and tr2 that have to integrate cache semantics of a memory-related instruction different number of cache misses (cf. Equation 4). In other ri, Ψ(ri) is converted to Ψ(ri) ∧ (Γ(ri) ⇔ (missi = 1)) ∧ words, if the following formula is satisfied with more than PN (¬Γ(ri) ⇔ (missi = 0)). Similarly, the program semantics of one valuations for i=1 missi, then side-channel freedom is instruction ri, as captured via ϕ(ri), is integrated within Ψ(ri) violated: N ! ! as Ψ(ri) ∧ ϕ(ri). Translating the program semantics of each X instruction to a set of constraints is a standard technique in any Ψ ∧ missi ≥ 0 (10) symbolic model checking [19]. Moreover, such a translation is i=1 typically carried out on a program in static single assignment Here N captures the total number of memory-related instruc- (SSA) form and takes into account both data and control flow. tions encountered during the symbolic execution of P. Unlike classic symbolic analysis, however, we consider both REFINE and REWRITE: When our verification process the cache semantics and program semantics of an execution fails, we check the feasibility of a counterexample trace path, as explained in the preceding. trace ∈ {tr1, tr2}. Recall from Equation 9 that ri is a cache 6

miss if and only if Γ(ri) holds true. We leverage this relation to Recall that guardj captures the control condition under which construct the following formula Γtrace for feasibility checking: rj is executed. Hence, if guardj is evaluated false for a trace, cold N ( (trace) then rj does not appear in the respective trace. If Θi is ^ Γ(ri), if miss = 1; i satisfied, then ri inevitably suffers a cold cache miss. Γtrace = (trace) (11) i=1 ¬Γ(ri), if missi = 0; Formulating conditions for conflict cache misses: For direct-mapped caches, an instruction ri suffers a conflict miss In Equation 11, miss(trace) captures the valuation of symbolic i due to an instruction rj if all of the following conditions are variable missi in the counterexample trace. We note that satisfied: trace is not a spurious counterexample if and only if Γtrace cnf,dir φji : If rj accesses the same cache set as ri, however, is satisfiable, hence, highlighting the violation of cache side- rj accesses a different cache tag as compared to ri. This is channel freedom. formally captured as follows: If Γtrace is unsatisfiable, then our initial abstraction φcnf,dir ≡ ρtag ∧ ρset (15) Pred init was insufficient to (dis)prove the cache side-channel ji ji ji freedom. In order to refine this abstraction, we extract the rel,dir φji : No instruction between rj and ri accesses the unsatisfiable core Γ from the symbolic formula trace via the same memory block as ri. For instance, consider the memory- NSAT ORE procedure U C . Such an unsatisfiable core contains a block access sequence (r1 : m1) → (r2 : m2) → (r3 : m2), ∈ S Γ(r ) Γ(r ) set of CNF clauses k∈[1,N] k . We note each k is a where both m1 and m2 are mapped to the same cache set Pred ∪Pred function of the set of predicates tag set. Finally, we and r1...3 captures the respective memory-related instructions. cf. EFINE refine the abstraction ( procedure R in Algorithm 1) It is not possible for r1 to inflict a conflict miss for r3, as rel,dir to Pred cur by including all predicates in the unsatisfiable core the memory block m2 is reloaded by instruction r2. φji as follows: is formally captured as follows: + + Pred cur := Pred init ∪ {p | p ∈ UnsatCore(Γtrace) rel,dir ^ tag set  φ ≡ ρ ∨ ¬ρ ∨ ¬guardk (16) + (12) ji ki ki ∧ p ∈/ Pred init} j

With the refined abstraction Pred cur, we rewrite the sym- rel,dir Intuitively, φji captures that all instructions between rj bolic formula Ψ (cf. procedure REWRITE). In particular, we and ri either access a different memory block than ri (hence, identify the placeholder symbolic variables for predicates in tag set satisfying ρ ∨ ¬ρki ) or does not appear in the execution the set Pred \ Pred . We rewrite Ψ by replacing these ki cur init trace (hence, satisfying ¬guardk). placeholder symbolic variables with the respective predicates Given the intuition mentioned in the preceding paragraphs, in the set Pred cur \ Pred init. It is worthwhile to note that cnf,dir we conclude that ri suffers a conflict miss if both φji the placeholder symbolic variables in (Pred tag ∪ Pred set) \ rel,dir and φji are satisfied for any instruction executing prior Pred cur remain unchanged. cnf,dir to ri. This is captured in the symbolic condition Θi as follows: C. Modeling Cache Semantics cnf,dir _  cnf,dir rel,dir  Θi ≡ φji ∧ φji ∧ guardj (17) For each memory-related instruction ri, the formulation of j∈[1,i) Γ(ri) is critical to prove the cache side-channel freedom. The formulation of Γ(ri) depends on the configuration of caches. Computing Γ(ri): For direct-mapped caches, ri can be a Due to space constraints, we will only discuss the symbolic cache miss if it is either a cold cache miss or a conflict miss. model for direct-mapped caches (symbolic models for LRU Hence, Γ(ri) is captured symbolically as follows: and FIFO caches are provided in the supplement [18]). To  cold cnf,dir simplify the formulation, we will use the following abbrevia- Γ(ri) ≡ guardi ∧ Θi ∨ Θi (18) tions for the rest of the section: set tag D. Property for cache side-channel freedom ρij ≡ (set(ri) = set(rj)) ; ρij ≡ (tag(ri) 6= tag(rj)) (13) In this paper, we instantiate our checker for timing-based We also distinguish between the following variants of misses: attacks [13] and trace-based attacks [7] as follows. Timing-based attacks: In timing-based attacks, an at- 1) Cold misses: Cold misses occur when a memory block tacker aims to distinguish traces based on their timing. In our is accessed for the first time. framework, we verify the following property to ensure cache 2) Conflict misses: All cache misses that are not cold side-channel freedom: misses are referred to as conflict misses. N ! ! Formulating conditions for cold misses: Cold cache X misses occur when a memory block is accessed for the first Ψ ∧ missi ≥ 0 ≤ 1 (19) i=1 sol PN miss time during program execution. In order to check whether ( i=1 i) ri suffers a cold miss, we check whether all instructions N captures the number of symbolically executed, memory- r ∈ {r1, r2, . . . , ri−1} access different memory blocks than PN related instructions. sol( i=1 missi) captures the number of the memory block accessed by ri. This is captured as follows: PN valuations of i=1 missi. Intuitively, Equation 19 aims to cold ^ set tag  check that the underlying program has exactly one cache Θi ≡ ¬ρji ∨ ρji ∨ ¬guardj (14) j∈[1,i) behaviour, in terms of the total number of cache misses. 7

Trace-based attacks: In trace-based attacks, an attacker generation. To this end, CACHEFIX employs a strategy that monitors the cache behaviour of each memory access. We can be visualized as an exploration of the equivalence classes define a partial function ξ : {r1, . . . , rN } 9 {0, 1} as follows: of observations (e.g. #cache misses), i.e., we explore all ( counterexamples in the same equivalence class in one shot. 1, if guardi ∧ (missi = 1) holds; In order to find another counterexample exhibiting the same ξ(ri) = (20) 0, if guardi ∧ (missi = 0) holds; observation as observation o, we modify the verification goal cf. The following verification goal ensures side-channel freedom: as follows, for timing and trace-based attacks, respectively ( Equation 20 for ξ):  Ψ ∧ |dom(ξ)| ≥ 0 ∧ kr ∈dom(ξ) ξ(ri) ≥ 0 ≤ 1 N ! ! i sol(X) X (21) Ψ ∧ ¬ missi 6= o ; where dom(ξ) captures the domain of ξ, k captures the ordered i=1   (with respect to the indexes of r ) concatenation operation |dom(ξ)|= 6 [|dom(ξ)|]o i   Ψ ∧ ¬ ∨ kr ∈dom(ξ) ξ(ri) 6= kr ∈dom(ξ) ξ(ri) and X = h|dom(ξ)| , kri∈dom(ξ) ξ(ri)i. Intuitively, we check  i i o  whether there exists exactly one cache behaviour sequence. (23) where [X] captures the valuation of X with respect to obser- V. RUNTIME MONITORING o vation o and N is the total number of symbolically executed, CACHEFIX produces the first real counterexample when it memory-related instructions. If Equation 23 is unsatisfiable, discovers two traces with different observations (w.r.t. attack then it captures the absence of any more counterexample with model O). These traces are then analyzed to compute a set of observation o. We note that Ψ is automatically refined to runtime actions that are guaranteed to reduce the uncertainty avoid discovering duplicate or spurious counterexamples. If to guess sensitive inputs. Overall, our runtime monitoring Equation 23 is satisfiable, our checker provides another real involves the following crucial steps: counterexample with the observation o. We repeat the process • We analyze a counterexample trace tr and extract the until no more real counterexample with the observation o is symbolic condition for which the same trace would be found, at which point Equation 23 becomes unsatisfiable. generated, To explore a different equivalence class of observation than • We systematically explore unique counterexamples with that of observation o,CACHEFIX negates the verification goal. the objective to reduce the uncertainty to guess sensitive For Otime, as an example, the verification goal is changed as inputs, follows: • We compute a set of runtime actions that need to be N ! ! X applied for improving the cache side-channel freedom. Ψ ∧ ¬ missi = o (24) In the following, we discuss these three steps in more detail. i=1 We note that Equation 24 is satisfiable if and only if there A. Analyzing a counterexample trace exists an execution trace with observation differing from o. Given a real counterexample trace, we extract a symbolic condition that captures all the inputs for which the same C. Runtime actions to improve side-channel freedom counterexample trace can be obtained. Thanks to the symbolic Our checker maintains the record of all explored observa- nature of our analysis, CACHEFIX already includes capabilities tions and the symbolic conditions capturing the equivalence to extract these monitors as follows. classes of respective observations. At each round of patch (i.e. ( (trace) runtime action) synthesis, we walk through this record and ^ Γ(ri), if missi = 1; ν ≡ (trace) (22) compute the necessary runtime actions for improving cache ¬Γ(ri), if miss = 0; ri∈trace i side-channel freedom. (trace) Otime: Assume Ω = {hν1, o1i, hν2, o2i,..., hνk, oki} where missi is the valuation of symbolic variable missi where each oi captures a unique number of observed cache in trace. We note that ν ⇒ ¬Γ(rj) for any rj that does not misses and νi symbolically captures all inputs that lead to appear in trace (i.e. rj ∈/ trace). Hence, to formulate ν, it was sufficient to consider only the instructions that appear in observation oi. Our goal is to manipulate executions so that trace. they lead to the same number of cache misses. To this end, the Once we extract a monitor ν from counterexample trace, patch synthesis stage determines the amount of cache misses the symbolic system Ψ is refined to Ψ ∧ ¬ν. This is to ensure that needs to be added for each element in Ω. Concretely, the that we only explore unique counterexample traces. set of runtime actions generated are as follows:       ν1, max oi − o1 ,..., νk, max oi − ok (25) B. Systematic exploration of counterexamples i∈[1,k] i∈[1,k] The order of exploring counterexamples is crucial to satisfy In practice, when a program is run with input I, we monotonicity, i.e., to reduce the channel capacity (a standard check whether I ∈ νx for some x ∈ [1, k]. Subsequently,  metric to quantify the information flow from sensitive input maxi∈[1,k] oi − ox cache misses were injected before the to attacker observation) of P with each round of patch program starts executing. 8

Otrace: During trace-based attacks, the attacker makes E. Properties guaranteed by CACHEFIX an observation on the sequence of cache hits and misses CACHEFIX satisfies the following crucial properties (proofs in an execution trace. Therefore, our goal is to manipulate are included in the supplement [18]) on channel capacity, executions in such a fashion that all execution traces lead to shannon entropy and min entropy; which are standard metrics the same sequence of cache hits and misses. To accomplish to quantify the information flow from sensitive inputs to the this, each runtime action involves the injection of cache misses attacker observation. or hits before execution, after execution or at an arbitrary point of execution. It also involves invalidating an address in cache. Property 1. (Monotonicity) Consider a victim program P Concretely, this is formalized as follows: with sensitive input K. Given attack models Otime or Otrace, assume that the channel capacity to quantify the uncertainty of P P hνi, h(c1, a1), (c2, a2),..., (ck, ak)ii (26) guessing K is Gcap. CACHEFIX guarantees that Gcap monoton- ically decreases with each synthesized patch (cf. Equation 25- where νi captures the symbolic input condition where the 26) employed at runtime. runtime actions are employed. For any input satisfying ν , we i Property 2. (Convergence) Consider a victim program P with count the number of instructions executed. If the number of sensitive input K. In the absence of any attacker, assume that executed instructions reaches c , then we perform the action a j j the uncertainty to guess K is Ginit, Ginit and Ginit , via chan- (e.g. injecting hits/misses or invalidating an address in cache), cap shn min nel capacity, Shannon entropy and Min entropy, respectively. If for any j ∈ [1, k]. CACHEFIX terminates and all synthesized patches are applied As an example, consider a trace-based attack in the example at runtime, then the channel capacity (respectively, Shannon of Figure 1(b). Our checker will manipulate counterexample init init entropy and Min entropy) will remain Gcap (respectively, Gshn traces by injecting cache misses and hits as follows (injected and Ginit ) even in the presence of attacks captured via O cache hits and misses are highlighted in bold): min time and Otrace. tr00 ≡ hmiss, miss, hit, hiti; tr00 ≡ hmiss, miss, hit, hiti 1 2 VI.IMPLEMENTATION AND EVALUATION Therefore, the following actions are generated to ensure cache In this section, we will discuss our implementation setup side-channel freedom against trace-based attacks: and key findings from the evaluation.

hkey = 255, h(0, miss)ii , h0 ≤ key ≤ 254, h(3, hit)ii A. Implementation setup

We use string alignment algorithm [6] to make two traces The input to CACHEFIX is the target program and a equivalent (via insertion of cache hits/misses or substitution cache configuration. We have implemented CACHEFIX on CBMC of hits to misses). top of bounded model checker [1]. It first builds a formula representation of the input program via symbolic execution. Then, it checks the (un)satisfiability of this formula D. Practical consideration against a specification property. Despite being a bounded model checker, CBMC is used as a classic verification tool In practice, the injection of a cache miss can be performed in our experiments. In particular for program loops, CBMC via accessing a fresh memory block (cf. Figure 1(d)). However, first attempts to derive loop bounds automatically. If CBMC unless the injection of a cache miss happens to be in the fails to derive certain loop bounds, then the respective loop beginning or at the end of execution, the cache needs to bounds need to be provided manually. Nevertheless, during be disabled before and enabled after such a cache miss. the verification process, CBMC checks all manually provided Consequently, our injection of misses does not affect cache loop bounds and the verification fails if any such bound was states. In ARM-based processor, this is accomplished via erroneous. In our experiments, all loop bounds were automat- manipulating the C bit of CP15 register. The injection of ically derived by CBMC. In short, if CACHEFIX successfully a cache hit can be performed via tracking the last accessed verifies a program, then the respective program exhibits cache memory address and re-accessing the same address. side-channel freedom for the given cache configuration and To change a cache hit to a cache miss, the accessed memory targeted attack models. address needs to be invalidated in the cache. CACHEFIX The implementation of our checker impacts the entire work- symbolically tracks the memory address accessed at each flow of CBMC. We first modify the symbolic execution engine memory-related instruction. When the program is run with of CBMC to insert the predicates related to cache semantics. input I ∈ νi, we concretize all memory addresses with respect As a result, upon the termination of symbolic execution, the to I. Hence, while applying an action that involves cache formula representation of the program encodes both the cache invalidation, we know the exact memory address that needs to semantics and the program semantics. Secondly, we systemati- be invalidated. In ARM-based processor, the instruction MCR cally rewrite this formula based on our abstraction refinement, provides capabilities to invalidate an address in the cache. with the aim of verifying cache side-channel freedom. Finally, We note the preceding manipulations on an execution re- we modify the verification engine of CBMC to systematically quires additional registers. We believe this is possible by using explore different counterexamples, instrument patches and some system register or using a locked portion in the cache. refining the side-channel freedom properties on-the-fly. To 9

manipulate and solve symbolic formulas, we leverage Z3 CACHEFIX verifies or generates real counterexamples at theorem prover. All reported experiments were performed on an average within 70.38 secs w.r.t. attack model Otime an Intel I7 machine, having 8GB RAM and running OSX. and at an average within 74.12 secs w.r.t. Otrace. More- over, the abstraction refinement process embodied within B. Subject programs and cache CACHEFIX substantially reduces the verification time as We have chosen security-critical subjects from OpenSSL opposed to when no refinement process was employed. library [5], GDK library [3], arithmetic routines from libfixedtimefixedpoint [4] and routines from FourQlib [2] to evaluate CACHEFIX (cf. Table I). D. Overhead from monitors The choice of our subjects is driven by the criticality of the respective routines in developing cryptographic software. We evaluated the time taken by CACHEFIX for counterex- cf. To stress test our checker, we include representative routines ample exploration and patch generation ( Section V). For exhibiting constant cache-timing, as well as routines exhibiting each generated patch, we have also evaluated the overhead variable cache timing. We set the default cache to be 1KB induced by the same at runtime (cf. Table II). direct-mapped, with a line size of 32 bytes. Table II captures the maximum and average overhead in- duced by CACHEFIX at runtime. We compute the overhead via the number of additional runtime actions (i.e. cache misses, C. Efficiency of checking hits or invalidations) introduced solely via CACHEFIX. In Table I captures a summary of our evaluation for CACHE- absolute terms, the maximum (average) overhead captures FIX. The outcome of this evaluation is either a successful veri- the maximum (average) number of runtime actions induced fication () or a non-spurious counterexample (). CACHEFIX over all equivalence classes. The maximum overhead was accomplished the verification tasks for all subjects only within introduced in case of DES – 300 actions for Otime and 371 a few minutes. The maximum time taken by our checker actions for Otrace. This is primarily due to the difficulty was 390 seconds for the routine fix_pow – a constant in making a large number of traces equivalent in terms of time implementation of powers (xy). fix_pow has complex the number of cache misses and the sequence of hit/miss, memory access patterns, however, its flat structure ensures respectively. Although the number of actions introduced by cache side-channel freedom. CACHEFIX is non-negligible, we note that their effect is To check the effectiveness of our abstraction refinement minimal on the overall execution. To this end, we execute the process, we compare CACHEFIX with a variant of our checker program for 100 different inputs in each explored equivalence where all predicates in Pred set ∪ Pred tag are considered. class and measure the overhead introduced by CACHEFIX (cf. Therefore, such a variant does not employ any abstraction “% actions” in Table II) with respect to the total number of refinement, as the set of predicates Pred set ∪ Pred tag is instructions executed. We observe that the maximum overhead sufficient to determine the cache behaviour of all instructions reaches up to 3.2% and the average overhead is up to 2.1%. in the program. We compare CACHEFIX with this variant in We believe this overhead is acceptable in the light of cache terms of the number of predicates, as well as the verification side-channel freedom guarantees provided by CACHEFIX. time. We record the set of predicates Pred cur considered in Except AES and DES, the cache behaviour of a single CACHEFIX when it terminates with a successful verification program path is independent of program inputs. For the () or a real counterexample (). Table I clearly demonstrates respective subjects, exactly the same number of equivalence the effectiveness of our abstraction refinement process. Specif- classes were explored for both attack models (cf. Table II). ically, for AES, DES and fix_pow, the checker does not ter- Each explored equivalence class was primarily attributed to minate in ten minutes when all predicates in Pred set ∪Pred tag a unique program path. Nevertheless, due to more involved are considered during the verification process. In general, computations (cf. Section V), the overhead of CACHEFIX in the refinement process reduces the number of considered attack model Otrace is higher than the overhead in attack predicates by a factor of 1.81x on average. This leads to a model Otime. As observed from Table II, CACHEFIX discov- substantial improvement in verification time, as observed from ers significantly more equivalence classes w.r.t. attack model Table I. Otrace as compared to the number of equivalence classes w.r.t. The routines chosen from OpenSSL library are single path attack model Otime. This implies AES is more vulnerable programs. However, AES and DES exhibit input-dependent to Otrace as compared to Otime. Excluding DES subject to memory accesses, hence, violating side-channel freedom. Otrace, our exploration terminates in all scenarios within 20 The other routines violate cache side-channel freedom due mins. to input-dependent loop trip counts. For example, routines chosen from the GDK library employ a binary search of To explore counterexample and generate patches, CACHE- the input keystroke over a large table. We note that both FIX takes 2.27x more time with Otrace, as compared to libfixedtimefixedpoint and FourQlib libraries in- Otime. Moreover, excluding DES, CACHEFIX explores all clude comments involving the security risks in fix_frac and equivalence classes of observations within 20 minutes in ecc_point_validate (cf. validate in Table I). all scenarios. Finally, the runtime overhead induced by CACHEFIX is only up to 3.2% with respect to the number of executed instructions. 10

TABLE I SUMMARY OF CACHEFIX EVALUATION.TIMEOUTISSETTOTENMINUTES. Time CAPTURESTHETIMETAKENBY CACHEFIX (I.E.# PREDICATES |Pred cur|), WHEREAS Timeall CAPTURES THE TIME TAKEN WHEN ALL PREDICATES (I.E.# PREDICATES |Pred set ∪ Pred tag|) ARECONSIDERED.

Library Routine Size Otime Otrace (LOC) |Pred set ∪ Pred tag| |Pred cur| Result Time Timeall |Pred cur| Result Time Timeall (secs) (secs) (secs) (secs) AES128 740 231959 115537  148.73 timeout 115545  175.34 timeout OpenSSL DES 2124 205567 95529  105.87 timeout 95639  110.32 timeout [5] RC5 1613 50836 25277  26.88 541.35 26277  34.84 585.32 GDK keyname 712 20827 18178  21.75 120.21 18178  24.32 125.11 [3] unicode 862 21917 19178  27.49 117.78 19178  32.09 119.56 fix_eq 334 71 32  1.70 5.70 32  4.31 6.08 fix_cmp 400 957 474  2.09 6.12 476  2.08 6.15 fix_mul 930 33330 16651  22.44 100.32 17016  20.19 121.37 fix_conv_64 350 211 102  1.69 1.81 102  1.78 1.98 fix_sqrt 2480 150127 74961  60.54 359.94 87834  67.90 581.45 fixedt fix_exp 1128 101418 50655  53.47 400.11 56194  55.12 416.21 [4] fix_ln 1140 92113 46025  58.89 114.89 47065  61.11 127.21 fix_pow 2890 643961 321885  389.97 timeout 323787  377.55 timeout fix_ceil 390 266 128  1.70 1.70 128  4.65 4.70 fix_conv_double 650 1921 953  2.70 2.75 945  4.16 4.20 fix_frac 370 60172 53638  22.14 23.18 51432  22.15 24.58 eccmadd 1550 324661 162058  220.59 437.77 165453  214.16 502.94 eccnorm 1303 165291 82464  105.97 198.90 83100  114.93 219.09 pt_setup 1345 3850 1866  10.45 11.66 1901  14.90 14.95 eccdouble 1364 531085 265219  285.70 558.78 267889  312.31 597.18 R1_to_R2 1352 85750 42731  73.07 249.12 41014  84.11 398.77 FourQ R1_to_R3 1328 25246 12538  26.95 53.21 12555  24.89 54.45 [2] R2_to_R4 1322 28415 14105  32.29 55.11 17001  31.12 61.88 R5_to_R1 1387 12531 5255  22.07 34.05 5278  24.32 52.88 eccpt_validate 1406 57129 38012  34.48 61.91 37948  34.31 71.54

TABLE II OVERHEADINGENERATINGMONITORSANDDUETOTHEAPPLIEDRUNTIMEACTIONS. Time CAPTURES THE TIME TO GENERATE ALL PATCHES.THE OVERHEAD (MAXIMUM AND AVERAGE) CAPTURESTHENUMBEROFEXTRACACHEMISSES, HITS AND INVALIDATIONS INTRODUCED BY CACHEFIXIN ABSOLUTETERM (I.E.# ACTIONS) ANDWITHRESPECTTOTHETOTALNUMBEROFINSTRUCTIONS (I.E.% ACTIONS).

Routine Otime Otrace #Equivalence Time Max. overhead Avg. overhead #Equivalence Time Max. overhead Avg. overhead class (secs) (# actions) (% actions) (# actions) (% actions) class (secs) (# actions) (% actions) (# actions) (% actions) AES128 3 5 117 0.1% 60 0.05% 38 1271 120 0.1% 90 0.07% DES 333 444 300 0.3% 150 0.15% 982 12601 371 0.6% 231 0.4% fix_frac 82 521 80 1.2% 40 0.6% 82 956 119 1.5% 62 1.1% eccpt_validate 20 22 18 0.09% 9 0.05% 20 234 21 1.1% 11 0.04% keyname 41 1119 39 2.6% 19 1.2% 41 1324 45 3.2% 28 2.1% unicode 43 197 41 2.4% 20 1.2% 43 229 49 2.4% 28 1.8%

E. Sensitivity w.r.t. cache configuration of 1.5x. However, such an increased number of predicates We evaluated CACHEFIX for a variety of cache associa- does not translate to significant verification timing for set- tivity (1-way, 2-way and 4-way), cache size (from 1KB to associative caches. 8KB) and with LRU as well as FIFO replacement policies (detailed experiments are included in the supplement [18]). VII.REVIEWOF PRIOR WORKS We observed that the verification time increases marginally (about 7%) when set-associative caches were used instead of Earlier works on cache analysis are based on abstract direct-mapped caches and does not vary significantly with interpretation [35] and its combination with model check- respect to replacement policy. Finally, we observed changes ing [17], to estimate the worst-case execution time (WCET) in the number of equivalence classes of observations for both of a program. In contrast to these approaches, CACHEFIX AES and DES while running these subjects with different automatically builds and refine the abstraction of cache se- replacement policies. However, neither AES nor DES satisfied mantics for verifying side-channel freedom. Cache attacks are cache side-channel freedom for any of the cache size and re- one of the most critical side-channel attacks [25], [13], [7], placement policies tested in our evaluation. The relatively low [28], [26], [38], [37], [32], [27]. In contrast to the literature on verification time results from the fact that the total number of side-channel attacks, we do not engineer new cache attacks in predicates (i.e. Pred set ∪Pred tag) is independent of cache size this paper. Based on a configurable attack model, CACHEFIX and replacement policy. Nevertheless, the symbolic encoding verifies and reinstates the cache side-channel freedom of for set-associative caches is more involved than direct-mapped arbitrary programs. caches. This results in an average increase to the number of Orthogonal to approaches proposing countermeasures [36], predicates considered for verification (i.e. Pred cur) by a factor [21], the fixes generated by CACHEFIX is guided by program 11

verification output. Thus, CACHEFIX can provide cache side- ACKNOWLEDGMENTS channel freedom guarantees about the fixed program. More- We thank the anonymous reviewers for their insightful over, CACHEFIX can be leveraged to formally verify whether comments. This work is partially supported by SUTD research existing countermeasures are capable to ensure side-channel grant SRIS17123. This research is also supported in part by freedom. the National Research Foundation, Prime Ministers Office, In contrast to recent approaches on statically analyzing Singapore under its National Cybersecurity R&D Program cache side channels [30], [22], [14], [29], our CACHEFIX (Award No. NRF2014NCR-NCR001-21) and administered by approach automatically constructs and refines the abstractions the National Cybersecurity R&D Directorate. for verifying cache side-channel freedom. Moreover, contrary to CACHEFIX, approaches based on static analysis are not di- REFERENCES rectly applicable when the underlying program does not satisfy [1] CBMC: Bounded Model Checking for Software. (Date last accessed cache side-channel freedom. CACHEFIX targets verification 23-October-2017). of arbitrary software programs, over and above constant-time [2] FourQLib library. (Date last accessed 20-October-2017). implementations [11], [8]. Existing works based on symbolic [3] GDK library. (Date last accessed 20-October-2017). [4] A library for doing constant-time fixed-point numeric operations. (Date execution [34], [10], taint analysis [20], [33] and verifying last accessed 20-October-2017). timing-channel freedom [9] ignore cache attacks. Moreover, [5] OpenSSL library. (Date last accessed 20-October-2017). these works do not provide capabilities for automatic ab- [6] SSW Library. (Date last accessed 23-October-2017). [7] Onur Acıic¸mez and C¸etin Kaya Koc¸. Trace-driven cache attacks on straction refinement and patch synthesis for ensuring side- AES. In Information and Communications Security. Springer, 2006. channel freedom. Finally, in contrast to these works, we show [8] Jose´ Bacelar Almeida, Manuel Barbosa, Gilles Barthe, Franc¸ois Dupres- soir, and Michael Emmi. Verifying constant-time implementations. In that our CACHEFIX approach scales with routines from real USENIX, pages 53–70, 2016. cryptographic libraries. [9] Timos Antonopoulos, Paul Gazzillo, Michael Hicks, Eric Koskinen, Finally, recent approaches on testing and quantifying cache Tachio Terauchi, and Shiyi Wei. Decomposition instead of self- composition for proving the absence of timing channels. In PLDI, pages side-channel leakage [15], [12], [16] are complementary to 362–375, 2017. CACHEFIX. These works have the flavour of testing and and [10] Michael Backes, Boris Kopf,¨ and Andrey Rybalchenko. Automatic they do not provide capabilities to ensure cache side-channel discovery and quantification of information leaks. In IEEE S&P, pages 141–153, 2009. freedom. [11] Gilles Barthe, Gustavo Betarte, Juan Diego Campo, Carlos Daniel Luna, and David Pichardie. System-level non-interference for constant-time . In CCS, pages 1267–1279, 2014. [12] Tiyash Basu and Sudipta Chattopadhyay. Testing cache side-channel VIII.DISCUSSION leakage. In ICST Workshops, pages 51–60, 2017. [13] Daniel J Bernstein. Cache-timing attacks on AES, 2005. In this paper, we propose CACHEFIX, a novel approach to [14] Pablo Canones,˜ Boris Kopf,¨ and Jan Reineke. Security analysis of cache replacement policies. In POST, pages 189–209, 2017. automatically verify and restore cache side-channel freedom [15] Sudipta Chattopadhyay. Directed automated memory performance of arbitrary programs. The key novelty in our approach is two testing. In TACAS, pages 38–55, 2017. fold. Firstly, our CACHEFIX approach automatically builds [16] Sudipta Chattopadhyay, Moritz Beck, Ahmed Rezine, and Andreas Zeller. Quantifying the information leak in cache attacks via symbolic and refines abstraction of cache semantics. Although targeted execution. In MEMOCODE, pages 25–35, 2017. to verify cache side-channel freedom, we believe CACHEFIX [17] Sudipta Chattopadhyay and Abhik Roychoudhury. Scalable and precise is applicable to verify other cache timing properties, such as refinement of cache timing analysis via path-sensitive verification. Real- Time Systems, 49(4):517–562, 2013. WCET. Secondly, the core symbolic engine of CACHEFIX [18] Sudipta Chattopadhyay and Abhik Roychoudhury. Symbolic verification systematically combines its reasoning power with runtime of cache side-channel freedom. CoRR, abs/1807.04701, 2018. monitoring to ensure cache side-channel freedom during pro- [19] Edmund M. Clarke, Armin Biere, Richard Raimi, and Yunshan Zhu. Bounded model checking using satisfiability solving. Formal Methods gram execution. Our evaluation reveals promising results, for in System Design, 19(1):7–34, 2001. 25 routines from several cryptographic libraries, CACHEFIX [20] James Clause, Wanchun Li, and Alessandro Orso. Dytan: a generic (dis)proves cache side-channel freedom within an average 75 dynamic taint analysis framework. In ISSTA, pages 196–206, 2007. [21] Stephen Crane, Andrei Homescu, Stefan Brunthaler, Per Larsen, and seconds. Moreover, in most scenarios, CACHEFIX generated Michael Franz. Thwarting cache side-channel attacks through dynamic patches within 20 minutes to ensure cache side-channel free- software diversity. In NDSS, 2015. dom during program execution. Despite this result, we believe [22] Goran Doychev, Boris Kopf,¨ Laurent Mauborgne, and Jan Reineke. CacheAudit: a tool for the static analysis of cache side channels. that CACHEFIX is only an initial step for the automated TISSEC, 18(1):4, 2015. verification of cache side-channel freedom. In particular, we do [23] Moritz Lipp et al. Meltdown. ArXiv e-prints, 2018. not account cache attacks that are more powerful than timing [24] Paul Kocher et al. Spectre attacks: Exploiting speculative execution. ArXiv e-prints, 2018. or trace-based attacks. Besides, we do not implement the syn- [25] Qian Ge, Yuval Yarom, David Cock, and Gernot Heiser. A survey of thesized patches in a commodity embedded system to check microarchitectural timing attacks and countermeasures on contemporary their performance impact. We hope that the community will hardware. In Cryptology ePrint Archive, 2016. https://eprint.iacr.org/ 2016/613.pdf/. take this effort forward and push the adoption of formal tools [26] Daniel Gruss, Raphael Spreitzer, and Stefan Mangard. Cache template for the evaluation of cache side-channel. For reproducibility attacks: Automating attacks on inclusive last-level caches. In USENIX and research, our tool and all experimental data are publicly Security, 2015. [27] Roberto Guanciale, Hamed Nemati, Christoph Baumann, and Mads available: Dam. Cache storage channels: Alias-driven attacks and verified counter- measures. In IEEE Symposium on Security and Privacy, pages 38–55, https://bitbucket.org/sudiptac/cachefix 2016. 12

[28] David Gullasch, Endre Bangerter, and Stephan Krenn. Cache games– [34] Corina S. Pasareanu, Quoc-Sang Phan, and Pasquale Malacaria. Multi- bringing access-based cache attacks on AES to practice. In IEEE run side-channel analysis using symbolic execution and max-smt. In Symposium on Security and Privacy. IEEE, 2011. CSF, 2016. [29] Boris Kopf¨ and David A. Basin. An information-theoretic model for [35] Henrik Theiling, Christian Ferdinand, and Reinhard Wilhelm. Fast and adaptive side-channel attacks. In CCS, pages 286–296, 2007. precise WCET prediction by separated cache and path analyses. Real- [30] Boris Kopf,¨ Laurent Mauborgne, and Mart´ın Ochoa. Automatic quan- Time Systems, 18(2-3), 2000. tification of cache side-channels. In CAV. Springer, 2012. [36] Zhenghong Wang and Ruby B. Lee. New cache designs for thwarting [31] Moritz Lipp, Daniel Gruss, Raphael Spreitzer, Clementine´ Maurice, and software cache-based side channel attacks. In ISCA, pages 494–505, Stefan Mangard. ARMageddon: Cache attacks on mobile devices. In 2007. USENIX Security Symposium, pages 549–564, 2016. [37] Yuval Yarom and Katrina Falkner. FLUSH+RELOAD: A high reso- [32] Fangfei Liu, Yuval Yarom, Qian Ge, Gernot Heiser, and Ruby B. Lee. lution, low noise, L3 cache side-channel attack. In USENIX Security Last-level cache side-channel attacks are practical. In IEEE Symposium Symposium, pages 719–732, 2014. on Security and Privacy, pages 605–622, 2015. [38] Yuval Yarom, Daniel Genkin, and Nadia Heninger. Cachebleed: a timing [33] James Newsome, Stephen McCamant, and Dawn Song. Measuring attack on openssl constant-time RSA. J. Cryptographic Engineering, channel capacity to distinguish undue influence. In PLAS, pages 73–85, 7(2):99–112, 2017. 2009.