Constant-Time Foundations for the New Spectre Era

Sunjay Cauligi† Craig Disselkoen† Klaus v. Gleissenthall† Dean Tullsen† Deian Stefan† Tamara Rezk⋆ Gilles Barthe♠♣ †UC San Diego, USA ⋆INRIA Sophia Antipolis, France ♠MPI for Security and Privacy, Germany ♣IMDEA Software Institute, Spain Abstract 1 Introduction The constant-time discipline is a software-based countermea- Protecting secrets in software is hard. Security and cryptog- sure used for protecting high assurance cryptographic imple- raphy engineers must write programs that protect secrets, mentations against timing side-channel attacks. Constant- both at the source level and when they execute on real hard- time is effective (it protects against many known attacks), ware. Unfortunately, hardware too easily divulges informa- rigorous (it can be formalized using program semantics), and tion about a program’s execution via timing side-channels— amenable to automated verification. Yet, the advent of micro- e.g., an attacker can learn secrets by simply observing (via architectural attacks makes constant-time as it exists today timing) the effects of a program on the hardware16 cache[ ]. far less useful. The most robust way to deal with timing side-channels This paper lays foundations for constant-time program- is via constant-time programming—the paradigm used to im- ming in the presence of speculative and out-of-order exe- plement almost all modern cryptography [2, 11, 12, 26, 27]. cution. We present an operational semantics and a formal Constant-time programs can neither branch on secrets nor definition of constant-time programs in this extended setting. access memory based on secret data.1 These restrictions Our semantics eschews formalization of microarchitectural ensure that programs do not leak secrets via timing side- features (that are instead assumed under adversary control), channels on hardware without microarchitectural features. and yields a notion of constant-time that retains the ele- Unfortunately, these guarantees are moot for most modern gance and tractability of the usual notion. We demonstrate hardware: Spectre [20], Meltdown [22], ZombieLoad [29], the relevance of our semantics in two ways: First, by con- RIDL [32], and Fallout [5] are all dramatic examples of attacks trasting existing Spectre-like attacks with our definition of that exploit microarchitectural features. These attacks reveal constant-time. Second, by implementing a static analysis that code that is deemed constant-time in the usual sense tool, Pitchfork, which detects violations of our extended may, in fact, leak information on processors with microar- constant-time property in real world cryptographic libraries. chitectural features. The decade-old constant-time recipes are no longer enough.2 CCS Concepts: • Security and privacy → Formal secu- In this work, we lay the foundations for constant-time in rity models; Side-channel analysis and countermeasures. the presence of microarchitectural features that have been Keywords: Spectre; ; semantics; static exploited in recent attacks: out-of-order and speculative ex- analysis ecution. We focus on constant-time for two key reasons. First, impact: constant-time programming is largely used in ACM Reference Format: real-world crypto libraries—and high-assurance code—where Sunjay Cauligi, Craig Disselkoen, Klaus v. Gleissenthall, Dean developers already go to great lengths to eliminate leaks via Tullsen, Deian Stefan, Tamara Rezk, and Gilles Barthe. 2020. Constant- side-channels. Second, foundations: constant-time program- Time Foundations for the New Spectre Era. In Proceedings of the 41st ming is already rooted in foundations, with well-defined ACM SIGPLAN International Conference on Programming Language semantics [4, 8]. These semantics consider very powerful Design and Implementation (PLDI ’20), June 15–20, 2020, London, attackers—e.g., attackers in [4] have control over the cache UK. ACM, New York, NY, USA, 20 pages. https://doi.org/10.1145/ and the scheduler. An advantage of considering powerful 3385412.3385970 attackers is that the semantics can overlook many hardware details—e.g., since the cache is adversarially controlled, there Permission to make digital or hard copies of part or all of this work for is no point in modeling it precisely—making constant-time personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies amenable to automated verification and enforcement. bear this notice and the full citation on the first page. Copyrights for third- Contributions. We first define a semantics for an abstract, party components of this work must be honored. For all other uses, contact three-stage (fetch, execute, and retire) machine. Our machine the owner/author(s). PLDI ’20, June 15–20, 2020, London, UK 1More generally, constant-time programs cannot use secret data as input to © 2020 Copyright held by the owner/author(s). any variable-time operation—e.g., floating-point multiplication. ACM ISBN 978-1-4503-7613-6/20/06. 2OpenSSL found this situation so hopeless that they recently updated their https://doi.org/10.1145/3385412.3385970 security model to explicitly exclude “physical system side channels” [25]. PLDI ’20, June 15–20, 2020, London, UK S. Cauligi, C. Disselkoen, K. v. Gleissenthall, D. Tullsen, D. Stefan, T. Rezk, and G. Barthe supports out-of-order and speculative execution by model- before evaluating the branch condition. In that case, the pro- ing reorder buffers and transient instructions, respectively. cessor guesses which branch will be taken. For example, the We assume that attackers have complete control over mi- processor may erroneously guess that the branch condition croarchitectural features (e.g., the branch target predictor) at line 1 evaluates to true, even though ra contains value 9. It when executing a victim program and model the attacker’s will therefore continue down the “true” branch speculatively. control over predictors using directives. This keeps our se- In hardware, such guesses are made by a branch prediction mantics simple yet powerful: our semantics abstracts over unit, which may have been mistrained by an adversary. all predictors when proving security—of course, assuming These guesses, as well as additional choices such as exe- that predictors themselves do not leak secrets. We further cution order, are directly supplied by the adversary in our show how our semantics can be extended to capture new semantics. We model this through a series of directives, as predictors—e.g., a hypothetical memory aliasing predictor. shown on the bottom left of Figure1. The directive fetch: true We then define speculative constant-time, an extension of instructs our model to speculatively follow the true branch constant-time for machines with out-of-order and specula- and to place the fetched instruction at index 1 in the reorder tive execution. This definition allows us to discover microar- buffer. Similarly, the two following fetch directives place the chitectural side channels in a principled way—all four classes loads at indices 2 and 3 in the buffer. The instructions in the of Spectre attacks as classified by Canella et al.[6], for ex- reorder buffer, called transient instructions, do not necessarily ample, manifest as violations of our constant-time property. match the original instructions, but can contain additional We further use our semantics as the basis for a prototype information (see Table1). For instance, the transient version analysis tool, Pitchfork, built on top of the angr symbolic of the branch instruction records which branch has been execution engine [30]. Like other symbolic analysis tools, speculatively taken. Pitchfork suffers from path explosion, which limits the depth In our example, the attacker next instructs the model to of speculation we can analyze. Nevertheless, we are able to execute the first load, using the directive execute 2. Because use Pitchfork to detect multiple Spectre bugs in real code. the bounds check has not yet been executed, the load reads We use Pitchfork to detect leaks in the well-known Kocher from the secret element Key[1], placing the value in rb . The test cases [19] for Spectre v1, as well as our more exten- attacker then issues directive execute 3 to execute the fol- sive test suite which includes Spectre v1.1 variants. More lowing load; this load’s address is calculated as 44 + Key[1]. significantly, we use Pitchfork to analyze—and find leaks Accessing this address affects externally visible cache state, in—real cryptographic code from the libsodium, OpenSSL, allowing the attacker to recover Key[1] through a cache side- and curve25519-donna libraries. channel attack [16]. This is encoded by the leakage observa- Open source. Pitchfork and our test suites are open source tion shown in red on the bottom right. Though this secret and available at https://pitchfork.programming.systems. leakage cannot happen under sequential execution, our se- mantics clearly highlights the possible leak when we account 2 Motivating Examples for microarchitectural features. In this section, we show why classical constant-time pro- Modeling hypothetical attacks. Next, we give an example gramming is insufficient when attackers can exploit microar- of a hypothetical class of Spectre attack captured by our chitectural features. We do this via two example attacks and extended semantics. The attack is based on a microarchi- show how these attacks are captured by our semantics. tectural feature which would allow processors to speculate Classical constant time is not enough. Our first example whether a store and load pair might operate on the same consists of 3 lines of code, shown in Figure1 (top right). The address, and forward values between them [18, 28]. program, a variant of the classical Spectre v1 attack [20], We demonstrate this attack in Figure2. The reorder buffer, branches on the value of register ra (line 1). If ra’s value after all instructions have been fetched, is shown in the top is smaller than 4, the program jumps to program location right. The program stores the value of register rb into the 2, where it uses ra to index into a public array A, saves the secretKeysec array and eventually loads two values from pub- value into register rb , and uses rb to index into another public lic arrays. The attacker first issues the directive execute 2 : array B. If ra is larger than or equal to 4 (i.e., the index is out value; this results in a buffer where the store instruction at 2 of bounds), the program skips the two load instructions and has been modified to record the resolved value xsec. Next, the jumps to location 4. In a sequential execution, this program attacker issues the directive execute 7 : fwd 2, which causes neither loads nor branches on secret values. It thus trivially the model to mispredict that the load at 7 aliases with the satisfies the constant-time discipline. store at 2, and thus to forward the value xsec to the load. The However, modern processors do not execute sequentially. forwarded value xsec is then used in the address a = 48 +xsec Instead, they continue fetching instructions before prior of the load instruction at index 8. There, the loaded value X instructions are complete. In particular, a processor may is irrelevant, but the address a is leaked to the attacker, al- continue fetching instructions beyond a conditional branch, lowing them to recover the secret value xsec. The speculative Constant-Time Foundations for the New Spectre Era PLDI ’20, June 15–20, 2020, London, UK

Registers Program Registers Reorder buffer r ρ(r) n µ(n) r ρ(r) i buf (i) r br(>, ( ,r ), , ) a 9pub 1 4 a 2 4 ra 2pub 2 store(rb , [40,ra]) (r = load([40,r ], )) Memory 2 b a 3 rb xsec ... ( ) ( = ([44, ], )) a µ a 3 rc load rb 4 Memory 7 (rc = load([45])) 40..43 array A 4 ... pub a µ(a) 8 (rc = load([48,rc ])) 44..47 array B pub 40..43 secretKeysec 48 4B .. array Keysec 44..47 pubArrApub 48..4B pubArrBpub Speculative execution: Directive Effect on reorder buffer Leakage Speculative execution fetch: true 1 7→ br(>, (4,ra), 2, (2, 4)) Directive Effect on buf Leakage 7→ ( = ([40, ])) fetch 2 rb load ra execute 2 : value 2 7→ store(xsec, [40,ra]) 7→ ( = ([44, ])) fetch 3 rc load rb execute 7 : fwd 2 7 7→ (rc = load([45], xsec, 2)) 7→ ( = [ ] ) read 49 execute 2 2 rb Key 1 sec pub execute 8 8 7→ (rc = X {⊥, a}) read asec execute 3 3 7→ (r = X) read a c sec execute 2 : addr 2 7→ store(rb , 42pub) fwd 42pub where a = Key[1]sec + 44 rollback, execute 7 {7, 8} < buf fwd 45pub Figure 1. Example demonstrating a Spectre v1 attack. The where a = xsec + 48 branch at 1 acts as bounds check for array A. The execution speculatively ignores the bounds check, and leaks a byte of Figure 2. Example demonstrating a hypothetical attack abus- the secret Key. ing an aliasing predictor. This attack differs from prior spec- ulative data forwarding attacks in that branch misprediction is not needed. execution continues and rolls back when the misprediction is detected (details on this are given in Section3), but at this point, the secret has already been leaked. Reorder buffer. The reorder buffer maps buffer indices (nat- As with the example in Figure1, the program in this ex- ural numbers) to transient instructions. We write buf (i) to ample follows the (sequential) constant-time discipline, yet denote the instruction at index i in buffer buf , if i is in buf ’s leaks during speculative execution. But, both examples are domain. We write buf [i 7→ instr] to denote the result of ex- insecure under our new notion of speculative constant-time tending buf with the mapping from i to instr, and buf \buf (i) as we discuss next. for the function formed by removing i from buf ’s domain. 3 Speculative Semantics and Security We write buf [j : j < i] to denote the restriction of buf ’s domain to all indices j, s.t. j < i (i.e., removing all map- In this section we define the notion of speculative constant pings at indices i and greater). Our rules add and remove time, and propose a speculative semantics that models exe- indices in a way that ensures that buf ’s domain will always cution on modern processors. We start by laying the ground- be contiguous. work for our definitions and semantics. Notation. We let MIN(M) (resp. MAX(M)) denote the mini- Configurations. A configuration C ∈ Confs represents the mum (maximum) index in the domain of a mapping M. We de- state of execution at a given step. It is defined as a tuple note the empty mapping as ∅ and let MIN(∅) = MAX(∅) = 0. (ρ, µ,n, buf ) where: For a formula φ, we may discuss the bounded highest ▶ ρ : R ⇀ V is a map from a finite set of register names (lowest) index for which a formula holds. We write max(j) < R to values; i : φ(j) to mean that j is the highest index less than i for ▶ µ : V ⇀ V is a memory; which φ holds, and define min(j) > i : φ(j) analogously. n : V is the current program point; ▶ Register resolve function. In Figure3, we define the register buf : N ⇀ TransInstr is the reorder buffer. ▶ resolve function, which we use to determine the value of Values and labels. As a convention, we use n for memory a register in the presence of transient instructions in the addresses that map to instructions, and a for addresses that reorder buffer. For index i and register r, the function may map to data. Each value is annotated with a label from a (1) return the latest assignment to r prior to position i in the lattice of security labels with join operator ⊔. For brevity, buffer, if the corresponding operation is already resolved; we sometimes omit public label annotation on values. (2) return the value from the register map ρ, if there are no Using labels, we define an equivalence ≃pub on configura- pending assignments to r in the buffer; or (3) be undefined. tions. We say that two configurations are equivalent if they Note that if the latest assignment to r is yet unresolved then coincide on public values in registers and memories. (buf +i ρ)(r) = ⊥. We extend this definition to values by PLDI ’20, June 15–20, 2020, London, UK S. Cauligi, C. Disselkoen, K. v. Gleissenthall, D. Tullsen, D. Stefan, T. Rezk, and G. Barthe

Table 1. Instructions and their transient instruction form.

Instruction Transient form(s) arithmetic operation (r = op(op, −r−v−⃗−−)) (unresolved op) (r = op(op, −r−v−⃗−−,n′)) (op specifies opcode) (r = vℓ) (resolved value) br(op, −r−v−⃗−−,n , (ntrue,nfalse)) (unresolved conditional) conditional branch br(op, −r−v−⃗−−,ntrue,nfalse) 0 jump n0 (resolved conditional) (r = load(−r−v−⃗−−))n (unresolved load) −−−−− n memory load −−−−− ′ (r = load(rv⃗, (vℓ, j))) (partially resolved load with dependency on j) (r = load(rv⃗,n )) n (at program point n) (r = vℓ {⊥, a}) (resolved load without dependencies) n (r = vℓ {j, a}) (resolved load with dependency on j) store(rv, −r−v−⃗−−) (unresolved store) memory store store(rv, −r−v−⃗−−,n′) store(vℓ, aℓ) (resolved store) −−−−− −−−−− indirect jump jmpi(rv⃗) jmpi(rv⃗,n0) (unresolved jump predicted to n0) call(n ,n ) call (unresolved call) function calls f ret ret ret (unresolved return) speculation fence fence n fence (no resolution step)

This approach has two important consequences. First, the vℓ if max(j) < i : buf (j) = (r = _) ∧  use of observations and directives allows our semantics to  buf (j) = (r = vℓ) (buf +i ρ)(r) = remain tractable and amenable to verification. For instance, ρ(r) if j < i : buf (j) , (r = _) we do not need to model the behavior of the cache or any  ∀ ⊥ otherwise . Second, our notion of speculative constant-  time is robust, i.e., it holds for all possible branch predictors and replacement policies—assuming that they do not leak Figure 3. Definition of the register resolve function. secrets directly, a condition that is achieved by all practical hardware implementations. Given an attacker directive d, we use C ,→o C ′ to denote defining (buf + ρ)(v ) = v for all v ∈ V, and lift it to lists d i ℓ ℓ ℓ the execution step from configuration C to configuration C ′ of registers or values using a pointwise lifting. that produces observation o. Program execution is defined from the small-step semantics in the usual style. We use 3.1 Speculative Constant-Time N ′ C O⇓D C to denote a sequence of execution steps from C to We present our new notion of constant-time security in terms C ′. Here D and O are the concatenation of the single-step of a small-step semantics, which relates program configura- directives and leakages, respectively; N is the number of tions, observations, and attacker directives. retired instructions, i.e., N = #{d ∈ D | d = retire}. When Our semantics does not directly model caches, nor any such a big step from C to C ′ is possible, we say D is a well- of the predictors used by speculative semantics. Rather, we formed schedule of directives for C. We omit D, N , or O when model externally visible effects—memory accesses and con- not used. trol flow—by producing a sequence of observations. We can thus reason about any possible cache implementation, as Definition 3.1 (Speculative constant-time). We say a config- any cache eviction policy can be expressed as a function of uration C with schedule D satisfies speculative constant-time (SCT) with respect to a low-equivalence relation ≃pub iff for the sequence of observations. Furthermore, exposing control ′ ′ flow observations directly in our semantics makes it unnec- every C such that C ≃pub C : ′ ′ ′ ′ essary for us to track various other side channels. Indeed, C D⇓O C1 iff C D⇓O′ C1 and C1 ≃pub C1 and O = O . while channels such as port contention or register renaming A program satisfies SCT iff every initial configuration satis- produce distinct measurable effects [20], they only serve to fies SCT under any schedule. leak the path taken through the code—and thus modeling these observations separately would be redundant. For the Aside, on sequential execution. Processors work hard to same reason, we do not model a particular branch predic- create the illusion that assembly instructions are executed tion strategy; we instead let the attacker resolve scheduling sequentially. We validate our semantics by proving equiv- non-determinism by supplying a series of directives. alence with respect to sequential execution. Formally, we Constant-Time Foundations for the New Spectre Era PLDI ’20, June 15–20, 2020, London, UK define sequential schedules as schedules that execute and re- Fetch. We give the rule for the fetch stage below. tire instructions immediately upon fetching them. We attach cond-fetch to each program a canonical sequential schedule and write µ(n) = br(op, −r−v−⃗−−,ntrue,nfalse) i = MAX(buf ) + 1 N ′ ′ −−−−− true true false C⇓seq C to model execution under this canonical schedule. buf = buf [i 7→ br(op,rv⃗,n , (n ,n ))] Our sequential validation is defined relative to an equiva- (ρ, µ,n, buf ) ,−−−−−−−→ (ρ, µ,ntrue, buf ′) lence ≈ on configurations. Informally, two configurations fetch: true are equivalent if their memories and register files are equal, The cond-fetch rule speculatively executes the branch de- even if their speculative states may be different. termined by a Boolean value b given by the directive. We show the case for b = true; the case for false is analogous. Theorem 3.2 (Sequential equivalence). Let C be an initial The rule updates the current program point n, allowing exe- N configuration and D a well-formed schedule for C. If C⇓D C1, cution to continue along the specified branch. The rule then N true false then C⇓seq C2 and C1 ≈ C2. records the chosen branch n (resp. n ) in the transient jump instruction. Complete definitions, more properties, and proofs are This semantics models the behavior of most modern pro- given in AppendixB. cessors. Since the target of the branch cannot be resolved in the fetch stage, speculation allows execution to continue 3.2 Overview of the Semantics and not stall until the branch target is resolved. In hardware, a branch predictor chooses which branch to execute; in our As shown in Table1, each instruction has a physical form semantics, the directives fetch: true and fetch: false deter- and one or more transient forms. Our semantics operates on mine which of the rules to execute. This allows us to abstract these instructions similar to a multi-stage processor pipeline. over all possible predictor implementations. Physical instructions are fetched from memory and become Execute. transient instructions in the reorder buffer. They are then Next, we describe the rules for the execute stage. executed until they are fully resolved. Finally they are retired, cond-execute-correct ( ) ( −−−⃗−− ( true false)) updating the non-speculative state in the configuration. buf i = br op,rv,n0, n ,n In the rest of this section, we show how we model specula- j < i : buf (j) , fence ∀ −−−−− −−−−−− −−−−−− tive execution (Section 3.3), memory operations (Section 3.4), (buf +i ρ)(rv⃗) = v⃗ℓ op(v⃗ℓ) = trueℓ true ′ J K true aliasing prediction (Section 3.5), and fence instructions (Sec- n = n0 buf = buf [i 7→ jump n ] true tion 3.6). We also briefly describe indirect jumps and function jump nℓ (ρ, µ,n, buf ) ,−−−−−−−→ (ρ, µ,n, buf ′) calls (Section 3.7), which are presented in full in AppendixA. execute i Our semantics captures a variety of existing Spectre vari- cond-execute-incorrect ants, including v1 (Figure1), v1.1 (Figure6), and v4 (Figure7), buf (i) = br(op, −r−v−⃗−−,n , (ntrue,nfalse)) as well as a new hypothetical variant (Figure2). Additional 0 j < i : buf (j) , fence variants (e.g., v2 and ret2spec) can be expressed with the ex- ∀ −−−−− −−−−−− −−−−−− tended semantics given in AppendixA. Our semantics shows (buf +i ρ)(rv⃗) = v⃗ℓ op(v⃗ℓ) = trueℓ true ′ J K true that these attacks violate SCT by producing observations n , n0 buf = buf [j : j < i][i 7→ jump n ] true depending on secrets. rollback,jump nℓ (ρ, µ,n, buf ) ,−−−−−−−−−−−−−−−→ (ρ, µ,ntrue, buf ′) execute i 3.3 Speculative Execution Both rules evaluate the condition op via an evaluation func- We start with the semantics for conditional branches which tion · . In both, the function produces true; but the false J K introduce speculative execution. rules are analogous. The rules then compare the actual branch target ntrue against the speculatively chosen target n0 from Conditional branching. The physical instruction for con- the fetch stage. −−−−− true false ditional branches has the form br(op,rv⃗,n ,n ), where If the correct path was chosen during speculation, i.e., n0 op is a Boolean operator whose result determines whether agrees with the correct branch ntrue, rule cond-execute- or not to execute the jump, −r−v−⃗−− are the operands to op, and correct updates buf with the fully resolved jump instruc- true false true n and n are the program points for the true and false tion and emits an observation: jump nℓ . This models an branches, respectively. attacker that can observe control flow, e.g., by timing execu- We show br’s transient counterparts in Table1. The unre- tions along different paths. The leaked observation ntrue has solved form extends the physical instruction with a program label ℓ, propagated from the evaluation of the condition. point n0, which is used to record the branch that is executed In case the wrong path was taken during speculation, i.e., true false true (n or n ) speculatively, and may or may not correspond the calculated branch n disagrees with n0, the semantics to the branch that is actually taken once op is resolved. The must roll back all execution steps along the erroneous path. resolved form contains the final jump target. For this, rule cond-execute-incorrect removes all entries PLDI ’20, June 15–20, 2020, London, UK S. Cauligi, C. Disselkoen, K. v. Gleissenthall, D. Tullsen, D. Stefan, T. Rezk, and G. Barthe

(a) Predicted correctly model a large variety of architectures. For example, in a sim- ( ) ( ) −−− i Initial buf i buf i after execute 4 ple , addr(v⃗) might compute the sum of its 3 (rb = 4)(rb = 4) J K operands; in an -style address mode, addr([v1,v2,v3]) 4 br(<, (2,ra), 9, (9, 12)) jump 9 J K might instead compute v1 + v2 · v3. 5 (rc = op(+, (1,rb ))) (rc = op(+, (1,rb ))) Store forwarding. Multiple transient load and store instruc- (b) Predicted incorrectly tions may exist concurrently in the reorder buffer. In par- i Initial buf (i) buf (i) after execute 4 ticular, there may be unresolved loads and stores that will 3 (rb = 4)(rb = 4) read or write to the same address in memory. Under a naive 4 br(<, (2,ra), 12, (9, 12)) jump 9 model, we must wait to execute load instructions until all 5 (rd = op(*, (rд,rh))) - prior store instructions have been retired, in case they write to the address we will load from. Indeed, some real-world Figure 4. Correct and incorrect branch prediction. Initially, processors behave exactly this way [10]. ra = 3. In (b), the misprediction causes a rollback to 4. For performance, most modern processors implement store- forwarding for memory operations: if a load reads from the same address as a prior store and the store has already been in buf that are newer than the current instruction (i.e., all resolved, the processor can forward the resolved value to entries j ≥ i), sets the program point n to the correct branch, the load. The load can then proceed without waiting for the and updates buf at index i with correct value for the resolved store to commit to memory [34]. jump instruction. Since an attacker can observer misspec- To model these store forwarding semantics, we use an- ulation through instruction timing [20], the rule issues a notations to recall if a load was resolved from memory or rollback observation in addition to the jump observation. n forwarding. A resolved load has the form (r = vℓ {j, a}) , Retire. The rule for the retire stage is shown below; its only where the index j records either the buffer index of the store effect is to remove the jump instruction from the buffer. instruction that forwarded its value to the load, or ⊥ if the jump-retire value was taken from memory. We also record the memory MIN(buf ) = i address a associated with the data, and retain the program ′ buf (i) = jump n0 buf = buf \ buf (i) point n of the load instruction that generated the value in- (ρ, µ,n, buf ) ,−−−−→ (ρ, µ,n, buf ′) struction. The resolved load otherwise behaves as a resolved retire value instruction (e.g., for the register resolve function). Examples. Figure4 shows how branch prediction affects the Fetch. We now discuss the inference rules for memory oper- reorder buffer. In part (a), the branch at index 4 is predicted ations, starting with the fetch stage. correctly. The jump instruction is resolved, and execution simple-fetch ′ proceeds as normal. In part (b), the branch at index 4 is µ(n) ∈ {op, load, store, fence } n = next(µ(n)) ′ incorrectly predicted. Upon executing the branch, the mis- i = MAX(buf ) + 1 buf = buf [i 7→ transient(µ(n))] prediction is detected, and buf is rolled back to index 4. (ρ, µ,n, buf ) ,−−−→ (ρ, µ,n′, buf ′) fetch 3.4 Memory Operations Given a fetch directive, rule simple-fetch extends the re- The physical instruction for loads is (r = load(−r−v−⃗−−,n′)), while order buffer buf with a new transient instruction (see Ta- the form for stores is store(rv, −r−v−⃗−−,n′). As before, n′ is the ble1). Other than load and store, the rule also applies to program point of the next instruction. For loads, r is the op and fence instructions. The transient(·) function simply register receiving the result; for stores, rv is the register or translates the physical instruction at µ(n) to its unresolved value to be stored. For both loads and stores, −r−v−⃗−− is a list of transient form. It inserts the new, transient instruction at the operands (registers and values) which are used to calculate first empty index in buf , and sets the current program point ′ the operation’s target address. to the next instruction n . Note that transient(·) annotates Transient counterparts of load and store are given in Ta- the transient load instruction with its program point. ble1. We annotate unresolved load instructions with the Load execution. Next, we cover the rules for the load exe- program point of the physical instruction that generated cute stage. them; we omit annotations whenever not used. Unresolved load-execute-nodep −−−−− and resolved store instructions share the same syntax, but buf (i) = (r = load(rv⃗))n j < i : buf (j) , fence −−−−− −−−−−− ∀ −−−−−− for resolved stores, both address and operand are required (buf +i ρ)(rv⃗) = v⃗ℓ addr(v⃗ℓ) = a −⃗−− J K to be single values. ℓa = ⊔ℓ j < i : buf (j) , store(_, a) ∀ ′ n Address calculation. We assume an arithmetic operator µ(a) = vℓ buf = buf [i 7→ (r = vℓ {⊥, a}) ] addr which calculates target addresses for stores and loads read aℓ (ρ, µ,n, buf ) ,−−−−−−→a (ρ, µ,n, buf ′) from its operands. We leave this operation abstract in order to execute i Constant-Time Foundations for the New Spectre Era PLDI ’20, June 15–20, 2020, London, UK

load-execute-forward store-execute-addr-hazard −−−−− −−−−− buf (i) = (r = load(rv⃗))n j < i : buf (j) , fence buf (i) = store(rv,rv⃗) j < i : buf (j) , fence −−−−− −−−−−− ∀−−−−−− −⃗−− −−−−− −−−−−− ∀ −−−−−− −⃗−− (buf +i ρ)(rv⃗) = v⃗ℓ addr(v⃗ℓ) = a ℓa = ⊔ℓ (buf +i ρ)(rv⃗) = v⃗ℓ addr(v⃗ℓ) = a ℓa = ⊔ℓ J K J K nk max(j) < i : buf (j) = store(_, a) ∧ buf (j) = store(vℓ, a,) min(k) > i : buf (k) = (r = ... {jk , ak }) : ′ n buf = buf [i 7→ (r = vℓ {j, a}) ] (ak = a ∧ jk < i) ∨ (jk = i ∧ ak , a) ′ fwd aℓa buf = buf [j : j < k][i 7→ store(rv, a )] (ρ, µ,n, buf ) ,−−−−−−→ (ρ, µ,n, buf ′) ℓa execute i rollback fwd , aℓa ′ (ρ, µ,n, buf ) ,−−−−−−−−−−−−−→ (ρ, µ,nk , buf ) Given an execute directive for buffer index i, under the execute i: addr The execution of store is split into two steps: value resolution, condition that i points to an unresolved load, rule load- represented by the directive execute i : value, and address execute-nodep applies if there are no prior store instruc- resolution, represented by the directive execute i : addr; tions in buf that have a resolved, matching address. The rule −−−−− −−−−−− a schedule may have either step first. Either step may be first resolves the operand list rv⃗ into a list of values v⃗ℓ, and −−−−−− skipped if data or address are already in immediate form. then usesv⃗ to calculate the target address a. It then retrieves ℓ Rule store-execute-addr-ok applies if no misprediction the current value v at address a from memory, and finally ℓ has been detected, i.e., if no load instruction forwarded data adds to the buffer a resolved value instruction assigning v to ℓ from an outdated store. We check this by requiring that all the target register r. We annotate the value instruction with value instructions after the current index (indices k > i) the address a and ⊥, signifying that the value comes from with an address a matching the current store must be using memory. Finally, the rule produces the observation read a , ℓa a value forwarded from a store at least as recent as this one which renders the memory read at address a with label ℓ a (a = a ⇒ j ≥ i). We define ⊥ < n for any index n—that is, visible to an attacker. k k if a future load matches the address of the current store but Rule load-execute-forward applies if the most recent loaded its value from memory, we consider this a hazard. store instruction in buf with a resolved, matching address If there is indeed a hazard, i.e., if there was a resolved has a resolved data value. Instead of accessing memory, the load with an outdated value, the rule store-execute-addr- rule forwards the value from the store instruction, annotat- hazard picks the earliest such instruction (index k) and ing the new value instruction with the calculated address restarts execution by resetting the instruction pointer to a and the index j of the originating store instruction. The the program point n of this instruction. It then discards all rule produces a fwd observation with the labeled address a . k ℓa transient instructions at indices at least k from the reorder This observation captures that the attacker can determine buffer. As in the case of misspeculation, the rule issuesa (e.g., by observing the absence of memory access using a rollback observation. cache timing attack) that a forwarded value from address a was found in the buffer instead of loaded from memory. Retire. Resolved loads are retired using the following rule. Importantly, neither of the rules has to wait for prior stores value-retire MIN(buf ) = i buf (i) = (r = vℓ) to be resolved and can proceed speculatively. This can lead ′ ′ to memory hazards when a more recent store to the load’s ρ = ρ[r 7→ vℓ] buf = buf \ buf (i) address has not been resolved yet; we show how to deal with (ρ, µ,n, buf ) ,−−−−→ (ρ′, µ,n, buf ′) hazards in the rules for the store instruction. retire This is the same retire rule used for simple value instructions Store execution. We show the rules for stores below. (e.g., resolved op instructions). The rule updates the register map ρ with the new value, and removes the instruction from store-execute-value −−−−− the reorder buffer. buf (i) = store(rv,rv⃗) j < i : buf (j) , fence ∀ −−−−− Stores are retired using the rule below. ( )( ) ′ [ 7→ ( , ⃗)] buf +i ρ rv = vℓ buf = buf i store vℓ rv store-retire ′ (ρ, µ,n, buf ) ,−−−−−−−−−−−→ (ρ, µ,n, buf ) MIN(buf ) = i buf (i) = store(vℓ, aℓa ) execute i: value ′ ′ µ = µ[a 7→ vℓ] buf = buf \ buf (i)

write aℓ (ρ, µ,n, buf ) ,−−−−−−−→a (ρ, µ′,n, buf ′) store-execute-addr-ok retire −−−−− buf (i) = store(rv,rv⃗) j < i : buf (j) , fence A fully resolved store instruction retires similarly to a value −−−−− −−−−−− ∀ −−−−−− −⃗−− (buf +i ρ)(rv⃗) = v⃗ℓ addr(v⃗ℓ) = a ℓa = ⊔ℓ instruction. However, instead of updating the register map J K k > i : buf (k) = (r = ... {jk , ak }) : ρ, rule store-retire updates the memory µ. Since an at- ∀ (ak = a ⇒ jk ≥ i) ∧ (jk = i ⇒ ak = a) tacker can observe memory writes, the rule produces the ′ write buf = buf [i 7→ store(rv, aℓa )] observation aℓa with the labeled address of the store.

fwd aℓ Example. Figure5 gives an example of store-to-load for- (ρ, µ,n, buf ) ,−−−−−−−−−−→a (ρ, µ,n, buf ′) execute i: addr warding. In the starting configuration, the store at index PLDI ’20, June 15–20, 2020, London, UK S. Cauligi, C. Disselkoen, K. v. Gleissenthall, D. Tullsen, D. Stefan, T. Rezk, and G. Barthe

Registers ρ(ra) = 40pub 2 is fully resolved, while the store at index 3 has an unre- Directives D= execute 4; execute 3 : addr solved address. The first directive executes the load at 4. This Leakage for D fwd 43pub; rollback, fwd 43pub load accesses address 43, which matches the store at index starting buf buf after execute 4 buf after D 2. Since this is the most recent such store and has a resolved 2 store(12, 43pub) 2 store(12, 43pub) 2 store(12, 43pub) value, the load gets the value 12 from this store. The follow- 3 store(20, [3,ra]) 3 store(20, [3,ra]) 3 store(20, 43pub) ing directive resolves the address of the store at index 3. This 4 (rc = load([43])) 4 (rc = 12{2, 43}) store also matches address 43. As this store is more recent than store 2, this directive triggers a hazard for the load at 4, Figure 5. Store hazard caused by late execution of store leading to the rollback of the load from the reorder buffer. addresses. The store address for 3 is resolved too late, causing Capturing Spectre. We now have enough machinery to the later load instruction to forward from the wrong store. capture several variants of Spectre attacks. When 3’s address is resolved, the execution must be rolled We discussed how our semantics model Spectre v1 in Sec- back. In this example, addr(·) adds its arguments. tion2 (Figure1). Figure6 shows a simple disclosure gadget J K Registers Reorder buffer using forwarding from an out-of-bounds write. In this exam- r ρ(r) i buf (i) ple, a secret value xsec is supposed to be written to secretKey at an index r as long as r is within bounds. However, due ra 5pub 1 br(>, (4,ra), 2, (2, 4)) a a to branch misprediction, the store instruction is executed rb xsec 2 store(rb , [40,ra]) Memory ... despite ra being too large. The load instruction at index a µ(a) 7 (rc = load([45])) 7, normally benign, now aliases with the store at index 2, 40..43 secretKeysec 8 (rc = load([48,rc ])) and receives the secret xsec instead of a public value from 44..47 pubArrApub pubArrA. This value is then used as the address of another 48..4B pubArrBpub load instruction, causing xsec to leak. Directive Effect on buf Leakage Figure7 shows a Spectre v4 vulnerability caused when a store fails to forward to a future load. In this example, the execute 2 : addr 2 7→ store(rb , 45pub) fwd 45pub load at index 3 executes before the store at 2 calculates its execute 2 : value 2 7→ store(xsec, 45pub) address. As a result, this execution loads the outdated secret execute 7 7 7→ (rc = xsec{2, 45}) fwd 45pub 43 execute 8 8 7→ (rc = X {⊥, a}) read asec value at address and leaks it, instead of using the public where a = xsec + 48 zeroed-out value that would be written.

Figure 6. Example demonstrating a store-to-load Spectre 3.5 Aliasing Prediction v1.1 attack. A speculatively stored value is forwarded and We extend the memory semantics from the previous section then leaked using a subsequent load instruction. to model aliasing prediction by introducing a new transient instruction (r = load(−r−v−⃗−−, (v , j)))n. This instruction repre- Registers Reorder buffer ℓ r ρ(r) i buf (i) sents a partially resolved load with speculatively forwarded data. As before, r is the target register, −r−v−⃗−− is the list of argu- ra 40pub 2 store(0, [3,ra]) ments for address calculation, and n is the program point Memory 3 (rc = load([43])) load v a µ(a) 4 (rc = load([44,rc ])) of the physical instruction. The new parameters are ℓ, j store 40..43 secretKeysec the forwarded data, and , the index of the originating . 44..47 pubArrApub Forwarding via prediction. Directive Effect on buf Leakage load-execute-forwarded-guessed ( ) ( (−−−−−))n execute 3 3 7→ (r = secretKey[3]{⊥, 43}) read 43 buf i = r = load rv⃗ j < i c pub −−−−− execute 4 4 7→ (rc = X {⊥, a}) read asec k < i : buf (k) , fence buf (j) = store(vℓ,rv⃗j ) ∀ ′ −−−−− {3, 4} < buf rollback, buf = buf [i 7→ (r = load(rv⃗, (v , j)))n] execute 2 : addr ℓ 7→ ( , 43 ) fwd 43 2 store 0 pub pub (ρ, µ,n, buf ) ,−−−−−−−−−−−→ (ρ, µ,n, buf ′) where a = secretKey[3]sec + 44 execute i: fwd j Rule load-execute-forwarded-guessed implements for- Figure 7. Example demonstrating a v4 Spectre attack. The warding in the presence of unresolved target addresses. In- store is executed too late, causing later load instructions to stead of forwarding the value from a store with a matching use outdated values. address, as in Section 3.4, the attacker can now freely choose to forward from any store with a resolved value—even if its target address is not known yet. Given a choice of which store j to forward from—supplied via directive—the rule up- dates the reorder buffer with the new partially resolved load Constant-Time Foundations for the New Spectre Era PLDI ’20, June 15–20, 2020, London, UK and records both the forwarded value vl and the buffer index Resolving when originating store is not in the buffer. j of the store instruction. We must also consider the case where we have delayed re- Register resolve function. We extend the register resolve solving the load address to the point where the originating store has already retired, and is no longer available in buf . function (buf +i ρ) to allow using values from partially re- solved loads. In particular, whenever the register resolve If this is the case, and no other prior store instructions have function computes the latest resolved assignment to some a matching address, then we must check the forwarded data register r, it now considers not only fully resolved value against memory. instructions, but also our new partially resolved load: when- load-execute-addr-mem-match ever the latest assignment in the buffer is a partially resolved −−−−− n buf (i) = (r = load(rv⃗,vℓ, j)) load, the register resolve function returns its value. −−−−− −−−−−− −⃗−− j < buf (buf +i ρ)(rv⃗) = v⃗ℓ ℓa = ⊔ℓ We now discuss the execution rules, where partially re- −−−−−− addr(v⃗ℓ) = a k < i : buf (k) , store(_, a) solved loads may fully resolve against either the originating ′ ∀ n Jµ(a) = vℓK buf = buf [i 7→ (r = vℓ {⊥, a}) ] store or against memory. read aℓa ′ Resolving when originating store is in the reorder buffer. (ρ, µ,n, buf ) ,−−−−−−→ (ρ, µ,n, buf ) execute i load-execute-addr-ok load-execute-addr-mem-hazard ( ) ( (−−−−− ( )))n −−−−− n′ buf i = r = load rv⃗, vℓ, j buf (i) = (r = load(rv⃗,vℓ, j)) (buf + ρ)(−r−v−⃗−−) = −v−−−⃗−− addr(−v−−−⃗−−) = a −−−−− −−−−−− −⃗−− −−− i ℓ ℓ j < buf (buf +i ρ)(rv⃗) = v⃗ℓ ℓa = ⊔ℓ ⊔⃗ ( ) ( −−−⃗−−J) ∧ (−−−⃗−− K ′ ⇒ ′ ) −−− ℓa = ℓ buf j = store vℓ,rv j rv j = a a = a addr(v⃗ℓ) = a k < i : buf (k) , store(_, a) ( < < ) ( ) ( , ) J ′ K ′ ∀ ′ k : j k i : buf k , store _ a µ(a) = v ′ v ′ , vℓ buf = buf [j : j < i] ∀ ′ n ℓ ℓ buf = buf [i 7→ (r = vℓ {j, a}) ] rollback read , aℓa ′ ′ fwd aℓ (ρ, µ,n, buf ) ,−−−−−−−−−−−−−−→ (ρ, µ,n , buf ) (ρ, µ,n, buf ) ,−−−−−−→a (ρ, µ,n, buf ′) execute i execute i If the originating store has retired, and no intervening load-execute-addr-hazard stores match the same address, we must load the value from −−−−− n′ buf (i) = (r = load(rv⃗, (vℓ, j))) memory to ensure we were originally forwarded the correct −−−−− −−−−−− −−−−−− (buf +i ρ)(rv⃗) = v⃗ℓ addr(v⃗ℓ) = a value. If the value loaded from memory matches the value we −⃗−− J ′ K′ ℓa = ⊔ℓ (buf (j) = store(vℓ, a ) ∧ a , a) ∨ were forwarded, we update the reorder buffer with a resolved ( k : j < k < i ∧ buf (k) = store(_, a)) load annotated as if it had been loaded from memory (rule ∃ buf ′ = buf [j : j < i] load-execute-addr-mem-match). If a store different from the originating store overwrote the rollback,fwd aℓ (ρ, µ,n, buf ) ,−−−−−−−−−−−−−→a (ρ, µ,n′, buf ′) originally forwarded value, the value loaded from memory execute i may not match the value we were originally forwarded. In −−−−− n To resolve (r = load(rv⃗, (vℓ, j))) when its originating store this case we roll back execution to just before the load (rule is still in buf , we calculate the load’s actual target address a load-execute-addr-mem-hazard). and compare it against the target address of the originating We demonstrate these semantics in the attack shown in store at buf (j). If the store is not followed by later stores to a, Figure2. An earlier draft of this paper [ 7] incorrectly claimed and either (1) the store’s address is resolved and its address to have a proof-of-concept exploit for this attack on real is indeed a, or (2) the store’s address is still unresolved, we hardware. update the reorder buffer with an annotated value instruction (rule load-execute-addr-ok). 3.6 Speculation Barriers If, however, either the originating store resolved to a differ- We extend our semantics with a speculation barrier instruc- ent address (mispredicted aliasing) or a later store resolved tion, fence n, that prevents further speculative execution to the same address (hazard), we roll back our execution to until all prior instructions have been retired. just before the load (rule load-execute-addr-hazard). We allow the load to execute even if the originating store fence-retire has not yet resolved its address. When the store does fi- MIN(buf ) = i buf (i) = fence buf ′ = buf \ buf (i) nally resolve its address, it must check that the addresses (ρ, µ,n, buf ) ,−−−−→ (ρ, µ,n, buf ′) match and that the forwarding was correct. The gray formu- retire las in store-execute-addr-ok and store-execute-addr- The fence instruction uses simple-fetch as its fetch rule, hazard (Section 3.4) perform these checks: For forwarding and its rule for retire only removes the instruction from to be correct, all values forwarded from a store at buf (i) must the buffer. It does not have an execute rule. However, fence have a matching annotated address ( k > i : jk = i ⇒ ak = instructions affect the execution of all instructions in there- a). Otherwise, if any value annotation∀ has a mismatched order buffer that come after them. In prior sections, execute address, then the instruction is rolled back (jk = i ∧ ak , a). rules have the highlighted condition j < i : buf (j) , fence. ∀ PLDI ’20, June 15–20, 2020, London, UK S. Cauligi, C. Disselkoen, K. v. Gleissenthall, D. Tullsen, D. Stefan, T. Rezk, and G. Barthe

Before executing 1 After In AppendixA, we present detailed rules for indirect i buf [i] i buf [i] jumps, functions calls, and returns. We also show how both 1 br(>, (4,ra), 2, (2, 5)) 1 jump 5 Spectre v2 [20] and ret2spec [23] attacks can be expressed in 2 fence our semantics, as well as the retpoline mitigation [31] against ( ([40, ])) 3 rb = load ra Spectre v2 attacks. 4 (rc = load([44,rb ]))

Figure 8. Example demonstrating fencing mitigation against 4 Detecting Violations Spectre v1 attacks. The fence instruction prevents the load We develop a tool Pitchfork based on our semantics to check instructions from executing before the br. for SCT violations. Pitchfork first generates a set of sched- ules representing various worst-case attackers. This set of schedules is far smaller than the set of all possible sched- This condition ensures that as long as a fence instruction re- ules for the program, but is nonetheless sound: if there is an mains in buf , any instructions fetched after the fence cannot SCT violation in any possible schedule, then there will be an be executed. SCT violation in one of the worst-case schedules. Pitchfork We use fence instructions to restrict out-of-order execu- then checks for secret leakage by symbolically executing the tion in our semantics. Notably, we can use it to prevent program under each schedule. attacks of the forms shown in Figures1,6 and7. Pitchfork only exercises a subset of our semantics; it does Example. The example in Figure8 shows how placing a not detect SCT violations based on alias prediction, indirect fence instruction just after the br instruction prevents the jumps, or return stack buffers (Sections 3.5 and 3.7). Doing Spectre v1 attack from Figure1. The fence in this example so would require it to generate a prohibitively large number prevents the load instructions at 2 and 3 from executing and of schedules. Nevertheless, Pitchfork still exposes attacks forces the br to be resolved first. Evaluating the br exposes based on Spectre variants 1, 1.1, and 4. the misprediction and causes the two loads (as well as the We describe our schedule generation in Section 4.1, and fence) to be rolled back. evaluate Pitchfork on several crypto libraries in Section 4.2.

3.7 Indirect Jumps and Return Address Prediction 4.1 Schedule Generation Finally, we briefly discuss two additional extensions to our Given a program, Pitchfork generates a set of schedules rep- semantics. First, we extend our semantics with indirect jumps. resenting various worst-case attackers. Pitchfork’s schedule Rather than specifying jump targets directly as with the generation is parametrized by a speculation bound, which br instruction in Section 3.3, indirect jumps compute the limits the size of the reorder buffer, and thus the depth of target from a list of argument operands. The indirect jump speculation. instruction has the form jmpi(−r−v−⃗−−), where −r−v−⃗−− is the list of In general, Pitchfork constructs worst-case schedules to operands for calculating the jump target. The transient form maximize speculation. These schedules eagerly fetch instruc- −−−−− of jmpi is jmpi(rv⃗,n0), where n0 is the predicted jump target. tions until the reorder buffer is full, i.e., the size of the reorder To fetch a jmpi instruction, we use fetch: n′, where n′ is buffer equals the speculation bound. Once the reorder buffer the speculated jump target. In all other respects, the rules is full, the schedules only retire instructions as necessary to for indirect jump instructions are similar to the rules for fetch new ones. conditional branches. When conditional branches are to be fetched, Pitchfork Second, we extend our semantics with call and ret instruc- constructs schedules containing both possible outcomes: one tions. The call instruction has the form call(nf ,nret), where where the branch is guessed true (fetch: true) and one where nf is the callee program point and nret is the program point to the branch is guessed false (fetch: false). For the mispredicted return to. The return instruction is simply ret. Both the call outcome, Pitchfork’s schedules execute the branch as late as and ret instructions have the simple transient forms call and possible (i.e., it is the oldest instruction in the reorder buffer ret. However, when fetched, they are unpacked into multiple and the reorder buffer is full), which delays the rollback of transient instructions. Fetching a call produces the call tran- mispredicted paths. sient instruction as well as transient instructions which will To account for the load-store forwarding hazards described increment a stack pointer and store the return program point in Section 3.4, Pitchfork constructs schedules containing to memory. Fetching a ret produces a corresponding load, all possible forwarding outcomes. For every load instruc- decrement, and jump as well as the ret transient instruction. tion l in the program, Pitchfork finds all prior stores si Furthermore, the call and ret instructions respectively push within the speculation bound that would resolve to the same and pop program points to an additional configuration state address. Then, for each such store, Pitchfork constructs a representing the return stack buffer (RSB). The RSB is used schedule that would cause that store to forward its data to l. to predict the new program point upon fetching a ret. That is, Pitchfork constructs separate schedules [execute s1 : Constant-Time Foundations for the New Spectre Era PLDI ’20, June 15–20, 2020, London, UK addr; execute l], [execute s2 : addr; execute l], and so on. Table 2. A indicates Pitchfork found an SCT violation. A Additionally, Pitchfork constructs a schedule where none of f indicates the violation was found only with forwarding the prior stores si have resolved addresses, forcing the load hazard detection. instruction to read from memory. For all instructions other than conditional branches and Case Study C FaCT memory operations, Pitchfork only constructs schedules where these instructions are executed eagerly and in or- curve25519-donna ✓ ✓ secretbox der. Reordering of these instructions is uninteresting: either libsodium ✓ OpenSSL ssl3 record validate f the instructions naturally commute, or data dependencies f prevent the reordering (i.e., the reordered schedule is invalid OpenSSL MEE-CBC for the program). This intuition matches with the property that any out-of-order execution of a given program has the 1 for (int cnt= nlist-1; cnt >=0;--cnt) { same final result regardless of its schedule. 2 iov[cnt].iov_base=( char *) list->str; We formalize the soundness of Pitchfork’s schedule con- 3 // ... struction in more detail in Appendix B.3. 4 list= list->next; 5 } 4.2 Implementation and Evaluation We implement Pitchfork on top of the angr binary-analysis Figure 9. Vulnerable snippet from __libc__message().4 tool [30]. Pitchfork uses angr to symbolically execute a given program according to each of its worst-case schedules, flag- ging any resulting secret leakage. we ran Pitchfork without forwarding hazard detection—only To sanity check Pitchfork, we create and analyze a set looking for Spectre v1 and v1.1 violations—and with a spec- of Spectre v1 and v1.1 test cases, and ensure we flag their ulation bound of 250 instructions. If Pitchfork did not flag SCT violations. Our test cases are based off the well-known any violations, we re-enabled forwarding hazard detection— Kocher Spectre v1 examples [19]. Since many of the Kocher looking for Spectre v4 violations—and ran Pitchfork with examples exhibit violations even during sequential execution, a reduced bound of 20 instructions. The reduced bound en- we create a new set of Spectre v1 test cases which only exhibit sured that the analysis was tractable. violations when executed speculatively. We also develop a similar set of test cases for Spectre v1.1 data attacks. 4.2.2 Detected Violations. Table2 shows our results. Pitch- Pitchfork necessarily inherits the limitations of angr’s fork did not flag any SCT violations in the curve25519-donna symbolic execution. For instance, angr concretizes addresses implementations; this is not surprising, as the curve25519- for memory operations instead of keeping them symbolic. donna library is a straightforward implementation of crypto Furthermore, exploring every speculative branch and poten- primitives. Pitchfork did, however, find SCT violations (with- tial store-forward within a given speculation bound leads out forwarding hazard detection) in both the libsodium and to an explosion in state space. In our tests, we were able to OpenSSL codebases. Specifically, Pitchfork found violations support speculation bounds of up to 20 instructions. We were in the C implementations of these libraries, in code ancillary able to increase this bound to 250 instructions when we dis- to the core crypto routines. This aligns with our intuition abled checking for store-forwarding hazards. Though these that crypto primitives will not themselves be vulnerable to bounds do not capture the speculation depth of some mod- Spectre attacks, but higher-level code that interfaces with ern processors, Pitchfork still correctly finds SCT violations these primitives may still leak secrets. Such higher-level code in all our test cases, as well as SCT violations in real-world is not present in the corresponding FaCT implementations, crypto code. We consider the design and implementation of and Pitchfork did not find any violations in the FaCT code a more scalable tool future work. with these settings. However, with forwarding hazard detec- tion, Pitchfork was able to find vulnerabilities even in the 4.2.1 Evaluation Procedure. To evaluate Pitchfork on FaCT versions of the OpenSSL implementations. We describe real-world crypto implementations, we use the same case two of the violations Pitchfork flagged next. studies as FaCT [8], a domain-specific language and compiler C libsodium secretbox. The libsodium codebase compiles for constant-time crypto code. We use FaCT’s case studies for with stack protection [15] turned on by default. This means two reasons: these implementations have been verified to be that, for certain functions (e.g., functions with stack allocated (sequentially) constant-time, and their inputs have already char buffers), the compiler inserts code in the function epi- been annotated by the FaCT authors with secrecy labels.3 logue to check if the stack was “smashed”. If so, the program We analyzed both the FaCT-generated binaries and the cor- displays an error message and aborts. As part of printing the responding C binaries for the case studies. For each binary, error message, the program calls a function __libc_message, 3https://github.com/PLSysSec/fact-eval which does printf-style string formatting. PLDI ’20, June 15–20, 2020, London, UK S. Cauligi, C. Disselkoen, K. v. Gleissenthall, D. Tullsen, D. Stefan, T. Rezk, and G. Barthe

1 aesni_cbc_encrypt(/* ... */); 5 Related Work 2 // (len _out) is in %r14 Prior work on modeling speculative or out-of-order execu- 3 secret mut uint32 pad= _out[ len _out-1]; tion is concerned with correctness rather than security [1, 4 public uint32 maxpad= tmppad> 255? 255: tmppad; 5 if (pad> maxpad) { 21]. We instead focus on security and model side-channel 6 pad= maxpad; leakage explicitly. Moreover, we abstract away the specifics 7 ret=0; // overwrites %r14 of microarchitectural features, considering them to be adver- 8 } sarially controlled. 9 // ... Disselkoen et al. [13] explore speculation and out-of-order 10 _sha1_update(/* ... */); // can return to line 3 effects through a relaxed memory model. Their semantics sits at a higher level, and is orthogonal to our approach. They do Figure 10. Vulnerable snippet from the FaCT OpenSSL MEE not define a semantic notion of security that prevents Spectre- implementation.5 like attacks, and do not provide support for verification. Mcilroy et al. [24] reason about micro-architectural attacks using a multi-stage pipeline semantics (though they do not Figure9 shows a snippet from this function which tra- define a formal security property). Their semantics models verses a linked list. When running the C secretbox imple- branch predictor and cache state explicitly. However, they mentation speculatively, the processor may misspeculate on do not model the effects of speculative barriers, nor other the stack tampering check and jump into the error handling microarchitecture features such as store-forwarding. Thus, code, eventually calling __libc_message. Again due to mis- their semantics can only capture Spectre v1 attacks. speculation, the processor may incorrectly proceed through Both Guarnieri et al. [17] and Cheang et al. [9] define spec- the loop extra times, traversing non-existent links, eventu- ulative semantics that are supported by tools. Their seman- ally causing secret data to be stored into list instead of tics handle speculation through branch prediction—where a valid address (line 4). On the following iteration of the the predictor is left abstract—but do not capture more general loop, dereferencing list (line 2) causes a secret-dependent out-of-order execution nor other types of speculation. These memory access. works also propose new semantic notions of security (differ- FaCT OpenSSL MEE. In Figure 10, we show the code from ent from SCT); both essentially require that the speculative the FaCT port of OpenSSL’s authenticated encryption imple- execution of a program not leak more than its sequential mentation. The FaCT compiler transforms the branch at lines execution. If a program is sequentially constant-time, this 5-7 into straight-line constant-time code, since the variable additional security property is equivalent to our notion of pad is considered secret. speculative constant-time. Though our property is stronger, Initially, register %r14 holds the length of the array _out. it is also simpler to verify: we can directly check SCT without The processor leaks this value due to the array access on first checking if a program is sequentially constant-time. And line 3; this is not a security violation, as the length is pub- since we focus on cryptographic code, we directly require lic. On line 7, the value of %r14 is overwritten with 0 if the stronger SCT property. pad> maxpad , or 1 (the initial value of ret) otherwise. After- Balliu et al. [3] define a semantics in a style similar to ours. wards, the processor calls _sha1_update. Their semantics captures various Spectre attacks, including To return from _sha1_update, the processor must first load an attack similar to our alias prediction example (Figure2), the return address from memory. When forwarding hazard and a new attack based on their memory ordering semantics, detection is enabled, Pitchfork allows this load to specula- which our semantics cannot capture. tively receive data from stores older than the most recent Finally, several tools detect Spectre vulnerabilities, but store to that address (see Section 3.4). Specifically, it may do not present semantics. The oo7 static analysis tool [33], receive the prior value that was stored at that location: the for example, uses taint tracking to find Spectre attacks and return address for the call to aesni_cbc_encrypt. automatically insert mitigations for several variants. Wu After the speculative return, the processor executes line 3 and Wang [35], on the other hand, perform cache analysis a second time. This time, %r14 does not hold the public value of LLVM programs under speculative execution, capturing len _out; it instead holds the value of ret, which was de- Spectre v1 attacks. rived from the secret condition pad> maxpad . The processor thus accesses either _out[0] or _out[-1], leaking informa- tion about the secret value of pad via cache state.

4Code snippet taken from https://github.com/lattera/glibc/blob/ 895ef79e04a953cac1493863bcae29ad85657ee1/sysdeps/posix/libc_fatal.c 6 Conclusion 5Code snippet taken from https://github.com/PLSysSec/fact- eval/blob/888bc6c6898a06cef54170ea273de91868ea621e/openssl- We introduced a semantics for reasoning about side-channels mee/20170717_latest.fact under adversarially controlled out-of-order and speculative Constant-Time Foundations for the New Spectre Era PLDI ’20, June 15–20, 2020, London, UK execution. Our semantics capture existing transient execu- [9] Kevin Cheang, Cameron Rasmussen, Sanjit Seshia, and Pramod Subra- tion attacks—namely Spectre—but can be extended to fu- manyan. 2019. A Formal Approach to Secure Speculation. Cryptology ture hardware predictors and potential attacks. We also de- ePrint Archive, Report 2019/310. [10] Tien-Fu Chen and Jean-Loup Baer. 1992. Reducing Memory Latency fined a new notion of constant-time code under speculation— via Non-blocking and Prefetching Caches. In 5th ACM International speculative constant-time (SCT)—and implemented a proto- Conference on Architectural Support for Programming Languages and type tool to check if code is SCT. Our prototype, Pitchfork, Operating Systems. ACM. discovered new vulnerabilities in real-world crypto libraries. [11] Cryptography Coding Standard. 2016. Coding Rules. Retrieved June There are several directions for future work. Our imme- 9, 2017 from https://cryptocoding.net/index.php/Coding_rules [12] Frank Denis. 2019. libsodium. Retrieved May 16, 2018 from https: diate plan is to use our semantics to prove the effectiveness //github.com/jedisct1/libsodium of existing countermeasures (e.g., retpolines) and to justify [13] Craig Disselkoen, Radha Jagadeesan, Alan Jeffrey, and James Riely. new countermeasures. 2019. The Code That Never Ran: Modeling Attacks on Speculative Evaluation. In 40th IEEE Symposium on Security and Privacy. IEEE. Acknowledgments [14] Dmitry Evtyushkin, Ryan Riley, Nael Abu-Ghazaleh, and Dmitry Pono- marev. 2018. BranchScope: A New Side-Channel Attack on Directional We thank the anonymous PLDI and PLDI AEC reviewers Branch Predictor. In 23rd International Conference on Architectural and our shepherd James Bornholt for their suggestions and Support for Programming Languages and Operating Systems. ACM. insightful comments. We thank David Kaplan from AMD [15] GCC Team. 2019. Using the GNU Compiler Collection (GCC): In- for his detailed analysis of our proof-of-concept exploit that strumentation Options. Retrieved November 21, 2019 from https: //gcc.gnu.org/onlinedocs/gcc/Instrumentation-Options.html we incorrectly thought to be abusing an aliasing predictor. [16] Qian Ge, Yuval Yarom, David Cock, and Gernot Heiser. 2018. A survey We also thank Natalie Popescu for her aid in editing and of microarchitectural timing attacks and countermeasures on contem- formatting this paper. This work was supported in part by porary hardware. Journal of Cryptographic Engineering (2018). gifts from Cisco and Fastly, by the NSF under Grant Number [17] Marco Guarnieri, Boris Köpf, José F. Morales, Jan Reineke, and Andrés CCF-1918573, by ONR Grant N000141512750, and by the Sánchez. 2020. SPECTECTOR: Principled Detection of Speculative Information Flows. In 41st IEEE Symposium on Security and Privacy. CONIX Research Center, one of six centers in JUMP, a Semi- IEEE. conductor Research Corporation (SRC) program sponsored [18] Saad Islam, Ahmad Moghimi, Ida Bruhns, Moritz Krebbel, Berk Gulme- by DARPA. zoglu, Thomas Eisenbarth, and Berk Sunar. 2019. SPOILER: Speculative Load Hazards Boost Rowhammer and Cache Attacks. In 28th USENIX References Security Symposium. USENIX Association. [19] Paul Kocher. 2018. Spectre mitigations in Microsoft’s C/C++ com- [1] Jade Alglave, Anthony Fox, Samin Ishtiaq, Magnus O. Myreen, Susmit piler. Retrieved April 6, 2020 from https://www.paulkocher.com/doc/ Sarkar, Peter Sewell, and Francesco Zappa Nardelli. 2009. The Seman- MicrosoftCompilerSpectreMitigation.html Proceedings tics of Power and ARM Multiprocessor Machine Code. In [20] Paul Kocher, Jann Horn, Anders Fogh, Daniel Genkin, Daniel Gruss, of the 4th Workshop on Declarative Aspects of Multicore Programming . Werner Haas, Mike Hamburg, Moritz Lipp, Stefan Mangard, Thomas https: [2] Arm Mbed. [n.d.]. mbed TLS. Retrieved May 16, 2018 from Prescher, Michael Schwarz, and Yuval Yarom. 2019. Spectre Attacks: //github.com/armmbed/mbedtls Exploiting Speculative Execution. In 40th IEEE Symposium on Security InSpectre: [3] Musard Balliu, Mads Dam, and Roberto Guanciale. 2019. and Privacy. IEEE. Breaking and Fixing Microarchitectural Vulnerabilities by Formal Anal- [21] Shuvendu K. Lahiri, Sanjit A. Seshia, and Randal E. Bryant. 2002. Mod- ysis . arXiv:1911.00868 [cs.CR] eling and Verification of Out-of-Order Microprocessors in UCLID. In [4] Gilles Barthe, Gustavo Betarte, Juan Campo, Carlos Luna, and David International Conference on Formal Methods in Computer-Aided Design. Pichardie. 2014. System-level non-interference for constant-time cryp- Springer. Proceedings of the 2014 ACM SIGSAC Conference on Com- tography. In [22] Moritz Lipp, Michael Schwarz, Daniel Gruss, Thomas Prescher, Werner puter and Communications Security . ACM. Haas, Anders Fogh, Jann Horn, Stefan Mangard, Paul Kocher, Daniel [5] Claudio Canella, Daniel Genkin, Lukas Giner, Daniel Gruss, Moritz Genkin, Yuval Yarom, and Mike Hamburg. 2018. Meltdown: Reading Lipp, Marina Minkin, Daniel Moghimi, Frank Piessens, Michael Kernel Memory from User Space. In 27th USENIX Security Symposium. Schwarz, Berk Sunar, Jo Van Bulck, and Yuval Yarom. 2019. Fallout: USENIX Association. Proceedings of the 2019 Leaking Data on Meltdown-resistant CPUs. In [23] Giorgi Maisuradze and Christian Rossow. 2018. ret2spec: Speculative ACM SIGSAC Conference on Computer and Communications Security . Execution Using Return Stack Buffers. In Proceedings of the 2018 ACM ACM. SIGSAC Conference on Computer and Communications Security. ACM. [6] Claudio Canella, Jo Van Bulck, Michael Schwarz, Moritz Lipp, Ben- [24] Ross McIlroy, Jaroslav Sevcik, Tobias Tebbi, Ben L. Titzer, and Toon jamin von Berg, Philipp Ortner, Frank Piessens, Dmitry Evtyushkin, Verwaest. 2019. Spectre is here to stay: An analysis of side-channels and and Daniel Gruss. 2019. A Systematic Evaluation of Transient Ex- speculative execution. arXiv:1902.05178 28th USENIX Security Symposium ecution Attacks and Defenses. In . [25] OpenSSL. 2019. Security Policy. Retrieved April 6, 2020 from https: USENIX Association. //www.openssl.org/policies/secpolicy.html [7] Sunjay Cauligi, Craig Disselkoen, Klaus von Gleissenthall, Dean [26] Thomas Pornin. 2016. Why Constant-Time Crypto? Retrieved Towards Tullsen, Deian Stefan, Tamara Rezk, and Gilles Barthe. 2019. November 15, 2018 from https://www.bearssl.org/constanttime.html Constant-Time Foundations for the New Spectre Era . arXiv:1910.01755v2 [27] Thomas Pornin. 2018. Constant-Time Toolkit. Retrieved November [8] Sunjay Cauligi, Gary Soeller, Brian Johannesmeyer, Fraser Brown, 15, 2018 from https://github.com/pornin/CTTK Riad S. Wahby, John Renner, Benjamin Gregoire, Gilles Barthe, Ranjit [28] Michael Schwarz, Claudio Canella, Lukas Giner, and Daniel Gruss. Jhala, and Deian Stefan. 2019. FaCT: A DSL for timing-sensitive com- 2019. Store-to-Leak Forwarding: Leaking Data on Meltdown-resistant 40th ACM SIGPLAN Conference on Programming Language putation. In CPUs. arXiv:1905.05725 Design and Implementation. ACM. PLDI ’20, June 15–20, 2020, London, UK S. Cauligi, C. Disselkoen, K. v. Gleissenthall, D. Tullsen, D. Stefan, T. Rezk, and G. Barthe

[29] Michael Schwarz, Moritz Lipp, Daniel Moghimi, Jo Van Bulck, Julian Registers Program Stecklina, Thomas Prescher, and Daniel Gruss. 2019. ZombieLoad: r ρ(r) n µ(n) Cross-Privilege-Boundary Data Sampling. In Proceedings of the 2019 ra 1pub 1 (rc = load([48, ra], 2)) ACM SIGSAC Conference on Computer and Communications Security. rb 8pub 2 fence 3 ACM. Memory 3 jmpi([12,rb ]) [30] Yan Shoshitaishvili, Ruoyu Wang, Christopher Salls, Nick Stephens, a µ(a) ... Mario Polino, Audrey Dutcher, John Grosen, Siji Feng, Christophe 44..47 array B 16 fence 17 Hauser, Christopher Kruegel, and Giovanni Vigna. 2016. SoK: (State pub 48..4B ( = ([44, r ], )) of) The Art of War: Offensive Techniques in Binary Analysis. In 37th array Keysec 17 rd load c 18 IEEE Symposium on Security and Privacy. IEEE. Directive Effect on buf Leakage [31] Paul Turner. 2019. Retpoline: a software construct for preventing fetch 1 7→ rc = load(48 + ra) branch-target-injection. Retrieved April 6, 2020 from https://support. fetch 7→ fence google.com/faqs/answer/7625886 2 [32] Stephan van Schaik, Alyssa Milburn, Sebastian Österlund, Pietro Frigo, execute 1 1 7→ rc = Key[1]sec read 49pub Giorgi Maisuradze, Kaveh Razavi, Herbert Bos, and Cristiano Giuffrida. fetch: 17 3 7→ jmpi([12,rb ], 17) 2019. RIDL: Rogue In-Flight Data Load. In 40th IEEE Symposium on fetch 4 7→ rd = load([44,rc ]) Security and Privacy. IEEE. retire 1 < buf [33] Guanhua Wang, Sudipta Chattopadhyay, Ivan Gotovchits, Tulika Mitra, retire 2 < buf and Abhik Roychoudhury. 2019. oo7: Low-overhead Defense against execute 4 4 7→ r = X read asec IEEE Transactions on Software d Spectre Attacks via Program Analysis. where a = Key[1] + 40 Engineering (2019). sec [34] Henry Wong. 2014. Store-to-Load Forwarding and Memory Dis- ambiguation in X86 Processors. Retrieved April 6, 2020 from Figure 11. Example demonstrating Spectre v2 attack from https://blog.stuffedcow.net/2014/01/x86-memory-disambiguation/ a mistrained indirect branch predictor. Speculation barriers [35] Meng Wu and Chao Wang. 2019. Abstract Interpretation under Spec- are not a useful defense against this style of attack. ulative Execution. In 40th SIGPLAN ACM Conference on Programming Language Design and Implementation. ACM. In the execute stage, we calculate the actual jump target A Extended semantics and compare it to the guess. If the actual target and the guess match, we update the entry in the reorder buffer to the A.1 Indirect jumps resolved jump instruction jump n0. If actual target and the Semantics. The semantics for jmpi are given below: guess do not match, we roll back the execution by removing all buffer entries larger or equal to i, update the buffer with jmpi-fetch the resolved jump to the correct address, and set the next µ(n) = jmpi(−r−v−⃗−−) ′ −−−−− ′ instruction. i = MAX(buf ) + 1 buf = bu f [i 7→ jmpi(rv⃗,n )] Like conditional branch instructions, indirect jumps leak (ρ, µ,n, buf ) ,−−−−−−→ (ρ, µ,n′, buf ′) the calculated jump target. fetch: n′ Examples. The example in Figure 11 shows how a mistrained jmpi-execute-correct indirect branch predictor can lead to disclosure vulnerabil- −−−−− buf (i) = jmpi(rv⃗,n0) ities. After loading a secret value into r at program point −−−−− −−−−−− c j < i : buf (j) , fence (buf +i ρ)(rv⃗) = v⃗ℓ 1, the program makes an indirect jump. An adversary can −⃗∀−− −−−−−− ′ ℓ = ⊔ℓ addr(v⃗ℓ) = n0 buf = buf [i 7→ jump n0] mistrain the predictor to send execution to 17 instead of the J K jump n0ℓ intended branch target, where the secret value in r is im- (ρ, µ,n, buf ) ,−−−−−−→ (ρ, µ,n, buf ′) c execute i mediately leaked. Because indirect jumps can have arbitrary branch target locations, fence instructions do not prevent jmpi-execute-incorrect −−−−− these kinds of attacks; an adversary can simply retarget the buf (i) = jmpi(rv⃗,n ) j < i : buf [j] , fence 0 −−− indirect jump to the instruction after the fence, as is seen in (buf + ρ)(−r−v−⃗−−) = −v−−−⃗−− ℓ = ⊔⃗∀ℓ addr(−v−−−⃗−−) = n′ n i ℓ ℓ , 0 this example. buf ′ = buf [j : j < i][ i 7→Jjump n′]K ′ rollback,jump nℓ A.2 Return address prediction (ρ, µ,n, buf ) ,−−−−−−−−−−−−−→ (ρ, µ,n′, buf ′) execute i Next, we discuss how our semantics models function calls. When fetching a jmpi instruction, the schedule guesses Instructions. We introduce the following two physical in- ′ the jump target n . The rule records the operands and the structions: call(nf ,nret), where nf is the target program point guessed program point in a new buffer entry. In a real proces- of the call and nret is the return program point; and the return sors, the jump target guess is supplied by an indirect branch instruction ret. Their transient forms are simply call and ret. predictor; as branch predictors can be arbitrarily influenced Call stack. To track control flow in the presence of function by an adversary [14], we model the guess as an attacker calls, our semantics explicitly maintains a call stack in mem- directive. ory. For this, we use a dedicated register rsp which points to Constant-Time Foundations for the New Spectre Era PLDI ’20, June 15–20, 2020, London, UK

n 1 2 3 Program We now present the step rules for our semantics. µ(n) call(3, 2) ret ret Directive n buf σ Calling. fetch 1 → 3 1 7→ call 1 7→ push 2 2 7→ rsp = op(succ,rsp) call-direct-fetch 3 7→ store(2, [rsp]) µ(n) = call(nf ,nret) i = MAX(buf ) + 1 → 7→ 7→ fetch 3 2 4 ret 4 pop buf 1 = buf [i 7→ call][i + 1 7→ (rsp = op(succ,rsp))] 5 7→ r = load([r ]) ′ tmp sp buf = buf 1[i + 2 7→ store(nret, [rsp])] 6 7→ rsp = op(pred,rsp) ′ ′ σ = σ[i 7→ push nret] n = nf 7→ ([ ] ) 7 jmpi rtmp , 2 ′ ′ ′ → 7→ 7→ (ρ, µ,n, buf , σ) ,−−−→ (ρ, µ,n , buf , σ ) fetch: n 2 n 8 ret 8 pop fetch 9 7→ rtmp = load([rsp]) 10 7→ rsp = op(pred,rsp) 11 7→ jmpi([rtmp],n) call-retire Figure 12. Example demonstrating a ret2spec-style at- MIN(buf ) = i tack [23]. The attacker is able to send (speculative) execution buf (i) = call buf (i + 1) = (rsp = vℓ) ′ to an arbitrary program point, shown in red. buf (i + 2) = store(nret, aℓa ) ρ = ρ[rsp 7→ vℓ] ′ ′ µ = µ[a 7→ nret] buf = buf [j : j > i + 2]

write aℓ (ρ, µ,n, buf , σ) ,−−−−−−−→a (ρ′, µ′,n, buf ′, σ) the top of the call stack, and which we call the stack pointer retire register. On fetching a call instruction, we update r to point to sp On fetching a call instruction, we add three transient in- the address of the next element of the stack using an ab- structions to the reorder buffer to model pushing the return stract operation succ. It then saves the return address to the address to the in-memory stack. The first transient instruc- newly computed address. On returning from a function call, tion, call, simply serves as an indication that the following our semantics transfers control to the return address at r , sp two instructions come from fetching a call instruction. The and then updates r to point to the address of the previ- sp remaining two instructions advance r to point to a new ous element using a function pred. This step makes use of a sp stack entry, then store the return address n in the new en- temporary register r . ret tmp try. Neither of these transient instructions are fully resolved— Using abstract operations succ and pred rather than com- they will need to be executed in later steps. We next add a mitting to a concrete implementation allows our semantics new entry to the RSB, signifying a push of the return ad- to capture different stack designs. For example, on a 32-bit dress n to the RSB. Finally, we set our program point to x86 processor with a downward-growing stack, op(succ,r ) ret sp the target of the call n . would be implemented as r − 4, while op(pred,r ) would f sp sp When retiring a call, all three instructions generated dur- be implemented as r + 4; on an upward growing system, sp ing the fetch are retired together. The register file is updated the reverse would be true. with the new value of r , and the return address is written Note that the stack register r is not protected from illegal sp sp to physical memory, producing the corresponding leakage. access and can be updated freely. The semantics for direct calls can be extended to cover Return stack buffer. For performance, modern processors indirect calls in a straightforward manner by imitating the speculatively predict return addresses. To model this, we semantics for indirect jumps. We omit them for brevity. extend configurations with a new piece of state called the re- turn stack buffer (RSB), written as σ. The return stack buffer Evaluating the RSB. We define a function top(σ) that re- contains the expected return address at any execution point. trieves the value at the top of the RSB stack. For this, we let Its implementation is simple: for a call instruction, the se- σ be a function that transforms the RSB stack σ into a stack mantics pushes the return address to the RSB, while for a inJ K the form of a partial map (st : N ⇀ V) from the natural ret instruction, the semantics pops the address at the top numbers to program points, as follows: the function · ap- of the RSB. Similar to the reorder buffer, we address the plies the commands for each value in the domain of σ,J inK the RSB through indices and roll it back on misspeculation or order of the indices. For a push n it adds n to the lowest empty memory hazards. index of st. For pop, it and removes the value with the highest We model return prediction directly through the return index in st, if it exists. We then define top(σ) as st(MAX(st)), stack buffer rather than relying on attacker directives, as where st = σ , and ⊥, if the domain of st is empty. For exam- most processors follow this simple strategy, and the predic- ple, if σ is givenJ K as ∅[1 7→ push 4][2 7→ push 5][3 7→ pop], tions therefore cannot be influenced by an attacker. then σ = ∅[1 7→ 4], and top(σ) = 4. J K PLDI ’20, June 15–20, 2020, London, UK S. Cauligi, C. Disselkoen, K. v. Gleissenthall, D. Tullsen, D. Stefan, T. Rezk, and G. Barthe

Returning. Registers Program r ρ(r) n µ(n) ret-fetch-rsb r 8 3 call(5, 4) µ(n) = ret top(σ) = n′ b pub rsp 7Cpub 4 fence 4 i = MAX(buf ) + 1 buf 1 = buf [i 7→ ret] 5 rd = op(addr, [12,rb ], 6) buf = buf [i + 1 7→ (r = load([r ]))] 2 1 tmp sp 6 store(rd , [rsp], 7) buf 3 = buf 2[i + 2 7→ (rsp = op(pred,rsp))] 7 ret ′ buf 4 = buf 3[i + 3 7→ jmpi([rtmp],n )] ′ σ = σ[i 7→ pop] Effect of successive fetch directives ′ ′ n buf σ (ρ, µ,n, buf , σ) ,−−−→ (ρ, µ,n , buf 4, σ ) fetch 3 → 5 3 7→ call 3 7→ push 4 4 7→ r = op(succ,r ) ret-fetch-rsb-empty sp sp 7→ (4 [ ]) µ(n) = ret top(σ) = ⊥ 5 store , rsp 5 → 6 6 7→ r = op(addr, [12,r ]) i = MAX(buf ) + 1 buf = buf [i 7→ ret] d b 1 6 → 7 7 7→ store(r , [r ]) buf = buf [i + 1 7→ (r = load([r ]))] d sp 2 1 tmp sp 7 → 4 8 7→ ret 8 7→ pop buf 3 = buf 2[i + 2 7→ (rsp = op(pred,rsp))] ′ 9 7→ rtmp = load([rsp]) buf 4 = buf 3[i + 3 7→ jmpi([rtmp],n )] 10 7→ rsp = op(pred,rsp) σ ′ = σ[i 7→ pop] 11 7→ jmpi([rtmp], 4) ′ ′ (ρ, µ,n, buf , σ) ,−−−−−−→ (ρ, µ,n , buf 4, σ ) 4 → 4 12 7→ fence fetch: n′ Directive Effect on buf Leakage ret-retire execute 4 4 7→ r = 7B MIN(buf ) = i sp execute 6 6 7→ rd = 20 buf (i) = ret buf (i + 1) = (rtmp = v1ℓ ) 1 execute 7 : value 7 7→ store(20, [rsp]) buf (i + ) = (r = v ) buf (i + ) = jump n′ 2 sp 2ℓ2 3 execute 7 : addr 7 7→ store(20, 7B) fwd 7B ρ′ = ρ[r 7→ v ] buf ′ = buf [j : j > i + 3] sp 2ℓ2 execute 9 9 7→ rtmp = 20 fwd 7B ( , , , , ) ,−−−→ ( ′, , , ′, ) 12 < buf rollback, ρ µ n buf σ ρ µ n buf σ execute 11 retire 11 7→ jump 20 jump 20 On a fetch of ret, the next program point is set to the pre- dicted return address, i.e., the top value of the RSB, top(σ). Figure 13. Example demonstrating “retpoline” mitigation Just as with call, we add the transient ret instruction to the against Spectre v2 attack. The program is able to jump to reorder buffer, followed by the following (unresolved) in- program point 12 +rb = 20 without the schedule influencing structions: we load the value at address rsp into a temporary prediction. register rtmp, we “pop” rsp to point back to the previous stack entry, and then add an indirect jump to the program point given by rtmp. Finally, we add a pop entry to the RSB. As with Examples. We present an example of an RSB underflow at- call instructions, the set of instructions generated by a ret tack in Figure 12. After fetching a call and paired ret instruc- fetch are retired all at once. tion, the RSB will be “empty”. When one more (unmatched) When the RSB is empty, the attacker can supply a specu- ret instruction is fetched, since top(σ) = ⊥, the program lative return address via the directive fetch: n′. This is con- point n is no longer set by the RSB, and is instead set by the sistent with the behavior of existing processors. In practice, (attacker-controlled) schedule. there are several variants on what processors actually do Retpoline mitigation. A mitigation for Spectre v2 attacks when the RSB is empty [23]: is to replace indirect jumps with retpolines [31]. Figure 13 ▶ AMD processors refuse to speculate. This can be mod- shows a retpoline construction that would replace the indi- eled by defining top(σ) to be a failing predicate if it rect jump in Figure 11. The call sends execution to program would result in ⊥. point 5, while adding 4 to the RSB. The next two instructions ▶ Intel Skylake/Broadwell processors fall back to using at 5 and 6 calculate the same target as the indirect jump in their branch target predictor. This can be modeled by Figure 11 and overwrite the return address in memory with allowing arbitrary n′ for the fetch: n′ directive for the this jump target. When executed speculatively, the ret at 7 ret-fetch-rsb-empty rule. will pop the top value off the RSB, 4, and jump there, landing ▶ “Most” Intel processors treat the RSB as a circular on a fence instruction that loops back on itself. Thus spec- buffer, taking whichever value is produced when the ulative execution cannot proceed beyond this point. When RSB over- or underflows. This can be modeled by hav- the transient instructions in the ret sequence finally execute, ing top(σ) always produce an according value, and the indirect jump target 20 is loaded from memory, causing a never producing ⊥. roll back. However, execution is then directed to the proper Constant-Time Foundations for the New Spectre Era PLDI ’20, June 15–20, 2020, London, UK jump target. Notably, at no point is an attacker able to hijack will result in the same configuration since they inde- the jump target via misprediction. pendently resolve the components of the store. ′′ ′′ ▶ For br, D1 and D2 may have different guesses for their B Full proofs initial fetch directives. However, both cond-execute- B.1 Consistency correct and cond-execute-incorrect will result in ′ ′′ the same configuration regardless of the initial guess, o ′ o ′′ Lemma B.1 (Determinism). If C ,−→ C and C ,−→ C then br d d as the is the only instruction in the buffer. Similarly C ′ = C ′′ and o′ = o′′. for jmpi. For call and ret, the ordering of execution of the re- ( ) ▶ Proof. The tuple C,d fully determines which rule of the sulting transient instructions does not affect the final semantics can be executed. □ configuration.

Definition B.2 (Initial/terminal configuration). A configu- Thus for all cases we have C1 = C2. □ ration C is an initial (or terminal) configuration if |C.buf | = To make our discussion easier, we will say that a directive 0. o d applies to a buffer index i if when executing a step C ,−→ C ′: Definition B.3 (Sequential schedule). Given a configura- d tion C, we say a schedule D is sequential if every instruction ▶ d is a fetch directive, and would fetch an instruction that is fetched is executed and retired before further instruc- into index i in buf . tions are fetched. ▶ d is an execute directive, and would execute the in- struction at index i in buf . ⇓N ′ Definition B.4 (Sequential execution). C O D C is a sequen- ▶ d is a retire directive, and would retire the instruction tial execution if C is an initial configuration, D is a sequential at index i in buf . ′ schedule for C, and C is a terminal configuration. We would like to reason about schedules that do not con- N ′ tain misspeculated steps, i.e., directives that are superfluous We write C O⇓seq C if we execute sequentially. due to their effects getting wiped away by rollbacks. N Lemma B.5 (Sequential equivalence). If C O ⇓ C1 is se- 1 D1 Definition B.6 (Misspeculated steps). Given an execution ⇓N quential and C O2 D C2 is sequential, then C1 = C2. N ′ 2 C O⇓D C , we say that D contains misspeculated steps if there ′ N ′′ ′ Proof. Suppose N = 0. Then neither D1 nor D2 may contain exists d ∈ D such that D = D \ d and C O′⇓D′ C = C . any retire directives. Since we assume that both C .buf and 1 Given an execution C ⇓N C ′ that may contain rollbacks, C .buf have size 0, neither D nor D may contain any fetch O D 2 1 2 we can create an alternate schedule D∗ without any rollbacks directives. Therefore, both D and D are empty; both C and 1 2 1 by removing all misspeculated steps. Note that sequential C are equal to C. 2 schedules have no misspeculated steps6 as defined in Defini- We proceed by induction on N . ′ tion B.6. Let D1 be a sequential prefix of D1 up to the N − 1th retire, ′′ ′ and let D1 be the remainder of D1. That is, #{d ∈ D1 | d = Theorem B.7 (Equivalence to sequential execution). Let C ′ ′′ ′ ′′ retire} = N −1 and D1 ∥D1 = D1. Let D2 and D2 be similarly be an initial configuration and D a well-formed schedule for ⇓N ⇓N ≈ defined. C. If C O1 D C1, then C O2 seq C2 and C1 C2. Furthermore, if N −1 ′ By our induction hypothesis, we know C O′⇓ ′ C and C1 is terminal then C1 = C2. 1 D1 N −1 ′ ′ ′ ′ C ′⇓ ′ C for some C . Since D (resp. D ) is sequential O2 D 1 2 Proof. Since we can always remove all misspeculated steps ′2 ′′ ′′ and |C .buf | = 0, the first directive in D1 (resp. D2 ) must be from any well-formed execution without affecting the final ′ 1 ′ 1 a fetch directive. Furthermore, C O′′⇓ ′′ C1 and C O′′⇓ ′′ C2. configuration, we assume D1 has no misspeculated steps. 1 D1 2 D2 We can now proceed by cases on C ′.µ[C ′.n], the final Suppose N = 0. Then the theorem is trivially true. We instruction to be fetched. proceed by induction on N . Let D′ be the subsequence of D containing the first N − 1 ▶ For op, the only valid sequence of directives is (fetch, 1 1 execute i, retire) where i is the sole valid index in the retire directives and the directives that apply to the same N − D′′ buffer. Similarly for fence, with the sequence {fetch, retire}. indices of the first 1 retire directives. Let 1 be the complement of D′ with respect to D . All directives in D′′ ▶ For load, alias prediction is not possible, as no prior 1 1 1 D′ stores exist in the buffer. Therefore, just as with op, the apply to indices later than any directive in 1, and thus D′ D′ only valid sequence of directives is (fetch, execute i, cannot affect the execution of directives in 1. Thus 1 is a N −1 ′ well-formed schedule and produces execution C O′⇓ ′ C . retire). 1 D1 1 ′′ ▶ For store, the only possible difference between D1 6 ′′ Sequential schedules may still misspeculate on conditional branches but and D2 is the ordering of the execute i : value and the rollback does not imply removal of any reorder buffer instructions as execute i : addr directives. However, both orderings defined in Definition B.6. PLDI ’20, June 15–20, 2020, London, UK S. Cauligi, C. Disselkoen, K. v. Gleissenthall, D. Tullsen, D. Stefan, T. Rezk, and G. Barthe

′′ ′′ ′′ Since D1 contains no misspeculated steps, the directives Dseq (with trace O2 ) as the subsequence of D1 of directives ′′ ′ in D1 can be reordered after the directives in D1. Thus applying to the remaining instruction to be retired. As noted ′′ ′ D1 is a well-formed schedule for C1, producing execution before, executing the subsequence of a schedule produces ′ 1 ′′ ′′ ′′ C O′′⇓ ′′ C with C ≈ C1. If C1 is terminal, then C is also the corresponding subsequence of the original trace; hence 1 1 D1 1 1 1 ′′ o ∈ O ′′ : ℓ < o. terminal and C1 = C1. 2 ′ ∀ ′ ′′ By our induction hypothesis, we know there exists D The trace of the final (sequential) schedule Dseq = Dseq ∥Dseq seq ′ ′′ ′ N −1 ′ ′ is O ∥O . Since O satisfies the label stability property via such that C O′⇓ ′ C . Since D contains equal numbers of 2 2 2 2 Dseq 2 1 the induction hypothesis, we have o ∈ O ′ ∥O ′′ : ℓ < o. fetch and retire directives, ends with a retire, and contains ∀ 2 2 ′ ′ ′ □ no misspeculated steps, C1 is terminal. Thus C1 = C2. ′′ ′′ Let Dseq be the subsequence of D1 containing the retire By letting ℓ be the label secret, we get the following corol- ′′ directive in D1 and the directives that apply to the same lary: ′′ ′ index. Dseq is sequential with respect to C1 and produces ′ 1 ′′ ′′ ′′ ′′ Corollary B.10 (Secrecy). If speculative execution ofC under executionC O′′⇓ ′′ C withC ≈ C ≈ C1. IfC is terminal, 1 2 Dseq 2 2 1 1 schedule D produces a trace O that contains no secret labels, D′′ = D′′ C ′′ = C ′′ = C then seq 1 and thus 2 1 1. then sequential execution of C will never produce a trace that ′ ∥ ′′ Let Dseq = Dseq Dseq. Dseq is thus itself sequential and contains any secret labels. N ′′ produces execution C ′ ′′ ⇓ C , completing our proof. (O2 ∥O2 ) seq 2 □ With this, we can prove the following proposition: Corollary B.8 (General consistency). Let C be an initial Proposition B.11. For a given initial configuration C and configuration. If C ⇓N C and C ⇓N C , then C ≈ C . well-formed schedule D, if C is SCT with respect to D, and O1 D1 1 O2 D2 2 1 2 execution of C with D results in a terminal configuration C1, Furthermore, if C1 and C2 are both terminal then C1 = C2. then C is also sequentially constant-time. Proof. By Theorem B.7, there exists D′ such that execut- seq ′ ≃ C C ′ ≈ C C ′ = C Proof. Since C is SCT, we know that for all C pub C, we ing with produces 1 1 (resp. 1 1). Similarly, N ′ N ′ ′ ′ ′′ ′ ′ ′ have C O⇓D C1 and C O ⇓D C where C1 ≃pub C and O = O . there exists Dseq that produces C2 ≈ C2 (resp. C2 = C2). By 1 1 Lemma B.5, we have C ′ = C ′. Thus C ≈ C (resp. C = By Theorem B.7, we know there exist sequential executions 1 2 1 2 1 N ′ N ′ such that C ⇓ C and C ′ ⇓ C . Note that the two C2). □ Oseq seq 2 Oseq seq 2 sequential schedules need not be the same. ′ B.2 Security C1 is terminal by hypothesis. Execution ofC uses the same ′ ℓ schedule D, so C1 is also terminal. Since we have C1 = C2 Theorem B.9 (Label stability). Let be a label in the lattice ′ ′ ′ ′ N N ≃ ≃ L. If C ⇓ C and o ∈ O : ℓ o, then C ⇓ C and and C1 = C2, we can lift C1 pub C1 to get C2 pub C2. O1 D1 1 1 < O2 seq 2 ′ ∈ ℓ ∀ To prove the trace property Oseq = Oseq, we note that if o O2 : < o. ′ ′ ∀ Oseq , Oseq, then since C2 ≃pub C , it must be the case that ∗ 2 Proof. Let D1 be the schedule given by removing all mis- there exists some o ∈ Oseq such that secret ∈ Oseq. Since ∗ ′ speculated steps from D1. The corresponding trace O1 is a this is also true for O and O , we know that there exist no ∗ ′ subsequence of O1, and hence o ∈ O : ℓ < o. We thus observations in either O or O that contain secret labels. ∀ 1 proceed assuming that execution of D1 contains no misspec- By Corollary B.10, it follows that no secret labels appear in ′ ′ ulated steps. either Oseq or Oseq, and thus Oseq = Oseq. □ Our proof closely follows that of Theorem B.7. When ′ ′′ constructing D1 and D1 from D1 in the inductive step, we B.3 Soundness of Pitchfork ′′ know that all directives in D1 apply to indices later than any Definition B.12 (Affecting an index). We say a directive d ′ directive in D1, and cannot affect execution of any directive affects an index i if: in D′ . This implies that O ′ is the subsequence of O that 1 1 1 d is a fetch-type directive and would produce a new corresponds to the mapping of D′ to D . ▶ 1 1 mapping in buf at index i. Reordering the directives in D′′ after D′ do not affect the 1 1 d is an execute-type directive and specifies index i observations produced by most directives. The exceptions to ▶ directly (e.g., execute i). this are execute directives for load instructions that would d is a retire directive and would cause the instruction have received a forwarded value: after reordering, the store ▶ at i in buf to be removed. instruction they forwarded from may have been retired, and they must fetch their value from memory. However, even in Definition B.13 (Path function). The function Path(C, D) this case, the address aℓa attached to the observation does produces the sequence of branch choice (from fetching br ′′ not change. Thus o ∈ O2 : ℓ < o. instructions) and store-forwarding information (when exe- Continuing the∀ proof as in Theorem B.7, we create sched- cuting load instructions) when executing D with initial con- ′ ′ ule Dseq (with trace O2) from the induction hypothesis and figuration C. That is, for a schedule D without misspeculated Constant-Time Foundations for the New Spectre Era PLDI ’20, June 15–20, 2020, London, UK steps: same register dependencies. For each register dependency Path(C, ∅) = [] r, if the register was calculated by a transient instruction at a prior index j, we can create prefixes D1, j and D2, j of D1 ( ) ( ) Path C, D ; i,b , d = fetch: b and D2 respectively that end at the execute directive that  Path(C, D); (i, j), d produces vℓ {j, a} resolves r at buffer index j. By our induction hypothesis, Path(C, D∥d) = both D and D calculate the same value v for r. Path(C, D); (i, ⊥), d produces vℓ {⊥, a} 1, j 2, j ℓ  We now proceed by cases on the transient instruction Path(C, D), otherwise  being executed. where d affects index i. If D has misspeculated steps, then Op, Store (value). Since all dependencies calculate the same ∗ ∗ Path(C, D) = Path(C, D ) where D is the subset of D with values, both instructions calculate the same value. misspeculated steps removed. We write simply Path(D) when Store (address). Both instructions calculate the same address. C is obvious. Since Path(D1) = Path(D2), both schedules have the same For the Lemmas B.14, B.16 and B.17, we start with the pattern of store-forwarding behavior. Thus execution of d1 following shared assumptions: causes a hazard if and only if d2 causes a hazard. ▶ C is an initial configuration. Load. Both instructions calculate the same address, pro- ▶ D1 and D2 are nonempty schedules. ducing the same observations o1 and o2. Since Path(D1) = ⇓ ⇓ Path(D ), either d and d cause the values to be retrieved ▶ C D1 O1 C1 and C D2 O2 C2. 2 1 2 ▶ Path(C, D1) = Path(C, D2). from the same prior stores, or they both load values from the ′ ′ ▶ D1 = D1 ∥d1 and D2 = D2 ∥d2 and d1 = d2. same address in memory. By our induction hypothesis, these ▶ d1 and d2 affect the same index i in the their respective values will be the same, so both instructions will resolve to reorder buffers. the same value.

Let o1 (resp. o2) be the observation produced during execu- Branch. Both instructions calculate the same branch con- tion of d1 (resp. d2). dition, producing the same observations o1 and o2. Since Path(D1) = Path(D2), execution of d1 causes a misspecu- Lemma B.14 (Fetch). If d1 and d2 are both fetch-type direc- lation hazard if and only if d also causes misspeculation . . . [ ] . [ ] 2 tives, then C1 n = C2 n and C1 buf i = C2 buf i . hazard. □ Proof. Since fetches happen in-order, the index i of a given Lemma B.17. If d1 and d2 are both retire directives, then physical instruction along a control flow path is deterministic. o1 = o2. Both D1 and D2 both have the same (control flow) path. Since by hypothesis both d1 and d2 affect the same index i, d1 Proof. From Lemmas B.14 and B.16 we know that for both and d2 must necessarily both be fetching the same physical d1 and d2, the transient instructions to be retired are the instruction. Furthermore, since Path(D1) = Path(D2), if the same. Thus the produced observations o1 and o2 are also the fetched instruction is a br instruction, then both d1 and d2 same. □ must have made the same guess. The lemma statements all We now formally define the set of schedules examined by hold accordingly. □ Pitchfork: ∗ ∗ Corollary B.15. If D1 and D2 are nonempty schedules such Definition B.18 (Tool schedules). Given an initial configu- ∗ ∗ ∗ ∗ that C ∗⇓ C and C ∗⇓ C and Path(C, D ) = Path(C, D ), D1 1 D2 2 1 2 ration C and a speculative window size n, we define the set then: For any i ∈ C∗.buf , if i ∈ C∗.buf , then both C∗.buf [i] 1 2 1 of tool schedules DT (n) recursively as follows: The empty and C∗.buf [i] were derived from the same physical instruction. 2 schedule ∅ is in DT (n). If D0 ∈ DT (n) and C D0⇓ C0 and ∗ |C .buf | < n, then based on the next instruction to be fetched Proof. Let D1 be the prefix of D such that the final directive 0 1 (and where i is the index of the fetched instruction): in D1 is the latest fetch that affects i. Let D2 be similarly ∗ op: D ∥fetch; execute i ∈ D (n). defined w.r.t. D2. Then by Lemma B.14, D1 and D2 both fetch ▶ 0 T the same physical instruction to index i. □ ▶ load: D0 ∥fetch; execute i ∈ DT (n). ▶ store: D0 ∥fetch; execute i : value ∈ DT (n) and Lemma B.16. If d1 and d2 are both execute-type directives, D0 ∥fetch; execute i : value; execute i : addr ∈ DT (n). then C .buf [i] = C .buf [i] and o = o . 1 2 1 2 ▶ br: Let b be the “correct” path for the branch condition. ∥ ∈ ( ) Proof. We proceed by full induction on the size of D1. Then D0 fetch: b; execute i DT n and ∥ ¬ ∈ ( ) For the base case: if |D1| = 1, then the lemma statements D0 fetch: b DT n . are trivial regardless of the directive d1. Otherwise, if |C0.buf | = n, then we instead extend based We know from Corollary B.15 that since d1 and d2 both on the oldest instruction in the reorder buffer. If the oldest affect the same index i, the two transient instruction must be instruction is a store with an unresolved address, and will derived from the same physical instruction, and thus has the not cause a hazard, then D0 ∥execute i : addr; retire ∈ DT (n). PLDI ’20, June 15–20, 2020, London, UK S. Cauligi, C. Disselkoen, K. v. Gleissenthall, D. Tullsen, D. Stefan, T. Rezk, and G. Barthe

Otherwise, if the oldest instruction is fully resolved, then D0 ∥retire ∈ DT (n).

Proposition B.19 (Path coverage). If D1 is a well-formed schedule for C whose reorder buffer never grows beyond size n, then D2 : Path(D1) = Path(D2) ∧ D2 ∈ DT (n). ∃ Proof. The proof stems directly from the definition of DT (n); at every branch, both branches are added to the set of sched- ules, and every load is able to “skip” any combination of prior stores. □ Theorem B.20 (Soundness of tool). If speculative execution of C under a schedule D with speculation bound n produces a trace O that contains at least one secret label, then there exists a schedule Dt ∈ DT (n) that produces a trace Ot that also contains at least one secret label. Proof. We can truncate D to a schedule D∗ that ends at the first directive to produce a secret observation. By Propo- sition B.19 there exists a schedule D0 ∈ DT (n) such that ∗ Path(Dt ) = Path(D ). By following construction of tool schedules as given in Definition B.18, we can find a schedule Dt ∈ DT (n) that satisfies the preconditions for Lemma B.16. Then by that same lemma, Dt produces the same final obser- vation as D∗, which contains a secret label. □