A Benchmark Suite for Evaluating Caches’ Vulnerability to Timing Attacks Shuwen Deng, Wenjie Xiong, Jakub Szefer Yale University {shuwen.deng, wenjie.xiong, jakub.szefer}@yale.edu

Abstract oretical model of all possible timing-based attacks in caches, Timing-based side or covert channels in processor caches and a benchmark suite that can test for the theoretical vulnera- continue to present a threat to computer systems, and they bilities on real processors, or simulations of new designs. A are the key to many of the recent Spectre and Meltdown at- key part of this work is an improved three-step model. Com- tacks. Based on improvements to an existing three-step model pared to existing work [11], the new three-step model addition- for cache timing-based attacks, this work presents 88 Strong ally considers: (1) differences in “local” and “remote” cores types of theoretical timing-based vulnerabilities in processor for making the timing observations, and running the victim caches. To understand and evaluate all possible types of vul- or the attacker code, (2) the victim and attacker running in nerabilities in processor caches, this work further presents hyper-threading or time-slicing, (3) using both read and write and implements a new benchmark suite which can be used operations as memory accesses in the potential attacks (all to test to which types of cache timing-based attacks a given but one prior work only considered reads), and (4) two types processor or cache design is vulnerable. In total, there are of cache line invalidation operations, through flush instruc- 1094 automatically-generated test programs which cover the tion, or using cache coherence by writing on “remote” core 88 theoretical vulnerabilities. The benchmark suite generates to invalidate “local” core’s cache lines. The new three-step the Cache Timing Vulnerability Score which can be used to model shows 32 new types of timing-based vulnerabilities not evaluate how vulnerable a specific cache implementation is to considered before, which is in addition to new attacks found different attacks. A smaller Cache Timing Vulnerability Score in the three-step model in [11]. means the design is more secure, and the scores among differ- Based upon the new three-step model, we derive that there ent machines can be easily compared. Evaluation is conducted are in total 88 Strong type of vulnerabilities, including 32 new on commodity Intel and AMD processors and shows the dif- ones. For these vulnerabilities, we write scripts to automat- ferences in processor implementations can result in different ically generate a total of 1094 benchmarks, which consider types of attacks that they are vulnerable to. Beyond testing different variants for each vulnerability, such as using reads commodity processors, the benchmarks and the Cache Timing vs. using writes. The benchmarks are then used to test com- Vulnerability Score can be used to help designers of new se- modity Intel and AMD machines on both workstations and cure processor caches evaluate their design’s susceptibility to servers in lab and machines in Amazon’s EC2 cloud. The cache timing-based attacks. benchmarks can also be run on simulators to evaluate new 1. Introduction types of secure processor caches. The benchmarks are fur- ther used to generate a new Cache Timing Vulnerability Score Cache timing channels have long been used to deploy attacks (CTVS), which can help evaluate how different processors are that can reconstruct sensitive data, especially secrets such as vulnerable to different attacks – differences of the processor cryptographic keys [3, 2, 36]. They usually make use of the implementations mean that not all processors have the same timing difference in the memory operations between observing vulnerabilities. Based on the CTVS, designers can have a a cache hit and a cache miss to derive the victim’s secrets. better understanding of the vulnerabilities in their designs and Since 2018, the cache timing-based attacks have gained new can develop new defenses. The defenses can, for example, be arXiv:1911.08619v1 [cs.CR] 19 Nov 2019 attention due to their use in Spectre [26] and Meltdown [31]. customized to vulnerabilities that CTVS detects on the given With the long history of attacks, there are also many de- processor, instead of a one-size-fits-all defense. fenses proposed or deployed in software, e.g., [7, 27], and in While the paper focuses on the 88 types of vulnerabilities hardware, e.g., [53, 54, 44, 45, 33, 6, 43, 30, 50, 14, 32, 24, 23, for the benchmark design, we also generated benchmarks for 25, 49, 4, 38, 47]. However, existing defenses can usually only all 4913 possible three-step combinations in the three-step prevent a subset of the attacks. For example, RIC [23] cache model. Based on the results, there are no more effective ones has been shown to be able to defend the Prime+Probe type of that are found apart from the vulnerabilities we derived. attacks [36, 37], but is vulnerable to the Flush+Reload [52] type of attacks. More importantly, so far most researchers have 1.1. Contributions focused on coming up with individual attacks (and defenses for them), and there has been limited work on understanding The contributions of this work are as follows: all possible types of attacks. • Derivation of a set of timing vulnerabilities on caches by the To address the need to understand and evaluate all the dif- three-step model, based on improved modeling of the pro- ferent possible types of attacks, this paper presents both a the- cessor’s behavior; a total of 88 Strong vulnerability types are 2 BACKGROUND

derived, including 32 new ones compared to prior work [11]. Table 1: The 17 possible states for a single cache block of three-step • Development of the first automatically-generated bench- model in [11]. mark suite that is used to evaluate processor caches’ vulner- ability to all possible timing attacks. State Description • Evaluation of the benchmarks on Intel and AMD processors The cache block contains a specific memory location u brought in the lab and on Amazon’s EC2 cloud environment. Vu in by the victim, which is the address unknown to the attacker • Generation of Cache Timing Vulnerability Score of proces- but within the set of sensitive memory locations x. The cache block state can be anything except u in the cache sors to understand which attacks they are vulnerable to, inv Vu block. The data and its address u are “removed” from the cache inv evaluate and customize defenses. block by the victim Vu . • Validation of the three-step model using the benchmarks, The cache block contains a specific memory location a, due to A a a memory access by the attacker, A , or the victim, V . The demonstrating no benchmarks that give positive results be- or a a address a is within the range of sensitive locations x and known yond the ones corresponding to the 88 Strong types of vul- V a to the attacker. nerabilities (in addition to Weak and repeat types). The cache block contains a memory address aalias, due to a memory access by the attacker, A alias , or the victim, V alias . The benchmarks and other code used in this paper will be re- Aaalias a a or alias leased under open-source license at https://urlredacted. The address a is within the range x and not the same as a, Vaalias but it maps to the same cache block as a, i.e. it “aliases” to a. The address aalias is known to the attacker. The cache block contains a memory address d due to a memory A or 2. Background d access by the attacker, A , or the victim, V . The address d is V d d d not within the range x and known to the attacker. This section presents background on cache timing-based at- Ainv The cache block is invalid. The data and its address are “re- or moved” from the cache block by the attacker, Ainv, or the victim, tacks, the existing three-step model and existing metrics used V inv V inv, as a result of cache block being invalidated. to help understand vulnerabilities of caches to timing attacks. inv Aa The cache block state can be anything except a in this cache or block now. The data and its address a are “removed” from the inv inv inv 2.1. Timing of Memory Operations and Caches Va cache block by the attacker, Aa , or the victim, Va . Ainv aalias aalias The cache block state can be anything except in this cache or block now. The data and its address aalias are “removed” from V inv the cache block by the attacker, Ainv , or the victim, V inv . Processor caches are the key to helping processors maintain aalias aalias aalias inv d high performance when accessing data. However, the caches Ad The cache block state can be anything except in this cache or block now. The data and its address d are “removed” from the are of finite size, not all data can fit in them, and timing related inv inv inv Vd cache block by the attacker Ad or the victim Vd . to the memory operations involving caches, such as accesses Any data, or no data, can be in the cache block. The attacker ? resulting in cache hits and misses, can reveal information has no knowledge of the memory address in this cache block. about the addresses or even data (for instruction caches it may be possible to reveal information about instructions as well). 2.3. Previous Three-Step Model for Timing-Based At- In general, two types of memory-related operations exhibit tacks in Caches timing variations that can be abused for timing-based side or covert channel attacks in processor data caches. First, memory Recent work [11] has presented a systematic approach to find access operations, such as loads and stores can be fast (e.g., a all possible cache timing-based vulnerabilities. Their model cache hit) or slow (e.g., a cache miss). Second, invalidation- is established based on two observations: all existing cache related operations, such as cache flush, can be fast (e.g., there timing attacks focusing on the data are within three memory is no dirty data in cache so flush finishes quickly) or slow (e.g., operations, and timing attacks can be analyzed by checking there is dirty data in the cache so it has to be written back, the behavior of one cache block (since all blocks are updated resulting in longer timing). in the same manner by the cache logic). Following these observations, the work in [11] presented a 2.2. Timing-Based Attacks on Processor Caches three-step model focusing on one cache block for evaluating all possible timing-based attacks. Further, a soundness analysis Researchers have proposed to use the timing differences in of the three-step model was performed to show that three steps memory-related operations to attack software, e.g., [19, 37, are sufficient to model all the timing-based attacks in caches. 2, 3, 1]. Especially, the timing-based side-channel attacks In the three-step model, each step represents the state of the often focus on cryptographic applications, e.g., attacks on cache line after some memory-related operation is performed. software using AES with table lookups [8]. Further, there are First, there is the initial step that sets the cache line into a many timing-based covert-channel attacks where the sender known state. Second, there is the following step that modifies and receiver cooperate to leak data, e.g., there are cache covert the state of the cache line. Finally, there is the last step, based channels focusing on the last-level cache [34] and cross-core on the timing of which, the change in the state of the cache cache covert channels [35]. And most recently, timing-based line is observed. Each of the steps can be performed by the channels are used as part of Spectre and Meltdown variants, attacker (A) or the victim (V). The goal of [11] is to find e.g., [26, 31, 40, 29]. which three-step combinations can represent timing attacks

2 2.4 Metrics for Vulnerabilities in Caches 3 MODELING FOR CACHE TIMING ATTACKS from which the attacker learns information about the unknown model [13], SVF [10] and CSV [56] all only evaluated Prime address accessed by the victim. The authors list 17 possible + Probe attack [36, 37]. Besides, the work in [28] quantified states for a cache line, shown in Table1. Among these states, the cache side-channel leakage but mainly focused on access- V represents that the state is a result of the victim’s operation, driven attacks such as Evict + Time Attack [36]. CacheD [42] while A represents that the state is a result of the attacker’s identified cache-based timing channels but mainly targeted operation. x denotes the set of virtual memory addresses access-driven attacks such as Evict + Time Attack [36] and storing addresses of sensitive data, and u denotes the victim’s time-driven attacks such as Bernstein’s Attack [2] for timing- secret address within x which is unknown to the attacker. a, related attacks. SCADET [39] provided a side-channel detec- aalias and d denote known memory addresses that map to the tion tool targeting Prime + Probe attack [36, 37]. same cache line. d refers to an address outside of x, while the This work is the first work to actually test for all possible others are the address within x. The attacker’s goal is to obtain timing-based vulnerabilities in caches, not just verify the cases u, which could be the same as a or aalias, maps to the same set concerning one or few attacks. It is also the first that presents as a, aalias and d, or not. code which can be run on commodity processors and which Given that there are 17 states and 3 steps, there are in total can generate the Cache Timing Vulnerability Score (CTVS) 173 = 4913 possible combinations of three steps that can be to evaluate different processors’ caches. derived. Their analysis demonstrated that 72 of the three-step combinations are Strong effective vulnerabilities, where the 3. Modeling for Cache Timing Attacks attacker can obtain the value of the unknown address u unam- The goal of this work is to present the first set of benchmarks biguously with fast or slow timing. Meanwhile, 64 patterns which can be used to evaluate all the vulnerabilities of proces- were shown to be Weak effective vulnerabilities, where there sor caches to timing-based attacks. Such attacks can be used, are timing differences according to different values of u, but for example, by Spectre variants, e.g., [26, 31, 40, 29], to ex- no single timing corresponds to a unique possible value of u. tract sensitive information. With our benchmarks, it is possible With their analysis, [11] showed that 29 out of 72 vulnerabili- to help test processor caches or future secure cache designs, ties map to existing cache attacks. The other 43 types are new and understand which timing attacks they are vulnerable to. without corresponding attacks in literature at that time. The existing work [11], however, was limited in how they 3.1. Assumptions and Threat Model model the caches. In particular, the work did not consider: The goal of the benchmarks is to evaluate timing-based vulner- read vs. write accesses, invalidation using flush instruction abilities in caches. The presented benchmarks are not actual vs. invalidation using cache coherence, multithreading, and security exploits, rather they implement memory-related op- multicore system, and the possibility that there are more than erations that correspond to all possible timing-based attacks. just one “fast” and one “slow” timing in memory-related oper- Each benchmark outputs whether there is a statistically signifi- ation. Also, they did not present any code or benchmarks for cant timing difference that the attacker could observe to extract realizing and testing for the possible types of vulnerabilities. information from the timing channel about the unknown ad- Our work improves the model of caches, presents a set of dress u of the victim. 88 Strong vulnerability types, including 32 types not in prior The current model focuses on all possible timing-based work [11], implements benchmarks for testing commodity attacks in the L1 data cache. The model includes uses of processors, and validates the theoretical analysis by running any memory-related operations (load, store, flush) and cache all possible three-step combinations to show no other types of coherence protocol. The model assumes a multi-core and timing differences (and thus vulnerabilities) exist. possibly hyper-threading processor, with a cache hierarchy of 2.4. Metrics for Vulnerabilities in Caches local and remote L1 cache, L2 cache, and a shared L3 cache (which is possibly divided into different cache slices). Previously, [55] leveraged mutual information to measure Current benchmarks do not consider timing-based attacks potential cache side-channel leakage. [20] modeled cache of other levels in cache hierarchy besides L1, but it should interference using probabilistic information flow graph. How- be straightforward to extend to the other levels. We do not ever, [55] and [20] only examined limited attacks including consider directory-related attacks [51] or attacks based on Evict + Time attack [36], Cache Collision attack [3], Bern- replacement policy [48], but it should be possible to model stein’s attack [2], Prime + Probe attack [36, 37], and Flush + these by adding more states to the model (and still keep an Reload attack [52]. What’s more, an analytical model was pro- only total of three steps). This work does not cover TLB posed in [13] to track the fraction of the victim’s critical items attacks [15, 21], but there is already a theoretical model for accessible in the cache to determine leakage. In a different TLBs [12], and similar benchmarks can be developed for TLB work, SVF [10] metric measured information leakage by mea- attacks (possibly merge with our benchmarks). suring the signal-to-noise ratio in an attacker’s observations. The work considers more than just “fast” and “slow” tim- CSV [56] metric used direct correlation, in place of phase ings (as was not done by [11]). This means that the influence correlation used by SVF, to measure leakage. The analytical of structures such as Miss Status Handling Registers (MSHRs),

3 3.2 Improved Modeling of Real Processors 3 MODELING FOR CACHE TIMING ATTACKS

100% 100% {1}. L1cl (116) {7}. L1di (115) {13}. DRAM (1084) {18}. L2rL2cl (159) {1}. L1cl (91) {7}. L1di (95) {13}. DRAM (1060) {18}. L2rL2cl (155) 80% {2}. L2cl (150) {8}. L2di (121) {14}. L1rL1cl (117) {19}. L2rL3cl (158) 80% {2}. L2cl (117) {8}. L2di (90) {14}. L1rL1cl (111) {19}. L2rL3cl (158) {3}. L3cl (364) {9}. L3di (164) {15}. L1rL2cl (117) {20}. L3rL1cl (366) {3}. L3cl (301) {9}. L3di (128) {15}. L1rL2cl (114) {20}. L3rL1cl (344) 60% {4}. rL1cl (134) {10}. rL1di (530) {16}. L1rL3cl (117) {21}. L3rL2cl (363) 60% {4}. rL1cl (99) {10}. rL1di (558) {16}. L1rL3cl (110) {21}. L3rL2cl (338) {5}. rL2cl (122) {11}. rL2di (519) {17}. L2rL1cl (157) {22}. L3rL3cl (375) {5}. rL2cl (116) {11}. rL2di (538) {17}. L2rL1cl (159) {22}. L3rL3cl (390) 40% {6}. rL3cl (118) {12}. rL3di (307) 40% {6}. rL3cl (119) {12}. rL3di (275)

20% 20%

0% 0% 0 100 200 300 400 500 600 700 800 900 1000 1100 1200 1300 1400 1500 1600 1700 1800 1900 2000 0 100 200 300 400 500 600 700 800 900 1000 1100 1200 1300 1400 1500 1600 1700 1800 1900 2000

(a) Timing of read access on Intel Xeon E5-1620 processor (b) Timing of read access on Intel Xeon E5-2690 processor

100% 100% {23}. L1cl (354) {29}. L1di (354) {35}. DRAM (455) {40}. L2rL2cl (624) {23}. L1cl (293) {29}. L1di (331) {35}. DRAM (361) {40}. L2rL2cl (641) {24}. L2cl (277) {30}. L2di (328) {36}. L1rL1cl (618) {41}. L2rL3cl (633) 80% {24}. L2cl (355) {30}. L2di (353) {36}. L1rL1cl (620) {41}. L2rL3cl (626) 80% {25}. L3cl (364) {31}. L3di (361) {37}. L1rL2cl (620) {42}. L3rL1cl (994) {25}. L3cl (278) {31}. L3di (280) {37}. L1rL2cl (649) {42}. L3rL1cl (847) 60% {26}. rL1cl (638) {32}. rL1di (698) {38}. L1rL3cl (629) {43}. L3rL2cl (902) 60% {26}. rL1cl (639) {32}. rL1di (730) {38}. L1rL3cl (600) {43}. L3rL2cl (873) {27}. rL2cl (645) {33}. rL2di (678) {39}. L2rL1cl (989) {44}. L3rL3cl (896) {27}. rL2cl (660) {33}. rL2di (693) {39}. L2rL1cl (607) {44}. L3rL3cl (987) 40% {28}. rL3cl (657) {34}. rL3di (484) 40% {28}. rL3cl (653) {34}. rL3di (446)

20% 20%

0% 0% 0 100 200 300 400 500 600 700 800 900 1000 1100 1200 1300 1400 1500 1600 1700 1800 1900 2000 0 100 200 300 400 500 600 700 800 900 1000 1100 1200 1300 1400 1500 1600 1700 1800 1900 2000

(c) Timing of write access on Intel Xeon E5-1620 processor (d) Timing of write access on Intel Xeon E5-2690 processor 100% 100% {45}. L1cl (1036) {51}. L1di (1814) {57}. DRAM (925) {62}. L2rL2cl (1118) {45}. L1cl (872) {51}. L1di (1767) {57}. DRAM (900) {62}. L2rL2cl (1045) 80% {46}. L2cl (985) {52}. L2di (1758) {58}. L1rL1cl (1032) {63}. L2rL3cl (1032) 80% {46}. L2cl (879) {52}. L2di (1585) {58}. L1rL1cl (1034) {63}. L2rL3cl (1050) {47}. L3cl (984) {53}. L3di (1595) {59}. L1rL2cl (1119) {64}. L3rL1cl (1035) {47}. L3cl (813) {53}. L3di (1310) {59}. L1rL2cl (1036) {64}. L3rL1cl (955) 60% {48}. rL1cl (986) {54}. rL1di (1644) {60}. L1rL3cl (1036) {65}. L3rL2cl (1119) 60% {48}. rL1cl (1002) {54}. rL1di (1630) {60}. L1rL3cl (1044) {65}. L3rL2cl (915) {49}. rL2cl (1068) {55}. rL2di (1617) {61}. L2rL1cl (1031) {66}. L3rL3cl (1030) {49}. rL2cl (1011) {55}. rL2di (1664) {61}. L2rL1cl (1053) {66}. L3rL3cl (1120) 40% {50}. rL3cl (988) {56}. rL3di (1295) 40% {50}. rL3cl (1005) {56}. rL3di (1229)

20% 20%

0% 0% 0 100 200 300 400 500 600 700 800 900 1000 1100 1200 1300 1400 1500 1600 1700 1800 1900 2000 0 100 200 300 400 500 600 700 800 900 1000 1100 1200 1300 1400 1500 1600 1700 1800 1900 2000

(e) Timing of flush operation on Intel Xeon E5-1620 processor (f) Timing of flush operation on Intel Xeon E5-2690 processor

Figure 1: Histograms of read, write, and flush operations (each has 8 operations) timing under all possible data movements considered in this work. The timing is for the timing observation step, i.e. Step 3, in the tested three-step patterns. Note, different processors have different timing, and not all different types of data movements can be distinguished on different processors. The data is presented for Intel Xeon E5-1620 (a, c, e) and Intel Xeon E5-2690 (b, d, f) processors. Numbers in the “{}” in the legend denote the different data movement types. {1} - {22} correspond to read operation, {23} - {44} correspond to write operation, {45} - {66} correspond to flush operation, to access clean L1 data, clean L2 data, clean L3 data, remote clean L1 data, remote clean L2 data, remote clean L3 data, dirty L1 data, dirty L2 data, dirty L3 data, remote dirty L1 data, remote dirty L2 data, remote dirty L3 data, DRAM data, clean data in both L1 and remote L1, clean data in both L1 and remote L2, clean data in both L1 and remote L3, clean data in both L2 and remote L1, clean data in both L2 and remote L2, clean data in both L2 and remote L3, clean data in both L3 and remote L1, clean data in both L3 and remote L2, clean data in both L3 and remote L3, respectively. Numbers in the “()” in the legend show the average cycles needed for completing that type of memory operation. The x axis shows the access latency in cycles. load and store buffers between processor and caches, and line- For each read, write or flush operation, it may target the fill buffers between cache levels are accounted for – however, data that is in the local L1 cache, L2 cache, or L3 cache slice, benchmarks for timing attacks that are just due to these struc- or that is in the remote L1 cache, L2 cache, or L3 cache slice. tures could likely be developed. The cache block can be either in a clean or dirty state for the The flush operation here refers to the clflush instruction in above 6 locations (6 ∗ 2 = 12 types). The clean data may also , which causes data to be flushed from all levels of caches be in both local or remote core, which can be in any cache back to main memory (including flushing data on other cores). hierarchy (L1, L2, or L3 cache) for both cores (3∗3 = 9 types). The timing measurements are from when each of the memory- Otherwise, the data is not in any level of the cache hierarchy, related operations is issued until the instruction commits. i.e., it is in the DRAM (1 type). We consider all these 66 timings (3 operations ∗ (12+9+1) = 66) to be different from 3.2. Improved Modeling of Real Processors each other and use these 66 types of timings in our three- step cache simulator, discusses in Section4, to determine if a We expand the model in [11] by considering more realistic three-step combination can be used in an attack. cases for a processor’s memory-related operation. Figure1 shows the histograms of these 66 types of timing Timing Observation on Local vs. Remote Core. Our observations for Intel Xeon E5-1620 and E5-2690 processors. cache attack model assumes a multi-core system and possibly Based on the histograms, we found that some operations are a hyper-threading system as well. We model such a system differentiable from each other, while some are not. In general, using two cores: a “local” and “remote” core, each with L1, L2 the timing is processor-specific, so we need to consider and and shared L3 caches. The target cache block is located in the examine all various cache timings and cannot just assume local core. Remote core affects the target cache block on the “fast” and “slow” timings as was done by [11]. local core by using cache coherence protocol. In particular, the Hyper-Threading vs. Time-Slicing. Our cache model remote core is used to perform write operations to invalidate further considers that each core supports hyper-threading. The the local core’s data due to the cache coherence protocol. victim and the attacker on one core can then either run in time-

4 4 DERIVATION OF ALL VULNERABILITIES

Types of slicing setting or run in parallel as two hyper-threads (if there Classification Reduction timing is hyper-threading support in the processor). For the case of Step Preliminary Strong Step observations Strong 88 66 Vulnerability 203 accesses on “local” vs. “remote” cores, the accesses on local Cache Reduction Vulnerability Exhaustive list Preliminary Weak and remote cores can be done in parallel. Three-Step Rules Weak 80 of all possible Vulnerability 624 Simulator Vulnerability Read (Load) Access vs. Write (Store) Access. For three-step Ineffective Three-Step combinations 4086 memory-access-related operations, our model considers that 4913 they can be either read (load) or write (store). The timing of writes is not well explored in attacks, except for one work [5]. Figure 2: The derivation process of all the Strong and Weak types In this work, the model considers both reads and writes. For of L1 cache timing-based vulnerabilities. example, for Flush + Reload attack, the previous attack [52] attacks belong to Weak type of vulnerabilities. Otherwise, if uses the load operation in the final step to reload secret data the timing is always the same regardless of different values of and observe timing. In our model, we also test store operation u, it will be an Ineffective three-step combination. in the final step to access secret data and reveal that attack with write in the final step is also possible, for example. 4.2. New Cache Three-Step Simulator Flush vs. Write Invalidation. In our model, we consider Figure2 shows the derivation process of vulnerabilities. We that a flush operation can be achieved by a cl f lush type instruc- write Python scripts to develop the cache three-step simulator. tion, that flushes data from all caches back to main memory. The simulator takes all 4913 combinations and 66 types of tim- Apart from that, our model considers the operation that can ing observations as input, checks and outputs the three-steps invalidate the local core’s cache line by writing the correspond- that belong to Strong, Weak and Ineffective types, respectively. ing line in the remote core, which will trigger cache coherence For the step that is Vu, the simulator checks the timing when Vu and result in the local cache line being invalidated. inv is Va, Vaalias , and VNIB. For the step that is Vu , the simulator V inv V inv V inv V inv 4. Derivation of All Vulnerabilities checks the timing when u is a , aalias , and NIB. The timing variance exists if different possible values of u corre- In this work, we build a new cache three-step simulator based spond to different timings of the 66 types. We enumerate all on the new model discussed in Section 3.2. It considers differ- possible operations (read/write for access, remote write/flush ent memory-related operations and differentiates among the for invalidation) for a step and consider different timings for 66 timing variations discussed in Section 3.2 that are related to each operation. Therefore, each three-step pattern may have L1 cache timing-based attack for the final timing observation several types of timing observations. While in [11], only two step. Further, we give categorizations of vulnerabilities to find timing differences (“fast” and “slow”) were considered when common features that attacks exploit. comparing the timing variation to identify effective vulnera- bilities. The rules from [11] to remove repeat and redundant 4.1. Judging the Effectiveness of Three-Step Combination three-step patterns are used for our work. In order for a three-step combination to be effective for an As shown in Figure2, based on the much finer-grained attack, at least the unknown victim’s address u should be categorization of timing differences, we derived in total 88 involved in one of the three steps since u is the unknown secret Strong effective vulnerabilities and 80 Weak effective vulner- the attacker tries to learn. In this case, the vulnerability will abilities after removing repeat three-step patterns. They are inv shown in Table2, where light-blue colored rows (in total 32 have Vu or Vu as one or more of the three-steps to represent the operations on the secret u. types) are the new vulnerabilities (compared to [11]) which Based on the 17 states shown in Table1, for the three-step we found through running of new cache three-step simula- model, the attacker tries to learn the value of u by guessing tor (16 types of the original Strong effective vulnerabilities if u equals: a, aalias or NIB. a denotes the address that is become Weak vulnerabilities when considering multi-core sys- within the set of sensitive locations x and maps to the target tems). We provide new names for the new attacks in Attack cache line. aalias denotes any data address that belongs to Strategy in Table2 while re-use existing names if the attacks sensitive locations x and also maps to the cache line but is not were presented before. As verified in Section8 through the a. Apart from all possible sensitive address mapping to the tested processors, there are no other effective vulnerabilities target cache line, u can possibly not map to the target cache except the types we derive in Table2, further strengthening line the attacker is measuring. We denote these addresses as our findings. alias NIB (not-in-block). Therefore, u can be either a, a , or NIB. 4.3. Categorizations of the Vulnerabilities If the attacker is able to unambiguously correlate the timing to one of the three values, he or she is able to learn the value of u We first categorize different vulnerabilities as based on internal and the corresponding three-steps is a Strong type vulnerability. (I) or external (E) interference. The types that only involve Meanwhile, if the attacker is not able to clearly distinguish the victim’s behavior, V, in the states of Step 2 and Step 3 are whether u is a, aalias, or NIB based on the timing, but there internal interference vulnerabilities (I). The remaining ones are still timing differences observed, then the corresponding are external interference (E) vulnerabilities.

5 4.3 Categorizations of the Vulnerabilities 4 DERIVATION OF ALL VULNERABILITIES

Attack Vulnerability Type Attack No. Vulnerability Type Type Attack No. Type Attack S1 S2 S3 Strategy S1 S2 S3 Strategy inv inv inv 1 A Vu Va I-A [3] 45 A Vu Va I-A new in [11] Cache inv inv inv 2 V Vu Va I-A [3] Cache 46 V Vu Va I-A new in [11] Collision Inv. inv Collision 47 inv V inv I-A [16] 3 Aa Vu Va I-A [3] Aa u Va inv 48 inv V inv I-A [16] Flush + 4 Va Vu Va I-A [3] Va u Va inv inv inv Flush 5 A Vu Aa E-A [52, 17, 40] 49 Aa Vu Aa E-A [16] a inv inv inv Flush 50 V Vu A E-A [16] 6 Va Vu Aa E-A [52, 17, 40] a a inv inv inv 7 A Vu Aa E-A [52, 17, 40] + Reload 51 A Vu Aa E-A new in [11] Flush + inv inv inv Reload Inv. 8 V Vu Aa E-A [52, 17, 40] 52 V Vu Aa E-A new in [11] inv inv A inv E A 9 V Aa Vu E-A new in [11] Reload 53 Vu a Vu - new in [11] Reload + u inv inv inv + Time 54 V Va V I-A new in [11] Time Inv. 10 Vu Va Vu I-A new in [11] u u inv 55 A inv inv E-A new in [11] 11 Aa V Aa E-A [41] a Vu Aa u inv inv Flush + 12 A V inv V I-A new in [11] Flush 56 Aa Vu Va I-A new in [11] a u a inv inv Probe Inv. 13 V inv A E-A new in [11] + Probe 57 Va Vu Aa E-A new in [11] a Vu a inv inv inv 58 Va V V I-A new in [11] 14 Va V Va I-A new in [11] u a u V inv inv E A 15 V Ainv V E-A new in [11] Flush 59 u Aa Vu - new in [11] Flush + u a u V inv inv I A Time Inv. 16 V inv V I-A new in [11] + Time 60 u Va Vu - new in [11] u Va u inv inv inv inv inv 61 A Vu Aa E-A new 17 A Vu Aa E-A new Cache inv inv inv Cache inv inv 62 A Vu Va I-A new 18 A Vu Va I-A new Coherence inv inv inv Coherence inv inv 63 V V A E-A new 19 V V Aa E-A new Flush u a Flush + u 64 V inv V inv V inv I-A new 20 V inv V inv V I-A new + Reload u a Reload Inv. u a 65 Ainv V inv Ainv E-SA new 21 Ainv V inv A E-SA new a u a a u a 66 Ainv V inv V inv I-SA new 22 Ainv V inv V I-SA new a u a a u a 67 V inv V inv Ainv E-SA new 23 V inv V inv A E-SA new a u a a u a Cache 68 V inv V inv V inv I-SA new Cache 24 V inv V inv V I-SA new a u a a u a Coherence 69 Ainv V inv Ainv E-S new Coherence 25 Ainv V inv A E-S new d u d Prime + d u d Prime 70 Ainv V inv V inv I-S new 26 Ainv V inv V I-S new d u d Probe Inv. d u d + Probe 71 V inv V inv Ainv E-S new 27 V inv V inv A E-S new d u d d u d 72 V inv V inv V inv I-S new 28 V inv V inv V I-S new d u d d u d 73 V inv Ainv V inv E-SA new 29 V inv Ainv V E-SA new u a u Cache u a u Cache 74 V inv V inv V inv I-SA new 30 V inv V inv V I-SA new u a u Coherence u a u Coherence 75 V inv Ainv V inv E-S new 31 V inv Ainv V E-S new u d u Evict + u d u Evict 76 V inv V inv V inv I-S new 32 V inv V inv V I-S new + Time u d u Time Inv. u d u 77 V V V inv I-SA new in [11] 33 V V V I-SA [2] u a u Bernstein’s u a u 78 V V V inv I-S new in [11] 34 V V V I-S [2] Bernstein’s u d u Inv. u d u 79 V V V inv I-S new in [11] 35 V V V I-S [2] Attack d u d Attack d u d V V inv I SA 36 V V V I-SA [2] 80 a u Va - new in [11] a u a inv 81 Vd Vu A E-S new in [11] Evict + 37 Vd Vu Ad E-S new in [11] Evict d 82 V V inv E-SA new in [11] Probe Inv. 38 Va Vu Aa E-SA new in [11] + Probe a u Aa inv 39 Ad Vu Vd I-S new in [11] Prime 83 Ad Vu Vd I-S new in [11] Prime + inv 40 Aa Vu Va I-SA new in [11] + Time 84 Aa Vu Va I-SA new in [11] Time Inv. inv 41 Vu Ad Vu E-S [36] Evict 85 Vu Ad Vu E-S new in [11] Evict + inv 42 Vu Aa Vu E-SA [36] + Time 86 Vu Aa Vu E-SA new in [11] Time Inv. inv 43 Ad Vu Ad E-S [36, 37, 18] Prime 87 Ad Vu Ad E-S new in [11] Prime + + Probe inv 44 Aa Vu Aa E-SA [36, 37, 18] 88 Aa Vu Aa E-SA new in [11] Probe Inv. (a) All the cache timing vulnerabilities with S3 as memory access operation. (b) All the cache timing vulnerabilities with S3 as invalidation operation.

Table 2: The table shows all the L1 cache timing-based vulnerabilities. The No. column assigns each type of vulnerability a number. The Vulnerability Type column shows the three steps that define each vulnerability. S# denotes Step#. The Type column proposes the categorization the vulnerability belongs to. “E” and “I” are for internal and external interference types, respectively. “S”, “A” and “SA” are set-based, address- based types and the types that are both set-based and address-based, respectively. The Attack column shows if a vulnerability has been previously presented in the literature. The Attack Strategy column gives a common name for each set of vulnerabilities that would be exploited in an attack in a similar manner. Inv. means invalidation. Light-blue colored rows are the vulnerabilities which are first presented in this work.

In [11, 55], the vulnerabilities are categorized as hit-based Therefore, we create new categorizations for the vulnerabil- and miss-based vulnerabilities, based on the cache behaviors ities. We categorize the vulnerabilities as address-based (A) if the attackers want to observe (cache misses or hits). This they are able to derive the complete address of u by observing definition does not fit our model since there are different types cache hit of u and obtaining different timing compared with of timings for L1 data hits and misses in the real machines. other candidate data. Set-based (S) vulnerabilities are the ones For example, attacks can derive information on accesses using that can know the mapped set of u by conflicting and generat- timing difference from two types of cache misses, which are ing eviction between u and candidate data addresses. The third not covered by the original categorizations. type are the ones that potentially derive information from set

6 5 BENCHMARK IMPLEMENTATION

or address (SA) depending on timing differences derived for not–“vulnerability not found”. Further, if any one of the cases all the candidates of u. For example, SA type #33 vulnerability of the vulnerability returns “vulnerability is found”, the pro- Vu Va Vu can be set-based if a and u are not the same but cessor cache is considered to be vulnerable to that attack type. map to the same cache set, which differs in timing between {2}{8}{24}{30}, a local L2 hit, and {1}{7}{23}{29}, a local 5.3. Timing Measurement L1 hit. Or it can be address-based if a maps to u and Step 1 We use rdtsc instruction in our benchmarks to do timing mea- (Vu) and Step 2 (Va) are accessed by different operations (read surements, which is the most effective method compared with or write), which have different timing between reads of L1 hardware performance counters, which may be limited [16] or clean data and dirty data, {1} and {7}, or writes of L1 clean lacking-determinism [9], or using a “counting” thread. AMD’s data and dirty data, {23} and {29}. rdtsc instruction is not as accurate as Intel machine’s, but there 5. Benchmark Implementation are plenty of works [1, 22, 21] showing that it is also able to be used for cache timing-based attacks. This section presents details of how the benchmarks for the Noise and variation in the timing measurements could fur- different vulnerabilities are implemented. ther result in false negatives (if the time measurement was not accurate enough to distinguish different timings of accesses) or 5.1. Benchmarks for Each Vulnerability false positives (if timing changes resulted in timing measure- For each vulnerability, there are three steps, and each step ment differences even though there is no timing difference). can be two possible cases: read or write access for a mem- To reduce the noise, instead of measuring just one cache ory access operation; flush or write in the remote core for block, we arbitrarily chose 8 cache blocks from different cache an invalidation-related operation. Thus, there are in total of sets to do operation on. Further, the measurements are all re- 23 = 8 cases considering different types of operations. Further, peated RUN_NUM times and collect statistical data. fence if the vulnerabilities have both the victim and the attacker run- instructions are added between each memory-related instruc- ning in one core, these two parties can run either time-slicing tion to enforce an ordering constraint for the attacks. or multi-threading. Based on that, one case may be doubled To reduce the variation of the timing among different cache for running in two settings. So for one vulnerability type, sets, the timing measurement of the last step is repeated for there are corresponding 8-16 cases depending on the specific each test if the last step is u-related step. Specifically, right vulnerability. In total, we found there are 1094 benchmarks after the third step’s timing measurement, we trigger and mea- for all 88 Strong type vulnerabilities. sure the timing of this step again, which is guaranteed to result Since each type of states have the same implementation and in an L1 cache hit timing or timing to invalidate the data that each three-step benchmark follows a similar pattern, we wrote is not in the caches, depending on the concrete memory opera- C scripts to automatically generate the C program for each of tions. We then compare the timing of the third step with the the 1094 benchmarks. repeated third step. This eliminates any variations in timing among different cache sets. 5.2. Judging A Three-Step Combination 5.4. Benchmark Code Example For a specific benchmark that implements one case of the three-step combinations, following the idea of the cache three- Figure3 shows an example pseudo code of #42 vulnerability step simulator in Section4, if the step is Vu, the benchmarks Vu Aa Vu’s benchmark for read (Vu), write (Aa), and write separately test the timing when Vu is Va, Vaalias , or VNIB. If the (Vu) access of the three steps and running hyper-threading. inv step is Vu , the benchmarks separately test the timing when First, we define probability bound of Welch’s t-test (line V inv V inv V inv V inv u is a , aalias , or NIB. The timing of the last step in the 1) and initialize a shared array (line 2-3) used by mutexes to three-step pattern is measured. For each of the cases, there is control the sequence of the three-step accesses. Then, the data RUN_NUM number of trials, and Welch’s t-test [46] is used (stored in the array) that will be accessed by the victim and the to distinguish the distributions of the measured timings. We attacker is loaded into the L1 cache (line 4), and consequently consider two distributions to be significantly different from possibly brought into L2 and L3 caches. We use fork() (line 6 each other if the probability of observing the data given that and line 19) to create sub-process, one for the victim and one they come from the same distribution is less than 0.05%. for the attacker in this example. Each remote and local victim For an effective vulnerability, one of the three candidates of and attacker will have one sub-process throughout the whole inv Vu (or Vu ) should generate timing distribution that is statisti- test. Each sub-process is assigned to a hardware thread (line cally different from the other two candidates. This is for the 8-9, line 21-22). When running hyper-threading, two local Strong vulnerability types which are 88 types in total. The 80 or two remote sub-processes are run in different hardware Weak vulnerability types are not currently considered in the threads of one CPU, if applicable. If running time-slicing, benchmarks but can be straightforward to add if needed. At sub-processes are assigned to one hardware thread. Within end of each benchmark run, the benchmark outputs if there each sub-process, the test will be run for a certain predefined was significant timing difference–“vulnerability is found”, or RUN_NUM (line 10 and line 23) times so the timing statis-

7 6 EVALUATION AND SECURITY DISCUSSION

1. #define LIM 0.0005 Table 3: Configurations of the experimental machines, which all 2. //mutex to sequence three-step operations 3. mutex = mmap(mutex_size, PROT_READ|PROT_WRITE, …) have 64B L1 cache line size. (1) denotes the number of hardware 4. init_array(arr); //initialize data array and load it into L1, L2, and L3 threads sharing one L1 cache; (2) denotes the number of hardware 5. mutex[0] = NOONE_RUN; threads per socket; (3) denotes the number of sockets. 6. if ((pid_l1=fork()) < 0) { exit(1); //fail to fork process} //local attacker Model L1-D L1-I L2 L3 7. else if (pid_l1==0){ (1) (2) (3) 8. CPU_SET(att_num, &mycpuset); Name Cache Cache Cache Cache 9. sched_setaffinity(getpid(), sizeof(cpu_set_t), &mycpuset); Intel Xeon 32KB, 32KB, 256KB, 10MB, 2 8 1 10. for (int m=0; m

8 6.1 Vulnerability Evaluation on Commodity CPUs 6 EVALUATION AND SECURITY DISCUSSION

Found in at least one tested CPU Found in all tested CPUs AMD EPYC 7571 AMD FX-8150 Intel Xeon P-8175 Intel Xeon E5-2686 Intel Core i5-4570 CPU Type Intel Xeon E5-2690 Intel Xeon E5-2667 inter-chip Intel Xeon E5-2667 on-chip Intel Xeon E5-1620

1 2 3 4 5 6 7 8 9 10111213141516171819202122232425262728293031323334353637383940414243444546474849505152535455565758596061626364656667686970717273747576777879808182838485868788 vulnerabilities

Figure 4: Evaluation of 88 Strong types of vulnerabilities on different machines. A dot means the corresponding processor is vulnerable to the vulnerability type. Intel Xeon E5-2667 in our lab has two sockets. Therefore, the local and remote core can be both in one socket, i.e., run on-chip; or local and remote core can be in different sockets, i.e., run inter-chip.

Found in at least one tested CPU Found in all tested CPUs AMD EPYC 7571 AMD FX-8150 Intel Xeon P-8175 Intel Xeon E5-2686 Intel Core i5-4570 CPU Type Intel Xeon E5-2690 Intel Xeon E5-2667 inter-chip Intel Xeon E5-2667 on-chip Intel Xeon E5-1620 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 vulnerabilities

(a) #1 - #44 vulnerability testing results on different machines.

Found in at least one tested CPU Found in all tested CPUs AMD EPYC 7571 AMD FX-8150 Intel Xeon P-8175 Intel Xeon E5-2686 Intel Core i5-4570 CPU Type Intel Xeon E5-2690 Intel Xeon E5-2667 inter-chip Intel Xeon E5-2667 on-chip Intel Xeon E5-1620 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 vulnerabilities

(b) #45 - #88 vulnerability testing results on different machines.

Figure 5: Evaluation of 88 Strong types of vulnerabilities for all the cases. A dot means the corresponding processor is vulnerable to the vulnerability case. For each vulnerability, a fixed number of cases (see Section 5.1) are tested according to the vulnerability type. And there are in total 1094 cases for 88 Strong types of vulnerabilities. Amazon EC2. Table3 shows the processor configurations. strates that different machines are vulnerable to different types of attacks. The and result of 9 types of processor configuration 6.1. Vulnerability Evaluation on Commodity CPUs experiments have relatively small percentage of vulnerabili- ties to which machines are all vulnerable. We further list the We evaluated 88 Strong effective vulnerabilities shown in Ta- statistical results as CTVS for each machine in Section7. ble2. Figure4 lists the results of running the experiments when testing the 9 types of processor configurations upon 88 6.2. Analysis of Vulnerabilities Found effective vulnerabilities. For each type of processors, a dot showing up in the figure means that the machine is vulnerable Figure5 shows the results of benchmarks for the 88 vulnerabil- to this vulnerability. Apart from the 9 types of tested proces- ity types. Machines not supporting hyper-threading have much sors, the first line of dots in Figure4 show the or result of fewer effective cases. Similar to Figure4, the dot means the 9 types of processor configurations. The dot means that the related processor is vulnerable to the specific case. The gray vulnerability is found in at least one tested processor. The sec- vertical lines are used to group all the cases per vulnerability ond line of dots in Figure4 show the and result of 9 types of (there are thus 88 vertical bars and groupings). We further col- processor configurations. The dot means that the vulnerability lect the data in Figure5 and group them with different Step 3 is found in all tested processors. types as the timing observation steps in Table4 to compare Figure4 shows that 88 effective vulnerabilities are mostly effects of different operations on cache timing attacks. found in all the tested CPUs. Since our new cache three-step Local read and local write of timing observation step. simulator considers the ideal case where 66 types of timing Previous attacks normally used read access to implement the observations all have unique results, it outputs all the possible side-channel attacks, as analyzed in Section 3.2. However, vulnerability types. For commodity processors, a subset of write access is shown in Figure5 and Table4 to be an effective them are shown to be effective. This is due to the actual cache method to implement attacks as well. It has generally smaller implementation and timing measurement methods, and thus, rate compared with read access to trigger effective vulnerabili- some of the timing of the 66 types are not differentiable, as ties of different cases, especially for tested machine Intel Xeon can be seen from histograms in Figure1. Figure4 also demon- E5-1620 and E5-2690. For the 44 types of vulnerabilities (#1

9 7 CACHE TIMING VULNERABILITY SCORE

Table 4: Percentage of vulnerability cases that are effective for dif- Table 5: Percentage of vulnerability cases that are effective for the ferent types of timing observation steps for different machine config- victim (Vic.) and the attacker (Att.) running the same core (time- urations. The number on the right of “/” is the total cases of vulner- slicing or hyper-threading), running different cores or within the vic- abilities for the corresponding categorization; the number on the left tim for different machine configurations. The number on the right of of “/” is the number of cases to which the corresponding processor “/” is the total cases of vulnerabilities for the corresponding catego- is vulnerable. Machines labeled * do not support hyper-threading. rization; the number on the left of “/” is the number of cases to which the corresponding processor is vulnerable. Machines labeled * do Local Local Remote Flush to Model Name Read Write Write to Inv. Inv. not support hyper-threading. Intel Xeon E5-1620 137/277 118/277 127/277 129/263 Vic., Att. on the Same Core Vic., Att. on Intel Xeon E5-2667 Within 121/277 117/277 80/277 119/263 Model Name Time- Hyper- Different Victim on-chip Slicing Threading Cores Intel Xeon E5-2667 127/277 111/277 124/277 72/263 Intel Xeon E5-1620 181/390 174/390 51/90 105/224 inter-chip Intel Xeon E5-2667 Intel Xeon E5-2690 128/277 101/277 77/277 107/263 156/390 146/390 52/90 83/224 on-chip Intel Core i5-4570* 82/277 66/277 57/277 63/263 Intel Xeon E5-2667 Intel Xeon E5-2686* 87/277 74/277 69/277 80/263 151/390 146/390 49/90 88/224 inter-chip Intel Xeon P-8175 124/277 120/277 75/277 105/263 Intel Xeon E5-2690 144/390 138/390 46/90 85/224 AMD FX-8150* 68/277 65/277 89/277 65/263 Intel Core i5-4570* 143/390 0/390 46/90 79/224 AMD EPYC 7571 125/277 125/277 124/277 114/263 Intel Xeon 166/390 0/390 50/90 94/224 in all CPUs 49/277 34/277 10/277 30/263 E5-2686* at least one CPU 175/277 150/277 162/277 162/263 Intel Xeon P-8175 148/390 143/390 38/90 95/224 AMD FX-8150* 155/390 0/390 43/90 89/224 AMD EPYC 7571 159/390 171/390 55/90 103/224 in all CPUs 67/390 0/390 18/90 38/224 at least one CPU 223/390 217/390 61/90 148/224 - #44) that have access operation as timing observation step, Figure5 demonstrates that there are 38 out of 44 vulnerabili- ties to which at least one machine is vulnerable when using 7. Cache Timing Vulnerability Score read as the timing observation step. While using write access Table6 shows the CTVS values for each tested machine, as as the timing observation step, 34 out of 44 vulnerabilities are well as the and result and or result for machines. For CTVS vulnerable to at least one machine. number, smaller is better. For all the 88 Strong type vulnera- Invalidation using cache coherence or flush for timing bilities, AMD FX-8150 has relatively better CTVS compared observation step. According to Table4, the percentage of vul- with Intel machines. Xeon E5-1620 and P-8175 are the most nerabilities to which the machine is vulnerable mainly depends vulnerable ones among Intel processors. Otherwise, the Xeon on processor types when comparing different invalidation- family and AMD EPYC 7571 have generally similar results. related operation as the timing observation step. Among In Table6, CTVS numbers for the and result of all pro- the tested processors, Intel Xeon E5-2667 running inter-chip, cessors are small. This demonstrates that only a few attacks AMD FX-8150 and AMD EPYC are more vulnerable to re- can be effective in all the processors. These numbers are ex- mote write as the timing observation step. Intel Xeon E5- pected to be even smaller if more processors are tested. CTVS 1620, E5-2667 running on-chip, E5-2690, E5-2686, P-8175, numbers for the or result of all processors are large. This and Core i5-4570 are more vulnerable to flush observation confirms that nearly all of the vulnerabilities derived by the step. Overall, for the 44 types of vulnerabilities (#45 - #88) new three-step model are found in the tested processors. that have invalidation for timing observation, remote write A type vulnerabilities generally have higher effective rates and flush operations both have 38 out of 44 vulnerabilities to than S type vulnerabilities. This is because that S type vul- which at least one machine is vulnerable. nerabilities normally differentiate timing between L1 cache and L2 cache, i.e., accessing or invalidating L1 or L2 data, Running time-slicing or hyper-threading. Besides differ- which are shown in the histograms in Figure1 to be much ent kinds of operations, we also collect results in Table5 for smaller compared with the difference between L1 cache hit running time-slicing and hyper-threading when the victim and and DRAM hit, for example. Especially, the timing difference the attacker run on the same core (either local core or remote between remote write to invalidate dirty L1 data and L2 data core). There are also vulnerabilities for which the victim and is almost non-differentiable, resulting in that related vulnera- the attacker run on different cores, or vulnerabilities only hav- bilities (especially #25 - #28) are found to be not effective in ing victim steps. Based on the results, running time-slicing is all tested processors (shown in Figure4). A type vulnerabil- more vulnerable compared with running hyper-threading for ities generally rely on timing differences between L1 cache Intel processors. While AMD processor EPYC 7571 shows hit and DRAM hit, or L1 cache hit and remote L1 cache hit; that running time-slicing is more vulnerable. Furthermore, histograms in Figure1 demonstrate that these access types hyper-threading provides more choices for the attacker to ex- have large timing differences, making these vulnerabilities ploit the corresponding vulnerability in different ways. much more effective. SA type vulnerabilities generally lever-

10 7.1 for Design of Defense Features 9 CONCLUSION

Table 6: Cache Timing Vulnerability Score (CTVS) for each of the points to the requirement of using new or different perfor- tested processors. The number on the right of “/” is the total cases mance counter types in works that use monitoring. of vulnerabilities for the corresponding categorization; the number The CTVS can also be used to help understand the im- on the left of “/” is the number of types to which the corresponding plementation of different processors especially the micro- processor is vulnerable. Smaller is better. “E” and “I” are internal architectures. In particular, processors that have similar imple- and external interference vulnerabilities, respectively. “S” and “A” mentations may have similar results of CTVS numbers, based are set-based and address-based vulnerabilities, respectively. “SA” on common architectural features implemented for the whole are the ones that are both set-based and address-based. memory hierarchy. Model CTVS I-A I-S I-SA E-A E-S E-SA Name Score Vul. Vul. Vul. Vul. Vul. Vul. 8. Validation Intel Xeon 73/88 20/20 6/12 12/12 20/20 5/12 10/12 E5-1620 To validate if there are any other vulnerabilities that are left out Intel Xeon apart from all the effective vulnerabilities we derived from our E5-2667 66/88 20/20 3/12 11/12 20/20 3/12 9/12 cache three-step simulator, we empirically ran benchmarks for on-chip 3 = Intel Xeon all the cases of the whole 17 4913 three-step combinations E5-2667 64/88 19/20 2/12 12/12 20/20 3/12 8/12 for 9 processor configurations. inter-chip We discovered a number of three-steps, besides the Strong, Intel Xeon 62/88 20/20 1/12 10/12 20/20 2/12 9/12 Weak and repeat types, returned by the benchmarks to have E5-2690 Intel Core timing variations but consider all of them as false positives. 61/88 20/20 1/12 10/12 20/20 1/12 9/12 i5-4570 The false positives that show up in every processor we tested Intel Xeon inv inv 66/88 20/20 2/12 11/12 20/20 3/12 10/12 all have the second or the third step to be A , V or ?. The E5-2686 Intel Xeon corresponding types cannot be any effective vulnerabilities 73/88 20/20 5/12 12/12 20/20 5/12 11/12 P-8175 because these three types of states will make the attacker AMD 50/88 18/20 1/12 7/12 18/20 0/12 6/12 lose track of useful information due to whole cache flush FX-8150 inv inv AMD (A , V ) or zero-knowledge state inference (?) if they are EPYC 62/88 20/20 2/12 10/12 20/20 1/12 9/12 in Step 2 or Step 3. Reason of three-steps with the second 7571 or the third step as Ainv, V inv to seem to be effective in the in all 47/88 17/20 0/12 6/12 18/20 0/12 6/12 CPUs result of running the benchmarks is that whole cache flush at least currently cannot be implemented under user-level privilege. 79/88 20/20 9/12 12/12 20/20 7/12 11/12 one CPU We use approximate method to implement these states in the benchmark by invalidating every address that is related to the attacks. An approximate method is also used for ? to simulate age the timing differences between clean L1 data invalidation inv inv and dirty L1 data invalidation or between local access of re- the zero-knowledge state. Therefore, the timings of A , V mote clean L1 data and remote dirty L1 data; histograms in and ? are not accurate to measure for the real attack. Figure1 again show large timing differences for these, and Overall, we found that there are no effective vulnerabilities related vulnerabilities are found to be very effective by CTVS. that are not covered by the vulnerabilities we derived. Detailed analysis is not shown due to space limits of the paper. For I and E type vulnerabilities, they do not have an explicit distinction of CTVS numbers for the tested processors. 9. Conclusion 7.1. for Design of Defense Features This work presented a new three-step model and the first bench- mark suite for evaluating all 88 possible Strong cache timing- CTVS can be used to test existing processors, as we have based attack types in processors. The model allowed us to find shown in this work. Further, the benchmarks can be used 32 new timing attack types. Further, we implemented scripts to evaluate any new secure cache designs, by running the to auto-generate the 1094 benchmark tests from our three-step benchmarks on a simulator. model’s 88 theoretical attack types for testing different cases. CTVS has shown that different processors are vulnerable to The benchmarks were run on a number of commodity pro- different attacks. Consequently, customized software or hard- cessors to gave each machine the Cache Timing Vulnerability ware defenses can be deployed for each processor based on the Score (CTVS) to measure the degree of the machine’s robust- CTVS score, rather than defending vulnerabilities not present ness against cache timing-based vulnerabilities. The three-step in the specific processor’s caches. For software defenses, the model, benchmarks, and the CTVS can be used to measure access patterns from the benchmarks could be used as a ref- existing systems and help design future secure caches. erence for scanning software to find if it has similar patterns, All code related to this work will be made open-source. e.g., to find malicious software that has such attack patterns. Further, CTVS and our three-step model have shown new at- tack types which are unknown before, and thus not considered by defenses based on monitoring performance counters – this

11 REFERENCES REFERENCES

References [23] Mehmet Kayaalp, Khaled N Khasawneh, Hodjat Asghari Esfeden, Jesse Elwell, Nael Abu-Ghazaleh, Dmitry Ponomarev, and Aamer [1] Onur Acıiçmez and Çetin Kaya Koç. Trace-Driven Cache Attacks on Jaleel. RIC: Relaxed Inclusion Caches for Mitigating LLC Side- AES (short paper). In International Conference on Information and Channel attacks. In Design Automation Conference (DAC), 2017 54th Communications Security, pages 112–121. Springer, 2006. ACM/EDAC/IEEE, pages 1–6. IEEE, 2017. [2] Daniel J Bernstein. Cache-Timing Attacks on AES. 2005. [24] Georgios Keramidas, Alexandros Antonopoulos, Dimitrios N Serpanos, [3] Joseph Bonneau and Ilya Mironov. Cache-Collision Timing Attacks and Stefanos Kaxiras. Non Deterministic Caches: A Simple and against AES. In International Workshop on Cryptographic Hardware Effective Defense against Side Channel Attacks. Design Automation and Embedded Systems, pages 201–215. Springer, 2006. for Embedded Systems, 12(3):221–230, 2008. [25] Vladimir Kiriansky, Ilia Lebedev, Saman Amarasinghe, Srinivas De- [4] Thomas Bourgeat, Ilia Lebedev, Andrew Wright, Sizhuo Zhang, vadas, and Joel Emer. DAWG: A Defense Against Cache Timing Arvind, and Srinivas Devadas. MI6: Secure Enclaves in a Speculative Attacks in Speculative Execution Processors. Out-of-Order Processor. arXiv preprint arXiv:1812.09822, 2018. [26] Paul Kocher, Daniel Genkin, Daniel Gruss, Werner Haas, Mike [5] INTEL CORP. Speculative Store Bypass Bug CVE, 2018. CVE 2018- Hamburg, Moritz Lipp, Stefan Mangard, Thomas Prescher, Michael 3639. https://cve.mitre.org/cgi-bin/cvename.cgi?name= Schwarz, and Yuval Yarom. Spectre Attacks: Exploiting Speculative CVE-2018-3639, accessed online. Execution. ArXiv e-prints, January 2018. [6] Victor Costan, Ilia A Lebedev, and Srinivas Devadas. Sanctum: Mini- [27] Jingfei Kong, Onur Aciiçmez, Jean-Pierre Seifert, and Huiyang Zhou. mal Hardware Extensions for Strong Software Isolation. In USENIX Hardware-Software Integrated Approaches to Defend against Software Security Symposium, pages 857–874, 2016. Cache-Based Side Channel Attacks. In 2009 IEEE 15th International [7] Stephen Crane, Andrei Homescu, Stefan Brunthaler, Per Larsen, and Symposium on High Performance Computer Architecture, pages 393– Michael Franz. Thwarting Cache Side-Channel Attacks Through Dy- 404. IEEE, 2009. namic Software Diversity. In NDSS, pages 8–11, 2015. [28] Boris Köpf, Laurent Mauborgne, and Martín Ochoa. Automatic Quan- [8] Joan Daemen and Vincent Rijmen. AES Proposal: Rijndael. 1999. tification of Cache Side-Channels. In International Conference on Computer Aided Verification, pages 564–580. Springer, 2012. [9] Sanjeev Das, Jan Werner, Manos Antonakakis, Michalis Polychronakis, and Fabian Monrose. SoK: The Challenges, Pitfalls, and Perils of Using [29] Esmaeil Mohammadian Koruyeh, Khaled N Khasawneh, Chengyu Hardware Performance Counters for Security. In Proceedings of 40th Song, and Nael Abu-Ghazaleh. Spectre Returns! Speculation Attacks IEEE Symposium on Security and Privacy (S&P’19), 2019. Using the Return Stack Buffer. In 12th {USENIX} Workshop on Offensive Technologies ({WOOT} 18), 2018. [10] John Demme, Robert Martin, Adam Waksman, and Simha Sethumad- havan. Side-Channel Vulnerability Factor: A Metric for Measuring [30] Ruby B Lee, Peter Kwan, John P McGregor, Jeffrey Dwoskin, and Information Leakage. In 2012 39th Annual International Symposium Zhenghong Wang. Architecture for Protecting Critical Secrets in Mi- on Computer Architecture (ISCA), pages 106–117. IEEE, 2012. croprocessors. In ACM SIGARCH Computer Architecture News, vol- ume 33, pages 2–13. IEEE Computer Society, 2005. [11] Shuwen Deng, Wenjie Xiong, and Jakub Szefer. Analysis of Secure Caches using a Three-Step Model for Timing-Based Attacks. Cryptol- [31] Moritz Lipp, Michael Schwarz, Daniel Gruss, Thomas Prescher, ogy ePrint Archive, Report 2019/167, 2019. https://eprint.iacr. Werner Haas, Stefan Mangard, Paul Kocher, Daniel Genkin, Yuval org/2019/167. Yarom, and Mike Hamburg. Meltdown. ArXiv e-prints, January 2018. [12] Shuwen Deng, Wenjie Xiong, and Jakub Szefer. Secure TLBs. In [32] Fangfei Liu, Qian Ge, Yuval Yarom, Frank Mckeen, Carlos Rozas, Proceedings of the 46th International Symposium on Computer Ar- Gernot Heiser, and Ruby B Lee. CATalyst: Defeating Last-Level Cache chitecture, ISCA ’19, pages 346–259, New York, NY, USA, 2019. Side Channel Attacks in Cloud Computing. In High Performance ACM. Computer Architecture (HPCA), 2016 IEEE International Symposium [13] Leonid Domnitser, Nael Abu-Ghazaleh, and Dmitry Ponomarev. A on, pages 406–418. IEEE, 2016. Predictive Model for Cache-Based Side Channels in Multicore and [33] Fangfei Liu and Ruby B Lee. Random Fill Cache Architecture. In Mi- Multithreaded Microprocessors. In International Conference on Math- croarchitecture (MICRO), 2014 47th Annual IEEE/ACM International ematical Methods, Models, and Architectures for Computer Network Symposium on, pages 203–215. IEEE, 2014. Security, pages 70–85. Springer, 2010. [34] Fangfei Liu, Yuval Yarom, Qian Ge, Gernot Heiser, and Ruby B Lee. [14] Leonid Domnitser, Aamer Jaleel, Jason Loew, Nael Abu-Ghazaleh, and Last-Level Cache Side-Channel Attacks are Practical. In 2015 IEEE Dmitry Ponomarev. Non-Monopolizable Caches: Low-Complexity Symposium on Security and Privacy, pages 605–622. IEEE, 2015. Mitigation of Cache Side Channel Attacks. ACM Transactions on [35] Clémentine Maurice, Christoph Neumann, Olivier Heen, and Aurélien Architecture and Code Optimization (TACO), 8(4):35, 2012. Francillon. C5: Cross-Cores Cache Covert Channel. In International Conference on Detection of Intrusions and Malware, and Vulnerability [15] Ben Gras, Kaveh Razavi, Herbert Bos, and Cristiano Giuffrida. Trans- Assessment, pages 46–64. Springer, 2015. lation Leak-aside Buffer: Defeating Cache Side-channel Protections with TLB Attacks. In 27th {USENIX} Security Symposium ({USENIX} [36] Dag Arne Osvik, Adi Shamir, and Eran Tromer. Cache Attacks and Security 18), pages 955–972, 2018. Countermeasures: the Case of AES. In Cryptographers’ Track at the RSA Conference, pages 1–20. Springer, 2006. [16] Daniel Gruss, Clémentine Maurice, Klaus Wagner, and Stefan Mangard. [37] Colin Percival. Cache Missing for Fun and Profit, 2005. Flush+ Flush: a Fast and Stealthy Cache Attack. In International Conference on Detection of Intrusions and Malware, and Vulnerability [38] Moinuddin K Qureshi. CEASER: Mitigating Conflict-Based Cache Assessment, pages 279–299. Springer, 2016. Attacks via Encrypted-Address and Remapping. In 2018 51st Annual IEEE/ACM International Symposium on (MICRO), [17] Daniel Gruss, Raphael Spreitzer, and Stefan Mangard. Cache Template pages 775–787. IEEE, 2018. Attacks: Automating Attacks on Inclusive Last-Level Caches. In USENIX Security Symposium, pages 897–912, 2015. [39] Majid Sabbagh, Yunsi Fei, Thomas Wahl, and A Adam Ding. SCADET: A Side-Channel Attack Detection Tool for Tracking Prime-Probe. In [18] Roberto Guanciale, Hamed Nemati, Christoph Baumann, and Mads 2018 IEEE/ACM International Conference on Computer-Aided Design Dam. Cache Storage Channels: Alias-Driven Attacks and Verified (ICCAD), pages 1–8. IEEE, 2018. Countermeasures. In Security and Privacy (SP), 2016 IEEE Symposium [40] Michael Schwarz, Martin Schwarzl, Moritz Lipp, and Daniel Gruss. on, pages 38–55. IEEE, 2016. Netspectre: Read Arbitrary Memory over Network. arXiv preprint [19] David Gullasch, Endre Bangerter, and Stephan Krenn. Cache Games– arXiv:1807.10535, 2018. Bringing Access-Based Cache Attacks on AES to Practice. In Security [41] Caroline Trippel, Daniel Lustig, and Margaret Martonosi. Melt- and Privacy (SP), 2011 IEEE Symposium on, pages 490–505. IEEE, downprime and Spectreprime: Automatically-Synthesized Attacks 2011. Exploiting Invalidation-Based Coherence Protocols. arXiv preprint [20] Zecheng He and Ruby B Lee. How Secure is Your Cache against Side- arXiv:1802.03802, 2018. Channel Attacks? In International Symposium on Microarchitecture [42] Shuai Wang, Pei Wang, Xiao Liu, Danfeng Zhang, and Dinghao Wu. (MICRO), pages 341–353. ACM, 2017. CacheD: Identifying Cache-Based Timing Channels in Production [21] Ralf Hund, Carsten Willems, and Thorsten Holz. Practical Timing Software. In 26th {USENIX} Security Symposium ({USENIX} Security Side Channel Attacks Against Kernel Space ASLR. In 2013 IEEE 17), pages 235–252, 2017. Symposium on Security and Privacy, pages 191–205. IEEE, 2013. [43] Yao Wang, Andrew Ferraiuolo, Danfeng Zhang, Andrew C Myers, [22] Gorka Irazoqui, Thomas Eisenbarth, and Berk Sunar. Cross Processor and G Edward Suh. SecDCP: Secure Dynamic Cache Partitioning for Cache Attacks. In Proceedings of the 11th ACM on Asia conference on Efficient Timing Channel Protection. In Design Automation Conference computer and communications security, pages 353–364. ACM, 2016. (DAC), 2016 53nd ACM/EDAC/IEEE, pages 1–6. IEEE, 2016.

12 REFERENCES REFERENCES

[44] Zhenghong Wang and Ruby B Lee. New Cache Designs for Thwarting [51] Mengjia Yan, Read Sprabery, Bhargava Gopireddy, Christopher Software Cache-Based Side Channel Attacks. In ACM SIGARCH Fletcher, Roy Campbell, and Josep Torrellas. Attack Directories, Not Computer Architecture News, volume 35, pages 494–505. ACM, 2007. Caches: Side Channel Attacks in a Non-inclusive World. In Attack [45] Zhenghong Wang and Ruby B Lee. A Novel Cache Architecture Directories, Not Caches: Side Channel Attacks in a Non-Inclusive with Enhanced Performance and Security. In Microarchitecture, 2008. World, page 0. IEEE, 2019. MICRO-41. 2008 41st IEEE/ACM International Symposium on, pages 83–93. IEEE, 2008. [52] Yuval Yarom and Katrina Falkner. FLUSH+ RELOAD: A High Resolu- [46] Bernard L Welch. The Generalization of Student’s Problem When tion, Low Noise, L3 Cache Side-Channel Attack. In USENIX Security Several Different Population Variances are Involved. Biometrika, Symposium, pages 719–732, 2014. 34(1/2):28–35, 1947. [53] Danfeng Zhang, Aslan Askarov, and Andrew C Myers. Language- [47] Mario Werner, Thomas Unterluggauer, Lukas Giner, Michael Schwarz, Based Control and Mitigation of Timing Channels. ACM SIGPLAN Daniel Gruss, and Stefan Mangard. ScatterCache: Thwarting Cache Notices, 47(6):99–110, 2012. Attacks via Cache Set Randomization. In 28th USENIX Security Symposium (USENIX Security 19), Santa Clara, CA, 2019. USENIX [54] Danfeng Zhang, Yao Wang, G Edward Suh, and Andrew C Myers. A Association. Hardware Design Language for Timing-Sensitive Information-Flow [48] Wenjie Xiong and Jakub Szefer. Leaking Information Through Cache Security. In ACM SIGARCH Computer Architecture News, volume 43, LRU States. arXiv preprint arXiv:1905.08348, 2019. pages 503–516. ACM, 2015. [49] Mengjia Yan, Jiho Choi, Dimitrios Skarlatos, Adam Morrison, Christo- pher Fletcher, and Josep Torrellas. InvisiSpec: Making Speculative [55] Tianwei Zhang and Ruby B Lee. New Models of Cache Architec- Execution Invisible in the Cache Hierarchy. In 2018 51st Annual tures Characterizing Information Leakage from Cache Side Channels. IEEE/ACM International Symposium on Microarchitecture (MICRO), In Proceedings of the 30th Annual Computer Security Applications pages 428–441. IEEE, 2018. Conference, pages 96–105. ACM, 2014. [50] Mengjia Yan, Bhargava Gopireddy, Thomas Shull, and Josep Torrellas. Secure Hierarchy-Aware Cache Replacement Policy (SHARP): De- [56] Tianwei Zhang, Fangfei Liu, Si Chen, and Ruby B Lee. Side Channel fending Against Cache-Based Side Channel Attacks. In Proceedings of Vulnerability Metrics: the Promise and the Pitfalls. In Proceedings the 44th Annual International Symposium on Computer Architecture, of the 2nd International Workshop on Hardware and Architectural pages 347–360. ACM, 2017. Support for Security and Privacy, page 2. ACM, 2013.

13