Prime+Probe 1, JavaScript 0: Overcoming Browser-based Side-Channel Defenses

Anatoly Shusterman (Ben-Gurion Univ. of the Negev), Ayush Agarwal (University of Michigan), Sioli O’Connell (University of Adelaide), Daniel Genkin (University of Michigan), Yossi Oren (Ben-Gurion Univ. of the Negev), Yuval Yarom (University of Adelaide and Data61)

Abstract

The “eternal war in cache” has reached browsers, with multiple cache-based side-channel attacks and countermeasures being suggested. A common approach for countermeasures is to disable or restrict JavaScript features deemed essential for carrying out attacks.

To assess the effectiveness of this approach, in this work we seek to identify those JavaScript features which are essential for carrying out a cache-based attack. We develop a sequence of attacks with progressively decreasing dependency on JavaScript features, culminating in the first browser-based side-channel attack which is constructed entirely from Cascading Style Sheets (CSS) and HTML, and works even when script execution is completely blocked. We then show that avoiding JavaScript features makes our techniques architecturally agnostic, resulting in microarchitectural website fingerprinting attacks that work across hardware platforms including Intel Core, AMD Ryzen, Samsung Exynos, and Apple M1 architectures.

As a final contribution, we evaluate our techniques in hardened browser environments including the Tor browser, DeterFox (Cao et al., CCS 2017), and Chrome Zero (Schwarz et al., NDSS 2018). We confirm that none of these approaches completely defend against our attacks. We further argue that the protections of Chrome Zero need to be more comprehensively applied, and that the performance and user experience of Chrome Zero will be severely degraded if this approach is taken.

1 Introduction

The rise in the importance of the web browser in modern society has been accompanied by an increase in the sensitivity of the information the browser processes. Consequently, browsers have become targets of attacks aiming to extract or gain control of users’ private information. Beyond attacks that target software vulnerabilities and attacks that attempt to profile the device or the user via sensor APIs, browsers have also been used as a platform for mounting microarchitectural side-channel attacks [22], which recover secrets by measuring the contention on microarchitectural CPU components.

While traditionally such attacks were implemented using native code [7, 29, 49, 58, 60, 79, 80], recent works have demonstrated that JavaScript code in browsers can also be used to launch such attacks [24, 30, 57, 69]. In an attempt to mitigate JavaScript-based side-channel leakage, browser vendors have mainly focused on restricting the ability of an attacker to precisely measure time [15, 16, 84].

Side-channel attackers, in turn, attempt to get around these restrictions by creating makeshift timers with varying accuracies through the exploitation of other browser APIs, such as message passing or multithreading [42, 66, 72]. More recently, Schwarz et al. [67] presented Chrome Zero, a Chrome extension that protects against JavaScript-based side-channels by blocking or restricting parts of the JavaScript API commonly used by side channel attackers, based on a user-selected protection policy. Going even further, DeterFox [14] aims to eliminate side-channel attacks by ensuring completely deterministic JavaScript execution, and NoScript [51] prevents JavaScript-based attacks by completely disabling JavaScript.

A common trend in these approaches is that they are symptomatic and fail to address the root cause of the leakage, namely, the sharing of microarchitectural resources. Instead, most approaches attempt to prevent leakage by modifying browser behavior, striking different balances between security and usability. Thus, we ask the following question.

What are the minimal features required for mounting microarchitectural side-channel attacks in browsers? Can attacks be mounted in highly-restricted browser environments, despite security-orientated API refinements?

Besides being influenced by defenses, microarchitectural attacks are also affected by an increased hardware diversification in consumer devices. While the market for high-end processors used to be dominated by Intel, the past few years have seen an increase in popularity of other alternatives, such as AMD’s Zen architecture, Samsung’s Exynos, and the recently launched Apple M1 cores.

Most microarchitectural attack techniques, however, are inherently dependent on the specifics of the underlying CPU hardware, and are typically demonstrated on Intel-based machines. While microarchitectural attacks on non-Intel hardware do exist [46, 85], these are also far from universal, and are also highly tailored to their respective hardware platforms. Thus, given the ever increasing microarchitectural diversification, we ask the following secondary question.

Can microarchitectural side-channel attacks become architecturally-agnostic? In particular, are there universal side channel attacks that can be mounted effectively across diverse architectures, without requiring hardware-dependent modifications?

Countermeasure                   Chrome Zero    Can Be      Technique                       External Requirements
                                 Policy Level   Bypassed?
None                             None           Yes         Cache Contention [24, 57, 69]   None
Reduced timer resolution         Medium         Yes         Sweep Counting [69]             None
No timers, no threads            Paranoid       Yes         DNS Racing                      Non-Cooperating DNS server
No timers, threads, or arrays    -              Yes         String and Sock                 Cooperating WebSockets server
JavaScript completely blocked    -              Yes         CSS Prime+Probe                 Cooperating DNS server

Table 1: Summary of results: Prime+Probe Attacks can be Mounted Despite Strict Countermeasures

1.1 Our Contribution

Tackling the first set of questions, in this paper we show that side channel attacks can be mounted in highly restricted browser environments, despite side-channel hardening of large portions of JavaScript’s timing and memory APIs. Moreover, we show that even if JavaScript is completely disabled, side-channel attacks are still possible, albeit with a lower accuracy. We thus argue that completely preventing side channels in today’s browsers is nearly impossible, with leakage prevention requiring more drastic design changes.

Next, tackling the second set of questions, we introduce architecturally-agnostic side channel techniques that can operate on highly diverse architectures from different vendors. Empirically evaluating this claim, we show side channel leakage from browser environments running on AMD, Apple, ARM and Intel architectures with virtually no hardware-specific modifications. Notably, to the best of our knowledge, this is the first side-channel attack on Apple’s M1 CPU.

Reducing Side Channel Requirements. We focus our investigation on website fingerprinting attacks [34]. In these attacks, an adversary attempts to breach the privacy of the victim by finding out the websites that the victim visits. While initially these attacks relied on network traffic analysis, several past works demonstrated that an attacker-controlled website running on the victim machine can determine the identity of other websites the victim visits [6, 39, 53, 57, 74].

To identify the set of JavaScript features required for cache attacks, we build on the work of [69]. We start from their website fingerprinting attacks and design a sequence of new attacks, each requiring progressively fewer JavaScript features. Our process of progressively reducing JavaScript features culminates in CSS Prime+Probe, which is a microarchitectural attack implemented solely in CSS and HTML, yet is capable of achieving a high accuracy even when JavaScript is completely disabled. To the best of our knowledge, this is the first microarchitectural attack with such minimal requirements.

Architecturally-Agnostic Side Channel Attacks. Next, we tackle the challenge of mounting side channel attacks across a large variety of computing architectures. We show that the reduced requirements of our techniques essentially make them architecturally-agnostic, allowing them to run on highly diverse architectures with little adaptation. Empirically demonstrating this, we evaluate our attacks on AMD’s Ryzen, Samsung’s Exynos and Apple’s M1 architectures. Ironically, we show that our attacks are sometimes more effective on these novel CPUs by Apple and Samsung compared to their well-explored Intel counterparts, presumably due to their simpler cache replacement policies.

Evaluating Existing Side Channel Protections. Having reduced the requirements for mounting side channel attacks in browser contexts, we tackle the question of evaluating the security guarantees offered by existing API hardening techniques. To that aim, we deploy Chrome Zero [67] and measure the attack accuracy in the presence of multiple security policies. We show that while disabling or modifying JavaScript features does attenuate published attacks, it does little to block attacks that do not require the disabled features.

As a secondary contribution, we find that there are several gaps in the protection offered by Chrome Zero, and that fixing those adversely affects Chrome Zero’s usability and performance. This raises questions on the applicability of the approach suggested in [67] for protecting browsers.

Attacking Hardened Browsers. Having shown the efficacy of our techniques in both Chrome and Chrome Zero environments, we also evaluate our attacks on several popular security-oriented browsers, such as the Tor Browser [71] and DeterFox [14]. Here, we show that attacks are still possible, albeit at lower accuracy levels.

Summary of Contribution. In summary, in this paper we make the following contributions:
• We design three cache-based side-channel attacks on browsers, under progressively more restrictive assumptions. In particular, we demonstrate the first side-channel attack in a browser that does not rely on JavaScript or any other mobile code (Section 3).
• We empirically demonstrate architecturally-agnostic side channel attacks, showing the first techniques that can handle diverse architectures with little adaptation (Section 3.5).
• We re-evaluate the JavaScript API-hardening approach taken by Chrome Zero, demonstrating significant limitations that affect security, usability, and performance (Section 5).
• We evaluate our attacks in multiple scenarios, including in the restrictive environments of the Tor Browser and DeterFox (Section 6).

1.2 Responsible Disclosure

Following the practice of responsible disclosure, we have shared a draft of this paper with the product security teams of Intel, AMD, Apple, Chrome and Mozilla prior to publication.

2 Background

2.1 Microarchitectural Attacks

To improve performance, modern processors typically exploit the locality principle, which notes the tendency of software to reuse the same set of resources within a short period of time. Utilizing this, the processor maintains state that describes past program behavior, and uses it for predicting future behavior.

Microarchitectural Side Channels. The shared use of a processor, therefore, creates the opportunity for information leakage between programs or security domains [22]. Leakage could be via shared state [3, 32, 44, 80] or via contention on either the limited state storage space [27, 49, 58, 60] or the bandwidth of microarchitectural components [2, 10, 82]. Exploiting this leakage, multiple side-channel attacks have been presented, extracting cryptographic keys [2, 10, 11, 25, 32, 49, 58, 60, 65, 80, 82], monitoring user behavior [29, 33, 57, 64, 69], and extracting other secret information [7, 36, 79]. Side-channel attacks were shown to allow leaking between processes [32, 49, 58, 60, 80], web browser tabs [24, 57, 69], virtual machines [37, 49, 80, 86], and other security boundaries [7, 18, 36, 44]. In this work we are mostly interested in the two attack techniques that target the limited storage in caching elements, mainly data caches.

Prime+Probe. The Prime+Probe attack [49, 58, 60] exploits the set-associative structure in modern caches. The attacker first creates an eviction set, which consists of multiple memory locations that map to a single cache set. The attacker then primes the cache by accessing the locations in the eviction set, filling the cache set with their contents. Finally, the attacker probes the cache by measuring the access time to the eviction set. A long access time indicates that the victim has accessed memory locations that map to the same cache set, evicting part of the attacker’s data, and therefore teaches the attacker about the victim’s activity.

Cache Occupancy. In the cache occupancy attack [54, 69], the attacker repeatedly accesses a cache-sized buffer while measuring the access time. Because the buffer consumes the entire cache, the access time to the buffer correlates with the victim’s memory activity. The cache occupancy attack is simpler than Prime+Probe, and provides the attacker with less detailed spatial and temporal information. It is also less sensitive to the clock resolution [69]. Sweep counting is a variant of the cache occupancy attack, in which the adversary counts the number of times that the buffer can be accessed between two clock ticks. The main advantage of this technique is that it can work with even lower-resolution clocks.

2.2 Defenses

The root cause of microarchitectural side-channels is the sharing of microarchitectural components across code executing in different protection domains. Hence, partitioning the state, either spatially or temporally, can be effective in preventing attacks [23]. Partitioning can be done in hardware [19, 77] or by the operating system [40, 45, 50, 68].

Fuzzing or reducing the resolution of the clock are often suggested as a countermeasure [16, 35, 73, 84]. However, these approaches are less effective against the cache occupancy attack, as it does not require high-resolution timers. Furthermore, these approaches only introduce uncorrelated noise to the channel and do not prevent leakage [17].

Randomizing the cache architecture is another commonly suggested countermeasure [61, 77, 78]. These often aim to prevent eviction set creation. However, they are less effective against the cache occupancy attack, both because the attack does not require eviction sets and because these techniques do not change the overall cache pressure.

2.3 The JavaScript Types and Inheritance

JavaScript Typing. JavaScript is an object oriented language where every value is an object, excluding several basic primitive types. For object typing, JavaScript mostly uses “duck typing”, where an object is considered to have a required type as soon as it has the expected methods or properties. JavaScript deviates from this model for some built-in types, such as TypedArrays, which are arrays of primitive types. While JavaScript code mostly uses these built-in types equivalently to objects, the JavaScript engine itself provides certain APIs that match the arguments against the required built-in types, raising exceptions if they mismatch.

JavaScript Inheritance. JavaScript uses a prototypal inheritance model, where each object can have a single prototype object. When searching for a property of an object, JavaScript first checks the object itself. If the property is not found on the object, JavaScript proceeds to check its prototype, until it either finds the property or reaches an object that has no prototype. The list of prototypes used in this search is called the object’s prototype chain. Finally, when JavaScript modifies an object property, the prototype chain is not consulted. Instead, JavaScript sets the property on the object itself, creating it if it does not already exist.

2.4 Virtual Machine Layering

Virtual machine layering [43] is a low overhead technique for implementing function call interception. To intercept calls to a particular function, the function is overwritten with a new function, in effect intercepting calls to the original function. To partially override the behavior of the original function, a reference to the original function is stored, and the desired behavior is delegated to it if needed. To prevent external access to the original intercepted function, a JavaScript closure is used to store this reference. JavaScript closures create new variable scopes, preventing code outside the closure from accessing references stored within the closure.

Virtual machine layering offers a significant advantage over other techniques for guaranteeing that all calls to a given JavaScript function are intercepted. This is because virtual machine layering changes the definition of the function directly, automatically supporting the interception of function calls from code generated at runtime.

3 Overcoming Browser-based Defenses

In this section we present several novel browser-based side-channel techniques that are effective against increasing levels of browser defenses. More specifically, we present a series of attacks that progressively require fewer JavaScript features, culminating in CSS Prime+Probe, an attack that does not use JavaScript at all and can work when JavaScript is completely disabled. To the best of our knowledge, this is the first side-channel attack implemented solely with HTML and CSS, without the need of JavaScript.

We evaluate the effectiveness of our techniques via website fingerprinting attacks in the Chrome browser, which aim to recover pages currently open on the target’s machine. Beyond demonstrating accurate fingerprinting levels against the Chrome browser, we show that our attacks are highly portable, and are effective across several different micro-architectures: Intel x86, AMD Ryzen, Samsung Exynos 2100 (ARM), and finally Apple M1.

3.1 Methodology and Experimental Setup

We follow the methodology of Shusterman et al. [69], where we collect memorygrams, or traces of cache use over the web site load time. We use these traces to train a deep neural network model, which is then used to identify web sites based on the corresponding memorygrams. Similarly to [69], we measure cache activity using both the cache occupancy and sweep counting methods (described below). Both of these methods measure the overall level of cache contention, obviating the need to construct eviction sets. Finally, we adapt both techniques to progressively more restrictive environments. The specific assumptions on attackers’ capabilities appear in the respective sections (Sections 3.2 to 3.4).

The Cache Occupancy Channel. To measure the web page’s cache activity, we follow past works [54, 69] and use the cache occupancy channel. Specifically, we allocate an LLC-sized buffer and measure the time to access the entire buffer. The victim’s access to memory evicts the contents of our buffer from the cache, introducing delays for our access. Thus, the time to access our buffer is roughly proportional to the number of cache lines that the victim uses.

Compared with the Prime+Probe attack, the cache occupancy channel does not provide any spatial information, meaning that the attacker does not learn any information about the addresses accessed by the victim. Thus, it is less appropriate for detailed cryptanalytic attacks which need to track the victim at the resolution of a single cache set. However, the cache occupancy attack is simpler than Prime+Probe and in particular avoids the need to construct eviction sets. It also requires less accurate temporal information, on the order of milliseconds instead of nanoseconds. Thus, cache occupancy attacks are better suited to restricted environments, such as those considered in this section.

Sweep Counting. Sweep counting [69] is a variant of the basic cache occupancy attack, with reduced temporal resolution. Here, rather than timing the traversal of a cache-sized buffer, the attacker counts the number of sweeps across the buffer that fit within a time unit. While providing even less accuracy than cache occupancy, sweep counting remains effective when used with low-resolution timing sources (e.g., hundreds of milliseconds). Just like the cache occupancy attack, sweep counting does not provide any spatial resolution.

Closed World Evaluation. Using the channels we describe above, we collect memorygrams of visits to the Alexa Top 100 websites. We visit each site 100 times, each time collecting a memorygram that spans 30 seconds. We then evaluate the accuracy of our techniques in the closed-world model, where an adversary knows the list of 100 websites and attempts to guess which one is visited. Here, the base accuracy rate of a random guess is 1%, with any higher accuracy indicating the presence of side-channel leakage in the collected traces.

Evaluated Architectures. We demonstrate the attacks described in this section on several different architectures made by multiple hardware vendors. For Intel, we use several machines featuring an Intel Core i5-3470 CPU that has a 6 MiB last-level cache and 20 GiB memory. The machines are running Windows 10 with Chrome version 78, and are connected via Ethernet to a university network. Next, for AMD, we used six machines equipped with an AMD Ryzen 9 3900X 12-Core Processor, which has a 4x16 MiB last-level cache and 64 GiB memory. These machines were running Ubuntu 20.04 server with Chrome version 88.0, and were connected via Ethernet to a cloud provider network. For our ARM evaluation we used five Samsung Galaxy S21 5G mobile phones (SM-G991B), featuring an ARM-based Exynos 2100 CPU with an 8 MiB last-level cache and 8 GiB memory. These phones were running Android 11 with Chrome 88 and were connected via Wi-Fi to a university network. Finally, for our evaluation on Apple, we used four Apple Mac Mini machines equipped with an Apple M1 CPU with a 12 MiB last-level cache for performance cores and 4 MiB for efficiency cores. The machines were equipped with 16 GiB memory and were running macOS Big Sur version 11.1 together with Chrome 88.0 for arm64. These machines were connected via Ethernet to a university network.

Machine Learning Methodology. As a classifier we use a deep neural network model, with 10-fold cross validation. See Appendix A for details. Following previous works [12, 55], we report both the most likely prediction of the classifier and the top 5 predictions, noting that the base accuracy for the top 5 results is 5% for the closed-world scenarios, and 34% for the open world. The collected data volume of all the experiments is 27 GiB consisting of 40 datasets, where each dataset takes about one week to collect, and each classifier takes on average 30 minutes to train on a cluster of Nvidia GTX1080 and GTX2080 GPUs.

3.2 DNS Racing

For our first attack, DNS Racing, we assume a hypothetical JavaScript engine that does not provide any timer, neither through an explicit interface nor via repurposing JavaScript features such as multithreading [42, 66].

DNS-based Time Measurement. Ogen et al. [56] observe that browsers behave very predictably when attempting to load a resource from a non-existent domain, waiting for exactly one network round-trip before returning an error. Thus, it is possible to create an external timer by setting the onerror handler on an image whose URL points to a non-existent domain. We evaluate this timer with a local DNS server and with a remote Cloudflare DNS server, using both Ethernet and Wi-Fi connections. The results, depicted in Figure 1, show that all the timers are fairly stable, with little jitter.

[Figure 1 plot: probability density (0 to 0.1) against latency (0 to 100 ms) for four series: Local DNS over Ethernet, Local DNS over WiFi, Cloudflare DNS over Ethernet, Cloudflare DNS over WiFi.]
Figure 1: Measured response latencies when loading an image from a non-existent domain (local server).

For an Ethernet connection to a local DNS server, the timer resolution is about 2 ms, which Shusterman et al. [69] report is high enough for the basic cache occupancy channel. A local server over Wi-Fi gives a resolution of about 9 ms, and the Cloudflare server provides a resolution of roughly 70 ms, for both Ethernet and Wi-Fi. While these resolutions are unlikely to be suitable for the basic cache occupancy attack, Shusterman et al. [69] show that sweep counting works well with the 100 ms timer of the Tor Browser.

Exploiting DNS for Cache Attacks. Figure 2a shows how to use the DNS response as a timer. As illustrated in the figure, the attacker first sets the src attribute of an image to a non-existent domain, causing the operating system to access a remote DNS server for address resolution. The attacker then starts the cache probe operation, creating a race between the probe and the asynchronous report of the DNS error. When the asynchronous error handling function is called after name resolution fails, the attacker can determine whether the cache probing operation was faster or slower than the network round-trip time. Alternatively, when the DNS round-trip time is large, the attacker can repeat the probe step, counting the number of probes before the DNS error is reported. We note that the attack generates a large number of DNS requests. Such anomalous traffic may be detected by intrusion detection systems and blocked by the firewall.

3.3 String and Sock

Another common feature of most microarchitectural attacks in browsers, including our DNS racing attack, is the use of arrays [24, 28, 47]. Consequently, the use of arrays is often assumed essential for performing cache attacks in browsers, and suggested countermeasures aim for hardening arrays against side channels, while maintaining their functionality [67]. To refute this assumption, in this section we investigate a weaker attack model, in which the attacker cannot use JavaScript arrays and similar data structures.

Exploiting Strings. Instead of using JavaScript arrays, our String and Sock attack uses operations on long HTML strings. Specifically, we initialize a very long string variable covering the entire cache. Then, to perform a cache contention measurement, we use the standard JavaScript indexOf() function to search for a short substring in this long text. We make sure that the substring we search for does not appear within the long string, thus ensuring that the search scans all of the long string. Because the length of the long string is the same as the size of the LLC, the scan effectively probes the cache without using any JavaScript array object. To measure the duration of this probe operation, we take advantage of an external WebSockets [21] server controlled by the attacker.

Socket-Based Time Measurement. Figure 2b shows how the String and Sock method operates. The attacker first sends a short packet to a cooperating WebSockets server. Next, the
attacker performs a string search operation which is known to fail. As this search scans the entire string before failing, it has the side effect of probing the entire LLC cache. Finally, the attacker sends a second short packet to the cooperating WebSockets server. The server calculates the timing difference between the first and second packets, arriving at an estimate of the time taken to probe the cache.

[Figure 2 diagrams, each showing the web page on the target exchanging messages with a server. (a) DNS Racing, with an innocent DNS server: Resolve Non-Existent Domain, Probe Cache, NXDOMAIN Err. (b) String and Sock, with a malicious WebSocket server: Send Short Packet (Log Start Time), Search in String, Send Short Packet (Log End Time). (c) CSS Prime+Probe, with a malicious DNS server: Resolve Domain (Log Start Time), Search in String, Resolve Domain (Log End Time).]
Figure 2: Interaction diagrams for attacks.

String and Sock in Chrome. We find that Chrome allocates three bytes for each character. As we would like our string to occupy the machine’s entire last level cache, we allocate different string lengths for each architecture considered in this paper. In particular, we use 2 MiB strings for our Intel machines that feature a 6 MiB LLC, 3 MiB strings for our AMD machines (4x16 MiB LLCs), 1.5 MiB strings for our Samsung phones (8 MiB LLC), and 2 MiB strings for our Apple machines (12 MiB LLCs on performance cores). We also note that Chrome caches results of recent searches. To [...]

3.4 CSS Prime+Probe

Our final attack, CSS Prime+Probe, targets an even more restricted setting, in which the browser does not support JavaScript or any other scripting language, for example due to the NoScript extension [51]. CSS Prime+Probe only uses plain HTML and Cascading Style Sheets (CSS) to perform a cache occupancy attack, without using JavaScript at all.

CSS Prime+Probe Overview. At a high level, CSS Prime+Probe builds on the String-and-Sock approach, and like it relies on string search for cache contention and an attacker-controlled server for timing, see Figure 2c. Here, the attacker first includes in the CSS an element from an attacker-controlled domain, forcing DNS resolution. The malicious DNS server logs the time of the incoming DNS request. The attacker then designs an HTML page that evokes a string search from CSS, effectively probing the cache. This string search is followed by a request for a CSS element that requires DNS resolution from the malicious server. Finally, the time difference between consecutive DNS requests corresponds to the time it takes to perform the string search, which as described above is a proxy for cache contention.

CSS Prime+Probe Implementation. Figure 3 shows a code snippet implementing CSS Prime+Probe, using CSS Attribute Selectors to perform the attack. Specifically, Line 9 defines a div with a very long class name (two million characters). This div contains a large number of other divs, each with its own ID (Lines 10–12). The page also defines a style for each of these internal divs (Lines 3–5). Each of these matches the IDs of the internal and external div, and uses an attribute selector that searches for a substring in the external div. If not found, the style rule sets the background image of the element to some URL at an attacker-controlled domain.

[Figure 3 listing (14 numbered lines): style rules on Lines 3–5, the outer div with the long class name on Line 9, and inner divs containing the text "X" on Lines 10–12, with "[...]" standing for the omitted ones. The markup itself is not recoverable from this copy.]
Figure 3: Simplified version of CSS-based Prime+Probe.

When rendering the page, the browser first tries to render the first internal div. For that, it performs a long search in the class name, effectively probing the cache occupancy. Having not found the substring, it sets the background image of the div, resulting in sending a request to the attacker’s DNS server. The browser then proceeds to the next internal div. As a result of rendering this page, the browser sends to the attacker a sequence of DNS requests, whose timing depends on the cache contention.

                   Top-1 Accuracy (%)                              Top-5 Accuracy (%)
Attack Technique   Intel     AMD Ryzen 9   Apple   Samsung         Intel     AMD Ryzen 9   Apple   Samsung
                   i5-3470   3900X         M1      Exynos 2100     i5-3470   3900X         M1      Exynos 2100
Cache Occupancy    87.5      69.1          89.7    84.5            97.0      91.4          97.8    95.3
Sweep Counting     45.8      54.9          90.5    69.7            74.3      82.9          98.1    91.5
DNS Racing         50.8      5.4           48.2    5.8             78.5      16.3          83.5    37.1
String and Sock    72.0      53.9          90.6    60.2            90.6      85.5          97.9    85.5
CSS Prime+Probe    50.1      -             15.7    -               78.6      -             32.6    -

Table 2: Closed-world accuracy (percent) across different microarchitectures.
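
The page layout described for Figure 3 can be illustrated with a small generator. This is a hypothetical sketch rather than the authors' code: the function name buildCssProbePage, the element IDs (pp, s0, s1, ...), the needle format, and the domain attacker.example are all assumptions made for illustration. Only the overall shape follows the text: an outer div with a long class name, inner divs with their own IDs, and one style rule per inner div whose attribute selector searches the class name for a substring that is absent, so that the background-image request is issued.

```javascript
// Hypothetical sketch: builds an HTML page shaped like the one described for Figure 3.
// Every name here (buildCssProbePage, "pp", "s<i>", attacker.example) is an assumption.
function buildCssProbePage({ probes, classLength, domain }) {
  // The class name is one repeated letter, while each needle contains a digit,
  // so no needle can occur in it and each substring search must scan it fully.
  const longClass = "A".repeat(classLength);
  const rules = [];
  const divs = [];
  for (let i = 0; i < probes; i++) {
    const needle = `n${i}`;
    // A distinct host name per rule, so each rule triggers its own DNS lookup.
    const host = `p${i}.${domain}`;
    rules.push(
      `#pp:not([class*='${needle}']) #s${i} ` +
        `{background-image: url('https://${host}/');}`
    );
    divs.push(`<div id="s${i}">X</div>`);
  }
  return [
    "<html><head><style>",
    ...rules,
    "</style></head><body>",
    `<div id="pp" class="${longClass}">`,
    ...divs,
    "</div>",
    "</body></html>",
  ].join("\n");
}
```

In this sketch the class length and the number of inner divs are parameters; the text above reports a class name of two million characters, and the number of inner divs would determine how many timing samples one page load yields.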

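The String and Sock measurement of Section 3.3 (short packet, failing string search, second short packet, server-side time difference) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: makeHaystack, stringProbe, and probeDurations are hypothetical names, the send callback stands in for a WebSocket connection to the cooperating server, and the packet contents "start" and "end" are invented. Whether a given engine's string representation makes the search touch the whole cache is not something this sketch establishes.

```javascript
// Hypothetical sketch of the String and Sock measurement loop.
// `send` stands in for sending on a WebSocket to the cooperating server.
function makeHaystack(length) {
  // A long string of one repeated character; the needle used below never occurs in it.
  return "a".repeat(length);
}

function stringProbe(send, haystack, needle) {
  send("start"); // first short packet: the server logs its arrival time
  const index = haystack.indexOf(needle); // failing search scans the whole string
  send("end"); // second short packet: the server logs its arrival time
  return index; // -1 when the needle is absent, which is the intended case
}

// Server side: convert logged arrival times (in ms) of start/end packet pairs
// into one probe duration per pair.
function probeDurations(arrivalTimesMs) {
  const durations = [];
  for (let i = 0; i + 1 < arrivalTimesMs.length; i += 2) {
    durations.push(arrivalTimesMs[i + 1] - arrivalTimesMs[i]);
  }
  return durations;
}
```

Repeating stringProbe and collecting the resulting durations on the server would give one trace point per probe, in the spirit of the memorygrams described in Section 3.1.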
Attack Technique    Intel i5-3470    AMD Ryzen 9 3900X    Apple M1    Samsung Exynos 2100
Cache Occupancy     2.9 ms           6.0 ms               6.3 ms      4.0 ms
Sweep Counting      100.0 ms         100.0 ms             100.0 ms    100.0 ms
DNS Racing          20.3 ms          1.8 ms               7.2 ms      2.9 ms
String and Sock     1.5 ms           2.9 ms               2.6 ms      2.5 ms
CSS Prime+Probe     0.3 ms           6.7 ms               0.3 ms      33.8 ms

Table 3: Temporal accuracy of attack techniques across different microarchitectures.

3.5 Empirical Results

We now present the classification results of the attacks described in this section across different CPU architectures. Table 2 summarizes the accuracy of the most likely prediction of the classifier (Top-1), as well as the likelihood that the correct answer is one of the top 5 results (Top-5). Finally, Table 3 shows the temporal resolution of each measurement method, calculated as the time it takes to capture the entire trace, divided by the number of points in the trace.

Cache Occupancy. This method uses JavaScript code both to iterate over the eviction buffer and to measure time. The JavaScript code iterates over the buffer using the technique of Osvik et al. [58] to avoid triggering the prefetcher, and is written to prevent speculative reordering from triggering the timing measurement before the eviction is completed. As can be seen from the results, this approach provides good accuracy on all of the targets we evaluated, obtaining a top-5 accuracy of over 90% across all platforms.

Sweep Counting. This method is designed for situations with lower clock resolution, but still uses JavaScript both for cache eviction and for timing measurement. As the results show, this added limitation translates to a loss in accuracy for most targets, with the Apple M1 target the least affected by the reduced timer resolution.

DNS Racing. This method uses JavaScript for cache eviction, but switches to the network for timing measurements. This added limitation translates to a loss in accuracy for most targets, largely due to the added jitter of the network. The targets most severely affected by the added jitter were the ARM-based mobile phones, which were connected to the network using a wireless link, and the AMD devices, which were located in a third-party data center whose network conditions were beyond our direct control. We hypothesize that these networking circumstances led to jitter related to DNS responses, causing the severe loss of accuracy for these targets.

String and Sock. This is the first method which repurposes the browser’s string-handling code for cache eviction. Unlike the adversary-controlled code used for mounting the cache occupancy attack described earlier, this third-party code naturally makes no attempt to trick the processor’s cache management heuristics, and, as such, we expected it to have lower performance than the JavaScript-based code.

As we see, this was indeed the case for the Intel, AMD and Samsung targets. The Apple M1 target, on the other hand, did not encounter a loss in accuracy. It seems that, on this target, naïvely accessing a large block of memory is an efficient way to evict the cache, and more advanced approaches for tricking the processor’s prefetcher are not necessary.

CSS Prime+Probe. As CSS Prime+Probe requires no JavaScript, we test this attack in the presence of the NoScript [51] extension, applying the countermeasure only to our attacker website. As our attack does not use JavaScript at all, NoScript does nothing to prevent it. The accuracy we obtained using this attack was comparable to the one obtained by the String and Sock attack, showing that there is no need for JavaScript, or any other mobile code, to mount a successful side-channel attack.

When running this attack on the Intel target, the accuracy is similar to that of DNS racing, which uses JavaScript for cache evictions. On the M1 target, there was still a significant amount of data leaked by the attack, but the accuracy was lower than that of the DNS racing attack. On the ARM and AMD targets, we are presently unable to extract any meaningful data using this method. As our CSS Prime+Probe also relies on DNS packets, we conjecture that this is due to the network conditions of the devices under test, or due to particular aspects of the micro-architecture of these devices which make cache eviction less reliable.

Architectural Agnosticism. As the results show, we were able to mount our side-channel attack across a large variety of diverse computing architectures. In particular, the Intel, AMD, ARM and Apple target architectures all incorporate different design decisions concerning cache sizes, cache coherency protocols and cache replacement policies, as well as related CPU front-end features such as the prefetcher. The reduced requirements of our attack made it immediately applicable to all of these targets, with little to no tuning of the attack’s parameters, and without the need for per-device microarchitectural reverse engineering.

Attacking Apple’s M1 Architecture. To the best of our knowledge, this is the first side-channel attack on Apple’s M1 CPU. The memory and cache subsystem of this new architecture have never been studied in detail, leading one to hope for a “grace period” where attackers will find this target difficult to conquer. As this work shows, the novelty and obscurity of this new target do little to protect it from side-channel attacks. The M1 processor is rumored to toggle between two completely different memory ordering mechanisms, based on the program it is executing. Another noteworthy outcome from the M1 evaluation is that both the native arm64 binary of Chrome and the standard macOS Intel x64 Chrome binary running under emulation were vulnerable to the attacks we described here.

Finally, observing Table 2, it can be seen that our attacks are, somewhat ironically, more effective on the M1 architecture than they are on other architectures, including the relatively well studied Intel architecture. Intel x86 CPUs are known to have advanced cache replacement and prefetcher policies, which have been shown in other works to anticipate and mitigate the effect of large memory workloads on cache performance [8, 62, 76]. We hypothesize that the M1 architecture makes use of less advanced cache heuristics, and that, as a result, the simplistic memory sweeps our attack performs are more capable of flushing the entire cache on these devices than they are on the Intel architecture. This in turn results in a higher signal-to-noise ratio for the attack on these newer targets, and therefore in a higher overall accuracy.

4 Attack Scenarios

We now turn our focus to a deeper investigation of the two new attacks we present, String and Sock and CSS Prime+Probe, on the Intel targets. Table 4 provides a summary of the results discussed in this section.

4.1 Closed World Evaluation on Newer Intel Architectures

We begin by reproducing the closed world methodology and the results of Section 3, albeit on a newer Intel processor. Specifically, we perform the experiments on an Apple MacBook Pro featuring an Intel Core i5-7267 CPU with a 4 MiB last-level cache and 16 GiB memory, running macOS 10.15 and Chrome version 81. Despite the microarchitectural changes across 4 CPU generations and the different cache size, the results are very similar to those achieved on the older i5-3470 (72.0±1.3% for String and Sock and 50.1±2.3% for CSS Prime+Probe), with the difference being well inside the statistical confidence levels. We thus argue that our results transfer across a variety of Intel architectures.

4.2 Open-World Evaluation

A common criticism of closed-world evaluations is that the attacker is assumed to know the complete set of websites the victim might visit, allowing the attacker to prepare and train classifiers for these websites [38]. For a more realistic scenario, we follow the methodology proposed by Panchenko et al. [59] and perform an open-world evaluation, collecting 5000 traces of different websites used in [63], in addition to the Alexa Top 100 websites collected in the closed-world setting. We use the same data collection setting as for the closed-world collection. (See Section 4.1.)

In this setting, the attacker’s goal is first to detect whether the victim visits one of the Alexa Top 100 sites, and second to identify the website if it is indeed in the list. We note that in this case, a naive classifier can always claim that the site is not one of the Alexa Top 100, achieving a base rate of 30%, resulting in slightly higher accuracy scores for any classifier.

In this open-world setting, the String and Sock and CSS Prime+Probe attacks obtain accuracy results of 80% and 61%, respectively. The data in this setting is unbalanced: there are more traces from “other” web sites than from each of the Alexa Top 100 sites. For such data, the F1 score may be more representative than accuracy. The F1 scores are 67% and 45% for String and Sock and CSS Prime+Probe, respectively. These are similar to those of the closed-world setting (70% and 48%). We can therefore conclude that our attacks are as effective in the open-world as in the closed-world setting.
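To illustrate why the F1 score can be more informative than accuracy on unbalanced open-world data, the following is a minimal sketch on made-up labels. The function names, the tiny data set, and the choice of macro-averaging are assumptions made for illustration only; they are not the evaluation code or the exact F1 variant used in this work.

```javascript
// Toy illustration with made-up labels: accuracy vs. macro-averaged F1
// on an unbalanced open-world label set.

// Fraction of traces whose predicted label matches the true label.
function accuracy(truth, predicted) {
  let hits = 0;
  for (let i = 0; i < truth.length; i++) {
    if (truth[i] === predicted[i]) hits++;
  }
  return hits / truth.length;
}

// Per-class F1 = 2*TP / (2*TP + FP + FN), averaged over the true classes.
function macroF1(truth, predicted) {
  const labels = [...new Set(truth)];
  let sum = 0;
  for (const label of labels) {
    let tp = 0, fp = 0, fn = 0;
    for (let i = 0; i < truth.length; i++) {
      const isTrue = truth[i] === label;
      const isPred = predicted[i] === label;
      if (isTrue && isPred) tp++;
      else if (isPred) fp++;
      else if (isTrue) fn++;
    }
    const denom = 2 * tp + fp + fn;
    sum += denom === 0 ? 0 : (2 * tp) / denom;
  }
  return sum / labels.length;
}

// Hypothetical data: 6 "other" traces and 2 traces for each of two monitored sites.
const truth = ["other", "other", "other", "other", "other", "other",
               "siteA", "siteA", "siteB", "siteB"];
// Naive classifier: always answer "other".
const naive = truth.map(() => "other");

const naiveAccuracy = accuracy(truth, naive);
const naiveF1 = macroF1(truth, naive);
console.log(naiveAccuracy, naiveF1);
```

On this toy input the always-“other” classifier scores a seemingly respectable accuracy while its macro-averaged F1 stays low, because it never identifies a monitored site.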

Attack Scenario     String and Sock    CSS Prime+Probe
Closed World        74.5±1.6           48.8±1.6
Open World          80.2±1.1           60.9±1.4
Artificial Jitter   40.6±1.9           26.6±1.4
Tor Browser         19.5±8.7           -
DeterFox            -                  65.7±1.2

Table 4: Attack accuracy (%) with 95% confidence intervals.

4.3 Robustness to Jitter

As DNS racing, String and Sock, and CSS Prime+Probe use an external server for time measurement, these techniques are inherently sensitive to jitter naturally present on the network between the victim and the web server.

Measuring Network Jitter. We measure the network jitter in two scenarios. First, we perform a local measurement, where the target and an attacker-controlled WebSockets server are located on the same institutional network at Ben Gurion University, Israel. Next, we also perform an inter-continental measurement, where the attacker is located in Israel, while the server is located in the United States (University of Michigan). Figure 5 shows the distribution of the jitter observed while sending 100 packets per second for 30 seconds to the WebSockets servers. We find that the jitter in the local network has a standard deviation of 0.17 ms, whereas the jitter to the cross-continent server has a standard deviation of 0.78 ms.

Figure 4: Attack classifiers performance with additional jitter. Panels: (a) String and Sock, (b) CSS Prime+Probe, (c) DNS Racing (note different scale). Axes: added jitter (msec) against Top-1 and Top-5 accuracy.

Figure 5: Measured Jitter of the WebSockets server response. Curves: Local LAN Server and Cross-Continent Server. Axes: jitter (ms) against probability density.

Evaluating Robustness to Jitter. Having established the typical jitter between the target and the external server, we now evaluate the robustness of our techniques to various levels of jitter. To that aim, we artificially inject different amounts of jitter to the closed-world dataset of Section 4.1. The jitter is injected by adding random noise to the timing of the monitored events. This noise is selected at random from a normal distribution with a mean of zero and a standard deviation that varies from 1 to 25 milliseconds, with higher standard deviation corresponding to larger jitter.

As Figure 4 shows, both the String and Sock and the CSS Prime+Probe attacks still retain most of their accuracy even if the jitter is an order of magnitude larger than the ones we measured on a real network. We finally note that the DNS Racing attack is more sensitive to added jitter, as it relies on a binary race condition to determine timing.

5 Analysis of an API-based Defense

Having established the efficacy of our techniques on various microarchitectures, in this section we evaluate our attacks in the presence of increasing levels of browser hardening. To that aim, we make use of Chrome Zero [67], a Chrome extension that supports per-website restrictions on JavaScript browser API features. We begin by presenting an overview of Chrome Zero’s JavaScript implementation and security objectives, focusing on a subset of Chrome Zero’s features which are relevant to this work. We next describe how we modified Chrome Zero to offer more comprehensive protection, at the cost of usability and performance. Finally, we show that even with these modifications, Chrome Zero is unable to offer side channel protections against the techniques presented in this paper. Unless stated otherwise, we use the current version at Chrome Zero’s Git repository.*

*https://github.com/IAIK/ChromeZero commit fee8adc6c8fce9dd1ab62d7ff8f0697b44a188c1

5.1 Chrome Zero Overview

Chrome Zero implements a list-based access control policy, which dictates actions to be taken when a website invokes a JavaScript function or accesses an object property. When an access is detected, Chrome Zero either allows the access, modifies it, or completely blocks the access based on the policy chosen for the particular website.† Chrome Zero also supports the option of asking the user about the action to take.

Default Policies. Chrome Zero offers five preset protection policies for the user to choose from: None, Low, Medium, High, and Paranoid.‡ As it progresses through protection policy levels, Chrome Zero makes increasingly severe restrictions on JavaScript capabilities and resources, including blocking them altogether. Table 5 summarizes which capabilities and resources are available at each protection level.

†Chrome Zero currently only supports a global protection policy that can be changed but applies to all websites.
‡The Chrome Zero extension uses the name “Tin Foil Hat” for Paranoid. We stick to the naming in Schwarz et al. [67].

Policy Level          Low            Medium                     High                       Paranoid
Memory Addresses      Buffer ASLR    Array preloading           Non-deterministic array    Array index randomization
Timer manipulation    Ask User       Low-resolution timestamp   Fuzzy time                 Disabled
Multithreading        -              Message delay              WebWorker polyfill         Disabled
Shared Array Buffer   -              Slow SharedArrayBuffer     Disabled                   Disabled
Sensor API            -              Ask User                   Fixed Value                Disabled

Table 5: Defense techniques used in each Chrome Zero Policy Level.

Performance. Schwarz et al. [67] claim that Chrome Zero blocks all of the building blocks required for successful side-channel attacks, including high resolution timers, arrays and access to hardware sensors. Moreover, they claim that Chrome Zero prevents many known CVEs and 50 percent of zero-day exploits published since Chrome 49. Finally, Schwarz et al. [67] benchmark Chrome Zero’s performance and perform a usability study. They claim that Chrome Zero has an average overhead of 1.82% at the second-highest protection level (High) and that its presence is indistinguishable to users in 24 of Alexa’s Top 25 websites.

Figure 6: High-level concept of Chrome Zero. (Diagram labels: Client JavaScript code, Benign JavaScript, Malicious JavaScript, Chrome Zero, Benign JavaScript, JavaScript engine (V8).)

Chrome Zero’s Access Control Implementation. To enforce security policies, Chrome Zero intercepts JavaScript API calls using Virtual Machine Layering. Specifically, Chrome Zero is implemented as JavaScript code that is injected into a web page upon initialization. This injected code wraps sensitive API functions, having the wrappers implement actions specified by Chrome Zero’s policy. Chrome Zero uses closures to ensure that the wrapper contains the only reference to the original API functions, thus ensuring that websites do not trivially bypass its protection (Figure 6).

Protecting Timers. Traditionally, microarchitectural side-channel attacks rely on having access to a high-resolution timer, e.g. to distinguish cache hits from cache misses. This includes attacks implemented in native code [3, 27, 29, 31, 49, 58, 60, 80, 82] as well as attacks in JavaScript running inside the browser [24, 26, 57, 66]. As a countermeasure for such attacks, Chrome’s current implementation of performance.now() already reduces timer resolution from nanoseconds to microseconds and introduces a small amount of jitter. Although these mitigations protect against some high-resolution attacks [26, 57, 66], microsecond-accurate timers still provide sufficient resolution for other side-channel attacks from within JavaScript [28, 30, 66, 70, 72].

To block attacks that exploit microsecond-accurate timers, Chrome Zero employs two main strategies. At its Medium protection policy, Chrome Zero applies a “rounded floor” function, matching the 100 ms resolution of the Tor Browser. While this already prevents many attacks [66], higher resolution timers may still be constructed [42, 66, 72]. Thus, at higher protection levels, instead of using a simple “rounded floor” 100 ms timer, Chrome Zero follows the approach of Vattikonda et al. [73] and fuzzes the timer measurements by adding random microsecond-level noise. Finally, at its highest protection level, Chrome Zero disables timers altogether.

Arrays. Schwarz et al. [67] identify that many side-channel attacks in browsers [24, 26, 28, 30, 57, 66] require some information about memory addresses. Typically, recovering the page offset (least significant 12 or 21 bits of the address) facilitates the attacks. Using this information the attacker then analyzes the victim’s behavior, deducing information about its control flow and internal data. Chrome Zero therefore applies several mitigations to JavaScript array APIs.

More specifically, Chrome Zero’s second-highest protection level introduces array non-determinism, adding an access to a random element for each array access. The idea is that the random accesses themselves force page faults, impeding the use of page faults as signals for page boundaries. Schwarz et al. [67] argue that this method prevents eviction set construction [24, 30, 57, 66, 81], as it interferes with the specific sequences required to construct an eviction set, while adding noise to the timing information.

Next, Chrome Zero further deploys the buffer ASLR policy, which shifts the entire buffer by a random offset. This is achieved by intercepting the array constructors and access methods. To prevent page alignment, Chrome Zero increases the requested array size by 4 KiB, and associates a random page offset with the array. On array access, Chrome Zero adds the random offset to the requested array index, thereby shifting the access by the random offset.

Finally, to protect the offset from being discovered, Chrome Zero attempts to use the additional accesses to random elements to pre-load all the array’s memory pages into the cache, thus preventing attackers from detecting page boundaries by looking for array elements which have an increased access time due to page faults.

Protecting Against Browser Exploits. While not being a primary goal of Chrome Zero, Schwarz et al. [67] argue that Chrome Zero is also capable of protecting users against some browser exploits. To validate their claim, they reproduced 12 CVEs in the then-current Chrome JavaScript engine, and found that Chrome Zero prevents exploiting half of the CVEs. Schwarz et al. [67] attribute this protection to the modification of JavaScript objects in Chrome Zero, which breaks the CVE exploit code.

    let secureArray = new Array(10);                                    // wrapped by Chrome Zero
    let secureTimer = performance.now();                                // wrapped by Chrome Zero

    let insecureArray = new secureArray.__proto__.constructor(10);      // original Array via the prototype
    let insecureTimer = performance.__proto__.now.call(performance);    // original timer via the prototype

Figure 8: Bypassing Chrome Zero defenses using prototypes.

5.2 API Coverage

As stated above, Chrome Zero is essentially an interception layer, which intercepts the critical JavaScript API calls and subsequently directs them to the appropriate logic based on the current website and protection policy. Thus, to guarantee security, it is critical to ensure that malicious JavaScript code cannot access the original API or otherwise bypass the Chrome Zero protections.

Our investigation of Chrome Zero demonstrated that API coverage in Chrome Zero leaves a lot to be desired. Specifically, we have identified multiple instances of APIs that are not protected by Chrome Zero. These include:
• Delayed Extension Initialization. The Chrome Zero extension initializes after the browser finishes constructing the Document Object Model (DOM) for the page. Consequently, Chrome Zero does not protect JavaScript objects created before the DOM is constructed.
• Missed Contexts. Chrome Zero only applies its security policies in the context of the topmost page in each browser tab. It does not, however, protect code in sub-contexts of the page, including worker threads and iframes.
• Unprotected Prototype Chains. As we discuss in Section 2.3, properties of global objects may be inherited from their prototypes. Yet, while Chrome Zero does protect global objects, it fails to protect their prototype chains, allowing attackers to access the original JavaScript API.

Exploitation. We have exploited each of those omissions and demonstrated complete bypass of Chrome Zero protections. In most cases, such bypasses are fairly trivial. As an example we show how we exploit unprotected prototype chains.

Figure 7: Object hierarchy with Chrome Zero.

Figure 7 shows the object hierarchy for Array with Chrome Zero (solid line) and without it (dotted line). The original unprotected Array class can be accessed using the Array constructor method of the prototype object. Figure 8 shows a bypass of Chrome Zero object protections, allowing the attacker to create original non-proxied JavaScript objects. Lines 1 and 2 show the standard ways of creating an array or getting the timer, both protected by Chrome Zero. In contrast, Lines 4 and 5 show how to use prototypes to achieve the same functionality, bypassing Chrome Zero.

Evaluating Chrome Zero’s CVE Protection. We also evaluate Chrome Zero’s claimed protection against browser exploits. We first reproduce the results of Schwarz et al. [67], finding that Chrome Zero prevents six of the 12 exploits they experiment with. We then extend the evaluation to CVEs reported after the Chrome Zero publication and find that Chrome Zero blocks four of the 17 exploits we managed to reproduce in Chrome. We then modify the exploits that Chrome Zero blocks to use APIs that Chrome Zero fails to protect, allowing the attacks to run unhindered.

We further note that Chrome Zero only protects incidental properties of the exploits rather than addressing the underlying vulnerabilities. Specifically, we can easily modify many of the blocked exploits to avoid using features that Chrome Zero protects. For the four exploits we cannot modify to bypass Chrome Zero, we find that the cause is that the use of protected typed arrays prevents Chrome from compiling Web Assembly [75, “read the imports”]. Since the Web Assembly compiler is not invoked, the browser remains protected.

5.3 Fixing and Re-evaluating Chrome Zero

Chrome Zero’s failure to protect all of the JavaScript API has implications beyond security. Unprotected objects do not affect the usability or the performance of the browser. To evaluate the impact of the approach on usability and performance, we fix Chrome Zero to improve its API coverage. Specifically, we set Chrome Zero to initialize before any other script executes and to also apply to frames. We further modify Chrome Zero to apply its interception to protected objects and all the objects in their prototype chain. We do not protect Web Workers, hence our analysis below may still understate the impact on usability and performance. We further remove bypasses of array protections that apply to some hard-coded websites. Specifically, Chrome Zero does not apply some array protections to YouTube and to Google Maps.§

§We note that without the bypass, YouTube does not play videos. We could not find any indication of this bypass in Schwarz et al. [67], which we find odd given the use of YouTube in the usability evaluation. The Chrome Zero source code claims that the bypass is due to a bug in Chrome; however, our root cause analysis shows that YouTube fails to play videos due to the type mismatch we discuss in this section.

Finally, Schwarz et al. [67] argue that Chrome Zero offers no noticeable impact on user experience while only having a negligible performance cost. We test this claim with and without our security fixes.

Experimental Setup. We use a ThinkPad P50 featuring an Intel Core i7-6820HQ CPU, with 16 GiB of memory, running Ubuntu version 18.04, with a Chrome 80 browser without any extensions. We evaluate usability on Alexa’s Top 25 USA websites, checking for discernible differences in behavior.

Usability Results. We first replicate the results of Schwarz et al. [67], finding that an unmodified Chrome Zero has no discernible impact on the usability of websites. However, after fixing the issues identified in Section 5, we observe a significant impact on the usability of websites. Even when setting Chrome Zero to the Low policy, less than half of the websites function without noticeable problems. At a higher protection level, High, only the websites for Wikipedia and eBay function properly.

Strict Type Checking. Investigating the difference in website usability between the original and modified Chrome Zero, we find that forcing Chrome Zero to apply its policies before document loading results in type mismatch exceptions while loading many JavaScript-enabled web sites.

The cause of the issue is that as part of applying its policies, Chrome Zero replaces any JavaScript object it protects with a proxy that masquerades as the original object. Typically this does not cause any problems due to JavaScript’s use of “duck typing”, since replacing objects with the corresponding proxy objects is transparent to most JavaScript code, as long as the original object’s properties are all supported. However, the W3C standard [20] dictates strict type checking for many internal JavaScript functions, especially for typed array objects. In this case, passing a proxy object instead of the original object results in a type mismatch exception from the browser’s JavaScript engine, causing the website’s loading to fail.

Unfortunately, fixing this issue turns out to be a non-trivial problem, as a significant portion of the JavaScript environment is forced to strictly type check its inputs. This goes well beyond the member functions of TypedArrays and includes diverse JavaScript libraries, such as, for example, the Web Crypto and Web Socket APIs.

Estimating Performance Impact. While we do not claim to know an efficient method of automatically solving this problem for the entire JavaScript API, we can efficiently solve the issue for specific functions through manual intervention, allowing us to benchmark the result. While we acknowledge that this does not produce a secure or even correct implementation, we argue that it nonetheless allows us to measure a lower bound of the performance impact that any JavaScript Zero implementation must have. To that aim, we enumerate all of the functions used by the JetStream 1.1 benchmark, and manually implement fixes for functions that perform strict type checking. We note that only the set and subarray methods for typed arrays need to be fixed, while all other parts of the JavaScript environment can remain unaltered.

Benchmarking Performance. For performance benchmarks we first try to reproduce the results of Schwarz et al. [67]. We use the JetStream 1.1 benchmark to facilitate comparison with Schwarz et al. [67]. We find a slight performance impact of 1.54% when using an unmodified Chrome Zero. However, when ensuring that Chrome Zero applies its protections correctly and applying the minimum level of fixes for strict type checking, we observe a performance impact of 26% in the latency benchmarks and 98% in the throughput benchmarks.

5.4 Bypassing Non-Deterministic Arrays

With the exception of speculative execution attacks [9, 13, 41, 48], most microarchitectural side-channel attacks retrieve information about memory access patterns performed by the victim. For a language such as JavaScript with no notion of pointers or addresses, most attacks exploit the contiguous nature and predictable memory layout of arrays to reveal information about the least significant 12 or 21 bits of the addresses accessed by the victim [26, 30, 57, 66].

To prevent this leakage, Chrome Zero’s second-highest protection level introduces array non-determinism, performing a spurious access to a random array index whenever the script accesses an array element. Chrome Zero further deploys the buffer ASLR policy, which shifts the entire buffer by a random offset, thereby preventing the attacker from obtaining page-aligned buffers. The main idea is to use the random offset to prevent the attacker from finding the array elements located on page boundaries. To protect the offset from being discovered, Chrome Zero attempts to use the additional accesses to random elements in order to pre-load all the array’s memory pages into the cache, thus preventing the attacker from discovering the array elements which have an increased access time due to page faults.

We now show how we can reliably recover the array elements corresponding to page boundaries, despite Chrome Zero’s use of buffer ASLR, non-deterministic arrays, and fuzzy timers.

Array Implementation in Chrome. Unlike their C counterparts, JavaScript arrays are quite flexible and can be extended [5], shrunk [4] and even have their type changed [52] at run-time. While the W3C standards require browsers to support the extension and shrink APIs, the implementation of these capabilities is left entirely to the browser vendors.

In Chrome’s V8 JavaScript engine, whenever an array is initialized, V8 allocates the memory required for the array, along with additional memory to support insertion of more elements in O(1) amortized time. However, after the addition of enough elements, memory reallocation is eventually needed. Hence V8 allocates a new chunk of memory which is about 1.5× larger than the old one, and frees the old one after copying the array’s content to the new location. The formula used by V8 to determine the size of the new memory buffer is

\[ \mathit{new\_size} = \mathit{size} + (\mathit{size} \gg 1) + 16, \tag{1} \]

where $\gg$ is a bit-wise shift-right operation.

    let array = new Array();                    // array that is grown one element at a time
    let times = new Array();                    // measured duration of every push

    for (let i = 0; i < 10000000; i++) {
      let start = performance.now();
      array.push(0);
      let delta = performance.now() - start;
      times.push(delta);
    }

Figure 9: Measuring Array.push timings

Attack Methodology. We begin by measuring the timings of Array.push using the code presented in Figure 9. We start with an empty array array (Line 1). We then append data to the end of the array using the JavaScript Array.push method (Line 6). On every such element addition we measure the time taken to add an element (Lines 5 and 7). While most of these additions are fast, at the point where the memory allocated for the current size of array is exhausted, V8 performs additional work by allocating new memory using Equation 1 and copying the old content to the newly-allocated space.

Figure 10: Push timings with native Chrome (top), and with Chrome Zero at High level (bottom).

Figure 10 shows the insertion times for elements, using both a high resolution timer (top) and Chrome Zero’s fuzzy timer (bottom). As can be seen, some array insertions are slower than others. We verify that these additional time costs happened at a point where the buffer allocated by V8 to support the array array was exhausted, forcing V8 to allocate a new memory space using Equation 1.

Observing Figure 10, the time required to handle the element addition at the point of buffer exhaustion increases as the size of the array grows. This is expected as more elements need to be copied by V8 as the buffer grows. However, as the number of elements added to the array is attacker-controlled, we can make Array.push take an arbitrary amount of time. We exploit this property to mount an attack against Chrome Zero’s Buffer ASLR policy despite Chrome Zero’s attempts at reducing the resolution of JavaScript timers. More specifically, after a sufficient number of iterations of the loop in Line 4, the time taken to handle the re-allocation of array during the insertion of an additional element in Line 6 becomes visible despite Chrome Zero’s low resolution timer. To deduce the buffer’s offset generated by Chrome Zero, we apply Chrome Zero’s buffer ASLR policy to Equation 1 to obtain the following equation.

\[ \mathit{new\_size} + \mathit{offset} = (\mathit{size} + \mathit{offset}) + ((\mathit{size} + \mathit{offset}) \gg 1) + 16. \tag{2} \]

Observing the spikes in Figure 10, an attacker can detect when the memory of array is exhausted. From that, to recover the value of offset, we rearrange Equation 2, treating the shift as a division by two, as

\[ \mathit{offset} = 2 \times \mathit{new\_size} - 3 \times \mathit{size} - 2 \times 16, \tag{3} \]

where size and new_size are the sizes of array before and after resizing. Finally, to detect resizing events, an attacker can observe spikes in Figure 10. Thus, Chrome Zero’s buffer ASLR policy can be defeated using two sequential resizing events and applying Equation 3 to solve for offset.

5.5 Attacking Chrome Zero

We now present the classification results of the attacks described in Section 3 across different Chrome Zero policies, starting with the closed-world scenario. Table 6 summarizes the accuracy of our technique, using the Intel i5-3470 setup outlined in Section 3.1.

Cache Occupancy and Sweep Counting. As we can see, for the basic cache occupancy attack, Chrome Zero policies have varying impact on the attack accuracy. Low has some impact, but the accuracy is still high. Medium almost completely blocks the attack, with the accuracy being slightly more than the base rate. Surprisingly, High is less effective than the two lower policy levels, possibly because of its simpler code design, resulting only in a slight decrease in the accuracy compared to no protection at all. For the sweep counting attack, we see that the accuracy is lower than that of the basic cache occupancy channel. However, the Medium policy no longer breaks the attack. Furthermore, while lower than that of the cache occupancy attack, the accuracy is still significantly higher than the base rate. Finally, because these attacks require Worker threads, which are blocked in Paranoid, they both fail in this policy.
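Returning briefly to the offset recovery of Section 5.4, the arithmetic of Equations 1 to 3 can be checked with a small model. This is only a sketch under stated assumptions: the function names are hypothetical, sizes are counted in array elements, and buffer ASLR is modeled simply as a hidden number of extra slots added before the growth rule is applied. It does not reproduce the timing measurement itself, only the final algebraic step.

```javascript
// Growth rule of Equation 1: new_size = size + (size >> 1) + 16.
function grow(size) {
  return size + (size >> 1) + 16;
}

// Assumed model of buffer ASLR: the engine grows (size + offset),
// while the attacker only observes sizes with the hidden offset removed.
function observedResize(size, offset) {
  return grow(size + offset) - offset; // attacker-visible new_size (Equation 2)
}

// Equation 3: offset = 2 * new_size - 3 * size - 2 * 16.
function recoverOffset(size, newSize) {
  return 2 * newSize - 3 * size - 2 * 16;
}

// Example with a hypothetical hidden offset of 200 elements.
const hiddenOffset = 200;
const sizeBefore = 1000;
const sizeAfter = observedResize(sizeBefore, hiddenOffset);
const recovered = recoverOffset(sizeBefore, sizeAfter);
console.log(sizeAfter, recovered);
```

In this model the recovery is exact when size + offset is even. When it is odd, the right shift rounds down and Equation 3 can be off by one, which is why the sketch treats the shift as a division by two.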
                  Temporal     Top-1 Accuracy (%)                       Top-5 Accuracy (%)
Attack Technique  Resolution   None   Low    Medium  High   Paranoid    None   Low    Medium  High   Paranoid
Cache Occupancy   2.9 ms       87.5   71.1   2.2     81.8   N/A         97.0   87.4   6.1     96.5   N/A
Sweep Counting    100.0 ms     45.8   24.1   32.2    60.1   N/A         74.3   50.1   59.0    88.3   N/A
DNS Racing        20.3 ms      50.8   20.9   61.1    37.2   16.2        78.5   48.9   86.0    67.7   40.1
String and Sock   1.5 ms       72.0   51.3   46.2    58.4   59.9        90.6   80.0   75.9    85.3   82.8
CSS Prime+Probe   2.8 ms       50.1 (with the NoScript extension)       78.6 (with the NoScript extension)

Table 6: Closed-world accuracy (percent) with different API restriction levels (Intel i5-3470).

DNS Racing. The DNS Racing technique achieves a moderate accuracy in the range 20% to 61%. As expected for a technique that requires neither timers nor threads, the attack also works with the Paranoid policy.

String and Sock. The results with String and Sock tend to be better than those of DNS Racing. In fact, the results tend to be only slightly inferior to those of the cache occupancy attack, despite not requiring timers, arrays, or threads. We further observe that because the attack uses no protected API, the various Chrome Zero policies have only a marginal effect on attack success.

CSS Prime+Probe. As mentioned in Section 3.4, our CSS Prime+Probe technique does not require JavaScript and is effective even if the attacker's website is banned from executing any JavaScript code (e.g., due to the NoScript extension [51]). In particular, Chrome Zero's focus on JavaScript does not affect our CSS Prime+Probe technique, leaving CSS Prime+Probe completely unmitigated.

Discussion. Examining the results in Table 6, we see that restricting browser APIs such as threads, timers, and array access can thwart the standard Cache Occupancy and Sweep Counting attacks, and can significantly degrade the effectiveness of the DNS Racing attack. Nevertheless, the two remaining attacks, String and Sock and CSS Prime+Probe, are not affected by this browser-based countermeasure, since they do not use any API which is receiving protection. While there is some variation in accuracy between the different protection modes for String and Sock, this is likely due to the usability and site loading side-effects related to our fortified version of Chrome Zero, and not due to any intrinsic protection offered by the API-limiting approach. We thus argue that preventing side channels in today's browsers using API modifications is practically impossible. Properly preventing leakage would require a more systematic approach which considers the sources of leakage, and not merely the means for measuring it.

6 Attacking Hardened Browsers

Having established the feasibility of mounting cache side channel attacks while only having limited (or no) access to the JavaScript API, in this section we proceed to demonstrate the effectiveness of our techniques on two privacy enhanced browsers: Tor [71] and DeterFox [14].

6.1 Attacking the Tor Browser

The Tor Browser [71] is a highly-modified version of Firefox, designed to offer a high level of privacy even at the cost of usability and performance. At a high level, the Tor Browser combines two elements to achieve a higher level of protection compared to other browsers. First, it hides the user's browsing habits from network adversaries by using the Tor network as an underlying transport layer. Second, it provides a highly restrictive browser configuration, designed to limit or disable convenience features that may have a security impact. In the context of side channel attacks, the Tor Browser limits the resolution of the timer API to only 100 milliseconds.

In this section we evaluate our attack techniques from within the Tor Browser and demonstrate that they are possible even within this restricted environment. We note that Shusterman et al. [69] have already demonstrated the Sweep Counting attack in the Tor Browser. We extend that result, demonstrating that making the environment more restrictive by disabling JavaScript features does not guarantee protection.

Negative Result: DNS Racing and CSS Prime+Probe. We begin with a negative result, that the CSS Prime+Probe attack we designed is not effective in the Tor Browser. The cause is that for security reasons, the Tor Browser does not directly resolve DNS requests. Instead, it asks a Tor exit relay to resolve the name on its behalf. This extra redirection step adds a very large delay to DNS requests, on the order of hundreds of milliseconds, as well as a high degree of jitter, well beyond what the attack can handle. This issue also affects the DNS Racing attack, making it inapplicable.

Adapting String and Sock to Tor. The String and Sock technique described in Section 3.3 uses a high bandwidth WebSockets connection to offload timing measurements to a remote server. Unfortunately, due to the high round-trip delay of a Tor connection, the bandwidth available to a WebSockets connection over the Tor transport is significantly lower than that of a connection made over a regular TCP transport. Effectively the connection operates in a stop-and-wait mode, buffering outgoing packets as long as not all previously transmitted packets are acknowledged. This buffering removes the timing information that the attack needs.

To avoid buffering, we reduce the communication of our String and Sock attack by sending a probe packet only once every n sweeps over the cache, instead of after every sweep. We experimentally find that n = 72 provides the best accuracy.

Figure 11: String and Sock probe latency distribution on Tor Browser using an Intel i5-3470 target (6MB LLC). Axes: probe latency (ms), 0 to 300; probability density, 0 to 0.2.

Observing the Distribution of Probe Times. Figure 11 shows the probe time distribution using the Intel i5-3470 target. As the figure shows, there are three main elements to this distribution. First, we note a large subset of the probes have a fixed latency of around 120 ms. These are buffered by Tor's network layer, as described above, and sent immediately after all previously sent packets are acknowledged. Thus, these packets do not measure contention of the cache, but instead measure the round-trip delay of the Tor connection. Next, a large number of probes have a near-zero latency. These are packets which are sent together with other packets, and similarly do not encode any cache information. The final subset of the probes has a more diverse set of values, with an estimated mean of between 150 and 250 milliseconds. These probes encode cache contention information.

Website Fingerprinting. To demonstrate that these probes indeed contain cache information, we collect a dataset of 10,000 traces of Alexa Top 100 websites on the i5-3470 target running Tor Browser, using our adapted String and Sock method described above. Using this data, we can correctly fingerprint websites, obtaining a Top-1 accuracy of 20% and a Top-5 accuracy of 49%, well above the base rates of 1% and 5%, respectively. This demonstrates that completely eliminating access to timer and array APIs in the Tor Browser does not prevent cache attacks.

6.2 Attacking DeterFox

DeterFox is a Firefox fork aiming to provably prevent timing attacks from within browser executed code [14]. Its authors argue that when using DeterFox, "an observer in a JavaScript reference frame will always obtain the same fixed timing information, so that timing attacks are prevented". To achieve this, DeterFox splits its execution context into multiple deterministic reference frames, and uses a priority-based event queue for communication between these reference frames.

However, we note that our CSS Prime+Probe technique does not require any JavaScript, with the colluding DNS server providing time measurement remotely. Thus, our techniques effectively sidestep all of the side channel protections offered by DeterFox. To demonstrate the effectiveness of our attacks on DeterFox, we collect one more dataset of 10,000 traces of Alexa Top 100 websites, using the CSS Prime+Probe method while using DeterFox. As expected, DeterFox's provably secure deterministic timing countermeasure did not prevent our attack, giving us a Top-1 accuracy of 66% and a Top-5 accuracy of 88%.

7 Conclusion

This paper shows that defending against JavaScript-based side-channel attacks is more difficult than previously considered. We show that advanced variants of the cache contention attack allow Prime+Probe attacks to be mounted through the browser in extremely constrained situations. Cache attacks cannot be prevented by reduced timer resolution, by the abolition of timers, threads, or arrays, or even by completely disabling scripting support. This implies that any secret-bearing process which shares cache resources with a browser connecting to untrusted websites is potentially at risk of exposure.

We also show that the reduced requirements of our attack make it agnostic across a variety of microarchitectures with no modifications. This allows us to present the first end-to-end side-channel attack which targets Apple's new M1 processors.

So, how can security-conscious users access the web? One complicating factor is that the web browser makes use of additional shared resources beyond the cache, such as the operating system's DNS resolver, the GPU and the network interface. Cache partitioning seems a promising approach, either using spatial isolation based on cache coloring [40], or by OS-based temporal isolation [23].

Acknowledgements

This work was supported by the Air Force Office of Scientific Research (AFOSR) under award number FA9550-20-1-0425; an ARC Discovery Early Career Researcher Award (project number DE200101577); an ARC Discovery Project (project number DP210102670); the Defense Advanced Research Projects Agency (DARPA) and Air Force Research Laboratory (AFRL) under contracts FA8750-19-C-0531 and HR001120C0087; Israel Science Foundation grants 702/16 and 703/16; the National Science Foundation under grant CNS-1954712; the Research Center for Cyber Security at Tel-Aviv University established by the State of Israel, the Prime Minister's Office and Tel-Aviv University; and gifts from Intel and AMD.

The authors thank Jamil Shusterman for his assistance in bringing up the measurement setup.

References

[1] Martín Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S. Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Ian Goodfellow, Andrew Harp, Geoffrey Irving, Michael Isard, Yangqing Jia, Rafal Jozefowicz, Lukasz Kaiser, Manjunath Kudlur, Josh Levenberg, Dandelion Mané, Rajat Monga, Sherry Moore, Derek Murray, Chris Olah, Mike Schuster, Jonathon Shlens, Benoit Steiner, Ilya Sutskever, Kunal Talwar, Paul Tucker, Vincent Vanhoucke, Vijay Vasudevan, Fernanda Viégas, Oriol Vinyals, Pete Warden, Martin Wattenberg, Martin Wicke, Yuan Yu, and Xiaoqiang Zheng. TensorFlow: Large-scale machine learning on heterogeneous systems, 2015. URL https://www.tensorflow.org/. Software available from tensorflow.org.
[2] Onur Acıiçmez and Jean-Pierre Seifert. Cheap hardware parallelism implies cheap security. In FDTC. IEEE Computer Society, 2007.
[3] Onur Acıiçmez, Çetin Kaya Koç, and Jean-Pierre Seifert. Predicting secret keys via branch prediction. In CT-RSA, pages 225–242, 2007.
[4] Array.prototype.pop. Array.prototype.pop(). https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Array/pop, 2020.
[5] Array.prototype.push. Array.prototype.push(). https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Array/push, 2020.
[6] Jo M. Booth. Not so incognito: Exploiting resource-based side channels in JavaScript engines. Bachelor thesis, Harvard, April 2015.
[7] Ferdinand Brasser, Urs Müller, Alexandra Dmitrienko, Kari Kostiainen, Srdjan Capkun, and Ahmad-Reza Sadeghi. Software grand exposure: SGX cache attacks are practical. In WOOT, 2017.
[8] Samira Briongos, Pedro Malagón, José Manuel Moya, and Thomas Eisenbarth. Reload+Refresh: abusing cache replacement policies to perform stealthy cache attacks. In USENIX Security, pages 1967–1984, 2020.
[9] Jo Van Bulck, Marina Minkin, Ofir Weisse, Daniel Genkin, Baris Kasikci, Frank Piessens, Mark Silberstein, Thomas F. Wenisch, Yuval Yarom, and Raoul Strackx. Foreshadow: Extracting the keys to the Intel SGX kingdom with transient out-of-order execution. In USENIX Security, pages 991–1008, 2018.
[10] Alejandro Cabrera Aldaya, Billy Bob Brumley, Sohaib ul Hassan, Cesar Pereida García, and Nicola Tuveri. Port contention for fun and profit. In IEEE SP, pages 870–887, 2019.
[11] Alejandro Cabrera Aldaya, Cesar Pereida García, Luis Manuel Alvarez Tapia, and Billy Bob Brumley. Cache-timing attacks on RSA key generation. IACR Trans. Cryptogr. Hardw. Embed. Syst., 2019(4):213–242, 2019.
[12] Aylin Caliskan-Islam, Richard Harang, Andrew Liu, Arvind Narayanan, Clare Voss, Fabian Yamaguchi, and Rachel Greenstadt. De-anonymizing programmers via code stylometry. In USENIX Sec, pages 255–270, 2015.
[13] Claudio Canella, Jo Van Bulck, Michael Schwarz, Moritz Lipp, Benjamin von Berg, Philipp Ortner, Frank Piessens, Dmitry Evtyushkin, and Daniel Gruss. A systematic evaluation of transient execution attacks and defenses. In USENIX Security, pages 249–266, 2019.
[14] Yinzhi Cao, Zhanhao Chen, Song Li, and Shujiang Wu. Deterministic browser. In CCS, pages 163–178, 2017.
[15] Alex Christensen. Reduce resolution of performance.now. https://developer.mozilla.org/en-US/docs/Web/API/Performance/now, 2015.
[16] Chromium Project. window.performance.now does not support sub-millisecond precision on Windows. https://bugs.chromium.org/p/chromium/issues/detail?id=158234#c110, 2016.
[17] David Cock, Qian Ge, Toby C. Murray, and Gernot Heiser. The last mile: An empirical study of timing channels on seL4. In CCS, pages 570–581, 2014.
[18] Fergus Dall, Gabrielle De Micheli, Thomas Eisenbarth, Daniel Genkin, Nadia Heninger, Ahmad Moghimi, and Yuval Yarom. CacheQuote: Efficiently recovering long-term secrets of SGX EPID via cache attacks. IACR Trans. Cryptogr. Hardw. Embed. Syst., 2018(2):171–191, 2018.
[19] Leonid Domnitser, Aamer Jaleel, Jason Loew, Nael B. Abu-Ghazaleh, and Dmitry Ponomarev. Non-monopolizable caches: Low-complexity mitigation of cache side channel attacks. TACO, 8(4):35:1–35:21, 2012.
[20] ECMA International. ECMAScript 2016 language specification. https://www.ecma-international.org/ecma-262/7.0/index.html, 2016.
[21] I. Fette and A. Melnikov. The WebSocket protocol. RFC 6455, IETF, December 2011.
[22] Qian Ge, Yuval Yarom, David Cock, and Gernot Heiser. A survey of microarchitectural timing attacks and countermeasures on contemporary hardware. J. Cryptographic Engineering, 8(1):1–27, 2018.
[23] Qian Ge, Yuval Yarom, Tom Chothia, and Gernot Heiser. Time protection: The missing OS abstraction. In EuroSys, pages 1:1–1:17, 2019.
[24] Daniel Genkin, Lev Pachmanov, Eran Tromer, and Yuval Yarom. Drive-by key-extraction cache attacks from portable code. In ACNS, pages 83–102, 2018.
[25] Daniel Genkin, Romain Poussier, Rui Qi Sim, Yuval Yarom, and Yuanjing Zhao. Cache vs. key-dependency: Side channeling an implementation of Pilsung. IACR Trans. Cryptogr. Hardw. Embed. Syst., 2020(1):231–255, 2020.
[26] Ben Gras, Kaveh Razavi, Erik Bosman, Herbert Bos, and Cristiano Giuffrida. ASLR on the line: Practical cache attacks on the MMU. In NDSS, 2017.
[27] Ben Gras, Kaveh Razavi, Herbert Bos, and Cristiano Giuffrida. Translation leak-aside buffer: Defeating cache side-channel protections with TLB attacks. In USENIX Security, pages 955–972, 2018.
[28] Daniel Gruss, David Bidner, and Stefan Mangard. Practical memory deduplication attacks in sandboxed JavaScript. In ESORICS, pages 108–122, 2015.
[29] Daniel Gruss, Raphael Spreitzer, and Stefan Mangard. Cache template attacks: Automating attacks on inclusive last-level caches. In USENIX Security, pages 897–912, 2015.
[30] Daniel Gruss, Clémentine Maurice, and Stefan Mangard. Rowhammer.js: A remote software-induced fault attack in JavaScript. In DIMVA, pages 300–321, 2016.
[31] Daniel Gruss, Clémentine Maurice, Klaus Wagner, and Stefan Mangard. Flush+Flush: A fast and stealthy cache attack. In DIMVA, pages 279–299, 2016.
[32] David Gullasch, Endre Bangerter, and Stephan Krenn. Cache games – bringing access-based cache attacks on AES to practice. In IEEE SP, pages 490–505, 2011.
[33] Berk Gülmezoglu, Andreas Zankl, M. Caner Tol, Saad Islam, Thomas Eisenbarth, and Berk Sunar. Undermining user privacy on mobile devices using AI. In AsiaCCS, pages 214–227, 2019.
[34] Andrew Hintz. Fingerprinting websites using traffic analysis. In Privacy Enhancing Technologies, 2002.
[35] Wei-Ming Hu. Reducing timing channels with fuzzy time. In IEEE SP, pages 8–20, 1991.
[36] Ralf Hund, Carsten Willems, and Thorsten Holz. Practical timing side channel attacks against kernel space ASLR. In IEEE SP, pages 191–205, 2013.
[37] Mehmet Sinan Inci, Berk Gülmezoglu, Gorka Irazoqui, Thomas Eisenbarth, and Berk Sunar. Cache attacks enable bulk key recovery on the cloud. In CHES, pages 368–388, 2016.
[38] Marc Juárez, Sadia Afroz, Gunes Acar, Claudia Díaz, and Rachel Greenstadt. A critical evaluation of website fingerprinting attacks. In Gail-Joon Ahn, Moti Yung, and Ninghui Li, editors, CCS, pages 263–274, 2014.
[39] Hyungsub Kim, Sangho Lee, and Jong Kim. Inferring browser activity and status through remote monitoring of storage usage. In ACSAC, 2016.
[40] Taesoo Kim, Marcus Peinado, and Gloria Mainar-Ruiz. STEALTHMEM: system-level protection against cache-based side channel attacks in the cloud. In USENIX Security Symposium, pages 189–204. USENIX Association, 2012.
[41] Paul Kocher, Jann Horn, Anders Fogh, Daniel Genkin, Daniel Gruss, Werner Haas, Mike Hamburg, Moritz Lipp, Stefan Mangard, Thomas Prescher, Michael Schwarz, and Yuval Yarom. Spectre attacks: Exploiting speculative execution. In IEEE SP, pages 1–19, 2019.
[42] David Kohlbrenner and Hovav Shacham. Trusted browsers for uncertain times. In USENIX Sec, pages 463–480, 2016.
[43] Erick Lavoie, Bruno Dufour, and Marc Feeley. Portable and efficient run-time monitoring of JavaScript applications using virtual machine layering. In ECOOP 2014, pages 541–566, 2014.
[44] Sangho Lee, Ming-Wei Shih, Prasun Gera, Taesoo Kim, Hyesoon Kim, and Marcus Peinado. Inferring fine-grained control flow inside SGX enclaves with branch shadowing. In USENIX Security, pages 557–574, 2017.
[45] Jochen Liedtke, Hermann Härtig, and Michael Hohmuth. OS-controlled cache predictability for real-time systems. In RTAS, pages 213–224, 1997.
[46] Moritz Lipp, Daniel Gruss, Raphael Spreitzer, Clémentine Maurice, and Stefan Mangard. ARMageddon: Cache attacks on mobile devices. In USENIX Security, pages 549–564, 2016.
[47] Moritz Lipp, Daniel Gruss, Michael Schwarz, David Bidner, Clémentine Maurice, and Stefan Mangard. Practical keystroke timing attacks in sandboxed JavaScript. In ESORICS (2), pages 191–209, 2017.
[48] Moritz Lipp, Michael Schwarz, Daniel Gruss, Thomas Prescher, Werner Haas, Anders Fogh, Jann Horn, Stefan Mangard, Paul Kocher, Daniel Genkin, Yuval Yarom, and Mike Hamburg. Meltdown: Reading kernel memory from user space. In USENIX Security, pages 973–990, 2018.
[49] Fangfei Liu, Yuval Yarom, Qian Ge, Gernot Heiser, and Ruby B. Lee. Last-level cache side-channel attacks are practical. In IEEE SP, pages 605–622, 2015.
[50] Fangfei Liu, Qian Ge, Yuval Yarom, Frank McKeen, Carlos V. Rozas, Gernot Heiser, and Ruby B. Lee. CATalyst: Defeating last-level cache side channel attacks in cloud computing. In HPCA, pages 406–418, 2016.
[51] Giorgio Maone. NoScript. https://noscript.net.
[52] Mathias Bynens. Elements kinds in V8. https://v8.dev/blog/elements-kinds, 2017.
[53] Nikolay Matyunin, Yujue Wang, Tolga Arul, Kristian Kullmann, Jakub Szefer, and Stefan Katzenbeisser. Magneticspy: Exploiting magnetometer in mobile devices for website and application fingerprinting. In WPES, pages 135–149, 2019.
[54] Clémentine Maurice, Christoph Neumann, Olivier Heen, and Aurélien Francillon. C5: cross-cores cache covert channel. In DIMVA, pages 46–64, 2015.
[55] Arvind Narayanan, Hristo Paskov, Neil Zhenqiang Gong, John Bethencourt, Emil Stefanov, Eui Chul Richard Shin, and Dawn Song. On the feasibility of internet-scale author identification. In IEEE SP, pages 300–314, 2012.
[56] Rom Ogen, Kfir Zvi, Omer Shwartz, and Yossi Oren. Sensorless, permissionless information exfiltration with Wi-Fi micro-jamming. In WOOT, 2018.
[57] Yossef Oren, Vasileios P. Kemerlis, Simha Sethumadhavan, and Angelos D. Keromytis. The spy in the sandbox: Practical cache attacks in JavaScript and their implications. In CCS, pages 1406–1418, 2015.
[58] Dag Arne Osvik, Adi Shamir, and Eran Tromer. Cache attacks and countermeasures: The case of AES. In CT-RSA, pages 1–20, 2006.
[59] Andriy Panchenko, Lukas Niessen, Andreas Zinnen, and Thomas Engel. Website fingerprinting in onion routing based anonymization networks. In Yan Chen and Jaideep Vaidya, editors, WPES, pages 103–114, 2011.
[60] Colin Percival. Cache missing for fun and profit. In BSDCan 2005, 2005. URL http://css.csail.mit.edu/6.858/2014/readings/ht-cache.pdf.
[61] Moinuddin K. Qureshi. CEASER: mitigating conflict-based cache attacks via encrypted-address and remapping. In MICRO, pages 775–787, 2018.
[62] Moinuddin K. Qureshi, Aamer Jaleel, Yale N. Patt, Simon C. Steely Jr., and Joel S. Emer. Set-dueling-controlled adaptive insertion for high-performance caching. IEEE Micro, 28(1):91–98, 2008.
[63] Vera Rimmer, Davy Preuveneers, Marc Juárez, Tom van Goethem, and Wouter Joosen. Automated website fingerprinting through deep learning. In NDSS, 2018.
[64] Thomas Ristenpart, Eran Tromer, Hovav Shacham, and Stefan Savage. Hey, you, get off of my cloud: exploring information leakage in third-party compute clouds. In CCS, pages 199–212, 2009.
[65] Eyal Ronen, Robert Gillham, Daniel Genkin, Adi Shamir, David Wong, and Yuval Yarom. The 9 lives of Bleichenbacher's CAT: new cache attacks on TLS implementations. In IEEE SP, pages 435–452, 2019.
[66] Michael Schwarz, Clémentine Maurice, Daniel Gruss, and Stefan Mangard. Fantastic timers and where to find them: High-resolution microarchitectural attacks in JavaScript. In Financial Cryptography and Data Security, pages 247–267, 2017.
[67] Michael Schwarz, Moritz Lipp, and Daniel Gruss. JavaScript Zero: Real JavaScript and zero side-channel attacks. In NDSS, 2018.
[68] Jicheng Shi, Xiang Song, Haibo Chen, and Binyu Zang. Limiting cache-based side-channel in multi-tenant cloud using dynamic page coloring. In DSN Workshops, pages 194–199. IEEE Computer Society, 2011.
[69] Anatoly Shusterman, Lachlan Kang, Yarden Haskal, Yosef Meltser, Prateek Mittal, Yossi Oren, and Yuval Yarom. Robust website fingerprinting through the cache occupancy channel. In USENIX Security, pages 639–656, 2019.
[70] Paul Stone. Pixel perfect timing attacks with HTML5. https://www.contextis.com/media/downloads/Pixel_Perfect_Timing_Attacks_with_HTML5_Whitepaper.pdf, 2013.
[71] The Tor Project, Inc. The Tor Browser. https://www.torproject.org/projects/torbrowser.html.en.
[72] Tom Van Goethem, Wouter Joosen, and Nick Nikiforakis. The clock is still ticking: Timing attacks in the modern web. In ACSAC, pages 1382–1393, 2015.
[73] Bhanu C. Vattikonda, Sambit Das, and Hovav Shacham. Eliminating fine grained timers in Xen. In CCSW, pages 41–46, 2011.
[74] Pepe Vila and Boris Köpf. Loophole: Timing attacks on shared event loops in Chrome. In USENIX Sec, pages 849–864, 2017.
[75] W3C. WebAssembly JavaScript interface. https://webassembly.github.io/spec/js-api/index.html, 2020.
[76] Daimeng Wang, Zhiyun Qian, Nael B. Abu-Ghazaleh, and Srikanth V. Krishnamurthy. PAPP: prefetcher-aware prime and probe side-channel attack. In DAC, page 62, 2019.
[77] Zhenghong Wang and Ruby B. Lee. New cache designs for thwarting software cache-based side channel attacks. In ISCA, pages 494–505, 2007.
[78] Mario Werner, Thomas Unterluggauer, Lukas Giner, Michael Schwarz, Daniel Gruss, and Stefan Mangard. ScatterCache: Thwarting cache attacks via cache set randomization. In USENIX Security, pages 675–692, 2019.
[79] Mengjia Yan, Christopher W. Fletcher, and Josep Torrellas. Cache telepathy: Leveraging shared resource attacks to learn DNN architectures. In USENIX Security, 2020.
[80] Yuval Yarom and Katrina Falkner. Flush+Reload: a high resolution, low noise, L3 cache side-channel attack. In USENIX Security, pages 719–732, 2014.
[81] Yuval Yarom, Qian Ge, Fangfei Liu, Ruby B. Lee, and Gernot Heiser. Mapping the Intel last-level cache. IACR Cryptology ePrint Archive 2015/905, 2015.
[82] Yuval Yarom, Daniel Genkin, and Nadia Heninger. CacheBleed: A timing attack on OpenSSL constant time RSA. In CHES, pages 346–367, 2016.
[83] Andy B. Yoo, Morris A. Jette, and Mark Grondona. SLURM: Simple Linux utility for resource management. In Dror Feitelson, Larry Rudolph, and Uwe Schwiegelshohn, editors, Job Scheduling Strategies for Parallel Processing, pages 44–60. Springer Berlin Heidelberg, 2003.
[84] Boris Zbarsky. Clamp the resolution of performance.now() calls to 5us. https://hg.mozilla.org/integration/mozilla-inbound/rev/48ae8b5e62ab, 2015.
[85] Xiaokuan Zhang, Yuan Xiao, and Yinqian Zhang. Return-oriented Flush-Reload side channels on ARM and their implications for android devices. In CCS, pages 858–870, 2016.
[86] Yinqian Zhang, Ari Juels, Michael K. Reiter, and Thomas Ristenpart. Cross-VM side channels and their use to extract private keys. In CCS, pages 305–316, 2012.

A Machine Learning Model

Our machine learning classifier receives as input a side-channel trace, and outputs a probability distribution over the 100 potential websites. Before the trace is fed to the model, the input vector is normalized between 0 and 1. We then use a deep learning network to perform our analysis, meaning that feature extraction is done inside the neural network and does not require additional preprocessing steps. We use the deep learning model whose hyperparameters are presented in Table 7. The model begins with a convolution layer which learns the unique patterns of each label, followed by a Max-Pooling layer which reduces the dimensionality of the output of the previous layer. The output of the Max-Pooling layer is then reshaped to a one-dimensional vector and fed to a Long Short-Term Memory (LSTM) layer, which extracts temporal features over its input. Finally, the output layer of the network is a fully-connected layer with a softmax activation function.

Table 7: Hyperparameters for the deep learning classifier

Hyperparameter           Value
Optimizer                Adam
Learning rate            0.001
Batch size               128
Training epoch           Early stop by validation accuracy
Input units              vector size of the 30 seconds input
Convolution layers       2
Convolution activation   relu
Convolution Kernels      256
Convolution Kernel size  16,8
Pool size                4
LSTM activation          tanh
LSTM units               32
Dropout                  0.7

The model was evaluated on a test set whose traces are not part of the training set. The metric we use is accuracy, that is, the probability that a trace is classified correctly. To avoid overfitting in model estimation, we employ 10-fold cross validation, a method which divides the dataset into 10 parts, with each part becoming the test set while the others are used as the train set. Each training set is fed to a different model, and the evaluation is made on the related test set. After each experiment, we noted the average cross-fold accuracy, as well as the standard deviation between folds.

The output of our classifier is not only the label of the most probable class, but rather a complete probability distribution over all possible labels. This flexibility allows us to capture the case where the attacker has some prior knowledge of the victim and some expectation of the websites they may be browsing. To do so, we look not only at the top-rated label, but also at a few of the next most probable predictions. This methodology was previously used in similar works where low-accuracy classifiers were evaluated [12, 55]. We thus calculated not only the raw accuracy, but also the probability that the right prediction is among the top 5 websites output as the most probable by the classifier. The base accuracy rate of this prediction method, as obtained by a random classifier with no knowledge of the traces, is 5%.

The machine learning model was implemented in Python version 3.6, using the TensorFlow [1] library version 1.4. The model training algorithms were run on a cluster made out of Nvidia GTX1080 and GTX2080 graphics processing units (GPUs), managed by the Slurm workload manager [83] version 19.05.4.
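The evaluation procedure of the appendix (normalization to [0, 1], Top-1 and Top-5 accuracy, and 10-fold cross validation) can be sketched in plain Python. This is an illustrative sketch only and not the authors' TensorFlow code: the helper names (`normalize`, `top_k_accuracy`, `k_fold_indices`, `base_rate`) and the round-robin fold assignment are assumptions made here.

```python
def normalize(trace):
    """Min-max scale a trace into [0, 1]; a constant trace maps to all zeros."""
    lo, hi = min(trace), max(trace)
    if hi == lo:
        return [0.0 for _ in trace]
    return [(x - lo) / (hi - lo) for x in trace]


def top_k_accuracy(probabilities, labels, k):
    """Fraction of traces whose true label is among the k most probable classes."""
    hits = 0
    for probs, label in zip(probabilities, labels):
        ranked = sorted(range(len(probs)), key=lambda c: probs[c], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)


def k_fold_indices(n_samples, n_folds=10):
    """Yield (train, test) index lists; each fold serves as the test set once.
    Samples are assigned to folds round-robin (an assumption of this sketch)."""
    folds = [list(range(i, n_samples, n_folds)) for i in range(n_folds)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def base_rate(k, n_classes):
    """Top-k accuracy of a random classifier over n_classes equally likely labels."""
    return k / n_classes


# Toy classifier output over three classes for three traces.
probs = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2], [0.2, 0.3, 0.5]]
labels = [1, 1, 0]
top1 = top_k_accuracy(probs, labels, k=1)
top2 = top_k_accuracy(probs, labels, k=2)
splits = list(k_fold_indices(20, n_folds=10))
```

With 100 candidate websites, `base_rate(1, 100)` and `base_rate(5, 100)` correspond to the 1% and 5% base rates quoted in the text. In a full pipeline, a fresh model would be trained on each `train` split and scored on the matching `test` split, and the mean and standard deviation of the per-fold accuracies would be reported.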