Linköping University | Department of Computer and Information Science Master’s thesis, 30 ECTS | Datateknik 2021 | LIU-IDA/LITH-EX-A--2021/057--SE

Detection of side-channel attacks targeting SGX
Swedish title: Detektion av attacker mot Intel SGX

David Lantz

Supervisor : Felipe Boeira Examiner : Mikael Asplund


Copyright

The publishers will keep this document online on the Internet - or its possible replacement - for a period of 25 years starting from the date of publication barring exceptional circumstances. The online availability of the document implies permanent permission for anyone to read, to download, or to print out single copies for his/her own use and to use it unchanged for non-commercial research and educational purposes. Subsequent transfers of copyright cannot revoke this permission. All other uses of the document are conditional upon the consent of the copyright owner. The publisher has taken technical and administrative measures to assure authenticity, security and accessibility. According to intellectual property law the author has the right to be mentioned when his/her work is accessed as described above and to be protected against infringement. For additional information about the Linköping University Electronic Press and its procedures for publication and for assurance of document integrity, please refer to its www home page: http://www.ep.liu.se/.

© David Lantz

Abstract

In recent years, trusted execution environments like Intel SGX have allowed developers to protect sensitive code inside so-called enclaves. These enclaves protect their code and data even in the case of a compromised OS. However, SGX enclaves have been shown to be vulnerable to numerous side-channel attacks. There is therefore a need to investigate ways in which such attacks against enclaves can be detected. This thesis investigates the viability of using performance counters to detect an SGX-targeting side-channel attack, specifically the recent Load Value Injection (LVI) class of attacks. A case study is presented in which performance counters and a threshold-based detection method are used to detect variants of the LVI attack. The results show that certain attack variants could be reliably detected using this approach without false positives for a range of benign applications. The results also demonstrate reasonable levels of speed and overhead for the detection tool. Some of the practical limitations of using performance counters, particularly in an SGX context, are also brought up and discussed.

Acknowledgments

I would like to thank my examiner Mikael Asplund for giving me the initial idea for this thesis, as well as for providing useful guidance throughout the work. I would also like to thank my supervisor Felipe Boeira for helping me improve the thesis with his feedback. Finally, I want to thank my parents and my sister for always being helpful and supportive.

Contents

Abstract
Acknowledgments
Contents
List of Figures
List of Tables

1 Introduction
   1.1 Motivation
   1.2 Aim
   1.3 Research questions
   1.4 Delimitations
   1.5 Methodology overview

2 Background and related work
   2.1 Trusted Execution Environments and SGX
   2.2 Side-channel attacks
   2.3 SCAs targeting Intel SGX
   2.4 Performance counters
   2.5 Existing SCA mitigations

3 Method
   3.1 Selecting and running attack
   3.2 Reading performance counters
   3.3 Measuring LVI impact on counters
   3.4 Detection
   3.5 Evaluation

4 Results
   4.1 Attack impact on performance counters
   4.2 Detection thresholds
   4.3 Evaluation of detection

5 Discussion
   5.1 Results
   5.2 Method
   5.3 Limitations of using performance counters to detect SCAs
   5.4 Other possible defenses against LVI
   5.5 The work in a wider context

6 Conclusion
   6.1 Research questions
   6.2 Future work

Bibliography

A Performance counter measurements for additional scenarios

List of Figures

2.1 Pipelined processor
2.2 LVI classification
3.1 Minimal enclave code, LVI-US-SB
3.2 Minimal enclave code, LVI-PPN-L1D
3.3 Pseudocode, first attack detector
3.4 Pseudocode, second attack detector
4.1 Total and minor page faults, LVI-PPN-L1D
4.2 Fluctuations in total number of instructions

List of Tables

2.1 Naming of some transient execution attacks
3.1 LVI PoC variants
3.2 Selected perf stat events
3.3 Selected events for per-process measurements
3.4 Scenario explanation
4.1 Averages of counters for different attack variants
4.2 Averages of counter events for different scenarios
4.3 Averages of counter events for different scenarios, normalized and scaled
4.4 Chosen thresholds, DET_LVI_US
4.5 Chosen thresholds, DET_LVI_PPN
4.6 Detection results, 1 s sampling interval
4.7 Detection results, 100 ms sampling interval
4.8 Detection overhead
A.1 Additional data for multi-process scenarios
A.2 Additional data for multi-process scenarios, normalized and scaled

1 Introduction

The purpose of this chapter is to give an introduction to the thesis. Separate sections provide an overview of the motivation, aim, research questions, delimitations, and methodology of this work.

1.1 Motivation

With the growing emergence of internet services and cloud computing, more and more potentially sensitive data is handled and stored by external services. How users can trust these services is therefore an important question. One way to address this problem is the concept of a Trusted Execution Environment, or TEE. A TEE provides a shielded execution environment separated from the Operating System (OS). This means that even if the OS itself is compromised and controlled by an attacker, a TEE can still guarantee confidentiality and integrity for the code and data it contains [9]. In the last few years, companies like Intel and ARM have offered their own implementations of TEEs. Intel's solution is called Intel SGX, and works by reserving parts of system memory that are then encrypted and thus isolated from the rest of the system [11]. The encrypted parts of memory are used to provide TEE instances called enclaves. This way, Intel guarantees the confidentiality and integrity of the data used by the enclaves. However, these enclaves have been shown to be vulnerable to numerous Side-Channel Attacks (SCAs) [36, 57], in which microarchitectural and physical side effects of a program, such as cache behaviour or power consumption, are used to infer the secret data it processes. While Intel SGX is by no means the only TEE implementation vulnerable to SCAs, a particularly large amount of research has been conducted on SCAs targeting Intel SGX, and several different attacks have been discovered in recent years. Many of these attacks have been mitigated in different ways, but for several of them the proposed mitigations can have a large impact on performance. For example, the mitigation for the Load Value Injection attack (LVI for short) [53] can, according to its authors, slow down an application by a factor of 2 to 19.
Furthermore, new attacks are regularly discovered, so there is an incentive to find other, potentially more general, ways to prevent SCAs. It is therefore of interest to research methods that can be used to detect attacks against SGX. Existing detection tools like T-SGX [49] and Déjà Vu [8] use novel methods to protect enclaves against certain kinds of attacks, but these also have their limitations.


There have been many examples in the literature of using Hardware Performance Counters (HPCs) to detect side-channel attacks, particularly attacks targeting cache behaviour [1]. For example, Mushtaq et al. [34] developed a tool called WHISPER that uses machine learning and HPCs to detect cache-based SCAs. In addition to hardware counters, operating systems also expose software performance counters. However, we are not aware of any work that explores using performance counters to detect attacks targeting SGX or other TEEs. It is therefore of interest to investigate whether such counters can be used to detect attacks against Intel SGX. There are of course many practical limitations, given for example that accessing performance counters within SGX enclaves compiled for release is not currently possible. Nevertheless, given their widespread usage in non-SGX contexts, it is still of theoretical interest to investigate the viability of performance counters for detecting SCAs targeting SGX, and how that setting differs from non-SGX contexts.

1.2 Aim

The overall goal of this work was to investigate to what extent performance counters can be used to detect SCAs against Intel SGX, as well as the practical limitations of such a method. More specifically, the goal was to perform a case study on Load Value Injection [53], a current and relevant SGX-targeting SCA, investigating how performance counters could be used to detect this attack.

1.3 Research questions

The following are the main research questions that this thesis aims to answer:

1. How does a side-channel attack like Load Value Injection impact performance counters?

2. Can performance counters be used to detect SGX-targeted side-channel attacks like Load Value Injection?

3. What are the limitations of using performance counters to detect side-channel attacks, particularly against SGX?

1.4 Delimitations

Due to the practical limitations of using performance counters in SGX contexts, the goal of this work was not to provide a fully functioning detection tool that solves these limitations, but rather to investigate whether performance counters could in principle be used to detect SCAs against SGX. Since the focus is on SGX specifically, this work did not investigate side-channel attacks against other TEEs like ARM's TrustZone. Other than to provide background and a broader view of the context of this work, SCAs not targeting SGX are also not discussed in any great detail. Another limitation is that while there are many different SGX-targeting SCAs, this work focused mainly on the Load Value Injection attack. The hope is that this serves as a case study of using performance counters to detect SCAs targeting SGX, and that some of the conclusions drawn can be extended beyond the LVI attack specifically.

1.5 Methodology overview

This work adopted a case study approach and investigated a specific SCA targeting SGX, Load Value Injection (LVI). First, the LVI attack was reproduced and its impact on performance counters was investigated. From these results, a threshold-based detection method using performance counters was constructed and evaluated, not only for accuracy but also for additional factors such as speed and overhead.

2 Background and related work

This chapter aims to give enough background on the theory and related work to understand the rest of the thesis. Section 2.1 gives an overview of Trusted Execution Environments and Intel SGX. Sections 2.2 and 2.3 provide background on side-channel attacks, both in general and specifically targeting SGX. Section 2.4 explains hardware and software performance counters, and the last section goes through some existing mitigation strategies as well as related work on the detection of side-channel attacks.

2.1 Trusted Execution Environments and SGX

A Trusted Execution Environment, or TEE, is essentially an isolated execution environment that aims to provide an additional layer of security to the code and data within it, keeping them secure even from a compromised OS. The Confidential Computing Consortium (CCC) defines a TEE as "an environment that provides a level of assurance of data integrity, data confidentiality, and code integrity" [9]. These assurances are typically provided through specialized hardware features. Depending on the TEE, code confidentiality may also be assured. Existing TEE implementations include Intel SGX, ARM TrustZone and AMD PSP. The main motivation for TEEs is to add protection for sensitive applications. As noted by the CCC [9], this makes them suitable for cloud services, where TEEs handling users' sensitive data would allow those same users to trust that their data is secure even if the cloud service provider is compromised. So even in cases where an entire system is untrusted or compromised, a TEE should still be able to provide the assurances stated above for a sensitive application running within it. An application may be sensitive in the sense that it handles sensitive data (passwords, medical data, encryption keys, etc.), or in that it uses algorithms that their owners want to keep secret [9].

2.1.1 Intel SGX

Intel SGX stands for Intel Software Guard Extensions and is a TEE implementation that was introduced with the Skylake processor generation in 2015. SGX allows the creation of so-called enclaves by reserving parts of memory that are then encrypted. The enclave is only decrypted within the CPU itself, and any attempts to access the contents of the enclave from the outside are denied. An SGX enclave exists within the virtual address space of a

normal user application. An SGX application thus consists of two parts: the untrusted part, which creates and hosts the enclave and calls its functions, and the trusted part, which consists of the actual enclave that works on the application secrets. These secrets can be sent to the enclave after remote attestation, in which proof is first provided to a user that the software is running within an enclave and can be trusted (using cryptographic signatures) [11]. In the SGX threat model, only the enclave can be trusted, while all other parts of the system are deemed untrustworthy. As with other TEEs, SGX is therefore designed to be secure even if an attacker has complete control of the rest of the system. However, as Section 2.3 shows, SGX has been shown to be vulnerable to a number of side-channel attacks. As a matter of fact, side-channel attacks are excluded from the SGX threat model [11], and Intel instead leaves it to enclave developers to write their programs in a manner that makes them resistant to those types of attacks [21].

SGX details

This part goes over some of the terms related to SGX enclaves in more detail. The main source for all information provided here is [11].

• PRM: Stands for Processor Reserved Memory, the actual piece of memory reserved for the SGX enclave.

• EPC: Enclave Page Cache, exists within the PRM and contains the actual memory pages with the code and data of the enclave.

• EPCM: Stands for Enclave Page Cache Map. Since the translation of virtual memory addresses to physical memory addresses is still left to the untrusted system, the EPCM performs checks to verify that these memory mappings are valid.

• SECS: Each enclave is associated with an SGX Enclave Control Structure (SECS) that contains enclave meta-data.

• TCS: Each enclave is also associated with one or more Thread Control Structures (TCS), one for each executing thread. Each TCS further includes the State Save Area (SSA) structure, which can store the context of the enclave thread in case it needs to be restored after a so-called Asynchronous Enclave Exit (AEX).

• AEX: An Asynchronous Enclave Exit (AEX) occurs when an interrupt happens during enclave execution. When an AEX occurs, the enclave context, often along with the reason for the exit, is saved in the SSA, and an Asynchronous Exit Pointer (AEP) is saved on the call stack. The interrupt is then handled by the OS before it returns to the AEP, which in turn acts as a trampoline to a handler in the host process. This handler can then execute the ERESUME instruction, at which point the enclave context is restored and enclave execution continues.

2.2 Side-channel attacks

Side-Channel Attacks (SCAs) differ from many other types of cyberattacks in that they target weaknesses in the underlying implementation of a system rather than in its algorithms. A channel in this context refers to a medium over which information can be sent. A side-channel is an unintentional channel, not meant for information transfer, that leaks information due to how the system is physically constructed. In SCAs, such channels can be exploited by an attacker to extract enough information to break even advanced cryptographic systems like implementations of AES. Examples of possible side-channels are power consumption, memory access patterns, timing patterns, and even acoustics and electromagnetic radiation [1].


A concept closely related to side-channels is covert channels. According to Canella et al. [5], "Covert channels are a special use case of side-channel attacks, where the attacker controls both the sender and the receiver". Under this definition, covert channels differ from other side-channels in that the attacker has direct control over what information is leaked, in contrast to side-channels where the attacker can only observe the information that (unintentionally) leaks through the channel. In line with the provided definition, attacks that make use of covert channels are considered a subset of side-channel attacks in this work, and the term side-channel therefore also covers covert channels in the rest of this thesis. SCAs can be divided into different categories depending on which side-channel they target to gain secret information. For example, timing attacks exploit the fact that computations can take different amounts of time depending on certain parameters, and that by timing these computations an attacker can gain secret information. Attackers can, for example, extract secret keys from cryptosystems by exploiting that computations take different amounts of time depending on input data and/or encryption keys [27]. Power analysis attacks instead focus on the power consumption of systems, and the fact that it varies depending on what computations are taking place [26]. Two other large classes of side-channel attacks are cache-based SCAs and transient execution attacks, which are described in more detail in Subsections 2.2.1 and 2.2.2 since they are of particular importance for this thesis. Different types of SCAs also often overlap; for example, a cache-based SCA is often used as the final step of a transient execution attack [29].

2.2.1 Cache-based SCAs

In most modern processors there are up to three levels of cache memory (L1, L2 and L3), where the cache at the last level, also known as the LLC (Last Level Cache), is often shared between multiple cores. The L1 cache is usually further divided into two separate caches, for instructions and data respectively. Cache-based SCAs (CSCAs) target the access and timing behaviour of a victim program with regard to the cache. There are many different CSCAs, but they all utilize similar techniques. Common CSCA techniques include Prime+Probe, Flush+Reload, Flush+Flush, and Evict+Time. The techniques are often similar, but work in slightly different ways. For example, the Flush+Reload technique consists of an attacker flushing (emptying) cache lines from a shared cache with the special CLFLUSH instruction. The attacker then waits for the victim to execute for a while before the Reload phase. In the Reload phase, the attacker reloads the same cache line and can measure the time to see whether that cache line was used by the victim [1]. Another CSCA technique is Flush+Flush [16], which replaces the Reload step of Flush+Reload with another Flush step. The attacker instead measures the execution time of the flush instruction, which varies depending on whether the data is in the cache or not. While many cache attacks lead to high numbers of cache misses (accessed memory not present in the cache), e.g. in the Reload step of Flush+Reload, the Flush+Flush attack is considered stealthier than other attacks, since the attacker does not need to make memory accesses that can result in cache misses.

2.2.2 Transient execution attacks

Transient execution attacks belong to a class of attacks targeting weaknesses of speculative execution. Speculative execution belongs to a larger series of optimization techniques used in modern processors known as out-of-order execution. Out-of-order execution aims to improve the performance of the CPU by not always executing instructions in sequence, but instead utilizing instruction cycles that would otherwise be wasted. Due to the


Figure 2.1: Structure of a pipelined processor¹. Each block represents a separate instruction.

pipelined structure of modern processors (see Figure 2.1), several instructions can often be executed in parallel. In the case of speculative execution, instructions that may or may not actually be needed are executed. The instructions in the processor pipeline can for example be branching instructions that result in different execution paths depending on the outcome of the instruction. Normally, the processor would have to wait until the initial instruction has finished before doing further work, but with speculative execution the processor predicts which execution path will be taken (based on previous paths taken, etc.) and starts fetching and executing instructions along that path. When the prediction is correct, these instructions are retired, meaning their results are committed to the program state. Correct predictions lead to large performance gains, but a prediction can naturally be wrong, in which case the result is discarded and the work is rolled back. Instructions that are executed speculatively and then rolled back are known as transient, meaning impermanent, since they can be said to only "exist" for a short time. Transient instructions can however still leave traces in the microarchitectural state of the processor (caches, etc.), which attackers can exploit to create covert channels and transmit secret information. Such attacks are therefore known as transient execution attacks [5].

Spectre

Transient execution attacks first became known with the discovery of Spectre by Kocher et al. [25]. Spectre specifically targets branch prediction, and works by essentially controlling which path is speculatively executed by the processor. This is done by poisoning various branch predictors, thus causing the program to execute instructions that it would not execute normally. This way, Spectre can get the victim process to execute code that transfers secret data from the victim context to the adversary context via a covert channel like the cache.

¹ CBurnett (2006), "Pipeline, 4 stage", https://commons.wikimedia.org/wiki/File:Pipeline,_4_stage.svg, license Creative Commons (BY-SA) [2021-04-12]


Spectre can then finally read the secret data via the covert channel using CSCA techniques like Flush+Reload or Evict+Reload. There are several different branch predictors, depending on what is actually predicted. Among them are the Pattern History Table (PHT), which predicts conditional branches, the Branch Target Buffer (BTB), which predicts call/jump destinations, and the Return Stack Buffer (RSB), which predicts return addresses. There are several variations of the Spectre attack depending on which of these structures is targeted [5]. In the original paper [25], two versions of the Spectre attack were investigated (Spectre v1 and Spectre v2), targeting the PHT (v1) and the BTB (v2), respectively. One notable difference between Spectre v1 and v2 is that v2 allows the attacker to redirect execution to gadgets (pieces of code) in the victim address space, in a way similar to Return Oriented Programming (ROP) attacks [25].

Meltdown

Meltdown is another type of transient execution attack, first discovered by Lipp et al. [29]. In contrast to Spectre, Meltdown makes use of transient instructions occurring after exceptions. The attacker first identifies a region in physical memory containing a secret, i.e. an inaccessible value that the attacker wants to leak. The actual attack consists of causing an exception and then executing transient instructions that operate on the secret value. The first instruction accessing this inaccessible part of memory will induce an exception, but due to a race condition between memory access and privilege checking, the processor will still perform the following transient instructions operating on the secret value. These instructions then leave traces from which the attacker can extract secret information using covert channels, as for Spectre. The Meltdown attack allows an attacker to read arbitrary physical memory, even memory belonging to kernel space, from an unprivileged user program.

Other attacks

Since Spectre and Meltdown there have been several other transient execution attacks, for example Foreshadow [52], Fallout [6], RIDL [43], and ZombieLoad [45]. The latter three are commonly known under the collective name Microarchitectural Data Sampling (MDS), because of their exploitation of internal CPU buffers (e.g. Store Buffers (SBs), Line Fill Buffers (LFBs), and Load Ports (LPs)). Using these buffers and transient execution in a way similar to Meltdown, they were shown to be able to leak data across several protection boundaries [36]. Load Value Injection (LVI), which is described in more detail in Section 2.3, is also a type of transient execution attack and specifically targets SGX.

Classification of transient execution attacks

In 2019, Canella et al. published an evaluation of transient execution attacks [5] that provides a classification for these types of attacks. This classification divides attacks into two main types, Spectre-type and Meltdown-type, according to which method is used to induce transient instructions. Spectre-type attacks target branch misprediction, and include all of the previously mentioned Spectre variations as well as others that have not been mentioned. Spectre-type attacks are further divided by the branch predictor they target. Meltdown-type attacks instead exploit transient instructions after a fault or microcode assist. Microcode consists of hardware-level instructions executed inside the CPU; a microcode assist invokes a specific microcode routine and can happen in rare, complex cases to "assist" higher-level instructions. Most of the attacks mentioned under "Other attacks" above classify as Meltdown-type attacks, and these are then divided according to the fault/assist type used and the source of the leakage [5]. It should also be noted that Intel for the most part uses its own classification of transient execution attacks [22], so many of the already mentioned attacks are also known by other names. Some examples are given in Table 2.1.


Name          | Classification by [5] | Intel equivalent
Spectre (v1)  | Spectre-PHT           | Bounds Check Bypass
Spectre (v2)  | Spectre-BTB           | Branch Target Injection
Meltdown      | Meltdown-US-L1        | Rogue Data Cache Load
Foreshadow    | Meltdown-P-L1         | L1 Terminal Fault

Table 2.1: Different naming of some transient execution attacks.

2.3 SCAs targeting Intel SGX

Although Intel SGX is supposed to guarantee the confidentiality and integrity of secret data, researchers have discovered numerous attacks breaking these guarantees. Many of these attacks are side-channel attacks, and new attack vectors are regularly discovered. In a recent survey of published attacks against Intel SGX, Nilsson et al. [36] list a number of mostly side-channel-based attacks and existing mitigations for them. One important thing to note is that since SGX has a threat model in which the underlying OS may be compromised, most of the attacks identified against SGX assume an attacker with greater capabilities than side-channel attacks in a non-SGX context do. For instance, Xu et al. [56] introduced the concept of controlled-channel attacks, where secret information is gained by repeatedly causing page faults during enclave execution. In the original paper, they make use of the fact that the OS controls the page table mappings even for SGX applications. They use this to restrict access to certain memory pages which, when used by the victim, cause page faults that transfer control to the OS. By repeatedly causing a page fault and recording which memory page was accessed, the attacker can recover secret data from the trace of accessed pages. The authors demonstrate the capabilities of this type of attack by using it to extract complete text documents and outlines of images from protected applications. In another work related to controlled-channel attacks, Bulck et al. developed the attack framework SGX-Step [54]. SGX-Step provides a kernel framework that allows attackers to track page table entries directly from user space. SGX-Step uses APIC (Advanced Programmable Interrupt Controller) timers to issue interrupts during enclave execution, thereby causing AEXs that preempt the enclave. Attacker spy code can then track page table entries from a custom interrupt handler.
All of this is done in a way that allows the attacker to single-step enclaves, i.e. to interrupt the enclave after every single instruction. SGX-Step is an open-source framework² and has been used to construct a number of other attacks. Cache-based SCAs have been shown to be possible on SGX [14, 4], since SGX enclaves still make use of shared caches. Several more advanced attacks building on CSCAs have since been constructed, like CacheZoom [32] and MemJam [31]. A unique attack by Schwarz et al. [46] used cache attacks from within an enclave itself to attack other enclaves. Another attack, called Plundervolt [33], used dynamic voltage scaling to corrupt the integrity of enclave computations; the authors showed how this could be abused to extract secret keys from cryptographic algorithms. SGX has also been shown to be vulnerable to transient execution attacks: for example, the similarly named attacks SgxPectre [7] and SgxSpectre [37] demonstrated that SGX was vulnerable to Spectre-type attacks. The first version of Foreshadow [52] also targeted SGX. The aforementioned MDS attacks [6, 43, 45] were shown to be able to leak data across several protection boundaries, including SGX enclaves. Other attacks include CacheOut [44] and CrossTalk [41]. Load Value Injection (LVI) [53] is another transient execution attack specialized for SGX, and is described in more detail in the following section.

2https://github.com/jovanbulck/sgx-step


2.3.1 Load Value Injection

One of the more recent attacks against Intel SGX was found by Van Bulck et al. [53] and was dubbed Load Value Injection, or LVI for short. In their paper, they explain that LVI should be seen as a new class of attacks rather than a standalone attack, since it introduces a new method of attacking enclaves. They describe LVI as a "reverse Meltdown"-type attack, since it essentially turns the process of a Meltdown attack around by injecting data instead of leaking it. The attack operates under the assumption that the attacker can provoke page faults or microcode assists during enclave execution, optionally using features offered by a framework such as SGX-Step.

LVI works by the attacker first poisoning a microarchitectural buffer with a value of their choice. Then, when the victim attempts to load a trusted value from memory, a page fault (or microcode assist) is induced by the attacker. Following this faulting load, the attacker-controlled value is incorrectly forwarded, or injected, from the poisoned buffer instead of the trusted value. This way, LVI can cause instructions following the initial load to be transiently executed with poisoned data, which can then leave traces in the microarchitectural state of the CPU. This allows the attacker to create covert channels (e.g. through the cache) to transmit secret enclave data. An attacker can also hijack victim execution following the faulting load and redirect it to second-stage code gadgets (already existing in enclave memory), in a similar way to Spectre attacks. These code gadgets can then be used to leave traces in the CPU microarchitectural state in the same way as previously mentioned. Apart from poisoning the buffer, causing a fault/assist and reconstructing a secret through a side-channel, most of the LVI attack takes place in the victim enclave itself.
Furthermore, the authors found that recent processors with mitigations against Meltdown-type attacks zero out the result of a faulting load and pass a NULL value to the following transient instructions. However, they found that this behaviour can still be exploited in LVI-type attacks (LVI-NULL), in which case the poisoning of a buffer is not necessary. In the paper, the authors suggest a mitigation strategy for LVI where LFENCE instructions are inserted after every potentially faulting memory load. An LFENCE instruction serializes the processor pipeline and is thus guaranteed to halt transient execution, which in turn prevents LVI from happening. This mitigation strategy corresponds to the one adopted by Intel, where additional compiler configurations are available for inserting the LFENCE instructions into a program author's code [10]. However, as noted by the authors of the LVI paper, fully mitigating LVI this way can incur performance overheads of a factor of 2 to 19.

Classification of LVI attacks

According to the original paper [53], LVI represents a new class of transient execution attacks, similar to Spectre and Meltdown before it. The authors also include a classification tree displaying all possible LVI attack variants they discovered, which is shown in Figure 2.2. Note that not all of the variants have working proofs of concept yet, so many of them are still purely theoretical. Also note that two attacks are marked green, since the authors found that these variants are already (unintentionally) prevented by mitigations designed for Meltdown-type attacks.


Figure 2.2: Classification of LVI attacks. Source: [53], © Copyright 2020 IEEE.

As can be seen from Figure 2.2, LVI attacks are first divided by how they interrupt victim execution: either through Page Faults (PFs) or MicroCode Assists (MCAs). They are further classified by the microarchitectural buffer that they target as a source for injecting data. Note that further variations of LVI, depending on which kind of side-channel (e.g. cache) is used to transmit the secret data, are not included in the classification. The classification also does not divide attacks based on whether the attacker first hijacks control flow or exploits instructions directly following the faulting load to send the secret.

Outside SGX context

In the original paper on LVI [53], the authors discuss that some LVI-type attacks are theoretically possible in other contexts besides SGX, such as cross-process and user-to-kernel contexts. More specifically, the LVI-P and LVI-AD variants are not specific to the SGX context. However, they did not find practical LVI gadgets that could be exploited in such scenarios, and only created synthetic proofs of concept demonstrating such attacks. According to the LVI website [51], they currently assess LVI in non-SGX contexts to be mainly of academic interest. They do, however, encourage future research into LVI in such contexts. Researchers at Bitdefender also showed a proof of concept for a cross-process LVI scenario exploiting LFBs and microcode assists [30]. This attack (of variant LVI-AD-LFB using the classification in Figure 2.2) thus demonstrated LVI capabilities in contexts outside SGX, but is still a synthetic proof of concept.

2.4 Performance counters

Performance counters can be used to count the occurrences of certain hardware- or software-related events in a system. Hardware Performance Counters (HPCs) are special registers that can be configured to measure various hardware-related activities occurring in the CPU. There are different sets of counter events for different processor architectures and models, but some common counters include total retired instructions, CPU cycles, and cache misses. While HPCs can be used to count many different events, there is a limited number of registers to store the counts, which limits the number of events that can be counted at the same time (usually between 4 and 8). While some of the performance counters are programmable, other counters are fixed to a specific event [42].

There are also software performance counters that are made available through the OS and that relate to software events such as page faults and context switches [24]. The term performance counters is used in this thesis to refer to both hardware and software performance counters.

There are several tools and APIs that can be used for accessing the data from performance counters. For Linux, Perf is one of the most common tools for performance profiling [50]. Some other tools that can be used are, for example, PAPI [40] and VTune.

2.5 Existing SCA mitigations

This section aims to provide an overview of some of the mitigation strategies and tools that exist to combat the SCAs described in Section 2.3. Note that the focus is on mitigations against SGX attacks, so mitigations that specifically target non-SGX SCAs, like the KAISER mitigation for Meltdown [29], are not discussed.

For several attacks, Intel has released microcode patches as prevention. Microcode patches/updates are delivered to the CPU by either the BIOS or the OS upon system boot and directly change the microcode, thus affecting the behaviour of the CPUs themselves. Such a patch designed for a specific attack usually prevents the attack fully, but often also disables certain features to do so. For example, the patch mitigating the Plundervolt attack [33] removes the possibility to undervolt the processor, a feature that for some users might be important for performance reasons [3]. Most of the mentioned transient execution attacks for SGX (though not LVI) have been mitigated with microcode patches.

For attacks that have not been addressed by Intel with either microcode updates or changes to the SGX system design, solutions can still be deployed through modified compilers or updated enclave SDKs. As mentioned before, the current mitigation against Load Value Injection gives enclave authors the option of choosing a special compiler option that mitigates LVI. However, this mitigation can have a large impact on performance [53]. Many SCAs can also be avoided by enclave developers designing their enclaves in a different, more secure way. For example, Gruss et al. [15] presented the software library Cloak to combat cache side-channel attacks in general, and showed that it could be used to prevent CSCAs towards SGX as well. Cloak makes use of hardware transactional memory, like Transactional Synchronization Extensions (TSX), to put sensitive code and data in so-called transactions, which guarantee that the code is run in an atomic and isolated way. There are, however, drawbacks with this approach, such as TSX not being available on all processors with SGX [54].

2.5.1 Detection methods

The aforementioned mitigations focus on preventing the actual attacks from being executed in the first place, rather than detecting attacks in progress. Most of the research regarding detection of SCAs has been done for non-SGX contexts, but there are also some detection tools developed specifically for SGX. This subsection therefore goes through some of these detection tools, first for the SGX context and then for the non-SGX context.

SGX context

An SGX-specific detection tool, which was mentioned in Section 1.1, is T-SGX [49], which makes use of TSX to prevent controlled-channel attacks. T-SGX consists of a modified LLVM compiler that protects enclave code by essentially wrapping enclave code blocks in TSX transactions. Page faults that occur inside such transactions are not reported to the OS, making it impossible for the OS to use that controlled channel to gain secret information. Furthermore, the number of transaction aborts is also counted, and if too many such aborts occur, the enclave execution is terminated. The authors state that T-SGX is faster than previous state-of-the-art mitigation schemes, but T-SGX still induces performance overheads of 50% on average and storage overheads of about 30%.

A similar work was done by Chen et al., who described their solution Déjà Vu [8]. Déjà Vu exists as an extension to the LLVM compiler and also uses TSX to protect against controlled-channel attacks. Instead of recompiling enclave code, however, Déjà Vu uses an in-enclave reference clock thread protected by TSX. Using this clock, the current program execution time is measured against what is expected. Since frequent enclave preemptions by an attacker would increase the execution time significantly, an attack is detected if the current execution time crosses a certain threshold.

The authors of SGX-Step [54] themselves state that tools like T-SGX and Déjà Vu would be able to detect an ongoing attack through the frequent interrupts caused by single-stepping an enclave. However, they also bring up some of the drawbacks of these detection methods: for example, the fact that TSX is not available on all SGX-enabled processors, and that TSX defenses significantly increase run-time performance overheads.

Non-SGX context

In contexts outside of SGX, a number of methods and tools for detecting side-channel attacks have been proposed in the literature. Most of the research regarding SCA detection concerns cache-based SCAs. In a survey by Akram et al. [1], the authors present much of the known work concerning CSCA detection and note, among other things, that almost all of the methods use hardware performance counters for detecting the attacks. There are examples of both signature-based and anomaly-based detection. Signature-based detection tools detect attacks based on signatures of known attacks, while anomaly-based detection tools instead detect attacks based on deviations from normal execution patterns (anomalies) in benign processes.

In a paper by Mushtaq et al. [34], the authors develop a tool called WHISPER for detecting CSCAs at runtime with the help of machine learning and hardware performance counters. In order to detect a larger set of attacks, their tool uses an ensemble model, where multiple different machine learning algorithms are used together to make predictions. The tool is designed to collect values from HPCs during runtime, which are then used as features for the ensemble learning model. These include cache-related events like cache accesses and cache misses, as well as system-wide events like the total number of CPU cycles. They demonstrate the capability of the tool by using it to detect three CSCA variants targeting the AES cryptosystem: Flush+Reload, Flush+Flush, and Prime+Probe. The tool adopts an anomaly-based detection methodology, allowing it to detect even the stealthier Flush+Flush attack, since the victim still experiences more cache misses due to the attack. So, although the attack process itself might not be detected, the tool does detect that the victim is under attack. The effectiveness of the detection tool is evaluated with metrics such as accuracy, speed, and overhead. They also perform the experiments for different levels of system load (no load, average load, and full load) in order to simulate realistic load conditions. All three attacks were detected with high accuracy and speed, as well as relatively small overhead. Finally, they also use the tool to detect Spectre and Meltdown attacks that use CSCAs. They show that while using the unmodified detection tool resulted in many false positives under full load conditions, retraining the tool with other events related to these specific attacks as features, like page faults (using a software performance counter) for Meltdown and branch mispredictions for Spectre, gave very good results.

HexPADS

While many CSCA detection methods use machine learning together with HPCs for detection [35, 2], some also present tools based on value thresholds for certain HPC events.


HexPADS is one such tool, developed by Mathias Payer [38]. HexPADS uses HPC events like LLC accesses, LLC misses, and the total number of retired instructions to detect certain attacks, like Rowhammer attacks [47] as well as some cache-based SCAs, according to known attack signatures. It does this by monitoring performance counters of all currently running processes in the system using the perf_event_open interface. By then comparing to existing attack signatures for CSCAs and Rowhammer attacks, among others, a process is either deemed to be benign or is flagged as a potential attack. The source code for HexPADS is available as open source3. The HexPADS tool consists of four fundamental components:

1. readproc: Gathers information about processes by accessing the /proc directory, which contains information about all running processes in separate folders mapped to the Process ID (PID).

2. readperfctrs: Initializes and reads performance counters for running processes using perf_event_open, the underlying interface for the perf command line tool.

3. detector: Calculates various statistics (like cache miss rate) based on the counters and matches these to known attack signatures. If a process matches a signature, a potential attack is reported.

4. mitigator: Either kills or slows down a process which has been reported as a potential attack.

3Available at https://github.com/HexHive/HexPADS

3 Method

The following chapter describes the overall method used for this work. The first section describes the choice of the LVI attack and goes into more detail about the specific proofs of concept used for this work. Section 3.2 goes through the method used for accessing performance counters. Section 3.3 describes in more detail the method for measuring the impact of the LVI attack on performance counters, in order to provide an answer to the first research question. Then, Section 3.4 goes through the methodology for creating a detection method for LVI, while Section 3.5 presents the method for evaluating the detection, with the overall goal of answering the second research question. All of the measurements and tests done for this work were performed on a computer with an Intel Core i5-6200U processor (Skylake), running Ubuntu 20.04 (Linux kernel version 5.4.0-42-generic).

3.1 Selecting and running attack

As stated earlier, the overall aim of this work was to investigate SCAs against SGX and whether they could be detected using performance counters. Which specific attacks to investigate was therefore not yet decided at the outset of this thesis work, and a choice of attack or attacks to perform a case study on had to be made. As shown in Section 2.3, there are many side-channel attacks targeting Intel SGX. While several different SCAs were considered, Load Value Injection (LVI), described by Van Bulck et al. [53], was chosen for further investigation, in order to keep this work within a reasonable scope. LVI was selected based on a few factors. It is an SGX-specific attack that was discovered fairly recently, and is therefore quite relevant today. As noted by the authors, the current software mitigation can also have significant negative impacts on performance, making the attack even more relevant. Since the mitigation is a special compiler option, LVI is also easier to recreate compared to some other SCAs, since one can simply opt not to compile the enclaves in that mode. The LVI attack also has available Proof of Concept (PoC) code examples as part of the SGX-Step attack framework4. Finally, as discussed by the authors and shown with LVI-LFB [30], LVI might be possible in settings outside SGX as well, making it even more relevant to investigate.

4PoC code available at https://github.com/jovanbulck/sgx-step/tree/master/app/lvi


When it comes to using performance counters for detecting LVI attacks, one can also note a few interesting aspects of how the attack works (described in Section 2.3) that make it more relevant for this work. Firstly, the attack provokes page faults or, alternatively, microcode assists in order to forward attacker-controlled values. Both page faults and microcode assists can be recorded with performance counters. Similar to other transient execution attacks like Spectre and Meltdown, LVI can also make use of CSCAs in the final stage to transmit the secret data. As shown in Section 2.5.1, HPC-based methods have proven effective for detecting CSCAs, even when the CSCA is used as part of a transient execution attack like Spectre [34]. All of the attack PoCs make use of a Flush+Reload attack as the last step in the main attack loop, and the more practical attacks described in the paper mostly use cache attacks as well.

3.1.1 Running attack

For running the LVI attack, the existing PoC implementation was used. The PoC code is included as a part of the SGX-Step [54] repository, and consists of a program that can run three separate LVI attack variants on a victim enclave. These attack variants are described in more detail in Subsection 3.1.2. Having access to a system where SGX is both present and enabled is obviously a prerequisite for running the LVI attack.

3.1.2 LVI PoC details

The three variants of LVI included with the PoC code, along with their equivalent variant according to the classification tree in Figure 2.2, are shown in Table 3.1. From this point, the names from the LVI classification tree will be used to refer to the PoC attacks, with the exception of LVI-SB-ROP, which will be referred to as LVI-US-SB-ROP.

PoC variant                                     LVI classification equivalent
LVI-SB (store buffer injection)                 LVI-US-SB
LVI-SB-ROP (transient control flow hijacking)   LVI-US-SB
LVI-L1D (L1D cache injection)                   LVI-PPN-L1D

Table 3.1: The three LVI PoC variants and their corresponding LVI variant according to the classification by Van Bulck et al. [53].

It is important to note that these attack PoCs are simplified and do not demonstrate real attacks, but since they still contain all the major steps behind LVI attacks, they can be used for investigating the impact of LVI on performance counters. One should also note that while the PoC is part of the SGX-Step repository, it only makes use of its page table manipulation features and not its single-stepping capabilities (more practical LVI attacks, however, can).

LVI-US-SB and LVI-US-SB-ROP

These two variants both preempt enclave execution by using page faults that occur when a Page Table Entry (PTE) is marked as belonging to the kernel via its User/Supervisor (U/S) attribute. It is important to mention that there are valid page faults that can occur during normal program execution, since a page fault happens whenever a process accesses a memory page which is not mapped into its virtual address space. There are (for most systems) two types of valid page faults: minor and major page faults. Major page faults occur, for example, when the requested page is not yet present in memory. Minor faults occur when pages exist in memory but have not yet been marked by the Memory Management Unit (MMU) [24]. The types of page faults caused by LVI (and Meltdown-type attacks), however, occur due to one of several invalid conditions (e.g. accessing a page without sufficient privileges) [18], and can therefore be seen more as errors that should not occur in a normal program.

Both the LVI-US variants also exploit the store buffer to inject malicious data. The LVI-US-SB variant then uses victim enclave instructions following the faulting instruction to directly encode enclave data via a cache covert channel. The minimal code example in Figure 3.1 shows the instructions inside a victim enclave that are exploited in the LVI-US-SB PoC. The instruction at line 3 dereferences an untrusted pointer from the (attacker-controlled) host, and stores an attacker-controlled value to it (in this case the character 'S'). This store instruction brings the value 'S' into the store buffer. By modifying the PTE for the pointer enclave_pt before calling this enclave function, the attacker can then make the load instruction at line 4 fault. This causes the following transient instructions at line 5 to compute on the value 'S' instead of the trusted enclave data. Thus, the value 'S' has been injected, which lets the attacker use the gadget code at line 5 to leak enclave data through a cache-based covert channel.

1  void ecall_lvi_store_enclave(char *user_pt, char *oracle)
2  {
3      *user_pt = 'S';
4      trusted_val = *enclave_pt;
5      leak = oracle[4096 * trusted_val];
6  }

Figure 3.1: Minimal enclave example for the LVI-US-SB PoC attack.

The LVI-US-SB-ROP attack, on the other hand, uses the same method of data injection, but instead faults a return instruction in order to get it to load the attacker value. This allows the attacker to redirect control flow to an arbitrary second-stage code gadget located within the existing enclave code base, where secrets are again encoded through a cache-based covert channel. This variant thus demonstrates the possibility of using LVI to hijack control flow, using techniques similar to Return-Oriented Programming (ROP).

LVI-PPN-L1D

The final PoC demonstrates an example of LVI-PPN-L1D. A minimal enclave victim code example is shown in Figure 3.2. This variant instead exploits a type of page fault unique to SGX: the attacker remaps the PTE for a page (page_b) to point to the Physical Page Number (PPN) of another page (page_a). This is done before entering the enclave function. Then, inside the enclave, page_a is dereferenced correctly at line 6, bringing the physical memory containing 'A' into the L1 data cache. However, when page_b is dereferenced at line 7, the EPCM checks of SGX detect the virtual-to-physical remapping and a page fault is raised. The poisoned physical address is nevertheless still sent to the L1D cache before the fault is architecturally raised, and since 'A' is already in the L1D cache, this leads to a cache hit. 'A' is thus injected into transient execution at line 8, where the attacker again can use a cache-based covert channel to leak enclave data, as in the previous PoC.


1  char __attribute__((aligned(0x1000))) page_a[4096] = {'A'};
2  char __attribute__((aligned(0x1000))) page_b[4096] = {'B'};
3
4  void ecall_lvi_remap(char *oracle)
5  {
6      a = *page_a;
7      b = *page_b;
8      leak = oracle[4096 * b];
9  }

Figure 3.2: Minimal enclave example for the LVI-PPN-L1D PoC attack.

3.2 Reading performance counters

The aim of this work was to investigate how performance counters could be used to detect LVI attacks, so the purpose of this section is to present the overall method that was used to access and read these performance counters. Subsection 3.2.1 goes into some of the difficulties of trying to read counters for the enclaves themselves, while Subsection 3.2.2 describes how counters were instead used for the untrusted parts of SGX applications.

3.2.1 Within enclave

Accessing HPCs from within an SGX enclave using an interface such as PAPI was discovered to be impractical, if not impossible, even for enclaves running in debug mode. Firstly, using third-party libraries within Intel SGX (and other TEEs) is non-trivial because of the additional security requirements that come with enclave execution. In a paper by Wang et al. [55], this matter is discussed and explored in more detail. Secondly, reading HPCs requires making use of an underlying instruction, RDPMC (the hardware instruction for reading HPCs), that is classified by Intel as illegal within enclaves, along with many other hardware instructions5. According to Intel, the RDPMC instruction can result in a VMEXIT (transferring control to the VMM) when executed within an enclave, and since the VMM cannot be allowed to update the enclave, Intel classifies the instruction as illegal. RDPMC is also by default restricted to privilege level 0. In order to use the instruction from within the enclave, a certain flag in register CR4 would have to be manually set in order to allow performance monitoring counters at all privilege levels [17]. However, due to RDPMC being illegal within enclaves according to Intel, this was not investigated further.

Furthermore, it seems that even when enclaves are compiled in debug mode, performance counters are still disabled for them, meaning that performance counters cannot be used to profile the enclaves from the untrusted part of the application either. According to Intel's software development manual [19], a special flag (TCS.FLAGS.DBGOPTIN) needs to be set for every TCS associated with an enclave in order to enable certain debug features. If an enclave is compiled in debug mode, this flag can be set by special debugging software like the SGX debugger. The tool VTune also has support for this with its utility sgx-hotspots, which can be used to measure a specific event (INST_RETIRED.PREC_DIST) that emulates precise clockticks.

When attempting to read counters for the entire SGX application without using any special debuggers, the performance information for the actual enclave is simply not recorded, and the only resulting information corresponds to the hosting application. For example, by using perf to look at the instructions and cache misses of the entire LVI application with the commands perf record and perf report, it could be seen that none of the captured

5https://software.intel.com/content/www/us/en/develop/documentation/sgx-developer-guide/top/processor-features/illegal-instructions-within-an-enclave.html

events took place within the actual enclave. This is further discussed by Schwarz et al. [46], where the authors verify that even counters for last-level cache activity are disabled for debug and pre-release enclaves. They note that all of the captured events could be traced back to the hosting application only. In the paper, they use this fact to show that a side-channel attack performed from within an enclave cannot be detected by an outside detection mechanism built on performance counters.

3.2.2 Outside of enclave

Although reading performance counters for an enclave itself was found not to be viable, one can still profile the untrusted code by reading performance counters for the host application. Since the attacker in this case attacks the enclave from the untrusted part of the application, this amounts to identifying malicious behaviour in the untrusted code according to known behaviour of an attack (signature-based detection). In prior work, malicious processes using SCAs have been successfully detected by using information from performance counters and matching this to signatures of known attacks [38]. The question of whether the LVI attack has enough of a signature to be detected with such a method is therefore of interest, though there are further practical limitations with such a method in the SGX context (see Section 5.3).

Usage of HexPADS

The code base for HexPADS was used in this work to measure performance counters of several processes, both for comparing LVI to other processes, as described in Section 3.3.2, and for detecting LVI attacks (Section 3.4). In both cases, the components related to reading processes and performance counters were mostly left unchanged, while the detector component was changed to fit the purposes of this work. Since this work mainly focuses on detection, the mitigator component of HexPADS was not used.

3.3 Measuring LVI impact on counters

The purpose of this section is to present the method used to answer the first research question, related to how the LVI attack impacts performance counters. In order to measure the impact of LVI attacks on performance counters, two series of tests were made. Firstly, tests were constructed for comparing the different attack variants to each other. Secondly, tests were made for comparing LVI attack processes to other, benign processes by running a range of different attack and non-attack scenarios. The aim of this was to see in which ways LVI differs from benign applications when it comes to performance counters.

3.3.1 Difference between attack variants

For the purpose of finding potential differences between the different attack variants, a test was constructed using the Linux perf command line tool. Using perf stat, which takes a program as input and gathers performance counter statistics for its entire runtime, values for different counters could be gathered for the LVI attack examples. Perf stat can be used to collect both hardware and software performance counter statistics, and users can select one or more specific events to monitor. If more hardware events are specified than there are hardware registers for such counters, perf automatically multiplexes the counters (measures different counters in different time intervals) in order to give results for all the specified counters. However, when multiplexing counters, it is important to remember that not all events are measured all the time, and that the resulting counts are just estimates of what the counts would have been if the corresponding events had been measured for the entire run of the program [50]. To avoid any unnecessary multiplexing, only one event was monitored at a


Event              Description
INSTRUCTIONS       Total number of retired (executed) instructions
CYCLES             Number of CPU cycles
CACHE_REF          LLC cache references (both hits and misses)
CACHE_MISS         LLC misses
CONTEXT_SWITCH     Number of context switches
BR_INSTR           Executed branch instructions
BR_MISS            Mispredicted branch instructions
PAGE_FAULTS        Total number of page faults
MINOR_FAULTS       Number of minor page faults
MAJOR_FAULTS       Number of major page faults
L1D_LOADS          L1 data cache loads
L1D_LOAD_MISS      L1 data cache load misses
DTLB_LOADS         Data Translation Lookaside Buffer (TLB) loads
DTLB_LOAD_MISS     Data TLB load misses
DTLB_STORES        Data TLB stores
DTLB_STORE_MISS    Data TLB store misses

Table 3.2: Selected events for investigating differences between LVI attack variants using perf stat.

time. Furthermore, each event was monitored in over 100 separate runs for every LVI attack example, where each run constitutes 4096 iterations of the main loop of the respective LVI attack variant. The selected events can be seen in Table 3.2. They were mostly selected on the basis of which performance counter events have been used in prior work, as well as which counters could be interesting to investigate specifically for the LVI attack. For example, events related to the LLC as well as the L1 cache have been used to detect cache attacks [1]. At least one work [39] has also used events related to the data Translation Lookaside Buffer (dTLB) to detect CSCAs. Using the number of instructions and/or cycles is also common in much of the prior work [1]. Finally, Mushtaq et al. [34] showed improvements to their tool WHISPER when adding events such as page faults and branch mispredictions to detect Meltdown and Spectre attacks, respectively. Due to how the LVI attack functions, the page fault related events are especially interesting in this regard. Comparing the counter values for the different attack variants then allowed any potential major differences between them to be discerned.
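The per-event measurement setup described above can be sketched roughly as follows. The helper names and the PoC binary path are hypothetical; in practice the built command would be executed with, e.g., subprocess.run, once per event and run:

```python
def build_perf_cmd(event, program):
    """Build a `perf stat` invocation that monitors a single event
    (avoiding counter multiplexing) and prints machine-readable CSV
    output via -x with ',' as the field separator."""
    return ["perf", "stat", "-e", event, "-x", ",", "--"] + program

def parse_perf_csv(line):
    """Parse one line of `perf stat -x,` output into (count, event name).
    The count is None when the event was not counted or not supported."""
    fields = line.strip().split(",")
    count = None
    if fields[0] not in ("<not counted>", "<not supported>"):
        count = int(fields[0])
    return count, fields[2]

# One event at a time, repeated for every run, for a hypothetical PoC binary:
cmd = build_perf_cmd("page-faults", ["./lvi_us_sb"])
```

Monitoring a single event per invocation mirrors the choice above of avoiding multiplexed (and thus estimated) counts.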

3.3.2 Comparing to benign scenarios

For these tests, measurements for running processes were taken using a modified version of HexPADS [38], which was described earlier. This tool uses the perf_event_open interface to collect samples for the performance counters of interest. The interface is one of the underlying interfaces of the perf command line tool, and offers access to a wide variety of both hardware and software performance counters. As in the previous tests, the performance counters to investigate were decided both by looking at which counters have been successfully used in related work, and by analyzing the LVI attack itself to see which counters could be interesting. The selected counters are shown in Table 3.3. Note that while many events are similar and transferable to events in Table 3.2, there are some differences when it comes to, for example, certain cache events. Some events related to the instruction TLB were also investigated, since they had previously been used in the work by Gruss et al. [16] to normalize other events.


Event          Description
INSTRUCTIONS   Total number of retired (executed) instructions
CACHE_REF      LLC references
CACHE_MISS     LLC misses
BR_INSTR       Executed branch instructions
BR_MISS        Mispredicted branch instructions
PAGE_FAULTS    Total number of page faults
L1D_RA         L1 data cache read accesses
L1D_RM         L1 data cache read misses
DTLB_RA        Data TLB read accesses
DTLB_RM        Data TLB read misses
ITLB_RA        Instruction TLB read accesses
ITLB_RM        Instruction TLB read misses

Table 3.3: Selected events for use with the perf_event_open interface to measure counters for running processes.

The purpose of these tests is to compare the values of these performance counters during an ongoing LVI attack to their values for other, benign processes. To achieve this, a number of different scenarios were constructed in order to showcase both attack applications and a variety of benign processes with varying loads on the system, like web browsers, games, benchmark programs, etc. Gruss et al. [16] demonstrate a threshold detection method for detecting Flush+Reload and Rowhammer attacks. To arrive at their thresholds, they first evaluated a number of different performance counters for 8 different scenarios, ranging from casual computer usage situations to high-load benign applications (using the stress benchmark tool) and finally attack applications. For this work, a number of different processes were evaluated with the similar goal of including several benign computer usage situations with varying load. The selected scenarios are shown in Table 3.4.

Nr   Scenario            Description
1    LVI-US-SB           Normal LVI-US-SB PoC attack scenario
2    LVI-PPN-L1D         Normal LVI-PPN-L1D PoC attack scenario
3    LVI-US-SB-ROP       Normal LVI-US-SB-ROP PoC attack scenario
4    NA (SB)             No Attack (NA) scenario (LVI-US-SB)
5    NA (L1D)            No Attack (NA) scenario (LVI-PPN-L1D)
6    NA (SB-ROP)         No Attack (NA) scenario (LVI-US-SB-ROP)
7    Text editor         User creating a new file and writing to it in the Atom text editor
8    Firefox - Youtube   User watching a YouTube video in the Firefox browser
9    Firefox - Twitter   User scrolling down a Twitter feed in the Firefox browser
10   Game                User playing a game (Civilization 5) via the Steam platform
11   Stress -c 1         Loop continuously performing CPU computation sqrt()
12   Stress -m 1         Loop continuously performing malloc() and free() for 256 MB arrays
13   Stress -i 1         Loop continuously calling the I/O function sync()

Table 3.4: Explanation of different scenarios for measuring performance counters.

The first three scenarios simply demonstrate the unchanged LVI attack variants. To compare with situations where an attack does not take place, scenarios where the victim enclave is simply run without an attack were included (scenarios 4-6). Since all three attack variants have designated victim code loops that they attack, there are also three different baseline scenarios without an attack, one for each variant. Scenarios 7-10 can be seen as normal, benign computer usage processes, while scenarios 11-13 demonstrate benchmark applications that put a high load on the system.


Data collection

For all scenarios in Table 3.4, performance counter data for all events specified in Table 3.3 was collected for all running user processes, including the LVI process in scenarios 1-6 (which is run with root capabilities). Due to limits on how many events can be counted at one time, the events for each scenario needed to be collected in separate runs. Using the cpuid utility, it was found that the test system used for this work had 3 fixed counters per core and 4 programmable counters. So apart from the fixed counters, a maximum of 4 other hardware counters could be used simultaneously. It was further confirmed using Intel documentation that the hardware event INST_RETIRED_ANY, measuring the number of retired instructions, is a fixed counter. This means that the INSTRUCTIONS event, which is tied to said hardware event, could be used together with 4 other, non-fixed counter events [20]. Using more counters than this limit resulted in either very low values or no values at all for any running processes. This limit does not affect software performance counters, however, since they do not make use of any hardware registers. Another possible limitation comes from perf_event_open opening a new file descriptor for each event and process. A large number of processes and counter events can thus lead to reaching the per-process file descriptor limit (1024 in this case). This limit can be changed, but changing it should be done with care. Because of the limited number of counters that could be used at the same time, performance counter data for the events was collected during three separate runs for each scenario. The INSTRUCTIONS event was however collected in every run, since if an event is compared to the total number of instructions, it is more reasonable to have measured the instructions event during the same run in case there are significant differences across runs.
It was also made sure that events that could be used together for calculating various statistics were measured in the same run. For example, since the LLC cache miss rate is calculated by dividing LLC cache misses by LLC cache references, these were measured in the same run. Each run consisted of running the corresponding scenario while at the same time running the measurement tool for 150 iterations, with one second between each iteration. In each iteration, the tool scanned for running processes and sampled counter values before resetting the counters again. It is important to note that while counters were sampled in the sense that they were read and then reset in each iteration, they were still used in counting mode and not in sampling mode, which means something slightly different in the context of perf_event_open. For each complete run, the first 120 samples for the relevant processes in each scenario were gathered. From this data, averages for every event could be analyzed and compared across the different scenarios. The INSTRUCTIONS event was also used to provide normalized averages of each event.
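The averaging and instruction normalization described above (later used for Tables 4.2 and 4.3) can be sketched as follows; the data layout is a simplification and the function name is hypothetical:

```python
def summarize(event_samples, instruction_samples, scale=1000):
    """Average an event's per-iteration samples and normalize to the
    average instruction count, scaled by 1000.

    `event_samples` and `instruction_samples` are per-iteration counts
    for the same process, measured during the same run."""
    avg_event = sum(event_samples) / len(event_samples)
    avg_instr = sum(instruction_samples) / len(instruction_samples)
    return avg_event, scale * avg_event / avg_instr
```

Measuring the INSTRUCTIONS event in every run is what makes the second return value meaningful, since it divides by instruction counts taken from the same run.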

3.4 Detection

This section describes the approach used to create a detection method for LVI. Note that this section builds upon the results from the tests outlined in Section 3.3, which are presented in Section 4.1. Based on the data measuring the impact of LVI attacks on different performance counter events, the events with the most promise for detecting an LVI attack could be determined (Subsection 3.4.1). Threshold-based detectors using these counter events could then be used in order to answer the question of whether the LVI attacks are theoretically possible to detect using performance counters. A cache side-channel attack like Flush+Reload at the end of the LVI attack loop was assumed, and no changes to the PoC attacks were made.

3.4.1 Chosen attack indicators

Given the results in Section 4.1, the performance events that were deemed to show the most promise for detection were the total number of instructions, page faults (total as well as minor

and major), LLC cache misses, and LLC cache references. These events (except instructions) showed a large difference between attack and no-attack LVI scenarios, and were quite significant in attack scenarios compared to other, benign scenarios. Page faults, LLC cache misses and references can also be directly tied to the behaviour of an LVI attack, which is not true for many of the other events. For the page faults, a distinction was made between valid and invalid page faults, since the invalid page faults seemed to be the best indicator of an attack. Since no counter event exists for specifically invalid page faults, a high number of total page faults combined with low numbers of both minor and major faults was used to indicate a high number of invalid faults.
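The invalid-page-fault indicator described above can be sketched as a simple predicate; the threshold values here are illustrative placeholders, not the thresholds actually used:

```python
def looks_like_invalid_faults(total, minor, major,
                              total_min=1000, minor_max=600, major_max=10):
    """Heuristic for a high number of "invalid" page faults: the total
    page-fault count is high while minor and major fault counts stay low.
    Threshold values are illustrative placeholders only."""
    return total > total_min and minor < minor_max and major < major_max
```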

3.4.2 Detectors

We note that while the two LVI-US variants could potentially be detected by just using the counts of total and minor page faults, LVI-PPN-L1D needs to be detected based on features related to the cache side-channel, since the invalid page faults are not visible through the performance counters in that scenario. The detection was therefore split into two separate detectors, hereafter referred to as DET_LVI_US and DET_LVI_PPN. These custom detectors were integrated into the detection module of the HexPADS tool. Apart from changing which counters the tool measured and extending the monitored processes to root processes and not only user processes, the other components of the detection tool were mostly left as is.

DET_LVI_US

For LVI variants where the invalid page faults are recorded by the performance counters, a detector was designed based on the number of total page faults being above a certain threshold and the numbers of minor and major page faults being below certain thresholds. Thresholds for the other attack indicators mentioned in Subsection 3.4.1 (instructions, LLC misses and LLC references) were also used. Finally, the LLC miss rate being above a certain threshold was also used for this detector. Pseudocode for DET_LVI_US is shown in Figure 3.3, with constants for all thresholds. How the actual thresholds were later chosen (see Section 4.2) differed slightly depending on the indicator, but the thresholds were mainly formed through observations of the LVI attacks and the data from the results of the tests described in Section 3.3.2. When creating such thresholds, one needs to keep in mind that they should be high enough to lessen the possibility of false positives. However, they should also not be too tailored to the observed data specifically, but instead give some margin for the attack to change behaviour. This is especially true since the attacks studied in this work are PoCs, and do not represent real-life attacks.


cache_miss_rate = cache_misses / cache_references;
if (total_page_faults > TPF &&
    minor_faults < MINF &&
    major_faults < MAJF)
{
    // Attack potentially detected! (Partial detection)
    if (instructions > INSTR_1 &&
        cache_references > CR_1 &&
        cache_misses > CM_1 &&
        cache_miss_rate > CM_RATE_1)
    {
        // Attack detected! (Full detection)
    }
}

Figure 3.3: Pseudocode for DET_LVI_US, detector for the LVI-US-SB attack variants.

When evaluating the detection method (see Section 4.3), it was noted whenever a process matched all the attack indicators (Full detection). However, in order to evaluate the detection capabilities of just using the page fault characteristics, it was also noted whenever a running process matched these initial attack indicators (Partial detection). This is visualized by the comments in pseudocode shown in Figure 3.3.

DET_LVI_PPN

The second detector, DET_LVI_PPN, is similar to the first one, except that it cannot detect based on a high total number of page faults, since the results showed that the invalid page faults caused by LVI-PPN-L1D were not reported to the performance counters (leading to a low number of total page faults). Instead, another attack indicator was added in an attempt to minimize the possibility of false positives. From the results it was observed that the counter values of many benign applications varied a lot between samples, while the LVI processes showed very little variation between samples, since they all run a continuous attack loop without major variations between iterations. A measure of this variation could be obtained by dividing the total number of instructions for the current iteration by the same value from the previous iteration. In order to fully detect an attack with DET_LVI_PPN, this value was assumed to be within a certain interval. While this indicator can increase the time to detect an attack, it can also lessen the risk of false positives. How necessary this final indicator is was tested during evaluation (Section 4.3) by noting every process that matched all earlier indicators (Partial detection) as well as every process that also matched the final indicator (Full detection), using the same approach as for DET_LVI_US.


cache_miss_rate = cache_misses / cache_references;
instr_diff = instructions / instructions_prev_iteration;
if (total_page_faults < TPF &&
    minor_faults < MINF &&
    major_faults < MAJF &&
    instructions > INSTR_2 &&
    cache_references > CR_2 &&
    cache_misses > CM_2 &&
    cache_miss_rate > CM_RATE_2)
{
    // Attack potentially detected! (Partial detection)
    if (instr_diff > IDIFF_LOW && instr_diff < IDIFF_HIGH)
    {
        // Attack detected! (Full detection)
    }
}

Figure 3.4: Pseudocode for DET_LVI_PPN, detector for the LVI-PPN-L1D attack.

3.5 Evaluation

In order to evaluate the effectiveness of the detectors for LVI and provide an answer to the second research question, the same scenarios described in Section 3.3.2 were used to offer both attack and non-attack scenarios. In addition to these, an idle scenario (only the detection tool running) and a benchmarking scenario using Geekbench5 [28] were also added. The Geekbench5 benchmark tests a system by running several different (both single-core and multi-core) CPU workloads performing certain tasks, like AES, image compression, ray tracing and more. The scenarios were run for 120 seconds while the detection tool measured performance counters for all running processes (with the standard sampling interval of 1 s) and noted whenever a potential attack was detected. It was also noted whether a potential attack process matched all the attack indicators (Full detection) or only some of them (Partial detection).

3.5.1 Sampling interval, speed and overhead

An effective detection tool should also be evaluated against criteria other than just detection accuracy. Speed of detection and overhead are both important properties of an effective detection tool, so these were also considered. For this detection tool, the main factor impacting both is the sampling interval, meaning the time between samples. A shorter interval between sample iterations can lead to faster detection of a potential attack, but also leads to more overhead since samples are gathered much more often. It can also impact the accuracy of the detection. The same scenarios as before were therefore evaluated again, but with a sampling interval of 100 ms instead of 1 s. Each scenario was again run for at least 120 seconds. The thresholds related to absolute counts, like the total number of page faults or instructions, were simply divided by 10. Thresholds regarding the cache miss rate or the instruction difference across samples were left unchanged. Regarding the speed of detection, it was also noted (for both the 100 ms and the 1 s sampling interval) how many iterations it took for the detector to detect an LVI attack process. Finally, the overhead of the detection process was measured by running the detection tool at different sampling intervals and observing memory and CPU usage with the Linux top command.
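The rescaling of thresholds for the shorter sampling interval can be sketched as follows. Count-based thresholds scale with the interval, while rate and ratio thresholds are interval-independent; the threshold names and values are illustrative placeholders, not the actual thresholds used:

```python
def scale_thresholds(thresholds, old_interval_s=1.0, new_interval_s=0.1):
    """Rescale count-based detection thresholds for a new sampling
    interval, leaving rate/ratio thresholds unchanged. The set of
    count-based names below is a hypothetical example."""
    factor = new_interval_s / old_interval_s
    count_keys = {"TPF", "MINF", "MAJF", "INSTR", "CR", "CM"}
    return {name: (value * factor if name in count_keys else value)
            for name, value in thresholds.items()}
```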


3.5.2 System load

Another important aspect that can affect the effectiveness of a detection method is the system load. If multiple processes consume a lot of resources, the performance of the whole system is affected. As shown in prior work [34], different load scenarios can thus also impact the accuracy of detection methods using performance counters. While no extensive testing of how different load conditions impact LVI detection was done for this work, some evaluation was done by running several of the average and higher load scenarios (web browser, text editor, game, stress, Geekbench) at the same time, alongside an LVI attack and the detection tool (1 s sampling interval).

4 Results

This chapter presents the results that were obtained in this work. Section 4.1 presents the results regarding the impact of the LVI attack on performance counters. Section 4.2 presents the thresholds used for detecting the LVI attacks, while Section 4.3 presents the results of evaluating the detection method.

4.1 Attack impact on performance counters

This section presents the results from measuring the impact of LVI on performance coun- ters. Subsection 4.1.1 presents results for the tests outlined in Subsection 3.3.1 of the Method chapter, while the results in Subsection 4.1.2 correspond to Subsection 3.3.2.

4.1.1 Difference between attack variants

Table 4.1 shows the average values for every event and attack variant, calculated over 100 runs. As stated in the method, each performance event was measured separately. One of the more obvious observations one can make from the data in Table 4.1 is that the software events CONTEXT_SWITCH and MAJOR_FAULTS are zero for all attack variants. One can also see that for most of the other events, the three variants have very similar averages. Many of the minor differences for certain events are too small to draw any meaningful conclusions from, and can probably be explained simply by the fact that the main loops of the different variants differ in length. For example, the LVI-US-SB-ROP variant has a slightly higher average number of instructions, so it makes sense that it also has a slightly higher average of, for example, L1D_LOADS. Other deviations can occur due to background processes and differences in how the CPU schedules tasks, etc. The only event that shows any significant difference between attack variants is the total number of page faults. For the LVI-US-SB and LVI-US-SB-ROP variants, the average total number of page faults is the same at around 4 594. For the LVI-PPN-L1D variant, however, the same number only reaches around 500. Furthermore, the measurements of the total number of page faults for LVI-PPN-L1D seem to correspond well to the measurements of the number of minor page faults. This can be clearly seen in the graph shown in Figure 4.1, where the total page faults for each run are compared to the minor page faults in each run. Further testing with perf stat, measuring both total page faults and minor page faults at the same time


Event / Attack variant   LVI-US-SB       LVI-PPN-L1D     LVI-US-SB-ROP
INSTRUCTIONS             301 864 263     301 835 434     301 905 309
CYCLES                   1 813 396 070   1 813 939 022   1 814 248 665
CACHE_REF                4 621 505       4 648 929       4 769 514
CACHE_MISS               2 382 520       2 286 442       2 408 731
CONTEXT_SWITCH           0               0               0
BR_INSTR                 43 733 643      43 733 400      43 741 053
BR_MISS                  63 034          62 899          61 869
PAGE_FAULTS              4 594           499             4 594
MINOR_FAULTS             498             498             498
MAJOR_FAULTS             0               0               0
L1D_LOADS                110 233 156     110 217 199     110 251 029
L1D_LOAD_MISS            3 269 828       3 284 936       3 272 543
DTLB_LOADS               110 234 694     110 217 026     110 251 147
DTLB_LOAD_MISS           1 261 301       1 254 786       1 259 786
DTLB_STORES              86 666 442      86 658 249      86 674 773
DTLB_STORE_MISS          1 107 731       1 111 925       1 108 253

Table 4.1: Averages of performance counters for different LVI attack variants calculated over 100 runs of each attack & event combination, counters measured with perf stat.

when running the LVI application, showed that for the LVI-PPN-L1D variant the only visible measured page faults are minor page faults. For the other two variants, the average number of total page faults is much higher than the numbers of minor and major page faults combined. It therefore seems like invalid page faults are captured by the total page faults event, and that the majority of page faults for the LVI-US variants are the invalid page faults generated when the victim enclave is interrupted during the attack. These two variants use the same method to cause the page faults, so it makes sense that they are similar in this regard. However, the SGX-specific EPCM page faults caused by the LVI-PPN-L1D variant thus seem not to be recorded by the performance counters.
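As a quick sanity check of this interpretation, one can estimate the invalid page faults from the averages in Table 4.1 as the remainder after subtracting minor and major faults from the total; for LVI-US-SB this remainder happens to match the 4096 main-loop iterations per run. A minimal sketch (the function name is hypothetical):

```python
def estimated_invalid_faults(total, minor, major):
    """Estimate "invalid" page faults, which have no dedicated counter
    event, as the remainder after subtracting minor and major faults."""
    return total - minor - major

# Averages for LVI-US-SB from Table 4.1: 4 594 total, 498 minor, 0 major.
remainder = estimated_invalid_faults(4594, 498, 0)
```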

Figure 4.1: Number of total page faults and number of minor page faults for LVI-PPN-L1D.


4.1.2 Comparing to benign scenarios

Table 4.2 and Table 4.3 show the average of 120 samples of each event for the processes corresponding to each scenario described in earlier chapters. The values shown in Table 4.3 have been normalized by dividing by the average total number of instructions recorded during that run, and then scaled by a factor of 1000. Note that not all events were recorded during the same run. Also note that for some of the scenarios there were several related processes, and that the results shown for these scenarios in the tables below correspond to only one of these processes. For scenarios with several related processes, see Appendix A for data on additional processes.

Page faults

One can see in Table 4.2 that the number of page faults for the two LVI-US attack variants is quite large compared to other processes. While the number of page faults for stress_m is very high, further examination with the perf command line tool showed that all of these were minor page faults. The LVI attacks, on the other hand, generate a lot of visible page faults in the cases of LVI-US-SB and LVI-US-SB-ROP, but almost none of these are minor faults. In fact, it could be seen that the number of minor page faults for an LVI-US-SB attack scenario is the same as when the LVI application is run without an attack taking place. A high total of page faults paired with low numbers of minor and major faults therefore seems like a good indicator of an LVI attack, at least for the two LVI-US variants.

Cache misses and references

Compared to the No Attack (NA) scenarios, the averages for CACHE_REF and CACHE_MISS are much higher for the LVI attacks, both in terms of raw averages (Table 4.2) and normalized averages (Table 4.3). This makes sense due to the cache side-channel used by all the LVI attack variants, which causes higher numbers of cache misses but also of cache references. The raw averages are also quite high compared to other, benign scenarios when one looks at the totals (Table 4.2). However, when normalized to the number of instructions (Table 4.3), they do not stick out nearly as much. One can also see that the cache miss rate, taken as the ratio between average cache misses and average cache references, is much higher for the LVI attack variants than for the LVI example applications without an attack. However, this ratio is also significantly higher for other scenarios, like the Firefox, Game and stress_m scenarios. For the Game scenario, the cache miss rate for the process shown in Table 4.2, and for many of the additional processes (see Appendix A), is close to or even surpasses 90%. Many of these processes are related to the Steam platform, and even without a game running they can reach very high cache miss rates. A detection method based only on a high cache miss rate would therefore be almost guaranteed to lead to false positives for several of the benign scenarios. Even when combined with, for example, the total number of cache misses, it can lead to false positives. For example, false positives were experienced for several scenarios with the original version of HexPADS, which uses thresholds based on total cache misses, a high cache miss rate and a low rate of page faults. While no thorough testing was done with the original version of HexPADS, false positives were experienced multiple times for processes related to Steam, the Atom text editor and the Firefox web browser.
While they would have to be paired with other attack indicators in order to avoid false positives, the large difference between attack and no-attack scenarios still indicates that cache references and cache misses could be good indicators of an attack.
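The miss-rate comparison above can be reproduced directly from the averages in Table 4.2; a minimal sketch:

```python
def cache_miss_rate(misses, references):
    """LLC cache miss rate: the ratio of average LLC misses to
    average LLC references, measured in the same run."""
    return misses / references

# Averages from Table 4.2 (LVI-US-SB attack vs. its No Attack baseline):
attack_rate = cache_miss_rate(2_786_524, 8_847_386)
baseline_rate = cache_miss_rate(4_962, 179_179)
```

The attack scenario's miss rate comes out around an order of magnitude above its no-attack baseline, matching the discussion above.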


Event / Scenario   LVI-US-SB     LVI-PPN-L1D   LVI-US-SB-ROP   NA (SB)       NA (L1D)      NA (SB-ROP)
INSTRUCTIONS       321 959 222   322 566 762   322 573 658     364 360 594   366 599 631   365 612 496
CACHE_REF          8 847 386     9 039 289     9 046 131       179 179       159 473       151 051
CACHE_MISS         2 786 524     2 815 215     2 808 805       4 962         6 029         5 547
BR_INSTR           48 135 524    48 237 409    48 227 421      59 736 716    60 164 538    59 963 871
BR_MISS            86 638        96 423        84 490          303 263       307 532       303 803
PAGE_FAULTS        10 545        3             10 555          0             0             0
L1D_RA             110 111 185   110 263 070   110 215 485     107 063 192   107 233 137   107 308 894
L1D_RM             5 665 740     5 677 329     5 629 507       10 247 301    10 185 916    10 029 212
DTLB_RA            110 306 150   110 324 717   110 396 478     106 986 705   107 292 913   107 141 397
DTLB_RM            3 245 746     3 224 027     3 184 735       2 005 517     1 835 192     2 015 137
ITLB_RA            142           202           119             1 286         1 072         1 210
ITLB_RM            440 423       441 517       424 313         1 789 295     1 798 069     1 753 699

Event / Scenario   Text editor   Firefox - youtube   Firefox - twitter   Game
INSTRUCTIONS       30 262 086    5 658 974           30 403 103          917 409
CACHE_REF          4 465 897     898 182             4 529 313           414 827
CACHE_MISS         2 412 578     408 687             2 165 473           359 319
BR_INSTR           6 773 521     1 303 641           6 861 281           183 467
BR_MISS            372 262       71 821              376 110             22 996
PAGE_FAULTS        24            25                  58                  0
L1D_RA             7 222 439     1 952 597           9 440 485           260 499
L1D_RM             667 201       181 662             874 141             31 122
DTLB_RA            6 923 730     1 585 204           11 079 259          344 023
DTLB_RM            73 326        14 081              68 328              5 704
ITLB_RA            72 560        32 937              200 283             3 201
ITLB_RM            77 837        15 102              73 985              8 774

Event / Scenario   stress_c        stress_m    stress_i
INSTRUCTIONS       6 244 337 465   6 276 236   209 366
CACHE_REF          1 841           942 634     170 119
CACHE_MISS         378             768 787     379
BR_INSTR           1 555 152 677   2 281 899   125 621
BR_MISS            2 969 789       82          314
PAGE_FAULTS        0               563 245     0
L1D_RA             1 747 086 675   564 114     43 564
L1D_RM             993             810 866     54 963
DTLB_RA            1 747 902 258   569 712     41 894
DTLB_RM            4               568 479     407
ITLB_RA            880             415         23 405
ITLB_RM            4               258         406

Table 4.2: Averages of different performance counter events for different scenarios. Results for scenarios related to the LVI PoCs are shown in the top table. The middle table shows results for the more normal and benign computer usage scenarios while the lower table shows results for scenarios related to the stress benchmark tool.


Event / Scenario   LVI-US-SB   LVI-PPN-L1D   LVI-US-SB-ROP   NA (SB)   NA (L1D)   NA (SB-ROP)
INSTRUCTIONS       1000.00     1000.00       1000.00         1000.00   1000.00    1000.00
CACHE_REF          27.48       28.02         28.04           0.49      0.44       0.41
CACHE_MISS         8.65        8.72          8.71            0.01      0.02       0.02
BR_INSTR           149.51      149.54        149.51          163.95    164.12     164.01
BR_MISS            0.27        0.30          0.26            0.83      0.84       0.84
PAGE_FAULTS        0.03        0.00          0.03            0.00      0.00       0.00
L1D_RA             342.05      341.94        341.92          293.27    293.23     293.45
L1D_RM             17.60       17.60         17.46           28.07     27.85      27.43
DTLB_RA            342.05      342.06        342.06          293.27    293.23     293.45
DTLB_RM            10.07       10.00         9.87            5.50      5.02       5.52
ITLB_RA            0.00        0.00          0.00            0.00      0.00       0.00
ITLB_RM            1.37        1.37          1.31            4.90      4.91       4.80

Event / Scenario   Text editor   Firefox - youtube   Firefox - twitter   Game
INSTRUCTIONS       1000.00       1000.00             1000.00             1000.00
CACHE_REF          147.57        158.72              148.98              452.17
CACHE_MISS         79.72         72.22               71.23               391.67
BR_INSTR           223.83        230.37              225.68              199.98
BR_MISS            12.30         12.69               12.37               26.07
PAGE_FAULTS        0.00          0.00                0.00                0.00
L1D_RA             261.75        277.19              271.60              345.68
L1D_RM             24.18         25.79               25.15               41.30
DTLB_RA            262.07        279.89              271.33              346.84
DTLB_RM            2.78          2.49                1.67                5.75
ITLB_RA            2.75          5.82                4.90                3.23
ITLB_RM            2.95          2.67                1.81                8.85

Event / Scenario   stress_c   stress_m   stress_i
INSTRUCTIONS       1000.00    1000.00    1000.00
CACHE_REF          0.00       150.19     812.54
CACHE_MISS         0.00       122.49     1.81
BR_INSTR           249.05     363.58     600.01
BR_MISS            0.48       0.01       1.50
PAGE_FAULTS        0.00       90.88      0.00
L1D_RA             279.94     91.02      199.82
L1D_RM             0.00       130.83     252.10
DTLB_RA            279.94     91.07      199.82
DTLB_RM            0.00       90.87      1.94
ITLB_RA            0.00       0.07       111.63
ITLB_RM            0.00       0.04       1.94

Table 4.3: Averages of different performance counter events for different scenarios, normalized to the total number of instructions and scaled by a factor of 1000.


Other events

Judging by the results, branch instructions and branch mispredictions do not seem to be good indicators of an ongoing LVI attack. Both the total number of branch instructions and the number of branch misses (Table 4.2) increase in the LVI no-attack scenarios, and the ratio of branch mispredictions to total instructions (Table 4.3) is higher for many other scenarios. It is reasonable that branch instructions and mispredictions do not make good indicators of an ongoing attack: while LVI is similar to Spectre in some ways, it does not use branch mispredictions to hijack control flow. The L1D cache read accesses and misses do not seem to be good indicators either. Firstly, the number of L1D read accesses stays roughly the same for the no-attack scenarios while the read misses increase (Table 4.2). While the total number of instructions is larger for the no-attack scenarios, these events do not stick out in any particular way in comparison to the total number of instructions either (Table 4.3). One can also note that the total number of L1D read misses (Table 4.2) outnumbers the L1D read accesses for the stress_m and stress_i scenarios, making the ratio between misses and accesses unreliable. While low dTLB miss rates have been used (together with a high cache miss rate) in earlier work to detect CSCAs [39], the ratio of dTLB read misses to dTLB read accesses for LVI does not seem to be particularly low here compared to other, benign scenarios. This is true even for several scenarios with a rather high cache miss rate (e.g. Game), so using a high cache miss rate together with a low dTLB miss rate does not seem to be a good enough indicator. In the work by Gruss et al. [16], iTLB events were used to normalize other events. However, regarding the numbers of iTLB read accesses and iTLB read misses (Table 4.2), it can immediately be noted that the average number of misses is often larger than the average number of accesses.
For the LVI scenarios, both with and without actual attacks, this difference is quite significant. This differs from the results in [16], and might be because the iTLB miss event is mapped to a different actual hardware event on this system. For this reason, the iTLB events were considered unsuitable for creating detection thresholds, or for normalizing events as in [16].

Fluctuation of data

It's important to note that for some of the scenarios, the counter values showed large fluctuations, meaning that the maximum values for those counters were much larger than the averages. Figure 4.2 illustrates this by comparing the total number of instructions over 50 samples for LVI-US-SB and two benign scenarios.


(a) Total number of instructions for the LVI-US-SB and Firefox - Twitter scenarios.

(b) Total number of instructions for the LVI-US-SB and Text editor scenarios.

Figure 4.2: Fluctuations in total number of instructions.

When the total number of instructions varies, variations naturally appear for the other events as well. The scenarios with the largest fluctuations were the text editor, web browsing, and game scenarios. This could be expected, since those scenarios vary more in their behaviour during the course of the samples. For example, the Firefox scenario shows quite spiky behaviour, where the larger spikes might correspond to the bottom of the Twitter feed being reached and older content being loaded. The large spike for the Text editor scenario around sample 5 could perhaps correspond to when the new file was actually created, which could demand more instructions than just writing to the existing file. While these fluctuations can be expected, they are important to keep in mind when looking at the averages for those scenarios. In comparison, the LVI attack scenarios show no large fluctuations for either counter, since an attack simply runs its attack loop continuously for several iterations and doesn't change behaviour significantly across sample iterations. The same is true for the stress benchmark scenarios.

Another thing to note is that the very first sample for every scenario gave counter values that were significantly smaller than the rest of the samples. This is most easily seen by looking at the graphs for LVI-US-SB in Figure 4.2. For many scenarios, this first sample value would even be zero. Interestingly, the same behaviour could be observed even when changing the sampling interval to 100 ms or even 10 ms: just the first sample would be unusually low and non-representative. This is probably due to there being a shorter time span between the initialization of the counters and the first reading than between subsequent readings.
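In practice, this first-sample anomaly can be handled by simply discarding the initial sample of each run before computing statistics. A minimal sketch (not part of the implemented tool):

```python
def drop_first_sample(samples):
    """Discard the first counter reading of a run: the time span between
    counter initialization and the first read is shorter than between
    subsequent reads, making that sample unrepresentative (often zero)."""
    return samples[1:]
```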

4.2 Detection thresholds

This section presents the detection thresholds used for the detectors described in Section 3.4. Subsections 4.2.1 and 4.2.2 present the thresholds used for the DET_LVI_US and DET_LVI_PPN detectors respectively.

4.2.1 DET_LVI_US

As mentioned in Subsection 3.4.2, the detection thresholds should not be so low as to give false positives, while at the same time not so high that they become too adapted to the specific PoC examples used here. With this in mind, it was noted that the average number of page faults for the LVI-US-SB attack was 10 545 according to the data in Table 4.2, while the benign process with the highest number of total page faults (apart from the outlier scenario stress_m) was Web Content(1) in the Firefox - Twitter scenario (Appendix A), with 2 322 page faults. By taking the midpoint between these two numbers (the value with the maximum distance to both) and rounding it to the nearest thousand, a threshold of 6000 was derived. Since the numbers of minor and major faults were observed to be consistently zero for the attacks (except during the initial setup phase), these were assumed to be zero or close to zero. Since major faults are less frequent than minor faults in other processes, the threshold for major faults was set to 0 while the minor faults threshold was set to 10. As for the total number of instructions, cache references and cache misses (for the current sample), their thresholds were chosen as approximately one third of the values observed when running the attacks. The LVI attacks consistently reached cache miss rates of over 0.3, so this was also used as a threshold. It should be noted that this cache miss rate is quite low, especially compared to other attacks that make use of CSCAs. When investigating a version of the original PoC of Spectre, for example, cache miss rates of over 0.8 were consistently measured. All the chosen values for the thresholds used with the DET_LVI_US detector are shown in Table 4.4.

Threshold    Value
TPF (>)      6000
MINF_1       10
MAJF_1       1
INSTR_1      100000000
CR_1         3000000
CM_1         1000000
CM_RATE_1    0.3

Table 4.4: Chosen values for the thresholds used with the DET_LVI_US detector.
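The page-fault threshold above can be reproduced with a small calculation. This is a sketch of the derivation described in the text, not code from the detection tool:

```python
def derive_threshold(attack_avg, benign_max, round_to=1000):
    """Take the midpoint between the attack average and the highest
    benign value (maximizing the distance to both) and round it to the
    nearest multiple of round_to."""
    midpoint = (attack_avg + benign_max) / 2
    return int(round(midpoint / round_to) * round_to)

# 10 545 page faults (LVI-US-SB) vs 2 322 (Web Content(1)) -> 6000
```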

4.2.2 DET_LVI_PPN

Since DET_LVI_PPN couldn't rely on invalid page faults as an attack indicator, the risk of false positives was deemed to be larger for this detector. The thresholds for the number of instructions, cache references and cache misses were therefore increased. Furthermore, the number of minor page faults was assumed to be zero instead of close to or equal to zero. Finally, the value for the difference in instructions between iterations was assumed to be within 0.9-1.1. When observing the LVI attacks, this value was found to often be less than a hundredth away from 1. The specific values used as thresholds for the DET_LVI_PPN detector are shown in Table 4.5.

Threshold    Value
TPF (<)      6000
MINF_2       1
MAJF_2       1
INSTR_2      150000000
CR_2         6000000
CM_2         2000000
CM_RATE_2    0.3
IDIFF_LOW    0.9
IDIFF_HIGH   1.1

Table 4.5: Chosen values for the thresholds used with the DET_LVI_PPN detector.
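The combined per-sample check performed by DET_LVI_PPN can be sketched as below. The comparison directions follow the descriptions in this section and Table 4.5 but are otherwise an assumption, and the dictionary field names are hypothetical:

```python
def det_lvi_ppn_sample(s, th):
    """One sample check for DET_LVI_PPN: total page faults below the
    threshold, (close to) zero minor/major faults, instruction and cache
    counts above their thresholds, a high cache miss rate, and an
    instruction-difference ratio between iterations close to 1."""
    return (s["total_page_faults"] < th["TPF"]
            and s["minor_faults"] < th["MINF_2"]
            and s["major_faults"] < th["MAJF_2"]
            and s["instructions"] > th["INSTR_2"]
            and s["cache_refs"] > th["CR_2"]
            and s["cache_misses"] > th["CM_2"]
            and s["cache_misses"] / s["cache_refs"] > th["CM_RATE_2"]
            and th["IDIFF_LOW"] <= s["instr_diff"] <= th["IDIFF_HIGH"])
```

Only when every indicator holds is the sample counted towards a detection, which is why the final instruction-difference indicator matters for separating attacks from benign processes that happen to match the other thresholds.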

4.3 Evaluation of detection

This section presents the results from evaluating the detection tool with the tests outlined in Section 3.5. The results from running the detection tool (sampling interval 1 s) during different attack and non-attack scenarios are shown in Table 4.6.

Scenario            DET_LVI_US          DET_LVI_PPN
                    Partial   Full      Partial   Full
LVI-US-SB           X         X         x         x
LVI-PPN-L1D         x         x         X         X
LVI-US-SB-ROP       X         X         x         x
NA (SB)             -         -         -         -
NA (L1D)            -         -         -         -
NA (SB-ROP)         -         -         -         -
Text editor         -         -         -         -
Firefox - Youtube   -         -         -         -
Firefox - Twitter   -         -         -         -
Game                -         -         -         -
Stress_c            -         -         -         -
Stress_m            -         -         -         -
Stress_i            -         -         -         -
Idle                -         -         -         -
Geekbench5          -         -         X         -

Table 4.6: Detection results during different scenarios, each running for at least 120 seconds with a sampling interval of 1 s. An "X" in an attack scenario row means that the attack process was correctly reported as an attack (true positive); an "X" in a benign scenario row means that a non-attack process was reported as an attack at least once during the scenario (false positive). The "x" symbol corresponds to false negatives, while "-" corresponds to true negatives.

As can be seen in Table 4.6, each attack scenario was fully detected by the suitable detector, and no other scenario led to a benign process being fully detected as an attack. The LVI-US-SB attacks were both detected after either 2 or 3 sample iterations of the detector. This could vary depending on the number of minor page faults caused during the setup phase of the attack (creation of the enclave, etc.), which could remain above the threshold for multiple sample iterations. Afterwards, however, the attacks were consistently and reliably detected. The LVI-PPN-L1D attack was also detected fully and consistently with the DET_LVI_PPN detector, but it should be noted that Partial detection took 2 sample iterations while Full detection took 3, since the transition from the setup phase to the main attack loop in the attack application also means a bigger difference in the number of instructions. This shows that the attack has to run for a longer time in order to match the final indicator. But as can be seen for the Geekbench scenario, which was reported as a potential attack (Partial) twice, not having the final indicator would have led to benign processes being fully reported as attacks.

4.3.1 Sampling interval, speed, and overhead

The detection results from using a sampling interval of 100 ms are shown in Table 4.7. From the table one can see that the DET_LVI_US detector behaves the same with a faster sampling rate, with no false positives. The attacks were detected after 6 or 7 sample iterations, meaning they were detected faster this time. It was however observed that Full detection wasn't as consistent during the 120 second runtime of the scenario, due to the cache miss rate sometimes dropping below 0.3. The rate was however always over 0.25, and the results in Tables 4.6 and 4.7 seem to indicate that the final attack indicators of the DET_LVI_US detector could be lowered or even removed entirely for it to be a reliable detector.

Scenario            DET_LVI_US          DET_LVI_PPN
                    Partial   Full      Partial   Full
LVI-US-SB           X         X         X*        x
LVI-PPN-L1D         x         x         X         X
LVI-US-SB-ROP       X         X         x         x
NA (SB)             -         -         X*        -
NA (L1D)            -         -         -         -
NA (SB-ROP)         -         -         -         -
Text editor         -         -         X*        X*
Firefox - Youtube   -         -         X*        -
Firefox - Twitter   -         -         X*        X*
Game                -         -         X*        -
Stress_c            -         -         X*        -
Stress_m            -         -         X*        -
Stress_i            -         -         -         -
Idle                -         -         X*        -
Geekbench5          -         -         X         X

Table 4.7: Detection results during different scenarios with a sampling interval of 100 ms. A star (*) next to an "X" symbol means that a process not directly tied to the specific scenario was reported as a potential attack.

The DET_LVI_PPN detector, on the other hand, showed much worse detection results with a shorter sampling interval. While the LVI-PPN-L1D variant was still fully detected (after 7 sample iterations), false positives were observed in the majority of scenarios, even with all attack indicators. It should be noted that for most of these scenarios (like LVI-US-SB), the false positives came not from processes directly tied to the scenarios, but from processes like gnome-terminal-server and gnome-shell. The Geekbench scenario, which showed a false positive for Partial detection with a 1 s sampling interval, was now reported as an attack even with all attack indicators, since the difference in number of instructions has a higher chance of being small with a shorter time between samples. For this reason, the difference in number of instructions between samples doesn't seem reliable enough as a final attack indicator, since benign processes that match the previous indicators can also display small differences between samples under certain circumstances. It was also observed that for some of the scenarios (e.g. Geekbench), it would be quite hard to adjust any of the thresholds further to avoid false positives without also making the detector unable to detect the LVI attack.

Speed of detection

The results from the tests showed that when LVI attacks were detected, they were detected within the first few iterations of the detection tool, and with a sampling interval of 1 s a Full detection took at most 3 iterations (approximately 3 s). How fast a detection method should be able to detect a side-channel attack for it to be effective depends on multiple factors. The type of side-channel, the kind of application the attack is targeting, and what secret data it's trying to leak all affect how fast an attack can extract the secret data. A side-channel attack also doesn't always have to finish completely for an attacker to gain enough data, for example in the case of secret key retrieval. According to Mushtaq et al. [34], an attacker only needs to complete 50 % of an attack in order to gain enough data to later reconstruct the complete secret key through other means. In the case of LVI, it is somewhat hard to say whether the detection speed presented here is enough, since the PoCs studied in this work do not represent real attacks. In the case study attack on AES-NI discussed in the original LVI paper [53], however, the authors note that their attack took on average 25.94 seconds to complete (including the time needed to create the enclave) and required over 240 000 executions of the AES function. The detection speeds demonstrated in this work are therefore deemed sufficient to prevent a more realistic attack (assuming it could be detected within a similar time frame).

Overhead

Table 4.8 shows the approximate usage of CPU resources at different sampling intervals, measured with the Linux top command. From the table it can be seen that longer time intervals give relatively low overheads in terms of CPU usage, but that this increases considerably with shorter time intervals. The memory usage was low enough for the tool to report 0 % usage for all sampling intervals.

Sampling interval   1000 ms   100 ms   10 ms
CPU usage (%)       ≈ 0.7     ≈ 5.0    ≈ 31.9

Table 4.8: Overhead of the detection process at different sampling intervals. Each value was taken as the median of 100 samples taken with top.

4.3.2 System load

When running several of the average and higher load scenarios alongside an LVI attack and the detection tool, it was observed that the DET_LVI_US detector could still reliably detect LVI-US-SB without giving false positives. However, the DET_LVI_PPN detector reported numerous false positives for several processes tied to the scenarios as well as unrelated processes like gnome-terminal-server. It was also observed that while LVI-PPN-L1D was still reliably detected, the total instructions showed more fluctuation, indicating that higher system loads could lead to worse detection with that as an attack indicator.

5 Discussion

This chapter provides a discussion of the work done in this thesis. First, the results and their possible implications are discussed. In the next section, the method and possible improvements to it are discussed. Section 5.3 aims to provide an answer to the third research question stated in Section 1.3, while Section 5.4 ties back to some of the existing SGX defenses brought up in Section 2.5 and discusses whether they could be used against LVI. The final section offers some discussion of various societal factors (among others) related to the work.

5.1 Results

The results presented in the previous chapter indicate that the LVI-US-SB attack variant could be reliably detected using performance counters, without false positives in normal, benign computer scenarios and with reasonably good speed and overhead. It also seems that even if another type of covert channel than the cache is used, events related to the invalid page faults caused by the attack might be enough on their own for it to be detectable. However, the results also seem to indicate that the LVI-PPN-L1D variant studied in this work doesn't show enough of a distinct attack pattern to be detected reliably without also giving false positives for benign processes. One of the main reasons for this is that it is detected based on features related to the cache side-channel, but the performance counters don't show a high enough cache miss rate for it to be reliably differentiated from many other, benign processes. At the same time, LVI-PPN-L1D doesn't seem to show any other obvious attack indicators observable with performance counters, other than those related to that cache side-channel. While there are multiple areas of improvement to the method used in this work (see Section 5.2), the results observed here indicate that the LVI attacks can't be reliably detected based on the cache side-channel alone using the method presented here, especially given that the cache miss rate was shown to be relatively low, at least for the PoCs investigated in this work. The reasons for this relatively low cache miss rate could perhaps have something to do with LVI targeting SGX, or with the simple nature of the PoC examples. What these results could mean for other LVI attack variants and SGX side-channel attacks in general is discussed in Section 5.1.1. Note that the implications discussed in this section are mostly theoretical, since a detection method for SGX-based SCAs has many practical limitations, as discussed in Section 5.3.


5.1.1 Possible implications of results for LVI

Looking at the original classification tree of LVI attacks in Figure 2.2, many possible LVI attack variants have been suggested. Two of these were investigated in this work (LVI-US-SB, LVI-PPN-L1D), of which the LVI-US-SB variant was shown to be theoretically possible to detect with performance counters without false positives for benign processes. One can note that while there are multiple suggested LVI-US attack variants based on which microarchitectural buffer is targeted, this wouldn't necessarily matter for a detection approach similar to the one used in this work. The performance counter events used are only related to the page faults used to preempt enclave execution, as well as the cache covert channel used to transmit enclave data. As such, the results for this particular variant should be transferable to all other LVI-US variants, and maybe even to the LVI-P variants, since they don't rely on the SGX-specific EPCM page faults either. Moreover, since the results seem to indicate that the invalid page faults alone could be enough to indicate an attack, a cache side-channel doesn't necessarily have to be assumed. So if an attack would make use of another type of covert channel, or use a stealthier cache attack variant like Flush+Flush [16], the attack might still be detectable with performance counters.

Finally, the majority of LVI variants rely on page faults, but a few rely on microcode assists instead. While none of the investigated PoC attacks used microcode assists, Intel Skylake processors do have two HPC events, FP_ASSIST.ANY and OTHER_ASSIST.ANY, that measure microcode assists [20] and that can be accessed using the raw event functionality of perf_event_open [24]. These events may make it possible to detect the LVI-MCA variants as well.
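On x86, such raw core-PMU events are passed to perf_event_open with type PERF_TYPE_RAW, where (per the perf_event_open(2) manual page) the config value is typically formed from the event select and unit mask bytes. A sketch of the encoding; the concrete event/umask codes used below are placeholders for illustration, and the real codes for FP_ASSIST.ANY and OTHER_ASSIST.ANY must be looked up in Intel's documentation for the specific processor:

```python
def raw_pmu_config(event_select, umask):
    """Encode an x86 core PMU event for perf_event_open(type=PERF_TYPE_RAW):
    bits 0-7 hold the event select, bits 8-15 the unit mask
    (see perf_event_open(2), 'PERF_TYPE_RAW')."""
    return (umask << 8) | event_select

# Hypothetical codes for illustration only; verify the actual
# FP_ASSIST.ANY / OTHER_ASSIST.ANY encodings in the Intel SDM.
config = raw_pmu_config(0xCA, 0x1E)
```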
LVI-PPN variants, however, rely on a specific type of page fault which was shown not to be visible through the performance counters, which makes them hard to detect, since events related only to a cache side-channel at the end of the attack loop don't seem to show enough of a distinct pattern that can be mapped to an attack reliably. Additionally, if a stealthier cache attack technique like Flush+Flush, or a different type of side-channel entirely, is used, the attack becomes even less likely to be detected.

LVI outside SGX

In the original LVI paper [53], the authors suggested that some LVI variants (LVI-P and LVI-AD variants) may be exploitable in contexts outside SGX, like cross-process and user-to-kernel scenarios. The LVI-AD-LFB attack by Bitdefender researchers [30] also provided a PoC of such a scenario (cross-process). There are however questions regarding how realistic non-SGX LVI exploits are in real life. Even in the SGX context, there are numerous complex steps needed to perform the attack, as well as many requirements on the victim code in terms of exploitable code gadgets. These issues become even more complex in non-SGX contexts where the OS/VMM is not assumed to be compromised, since the attacker has far fewer capabilities. There are also features that, when correctly used, could raise the bar for LVI exploits in these scenarios considerably, like Supervisor Mode Access Prevention (SMAP) [53]. Intel themselves don't seem to regard LVI as a particularly realistic exploit in non-SGX contexts, due to the many requirements put on victim code and the increased difficulty of causing faults/assists during victim execution [10]. As previously mentioned, the authors of the original paper also state that they consider LVI to be mainly relevant for SGX. However, they do encourage future work to research LVI in non-SGX settings.

5.2 Method

This section presents a discussion of several aspects of the method used in this work. Subsection 5.2.1 discusses the choice of LVI for this work, while Subsection 5.2.2 discusses factors around measuring the impact of LVI on performance counters. Subsections 5.2.3 and 5.2.4 present discussion around the detection thresholds and the evaluation method respectively. The section is concluded with some criticism regarding the sources used in this work.

5.2.1 Choice of attack

As described in Section 3.1, the LVI attack was chosen for this work based on several factors, like it being a relatively recent attack and PoC code being available. The possibly large performance drawbacks of the current mitigation were also a large motivating factor. Many other SCAs towards SGX that are, for example, mitigated through microcode updates are comparatively both harder to reproduce (since these updates would have to be rolled back) and arguably less interesting to investigate. As a case study for investigating the viability of performance counters for detecting SCAs targeting SGX, however, other SCAs like more traditional cache-based SCAs [14, 32] could also have been investigated.

A limitation of the PoC attacks investigated in this work is that they are not meant to demonstrate real-world attacks, and are therefore somewhat simplified and differently structured compared to how a more realistic attack would look. However, these PoCs still demonstrate the core concept behind LVI attacks and consist of the most important steps necessary for any realistic attack variant. Therefore, most conclusions drawn from the performance impact of these PoCs could most likely be transferred to real attack variants as well. Reproducing real-world practical examples of attacks is deemed to be out of scope for this work.

5.2.2 Measuring impact on performance counters

Selection of counter events

While there are many performance counters, a limited number of them were investigated due to time constraints. The counters to investigate were chosen based on two main factors: firstly, the behaviour of the LVI attack and how it could affect certain performance counters; secondly, which counters have been used in related work. However, there is a possibility that other counter events exist that could potentially be used to indicate an LVI attack, and that weren't taken into account here.

Selection of scenarios

In order to investigate whether an LVI attack application could be detected among other running processes, a selection of different scenarios was chosen in this work for comparing LVI attacks to benign processes under varying system load. This was based on the approach used by Gruss et al. in one part of their work on the Flush+Flush attack [16], where they detect other existing cache attacks using thresholds based on the values of certain performance counters. Several of the scenarios used by Gruss et al. are used in this work as well, while others represent similar scenarios but with some differences in how they were constructed. However, the selection of scenarios can still be seen as somewhat arbitrary, and a different selection would have resulted in different measurements. One can at least argue, though, that invalid page faults shouldn't occur in any normal computing scenario, and that this would have been a good indicator of malicious (or erroneous) behaviour regardless of which other scenarios had been used.

Impact of different factors on measurements

When using hardware performance counters, there are some factors to consider that can affect the reliability and reproducibility of measurements. Some of these are limitations of HPCs themselves, discussed in Section 5.3.1, but some come from external factors, like the scheduling of programs, OS activity, etc. [13]. The values are also likely to be different on a different system or OS (as is which counter events are available). As previously mentioned, this can affect results when measuring different sets of counters for several processes across several runs, since external conditions can differ between these runs. This was, for example, clearly experienced in the Game scenario, where one of the additional processes related to Steam showed counter values during one of the runs but zero for both following runs (this process was not included in the tables in Appendix A for this reason). While each scenario was replicated as similarly as possible for each run, there is (for most scenarios at least) no way to exactly replicate results from a previous run. The total number of instructions used to normalize other events in Table 4.3 was therefore the value taken during that particular run, and closely related events (e.g. cache references and cache misses) were always measured during the same run.

5.2.3 Detection thresholds

Care was taken during this work not to create attack thresholds that were too closely adjusted to the specific behaviour of the attacks observed here. The reason for this is the knowledge that an attack can change behaviour (speed of attack, etc.), as well as the several factors that can impact performance counters. There are several improvements that could be made to the detection used in this work. First of all, the detection thresholds were still mostly simple thresholds based on observation, and more advanced statistical analysis of attacker behaviour could be done. The thresholds could also be based on a larger history of samples (similarly to the original HexPADS thresholds), rather than just the current sample. Finally, much of the state of the art regarding SCA detection with performance counters uses machine learning methods to detect attacks. It is possible that better results could be achieved by training machine learning algorithms on LVI attack behaviour.
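As an illustration of the history-based idea, a threshold could be applied to a sliding-window average instead of the current sample, smoothing out single-sample spikes. This is a sketch of one possible improvement, not part of the implemented tool:

```python
from collections import deque

class WindowedThreshold:
    """Trigger when the average of the last `window` samples exceeds the
    threshold, rather than any single sample, similar in spirit to the
    history-based thresholds in the original HexPADS."""
    def __init__(self, threshold, window=5):
        self.threshold = threshold
        self.samples = deque(maxlen=window)

    def update(self, value):
        # Append the newest sample; the deque drops the oldest one
        # automatically once the window is full.
        self.samples.append(value)
        return sum(self.samples) / len(self.samples) > self.threshold
```

A brief spike in a benign process is then absorbed by the window, while a continuously running attack loop keeps the average above the threshold.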

5.2.4 Evaluation

To evaluate the effectiveness of the detection tool, the same scenarios as for measuring the impact on performance counters were used (along with two additional scenarios). The same issues discussed in Section 5.2.2 thus apply here as well, since a different selection of scenarios might have impacted the detection results in some ways.

While the main goal of this thesis isn't to provide a practical, ready-to-use detection tool for LVI, some attempts were also made to evaluate the detection tool based on additional important factors that impact its usefulness. How fast the tool was able to detect the attacks was therefore recorded during the tests, as well as the overall CPU and memory overhead of the detection tool. While the performance impact on the monitored processes due to the performance counters wasn't investigated in this work, the author of the original paper on HexPADS [38] found that their version achieved negligible overhead (1 s sampling interval). However, with more monitored events and shorter sampling intervals, this overhead is likely to be larger.

In regards to system load, the scenarios chosen for measuring counters and evaluating the detection demonstrated varying levels of system load. Some basic evaluation of the impact of system load was also provided in Section 4.3.2. However, given that related work has shown that high system loads can have large effects on the effectiveness of a detection tool, this is something that should be considered more thoroughly when developing detection tools. In some related works [34, 35], collections of SPEC benchmarks have for example been used to emulate different levels of system load.


5.2.5 Source criticism

Most of the sources used in this work come from scientific databases such as IEEE Xplore and the ACM Digital Library. These sources were mostly found by either using Google Scholar or following references used in other works. Many sources were for example found using the two surveys [36, 1] that were used to gain an overview of the recent research regarding both SCAs against SGX and SCA detection. Much of the research within these two fields referenced in this work has been published recently and is well cited, and can be said to represent the state of the art within their respective fields. However, some of the primary sources for developing the threshold detection used in this work, like HexPADS [38], which was used for much of the detection source code, are a bit older and lack discussion around certain aspects like system load. This work however tries to address, or at least acknowledge, most of these factors in some way by taking inspiration from more recent works [35, 34].

5.3 Limitations of using performance counters to detect SCAs

This section aims to provide an answer to the third and final research question. The first part discusses some of the limitations and drawbacks of performance counters for SCA detection in general, and the second part discusses some of the specific limitations regarding performance counters in SGX contexts.

5.3.1 In general

While hardware (and software) performance counters have been used extensively in the literature to detect (mainly cache-related) SCAs, there are some issues with hardware performance counters in particular that should be considered when developing detection tools meant for deployment. Das et al. [13] discuss several of the challenges of using hardware performance counters. These include external sources that might be hard to control, like the behaviour of the OS and other processes, which affect the resulting counter values. Some events may also be non-deterministic, meaning that their values change between identical runs of a program. Another problem is overcounting, meaning that on some processors certain events may be counted more times than they actually occurred. In this work, for example, it was found (using validation tests included with PAPI) that many of the events related to the L2 cache showed large errors, so these weren't used. Some events may also be mapped to different actual hardware events depending on the processor. The authors also investigated the use of HPCs for security, and found that most of the literature neither acknowledged nor addressed the issues of overcounting or non-determinism. The detection thresholds used in this work were created with some of these issues in mind (by not adjusting thresholds too closely to the attacks), but are most likely still susceptible to some of them.

Finally, it should be mentioned that most detection methods will have limitations compared to prevention methods, since even the best detection methods can rarely be guaranteed to work completely without false positives and false negatives.

5.3.2 Limitations for SGX

The fundamental limitation of the method used in this work, which prevents it from being practical, is that the detection tool exists in untrusted code. It is therefore not protected against an adversary that has compromised the entire OS, which is the exact type of threat that SGX is supposed to protect against (and that an LVI attack assumes). A privileged attacker could also tamper with the values or configurations of hardware performance counters, making their results unreliable. Software performance counters are particularly unreliable in this case, since they originate from the OS and not from hardware.

Moreover, as has been previously discussed in this work, hardware performance counters are disabled for SGX enclaves (except in debug mode), so the method used here relied on the performance pattern of the attacker (host application) instead. Having access to performance counter information for an enclave itself would perhaps allow detection methods to identify attacks towards it more easily, since they could compare against normal execution performance and see if there is anomalous behaviour indicating an attack. However, as it stands now, it is not realistic to have hardware performance counter support for enclave processes, since it is currently disabled for good reasons. If it was enabled in the same way as for normal processes, it would give insight into the enclave process not only for a potential detection process, but for attackers as well. Attackers could then use the information from the HPC registers to mount attacks against the enclaves more easily. In order for hardware performance counters for SGX enclaves to ever be a realistic option in the future, they would have to be implemented in a way that doesn't significantly alter the threat model of SGX. Allowing performance counter values for SGX to be seen system-wide is therefore probably not realistic, since that would allow a potentially untrusted OS to also observe the values from these counters. It would also have to be designed in such a way that an attacker wouldn't be able to alter counter values or configurations. If this is even possible in the future, it would probably mean hardware changes and specialized counters that can only be accessed by enclaves.

5.4 Other possible defenses against LVI

Given the current practical limitations of using performance counters to detect LVI attacks against SGX, this section briefly discusses other existing defense mechanisms (apart from the current compiler mitigation) and whether they could be used to mitigate LVI. Future processors will include hardware fixes for LVI [51], but hardware fixes obviously do not solve the problem for already released processors. Some of the defense mechanisms introduced in Section 2.5 could, however, possibly also be used to mitigate LVI. Cache side-channels can for example be hindered by wrapping sensitive code in TSX transactions using a software library like Cloak [15].

A previously unmentioned defense mechanism for SGX is SGX-Shield [48], which introduces ASLR (Address Space Layout Randomization) to SGX enclaves. Such a method could perhaps mitigate the LVI variants that hijack control flow.

5.4.1 Detection methods

While existing detection mechanisms like T-SGX and Déjà Vu are not discussed in the LVI paper, a successful LVI attack depends on preempting enclave execution by causing page faults (or microcode assists) within a victim enclave, and could therefore perhaps be detected using these existing mechanisms. The PoC examples are part of the SGX-Step repository, and while they do not make use of its single-stepping features, some of the more advanced attacks described in the paper do [53]. As mentioned in Section 2.5.1, T-SGX and Déjà Vu are explicitly mentioned in the SGX-Step paper, which states that these tools would be able to detect the frequent interrupts caused by SGX-Step [54]. Following this logic, they should be able to detect an LVI attack using SGX-Step as well, at least if the attack uses the single-stepping features of SGX-Step. However, as previously described, that paper also brings up some of the drawbacks of these detection tools.

Defenses like T-SGX and Déjà Vu would be much improved if information about page faults occurring within enclaves was recorded. In a revised SGX model (SGX2), which currently seems to be available only on certain newer processors, page faults are reported to the enclave through the EXITINFO field in the SSA [19]. However, the authors of the T-SGX paper [49] state that an attacker could easily overwrite the exit reason by sending an arbitrary interrupt after the page fault; this way, the page fault would still not be reported to the enclave. Jiang et al. [23] state that defenses that monitor performance metrics, like T-SGX and Déjà Vu, are not enough for SCA detection within SGX, since an attacker could obtain different pieces of a secret during separate runs of the enclave. They do, however, also state that a workable solution for enclaves that are not part of a public service would be to prevent arbitrary reruns of the enclaves. Such functionality is available as an optional feature of T-SGX [49].
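The core idea behind Déjà Vu, namely inferring preemption from execution slowdown, can be illustrated with a small conceptual sketch. This is a hypothetical illustration in Python, not the actual enclave-side implementation (which uses an in-enclave reference clock built on TSX):

```python
import time

def appears_preempted(fn, expected_s: float, slack: float = 2.0) -> bool:
    """Flag fn as suspicious if it took more than `slack` times its
    expected wall-clock time, which could indicate that it is being
    frequently interrupted (e.g. by AEX events from induced page faults)."""
    start = time.monotonic()
    fn()
    elapsed = time.monotonic() - start
    return elapsed > expected_s * slack

# A short computation given a generous time budget should not be flagged.
print(appears_preempted(lambda: sum(range(100_000)), expected_s=0.5))  # prints False
```

In the real Déjà Vu design the reference clock itself must be protected from manipulation, which is what the TSX-based construction provides; a plain wall-clock comparison like the one above could itself be tampered with by a privileged attacker.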

5.4.2 Alternatives to SGX

According to the authors of the original LVI paper, LVI is primarily applicable to Intel processors with SGX [51]. Processors by AMD or ARM with TEEs like TrustZone might therefore be protected against LVI. However, these other systems have also been shown to be vulnerable to numerous other transient execution vulnerabilities and side-channel attacks. One of the root causes that certain SCAs (like cache attacks) work against SGX (and other TEEs) is that Intel SGX is not designed with these types of attacks in mind. As mentioned earlier, Intel instead leaves it up to developers to design enclaves in a side-channel secure manner. As a response to the SGX design, researchers at MIT created the Sanctum [12] design, which focuses more on software isolation through minor changes in hardware in order to eliminate entire attack surfaces. With this design they claim to offer the same functionality as SGX, while also protecting against, for example, cache attacks without introducing too much overhead.

5.5 The work in a wider context

This section discusses the work in a wider context, considering various societal, ethical and environmental factors related to it.

5.5.1 Future of SGX

By investigating whether performance counters could be used to detect SCAs against Intel SGX, the overall goal of this work was to explore ways in which TEEs like Intel SGX can be made more secure against malicious attackers. As cloud services become more prevalent and more and more user data is stored or computed on at remote cloud servers, it is important that such data is kept as secure as possible. As TEEs are currently one of the more prominent solutions for adding extra protection for such sensitive data, their security is of great importance. It is therefore notable that Intel currently leaves it up to enclave developers to make their programs secure against side-channel attacks [21], especially since such a large number of side-channel attacks have been discovered against SGX [36]. One should, however, note that the number of vulnerabilities could also be due in large part to SGX being one of the more prominent TEEs today, making it more interesting to research. Nonetheless, there has clearly been a large number of vulnerabilities that even Intel has considered serious enough to warrant expensive (e.g. in terms of performance) mitigations, for example in the form of microcode patches. If this development continues in the coming years, it remains to be seen whether it has any significant effect on the future of SGX, and whether it would even force Intel to make changes to the core design of Intel SGX that mitigate these kinds of attacks.


5.5.2 Effect of mitigations on other factors

As discussed earlier, mitigations for side-channel attacks (and attacks in general, for that matter) often come with downsides. In some cases these mitigations disable certain features, which impacts some users negatively (e.g. the Plundervolt [33] mitigations, which disable undervolting [3]). In other cases, as for the current LVI mitigation, performance can be severely impacted. A detection method based on performance counters comes with similar downsides. While the overhead of the detection tool in this work was fairly small at certain sampling intervals, such a detection method will always introduce a certain level of overhead. Additionally, if SGX were to be extended with hardware support for performance counters, there is also an environmental aspect to consider. More hardware features require a larger consumption of raw materials, which in turn can lead to greater amounts of potentially toxic materials polluting the environment as hardware is discarded. These factors are important to take into consideration when deploying mitigations of this kind, especially ones based on hardware extensions.

6 Conclusion

As cloud services become more and more prevalent, the question of how users can trust these remote services with their data is of growing concern. SGX is one of the solutions that allows users to put less trust in the cloud service provider (as TEEs like SGX provide certain security assurances), but recent side-channel attacks against it, like LVI, show that there is still work to be done with regard to its security. This work therefore investigates a common method used for detecting SCAs and applies it to the SGX context, providing a case study of one of the more recent and relevant SCAs against SGX. The next section provides more concise answers to the initial research questions, while the last section discusses possible future work.

6.1 Research questions

• How does a side-channel attack like Load Value Injection impact performance counters?

The LVI attack has many different variants, so not all of them impact performance counters in the same way. They all, however, interrupt a victim by way of page faults or microcode assists, which can be recorded with software or hardware performance counters. They are also all dependent on some sort of side-channel (often the cache) at the end to transmit secret data, which likewise has a measurable impact on performance counters related to that channel. While this might differ for more practical LVI attacks, the examples investigated here (which all used a cache side-channel) showed relatively low cache miss rates compared to other attacks that use CSCAs.
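As an illustration of the cache miss rate metric referred to above, a minimal sketch follows; the sample values are taken from the atom(1) column of Table A.1 in the appendix, purely as an example:

```python
def cache_miss_rate(cache_miss: int, cache_ref: int) -> float:
    """Fraction of cache references that missed in a sampling interval."""
    return cache_miss / cache_ref if cache_ref else 0.0

# Using the atom(1) averages from Table A.1 as an example sample:
rate = cache_miss_rate(cache_miss=2_412_578, cache_ref=4_465_897)
print(f"{rate:.2%}")  # prints 54.02%
```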

• Can performance counters be used to detect SGX-targeted side-channel attacks like Load Value Injection?

This work found that some variants of LVI are theoretically possible to detect with performance counters using a relatively simple threshold-based detection, with acceptable speed and levels of overhead. This also provides a concrete example of a side-channel attack targeting SGX enclaves being detectable with the help of performance counters.
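The threshold-based detection mentioned above can be sketched roughly as follows. This is a simplified illustration with hypothetical event names, thresholds, and samples, not the actual tool:

```python
def detect(samples, thresholds):
    """Return the indices of sampling intervals in which every
    monitored event count exceeds its threshold."""
    return [
        i for i, sample in enumerate(samples)
        if all(sample.get(event, 0) > limit for event, limit in thresholds.items())
    ]

# Hypothetical per-interval counter samples for the untrusted host process.
samples = [
    {"PAGE_FAULTS": 3, "CACHE_MISS": 40_000},
    {"PAGE_FAULTS": 900, "CACHE_MISS": 2_500_000},  # attack-like burst
    {"PAGE_FAULTS": 5, "CACHE_MISS": 55_000},
]
thresholds = {"PAGE_FAULTS": 500, "CACHE_MISS": 1_000_000}
print(detect(samples, thresholds))  # prints [1]
```

Requiring all monitored events to exceed their thresholds in the same interval (a logical AND) is one way to keep false positives down; the thresholds themselves would have to be calibrated against benign workloads on the target system.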

• What are the limitations with using performance counters to detect side-channel at- tacks, particularly against SGX?


This question is answered in Section 5.3. In summary, there are numerous practical limitations to using performance counters to detect SGX-specific attacks, which currently make other existing mitigation strategies preferable. There are also some challenges with performance counters in general, most notably non-determinism and overcounting. These need to be addressed and carefully considered when developing an SCA detection tool that is expected to work reliably across many different systems.
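One way a detection tool could guard against the non-determinism mentioned above is to measure a known calibration workload several times and check that the counts agree before trusting thresholds tuned on another system. A minimal sketch of such a sanity check, with made-up sample values:

```python
import statistics

def counts_are_stable(samples: list, max_rel_std: float = 0.05) -> bool:
    """True if repeated measurements of the same workload agree to
    within a relative standard deviation of max_rel_std."""
    mean = statistics.mean(samples)
    return mean == 0 or statistics.stdev(samples) / mean <= max_rel_std

print(counts_are_stable([1_000_000, 1_010_000, 995_000]))  # prints True
print(counts_are_stable([1_000_000, 1_800_000, 600_000]))  # prints False
```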

6.2 Future work

As previously mentioned, the current limitations of performance counters for detecting SCAs against SGX make this an infeasible method unless changes are made to hardware. Future research could investigate whether hardware performance counters could become a viable feature for SGX enclaves without weakening its threat model. Regarding the use of performance counters for SCA detection in general, future research into such methods should pay close attention to the limitations of such counters (particularly hardware counters). If these limitations are disregarded, future detection tools run the risk of being of little use in practice, due to a lack of reliability or portability.

Defenses against controlled-channel attacks and other attacks built on regularly interrupting an enclave through page faults would be much improved if page faults were properly reported to the enclave itself. As mentioned in Section 5.4.1, a newer version of SGX does this, but attackers can still get around it. Further efforts should therefore be made to devise a way for SGX enclaves to receive information about page faults without allowing attackers to alter this information.

Finally, for LVI specifically, there is still the question of how realistic the attack is in contexts outside SGX, as well as on non-Intel processors. Future research could therefore investigate LVI in these contexts in more detail.

Bibliography

[1] A. Akram, M. Mushtaq, M. K. Bhatti, V. Lapotre, and G. Gogniat. “Meet the Sherlock Holmes’ of Side Channel Leakage: A Survey of Cache SCA Detection Techniques”. In: IEEE Access 8 (2020), pp. 70836–70860. DOI: 10.1109/ACCESS.2020.2980522.
[2] Zirak Allaf, Mo Adda, and Alexander Gegov. “A Comparison Study on Flush+Reload and Prime+Probe Attacks on AES Using Machine Learning Approaches”. In: vol. 650. Sept. 2017. DOI: 10.1007/978-3-319-66939-7_17.
[3] Douglas Black. Intel & OEMs are disabling undervolting. Here’s how to re-enable it. URL: https://www.ultrabookreview.com/37095-dells-disabling-undervolting-on-their-laptops-heres-how-to-re-enable-it/ (visited on 01/29/2021).
[4] Ferdinand Brasser, Urs Müller, Alexandra Dmitrienko, Kari Kostiainen, Srdjan Capkun, and Ahmad-Reza Sadeghi. “Software Grand Exposure: SGX Cache Attacks Are Practical”. In: 11th USENIX Workshop on Offensive Technologies (WOOT 17). Vancouver, BC: USENIX Association, Aug. 2017. URL: https://www.usenix.org/conference/woot17/workshop-program/presentation/brasser.
[5] Claudio Canella, Jo Van Bulck, Michael Schwarz, Moritz Lipp, Benjamin von Berg, Philipp Ortner, Frank Piessens, Dmitry Evtyushkin, and Daniel Gruss. “A Systematic Evaluation of Transient Execution Attacks and Defenses”. In: 28th USENIX Security Symposium (USENIX Security 19). Santa Clara, CA: USENIX Association, Aug. 2019, pp. 249–266. ISBN: 978-1-939133-06-9. URL: https://www.usenix.org/conference/usenixsecurity19/presentation/canella.
[6] Claudio Canella, Daniel Genkin, Lukas Giner, Daniel Gruss, Moritz Lipp, Marina Minkin, Daniel Moghimi, Frank Piessens, Michael Schwarz, Berk Sunar, Jo Van Bulck, and Yuval Yarom. “Fallout: Leaking Data on Meltdown-resistant CPUs”. In: Proceedings of the ACM SIGSAC Conference on Computer and Communications Security (CCS). ACM. 2019.
[7] G. Chen, S. Chen, Y. Xiao, Y. Zhang, Z. Lin, and T. H. Lai. “SgxPectre: Stealing Intel Secrets from SGX Enclaves Via Speculative Execution”. In: 2019 IEEE European Symposium on Security and Privacy (EuroS&P). 2019, pp. 142–157. DOI: 10.1109/EuroSP.2019.00020.
[8] Sanchuan Chen, Xiaokuan Zhang, Michael K. Reiter, and Yinqian Zhang. “Detecting Privileged Side-Channel Attacks in Shielded Execution with Déjà Vu”. In: Proceedings of the 2017 ACM on Asia Conference on Computer and Communications Security. ASIA CCS ’17. Abu Dhabi, United Arab Emirates: Association for Computing Machinery, 2017, pp. 7–18. ISBN: 9781450349444. DOI: 10.1145/3052973.3053007.
[9] Confidential Computing Consortium. “Confidential Computing: Hardware-Based Trusted Execution for Applications and Data”. In: (July 2020). URL: https://confidentialcomputing.io/wp-content/uploads/sites/85/2020/06/ConfidentialComputing_OSSNA2020.pdf.
[10] Intel Corporation. Load Value Injection. URL: https://software.intel.com/security-software-guidance/deep-dives/deep-dive-load-value-injection (visited on 02/16/2021).
[11] V. Costan and S. Devadas. “Intel SGX Explained”. In: IACR Cryptol. ePrint Arch. 2016 (2016), p. 86.
[12] Victor Costan, Ilia Lebedev, and Srinivas Devadas. “Sanctum: Minimal Hardware Extensions for Strong Software Isolation”. In: 25th USENIX Security Symposium (USENIX Security 16). Austin, TX: USENIX Association, Aug. 2016, pp. 857–874. ISBN: 978-1-931971-32-4. URL: https://www.usenix.org/conference/usenixsecurity16/technical-sessions/presentation/costan.
[13] S. Das, J. Werner, M. Antonakakis, M. Polychronakis, and F. Monrose. “SoK: The Challenges, Pitfalls, and Perils of Using Hardware Performance Counters for Security”. In: 2019 IEEE Symposium on Security and Privacy (SP). 2019, pp. 20–38. DOI: 10.1109/SP.2019.00021.
[14] Johannes Götzfried, Moritz Eckert, Sebastian Schinzel, and Tilo Müller. “Cache Attacks on Intel SGX”. In: Proceedings of the 10th European Workshop on Systems Security. EuroSec’17. Belgrade, Serbia: Association for Computing Machinery, 2017. ISBN: 9781450349352. DOI: 10.1145/3065913.3065915.
[15] Daniel Gruss, Julian Lettner, Felix Schuster, Olya Ohrimenko, Istvan Haller, and Manuel Costa. “Strong and Efficient Cache Side-Channel Protection using Hardware Transactional Memory”. In: 26th USENIX Security Symposium (USENIX Security 17). Vancouver, BC: USENIX Association, Aug. 2017, pp. 217–233. ISBN: 978-1-931971-40-9. URL: https://www.usenix.org/conference/usenixsecurity17/technical-sessions/presentation/gruss.
[16] Daniel Gruss, Clémentine Maurice, Klaus Wagner, and Stefan Mangard. “Flush+Flush: A Fast and Stealthy Cache Attack”. In: Proceedings of the 13th International Conference on Detection of Intrusions and Malware, and Vulnerability Assessment - Volume 9721. DIMVA 2016. San Sebastián, Spain: Springer-Verlag, 2016, pp. 279–299. ISBN: 9783319406664. DOI: 10.1007/978-3-319-40667-1_14.
[17] Nishad Herath and Anders Fogh. “These are not your grand Daddys cpu performance counters – CPU hardware performance counters for security”. In: Black Hat Briefings (2015).
[18] Intel. Intel® 64 and IA-32 Architectures Software Developer’s Manual. Volume 3A: System Programming Guide, Part 1. 2021.
[19] Intel. Intel® 64 and IA-32 Architectures Software Developer’s Manual. Volume 3D: System Programming Guide, Part 4. 2021.
[20] Intel. Intel® 64 and IA-32 Architectures Software Developer’s Manual. Volume 3B: System Programming Guide, Part 2. 2021.
[21] Intel. Protection from Side-Channel Attacks. URL: https://software.intel.com/content/www/us/en/develop/documentation/sgx-developer-guide/top/protection-from-sidechannel-attacks.html (visited on 05/11/2021).
[22] Intel. Refined Speculative Execution Terminology. URL: http://www-cs-faculty.stanford.edu/~uno/abcde.html (visited on 01/29/2021).


[23] Jianyu Jiang, Claudio Soriente, and Ghassan Karame. Monitoring Performance Metrics is not Enough to Detect Side-Channel Attacks on Intel SGX. Nov. 2020.
[24] Michael Kerrisk. perf_event_open(2) — Linux manual page. URL: https://man7.org/linux/man-pages/man2/perf_event_open.2.html (visited on 04/15/2021).
[25] Paul Kocher, Jann Horn, Anders Fogh, Daniel Genkin, Daniel Gruss, Werner Haas, Mike Hamburg, Moritz Lipp, Stefan Mangard, Thomas Prescher, Michael Schwarz, and Yuval Yarom. “Spectre Attacks: Exploiting Speculative Execution”. In: 40th IEEE Symposium on Security and Privacy (S&P’19). 2019.
[26] Paul Kocher, Joshua Jaffe, Benjamin Jun, et al. Introduction to differential power analysis and related attacks. 1998. URL: https://www.rambus.com/introduction-to-differential-power-analysis-and-related-attacks/.
[27] Paul C. Kocher. “Timing Attacks on Implementations of Diffie-Hellman, RSA, DSS, and Other Systems”. In: Advances in Cryptology — CRYPTO ’96. Ed. by Neal Koblitz. Berlin, Heidelberg: Springer Berlin Heidelberg, 1996, pp. 104–113. ISBN: 978-3-540-68697-2.
[28] Primate Labs. Introducing Geekbench 5. URL: https://www.geekbench.com/ (visited on 04/14/2021).
[29] Moritz Lipp, Michael Schwarz, Daniel Gruss, Thomas Prescher, Werner Haas, Anders Fogh, Jann Horn, Stefan Mangard, Paul Kocher, Daniel Genkin, Yuval Yarom, and Mike Hamburg. “Meltdown: Reading Kernel Memory from User Space”. In: 27th USENIX Security Symposium (USENIX Security 18). 2018.
[30] A. Lutas and D. Lutas. Load Value Injection in the Line Fill Buffers: How to Hijack Control Flow without Spectre. Tech. rep. Bitdefender, Mar. 2020. URL: https://businessresources.bitdefender.com/hubfs/Bitdefender_Whitepaper_LVI-LFB_EN.pdf?hsLang=en-us.
[31] Ahmad Moghimi, Jan Wichelmann, Thomas Eisenbarth, and Berk Sunar. “MemJam: A False Dependency Attack Against Constant-Time Crypto Implementations”. In: 47.4 (Aug. 2019), pp. 538–570. ISSN: 0885-7458. DOI: 10.1007/s10766-018-0611-9.
[32] Daniel Moghimi, Gorka Irazoqui, and Thomas Eisenbarth. “CacheZoom: How SGX Amplifies the Power of Cache Attacks”. In: Mar. 2017, pp. 69–90. ISBN: 978-3-319-66786-7. DOI: 10.1007/978-3-319-66787-4_4.
[33] Kit Murdock, David Oswald, Flavio D. Garcia, Jo Van Bulck, Daniel Gruss, and Frank Piessens. “Plundervolt: Software-based Fault Injection Attacks against Intel SGX”. In: 41st IEEE Symposium on Security and Privacy (S&P’20). 2020.
[34] M. Mushtaq, J. Bricq, M. K. Bhatti, A. Akram, V. Lapotre, G. Gogniat, and P. Benoit. “WHISPER: A Tool for Run-Time Detection of Side-Channel Attacks”. In: IEEE Access 8 (2020), pp. 83871–83900. DOI: 10.1109/ACCESS.2020.2988370.
[35] Maria Mushtaq, Ayaz Akram, Muhammad Khurram Bhatti, Maham Chaudhry, Vianney Lapotre, and Guy Gogniat. “NIGHTs-WATCH: A Cache-Based Side-Channel Intrusion Detector Using Hardware Performance Counters”. In: Proceedings of the 7th International Workshop on Hardware and Architectural Support for Security and Privacy. HASP ’18. Los Angeles, California: Association for Computing Machinery, 2018. ISBN: 9781450365000. DOI: 10.1145/3214292.3214293.
[36] Alexander Nilsson, Pegah Nikbakht Bideh, and Joakim Brorsson. A Survey of Published Attacks on Intel SGX. English. Mar. 2020.
[37] D. O’Keeffe, D. Muthukumaran, P.-L. Aublin, F. Kelbert, C. Priebe, J. Lind, H. Zhu, and P. Pietzuch. Spectre attack against SGX enclave. 2018. URL: https://github.com/lsds/spectre-attack-sgx.


[38] Mathias Payer. “HexPADS: A Platform to Detect "Stealth" Attacks”. In: Proceedings of the 8th International Symposium on Engineering Secure Software and Systems - Volume 9639. ESSoS 2016. London, UK: Springer-Verlag, 2016, pp. 138–154. ISBN: 9783319308050. DOI: 10.1007/978-3-319-30806-7_9.
[39] Shuang-he Peng, Qiao-feng Zhou, and Jia-li Zhao. “Detection of cache-based side channel attack based on performance counters”. In: DEStech Transactions on Computer Science and Engineering aiie (2017).
[40] Performance Application Programming Interface. 2021. URL: http://icl.cs.utk.edu/papi/ (visited on 02/12/2021).
[41] Hany Ragab, Alyssa Milburn, Kaveh Razavi, Herbert Bos, and Cristiano Giuffrida. “CrossTalk: Speculative Data Leaks Across Cores Are Real”. In: S&P. Intel Bounty Reward. May 2021. URL: https://download.vusec.net/papers/crosstalk_sp21.pdf.
[42] Christoph Reichenbach. Performance counters. YouTube. SEPL Goethe University Frankfurt. 2015. URL: https://www.youtube.com/watch?v=mpbWQbkl8_g#t=20m15s.
[43] Stephan van Schaik, Alyssa Milburn, Sebastian Österlund, Pietro Frigo, Giorgi Maisuradze, Kaveh Razavi, Herbert Bos, and Cristiano Giuffrida. “RIDL: Rogue In-flight Data Load”. In: S&P. May 2019.
[44] Stephan van Schaik, Marina Minkin, Andrew Kwong, Daniel Genkin, and Yuval Yarom. CacheOut: Leaking Data on Intel CPUs via Cache Evictions. 2020. arXiv: 2006.13353 [cs.CR].
[45] Michael Schwarz, Moritz Lipp, Daniel Moghimi, Jo Van Bulck, Julian Stecklina, Thomas Prescher, and Daniel Gruss. “ZombieLoad: Cross-Privilege-Boundary Data Sampling”. In: CCS. 2019.
[46] Michael Schwarz, Samuel Weiser, Daniel Gruss, Clémentine Maurice, and Stefan Mangard. “Malware guard extension: Using SGX to conceal cache attacks”. In: International Conference on Detection of Intrusions and Malware, and Vulnerability Assessment. Springer. 2017, pp. 3–24.
[47] Mark Seaborn and Thomas Dullien. “Exploiting the DRAM rowhammer bug to gain kernel privileges”. In: Black Hat 15 (2015).
[48] Jaebaek Seo, Byoungyoung Lee, Seongmin Kim, Ming-Wei Shih, Insik Shin, Dongsu Han, and Taesoo Kim. “SGX-Shield: Enabling Address Space Layout Randomization for SGX Programs”. In: Jan. 2017. DOI: 10.14722/ndss.2017.23037.
[49] Ming-Wei Shih, Sangho Lee, Taesoo Kim, and Marcus Peinado. “T-SGX: Eradicating Controlled-Channel Attacks Against Enclave Programs”. In: Jan. 2017. DOI: 10.14722/ndss.2017.23193.
[50] Tutorial: Linux kernel profiling with perf. 2015. URL: https://perf.wiki.kernel.org/index.php/Tutorial (visited on 02/12/2021).
[51] Jo Van Bulck. LVI - Hijacking Transient Execution with Load Value Injection. URL: https://lviattack.eu/ (visited on 05/11/2021).
[52] Jo Van Bulck, Marina Minkin, Ofir Weisse, Daniel Genkin, Baris Kasikci, Frank Piessens, Mark Silberstein, Thomas F. Wenisch, Yuval Yarom, and Raoul Strackx. “Foreshadow: Extracting the Keys to the Intel SGX Kingdom with Transient Out-of-Order Execution”. In: Proceedings of the 27th USENIX Security Symposium. USENIX Association, Aug. 2018.
[53] Jo Van Bulck, Daniel Moghimi, Michael Schwarz, Moritz Lipp, Marina Minkin, Daniel Genkin, Yuval Yarom, Berk Sunar, Daniel Gruss, and Frank Piessens. “LVI: Hijacking Transient Execution through Microarchitectural Load Value Injection”. In: 41st IEEE Symposium on Security and Privacy (S&P’20). 2020.


[54] Jo Van Bulck, Frank Piessens, and Raoul Strackx. “SGX-Step: A Practical Attack Framework for Precise Enclave Execution Control”. In: Proceedings of the 2nd Workshop on System Software for Trusted Execution. SysTEX’17. Shanghai, China: Association for Computing Machinery, 2017. ISBN: 9781450350976. DOI: 10.1145/3152701.3152706.
[55] Pei Wang, Yu Ding, Mingshen Sun, Huibo Wang, Tongxin Li, Rundong Zhou, Zhaofeng Chen, and Yiming Jing. “Building and Maintaining a Third-Party Library Supply Chain for Productive and Secure SGX Enclave Development”. In: Proceedings of the ACM/IEEE 42nd International Conference on Software Engineering: Software Engineering in Practice. ICSE-SEIP ’20. Seoul, South Korea: Association for Computing Machinery, 2020, pp. 100–109. ISBN: 9781450371230. DOI: 10.1145/3377813.3381348.
[56] Y. Xu, W. Cui, and M. Peinado. “Controlled-Channel Attacks: Deterministic Side Channels for Untrusted Operating Systems”. In: 2015 IEEE Symposium on Security and Privacy. 2015, pp. 640–656. DOI: 10.1109/SP.2015.45.
[57] Y. Zhang, M. Zhao, T. Li, and H. Han. “Survey of Attacks and Defenses against SGX”. In: 2020 IEEE 5th Information Technology and Mechatronics Engineering Conference (ITOEC). 2020, pp. 1492–1496. DOI: 10.1109/ITOEC49072.2020.9141835.

A Performance counter measurements for additional scenarios

For some of the scenarios measured in Section 4.1.2, there were additional processes directly related to those scenarios. Tables A.1 and A.2 show the raw averages and the normalized averages of these events, respectively.

Text editor
Event / Process name: atom(1) atom(2) atom(3)
INSTRUCTIONS 30 262 086 28 741 356 183 594 496
CACHE_REF 4 465 897 4 607 509 17 613 253
CACHE_MISS 2 412 578 2 385 754 9 066 688
BR_INSTR 6 773 521 5 428 552 38 344 981
BR_MISS 372 262 143 659 1 656 897
PAGE_FAULTS 24 120 228
L1D_RA 7 222 439 8 631 556 55 259 617
L1D_RM 667 201 1 598 104 3 849 686
DTLB_RA 6 923 730 8 043 347 54 074 773
DTLB_RM 73 326 41 445 196 473
ITLB_RA 72 560 139 730 465 054
ITLB_RM 77 837 39 288 175 096

Firefox twitter
Event / Process name: firefox Privileged Cont. Webextensions Web Content(1) Web Content(2)
INSTRUCTIONS 30 403 103 1 032 753 232 337 927 088 469 158 874
CACHE_REF 4 529 313 259 175 46 415 68 724 223 27 270
CACHE_MISS 2 165 473 106 114 17 484 31 436 717 7 938
BR_INSTR 6 861 281 253 503 54 300 185 010 949 37 976
BR_MISS 376 110 23 336 4 146 6 354 368 2 573
PAGE_FAULTS 58 3 0 2 322 0
L1D_RA 9 440 485 344 617 78 464 264 579 235 53 092
L1D_RM 874 141 37 703 7 518 19 274 673 4 728
DTLB_RA 11 079 259 367 080 93 256 295 904 181 68 837
DTLB_RM 68 328 3 399 810 1 160 286 545
ITLB_RA 200 283 8 732 2 220 1 891 730 1 538
ITLB_RM 73 985 4 872 1 126 490 438 790

Firefox - youtube
Event / Process name: firefox Privileged Cont. Webextensions Web Content(1) Web Content(2)
INSTRUCTIONS 5 658 974 631 141 80 600 40 691 370 4 797
CACHE_REF 898 182 166 124 21 031 5 347 675 591
CACHE_MISS 408 687 55 156 10 204 2 789 684 229
BR_INSTR 1 303 641 155 384 18 376 8 239 851 1 126
BR_MISS 71 821 15 356 1 898 468 895 65
PAGE_FAULTS 25 6 0 223 0
L1D_RA 1 952 597 324 227 24 847 12 798 885 1 362
L1D_RM 181 662 30 757 2 760 1 212 572 91
DTLB_RA 1 585 204 259 280 19 246 8 739 310 1 207
DTLB_RM 14 081 2 533 196 71 702 5
ITLB_RA 32 937 6 396 467 122 428 9
ITLB_RM 15 102 3 509 251 60 612 8

Game
Event / Process name: Civ5Xp swh.(1) swh.(2) swh.(3) swh.(4) steam gameoverlayui
INSTRUCTIONS 917 409 71 300 6 506 931 206 1 885 845 1 293 584 2 564 647
CACHE_REF 414 827 19 570 2 179 28 243 77 860 396 285 619 297
CACHE_MISS 359 319 16 713 1 941 13 792 54 903 342 132 566 396
BR_INSTR 183 467 14 244 1 388 157 206 359 452 289 561 629 767
BR_MISS 22 996 1 418 159 15 059 17 602 20 533 44 609
PAGE_FAULTS 0 0 0 0 0 0 0
L1D_RA 260 499 20 903 1 925 12 279 22 138 466 891 1 192 974
L1D_RM 31 122 2 314 270 1 019 3 159 47 251 95 748
DTLB_RA 344 023 20 733 1 957 11 943 21 942 490 683 1 168 880
DTLB_RM 5 704 306 35 71 237 8 188 15 618
ITLB_RA 3 201 192 19 144 645 4 210 6 247
ITLB_RM 8 774 445 50 71 298 9 376 15 572

Table A.1: Averages for all processes in scenarios where there were multiple related processes. Note that only processes that showed any measured results are included (swh = short for steamwebhelper).

Text editor
Event / Process name: atom(1) atom(2) atom(3)
INSTRUCTIONS 1000.00 1000.00 1000.00
CACHE_REF 147.57 160.31 95.94
CACHE_MISS 79.72 83.01 49.38
BR_INSTR 223.83 188.88 208.86
BR_MISS 12.30 5.00 9.02
PAGE_FAULTS 0.00 0.00 0.00
L1D_RA 261.75 280.48 318.69
L1D_RM 24.18 51.93 22.20
DTLB_RA 262.07 280.56 320.41
DTLB_RM 2.78 1.45 1.16
ITLB_RA 2.75 4.87 2.76
ITLB_RM 2.95 1.37 1.04

Firefox twitter
Event / Process name: firefox Privileged Cont. Webextensions Web Content(1) Web Content(2)
INSTRUCTIONS 1000.00 1000.00 1000.00 1000.00 1000.00
CACHE_REF 148.98 250.96 199.77 74.13 171.65
CACHE_MISS 71.23 102.75 75.25 33.91 49.96
BR_INSTR 225.68 245.46 233.71 199.56 239.03
BR_MISS 12.37 22.60 17.84 6.85 16.20
PAGE_FAULTS 0.00 0.00 0.00 0.00 0.00
L1D_RA 271.60 301.10 288.65 284.05 281.14
L1D_RM 25.15 32.94 27.66 20.69 25.04
DTLB_RA 271.33 299.64 286.54 283.53 280.79
DTLB_RM 1.67 2.77 2.49 1.11 2.22
ITLB_RA 4.90 7.13 6.82 1.81 6.27
ITLB_RM 1.81 3.98 3.46 0.47 3.22

Firefox - youtube
Event / Process name: firefox Privileged Cont. Webextensions Web Content(1) Web Content(2)
INSTRUCTIONS 1000.00 1000.00 1000.00 1000.00 1000.00
CACHE_REF 158.72 263.21 260.93 131.42 123.20
CACHE_MISS 72.22 87.39 126.60 68.56 47.74
BR_INSTR 230.37 246.20 227.99 202.50 234.73
BR_MISS 12.69 24.33 23.55 11.52 13.55
PAGE_FAULTS 0.00 0.01 0.00 0.01 0.00
L1D_RA 277.19 305.43 304.27 290.30 269.65
L1D_RM 25.79 28.97 33.80 27.50 18.02
DTLB_RA 279.89 305.23 305.21 290.57 265.16
DTLB_RM 2.49 2.98 3.11 2.38 1.10
ITLB_RA 5.82 7.53 7.41 4.07 1.98
ITLB_RM 2.67 4.13 3.98 2.02 1.76

Game
Event / Process name: Civ5Xp swh.(1) swh.(2) swh.(3) swh.(4) steam gameoverlayui
INSTRUCTIONS 1000.00 1000.00 1000.00 1000.00 1000.00 1000.00 1000.00
CACHE_REF 452.17 274.47 334.92 30.33 41.29 306.35 241.47
CACHE_MISS 391.67 234.40 298.34 14.81 29.11 264.48 220.85
BR_INSTR 199.98 199.78 213.34 168.82 190.61 223.84 245.56
BR_MISS 26.07 19.89 24.44 16.17 9.33 15.87 17.39
PAGE_FAULTS 0.00 0.00 0.00 0.00 0.00 0.00 0.00
L1D_RA 345.68 313.24 309.14 337.55 303.28 375.02 400.55
L1D_RM 41.30 34.68 43.36 28.01 43.28 37.95 32.15
DTLB_RA 41.30 313.48 309.36 337.83 303.39 372.89 400.57
DTLB_RM 5.75 4.63 5.53 2.01 3.28 6.22 5.35
ITLB_RA 3.23 2.90 3.00 4.07 8.92 3.20 2.14
ITLB_RM 8.85 6.73 7.90 2.01 4.12 7.13 5.34

Table A.2: Averages for all processes in scenarios where there were multiple related processes, normalized to total number of instructions and scaled by 1000.
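The normalization used in Table A.2 can be reproduced as follows (a minimal sketch; the event names are those used in the tables above):

```python
def normalize(counts: dict, scale: int = 1000) -> dict:
    """Normalize event counts to the process's instruction count,
    scaled by 1000, as in Table A.2."""
    instructions = counts["INSTRUCTIONS"]
    return {event: round(n / instructions * scale, 2) for event, n in counts.items()}

# The atom(1) averages from Table A.1:
atom1 = {"INSTRUCTIONS": 30_262_086, "CACHE_REF": 4_465_897, "CACHE_MISS": 2_412_578}
print(normalize(atom1))
# prints {'INSTRUCTIONS': 1000.0, 'CACHE_REF': 147.57, 'CACHE_MISS': 79.72}
```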