Meltdown and Spectre, explained – mattklein123 – Medium, 3/30/18, 7:10 AM
mattklein123 · Engineer @lyft · Jan 14 · 19 min read
Meltdown and Spectre, explained

Although these days I'm mostly known for application level networking and distributed systems, I spent the early part of my career working on systems software. The vulnerabilities are astounding; I would argue they are one of the most important discoveries in computer science in the last 10–20 years. The mitigations are also difficult to understand, and accurate information about them is hard to find. Although a lot has been written about Meltdown and Spectre since their announcement, I have not seen a good mid-level introduction to the vulnerabilities and mitigations. In this post I'm going to attempt to correct that by providing a gentle introduction to the hardware and software background required to understand the vulnerabilities, a discussion of the vulnerabilities themselves, and a discussion of the current mitigations.

Important note: Because I have not worked directly on the mitigations, and do not work at Intel, Microsoft, Google, Amazon, Red Hat, etc., some of the details that I am going to provide may not be entirely accurate. I have pieced together this post based on my knowledge of how these systems work, publicly available documentation, and patches/discussion posted to LKML and xen-devel. I would love to be corrected if any of this post is inaccurate, though I doubt that will happen any time soon given how much of this subject is still covered by NDA.

Background

In this section I will provide some background required to understand the vulnerabilities. The section glosses over a large amount of detail and is aimed at readers with a limited understanding of computer hardware and systems software.

Virtual memory

Virtual memory is a technique used by all operating systems since the 1970s.
It provides a layer of abstraction between the memory address layout that most software sees and the physical devices backing that memory (RAM, disks, etc.). At a high level, it allows applications to utilize more memory than the machine actually has; this provides a powerful abstraction that makes many programming tasks easier.

Figure 1: Virtual memory

Figure 1 shows a simplistic computer with 400 bytes of memory laid out in "pages" of 100 bytes (real computers use powers of two, typically 4096). The computer has two processes, each with 200 bytes of memory across 2 pages. The processes might even be running the same code.

Translating virtual to physical addresses is such a common operation in modern computers that if the OS had to be involved in all cases the computer would be incredibly slow. Modern CPU hardware provides a device called a Translation Lookaside Buffer (TLB) that caches recently used mappings. This allows CPUs to perform address translation directly in hardware the majority of the time.

Figure 2: Virtual memory translation

Figure 2 shows the address translation flow:

1. A program fetches a virtual address.
2. The CPU attempts to translate it using the TLB. If the address is found, the translation is used.
3. If the address is not found, the CPU consults a set of "page tables" to determine the mapping. Page tables are a set of physical memory pages, provided by the operating system in a location the hardware can access.
4. If the page table contains a mapping it is returned, cached in the TLB, and used for the lookup. If the page table does not contain a mapping, a "page fault" is raised to the OS. A page fault is a special
kind of interrupt that allows the OS to take control and determine what to do when there is a missing or invalid mapping. For example, the OS might terminate the program. It might also allocate some physical memory and map it into the process. If a page fault handler continues execution, the new mapping will be used by the TLB.

Figure 3: User/kernel virtual memory mappings

Figure 3 shows a slightly more realistic view of what virtual memory looks like in a modern computer (pre-Meltdown; more on this below). In this setup we have the following features:

• Kernel memory is shown in red. It is contained in physical address range 0–99. Kernel memory is special memory that only the operating system should be able to access. User programs should not be able to access it.
• User memory is shown in gray.
• Unallocated physical memory is shown in blue.

In this example, we start seeing some of the useful features of virtual memory. Primarily:

• User memory in each process is in the virtual range 0–99, but backed by different physical memory.
• Kernel memory in each process is in the virtual range 100–199, but backed by the same physical memory.

As I briefly mentioned in the previous section, each page has associated permission bits. Even though kernel memory is mapped into each user process, when the process is running in user mode it cannot access the kernel memory. If a process attempts to do so, it will trigger a page fault, at which point the operating system will terminate it. However, when the process is running in kernel mode (for example, during a system call), the processor will allow the access.
At this point I will note that this type of dual mapping (each process having the kernel mapped into it directly) has been standard practice in operating system design for over thirty years, for performance reasons (system calls are very common and it would take a long time to remap the kernel or user space on every transition).

CPU cache topology

Figure 4: CPU thread, core, package, and cache topology.

The next piece of background information required to understand the vulnerabilities is the CPU and cache topology of modern processors. Figure 4 shows a generic topology that is common to most modern CPUs. It is composed of the following components:

• The basic unit of execution is the "CPU thread," "hardware thread," or "hyper-thread." Each CPU thread contains a set of registers and the ability to execute a stream of machine code, much like a software thread.
• CPU threads are contained within a "CPU core." Most modern CPUs contain two threads per core.
• Modern CPUs generally contain multiple levels of cache memory. The cache levels closer to the CPU thread are smaller, faster, and more expensive. The further from the CPU and closer to main memory a cache is, the larger, slower, and cheaper it is.
• Typical modern CPU design uses an L1/L2 cache per core. This means that each CPU thread on the core makes use of the same caches.
• Multiple CPU cores are contained in a "CPU package." Modern CPUs might contain upwards of 30 cores (60 threads) or more per package.
• All of the CPU cores in the package typically share an L3 cache.
• A machine may contain multiple CPU packages.

Speculative execution

Figure 5: Modern CPU execution engine (Source: Google images)

The primary takeaway is that modern CPUs are incredibly complicated and do not simply execute machine instructions in order. Each CPU thread has a complicated pipelining engine that is capable of executing instructions out of order. The reason for this has to do with caching. As I discussed in the previous section, each CPU makes use of multiple levels of caching. Each cache miss adds a substantial amount of delay to program execution. In order to mitigate this, processors are capable of executing ahead and out of order while waiting for memory loads. This is known as speculative execution. The following code snippet demonstrates this:

```cpp
if (x < array1_size) {
  y = array2[array1[x] * 256];
}
```

In the previous snippet, imagine that array1_size is not available in cache, but the address of array1 is. The CPU might guess (speculate) that x is less than array1_size and go ahead and perform the calculations inside the if statement. Once array1_size is read from memory, the CPU can determine whether it guessed correctly. If it did, it can continue, having saved a bunch of time. If it didn't, it can throw away the speculative calculations and start over. This is no worse than if it had waited in the first place.

Another type of speculative execution is known as indirect branch prediction. This is extremely common in modern programs due to virtual dispatch.
```cpp
class Base {
 public:
  virtual void Foo() = 0;
};

class Derived : public Base {
 public:
  void Foo() override { ... }
};

Base* obj = new Derived;
obj->Foo();
```

(The source of the previous snippet is this post.)

The way the previous snippet is implemented in machine code is to load the "v-table" or "virtual dispatch table" from the memory location that obj points to and then call it. Because this operation is so common, modern CPUs have various internal caches and will often guess (speculate) where the indirect branch will go and continue execution at that point. Again, if the CPU guesses correctly it can continue, having saved a bunch of time. If it didn't, it can throw away the speculative calculations and start over.

Meltdown vulnerability

Having now covered all of the background information, we can dive into the vulnerabilities.

Rogue data cache load

The attack can be summarized by the following steps:

```cpp
1. uint8_t* probe_array = new uint8_t[256 * 4096];
2. // ... Make sure probe_array is not cached
3. uint8_t kernel_memory = *(uint8_t*)(kernel_address);
4. uint64_t final_kernel_memory = kernel_memory * 4096;
5. uint8_t dummy = probe_array[final_kernel_memory];
6. // ... catch page fault
7. // ... determine which of 256 slots in probe_array is cached
```

Let's take each step above, describe what it does, and how it leads to being able to read the memory of the entire computer from a user program.

1. In the first step, the attacker allocates a "probe" array of 256 * 4096 bytes: one 4096-byte page for each of the 256 possible values of a byte.
2. Following the allocation, the attacker makes sure that none of the memory in the probe array is cached. There are various ways of accomplishing this, the simplest of which involves CPU-specific cache-flush instructions.
3. The attacker then proceeds to read a byte from the kernel's address space. Remember from our previous discussion about virtual memory and page tables that all modern kernels typically map the entire kernel virtual address space into the user process.
Operating systems rely on the fact that each page table entry has permission settings, and that user mode programs are not allowed to access kernel memory. Any such access will result in a page fault. That is indeed what will eventually happen at step 3.

4. However, modern processors also perform speculative execution and will execute ahead of the faulting instruction. Thus, steps 3–5 may execute in the CPU's pipeline before the fault is raised. In this step, the byte of kernel memory (which ranges from 0–255) is multiplied by the page size of the system, which is typically 4096.
5. In this step, the multiplied byte of kernel memory is used to read from the probe array into a dummy value. The multiplication of the byte by 4096 prevents a CPU feature called the "prefetcher" from reading more data than we want into the cache.
6. By this step, the CPU has realized its mistake and rolled back to step 3. However, the results of the speculated instructions are still visible in the cache. The attacker uses operating system functionality to trap the faulting instruction and continue execution (e.g., handling SIGSEGV).
7. In step 7, the attacker iterates through the probe array and measures how long it takes to read each of the 256 slots that could have been indexed by the kernel byte. The CPU will have loaded one of the locations into cache, and this location will load substantially faster than all the other locations (which need to be read from main memory). The index of this location is the value of the byte in kernel memory.

Using the above technique, and the fact that it is standard practice for modern operating systems to map all of physical memory into the kernel virtual address space, an attacker can read the computer's entire physical memory.
Now, you might be wondering: "You said that page tables have permission bits. How can it be that user mode code was able to speculatively access kernel memory?" The answer is that this is a bug in Intel processors. In my opinion, there is no good reason, performance or otherwise, for this to be possible. Recall that all virtual memory access must occur through the TLB. It is easily possible during speculative execution to check that a cached mapping has permissions compatible with the current running privilege level. Intel hardware simply does not do this. Other processor vendors do perform a permission check and block speculative execution. Thus, as far as we know, Meltdown is an Intel-only vulnerability.

Edit: It appears that at least one ARM processor is also susceptible to Meltdown, as indicated here and here.

Meltdown mitigations

Meltdown is easy to understand, trivial to exploit, and fortunately also has a relatively straightforward mitigation (at least conceptually; kernel developers might not agree that it is straightforward to implement).

Kernel page table isolation (KPTI)

Recall that in the section on virtual memory I described that all modern operating systems use a technique in which kernel memory is mapped into every user mode process's virtual memory address space. This is for both performance and simplicity reasons. It means that when a program makes a system call, the kernel is ready to be used without any further work.

Figure 6: Kernel page table isolation

Figure 6 shows a technique called Kernel Page Table Isolation (KPTI). This basically boils down to not mapping kernel memory into a program when it is running in user space.
If there is no mapping present, speculative execution is no longer possible and will immediately fault. In addition to making the operating system's virtual memory manager (VMM) more complicated, without hardware assistance this technique will also considerably slow down workloads that make a large number of user mode to kernel mode transitions, due to the fact that the page tables have to be modified on each transition. Newer x86 CPUs have a feature known as ASID (address space ID) or PCID (process context ID) that can be used to make this task substantially cheaper (ARM and other microarchitectures have had this feature for years). PCID allows an ID to be associated with a TLB entry, and then only TLB entries with that ID to be flushed. The use of PCID makes KPTI cheaper, but still not free.

In summary, Meltdown is an extremely serious and easy to exploit vulnerability. Fortunately, it has a relatively straightforward mitigation that has already been deployed by all major OS vendors, the caveat being that certain workloads will run slower until future hardware is explicitly designed for the address space separation described.

Spectre vulnerability

Spectre shares some properties of Meltdown and is composed of two variants. Unlike Meltdown, Spectre is substantially harder to exploit, but affects almost all modern processors produced in the last twenty years. Essentially, Spectre is an attack against modern CPU and operating system design, rather than a specific security vulnerability.

Bounds check bypass (Spectre variant 1)