RevAnC: A Framework for Reverse Engineering Hardware Page Table Caches

Stephan van Schaik, Kaveh Razavi, Ben Gras, Herbert Bos, and Cristiano Giuffrida
Vrije Universiteit Amsterdam

ABSTRACT

Recent hardware-based attacks that compromise systems with Rowhammer or bypass address-space layout randomization rely on how the processor's memory management unit (MMU) interacts with page tables. These attacks often need to reload page tables repeatedly in order to observe changes in the target system's behavior. To speed up the MMU's page table lookups, modern processors make use of multiple levels of caches such as translation lookaside buffers (TLBs), special-purpose page table caches and even general data caches. A successful attack needs to flush these caches reliably before accessing page tables. To flush these caches from an unprivileged process, the attacker needs to craft specialized memory access patterns based on the internal architecture and size of these caches, as well as on how they interact with each other. While information about TLBs and data caches is often reported in processor manuals released by the vendors, there is typically little or no information about the properties of page table caches on different processors. In this paper, we describe RevAnC, an open-source framework for reverse engineering the internal architecture, size and behavior of these page table caches by retrofitting a recently proposed EVICT+TIME attack on the MMU. RevAnC can automatically reverse engineer page table caches on new architectures, while providing a convenient interface for flushing these caches on the 23 different microarchitectures from Intel, ARM and AMD that we evaluated.

EuroSec '17, April 23, 2017, Belgrade, Serbia. © 2017 ACM.

1. INTRODUCTION

As software is becoming harder to compromise due to the plethora of advanced defenses [1, 7, 11, 18, 19], attacks on hardware are instead becoming an attractive alternative. These attacks range from compromising the system using the Rowhammer vulnerability [4, 24, 25, 26] to using side channels for breaking address space layout randomization (ASLR) [10, 12, 13, 14, 17], leaking cryptographic keys [6, 27] and even tracking mouse movements [21].

Many of these attacks abuse how modern processors interact with memory. At the core of any processor today is a memory management unit (MMU) that simplifies the management of the available physical memory by virtualizing it between multiple processes. The MMU uses a data structure called the page table to perform the translation from virtual to physical memory. Page tables are an attractive target for hardware-based attacks. For example, a single bit flip in a page table page caused by Rowhammer could grant an attacker control over a physical memory location that she does not own, enough to gain root privileges [25, 26]. Further, security defenses such as ASLR and others that bootstrap using ASLR [5, 8, 19, 20] rely on randomizing where code or data is placed in virtual memory. Since this (secret) information is embedded in page tables, attackers can perform various side-channel attacks on the interaction of the MMU with page tables to leak this information [12].

Virtual-to-physical memory translation is slow, as it requires multiple additional memory accesses to resolve the original virtual address. To improve performance, modern processors make use of multiple levels of caches such as translation lookaside buffers (TLBs), special-purpose page table caches and even general data caches. To mount a successful attack on page tables, attackers often need to repeatedly flush these caches [12, 13, 25, 26] to observe the system's behavior when manipulating page tables. The details of TLBs and data caches are easy to find by examining processor manuals [15, 16]. Information on page table caches, such as their size and behavior, however, is often lacking. Without this information, attackers need to resort to trial and error, and it becomes difficult to build robust attacks that work across different architectures.

In this paper, we retrofit AnC [12], an existing EVICT+TIME side-channel attack on the MMU, to build RevAnC, a reverse engineering framework for determining the size and internal architecture of page table caches, as well as how they interact with other caches, on 23 microarchitectures comprising various processor generations from Intel, ARM and AMD. RevAnC relies on the fact that page table lookups by the MMU are stored in the last-level cache (LLC) in order to speed up the next required translation. By flushing parts of the LLC and timing the page table lookup, RevAnC can identify which parts of the LLC store page tables. On top of flushing the LLC, RevAnC needs to flush the TLB as well as the page table caches. Since information on the size of the TLB and the LLC is available, we can use RevAnC to reverse engineer the properties of the page table caches that are of interest to attackers, such as their internal architecture and size.

Summarizing, we make the following contributions:

• We describe a novel technique to reverse engineer the undocumented page table caches commonly found in modern processors.

• We evaluate our technique on 23 different microarchitectures from recent Intel, ARM and AMD processors.

• We release RevAnC as open-source software. RevAnC provides a convenient interface to flush page table caches on the microarchitectures that we tested and can automatically detect page table caches on new processors. More information about AnC and RevAnC can be found at: https://www.vusec.net/projects/anc

We provide some necessary background in Section 2. We then discuss our design and implementation on multiple architectures in Section 3, before evaluating it on 23 different processors in Section 4. We discuss related work in Section 5 and conclude in Section 6.

2. BACKGROUND AND MOTIVATION

In this section, we discuss the paging memory management scheme and its implementation on most modern processors. Furthermore, we look at how the MMU performs virtual address translation, and at the caches that are used to improve the performance of this translation.

2.1 Paging and the MMU

Paging has become integral to modern processor architectures as it simplifies the management of physical memory by virtualizing it: the operating system no longer needs to relocate the entire memory of applications due to a limited address space, and it no longer needs to deal with fragmentation of physical memory. Furthermore, the operating system can limit the memory to which a process has access, preventing malicious or malfunctioning code from interfering with other processes.

As a direct consequence, many modern processor architectures use an MMU, a hardware component responsible for the translation of virtual addresses to the corresponding physical addresses. The translations are stored in the page tables—a unidirectional tree of multiple levels, each of which is indexed by part of the virtual address to select the next-level page table or, at the leaves, the physical page. Hence, every virtual address uniquely selects a path from the root of the tree to a leaf to find the corresponding physical address.

Figure 1 shows a more concrete example of how the MMU performs virtual address translation on x86_64. First, the MMU reads the CR3 register to find the physical address of the top-level page table. Then, the top nine bits of the virtual address index into this page table to select the page table entry (PTE). This PTE contains a reference to the next-level page table, into which the next nine bits of the virtual address index to select the next PTE. By repeating this operation for the next two levels, the MMU can then find the corresponding physical page for 0x644b321f4000 at the lowest-level page table.

[Figure 1: MMU's page table walk to translate 0x644b321f4000 to its corresponding memory page on the x86_64 architecture.]
2.2 Caching MMU's Operations

The performance of memory accesses improves greatly if the MMU can avoid having to resolve a virtual address that it already resolved recently. For this purpose, CPUs store the resolved address mappings in the TLB. Hence, a hit in the TLB removes the need for an expensive page table walk. In addition, to improve the performance of a TLB miss, the processor stores the page table data in the data caches.

Modern processors may improve the performance of a TLB miss even further by means of page table caches or translation caches that store PTEs for the different page table levels [2, 3]. While page table caches use the physical address and the PTE index for indexing, translation caches use the partially resolved virtual address instead. With translation caches, the MMU can look up the virtual address and select the page table with the longest matching prefix. Hence, translation caches skip the upper levels of the page table and continue the translation with the lowest-level page table present within the cache for the given virtual address. While this allows the MMU to skip part of the page table walk, the implementation of translation caches also comes with additional complexity. Furthermore, these caches can be implemented as split dedicated caches for the different page table levels, as one single unified cache for all page table levels, or even as a TLB that is also capable of caching PTEs. Using RevAnC, we discover that, for instance, AMD's Page Walking Caches, as found in the AMD K8 and AMD K10 microarchitectures, and Intel's Page-Structure Caches are examples of unified page table caches and split translation caches, respectively. Similarly, ARM implements a unified page table cache (table walking cache) in the designs that are optimized for low power consumption and silicon utilization, while they implement a split translation cache (intermediate table walking cache) in their designs optimized for high performance.

Figure 2 visualizes how the different caches interact with each other when the MMU translates a virtual address. While the TLBs and data caches have been documented thoroughly, many of the details on both page table and translation caches remain unspecified.

[Figure 2: a generic implementation of an MMU and all the components involved to translate a virtual address into a physical address.]

2.3 Motivation

Recent hardware attacks that abuse page tables need to properly flush page table caches in order to operate correctly. The prefetch attack [13], for example, relies on when the virtual address translation partially succeeds in one of the page table caches in order to gain knowledge about a randomized address in the kernel. Rowhammer attacks that manipulate page tables need to repeatedly flush the TLB and page table caches in order to scan the physical memory for sensitive information.

Another example where flushing page table caches is necessary is the AnC attack [12]. The MMU's page table walks are cached in the LLC. AnC makes use of this fact to perform an EVICT+TIME attack [22] to determine the offsets within page table pages that the MMU accessed during a page table walk. The known offsets reveal the randomized virtual addresses, breaking ASLR. AnC needs to try many different access patterns multiple times for a reliable attack, and for each one of them it needs to flush the page table caches in order to trigger a complete page table walk. Hence, knowledge of the internal page table caches is necessary for a correct and efficient implementation of AnC.

In some cases, TLBs function as page table caches. In these cases, the cpuid instruction can be used to report the size of the different TLBs, and thus the size of the different page table caches. However, on some x86_64 microarchitectures the cpuid instruction does not report the sizes of all the TLBs. For instance, despite the fact that a TLB is present on Intel Sandy Bridge and Ivy Bridge processors to cache 1 GB pages, this information is not provided by the cpuid instruction at all. Furthermore, on other CPU architectures there may be no way to report the TLB sizes, or the page table caches may have been implemented as completely independent units. We hence need a more principled approach to finding the important properties of page table caches.

3. REVANC

Using cpuid on x86_64 and the flat device tree (FDT) on ARMv7-A and ARMv8-A, we can detect the processor topology, including properties such as the TLBs, the caches, the names of the processor and the vendor, and the microarchitecture. With this information, we can build an appropriate eviction set to flush the TLB and the data caches. We then use these eviction sets in RevAnC to learn about the page table and translation caches by performing the AnC attack incrementally, as we describe in this section. We needed to overcome a number of challenges to make RevAnC work across different architectures, which we discuss hereafter.

3.1 Using the MMU's Cache Signal

We rely on the fact that the MMU's page table walk ends up in the target processor's data caches for learning about page table caches. Assuming an Intel 64 processor with four page table levels as a running example, the MMU's page table walk of a given virtual address v brings four cache lines from the four page table pages into the L1 data cache as well as into L3, given that L3 is inclusive of L1. As a result, the next page table walk for v will be considerably faster if the cache lines are still in one of the data caches.

The CPU data caches are partitioned into cache sets. Each cache set can store up to N cache lines, which is referred to as an N-way set-associative cache. Oren et al. [21] realised that given two different (physical) memory pages, if their first cache lines belong to the same cache set, then the other cache lines in the pages share (different) cache sets as well; i.e., if we select an offset t within one page that is aligned on a cache line boundary, then the cache line at offset t in the other page will share the same cache set. This is due to the fact that for the first cache lines to be in the same cache set, all the bits of the physical addresses of both pages that decide the cache set and slice have to be the same, and an offset within both pages shares the same lower bits.

This property of the caches allows us to simply use many pages as an eviction buffer. Let us assume that one of the four page table entries that translate v is at offset zero of its page table page. If we access the first cache line of all the pages in our eviction buffer, we will evict the MMU's recent page table walk of v from the cache. Hence, the next page table walk of v will be slightly slower, since it needs to fetch the aforementioned page table entry from memory. This is an example of an EVICT+TIME attack, which RevAnC uses to find up to four cache lines out of the possible 64 cache lines in a memory page that store page table entries. Note that by trying various offsets apart from v, we can distinguish which cache line hosts the page table entry at each level. For example, if we perform EVICT+TIME on v + 32 KB, the cache line that changes compared to performing EVICT+TIME on v is the cache line hosting the level 1 page table entry. This is because on x86_64, each cache line can store 8 page table entries that map 32 KB of virtual memory.

Assuming a page table cache for one of the page table levels, we will not observe the MMU's activity on that level without flushing the page table cache for that level. As an example, assume that there is a page table cache for level 2 page tables with 32 entries. Given that each entry in a level 2 page table maps 2 MB of virtual memory, if we access a virtually contiguous 64 MB buffer (at every 2 MB boundary), we are going to flush this page table cache. We can hence easily brute force the size of the potential page table cache at each level. For example, if we cannot observe the signal with RevAnC for the upper three levels of the page tables on an x86_64 Intel processor, it is due to the page table (translation) cache at the level 2 page table. We can then brute force the size of this cache, before moving to the upper level. Listing 1 shows how this is possible. Note that unlike the original AnC attack, in RevAnC we assume a known virtual address, so we know exactly where the MMU signal should appear in the cache.

    def detect_caches(n, threshold):
        sizes = []

        for level in page_levels:
            for size in [1, 2, 3, 4, 6, 8, 12, 16, 24, ...]:
                count = 0
                evict_set = build_evict_set(sizes + [size])

                for attempt in [1 .. n]:
                    timings = profile_page_table(evict_set)
                    slot = solve(timings)

                    if slot == expected_slot:
                        count += 1

                if count / n > threshold:
                    break

            sizes.append(size)

        return sizes

Listing 1: high-level overview of the code used to detect the sizes of the page table caches, where n is the number of attempts and threshold is the required accuracy.

To make Listing 1 robust across architectures, we needed to address a number of issues which we discuss next.
3.2 Ensuring Memory Order

Many modern CPU architectures implement out-of-order execution, where instructions are executed in an order governed by the availability of their input data, rather than by their original program order. With out-of-order execution, instructions are decoded and stalled in a queue until their input operands are available. Once the input operands are available, an instruction may be issued to the appropriate execution unit and executed by that unit before earlier instructions. In addition, such CPU architectures are often superscalar: they have multiple execution units, allowing multiple instructions to be scheduled to these different units and executed simultaneously. After completion, the results of the instructions are written to another queue that is retired in the order of the original program to maintain a logical order. Furthermore, several modern CPU architectures not only have the ability to execute instructions out-of-order, but also the ability to re-order memory operations.

To measure the execution time of a single instruction on such CPU architectures, we have to inject memory barriers before and after the timed instructions, and code barriers before and after the resulting code to flush the instructions and memory operations that are in flight. To serialise the memory order, we can use the dsb instruction on ARMv7-A and ARMv8-A, whereas on x86_64 both the rdtscp and mfence instructions guarantee a serialised memory order. To serialise the instruction order, we can use the cpuid instruction on x86_64, and the isb sy instruction on both ARMv7-A and ARMv8-A.

3.3 Timing

Since the difference between a cache hit and a cache miss is in the order of hundreds or even tens of nanoseconds, a highly accurate timing source is required to distinguish the two. While timing information can be obtained through clock_gettime() on POSIX-compliant operating systems, this timing information is not accurate enough on various ARMv7-A and ARMv8-A platforms.

Many modern architectures have dedicated registers to count the number of processor cycles, providing a highly accurate timing source. While such registers are accessible through the unprivileged rdtsc or rdtscp instructions on x86_64, the PMCCNTR register offered by the performance monitoring unit (PMU) on both ARMv7-A and ARMv8-A is not accessible by default. Furthermore, when these registers were initially introduced, they were not guaranteed to be synchronised among cores, and the processor clock was used directly to increment them. In these cases, process migration and dynamic frequency scaling may influence the timings to the point that they become unreliable.

Given that most processors today have multiple cores, a thread that simply increments a global variable in a loop can provide a software-based cycle counter. We found this method to work reliably on various platforms with high precision. Note that the JavaScript version of the original AnC attack deploys a similar technique to build a high-precision timer.
3.4 Discussion

While RevAnC can detect split MMU caches, it cannot currently distinguish a unified MMU cache from a split MMU cache that only caches level 2 entries, as these behave the same. Further, RevAnC cannot distinguish page table caches from translation caches. Some architectures use TLBs to implement page table or translation caches. In the case that these TLBs are unified, evicting the TLB also evicts the PTEs. While it is possible to detect this by allocating huge pages for the eviction set, we have left this for future work.

4. EVALUATION

We evaluate RevAnC using 23 different CPUs from Intel, ARM and AMD produced from 2008 to 2017. We report the discovered sizes of the page table caches for each page table level (we refer to, e.g., the level 2 page table as PL2) and the time that our technique needs for reverse engineering this information. We have also included the sizes of the different caches and TLBs available for each of the CPUs for comparison and completeness.

Our findings are summarized in Table 1. RevAnC uses the information in Table 1 to flush the TLBs and page table caches by allocating as many pages as there are cache entries and touching each of these pages for each page level. Listing 2 shows this access pattern. By touching the pages, the MMU will be forced to perform a virtual address translation, replacing an existing entry in the cache. Furthermore, by using the page size as the stride for each of the page levels, RevAnC can flush the page structure caches or TLBs in the case of huge pages.

We will now go through interesting points and differences between each vendor discovered by RevAnC.
| CPU | Year | Caches L1/L2/L3 | TLBs PL1/PL2/PL3 | Detected PL2/PL3/PL4 | Time |
|---|---|---|---|---|---|
| Intel Xeon E3-1240 v5 (Skylake) @ 3.50GHz | 2015 | 32K/256K/8M | 1600/32/20 | 24/3-4/0 | 3m08s |
| Intel Core i7-6700K (Skylake) @ 4.00GHz | 2015 | 32K/256K/8M | 1600/32/20 | 24/3-4/0 | 3m41s |
| Intel Celeron N2840 (Silvermont) @ 2.16GHz | 2014 | 24K/1M/N/A | 128/16/N/A | 12-16/0/0 | 52s |
| Intel Core i7-4500U (Haswell) @ 1.80GHz | 2013 | 32K/256K/4M | 1088/32/4 | 24/3-4/0 | 2m53s |
| Intel Core i7-3632QM (Ivy Bridge) @ 2.20GHz | 2012 | 32K/256K/6M | 576/32/4 | 24-32/3/0 | 3m05s |
| Intel Core i7-2620QM (Sandy Bridge) @ 2.00GHz | 2011 | 32K/256K/6M | 576/32/4 | 24/2-4/0 | 3m11s |
| Intel Core i5 M480 (Westmere) @ 2.67GHz | 2010 | 32K/256K/3M | 576/32/N/A | 24-32/2-6/0 | 2m44s |
| Intel Core i7 920 (Nehalem) @ 2.67GHz | 2008 | 32K/256K/8M | 576/32/N/A | 24-32/3/0 | 4m26s |
| AMD R7 1700 8-Core () @ 3.3GHz | 2017 | 32K/512K/16M | 1600/1600/64 | 0/64/0 | 13m16s |
| AMD FX-8350 8-Core (Piledriver) @ 4.0GHz | 2012 | 64K/2M/8M | 1088/1088/1088 | 0/0/0 | 2m50s |
| AMD FX-8320 8-Core (Piledriver) @ 3.5GHz | 2012 | 64K/2M/8M | 1088/1088/1088 | 0/0/0 | 2m47s |
| AMD FX-8120 8-Core (Bulldozer) @ 3.4GHz | 2011 | 16K/2M/8M | 1056/1056/1056 | 0/0/0 | 2m33s |
| AMD II 640 X4 (K10) @ 3.0GHz | 2010 | 64K/512K/N/A | 560/176/N/A | 24/0/0 | 7m50s |
| AMD E-350 (Bobcat) @ 1.6GHz | 2010 | 32K/512K/N/A | 552/8-12/N/A | 8-12/0/0 | 5m38s |
| AMD Phenom 9550 4-Core (K10) @ 2.2GHz | 2008 | 64K/512K/2M | 560/176/48 | 24/0/0 | 6m52s |
| Rockchip RK3399 (ARM Cortex A72) @ 2.0GHz | 2017 | 32K/1M/N/A | 32/512/N/A | 16/6/N/A | 17m49s |
| Rockchip RK3399 (ARM Cortex A53) @ 1.4GHz | 2017 | 32K/512K/N/A | 10/512/N/A | 64/0/N/A | 7m06s |
| Allwinner A64 (ARM Cortex A53) @ 1.2GHz | 2016 | 32K/512K/N/A | 10/512/N/A | 64/0/N/A | 52m26s |
| Samsung Exynos 5800 (ARM Cortex A15) @ 2.1GHz | 2014 | 32K/2M/N/A | 32/512/N/A | 16/0/N/A | 13m28s |
| Nvidia Tegra K1 CD580M-A1 (ARM Cortex A15) @ 2.3GHz | 2014 | 32K/2M/N/A | 32/512/N/A | 16/0/N/A | 24m19s |
| Nvidia Tegra K1 CD570M-A1 (ARM Cortex A15; LPAE) @ 2.1GHz | 2014 | 32K/2M/N/A | 32/512/N/A | 16/0/N/A | 6m35s |
| Samsung Exynos 5800 (ARM Cortex A7) @ 1.3GHz | 2014 | 32K/512K/N/A | 10/256/N/A | 64/0/N/A | 17m42s |
| Samsung Exynos 5250 (ARM Cortex A15) @ 1.7GHz | 2012 | 32K/1M/N/A | 32/512/N/A | 16/0/N/A | 6m46s |

Table 1: the specifications and results for 23 different microarchitectures.

    /* Flush the TLBs and page structure caches. */
    for (j = 0, level = fmt->levels; j <= page_level; ++level, ++j) {
        p = cache->data + cache_line * cache->line_size;

        for (i = 0; i < level->ncache_entries; ++i) {
            *p = 0x5A;
            p += level->page_size;
        }
    }

Listing 2: flushing the TLBs and page structure caches.
4.1 Intel

On Intel, the last-level cache is inclusive, which means that data available in the last-level cache must also be available in the lower-level cache(s). Because of this property, it is sufficient to only evict cache lines from the last-level cache, as this will cause them to be evicted from the lower-level caches as well. We have found that Intel's Page-Structure Caches, or split translation caches, have been implemented by Intel Core and Xeon processors since at least the Nehalem microarchitecture. On Intel Core and Xeon processors there are caches available for 24-32 PL2 entries and 4-6 PL3 entries, whereas on Silvermont processors there is only a single cache available for 12-16 PL2 entries. Note that during multiple runs of our solution, we converged to different numbers that are close to each other. A conservative attacker can always pick the larger number. On Intel Core and Xeon processors, as well as Silvermont processors, we have found that the sizes of the TLBs as reported by cpuid can be used as an appropriate guideline to fully flush these caches, as it is very likely that the TLBs used to cache huge pages also contain the logic to cache intermediate page walks. Finally, we have found that while Sandy Bridge and Ivy Bridge implement a TLB to cache 1G pages, the cpuid instruction does not report the presence of this TLB, and that both Nehalem and Westmere implement a PL3 page structure cache without even having such a TLB implemented.

4.2 AMD

On AMD, the last-level cache is exclusive, which means that data is guaranteed to be in at most one of the caches, allowing more data to be stored at once. To be able to properly evict cache lines, we have to allocate an eviction set whose size is equal to the total sum of the cache sizes. Our tests have found that the AMD K10 microarchitecture implements AMD's Page Walking Cache with 24 entries. Because AMD Bulldozer and Piledriver implement a 1024-entry L2 TLB that also acts as a Page Walking Cache [9], it is sufficient to only flush the TLB, as indicated by our tests. Similarly, AMD Zen implements a 1536-entry L2 TLB that also acts as a Page Walking Cache. However, another L2 TLB with 64 entries dedicated to 1G pages has been introduced with AMD Zen. Finally, AMD Bobcat seems to implement a page directory cache with 8 to 12 entries.

4.3 ARMv7-A and ARMv8-A

While this is configurable, on most ARMv7-A processors the L2 cache is configured to be non-inclusive. The virtual address space is divided into two page levels with 256 entries and 4096 entries, mapping 4K and 1M per entry respectively. With Large Physical Address Extensions (LPAE), this is divided into three page levels with 512, 512 and 4 entries, mapping 4K, 2M and 1G per entry respectively. Despite the fact that this last page level only consists of four entries that fit into a single cache line, RevAnC can still be applied to the other two page levels to determine the existence of page table or translation caches. Furthermore, the low-power variants, such as the ARM Cortex A7, implement a page table cache, while the performance-oriented variants, such as the ARM Cortex A15, implement a translation cache. We have found that these caches are 64 entries and 16 entries in size, respectively. Furthermore, we have found that these sizes can be determined reliably both with and without LPAE, and even on ARM big.LITTLE processors with heterogeneous cores, where all cores have been enabled.

ARMv8-A processors implement an inclusive last-level cache and the x86_64 paging model, where Linux often only uses three page levels rather than four to improve the performance of page look-ups. We have found that the low-power variants like the ARM Cortex A53 implement a 64-entry unified page table cache, whereas the ARM Cortex A72 extends the 16-entry split translation cache with 6 additional PL3 entries.
5. RELATED WORK

AnC [12] launches an EVICT+TIME attack on the MMU, which relies on how PTEs are cached in the LLC, in order to derandomize user-space ASLR from JavaScript. Attacks using the prefetch instruction [13] rely on cached TLB entries and partial translations to derandomize kernel-space ASLR natively. Page tables are an attractive target for Rowhammer attacks: Drammer [26] and Seaborn's attack [25] corrupt a PTE to make it point to a page table page. All these attacks will fail if page table caches are not properly flushed. This paper provides a robust technique for flushing these caches on various architectures.

Reverse engineering of commodity hardware has become popular with increasing attacks on hardware. Hund et al. [14] reverse engineered how Intel processors map physical memory addresses to the LLC. DRAMA [23] used a DRAM timing side channel to reverse engineer how memory controllers place data on DRAM modules. In this paper, we reverse engineered various undocumented properties of the page table caches common in the MMUs of modern processors.

6. CONCLUSIONS

Hardware-based attacks such as cache or Rowhammer attacks are gaining popularity as compromising hardened software is becoming more challenging. For robust attacks across various processors, the properties of various intra-processor caches become important. These caches are often hidden from software and, as a result, they are sometimes not properly documented. In this paper, we described RevAnC, a framework for reverse engineering the properties of page table caches commonly found on recent processors. We applied our technique on 23 different microarchitectures and detected that 20 of them implement these page table caches. Our open-source implementation provides a convenient interface for flushing these caches on these 20 microarchitectures and can automatically detect page table caches on future microarchitectures. More information about this project can be found at: https://www.vusec.net/projects/anc

Acknowledgements

We would like to thank the anonymous reviewers for their feedback. This work was supported by the European Commission through project H2020 ICT-32-2014 SHARCS under Grant Agreement No. 644571 and by the Netherlands Organisation for Scientific Research through grant NWO 639.023.309 VICI Dowsing.

7. REFERENCES

[1] M. Abadi, M. Budiu, U. Erlingsson, and J. Ligatti. Control-flow Integrity. CCS'05.
[2] T. W. Barr, A. L. Cox, and S. Rixner. Translation Caching: Skip, Don't Walk (the Page Table). ISCA'10.
[3] A. Bhattacharjee. Large-reach Memory Management Unit Caches. MICRO'13.
[4] E. Bosman, K. Razavi, H. Bos, and C. Giuffrida. Dedup Est Machina: Memory Deduplication as an Advanced Exploitation Vector. SP'16.
[5] X. Chen, A. Slowinska, D. Andriesse, H. Bos, and C. Giuffrida. StackArmor: Comprehensive Protection From Stack-based Memory Error Vulnerabilities for Binaries. NDSS.
[6] D. Cock, Q. Ge, T. Murray, and G. Heiser. The Last Mile: An Empirical Study of Timing Channels on seL4. CCS'14.
[7] S. Crane, C. Liebchen, A. Homescu, L. Davi, P. Larsen, A.-R. Sadeghi, S. Brunthaler, and M. Franz. Readactor: Practical Code Randomization Resilient to Memory Disclosure. NDSS'15.
[8] T. H. Dang, P. Maniatis, and D. Wagner. The Performance Cost of Shadow Stacks and Stack Canaries. ASIA CCS'15.
[9] Advanced Micro Devices. Software Optimization Guide for AMD Family 15h Processors, 2014.
[10] D. Evtyushkin, D. Ponomarev, and N. Abu-Ghazaleh. Jump Over ASLR: Attacking Branch Predictors to Bypass ASLR. MICRO'16.
[11] C. Giuffrida, A. Kuijsten, and A. S. Tanenbaum. Enhanced Operating System Security Through Efficient and Fine-grained Address Space Randomization. SEC'12.
[12] B. Gras, K. Razavi, E. Bosman, H. Bos, and C. Giuffrida. ASLR on the Line: Practical Cache Attacks on the MMU. NDSS'17.
[13] D. Gruss, C. Maurice, A. Fogh, M. Lipp, and S. Mangard. Prefetch Side-Channel Attacks: Bypassing SMAP and Kernel ASLR. CCS'16.
[14] R. Hund, C. Willems, and T. Holz. Practical Timing Side Channel Attacks Against Kernel Space ASLR. SP'13.
[15] AMD64 Architecture Programmer's Manual, Volume 2: System Programming. Publication No. 24593, May 2013.
[16] Intel 64 and IA-32 Architectures Optimization Reference Manual. Order Number 248966-032, January 2016.
[17] Y. Jang, S. Lee, and T. Kim. Breaking Kernel Address Space Layout Randomization with Intel TSX. CCS'16.
[18] K. Koning, H. Bos, and C. Giuffrida. Secure and Efficient Multi-Variant Execution Using Hardware-Assisted Process Virtualization. DSN'16.
[19] V. Kuznetsov, L. Szekeres, M. Payer, G. Candea, R. Sekar, and D. Song. Code-Pointer Integrity. OSDI'14.
[20] K. Lu, C. Song, B. Lee, S. P. Chung, T. Kim, and W. Lee. ASLR-Guard: Stopping Address Space Leakage for Code Reuse Attacks. CCS'15.
[21] Y. Oren, V. P. Kemerlis, S. Sethumadhavan, and A. D. Keromytis. The Spy in the Sandbox: Practical Cache Attacks in JavaScript and their Implications. CCS'15.
[22] D. A. Osvik, A. Shamir, and E. Tromer. Cache Attacks and Countermeasures: The Case of AES. CT-RSA'06.
[23] P. Pessl, D. Gruss, C. Maurice, M. Schwarz, and S. Mangard. DRAMA: Exploiting DRAM Addressing for Cross-CPU Attacks. SEC'16.
[24] K. Razavi, B. Gras, E. Bosman, B. Preneel, C. Giuffrida, and H. Bos. Flip Feng Shui: Hammering a Needle in the Software Stack. SEC'16.
[25] M. Seaborn. Exploiting the DRAM Rowhammer Bug to Gain Kernel Privileges. Black Hat USA, BH-US'15.
[26] V. van der Veen, Y. Fratantonio, M. Lindorfer, D. Gruss, C. Maurice, G. Vigna, H. Bos, K. Razavi, and C. Giuffrida. Drammer: Deterministic Rowhammer Attacks on Mobile Platforms. CCS'16.
[27] Y. Yarom and K. Falkner. FLUSH+RELOAD: A High Resolution, Low Noise, L3 Cache Side-channel Attack. SEC'14.