Libmpk: Software Abstraction for Intel Memory Protection Keys

libmpk: Software Abstraction for Intel Memory Protection Keys Soyeon Park Sangho Lee† Wen Xu Hyungon Moon∗ Taesoo Kim Georgia Institute of Technology †Microsoft Research ∗Ulsan National Institute of Science and Technology Abstract of cycles at most. Maintaining page access rights is important to prevent at- Intel memory protection keys (MPK) is a new hardware fea- tackers from accessing and manipulating arbitrary memory lo- ture to support thread-local permission control on groups of cations to eliminate information leakage and memory corrup- pages without requiring modification of page tables. Unfor- tion attacks. An arbitrary read vulnerability of an application tunately, its current hardware implementation and software can lead to leak the application’s sensitive information kept in supports suffer from security, scalability, and semantic-gap the memory. For example, the Heartbleed vulnerability of the problems: (1) MPK is vulnerable to protection-key-use-after- OpenSSL library [27] allows attackers to steal the sensitive free and protection-key corruption; (2) MPK does not scale data of applications using the library, including cryptographic due to hardware limitations; and (3) MPK is not perfectly private keys and passwords. Further, protecting control data compatible with mprotect() because it does not support per- from the memory corruption attacks helps to prevent ways mission synchronization across threads. to achieve arbitrary code execution. By using information In this paper, we propose libmpk, a software abstraction leakage vulnerabilities, attackers aim to find the locations of for MPK. libmpk virtualizes protection keys to eliminate the control data (e.g., return addresses and virtual function tables) protection-key-use-after-free and protection-key corruption and corrupt them with arbitrary write vulnerabilities, to carry problems while supporting a tremendous number of memory out control-flow hijacking attacks [36]. page groups. libmpk also prevents unauthorized writes to A number of software- or hardware-based mechanisms its metadata and supports inter-thread key synchronization. have been proposed to ensure the confidentiality and integrity We apply libmpk to three real-world applications: OpenSSL, of the memory. Some of them [8, 31, 32] are designed to JavaScript JIT compiler, and Memcached for memory pro- prevent a class of attacks completely. However, due to their tection and isolation. An evaluation shows that libmpk in- high costs, others [7,13,16,29,38] focus on minimizing the troduces negligible performance overhead (<1%) compared impact of single vulnerability by isolating a small, critical with insecure versions, and improves their performance by portion of code and data only, which are practical but have 8.1× over secure equivalents using mprotect(). The source coverage and/or scalability problems. code of libmpk will be publicly available and maintained as mem- an open source project. Recently, Intel deployed a hardware feature, known as ory protection keys (MPK). MPK has three advantages over 1 Introduction page-table-based mechanisms: (1) performance, (2) group- wise control per-thread view Operating systems (OSs) rely on the memory management , and (3) . First, MPK utilizes a protection key rights register PKRU unit (MMU) to define and enforce a process’s access right to ( ) to maintain the access arXiv:1811.07276v1 [cs.CR] 18 Nov 2018 a memory page. The page table maintains a page table entry rights of individual protection keys associated with specific (PTE) for each page specifying the permission, e.g., readable, memory pages: read/write, read-only, or no access. Unlike writable, or executable, and the MMU checks it to determine the page-table-based mechanisms that flush the TLB and up- the legitimacy of each memory access. OSs can change the date kernel-level page-table data structures (virtual memory permission by updating PTEs and flushing the translation areas, VMAs) to change access rights, taking more than 1,000 lookaside buffer (TLB) to reload the updated PTEs into the cycles, MPK only requires to execute a non-privileged instruc- TLB. tion WRPKRU to update PKRU, taking around 20 cycles (§2.3). In addition, MPK enables execute-only memory [24] because its Alternatively, some instruction set architectures (ISAs), access rights are orthogonal to whether the page is executable. e.g., ARM [3] and IBM Power [18], allow OSs to assign a key to each page, and define the access rights of a process Second, MPK supports up to 16 different protection keys; that is, there can be up to 16 different memory page groups using a CPU register. This effectively classifies the memory 1 into multiple groups with the same access rights. Changing consisting of memory pages with the same protection keys. a process’s access right to a group of pages is as easy as 1The default group (0) is special, so only 15 groups are effective in updating a CPU register, which commonly takes about tens general. 1 The pages that compose a group do not need to be contiguous, to ensure the semantics of mprotect() with MPK. and their access right can be changed at once. This group-wise First, libmpk provides virtual protection keys to applica- control allows developers to change access rights to memory tions to hide real hardware keys from them. This design page groups in a fine-grained manner according to the types avoids the protection-key-use-after-free problem by schedul- and contexts of data stored in them. For example, a server ing the mappings between virtual protection keys and hard- application can associate different memory page groups with ware protection keys. With the virtual key scheduling, libmpk different sessions or clients to protect them individually. scales MPK to support a virtually infinite number of protec- Third, MPK allows each thread (i.e., each hyperthread) to tion keys with the same semantics: group-wise control and have a unique PKRU, realizing per-thread memory view. Even per-thread view. if two threads share the same address space, their access rights Second, libmpk protects its metadata from corruption. Ba- to the same page can differ (e.g., read/write versus read-only). sically, libmpk makes all virtual protection keys read-only by Unfortunately, we find that the current hardware imple- hardcoding them to the code and enforcing direct calls. All mentation of Intel MPK and its standard library and ker- other important metadata, including the mappings between nel supports suffer from (1) security, (2) scalability, and (3) virtual and hardware keys, are maintained in the kernel to semantic-gap problems, hindering its widespread adoption. avoid corruption while avoiding unnecessary system calls to First, MPK suffers from protection-key-use-after-free and minimize the performance overhead. In addition, libmpk is protection key corruption problems. The protection key as- designed across the layers—the kernel and user—to protect signed to a page group can be re-used after deallocation via its internal metadata from malicious overwrites through privi- pkey_free() system call. However, pkey_free() does not lege separation, and yet to maximize the performance cost by initialize the protection key field of the page group associ- avoiding unnecessary system calls. ated with the deallocated key, resulting in ambiguity when Third, libmpk provides an efficient inter-thread key syn- the deallocated key is re-assigned to a different group using chronization technique to utilize MPK as an efficient alter- pkey_alloc(). Also, if attackers know an arbitrary write vul- native of mprotect() with the same semantics. It is 1.7×– nerability, they can corrupt MPK protection keys stored in a 3.8× faster than mprotect() while varying the number of variable to manipulate an application to change the permission 4KB pages from 1 to 1,000 (§6.2). This huge performance of a target page group. improvement benefits from our lazy PKRU synchronization Second, MPK does not scale because its PKRU can manage technique and lacks of TLB flush and VMA update. up to 16 protection keys due to a hardware restriction. When To show the effectiveness and practicality of libmpk, we an application tries to allocate more than 16 protection keys, apply it to three real-world applications: OpenSSL library, pkey_alloc() just returns an error, implying that the applica- JavaScript just-in-time (JIT) compiler, and Memcached. First, tion itself should implement its own mechanism if it has to we modify the OpenSSL library to create secure memory deal with more than 16 memory page groups. pages for storing cryptographic keys to mitigate information Third, MPK has a semantic gap with the conventional leakage. Second, we modify three JavaScript JIT compil- mprotect() because, unlike mprotect() working at a pro- ers (SpiderMonkey, ChakraCore, and v8) to protect the code cess level, MPK works at a thread level, which results in cache from memory corruption, by enforcing the W⊕X secu- potential security and performance problems. For example, rity policy. Third, we modify Memcached to secure almost Linux kernel implements an execute-only memory feature all its data, including slab and hash table whose size can be with MPK: mprotect(addr, len, PROT_EXEC). However, several gigabytes. The evaluation results show that libmpk this MPK-based execute-only memory does not consider inter- and its applications have negligible overhead (<1%). thread permission synchronization, which should be ensured We summarize the contributions of this paper as follows: by mprotect() by nature. This allows another thread that did • Comprehensive study. We study the design, function- not execute mprotect() to have a chance to read the execute- ality, and characteristics of Intel MPK in detail. Further, only memory. Further, since some applications still assume a we identify the critical challenges of utilizing MPK: se- process-level memory permission model, they cannot benefit curity (protection-key-use-after-free and protection-key from MPK’s efficiency and group-wise control, unless they corruption), scalability (a limited number of hardware synchronize access rights across threads by themselves.

Load more