EPTI: Efficient Defence Against Meltdown Attack for Unpatched

EPTI: Efficient Defence against Meltdown Attack for Unpatched VMs Zhichao Hua, Dong Du, Yubin Xia, Haibo Chen, and Binyu Zang, Institute of Parallel and Distributed Systems, Shanghai Jiao Tong University https://www.usenix.org/conference/atc18/presentation/hua This paper is included in the Proceedings of the 2018 USENIX Annual Technical Conference (USENIX ATC ’18). July 11–13, 2018 • Boston, MA, USA ISBN 978-1-939133-02-1 Open access to the Proceedings of the 2018 USENIX Annual Technical Conference is sponsored by USENIX. EPTI: Efficient Defence against Meltdown Attack for Unpatched VMs Zhichao Hua, Dong Du, Yubin Xia, Haibo Chen, Binyu Zang Institute of Parallel and Distributed Systems, Shanghai Jiao Tong University fhuazhichao123,dd nirvana,xiayubin,haibochen,[email protected] Abstract Spectre vulnerability. However, in order to fix the Melt- down vulnerability, which is much more serious and eas- The Meltdown vulnerability, which exploits the inher- ier to exploit, users are required to apply a kernel patch ent out-of-order execution in common processors like named KPTI (kernel page table isolation) [30] that uses x86, ARM and PowerPC, has shown to break the fun- two page tables to host kernel and user programs to iso- damental isolation boundary between user and kernel late kernel address space from any user process. While space. This has stimulated a non-trivial patch to mod- this patch can effectively defend the Meltdown attacks, ern OS to separate page tables for user space and kernel it brings three issues, which leaves thousands of millions space, namely, KPTI (kernel page table isolation). While of unpatched machines in danger. this patch stops kernel memory leakages from rouge user First, the patch has to be applied manually by every processes, it mandates users to patch their kernels (usu- user. In cloud environment, although the cloud adminis- ally requiring a reboot), and is currently only available trators can patch the host OS, they cannot directly patch on the latest versions of OS kernels. Further, it also in- guest OS running in VMs (virtual machines) since they troduces non-trivial performance overhead due to page are not allowed to do so. For example, Amazon “recom- table switching during user/kernel crossings. mend that customers patch their instance operating sys- In this paper, we present EPTI, an alternative approach tems to address process-to-process or process-to-kernel to defending against the Meltdown attack for unpatched concerns of this issue” [12]. However, many cloud users VMs (virtual machines) in cloud, yet with better per- are not capable of doing such system maintenance. formance than KPTI. Specifically, instead of using two Second, the patch may depend on specific versions of guest page tables, we use two EPTs (extended page ta- kernel, especially for Linux. Till now, Linux community bles) to isolate user space and kernel space, and unmap just released version 4.15 that contains the patch. The all the kernel space in user’s EPT to achieve the same patch may not work on some early versions of kernel like effort of KPTI. The switching of EPTs is done through 4.4 [28]. It is expected to take a long time before the a hardware-support feature called EPT switching within patch can be applied to all the versions of Linux kernel. guest VMs without hypervisor involvement. Meanwhile, EPT switching does not flush TLB since each EPT has Third, the patch may incur non-trivial performance its own TLB, which further reduces the overhead. We slowdown. The KPTI patch makes the kernel and have implemented our design and evaluated it on Intel user process use different page tables, which causes Kaby Lake CPU with different versions of Linux kernel. TLB-flush during the switching between user-mode and The results show that EPTI only introduces up to 13% kernel-mode and thus increases the rate of TLB miss. overhead, which is around 45% less than KPTI. Prior evaluation results show that for some system-call intensive workload, the performance penalty may be high as 30% in VMs [22]; our own experiments con- 1 Introduction firmed such performance slowdown (Section 6). In this paper, we present an alternative approach to The recently discovered Meltdown [16] and Spectre [14] defending against Meltdown attack for VMs in cloud. vulnerabilities allow unauthorized processes to read data Our approach, namely EPTI, can be applied to unpatched of privileged kernel or other processes, which brings se- guest VMs without users’ awareness and can achieve bet- vere security threat especially to cloud platforms. Cur- ter performance than KPTI at the same time. First, in- rently, Intel has released micro-code patches to fix the stead of using two gPTs (guest page tables) as in KPTI, USENIX Association 2018 USENIX Annual Technical Conference 255 EPTI uses two EPTs (extended page tables), namely Origin KPTI EPTI EPTk and EPTu, to run the kernel and user processes, Kernel correspondingly. The guest kernel and user still share space one gPT, but in user mode, the gPT entries for mapping kernel address space are set to zero in EPTu, which for- bids any translation of address within kernel space to mit- User space igate the Meltdown attack. Second, we leverage one of Intel’s hardware features for virtualization, named EPT User-mode Kernel-mode User-mode Kernel-mode User-mode switching, to switch the two EPTs within the VM itself Kernel-mode Mapped in both gPT and EPT without causing any VMExit. We use binary instrumen- Not mapped in gPT Mapped in gPT, not mapped in EPT tation to insert two trampolines at the entrance and exit of guest kernel to do the EPT switching, which does not Figure 1: Page table isolation. For a VM, KPTI uses two require kernel’s source code and has little (if any) de- gPTs and one EPT, while EPTI uses one gPT (since VM is not pendence on kernel versions. Third, through a detailed patched) and two EPTs. micro-architectural analysis, we find that EPT switching can be more efficient than gPT switching. Since each to access kernel address A and to leverage its data as an EPT has its own TLB, when switching the EPTs there index to access the cache; step-2: to get the data through will be no TLB flushing by hardware, which is the main cache covert channel. The key problem here is that the reason of performance degradation of KPTI. We also Step-1 is executed reordered and will be canceled even- adopt several optimizations to minimize the number of tually, but the cache layout is affected without rollback. VMExits to further reduce the overhead. Fourth, EPTI Since the kernel will typically map all the physical mem- can be seamlessly deployed in the cloud by combining ory within its memory space, the malicious application with live VM migration [5]: a host can migrate away all can potentially get all of the memory contents. the guest VMs, patch the host hypervisor with EPTI, and KPTI (kernel page table isolation) [30] is based on then migrate all the VMs back. KAISER (kernel address isolation to have side-channels We have implemented EPTI on KVM and use unmod- efficiently removed) [19], which is proposed to defend ified Ubuntu distribution as guest VM for evaluation. We against the Meltdown attack. This patch separates user conduct a detailed security analysis as well as evalua- space and kernel space page tables entirely, as shown in tion to show that our EPTI can achieve the same security Figure 1. The one used by kernel is the same as before, guarantee as KPTI. We also evaluate real-world bench- while the one used by application contains a copy of user marks to measure the performance overhead. The results space and a small set of kernel space mapping with only show that the average performance overhead on server trampoline code to enter the kernel. Since the data of ker- applications of EPTI is about 6%, which is 45% lower nel are no longer mapped in the user space, a malicious than KPTI whose average overhead is 11%. application cannot directly de-reference kernel’s data ad- To summarize, this paper makes the following contri- dress, and thus cannot issue Meltdown attack. KPTI has butions: been merged to the mainstream Linux kernel 4.15, which was released on 28 Jan, 2018. However, the patch still • An EPT-level isolation of kernel’s and user’s has problems on previous Linux kernel versions. For address spaces to defend against Meltdown attack example, it is reported that some Ubuntu user “just got for unpatched guest VMs. the Meltdown update to kernel linux-image-4.4.0-108- • Several optimizations to achieve better performance generic but this does not boot at all” [28]. Considering than the current solution KPTI. the patch needs to be applied manually by system ad- ministrators, it may take a long time before most of the • A prototype of our design on real hardware for machines getting the patch deployed. performance and security evaluation. 2 Motivation and Background 2.2 Overhead of KPTI 2.1 Meltdown Attack and KPTI KPTI introduces performance overhead since both entering-kernel and exiting-kernel require additional The Meltdown vulnerability was published in January page table switching. The switching is done by loading 2018, known to affect Intel’s x86 CPU, ARM Cortex- the CR3 register, which takes around 300 cycles. Mean- A75 [16] and some versions of PowerPC processor [11]. while, since TLB (translation lookaside buffer) will be Through this attack, a malicious user application can flushed during CR3 changing, the performance will fur- steal contents of kernel memory in two steps.

Load more