Fast and Scalable VMM Live Upgrade in Large Cloud Infrastructure

Total pages: 16

File type: PDF, size: 1020 KB

Fast and Scalable VMM Live Upgrade in Large Cloud Infrastructure

Xiantao Zhang (Alibaba Group), Xiao Zheng (Alibaba Group), Zhi Wang (Florida State University), Qi Li (Tsinghua University), Junkang Fu (Alibaba Group), Yang Zhang (Alibaba Group), Yibin Shen (Alibaba Group)

Abstract

High availability is the most important and challenging problem for cloud providers. However, the virtual machine monitor (VMM), a crucial component of the cloud infrastructure, has to be frequently updated and restarted to add security patches and new features, undermining high availability. There are two existing live update methods to improve the cloud availability: kernel live patching and Virtual Machine (VM) live migration. However, they both have serious drawbacks that impair their usefulness in the large cloud infrastructure: kernel live patching cannot handle complex changes (e.g., changes to persistent data structures); and VM live migration may incur unacceptably long delays when migrating millions of VMs in the whole cloud, for example, to deploy urgent security patches.

In this paper, we propose a new method, VMM live upgrade, that can promptly upgrade the whole VMM (KVM & QEMU) without interrupting customer VMs. Timely upgrade of the VMM is essential to the cloud because it is both the main attack surface of malicious VMs and the component to integrate new features. We have built a VMM live upgrade system called Orthus. Orthus features three key techniques: dual KVM, VM grafting, and device handover. Together, they enable the cloud provider to load an upgraded KVM instance while the original one is running and "cut-and-paste" the VM to this new instance. In addition, Orthus can seamlessly hand over passthrough devices to the new KVM instance without losing any ongoing (DMA) operations. Our evaluation shows that Orthus can reduce the total migration time and downtime by more than 99% and 90%, respectively. We have deployed Orthus in one of the largest cloud infrastructures for a long time. It has become the most effective and indispensable tool in our daily maintenance of hundreds of thousands of servers and millions of VMs.

CCS Concepts: • Security and privacy → Virtualization and security; • Computer systems organization → Availability.

Keywords: virtualization; live upgrade; cloud infrastructure

ACM Reference Format: Xiantao Zhang, Xiao Zheng, Zhi Wang, Qi Li, Junkang Fu, Yang Zhang, and Yibin Shen. 2019. Fast and Scalable VMM Live Upgrade in Large Cloud Infrastructure. In ASPLOS '19: ACM International Conference on Architectural Support for Programming Languages and Operating Systems. ACM, New York, NY, USA, 13 pages. https://doi.org/10.1145/1122445.1122456

1 Introduction

High availability is the most important but challenging problem for cloud providers. It requires the service to be accessible anywhere and anytime without perceivable downtime. However, there are two important issues that can undermine high availability: new features and security.
Specifically, a cloud usually provides a range of services, such as storage, database, and programming language services, in addition to running customer VMs. Features like the cloud storage need to be integrated into the core cloud component, the VMM. New features will only take effect after the VMM is restarted, thus interrupting the service. Meanwhile, security demands that the system software be frequently updated to patch newly discovered vulnerabilities. All the major Linux distributions push security updates weekly, if not daily. Cloud providers have to keep their systems patched or otherwise risk being compromised. Even though there are efforts to proactively discover and fix vulnerabilities in the VMM, vulnerabilities are inevitable given the complexity of the VMM. For example, KVM, the VMM used by many major public cloud providers, is a type-2 VMM running in the host Linux kernel, which has more than 16 million lines of source code. Therefore, it is critical to keep the VMM updated to minimize the window of vulnerability. In summary, both new features and security demand a method to quickly update the cloud system with minimal downtime.

Table 1. Comparison of three live update methods:
• Kernel live patching. Application: simple security fixes. Advantages: no downtime. Limitations: cannot handle complex fixes; maintenance headache.
• VMM live upgrade. Application: new cloud features; complex fixes in the VMM. Advantages: short downtime; scalable, no migration traffic; passthrough device support. Limitations: VMM-only update (justified).
• VM live migration. Application: whole system upgrade. Advantages: update everything on the server. Limitations: not scalable, needs spare servers; migration traffic consumes network bandwidth; performance downgrade and long downtime; cannot support passthrough devices.

Limitations of the existing methods: Currently, there are two live update methods available to cloud providers: kernel live patching and VM live migration. They are valuable in improving the cloud availability but have serious drawbacks. Kernel live patching applies patches to the running kernel without rebooting the system [7, 27, 33]. However, live patching cannot handle complex updates that, for example, change persistent data structures or add new features. It is most suitable for simple security fixes, such as checking buffer sizes. We instead aim at creating an update mechanism that can not only support simple security checks but also add new features and apply complex security patches (e.g., the Retpoline fix for Spectre that requires recompiling the code [39]).

VM live migration is another valuable tool used by cloud providers to facilitate software updates [10, 12]. It iteratively copies the VM states (e.g., memory) from the current server to an idle backup server while the VM is still running. At the final iteration, it completely stops the VM, copies the last changed states, and restarts the VM on the backup server. This allows the cloud provider to update or replace everything on the source server, including the host kernel, the VMM, failed hardware devices, etc. The VM can be migrated back to the source server after the update if necessary. Nevertheless, VM live migration has three limitations that severely limit its usefulness in large cloud datacenters:

• First, it requires spare servers to temporarily hold the migrating VMs. These servers must be equally powerful in order not to degrade the VM performance. If a large number of VMs have to be migrated simultaneously, this puts significant pressure on the inventory control for spare servers. In large clouds like ours, it is common for hundreds of thousands or even millions of VMs to be migrated, for example, to fix urgent security vulnerabilities. In addition, large-scale VM migration can cause severe network congestion.

• Second, VM live migration incurs temporary performance downgrade and service downtime [40]. The more memory a VM has, the longer the migration takes. High-end cloud customers may use hundreds of GBs of memory per VM to run big data, massive streaming, and other applications. These applications are particularly sensitive to performance downgrade and downtime, and such customers often closely monitor the real-time system performance. VM live migration is simply not an option for those VMs.

• Last, VM live migration cannot support passthrough devices. Device passthrough grants a VM direct access to the hardware devices. Passthrough devices are widely deployed in the cloud. For example, network adapter passthrough is often used to speed up network traffic, and many heterogeneous computing services rely on GPU or FPGA passthrough. Existing VM live migration systems do not support device passthrough.

VMM live upgrade, a new method: In this paper, we propose a third method, complementary to, but more useful than, kernel live patching and VM live migration: VMM live upgrade, a method to replace the live VMM. Unlike kernel live patching, VMM live upgrade replaces the whole running VMM. Consequently, it can apply more complex fixes to the VMM, for example, to support a new cloud storage system. Because VMM live upgrade is conducted on the same physical server, it does not require spare servers to temporarily hold VMs or introduce significant network traffic; its performance and downtime are significantly better than VM live migration; and it can seamlessly support passthrough devices. Table 1 compares these three methods.

VMM live upgrade replaces the VMM only. This is consistent with the threat model in the cloud, where the host kernel is well isolated from the untrusted VMs through network segregation, device passthrough, and driver domains, and the main attack surface lies in the interaction between the VMM and the VMs. This is reflected in Table 2, which lists all the known vulnerabilities related to KVM/QEMU [3, 34]. These vulnerabilities are exposed to the VMs and pose the most imminent threat to the cloud infrastructure.
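The pre-copy loop sketched in the paragraphs above is why migration time and downtime grow with VM memory size. The following is a minimal, self-contained C sketch of that iterative pre-copy algorithm; it is an illustration only, not code from the paper or from QEMU/KVM, and the constants (PAGES, DIRTY_PER_ROUND, STOP_THRESHOLD) and the simulated dirty bitmap are assumptions made for the example.

/* Simulated pre-copy live migration: repeatedly copy the pages that the
 * running "guest" keeps dirtying, then pause it and copy the remainder.
 * Illustration only: real migration uses KVM dirty-page logging and a
 * network channel instead of the in-memory arrays used here. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define PAGES           4096    /* number of guest pages (illustrative) */
#define PAGE_SIZE       4096
#define DIRTY_PER_ROUND  256    /* pages the guest writes per round     */
#define STOP_THRESHOLD    64    /* few enough dirty pages to pause VM   */
#define MAX_ROUNDS        30    /* give up and stop-and-copy after this */

static unsigned char src[PAGES][PAGE_SIZE];   /* source "guest" memory  */
static unsigned char dst[PAGES][PAGE_SIZE];   /* destination copy       */
static unsigned char dirty[PAGES];            /* simulated dirty bitmap */

/* The guest keeps running during pre-copy, so it keeps dirtying pages. */
static void guest_touches_memory(void)
{
    for (int i = 0; i < DIRTY_PER_ROUND; i++) {
        int p = rand() % PAGES;
        src[p][0]++;
        dirty[p] = 1;
    }
}

/* Copy every currently dirty page and clear its dirty bit. */
static size_t copy_dirty_pages(void)
{
    size_t copied = 0;
    for (int p = 0; p < PAGES; p++) {
        if (dirty[p]) {
            memcpy(dst[p], src[p], PAGE_SIZE);
            dirty[p] = 0;
            copied++;
        }
    }
    return copied;
}

static size_t count_dirty(void)
{
    size_t n = 0;
    for (int p = 0; p < PAGES; p++)
        n += dirty[p];
    return n;
}

int main(void)
{
    memset(dirty, 1, sizeof(dirty));   /* round 0: all pages must be sent */

    int round = 0;
    size_t remaining = PAGES;
    while (round < MAX_ROUNDS && remaining > STOP_THRESHOLD) {
        copy_dirty_pages();            /* copy while the VM keeps running  */
        guest_touches_memory();        /* ...so some pages get dirty again */
        remaining = count_dirty();
        printf("round %2d: %zu pages still dirty\n", ++round, remaining);
    }

    /* Stop-and-copy: the VM is paused, the last dirty pages are copied,
     * and the VM resumes on the destination. This pause is the downtime. */
    printf("stop-and-copy: %zu pages copied during downtime\n",
           copy_dirty_pages());
    return 0;
}

With the write rate chosen here the dirty set never shrinks below the stop threshold, so the loop gives up at MAX_ROUNDS and pays a large stop-and-copy, mirroring the long-downtime behavior the paper attributes to migrating memory-heavy, write-intensive VMs; lowering DIRTY_PER_ROUND (for example, to 32) shows the converging case instead.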
Recommended publications
  • Live Kernel Patching Using kGraft
    Live Kernel Patching Using kGraft
    SUSE Linux Enterprise Server 12 SP4

    This document describes the basic principles of the kGraft live patching technology and provides usage guidelines for the SLE Live Patching service. kGraft is a live patching technology for runtime patching of the Linux kernel, without stopping the kernel. This maximizes system uptime, and thus system availability, which is important for mission-critical systems. By allowing dynamic patching of the kernel, the technology also encourages users to install critical security updates without deferring them to a scheduled downtime. A kGraft patch is a kernel module, intended for replacing whole functions in the kernel. kGraft primarily offers in-kernel infrastructure for integration of the patched code with base kernel code at runtime. SLE Live Patching is a service provided on top of regular SUSE Linux Enterprise Server maintenance. kGraft patches distributed through SLE Live Patching supplement regular SLES maintenance updates. Common update stack and procedures can be used for SLE Live Patching deployment. Publication Date: 09/24/2021

    Contents: 1 Advantages of kGraft; 2 Low-level Function of kGraft; 3 Installing kGraft Patches; 4 Patch Lifecycle; 5 Removing a kGraft Patch; 6 Stuck Kernel Execution Threads; 7 The kgr Tool; 8 Scope of kGraft Technology; 9 Scope of SLE Live Patching; 10 Interaction with the Support Processes; 11 GNU Free Documentation License

    1 Advantages of kGraft
    Live kernel patching using kGraft is especially useful for quick response in emergencies (when serious vulnerabilities are known and should be fixed when possible, or there are serious system stability issues with a known fix).
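    As the excerpt notes, a kGraft patch is a kernel module that replaces whole functions at runtime. The sketch below is not an actual SUSE kGraft patch (the kgraft-patch packages have their own structure and tooling); it uses the mainline Linux livepatch API, which grew out of kGraft and kpatch, to illustrate the same function-replacement idea, and it closely follows the kernel's own livepatch sample. Treat the exact API as version-dependent; recent kernels enable the patch with a single klp_enable_patch() call from module init.

    /* Minimal function-replacement patch module, modeled on the mainline
     * livepatch sample (samples/livepatch/livepatch-sample.c).
     * It replaces cmdline_proc_show(), the function behind /proc/cmdline. */
    #include <linux/module.h>
    #include <linux/kernel.h>
    #include <linux/seq_file.h>
    #include <linux/livepatch.h>

    /* The replacement function: same signature as the original. */
    static int livepatch_cmdline_proc_show(struct seq_file *m, void *v)
    {
        seq_printf(m, "%s\n", "this /proc/cmdline is provided by a live patch");
        return 0;
    }

    /* Which functions to replace, and with what. */
    static struct klp_func funcs[] = {
        {
            .old_name = "cmdline_proc_show",
            .new_func = livepatch_cmdline_proc_show,
        }, { }
    };

    /* Which object (vmlinux or a module) the patched functions live in. */
    static struct klp_object objs[] = {
        {
            /* .name = NULL means the function is in vmlinux */
            .funcs = funcs,
        }, { }
    };

    static struct klp_patch patch = {
        .mod = THIS_MODULE,
        .objs = objs,
    };

    static int patch_init(void)
    {
        /* Registers the patch and redirects callers of the old function. */
        return klp_enable_patch(&patch);
    }

    static void patch_exit(void)
    {
    }

    module_init(patch_init);
    module_exit(patch_exit);
    MODULE_LICENSE("GPL");
    MODULE_INFO(livepatch, "Y");

    Loading such a module applies the patch immediately; the live-patching consistency model then migrates running tasks over to the new function, which is why the table of contents above has a section on stuck kernel execution threads.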
  • Think ALL Distros Offer the Best Linux DevSecOps Environment?
    WHITE PAPER: Think All Distros Offer the Best Linux DevSecOps Environment? Think Again!
    Marc Staimer, Dragon Slayer Consulting

    Introduction: DevOps is changing. Developing code with after-the-fact, bolt-on security is dangerously flawed. When that bolt-on fails to correct exploitable code vulnerabilities, it puts the entire organization at risk. Security has generally been an afterthought for many doing DevOps. It was often assumed the IT organization's systemic multiple layers of security measures and appliances would protect any new code from malware or breaches. And besides, developing code with security built in adds tasks and steps to development and testing time. More tasks and steps delay time-to-market. Multi-tenant clouds have radically changed the market. Any vulnerability, in a world with increasing cyber-attacks, can put millions of users' data at risk. Those legacy DevOps attitudes are unsound. They are potentially quite costly in the current environment. Consider that nearly every developed and most developing countries have enacted laws and regulations protecting personally identifiable information, or PII. PII is incredibly valuable to cybercriminals. Stealing PII enables them to commit many cybercrimes including the cybertheft of identities, finances, intellectual property, admin privileges, and much more. PII can also be sold on the web. Those PII laws and regulations are meant to force IT organizations to protect PII. Non-compliance with these laws and regulations often carries punitive financial penalties.
  • Fast and Live Hypervisor Replacement
    Fast and Live Hypervisor Replacement
    Spoorti Doddamani, Piush Sinha, Hui Lu, Tsu-Hsiang K. Cheng, Hardik H. Bagdi, Kartik Gopalan (Binghamton University, New York, USA)

    Abstract: Hypervisors are increasingly complex and must be often updated for applying security patches, bug fixes, and feature upgrades. However, in a virtualized cloud infrastructure, updates to an operational hypervisor can be highly disruptive. Before being updated, virtual machines (VMs) running on a hypervisor must be either migrated away or shut down, resulting in downtime, performance loss, and network overhead. We present a new technique, called HyperFresh, to transparently replace a hypervisor with a new updated instance without disrupting any running VMs. A thin shim layer, called the hyperplexor, performs live hypervisor replacement by remapping guest memory to a new updated hypervisor on the same machine.

    […] International Conference on Virtual Execution Environments (VEE '19), April 14, 2019, Providence, RI, USA. ACM, New York, NY, USA, 14 pages. https://doi.org/10.1145/3313808.3313821

    1 Introduction: Virtualization-based server consolidation is a common practice in today's cloud data centers [2, 24, 43]. Hypervisors host multiple virtual machines (VMs), or guests, on a single physical host to improve resource utilization and achieve agility in resource provisioning for cloud applications [3, 5–7, 50]. Hypervisors must be often updated or replaced for various purposes, such as for applying security/bug fixes [23, 41], adding new features [15, 25], or simply for software rejuvenation.
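    The hyperplexor itself is not shown in this excerpt, but the mechanism any "remap guest memory to a new hypervisor" design builds on is ordinary KVM behavior: guest RAM is plain host user-space memory registered against a VM file descriptor. The hedged C sketch below shows that registration step using the stock KVM ioctl interface; it is not HyperFresh (or Orthus) code, the 64 MiB size and slot number are arbitrary, and everything beyond memory registration is omitted. Handing the same backing pages to a second, newer hypervisor instance amounts, conceptually, to repeating the final ioctl against a freshly created VM fd.

    /* Register a host-userspace buffer as guest physical memory with KVM.
     * This is the standard KVM API (see Documentation/virt/kvm/api.rst);
     * a "remap to a new hypervisor instance" design would hand the same
     * backing pages to a freshly created VM fd in the same way. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/ioctl.h>
    #include <sys/mman.h>
    #include <unistd.h>
    #include <linux/kvm.h>

    #define GUEST_MEM_SIZE (64UL << 20)   /* 64 MiB of guest RAM (illustrative) */

    int main(void)
    {
        int kvm = open("/dev/kvm", O_RDWR | O_CLOEXEC);
        if (kvm < 0) { perror("open /dev/kvm"); return 1; }

        int vmfd = ioctl(kvm, KVM_CREATE_VM, 0UL);
        if (vmfd < 0) { perror("KVM_CREATE_VM"); return 1; }

        /* Guest RAM is plain anonymous host memory. */
        void *guest_ram = mmap(NULL, GUEST_MEM_SIZE, PROT_READ | PROT_WRITE,
                               MAP_SHARED | MAP_ANONYMOUS, -1, 0);
        if (guest_ram == MAP_FAILED) { perror("mmap"); return 1; }

        /* Tell KVM that this buffer backs guest physical addresses 0..64 MiB. */
        struct kvm_userspace_memory_region region = {
            .slot            = 0,
            .flags           = 0,
            .guest_phys_addr = 0,
            .memory_size     = GUEST_MEM_SIZE,
            .userspace_addr  = (unsigned long)guest_ram,
        };
        if (ioctl(vmfd, KVM_SET_USER_MEMORY_REGION, &region) < 0) {
            perror("KVM_SET_USER_MEMORY_REGION");
            return 1;
        }

        printf("registered %lu MiB of guest RAM with VM fd %d\n",
               GUEST_MEM_SIZE >> 20, vmfd);
        close(vmfd);
        close(kvm);
        return 0;
    }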
  • (12) United States Patent (10) Patent No.: US 8,124,082 B2 Fong Et Al
    USOO8124082B2 (12) United States Patent (10) Patent No.: US 8,124,082 B2 Fong et al. (45) Date of Patent: Feb. 28, 2012 (54) HUMANIZED ANTI-BETA7 ANTAGONISTS 3.68 A 13 3. SEetC ren al. et al.1 AND USES THEREFOR 5,624,821 A 4/1997 Winter et al. 5,648,260 A 7, 1997 Winter et al. (75) Inventors: Sherman Fong, Alameda, CA (US); 5,658,727 A 8, 1997 Barbas et al. Mark S. Dennis, San Carlos, CA (US) 5,693,762 A 12/1997 Queen et al. 5,712.374. A 1/1998 Kuntsmann et al. 5,714,586 A 2, 1998 Kunstmann et al. (73) Assignee: Genentech, Inc., South San Francisco, 5,731, 168 A 3, 1998 Carter et al. CA (US) 5,733,743 A 3/1998 Johnson et al. 5,739,116 A 4, 1998 Hamann et al. (*) Notice: Subject to any disclaimer, the term of this 5,750,373 A 5/1998 Garrard et al. patent is extended or adjusted under 35 5,767.285 A 6/1998 Hamann et al. U.S.C. 154(b) by 119 days 5,770,701 A 6/1998 McGahren et al. M YW- y yS. 5,770,710 A 6/1998 McGahren et al. 5,773,001 A 6/1998 Hamann et al. (21) Appl. No.: 12/390,730 5,837.242 A 1 1/1998 Holliger et al. 5,877,296 A 3, 1999 Hamann et al. (22) Filed: Feb. 23, 2009 5,969,108 A 10/1999 McCafferty et al. 9 6,172,197 B1 1/2001 McCafferty et al.
  • Instant OS Updates Via Userspace Checkpoint-And
    Instant OS Updates via Userspace Checkpoint-and-Restart
    Sanidhya Kashyap, Changwoo Min, Byoungyoung Lee, and Taesoo Kim, Georgia Institute of Technology; Pavel Emelyanov, CRIU and Odin, Inc.
    https://www.usenix.org/conference/atc16/technical-sessions/presentation/kashyap
    This paper is included in the Proceedings of the 2016 USENIX Annual Technical Conference (USENIX ATC '16), June 22–24, 2016, Denver, CO, USA. ISBN 978-1-931971-30-0. Open access to the Proceedings of the 2016 USENIX Annual Technical Conference (USENIX ATC '16) is sponsored by USENIX.

    Abstract: In recent years, operating systems have become increasingly complex and thus more prone to security and performance issues. Accordingly, system updates to address these issues have become more frequently available and increasingly important. To complete such updates, users must reboot their systems, resulting in unavoidable downtime and further loss of the states of running applications. We present KUP, a practical OS update mechanism that employs a userspace checkpoint-and-restart mechanism, which uses an optimized data structure for checkpointing […]

    Figure 1: Limitation of dynamic kernel hot-patching using kpatch. Only two successful updates (3.13.0.32→34 and 3.19.0.20→21) out of 23 Ubuntu kernel package releases. (The plot shows error counts for kernel versions 3.13.0-x, 3.16.0-x, and 3.19.0-x, May 2014 to Jun 2015; series: build/diff errors, #layout errors, #static local errors, #num lines++.)
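    KUP's checkpoint-and-restart step is built around CRIU (Pavel Emelyanov, one of the authors, maintains it). As a rough illustration of that step only, and not of KUP's actual update path, which also switches the kernel underneath (e.g., via kexec) and uses its own optimized checkpoint format, here is a hedged C sketch that drives the stock criu command-line tool. The helper name run_criu and the calling convention are assumptions for this example; the criu dump and criu restore verbs and the -t, -D, and --shell-job options are standard CRIU flags.

    /* Checkpoint a running process tree with CRIU, then restore it.
     * A minimal wrapper around the criu CLI ("criu dump" / "criu restore");
     * KUP's actual mechanism additionally swaps the kernel between the
     * dump and the restore, which is not shown here.
     * Requires root and a criu binary in PATH. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>

    /* Run "criu <verb> -D <images_dir> ..." and wait for it. */
    static int run_criu(const char *verb, const char *images_dir, pid_t target)
    {
        pid_t child = fork();
        if (child < 0) { perror("fork"); return -1; }
        if (child == 0) {
            char pidstr[32];
            snprintf(pidstr, sizeof(pidstr), "%d", (int)target);
            if (target > 0)   /* dump takes a target pid, restore does not */
                execlp("criu", "criu", verb, "-t", pidstr, "-D", images_dir,
                       "--shell-job", (char *)NULL);
            else
                execlp("criu", "criu", verb, "-D", images_dir,
                       "--shell-job", (char *)NULL);
            perror("execlp criu");
            _exit(127);
        }
        int status = 0;
        waitpid(child, &status, 0);
        return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
    }

    int main(int argc, char **argv)
    {
        if (argc != 3) {
            fprintf(stderr, "usage: %s <pid-to-checkpoint> <images-dir>\n", argv[0]);
            return 1;
        }
        pid_t target = (pid_t)atoi(argv[1]);
        const char *dir = argv[2];

        /* 1. Freeze the process tree and write its state to the images dir. */
        if (run_criu("dump", dir, target) != 0) {
            fprintf(stderr, "criu dump failed\n");
            return 1;
        }

        /* 2. (In KUP, the running kernel would be replaced at this point.) */

        /* 3. Recreate the process tree from the images. */
        if (run_criu("restore", dir, 0) != 0) {
            fprintf(stderr, "criu restore failed\n");
            return 1;
        }
        return 0;
    }

    Run it as root against a process started from a terminal (a shell job), with an existing, empty images directory; between the dump and the restore the original process is gone, which is the window in which KUP replaces the running kernel.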
  • Update Your Kernel with No Service Interruption: SUSE Linux Enterprise
    Data sheet: Update your kernel with no service interruption. SUSE Linux Enterprise Live Patching on HPE infrastructure.

    Eliminate the need for planned downtime: Downtime is expensive, even when it is planned. SUSE Linux Enterprise Live Patching virtually eliminates the need for downtime—and allows for easier planning of scheduled downtime—by applying critical Linux kernel fixes outside of maintenance windows. SUSE Linux Enterprise Live Patching offers a proactive and dynamic approach to kernel maintenance that saves your company valuable time and money by never needing to stop the kernel.

    Keep your kernel up-to-date; take a proactive and dynamic approach to kernel patching: SUSE® Linux® Enterprise Live Patching running on HPE infrastructure is an open source solution that delivers live kernel patching without the need to reboot the system. With this subscription offering based on the open source kGraft project, you can perform patching without interrupting your mission-critical workloads and in-memory databases—saving the cost of downtime and increasing service availability. Because the solution builds on the existing SUSE Linux Enterprise kernel infrastructure and uses familiar deployment methods, SUSE Linux Enterprise Live Patching on HPE infrastructure is an easy way to make operating system maintenance more efficient and secure. SUSE Linux Enterprise Live Patching on HPE infrastructure puts you in charge of kernel updates and service availability. Even when urgent kernel updates are needed, SUSE Linux Enterprise Server can run continuously—with zero execution interruptions—while you apply critical kernel patches in the background.

    Customer profile: • SAP HANA®: Live Patching targets users of SAP® applications, specifically the HANA in-memory databases.
  • SUSE Linux Enterprise Live Patching
    Data Sheet: SUSE Linux Enterprise Live Patching

    SUSE® Linux Enterprise Live Patching: Downtime is expensive, even when it is planned. Live Patching virtually eliminates the need for downtime—and allows for easier planning of scheduled downtime—by applying critical Linux kernel fixes outside of maintenance windows. Live Patching offers a proactive and dynamic approach to kernel maintenance that saves your company valuable time and money by never needing to stop the kernel.

    Product Overview: SUSE® Linux Enterprise Live Patching is a simple open source solution that delivers live kernel patching without the need to reboot. With this subscription offering based on the kGraft project, you can perform patching without interrupting your mission-critical workloads and in-memory databases, saving the cost of downtime and increasing service availability. Because it builds on to the existing SUSE Linux Enterprise kernel infrastructure and uses familiar deployment methods, Live Patching is an easy way to make operating system maintenance more efficient and secure. […] interruptions, not even a millisecond—while you apply critical kernel patches in the background.

    Key Benefits: SUSE Linux Enterprise Live Patching keeps your systems running smoothly and securely on the front end while critical updates are applied on the back end. • Reduce downtime—You can reduce downtimes whether planned or unplanned. In the case of unplanned downtimes, you can potentially eliminate […]

    System Requirements: • Minimum requirements: a system that runs SUSE Linux Enterprise Server 12 or 15; Zypper must be installed and configured to receive updates. • Supported processor platforms: x86-64; ppc64le (IBM Power Systems); IBM Z and LinuxONE. For detailed product specifications and system requirements, visit: https://www.suse.com/products/live-patching/
  • Think All Distros Offer the Best Linux DevSecOps Environment?
    ARTICLE: Think all distributions offer the best DevSecOps environment on Linux? Think again!
    Marc Staimer, Dragon Slayer

    Introduction: DevOps is changing. Developing code with security bolted on afterwards is a dangerously flawed strategy. When that add-on security fails to correct exploitable code vulnerabilities, it puts the entire organization at risk. For many working in DevOps, security was usually just something they worried about later. It was common to assume that the IT organization's multiple systemic layers of security measures and devices would protect any new code against malware or breaches. Moreover, developing code with security built in adds tasks and steps to development and testing time. More tasks and steps delay time to market. Multi-tenant clouds have radically changed the market. In a world with an ever-increasing number of cyber-attacks, any vulnerability can put the data of millions of users at risk. These legacy DevOps attitudes are unsafe. They are potentially very costly in the current environment. Keep in mind that nearly every developed country and most developing countries have enacted laws and regulations to protect personally identifiable information, or PII. PII is incredibly valuable to cybercriminals. Stealing PII allows them to commit many cybercrimes, including the cybertheft of identities, finances, intellectual property, administrator privileges, and much more.
  • Mitigating Vulnerability Windows with Hypervisor Transplant Dinh Ngoc Tu, Boris Teabe Djomgwe, Alain Tchana, Gilles Muller, Daniel Hagimont
    Mitigating vulnerability windows with hypervisor transplant
    Dinh Ngoc Tu, Boris Teabe Djomgwe, Alain Tchana, Gilles Muller, Daniel Hagimont

    To cite this version: Dinh Ngoc Tu, Boris Teabe Djomgwe, Alain Tchana, Gilles Muller, Daniel Hagimont. Mitigating vulnerability windows with hypervisor transplant. EuroSys 2021 - European Conference on Computer Systems, Apr 2021, Edinburgh / Virtual, United Kingdom. pp. 1-14, 10.1145/3447786.3456235. HAL Id: hal-03183856, https://hal.archives-ouvertes.fr/hal-03183856. Submitted on 30 Mar 2021.

    HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.

    Authors: Tu Dinh Ngoc (University of Toulouse, France), Boris Teabe (University of Toulouse, France), Alain Tchana (ENS Lyon, France; Inria, France), Gilles Muller (Inria, France), Daniel Hagimont (University of Toulouse, France)

    ABSTRACT: The vulnerability window of a hypervisor regarding a given security flaw is the time between the identification of the flaw and the integration of a correction/patch in the running hypervisor. […] such as Spectre and Meltdown [25, 32]. The time to apply the patch mainly depends on the datacenter operators' patching policy. Together, this timeframe leaves plenty of time to launch an attack
  • Protecting Commodity Operating Systems Through Strong Kernel Isolation
    Protecting Commodity Operating Systems through Strong Kernel Isolation
    Vasileios P. Kemerlis
    Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in the Graduate School of Arts and Sciences, Columbia University, 2015. © 2015 Vasileios P. Kemerlis. All Rights Reserved.

    ABSTRACT: Today's operating systems are large, complex, and plagued with vulnerabilities that allow perpetrators to exploit them for profit. The constant rise in the number of software weaknesses, coupled with the sophistication of modern adversaries, make the need for effective and adaptive defenses more critical than ever. In this dissertation, we develop a set of novel protection mechanisms, and introduce new concepts and techniques to secure commodity operating systems against attacks that exploit vulnerabilities in kernel code. Modern OSes opt for a shared process/kernel model to minimize the overhead of operations that cross protection domains. However, this design choice provides a unique vantage point to local attackers, as it allows them to control—both in terms of permissions and contents—part of the memory that is accessible by the kernel, easily circumventing protections like kernel-space ASLR and W^X. Attacks that leverage the weak separation between user and kernel space, characterized as return-to-user (ret2usr) attacks, have been the de facto kernel exploitation technique in virtually every major OS, while they are not limited to the x86 platform, but have also targeted ARM and others. Given the multi-OS and cross-architecture nature of ret2usr threats, we propose kGuard: a kernel protection mechanism, realized as a cross-platform compiler extension, which can safeguard any 32- or 64-bit OS kernel from ret2usr attacks.
  • High Velocity Kernel File Systems with Bento
    High Velocity Kernel File Systems with Bento
    Samantha Miller, Kaiyuan Zhang, Mengqi Chen, Ryan Jennings, Ang Chen‡, Danyang Zhuo†, Thomas Anderson (University of Washington; †Duke University; ‡Rice University)

    Abstract: High development velocity is critical for modern systems. This is especially true for Linux file systems which are seeing increased pressure from new storage devices and new demands on storage systems. However, high velocity Linux kernel development is challenging due to the ease of introducing bugs, the difficulty of testing and debugging, and the lack of support for redeployment without service disruption. Existing approaches to high-velocity development of file systems for Linux have major downsides, such as the high performance penalty for FUSE file systems, slowing the deployment cycle for new file system functionality. We propose Bento, a framework for high velocity development of Linux kernel file systems. It enables file systems written in safe Rust to be installed in the Linux kernel, with […]

    […] kernel-level debuggers and kernel testing frameworks makes this worse. The restricted and different kernel programming environment also limits the number of trained developers. Finally, upgrading a kernel module requires either rebooting the machine or restarting the relevant module, either way rendering the machine unavailable during the upgrade. In the cloud setting, this forces kernel upgrades to be batched to meet cloud-level availability goals. Slow development cycles are a particular problem for file systems. Recent changes in storage hardware (e.g., low latency SSDs and NVM, but also density-optimized QLC SSD and shingle disks) have made it increasingly important to have an agile storage stack. Likewise, application workload diversity and system management requirements (e.g., the need for container-level SLAs, or provenance tracking for security forensics) make feature velocity essential.
  • Practical Linux Topics
    Chris Binnie, Practical Linux Topics (Apress). BOOKS FOR PROFESSIONALS BY PROFESSIONALS®. THE EXPERT'S VOICE® IN LINUX.

    This book teaches you how to improve your hands-on knowledge of Linux using challenging, real-world scenarios. Each chapter explores a topic that has been chosen specifically to demonstrate how to enhance your base Linux system, and resolve important issues. This book enables sysadmins, DevOps engineers, developers, and other technical professionals to make full use of Linux's rock-steady foundation. Explore specific topics in networking, e-mail, filesystems, encryption, system monitoring, security, servers, and more—including systemd and GPG. Understand salient security concerns and how to mitigate them. Applicable to almost all Linux flavors—Debian, Red Hat, Ubuntu, Linux Mint, CentOS—Practical Linux Topics can be used to reference other Unix-type systems with little modification. Improve your practical know-how and background knowledge on servers and workstations alike, and increase your ability to troubleshoot and ultimately solve the daily challenges encountered by all professional Linux users. Empower your Linux skills by adding Practical Linux Topics to your library today.

    ISBN 978-1-4842-1771-9. Shelve in: Linux/General. User level: Intermediate–Advanced. Source code online: www.apress.com

    Practical Linux Topics. Chris Binnie. Copyright © 2016 by Chris Binnie. This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.