XOS: An Application-Defined OS for Datacenter Computing

Chen Zheng∗†, Lei Wang∗, Sally A. McKee‡, Lixin Zhang∗, Hainan Ye§ and Jianfeng Zhan∗†
∗State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences
†University of Chinese Academy of Sciences, China
‡Chalmers University of Technology
§Beijing Academy of Frontier Sciences and Technology

Abstract—Rapid growth of datacenter (DC) scale, urgency of cost control, increasing workload diversity, and huge software investment protection place unprecedented demands on operating system (OS) efficiency, scalability, performance isolation, and backward compatibility. Traditional OSes are not built to work with deep-hierarchy software stacks, large numbers of cores, tail latency guarantees, and the increasingly rich variety of applications seen in modern DCs, and thus they struggle to meet the demands of such workloads.

This paper presents XOS, an application-defined OS for modern DC servers. Our design moves resource management out of the OS kernel, supports customizable kernel subsystems in user space, and enables elastic partitioning of hardware resources. Specifically, XOS leverages modern hardware support for virtualization to move resource management functionality out of the conventional kernel and into user space, which lets applications achieve near bare-metal performance. We implement XOS on top of Linux to provide backward compatibility. XOS speeds up a set of DC workloads by up to 1.6× over our baseline Linux on a 24-core server, and outperforms the state-of-the-art Dune by up to 3.3× in terms of virtual memory management. In addition, XOS demonstrates good scalability and strong performance isolation.

Index Terms—Operating System, Datacenter, Application-defined, Scalability, Performance Isolation

The corresponding author is Jianfeng Zhan.

I. INTRODUCTION

Modern DCs support increasingly diverse workloads that process ever-growing amounts of data. To increase resource utilization, DC servers deploy multiple applications together on one node, but the interferences among these applications and the OS lower individual performances and introduce unpredictability. Most state-of-the-practice and state-of-the-art OSes (including Linux) were designed for computing environments that lack the diversity of resources and workloads found in modern systems and DCs, and they present user-level software with abstracted interfaces to those hardware resources, whose policies are optimized for only a few specific classes of applications.

In contrast, DCs need streamlined OSes that can exploit modern multi-core processors, reduce or remove the overheads of resource allocation and management, and better support performance isolation. Although Linux has been widely adopted as a preferred OS for its excellent usability and programmability, it has limitations with respect to performance, scalability, and isolation. Such general-purpose, one-size-fits-all designs simply cannot meet the needs of all applications. The kernel traditionally controls resource abstraction and allocation, which hides resource-management details but lengthens application execution paths. Giving applications control over functionalities usually reserved for the kernel can streamline systems, improving both individual performances and system throughput [5].

There is a large and growing gap between what DC applications need and what commodity OSes provide. Efforts to bridge this gap range from bringing resource management into user space to bypassing the kernel for I/O operations, avoiding cache pollution by batching system calls, isolating performance via user-level cache control, and factoring OS structures for better scalability on many-core platforms. Nonetheless, several open issues remain. Most approaches to constructing kernel subsystems in user space only support limited capabilities, leaving the traditional kernel responsible for expensive activities like switching contexts, managing virtual memory, and handling interrupts. Furthermore, many instances of such approaches cannot securely expose hardware resources to user space: applications must load code into the kernel, which can affect system stability. Implementations based on virtual machines incur overheads introduced by the hypervisor layer. Finally, many innovative approaches break current programming paradigms, sacrificing support for legacy applications.

Ideally, DC applications should be able to finely control their resources, customize kernel policies for their own benefit, avoid interference from the kernel or other application processes, and scale well with the number of available cores. From 2011 to 2016, in collaboration with Huawei, we carried out a large project to investigate different aspects of DC computing, ranging from benchmarking [6], [7], architecture [8], hardware [9], and OS [4] to programming [10]. Specifically, we explored two different OS architectures for DC computing [11]–[13] (the other OS architecture is reported in [4]). This paper is dedicated to XOS, an application-defined OS architecture that follows three main design principles:

• Resource management should be separated from the OS kernel, which then merely provides resource multiplexing and protection.
• Applications should be able to define user-space kernel subsystems that give them direct resource access and customizable resource control.
• Physical resources should be partitioned such that applications have exclusive access to allocated resources.

Our OS architecture provides several benefits. Applications have direct control over hardware resources and direct access to kernel-specific functions; this streamlines execution paths and avoids both kernel overheads and those of crossing between user and kernel spaces. Resource-control policies can be tailored to the classes of applications coexisting on the same OS. Finally, elastic resource partitioning enables better scalability and performance isolation by vastly reducing resource contention, both inside and outside the kernel. We quantify these benefits using microbenchmarks that stress different aspects of the system plus several important DC workloads. XOS outperforms state-of-the-art systems like Linux and Dune/IX [14], [15] by up to 2.3× and 3.3×, respectively, for a virtual memory microbenchmark. Compared to Linux, XOS speeds up datacenter workloads from BigDataBench [7] by up to 70%. Furthermore, XOS scales 7× better on a 24-core machine and performs 12× better when multiple memory-intensive microbenchmarks are deployed together. XOS keeps tail latency (i.e., the time it takes to complete requests falling into the 99th latency percentile) in check while achieving much higher resource utilization.

[Fig. 1: The difference of the XOS nokernel model from the other OS models: (a) monolithic OS, (b) microkernel, (c) exokernel [1], (d) unikernel [2], (e) multikernel [3], (f) IFTS OS [4], (g) XOS nokernel.]

II. BACKGROUND AND RELATED WORK

XOS is inspired by and built on much previous work in OSes to minimize the kernel, reduce competition for resources, and specialize functionalities for the needs of specific workloads. Each of these design goals addresses some, but not all, of the requirements for DC workloads. Our application-defined OS model is made possible by hardware support for virtualizing CPU, memory, and I/O resources.

Hardware-assisted virtualization goes back as far as the IBM System/370, whose VM/370 [16] operating system presented each user (among hundreds or even thousands) with a separate virtual machine having its own address space and virtual devices. For modern CPUs, technologies like Intel VT-x [17] break conventional CPU privilege modes into two new modes: VMX root mode and VMX non-root mode.

For modern I/O devices, the single root I/O virtualization (SR-IOV) [18] standard allows a network interface (in this case PCI Express, or PCI-e) to be safely shared among several software entities. The kernel still manages physical device configuration, but each virtual device can be configured independently from user level.

OS architectures have leveraged virtualization to better meet design goals spanning support for increasing hardware heterogeneity, better scalability, elasticity with respect to resource allocation, fault tolerance, customizability, and support for legacy applications. Most approaches break monolithic kernels into smaller components to specialize functionality and deliver more efficient performance and fault isolation by moving many traditionally privileged activities out of the kernel and into user space.

Early efforts like HURRICANE [19] and Hive [20] were designed for scalable shared-memory (especially NUMA) machines. They deliver greater scalability by implementing microkernels (Figure 1b) based on hierarchical clustering. In contrast, exokernel architectures [1] (Figure 1c) support customizable application-level management of physical resources. A stripped-down kernel securely exports hardware resources to untrusted library operating systems running within the applications themselves. Modern variants use this approach to support Java workloads (IBM's Libra [21]) or Windows applications (Drawbridge [22]). Like Libra, avoiding execution on a traditional OS that supports legacy applications requires adapting application software. Unikernels such as Mirage and OSv [23] (Figure 1d) represent exokernel variants built for cloud environments. Early exokernels often loaded application extensions for resource provisioning into the kernel, which poses stability problems, and implementations relying on virtual machine monitors can unacceptably lengthen application execution paths. XOS avoids such overheads by leveraging virtualization hardware to securely control physical resources and only uploading streamlined, specialized kernel subsystems to user space.

The advent of increasingly high core-count machines and the anticipation of future exascale architectures have brought a new emphasis on problems of scalability. Corey [24] implements OS abstractions to allow applications to control inter-core resource sharing, which can significantly improve performance (and thus scalability) on multicore machines. Whereas domains in earlier exokernel/library-OS solutions often shared an OS image, replicated-kernel approaches like Hive's are increasingly being employed. The multikernel [3] (Figure 1e) runs directly on bare hardware, without a virtualization layer; the OS is composed of different CPU drivers, each running on a single core, and the CPU drivers cooperate to maintain a consistent OS view. Our other DC OS architecture [4] introduces the "isolated first, then share" (IFTS) OS model, in which hardware resources are split among disparate OS instances. The IFTS OS model decomposes the OS into a supervisor and several subOSes, and avoids shared kernel state between applications, which in turn reduces performance loss caused by contention.

Several systems streamline the OS by moving traditional functionalities into user space. For instance, solutions such as mTCP [25], Chronos [26], and MegaPipe [27] build user-space network stacks to lower or avoid protocol-processing overheads and to exploit common communication patterns. NoHype [28] eliminates the hypervisor to reduce OS noise. XOS shares the philosophy of minimizing the kernel to realize performance much closer to that of bare metal.

Dune [14]/IX [15] and Arrakis [29] employ approaches very similar to XOS. Dune uses nested paging to support user-level control over virtual memory. IX (Dune's successor) and Arrakis use hardware virtualization to separate resource management and scheduling functions from network processing. IX uses a full Linux kernel as the control plane (as in Software Defined Networking), leveraging memory-mapped I/O to implement pass-through access to the NIC and Intel's VT-x to provide three-way isolation between the control plane, network stack, and application. Arrakis uses Barrelfish as the control plane and exploits the IOMMU and SR-IOV to provide direct I/O access. XOS allows applications to directly access not just I/O subsystems but also hardware features such as memory management and interrupts.

III. THE XOS MODEL

The growing gap between OS capabilities and DC workload requirements necessitates that we rethink the design of OSes for modern DCs. We propose an application-defined OS model guided by three principles: 1) separation of resource management from the kernel; 2) application-defined kernel subsystems; and 3) elastic resource partitioning.

A. Separating Resource Management

We contend that layered kernel abstractions are the main causes of both resource contention and application interference. Achieving near bare-metal performance thus requires that we remove the kernel from the critical path of an application's execution.

In an application-defined OS (Figure 1g), the traditional role of the OS kernel is split. Applications take over OS kernel duties with respect to resource configuration, provisioning, and scheduling, which allows most kernel subsystems to be constructed in user space. The kernel retains the responsibility for resource allocation, multiplexing, and protection, but it no longer mediates every application operation. Reducing kernel involvement in application execution has several advantages. First, applications need not trap into kernel space. In current general-purpose OSes like Linux, applications must access resources through the kernel, lengthening their execution paths. The kernel may also interrupt application execution: for instance, a system call invocation typically raises a synchronous exception, which forces two transitions between user and kernel modes. Moreover, it flushes the processor pipeline twice and pollutes critical processor structures, such as the TLB, branch prediction tables, prefetch buffers, and private caches. When a system call competes for shared kernel structures, it may also stall the execution of other processes using those structures.

One challenge in separating resource management from the kernel is how to securely expose hardware to user space. Our model leverages modern virtualization-support hardware. Policies governing the handling of privilege levels, address translation, and exception triggers can enforce security when applications directly interact with hardware. They make it possible for an application-defined OS to give applications the ability to access all privileged instructions, interrupts, exceptions, and cross-kernel calls, and to have direct control over physical resources, including physical memory, CPU cores, I/O devices, and processor structures.

B. Application-Defined Kernel Subsystems

Our OS model is designed to separate all resource management from the kernel, including CPU, memory, I/O, and exceptions. Applications are allowed to customize their own kernel subsystems, choosing the types of hardware resources to expose in user space. Furthermore, applications running on the same node may implement different policies for a given kernel subsystem. Kernel services not built into user space are requested from the kernel just as in normal system calls.

Application-defined kernel subsystems in user space are a major feature of XOS. Applications know what resources they need and how they want to use them. Tradeoffs in traditional OS design are often made to address common application needs, but this leads to poor performance for applications

without such "common" needs or behaviors. For example, data analysis workloads with fixed memory requirements will not always benefit from demand paging in the virtual memory subsystem, and network applications often need different network-stack optimizations for better throughput and latency.

In this model, a cell is an XOS process that is granted exclusive resource ownership and that runs in VMX non-root mode. Cells can bypass the kernel to have fine-grained direct control over physical resources. Unlike other user-space I/O approaches that construct only I/O stacks in user space, XOS allows the construction of any kernel subsystem within each cell. Such subsystems can include paging, physical memory management, I/O, and interrupt handling, and the application-defined OS model allows diverse policies for each subsystem. For example, XOS allows applications to request physical pages in specific colors or banks in order to reduce cache conflicts and enable better performance isolation. Our XOS model implements pass-through functionality for PCI-e devices with the help of SR-IOV. To avoid device exhaustion, message-based I/O system calls within the XOS runtime service I/O requests.
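To make the page-coloring option concrete, the sketch below shows the kind of bookkeeping such a request could rely on. The xos_alloc_colored_page() helper and the per-cell frame pool are hypothetical names introduced only for illustration; the cache size is taken from the evaluation platform in Section V (12 MB shared L3), and the associativity is an assumption.

```c
/* Minimal page-coloring sketch, assuming a hypothetical XOS runtime API.
 * Only the coloring arithmetic is standard; names and geometry details
 * beyond the 12 MB L3 are illustrative. */
#include <stdint.h>
#include <stddef.h>

#define PAGE_SHIFT   12                      /* 4 KB base pages             */
#define LLC_SIZE     (12u << 20)             /* 12 MB shared L3 (Sec. V-A)  */
#define LLC_WAYS     16u                     /* assumed associativity       */
#define LLC_LINE     64u
#define LLC_SETS     (LLC_SIZE / (LLC_WAYS * LLC_LINE))
#define NUM_COLORS   (LLC_SETS * LLC_LINE >> PAGE_SHIFT)

/* Color = the group of LLC sets a physical page frame maps to. */
static inline unsigned page_color(uint64_t pfn)
{
    return (unsigned)(pfn % NUM_COLORS);
}

/* Hypothetical runtime call: pick a frame of the requested color from the
 * cell's private frame pool so two cells never share the same cache sets. */
uint64_t xos_alloc_colored_page(const uint64_t *pool, size_t n, unsigned color)
{
    for (size_t i = 0; i < n; i++)
        if (page_color(pool[i]) == color)
            return pool[i];
    return 0;   /* no frame of that color left; fall back to the kernel pool */
}
```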

C. Elastic Resource Partitioning

In the XOS model, each application has exclusive ownership of the resources allocated to it, including kernel structures and physical resources. Figure 1g shows how elastic partitioning works. Each cell has unrestricted access to its resources, including CPU cores, physical memory, block devices, NICs, and privileged features. It controls physical resources by scheduling them in whatever way it chooses. A cell with exclusive resources runs with little kernel involvement, since it shares no state with other cells. The kernel only provides multiplexing, protection, and legacy support.

One obvious drawback to such elastic partitioning is that it prevents resource sharing, which may reduce utilization. To overcome this limitation, we build reserved resource pools both in the kernel and in the XOS runtime. These pools can grow or shrink as needed. The kernel's resource pools are per-CPU structures, which ensures that cells do not contend for resources. If a cell exhausts its allocated resources, it can invoke XOS runtime routines to request more resources from the kernel. By carefully accounting for the resources allocated to each cell, the kernel tracks resource consumption for each software component. It can thus guarantee QoS for each cell through performance isolation and judicious resource allocation. For instance, the kernel could choose to devote a fraction of the memory and I/O devices to a resource pool serving a critical cell. The need for elastic partitioning grows when co-existing applications compete for resources, which is common in real-world deployments. Without effective mechanisms to provide performance guarantees, co-existing applications may perform unpredictably, especially if they are kernel-intensive.
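As a rough illustration of this grow-on-demand path, the sketch below shows how a cell's runtime might fall back to the kernel when its local pool is exhausted. The request code and register convention are invented for the example; the paper only states that such requests travel over VMCALL-based hardware traps that the XOS handler intercepts (Section IV).

```c
/* Hedged sketch: a cell (VMX non-root, ring 0) asking the XOS kernel for
 * more memory.  Only the VMCALL instruction itself is architectural; the
 * call number and argument registers are assumptions. */
#include <stdint.h>

#define XOS_VMCALL_ALLOC_MEM  0x01u   /* hypothetical request code */

static inline long xos_vmcall(uint64_t nr, uint64_t arg0, uint64_t arg1)
{
    long ret;
    asm volatile("vmcall"              /* forces a VM exit to the XOS handler */
                 : "=a"(ret)
                 : "a"(nr), "D"(arg0), "S"(arg1)
                 : "memory");
    return ret;
}

/* Ask the kernel's reserved per-CPU pool for `bytes` of physical memory. */
static long xos_request_memory(uint64_t bytes)
{
    return xos_vmcall(XOS_VMCALL_ALLOC_MEM, bytes, 0);
}
```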

[Fig. 2: The XOS architecture: the XOS kernel modules (VMX module, resource module, XOS handler, I/O kernel thread pool, reserved memory pool) in VMX root mode, and per-cell XOS runtimes in VMX non-root mode providing a user-space pager, memory manager, file system, network stack, interrupt/exception handlers, a POSIX API, and message-based I/O system calls.]

IV. XOS IMPLEMENTATION

Figure 2 shows the XOS architecture, which consists of a number of dynamically loadable kernel modules plus the XOS runtimes. The XOS kernel has five more functionalities than an ordinary kernel: initiating and configuring the VMX hardware (the VMX module); allocating and accounting resources for cells (the resource module); handling XOS runtime requests and violations (the XOS handler); handling message-based I/O system calls (the I/O kernel pool); and providing a physical memory allocator that reserves a physical memory pool for XOS processes.

The XOS kernel runs in VMX root mode ring 0, and the cells run in VMX non-root mode ring 0. XOS leverages VT-x to enable privileged hardware features to be securely exposed in VMX non-root mode, which allows the cells to directly manage hardware resources without trapping into the kernel. The XOS handler intercepts VM exits caused by privilege violations and VMCALLs initiated by the cells; this is the main means by which a cell interacts with the kernel.

XOS cells can coexist with normal Linux processes. Each cell has an independent virtual address space and a private XOS runtime. The resources assigned to each cell are exclusive and cannot be accessed by other cells. We construct traditional kernel services in user space by inlining kernel subsystems into the XOS runtime; these kernel services can be customized to meet the needs of different workloads. In addition to the kernel services defined in the XOS runtime, other services can be obtained directly from the kernel via hardware traps (VMCALLs) or messages. The runtime wraps the user-space OS services with the POSIX API, which provides compatibility with legacy software.

The XOS runtime is a thin, trusted layer that is responsible for resource management and kernel interaction during resource (re)allocation. It is implemented as statically linked libraries. We offer two classes of interfaces. One includes explicit interfaces for direct hardware control, including pre-allocated pages, colored page allocation, page table entries, TLB entry invalidations, I/O control-space access, and DMA management. The other includes POSIX-like interfaces: it invokes inlined kernel functions customized in the XOS runtime, while other system calls can be served by the kernel. Specifically, when there are too few I/O devices to dedicate one to each cell, the XOS runtime provides dedicated message-based I/O interfaces, redirecting I/O system calls to I/O service cells via messages. In such a design, I/O services are deployed in different kernel threads, so the processor structures within cells will not be flushed.

A. Launching a New Cell

In XOS, a normal Linux process is converted into a cell when the process needs acceleration in certain kernel subsystems. XOS needs two mode switches to bring a cell online, and the most important part is to set up a suitable operating environment before and after the switch. XOS provides a control interface for applications to apply for resources. With the control interface, applications can specify exclusive resources and privileged features. Once an application invokes the control interface, the VMX module initiates the VT-x hardware via an ioctl() system call, and the application makes the first mode switch into VMX root mode. After that, the resource module allocates exclusive resources from the resource pool. Then, the VMX module uploads the original page table into the allocated memory for further user-space page table initialization, and constructs a new Interrupt Descriptor Table (IDT) for the cell. If the application reserves I/O devices, the IOMMU is configured to map the allocated memory region accordingly. Meanwhile, the XOS handler in the kernel registers new exception handlers for the cell, as specified by the application. Finally, XOS duplicates the current processor state into the VMCS host-state area, sets the processor state and entry point in the VMCS guest-state area, and sets the VM-execution control fields of the VMCS to identify the privileged features allowed in user space. In VMX root mode, the VMX module then triggers the VMLAUNCH instruction to let the process run in non-root mode as a cell.
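The application-facing side of this sequence might look like the sketch below. The control-device path, ioctl request code, and configuration structure are illustrative assumptions; the paper specifies only that the VT-x hardware is initiated through an ioctl() on a control interface and that the application can name exclusive resources and privileged features.

```c
/* Sketch of cell creation from the application's point of view, assuming
 * the VMX module exposes a character device.  /dev/xos, the request code,
 * and struct xos_cell_config are not the actual XOS ABI. */
#include <fcntl.h>
#include <stdio.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <unistd.h>

struct xos_cell_config {
    uint64_t mem_bytes;     /* exclusive physical memory to reserve      */
    uint32_t cpu_mask;      /* hardware threads handed over to the cell  */
    uint32_t flags;         /* e.g., pass-through NIC, user-level IDT    */
};

#define XOS_IOC_MAKE_CELL _IOW('x', 1, struct xos_cell_config)

int become_cell(void)
{
    struct xos_cell_config cfg = {
        .mem_bytes = 1ull << 30,   /* 1 GB from the reserved pool       */
        .cpu_mask  = 0x3,          /* two dedicated hardware threads    */
        .flags     = 0,
    };
    int fd = open("/dev/xos", O_RDWR);      /* hypothetical control device */
    if (fd < 0) { perror("open /dev/xos"); return -1; }

    /* The ioctl triggers the first mode switch: the VMX module sets up the
     * VMCS, copies the page table, and VMLAUNCHes this process as a cell. */
    int ret = ioctl(fd, XOS_IOC_MAKE_CELL, &cfg);
    close(fd);
    return ret;
}
```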
B. Memory Management

As many DC applications need large chunks of memory to hold their growing data sets, the memory management subsystem in the kernel can easily become a bottleneck. During memory allocation, the kernel needs to perform multiple actions that may have negative performance implications. For instance, it needs to lock related structures, such as page table entries, before modifying them, and it needs to trigger TLB shootdowns to flush associated TLB entries. When multiple processes make memory allocation calls, the ensuing contention can noticeably slow down all processes involved.

XOS differs from other systems in several ways with respect to memory management. Each cell has its own pager and physical memory, and handles virtual memory in the XOS runtime rather than in the kernel. The memory manager shares no state with other cells. The XOS kernel merely allocates and deallocates memory resources, and maintains an access control list for the resources.

Physical memory management. XOS implements two-phase physical memory management. The first phase reserves physical memory in the kernel to launch XOS cells; the second is a user-space memory allocator in the XOS runtime that delivers memory management services. We argue that applications can benefit from simple memory allocation over large chunks of contiguous memory. Though fine-grained discontiguous memory may improve resource utilization, it increases the complexity of the XOS runtime memory management module. In the Linux buddy allocator, the largest memory chunk that can be allocated by default is 4MB, and even if we modify the buddy algorithm to allow larger chunk sizes, memory is already fragmented once the OS has booted. Consequently, we modify the Linux kernel to reserve memory chunks at boot time. The reserved memory is managed by a buddy allocator in the XOS resource module, with a maximum chunk of 1024MB in the memory pool. Furthermore, to avoid lock contention when multiple cells apply for memory, we build per-CPU memory pool lists. The XOS resource manager allocates, reclaims, and records the state of memory usage in each XOS runtime. The XOS runtime maintains another buddy allocator similar to the one adopted in the XOS resource module, but with a much smaller maximum chunk: the maximum supported chunk is 64MB, while the minimum chunk is the base page size. The XOS runtime uses its buddy allocator and memory pool to map smaller parts of these memory regions into the cell's address space.
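A minimal sketch of the second phase is given below: the runtime treats one kernel-granted chunk as a private heap and serves power-of-two blocks from per-order free lists, mirroring the 64 MB maximum chunk and 4 KB base page described above. Only the split path is shown; freeing, coalescing, and the VMCALL fallback to the kernel pool are omitted, and all names are illustrative rather than the actual XOS allocator.

```c
/* Sketch of the cell-side buddy-style allocator (split path only). */
#include <stdint.h>
#include <stddef.h>

#define PAGE_SIZE   4096ull
#define MAX_ORDER   14                       /* 4 KB << 14 == 64 MB chunk */

struct free_block { struct free_block *next; };

static struct free_block *freelist[MAX_ORDER + 1];

static unsigned size_to_order(size_t bytes)
{
    unsigned order = 0;
    size_t block = PAGE_SIZE;
    while (block < bytes && order < MAX_ORDER) { block <<= 1; order++; }
    return order;
}

/* Seed the allocator with one chunk handed over by the XOS resource module. */
void runtime_pool_init(void *chunk)
{
    freelist[MAX_ORDER] = (struct free_block *)chunk;
    freelist[MAX_ORDER]->next = NULL;
}

/* Split larger blocks until one of the wanted order exists, then pop it. */
void *runtime_alloc(size_t bytes)
{
    unsigned want = size_to_order(bytes), o = want;
    while (o <= MAX_ORDER && !freelist[o]) o++;
    if (o > MAX_ORDER) return NULL;          /* would VMCALL the kernel here */

    while (o > want) {                       /* split one block in half      */
        struct free_block *blk = freelist[o];
        freelist[o] = blk->next;
        o--;
        struct free_block *half =
            (struct free_block *)((char *)blk + (PAGE_SIZE << o));
        blk->next = half; half->next = freelist[o];
        freelist[o] = blk;
    }
    struct free_block *blk = freelist[want];
    freelist[want] = blk->next;
    return blk;
}
```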
Virtual memory management. The XOS runtime handles virtual memory for cells directly rather than in the kernel. For some applications, the memory requirement cannot be predicted in advance, so the XOS resource module allocates large contiguous memory and hands it over to the user-space buddy allocator; the XOS runtime then maintains demand paging policies. For performance reasons, we build both pre-paging and demand-paging utilities, and an application can choose which one to use.

XOS uploads the page table and other process-related kernel data into the cell's address space, and backs up the originals in the kernel. We configure the field bits in the VMCS guest-state area, including control registers (e.g., CR3), interruptibility state, etc. When a normal process becomes a cell by entering VMX non-root mode, the hardware loads the processor structures from the VMCS region; consequently, a cell inherits its initial address space from Linux. To ensure correctness, the kernel mlock()s the page frames already mapped in the original page table, preventing them from being swapped out.

User-level page fault handler. Most page faults occur when a process attempts to access an address that is not currently mapped to a physical memory location. The hardware raises a page-fault exception and traps into the exception handler routine. We set bits in the VMCS so that page-fault exceptions do not cause a VM exit in non-root mode, and we replace the default Linux IDT with our modified one, which invokes the user-space page fault handler registered inside the XOS runtime. The page fault handler then constructs a new page table entry with a page frame from the user-space buddy allocator. When a cell needs more memory pages than are available in its user-space memory pool, it requests resources from the kernel by triggering a VM exit; the XOS handler in the kernel then synchronizes the two page tables, serves the request, and allocates physical memory from the reserved memory pool.

With the ability to access their private page tables, applications [30] with predictable memory needs can potentially achieve additional performance gains. Functions such as garbage collection can benefit from manipulating page table entries directly, and process live migration can benefit from the user-level page fault handler.
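The sketch below illustrates what such a handler could look like inside a cell. The helper routines (cell_page_table, walk_to_pte, runtime_alloc_frame) are assumptions standing in for the runtime's real pager; only the CR2 read, the PTE write, and the INVLPG are architectural, and they are legal here because the cell runs in VMX non-root ring 0.

```c
/* Hedged sketch of the runtime's #PF handler inside a cell.
 * Compile with -mgeneral-regs-only for GCC's interrupt-handler ABI. */
#include <stdint.h>

#define PTE_PRESENT  0x1ull
#define PTE_WRITE    0x2ull
#define PTE_USER     0x4ull

struct interrupt_frame;                       /* layout defined by the ABI     */

extern uint64_t *cell_page_table;             /* root of the cell's own table  */
extern uint64_t  runtime_alloc_frame(void);   /* frame from user-space buddy   */
extern uint64_t *walk_to_pte(uint64_t *root, uint64_t vaddr); /* allocs levels */

__attribute__((interrupt))
void xos_page_fault_handler(struct interrupt_frame *frame, uint64_t error_code)
{
    uint64_t fault_addr;
    asm volatile("mov %%cr2, %0" : "=r"(fault_addr));   /* faulting address   */

    /* Demand paging: back the page with a frame from the cell's own pool.   */
    uint64_t *pte = walk_to_pte(cell_page_table, fault_addr);
    *pte = runtime_alloc_frame() | PTE_PRESENT | PTE_WRITE | PTE_USER;

    /* Make the new mapping visible; only this cell's TLB entries matter.    */
    asm volatile("invlpg (%0)" :: "r"(fault_addr) : "memory");
    (void)frame; (void)error_code;
}
```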
IDT with our modified one, which will invoke the user space XOS kernel merely allocates, deallocates memory resource, page fault handler we register inside XOS runtime. The page and maintains access control list for the resources. fault handler will then construct a new page table entry, with Physical memory management. XOS implements a two- a page frame from user space buddy allocator. phase physical memory management. The first phase is to When a cell needs more memory pages than available ones in its user space memory pool, it will request resource from message-based I/O system calls to separate the kernel I/O kernel by triggering a vmexit. The XOS handler in the kernel operations from the normal execution path of applications. will synchronize the two page tables, serve the request and Figure 2 presents XOS architecture with message-based allocate physical memory from the reserved memory pool. system calls. XOS is divided into several parts, with cells With ability to access its private page table, applications [30] and I/O services running in their respective resources. The with predictable memory needs can potentially achieve addi- I/O services runs in different CPUs, and are given specific tional performance gains. Functions such as garbage collection devices. I/O services are classified into two class: polling can benefit from manipulating the page table entries directly. service threads and serving threads. Polling service threads Process live migration can benefit from the user-level page only poll I/O requests from cells and dispatch them among fault handler. serving threads. Serving threads receive requests from message queues, perform the received I/O system calls, and response C. Interrupt Management to the dedicate cells. In XOS, we attempt to implement I/O threads in non-root mode to serve the message-based In XOS, the kernel hands over I/O devices into user system calls with user space device drivers. In the current space, including buffer rings, descriptor queues, and interrupts. implementation, we create kernel threads to serve I/O services. As a result, XOS runtime could construct user space I/O Once a normal process becomes a cell, shared memory buffer to serve I/O operations. Applications would with each I/O serving thread is established. We modified the have direct access to I/O devices, bypassing entire kernel standard libc, hooked the posix I/O system call with message- stack. Applications can also use customized device drivers based I/O syscall, and conduct multiple pthread-like fibers for further performance improvement. For instance, we could to serve. Once an I/O system call is invoked, a fiber gets build aggressive network stacks for small messages processing. current cell’s context, invokes a asynchronous message-based PCI-e devices are initiated in Linux kernel. We use PCI- syscall, and yield the execution environment to next fiber. stub to reserve devices for given cells. Once a device is The message-based I/O syscall writes request messages in the allocated to a cell, the descriptor rings and configuration space shared memory buffer, and waits for return code. To gain best of that device are all mapped into the cell’s address space. performance for each cell, at least one exclusive serving thread When a cell manages the device, others cannot touch the per cell is created to response system call requests. As the device. 
D. Message-Based I/O System Calls

When there are not enough physical devices or virtual device queues for the cells that need dedicated devices, some cells may be left waiting for a device to become available. Meanwhile, context switches due to I/O system calls and external interrupts delivered to computing processes may also degrade performance. To address this issue, we implement message-based I/O system calls that separate kernel I/O operations from the normal execution path of applications.

Figure 2 presents the XOS architecture with message-based system calls. XOS is divided into several parts, with cells and I/O services running on their respective resources. The I/O services run on different CPUs and are given specific devices. I/O services are classified into two classes: polling service threads and serving threads. Polling service threads only poll I/O requests from cells and dispatch them among serving threads. Serving threads receive requests from message queues, perform the received I/O system calls, and respond to the dedicated cells. In XOS, we intend to implement the I/O threads in non-root mode, serving the message-based system calls with user-space device drivers; in the current implementation, we create kernel threads to serve I/O requests.

Once a normal process becomes a cell, a shared memory buffer with each I/O serving thread is established. We modified the standard libc, hooked the POSIX I/O system calls with message-based I/O syscalls, and created multiple pthread-like fibers to serve them. Once an I/O system call is invoked, a fiber captures the current cell's context, invokes an asynchronous message-based syscall, and yields the execution environment to the next fiber. The message-based I/O syscall writes request messages into the shared memory buffer and waits for the return code. To obtain the best performance for each cell, at least one exclusive serving thread per cell is created to respond to system call requests. As the number of cores increases from one generation to the next, we have found it acceptable to bind kernel threads to separate CPU cores; optimizing the deployment of kernel threads is part of future work.

The main challenge of the aforementioned implementation is synchronizing the context of a cell. To perform asynchronous message-based I/O system calls, an I/O serving thread needs the context of the requesting cell, including its virtual address space structures and file descriptors. To do that, the processor-related data is backed up in the kernel thread's address space and updated with every new change. An I/O system call message is contained in a fixed-size structure to avoid cache line evictions; it includes the syscall number, parameters, status bits, and the data pointed to by arguments.
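A sketch of that fixed-size message and the posting side of the call is shown below. The exact field layout, ring size, and status encoding are assumptions; the paper fixes only the ingredients (syscall number, arguments, status bits) and the shared-memory transport, and a real XOS fiber would yield to the next fiber instead of spinning.

```c
/* Sketch of a cache-line-sized request and the cell-side posting path. */
#include <stdatomic.h>
#include <stdint.h>

enum { XOS_MSG_FREE = 0, XOS_MSG_POSTED = 1, XOS_MSG_DONE = 2 };

struct xos_io_msg {                       /* one cache line per request     */
    _Atomic uint32_t status;              /* FREE -> POSTED -> DONE         */
    uint32_t syscall_nr;                  /* e.g., __NR_write               */
    uint64_t args[6];                     /* raw system-call arguments      */
    int64_t  retval;                      /* filled in by the serving thread */
} __attribute__((aligned(64)));

#define RING_SLOTS 128
extern struct xos_io_msg *io_ring;        /* shared with the serving thread */

int64_t xos_msg_syscall(uint32_t nr, const uint64_t args[6])
{
    static _Atomic uint32_t next;
    struct xos_io_msg *m = &io_ring[atomic_fetch_add(&next, 1) % RING_SLOTS];

    while (atomic_load(&m->status) != XOS_MSG_FREE)
        ;                                  /* slot still owned by the server */
    m->syscall_nr = nr;
    for (int i = 0; i < 6; i++) m->args[i] = args[i];
    atomic_store(&m->status, XOS_MSG_POSTED);

    while (atomic_load(&m->status) != XOS_MSG_DONE)
        ;                                  /* in XOS, yield to the next fiber */
    int64_t ret = m->retval;
    atomic_store(&m->status, XOS_MSG_FREE);
    return ret;
}
```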
E. Security and Fault Tolerance

To achieve a comparable level of security, no XOS cell should be able to access the data or functions of other cells, Linux processes, or the kernel without permission. Note that this does not mean that applications are completely protected, as vulnerabilities in the XOS runtime and the Linux kernel could still be used to compromise the OS.

To modify or inspect another cell's data, one needs to access that cell's registers or memory. Since each cell runs on exclusive cores, the registers cannot be accessed. Memory access violations are also mitigated: since XOS enforces an access control list for resources, the only way a cell could access another cell's physical memory would be to alter the pager in the XOS runtime. In the XOS kernel, we set up an integrity check derived from integrity measurement schemes to collect integrity measurements and compare them with the values signed in the kernel, verifying that an XOS runtime is running as expected and in a trusted state, and thereby ensuring the necessary chain of trust.

The behaviors of applications are constrained by the XOS runtime. Exceptions in user space, such as divide-by-zero, single step, NMI, invalid opcode, and page fault, are caught and handled by the XOS runtime. An XOS cell is considered an "outsider" process by the kernel: when a cell crashes, it is automatically replaced without any reboot.

V. EVALUATION

A. Evaluation Methodology

XOS currently runs on x86_64-based multiprocessors. It is implemented on Ubuntu with Linux kernel version 3.13.2. We choose Linux with the same kernel as the baseline and Dune (IX) as another comparison point. Most state-of-the-art proposals, like Arrakis, OSv, and Barrelfish, are initial prototypes or are built from the bottom up and do not support current Linux software stacks; thus only Dune is chosen as the adversary in our study, because it is built on Linux.

We deploy XOS on a node with two six-core 2.4GHz Intel Xeon E5645 processors, 32GB memory, a 1TB disk, and one Intel Gb Ethernet NIC. Each core has two hardware threads and a private L2 cache. Each processor chip has a 12MB L3 cache shared by the six on-chip cores and supports VT-x.

We use some well-understood microbenchmarks and a few publicly available application benchmarks. All of them are tested on the baseline Linux and XOS; however, because Dune cannot provide robust backward compatibility, only the microbenchmarks are able to run on Dune. The microbenchmarks also include the Will-It-Scale [31] benchmark (47 typical system calls for scalability tests) and the Stress [32] benchmark (a performance isolation evaluation tool with configurable CPU, memory, and I/O parameters). The full-application benchmarks used in our study come from BigDataBench, which includes MPI applications for data analysis, E-commerce, Search Engine, and SNS. We run each test ten times and report the average performance figures.

During the evaluation, hyperthreading of the Xeon processors is enabled, while power management features and Intel Turbo Boost are disabled. The benchmarks are manually pinned to hardware threads, which helps avoid the influence of process scheduling. In order to measure performance cost precisely, we use rdtsc() to obtain the current time stamp of a CPU. To ensure the rdtsc() sequence is not optimized away by the gcc compiler, rdtsc() is defined as volatile to avoid out-of-order compilation.
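The measurement harness implied by this description looks roughly like the sketch below: rdtsc in volatile inline assembly so the compiler cannot reorder or elide it, wrapped around the operation under test (here the null system call). Serializing variants (cpuid or rdtscp fences) are a common refinement but are not claimed by the paper.

```c
/* Sketch of a cycle-accurate timing loop around a null system call. */
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

static inline uint64_t rdtsc(void)
{
    uint32_t lo, hi;
    asm volatile("rdtsc" : "=a"(lo), "=d"(hi));   /* volatile: not reordered */
    return ((uint64_t)hi << 32) | lo;
}

int main(void)
{
    enum { RUNS = 1000 };
    uint64_t start = rdtsc();
    for (int i = 0; i < RUNS; i++)
        (void)getpid();                     /* the "null" system call        */
    uint64_t cycles = (rdtsc() - start) / RUNS;
    printf("null syscall: %lu cycles on average\n", (unsigned long)cycles);
    return 0;
}
```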

B. Performance

To understand the overhead of OS kernel activities, we conducted experiments that measure the execution cycles of a few simple system calls. A null system call is a system call (e.g., getpid()) that does not invoke other routines; the other microbenchmarks directly execute x86 instructions. Due to space limitations, we only present the results for rdtsc, rdtscp, rdpmc, read/write cr8, and load/store idt and gdt here. We measure the performance of these operations on both Linux and the XOS runtime, and categorize the x86 instructions into privileged and unprivileged instructions. Table I shows that a native Linux environment adds an additional 400% overhead to the execution time of a null system call. The overhead in Linux mainly consists of the two mode switches and other architectural impacts such as cache flushing. Compared to Linux, XOS gains almost 60× performance for the unprivileged instructions, including rdtsc, rdtscp, and rdpmc, none of which need to trap into the kernel when running on XOS. For XOS, we constructed VMCS regions and implemented a simple low-level API in the XOS runtime to directly execute privileged instructions in user space; as a result, the user-space x86 privileged instructions in XOS show overheads similar to the unprivileged ones.

TABLE I: System Calls and Privileged Features (Cycles)

        null syscall  rdtsc  rdtscp  rdpmc  read cr8  write cr8  lgdt  sgdt  lidt  sidt
Linux   174           4167   4452    226    N/A       N/A        N/A   N/A   N/A   N/A
XOS     42            65     101     134    55        46         213   173   233   158

TABLE II: Average Cycles for XOS Operations

XOS Operation          Cycles
Launch a cell          198846
Interact with kernel   3090

To get an initial characterization of XOS, we use a set of microbenchmarks representing the basic operations in big-memory workloads: malloc(), mmap(), brk(), and read/write. Their data sets range from 4KB (one page frame) to 1GB. Each benchmark allocates or maps a fixed-size memory region and randomly writes each page to ensure the page table is populated.
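One of these microbenchmarks could be written as the sketch below (shown for mmap(): map a fixed-size region, then touch every page in shuffled order so each first touch exercises the allocation and page-fault path). The command-line handling and shuffle are incidental details, not the authors' code.

```c
/* Sketch of an mmap()-and-touch microbenchmark over a fixed-size region. */
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>

#define PAGE 4096ul

int main(int argc, char **argv)
{
    size_t bytes = (argc > 1) ? strtoul(argv[1], NULL, 0) : (1ul << 20);
    size_t pages = bytes / PAGE;

    char *buf = mmap(NULL, bytes, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED) { perror("mmap"); return 1; }

    /* Shuffle page indices so pages are touched in random order. */
    size_t *order = malloc(pages * sizeof *order);
    if (!order) return 1;
    for (size_t i = 0; i < pages; i++) order[i] = i;
    for (size_t i = pages; i > 1; i--) {
        size_t j = (size_t)rand() % i;
        size_t t = order[i - 1]; order[i - 1] = order[j]; order[j] = t;
    }

    /* The first touch of each page takes the page-fault/allocation path. */
    for (size_t i = 0; i < pages; i++)
        buf[order[i] * PAGE] = (char)i;

    free(order);
    munmap(buf, bytes);
    return 0;
}
```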

[Fig. 3: Microbenchmark performance (average cycles vs. memory size, 4KB to 1GB) on Linux, Dune, and XOS: (a) sbrk(), (b) mmap(), (c) malloc()/free(), (d) malloc().]

The results are shown in Figure 3. XOS is up to 3× and 16× faster than Linux and Dune, respectively, for malloc() (Figure 3d), 53× and 68× faster for malloc()/free() (Figure 3c), 3.2× and 22× faster for mmap() (Figure 3b), and 2.4× and 30× faster for sbrk() (Figure 3a). The main reason is that XOS provides each process with independent memory access and user-space resource management, while Linux and Dune have to compete for the shared data structures in the monolithic kernel. Moreover, Dune needs to trigger VM exits to obtain resources from the kernel, and VM exits are expensive, which causes poor performance. As the memory size grows from 4KB to 1GB, the elapsed time with XOS shows no significant change, while the elapsed times with Linux and Dune increase by orders of magnitude. The main reason is that XOS processes have exclusive memory resources and do not need to trap into the kernel. For Linux and Dune, most of the elapsed time is spent on page walks: a page walk is a kernel procedure for handling a page-fault exception, which brings inevitable overhead for exception delivery and memory completion in the kernel. In XOS, page-walk overheads are reduced by the user-space page fault handler. In particular, Linux and Dune experience a significant increase in the execution time of malloc()/free() (Figure 3c), a benchmark that mallocs fixed-size memory regions, writes to each region, and then frees the allocated memory. XOS performs better here because the XOS runtime hands released memory regions back to the XOS resource pool rather than to the kernel and completes all memory management in user space, which reduces the chance of contention. These results show that XOS can achieve better performance than Linux and Dune.

TABLE III: Average Cycles for Memory Access

        read    write
Linux   305.5   336.0
Dune    202.5   1418.0
XOS     291.2   332.0

Table III shows that the average time of data reads and writes in XOS is similar to that in Linux, while Dune has slightly better read performance but significantly worse write performance: Dune takes two page walks per page fault, which may stall execution.

The selected application benchmarks from BigDataBench consist of Sort, Grep, Wordcount, Kmeans, and Bayes, all classic representative DC workloads. During the evaluation, the data set for each of them is 10GB. From Figure 4, we can observe that XOS is up to 1.6× faster than Linux in the best case. Compared to the other workloads, Kmeans and Bayes gain less performance improvement because they are more CPU-intensive and do not frequently interact with the OS kernel. The results show that XOS achieves better performance than Linux for common DC workloads, mainly because XOS has efficient built-in user-space resource management and reduces contention in the kernel.

[Fig. 4: BigDataBench execution times (seconds) for Sort, Grep, Wordcount, Kmeans, and Bayes on Linux and XOS.]

C. Scalability

To evaluate the scalability of XOS, we run the system call benchmarks from Will-It-Scale. Each system call benchmark forks multiple processes that intensively invoke a certain system call. We tested all of these benchmarks and found poor scalability on Linux for most of the system call implementations: some have a turning point at about six hardware threads, while others have a turning point at about 12 hardware threads. Figure 5 presents the results for brk(), futex(), malloc(), mmap(), and page faults on XOS and Linux with different numbers of hardware threads. The throughput of XOS is better than that of Linux by up to 7×. The results show that Linux scales poorly once the core count reaches 15, while XOS consistently achieves good scalability. XOS physically partitions resources and bypasses the entire kernel, thus largely avoiding contention on shared kernel structures. Note, however, that a real application will not be as OS-intensive as these system call benchmarks, which reflect the best-case scenario for XOS.

[Fig. 5: Scalability tests (invocations per second vs. number of hardware threads, Linux and XOS): (a) brk(), (b) futex(), (c) mmap(), (d) page fault.]
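The benchmark pattern is essentially the following sketch: fork a set of worker processes and let each hammer one kernel path (mmap()/munmap() here). The real Will-It-Scale harness [31] additionally pins workers to hardware threads, warms up, and counts invocations per second through shared counters; none of that is shown.

```c
/* Sketch of a forked system-call stress loop in the Will-It-Scale style. */
#include <signal.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    enum { MAX_WORKERS = 256 };
    int nproc = (argc > 1) ? atoi(argv[1]) : 4;
    if (nproc < 1 || nproc > MAX_WORKERS) nproc = 4;
    pid_t kids[MAX_WORKERS];

    for (int p = 0; p < nproc; p++) {
        kids[p] = fork();
        if (kids[p] == 0) {
            for (;;) {                                /* hammer one kernel path */
                void *m = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
                if (m != MAP_FAILED)
                    munmap(m, 4096);
            }
        }
    }
    sleep(10);                                        /* measurement window */
    for (int p = 0; p < nproc; p++)
        kill(kids[p], SIGKILL);
    while (wait(NULL) > 0)
        ;
    return 0;
}
```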

D. Performance Isolation

As XOS targets DC computing, performance isolation when running co-located workloads is a key issue. The final set of experiments evaluates the performance isolation provided by the XOS architecture. In this experiment, the system node was set up to run co-existing DC workloads.

The workloads used for these tests were the Stress benchmark and Search from BigDataBench. Search is a latency-critical workload deployed on a three-node cluster: a front-end Tomcat distributes requests from client nodes to back-end Nutch index servers. Based on extensive tests, we set the client to 150 requests/second as a tradeoff between throughput and request latency. We run the Nutch servers and the Stress benchmark on our target OS node, as Nutch is the bottleneck of Search in our experiment. The Stress benchmark is a multi-threaded application in which each thread repeatedly allocates 512MB of memory and touches a byte per page; we use a three-thread Stress benchmark in this experiment for stability. We bind each workload to dedicated cores to avoid interference. The latencies of all requests are profiled and presented as cumulative distributions (CDFs) in Figure 6, with each request latency normalized to the maximum latency across all experiments. The results show that tail latency in XOS outperforms that in Linux: the 99th latency percentile in XOS is 3× better than in Linux, and the number of outliers (the length of the tail) is generally much smaller for XOS.

[Fig. 6: Tail latency of the Search workload with interference: latency-percentile CDFs for Linux (search), Linux (search + 3-thread stress), and XOS (search + 3-thread stress).]

VI. CONCLUSION

This paper explores the OS architecture for DC servers. We propose an application-defined OS model guided by three design principles: separation of resource management from the kernel; application-defined kernel subsystems; and elastic partitioning of the OS kernel and physical resources. We built a Linux-based prototype that adheres to this design model. Our experiments demonstrated XOS's advantages over Linux and Dune in performance, scalability, and performance isolation, while still providing full support for legacy code. We believe that the application-defined OS design is a promising trend for the increasingly rich variety of DC workloads.

ACKNOWLEDGMENT

This work is supported by the National Key Research and Development Plan of China (Grant No. 2016YFB1000600 and 2016YFB1000601). The authors are grateful to the anonymous reviewers for their insightful feedback.
REFERENCES

[1] D. R. Engler, M. F. Kaashoek, and J. O'Toole, Jr., "Exokernel: An operating system architecture for application-level resource management," in Proc. ACM Symposium on Operating Systems Principles (SOSP), Dec. 1995, pp. 251–266.
[2] A. Madhavapeddy, R. Mortier, C. Rotsos, D. Scott, B. Singh, T. Gazagnaire, S. Smith, S. Hand, and J. Crowcroft, "Unikernels: Library operating systems for the cloud," in Proc. ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), Mar. 2013, pp. 461–472.
[3] A. Baumann, P. Barham, P.-E. Dagand, T. Harris, R. Isaacs, S. Peter, T. Roscoe, A. Schüpbach, and A. Singhania, "The multikernel: A new OS architecture for scalable multicore systems," in Proc. ACM Symposium on Operating Systems Principles (SOSP), Oct. 2009, pp. 29–44.
[4] G. Lu, J. Zhan, C. Tan, X. Lin, D. Kong, T. Hao, L. Wang, F. Tang, and C. Zheng, "Isolate first, then share: A new OS architecture for datacenter computing," arXiv:1604.01378.
[5] D. Engler, S. K. Gupta, and M. F. Kaashoek, "AVM: Application-level virtual memory," in Proc. IEEE Workshop on Hot Topics in Operating Systems (HotOS), May 1995, pp. 72–77.
[6] Z. Jia, L. Wang, J. Zhan, L. Zhang, and C. Luo, "Characterizing data analysis workloads in data centers," in Proc. IEEE International Symposium on Workload Characterization (IISWC), 2013, pp. 66–76.
[7] L. Wang, J. Zhan, C. Luo, Y. Zhu, Q. Yang, Y. He, W. Gao, Z. Jia, Y. Shi, S. Zhang, C. Zheng, G. Lu, K. Zhan, X. Li, and B. Qiu, "BigDataBench: A big data benchmark suite from internet services," in Proc. IEEE International Symposium on High Performance Computer Architecture (HPCA), Feb. 2014, pp. 488–499.
[8] J. Ma, X. Sui, N. Sun, Y. Li, Z. Yu, B. Huang, T. Xu, Z. Yao, Y. Chen, H. Wang, L. Zhang, and Y. Bao, "Supporting differentiated services in computers via Programmable Architecture for Resourcing-on-Demand (PARD)," in Proc. ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), Mar. 2015, pp. 131–143.
[9] R. Hou, T. Jiang, L. Zhang, P. Qi, J. Dong, H. Wang, X. Gu, and S. Zhang, "Cost effective data center servers," in Proc. IEEE International Symposium on High Performance Computer Architecture (HPCA), 2013, pp. 179–187.
[10] W. He, H. Cui, B. Lu, J. Zhao, S. Li, G. Ruan, J. Xue, X. Feng, W. Yang, and Y. Yan, "Hadoop+: Modeling and evaluating the heterogeneity for MapReduce applications in heterogeneous clusters," in Proc. ACM International Conference on Supercomputing (ICS), 2015, pp. 143–153.
[11] C. Zheng, R. Hou, J. Zhan, and L. Zhang, "Method and apparatus for accessing hardware resource," US Patent 9,529,650, Dec. 27, 2016.
[12] C. Zheng, L. Fu, J. Zhan, and L. Zhang, "Method and apparatus for accessing physical resources," US Patent App. 15/160,863, Sep. 15, 2016.
[13] G. Lu, J. Zhan, Y. Gao, C. Tan, and D. Xue, "Resource processing method, operating system, and device," US Patent App. 15/175,742, Oct. 6, 2016.
[14] A. Belay, A. Bittau, A. Mashtizadeh, D. Terei, D. Mazières, and C. Kozyrakis, "Dune: Safe user-level access to privileged CPU features," in Proc. USENIX Symposium on Operating Systems Design and Implementation (OSDI), Oct. 2012, pp. 335–348.
[15] A. Belay, G. Prekas, A. Klimovic, S. Grossman, C. Kozyrakis, and E. Bugnion, "IX: A protected dataplane operating system for high throughput and low latency," in Proc. USENIX Symposium on Operating Systems Design and Implementation (OSDI), Oct. 2014, pp. 49–65.
[16] R. J. Creasy, "The origin of the VM/370 time-sharing system," IBM Journal of Research and Development, vol. 25, no. 5, pp. 483–490, Sep. 1981.
[17] R. Uhlig, G. Neiger, D. Rodgers, A. L. Santoni, F. C. M. Martins, A. V. Anderson, S. M. Bennett, A. Kagi, F. H. Leung, and L. Smith, "Intel virtualization technology," Computer, vol. 38, no. 5, pp. 48–56, May 2005.
[18] P. Kutch, "PCI-SIG SR-IOV primer: An introduction to SR-IOV technology," Intel Application Note 321211-002, Jan. 2011.
[19] R. Unrau, O. Krieger, B. Gamsa, and M. Stumm, "Hierarchical clustering: A structure for scalable multiprocessor operating system design," Journal of Supercomputing, vol. 9, pp. 105–134, Mar. 1995.
[20] J. Chapin, M. Rosenblum, S. Devine, T. Lahiri, D. Teodosiu, and A. Gupta, "Hive: Fault containment for shared-memory multiprocessors," in Proc. ACM Symposium on Operating Systems Principles (SOSP), Dec. 1995, pp. 12–25.
[21] G. Ammons, J. Appavoo, M. Butrico, D. Da Silva, D. Grove, K. Kawachiya, O. Krieger, B. Rosenburg, E. Van Hensbergen, and R. W. Wisniewski, "Libra: A library operating system for a JVM in a virtualized execution environment," in Proc. ACM International Conference on Virtual Execution Environments (VEE), Jun. 2007, pp. 44–54.
[22] D. E. Porter, S. Boyd-Wickizer, J. Howell, R. Olinsky, and G. C. Hunt, "Rethinking the library OS from the top down," in Proc. ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), Mar. 2011, pp. 291–304.
[23] A. Kivity, D. Laor, G. Costa, P. Enberg, N. Har'El, D. Marti, and V. Zolotarov, "OSv: Optimizing the operating system for virtual machines," in Proc. USENIX Annual Technical Conference (USENIX ATC), Jun. 2014, pp. 61–72.
[24] S. Boyd-Wickizer, H. Chen, R. Chen, Y. Mao, F. Kaashoek, R. Morris, A. Pesterev, L. Stein, M. Wu, Y. Dai, Y. Zhang, and Z. Zhang, "Corey: An operating system for many cores," in Proc. USENIX Symposium on Operating Systems Design and Implementation (OSDI), Dec. 2008, pp. 43–57.
[25] E. Jeong, S. Wood, M. Jamshed, H. Jeong, S. Ihm, D. Han, and K. Park, "mTCP: A highly scalable user-level TCP stack for multicore systems," in Proc. USENIX Symposium on Networked Systems Design and Implementation (NSDI), Apr. 2014, pp. 489–502.
[26] R. Kapoor, G. Porter, M. Tewari, G. M. Voelker, and A. Vahdat, "Chronos: Predictable low latency for data center applications," in Proc. ACM Symposium on Cloud Computing (SoCC), Oct. 2012, pp. 9:1–9:14.
[27] S. Han, S. Marshall, B.-G. Chun, and S. Ratnasamy, "MegaPipe: A new programming interface for scalable network I/O," in Proc. USENIX Symposium on Operating Systems Design and Implementation (OSDI), Oct. 2012, pp. 135–148.
[28] E. Keller, J. Szefer, J. Rexford, and R. B. Lee, "NoHype: Virtualized cloud infrastructure without the virtualization," in Proc. ACM/IEEE International Symposium on Computer Architecture (ISCA), Jun. 2010, pp. 350–361.
[29] S. Peter, J. Li, I. Zhang, D. R. K. Ports, D. Woos, A. Krishnamurthy, T. Anderson, and T. Roscoe, "Arrakis: The operating system is the control plane," in Proc. USENIX Symposium on Operating Systems Design and Implementation (OSDI), Oct. 2014, pp. 1–16.
[30] R. Riesen and K. Ferreira, "An extensible operating system design for large-scale parallel machines," Sandia National Laboratories, Tech. Rep. SAND09-2660, Apr. 2009, https://cfwebprod.sandia.gov/cfdocs/CompResearch/templates/insert/pubs.cfm.
[31] A. Blanchard, "Will It Scale benchmark suite," https://github.com/antonblanchard/will-it-scale, 2015, [Online; accessed 8-July-2016].
[32] A. Waterland, "stress benchmark suite," http://people.seas.harvard.edu/~apw/stress/, 2014, [Online; accessed 8-July-2016].